Matthew Beckman & Justin Post June 25, 2021
dcData and tidyverse packages
dcData is installed from GitHub, so it requires an extra step. You may have already done this from the instructions prior to the workshop, but it is shown again here if needed:
devtools::install_github("mdbeckman/dcData")
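If you are unsure whether the packages are already installed, a guarded install avoids repeating the download. The sketch below is one possible way to do that: `is_missing()` is a hypothetical helper name, and the `interactive()` guard is an assumption so that nothing gets installed when the script runs unattended.

```r
# Hypothetical helper: TRUE when a package cannot be found in the library.
is_missing <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Install only what is missing, and only in an interactive session
# (the guard is an assumption of this sketch, not a workshop requirement).
if (interactive()) {
  if (is_missing("devtools"))  install.packages("devtools")
  if (is_missing("tidyverse")) install.packages("tidyverse")
  if (is_missing("dcData"))    devtools::install_github("mdbeckman/dcData")
}
```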
BabyNames
Task 1: load the BabyNames data from dcData into your R environment using the data() function, then open the BabyNames object in the Environment pane.
Use the spreadsheet view to answer the following:
Task 2: how many total rows of data are included?
Task 3: describe what each row in the data actually represents.
Task 4: find the largest count in the BabyNames data. How would you interpret this result?
Task 5: what are the max & min available year in the data?
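After answering from the spreadsheet view, the same quantities can be cross-checked in the console. This is a minimal sketch, assuming the data has `count` and `year` columns as the tasks suggest. `summarise_counts()` is a hypothetical helper, and the small data frame is made-up toy data, not real BabyNames rows.

```r
# Hypothetical helper: row total, row with the largest count, and year range
# for any data frame that has `count` and `year` columns.
summarise_counts <- function(df) {
  list(
    n_rows     = nrow(df),
    top_row    = df[which.max(df$count), , drop = FALSE],
    year_range = range(df$year)
  )
}

# Toy data for illustration only (not taken from BabyNames).
toy <- data.frame(
  year  = c(1990, 1990, 2005),
  count = c(12, 340, 57)
)
summarise_counts(toy)

# With dcData installed, the same call should work on the real data:
# data("BabyNames", package = "dcData")
# summarise_counts(BabyNames)
```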
BabyNamesSupp
The file “BabyNamesSupp.csv” includes a few years of more recent data to augment the BabyNames data. Run the starter code shown below to read the data and complete the tasks.
Important: The starter code will produce a warning message! Don’t worry, it’s part of the exercise!
# starter code for BabyNamesSupp
library(tidyverse)
BabyNamesSupp <-
read_csv("https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv")
## Warning: 84619 parsing failures.
## row col expected actual file
## 19208 sex 1/0/T/F/TRUE/FALSE M 'https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv'
## 19209 sex 1/0/T/F/TRUE/FALSE M 'https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv'
## 19210 sex 1/0/T/F/TRUE/FALSE M 'https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv'
## 19211 sex 1/0/T/F/TRUE/FALSE M 'https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv'
## 19212 sex 1/0/T/F/TRUE/FALSE M 'https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv'
## ..... ... .................. ...... ....................................................................
## See problems(...) for more details.
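The last line of the warning points to `problems()`, which returns the parsing failures as a data frame. The sketch below triggers a similar failure on a tiny made-up file by forcing the `sex` column to logical, an assumption made only to provoke the failure on purpose. The file contents and the `toy_path` name are hypothetical.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file (toy data, not BabyNamesSupp).
toy_path <- tempfile(fileext = ".csv")
writeLines(c("name,sex,count", "Ann,F,10", "Bea,F,8", "Cal,M,9"), toy_path)

# Force `sex` to logical on purpose: "F" parses as FALSE, "M" cannot be parsed.
toy <- suppressWarnings(
  read_csv(toy_path, col_types = cols(sex = col_logical()))
)

# One row per value that could not be parsed.
toy_problems <- problems(toy)
toy_problems
```

On the workshop data, the equivalent call would be `problems(BabyNamesSupp)`.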
Task 1: Read the warning message carefully; what seems to have gone wrong?
Task 2: open the spreadsheet view to investigate BabyNamesSupp … BabyNamesSupp?
Task 3: use the following R functions to investigate BabyNamesSupp further:
head(BabyNamesSupp)
tail(BabyNamesSupp)
str(BabyNamesSupp)
Task 4 (Challenge): Why did read_csv() seem to have this problem with the data intake? Any ideas how we might fix it?
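One way to build intuition for Task 4: automatic type guessing can mistake a column that happens to contain only F for logical values. The sketch below uses base R's `type.convert()` as a stand-in to show the ambiguity. It is not how `read_csv()` is implemented, and the input vectors are made up.

```r
# A vector containing only "F" looks like logical FALSE to a type guesser.
only_f <- type.convert(c("F", "F", "F"), as.is = TRUE)
class(only_f)

# Once an "M" is seen, the same guesser falls back to character.
f_and_m <- type.convert(c("F", "F", "M"), as.is = TRUE)
class(f_and_m)
```

`read_csv()` has a `guess_max` argument that controls how many rows inform the guess and a `col_types` argument for stating types explicitly. Whether either one is the fix used later in the workshop is not stated here.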
At this point, we aren’t attempting to prepare the BabyNamesSupp data for analysis. We’re just reading it into the R environment and making observations. We’ll be using these data again in later exercises, so we will make the necessary corrections at that point.
Search “RStudio >> Help” to learn about the data…
What can you learn about the BabyNames data from RStudio Help?
What can you learn about the BabyNamesSupp data from RStudio Help? What happened?
Task 1: Want to include 2020 data too? See if you can locate it, read the data into R, and review the data intake (hint: the BabyNames help documentation includes a source to investigate).
Again, we aren’t attempting to process the 2020 data yet. We’re just reading it into the R environment and making observations about that process. We’ll be using this data again later in the exercises, so we will make the necessary corrections at that point.
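When a downloaded file turns out to have no header row, `read_csv()` would otherwise treat the first record as column names. The following is a sketch under that assumption only. Whether the 2020 file is laid out this way is something to check for yourself, and the file contents, column names, and `toy_path` below are hypothetical.

```r
library(readr)

# Tiny made-up file with no header row (toy data, not the real 2020 file).
toy_path <- tempfile(fileext = ".txt")
writeLines(c("Ann,F,10", "Cal,M,9"), toy_path)

# Supply column names and types explicitly instead of relying on guessing.
toy_2020 <- read_csv(
  toy_path,
  col_names = c("name", "sex", "count"),
  col_types = cols(
    name  = col_character(),
    sex   = col_character(),
    count = col_integer()
  )
)
str(toy_2020)
```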
Note: you might hang onto the RStudio default text provided in the new R Markdown file for the moment… it’s packed with tiny examples that will come in handy!