Files corresponding to Short Course: Introduction to Data Science Using R
We’ll continue to work on the same .Rmd file from the previous exercise.
We’ll consider the names of interest we had from before. Filter the
data to only include those names then group the data by name and
sex. Now use summarize() to create a new variable called total that is
the sum of the counts (remove NAs with na.rm = TRUE).
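The filter, group, and summarize steps above might look roughly like the sketch below. It assumes the dplyr package is installed and that the data has columns called name, sex, and count. The small baby_toy data frame and the names_of_interest vector are made-up stand-ins so the example runs on its own; in the exercise you would use BabyNamesFull and your own names from before.

```r
library(dplyr)

# Toy stand-in for BabyNamesFull (made-up values, for illustration only)
baby_toy <- data.frame(
  name  = c("Mary", "Mary", "John", "John", "Anna", "Mary"),
  sex   = c("F", "F", "M", "M", "F", "M"),
  year  = c(1950, 1951, 1950, 1951, 1950, 1950),
  count = c(100, NA, 80, 90, 40, 5),
  stringsAsFactors = FALSE
)

# Hypothetical names of interest; substitute the ones from the previous exercise
names_of_interest <- c("Mary", "John")

name_totals <- baby_toy %>%
  filter(name %in% names_of_interest) %>%   # keep only the chosen names
  group_by(name, sex) %>%                   # one group per name/sex pair
  summarize(total = sum(count, na.rm = TRUE), .groups = "drop")  # NAs ignored

name_totals
```

Without na.rm = TRUE, any group containing a missing count would get a total of NA rather than the sum of the known values.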
Filter the BabyNamesFull data object to only include rows where the
count is more than 50000. Save this as an R object. Then, create a
contingency table to count the number of times each name appears.
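One possible way to do this, sketched on a toy data frame, is shown below. It assumes dplyr is installed and that the columns are called name and count. The values in baby_toy and the object names popular and name_table are invented for illustration, and base R's table() is used here as one way to build a one-way contingency table.

```r
library(dplyr)

# Toy stand-in for BabyNamesFull (made-up values, for illustration only)
baby_toy <- data.frame(
  name  = c("Linda", "Linda", "James", "Mary", "Anna"),
  year  = c(1947, 1948, 1947, 1947, 1947),
  count = c(99000, 96000, 94000, 71000, 12000),
  stringsAsFactors = FALSE
)

# Keep only rows with a count above 50000 and save the result
popular <- baby_toy %>%
  filter(count > 50000)

# One-way contingency table: how many rows each name has in the filtered data
name_table <- table(popular$name)
name_table
```

Each entry of the table is the number of rows (for example, year and sex combinations) in which that name exceeded the threshold, not the number of babies.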
Let’s investigate the total number of names in each year. Group the
data by year and find the sum of the count variable. Once you’ve
done that, run summary() on the total counts.
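A minimal sketch of the yearly totals, again on made-up data, could look like this. It assumes dplyr is installed and that the columns are called year and count. The baby_toy values and the name yearly_totals are hypothetical, and the use of na.rm = TRUE is an assumption carried over from the earlier step rather than something this exercise asks for.

```r
library(dplyr)

# Toy stand-in for BabyNamesFull (made-up values, for illustration only)
baby_toy <- data.frame(
  name  = c("Mary", "John", "Mary", "John", "Mary"),
  year  = c(1950, 1950, 1951, 1951, 1952),
  count = c(100, 80, 120, 90, 150),
  stringsAsFactors = FALSE
)

# One row per year, holding the sum of count within that year
yearly_totals <- baby_toy %>%
  group_by(year) %>%
  summarize(total = sum(count, na.rm = TRUE))

# Five-number summary plus the mean of the yearly totals
summary(yearly_totals$total)
```

summary() on a numeric vector reports the minimum, quartiles, median, mean, and maximum, which gives a quick sense of how the totals vary across years.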