Matthew Beckman & Justin Post
June 25, 2021
Here, we join an analysis already in progress…
We’re investigating the popularity of names in the US each year. Matt has chosen to investigate the names of each person in his immediate family: Matthew, Sarah, Eden, Jack, and Hazel. They’re his favorite people, and also his favorite names! He’s feeling torn about how to include his son Jack in the analysis. Jack’s legal name is “Jon”, but he is nearly always called “Jack”. The spelling “Jon” honors Scandinavian heritage on both sides of the family, and the nickname “Jack” specifically honors his great-grandfather.
Some famous persons by each name of the family include:
This document was last modified 2021-06-25 10:30:13.
According to US Social Security data from 1880 through 2020, “Matthew” was the most frequently occurring name in the family.
# vector of names
beckmans <- c("Matthew", "Sarah", "Eden", "Jack", "Hazel")
BabyNamesFull %>%
filter(name %in% beckmans) %>%
group_by(name) %>%
summarise(total = sum(count, na.rm = TRUE)) %>%
arrange(desc(total))
## `summarise()` ungrouping output (override with `.groups` argument)
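The message above is informational: dplyr is reporting how it handled the grouping after `summarise()`. As a minimal sketch, the same ranking can be run with an explicit `.groups` argument, which suppresses the message. The table `toy_names`, the vector `family`, and all counts below are hypothetical stand-ins for `BabyNamesFull` and `beckmans`; they are invented for illustration and are not Social Security data.

```r
library(dplyr)

# Hypothetical toy data with the same columns the analysis assumes
# (name, sex, year, count); the counts are invented for illustration only
toy_names <- tibble(
  name  = c("Matthew", "Matthew", "Sarah", "Sarah", "Eden", "Eden"),
  sex   = c("M", "M", "F", "F", "F", "M"),
  year  = c(2014, 2015, 2014, 2015, 2015, 2015),
  count = c(100, 90, 80, 70, 30, 20)
)

family <- c("Matthew", "Sarah", "Eden")

name_totals <- toy_names %>%
  filter(name %in% family) %>%
  group_by(name) %>%
  # .groups = "drop" returns an ungrouped tibble and suppresses the message
  summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total))

name_totals
```

Stating `.groups` explicitly also documents whether later steps should expect grouped or ungrouped data.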
We also learned that “Eden” was more balanced between the two sexes when compared to the other names in the family.
BabyNamesFull %>%
filter(name %in% beckmans) %>%
group_by(name, sex) %>% # Task 4.2.3
summarise(total = sum(count, na.rm = TRUE)) %>% # Task 4.2.3
arrange(name)
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
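Raw totals by sex can be hard to compare across names of very different popularity. One way to make "balance" concrete, sketched here under assumptions, is to convert the totals into within-name shares. The data frame `toy_names` and its counts are hypothetical and only illustrate the mechanics.

```r
library(dplyr)

# Hypothetical toy data; counts are invented for illustration only
toy_names <- tibble(
  name  = c("Eden", "Eden", "Jack", "Jack"),
  sex   = c("F", "M", "F", "M"),
  count = c(60, 40, 1, 99)
)

sex_balance <- toy_names %>%
  group_by(name, sex) %>%
  # "drop_last" keeps the result grouped by name only
  summarise(total = sum(count, na.rm = TRUE), .groups = "drop_last") %>%
  # still grouped by name here, so sum(total) is the per-name total
  mutate(share = total / sum(total)) %>%
  ungroup()

sex_balance
```

A name whose shares are close to 0.5 for each sex is more balanced than one whose shares sit near 0 and 1.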
Matt joined Penn State University in 2015, and coincidentally his son Jack was born earlier that year. Interestingly, the name “Jack” was more popular than “Sarah” in 2015, despite the fact that “Sarah” had been far more common when all years had been combined. Perhaps more surprisingly, the name “Hazel” was nearly as common as “Sarah” in that year!
BabyNamesFull %>%
filter(name %in% beckmans, year == 2015) %>%
group_by(name) %>%
summarise(total = sum(count, na.rm = TRUE)) %>%
arrange(desc(total))
## `summarise()` ungrouping output (override with `.groups` argument)
BabyNamesFull %>%
filter(name %in% beckmans, year == 2015) %>%
group_by(name, sex) %>%
summarise(total = sum(count, na.rm = TRUE)) %>%
arrange(name)
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
We want to create a graph that highlights the change in popularity among the names you have chosen over the years.
Task 1: sketch (by hand) the plot you plan to make
Task 2: is your data set aligned to the intended features of this plot?
Task 3: run esquisse::esquisser(GlyphReadyDataSet) to draft a plot
  esquisser( ) allows you to prototype various plot features, and then view the corresponding ggplot2 code for your plot
  the esquisser( ) step becomes optional as you get more comfortable with the ggplot2 framework
Task 4: clean up axis labels, add a title, etc. if you have not already done so.
Task 5 (Challenge): add a layer to somehow display overall birth trend for context
Task 6 (Challenge): plot the trend for each name as a relative frequency, rather than raw counts
#library(esquisse)
# Task 5.1 My sketch was a line chart of name popularity over time
# variable mapping:
# x-position: year
# y-position: frequency
# line color: name
# Task 5.2
BeckmanNamesLine <-
BabyNamesFull %>%
filter(name %in% beckmans) %>%
group_by(name, year) %>%
summarise(total = sum(count))
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
# Task 5.3
# esquisse::esquisser(BeckmanNamesLine)
ggplot(BeckmanNamesLine) +
aes(x = year, y = total, colour = name) +
geom_line(size = 1L) +
scale_color_hue() +
theme_minimal()
# Task 5.4
TotalBabies <-
BabyNamesFull %>%
group_by(year) %>%
summarise(totalBorn = sum(count, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
# Task 5.5 (Challenge)
# add context proportional to total births each year to the plot
ggplot(BeckmanNamesLine) +
geom_line(aes(x = year, y = total, colour = name)) +
geom_line(data = TotalBabies, aes(x = year, y = totalBorn * 0.02),
linetype = "dashed", alpha = 0.3) +
ggtitle(label = "Popularity Trend of Names in the Beckman Family (1880 - 2020)",
subtitle = "Trend proportional to total births per year shown for context.\n(Source: US Social Security Administration)") +
ylab("Frequency") +
xlab("Year")
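Because the dashed context line is multiplied by 0.02, its height cannot be read directly off the "Frequency" axis. One possible refinement, offered only as a sketch, is a secondary axis that undoes the rescaling. The data frames `name_trend` and `births` and the variable `scale_factor` are hypothetical names with invented values; `sec_axis()` and the `sec.axis` argument are part of ggplot2.

```r
library(ggplot2)

# Hypothetical toy data; values are invented for illustration only
name_trend <- data.frame(
  year  = rep(c(2000, 2010, 2020), times = 2),
  name  = rep(c("Hazel", "Jack"), each = 3),
  total = c(500, 1500, 3000, 8000, 9000, 8500)
)
births <- data.frame(
  year      = c(2000, 2010, 2020),
  totalBorn = c(400000, 390000, 360000)
)

scale_factor <- 0.02  # same rescaling constant as in the plot above

p <- ggplot(name_trend) +
  geom_line(aes(x = year, y = total, colour = name)) +
  geom_line(data = births, aes(x = year, y = totalBorn * scale_factor),
            linetype = "dashed", alpha = 0.3) +
  # the secondary axis divides by the same factor, so the dashed line
  # can be read in units of total births
  scale_y_continuous(
    name = "Frequency",
    sec.axis = sec_axis(~ . / scale_factor, name = "Total births")
  ) +
  xlab("Year")

p
```

The choice of 0.02 remains a visual judgment call; the secondary axis only makes the rescaling explicit to the reader.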
# Task 5.6 (Challenge)
BeckmanNamesRF <-
BabyNamesFull %>%
group_by(year) %>% # groups for subsequent mutate()
mutate(annualTotal = sum(count)) %>% # new column for yearly totals
ungroup() %>%
filter(name %in% beckmans) %>%
group_by(name, year) %>% # new groups for subsequent summarise()
summarise(prop = sum(count) / first(annualTotal)) # one row per name and year
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
ggplot(BeckmanNamesRF) +
geom_line(aes(x = year, y = prop, colour = name)) +
ggtitle(label = "Popularity Trend of Names in the Beckman Family (1880 - 2020)",
subtitle = "(Source: US Social Security Administration)") +
ylab("Relative Proportion") +
xlab("Year")
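An alternative way to build the relative-frequency table, sketched here with assumptions, is to compute yearly totals separately (as in Task 5.4) and join them onto the per-name totals. This yields exactly one row per name and year. The objects `toy_names`, `family`, `annual_totals`, and `rel_freq` are hypothetical, and the counts are invented.

```r
library(dplyr)

# Hypothetical toy data; "Other" stands in for all remaining names
toy_names <- tibble(
  name  = c("Jack", "Jack", "Eden", "Other", "Jack", "Eden", "Other"),
  sex   = c("M", "F", "F", "F", "M", "F", "M"),
  year  = c(2019, 2019, 2019, 2019, 2020, 2020, 2020),
  count = c(90, 10, 50, 850, 80, 120, 800)
)
family <- c("Jack", "Eden")

# total births per year across ALL names (computed before filtering)
annual_totals <- toy_names %>%
  group_by(year) %>%
  summarise(totalBorn = sum(count, na.rm = TRUE), .groups = "drop")

rel_freq <- toy_names %>%
  filter(name %in% family) %>%
  group_by(name, year) %>%
  # combine both sexes into a single total per name and year
  summarise(total = sum(count, na.rm = TRUE), .groups = "drop") %>%
  # attach the yearly denominator, then compute the proportion
  left_join(annual_totals, by = "year") %>%
  mutate(prop = total / totalBorn)

rel_freq
```

Keeping the denominator in its own table makes it easy to confirm that the yearly totals were computed before any names were filtered out.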