What’s in a Name?? (Part 4 Solutions)

Matthew Beckman & Justin Post
June 25, 2021


Here, we join an analysis already in progress…


Names to be investigated


We’re investigating the popularity of names in the US each year. Matt has chosen to investigate the names of the people in his immediate family: Matthew, Sarah, Eden, Jack, and Hazel. They’re his favorite people, and also his favorite names! He’s feeling torn about how to include his son Jack in the analysis. Jack’s legal name is “Jon”, but he is nearly always called “Jack”. The spelling “Jon” honors Scandinavian heritage on both sides of the family, and the nickname “Jack” specifically honors his great-grandfather.

Some famous people who share each family member’s name include:

This document was last modified 2021-06-25 10:28:47.


Part 4. Data wrangling


4.1 Combine our Data Sets with dplyr::bind_rows()


BabyNames2020 <- 
  read_csv("https://jbpost2.github.io/TeachingWithR/datasets/yob2020.txt", 
           col_names = FALSE, col_types = cols(X2 = col_character()))
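The two arguments in this call do different jobs: `col_names = FALSE` tells readr the file has no header row, so the columns get the default names `X1`, `X2`, `X3`, and `col_types` forces `X2` (the sex code) to be read as text. Without it, readr may guess that a column starting with many `F` values is logical. A minimal sketch, assuming a few made-up rows in the same name,sex,count layout (the inline text and the name `toy_txt` are illustrative, not taken from `yob2020.txt`):

```r
library(readr)

# Made-up rows in the name,sex,count layout; not the real 2020 data.
toy_txt <- "Olivia,F,100\nEmma,F,90\nLiam,M,95\n"

toy <- read_csv(toy_txt,
                col_names = FALSE,                       # no header row in the input
                col_types = cols(X2 = col_character()))  # keep F/M as text

names(toy)  # default column names assigned by readr
toy$X2      # sex codes kept as character strings
```

Passing a string that contains newlines makes readr treat it as literal data, which keeps the sketch independent of the download.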

Solution

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dcData)

data("BabyNames", package = "dcData")

# Task 4.1.2
BabyNamesSupp <- 
  read_csv("https://jbpost2.github.io/TeachingWithR/datasets/BabyNamesSupp.csv",
           col_types = cols(sex = col_character()))  # fixes `sex`
    
# Tasks 4.1.1 & 4.1.3
BabyNames2020 <- 
    read_csv("https://jbpost2.github.io/TeachingWithR/datasets/yob2020.txt", 
             col_names = FALSE, col_types = cols(X2 = col_character())) %>%
    rename(name = X1, sex = X2, count = X3) %>%  # rename solution to Task 4.1.3
    mutate(year = 2020)                          # year solution to Task 4.1.3

# Task 4.1.4
BabyNamesFull <- bind_rows(BabyNames, BabyNamesSupp, BabyNames2020)
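The rename and `mutate()` steps matter because `bind_rows()` matches columns by name, not by position. A minimal sketch of that behavior, assuming three tiny made-up tibbles (`toy_main`, `toy_supp`, and `toy_2020` are hypothetical stand-ins for the real data sets):

```r
library(dplyr)

# Hypothetical stand-ins for the three data sets; all values are made up.
toy_main <- tibble(name = "Hazel", sex = "F", count = 10L, year = 2013L)
toy_supp <- tibble(year = 2019, name = "Hazel", sex = "F", count = 12)  # different column order
toy_2020 <- tibble(name = "Hazel", sex = "F", count = 11) %>%
  mutate(year = 2020)                                                   # add the missing year

# Columns are aligned by name; the result follows the first table's column order.
toy_full <- bind_rows(toy_main, toy_supp, toy_2020)

nrow(toy_full)   # one row per input row
names(toy_full)  # name, sex, count, year
```

If a column name differed between tables (for example `X1` instead of `name`), `bind_rows()` would keep both columns and fill the gaps with `NA`, which is why the 2020 data is renamed first.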

4.2 Data Wrangling and Summaries


Solution

# vector of names
beckmans <- c("Matthew", "Sarah", "Eden", "Jack", "Hazel")

BabyNamesFull %>%
    filter(name %in% beckmans) %>%                   # Task 4.2.1
    group_by(name) %>%                               # Task 4.2.2
    summarise(total = sum(count, na.rm = TRUE)) %>%  # Task 4.2.2
    arrange(desc(total))
## `summarise()` ungrouping output (override with `.groups` argument)
BabyNamesFull %>%
    filter(name %in% beckmans) %>%
    group_by(name, sex) %>%                          # Task 4.2.3
    summarise(total = sum(count, na.rm = TRUE)) %>%  # Task 4.2.3
    arrange(name)
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
# Task 4.2.4: Matt joined Penn State in 2015
BabyNamesFull %>%
    filter(name %in% beckmans, year == 2015) %>%
    group_by(name) %>%
    summarise(total = sum(count, na.rm = TRUE)) %>% 
    arrange(desc(total))
## `summarise()` ungrouping output (override with `.groups` argument)
BabyNamesFull %>%
    filter(name %in% beckmans, year == 2015) %>%
    group_by(name, sex) %>%
    summarise(total = sum(count, na.rm = TRUE)) %>% 
    arrange(name)
## `summarise()` regrouping output by 'name' (override with `.groups` argument)
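The `summarise()` messages above are informational: they report how the result is grouped after summarising. One way to make that choice explicit, sketched here on made-up counts (the tibble `toy` is hypothetical, not drawn from `BabyNamesFull`), is to set the `.groups` argument that the message itself mentions:

```r
library(dplyr)

# Made-up counts for illustration only.
toy <- tibble(
  name  = c("Eden", "Eden", "Jack", "Jack"),
  sex   = c("F", "M", "M", "M"),
  count = c(5, 3, 20, 22)
)

toy_totals <- toy %>%
  group_by(name, sex) %>%
  summarise(total = sum(count, na.rm = TRUE),
            .groups = "drop") %>%   # return an ungrouped tibble, no message
  arrange(name)

toy_totals
```

Using `.groups = "drop"` is a design choice rather than a requirement; the solutions above rely on the default behavior, which leaves the two-variable summary grouped by `name`.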

Part 5. Graph it

[coming up next…]