Exercises 4 - Reading Data Solutions

Files corresponding to Short Course: Introduction to Data Science Using R

4.1 Read data contained in an R package.

  1. Load the dcData and tidyverse packages, then load the BabyNames data set with data().
# If dcData is not installed yet, run:
# devtools::install_github("mdbeckman/dcData")

library(tidyverse)  # this actually loads a group of packages all at once
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.4     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(dcData)
data(BabyNames)
# or, naming the package explicitly
data("BabyNames", package = "dcData")

4.2 Read a .csv file from the web

The file “BabyNamesSupp.csv” contains a few more recent years of data to augment the BabyNames data. The file can be read directly into R from the link here:

https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv

Read this file in using read_csv() from the readr package. Save the data as an object called BabyNamesSupp.

Note: Reading the data will produce a warning message! Read the warning message carefully; what seems to have gone wrong?

BabyNamesSupp <- read_csv("https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   name = col_character(),
##   sex = col_logical(),
##   count = col_double(),
##   year = col_double()
## )

## Warning: 84619 parsing failures.
##   row col           expected actual                                                                                                           file
## 19208 sex 1/0/T/F/TRUE/FALSE      M 'https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv'
## 19209 sex 1/0/T/F/TRUE/FALSE      M 'https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv'
## 19210 sex 1/0/T/F/TRUE/FALSE      M 'https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv'
## 19211 sex 1/0/T/F/TRUE/FALSE      M 'https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv'
## 19212 sex 1/0/T/F/TRUE/FALSE      M 'https://github.com/jbpost2/Basics-of-R-for-Data-Science-and-Statistics/raw/master/datasets/BabyNamesSupp.csv'
## ..... ... .................. ...... ..............................................................................................................
## See problems(...) for more details.
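
The warning answers the question in the note: read_csv() guessed that sex is a logical column (col_logical() in the column specification), because the rows it used for guessing contained only "F", which readr accepts as FALSE. Every "M", starting at row 19208, then failed to parse and was read as NA. One possible fix is to state the column types explicitly rather than rely on guessing. The sketch below uses a small made-up CSV string (toy_csv and its values are hypothetical, not taken from the real file) so that it runs without a network connection.

```r
library(readr)

# Hypothetical miniature of the real file: the first rows are all "F".
toy_csv <- "name,sex,count,year\nAva,F,10,2014\nMia,F,20,2014\nLiam,M,30,2014\n"

# Limiting guessing to the first two rows mimics the problem: readr sees
# only "F" and is likely to guess a logical column, so "M" cannot be parsed.
guessed <- read_csv(toy_csv, guess_max = 2)

# Stating the types up front removes the guesswork.
fixed <- read_csv(
  toy_csv,
  col_types = cols(
    name  = col_character(),
    sex   = col_character(),  # keep "F"/"M" as text
    count = col_double(),
    year  = col_double()
  )
)

fixed$sex          # character vector containing both "F" and "M"
problems(fixed)    # no parsing failures recorded
```

For the real file, the same col_types argument can be passed to the read_csv() call that takes the URL; problems(BabyNamesSupp) should then return zero rows.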