Week 3 Overview
This wraps up the content for week 2. Now we need some practice! Head back to our Moodle site to check out your homework assignment for this week.
We are now ready to really look at bringing data in and how to handle that data more seamlessly. We’ll look at working within the tidyverse. This is a suite of packages that all work together and allow you to read in data and do most common data manipulations.
This week we’ll also see how we can use the tidyr package to change the format of a dataset (long to wide), see how we can connect R to a database, and learn about SQL-style joins.
Week 3 Additional Readings/Learning Materials
Manipulating Data
- Chapters 3, 4, and 5 of R for Data Science
- (Optional) R Packages book
- (Optional) List of CRAN approved packages
- (Optional) List of useful R packages
Reading Data
- Chapter 7 of R for Data Science
- (Optional) SQL syntax
SQL Joins
Reading Data
read delimited data, SAS data files, SPSS files, and other file types into R (CO 2)
- describe the term delimiter
- read comma separated value files into R using the readr package
- explain how the read_ functions determine column types
- describe the readxl package and its functions
- compare and contrast tibbles and data frames
write a stored data set to a file using different delimiters (CO 2)
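To make the reading and writing objectives above concrete, here is a minimal sketch using the readr package. The file contents and the object names (csv_path, scores, and so on) are made up for illustration and are not taken from the course materials.

```r
library(readr)

# Hypothetical data, written to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ana,90.5", "2,Ben,85", "3,Cy,77.25"), csv_path)

# read_csv() guesses each column's type from the values it reads
scores <- read_csv(csv_path, show_col_types = FALSE)
class(scores)  # a tibble is a data frame with additional classes
spec(scores)   # the column specification readr guessed

# Column types can also be stated explicitly instead of guessed
scores_chr <- read_csv(
  csv_path,
  col_types = cols(id = col_character(), name = col_character(), score = col_double())
)

# Write the same data back out using a tab as the delimiter
tsv_path <- tempfile(fileext = ".tsv")
write_delim(scores, tsv_path, delim = "\t")
header_line <- readLines(tsv_path, n = 1)
```

A delimiter is just the character that separates values on a line, so the same data set can be stored with commas, tabs, or another character by changing the delim argument.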
Manipulating Data
use logical statements and indexing vectors to subset common data objects using functions such as [, subset, or dplyr::filter (CO 2, 3, 4)
list favorable things to look for in an R package (CO 2, 3)
- describe the general purpose of the tidyverse package
- outline the difference between require and library
- explicitly use functions from a particular package using the :: operator
- discuss the idea of masking of R functions and objects
describe the uses of and program with functions from the dplyr package (CO 2, 3, 4)
- explain the benefits of using the dplyr package over base R methods
- program with the arrange, filter, select, and rename functions from the dplyr package
- optimize selecting variables from a data frame using the select function’s options (such as starts_with)
- combine functions in the dplyr package to subset and summarize a data set in R
- describe the uses of and program with the mutate, group_by, and summarise functions in the dplyr R package
program using the chain of commands or chaining/piping operators (CO 1, 4)
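As a rough illustration of the objectives above, the sketch below subsets the built-in mtcars data three ways and then chains several dplyr functions together. The particular columns and summaries are our own choices for the example, and the %>% pipe is assumed to be the chaining operator meant here.

```r
library(dplyr)

# Three ways to keep only the 4-cylinder cars
four_cyl_base   <- mtcars[mtcars$cyl == 4, ]         # logical indexing with [
four_cyl_subset <- subset(mtcars, cyl == 4)          # base R subset()
four_cyl_dplyr  <- dplyr::filter(mtcars, cyl == 4)   # :: names the package explicitly,
                                                     # since dplyr masks stats::filter

# Chain several dplyr verbs with the pipe
cars_summary <- tibble::as_tibble(mtcars, rownames = "model") %>%
  filter(cyl %in% c(4, 6)) %>%
  select(model, cyl, mpg, hp, wt, starts_with("d")) %>%
  rename(displacement = disp) %>%
  mutate(hp_per_wt = hp / wt) %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg), mean_hp_per_wt = mean(hp_per_wt)) %>%
  arrange(desc(mean_mpg))

cars_summary
```

Each step takes the result of the previous one as its first argument, so the chain reads top to bottom in the order the operations happen.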
Other ways to connect R to data
explain the general process of connecting R to a database, connect R to a database, and request data (CO 2)
- define the terms SQL and RDBMS
- compare terminology between statistics and SQL (tables vs data sets, etc.)
- extract SQL code from dplyr commands
- write very basic SQL code to select and merge data
- describe why the collect function is required when using R to query a database
- determine the appropriate type of join to extract information of interest from given tables
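The database objectives above can be sketched with a small in-memory SQLite database. This assumes the DBI, RSQLite, and dbplyr packages are installed; the students and grades tables are hypothetical and exist only for this example.

```r
library(DBI)
library(dplyr)

# Connect to a temporary in-memory database and add two small tables
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "students", data.frame(id = 1:3, name = c("Ana", "Ben", "Cy")))
dbWriteTable(con, "grades", data.frame(id = c(1L, 2L, 4L), grade = c(90, 85, 70)))

# tbl() creates a reference to a table; no rows are pulled into R yet
students <- tbl(con, "students")
grades   <- tbl(con, "grades")

# dplyr verbs on these references are translated to SQL
query <- inner_join(students, grades, by = "id") %>%
  filter(grade > 80)
show_query(query)          # print the SQL that dplyr generated

# collect() runs the query and returns the result as a tibble in R
result <- collect(query)

# The same kind of request written directly as basic SQL, here with a left join
left_joined <- dbGetQuery(
  con,
  "SELECT s.name, g.grade FROM students s LEFT JOIN grades g ON s.id = g.id"
)

dbDisconnect(con)
```

The inner join keeps only students with a matching grade row, while the left join keeps every student and fills in a missing grade where there is no match.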
query APIs to return appropriate data (CO 2)
- define the term API
- explain the common syntax often used for APIs
Other Data Manipulations
- utilize the tidyr package to manipulate data (CO 2)
a. change data between wide and long formats
b. split or combine columns using the tidyr package
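A short sketch of the two tidyr tasks above, using a made-up data frame of test scores. The column names and values are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per test
wide <- data.frame(
  student = c("Ana Lopez", "Ben Wu"),
  test1 = c(90, 85),
  test2 = c(88, 79)
)

# Wide to long: one row per student-test combination
long <- pivot_longer(wide, cols = c(test1, test2),
                     names_to = "test", values_to = "score")

# Long back to wide
back <- pivot_wider(long, names_from = test, values_from = score)

# Split one column into two, then combine them again
split_names <- separate(wide, student, into = c("first", "last"), sep = " ")
joined      <- unite(split_names, "student", first, last, sep = " ")
```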
Use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!