Week 3 Overview

Published

2025-06-02

This wraps up the content for week 2. Now it’s time for some practice! Head back to our Moodle site to check out your homework assignment for this week.

We are now ready to look closely at reading data into R and handling that data more seamlessly. We’ll work within the tidyverse, a suite of packages that all work together and allow you to read in data and do most common data manipulations.

This week we’ll also see how we can use the tidyr package to change the format of a dataset (long to wide), see how we can connect R to a database, and learn about SQL-style joins.
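As a small preview of two of these ideas, here is a minimal sketch of reshaping a dataset from long to wide with tidyr and then doing a SQL-style join with dplyr. The data, the object names (`scores_long`, `scores_wide`, `majors`), and the column names are made up for illustration and are not part of the course materials. The native pipe `|>` assumes R 4.1 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per student per quiz
scores_long <- tibble(
  student = c("A", "A", "B", "B"),
  quiz    = c("q1", "q2", "q1", "q2"),
  score   = c(90, 85, 70, 95)
)

# Long to wide: one row per student, one column per quiz
scores_wide <- scores_long |>
  pivot_wider(names_from = quiz, values_from = score)

# Hypothetical lookup table to join against
majors <- tibble(
  student = c("A", "B"),
  major   = c("Statistics", "Biology")
)

# SQL-style left join: keep every row of scores_wide and
# attach the matching columns from majors
scores_with_major <- left_join(scores_wide, majors, by = "student")
```

Printing `scores_with_major` should show one row per student, with the quiz scores spread across columns and the major attached. We will work through each of these functions in detail during the week.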

Week 3 Additional Readings/Learning Materials

Manipulating Data

Reading Data

SQL Joins

Reading Data

  1. read delimited data, SAS data files, SPSS files, and other file types into R (CO 2)

    1. describe the term delimiter
    2. read comma separated value files into R using the readr package
    3. explain how the read_ functions determine column types
    4. describe the readxl package and its functions
    5. compare and contrast tibbles and data frames
  2. write a stored data set to a file using different delimiters (CO 2)

Manipulating Data

  1. use logical statements and indexing vectors to subset common data objects using functions such as [, subset, or dplyr::filter (CO 2, 3, 4)

  2. list favorable things to look for in an R package (CO 2, 3)

    1. describe the general purpose of the tidyverse package
    2. outline the difference between require and library
    3. explicitly use functions from a particular package using the :: operator
    4. discuss the idea of masking of R functions and objects
  3. describe the uses of and program with functions from the dplyr package (CO 2, 3, 4)

    1. explain the benefits of using the dplyr package over base R methods
    2. program with the arrange, filter, select, and rename functions from the dplyr package
    3. optimize selecting variables from a data frame using the select function’s options (such as starts_with)
    4. describe the uses of and program with the mutate, group_by, and summarise functions in the dplyr R package
    5. combine functions in the dplyr package to subset and summarize a data set in R
  4. program using the chain of commands or chaining/piping operators (CO 1, 4)

Other ways to connect R to data

  1. explain the general process of connecting R to a database, connect R to a database, and request data (CO 2)

    1. define the terms SQL and RDBMS
    2. compare terminology between statistics and SQL (tables vs data sets, etc.)
    3. extract SQL code from dplyr commands
    4. write very basic SQL code to select and merge data
    5. describe why the collect function is required when using R to query a database
    6. determine the appropriate type of join to extract information of interest from given tables
  2. query APIs to return appropriate data (CO 2)

    1. define the term API
    2. explain the common syntax often used for APIs

Other Data Manipulations

  1. utilize the tidyr package to manipulate data (CO 2)
    1. change data between wide and long formats
    2. split or combine columns using the tidyr package

Use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!