Data Science for Statisticians

Published: 2026-05-01

Welcome to ST 558 - Data Science for Statisticians!

In this course we’ll look at common tasks done by data scientists.

We’ll use the R programming language to do so, and we’ll learn to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, version controlled, and easy to share with collaborators.

Course Learning Outcomes

At the end of this course, students will be able to:

  • explain the steps and purpose of programs (CO 1)
  • efficiently read in, combine, and manipulate data (CO 2)
  • utilize help and other resources to customize programs (CO 3)
  • write programs using good programming practices (CO 4)
  • explore data and perform common analyses (CO 5)
  • create reports, web pages, and dashboards to display and communicate results (CO 6)

Weekly To-do List

Generally speaking, each week will have a few videos to watch, readings to do, and homework to practice the material. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies and project/exam information.

Getting Help!

There are several ways to get help with the course:

  • Slack - Use this for any question you feel comfortable asking in front of others. The TA, other students, and I will answer questions on Slack. This will be the fastest way to receive a response! (See the Moodle page for how to join the space.)
  • E-mail - If you have a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will check e-mail daily during the regular work week.
  • Zoom Office Hour Sessions - These sessions support screen sharing with multiple participants, along with text chat, voice, and video. They are great for a class like this!

Summer 2026 Course Schedule

Week | Learning Materials | Assignments
Week 1
W-F
5/20-5/22
01 - Read - What is Data Science?
02 - Watch - Workflows & Git/GitHub Basics
03 - Read - Git & GitHub Practice
04 - Watch - R Basics
05 - Read & Watch - R Projects and Connecting with GitHub
06 - Read & Watch - Quarto
HW 1 due T, 5/26
Week 2
T-F
5/26-5/29
07 - Base R Data Structures: Vectors
08 - Base R Data Structures: Matrices
09 - Base R Data Structures: Data Frames
10 - Base R Data Structures: Lists
11 - Control Flow: Logicals & if/then/else
12 - Control Flow: Loops
13 - Control Flow: Vectorized Functions
14 - Writing Functions
HW 2 due M, 6/1
Week 3
M-F
6/1-6/5
15 - Packages
16 - Tidyverse Essentials
17 - Reading Delimited Data
18 - Reading Excel Data
19 - Manipulating Data with dplyr
20 - Manipulating Data with tidyr
21 - Databases and Basic SQL
22 - SQL Joins
HW 3 due M, 6/8
Week 4
M-F
6/8-6/12
23 - Querying APIs
24 - EDA Concepts
25 - Summarizing Categorical Variables
26 - Barplots & ggplot2 Basics
27 - Numerical Variable Summaries
28 - Numerical Variable Graphs & More ggplot2
HW 4 due M, 6/15
Week 5
M-Th
6/15-6/18
No new material
Project 1 due M, 6/22
Week 6
M-W
6/22-6/24
29 - Recap & Direction!
30 - apply Family of Functions
31 - purrr & List Columns
32 - Advanced Function Writing
33 - Introduction to RShiny
34 - Tutorials Part I
Exam Window T-W, 6/23-6/24
HW 5 due W, 7/1
Week 7
M-Th
6/29-7/2
35 - Connecting the UI and Server
36 - Tutorials Part II
37 - Reactivity
38 - Tutorials Part III
39 - Dynamic User Interfaces
40 - Flexible UI Layouts & Dashboards
41 - Sharing Apps
42 - Debugging & Useful Things
43 - Control Reactivity with isolate()
HW 6 due M, 7/6
Week 8
M-F
7/6-7/10
No new material
Project 2 due M, 7/13
Week 9
M-F
7/13-7/17
44 - Modeling Concepts
45 - Prediction & Training/Test Sets
46 - Cross Validation
47 - Multiple Linear Regression
48 - Modeling with tidymodels
49 - tidymodels Tutorial
HW 7 due M, 7/20
Week 10
M-F
7/20-7/24
50 - LASSO Models
51 - Modeling Recap
52 - Logistic Regression Models
53 - Regression & Classification Trees
54 - Ensemble Trees
HW 8 due M, 7/27
Week 11
M-F
7/27-7/31
55 - Creating an API in R
56 - Docker Basics
57 - Building a Docker Image
58 - Dockerizing a Shiny App
Final Project due T, 8/4