Data Science for Statisticians

Published: 2025-08-12

Welcome to ST 558 - Data Science for Statisticians!

In this course, we’ll look at common tasks done by data scientists.

We’ll use the R programming language for these tasks and learn to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, under version control, and easy to share with collaborators.

Course Learning Outcomes

At the end of this course, students will be able to:

  • explain the steps and purpose of programs
  • efficiently read in, combine, and manipulate data
  • utilize help and other resources to customize programs
  • write programs using good programming practices
  • explore data and perform common analyses
  • create reports, web pages, and dashboards to display and communicate results

Weekly To-do List

Generally speaking, each week will have a few videos to watch and readings to do, along with corresponding homework assignments. We’ll also have some projects and exams. Please see the syllabus on Moodle for homework policies, project information, and exam information.

Getting Help!

There are several ways to get help with the course:

  • Discussion Forum on Moodle - Use this for any question you are comfortable asking in front of the class. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
  • E-mail - If you have a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I check e-mail daily during the regular work week.
  • Zoom Office Hour Sessions - These sessions support screen sharing and multiple participants, and you can use text chat, voice, and video. They are great for a class like this! See Moodle for the office hour times and links.

Fall 2025 Course Schedule

| Topic/Week | Learning Materials | Assignments |
|---|---|---|
| Week 1, 8/18-8/22, M-F | Welcome to the Course; 01 - What is Data Science?; 02 - Workflows and Git/GitHub Basics; 03 - Git and GitHub Practice; 04 - R Basics; 05 - R Projects and Connecting with GitHub; 06 - Quarto Basics | HW 1 due W, 8/27 |
| Week 2, 8/25-8/29, M-F | Week 2 Overview; 07 - Base R Data Structures: Vectors; 08 - Base R Data Structures: Matrices; 09 - Base R Data Structures: Data Frames; 10 - Base R Data Structures: Lists | HW 2 due W, 9/3 |
| Week 3, 9/2-9/5, T-F (Off M) | Week 3 Overview; 11 - Control Flow: Logicals & if/then/else; 12 - Control Flow: Loops; 13 - Control Flow: Vectorized Functions; 14 - Writing Functions | HW 3 due W, 9/10 |
| Week 4, 9/8-9/12, M-F | Week 4 Overview; 15 - Packages; 16 - tidyverse Essentials; 17 - Reading Delimited Data; 18 - Reading Excel Data; 19 - Manipulating Data with dplyr | HW 4 due W, 9/17 |
| Week 5, 9/15-9/19, M, W-F (Off T) | Week 5 Overview; 20 - Manipulating Data with tidyr; 21 - Databases and Basic SQL; 22 - SQL Joins; 23 - Querying APIs | Project 1 due W, 10/1 |
| Week 6, 9/22-9/26, M-F | No new material. Exam study time and project work time! | Exam 1 - T-Th, 9/23-9/25 |
| Week 7, 9/29-10/3, M-F | Weeks 6 & 7 Overview; 24 - EDA Concepts; 25 - Creating Contingency Tables; 26 - Barplots and ggplot2 Basics; 27 - Numerical Variable Summaries; 28 - Numerical Variable Graphs | HW 5 due W, 10/8 |
| Week 8, 10/6-10/10, M-F | Week 8 Overview; 29 - Recap & Direction!; 30 - apply Family of Functions; 31 - purrr and List Columns; 32 - Advanced Function Writing | HW 6 due W, 10/15 |
| Week 9, 10/15-10/17, Off M/T | Weeks 9 Through 11 Overview; 33 - Introduction to RShiny; 34 - Tutorials Part I; 35 - Connecting the UI and Server; 36 - Tutorials Part II; 37 - Reactivity; 38 - Tutorials Part III | HW 7 due W, 10/22 |
| Week 10, 10/20-10/24, M-F | 39 - Dynamic User Interfaces; 40 - Flexible UI Layouts & Dashboards; 41 - Sharing Apps; 42 - Debugging & Useful Things; 43 - Control Reactivity with isolate() | Project 2 due W, 11/5 |
| Week 11, 10/27-10/31 | No new material. Project work time! | |
| Week 12, 11/3-11/7, M-F | Weeks 12 & 13 Overview; 44 - Modeling Concepts; 45 - Prediction & Training/Test Sets; 46 - Cross Validation; 47 - Multiple Linear Regression; 48 - Modeling with tidymodels; 49 - tidymodels Tutorial | HW 8 due W, 11/12 |
| Week 13, 11/10-11/14, M-F | 50 - LASSO Models; 51 - Modeling Recap; 52 - Logistic Regression Models; 53 - Regression & Classification Trees; 54 - Ensemble Trees | HW 9 due W, 11/19 |
| Week 14, 11/17-11/21, M-F | Weeks 14 Through 16 Overview; 55 - Creating an API in R; 56 - Docker Basics; 57 - Building a Docker Image; 58 - Dockerizing a Shiny App | Exam 2 - W-F, 11/19-11/21 |
| Week 15-16, 11/24-11/25 and 12/1-12/2, M-T, M-T | No new material. Final project work time! | Final Project due Th, 12/4 |