Data Science for Statisticians

Published: 2025-05-09

Welcome to ST 558 - Data Science for Statisticians!

In this course we’ll look at common tasks done by data scientists.

We’ll use the R programming language to carry out these tasks. We’ll also learn how to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, is under version control, and can easily include collaborators.

Course Learning Outcomes

By the end of this course, students will be able to:

  • explain the steps and purpose of programs
  • efficiently read in, combine, and manipulate data
  • utilize help and other resources to customize programs
  • write programs using good programming practices
  • explore data and perform common analyses
  • create reports, web pages, and dashboards to display and communicate results

Weekly To-do List

Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies, project information, and exam information.

Getting Help!

There are several ways to get help with the course:

  • Discussion Forum on Moodle - Use this for any question you’re comfortable asking where others can see it. The TA, other students, and I will answer questions on this board. This is the fastest way to get a response!
  • E-mail - If you have a question you don’t feel comfortable asking in front of the whole class, you can use e-mail. The TA and I check e-mail daily during the regular work week.
  • Zoom Office Hour Sessions - These sessions support screen sharing and multiple participants, with text chat, voice, and video. They are great for a class like this!

Summer 2025 Course Schedule

Topic/Week | Learning Materials | Assignments | Code-alongs

Week 1

5/14-5/16

W-F

00 - Watch - Welcome to the Course
01 - Read - What is Data Science?
02 - Watch - Workflows & Git/GitHub Basics
03 - Read - Git & GitHub Practice
04 - Watch - R Basics
05 - Read & Watch - R projects and Connecting with Github
06 - Read & Watch - Quarto

HW 1 due Tu, 5/20
Code-alongs (optional attendance) on Thursdays

Week 2

5/19-5/23

M-F

07 - Base R Data Structures: Vectors
08 - Base R Data Structures: Matrices
09 - Base R Data Structures: Data Frames
10 - Base R Data Structures: Lists

11 - Control Flow: Logicals & if/then/else
12 - Control Flow: Loops
13 - Control Flow: Vectorized Functions
14 - Writing Functions

HW 2 due Tu, 5/27

Week 3

5/27-5/30

T-F

15 - Packages
16 - Tidyverse Essentials
17 - Reading Delimited Data
18 - Reading Excel Data
19 - Manipulating Data with dplyr

20 - Manipulating Data with tidyr
21 - Connecting to Databases
22 - SQL Style Joins

HW 3 due Tu, 6/3

Week 4

6/2-6/6

M-F

23 - EDA Concepts
24 - Summarizing Categorical Variables
25 - Barplots & ggplot2 Basics
26 - Numerical Variable Summaries
27 - Numerical Variable Graphs & More ggplot2
Project 1 due Tu, 6/17

Week 5

6/9-6/13

M-F

No new material
Exam 1 (Wednesday or Thursday)

Week 6

6/16-6/18, 6/20

M-W, F

28 - Big Recap
29 - Apply Family of Functions
30 - purrr & List Columns
31 - Advanced Function Writing

32 - Querying APIs & Dealing with JSON Data

HW 4 due Tu, 6/24

Week 7

6/23-6/27

M-F

33 - R Shiny Basics & UI
34 - R Shiny Server
35 - Dynamic UI
36 - Deploying, Debugging, & Other Useful Stuff

Project 2 due M, 7/7

Week 8

6/30-7/3

M-Th

37 - Simple Linear Regression
38 - Multiple Linear Regression
39 - Choosing Regression Models
40 - Candidate Model Selection
41 - Basic Logistic Regression
42 - Extended Logistic Regression

43 - Generalized Linear Models
44 - Basic Use of the caret Package
45 - Cross Validation

Week 9

M-F

7/7-7/11

46 - k Nearest Neighbors
47 - Regression and Classification Trees
48 - Fitting Regression Trees
49 - Fitting Classification Trees
50 - Bagged Trees
51 - Random Forests

52 - Boosted Trees
53 - Model Fitting Using the caret Package
54 - Principal Components
55 - Clustering

HW 5 due Tu, 7/15

Week 10

M-F

7/14-7/18

56 - Creating an API
57 - Installing Docker
58 - Docker Containers
59 - Dockerizing Shiny Apps
Exam 2 (Wednesday or Thursday)

Week 11

7/21-7/25

No new material. Project work time!