Data Science for Statisticians

Published

2025-06-02

Welcome to ST 558 - Data Science for Statisticians!

In this course we’ll look at common tasks done by data scientists.

We’ll use the R programming language to do so and learn to use Quarto, Git, and GitHub to ensure our data analysis workflow is reproducible, has version control, and can easily include collaborators.

Course Learning Outcomes

At the end of this course, students will be able to:

  • explain the steps and purpose of programs
  • efficiently read in, combine, and manipulate data
  • utilize help and other resources to customize programs
  • write programs using good programming practices
  • explore data and perform common analyses
  • create reports, web pages, and dashboards to display and communicate results

Weekly To-do List

Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies, project information, and exam information.

Getting Help!

To obtain course help there are a number of options:

  • Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
  • E-mail - If there is a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will check e-mail daily during the regular work week.
  • Zoom Office Hour Sessions - These sessions can be used to share screens and have multiple users. You can do text chat, voice, and video. They are great for a class like this!

Summer 2025 Course Schedule

Topic/Week

Week 1

5/14-5/16

W-F

Learning Materials

01 - Read - What is Data Science?
02 - Watch - Workflows & Git/GitHub Basics
03 - Read - Git & GitHub Practice
04 - Watch - R Basics
05 - Read & Watch - R Projects and Connecting with GitHub
06 - Read & Watch - Quarto

Assignments

HW 1 due Tu, 5/20

Code-alongs

Code-alongs (optional attendance) on Thursdays

Week 2

5/19-5/23

M-F

07 - Base R Data Structures: Vectors
08 - Base R Data Structures: Matrices
09 - Base R Data Structures: Data Frames
10 - Base R Data Structures: Lists

11 - Control Flow: Logicals & if/then/else
12 - Control Flow: Loops
13 - Control Flow: Vectorized Functions
14 - Writing Functions

HW 2 due Tu, 5/27

Week 3

5/27-5/30

Tu-F

15 - Packages
16 - Tidyverse Essentials
17 - Reading Delimited Data
18 - Reading Excel Data
19 - Manipulating Data with dplyr

20 - Manipulating Data with tidyr
21 - Connecting to Databases
22 - SQL Style Joins

HW 3 due Tu, 6/3

Week 4

6/2-6/6

M-F

23 - EDA Concepts
24 - Summarizing Categorical Variables
25 - Barplots & ggplot2 Basics
26 - Numerical Variable Summaries
27 - Numerical Variable Graphs & More ggplot2

Project 1 due Tu, 6/17

Week 5

6/9-6/13

M-F

No new material.

Exam 1 (Wednesday or Thursday)

Week 6

M-W, F

6/16-6/18, 6/20

28 - Big Recap
29 - Apply Family of Functions
30 - purrr & List Columns
31 - Advanced Function Writing

32 - Querying APIs & Dealing with JSON Data

HW 4 due Tu, 6/24

Week 7

M-F

6/23-6/27

33 - R Shiny Basics & UI

34 - R Shiny Server
35 - Dynamic UI
36 - Deploying, Debugging, & Other Useful Stuff

Project 2 due Tu, 7/8

Week 8

M-Th

6/30-7/3

37 - Modeling Concepts
38 - Prediction & Training/Test Sets
39 - Cross Validation
40 - Multiple Linear Regression
41 - Modeling with tidymodels (caret package)
42 - tidymodels Tutorial

43 - LASSO Models
44 - Models recap

Week 9

M-F

7/7-7/11

45 - Logistic Regression
46 - Regression and Classification Trees
47 - Ensemble Trees (Random Forest; Bagged Trees)
48 - Fitting Classification Trees

HW 5 due Tu, 7/15

Week 10

M-F

7/14-7/18

49 - Creating an API
50 - Docker Basics
51 - Building a Docker Image
52 - Dockerizing Shiny Apps

Exam 2 (Wednesday or Thursday)

Final Project due Tu, 7/29

Week 11

7/21-7/25

No new material. Project work time!

Week 12

M-Tu

7/28-7/29

Final Examinations (Project due 7/29)