Data Science for Statisticians
Welcome to ST 558 - Data Science for Statisticians!
In this course we’ll look at common tasks done by data scientists:
- Reading in raw data and manipulating it
- Combining data sources
- Summarizing data to glean insights
- Applying common analysis methods
- Collaborating and communicating effectively
We’ll use the R programming language for these tasks and learn to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, version controlled, and easy to share with collaborators.
Course Learning Outcomes
At the end of this course, students will be able to:
- explain the steps and purpose of programs (CO 1)
- efficiently read in, combine, and manipulate data (CO 2)
- utilize help and other resources to customize programs (CO 3)
- write programs using good programming practices (CO 4)
- explore data and perform common analyses (CO 5)
- create reports, web pages, and dashboards to display and communicate results (CO 6)
Weekly To-do List
Generally speaking, each week will have a few videos to watch, readings to do, and homework to practice the material. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies and project/exam information.
Getting Help!
To obtain course help there are a number of options:
- Slack - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on Slack. This will be the fastest way to receive a response! (See the Moodle page for how to join the space.)
- E-mail - If there is a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will check e-mail daily during the regular work week.
- Zoom Office Hour Sessions - These sessions support screen sharing and multiple participants, with text chat, voice, and video. They are great for a class like this!
Summer 2026 Course Schedule
| Week | Learning Materials | Assignments |
|---|---|---|
| Week 1 W-F 5/20-5/22 | 01 - Read - What is Data Science?; 02 - Watch - Workflows & Git/GitHub Basics; 03 - Read - Git & GitHub Practice; 04 - Watch - R Basics; 05 - Read & Watch - R projects and Connecting with GitHub; 06 - Read & Watch - Quarto | HW 1 due T, 5/26 |
| Week 2 T-F 5/26-5/29 | 07 - Base R Data Structures: Vectors; 08 - Base R Data Structures: Matrices; 09 - Base R Data Structures: Data Frames; 10 - Base R Data Structures: Lists; 11 - Control Flow: Logicals & if/then/else; 12 - Control Flow: Loops; 13 - Control Flow: Vectorized Functions; 14 - Writing Functions | HW 2 due M, 6/1 |
| Week 3 M-F 6/1-6/5 | 15 - Packages; 16 - Tidyverse Essentials; 17 - Reading Delimited Data; 18 - Reading Excel Data; 19 - Manipulating Data with dplyr; 20 - Manipulating Data with tidyr; 21 - Databases and Basic SQL; 22 - SQL Joins | HW 3 due M, 6/8 |
| Week 4 M-F 6/8-6/12 | 23 - Querying APIs; 24 - EDA Concepts; 25 - Summarizing Categorical Variables; 26 - Barplots & ggplot2 Basics; 27 - Numerical Variable Summaries; 28 - Numerical Variable Graphs & More ggplot2 | HW 4 due M, 6/15 |
| Week 5 M-Th 6/15-6/18 | No new material | Project 1 due M, 6/22 |
| Week 6 M-W 6/22-6/24 | 29 - Recap & Direction!; 30 - apply Family of Functions; 31 - purrr & List Columns; 32 - Advanced Function Writing; 33 - Introduction to RShiny; 34 - Tutorials Part I | Exam Window T-W, 6/23-6/24; HW 5 due W, 7/1 |
| Week 7 M-Th 6/29-7/2 | 35 - Connecting the UI and Server; 36 - Tutorials Part II; 37 - Reactivity; 38 - Tutorials Part III; 39 - Dynamic User Interfaces; 40 - Flexible UI Layouts & Dashboards; 41 - Sharing Apps; 42 - Debugging & Useful Things; 43 - Control Reactivity with isolate() | HW 6 due M, 7/6 |
| Week 8 M-F 7/6-7/10 | No new material | Project 2 due M, 7/13 |
| Week 9 M-F 7/13-7/17 | 44 - Modeling Concepts; 45 - Prediction & Training/Test Sets; 46 - Cross Validation; 47 - Multiple Linear Regression; 48 - Modeling with tidymodels; 49 - tidymodels Tutorial | HW 7 due M, 7/20 |
| Week 10 M-F 7/20-7/24 | 50 - LASSO Models; 51 - Modeling Recap; 52 - Logistic Regression Models; 53 - Regression & Classification Trees; 54 - Ensemble Trees | HW 8 due M, 7/27 |
| Week 11 M-F 7/27-7/31 | 55 - Creating an API in R; 56 - Docker Basics; 57 - Building a Docker Image; 58 - Dockerizing a Shiny App | Final Project due T, 8/4 |