Data Science for Statisticians
Welcome to ST 558 - Data Science for Statisticians!
In this course, we’ll look at common tasks performed by data scientists:
- Reading in raw data and manipulating it
- Combining data sources
- Summarizing data to glean insights
- Applying common analysis methods
- Communicating effectively
We’ll adopt the R programming language to do so and learn about using Quarto, Git, and GitHub to ensure our data analysis workflow is reproducible, has version control, and can easily include collaborators.
Course Learning Outcomes
At the end of this course, students will be able to:
- explain the steps and purpose of programs
- efficiently read in, combine, and manipulate data
- utilize help and other resources to customize programs
- write programs using good programming practices
- explore data and perform common analyses
- create reports, web pages, and dashboards to display and communicate results
Weekly To-do List
Generally speaking, each week will have a few videos to watch and readings to do, as well as corresponding homework assignments. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies, project information, and exam information.
Getting Help!
There are several ways to get help with the course:
- Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
- E-mail - If there is a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will be checking daily (during the regular work week).
- Zoom Office Hour Sessions - These sessions support screen sharing and multiple participants. You can use text chat, voice, and video. They are great for a class like this! See Moodle for the office hour times and links.
Fall 2025 Course Schedule
Topic/Week | Learning Materials | Assignments |
---|---|---|
Week 1 8/18-8/22 M-F | Welcome to the Course; 01 - What is Data Science?; 02 - Workflows and Git/GitHub Basics; 03 - Git and GitHub Practice; 04 - R Basics; 05 - R Projects and Connecting with GitHub; 06 - Quarto Basics | HW 1 due W, 8/27 |
Week 2 8/25-8/29 M-F | Week 2 Overview; 07 - Base R Data Structures: Vectors; 08 - Base R Data Structures: Matrices; 09 - Base R Data Structures: Data Frames; 10 - Base R Data Structures: Lists | HW 2 due W, 9/3 |
Week 3 9/2-9/5 T-F (Off M) | Week 3 Overview; 11 - Control Flow: Logicals & if/then/else; 12 - Control Flow: Loops; 13 - Control Flow: Vectorized Functions; 14 - Writing Functions | HW 3 due W, 9/10 |
Week 4 9/8-9/12 M-F | Week 4 Overview; 15 - Packages; 16 - tidyverse Essentials; 17 - Reading Delimited Data; 18 - Reading Excel Data; 19 - Manipulating Data with dplyr | HW 4 due W, 9/17 |
Week 5 9/15-9/19 M, W-F (Off T) | Week 5 Overview; 20 - Manipulating Data with tidyr; 21 - Databases and Basic SQL; 22 - SQL Joins; 23 - Querying APIs | Project 1 due W, 10/1 |
Week 6 9/22-9/26 M-F | No new material. Exam study time and project work time! | Exam 1 - T-Th, 9/23-9/25 |
Week 7 9/29-10/3 M-F | Weeks 6 & 7 Overview; 24 - EDA Concepts; 25 - Creating Contingency Tables; 26 - Barplots and ggplot2 Basics; 27 - Numerical Variable Summaries; 28 - Numerical Variable Graphs | HW 5 due W, 10/8 |
Week 8 10/6-10/10 M-F | Week 8 Overview; 29 - Recap & Direction!; 30 - apply Family of Functions; 31 - purrr and List Columns; 32 - Advanced Function Writing | HW 6 due W, 10/15 |
Week 9 10/15-10/17 W-F (Off M/T) | Weeks 9 Through 11 Overview; 33 - Introduction to RShiny; 34 - Tutorials Part I; 35 - Connecting the UI and Server; 36 - Tutorials Part II; 37 - Reactivity; 38 - Tutorials Part III | HW 7 due W, 10/22 |
Week 10 10/20-10/24 M-F | 39 - Dynamic User Interfaces; 40 - Flexible UI Layouts & Dashboards; 41 - Sharing Apps; 42 - Debugging & Useful Things; 43 - Control Reactivity with isolate() | Project 2 due W, 11/5 |
Week 11 10/27-10/31 | No new material. Project work time! | |
Week 12 11/3-11/7 M-F | Weeks 12 & 13 Overview; 44 - Modeling Concepts; 45 - Prediction & Training/Test Sets; 46 - Cross Validation; 47 - Multiple Linear Regression; 48 - Modeling with tidymodels; 49 - tidymodels Tutorial | HW 8 due W, 11/12 |
Week 13 11/10-11/14 M-F | 50 - LASSO Models; 51 - Modeling Recap; 52 - Logistic Regression Models; 53 - Regression & Classification Trees; 54 - Ensemble Trees | HW 9 due W, 11/19 |
Week 14 11/17-11/21 M-F | Weeks 14 Through 16 Overview; 55 - Creating an API in R; 56 - Docker Basics; 57 - Building a Docker Image; 58 - Dockerizing a Shiny App | Exam 2 - W-F, 11/19-11/21 |
Weeks 15-16 11/24-11/25, 12/1-12/2 M-T, M-T | No new material. Final project work time! | Final Project due Th, 12/4 |