Data Science for Statisticians
Welcome to ST 558 - Data Science for Statisticians!
In this course we’ll look at common tasks done by data scientists:
- Reading in raw data and manipulating it
- Combining data sources
- Summarizing data to glean insights
- Applying common analysis methods
- Communicating effectively
We’ll adopt the R programming language to do so and learn about using Quarto, Git, and GitHub to ensure our data analysis workflow is reproducible, has version control, and can easily include collaborators.
Course Learning Outcomes
At the end of this course, students will be able to:
- explain the steps and purpose of programs
- efficiently read in, combine, and manipulate data
- utilize help and other resources to customize programs
- write programs using good programming practices
- explore data and perform common analyses
- create reports, web pages, and dashboards to display and communicate results
Weekly To-do List
Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies, project information, and exam information.
Getting Help!
To obtain course help there are a number of options:
- Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
- E-mail - If there is a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will check e-mail daily (during the regular work week).
- Zoom Office Hour Sessions - These sessions let us share screens and include multiple participants. You can use text chat, voice, and video. They are great for a class like this!
Summer 2025 Course Schedule
Code-alongs (optional attendance) are held on Thursdays.

| Week | Dates | Learning Materials | Assignments |
|---|---|---|---|
| Week 1 | 5/14-5/16 (W-F) | 01 - Read - What is Data Science? | HW 1 due Tu, 5/20 |
| Week 2 | 5/19-5/23 (M-F) | 07 - Base R Data Structures: Vectors, 11 - Control Flow: Logicals & if/then/else | HW 2 due Tu, 5/27 |
| Week 3 | 5/27-5/30 (T-F) | 15 - Packages, 20 - Manipulating Data with tidyr | HW 3 due Tu, 6/3 |
| Week 4 | 6/2-6/6 (M-F) | 23 - EDA Concepts, 24 - Summarizing Categorical Variables, 25 - Barplots & ggplot2 Basics, 26 - Numerical Variable Summaries, 27 - Numerical Variable Graphs & More ggplot2 | Project 1 due Tu, 6/17 |
| Week 5 | 6/9-6/13 (M-F) | No new material | Exam 1 (Wednesday or Thursday) |
| Week 6 | 6/16-6/18, 6/20 (M-W, F) | 28 - Big Recap, 32 - Querying APIs & Dealing with JSON Data | HW 4 due Tu, 6/24 |
| Week 7 | 6/23-6/27 (M-F) | 33 - R Shiny Basics & UI, 34 - R Shiny Server | Project 2 due Tu, 7/8 |
| Week 8 | 6/30-7/3 (M-Th) | 37 - Modeling Concepts, 43 - LASSO Models | |
| Week 9 | 7/7-7/11 (M-F) | 45 - Logistic Regression, 46 - Regression and Classification Trees, 47 - Ensemble Trees (Random Forest; Bagged Trees), 48 - Fitting Classification Trees | HW 5 due Tu, 7/15 |
| Week 10 | 7/14-7/18 (M-F) | 49 - Creating an API, 50 - Docker Basics, 51 - Building a Docker Image, 52 - Dockerizing Shiny Apps | Exam 2 (Wednesday or Thursday), Final project due 7/29 |
| Week 11 | 7/21-7/25 | No new material. Project work time! | |
| Week 12 | 7/28-7/29 (M, Tu) | Final examinations | Final project due 7/29 |