Data Science for Statisticians
Welcome to ST 558 - Data Science for Statisticians!
In this course we’ll look at common tasks done by data scientists:
- Reading in raw data and manipulating it
- Combining data sources
- Summarizing data to glean insights
- Applying common analysis methods
- Communicating effectively
We’ll adopt the R programming language to do so and learn about using Quarto, Git, and GitHub to ensure our data analysis workflow is reproducible, has version control, and can easily include collaborators.
Course Learning Outcomes
At the end of this course, students will be able to:
- explain the steps and purpose of programs
- efficiently read in, combine, and manipulate data
- utilize help and other resources to customize programs
- write programs using good programming practices
- explore data and perform common analyses
- create reports, web pages, and dashboards to display and communicate results
Weekly To-do List
Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments. We’ll have some projects and exams as well. Please see the syllabus on Moodle for homework policies, project information, and exam information.
Getting Help!
There are several ways to get help with the course:
- Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
- E-mail - If there is a question that you don’t feel comfortable asking in front of the whole class, you can use e-mail. The TA and I check e-mail daily during the regular work week.
- Zoom Office Hour Sessions - These sessions can be used to share screens and have multiple users. You can do text chat, voice, and video. They are great for a class like this!
Summer 2025 Course Schedule
Topic/Week | Learning Materials | Assignments | Code-alongs |
---|---|---|---|
Week 1: 5/14-5/16 (W-F) | 00 - Watch - Welcome to the Course; 01 - Read - What is Data Science?; 02 - Watch - Workflows & Git/GitHub Basics; 03 - Read - Git & GitHub Practice; 04 - Watch - R Basics; 05 - Read & Watch - R projects and Connecting with Github; 06 - Read & Watch - Quarto | HW 1 due Tu, 5/20 | Code-alongs (optional attendance) on Thursdays |
Week 2: 5/19-5/23 (M-F) | 07 - Base R Data Structures: Vectors; 11 - Control Flow: Logicals & if/then/else | HW 2 due Tu, 5/27 | |
Week 3: 5/27-5/30 (T-F) | 15 - Packages; 20 - Manipulating Data with tidyr | HW 3 due Tu, 6/3 | |
Week 4: 6/2-6/6 (M-F) | 23 - EDA Concepts; 24 - Summarizing Categorical Variables; 25 - Barplots & ggplot2 Basics; 26 - Numerical Variable Summaries; 27 - Numerical Variable Graphs & More ggplot2 | Project 1 due Tu, 6/17 | |
Week 5: 6/9-6/13 (M-F) | No new material | Exam 1 (Wednesday or Thursday) | |
Week 6: 6/16-6/18, 6/20 (M-W, F) | 28 - Big Recap; 32 - Querying APIs & Dealing with JSON Data | HW 4 due Tu, 6/24 | |
Week 7: 6/23-6/27 (M-F) | 33 - R Shiny Basics & UI; 34 - R Shiny Server | Project 1 due M, 7/7 | |
Week 8: 6/30-7/3 (M-Th) | 37 - Simple Linear Regression; 43 - Generalized Linear Models | | |
Week 9: 7/7-7/11 (M-F) | 46 - k Nearest Neighbors; 52 - Boosted Trees | HW 5 due Tu, 7/15 | |
Week 10: 7/14-7/18 (M-F) | 56 - Creating an API; 57 - Installing Docker; 58 - Docker Containers; 59 - Dockerizing Shiny Apps | Exam 2 (Wednesday or Thursday) | |
Week 11: 10/27-10/31 | No new material. Project work time! | | |