Data Science for Statisticians
Welcome to ST 558 - Data Science for Statisticians!
In this course we’ll look at common tasks done by data scientists:
- Reading in raw data and manipulating it
- Combining data sources
- Summarizing data to glean insights
- Applying common analysis methods
- Communicating effectively
We’ll use the R programming language to do so and learn to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, version controlled, and easy to share with collaborators.
Course Learning Outcomes
At the end of this course, students will be able to:
- explain the steps and purpose of programs
- efficiently read in, combine, and manipulate data
- utilize help and other resources to customize programs
- write programs using good programming practices
- explore data and perform common analyses
- create reports, web pages, and dashboards to display and communicate results
Weekly To-do List
Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments (see the syllabus on Moodle for homework policies).
- There will be two exams and the exam windows (days when you can take the exams) are available on the syllabus and course schedule.
- There will be three projects, the third of which will count as the final for the course. These will require a reasonably substantial time commitment.
Getting Help!
To obtain course help there are a number of options:
- Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
- E-mail - If there is a question that you don’t feel comfortable asking in front of the whole class, you can use e-mail. The TA and I will check e-mail daily during the regular work week.
- Zoom Office Hour Sessions - These sessions can be used to share screens and have multiple users. You can do text chat, voice, and video. They are great for a class like this!
Fall 2024 Course Schedule
Topic/Week | Learning Materials | Assignments |
---|---|---|
Week 1 8/19-8/23 M-F | Read - Week 1 Overview; 01 - Read - What is Data Science?; 02 - Watch - Workflows & Git/GitHub Basics; 03 - Read - Git & GitHub Practice; 04 - Watch - R Basics; 05 - Read & Watch - R projects and Connecting with Github; 06 - Read & Watch - Quarto | HW 1 due W, 8/28 |
Week 2 8/26-8/30 M-F | 07 - Base R Data Structures: Vectors; 08 - Base R Data Structures: Matrices; 09 - Base R Data Structures: Data Frames; 10 - Base R Data Structures: Lists | HW 2 due W, 9/4 |
Week 3 9/3-9/6 T-F (Off M) | 11 - Control Flow: Logicals & if/then/else; 12 - Control Flow: Loops; 13 - Control Flow: Vectorized Functions; 14 - Writing Functions | HW 3 due W, 9/11 |
Week 4 9/9-9/13 M-F | 15 - Packages; 16 - Tidyverse Essentials; 17 - Reading Delimited Data; 18 - Reading Excel Data; 19 - Manipulating Data with dplyr | HW 4 due W, 9/18 |
Week 5 9/16-9/20 M, W-F (Off T) | 20 - Manipulating Data with tidyr; 21 - Connecting to Databases; 22 - SQL Style Joins; 23 - Querying APIs & Dealing with JSON Data | Project 1 due W, 10/2 |
Week 6 9/23-9/27 M-F | No new material. Exam study time and project work time! | Exam 1 - Th-F, 9/26-9/27 |
Week 7 9/30-10/4 M-F | 24 - EDA Concepts; 25 - Summarizing Categorical Variables; 26 - Barplots & ggplot2 Basics; 27 - Numerical Variable Summaries; 28 - Numerical Variable Graphs & More ggplot2 | HW 5 due W, 10/9 |
Week 8 10/7-10/11 M-F | 28 - Big Recap; 29 - Apply Family of Functions; 30 - purrr & List Columns; 31 - Advanced Function Writing | HW 6 due W, 10/16 |
Week 9 10/16-10/18 Off M/T | 32 - Introduction to RShiny; 32.5 - Selected Shiny Tutorials; 33 - Connecting the UI and Server; 33.5 - More selected Tutorials; 34 - Reactivity; 34.5 - Build an App Tutorial | HW 7 due W, 10/23 |
Week 10 10/21-10/25 M-F | 35 - Dynamic UIs; 36 - UI Layouts; 36.1 - Sharing an App via shinyapps.io; 36.2 - Sharing an App via GitHub; 37 - Debugging & Useful Things; 37.1 - Control Reactivity with isolate() | Project 2 due W, 11/6 |
Week 11 10/28-11/1 | No new material. Project work time! | |
Week 12 11/4-11/8 M-F | 38 - Modeling Concepts: Inference vs Prediction; 39 - Prediction & Training/Test Set Ideas; 40 - Cross Validation; 41 - Multiple Linear Regression Models; 42 - Modeling with tidymodels; 42.5 - tidymodels Tutorial | HW 8 due W, 11/13 |
Week 13 11/11-11/15 M-F | 43 - LASSO Models & Selecting Models with Both CV and a Test Set; 44 - Modeling Recap; 45 - Logistic Regression Models; 46 - Regression & Classification Trees; 47 - Ensemble Learning: Bagged Trees & Random Forests | HW 9 due W, 11/20 |
Week 14 11/18-11/22 M-F | 48 - Creating an API; 49 - Docker Basics; 50 - Building a Docker Image; 51 - Dockerizing a Shiny App | Exam 2 - Th-F, 11/21-11/22 |
Week 15-16 11/25-11/26, 12/2-12/3 M-T, M-T | No new material. Final project work time! | Final Project due Th, 12/5 |