Data Science for Statisticians

Published 2024-12-13

Welcome to ST 558 - Data Science for Statisticians!

In this course we’ll look at common tasks done by data scientists.

We’ll use the R programming language to do so, and we’ll learn to use Quarto, Git, and GitHub so that our data analysis workflow is reproducible, has version control, and can easily include collaborators.

Course Learning Outcomes

At the end of this course, students will be able to:

  • explain the steps and purpose of programs
  • efficiently read in, combine, and manipulate data
  • utilize help and other resources to customize programs
  • write programs using good programming practices
  • explore data and perform common analyses
  • create reports, web pages, and dashboards to display and communicate results

Weekly To-do List

Generally speaking, each week will have a few videos to watch and readings to do as well as corresponding homework assignments (see the syllabus on Moodle for homework policies).

  • There will be two exams. The exam windows (the days when you can take each exam) are listed in the syllabus and the course schedule.
  • There will be three projects, the third of which will count as the final for the course. These will require a reasonably substantial time commitment.

Getting Help!

To get help in this course, there are several options:

  • Discussion Forum on Moodle - This should be used for any question you feel comfortable asking and having others view. The TA, other students, and I will answer questions on this board. This will be the fastest way to receive a response!
  • E-mail - If you have a question that you don’t feel comfortable asking the whole class, you can use e-mail. The TA and I will check e-mail daily (during the regular work week).
  • Zoom Office Hour Sessions - These sessions can be used to share screens and have multiple users. You can do text chat, voice, and video. They are great for a class like this!

Fall 2024 Course Schedule

Topic/Week Learning Materials Assignments
Week 1
8/19-8/23
M-F
Read - Week 1 Overview
01 - Read - What is Data Science?
02 - Watch - Workflows & Git/GitHub Basics
03 - Read - Git & GitHub Practice
04 - Watch - R Basics
05 - Read & Watch - R projects and Connecting with Github
06 - Read & Watch - Quarto
HW 1 due W, 8/28
Week 2
8/26-8/30
M-F
07 - Base R Data Structures: Vectors
08 - Base R Data Structures: Matrices
09 - Base R Data Structures: Data Frames
10 - Base R Data Structures: Lists
HW 2 due W, 9/4
Week 3
9/3-9/6
T-F (Off M)
11 - Control Flow: Logicals & if/then/else
12 - Control Flow: Loops
13 - Control Flow: Vectorized Functions
14 - Writing Functions
HW 3 due W, 9/11
Week 4
9/9-9/13
M-F
15 - Packages
16 - Tidyverse Essentials
17 - Reading Delimited Data
18 - Reading Excel Data
19 - Manipulating Data with dplyr
HW 4 due W, 9/18
Week 5
9/16-9/20
M, W-F (Off T)
20 - Manipulating Data with tidyr
21 - Connecting to Databases
22 - SQL Style Joins
23 - Querying APIs & Dealing with JSON Data
Project 1 due W, 10/2
Week 6
9/23-9/27
M-F
No new material. Exam study time and project work time!
Exam 1 - Th-F, 9/26-9/27
Week 7
9/30-10/4
M-F
24 - EDA Concepts
25 - Summarizing Categorical Variables
26 - Barplots & ggplot2 Basics
27 - Numerical Variable Summaries
28 - Numerical Variable Graphs & More ggplot2
HW 5 due W, 10/9
Week 8
10/7-10/11
M-F
28 - Big Recap
29 - Apply Family of Functions
30 - purrr & List Columns
31 - Advanced Function Writing
HW 6 due W, 10/16
Week 9
10/16-10/18
W-F (Off M/T)
32 - Introduction to RShiny
32.5 - Selected Shiny Tutorials
33 - Connecting the UI and Server
33.5 - More selected Tutorials
34 - Reactivity
34.5 - Build an App Tutorial
HW 7 due W, 10/23
Week 10
10/21-10/25
M-F
35 - Dynamic UIs
36 - UI Layouts
36.1 - Sharing an App via shinyapps.io
36.2 - Sharing an App via GitHub
37 - Debugging & Useful Things
37.1 - Control Reactivity with isolate()
Project 2 due W, 11/6
Week 11
10/28-11/1
No new material. Project work time!
Week 12
11/4-11/8
M-F
38 - Modeling Concepts: Inference vs Prediction
39 - Prediction & Training/Test Set Ideas
40 - Cross Validation
41 - Multiple Linear Regression Models
42 - Modeling with tidymodels
42.5 - tidymodels Tutorial
HW 8 due W, 11/13
Week 13
11/11-11/15
M-F
43 - LASSO Models & Selecting Models with Both CV and a Test Set
44 - Modeling Recap
45 - Logistic Regression Models
46 - Regression & Classification Trees
47 - Ensemble Learning: Bagged Trees & Random Forests
HW 9 due W, 11/20
Week 14
11/18-11/22
M-F
48 - Creating an API
49 - Docker Basics
50 - Building a Docker Image
51 - Dockerizing a Shiny App
Exam 2 - Th-F, 11/21-11/22
Week 15-16
11/25-11/26, 12/2-12/3
M-T, M-T
No new material. Final project work time!
Final Project due Th, 12/5