Week 8 Overview

Published

2025-07-15

Welcome to week 8!

It’s time to start modeling data!

Statistical modeling helps us and others understand, interpret, and make informed decisions based on data. We will also spend a lot of time on techniques for picking the best-performing model from many candidates. In addition, we will learn how to model data in the tidy framework using tidymodels. Just as the tidyverse streamlines data wrangling, tidymodels aims to do the same for machine learning. It presents itself as a one-stop shop for everything ML-related, from processing data to training and evaluating models. It's an ecosystem of its own.
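To make the idea of picking a best-performing model concrete, here is a minimal, language-agnostic sketch. The course itself works in R with tidymodels; this uses plain Python with only the standard library so the logic is visible. The data are synthetic, and every function name (`train_test_split`, `fit_mean`, `fit_line`, `rmse`) is made up for this illustration rather than taken from any package. It holds out part of the data, fits two candidate models on the rest, and compares them on the held-out part.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Least-squares line y = a + b * x with a single predictor."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def rmse(model, rows):
    """Root mean squared error of model predictions on rows."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in rows) / len(rows))


# Synthetic data (an assumption for illustration): y = 1 + 2x + noise.
rng = random.Random(42)
data = []
for _ in range(100):
    x = rng.uniform(0, 10)
    data.append((x, 1.0 + 2.0 * x + rng.gauss(0, 1)))

# Fit each candidate on the training set only.
train, test = train_test_split(data)
candidates = {"mean_only": fit_mean(train), "linear": fit_line(train)}

# Score each candidate on data it has never seen, then keep the best.
scores = {name: rmse(model, test) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

The key design choice is that the test rows play no part in fitting. Scoring on held-out data estimates how each model would perform on new observations, which is the comparison we care about when choosing among models.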

Week 8 Additional Readings/Learning Materials

tidymodels homepage for you to explore

Short video on testing vs. training data sets

Linear regression with multiple predictors

TidyTuesday LASSO demonstration

Extra Video I highly recommend

Although this course does not have a project, I want to challenge us to start thinking about how to present data in a way that makes sense. We have a problem in the scientific community when it comes to outreach. If we cannot communicate with the general public, then what we do does not hold much value.

This is one of my favorite videos on how to communicate a story with data. Please see Hans Rosling here.

Learning Objectives

The learning objectives for this week can be thought of at a high level. Modeling data is storytelling as much as anything else. I want everyone to put heavy emphasis on being able to articulate and justify each step you take throughout the modeling process. For example:

– Can you define what testing and training data sets are? Can you articulate why splitting the data this way helps us pick the best-performing model?

– What is multiple linear regression (MLR)? When is it appropriate to use MLR?

– What are LASSO models? When are they appropriate? What do they help accomplish?
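As a compact reference for the two models named above, the standard textbook forms are written out below. The notation is an assumption chosen for this sketch: $n$ observations, $p$ predictors, response $y_i$, predictor values $x_{ij}$, coefficients $\beta_j$, and penalty strength $\lambda$. It is not necessarily the notation used in the linked materials.

```latex
% Multiple linear regression: one response, p predictors
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
\qquad i = 1, \dots, n

% LASSO: the same linear model, fit by least squares plus an L1 penalty
\hat{\beta}^{\text{lasso}}
  = \arg\min_{\beta_0, \beta}
    \left\{
      \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\},
\qquad \lambda \ge 0
```

With $\lambda = 0$ the LASSO fit reduces to ordinary least squares. As $\lambda$ grows, coefficients shrink toward zero and some become exactly zero, which is why LASSO is often used for selecting predictors as well as for prediction. The value of $\lambda$ is usually chosen by comparing performance on data not used for fitting.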

Can you explain all of this to people who are not statisticians or data scientists?

Let’s have a wonderful week 8! As always, please reach out if you have any questions.