Weeks 12 & 13 Overview

Published

2025-08-11

That wraps up the content for weeks 10 & 11. Now it’s time for some practice! Head back to our Moodle site to check out your assessment for this week.

It’s time to start modeling data!

Statistical modeling helps us and others understand and interpret data and make informed decisions based on it. We will also spend a lot of time on techniques for selecting the best-performing model from many candidates. In addition, we will learn how to model data in the tidy framework using tidymodels. Just as the tidyverse provides a consistent framework for data manipulation and visualization, tidymodels aims to do the same for machine learning. It presents itself as a one-stop shop for everything ML-related, from processing data to training and evaluating models. It’s an ecosystem of its own.

Weeks 12 & 13 Additional Readings/Learning Materials

Learning Objectives

Upon completion of these two weeks, students will be able to:

Linear Regression Models

  1. describe the idea of supervised learning and compare and contrast it with unsupervised learning

  2. conduct an exploratory data analysis

  3. utilize the lm() function along with formula notation in R to fit linear models

    1. lay out the basic simple linear regression model and explain how the models are commonly fit to data in R
    2. access elements of a fitted lm object
    3. find predictions using a simple linear regression model and provide standard errors, confidence bounds, and prediction bounds
    4. utilize the lm() function and formula notation in R to fit multiple linear regression models and polynomial regression models
    5. define the terms polynomial regression and multiple linear regression
  4. explain how a multiple linear regression model is usually fit and what effect adding more predictor terms has on the model

    1. find predictions using a multiple linear regression model and provide standard errors, confidence bounds, and prediction bounds
    2. explain how adding categorical predictors changes a linear model
    3. interpret the coefficients of a general linear model and use it for prediction purposes
  5. describe common methods used to select between models

  6. define prediction error, training sets, and test sets

    1. explain why splitting data into training and test sets is needed
    2. discuss the nature/behavior of predicting on your training set vs predicting on a test set
  7. use a linear regression or logistic regression model to perform classification

    1. describe the type of scenario where logistic regression may be a reasonable modeling choice
    2. define the logistic regression model and state its advantages for modeling binary outcomes
    3. create visuals to help explore binary outcome data appropriate for logistic regression
    4. interpret model coefficients from a logistic regression model
    5. fit logistic regression models in R and use them for prediction purposes (probability, link, odds, or classification)
    6. predict for new data using a logistic regression model

Nonlinear & Ensemble Models

  1. fit and interpret regression and classification trees in R

    1. describe the terms regression tree and classification tree
    2. explain the difference between using a tree-based method and using a linear method
    3. provide visuals of tree fits in R
    4. predict using a tree fit in R
    5. roughly break down the steps used in fitting a regression tree
    6. explain the term “greedy” algorithm
    7. prune a fitted tree and describe why this is often needed
    8. compare and contrast the fitting and pruning of regression trees vs classification trees
    9. describe the pros and cons of using tree-based methods
  2. select a final model using cross validation

  3. fit and interpret bagged tree and random forest models in R

    1. explain the term ensemble methods and how they can be applied to tree-based methods
    2. describe why ensemble methods can often improve predictions
    3. give the cons of using ensemble methods
    4. investigate variable importance measures for ensemble trees
    5. outline the bagging algorithm
    6. outline the random forest algorithm

Tidymodels Framework and Model Fitting

  1. Use the tidymodels framework to fit and evaluate models

    1. describe how recipes work and why they are useful when fitting models
    2. outline how and why tidymodels defines a model by specifying the model type and engine
    3. utilize workflows for fitting models
  2. Compare and contrast using a training/test set, using cross-validation only, and using both the training set with CV and a test set

  3. Describe and explain the pros and cons of commonly used model metrics

  4. Explain the difference between a loss function and a model metric

Use the table of contents on the left or the arrows at the bottom of this page to navigate to the next learning material!