class: center, middle, inverse, title-slide

.title[
# Loss Functions & Model Performance
]
.author[
### Justin Post
]

---
layout: false
class: title-slide-section-red, middle

# Loss Functions & Model Performance

Justin Post

---
layout: true

<div class="my-footer"><img src="img/logo.png" style="height: 60px;"/></div>

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

- Ex: MLR fit by minimizing the sum of squared errors

    + Response: `\(y\)` = Brozek score
    + Predictors: `\(x_1\)` = age, `\(x_2\)` = height, ...

`$$\min\limits_{\beta's}\sum_{i=1}^{n}(y_i-(\beta_0+\beta_1x_{1i}+...+\beta_px_{pi}))^2$$`

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

- Ex: MLR fit by minimizing the sum of squared errors plus a penalty

    + Response: `\(y\)` = Brozek score
    + Predictors: `\(x_1\)` = age, `\(x_2\)` = height, ...

`$$\min\limits_{\beta's}\sum_{i=1}^{n}(y_i-(\beta_0+\beta_1x_{1i}+...+\beta_px_{pi}))^2 + \alpha\sum_{j=1}^{p}|\beta_j|$$`

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

- Ex: MLR fit by minimizing the mean absolute error

    + Response: `\(y\)` = Brozek score
    + Predictors: `\(x_1\)` = age, `\(x_2\)` = height, ...

`$$\min\limits_{\beta's}\sum_{i=1}^{n}\left|y_i-(\beta_0+\beta_1x_{1i}+...+\beta_px_{pi})\right|$$`

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

- Ex: Logistic Regression with (negative) binary cross entropy

    + Response: `\(y\)` = Potability (1 or 0)
    + Predictors: `\(x_1\)` = Hardness, `\(x_2\)` = Chloramines, ...
`$$\min\limits_{\beta's} -\sum_{i=1}^{n}(y_i log(p(x_{1i},...,x_{pi}))+(1-y_i)log(1-p(x_{1i},...,x_{pi})))$$`

where `\(p(x_{1i},...,x_{pi}) = \frac{1}{1+e^{-\beta_0-\beta_1x_{1i}-...-\beta_px_{pi}}}\)`

---

# Loss Functions

- Loss functions are the functions we use to **fit** or **train** our model

- Ex: Logistic Regression with (negative) binary cross entropy and penalty

    + Response: `\(y\)` = Potability (1 or 0)
    + Predictors: `\(x_1\)` = Hardness, `\(x_2\)` = Chloramines, ...

`$$\min\limits_{\beta's} -\sum_{i=1}^{n}(y_i log(p(x_{1i},...,x_{pi}))+(1-y_i)log(1-p(x_{1i},...,x_{pi}))) + \lambda\sum_{j=1}^{p}\beta_j^2$$`

where `\(p(x_{1i},...,x_{pi}) = \frac{1}{1+e^{-\beta_0-\beta_1x_{1i}-...-\beta_px_{pi}}}\)`

---

# Model Metric

- Model metrics are used to determine the quality of the predictions

- Pretty much any loss function can also act as a metric!

- Often the same function is used as both the loss and the metric

---

# Model Metric

- Model metrics are used to determine the quality of the predictions

- Pretty much any loss function can also act as a metric!

- Often the same function is used as both the loss and the metric

- Ex:
    + Fit 'usual' least squares regression (minimize sum of squared errors)
    + Determine quality with RMSE or mean absolute error (MAE)

---

# Model Metric

- Model metrics are used to determine the quality of the predictions

- Pretty much any loss function can also act as a metric!

- Often the same function is used as both the loss and the metric

- Ex:
    + Fit (MLR) LASSO model (minimize sum of squared errors subject to L1 penalty)
    + Determine quality with RMSE or MAE

---

# Model Metric

- Model metrics are used to determine the quality of the predictions

- Pretty much any loss function can also act as a metric!
- Often the same function is used as both the loss and the metric

- Ex:
    + Fit Logistic Regression model (minimize (negative) binary cross entropy)
    + Determine quality with (negative) binary cross entropy (`neg_log_loss`) or accuracy

---

# Other Commonly Used Model Metrics

For a categorical response, many rely on:

<div class="figure" style="text-align: center">
<img src="img/confusion_matrix.jpg" alt="From Google's ML crash course" width="850px" />
<p class="caption">From Google's ML crash course</p>
</div>

---

# Other Commonly Used Model Metrics

For a categorical response:

- Accuracy = `\(\frac{TP + TN}{TP+TN+FP+FN}\)`
- Precision = `\(\frac{TP}{TP+FP}\)`
- Recall (or True Positive Rate, TPR) = `\(\frac{TP}{TP+FN}\)`
- False Positive Rate (FPR) = `\(\frac{FP}{FP+TN}\)`

---

# Other Commonly Used Model Metrics

For a categorical response:

- Accuracy = `\(\frac{TP + TN}{TP+TN+FP+FN}\)`
- Precision = `\(\frac{TP}{TP+FP}\)`
- Recall (or True Positive Rate, TPR) = `\(\frac{TP}{TP+FN}\)`
- False Positive Rate (FPR) = `\(\frac{FP}{FP+TN}\)`

Built off of these ideas:

- Receiver Operating Characteristic (ROC) curve
    + Plots TPR vs FPR at different classification thresholds
    + Area under the ROC curve is often used!

---

# Note: Model Selection Without Training/Test

- For a numeric response, these are just calculated on the training data:
    + AIC
    + AICc
    + BIC
    + Mallows' Cp
    + Adjusted R-squared

- Can be used to select a model without a training/test split

---

# Recap

- Loss functions are used during model fitting

- Model metrics are used to evaluate a model
    + Can be the same!
    + Often still called a loss function when used as a metric
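
---

# Recap: A Quick Sketch in Code

The loss/metric distinction above can be sketched in plain Python. This is a minimal, illustrative example: the function names and toy numbers are made up for this slide, not taken from any particular library.

```python
import math

def binary_cross_entropy(y_true, p_hat):
    # (Negative) binary cross entropy: the logistic regression loss,
    # summing -[y*log(p) + (1-y)*log(1-p)] over the observations
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_hat))

def confusion_metrics(tp, tn, fp, fn):
    # Accuracy, precision, recall (TPR), and FPR from confusion-matrix counts,
    # matching the formulas on the earlier slides
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

# Toy example: four observations with fitted probabilities
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.6, 0.4]
loss = binary_cross_entropy(y, p)   # role of the loss: fit/compare models

# Classifying at a 0.5 threshold gives TP=2, TN=2, FP=0, FN=0
metrics = confusion_metrics(tp=2, tn=2, fp=0, fn=0)
```

Here the cross entropy plays the loss role, while accuracy and friends act as metrics; as noted above, the same function can serve in both roles.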