layout: false
class: title-slide-section-red, middle

# LASSO Models

Justin Post

---
layout: true

<div class="my-footer"><img src="data:image/png;base64,#img/logo.png" style="height: 60px;"/></div>

---

# Recap

- Judge the model's effectiveness using a **Loss** function

- Often split data into a training and test set

    + Perhaps 70/30 or 80/20

- Cross-validation gives a way to use more of the data while still seeing how the model does on test data

    + Commonly 5-fold or 10-fold CV is used

- Once a best model is chosen, the model is refit on the entire data set

---

# Recap

- Judge the model's effectiveness using a **Loss** function

- Often split data into a training and test set

    + Perhaps 70/30 or 80/20

- Cross-validation gives a way to use more of the data while still seeing how the model does on test data

    + Commonly 5-fold or 10-fold CV is used

- Once a best model is chosen, the model is refit on the entire data set

- Often we use both! Let's see why by introducing a model with a **tuning parameter**

---

# LASSO Model

- [Least Absolute Shrinkage and Selection Operator](https://www.jstor.org/stable/2346178), or LASSO

    + Similar to least squares, but a penalty is placed on the sum of the absolute values of the regression coefficients

    + `\(\alpha \geq 0\)` is called a **tuning parameter**

`$$\min\limits_{\beta's}\sum_{i=1}^{n}(y_i-(\beta_0+\beta_1x_{1i}+...+\beta_px_{pi}))^2 + \alpha\sum_{j=1}^{p}|\beta_j|$$`

---

# LASSO Model

- [Least Absolute Shrinkage and Selection Operator](https://www.jstor.org/stable/2346178), or LASSO

    + Similar to least squares, but a penalty is placed on the sum of the absolute values of the regression coefficients

    + Sets coefficients to 0 as you 'shrink'!

<img src="data:image/png;base64,#img/lasso_path.png" width="450px" style="display: block; margin: auto;" />

---

# Tuning Parameter

- When choosing the tuning parameter, we are really considering a **family of models**!

- Consider `\(\alpha = 0.1\)` (a small amount of shrinkage)

```python
from sklearn import linear_model

lasso = linear_model.Lasso(alpha=0.1)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -164.6120947286609 [ 0.08761607 -0.11092474]
```

---

# Tuning Parameter

- When choosing the tuning parameter, we are really considering a **family of models**!

- Consider `\(\alpha = 0.1\)` (a small amount of shrinkage)

```python
from sklearn import linear_model

lasso = linear_model.Lasso(alpha=0.1)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -164.6120947286609 [ 0.08761607 -0.11092474]
```

- Consider `\(\alpha = 1.05\)` (a larger amount of shrinkage)

```python
lasso = linear_model.Lasso(alpha=1.05)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -86.65630892150766 [ 0.04835598 -0.        ]
```

---

# LASSO Fits Visual

- Perfect place for CV to help select the best `\(\alpha\)`!
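- A sketch of how fits like the one below can be drawn (this assumes `matplotlib` is available and `bike_data` is loaded as above; it is not necessarily the code behind the figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

X = bike_data[["year", "log_km_driven"]].values
y = bike_data["log_selling_price"].values

# predict over a grid of log_km_driven values, holding year at its mean,
# to watch the fitted line flatten as alpha grows
grid = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
X_new = np.column_stack((np.full(100, X[:, 0].mean()), grid))
for alpha in [0.1, 0.5, 1.05, 2]:
    fit = linear_model.Lasso(alpha=alpha).fit(X, y)
    plt.plot(grid, fit.predict(X_new), label=f"alpha = {alpha}")
plt.xlabel("log_km_driven")
plt.ylabel("predicted log_selling_price")
plt.legend()
plt.show()
```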
<img src="data:image/png;base64,#27-LASSO_files/figure-html/unnamed-chunk-12-1.svg" width="450px" style="display: block; margin: auto;" />

---

# Using CV to Select the Tuning Parameter

- Return the optimal `\(\alpha\)` using `LassoCV`

```python
import numpy as np
from sklearn.linear_model import LassoCV

lasso_mod = LassoCV(cv=5, random_state=0, alphas=np.linspace(0, 2.2, 100)) \
    .fit(bike_data[["year", "log_km_driven"]].values,
         bike_data["log_selling_price"].values)
```

- Including `\(\alpha = 0\)` (no penalty at all) in the grid triggers several warnings (the first is repeated for each of the 5 folds):

```
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:634: UserWarning: Coordinate descent without L1 regularization may lead to unexpected results and is discouraged. Set l1_ratio > 0 to add L1 regularization.
##   model = cd_fast.enet_coordinate_descent_gram(
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:1771: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
##   model.fit(X, y)
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
##   model = cd_fast.enet_coordinate_descent(
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.385e+02, tolerance: 5.358e-02 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.
##   model = cd_fast.enet_coordinate_descent(
```

---

# Using CV to Select the Tuning Parameter

- Inspect the per-fold MSEs for each `\(\alpha\)` (note: the rows of `mse_path_` match `alphas_`, which is sorted in *decreasing* order, not the grid passed to `alphas`)

```python
import pandas as pd

pd.DataFrame(zip(lasso_mod.alphas_, lasso_mod.mse_path_),
             columns=["alpha_value", "MSE_by_fold"])
```

```
##     alpha_value                                        MSE_by_fold
## 0      2.200000  [0.5496710875578059, 0.6805679103740427, 0.500...
## 1      2.177778  [0.5496710875578059, 0.6805679103740427, 0.500...
## 2      2.155556  [0.5496710875578059, 0.6805679103740427, 0.500...
## 3      2.133333  [0.5496710875578059, 0.6805679103740427, 0.500...
## 4      2.111111  [0.5496710875578059, 0.6805679103740427, 0.500...
## ..          ...                                                ...
## 95     0.088889  [0.30465461356828655, 0.3626276356362589, 0.19...
## 96     0.066667  [0.2998496943347467, 0.35411026477928403, 0.18...
## 97     0.044444  [0.29626758731409036, 0.3464595664117418, 0.18...
## 98     0.022222  [0.2939082925063059, 0.3396755405336323, 0.182...
## 99     0.000000  [0.2927721424754243, 0.33375857421410604, 0.18...
## 
## [100 rows x 2 columns]
```
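---

# Using CV to Select the Tuning Parameter

- The fold-averaged errors are tabulated on the next slide; plotting them first can help. A sketch (assumes `matplotlib` is available; `lasso_mod` as fit above):

```python
import matplotlib.pyplot as plt

# average the per-fold MSEs for each alpha; rows of mse_path_ line up
# with alphas_ (sorted in decreasing order)
mean_mse = lasso_mod.mse_path_.mean(axis=1)

plt.plot(lasso_mod.alphas_, mean_mse)
plt.xlabel("alpha")
plt.ylabel("Mean 5-fold CV MSE")
plt.show()
```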
---

# Using CV to Select the Tuning Parameter

- Look at the mean CV error for each `\(\alpha\)`, sorted by `\(\alpha\)`

```python
# pair each alpha with its fold-averaged MSE, then sort by alpha
fit_info = np.array(list(zip(lasso_mod.alphas_,
                             np.mean(lasso_mod.mse_path_, axis=1))))
fit_info[fit_info[:, 0].argsort()]
```

```
## array([[0.        , 0.26832555],
##        [0.02222222, 0.26904464],
##        [0.04444444, 0.27086204],
##        [0.06666667, 0.27377741],
##        [0.08888889, 0.27779157],
##        [0.11111111, 0.28290414],
##        [0.13333333, 0.28911508],
##        [0.15555556, 0.29642441],
##        [0.17777778, 0.30483239],
##        [0.2       , 0.30967893],
##        [0.22222222, 0.31098715],
##        [0.24444444, 0.31159763],
##        [0.26666667, 0.31226415],
##        [0.28888889, 0.31298674],
##        [0.31111111, 0.31376537],
##        [0.33333333, 0.31460007],
##        [0.35555556, 0.31549081],
##        [0.37777778, 0.31643761],
##        [0.4       , 0.31744046],
##        [0.42222222, 0.31849937],
##        [0.44444444, 0.31961433],
##        [0.46666667, 0.32078535],
##        [0.48888889, 0.32201242],
##        [0.51111111, 0.32329554],
##        [0.53333333, 0.32463472],
##        [0.55555556, 0.32602995],
##        [0.57777778, 0.32748123],
##        [0.6       , 0.32898857],
##        [0.62222222, 0.33055197],
##        [0.64444444, 0.33217142],
##        [0.66666667, 0.33384692],
##        [0.68888889, 0.33557847],
##        [0.71111111, 0.33736608],
##        [0.73333333, 0.33920974],
##        [0.75555556, 0.34110946],
##        [0.77777778, 0.34306523],
##        [0.8       , 0.34507706],
##        [0.82222222, 0.34714494],
##        [0.84444444, 0.34926887],
##        [0.86666667, 0.35144886],
##        [0.88888889, 0.3536849 ],
##        [0.91111111, 0.355977  ],
##        [0.93333333, 0.35832515],
##        [0.95555556, 0.36072935],
##        [0.97777778, 0.36318961],
##        [1.        , 0.36570592],
##        [1.02222222, 0.36827828],
##        [1.04444444, 0.3709067 ],
##        [1.06666667, 0.37359118],
##        [1.08888889, 0.3763317 ],
##        [1.11111111, 0.37912829],
##        [1.13333333, 0.38198092],
##        [1.15555556, 0.38488961],
##        [1.17777778, 0.38785435],
##        [1.2       , 0.39087515],
##        [1.22222222, 0.393952  ],
##        [1.24444444, 0.39708491],
##        [1.26666667, 0.40027387],
##        [1.28888889, 0.40351888],
##        [1.31111111, 0.40681995],
##        [1.33333333, 0.41017707],
##        [1.35555556, 0.41359025],
##        [1.37777778, 0.41705948],
##        [1.4       , 0.42058476],
##        [1.42222222, 0.4241661 ],
##        [1.44444444, 0.42780349],
##        [1.46666667, 0.43149694],
##        [1.48888889, 0.43524644],
##        [1.51111111, 0.43905199],
##        [1.53333333, 0.4429136 ],
##        [1.55555556, 0.44683126],
##        [1.57777778, 0.45080497],
##        [1.6       , 0.45483474],
##        [1.62222222, 0.45892057],
##        [1.64444444, 0.46306244],
##        [1.66666667, 0.46726038],
##        [1.68888889, 0.47151436],
##        [1.71111111, 0.4758244 ],
##        [1.73333333, 0.4801905 ],
##        [1.75555556, 0.48461264],
##        [1.77777778, 0.48853102],
##        [1.8       , 0.49159568],
##        [1.82222222, 0.49469748],
##        [1.84444444, 0.4978364 ],
##        [1.86666667, 0.50012924],
##        [1.88888889, 0.50220551],
##        [1.91111111, 0.50430831],
##        [1.93333333, 0.50643762],
##        [1.95555556, 0.50859345],
##        [1.97777778, 0.51077579],
##        [2.        , 0.51225503],
##        [2.02222222, 0.51284361],
##        [2.04444444, 0.51344202],
##        [2.06666667, 0.51405027],
##        [2.08888889, 0.51466834],
##        [2.11111111, 0.51496947],
##        [2.13333333, 0.51496947],
##        [2.15555556, 0.51496947],
##        [2.17777778, 0.51496947],
##        [2.2       , 0.51496947]])
```
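---

# Using CV to Select the Tuning Parameter

- The minimizing `\(\alpha\)` can be located by hand (a sketch, using `lasso_mod` as fit above):

```python
import numpy as np

# the position of the smallest fold-averaged MSE gives the best alpha
best_idx = np.argmin(lasso_mod.mse_path_.mean(axis=1))
print(lasso_mod.alphas_[best_idx])
```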
- Best `\(\alpha\)` is given by the `.alpha_` attribute

```python
lasso_mod.alpha_
```

```
## 0.0
```

---

# Using CV to Select the Tuning Parameter

- Now fit using that optimal `\(\alpha\)`

```python
#alpha of 0 is ordinary least squares; sklearn warns, but we can ignore that here
lasso_best = linear_model.Lasso(lasso_mod.alpha_)
lasso_best.fit(bike_data[["year", "log_km_driven"]].values,
               bike_data["log_selling_price"].values)
```

```python
print(lasso_best.intercept_, lasso_best.coef_)
```

```
## -148.79329107788135 [ 0.0803366  -0.22686129]
```

---

# So Do We Just Need CV? Sometimes!

- If you are only considering one type of model, you can use just a training/test split or k-fold CV to select the best version of that model

- When you have multiple types of models to choose from, usually use both!

    + If we use the test set too much, we may have '**data leakage**'

    + Essentially, we end up training our models to the test set

---

# Training/Validation/Test or CV/Test

- Instead, we sometimes split into a training, validation, and test set

- CV can be used to replace the validation set!

<img src="data:image/png;base64,#img/training_validation_test.png" width="600px" style="display: block; margin: auto;" />

---

# Training/Validation/Test or CV/Test

- Instead, we sometimes split into a training, validation, and test set

- CV can be used to replace the validation set!

<img src="data:image/png;base64,#img/training_validation_test.png" width="600px" style="display: block; margin: auto;" />

- Compare only the **best** model from each model type on the test set

---

# Recap

- LASSO is similar to an MLR model but shrinks coefficients and may set some to 0

    + Tuning parameter must be chosen (usually by CV)

- Training/test split gives us a way to validate our model's performance

- CV can be used on the training set to select **tuning parameters**

    + Helps determine the 'best' model for a class of models

- With many competing model types, determine the best from each type, then check performance on the test set
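---

# Putting It Together

- A minimal sketch of that full workflow: train/test split, CV on the training set to pick `\(\alpha\)`, and one final check on the test set (column names as above; the split proportion is illustrative, and the `\(\alpha\)` grid starts just above 0 to avoid the earlier warnings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

X = bike_data[["year", "log_km_driven"]].values
y = bike_data["log_selling_price"].values

# hold out a test set for the final performance check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold CV on the training data only to choose alpha
lasso_cv = LassoCV(cv=5, random_state=0,
                   alphas=np.linspace(0.01, 2.2, 100)).fit(X_train, y_train)

# evaluate the chosen model once on the held-out test set
test_mse = mean_squared_error(y_test, lasso_cv.predict(X_test))
print(lasso_cv.alpha_, test_mse)
```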