layout: false
class: title-slide-section-red, middle

# LASSO Models

Justin Post

---
layout: true

<div class="my-footer"><img src="data:image/png;base64,#img/logo.png" style="height: 60px;"/></div>

---

# Recap

- Judge the model's effectiveness using a **Loss** function

- Often split data into a training and test set

    + Perhaps 70/30 or 80/20

- Cross-validation gives a way to use more of the data while still seeing how the model does on test data

    + Commonly 5-fold or 10-fold CV is used

- Once a best model is chosen, the model is refit on the entire data set

---

# Recap

- Judge the model's effectiveness using a **Loss** function

- Often split data into a training and test set

    + Perhaps 70/30 or 80/20

- Cross-validation gives a way to use more of the data while still seeing how the model does on test data

    + Commonly 5-fold or 10-fold CV is used

- Once a best model is chosen, the model is refit on the entire data set

- Often we use both! Let's see why by introducing a model with a **tuning parameter**

---

# LASSO Model

- [Least Absolute Shrinkage and Selection Operator](https://www.jstor.org/stable/2346178), or LASSO

    + Similar to least squares, but a penalty is placed on the sum of the absolute values of the regression coefficients

    + `\(\alpha \geq 0\)` is called a **tuning parameter**

`$$\min\limits_{\beta's}\sum_{i=1}^{n}(y_i-(\beta_0+\beta_1x_{1i}+...+\beta_px_{pi}))^2 + \alpha\sum_{j=1}^{p}|\beta_j|$$`

---

# LASSO Model

- [Least Absolute Shrinkage and Selection Operator](https://www.jstor.org/stable/2346178), or LASSO

    + Similar to least squares, but a penalty is placed on the sum of the absolute values of the regression coefficients

    + Sets coefficients to 0 as you 'shrink'!

<img src="data:image/png;base64,#img/lasso_path.png" width="450px" style="display: block; margin: auto;" />

---

# Tuning Parameter

- When choosing the tuning parameter, we are really considering a **family of models**!

- Consider `\(\alpha = 0.1\)` (a small amount of shrinkage)

```python
from sklearn import linear_model

lasso = linear_model.Lasso(alpha=0.1)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -164.6120947286609 [ 0.08761607 -0.11092474]
```

---

# Tuning Parameter

- When choosing the tuning parameter, we are really considering a **family of models**!

- Consider `\(\alpha = 0.1\)` (a small amount of shrinkage)

```python
from sklearn import linear_model

lasso = linear_model.Lasso(alpha=0.1)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -164.6120947286609 [ 0.08761607 -0.11092474]
```

- Consider `\(\alpha = 1.05\)` (a larger amount of shrinkage)

```python
lasso = linear_model.Lasso(alpha=1.05)
lasso.fit(bike_data[["year", "log_km_driven"]].values,
          bike_data["log_selling_price"].values)
```

```python
print(lasso.intercept_, lasso.coef_)
```

```
## -86.65630892150766 [ 0.04835598 -0.        ]
```

---

# LASSO Fits Visual

- Perfect place for CV to help select the best `\(\alpha\)`!
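- A sketch of how fits like the one below can be drawn (this assumes `matplotlib` is available and `bike_data` is loaded as above; it is not necessarily the code behind the figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

X = bike_data[["year", "log_km_driven"]].values
y = bike_data["log_selling_price"].values

# predict over a grid of log_km_driven values, holding year at its mean,
# to watch the fitted line flatten as alpha grows
grid = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
X_new = np.column_stack((np.full(100, X[:, 0].mean()), grid))
for alpha in [0.1, 0.5, 1.05, 2]:
    fit = linear_model.Lasso(alpha=alpha).fit(X, y)
    plt.plot(grid, fit.predict(X_new), label=f"alpha = {alpha}")
plt.xlabel("log_km_driven")
plt.ylabel("predicted log_selling_price")
plt.legend()
plt.show()
```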
<img src="data:image/png;base64,#27-LASSO_files/figure-html/unnamed-chunk-12-1.svg" width="450px" style="display: block; margin: auto;" />

---

# Using CV to Select the Tuning Parameter

- Return the optimal `\(\alpha\)` using `LassoCV`

```python
import numpy as np
from sklearn.linear_model import LassoCV

lasso_mod = LassoCV(cv=5, random_state=0, alphas=np.linspace(0, 2.2, 100)) \
    .fit(bike_data[["year", "log_km_driven"]].values,
         bike_data["log_selling_price"].values)
```

- Including `\(\alpha = 0\)` (no penalty at all) in the grid triggers several warnings (the first is repeated for each of the 5 folds):

```
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:634: UserWarning: Coordinate descent without L1 regularization may lead to unexpected results and is discouraged. Set l1_ratio > 0 to add L1 regularization.
##   model = cd_fast.enet_coordinate_descent_gram(
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:1771: UserWarning: With alpha=0, this algorithm does not converge well. You are advised to use the LinearRegression estimator
##   model.fit(X, y)
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: UserWarning: Coordinate descent with no regularization may lead to unexpected results and is discouraged.
##   model = cd_fast.enet_coordinate_descent(
## C:\python\lib\site-packages\sklearn\linear_model\_coordinate_descent.py:648: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.385e+02, tolerance: 5.358e-02 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.
##   model = cd_fast.enet_coordinate_descent(
```

---

# Using CV to Select the Tuning Parameter

- Inspect the per-fold MSEs for each `\(\alpha\)` (note: the rows of `mse_path_` match `alphas_`, which is sorted in *decreasing* order, not the grid passed to `alphas`)

```python
import pandas as pd

pd.DataFrame(zip(lasso_mod.alphas_, lasso_mod.mse_path_),
             columns=["alpha_value", "MSE_by_fold"])
```

```
##     alpha_value                                        MSE_by_fold
## 0      2.200000  [0.5496710875578059, 0.6805679103740427, 0.500...
## 1      2.177778  [0.5496710875578059, 0.6805679103740427, 0.500...
## 2      2.155556  [0.5496710875578059, 0.6805679103740427, 0.500...
## 3      2.133333  [0.5496710875578059, 0.6805679103740427, 0.500...
## 4      2.111111  [0.5496710875578059, 0.6805679103740427, 0.500...
## ..          ...                                                ...
## 95     0.088889  [0.30465461356828655, 0.3626276356362589, 0.19...
## 96     0.066667  [0.2998496943347467, 0.35411026477928403, 0.18...
## 97     0.044444  [0.29626758731409036, 0.3464595664117418, 0.18...
## 98     0.022222  [0.2939082925063059, 0.3396755405336323, 0.182...
## 99     0.000000  [0.2927721424754243, 0.33375857421410604, 0.18...
## 
## [100 rows x 2 columns]
```
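---

# Using CV to Select the Tuning Parameter

- The fold-averaged errors are tabulated on the next slide; plotting them first can help. A sketch (assumes `matplotlib` is available; `lasso_mod` as fit above):

```python
import matplotlib.pyplot as plt

# average the per-fold MSEs for each alpha; rows of mse_path_ line up
# with alphas_ (sorted in decreasing order)
mean_mse = lasso_mod.mse_path_.mean(axis=1)

plt.plot(lasso_mod.alphas_, mean_mse)
plt.xlabel("alpha")
plt.ylabel("Mean 5-fold CV MSE")
plt.show()
```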
---

# Using CV to Select the Tuning Parameter

- Look at the mean CV error for each `\(\alpha\)`, sorted by `\(\alpha\)`

```python
# pair each alpha with its fold-averaged MSE, then sort by alpha
fit_info = np.array(list(zip(lasso_mod.alphas_,
                             np.mean(lasso_mod.mse_path_, axis=1))))
fit_info[fit_info[:, 0].argsort()]
```

```
## array([[0.        , 0.26832555],
##        [0.02222222, 0.26904464],
##        [0.04444444, 0.27086204],
##        [0.06666667, 0.27377741],
##        [0.08888889, 0.27779157],
##        [0.11111111, 0.28290414],
##        [0.13333333, 0.28911508],
##        [0.15555556, 0.29642441],
##        [0.17777778, 0.30483239],
##        [0.2       , 0.30967893],
##        [0.22222222, 0.31098715],
##        [0.24444444, 0.31159763],
##        [0.26666667, 0.31226415],
##        [0.28888889, 0.31298674],
##        [0.31111111, 0.31376537],
##        [0.33333333, 0.31460007],
##        [0.35555556, 0.31549081],
##        [0.37777778, 0.31643761],
##        [0.4       , 0.31744046],
##        [0.42222222, 0.31849937],
##        [0.44444444, 0.31961433],
##        [0.46666667, 0.32078535],
##        [0.48888889, 0.32201242],
##        [0.51111111, 0.32329554],
##        [0.53333333, 0.32463472],
##        [0.55555556, 0.32602995],
##        [0.57777778, 0.32748123],
##        [0.6       , 0.32898857],
##        [0.62222222, 0.33055197],
##        [0.64444444, 0.33217142],
##        [0.66666667, 0.33384692],
##        [0.68888889, 0.33557847],
##        [0.71111111, 0.33736608],
##        [0.73333333, 0.33920974],
##        [0.75555556, 0.34110946],
##        [0.77777778, 0.34306523],
##        [0.8       , 0.34507706],
##        [0.82222222, 0.34714494],
##        [0.84444444, 0.34926887],
##        [0.86666667, 0.35144886],
##        [0.88888889, 0.3536849 ],
##        [0.91111111, 0.355977  ],
##        [0.93333333, 0.35832515],
##        [0.95555556, 0.36072935],
##        [0.97777778, 0.36318961],
##        [1.        , 0.36570592],
##        [1.02222222, 0.36827828],
##        [1.04444444, 0.3709067 ],
##        [1.06666667, 0.37359118],
##        [1.08888889, 0.3763317 ],
##        [1.11111111, 0.37912829],
##        [1.13333333, 0.38198092],
##        [1.15555556, 0.38488961],
##        [1.17777778, 0.38785435],
##        [1.2       , 0.39087515],
##        [1.22222222, 0.393952  ],
##        [1.24444444, 0.39708491],
##        [1.26666667, 0.40027387],
##        [1.28888889, 0.40351888],
##        [1.31111111, 0.40681995],
##        [1.33333333, 0.41017707],
##        [1.35555556, 0.41359025],
##        [1.37777778, 0.41705948],
##        [1.4       , 0.42058476],
##        [1.42222222, 0.4241661 ],
##        [1.44444444, 0.42780349],
##        [1.46666667, 0.43149694],
##        [1.48888889, 0.43524644],
##        [1.51111111, 0.43905199],
##        [1.53333333, 0.4429136 ],
##        [1.55555556, 0.44683126],
##        [1.57777778, 0.45080497],
##        [1.6       , 0.45483474],
##        [1.62222222, 0.45892057],
##        [1.64444444, 0.46306244],
##        [1.66666667, 0.46726038],
##        [1.68888889, 0.47151436],
##        [1.71111111, 0.4758244 ],
##        [1.73333333, 0.4801905 ],
##        [1.75555556, 0.48461264],
##        [1.77777778, 0.48853102],
##        [1.8       , 0.49159568],
##        [1.82222222, 0.49469748],
##        [1.84444444, 0.4978364 ],
##        [1.86666667, 0.50012924],
##        [1.88888889, 0.50220551],
##        [1.91111111, 0.50430831],
##        [1.93333333, 0.50643762],
##        [1.95555556, 0.50859345],
##        [1.97777778, 0.51077579],
##        [2.        , 0.51225503],
##        [2.02222222, 0.51284361],
##        [2.04444444, 0.51344202],
##        [2.06666667, 0.51405027],
##        [2.08888889, 0.51466834],
##        [2.11111111, 0.51496947],
##        [2.13333333, 0.51496947],
##        [2.15555556, 0.51496947],
##        [2.17777778, 0.51496947],
##        [2.2       , 0.51496947]])
```
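---

# Using CV to Select the Tuning Parameter

- The minimizing `\(\alpha\)` can be located by hand (a sketch, using `lasso_mod` as fit above):

```python
import numpy as np

# the position of the smallest fold-averaged MSE gives the best alpha
best_idx = np.argmin(lasso_mod.mse_path_.mean(axis=1))
print(lasso_mod.alphas_[best_idx])
```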
- Best `\(\alpha\)` is given by the `.alpha_` attribute

```python
lasso_mod.alpha_
```

```
## 0.0
```

---

# Using CV to Select the Tuning Parameter

- Now fit using that optimal `\(\alpha\)`

```python
#alpha of 0 is ordinary least squares; sklearn warns, but we can ignore that here
lasso_best = linear_model.Lasso(lasso_mod.alpha_)
lasso_best.fit(bike_data[["year", "log_km_driven"]].values,
               bike_data["log_selling_price"].values)
```

```python
print(lasso_best.intercept_, lasso_best.coef_)
```

```
## -148.79329107788135 [ 0.0803366  -0.22686129]
```

---

# So Do We Just Need CV? Sometimes!

- If you are only considering one type of model, you can use just a training/test split or k-fold CV to select the best version of that model

- When you have multiple types of models to choose from, usually use both!

    + If we use the test set too much, we may have '**data leakage**'

    + Essentially, we end up training our models to the test set

---

# Training/Validation/Test or CV/Test

- Instead, we sometimes split into a training, validation, and test set

- CV can be used to replace the validation set!

<img src="data:image/png;base64,#img/training_validation_test.png" width="600px" style="display: block; margin: auto;" />

---

# Training/Validation/Test or CV/Test

- Instead, we sometimes split into a training, validation, and test set

- CV can be used to replace the validation set!

<img src="data:image/png;base64,#img/training_validation_test.png" width="600px" style="display: block; margin: auto;" />

- Compare only the **best** model from each model type on the test set

---

# Recap

- LASSO is similar to an MLR model but shrinks coefficients and may set some to 0

    + Tuning parameter must be chosen (usually by CV)

- Training/test split gives us a way to validate our model's performance

- CV can be used on the training set to select **tuning parameters**

    + Helps determine the 'best' model for a class of models

- With many competing model types, determine the best from each type, then check performance on the test set
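---

# Putting It Together

- A minimal sketch of that full workflow: train/test split, CV on the training set to pick `\(\alpha\)`, and one final check on the test set (column names as above; the split proportion is illustrative, and the `\(\alpha\)` grid starts just above 0 to avoid the earlier warnings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

X = bike_data[["year", "log_km_driven"]].values
y = bike_data["log_selling_price"].values

# hold out a test set for the final performance check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# 5-fold CV on the training data only to choose alpha
lasso_cv = LassoCV(cv=5, random_state=0,
                   alphas=np.linspace(0.01, 2.2, 100)).fit(X_train, y_train)

# evaluate the chosen model once on the held-out test set
test_mse = mean_squared_error(y_test, lasso_cv.predict(X_test))
print(lasso_cv.alpha_, test_mse)
```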