Lasso Regression clearly explained

Bhabani Shankear Basak

Data Science Trainer at KnowledgeHut upGrad

Published Oct 25, 2020

+ Follow

Before talking about Lasso Regression, let us do a quick recap of Ridge Regression. In Ridge Regression, we minimize

SSE+λ∗(slope)^2

where SSE = Sum of squares of residuals or errors, λ = severity of the penalty, slope = y-intercept

We know that the basic idea behind Ridge Regression is to get a line that will fit the testing data well at the cost of some increased error in the training data set. That means by bringing in some bias we are getting a huge drop in the variance.

When λ = 0, Ridge Regression line = SSE+λ∗(slope)^2 = SSE = Least-squares line As λ increases, Ridge Regression slope decreases. When λ becomes very large, the Ridge Regression slope will shrink to a great extent but it will never be equal to zero.

Now, coming to Lasso Regression. There is a lot of similarity between Ridge Regression and Lasso Regression. Both help us to overcome the problem of overfitting. But unlike Ridge Regression, where we minimize SSE+λ∗(slope)^2, in Lasso Regression, we minimize SSE+λ∗ abs(slope), where abs(slope) denotes the absolute value of the slope.

In other words, in Ridge Regression we take the square of the slope whereas, in Lasso Regression, we take the absolute value of the slope.

Why taking the absolute value of the slope so important?

When λ = 0, Lasso Regression line = Least-squares line As the value of λ increases, the slope gets smaller. If we take λ to be very large, the slope will exactly become equal to 0 which is never possible in Ridge Regression.

So, if we consider a Sales example, we can write

Sales = β0+slope∗(Advertisement)+β1∗(Price of the Product)+β2∗(Size of the Shop)

and the list can go on.

Now using some common sense we can say that not all the independent variables are important in predicting Sales.

Let us say, here the Size of the Shop is not an important feature in predicting the Sales of the product as all the shops are of the same size in that location.

So, if we run Ridge Regression, as λ increases to a large extent, slope and β1will shrink to a small extent (as the features Advertisement and Price of the Product are important in predicting Sales) whereas β2 will shrink to a large extent (as Size of the Shop is not an important feature in predicting Sales) but β2 will never be equal to zero.

On the other hand, if we use Lasso Regression, then as λ increases to a large extent, slope and β1 will shrink to a small extent but β2 will exactly become equal to 0.

Finally, we can say that Lasso Regression helps us in Dimension Reduction i.e features that are not important in predicting the dependent variable will be removed from the model. Hence Lasso Regression is far more superior to Ridge Regression because it helps us in overcoming overfitting as well as in Dimension Reduction. But when most of the variables in the model are useful, we can use Ridge Regression.

Lasso Regression clearly explained

Bhabani Shankear Basak

Data Science Trainer at KnowledgeHut upGrad

More articles by this author

Insights from the community

Others also viewed

Ridge 🏠 vs. Lasso 🎸 Regression: Choose Your Path 🛤️

a. What is logistic regression and how it is different from linear regression? b. What are the model descriptions for the logistic regression? c. How

Logistic Regression: Evaluating Credit Risk

Ridge & Lasso Regression

Assumptions of Linear Regression

Beyond Basics @ Logistic Regression

About Logistic Regression

Regression: Evaluation Techniques

What distinguishes x and y in Regression?

Consider BLUE regressions.

Explore topics

Understanding Gradient Descent

Mar 23, 2021

Understanding Bagging vs Boosting

Mar 20, 2021

Ridge Regression in layman's terms

Oct 18, 2020

Bias-Variance Tradeoff

Sep 30, 2020

Cross-Validation Clearly Explained

Sep 28, 2020