Linear Regression: the power of simplicity

Agnis Liukis
Jan 6, 2022

Possibly the most undervalued ML algorithm.

Photo by GR Stocks on Unsplash

What comes to mind when you think of bleeding-edge machine learning algorithms? Deep networks, transformers? Maybe gradient-boosted tree algorithms? Whatever it is, I’m quite sure that Linear Regression won’t be on the list.

And that shouldn’t be a surprise: Linear Regression isn’t something we usually call a bleeding-edge algorithm. It isn’t the most powerful ML technique out there; it is one of the simplest. Still, it is very powerful and useful, much more useful than many ML practitioners think. This article is about exactly that: the hidden power of Linear Regression.

To uncover this power, we’ll start with a simple example of an imagined situation.

The situation

Let’s imagine a small town with three small stores selling milk. The prices can differ a bit, but they are mostly quite similar, because the town is small and the store owners are well informed about how much milk costs in the rival stores. If one store charged a much higher price, nobody would shop there.

Let’s name these stores “Store A”, “Store B” and “Store C” for simplicity.

Photo by Mike Petrucci on Unsplash

Now, let’s say we want to predict tomorrow’s price of milk in “Store A”. As features, we can use the current milk prices from all three stores. In a real situation like this, we would also want to use various historical price features, such as yesterday’s prices, average prices for the previous weeks, and so on. But for the sake of simplicity, this example uses only the three features mentioned: the current milk prices in all the stores.
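As a rough sketch of this setup, the snippet below turns a small table of daily prices into (features, target) pairs: the features are today’s prices in the three stores, and the target is the next day’s price in “Store A”. The prices are made-up numbers and the `make_dataset` helper is a hypothetical name used only for illustration.

```python
# Hypothetical daily milk prices (made-up numbers), one tuple per day:
# (price in Store A, price in Store B, price in Store C)
daily_prices = [
    (1.00, 1.02, 0.98),
    (1.01, 1.03, 0.99),
    (1.03, 1.02, 1.00),
    (1.02, 1.04, 1.01),
]


def make_dataset(prices):
    """Pair each day's three prices with the next day's price in Store A."""
    features = []
    targets = []
    # The last day has no "tomorrow", so it cannot be used as a training row.
    for today, tomorrow in zip(prices, prices[1:]):
        features.append(list(today))
        targets.append(tomorrow[0])  # index 0 is Store A
    return features, targets


X, y = make_dataset(daily_prices)
for row, target in zip(X, y):
    print(row, "->", target)
```

Four days of prices give three usable rows, because each row needs a following day to supply its target.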

First, let’s think about how we would approach this problem without the help of machine learning. The most straightforward way to predict tomorrow’s milk price in “Store A” could be some kind of weighted average of today’s prices. For example:

[Price in A] * 0.5 + [Price in B] * 0.3 +…
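The weighted average above can be sketched in a few lines of Python. The formula in the text is cut off after the second term, so the third weight used here (0.2, picked so that the weights sum to 1) is purely an assumption, as are the example prices and the `weighted_price` helper name.

```python
def weighted_price(prices, weights):
    """Return the weighted sum of today's prices as tomorrow's estimate."""
    if len(prices) != len(weights):
        raise ValueError("prices and weights must have the same length")
    return sum(p * w for p, w in zip(prices, weights))


# Weights for Store A and Store B come from the formula in the text;
# the weight for Store C is an ASSUMPTION (picked so the weights sum to 1).
weights = [0.5, 0.3, 0.2]

# Made-up prices for today in Store A, Store B and Store C.
today = [1.00, 1.10, 0.90]

estimate = weighted_price(today, weights)
print(f"Estimated price in Store A tomorrow: {estimate:.2f}")
```

Changing the weights changes how much each store’s price influences the estimate.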
