Why Optimization Is Critical to Machine Learning, but Often Overlooked

The key role that optimization plays in Machine Learning (ML) is often overlooked. ML models use optimization to determine how their intermediate predictions can best be improved. Understanding this concept leads to better modeling, because often the objective being optimized must be tailored to the problem.

Video Transcription

What do you call a sports car with the engine of a golf cart? Well, you probably don’t call it a sports car. You can’t really cut corners when it comes to car engines. The engine is the heart and the soul of the car.

What about when it comes to machine learning? What is the engine of machine learning? Is it data? Maybe it’s algorithms or the people, the analysts? We would argue that optimization is the engine of machine learning.

An Illustration of Optimization

So to illustrate, let’s look at an example. Let’s say we’re trying to forecast how much new houses are gonna sell for, and we’ve got a data set of houses, how much they sold for, how large they are and other attributes, and we wanna create a machine learning model that can learn from this data set.

In that data set, we have two particular houses. We’ve got house number one, which sold for 400,000 a couple of years ago, and house number two, which sold for 5 million around the same time. As the model is learning, when it’s halfway through learning from the data, its forecasts are 1.4 million for house number one and 6 million for house number two.

So in both cases, the forecast is off by a million, but for which of these two houses would you say the model is doing a better job of forecasting? You’d probably say that house two has the better forecast, because the error is only 20% of the actual price. So we could say here, the error for house two is 1 million, which is 20%, which is not that bad. House one, though, is off by the same amount but by a much larger percentage, 250%, because the actual price is much lower.
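To make the arithmetic concrete, here is a small Python sketch that computes both kinds of error for the two houses. The prices and forecasts come from the example above; the function and variable names are made up for illustration.

```python
# Actual sale prices and the model's halfway-through forecasts from the example.
actual = {"house 1": 400_000, "house 2": 5_000_000}
forecast = {"house 1": 1_400_000, "house 2": 6_000_000}


def absolute_error(actual_price, predicted_price):
    """How far off the forecast is, in dollars."""
    return abs(predicted_price - actual_price)


def percentage_error(actual_price, predicted_price):
    """How far off the forecast is, as a percentage of the actual price."""
    return 100 * abs(predicted_price - actual_price) / actual_price


# Map each house to (absolute error, percentage error).
errors = {
    name: (
        absolute_error(actual[name], forecast[name]),
        percentage_error(actual[name], forecast[name]),
    )
    for name in actual
}

for name, (abs_err, pct_err) in errors.items():
    print(f"{name}: off by {abs_err:,} ({pct_err:.0f}% of the actual price)")
```

Both houses show the same absolute error, but the percentage error for house one is far larger.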

A Surprising Result

So it may surprise you to learn that most machine learning models out of the box will treat these errors equally. They treat them equally because the amount that each forecast is off by is the same. What this practically means is that as the model continues to learn, it will prioritize learning from the mistakes it made on both houses equally, which is not ideal, because you’d like it to prioritize house number one, where the error is much larger on a percentage basis than for house number two.
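As a rough sketch of what "treating the errors equally" can look like, the Python below scores the two houses under two different penalties. It assumes the out-of-the-box penalty is squared error, which is a common default but not something the example specifies, and it uses a squared relative error as one possible alternative. Both function names are hypothetical.

```python
# (name, actual price, halfway-through forecast) from the example.
houses = [
    ("house 1", 400_000, 1_400_000),
    ("house 2", 5_000_000, 6_000_000),
]


def squared_error(actual, predicted):
    """Assumed default penalty: depends only on the dollar amount of the miss."""
    return (predicted - actual) ** 2


def squared_relative_error(actual, predicted):
    """One possible tailored penalty: scales the miss by the actual price first."""
    return ((predicted - actual) / actual) ** 2


default_penalties = [squared_error(a, p) for _, a, p in houses]
relative_penalties = [squared_relative_error(a, p) for _, a, p in houses]

for (name, _, _), default, relative in zip(houses, default_penalties, relative_penalties):
    print(f"{name}: squared error = {default:.3g}, squared relative error = {relative:.2f}")
```

Under the squared error, both houses contribute the same penalty, so neither gets priority. Under the relative version, house one's penalty is much larger, so a model minimizing it would focus on house one first.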

So what’s going on here? When you create a machine learning model, it has to have a way of knowing how good its predictions are and how those predictions can be improved at every step of the process. Normally, the thing that measures this is called an objective function. You can picture the objective function looking something like this curve up here, where the lowest point on the curve is the best possible model and every other point on the curve represents one of the other possible models that could exist for this problem. Optimization is the method that is used to find this ideal model as quickly as possible. So you can see how it operates as the engine of machine learning, because it’s the primary thing running when you’re creating a model from data.

How Does Optimization Work?

So how does this work though? How do we use optimization to find the optimal model?

Objective

Well, one thing we could do is we could just start measuring every possible model we could think of. Every time we test a different model, we find out how well that model does on this data. So we test, let’s say, up here first, and it does really poorly. Remember, the best model has the lowest value. Then we might test here, and then over here. But eventually, we’d have to test every possible model we could think of, all the way from the left to the right side here, to find the model that’s closest to this optimal minimal point. The problem is that this would require testing so many models that it would take an impractically long time. So this is where optimization comes in to make this process much more efficient.
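The try-everything approach can be sketched in a few lines of Python. The objective function here is a made-up one-parameter curve with its lowest point at w = 3, standing in for the curve in the video; the search range and number of candidates are arbitrary choices for illustration.

```python
def objective(w):
    """Hypothetical objective: lower is better, lowest point at w = 3."""
    return (w - 3.0) ** 2 + 1.0


def brute_force_search(low, high, num_candidates):
    """Measure evenly spaced candidate models and keep the best one seen."""
    step = (high - low) / (num_candidates - 1)
    best_w, best_value = None, float("inf")
    for i in range(num_candidates):
        w = low + i * step
        value = objective(w)
        if value < best_value:
            best_w, best_value = w, value
    return best_w, best_value


num_candidates = 2001
best_w, best_value = brute_force_search(-10.0, 10.0, num_candidates)
print(f"best w = {best_w:.2f}, objective = {best_value:.4f}, "
      f"measurements = {num_candidates}")
```

This takes 2,001 measurements for a model with a single parameter. With the same grid, a model with d parameters would need 2,001 to the power d measurements, which is why this approach stops being practical very quickly.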

Slope

So optimization allows you to not only measure how good a model is but also how it can best be improved. The first class of methods does this using the slope. Instead of just finding out how good a model is, we also measure how it can be improved. What’s the direction towards the minimum, given that point? What’s the slope of the objective function at that point? That tells you what direction to go in.

It doesn’t tell you how far to go in that direction. So normally, you take a relatively small step size and you repeat the process. And eventually, you get down to this minimum here where the slope is flat and that’s how you know you’ve reached the minimum. So that takes far, far fewer measurements than trying every possible model that could exist.
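Here is a minimal Python sketch of that slope-following loop, usually called gradient descent. It assumes a made-up one-parameter objective with its minimum at w = 3; the starting point, step size, and stopping tolerance are illustrative choices, not recommended settings.

```python
def objective(w):
    """Hypothetical objective: lower is better, lowest point at w = 3."""
    return (w - 3.0) ** 2 + 1.0


def slope(w):
    """Derivative of the hypothetical objective at w."""
    return 2.0 * (w - 3.0)


def gradient_descent(start, step_size=0.1, tolerance=1e-6, max_steps=10_000):
    """Step against the slope until the slope is (nearly) flat."""
    w = start
    for steps_taken in range(max_steps):
        g = slope(w)
        if abs(g) < tolerance:
            # Flat slope: we have reached the minimum.
            return w, steps_taken
        # The slope gives the direction; step_size decides how far to go.
        w = w - step_size * g
    return w, max_steps


w_found, steps_taken = gradient_descent(start=-10.0)
print(f"found w = {w_found:.6f} after {steps_taken} steps")
```

A positive slope means the curve rises to the right, so the update moves left, and vice versa. The loop stops once the slope is close enough to zero.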

Curve

Now there’s another class of methods that uses not only the slope but also the curvature at a given point along the objective function. So at this particular point, we would create a kind of guess curve that matches both the slope and the curvature of the objective function at that point. It might look something like this. What this does is allow you to know how far to step in the right direction, in addition to which direction to step in. So our next guess would be the minimum of this guess curve, which would come down to here, and we’d repeat the same process again. We’d normally get to the bottom in far fewer steps than using just the slope alone.
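The guess-curve idea corresponds to what is commonly known as Newton's method. The Python sketch below compares it with the slope-only approach on a made-up objective, w squared plus e to the minus w, chosen only because it is not a simple parabola. The starting point, step size, and tolerance are illustrative assumptions.

```python
import math


def slope(w):
    """First derivative of the hypothetical objective w**2 + exp(-w)."""
    return 2.0 * w - math.exp(-w)


def curvature(w):
    """Second derivative of the hypothetical objective; always positive."""
    return 2.0 + math.exp(-w)


def newton(start, tolerance=1e-8, max_steps=100):
    """Jump to the minimum of the local guess curve at each step."""
    w = start
    for steps_taken in range(max_steps):
        if abs(slope(w)) < tolerance:
            return w, steps_taken
        # The guess curve (a parabola matching slope and curvature at w)
        # has its minimum exactly slope / curvature away from w.
        w = w - slope(w) / curvature(w)
    return w, max_steps


def slope_only(start, step_size=0.1, tolerance=1e-8, max_steps=10_000):
    """Fixed-size steps against the slope, for comparison."""
    w = start
    for steps_taken in range(max_steps):
        if abs(slope(w)) < tolerance:
            return w, steps_taken
        w = w - step_size * slope(w)
    return w, max_steps


w_newton, newton_steps = newton(start=-2.0)
w_slope, slope_steps = slope_only(start=-2.0)
print(f"guess-curve method: w = {w_newton:.6f} in {newton_steps} steps")
print(f"slope-only method:  w = {w_slope:.6f} in {slope_steps} steps")
```

Both runs end at the same point, but the guess-curve method needs far fewer steps. Each of its steps does more work, though, because it measures the curvature as well as the slope.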

The problem is it takes more time to measure both the slope and the curvature. So there’s a trade-off between these two methods, but usually, one of these two methods is what you see used in almost any machine learning algorithm. It’s a very complicated process but it can be simplified to these two relatively simple concepts.

So hopefully, that explains why optimization is so important to machine learning. Thanks for watching. Have a great rest of your day.