Yuchen Liu

Least Squares Regression with NYC Skyscrapers

Updated: Sep 8, 2020


How's your Labor Day weekend so far? I had a great time with my family on Staten Island (my fav place in NY). An interesting idea came to my mind when I saw the tall buildings of New York from the top of the island: what would the skyline look like if the height of the skyscrapers changed steadily with the street number? The answer would be - a line! This idea reminded me of a classic technique called Least Squares Regression, which is very useful for solving linear regression problems.



Why not take the heights of the buildings in New York as an example today (not the actual heights)? We put the street number on the horizontal axis and the average height of the buildings on the vertical axis, and we want to predict the average height of the buildings at a given street number. The visualization gives us the intuition that this is a linear regression problem.


😜 Yuchen's tip: I like to define the goal each time I solve a problem, because it leads you to the clearest and shortest path.


The final goal of this regression problem is to find the model that minimizes the errors, so let's start from there to see the technique behind Least Squares Regression.


First, we write this model as y = mx + b, where we need to figure out the slope m and the intercept b. A residual is the difference between the actual value and the predicted value. In the given example, r1 = -300 and r2 = 100 are two residuals, marked in red.
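In symbols, the residual of the i-th data point is the actual value minus the value our line predicts:

$$ r_i = y_i - \hat{y}_i = y_i - (m x_i + b) $$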



The most typical approach in statistics is to find the m and b that minimize the sum of the squares of all residuals, which keeps the negative and positive residuals from canceling each other out.

Let's make it more general. Suppose we have a bunch of data points (x1, y1), (x2, y2), ..., (xn, yn), and the equation of our regression line is y = mx + b. Each point contributes an error: its residual, as defined above.

The goal is to find the m and b that minimize the sum of the squared errors, which I'll call SE_line.
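Written out, that sum of squared errors is:

$$ SE_{line} = \sum_{i=1}^{n} \big(y_i - (m x_i + b)\big)^2 $$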


Next, we do some algebraic manipulation to expand and regroup this sum:

This is the most simplified equation I get:
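Expanding the square and collecting terms (using a bar to denote a mean, e.g. $\bar{x}$ for the mean of the x values), the sum can be written as:

$$ SE_{line} = n\left(\overline{y^2} - 2m\,\overline{xy} - 2b\,\bar{y} + m^2\,\overline{x^2} + 2mb\,\bar{x} + b^2\right) $$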

Going back to our goal, which is to find the m and b that minimize SE_line, we view all other quantities as constants except for m and b. If we visualize SE_line as a 3-D surface over m and b, the idea is to find the minimum point of that surface. To make this happen, we set the partial derivative of SE_line with respect to m and the partial derivative of SE_line with respect to b equal to zero. You may find the definition of partial derivative here.
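Setting both partial derivatives to zero gives these two conditions:

$$ \frac{\partial SE_{line}}{\partial m} = -2\sum_{i=1}^{n} x_i\big(y_i - (m x_i + b)\big) = 0 $$

$$ \frac{\partial SE_{line}}{\partial b} = -2\sum_{i=1}^{n} \big(y_i - (m x_i + b)\big) = 0 $$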

Then we do some more math and arrive at these results (in the red box). The bottom equation tells us that the point (mean of x, mean of y) lies on the line!

The exciting moment is finally here - the best m and b that make the optimal line!
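Solving those two conditions gives the familiar closed-form solution (equivalent to the boxed results above):

$$ m = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}, \qquad b = \bar{y} - m\,\bar{x} $$

The second equation is exactly the statement that the point $(\bar{x}, \bar{y})$ lies on the line.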

Let's go back to our 'New York skyscrapers' example, and this time assume the average height of the buildings decreases with the street number. If we know the coordinates of each data point (street, avg height), then we can compute the regression line quickly. I may not give you the most realistic values in this example, but you should get a better understanding of how we use Least Squares Regression to find the best-fit line.
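Here's a minimal Python sketch of the whole procedure. The four (street, avg height) pairs are made-up numbers I picked so that the formulas above reproduce the line we'll use below; nothing here is a real NYC measurement, and only numpy is assumed.

```python
import numpy as np

# Hypothetical (street number, average building height in feet) data points.
# Illustrative values only, not real NYC measurements.
x = np.array([10, 20, 30, 40], dtype=float)         # street numbers
y = np.array([9300, 6250, 3400, 750], dtype=float)  # average heights (feet)

# Closed-form least squares solution derived above:
#   m = (mean(xy) - mean(x)mean(y)) / (mean(x^2) - mean(x)^2)
#   b = mean(y) - m * mean(x)
m = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)
b = y.mean() - m * x.mean()
print(f"regression line: y = {m:.0f}x + {b:.0f}")  # y = -285x + 12050

# Sanity check against numpy's built-in least squares fit.
m_np, b_np = np.polyfit(x, y, deg=1)
assert np.isclose(m, m_np) and np.isclose(b, b_np)

# Predict the average height at the 40th street.
print(f"predicted height at street 40: {m * 40 + b:.0f} feet")  # 650 feet
```

Running it prints y = -285x + 12050, and the np.polyfit check confirms that the closed-form formulas match numpy's built-in fit.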


⚠️ Alert: Terrible paintings ahead!

With the regression line y = -285x + 12050, we can easily find the avg height of the buildings at the 40th street: -285 × 40 + 12050 = 650 feet.

Understanding the mathematical methods behind each algorithm matters, because it can guide you to more sophisticated models for your predictions. Least Squares Regression is just the start of the journey into predictive statistics. I hope this post gives you an intuition for the interesting techniques behind the algorithms - and it encourages me to advance bravely, too!


😜 Happy machine learning!
