Bias vs. Variance

Umaiskhan
2 min read · Sep 30, 2023

Day 1 of #66daysofdata challenge.

Imagine we model the relationship between a person's height and weight. The purple dots here represent the training data, and we use a Linear Regression algorithm to fit a straight line through them, as seen above.
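As a minimal sketch of this step, here is a straight-line fit with NumPy. The height/weight numbers are made up for illustration; any similar training data would do:

```python
import numpy as np

# Hypothetical height (cm) / weight (kg) training data -- stand-ins for the purple dots
heights = np.array([150, 155, 160, 165, 170, 175, 180], dtype=float)
weights = np.array([52, 57, 60, 66, 70, 77, 82], dtype=float)

# Fit a straight line (degree-1 polynomial): weight ≈ slope * height + intercept
slope, intercept = np.polyfit(heights, weights, deg=1)

def predict(h):
    """Predicted weight for a given height, using the fitted line."""
    return slope * h + intercept

print(f"weight ≈ {slope:.2f} * height + {intercept:.2f}")
```

For this toy data the fitted line predicts roughly one extra kilogram per extra centimetre of height.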

Now suppose the red line here is the true relationship. Note that the regression line above will not fully capture this true relationship. This inability to capture the true relationship is called bias.

Now suppose we have a more flexible method that fits the squiggly line shown in yellow. Here the bias is very low.

We measure the distance from the fitted line (yellow) to each data point (purple), square these distances, and add them up as a measure of how well the fit performs. Note that the squiggly line fits the training data very well.
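The sum-of-squared-distances measure can be sketched like this. As an assumption, the "squiggly line" is stood in for by a degree-6 polynomial, which is flexible enough to pass through all seven (hypothetical) training points:

```python
import numpy as np

# Same hypothetical training data as before (stand-ins for the purple dots)
heights = np.array([150, 155, 160, 165, 170, 175, 180], dtype=float)
weights = np.array([52, 57, 60, 66, 70, 77, 82], dtype=float)
x = heights - heights.mean()  # center heights to keep the polynomial fit well conditioned

def ssr(coeffs, x, y):
    """Sum of squared residuals: fit-to-point distances, squared and summed."""
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

straight = np.polyfit(x, weights, deg=1)  # the straight line
squiggly = np.polyfit(x, weights, deg=6)  # flexible enough to hit all 7 points

ssr_straight = ssr(straight, x, weights)
ssr_squiggly = ssr(squiggly, x, weights)
print(ssr_straight)  # small but non-zero
print(ssr_squiggly)  # essentially zero: the squiggly line nails the training data
```

On the training data alone, the squiggly line looks like the clear winner; its squared-error total is essentially zero.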

Now the green dots here represent the test dataset. On this data the straight line performs better than the squiggly line. This difference in fit between datasets is called variance.

The squiggly line is overfit: it fits the training data extremely well but performs badly on the test dataset.
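The train-vs-test comparison above can be sketched end to end. Everything here is assumed for illustration: a made-up "true relation" (the red line), noisy samples for the purple (train) and green (test) dots, and a degree-7 polynomial standing in for the squiggly line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true relation" (the red line): a gentle curve, observed with noise
def true_weight(h):
    return 0.008 * (h - 100.0) ** 2 + 45.0

h_train = np.linspace(150, 180, 8)  # purple dots
h_test = np.linspace(151, 179, 8)   # green dots
w_train = true_weight(h_train) + rng.normal(0.0, 1.5, h_train.size)
w_test = true_weight(h_test) + rng.normal(0.0, 1.5, h_test.size)

x0 = h_train.mean()  # center heights for numerical stability

def ssr(coeffs, h, w):
    """Sum of squared residuals of a polynomial fit at points (h, w)."""
    return float(np.sum((np.polyval(coeffs, h - x0) - w) ** 2))

straight = np.polyfit(h_train - x0, w_train, deg=1)  # higher bias, low variance
squiggly = np.polyfit(h_train - x0, w_train, deg=7)  # low bias, high variance

ssr_train_squiggly = ssr(squiggly, h_train, w_train)  # ~0: memorizes the training data
ssr_test_squiggly = ssr(squiggly, h_test, w_test)     # much larger: the fit doesn't transfer
ssr_train_straight = ssr(straight, h_train, w_train)
ssr_test_straight = ssr(straight, h_test, w_test)     # roughly similar to its training error

print("straight:", ssr_train_straight, ssr_test_straight)
print("squiggly:", ssr_train_squiggly, ssr_test_squiggly)
```

The gap between the squiggly line's tiny training error and its large test error is exactly the variance problem described above: a model that changes drastically depending on which dataset it sees.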

So, ideally, we want a model with low bias and low variance.



Written by Umaiskhan

Mechanical Engineer with a twist of AI
