'What is deficient is made complete; what is in excess falls away.' - Sant Dnyaneshwar
Regression: Concept, meaning, definition, lines of regression.
Regression
The term 'regression' literally means 'stepping back towards the average'. It was first used by the British biometrician Sir Francis Galton (1822-1911). He found that although tall parents tend to have tall children, the average height of those children is less than the average height of their parents; similarly, although short parents tend to have short children, the average height of those children is more than the average height of their parents. In other words, the average height of children of tall parents or short parents regresses, or goes back, towards the average height of the population. (This echoes Sant Dnyaneshwar's verse: 'what is deficient is made complete; what is in excess falls away'.) Galton described this phenomenon as 'regression'.
Definition: Regression is the method of estimating the value of one variable when the value of the other is known, the two variables being correlated.
Lines of regression:-
The line of regression is the line which gives the best estimate of the value of one variable for any specified value of the other variable. Thus the line of regression is the line of best fit.
In order to obtain a line of regression, we have to find a line such that the distances, or deviations, of the points from that line are minimum. The deviations can be measured (i) vertically or (ii) horizontally: minimizing the vertical distances gives one line, and minimizing the horizontal distances gives a second line. Thus we get two lines of regression.
If we minimize the distances or deviations of the points from the line measured along the y-axis, we get the line of regression of y on x. Its equation is written in the form y = a + bx. This line is used for estimating the value of y for a given value of x.
If we minimize the distances or deviations of the points from the line measured along the x-axis, we get the line of regression of x on y. Its equation is written in the form x = a' + b'y (the constants a' and b' are in general different from a and b). This line is used for estimating the value of x for a given value of y.
There are two methods of obtaining the lines of regression:
i) the method of scatter diagram, and
ii) the method of least squares.
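As an illustrative sketch of the second method (plain Python; the helper name is mine, and the data are taken from the worked example later in these notes), both lines of regression can be fitted by least squares:

```python
def regression_lines(xs, ys):
    """Return (intercept, slope) for y on x and for x on y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx           # slope of the line of regression of y on x
    a_yx = my - b_yx * mx      # intercept: the line passes through (x-bar, y-bar)
    b_xy = sxy / syy           # slope of the line of regression of x on y
    a_xy = mx - b_xy * my
    return (a_yx, b_yx), (a_xy, b_xy)

xs = [60, 70, 80, 85, 95]
ys = [70, 65, 70, 95, 85]
yx, xy = regression_lines(xs, ys)
```

Note that each fitted line passes through the point of means (x-bar, y-bar), which is why both lines intersect there.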
3. Lines of Regression
In simple linear regression, we can have:
Regression of Y on X: Y = a + bX + e, where
- Y → dependent variable
- X → independent variable
- a → intercept (value of Y when X = 0)
- b → regression coefficient of Y on X (rate of change of Y for a unit change in X)
- e → error term (random deviation)
Regression of X on Y: X = a' + b'Y + e, where
- b' → regression coefficient of X on Y.
⚠ Note: The two regression lines are generally different (unless the correlation is ±1).
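The note above can be checked numerically: the product of the two regression coefficients equals r², the square of the correlation coefficient, so the slopes are reciprocals of each other (and the two lines coincide) only when r = ±1. A minimal sketch in plain Python, using the data from the worked example later in these notes:

```python
import math

def coefficients(xs, ys):
    """Return (b_yx, b_xy, r): the two regression coefficients and r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)       # correlation coefficient
    return sxy / sxx, sxy / syy, r

b_yx, b_xy, r = coefficients([60, 70, 80, 85, 95], [70, 65, 70, 95, 85])
# b_yx * b_xy == r**2, so the two lines coincide only when r = +1 or -1.
```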
Fitting of regression lines by the method of least squares
y = a + bx ----- (*)
According to the principle of least squares, the normal equations for estimating a and b are given by
Σy = na + bΣx --- (1) (taking summation on both sides of (*)), and
Σxy = aΣx + bΣx² --- (2) (multiplying both sides of (*) by x and taking summation).
Dividing (1) by n, we get ȳ = a + bx̄, which shows that the line of regression of y on x passes through the point of means (x̄, ȳ).
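As a sketch, the two normal equations (1) and (2) form a 2×2 linear system that can be solved directly for a and b; here we use plain Python (variable names are mine) with the data from the worked example below:

```python
# Normal equations:  n*a + b*Sx = Sy   and   a*Sx + b*Sxx = Sxy
xs = [60, 70, 80, 85, 95]
ys = [70, 65, 70, 95, 85]
n = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Solve the 2x2 system by Cramer's rule
det = n * Sxx - Sx * Sx
a = (Sy * Sxx - Sx * Sxy) / det
b = (n * Sxy - Sx * Sy) / det

# Check: dividing (1) by n shows the line passes through (x-bar, y-bar)
assert abs(a + b * (Sx / n) - Sy / n) < 1e-9
```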
Concept of Residual
A residual is the difference between the observed value of the dependent variable (y) and the predicted value of the dependent variable (ŷ). Sometimes residuals are also called "errors".
Residual = observed value of y - predicted value of y.
That is, e = y - ŷ.
The data points usually do not fall exactly on the regression line; they are scattered around it. A residual is the vertical distance between a data point and the regression line. Each data point has one residual. Residuals are:
- positive if the point lies above the regression line,
- negative if the point lies below the regression line,
- zero if the regression line passes exactly through the point.
The sum of the residuals, and hence their mean, is equal to zero.
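The zero-sum property of least-squares residuals can be verified numerically; a minimal sketch in plain Python, using the data from the table below:

```python
xs = [60, 70, 80, 85, 95]
ys = [70, 65, 70, 95, 85]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept for the line of regression of y on x
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Residuals: observed y minus predicted y
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# Their sum (and hence their mean) is zero, up to floating-point rounding.
```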

RESIDUAL PLOT
A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.
The table below shows inputs and outputs from a simple linear regression analysis.
x  | y  | ŷ      | e
60 | 70 | 65.411 |  4.589
70 | 65 | 71.849 | -6.849
80 | 70 | 78.288 | -8.288
85 | 95 | 81.507 | 13.493
95 | 85 | 87.945 | -2.945
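The ŷ and e columns above can be reproduced in a few lines of plain Python (a sketch; the variable names are mine):

```python
xs = [60, 70, 80, 85, 95]
ys = [70, 65, 70, 95, 85]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Fit the line of regression of y on x by least squares
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

yhat = [round(a + b * x, 3) for x in xs]          # predicted values
e = [round(y - p, 3) for y, p in zip(ys, yhat)]   # residuals

# yhat → [65.411, 71.849, 78.288, 81.507, 87.945]
# e    → [4.589, -6.849, -8.288, 13.493, -2.945]
```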
A residual plot, obtained by plotting the residual (e) versus the independent variable (x), is as given below.
The residual plot shows a fairly random pattern: the first residual is positive, the next two are negative, the fourth is positive, and the last residual is negative. This random pattern indicates that a linear model provides a decent fit to the data.
Below, the residual plots show three typical patterns. The first plot shows a random pattern, indicating a good fit for a linear model. The other two patterns are non-random (U-shaped and inverted U), suggesting that a nonlinear model would fit the data better.
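The U-shaped pattern can be reproduced by deliberately fitting a straight line to curved data; a small sketch with made-up quadratic data:

```python
# Fit a straight line to clearly nonlinear data and inspect the residuals.
# The data (y = x^2) are made up purely for illustration.
xs = list(range(-5, 6))
ys = [x * x for x in xs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares line of regression of y on x (slope is 0 here by symmetry)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
# Residuals are positive at both ends and negative in the middle:
# the U-shaped pattern that signals a nonlinear model is needed.
```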