Unit 1: Multiple Regression, Multiple Correlation and Partial Correlation
1.1 Multiple Linear Regression (for trivariate data)
Some statistical methods serve as forecasting or estimation techniques; regression analysis is one such method. We have already learnt the concepts of linear regression and correlation for two variables, known as simple regression and simple correlation.
In practice, the variable under study is often influenced by two or more variables. For example, national income depends on several variables such as agricultural yield, industrial production, imports, exports, production of minerals, marine wealth, etc.
For accurate prediction it is essential to include all interrelated variables in the regression model. Among these, the variable whose value is to be predicted is called the dependent (or response) variable, and the remaining variables are treated as independent (or explanatory) variables. Regression of a dependent variable on two or more independent variables is referred to as multiple regression.
Some practical situations involving trivariate data:
1. The number of units sold (X1) depends on the number of advertisements (X2) and the price of the product (X3).
2. The blood pressure of a person (X1) depends on his weight (X2) and age (X3).
3. The monthly rent of a flat (X1) is likely to depend on the area of the flat (X2) and its distance from the city centre (X3).
4. Marks secured by students (X1) may be related to their I.Q.s (X2) and the number of hours of study (X3).
5. A company manufactures and sells two products. The profit of the company (X1) depends on the number of units sold of the first product (X2) and the number of units sold of the second product (X3).
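As a minimal sketch of situation 1, the Python snippet below fits a regression plane of units sold (X1) on the number of advertisements (X2) and the price (X3) by ordinary least squares; the data values are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical trivariate data for situation 1 (illustration only):
# X1 = units sold, X2 = number of advertisements, X3 = price of the product.
X1 = np.array([120., 150., 170., 160., 200., 210.])
X2 = np.array([4., 6., 7., 6., 9., 10.])
X3 = np.array([55., 52., 50., 51., 47., 45.])

# Fit the regression plane X1 = a + b12.3*X2 + b13.2*X3 by least squares.
A = np.column_stack([np.ones_like(X2), X2, X3])
(a, b12_3, b13_2), *_ = np.linalg.lstsq(A, X1, rcond=None)
print(f"a = {a:.3f}, b12.3 = {b12_3:.3f}, b13.2 = {b13_2:.3f}")
```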
When the values of one variable are associated with or influenced by another variable, e.g. the ages of husband and wife, the heights of father and son, the supply of and demand for a commodity and so on, Karl Pearson's coefficient of correlation can be used as a measure of the linear relationship between them.
But sometimes there is interrelation between many variables, and the value of one variable may be influenced by many others, e.g. the yield of a crop per acre (X1) depends upon the quality of seed (X2), fertility of soil (X3), fertilizer used (X4), irrigation facilities (X5), weather conditions (X6) and so on. Whenever we are interested in studying the joint effect of a group of variables upon a variable not included in that group, our study is that of multiple correlation and multiple regression.
Yule’s Notation
Let us consider a distribution involving three random variables X1, X2 and X3, each measured from its respective mean. Then the equation of the plane of regression of X1 on X2 and X3 is,
X1 = b12.3 X2 + b13.2 X3,
where b12.3 and b13.2 are the partial regression coefficients of X1 on X2 (keeping X3 constant) and of X1 on X3 (keeping X2 constant), respectively.
The study of multiple regression, multiple correlation, etc. becomes convenient using the matrix of correlation coefficients. This matrix is denoted by R and is given by,
R = [ 1    r12   r13 ]
    [ r21   1    r23 ]
    [ r31  r32    1  ]
where rij is the simple correlation coefficient between Xi and Xj, so that rij = rji.
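As a sketch of how R can be used in computation (assuming X1, X2, X3 are any three numeric arrays of equal length, e.g. the hypothetical data above), the snippet below builds the 3x3 correlation matrix with numpy and recovers the partial regression coefficients from the standard trivariate formulas b12.3 = [(r12 - r13 r23)/(1 - r23^2)](s1/s2) and b13.2 = [(r13 - r12 r23)/(1 - r23^2)](s1/s3); the helper names are assumptions made here, not fixed notation.

```python
import numpy as np

def correlation_matrix(X1, X2, X3):
    # np.corrcoef on the three rows returns
    # R = [[1, r12, r13], [r21, 1, r23], [r31, r32, 1]].
    return np.corrcoef([X1, X2, X3])

def partial_regression_coefficients(X1, X2, X3):
    R = correlation_matrix(X1, X2, X3)
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    s1, s2, s3 = np.std(X1), np.std(X2), np.std(X3)
    # Standard trivariate formulas expressed through the entries of R.
    b12_3 = (r12 - r13 * r23) / (1 - r23**2) * s1 / s2
    b13_2 = (r13 - r12 * r23) / (1 - r23**2) * s1 / s3
    return b12_3, b13_2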
Remarks:
1) The subscripts before the dot (.) are known as primary subscripts and those after the dot are known as secondary subscripts.
2) The order of a partial regression coefficient is determined by the number of secondary subscripts, e.g. b12.3 is a regression coefficient of order one, while b12.345 is a regression coefficient of order three.
3) The order in which the secondary subscripts are written is immaterial, but the order of the primary subscripts is important, e.g. in b12.3, X1 is the dependent variable and X2 is the independent variable, whereas in b21.3, X2 is the dependent variable and X1 is the independent variable. Thus, of the two primary subscripts, the former refers to the dependent variable and the latter to the independent variable (see the numerical sketch after these remarks).
4) The order of a residual is also determined by the number of secondary subscripts in it, e.g. X1.23 is a residual of order two, while X1.234 is a residual of order three.
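To make remark 3 concrete, the sketch below (with simulated, purely hypothetical data) fits the plane of X1 on (X2, X3) and the plane of X2 on (X1, X3) and prints b12.3 and b21.3, which in general are not equal.

```python
import numpy as np

# Simulated, purely hypothetical data.
rng = np.random.default_rng(0)
X2 = rng.normal(size=50)
X3 = rng.normal(size=50)
X1 = 2.0 * X2 - 1.5 * X3 + rng.normal(scale=0.5, size=50)

def plane_coefficients(y, u, v):
    """Least-squares coefficients (intercept, slope of u, slope of v) of y on u and v."""
    A = np.column_stack([np.ones_like(u), u, v])
    return np.linalg.lstsq(A, y, rcond=None)[0]

b12_3 = plane_coefficients(X1, X2, X3)[1]  # coefficient of X2 when X1 is dependent
b21_3 = plane_coefficients(X2, X1, X3)[1]  # coefficient of X1 when X2 is dependent
print(b12_3, b21_3)  # the two values differ in general
```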
1.2 Multiple Correlation and Partial Correlation (for trivariate data only)
Multiple Correlation:
The degree of goodness of fit of a multiple regression plane is measured by the multiple correlation coefficient.
Definition:
The multiple correlation between X1 and (X2, X3) is the maximum correlation between X1 and a linear combination of X2 and X3. The multiple correlation coefficient of X1 on X2 and X3, usually denoted by R1.23, is the simple correlation coefficient between X1 and the joint effect of X2 and X3 on X1. In other words, R1.23 is the correlation coefficient between X1 and its estimated value as given by the plane of regression of X1 on X2 and X3 (i.e. e1.23 = b12.3 X2 + b13.2 X3).
Derivation:
The equation of the plane of regression of X1 on X2 and X3 is given by,
X1 = b12.3 X2 + b13.2 X3.
The estimate of X1 for known X2 and X3 is given by,
X̂1 = e1.23 = b12.3 X2 + b13.2 X3,
and the residual of X1 on X2 and X3 is given by,
X1.23 = X1 - X̂1 = X1 - e1.23, which gives
e1.23 = X1 - X1.23.
Since the variables X1, X2 and X3 have been measured from their respective means, we have
E(X1) = E(X2) = E(X3) = 0 and E(X1.23) = 0 = E(e1.23).
By definition, the multiple correlation coefficient of X1 on X2 and X3 is given by,
R1.23 = Corr(X1, e1.23) = Cov(X1, e1.23) / √[V(X1) V(e1.23)].
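As a numerical check of this definition (a sketch assuming the hypothetical arrays X1, X2, X3 from the earlier snippets), R1.23 computed from the well-known closed-form trivariate result R1.23^2 = (r12^2 + r13^2 - 2 r12 r13 r23)/(1 - r23^2) agrees with the correlation between X1 and its fitted value e1.23 from the least-squares plane.

```python
import numpy as np

def multiple_correlation_from_R(X1, X2, X3):
    # Closed-form trivariate result in terms of the simple correlations.
    R = np.corrcoef([X1, X2, X3])
    r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]
    R1_23_sq = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return np.sqrt(R1_23_sq)

def multiple_correlation_via_fit(X1, X2, X3):
    # Definition used above: correlation between X1 and its estimate
    # from the least-squares plane of regression of X1 on X2 and X3.
    A = np.column_stack([np.ones_like(X2, dtype=float), X2, X3])
    fitted = A @ np.linalg.lstsq(A, np.asarray(X1, dtype=float), rcond=None)[0]
    return np.corrcoef(X1, fitted)[0, 1]

# The two computations agree up to floating-point error.
```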