Unit 1: Multiple Regression, Multiple Correlation and Partial Correlation. 1.1: Multiple Linear Regression (for trivariate data)




1.1   Multiple Linear Regression (for trivariate data)

Some statistical methods serve as forecasting or estimation techniques. One such method is regression analysis. We have already studied linear regression and correlation for two variables, which are called simple regression and simple correlation.

In practice, the variable under study is often influenced by two or more variables. For example, national income depends on several variables such as agricultural yield, industrial production, imports, exports, production of minerals, marine wealth, etc.

For accurate prediction it is necessary to include all the interrelated variables in the regression model. Among these variables, the one whose value is to be predicted is called the dependent variable or response variable, and the remaining variables are treated as independent variables or explanatory variables. Regression based on one dependent variable and two or more independent variables is referred to as multiple regression.

Some practical situations involving trivariate data:

1. The number of units sold (X1) depends on the number of times an advertisement appears (X2) and the price of the product (X3).

2. The blood pressure of a person (X1) depends on the person's weight (X2) and age (X3).

3. The monthly rent of a flat (X1) is likely to depend on the area of the flat (X2) and its distance from a central place (X3).

4. The marks secured by students (X1) may be related to their I.Q.s (X2) and the number of hours of study (X3).

5. A company manufactures and sells two products. The profit of the company (X1) depends on the number of units sold of the first product (X2) and the number of units sold of the second product (X3).

When the values of one variable are associated with or influenced by another variable, e.g. the ages of husband and wife, the heights of father and son, the supply and demand of a commodity and so on, Karl Pearson's coefficient of correlation can be used as a measure of the linear relationship between them.

But sometimes many variables are interrelated, and the value of one variable may be influenced by many others, e.g. the yield of a crop per acre (X1) depends upon the quality of seed (X2), fertility of soil (X3), fertilizer used (X4), irrigation facilities (X5), weather conditions (X6) and so on. Whenever we are interested in studying the joint effect of a group of variables upon a variable not included in that group, our study is that of multiple correlation and multiple regression.


Yule’s Notation


Let us consider a distribution involving three random variables X1, X2 and X3. Then the equation of the regression plane of X1 on X2 and X3 is,

 X1 = a + b12.3 X2 + b13.2 X3              -------    (1)

There are three pairs of variables, viz. (X1, X2), (X2, X3) and (X1, X3). The correlation coefficients between these pairs of variables are r12, r23 and r13 respectively. Although there are three variables in the data, each of these correlation coefficients is calculated from two variables at a time. Hence r12, r23 and r13 are termed total correlation coefficients.

The study of multiple regression, multiple correlation, etc. becomes convenient when the matrix of correlation coefficients is used. The matrix of correlation coefficients is denoted by R and is given by,

         | 1     r12   r13 |
     R = | r12   1     r23 |
         | r13   r23   1   |
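As an illustration, the matrix R can be computed from raw observations. The following is a minimal Python sketch using only the standard library; the data values and the function names pearson_r and correlation_matrix are hypothetical and are not part of the original notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(x1, x2, x3):
    """Return the 3 x 3 matrix R of total correlation coefficients."""
    columns = [x1, x2, x3]
    return [[pearson_r(u, v) for v in columns] for u in columns]


# Hypothetical trivariate data (six observations on X1, X2, X3).
x1 = [12, 15, 11, 18, 20, 16]
x2 = [3, 4, 2, 5, 6, 5]
x3 = [9, 7, 10, 8, 6, 9]

R = correlation_matrix(x1, x2, x3)
for row in R:
    print("  ".join(f"{r: .4f}" for r in row))
```

The diagonal entries are 1 because each variable is perfectly correlated with itself, and the matrix is symmetric because r12 = r21, r13 = r31 and r23 = r32.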



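Suppose the variables X1, X2 and X3 are measured from their respective means, so that E(X1) = E(X2) = E(X3) = 0. Then, on taking expectations of both sides of equation (1), we get a = 0. Thus the equation of the regression plane of X1 on X2 and X3 becomes,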

 X1 = b12.3 X2 + b13.2 X3              -------    (2)

The coefficients b12.3 and b13.2 are known as the partial regression coefficients of X1 on X2 and X1 on X3 respectively.
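The partial regression coefficients can be computed from data as sketched below in Python. This sketch assumes the plane is fitted by the principle of least squares, which is the usual approach but is not shown in the text above. The function names and the data are hypothetical.

```python
def deviations(x):
    """Measure the values of a variable from their mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def sum_products(u, v):
    """Sum of products of corresponding values of two lists."""
    return sum(a * b for a, b in zip(u, v))


def partial_regression_coefficients(x1, x2, x3):
    """Return (b12.3, b13.2) for the plane X1 = b12.3 X2 + b13.2 X3.

    Assumption: the coefficients are the least-squares solution of the
    two normal equations in the deviations from the means:
        S12 = b12.3 * S22 + b13.2 * S23
        S13 = b12.3 * S23 + b13.2 * S33
    """
    d1, d2, d3 = deviations(x1), deviations(x2), deviations(x3)
    s12, s13, s23 = sum_products(d1, d2), sum_products(d1, d3), sum_products(d2, d3)
    s22, s33 = sum_products(d2, d2), sum_products(d3, d3)
    det = s22 * s33 - s23 ** 2
    if det == 0:
        raise ValueError("X2 and X3 are perfectly correlated; the plane is not unique")
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    return b12_3, b13_2


# Hypothetical data: units sold (X1), number of advertisements (X2), price (X3).
x1 = [12, 15, 11, 18, 20, 16]
x2 = [3, 4, 2, 5, 6, 5]
x3 = [9, 7, 10, 8, 6, 9]

b12_3, b13_2 = partial_regression_coefficients(x1, x2, x3)
print("b12.3 =", round(b12_3, 4))
print("b13.2 =", round(b13_2, 4))
```

Here b12.3 is the change in X1 per unit change in X2 when X3 is held fixed, and b13.2 is the change in X1 per unit change in X3 when X2 is held fixed.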

Remarks:

1)   The subscripts before the dot (.) are known as primary subscripts and those after the dot are known as secondary subscripts.

2)   The order of a partial regression coefficient is the number of its secondary subscripts, e.g. b12.3 is a regression coefficient of order one, while b12.345 is a regression coefficient of order three.

3)   The order in which the secondary subscripts are written is immaterial, but the order of the primary subscripts is important, e.g. in b12.3, X1 is the dependent variable and X2 is the independent variable, but in b21.3, X2 is the dependent variable and X1 is the independent variable. Thus, of the two primary subscripts, the former refers to the dependent variable and the latter refers to the independent variable.

4)   The order of a residual is also determined by the number of its secondary subscripts, e.g. X1.23 is a residual of order two, while X1.234 is a residual of order three.
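The subscript rules in the remarks above can be illustrated with a small Python sketch. The function parse_yule is hypothetical and assumes that every variable index is a single digit.

```python
def parse_yule(symbol):
    """Split a symbol in Yule's notation, such as 'b12.345', into its parts.

    Assumption: each variable index is a single digit, so '12' means
    the two subscripts 1 and 2.
    """
    letter = symbol[0]
    primary, _, secondary = symbol[1:].partition(".")
    return {
        "letter": letter,
        "primary": [int(c) for c in primary],
        # Sorted, because the order of secondary subscripts is immaterial.
        "secondary": sorted(int(c) for c in secondary),
        # The order is the number of secondary subscripts.
        "order": len(secondary),
    }


for s in ["b12.3", "b21.3", "b12.345", "X1.23", "X1.234"]:
    parts = parse_yule(s)
    print(s, "primary:", parts["primary"],
          "secondary:", parts["secondary"], "order:", parts["order"])
```

The primary subscripts are kept in the order written, since b12.3 and b21.3 are different coefficients, while the secondary subscripts are sorted, since b12.34 and b12.43 denote the same coefficient.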
















   


1.2   Multiple Correlation and Partial Correlation (for trivariate data only)                                                                                                              

Multiple Correlation:

The goodness of fit of a multiple regression plane is measured by the multiple correlation coefficient.

  Definition:

The multiple correlation between X1 and (X2, X3) is the maximum correlation between X1 and a linear combination of X2 and X3. The multiple correlation coefficient of X1 on X2 and X3, usually denoted by R1.23, is the simple correlation coefficient between X1 and the joint effect of X2 and X3 on X1. In other words, R1.23 is the correlation coefficient between X1 and its estimated value e1.23 as given by the plane of regression of X1 on X2 and X3 (i.e. e1.23 = b12.3 X2 + b13.2 X3).
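The definition can be turned into a computation as sketched below in Python: fit the plane, form the estimate e1.23, and take its simple correlation with X1. The sketch assumes a least-squares fit on deviations from the means, and the function names and data are hypothetical.

```python
from math import sqrt


def deviations(x):
    """Measure the values of a variable from their mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def sum_products(u, v):
    return sum(a * b for a, b in zip(u, v))


def pearson_r(x, y):
    """Simple (total) correlation coefficient between two lists."""
    dx, dy = deviations(x), deviations(y)
    return sum_products(dx, dy) / sqrt(sum_products(dx, dx) * sum_products(dy, dy))


def multiple_correlation(x1, x2, x3):
    """R1.23: correlation between X1 and its estimate e1.23 from the plane."""
    d1, d2, d3 = deviations(x1), deviations(x2), deviations(x3)
    s12, s13, s23 = sum_products(d1, d2), sum_products(d1, d3), sum_products(d2, d3)
    s22, s33 = sum_products(d2, d2), sum_products(d3, d3)
    det = s22 * s33 - s23 ** 2
    # Least-squares partial regression coefficients (assumed fitting method).
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    # Estimated value e1.23 = b12.3 X2 + b13.2 X3, in deviation form.
    e1_23 = [b12_3 * a + b13_2 * b for a, b in zip(d2, d3)]
    return pearson_r(d1, e1_23)


# Hypothetical trivariate data.
x1 = [12, 15, 11, 18, 20, 16]
x2 = [3, 4, 2, 5, 6, 5]
x3 = [9, 7, 10, 8, 6, 9]

R1_23 = multiple_correlation(x1, x2, x3)
print("R1.23 =", round(R1_23, 4))
print("|r12| =", round(abs(pearson_r(x1, x2)), 4))
print("|r13| =", round(abs(pearson_r(x1, x3)), 4))
```

Because R1.23 is the maximum correlation between X1 and any linear combination of X2 and X3, it cannot be smaller than |r12| or |r13|, and it lies between 0 and 1.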

   Derivation:

The equation of the regression plane of X1 on X2 and X3 is given by,

 X1 = b12.3 X2 + b13.2 X3.

The estimate of X1 for known X2 and X3 is given by,

 X̂1 = e1.23 = b12.3 X2 + b13.2 X3

and the residual of X1 on X2 and X3 is given by,

 X1.23 = X1 - X̂1 = X1 - e1.23, which gives e1.23 = X1 - X1.23.

Since the variables X1, X2 and X3 have been measured from their respective means, we have,

 E(X1) = E(X2) = E(X3) = 0 and E(X1.23) = 0 = E(e1.23)
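These relations can be checked numerically. The Python sketch below uses hypothetical data, takes deviations from the means, fits the plane by least squares (an assumption, since the fitting method is not shown above), and splits X1 into the estimate e1.23 and the residual X1.23.

```python
def mean(x):
    return sum(x) / len(x)


def deviations(x):
    """Measure the values of a variable from their mean."""
    m = mean(x)
    return [v - m for v in x]


def sum_products(u, v):
    return sum(a * b for a, b in zip(u, v))


def estimate_and_residual(x1, x2, x3):
    """Return (X1, e1.23, X1.23), all in deviation form."""
    d1, d2, d3 = deviations(x1), deviations(x2), deviations(x3)
    s12, s13, s23 = sum_products(d1, d2), sum_products(d1, d3), sum_products(d2, d3)
    s22, s33 = sum_products(d2, d2), sum_products(d3, d3)
    det = s22 * s33 - s23 ** 2
    # Least-squares partial regression coefficients (assumed fitting method).
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    estimate = [b12_3 * a + b13_2 * b for a, b in zip(d2, d3)]  # e1.23
    residual = [a - e for a, e in zip(d1, estimate)]            # X1.23
    return d1, estimate, residual


# Hypothetical trivariate data.
x1 = [12, 15, 11, 18, 20, 16]
x2 = [3, 4, 2, 5, 6, 5]
x3 = [9, 7, 10, 8, 6, 9]

d1, e1_23, x1_23 = estimate_and_residual(x1, x2, x3)
print("mean of e1.23 :", round(mean(e1_23), 10))
print("mean of X1.23 :", round(mean(x1_23), 10))
```

Up to rounding error, both means are zero, in agreement with E(e1.23) = 0 = E(X1.23), and each value of X1 equals the sum of its estimate and its residual.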

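By definition, the multiple correlation coefficient of X1 on X2 and X3 is the simple correlation coefficient between X1 and e1.23, and is given by,

 R1.23 = Cov(X1, e1.23) / sqrt[ V(X1) V(e1.23) ]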






*****


                






















