2.2.12. Multiple Linear Regression
The mutiple linear regression its a method of associating varibales depending from others independents. Its a efficient method for variables with a godd behavior or that not present multicollinearity, what occurs when two or more independent variables presents correlations.
To avoid the collinearity problem, the correlations matrix between the independent variables must be analyzed. As general rule, if the correlation between two variables is superior then 0.70, may exist collinearity problems. This rule, however, its inefficient when the independent variable combinations presents correlations. So an alternative, would analysis the correlation between each variable independent and all possible combinations between other independent variables.
Looking to the correlation coefficient R2, the problem of collineatity also can be identified when this coefficient is high and however, all the coefficients of regression are statistically insignificant.
To eliminate the collinearity problem, such independent variables can be removed after correlations analyzing.
Other problemns can bind the multiple regression as heteroskedasticity (error variance is constant) and autocorrelation (correlation in serie between errors).
An important observation of analysis of multiple regression and the existence of correlations between a independent variable and a independent variable set does not mean a casuality relation, but only one relation statistics.
2.2.12.1. MLR command
Access:
- Menu - Metrixus | MLR
- Toolbar Metrixus
Description:
Make the multiple linear regression for the variables informed. Return the regression coefficients and others parameters which allows analysis the regression quality.
The option Show regression equation reproduce the linear regression for the independent variable.
For exhibiting the table ANOVA - analysis of variance, the option Show anova must be selected.
Through the option Show Cross Correlations its possible to evaluate the correlation between each independet variable and the possibles combinations of others independents variables, extending the study of the collinearity, the presented result contains the 5 bigger correlations coefficients found for each variable, as well the combination of other independent variables which has generated that correlation. This option effects possible combinations between independent variables and effects the regression between this combinations and a independent variable specific and by this can reduce the running performance.
Important:
The analysis of the correlations table its a powerful tool in the collinearity analysis!
|
The region or data interval must be a contiguous area where each column represents values (real numbers) for a variable, being the first column for the dependent variable and others column for the independent variables. Are necessary at least 2 columns (or 1 independent variable) and a minimum row number superior than the column number. Field with text format or empty will not be considered, as well all correspondents data of others columns. The data range must be selected before call this command.
This command generates a new file containing the results in table format.
The sheets generting without color allow easy printing also represents better running performance.
The results from he multiple linear regression its a new sheet with statistics data, that is, without connection with database which origins the result. In this new sheet is there the following informations, where n its the total of valid data and m its the number of independent variables:
- Equation: equation of multiple linear regression, where bi are the regression coefficients, xi are the independent variables and y its the dependent variable. Only informed if the correspondent option is selected.

- Mean: mean of each variable or column.

- St. Dev.: standard deviation of each variable or column from sample.

Important:
To determinating data statistics parameters - as mean, standar deviation - does not is aplied any mathematic operator (logarithmic, in example). On this way, the mean presented must be understood as the values hopes, as the standard deviation must be understood as the volatileness of the values hope!
|
Important:
All data are considered samples and therefore all statistics calculus of standard deviation are based in samples and not in the population.
|
- CR: table containing the regression coefficients for each independent variable. Contain:
- Slope: regression coefficients for each variable, including the intersection.
- SE: standard error for the regression coefficients.
 ERRO=ERROR; where rj is the correlation coefficient between the actual independet variable and others independent variables.
- t: value t of Student calculated for each regression coefficient that is signifcant (not null).

- p: significance probability for the regression coefficients. Uses the distribution function t of Student of Microsoft Excel to return the values of significance probability.
- Indicators: table containing the statistics indicators for regression analysis. Contain:
- R: coefficient of multiple linear regression correlation.
 Regressão=Regression ;
Important:
The correlation between a dependet variable and a group of independent variable its always informed positive, by the signs of intersection and of the slopes can be inverted.
|
- R2: coefficient of regression determination. Means the total variance percentual of the independent variable which is explained by the regression equation and its equal to the coefficients square.
 Regressão=Regression;
- R2A: ajusted determination coefficient. In the multiple regression analysis, all times which a independent variable is added, the determination coefficient increases, exactly that this new variable does not contributes significantly for the residual variance explanation. On this way, its useful analysis the ajusted coefficient, wich, in contrary to determination coefficient, decreseas if the added variable has a low explaining power, being able to have a negative value too.

- DW: Durbin-Watson statistics. Used for verifying autocorrelation between the independent variables. Are analysed the residues (errors) the regression equation.

- ANOVA: table containing the variance analysis. Only informed if the correspondent option was selected. COntain:
- DF: degrees of free.
- SS: square sum.
- MSS: the mean square variance.

- F: F statistics.
 Regressão=Regression ; Erro=Error ;
- p: probability of significance for F or for the multiple linear regression equation as a whole. Uses the distribution function F of Microsoft Excel to return the values of probability of significance.
- Correlations: matrix m x m conatining the correlations between the variables. Only informed if exist more than 1 independent variable.

- Combinations: table containing for each variables the correlations coeficients from groups formed by combinations of other independent variables. Are presented the values modules from the five bigger absolute correlations (|R|), as well the combination from others variables which originates this correlations. This values are used in the collinearity analysis. Only informed if the correspondent option was selected. Only is possible to generate the combined correlation tables if exists more than two independet variables.
Important:
The correlation between a independent variable and a set of independent variables its always informed in module, cause the intersection operators and from slopes can be inverted. The objective its to verify the existence of high correlations between an independent variable and combinations from others.
|
Using example:
The following data are hypothetical and they only serve to illustrate the functions of this command. However, would can represent, in example, economic variables withdrew against profits from companies or variables micro-economics.
- Show regression equation
- Show ANOVA
- Calculate the combinations for regression to study collinearity
- Data: Cells A2:B17, where the column A represents the dependent variable Y. The columns B, C and D represents the independent variables X1, X2 e X3, respectively.
| y | x1 | x2 | x3 |
| 7.0 | 2.9 | 1.0 | 4.0 |
| 5.0 | - | 3.0 | 3.0 |
| 10.0 | 0.9 | 5.0 | 6.0 |
| 13.0 | 0.1 | 7.0 | 7.0 |
| 15.0 | 3.8 | 4.0 | 8.0 |
| 16.0 | 1.1 | 8.0 | 9.0 |
| 5.0 | (7.0) | 9.0 | 2.0 |
| 11.0 | 3.0 | 3.0 | 6.0 |
| 5.0 | (1.0) | 3.0 | 2.0 |
| 8.0 | (0.9) | 4.0 | 3.0 |
| 10.0 | 2.0 | 5.0 | 7.0 |
| 15.0 | 3.0 | 6.0 | 9.0 |
| 3.0 | (9.5) | 12.0 | 2.0 |
| 8.0 | 1.5 | 4.0 | 6.0 |
| 10.0 | - | 8.0 | 8.0 |
| 6.0 | (2.5) | 7.0 | 4.0 |
|
Results:
Equation Y = 1.990 + 1.384*X1+ 1.209*X2+ 0.129*X3
Equation of linear multiple regression.
| Y | X1 | X2 | X3 |
| Means | 9.188 | -0.163 | 5.563 | 5.375 |
| St. Dev. | 4.020 | 3.615 | 2.804 | 2.527 |
|
Statistics indicators for data.
| CR | Slope | SE | t | p |
| Intersection | 1.990 | 1.325 | 1.502 | 0.159 |
| X1 | 1.384 | 2.441 | 0.567 | 0.581 |
| X2 | 1.209 | 2.326 | 0.520 | 0.613 |
| X3 | 0.129 | 2.374 | 0.055 | 0.957 |
|
Regression coefficients, including intersection. Also are presented the errors, the statistic value t of Student and the probabilty of significance from coefficients. Perceives that all coefficients are insignificant with 95% of sure. The better regression coefficient is the intersection, which is only significant with 84.1% of sure.
| R | 0.930 |
| R2 | 0.864 |
| R2A | 0.830 |
| DW | 0.924 |
|
Correlations coefficient, determination, R2 ajusted and Durbin-Watson. The value of determination coefficient0.864 its relativity high and indicates a explanation power of regression equation of 86.4%. However, as any regression coefficient are significant, can suspect the existence of colinearity in independent variables (correlated independent variables).
The value presented as Durbin-Watson allows the autocorrelation analysis, through consulting and comparating this value with the statistic with same name - statistic Durbin-Watson.
| ANOVA | DF | SS | MSS | F | p |
| Regression | 3 | 209.497 | 69.832 | 25.439 | 0.000 |
| Error | 12 | 32.941 | 2.745 | | |
| Total | 15 | 242.438 | | | |
|
Table ANOVA allows infer over the general regression quality. Analysing the p value, perceives that regession equation has at least one significant coefficient. Looking again for the regression coefficients, perceives that any its significant, what contradicts the ANOVA table. This is one of the effects of collinearity between independent variables!
| Correlations | X1 | X2 | X3 |
| X1 | 1.000 | -0.734 | 0.674 |
| X2 | -0.734 | 1.000 | 0.006 |
| X3 | 0.674 | 0.006 | 1.000 |
|
The table of correlation between variables confirms the suspects of collinearity, cause exist a high correlation from X1 relating to X2. The correlation between X1 and X3 is high too, what suggest more studies to conclude about collinearity.
| Combinated |
X1 |R| | X2:X3 0.999 | X2 0.734 | X3 0.674 |
X2 |R| | X1:X3 0.998 | X1 0.734 | X3 0.006 |
X3 |R| | X1:X2 0.997 | X1 0.674 | X2 0.006 |
|
The table of combined correlations its a powerful tool in verifying correlations between independents variables (collinearity). By the table analysis, verified that variable X1 presents strong correlation with the combination from variables X2 and X3, being probabily the responsible for contradictory interpretations of significance. By the analysis only a correlations matrix, usually, there is no sufficient data to conclude which variables can or not be removed from regression. Looking to the combineds, can says that variable X1 must be removed from regression, cause its represented by te combination from X2 and X3.
|
|