(solution) Regression Analysis: SCORE versus TEST, PERF, …

Printouts for PART A

MODEL 1: $E(\mathrm{SCORE}) = \beta_0 + \beta_1\,\mathrm{TEST} + \beta_2\,\mathrm{PERF} + \beta_3\,G + \beta_4\,G \times \mathrm{TEST} + \beta_5\,G \times \mathrm{PERF} + \beta_6\,R + \beta_7\,R \times \mathrm{TEST} + \beta_8\,R \times \mathrm{PERF}$

Regression Analysis: SCORE versus TEST, PERF, …

The regression equation is
SCORE = - 11.7 + 0.373 TEST + 0.742 PERF + 8.9 G - 0.083 G*TEST - 0.112 G*PERF
        - 8.2 R + 0.083 R*TEST - 0.037 R*PERF

Predictor      Coef  SE Coef      T      P     VIF
Constant     -11.71    12.68  -0.92  0.357
TEST         0.3732   0.1486   2.51  0.013   3.878
PERF         0.7418   0.1902   3.90  0.000   3.896
G              8.94    14.99   0.60  0.552  64.391
G*TEST      -0.0828   0.1816  -0.46  0.649  51.776
G*PERF      -0.1116   0.2386  -0.47  0.640  81.078
R             -8.21    15.29  -0.54  0.592  66.660
R*TEST       0.0829   0.1761   0.47  0.638  46.949
R*PERF      -0.0372   0.2378  -0.16  0.876  77.507

S = 13.1427   R-Sq = 38.0%   R-Sq(adj) = 35.4%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        8  20242.8  2530.4  14.65  0.000
Residual Error  191  32991.4   172.7
Total           199  53234.2

Source  DF   Seq SS
TEST     1  11742.2
PERF     1   6235.8
G        1    827.6
G*TEST   1    174.0
G*PERF   1     86.2
R        1   1137.7
R*TEST   1     35.1
R*PERF   1      4.2

Correlations: TEST, PERF, G, G*TEST, G*PERF, R, R*TEST, R*PERF

           TEST    PERF       G  G*TEST  G*PERF       R  R*TEST
PERF      0.486
G         0.066   0.066
G*TEST    0.212   0.151   0.975
G*PERF    0.136   0.190   0.983   0.982
R        -0.028  -0.029  -0.113  -0.108  -0.101
R*TEST    0.131   0.036  -0.096  -0.073  -0.077   0.973
R*PERF    0.025   0.079  -0.091  -0.079  -0.071   0.985   0.976

Cell Contents: Pearson correlation

MODEL 2: $E(\mathrm{SCORE}) = \beta_0 + \beta_1\,\mathrm{TEST} + \beta_2\,\mathrm{PERF}$

Regression Analysis: SCORE versus TEST, PERF

The regression equation is
SCORE = - 10.3 + 0.370 TEST + 0.663 PERF

Predictor      Coef  SE Coef      T      P    VIF
Constant    -10.339    7.245  -1.43  0.155
TEST        0.36993  0.08789   4.21  0.000  1.309
PERF         0.6625   0.1122   5.90  0.000  1.309

S = 13.3778   R-Sq = 33.8%   R-Sq(adj) = 33.1%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        2  17978.0  8989.0  50.23  0.000
Residual Error  197  35256.2   179.0
Total           199  53234.2

Best Subsets Regression: SCORE versus TEST, PERF, …

Response is SCORE

Predictor columns, in order: TEST, PERF, G, R, G*TEST, G*PERF, R*TEST, R*PERF

                       Mallows
Vars  R-Sq  R-Sq(adj)       Cp       S  X marks (column positions not recoverable)
   1  27.8       27.5     26.5  13.931  X
   1  22.1       21.7     44.2  14.476  X
   2  33.8       33.1     10.1  13.378  X X
   2  29.8       29.1     22.4  13.775  X X
   3  35.6       34.6      6.4  13.222  X X X
   3  35.6       34.6      6.5  13.226  X X X
   4  37.8       36.5      1.7  13.030  X X X X
   4  37.8       36.5      1.8  13.034  X X X X
   5  37.9       36.3      3.4  13.053  X X X X X
   5  37.9       36.3      3.4  13.056  X X X X X
   6  38.0       36.0      5.2  13.082  X X X X X X
   6  38.0       36.0      5.2  13.082  X X X X X X
   7  38.0       35.8      7.0  13.109  X X X X X X X
   7  38.0       35.7      7.2  13.116  X X X X X X X
   8  38.0       35.4      9.0  13.143  X X X X X X X X

Stepwise Regression: SCORE2 versus TEST, PERF, …

Forward selection. Alpha-to-Enter: 0.25

Response is SCORE2 on 8 predictors, with N = 200

Step              1       2       3       4
Constant    11.8323  2.3926  0.6750  3.0635

PERF          0.774   0.570   0.618   0.617
T-Value        7.91    5.26    5.78    5.85
P-Value       0.000   0.000   0.000   0.000

TEST                  0.329   0.343   0.340
T-Value                3.88    4.13    4.16
P-Value               0.000   0.000   0.000

G*PERF                       -0.083  -0.089
T-Value                       -3.18   -3.45
P-Value                       0.002   0.001

R                                      -4.5
T-Value                               -2.51
P-Value                               0.013

S              13.3    12.9    12.6    12.4
R-Sq          24.00   29.41   32.88   34.99
R-Sq(adj)     23.62   28.69   31.85   33.65
Mallows Cp     28.3    14.4     6.1     1.9

Stepwise Regression: SCORE2 versus TEST, PERF, …

Backward elimination. Alpha-to-Remove: 0.1

Response is SCORE2 on 8 predictors, with N = 200

Step              1       2       3       4       5
Constant     -1.356  -1.344  -4.173  -4.201   1.132

TEST          0.379   0.379   0.389   0.376   0.370
T-Value        2.67    3.30    3.49    4.54    4.49
P-Value       0.008   0.001   0.001   0.000   0.000

PERF           0.64    0.64    0.67    0.68    0.61
T-Value        3.53    3.68    4.34    5.20    5.81
P-Value       0.001   0.000   0.000   0.000   0.000

G                10      10      11      12
T-Value        0.68    0.72    0.87    0.90
P-Value       0.494   0.473   0.386   0.370

R                -5      -5
T-Value       -0.37   -0.37
P-Value       0.712   0.711

G*TEST        -0.00
T-Value       -0.01
P-Value       0.996

G*PERF       -0.228  -0.229  -0.249  -0.253  -0.088
T-Value       -1.00   -1.17   -1.34   -1.37   -3.44
P-Value       0.318   0.242   0.183   0.173   0.001

R*TEST       -0.068  -0.068  -0.087  -0.060  -0.062
T-Value       -0.41   -0.41   -0.56   -2.41   -2.50
P-Value       0.685   0.680   0.576   0.017   0.013

R*PERF         0.09    0.09    0.03
T-Value        0.38    0.38    0.18
P-Value       0.704   0.701   0.858

S              12.5    12.5    12.5    12.4    12.4
R-Sq          35.30   35.30   35.25   35.24   34.97
R-Sq(adj)     32.59   32.94   33.24   33.57   33.64
Mallows Cp      9.0     7.0     5.1     3.2     2.0

MODEL 3: $E(\mathrm{SCORE}) = \beta_0 + \beta_1\,\mathrm{TEST} + \beta_2\,\mathrm{PERF} + \beta_3\,G \times \mathrm{TEST} + \beta_4\,R \times \mathrm{TEST}$

Regression Analysis: SCORE versus TEST, PERF, G*TEST, R*TEST

The regression equation is
SCORE = - 11.2 + 0.435 TEST + 0.670 PERF - 0.0669 G*TEST - 0.0656 R*TEST

Predictor      Coef  SE Coef      T      P    VIF
Constant    -11.232    7.095  -1.58  0.115
TEST        0.43492  0.08781   4.95  0.000  1.373
PERF         0.6695   0.1097   6.10  0.000  1.314
G*TEST     -0.06691  0.02583  -2.59  0.010  1.062
R*TEST     -0.06565  0.02589  -2.54  0.012  1.029

S = 13.0533   R-Sq = 37.6%   R-Sq(adj) = 36.3%

Analysis of Variance

Source           DF       SS      MS      F      P
Regression        4  20008.7  5002.2  29.36  0.000
Residual Error  195  33225.6   170.4
Total           199  53234.2

Predicted Values for New Observations

New Obs     Fit  SE Fit        95% CI              95% PI
      1  78.298   2.325  (73.713, 82.884)  (52.150, 104.447)
      2  73.280   2.236  (68.871, 77.690)  (47.162,  99.399)
      3  73.375   2.389  (68.664, 78.086)  (47.204,  99.546)
      4  68.357   2.464  (63.497, 73.216)  (42.158,  94.555)

Values of Predictors for New Observations

New Obs  TEST  PERF  G*TEST  R*TEST
      1  75.0  85.0       0       0
      2  75.0  85.0    75.0       0
      3  75.0  85.0       0    75.0
      4  75.0  85.0    75.0    75.0

Part A. [76 points]

The workers in a large, multi-national corporation undergo annual evaluations by their immediate supervisors. A key element of the evaluation is the summary score (on a scale from 1 to 100) that each worker receives; these scores are essential ingredients for salary and promotion reviews. It is crucial, then, that the scores be assigned fairly; in particular, they must be free of any gender or racial bias. The corporation hires consultants in the human resource management field to investigate the objectivity of the scoring system. As part of their study, the consultants take a random sample of workers. Each worker in the sample is given a written test that assesses the worker’s command of the knowledge their job requires; furthermore, the consultants measure how well each worker has performed recently on the job.
The consultants consider regression analyses using the following variables:

  SCORE, the most recent summary score received from the immediate supervisor;
  TEST, the test score (on a scale of 1 to 100);
  PERF, the job performance measure (on a scale of 1 to 100);
  G, gender (0 for male and 1 for female);
  R, race (0 for white and 1 for non-white).

Some of the analyses performed by the consultants are given in the printouts; base your answers to the following questions on MODEL 1.

1. [2 points] How many workers are included in the random sample?

2. [3 points] What is the fitted regression equation for white males?

3. [3 points] What is the fitted regression equation for non-white females?

4. [2 points] What is the interpretation of $\beta_1$, the coefficient of TEST?

5. [1 point] What is the least squares estimate of $\beta_1$?

6. [1 point] What is the estimate of the standard deviation of the least squares estimate of $\beta_1$?

7. [1 point] What is the value of the test statistic for testing that $\beta_1$ is equal to zero, i.e., for testing the null hypothesis $H_0\colon \beta_1 = 0$?

8. [1 point] What is the p-value for testing the null hypothesis $H_0\colon \beta_1 = 0$ against the alternative hypothesis $H_a\colon \beta_1\ ?\ 0$?

9. [2 points] What distribution does the computer use to compute the p-values in question 8?

10. [2 points] What is the p-value for testing the null hypothesis $H_0\colon \beta_1 = 0$ against the alternative hypothesis $H_a\colon \beta_1\ ?\ 0$?

11. [2 points] Would the null hypothesis $H_0\colon \beta_1\ ?\ 0$ be rejected in favor of the alternative hypothesis $H_a\colon \beta_1\ ?\ 0$ in a test at the 1% level?

12. [2 points] Is there evidence that mean supervisor score increases as worker job knowledge increases for white males?

13. [4 points] Construct a 95% confidence interval for $\beta_1$.

14. [3 points] What is the value of the test statistic for testing the null hypothesis $H_0\colon \beta_1 = 0.6$?

15. [4 points] What is the p-value for testing the null hypothesis $H_0\colon \beta_1 = 0.6$ against the alternative hypothesis $H_a\colon \beta_1\ ?\ 0.6$?

16. [4 points] What is the value of the test statistic for testing the null hypothesis that the supervisor scores exhibit no racial bias on average, i.e., for testing the null hypothesis that the mean supervisor score does not depend on any of the predictors involving the variable R?

17. [4 points] Consider the p-value for testing the null hypothesis that the supervisor scores exhibit no racial bias on average against the alternative hypothesis that they do exhibit such bias. Is the p-value less than 5%?

18. [4 points] By using MODEL 2, compute the value of the test statistic for testing the null hypothesis in MODEL 1 that the supervisor scores exhibit neither gender nor racial bias on average, i.e., for testing the null hypothesis that the mean supervisor score does not depend on any of the predictors involving either of the variables G or R.

19. [4 points] Is the p-value for testing the null hypothesis that the supervisor scores exhibit neither gender nor racial bias on average against the alternative hypothesis that they do exhibit such bias less than 5%?

20. [1 point] What proportion of the variability in the supervisor scores is explained by MODEL 1?

21. [2 points] Based on the information given in the printout for MODEL 1, would you judge multicollinearity to be present?

22. [4 points] Based on the given correlations, describe the nature of any multicollinearity that might be present.

23. [4 points] Based on the given printout for best subsets regression, which predictor variables should be included in the model? (Be sure to justify your answer.)

24. [3 points] Based on the stepwise regression printout, what final model does the forward selection process fit? (Simply report the fitted model.)

25. [3 points] Based on the stepwise regression printout, what final model does the backward elimination process fit? (Simply report the fitted model.)

The consultants decide to use MODEL 3.

26. [1 point] Based on MODEL 3, what is the estimate of the mean supervisor score for non-white male workers whose test score is 75 and whose performance score is 85?

27. [1 point] Based on MODEL 3, what is the estimate of the standard deviation of the supervisor scores for white female workers whose test score is 75 and whose performance score is 85?

28. [1 point] Based on MODEL 3, give a 95% confidence interval for the mean supervisor score for white male workers whose test score is 75 and whose performance score is 85.

29. [3 points] Let $\mu$ denote the mean supervisor score for white male workers whose test score is 75 and whose performance score is 85. Based on MODEL 3, what is the value of the test statistic for testing the null hypothesis $H_0\colon \mu = 72$?

30. [1 point] Based on MODEL 3, give a 95% prediction interval for the supervisor score of a randomly chosen non-white female worker whose test score is 75 and whose performance score is 85.

31. [3 points] Compute the standard error of prediction that is used to compute the prediction interval in question 30.
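
The test statistics asked for in questions 16 and 18 are partial (nested-model) F statistics, and the arithmetic can be checked from the printed sums of squares. The sketch below is one way to do that check, not an official solution. The helper name `partial_f` is made up for illustration, and it assumes that the R terms were entered last in MODEL 1, so that their sequential sums of squares add up to the extra sum of squares for race.

```python
def partial_f(extra_ss, num_terms, sse_full, df_error_full):
    """Partial F statistic: (extra SS / number of tested terms) / MSE of the full model."""
    mse_full = sse_full / df_error_full
    return (extra_ss / num_terms) / mse_full


# Values copied from the MODEL 1 and MODEL 2 printouts.
SSE_MODEL1 = 32991.4      # Residual Error SS, MODEL 1 (full model)
DF_ERROR_MODEL1 = 191     # Residual Error DF, MODEL 1
SSE_MODEL2 = 35256.2      # Residual Error SS, MODEL 2 (TEST and PERF only)

# Question 16: R, R*TEST, R*PERF are the last three terms in the Seq SS table
# (assumption: this is the order in which they entered the fit).
extra_ss_race = 1137.7 + 35.1 + 4.2
f_race = partial_f(extra_ss_race, 3, SSE_MODEL1, DF_ERROR_MODEL1)

# Question 18: MODEL 2 drops all six G and R terms from MODEL 1.
extra_ss_bias = SSE_MODEL2 - SSE_MODEL1
f_bias = partial_f(extra_ss_bias, 6, SSE_MODEL1, DF_ERROR_MODEL1)

print(f"F for race terms (3, 191 df): {f_race:.2f}")             # about 2.27
print(f"F for gender and race terms (6, 191 df): {f_bias:.2f}")  # about 2.19
```

For questions 17 and 19, each statistic would then be compared with the 5% critical value of the F distribution with the degrees of freedom shown.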
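
Questions 14 and 29 both use a t statistic of the form (estimate - hypothesized value) / standard error. The following sketch checks the arithmetic from the printouts. The function name `t_statistic` is illustrative, and the hypothesized values 0.6 and 72 are read from the questions under the assumption that both null hypotheses are point nulls.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t statistic for a point null hypothesis about a coefficient or a mean response."""
    return (estimate - hypothesized) / std_error


# Question 14: TEST coefficient and its standard error from the MODEL 1 printout.
t_q14 = t_statistic(0.3732, 0.6, 0.1486)

# Question 29: Fit and SE Fit for new observation 1 (white male) from the MODEL 3 printout.
t_q29 = t_statistic(78.298, 72.0, 2.325)

print(f"Question 14: t = {t_q14:.3f} on 191 df")  # about -1.526
print(f"Question 29: t = {t_q29:.3f} on 195 df")  # about 2.709
```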
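
For question 31, the standard error of prediction combines the residual mean square with the squared standard error of the fit. A minimal sketch, assuming new observation 4 (G*TEST = 75, R*TEST = 75) is the non-white female worker and using the rounded values printed for MODEL 3:

```python
import math

MSE_MODEL3 = 170.4              # Residual Error MS from the MODEL 3 ANOVA table
SE_FIT_OBS4 = 2.464             # SE Fit for new observation 4
PI_OBS4 = (42.158, 94.555)      # printed 95% PI for new observation 4

# Standard error of prediction: sqrt(MSE + SE_fit^2).
se_pred = math.sqrt(MSE_MODEL3 + SE_FIT_OBS4 ** 2)

# Cross-check: the PI half-width divided by se_pred should be close to the
# 97.5th percentile of the t distribution with 195 df.
half_width = (PI_OBS4[1] - PI_OBS4[0]) / 2
implied_t = half_width / se_pred

print(f"standard error of prediction: {se_pred:.3f}")  # about 13.284
print(f"implied t multiplier: {implied_t:.3f}")
```

The implied multiplier comes out slightly above 1.96, which is what a t distribution with 195 degrees of freedom would give, so the printed interval and the formula agree up to rounding.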