Investigate The Determinants Of Exam Performance
Essay by 24 • June 1, 2011 • 2,458 Words (10 Pages) • 1,524 Views
EC226 Econometrics 1
Assignment 1 2006-7
Aim: Investigate the determinants of exam performance
Introduction
The main objective of this assignment is to investigate how different variables such as age, ability, course and expenditure on alcohol affect an individual's performance in a first-year statistics exam. We will examine the effects of these independent variables on the dependent variable qtmark. Achieving this will require the production and analysis of several regressions and the use of a variety of statistical tests.
Question 1
Figure 1.1 in the appendix presents basic descriptive statistics for the dependent variable qtmark. The graph shows that the bulk of the data lies between marks of 35 and the maximum of 95. The mean mark is 64.65%, close to the median of 65. The small difference between the two is likely caused by the anomalous minimum value of 9, which pulls the mean down but leaves the median largely unaffected. The standard deviation of 13.01 is fairly large, indicating that the data is widely dispersed about the average. The data is negatively skewed, with a longer tail to the left.
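The sensitivity of the mean (but not the median) to a single low outlier can be illustrated with a small sketch. The marks below are hypothetical and are not the assignment's data; the `skewness` helper is an illustrative moment-based measure, which may differ slightly from the formula used by the statistics package behind Figure 1.1.

```python
from statistics import fmean, median


def skewness(values):
    """Moment-based (population) skewness: m3 / m2**1.5."""
    n = len(values)
    mu = fmean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical marks, NOT the assignment's data.
symmetric = [55, 60, 65, 70, 75]
with_outlier = [9, 60, 65, 70, 75]  # lowest mark replaced by an outlier of 9

for label, marks in (("symmetric", symmetric), ("with outlier", with_outlier)):
    # The median stays at 65 in both cases; the mean and skewness move.
    print(label, fmean(marks), median(marks), round(skewness(marks), 3))
```

In this toy sample the outlier lowers the mean while the median is unchanged, and the skewness becomes negative, which is the pattern described for qtmark.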
Figure 1.2 is a cross-plot of ability against qtmark. The upward-sloping scatter suggests that higher ability is associated with better performance in the statistics exam. Figure 1.3 is a correlation matrix showing a mildly positive correlation of 0.321 between the two variables. Figure 1.4 shows that the correlation between attl and qtmark is 0.231, so the relationship of qtmark with ability is the stronger of the two. Similar comparisons can be made between qtmark and the other variables.
Figure 1.5 summarises the marks of students studying different subjects. The data has been split into 3 groups: Economics; Industrial Economics and Economic History; and EPAIS. The table shows that students studying Economics generally scored higher than those studying other subjects, because both the mean and the median are highest for Economics. Those on Industrial Economics and Economic History degrees tend to have scored less on the statistics exam. The maximum mark occurs in the Economics group and the minimum in the Industrial Economics and Economic History group, while EPAIS scores tend to lie between the other two. The high standard deviation and skewness also suggest that the data for Industrial Economics and Economic History is the most dispersed.
Question 2
(a) To separate students who have observations on ability and hrsqt from those who don't, 'abilityna and hrsqtna' was entered into the IF conditional box. This left us with only those people who have observations for both variables (334 obs). A bivariate regression of qtmark against attl could then be constructed using the restricted data set. The result is Figure 2.1, which gives information on the regression including coefficient values and standard errors. Figure 2.1 yields the following regression.
qtmark = 53.95194 + 0.132357attl + e (2.1)
The intercept in (2.1) shows that the expected mark for an individual who attends no lectures is 53.95. The coefficient on attl indicates that each additional lecture attended is associated with an increase of 0.132 in the expected statistics mark. e represents the error term, which is zero on average.
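As a rough illustration of this interpretation, the fitted line in (2.1) can be evaluated directly. The coefficients are taken from (2.1); the function name and the attendance values are hypothetical.

```python
# Coefficients as reported in regression (2.1).
INTERCEPT_21 = 53.95194
SLOPE_ATTL_21 = 0.132357


def expected_mark_21(attl):
    """Expected qtmark from (2.1); the error term e is zero on average."""
    return INTERCEPT_21 + SLOPE_ATTL_21 * attl


# Hypothetical attendance levels, chosen only for illustration.
for lectures in (0, 10, 20):
    print(lectures, round(expected_mark_21(lectures), 2))
```

With no lectures attended the function returns the intercept, and each extra lecture adds the slope to the expected mark.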
(b) Including the additional explanatory variables ability and hrsqt in the above regression yields new results for the coefficients. From Figure 2.2 we can establish a new regression as follows.
qtmark = 41.55371 + 0.114605attl + 0.470988ability + 0.554558hrsqt + e (2.2)
This equation shows that the coefficient of attl falls from 0.132 to 0.115. An extra lecture attended now leads to an increase in the expected statistics mark of only 0.115, less than when the additional variables were excluded. The regression also says that, holding the other variables constant, a one unit increase in ability leads to an expected increase in qtmark of 0.471 and a one unit increase in hrsqt increases qtmark by 0.555. These variables seem to have a significant impact on the mark and give a possible explanation for the decrease in the coefficient for attl. The new model has a higher R² so a greater proportion of the variation in the dependent variable qtmark can be explained by the model; this does not by itself imply that the model is better.
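The "holding other variables constant" reading of (2.2) can be sketched as follows. The coefficients come from (2.1) and (2.2); the function name and the baseline values for attl, ability and hrsqt are hypothetical, since the units of these variables are not stated in this excerpt.

```python
# Coefficients as reported in regression (2.2).
COEF_22 = {
    "const": 41.55371,
    "attl": 0.114605,
    "ability": 0.470988,
    "hrsqt": 0.554558,
}
SLOPE_ATTL_21 = 0.132357  # attl coefficient from the bivariate model (2.1)


def expected_mark_22(attl, ability, hrsqt):
    """Expected qtmark from (2.2); the error term e is zero on average."""
    return (COEF_22["const"]
            + COEF_22["attl"] * attl
            + COEF_22["ability"] * ability
            + COEF_22["hrsqt"] * hrsqt)


# Hypothetical baseline student, for illustration only.
base = expected_mark_22(attl=10, ability=20, hrsqt=5)

# Change one regressor by one unit while the other two stay fixed.
effect_attl = expected_mark_22(11, 20, 5) - base
effect_ability = expected_mark_22(10, 21, 5) - base
effect_hrsqt = expected_mark_22(10, 20, 6) - base

# How much the attl coefficient changed once ability and hrsqt were added.
attl_change = COEF_22["attl"] - SLOPE_ATTL_21

print(round(effect_attl, 3), round(effect_ability, 3), round(effect_hrsqt, 3))
print(round(attl_change, 6))
```

Each one-unit change reproduces the corresponding coefficient, whatever baseline is chosen, because the model is linear in the regressors.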
Significance test for attl in multivariate regression model (2.2):
H0: β1 = 0   H1: β1 ≠ 0
DoF (Degrees of Freedom) = 334 - 4 = 330
1% critical value = 2.327
t = β̂1 / se(β̂1) = 3.88 > 2.327
⇒ Reject the null hypothesis at the 1% significance level, which implies that it is also rejected at the 5% and 10% significance levels. There is therefore evidence to suggest that attl has an effect on qtmark.
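The decision rule can be checked with a short sketch. It assumes that, with 330 degrees of freedom, the t distribution is close enough to the standard normal for the normal quantiles to serve as approximate critical values; the exact t critical values are slightly larger. The function name is hypothetical.

```python
from statistics import NormalDist

T_STAT = 3.88   # t statistic for attl, as reported above
DOF = 334 - 4   # observations minus estimated coefficients


def normal_critical(alpha, two_sided=True):
    """Approximate critical value from the standard normal (large-DoF assumption)."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


one_sided_1pct = normal_critical(0.01, two_sided=False)  # about 2.326
two_sided_1pct = normal_critical(0.01, two_sided=True)   # about 2.576

# Two-sided decisions at the 1%, 5% and 10% levels.
rejects = {alpha: T_STAT > normal_critical(alpha) for alpha in (0.01, 0.05, 0.10)}
print(round(one_sided_1pct, 3), round(two_sided_1pct, 3), rejects)
```

The quoted value of 2.327 is close to the one-sided 1% normal quantile, whereas a two-sided test at 1% would use roughly 2.576. The statistic of 3.88 exceeds both, so the conclusion is the same under either convention.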
Test for overall significance of the regression:
We use an F-Test
H0: β1 = β2 = β3 = 0   H1: at least one βi ≠ 0, i = 1, 2, 3
k = 3 (no. of restrictions under H0)   DoF = 334 - 4 = 330
1% critical value = 3.78
F = 21.57 > 3.78
⇒ Reject the null hypothesis at the 1% significance level, which implies that it is also rejected at the 5% and 10% significance levels. There is therefore evidence to suggest that at least one of the coefficients for attl, ability or hrsqt is not equal to zero, and hence that at least one of these variables has an effect on qtmark. Data from Figure 2.2 has been used for the above tests.
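For reference, the overall-significance F statistic of an OLS regression with an intercept can be written in terms of R² as F = (R²/k) / ((1 - R²)/DoF). The sketch below assumes this is the statistic reported above and inverts it to show the R² it would imply; the R² from Figure 2.2 is not quoted in this excerpt, so the implied value is a derived figure and not a reported one. Function names are hypothetical.

```python
F_STAT = 21.57   # as reported above
K = 3            # restrictions under H0
DOF = 334 - 4    # residual degrees of freedom
F_CRIT_1PCT = 3.78  # critical value as quoted above


def f_from_r2(r2, k, dof):
    """Overall-significance F statistic from R-squared (model with intercept)."""
    return (r2 / k) / ((1 - r2) / dof)


def r2_from_f(f, k, dof):
    """Invert the identity above to recover the implied R-squared."""
    return k * f / (k * f + dof)


implied_r2 = r2_from_f(F_STAT, K, DOF)
print(round(implied_r2, 4))                 # R-squared implied by F = 21.57
print(f_from_r2(implied_r2, K, DOF) > F_CRIT_1PCT)
```

Under that assumption, the three regressors would jointly explain roughly a sixth of the variation in qtmark, which is consistent with a clearly significant regression that still leaves most of the variation unexplained.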
(c) A series of dummy variables has been defined for the number of hours spent revising statistics during the year (hrsqt). There are five classes of hours: 0-2; 3-5; 6-8; 9-11; and >11. The top class is left open-ended because the higher values contain fewer observations. These groups were chosen as they represent the data evenly and fairly. By referring to Figure 2.3 we can obtain the regression (2.3).
qtmark = 48.69955 + 0.119712attl + 0.438292ability - 4.972108hrsqt2c01 - 6.142459hrsqt2c01 - 2.016211hrsqt2c01 - 3.071867hrsqt2c01 + e (2.3)
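The construction of such dummies can be sketched as below. This is a minimal illustration that assumes hrsqt is recorded in whole hours; the dummy names (`hrs_0_2` and so on) and the function name are hypothetical and do not correspond to the variable names in Figure 2.3.

```python
# Class boundaries from the text; ">11" is the omitted (default) category.
# Dummy names are hypothetical, for illustration only.
HOUR_CLASSES = [
    ("hrs_0_2", 0, 2),
    ("hrs_3_5", 3, 5),
    ("hrs_6_8", 6, 8),
    ("hrs_9_11", 9, 11),
]


def hours_dummies(hours):
    """Return one 0/1 indicator per included class for a whole number of hours.

    A student in the default class (>11 hours) gets zeros everywhere, so each
    dummy coefficient measures the difference from that class.
    """
    return {name: int(low <= hours <= high) for name, low, high in HOUR_CLASSES}


print(hours_dummies(4))   # only the 3-5 indicator is switched on
print(hours_dummies(15))  # default category: all indicators are zero
```

Omitting one class avoids perfect collinearity with the intercept, which is why only four dummies appear in the regression for five classes.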
The default category has been chosen to be >11. From this we observe that if for example an individual spends 3-5 hours on statistics revision then he/she is expected to get
...
...