Consider the linear model y_i = β0 + β1 x_i + u_i. Suppose that the following results were obtained from a sample with 12 observations:

Sample average of y = 20
Sample average of x = 20
Sample variance of y = 20
Sample variance of x = 10
Sample covariance of y and x = 10.

Suppose that the CLM Assumptions hold here and answer the following questions.
1. Calculate the OLS estimates of β0 and β1, and the R². [Hint: R² is equal to the square of the coefficient of correlation, r.]
2. Estimate the variance of the error term, σ², and Var(β1). [Hint: See eq. (2.61).]
3. Test the null hypothesis that x has no effect on y against the alternative that x has an effect on y, at the 5% and 1% significance levels.
4. Suppose that we add the term β2z to the original model and that x and z are negatively correlated. What is the likely bias in the estimate of β1 obtained from the simple regression of y on x if β2 < 0?
5. Based on question 4, when R² = 0.75 from regressing y on x and z, what is the t-statistic for the coefficient on z? Can we say that "z is statistically significant?"
6. Based on question 4, suppose that x is highly correlated with z in the sample, and z has large partial effects on y. Will the bias in question 4 tend to be large or small? Explain.


1 Answer


To answer the questions, let's go step by step:

Calculate the OLS estimates of β0 and β1, and the R²:

The OLS estimates can be obtained using the following formulas:

β1 = Cov(x, y) / Var(x)

β0 = y_bar - β1 * x_bar

where Cov(x, y) is the sample covariance between x and y, Var(x) is the sample variance of x, y_bar is the sample average of y, and x_bar is the sample average of x.

Given the information:

Sample average of y = 20

Sample average of x = 20

Sample variance of y = 20

Sample variance of x = 10

Sample covariance of y and x = 10

Using the formulas, we get:

β1 = Cov(x, y) / Var(x) = 10 / 10 = 1

β0 = y_bar - β1 * x_bar = 20 - (1 * 20) = 0

The coefficient of determination, R², can be calculated as the square of the coefficient of correlation, r. Since r is equal to the covariance between x and y divided by the product of their standard deviations, we have:

r = Cov(x, y) / (std(x) * std(y)) = 10 / (√10 * √20) ≈ 0.707

Therefore, R² = r² = 100 / 200 = 0.5
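
As a sanity check, the part-1 arithmetic can be reproduced in a few lines of plain Python. This is just a sketch mirroring the formulas above, assuming the reported sample moments use the usual n - 1 convention:

```python
# A quick check of the part-1 arithmetic, using only the summary
# statistics given in the problem (n-1 convention assumed throughout).
from math import sqrt

y_bar, x_bar = 20.0, 20.0
var_y, var_x = 20.0, 10.0
cov_xy = 10.0

b1 = cov_xy / var_x                       # slope: 10 / 10 = 1
b0 = y_bar - b1 * x_bar                   # intercept: 20 - 1*20 = 0
r = cov_xy / (sqrt(var_x) * sqrt(var_y))  # correlation ≈ 0.707
r_sq = r ** 2                             # R² = 0.5

print(b1, b0, round(r, 3), round(r_sq, 3))  # 1.0 0.0 0.707 0.5
```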

Estimate the variance of the error term, σ², and Var(β1):

The variance of the error term, σ², can be estimated as:

σ² = SSR / (n - k)

where SSR is the sum of squared residuals, n is the number of observations, and k is the number of estimated parameters (including the intercept). Here n = 12 and k = 2, so n - k = 10.

Var(β1) can be estimated as:

Var(β1) = σ² / SST_x

where SST_x = Σ(x_i - x̄)² is the total sum of squares of x. Assuming the reported sample variances use the usual n - 1 divisor, SST_x = 11 * 10 = 110 and SST_y = 11 * 20 = 220. Since R² = 0.5, the residual sum of squares is SSR = (1 - R²) * SST_y = 110. Therefore:

σ² = 110 / 10 = 11

Var(β1) = 11 / 110 = 0.1, which gives se(β1) = √0.1 ≈ 0.316
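
The part-2 arithmetic as a self-contained sketch, under the same n - 1 assumption:

```python
# Part 2: error variance and Var(β1) from the summary statistics.
n, var_y, var_x, r_sq = 12, 20.0, 10.0, 0.5

sst_y = (n - 1) * var_y      # total sum of squares of y: 220
sst_x = (n - 1) * var_x      # total sum of squares of x: 110
ssr = (1 - r_sq) * sst_y     # sum of squared residuals: 110
sigma_sq = ssr / (n - 2)     # estimated error variance: 11.0
var_b1 = sigma_sq / sst_x    # Var(β1) = σ² / SST_x = 0.1
se_b1 = var_b1 ** 0.5        # standard error ≈ 0.316

print(sigma_sq, var_b1, round(se_b1, 3))  # 11.0 0.1 0.316
```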

Test the null hypothesis that x has no effect on y against the alternative that x has an effect on y at the 5% and 1% significance levels:

To test this hypothesis, we can perform a t-test for the coefficient β1. The null hypothesis is that β1 = 0, indicating that x has no effect on y.

The t-statistic for β1 can be calculated as:

t = β1 / se(β1)

where se(β1) is the standard error of β1.

To determine statistical significance, we compare the absolute value of the t-statistic to the two-sided critical values at the desired significance levels (5% and 1%). If |t| exceeds the critical value, we reject the null hypothesis.

Using the part-2 results, se(β1) = √0.1 ≈ 0.316, so t = 1 / 0.316 ≈ 3.16 with n - 2 = 10 degrees of freedom. The two-sided critical values are about 2.228 at the 5% level and 3.169 at the 1% level. Since 3.16 > 2.228, we reject the null hypothesis at the 5% level; since 3.16 < 3.169, we (narrowly) fail to reject it at the 1% level.
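
A sketch of the two-sided test, assuming scipy is available for the Student-t critical values:

```python
# Part 3: t-test of H0: β1 = 0 against a two-sided alternative.
from scipy import stats

b1, se_b1, n = 1.0, 0.1 ** 0.5, 12
t_stat = b1 / se_b1        # ≈ 3.162
df = n - 2                 # 10 degrees of freedom

for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # 2.228 (5%), 3.169 (1%)
    print(alpha, round(t_crit, 3), abs(t_stat) > t_crit)
# reject at 5% (3.162 > 2.228), fail to reject at 1% (3.162 < 3.169)
```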

Suppose we add the term β2z to the original model, and x and z are negatively correlated. If β2 < 0, the estimate of β1 from the simple regression of y on x will be biased upward.

This is known as omitted variable bias. When the true model is y = β0 + β1x + β2z + u but z is omitted, the simple-regression slope has expectation β1 + β2 * δ1, where δ1 is the slope from regressing z on x. Here δ1 < 0 (x and z are negatively correlated) and β2 < 0, so the bias term β2 * δ1 is positive, i.e., β1 is biased upward.
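
The direction of the bias is easy to confirm by simulation. The sketch below uses a made-up data-generating process (the coefficient values and the x-z relationship are illustrative assumptions, not values from the problem):

```python
# Omitted variable bias demo: β2 < 0 and Corr(x, z) < 0 ⇒ upward bias.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 100_000, 1.0, -2.0   # hypothetical true coefficients

x = rng.normal(size=n)
z = -0.5 * x + rng.normal(size=n)      # z negatively correlated with x
y = beta1 * x + beta2 * z + rng.normal(size=n)

# Simple regression of y on x (z omitted):
b1_tilde = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(round(b1_tilde, 2))  # ≈ 2.0 > β1 = 1.0: biased upward by β2·δ1 = 1
```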

Based on question 4, when R² = 0.75 from regressing y on x and z, the t-statistic for the coefficient on z can be recovered from the R² form of the F test: for a single exclusion restriction, t² = F. The restricted model (y on x alone) has R² = 0.5 from part 1, so F = [(0.75 - 0.5) / 1] / [(1 - 0.75) / (12 - 3)] = 0.25 / (0.25 / 9) = 9, and therefore |t| = √9 = 3 with 9 degrees of freedom. The two-sided critical values are about 2.262 at the 5% level and 3.250 at the 1% level, so z is statistically significant at the 5% level but not at the 1% level.
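
A sketch of that calculation, again assuming scipy for the critical values:

```python
# Part 5: back out |t| for z from the R² form of the F test
# (t² = F for a single exclusion restriction).
from scipy import stats

n, r_sq_r, r_sq_ur = 12, 0.5, 0.75      # restricted vs. unrestricted R²
df = n - 3                              # 12 obs - 3 parameters = 9

f_stat = (r_sq_ur - r_sq_r) / ((1 - r_sq_ur) / df)  # = 9.0
t_stat = f_stat ** 0.5                  # |t| = 3.0

for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # 2.262 (5%), 3.250 (1%)
    print(alpha, round(t_crit, 3), t_stat > t_crit)
# significant at 5%, not at 1%
```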

Based on question 4, if x is highly correlated with z in the sample and z has large partial effects on y, the bias in question 4 would tend to be large. The bias term is β2 * δ1: a strong correlation between x and z makes |δ1| large, and a large partial effect of z on y makes |β2| large, so their product, the bias, is large in magnitude. Intuitively, the simple regression of y on x forces x to absorb a large effect that actually belongs to z.
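
Repeating the part-4 simulation with progressively stronger (hypothetical) x-z links shows the estimated bias tracking β2 * δ1:

```python
# The omitted-variable bias β2·δ1 grows with the strength of the x-z link.
import numpy as np

rng = np.random.default_rng(1)
n, beta1, beta2 = 100_000, 1.0, -2.0   # hypothetical true coefficients

for delta1 in (-0.1, -0.5, -0.9):      # slope of z on x
    x = rng.normal(size=n)
    z = delta1 * x + rng.normal(size=n)
    y = beta1 * x + beta2 * z + rng.normal(size=n)
    b1_tilde = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    print(delta1, round(b1_tilde - beta1, 2))  # bias ≈ 0.2, 1.0, 1.8
```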
