Final answer:
OLS is a bad option when p > n because the least-squares problem has no unique solution and the fit overfits the data. Techniques such as ridge regression, lasso regression, and principal component regression are better options.
Step-by-step explanation:
When the number of variables (p) is greater than the number of observations (n), ordinary least squares (OLS) is not a good option. OLS requires the design matrix to have full column rank, which is only possible when n ≥ p. When p > n, the matrix X'X is singular, so the coefficient estimates are not uniquely defined, and any solution interpolates the training data exactly, producing severe overfitting and unreliable estimates.
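A minimal sketch of why OLS breaks down, using hypothetical data with n = 5 observations and p = 10 predictors: the normal-equations matrix X'X cannot have full rank, so (X'X)⁻¹ does not exist.

```python
import numpy as np

# Hypothetical tiny example: n = 5 observations, p = 10 predictors (p > n).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))

# The normal-equations matrix X'X is (10 x 10) but has rank at most n = 5,
# so it is singular and the OLS solution (X'X)^{-1} X'y is not uniquely defined.
rank = np.linalg.matrix_rank(X.T @ X)
print(rank)  # at most 5, far below p = 10
```

Because the rank is below p, infinitely many coefficient vectors fit the training data equally well, which is exactly the instability described above.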
Techniques that would be best to use in this situation include:
- Ridge regression: This technique adds an L2 penalty (the sum of squared coefficients) to the least-squares objective, shrinking the coefficients toward zero. This stabilizes the estimates and reduces overfitting, even when p > n.
- Lasso regression: Similar to ridge regression, lasso adds a penalty term, but it uses an L1 penalty (the sum of absolute coefficients), which can set some coefficients exactly to zero. This performs automatic variable selection and yields a simpler, sparser model.
- Principal component regression: This technique uses principal component analysis to reduce the dimensionality of the data, then regresses the response on a small number of principal components (linear combinations of the original predictors) instead of on all p variables.
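The three techniques above can be sketched with scikit-learn on hypothetical p > n data (30 observations, 100 predictors; the data, penalty strengths, and number of components are illustrative choices, not prescriptions):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, LinearRegression
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline

# Hypothetical data with p > n: only the first two predictors matter.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 100))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(30)

# Ridge: L2 penalty shrinks all 100 coefficients toward zero.
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso: L1 penalty zeroes out most coefficients (variable selection).
lasso = Lasso(alpha=0.1).fit(X, y)
n_selected = int(np.sum(lasso.coef_ != 0))

# Principal component regression: regress y on 5 principal components.
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

print(n_selected)  # lasso keeps only a small subset of the 100 predictors
```

Note how lasso retains far fewer than 100 predictors, while ridge keeps all of them but shrunken, and PCR replaces them with a handful of components.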