224k views
5 votes
Assume we have a 10-variable regression problem with a training data set and a testing data set. We run the following three regression methods on the training data:
- best subsets
- forward selection (forward stepwise)
- backward elimination (backward stepwise)
For each method we keep the chosen models with 5 and 7 variables (a total of 6 models).
(a) Which 5-variable model will have the lowest Residual Sum of Squares (RSS) for the training data? Briefly explain your answer.
(b) Which 5-variable model will have the lowest prediction RSS, i.e. the lowest RSS when the model fit on the training data is used to predict the testing data? Briefly explain your answer.
(c) For which of the methods are we guaranteed that the 5 variables in the 5-variable model are a subset of the 7 variables in the 7-variable model?

User ZuzEL
by
8.2k points

1 Answer

3 votes

Answer:

Explanation:

(a) The 5-variable model chosen by best subsets will have the lowest training RSS. Best subsets fits every one of the 252 possible 5-variable models (10 choose 5) and keeps the one with the smallest training RSS. The 5-variable models found by forward selection and backward elimination are among those 252 candidates, so neither can have a lower training RSS than the best subsets model. They can tie it if a stepwise method happens to pick the same five variables.
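The reasoning for part (a) can be written as an inequality. This is a sketch in notation that does not appear in the question: S is a set of predictors, RSS_train(S) is the training RSS of the least-squares fit on S, and S_best, S_fwd, S_bwd are the 5-variable sets picked by the three methods. It assumes best subsets ranks models of a given size by training RSS, which is the usual definition.

```latex
\mathrm{RSS}_{\text{train}}(S_{\text{best}})
  = \min_{|S| = 5} \mathrm{RSS}_{\text{train}}(S)
  \le \min\bigl\{ \mathrm{RSS}_{\text{train}}(S_{\text{fwd}}),\;
                  \mathrm{RSS}_{\text{train}}(S_{\text{bwd}}) \bigr\}
```

The minimum runs over all 252 subsets of size 5, and S_fwd and S_bwd are two of them, so the inequality holds for any data set.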

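(b) This cannot be determined from the information given. Prediction RSS on the testing data depends on how well each model generalizes, not on how closely it fits the training data. Best subsets searches the most candidate models, so it has the best chance of finding a good model but is also the most prone to overfitting the training data; forward selection or backward elimination may pick a 5-variable model that predicts the testing data better. Any of the three could have the lowest prediction RSS, and if the methods select the same five variables the results are identical.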
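To see parts (a) and (b) on actual numbers, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the synthetic data, the 60/60 train/test split, and the helper names `rss`, `best_subset`, `forward` and `backward`. The stepwise helpers use training RSS as the add/remove criterion, which is one common choice. Best subsets always has the lowest training RSS; which method has the lowest test RSS depends on the data.

```python
import itertools
import numpy as np

def rss(X_fit, y_fit, X_eval, y_eval, cols):
    """Fit OLS (with intercept) on the fit data using `cols`; return RSS on the eval data."""
    cols = list(cols)
    A = np.column_stack([np.ones(len(X_fit)), X_fit[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    B = np.column_stack([np.ones(len(X_eval)), X_eval[:, cols]])
    resid = y_eval - B @ beta
    return float(resid @ resid)

def best_subset(X, y, k):
    """Exhaustive search: the size-k subset with the lowest training RSS."""
    combos = itertools.combinations(range(X.shape[1]), k)
    return list(min(combos, key=lambda s: rss(X, y, X, y, s)))

def forward(X, y, k):
    """Greedy forward selection: add the variable that lowers training RSS most."""
    chosen = []
    while len(chosen) < k:
        rest = [j for j in range(X.shape[1]) if j not in chosen]
        chosen.append(min(rest, key=lambda j: rss(X, y, X, y, chosen + [j])))
    return chosen

def backward(X, y, k):
    """Greedy backward elimination: drop the variable whose removal raises training RSS least."""
    chosen = list(range(X.shape[1]))
    while len(chosen) > k:
        drop = min(chosen, key=lambda j: rss(X, y, X, y, [c for c in chosen if c != j]))
        chosen.remove(drop)
    return chosen

# Synthetic data (assumed for illustration): 10 correlated predictors, noisy response.
rng = np.random.default_rng(0)
n, p, k = 120, 10, 5
Z = rng.normal(size=(n, p))
X = Z + 0.8 * Z[:, [0]]          # shared component makes the predictors correlated
y = X @ rng.normal(size=p) + rng.normal(scale=2.0, size=n)
X_tr, y_tr, X_te, y_te = X[:60], y[:60], X[60:], y[60:]

models = {
    "best subsets": best_subset(X_tr, y_tr, k),
    "forward": forward(X_tr, y_tr, k),
    "backward": backward(X_tr, y_tr, k),
}
# For each method: (training RSS, test RSS) of its 5-variable model.
results = {
    name: (rss(X_tr, y_tr, X_tr, y_tr, cols), rss(X_tr, y_tr, X_te, y_te, cols))
    for name, cols in models.items()
}
for name, (train_rss, test_rss) in results.items():
    print(f"{name:13s} vars={sorted(models[name])} train RSS={train_rss:.1f} test RSS={test_rss:.1f}")
```

Changing the seed changes which variables each method selects and which model predicts best on the test half, but the best subsets training RSS never exceeds the other two.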

(c) Forward selection and backward elimination. Forward selection builds its models by adding one variable at a time to the previous model, so the 7-variable model is the 5-variable model plus two more variables. Backward elimination starts from all 10 variables and removes one at a time, so the 5-variable model is the 7-variable model with two variables removed. In both cases the 5 variables are a subset of the 7 variables by construction.

For best subsets there is no such guarantee. It picks the best model of each size independently by searching all combinations of that size, so the best 5-variable model can contain a variable that is absent from the best 7-variable model.
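The nesting property in part (c) can be checked with a small experiment. The sketch below is illustrative only: the synthetic data and the helper names `train_rss`, `forward_path`, `backward_path` and `best_subset` are assumptions, and the stepwise paths use training RSS as the criterion. The forward and backward paths are nested by construction, so their subset checks always print True; for best subsets the check can come out either way depending on the data.

```python
import itertools
import numpy as np

def train_rss(X, y, cols):
    """Training RSS of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(X)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def forward_path(X, y):
    """Return {size: variables} for every model on the forward selection path."""
    p = X.shape[1]
    chosen, path = [], {}
    while len(chosen) < p:
        rest = [j for j in range(p) if j not in chosen]
        best = min(rest, key=lambda j: train_rss(X, y, chosen + [j]))
        chosen = chosen + [best]
        path[len(chosen)] = frozenset(chosen)
    return path

def backward_path(X, y):
    """Return {size: variables} for every model on the backward elimination path."""
    chosen = list(range(X.shape[1]))
    path = {len(chosen): frozenset(chosen)}
    while len(chosen) > 1:
        drop = min(chosen, key=lambda j: train_rss(X, y, [c for c in chosen if c != j]))
        chosen = [c for c in chosen if c != drop]
        path[len(chosen)] = frozenset(chosen)
    return path

def best_subset(X, y, k):
    """Size-k subset with the lowest training RSS, found by exhaustive search."""
    combos = itertools.combinations(range(X.shape[1]), k)
    return frozenset(min(combos, key=lambda s: train_rss(X, y, s)))

# Synthetic training data (assumed for illustration): 10 correlated predictors.
rng = np.random.default_rng(1)
n, p = 60, 10
Z = rng.normal(size=(n, p))
X = Z + 0.8 * Z[:, [0]]          # shared component makes the predictors correlated
y = X @ rng.normal(size=p) + rng.normal(scale=2.0, size=n)

fwd = forward_path(X, y)
bwd = backward_path(X, y)
best5, best7 = best_subset(X, y, 5), best_subset(X, y, 7)

# `<=` on frozensets is the subset test.
print("forward  5 subset of 7:", fwd[5] <= fwd[7])
print("backward 5 subset of 7:", bwd[5] <= bwd[7])
print("best subsets 5 subset of 7:", best5 <= best7)  # not guaranteed either way
```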

User AlwaysALearner
by
8.5k points
