63.3k views
4 votes
In a given dataset, there are M columns. Out of these M, m columns are chosen each time for creating training samples for the individual trees in a random forest. What will happen if

A - m is almost equal to M

B - m is very small

1 Answer

6 votes
A - If m is almost equal to M, then the trees in the random forest will be highly correlated. This is because each tree will be trained on almost the same set of features, and will therefore make similar predictions. In such a scenario, the random forest will not be able to reduce the variance of the model, which is one of the main benefits of using an ensemble method like random forests.

B - If m is very small, then the trees in the random forest will have high variance and low bias. This is because each tree will be trained on a small subset of features, and will therefore be more sensitive to noise in the data. In such a scenario, the random forest will be more prone to overfitting, and may not generalize well to new data.
User Mikespiteri
by
8.1k points