You’ve just finished training an ensemble tree method for spam classification. It performs abnormally badly on your validation set but well on your training set. Your implementation has no bugs. What are some reasons that could be causing this problem?



Answer:

The model is overfitting the training data. Two likely causes are that the ensemble has a very large number of parameters and that the training data is noisy.

Step-by-step explanation:

Solution

The reasons that could be causing the problem are as follows:

1. A large number of parameters:

  • In an ensemble tree method, the number of parameters that must be learned (the split thresholds and leaf values of every tree) is very large.
  • During training, the trees can therefore fit the training data too well (overfit), memorizing its details rather than learning general patterns.
  • When the model is then tested on new data points from the validation set, it makes large errors, because it was fit entirely to the training data.
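To give a sense of how large "very large" can be, here is a back-of-the-envelope upper bound. It is a sketch under stated assumptions: each split is counted as 2 stored values (feature index and threshold), each leaf as 1 value, and a fully grown tree can have at most one leaf per distinct training point. The ensemble size (100 trees) and training-set size (10,000 emails) are hypothetical, not taken from the question.

```python
def max_tree_params(n_leaves: int) -> int:
    """Stored values in one binary tree with n_leaves leaves (assumed accounting).

    A binary tree with L leaves has L - 1 internal split nodes.
    Each split counts as 2 values (feature index, threshold); each leaf as 1.
    """
    return 2 * (n_leaves - 1) + n_leaves


def max_ensemble_params(n_trees: int, n_train: int) -> int:
    """Upper bound for an ensemble of fully grown trees.

    A fully grown tree on n_train distinct points has at most n_train leaves,
    so the bound grows with both the number of trees and the data size.
    """
    return n_trees * max_tree_params(n_train)


# Hypothetical sizes: 100 trees, 10,000 training emails
print(max_ensemble_params(n_trees=100, n_train=10_000))  # 2999800
```

Roughly 3 million free values can encode almost any quirk of 10,000 training examples, which is why limiting tree depth or leaf size is the usual lever against this kind of overfitting.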

2. Noisy data:

  • The data used to train the model is taken from the real world, and real-world datasets are often noisy, i.e., they contain missing fields or wrong values.
  • When the trees are trained on this noisy data, they set their parameters to fit the noise in the training data as well as the real pattern.
  • As a result, when the model is tested on the validation set, it gives large errors, i.e., low accuracy.
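The effect of noisy labels can be seen on a tiny deterministic toy problem. This is a minimal sketch, not a spam classifier: the 1-D data, the true rule (label 1 when x >= 0.5) and the 20% flipped training labels are all assumptions for illustration. A 1-nearest-neighbour lookup is used as a stand-in for a fully grown tree (both fit every training point exactly), and a single-threshold stump stands in for a heavily constrained model.

```python
import math


def make_data():
    """Toy data: true rule is y = 1 if x >= 0.5. Every 5th training label is flipped."""
    x_train = [i / 100 for i in range(100)]
    y_train = []
    for i, x in enumerate(x_train):
        y = 1 if x >= 0.5 else 0
        if i % 5 == 0:
            y = 1 - y  # injected label noise (20% of training points)
        y_train.append(y)
    # Clean validation points, offset so none coincides with a training point
    x_val = [(j + 0.25) / 100 for j in range(100)]
    y_val = [1 if x >= 0.5 else 0 for x in x_val]
    return x_train, y_train, x_val, y_val


def fit_memorizer(xs, ys):
    """High-capacity stand-in: predict the label of the nearest training point."""
    pairs = list(zip(xs, ys))

    def predict(x):
        _, label = min(pairs, key=lambda p: abs(p[0] - x))
        return label

    return predict


def fit_stump(xs, ys):
    """Low-capacity model: one threshold t, predict 1 if x >= t.

    Tries every training x (plus +inf) and keeps the t with the fewest training errors.
    """
    best_t, best_err = None, None
    for t in sorted(xs) + [math.inf]:
        err = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: 1 if x >= best_t else 0


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


x_tr, y_tr, x_va, y_va = make_data()
results = {}
for name, fit in [("memorizer", fit_memorizer), ("stump", fit_stump)]:
    model = fit(x_tr, y_tr)
    results[name] = (accuracy(model, x_tr, y_tr), accuracy(model, x_va, y_va))
    print(name, *results[name])
# memorizer 1.0 0.8
# stump 0.81 0.99
```

The memorizer reaches 100% training accuracy by reproducing every flipped label, and exactly those points become its validation errors. The stump cannot represent the noise, so it scores worse on training data but much better on validation data: the same train/validation pattern described in the question.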

User Hiheelhottie