1 vote
Complete data are very rare because some data are usually missing. There are typically four strategies to handle missing data.

1. Ignore the variables with missing data.
2. Delete the records that have some missing values.
3. Impute the missing values (i.e., simply fill in the missing values with some other value, such as the mean of similar records with data).
4. Use a data mining technique that can handle missing values, such as CART, which is one of a few techniques that handle missing data pretty well.

Suppose you are working on a project to model a hotel company's customer base. The goal is to determine which customer attributes are correlated with a high number of hotel stays. You have access to data from a customer survey, but one optional field, yearly income, is missing in roughly 5% of the records. Which of the strategies should be used? Explain why. Suppose you chose the third strategy. Explain how you would impute the missing incomes.

User Dave Loepr
by
3.1k points

1 Answer

4 votes

Answer:

See explanation.

Step-by-step explanation:

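I would choose the third strategy: impute the missing values. Deleting the affected records would throw away roughly 5% of the customers, and ignoring the variable would drop income from the analysis even though most records have it. Imputation is the best option here because income is an optional field, so there is no need for highly accurate salary information and an approximate value is good enough.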

To keep this field, we can fill each missing income from similar records: find the other customers who stayed the same number of times and use their income (for example, the mean of their incomes) as the fill value.
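
The imputation described above could be sketched as follows. This is only an illustration, not part of the original answer: the field names (`stays`, `income`, `income_imputed`), the function name, and the sample customers are all made up, and falling back to the overall mean when no similar customer reported an income is an added assumption.

```python
from collections import defaultdict
from statistics import mean


def impute_income_by_stays(records):
    """Fill missing 'income' with the mean income of customers who have
    the same number of 'stays'.

    Assumption: if no customer in the same group reported an income,
    fall back to the mean of all reported incomes.
    """
    known = [r for r in records if r["income"] is not None]
    if not known:
        raise ValueError("no reported incomes to impute from")
    overall_mean = mean(r["income"] for r in known)

    # Group the reported incomes by number of stays.
    incomes_by_stays = defaultdict(list)
    for r in known:
        incomes_by_stays[r["stays"]].append(r["income"])

    filled = []
    for r in records:
        row = dict(r)  # copy so the input records are left unchanged
        if row["income"] is None:
            group = incomes_by_stays.get(row["stays"])
            row["income"] = mean(group) if group else overall_mean
            row["income_imputed"] = True
        else:
            row["income_imputed"] = False
        filled.append(row)
    return filled


# Hypothetical survey rows; None marks a skipped income question.
customers = [
    {"id": 1, "stays": 2, "income": 40000},
    {"id": 2, "stays": 2, "income": 60000},
    {"id": 3, "stays": 2, "income": None},
    {"id": 4, "stays": 5, "income": 90000},
    {"id": 5, "stays": 7, "income": None},
]
result = impute_income_by_stays(customers)
```

Here customer 3 receives the mean income of the other customers with two stays, while customer 5 has no similar customers and gets the overall mean. The `income_imputed` flag is kept so that filled values can be told apart from reported ones later.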

User Pablisco
by
3.3k points