Final answer:
Three models were compared to find the one that best predicts purchase of the new product. The winning model can then be used to identify the demographic characteristics of customers who are more likely to buy. To maximize profit, the supermarket should target the loyal customers with the highest predicted likelihood of purchase rather than all loyal customers. Collecting additional variables can further improve the prediction model.
Step-by-step explanation:
Three models were trained and compared on area under the ROC curve (AUC): a decision tree, a random forest, and gradient boosting. The winning model is the one with the highest AUC. The major tuning parameters were maximum depth, minimum leaf size, and pruning options for the decision tree; number of trees, in-bag sample proportion, and number of inputs considered per split for the random forest; and number of trees, learning rate, and L1/L2 regularization for gradient boosting.
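As a minimal sketch of the comparison step, assuming each model has already produced purchase probabilities for the same held-out customers: the labels and scores below are made up for illustration, and the names `auc`, `y_valid`, and `model_scores` are hypothetical. Which model comes out on top here says nothing about which model won on the supermarket data.

```python
def auc(labels, scores):
    """AUC via its rank interpretation: the share of (buyer, non-buyer)
    pairs in which the buyer receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one buyer and one non-buyer")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up validation labels (1 = bought the new product) and model scores.
y_valid = [1, 0, 1, 0, 1, 0, 0, 1]
model_scores = {
    "decision_tree":     [0.6, 0.4, 0.6, 0.6, 0.4, 0.4, 0.4, 0.6],
    "random_forest":     [0.8, 0.3, 0.6, 0.5, 0.25, 0.2, 0.3, 0.7],
    "gradient_boosting": [0.9, 0.2, 0.7, 0.65, 0.6, 0.1, 0.4, 0.8],
}

# Score every candidate the same way and keep the one with the highest AUC.
auc_by_model = {name: auc(y_valid, s) for name, s in model_scores.items()}
winner = max(auc_by_model, key=auc_by_model.get)

for name, value in auc_by_model.items():
    print(f"{name}: AUC = {value:.4f}")
print("winner:", winner)
```

The pairwise loop is quadratic in the number of customers, which is fine for a small illustration; on real data a library routine would be the usual choice.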
Using the winning model, the demographic characteristics of customers who are more likely to purchase the new product can be identified by analyzing the Partial Dependence (PD) plot for each demographic characteristic. A PD plot shows how the model's average predicted probability of buying the new product changes as that characteristic varies, with the other inputs kept at their observed values.
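To make the PD idea concrete, here is a small sketch of how one PD curve can be computed by hand. It assumes a scoring function that returns a purchase probability for one customer; `toy_predict` is a made-up stand-in for the winning model, and the fields `age` and `affluence` are hypothetical examples of demographic inputs, not variables confirmed by the case.

```python
def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row, score all rows,
    and average the predicted probabilities."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve


def toy_predict(row):
    # Hypothetical stand-in for the winning model's predicted probability.
    p = 0.10
    if row["age"] < 40:
        p += 0.30
    if row["affluence"] >= 10:
        p += 0.20
    return p


# Made-up customers; only the structure matters here.
customers = [
    {"age": 28, "affluence": 5},
    {"age": 52, "affluence": 12},
    {"age": 37, "affluence": 8},
    {"age": 61, "affluence": 15},
]

pd_age = partial_dependence(toy_predict, customers, "age", [25, 35, 45, 55])
for value, avg_prob in pd_age:
    print(f"age={value}: average predicted probability {avg_prob:.2f}")
```

Grid values where the curve is high point to the customer segment the model considers more likely to buy; the same function can be applied to any other input by changing `feature` and `grid`.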
If the supermarket decides to expand the scope of the coupon campaign among loyal customers, it should target the type of loyal customer with a higher likelihood of purchasing the new product. That type can be identified from the PD plots for the loyalty-related characteristics, which show how the predicted probability of buying the new product varies with each of them.
By using the winning model to mail coupons only to the top 30% of loyal customers ranked by purchase likelihood, the supermarket can earn more profit than by mailing coupons to all loyal customers. The exact amount of additional profit can be calculated from the cumulative response percentage at the 30% depth, together with the margin per sale and the cost per coupon.
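The profit comparison can be sketched as follows. Every number here is invented for illustration, and the sketch assumes that the cumulative response percentage at a given depth means the share of mailed customers up to that depth who buy; if the tool reports captured response (the share of all buyers reached) instead, the buyer count would be computed differently. The function and variable names are hypothetical.

```python
def campaign_profit(n_mailed, n_buyers, margin_per_sale, cost_per_coupon):
    """Profit = margin earned from buyers minus the cost of all coupons mailed."""
    return n_buyers * margin_per_sale - n_mailed * cost_per_coupon


# Made-up inputs; replace with the figures from the actual case.
n_loyal = 10_000              # loyal customers in total
overall_response = 0.10       # assumed response rate when mailing everyone
cum_response_top30 = 0.20     # assumed response rate within the top 30%
margin_per_sale = 12.0        # assumed margin per purchase
cost_per_coupon = 1.0         # assumed cost of mailing one coupon

n_top30 = n_loyal * 30 // 100

profit_all = campaign_profit(
    n_loyal, overall_response * n_loyal, margin_per_sale, cost_per_coupon
)
profit_top30 = campaign_profit(
    n_top30, cum_response_top30 * n_top30, margin_per_sale, cost_per_coupon
)
extra_profit = profit_top30 - profit_all

print(f"mail all loyal customers: {profit_all:.2f}")
print(f"mail top 30% only:        {profit_top30:.2f}")
print(f"additional profit:        {extra_profit:.2f}")
```

Under these assumed inputs the targeted mailing gives up some buyers but saves 70% of the mailing cost, which is why it comes out ahead; with a much higher margin per sale or a cheaper coupon the comparison could go the other way.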
To improve the prediction model, the supermarket should consider collecting additional variables that may affect consumers’ demand for the new product, such as personal preferences, income level, household size, and previous purchase history.