Imagine that you are a medical researcher compiling data for a study. You have collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of 5 medications: Drug A, B, C, X, or Y. Part of your job is to build a model from the drug200.csv dataset to find out which drug might be appropriate for a future patient with the same illness. The features of this dataset are the patients' Age, Sex, Blood Pressure, Cholesterol level, and Na-to-K ratio. The dependent variable is the Drug column, the drug that each patient responded to. Start by splitting the data into a training set and a testing set. You should put 70% of the data in the training set and set the random seed to 93.

(a) Develop a CART model and draw the classification tree. Without setting any parameters, the model should produce a fairly simple tree. Interpret your model by describing which types of patients respond to each of the different drugs.
(b) Make predictions on the test set using the predict() function. By specifying type="class", your prediction will consist of a single predicted value for each test set observation. Using this prediction, make a 5×5 classification matrix comparing your predictions with the actual values. What is the accuracy of your CART model on the test set?

by User Nkjt (8.4k points)

1 Answer

7 votes

Final answer:

In R, split the data into a training set and a testing set with the 'caret' package, fit the CART model with the 'rpart' package and draw the tree with the 'rpart.plot' package, then make predictions on the test set and calculate the accuracy from the classification matrix.

Step-by-step explanation:

To split the data into a training set and a testing set for building a CART model, you can use the 'caret' package in R. First, load the package and set the random seed to 93 with 'set.seed(93)'. Then call the 'createDataPartition' function with the 'Drug' column as the outcome and 'p = 0.7'. The function does not return two data sets directly: it returns the row indices of the training observations, sampled within each drug class so that the class proportions are preserved. Use those indices to select the training rows, and use the remaining rows as the testing set.
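The split described above can be sketched as follows. This is a minimal sketch rather than a definitive solution: the helper name 'split_train_test' is hypothetical, and the usage lines assume that drug200.csv sits in the working directory and has a column named 'Drug'.

```r
library(caret)  # provides createDataPartition()

# Hypothetical helper: split a data frame into training and testing sets,
# sampling within each level of the outcome column.
split_train_test <- function(data, outcome = "Drug", p = 0.7, seed = 93) {
  set.seed(seed)
  # list = FALSE returns a one-column matrix of row indices instead of a list
  train_idx <- createDataPartition(data[[outcome]], p = p, list = FALSE)
  list(
    train = data[train_idx, , drop = FALSE],
    test  = data[-train_idx, , drop = FALSE]
  )
}

# Usage (assumes drug200.csv is in the working directory):
# drugs <- read.csv("drug200.csv", stringsAsFactors = TRUE)
# parts <- split_train_test(drugs)
# train <- parts$train
# test  <- parts$test
```

Because the sampling is done within each drug class, the training set may hold slightly more or fewer rows than an exact 70% of the data.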

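a. To develop a CART model and draw the classification tree, you can use the 'rpart' function from the 'rpart' package in R. Specify the model with a formula such as 'Drug ~ Age + Sex + BloodPressure + Cholesterol + Na_to_K', replacing the predictor names with the column names as they appear in your copy of the file, and pass in the training data. Adding 'method = "class"' makes sure a classification tree is fitted. The resulting tree can be visualized using the 'rpart.plot' function from the 'rpart.plot' package. Reading the splits from the root down to each leaf tells you which types of patients respond to each drug.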
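A minimal sketch of the model-fitting step is shown here. The helper name 'fit_drug_tree' is hypothetical, and the formula 'Drug ~ .' is a substitution for the explicit formula above: it uses every other column as a predictor, so it assumes the training data contains only the five features plus 'Drug'.

```r
library(rpart)       # rpart() fits CART models
library(rpart.plot)  # rpart.plot() draws a fitted tree

# Hypothetical helper: fit a classification tree with rpart's default settings.
fit_drug_tree <- function(train) {
  # Make sure the outcome is a factor, in case it was read in as character
  train$Drug <- factor(train$Drug)
  # method = "class" requests a classification tree
  rpart(Drug ~ ., data = train, method = "class")
}

# Usage (assumes `train` is the training set from the split step):
# tree <- fit_drug_tree(train)
# rpart.plot(tree)  # draw the classification tree
# print(tree)       # text listing of the splits, useful for interpretation
```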

b. To make predictions on the test set, you can use the 'predict' function with the trained CART model and the testing data. Specify 'type = "class"' to get the predicted drug classification for each observation. Finally, you can create a 5x5 classification matrix using the 'table' function to compare the predictions with the actual values and calculate the accuracy by dividing the sum of the diagonal elements by the total number of observations.
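The prediction and accuracy calculation might look like the sketch below. The helper name 'evaluate_tree' is hypothetical, and it assumes the model was fitted with 'rpart' as a classification tree and that the test set has a 'Drug' column.

```r
library(rpart)  # supplies the predict() method for rpart models

# Hypothetical helper: classification matrix and accuracy on a test set.
evaluate_tree <- function(tree, test, outcome = "Drug") {
  # type = "class" returns one predicted label per test observation
  pred <- predict(tree, newdata = test, type = "class")
  # Share the factor levels so the table stays square even if a drug
  # happens to be absent from the test set
  actual <- factor(test[[outcome]], levels = levels(pred))
  # Rows are the actual drugs, columns are the predicted drugs
  conf_mat <- table(Actual = actual, Predicted = pred)
  # Correct predictions sit on the diagonal
  accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
  list(confusion = conf_mat, accuracy = accuracy)
}

# Usage (assumes `tree` and `test` come from the earlier steps):
# result <- evaluate_tree(tree, test)
# result$confusion
# result$accuracy
```

With all five drugs present in the training data, the table comes out as the 5×5 classification matrix the question asks for.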

by User Sandeep Datta (8.2k points)