5 votes
I am trying to code the following scenario in R:

• Graph the training and testing error against the number of trees using a classification random forest model for the presence of heart disease (target) using variables age (age), sex (sex), chest pain type (cp), resting blood pressure (trestbps), cholesterol measurement (chol), resting electrocardiographic measurement (restecg), exercise-induced angina (exang), and number of major vessels (ca). Use a maximum of 150 trees. Use set.seed(6522048).

I've tried several approaches, but the code either doesn't run or throws errors that I can't figure out. Any help would be great. Thank you.

by User Bgaluszka (7.9k points)

1 Answer

7 votes

Answer:

```R
library(randomForest)
library(ggplot2)

# Set seed for reproducibility
set.seed(6522048)

# Load the heart disease dataset (replace the placeholder path with your file)
data <- read.csv("path/to/heart_disease_dataset.csv")

# randomForest only fits a classification forest when the response is a factor
data$target <- as.factor(data$target)

# Split the data into training (70%) and testing (30%) sets
train_index <- sample(1:nrow(data), round(nrow(data) * 0.7), replace = FALSE)
train_set <- data[train_index, ]
test_set <- data[-train_index, ]

# Build the model formula from the requested predictors only
predictors <- c("age", "sex", "cp", "trestbps", "chol", "restecg", "exang", "ca")
rf_formula <- reformulate(predictors, response = "target")

# Create empty vectors to store the errors
max_trees <- 150
train_errors <- numeric(max_trees)
test_errors <- numeric(max_trees)

# Fit one forest per tree count and record the misclassification rates
for (i in seq_len(max_trees)) {
  model <- randomForest(rf_formula, data = train_set, mtry = 3, ntree = i)

  train_pred <- predict(model, train_set)
  train_errors[i] <- mean(train_pred != train_set$target)

  test_pred <- predict(model, test_set)
  test_errors[i] <- mean(test_pred != test_set$target)
}

# Create a data frame with the errors and number of trees
# (syntactic column names, so ggplot can refer to them directly)
df <- data.frame(n_trees = seq_len(max_trees),
                 train_error = train_errors,
                 test_error = test_errors)

# Plot the errors against the number of trees
ggplot(df, aes(x = n_trees)) +
  geom_line(aes(y = train_error, color = "Training error")) +
  geom_line(aes(y = test_error, color = "Testing error")) +
  scale_color_manual(values = c("Training error" = "red", "Testing error" = "blue")) +
  labs(x = "Number of trees", y = "Error rate", color = "Error type") +
  ggtitle("Random Forest Error Rates for Heart Disease Classification")
```
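The loop above refits a new forest for every tree count, so it trains 150 separate models. As a rough alternative sketch, one 150-tree forest can be fitted once and the error after the first k trees recovered from the per-tree votes that `predict(..., predict.all = TRUE)` returns. Several things here are assumptions for illustration and not part of the original answer: the data is a synthetic stand-in (only the column names come from the question), `target` is assumed to be coded 0/1, the helper `cumulative_error` is hypothetical, and ties in the vote go to class 0, whereas randomForest breaks ties at random, so the curve may differ slightly from the package's own aggregate predictions.

```r
library(randomForest)

set.seed(6522048)

# Synthetic stand-in data (an assumption for illustration, NOT the real dataset);
# only the column names follow the question.
n <- 300
heart <- data.frame(
  age      = round(runif(n, 30, 75)),
  sex      = rbinom(n, 1, 0.6),
  cp       = sample(0:3, n, replace = TRUE),
  trestbps = round(rnorm(n, 130, 15)),
  chol     = round(rnorm(n, 245, 50)),
  restecg  = sample(0:2, n, replace = TRUE),
  exang    = rbinom(n, 1, 0.3),
  ca       = sample(0:3, n, replace = TRUE)
)
signal <- 0.8 * heart$cp - 0.9 * heart$ca - 0.5 * heart$exang
heart$target <- factor(rbinom(n, 1, plogis(signal)))

# 70/30 train/test split
train_index <- sample(seq_len(n), round(0.7 * n))
train_set <- heart[train_index, ]
test_set  <- heart[-train_index, ]

# Fit a single forest with the maximum number of trees
rf <- randomForest(target ~ ., data = train_set, mtry = 3, ntree = 150)

# Hypothetical helper: error rate of the majority vote over the first k trees,
# for k = 1..ntree. Ties are assigned to class "0".
cumulative_error <- function(model, newdata, truth) {
  # n x ntree logical matrix: TRUE where a tree votes for class "1"
  tree_votes <- predict(model, newdata, predict.all = TRUE)$individual == "1"
  # Running count of "1" votes across trees, one row per observation
  running_votes <- t(apply(tree_votes, 1, cumsum))
  # Share of "1" votes among the first k trees
  vote_share <- sweep(running_votes, 2, seq_len(ncol(tree_votes)), "/")
  colMeans((vote_share > 0.5) != (truth == "1"))
}

train_err <- cumulative_error(rf, train_set, train_set$target)
test_err  <- cumulative_error(rf, test_set, test_set$target)

# Plot both curves against the number of trees
matplot(seq_along(train_err), cbind(train_err, test_err), type = "l", lty = 1,
        col = c("red", "blue"), xlab = "Number of trees", ylab = "Error rate")
legend("topright", legend = c("Training error", "Testing error"),
       col = c("red", "blue"), lty = 1)
```

Training error computed this way includes trees that saw each row during fitting, so it tends to drop toward zero quickly. The out-of-bag estimate stored in the first column of `rf$err.rate` is a less optimistic alternative if that is acceptable for the assignment.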

by User Maninvan (7.9k points)
