I am trying to code in R programming, this scenario: • Graph the training and testing error against the number of trees using a classification random forest model for the presen…

Question

asked Mar 10, 2024 51.7k views

I am trying to code in R programming, this scenario:

• Graph the training and testing error against the number of trees using a classification random forest model for the presence of heart disease (target) using variables age (age), sex (sex), chest pain type (cp), resting blood pressure (trestbps), cholesterol measurement (chol), resting electrocardiographic measurement (restecg), exercise-induced angina (exang), and number of major vessels (ca). Use a maximum of 150 trees. Use set.seed(6522048).

I've tried several approaches, but my code either doesn't run or throws errors that I can't figure out. Any help would be great. Thank you.

Bgaluszka asked


1 Answer


Maninvan · Answer 1 · 2024-03-14T11:44:57+0000

Answer:

```R
library(randomForest)
library(ggplot2)

# Set seed for reproducibility
set.seed(6522048)

# Load the heart disease dataset (replace the placeholder path with your own file)
data <- read.csv("path/to/heart_disease_dataset.csv")

# randomForest() only fits a classification forest when the response is a factor
data$target <- as.factor(data$target)

# Split the data into training (70%) and testing (30%) sets
train_index <- sample(1:nrow(data), round(nrow(data) * 0.7), replace = FALSE)
train_set <- data[train_index, ]
test_set <- data[-train_index, ]

# Build the model formula from the predictors named in the question,
# so that only these eight variables are used (not every column via `.`)
predictors <- c("age", "sex", "cp", "trestbps", "chol", "restecg", "exang", "ca")
rf_formula <- reformulate(predictors, response = "target")

# Create empty vectors to store the errors
max_trees <- 150
train_errors <- numeric(max_trees)
test_errors <- numeric(max_trees)

# Fit a forest with i trees for i = 1, ..., 150 and record both error rates
for (i in 1:max_trees) {
  model <- randomForest(rf_formula, data = train_set, mtry = 3, ntree = i)
  train_pred <- predict(model, train_set)
  train_errors[i] <- mean(train_pred != train_set$target)
  test_pred <- predict(model, test_set)
  test_errors[i] <- mean(test_pred != test_set$target)
}

# Create a data frame with the errors and number of trees.
# check.names = FALSE keeps the spaces in the column names; without it,
# data.frame() renames them (e.g. Number.of.trees) and the plot code fails.
df <- data.frame("Number of trees" = 1:max_trees,
                 "Training error" = train_errors,
                 "Testing error" = test_errors,
                 check.names = FALSE)

# Plot the errors against the number of trees
ggplot(df, aes(x = `Number of trees`)) +
  geom_line(aes(y = `Training error`, color = "Training error")) +
  geom_line(aes(y = `Testing error`, color = "Testing error")) +
  scale_color_manual(values = c("Training error" = "red", "Testing error" = "blue")) +
  labs(x = "Number of trees", y = "Error rate", color = "Error type") +
  ggtitle("Random Forest Error Rates for Heart Disease Classification")
```
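The loop above refits a separate forest for every tree count. As a possible alternative, here is a minimal sketch that fits one forest of 150 trees and reads the error after each tree is added. It relies on the `xtest` and `ytest` arguments of `randomForest()`. The helper name `rf_error_curve` is hypothetical, and it assumes `target` is already a factor with the same levels in both sets. Note that the first curve here is the out-of-bag (OOB) error on the training data, which is not the same quantity as the resubstitution training error computed in the loop, so check which one your assignment expects.

```R
library(randomForest)

# Hypothetical helper (not part of the original answer): fits ONE forest and
# returns the cumulative error rate after each tree has been added.
rf_error_curve <- function(train_set, test_set, predictors, ntree = 150) {
  model <- randomForest(
    x = train_set[, predictors, drop = FALSE],
    y = train_set$target,              # must be a factor for classification
    xtest = test_set[, predictors, drop = FALSE],
    ytest = test_set$target,
    ntree = ntree
  )
  data.frame(
    trees = seq_len(ntree),
    oob_error = model$err.rate[, 1],        # column 1: overall OOB error
    test_error = model$test$err.rate[, 1]   # column 1: overall test error
  )
}
```

With the `train_set`, `test_set`, and `predictors` objects from the answer, calling `rf_error_curve(train_set, test_set, predictors)` after `set.seed(6522048)` returns a data frame whose `oob_error` and `test_error` columns can be plotted against `trees`.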
