151k views
5 votes

Make use of the scikit-learn (sklearn) Python package in your function implementations. Complete the following functions in task4.py:

calculate_naive_metrics

Given a train dataframe, test dataframe, target_col, and naive assumption, split the target column out of the training and test dataframes to create feature dataframes and target series, then calculate (rounded to 4 decimal places) accuracy, recall, precision, and f1 score using the sklearn functions, the train and test target values, and the naive assumption.

calculate_logistic_regression_metrics

Given a train dataframe, test dataframe, target_col, and logreg_kwargs, split the target column out of the training and test dataframes to create feature dataframes and target series. Then train a logistic regression model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both training and test datasets. For feature importance, use the top 10 features selected by RFE, sorted by absolute value of the coefficient from biggest to smallest (make sure you use the same feature and importance column names as ModelMetrics shows in feat_name_col and imp_col, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`).

calculate_random_forest_metrics

Given a train dataframe, test dataframe, target_col, and rf_kwargs, split the target column out of the training and test dataframes to create feature dataframes and target series. Then train a random forest model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both training and test datasets. For feature importance, use the top 10 features from the built-in feature importance attribute, sorted from biggest to smallest (make sure you use the same feature and importance column names as ModelMetrics shows in feat_name_col and imp_col, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`).

calculate_gradient_boosting_metrics

Given a train dataframe, test dataframe, target_col, and gb_kwargs, split the target column out of the training and test dataframes to create feature dataframes and target series. Then train a gradient boosting model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both training and test datasets. For feature importance, use the top 10 features from the built-in feature importance attribute, sorted from biggest to smallest (same column-name and index requirements as above).

Submit task4.py to Gradescope.

1 Answer

4 votes

Final answer:

The functions calculate_naive_metrics, calculate_logistic_regression_metrics, calculate_random_forest_metrics, and calculate_gradient_boosting_metrics all follow the same pattern using the scikit-learn (sklearn) Python package. Each splits the target column out of the train and test dataframes to create feature dataframes and target series, then uses a different approach (a constant naive assumption, logistic regression, random forest, or gradient boosting) to generate predictions and calculate evaluation metrics such as accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve.

Step-by-step explanation:

calculate_naive_metrics

The calculate_naive_metrics function takes a train dataframe, test dataframe, target_col, and naive assumption as inputs. It splits the target column out of the training and test dataframes to create feature dataframes and target series. The naive "model" simply predicts the naive assumption (a single constant value) for every row, and sklearn's metric functions then compute the accuracy, recall, precision, and f1 score (rounded to 4 decimal places) against the train and test target values. A sketch follows.
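
A minimal sketch, assuming binary 0/1 targets; the assignment's ModelMetrics return type is not shown in the question, so this version returns a plain dict instead:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

def calculate_naive_metrics(train_df, test_df, target_col, naive_assumption):
    # Split the target column out of each dataframe
    train_targets = train_df[target_col]
    test_targets = test_df[target_col]

    # The naive model predicts the same constant value for every row
    train_preds = np.full(len(train_targets), naive_assumption)
    test_preds = np.full(len(test_targets), naive_assumption)

    metrics = {}
    for split, y_true, y_pred in [("train", train_targets, train_preds),
                                  ("test", test_targets, test_preds)]:
        metrics[split] = {
            "accuracy": round(accuracy_score(y_true, y_pred), 4),
            "recall": round(recall_score(y_true, y_pred, zero_division=0), 4),
            "precision": round(precision_score(y_true, y_pred, zero_division=0), 4),
            "f1": round(f1_score(y_true, y_pred, zero_division=0), 4),
        }
    return metrics
```

`zero_division=0` guards against the case where the constant prediction never matches the positive class, which would otherwise raise a warning.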

calculate_logistic_regression_metrics

The calculate_logistic_regression_metrics function takes a train dataframe, test dataframe, target_col, and logreg_kwargs as inputs. It splits the target column out of the training and test dataframes to create feature dataframes and target series. It then trains a logistic regression model (initialized with the kwargs) on the training data and produces both binary predictions and probability estimates on the training and test data. From those it calculates (rounded to 4 decimal places) the accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using the probability estimates for ROC AUC) for both datasets. It also builds the feature importance table from the top 10 features selected by RFE, sorted by the absolute value of the coefficients from biggest to smallest, with the index reset to 0-9. A sketch follows.
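
A sketch under the same assumptions as above; the `Feature` and `Importance` column names are placeholders that would need to match the (unshown) ModelMetrics feat_name_col and imp_col values, and the coefficients are read from the estimator that RFE refits on the 10 selected features, which is one reading of the spec:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import RFE
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, roc_auc_score)

def calculate_logistic_regression_metrics(train_df, test_df, target_col, logreg_kwargs):
    X_train = train_df.drop(columns=[target_col])
    y_train = train_df[target_col]
    X_test = test_df.drop(columns=[target_col])
    y_test = test_df[target_col]

    model = LogisticRegression(**logreg_kwargs)
    model.fit(X_train, y_train)

    metrics = {}
    for split, X, y in [("train", X_train, y_train), ("test", X_test, y_test)]:
        preds = model.predict(X)
        probs = model.predict_proba(X)[:, 1]  # probability of the positive class
        tn, fp, fn, tp = confusion_matrix(y, preds).ravel()
        metrics[split] = {
            "accuracy": round(accuracy_score(y, preds), 4),
            "recall": round(recall_score(y, preds), 4),
            "precision": round(precision_score(y, preds), 4),
            "f1": round(f1_score(y, preds), 4),
            "fpr": round(fp / (fp + tn), 4),
            "fnr": round(fn / (fn + tp), 4),
            "roc_auc": round(roc_auc_score(y, probs), 4),  # uses probabilities
        }

    # RFE selects the top 10 features; sort them by |coefficient|, descending
    rfe = RFE(LogisticRegression(**logreg_kwargs), n_features_to_select=10)
    rfe.fit(X_train, y_train)
    feat_imp = pd.DataFrame({
        "Feature": X_train.columns[rfe.support_],
        "Importance": rfe.estimator_.coef_[0],
    })
    feat_imp = feat_imp.reindex(
        feat_imp["Importance"].abs().sort_values(ascending=False).index
    ).reset_index(drop=True)  # index runs 0-9 as required
    return metrics, feat_imp
```

Note that the false positive and false negative rates come from the confusion matrix (FPR = FP / (FP + TN), FNR = FN / (FN + TP)) rather than from a dedicated sklearn scorer.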

calculate_random_forest_metrics

The calculate_random_forest_metrics function takes a train dataframe, test dataframe, target_col, and rf_kwargs as inputs. It splits the target column out of the training and test dataframes to create feature dataframes and target series. It then trains a random forest model (initialized with the kwargs) on the training data and produces both binary predictions and probability estimates on the training and test data. From those it calculates the same seven metrics (rounded to 4 decimal places) for both datasets, and builds the feature importance table from the model's built-in feature importance attribute, with the top 10 features sorted from biggest to smallest. A sketch of the importance step follows.
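
Since the metric loop is identical to the logistic regression sketch above, this sketch covers only the feature importance step; the helper name and default column names are hypothetical, and the real column names would need to match ModelMetrics feat_name_col and imp_col:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def random_forest_top10_importance(X_train, y_train, rf_kwargs,
                                   feat_name_col="Feature", imp_col="Importance"):
    # Train the forest with the caller-supplied keyword arguments
    model = RandomForestClassifier(**rf_kwargs)
    model.fit(X_train, y_train)

    # feature_importances_ values are non-negative, so a plain descending
    # sort (no absolute value needed, unlike the logistic coefficients)
    # gives the top 10 directly
    feat_imp = pd.DataFrame({
        feat_name_col: X_train.columns,
        imp_col: model.feature_importances_,
    })
    return (feat_imp.sort_values(imp_col, ascending=False)
                    .head(10)
                    .reset_index(drop=True))  # index runs 0-9 as required
```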

calculate_gradient_boosting_metrics

The calculate_gradient_boosting_metrics function takes a train dataframe, test dataframe, target_col, and gb_kwargs as inputs. It splits the target column out of the training and test dataframes to create feature dataframes and target series. It then trains a gradient boosting model (initialized with the kwargs) on the training data and produces both binary predictions and probability estimates on the training and test data. From those it calculates the same seven metrics (rounded to 4 decimal places) for both datasets, and builds the feature importance table from the built-in feature importance attribute exactly as in the random forest case. A sketch follows.
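
Because all three model functions compute the same seven numbers, one design choice is to factor the metric computation into a shared helper; `_binary_metrics` below is a hypothetical name, and the dict return values again stand in for the unshown ModelMetrics class:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, recall_score, precision_score,
                             f1_score, confusion_matrix, roc_auc_score)

def _binary_metrics(model, X, y):
    # Shared helper: the logistic, random forest, and gradient boosting
    # functions all need the same seven rounded metrics
    preds = model.predict(X)
    probs = model.predict_proba(X)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y, preds).ravel()
    return {
        "accuracy": round(accuracy_score(y, preds), 4),
        "recall": round(recall_score(y, preds), 4),
        "precision": round(precision_score(y, preds), 4),
        "f1": round(f1_score(y, preds), 4),
        "fpr": round(fp / (fp + tn), 4),
        "fnr": round(fn / (fn + tp), 4),
        "roc_auc": round(roc_auc_score(y, probs), 4),
    }

def calculate_gradient_boosting_metrics(train_df, test_df, target_col, gb_kwargs):
    X_train = train_df.drop(columns=[target_col])
    y_train = train_df[target_col]
    X_test = test_df.drop(columns=[target_col])
    y_test = test_df[target_col]

    model = GradientBoostingClassifier(**gb_kwargs)
    model.fit(X_train, y_train)

    train_metrics = _binary_metrics(model, X_train, y_train)
    test_metrics = _binary_metrics(model, X_test, y_test)
    # The feature importance table is built exactly as in the random
    # forest sketch above, from model.feature_importances_
    return train_metrics, test_metrics
```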

by Abhishek Shah (7.7k points)