The internetusage.csv dataset contains different statistics for all 50 states in the United States. Among those statistics are percentage of internet users in the state and percentage of individuals with bachelor's degrees.

Write a program that creates a model that uses the percentage of individuals with bachelor's degrees as input and returns the percentage of internet users in a state as output.

For example, if the input is:

User Pan
by
2.6k points

1 Answer


To create a model that takes the percentage of individuals with bachelor's degrees as input and returns the percentage of internet users in a state as output, you can use linear regression. Linear regression is a statistical method that models the relationship between a dependent variable (here, the percentage of internet users in a state) and one or more independent variables (here, the percentage of individuals with bachelor's degrees). With a single independent variable, this is simple linear regression: the model fits a straight line, defined by a slope and an intercept, to the data.
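To make the idea of fitting a line concrete, here is a minimal sketch of the closed-form least-squares fit for one input variable. The function name fit_line and the four data points are made up for illustration and are not taken from internetusage.csv.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope = covariance of x and y divided by the variance of x
    slope = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
    # The fitted line always passes through the point of means
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up illustrative values, NOT taken from internetusage.csv
bachelor_pct = [20.0, 25.0, 30.0, 35.0]
internet_pct = [60.0, 65.0, 70.0, 75.0]

slope, intercept = fit_line(bachelor_pct, internet_pct)
print(f"internet_users = {slope:.2f} * bachelor_degrees + {intercept:.2f}")
```

The toy points lie exactly on a line, so the fit recovers it; real survey data will scatter around the line instead.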

Here is an example of how you can implement a linear regression model in Python using the scikit-learn library:

# Import necessary libraries

from sklearn.linear_model import LinearRegression

import pandas as pd

# Load the dataset

data = pd.read_csv("internetusage.csv")

# Select the input (bachelor's degree percentage) and output (internet user percentage) columns

# The column names below are assumed; adjust them to match the header row of internetusage.csv

X = data[["percent_bachelor_degrees"]]

y = data["percent_internet_users"]

# Split the data into training and test sets (random_state fixes the split so results are reproducible)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the linear regression model

model = LinearRegression()

# Train the model on the training data

model.fit(X_train, y_train)

# Use the model to make predictions on the test data

y_pred = model.predict(X_test)

# Evaluate the model with the R-squared score (coefficient of determination) on the test data

from sklearn.metrics import r2_score

r2 = r2_score(y_test, y_pred)

print("R-squared:", r2)

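This code creates a linear regression model that takes the percentage of individuals with bachelor's degrees as input and returns the predicted percentage of internet users in a state as output. The model is trained on 80% of the data, and the remaining 20% is held out as a test set. The score printed to the console is R-squared (the coefficient of determination) computed by r2_score on the test set. It is not a classification accuracy: a value of 1 means the predictions match the test data exactly, and values near 0 mean the model does little better than always predicting the mean. With only 50 states, the test set contains just 10 rows, so the score can vary noticeably from one split to another.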
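To show what the score from r2_score measures, here is a small sketch that computes R-squared by hand and compares it with scikit-learn's result. The arrays y_true and y_hat hold made-up values for illustration only; they are not results from internetusage.csv.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up illustrative values, NOT results from internetusage.csv
y_true = np.array([60.0, 65.0, 70.0, 75.0])  # observed percentages
y_hat = np.array([61.0, 64.0, 71.0, 74.0])   # predicted percentages

# Residual sum of squares: how far the predictions are from the observations
ss_res = ((y_true - y_hat) ** 2).sum()
# Total sum of squares: how far the observations are from their own mean
ss_tot = ((y_true - y_true.mean()) ** 2).sum()

r2_manual = 1.0 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_hat)

print("Manual R-squared: ", r2_manual)
print("sklearn R-squared:", r2_sklearn)
```

Both values agree, which confirms that the score compares the model's squared errors against those of a baseline that always predicts the mean.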

You can then use the model to make predictions on new data by using the predict method, like this:

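# Make a prediction using the model

percent_bachelor_degrees = 60

# Pass a DataFrame with the training column name so scikit-learn does not warn about missing feature names

new_state = pd.DataFrame({"percent_bachelor_degrees": [percent_bachelor_degrees]})

prediction = model.predict(new_state)

print("Predicted percentage of internet users:", prediction[0])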

This code predicts the percentage of internet users for a state in which 60% of individuals hold a bachelor's degree and prints the result to the console.
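Since the question asks for a program that takes an input value, the prediction step could be wrapped in a small helper that parses raw text, for example a string returned by input(). This is only a sketch: the helper name predict_from_text, the 0 to 100 range check, the column names, and the toy training data are all assumptions for illustration, not part of the original dataset or assignment.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names; adjust to match the header row of internetusage.csv
FEATURE = "percent_bachelor_degrees"
TARGET = "percent_internet_users"


def predict_from_text(model, raw_value):
    """Parse a raw text value (e.g. from input()) and return the predicted percentage."""
    pct = float(raw_value)
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percentage must be between 0 and 100")
    # Use the training column name so the feature names match the fitted model
    features = pd.DataFrame({FEATURE: [pct]})
    return float(model.predict(features)[0])


# Made-up illustrative values, NOT taken from internetusage.csv
toy = pd.DataFrame({
    FEATURE: [20.0, 25.0, 30.0, 35.0],
    TARGET: [60.0, 65.0, 70.0, 75.0],
})
toy_model = LinearRegression().fit(toy[[FEATURE]], toy[TARGET])

result = predict_from_text(toy_model, "30")
print(f"Predicted percentage of internet users: {result:.1f}")
```

In the real program, toy_model would be replaced by the model trained on internetusage.csv, and the string "30" by the value read from the user.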