4 votes
Using the dataset us_contagious_diseases, compare the disease rates of Florida, New York and Illinois using a time series plot, where rate = count / population * 10000 * 52 / weeks_reporting. Comment on the plot. (Submit the Word file including comments, a screenshot of the code, and the graph.) Write the code in R.

User RLave
by
7.8k points

1 Answer

6 votes

Final answer:

To compare disease rates across states, compute the rate in R and draw a time series plot with one line per state. Trends and potential outbreaks can then be read from how each state's line rises and falls over time.

Step-by-step explanation:

The question pertains to the analysis of a dataset named us_contagious_diseases, particularly focusing on the comparison of the disease rates in Florida, New York, and Illinois over time.

To visualize this comparison, build a time series plot in R with the rate calculated as specified: count / population * 10000 * 52 / weeks_reporting. Dividing by population puts states of different sizes on a common per-10,000 scale, and the factor 52 / weeks_reporting scales up years in which fewer than 52 weeks were reported. Because of these two adjustments, the rates can be compared directly across states and years, which raw case counts would not allow.
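As a quick check of the arithmetic, here is a minimal R sketch of the rate formula. The function name annual_rate and the input numbers are made up for illustration; they are not taken from the dataset.

```r
# Hypothetical helper: annualized cases per 10,000 people,
# scaled up for years in which fewer than 52 weeks were reported.
annual_rate <- function(count, population, weeks_reporting) {
  count / population * 10000 * 52 / weeks_reporting
}

# Made-up inputs: 100 cases in a population of 1,000,000 with 26 weeks reported,
# versus 200 cases in the same population with all 52 weeks reported.
r_half <- annual_rate(count = 100, population = 1e6, weeks_reporting = 26)
r_full <- annual_rate(count = 200, population = 1e6, weeks_reporting = 52)
print(c(r_half, r_full))
```

Both calls give about 2 cases per 10,000: half the cases over half the reporting weeks annualize to the same figure, which is the point of the 52 / weeks_reporting factor.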

Sample R code for creating a time series plot:

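# Sample code to create a time series plot from dslabs::us_contagious_diseases
library(dslabs)
library(ggplot2)
# Keep the three states; drop rows with no reporting weeks to avoid dividing by zero
states <- c("Florida", "New York", "Illinois")
df <- subset(us_contagious_diseases, state %in% states & weeks_reporting > 0)
# Annualized rate per 10,000 people, adjusted for the weeks actually reported
df$rate <- df$count / df$population * 10000 * 52 / df$weeks_reporting
# The data are yearly, so year goes on the x-axis; one panel per disease, one line per state
ggplot(data = df, aes(x = year, y = rate, color = state)) +
  geom_line() +
  facet_wrap(~disease, scales = "free_y") +
  theme_minimal() +
  labs(title = "Disease Rate Comparison", x = "Year", y = "Rate per 10,000")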

By analyzing the plot, epidemiologists can determine if there has been a sporadic outbreak or if a disease is becoming more common in a certain region. They can also assess whether the incidence of the disease is constant, declining, or trending towards an epidemic.
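One rough way to put a number on "constant" versus "declining" is the least-squares slope of rate against year for each state. The sketch below uses invented toy data with placeholder state labels "A" and "B", not values from us_contagious_diseases. A single slope is a crude summary and can miss short outbreak spikes, so it supplements the plot rather than replaces it.

```r
# Toy data, invented for illustration only (not from us_contagious_diseases).
toy <- data.frame(
  state = rep(c("A", "B"), each = 5),
  year  = rep(2000:2004, times = 2),
  rate  = c(10, 8, 6, 4, 2,   3, 3, 3, 3, 3)
)

# Least-squares slope of rate against year, one value per state.
slopes <- sapply(split(toy, toy$state), function(d) {
  unname(coef(lm(rate ~ year, data = d))["year"])
})
print(slopes)
```

A clearly negative slope (state "A" here) points to a declining rate, and a slope near zero (state "B") to a roughly constant one.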

The plot shows how each state's rate changes over time. It makes it easy to see whether a trend is roughly linear or non-linear, and whether the three states rise and fall together or diverge.

User Isanka Wijerathne
by
7.8k points