Researchers at the National Cancer Institute released the results of a study that investigated the effect of weed-killing herbicides on house pets. They randomly sampled 400 dogs from homes where an herbicide was used on a regular basis, diagnosing lymphoma in 230 of them. Of 200 dogs randomly sampled from homes where no herbicides were used, only 25 were found to have lymphoma. For this problem, let p1 be the population proportion of dogs that get lymphoma from homes where herbicides are used and p2 be the population proportion of dogs that get lymphoma from homes where herbicides are not used. Researchers are interested in learning about the difference in the proportion of cancer diagnoses between the two groups. What is the 95% confidence interval for the difference in the proportion of cancer diagnoses between the two groups?

1 Answer

4 votes

Answer:

The 95% confidence interval for the difference in the proportion of cancer diagnoses between the two groups is (0.3834, 0.5166).

Explanation:

Before building the confidence interval, we need to understand the central limit theorem and subtraction between normal variables.

Central Limit Theorem

The Central Limit Theorem establishes that, for a normally distributed random variable X with mean \mu and standard deviation \sigma, the sampling distribution of the sample means of size n can be approximated by a normal distribution with mean \mu and standard deviation s = \frac{\sigma}{\sqrt{n}}.

For a skewed variable, the Central Limit Theorem can also be applied, as long as n is at least 30.

For a proportion p in a sample of size n, the sampling distribution of the sample proportion will be approximately normal with mean \mu = p and standard deviation s = \sqrt{\frac{p(1-p)}{n}}.
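As a quick illustration of the formula for the standard deviation of a sample proportion, here is a minimal Python sketch. The function name `proportion_sd` and the example values p = 0.5, n = 100 are assumptions chosen for illustration; they are not part of the problem.

```python
import math

def proportion_sd(p, n):
    """Standard deviation of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

# Illustrative values only (not from the problem): p = 0.5, n = 100
example_sd = proportion_sd(0.5, 100)
print(example_sd)  # approximately 0.05
```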

Subtraction between normal variables:

When two independent normal variables are subtracted, the mean of the result is the difference of the means, while the standard deviation is the square root of the sum of the variances.
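The subtraction rule can be checked with a small simulation. This is only a sketch: the means, standard deviations, sample count, and seed below are arbitrary assumptions, and the simulated values will only be close to the theoretical ones, not equal to them.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Illustrative parameters (assumptions, not from the problem)
mu_x, sigma_x = 10.0, 3.0
mu_y, sigma_y = 4.0, 4.0

# Draw independent pairs and subtract them
diffs = [random.gauss(mu_x, sigma_x) - random.gauss(mu_y, sigma_y)
         for _ in range(20000)]

sim_mean = statistics.fmean(diffs)
sim_sd = statistics.stdev(diffs)

# Theory: difference of means, square root of the sum of variances
theory_mean = mu_x - mu_y
theory_sd = (sigma_x**2 + sigma_y**2) ** 0.5

print(sim_mean, theory_mean)
print(sim_sd, theory_sd)
```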

They randomly sampled 400 dogs from homes where an herbicide was used on a regular basis, diagnosing lymphoma in 230 of them.

This means that:


p_h = \frac{230}{400} = 0.575, s_h = \sqrt{\frac{0.575 \cdot 0.425}{400}} = 0.0247

Of 200 dogs randomly sampled from homes where no herbicides were used, only 25 were found to have lymphoma.

This means that:


p_n = \frac{25}{200} = 0.125, s_n = \sqrt{\frac{0.125 \cdot 0.875}{200}} = 0.0234

Distribution of the difference:


p = p_h - p_n = 0.575 - 0.125 = 0.45


s = \sqrt{s_h^2 + s_n^2} = \sqrt{0.0247^2 + 0.0234^2} = 0.034
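The arithmetic for the two samples and their difference can be reproduced with a short Python sketch. The variable names are chosen here for illustration; the counts come from the problem statement.

```python
import math

# Counts from the problem statement
x_h, n_h = 230, 400   # homes where herbicides are used
x_n, n_n = 25, 200    # homes where no herbicides are used

# Sample proportions
p_h = x_h / n_h
p_n = x_n / n_n

# Standard deviation of each sample proportion
s_h = math.sqrt(p_h * (1 - p_h) / n_h)
s_n = math.sqrt(p_n * (1 - p_n) / n_n)

# Distribution of the difference
diff = p_h - p_n
s_diff = math.sqrt(s_h**2 + s_n**2)

print(round(p_h, 3), round(s_h, 4))
print(round(p_n, 3), round(s_n, 4))
print(round(diff, 2), round(s_diff, 3))
```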

Confidence interval:

The confidence interval is:


p \pm zs

In which z is the z-score that has a p-value (cumulative probability in the standard normal table) of 1 - \frac{\alpha}{2}.

95% confidence level

So \alpha = 0.05, and z is the value of Z that has a p-value of 1 - \frac{0.05}{2} = 0.975, so Z = 1.96.
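Instead of reading the z-table, the critical value can also be looked up in code. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

alpha = 0.05  # for a 95% confidence level

# Inverse CDF of the standard normal at 1 - alpha/2 = 0.975
z = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z, 2))
```

Rounded to two decimals, this matches the table value of 1.96 used in the answer.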

The lower bound is
0.45 - 1.96(0.034) = 0.3834

The upper bound is
0.45 + 1.96(0.034) = 0.5166

The 95% confidence interval for the difference in the proportion of cancer diagnoses between the two groups is (0.3834, 0.5166).
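For anyone who wants to check the whole calculation at once, here is a minimal Python sketch of the same procedure. The function name `two_proportion_ci` and its parameters are assumptions made for this example, not a library API.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the normal approximation."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Standard deviation of the difference: sqrt of the sum of variances
    s = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z-score with cumulative probability 1 - alpha/2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff - z * s, diff + z * s

lower, upper = two_proportion_ci(230, 400, 25, 200)
print(round(lower, 4), round(upper, 4))
```

Because this sketch does not round s to 0.034 or z to 1.96 along the way, its bounds can differ from (0.3834, 0.5166) in the fourth decimal place.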

User NikxDa
by
6.1k points