29 votes
Use the box plots comparing the number of males and number of females attending the latest superhero movie each day for a month to answer the questions. Part A: Estimate the IQR for the males' data. Part B: Estimate the difference between the median values of each data set. Part C: Describe the distribution of the data and if the mean or median would be a better measure of center for each. Part D: Provide a possible reason for the outlier in the data set.

User Trees
by
3.0k points

1 Answer

9 votes

Part A

IQR stands for interquartile range. It is the width of the interval around the median that contains the middle 50% of the data.

If we find the value in the middle of the ordered data, i.e., the median, then half of the values are less than the median and the other half are greater than it.

Now, if we find the middle value of each of those halves, call them Q1 (lower half) and Q3 (upper half), the interquartile range is the difference between those values:

IQR = Q3 - Q1

(25% of all the values lie between Q1 and the median, and another 25% lie between the median and Q3.)

In a box plot, the interquartile range spans from the left end of the box (Q1) to the right end of the box (Q3).

Now, for the males' data, we see that the left end of the box is located between 0 and 10, so we can say that Q1 is approximately 5. Similarly, the right end of the box is located between 10 and 20, so we can say Q3 is approximately 15.

Therefore, for the males' data:

IQR ≅ 15 - 5 = 10
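
The median-of-halves procedure described above can be sketched in Python. The raw attendance counts are not given in the question, so the list `males` below is hypothetical data invented to match the estimates read off the plot, and the helper `quartiles` is an illustrative name, not part of any library.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Needs at least two values.
    """
    ordered = sorted(values)
    # Size of each half; the middle value is left out when the count is odd.
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return median(lower), median(ordered), median(upper)

# Hypothetical daily counts, invented for illustration only.
males = [0, 3, 5, 7, 8, 9, 11, 13, 15, 20, 35]

q1, med, q3 = quartiles(males)
iqr = q3 - q1
print(q1, med, q3, iqr)  # 5 9 15 10
```

Other quartile conventions exist (for example, interpolating between neighbouring values), so software may report slightly different Q1 and Q3 for the same data.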

Part B

The median in a box plot is represented by the vertical line inside the box. For the males' data, the median Mm is

Mm ≅ 9

and for the females' data, the median Mf is

Mf ≅ 7

Therefore, the difference between those medians can be estimated to be

Mm - Mf ≅ 9 - 7 = 2

Part C

In the males' box plot, 75% of the data lie between the left end of the plot (0) and the right end of the box (approximately 15), while the right whisker extends much farther, to about 35. Because the upper tail is so much longer than the rest of the plot, we say the distribution is right-skewed.

In this case, since a few values lie far above most of the data, the median is a better measure of center: the mean would be pulled to the right by those large values.
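
As a rough illustration of how a long right tail pulls the mean above the median, here is a short Python sketch. The list `males` is hypothetical data invented for this example, not the actual counts behind the box plot.

```python
from statistics import mean, median

# Hypothetical right-skewed daily counts, invented for illustration only.
males = [0, 3, 5, 7, 8, 9, 11, 13, 15, 20, 35]

sample_mean = mean(males)      # sensitive to the large values in the tail
sample_median = median(males)  # depends only on the middle of the ordered data

print(round(sample_mean, 2), sample_median)  # 11.45 9
```

With these made-up numbers the mean sits more than two units above the median, which is the pattern expected for right-skewed data.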

In the females' box plot, the data are distributed more evenly.

Nevertheless, there is an outlier, so the mean would be affected by its value and would not reflect the center of the data well. Therefore, the median is also the better measure of center for this data set.
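
The effect of a single outlier on the mean and the median can be checked with a small Python sketch. Two assumptions are made here: the list `females` is hypothetical data invented for the example, and outliers are flagged with the common 1.5 × IQR rule, which the question does not say the plot actually uses.

```python
from statistics import mean, median

# Hypothetical daily counts with one unusually large value, invented for illustration.
females = [2, 4, 5, 6, 7, 7, 8, 9, 10, 12, 30]

ordered = sorted(females)
half = len(ordered) // 2
q1 = median(ordered[:half])    # median of the lower half
q3 = median(ordered[-half:])   # median of the upper half
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR outside the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in ordered if x < low_fence or x > high_fence]
trimmed = [x for x in ordered if x not in outliers]

print(outliers)  # [30]
print(mean(ordered), mean(trimmed))      # mean with and without the outlier
print(median(ordered), median(trimmed))  # median with and without the outlier
```

For this made-up sample, removing the outlier lowers the mean from about 9.1 to 7, while the median stays at 7 either way.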

Part D

One possible reason for the outlier in the data set is a recording mistake, or an observation collected in a different way from the others, because it lies so far from the rest of the data.

User Maarty
by
3.2k points