52.1k views
2 votes
How could we detect outliers when there are multiple dimensions?

User Kevinmicke
by
8.2k points

1 Answer

6 votes

Final answer:

To detect outliers in data with more than one variable, fit a line of best fit, compute each point's residual, and flag points that lie more than two standard deviations from the line. Graphing calculators and similar tools let you do this visually or numerically. After flagging an outlier, you can remove it and refit the line to see how the correlation coefficient and the model fit change.

Step-by-step explanation:

One way to detect outliers in data with more than one variable is to examine the residuals, the vertical distances between each data point and the line of best fit. On a scatter plot, draw two extra lines parallel to the best-fit line, one 2s above it and one 2s below it, where s is the standard deviation of the residuals. On a graphing calculator the best-fit line is often stored as Y1, so the extra lines are Y1 + 2s and Y1 - 2s. Any data point whose vertical distance from the best-fit line is greater than 2s can be flagged as a potential outlier.

Regression analysis can then quantify the influence of a flagged point: remove it, refit the line, and compare the correlation coefficient and overall fit with and without it. TI-83, 83+, and 84+ graphing calculators support the graphical approach, and the same check can be done numerically by comparing each residual with 2s for a precise determination.

For example, in a data set relating scores on a third exam to scores on a final exam, a student who scored 65 on the third exam and 175 on the final exam would be flagged as an outlier if that point lies more than two standard deviations from the regression line.
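The numerical version of this check can be sketched in a few lines of Python. This is only an illustration, not the calculator procedure itself: the function names `fit_line`, `flag_outliers`, and `correlation` are made up for this example, the sketch assumes s is computed as sqrt(SSE / (n - 2)), and every score in the sample data is invented except the (65, 175) point mentioned above.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, k=2.0):
    """Indices of points whose residual is more than k * s from the line.

    Assumption: s = sqrt(SSE / (n - 2)), the standard deviation of the
    residuals for a line fitted with two parameters.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(r * r for r in residuals)
    s = math.sqrt(sse / (len(xs) - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * s]

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical exam scores; only the first pair (65, 175) comes from the text.
third_exam = [65, 60, 62, 68, 70, 72, 74, 76, 78, 80]
final_exam = [175, 121, 123, 137, 139, 145, 147, 153, 155, 161]

flagged = flag_outliers(third_exam, final_exam)
print("flagged indices:", flagged)

# Refit without the flagged points to see how r changes.
kept = [i for i in range(len(third_exam)) if i not in flagged]
r_all = correlation(third_exam, final_exam)
r_kept = correlation([third_exam[i] for i in kept], [final_exam[i] for i in kept])
print("r with all points:", round(r_all, 4))
print("r without flagged points:", round(r_kept, 4))
```

The multiplier `k` is left as a parameter because two standard deviations is a convention rather than a hard rule. A flagged point is a candidate for closer inspection, not automatically an error.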

User Nathan Bertram
by
7.4k points
