0 votes
Which of the two similarity measures, Jaccard or Correlation, would you use for the following data? Briefly explain.

1. Data set of responses to True/False test questions by a set of students. For a pair of students, the measure should capture similarity between the answers of the two students.
2. Data set of languages spoken by a set of students. For a pair of students, the measure should capture the extent to which they speak the same languages.

1 Answer

3 votes

Final answer:

For both examples provided, the Jaccard similarity measure is more appropriate because it effectively handles binary data, such as True/False answers and the presence/absence of languages spoken, by comparing the intersection over the union of the sets.

Step-by-step explanation:

When choosing between the Jaccard similarity measure and Correlation for a given data set, it's important to understand the type of data and the comparison being made.

Jaccard Similarity

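For a set of students' True/False test responses, the Jaccard similarity measure would be the more suitable choice. The Jaccard index measures the similarity of two finite sets as the size of their intersection divided by the size of their union. Here, each student's responses are treated as the set of questions answered True, so the index is the number of questions both students answered True divided by the number of questions at least one of them answered True. Questions that both students answered False are left out of the calculation; counting every matching answer (both True or both False) against the total number of questions gives the simple matching coefficient, not the Jaccard index.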
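As a rough illustration of that calculation, here is a minimal Python sketch. The function name `jaccard_binary`, the answer lists, and the choice to return 1.0 when neither student answered True are all assumptions made for this example, not part of the original question.

```python
def jaccard_binary(a, b):
    """Jaccard index for two equal-length True/False answer lists."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    # Questions that both students answered True (the intersection).
    both_true = sum(1 for x, y in zip(a, b) if x and y)
    # Questions that at least one student answered True (the union).
    # Questions both answered False are excluded from both counts.
    either_true = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: if nobody answered True, treat the pair as identical.
    return both_true / either_true if either_true else 1.0


# Hypothetical answers to five questions (illustrative data only).
student_a = [True, True, False, False, True]
student_b = [True, False, False, False, True]

# 2 questions answered True by both, 3 answered True by at least one: 2/3.
print(jaccard_binary(student_a, student_b))
```

Note that questions 3 and 4, which both students answered False, have no effect on the result.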

Correlation

Correlation measures the strength and direction of the linear relationship between two variables. For example, if you were comparing SAT scores and GPA, correlation analysis would tell you whether higher SAT scores tend to go along with higher GPAs.
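To make the idea concrete, the following sketch computes the Pearson correlation coefficient by hand. The function name `pearson` and the SAT and GPA values are invented for illustration; they are not real data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores, made up so that GPA rises in a straight line with SAT.
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]

# A perfectly linear, increasing relationship gives a coefficient of 1.
print(pearson(sat, gpa))
```

Reversing the order of one list would give a coefficient of -1, a perfectly linear decreasing relationship.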

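For the languages spoken by a set of students, however, Jaccard would again be preferable. Each student can be represented by the presence or absence of each language, and the many languages that neither student speaks say nothing about how similar the two are. Jaccard ignores those shared absences and counts only the languages that at least one of the two students speaks.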
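For the languages case, a sketch using Python sets might look like this. The function name `jaccard_sets`, the two students, and their languages are hypothetical, and returning 1.0 for two empty sets is an assumed convention.

```python
def jaccard_sets(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    # Assumed convention: two empty sets count as identical.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical students and languages (illustrative data only).
student_a = {"English", "Spanish"}
student_b = {"English", "French"}

# 1 shared language out of 3 spoken by at least one of them: 1/3.
print(jaccard_sets(student_a, student_b))
```

Languages that neither student speaks never enter the sets, so they cannot inflate the similarity.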

Answered by user Kirill Savik (8.5k points)