Consider the following 1D dataset, where the first element (numeric) in each pair represents the attribute (a.k.a. feature) and the second element (string) is the class label:
[(0.8, 'Montreal'), (0.85, 'Toronto'), (0.9, 'Toronto'), (1, 'Montreal'), (1.5, 'Montreal')]
1. What is the entropy of this dataset?
A. 0.37
B. 0.47
C. 0.57
D. 0.67
E. 0.77
2. What is the Gini index of this dataset?
A. 0.378
B. 0.48
C. 0.58
D. 0.68
E. 0.78

by User J Prakash (7.0k points)

1 Answer

Final answer:

The entropy of the dataset [(0.8, 'Montreal'), (0.85, 'Toronto'), (0.9, 'Toronto'), (1, 'Montreal'), (1.5, 'Montreal')] is approximately 0.97.

Step-by-step explanation:

The entropy of a dataset is the entropy of its class-label distribution; the attribute values do not enter the calculation. Entropy measures the amount of uncertainty in the labels. The formula is: -Σ(p*log2(p)), where the sum runs over the class labels and p is the proportion of samples carrying each label. In this dataset, we have 2 class labels, 'Montreal' and 'Toronto'.

First, we calculate the probability of 'Montreal', which labels 3 of the 5 samples (0.8, 1 and 1.5):

p(Montreal) = 3/5 = 0.6

Then, we calculate the probability of 'Toronto', which labels the remaining 2 samples (0.85 and 0.9):

p(Toronto) = 2/5 = 0.4

Substituting these values into the entropy formula, we get:

-((0.6*log2(0.6)) + (0.4*log2(0.4)))

Calculating this expression, we find that the entropy of the dataset is approximately 0.97095.
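
The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library; the helper name `entropy` is an illustrative choice and does not come from the question.

```python
from collections import Counter
from math import log2

# Dataset from the question: (attribute, class label) pairs.
data = [(0.8, 'Montreal'), (0.85, 'Toronto'), (0.9, 'Toronto'),
        (1, 'Montreal'), (1.5, 'Montreal')]

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels (hypothetical helper)."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Only the class labels matter; the numeric attribute is ignored.
labels = [label for _, label in data]
label_counts = Counter(labels)  # 3 x 'Montreal', 2 x 'Toronto'
h = entropy(labels)

print(label_counts)
print(round(h, 5))  # 0.97095
```

Because the entropy formula is symmetric in the two probabilities, a 3/2 split gives the same value whichever label is in the majority.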

Rounded to two decimal places this is 0.97, which is not among the listed options; the nearest listed choice is E. 0.77.
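
The second part of the question, the Gini index, is not worked out in this answer. Assuming the usual impurity definition, 1 - Σ(p^2) over the class labels, a rough sketch of the calculation looks like this (the function name `gini_index` is an illustrative choice):

```python
from collections import Counter

# Class labels from the question's dataset, in their original order.
labels = ['Montreal', 'Toronto', 'Toronto', 'Montreal', 'Montreal']

def gini_index(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

g = gini_index(labels)
print(round(g, 3))  # 0.48
```

By hand this is 1 - (0.6^2 + 0.4^2) = 1 - 0.52 = 0.48, which under that definition corresponds to option B.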

by User Bryan B (8.0k points)