99.0k views
1 vote
Given a training set of 100 numbers with 91 zeros and 1 each of the other digits 1-9, what is the unigram perplexity for the following test set: 00000300 0 0?

User Svintus
by
7.9k points

1 Answer

7 votes

Final answer:

To calculate the unigram perplexity, you need to determine the probabilities of each digit in the training and test sets. By multiplying the probabilities of each digit in the test set and taking the reciprocal, you can find the unigram perplexity. In this case, the unigram perplexity for the given test set is approximately 4098.

Step-by-step explanation:

To calculate the unigram perplexity, we first need to understand what constitutes a unigram. In this case, a unigram refers to the frequency of each digit in the training set. Given that there are 91 zeros and 1 of each of the other digits, we can calculate the probability of each digit by dividing its frequency by the total number of digits in the training set. The unigram perplexity for the test set can then be calculated by multiplying the probabilities of the digits in the test set and taking the reciprocal.

Let's calculate the unigram perplexity for the given test set:

  1. Calculate the probabilities of each digit in the training set:
    Probability of zero: 91/100 = 0.91
    Probability of each other digit: 1/100 = 0.01
  2. Calculate the probabilities of each digit in the test set:
    Probability of 0: 5/8 = 0.625
    Probability of each other digit: 1/8 = 0.125
  3. Multiply the probabilities of each digit in the test set:
    0.625 × 0.125 × 0.125 × 0.625 × 0.625 × 0.625 × 0.125 × 0.125 × 0.01 × 0.625 = 0.000244
  4. Take the reciprocal of the result:
    1/0.000244 ≈ 4098

Therefore, the unigram perplexity for the given test set is approximately 4098.

User Mike Wise
by
8.3k points