3 votes
Suppose a book contains 500,000 words in total. Among those 500,000 words we have the following counts: N(‘I’) = 3000, N(‘love’) = 500, N(‘Chinese’) = 50, N(‘food’) = 300, N(‘I love’) = 200, N(‘love I’) = 30, N(‘love food’) = 100, N(‘Chinese love’) = 40, N(‘love Chinese’) = 20, N(‘Chinese food’) = 50, N(‘food Chinese’) = 2. Using a bigram language model without smoothing, estimate how many times you would find ‘I love Chinese food’ in the text.

User JJoao
by
7.5k points

1 Answer

6 votes

Final answer:

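To estimate the number of times the phrase 'I love Chinese food' appears in the text, compute the probability of the phrase under the bigram model and multiply it by the total number of words. With the bigram assumption, P('I love Chinese food') = P('I') * P('love' | 'I') * P('Chinese' | 'love') * P('food' | 'Chinese'), where each conditional probability is a ratio of counts, for example P('love' | 'I') = N('I love') / N('I'). Plugging in the given counts gives a probability of 0.000016, so the estimated count is about 500,000 * 0.000016 = 8.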

Step-by-step explanation:

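To estimate how many times you would find the phrase 'I love Chinese food' in the given text using a bigram language model without smoothing, we first calculate the probability of the phrase as a product of one unigram probability and three conditional bigram probabilities, each taken directly from the counts (no smoothing):
P('I') = N('I') / 500,000 = 3000 / 500,000 = 0.006
P('love' | 'I') = N('I love') / N('I') = 200 / 3000 ≈ 0.0667
P('Chinese' | 'love') = N('love Chinese') / N('love') = 20 / 500 = 0.04
P('food' | 'Chinese') = N('Chinese food') / N('Chinese') = 50 / 50 = 1
So P('I love Chinese food') = 0.006 * (200 / 3000) * 0.04 * 1 = 0.000016. The text offers roughly 500,000 starting positions for the phrase, so the expected count is 500,000 * 0.000016 = 8. Equivalently, N('I love Chinese food') ≈ N('I love') * N('love Chinese') / N('love') * N('Chinese food') / N('Chinese') = 200 * (20 / 500) * (50 / 50) = 8. The bigram counts must be divided by the counts of their first words; multiplying raw counts does not give a probability.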
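
The calculation can be checked with a short script. This is a minimal sketch, assuming the counts from the question are stored in plain dictionaries; the function name `bigram_expected_count` and the variable names are illustrative and not part of the original question.

```python
from typing import Dict, List, Tuple

TOTAL_WORDS = 500_000

# Counts given in the question.
unigram_counts: Dict[str, int] = {"I": 3000, "love": 500, "Chinese": 50, "food": 300}
bigram_counts: Dict[Tuple[str, str], int] = {
    ("I", "love"): 200,
    ("love", "I"): 30,
    ("love", "food"): 100,
    ("Chinese", "love"): 40,
    ("love", "Chinese"): 20,
    ("Chinese", "food"): 50,
    ("food", "Chinese"): 2,
}


def bigram_expected_count(
    words: List[str],
    unigrams: Dict[str, int],
    bigrams: Dict[Tuple[str, str], int],
    total: int,
) -> float:
    """Expected number of occurrences of `words` under an unsmoothed bigram model."""
    # Unigram probability of the first word.
    prob = unigrams[words[0]] / total
    # Conditional probability of each following word given its predecessor.
    for prev, cur in zip(words, words[1:]):
        # Without smoothing, an unseen bigram contributes probability 0.
        prob *= bigrams.get((prev, cur), 0) / unigrams[prev]
    # Approximate the number of starting positions by the total word count.
    return prob * total


estimate = bigram_expected_count(
    ["I", "love", "Chinese", "food"], unigram_counts, bigram_counts, TOTAL_WORDS
)
print(f"{estimate:.2f}")  # prints 8.00
```

The sketch treats the total word count as the number of possible starting positions, which is the same approximation used in the hand calculation.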

User Ross Symonds
by
8.0k points