Final Answer:
(a) Pointwise Mutual Information (PMI) is a measure used to assess the association between two words in a corpus. It is defined as \[ PMI(x, y) = \log \left( \frac{P(x, y)}{P(x)P(y)} \right) \], where \( P(x, y) \) is the probability of the co-occurrence of words \( x \) and \( y \), and \( P(x) \) and \( P(y) \) are their individual probabilities. The value of PMI tends to be higher when \( x \) and \( y \) co-occur in meaningful ways because it compares the observed co-occurrence with the expected co-occurrence under independence. Higher values indicate a stronger association than would be expected by chance.
(b) Assuming the corpus has been normalized and tokenized, PMI is computed for the word pairs \( \text{PMI(you, vous)} \) and \( \text{PMI(sell, vendre)} \) by counting how often each word occurs in the aligned sentence pairs and how often the two words occur together in the same pair. With no smoothing, the probabilities \( P(x, y) \), \( P(x) \), and \( P(y) \) are maximum-likelihood estimates taken directly from these corpus counts.
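The counting procedure for sentence-aligned data can be sketched as follows. The toy corpus here is a hypothetical stand-in (the original question's corpus is not reproduced), and the tokens are assumed to be lemmatized so that "vendre" appears in its dictionary form.

```python
import math

def pmi_from_pairs(pairs, x, y):
    """Estimate PMI(x, y) from sentence-aligned pairs using raw counts (no smoothing).

    pairs: list of (source_tokens, target_tokens) tuples.
    A "co-occurrence" means x appears on the source side and y on the
    target side of the same aligned sentence pair.
    """
    n = len(pairs)
    c_x = sum(1 for e, f in pairs if x in e)
    c_y = sum(1 for e, f in pairs if y in f)
    c_xy = sum(1 for e, f in pairs if x in e and y in f)
    # MLE probabilities: P(x,y) = c_xy/n, P(x) = c_x/n, P(y) = c_y/n.
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

# Hypothetical lemmatized toy corpus, not from the original question.
corpus = [
    ("you sell".split(), "vous vendre".split()),
    ("you buy".split(), "vous acheter".split()),
    ("we sell".split(), "nous vendre".split()),
    ("we buy".split(), "nous acheter".split()),
]
print(pmi_from_pairs(corpus, "you", "vous"))   # -> 1.0
print(pmi_from_pairs(corpus, "sell", "vendre"))  # -> 1.0
```

In this toy corpus "you" and "vous" each occur in half the pairs and always together, so the observed co-occurrence (1/2) is twice the expected one (1/4), giving a PMI of 1 bit.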
(c) The advantage of using PMI to discover translations is its ability to identify word pairs with strong associations, capturing nuanced relationships. However, it is unreliable for rare words, whose low counts inflate PMI, and for words with multiple meanings. It also relies heavily on the quality and representativeness of the parallel corpus, and because co-occurrence is counted at the sentence level, it ignores word order and cannot distinguish among several candidate translations within the same sentence pair. Despite these limitations, PMI offers a valuable approach to extracting translation lexicons from large parallel corpora.
Step-by-step explanation:
(a) Pointwise Mutual Information (PMI) is a measure that assesses the association between two words in a corpus. It is defined as the logarithm of the ratio of the observed co-occurrence of words \( x \) and \( y \) (\( P(x, y) \)) to the expected co-occurrence under independence (\( P(x)P(y) \)). In other words, it quantifies the deviation of the observed co-occurrence from what would be expected by chance. Higher PMI values indicate a stronger association between words, suggesting they are more likely to be translations.
(b) In the context of the given corpus, assuming it's normalized and tokenized, the computation of PMI involves counting the co-occurrences of the specified word pairs. For example, \( \text{PMI(you, vous)} \) would be calculated using the formula and the relevant counts of co-occurrence. Note that no smoothing is applied in this scenario: probabilities are estimated directly from the raw counts, so a pair that never co-occurs has \( P(x, y) = 0 \) and a PMI of negative infinity.
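The unsmoothed, count-based estimation can be shown in isolation; the counts below are invented for illustration. The zero-count branch makes explicit why unsmoothed PMI breaks down for pairs that never co-occur.

```python
import math

def mle_pmi(c_xy, c_x, c_y, n):
    """PMI from raw counts using maximum-likelihood estimates, no smoothing.

    c_xy: co-occurrence count, c_x/c_y: marginal counts, n: corpus size.
    Returns -inf when the pair never co-occurs, which is why
    unsmoothed PMI is unreliable for rare events.
    """
    if c_xy == 0:
        return float("-inf")
    # log2((c_xy/n) / ((c_x/n)*(c_y/n))) simplifies to the expression below.
    return math.log2((c_xy * n) / (c_x * c_y))

print(mle_pmi(0, 5, 3, 100))  # -> -inf
print(mle_pmi(3, 5, 3, 100))  # log2(20), about 4.32
```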
(c) The advantages of using PMI for discovering translations lie in its ability to capture subtle semantic relationships and identify word pairs with strong associations. However, it has limitations, such as inflated scores for rare words, ambiguity with polysemous terms, reliance on the quality of the parallel corpus, and an inability to distinguish among multiple candidate translations within the same sentence pair, since co-occurrence is counted at the sentence level. Despite these drawbacks, PMI serves as a valuable tool in extracting translation lexicons, offering insights into the relationships between words in different languages based on their co-occurrence patterns.