229k views
4 votes
Transcribed image text: Consider the following text: retrieve remove data retrieved reduce [3+2+3=8M]
a. How many character trigram dictionary entries are generated by indexing the trigrams in the terms in the text above? Use the special character $ to denote the beginning and end of terms.
b. How would the wild-card query re*ve be most efficiently expressed as an AND query using the trigram index over the text above?
c. Explain the necessary steps involved in processing the wild-card query red using the trigram index over the text above.

User Kklw
by
7.8k points

1 Answer

1 vote

Answer:

a. To generate the character trigram dictionary entries, add a $ at the beginning and end of each term and slide a three-character window over the padded term, so a term of n characters yields n trigrams. "retrieve" becomes "$re", "ret", "etr", "tri", "rie", "iev", "eve", "ve$"; "remove" becomes "$re", "rem", "emo", "mov", "ove", "ve$"; "data" becomes "$da", "dat", "ata", "ta$"; "retrieved" becomes "$re", "ret", "etr", "tri", "rie", "iev", "eve", "ved", "ed$"; and "reduce" becomes "$re", "red", "edu", "duc", "uce", "ce$". Merging the trigrams from all terms and dropping duplicates leaves 23 distinct trigrams: 8 from "retrieve", plus 4 new ones from "remove", 4 from "data", 2 from "retrieved", and 5 from "reduce". The dictionary therefore has 23 entries: {"$re", "ret", "etr", "tri", "rie", "iev", "eve", "ve$", "rem", "emo", "mov", "ove", "$da", "dat", "ata", "ta$", "ved", "ed$", "red", "edu", "duc", "uce", "ce$"}.
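The count in part (a) can be checked with a few lines of Python. This is only a minimal sketch: the helper name `trigrams` is made up for illustration, and it assumes terms are separated by whitespace and padded with a single $ on each side, as the question asks.

```python
def trigrams(term):
    """Return the character trigrams of a term padded with '$' at both ends."""
    padded = f"${term}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

text = "retrieve remove data retrieved reduce"

# The dictionary holds each distinct trigram once, however many terms contain it.
dictionary = set()
for term in text.split():
    dictionary.update(trigrams(term))

print(trigrams("retrieve"))  # ['$re', 'ret', 'etr', 'tri', 'rie', 'iev', 'eve', 've$']
print(len(dictionary))       # 23
```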

b. Reading the query as "re*ve", with * as the wild-card, pad it with $ to get "$re*ve$" and take the trigrams on each side of the wild-card. The prefix "$re" and the suffix "ve$" are each exactly one trigram, and no trigram can span the *, so the most efficient form is the two-term query $re AND ve$. In the index over the text above, "$re" lists "retrieve", "remove", "retrieved", and "reduce", while "ve$" lists "retrieve" and "remove". Intersecting the two lists gives "retrieve" and "remove", both of which match the query.
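As a rough illustration of part (b), the sketch below builds a small trigram index over the five terms and turns a wild-card pattern into an AND of trigrams. It assumes the query is "re*ve" with * as the wild-card; the function names (`build_index`, `wildcard_to_trigrams`) are invented for this example and do not come from any library.

```python
from collections import defaultdict

def trigrams(s):
    """Set of all 3-character substrings of s (empty if s is shorter than 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(terms):
    """Map each trigram of the $-padded terms to the set of terms containing it."""
    index = defaultdict(set)
    for term in terms:
        for gram in trigrams(f"${term}$"):
            index[gram].add(term)
    return index

def wildcard_to_trigrams(pattern):
    """Pad the pattern with $, split at each *, and collect trigrams per piece."""
    grams = set()
    for piece in f"${pattern}$".split("*"):
        grams |= trigrams(piece)
    return grams

terms = "retrieve remove data retrieved reduce".split()
index = build_index(terms)

grams = wildcard_to_trigrams("re*ve")
matches = set.intersection(*(index[g] for g in grams))

print(" AND ".join(sorted(grams)))  # $re AND ve$
print(sorted(matches))              # ['remove', 'retrieve']
```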

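c. The wild-card character is not visible in the query as transcribed; assuming the intended query is "red*", the steps are as follows. (1) Pad the query with $ to get "$red*". Because the wild-card is trailing, no end-of-term trigram is produced. (2) Split the part before the wild-card into trigrams: "$re" and "red". (3) Look up each trigram in the index: "$re" lists "retrieve", "remove", "retrieved", and "reduce", while "red" lists only "reduce". (4) Intersect the lists, i.e. evaluate $re AND red, which leaves the candidate "reduce". (5) Post-filter the candidates against the original pattern, because the AND only guarantees that each trigram occurs somewhere in the term, not that the term matches. A term such as "retired", for example, contains both "$re" and "red" but does not match "red*". Here "reduce" passes the filter and is the only answer.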
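The lookup, intersection, and post-filtering steps for part (c) might look like the sketch below. It assumes the query is "red*", uses the standard-library `fnmatch.fnmatchcase` for the final pattern check, and adds the term "retired" (which is not in the original text) purely to show why the post-filter is needed. The function name `wildcard_lookup` is hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

def trigrams(s):
    """Set of all 3-character substrings of s (empty if s is shorter than 3)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_index(terms):
    """Map each trigram of the $-padded terms to the set of terms containing it."""
    index = defaultdict(set)
    for term in terms:
        for gram in trigrams(f"${term}$"):
            index[gram].add(term)
    return index

def wildcard_lookup(pattern, index):
    """Return (candidates, matches) for a wild-card pattern using '*'."""
    # Steps 1-2: pad with $, split at the wild-card, collect query trigrams.
    grams = set()
    for piece in f"${pattern}$".split("*"):
        grams |= trigrams(piece)
    # Steps 3-4: fetch each postings set and intersect them.
    postings = [index.get(g, set()) for g in grams]
    candidates = set.intersection(*postings) if postings else set()
    # Step 5: post-filter candidates against the original pattern.
    matches = {t for t in candidates if fnmatchcase(t, pattern)}
    return candidates, matches

terms = "retrieve remove data retrieved reduce".split()
print(wildcard_lookup("red*", build_index(terms)))

# "retired" is an extra, made-up term: it survives the AND but fails the filter.
candidates, matches = wildcard_lookup("red*", build_index(terms + ["retired"]))
print(sorted(candidates))  # ['reduce', 'retired']
print(sorted(matches))     # ['reduce']
```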

User Dheeraj Bhaskar
by
8.8k points