59.8k views
4 votes
I want answers for

Q3(c)+(d)
Q5
Q6
Consider the following document collection:
d1 = "Big cats are nice and funny"
d2 = "Small dogs are better than big dogs"
d3 = "Small cats are afraid of small dogs"
d4 = "Big cats are not afraid of small dogs"
d5 = "Funny cats are not afraid of small dogs"
1. For the preprocessing, answer the following points:
(a) Compute the tokens for each document
(b) Normalize the tokens with respect to plurals and upper/lower case
(c) Compute the dictionary for the document collection
2. Build the document-term incidence matrix as required by the Boolean model.
3. Consider a Boolean model.
(a) Answer the query q1 = funny AND dog
(b) Answer the query q2 = nice OR dog
(c) Answer the query q3 = big AND dog AND NOT funny
(d) Translate query q3 into Disjunctive Normal Form, considering a dictionary = {big, cat, funny, small, dog}
4. Build the document-term weight matrix using as term frequency
(a) the number of occurrences of the term in each document
(b) the normalized number of occurrences
(c) the logarithmic number of occurrences
5. Build the document-term weight matrix using as term-frequency.
6. Rank the documents with respect to query q = { big - cat - afraid } using the normalized term-frequency model (used in question 5).
(a) Use the Euclidean distance
(b) Use cosine similarity
(c) Use Jaccard similarity

2 Answers

2 votes

Final answer:

We are asked to sort the words into three groups based on whether they contain the letter 'd' or not.

Step-by-step explanation:

Based on the given question, we are asked to sort the twenty words into three groups based on whether they contain the letter 'd' or not. Here are the three groups of words:

  • Group 1: Words that contain the letter 'd': dogs, afraid, and enjoyed.
  • Group 2: Words that do not contain the letter 'd': cats, big, small, are, nice, and funny.
  • Group 3: Words that contain the letter 'd' more than once: afraid and enjoyed.

By categorizing the words in this way, we can easily identify which words have the letter 'd' and how many times it appears in each word.

User Djheru
by
7.9k points
2 votes

Final answer:

The Boolean model answers a query based only on the presence or absence of terms in each document. Each term maps to the set of documents that contain it, and the operators AND, OR, and NOT correspond to set intersection, union, and complement. A query in Disjunctive Normal Form (DNF) is a disjunction (OR) of conjunctions (AND) of terms or negated terms.

Step-by-step explanation:

Boolean Model

(a) Query: funny AND dog

Answer: After normalization, the term 'funny' occurs in documents d1 and d5, and the term 'dog' occurs in documents d2, d3, d4, and d5. AND is the intersection of the two document sets: {d1, d5} ∩ {d2, d3, d4, d5} = {d5}. Therefore, 'funny AND dog' returns document d5.

(b) Query: nice OR dog

Answer: The term 'nice' occurs in document d1 and the term 'dog' occurs in documents d2, d3, d4, and d5. OR is the union of the two document sets: {d1} ∪ {d2, d3, d4, d5} = {d1, d2, d3, d4, d5}. Therefore, 'nice OR dog' returns all five documents.

(c) Query: big AND dog AND NOT funny

Answer: The term 'big' occurs in documents d1, d2, and d4, the term 'dog' occurs in documents d2, d3, d4, and d5, and the term 'funny' occurs in documents d1 and d5. The intersection of 'big' and 'dog' is {d2, d4}. NOT funny removes every document that contains 'funny' (d1 and d5), which leaves {d2, d4} unchanged. Therefore, 'big AND dog AND NOT funny' returns documents d2 and d4.
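The three queries can also be checked mechanically. The following is a minimal sketch, not a reference implementation: it assumes a naive plural rule (drop a trailing "s" from words longer than three letters) that is adequate only for this toy collection, and the names `normalize`, `terms`, `index`, and `postings` are illustrative.

```python
import re

docs = {
    "d1": "Big cats are nice and funny",
    "d2": "Small dogs are better than big dogs",
    "d3": "Small cats are afraid of small dogs",
    "d4": "Big cats are not afraid of small dogs",
    "d5": "Funny cats are not afraid of small dogs",
}

def normalize(token):
    """Lowercase and strip a plural 's' (naive rule, assumed for this collection only)."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s"):
        token = token[:-1]
    return token

def terms(text):
    """Return the set of normalized terms in a document."""
    return {normalize(t) for t in re.findall(r"[A-Za-z]+", text)}

# Inverted index: term -> set of ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for term in terms(text):
        index.setdefault(term, set()).add(doc_id)

def postings(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return index.get(term, set())

q1 = postings("funny") & postings("dog")                      # AND -> intersection
q2 = postings("nice") | postings("dog")                       # OR  -> union
q3 = (postings("big") & postings("dog")) - postings("funny")  # AND NOT -> difference

print(sorted(q1))  # ['d5']
print(sorted(q2))  # ['d1', 'd2', 'd3', 'd4', 'd5']
print(sorted(q3))  # ['d2', 'd4']
```

Using set difference for AND NOT avoids materializing the complement of 'funny' over the whole collection, which gives the same result here.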

(d) Query: big AND dog AND NOT funny

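Answer: A query in Disjunctive Normal Form (DNF) is a disjunction (OR) of conjunctions (AND). As written, 'big AND dog AND NOT funny' is a single conjunction, so it is already in DNF. With respect to the dictionary {big, cat, funny, small, dog}, the full DNF lists every combination of the five terms that satisfies the query. The terms 'cat' and 'small' do not appear in q3, so they are free and each can be present or absent, which gives four conjunctions: (big AND cat AND NOT funny AND small AND dog) OR (big AND cat AND NOT funny AND NOT small AND dog) OR (big AND NOT cat AND NOT funny AND small AND dog) OR (big AND NOT cat AND NOT funny AND NOT small AND dog). As binary vectors over (big, cat, funny, small, dog), these are (1,1,0,1,1), (1,1,0,0,1), (1,0,0,1,1), and (1,0,0,0,1).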
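One way to derive the full DNF over the dictionary is to enumerate all 2^5 presence/absence assignments and keep those that satisfy q3. The sketch below assumes that reading of part (d); the helper names `q3` and `render` are illustrative.

```python
from itertools import product

dictionary = ["big", "cat", "funny", "small", "dog"]

def q3(v):
    """q3 = big AND dog AND NOT funny, evaluated on a term -> 0/1 assignment."""
    return bool(v["big"] and v["dog"] and not v["funny"])

# Keep every assignment of the five terms that makes q3 true.
minterms = []
for bits in product([1, 0], repeat=len(dictionary)):
    if q3(dict(zip(dictionary, bits))):
        minterms.append(bits)

def render(bits):
    """Write one assignment as a conjunction of terms and negated terms."""
    literals = [t if b else f"NOT {t}" for t, b in zip(dictionary, bits)]
    return "(" + " AND ".join(literals) + ")"

dnf = " OR ".join(render(b) for b in minterms)

print(minterms)  # [(1, 1, 0, 1, 1), (1, 1, 0, 0, 1), (1, 0, 0, 1, 1), (1, 0, 0, 0, 1)]
print(dnf)
```

Exhaustive enumeration is practical only for small dictionaries, since the number of assignments doubles with every added term.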

User Fmarm
by
7.5k points