138k views
0 votes
Consider the following hash function. Let U be the universe of strings composed of the characters from the alphabet Σ=[A,…,Z], and let the function f(x

i

) return the index of a letter x
i

∈Σ, e.g., f( A)=1 and f(Z)=26. Finally, for an m-character string x∈Σ
m
, define h(x)=([∑
i=1
m

f(x
i

)]modℓ), where ℓ is the number of buckets in the hash table. That is, our hash function sums up the index values of the characters of a string x and maps that value onto one of the ℓ buckets. Suppose this is going to be used to hash words from a large body of English text. List at least 4 reasons why h(x) is a bad hash function relative to the ideal behavior of uniform hashing.

1 Answer

2 votes

Answer:

There are several reasons why the given hash function h(x) is a bad hash function relative to the ideal behavior of uniform hashing when used to hash words from a large body of English text:

1. The hash function does not take into account the order of the characters in the string. Anagrams will have the same hash value, which will lead to collisions.

2. The hash function is not sensitive to permutations of the characters in the string. For example, the strings "ABC" and "ACB" will have the same hash value, which will lead to collisions.

3. The hash function is not sensitive to the frequency of occurrence of the characters in the string. For example, the strings "AAB" and "ABB" will have the same hash value, which will lead to collisions.

4. The hash function is not sensitive to the length of the string. For example, the strings "A" and "AA" will have the same hash value, which will lead to collisions.

All of these factors will result in poor performance of the hash function, with many collisions and poor distribution of keys across the hash table.

User Eliah
by
7.6k points
Welcome to QAmmunity.org, where you can ask questions and receive answers from other members of our community.