151k views
4 votes
The following table gives the frequencies of the letters of the English language (including the blank for separating words) in a particular corpus. blank 18.3% r 4.8% y 1.6% e 10.2% d 3.5% p 1.6% t 7.7% l 3.4% b 1.3% a 6.8% c 2.6% v 0.9% o 5.9% u 2.4% k 0.6% i 5.8% m 2.1% j 0.2% n 5.5% w 1.9% x 0.2% s 5.1% f 1.8% q 0.1% h 4.9% g 1.7% z 0.1% What is the optimum Huffman encoding of this alphabet?

User Burrell
by
7.3k points

1 Answer

3 votes

To find the optimum Huffman encoding for this alphabet, we need to assign shorter codes to the more frequent letters and longer codes to the less frequent letters. This will help reduce the overall length of the encoded message.

Here is a step-by-step approach to finding the optimum Huffman encoding:

1. Start by listing all the letters in the alphabet along with their frequencies in descending order:

blank 18.3%

e 10.2%

t 7.7%

a 6.8%

o 5.9%

i 5.8%

n 5.5%

s 5.1%

h 4.9%

r 4.8%

d 3.5%

l 3.4%

u 2.4%

m 2.1%

c 2.6%

f 1.8%

p 1.6%

y 1.6%

w 1.9%

g 1.7%

b 1.3%

v 0.9%

k 0.6%

x 0.2%

j 0.2%

z 0.1%

q 0.1%

2. Create a binary tree, where each leaf node represents a letter and its frequency. The tree starts with the two least frequent letters as sibling nodes, and their parent node has a frequency equal to the sum of their frequencies.

3. Repeat the following steps until all nodes are connected and the tree is complete:

a. Combine the two nodes with the lowest frequencies to create a new parent node. The frequency of the parent node is the sum of the frequencies of the two child nodes.

b. Remove the two child nodes from the list and add the parent node to the list.

c. Sort the list in ascending order based on the frequencies.

4. Assign "0" as the code for the left child node and "1" as the code for the right child node. Repeat step 3 until the tree is complete.

5. The Huffman encoding for each letter is obtained by traversing the tree from the root node to the respective leaf node, recording "0" for a left branch and "1" for a right branch.

For example, following the steps above, we obtain the following Huffman encoding for the given alphabet:

blank: 00

r: 010

y: 0110

e: 100

d: 0101

p: 0111

t: 101

l: 01111

b: 1000

a: 110

c: 1001

v: 1110

o: 1010

u: 1101

k: 11110

i: 1011

m: 11111

j: 111110

n: 10101

w: 1100

x: 111111

s: 10100

f: 10001

q: 1111110

h: 01110

g: 10000

z: 11111110

Please note that the Huffman encoding obtained can vary slightly depending on the implementation, but the general principle remains the same.

User Boxed
by
8.3k points