In this assessment you are required to write two MapReduce programs and run them on Hadoop, using the Hadoop Distributed File System (HDFS).

First, create a directory for this assessment called test1 within the /home/prac/ directory, as we normally do in practicals. From there you should be able to follow the directions in Practical 2 to write and run your MapReduce programs. Our input file (or data) for this assessment will be the text version of The Time Machine by H. G. Wells, which is in the public domain.
Copy the file into your input folder and rename it to test_input.txt. Then, create the /user/prac/test1 directory within HDFS.
Lastly, upload the input file (test_input.txt) to the /user/prac/test1/input directory in HDFS, and run the programs using Hadoop Streaming; a sketch of these steps follows below.
The output for Q1 should go in /user/prac/test1/output1 and the output for Q2 in /user/prac/test1/output2.
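A minimal sketch of the setup and run commands, assuming a standard Hadoop installation; the streaming jar location and the script names mapper1.py/reducer1.py are illustrative and vary by setup:

    # Local working directory and input file (the source filename is illustrative)
    mkdir -p /home/prac/test1/input
    cp the_time_machine.txt /home/prac/test1/input/test_input.txt

    # Create the HDFS directories and upload the input file
    hdfs dfs -mkdir -p /user/prac/test1/input
    hdfs dfs -put /home/prac/test1/input/test_input.txt /user/prac/test1/input

    # Scripts must be executable so Streaming can invoke them directly
    chmod +x mapper1.py reducer1.py

    # Run Q1 with Hadoop Streaming; the jar path depends on your installation
    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -input /user/prac/test1/input \
        -output /user/prac/test1/output1 \
        -mapper mapper1.py \
        -reducer reducer1.py \
        -file mapper1.py \
        -file reducer1.py

    # Q2 runs the same way with mapper2.py/reducer2.py and -output /user/prac/test1/output2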
Q1. Write a MapReduce program to determine the frequency of word lengths within the input file.
Q2. Write a MapReduce program to determine the total number of English alphabetic characters in the input file.


1 Answer


Final answer:

To determine word length frequencies and the total length of English alphabetic characters in an input file, you can use MapReduce programs in Hadoop. The Map function would tokenize the text and the Reduce function would aggregate the results.

Step-by-step explanation:

Finding Word Length Frequencies

To determine the frequency of word lengths within an input file, you can use a MapReduce program in Hadoop. The Map function would tokenize the input text and emit key-value pairs, with the key being the length of the word and the value being 1. The Reduce function would then sum up the values for each key and output the word length frequencies.
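A minimal sketch of the two scripts for Hadoop Streaming, assuming Python and the hypothetical file names mapper1.py/reducer1.py, and assuming punctuation should be stripped before measuring word length (the assignment does not specify this):

    #!/usr/bin/env python
    # mapper1.py -- emit (word_length, 1) for every word read from stdin
    import sys

    for line in sys.stdin:
        for word in line.split():
            # Assumption: drop punctuation so it does not inflate word lengths
            word = ''.join(c for c in word if c.isalpha())
            if word:
                print('%d\t1' % len(word))

    #!/usr/bin/env python
    # reducer1.py -- sum the 1s for each word length
    # Streaming sorts mapper output by key, so identical keys arrive consecutively
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.strip().split('\t')
        if key != current_key:
            if current_key is not None:
                print('%s\t%d' % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print('%s\t%d' % (current_key, count))

Each line of the final output in /user/prac/test1/output1 is then a word length followed by the number of words of that length in test_input.txt.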

Calculating the Total Number of English Alphabetic Characters

To determine the total number of English alphabetic characters in the input file, you can again use a MapReduce program. The Map function would count the alphabetic characters in each line (equivalently, filter out non-alphabetic characters and take the remaining length) and emit key-value pairs, with the key being a fixed value (e.g., 'total') and the value being that count. The Reduce function would then sum up all the values and output the grand total.
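Under the same assumptions (Python, hypothetical names mapper2.py/reducer2.py), a sketch that counts per line rather than per word, which yields the same total:

    #!/usr/bin/env python
    # mapper2.py -- emit ('total', n) where n is the number of English
    # alphabetic characters (a-z, A-Z) on the line
    import sys

    for line in sys.stdin:
        n = sum(1 for c in line if 'a' <= c <= 'z' or 'A' <= c <= 'Z')
        if n:
            print('total\t%d' % n)

    #!/usr/bin/env python
    # reducer2.py -- sum all per-line counts; every pair shares the key 'total',
    # so a single running sum is enough
    import sys

    total = 0
    for line in sys.stdin:
        key, value = line.strip().split('\t')
        total += int(value)
    print('total\t%d' % total)

Using a single fixed key funnels every value to one reducer, which is fine here because the final result is a single number.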

Note:

With Hadoop Streaming, the Map and Reduce functions can be implemented in any language that reads from standard input and writes to standard output, such as the Python used in the sketches above; MapReduce jobs can also be written directly in Java.
