Write a Pig script and run it in local mode on this data to find the top 10 states by land area. Because the script runs in local mode, do not save the result in HDFS; save it in the local file system (your cluster home directory) instead. Include the source code and the output file in your submission. The land areas of the top 10 states lie between 25*10^10 and 15*10^11.
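The data set itself is not shown in the question, so the following is only a minimal Python sketch of the logic a Pig script for this task would express (ORDER ... BY area DESC, then LIMIT 10, then STORE to a local file). It assumes a hypothetical tab-separated layout of state<TAB>land_area with integer areas; the real file may have more columns. The function names and the sample rows are hypothetical.

```python
import csv
import io


def top_states_by_area(lines, n=10):
    """Return the n (state, area) pairs with the largest land area.

    Assumes each line is 'state<TAB>land_area' with an integer area
    (a hypothetical layout; adapt the column positions to the real data).
    """
    rows = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        state, area = row[0], int(row[1])
        rows.append((state, area))
    # Same idea as Pig's ORDER ... BY area DESC followed by LIMIT n.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]


def write_tsv(rows, out):
    """Write the result as tab-separated text, like a local-mode STORE."""
    for state, area in rows:
        out.write(f"{state}\t{area}\n")


if __name__ == "__main__":
    # Made-up rows purely for illustration.
    sample = [f"State{i}\t{i * 10**10}" for i in range(1, 16)]
    buf = io.StringIO()
    write_tsv(top_states_by_area(sample), buf)
    print(buf.getvalue(), end="")
```

In the actual assignment, the sorting and limiting would be done by Pig itself; the sketch only shows what the output file should contain.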

User Michaldo



Step-by-step explanation:

Apache Pig script execution modes

Local mode: In local mode (pig -x local), Pig runs in a single JVM and reads from and writes to the local file system. The data does not need to be stored in Hadoop HDFS; you can work directly with files on the local disk.

MapReduce mode: In MapReduce mode (pig or pig -x mapreduce, the default), the input data must be stored in HDFS, and Pig translates the script into MapReduce jobs that run on the Hadoop cluster.
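The only difference when launching the two modes is the -x flag. As a small sketch, this hypothetical Python helper builds the pig command line for either mode; the script name top_states.pig is made up, and actually running it assumes the pig launcher is installed and on PATH.

```python
import shlex
import subprocess

# Only the two modes described above are accepted here.
VALID_MODES = {"local", "mapreduce"}


def pig_command(script, mode="local"):
    """Build the argv list for running a Pig script in the given mode.

    'local' reads/writes the local file system; 'mapreduce' reads/writes HDFS.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["pig", "-x", mode, script]


def run_pig(script, mode="local"):
    # Requires the 'pig' launcher on PATH; not called in the demo below.
    return subprocess.run(pig_command(script, mode), check=True)


if __name__ == "__main__":
    print(shlex.join(pig_command("top_states.pig")))
    print(shlex.join(pig_command("top_states.pig", "mapreduce")))
```

For the assignment above, the local-mode form is the relevant one, since the output must go to the local file system rather than HDFS.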

Apache Pig Script in MapReduce mode

Suppose the task is to read records from a data file and display the required fields in the terminal.

The sample data file contains the following data:

(Image: contents of the sample file information.txt, source: Edureka)

Save the text file as 'information.txt'.

The sample data file contains five tab-separated columns: First Name, Last Name, Mobile Number, City, and Profession. The task is to read this file from HDFS and display all columns of every record.
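In Pig, this step would typically be a LOAD with PigStorage('\t') and a five-field schema, followed by DUMP. The Python sketch below mirrors that idea so the parsing is easy to see; the field names and the two sample rows are hypothetical, since the real file contents appear only in the image.

```python
import csv
import io
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical field names for the five columns described above.
    first_name: str
    last_name: str
    mobile: str  # kept as text so leading zeros are not lost
    city: str
    profession: str


def load_records(text):
    """Parse tab-separated text into Records (like PigStorage('\\t') with a schema)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [Record(*row) for row in reader if row]


def dump(records):
    # Print every column of every record as a tuple, similar in spirit to Pig's DUMP.
    for r in records:
        print("(" + ",".join(r) + ")")


# Made-up sample rows for illustration only.
SAMPLE = "Ann\tLee\t5550100\tBoston\tEngineer\nRaj\tRao\t5550101\tPune\tTeacher\n"

if __name__ == "__main__":
    dump(load_records(SAMPLE))
```

A row with the wrong number of columns raises a TypeError when the Record is built, which makes malformed input easy to spot.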

To process this data with Pig in MapReduce mode, the file must first be copied into Apache Hadoop HDFS.

Command: hadoop fs -copyFromLocal /home/edureka/information.txt /edureka

User Trobrock