Under the MapReduce v1 programming model, what happens in the Map step?

A) Data is split into smaller chunks
B) Data is processed in parallel across multiple nodes
C) Data is sorted and shuffled
D) Data is aggregated and reduced

1 Answer


Final answer:

In the Map step of the MapReduce v1 programming model, data is processed in parallel across multiple nodes where it is transformed into intermediate key-value pairs by the Mapper.
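To illustrate "processed in parallel", here is a minimal local simulation, not Hadoop's API: threads from `concurrent.futures` stand in for cluster nodes, and `word_count_mapper` and `run_map_step` are hypothetical names. Each split is handed to an independent mapper, and each mapper emits its own list of intermediate key-value pairs without seeing any other mapper's data.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count_mapper(split):
    """Hypothetical word-count map function: emits (word, 1) for every word in its split."""
    pairs = []
    for line in split:
        for word in line.split():
            pairs.append((word.lower(), 1))
    return pairs


def run_map_step(splits, mapper, workers=3):
    # Each split goes to an independent mapper; threads stand in for cluster nodes.
    # pool.map returns results in the same order as the input splits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mapper, splits))


splits = [
    ["the quick brown fox"],
    ["the lazy dog"],
    ["the end"],
]
map_outputs = run_map_step(splits, word_count_mapper)
for i, out in enumerate(map_outputs):
    print(i, out)
```

Note that the key `the` appears in all three outputs: nothing in the Map step brings those pairs together.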

Step-by-step explanation:

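Under the MapReduce v1 programming model, the framework first divides the input into splits, and the JobTracker schedules one Mapper task per split on the TaskTrackers, preferably on nodes that already hold that data. Each Mapper therefore works on a small, distinct chunk of data. The Map step is where this data is actually processed, although none of the given options describes everything that happens in it. Specifically, the Map step involves: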

  • Reading data from input sources (typically from HDFS in Hadoop).
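  • Applying the user-defined map function to each input record, which transforms or filters it and emits zero or more intermediate key-value pairs.
  • Partitioning those intermediate pairs by key (one partition per reducer) and writing them to the mapper's local disk rather than to HDFS. Hadoop sorts each partition locally, but no global sorting or shuffling across mappers happens yet; that comes later in the pipeline.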
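The three bullets above can be sketched as one map task in plain Python. This is an illustration under stated assumptions, not Hadoop code: `run_map_task`, `partition_for` and `word_count_map` are hypothetical names, a temporary directory stands in for the TaskTracker's local disk, JSON files stand in for Hadoop's spill files, and `zlib.crc32` replaces Hadoop's hash partitioner because Python's built-in `hash()` for strings is randomized per process.

```python
import json
import os
import tempfile
import zlib


def partition_for(key, num_reducers):
    # Stand-in for a hash partitioner: the same key always lands in the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(split, map_fn, num_reducers, local_dir):
    """Hypothetical map task: read its split, apply map_fn, partition the
    intermediate pairs and write them to local disk (one file per reducer)."""
    partitions = [[] for _ in range(num_reducers)]
    for record in split:                        # 1. read input records
        for key, value in map_fn(record):       # 2. apply the map function
            partitions[partition_for(key, num_reducers)].append((key, value))
    paths = []
    for r, pairs in enumerate(partitions):      # 3. write to local disk
        pairs.sort(key=lambda kv: kv[0])        # local sort within this task only
        path = os.path.join(local_dir, f"map-part-{r}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(pairs, f)
        paths.append(path)
    return paths


def word_count_map(line):
    # Hypothetical map function: one (word, 1) pair per word.
    return [(w.lower(), 1) for w in line.split()]


with tempfile.TemporaryDirectory() as tmp:
    files = run_map_task(["b a c", "a b"], word_count_map, 2, tmp)
    spilled = []
    for p in files:
        with open(p, encoding="utf-8") as f:
            spilled.append(json.load(f))

print(spilled)
```

Each output file is sorted only within this one task; merging the matching partitions from every mapper is left to the shuffle phase.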

Therefore, the correct answer is B) Data is processed in parallel across multiple nodes. The Map step transforms input data into an intermediate form suitable for the subsequent Reduce step. The other options belong to other phases: splitting the input (A) happens before any mapper runs, sorting and shuffling (C) are handled by the framework in the phase between the Map and Reduce steps, and aggregation (D) does not happen until the Reduce step. An optional Combiner can pre-aggregate map output locally, but only as an optimization.
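A compact way to see where each answer option fits is a single-process sketch of the whole pipeline, with one hypothetical function per phase. This is not Hadoop code: everything runs sequentially in one process, and the function names and word-count example are assumptions chosen for illustration.

```python
from collections import defaultdict


def split_input(lines, split_size):
    # Option A: done by the framework before any mapper runs.
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def map_phase(splits, map_fn):
    # Option B: each split is mapped independently (in parallel on a real cluster).
    return [[kv for rec in split for kv in map_fn(rec)] for split in splits]


def shuffle_and_sort(map_outputs):
    # Option C: gather values for each key across all mappers, then sort by key.
    groups = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped, reduce_fn):
    # Option D: aggregate the values collected for each key.
    return {key: reduce_fn(key, values) for key, values in grouped}


lines = ["the cat", "the dog", "a cat"]
splits = split_input(lines, 2)
mapped = map_phase(splits, lambda line: [(w, 1) for w in line.split()])
grouped = shuffle_and_sort(mapped)
counts = reduce_phase(grouped, lambda key, values: sum(values))
print(counts)  # {'a': 1, 'cat': 2, 'dog': 1, 'the': 2}
```

Only `map_phase` corresponds to the Map step itself; the counts for `cat` and `the` are not combined until after the shuffle.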

User Doobdargent