MapReduce example

Hadoop MapReduce:

Map phase: In this phase, each Mapper accepts a task and processes the portion of the input (input split) assigned to its node, so the computation is divided across the cluster.
The result is a set of key-value pairs. This is called the intermediate output, and it is stored on the node's local disk rather than in HDFS.
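
The following is a minimal Python simulation of a word-count Mapper, assuming whitespace tokenization. It is plain Python, not the Hadoop Java API, and the name `word_count_mapper` is hypothetical. It shows how one input record becomes a list of intermediate (word, 1) pairs.

```python
from typing import Iterator, Tuple

def word_count_mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Emit an intermediate (word, 1) pair for every whitespace-separated word."""
    # The input key (line offset/number) is not needed for word count.
    for word in line.split():
        yield (word, 1)

# Each record of the input split is passed to the mapper independently.
pairs = list(word_count_mapper(1, "What do you mean by Object"))
print(pairs)  # [('What', 1), ('do', 1), ('you', 1), ('mean', 1), ('by', 1), ('Object', 1)]
```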

Sort and shuffle:
The key-value pairs from every Mapper are collected and sorted by key, and the values that share a key are grouped together; this data is also kept on local disk. After sorting and shuffling by key are complete, the grouped values are sent to the Reducers.
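
A minimal Python sketch of the grouping step, assuming all intermediate pairs fit in memory (Hadoop itself spills, merges, and transfers them between nodes). The function name `shuffle_and_sort` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Group intermediate values by key and return the groups sorted by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Keys reach a reducer in sorted order; sorted() mimics that here.
    return sorted(groups.items())

mapper_output = [("What", 1), ("do", 1), ("What", 1), ("is", 1)]
grouped = shuffle_and_sort(mapper_output)
print(grouped)  # [('What', [1, 1]), ('do', [1]), ('is', [1])]
```

Uppercase letters sort before lowercase ones in this byte-wise ordering, which is why `What` comes first.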

Reduce: The Reducers aggregate the grouped output of the sort-and-shuffle phase and store the result in HDFS. This is the final output.
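
A minimal sketch of a word-count Reducer in plain Python, assuming the input is already grouped by key; writing the result to HDFS is not simulated. The name `word_count_reducer` is hypothetical.

```python
from typing import Iterable, Tuple

def word_count_reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single total."""
    return (key, sum(values))

grouped = [("Java", [1, 1, 1]), ("What", [1, 1, 1]), ("do", [1, 1])]
final_output = [word_count_reducer(k, vs) for k, vs in grouped]
print(final_output)  # [('Java', 3), ('What', 3), ('do', 2)]
```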

Key-value pair: This is the output format of the Mapper; these pairs are passed on for sorting and merging.

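Combiner: It is often called a mini reducer. It runs on each Mapper's output before the shuffle and aggregates it locally, which reduces the amount of data sent to the Reducers over the network (for example, when finding the highest salary in an employee table).
The combiner finds the highest salary within each Mapper's output, and the Reducer then only has to compare those per-Mapper maximums.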
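
A minimal sketch of the highest-salary example in plain Python. The employee records and the function name `highest_salary` are hypothetical. Because taking a maximum is associative and commutative, the same function can safely serve as both the combiner (once per mapper) and the reducer (once over the combiner outputs).

```python
from typing import List, Tuple

# Hypothetical employee records, split across two mappers: (name, salary).
mapper_splits: List[List[Tuple[str, int]]] = [
    [("Asha", 52000), ("Ben", 61000), ("Chen", 58000)],
    [("Dev", 75000), ("Eva", 49000)],
]

def highest_salary(records: List[Tuple[str, int]]) -> Tuple[str, int]:
    """Return the record with the largest salary."""
    return max(records, key=lambda r: r[1])

# Combiner step: one record per mapper crosses the network instead of every record.
combined = [highest_salary(split) for split in mapper_splits]
# Reducer step: compare only the per-mapper maximums.
overall = highest_salary(combined)
print(combined)  # [('Ben', 61000), ('Dev', 75000)]
print(overall)   # ('Dev', 75000)
```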

Hive cannot convert nested subqueries into joins


Sample text (each input record is a <key, line> pair; here the key is the line number):

<1, What do you mean by Object>
<2, What do you know about Java>
<3, What is Java Virtual Machine>
<4, How Java enabled High Performance>

Map:

<What,1> <do,1> <you,1> <mean,1> <by,1> <Object,1>
<What,1> <do,1> <you,1> <know,1> <about,1> <Java,1>
<What,1> <is,1> <Java,1> <Virtual,1> <Machine,1>
<How,1> <Java,1> <enabled,1> <High,1> <Performance,1>

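Combiner:

(The values for each key are grouped within the Mapper's output, as listed below. In word count the combiner usually also applies the Reducer's summing logic, so it would emit partial counts such as <What,3> instead of <What,1,1,1>.)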

<What,1,1,1> <do,1,1> <you,1,1> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,1,1,1>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>
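
A minimal sketch of a summing combiner applied to the sample text, assuming (as the listing above does) that a single Mapper processed all four records. The name `combine_locally` is hypothetical, and this is a plain-Python simulation rather than Hadoop code.

```python
from collections import Counter
from typing import Dict, List, Tuple

def combine_locally(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Sum one mapper's (word, 1) pairs before they are shuffled."""
    partial: Counter = Counter()
    for word, count in pairs:
        partial[word] += count
    return dict(partial)

sample_lines = [
    "What do you mean by Object",
    "What do you know about Java",
    "What is Java Virtual Machine",
    "How Java enabled High Performance",
]
map_output = [(word, 1) for line in sample_lines for word in line.split()]
combined = combine_locally(map_output)
print(combined["What"], combined["Java"], combined["do"])  # 3 3 2
```

With summing in the combiner, 22 intermediate pairs shrink to 16 partial counts before the shuffle.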

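Partitioner: The Partitioner decides which Reducer receives each intermediate key-value pair, based on the key emitted in the map stage. All pairs with the same key go to the same Reducer; by default, Hadoop's HashPartitioner picks the Reducer from a hash of the key.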

Number of partitions = number of Reducers
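
A minimal sketch of hash partitioning in plain Python. As an assumption, `zlib.crc32` stands in for the Java `hashCode()` that Hadoop's HashPartitioner uses, because Python's built-in `hash()` for strings changes between runs. The names `partition_for` and `partition` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

def partition_for(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index in [0, num_reducers)."""
    # crc32 is deterministic and non-negative in Python 3 (assumed stand-in for hashCode()).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition(pairs: List[Tuple[str, int]], num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Create one bucket per reducer and route every pair by its key."""
    buckets: Dict[int, List[Tuple[str, int]]] = {i: [] for i in range(num_reducers)}
    for key, value in pairs:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return buckets

pairs = [("What", 1), ("Java", 1), ("What", 1), ("do", 1)]
buckets = partition(pairs, num_reducers=2)
print(len(buckets))  # 2: one partition per reducer
```

Both `What` pairs always land in the same bucket, which is what lets a single Reducer produce the complete count for that key.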



Reducer:

<What,3> <do,2> <you,2> <mean,1> <by,1> <Object,1>
<know,1> <about,1> <Java,3>
<is,1> <Virtual,1> <Machine,1>
<How,1> <enabled,1> <High,1> <Performance,1>
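
The whole example can be reproduced end to end with a small in-memory simulation in plain Python. This is a sketch only, not Hadoop code: it runs in a single process, and the name `run_word_count` is hypothetical. It chains the map, sort-and-shuffle, and reduce steps over the four sample records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

records: List[Tuple[int, str]] = [
    (1, "What do you mean by Object"),
    (2, "What do you know about Java"),
    (3, "What is Java Virtual Machine"),
    (4, "How Java enabled High Performance"),
]

def run_word_count(records: List[Tuple[int, str]]) -> Dict[str, int]:
    # Map: emit (word, 1) for every word of every record.
    intermediate = [(word, 1) for _, line in records for word in line.split()]
    # Sort and shuffle: group all values that belong to the same key.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, one in intermediate:
        grouped[word].append(one)
    # Reduce: sum each key's values to get the final count.
    return {word: sum(values) for word, values in sorted(grouped.items())}

counts = run_word_count(records)
print(counts["What"], counts["Java"], counts["do"], counts["How"])  # 3 3 2 1
```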