Big Data Analytics with Hadoop 3
Map
The map function takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input and output types of the map can be (and often are) different from each other.
If the application is doing a word count, the map function breaks each input line into words and outputs a key/value pair for each distinct word. Each output pair contains the word as the key and the number of instances of that word in the line as the value.
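The word-count mapper described above can be sketched in plain Python, with no Hadoop dependency. The function name `map_word_count`, the `(offset, line)` input shape, and the whitespace tokenization are assumptions made for illustration; none of them are part of the Hadoop API.

```python
from collections import Counter
from typing import Iterator, Tuple


def map_word_count(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function: takes (offset, line), yields (word, count).

    The input key (an offset into the file) is ignored. The output types
    (str, int) differ from the input types (int, str).
    """
    # Assumption: words are separated by whitespace and compared
    # case-insensitively. Real jobs often need stricter tokenization.
    counts = Counter(line.lower().split())
    for word, count in counts.items():
        yield word, count


# One input pair produces one output pair per distinct word in the line.
pairs = dict(map_word_count(0, "the cat saw the hat"))
print(pairs)

# An empty line produces zero output pairs.
print(list(map_word_count(20, "")))
```

Many word-count implementations instead emit `(word, 1)` for every occurrence and leave all of the summing to the reducer; both forms lead to the same final totals.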
In the mapper, code is executed on each key/value pair from the record reader to produce zero or more new key/value pairs, called the intermediate output of the mapper. The choice of key and value for each record depends directly on what the MapReduce job is meant to accomplish. The key is what the data will be grouped on, and the value is the part of the data that the reducer uses to generate the necessary output. One of the main topics discussed in the patterns is how different types of use cases determine this key/value logic. In fact, the semantics of this logic are a key differentiator between MapReduce design patterns.
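To illustrate how the choice of key drives the grouping, the following sketch runs two hypothetical mappers over the same records and then groups the intermediate output by key, roughly as the framework does before the reducer is called. The record layout `(user, page, bytes)`, the mapper names, and the `run_map_and_group` helper are all invented for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, str, int]  # hypothetical log record: (user, page, bytes)
Mapper = Callable[[Record], Iterable[Tuple[str, int]]]


def map_by_user(record: Record) -> Iterable[Tuple[str, int]]:
    # Key = user, value = bytes: a reducer could total traffic per user.
    user, _page, size = record
    yield user, size


def map_by_page(record: Record) -> Iterable[Tuple[str, int]]:
    # Key = page, value = 1: a reducer could count hits per page.
    _user, page, _size = record
    yield page, 1


def run_map_and_group(mapper: Mapper, records: Iterable[Record]) -> Dict[str, List[int]]:
    """Apply the mapper to every record and group the emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(grouped)


records = [("alice", "/home", 120), ("bob", "/home", 80), ("alice", "/cart", 40)]

print(run_map_and_group(map_by_user, records))
# {'alice': [120, 40], 'bob': [80]}
print(run_map_and_group(map_by_page, records))
# {'/home': [1, 1], '/cart': [1]}
```

The input is identical in both runs; only the key/value logic in the mapper changes what the reducer would receive for each group.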