Developing a MapReduce Application – packet dissection
How MapReduce Works A MapReduce application is built on the Hadoop Distributed File System (HDFS). Steps of the MapReduce process: the client submits the job; the JobTracker coordinates the job and splits it into tasks; the TaskTrackers run the tasks, which carry out the main map and reduce phases.
Shuffle and Sort These two facilities are the heart of MapReduce and much of what makes it powerful for cloud computing. Sort phase: guarantees that the input to every reducer is sorted by key. Shuffle phase: transfers the map output to the reducers as their input.
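The shuffle-and-sort step can be sketched outside Hadoop: map tasks emit unordered (key, value) pairs, the framework groups them by key in sorted order, and each reduce call receives one key with all of its values. This is a toy illustration with made-up keys, not the packet job itself.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSortSketch {
    // Shuffle-and-sort plus reduce: group map-output pairs by key
    // (TreeMap keeps keys sorted, the guarantee reducers rely on),
    // then reduce each group, here by summing the values.
    static TreeMap<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput)
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());

        TreeMap<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet())
            reduced.put(e.getKey(), e.getValue().stream().mapToInt(Integer::intValue).sum());
        return reduced;
    }

    public static void main(String[] args) {
        // Unordered map output, as if emitted by several map tasks.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                new SimpleEntry<>("portB", 1),
                new SimpleEntry<>("portA", 1),
                new SimpleEntry<>("portB", 1));
        // Keys come out sorted and grouped: {portA=1, portB=2}
        System.out.println(shuffleAndReduce(mapOutput));
    }
}
```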
A MapReduce Application – packet dissection Using the Jpcap library, the application captures packets and writes them directly to HDFS.
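The capture-and-write loop can be sketched with stand-ins: here raw byte arrays replace packets returned by a Jpcap capture loop, and a ByteArrayOutputStream replaces the HDFS output stream the real code would obtain from Hadoop's FileSystem. Each packet is written as a length-prefixed record so the stream can later be split back into packets; this record layout is an assumption for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

public class CaptureToHdfsSketch {
    // Write one captured packet as a length-prefixed record.
    // In the real application the DataOutputStream would wrap a stream
    // created on HDFS (FileSystem.create) rather than a byte buffer.
    static void writeRecord(DataOutputStream out, byte[] packet) throws IOException {
        out.writeInt(packet.length);
        out.write(packet);
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for packets a Jpcap captor would return.
        List<byte[]> captured = List.of(new byte[]{1, 2, 3}, new byte[]{4, 5});

        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // HDFS stand-in
        DataOutputStream out = new DataOutputStream(sink);
        for (byte[] pkt : captured) writeRecord(out, pkt);
        out.flush();

        // 4-byte length + payload per packet: (4 + 3) + (4 + 2) = 13 bytes.
        System.out.println(sink.size());
    }
}
```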
A MapReduce Application – packet dissection Set up a job configuration and submit the job.
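A driver for this job might be configured as follows, using Hadoop's classic `org.apache.hadoop.mapred` API (JobConf/JobClient) that matches the JobTracker/TaskTracker model described above. The class names `PacketFilterMapper` and `PacketDissectReducer` and the path arguments are hypothetical; the slides do not show the actual identifiers.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class PacketDissectDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PacketDissectDriver.class);
        conf.setJobName("packet-dissection");

        // Hypothetical mapper/reducer class names, not taken from the slides.
        conf.setMapperClass(PacketFilterMapper.class);
        conf.setReducerClass(PacketDissectReducer.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));  // capture files on HDFS
        FileOutputFormat.setOutputPath(conf, new Path(args[1])); // dissected messages

        JobClient.runJob(conf); // submit the job and wait for completion
    }
}
```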
A MapReduce Application – packet dissection The mapper filters packets on port 1863, the port used by the MSN Messenger protocol.
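The filtering decision itself can be sketched without Hadoop: the TCP header carries the 16-bit source port in bytes 0–1 and the destination port in bytes 2–3, and a packet is kept only if either side is 1863. In the real mapper this test would run inside map() on each record; the helper names here are illustrative.

```java
public class MsnFilterSketch {
    static final int MSN_PORT = 1863; // MSN Messenger protocol port

    // Read a 16-bit big-endian field at the given offset of the TCP header.
    static int readPort(byte[] tcpHeader, int offset) {
        return ((tcpHeader[offset] & 0xFF) << 8) | (tcpHeader[offset + 1] & 0xFF);
    }

    // Keep the packet if the source port (offset 0) or
    // the destination port (offset 2) is 1863.
    static boolean isMsnPacket(byte[] tcpHeader) {
        return readPort(tcpHeader, 0) == MSN_PORT || readPort(tcpHeader, 2) == MSN_PORT;
    }

    public static void main(String[] args) {
        byte[] toMsn = {0x12, 0x34, 0x07, 0x47}; // dst port 0x0747 = 1863 -> kept
        byte[] other = {0x12, 0x34, 0x00, 0x50}; // dst port 80 -> dropped
        System.out.println(isMsnPacket(toMsn));  // true
        System.out.println(isMsnPacket(other));  // false
    }
}
```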
A MapReduce Application – packet dissection The reducer dissects each packet and writes the extracted message to the output collector.
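The per-packet dissection can be sketched as a plain function: the MSN protocol is line-oriented ASCII, so decoding the TCP payload and taking the first CRLF-terminated line yields the protocol command. The sample payload below is an assumption for illustration; in the real reducer the result would be passed to the OutputCollector.

```java
import java.nio.charset.StandardCharsets;

public class DissectSketch {
    // Dissect an MSN payload: decode it as ASCII and return the first
    // CRLF-terminated line, which carries the protocol command.
    static String firstCommandLine(byte[] payload) {
        String text = new String(payload, StandardCharsets.US_ASCII);
        int end = text.indexOf("\r\n");
        return end >= 0 ? text.substring(0, end) : text;
    }

    public static void main(String[] args) {
        // Hypothetical MSN message payload.
        byte[] payload = "MSG 1 N 135\r\nMIME-Version: 1.0\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        // In the real reducer this value would be emitted via
        // output.collect(key, value) instead of printed.
        System.out.println(firstCommandLine(payload)); // MSG 1 N 135
    }
}
```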
A MapReduce Application – packet dissection See the result: