MapReduce Practice: WordCount

1 MapReduce Practice: WordCount
박영택 (Park Young-taek), School of Computer Science

2 Mapper Execution
Input text:
    I am a boy
    You are a girl
The input text is fed to mapper.py, which emits one tab-delimited "word, 1" pair per word (\t marks the tab character). Mapper output:
    I \t 1
    am \t 1
    a \t 1
    boy \t 1
    You \t 1
    are \t 1
    a \t 1
    girl \t 1
mapper.py:
    #!/usr/bin/env python
    import sys

    #--- get all lines from stdin ---
    for line in sys.stdin:
        #--- remove leading and trailing whitespace ---
        line = line.strip()
        #--- split the line into words ---
        words = line.split()
        #--- output tuples [word, 1] in tab-delimited format ---
        for word in words:
            # print(...) with a single argument works in both Python 2 and 3
            print('%s\t%s' % (word, "1"))
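To check the mapper logic without Hadoop, the loop in mapper.py can be factored into a generator and fed the two example lines directly. This is a minimal sketch; the function name `map_words` is hypothetical and not part of the original script, which reads from stdin instead.

```python
def map_words(lines):
    """Yield (word, 1) for every whitespace-separated word, as mapper.py does."""
    for line in lines:
        # strip() removes the trailing newline; split() breaks on any whitespace
        for word in line.strip().split():
            yield (word, 1)


if __name__ == "__main__":
    sample = ["I am a boy", "You are a girl"]
    for word, one in map_words(sample):
        # same tab-delimited format that mapper.py prints
        print("%s\t%s" % (word, one))
```

The mapper does no counting of its own: the word "a" is emitted twice, once per occurrence, and adding the two 1s together is left entirely to the reducer.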

3 Reducer Execution
Mapper output (the input to reducer.py):
    I \t 1
    am \t 1
    a \t 1
    boy \t 1
    You \t 1
    are \t 1
    a \t 1
    girl \t 1
reducer.py:
    #!/usr/bin/env python
    import sys

    word2count = {}

    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        # parse the "word<TAB>count" pair emitted by the mapper
        try:
            word, count = line.split('\t', 1)
            count = int(count)
        except ValueError:
            # skip blank or malformed lines
            continue
        # add this count to the running total for the word
        word2count[word] = word2count.get(word, 0) + count

    for word in word2count:
        print('%s\t%s' % (word, word2count[word]))
Contents of word2count after all input is read (reducer.py prints one "word<TAB>count" line per entry):
    { I : 1, am : 1, a : 2, boy : 1, You : 1, are : 1, girl : 1 }
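reducer.py keeps every distinct word in the `word2count` dictionary, so its memory grows with the vocabulary. Because Hadoop sorts the mapper output by key before it reaches a reducer, a common alternative is to total each run of identical keys as it streams past. The sketch below shows that alternative; it is not the method on the slide. The names `parse`, `shuffle` and `reduce_sorted` are hypothetical, and `sorted()` merely stands in for Hadoop's shuffle-and-sort phase.

```python
from itertools import groupby


def parse(lines):
    """Turn 'word<TAB>count' lines into (word, int) pairs, skipping bad lines."""
    for line in lines:
        try:
            word, count = line.strip().split("\t", 1)
            yield word, int(count)
        except ValueError:
            # blank line, missing tab, or non-integer count
            continue


def shuffle(pairs):
    """Local stand-in for Hadoop's shuffle/sort: order the pairs by key."""
    return sorted(pairs, key=lambda pair: pair[0])


def reduce_sorted(pairs):
    """Sum the counts of each run of identical keys in key-sorted input."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    mapper_output = ["I\t1", "am\t1", "a\t1", "boy\t1",
                     "You\t1", "are\t1", "a\t1", "girl\t1"]
    for word, total in reduce_sorted(shuffle(parse(mapper_output))):
        print("%s\t%s" % (word, total))
```

Only one running total is held at a time, at the cost of relying on sorted input. In a real streaming job the lines arriving on stdin are already sorted, so `shuffle()` would be dropped and `reduce_sorted(parse(sys.stdin))` used directly. When trying the scripts locally, the same sort step is often imitated in a shell pipeline such as `cat input.txt | python mapper.py | sort | python reducer.py` (the file name is illustrative).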

4 Demo: Shakespeare
Sample of the input text:
    COUNTESS OF ROUSILLON   mother to Bertram. (COUNTESS:)
    HELENA                  a gentlewoman protected by the Countess.
    An old Widow of Florence. (Widow:)
    DIANA                   daughter to the Widow.
    VIOLENTA  |
              |  neighbours and friends to the Widow.
    MARIANA   |
Input files (Shakespeare):
    comedies (1.7 MB)
    glossary (57 KB)
    histories (1.4 MB)
    poems (262 KB)
    tragedies (1.7 MB)
Total: around 5 MB
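On real text like the Shakespeare files, splitting on whitespace alone treats `Widow`, `Widow.` and `(Widow:)` as three different words, and `Widow` and `widow` as two. The sketch below adds a normalization step (lowercasing and keeping only runs of ASCII letters and apostrophes) and runs the whole count in a single process, which can be a handy cross-check against a Hadoop result on a ~5 MB input. The normalization rule and the names `WORD_RE`, `normalized_words` and `count_words` are assumptions for illustration, not part of the slides; the same tokenizer could replace `line.split()` in mapper.py.

```python
import re
from collections import Counter

# Assumed definition of a "word": a run of ASCII letters, possibly with apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def normalized_words(line):
    """Lowercase the line and return its word tokens without surrounding punctuation."""
    tokens = (token.strip("'") for token in WORD_RE.findall(line.lower()))
    # drop tokens that consisted only of apostrophes
    return [token for token in tokens if token]


def count_words(lines):
    """Single-process word count over any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(normalized_words(line))
    return counts


if __name__ == "__main__":
    excerpt = [
        "HELENA a gentlewoman protected by the Countess.",
        "An old Widow of Florence. (Widow:)",
        "DIANA daughter to the Widow.",
    ]
    for word, n in count_words(excerpt).most_common(3):
        print("%s\t%s" % (word, n))
```

With this normalization, all three forms of "Widow" in the excerpt count toward `widow`. Because iterating over an open file yields its lines, `count_words(f)` also works on a file object opened with `open(path, encoding="utf-8")`.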

