1
MapReduce Practice: WordCount
Young-Taek Park, School of Computer Science
2
Mapper Execution

Input text:

    I am a boy
    You are a girl

mapper.py:

    #!/usr/bin/env python
    import sys

    #--- get all lines from stdin ---
    for line in sys.stdin:
        #--- remove leading and trailing whitespace ---
        line = line.strip()
        #--- split the line into words ---
        words = line.split()
        #--- output tuples [word, 1] in tab-delimited format ---
        for word in words:
            # parenthesized form runs under both Python 2 and Python 3
            print('%s\t%s' % (word, "1"))

Mapper output (one record per word, so "a" appears twice):

    I \t 1
    am \t 1
    a \t 1
    boy \t 1
    You \t 1
    are \t 1
    a \t 1
    girl \t 1
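The mapper's behaviour on the two sample lines can be checked without a cluster. The following is a minimal sketch, not part of the original slides: `map_words` is a hypothetical helper that wraps the same strip/split logic in a function, and `io.StringIO` stands in for `sys.stdin`.

```python
import io


def map_words(stream):
    """Yield one 'word<TAB>1' record per whitespace-separated word."""
    for line in stream:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


# The two sample lines from the slide, fed in place of sys.stdin.
sample = io.StringIO("I am a boy\nYou are a girl\n")
records = list(map_words(sample))

for record in records:
    print(record)
```

Note that the mapper does no aggregation: the word "a" is emitted once per occurrence, and summing is left entirely to the reducer.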
3
Reducer Execution

Mapper output (input to the reducer):

    I \t 1
    am \t 1
    a \t 1
    boy \t 1
    You \t 1
    are \t 1
    a \t 1
    girl \t 1

reducer.py:

    #!/usr/bin/env python
    import sys

    word2count = {}

    for line in sys.stdin:
        # remove leading and trailing whitespace
        line = line.strip()
        try:
            word, count = line.split('\t', 1)
            count = int(count)
        except ValueError:
            # skip malformed records (no tab, or a non-numeric count)
            continue
        try:
            word2count[word] = word2count[word] + count
        except KeyError:
            # first occurrence of this word
            word2count[word] = count

    for word in word2count.keys():
        print('%s\t%s' % (word, word2count[word]))

Reducer result (contents of word2count; printed as tab-delimited lines):

    { I : 1, am : 1, a : 2, boy : 1, You : 1, are : 1, girl : 1 }
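The reducer on this slide keeps every word in a dictionary, which works for any input order. An alternative, sketched here as an assumption rather than as the slides' method, relies on the records arriving sorted by word (Hadoop Streaming sorts mapper output by key before the reduce phase) and sums each run of equal keys with `itertools.groupby`. `reduce_sorted` is a hypothetical helper name, and the sketch assumes well-formed `word<TAB>count` records.

```python
from itertools import groupby


def reduce_sorted(records):
    """Sum counts per word, assuming records are grouped (sorted) by word."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


mapper_output = ["I\t1", "am\t1", "a\t1", "boy\t1",
                 "You\t1", "are\t1", "a\t1", "girl\t1"]

# sorted() stands in for the sort performed between the map and reduce phases.
totals = dict(reduce_sorted(sorted(mapper_output)))
print(totals)
```

Because only one word's running sum is held at a time, this variant does not need memory proportional to the vocabulary, but it gives wrong totals if the input is not grouped by word.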
4
Demo: Shakespeare

Sample input text:

    COUNTESS OF ROUSILLON mother to Bertram. (COUNTESS:)
    HELENA a gentlewoman protected by the Countess.
    An old Widow of Florence. (Widow:)
    DIANA daughter to the Widow.
    VIOLENTA |
             | neighbours and friends to the Widow.
    MARIANA  |

Shakespeare data set:

    comedies   (1.7 MB)
    glossary   (57 KB)
    histories  (1.4 MB)
    poems      (262 KB)
    tragedies  (1.7 MB)

Total: around 5 MB
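At about 5 MB, the data set is small enough to count in a single process, which gives a reference result to compare against the MapReduce output. The sketch below is an assumption-laden illustration, not part of the demo: `count_words` is a hypothetical helper, and a tiny temporary file (`sample.txt`, also hypothetical) stands in for the real Shakespeare files so the example runs on its own.

```python
import os
import tempfile
from collections import Counter


def count_words(paths):
    """Return a Counter of whitespace-separated words across the given files."""
    totals = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                totals.update(line.split())
    return totals


# Tiny stand-in file so the sketch runs without the real data set.
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "sample.txt")  # hypothetical file name
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("DIANA daughter to the Widow.\nAn old Widow of Florence.\n")
    counts = count_words([sample_path])

print(counts["Widow"], counts["Widow."])
```

As in mapper.py, splitting on whitespace keeps punctuation attached, so "Widow" and "Widow." are counted as different words.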