Published by Christal Kathlyn Osborne. Modified over 9 years ago.
“Crowdsourcing”: the latest buzzword. Powered by the people: high-quality data at no cost to the end user, and better than outsourcing.
Statistical Machine Translation (SMT) and Crowdsourcing. Example: the English sentence “How are you?” translated into Hindi as “आप कैसे हो?”
Parallel data: an SMT system learns phrasal translation correspondences from parallel data, i.e. source sentences paired with their translations.
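To make “phrasal translation correspondences” concrete, here is a minimal Python sketch of consistency-based phrase-pair extraction as used in phrase-based SMT. It assumes a word alignment is already given: the alignment below is hand-written for the example sentence (in practice it would come from a word aligner such as GIZA++), and `extract_phrase_pairs` is a hypothetical helper name. For brevity it skips extending phrases over unaligned target words.

```python
from typing import List, Set, Tuple

# (source_index, target_index) links between aligned words
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(
    src: List[str], tgt: List[str], alignment: Alignment, max_len: int = 4
) -> List[Tuple[str, str]]:
    """Extract phrase pairs consistent with a word alignment (no unaligned-word extension)."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word inside the span may link outside the source span.
            if all(s_start <= s <= s_end for (s, t) in alignment if t_start <= t <= t_end):
                pairs.append((" ".join(src[s_start : s_end + 1]), " ".join(tgt[t_start : t_end + 1])))
    return pairs


if __name__ == "__main__":
    src = "How are you ?".split()
    tgt = "आप कैसे हो ?".split()
    # Hand-written alignment for illustration: How-कैसे, are-हो, you-आप, ?-?
    alignment = {(0, 1), (1, 2), (2, 0), (3, 3)}
    for pair in extract_phrase_pairs(src, tgt, alignment):
        print(pair)
```

With this alignment, “are you” yields no phrase pair: its Hindi span would have to include कैसे, which is aligned to “How”, a word outside the English span.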
Roughly 1 million sentence pairs are needed to build a good SMT system. Human translation is very costly, to the point of being unaffordable. Judicial domain: translation is very important for expediting court cases. Is crowdsourcing the solution?
Work in groups of 4. Each group must collect 5000 translations using crowdsourcing. Source language: English; target language: Hindi.
A good, user-friendly translation interface. How do you attract the crowd: Facebook, Orkut, etc.? Quality checks and spam detection!!
साम / Saam, persuasion (explain, appeal to their logic, win them over through dialogue); दाम / Daam, payment (pay and acquire: each group will be provided ₹1000); दंड / Dand, punishment (marks deducted for missing deadlines or targets); भेद / Bhed, division (divide and rule: you are pitted against your classmates, with a conflict of interest on social networks).
Types of bad submissions: a perfectly valid Hindi sentence that has no relation to the source sentence; complete junk; syntactic/grammatical errors; output copied from Google Translate.
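Some of these cases can be screened automatically before any comparison with gold data. The Python sketch below uses simple heuristics, all of them assumptions rather than a prescribed method: the share of Devanagari letters (to catch junk), the target/source token-length ratio (a weak signal for unrelated or truncated sentences), and an exact match against a machine translation of the same source that the checker is assumed to have obtained separately (passed in as `mt_output`). The thresholds and the name `flag_submission` are hypothetical; grammatical errors and fluent but unrelated Hindi generally still need gold-data comparison or a human check.

```python
from typing import List, Optional, Tuple

# Devanagari Unicode block (U+0900 to U+097F), the script used to write Hindi.
DEVANAGARI = range(0x0900, 0x0980)


def devanagari_ratio(text: str) -> float:
    """Fraction of alphabetic characters in the Devanagari block.

    Vowel signs (matras) are combining marks, so isalpha() skips them;
    only base letters are counted.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ord(ch) in DEVANAGARI for ch in letters) / len(letters)


def flag_submission(
    source: str,
    translation: str,
    mt_output: Optional[str] = None,  # hypothetical: MT output for the same source
    min_script_ratio: float = 0.9,  # hypothetical threshold
    length_bounds: Tuple[float, float] = (0.5, 2.0),  # hypothetical bounds
) -> List[str]:
    """Return the reasons a submission looks suspicious (empty list means it passed)."""
    reasons = []
    if devanagari_ratio(translation) < min_script_ratio:
        reasons.append("junk: not mostly Devanagari")
    src_len, tgt_len = len(source.split()), len(translation.split())
    if src_len and not (length_bounds[0] <= tgt_len / src_len <= length_bounds[1]):
        reasons.append("suspicious length ratio")
    if mt_output is not None and translation.split() == mt_output.split():
        reasons.append("identical to machine translation output")
    return reasons


if __name__ == "__main__":
    print(flag_submission("How are you?", "asdf qwer zxcv"))
```

Submissions that return a non-empty list would be held back for review rather than rejected outright, since each heuristic can misfire on its own.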
Gold data (i.e. correct translations) is available with us, and the crowd data will be compared against it. Penalty for wrong translations (it means your spam detection is not working well!!). The first group to submit 5000 correct sentences gets bonus points. Each group will provide a detailed account of its expenditure.
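One crude way to automate the comparison with gold data is sketched below in Python. It swaps in unigram-overlap F1 as the similarity measure (a real evaluation might use BLEU or human judgement instead); the 0.5 threshold, the dictionaries keyed by sentence ID, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict


def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall, using clipped token counts."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count of each token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def count_correct(crowd: Dict[str, str], gold: Dict[str, str], threshold: float = 0.5) -> int:
    """Count crowd translations (keyed by sentence ID) close enough to their gold translation."""
    return sum(
        1
        for sid, hyp in crowd.items()
        if sid in gold and unigram_f1(hyp, gold[sid]) >= threshold
    )


if __name__ == "__main__":
    gold = {"s1": "आप कैसे हो ?", "s2": "तुम कौन हो ?"}
    crowd = {"s1": "आप कैसे हो ?", "s2": "xyz"}
    print(count_correct(crowd, gold))
```

A token-overlap score like this rewards copying the right words but cannot judge word order or fluency, so borderline scores would still need a human look.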
No false promises: “Translate 1 sentence and win an SUV!!” Stick to your promises: “Promise a free t-shirt, give a free t-shirt!!” Avoid monetary transactions; give goodies instead.