Powered by the people. High-quality data, no cost to the end user. Better than outsourcing. "Crowdsourcing": the latest buzzword.

How are you? आप कैसे हो? Statistical Machine Translation (SMT). Crowdsourcing.

Parallel data: An SMT system learns phrasal translation correspondences from parallel data.

About 1 million sentences are needed to build a good SMT system. Human translation is very costly, often unaffordable. Judicial domain: translation is very important to expedite cases. Is crowdsourcing the solution?

Groups of size 4. Each group is to collect 5000 translations using crowdsourcing. Source language: English; target language: Hindi.

A good, user-friendly interface for translation. How to attract the crowd? Facebook, Orkut, etc.? Quality check and spam detection!

Saam (साम): explain, appeal to their logic, win them over by dialogue. Daam (दाम): pay and acquire; each group will be provided ₹1000. Dand (दंड): penalty in the form of marks for not meeting deadlines and targets. Bhed (भेद): divide and rule; you are pitted against your classmates, with a conflict of interest on social networks.

A perfectly valid Hindi sentence that has no relation to the source sentence. Complete junk. Syntactic/grammatical errors. Google Translate.
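Some of the spam categories above can be caught with cheap surface checks before any human review. The following is a minimal sketch, not the course's actual spam detector: the function names, the thresholds, and the optional `mt_output` argument (a machine translation of the same source, obtained separately) are all assumptions made for illustration. It cannot catch a fluent but unrelated Hindi sentence or a grammatical error.

```python
import unicodedata


def devanagari_ratio(text: str) -> float:
    """Fraction of letters and combining marks that lie in the Devanagari block."""
    chars = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not chars:
        return 0.0
    deva = [ch for ch in chars if "\u0900" <= ch <= "\u097F"]
    return len(deva) / len(chars)


def _normalize(text: str) -> str:
    """Unicode-normalize and collapse whitespace so trivial differences are ignored."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def flag_submission(source, translation, mt_output=None,
                    min_ratio=0.8, min_len_ratio=0.3, max_len_ratio=3.0):
    """Return a list of reasons a crowd translation looks like spam (empty if none).

    All thresholds are hypothetical defaults and would need tuning on real data.
    """
    if not translation.strip():
        return ["empty"]
    reasons = []
    # Complete junk typed on a Latin keyboard has almost no Devanagari characters.
    if devanagari_ratio(translation) < min_ratio:
        reasons.append("not_hindi_script")
    # A translation far shorter or longer than its source is suspicious.
    src_words = max(len(source.split()), 1)
    len_ratio = len(translation.split()) / src_words
    if not (min_len_ratio <= len_ratio <= max_len_ratio):
        reasons.append("length_mismatch")
    # Text identical to a machine translation of the source was probably pasted.
    if mt_output is not None and _normalize(translation) == _normalize(mt_output):
        reasons.append("matches_machine_translation")
    return reasons
```

For example, `flag_submission("How are you?", "asdf qwer zxcv")` flags the submission for using the wrong script, while a genuine Hindi translation of similar length passes all three checks.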

Gold data (i.e. correct translations) is available with us. Crowd data will be compared with the gold data. There is a penalty for wrong translations (your spam detection is not working well!). The first group to submit 5000 correct sentences gets bonus points. Each group will provide a detailed account of its expenditure.
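The slides do not say how crowd data is compared with gold data. One plausible approach, sketched here under that assumption, is a token-overlap F1 score with a cutoff, since exact string match would reject valid translations that differ slightly in wording. The function names, the sentence-id keyed dictionaries, and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a crowd translation and a gold translation."""
    cand = candidate.split()
    ref = reference.split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both sentences.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def count_correct(crowd: dict, gold: dict, threshold: float = 0.5):
    """Return (correct, wrong) counts over sentences that have a gold translation.

    `crowd` and `gold` map a sentence id to a translation string. Sentences
    without a gold entry are skipped because they cannot be judged.
    """
    correct = wrong = 0
    for sentence_id, translation in crowd.items():
        if sentence_id not in gold:
            continue
        if token_f1(translation, gold[sentence_id]) >= threshold:
            correct += 1
        else:
            wrong += 1
    return correct, wrong
```

The `wrong` count is what a marks penalty would be based on, and the `correct` count is what would be checked against the 5000-sentence target.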

No false promises: "Translate 1 sentence and win an SUV!!" Stick to your promises: "Promise a free t-shirt, give a free t-shirt!!" Avoid monetary transactions; give goodies instead.