Problem 1: Word Segmentation whatdoesthisreferto.

Application: Chinese Text

Application: Internet Domain Names Visit Britain

Statistical Machine Learning
Best segmentation = one with highest probability
Probability of a segmentation = P(first word) × P(rest of segmentation)
P(word) = estimated by counting

Statistical Machine Learning
choosespain
Choose Spain
Chooses pain
P(“Choose Spain”) > P(“Chooses Pain”)
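The comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the program from the talk: the word counts are invented toy numbers, and the names `COUNTS`, `p_word`, and `p_words` are assumptions made for this example. A real model would take its counts from a large corpus.

```python
from collections import Counter
from math import prod

# Toy counts, invented for illustration only (not real corpus data).
COUNTS = Counter({"choose": 50, "spain": 20, "chooses": 10, "pain": 30})
TOTAL = sum(COUNTS.values())


def p_word(word):
    """P(word), estimated by counting: occurrences of word / total words."""
    return COUNTS[word] / TOTAL


def p_words(words):
    """Probability of a segmentation: the product of its word probabilities."""
    return prod(p_word(w) for w in words)


choose_spain = p_words(["choose", "spain"])
chooses_pain = p_words(["chooses", "pain"])
print(choose_spain > chooses_pain)
```

With these toy counts the first reading scores higher, which is the ordering the slide claims; with different counts the ordering could differ.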

Example
segment(“nowisthetime…”)
P_f(“n”) × P_r(“owisthetime…”)
P_f(“no”) × P_r(“wisthetime…”)
P_f(“now”) × P_r(“isthetime…”)
P_f(“nowi”) × P_r(“sthetime…”)
……
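The enumeration above (try every first word, then segment the rest) can be sketched as a memoized recursion. This is a hedged sketch under stated assumptions, not the program shown in the talk: the counts are invented toy numbers, the penalty for unseen words (shrinking tenfold per character) is an assumed smoothing choice, and `max_len` is an assumed cap on word length.

```python
from functools import lru_cache

# Toy counts, invented for illustration only (not real corpus data).
COUNTS = {"now": 40, "is": 90, "the": 200, "time": 30,
          "no": 50, "a": 100, "i": 60}
TOTAL = sum(COUNTS.values())


def p_word(word):
    """P(word) by counting; unseen words get a tiny, length-penalized value.

    The unseen-word formula is an assumption made for this sketch.
    """
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    return 1.0 / (TOTAL * 10 ** len(word))


def p_words(words):
    """Probability of a segmentation: product of its word probabilities."""
    p = 1.0
    for w in words:
        p *= p_word(w)
    return p


def splits(text, max_len=20):
    """All (first, rest) pairs with a non-empty first part of at most max_len."""
    return [(text[:i], text[i:]) for i in range(1, min(len(text), max_len) + 1)]


@lru_cache(maxsize=None)
def segment(text):
    """Return the most probable segmentation of text as a tuple of words."""
    if not text:
        return ()
    candidates = [(first,) + segment(rest) for first, rest in splits(text)]
    return max(candidates, key=p_words)


print(segment("nowisthetime"))
```

Memoization matters here: without `lru_cache` the same suffixes would be re-segmented exponentially many times. For long inputs, summing log probabilities instead of multiplying would avoid floating-point underflow.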

Example segment(“nowisthetime…”)

The Complete Program

Performance
Accuracy = 98%
Trained on 1.7B words (English)
Typical errors:
baseratesoughtto
smallandinsignificant
ginormousego

Some Results
whorepresents.com → [“who”, “represents”]
therapistfinder.com → [“therapist”, “finder”]
expertsexchange.com → [“experts”, “exchange”]
speedofart.net → [“speed”, “of”, “art”]
penisland.com → error: expected [“pen”, “island”]

Problem 2: Spelling Correction
Mehran Salami
Typical word processor: Tehran Salami
But Google can …

Statistical Machine Learning
Best correction = one with highest probability
Probability of a spelling correction c = P(c as a word) × P(original is a typo for c)
P(c as a word) = estimated by counting
P(original is a typo for c) = estimated from the number of changes (fewer changes = more likely)
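The scoring rule above can be sketched as follows. This is an illustrative sketch, not the program from the talk: the vocabulary and counts are invented, only candidates within one change are considered, and `P_PER_CHANGE` (the assumed penalty for a single change) is a made-up constant.

```python
from collections import Counter

# Toy counts, invented for illustration only (not real corpus data).
COUNTS = Counter({"the": 500, "tehran": 40, "spelling": 30, "salami": 25})
TOTAL = sum(COUNTS.values())
LETTERS = "abcdefghijklmnopqrstuvwxyz"
P_PER_CHANGE = 0.01  # assumed probability weight for one change


def p_word(word):
    """P(c as a word), estimated by counting."""
    return COUNTS[word] / TOTAL


def edits1(word):
    """All strings one change away: delete, transpose, replace, or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the known word maximizing P(c) x P(word is a typo for c)."""
    scored = {}
    if word in COUNTS:
        scored[word] = p_word(word)  # zero changes: no penalty
    for c in edits1(word):
        if c in COUNTS and c != word:
            scored[c] = p_word(c) * P_PER_CHANGE  # one change
    # With no known candidate, leave the word as typed.
    return max(scored, key=scored.get) if scored else word


print(correct("speling"), correct("teh"), correct("salami"))
```

A fuller corrector would also consider two or more changes and would estimate the typo model from observed errors rather than a fixed constant.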

The Complete Program

Problem 3: Speech Recognition
An informal, incomplete grammar of the English language runs over 1,700 pages.
Invariably, simple models and a lot of data trump more elaborate models based on less data.

Problem 3: Speech Recognition
If you have a lot of data, memorisation is a good policy.
For many tasks such as speech recognition, once we have a billion or so examples, we essentially have a closed set that represents (or at least approximates) what we need, without general rules.

Problem 3: Speech Recognition

“Every time I fire a linguist, the performance of our speech recognition system goes up.” --- Fred Jelinek

Problem 4: Machine Translation

Conclusion
(Statistical) [Machine] Learning Is The Ultimate Agile Development Tool
Peter Norvig (Director of Research, Google)