Published by Roy Derrick Walton. Modified over 9 years ago.
1
Problem 1: Word Segmentation
whatdoesthisreferto
2
Application: Chinese Text
3
Application: Internet Domain Names
www.visitbritain.com
Visit Britain
4
Statistical Machine Learning
Best segmentation = one with highest probability
Probability of a segmentation = P(first word) × P(rest of segmentation)
P(word) = estimated by counting
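The three rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the program shown in the talk: the word counts in `COUNTS`, the unseen-word penalty in `p_word`, and the helper names are all assumptions made for the example.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy counts; the talk's model was trained on about 1.7B words.
COUNTS = {"choose": 50, "chooses": 20, "spain": 30, "pain": 40,
          "now": 100, "is": 300, "the": 500, "time": 80}
TOTAL = sum(COUNTS.values())

def p_word(word):
    """P(word), estimated by counting."""
    if word in COUNTS:
        return COUNTS[word] / TOTAL
    # Assumed smoothing: unseen words get a tiny probability that shrinks with length.
    return 1.0 / (TOTAL * 10 ** len(word))

def p_words(words):
    """Probability of a segmentation = product of its word probabilities."""
    return prod(p_word(w) for w in words)

@lru_cache(maxsize=None)
def segment(text):
    """Return the highest-probability segmentation of text as a tuple of words."""
    if not text:
        return ()
    # Try every split into a first word plus the best segmentation of the rest.
    candidates = [(text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)]
    return max(candidates, key=p_words)

print(segment("choosespain"))
print(segment("nowisthetime"))
```

Memoizing `segment` with `lru_cache` keeps the recursion from re-solving the same suffix repeatedly, which is what makes trying every split point affordable.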
5
Statistical Machine Learning
choosespain
Choose Spain
Chooses pain
P(“Choose Spain”) > P(“Chooses Pain”)
6
Example
segment(“nowisthetime…”)
P_f(“n”) × P_r(“owisthetime…”)
P_f(“no”) × P_r(“wisthetime…”)
P_f(“now”) × P_r(“isthetime…”)
P_f(“nowi”) × P_r(“sthetime…”)
…
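The enumeration on this slide can be read as a recurrence. The form below is one interpretation, assuming that P_f is the probability of the candidate first word and that P_r is the probability of the best segmentation of the remaining text; the slide itself does not define either symbol.

```latex
\[
P_r(s) \;=\; \max_{1 \le i \le n} \; P_f(s_{1..i}) \times P_r(s_{i+1..n}),
\qquad P_r(\varepsilon) = 1,
\]
% s is a string of n characters, s_{1..i} is its first i characters,
% and \varepsilon is the empty string (nothing left to segment).
```

Each line of the example corresponds to one value of i: i = 1 gives “n”, i = 2 gives “no”, i = 3 gives “now”, and so on.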
7
Example segment(“nowisthetime…”)
8
The Complete Program
9
Performance
Accuracy = 98%
Trained on 1.7B words (English)
Typical errors:
baseratesoughtto
smallandinsignificant
ginormousego
10
Some Results
whorepresents.com [“who”, “represents”]
therapistfinder.com [“therapist”, “finder”]
expertsexchange.com [“experts”, “exchange”]
speedofart.net [“speed”, “of”, “art”]
penisland.com error: expected [“pen”, “island”]
11
Problem 2: Spelling Correction
Mehran Salami
Typical word processor: Tehran Salami
But Google can …
13
Statistical Machine Learning
Best correction = one with highest probability
Probability of a spelling correction c = P(c as a word) × P(original is a typo for c)
P(c as a word) = estimated by counting
P(original is a typo for c) = proportional to number of changes
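A minimal Python sketch of this scoring rule follows. It is not the program shown in the talk: the counts in `COUNTS`, the per-change factor `EDIT_PENALTY`, the restriction to candidates at most one change away, and the function names are all assumptions chosen for illustration.

```python
import string

# Hypothetical toy counts standing in for a large corpus.
COUNTS = {"tehran": 60, "mehran": 5, "salami": 40, "spelling": 25}
TOTAL = sum(COUNTS.values())
# Assumed error model: each change makes the typo explanation 100x less likely.
EDIT_PENALTY = 0.01

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the candidate c maximizing P(c as a word) x P(word is a typo for c)."""
    changes = {word: 0}                  # the original itself needs zero changes
    for candidate in edits1(word):
        changes.setdefault(candidate, 1)

    def score(c):
        return (COUNTS.get(c, 0) / TOTAL) * (EDIT_PENALTY ** changes[c])

    best = max(changes, key=score)
    # If no candidate is a known word, leave the input unchanged.
    return best if score(best) > 0 else word

print(correct("speling"))
print(correct("mehran"))
```

With these toy counts, "mehran" is kept rather than changed to "tehran": the name appears in the data, and the one-change penalty outweighs the higher count of "tehran".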
14
The Complete Program
15
Problem 3: Speech Recognition
An informal, incomplete grammar of the English language runs over 1,700 pages.
Invariably, simple models and a lot of data trump more elaborate models based on less data.
16
Problem 3: Speech Recognition
If you have a lot of data, memorisation is a good policy.
For many tasks such as speech recognition, once we have a billion or so examples, we essentially have a closed set that represents (or at least approximates) what we need, without general rules.
17
Problem 3: Speech Recognition
19
“Every time I fire a linguist, the performance of our speech recognition system goes up.” --- Fred Jelinek
20
Problem 4: Machine Translation
21
Conclusion
(Statistical) [Machine] Learning Is The Ultimate Agile Development Tool
Peter Norvig (Director of Research, Google)