1
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, and Fernando Pereira (Google). Presented by Eun-Sol Kim.
2
Essentially, all models are wrong but some are useful
"The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve." (Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences)
"Essentially, all models are wrong, but some are useful." (George Box)
3
Two approaches to AI
GOFAI (Good Old-Fashioned Artificial Intelligence): based on logic; symbolic AI.
SML (Statistical Machine Learning): based on empirical data (sensor data or databases); inductive inference from data, generalizing data to rules and predicting on future data.
4
Scene completion using millions of photographs
- Hays et al., CMU, SIGGRAPH 2007
5
The power of data
6
Learning from Text at Web Scale
Brown Corpus: 1 million English words; complete sentences, no spelling errors, no grammatical errors.
Google's trillion-word corpus: roughly a million times larger than the Brown Corpus, with frequency counts for all word sequences up to 5 words long.
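A minimal sketch of what "frequency counts for all sequences up to 5 words long" means in code; the whitespace tokenizer and toy corpus are stand-ins for illustration, not the pipeline behind the actual web-scale data:

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every token sequence of length 1..max_n (illustrative only)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy stand-in for a web-scale corpus.
corpus = "the quick brown fox jumps over the lazy dog".split()
counts = ngram_counts(corpus)
print(counts[("the",)])           # unigram count -> 2
print(counts[("the", "quick")])   # bigram count  -> 1
```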
7
Some lessons of web-scale learning
1. Use available large-scale data rather than annotated data.
We can find useful semantic relationships automatically from the statistics of search queries and the corresponding results, or from the accumulated evidence of web-based text patterns, without annotated data.
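As a hedged illustration of mining semantic relationships from web-based text patterns without annotation, the sketch below uses a Hearst-style "X such as Y" pattern; the regex and sample sentences are my own assumptions, not taken from the paper:

```python
import re

# Hearst-style pattern: "<class> such as <instance>"  (illustrative assumption)
PATTERN = re.compile(r"(\w+(?: \w+)?) such as (\w+)")

def extract_is_a(text):
    """Harvest (instance, class) pairs from unannotated text."""
    return [(inst, cls) for cls, inst in PATTERN.findall(text)]

text = ("Large cities such as Seoul attract startups. "
        "Search engines such as Google index the web.")
print(extract_is_a(text))
# [('Seoul', 'Large cities'), ('Google', 'Search engines')]
```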
8
2. Memorization is a good policy
Memorizing specific phrases is more effective than learning general patterns.
Machine translation example: large memorized phrase tables that give candidate mappings between specific source- and target-language phrases.
For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
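A minimal sketch of what a memorized phrase table looks like in a phrase-based translation setting; the entries, scores, and lookup function are invented for illustration and do not reproduce any real system:

```python
# Toy phrase table: memorized source phrase -> candidate target phrases with scores.
# Entries and probabilities are invented for illustration only.
phrase_table = {
    ("machine", "translation"): [("traduction automatique", 0.7),
                                 ("traduction mécanique", 0.1)],
    ("good",): [("bon", 0.6), ("bonne", 0.3)],
}

def candidate_translations(source_tokens, max_len=2):
    """Look up every memorized source phrase (up to max_len words)
    and return its candidate target-language mappings."""
    candidates = {}
    for n in range(1, max_len + 1):
        for i in range(len(source_tokens) - n + 1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in phrase_table:
                candidates[phrase] = phrase_table[phrase]
    return candidates

print(candidate_translations("good machine translation".split()))
```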
9
Two conventional approaches to NLP
Deep approach: hand-coded grammars and ontologies; complex networks of relations.
Statistical approach: learning n-gram statistics from large corpora.
10
New approaches to NLP
A combination of the two conventional approaches
Statistical relational learning: represent relations between objects with rules (first-order logic); the model is built by statistical learning.
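A minimal sketch of the idea, in the spirit of Markov logic: relations between objects are written as weighted first-order rules, and a candidate set of facts is scored by the weighted rule groundings it satisfies (in a full system the weights would be learned from data). The rule, facts, and weight below are invented for illustration:

```python
from itertools import product

# Facts we believe (a toy "possible world").
facts = {("friends", "anna", "bob"), ("smokes", "anna")}
people = ["anna", "bob"]

# Weighted rule, first-order in spirit:
#   friends(X, Y) AND smokes(X) => smokes(Y), weight 1.5
RULE_WEIGHT = 1.5

def rule_satisfied(x, y, world):
    """The implication holds unless its body is true and its head is false."""
    body = ("friends", x, y) in world and ("smokes", x) in world
    head = ("smokes", y) in world
    return (not body) or head

def score(world):
    """Sum the weights of all satisfied groundings of the rule."""
    return sum(RULE_WEIGHT
               for x, y in product(people, people)
               if rule_satisfied(x, y, world))

print(score(facts))                          # 4.5 -- one grounding violated (bob does not smoke)
print(score(facts | {("smokes", "bob")}))    # 6.0 -- adding the inferred fact satisfies all groundings
```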
11
Semantic interpretation
Semantic web: a convention for formal representation languages that lets software services interact with each other.
Semantic interpretation: deals with imprecise, ambiguous natural language, embodied in the human cognitive and cultural processes whereby linguistic expression elicits expected responses and expected changes in cognitive state.
12
The challenges for achieving accurate semantic interpretation
Interpreting the content: methods to infer relationships between column headers or mentions of entities in the world. Web-scale data might be an important part of the solution: the Web contains hundreds of millions of independently created tables. Tables represent structured data, and with them we can resolve semantic heterogeneity.
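A hedged sketch of how independently created tables can help resolve semantic heterogeneity: columns with different headers but overlapping cell values probably denote the same attribute. The tables, headers, and similarity threshold below are assumptions for illustration only:

```python
def jaccard(a, b):
    """Overlap between two sets of cell values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two independently created tables with differently named columns.
table1 = {"company": ["google", "apple", "ibm"],
          "hq": ["mountain view", "cupertino", "armonk"]}
table2 = {"firm": ["ibm", "google", "microsoft"],
          "city": ["armonk", "mountain view", "redmond"]}

def match_columns(t1, t2, threshold=0.3):
    """Propose header correspondences based on shared cell values."""
    matches = []
    for h1, v1 in t1.items():
        for h2, v2 in t2.items():
            sim = jaccard(v1, v2)
            if sim >= threshold:
                matches.append((h1, h2, round(sim, 2)))
    return matches

print(match_columns(table1, table2))
# [('company', 'firm', 0.5), ('hq', 'city', 0.5)]
```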
13
Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
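A minimal sketch of one such representation, assuming simple co-occurrence vectors built from raw, unlabeled text: words that appear in similar contexts end up with similar vectors, with no labels required. The toy corpus and window size are invented for illustration:

```python
from collections import defaultdict
from math import sqrt

# Unlabeled text only -- no annotations needed (toy corpus for illustration).
sentences = ["the cat chased the mouse",
             "the dog chased the cat",
             "stocks rose as markets rallied",
             "markets fell as stocks dropped"]

# Representation: each word is a vector of co-occurrence counts with its neighbours.
cooc = defaultdict(lambda: defaultdict(int))
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - 2), min(len(toks), i + 3)):  # window of +/-2 words
            if i != j:
                cooc[w][toks[j]] += 1

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Words from similar domains get similar context vectors.
print(cosine(cooc["cat"], cooc["dog"]))      # high -- shared contexts
print(cosine(cooc["cat"], cooc["stocks"]))   # zero here -- no shared contexts
```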