1
The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig, and Fernando Pereira (Google). Presented by Eun-Sol Kim.
2
Essentially, all models are wrong but some are useful
"The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve." (Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences)
"Essentially, all models are wrong, but some are useful." (George Box)
3
Two approaches to AI
GOFAI (Good Old-Fashioned Artificial Intelligence): based on logic; symbolic AI.
SML (Statistical Machine Learning): based on empirical data (sensor data or databases); inductive inference from data, generalizing data to rules and predicting on future data.
4
Scene completion using millions of photographs
- Hays et al., CMU, SIGGRAPH 2007
5
The power of data
6
Learning from Text at Web Scale
Brown Corpus: 1 million English words; complete sentences, no spelling errors, no grammatical errors.
Google's trillion-word corpus: roughly a million times larger than the Brown Corpus, with frequency counts for all word sequences up to 5 words long.
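A minimal sketch of what "frequency counts for all sequences up to 5 words long" means in code; the whitespace tokenizer and toy corpus are stand-ins for illustration, not the pipeline behind the actual web-scale data:

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every token sequence of length 1..max_n (illustrative only)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy stand-in for a web-scale corpus.
corpus = "the quick brown fox jumps over the lazy dog".split()
counts = ngram_counts(corpus)
print(counts[("the",)])           # unigram count -> 2
print(counts[("the", "quick")])   # bigram count  -> 1
```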
7
Some lessons of web-scale learning
1. Use available large-scale data rather than annotated data.
We can find useful semantic relationships automatically from the statistics of search queries and the corresponding results, or from the accumulated evidence of web-based text patterns, without annotated data.
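As a hedged illustration of mining semantic relationships from web-based text patterns without annotation, the sketch below uses a Hearst-style "X such as Y" pattern; the regex and sample sentences are my own assumptions, not taken from the paper:

```python
import re

# Hearst-style pattern: "<class> such as <instance>"  (illustrative assumption)
PATTERN = re.compile(r"(\w+(?: \w+)?) such as (\w+)")

def extract_is_a(text):
    """Harvest (instance, class) pairs from unannotated text."""
    return [(inst, cls) for cls, inst in PATTERN.findall(text)]

text = ("Large cities such as Seoul attract startups. "
        "Search engines such as Google index the web.")
print(extract_is_a(text))
# [('Seoul', 'Large cities'), ('Google', 'Search engines')]
```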
8
2. Memorization is a good policy
Memorizing specific phrases is more effective than learning general patterns.
Machine translation example: large memorized phrase tables that give candidate mappings between specific source- and target-language phrases.
For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
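A minimal sketch of what a memorized phrase table looks like in a phrase-based translation setting; the entries, scores, and lookup function are invented for illustration and do not reproduce any real system:

```python
# Toy phrase table: memorized source phrase -> candidate target phrases with scores.
# Entries and probabilities are invented for illustration only.
phrase_table = {
    ("machine", "translation"): [("traduction automatique", 0.7),
                                 ("traduction mécanique", 0.1)],
    ("good",): [("bon", 0.6), ("bonne", 0.3)],
}

def candidate_translations(source_tokens, max_len=2):
    """Look up every memorized source phrase (up to max_len words)
    and return its candidate target-language mappings."""
    candidates = {}
    for n in range(1, max_len + 1):
        for i in range(len(source_tokens) - n + 1):
            phrase = tuple(source_tokens[i:i + n])
            if phrase in phrase_table:
                candidates[phrase] = phrase_table[phrase]
    return candidates

print(candidate_translations("good machine translation".split()))
```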
9
Two conventional approaches to NLP
Deep approach: hand-coded grammars and ontologies; complex networks of relations.
Statistical approach: learning n-gram statistics from large corpora.
10
New approaches to NLP
A combination of the two conventional approaches
Statistical relational learning: represent relations between objects with rules (first-order logic); the model is built by statistical learning.
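A minimal sketch of the idea, in the spirit of Markov logic: relations between objects are written as weighted first-order rules, and a candidate set of facts is scored by the weighted rule groundings it satisfies (in a full system the weights would be learned from data). The rule, facts, and weight below are invented for illustration:

```python
from itertools import product

# Facts we believe (a toy "possible world").
facts = {("friends", "anna", "bob"), ("smokes", "anna")}
people = ["anna", "bob"]

# Weighted rule, first-order in spirit:
#   friends(X, Y) AND smokes(X) => smokes(Y), weight 1.5
RULE_WEIGHT = 1.5

def rule_satisfied(x, y, world):
    """The implication holds unless its body is true and its head is false."""
    body = ("friends", x, y) in world and ("smokes", x) in world
    head = ("smokes", y) in world
    return (not body) or head

def score(world):
    """Sum the weights of all satisfied groundings of the rule."""
    return sum(RULE_WEIGHT
               for x, y in product(people, people)
               if rule_satisfied(x, y, world))

print(score(facts))                          # 4.5 -- one grounding violated (bob does not smoke)
print(score(facts | {("smokes", "bob")}))    # 6.0 -- adding the inferred fact satisfies all groundings
```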
11
Semantic interpretation
Semantic web: a convention for formal representation languages that lets software services interact with each other.
Semantic interpretation: deals with imprecise, ambiguous natural language, embodied in the human cognitive and cultural processes whereby linguistic expression elicits expected responses and expected changes in cognitive state.
12
The challenges for achieving accurate semantic interpretation
Interpreting the content: methods to infer relationships between column headers or mentions of entities in the world. Web-scale data might be an important part of the solution: the Web contains hundreds of millions of independently created tables. Tables represent structured data, and with them we can resolve semantic heterogeneity.
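A hedged sketch of how independently created tables can help resolve semantic heterogeneity: columns with different headers but overlapping cell values probably denote the same attribute. The tables, headers, and similarity threshold below are assumptions for illustration only:

```python
def jaccard(a, b):
    """Overlap between two sets of cell values."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Two independently created tables with differently named columns.
table1 = {"company": ["google", "apple", "ibm"],
          "hq": ["mountain view", "cupertino", "armonk"]}
table2 = {"firm": ["ibm", "google", "microsoft"],
          "city": ["armonk", "mountain view", "redmond"]}

def match_columns(t1, t2, threshold=0.3):
    """Propose header correspondences based on shared cell values."""
    matches = []
    for h1, v1 in t1.items():
        for h2, v2 in t2.items():
            sim = jaccard(v1, v2)
            if sim >= threshold:
                matches.append((h1, h2, round(sim, 2)))
    return matches

print(match_columns(table1, table2))
# [('company', 'firm', 0.5), ('hq', 'city', 0.5)]
```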
13
Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
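A minimal sketch of one such representation, assuming simple co-occurrence vectors built from raw, unlabeled text: words that appear in similar contexts end up with similar vectors, with no labels required. The toy corpus and window size are invented for illustration:

```python
from collections import defaultdict
from math import sqrt

# Unlabeled text only -- no annotations needed (toy corpus for illustration).
sentences = ["the cat chased the mouse",
             "the dog chased the cat",
             "stocks rose as markets rallied",
             "markets fell as stocks dropped"]

# Representation: each word is a vector of co-occurrence counts with its neighbours.
cooc = defaultdict(lambda: defaultdict(int))
for s in sentences:
    toks = s.split()
    for i, w in enumerate(toks):
        for j in range(max(0, i - 2), min(len(toks), i + 3)):  # window of +/-2 words
            if i != j:
                cooc[w][toks[j]] += 1

def cosine(u, v):
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Words from similar domains get similar context vectors.
print(cosine(cooc["cat"], cooc["dog"]))      # high -- shared contexts
print(cosine(cooc["cat"], cooc["stocks"]))   # zero here -- no shared contexts
```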