The Unreasonable Effectiveness of Data

The Unreasonable Effectiveness of Data
Alon Halevy, Peter Norvig and Fernando Pereira (Google)
Eun-Sol Kim, 2011.10.24

Essentially, all models are wrong but some are useful

"The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve."
Eugene Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences

"Essentially, all models are wrong, but some are useful."
George Box

Two approaches to AI
GOFAI (Good Old-Fashioned Artificial Intelligence)
- Based on logic
- Symbolic AI
SML (Statistical Machine Learning)
- Based on empirical data (sensor data or databases)
- Inductive inference from data: generalize the data into rules, then predict on future data

Scene completion using millions of photographs (Hays and Efros, CMU, SIGGRAPH 2007)
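
The core data-driven step in scene completion is finding, among a very large photo collection, the scenes most similar to the one being completed. A minimal sketch of that lookup as brute-force nearest-neighbour search follows. It assumes each photo is already summarized by a small feature vector; the 3-dimensional descriptors and photo ids are hypothetical stand-ins, and the actual system's scene descriptor and its hole-filling and blending steps are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_scenes(query: Sequence[float],
                   collection: Dict[str, Sequence[float]],
                   k: int = 3) -> List[Tuple[str, float]]:
    """Return the k photo ids whose descriptors are closest to the query.

    This is a linear scan. A collection of millions of photos would need
    an approximate nearest-neighbour index in practice.
    """
    scored = [(photo_id, euclidean(query, desc))
              for photo_id, desc in collection.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical descriptors standing in for real scene features.
photos = {
    "beach_01": [0.9, 0.1, 0.2],
    "beach_02": [0.8, 0.2, 0.1],
    "city_01": [0.1, 0.9, 0.7],
    "forest_01": [0.2, 0.3, 0.9],
}
query = [0.88, 0.12, 0.18]
print([pid for pid, _ in nearest_scenes(query, photos, k=2)])
```

The more photos the collection holds, the more likely it is that some neighbour is a close semantic match. That is the slide's point about the power of data.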

The power of data

Learning from Text at Web Scale
Brown Corpus
- 1 million English words
- Complete sentences, no spelling errors, no grammatical errors
Google's trillion-word corpus
- A million times larger than the Brown Corpus
- Frequency counts for all sequences up to 5 words long
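
A corpus of "frequency counts for all sequences up to 5 words long" is a table of n-gram counts for n = 1 to 5. A minimal sketch of how such counts can be collected from a tokenized text follows. It assumes simple whitespace tokenization; real web-scale counting would be distributed and would apply count cutoffs, which are omitted here.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], max_n: int = 5) -> Counter:
    """Count every contiguous token sequence of length 1..max_n."""
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)
print(counts[("the",)])        # 2
print(counts[("the", "cat")])  # 1
print(len(counts))             # 19 distinct sequences of length 1 to 5
```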

Some lessons of web-scale learning
1. Use available large-scale data rather than annotated data
Useful semantic relationships can be found automatically, without annotated data, from the statistics of search queries and their corresponding results, or from the accumulated evidence of web-based text patterns.
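
One way to accumulate "evidence of web-based text patterns" is to match a lexical pattern such as "X such as A, B and C" across many sentences and count how often each (member, category) pair is seen. The sketch below uses a single Hearst-style "such as" pattern as an illustrative stand-in; the slide does not say which patterns were used. The regular expression is deliberately crude and does not lemmatize plural category names.

```python
import re
from collections import Counter
from typing import Iterable

# Crude pattern: "<category> such as <a>, <b> and/or <c>".
PATTERN = re.compile(r"(\w+) such as ((?:\w+)(?:, \w+)*(?:,? (?:and|or) \w+)?)")


def isa_evidence(sentences: Iterable[str]) -> Counter:
    """Count (member, category) pairs suggested by the 'such as' pattern."""
    evidence: Counter = Counter()
    for sentence in sentences:
        for match in PATTERN.finditer(sentence.lower()):
            category = match.group(1)
            members = re.split(r",\s*|\s+(?:and|or)\s+", match.group(2))
            for member in members:
                if member and member not in {"and", "or"}:
                    evidence[(member, category)] += 1
    return evidence


sentences = [
    "Cities such as Paris, London and Tokyo are large.",
    "The tour visited cities such as Paris.",
    "Fruits such as apples or pears are sweet.",
]
evidence = isa_evidence(sentences)
print(evidence.most_common(1))  # [(('paris', 'cities'), 2)]
```

No sentence is labeled by hand. The relationship strength comes only from how often the pattern fires, and with web-scale text the counts become reliable enough to rank candidate relations.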

2. Memorization is a good policy
Memorizing specific phrases is more effective than learning general patterns.
Machine translation example: large memorized phrase tables give candidate mappings between specific source- and target-language phrases.
For many tasks, words and word combinations provide all the representational machinery we need to learn from text.
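
The benefit of memorized phrases can be seen with a toy phrase table. A memorized multi-word entry ("pomme de terre" → "potato") beats composing word-by-word translations ("apple of earth"). A minimal sketch follows, assuming a small hypothetical French-English table and a greedy longest-match lookup. Real phrase-based systems attach probabilities to each mapping and search over segmentations and reorderings with a language model; none of that is modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical memorized phrase table: source phrase -> target phrase.
PHRASE_TABLE: Dict[Tuple[str, ...], str] = {
    ("pomme", "de", "terre"): "potato",
    ("pomme",): "apple",
    ("de",): "of",
    ("terre",): "earth",
    ("la",): "the",
    ("mange",): "eats",
    ("il",): "he",
}


def translate(source: str,
              table: Dict[Tuple[str, ...], str] = PHRASE_TABLE,
              max_len: int = 3) -> str:
    """Greedy left-to-right translation preferring the longest memorized phrase."""
    words = source.lower().split()
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: copy it through unchanged
            i += 1
    return " ".join(out)


print(translate("il mange la pomme de terre"))  # he eats the potato
```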

Two conventional approaches to NLP
Deep approach
- Hand-coded grammars and ontologies
- Complex networks of relations
Statistical approach
- Learning n-gram statistics from large corpora
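
"Learning n-gram statistics" in the statistical approach usually means estimating conditional word probabilities from counts. A minimal sketch for bigrams follows, using maximum-likelihood estimates P(w | prev) = count(prev, w) / count(prev). It assumes whitespace tokenization with sentence-boundary markers `<s>` and `</s>`, and it applies no smoothing, so unseen pairs get probability zero.

```python
from collections import Counter
from typing import Callable, List


def train_bigram(sentences: List[str]) -> Callable[[str, str], float]:
    """Return a maximum-likelihood bigram probability function P(word | prev)."""
    history: Counter = Counter()  # how often each token appears as a history
    bigrams: Counter = Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split() + ["</s>"]
        history.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev: str, word: str) -> float:
        if history[prev] == 0:
            return 0.0  # unseen history: no estimate without smoothing
        return bigrams[(prev, word)] / history[prev]

    return prob


prob = train_bigram(["the cat sat", "the dog sat", "a cat ran"])
print(prob("the", "cat"))  # 0.5
print(prob("cat", "sat"))  # 0.5
```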

New approaches to NLP
Combination of the two conventional approaches: statistical relational learning
- Relations between objects are represented with rules (first-order logic)
- The model is built by statistical learning
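
To make "rules in first-order logic, model built by statistical learning" concrete, the sketch below grounds one hypothetical first-order rule, Smokes(x) and Friends(x, y) => Smokes(y), over a small domain. It then estimates from data how often the rule holds when its body is true. This confidence estimate is a simplified stand-in chosen for illustration. It is not the weight-learning procedure of any particular statistical relational framework (Markov logic networks, for instance, fit weights by optimizing a likelihood), and the people and facts are made up.

```python
from itertools import product
from typing import Iterable, Set, Tuple


def rule_confidence(people: Iterable[str],
                    smokes: Set[str],
                    friends: Set[Tuple[str, str]]) -> float:
    """Among groundings of Smokes(x) and Friends(x, y) => Smokes(y) whose
    body is true, return the fraction whose head is also true."""
    domain = list(people)
    body_true = 0
    head_true = 0
    for x, y in product(domain, repeat=2):  # ground the rule over all pairs
        if x != y and x in smokes and (x, y) in friends:
            body_true += 1
            if y in smokes:
                head_true += 1
    return head_true / body_true if body_true else 0.0


# Hypothetical facts.
people = ["anna", "bob", "chris", "dana"]
smokes = {"anna", "bob", "dana"}
friends = {("anna", "bob"), ("bob", "chris"), ("anna", "dana"), ("chris", "dana")}
print(rule_confidence(people, smokes, friends))  # 2 of 3 groundings hold
```

The rule is symbolic and shared across all objects, while its strength is estimated from data. Soft rules of this kind let the combined approach tolerate exceptions that a purely logical rule could not.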

Semantic interpretation
Semantic Web
- A convention for formal representation languages that lets software services interact with each other
Semantic interpretation
- Deals with imprecise, ambiguous natural languages
- Embodied in human cognitive and cultural processes whereby linguistic expression elicits expected responses and expected changes in cognitive states

The challenge of achieving accurate semantic interpretation
- Interpreting the content: methods are needed to infer relationships between column headers or mentions of entities in the world
- Web-scale data might be an important part of the solution: the Web holds hundreds of millions of independently created tables
- Tables represent structured data; with these tables, we can resolve semantic heterogeneity
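
One way many independently created tables can help infer relationships between column headers is through co-occurrence statistics. Two headers that never appear in the same table, yet appear alongside the same other headers, are candidate synonyms. A minimal sketch of that heuristic follows. The heuristic, the Jaccard similarity measure, the 0.5 threshold and the example tables are all assumptions for illustration, not a method stated on the slide.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def header_contexts(tables: List[List[str]]):
    """For each header, collect the other headers it appears alongside,
    and record which header pairs ever share a table."""
    contexts: Dict[str, Set[str]] = {}
    together: Set[Tuple[str, str]] = set()
    for headers in tables:
        hs = sorted({h.lower() for h in headers})
        together.update(combinations(hs, 2))  # pairs come out sorted
        for h in hs:
            contexts.setdefault(h, set()).update(o for o in hs if o != h)
    return contexts, together


def synonym_candidates(tables: List[List[str]], min_sim: float = 0.5):
    """Header pairs that never co-occur but share similar context headers."""
    contexts, together = header_contexts(tables)
    candidates = []
    for a, b in combinations(sorted(contexts), 2):
        if (a, b) in together:
            continue  # synonyms rarely appear side by side in one table
        union = contexts[a] | contexts[b]
        sim = len(contexts[a] & contexts[b]) / len(union) if union else 0.0
        if sim >= min_sim:
            candidates.append((a, b, round(sim, 2)))
    return candidates


# Hypothetical header rows from independently created tables.
tables = [
    ["Name", "Phone", "Email"],
    ["Name", "Telephone", "Email"],
    ["Name", "Phone", "Address"],
    ["Name", "Telephone", "Address"],
    ["Name", "Email", "Address"],
    ["Make", "Model", "Year"],
]
print(synonym_candidates(tables))  # [('phone', 'telephone', 1.0)]
```

With only a handful of tables such a heuristic is fragile. The slide's argument is that hundreds of millions of tables make these co-occurrence statistics far more reliable.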

Choose a representation that can use unsupervised learning on unlabeled data, which is so much more plentiful than labeled data.
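
A common representation that supports unsupervised learning on unlabeled text is the context vector. Each word is described by counts of its neighbouring words, and words with similar vectors tend to be related. A minimal sketch follows, assuming a one-word window on each side and cosine similarity. The tiny corpus is made up, and no labels are used anywhere.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def context_vectors(sentences: List[str], window: int = 1) -> Dict[str, Counter]:
    """Map each word to counts of the words within `window` positions of it."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for s in sentences:
        tokens = s.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "a cat chases a mouse",
    "a dog chases a ball",
    "the car needs fuel",
]
vecs = context_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # 1.0
print(round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.354
```

Adding more raw text needs no extra annotation effort, so a representation like this scales with the plentiful unlabeled data the slide points to.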