Representational and inferential foundations for possible large-scale information extraction and question-answering from the web Stuart Russell Computer Science Division UC Berkeley

Goal A system that knows everything on the Web* –Answer all questions –Discover patterns –Make predictions Raw data → useful knowledge base Requires: NLP, vision, speech, learning, DBs, knowledge representation and reasoning Berkeley: Klein, Malik, Morgan, Darrell, Jordan, Bartlett, Hellerstein, Franklin, Hearst++

Past projects: PowerSet “Building a natural language search engine that reads and understands every sentence on the Web.” Parsing/extraction technology + crowdsourcing to generate collections of x R y triples Example: –Manchester United beat Chelsea –Chelsea beat Manchester United Bought by Microsoft in 2008, merged into Bing
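To make the “x R y triples” concrete, here is a toy sketch of triple extraction (not PowerSet’s actual technology; the regular-expression “parser” below is invented around the slide’s own example sentences):

import re

# Toy triple extractor (illustration only): match "X beat Y"-style sentences
# and emit (x, R, y) triples.
sentences = [
    "Manchester United beat Chelsea",
    "Chelsea beat Manchester United",
]

triples = []
for s in sentences:
    m = re.fullmatch(r"(?P<x>.+?) (?P<R>beat) (?P<y>.+)", s)
    if m:
        triples.append((m.group("x"), m.group("R"), m.group("y")))

print(triples)
# [('Manchester United', 'beat', 'Chelsea'), ('Chelsea', 'beat', 'Manchester United')]
# The two triples are different facts, which is why argument order matters.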

Current projects: UW Machine Reading Initially based on bootstrapping text patterns –Born(Elvis,1935) => “Elvis was born in Tupelo” => “Obama was born in Hawaii” => “Obama’s birthplace was Hawaii” => …. [Google: Best guess for Elvis Presley Born is January 8, 1935] –Inaccurate, runs out of gas, learned content shallow, 99% of text ignored Moving to incorporate probabilistic knowledge, inference using Markov logic
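As a deliberately naive picture of the bootstrapping loop in the first bullet, here is a toy Python sketch; the mini-corpus reuses the slide’s example sentences, the seed fact is simplified to Born(Elvis, Tupelo), and real systems use far richer patterns and extractors:

import re

corpus = [
    "Elvis was born in Tupelo",
    "Obama was born in Hawaii",
    "Obama's birthplace was Hawaii",
]
seeds = {("Elvis", "Tupelo")}                  # simplified seed fact Born(Elvis, Tupelo)

# 1. Find sentences containing both arguments of a seed fact; abstract them into patterns.
patterns = set()
for x, y in seeds:
    for s in corpus:
        if x in s and y in s:
            patterns.add(s.replace(x, r"(?P<x>\w+)").replace(y, r"(?P<y>\w+)"))

# 2. Apply the learned patterns to the rest of the corpus to harvest new facts.
facts = set(seeds)
for p in patterns:
    for s in corpus:
        m = re.fullmatch(p, s)
        if m:
            facts.add((m.group("x"), m.group("y")))

print(facts)   # the pattern "X was born in Y" also yields Born(Obama, Hawaii);
               # the paraphrase "Obama's birthplace was Hawaii" is missed — the
               # "runs out of gas / 99% of text ignored" problem in miniature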

Current Projects: NELL (CMU) Bootstrapping approach to learning facts from the web using text patterns (642,797 so far) Initial ontology of basic categories and typed relations Examples:
–the_chicken is a type of meat (100.0%)
–coventry_evening_telegraph is a blog (99.0%)
–state_university is a sports team also known as syracuse_university (93.8%)
–orac_values_for_mushrooms is a fungus (100.0%)
–Hank Paulson is the CEO of Goldman (100.0%)

Problems Language (incl. speech act pragmatics) –… Jerry Brown, who has been called the first American in space Uncertainty –Reference uncertainty is ubiquitous –Bootstrapping can converge or diverge; exacerbated by “accepting” uncertain facts, naïve probability models Universal ontological framework (O(1) work) –Taxonomy, events, compositional structure, time… –Compositional structure of objects and events –Knowledge, belief, other agents –Semantic content below lexical level (must be learned) E.g., buy = sell⁻¹, ownership, transfer, etc.

Technical approach Web is just evidence; compute P(World | Web) ∝ P(Web | World) P(World) What is the domain of the World variable? –Complex sets of interrelated objects and events How does it cause the Web variable? –Pragmatics/semantics/syntax (and copying!) Uncertainty about –What objects exist –How they’re related –What phrases/images refer to what real objects => Open-universe, first-order probabilistic language
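A toy numerical sketch of the P(World | Web) ∝ P(Web | World) P(World) computation (not from the slides; the candidate worlds, priors, and likelihoods are all invented), scoring three possible worlds against one observed sentence:

# One observed web sentence: "Manchester United beat Chelsea".
# Candidate worlds with invented priors and text likelihoods P(sentence | world).
worlds = {
    "United actually won the match":       {"prior": 0.45, "likelihood": 0.90},
    "Chelsea actually won the match":      {"prior": 0.45, "likelihood": 0.05},
    "the sentence is about another match": {"prior": 0.10, "likelihood": 0.30},
}

unnormalized = {w: v["prior"] * v["likelihood"] for w, v in worlds.items()}
z = sum(unnormalized.values())
for w, p in sorted(unnormalized.items(), key=lambda kv: -kv[1]):
    print(f"P({w} | web) = {p / z:.3f}")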

10 Brief history of expressiveness
logic: atomic (5th C B.C.) → propositional/first-order (19th C)
probability: atomic (17th C) → propositional (20th C) → first-order/relational (21st C — be patient!)

14 Herbrand vs full first-order Given Father(Bill,William) and Father(Bill,Junior) How many children does Bill have? Herbrand semantics: 2 First-order logical semantics: Between 1 and ∞ (William and Junior may denote the same individual, and nothing rules out further, unnamed children)
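A toy enumeration (not from the slides) that makes the gap between the two answers explicit:

# Herbrand semantics: William and Junior denote distinct objects, so exactly 2 children.
# Full first-order semantics: the two constants may co-refer, and the axioms do not
# rule out further, unnamed children — hence anywhere from 1 to infinity.
possible_counts = set()
for co_refer in (True, False):        # do William and Junior denote the same individual?
    named = 1 if co_refer else 2
    for unnamed in range(3):          # 0, 1, 2, ... unnamed children (unbounded in general)
        possible_counts.add(named + unnamed)

print(sorted(possible_counts))        # [1, 2, 3, 4]: any count from 1 upward is consistent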

Possible worlds
–Propositional (Boolean, ANNs, Bayes nets)
–First-order closed-universe (DBs, Prolog)
–First-order open-universe
[diagram comparing the possible worlds each formalism allows, over objects A, B, C, D]

17 Open-universe models in BLOG Construct worlds using two kinds of steps, proceeding in topological order: –Dependency statements: Set the value of a function or relation on a tuple of (quantified) arguments, conditioned on parent values –Number statements: Add some objects to the world, conditioned on what objects and relations exist so far
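A minimal sketch of this generative construction (not the BLOG engine; the tiny urn-and-balls domain and every distribution below are invented placeholders):

import random

def sample_world():
    # Number statement: add some objects (balls) to the world.
    n_balls = random.randint(1, 8)                 # stand-in for e.g. a Poisson prior
    # Dependency statement: set Color(b) for each ball (no parents here).
    color = {b: random.choice(["blue", "green"]) for b in range(n_balls)}
    # Dependency statements conditioned on earlier values: three draws, each
    # picking an existing ball and reporting its color with 10% noise.
    draws = []
    for _ in range(3):
        b = random.randrange(n_balls)
        noisy = random.random() < 0.1
        draws.append(color[b] if not noisy else ("green" if color[b] == "blue" else "blue"))
    return n_balls, color, draws

print(sample_world())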

18 Technical basics Theorem: Every well-formed* BLOG model specifies a unique proper probability distribution over open-universe possible worlds; equivalent to an infinite contingent Bayes net Theorem: BLOG inference algorithms (rejection sampling, importance sampling, MCMC) converge to correct posteriors for any well-formed* model, for any first-order query
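To make the convergence claim concrete, here is a rejection-sampling sketch on a tiny invented open-universe model (not one of the models in these slides): the unknown number of objects N is Uniform{1..4}, each object is detected with probability 0.5, and we condition on having seen exactly 2 detections.

import random
from collections import Counter

def sample_and_check():
    n = random.randint(1, 4)                          # number statement: how many objects exist
    detections = sum(random.random() < 0.5 for _ in range(n))
    return n if detections == 2 else None             # keep only worlds matching the evidence

accepted = Counter(filter(None, (sample_and_check() for _ in range(100_000))))
total = sum(accepted.values())
for n in sorted(accepted):
    print(f"P(N = {n} | 2 detections) ≈ {accepted[n] / total:.3f}")

With enough samples the estimates approach the exact posterior (0.25, 0.375, 0.375 for N = 2, 3, 4), which is the sense in which the sampling algorithms converge.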

19 Example: Citation Matching [Lashkari et al 94]
Collaborative Interface Agents, Yezdi Lashkari, Max Metral, and Pattie Maes, Proceedings of the Twelfth National Conference on Articial Intelligence, MIT Press, Cambridge, MA,
Metral M. Lashkari, Y. and P. Maes. Collaborative interface agents. In Conference of the American Association for Artificial Intelligence, Seattle, WA, August
Are these descriptions of the same object? Core task in CiteSeer, Google Scholar, over 300 companies in the record linkage industry

26 (Simplified) BLOG model
#Researcher ~ NumResearchersPrior();
Name(r) ~ NamePrior();
#Paper(FirstAuthor = r) ~ NumPapersPrior(Position(r));
Title(p) ~ TitlePrior();
PubCited(c) ~ Uniform({Paper p});
Text(c) ~ NoisyCitationGrammar(Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));
Evidence: lots of citation strings
Query: who wrote what? Which paper is being cited in this string? Are these two people the same?
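A rough Python rendering of the generative story in this model (a sketch, not the BLOG system; the name/title priors and the “noisy citation grammar” below are crude placeholders invented for illustration):

import random

def noisy_render(name, title):
    # Stand-in for NoisyCitationGrammar: occasionally drop the last word of the title.
    if random.random() < 0.3 and " " in title:
        title = title.rsplit(" ", 1)[0]
    return f"{name} {title}."

def sample_citations(n_citations=4):
    n_researchers = random.randint(1, 3)                  # #Researcher ~ NumResearchersPrior()
    name = {r: random.choice(["Lashkari, Y.", "Maes, P."])    # Name(r) ~ NamePrior()
            for r in range(n_researchers)}
    papers = []
    for r in range(n_researchers):                        # #Paper(FirstAuthor = r) ~ NumPapersPrior(...)
        for _ in range(random.randint(1, 2)):
            title = random.choice(["Collaborative Interface Agents",
                                   "Agents that Reduce Work"])    # Title(p) ~ TitlePrior()
            papers.append((r, title))
    citations = []
    for _ in range(n_citations):
        r, title = random.choice(papers)                  # PubCited(c) ~ Uniform({Paper p})
        citations.append(noisy_render(name[r], title))    # Text(c) ~ NoisyCitationGrammar(...)
    return citations

print(sample_citations())

Inference runs this story in reverse: given only the citation strings, recover how many researchers and papers exist and which paper each string refers to.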

27 Citation Matching Results Four data sets of ~ citations, referring to ~ papers

Example: multitarget tracking
#Aircraft(EntryTime = t) ~ NumAircraftPrior();
Exits(a, t) if InFlight(a, t) then ~ Bernoulli(0.1);
InFlight(a, t)
  if t < EntryTime(a) then = false
  elseif t = EntryTime(a) then = true
  else = (InFlight(a, t-1) & !Exits(a, t-1));
State(a, t)
  if t = EntryTime(a) then ~ InitState()
  elseif InFlight(a, t) then ~ StateTransition(State(a, t-1));
#Blip(Source = a, Time = t) if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));
#Blip(Time = t) ~ NumFalseAlarmsPrior();
ApparentPos(r)
  if (Source(r) = null) then ~ FalseAlarmDistrib()
  else ~ ObsCPD(State(Source(r), Time(r)));

29 Example: cybersecurity sibyl defence
#Person ~ LogNormal[6.9, 2.3]();
Honest(x) ~ Boolean[0.9]();
#Login(Owner = x) ~ if Honest(x) then 1 else LogNormal[4.6, 2.3]();
Transaction(x,y) ~
  if Owner(x) = Owner(y) then SibylPrior()
  else TransactionPrior(Honest(Owner(x)), Honest(Owner(y)));
Recommends(x,y) ~
  if Transaction(x,y) then
    if Owner(x) = Owner(y) then Boolean[0.99]()
    else RecPrior(Honest(Owner(x)), Honest(Owner(y)));
Evidence: lots of transactions and recommendations
Query: Honest(x)

30 Example: Global seismic monitoring CTBT bans testing of nuclear weapons on earth –Allows for outside inspection of 1000 km² Need 9 more ratifications for “entry into force”, including US and China; US Senate refused to ratify in 1998 – “too hard to monitor”

monitoring stations

33 Vertically Integrated Seismic Analysis The problem is hard: –~10000 “detections” per day, 90% false –CTBT system (SEL3) finds 69% of significant events plus about twice as many spurious (nonexistent) events –16 human analysts find more events, correct existing ones, throw out spurious events, generate LEB (“ground truth”) –Unreliable below magnitude 4 (1kT)

44
#SeismicEvents ~ Poisson[time_duration * event_rate];
IsEarthQuake(e) ~ Bernoulli(.999);
EventLocation(e) ~ If IsEarthQuake(e) then EarthQuakeDistribution() Else UniformEarthDistribution();
Magnitude(e) ~ Exponential(log(10)) + min_magnitude;
Distance(e,s) = GeographicalDistance(EventLocation(e), SiteLocation(s));
IsDetected(e,p,s) ~ Logistic[site-coefficients(s,p)](Magnitude(e), Distance(e,s));
#Arrivals(site = s) ~ Poisson[time_duration * false_rate(s)];
#Arrivals(event = e, site = s) = If IsDetected(e,s) then 1 else 0;
Time(a) ~ If (event(a) = null) then Uniform(0, time_duration)
  else IASPEI(EventLocation(event(a)), SiteLocation(site(a)), Phase(a)) + TimeRes(a);
TimeRes(a) ~ Laplace(time_location(site(a)), time_scale(site(a)));
Azimuth(a) ~ If (event(a) = null) then Uniform(0, 360)
  else GeoAzimuth(EventLocation(event(a)), SiteLocation(site(a))) + AzRes(a);
AzRes(a) ~ Laplace(0, azimuth_scale(site(a)));
Slow(a) ~ If (event(a) = null) then Uniform(0, 20)
  else IASPEI-slow(EventLocation(event(a)), SiteLocation(site(a))) + SlowRes(site(a));

45 Fraction of LEB events missed

46 Fraction of LEB events missed

47 Event distribution: LEB vs SEL3

48 Event distribution: LEB vs NET-VISA

Open questions Efficient inference Model construction: creating useful new categories and relations HCI: What are answers when existence is uncertain? Making use of partially extracted or unextracted information – “data spaces” (Franklin, Halevy) Proper modeling of availability/absence of evidence

Summary Basic components (accurate parsing, first-order and modal probabilistic logics, universal ontology) are mostly in place; NLP is moving back towards combined syntax/semantics Vertically integrated probabilistic models can be much more effective than bottom-up pipelines The Web is Very Big –Does not imply we can only use trivial methods –Does not imply that trivial methods will suffice –Won’t happen for free

52 Example of using extra detections

53 NEIC event (3.0) missed by LEB

54 NEIC event (3.7) missed by LEB

55 NEIC event (2.6) missed by LEB

TREC 9 Results (2000)