KnowItAll
Oren Etzioni, Stephen Soderland, Daniel Weld
Michele Banko, Alex Beynenson, Jeff Bigham, Michael Cafarella, Doug Downey, Dave Ko, Stanley Kok, Jim Li, Mike Lindemark, Tessa McDuff, Bao Nguyen, Ana-Maria Popescu, Stefan Schoenmackers, Tal Shaked, Junji Tomita, and Alex Yates
Ongoing work since

Goals of KnowItAll
Information extraction from the Web that is:
– Domain-independent
– Genre-independent
– Unsupervised
– Massively scalable
– High precision
– Fuses information across documents

KnowItAll Projects
KnowItAll (baseline system):
– Unsupervised information extraction
– Google queries for extraction and verification
KnowItNow (massive speedup):
– BE: novel search engine index
– Urns: formal probability model for verification
Opine (mining on-line reviews):
– Learn attributes of a product
– Find strength and orientation of opinions
TextRunner:
– Semantic graph of relationships from a corpus
– Question answering based on the relation graph
Ontology Learning:
– Learn relations and attributes of arbitrary classes
– Continuously grow from a small knowledge base

Unsupervised Information Extraction
1. Create extraction rules from generic rule templates
2. Send queries to a search engine based on the extraction rules
3. Extract information from the resulting pages
4. Verify extractions with mutual information from Web hit counts (PMI-IR)
5. Enter extractions into a database
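These five steps amount to a simple loop. A minimal sketch in Python, assuming hypothetical search_engine, assess, and rule objects that stand in for the Google query interface and the rule matcher (the 0.9 acceptance threshold is also an assumption, not a figure from the slides):

    def run_knowitall(predicates, rule_templates, search_engine, assess, db):
        # Step 1: instantiate generic templates into extraction rules
        rules = [template.instantiate(pred)
                 for pred in predicates for template in rule_templates]
        for rule in rules:
            # Step 2: query the search engine with the rule's keyword phrase
            for page in search_engine.query(rule.keywords):
                # Step 3: run the rule over the returned page
                for fact in rule.extract(page):
                    # Step 4: verify via PMI over Web hit counts (PMI-IR)
                    prob = assess(fact)
                    # Step 5: store sufficiently probable extractions
                    if prob >= 0.9:
                        db.add(fact, prob)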

The KnowItAll System
– Input: predicates, e.g. Country(X)
– Domain-independent rule templates: “such as” NP
– Bootstrapping produces extraction rules (“countries such as” NP) and discriminators (“country X”)
– The Extractor applies the rules to the World Wide Web, yielding extractions: Country(“France”)
– The Assessor validates the extractions: Country(“France”), prob=0.999

Unary predicates: instances of a class
Unary predicates: instanceOf(City), instanceOf(Film), instanceOf(Company), …
Good recall and precision from generic patterns:
– “such as” X
– X “and other”
Instantiated rules:
– “cities such as” X / X “and other cities”
– “films such as” X / X “and other films”
– “companies such as” X / X “and other companies”

Binary Predicates
Domain-independent rule template: relation(arg1, arg2), with pattern NP1 “,” relation phrase “of” NP2
Instantiated rules before binding an argument:
– CeoOf(Person, Company): “, CEO of”
– StarsIn(Actor, Film): “, star of”
– Population(Number, Country): “, population of”
After binding an argument to an entry in the knowledge base:
– CeoOf(Person, Company): NP “, CEO of WalMart”
– StarsIn(Actor, Film): “Dustin Hoffman” “, star of” NP
– Population(Number, Country): “, population of France”

Instantiating a Rule Template
Rule Template (domain-independent):
  Predicate: predName(Class1)
  Pattern: NP1 “such as” NPList2
  Constraints: head(NP1) = plural(label(Class1))
               properNoun(head(each(NPList2)))
  Bindings: instanceOf(Class1, head(each(NPList2)))
Extraction Rule (substituting “instanceOf” and “Country”):
  Predicate: instanceOf(Country)
  Pattern: NP1 “such as” NPList2
  Constraints: head(NP1) = “nations”
               properNoun(head(each(NPList2)))
  Bindings: instanceOf(Country, head(each(NPList2)))
  Keywords: “nations such as”
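The substitution is mechanical string rewriting. A hypothetical representation in Python; the field names mirror the slide, not the actual KnowItAll code:

    def instantiate(pred="instanceOf", class_label="Country",
                    plural_label="nations"):
        # Substitute the predicate and class label into the generic template.
        return {
            "predicate":   f"{pred}({class_label})",
            "pattern":     'NP1 "such as" NPList2',
            "constraints": [f'head(NP1) = "{plural_label}"',
                            "properNoun(head(each(NPList2)))"],
            "bindings":    f"{pred}({class_label}, head(each(NPList2)))",
            "keywords":    f'"{plural_label} such as"',
        }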

Applying the Rule
Extraction Rule:
  Predicate: instanceOf(Country)
  Pattern: NP1 “such as” NPList2
  Constraints: head(NP1) = “nations”
               properNoun(head(each(NPList2)))
  Bindings: instanceOf(Country, head(each(NPList2)))
  Keywords: “nations such as”
Sentence: “Other nations such as France, India and Pakistan, have conducted recent tests.”
Three extractions:
  instanceOf(Country, France)
  instanceOf(Country, India)
  instanceOf(Country, Pakistan)
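A toy approximation of this match in Python. The real system matched noun-phrase chunks from a shallow parser; the regular expression here is only a stand-in:

    import re

    sentence = ("Other nations such as France, India and Pakistan, "
                "have conducted recent tests.")

    # Match the keyword phrase followed by a comma/"and"-separated NP list.
    m = re.search(r"nations such as ((?:[A-Z]\w+(?:, | and )?)+)", sentence)
    if m:
        names = re.split(r", | and ", m.group(1).rstrip(", "))
        extractions = [("instanceOf", "Country", n) for n in names]
        print(extractions)
        # [('instanceOf', 'Country', 'France'),
        #  ('instanceOf', 'Country', 'India'),
        #  ('instanceOf', 'Country', 'Pakistan')]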

Recall – Precision Tradeoff
High precision rules apply to only a small percentage of sentences on the Web.

Web hits for:  “X”           “cities such as X”   “X and other cities”
– Boston:      365,000,000   15,600,000           12,000
– Tukwila:     1,300,000     73,
– Gjatsk:
– Hadaslav:

“Redundancy-based extraction” ignores all but the unambiguous references.

Limited Recall with Binary Rules
Relatively high recall for unary rules:
– “companies such as” X: 2,800,000 Web hits
– X “and other companies”: 500,000 Web hits
Low recall for binary rules:
– X “is the CEO of Microsoft”: 160 Web hits
– X “is the CEO of Wal-mart”: 19 Web hits
– X “is the CEO of Continental Grain”: 0 Web hits
– X “, CEO of Microsoft”: 6,700 Web hits
– X “, CEO of Wal-mart”: 700 Web hits
– X “, CEO of Continental Grain”: 2 Web hits

Results for Unary Predicates
High precision and high recall for unary (instance of) extraction.
More errors for Country (“Latin America”, “Iriquois nation”, etc).

Results for Binary Predicates
High precision for both CapitalOf and CeoOf.
Able to find the capitals of all countries.
Generic rules are too specific to find CEOs of low-frequency companies.

“Generate and Test” Paradigm
1. Find extractions from generic rules
2. Validate each extraction
– Assign a probability that the extraction is correct
– Use search engine hit counts to compute PMI
– PMI (pointwise mutual information) between the extraction and “discriminator” phrases for the target concept
PMI-IR: P. D. Turney, “Mining the Web for synonyms: PMI-IR versus LSA on TOEFL”. In Proceedings of ECML, 2001.

Examples of Extraction Errors
Rule: countries such as X => instanceOf(Country, X)
“We have 31 offices in 15 countries such as London and France.”
=> instanceOf(Country, London), instanceOf(Country, France)
Rule: X and other cities => instanceOf(City, X)
“A comparative breakdown of the cost of living in Klamath County and other cities follows.”
=> instanceOf(City, Klamath County)

Computing PMI Scores
Measures mutual information between the extraction and the target concept.
– D = a discriminator phrase for the concept: “countries such as X”
– I = an instance of a target concept: instanceOf(Country, “France”)
– D+I = insert the instance into the discriminator phrase: “countries such as France”
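Concretely, the score is the hit-count ratio used by PMI-IR:

    PMI(I, D) = Hits(D + I) / Hits(I)

e.g. PMI(“France”, “countries such as X”) = Hits(“countries such as France”) / Hits(“France”).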

Example of PMI
Discriminator: “countries such as X”
Instance: “France” vs. “London”
PMI for France >> PMI for London (2 orders of magnitude)
Need features for the probability update that distinguish “high” PMI from “low” PMI for a discriminator:
– “countries such as France”: 27,800 hits
– “France”: 14,300,000 hits
– “countries such as London”: 71 hits
– “London”: 12,600,000 hits
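Plugging the hit counts above into the PMI formula confirms the gap:

    PMI(France) = 27,800 / 14,300,000 ≈ 1.9 × 10^-3
    PMI(London) = 71 / 12,600,000 ≈ 5.6 × 10^-6

a ratio of roughly 350, i.e. over two orders of magnitude.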

PMI for Binary Predicates
Insert both arguments of the extraction into the discriminator phrase; each argument is a separate query term.
Extraction: CeoOf(“Jeff Bezos”, “Amazon”)
Discriminator: “ceo of”
PMI = hits for “Jeff Bezos ceo of Amazon” / hits for “Jeff Bezos”, “Amazon” (39,000)

Bootstrap Training
1. Only input is a set of predicates with class labels: instanceOf(Country), class labels “country”, “nation”
2. Combine predicates with domain-independent templates (“such as” NP => instanceOf(class, NP)) to create extraction rules and discriminator phrases:
   rule: “countries such as” NP => instanceOf(Country, NP)
   discrim: “country X”
3. Use extraction rules to find a set of candidate seeds
4. Select best seeds by average PMI score
5. Use seeds to train discriminators and select the best discriminators
6. Use discriminators to re-rank candidate seeds, select new seeds
7. Use new seeds to retrain discriminators, …
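Steps 3–7 as a schematic Python sketch; the callables are hypothetical stand-ins for the PMI and discriminator machinery, and the parameter values come from the next slide:

    def bootstrap(candidates, avg_pmi, train_and_select, discriminator_prob,
                  n_seeds=20, iterations=2):
        # Step 4: rank candidate seeds by average PMI, keep the best
        seeds = sorted(candidates, key=avg_pmi, reverse=True)[:n_seeds]
        best = None
        for _ in range(iterations):
            # Step 5: train discriminators on current seeds, keep the best 5
            best = train_and_select(seeds)
            # Steps 6-7: re-rank candidates with the trained discriminators,
            # select new seeds, and repeat
            seeds = sorted(candidates,
                           key=lambda c: discriminator_prob(c, best),
                           reverse=True)[:n_seeds]
        return seeds, best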

Bootstrap Parameters
Select candidate seeds with minimum support:
– Over 1,000 hit counts for the instance
– Otherwise unreliable PMI scores
Parameter settings:
– 100 candidate seeds
– Pick best 20 as seeds
– Iteration 1: rank candidate seeds by average PMI
– Iteration 2: use trained discriminators to rank candidate seeds
– Select best 5 discriminators after training, favoring the best ratio of P(PMI > thresh | class) to P(PMI > thresh | not class), with a slight preference for higher thresholds
Produced seeds without errors in all classes tested.

Discriminator Phrases from Class Labels
From the class labels “country” and “nation”:
  country X        nation X
  countries X      nations X
  X country        X nation
  X countries      X nations
Equivalent to weak extraction rules:
– no syntactic analysis in search engine queries
– ignores punctuation between terms in a phrase
PMI counts how often the weak rule fires on the entire Web:
– low hit count for random errors
– higher hit count for true positives
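Generating the eight phrases above is mechanical. A minimal sketch (plural forms are supplied explicitly rather than derived):

    def discriminator_phrases(labels=(("country", "countries"),
                                      ("nation", "nations"))):
        # Each label form yields one prefix and one suffix discriminator.
        phrases = []
        for singular, plural in labels:
            for form in (singular, plural):
                phrases += [f"{form} X", f"X {form}"]
        return phrases

    # ['country X', 'X country', 'countries X', 'X countries',
    #  'nation X', 'X nation', 'nations X', 'X nations']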

Discriminator Phrases from Rule Keywords
From extraction rules for instanceOf(Country):
  countries such as X      nations such as X
  such countries as X      such nations as X
  countries including X    nations including X
  countries especially X   nations especially X
  X and other countries    X and other nations
  X or other countries     X or other nations
  X is a country           X is a nation
  X is the country         X is the nation
Higher precision but lower coverage than discriminators from class labels.

Using PMI to Compute Probability
Standard formula for the Naïve Bayes probability update:
– useful as a ranking function
– probabilities skewed towards 0.0 and 1.0
Probability that fact φ is correct, given features f_1, …, f_n:

    P(φ | f_1, …, f_n) = P(φ) Π_i P(f_i | φ) /
                         [ P(φ) Π_i P(f_i | φ) + P(¬φ) Π_i P(f_i | ¬φ) ]

Need to turn PMI scores into features.
Need to estimate the conditional probabilities P(f_i | φ) and P(f_i | ¬φ).
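The update can be computed directly. A minimal sketch with made-up feature probabilities, illustrating how quickly the estimate saturates toward 1.0:

    from math import prod

    def naive_bayes_prob(p_class, p_feats_given_class, p_feats_given_not):
        # P(phi | f1..fn) under the Naive Bayes independence assumption
        num = p_class * prod(p_feats_given_class)
        den = num + (1 - p_class) * prod(p_feats_given_not)
        return num / den

    # Five discriminators, all firing above threshold (illustrative values):
    print(naive_bayes_prob(0.5, [0.8] * 5, [0.1] * 5))  # ~0.99997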

Features from PMI: Method #1
Thresholded PMI scores:
– Learn a PMI threshold from training
– Learn conditional probabilities for PMI > threshold, given that φ is in the target class, or not:
  P(PMI > thresh | class)        P(PMI <= thresh | class)
  P(PMI > thresh | not class)    P(PMI <= thresh | not class)
Small training set. Train each discriminator separately.
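A sketch of Method #1’s training step. The slide does not say how the threshold is learned, so the midpoint-of-the-gap rule and the add-one smoothing below are assumptions:

    def train_discriminator(pmi_pos, pmi_neg):
        # Place the threshold in the gap between negative and positive scores.
        threshold = (min(pmi_pos) + max(pmi_neg)) / 2
        above_pos = sum(p > threshold for p in pmi_pos)
        above_neg = sum(n > threshold for n in pmi_neg)
        # Four conditional probabilities, smoothed for the small training set.
        return {
            "threshold": threshold,
            "P(PMI>t | class)":   (above_pos + 1) / (len(pmi_pos) + 2),
            "P(PMI<=t | class)":  (len(pmi_pos) - above_pos + 1) / (len(pmi_pos) + 2),
            "P(PMI>t | ~class)":  (above_neg + 1) / (len(pmi_neg) + 2),
            "P(PMI<=t | ~class)": (len(pmi_neg) - above_neg + 1) / (len(pmi_neg) + 2),
        }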

One Threshold or Two?
Wide gap between positive and negative training scores, often two orders of magnitude.
With two thresholds, learn the conditional probabilities:
  P(PMI > threshA | class)       P(PMI > threshA | not class)
  P(PMI < threshB | class)       P(PMI < threshB | not class)
  P(PMI between A,B | class)     P(PMI between A,B | not class)
[Figure: a PMI score axis starting at 0, with threshB below threshA; the single-threshold model uses threshA alone.]

Results for 1 or 2 thresholds
One threshold is better at the high-precision end of the curve.
Two thresholds are better at the high-recall end of the curve.

Features from PMI: Method #2
Continuous Probability Density Function (PDF):
– Gaussian probability model for positive PMI scores
– Gaussian model for negative training
– Probability from Naïve Bayes is determined by the ratio of P(PMI | class) to P(PMI | not class)

    P(PMI | class) = (1 / sqrt(2πσ²)) exp( −(PMI − μ)² / (2σ²) )

where μ and σ² are estimated from the N positive training examples for the discriminator.
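Method #2 as a sketch: maximum-likelihood Gaussian fits to the positive and negative seed PMI scores, combined as the likelihood ratio Naïve Bayes needs:

    from math import exp, pi, sqrt

    def gaussian_pdf(x, scores):
        # Fit a Gaussian to training PMI scores, then evaluate it at x.
        n = len(scores)
        mu = sum(scores) / n
        var = max(sum((s - mu) ** 2 for s in scores) / n, 1e-12)  # guard var=0
        return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

    def likelihood_ratio(pmi, pos_scores, neg_scores):
        # Ratio of P(PMI | class) to P(PMI | not class)
        return gaussian_pdf(pmi, pos_scores) / gaussian_pdf(pmi, neg_scores)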

PDF vs. Single Threshold
Continuous PDF not as good as thresholded PMI.
Poor performance at both ends of the precision-recall curve:
– Hard to smooth with the small training size
– Overfits to a few high-PMI negative and low-PMI positive training examples
– Need to learn the ratio of P(PMI | class) to P(PMI | not class)

Effect of Noisy Seeds
Bootstrapping must produce correct seeds:
– 10% noise: some drop in performance
– 30% noise: badly degraded performance

Source of Negative Seeds
Use seeds from other classes as negative training:
– Standard method for bootstrapping
Better performance than negative training from hand-tagged extraction errors.

Open Question #1
Sparse data (even with the entire Web):
– PMI thresholds are typically small (1/10,000)
– False negatives for instances with low hit count
City of Duvall:
– 312,000 Web hits
– Under threshold on 4 out of 5 discriminators
City of Mossul:
– 9,020 Web hits
– Under threshold on all 5 discriminators
– PMI = 0.0 for 3 discriminators

Open Question #2
Polysemy:
– Low PMI if an instance has multiple word senses
– False negative if the target concept is not the dominant word sense
“Amazon” as an instance of River:
– Most references are to the company, not the river
“Shaft” as an instance of Film:
– 2,000,000 Web hits for the term “shaft”
– Only a tiny fraction are about the movie

(Former) Open Question
Time bottleneck:
– Search engine “courtesy wait” limits queries per day
– Each instance requires several queries for PMI
– Hit-count caching helps with repeated experiments
Solved with KnowItNow:
– BE folds the extraction rules into the search index
– Urns computes probabilities without hit counts
– Speedup of 2 or 3 orders of magnitude

(Former) Open Question
How to compute realistic probabilities:
– Naïve Bayes formula gives skewed probabilities, close to 1.0 or close to 0.0
– Useful for ranking but not as probability estimates
Urns gives probabilities 15 times more accurate than PMI.