Open IE and Universal Schema Discovery Heng Ji Acknowledgement: some slides from Daniel Weld and Dan Roth
Traditional, Supervised I.E. Raw Data Labeled Training Data Learning Algorithm Extractor Kirkland-based Microsoft is the largest software company. Boeing moved its headquarters to Chicago in Hank Levy was named chair of Computer Science & Engr. … HeadquarterOf(, )
Solutions Open IE Universal Schema Discovery Concept Typing
Open Information Extraction The challenge of Web extraction is to be able to do Open Information Extraction. Unbounded number of relations Web corpus contains billions of documents.
How open IE systems work Learn a general model of how relations are expressed (in a particular language), based on unlexicalized features such as part-of-speech tags (e.g., identifying a verb). Learn domain-independent regular expressions (e.g., over punctuation and commas).
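As a rough illustration of an unlexicalized pattern, the sketch below runs a regular expression over part-of-speech tags rather than over words. The tokens are hand-tagged, and the one-letter tag encoding and the pattern itself are assumptions made for this example; this is not the model any particular Open IE system uses.

```python
import re

# Hand-tagged tokens (word, Penn Treebank POS tag); a real system would run a tagger.
tagged = [("Boeing", "NNP"), ("moved", "VBD"), ("its", "PRP$"),
          ("headquarters", "NNS"), ("to", "TO"), ("Chicago", "NNP")]

def tag_string(tokens):
    """Encode each tag as one letter so a plain regex can scan the tag sequence."""
    def code(tag):
        if tag.startswith("NNP"):
            return "E"      # proper noun: candidate argument
        if tag.startswith("VB"):
            return "V"      # verb
        if tag in ("IN", "TO"):
            return "P"      # preposition
        return "W"          # anything else
    return "".join(code(t) for _, t in tokens)

# Hypothetical pattern: argument, verb, optional filler words, optional preposition, argument.
PATTERN = re.compile(r"(E+)(VW*P?)(E+)")

def extract(tokens):
    """Return (arg1, relation phrase, arg2) triples matched by the tag pattern."""
    codes = tag_string(tokens)
    triples = []
    for m in PATTERN.finditer(codes):
        def words(i):
            return " ".join(w for w, _ in tokens[m.start(i):m.end(i)])
        triples.append((words(1), words(2), words(3)))
    return triples

triples = extract(tagged)
```

Because the pattern never mentions a specific word, the same expression applies to any relation phrase with this tag shape, which is the point of unlexicalized features.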
Methods for Open IE Self Supervision Kylin (Wikipedia) Shrinkage & Retraining Hearst Patterns PMI Validation Subclass Extraction Pattern Learning Structural Extraction List Extraction & WebTables TextRunner
Kylin: Self-Supervised Information Extraction from Wikipedia [Wu & Weld CIKM 2007] Its county seat is Clearfield. As of 2005, the population density was 28.2/km². Clearfield County was created in 1804 from parts of Huntingdon and Lycoming Counties but was administered as part of Centre County until ,972 km² (1,147 mi²) of it is land and 17 km² (7 mi²) of it (0.56%) is water. From infoboxes to a training set
Kylin Architecture
Long-Tail: Incomplete Articles Desired Information Missing from Wikipedia 800,000/1,800,000(44.2%) stub pages [July 2007 of Wikipedia ] Length ID
Schema Mapping Heuristics Edit History String Similarity Experiments Precision: 94% Recall: 87% Future Integrated Joint Inference Person Performer birth_date birth_place name other_names … birthdate location name othername …
Main Lesson: Self Supervision Find structured data source Use heuristics to generate training data E.g. Infobox attributes & matching sentences
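A minimal sketch of the heuristic, assuming the simplest possible matching rule: a sentence becomes a training example for an attribute when it contains that attribute's value. The attribute names `county_seat` and `founded` are hypothetical, and the matching heuristics Kylin actually uses are not described on the slide.

```python
def label_sentences(infobox, sentences):
    """Label a sentence with an attribute when it contains that attribute's value."""
    examples = []
    for sentence in sentences:
        for attribute, value in infobox.items():
            if value.lower() in sentence.lower():
                examples.append((sentence, attribute, value))
    return examples

# Hypothetical infobox for the Clearfield County article.
infobox = {"county_seat": "Clearfield", "founded": "1804"}
sentences = [
    "Its county seat is Clearfield.",
    "Clearfield County was created in 1804 from parts of Huntingdon and Lycoming Counties.",
    "As of 2005, the population density was 28.2/km².",
]
examples = label_sentences(infobox, sentences)
```

Note that the second sentence is also labeled `county_seat`, because "Clearfield" occurs inside "Clearfield County". Automatically generated training data is noisy in exactly this way.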
The KnowItAll System Predicates Country(X) Domain-independent Rule Templates “such as” NP Bootstrapping Extraction Rules “countries such as” NP Discriminators “country X” Extractor World Wide Web Extractions Country(“France”) Assessor Validated Extractions Country(“France”), prob=0.999
Unary predicates: instances of a class Unary predicates: instanceOf(City), instanceOf(Film), instanceOf(Company), … Good recall and precision from generic patterns: “such as” X X “and other” Instantiated rules: “cities such as” X X “and other cities” “films such as” X X “and other films” “companies such as” X X “and other companies”
Recall – Precision Tradeoff High precision rules apply to only a small percentage of sentences on Web hits for “X” “cities such as X” “X and other cities” Boston 365,000,000 15,600,000 12,000 Tukwila 1,300,000 73, Gjatsk Hadaslav “Redundancy-based extraction” ignores all but the unambiguous references.
Limited Recall with Binary Rules Relatively high recall for unary rules: “companies such as” X 2,800,000 Web hits X “and other companies” 500,000 Web hits Low recall for binary rules: X “is the CEO of Microsoft” 160 Web hits X “is the CEO of Wal-mart” 19 Web hits X “is the CEO of Continental Grain” 0 Web hits X “, CEO of Microsoft” 6,700 Web hits X “, CEO of Wal-mart” 700 Web hits X “, CEO of Continental Grain” 2 Web hits
Examples of Extraction Errors Rule: countries such as X => instanceOf(Country, X) “We have 31 offices in 15 countries such as London and France.” =>instanceOf(Country, London) instanceOf(Country, France) Rule: X and other cities => instanceOf(City, X) “A comparative breakdown of the cost of living in Klamath County and other cities follows.” =>instanceOf(City, Klamath County)
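The two rules above can be sketched as regular expressions to see how the errors arise. The noun-phrase matcher here is a crude assumption (a run of capitalized words), not the chunker a real extractor would use.

```python
import re

NP = r"[A-Z][\w-]*(?: [A-Z][\w-]*)*"   # crude noun phrase: a run of capitalized words

def such_as(class_plural, text):
    """Apply the rule '<class> such as X' and return the candidate instances X."""
    m = re.search(rf"{class_plural} such as ({NP}(?:(?:, | and ){NP})*)", text)
    return re.split(r", | and ", m.group(1)) if m else []

def and_other(class_plural, text):
    """Apply the rule 'X and other <class>' and return the candidate instances X."""
    return re.findall(rf"({NP}) and other {class_plural}", text)

countries = such_as(
    "countries", "We have 31 offices in 15 countries such as London and France.")
cities = and_other(
    "cities",
    "A comparative breakdown of the cost of living in Klamath County and other cities follows.")
```

Here `countries` is `['London', 'France']` and `cities` is `['Klamath County']`: the patterns fire correctly on the surface form, so the errors come from the sentences themselves and have to be caught by a later validation step.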
“Generate and Test” Paradigm 1. Find extractions from generic rules 2. Validate each extraction Assign probability that extraction is correct Use search engine hit counts to compute PMI PMI (pointwise mutual information) between the extraction and “discriminator” phrases for the target concept PMI-IR: P. D. Turney, “Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL”. In Proceedings of ECML, 2001.
Computing PMI Scores Measures mutual information between the extraction and target concept. I = an instance of a target concept instanceOf(Country, “France”) D = a discriminator phrase for the concept “ambassador to X” D+I = insert instance into discriminator phrase “ambassador to France”
Example of PMI Discriminator: “countries such as X” Instance: “France” vs. “London” PMI for France >> PMI for London (2 orders of mag.) Need features for probability update that distinguish “high” PMI from “low” PMI for a discriminator “countries such as France” : 27,800 hits “France”: 14,300,000 hits “countries such as London” : 71 hits “London”: 12,600,000 hits
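The slide does not spell out the formula, so the sketch below assumes the hit-count ratio hits(D+I) / hits(I) as the PMI score and plugs in the counts from this example.

```python
def pmi(hits_discriminator_with_instance, hits_instance):
    """Assumed PMI score: hits(D+I) / hits(I), both taken from search engine counts."""
    return hits_discriminator_with_instance / hits_instance

france = pmi(27_800, 14_300_000)   # "countries such as France" / "France"
london = pmi(71, 12_600_000)       # "countries such as London" / "London"
ratio = france / london            # how much higher France scores than London
```

Under this assumption the score for France is a few hundred times the score for London, consistent with the "2 orders of magnitude" noted above.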
Chicago Unmasked City sense Movie sense
Impact of Unmasking on PMI Name Recessive Original Unmask Boost Washington city % Casablanca city % Chevy Chase actor % Chicago movie %
How to Increase Recall? RL: learn class-specific patterns. “Headquartered in ” SE: recursively extract subclasses. “Scientists such as physicists and chemists” LE: extract lists of items (~ Google Sets).
List Extraction (LE) 1. Query engine with known items. 2. Learn a wrapper for each result page. 3. Collect a large number of lists. 4. Sort items by number of list “votes”. LE+A = sort list according to Assessor. Evaluation: Web recall at precision = 0.9.
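Step 4 can be sketched as a simple vote count, assuming each extracted list casts one vote for every distinct item it contains. The example lists are made up for illustration.

```python
from collections import Counter

def rank_by_votes(lists):
    """Count how many extracted lists contain each item and sort by that vote count."""
    votes = Counter()
    for items in lists:
        votes.update(set(items))   # one vote per list, even if an item repeats in it
    # Highest vote count first; ties broken alphabetically for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical lists collected from wrapped result pages.
lists = [
    ["Boston", "Tukwila", "Chicago"],
    ["Boston", "Chicago"],
    ["Boston", "Home", "Contact"],
]
ranking = rank_by_votes(lists)
```

Items that appear on many independent lists rise to the top, while page residue such as "Home" or "Contact" tends to collect few votes.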
TextRunner
Works in two phases. 1. Using a conditional random field, the extractor learns to assign labels to each of the words in a sentence. 2. Extracts one or more textual triples that aim to capture (some of) the relationships in each sentence.
Information Redundancy Assumption (Banko et al., 2008)
Performance Comparison on Traditional IE vs. Open IE
General Remarks Open IE exploits data-driven methods (e.g., domain-independent patterns) to extract relation or event tuples. It dramatically enhanced the scalability of IE. It obtains substantially lower recall than traditional IE. It relies heavily on information redundancy to validate the extraction results and thus suffers from the “long-tail” knowledge sparsity problem. It is incapable of generalizing the lexical contexts in order to name new fact types. It over-simplifies the IE problem: IE != sentence-level relation extraction; how about entity and event types? These systems focused on making the types of relations and events unrestricted, while still using other IE components such as name tagging and semantic role labeling for pre-defined types.
Universal Schema Discover a wide range of domain-independent fact types: manually (AMR, extended NE), or automatically by relation clustering based on coreferential arguments. Go_off is represented by “plant”, “set_off”, “injure” because they share coreferential arguments. Combining patterns with external knowledge sources such as Freebase or query logs. Some name tagging work focused on extracting more fine-grained types beyond the traditional entity types (person, organization and geo-political entities).
Open Questions Is there a closed set of universal schemas? How many knowledge sources should we use for discovering universal schemas? What are they? Many knowledge bases are available; how can we unify them (automatically)?
Concept Typing Chunking Bootstrapping Clustering with Contexts
Concept Mention Extraction (Wang et al., 2013)
Concept Mention Extraction (Tsai et al., 2013) Identify and categorize mentions of concepts (Gupta and Manning, 2011): TECHNIQUE and APPLICATION. “We apply support vector machines on text classification.” Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999). The proposed algorithm: 1. Extract noun phrases (Punyakanok and Roth, 2001). 2. For each category, initialize a decision list by seeds. 3. For several rounds: (a) annotate NPs using the decision lists; (b) extract top features from newly annotated phrases and add them to the decision lists.
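The loop in steps 2 and 3 can be sketched as follows. This is a simplified illustration, not the published algorithm: features are ranked by raw frequency rather than by a confidence score, noun phrases are assumed to be extracted already, and the context features (`L:apply` for "the word to the left is apply") and the extra example phrases are hypothetical.

```python
from collections import Counter

def bootstrap(phrases, seeds, rounds=2, top_k=1):
    """phrases: list of (noun_phrase, context_features); seeds: {category: seed phrases}."""
    decision = {cat: set(s) for cat, s in seeds.items()}   # decision lists start from seeds
    labels = {}
    for _ in range(rounds):
        # Step (a): annotate NPs whose text or context matches a decision-list entry.
        for phrase, context in phrases:
            for cat, features in decision.items():
                if phrase not in labels and (phrase in features or features & set(context)):
                    labels[phrase] = cat
        # Step (b): add the most frequent new context features of each category.
        for cat, features in decision.items():
            counts = Counter(f for phrase, ctx in phrases if labels.get(phrase) == cat
                             for f in ctx if f not in features)
            features.update(f for f, _ in counts.most_common(top_k))
    return labels

phrases = [
    ("support vector machines", ["L:apply"]),
    ("text classification", ["L:on"]),
    ("decision trees", ["L:apply"]),
    ("parsing", ["L:on"]),
]
seeds = {"TECHNIQUE": {"support vector machines"}, "APPLICATION": {"text classification"}}
labels = bootstrap(phrases, seeds)
```

In the first round only the seeds are labeled, but their contexts enter the decision lists, so the second round labels "decision trees" as TECHNIQUE and "parsing" as APPLICATION.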
Seeds
Citation-Context Based Concept Clustering (CitClus) Cluster mentions into semantically coherent concepts: 1. Group concept mentions by citation context. 2. Merge clusters based on lexical similarity between mentions in the clusters. [Figure: Papers 1 to 4 contain the mentions “support vector machine”, “c4.5”, “svm-based classification”, “decision trees”, “svm” and “maximal margin classifiers”, next to the citations (Cortes, 1995), (Quinlan, 1993) and (Vapnik, 1995).]
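A minimal sketch of the two CitClus steps. The pairing of each mention with a citation below is an assumption made for illustration (the figure's layout did not survive), and the token-overlap similarity is a stand-in for whatever lexical similarity measure the method actually uses.

```python
from collections import defaultdict

def citclus(mentions, similar):
    """mentions: list of (mention, citation or None). Returns a list of clusters (sets)."""
    # Step 1: group mentions that share a citation context.
    by_citation = defaultdict(set)
    clusters = []
    for mention, citation in mentions:
        if citation is None:
            clusters.append({mention})
        else:
            by_citation[citation].add(mention)
    clusters.extend(by_citation.values())
    # Step 2: repeatedly merge any two clusters holding lexically similar mentions.
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(similar(a, b) for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

def share_token(a, b):
    """Toy lexical similarity (an assumption): the two mentions share a token."""
    def tokens(s):
        return set(s.replace("-", " ").replace("_", " ").split())
    return bool(tokens(a) & tokens(b))

# Assumed mention-to-citation pairing.
mentions = [
    ("support vector machine", "Cortes,1995"),
    ("c4.5", "Quinlan,1993"),
    ("svm-based classification", "Vapnik,1995"),
    ("decision trees", "Quinlan,1993"),
    ("svm", "Cortes,1995"),
    ("maximal margin classifiers", "Vapnik,1995"),
]
clusters = citclus(mentions, share_token)
```

With this input, the citation step yields three groups, and the token shared by "svm" and "svm-based classification" merges the Cortes and Vapnik groups, leaving one SVM cluster and one decision-tree cluster.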
Remarks Where to draw the line for different granularities? Maybe the type of each phrase should be a path in the ontology structure? Is an absolute cold-start extraction possible? How to use other resources such as scenario models and social cognitive theories?
Paper presentations
Lifu’s presentation + Phrase Typing Exercise