1
Sparse Information Extraction: Unsupervised Language Models to the Rescue
Doug Downey, Stef Schoenmackers, Oren Etzioni
Turing Center, University of Washington
2
Answering Questions on the Web
Q: Who has won a best actor Oscar for playing a villain?
Q: Which nanotechnology companies are hiring?
Q: What's the general consensus on the IBM T40?
Q: What kills bacteria? …
No single Web page contains the answer.
3
Open Information Extraction
1. Compile time:
 – Parse every sentence on the Web
 – Extract key information
2. Query time:
 – Synthesize extractions in response to queries
Challenges:
 – Topics of interest not known in advance
 – No hand-tagged examples
4
TextRunner [Banko et al 2007]
At compile time:
 …and when Thomas Edison invented the light bulb around the early 1900s…
 …end of the 19th century when Thomas Edison and Joseph Swan invented a light bulb using carbon fiber…
 => Invented(Thomas Edison, light bulb)
5
TextRunner [Banko et al 2007]: querying "invented" in real time.
Live demo at: www.cs.washington.edu/research/textrunner
6
Problem: Sparse Extractions
Frequent extractions tend to be correct, e.g., (Thomas Edison, light bulb).
Sparse extractions are a mixture of correct, e.g., (A. Church, lambda calculus), and incorrect, e.g., (drug companies, diseases).
7
Assessing Sparse Extractions
Task: identify which sparse extractions are correct.
Challenge: no hand-tagged examples.
Strategy:
1. Build a model of how common extractions occur in text
2. Rank sparse extractions by fit to the model
The distributional hypothesis: elements of the same relation tend to appear in similar contexts. [Brin, 1998; Riloff & Jones, 1999; Agichtein & Gravano, 2000; Etzioni et al., 2005; Pasca et al., 2006; Pantel et al., 2006]
Our contribution: unsupervised language models
– Methods for mitigating sparsity
– Precomputed: scalable to Open IE
8
The REALM Architecture: RElation Assessment using Language Models
Input: a set of extractions for relation R, E_R = {(arg1_1, arg2_1), …, (arg1_M, arg2_M)}
1) Seeds: S_R = the s most frequent pairs in E_R (assume these are correct)
2) Output: a ranking of the pairs (arg1, arg2) ∈ E_R - S_R by distributional similarity to the seed pairs (seed1, seed2) ∈ S_R
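To make the pipeline concrete, here is a minimal sketch of the seed-selection-and-ranking loop. It assumes extractions arrive as (arg1, arg2) tuples, one per corpus occurrence, and that a `similarity` function (a stand-in for the distributional-similarity measures described next) is supplied; averaging over seeds is one simple aggregation choice, not necessarily the authors' exact scheme.

```python
from collections import Counter

def realm_rank(extractions, similarity, num_seeds=20):
    """Rank sparse extractions for one relation R.

    extractions: list of (arg1, arg2) tuples, one entry per corpus occurrence.
    similarity:  function((arg1, arg2), (seed1, seed2)) -> float; higher = more similar.
    """
    # Seeds S_R = the most frequent argument pairs; assume these are correct.
    freq = Counter(extractions)
    seeds = [pair for pair, _ in freq.most_common(num_seeds)]

    # Rank every remaining pair by its average similarity to the seed pairs.
    candidates = [p for p in freq if p not in seeds]
    scored = [(p, sum(similarity(p, s) for s in seeds) / len(seeds))
              for p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```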
9
Distributional Similarity
Naïve approach: find sentences containing seed1 & seed2 or arg1 & arg2:
  w_b … w_h seed1 w_{h+2} … w_i seed2 w_{i+2} … w_e
  w_b … w_h arg1 w_{h+2} … w_i arg2 w_{i+2} … w_e
Compare the context distributions:
  P(w_b, …, w_e | seed1, seed2) vs. P(w_b, …, w_e | arg1, arg2)
But e - b can be large: many parameters and sparse data => inaccuracy.
10
N-gram Language Models
Compute phrase probabilities over n words: P(w_i, …, w_{i+n-1})
E.g.: P("cities such as Cleveland") > P("cities such as Intel")
Obtained by counting over a corpus.
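A toy illustration of where such phrase probabilities come from (the tiny corpus and the unsmoothed maximum-likelihood estimates are illustrative only; REALM's counts come from a large Web corpus):

```python
from collections import Counter

def ngram_probs(tokens, n=4):
    """Unsmoothed maximum-likelihood n-gram probabilities from raw counts."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

corpus = "cities such as Cleveland and cities such as Seattle".split()
p = ngram_probs(corpus, n=4)
# P("cities such as Cleveland") > P("cities such as Intel"), which is 0 here.
print(p.get(("cities", "such", "as", "Cleveland"), 0.0),
      p.get(("cities", "such", "as", "Intel"), 0.0))
```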
11
Distributional Similarity in REALM
Two steps for assessing R(arg1, arg2):
1. Typechecking
 – Ensure arg1 and arg2 are of the proper type for R (catches, e.g., MayorOf(Intel, Santa Clara))
 – Leverages all occurrences of each argument
2. Relation Assessment
 – Ensure R actually holds between arg1 and arg2 (catches, e.g., MayorOf(Giuliani, Seattle))
Both steps use pre-computed language models => scales to Open IE.
12
The REALM Architecture
Two steps for assessing R(arg1, arg2):
1. Typechecking
 – Ensure arg1 and arg2 are of the proper type for R (catches, e.g., MayorOf(Intel, Santa Clara))
 – Leverages all occurrences of each argument
2. Relation Assessment
 – Ensure R actually holds between arg1 and arg2 (catches, e.g., MayorOf(Giuliani, Seattle))
Both steps use pre-computed language models => scales to Open IE.
13
Typechecking and HMM-T
Task: for each extraction (arg1, arg2) ∈ E_R, determine whether arg1 and arg2 are of the proper type for R.
Solution: assume the seeds seed_j ∈ S_R are of the proper type, and rank each arg_j by distributional similarity to the corresponding seed_j.
Computing distributional similarity:
1) Offline, train a Hidden Markov Model (HMM) of the corpus
2) At query time, measure the distance between arg_j and seed_j in the HMM's N-dimensional latent state space
14
HMM Language Model
Offline training: learn P(w | t) and P(t_i | t_{i-1}, …, t_{i-k}) to maximize the probability of the corpus (using EM).
k = 1 case (hidden states t over observed words w):
  t_i -> t_{i+1} -> t_{i+2} -> t_{i+3}
  w_i    w_{i+1}    w_{i+2}    w_{i+3}
  cities such       as         Seattle
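As a rough sketch of how such a model yields per-word summaries, the snippet below derives P(t | w) from placeholder HMM parameters via Bayes' rule; the real system would obtain its emission probabilities and state marginals from EM training over the corpus (the random matrices here are stand-ins, not trained values).

```python
import numpy as np

T, V = 20, 5000                            # latent states, vocabulary size (illustrative)
rng = np.random.default_rng(0)
emit = rng.dirichlet(np.ones(V), size=T)   # stand-in for P(w | t) learned by EM, shape (T, V)
state_pr = np.full(T, 1.0 / T)             # stand-in for the marginal state distribution P(t)

def state_distribution(word_id):
    """P(t | w) is proportional to P(w | t) * P(t), normalized over the T latent states."""
    joint = emit[:, word_id] * state_pr
    return joint / joint.sum()
```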
15
HMM-T
The trained HMM gives a "distributional summary" of each word w: the N-dimensional state distribution P(t | w).
Typecheck each argument by comparing state distributions: rank extractions in ascending order of f(arg), a distance between P(t | arg) and the seeds' state distributions, summed over the extraction's arguments.
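A minimal sketch of that ranking step, assuming L1 distance to the nearest seed as the comparison (the slide does not pin down the exact form of f, so this particular distance is an illustrative choice):

```python
import numpy as np

def l1(p, q):
    """L1 distance between two state distributions."""
    return float(np.abs(p - q).sum())

def typecheck_score(arg_dists, seed_dists):
    """f summed over an extraction's arguments; lower = better type match.

    arg_dists:  P(t | arg) vectors, one per argument position of the extraction.
    seed_dists: for each argument position, the list of P(t | seed) vectors.
    """
    return sum(min(l1(a, s) for s in seeds)
               for a, seeds in zip(arg_dists, seed_dists))
```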
16
Previous n-gram technique (1)
1) Form a context vector for each extracted argument, e.g. for Chicago:
  … cities such as Chicago, Boston, …
  … But Chicago isn't the best …
  … cities such as Chicago, Boston, Los Angeles and Chicago. …
  Each occurrence contributes a context ("such as X, Boston", "But X isn't the", "Angeles and X.", …), and the counts form the vector.
2) Compute dot products between extractions and seeds in this space [cf. Ravichandran et al. 2005].
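For contrast, a small sketch of this baseline: build a bag of blanked-out contexts for each argument and compare arguments with a sparse dot product (window size and tokenization are arbitrary choices here, not the original system's settings).

```python
from collections import Counter

def context_vector(tokens, target, window=3):
    """Count the contexts of `target`: its surrounding windows with the target blanked out."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tuple(tokens[max(0, i - window):i])
            right = tuple(tokens[i + 1:i + 1 + window])
            vec[(left, right)] += 1
    return vec

def dot(u, v):
    """Sparse dot product over the contexts both vectors share."""
    return sum(u[k] * v[k] for k in u.keys() & v.keys())
```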
17
Previous n-gram technique (2)
Context vectors (counts over contexts such as "when he visited X", "he visited X and", "visited X and other", "X and other cities", …):
  Miami: … 7 12 5 15 13 …
  Twisp: … 0 0 0 1 …
Problems:
– Vectors are large
– Intersections are sparse
18
Compressing Context Vectors
Replace Miami's raw context vector (… 7 12 5 15 13 …) with its latent state distribution P(t | Miami), e.g. [0.14, 0.01, …, 0.06] over states t = 1, 2, …, N.
The latent state distribution P(t | w) is:
– Compact (efficient: 10-50x less data retrieved)
– Dense (accurate: 23-46% error reduction)
19
Example: N-Grams on Sparse Data
Is Pickerington of the same type as Chicago?
Corpus snippets: "Chicago, Illinois"   "Pickerington, Ohio"
Context counts:
  Chicago: "X, Illinois" = 2910, "X, Ohio" = 0, …
  Pickerington: "X, Ohio" = 1, "X, Illinois" = 0, …
=> N-grams say no: the dot product is 0!
20
Example: HMM-T on Sparse Data
The HMM generalizes: "Illinois" and "Ohio" are emitted from similar latent states, so "Chicago, Illinois" and "Pickerington, Ohio" give Chicago and Pickerington similar state distributions.
21
HMM-T Limitations
Learning iterations take time proportional to corpus size × T^(k+1), where T = number of latent states and k = HMM order.
We use the limited values T = 20, k = 3 (so T^(k+1) = 160,000):
– Sufficient for typechecking (Santa Clara is a city)
– Too coarse for relation assessment (Santa Clara is where Intel is headquartered)
22
The REALM Architecture
Two steps for assessing R(arg1, arg2):
1. Typechecking
 – Ensure arg1 and arg2 are of the proper type for R (catches, e.g., MayorOf(Intel, Santa Clara))
 – Leverages all occurrences of each argument
2. Relation Assessment
 – Ensure R actually holds between arg1 and arg2 (catches, e.g., MayorOf(Giuliani, Seattle))
Both steps use pre-computed language models => scales to Open IE.
23
Relation Assessment
Typechecking isn't enough:
  "NY Mayor Giuliani toured downtown Seattle."
Want: how do the arguments behave in relation to each other?
24
REL-GRAMS (1)
Standard n-gram language model: P(w_i, w_{i-1}, …, w_{i-k})
arg1 and arg2 are often far apart => k must be large (inaccurate).
25
REL-GRAMS (2)
Relational Language Model (REL-GRAMS): for any two arguments e_1, e_2,
  P(w_i, w_{i-1}, …, w_{i-k} | w_i = e_1, e_1 near e_2)
k can be small; REL-GRAMS still captures entity relationships.
– Mitigate sparsity with the BM25 metric (from IR)
– Combine with HMM-T by multiplying ranks
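The slide states only that the two rankings are multiplied; below is a minimal sketch of that combination step (the worst-case rank assigned to an extraction missing from one list is an assumption of this sketch, not something the slide specifies).

```python
def combine_by_rank_product(hmmt_ranking, relgrams_ranking):
    """Merge two orderings of the same extractions by multiplying their rank positions.

    Each input is a list of extractions, best first; lower rank products come out first.
    """
    r1 = {e: i + 1 for i, e in enumerate(hmmt_ranking)}
    r2 = {e: i + 1 for i, e in enumerate(relgrams_ranking)}
    worst = max(len(r1), len(r2)) + 1   # assumed penalty for an unranked extraction
    return sorted(r1, key=lambda e: r1[e] * r2.get(e, worst))
```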
26
Experiments
Task: re-rank sparse TextRunner extractions for Conquered, Founded, Headquartered, Merged.
REALM vs.:
– TextRunner (TR): frequency ordering (equivalent to PMI [Etzioni et al., 2005] and Urns [Downey et al., 2005])
– Pattern Learning (PL): based on Snowball [Agichtein 2000]
– HMM-T and REL-GRAMS in isolation
27
Results
Metric: area under the precision-recall curve.
REALM reduces the missing area by 39% over the nearest competitor.
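For reference, the metric can be computed as below with scikit-learn; the "missing area" figure is read here as the reduction in (1 - AUC) relative to the best baseline (the helper names and inputs are illustrative, not the paper's evaluation code).

```python
from sklearn.metrics import precision_recall_curve, auc

def pr_auc(labels, scores):
    """Area under the precision-recall curve for binary labels and ranking scores."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    return auc(recall, precision)

def missing_area_reduction(auc_system, auc_baseline):
    """Fraction of the baseline's missing area (1 - AUC) removed by the system."""
    return ((1 - auc_baseline) - (1 - auc_system)) / (1 - auc_baseline)
```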
28
Conclusions
– Sparse extractions are common, even on the Web
– Language models can assess sparse extractions: accurate and scalable
Future work: other language modeling techniques
29
Web Fact-Finding
Who has won three or more Academy Awards?
30
Web Fact-Finding
Problems: the user has to pick the right words, often a tedious process:
  "world foosball champion in 1998" – 0 hits
  "world foosball champion" 1998 – 2 hits, no answer
What if I could just ask for P(x) in "x was world foosball champion in 1998"?
How far can language modeling and the distributional hypothesis take us?
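One hedged sketch of that idea: rank candidate fillers for x by the probability a phrase-probability model assigns to the filled-in template (`lm_prob` is a stand-in for whatever language model is available, e.g. the n-gram counts sketched earlier).

```python
def rank_fillers(template, candidates, lm_prob):
    """Rank candidate values of x for a template such as
    'x was world foosball champion in 1998' by the model probability
    of the filled-in sentence. Assumes the placeholder is the literal
    token 'x' at the start of the template (a simplification)."""
    scored = [(x, lm_prob(template.replace("x", x, 1))) for x in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```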
31
Thanks!
32
KnowItAll Hypothesis vs. Distributional Hypothesis
[Table: context counts for Miami, Twisp, and Star Wars over the contexts "X soundtrack", "he visited X and", "cities such as X", "X and other cities", "X lodging"]
34
Improving TextRunner
TextRunner results for "invent" (real-time demo, ranked by frequency).
REALM improves the precision of the top 20 extractions by an average of 90%.
35
Improving TextRunner: Example (1) – "headquartered", top 10

TextRunner (TR), precision 40%:
company, Palo Alto
held company, Santa Cruz
storage hardware and software, Hopkinton
Northwestern Mutual, Tacoma
1997, New York City
Google, Mountain View
PBS, Alexandria
Linux provider, Raleigh
Red Hat, Raleigh
TI, Dallas

REALM, precision 100%:
Tarantella, Santa Cruz
International Business Machines Corporation, Armonk
Mirapoint, Sunnyvale
ALD, Sunnyvale
PBS, Alexandria
General Dynamics, Falls Church
Jupitermedia Corporation, Darien
Allegro, Worcester
Trolltech, Oslo
Corbis, Seattle
36
Improving TextRunner: Example (2) – "conquered", top 10

TextRunner (TR), precision 60%:
Great, Egypt
conquistador, Mexico
Normans, England
Arabs, North Africa
Great, Persia
Romans, part
Romans, Greeks
Rome, Greece
Napoleon, Egypt
Visigoths, Suevi Kingdom

REALM, precision 90%:
Arabs, Rhodes
Arabs, Istanbul
Assyrians, Mesopotamia
Great, Egypt
Assyrians, Kassites
Arabs, Samarkand
Manchus, Outer Mongolia
Vandals, North Africa
Arabs, Persia
Moors, Lagos