Information Extraction MAS.S60 Catherine Havasi Rob Speer.


1 Information Extraction MAS.S60 Catherine Havasi Rob Speer

2 Wikipedia as a corpus
3.9 million English articles, 284 languages
2 billion words (the Brown corpus has 1 million)
DBpedia and Freebase

3 Text reveals relations
“Various explanations of the overabundance of carbon, oxygen, nitrogen, and other elements have been proposed.”
“These were performed in town halls and other large buildings...”
“The splendid artistic legacy of Angkor Wat and other Khmer monuments...”
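
All three quotes share the "X and other Y" construction, which suggests that X is a kind of Y. The sketch below shows one way to pull such pairs out of part-of-speech-tagged text. It is only an illustration: the helper names (`parse_tagged`, `and_other_pairs`) are made up here, the Penn Treebank tags in the example were assigned by hand rather than by a tagger, and treating a run of noun tags as a noun phrase is a crude stand-in for real chunking.

```python
def parse_tagged(text):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def and_other_pairs(tagged):
    """Return (item, category) pairs for every 'X, Y, and other Z' match."""
    pairs = []
    for i in range(len(tagged) - 2):
        if tagged[i][0].lower() != "and" or tagged[i + 1][0].lower() != "other":
            continue

        # Category: the run of adjectives and nouns right after "and other".
        k = i + 2
        category_words = []
        while k < len(tagged) and tagged[k][1].startswith(("JJ", "NN")):
            category_words.append(tagged[k][0])
            k += 1
        if not category_words:
            continue
        category = " ".join(category_words)

        # Items: walk left over noun runs separated by commas.
        items = []
        j = i - 1
        while j >= 0:
            if tagged[j][1] == ",":
                j -= 1
            run = []
            while j >= 0 and tagged[j][1].startswith("NN"):
                run.insert(0, tagged[j][0])
                j -= 1
            if not run:
                break
            items.insert(0, " ".join(run))
            if j < 0 or tagged[j][1] != ",":
                break

        pairs.extend((item, category) for item in items)
    return pairs


# Hand-tagged version of the first quote (tags are an assumption).
elements = parse_tagged(
    "Various/JJ explanations/NNS of/IN the/DT overabundance/NN of/IN "
    "carbon/NN ,/, oxygen/NN ,/, nitrogen/NN ,/, and/CC other/JJ "
    "elements/NNS have/VBP been/VBN proposed/VBN ./."
)
print(and_other_pairs(elements))
```

The left-hand walk stops at the first token that is neither a noun nor a comma, so "overabundance of carbon" contributes only "carbon". That is the kind of choice a hand-written pattern has to make.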

4 NACLO puzzle Would it be plausible to describe something as “danty but sloshful”?

5 Possible patterns
both X and Y
X but not Y
use NP to VP
[Un]fortunately, VP
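
The first two patterns can be tried with nothing more than regular expressions. This is a minimal sketch, assuming X and Y are single words; the pattern labels and the `find_pairs` helper are invented for this example, and the NP and VP patterns are left out because they need a tagger or chunker.

```python
import re

# Each label maps to a regex with two single-word slots (X and Y).
PATTERNS = {
    "both_and": re.compile(r"\bboth (\w+) and (\w+)", re.IGNORECASE),
    "but_not": re.compile(r"\b(\w+) but not (\w+)", re.IGNORECASE),
}


def find_pairs(text):
    """Return (label, x, y) for every pattern match, in pattern order."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(1).lower(), match.group(2).lower()))
    return found


print(find_pairs("The room was both clean and bright, cheap but not nasty."))
```

Counting how often two words share a "both X and Y" frame versus an "X but not Y" frame is one possible way to approach questions like the one in the NACLO puzzle.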

6 Constraints using named entities

7 Constraints using named entities and parts of speech

8 TextRunner
Starts out with some seed patterns
Label: uses those patterns to label possible extractions in a sentence
Learn: trains a graphical model on the labeled extractions
Extract: uses the learned patterns to extract from a sentence
Problem: 200,000 – 300,000 labeled training points needed

9 ReVerb
Syntactic constraint: requires the extraction to match syntactic patterns
Lexical constraint: phrases must have many different arguments in the corpus
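
The two constraints can be sketched as two small filters. This is not ReVerb's actual implementation: the coarse tag mapping, the `V+(W*P)?` shape (a verb, optionally followed by other words and ending in a preposition or particle), and the `min_pairs` threshold are simplifying assumptions made for illustration, and the sample extractions are invented.

```python
import re
from collections import defaultdict


def coarse(tag):
    """Collapse a Penn Treebank tag into V (verb), P (prep/particle), W (other word), or O."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "RP", "TO"):
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"
    return "O"


# Assumed relation-phrase shape: verb(s), then optionally words ending in a preposition.
RELATION_SHAPE = re.compile(r"V+(W*P)?")


def matches_syntactic_constraint(tags):
    """True if the tag sequence of a candidate relation phrase fits the shape."""
    return RELATION_SHAPE.fullmatch("".join(coarse(t) for t in tags)) is not None


def lexically_supported(extractions, min_pairs):
    """Keep relation phrases seen with at least min_pairs distinct argument pairs."""
    arguments = defaultdict(set)
    for arg1, relation, arg2 in extractions:
        arguments[relation].add((arg1, arg2))
    return {rel for rel, pairs in arguments.items() if len(pairs) >= min_pairs}


# Invented extractions, for illustration only.
sample = [
    ("Neubronner", "was born in", "Kronberg"),
    ("Mozart", "was born in", "Salzburg"),
    ("the plan", "is offering only modest targets at", "the summit"),
]
print(matches_syntactic_constraint(["VBD", "VBN", "IN"]))  # "was born in"
print(lexically_supported(sample, min_pairs=2))
```

The lexical filter is what removes overly specific phrases: a long phrase may pass the syntactic check yet appear with only one argument pair in the whole corpus.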

10 Accuracy of IE
Incoherent extractions make up 15-30% of extracted knowledge bits
Uninformative extractions make up 3-7%

11 Tom Mitchell (NELL) Unsupervised learning machine

12 Categories on Wikipedia (Dan Weld)

13 How Kylin Works

14 Word senses on Wikipedia

15 Named entities on Wikipedia? [[Pigeon photography]] is an [[aerial photography]] technique invented in 1907 by the German apothecary [[Julius Neubronner]]...
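
The double-bracket links in the wikitext above already mark candidate entities. A minimal sketch of pulling them out with a regular expression follows; the `wiki_links` helper is a name invented here, and the pattern only handles plain `[[Target]]` and piped `[[Target|label]]` links, not templates, files, or nested markup.

```python
import re

# [[Target]] or [[Target|displayed text]]; no nested brackets.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def wiki_links(wikitext):
    """Return (target, displayed_text) for each link; text defaults to the target."""
    links = []
    for match in WIKI_LINK.finditer(wikitext):
        target = match.group(1).strip()
        text = (match.group(2) or target).strip()
        links.append((target, text))
    return links


sample_text = (
    "[[Pigeon photography]] is an [[aerial photography]] technique invented "
    "in 1907 by the German apothecary [[Julius Neubronner]]"
)
print(wiki_links(sample_text))
```

Not every link target is a named entity ("aerial photography" is a common noun phrase), which is presumably why the slide title ends in a question mark; a capitalization check or a tagger would be a natural next filter.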

16 Downloading Wikipedia and other Wikimedia projects
A 2200-article sample is available on the class web site

17 Lab
Find an information pattern besides the ones we’ve listed
Run it over the Wikipedia front page corpus
Does it need a tagger? A named entity extractor?

18 Assignment
Choose and refine an information extractor
Hand-tag some examples
Add a classifier for good vs. bad matches
You are allowed to work in groups
Sharing code is fine, but one writeup per person
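
One possible shape for the classifier step is sketched below: a small perceptron trained on hand-tagged matches. Everything specific here is an assumption made for illustration, not part of the assignment: the three features, the four labeled examples, and the choice of a perceptron over any other classifier.

```python
from collections import defaultdict


def features(match):
    """Hypothetical features of an (item, category) match."""
    item, category = match
    return {
        "bias": 1.0,
        "item_word_count": float(len(item.split())),
        "category_ends_in_s": float(category.endswith("s")),
    }


def score(weights, match):
    """Weighted sum of the match's features."""
    return sum(weights.get(name, 0.0) * value for name, value in features(match).items())


def train_perceptron(examples, epochs=50):
    """Standard perceptron updates; labels are +1 (good match) or -1 (bad match)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for match, label in examples:
            if label * score(weights, match) <= 0:  # misclassified or on the boundary
                for name, value in features(match).items():
                    weights[name] += label * value
    return weights


def predict(weights, match):
    """Return +1 for a match judged good, -1 otherwise."""
    return 1 if score(weights, match) > 0 else -1


# Invented hand-tagged examples: +1 = good match, -1 = bad match.
hand_tagged = [
    (("carbon", "elements"), 1),
    (("town halls", "large buildings"), 1),
    (("explanations of the overabundance of carbon", "elements"), -1),
    (("oxygen", "have"), -1),
]
weights = train_perceptron(hand_tagged)
print([predict(weights, match) for match, _ in hand_tagged])
```

With real data, the hand-tagged examples would be split into training and held-out sets so the writeup can report accuracy on matches the classifier has not seen.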

