1
Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping
Ellen Riloff and Rosie Jones, AAAI 99
Presented by: Sunandan Chakraborty
2
Information Extraction
Extracting domain-specific information from natural language (NL) text
Example domains
  Locations
  Companies
  Terrorism
3
Required Lexical Resources
Semantic lexicon
  Dictionary of words tagged with semantic categories
  e.g. names of locations (countries, cities)
Extraction patterns
  e.g. "outlets in", "from"
  From noun phrase: "outlets in New York"
4
Mutual Bootstrapping
No annotated corpus
Learning extraction patterns and a semantic lexicon
Input
  Unannotated corpus
  Seed words
5
Mutual Bootstrapping
Start from the seed words
Identify NPs related to the seed words [for extraction patterns]
Use the extraction patterns to identify new terms
  New terms should be in the same lexical category
Use the new terms to search for more patterns
6
Algorithm
Input
  Candidate extraction patterns from AutoSlog
  Seed words
Data structures
  EPdata: stores the candidate extraction patterns
    Initial value: the extraction patterns from AutoSlog and their extractions
  SemLex: stores the semantic lexicon entries as they are identified
    Initial value: the seed words
  Cat_EPlist: stores the extraction patterns selected so far
    Initial value: null
7
Algorithm (contd...)
1. For every extraction pattern $P_i$ in EPdata, compute
   $\mathrm{score}(P_i) = R_i \cdot \log_2(F_i)$
   $F_i$ = number of lexicon entries produced by $P_i$
   $N_i$ = number of NPs extracted by $P_i$
   $R_i = F_i / N_i$
2. Insert the $P_i$ with the maximum $\mathrm{score}(P_i)$ into Cat_EPlist
3. Insert $P_i$'s extractions into SemLex
4. Repeat from step 1
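The scoring and selection steps on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the `max_patterns` stopping rule, the treatment of patterns with no lexicon hits, and the toy data are all assumptions made for the example.

```python
import math

def score(extractions, sem_lex):
    """score(P_i) = R_i * log2(F_i), with R_i = F_i / N_i."""
    n = len(extractions)               # N_i: NPs extracted by the pattern
    f = len(extractions & sem_lex)     # F_i: extractions already in SemLex
    if f == 0:
        # Assumption: log2(0) is undefined, so such patterns rank last.
        return float("-inf")
    return (f / n) * math.log2(f)

def mutual_bootstrap(ep_data, seed_words, max_patterns=10):
    """Repeat steps 1-4 until max_patterns are chosen or none qualify."""
    sem_lex = set(seed_words)          # SemLex starts as the seed words
    candidates = dict(ep_data)         # pattern -> set of extracted NPs
    cat_ep_list = []                   # Cat_EPlist starts empty
    while candidates and len(cat_ep_list) < max_patterns:
        best = max(candidates, key=lambda p: score(candidates[p], sem_lex))
        if score(candidates[best], sem_lex) == float("-inf"):
            break                      # no remaining pattern extracts a known entry
        cat_ep_list.append(best)
        sem_lex |= candidates.pop(best)  # add all of its extractions to SemLex
    return cat_ep_list, sem_lex

# Hypothetical toy data, not taken from the paper's corpus.
ep_data = {
    "offices in <x>": {"nicaragua", "colombia", "tokyo"},
    "gripped <x>": {"colombia"},
    "shot in <x>": {"head", "city", "tokyo"},
}
patterns, lexicon = mutual_bootstrap(ep_data, {"nicaragua", "colombia"})
print(patterns)
print(sorted(lexicon))
```

Because every extraction of the chosen pattern goes into SemLex, a single loose pattern can pull in unrelated words; in the toy run, "shot in <x>" is eventually selected and adds "head" to a location lexicon.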
8
Results (Locations)
Pattern              Seed word          Extracted
Headquartered in x   Nicaragua          San Miguel, Chapare region
Gripped x            Colombia           none
To occupy x          Nicaragua, town    small country, this northern region, San Sebastian neighbourhood…
Shot in x            city, Soyapango    Jauja, central square, head, clash central mountain region…
9
Multi-level Bootstrapping
Problem with mutual bootstrapping
  Inserting an incorrect word into SemLex can drastically reduce accuracy
Solution
  A second level of bootstrapping
10
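Meta-bootstrapping
Outer level of bootstrapping
Retains the best 5 NPs
  The corresponding lexicon entries are added to a permanent list
Reliability score: $\mathrm{rel}(NP_i) = \sum_{k=1}^{N_i} \bigl(1 + \mathrm{score}(p_k)\bigr)$
  Here $N_i$ is the number of patterns $p_k$ that extracted $NP_i$
The reliable lexicon entries are used for the next iteration of mutual bootstrapping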
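The outer-level filter can be sketched as follows. The sketch uses the reliability formula exactly as written on this slide and reads the upper limit of the sum as the number of patterns that extracted the NP, which is an interpretation. The function names, argument layout, and toy scores are hypothetical.

```python
def reliability(noun_phrase, extracted_by, pattern_score):
    """rel(NP_i): sum of (1 + score(p_k)) over the patterns that extracted NP_i."""
    return sum(1 + pattern_score[p] for p in extracted_by[noun_phrase])

def keep_best(new_nps, extracted_by, pattern_score, k=5):
    """Return the k most reliable NPs, highest reliability first."""
    ranked = sorted(
        new_nps,
        key=lambda np_: reliability(np_, extracted_by, pattern_score),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy data: which patterns extracted each new NP, and their scores.
extracted_by = {
    "tokyo": ["offices in <x>", "operates in <x>"],
    "head": ["shot in <x>"],
}
pattern_score = {"offices in <x>": 0.5, "operates in <x>": 1.0, "shot in <x>": 0.0}

print(keep_best(["head", "tokyo"], extracted_by, pattern_score, k=1))
```

An NP extracted by several patterns accumulates a larger sum, so it outranks an NP that only one pattern produced.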
11
Results
Web location     Web company      Terrorism weapon
Offices in       Owned by         Exploded
Facilities in    Employed         Threw
Operates in      Trust company    Quantity of
Expanded into    Sold to          Hurled
12
Evaluation
Corpus
  4160 corporate web pages
  1500 terrorism texts
AutoSlog candidate extraction patterns
  19,690 for the web pages
  14,064 for the terrorism texts
Seed words
  Web company: Co., Company, Corp…
  Web location: different country names
  Terrorism location: Bolivia, city, Colombia, district
13
Evaluation (contd…)
50 iterations of meta-bootstrapping
Mutual bootstrapping ran until it produced 10 unique patterns
14
Evaluation (contd…)
After the 50th iteration
Web company           95/206    (46%)
Web location          191/250   (76%)
Web title             107/231   (46%)
Terrorism location    158/250   (63%)
Terrorism weapon      124/244   (51%)
Other systems' accuracy (weapon)
  17% (Riloff & Shepherd, 1997)
  36% (Roark & Charniak, 1998)
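As a quick arithmetic check, the percentages on this slide can be recomputed from the fractions. The sketch assumes each fraction is the number of correct lexicon entries over the total number of entries learned for that category; the helper name is hypothetical.

```python
def accuracy_pct(correct, total):
    """Accuracy as a whole-number percentage."""
    return round(100 * correct / total)

# Fractions copied from the slide.
results = {
    "Web company": (95, 206),
    "Web location": (191, 250),
    "Web title": (107, 231),
    "Terrorism location": (158, 250),
    "Terrorism weapon": (124, 244),
}

for category, (correct, total) in results.items():
    print(f"{category}: {correct}/{total} = {accuracy_pct(correct, total)}%")
```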
15
Evaluation (contd…)
Tested on 233 new web pages
Recall/Precision (%)    Baseline    Lexicon    Union
Web company             10/32       18/47      18/45
Web location            11/98       51/77      54/74
Web title               6/100       46/66      47/62