Published by Mervyn Bishop. Modified over 9 years ago.
1
Entity Mention Detection using a Combination of Redundancy-Driven Classifiers
Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli
{bernaola, manspera, zanoli}@fbk.eu
Fondazione Bruno Kessler – Irst, Trento, Italy
The present work is supported by the LiveMemories Project.
May 2010
2
Outline
- Entity Mention Detection: an extension of the NER task.
- The system to be presented:
  - Mention levels: NAM, NOM, PRO
  - Entity types: GPE, LOC, ORG, PER
  - Draws on 2 earlier systems (ACE 2008, EVALITA 2009)
  - 2 new features to recognize mentions
  - Applied to the LiveMemories project and to the Italian Wikipedia
  - Available as a web service, to be integrated into TextPro
3
Mentions: Named Entities
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done," the President said, according to reports by Reuters. Hugo Rafael Chávez Frías (born 28 July 1954) is the President of Venezuela.
4 mentions of type NAM (proper name): 2 PER, 1 ORG, 1 GPE
4
Mentions: Nominals
(Same example text as slide 3.)
3 nominal mentions (NOM): 3 PER
5
Mentions: Pronominals
(Same example text as slide 3.)
2 pronominal mentions (PRO): 2 PER
6
Nested Mentions
(Same example text as slide 3, without its last sentence.)
One-level mentions: Hugo Chavez; Venezuelan
Two-level mention: Venezuelan President
Three-level mention: Venezuelan President Hugo Chavez
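The slide does not define how levels are counted. One reading that reproduces its example is that a mention's level is one more than the highest level among the mentions nested inside it. A minimal Python sketch under that assumption, with mentions given as half-open token spans (the function name `nesting_levels` is hypothetical):

```python
Span = tuple[int, int]  # half-open token span [start, end)


def nesting_levels(spans: list[Span]) -> dict[Span, int]:
    """Assumed definition: level 1 for a mention that contains no other mention,
    otherwise 1 + the highest level among the mentions it strictly contains."""

    def contains(outer: Span, inner: Span) -> bool:
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    levels: dict[Span, int] = {}
    # A strictly contained span is always shorter, so processing spans by
    # increasing length resolves every inner mention before its container.
    for span in sorted(set(spans), key=lambda s: s[1] - s[0]):
        inner = [levels[s] for s in levels if contains(span, s)]
        levels[span] = 1 + max(inner, default=0)
    return levels


tokens = "Venezuelan President Hugo Chavez".split()
mentions = [(0, 1), (2, 4), (0, 2), (0, 4)]
for (start, end), level in sorted(nesting_levels(mentions).items(), key=lambda kv: kv[1]):
    print(level, " ".join(tokens[start:end]))
```

With the slide's example, "Venezuelan" and "Hugo Chavez" come out at level 1, "Venezuelan President" at level 2, and the full phrase at level 3.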
7
Entities
(Same example text as slide 3.)
6 different mentions refer to 1 entity of type PER.
8
The idea …
Exploit a large corpus to improve the detection of mentions, using:
- Patterns
- Data redundancy
“… Italia …”   “… Rossi …”   “… Benetton …”
9
Pattern Extraction
[After annotating the large corpus]
1. Candidates: the context window of 5 words on each side of a mention:
   word n-5  word n-4  word n-3  word n-2  word n-1  [word n = MENTION]  word n+1  word n+2  word n+3  word n+4  word n+5
2. TF-IDF (Term Frequency - Inverse Document Frequency), adapted as:
   - Pattern Frequency: the more frequently a pattern occurs with mentions that belong to a specific category, the more important it is for that category.
   - Inverse Category Frequency: the more categories a pattern occurs with, the smaller its contribution to characterizing the semantics of any category it co-occurs with.
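The slide gives neither the exact weighting formula nor a precise definition of a pattern. The sketch below assumes a pattern is a single context word at a given offset inside the ±5-word window, and scores a pattern p for a category c as tf(p, c) · log(|C| / cf(p)), where tf(p, c) counts occurrences of p with mentions of category c, |C| is the number of categories and cf(p) the number of categories p occurs with. This is one plausible TF-IDF analogue, not necessarily the authors' formula; `extract_patterns`, `tf_icf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

WINDOW = 5  # context words on each side of the mention, as on the slide


def extract_patterns(tokens, start, end, window=WINDOW):
    """Hypothetical pattern definition: every context word within `window`
    positions of the mention tokens[start:end], prefixed with its offset."""
    patterns = []
    for offset in range(1, window + 1):
        left, right = start - offset, end - 1 + offset
        if left >= 0:
            patterns.append(f"-{offset}:{tokens[left].lower()}")
        if right < len(tokens):
            patterns.append(f"+{offset}:{tokens[right].lower()}")
    return patterns


def tf_icf(annotated):
    """annotated: (tokens, start, end, category) tuples from the tagged corpus.
    Returns {category: {pattern: tf(p, c) * log(|C| / cf(p))}}."""
    tf = defaultdict(Counter)  # tf[c][p]: times p occurs with a mention of category c
    for tokens, start, end, category in annotated:
        tf[category].update(extract_patterns(tokens, start, end))
    cf = Counter()  # cf[p]: number of categories p occurs with
    for counts in tf.values():
        cf.update(counts.keys())
    n_categories = len(tf)
    return {
        c: {p: n * math.log(n_categories / cf[p]) for p, n in counts.items()}
        for c, counts in tf.items()
    }


corpus = [
    ("la candidatura di Torino".split(), 3, 4, "GPE"),
    ("il sindaco di Torino".split(), 3, 4, "GPE"),
    ("la squadra del Torino".split(), 3, 4, "ORG"),
]
scores = tf_icf(corpus)
print(sorted(scores["GPE"].items(), key=lambda kv: -kv[1])[:3])
```

A pattern that occurs with every category ("-3:la" in the toy corpus) gets a score of 0, matching the intuition that it says nothing about any single category.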
10
Data Redundancy
1. “... La giunta Coni sostiene la candidatura di Torino per le Olimpiadi giovanili 2010...” (“The CONI executive board supports Turin's bid for the 2010 Youth Olympics...”) Is “Torino” a GPE, or an ORG (the soccer team)?
2. P(type = “GPE” | mention = “Torino”)?
Use a classifier to recognize all mentions in a large corpus, in order to obtain the probability distribution of each mention across all possible types (PER, ORG, GPE, LOC).
Mention = “Torino”:
B-GPE_NAM: 11823
B-ORG_NAM: 2950
B-LOC_NAM: 33
B-PER_NAM: 5
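Turning the classifier's output over the large corpus into P(type | mention) amounts to counting tags per mention string and normalizing. A minimal Python sketch, using the counts reported for “Torino” (`count_tags` and `type_distribution` are hypothetical names):

```python
from collections import Counter, defaultdict


def count_tags(tagged_mentions):
    """tagged_mentions: (mention_text, tag) pairs obtained by running a mention
    classifier over a large corpus, e.g. ("Torino", "B-GPE_NAM")."""
    counts = defaultdict(Counter)
    for text, tag in tagged_mentions:
        counts[text][tag] += 1
    return counts


def type_distribution(tag_counts):
    """Normalize one mention's tag counts into P(tag | mention)."""
    total = sum(tag_counts.values())
    return {tag: n / total for tag, n in tag_counts.items()}


# Counts for "Torino" as reported on the slide.
torino = Counter({"B-GPE_NAM": 11823, "B-ORG_NAM": 2950, "B-LOC_NAM": 33, "B-PER_NAM": 5})
for tag, p in sorted(type_distribution(torino).items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {p:.4f}")
```

With these counts GPE receives roughly 0.80 of the probability mass and ORG roughly 0.20. How the distribution is then passed to the final classifier (for instance as one feature per type) is not specified on the slide.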
11
System Architecture
Diagram annotations:
- Identifies the syntactic head of a mention and its mention level.
- For the extent of a mention, we use MaltParser for Italian (Lavelli et al. 2009).
- Recognizes the type of a mention.
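The slide only says that MaltParser is used to find a mention's extent. One common approach, sketched here as an assumption rather than the authors' rule, is to take the span covered by the dependency subtree of the mention's head. The sketch assumes the heads have already been read from the parser's CoNLL output and converted to 0-based indices with -1 for the root; `subtree_extent` and the example parse are hypothetical.

```python
def subtree_extent(heads, head_index):
    """heads[i]: 0-based index of token i's syntactic head, -1 for the root.
    Returns the half-open span [start, end) from the leftmost to the rightmost
    token of the subtree rooted at head_index (exact only when the subtree is
    contiguous, e.g. in a projective tree)."""
    children = [[] for _ in heads]
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    covered, stack = [], [head_index]
    while stack:  # iterative depth-first traversal of the subtree
        node = stack.pop()
        covered.append(node)
        stack.extend(children[node])
    return min(covered), max(covered) + 1


# Hypothetical parse of "il presidente della Regione ha parlato"
tokens = ["il", "presidente", "della", "Regione", "ha", "parlato"]
heads = [1, 5, 1, 2, 5, -1]
start, end = subtree_extent(heads, 1)  # head of the mention: "presidente"
print(" ".join(tokens[start:end]))  # il presidente della Regione
```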
12-17
System Architecture (diagram shown in six build steps, 1-6)
18
Evaluation and Feature Analysis
1. EVALITA 2009 EMD task: value = 65.7%
2. Feature analysis (FB1 per class):

Class   | All features | Without redundancy | Without patterns
General | 79.58%       | 74.09%             | 79.28%
NAM_GPE | 83.65%       | 78.37%             | 82.83%
NAM_LOC | 73.02%       | 77.52%             | 73.02%
NAM_ORG | 73.92%       | 66.81%             | 72.94%
NAM_PER | 91.63%       | 88.86%             | 92.03%
NOM_GPE | 75.86%       | 55.38%             | 75.18%
NOM_LOC | 62.37%       | 55.10%             | 59.18%
NOM_ORG | 71.46%       | 64.03%             | 70.41%
NOM_PER | 86.32%       | 78.29%             | 86.08%
PRO_GPE | 30.77%       | 14.29%             | 24.00%
PRO_ORG | 29.17%       | 27.59%             | 30.56%
PRO_PER | 69.58%       | 68.43%             | 69.97%
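Read as an ablation, the table gives the effect of each feature as the FB1 difference between the full system and the system without that feature. A short Python sketch computing these differences, in percentage points, from the table's values (`ablation_gains` is a hypothetical name):

```python
# FB1 (%) per class from the table: (all features, without redundancy, without patterns)
FB1 = {
    "General": (79.58, 74.09, 79.28),
    "NAM_GPE": (83.65, 78.37, 82.83),
    "NAM_LOC": (73.02, 77.52, 73.02),
    "NAM_ORG": (73.92, 66.81, 72.94),
    "NAM_PER": (91.63, 88.86, 92.03),
    "NOM_GPE": (75.86, 55.38, 75.18),
    "NOM_LOC": (62.37, 55.10, 59.18),
    "NOM_ORG": (71.46, 64.03, 70.41),
    "NOM_PER": (86.32, 78.29, 86.08),
    "PRO_GPE": (30.77, 14.29, 24.00),
    "PRO_ORG": (29.17, 27.59, 30.56),
    "PRO_PER": (69.58, 68.43, 69.97),
}


def ablation_gains(table):
    """FB1 points lost when each feature is removed (positive = the feature helps)."""
    return {
        cls: (round(full - no_red, 2), round(full - no_pat, 2))
        for cls, (full, no_red, no_pat) in table.items()
    }


gains = ablation_gains(FB1)
for cls, (redundancy, patterns) in gains.items():
    print(f"{cls:8} redundancy {redundancy:+6.2f}  patterns {patterns:+6.2f}")
```

Removing redundancy costs about 5.5 points overall and about 20.5 points on NOM_GPE, while NAM_LOC actually scores higher without it; removing patterns changes the general FB1 by only 0.3 points.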
19
Applications …
1. LiveMemories Project: identifying mentions in 2 Italian corpora:
   A. Articles from the local newspaper “L’Adige”
   B. Blogs posted by students living in the “San Bartolomeo” university residence
20
Applications …
2. Semantic Wikipedia for Italian (SWiiT): http://textpro.fbk.eu/resources/SWiiT.html, annotated at 5 levels:
   A. Basic NLP processing
   B. Entity mentions
   C. Entity subtypes (work in progress)
   D. Entity coreference (work in progress)
   E. Dependency parsing (work in progress)
21
System available as …
1. A web service: http://textpro.fbk.eu/typhoon.html
   - Built with Axis (an open-source, XML-based web service framework)
   - Allows the user to submit a document and have it annotated with entity mentions in IOB format
2. Part of TextPro: http://textpro.fbk.eu (work in progress)
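In IOB format each token is tagged B-X (first token of a mention of type X), I-X (a later token of that mention) or O (outside any mention). The slides do not show the web service's exact output layout; the sketch below assumes one tag per token, with labels in the B-GPE_NAM style used on the data-redundancy slide, and decodes them into mention spans. `iob_to_spans` is a hypothetical helper, not part of the service.

```python
def iob_to_spans(tags):
    """Decode IOB tags (e.g. B-PER_NAM, I-PER_NAM, O) into (start, end, label)
    mention spans, end exclusive. An I- tag that does not continue a mention
    of the same label is treated as the start of a new mention."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final mention
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":
                start, label = i, tag[2:]
    return spans


tokens = ["according", "to", "Reuters", ",", "Hugo", "Chavez", "said"]
tags = ["O", "O", "B-ORG_NAM", "O", "B-PER_NAM", "I-PER_NAM", "O"]
for start, end, label in iob_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```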
22
Conclusions and future work
1. Pronominal mentions are difficult to recognize; coreference resolution is needed.
2. Data redundancy improves the general FB1 by about 5 points (79.58% vs. 74.09%), and by about 20 points for nominal mentions of geopolitical entities (NOM_GPE: 75.86% vs. 55.38%).
3. Patterns helped less than expected (general FB1 of 79.58% with them vs. 79.28% without), probably because the patterns selected for each class were not the appropriate ones. As future work, we would like to find out how to select the right patterns for each class.
23
References
Bartalesi Lenzi, V., Sprugnoli, R. (2009). EVALITA 2009: Description and Results of the Local Entity Detection and Recognition (LEDR) Task. In Proceedings of EVALITA 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Bernaola Biggio, S.M., Zanoli, R., Giuliano, C., Uryupina, O., Versley, Y., Poesio, M. (2009). Local Entity Detection and Recognition Task. In Proceedings of EVALITA 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Bernaola Biggio, S.M., Speranza, M., Zanoli, R. (2010). Entity Mention Detection Using a Combination of Redundancy-Driven Classifiers. In Proceedings of LREC 2010, 7th International Conference on Language Resources and Evaluation, Valletta, Malta.
Lavelli, A., Hall, J., Nilsson, J., Nivre, J. (2009). MaltParser at the EVALITA 2009 Dependency Parsing Task. In Proceedings of EVALITA 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Magnini, B., Cappelli, A., Pianta, E., Speranza, M., Bartalesi Lenzi, V., Sprugnoli, R., Romano, L., Girardi, C., Negri, M. (2006). Annotazione di contenuti concettuali in un corpus italiano: I-CAB [Annotation of conceptual content in an Italian corpus: I-CAB]. In Proceedings of SILFI 2006, Florence, Italy.
Speranza, M. (2009). The Named Entity Recognition Task at EVALITA 2009. In Proceedings of EVALITA 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.