Entity Mention Detection using a Combination of Redundancy-Driven Classifiers
Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli


Entity Mention Detection using a Combination of Redundancy-Driven Classifiers
Silvana Marianela Bernaola Biggio, Manuela Speranza, Roberto Zanoli
bernaola, manspera,
Fondazione Bruno Kessler – Irst, Trento, Italy
The present work is supported by the LiveMemories Project
May, 2010

Outline
Entity Mention Detection: an extension of the NER task.
The system to be presented:
- Mention levels: NAM, NOM, PRO
- Entity types: GPE, LOC, ORG, PER
- Draws on 2 systems (ACE 2008, EVALITA 2009)
- 2 new features to recognize mentions
- Applied in LiveMemories and Italian Wikipedia
- Available as a web service, to be integrated into TextPro

Mentions: Named Entities
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.
4 mentions of type NAM (proper name): 2 PER, 1 ORG, 1 GPE

Mentions: Nominals
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.
3 nominal mentions (NOM): 3 PER

Mentions: Pronominals
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.
2 pronoun mentions (PRO): 2 PER

Nested Mentions
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters.
One-level mentions: Hugo Chavez; Venezuelan
Two-level mention: Venezuelan President
Three-level mention: Venezuelan President Hugo Chavez
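The nesting levels listed on this slide can be made concrete with token spans. The following is a minimal sketch, assuming a mention is stored as a pair of token offsets and that a mention's level is one more than the deepest mention it contains. The `Mention` class and the `nesting_level` function are illustrative names, not part of the presented system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    text: str


def contains(outer: Mention, inner: Mention) -> bool:
    """True if inner lies inside outer and the two spans are not identical."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return not same_span and outer.start <= inner.start and inner.end <= outer.end


def nesting_level(mention: Mention, mentions: list) -> int:
    """Level 1 for a mention with nothing inside it, otherwise 1 + deepest inner level."""
    inner = [nesting_level(m, mentions) for m in mentions if contains(mention, m)]
    return 1 + max(inner, default=0)


# Token offsets over: Venezuelan(0) President(1) Hugo(2) Chavez(3)
mentions = [
    Mention(0, 1, "Venezuelan"),
    Mention(2, 4, "Hugo Chavez"),
    Mention(0, 2, "Venezuelan President"),
    Mention(0, 4, "Venezuelan President Hugo Chavez"),
]

levels = {m.text: nesting_level(m, mentions) for m in mentions}
for text, level in levels.items():
    print(level, text)
```

With these four spans the computed levels agree with the slide: two one-level mentions, one two-level mention, and one three-level mention.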

Entities
Venezuelan President Hugo Chavez on Saturday called for Internet regulations. He demanded that authorities crack down on a news Web site he accused of spreading false information. "The Internet cannot be something open where anything is said and done." President said, according to reports by Reuters. Hugo Rafael Chávez Frías (28 July 1954) is the President of Venezuela.
6 different mentions refer to 1 entity of type PER

The idea …
Exploiting a large corpus to improve the detection of mentions:
-Patterns
-Data redundancy
“ … Italia … ” “ … Rossi … ” “ … Benetton … ”

Pattern Extraction
[After annotating the large corpus]
1. Candidates: the context words around each mention.
word n-5  word n-4  word n-3  word n-2  word n-1  word n (MENTION)  word n+1  word n+2  word n+3  word n+4  word n+5
2. TF – IDF (Term Frequency – Inverse Document Frequency):
Pattern Frequency: the more frequently a pattern occurs with mentions of a specific category, the more important it is for that category.
Inverse Category Frequency: the more categories a pattern occurs with, the smaller its contribution to characterizing the semantics of any one category it co-occurs with.
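The slide names the two factors but does not give a formula. The sketch below assumes the usual TF-IDF form, score = pattern frequency × log(number of categories / number of categories the pattern occurs with). The function name `pf_icf` and the toy (pattern, category) pairs are invented for illustration and are not taken from the system or its corpus.

```python
import math
from collections import Counter, defaultdict

CATEGORIES = ["PER", "ORG", "GPE", "LOC"]


def pf_icf(observations, categories):
    """Score each (pattern, category) pair.

    observations: iterable of (pattern, category) pairs, one per time a
    context pattern was seen next to a mention of that category.
    Assumed weighting (not given on the slide): pf * log(C / cf).
    """
    pf = Counter(observations)  # pattern frequency per category
    cats_per_pattern = defaultdict(set)
    for pattern, category in pf:
        cats_per_pattern[pattern].add(category)

    scores = {}
    for (pattern, category), freq in pf.items():
        icf = math.log(len(categories) / len(cats_per_pattern[pattern]))
        scores[(pattern, category)] = freq * icf
    return scores


# Toy data, invented for illustration only.
observations = [
    ("sindaco di", "GPE"), ("sindaco di", "GPE"), ("sindaco di", "GPE"),
    ("presidente della", "ORG"), ("presidente della", "ORG"),
    ("di", "GPE"), ("di", "ORG"), ("di", "PER"), ("di", "LOC"),
]

scores = pf_icf(observations, CATEGORIES)
for (pattern, category), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{pattern!r:22} {category}  {score:.3f}")
```

Under this weighting, a pattern seen with every category (here "di") scores zero, while a frequent pattern tied to a single category scores highest. That is the behaviour the two definitions on the slide describe.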

Data Redundancy
1. “... La giunta Coni sostiene la candidatura di Torino per le Olimpiadi giovanili” (in English: the CONI board supports Torino's candidacy for the Youth Olympics)
A GPE or an ORG (soccer team)?
2. Prob(“Torino”/type=“GPE”)?
Use a classifier to recognize all mentions in a large corpus in order to obtain the probability distribution for all mentions across all possible types.
Mention = “Torino” (types: PER, ORG, GPE, LOC)
Tag          Occurrences
B-GPE_NAM    11823
B-ORG_NAM     2950
B-LOC_NAM       33
B-PER_NAM        5
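The counts for “Torino” can be turned into the probability distribution that the slide asks for. This is a minimal sketch, assuming the distribution is the plain relative frequency of each type. The slide does not say whether the system applies smoothing or any other normalization, and `type_distribution` is an illustrative name.

```python
# Tag counts for the mention "Torino", as reported on the slide.
counts = {"GPE": 11823, "ORG": 2950, "LOC": 33, "PER": 5}


def type_distribution(counts):
    """Relative frequency of each entity type (assumes at least one occurrence)."""
    total = sum(counts.values())
    return {entity_type: n / total for entity_type, n in counts.items()}


dist = type_distribution(counts)
for entity_type, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"P(type={entity_type} | mention='Torino') = {p:.4f}")
```

Under this assumption roughly four fifths of the probability mass falls on GPE and one fifth on ORG, which gives the type classifier a strong prior for an ambiguous context such as the sentence above.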

System Architecture
- Identifies the syntactic head of a mention and its mention level. For the extension of a mention, we use the Malt Parser for Italian (Lavelli et al. 2009).
- Recognizes the type of a mention.

[Figure: System Architecture, steps 1 to 6]

Evaluation and Feature Analysis
1. EVALITA 2009 EMD Task: value = 65.7%
2. Feature Analysis (FB1):

Class      All features   NOT redundancy   NOT pattern
General    79.58%         74.09%           79.28%
NAM_GPE    83.65%         78.37%           82.83%
NAM_LOC    73.02%         77.52%           73.02%
NAM_ORG    73.92%         66.81%           72.94%
NAM_PER    91.63%         88.86%           92.03%
NOM_GPE    75.86%         55.38%           75.18%
NOM_LOC    62.37%         55.10%           59.18%
NOM_ORG    71.46%         64.03%           70.41%
NOM_PER    86.32%         78.29%           86.08%
PRO_GPE    30.77%         14.29%           24.00%
PRO_ORG    29.17%         27.59%           30.56%
PRO_PER    69.58%         68.43%           69.97%
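The contribution of each feature group can be read off the table as the difference between the "All features" column and the column where that group is removed. The sketch below does that subtraction on the reported FB1 values. The dictionary layout and the name `ablation_deltas` are illustrative choices.

```python
# FB1 (%) per class: (all features, without redundancy, without patterns),
# copied from the feature-analysis table.
fb1 = {
    "General": (79.58, 74.09, 79.28),
    "NAM_GPE": (83.65, 78.37, 82.83),
    "NAM_LOC": (73.02, 77.52, 73.02),
    "NAM_ORG": (73.92, 66.81, 72.94),
    "NAM_PER": (91.63, 88.86, 92.03),
    "NOM_GPE": (75.86, 55.38, 75.18),
    "NOM_LOC": (62.37, 55.10, 59.18),
    "NOM_ORG": (71.46, 64.03, 70.41),
    "NOM_PER": (86.32, 78.29, 86.08),
    "PRO_GPE": (30.77, 14.29, 24.00),
    "PRO_ORG": (29.17, 27.59, 30.56),
    "PRO_PER": (69.58, 68.43, 69.97),
}


def ablation_deltas(table):
    """FB1 points gained by each feature group (positive means the feature helps)."""
    return {
        cls: (round(all_f - no_red, 2), round(all_f - no_pat, 2))
        for cls, (all_f, no_red, no_pat) in table.items()
    }


deltas = ablation_deltas(fb1)
print(f"{'Class':<9}{'redundancy':>12}{'pattern':>10}")
for cls, (d_red, d_pat) in deltas.items():
    print(f"{cls:<9}{d_red:>+12.2f}{d_pat:>+10.2f}")
```

Negative values mark the classes where removing a feature group did not hurt, for example NAM_LOC without redundancy and NAM_PER without patterns.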

Applications …
1. LiveMemories Project: identifying mentions in 2 Italian corpora:
A. Articles from the local newspaper “L’Adige”
B. Blogs posted by students living in the university residence of “San Bartolomeo”

Applications …
2. Semantic Wikipedia for Italian (SWiiT), annotated at 5 levels:
A. Basic NLP processing
B. Entity Mentions
C. Entity Subtypes (work in progress)
D. Entity Co-reference (work in progress)
E. Dependency parsing (work in progress)

System available as …
1. A web service:
- Uses Axis (open source, XML based web service framework)
- Allows the user to submit a document and have it annotated with entity mentions using the IOB format
2. Part of TextPro (work in progress)
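For readers unfamiliar with the IOB format, the sketch below shows how mention spans map to per-token tags. It assumes labels of the form TYPE_LEVEL (as in the B-GPE_NAM tags on the Data Redundancy slide), one token per line, and non-overlapping spans. The web service's actual output layout is not shown in the slides, and `to_iob` is a hypothetical helper, not part of the service.

```python
def to_iob(tokens, spans):
    """Tag tokens with B-/I-/O labels.

    spans: list of (start, end, label) with end exclusive; spans are
    assumed not to overlap, so nested mentions are out of scope here.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the mention
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the mention
    return list(zip(tokens, tags))


tokens = ["Hugo", "Chavez", "called", "for", "Internet", "regulations", "."]
spans = [(0, 2, "PER_NAM")]

rows = to_iob(tokens, spans)
for token, tag in rows:
    print(f"{token}\t{tag}")
```

A single tag column like this cannot represent nested mentions, so a real service would need either one column per nesting level or a rule for choosing which mention to emit.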

Conclusions and future work
1. Pronominal mentions are difficult to recognize; coreference is needed.
2. Data redundancy improves the general FB1 by around 5 points (79.58% vs. 74.09%), and by around 20 points for nominal mentions that refer to geopolitical entities (NOM_GPE: 75.86% vs. 55.38%).
3. The results for patterns were not what we expected, probably because the patterns selected for each class were not the appropriate ones. As future work we would like to find out how to select the right patterns for each class.

References
Bartalesi Lenzi, V., Sprugnoli, R. (2009). EVALITA 2009: Description and Results of the Local Entity Detection and Recognition (LEDR) task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Bernaola Biggio, S.M., Zanoli, R., Giuliano, C., Uryupina, O., Versley, Y., Poesio, M. (2009). Local Entity Detection and Recognition Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Bernaola Biggio, S.M., Speranza, M., Zanoli, R. (2010). Entity Mention Detection Using a Combination of Redundancy-Driven Classifiers. In Proceedings of LREC 2010, 7th Conference on Language Resources and Evaluation, Valletta, Malta.
Lavelli, A., Hall, J., Nilsson, J., Nivre, J. (2009). MaltParser at the EVALITA 2009 Dependency Parsing Task. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.
Magnini, B., Cappelli, A., Pianta, E., Speranza, M., Bartalesi Lenzi, V., Sprugnoli, R., Romano, L., Girardi, C., Negri, M. (2006). Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In Proceedings of SILFI, Florence, Italy.
Speranza, M. (2009). The Named Entity Recognition Task at EVALITA. In Proceedings of Evalita 2009, workshop held at AI*IA, 12 December 2009, Reggio Emilia, Italy.