Text Analytics in ITS 2.0: Annotation of Named Entities

Slides:



Advertisements
Similar presentations
Requirements gathering
Advertisements

The Application of Machine Translation in CADAL Huang Chen, Chen Haiying Zhejiang University Libraries, Hangzhou, China
How to Use a Translation Memory Prof. Reima Al-Jarf King Saud University, Riyadh, Saudi Arabia Homepage:
ANNIC ANNotations In Context GATE Training Course 27 – 28 April 2006 Niraj Aswani.
IN350 Document Management & Information Steering Introduction to Document Management. Class 1 August 25, 2003 Judith A. Molka-Danielsen
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
Annotating Documents for the Semantic Web Using Data-Extraction Ontologies Dissertation Proposal Yihong Ding.
XML on Semantic Web. Outline The Semantic Web Ontology XML Probabilistic DTD References.
Information Extraction from Documents for Automating Softwre Testing by Patricia Lutsky Presented by Ramiro Lopez.
(C) 2013 Logrus International Practical Visualization of ITS 2.0 Categories for Real World Localization Process Part of the Multilingual Web-LT Program.
Ontology Learning and Population from Text: Algorithms, Evaluation and Applications Chapters Presented by Sole.
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
GL12 Conf. Dec. 6-7, 2010NTL, Prague, Czech Republic Extending the “Facets” concept by applying NLP tools to catalog records of scientific literature *E.
Universität Stuttgart Universitätsbibliothek Information Retrieval on the Grid? Results and suggestions from Project GRACE Werner Stephan Stuttgart University.
The MultilingualWeb-LT Working Group receives funding by the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) in.
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Survey of Semantic Annotation Platforms
ANNIC ANNotations In Context GATE Training Course October 2006 Kalina Bontcheva (with help from Niraj Aswani)
Introduction to XML. XML - Connectivity is Key Need for customized page layout – e.g. filter to display only recent data Downloadable product comparisons.
The MultilingualWeb-LT Working Group receives funding by the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) in.
1 Writeslike.us Em Tonkin, Andrew Hewson
1 Writeslike.us Em Tonkin, Andrew Hewson
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
1 Technologies for (semi-) automatic metadata creation Diana Maynard.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Interpreting & communicating about computational models: using visual analytics as a bridge between policy makers and researchers Philippe J. Giabbanelli,
Semiautomatic domain model building from text-data Petr Šaloun Petr Klimánek Zdenek Velart Petr Šaloun Petr Klimánek Zdenek Velart SMAP 2011, Vigo, Spain,
GTRI.ppt-1 NLP Technology Applied to e-discovery Bill Underwood Principal Research Scientist “The Current Status and.
The MultilingualWeb-LT Working Group receives funding by the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7) in.
Artificial Intelligence Research Center Pereslavl-Zalessky, Russia Program Systems Institute, RAS.
Facilitating Document Annotation using Content and Querying Value.
Using Semantic Mapping to Manage Heterogeneity in XLIFF Interoperability by Dave Lewis, Rob Brennan, Alan Meehan, Declan O’Sullivan CNGL Centre for Global.
© Copyright 2013 STI INNSBRUCK “How to put an annotation in HTML?” Ioannis Stavrakantonakis.
ICCS 2008, CracowJune 23-25, Towards Large Scale Semantic Annotation Built on MapReduce Architecture Michal Laclavík, Martin Šeleng, Ladislav Hluchý.
Natural Language Processing for Information Retrieval -KVMV Kiran ( )‏ -Neeraj Bisht ( )‏ -L.Srikanth ( )‏
For Monday Read chapter 26 Last Homework –Chapter 23, exercise 7.
FEISGILTT Dublin 2014 Yves Savourel ENLASO Corporation QuEst Integration in Okapi This presentation was made possible by This project is sponsored by the.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Machine Translate Post Edit Quality Check Extract Content I18N Text Analysis Curate Corpora Workflow Analysis Segment Identify Terms Translate Provenance.
ITS 2.0 in XLIFF 2 FEISGILTT Dublin June 2014 Yves Savourel ENLASO Corporation This presentation was made possible by.
DeepDive Model Dongfang Xu Ph.D student, School of Information, University of Arizona Dec 13, 2015.
Conceptualization Relational Model Incomplete Relations Indirect Concept Reflection Entity-Relationship Model Incomplete Relations Two Ways of Concept.
8 December 1997Industry Day Applications of SuperTagging Raman Chandrasekar.
AUTONOMOUS REQUIREMENTS SPECIFICATION PROCESSING USING NATURAL LANGUAGE PROCESSING - Vivek Punjabi.
IR&NLP Coursework P1 Text Analysis Within The Fields Of Information Retrieval and Natural Language Processing By Ben Addley Academic Year 2004.
Statistical techniques for video analysis and searching chapter Anton Korotygin.
Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content Kalina Bontcheva, Diana Maynard, Hamish Cunningham, Horacio.
Semantic Web Technologies Readings discussion Research presentations Projects & Papers discussions.
WP4 Models and Contents Quality Assessment
^ Reviewer’s Workbench
Guangbing Yang Presentation for Xerox Docushare Symposium in 2011
Based on Menu Information
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Semantic Web Technologies
Tagging documents made easy, using machine learning
Machine Learning in Natural Language Processing
Part of the Multilingual Web-LT Program
Statistical NLP: Lecture 9
SmaRT Visualization of Legal Rules for Compliance
ITS Workbench Two Problems, One Open Standards Based Solution
Part of the Multilingual Web-LT Program
OpenDOAR and ROAR RSP Services Day, Nottingham, 23rd Apr.2008
Use Cases Simple Machine Translation (using Rainbow)
Linked Data Reuse in the Language Services Industry
Information Retrieval
Two Problems, One Open Standards Based Solution
Text Analytics in ITS 2.0: Annotation of Named Entities
Statistical NLP : Lecture 9 Word Sense Disambiguation
Presentation transcript:

Text Analytics in ITS 2.0: Annotation of Named Entities Tadej Štajner Jožef Stefan Institute

Motivation Translating proper names … can be problematic for statistical MT systems

Motivation (2) Translation depends on source and target language: There are specific rules to translate (or transliterate) particular proper names or concepts Sometimes, they should not even be translated Solution: figure out what is actually being mentioned and see if any existing translated expression exists for that entity

Motivation (3) Localization of proper names: personal names, product names, or geographic names, chemical compounds, protein names Names can appear without sufficient context: we can use ITS2.0 Text Analysis annotations to provide context for ambiguous content.

ITS2.0 Text Analysis Support text analysis agents that enhance content by suggesting or identifying concepts, identities, identified by IRIs. The data category provides three pieces of information: confidence entity type/concept class entity/concept identifier

ITS2.0 Text Analysis <!DOCTYPE html> <div its-annotators-ref="text-analysis|http://enrycher.ijs.si/mlw/toolinfo.xml#enrycher"> <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-ta-class-ref="http://schema.org/Place">Dublin</span> is the <span its-ta-ident-ref="http://purl.org/vocabularies/princeton/wn30/synset-capital-noun-3.rdf">capital</span> of <span its-ta-ident-ref="http://dbpedia.org/resource/Ireland" its-ta-class-ref="http://schema.org/Place">Ireland</span>. </div>

Producing these annotations NLP Techniques Named entity extraction & disambiguation Word sense disambiguation Manual annotation

Named entity disambiguation Document Entity Mention Label

Use cases Informing a human agent (i.e. translator) that a certain fragment of text is subject to follow specific translation rules: [this is taken up in OKAPI and the XLIFF generation] proper names officially regulated translations. Informing software agent (i.e. CMS) about the conceptual type of a textual entity in order to enable special processing or indexing; geographic names personal names product names chemical compounds

Open issues Text Analysis data category can’t represent stand-off annotations, so only one layer can be done Support for the domain data category via text analysis tools

Business case By itself, it’s infrastructure that indirectly supports business cases by supporting other technical scenarios [Clemens:] Having metadata saves time, producing it automatically can compound the savings [XLIFF roundtrip:] Human as well as machine consumption of this metadata

Demo http://enycher.ijs.si/mlw/