Linked Data Reuse in the Language Services Industry

Presentation transcript:

Linked Data Reuse in the Language Services Industry
David Lewis, Leroy Finn, Dominic Jones

Language & Localisation: Meta-Data and Interoperability
[Diagram labels: Translations + Term; Localisation Industry (€); Language Technology Community; Language Resource Community; Multilingual Web]
Language Resource Curation
- Relies on specific funded actions
- Episodic funding, sustainability and coverage concerns
LOD promises:
- Improved sharing and annotation of existing resources
- Some opportunistic curation, e.g. mining DBpedia
Can there be better synergy with commercial language services?
Use case: Localisation workflows

Content Management-Localisation Workflow & Interoperability
Interoperability roadblocks in high variety:
- Content Management Systems
- Content formats
- Increasingly dynamic
Add language resource curation
- Data driven MT & text analytics
[Workflow diagram labels: Client; Create Content; editors; CMS; Language Service Provider; Extract & Segment; translators; CMS/CVS; Trans/term Mgmt; Trans+Term Reuse; MT; Translate Content; CAT; QA; QA tools; design; Publish Content; Web CMS]

Internationalisation Tag Set (ITS)
- Allows localisation tools to be instructed to treat specific text in specific ways
- Annotates text in XML and HTML5 documents
- RDF Mapping – NLP Interchange Format
Principles:
- Minimise disturbance of original content
- Don’t reinvent the wheel
- Link to existing meta-data before adding new
- Defined distinct, independent Data Categories
ITS 2.0 is the “Dublin Core” of the Multilingual Web
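
A minimal sketch of how a localisation tool might act on one ITS 2.0 data category in HTML5: the Translate category, which HTML5 expresses through the standard `translate` attribute. The class name, function name and sample markup are illustrative assumptions, not taken from the slides. The sketch also ignores void elements such as `<br>` and any global ITS rules.

```python
from html.parser import HTMLParser


class TranslatableTextExtractor(HTMLParser):
    """Splits text into translatable and protected runs using `translate`."""

    def __init__(self):
        super().__init__()
        # Stack of inherited flags; content is translatable by default.
        self._stack = [True]
        self.translatable = []
        self.protected = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("translate")
        if value == "no":
            self._stack.append(False)
        elif value == "yes":
            self._stack.append(True)
        else:
            # No explicit value: inherit the flag from the parent element.
            self._stack.append(self._stack[-1])

    def handle_endtag(self, tag):
        # Assumes well-nested markup without void elements.
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        bucket = self.translatable if self._stack[-1] else self.protected
        bucket.append(text)


def split_by_translate_flag(html):
    """Return (translatable_runs, protected_runs) for an HTML fragment."""
    parser = TranslatableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.translatable, parser.protected


SAMPLE = '<p>Click <span translate="no">Reset</span> to restart.</p>'
translatable, protected = split_by_translate_flag(SAMPLE)
print(translatable, protected)
```

Because the flag is read from the markup itself, the original content is left undisturbed, in line with the first principle listed on the slide.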

Here Content is the Data
- So we must annotate textual content
- RDFa and microdata are better for annotating elements
C. Lieske, F. Sasaki 2010

Process Monitoring in RDF
Better up-front and real-time business decision support:
- Use historic and aggregated metadata for inference
- Content carries its own “audit trail”
Metadata can be transformed to linked semantic data allowing:
- Inferential queries across data siloes
- New forms of data visualization
- Extensible data relationships
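
The idea of an inferential query over an "audit trail" can be sketched without any RDF library, treating triples as plain tuples. Everything here is an assumption made for illustration: the `ex:` names, the one-level subproperty inference, and the simplification that each entity has at most one source. A real deployment would more likely use a triple store queried with SPARQL.

```python
SUBPROPERTY_OF = "rdfs:subPropertyOf"
DERIVED_FROM = "prov:wasDerivedFrom"


def infer_subproperties(triples):
    """Add (s, super, o) for each (s, p, o) whose p is a declared subproperty.

    Only one level of subproperty declarations is followed (an assumption
    that keeps the sketch short).
    """
    supers = {s: o for s, p, o in triples if p == SUBPROPERTY_OF}
    inferred = set(triples)
    for s, p, o in triples:
        if p in supers:
            inferred.add((s, supers[p], o))
    return inferred


def audit_trail(triples, start):
    """Follow derivation links from `start` back to the original content."""
    source_of = {
        s: o for s, p, o in infer_subproperties(triples) if p == DERIVED_FROM
    }
    trail = [start]
    # Stop at the first entity with no recorded source, or on a cycle.
    while trail[-1] in source_of and source_of[trail[-1]] not in trail:
        trail.append(source_of[trail[-1]])
    return trail


# Hypothetical log: a French segment translated from English, then revised.
LOG = [
    ("ex:wasTranslatedFrom", SUBPROPERTY_OF, DERIVED_FROM),
    ("ex:segment-fr", "ex:wasTranslatedFrom", "ex:segment-en"),
    ("ex:segment-fr-rev1", DERIVED_FROM, "ex:segment-fr"),
]

print(audit_trail(LOG, "ex:segment-fr-rev1"))
```

The translation step is only reachable because the subproperty declaration lets a generic derivation query see the more specific `ex:wasTranslatedFrom` link.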

Provenance Data: RDF-based logging
- W3C Provenance WG: http://www.w3.org/2011/prov/
- subproperty: wasTranslatedFrom
- ITS related entity subclass: document, segment, analysed-text, term, translation, translation-revision
From: http://www.w3.org/TR/prov-primer/
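
As a rough illustration of RDF-based logging, the sketch below writes a translation event as N-Triples lines. The PROV, RDF and RDFS namespace IRIs are the standard W3C ones. The `http://example.org/l10n#` namespace is hypothetical, and so is the choice to declare `wasTranslatedFrom` a subproperty of `prov:wasDerivedFrom` and to type both segments as plain `prov:Entity`. The slide names the subproperty and entity subclasses but does not give their IRIs or parent terms.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
EX = "http://example.org/l10n#"  # hypothetical vocabulary namespace


def nt(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def log_translation(source_iri, target_iri):
    """Return N-Triples lines recording that target was translated from source."""
    return [
        # Assumed schema statement: the slide does not name the parent property.
        nt(EX + "wasTranslatedFrom", RDFS_SUBPROPERTY_OF, PROV + "wasDerivedFrom"),
        nt(source_iri, RDF_TYPE, PROV + "Entity"),
        nt(target_iri, RDF_TYPE, PROV + "Entity"),
        nt(target_iri, EX + "wasTranslatedFrom", source_iri),
    ]


lines = log_translation(
    "http://example.org/doc/seg1-en",  # hypothetical segment IRIs
    "http://example.org/doc/seg1-fr",
)
print("\n".join(lines))
```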

CMS to Localisation Workflow Roundtrip
[Architecture diagram. Lanes: Content Management; Localisation Preparation; Translation Management. Component labels: Source CMS; Target CMS; Parse, filter, segment; Named Entity Recogniser – Enrycher; MT - Matrex; MT - Bing; MT – M4LOC; Web-based Post-editor; CAT; QA viewer; Workflow Management; LKR; XLIFF store; RDF provenance store. Interface labels: ITS + HTML5 + CMIS; ITS + XLIFF; XLIFF/PROV-O; ITS + SPARQL]

[Workflow diagram labels, in extraction order: ITS; RDF: PROV-O + NIF; XLIFF; HTML5 CMIS; Segment; Machine Translate; Extract; Terminology; Post Edit; Text Analysis; Domain; Translate; Terminology; ITS; Quality Check; Text Analysis; MT Confidence; Loc Quality Issue; Publish Content; Content I18N; Provenance; Workflow Analysis; Curate Corpora; RDF: PROV-O + NIF]

Active Curation
Localisation Industry:
- €33bn global industry, 12% growth
- Established commercial models for translation and term reuse
- High interoperability costs
- Long tail industry – 99% SMEs
- SMEs struggle to benefit from leveraging reuse
- Challenged to adapt machine translation and text analytics technology
Active Curation:
- On-demand, quality-driven assembly of training corpora
- Collect human linguistic judgements from across the workflow
- Professional translator/reviewer, crowds, end user
- Already shown 25% MT improvement via in-job retraining
Linked Language and Localisation Data
- Extending existing tools to generate process provenance data
- Pooling/Brokering of data across value chains
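
One way to picture "on-demand, quality-driven assembly of training corpora" is as a filter over human judgements collected across the workflow. The record fields, the 0 to 1 score scale, the threshold and the origin labels below are all hypothetical. The slides do not describe the actual selection method, so this is only a sketch of the general idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """A hypothetical record of one human-assessed translation."""
    source: str
    target: str
    origin: str   # e.g. "professional", "crowd", "end-user"
    score: float  # assumed quality score between 0 and 1


def assemble_corpus(judgements, min_score=0.8,
                    origins=("professional", "crowd", "end-user")):
    """Keep the best-scored acceptable translation for each source segment."""
    best = {}
    for j in judgements:
        if j.origin not in origins or j.score < min_score:
            continue
        if j.source not in best or j.score > best[j.source].score:
            best[j.source] = j
    return [(j.source, j.target) for j in best.values()]


JUDGEMENTS = [
    Judgement("Save file", "Enregistrer le fichier", "professional", 0.95),
    Judgement("Save file", "Sauver fichier", "crowd", 0.85),
    Judgement("Open", "Ouvert", "end-user", 0.4),
]

print(assemble_corpus(JUDGEMENTS))
```

Changing `min_score` or `origins` per job is what would make the assembly "on-demand": the same pool of judgements can yield different training sets for different quality requirements.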

Multilingual Linked Open Data: Where Next?
- Help EC and other public bodies to bootstrap data
- Per year, the EC produces 11 million pages in 22 languages => 14 billion triples
- Attribution/license annotation and access control – Controlled Linked Data
- Increase industrial engagement
- Media localisation – audio, video, images
- Non-localisation MT and TA applications
- Multilingual content personalisation
- YOUR USE CASE HERE
Join new W3C Community Group: Multilingual Linked Open Data http://www.w3.org/community/bpmlod/
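
The slide does not say how the 14 billion figure is derived. The short calculation below only shows what density of triples it would imply under two possible readings of "11 million pages in 22 languages". Both readings are assumptions.

```python
PAGES_PER_YEAR = 11_000_000
LANGUAGES = 22
TRIPLES = 14_000_000_000

# Reading 1: 11 million pages in total across all languages.
triples_per_page = TRIPLES / PAGES_PER_YEAR

# Reading 2: 11 million pages, each existing in all 22 languages.
triples_per_page_version = TRIPLES / (PAGES_PER_YEAR * LANGUAGES)

print(round(triples_per_page), round(triples_per_page_version))
```

Under the first reading each page would correspond to roughly 1,273 triples. Under the second, each language version of a page would correspond to roughly 58.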

Thank You