2018/4/14 SMC4LRT Semantic Mapping Component for Language Resources and Technology 2011-06-06 Matej Ďurčo, ICLTT, Vienna;


Context on Language Resource and Technology
CLARIN – Common Language Resources and Technology Infrastructure
CMDI - CLARIN Metadata Infrastructure
  heterogeneous collection of (Metadata about) Resources
ISOcat (ISO 12620) - a framework within ISO TC 37 for defining:
  Data Categories – Definitions of widely accepted linguistic concepts
Apply Semantic Technologies
  Ontology Mapping / Schema Mapping
  Ontology Browsing / Visualization
  Linked Open Data

Main Goal/s
Enhance Metadata Search → Semantic Search
Basic Idea: query + relations (#DatCat) = expanded query
  query: Actor.Name any Peter
  relations: #sameAs (#Actor, #Person)
             #sameAs (#Name, #FullName)
  expanded query: Actor.Name any Peter OR Actor.FullName any Peter OR Person.Name any Peter OR Person.FullName any Peter
(Class level) Semantic Browsing - Browse Metadata/Resources via ontologies (LT-World)
(Instance-Level) Interoperability / Reuses - Connect dataset to Linked Open Data
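
The query expansion idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the SMC implementation: the function names `equivalence_classes` and `expand_query` are hypothetical, and the sketch assumes that `#sameAs` relations are symmetric and transitive and that every step of a dotted index path may be replaced independently.

```python
from itertools import product


def equivalence_classes(pairs):
    """Group terms connected by sameAs pairs (treated as symmetric and transitive)."""
    classes = {}
    for a, b in pairs:
        merged = classes.get(a, {a}) | classes.get(b, {b})
        for term in merged:
            classes[term] = merged
    return classes


def expand_query(index, relation, value, same_as):
    """Expand e.g. 'Actor.Name any Peter' into an OR over equivalent index paths."""
    classes = equivalence_classes(same_as)
    # Each step of the dotted path is replaced by all of its equivalents;
    # sorted() only keeps the output order deterministic.
    alternatives = [sorted(classes.get(step, {step})) for step in index.split(".")]
    clauses = [f"{'.'.join(path)} {relation} {value}" for path in product(*alternatives)]
    return " OR ".join(clauses)


# Relations taken from the slide: #sameAs(#Actor, #Person), #sameAs(#Name, #FullName)
same_as = [("Actor", "Person"), ("Name", "FullName")]
expanded = expand_query("Actor.Name", "any", "Peter", same_as)
print(expanded)
```

The result contains the same four clauses as the expanded query on the slide, although their order may differ. An index with no known relations is passed through unchanged.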

Definitions
Vocabulary, Lexicon, Ontology
Term, Category, Concept ?
MD Profile / Schema
MD Description

Components
DataCategoryRegistry - ISOcat DCR (ISO/TC37)
  define/standardize a reusable set of (basic) data categories
CMDI - ComponentRegistry
  define profiles/schemas at will, but reference DatCats!
CMDRSB - Repository/Service/Browser
  CMDI exploitation-side trinity
  http://clarin.aac.ac.at/MDService2/
RelationRegistry
  allows defining relations between DatCats
VLO - Virtual Language Observatory
  faceted browser for CLARIN Metadata, maps all heterogeneous information from all profiles to 10 facets!
VAS – Vocabulary Alignment Service (CATCHPlus.nl)
  find concept to literal, find aligned concepts
LT-World - Domain ontology

Components - CMDI

Components - dependencies

Approach – Class/Concept level
Use linkage: Profiles → Data Categories ← Relation Registry
Just mapping based on the ConceptLink resolvable via ComponentRegistry:
  different Profile/Elements pointing to the same DatCat
Use information from Relation Registry:
  a) equivalence relation between DatCats
  b) equivalence relation also between Component DatCats (yet to come)
  c) use also other relations in Relation Registry (subClassOf, synonymy?, …)
Apply selected (user-defined) relation-sets from Relation Registry

MDRecord:
  <CMD>
    <Header> <MdProfile>{profileID}</MdProfile>
    <Components>
      <{profileName}>
        <{component}>
          <{element}>

CMD-Profile-Specification:
  <CMD_ComponentSpec>
    <Header><ID>{profileID}</ID>...</Header>
    <CMD_Component name="{profileName}">
      <CMD_Component name="{component}">
        <CMD_Element name="{element}" ConceptLink="{datcat-uri}">

Data Category Registry:
  <dcif:dataCategorySelection>
    <dcif:dataCategory pid="{datcat-uri}">
      {detail-information}

Relation Registry:
  <rdf:RDF>
    <rdf:Description rdf:about="{datcatX-uri}">
      <sameAs rdf:resource="{datcatY-uri}"/>
    </rdf:Description>
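
To illustrate the first mapping step (different profile elements pointing to the same DatCat), the following sketch indexes the elements of a profile specification by their ConceptLink. The element and attribute names follow the skeleton on the slide; the sample profile, the `example.org` URIs and the function name `index_by_datcat` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical profile specification following the skeleton on the slide.
PROFILE_SPEC = """
<CMD_ComponentSpec>
  <Header><ID>profile-1</ID></Header>
  <CMD_Component name="Session">
    <CMD_Component name="Actor">
      <CMD_Element name="Name" ConceptLink="http://example.org/datcat/name"/>
      <CMD_Element name="Role" ConceptLink="http://example.org/datcat/role"/>
    </CMD_Component>
    <CMD_Component name="Person">
      <CMD_Element name="FullName" ConceptLink="http://example.org/datcat/name"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>
"""


def index_by_datcat(spec_xml):
    """Map each ConceptLink URI to the dotted element paths that reference it."""
    index = defaultdict(list)

    def walk(node, path):
        for child in node:
            if child.tag == "CMD_Component":
                walk(child, path + [child.get("name")])
            elif child.tag == "CMD_Element" and child.get("ConceptLink"):
                element_path = ".".join(path + [child.get("name")])
                index[child.get("ConceptLink")].append(element_path)

    walk(ET.fromstring(spec_xml), [])
    return dict(index)


datcat_index = index_by_datcat(PROFILE_SPEC)
for uri, paths in datcat_index.items():
    print(uri, paths)
```

Elements that share a ConceptLink end up in the same list and can be treated as equivalent search indices. Relations from the Relation Registry would then merge such lists across different DatCats.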

Approach – Individuals/Instance Level
One step when (pre)processing incoming new MD-sets.
Property-values of Metadata-Records are linked to instances of domain-ontologies.
Express MD-Records as RDF-triples:
  <#mdrecord #property "string-value">
Identify potential target Domain Ontologies/Vocabularies
Create inverted Index: label → entity
  Category         Label            Entity
  dc:Organization  "MPI"            #MPI
                   "Max-Planck..."
                   "DFKI"           #DFKI
                   "De Fo Kü In"
  skos:LCSH        "19th Poetry"    lcsh:19thPoetry
  skos:DDC                          ddc:19thPoetry
Define lookup function:
  lookup(category, string-value) → <external-entity, measure>
Enrich dataset with new facts:
  <#mdrecord #property #external-entity>
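
A minimal sketch of the inverted index and the lookup function might look as follows. The slide does not say how the measure is computed; the string similarity from Python's `difflib`, the threshold of 0.8, the helper `enrich` and the property name `#organization` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Inverted index: category -> {label -> entity}, using entries from the slide.
INDEX = {
    "dc:Organization": {"MPI": "#MPI", "DFKI": "#DFKI"},
    "skos:LCSH": {"19th Poetry": "lcsh:19thPoetry"},
}


def lookup(category, string_value, threshold=0.8):
    """Return (external_entity, measure) for the best-matching label, or None."""
    best = None
    for label, entity in INDEX.get(category, {}).items():
        # Assumed measure: case-insensitive similarity ratio between 0 and 1.
        measure = SequenceMatcher(None, label.lower(), string_value.lower()).ratio()
        if measure >= threshold and (best is None or measure > best[1]):
            best = (entity, measure)
    return best


def enrich(triples, property_categories):
    """Add <record, property, entity> facts for literals that resolve to an entity."""
    new_facts = []
    for record, prop, value in triples:
        match = lookup(property_categories.get(prop, ""), value)
        if match:
            new_facts.append((record, prop, match[0]))
    return triples + new_facts


triples = [
    ("#mdrecord1", "#organization", "mpi"),
    ("#mdrecord1", "#title", "Some corpus"),
]
enriched = enrich(triples, {"#organization": "dc:Organization"})
print(enriched)
```

The original literal triples are kept, so the enrichment only adds facts and never replaces the string values delivered by the data provider.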

Semantic Mapping - Linking and Data Flow (INCONSISTENT)

Semantic Search - Query sequence

Candidate Categories/Properties
ResourceType, Format, AnnotationLevelType
  → map to: ISOcat Data Categories (Thematic Views: Metadata, Morphosyntax, ...)
Genre, Topic, Subject
  → map to: Taxonomies, Library Classification systems (LCSH, DDC, Dornseiff, ...)
Project, Institution, Person, Publisher
  open controlled vocabularies (real entities)
  → map to: LT-World (perhaps others: LCCN, DBPedia?)

Expected Results
Specification + Prototype of a Semantic Mapping Component
  allowing to transform CMD-Metadata into RDF
Specification + Prototype of a Semantic Search Component
  REST-WebService enriching the MD-Search, allowing query expansion and ontology/concept-based search
CLARIN Metadata expressed as RDF/LOD-Dataset

Next Steps
Literature → Related Work
  Linked Open Data
  Ontology Mapping
  Ontology Browsing/Visualization
Analyze Data
  Existing MD-Schemas (DC, OLAC, MODS, TEI, IMDI, CMD, ...)
  LT-World Ontology
  SKOS-Data available via Vocabulary Alignment Service
  LCSH, LCCN
  DBPedia


Tasks / Open Issues (Who/How)
Define Concept-Level Relations
  (Vocabulary Service http://catchplus.tuxic.nl/catchplus/serviceapi/1/)
Populate Vocabulary service
  translate Ontologies, Taxonomies
Express MDRepo in RDF
  every profile is one Ontology
  every MDRecord is an instance
Ontology Mapping (compute similarities between profiles and between instances)
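
The task "Express MDRepo in RDF" could be approached roughly as sketched below: the record is typed as an instance of its profile, and every leaf element becomes a literal-valued property. The `example.org` namespace, the URI layout and the function name `record_to_triples` are placeholders, not CLARIN conventions. Literals are not escaped, so this is not a complete N-Triples serializer.

```python
import xml.etree.ElementTree as ET

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace, not a CLARIN URI

# Hypothetical record following the MDRecord skeleton of the slides.
RECORD = """
<CMD>
  <Header><MdProfile>profile-1</MdProfile></Header>
  <Components>
    <Session>
      <Actor><Name>Peter</Name></Actor>
    </Session>
  </Components>
</CMD>
"""


def record_to_triples(record_xml, record_id):
    """Return (subject, predicate, object) tuples for one metadata record."""
    root = ET.fromstring(record_xml)
    subject = f"<{BASE}record/{record_id}>"
    profile = root.findtext("Header/MdProfile")
    # The record is typed as an instance of its profile.
    triples = [(subject, f"<{RDF_TYPE}>", f"<{BASE}profile/{profile}>")]

    def walk(node, path):
        children = list(node)
        if not children and node.text and node.text.strip():
            # Leaf element: the dotted path becomes the property, the text a literal.
            prop = f"<{BASE}property/{'.'.join(path)}>"
            triples.append((subject, prop, f'"{node.text.strip()}"'))
        for child in children:
            walk(child, path + [child.tag])

    for component in root.find("Components"):
        walk(component, [component.tag])
    return triples


for s, p, o in record_to_triples(RECORD, "r1"):
    print(s, p, o, ".")
```

Using the full element path as the property keeps the sketch simple. A real mapping would more likely use the data category referenced by the element's ConceptLink as the property.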

Questions/Discussion
Distinguish between relations (is it type vs. subclass?)
  ISA, a-kind-of = type
  subsumption (hypo/hyperonymy) = subClassOf
Resource-Level:
  Annotation-Tiers of Resources are conceptLinked to DatCats
  Values of Annotation-Tiers are linked to DatCats
  Thierry: user rather Computer Linguist within an application (relevant in META-NET)
How to employ Linguistic Ontologies? Lemon/LingInfo, isocat, GOLD, wals.info, Wordnet?
  Thierry: shouldn't be necessary, mainly for OntoPopulation from texts

MDService - Basics
MDService accepts queries about metadata from the MetadataBrowser (and external applications) and passes them to the Metadata Repository(ies) and/or to the Virtual Collection Registry, optionally applying Semantic Mapping based on the information from the Component Registry, Data Category Registries and Relation Registry. It receives the results and passes them (optionally formatted) back to the requesting node.

MDService - Functionality
REST-interface (trac:WADL, MDService2/docs/htmlpage/wadl)
  collections: list the "natural" hierarchical collections-structure of the repository
  model: return xml-elems used in the repository (with usage statistics)
  terms: return terms/indices/xml-elems used in the repository, enriched with
    a) the usage statistics (count occurrences and distinct values)
    b) the corresponding CMD-components and data categories
  values: list distinct values for given index (similar to facet functionality)
  recordset: retrieve a list of MDrecords based on a query [CQL]
  record: retrieve individual MDrecord based on the identifier
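
What the `terms` and `values` operations compute can be sketched over a toy in-memory record set. This does not use or reproduce the MDService REST interface; the record layout, the index names and the return formats are assumptions chosen only to show the usage statistics (occurrences and distinct values) and the facet-like value listing.

```python
from collections import Counter, defaultdict

# Toy records as index -> list of values; index names are illustrative.
RECORDS = [
    {"Actor.Name": ["Peter", "Anna"], "Language": ["German"]},
    {"Actor.Name": ["Peter"], "Language": ["Dutch"]},
    {"Language": ["German"]},
]


def collect(records):
    """Count every value per index across all records."""
    counts = defaultdict(Counter)
    for record in records:
        for index, index_values in record.items():
            counts[index].update(index_values)
    return counts


def terms(records):
    """Per index: number of occurrences and number of distinct values."""
    return {
        index: {"occurrences": sum(c.values()), "distinct": len(c)}
        for index, c in collect(records).items()
    }


def values(records, index):
    """Distinct values of one index with their frequencies, most frequent first."""
    return collect(records)[index].most_common()


print(terms(RECORDS))
print(values(RECORDS, "Language"))
```

In a real repository these statistics would come from the underlying index structures rather than from a scan over all records.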

MDBrowser - Functionality
http://clarin.aac.ac.at/MDService2/docs/htmlpage/info
Dynamic Repositories
Collections browsing
Terms/Values browsing
Query Input
  Simple full-text query
  Complex queries (CQL-searchclauses, boolean op)
  Index auto-completion
Queryset/Resultset
  work with multiple results in parallel
Paging
Variable views (select columns, auto-columns)
Workspace (storing queries, bookmarks)
"Linkable" Queries
(Semantic Mapping)

CMDRSB - Situation and Outlook
The MDRepository currently contains around 109,000 records, mainly from the datasets OLAC and IMDI (overview of collections).
Currently there are three instances of the MDRepository running, providing similar but not identical datasets (they can be accessed by the same MDService, by switching the target repository in the UI):
  University of Gothenburg (main)
  ICLTT, Vienna
  MPI Psycholing, Nijmegen
A first version of the MDService and Browser is online: clarin.aac.ac.at/MDService2
Although the repository and interface already provide a lot of information and functionality, it is demo-quality and cannot yet be seen as a reliable service. A lot of work is still needed on both the data quality and the user interface:
  enhancing the UI (based on feedback from Nijmegen 201101 2011-05)
  continuous integration of new datasets (provided for harvest by the centres)
Nevertheless, we invite you to try it out and look forward to any critical remarks.