2018/4/14 SMC4LRT Semantic Mapping Component for Language Resources and Technology 2011-06-06 Matej Ďurčo, ICLTT, Vienna;


1 SMC4LRT: Semantic Mapping Component for Language Resources and Technology
Matej Ďurčo, ICLTT, Vienna

2 Context on Language Resource and Technology
CLARIN: Common Language Resources and Technology Infrastructure
  CMDI: CLARIN Metadata Infrastructure
  heterogeneous collection of (metadata about) resources
ISOcat (ISO 12620): a framework within ISO TC 37 for defining
  Data Categories: definitions of widely accepted linguistic concepts
Apply Semantic Technologies
  Ontology Mapping / Schema Mapping
  Ontology Browsing / Visualization
  Linked Open Data

3 Main Goal/s
Enhance Metadata Search → Semantic Search
  Basic idea:
    query:            Actor.Name any Peter
    + relations:      #sameAs (#Actor, #Person)
      (#DatCat)       #sameAs (#Name, #FullName)
    = expanded query: Actor.Name any Peter OR Actor.FullName any Peter OR
                      Person.Name any Peter OR Person.FullName any Peter
  (Class level)
Semantic Browsing: browse metadata/resources via ontologies (LT-World)
  (Instance level)
Interoperability / Reuse: connect dataset to Linked Open Data
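
The query expansion on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `equivalence_classes` and `expand_query` are hypothetical, terms are written without the `#` prefix, and the query is assumed to have the simple form `Component.Element relation value`. It is not the component's actual implementation.

```python
from itertools import product


def equivalence_classes(same_as):
    """Group terms connected by sameAs pairs (treated as symmetric and transitive)."""
    classes = {}
    for a, b in same_as:
        merged = classes.get(a, {a}) | classes.get(b, {b})
        for term in merged:
            classes[term] = merged
    return classes


def expand_query(query, same_as):
    """Expand 'Component.Element relation value' into an OR of equivalent indexes."""
    index, relation, value = query.split(" ", 2)
    classes = equivalence_classes(same_as)
    # For each path step, list the original term first, then its equivalents.
    alternatives = [
        [part] + sorted(classes.get(part, {part}) - {part})
        for part in index.split(".")
    ]
    clauses = [
        f"{'.'.join(combo)} {relation} {value}" for combo in product(*alternatives)
    ]
    return " OR ".join(clauses)


relations = [("Actor", "Person"), ("Name", "FullName")]
expanded = expand_query("Actor.Name any Peter", relations)
print(expanded)
```

With the two relations from the slide, the sketch produces the same four clauses as the expanded query shown above. A query whose terms have no known relations is returned unchanged.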

4 Definitions
Vocabulary, Lexicon, Ontology
Term, Category, Concept ?
MD Profile / Schema
MD Description

5 Components
DataCategoryRegistry - isocat DCR (ISO/TC37)
  define/standardize a reusable set of (basic) data categories
CMDI - ComponentRegistry
  define profiles/schemas at will, but reference DatCats!
CMDRSB - Repository/Service/Browser
  CMDI exploitation-side trinity
RelationRegistry
  allows defining relations between DatCats
VLO - Virtual Language Observatory
  faceted browser for CLARIN Metadata, maps all heterogeneous information from all profiles to 10 facets!
VAS - Vocabulary Alignment Service (CATCHPlus.nl)
  find concept to literal, find aligned concepts
LT-World - Domain ontology

6 Components - CMDI

7 Components - dependencies

8 Approach – Class/Concept level
Use linkage: Profiles → Data Categories ← Relation Registry
  just mapping based on the ConceptLink, resolvable via ComponentRegistry:
    different Profile/Elements pointing to the same DatCat
  use information from Relation Registry:
    a) equivalence relation between DatCats
    b) equivalence relation also between Component DatCats (yet to come)
    c) use also other relations in Relation Registry (subClassOf, synonymy?, …)
  apply selected (user-defined) relation-sets from Relation Registry

MDRecord
  <CMD>
    <Header> <MdProfile>{profileID}</MdProfile>
    <Components><{profileName}> <{component}> <{element}>

CMD-Profile-Specification
  <CMD_ComponentSpec>
    <Header><ID>{profileID}</ID>...</Header>
    <CMD_Component name="{profileName}">
      <CMD_Component name="{component}">
        <CMD_Element name="{element}" ConceptLink="{datcat-uri}">

Data Category Registry
  <dcif:dataCategorySelection>
    <dcif:dataCategory pid="{datcat-uri}"> {detail-information}

Relation Registry
  <rdf:RDF>
    <rdf:Description rdf:about="{datcatX-uri}">
      <sameAs rdf:resource="{datcatY-uri}"/>
    </rdf:Description>
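
As a rough illustration of the linkage above, the following Python sketch collects the ConceptLink of every element in two profile specifications and groups element paths that point to the same (or an equivalent) data category. Everything concrete here is an assumption made for the example: the two sample specs, the `http://example.org/datcat/...` URIs, the helper names, and the absence of XML namespaces. Equivalences are resolved in one step only, so chains of sameAs relations are not followed.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical profile specifications, following the skeleton on the slide.
SPEC_A = """<CMD_ComponentSpec>
  <Header><ID>profileA</ID></Header>
  <CMD_Component name="ProfileA">
    <CMD_Component name="Actor">
      <CMD_Element name="Name" ConceptLink="http://example.org/datcat/name"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>"""

SPEC_B = """<CMD_ComponentSpec>
  <Header><ID>profileB</ID></Header>
  <CMD_Component name="ProfileB">
    <CMD_Component name="Person">
      <CMD_Element name="FullName" ConceptLink="http://example.org/datcat/fullName"/>
    </CMD_Component>
  </CMD_Component>
</CMD_ComponentSpec>"""


def element_paths(spec_xml):
    """Return (dotted element path, ConceptLink) pairs for one profile spec."""
    found = []

    def walk(component, prefix):
        for child in component:
            if child.tag == "CMD_Component":
                walk(child, prefix + [child.get("name")])
            elif child.tag == "CMD_Element" and child.get("ConceptLink"):
                path = ".".join(prefix + [child.get("name")])
                found.append((path, child.get("ConceptLink")))

    walk(ET.fromstring(spec_xml), [])
    return found


def group_by_datcat(specs, same_as=()):
    """Group element paths by data category, merging sameAs pairs (x, y) under x."""
    canonical = {y: x for x, y in same_as}
    groups = defaultdict(list)
    for spec in specs:
        for path, datcat in element_paths(spec):
            groups[canonical.get(datcat, datcat)].append(path)
    return dict(groups)


NAME = "http://example.org/datcat/name"
FULL_NAME = "http://example.org/datcat/fullName"
groups = group_by_datcat([SPEC_A, SPEC_B], same_as=[(NAME, FULL_NAME)])
print(groups)
```

Without the sameAs pair, the two elements stay in separate groups; with it, both paths are listed under one data category, which is the information a query expansion step would need.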

9 Approach – Individuals/Instance Level
One step when (pre)processing incoming new MD-sets
  Express MD-Records as RDF-triples:
    <#mdrecord #property "string-value">
  Identify potential target Domain Ontologies/Vocabularies
  Create inverted index: label → entity

    Category          Label             Entity
    dc:Organization   "MPI"             #MPI
                      "Max-Planck..."
                      "DFKI"            #DFKI
                      "De Fo Kü In"
    skos:LCSH         "19th Poetry"     lcsh:19thPoetry
    skos:DDC                            ddc:19thPoetry

  Define lookup function:
    lookup(category, string-value) → <external-entity, measure>
  Enrich dataset with new facts:
    <#mdrecord #property #external-entity>
Property-values of Metadata-Records are linked to instances of domain-ontologies
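
The lookup and enrichment steps could look roughly like the Python sketch below. It is a minimal illustration, not the planned component: the index holds only a few labels from the slide, the property-to-category mapping (`PROPERTY_CATEGORY`) and the property name `#organisation` are invented for the example, and the similarity measure is simply the ratio from the standard library's `difflib.SequenceMatcher` with an arbitrary threshold of 0.8.

```python
from difflib import SequenceMatcher

# Inverted index: category -> label -> entity (entries taken from the slide).
INDEX = {
    "dc:Organization": {"MPI": "#MPI", "DFKI": "#DFKI"},
    "skos:LCSH": {"19th Poetry": "lcsh:19thPoetry"},
}

# Assumption: each metadata property is assigned one lookup category.
PROPERTY_CATEGORY = {"#organisation": "dc:Organization"}


def lookup(category, value, threshold=0.8):
    """Return (entity, measure) for the best-matching label, or (None, measure)."""
    best_entity, best_measure = None, 0.0
    for label, entity in INDEX.get(category, {}).items():
        measure = SequenceMatcher(None, value.lower(), label.lower()).ratio()
        if measure > best_measure:
            best_entity, best_measure = entity, measure
    if best_measure >= threshold:
        return best_entity, best_measure
    return None, best_measure


def enrich(triples):
    """For each (record, property, string value), emit (record, property, entity)."""
    new_facts = []
    for record, prop, value in triples:
        entity, _ = lookup(PROPERTY_CATEGORY.get(prop), value)
        if entity is not None:
            new_facts.append((record, prop, entity))
    return new_facts


facts = enrich([
    ("#rec1", "#organisation", "DFKI"),
    ("#rec2", "#organisation", "Unknown Institute"),
])
print(facts)
```

Returning the measure together with the entity keeps the decision about how close a match must be outside the lookup function, so a caller can apply a stricter or looser threshold per category.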

10 Semantic Mapping - Linking and Data Flow
INCONSISTENT

11 Semantic Search - Query sequence

12 Candidate Categories/Properties
ResourceType, Format, AnnotationLevelType
  → map to: isocat-DataCategories (Thematic Views: Metadata, Morphosyntax, ...)
Genre, Topic, Subject
  → map to: Taxonomies, Library Classification systems (LCSH, DDC, Dornseiff, ...)
Project, Institution, Person, Publisher
  open controlled vocabularies (real entities)
  → map to: LT-World (perhaps others: LCCN, DBPedia?)

13 Expected Results
Specification + Prototype of a Semantic Mapping Component
  allowing CMD-Metadata to be transformed into RDF
Specification + Prototype of a Semantic Search Component
  REST-WebService enriching the MD-Search, allowing query expansion and ontology/concept-based search
CLARIN Metadata expressed as RDF/LOD-Dataset

14 Next Steps
Literature → Related Work
  Linked Open Data
  Ontology Mapping
  Ontology Browsing/Visualization
Analyze Data
  Existing MD-Schemas (DC, OLAC, MODS, TEI, IMDI, CMD, ...)
  LT-World Ontology
  SKOS-Data available via Vocabulary Alignment Service
  LCSH, LCCN
  DBPedia

17 Tasks / Open Issues (Who/How)
Define Concept-Level Relations (Vocabulary Service)
Populate Vocabulary service
  translate Ontologies, Taxonomies
Express MDRepo in RDF
  every profile is one Ontology
  every MDRecord is an instance
Ontology Mapping (compute similarities between profiles and between instances)
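
One possible reading of "every profile is one Ontology, every MDRecord is an instance" is sketched below in Python: each record becomes a subject typed by its profile, and each field becomes a property with a literal value, serialized as N-Triples lines. The namespace `http://example.org/md/`, the URI layout, and the function name are all hypothetical, and the literal escaping covers only backslashes and double quotes.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/md/"  # hypothetical namespace for the MD repository


def record_to_ntriples(record_id, profile_id, fields):
    """Serialize one MD record as N-Triples lines, typed by its profile."""
    subject = f"<{BASE}record/{record_id}>"
    profile = f"{BASE}profile/{profile_id}"
    lines = [f"{subject} <{RDF_TYPE}> <{profile}> ."]
    for path, value in fields.items():
        # Minimal escaping only; control characters are not handled here.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{profile}#{path}> "{literal}" .')
    return lines


triples = record_to_ntriples("rec1", "profileA", {"Actor.Name": "Peter"})
print("\n".join(triples))
```

Whether a profile should map to a single class, as assumed here, or to a full ontology with one class per component is exactly the kind of modelling decision this slide leaves open.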

18 Questions/Discussion
Distinguish between relations (is it type vs. subclass?)
  ISA, a-kind-of = type
  subsumption (hypo/hyperonymy) = subClassOf
Resource-Level:
  Annotation-Tiers of Resources are conceptLinked to DatCats
  Values of Annotation-Tiers are linked to DatCats
  Thierry: user is rather a Computer Linguist within an application (relevant in META-NET)
How to employ Linguistic Ontologies? Lemon/LingInfo, isocat, GOLD, wals.info, Wordnet?
  Thierry: shouldn't be necessary, mainly for OntoPopulation from texts

19 MDService - Basics
MDService accepts metadata queries from the MetadataBrowser (and external applications) and passes them to the Metadata Repository(ies) and/or to the Virtual Collection Registry, optionally applying Semantic Mapping based on information from the Component Registry, Data Category Registries and Relation Registry. It then receives the results and passes them (optionally formatted) back to the requesting node.

20 MDService - Functionality
REST-interface (trac:WADL, MDService2/docs/htmlpage/wadl)
  collections: list the "natural" hierarchical collections-structure of the repository
  model: return xml-elems used in the repository (with usage statistics)
  terms: return terms/indices/xml-elems used in the repository, enriched with
    a) the usage statistics (count occurrences and distinct values)
    b) the corresponding CMD-components and data categories
  values: list distinct values for given index (similar to facet functionality)
  recordset: retrieve a list of MDrecords based on a query [CQL]
  record: retrieve individual MDrecord based on the identifier

21 MDBrowser - Functionality
Dynamic Repositories
  Collections browsing
  Terms/Values browsing
Query Input
  Simple full-text query
  Complex queries (CQL-searchclauses, boolean op)
  Index auto-completion
Queryset/Resultset
  work with multiple results in parallel
  Paging
  Variable views (select columns, auto-columns)
Workspace (storing queries, bookmarks)
"Linkable" Queries
(Semantic Mapping)

22 CMDRSB - Situation and Outlook
The MDRepository currently contains around [...] records, mainly from the datasets OLAC and IMDI (overview of collections).
Currently there are three instances of the MDRepository running, providing similar but not identical datasets (they can be accessed by the same MDService, by switching the target repository in the UI):
  University of Gothenburg (main)
  ICLTT, Vienna
  MPI Psycholing, Nijmegen
A first version of the MDService and Browser is online: clarin.aac.ac.at/MDService2
Although the repository and interface already provide a lot of information and functionality, it is demo-quality and cannot yet be seen as a reliable service. A lot of work is still needed on both the data quality and the user interface:
  Enhancing the UI (based on feedback from Nijmegen)
  continuous integration of new datasets (provided for harvest by the centres)
Nevertheless we invite you to try it out and look forward to any critical remarks.

