Linguistic and Semantic Information for the Semantic Web
LREC 2004, ISO Working Group on the Representation of Multimodal Semantic Information

A Multi-Layered, XML-Based Approach to the Integration of Linguistic and Semantic Annotations

Thierry Declerck, Paul Buitelaar
University of the Saarland & DFKI GmbH
Saarbrücken, Germany

This presentation also includes slides and graphics taken from three presentations given at EUROLAN 2003 in Bucharest, by P. Vossen (Wordnet, EuroWordNet, Global Wordnet), A. Lenci (Computational Lexicons and the Semantic Web) and Srini Narayanan (FrameNet Meets the Semantic Web). It also includes graphics by M. Fernández-López and A. Gómez-Pérez (UPM) from deliverable 1.2 of the Esperonto project.
Overview
Semantic Web Applications of Language Technology (LT)
- Annotation of Web Documents with Ontology-based Metadata (Knowledge Markup)
- Ontology Learning through Text Mining from Annotated Corpora
Integration of Annotations
- Use of Different Tools
- Use of Different Knowledge Sources
Motivations
Overview …
Linguistic and Semantic Annotations
- Linguistic: e.g. PoS (Part of Speech), Lemma, Phrase Structure
- Semantic: e.g. Concepts, Relations, Events
Objectives: Integration of…
- … Annotations from Different Resources, e.g. Different Domains
- … Annotations in Different Formats, e.g. from Different Tools
Knowledge Markup and Knowledge Extraction
[Figure: Text/Speech/Image-Video → Linguistic and Media Analysis → Linguistic, Low-level Image and Semantic Annotations → Text/Speech/Media Mining → Concepts, Relations, Events]
Annotations: Projects, Tools and Resources
Projects
- MuchMore: Cross-lingual Information Retrieval, Medical Domain
- Mumis: Content-based Multimedia Retrieval, Soccer Domain
Tools and Resources
- MuchMore: Integration of Shprot (TnT, Mmorph, Chunkie) with Semantic Tagging Tools (UMLS – Medical Semantic Resource, EuroWordNet)
- Mumis: Schug, Integration of SPPC with Rule-based Chunking and Shallow Dependency Analysis, Event Structure (Mumis Soccer Ontology)
Annotations: MuchMore
[Figure: MuchMore XML annotation schema. Elements shown: document, sentence, text, token, chunks/chunk, umlsterms/umlsterm, xrceterms/xrceterm, ewnterms/ewnterm, semrels/semrel, gramrels/gramrel, cui, sense. Attributes shown: id, from, to, type, pos, lemma, cui, pref, tui, msh, offset, code, term1, term2.]
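A minimal Python sketch of how a stand-off layer such as umlsterms or chunks can be resolved against the token layer. It assumes that each layer element points into the tokens through `from`/`to` token ids; only the element and attribute names come from the schema figure, while the nesting, ids and values in the sample (including the placeholder CUI) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: names follow the MuchMore schema figure; nesting,
# ids and attribute values are invented for illustration only.
SAMPLE = """
<document>
  <sentence id="s1">
    <text>
      <token id="w1" pos="NNP" lemma="balint">Balint</token>
      <token id="w2" pos="NN" lemma="syndrom">syndrom</token>
      <token id="w3" pos="VBZ" lemma="be">is</token>
    </text>
    <chunks>
      <chunk id="c1" from="w1" to="w2" type="NP"/>
    </chunks>
    <umlsterms>
      <umlsterm id="t1" from="w1" to="w2">
        <cui cui="C0000000" pref="Balint syndrome" tui="T000"/>
      </umlsterm>
    </umlsterms>
  </sentence>
</document>
"""


def span_text(sentence, start_id, end_id):
    """Return the words of the tokens from start_id to end_id (inclusive)."""
    tokens = list(sentence.iter("token"))
    ids = [t.get("id") for t in tokens]
    i, j = ids.index(start_id), ids.index(end_id)
    return " ".join(t.text for t in tokens[i : j + 1])


def resolve_layer(sentence, layer, item):
    """Yield (element id, covered text) for each stand-off element in a layer."""
    for elem in sentence.findall(f"{layer}/{item}"):
        yield elem.get("id"), span_text(sentence, elem.get("from"), elem.get("to"))


root = ET.fromstring(SAMPLE)
for sent in root.iter("sentence"):
    print(list(resolve_layer(sent, "chunks", "chunk")))
    print(list(resolve_layer(sent, "umlsterms", "umlsterm")))
```

Because every layer only stores token references, further layers can be added without touching the text or the existing layers.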
Annotations: MuchMore (Linguistic)
Example sentence: "Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions."
[Figure: linguistic annotation of this sentence in the MuchMore format.]
Annotations: MuchMore (Semantic)
Example sentence: "Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions."
[Figure: semantic annotation of this sentence in the MuchMore format.]
Annotations: Mumis
[Figure: Mumis annotation levels: Document, Sentence, Paragraph; phrase and clause types PP, VG, NP, NE, AP, AdvP, Subord-Clause.]
[Figure: structure of an AP element: TYPE, STRUK, AP_AGR, STRING, AP_HEAD, W.]
[Figure: structure of a VG element: TYPE, VG_SUBCAT_STEM, STRING, KLAMMER, VG_STRG, SENT_STRING, VG_TYPE, VG_AGR, STRUK, VG_HEAD...; also shown: VG, W.]
[Figure: structure of W and CLAUSE elements. CLAUSE attributes: CLAUSE_TYPE, CLAUSE_SUBJ, CLAUSE_PRED_STRG, CLAUSE_PRED_SUBCAT, CLAUSE_PRED_AGR, CLAUSE_VG_LIST, CLAUSE_NP_LIST, CLAUSE_PP_LIST..., CLAUSE_PP_ADJUNKT. Other attributes shown: INFL, STRING, STEM, TYPE, SENT_STRING, POS, TC.]
Annotations Integration
Objectives
- Integrate Linguistic and Semantic Information from the MuchMore and Mumis Annotations, e.g.:
  - Enrich MuchMore: Head/Complement of Chunks, Clauses
  - Enrich Mumis: EuroWordNet, Medical Ontology
Approach
- MuchMore uses Multilayered Annotation over Indexes (‘standoff’)
- Introduce Mumis Annotations as Additional Layers
Problems
- Integration of Overlapping Layers (i.e. Additional Attributes)
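The integration approach can be sketched in Python, assuming that the MuchMore and Mumis annotations have already been aligned to the same token indices (an assumption: the two tool chains tokenize independently). The hypothetical `integrate` function copies attributes onto identical spans, which is where the "additional attributes" problem arises, keeps nested or disjoint spans as a new stand-off layer, and reports crossing spans as conflicts. The spans over the Mumis free-kick sentence are invented, including one deliberately misaligned span.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    start: int  # index of the first covered token (inclusive)
    end: int    # index of the last covered token (inclusive)
    attrs: dict = field(default_factory=dict)


def relation(a: Span, b: Span) -> str:
    """Classify how two token spans relate: same, disjoint, nested or crossing."""
    if (a.start, a.end) == (b.start, b.end):
        return "same"
    if a.end < b.start or b.end < a.start:
        return "disjoint"
    if a.start <= b.start and b.end <= a.end:
        return "nested"
    if b.start <= a.start and a.end <= b.end:
        return "nested"
    return "crossing"


def integrate(base: list, extra: list):
    """Fold an extra annotation layer into a base layer.

    Identical spans: extra attributes are copied onto the base span, but
    existing attribute values are kept. Spans crossing any base span are
    returned as conflicts; all others form a new stand-off layer.
    """
    new_layer, conflicts = [], []
    for e in extra:
        rels = [(relation(b, e), b) for b in base]
        same = [b for r, b in rels if r == "same"]
        if same:
            for key, value in e.attrs.items():
                same[0].attrs.setdefault(key, value)
        elif any(r == "crossing" for r, _ in rels):
            conflicts.append(e)
        else:
            new_layer.append(e)
    return new_layer, conflicts


# Tokens 0..11: Ein Freistoss von Christian Ziege aus 25 Metern geht über das Tor
chunks = [Span(0, 1, {"type": "NP"}), Span(2, 4, {"type": "PP"}),
          Span(5, 7, {"type": "PP"}), Span(9, 11, {"type": "PP"})]
mumis = [Span(0, 1, {"head": "Freistoss"}), Span(3, 4, {"type": "NE"}),
         Span(8, 8, {"type": "VG"}), Span(4, 5, {"type": "X"})]
layer, conflicts = integrate(chunks, mumis)
print(chunks[0].attrs, [(s.start, s.end) for s in layer], [(s.start, s.end) for s in conflicts])
```

Keeping existing values on identical spans is one possible policy; a real merge would also need to decide what to do when the two tools disagree on a value.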
Annotations: Mumis (Linguistic)
Example sentence: "Industrie, Handel und Dienstleistungen werden in der ersten Liste aufgeführt, wobei die in Klammern gesetzten Zahlen auf die Mutterfirmen hinweisen." (Industry, trade and services appear in the first list, where the numbers in brackets refer to the parent companies.)
[Figure: linguistic annotation of this sentence (excerpt).]
Annotations: Mumis (Semantic)
Example sentence: "Ein Freistoss von Christian Ziege aus 25 Metern geht über das Tor." (A free kick by Christian Ziege from 25 meters goes over the goal.)
[Figure: semantic annotation of this sentence.]
Conclusions
MuchMore and MUMIS
- Work in Progress
- Development of Compatibility between the Formats
- Full Integration of the Formats
Possible Future Work
- Integration of the Formats on a more Abstract Level, i.e. by Use of Data Categories as Specified by ISO/TC37/SC4
- Separating Text Data from Annotation
- Multiple Pointing to Annotations
- Extension to Multimedia
Esperonto: Overview
[Figure: Esperonto architecture. Components shown: Applications; Router; Agent; Portal Agent; XML, DAML+OIL, RDF(S); Certificate; Workbench; Maintenance; Multilinguality; Reengineering; Mapping; Ontology Repository; Service; Tagger/Wrapper; Web Server; Dynamic Information Provider; Static Information Provider; Multimedia Data Provider; Multilingual NL Understanding; Multilingual NL Generation; Visualization Service Provider; SemASP; World Wide Web; Semantic Web; Semantic indices, Concept instances.]
Ontologies (Classification)
[Figure: categorization of ontologies according to Lassila and McGuinness [Lassila and McGuinness, 2001].]
Ontologies (Classification)
[Figure: categorization of ontologies according to Van Heijst and colleagues [Van Heijst et al., 1997].]
Knowledge Architecture
Esperonto Knowledge Architecture
Abstracting over Linguistic Information in Esperonto
Ontology_1: NP
  Head: N
  Mod: {Adj*, PP?}
  Spec: {Det?, PossPron}
  Type: {RefNP, ProNP, DateNP, etc.}
Ontology_2: PP
  Head: Prep
  Type: {LocPP, DatePP, etc.}
  Comp: NP
Ontology_3: Grammatical Functions
  Subject, Object, Ind. Object, NP Adjunct, PP Adjunct, etc.
Ontology_4: Dependencies
  Head, Comp, Mod, Spec
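The NP and PP abstractions can be pictured as data types. This Python sketch, with hypothetical class and field names, encodes Ontology_1 (NP) and Ontology_2 (PP) and flattens a phrase into Ontology_4 dependency triples. It does not model Ontology_3 (grammatical functions) and is not the Esperonto representation itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PP:
    head: str                 # Head: Prep
    comp: "NP"                # Comp: NP
    pp_type: str = "LocPP"    # Type: {LocPP, DatePP, etc.}


@dataclass
class NP:
    head: str                                          # Head: N
    spec: Optional[str] = None                         # Spec: Det? or PossPron
    adj_mods: List[str] = field(default_factory=list)  # Mod: Adj* (zero or more)
    pp_mod: Optional[PP] = None                        # Mod: PP? (at most one)
    np_type: str = "RefNP"                             # Type: {RefNP, ProNP, DateNP, etc.}


def dependencies(noun_phrase: NP) -> List[Tuple[str, str, str]]:
    """Flatten an NP into (relation, head, dependent) triples, where relation
    is one of the Ontology_4 labels Spec, Mod or Comp."""
    deps = []
    if noun_phrase.spec:
        deps.append(("Spec", noun_phrase.head, noun_phrase.spec))
    for adj in noun_phrase.adj_mods:
        deps.append(("Mod", noun_phrase.head, adj))
    if noun_phrase.pp_mod is not None:
        pp = noun_phrase.pp_mod
        deps.append(("Mod", noun_phrase.head, pp.head))
        deps.append(("Comp", pp.head, pp.comp.head))
        deps.extend(dependencies(pp.comp))  # recurse into the PP's NP complement
    return deps


phrase = NP(head="box", spec="the", adj_mods=["red"],
            pp_mod=PP(head="on", comp=NP(head="table", spec="the")))
print(dependencies(phrase))
```

For "the red box on the table" this yields Spec, Mod and Comp links rooted in the head noun "box".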
From WordNet to EuroWordNet
[Figure: the hierarchies below "object" in WordNet 1.5 and in the Dutch Wordnet.
Dutch Wordnet: voorwerp {object}, lepel {spoon}, werktuig {tool}, tas {bag}, bak {box}, blok {block}, lichaam {body}.
WordNet 1.5: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; container; device; implement; tool; instrument; bag; spoon; box; block; body.]
Relations of EWN to Top-Level Ontologies
[Figure: three layers.
- Language-Specific Wordnets: Dutch Wordnet, doos {box}, voorwerp {object}
- Language-Neutral Ontology: object, box, container (WordNet 1.5)
- Top-level ontologies: Reference Ontology classes for BOX (ContainerProduct; SolidTangibleThing); EuroWordNet Top-Ontology features Form: Cubic, Function: Contain, Origin: Artifact, Composition: Whole]
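A toy Python model of this layering, assuming that language-specific synsets are connected through shared Inter-Lingual-Index (ILI) records which carry the top-level information. The record keys (`ili-box`, `ili-object`) and the dictionary layout are hypothetical; the feature values and reference classes are the ones shown on the slide.

```python
# Hypothetical ILI records; feature values and classes taken from the slide.
ILI = {
    "ili-box": {
        "top_ontology": {"Form": "Cubic", "Function": "Contain",
                         "Origin": "Artifact", "Composition": "Whole"},
        "reference_classes": ["ContainerProduct", "SolidTangibleThing"],
    },
    "ili-object": {"top_ontology": {}, "reference_classes": []},
}

# Language-specific synsets (language, lemma) -> the ILI record they map to.
EQUIVALENT_TO = {
    ("nl", "doos"): "ili-box",
    ("nl", "voorwerp"): "ili-object",
    ("en", "box"): "ili-box",
    ("en", "object"): "ili-object",
}


def top_features(lang: str, lemma: str) -> dict:
    """Top-Ontology features reached from a language-specific synset via the ILI."""
    return ILI[EQUIVALENT_TO[(lang, lemma)]]["top_ontology"]


def translations(lang: str, lemma: str, target: str) -> list:
    """Synsets in `target` that share an ILI record with (lang, lemma)."""
    record = EQUIVALENT_TO[(lang, lemma)]
    return sorted(other for (t, other), r in EQUIVALENT_TO.items()
                  if t == target and r == record)


print(top_features("nl", "doos"))
print(translations("nl", "doos", "en"))
```

The point of the design is that the Dutch "doos" never links directly to the top ontology: it inherits Form, Function, Origin and Composition through the language-neutral record it shares with English "box".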
FrameNet: Events in Syntactic Context
Areas covered: events; artifacts, built objects; natural kinds, parts and aggregates; institutions, belief systems, practices; space, time, location, motion; etc.
Take a commercial transaction as an example of an event. The following (partial) word list shows lexical realizations of the event:
- Verbs: pay, spend, cost, buy, sell, charge
- Nouns: cost, price, payment
- Adjectives: expensive, cheap
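As a sketch, the partial word list can be stored as a small lexicon whose entries all evoke the same event, so that different words and parts of speech lead back to one event description. The Buyer/Seller/Goods labels for verb subjects are illustrative assumptions, not entries copied from the FrameNet database.

```python
# Partial word list from the slide, grouped by part of speech.
COMMERCIAL_TRANSACTION = {
    "verb": ["pay", "spend", "cost", "buy", "sell", "charge"],
    "noun": ["cost", "price", "payment"],
    "adjective": ["expensive", "cheap"],
}

# Assumed, illustrative labels: which participant a verb's subject expresses.
SUBJECT_ROLE = {
    "buy": "Buyer", "pay": "Buyer", "spend": "Buyer",
    "sell": "Seller", "charge": "Seller",
    "cost": "Goods",
}


def realizations(word: str) -> list:
    """Parts of speech under which `word` lexicalizes the commercial transaction."""
    return [pos for pos, words in COMMERCIAL_TRANSACTION.items() if word in words]


def subject_role(verb: str) -> str:
    """Participant expressed by the verb's subject, or 'unknown'."""
    return SUBJECT_ROLE.get(verb, "unknown")


for w in ["cost", "sell", "cheap"]:
    print(w, realizations(w), subject_role(w))
```

"buy" and "sell" describe the same event from different perspectives: their subjects fill different participant roles.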
Semantic and Domain-Specific Information in the Simple/Parole Framework
[Figure: components shown: semantic frame, semantic relations, ontology.]
Combining Ontological and “Linguistic Ontology” (EWN, Parole/Simple)
[Figure: example with the German soccer terms Torschuss ("shot on goal") and abzieh(en) ("to let fly", i.e. to take a shot); URL: DFB home page/glossary]
Current Work
- Including FrameNet for 3 languages
- Including new semantic classes for adjectives, adverbs, polarity, etc.
- New, improved annotation schema for syntactic/semantic annotation
- A declarative set of mapping rules from the linguistic ontology to domain ontologies: the Onto-LT framework (see the paper by P. Buitelaar et al. at LREC)
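The idea of declarative mapping rules can be sketched in Python: rules are plain data, and a small interpreter applies them to linguistically annotated phrases to propose ontology classes. The rule format, the rule and action names and the example phrases are hypothetical; Onto-LT defines its own rule language, which this sketch does not reproduce.

```python
# Hypothetical declarative rules: each pairs a condition on an annotated
# phrase with the ontology operations to perform when it matches.
RULES = [
    {"name": "np_head_to_class_mod_to_subclass",
     "match": {"phrase": "NP"},
     "actions": ["head_to_class", "mod_to_subclass"]},
]


def class_name(*words: str) -> str:
    """CamelCase class name, e.g. ('free', 'kick') -> 'FreeKick'."""
    return "".join(w.capitalize() for w in words)


def apply_rules(phrases: list, rules: list):
    """Apply every matching rule to every phrase; return classes and subclass links."""
    classes, subclass_of = set(), set()
    for phrase in phrases:
        for rule in rules:
            if not all(phrase.get(k) == v for k, v in rule["match"].items()):
                continue  # condition not met: skip this rule
            head = class_name(phrase["head"])
            if "head_to_class" in rule["actions"]:
                classes.add(head)
            if "mod_to_subclass" in rule["actions"]:
                for mod in phrase.get("mods", []):
                    sub = class_name(mod, phrase["head"])
                    classes.add(sub)
                    subclass_of.add((sub, head))
    return classes, subclass_of


annotated = [
    {"phrase": "NP", "head": "kick", "mods": ["free"]},
    {"phrase": "NP", "head": "kick", "mods": ["corner"]},
    {"phrase": "PP", "head": "over"},
]
print(apply_rules(annotated, RULES))
```

Keeping the rules as data means a domain expert could add or change mappings without touching the interpreter.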