Nov. 17, 2004© Artem Chebotko
OntoELAN: An Ontology-Based Linguistic Multimedia Annotator
Speaker: Artem Chebotko, Department of Computer Science, Wayne State University
Coauthors
- Ms. Yu Deng, who graduated with an M.S. in Computer Science in 2004; Prof. Shiyong Lu, Computer Science, my advisor; Prof. Farshad Fotouhi, Computer Science, Chair of the department; Prof. Anthony Aristar, Dept. of English, Linguistics Program. All at Wayne State University.
- Hennie Brugman, Alexander Klassmann, Han Sloetjes, Albert Russel, Peter Wittenburg, Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
Acknowledgements: Laura Buszard-Welcher and Andrea Berez, Dept. of English, Linguistics Program, WSU.
The Outline of the Talk
- Background and Motivation
- The Limitations of Existing Tools
- Our Approach and Advantages
- An Overview of OntoELAN
- Demo
Background and Motivation: Linguistics
- Many languages are in serious danger of being lost. In fact, half of the world's approximately 6,500 languages may disappear in the next 100 years.
- Language data is critical to research in linguistics, anthropology, history, sociology, political science, and other fields.
- Language data is also important to the communities that speak those languages.
Background and Motivation: Multimedia
- Much language data is collected as audio and video recordings.
- Such recordings are difficult to index and retrieve because multimedia data are unstructured and their semantics are implicit in their content.
- Annotating multimedia data makes these semantics explicit.
Background and Motivation: Ontology-Based Annotation
- An ontology is an explicit specification of a shared conceptualization. It formalizes knowledge about the concepts in a particular domain and their relationships.
- Ontology-based annotation uses ontological terms whose meaning is known to and understood by the domain community.
Requirements for a Linguistic Multimedia Annotator
- Support for the annotation of descriptive metadata such as title, authors, date, and time
- Support for a time axis and temporal segmentation of clips into slots
- Support for multiple-tier annotation, with each tier providing one avenue for annotation
- Support for ontology-based annotation to avoid incompatible formats and vocabularies
The Limitations of Existing Tools
- Either they don't support ontologies: IBM MPEG-7 Annotation Tool, ELAN
- or they provide limited support for multimedia: Protégé, ImageSpace, IBM MPEG-7 Annotation Tool
Table columns: Tools, Descriptive annotation, Temporal segmentation, Multi-tier annotation, Ontology support
Protégé: Yes No Yes
IBM MPEG-7: Yes No
ImageSpace: Yes No Yes
ELAN: Yes No
Our Approach and Advantages
- We developed OntoELAN, an ontology-based annotation tool for linguistic multimedia data that satisfies all of the above requirements.
- The ontological approach eliminates multiple incompatible annotation formats, provided the whole community can agree on one domain ontology.
- Annotations are formally defined and machine-interpretable.
- Additional, implicit information can be deduced.
- Search is more precise and easier.
An Overview of OntoELAN
- Developed on top of the ELAN annotator (Max Planck Institute for Psycholinguistics team)
- Features inherited from ELAN: display of speech and/or video signals together with their annotations; time linking of annotations to media streams; linking of annotations to other annotations; an unlimited number of user-defined annotation tiers; support for different character sets; basic search facilities.
An Overview of OntoELAN
- Ontology support (Wayne State University team)
- New features: language profile creation; ontology-based annotation; storing annotations in an XML format based on the General Multimedia Ontology and domain ontologies.
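To illustrate what storing an ontology-based annotation as XML might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`AnnotationDocument`, `Tier`, `AlignableAnnotation`, `OntologyAnnotation`, `start`, `end`, `ontology`) are borrowed from the General Multimedia Ontology class names, but the layout is a hypothetical assumption, not OntoELAN's actual file format, and `http://example.org/gold` is a placeholder URI.

```python
import xml.etree.ElementTree as ET


def annotation_to_xml(tier_id, start_ms, end_ms, ontology_term, ontology_uri):
    """Serialize one time-aligned, ontology-valued annotation.

    Hypothetical layout: element/attribute names are illustrative only.
    """
    root = ET.Element("AnnotationDocument")
    tier = ET.SubElement(root, "Tier", {"id": tier_id})
    # An alignable annotation is anchored to the media time axis.
    ann = ET.SubElement(
        tier, "AlignableAnnotation", {"start": str(start_ms), "end": str(end_ms)}
    )
    # The value is an ontological term plus a reference to its ontology.
    value = ET.SubElement(ann, "OntologyAnnotation", {"ontology": ontology_uri})
    value.text = ontology_term
    return ET.tostring(root, encoding="unicode")


print(annotation_to_xml("words", 1200, 1850, "Noun", "http://example.org/gold"))
```

Keeping the ontology reference next to each value is one way to let a machine resolve the term's meaning without a separate lookup table.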
Linguistic Domain Ontology
- One example is the General Ontology for Linguistic Description (GOLD), developed at the University of Arizona.
- Expressions: OrthographicExpression, Utterance, SignedExpression, Word, WordPart
- Grammar: Tense, Number, Agreement, PartOfSpeech (PartOfSpeech: Noun, Verb, Participle, Preverb)
- Data structures: a lexical entry, a phoneme table, and a syntactic tree
- Metaconcepts: Language itself
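One advantage claimed for ontology-based annotation is deduction of implicit information, which makes search more precise. A minimal sketch, assuming the terms listed under PartOfSpeech are subclasses of it: a query for `PartOfSpeech` then also finds annotations labeled `Noun` or `Verb`. The `PARENT` table is a tiny hypothetical fragment, not an excerpt of the real GOLD ontology, and the sample annotations are made up.

```python
# Hypothetical "is-a" links (child -> parent) using terms from this slide.
PARENT = {
    "Noun": "PartOfSpeech",
    "Verb": "PartOfSpeech",
    "Participle": "PartOfSpeech",
    "Preverb": "PartOfSpeech",
}


def is_a(term, concept):
    """True if term equals concept or is a (transitive) subclass of it."""
    while term is not None:
        if term == concept:
            return True
        term = PARENT.get(term)  # walk up; None once we pass the root
    return False


def search(annotations, concept):
    """Return (text, term) pairs whose term is subsumed by concept."""
    return [(text, term) for text, term in annotations if is_a(term, concept)]


annotations = [("dog", "Noun"), ("ran", "Verb"), ("-ed", "Tense")]
print(search(annotations, "PartOfSpeech"))
```

A plain string match on "PartOfSpeech" would find nothing here; the subclass walk is what supplies the implicit information.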
General Multimedia Ontology
- A simple semantic framework for multimedia annotation
- Developed at Wayne State University specifically for OntoELAN
- Classes include AnnotationDocument, Tier, TimeSlot, Annotation, AlignableAnnotation, ReferringAnnotation, AnnotationValue, StringAnnotation, OntologyAnnotation, etc.
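The class names above suggest a simple object model. The following is a sketch under stated assumptions, written as Python dataclasses: we assume an alignable annotation is anchored to two time slots, a referring annotation points to a parent annotation, and an annotation value is either free text (StringAnnotation) or an ontological term (OntologyAnnotation). These relationships and all field names are our guesses for illustration, not the ontology's actual property definitions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TimeSlot:
    time_ms: int  # position on the media time axis (assumed milliseconds)


@dataclass
class StringAnnotation:
    text: str  # free-text value


@dataclass
class OntologyAnnotation:
    term: str          # an ontological term, e.g. from GOLD
    ontology_uri: str  # which ontology defines the term


AnnotationValue = Union[StringAnnotation, OntologyAnnotation]


@dataclass
class Annotation:
    value: AnnotationValue


@dataclass
class AlignableAnnotation(Annotation):
    start: TimeSlot  # assumed: anchored directly to the time axis
    end: TimeSlot


@dataclass
class ReferringAnnotation(Annotation):
    parent: Annotation  # assumed: inherits its extent from another annotation


@dataclass
class Tier:
    tier_id: str
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class AnnotationDocument:
    title: str
    time_slots: List[TimeSlot] = field(default_factory=list)
    tiers: List[Tier] = field(default_factory=list)


t1, t2 = TimeSlot(1200), TimeSlot(1850)
word = AlignableAnnotation(OntologyAnnotation("Noun", "http://example.org/gold"), t1, t2)
gloss = ReferringAnnotation(StringAnnotation("dog"), word)
doc = AnnotationDocument("Sample recording", [t1, t2],
                         [Tier("words", [word]), Tier("gloss", [gloss])])
```

Because ReferringAnnotation stores only a parent link, its time extent is always derived rather than duplicated.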
Language Profile
A language profile is a subset of ontological terms, possibly renamed, that is used to annotate a particular multimedia resource. It consists of:
- ontological terms
- user-defined terms
- a mapping between ontological terms and user-defined terms
- a reference to an ontology
Language Profile
Advantages:
- Only a subset of ontological terms is useful for annotating a particular resource.
- Ontological terms can be renamed, e.g. translated into another language, abbreviated, or replaced with a synonym.
- The meanings of two or more ontological terms can be combined in one user-defined term.
Disadvantage:
- More work, since the profile must be created before annotation begins.
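A language profile, as described above, can be sketched as a mapping from user-defined terms to sets of ontological terms plus a reference to the ontology. This is a hypothetical structure for illustration; the class and method names are ours, `http://example.org/gold` is a placeholder, and `ThirdPerson` and `Singular` are assumed term names used only to show two terms combined under one label.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class LanguageProfile:
    ontology_uri: str  # reference to the ontology the terms come from
    # user-defined term -> the ontological term(s) it stands for
    mapping: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def define(self, user_term: str, *ontology_terms: str) -> None:
        """Rename one term, or combine several terms under one label."""
        if not ontology_terms:
            raise ValueError("a user-defined term needs at least one ontological term")
        self.mapping[user_term] = frozenset(ontology_terms)

    def resolve(self, user_term: str) -> FrozenSet[str]:
        """Translate a user-defined term back to ontological terms."""
        return self.mapping[user_term]

    def ontological_terms(self) -> Set[str]:
        """The subset of the ontology actually used by this profile."""
        used: Set[str] = set()
        for terms in self.mapping.values():
            used |= terms
        return used


profile = LanguageProfile("http://example.org/gold")
profile.define("N", "Noun")             # abbreviation
profile.define("Substantiv", "Noun")    # rename into another language (German)
profile.define("3SG", "ThirdPerson", "Singular")  # combine two terms
```

Because annotations store user-defined terms that resolve to ontological ones, different projects can use their own labels while still sharing one ontology.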
Annotation Tiers and Linguistic Types
- Annotation tiers contain annotation values, can be either alignable or referring, and are each associated with a linguistic type.
- Linguistic types: None, Time Subdivision, Symbolic Subdivision, Symbolic Association, Ontological tier
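Each linguistic type constrains how a child tier's annotations relate to their parent. A minimal sketch of two such checks, assuming ELAN's usual meaning of these types as we understand it: under Time Subdivision, children must tile the parent's time interval with no gaps or overlaps; under Symbolic Association, each parent has at most one child. The function names and the `(start_ms, end_ms)` interval representation are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def valid_time_subdivision(parent: Interval, children: List[Interval]) -> bool:
    """Children must exactly tile the parent interval, with no gaps or overlaps."""
    if not children:
        return True  # a parent that has not been subdivided yet is accepted
    ordered = sorted(children)
    if ordered[0][0] != parent[0] or ordered[-1][1] != parent[1]:
        return False  # must start and end where the parent does
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end != next_start:
            return False  # gap or overlap between consecutive children
    return all(start < end for start, end in ordered)  # no empty segments


def valid_symbolic_association(children_per_parent: List[int]) -> bool:
    """One-to-one: each parent annotation has at most one associated child."""
    return all(count <= 1 for count in children_per_parent)


print(valid_time_subdivision((0, 1000), [(0, 400), (400, 1000)]))
```

Checks like these would run when a user adds an annotation to a dependent tier, rejecting edits that break the tier's type.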
Linguistic Multimedia Annotation with OntoELAN
- Language profile creation
- Creation of tiers
- Creation of annotations
Demos
- Language profile creation: profile01.swf, profile01.AVI; profile02.swf, profile02.AVI
- Creation of tiers and creation of annotations: annotate01.swf, annotate01.AVI; annotate02.swf, annotate02.AVI
Conclusions and Future Work
- OntoELAN is the first attempt at annotating linguistic multimedia data with a linguistic ontology.
- Future work: provide more channels for sharing data on the Web, such as multimedia descriptions and language words; improve the current search system; integrate text document annotation.
References
- Artem Chebotko, Yu Deng, Shiyong Lu and Farshad Fotouhi. An Ontology-based Multimedia Annotator for the Semantic Web of Language Engineering. International Journal on Semantic Web and Information Systems, January,
- Artem Chebotko et al. OntoELAN: An Ontology-based Linguistic Multimedia Annotator. Proc. of the IEEE Sixth International Symposium on Multimedia Software Engineering (IEEE-MSE'2004), Miami, FL, USA, December 2004.
References (continued)
- OntoELAN
- LangDL: A Digital Library for Language Engineering and Research
- ELAN
- E-MELD
- GOLD
- General Multimedia Ontology
Questions?
Contact information: Artem Chebotko