Published by Jeffrey Chandler. Modified over 9 years ago.
1
July 1-3, 2005 E-MELD 2005 Ontologies in Linguistic Annotation
The GOLD Effort So Far
Terry Langendoen, Brian Fitzsimons, Emily Kidder
Department of Linguistics, University of Arizona
2
Acknowledgments
Everyone else who’s worked on E-MELD at U Arizona 2001-05, especially:
  Graduate students: Scott Farrar, Will Lewis, Peter Norquest, Ruby Basham
  Undergraduate students: Jesse Kirchner, Shauna Eggers, Alexis Lanham, Sandy Chow
Everyone who’s worked on E-MELD elsewhere, especially:
  Gary, Helen, Anthony, Laura, Zhenwei, Baden, Doug
3
Whalen’s problem
“We want to be able to describe the data in just the way we want, but we don’t want to program it.”
Doug Whalen, at the 2001 E-MELD Workshop
4
Our problem
We want to be able to describe the data in just the way we want,
and we want to be able to use everybody else’s data described in just the way they want,
and we want to be able to process it in all kinds of ways that make sense to us as scientists and teachers.
Call this the interoperability problem.
5
TEI’s data interchange solution
Create a “data interchange” format such as the Text Encoding Initiative’s P3.
Require projects that wish to share data to define mappings to and from the interchange format.
  $X \xrightarrow{\ \varphi\ } \mathrm{P3} \xrightarrow{\ \psi^{-1}\ } Y$
  $Y \xrightarrow{\ \psi\ } \mathrm{P3} \xrightarrow{\ \varphi^{-1}\ } X$
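The two-step mapping through an interchange format can be sketched in Python. This is only an illustration under stated assumptions: the record layouts, the field names, the sample word, and the functions `phi`, `psi` and their inverses are all hypothetical, and none of them comes from TEI P3 or from E-MELD software.

```python
# Hypothetical sketch: two projects exchange data through a shared format.
# Each project defines only a mapping to and from the interchange format;
# neither project needs to know the other's internal layout.

def phi(x_record):
    """Project X -> interchange (field names are invented)."""
    return {"form": x_record["wordform"], "gloss": x_record["meaning"]}

def phi_inverse(p_record):
    """Interchange -> project X."""
    return {"wordform": p_record["form"], "meaning": p_record["gloss"]}

def psi(y_record):
    """Project Y -> interchange; Y is assumed to store (form, gloss) tuples."""
    form, gloss = y_record
    return {"form": form, "gloss": gloss}

def psi_inverse(p_record):
    """Interchange -> project Y."""
    return (p_record["form"], p_record["gloss"])

def x_to_y(x_record):
    # X -> P3 -> Y, i.e. psi_inverse applied after phi
    return psi_inverse(phi(x_record))

def y_to_x(y_record):
    # Y -> P3 -> X, i.e. phi_inverse applied after psi
    return phi_inverse(psi(y_record))

x_item = {"wordform": "casa", "meaning": "house"}  # invented sample record
y_item = x_to_y(x_item)
round_trip = y_to_x(y_item)
```

With n projects, this design needs 2n mappings (one pair per project) rather than a separate converter for each of the n(n-1) ordered pairs of projects.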
6
Two lessons from the TEI
  Use a standard markup language. Our choice (like theirs): XML.
  Individual projects don’t have to use XML, but their software should export to XML.
7
XML markup is syntax
In TEI, the tags <s>, <w> and <m> were designed to delimit sentences, words and morphemes respectively.
But they can be used to describe any three-level hierarchy over character strings, such as:
  <s> = sentence, <w> = word, <m> = morpheme
  <s> = paragraph, <w> = sentence, <m> = word
  <s> = chapter, <w> = paragraph, <m> = morpheme
  <s> = big chunk, <w> = middle-size chunk, <m> = small chunk
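A minimal Python sketch of the same point, assuming three nested element names written here as s, w and m. The sample document and the `describe` helper are invented for illustration. The parser accepts the markup identically under either reading, because the labels live outside the XML.

```python
import xml.etree.ElementTree as ET

# One document, three nesting levels. Nothing in the markup itself says
# what the levels mean. (The sample content is invented.)
doc = ET.fromstring("<s><w><m>dog</m><m>s</m></w><w><m>bark</m></w></s>")

# Two of the possible interpretations of the same three tags.
reading_a = {"s": "sentence", "w": "word", "m": "morpheme"}
reading_b = {"s": "big chunk", "w": "middle-size chunk", "m": "small chunk"}

def describe(element, labels):
    """Return (label, children-or-text), naming each tag via `labels`."""
    children = [describe(child, labels) for child in element]
    return (labels[element.tag], children if children else element.text)

as_linguistics = describe(doc, reading_a)
as_chunks = describe(doc, reading_b)
```

Both calls walk exactly the same tree. Only the dictionary passed in decides whether the outermost element counts as a sentence or as a "big chunk".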
8
Two avenues to markup semantics
  The syntax is the semantics (SIS): this is essentially the TEI solution.
  Leave the semantics to us (LSU): essentially the “Semantic Web” idea.
9
Problems with SIS
  Hard sell. Based on the TEI experience, it’ll be hard to convince linguists to use it.
  Expensive. It will be costly to retrofit existing resources to conform to it.
  Fragile. Future changes will be likely to break existing applications.
10
Advantages of LSU
  Easier sell. Can have many special-purpose markup schemas, each of which will be easier to use.
  Cheap. Migration to best practice is much less costly.
  Robust. Changes are less likely to break existing applications.
11
Place of a linguistic ontology as part of LSU
The central component of LSU is a linguistic ontology that:
  defines the common concepts used in linguistic analysis and description,
  expresses the relations that hold among those concepts,
  relates those concepts to concepts of common-sense understanding (“upper” ontology) and concepts in other disciplines.
12
Proof of concept that it works
Last year, the Arizona team, together with Gary, Scott, and Will’s team at CSU Fresno, showed that GOLD could be used for smart searching across massive cross-linguistic databases created from XML documents of different types:
  Interlinear glossed texts
  Lexicons
13
The GOLD Summit
Last November, Will hosted a summit meeting of researchers most involved with GOLD to plan for its further development and maintenance after Arizona’s E-MELD funding ran out yesterday. It recommended:
  Creating a GOLD website.
  Forming a GOLD Council with oversight responsibility, and putting procedures in place using the OLAC model to foster and evaluate development and maintenance.
  Focusing the E-MELD 2005 workshop on GOLD.
14
Current state of play
We’re proposing to move GOLD “out of the lab” effective with this meeting despite the fact that:
  GOLD version 0.2 has very small coverage, even within morphosyntax, and many areas of the field are not covered at all.
  Several important design issues have not been settled.
    What upper ontology should we use? (Currently SUMO)
  Some “core GOLD” concepts are in flux.
    We broke last year’s applications with our redesign of the treatment of grammatical features.
15
Classes and instances in GOLD 0.1 (“Old GOLD”)
Reasoning with classes and instances
  If i is of type A and A is a subclass of B, then i is of type B.
  For example, a search for instances of Verb will find all instances of both TransitiveVerb and IntransitiveVerb.
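The inference rule can be sketched in a few lines of Python. The class names Verb, TransitiveVerb and IntransitiveVerb come from the slide. The Noun class, the sample instances, and the dictionary representation are assumptions made for illustration and are not GOLD's actual encoding.

```python
# Subclass links (child -> parent). Verb and Noun have no parent here.
SUBCLASS_OF = {
    "TransitiveVerb": "Verb",
    "IntransitiveVerb": "Verb",
}

# Instance -> asserted class. These instances are invented examples.
INSTANCE_OF = {
    "hit": "TransitiveVerb",
    "sleep": "IntransitiveVerb",
    "dog": "Noun",
}

def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def instances_of(ancestor):
    """All instances whose asserted class is ancestor or one of its subclasses."""
    return sorted(i for i, c in INSTANCE_OF.items() if is_a(c, ancestor))

verbs = instances_of("Verb")
transitives = instances_of("TransitiveVerb")
```

A search for Verb returns instances that were only ever asserted as TransitiveVerb or IntransitiveVerb, which is the behavior the slide describes.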
16
A problem with saying what we want about language X
In language X, verbs are inflected only for tense.
Verb inflectedFor Tense?
  This won’t do if both subject and object of the relation are classes.
  Fails to represent the claim that tense is the only feature that verbs are inflected for in X.
XVerb inflectedFor XTense?
  OK, since XVerb and XTense are both instances (of the GOLD classes Verb and Tense respectively).
  Lack of other inflectional features will show up in response to a query.
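A small sketch of the instance-level statement, assuming a hypothetical store of (subject, relation, object) triples. The relation name instanceOf and the `objects` helper are invented here. The sketch also assumes that a relation absent from the store is read as not holding, which is what lets the query reveal the lack of other inflectional features.

```python
# Hypothetical triple store for language X.
triples = {
    ("XVerb", "instanceOf", "Verb"),
    ("XTense", "instanceOf", "Tense"),
    ("XVerb", "inflectedFor", "XTense"),
}

def objects(subject, relation):
    """All objects related to `subject` by `relation`, in sorted order."""
    return sorted(o for s, r, o in triples if s == subject and r == relation)

# Everything that verbs of language X are recorded as inflecting for.
inflection = objects("XVerb", "inflectedFor")
```

Because the statement is about XVerb rather than the class Verb, it says nothing about verbs in any other language.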
17
A problem with saying what we want in GOLD
XTense hasValue XFutureTense
  OK, since here hasValue relates instances.
Tense hasValue FutureTense
  Not OK, since here hasValue would relate classes.
18
Parallel structures for GOLD and language-specific concepts
Allow certain GOLD concepts to be instances of other GOLD classes.
  In particular, define atomic feature values as instances of particular feature classes.
Allow certain language-specific concepts to be classes that are instantiated by other language-specific concepts.
  In particular, define language-specific features as classes instantiated by their language-specific values.
19
Feature systems as substructures
         Any
       /  |  \
   NonP HodP PreHodP
TenseSystem-x as a substructure of TenseFeature
20
Mapping from a language class to a GOLD class
  +-------------+
  | Any     <---+---- XAny
  | NonP    <---+---- XPres
  | HodP    <---+---- XRecP
  | PreHodP <---+---- XRemP
  +-------------+
Mapping to GOLD TenseSystem-x from XTense
21
Isomorphism between a language system and a GOLD system
          XAny
        /   |   \
   XPres  XRecP  XRemP
XTense system isomorphic to TenseSystem-x
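The isomorphism claim can be checked mechanically. The sketch below uses the node names and the mapping shown on the slides (XAny to Any, XPres to NonP, XRecP to HodP, XRemP to PreHodP). Representing each system as a set of parent-child edges, and the helper `is_isomorphism`, are assumptions made for illustration.

```python
# Each system as a set of (parent, child) edges, following the slide diagrams.
GOLD_EDGES = {("Any", "NonP"), ("Any", "HodP"), ("Any", "PreHodP")}
X_EDGES = {("XAny", "XPres"), ("XAny", "XRecP"), ("XAny", "XRemP")}

# Mapping from the language-specific system to the GOLD system.
MAPPING = {"XAny": "Any", "XPres": "NonP", "XRecP": "HodP", "XRemP": "PreHodP"}

def nodes(edges):
    """All node names that occur in a set of edges."""
    return {n for edge in edges for n in edge}

def is_isomorphism(mapping, source_edges, target_edges):
    """True if mapping is a bijection on nodes that maps edges onto edges."""
    one_to_one = len(set(mapping.values())) == len(mapping)
    covers = (set(mapping) == nodes(source_edges)
              and set(mapping.values()) == nodes(target_edges))
    if not (one_to_one and covers):
        return False
    image = {(mapping[a], mapping[b]) for a, b in source_edges}
    return image == target_edges

result = is_isomorphism(MAPPING, X_EDGES, GOLD_EDGES)

# A mapping that collapses two language-specific values fails the check.
collapsed = dict(MAPPING, XRemP="HodP")
collapsed_result = is_isomorphism(collapsed, X_EDGES, GOLD_EDGES)
```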
22
Future of GOLD?