July 1-3, 2005 E-MELD 2005 Ontologies in Linguistic Annotation 1 The GOLD Effort So Far Terry Langendoen Brian Fitzsimons Emily Kidder Department of Linguistics.

Presentation transcript:

July 1-3, 2005, E-MELD 2005: Ontologies in Linguistic Annotation
Slide 1: The GOLD Effort So Far
Terry Langendoen, Brian Fitzsimons, Emily Kidder
Department of Linguistics, University of Arizona

Slide 2: Acknowledgments
Everyone else who’s worked on E-MELD at U Arizona, especially:
- Graduate students: Scott Farrar, Will Lewis, Peter Norquest, Ruby Basham
- Undergraduate students: Jesse Kirchner, Shauna Eggers, Alexis Lanham, Sandy Chow
Everyone who’s worked on E-MELD elsewhere, especially:
- Gary, Helen, Anthony, Laura, Zhenwei, Baden, Doug

Slide 3: Whalen’s problem
“We want to be able to describe the data in just the way we want, but we don’t want to program it.”
- Doug Whalen, at the 2001 E-MELD Workshop

Slide 4: Our problem
We want to be able to describe the data in just the way we want,
and we want to be able to use everybody else’s data described in just the way they want,
and we want to be able to process it in all kinds of ways that make sense to us as scientists and teachers.
Call this the interoperability problem.

Slide 5: TEI’s data interchange solution
Create a “data interchange” format such as the Text Encoding Initiative’s P3.
- Require projects that wish to share data to define mappings to and from the interchange format.
X --φ--> P3 --ψ⁻¹--> Y
Y --ψ--> P3 --φ⁻¹--> X
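A minimal Python sketch of the interchange idea on this slide. The two project formats and the record layout of the interchange form are hypothetical stand-ins, not TEI P3 itself. Each project supplies only a mapping to and from the interchange form, and any X-to-Y conversion is the composition of X's "to" mapping with Y's "from" mapping.

```python
# Hypothetical toy formats (assumptions for illustration only):
#   project X stores (form, gloss) tuples,
#   project Y stores "form/gloss" strings,
#   the interchange form is a list of {"form": ..., "gloss": ...} records.

def x_to_interchange(doc):          # plays the role of phi
    return [{"form": form, "gloss": gloss} for form, gloss in doc]

def x_from_interchange(records):    # plays the role of phi inverse
    return [(r["form"], r["gloss"]) for r in records]

def y_to_interchange(doc):          # plays the role of psi
    records = []
    for item in doc:
        form, gloss = item.split("/", 1)
        records.append({"form": form, "gloss": gloss})
    return records

def y_from_interchange(records):    # plays the role of psi inverse
    return [f"{r['form']}/{r['gloss']}" for r in records]

def convert(doc, to_interchange, from_interchange):
    """Convert between two project formats by way of the interchange form."""
    return from_interchange(to_interchange(doc))

X_DOC = [("perro", "dog"), ("gato", "cat")]
Y_DOC = convert(X_DOC, x_to_interchange, y_from_interchange)
```

With n projects, this design needs 2n mappings in total, where direct pairwise conversion would need n(n-1).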

Slide 6: Two lessons from the TEI
Use a standard markup language.
- Our choice (like theirs): XML.
Individual projects don’t have to use XML, but their software should export to XML.

Slide 7: XML markup is syntax
In TEI, the tags <s>, <w> and <m> were designed to delimit sentences, words and morphemes respectively.
- But they can be used to describe any three-level hierarchy over character strings, such as:
- <s> = sentence, <w> = word, <m> = morpheme
- <s> = paragraph, <w> = sentence, <m> = word
- <s> = chapter, <w> = paragraph, <m> = morpheme
- <s> = big chunk, <w> = middle-size chunk, <m> = small chunk
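A small Python sketch of the point that markup is only syntax. The tag names `s`, `w` and `m`, the sample document, and the two "readings" are assumptions chosen for illustration. The parser sees the same three-level tree either way; what the levels mean is supplied from outside the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-level document; the tag names are illustrative assumptions.
DOC = "<s><w><m>un</m><m>happy</m></w><w><m>dog</m><m>s</m></w></s>"

# Two external interpretations of the very same tags.
READINGS = {
    "linguistic": {"s": "sentence", "w": "word", "m": "morpheme"},
    "neutral": {"s": "big chunk", "w": "middle-size chunk", "m": "small chunk"},
}

def label_counts(xml_text, reading):
    """Count elements per level, named according to the chosen reading."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        label = reading[element.tag]
        counts[label] = counts.get(label, 0) + 1
    return counts

linguistic = label_counts(DOC, READINGS["linguistic"])
neutral = label_counts(DOC, READINGS["neutral"])
```

Both calls walk an identical tree and return the same counts under different names, which is all the XML itself can guarantee.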

Slide 8: Two avenues to markup semantics
The syntax is the semantics (SIS)
- This is essentially the TEI solution.
Leave the semantics to us (LSU)
- Essentially the “Semantic Web” idea

Slide 9: Problems with SIS
Hard sell. Based on the TEI experience, it’ll be hard to convince linguists to use it.
Expensive. It will be costly to retrofit existing resources to conform to it.
Fragile. Future changes will be likely to break existing applications.

Slide 10: Advantages of LSU
Easier sell. Can have lots of special-purpose markup schemas for different purposes, which will be easier to use.
Cheap. Migration to best practice is much less costly.
Robust. Changes are less likely to break existing applications.

Slide 11: Place of a linguistic ontology as part of LSU
The central component of LSU is a linguistic ontology that:
- defines the common concepts used in linguistic analysis and description,
- expresses the relations that hold among those concepts,
- relates those concepts to concepts of common-sense understanding (“upper” ontology) and concepts in other disciplines.

Slide 12: Proof of concept that it works
Last year, the Arizona team, together with Gary, Scott, and Will’s team at CSU Fresno, showed that GOLD could be used for smart searching across massive cross-linguistic databases created from XML documents of different types.
- Interlinear glossed texts
- Lexicons

Slide 13: The GOLD Summit
Last November, Will hosted a summit meeting of researchers most involved with GOLD to plan for its further development and maintenance after Arizona’s E-MELD funding ran out yesterday. It recommended:
- Creating a GOLD website.
- Forming a GOLD Council with oversight responsibility, and putting procedures in place using the OLAC model to foster and evaluate development and maintenance.
- Focusing the E-MELD 2005 workshop on GOLD.

Slide 14: Current state of play
We’re proposing to move GOLD “out of the lab” effective with this meeting despite the fact that:
- GOLD version 0.2 has very small coverage, even within morphosyntax, and many areas of the field are not covered at all.
- Several important design issues have not been settled.
- What upper ontology should we use? (Currently SUMO)
- Some “core GOLD” concepts are in flux.
- We broke last year’s applications with our redesign of the treatment of grammatical features.

Slide 15: Classes and instances in GOLD 0.1 (“Old GOLD”)
Reasoning with classes and instances
- If i is of type A and A is a subclass of B, then i is of type B.
- For example, a search for instances of Verb will find all instances of both TransitiveVerb and IntransitiveVerb.
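The inference rule on this slide can be sketched in a few lines of Python. The class names Verb, TransitiveVerb and IntransitiveVerb come from the slide; the Noun class and the three instance names are hypothetical, and the dictionaries are a toy stand-in for an ontology store, not GOLD's actual representation.

```python
# Direct subclass links (child -> parent).
SUBCLASS_OF = {
    "TransitiveVerb": "Verb",
    "IntransitiveVerb": "Verb",
}

# Hypothetical instances and their most specific class.
INSTANCE_TYPE = {
    "x_eat": "TransitiveVerb",
    "x_sleep": "IntransitiveVerb",
    "x_dog": "Noun",
}

def is_subclass(cls, ancestor):
    """True if cls equals ancestor or reaches it by following subclass links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def instances_of(cls):
    """If i is of type A and A is a subclass of B, then i is of type B."""
    return sorted(i for i, t in INSTANCE_TYPE.items() if is_subclass(t, cls))

verbs = instances_of("Verb")
```

A search for Verb therefore returns the instances of both subclasses, while a search for TransitiveVerb returns only its own.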

Slide 16: A problem with saying what we want about language X
In language X, verbs are inflected only for tense.
- Verb inflectedFor Tense?
  - This won’t do if both subject and object of the relation are classes.
  - Fails to represent the claim that tense is the only feature that verbs are inflected for in X.
- XVerb inflectedFor XTense?
  - OK, since XVerb and XTense are both instances (of the GOLD classes Verb and Tense respectively).
  - Lack of other inflectional features will show up in response to query.
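A hedged Python sketch of the instance-level statement on this slide, using a plain list of triples as a stand-in for a real store. The relation name `instanceOf` is an assumption; `inflectedFor`, XVerb and XTense come from the slide.

```python
# Statements about language X as (subject, predicate, object) triples.
TRIPLES = [
    ("XVerb", "instanceOf", "Verb"),
    ("XTense", "instanceOf", "Tense"),
    ("XVerb", "inflectedFor", "XTense"),
]

def objects(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

inflection = objects("XVerb", "inflectedFor")
```

The query returns XTense and nothing else, which is how the lack of other inflectional features shows up. Reading that absence as "no other features" assumes the description of X is complete.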

Slide 17: A problem with saying what we want in GOLD
XTense hasValue XFutureTense
- OK, since here hasValue relates instances.
Tense hasValue FutureTense
- Not OK, since here hasValue would relate classes.

Slide 18: Parallel structures for GOLD and language-specific concepts
Allow certain GOLD concepts to be instances of other GOLD classes.
- In particular, define atomic feature values as instances of particular feature classes.
Allow certain language-specific concepts to be classes that are instantiated by other language-specific concepts.
- In particular, define language-specific features as classes instantiated by their language-specific values.

Slide 19: Feature systems as substructures
          Any
        /  |  \
    NonP  HodP  PreHodP
TenseSystem-x as a substructure of TenseFeature

Slide 20: Mapping from a language class to a GOLD class
Any     <- XAny
NonP    <- XPres
HodP    <- XRecP
PreHodP <- XRemP
Mapping to GOLD TenseSystem-x from XTense

Slide 21: Isomorphism between a language system and a GOLD system
          XAny
        /   |   \
    XPres  XRecP  XRemP
XTense system isomorphic to TenseSystem-x
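The mapping and isomorphism claims on slides 19 to 21 can be checked mechanically. The Python sketch below assumes the tree shapes as they appear in the slide diagrams (a root with three children in each system) and encodes each tree as child-to-parent links. The function name and encoding are illustrative, not part of GOLD.

```python
# Child -> parent links, as read from the slide diagrams (an assumption).
GOLD_PARENT = {"NonP": "Any", "HodP": "Any", "PreHodP": "Any"}
X_PARENT = {"XPres": "XAny", "XRecP": "XAny", "XRemP": "XAny"}

# Mapping from language X's tense system to GOLD's TenseSystem-x.
MAPPING = {"XAny": "Any", "XPres": "NonP", "XRecP": "HodP", "XRemP": "PreHodP"}

def nodes(parent):
    """All nodes mentioned in a child -> parent table."""
    return set(parent) | set(parent.values())

def is_isomorphism(mapping, src_parent, dst_parent):
    """True if mapping is a bijection on nodes that preserves parent links."""
    targets = set(mapping.values())
    if set(mapping) != nodes(src_parent) or targets != nodes(dst_parent):
        return False
    if len(targets) != len(mapping):   # two source nodes share a target
        return False
    mapped_edges = {(mapping[c], mapping[p]) for c, p in src_parent.items()}
    return mapped_edges == set(dst_parent.items())

result = is_isomorphism(MAPPING, X_PARENT, GOLD_PARENT)
```

A mapping that swaps the root with one of its children is rejected, because the parent links no longer line up.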

Slide 22: Future of GOLD?