Implementing the TEI Feature System Declaration Gary F. Simons SIL International ___________________________ TEI Members Meeting 11 Oct 2002, Chicago.

Slides:



Advertisements
Similar presentations
Proposal for change to VOTable schema Without changing set of valid documents (modulo undesired ones)
Advertisements

W3C SML F2F XML Schema 1.1 Sandy Gao, IBM.
The eXtensible Markup Language (XML) An Applied Tutorial Kevin Thomas.
What is XML? a meta language that allows you to create and format your own document markups a method for putting structured data into a text file; these.
XML: text format Dr Andy Evans. Text-based data formats As data space has become cheaper, people have moved away from binary data formats. Text easier.
ISO DSDL ISO – Document Schema Definition Languages (DSDL) Martin Bryan Convenor, JTC1/SC18 WG1.
XSLT (eXtensible Stylesheet Language Transformation) 1.
 Fundamentals of Web Design.  Describe the history and theory of XHTML  Understand the rules for creating valid XHTML documents  Apply a DTD to an.
Extensible Markup Language Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.
Features & Unification Ling 571 Deep Processing Techniques for NLP January 26, 2011.
Features & Unification Ling 571 Deep Processing Techniques for NLP January 31, 2011.
Uncovering the TEI and ODD A pedagogical strip-tease Laurent Romary - Max Planck Digital Library.
EAGLES/ISLE Workshop LREC 2000 Athens, Greece The XML Framework Its Implications for Corpus Access and Use Nancy Ide Department of Computer Science Vassar.
XML Introduction What is XML –XML is the eXtensible Markup Language –Became a W3C Recommendation in 1998 –Tag-based syntax, like HTML –You get to make.
XML: What, Why, When & How? Hope Greenberg Center for Teaching & Learning June 11 & 18.
Jennifer Widom XML Data XML Schema. Jennifer Widom XML Schema “Valid” XML Adheres to basic structural requirements  Also adheres to content-specific.
1 Advanced Topics XML and Databases. 2 XML u Overview u Structure of XML Data –XML Document Type Definition DTD –Namespaces –XML Schema u Query and Transformation.
XML Anisha K J Jerrin Thomas. Outline  Introduction  Structure of an XML Page  Well-formed & Valid XML Documents  DTD – Elements, Attributes, Entities.
An Introduction to XML Patrice Bonhomme & Laurent Romary Lucid-ITLORIA eXtensible Markup Language version 1.0 Recommendation,
Lecture 6 of Advanced Databases XML Schema, Querying & Transformation Instructor: Mr.Ahmed Al Astal.
CREATED BY ChanoknanChinnanon PanissaraUsanachote
XML CPSC 315 – Programming Studio Fall 2008 Project 3, Lecture 1.
XP 1 CREATING AN XML DOCUMENT. XP 2 INTRODUCING XML XML stands for Extensible Markup Language. A markup language specifies the structure and content of.
An Introduction to XML Presented by Scott Nemec at the UniForum Chicago meeting on 7/25/2006.
Using the TEI framework as a possible serialization for LMF Laurent Romary INRIA & HUB-IDSL
Experiments with ODD outside the TEI framework Laurent Romary & Piotr Banski The ISO-TEI connection.
1 XHTML محمد احمدی نیا 2 Of 19 HTML vs XHTML  XHTML is a stricter and cleaner version of HTML.  by combining the strengths of HTML.
XHTML,XML M.Abdullah Mrian. What is the XHTML Why XHTML ?
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
 XML is designed to describe data and to focus on what data is. HTML is designed to display data and to focus on how data looks.  XML is created to structure,
November 1, 2006IU DLP Brown Bag : Fall Data Integrity and Document- centric XML Using Schematron for Managing Text Collections Dazhi Jiao, Tamara.
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
Processing of structured documents Spring 2002, Part 2 Helena Ahonen-Myka.
XP Tutorial 9 1 Working with XHTML. XP SGML 2 Standard Generalized Markup Language (SGML) A standard for specifying markup languages. Large, complex standard.
XML and Validation Tools Schema Schematron. XML eXtensible Markup Language (XML) –A metamarkup language. –The basic unit is called an element –Fairly.
17 Apr 2002 XML Syntax: Documents Andy Clark. Basic Document Structure Element tags – Elements have associated attributes Text content Miscellaneous –
Sheet 1XML Technology in E-Commerce 2001Lecture 2 XML Technology in E-Commerce Lecture 2 Logical and Physical Structure, Validity, DTD, XML Schema.
XML for Text Markup An introduction to XML markup.
1 XML eXtensible Markup Language. 2 XML vs. HTML HTML is a HyperText Markup language HTML is a HyperText Markup language Designed for a specific application,
Schematron Tim Bornholtz. Schema languages Many people turn to schema languages when they want to be sure that an XML instance follows certain rules –DTD.
Jennifer Widom XML Data Introduction, Well-formed XML.
The Semistructured-Data Model Programming Languages for XML Spring 2011 Instructor: Hassan Khosravi.
MEDIN Standards Workshop Standards / XML / Validation / Transformation / ESRI / Search.
MEDIN Standards Workshop Standards / XML / Validation / Transformation / ESRI / Search.
ENG 5933 Humanities Computing Introduction to XML
Towards a roadmap for standardization in language technology Laurent Romary & Nancy Ide Loria-INRIA — Vassar College.
XP Tutorial 9New Perspectives on HTML and XHTML, Comprehensive 1 Working with XHTML Creating a Well-Formed Valid Document Tutorial 9.
CIS 228 The Internet 9/20/11 XHTML 1.0. “Quirks” Mode Today, all browsers support standards Compliant pages are displayed similarly There are multiple.
Jackson, Web Technologies: A Computer Science Perspective, © 2007 Prentice-Hall, Inc. All rights reserved Chapter 7 Representing Web Data:
1 XML eXtensible Markup Language. 2 Introduction and Motivation Dr. Praveen Madiraju Modified from Dr.Sagiv’s slides.
1 XSLT XSLT (extensible stylesheet language – transforms ) is another language to process XML documents. Originally intended as a presentation language:
Extensible Markup Language (XML) Pat Morin COMP 2405.
XML BASICS and more…. What is XML? In common:  XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed.
XML: Extensible Markup Language
Unit 4 Representing Web Data: XML
Tutorial 9 Working with XHTML
XML Examples AIXM 5 RC2.
Java XML IS
Eugenia Fernandez IUPUI
A year in the life of the council
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Chapter 7 Representing Web Data: XML
Tutorial 9 Working with XHTML
XML Data Introduction, Well-formed XML.
XML Data DTDs, IDs & IDREFs.
Presented by: Jacky Ma Date: 11 Dec 2001
Introduction to HTML5.
Review of XML IST 421 Spring 2004 Lecture 5.
Unit 6 - XML Transformations
Presentation transcript:

Implementing the TEI Feature System Declaration Gary F. Simons SIL International ___________________________ TEI Members Meeting 11 Oct 2002, Chicago

2 What is a feature structure? A device for the linguistic analysis of text A recursive bundle of feature-value pairs ┍ ┑ │ category = noun │ │ wordForm = Kind │ │ proper = - │ │ ┍ ┑ │ │ agreement = │gender = neut│ │ │ │number = sg │ │ │ │case = nom │ │ │ ┕ ┙ │ ┕ ┙

3 In TEI markup Kind

4 What is an FSD? An auxiliary document type used in conjunction with markup to: Document the allowed features Document their allowed values Specify default values for underspecified features Specify constraints on feature co-occurrence In short: “It’s an XML schema language for markup.”

5 An implementation strategy Use XSLT scripts to generate XSLT scripts — inspired by Schematron Compilation phase (applied to FSD) 1.Script-1 generates script-3 to add defaults 2.Script-2 generates script-4 to test validity Execution phase (applied to document) 3.Script-3 adds default feature values 4.Script-4 generates an HTML report of violations

6 The tricky bit: subsumption Default specifications and co-occurrence constraints are based on subsumption — a subsumption test translates to an XPath E.g., an English pronoun has gender if and only if it is third person and singular The current has gender: test="current()[ ' gender ' ] ]" The current is third person singular: test="current()[ ] [ ] "

7 Errors reported by validator The feature structure type Type is not defined in the FSD. A feature has no name. The feature structure violates a constraint. The feature named Name is not defined for the current fs type. The value of the feature named Name is not in the value range defined for it in the FSD. The feature named Name is not allowed to have more than one value.

8 Sample error report In /TEI.2/text/body/div[2]/fsLib/fs[3]:  The feature structure violates a constraint.  pronoun [ pron-type: personal pers: 3rd number: pl gender: feminine ]  If the feature structure has: [ gender: any ], i t must also have: [ pers: 3rd; number: sg ].

9 Toward an ISO standard? It has been proposed that TEI feature structure markup be put forward to the new ISO TC37/SC4 as a proposed standard TC 37 — “Terminology and other language resources” SC 4 — “Language resources” Chair: Laurent Romary

10 Some issues Current DTDs for and FSD are intertwined with the TEI DTD: An ISO standard would need to stand on its own. Current scheme has bells and whistles that have never been implemented: An ISO standard should be simplified and be backed by a working implementation.

11 Making it stand on its own Drop TEI extension mechanisms in favor of fixed names and content models. In the DTD for the FSD: Drop dependency on TEI header in favor of a header with a content model of ANY. Drop dependency on TEI %paraContent in favor of a documentation element with a content model of ANY.

12 Making it simpler Drop most global attributes. Drop ; is adequate. Drop value types motivated by general data representation (e.g.,, ) Rethink special values in light of implemen- tation (e.g.,,, ) Rethink relation attribute in light of implementation (e.g. eq, ne, sb, ns)