Published by Eleanore Henry. Modified over 9 years ago.
2
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C
3
TEI History
The developing organizations first met in 1987:
–Association for Computers and the Humanities (ACH)
–Association for Computational Linguistics (ACL)
–Association for Literary and Linguistic Computing (ALLC)
1990: first version, TEI P1
1992: TEI P2
1994: TEI P3
4
TEI History Continued
Principles for the development of TEI
–Standard format for data interchange in humanities research
–Guidelines for encoding texts in the same format
–Define a recommended syntax
–Define a metalanguage for the description of text-encoding schemes
Future Developments
–Linguistic description and grammatical annotation
–Historical analysis and interpretation
–Base tag sets for further document types
–Manuscript analysis and physical description of text
5
General Introduction to SGML and XML
6
The Evolution of SGML and XML
1960s: IBM develops the Generalized Markup Language (GML)
1970s and 1980s: ANSI initiates a project to develop a standard text-description language based on GML
1983: SGML becomes an industry standard
1986: ISO ratifies SGML as an international standard
1990s: Tim Berners-Lee develops HTML, a simple formatting markup language for the World Wide Web
Mid-1990s: the W3C develops XML to combine the flexibility of SGML with the simplicity of HTML
7
Benefits of SGML and XML
SGML is a toolkit for developing specialized markup languages
–Specifies the structure of information
–Enables interoperability between multiple platforms
–Acts like a database
–All-encompassing
The DTD acts as a blueprint for document structure
XML provides a manageable framework in which you can define your own elements
8
XML Syntax
Information content must have start and end tags
–Case is significant
–Elements may not overlap
–Elements can nest one inside another
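The syntax rules on this slide can be checked with any XML parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are illustrative and do not come from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Illustrative samples, one per rule on the slide.
samples = {
    "nested": "<p>Elements can <hi>nest</hi> inside one another</p>",
    "overlapping": "<p>Elements may <hi>not</p> overlap</hi>",
    "case mismatch": "<Title>Case is significant</title>",
    "missing end tag": "<p>Every start tag needs an end tag",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the nested sample is accepted; the parser rejects the other three as not well-formed.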
9
The XML Environment
–XML editor
–XML parser/validator
–Display program
–DTD or schema to define elements
–Style sheet for display of elements
10
The XML Document
Document prologue
–XML declaration
–Document type declaration: points to the root element and to external standards (DTDs, namespaces)
Document itself
–Bracketed by the root element
–Contains elements, attributes, entities
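The parts of an XML document listed on this slide can be seen in a small instance. This is a sketch only, assuming Python's standard `xml.dom.minidom`; the sample document, the `article` and `title` element names, and the `course` entity are invented for illustration.

```python
from xml.dom import minidom

# Prologue (XML declaration + document type declaration) followed by
# the document itself, bracketed by the root element <article>.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY course "Inf 384C">
]>
<article id="a1">
  <title>Notes for &course;</title>
</article>"""

doc = minidom.parseString(DOCUMENT)

# The document type declaration names the root element.
print("doctype names:", doc.doctype.name)

# Everything else lives inside the root element.
root = doc.documentElement
print("root element:", root.tagName)
print("attribute id:", root.getAttribute("id"))

# The internal entity &course; is expanded by the parser.
title = root.getElementsByTagName("title")[0]
title_text = "".join(node.data for node in title.childNodes)
print("title text:", title_text)
```

Here the document type declaration carries an internal subset rather than pointing to an external DTD, which keeps the example self-contained.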
11
The Document Type Definition
12
The DTD: Document Type Definition
The DTD defines a document’s structure: it is a set of rules and declarations that specify which tags can be used and what those tags can contain
The DTD validates documents
–Determines which documents conform to the language
–Reduces the possibility of errors
The DTD provides a blueprint for documents
–Specifies how to handle elements
–Specifies which elements are allowed
13
The DTD: Document Type Definition
The DTD has four main functions:
1. Declares a set of allowed elements (the “vocabulary”)
2. Defines a content model for each element (the “grammar”)
3. Declares a set of allowed attributes for each element
4. Provides various mechanisms to make management of the model easier
(Ray, Chapter 5, p. 148)
14
Basic Structure of a DTD: Element Declaration
An element declaration has two functions:
1. Adds a new element
2. States what can go inside the element
Every element that appears in the document must be declared in the DTD
Order of declarations is important
15
“vocabulary”
The NAME of the element as it appears in the markup tag (case-sensitive; lowercase here), e.g. title, graphic, article, thingie
“grammar”
A formula that delineates what kind of content is allowed, how many, and in what order
1. Empty elements: EMPTY
2. No content restrictions (of little value): ANY
3. Only character data, no elements: #PCDATA
4. Only elements: a formula of element names
5. Mixed content: a content model combining character data and elements
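The five kinds of content model can be seen by reading element declarations back from a DTD. This is a minimal sketch, assuming Python's standard `xml.parsers.expat` module, which reports declarations without validating the document against them; the sample DTD reuses the slide's example names (`title`, `graphic`, `article`, `thingie`) and adds a hypothetical `para` element.

```python
from xml.parsers import expat
from xml.parsers.expat import model

DOCUMENT = """<!DOCTYPE article [
  <!ELEMENT article (title, para+)>
  <!ELEMENT title   (#PCDATA)>
  <!ELEMENT para    (#PCDATA | graphic)*>
  <!ELEMENT graphic EMPTY>
  <!ELEMENT thingie ANY>
]>
<article><title>T</title><para>Text <graphic/></para></article>"""

# Human-readable labels for expat's content-type constants.
CONTENT_KINDS = {
    model.XML_CTYPE_EMPTY: "empty",
    model.XML_CTYPE_ANY: "any content",
    model.XML_CTYPE_MIXED: "character data or mixed content",
    model.XML_CTYPE_SEQ: "elements in sequence",
    model.XML_CTYPE_CHOICE: "choice of elements",
}

declared = {}


def on_element_decl(name, content_model):
    # expat passes the model as a tuple: (type, quantifier, name, children)
    kind, _quantifier, _name, children = content_model
    child_names = [child[2] for child in children]
    declared[name] = (CONTENT_KINDS.get(kind, "other"), child_names)


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOCUMENT, True)

for name, (kind, child_names) in declared.items():
    print(f"{name}: {kind} {child_names}")
```

Note that expat reports `(#PCDATA)` and mixed content under the same type; the two differ only in whether child element names are listed.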
16
Basic Structure of a DTD: Attribute Declaration
<!ATTLIST name attname1 atttype1 attdesc1 attname2 atttype2 attdesc2>
For each element that appears in the document, the attributes of that element must be declared
All attributes of an element are declared in one place, the attribute list
17
“vocabulary”
The name of the element to which the attributes belong
Same name as the element declared earlier, e.g. title, article, thingie
“Attribute declarations”
attname1: gives the attribute name
atttype1: specifies the datatype of the attribute or a list of allowed values, e.g. CDATA, NMTOKEN, ID
attdesc1: describes behavior
1. A default value, e.g. “high”
2. A keyword governing author-specified values: #REQUIRED, #FIXED, #IMPLIED
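The three parts of an attribute declaration (name, type, behavior) can be read back from an attribute list in the same way. This is a sketch under the same assumption as before, Python's standard `xml.parsers.expat`; the attribute names `id`, `lang`, `source`, and `priority` are hypothetical examples.

```python
from xml.parsers import expat

DOCUMENT = """<!DOCTYPE article [
  <!ELEMENT article (#PCDATA)>
  <!ATTLIST article
            id       ID         #REQUIRED
            lang     NMTOKEN    #IMPLIED
            source   CDATA      #FIXED "print"
            priority (high|low) "high">
]>
<article id="a1">text</article>"""

attributes = {}


def on_attlist_decl(element, attname, atttype, default, required):
    # Called once per attribute declared in an <!ATTLIST ...>.
    attributes[attname] = {
        "element": element,      # element the attribute belongs to
        "type": atttype,         # datatype or list of allowed values
        "default": default,      # None when no default value is declared
        "required": bool(required),
    }


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOCUMENT, True)

for attname, info in attributes.items():
    print(attname, info)
```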
18
The DTD Document Type Definition “It is important to remember that every document type definition is an interpretation of a text. There is no single DTD which encompasses any kind of absolute truth about a text, although it may be convenient to privilege some DTDs above others for particular types of analysis.” TEI Guidelines for Electronic Text Encoding and Interchange http://etext.virginia.edu/TEI.html
19
The TEI DTD
Uses the basic structural elements of a general DTD
Designed to simplify the task of choosing an appropriate set of tags for the text in hand
Selects an appropriate combination of smaller tag sets, each containing tags likely to be used together
1. Core tag sets: standard components that are always included; no encoder action needed
2. Basic tag sets: basic building blocks for text types; the encoder must select at least one
3. Additional tag sets: extra tags compatible with all other tag sets; the encoder may add them to the basic tags in any combination
http://www.tei-c.org/P4X/DTD/
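The selection rule above can be sketched as a small helper that builds a document type declaration. This is an illustration under assumptions, not the TEI's own tooling: the function name `tei_doctype` is hypothetical, the tag-set names are a small subset recalled from the P3/P4 convention of switching tag sets on with parameter entities, and the system identifier `tei2.dtd` is assumed.

```python
# Illustrative subsets only; the full lists are in the TEI Guidelines.
BASE_TAG_SETS = {"TEI.prose", "TEI.verse", "TEI.drama"}
ADDITIONAL_TAG_SETS = {"TEI.linking", "TEI.figures", "TEI.analysis"}


def tei_doctype(base, additional=()):
    """Build a DOCTYPE declaration that switches on the chosen tag sets."""
    base, additional = list(base), list(additional)
    if not base:
        raise ValueError("select at least one base tag set")
    unknown = (set(base) - BASE_TAG_SETS) | (set(additional) - ADDITIONAL_TAG_SETS)
    if unknown:
        raise ValueError(f"unknown tag sets: {sorted(unknown)}")
    # Core tag sets need no switch: they are always included.
    switches = "\n".join(
        f"  <!ENTITY % {name} 'INCLUDE'>" for name in base + additional
    )
    # "tei2.dtd" is an assumed system identifier for the main DTD file.
    return f'<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [\n{switches}\n]>'


print(tei_doctype(["TEI.prose"], ["TEI.linking", "TEI.figures"]))
```

Calling `tei_doctype([])` raises `ValueError`, mirroring the rule that at least one basic tag set must be selected.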
20
The TEI Header
21
Basic Elements of TEI
–Paragraphs
–Punctuation, quotations, lists, etc.
–Bibliographic citations
–THE HEADER!
22
The TEI Header
–Required of every TEI text; composed of four parts
–May be large and complex or very simple
–The header may differ for documents not based on written text, such as computer files or spoken text
–The header is not a library cataloging record, although the intent is similar
23
Four Parts
–File Description
–Encoding Description
–Profile Description
–Revision Description
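The four parts correspond to the TEI elements `fileDesc`, `encodingDesc`, `profileDesc`, and `revisionDesc` inside `teiHeader`. The following skeleton is a sketch only, built with Python's standard `xml.etree.ElementTree`; the text content is placeholder data and the result is not validated against the TEI DTD.

```python
import xml.etree.ElementTree as ET

header = ET.Element("teiHeader")

# File description: bibliographic description of the electronic file.
file_desc = ET.SubElement(header, "fileDesc")
title_stmt = ET.SubElement(file_desc, "titleStmt")
ET.SubElement(title_stmt, "title").text = "A sample electronic text"  # placeholder
publication = ET.SubElement(file_desc, "publicationStmt")
ET.SubElement(publication, "p").text = "Unpublished sample"  # placeholder
source = ET.SubElement(file_desc, "sourceDesc")
ET.SubElement(source, "p").text = "No source: born digital"  # placeholder

# The remaining three parts, left empty in this skeleton.
for name in ("encodingDesc", "profileDesc", "revisionDesc"):
    ET.SubElement(header, name)

parts = [child.tag for child in header]
print(parts)
print(ET.tostring(header, encoding="unicode"))
```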
24
File Description
25
Encoding Description
26
Profile Description
27
Revision Description
28
Examples and Application
29
Dumble Geological Survey
–A geological survey of Texas from the late 19th century, comprising twelve volumes
–Digitally imaged monographs processed with OCR software to produce text
–Text marked up in XML using the TEI Lite specifications
http://www.lib.utexas.edu/books/dumble/
30
Dumble DTD
–Element and attribute definitions
–Entity references
33
Dumble Header
Four basic sections
–File description
–Encoding description
–Profile description
–Revision description
Contains bibliographic information
Contains information on the creation of the digital file
37
Why XML?
–Ability to record information about a document within the document
–Ability to separate structure from format
–Ability to “wrap” or embed information in layers of XML
38
XML Beyond TEI
–Open Archives Initiative (OAI)
–Semantic Web
–Open Archival Information System (OAIS)
–Digital preservation
–Information discovery
39
References
–A Sample TEI Markup: www.tei-c.org/Lite/U5-eg.html
–Appendix A.2, Elements in TEI Lite: www.tei-c.org/Lite/U5-taglist.html
–OAI: www.openarchives.org/
–OAIS: http://www.rlg.org/longterm/oais.html
–Learning XML: Erik T. Ray