Introduction to TEI Tomaž Erjavec dept

Slides:



Advertisements
Similar presentations
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
Advertisements

Delivering textual resources. Overview Getting the text ready – decisions & costs Structures for delivery Full text Marked-up Image and text Indexed How.
DOCUMENT TYPES. Digital Documents Converting documents to an electronic format will preserve those documents, but how would such a process be organized?
Music Encoding Initiative (MEI) DTD and the OCVE
 Fundamentals of Web Design.  Describe the history and theory of XHTML  Understand the rules for creating valid XHTML documents  Apply a DTD to an.
XML for Information Management – Day 2 Airi Salminen University of Erlangen-Nuremberg Computational Linguistics Instructor: Professor Airi Salminen
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
IS 373—Web Standards Todd Will
A Practical Introduction to XML in Libraries Marty Kurth NYLA October 22, 2004.
Exploring Microsoft® Office Grauer and Barber 1 Committed to Shaping the Next Generation of IT Experts. Robert Grauer and Maryann Barber Using.
OCLC Online Computer Library Center Two Paths to Interoperable Metadata Jean Godby, Devon Smith, Eric Childress DC-2003 September 29, 2003.
Digital Encoding What’s behind E-text Resources?.
Pemrograman Berbasis WEB XML part 2 -Aurelio Rahmadian- Sumber: w3cschools.com.
Chapter 6 Text and Multimedia Languages and Properties
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
National Institute of Standards and Technology 1 Testing and Validating OAGi NDRs Puja Goyal Salifou Sidi Presented to OAGi April 30 th, 2008.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
CREATED BY ChanoknanChinnanon PanissaraUsanachote
Practical RDF Chapter 1. RDF: An Introduction
School of Computing and Management Sciences © Sheffield Hallam University To understand the Oracle XML notes you need to have an understanding of all these.
Introduction to XML Eugenia Fernandez IUPUI. What is XML? From the World Wide Web Consortium (W3C) The Extensible Markup Language (XML) is the universal.
Another PillowTalk Presentation  2004 Dynamic Systems, Inc. Introduction to XML for SOA Lee H. Burstein,
An Introduction to XML Presented by Scott Nemec at the UniForum Chicago meeting on 7/25/2006.
1 © Netskills Quality Internet Training, University of Newcastle Introducing XML © Netskills, Quality Internet Training University.
What Agencies Should Know About PDF/A September 20, 2005 Susan J. Sullivan, CRM
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
XML BIS4430 – unit 10. XML Origins Extensible Markup Language (XML) 1998 Inspired by Standard Generalized Markup Language (SGML) and HTML. SGML defines.
Extending the Scope of Learning Objects with XML Bill Tait COLMSCT Associate Teaching Fellow The Open University ALT-C Conference Sep 2007.
TEXT ENCODING INITIATIVE (TEI) Inf 384C Block II, Module C.
EXtensible Markup Language (XML) and Documentation --ManojBokil -- Manoj Bokil.
November 1, 2006IU DLP Brown Bag : Fall Data Integrity and Document- centric XML Using Schematron for Managing Text Collections Dazhi Jiao, Tamara.
Intro. to XML & XML DB Bun Yue Professor, CS/CIS UHCL.
XML A web enabled data description language 4/22/2001 By Mark Lawson & Edward Ryan L’Herault.
XP Tutorial 9 1 Working with XHTML. XP SGML 2 Standard Generalized Markup Language (SGML) A standard for specifying markup languages. Large, complex standard.
SDPL 2005Notes 2.5: XML Schemas1 2.5 XML Schemas n Short introduction to XML Schema –W3C Recommendation, 1 st Ed. May, 2001; 2 nd Ed. Oct, 2004: »XML Schema.
Standards for digital encoding Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž.
Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Comparability of language data and analysis Using an ontology for linguistics Scott Farrar, U.
1 Credits Prepared by: Rajendra P. Srivastava Ernst & Young Professor University of Kansas Sponsored by: Ernst & Young, LLP (August 2005) XBRL Module Part.
XML Introduction. What is XML? XML stands for eXtensible Markup Language XML stands for eXtensible Markup Language XML is a markup language much like.
Introduction to the Semantic Web and Linked Data
Metadata Metadata Mark-up and Management © Adolf Knoll, National Library of the Czech Republic.
Web Technologies Lecture 4 XML and XHTML. XML Extensible Markup Language Set of rules for encoding a document in a format readable – By humans, and –
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Standards for digital encoding Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture 2: TEI.
WP 3: Standardisation of shared metadata Mode of operation –All partners are involved –Building on practice outside the project Achievements of Year 1.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
XML The Extensible Markup Language (XML ), which is comparable to SGML and modeled on it, describes how to describe a collection of data. A standard way.
Unit 3 — Advanced Internet Technologies Lesson 10 — Introduction to XHTML.
From XML to DAML – giving meaning to the World Wide Web Katia Sycara The Robotics Institute
The Semantic Web. What is the Semantic Web? The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling.
TEI presentation for IS 590 Robert Patrick Waltz July 10 th, 2012.
XML Extensible Markup Language
XML Notes taken from w3schools. What is XML? XML stands for EXtensible Markup Language. XML was designed to store and transport data. XML was designed.
Introduction to XML IST 653 Digital Libraries 1. What is XML It’s an acronym for eXtensible Markup Language Tag based syntax, very much like HTML – Allows.
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
Spring 2013 Markup – Validate – Transform Introduction to Digital Text and XML Rice University, April.
1 XML and XML in DLESE Katy Ginger November 2003.
XML BASICS and more…. What is XML? In common:  XML is a standard, simple, self-describing way of encoding both text and data so that content can be processed.
TEI 工作坊 TEI and Images October The Concept.
The Semantic Web By: Maulik Parikh.
RDFa How and Why Ralph R. Swick World Wide Web Consortium
Demystifying Digital Scholarship 10: TEI
XML QUESTIONS AND ANSWERS
Improving Braille accessibility and personalization on Internet
Markup of Educational Content
Manuscript Markup with TEI
XML in Web Technologies
Markup Languages Gilok Choi 9/17/2018
Some Options for Non-MARC Descriptive Metadata
CSE591: Data Mining by H. Liu
Presentation transcript:

Introduction to TEI Tomaž Erjavec dept Introduction to TEI Tomaž Erjavec dept. of knowledge technologies Jožef Stefan Institute Ljubljana, Slovenia

Overview Introduction to text markup What is TEI Some examples of usage

What‘s in a text?

What‘s in a text (2)

What‘s in a text (3)

The ontology of text Where is the text? in the shape of letters and their layout? in the original from which this copy derives? in the ideas it brings forth? in their format, or their intentions? Texts are abstractions conjured up by readers. Markup encodes those abstractions.

Encoding of texts Texts are more then sequences of encoded characters they have structure and content they also have multiple readings Encoding, or markup, is a way of making these things explicit Only that which is explicit can be reliably processed

Some definitions Markup makes explicit the distinctions we want to make when processing a string of bytes Markup is a way of naming and characterizing the parts of a text in a formalized way It’s (usually) more useful to markup what things are than what they look like

What does markup capture? Compare <head>Upon Julia’s Clothes</head> <lg> <l>Whenas in silks my <hi>Julia</hi> goes,</l> <l>Then, then (me thinks) how sweetly flowes</l> <l>That liquefaction of her clothes.</l> </lg> and <s n="1" role="head"> <w type="pp">Upon</w> <w type="np">Julia</w><w type="pos">’s </w> <w type="nn2">Clothes</w> </s> <s n="2" role="line"> <w type="adv">Whenas</w> <w type="pp">in</w> <w type="nn2">silks</w> ...

What is the point of markup? To make explicit (to a machine) what is implicit (to a person) To add value by supplying multiple annotations To facilitate re-use of the same material in different formats in different contexts for different users

XML XML is structured data represented as strings of text: XML is extensible XML must be well-formed XML can be validated XML is application-, platform-, and vendor- independent XML empowers the content provider and facilitates data integration

Schema languages XML schemas are used to: define the element and attribute vocabularies for particular text types define content models for elements define data types of attributes (and elements) Schemas can be written in: XML DTD Language W3C schema language ISO Relax NG schema language (TEI mostly uses Relax NG)

Developing schemas For simple annotations, one can define a project-specific schema from scratch But if the markup will be complicated, it is better to take one of the standard schemas Using standard schemas means: better documentation better interchange better tool support There are many schemas around, but only one initiative delals with encoding arbitrary texts for scholarly purposes

Text Encoding Initiatve The TEI provides a framework for the definition of multiple XML schemas it defines and names several hundred useful textual distinctions it provides a set of modules that can be used to define schemas making those distinctions it provides a customization mechanism for modifying and combining those definitions with new ones using the same conceptual model

Where did the TEI come from? Originally, a research project within the humanities Sponsored by three professional associations Funded 1990-1994 by US, EU Major influences digital libraries and text collections language corpora scholarly datasets International consortium established June 1999 (see http://www.tei-c.org/)

Goals of the TEI better interchange and integration of scholarly data support for all texts, in all languages, from all periods guidance for the perplexed: what to encode — hence, a user-driven codification of existing best practice assistance for the specialist: how to encode — hence, a loose framework into which unpredictable extensions can be fitted These apparently incompatible goals result in a flexible and modular environment

TEI Guidelines A set of recommendations for text encoding, covering both generic text structures and some highly specific areas based on (but not limited by) existing practice A very large collection of element definitions with associated declarations for various schema languages a modular system for creating personalized schemas from the foregoing for the full picture see http://www.tei-c.org/Guidelines/

Legacy of the TEI a way of looking at what ‘text’ really is a codification of current scholarly practice (crucially) a set of shared assumptions and priorities about the digital agenda: focus on content and function (rather than presentation) identify generic solutions (rather than application-specific ones)

Users of TEI Over 100 projects listed on the TEI project page Main areas of use: digital libraries text-critical editions computer corpora dictionaries

Versions of the Guidelines TEI P3 (1994) first public version: SGML + book (1200pp) and soon also on the Web. TEI P4 (2002): provides equal support for XML and SGML applications using the TEI scheme; error correction, while maintaining backward compatibility: documents conforming to TEI P3 will not become illegal when processed with TEI P4. TEI P5 (2007): implements more fundamental changes to the schemas, in line with current practice and identified problems, e.g. uses namespaces no longer backward compatible with P3, P5 Relax NG becomes the main schema language continuous improvement..

TEI modules TEI is too general to be supported by a single schema Rather, TEI is composed of modules, and which modules the user select is determined by the project needs Some examples of modules: Transcription of spoken texts Dictionaries and lexica Varieties of linguistic annotation Nonstandard characters and glyphs Linking, alignment, non-hierarchic structures Detailed metadata (the TEI Header) Manuscript description Text-critical apparatus

Support offerred by TEI Web interface to make XML schemas from a TEI parametrisation A set of XSLT stylesheets to convert TEI/XML to HTML or PDF Mailing list tei-l Various tutorials available from the TEI pages Yearly conference and members‘ meeting

Examples of applications Mostly work done by me in collaboration with other people institutions: Annotated corpora Machine readable dictionaries Text-critical editions Biographical databases

JOS corpus <s xml:id="F0203.557.2"> <w xml:id="F0203.557.2.1" lemma="ta" msd="Zk-sei">To</w><S/> <w xml:id="F0203.557.2.2" lemma="biti" msd="Gp-ste-n">je</w><S/> <term type="sloWNet" sortKey="kraj" key="ENG20-08114200-n"> <w xml:id="F0203.557.2.3" lemma="turističen" msd="Ppnmein">turističen</w><S/> <w xml:id="F0203.557.2.4" lemma="kraj" msd="Somei">kraj</w> </term> <c xml:id="F0203.557.2.5">.</c><S/> </s> <linkGrp type="syntax" targFunc="head argument" corresp="#F0203.557.2"> <link type="ena" targets="#F0203.557.2.2 #F0203.557.2.1"/> <link type="modra" targets="#F0203.557.2 #F0203.557.2.2"/> <link type="dol" targets="#F0203.557.2.4 #F0203.557.2.3"/> <link type="dol" targets="#F0203.557.2.2 #F0203.557.2.4"/> <link type="modra" targets="#F0203.557.2 #F0203.557.2.5"/> </linkGrp>   JOS corpus <s xml:id="F0203.557.2"> <w xml:id="F0203.557.2.1" lemma="ta" msd="Zk-sei">To</w><S/> <w xml:id="F0203.557.2.2" lemma="biti" msd="Gp-ste-n">je</w><S/> <term type="sloWNet" sortKey="kraj" key="ENG20-08114200-n"> <w xml:id="F0203.557.2.3" lemma="turističen" msd="Ppnmein">turističen</w><S/> <w xml:id="F0203.557.2.4" lemma="kraj" msd="Somei">kraj</w> </term> <c xml:id="F0203.557.2.5">.</c><S/> </s> <linkGrp type="syntax" targFunc="head argument" corresp="#F0203.557.2"> <link type="ena" targets="#F0203.557.2.2 #F0203.557.2.1"/> <link type="modra" targets="#F0203.557.2 #F0203.557.2.2"/> <link type="dol" targets="#F0203.557.2.4 #F0203.557.2.3"/> <link type="dol" targets="#F0203.557.2.2 #F0203.557.2.4"/> <link type="modra" targets="#F0203.557.2 #F0203.557.2.5"/> </linkGrp>  

jaSlo dictionary <entry id="jaslo.55"> <form type="hw"> <orth type="roma">ainiku</orth> <orth type="kana">あいにく</orth> <orth type="kanji">生憎</orth> </form> <gramGrp> <pos>N/Ana/Adv</pos> </gramGrp> <trans> <tr>nesrečen</tr> <tr>nepričakovan</tr> <tr>nesluten</tr> <tr>žal</tr> </trans> <eg> <q>おあいにくさまです。</q> <tr>Žal mi je za vas.</tr> </eg> <usg type="level">2</usg> </entry>

eZISS text-critical editions <l>Na <app> <lem wit="#Drobt_1846">cesti</lem> <rdg wit="#UKM_123 #UKM_553">zeſti</rdg> </app> popotnik <app> <lem wit="#Drobt_1846">zdihuje. –</lem> <rdg wit="#UKM_123">sdihuje. <add>–</add></rdg> <rdg wit="#UKM_553">sdihuje</rdg> </app> </l> Three readings: Drobt_1846: Na cesti popotnik zdihuje. – UKM_123: Na zeſti popotnik sdihuje. – UKM_553: Na zeſti popotnik sdihuje

SBL biographical database <person> <sex value="1"/> <persName xml:lang="lat"> <forename>Johannes</forename> <surname>Aquila de <placeName>Rakerspurga</placeName></surname> </persName> <persName> <forename>Janez</forename> <surname>Akvila iz <placeName>Radgone</placeName></surname> </persName> <persName> <forename xml:lang="hun">János</forename> <surname>Aquila</surname> </persName> <occupation>slikar</occupation> <floruit notAfter="1392" notBefore="1378"> <placeName> <region>Prekmurje</region> </placeName> </floruit> </person>

Conclusions Gave a brief introduction to TEI For more, visit the TEI web pages!