Standards for digital encoding Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž.

Presentation transcript:

Standards for digital encoding Tomaž Erjavec Institut für Informationsverarbeitung Geisteswissenschaftliche Fakultät Karl-Franzens-Universität Graz Tomaž Erjavec Lecture 2: TEI and XSLT

Lecturer
Tomaž Erjavec, Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana
corpora and other language resources, standards, annotation, text-critical editions
Web page for this course:
students: send e-mails!

Overview
1. Introduction
2. TEI background
3. TEI structure
4. Introduction to XSLT
Lab session: writing a teiLite document, transforming to HTML with XSLT

What’s in a text?

What’s in a text (2)?

What’s in a text (3)?

The ontology of a text
Where is the text?
- in the shape of letters and their layout?
- in the original from which this copy derives?
- in the ideas it brings forth? in their format, or their intentions?
Texts are abstractions conjured up by readers. Markup encodes those abstractions.

Encoding of texts
Texts are more than sequences of encoded glyphs
- They have structure and content
- They also have multiple readings
Encoding, or markup, is a way of making these things explicit
Only that which is explicit can be reliably processed

Styles of markup
In the beginning there was procedural markup: RED INK ON; print balance; RED INK OFF
which, being generalised, became descriptive markup: some numbers
also known as encoding or annotation
Descriptive markup allows for re-use of data
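
The contrast between procedural and descriptive markup can be sketched in a few lines of Python. This is only an illustration: the element name `balance` and its `type` attribute are hypothetical choices, not taken from the slides, and the two rendering functions are made up for the example. The point is that one descriptively marked-up source can drive several different outputs.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: states what the number IS, not how to print it.
# The element and attribute names are illustrative assumptions.
SOURCE = '<balance type="overdrawn">-42.00</balance>'


def render_print(elem):
    """Render for a hypothetical printer that uses red ink for overdrafts."""
    if elem.get("type") == "overdrawn":
        return "RED INK ON; " + elem.text + "; RED INK OFF"
    return elem.text


def render_html(elem):
    """Render the same element as HTML, using a class instead of ink commands."""
    return '<span class="{}">{}</span>'.format(elem.get("type", "normal"), elem.text)


balance = ET.fromstring(SOURCE)
print(render_print(balance))
print(render_html(balance))
```

With procedural markup the red-ink instructions would be baked into the data; here they appear only in one of the renderers, so the data can be re-used unchanged.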

Some more definitions
Markup makes explicit the distinctions we want to make when processing a string of bytes
Markup is a way of naming and characterizing the parts of a text in a formalized way
It’s (usually) more useful to mark up what things mean than what they look like

What does markup capture?
Compare
Upon Julia’s Clothes
Whenas in silks my Julia goes,
Then, then (me thinks) how sweetly flowes
That liquefaction of her clothes.
and
Upon Julia’s Clothes
Whenas in silks...
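
As a rough illustration of what an encoded version of the poem might capture, the sketch below wraps the text in TEI-style elements (`div`, `head`, `lg`, `l`, `name`) and reads it back with Python's standard XML parser. The particular choice of elements and attribute values is an assumption made for this example, not necessarily the encoding shown on the original slide.

```python
import xml.etree.ElementTree as ET

# One possible TEI-style encoding; the tagging choices are assumptions.
POEM = """
<div type="poem">
  <head>Upon <name type="person">Julia</name>’s Clothes</head>
  <lg type="stanza">
    <l>Whenas in silks my <name type="person">Julia</name> goes,</l>
    <l>Then, then (me thinks) how sweetly flowes</l>
    <l>That liquefaction of her clothes.</l>
  </lg>
</div>
"""

root = ET.fromstring(POEM)

# The title, the verse lines and the names are now explicit and can be
# retrieved mechanically instead of being guessed from layout.
title = "".join(root.find("head").itertext())
lines = ["".join(line.itertext()) for line in root.iter("l")]
names = sorted({name.text for name in root.iter("name")})

print(title)
print(len(lines), "verse lines")
print(names)
```

The plain-text version carries the same characters, but only the marked-up version tells a program which string is the title, where each verse line ends, and that "Julia" is a name.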

Likewise...
Compare
H &WYN;ÆT WE GARDE na in gear-dagum þeod-cyninga þrym gefrunon, hu ða æþelingas ellen fremedon. oft scyld scefing sceaþe na þreatum, moneg um mægþum meodo-setl a of teah egsode eorl syððan ærest wear þ fea sceaft funden...
and
Hwæt! we Gar-dena in gear-dagum þeod-cyninga þrym gefrunon, hu ða æþelingas ellen fremedon, Oft Scyld Scefing sceaþena þreatum, monegum mægþum meodo-setla ofteah; egsode Eorle, syððan ærest wearþ

What’s the point of markup?
To make explicit (to a machine) what is implicit (to a person)
To add value by supplying multiple annotations
To facilitate re-use of the same material
- in different formats
- in different contexts
- for different users

A useful mental exercise
Imagine you are going to mark up several thousand pages of complex material...
- Which features are you going to mark up?
- Why are you choosing to mark up this feature?
- How reliably and consistently can you do this?
Now, imagine your budget has been halved. Repeat the exercise!

What can the TEI do for you?
The TEI provides a framework for the definition of multiple schemas
it defines and names several hundred useful textual distinctions
it provides a set of modules that can be used to define schemas making those distinctions
it provides a customization mechanism for modifying and combining those definitions with new ones using the same conceptual model

Where did the TEI come from?
Originally, a research project within the humanities
- Sponsored by three professional associations
- Funded by US NEH, EU LE Programme et al.
Major influences
- digital libraries and text collections
- language corpora
- scholarly datasets
International consortium established June 1999 (see

Goals of the TEI
better interchange and integration of scholarly data
support for all texts, in all languages, from all periods
guidance for the perplexed: what to encode; hence, a user-driven codification of existing best practice
assistance for the specialist: how to encode; hence, a loose framework into which unpredictable extensions can be fitted
These apparently incompatible goals result in a flexible and modular environment

TEI Guidelines
A set of recommendations for text encoding, covering both generic text structures and some highly specific areas, based on (but not limited by) existing practice
A very large collection of element definitions with associated declarations for various schema languages
A modular system for creating personalized schemas or DTDs from the foregoing
For the full picture see

Legacy of the TEI
a way of looking at what ‘text’ really is
a codification of current scholarly practice
(crucially) a set of shared assumptions and priorities about the digital agenda:
- focus on content and function (rather than presentation)
- identify generic solutions (rather than application-specific ones)

Users of TEI
Over 100 projects listed on the TEI project page
Main areas:
- digital libraries
- text-critical editions
- computer corpora
- dictionaries

Versions of the Guidelines
TEI P3 (1994), first public version:
- SGML + book (1200pp) and soon also on the Web.
TEI P4 (2002):
- provides equal support for XML and SGML applications using the TEI scheme;
- error correction, while maintaining backward compatibility: documents conforming to TEI P3 will not become illegal when processed with TEI P4.
TEI P5 (2006…):
- implements more fundamental changes to the schemas, in line with current practice and identified problems, e.g. uses namespaces
- no longer backward compatible (but a P4-to-P5 migration XSLT exists)
- Relax NG becomes the main schema language
- still somewhat fluid (details in schemas, Web presentation)
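
One practical consequence of P5 using namespaces is that code written against P4 element names no longer finds anything in a P5 document. The sketch below shows this with Python's standard parser. The two documents are deliberately tiny fragments (no `teiHeader`), so they are not valid TEI; they only illustrate the lookup difference.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the namespace used by TEI P5

# Tiny illustrative fragments, not valid TEI documents.
P4_DOC = "<TEI.2><text><body><p>Hello</p></body></text></TEI.2>"
P5_DOC = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<text><body><p>Hello</p></body></text></TEI>"
)

p4 = ET.fromstring(P4_DOC)
p5 = ET.fromstring(P5_DOC)

# P4: elements live in no namespace, so a bare name matches.
p4_paras = p4.findall(".//p")

# P5: the same bare name matches nothing ...
p5_plain = p5.findall(".//p")

# ... the query has to name the TEI namespace explicitly.
p5_paras = p5.findall(".//tei:p", {"tei": TEI_NS})

print(len(p4_paras), len(p5_plain), len(p5_paras))
```

The same applies to XSLT stylesheets: templates matching `p` in a P4 stylesheet must match the namespaced element once the document has been migrated to P5.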

The general structure of TEI documents
Burnard, Driscoll, Rahtz, TEI Training Course, Sofia 2005: Slides for TEI overview

TEI Lite
TEI Lite is a particular parametrisation of TEI that “provides 90% of the elements needed for 90% of users”
the TEI Lite P4 DTD can be found at

Lab session 1
again, recipes
- Bavarian-Style Pork Roast with Cabbage and Knödel and 2 others, e.g. from the Cabbage section
- take the teiLite DTD and mark up the documents according to TEI
- make use of documentation provided at the TEI Lite page
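
As a starting point for the lab, a recipe in TEI Lite (P4 style) might be laid out roughly as below. This is a sketch under assumptions: the `type` values, the header wording and all ingredient and step text are placeholders invented for illustration. The Python wrapper only checks that the document is well-formed XML; it does not validate against the teiLite DTD, which requires a validating parser.

```python
import xml.etree.ElementTree as ET

# Skeleton of a P4-style TEI Lite document. All text content other than
# the recipe title is placeholder material.
RECIPE = """<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Bavarian-Style Pork Roast with Cabbage and Knödel</title>
      </titleStmt>
      <publicationStmt><p>Lab exercise, not published.</p></publicationStmt>
      <sourceDesc><p>Placeholder: describe the source of the recipe.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="recipe">
        <head>Bavarian-Style Pork Roast with Cabbage and Knödel</head>
        <list type="ingredients">
          <item>placeholder ingredient 1</item>
          <item>placeholder ingredient 2</item>
        </list>
        <p>Placeholder for the preparation steps.</p>
      </div>
    </body>
  </text>
</TEI.2>"""

doc = ET.fromstring(RECIPE)  # raises ParseError if not well-formed
title = doc.findtext("teiHeader/fileDesc/titleStmt/title")
items = [item.text for item in doc.iter("item")]
print(title)
print(len(items), "ingredients")
```

The header (`teiHeader`) describes the electronic file, while `text` holds the recipe itself; deciding which recipe features deserve their own elements is the actual exercise.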

XSLT
Erjavec: Course at ESSLLI 2005, Annotation of Language Resources, Lecture II. XML-Related Recommendations: Formatting and Transforming XML
ZVON tutorial
W3schools
…
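
For the second half of the lab, a minimal XSLT 1.0 stylesheet turning a P4-style TEI Lite document into HTML could look like the one embedded below. It is a sketch, not the course's stylesheet: the choice of HTML elements is an assumption, and only `head`, `p`, `list` and `item` are handled. Python's standard library cannot run XSLT, so the wrapper merely checks that the stylesheet is well-formed and lists its match patterns; applying it requires an XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Document root: build the HTML shell, title taken from the TEI header -->
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="TEI.2/teiHeader/fileDesc/titleStmt/title"/>
        </title>
      </head>
      <body>
        <xsl:apply-templates select="TEI.2/text/body"/>
      </body>
    </html>
  </xsl:template>

  <!-- Division headings become HTML headings -->
  <xsl:template match="div/head"><h2><xsl:apply-templates/></h2></xsl:template>

  <!-- Paragraphs and lists map onto their HTML counterparts -->
  <xsl:template match="p"><p><xsl:apply-templates/></p></xsl:template>
  <xsl:template match="list"><ul><xsl:apply-templates/></ul></xsl:template>
  <xsl:template match="item"><li><xsl:apply-templates/></li></xsl:template>
</xsl:stylesheet>"""

sheet = ET.fromstring(STYLESHEET)  # well-formedness check only
patterns = [t.get("match") for t in sheet.findall("{%s}template" % XSL_NS)]
print(patterns)
```

Elements without a template of their own (such as `body` and `div`) fall through to XSLT's built-in rules, which simply process their children, so the stylesheet stays short.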