Spring 2013 Markup – Validate – Transform Introduction to Digital Text and XML Rice University, April.

Presentation transcript:

Spring 2013. Markup – Validate – Transform: Introduction to Digital Text and XML. Rice University, April 5th, 2013. Marcus Bingenheimer (Temple University)

Digital Humanities: the academic effort to digitize, research and preserve all aspects of human culture in a digital environment: ● Oral & written text ● Images & architecture ● Music ● Performance & ritual ● Geography/topography ● Networks ...

Martin Hilbert and Priscila López 2011 (Science 332, 60): ● 2000: 75% of stored information was in an analog format (e.g. video cassettes) ● 2007: 94% of it was digital

“Digitization” ● → General level: The transformation of analog information into digital information ● → Modelling analog information in the digital (with bits and bytes) ● → Modelling is a social endeavour: it relies on (open) standards ● → How to model (natural-language) text?

Different ways to model a printed page digitally ● Digital Image (raster or vector) ● Text file (Characters encoded by a series of bytes, output as glyphs on screen or page controlled via a code-page) ● “Plain” text ● Word processor file ● PDF file ● Text with Markup, e.g. XML

T.S. Eliot: facsimile copy of the draft of “The Waste Land” with annotations by E. Pound and V. Eliot. Modeled in typeset (Pound = red, V. Eliot = italics)

How to create high-end digital editions of texts?

Enter Markup ● Editors add information, they do not merely reproduce text ● The way this is done in digital text is by using markup ● Markup means applying tags to encode information about the form and content of a text ● These days most markup standards are expressed in a grammar called XML (eXtensible Markup Language)

XML and HTML ● XML and HTML are "sister languages," both developed from a standard called SGML (Standard Generalized Markup Language) ● Thus, the appearance of these two languages, and the rules governing them, are similar, e.g. they both use bracketed tags to encode information

XML vs. HTML ● HTML is an application-specific standard primarily used to encode the style and structure of web pages so that browsers can display them ● XML is a more flexible master format which can encode an infinite variety of structure and semantic content ● XML is merely a set of grammar rules. It does not have a fixed tag-set or vocabulary like HTML

XML vs. (X)HTML. XHTML: New Japanese-English Dictionary 新和英大辞典, Koh Masuda (Ed.), Tokyo: Kenkyusha, 2000

XML: New Japanese-English Dictionary 新和英大辞典, Koh Masuda, Tokyo, Kenkyusha, 2000
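
The contrast between the two encodings of this dictionary entry can be sketched as follows. The slide's original tags did not survive in the transcript, so every element name below (`book`, `title`, `editor`, `pubPlace`, `publisher`, `date`) is an illustrative assumption, not the author's markup. In XHTML the tags come from a fixed vocabulary and mostly describe layout:

```xml
<p>
  <i>New Japanese-English Dictionary</i> 新和英大辞典<br/>
  Koh Masuda (Ed.)<br/>
  Tokyo: Kenkyusha, 2000
</p>
```

In XML the tag names are chosen freely and can state what each piece of information is:

```xml
<book>
  <title xml:lang="en">New Japanese-English Dictionary</title>
  <title xml:lang="ja">新和英大辞典</title>
  <editor>Koh Masuda</editor>
  <pubPlace>Tokyo</pubPlace>
  <publisher>Kenkyusha</publisher>
  <date>2000</date>
</book>
```

With the second form a program can find the publisher or the year directly, without guessing from punctuation and line breaks.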

XML basics

Well-formed & Valid: Every XML document 1. must be well-formed 2. can in principle be validated against a “schema”

Well-formed means that the document conforms to the XML rules. E.g. ● One root element: the XML document must have exactly one root element ● All start-tags have matching end-tags ● Each element is properly nested within the root element ("nesting") ● Names are always case-sensitive

Broken XML Code Mr. Garcia Hello there! How are we today? Well-formed XML Code Mr. Garcia Hello there! How are we today?
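
The tags of this example were lost in the transcript. A minimal sketch of what such a pair might look like, with hypothetical element names (`message`, `to`, `body`), is given here. The first version is not well-formed: `<body>` is never closed, and the end-tag `</Message>` does not match the start-tag `<message>` because names are case-sensitive.

```xml
<message>
  <to>Mr. Garcia</to>
  <body>Hello there! How are we today?
</Message>
```

The second version closes every element and spells the root element's end-tag exactly like its start-tag:

```xml
<message>
  <to>Mr. Garcia</to>
  <body>Hello there! How are we today?</body>
</message>
```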

Valid means ... ● that the document conforms to the vocabulary and syntax of a markup standard (e.g. TEI, XHTML, MusicXML, MathML) expressed in a document schema (written as a DTD, W3C XML Schema, or RELAX NG)

Parsing: XML text → parser step 1 → well-formed / not well-formed. Well-formed text + DTD/schema (e.g. TEI) → parser step 2 → valid / not valid
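
As a small illustration of validity, the sketch below carries its own schema as a DTD inside the document. The vocabulary (`message`, `to`, `body`) is invented for this example and does not come from TEI or any other standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
  <!ELEMENT message (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<message>
  <to>Mr. Garcia</to>
  <body>Hello there! How are we today?</body>
</message>
```

The DTD says that a `message` contains exactly one `to` followed by one `body`. A document that put `body` before `to`, or added an undeclared element, would still be well-formed but would no longer be valid against this DTD.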

XML rules 1 (declaration): An XML document should begin with an XML declaration. The declaration has a form such as <?xml version="1.0" encoding="UTF-8" standalone="yes"?> (encoding and standalone are optional)

XML rules 2 (root element): An XML document has one, and only one, root element, which contains all other elements and the character data

XML rules 3 (end-tags): Every start-tag must have a matching end-tag. Exception: empty elements, which may be written as a single self-closing tag
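
A short sketch of the end-tag rule and its exception. The element names (`stanza`, `line`, `pageBreak`) are hypothetical and chosen only for illustration.

```xml
<stanza>
  <line>April is the cruellest month, breeding</line>
  <line>Lilacs out of the dead land, mixing</line>
  <pageBreak n="2"/>
</stanza>
```

Each `line` has a start-tag and an end-tag. `pageBreak` has no content, so it is written as one empty-element tag ending in `/>`. The longer form `<pageBreak n="2"></pageBreak>` means the same thing.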

XML rules 4 (nesting): Elements must be properly nested. An element that is opened inside another element must also be closed inside it; tags may not overlap
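
The slide's own nesting examples were lost in the transcript. A sketch with hypothetical element names (`line`, `emph`) is given here. Properly nested, the inner element closes before the outer one:

```xml
<line>April is the <emph>cruellest</emph> month</line>
```

Not properly nested, the two elements overlap, and a parser will reject the document:

```xml
<line>April is the <emph>cruellest</line></emph>
```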

XML rules 5 (XML names): XML is case-sensitive. Element names must start with a letter (including CJK 漢字) or the underscore “_”. They may contain only alphanumeric characters (letters and digits) and “_”, “-”, “.”. The colon “:” is reserved for XML namespaces

The XML toolchain: ● Validate: a document model (DTD, Relax NG, XML Schema) validates the XML document ● Transform: XSLT, XPath, XQuery, XSL-FO, CSS, JScript ● Output: HTML, PDF, ePub, any XML (docx, odt ...)
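
The naming rules can be tried out on a small document like the one below. All names are made up for illustration.

```xml
<text>
  <title>starts with a letter</title>
  <Title>a different element from title, because names are case-sensitive</Title>
  <_note>starts with an underscore</_note>
  <漢字>starts with a CJK character</漢字>
  <page-break.v2/>
  <!-- Not allowed as names: 2ndEdition (starts with a digit), page break (contains a space) -->
</text>
```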

Practice: ● Write a “firstdocument.xml” ● Open it in Firefox ● Make some XML mistakes and refresh Firefox ● Add a stylesheet declaration ● Refresh Firefox
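
The stylesheet declaration shown on the slide did not survive in the transcript. One possible starting point for the exercise is sketched below. It assumes a CSS stylesheet with the hypothetical file name `firstdocument.css` in the same folder; the element names (`poem`, `title`, `line`) are also made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="firstdocument.css"?>
<poem>
  <title>The Waste Land</title>
  <line>April is the cruellest month, breeding</line>
  <line>Lilacs out of the dead land, mixing</line>
</poem>
```

The `xml-stylesheet` processing instruction has to come before the root element. Without it, Firefox shows the document as a collapsible tree; with it, the browser applies the rules in the CSS file to the named elements. Deleting an end-tag or changing the case of one tag name and then refreshing is a quick way to see the parser's error message.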