Introduction to XML Timothy W. Cole Thomas G. Habing University of Illinois at UC CDP / Colorado Alliance of Research Libraries 23 October 2002.

Presentation transcript:

Introduction to XML Timothy W. Cole Thomas G. Habing University of Illinois at UC CDP / Colorado Alliance of Research Libraries 23 October 2002

23 October 2002, Introduction to XML, Tim Cole & Tom Habing. Presenters: Tim Cole, Mathematics Librarian & Assoc. Prof. of Library Admin., PI, Univ. of Illinois OAI Metadata Harvesting Projects; Tom Habing, Research Programmer, Grainger Engineering Library Information Center.

Agenda: Introduction to XML. 1. What is it? 2. What's it good for? 3. How does it work? 4. The infrastructure of XML. 5. Using XML on the Web. 6. Implementation issues & costs.

1. What is it? Discussion points: first principles (OHCO); example: a simple XML fragment; compare/contrast SGML, HTML, XHTML; a different XML for every community; terminology.

Ordered hierarchies of content objects. Premise: a text is the sum of its component parts. [The slide's examples, which defined each content object by the objects it could contain, lost their element names in this transcript.] Components chosen should reflect anticipated use.

Ordered hierarchies of content objects. OHCO is a useful, albeit imperfect, model: exposes an object's intellectual structure; supports reuse & abstraction of components; better than a bit-mapped page image; better than a model of text as a stream of characters plus formatting instructions; data management system for document-like objects; does not allow overlapping content objects; incomplete, requires infrastructure.

Content objects in a book. Book: FrontMatter (BookTitle, Author(s), PubInfo); Chapter(s) (ChapterTitle, Paragraph(s)); BackMatter (References, Index).

Content objects in a catalog card. Card: CallNumber; MainEntry; TitleStatement (TitleProper, StatementOfResponsibility); Imprint; SummaryNote; AddedEntrySubject(s); AddedEntryPersonalName(s).

A simple XML fragment. [Markup lost in this transcript; the surviving text content is:] XML Is Easy; Tim Cole; Tom Habing; CDP Press, 2002; First Was SGML; Once upon a time …
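
The tags of this fragment did not survive in the transcript, so the following is only a sketch of what such a fragment might look like, parsed with Python's standard `xml.etree.ElementTree`. The text content comes from the slide; every element name (`book`, `front`, `title`, `author`, `pubinfo`, `chapter`, `chaptitle`, `para`) is an assumption modelled on the "Content objects in a book" slide, not the presenters' original markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the text content comes from the slide.
FRAGMENT = """\
<book>
  <front>
    <title>XML Is Easy</title>
    <author>Tim Cole</author>
    <author>Tom Habing</author>
    <pubinfo>CDP Press, 2002</pubinfo>
  </front>
  <chapter>
    <chaptitle>First Was SGML</chaptitle>
    <para>Once upon a time ...</para>
  </chapter>
</book>
"""

book = ET.fromstring(FRAGMENT)

# Walk the ordered hierarchy: each element is one content object.
authors = [a.text for a in book.findall("./front/author")]
chapter_titles = [c.findtext("chaptitle") for c in book.findall("chapter")]

print(authors)
print(chapter_titles)
```

Because the structure is explicit, a program can ask for "the authors" or "the chapter titles" directly instead of guessing from layout.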

This is NOT XML. [The markup of this counter-example was lost in the transcript; its text is:] It was six men of Indostan / To learning much inclined, / Who went to see the Elephant / (Though all of them were blind), / That each by observation / Might satisfy his mind.

XML comes from SGML (Standard Generalized Markup Language). Based on IBM's GML (Goldfarb, et al.); ISO standard since 1986 (ISO 8879); used for large-scale document management (Boeing 747 user's manual); expensive and complex to implement; not Web-friendly (no "well-formed" SGML); too many options (e.g., tag minimization).

XML, HTML, & XHTML. HTML: a display-oriented, SGML-based scheme for making Web pages; syntax & allowed elements (semantics) are fixed. XML: a set of rules for defining markup schemes; the element set is fully extensible; the syntax is fixed. XHTML: HTML modified to be XML-compliant (not just SGML-compliant).

Markup languages compared. XML syntax is stricter than HTML or SGML: must explicitly close all elements; attribute values must be enclosed in quotes; all markup is case-sensitive. XML & SGML: no fixed tags, no predefined style; both are extensible. Fixed elements (HTML) vs. rules (XML, SGML). HTML elements describe how to present content; XML elements can describe the content itself.
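
The three strictness rules above can be checked mechanically. The sketch below uses the parser in Python's standard `xml.etree.ElementTree` as a well-formedness test; the helper name `is_well_formed` and the sample strings are illustrative, not taken from the presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each failing sample would be tolerated by a typical HTML browser.
samples = {
    "closed and quoted": '<p class="note">Hello<br/></p>',
    "unclosed element": '<p class="note">Hello<br></p>',
    "unquoted attribute": "<p class=note>Hello</p>",
    "case mismatch": "<P>Hello</p>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Only the first sample parses; the other three each break exactly one of the rules listed on the slide.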

A different XML for every community. XML is a set of rules used for defining & encoding intellectual structures. XML is extensible & customizable: its greatest strength, and its greatest weakness. HTML was invented by physicists. What if it had been lawyers, or teachers, or bureaucrats, or librarians, or …?

Terminology: document instance; document class; Document Type Definition (DTD), or schema; well-formed XML; valid XML; stylesheets; XML transformations; Document Object Model (DOM).

2. What's it good for? Discussion points: smarter documents; full text; metadata; machine-to-machine interactions.

Smarter documents. Standards-based. Facilitates: search & discovery; precise, field-specific searching; interoperability & normalization; complex transformations; linking between and within texts; reuse of documents and fragments.

Smarter documents [figure not captured in transcript]

Full text. Electronic Text Center (U of VA Library): originally SGML, now also XML and eBooks; 70,000 texts; 350,000 related images; 37,000 visits to the collection per day. Open eBook Forum: international trade & standards organization; goal: establish specifications & standards for e-publishing.

Using XML for full text. No inherent presentation information; requires CSS in XML-aware browsers, or XSLT to transform to XHTML, or XSL-FO to reformat for presentation. Techniques for including non-text content vary by application. XML can be verbose. Most standard full-text schemas are complex.

Metadata. XML schemas exist for a range of metadata standards: Encoded Archival Description (EAD); MARC 21 XML (also MODS); Metadata Encoding & Transmission Standard (METS); Dublin Core variants: Open Archives Initiative (OAI), National Science Digital Library (NSDL); Resource Description Framework (RDF).

Using XML for metadata. Consistency in applying a schema: optional versus required elements; consistent use of elements; granularity & depth of information. XML schemas still evolving: attributes versus elements; mixing namespaces; schema languages; philosophical issues.

Machine-to-machine interactions. Web services: facilitating machine-to-machine communications via XML; Simple Object Access Protocol (SOAP); XML Protocol Working Group. Semantic Web: abstract representation of data on the Web. XML and databases.

3. How does it work? In XML, there's content and there's markup. Markup: elements; attributes; comments; processing instructions. Content: entities; encoded (Unicode) characters.

Elements. Elements are markup that enclose content. Content models: parsed character data only; child elements only; mixed. [Example markup lost in transcript; surviving text: Cole, T]
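
The slide's example markup is missing from the transcript. As a rough illustration of the three content models it names, the sketch below builds one element of each kind and classifies them. The element names and the helper `content_model` are assumptions made for this example; only the text "Cole, T" appears in the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical record with one element per content model.
record = ET.fromstring(
    "<record>"
    "<author>Cole, T</author>"                                  # character data only
    "<name><last>Cole</last><first>T</first></name>"            # child elements only
    "<note>Written by <author>Cole, T</author> in 2002</note>"  # mixed
    "</record>"
)


def content_model(elem):
    """Classify an element by what it directly contains."""
    has_text = bool((elem.text or "").strip()) or any(
        (child.tail or "").strip() for child in elem
    )
    has_children = len(elem) > 0
    if has_text and has_children:
        return "mixed"
    if has_children:
        return "child elements only"
    return "character data only"  # also covers empty elements in this sketch


models = {child.tag: content_model(child) for child in record}
for tag, model in models.items():
    print(f"{tag}: {model}")
```

In a real application the allowed content model for each element would be declared in a DTD or schema rather than inferred from one instance.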

Attributes. Associate a name-value pair with an element. Can be used to embellish content, or to associate added content with an element. [Example markup lost in transcript; surviving text: Cole, T]
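
Since the attribute example is missing from the transcript, here is a small stand-in using Python's `xml.etree.ElementTree`. The attribute names `role` and `id` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical attributes; the slide's own attribute names were lost.
author = ET.fromstring('<author role="editor" id="a1">Cole, T</author>')

# Attributes are name-value pairs attached to the element, separate from its content.
print(author.attrib)
role = author.get("role")
missing = author.get("affiliation", "unknown")  # default for an absent attribute

print(role, missing, author.text)
```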

Comments. Human-readable annotations; can be inserted anywhere after headers; not part of the document structure; usually ignored by XML parsers; do not have to be passed to the application.

Processing instructions. Machine-readable & application-specific; must be passed through by XML parsers. The XML declaration looks like a special PI (strictly, the XML specification defines it as a separate construct); when present, it must be the very first thing in the file.
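
To make the difference between comments and processing instructions concrete, the sketch below parses a small document with Python's `xml.dom.minidom` and lists the top-level nodes it reports. The stylesheet file name `book.css` and the `book` element are placeholders.

```python
from xml.dom import minidom

# The xml-stylesheet PI is a real W3C convention; the href is a placeholder.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="book.css"?>
<!-- A human-readable note; not part of the document structure -->
<book><title>XML Is Easy</title></book>
"""

doc = minidom.parseString(DOC)

kinds = []
for node in doc.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        kinds.append(("pi", node.target))
    elif node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data.strip()))
    elif node.nodeType == node.ELEMENT_NODE:
        kinds.append(("element", node.tagName))

for kind in kinds:
    print(kind)
```

The XML declaration does not show up as a PI node here, which is consistent with it being handled separately by the parser.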

Entities. Placeholders for internal or external content: a single character, a string of text, or external content (images, audio, etc.). Implementation specifics may vary. Examples: &copyright; is replaced by ©; &pic; is replaced by a graphic image.
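
The entity declarations behind the slide's examples are not in the transcript. A minimal sketch of the internal-entity case, assuming declarations in the document's internal DTD subset, might look like this; the `notice` element and the `publisher` entity are invented, and the external `&pic;` case is not shown.

```python
import xml.etree.ElementTree as ET

# Internal entities declared in the DOCTYPE's internal subset (assumed form).
DOC = """<?xml version="1.0"?>
<!DOCTYPE notice [
  <!ENTITY copyright "&#169;">
  <!ENTITY publisher "CDP Press">
]>
<notice>&copyright; 2002 &publisher;</notice>
"""

notice = ET.fromstring(DOC)

# The parser substitutes each entity reference with its replacement text.
print(notice.text)
```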

Character encoding issues. XML parsers must accept UTF-8 & UTF-16. They must also accept numeric character references, decimal (&#nnnn;) or hexadecimal (&#xhhhh;). MARC-8 encodings must be converted to Unicode for use in XML.
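
As a quick check of the encoding points above, this sketch parses the same character written four ways: as a decimal reference, as a hexadecimal reference, as UTF-8 bytes, and as UTF-16 bytes. The `name` element and its content are illustrative only.

```python
import xml.etree.ElementTree as ET

# Numeric character references: decimal and hexadecimal forms of U+00E9.
decimal = ET.fromstring("<name>Andr&#233;</name>").text
hexadecimal = ET.fromstring("<name>Andr&#xE9;</name>").text

# The same character supplied directly, in the two encodings every parser must accept.
utf8_bytes = "<name>André</name>".encode("utf-8")
utf16_bytes = (
    '<?xml version="1.0" encoding="UTF-16"?><name>André</name>'.encode("utf-16")
)
from_utf8 = ET.fromstring(utf8_bytes).text
from_utf16 = ET.fromstring(utf16_bytes).text

print(decimal, hexadecimal, from_utf8, from_utf16)
```

All four forms yield the same Unicode string once parsed.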

4. The infrastructure of XML. Required to make it work: DTDs & schemas for defining document classes; reusing & integrating schemas (using namespaces); stylesheets for presentation & transformation; standards for linking, querying, & pointing; programming standards.

Defining document classes. Formal descriptions of document structure: set expectations; maximize reusability; enforce business rules. Options: DTDs; XML Schema; Schematron; RELAX NG.

Document Type Definitions (DTD). Legacy from SGML; part of the XML standard. [Example DTD lost in transcript]

XML schema language. New in XML; uses XML syntax; supports datatyping; richer and more complex …

Alternatives: Schematron & RELAX NG. Schematron: based on XPath (XSLT); doesn't support datatyping as well; supports additional content models; may become an ISO standard. RELAX NG: returns some of the power of SGML DTDs to XML (mixed and unordered content); uses datatyping from the XML Schema spec; does not support inheritance; developed by an OASIS Technical Committee chaired by James Clark.

Namespaces. Qualify element and attribute names; allow modularization of schemas; mix and match elements from multiple schemas in document instances; import or include from one XML Schema into another …
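
A short sketch of mixing elements from two vocabularies in one document instance follows. The Dublin Core element-set namespace URI is the published one; the default namespace `http://example.org/local`, the `record` and `shelf` elements, and the sample values are placeholders.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
LOCAL = "http://example.org/local"       # placeholder local vocabulary

DOC = f"""\
<record xmlns="{LOCAL}" xmlns:dc="{DC}">
  <dc:title>XML Is Easy</dc:title>
  <dc:creator>Cole, T</dc:creator>
  <shelf>QA76.76</shelf>
</record>
"""

record = ET.fromstring(DOC)

# Prefixes are local shorthand; the namespace URI is what qualifies a name.
ns = {"dc": DC, "local": LOCAL}
title = record.findtext("dc:title", namespaces=ns)
shelf = record.findtext("local:shelf", namespaces=ns)
tags = [child.tag for child in record]

print(title, shelf)
print(tags)
```

ElementTree reports each tag as `{namespace-uri}localname`, which shows that two elements with the same local name from different schemas would not collide.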

XML & Cascading Style Sheets. Attach styling instructions directly to XML files. Supported by newest browsers: IE5+, Mozilla, Opera. Can style but not rearrange elements: block or inline style; bold, italic, underline, font, color, etc.; margins, positioning; generated content (browser support not good). Example rule: front author {color:red; font-weight:bold; font-family:serif;}

XSLT: transforming stylesheets. Language for transforming XML documents into HTML, text, or other XML documents. Supported in new browsers (IE5+, Mozilla; not Opera). Usually applied on the server or in batch mode. Valuable for interoperability or reusability.

XSL-FO (formatting objects). Another styling language; similar to CSS, but includes the power of XSLT to rearrange the document. Syntax is entirely XML. Not currently supported in browsers (but there are tools for use on the server or in batch mode). [Example markup lost in transcript; surviving text: Author: Cole, T]

XPath, XPointer, & XLink. XPath: allows addressing of parts of an XML document; used in XSLT, XPointer, and XQuery. XPointer (working draft): used as a fragment id in an XML URI reference. XLink: creates and describes extended or simple links between resources; used for HTML-style hrefs or imgs, tables of contents, etc. [Example markup lost in transcript; surviving text: Cole, T]
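
To give a feel for XPath-style addressing, the sketch below uses the limited XPath subset built into Python's `xml.etree.ElementTree` (not a full XPath implementation). The document, its element names, and the second chapter title are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names and the second chapter are made up.
book = ET.fromstring(
    "<book>"
    "<front><title>XML Is Easy</title>"
    "<author>Cole, T</author><author>Habing, T</author></front>"
    "<chapter id='c1'><chaptitle>First Was SGML</chaptitle></chapter>"
    "<chapter id='c2'><chaptitle>Then Came XML</chaptitle></chapter>"
    "</book>"
)

# Descendant search: every author anywhere in the document.
all_authors = [a.text for a in book.findall(".//author")]

# Positional predicate: the second author inside front (positions start at 1).
second_author = book.find("./front/author[2]").text

# Attribute predicate: the title of the chapter whose id is 'c2'.
c2_title = book.find(".//chapter[@id='c2']/chaptitle").text

print(all_authors, second_author, c2_title)
```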

XQuery (XML query language). Treat an XML document or collection of documents as a database; equivalent to SQL SELECT statements, only for XML. Some support in XML databases (but working draft only).

Programming standards. "Platform- and language-neutral interfaces that allow programs and scripts to dynamically access and update the content, structure, and style of XML documents." Document Object Model (DOM): object-based; better for complex documents; high memory usage, slower; documents can be updated. Simple API for XML (SAX): event-based; better for simple documents; low memory usage, faster; documents cannot be updated.
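
The contrast between the two interfaces can be sketched with Python's standard `xml.dom.minidom` and `xml.sax` modules. The sample document and the handler class `ParaCounter` are illustrative names, not part of the presentation.

```python
import xml.sax
from xml.dom import minidom

DOC = "<book><chapter><para>One</para><para>Two</para></chapter></book>"

# DOM: the whole tree is built in memory, so it can be navigated and updated.
dom = minidom.parseString(DOC)
paras = dom.getElementsByTagName("para")
paras[0].firstChild.data = "Uno"  # modify the first paragraph in place
updated = dom.documentElement.toxml()


# SAX: the parser fires events as it reads; nothing is kept unless we keep it.
class ParaCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "para":
            self.count += 1


counter = ParaCounter()
xml.sax.parseString(DOC.encode("utf-8"), counter)

print(updated)
print(counter.count)
```

The DOM half changes the document and serializes it again; the SAX half only observes a stream of events, which is why it needs little memory but cannot update the document.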

Other XML-related standards: XBase; XForms; XML Encryption; XML Signature; many more …

5. Using XML on the Web. Case studies: Illinois DLI / D-Lib Test Suite projects; Illinois OAI metadata harvesting.

Illinois DLI / D-Lib Test Suite. Funded by NSF, DARPA, NASA; funded by CNRI (D-Lib Test Suite). Large-scale testbed, distributed repository: 60,000+ articles; 50+ journal titles; all XML. Partners: AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier.

Illinois DLI / D-Lib Test Suite. Used metalanguages (SGML, XML) & transformations; searched both full text & surrogates (metadata); investigated rendering & presentation issues; dynamic metadata for normalization, linking; participated in DOI-X (CrossRef precursor project).

Illinois DLI architecture [figure not captured in transcript]

Illinois OAI metadata harvesting. Funded by Mellon Foundation ( ); funded by IMLS (2002 – 05). Objective: use the OAI protocol for metadata harvesting to create a Web portal to resources. Build a harvesting and search service; investigate the viability of searching OAI-harvested resources; explore advanced search, indexing, and display; document user needs & usage patterns; identify critical issues and best practices for using OAI-PMH with cultural heritage material.

Illinois project: lessons learned. XML well-suited to the task: standards-based approach easy to implement; harvesting performance/scalability encouraging; XML facilitates schema crosswalks, normalization. Still need to demonstrate aggregation benefit: searching across heterogeneous metadata; metadata aggregation and normalization issues; search scalability & performance issues; need for interoperability best practices.

6. Implementation issues & costs. Discussion points. Tools: range of options; specialized tools expensive; good technical resources on the Web. Significant upfront investment: initial training is the greatest expense; ongoing costs moderate. Need to be clear about objectives. Similar to what you went through to get on the Web.

XML authoring tools. XML editors: XMetaL (Corel/SoftQuad); Epic Editor (Arbortext); TurboXML (Tibco Extensibility). Standard office tools: WordPerfect 2002 (Corel); Microsoft Office XP; OpenOffice. Plain text editors.

Other XML tools. Validating parsers & transformation tools: MSXML (Microsoft); Xerces, Xalan (Apache Software Foundation); XSV (U. of Edinburgh). Document management & database tools: Tamino (Software AG); XMLCanon/Developer (Tibco/Extensibility); DLXS/XPAT (U. of Michigan/OpenText). XML-aware browsers.

XML resources on the Web: World Wide Web Consortium; OASIS; Microsoft Developer Network; Sun Microsystems; Apache XML Project; XML.COM (O'Reilly); XML.ORG (OASIS); ZVON.ORG.

Need to acquire expertise. Turnkey XML solutions are of limited utility. Can start with well-formed XML; for real utility, need to understand schemas. Stylesheet expertise required to customize the UI: CSS if users are limited to XML-aware browsers; XSLT + CSS for browser neutrality; XSLT also required for crosswalks and refreshes. Outsourcing is an option for certain applications. Analogous to WWW & HTML four years ago.

Ongoing costs. Underlying technology now reasonably stable: non-proprietary standards, now 4 years old; parsers, validators, & transformation tools stable. If the initial design meets long-term needs, ongoing maintenance costs will be minimal. Changes to schemas, presentation layer, or workflow can be costly: a small schema change can require major retrospective changes in documents & stylesheets. Work hard to identify necessary schema changes as quickly as possible.

Final thoughts. Core XML technologies are stable & mature; ancillary standards are still evolving. Best way to learn is by doing: start with a small project. Long-term benefit potential is great: archival refreshes generally easier, less frequent; extensible & powerful; facilitates interoperability now & in the future. Requires an initial investment of time and resources.

Contact information. Presentation; presenters. [Links not captured in transcript]