Published by Sara Lewis. Modified over 9 years ago.
1
XML and XSL: A report on the workshop given by Shaoping Moss on October 16, 2004. Presented by ASIS&T members Caryn Anderson, Prairie Clayton & Kara Schwartz at Simmons College, November 1, 2004 …with additional examples from a real-life project
2
Topics discussed
SGML, XML, and HTML
XML and XSL Basics
XML in Libraries and Academics
XML in Future Web Development
Slide content courtesy of Shaoping Moss.
3
Markup Languages
Address the structure of a document.
Identify different components of the document.
Convey information to software that will allow it to:
–Index the data for searching.
–Render the data.
–Transform the data.
SGML, XML, and HTML are all markup languages.
Slide content courtesy of Shaoping Moss.
4
Document, Structure, and Format
A document is:
–“A record which contains information, originally an inscribed or written record but now considered to include any format in which information might be held (e.g. map, manuscript, tape, video, software).” (International Encyclopedia of Information and Library Science)
–A collection of small elements, which can be headings, subheadings, paragraphs, quotations, etc.
Structure vs Format
–Structure is about the content of the document.
–Format is about the way a document looks.
Slide content courtesy of Shaoping Moss.
5
What is SGML? Stands for Standard Generalized Markup Language. Initiated by Charles Goldfarb at IBM in the 1960s. Adopted as a standard of the International Organization for Standardization (ISO 8879) in 1986. Slide content courtesy of Shaoping Moss.
6
SGML and Its Subdivisions
SGML is composed of tag-set building rules.
SGML has given rise to many tag sets built on those rules:
–HTML and XML.
–CALS for defense.
–BOEING for commercial airlines.
–C-H for publishing.
–OED for the Oxford English Dictionary.
–TEI guidelines for the Text Encoding Initiative.
–EAD for Encoded Archival Description.
Slide content courtesy of Shaoping Moss.
7
HTML Development HTML stands for Hypertext Markup Language. HTML was developed by Tim Berners-Lee at CERN, a physics lab near Geneva, Switzerland, around 1990. Its simplicity contributed to the rapid growth of the World Wide Web in the 1990s. HTML version 4 came out in 1997. XHTML 1.0, which reformulates HTML 4 in XML, followed in 2000. Slide content courtesy of Shaoping Moss.
8
HTML Problems Lenient HTML coding rules make pages harder for browsers to handle consistently. Tags are predefined in HTML. Format and content are mixed, so content is hard to reuse. Slide content courtesy of Shaoping Moss.
9
What is XML? XML is a Web standard published by the World Wide Web Consortium (W3C) in 1998. XML stands for eXtensible Markup Language. XML was designed to describe data. XML tags are not predefined; authors define their own. XML separates format from content and semantic structure. Data encoded in XML can function much like a traditional database. XML content can be output in many formats, such as XHTML, text, Word documents, PDF, etc. Slide content courtesy of Shaoping Moss.
10
The Display of the Document
My First XML
Chapter 1: Introduction to XML
What is HTML?
What is XML?
Chapter 2: XML Syntax
Elements must have a closing tag
Elements must be properly nested
Slide content courtesy of Shaoping Moss.
11
An HTML Document Slide content courtesy of Shaoping Moss. An HTML document describes the book: … My First XML Introduction to XML What is HTML? What is XML? XML Syntax Elements must have a closing tag. Elements must be properly nested. …
12
An XML Document Slide content courtesy of Shaoping Moss, 2004 An XML document describes the book: … My First XML Introduction to XML What is HTML? What is XML? XML Syntax Elements must have a closing tag. Elements must be properly nested. …
13
HTML Elements/Tags
An HTML document describes the book: … My First XML Introduction to XML What is HTML? What is XML? XML Syntax Elements must have a closing tag. Elements must be properly nested. …
HTML elements/tags are:
–defined by the HTML standard
–always the same
–usable in any order
Original slide content courtesy of Shaoping Moss.
14
XML Elements/Tags
An XML document describes the book: … My First XML Introduction to XML What is HTML? What is XML? XML Syntax Elements must have a closing tag. Elements must be properly nested. …
XML elements/tags are:
–defined by users/groups (DTD/Schema)
–different for each DTD/Schema
–hierarchical (tree structure)
Original slide content courtesy of Shaoping Moss.
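A minimal sketch of how such a book could be marked up as a tree. The element names (book, title, chapter, section) are assumptions chosen for illustration and are not taken from the slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element names, chosen only to show the tree structure. -->
<book>
  <title>My First XML</title>
  <chapter>
    <title>Introduction to XML</title>
    <section>What is HTML?</section>
    <section>What is XML?</section>
  </chapter>
  <chapter>
    <title>XML Syntax</title>
    <section>Elements must have a closing tag.</section>
    <section>Elements must be properly nested.</section>
  </chapter>
</book>
```

Each element sits inside its parent, so the names describe what the data is rather than how it should look.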
15
XML is flexible and extensible
An XML document describes the book for a different user group: … My First XML Introduction to XML What is HTML? What is XML? XML Syntax Element Rules Elements must have a closing tag. Elements must be properly nested. …
Callouts:
–A different element name is used instead of “book”.
–The structure is extended to accommodate the greater detail of “part”, “section”, and “paragraph”.
Original slide content courtesy of Shaoping Moss.
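One way the extended structure might look, shown for a single part of the book. The names part, section, and paragraph come from the slide's callouts; the root name manual and the use of title are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- "manual" and "title" are hypothetical; part, section and paragraph
     are the finer-grained containers named on the slide. -->
<manual>
  <title>My First XML</title>
  <part>
    <title>XML Syntax</title>
    <section>
      <title>Element Rules</title>
      <paragraph>Elements must have a closing tag.</paragraph>
      <paragraph>Elements must be properly nested.</paragraph>
    </section>
  </part>
</manual>
```

A different user group can define a different vocabulary for the same content, and the document stays well-formed XML.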
16
Differences between HTML and XML
XML is not a replacement for HTML.
XML and HTML were designed with different goals:
- XML was designed to describe data and to focus on what data is.
- HTML was designed to display data and to focus on how data looks.
HTML structure and tags are very loose, while XML structure and tags are strict:
- XML documents must be well-formed.
- XML elements must be properly nested.
- All XML elements must be closed.
- Tag names are case-sensitive, so opening and closing tags must match exactly.
Slide content courtesy of Shaoping Moss.
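The strictness rules can be illustrated with a small well-formed document. This is only a sketch; the element names (note, to, attachment, body, emphasis) are made up for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical element names, used only to illustrate the rules. -->
<note>
  <!-- Every element is closed; an empty element uses the self-closing form. -->
  <to>Reader</to>
  <attachment/>
  <!-- Elements are properly nested: <emphasis> closes before <body> does.
       The overlapping form <body><emphasis>text</body></emphasis> is rejected. -->
  <body>Tags must be <emphasis>closed in reverse order</emphasis> of opening.</body>
  <!-- Names are case-sensitive: <Body>text</body> does not match and is rejected. -->
</note>
```

An XML parser stops at the first violation, whereas HTML browsers typically try to recover from this kind of mistake.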
17
Differences
Content
- HTML: held in generic containers.
- XML: held in specific containers that describe what the data is.
Format
- HTML: in the default format of the content tag OR as defined by a Cascading Style Sheet (internal or external).
- XML: XSLT files define the format of each section (e.g. font, color, size); multiple XSLTs can serve the same XML.
Selection & Organization
- HTML: all content is always included (no option to easily select or suppress content; you must manually change the document), and content is only displayed in the order written (to change the order you must manually change the document).
- XML: XSLT selects content and determines its display order; multiple XSLTs can serve the same XML (one to produce just a book title list, one to display full text, one for citations, etc.).
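As a rough sketch of the "just a book title list" case, a stylesheet like the following would pull only the titles out of an XML file. It assumes a root element booklist containing book elements that each hold a booktitle; those names are assumptions for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: the element names booklist, book and booktitle are assumed. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Emit plain text rather than HTML. -->
  <xsl:output method="text"/>

  <!-- Start at the document root and visit every book. -->
  <xsl:template match="/">
    <xsl:for-each select="booklist/book">
      <!-- Write only the title, followed by a line break. -->
      <xsl:value-of select="booktitle"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

A second stylesheet with different select expressions could produce the full-text or citation view from the same unchanged XML file.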
18
Differences
Analogy
- HTML: an address list in a plain Word document.
- XML: an address list in a database or mail-merge data file.
What you can get
- HTML: one document listing your contacts, with all the information that you have for each person, in the order you typed it.
- XML: friends & family with full addresses for holiday cards; an e-mail list of just professional contacts for announcing a new product; special formatting of the whole list for better display on a PDA; and so on, all from the SAME XML document.
19
How to Build an XML file family
1. Establish the Document Type Definition (DTD) or Schema.
2. Write a well-formed XML document that holds your data in the containers established by your DTD/Schema.
3. Validate your XML document to make sure it conforms to your DTD/Schema.
4. Build as many different XSL documents as you need to select data from your XML file, organize it the way you want it to appear, and format it so it looks the way you want.
Now you can link your XML file to whichever XSL you want to get the kind of display you want at any given time.
20
The XML family unit of files and languages
DTD or Schema: the organizational chart for the data. File types: .dtd (DTDs), .xsd (schemas, which are themselves XML). Used for validation during creation.
XML: where the data is held. File type: .xml
XSL: instructions for using XML data and displaying it. File type: .xsl
Languages used in XSLT documents during creation:
–XSLT to select data from the .xml file and format it
–XPath to access certain spots in the .xml file
–XSL-FO for specifying formatting semantics
–HTML for formatting
Requesting http://www.mysite.org/myfile.xml produces the WEB PAGE:
1. Calls the .xml file
2. Calls .xsl for display instructions
3. Looks in .xml for content
4. Returns content to .xsl
5. Displays content to browser
21
The DTD or Schema
<!ELEMENT booktitle (#PCDATA)>
+ means the element must appear at least once and can repeat as many times as you want.
The DTD establishes the hierarchy of elements/tags.
Original file content courtesy of Shaoping Moss.
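A sketch of what a fuller DTD for a book list could look like, written as an internal subset so that the declarations and a one-book instance fit in a single file. Only booktitle is attested on the slide; the other element names and their order are assumptions based on the book data used elsewhere in the presentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Only booktitle appears on the slide; every other name here is assumed. -->
<!DOCTYPE booklist [
  <!-- A booklist holds one or more books (+ means "one or more"). -->
  <!ELEMENT booklist (book+)>
  <!-- Each book has these children, in this order; author may repeat. -->
  <!ELEMENT book (booktitle, author+, country, publisher, price, year)>
  <!-- #PCDATA means the element contains plain text. -->
  <!ELEMENT booktitle (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT country (#PCDATA)>
  <!ELEMENT publisher (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<booklist>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country>
    <publisher>John Wiley and Sons</publisher>
    <price>30.00</price>
    <year>2000</year>
  </book>
</booklist>
```

A validating parser would reject a book that omits one of the required children or lists them in a different order.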
22
The XML document
Book 1: HTML and XHTML: The Definitive Guide; Chuck Musciano; Bill Kennedy; USA; O’Reilly; 19.95; 2000
Book 2: XHTML 1.0 Language Sourcebook; Ian S. Graham; USA; John Wiley and Sons; 30.00; 2000
Callouts on the file header: “This is the DTD being used.” “This is the XSL being used.”
Original file content courtesy of Shaoping Moss.
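A sketch of how this data file might be written, with header lines naming the DTD and the XSL. The file names booklist.dtd and booklist.xsl and all element names except booktitle are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The file names booklist.dtd and booklist.xsl are hypothetical. -->
<!-- This line says which DTD is being used. -->
<!DOCTYPE booklist SYSTEM "booklist.dtd">
<!-- This line says which XSL is being used. -->
<?xml-stylesheet type="text/xsl" href="booklist.xsl"?>
<booklist>
  <book>
    <booktitle>HTML and XHTML: The Definitive Guide</booktitle>
    <author>Chuck Musciano</author>
    <author>Bill Kennedy</author>
    <country>USA</country>
    <publisher>O'Reilly</publisher>
    <price>19.95</price>
    <year>2000</year>
  </book>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country>
    <publisher>John Wiley and Sons</publisher>
    <price>30.00</price>
    <year>2000</year>
  </book>
</booklist>
```

Pointing the xml-stylesheet line at a different .xsl file changes the display without touching the data.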
23
The XSL document
The template produces a page titled “My Book Collection” with the table columns Title, Author, Publisher, Country, and Price.
–“xsl:template” is XSLT for “use the template below”.
–“match” is XPath for “link to” or “start with”, and “/” means the document root, which holds the root element (“booklist” in this case).
–“xsl:for-each” with the “select” instruction is XSLT for “select from each of the books in the booklist”.
–“xsl:sort” with the “select” instruction is XSLT for “sort by publisher”.
–“xsl:if” with the “test” instruction is XSLT for “only those books where the year is later than 1995”.
–“xsl:value-of” with the “select” instruction is XSLT for “use the data from this element”.
–The rest is basic HTML for the template.
–You must close your XSLT commands.
–You must close the HTML tags of your template.
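Put together, the instructions described on this slide could be sketched as the stylesheet below. It is a reconstruction under assumptions, not the original file: the element names other than booktitle, the h2 heading, and the table border are guesses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names are assumed to match the book list data. -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- match="/" starts the template at the document root. -->
  <xsl:template match="/">
    <html>
      <body>
        <h2>My Book Collection</h2>
        <table border="1">
          <tr>
            <th>Title</th>
            <th>Author</th>
            <th>Publisher</th>
            <th>Country</th>
            <th>Price</th>
          </tr>
          <!-- Visit each book in the list. -->
          <xsl:for-each select="booklist/book">
            <!-- Sort the books by publisher; xsl:sort must come first. -->
            <xsl:sort select="publisher"/>
            <!-- Keep only books from after 1995; the > sign is escaped as &gt; here. -->
            <xsl:if test="year &gt; 1995">
              <tr>
                <td><xsl:value-of select="booktitle"/></td>
                <td>
                  <!-- List every author, separated by commas. -->
                  <xsl:for-each select="author">
                    <xsl:value-of select="."/>
                    <xsl:if test="position() != last()">, </xsl:if>
                  </xsl:for-each>
                </td>
                <td><xsl:value-of select="publisher"/></td>
                <td><xsl:value-of select="country"/></td>
                <td><xsl:value-of select="price"/></td>
              </tr>
            </xsl:if>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The inner for-each over author is an addition for books with more than one author; in XSLT 1.0 a plain value-of on author would output only the first one.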
24
The Web Page Original file content courtesy of Shaoping Moss.
25
Done! – not so hard Logical Flexible Extensible Interoperable!!
26
XML in Libraries
Use XML to map MARC to MARC XML, HTML, or MODS formats (MARC XML Conversion Stylesheets).
Use XML to improve searching of archival finding aids and to catalog Web sites: Five College Archives & Manuscript Collections. http://asteria.fivecolleges.edu/index.html
XML-based eScholarship. http://escholarship.cdlib.org/
Use XML for interlibrary loan.
XML-based database systems.
Slide content courtesy of Shaoping Moss.
27
XML in Academics Text Encoding Initiative (TEI) http://www.tei-c.org/ Initially launched in 1987, TEI is an international and interdisciplinary standard for encoding, keeping, and analyzing the textual content and structure of digital texts. This standard is designed for use with a broad range of text types, especially in the humanities. It is widely used in libraries and archives and by publishers and researchers for online research and teaching and for the storage and exchange of large and small text collections. Since 1987, TEI projects have mushroomed in all humanities disciplines, including language, literature, history, classics, social science and computer science. Slide content courtesy of Shaoping Moss.
28
TEI projects
Women Writers Project. http://www.wwp.brown.edu
Perseus Digital Library. http://www.perseus.tufts.edu/
Early American Fiction Collection. http://etext.lib.virginia.edu/eaf/pubindex.html
American Memory Project: Historical Collections for the National Digital Library. http://lcweb2.loc.gov/ammem/ammemhome.html
The Newton Papers Project. http://www.newtonproject.ic.ac.uk
Slide content courtesy of Shaoping Moss.
29
XML is Going to Be Everywhere
TEI guidelines for the Text Encoding Initiative http://www.tei-c.org/Guidelines2/index.html
EAD for Encoded Archival Description http://www.loc.gov/ead/
The Dublin Core Metadata Initiative (DCMI) http://dublincore.org/
MARC XML: MARC 21 XML Schema http://www.loc.gov/standards/marcxml/
MODS XML: Metadata Object Description Schema http://www.loc.gov/standards/mods
Slide content courtesy of Shaoping Moss.
30
XML is Going to Be Everywhere
Resource Description Framework (RDF)
Information and Content Exchange (ICE)
Online Information Exchange (ONIX)
Metadata for Images in XML (MIX)
XML/EDI (Electronic Data Interchange)
Bioinformatic Sequence Markup Language (BSML)
Mathematical Markup Language (MathML)
Slide content courtesy of Shaoping Moss.
31
XML in Future Web Development XML is a cross-platform, software and hardware independent tool for transmitting information. XML will be as important to the future of the Web as HTML has been to the foundation of the Web. XML will become the most common tool for all data manipulation and data transmission. Every serious Web technology is now expected to define its relationship to XML. Slide content courtesy of Shaoping Moss.
32
XML in Future Web Development “Every serious Web technology is now expected to define its relationship to XML.” - Catherine Ebenezer in Trends in Integrated Library Systems. Slide content courtesy of Shaoping Moss.
33
Shaoping Moss
Information Technology Consultant, Research and Instructional Support, Mount Holyoke College
Email: smoss@mtholyoke.edu
Phone: 413.538.3034
Fax: 413.538.3112
We are grateful to Shaoping Moss for being such an excellent instructor and giving us permission to use her slides and materials in this presentation.
34
So this XML stuff is rad and all but could I see why I’d want to learn it and not just an encoding set like EAD?
35
Well, suppose you’ve got a batch of metadata on your hands. Not just any metadata, but some weird set of information that can’t really be shoehorned into your pal MARC 21. You need some way of organizing the metadata. It would be nice if you could make the metadata look all pretty and whatnot, while you’re at it.
36
Here’s where XML comes in!
1. Get your metadata together, having done all the sexy stuff like data dictionary creation first
2. Define labels for everything
3. Match related terms, including subordinates
4. Define your rules (Y can only appear after X, and if you have X and Y, you must have Z, but Q is optional, etc.)
5. You’ve pretty much just made up a schema right there
6. Wait, what was that about making it pretty?
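Rules like the ones in step 4 map fairly directly onto a DTD content model. A rough sketch, using made-up element names (record, x, y, z, q) in place of real metadata labels, and assuming X itself is required:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Placeholder names record, x, y, z, q stand in for real metadata labels. -->
<!DOCTYPE record [
  <!-- x comes first; y may follow only together with z; q is optional (?). -->
  <!ELEMENT record (x, (y, z)?, q?)>
  <!ELEMENT x (#PCDATA)>
  <!ELEMENT y (#PCDATA)>
  <!ELEMENT z (#PCDATA)>
  <!ELEMENT q (#PCDATA)>
]>
<record>
  <x>first value</x>
  <y>second value</y>
  <z>required because y is present</z>
</record>
```

Here the comma fixes the order, the parenthesized group ties y and z together, and the question mark makes a piece optional.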
37
Oh, right, it should be attractive. Well, then you just start playing with XSL. Specifically, you tell the XSL to go look at the plain ol’ stylesheet you’ve adapted from a thousand other HTML pages.
38
So then you’ve got this.
39
Hey, wait. I thought you said this was all cross-platform and cross-browser. How come this isn’t parsing in my browser? And how do I search individual records? You mean I have to hand encode every record?
Well, yes. You can write your own parser, export encoded records from a database, or create a search engine if you like. You’ll just need more than a semester’s worth of practice to do it.