Published by Selena Beller. Modified over 10 years ago.
1
XML Basics
Wednesday May 12, 1999
SD99
Copyright 1999 Elliotte Rusty Harold
elharo@metalab.unc.edu
http://metalab.unc.edu/xml/slides/
2
What is XML?
Extensible Markup Language
A syntax for documents
A Meta-Markup Language
A Structural and Semantic language, not a formatting language
Not just for Web pages
3
XML is a Meta Markup Language
Not like HTML, troff, LaTeX
Make up the tags you need as you need them
The tags you create can be documented in a Document Type Definition (DTD)
A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
4
XML describes structure and semantics, not formatting
XML documents form a tree
Element and attribute names reflect the kind of the element
Formatting can be added with a style sheet
5
A Song Description in HTML
Hot Cop
by Jacques Morali, Henri Belolo, and Victor Willis
Producer: Jacques Morali
Publisher: PolyGram Records
Length: 6:20
Written: 1978
Artist: Village People
6
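A Song Description in XML
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>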
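To make the "XML documents form a tree" point concrete, here is a minimal Python sketch that parses a song description with the standard library's xml.etree.ElementTree and lists the children of the root. The markup is an assumption: the element names are taken from the style sheet rules on the following slide, and PRODUCER is inferred from the HTML version of the song.

```python
import xml.etree.ElementTree as ET

# Assumed markup: element names follow the style sheet (SONG, TITLE, COMPOSER, ...).
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""


def describe(xml_text):
    """Return the root name and a list of (child name, text) pairs."""
    root = ET.fromstring(xml_text)
    return root.tag, [(child.tag, child.text) for child in root]


root_name, fields = describe(SONG_XML)
# The element name alone tells a program which values are composers.
composers = [text for name, text in fields if name == "COMPOSER"]
print(root_name, composers)
```

Because the names carry the meaning, a program can pick out the composers without guessing from formatting, which is the contrast with the HTML version.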
7
Style Sheets provide formatting
SONG {display: block}
TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold}
COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic}
ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic}
PUBLISHER {display: block; font-size: 14pt; font-family: Times, Times New Roman, serif}
LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
8
Attaching style sheets to documents
Processing Instruction
Converter Program
9
What is XML used for?
Domain-Specific Markup Languages
Self-Describing Data
Interchange of Data Among Applications
Structured and Integrated Data
10
Domain-Specific Markup Languages
Non-proprietary format
Don't pay for what you don't use
11
Self-Describing Data
Much data is lost due to format problems
XML is very simple
XML is self-describing
XML is well documented
12
Judson McDaniel 21 Feb 1834 9 Dec 1905
13
Interchange of Data Among Applications
E-commerce
Syndication
14
Structured and Integrated Data
Can specify relationships between elements
Can assemble data from multiple sources
15
XML Applications
A specific markup language that uses the XML meta-syntax is called an XML application
Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas
16
Example XML Applications
Web Pages
Mathematical Equations
Music Notation
Vector Graphics
Metadata
and more…
17
Mathematical Markup Language
18
Channel Definition Format
Cafe con Leche
Books about XML
Trade shows and conferences about XML
Mailing Lists dedicated to XML
19
Classic Literature
The Complete Plays of Shakespeare
The Bible
The Koran
The Book of Mormon
20
Vector Graphics
Vector Markup Language (VML)
–Internet Explorer 5.0
–Microsoft Office 2000
Scalable Vector Graphics (SVG)
21
The Resource Description Framework (RDF)
Meta-data
Dublin Core
Better Web searching
22
An Example of RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/DC/">
Elliotte Rusty Harold
Cafe con Leche
23
XML for XML
XSL: The Extensible Stylesheet Language
DCD: The Document Content Description Schema Language
XLL: The Extensible Linking Language
24
XSL: The Extensible Stylesheet Language
XSL Transformations
XSL Formatting Objects
25
DCD: The Document Content Description Schema Language
Data Typing in XML is Weak
9
<ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" />
26
XLL: The Extensible Linking Language
Any element can be a link
Links can be bi-directional
Links can be separated from the documents they connect
7
27
File Formats, In-house applications, and other behind the scenes uses
Microsoft Office 2000
Federal Express Web API
Netscape What's Related
28
Hello XML
<FOO>Hello XML!</FOO>
Plain ASCII or UTF-8 text
.xml is standard file extension
Any standard text editor will work
29
The XML Declaration
version attribute
–required
–always has the value 1.0
standalone attribute
–yes
–no
encoding attribute
–UTF-8
–8859_1
–etc.
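As a small illustration of the three declaration attributes, the following Python sketch uses the standard library's expat binding, whose XmlDeclHandler callback reports them. The sample document is hypothetical and is not taken from the slides.

```python
import xml.parsers.expat


def read_declaration(document):
    """Return the version, encoding, and standalone values of the XML declaration."""
    seen = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no), or -1 (not specified).
        seen["version"] = version
        seen["encoding"] = encoding
        seen["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)
    return seen


# Hypothetical sample document, for illustration only.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<FOO>Hello XML!</FOO>'
declaration = read_declaration(SAMPLE)
print(declaration)
```

The handler is only called when a declaration is present, so an empty result means the document started directly with its root element.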
30
The FOO element
Start tag: <FOO>
Contents: "Hello XML!"
End tag: </FOO>
<FOO>Hello XML!</FOO>
31
greeting.xml
<GREETING>Hello XML!</GREETING>
32
Style sheets
Separate from the XML document
Different Languages
–Cascading Style Sheets Level 1 (CSS1)
  Internet Explorer 5.0
  Mozilla 5.0
–Cascading Style Sheets Level 2 (CSS2)
  Internet Explorer 5 (partial)
  Mozilla 5.0 (partial)
–Extensible Stylesheet Language (XSL)
  Internet Explorer 5.0 (older draft, buggy)
  LotusXSL, XT, Other non-browser converters
–Document Style Semantics and Specification Language (DSSSL)
  Jade
33
xml-stylesheet
Style sheets are attached via an xml-stylesheet processing instruction in the prolog
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>
–type attribute has the value text/css or text/xsl
–href attribute is a URL to the stylesheet, possibly relative
Can also use non-browser converters like XT, LotusXSL, and Jade
34
greeting.css
GREETING {display: block; font-size: 24pt; font-weight: bold}
35
A larger example: Baseball statistics
Examine the data
Design a vocabulary for the data
Write a style sheet
36
Sample statistics
http://cbs.sportsline.com/u/baseball/mlb/stats.htm
37
Organizing the Data
XML documents are trees.
XML elements contain other elements as well as text
Within these limits there's more than one way to organize the data
–Hierarchically
–Relationally
–Objects
38
What is the Root Element?
The League?
The Season?
A custom Document element?
39
The Root Element
Choose SEASON for the root element
Everything else will be a descendant of SEASON
This is not the only possible choice
40
What are the Immediate Children of the Root?
Leagues?
Teams?
Players?
Games?
41
Child Elements 1998
42
White space in XML is not especially significant 1998
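A brief Python sketch of what "not especially significant" can mean in practice: a parser still passes the white space through, but an application that strips it sees the same data in a compact and an indented document. The SEASON and YEAR markup below is an assumed reconstruction, since the slide's own markup did not survive.

```python
import xml.etree.ElementTree as ET

# Assumed markup: the same data written compactly and with generous white space.
COMPACT = "<SEASON><YEAR>1998</YEAR></SEASON>"
SPACED = """<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>"""


def normalized(xml_text):
    """Return (tag, stripped text) pairs for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(element.tag, (element.text or "").strip()) for element in root.iter()]


same = normalized(COMPACT) == normalized(SPACED)
print(same)
```

Whether stripping is appropriate depends on the vocabulary; for text where spacing matters, an application would keep it.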
43
Leagues
Major league baseball is divided into two leagues
Each league has
–a name
–three divisions
44
Divisions
Each division has
–name
–4-6 teams
45
Teams
Each team has
–Name
–City
–Players
46
Player Data
Each player has
–First name
–Last name
–Position
–Statistics
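The hierarchy described on the last few slides (season, leagues, divisions, teams, players) can be sketched in Python with xml.etree.ElementTree. Most element names are taken from the style sheets and DTD fragments later in the deck; DIVISION is an assumed name, and all of the data values are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_season(year, leagues):
    """Build SEASON > LEAGUE > DIVISION > TEAM > PLAYER from nested dictionaries."""
    season = ET.Element("SEASON")
    ET.SubElement(season, "YEAR").text = str(year)
    for league_name, divisions in leagues.items():
        league = ET.SubElement(season, "LEAGUE")
        ET.SubElement(league, "LEAGUE_NAME").text = league_name
        for division_name, teams in divisions.items():
            division = ET.SubElement(league, "DIVISION")  # assumed element name
            ET.SubElement(division, "DIVISION_NAME").text = division_name
            for (city, name), players in teams.items():
                team = ET.SubElement(division, "TEAM")
                ET.SubElement(team, "TEAM_CITY").text = city
                ET.SubElement(team, "TEAM_NAME").text = name
                for given, surname, position in players:
                    player = ET.SubElement(team, "PLAYER")
                    ET.SubElement(player, "GIVEN_NAME").text = given
                    ET.SubElement(player, "SURNAME").text = surname
                    ET.SubElement(player, "POSITION").text = position
    return season


# Placeholder data, invented purely for illustration.
season = build_season(1998, {
    "Example League": {
        "East": {("Example City", "Examples"): [("Pat", "Placeholder", "Shortstop")]},
    },
})
xml_text = ET.tostring(season, encoding="unicode")
print(xml_text)
```

This follows the hierarchical organization; a relational or object-style layout would use the same API with a different nesting.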
47
Player Batting Statistics
G    Games Played
GS   Games Started
AB   At Bats
R    Runs
H    Hits
2B   Doubles
3B   Triples
HR   Home Runs
RBI  Runs Batted In
SB   Stolen Bases
CS   Caught Stealing
SH   Sacrifice Hits
SF   Sacrifice Flies
Err  Errors
PB   Pitcher Balked
BB   Base on Balls (Walks)
SO   Strike Outs
HBP  Hit By Pitch
48
What does a player look like?
Long names vs. short names
49
The Complete 1998 Major League Long version
50
A Style Sheet
1998shortstats.xml
baseballstats.css
styled1998shortstats.xml
51
Cascading Style Sheets
Partially supported by Mozilla and IE 5.0
Full W3C Recommendation
52
The Default Rule
Not every element needs a rule
The root element should be at least display: block
SEASON {font-size: 14pt; background-color: white; color: black; display: block}
53
A style rule for the YEAR element
Make it look like a title
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
54
Style Rules for Division and League Names
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
55
Alternate Style Rules for Division and League Names
LEAGUE_NAME, DIVISION_NAME {display: block; text-align: center; font-weight: bold}
LEAGUE_NAME {font-size: 28pt}
DIVISION_NAME {font-size: 24pt}
56
Style Rules for Teams
Team name and Team city must be one title
Must be inline elements
Previous and following must be block elements
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM, PLAYER {display: block}
57
Style Rules for Players
TEAM {display: table}
TEAM_CITY {display: table-caption}
TEAM_NAME {display: table-caption}
PLAYER {display: table-row}
SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS, DOUBLES, TRIPLES, HOME_RUNS, RBI, STEALS, CAUGHT_STEALING, SACRIFICE_HITS, SACRIFICE_FLIES, ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}
58
Finished Style Sheet
SEASON {font-size: 14pt; background-color: white; color: black; display: block}
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM {display: block}
PLAYER {display: block}
59
Possible Extensions
There should be captions like "RBI" or "At Bats".
Derived numbers like batting averages are not included.
The titles are short. E.g. "1998" instead of "1998 Major League Baseball".
The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice.
Pitcher stats should be separated from batter stats.
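Derived numbers such as a batting average (hits divided by at bats) can be computed by a program that reads the document rather than stored in it. The following Python sketch assumes the long element names used in the style sheet (HITS, AT_BATS); the player and the numbers are made up for illustration.

```python
import xml.etree.ElementTree as ET


def batting_average(player):
    """Return HITS / AT_BATS for a PLAYER element, or None if it cannot be computed."""
    hits = player.findtext("HITS")
    at_bats = player.findtext("AT_BATS")
    if hits is None or at_bats is None or int(at_bats) == 0:
        return None
    return int(hits) / int(at_bats)


# Made-up player and numbers, for illustration only.
PLAYER_XML = """<PLAYER>
  <GIVEN_NAME>Pat</GIVEN_NAME>
  <SURNAME>Placeholder</SURNAME>
  <AT_BATS>400</AT_BATS>
  <HITS>100</HITS>
</PLAYER>"""

average = batting_average(ET.fromstring(PLAYER_XML))
print(f"{average:.3f}")
```

Returning None for a missing or zero AT_BATS value is a design choice of this sketch; pitchers, for example, may have no batting statistics at all.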
60
Possible Solutions
CSS Level 2
XSL
XSL + JavaScript
61
Well-formedness Rules
Open and close all tags
Empty tags end with />
There is a unique root element
Elements may not overlap
Attribute values are quoted
< and & are only used to start tags and entities
Only the five predefined entity references are used
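One way to see these rules in action is to hand small documents to a parser and note which ones it rejects. The Python sketch below uses xml.etree.ElementTree, which raises ParseError on input that is not well-formed. The snippets are hypothetical, one per rule, and are not taken from the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets: one well-formed document, then one violation per rule.
SAMPLES = {
    "ok": '<DOC><P ALIGN="CENTER">Hello</P><BR/></DOC>',
    "unclosed tag": "<DOC><P>Hello</DOC>",
    "two roots": "<DOC>one</DOC><DOC>two</DOC>",
    "overlap": "<DOC><B>bold <I>both</B> italic</I></DOC>",
    "unquoted attribute": "<DOC><P ALIGN=CENTER>Hello</P></DOC>",
    "raw ampersand": "<DOC>O'Reilly & Associates</DOC>",
    "undeclared entity": "<DOC>&copy; 1999</DOC>",
}

results = {name: is_well_formed(text) for name, text in SAMPLES.items()}
print(results)
```

This checks well-formedness only. It says nothing about validity against a DTD.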
62
Open and close all tags
63
Empty tags end with />
An empty element is written with a trailing /> instead of a plain >
Web browsers deal inconsistently with these
Can use a start tag immediately followed by an end tag instead
64
There is a unique root element
One element completely contains all other elements of the document
This is HTML in HTML files
XML Declaration is not an element
Hello XML!
65
Elements may not overlap
If an element contains a start tag for an element, it must also contain the corresponding end tag
Empty elements may appear anywhere
Every non-root element has a parent element
66
Attribute values are quoted Good: – Bad: –
67
< and & are only used to start tags and entities
Good: O'Reilly &amp; Associates
Bad: O'Reilly & Associates
Good:
–for (int i = 0; i &lt;= args.length; i++ ) {
Bad:
–for (int i = 0; i <= args.length; i++ ) {
68
Only the five predefined entity references are used
Good:
–&amp;
–&lt;
–&gt;
–&quot;
–&apos;
Bad:
–&copy;
–&reg;
–&tm;
–&alpha;
–&eacute;
–&nbsp;
–etc.
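When text is generated by a program, the escaping described on the last two slides does not have to be done by hand. This Python sketch uses xml.sax.saxutils.escape for & and <, and falls back to numeric character references for non-ASCII characters such as the copyright sign, which need no entity declaration. The helper name to_ascii_xml and the sample strings are illustrative assumptions.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text):
    """Escape markup characters, then turn non-ASCII characters into numeric references."""
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


publisher = escape("O'Reilly & Associates")
loop = escape("for (int i = 0; i <= args.length; i++ ) {")
# \u00a9 is the copyright sign and \u00e9 is e with an acute accent.
notice = to_ascii_xml("\u00a9 1999 Caf\u00e9 & Co.")

print(publisher)
print(loop)
print(notice)
```

By default escape leaves apostrophes and quotation marks alone, which is fine in element content; attribute values need the quote character in use to be escaped as well.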
69
DTDs and Validity
A Document Type Definition describes the elements and attributes that may appear in a document
Validation compares a particular document against a DTD
Well-formedness is a prerequisite for validity
70
What is a DTD?
A list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other
Internal vs. external DTDs
71
The importance of validation
Ensures that data is correct before feeding it into a program
Ensures that a format is followed
Establishes what must be supported
Not all documents need to be valid; sometimes well-formed is enough
72
A DTD for greeting.xml
greeting.xml:
<GREETING>Hello XML!</GREETING>
greeting.dtd:
<!ELEMENT GREETING (#PCDATA)>
73
Document Type Declarations
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>Hello XML!</GREETING>
specifies the root element
gives a URL for the DTD
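The two pieces of information carried by a document type declaration, the root element name and the DTD's URL, can be read back programmatically. This Python sketch uses the standard library's expat binding and its StartDoctypeDeclHandler callback. The sample document is an assumption modelled on the greeting example, and expat here only reports the declaration: it neither fetches greeting.dtd nor validates.

```python
import xml.parsers.expat


def read_doctype(document):
    """Return the root element name and system identifier from the DOCTYPE, if any."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["root"] = name
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found


# Assumed document: a GREETING element that points at greeting.dtd.
SAMPLE = b'''<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>Hello XML!</GREETING>'''

doctype = read_doctype(SAMPLE)
print(doctype)
```

A validating parser would go one step further and check the document against the declarations found at that URL.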
74
Invalid Documents
Valid: various random text but no markup
Invalid: anything else including various random text
–or various random text
75
Validating Tools
Command line programs like XJParse
Online validators
–http://www.stg.brown.edu/service/xmlvalid/
–http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html
Browsers
76
Element Declarations
Each element must be declared in an <!ELEMENT> declaration.
An <!ELEMENT> declaration gives the name and content model of the element
The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
77
Content Specifications
ANY
#PCDATA
Sequences
Choices
Mixed Content
Modifiers
Empty
78
ANY
<!ELEMENT SEASON ANY>
A SEASON can contain any child element and/or raw text (parsed character data)
79
#PCDATA Parsed Character Data; i.e. raw text, no markup
80
#PCDATA
Valid:
1999
99
1999 C.E.
The year of our Lord one thousand, nine hundred, and ninety-nine
Invalid:
January February March April May June July August September October November December
81
Child Elements
To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
82
Sequences Separate multiple required child elements with commas; e.g.
83
One or More Children +
84
Zero or More Children *
85
Zero or One Children ?
<!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES, GAMES_STARTED,
  AT_BATS?, RUNS?, HITS?, DOUBLES?, TRIPLES?, HOME_RUNS?, RBI?, STEALS?,
  CAUGHT_STEALING?, SACRIFICE_HITS?, SACRIFICE_FLIES?, ERRORS?, WALKS?,
  STRUCK_OUT?, HIT_BY_PITCH?, WINS?, LOSSES?, SAVES?, COMPLETE_GAMES?,
  SHUT_OUTS?, ERA?, INNINGS?, EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?,
  BALK?, WALKED_BATTER?, STRUCK_OUT_BATTER?)>
86
Finished DTD
87
Choices
88
Grouping With Parentheses
Parentheses combine several elements into a single element.
A parenthesized element can be nested inside other parentheses in place of a single element.
The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.
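Because content models are described as a regular expression-like grammar, a toy Python sketch can make the analogy concrete by translating one model by hand into a real regular expression over child element names. This is not how a validating parser works and it is not a DTD validator. The model (TEAM_CITY, TEAM_NAME, PLAYER*) is a hypothetical example and the function name children_match is an assumption.

```python
import re
import xml.etree.ElementTree as ET


def children_match(element, model_regex):
    """Return True if the element's child names, in order, match model_regex."""
    # Each child name is followed by one space so that names cannot run together.
    names = "".join(child.tag + " " for child in element)
    return re.fullmatch(model_regex, names) is not None


# Hand translation of the hypothetical model (TEAM_CITY, TEAM_NAME, PLAYER*):
# a sequence of two required children, then zero or more PLAYER children.
TEAM_MODEL = r"TEAM_CITY TEAM_NAME (PLAYER )*"

with_players = ET.fromstring(
    "<TEAM><TEAM_CITY>A</TEAM_CITY><TEAM_NAME>B</TEAM_NAME><PLAYER/><PLAYER/></TEAM>")
no_players = ET.fromstring(
    "<TEAM><TEAM_CITY>A</TEAM_CITY><TEAM_NAME>B</TEAM_NAME></TEAM>")
wrong_order = ET.fromstring(
    "<TEAM><TEAM_NAME>B</TEAM_NAME><TEAM_CITY>A</TEAM_CITY></TEAM>")

checks = [children_match(team, TEAM_MODEL)
          for team in (with_players, no_players, wrong_order)]
print(checks)
```

The comma of a sequence becomes simple concatenation, and the +, *, and ? suffixes map onto the same regular expression operators. The sketch ignores text content, so it cannot express #PCDATA or mixed content.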
89
Mixed Content
Both #PCDATA and child elements in a choice
#PCDATA must come first
#PCDATA cannot be used in a sequence
90
Empty elements
91
Internal DTDs
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
92
Internal DTD Subsets
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [ ]>
<GREETING>Hello XML!</GREETING>
Internal declarations override external declarations
93
Programming with XML
Java works best
C, Perl, Python etc. can also be used
Unicode support is the biggest issue
94
SAX, the Simple API for XML
Event based
Programs can plug in different parsers
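As a minimal sketch of the event-based style, the following Python example uses the standard library's xml.sax module: the parser calls startElement on a handler each time an element begins, and the handler simply counts. The input document and the class name ElementCounter are illustrative assumptions.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once for every start tag it encounters.
        self.counts[name] = self.counts.get(name, 0) + 1


# Hypothetical input document.
SAMPLE = b"<TEAM><PLAYER/><PLAYER/><PLAYER/></TEAM>"

counter = ElementCounter()
xml.sax.parseString(SAMPLE, counter)
print(counter.counts)
```

The handler never sees the whole document at once, which is why this style suits long documents such as a full season of statistics.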
95
The Document Object Model (DOM)
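In contrast to event-based parsing, a DOM parser builds the whole tree in memory and lets a program navigate it. A minimal Python sketch using the standard library's xml.dom.minidom follows; the input document is a hypothetical greeting.

```python
from xml.dom import minidom

# Hypothetical input document.
SAMPLE = "<GREETING>Hello XML!</GREETING>"

document = minidom.parseString(SAMPLE)
root = document.documentElement  # the root element node
text_node = root.firstChild      # its only child, a text node

summary = (root.tagName, text_node.nodeType == text_node.TEXT_NODE, text_node.data)
print(summary)
```

Even this one-element document has three nodes: the document itself, the GREETING element, and the text inside it.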
96
To Learn More: Books
XML: Extensible Markup Language
–IDG Books 1998
–ISBN 0-76453-199-9
The XML Bible
–IDG Books 1999
–ISBN 0-76453-236-7
97
Questions?