XML Basics Wednesday May 12, 1999 SD99 Copyright 1999 Elliotte Rusty Harold

1 XML Basics Wednesday May 12, 1999 SD99 Copyright 1999 Elliotte Rusty Harold elharo@metalab.unc.edu http://metalab.unc.edu/xml/slides/

2 What is XML? Extensible Markup Language A syntax for documents A Meta-Markup Language A Structural and Semantic language, not a formatting language Not just for Web pages

3 XML is a Meta Markup Language Not like HTML, troff, LaTeX Make up the tags you need as you need them The tags you create can be documented in a Document Type Definition (DTD) A meta syntax for domain-specific markup languages like MusicML, MathML, and CML

4 XML describes structure and semantics, not formatting XML documents form a tree Element and attribute names reflect the kind of the element Formatting can be added with a style sheet

5 A Song Description in HTML Hot Cop by Jacques Morali, Henri Belolo, and Victor Willis Producer: Jacques Morali Publisher: PolyGram Records Length: 6:20 Written: 1978 Artist: Village People

6 A Song Description in XML Hot Cop Jacques Morali Henri Belolo Victor Willis Jacques Morali PolyGram Records 6:20 1978 Village People
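
The tags on this slide did not survive in the transcript. As a minimal sketch of what the markup might look like, the Python snippet below embeds one possible version and parses it. The element names SONG, TITLE, COMPOSER, ARTIST, PUBLISHER, LENGTH, and YEAR come from the style sheet on the next slide; PRODUCER is an assumed name.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: PRODUCER is an assumed element name;
# the other names appear in the style sheet for this example.
SONG_XML = """\
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
"""

song = ET.fromstring(SONG_XML)

# Because the markup names each piece of data, a program can ask for it by name.
composers = [c.text for c in song.findall("COMPOSER")]
print(song.findtext("TITLE"), "by", ", ".join(composers))
```

Unlike the HTML version, nothing here says how the song should look; it only says what each piece of text is.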

7 Style Sheets provide formatting SONG {display: block} TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold} COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic} ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic} PUBLISHER {display: block; font-size: 14pt; font-family: Times, Times New Roman, serif} LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt} YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}

8 Attaching style sheets to documents Processing Instruction Converter Program

9 What is XML used for? Domain-Specific Markup Languages Self-Describing Data Interchange of Data Among Applications Structured and Integrated Data

10 Domain-Specific Markup Languages Non-proprietary format Don't pay for what you don't use

11 Self-Describing Data Much data is lost due to format problems XML is very simple XML is self-describing XML is well documented

12 Judson McDaniel 21 Feb 1834 9 Dec 1905

13 Interchange of Data Among Applications E-commerce Syndication

14 Structured and Integrated Data Can specify relationships between elements Can assemble data from multiple sources

15 XML Applications A specific markup language that uses the XML meta-syntax is called an XML application Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas

16 Example XML Applications Web Pages Mathematical Equations Music Notation Vector Graphics Metadata and more…

17 Mathematical Markup Language

18 Channel Definition Format Cafe con Leche Books about XML Trade shows and conferences about XML Mailing Lists dedicated to XML

19 Classic Literature The Complete Plays of Shakespeare The Bible The Koran The Book of Mormon

20 Vector Graphics Vector Markup Language (VML) –Internet Explorer 5.0 –Microsoft Office 2000 Scalable Vector Graphics (SVG)

21 The Resource Description Framework (RDF) Meta-data Dublin Core Better Web searching

22 An Example of RDF <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/DC/"> Elliotte Rusty Harold Cafe con Leche

23 XML for XML XSL: The Extensible Stylesheet Language DCD: The Document Content Description Schema Language XLL: The Extensible Linking Language

24 XSL: The Extensible Stylesheet Language XSL Transformations XSL Formatting Objects

25 DCD: The Document Content Description Schema Language Data Typing in XML is Weak 9 <ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" />

26 XLL: The Extensible Linking Language Any element can be a link Links can be bi-directional Links can be separated from the documents they connect 7

27 File Formats, In-house applications, and other behind the scenes uses Microsoft Office 2000 Federal Express Web API Netscape What's Related

28 Hello XML Hello XML! Plain ASCII or UTF-8 text .xml is the standard file extension Any standard text editor will work

29 The XML Declaration version attribute –required –always has the value 1.0 standalone attribute –yes –no encoding attribute –UTF-8 –8859_1 –etc.

30 The FOO element Start tag Contents "Hello XML!" End tag Hello XML!

31 greeting.xml Hello XML!
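
The markup of greeting.xml was stripped from this transcript. A minimal sketch, assuming the root element is named GREETING (the name used by greeting.css and the document type declarations later in the deck; the preceding slide calls its element FOO) and assuming a declaration with version and standalone attributes:

```python
import xml.etree.ElementTree as ET

# Assumed contents of greeting.xml: an XML declaration followed by one element.
GREETING_XML = (
    '<?xml version="1.0" standalone="yes"?>\n'
    "<GREETING>\n"
    "Hello XML!\n"
    "</GREETING>\n"
)

root = ET.fromstring(GREETING_XML)

# The start tag, contents, and end tag together make up the single root element.
greeting_text = root.text.strip()
print(root.tag, greeting_text)
```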

32 Style sheets Separate from the XML document Different Languages –Cascading Style Sheets Level 1 (CSS1) Internet Explorer 5.0 Mozilla 5.0 –Cascading Style Sheets Level 2 (CSS2) Internet Explorer 5 (partial) Mozilla 5.0 (partial) –Extensible Style Language (XSL) Internet Explorer 5.0 (older draft, buggy) LotusXSL, XT, Other non-browser converters –Document Style and Semantics Language (DSSSL) Jade

33 xml-stylesheet Style sheets are attached via an xml-stylesheet processing instruction in the prolog Hello XML! –type attribute has the value text/css or text/xsl –href attribute is a URL to the stylesheet, possibly relative Can also use non-browser converters like XT, LotusXSL, and Jade
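
The processing instruction itself is missing from the transcript. The sketch below shows one plausible form, assuming the style sheet is the greeting.css file from the next slide, and uses Python's minidom to show that the instruction sits in the prolog, outside the root element.

```python
from xml.dom import minidom

# Assumed document: the href value points at greeting.css from the next slide.
DOC = """<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>
"""

doc = minidom.parseString(DOC)

# Processing instructions in the prolog are children of the document node,
# not of the root element.
pis = [n for n in doc.childNodes if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]
style_pi = pis[0]
print(style_pi.target)
print(style_pi.data)
```

Note that type and href look like attributes but are simply text inside the processing instruction; the parser hands them over as one string.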

34 greeting.css GREETING {display: block; font-size: 24pt; font-weight: bold}

35 A larger example: Baseball statistics Examine the data Design a vocabulary for the data Write a style sheet

36 Sample statistics http://cbs.sportsline.com/u/baseball/mlb/stats.htm

37 Organizing the Data XML documents are trees. XML elements contain other elements as well as text Within these limits there's more than one way to organize the data –Hierarchically –Relationally –Objects

38 What is the Root Element The League? The Season? A custom Document element?

39 The Root Element Choose SEASON for the root element Everything else will be a descendant of SEASON This is not the only possible choice

40 What are the Immediate Children of The root? Leagues? Teams? Players? Games?

41 Child Elements 1998

42 White space in XML is not especially significant 1998
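
A small sketch of the point about white space, assuming the SEASON and YEAR elements are laid out as below (the original markup is not in the transcript). The parser still reports the extra white space as text, so it is the application that decides to ignore it; here that is done with `strip()`.

```python
import xml.etree.ElementTree as ET

compact = "<SEASON><YEAR>1998</YEAR></SEASON>"

indented = """<SEASON>
  <YEAR>
    1998
  </YEAR>
</SEASON>"""

def year_of(xml_text):
    """Return the YEAR content with surrounding white space removed."""
    return ET.fromstring(xml_text).findtext("YEAR").strip()

print(year_of(compact), year_of(indented))
```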

43 Leagues Major league baseball is divided into two leagues Each league has –a name –three divisions

44 Divisions Each division has –name – 4-6 teams

45 Teams Each team has –Name –City –Players

46 Player Data Each player has –First name –Last name –Position –Statistics
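
The hierarchy described on the last few slides can be sketched by building the tree in code. The element names follow the style sheet and DTD slides later in the deck, except DIVISION and LEAGUE, which are assumed by analogy with DIVISION_NAME and LEAGUE_NAME. All text values other than the year are placeholders, not real statistics.

```python
import xml.etree.ElementTree as ET

# SEASON is the root; everything else is a descendant of it.
season = ET.Element("SEASON")
ET.SubElement(season, "YEAR").text = "1998"

league = ET.SubElement(season, "LEAGUE")
ET.SubElement(league, "LEAGUE_NAME").text = "Example League"  # placeholder

division = ET.SubElement(league, "DIVISION")
ET.SubElement(division, "DIVISION_NAME").text = "East"  # placeholder

team = ET.SubElement(division, "TEAM")
ET.SubElement(team, "TEAM_CITY").text = "Example City"  # placeholder
ET.SubElement(team, "TEAM_NAME").text = "Examples"  # placeholder

player = ET.SubElement(team, "PLAYER")
ET.SubElement(player, "GIVEN_NAME").text = "Jane"  # placeholder
ET.SubElement(player, "SURNAME").text = "Doe"  # placeholder
ET.SubElement(player, "POSITION").text = "Shortstop"  # placeholder

print(ET.tostring(season, encoding="unicode"))
```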

47 Player Batting Statistics G Games Played GS Games Started AB At Bats R Runs H Hits 2B Doubles 3B Triples HR Home Runs RBI Runs Batted In SB Stolen Bases CS Caught Stealing SH Sacrifice Hits SF Sacrifice Flies Err Errors PB Pitcher Balked BB Base on Balls (Walks) SO Strike Outs HBP Hit By Pitch

48 What does a player look like Long names vs. short names

49 The Complete 1998 Major League Long version

50 A Style Sheet 1998shortstats.xml baseballstats.css styled1998shortstats.xml

51 Cascading Style Sheets Partially supported by Mozilla and IE 5.0 Full W3C Recommendation

52 The Default Rule Not every element needs a rule The root element should be at least display: block SEASON { font-size: 14pt; background-color: white; color: black; display: block}

53 A style rule for the YEAR element Make it look like a title YEAR { display: block; font-size: 32pt; font-weight: bold; text-align: center}

54 Style Rules for Division and League Names LEAGUE_NAME { display: block; text-align: center; font-size: 28pt; font-weight: bold} DIVISION_NAME { display: block; text-align: center; font-size: 24pt; font-weight: bold}

55 Alternate Style Rules for Division and League Names LEAGUE_NAME, DIVISION_NAME { display: block; text-align: center; font-weight: bold} LEAGUE_NAME {font-size: 28pt } DIVISION_NAME {font-size: 24pt }

56 Style Rules for Teams Team name and Team city must be one title Must be inline elements Previous and following must be block elements TEAM_CITY { font-size: 20pt; font-weight: bold; font-style: italic} TEAM_NAME { font-size: 20pt; font-weight: bold; font-style: italic} TEAM, PLAYER {display: block}

57 Style Rules for Players TEAM {display: table} TEAM_CITY {display: table-caption} TEAM_NAME {display: table-caption} PLAYER {display: table-row} SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS, DOUBLES, TRIPLES, HOME_RUNS, RBI, STEALS, CAUGHT_STEALING, SACRIFICE_HITS, SACRIFICE_FLIES, ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}

58 Finished Style Sheet SEASON {font-size: 14pt; background-color: white; color: black; display: block} YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center} LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold} DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold} TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic} TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic} TEAM {display: block} PLAYER {display: block}

59 Possible Extensions There should be captions like "RBI" or "At Bats". Derived numbers like batting averages are not included. The titles are short. E.g. "1998" instead of "1998 Major League Baseball". The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice. Pitcher stats should be separated from batter stats.

60 Possible Solutions CSS Level 2 XSL XSL + JavaScript

61 Well-formedness Rules Open and close all tags Empty tags end with /> There is a unique root element Elements may not overlap Attribute values are quoted < and & are only used to start tags and entities Only the five predefined entity references are used
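
Several of these rules can be checked mechanically, since any conforming parser must reject a document that breaks them. A minimal sketch using Python's standard parser; the sample documents and the helper name `is_well_formed` are illustrative, not from the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "open and close all tags": "<GREETING>Hello XML!</GREETING>",
    "missing end tag": "<GREETING>Hello XML!",
    "two root elements": "<A/><B/>",
    "overlapping elements": "<A><B>text</A></B>",
    "unquoted attribute value": "<A b=1/>",
    "undeclared entity reference": "<A>&copy;</A>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks one rule from the list.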

62 Open and close all tags

63 Empty tags end with /> <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG> Web browsers deal inconsistently with these Can use <BR></BR>, <HR></HR>, and <IMG></IMG> instead

64 There is a unique root element One element completely contains all other elements of the document This is HTML in HTML files XML Declaration is not an element Hello XML!

65 Elements may not overlap If an element contains a start tag for an element, it must also contain the corresponding end tag Empty elements may appear anywhere Every non root element has a parent element

66 Attribute values are quoted Good: – Bad: –

67 < and & are only used to start tags and entities Good: O'Reilly &amp; Associates Bad: O'Reilly & Associates Good: – for (int i = 0; i &lt;= args.length; i++ ) { Bad: – for (int i = 0; i <= args.length; i++ ) {
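
When text is generated by a program, the rule on this slide is usually met by escaping the text before placing it in the document. A minimal sketch using `xml.sax.saxutils.escape` from Python's standard library; the EXAMPLE, PUBLISHER, and CODE element names are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

publisher = "O'Reilly & Associates"
code = "for (int i = 0; i <= args.length; i++ ) {"

# escape() replaces &, <, and > with the predefined entity references.
escaped_publisher = escape(publisher)
escaped_code = escape(code)
print(escaped_publisher)
print(escaped_code)

# Hypothetical wrapper elements, used only to show the round trip.
doc = (
    "<EXAMPLE>"
    f"<PUBLISHER>{escaped_publisher}</PUBLISHER>"
    f"<CODE>{escaped_code}</CODE>"
    "</EXAMPLE>"
)
root = ET.fromstring(doc)
```

The parser turns the entity references back into the original characters, so the application sees the same text it started with.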

68 Only the five predefined entity references are used Good: –&amp; –&lt; –&gt; –&quot; –&apos; Bad: –&copy; –&reg; –&tm; –&alpha; –&eacute; –&nbsp; –etc.

69 DTDs and Validity A Document Type Definition describes the elements and attributes that may appear in a document Validation compares a particular document against a DTD Well-formedness is a prerequisite for validity

70 What is a DTD? a list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other internal vs. external DTDs

71 The importance of validation Ensures that data is correct before feeding it into a program Ensure that a format is followed Establish what must be supported Not all documents need to be valid; sometimes well-formed is enough

72 A DTD for greeting.xml greeting.xml: Hello XML! greeting.dtd :

73 Document Type Declarations Hello XML! specifies the root element gives a URL for the DTD

74 Invalid Documents Valid: various random text but no markup Invalid: anything else including various random text –or various random text

75 Validating Tools Command line programs like XJParse Online validators –http://www.stg.brown.edu/service/xmlvalid/ –http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html Browsers

76 Element Declarations Each tag must be declared in a declaration. A declaration gives the name and content model of the element The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

77 Content Specifications ANY #PCDATA Sequences Choices Mixed Content Modifiers Empty

78 ANY A SEASON can contain any child element and/or raw text (parsed character data)

79 #PCDATA Parsed Character Data; i.e. raw text, no markup

80 #PCDATA Valid: 1999 99 1999 C.E. The year of our Lord one thousand, nine hundred, and ninety-nine Invalid: January February March April May June July August September October November December

81 Child Elements To declare that a LEAGUE element must have a LEAGUE_NAME child:

82 Sequences Separate multiple required child elements with commas; e.g.

83 One or More Children +

84 Zero or More Children *

85 Zero or One Children ? <!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES, GAMES_STARTED, AT_BATS?, RUNS?, HITS?, DOUBLES?, TRIPLES?, HOME_RUNS?, RBI?, STEALS?, CAUGHT_STEALING?, SACRIFICE_HITS?, SACRIFICE_FLIES?, ERRORS?, WALKS?, STRUCK_OUT?, HIT_BY_PITCH?, WINS?, LOSSES?, SAVES?, COMPLETE_GAMES?, SHUT_OUTS?, ERA?, INNINGS?, EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?, BALK?,WALKED_BATTER?, STRUCK_OUT_BATTER?) >
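
Because content models follow a regular-expression-like grammar, a flat sequence with the +, *, and ? modifiers can be translated into an actual regular expression over the list of child element names. The sketch below is only an illustration of that idea, not a validator: it ignores choices, nested parentheses, and #PCDATA, and the function names are made up. The model is a shortened form of the PLAYER declaration above.

```python
import re

def model_to_regex(model):
    """Turn a flat sequence like "(A, B+, C?)" into a regex over child names."""
    parts = []
    for item in model.strip().strip("()").split(","):
        item = item.strip()
        modifier = item[-1] if item[-1] in "+*?" else ""
        name = item[:-1] if modifier else item
        # Each child name is followed by one space so names cannot run together.
        parts.append(f"(?:{re.escape(name)} ){modifier}")
    return re.compile("".join(parts))

def children_match(model, children):
    """Check a list of child element names against a flat content model."""
    joined = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(joined) is not None

# Shortened version of the PLAYER content model.
PLAYER_MODEL = "(GIVEN_NAME, SURNAME, POSITION, GAMES, GAMES_STARTED, AT_BATS?, RUNS?)"

required = ["GIVEN_NAME", "SURNAME", "POSITION", "GAMES", "GAMES_STARTED"]
print(children_match(PLAYER_MODEL, required))
print(children_match(PLAYER_MODEL, required + ["RUNS", "AT_BATS"]))
```

The first call succeeds because the optional children may be omitted; the second fails because a sequence fixes the order of the children.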

86 Finished DTD

87 Choices

88 Grouping With Parentheses Parentheses combine several elements into a single element. A parenthesized element can be nested inside other parentheses in place of a single element. The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.

89 Mixed Content Both #PCDATA and child elements in a choice #PCDATA must come first #PCDATA cannot be used in a sequence

90 Empty elements

91 Internal DTDs <!DOCTYPE GREETING [ ]> Hello XML!
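
The declaration between the square brackets is missing from the transcript. A minimal sketch, assuming the internal DTD declares GREETING as containing only parsed character data. Python's standard parser reads the document type declaration but does not validate against it, so this only shows where the DTD sits in the document.

```python
from xml.dom import minidom

# Assumed internal DTD: a single element declaration for GREETING.
DOC = """<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
"""

doc = minidom.parseString(DOC)

# The name in the document type declaration must match the root element.
print(doc.doctype.name)
print(doc.documentElement.tagName)
print(doc.documentElement.firstChild.data)
```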

92 Internal DTD Subsets <!DOCTYPE GREETING SYSTEM "greeting.dtd" [ ]> Hello XML! Internal declarations override external declarations

93 Programming with XML Java works best C, Perl, Python etc. can also be used Unicode support is the biggest issue

94 SAX, the Simple API for XML Event based Programs can plug in different parsers
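
A minimal sketch of the event-based style, using the SAX binding in Python's standard library rather than the Java one the talk has in mind. The parser calls `startElement` once per start tag; the handler class name and the sample document are made up for the illustration.

```python
import xml.sax
from collections import Counter

class ElementCounter(xml.sax.ContentHandler):
    """Count how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called by the parser as an event each time a start tag is read.
        self.counts[name] += 1

handler = ElementCounter()
xml.sax.parseString(
    b"<SEASON><YEAR>1998</YEAR><LEAGUE/><LEAGUE/></SEASON>", handler
)
print(handler.counts["LEAGUE"])
```

The handler never sees the whole tree at once, which is what lets the same handler code work with whichever parser is plugged in underneath.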

95 The Document Object Model (DOM)

96 To Learn More: Books XML: Extensible Markup Language –IDG Books 1998 –ISBN 0-76453-199-9 The XML Bible –IDG Books 1999 –ISBN 0-76453-236-7

97 Questions?

