CSE 6331 © Leonidas Fegaras XML1 Introduction to XML Leonidas Fegaras.


Traditional DB Applications
Typically business oriented
Large amount of data
Data is well-structured, normalized, with predefined schema
Large number of concurrent users (transactions)
Simple data, simple queries, and simple updates
Typically update intensive
Small transactions
High performance, high availability, scalability
Data integrity and security are of major importance
Good administrative support, nice GUIs

Document Applications
Human friendly: what-you-see-is-what-you-get paradigm
Focus on presentation
Information is divided into multiple small documents
Mostly static
Implicit structure: section, subsection, paragraph, etc.
Meta-data: title, author, date, indexing keywords, etc.
Content structure: form/layout, inter-relationships, references
Tagging: e.g., <p> for a new paragraph
Operations: retrieving, editing, spell-checking, printing, etc.
Information retrieval: keyword queries
  – most successful in web search engines (e.g., Google)

Internet Applications
Internet applications
  – use heterogeneous, complex, hierarchical, fast-evolving, unstructured/semistructured data
  – access mostly read-only data
  – need 100% availability
  – manage millions of users world-wide
  – have high-performance requirements
  – are concerned with security (encryption)
  – like to customize data in a personalized manner
  – expect to gain user’s trust for business-to-consumer transactions
Internet users choose speed and availability over correctness

Electronic Commerce
Currently, mostly business-to-business (B2B) rather than business-to-consumer (B2C) interactions
Focus on selling and buying:
  – Order management
  – Product catalogs
  – Product configuration
Sales and marketing
Education and training
Web services
Communities

Other Web Applications
Web services
  – Many standards: SOAP, WSDL, UDDI
Web integration
  – Heterogeneous data sources and types
  – Thousands of web-accessible data sources
  – Dynamic data
  – Data warehouses
Web publishing
  – Access different types of content from browsers (PDF, HTML, XML)
  – Structured, dynamic, customized/personalized content
  – Integration with application
  – Accessible via major gateways and search engines
Application integration
  – Transformation between different data formats (e.g., XML, HTML)
  – Integration of multiple applications

Current Internet Application Architectures
Architecture:
  – Server-Tier: relational databases and gateways to diverse data sources, such as files, OLE/DB, etc. Use of enterprise servers
  – Middle-Tier: provides data integration & distribution, query, etc. Consists of a web server and an application server
  – Client-Tier: mostly a web browser, may use CGI scripts or Java
Characteristics:
  – Customization is achieved at the server site (customer data in a database) with some data at the client site (cookies)
  – Load balancing is typically hardware based (multiple servers, DNS routers)

HTML
[Figure: annotated HTML source of a page containing the text “My Web Page”, “Introduction”, and “Look at this document”; labels: attribute name, attribute value, opening tag, closing tag, hypertext link]
It is very simple: human readable, can be edited by any editor
It reflects document presentation, not the semantics or structure of data
Universal: portable to any platform
HTML pages are connected through hypertext links
HTML pages can be located using web search engines

XML
XML (eXtensible Markup Language) is a textual language for representing and exchanging data on the web
It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification
Based on SGML
It was developed around 1996
It is called extensible because
  – it is not a fixed format like HTML (a single, predefined markup language)
  – it is actually a metalanguage (a language for describing other languages), which lets you design your own customized markup languages for limitless different types of documents

XML (cont.)
XML can be untyped (semistructured), but there are now standards for schema conformance
  – DTD
  – XML Schema
Without a schema, an XML document is well-formed if it satisfies simple syntactic constraints:
  – proper nesting of start and end tags
With a schema, an XML document is valid if its structure conforms to a DTD or an XML Schema

Example
Leonidas Fegaras (817)
Ramez Elmasri (817)

Why is XML so Popular?
It looks like HTML
  – simple, human-readable, easy to learn, universal
Flexible & extensible, since you can represent any kind of data
  – unlike HTML
HTML describes the presentation while XML describes the content
Precise
  – well-formed: properly nested XML tags
  – valid: its structure may conform to a DTD or an XML Schema
Supported by the W3C
  – trusted and adopted by industry
Many standards around XML: schemas, query languages, etc.

What Does XML Have to Do with Databases?
XML is an important standard for data representation and exchange, but we still need
  – to store and query large repositories of XML documents
  – data models and schema representations
  – query languages, data indexing, query optimizers
  – updates, view maintenance
  – concurrency, distribution, security, etc.
Example application:
  – an XML data repository distributed in a peer-to-peer network
  – answer queries, such as: find all books whose author is Smith and whose title contains the word “Web”
  – much like a web search engine, but for XML, and for more precise querying

XML Syntax
XML consists of tags and text
XML documents conform to the following grammar:
  XMLdocument ::= Pi* Element Pi*
  Element     ::= Stag (char | Pi | Element)* Etag
  Stag        ::= '<' Name Attributes '>'
  Etag        ::= '</' Name '>'
  Pi          ::= '<?' char* '?>'
  Attributes  ::= ( Name '=' String )*
  String      ::= '"' char* '"'
Tags come in pairs (an opening and a closing tag around content such as 8/25/2004) and must be properly nested
[Figure: one example of valid nesting and one of invalid nesting]
Text is bounded by tags
PCDATA: parsed character data, e.g., The Big Sleep 1935
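The nesting rule can be tried out with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the tag names movie, title, and year are assumptions chosen for illustration and do not come from the slides.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested: every inner element closes before its parent does.
nested = "<movie><title>The Big Sleep</title><year>1935</year></movie>"
# Improperly nested: <title> is closed after its parent <movie>.
crossed = "<movie><title>The Big Sleep</movie></title>"

print(is_well_formed(nested))
print(is_well_formed(crossed))
```

The parser only checks well-formedness here; it does not validate against a DTD or an XML Schema.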

XML Elements
An element is a segment of an XML document between an opening tag and the matching closing tag
  Ramez Elmasri (817)
An element may contain a mixture of sub-elements and PCDATA
  An element is a segment
An abbreviation: an element with empty content can be written as a single self-closing tag instead of an opening tag immediately followed by its closing tag
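Mixed content and the empty-element abbreviation can be seen in a short sketch with Python's xml.etree.ElementTree. The tag names p, b, and br are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with a sub-element.
para = ET.fromstring("<p>An <b>element</b> is a segment</p>")
bold = para[0]
print(repr(para.text))  # text before the first sub-element
print(repr(bold.text))  # text inside the sub-element
print(repr(bold.tail))  # text that follows the sub-element

# An empty element written in short and in long form parses to the same thing.
short_form = ET.fromstring("<br/>")
long_form = ET.fromstring("<br></br>")
same = ET.tostring(short_form) == ET.tostring(long_form)
print(same)
```

ElementTree stores the text after a sub-element in that sub-element's tail attribute, which is how it represents mixed content.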

Representing Data Using XML
Nesting tags can be used to express various structures, such as a record:
  Ramez Elmasri (817)
We can represent a list by using the same tag repeatedly:
  ...
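A record and a list can be built programmatically as a rough illustration. This sketch uses Python's xml.etree.ElementTree; the tag names people, person, name, and tel, the helper make_person, and the phone numbers are all placeholders, not data from the slides.

```python
import xml.etree.ElementTree as ET

def make_person(name: str, tel: str) -> ET.Element:
    """Build one record: a <person> element with one child per field."""
    person = ET.Element("person")
    ET.SubElement(person, "name").text = name
    ET.SubElement(person, "tel").text = tel
    return person

# A list is the same tag repeated under a common parent.
people = ET.Element("people")
people.append(make_person("Ramez Elmasri", "555-0100"))     # placeholder number
people.append(make_person("Leonidas Fegaras", "555-0101"))  # placeholder number

xml_text = ET.tostring(people, encoding="unicode")
names = [p.findtext("name") for p in people.findall("person")]
print(xml_text)
print(names)
```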

XML structure
XML:
  Ramez Elmasri (817)
is Lisp-like:
  (person (name “Ramez Elmasri”) (tel “(817)”) ( “ ”))
and tree-like:
  [Figure: tree with root person and children name and tel, with leaves Ramez Elmasri and (817)]
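The Lisp-like reading can be made concrete with a small recursive function. This is a sketch under assumptions: the function name to_sexpr is made up, the phone number is a placeholder, and attributes and tail text are ignored.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem: ET.Element) -> str:
    """Render an element as a Lisp-like list: (tag "text" child ...)."""
    parts = [elem.tag]
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    # Each child element becomes a nested list, which mirrors the tree shape.
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

person = ET.fromstring(
    "<person><name>Ramez Elmasri</name><tel>555-0100</tel></person>"
)
print(to_sexpr(person))  # (person (name "Ramez Elmasri") (tel "555-0100"))
```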

Attributes
An opening tag may contain attributes
  – typically used to describe the content of an element
  Ramez Elmasri
It's not always clear when to use attributes
  Ramez Elmasri
ID attributes are special: they must be unique within the document
An IDref attribute must refer to an existing ID in the same document
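The choice between an attribute and a sub-element shows up in how the data is read back. A minimal sketch with Python's xml.etree.ElementTree follows; the attribute name id and its value p1 are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The same fact stored two ways: as an attribute and as a child element.
as_attribute = ET.fromstring('<person id="p1"><name>Ramez Elmasri</name></person>')
as_element = ET.fromstring("<person><id>p1</id><name>Ramez Elmasri</name></person>")

print(as_attribute.get("id"))     # attribute lookup
print(as_element.findtext("id"))  # child-element lookup
print(as_attribute.get("age"))    # a missing attribute comes back as None
```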

Referencing Elements Using IDs/IDrefs
Jane Doe
John Doe
Mary Doe
Jack Doe
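One way to follow ID/IDref links is to index elements by their ID and look references up in that index. The sketch below assumes a family document with id and children attributes; the ID values and the parent-child relationships are invented for illustration, since the markup of the original example is not available.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<family>
  <person id="jane" children="mary jack"><name>Jane Doe</name></person>
  <person id="john" children="mary jack"><name>John Doe</name></person>
  <person id="mary"><name>Mary Doe</name></person>
  <person id="jack"><name>Jack Doe</name></person>
</family>
""")

# Index every person by its ID so that references can be resolved by lookup.
by_id = {p.get("id"): p for p in doc.iter("person")}

def children_of(person_id: str) -> list[str]:
    """Resolve the space-separated IDREFS in `children` to names."""
    refs = by_id[person_id].get("children", "").split()
    return [by_id[r].findtext("name") for r in refs]

print(children_of("jane"))
print(children_of("mary"))
```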

A Complete Example
Amazon
Unix Network Programming  Addison Wesley  1995  Richard Stevens
An Introduction to Object-Oriented Design  Addison Wesley  1996  Jo Levin  Harold Perry  11.55

OODB Schema
  class Movie ( extent Movies, key title )
  { attribute string title;
    attribute string director;
    relationship set<Actor> casts inverse Actor::acted_In;
    attribute int budget;
  };
  class Actor ( extent Actors, key name )
  { attribute string name;
    relationship set<Movie> acted_In inverse Movie::casts;
    attribute int age;
    attribute set<string> directed;
  };

In XML …
Waking Ned Divine  Kirk Jones III  100,000
Dragonheart  Rob Cohen  110,000
Moondance  Dagmar Hirtz  90,000
David Kelly
Sean Connery  68
Ian Bannen
:

DTD: Document Type Definition
A DTD imposes a structure on an XML document
Not quite a typing system
  – it is purely syntactic
  – now replaced by XML Schema
Uses regular expressions to specify structure
  – firstname: an element with tag name firstname
  – book*: zero or more books
  – year?: an optional year
  – firstname,lastname: a firstname followed by a lastname
  – book | journal: either a book or a journal
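Because a DTD content model is a regular expression over child tags, it can be imitated with an ordinary regex. The sketch below is not a DTD validator: the content model (title, author+, year?) is an assumed example, translated by hand into a Python regex over the sequence of child tag names, and the function name matches_model is made up.

```python
import re
import xml.etree.ElementTree as ET

def matches_model(elem: ET.Element, model: str) -> bool:
    """Check the sequence of child tags against a regex over 'tag ' tokens."""
    child_tags = "".join(child.tag + " " for child in elem)
    return re.fullmatch(model, child_tags) is not None

# Hand translation of the assumed content model (title, author+, year?).
BOOK_MODEL = r"title (author )+(year )?"

ok = ET.fromstring(
    "<book><title>T</title><author>A</author><year>1995</year></book>"
)
bad = ET.fromstring("<book><author>A</author><title>T</title></book>")

print(matches_model(ok, BOOK_MODEL))
print(matches_model(bad, BOOK_MODEL))
```

Only the order and number of children are checked, which matches the remark that a DTD is purely syntactic.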

Example of XML Data
Amazon
Unix Network Programming  Addison Wesley  1995  Richard Stevens

DTD Example

Summary of the DTD Syntax
A tagged element in a DTD is defined by
  <!ELEMENT name e>
where e is a DTD expression
If e, e1, e2 are DTD expressions, then so are:
  – EMPTY: empty content
  – #PCDATA: any text
  – A: an element with tag name A
  – e1,e2: e1 followed by e2
  – e1 | e2: either e1 or e2
  – e*: zero or more occurrences of e
  – e+: one or more occurrences of e
  – e?: optional e (zero or one occurrences)
  – (e)
Note: tagged elements are global
  – must be defined once in a DTD

DTD Syntax (cont.)
Attribute specification:
  <!ATTLIST name attribute-name type accuracy ...>
type is:
  – ID: must be unique within the document
  – IDREF: a reference to an existing ID
  – IDREFS: multiple IDREFs
  – CDATA: any string
accuracy is #REQUIRED, #IMPLIED, #FIXED 'value', value 'v1... vn'
ID, IDREF, and IDREFS attributes are not typed!
Example:
  <!ATTLIST person id ID #REQUIRED children IDREFS #IMPLIED>
the id attribute is required while the children attribute is optional
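The two constraints on ID and IDREFS attributes (unique IDs, no dangling references) can be checked by hand when no validating parser is at hand. This is a sketch, not a DTD validator: Python's xml.etree.ElementTree does not validate, the function name check_ids is made up, and the attribute names id and children follow the ATTLIST example on this slide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_ids(root, id_attr="id", idrefs_attr="children"):
    """Return (duplicate IDs, dangling references) found under root."""
    ids = Counter(
        e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None
    )
    duplicates = sorted(i for i, n in ids.items() if n > 1)
    dangling = sorted({
        ref
        for e in root.iter()
        for ref in e.get(idrefs_attr, "").split()
        if ref not in ids
    })
    return duplicates, dangling

# ID "b" is declared twice and the reference "c" points nowhere.
doc = ET.fromstring(
    '<db><person id="a" children="b c"/><person id="b"/><person id="b"/></db>'
)
print(check_ids(doc))  # (['b'], ['c'])
```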

Connecting an XML document to a DTD
In-line the DTD into the XML file:
  <!DOCTYPE db [... ]>...
Better: put the DTD in a separate file and reference it by URL
Documents are validated against their DTD before they are used
[Figure: XML data and DTD]

Recursive DTDs
We want to capture a person with a mother and a father
First attempt: where the first person is the mother while the second is the father
Second attempt:
Third attempt:
  <!ATTLIST person id ID #REQUIRED mother IDREF #IMPLIED father IDREF #IMPLIED>

Back to the OODB Schema
  class Movie ( extent Movies, key title )
  { attribute string title;
    attribute string director;
    relationship set<Actor> casts inverse Actor::acted_In;
    attribute int budget;
  };
  class Actor ( extent Actors, key name )
  { attribute string name;
    relationship set<Movie> acted_In inverse Movie::casts;
    attribute int age;
    attribute set<string> directed;
  };

DTD

XML Namespaces
When merging multiple docs together, name collisions may occur
A namespace is a mechanism for uniquely naming tag names and attribute names to avoid name conflicts
Tag/attribute names are now qualified names (QNames):
  (namespace ':')? localname
  example: bib:author
A document may use multiple namespaces
A DTD has its own namespace in which all names are unique
A namespace in an XML doc is defined as an attribute of the form xmlns:bib="URL", where bib is the namespace name and the URL is the location of the DTD
The default namespace is defined as xmlns="URL"
If not defined, it is the global namespace
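Qualified names can be explored with Python's xml.etree.ElementTree, which accepts a prefix-to-URI mapping when searching. In this sketch the namespace URIs under http://example.org/ are placeholders, and the toy prefix is borrowed only as a second namespace to show a name collision on author.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bib:bib xmlns:bib="http://example.org/bib" xmlns:toy="http://example.org/toy">
  <bib:author>Smith</bib:author>
  <toy:author>Acme</toy:author>
</bib:bib>
""")

# Prefixes used in queries are resolved through this mapping, not the document.
ns = {"bib": "http://example.org/bib", "toy": "http://example.org/toy"}
bib_authors = [a.text for a in doc.findall("bib:author", ns)]
toy_authors = [a.text for a in doc.findall("toy:author", ns)]

# ElementTree stores a qualified name in expanded form: {namespace-URI}localname
root_tag = doc.tag

print(bib_authors, toy_authors, root_tag)
```

The two author elements share a local name but stay distinct because their namespace URIs differ.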

Example
<item xmlns=“ xmlns:toy= “ backpack cyberpet

Query Languages for XML
Need a language for XML data for
  – extracting fragments (querying)
  – restructuring (data transformation)
  – integrating (e.g., combining multiple XML documents)
  – browsing
  – presentation (e.g., from XML to HTML)
We will first learn XPath
  – used in extracting fragments from a single document
  – many XML query languages are based on XPath
We will briefly discuss XSLT
  – for extracting, restructuring, and presentation over a single document
We will focus later on XQuery
  – a full-fledged query language
  – much like OQL

XPath
Describes a single navigation path in an XML document
Selects a sequence of nodes reachable by the path
  – the order of nodes is the document order
Main construct: axis navigation
Consists of one or more navigation steps separated by /
A navigation step is a triplet: axis :: node-test list-of-predicates
Each navigation path is evaluated relative to a context node
Examples:
  – /child::bib/descendant::author
  – /descendant::book[child::author = “Smith”]/child::title
Most people use shorthands
  – /bib//author
  – //book[author=“Smith”]/title
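The two shorthand paths can be tried with Python's xml.etree.ElementTree, which supports only a limited subset of XPath. The document below is invented for illustration. ElementTree evaluates paths relative to the element they are called on, so the paths start with a dot.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><author>Smith</author><title>XML and the Web</title></book>
  <book><author>Jones</author><title>Databases</title></book>
  <article><author>Smith</author><title>Queries</title></article>
</bib>
""")

# ".//author" called on <bib> plays the role of /bib//author.
authors = [a.text for a in bib.findall(".//author")]

# Counterpart of //book[author="Smith"]/title.
smith_titles = [t.text for t in bib.findall(".//book[author='Smith']/title")]

print(authors)       # results come back in document order
print(smith_titles)
```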

Axis Navigation
In the beginning, the context node is the document root
Dot (.) identifies the context node
Some navigation steps:
  – /  the root node
  – //  the root node and its descendants
  – ./author  all the children of the context node with tag name author; the context node of the next step is each of these children
  – .//  the context node and all its descendants; the context node of the next step is each of these nodes
  – ./@mother  the value of the attribute named mother of the context node
  – ./*  all the children of the context node
  – ..  parent of the context node
  – text()  all the text children of the context node
Shortcut: you can remove ./
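Several of these steps have direct counterparts in Python's xml.etree.ElementTree. The sketch below uses an invented tree; it is not the tree from the slides. ElementTree has no @attribute or text() steps in paths, so attribute values are read with get() instead.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b><c/></b><d mother="m1"><c/></d><b/></a>')

# ./*  : all the children of the context node
children = [e.tag for e in root.findall("./*")]

# .//c : all descendants of the context node with tag name c
descendants_c = root.findall(".//c")

# ..   : parent of each selected node
parents_of_c = [e.tag for e in root.findall(".//c/..")]

# Counterpart of ./d/@mother, read through the element API.
mother = root.find("./d").get("mother")

print(children, len(descendants_c), parents_of_c, mother)
```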

Example
[Figure: numbered tree with root a and descendants labeled b, c, d]
/./a or /a --> [1]
/./a/./b or /a/b --> [2,4]
/a/c --> []
/a/*/c --> [5,7]
//b --> [2,6,4]
//b/c --> [5]
/a//c --> [5,7]

Predicates
Many variations
  – [10]: the tenth child node of the context node
  – [last()]: the last child node of the context node
  – [author]: true, if the context node has at least one child tagged author
  – [author/name]: true, if the XPath ./author/name is nonempty
  – author[name=“Smith”]: true if the author name is Smith
Examples
  < “100”]/title
  /bib/book[author/text()]
  author[name/firstname=“John” and name/lastname=“Smith”]/title
  /bib/book/author[name/firstname][address[//zip][city]]/name/lastname
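Positional and existence predicates can be tried with Python's xml.etree.ElementTree on an invented document. ElementTree handles [n], [last()], [tag], and [tag='text'], but not a nested path inside a predicate such as [author/name="Smith"], so that last case is filtered in Python here.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>First</title><author><name>Smith</name></author></book>
  <book><title>Second</title></book>
  <book><title>Third</title><author><name>Jones</name></author></book>
</bib>
""")

first = bib.find("book[1]/title").text      # positions start at 1
last = bib.find("book[last()]/title").text  # the last book child

# [author]: keep only books that have at least one author child.
with_author = [b.findtext("title") for b in bib.findall("book[author]")]

# Stand-in for book[author/name="Smith"]/title, done with a Python filter.
by_smith = [
    b.findtext("title")
    for b in bib.findall("book[author]")
    if b.findtext("author/name") == "Smith"
]

print(first, last, with_author, by_smith)
```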