Cornell CS 502 Markup Languages SGML, HTML, XML, XHTML CS 502 – 20020207 Carl Lagoze – Cornell University.

Presentation transcript:

Cornell CS 502 Markup Languages SGML, HTML, XML, XHTML CS 502 – Carl Lagoze – Cornell University

Cornell CS 502 Problem Richness of text –Elements: letters, numbers, symbols, case –Structure: words, sentences, paragraphs, headings, tables –Appearance: fonts, design, layout –Multimedia integration: graphics, audio, math –Internationalization: characters, direction (up, down, right, left), diacritics It's not all text

Cornell CS 502 Text vs. Data Something for humans to read Something for machines to process There are different types of humans Goal in digital libraries should be as much automation as possible Works vs. manifestations Parts vs. wholes Preservation: information or appearance?

Cornell CS 502 Who controls the appearance of text? The author/creator of the document Rendering software (e.g. browser) –Mapping from markup to appearance The user –Window size –Fonts and size

Cornell CS 502 Important special cases User has special requirements –Physical abilities –Age/education level –Preference/mood Client has special capabilities –Form factor (Palm Pilot, cell phone) –Network connectivity

Cornell CS 502 Page Description Language PostScript, PDF Author/creator imprints rendering instructions in document –Where and how elements appear on the page in pixels

Cornell CS 502 Markup languages SGML, XML Represent structure of text Must be combined with style instructions for rendering on screen, page, device

Cornell CS 502 Markup and style sheets [Diagram: a marked-up document (document content & structure) and a style sheet (rendering instructions) are both fed to rendering software, which produces the formatted document]

Cornell CS 502 Multiple renderings from same marked-up documents [Diagram: one marked-up document (document content & structure) is combined with style sheet 1 by rendering software for the PC display, and with style sheet 2 by rendering software for print]
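
A minimal Python sketch of the idea behind the two slides above: one marked-up document, two style sheets, two renderings. The element names (`memo`, `title`, `para`) and the dictionary-based "style sheets" are illustrative assumptions, not part of the lecture; a real system would more likely use CSS or XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up document: structure only, no appearance.
DOCUMENT = "<memo><title>Budget</title><para>Costs rose.</para></memo>"

# Two toy "style sheets": each maps an element name to a rendering rule.
SCREEN_STYLE = {"title": "<h1>{}</h1>", "para": "<p>{}</p>"}
PRINT_STYLE = {"title": "{}\n======", "para": "  {}"}


def render(xml_text, style):
    """Apply one style sheet to every child of the root element."""
    root = ET.fromstring(xml_text)
    return "\n".join(style[child.tag].format(child.text) for child in root)


screen_output = render(DOCUMENT, SCREEN_STYLE)
print_output = render(DOCUMENT, PRINT_STYLE)
print(screen_output)
print(print_output)
```

The document text never changes; only the style mapping passed to `render` does.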

Cornell CS 502 A short history of markup (b.w.) Def.: A method of conveying information (metadata) about a document Special characters used by proofreaders, typesetters Standard Generalized Markup Language –Standardized (ISO) in 1986 –Powerful, complex markup language widely used by government and publishers –Also used in the exchange of technical information in manufacturing –Functional overkill has limited widespread implementation and use

Cornell CS 502 HTML – Markup for the masses Core technology of web (along with URLs, HTTP) Simple fixed tag set Highly tolerant –Tag start/close blatz scrog –Capitalization 7-bit ASCII based Tags express both appearance and structure – This is structure –What do bold or italics mean?

Cornell CS 502 What is wrong with HTML? Fixed tag set –Extension has been difficult and chaotic: pages that can be rendered by IE and not Netscape –Prevents localization 7-bit ASCII –What about kanji, Arabic, math, chemistry, etc.? Tolerance –Non-specific syntax – can’t be expressed in a formal manner like BNF –Parsing is difficult, non-deterministic. Leads to “screen scraping” Non-structural markup –Prevents clean distinction of meaning from appearance

Cornell CS 502 eXtensible Markup Language Subset of SGML improving ease of implementation Meta-language that allows defining markup languages –No defined tags –Meta tools for definition of purpose-specific tags: DTDs, Schema Syntax is defined using formal BNF –Documents can be parsed, manipulated, transformed, stored in databases… Unicode character set W3C Recommendation (1998)

Cornell CS 502 XML Suite XML syntax – “well-formedness” XML namespaces – global semantic partitions XML Schema – semantic definitions, “validity” XSLT – language for transforming XML documents –One application is stylesheets XPath – specifying individual information items in XML documents XPointer – syntax for stating address information in a link to an XML document XLink – specifying link semantics, types and behaviors of links
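
As a rough illustration of two members of the suite, the Python sketch below uses a namespace and an XPath-style query. The namespace URI, prefix, and element names are made up for the example, and `xml.etree.ElementTree` supports only a limited subset of XPath, so this is not a full XPath demonstration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the namespace URI and element names are invented.
DOC = """<lib:books xmlns:lib="http://example.org/library">
  <lib:book lang="fr">Les Miserables</lib:book>
  <lib:book lang="ru">War and Peace</lib:book>
</lib:books>"""

root = ET.fromstring(DOC)
ns = {"lib": "http://example.org/library"}  # prefix -> namespace URI

# XPath-style selection: child "book" elements whose lang attribute is "fr".
french_titles = [book.text for book in root.findall("lib:book[@lang='fr']", ns)]
print(french_titles)

# ElementTree stores namespace-qualified names in {uri}local form.
root_name = root.tag
print(root_name)
```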

Cornell CS 502 Basic XML building blocks One or more elements –Opening tag –Empty element –Non-empty element Simple (CDATA) value – Paul Smith Complex value – Smith 48 One or more attributes per element – Les Miserables
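
The tags on this slide were lost in extraction, so the Python sketch below rebuilds the same kinds of building blocks with hypothetical names (`catalog`, `author`, `book`, `out-of-print`, `lang`, `format`). Only the values "Paul Smith", "Smith", "48", and "Les Miserables" come from the slide.

```python
import xml.etree.ElementTree as ET

# All element and attribute names are assumptions made for illustration.
SAMPLE = """<catalog>
  <author>Paul Smith</author>
  <author><name>Smith</name><age>48</age></author>
  <book lang="fr" format="paperback">Les Miserables</book>
  <out-of-print/>
</catalog>"""

root = ET.fromstring(SAMPLE)
simple, compound, book, empty = list(root)

simple_value = simple.text                    # simple value: character data only
compound_children = [c.tag for c in compound] # complex value: nested elements
book_attributes = dict(book.attrib)           # one or more attributes per element
empty_is_empty = len(empty) == 0 and empty.text is None  # empty element

print(simple_value, compound_children, book_attributes, empty_is_empty)
```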

Cornell CS 502 XML – sample instance document

Cornell CS 502 XML – well-formedness Every XML document should begin with an XML declaration (XML 1.0 recommends it but does not require it) Every opening tag must have a closing tag Tags cannot overlap (they must be well-nested) XML documents can have only one root element Attribute values must be in quotation marks (single or double) –Only one value per attribute
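
One way to see these rules in action is to hand small documents to a parser and check which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree`; the test strings and the helper name `is_well_formed` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical test documents, one per rule on the slide.
checks = {
    "ok": '<?xml version="1.0"?><a><b>text</b></a>',
    "unclosed tag": "<a><b>text</a>",
    "overlapping tags": "<a><b><c>text</b></c></a>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a id=1/>",
    "attribute given twice": '<a id="1" id="2"/>',
}

results = {label: is_well_formed(doc) for label, doc in checks.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```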

Cornell CS 502 XML – well-formedness Reserved characters should be encoded: < as &lt; & as &amp; ]]> as ]]&gt; > as &gt; " as &quot; ' as &apos;
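
A short Python sketch of encoding reserved characters with the standard library's `xml.sax.saxutils`. The sample strings and the wrapper element `t` are made up; `escape` handles `&`, `<`, and `>` by default, and `quoteattr` additionally takes care of quotation marks in attribute values.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr, unescape

raw = 'if a < b && c > d then print "R&D"'

# Element content: &, < and > are replaced by entity references.
encoded = escape(raw)
print(encoded)

# Attribute value: quoteattr also picks the quotes and escapes them if needed.
attr = quoteattr("Tom's \"best\" <b>")
print(attr)

# Round trip: a parser turns the entity references back into characters.
parsed_back = ET.fromstring("<t>" + encoded + "</t>").text
```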

Cornell CS 502 XML – well-formedness Element names must obey XML naming conventions: start with a letter or underscore; can contain letters, numbers, hyphens, periods, underscores; no spaces in names; no leading space after <; a colon can only be used to separate the namespace prefix from the element name; names are case-sensitive; names cannot start with xml, XML, xML, …
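
The naming rules can be approximated with a small checker. This Python sketch is ASCII-only and deliberately simplified: the real XML `Name` production also admits many non-ASCII letters, and the function name `is_valid_name` is an assumption made for illustration.

```python
import re

# ASCII-only approximation: a letter or underscore first, then letters,
# digits, periods, hyphens, or underscores.
NAME_PART = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_valid_name(name):
    """Check an element name, allowing at most one namespace prefix."""
    parts = name.split(":")
    if len(parts) > 2:
        return False  # the colon may appear only once, after the prefix
    if not all(NAME_PART.fullmatch(part) for part in parts):
        return False
    # Names beginning with "xml" in any letter case are reserved.
    return not name.lower().startswith("xml")


for candidate in ["book", "_id", "dc:title", "2nd", "first name", "XmlThing"]:
    print(candidate, is_valid_name(candidate))
```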

Cornell CS 502 XML – well-formedness White space: space, tab, line feed, carriage return In HTML: white space must be written explicitly as &nbsp; because HTML processors collapse runs of white space Not so in XML: a space in CDATA stays, a tab in CDATA stays, and line-break sequences (e.g. carriage return + line feed) are normalized to a single line feed
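
A quick way to observe XML white-space handling is to parse a string containing repeated spaces, a tab, and a carriage return + line feed pair. The sketch below uses Python's standard parser; the element name `line` and the sample text are assumptions.

```python
import xml.etree.ElementTree as ET

# Two spaces, one tab, and a CR+LF line break inside character data.
element = ET.fromstring("<line>one  two\tthree\r\nfour</line>")
text = element.text

# repr() makes the surviving white-space characters visible.
print(repr(text))
```

The spaces and the tab come back unchanged, while the CR+LF pair is reported as a single line feed.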

Cornell CS 502 XML as semi-structured data Unstructured data [Table residue: Carl Lagoze, Ithaca; George Bush, Washington; Ithaca, NY, 27000; Washington, DC] Structured data Semi-structured data

Cornell CS 502 XML data representation [Tree diagram: invoice at the root; children cust. (with name, addr.) and product (with code, quant.)]
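
The tree on this slide could be built and serialized roughly as follows in Python. The grouping of `name`/`addr` under `cust` and `code`/`quant` under `product` is an assumption read off the label order, and all values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Build the invoice tree node by node (structure assumed from the slide labels).
invoice = ET.Element("invoice")

cust = ET.SubElement(invoice, "cust")
ET.SubElement(cust, "name").text = "Carl Lagoze"
ET.SubElement(cust, "addr").text = "Ithaca"

product = ET.SubElement(invoice, "product")
ET.SubElement(product, "code").text = "A-100"   # made-up product code
ET.SubElement(product, "quant").text = "2"      # made-up quantity

# Serialize the tree back to XML text.
serialized = ET.tostring(invoice, encoding="unicode")
print(serialized)
```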

Cornell CS 502 Document Object Model (DOM) W3C standard interface for accessing and manipulating an XML document Represents document as a tree with typed nodes –Document –Element –Attribute –Text –Comment DOM parser reads an XML document and builds a tree from it
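
To make the typed-node tree concrete, here is a small sketch using `xml.dom.minidom`, Python's lightweight DOM implementation. The sample document and the helper `walk` are assumptions for illustration.

```python
from xml.dom import Node, minidom

DOC = '<books><!-- reading list --><book lang="ru">War and Peace</book></books>'
dom = minidom.parseString(DOC)  # the DOM parser builds the whole tree in memory

NODE_TYPE_NAMES = {
    Node.DOCUMENT_NODE: "Document",
    Node.ELEMENT_NODE: "Element",
    Node.TEXT_NODE: "Text",
    Node.COMMENT_NODE: "Comment",
}


def walk(node, depth=0, out=None):
    """Collect (depth, node type, node name) for every node in the tree."""
    out = [] if out is None else out
    out.append((depth, NODE_TYPE_NAMES.get(node.nodeType, "Other"), node.nodeName))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


nodes = walk(dom)
for depth, kind, name in nodes:
    print("  " * depth + f"{kind}: {name}")

# Attributes are typed nodes too, but are reached through their element.
lang_attr = dom.getElementsByTagName("book")[0].getAttributeNode("lang")
```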

Cornell CS 502 DOM Interface Features Class structure for entities in XML documents Construct tree nodes of various types –E.g. construct element Create nesting structure (linkages) among nodes –E.g. appendChild Traverse trees –E.g. getFirstChild, getNextSibling Specialized sub-classes for HTML
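
The construct, link, and traverse operations listed above might look like this in Python's `xml.dom.minidom`. Python exposes `getFirstChild` and `getNextSibling` as the attributes `firstChild` and `nextSibling`; the element names and titles are illustrative assumptions.

```python
from xml.dom import minidom

doc = minidom.Document()

# Construct nodes and link them into a nesting structure.
books = doc.createElement("books")
doc.appendChild(books)
for title in ("War and Peace", "Les Miserables"):
    book = doc.createElement("book")
    book.appendChild(doc.createTextNode(title))
    books.appendChild(book)

# Traverse: first child, then follow the sibling chain.
titles = []
node = books.firstChild
while node is not None:
    titles.append(node.firstChild.data)
    node = node.nextSibling

xml_text = books.toxml()
print(titles)
print(xml_text)
```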

Cornell CS 502 Simple DOM Example "XML Basics"

Cornell CS 502 DOM support in multiple languages Java –JAXP (Sun) –Xerces (Apache) Perl –XML::Parser module

Cornell CS 502 Simple API for XML (SAX) Event-based interface Does not build an internal representation in memory Available with most XML parsers Main SAX events –startDocument, endDocument –startElement, endElement –characters

Cornell CS 502 Simple SAX Example Document Events startDocument() startElement(“books”) startElement(“book”) characters(“War and Peace”) endElement(“book”) endElement(“books”) endDocument()
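
The event sequence above can be reproduced with Python's `xml.sax`. The input document is an assumption reconstructed from the listed events, and the handler class name `EventLogger` is made up. Because a SAX parser may deliver text in several chunks, the sketch merges consecutive `characters` events.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event as a (name, argument) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument", None))

    def endDocument(self):
        self.events.append(("endDocument", None))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name))

    def endElement(self, name):
        self.events.append(("endElement", name))

    def characters(self, content):
        # Merge adjacent text chunks into a single characters event.
        if self.events and self.events[-1][0] == "characters":
            self.events[-1] = ("characters", self.events[-1][1] + content)
        else:
            self.events.append(("characters", content))


handler = EventLogger()
xml.sax.parseString(b"<books><book>War and Peace</book></books>", handler)
events = handler.events
for name, arg in events:
    print(name, arg if arg is not None else "")
```

No tree is built here: the handler sees each event once and keeps only what it chooses to record.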

Cornell CS 502 Why use SAX? Memory efficient Data structure independent (not tied to trees) Care only about a small part of the document Simplicity Speed

Cornell CS 502 Why use DOM? Random access through document Document persistence for searches, etc. Read/Write Lexical information –Comments –Encodings –Attribute order

Cornell CS 502 xHTML HTML “expressed” in XML Corrects defects in HTML –All tags closed –Proper nesting –Case sensitive (all tags lower case) –Strict well-formedness Defined by a DTD –Strict –Transitional –Frameset –
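
A rough Python sketch of why the stricter rules matter: markup written in tolerant HTML style fails an XML parser, while the XHTML-style equivalent parses. The two fragments are invented, and the check covers well-formedness only, not validity against the Strict, Transitional, or Frameset DTDs.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML habits: unclosed <br> and mismatched tag case (<B> ... </b>).
html_style = "<p>First line<br>Second <B>bold</b></p>"

# XHTML style: every tag closed, properly nested, all lower case.
xhtml_style = "<p>First line<br />Second <b>bold</b></p>"

html_ok = parses_as_xml(html_style)
xhtml_ok = parses_as_xml(xhtml_style)
print(html_ok, xhtml_ok)
```

Note that consistently upper-case tags such as `<P>text</P>` are still well-formed XML; the lower-case requirement comes from the XHTML DTDs.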

Cornell CS 502 xHTML (cont.) All new HTML SHOULD be xHTML W3C validator – Tidy –