McGraw-Hill/Irwin © 2004 by The McGraw-Hill Companies, Inc. All rights reserved. Using XML Parsers and Unicode Ellen Pearlman Eileen Mullin Programming the Web Using XML

6-2 Learning Objectives
1. Understanding what an XML parser does
2. Working with the basic Microsoft parser
3. Differentiating between valid documents in different parsers and the way they define error statements
4. Learning about Unicode and UTF-8, UTF-16 and UTF-32
5. Investigating different character sets and typefaces for Unicode

6-3 Introduction A parser is a program that checks the grammar and syntax of a document written in a markup language or another programming language. Every XML parser checks that a document is well-formed; a validating XML parser also compares the document against the grammar in its DTD. This process, called validation, ensures there are no mistakes that could confuse the XML applications that access your content. If a document follows the rules listed in its DTD, it is said to be valid. If its markup contradicts the rules of the DTD, it is labeled invalid.
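Python's standard-library XML parsers check well-formedness but do not validate against a DTD, so the sketch below hand-rolls the core idea instead: each declared element gets a content model, written here as a regular expression over the names of its children, much as a DTD's (line+) means "one or more line elements". The scribble root comes from the validatortest.xml example in these slides; the line element and the CONTENT_MODELS table are hypothetical, and text content is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: one regular expression per declared element,
# matched against the element's child names (each followed by a comma).
# They mirror a DTD such as:
#   <!ELEMENT scribble (line+)>
#   <!ELEMENT line (#PCDATA)>
CONTENT_MODELS = {
    "scribble": r"(line,)+",  # one or more <line> children, nothing else
    "line": r"",              # text only: no child elements allowed
}


def validate(xml_text):
    """Return a list of validity errors; an empty list means the document is valid."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    errors = []
    for elem in root.iter():  # the root and every descendant, in document order
        model = CONTENT_MODELS.get(elem.tag)
        if model is None:
            errors.append(f"<{elem.tag}> is not declared")
            continue
        children = "".join(child.tag + "," for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}> has invalid content: {children or 'empty'}")
    return errors


print(validate("<scribble><line>Our first line</line></scribble>"))  # []
print(validate("<scribble><line>Our first line</line><note/></scribble>"))
```

Note the two separate layers: a document that is not well-formed never reaches the validity check, because ET.fromstring raises first.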

6-4 What is Unicode? The Unicode Consortium was founded to foster a character encoding that encompasses all of the world's major scripts. Unicode 3.0 encoded a little under 50,000 characters, all within the original 16-bit range of up to 65,536 possible characters, and a large share of them are Han (Chinese, Japanese and Korean) ideographs. More scripts are on the way, but rather than jumping to 32 bits per character, Unicode's code space runs from U+0000 to U+10FFFF (1,114,112 code points), which needs 21 bits; characters beyond the first 65,536 are stored as pairs of 16-bit units in UTF-16 or as single 32-bit units in UTF-32. Because XML uses Unicode as its document character set, text in any of these scripts can appear in the same XML document.
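A quick way to see the 16-bit limit and what lies beyond it is to ask Python, whose str type works in Unicode code points. The describe helper below is an illustrative name, not part of any library; it reports a character's code point and how many 16-bit units UTF-16 needs for it.

```python
import sys


def describe(ch):
    """Report a character's code point and how UTF-16 stores it."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return {
        "code_point": f"U+{cp:04X}",
        "in_16_bit_range": cp <= 0xFFFF,  # the original 65,536-character space
        "utf16_units": len(utf16) // 2,   # 2 units means a surrogate pair
        "utf16_hex": utf16.hex(),
    }


print(hex(sys.maxunicode), sys.maxunicode.bit_length())  # 0x10ffff 21
print(describe("A"))
print(describe("\u4e2d"))      # a Han ideograph inside the 16-bit range
print(describe("\U00020000"))  # a CJK ideograph (Extension B) outside it
```

The last character needs two 16-bit units (d840 and dc00), which is exactly the surrogate-pair mechanism that lets UTF-16 reach past 65,536.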

6-5 Parsers When XML first came into common use, developers wrote it in basic text editors such as Microsoft Notepad and WordPad and Apple SimpleText, and not much else. Many of these basic editors could not handle Unicode. Today, tools for writing and checking XML fall into three categories: basic text editors, graphical text editors and integrated development environments (IDEs).

6-6 How Parsers Work In general, a parser looks for certain landmarks, such as the beginning of the XML declaration (<?xml version=), or the parentheses () and percent signs (%) used in DTD declarations. Just as we look for a period (.) to end a sentence, a parser looks for pre-established XML grammatical conventions to know that a statement is correctly formed.
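To watch a parser pick out these landmarks, the sketch below attaches handlers to expat, the non-validating parser that Python's xml modules are built on, and records each construct in the order it is recognized. The scan function and the sample note document are illustrative assumptions.

```python
import xml.parsers.expat


def scan(xml_bytes):
    """Return the constructs expat recognizes, in the order it finds them."""
    events = []
    parser = xml.parsers.expat.ParserCreate()

    def xml_decl(version, encoding, standalone):
        # Fired when the parser sees the opening <?xml version=... ?> declaration
        events.append(("xml-declaration", version, encoding))

    def start(name, attrs):
        events.append(("start-tag", name))

    def end(name):
        events.append(("end-tag", name))

    def text(data):
        if data.strip():  # skip whitespace-only runs between tags
            events.append(("text", data.strip()))

    parser.XmlDeclHandler = xml_decl
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(xml_bytes, True)  # True: this is the final (only) chunk
    return events


doc = b'<?xml version="1.0" encoding="UTF-8"?><note><to>Ana</to></note>'
for event in scan(doc):
    print(event)
```

For longer documents expat may deliver one run of text in several CharacterDataHandler calls, so real code usually buffers text until the next tag.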

6-7 Differences Between an XML Parser and an HTML Parser With HTML, a predefined standard already tells the Web browser how to render each element visually. An XML editor or parser has no predetermined definition of your document's element and attribute names. It knows only the basic rules that make a document well-formed or valid, and treats your element names as plain character strings.
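The contrast can be sketched with two parsers from Python's standard library: xml.etree rejects anything that is not well-formed but accepts any element names, while html.parser, a lenient HTML tokenizer (not a full browser), carries on past unclosed tags. The helper names below are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start-tag names; html.parser does not require tags to be closed."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_accepts(text):
    """True if an XML parser finds the text well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


# Invented element names are fine for XML, as long as the markup is well-formed.
print(xml_accepts("<scribble><line>hi</line></scribble>"))  # True
# Unclosed <p> tags: the XML parser rejects them, the HTML parser carries on.
print(xml_accepts("<p>one<p>two"))      # False
print(html_start_tags("<p>one<p>two"))  # ['p', 'p']
```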

6-8 The Basic Microsoft Parser MSXML, Microsoft's basic XML parser, is a good, free parser that is built into the Internet Explorer browser. When you open an XML file in Internet Explorer, MSXML checks the document, and the browser either displays it as a tree in which all of the markup is shown on screen or shows an error message describing the problem it found.

6-9 Titus Andronicus coded in XML

6-10 Titus Andronicus: Play.dtd

6-11 Play.dtd With End Tag Missing

6-12 The Error Message Produced by the Missing End Tag

6-13 Line 5223, Referred to by the Error Message

6-14 Showing play.dtd, Line 5223 Now Complete

6-15 Creating Your Own Valid Document: validatortest.xml document

6-16 validatortest.xml Document in IE

6-17 validatortest.xml Document in Netscape

6-18 A Word About Errors Most parsers deal with errors in XML in one of two ways: there are errors, and then there are fatal errors.
– A basic error is a violation of the rules of whatever specification the parser is checking the code against (e.g. XSLT or plain XML). The parser points out the error and continues processing.
– A fatal error stops the parser from processing the code normally. In XML, violations of the well-formedness rules are fatal errors, so a document that triggers one is not well-formed.
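Python's SAX interface mirrors this split with three ErrorHandler callbacks: warning, error and fatalError. The sketch below records which callback fires and on which line, and re-raises on a fatal error so parsing stops. Python's built-in SAX reader does not validate, so in practice well-formedness problems are what reach it (as fatal errors); error() is there for validating parsers. The class and function names are illustrative.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Sorts problems into the SAX categories: warning, error, fatal error."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def warning(self, exception):
        self.problems.append(("warning", exception.getLineNumber()))

    def error(self, exception):
        # Recoverable (for example, validity) error: note it and keep parsing
        self.problems.append(("error", exception.getLineNumber()))

    def fatalError(self, exception):
        # Well-formedness error: note it, then stop by re-raising
        self.problems.append(("fatal", exception.getLineNumber()))
        raise exception


def check(xml_bytes):
    handler = RecordingErrorHandler()
    try:
        xml.sax.parseString(xml_bytes, ContentHandler(), handler)
    except xml.sax.SAXParseException:
        pass  # already recorded by fatalError
    return handler.problems


print(check(b"<scribble>\n<line>ok</line>\n</scribble>"))  # []
print(check(b"<scribble>\n<line>\n</scribble>"))  # missing </line> end tag
```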

6-19 Using XML Spy XML Spy can be thought of as an IDE because it has not only a text and code editor but also a compiler, a debugger and an intuitive graphical interface. With XML Spy, a developer can build a sophisticated project. There are two basic views: the Text View, which resembles any text editor, and the Enhanced Grid View, which shows more of the document's underlying structure.

6-20 Altova XML Spy Home Page

6-21 Initial Code Listing: validatortest.xml
<!DOCTYPE scribble [ ]>
Our first line
Our second line
Our third line
Our fourth line

6-22 Invalid Validatortest.xml in XML Spy Program

6-23 Viewing Validatortest.xml in IE

6-24 Corrected Version: validatortest.xml
<!DOCTYPE scribble [ ]>
Our first line
Our second line
Our third line
Our fourth line
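The markup inside the validatortest.xml listing did not survive, so the document below is a hypothetical, complete stand-in: the scribble root and the four lines of text come from the slides, while the line element and the internal DTD subset are assumptions. The sketch uses expat's declaration callbacks to check two of XML's validity rules by hand: the root element must match the name in the DOCTYPE, and every element used must be declared. It is not a full validator.

```python
import xml.parsers.expat

# Hypothetical stand-in for validatortest.xml; the <line> declarations are assumed.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE scribble [
  <!ELEMENT scribble (line+)>
  <!ELEMENT line (#PCDATA)>
]>
<scribble>
  <line>Our first line</line>
  <line>Our second line</line>
  <line>Our third line</line>
  <line>Our fourth line</line>
</scribble>"""


def check_declarations(data):
    """Check that the root matches the DOCTYPE and that every element is declared."""
    doctype, declared, used = [], set(), []
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        doctype.append(name)

    def element_decl(name, model):
        declared.add(name)  # one call per <!ELEMENT ...> declaration

    def start_element(name, attrs):
        used.append(name)

    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    parser.StartElementHandler = start_element
    parser.Parse(data, True)

    problems = []
    if doctype and used and used[0] != doctype[0]:
        problems.append(f"root <{used[0]}> does not match DOCTYPE {doctype[0]}")
    for name in sorted(set(used) - declared):
        problems.append(f"<{name}> is used but not declared")
    return problems


print(check_declarations(DOC))  # []
```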

6-25 Other XML Editors: Viewing validatortest.xml in XML Edit Pro

6-26 The Development of a Global Standard: Introducing ASCII ASCII is a subset of other character sets that contain 256 characters. ASCII (whose international counterpart is ISO 646) is a 7-bit coding system with a limited range of 128 characters. To increase that range, 8-bit coding systems were developed, such as Latin-1 (ISO 8859-1), which codes 256 characters. Latin-1 became the character set of choice for the Internet, including gopher and FTP sites. However, it did not cover the characters of languages written in non-Latin scripts.
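The limits of 7-bit and 8-bit character sets show up as soon as you try to encode text with them. The fits helper below is an illustrative name; it uses Python's built-in "ascii" and "latin-1" codecs, with UTF-8 for comparison.

```python
def fits(text, encoding):
    """True if every character of text can be represented in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Plain Latin, accented Latin, and Greek (a non-Latin script)
for word in ["cafe", "caf\u00e9", "Αθήνα"]:
    print(word, fits(word, "ascii"), fits(word, "latin-1"), fits(word, "utf-8"))

# In Latin-1, é is the single byte 0xE9 (above ASCII's 0 to 127 range)
print("\u00e9".encode("latin-1"))  # b'\xe9'
```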

6-27 The Development of a Global Standard: Unicode In 1983, to expand the range of permissible characters, work began on an ISO character set that used 32-bit (four-byte) codes, enough in principle for about 4 billion different characters. However, codes that long took up too much bandwidth. Unicode, begun in 1987 by engineers at Xerox and Apple and maintained since 1991 by the Unicode Consortium, halved the code size to 16 bits. This made it a workable solution: it could handle far more characters than 8-bit sets while using half the bandwidth of 32-bit codes.
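The bandwidth trade-off is simple arithmetic: a fixed-width code of b bits can represent 2^b characters and costs b/8 bytes per character. The fixed_width helper is illustrative; the last line cross-checks the sizes against Python's UTF-16 and UTF-32 codecs, which are fixed-width for these particular characters.

```python
def fixed_width(bits, text):
    """Capacity of a fixed-width code and the bytes it needs for text."""
    capacity = 2 ** bits          # number of distinct codes
    size = len(text) * bits // 8  # every character costs the same
    return capacity, size


text = "Hello, world"  # 12 characters
print(fixed_width(16, text))  # (65536, 24)
print(fixed_width(32, text))  # (4294967296, 48)

# Python's encoders agree for these characters (no byte-order mark with -be)
print(len(text.encode("utf-16-be")), len(text.encode("utf-32-be")))  # 24 48
```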

6-28 The Adoption of Unicode Unicode provides a unique number for every character, no matter what platform, program or language it is viewed in. The standard has been adopted by major vendors and standards bodies and is supported by operating systems, browsers and a host of other products. Another standard, ISO :1993, is also used on the Web; for all practical purposes, Unicode has become a subset of that ISO standard.
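Python's unicodedata module exposes the same numbering: each character has one code point and one formal name, and the name leads back to exactly that character. The code_chart helper is an illustrative name.

```python
import unicodedata


def code_chart(text):
    """List each character's code point and its formal Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code_point, name in code_chart("A\u00e9\u4e2d"):
    print(code_point, name)

# The name leads back to exactly one character
print(unicodedata.lookup("LATIN SMALL LETTER E WITH ACUTE") == "\u00e9")  # True
```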

6-29 Unicode Enabled Operating Systems Below is a list of operating systems that are Unicode-enabled:
– Apple Mac OS 9.2, Mac OS X 10.1, Mac OS X Server, ATSUI
– Bell Labs Plan 9
– Compaq's Tru64 UNIX, OpenVMS
– GNU/Linux with glibc or newer - FAQ support
– IBM AIX, AS/400, OS/2
– Inferno by Vita Nuova
– Java
– Microsoft Windows CE, Windows NT, Windows 2000 and Windows XP
– SCO UnixWare
– Sun Solaris
– Symbian Platform

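6-30 XML:LANG Attribute One of the most important attributes used in combination with XML and Unicode is the xml:lang attribute, which the XML specification reserves for identifying languages. Its value is a language code, such as en for English or es for Spanish. It declares the natural language of an element's content and attribute values, and that declaration also applies to child elements unless they override it; XML software can then use it, for example to choose spell-checking or hyphenation rules. An example of this, coded in an XML statement, marks the following text as Spanish: Hola amigo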
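The sketch below shows how software might read xml:lang, including its inheritance by child elements. ElementTree expands the predeclared xml: prefix to the namespace http://www.w3.org/XML/1998/namespace, so the attribute appears under that expanded name. The messages and greeting element names and the languages helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def languages(xml_text):
    """Return (tag, language, text) for each element that contains text."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, inherited):
        lang = elem.get(XML_LANG, inherited)  # inherited unless overridden
        if elem.text and elem.text.strip():
            found.append((elem.tag, lang, elem.text.strip()))
        for child in elem:
            walk(child, lang)

    walk(root, None)
    return found


# Hypothetical document: <messages> and <greeting> are illustrative names
doc = (
    '<messages xml:lang="en">'
    '<greeting xml:lang="es">Hola amigo</greeting>'
    "<greeting>Hello friend</greeting>"
    "</messages>"
)
print(languages(doc))
# [('greeting', 'es', 'Hola amigo'), ('greeting', 'en', 'Hello friend')]
```

The second greeting carries no xml:lang of its own, so it inherits en from its parent.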

6-31 Pull-down Menu Structure in XML Spy to Add Elements and Attributes

6-32 Unicode for Cherokee language

6-33 UTF-8 and Beyond UTF, which stands for Universal Character Set Transformation Format, lets Unicode characters be stored as sequences of 8-, 16- or 32-bit values (UTF-8, UTF-16 and UTF-32) for use in files and on the Internet. Unicode encodes text by script (e.g. Latin, Cyrillic), not by language, an important distinction that avoids unnecessary duplication of letters: English, French and German text all use the same Latin letters.
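The three encoding forms trade size against simplicity. This sketch encodes one character at a time in each form; the "-be" codec variants are used so that no byte-order mark is added. The SAMPLES table and encoded_lengths helper are illustrative names.

```python
SAMPLES = {
    "A": "Latin",
    "\u00e9": "Latin with accent",
    "\u0416": "Cyrillic",
    "\u4e2d": "Han",
    "\U00020000": "Han, outside the 16-bit range",
}


def encoded_lengths(ch):
    """Bytes needed for one character in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be"))


for ch, script in SAMPLES.items():
    print(f"U+{ord(ch):04X} {script:32} {encoded_lengths(ch)}")
```

UTF-8 stays compact for Latin text (one byte for ASCII letters) but grows to three or four bytes for Han ideographs, while UTF-32 always spends four bytes per character.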

6-34 Character Sets and Typeface Character sets do not specify display formats, colors or typefaces. Unicode characters become visible to the user through a rendering process that maps characters to glyphs. Glyphs are the specific shapes in which a given character is displayed. The character "A" is an abstract, generic "A"; one font may draw it as a plain letter and another as an ornate one. Many things affect this rendering process, such as operating systems, language settings, keyboard and display software, word-processing software, type rasterizers, and input and output hardware.
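Characters and glyphs need not line up one-to-one. In this sketch, é is written two ways: as one precomposed character, and as a plain e followed by a combining accent. A renderer typically draws both as the same glyph, and Unicode normalization (NFC) confirms they are equivalent.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single character
decomposed = "e\u0301"  # e followed by a combining accent

print(len(precomposed), len(decomposed))  # 1 2
print(unicodedata.name(decomposed[1]))    # COMBINING ACUTE ACCENT
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```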

6-35 Character Sets and Typeface (2) In ASCII there is a simple one-to-one correspondence between the character, the glyph and the character set. In practice, ASCII text is usually rendered as basic text in what most of us would recognize as a plain typeface such as Courier. This is not true for Unicode, whose text can be rendered in rich, complex scripts in which the shape of a character may depend on its context. Different standards bodies have been set up to make sure that languages and scripts are handled consistently.
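Arabic is a good example of one character with several glyphs. Text normally stores the single character ARABIC LETTER BEH (U+0628), and the renderer picks an isolated, final, initial or medial shape depending on the neighboring letters. Unicode also has compatibility code points for those shapes; this sketch shows that NFKC normalization folds each of them back to the one underlying character.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the single abstract character
# Compatibility code points for the glyph shapes a renderer chooses by context
FORMS = {"isolated": "\ufe8f", "final": "\ufe90", "initial": "\ufe91", "medial": "\ufe92"}

for position, glyph in FORMS.items():
    # NFKC folds each presentation form back to the underlying character
    print(position, unicodedata.name(glyph), unicodedata.normalize("NFKC", glyph) == BEH)
```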

6-36 The End