Published by Julian Waters. Modified over 8 years ago.
1
XML – eXtensible Markup Language
2
The World Wide Web and What We Would Like to Do with It
XML has a lot of hype surrounding it
This week we discuss:
–Why XML is needed
–Basic technologies used together with XML
In the next few weeks: challenges in using XML
3
XML in One Slide
Basically, XML looks like HTML. However, in XML, you can use any tag names that you want
Example:
Lisa Simpson
02-828-1234
054-470-777
lisa@cs.huji.ac.il
Is that all? Big Deal?!
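The markup of the example above did not survive on this page, so here is a minimal sketch of what such a phone-book entry could look like and how a program might read it. The tag names `phonebook`, `person`, `name`, `tel` and `email` are assumptions chosen for illustration (XML allows any names), and Python's standard `xml.etree.ElementTree` module is used only as one convenient parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the slide's data; the tag names are assumed.
doc = """
<phonebook>
  <person>
    <name>Lisa Simpson</name>
    <tel>02-828-1234</tel>
    <tel>054-470-777</tel>
    <email>lisa@cs.huji.ac.il</email>
  </person>
</phonebook>
"""

root = ET.fromstring(doc)
person = root.find("person")

# Because the tags name the data, each field can be looked up by name.
name = person.findtext("name")
phones = [tel.text for tel in person.findall("tel")]
email = person.findtext("email")

print(name, phones, email)
```

The point of the sketch is that the program never has to guess which string is the email address: the tag says so.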
4
Motivation (1): The Semantic Web
5
Example 1: A Homepage on the Web
Tom Sawyer's Homepage
Tom's Friends
Tom's Hobbies:
–Boating on the Mississippi River
–Chewing Gum
–Painting the Fence
6
Web Pages are Written in HTML
HTML is a markup language
An HTML page consists of tags with attributes and data
HTML describes the style of the page (e.g., color, font type, etc.)
7
Tom Sawyer's Homepage
Hi'ya all. Did you know that my best friend is Huckleberry Finn? Sometimes, I like Becky Thatcher?
Here are some of my hobbies:
–Boating on the Mississippi River
–Chewing gum
–Painting the fence
If you want to discuss common interests, contact me at tom@mark.twain
8
Automatically Using Information
Tom Sawyer has a homepage. So do a lot of other people.
It would be nice to be able to do the following things automatically (via a computer program):
–Querying the Page: Find Tom Sawyer's email address and the names of his friends
–Querying Similar Pages: Find people who have interests in common with Tom Sawyer
9
Automatically Using Information
Site Personalization: Tom Sawyer's interests should be automatically recognized by sites
–When Tom Sawyer enters Amazon, he should get "book recommendations" that match his interests
–When Tom Sawyer enters a site that sells food, he should be told about sales on gum
–This should all happen without Tom having to tell every site about his interests
10
Can We Automatically Use the Information?
In order to perform the tasks described before, we have to:
–Find web pages that describe people
–Extract the relevant information
Problems:
–How can we know if a page describes a person?
–How can we know what to extract? (Everyone has their own style for their homepage...)
–How can we "understand" the extracted information? (What parts of the page describe which information?)
11
Example 2: Weather Forecasting
National Weather Service: Weather Forecasting and Weather Alerts
Flood Alerts in Mississippi
12
Wouldn't it be great if…
Wouldn't it be great if Tom could get automatic updates of weather problems in Mississippi?
It is dangerous to go boating if there are floods…
13
Example 3: News Alerts
Yahoo News
Traffic Jam in the Mississippi River
14
Wouldn't it be great if…
Wouldn't it be great if Tom could get automatic updates of important news related to Mississippi?
He might want to choose a different river to go boating…
15
Can these things be done?
Once again, we need to FIND the relevant pages and EXTRACT the relevant data
HTML pages are constantly changing
How can we automatically figure out what data is relevant and what the data is talking about, even when the page changes?
HTML describes only style and not meaning (or semantics)
It is difficult (perhaps impossible) to perform these tasks
16
Two Basic Approaches
If the information on the Web were neatly organized in a huge database, these problems could be solved. But it's not. What should we do?
AI, NLP Approach: Use smart techniques to recognize information, e.g., recognize patterns in how things are written
DB Approach: Turn the Web into a “database” by writing it in XML
17
The Semantic Web
The Semantic Web is a machine-understandable Web
The meaning of data (i.e., the semantics of data) should be encoded together with the data
Tim Berners-Lee, the inventor of the Web (by putting together the ideas of hypertext, TCP/IP, DNS), is one of the main people behind the Semantic Web
18
Main Technologies Needed
XML: The syntax for marking up text with meaning
RDF: Defines objects and relationships between them
OWL: Defines ontologies which connect different concepts (e.g., a car is an automobile, a car is a type of vehicle)
Web Services: Allow services given online to be accessed programmatically
Here is a simplified version of how it could work
19
Thomas Sawyer
Male
English
Huckleberry Finn
Simplified version of the FOAF standard
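As a rough illustration of why such a description helps, the sketch below encodes the same four values in XML and answers the earlier question "who are Tom's friends?" with a one-line query. The tag names `person`, `name`, `gender`, `language` and `knows` are assumptions made for this sketch; they are not the actual FOAF vocabulary, which is RDF-based and uses namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified person description (not real FOAF syntax).
doc = """
<person>
  <name>Thomas Sawyer</name>
  <gender>Male</gender>
  <language>English</language>
  <knows>
    <person>
      <name>Huckleberry Finn</name>
    </person>
  </knows>
</person>
"""

tom = ET.fromstring(doc)

# The path "knows/person" selects every person nested under <knows>.
friends = [p.findtext("name") for p in tom.findall("knows/person")]

print(tom.findtext("name"), "knows", friends)
```

A program that understands this structure can run the same query against any other page that follows it, which is the kind of "querying similar pages" task described earlier.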
20
Is there XML on the Web? (1)
The weather forecasting site exports its forecasts as RSS (a standard for marking up news). This data can easily be used by a program
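To make "easily used by a program" concrete, here is a small sketch that filters an RSS 2.0 feed for alerts mentioning a keyword. The feed text, its titles and the `example.org` links are invented for illustration, and `alerts_for` is a hypothetical helper name; only the `rss`/`channel`/`item`/`title`/`link` layout comes from RSS 2.0 itself.

```python
import xml.etree.ElementTree as ET

# Made-up feed content in the RSS 2.0 layout.
feed = """
<rss version="2.0">
  <channel>
    <title>Weather Alerts</title>
    <item>
      <title>Flood Alerts in Mississippi</title>
      <link>http://example.org/alerts/1</link>
    </item>
    <item>
      <title>Sunny Weekend in Missouri</title>
      <link>http://example.org/alerts/2</link>
    </item>
  </channel>
</rss>
"""

def alerts_for(feed_xml, keyword):
    """Return the titles of all feed items whose title mentions keyword."""
    root = ET.fromstring(feed_xml)
    titles = [item.findtext("title") or "" for item in root.iter("item")]
    return [t for t in titles if keyword.lower() in t.lower()]

print(alerts_for(feed, "Mississippi"))
```

Tom's boating alert then reduces to running `alerts_for` on each new version of the feed, with no screen scraping of the HTML page.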
21
Is there XML on the Web? (2)
Yahoo News (seen before) exports its news as RSS. This data can easily be used by a program
22
The Sky’s The Limit: Doctor’s appointment
Source: “The Semantic Web”, Scientific American, May 2001
[Diagram labels: Mom, Physician’s Agent, Lucy’s Agent, Pete’s Agent, required treatment, schedule appointment, Insurance Co., provider sites (rating, in-plan?, close-by?, specialist?), driving schedule]
23
Motivation (2): Data Exchange
24
Exchanging Data
Problem: Many data sources, each of a different type (different vendor), with a different schema.
–How can the data be combined and used together?
–How can different companies collaborate on their data?
–What (proprietary?) format should be used to exchange the data?
25
Usage Scenario: Company Collaboration
Several companies want to collaborate
Need to share data
Each company has a different type of database system with a different schema
Solution: Agree on an XML schema for exchange. Import to and export from this schema
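The import/export idea can be sketched as follows. Everything specific here is assumed for illustration: the two in-memory "databases", the product data, the exchange tags `catalog`, `product`, `name` and `price`, and the helper `to_exchange`. The only point is that each company converts to and from one agreed format instead of learning every partner's schema.

```python
import xml.etree.ElementTree as ET

# Two companies store the same kind of data in different shapes (assumed).
company_a_rows = [("Widget", 250)]                        # tuples
company_b_rows = [{"item_name": "Gadget", "cost": 400}]   # dictionaries

def to_exchange(name, price):
    """Build one <product> element in the agreed (hypothetical) format."""
    product = ET.Element("product")
    ET.SubElement(product, "name").text = name
    ET.SubElement(product, "price").text = str(price)
    return product

# Export: each company maps its own schema onto the shared one.
catalog = ET.Element("catalog")
for name, price in company_a_rows:
    catalog.append(to_exchange(name, price))
for row in company_b_rows:
    catalog.append(to_exchange(row["item_name"], row["cost"]))
exchange_xml = ET.tostring(catalog, encoding="unicode")

# Import: any partner can read the shared format back into its own shape.
imported = [
    (p.findtext("name"), int(p.findtext("price")))
    for p in ET.fromstring(exchange_xml)
]
print(imported)
```

With N companies, each one writes a single importer and a single exporter rather than a converter for every pair of partners.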
26
Motivation (3): Separating Content From Style
27
Web Site Development
Web sites develop over time
Important to separate style from data in order to allow changes to the site structure and appearance
CSS separates style from data only in a limited way
–HTML will still have tables, lists, etc.
Using XML, we can store data alone
Using XSL, this data can be translated into HTML
The data can be translated differently as the site develops
28
Write Once Use Everywhere
XML Stock Data
–XSL -> WML (hand-held devices)
–XSL -> HTML (web browser)
–XSL -> TEXT (Excel)
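The sketch below imitates the "one data source, several renderings" idea. It deliberately uses plain Python functions as a stand-in for XSL stylesheets (Python's standard library has no XSLT processor), so it shows the principle rather than XSL itself. The stock symbols, prices and tag names are invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented stock data; the tag and attribute names are assumptions.
stock_xml = """
<stocks>
  <stock symbol="ABC"><price>12.50</price></stock>
  <stock symbol="XYZ"><price>7.25</price></stock>
</stocks>
"""
root = ET.fromstring(stock_xml)

def to_html(stocks):
    """Render the data as an HTML table for a web browser."""
    rows = "".join(
        f"<tr><td>{escape(s.get('symbol'))}</td>"
        f"<td>{escape(s.findtext('price'))}</td></tr>"
        for s in stocks.findall("stock")
    )
    return f"<table>{rows}</table>"

def to_text(stocks):
    """Render the same data as tab-separated text for a spreadsheet."""
    return "\n".join(
        f"{s.get('symbol')}\t{s.findtext('price')}"
        for s in stocks.findall("stock")
    )

print(to_html(root))
print(to_text(root))
```

Changing the site's appearance means changing only a rendering function (or, in the slide's setting, a stylesheet); the stored data stays untouched.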
29
XML Syntax
30
HTML
Used for publishing hypertext on the World Wide Web
Designed to describe how a Web browser should arrange text, images and push-buttons on a page
Easy to learn, but does not convey structure
Fixed tag set
31
HTML Example
Welcome to the DBI course
Introduction
Callouts: Opening tag, Closing tag, Text (PCDATA), “Bachelor” tag, Attribute name, Attribute value
32
XML Vs. HTML
XML and HTML are “brothers”. They are both special cases of SGML.
HTML has specific tag and attribute names. These are associated with a specific meaning
XML can have any tag and attribute name. These are not associated with any meaning
HTML is used to specify visual style
XML is used to specify meaning
[Diagram: HTML and XML, both derived from SGML]
33
Terminology
The segment of an XML document between an opening and a corresponding closing tag is called an element
Bart Simpson
02 – 444 7777
051 – 011 022
bart@tau.ac.il
Callouts: element; element, a sub-element of; not an element
34
XML Document is a Tree
XML documents are abstractly modeled as trees, as reflected by their nesting
Sometimes, XML documents are graphs (by using IDs and IDREFs)
[Tree diagram: root person; nodes name, email, tel; leaves Bart Simpson, 02 – 444 7777, 051 – 011 022, bart@tau.ac.il]
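The tree view can be made visible with a short recursive walk. This is a sketch only: the document string is an assumed reconstruction of the Bart Simpson entry (tag names taken from the tree diagram), and `outline` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed markup for the entry shown in the diagram.
doc = (
    "<person>"
    "<name>Bart Simpson</name>"
    "<tel>02-444 7777</tel>"
    "<tel>051-011 022</tel>"
    "<email>bart@tau.ac.il</email>"
    "</person>"
)

def outline(elem, depth=0):
    """Return one indented line per element, parents before children."""
    lines = ["  " * depth + elem.tag]
    for child in elem:  # iterating an element yields its sub-elements
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(doc))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the document, which is exactly the parent-child relation of the tree.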
35
Example XML Fragment
Donald Duck
04-828-1345
04-828-1374
donald@cs.technion.ac.il
Miki Mouse
03-426-1142
36
Another Example
An element may contain a mixture of sub-elements and PCDATA
British Airways
World’s favorite airline
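A minimal sketch of mixed content, assuming hypothetical tags `airline` and `name` (the slide's original markup is not available). It also shows how Python's `xml.etree.ElementTree` happens to expose the pieces: text inside a sub-element is its `.text`, while text that follows the sub-element's closing tag is stored as that sub-element's `.tail`.

```python
import xml.etree.ElementTree as ET

# <airline> contains both a sub-element and plain text (PCDATA).
doc = "<airline><name>British Airways</name> World's favorite airline</airline>"

airline = ET.fromstring(doc)
name = airline.find("name")

inside = name.text            # PCDATA inside <name>...</name>
after = name.tail.strip()     # PCDATA after </name>, still inside <airline>
all_text = "".join(airline.itertext())

print(inside, "|", after, "|", all_text)
```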
37
A Complete XML Document
Lisa Simpson
02-828-1234
054-470-777
lisa@cs.huji.ac.il
Callouts: Required, Optional
38
Attributes
An opening tag may contain attributes
These are typically used to describe the contents of an element
cheese
fromage
branza
A food made …
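One plausible reading of the example is a dictionary entry whose words are labeled with their language. The sketch below assumes that reading: the tags `entry` and `word` and the attribute `language` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The attribute describes the element's content: which language the word is in.
doc = """
<entry>
  <word language="English">cheese</word>
  <word language="French">fromage</word>
  <word language="Romanian">branza</word>
</entry>
"""

entry = ET.fromstring(doc)

# Element.get(name) returns the value of an attribute on the opening tag.
by_language = {w.get("language"): w.text for w in entry.findall("word")}

print(by_language["French"])
```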
40
When to Use Attributes
It’s not always clear when to use attributes
L. Simpson lisa@cs.huji.ac.il... 123 4589 L. Simpson lisa@cs.huji.ac.il...
General Rule:
–Use an element if you need to nest data
–Use an attribute for “IDs”, i.e., identifying data
More on this soon…
41
Rules for XML (1)
XML is order-sensitive, i.e., the following are different:
–cheese fromage
–fromage cheese
XML is case-sensitive, i.e., tag names that differ only in letter case are different tags
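Both rules can be observed with a parser. In this sketch the tag names `entry`, `word` and `Word` are assumptions; the behavior shown (document order preserved, names compared case-sensitively) is what the rules above describe.

```python
import xml.etree.ElementTree as ET

# Order: the same two sub-elements in a different order.
a = ET.fromstring("<entry><word>cheese</word><word>fromage</word></entry>")
b = ET.fromstring("<entry><word>fromage</word><word>cheese</word></entry>")
order_a = [w.text for w in a]
order_b = [w.text for w in b]

# Case: <Word> and <word> are different tag names.
c = ET.fromstring("<entry><Word>cheese</Word></entry>")
found_lower = c.find("word")   # no element with this exact name
found_exact = c.find("Word")

print(order_a, order_b, found_lower is None, found_exact is not None)
```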
42
Rules for XML (2)
Tags come in pairs...
They must be properly nested.
Which of the following are good?
–.........
–......
There is a special shortcut for tags that have no text in between them (bachelor tags)
–
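The slide's own examples were lost, so here is a hedged stand-in with invented tags `a`, `b`, `entry` and `br`. It checks one properly nested and one crossed pair of tags, and then compares the empty-element shortcut with the long form. `parse_or_none` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def parse_or_none(text):
    """Return the parsed root element, or None if the parser rejects text."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None

# Properly nested: <b> is closed before <a>.
nested_ok = parse_or_none("<a><b>text</b></a>")
# Crossed: <a> is closed while <b> is still open.
crossed = parse_or_none("<a><b>text</a></b>")

# The shortcut <br/> means the same as <br></br>.
short_form = ET.fromstring("<entry><br/></entry>")
long_form = ET.fromstring("<entry><br></br></entry>")
same_tree = ET.tostring(short_form) == ET.tostring(long_form)

print(nested_ok is not None, crossed is None, same_tree)
```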
43
Rules for XML (3)
There should be exactly one top-level element. This element is also called the root element
Which of the following is legal?
Is this legal?
Is this legal?
You tell me.
44
Well Formed Documents
A document is well-formed if it
–obeys all the above rules, and in addition
–does not repeat an attribute within a tag, i.e., the following is illegal: …
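A conforming XML parser refuses documents that are not well-formed, so a well-formedness check can be sketched as "try to parse it". The test documents below are invented, and `is_well_formed` is a hypothetical helper built on Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Exactly one root element: accepted.
one_root = is_well_formed("<phonebook><person/><person/></phonebook>")
# Two top-level elements: rejected.
two_roots = is_well_formed("<person/><person/>")
# The same attribute repeated within one tag: rejected.
repeated_attr = is_well_formed('<person id="1" id="2"/>')

print(one_root, two_roots, repeated_attr)
```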
45
Tables Versus XML
Can you easily represent the contents of a table in XML?
–Example: Projects(title, budget, managedBy), Employees(name, age, ssn)
Can you easily represent the contents of an XML document in a table?
–Example: Remember the phone book
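Both directions can be tried out in a few lines. This is only a sketch under assumptions: the row values, the `row` wrapper tag, the helper `table_to_xml`, and the phone-book markup are all invented. Going from a table to XML is mechanical; going back is less direct, because a person with several `tel` elements does not fit into a single row with one phone column.

```python
import xml.etree.ElementTree as ET

def table_to_xml(table, columns, rows):
    """Wrap each row in a <row> element with one sub-element per column."""
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(record, column).text = str(value)
    return root

# Table -> XML: one invented row of Projects(title, budget, managedBy).
projects = table_to_xml(
    "Projects", ("title", "budget", "managedBy"), [("Apollo", 100, "Lisa")]
)
projects_xml = ET.tostring(projects, encoding="unicode")

# XML -> table: repeated <tel> elements force one row per (name, tel) pair.
phonebook = ET.fromstring(
    "<phonebook><person><name>Lisa Simpson</name>"
    "<tel>02-828-1234</tel><tel>054-470-777</tel></person></phonebook>"
)
tel_rows = [
    (p.findtext("name"), t.text)
    for p in phonebook.findall("person")
    for t in p.findall("tel")
]

print(projects_xml)
print(tel_rows)
```

The second half hints at why the reverse question is harder: the nesting and repetition in XML have to be flattened into extra rows or extra tables.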