
1 WebOQL A Web Object Query Language

2 OVERVIEW The data model supports abstractions for modeling record-based data, structured documents and hypertexts. WebOQL supports querying small databases represented as documents (such as catalogs); restructuring single pages (for example, converting a large page into smaller pages); restructuring sets of pages (for example, creating an index page containing a hyperlink to each of them and adding to each page a hyperlink back to the index page); and restructuring the content of a web site in order to show the same content in another view.

3 Data Model The WebOQL data model introduces the hypertree: a tree-based data model representing structured documents that contain hyperlinks. Hypertrees are ordered, arc-labeled trees with two kinds of arcs, internal and external. Internal arcs represent structured objects. External arcs represent references (links); they cannot have descendants, and their records must contain a ‘URL’ field.
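The definition above can be made concrete with a small data structure. The following is a minimal Python sketch, not WebOQL's actual implementation: the class name `Arc`, the `external` flag and the sample URL value `index.html` are assumptions made for illustration. It only enforces the two constraints stated for external arcs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Arc:
    """One labeled arc of a hypertree together with the subtree it leads to."""
    label: Dict[str, str]                                # the record on the arc
    subtree: List["Arc"] = field(default_factory=list)   # ordered child arcs
    external: bool = False                               # True for reference (link) arcs

    def __post_init__(self) -> None:
        # External arcs cannot have descendants and must carry a URL field.
        if self.external:
            if self.subtree:
                raise ValueError("external arcs cannot have descendants")
            if "URL" not in self.label:
                raise ValueError("external arc records must contain a URL field")


# A hypertree is represented here as the ordered list of arcs leaving its root.
Hypertree = List[Arc]

# Hypothetical sample data: one internal arc with one external arc below it.
link = Arc({"Label": "home page", "URL": "index.html"}, external=True)
tree: Hypertree = [Arc({"Name": "moshe", "Sem": "5"}, [link])]
```

Because the arcs are kept in a list, the ordering of the tree is preserved, and because labels are plain dictionaries, heterogeneous records can sit side by side.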

4 Data Model Example:
[Group: students]
    [Name: moshe, Sem: 5]
        [Label: moshe home page, URL: www … /index.html]
    [Name: arik, Sem: 8]
        [Label: arik home page, URL: www … /index.html]
[Group: professors]
    [Name: oded, Seniority: 8]
        [Label: seminar in www, URL: www … /s.html]
        [Label: databases, URL: www … /index.html]

5 Data Model Hypertrees are a useful data structure because they support three important abstractions: collections, nesting and ordering. The notion of reference, which is central to the structure of the web, is captured through the distinction between internal and external arcs. Because nodes have no type, a tree can hold heterogeneous records on its arcs.

6 Data Abstractions Web: a pair (t, F), where t is a hypertree (the schema) and F : URLs → Hypertrees is the browsing function. Page: F(u), where u is a URL.

7 Tree operators Definitions: Tails: the tails of a tree t are the trees obtained by chopping prefixes of t. Simple trees: the simple trees of t are the trees composed of one arc that stems from the root of t together with its subtree. Subtrees: the subtrees of t are the trees at the ends of the arcs that stem from the root of t.
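The three definitions can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes a tree is represented as the ordered list of arcs leaving its root, each arc being a `(label, subtree)` pair, and it assumes that chopping the empty prefix counts, so that t itself is one of its tails. The function names and the sample tree are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# A tree is the ordered list of arcs leaving its root; each arc is (label, subtree).
Tree = List[Tuple[Dict[str, Any], "Tree"]]


def tails(t: Tree) -> List[Tree]:
    """Trees obtained by chopping prefixes of t (the empty prefix yields t itself)."""
    return [t[i:] for i in range(len(t))]


def simple_trees(t: Tree) -> List[Tree]:
    """One tree per arc leaving the root: that arc together with its subtree."""
    return [[arc] for arc in t]


def subtrees(t: Tree) -> List[Tree]:
    """The trees found at the ends of the arcs leaving the root."""
    return [sub for _label, sub in t]


# Illustrative tree with three arcs; the last arc has an empty subtree.
t: Tree = [
    ({"Label": 1}, [({"A": 1}, []), ({"A": 2}, [])]),
    ({"Label": 2}, [({"B": 1}, [])]),
    ({"Label": 3}, []),
]
```

With this representation a tree with three arcs has three tails, three simple trees and three subtrees, and the empty list plays the role of the null tree.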

8 Tree t: [Label:1] with subtree ([A:1], [A:2]); [Label:2] with subtree ([B:1]); [Label:3] with no subtree. Simple trees of t: [Label:1] with ([A:1], [A:2]); [Label:2] with ([B:1]); [Label:3]. Subtrees of t: ([A:1], [A:2]); ([B:1]); null.

9 Tails of t ! (prefixes chopped): first, [Label:1] with ([A:1], [A:2]), [Label:2] with ([B:1]), [Label:3]; second, [Label:2] with ([B:1]), [Label:3]; third, [Label:3].

10 Tree operators Concatenate: Tree1 + Tree2. Connects two trees by their roots. Diagram labels: t1: [label1: a2] [label1: a1] [label1: a] [label1: c]; t2: [label1: b] [label1: c2] [label1: c1]; t1 + t2: [label1: b] [label1: c2] [label1: c] [label1: c1] [label1: a2] [label1: a1].

11 Tree operators Hang: [ Arc1 / Tree1 ]. Hangs the tree from a new arc. t1: [label1: a2] [label1: a1]. [ label1: a / t1 ]: the arc [label1: a] with t1 ([label1: a2], [label1: a1]) below it.

12 Tree operators Prime: Tree’. The first subtree of the argument. t1: the arc [label1: a] with ([label1: a2], [label1: a1]) below it, followed by the arc [label1: b]. t1’: [label1: a2] [label1: a1].

13 Tree operators Head: Tree & [x]. The first x simple trees of the argument; if x is not specified, only the first simple tree. t1: the arc [label1: a] with ([label1: a2], [label1: a1]) below it, followed by the arc [label1: b]. t1&: the arc [label1: a] with ([label1: a2], [label1: a1]) below it.
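The four operators described on the preceding slides (concatenate, hang, prime and head) can be sketched over the same list-of-arcs representation. This is a hedged illustration rather than the WebOQL engine's code: the function names are hypothetical, concatenation is assumed to place the arcs of the second tree after those of the first, and the empty list stands in for the null tree.

```python
from typing import Any, Dict, List, Tuple

Label = Dict[str, Any]
# A tree is the ordered list of arcs leaving its root; each arc is (label, subtree).
Tree = List[Tuple[Label, "Tree"]]


def concatenate(t1: Tree, t2: Tree) -> Tree:
    """t1 + t2: join two trees at their roots (arcs of t1, then arcs of t2)."""
    return t1 + t2


def hang(label: Label, t: Tree) -> Tree:
    """[label / t]: hang tree t from a new arc carrying the given label."""
    return [(label, t)]


def prime(t: Tree) -> Tree:
    """t': the first subtree of t (the empty tree if t has no arcs)."""
    return t[0][1] if t else []


def head(t: Tree, x: int = 1) -> Tree:
    """t&x: the first x simple trees of t; only the first one by default."""
    return t[:x]


# The t1 used on the Prime and Head slides: arc a over (a2, a1), followed by arc b.
inner: Tree = [({"label1": "a2"}, []), ({"label1": "a1"}, [])]
t1: Tree = concatenate(hang({"label1": "a"}, inner), [({"label1": "b"}, [])])
```

Here `prime(t1)` gives back `inner`, and `head(t1)` gives the single arc `a` with `inner` still attached, which matches the textual descriptions of the two operators.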

14 [Figure: example trees q4’, q5&, q5!, q5&2, q5, q6, q7]

15 Hang: [Label: “papers from smith”, Format: “ps.Z” / q1] produces the arc [Label: Papers from smith, Format: ps.Z] with the tree q1 below it (arcs [Title: Are……….. Url: ] and [Title: Recent……….. Url: ]). Hang + concatenate: [Tag: “UL” / [Tag: “LI”, Text: “First Child”] + [Tag: “LI”, Text: “Second Child”] + [Tag: “LI”, Text: “Third Child”]] + [Url: “ Label “Click Here”] produces the arc [Tag: UL] with three [Tag: LI] arcs below it (Text: First Child, Second Child, Third Child), followed by the arc [Url: “ Label “Click Here”].

16 Tree operators Peek: Arc.field. Extracts a field from an arc’s label, e.g. Example.Group can have the value ‘students’. If this field does not exist, the value ‘nil’ is returned. IsField: Arc?field. Tests for the presence of a field in an arc’s label, e.g. Example?Group evaluates to true, while Example?Name evaluates to false.
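Peek and IsField map naturally onto dictionary lookups. The sketch below assumes an arc label is a Python dictionary and uses `None` in the role of nil; the function names `peek` and `is_field` are made up for this illustration.

```python
from typing import Any, Dict, Optional

Label = Dict[str, Any]


def peek(label: Label, field_name: str) -> Optional[Any]:
    """Arc.field: the value of the field, or None (standing in for nil) if absent."""
    return label.get(field_name)


def is_field(label: Label, field_name: str) -> bool:
    """Arc?field: True when the field is present in the arc's label."""
    return field_name in label


# The label of the arc called Example on the slide.
example: Label = {"Group": "students"}
```

With this label, `peek(example, "Group")` yields `"students"`, `peek(example, "Name")` yields `None`, and `is_field` distinguishes the two cases without returning a value.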

17 Page: a hypertree that has an associated URL identifying it. Web: a collection of interrelated pages; each external arc of a page is a link in the web. Schema: a web can optionally have a distinguished page that provides an entry point to the web.

18 No schema: one must know the URL of one or more pages.

19 [Figure: web, schema, WebOQL query, new page]

20 [Rendered HTML page: a list with the items First Child, Second Child and Third Child, followed by the hyperlink Click Here]

21 Tree representing an HTML document consisting of a list and a hyperlink: [Tag: LI, Text: First Child] [Tag: LI, Text: Second Child] [Tag: LI, Text: Third Child] [Url: Label: Click here]. Trees are ordered. Arcs are labeled not with atomic values but with records.

22 Paper Database (csPapers). Group arcs: [Group: DBMS] [Group: ProgLang] [Group: Card]. Paper arcs: [Title: Recent, Authors: Smith, Publications: Tech] [Title: Are……, Authors: Smith, Publications: ACM]. Link arcs: [Label: Full Papers, Url: www…] [Label: Abstract, Url: www…].

23 SELECT - FROM - WHERE This familiar query language construct is used by WebOQL as the main construct of queries. Select: the query to evaluate, e.g. [y.Label, y.URL]. From: the definition of variables, e.g. x in example, y in x!. Where: a boolean condition, e.g. x.Seniority = 8.

24 SELECT - FROM - WHERE For each instantiation of the variables in the from clause, check the condition in the where clause; if it is true, evaluate the query in the select clause and append it to the result. Result: [Label: seminar in www, URL: www … /s.html] [Label: databases, URL: www … /index.html]
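The evaluation rule just described amounts to nested loops over the variables of the from clause. The following Python sketch illustrates it under stated assumptions: trees are lists of `(label, subtree)` pairs, x ranges over the arcs of a source tree and y over the arcs of x's first subtree, and the function name `select_links` is hypothetical. The arc for "someone else" is invented so that the where clause has something to reject; the elided URLs are kept exactly as they appear on the slides.

```python
from typing import Any, Dict, List, Tuple

Label = Dict[str, Any]
Tree = List[Tuple[Label, "Tree"]]

professors: Tree = [
    ({"Name": "oded", "Seniority": 8}, [
        ({"Label": "seminar in www", "URL": "www … /s.html"}, []),
        ({"Label": "databases", "URL": "www … /index.html"}, []),
    ]),
    # Hypothetical extra record that the where clause filters out.
    ({"Name": "someone else", "Seniority": 3}, [
        ({"Label": "other page", "URL": "www … /other.html"}, []),
    ]),
]


def select_links(source: Tree) -> Tree:
    """select [y.Label, y.URL] from x in source, y in x' where x.Seniority = 8"""
    result: Tree = []
    for x_label, x_sub in source:                  # x ranges over the arcs of source
        for y_label, _y_sub in x_sub:              # y ranges over the arcs of x'
            if x_label.get("Seniority") == 8:      # where clause
                # select clause: build a one-arc tree and append it to the result
                new_label = {"Label": y_label.get("Label"), "URL": y_label.get("URL")}
                result.append((new_label, []))
    return result
```

Applied to `professors`, the function returns two arcs, one for "seminar in www" and one for "databases", in the order in which the variables were instantiated.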

25 Select [y.Title, y.Publication] from x in csPapers, y in x’ Missing data: Publication is undefined.

26 Compute a listing of the papers’ publication data grouped by title. Select [x.Title / select [z.Publication] from y in csPapers, z in y’ where x.Title = z.Title] from w in csPapers, x in w’

27 Schema: a distinguished hypertree. Browsing function: maps strings (URLs) to hypertrees. It defines a graph whose nodes are pages, with an arc from node a to node b if the page at node a contains an external arc whose url attribute is the URL of the page at node b.
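The graph induced by a browsing function can be computed directly from that definition. In the Python sketch below the browsing function is modeled as a dictionary from URL strings to trees, and, as a simplifying assumption, any arc whose label has a `Url` field is treated as external. The names `external_urls` and `link_graph` and the sample URLs `index` and `group` are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple

Label = Dict[str, Any]
Tree = List[Tuple[Label, "Tree"]]


def external_urls(t: Tree) -> List[str]:
    """Collect the Url field of every arc that carries one, at any depth."""
    urls: List[str] = []
    for label, sub in t:
        if "Url" in label:
            urls.append(label["Url"])
        urls.extend(external_urls(sub))
    return urls


def link_graph(browse: Dict[str, Tree]) -> Set[Tuple[str, str]]:
    """Edges (a, b): the page at a has an external arc whose Url is the URL of page b."""
    return {
        (a, b)
        for a, page in browse.items()
        for b in external_urls(page)
        if b in browse
    }


# Hypothetical two-page web: an index page and one group page linking to each other.
browse: Dict[str, Tree] = {
    "index": [({"Tag": "UL"}, [({"Label": "Group", "Url": "group"}, [])])],
    "group": [({"Label": "To Index", "Url": "index"}, [])],
}
schema: Tree = browse["index"]   # the distinguished page that serves as entry point
```

A web in the sense of the slides is then the pair `(schema, browse)`.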

28 Analogy with relational databases: hypertrees correspond to relations; webs correspond to databases; the schema of a web corresponds to the catalog of a database.

29 Select [x.Tag] from x in browse(“ Result: [Tag: head] [Tag: body]

30 SFW creates a web. Select the titles and URLs of papers authored by Smith: Select [y.Title, y’.URL] as schema from x in csPapers, y in x’ where y.Authors ~ “smith”

31 Create a web page with URL “Group Names” whose content is the list of group names (assume that there is no such page in the current web): Select [x.Group] as “Group Names” from x in csPapers

32 Create several pages, one for each research group (using the group name as URL). Each page contains the publications of the corresponding group: Select x’ as x.Group from x in csPapers

33 Data Model: records as labels on arcs; internal and external arcs. Arc labels in the figure: [Label: Theatres Online, Url: Base: Text: This page contains...] [Tag: UL, Text: one of the…] [Label: Sports Zone, Url: Base: Text: Sports Zone…] [Tag: XYZ, Text: One of the…] [Tag: XYZ, Text: If you are…] [Tag: XYZ, Text: …] [Tag: H1, Text: City Overview…] [Tag: LI, Text: One of the…] [Tag: LI, Text: If you are interested…] [Tag: LI, Text: All the hotels…] [Tag: XYZ, Text: Contains…] [Label: All the Hotels, Url: Base: Text: These are all…]

34 Query: list elements containing “ticket”. doc := “ [Tag: “UL” / select y from y in doc!’ where y’.Text ~ “ticket”] Arc labels in the figure: [Tag: LI] [Label: Theatres Online, Url: Base: Text: This page contains...] [Tag: UL] [Label: Sports Zone, Url: Base: Text: Sports Zone…] [Tag: XYZ, Text: One of the…] [Tag: XYZ, Text: If you are…] [Tag: XYZ, Text: …]

35 Web restructuring Using the tree operators, we have shown how a tree can be restructured. To restructure a web we need a function that maps one web to another. The new web has some hypertree as its schema, while its browsing function is an extension of the old web’s browsing function: it targets URLs that were not previously targeted. In WebOQL this is done using the as clause.

36 Web restructuring In general, the select clause of WebOQL has the form: Select q1 as s1, q2 as s2, …, qn as sn. Each si can be either the keyword schema or a string query. An as clause that evaluates to schema defines the schema of the web. Example: [Title: y.Group] as schema yields [Title: students] [Title: professors].

37 Web restructuring In general, the select clause of WebOQL has the form: Select q1 as s1, q2 as s2, …, qn as sn. Each si can be either the keyword schema or a string query. An as clause that evaluates to a string defines a page, and the string is treated as the page’s URL. Example: [x.Name] as y.Group yields the page students containing [Name: moshe] [Name: arik].
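The effect of the two kinds of as clause can be sketched in Python. This is a rough illustration, not WebOQL's evaluation strategy: it assumes the variables are bound as "y in example, x in y’" (the slides do not show the from clause), it models the new browsing function as a dictionary from URL strings to trees, and the function name `build_web` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Label = Dict[str, Any]
Tree = List[Tuple[Label, "Tree"]]

example: Tree = [
    ({"Group": "students"}, [
        ({"Name": "moshe", "Sem": 5}, []),
        ({"Name": "arik", "Sem": 8}, []),
    ]),
    ({"Group": "professors"}, [
        ({"Name": "oded", "Seniority": 8}, []),
    ]),
]


def build_web(source: Tree) -> Tuple[Tree, Dict[str, Tree]]:
    """select [Title: y.Group] as schema, [x.Name] as y.Group
    with the assumed bindings: y in source, x in y'."""
    schema: Tree = []
    pages: Dict[str, Tree] = {}
    for y_label, y_sub in source:
        group = y_label["Group"]
        schema.append(({"Title": group}, []))            # "... as schema"
        for x_label, _x_sub in y_sub:
            # "... as y.Group": the string names the page that receives the result
            pages.setdefault(group, []).append(({"Name": x_label["Name"]}, []))
    return schema, pages


schema, pages = build_web(example)
```

Results destined for the same string accumulate in the same page, so the page named `students` ends up holding one arc per student.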

38 Web restructuring After a web is created there are two possibilities: either query it further (restructure it) or return it to the host application. If we want to return the web to the host application so that it can be shown in a browser, we must format the pages in an HTML-compliant way. This is done by restructuring the web using HTML tags as labels.

39 Document restructuring Web documents are a perfect example of semistructured data, since they do not have a fixed schema and can have various irregularities. In an HTML document most of the tags may appear any number of times or not at all. WebOQL uses a wrapper that creates abstract syntax trees (ASTs) from arbitrary HTML documents. This is feasible because the markup tags of HTML reflect the logical relationships between the various information items. Example: item 1. item 2.

40 Generate a web consisting of a page for each research group, containing the title and authors of all its publications, and an index page that lists all the groups and provides links to their pages. newWeb Select unique [Name: x.Group, Url: x.Group] as schema, [y.Title, y.Authors] as x.Group from x in csPapers, y in x’

41 “As schema”: [Name:…, Url:..] [Name: Prog. Lang, Url: Prog.Lang..] [Name: Card Punching, Url: Card Punching]. “As x.Group”: page Card Punching: [Titles: Recent…, Authors: Smith] [Titles: Arc…, Authors: Smith]; page Prog. Lang.: [Titles: Cobol…, Authors: James J] [Titles: Assembly Lan, Authors: John,..].

42 NewerWeb < newWeb
| select [Tag: “H3”, Text: y.Title] + [Tag: “BR”, Text: y.Publication] + [Tag: “BR”, Text: y.Authors] + [Tag: “P”] as x.Name from x in schema, y in x.Name
| select [Tag: “H2”, Text: “Publications of the” * x.Name * “ Group”] + x.Name + [Tag: “A”, Label: “To Index”, Url: “ of Projects.html”] as “ * x.Name * “.html” from x in schema

43 | select [Url: “ of Projects.html”] as schema, [Tag: “H2”, Text: “Index of Projects”] + [Tag: “UL” / select [Tag: “LI” / [Tag: “A”, Label: x.Name, Url: “ * x.Name * “.html”]] from x in schema] as “ of Projects.html”

44 Index page: Index of Projects: Card Punching, Programming Languages, …..

45 Group pages: Publications of the Card Punching group. Recent Discoveries in Card Punching, Technical Report TR015, Peter Smith, John Brown. Are Magnetic Media Better? ACM TOCP Vol 3 No. (1942) pp. 23-37, Peter Smith, John Brown. To index.

46 Document restructuring Navigation patterns: In the examples we have seen, the variables used in the queries ranged over the simple trees of the tree being queried. In the WWW, however, variables may range over several linked subtrees whose structure is not fully known to us. ^ is a record predicate that is true for every internal arc. [Tag = “H2”] is a record predicate that is true for every arc that has an ‘H2’ tag. Example: select [x.text] from x in “someone’s.html” via ^*[Tag = “H2”]

47 Document restructuring Navigation patterns: In the examples we have seen, the variables used in the queries ranged over the simple trees of the tree being queried. In the WWW, however, variables may range over several linked subtrees whose structure is not fully known to us. > is a record predicate that is true for every external arc. [not(Tag = “H2”)] is a record predicate that is true for every arc that does not have an ‘H2’ tag. Example: select [x.text] from x in “someone’s.html” via >*[not(Tag = “H2”)]

48 Document restructuring Navigation patterns: When a navigation pattern is omitted, the query is treated as if it had a navigation pattern that always evaluates to true. Variables are instantiated in left-to-right depth-first or breadth-first order. The default is breadth-first; to use depth-first, the keyword viadfs is used instead of via.
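The difference between the two instantiation orders can be seen in a small Python sketch. It is deliberately limited: it handles only patterns of the shape ^*P (any number of internal arcs followed by one arc satisfying predicate P) inside a single document, it treats an arc as internal when its label has no `Url` field, and the function names and the sample document are assumptions made for illustration.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Tuple

Label = Dict[str, Any]
Tree = List[Tuple[Label, "Tree"]]
Pred = Callable[[Label], bool]


def match_bfs(t: Tree, pred: Pred) -> List[Label]:
    """Breadth-first, left-to-right matches of ^*pred (the order used by via)."""
    found: List[Label] = []
    queue = deque([t])
    while queue:
        node = queue.popleft()
        for label, sub in node:
            if pred(label):
                found.append(label)
            if "Url" not in label:        # internal arc: keep descending below it
                queue.append(sub)
    return found


def match_dfs(t: Tree, pred: Pred) -> List[Label]:
    """Depth-first, left-to-right matches of ^*pred (the order used by viadfs)."""
    found: List[Label] = []
    for label, sub in t:
        if pred(label):
            found.append(label)
        if "Url" not in label:
            found.extend(match_dfs(sub, pred))
    return found


# Hypothetical document: one H2 nested inside a DIV, one H2 directly under BODY.
doc: Tree = [
    ({"Tag": "BODY"}, [
        ({"Tag": "DIV"}, [({"Tag": "H2", "Text": "deep"}, [])]),
        ({"Tag": "H2", "Text": "shallow"}, []),
    ]),
]


def is_h2(label: Label) -> bool:
    return label.get("Tag") == "H2"
```

On this document the breadth-first search reaches the shallow H2 first, while the depth-first search reaches the nested one first, so the same query can produce its results in a different order depending on the keyword used.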

49 Navigation Pattern [not (Tag = “A”)]*: a path of any length composed of arcs that do not have a Tag attribute with value “A”. [Tag = “LI”] [Tag = “A”]: a path of length 2. ^*>: all paths in a tree that lead from the root to an external arc. Example: select [x.url] from x in “ via [not (Tag = “Table”)]*> returns all the external arcs, in the document pointed to by the “http……” URL, that do not occur within a table.

50 Select [x.url, x.text] from x in “ via (^*[Labled “Next”]>)* What will this query produce?

51 Tail Variables Variable in upper case iterates over tails plus simple trees

52 Select X!& from X in via ^*[Tag = “H3”] where X!.Tag = “UL” and X.Text ~ “Price” Arc labels in the figure: [Tag: H3, Text: Price…] [Tag: H3, Text: Price…] [Tag: UL] [Tag: LI]

53 Arc labels in the figure: [Tag: H2, Text: Publications of the] [Tag: H3, Text:] [Tag: BR, Text:] [Tag: BR, Text: y] [Tag: P, Text: ] [Tag: P, Text: ] [Label: To index, Url: Base: Text: indexofprojects] [Tag: H3, Text:] [Tag: BR, Text:] [Tag: BR, Text: y]. Tree generated by the query [Tag: “OL” / select [Tag: “LI” / X&3] from X in where X.Tag = “H3”]: [Tag: OL] [Tag: LI] [Tag: H3]

54 [Tag: “OL”/Select [Tag: “LI”/ Select y from y in X while not y.Tag=“p”] From X in where X.tag = “H3” ]

55 Generate a web containing a page for each project, a page for each person and two index pages listing all the projects and all the people. A person’s page contains pointers to the projects in which he or she is involved, and a project page contains pointers to the pages of the people involved in it. Project web: select [x.proj name, x.proj descr] as “projects”, [x.emp name, x.emp phone] as “people”, [x.emp name] as “x.proj name”, [x.proj name] as “x.emp name” from x in “SQLDb. Select proj name, emp name, emp phone, proj descr from proj, emp, works in where Emp.id = worksIn.empid and proj.id = worksIn.projId;”

56 [Label: Full Version, Url: Base: Text: 1k098k79…] [Tag: UL, Text: Recent…] [Tag: LI, Text: Are Magnetic…] [Tag: BR, Text: ] [Tag: H2, Text: Card Punching…] [Tag: H2, Text: Programming…] [Tag: H1, Text: Publications of Research…] [Tag: UL, Text: …] [Tag: H2, Text: Databases…] [Tag: UL, Text: Cobol in AI Sam James…] [Tag: LI, Text: Recent…] [Tag: LI, Text: Cobol in…] [Tag: LI, Text: Assembly for…] …. [Label: Abstract, Url: Base: Text: Are Magnetic Media…] [Tag: BR, Text: ] [Tag: BR, Text: ACM TOCP Vol. 3 No. (1942) pp 23-37] [Tag: B, Text: Peter Smith…] [Tag: BR, Text: ] [Tag: XYZ, Text: Are Magnetic] …. [Tag: CITE, Text: Are Magnetic…]

57 Query: retrieve the titles and authors of each paper. Select [Title: y”.Text, Author: y”!!.Text] from x in “ y in x’ where x.Tag = UL. Here x ranges over simple trees and y over the elements under UL.

58 Select [title: y”.Text, authors: y”!!.Text, Publications: y”!3.Text, ps-url: y’!4.Url, abstract-url: y’!!.Url] as “pubsdb: insert” from X in y in X!’ where X.Tag = “H2”

59 [Tag: H1, Text: Reports in …] [Tag: HR, Text:] [Tag: BR, Text:] [Tag: BR, Text: ] [Tag: P, Text: ] [Tag: P, Text: ] [Tag: CITE, Text:Efficient] [Tag: BR, Text:] [Tag: H2, Text: David Rice] [Tag: CITE, Text: Indexing] [Label: Indexing Sound, Url: Base: Text: ;sd..sGhj&9870….] [Tag: XYZ, Text:CS-TR ] [Label:Abstract Available Online, Url: Base: Text: Indexing Sound….] [Label: Efficient Clustering…., Url: Base: Text:.fHjs*9))fujs…….] [Tag: XYZ, Text:CS-TR ] [Label:Temporal Constraints, Url: Base: Text: ;+-9ivm27&813nd….] [Tag: XYZ, Text:CS-TR ] [Tag: HR, Text:] [Tag: H2, Text: John Smith] …

60 Select [title: Y.Text, author: X.Text, publications: Y!!.Text, PS-Url: Y’.Url, abstract-url: Y!4.Url] as “PubsDb: insert” from X in “ Y in X! while not (Y.Tag = “HR”) where X.Tag = “H2” and Y.Tag = “CITE”

61 Figure 5.6: Instantiation of variables in Query 4 (X, y, y’, y”). Arc labels in the figure: [Label: Full Version, Url: Base: Text: #hH6YiaP….] [Tag: UL, Text: Recent…] [Tag: LI, Text: Are Magnetic…] [Tag: BR, Text: ] [Tag: H2, Text: Card Punching…] [Tag: H2, Text: Programming…] [Tag: H1, Text: Publications of Research…] [Tag: UL, Text: …] [Tag: H2, Text: Databases…] [Tag: UL, Text: Cobol in AI Sam James…] [Tag: LI, Text: Recent…] [Tag: LI, Text: Cobol in…] [Tag: LI, Text: Assembly for…] …. [Label: Abstract, Url: Base: Text: It is company…] [Tag: BR, Text: ] [Tag: BR, Text: Technical…….] [Tag: B, Text: Peter Brown…] [Tag: BR, Text: ] [Tag: XYZ, Text: Recent….] …. [Tag: CITE, Text: Recent…]

62 Query 4: csPapers select [Group: X.Text / select [Title: y”.Text, Authors: y”!!.Text, Publication: y”!3.Text / [Label: “Abstract”, Url: y’!!.Url] + [Label: “Full Version”, Url: y’!4.Url]] from y in X!’] from X in “ where X.Tag = “H2”

63 Architecture [Figure: components API, Query Engine, Wrapper Manager, Wrapper, DBMS, File System, Web 1 ... Web k; flows labeled query, URL, tree, web]

64 Parser Rules Each node corresponds either to a subdocument enclosed in an occurrence of a paired tag (for example, the root node corresponds to the subdocument enclosed between the opening and closing tags of the whole document) or to a subdocument enclosed between an occurrence of a non-paired tag and the tag that follows it. Arcs leading to nodes that correspond to the anchor (A) tag, and for which the protocol of the associated URL is http, are external. All other arcs are internal.

65 The incoming arc to a node contains the attributes of the subdocument represented by this node. Internal arcs are labeled with a record containing two fields: Tag and Text. Tag is the HTML tag corresponding to the subtree that is the destination of the arc. The value of Text depends on whether the tag is paired or non-paired. If it is paired, the value of Text is the text enclosed between the opening and closing tags, excluding markup. If it is non-paired, the value of Text is the text between the tag and the tag that comes after it in the document.

66 External arcs are labeled with a record containing four fields: Label, Url, Base and Text. Label is the label of the hyperlink, that is, the text enclosed between the opening and closing anchor tags; Url is the value of the href attribute; Base is the URL of the document being processed; and Text is the text of the referred document, excluding markup. A dummy tag is used to enclose pieces of text that are not explicitly tagged. The rules are applied recursively to the text inside occurrences of paired tags.
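A much simplified version of these parser rules can be sketched with Python's standard `html.parser` module. This is an illustration under several assumptions, not the WebOQL wrapper: the class name `HypertreeBuilder` is hypothetical, the input is assumed to be well formed with explicit closing tags, anchors are assumed to contain only text, non-paired tags get an empty Text, untagged text is not wrapped in a dummy tag, and the Base and Text fields of external arcs are omitted because filling them would require the document's own URL and a fetch of the referred document. The URL in the sample input is a placeholder.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List, Tuple

Label = Dict[str, Any]
Tree = List[Tuple[Label, "Tree"]]

NON_PAIRED = {"br", "hr", "img"}   # assumed subset of non-paired tags


class HypertreeBuilder(HTMLParser):
    """Builds a list-of-arcs tree: internal arcs carry Tag/Text, links carry Label/Url."""

    def __init__(self) -> None:
        super().__init__()
        self.root: Tree = []                                   # arcs leaving the root
        self.open: List[Tuple[str, Tuple[Label, Tree]]] = []   # stack of open paired tags

    def handle_starttag(self, tag: str, attrs: list) -> None:
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("http:"):
            label: Label = {"Label": "", "Url": href}          # external arc
        else:
            label = {"Tag": tag.upper(), "Text": ""}           # internal arc
        arc = (label, [])
        parent = self.open[-1][1][1] if self.open else self.root
        parent.append(arc)
        if tag not in NON_PAIRED:
            self.open.append((tag, arc))

    def handle_endtag(self, tag: str) -> None:
        if self.open and self.open[-1][0] == tag:
            self.open.pop()

    def handle_data(self, data: str) -> None:
        # Text belongs to every enclosing paired tag, markup excluded.
        for _tag, (label, _sub) in self.open:
            label["Label" if "Url" in label else "Text"] += data


builder = HypertreeBuilder()
builder.feed('<ul><li>First Child</li><li>Second Child</li></ul>'
             '<a href="http://example.org/">Click Here</a>')
builder.close()
tree = builder.root
```

The resulting tree has an internal UL arc with two LI arcs below it, followed by one external arc for the hyperlink, which mirrors the list-plus-hyperlink document used as an example earlier in the deck.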

67 Publications of Research Groups at Cs Dept Card Punching Recent Advances in Card Punching Peter Smith, John Brown Technical Report TR015 Abstract

68 Full version Are magnetic Media Better? Peter Smith, John Brown, Tom ACM TOCP Vol. 3, No., pp Abstract Full version Programming lang