WEBSQL -University of Toronto

5/28/2019 Copy-right@sanjay-madria

Scenarios
- Find information about PCs from IBM. Query: +IBM +"personal computer" +price. Can we restrict the search to www.ibm.com?
- Find a good music store. Should I ask Yahoo, HotBot, Lycos, or …?
- Find pages about databases within 2 links of Joe’s web page.
- Find recent web pages with the title “Bob’s Music Store”.

Problems
- Queries don’t exploit the structure of the data.
- Queries don’t exploit the link topology of the data.
- Source selection is hard: different search engines have different functionalities and idiosyncratic behaviour, and different search engines are good at different tasks.

WebSQL
- Integrate structure and topology constraints with textual retrieval
- Virtual graph model of the document network
- Need to combine navigation and querying
- A query language that uses a document’s structure and can accept constraints on link topology

WebSQL
Model the web as a relational database with two relations, Document and Anchor. The Document relation has one tuple for each document in the web, and the Anchor relation has one tuple for each anchor in each document.
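
To make the Anchor relation concrete, the Python sketch below derives one Anchor tuple (base, href, label) for each `<a href=...>` element of a single page using the standard-library `html.parser`. It is a minimal illustration, not WebSQL's implementation: `Anchor`, `AnchorExtractor` and `anchor_tuples` are hypothetical names, nested anchors are not handled, and the label is assumed to be the anchor text with whitespace collapsed.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class Anchor:
    base: str   # URL of the document that contains the link
    href: str   # link target exactly as written in the page
    label: str  # text between <a ...> and </a>


class AnchorExtractor(HTMLParser):
    """Collects one Anchor tuple per <a href=...> element (nested anchors not handled)."""

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.anchors: list[Anchor] = []
        self._href = None          # href of the <a> element currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href is not None:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())  # collapse whitespace
            self.anchors.append(Anchor(self.base, self._href, label))
            self._href = None


def anchor_tuples(base: str, html: str) -> list[Anchor]:
    """Return the Anchor tuples contributed by a single document."""
    parser = AnchorExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.anchors


page = '<p><a href="a.html">Next <b>page</b></a> <a name="top">anchor only</a></p>'
print(anchor_tuples("http://x.org/", page))
# prints [Anchor(base='http://x.org/', href='a.html', label='Next page')]
```

The `<a name="top">` element has no `href`, so it contributes no tuple.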

WebSQL
- An SQL-like query language for extracting information from the web
- Capable of systematically processing all the links in a page, all the pages that can be reached from a given URL through paths that match a pattern, or a combination of both
- Provides transparent access to index servers

Data Model
- Relational
- Each web object is a tuple in Document {url, title, text, type, length, modification info}
- Hyperlinks are tuples in Anchor {base, href, label}
- Link types:
  - interior links (#>): within the same document
  - local links (->): within the same server
  - global links (=>): across servers
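
The three link types can be approximated with Python's `urllib.parse`, as in the sketch below. It assumes that "same document" means the same URL once any `#fragment` is removed, and that "same server" means the same host and port (`netloc`); `link_type` is a hypothetical helper, not part of WebSQL.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def link_type(base: str, href: str) -> str:
    """Classify the link from document `base` to `href` as '#>', '->' or '=>'."""
    target = urljoin(base, href)            # resolve relative hrefs against the base URL
    base_doc = urldefrag(base)[0]           # URLs without their #fragment part
    target_doc = urldefrag(target)[0]
    if target_doc == base_doc:
        return "#>"                         # interior: same document
    if urlsplit(target).netloc.lower() == urlsplit(base).netloc.lower():
        return "->"                         # local: same server (host and port)
    return "=>"                             # global: different server


base = "http://www.cs.toronto.edu/db/index.html"
print(link_type(base, "#people"))               # prints #>
print(link_type(base, "../theory.html"))        # prints ->
print(link_type(base, "http://www.ibm.com/"))   # prints =>
```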

[Figure: Document relation]

[Figure: Anchor relation]

Find all pairs of URLs of documents with the same title:

    SELECT d1.url, d2.url
    FROM Document d1, Document d2
    WHERE d1.title = d2.title
      AND NOT (d1.url = d2.url)

This query cannot be evaluated, because there is no way to enumerate all the documents on the web.

Restricting both documents to those that mention a given phrase, a finite set that index servers can supply, makes the query computable:

    SELECT d1.url, d2.url
    FROM Document d1 SUCH THAT d1 MENTIONS "something interesting",
         Document d2 SUCH THAT d2 MENTIONS "something interesting"
    WHERE d1.title = d2.title
      AND NOT (d1.url = d2.url)
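
The Python sketch below illustrates why the MENTIONS version is computable: each range variable iterates over a finite list, which WebSQL would obtain from an index server. Here the index server is replaced by a hypothetical in-memory `INDEX` and a hypothetical `mentions` function, so only the join logic is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    title: str
    text: str


# Hypothetical stand-in for the collection behind an index server.
INDEX = [
    Document("http://a.example/x", "Databases", "something interesting about databases"),
    Document("http://b.example/y", "Databases", "more on something interesting"),
    Document("http://c.example/z", "Music", "something interesting about music"),
]


def mentions(phrase: str) -> list[Document]:
    """Hypothetical MENTIONS: the finite answer set an index server returns for `phrase`."""
    return [d for d in INDEX if phrase in d.text]


def same_title_pairs(phrase: str) -> list[tuple[str, str]]:
    d1_range = mentions(phrase)   # Document d1 SUCH THAT d1 MENTIONS phrase
    d2_range = mentions(phrase)   # Document d2 SUCH THAT d2 MENTIONS phrase
    return [(d1.url, d2.url)
            for d1 in d1_range for d2 in d2_range
            if d1.title == d2.title and d1.url != d2.url]


print(same_title_pairs("something interesting"))
# prints [('http://a.example/x', 'http://b.example/y'), ('http://b.example/y', 'http://a.example/x')]
```

As with the SQL, each matching pair appears in both orders, because only `d1.url = d2.url` is excluded.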

Retrieve the title and URL of all documents that are pointed to from the document whose URL is "http://www.somewhere.com" and that reside on the same server:

    SELECT d.url, d.title
    FROM Document d SUCH THAT "http://www.somewhere.com" -> d
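
A single `->` step can be sketched over a hypothetical in-memory snapshot `WEB`, in which each URL maps to its title and the hrefs it contains. "Local" is assumed to mean the same `netloc` as the base document, and only targets present in the snapshot are returned; `local_successors` is a hypothetical name.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical snapshot of part of the web: URL -> (title, hrefs found in that page)
WEB = {
    "http://www.somewhere.com/": ("Home", ["about.html", "http://www.elsewhere.org/"]),
    "http://www.somewhere.com/about.html": ("About us", []),
    "http://www.elsewhere.org/": ("Elsewhere", []),
}


def local_successors(url: str) -> list[tuple[str, str]]:
    """(url, title) of every document reachable from `url` by exactly one local link."""
    results = []
    for href in WEB[url][1]:
        target = urljoin(url, href)
        same_server = urlsplit(target).netloc == urlsplit(url).netloc
        if same_server and target in WEB:
            results.append((target, WEB[target][0]))
    return results


print(local_successors("http://www.somewhere.com/"))
# prints [('http://www.somewhere.com/about.html', 'About us')]
```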

Regular expression   Meaning
-> -> =>             Path of length three composed of two local links followed by one global link
-> | =>              Path of length one, either local or global
->*                  Local paths of any length
=> ->*               Path composed of one global link followed by any number of local links
= | #> | ->          Local paths of length zero or one
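
One way to check whether a concrete path matches such an expression is to encode each link type as one letter and reuse an ordinary regular-expression engine. The sketch below assumes the letters I, L and G for `#>`, `->` and `=>`, translates `=` to the empty path, and relies on Python's `re` binding `*` tighter than concatenation and concatenation tighter than `|`, which agrees with the readings in the table. It is an illustration, not the WebSQL evaluator; `to_python_regex` and `path_matches` are hypothetical names.

```python
import re

# Assumed one-letter encoding of link types; not part of WebSQL itself.
SYMBOL = {"#>": "I", "->": "L", "=>": "G"}


def to_python_regex(path_regex: str) -> str:
    """Translate a WebSQL path expression such as '=> ->*' into Python re syntax."""
    s = path_regex.replace(" ", "")
    out, i = [], 0
    while i < len(s):
        if s[i:i + 2] in SYMBOL:
            out.append(SYMBOL[s[i:i + 2]])
            i += 2
        elif s[i] == "=":
            out.append("(?:)")          # '=' is the empty path (length zero)
            i += 1
        elif s[i] in "|*()":
            out.append(s[i])            # operators keep their usual meaning
            i += 1
        else:
            raise ValueError(f"unexpected {s[i]!r} in {path_regex!r}")
    return "".join(out)


def path_matches(path_regex: str, links: list[str]) -> bool:
    """True if a path, given as its sequence of link types, matches path_regex."""
    encoded = "".join(SYMBOL[link] for link in links)
    return re.fullmatch("(?:" + to_python_regex(path_regex) + ")", encoded) is not None


print(to_python_regex("=> ->*"))                        # prints GL*
print(path_matches("-> -> =>", ["->", "->", "=>"]))     # prints True
print(path_matches("= | #> | ->", []))                  # prints True
```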

Search for pages related to databases in the web site of the Department of Computer Science of the University of Toronto:

    SELECT d.url
    FROM Document d SUCH THAT "http://www.cs.toronto.edu" ->* d
    WHERE d.text CONTAINS "database"
       OR d.title CONTAINS "database"
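
The `->*` range can be sketched as a breadth-first traversal that follows only links staying on the starting server and remembers visited URLs, so that cycles (such as a link back to the home page) terminate. The snapshot `WEB` and its contents are hypothetical, and CONTAINS is assumed to be a case-sensitive substring test.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical snapshot: URL -> (title, text, hrefs)
WEB = {
    "http://www.cs.toronto.edu/": ("Computer Science", "Welcome to the department", ["db/", "http://www.ibm.com/"]),
    "http://www.cs.toronto.edu/db/": ("Database Group", "We study database systems", ["../", "pubs.html"]),
    "http://www.cs.toronto.edu/db/pubs.html": ("Publications", "Papers on query languages", []),
    "http://www.ibm.com/": ("IBM", "database products", []),
}


def local_closure(start: str) -> set[str]:
    """URLs reachable from start by zero or more local links (->*)."""
    host = urlsplit(start).netloc
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for href in WEB[url][2]:
            target = urljoin(url, href)
            # follow only local links to known, not-yet-visited documents
            if urlsplit(target).netloc == host and target in WEB and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def database_pages(start: str) -> list[str]:
    """d.text CONTAINS "database" OR d.title CONTAINS "database", over start ->* d."""
    return sorted(url for url in local_closure(start)
                  if "database" in WEB[url][1] or "database" in WEB[url][0])


print(database_pages("http://www.cs.toronto.edu/"))
# prints ['http://www.cs.toronto.edu/db/']
```

The IBM page also mentions "database", but it is reached only through a global link, so `->*` excludes it.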

Find employment opportunities for software engineers:

    SELECT d1.url, d1.title, d2.url, d2.title
    FROM Document d1 SUCH THAT d1 MENTIONS "employment job opportunities",
         Document d2 SUCH THAT d1 =|->|->-> d2
    WHERE d2.text CONTAINS "software engineer"

Find the pages describing the publications of some research group:

    SELECT a1.href, d2.title
    FROM Document d1 SUCH THAT "http://www.university.edu/~group" ->* d1,
         Anchor a1 SUCH THAT base = d1,
         Document d2 SUCH THAT a1.href -> d2
    WHERE a1.label CONTAINS "papers"

Find the documents in the group’s site that link to compressed PostScript files:

    SELECT d1.url, d1.title
    FROM Document d1 SUCH THAT "http://www.university.edu/~group" ->* d1,
         Anchor a1 SUCH THAT base = d1
    WHERE filename(a1.href) CONTAINS "ps.gz"
       OR filename(a1.href) CONTAINS "ps.Z";
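
The WHERE clause can be sketched in Python with a hypothetical stand-in for `filename()` that returns the last segment of the URL path. The anchor tuples are hypothetical and are assumed to come from documents already reached by `->*`; the sketch returns only distinct base URLs, because the snapshot holds no titles.

```python
import posixpath
from urllib.parse import urlsplit


def filename(href: str) -> str:
    """Hypothetical stand-in for WebSQL's filename(): last segment of the URL path."""
    return posixpath.basename(urlsplit(href).path)


# Hypothetical Anchor tuples (base, href, label), assumed to come from documents
# already reached by "http://www.university.edu/~group" ->* d1.
ANCHORS = [
    ("http://www.university.edu/~group/pubs.html", "papers/paper1.ps.gz", "Paper 1"),
    ("http://www.university.edu/~group/pubs.html", "papers/paper2.ps.Z", "Paper 2"),
    ("http://www.university.edu/~group/", "pubs.html", "Our papers"),
]


def pages_with_postscript(anchors) -> list[str]:
    """Distinct base URLs that link to a file whose name contains ps.gz or ps.Z."""
    return sorted({base for base, href, _label in anchors
                   if "ps.gz" in filename(href) or "ps.Z" in filename(href)})


print(pages_with_postscript(ANCHORS))
# prints ['http://www.university.edu/~group/pubs.html']
```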

The labels of all hyperlinks to PostScript files:

    SELECT a.label
    FROM Anchor a SUCH THAT base = "http://www.SomeDoc.html"
    WHERE a.href CONTAINS ".ps.Z";

Documents about databases:

    SELECT d.url, d.title
    FROM Document d SUCH THAT "http://www.OtherDoc.html" ->|=> d
    WHERE d.title CONTAINS "databases";

User-defined link types
Find the documents, reachable from a starting document by following links labelled "Next", whose title contains the word "Canada":

    DEFINE LINK [next] AS label CONTAINS "Next";
    SELECT d.url
    FROM Document d SUCH THAT "http://the.starting.document" [next]* d
    WHERE d.title CONTAINS "Canada";
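
A user-defined link type behaves like a predicate on anchors, and `[next]*` like a traversal that only follows anchors satisfying it. The Python sketch below assumes hypothetical `ANCHORS` and `TITLES` tables and treats CONTAINS as a case-sensitive substring test; `is_next`, `closure` and `canada_docs` are illustrative names.

```python
from collections import deque

START = "http://the.starting.document"

# Hypothetical Anchor tuples (base, href, label) and document titles.
ANCHORS = [
    (START, START + "/2", "Next"),
    (START + "/2", START, "Previous"),
    (START + "/2", START + "/3", "Next chapter"),
    (START + "/2", START + "/index", "Index"),
]
TITLES = {
    START: "Introduction",
    START + "/2": "Travelling in Canada",
    START + "/3": "Canada facts",
    START + "/index": "Canada index",
}


def is_next(anchor: tuple[str, str, str]) -> bool:
    """DEFINE LINK [next] AS label CONTAINS "Next"."""
    return "Next" in anchor[2]


def closure(start: str, link_pred) -> set[str]:
    """Documents reachable from start by zero or more links satisfying link_pred."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        for anchor in ANCHORS:
            base, href, _label = anchor
            if base == url and link_pred(anchor) and href not in seen:
                seen.add(href)
                queue.append(href)
    return seen


def canada_docs(start: str) -> list[str]:
    return sorted(u for u in closure(start, is_next) if "Canada" in TITLES.get(u, ""))


print(canada_docs(START))
# prints ['http://the.starting.document/2', 'http://the.starting.document/3']
```

The index page has "Canada" in its title but is reached only through an "Index" link, so `[next]*` never visits it.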

Defining the Content of a Full-text Index
Restrict a search so that only links pointing to documents deeper in the hierarchy are traversed:

    DEFINE LINK [Deeper] AS server(href) = server(base) AND path(href) CONTAINS path(base);
    SELECT d.url, d.text
    FROM Document d SUCH THAT "http://the.document.to.test" [Deeper]* d;
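
The `[Deeper]` condition can be sketched in Python with hypothetical `server()` and `path()` helpers built on `urllib.parse.urlsplit`, reading CONTAINS as plain substring containment. Under that reading the test works best when the base is a directory URL; if the base path includes a file name, sibling pages do not contain it.

```python
from urllib.parse import urljoin, urlsplit


def server(url: str) -> str:
    """Hypothetical server(): host and port of the URL."""
    return urlsplit(url).netloc


def path(url: str) -> str:
    """Hypothetical path(): the path component of the URL."""
    return urlsplit(url).path


def is_deeper(base: str, href: str) -> bool:
    """[Deeper]: same server, and the target path contains the base path as a substring."""
    target = urljoin(base, href)
    return server(target) == server(base) and path(base) in path(target)


print(is_deeper("http://x.org/docs/", "guide/intro.html"))   # prints True
print(is_deeper("http://x.org/docs/", "../news.html"))       # prints False
```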

Finding Broken Links in a Page

    SELECT a.href
    FROM Anchor a SUCH THAT base = "http://the.document.to.test"
    WHERE protocol(a.href) = "http"
      AND doc(a.href) = null;
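
The sketch below is one possible reading of `doc(a.href) = null` in Python: a link is treated as broken when an HTTP HEAD request for it fails. `http_exists` and `broken_links` are hypothetical names, and the existence check is injectable so the filtering logic can be tried offline. Servers that reject HEAD requests would be reported as broken under this assumption.

```python
from http.client import HTTPException
from typing import Callable
from urllib.parse import urljoin, urlsplit
from urllib.request import Request, urlopen


def http_exists(url: str, timeout: float = 5.0) -> bool:
    """Assumed reading of doc(url) != null: an HTTP HEAD request succeeds."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout):
            return True
    except (OSError, ValueError, HTTPException):  # HTTPError/URLError are OSError subclasses
        return False


def broken_links(base: str, hrefs: list[str],
                 exists: Callable[[str], bool] = http_exists) -> list[str]:
    """Hrefs of the page at `base` that use http and do not lead to a document."""
    broken = []
    for href in hrefs:
        target = urljoin(base, href)
        if urlsplit(target).scheme == "http" and not exists(target):  # protocol(a.href) = "http"
            broken.append(href)
    return broken


# Offline usage with a stub in place of real HTTP requests:
known = {"http://the.document.to.test/ok.html"}
print(broken_links("http://the.document.to.test/",
                   ["ok.html", "gone.html", "mailto:someone@example.org"],
                   exists=lambda url: url in known))
# prints ['gone.html']
```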

Finding All the Missing Images

    SELECT d.url, a.href
    FROM Document d SUCH THAT "http://the.document.to.test" ->* d,
         Anchor a SUCH THAT base = d
    WHERE protocol(a.href) = "http"
      AND doc(a.href) = null
      AND file(a.href) CONTAINS ".gif";

Before deleting a page from a web site, you may want to know which pages refer to it, so that you avoid leaving broken links. The following query finds such pages:

    SELECT d.url
    FROM Document d SUCH THAT "http://the.starting.doc" ->* d,
         Anchor a SUCH THAT base = d
    WHERE a.href = "http://the.next.deleted.doc";
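
A Python sketch of this check, assuming `pages` already holds the documents reachable from the starting document (URL mapped to the hrefs it contains). Unlike the literal `a.href = ...` comparison, it resolves relative hrefs and ignores `#fragment`s, so `old.html#top` also counts as a reference; `referrers` is a hypothetical name.

```python
from urllib.parse import urldefrag, urljoin


def referrers(pages: dict[str, list[str]], doomed: str) -> list[str]:
    """URLs of pages with at least one link to `doomed`, ignoring #fragments."""
    return sorted(url for url, hrefs in pages.items()
                  if any(urldefrag(urljoin(url, h))[0] == doomed for h in hrefs))


pages = {
    "http://s.org/a.html": ["old.html#top"],
    "http://s.org/b.html": ["a.html"],
    "http://s.org/c.html": ["http://s.org/old.html"],
}
print(referrers(pages, "http://s.org/old.html"))
# prints ['http://s.org/a.html', 'http://s.org/c.html']
```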

Finding References from Documents in Other Servers
Suppose you have a page with some links to pages on other sites, and you want to know whether your site is referenced from those pages or from pages they reference:

    SELECT d.url
    FROM Document d SUCH THAT "http://the.starting.doc" ->* d,
         Document d1 SUCH THAT d =>|->=> d1,
         Anchor a SUCH THAT base = d1
    WHERE a.href = "your server";

Finding References to Documents in Other Servers
With a query similar to the previous one, you can find all references to documents on other servers:

    SELECT a.href
    FROM Document d SUCH THAT "http://the.starting.doc" ->* d,
         Anchor a SUCH THAT base = d
    WHERE NOT server(a.href) = server(d.url);
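
The WHERE clause can be sketched in Python over a hypothetical `pages` mapping (URL to hrefs), assumed to hold the documents reachable by `->*`. The sketch returns resolved target URLs rather than hrefs as written, and, as an added assumption, it skips non-http(s) links such as `mailto:`, which have no server to compare.

```python
from urllib.parse import urljoin, urlsplit


def external_references(pages: dict[str, list[str]]) -> list[str]:
    """Targets of links whose server differs from the server of the page containing them."""
    refs = []
    for url, hrefs in pages.items():
        for href in hrefs:
            target = urljoin(url, href)
            # Assumption: only http(s) links have a server worth comparing.
            if urlsplit(target).scheme not in ("http", "https"):
                continue
            if urlsplit(target).netloc != urlsplit(url).netloc:  # NOT server(a.href) = server(d.url)
                refs.append(target)
    return refs


pages = {"http://s.org/": ["a.html", "http://t.org/x", "//u.org/y", "mailto:me@s.org"]}
print(external_references(pages))   # prints ['http://t.org/x', 'http://u.org/y']
```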

Find all HTML documents about "hypertext":

    SELECT d.url, d.title, d.length, d.modif
    FROM Document d SUCH THAT d MENTIONS "hypertext"
    WHERE d.type = "text/html"

Find all links to applets from documents about Java:

    SELECT y.label, y.href
    FROM Document x SUCH THAT x MENTIONS "java",
         Anchor y SUCH THAT base = x
    WHERE y.label CONTAINS "applet"

The Good
- The idea of using structure in answering queries
- Topologies can be useful
- Can be used for link maintenance

The Bad
- Too complicated (especially the syntax)
- Easy to write queries that explore the entire web
- Does the end user care about topology constraints, beyond a domain constraint?
- Remote accesses cause a huge slowdown
- Could topology constraints be checked at the search engine?
- Availability

The Ugly
- How to avoid back links?
- Fuzzy queries: find me “good”, “inexpensive” Chilean restaurants that are “close by”