Vijayshankar Raman, CS294-7, Spring 1999
Querying the WWW
Alberto O. Mendelzon, George A. Mihaila, Tova Milo

Scenarios...
- Find out about PCs from IBM
  - query: +IBM +“personal computer” +price
  - can we restrict the search to ?
- Find a good music store
  - should I ask Yahoo or HotBot or Lycos or …?
- Find pages about databases within 2 links of Joe’s webpage
- Find recent web pages with the title “Bob’s Music Store”

Problems
- Queries don’t exploit the structure of the data
- Queries don’t exploit the link topology of the data
- Source selection is hard
  - different search engines have different functionalities and idiosyncratic behaviour
  - different search engines are good at different tasks

Outline
- Motivation
- WebSQL
- Nuts and Bolts
- Query Locality
- Good, Bad and Ugly

WebSQL
- Integrate structure/topology constraints with textual retrieval
- Virtual graph model of the document network
- Need to combine navigation and querying
- Query language that utilizes a document’s structure and can accept constraints on link topology

Data Model
- Relational
- Each web object is a tuple in the Document relation
  - {url, title, text, type, length, modification info}
- Hyperlinks are tuples in the Anchor relation
  - {base, href, label}
  - interior links (#>): within the same document
  - local links (->): within the same server
  - global links (=>): across servers
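The three link types can be illustrated with a small classifier over an Anchor tuple's (base, href) pair. This is a minimal sketch, not WebSQL's implementation: it assumes "same document" means the resolved URLs differ at most in their #fragment and "same server" means the same scheme and host[:port]. The function name `link_type` and the example URLs are hypothetical.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def link_type(base: str, href: str) -> str:
    """Classify an Anchor tuple's link as "interior", "local" or "global".

    Assumed rules (illustrative, not WebSQL's exact definitions):
      interior (#>): resolved target is the same document, ignoring any #fragment
      local    (->): target is another document on the same server
      global   (=>): target is on a different server
    """
    target = urljoin(base, href)        # resolve relative hrefs against the base URL
    base_doc = urldefrag(base)[0]       # drop any "#fragment" part
    target_doc = urldefrag(target)[0]
    if base_doc == target_doc:
        return "interior"
    b, t = urlsplit(base_doc), urlsplit(target_doc)
    if (b.scheme, b.netloc.lower()) == (t.scheme, t.netloc.lower()):
        return "local"
    return "global"


# Example (hypothetical URLs):
# link_type("http://www.example.edu/db/index.html", "../people.html") is "local"
```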

Examples
- SELECT x.url, x.title, y.url, y.title
  FROM Document x SUCH THAT x MENTIONS “Computer Science”,
       Document y SUCH THAT x = y
  -- docs within 2 links of a document that mentions “Computer Science”
- SELECT d.url, d.title
  FROM Document d SUCH THAT “ = d
  WHERE d.title CONTAINS “database”;
  -- docs within 2 links of the CS homepage
- MENTIONS is answered by a search engine; CONTAINS is checked locally
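To make the MENTIONS/CONTAINS split concrete, here is a toy evaluation of the two query shapes over a hypothetical in-memory web (`WEB`, `search_engine_mentions`, `contains` and `within_links` are illustrative names, not WebSQL's). It assumes "within 2 links" means paths of 0 to 2 links of any type, found by breadth-first search. In this toy both predicates are case-insensitive substring tests; the distinction the slide draws is only where they run (an external index vs. the fetched document).

```python
from collections import deque

# Hypothetical in-memory web: url -> (title, text, absolute hrefs of outgoing links)
WEB = {
    "http://cs.example.edu/": ("CS Home", "Welcome to Computer Science",
                               ["http://cs.example.edu/research"]),
    "http://cs.example.edu/research": ("Research", "Research groups",
                                       ["http://cs.example.edu/db"]),
    "http://cs.example.edu/db": ("Database Group", "Query languages for the web",
                                 ["http://cs.example.edu/db/pubs"]),
    "http://cs.example.edu/db/pubs": ("Database Publications", "Papers", []),
}


def search_engine_mentions(phrase: str) -> list:
    """Stand-in for MENTIONS, which WebSQL answers with an external search engine."""
    return [url for url, (_, text, _) in WEB.items() if phrase.lower() in text.lower()]


def contains(text: str, phrase: str) -> bool:
    """CONTAINS: checked locally on a document that has already been fetched."""
    return phrase.lower() in text.lower()


def within_links(start: str, max_links: int):
    """Yield each URL reachable from start by a path of 0..max_links links (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        url, dist = frontier.popleft()
        yield url
        if dist == max_links or url not in WEB:
            continue
        for href in WEB[url][2]:
            if href not in seen:
                seen.add(href)
                frontier.append((href, dist + 1))


# Shape of query 1: x MENTIONS "Computer Science", y within 2 links of x.
q1 = [(x, y) for x in search_engine_mentions("Computer Science")
      for y in within_links(x, 2)]
# Shape of query 2: d within 2 links of a start page, d.title CONTAINS "database".
q2 = [(d, WEB[d][0]) for d in within_links("http://cs.example.edu/", 2)
      if d in WEB and contains(WEB[d][0], "database")]
```

Here `q2` finds only the "Database Group" page; "Database Publications" is three links from the start page and is never visited.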

More examples (from Toronto)
- Job Opportunities for Software Engineers:
  SELECT e.url
  FROM Document d SUCH THAT d MENTIONS "Career Opportunities",
       Document e SUCH THAT d =|-> e
  WHERE e.text CONTAINS "Software Engineer";
- This query is useful, but...
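A sketch of the job query's path condition, assuming `=` denotes the empty path and `->` a single local link, so `d =|-> e` binds e either to d itself or to any page one local link away. The pages and helper names below are hypothetical, and MENTIONS is simulated by scanning the toy pages rather than calling a real search engine.

```python
from urllib.parse import urlsplit

# Hypothetical pages: url -> (text, absolute hrefs of outgoing links)
PAGES = {
    "http://jobs.example.com/careers": (
        "Career Opportunities at Example Corp",
        ["http://jobs.example.com/careers/swe", "http://other.example.org/swe"],
    ),
    "http://jobs.example.com/careers/swe": ("Opening: Software Engineer", []),
    "http://other.example.org/swe": ("Software Engineer wanted", []),
}


def is_local(base: str, href: str) -> bool:
    """A link is local if both URLs are on the same server (host[:port])."""
    return urlsplit(base).netloc == urlsplit(href).netloc


def empty_or_one_local_link(d: str):
    """Targets of the path expression `=|->`: d itself, then d's local-link targets."""
    yield d
    for href in PAGES.get(d, ("", []))[1]:
        if is_local(d, href):
            yield href


def job_query() -> list:
    results = []
    # d MENTIONS "Career Opportunities" (a search engine in WebSQL; a scan here)
    for d in [u for u, (text, _) in PAGES.items() if "Career Opportunities" in text]:
        for e in empty_or_one_local_link(d):
            # e.text CONTAINS "Software Engineer", checked locally on the fetched page
            if e in PAGES and "Software Engineer" in PAGES[e][0]:
                results.append(e)
    return results
```

In this toy web, `job_query()` returns only the posting on jobs.example.com; the one on other.example.org sits behind a global link, which `->` does not follow.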

Nuts and bolts
- SELECT Fields(x1, x2, …, xn)
  FROM Obj x1 SUCH THAT A1,
       Obj x2 SUCH THAT A2,
       …
  WHERE Condition(x1, x2, …, xn)
- Nested-loops join algorithm:
  for all x1 such that A1 is true
    for all x2 such that A2 is true
      …
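The nested-loops strategy can be sketched generically: each SUCH THAT clause becomes a function that enumerates candidate values for its variable given the variables bound by the outer loops, and the WHERE condition is tested in the innermost loop. This is an illustrative evaluator under that assumption (`nested_loops` and the `(name, generator)` encoding are hypothetical), not the WebSQL engine.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Bindings = Dict[str, Any]
Domain = Tuple[str, Callable[[Bindings], Iterable[Any]]]


def nested_loops(domains: List[Domain],
                 where: Callable[[Bindings], bool],
                 fields: Callable[[Bindings], Tuple]) -> List[Tuple]:
    """Evaluate SELECT fields FROM ... SUCH THAT A1, A2, ... WHERE where.

    domains[i] is (variable name, function returning the values of x_i that
    satisfy A_i, given the variables already bound by the outer loops).
    """
    results: List[Tuple] = []

    def loop(i: int, env: Bindings) -> None:
        if i == len(domains):          # all variables bound: test WHERE, then project
            if where(env):
                results.append(fields(env))
            return
        name, candidates = domains[i]
        for value in candidates(env):  # "for all x_i such that A_i is true"
            loop(i + 1, {**env, name: value})

    loop(0, {})
    return results


# Toy usage: x ranges over two start pages, y over the links out of x.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": []}
rows = nested_loops(
    [("x", lambda env: ["a", "b"]),
     ("y", lambda env: LINKS[env["x"]])],
    where=lambda env: env["y"] != "b",
    fields=lambda env: (env["x"], env["y"]),
)
```

Passing the current bindings into each generator is what lets an inner clause such as "y is linked from x" depend on an outer one.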

- Each atomic condition A1 … Am has one of two forms:
  - Path(from_node, path_expression, to_node), e.g. x5 =|(->*) x7: enumerate links to check these
  - NodePredicate(node), e.g. CONTAINS “Bob’s Coffee Place” (x5): query a “customizable set of known” search engines
- What queries are computable?
  - those that don’t have to explore the entire web
  - “safe” queries: every variable must be either directly solvable in some atomic condition, or directly derivable from another variable in some atomic condition
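The "safe" rule amounts to a fixpoint computation over bound variables. The encoding below is hypothetical: `("mentions", v)` stands for a node predicate that a search engine can answer, and `("path", src, dst)` for a path condition whose `src` is either a variable or a constant URL. The sketch assumes links are only followed forward, so a variable that appears only as the source of a path cannot be derived from its target.

```python
from typing import Iterable, Set, Tuple


def is_safe(variables: Set[str], conditions: Iterable[Tuple]) -> bool:
    """Check the 'safe query' rule by computing which variables can be bound.

    Hypothetical encoding of atomic conditions:
      ("mentions", v)     -- v is directly solvable via a search engine
      ("path", src, dst)  -- dst is reached from src by following links;
                             src is a variable or a constant URL
    """
    conditions = list(conditions)
    # Directly solvable: search-engine predicates, and paths starting at a constant.
    bound = {c[1] for c in conditions if c[0] == "mentions"}
    bound |= {c[2] for c in conditions if c[0] == "path" and c[1] not in variables}
    changed = True
    while changed:                     # derive further variables until nothing changes
        changed = False
        for c in conditions:
            if c[0] == "path" and c[1] in bound and c[2] not in bound:
                bound.add(c[2])        # follow links forward from an already-bound src
                changed = True
    return set(variables) <= bound
```

For example, `[("mentions", "y"), ("path", "x", "y")]` is rejected: y is solvable, but binding x would mean following links backwards, which in general requires exploring the whole web.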

Query Locality
- Distinguish between access to local and remote documents
- Model the communication cost of a query based on the
  - “expected” number of results from search engines
  - “expected” size of documents
  - “expected” number of exterior, interior, remote links per document
  - “expected” cost of network access
- Can identify potentially expensive components of a query and warn the user
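A minimal sketch of such a cost model, assuming a linear model: each query component costs its expected number of fetched documents times the expected document size times a per-kilobyte access cost that differs for local and remote documents. The `Component` type, all parameter values and the warning threshold are hypothetical placeholders for the "expected" quantities above, not numbers from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    name: str
    expected_docs: float  # expected number of documents this step must fetch
    remote: bool          # True if those documents live on other servers


def estimate(components: List[Component], doc_kb: float,
             cost_per_kb: Dict[str, float],
             warn_above: float) -> Tuple[float, List[str]]:
    """Return the total expected cost and the names of components above warn_above."""
    total = 0.0
    expensive = []
    for c in components:
        rate = cost_per_kb["remote" if c.remote else "local"]
        cost = c.expected_docs * doc_kb * rate  # linear in expected bytes transferred
        total += cost
        if cost > warn_above:
            expensive.append(c.name)            # candidate for a warning to the user
    return total, expensive


# Hypothetical expectations: 20 search-engine hits, 5 global links per hit.
hits, global_links_per_doc = 20, 5
plan = [
    Component("scan pages on the local server", expected_docs=200, remote=False),
    Component("follow global links from search hits",
              expected_docs=hits * global_links_per_doc, remote=True),
]
total, warnings = estimate(plan, doc_kb=10,
                           cost_per_kb={"local": 0.001, "remote": 0.05},
                           warn_above=10)
```

With these placeholder numbers the remote step dominates (about 50 of the roughly 52 cost units) and is the only component flagged.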

The Good
- Idea of using structure in answering queries
- Topologies can be useful, with a better interface...
- Can be used for link maintenance

The Bad
- Too complicated (especially the syntax)
  - easy to write queries that explore the entire web
- Does the end user care about topology constraints, beyond domain constraints?
- Remote accesses cause a huge slowdown
  - check topology constraints at the search engine?
- Availability

The Ugly
- How to avoid back links?
- Fuzzy queries
  - find me “good”, “inexpensive” Chilean restaurants that are “close by”

Issues
- What kinds of path-based queries are useful and intuitive?
- How can path constraints be checked at the search engine?
- Can hypertext links be viewed as yet another kind of link in a semi-structured model?

Other Work
- Other, generic intra-document structure can be useful
- Topology and structure can be used by the system (instead of by the end user)
  - use links to determine the quality of site content
  - authority sites: find for query on harvard
  - classification: Cha-Cha
- Store links at the search engine for proximity searches
  - can generalize to arbitrary links in a directed graph model (Goldman et al. ’98)
  - get “see also” info