Vijayshankar Raman, CS294-7, Spring 1999: Querying the WWW (Alberto O. Mendelzon, George A. Mihaila, Tova Milo)


1 Querying the WWW
Alberto O. Mendelzon, George A. Mihaila, Tova Milo

2 Scenarios...
- Find about PCs from IBM
  query: +IBM +“personal computer” +price
  - can we restrict search to www.ibm.com?
- Find a good music store
  - should I ask yahoo or hotbot or lycos or …?
- Find pages about databases within 2 links from Joe’s webpage
- Find recent web pages with title “Bob’s Music Store”

3 Problems
- Queries don’t exploit structure of data
- Queries don’t exploit link topology of data
- Source selection hard
  - different search engines have different functionalities, idiosyncratic behaviour
  - different search engines good at different tasks

4 Outline
- Motivation
- WebSQL
- Nuts and Bolts
- Query Locality
- Good, Bad and Ugly

5 WebSQL
Integrate structure/topology constraints with textual retrieval
- Virtual graph model of document network
- Need to combine navigation and querying
- Query language that utilizes document’s structure and can accept constraints on link topology

6 Data Model: Relational
- Each web object is a tuple in Document
  - {url, title, text, type, length, modification info}
- Hyperlinks are tuples in Anchor
  - {base, href, label}
  - interior links ( ): within same document
  - local links ( ): within same server
  - global links ( ): across servers
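The two relations and the three link kinds above could be sketched in Python roughly as follows. This is only an illustration, not the WebSQL implementation: the field name `modif`, the helper `link_kind`, and the rule "same server means same host part of the URL" are assumptions made for the sketch, and `href` is assumed to be an absolute URL.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag, urlsplit


@dataclass(frozen=True)
class Document:
    """One tuple of the Document relation."""
    url: str
    title: str
    text: str
    type: str
    length: int
    modif: str  # modification info (field name is an assumption)


@dataclass(frozen=True)
class Anchor:
    """One tuple of the Anchor relation: a hyperlink from base to href."""
    base: str
    href: str
    label: str


def link_kind(anchor: Anchor) -> str:
    """Classify a link as interior, local, or global (hypothetical helper)."""
    # Drop "#fragment" parts so links inside one page compare equal.
    base_doc, _ = urldefrag(anchor.base)
    href_doc, _ = urldefrag(anchor.href)
    if base_doc == href_doc:
        return "interior"  # within same document
    if urlsplit(base_doc).netloc == urlsplit(href_doc).netloc:
        return "local"     # within same server (same host, by assumption)
    return "global"        # across servers
```

For example, `link_kind(Anchor("http://a.example/x.html", "http://a.example/y.html", "next"))` classifies the link as local, because both URLs share the host `a.example`.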

7 Examples
- SELECT x.url, x.title, y.url, y.title
  FROM Document x SUCH THAT x MENTIONS "Computer Science",
       Document y SUCH THAT x = y
  -- docs within 2 links from something on CS.
- SELECT d.url, d.title
  FROM Document d SUCH THAT "http://www.cs.toronto.edu" = d
  WHERE d.title CONTAINS "database";
  -- docs within 2 links of CS homepage.
MENTIONS: search engine; CONTAINS: checked locally

8 More examples (from Toronto)
Job Opportunities for Software Engineers
  SELECT e.url
  FROM Document d SUCH THAT d MENTIONS "Career Opportunities",
       Document e SUCH THAT d = | -> e
  WHERE e.text CONTAINS "Software Engineer";
this query is useful, but...

9 Outline
- Motivation
- WebSQL
- Nuts and Bolts
- Query Locality
- Good, Bad and Ugly

10 Nuts and bolts
- SELECT Fields(x1, x2, …, xn)
  FROM Obj x1 SUCH THAT A1
       Obj x2 SUCH THAT A2
       …
  WHERE Condition(x1, x2, … xn)
- nested loops join algorithm:
    for all x1 such that A1 is true
      for all x2 such that A2 is true
        …
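The nested loops evaluation above can be sketched in Python as a recursive generator. This is a minimal illustration rather than the actual WebSQL engine: the function name `nested_loops` and the representation of each `SUCH THAT` clause as a function from the bindings so far to candidate values are assumptions of the sketch.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Binding = Dict[str, object]
# A domain function plays the role of "Obj xi SUCH THAT Ai": given the
# variables bound so far, it yields every value of xi that makes Ai true.
Domain = Callable[[Binding], Iterable[object]]


def nested_loops(
    from_clauses: List[Tuple[str, Domain]],
    where: Callable[[Binding], bool],
    select: Callable[[Binding], object],
) -> Iterator[object]:
    """Evaluate SELECT ... FROM ... WHERE with one loop per FROM clause."""

    def extend(i: int, binding: Binding) -> Iterator[object]:
        if i == len(from_clauses):
            # Innermost level: all variables are bound, test WHERE.
            if where(binding):
                yield select(binding)
            return
        var, domain = from_clauses[i]
        # "for all xi such that Ai is true", given the outer bindings.
        for value in domain(binding):
            yield from extend(i + 1, {**binding, var: value})

    yield from extend(0, {})
```

A domain for a later variable may depend on earlier ones, for instance a function that returns the pages linked from the current value of `x1`. Because the result is a generator, answers are produced as the loops run instead of after the whole search finishes.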

11
- each atomic condition A1 … Am is of the form
  - Path(from_node, path_expression, to_node)
      x5 = | (->*) x7
      enumerate links to check these
  - NodePredicate(node)
      CONTAINS “Bob’s Coffee Place” (x5)
      query a “customizable set of known” search engines
- what queries are computable?
  - those that don’t have to explore the entire web
  - “safe” queries: every variable must be either directly solvable in some atomic condition, OR directly derivable from another in some atomic condition
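One way to read the "safe" query rule is as a fixpoint check over the atomic conditions, sketched below in Python. This is an interpretation of the slide, not the definition from the paper: the function `is_safe` and the encoding of each condition as a pair (variable it can bind, variables it needs first) are assumptions. A node predicate answered by a search engine, or a path starting from a constant URL, needs nothing; a path starting from another variable needs that variable.

```python
from typing import Iterable, Set, Tuple

# (variable the condition can bind, variables that must already be bound)
Condition = Tuple[str, Set[str]]


def is_safe(variables: Iterable[str], conditions: Iterable[Condition]) -> bool:
    """Return True if every variable can be bound without scanning the web."""
    conditions = list(conditions)
    bound: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for binds, needs in conditions:
            # A condition becomes usable once all the variables it needs
            # have been bound by earlier conditions.
            if binds not in bound and needs <= bound:
                bound.add(binds)
                changed = True
    return set(variables) <= bound
```

Under this reading, a query with `d MENTIONS "Career Opportunities"` and `d = | -> e` is safe: `d` is solved directly by the search engine and `e` is derived from `d`. A query containing only the path condition, with nothing to bind `d`, is not.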

12 Query Locality
- distinguish between access to local and remote documents
- model communication cost of a query based on
  - “expected” number of results from search engines
  - “expected” size of documents
  - “expected” number of exterior, interior, remote links per document
  - “expected” cost of network access
- can identify potentially expensive components of a query and warn user
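To make the idea of an expected-cost estimate concrete, here is a rough Python sketch. The formula is an assumption chosen for illustration, not the cost model from the paper: it supposes a search engine query returns a set of documents, each link-following step multiplies the number of documents fetched by the expected links per document, and every fetched document costs its expected size times a per-byte network cost. All names (`Expectations`, `expected_cost`, `depth`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Expectations:
    """The "expected" quantities listed above (all values are estimates)."""
    engine_results: float   # expected number of results from search engines
    doc_size: float         # expected size of a document, in bytes
    links_per_doc: float    # expected number of followed links per document
    cost_per_byte: float    # expected cost of network access, per byte


def expected_cost(e: Expectations, depth: int) -> float:
    """Estimate the network cost of following links `depth` steps deep."""
    docs_at_level = e.engine_results
    total_docs = docs_at_level
    for _ in range(depth):
        # Each step fans out by the expected number of links per document.
        docs_at_level *= e.links_per_doc
        total_docs += docs_at_level
    return total_docs * e.doc_size * e.cost_per_byte
```

With 10 expected results, 1000-byte documents, 5 links per document and unit cost per byte, one level of link following gives (10 + 50) × 1000 = 60000. The geometric growth in `depth` is what would let a system flag a potentially expensive query component before running it.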

13 The Good
- Idea of using structure in answering queries
- topologies can be useful, with a better interface...
- can be used for link maintenance

14 The Bad
- Too complicated (especially syntax)
  - easy to write queries that explore the entire web
- does end user care for topology constraint, besides domain constraint?
- Remote accesses cause huge slowdown
  - check topology constraints at search engine?
- availability

15 The Ugly
- How to avoid back links?
- Fuzzy queries
  - find me “good”, “inexpensive” Chilean restaurants that are “close by”

16 Issues
- What kinds of path-based queries are useful, intuitive?
- How to check the path constraints at the search engine?
- Can hypertext links be viewed as yet another kind of link in a semi-structured model?

17 Other Work
- Other, generic intra-document structure can be useful
- Topology, structure can be used by system (instead of by end user)
  - use links to determine quality of site content
  - authority sites: find www.harvard.edu for query on harvard
  - classification: Cha-Cha
- Store links at search engine for proximity searches
  - can generalize to arbitrary links in a directed graph model (Goldman et al. ’98)
  - get “see also” info

