Databases and Information Retrieval: Rethinking the Great Divide SIGMOD Panel 14 Jun 2005 Jayavel Shanmugasundaram Cornell University.

1 Databases and Information Retrieval: Rethinking the Great Divide SIGMOD Panel 14 Jun 2005 Jayavel Shanmugasundaram Cornell University

2 10,000-Foot View of Data Management
[Diagram: Data (Structured vs. Unstructured) and Queries (Complex and Structured vs. Ranked Keyword Search). Database Systems and Information Retrieval Systems are separated by the Great Data Divide and the Great Query Divide.]

3 Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
–Example: Approaches based on SQL/MM
Option 2: Extend existing DB systems with IR functionality, or vice versa
–Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
–Example: Quark data management system

4 Why Option 1 Won’t Work
[Diagram: Data (Structured vs. Unstructured) and Queries (Complex and Structured vs. Ranked Keyword Search), with Database Systems and Information Retrieval Systems.]

5 Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
–Example: Approaches based on SQL/MM
–Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
–Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
–Example: Quark data management system

6 [Example document]
XML and Information Retrieval: A SIGIR 2000 Workshop
David Carmel, Yoelle Maarek, Aya Soffer
XQL and Proximal Nodes
Ricardo Baeza-Yates, Gonzalo Navarro
We consider the recently proposed language …
Searching on structured text is becoming more important with XML …
… …
Find relevant elements in important workshops between the years 1999 and 2001 that are about ‘Ricardo’ and ‘XML’
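The query on slide 6 mixes a structured condition (the year range) with keyword conditions, and it leaves open which element should be returned. A minimal Python sketch of that ambiguity follows. The tag names, the `year` attribute, and the helper `matching_elements` are hypothetical, since the slide shows only the document text and not its markup; the "relevant" and "important" parts of the query are not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the slide shows the text, not the tags or attributes.
DOC = """
<workshop year="2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <author>Ricardo Baeza-Yates</author>
    <author>Gonzalo Navarro</author>
    <abstract>We consider the recently proposed language</abstract>
    <body>Searching on structured text is becoming more important with XML</body>
  </paper>
</workshop>
"""

def subtree_text(elem):
    """All text in the subtree rooted at elem, lowercased."""
    return " ".join(elem.itertext()).lower()

def matching_elements(root, keywords, year_range):
    """Tags of elements, inside workshops in the year range,
    whose subtree contains every keyword (substring match)."""
    lo, hi = year_range
    hits = []
    for workshop in root.iter("workshop"):
        # Structured condition: the year must fall in the range.
        if not lo <= int(workshop.get("year")) <= hi:
            continue
        # Keyword condition: checked at every level of granularity.
        for elem in workshop.iter():
            text = subtree_text(elem)
            if all(k.lower() in text for k in keywords):
                hits.append(elem.tag)
    return hits

root = ET.fromstring(DOC)
hits = matching_elements(root, ["Ricardo", "XML"], (1999, 2001))
print(hits)
```

Both the whole workshop and the single paper satisfy the query in this sketch, which is the "do we return a paper or a workshop?" question raised on the next slide.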

7 Why Extending (R)DBMSs Won’t Work
Violates many assumptions “hardwired” into current database systems
Structured queries over structured fields, keyword search queries over text fields
–Is author name a structured or text field?
Operators have precise, well-defined semantics
–Even the query result is not well-defined – do we return a paper or a workshop?
Scoring is an attribute tacked on as a relational attribute
–How can this scoring generalize IR scoring?

8 Why Extending IR Systems Won’t Work
IR systems provide little support for structured data
No support for complex operators
–How can complex queries be evaluated?
Scoring does not take structure into account
–How can scoring capture both structured and unstructured data?

9 Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
–Example: Approaches based on SQL/MM
–Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
–Example: Add searching and ranking to RDBMSs
–Drawback: Shoehorns alien functionality into already complex systems
Option 3: Design a new data management system from the ground up
–Example: Quark data management system

10 Why Option 3 Will Work
Designed from the ground up around three principles
Structural data independence
–Users can issue any query (complex and keyword) over any data (structured and unstructured)
Generalized scoring
–Scoring works over any mix of structured and unstructured data (e.g., XRank over HTML and XML)
Flexible query language
–Allows for arbitrary return results and scores (e.g., TeXQuery, the precursor to XQuery Full-Text; NEXI)
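One way to picture scoring that takes structure into account is to discount a keyword match by how deep it sits below the candidate element. The sketch below is a toy illustration under that assumption, not the XRank formula or anything taken from the slides; the markup, the `decay` parameter, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the slide 6 example document.
DOC = """
<workshop>
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <author>Ricardo Baeza-Yates</author>
    <body>Searching on structured text is becoming more important with XML</body>
  </paper>
</workshop>
"""

def keyword_score(elem, keyword, decay=0.5):
    """Best score for one keyword in the subtree rooted at elem.
    A match in elem's own text scores 1.0; each level of nesting
    below elem multiplies the score by decay."""
    best = 1.0 if keyword.lower() in (elem.text or "").lower() else 0.0
    for child in elem:
        best = max(best, decay * keyword_score(child, keyword, decay))
    return best

def element_score(elem, keywords, decay=0.5):
    """Conjunctive score: product of the per-keyword scores,
    so a missing keyword gives 0.0."""
    score = 1.0
    for keyword in keywords:
        score *= keyword_score(elem, keyword, decay)
    return score

workshop = ET.fromstring(DOC)
paper = workshop.find("paper")
workshop_score = element_score(workshop, ["Ricardo", "XML"])
paper_score = element_score(paper, ["Ricardo", "XML"])
print(workshop_score, paper_score)
```

In this toy, the paper outscores the workshop because both keywords occur closer to it, so the more specific element ranks first while the broader one is still a valid result.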

11 Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
–Example: Approaches based on SQL/MM
–Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
–Example: Add searching and ranking to RDBMSs
–Drawback: Shoehorns alien functionality into already complex systems
Option 3: Design a new data management system from the ground up
–Example: Quark data management system
–Most promising alternative!

