Databases and Information Retrieval: Rethinking the Great Divide SIGMOD Panel 14 Jun 2005 Jayavel Shanmugasundaram Cornell University.



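10,000-Foot View of Data Management
[Diagram: two axes, Data (Structured vs. Unstructured) and Queries (Complex and Structured vs. Ranked Keyword Search). Database Systems cover structured data with complex, structured queries; Information Retrieval Systems cover unstructured data with ranked keyword search. The gaps between them are labeled "The Great Data Divide" and "The Great Query Divide".]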

Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
  – Example: Approaches based on SQL/MM
Option 2: Extend existing DB systems with IR functionality, or vice versa
  – Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
  – Example: Quark data management system

Why Option 1 Won't Work
[Diagram: Data (Structured vs. Unstructured) against Queries (Complex and Structured vs. Ranked Keyword Search), with Database Systems and Information Retrieval Systems on opposite sides of both divides.]

Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
  – Example: Approaches based on SQL/MM
  – Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
  – Example: Add searching and ranking to RDBMSs
Option 3: Design a new data management system from the ground up
  – Example: Quark data management system

XML and Information Retrieval: A SIGIR 2000 Workshop
David Carmel, Yoelle Maarek, Aya Soffer
XQL and Proximal Nodes
Ricardo Baeza-Yates, Gonzalo Navarro
We consider the recently proposed language …
Searching on structured text is becoming more important with XML …
… …

Find relevant elements in important workshops between the years 1999 and 2001 that are about ‘Ricardo’ and ‘XML’
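A minimal sketch of how the query above might be evaluated if the example were encoded as XML. The element names (`workshop`, `paper`, `title`, `author`, `year`, `editors`, `body`) and the explicit `<year>` value are assumptions made for illustration, since the slide's original markup is not available. The sketch combines a structured predicate (the year range) with keyword containment and reports every element whose text contains both keywords. It does not attempt to model "important" or "relevant", which would need a ranking function.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag names and the <year> element are assumptions.
DOC = """
<workshop>
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <year>2000</year>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <author>Ricardo Baeza-Yates</author>
    <author>Gonzalo Navarro</author>
    <body>Searching on structured text is becoming more important with XML</body>
  </paper>
</workshop>
"""

def contains_all(elem, keywords):
    """True if the element's full text (including descendants) has every keyword."""
    text = " ".join(elem.itertext()).lower()
    return all(k.lower() in text for k in keywords)

def search(root, keywords, year_from, year_to):
    """Return tags of elements in qualifying workshops that contain all keywords."""
    hits = []
    for ws in root.iter("workshop"):
        year = int(ws.findtext("year"))
        if not (year_from <= year <= year_to):  # structured predicate
            continue
        for elem in ws.iter():                  # keyword predicate, any granularity
            if contains_all(elem, keywords):
                hits.append(elem.tag)
    return hits

root = ET.fromstring(DOC)
hits = search(root, ["Ricardo", "XML"], 1999, 2001)
print(hits)  # ['workshop', 'paper']
```

Both the paper and its enclosing workshop satisfy the query, so the sketch alone cannot say which one should be returned.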

Why Extending (R)DBMSs Won’t Work
Violates many assumptions “hardwired” into current database systems
Structured queries over structured fields, keyword search queries over text fields
  – Is author name a structured or text field?
Operators have precise, well-defined semantics
  – Even the query result is not well-defined: do we return a paper or a workshop?
Scoring is tacked on as a relational attribute
  – How can this scoring generalize IR scoring?
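To make the "score as a relational attribute" point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `papers` table, its toy rows, and the `keyword_score` function are all made up for illustration; the function is a naive term count, not a real IR scoring model. Structured predicates apply to `author` and `year`, while keyword matching and scoring apply only to the `body` column.

```python
import sqlite3

def keyword_score(text, keywords):
    """Naive score: total occurrences of the query keywords in the text."""
    words = text.lower().split()
    return sum(words.count(k) for k in keywords.lower().split())

conn = sqlite3.connect(":memory:")
# Register the scoring function so SQL can call it like any scalar function.
conn.create_function("keyword_score", 2, keyword_score)
conn.execute(
    "CREATE TABLE papers (title TEXT, author TEXT, year INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?)",
    [  # toy rows, invented for this sketch
        ("Paper A", "Ricardo", 2000, "xml search over xml text"),
        ("Paper B", "Ricardo", 2000, "structured text and xml"),
        ("Paper C", "Ricardo", 1995, "xml xml xml"),
    ],
)
rows = conn.execute(
    """
    SELECT title, keyword_score(body, 'xml') AS score
    FROM papers
    WHERE author = 'Ricardo'             -- structured field, exact match only
      AND year BETWEEN 1999 AND 2001     -- structured field
      AND keyword_score(body, 'xml') > 0 -- text field, keyword match
    ORDER BY score DESC
    """
).fetchall()
print(rows)  # [('Paper A', 2), ('Paper B', 1)]
```

The schema has to decide up front that `author` is a structured field, so a keyword such as 'Ricardo' cannot match it unless the query is rewritten, and the score is just one more column attached to a fixed row type.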

Why Extending IR Systems Won’t Work
IR systems provide little support for structured data
No support for complex operators
  – How can complex queries be evaluated?
Scoring does not take structure into account
  – How can scoring capture both structured and unstructured data?

Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
  – Example: Approaches based on SQL/MM
  – Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
  – Example: Add searching and ranking to RDBMSs
  – Drawback: Shoehorns alien functionality into already complex systems
Option 3: Design a new data management system from the ground up
  – Example: Quark data management system

Why Option 3 Will Work
Designed from the ground up with three principles
Structural data independence
  – Users can issue any query (complex and keyword) over any data (structured and unstructured)
Generalized scoring
  – Scoring works over any mix of structured and unstructured data (e.g., XRank over HTML and XML)
Flexible query language
  – Allows for arbitrary return results and scores (e.g., TeXQuery, the precursor to XQuery Full-Text; NEXI)
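One way to picture generalized scoring is to give every element, at any depth, a score from the keyword occurrences beneath it, discounted by how far below they sit. The sketch below is only an illustration of that idea and is not the XRank algorithm; the decay factor, the tag names, and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

DECAY = 0.5  # assumed discount applied per level of nesting

# Hypothetical markup, invented for this sketch.
DOC = """
<workshop>
  <title>XML and Information Retrieval</title>
  <paper>
    <title>XQL and Proximal Nodes</title>
    <body>structured text and XML</body>
  </paper>
</workshop>
"""

def score_elements(root, keyword):
    """Map each element path to a score: direct hits plus decayed child scores."""
    scores = {}
    kw = keyword.lower()

    def visit(elem, path):
        # Text directly owned by this element: its own text and its children's tails.
        direct = [elem.text or ""] + [child.tail or "" for child in elem]
        total = float(" ".join(direct).lower().split().count(kw))
        for i, child in enumerate(elem):
            total += DECAY * visit(child, f"{path}/{child.tag}[{i}]")
        scores[path] = total
        return total

    visit(root, root.tag)
    return scores

scores = score_elements(ET.fromstring(DOC), "XML")
for path, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.2f}  {path}")
```

An unstructured document is just the degenerate case of a single element holding all the text, so the same function scores it like a flat document, while nested elements are ranked at whatever granularity matches best.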

Bridging the Great Divide
Option 1: Tie together existing DB and IR systems
  – Example: Approaches based on SQL/MM
  – Drawback: Not powerful enough
Option 2: Extend existing DB systems with IR functionality, or vice versa
  – Example: Add searching and ranking to RDBMSs
  – Drawback: Shoehorns alien functionality into already complex systems
Option 3: Design a new data management system from the ground up
  – Example: Quark data management system
  – Most promising alternative!