
Presentation transcript:

Flexible and Efficient XML Search with Complex Full-Text Predicates Sihem Amer-Yahia - AT&T Labs Research → Yahoo! Research Emiran Curtmola - University of California San Diego Alin Deutsch - University of California San Diego

SIGMOD, June
Introduction
Need for complex full-text predicates beyond simple keyword search
- Library of Congress (LoC)
- Biomedical data
- ACM, IEEE publications
- INEX data collection
- Wikipedia XML data set

XML real fragment from LoC
[Figure: XML tree of a Library of Congress document. Labels in the figure: HR2739, committee-name, action-desc, bill, congress-info, nbrsponsors, action, legis-session, legis, legis-body, legis-desc.]
Text content of the fragment: Congress on education and workforce, comments to appropriate services. 109th Mr Column and co-sponsors Mrs Miller and Mrs Jones. Others include Jefferson on May 2, 2004 Joe Jefferson introduced the following bill. The bill was reintroduced later and was referred to the committee on education and workforce sponsored by Joe Jefferson House of Representatives Current chamber on workforce and services. Committees on education are headed by Jefferson Jefferson and services …

Query with complex FT predicates
Document fragments (nodes) that contain the keywords “Jefferson” and “education” and satisfy the predicates
- within a window of 10 words,
- with “Jefferson” ordered before “education”

Example: LoC document
[Figure: the LoC XML fragment from the preceding slides.]

Example: LoC document
[Figure: the LoC XML fragment from the preceding slides.]
Return document fragments
Naive solution: test the query at each node → redundant
Need for efficient evaluation of full-text predicates
- use structural relationship between nodes
- avoid redundant computation

Existing languages
Many XML full-text search languages
- expressive power, semantics, scores [BAS-06]
- XQFT-class: W3C’s XQuery Full-Text (XQFT), NEXI, XIRQL, JuruXML, XSearch, XRank, XKSearch, Schema Free XQuery
Efficient query evaluation limited to
- Conjunctive keyword search (no predicates)
- Full-text predicates in isolation
Need for a universal optimization framework
- Guarantee the universality of the solution

Contributions
Formal semantics for XQFT-class
- Unified framework
- Capture family of tf*idf scoring methods
Structure-aware algorithms to efficiently evaluate XQFT-class languages
- XFT full-text algebra
- Enable new optimizations inspired by relational rewritings

Talk Outline
- Motivation & Contributions
- Formalization of XML full-text search
- Efficient evaluation
- Experiments
- Conclusion

Formalization: design goals
Capture existing full-text languages
Language semantics in terms of
- keyword patterns
- pattern matches
- predicates evaluated through matches
Manipulate tuples
- enable relational query evaluation and rewritings

Formalization: patterns
Pattern = tuple of simultaneously matching keywords
Query expression: “Jefferson” and “education”
- within a window of 10 words,
- with “Jefferson” ordered before “education”
Pattern (“Jefferson”, “education”)

Formalization: patterns
Formalization specifies
- patterns ← conjunction of keywords
- set of patterns ← disjunction of keywords
- exclusion patterns ← negation of keywords
No matches in the document

Formalization: matches
[Figure: the LoC XML fragment; successive slides add one match at a time.]
Matches of the pattern (“Jefferson”, “education”): (22, 3), (22, 45), (22, 67), (51, 3), …

Formalization: matching tables
Matching table represents
- Nested relation
- Each node in the document
- Each pattern in the query
- Set of matches

Formalization: matching tables
[Figure: the LoC XML fragment from the preceding slides.]
Node | Pattern | Matches
action | “Jefferson”, “education” | (28, 45) (51, 45)
… | … | …
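A matching table can be sketched in a few lines of Python. The keyword positions below are the ones quoted on the slides; the node spans (first and last word position covered by each node) are hypothetical values chosen only so that the example runs, and `matching_table` is an illustrative helper rather than anything defined in the talk.

```python
from itertools import product

# Word positions of each keyword, as quoted on the slides.
positions = {
    "Jefferson": [22, 28, 51, 54, 72],
    "education": [3, 45, 67],
}

# ASSUMPTION: hypothetical (first, last) word positions covered by each node.
node_spans = {"1": (1, 80), "1.1": (1, 25), "1.2": (26, 52), "1.3": (53, 80)}


def matching_table(pattern, positions, node_spans):
    """Map each node to the matches of `pattern` that lie entirely inside it."""
    table = {}
    for node, (first, last) in node_spans.items():
        # Positions of every keyword of the pattern, restricted to this node.
        local = [[p for p in positions[k] if first <= p <= last] for k in pattern]
        matches = list(product(*local))  # one position per keyword
        if matches:
            table[node] = matches
    return table


table = matching_table(("Jefferson", "education"), positions, node_spans)
for node, matches in table.items():
    print(node, matches)
```

Under these assumed spans the rows for nodes 1.1, 1.2 and 1.3 agree with the matches listed for those nodes later in the talk, and the root node carries every combination of positions.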

XFT Algebra
Similar to relational algebra
- Manipulate matching tables
- Leverage relational query evaluation + optimization techniques
XFT operators
- construct matching table R_k for each keyword k: get(k)
- manipulate matching tables: R_1 or R_2, R_1 and R_2, R_1 minus R_2, σ_times(R), σ_ordered(R), σ_window(R), σ_distance(R)
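The operators can be illustrated with a minimal Python sketch over dictionaries that map a node id to a set of matches. Only `get`, `and` and two selections are shown. The function names, the toy index with its node ids, and the reading of "window of n words" as `max - min + 1 <= n` are all assumptions made for this example, not definitions taken from the talk.

```python
def get(keyword, index):
    """get(k): matching table with one single-position match per occurrence."""
    return {
        node: {(pos,) for pos in words[keyword]}
        for node, words in index.items()
        if keyword in words
    }


def and_(r1, r2):
    """R1 and R2: equi-join on the node id, concatenating matches pairwise."""
    return {
        node: {m1 + m2 for m1 in r1[node] for m2 in r2[node]}
        for node in r1.keys() & r2.keys()
    }


def select(table, predicate):
    """Selection: keep matches that satisfy `predicate`, drop emptied nodes."""
    result = {}
    for node, matches in table.items():
        kept = {m for m in matches if predicate(m)}
        if kept:
            result[node] = kept
    return result


def ordered(match):
    """True if the positions appear in the order of the pattern's keywords."""
    return all(a < b for a, b in zip(match, match[1:]))


def window(n):
    # ASSUMPTION: a match fits a window of n words if max - min + 1 <= n.
    return lambda match: max(match) - min(match) + 1 <= n


# ASSUMPTION: toy per-node index; positions follow the slides, node ids are hypothetical.
index = {
    "1": {"Jefferson": [22, 28, 51, 54, 72], "education": [3, 45, 67]},
    "1.1": {"Jefferson": [22], "education": [3]},
    "1.2": {"Jefferson": [28, 51], "education": [45]},
    "1.3": {"Jefferson": [54, 72], "education": [67]},
}

joined = and_(get("Jefferson", index), get("education", index))
in_order = select(joined, ordered)
within_10 = select(in_order, window(10))
within_20 = select(in_order, window(20))
```

With these positions no ordered match fits into 10 words, so `within_10` is empty, while a window of 20 keeps matches at nodes 1, 1.2 and 1.3. Because every operator takes and returns a table of the same shape, the selections can be composed in any order.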

XFT Algebra
Query: Nodes that contain the keywords “Jefferson” and “education”
- within a window of 10 words,
- with “Jefferson” ordered before “education”
[Figure: algebra plan for the query, containing a × operator.]
Benefit: equivalent query rewritings

Talk Outline
- Motivation & Contributions
- Formalization of XML full-text search
- Efficient evaluation
- Experiments
- Conclusion

Query evaluation: AllNodes
Straightforward implementation of the XFT algebra
Each node is considered separately
- Each tuple is self-contained
Relational-style evaluation
- Joins → equi-joins
- Predicates → selections on set of matches

Example: LoC document
[Figure: the LoC XML fragment from the preceding slides.]

[Tables: AllNodes join (×) of the matching tables for “Jefferson” and “education”; columns Node, Pattern, Matches.]
Input, “Jefferson”: node 1 has matches 22, 28, 51, 54, …; node 1.2 has 28, …; node 1.3 has 54, …; further rows carry the single matches 22, 51 and 72.
Input, “education”: node 1 has matches 3, 45, …; rows for nodes 1.2 and 1.3; further rows carry the single matches 3, 45 and 67.
Output:
Node | Pattern | Matches
1 | “Jefferson”, “education” | (22, 45), (72, 67) …
1.1 | “Jefferson”, “education” | (22, 3)
1.2 | “Jefferson”, “education” | (28, 45), (51, 45)
1.2.2 | “Jefferson”, “education” | (51, 45)
… | “Jefferson”, “education” | (51, 45)
1.3 | “Jefferson”, “education” | (54, 67), (72, 67)
1.3.2 | “Jefferson”, “education” | (72, 67)
Predicate operates one tuple at a time

Example: LoC document
[Figure: the LoC XML fragment from the preceding slides.]

Query evaluation: SCU
AllNodes = straightforward algorithm
Reduce size of intermediate results
- structural relationships between nodes
- avoid redundant match representation
SCU = Smallest Containing Unit

[Tables: the SCU table and the matching table for “Jefferson”; columns Node, Pattern, Matches.]
SCU table, “Jefferson”: rows for node 1.1.3, node 1.2 and three further nodes; the matches 51 and 72 are among the entries.
Matching table, “Jefferson”: node 1 has matches 22, 28, 51, 54, …; node 1.2 has 28, …; node 1.3 has 54, …; further rows carry the single matches 22, 51 and 72.
Matching tables → SCU tables → captures same information
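The claim that an SCU table captures the same information as a matching table can be checked with a small sketch: each position is stored once, at the smallest node containing it, and the AllNodes table is recovered by copying it to every ancestor. The Dewey-style node ids below are hypothetical (several ids are missing from the transcript), and both helper functions are illustrative names.

```python
def ancestors_or_self(dewey):
    """'1.2.2' -> ['1', '1.2', '1.2.2'] for a Dewey-style node id."""
    parts = dewey.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def to_all_nodes(scu_table):
    """Expand an SCU table into the redundant AllNodes matching table."""
    table = {}
    for node, positions in scu_table.items():
        for ancestor in ancestors_or_self(node):
            table.setdefault(ancestor, []).extend(positions)
    return {node: sorted(positions) for node, positions in table.items()}


# ASSUMPTION: hypothetical node ids; the positions are those quoted on the slides.
scu_jefferson = {
    "1.1.3": [22],
    "1.2": [28],
    "1.2.2.1": [51],
    "1.3": [54],
    "1.3.2.1": [72],
}

all_nodes_jefferson = to_all_nodes(scu_jefferson)
stored_scu = sum(len(p) for p in scu_jefferson.values())
stored_all = sum(len(p) for p in all_nodes_jefferson.values())
```

In this toy instance the SCU table stores 5 positions in 5 rows, while the expanded table stores 15 positions in 9 rows, which is the redundancy that SCU avoids.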

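[Tables: join (×) of the SCU tables for “Jefferson” and “education”; columns Node, Pattern, Matches.]
SCU table, “Jefferson”: rows for node 1.1.3, node 1.2 and three further nodes; the matches 51 and 72 are among the entries.
SCU table, “education”: rows for node 1.1.1 and two further nodes; the match 67 is among the entries.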

[Tables: join (×) of the SCU tables for “Jefferson” and “education”, as on the preceding slide.]
Node | Pattern | Matches
… | “Jefferson”, “education” | (51, 45)
1.3.2 | “Jefferson”, “education” | (72, 67)
Equi-join does not work
Need to compute LCA

[Tables: join (×) of the SCU tables for “Jefferson” and “education”, as on the preceding slides.]
Node | Pattern | Matches
1.1 | “Jefferson”, “education” | (22, 3)
… | “Jefferson”, “education” | (51, 45)
1.2 | “Jefferson”, “education” | (28, 45)
1.3.2 | “Jefferson”, “education” | (72, 67)
1.3 | “Jefferson”, “education” | (54, 67)
1 | “Jefferson”, “education” | (22, 45) …
1.1 is the LCA of … and 1.1.1

[Tables: the SCU join result for “Jefferson” × “education” (same rows as on the preceding slide) and two further tables.]
Node | Pattern | Matches
1.2 | “Jefferson”, “education” | (28, 45)
1.3 | “Jefferson”, “education” | (54, 67)
1 | “Jefferson”, “education” | (22, 45) …
Node | Pattern | Matches
EMPTY !!!


[Tables: the SCU inputs and their join result, as on the preceding slides; successive slides fill in the table below.]
Node | Pattern | Matches
1.3 | “Jefferson”, “education” | (54, 67) (72, 67)
1 | “Jefferson”, “education” | (22, 45) …
Postorder
Stack supports single scan
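The LCA-based join and the propagation of matches to an ancestor can be sketched as follows. This is a deliberately naive, quadratic version under hypothetical Dewey node ids; it does not implement the single-scan, stack-based evaluation in postorder that the slides mention, and the function names are illustrative.

```python
def lca(a, b):
    """Lowest common ancestor of two Dewey-style ids: their longest common prefix."""
    common = []
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        common.append(x)
    return ".".join(common)


def scu_join(r1, r2):
    """Join two SCU tables, storing each combined match once, at the LCA."""
    result = {}
    for n1, matches1 in r1.items():
        for n2, matches2 in r2.items():
            node = lca(n1, n2)
            for m1 in matches1:
                for m2 in matches2:
                    result.setdefault(node, []).append(m1 + m2)
    return result


def matches_at(node, scu_result):
    """Match propagation: a node sees its own matches and those of its descendants."""
    collected = []
    for other, matches in scu_result.items():
        if other == node or other.startswith(node + "."):
            collected.extend(matches)
    return sorted(collected)


# ASSUMPTION: hypothetical node ids; the positions are those quoted on the slides.
jefferson = {"1.1.3": [(22,)], "1.2": [(28,)], "1.2.2.1": [(51,)],
             "1.3": [(54,)], "1.3.2.1": [(72,)]}
education = {"1.1.1": [(3,)], "1.2.2.2": [(45,)], "1.3.2.2": [(67,)]}

joined = scu_join(jefferson, education)
at_1_3 = matches_at("1.3", joined)
```

Here `joined` holds (54, 67) at node 1.3 and (72, 67) at node 1.3.2, and `matches_at("1.3", joined)` returns both, mirroring the propagated row for node 1.3 on the slide.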

SCU summary
Equivalent to AllNodes
Structure-awareness reduces size of intermediate results
Increase computation cost
- Compute LCAs of nodes
- Match propagation
Stack-based techniques

Related work on LCA for XML
LCA for conjunctive keyword search
- XRank [GSBS-03]
- Schema-free XQuery [LYJ-04]
- XKSearch [XP-05]
Shortcomings
- No postprocessing, not compositional
  - Input in document order
  - Output postorder traversal
- Support for complex predicates is not straightforward

Talk Outline
- Motivation & Contributions
- Formalization of XML full-text search
- Efficient evaluation
- Experiments
- Conclusion

Experimental goals
AllNodes vs. SCU
- AllNodes: redundant representation
- SCU: smaller sizes, more computation
SCU Overhead
- Stack
- Match propagation
Benefit of Rewritings
- Relational-style rewritings

Experimental setup
Centrino 1.8GHz with 1GB of RAM
XMark generated datasets
- Size ranges from 50 MB to 300 MB

Experiments: AllNodes vs. SCU
Varying document size (q1 - query without predicates)
q1 = get(“See”) and get(“internationally”) and get(“description”) and get(“charges”) and get(“ship”)

Experiments: SCU Overhead
Queries
- q4 = σ window>1(“See”, “internationally”, “description”, “charges”, “ship”) (q1)
- q5 = σ window> (“See”, “internationally”, “description”, “charges”, “ship”) (q1)
Recall that
- q1 = get(“See”) and get(“internationally”) and get(“description”) and get(“charges”) and get(“ship”)

Experiments: SCU Overhead
Varying query predicates (not pushed)
- q4 always true → no match propagation, just the stack overhead
- q5 always false → propagate all matches

Experiments: Benefit of Rewritings
Queries
- q2 = σ orderedE(“See”, “internationally”, “description”, “charges”, “ship”) (q1)
- q3 = push selections in q2
Recall that
- q1 = get(“See”) and get(“internationally”) and get(“description”) and get(“charges”) and get(“ship”)
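The slides do not spell out how selections are pushed in q3. One plausible reading, sketched here purely as an assumption, is that an ordering predicate over several keywords is applied to partial join results as soon as the relevant keywords are available, instead of once after all joins. The positions are made up for the example and stand for the matches of three keywords inside a single node.

```python
from itertools import product


def join(matches_a, matches_b):
    """Combine every match of one operand with every match of the other."""
    return [a + b for a, b in product(matches_a, matches_b)]


def ordered(matches):
    """Keep matches whose positions are strictly increasing."""
    return [m for m in matches if all(x < y for x, y in zip(m, m[1:]))]


# ASSUMPTION: hypothetical positions of three keywords within one node.
a = [(5,), (40,)]
b = [(12,), (3,)]
c = [(20,), (8,)]

# q2-style plan: join everything first, select at the end.
late_input = join(join(a, b), c)
late = ordered(late_input)

# q3-style plan: the selection is also applied below the last join.
early_input = join(ordered(join(a, b)), c)
early = ordered(early_input)
```

Both plans return the same matches, but the final selection in the pushed plan reads 2 candidate matches instead of 8, which is the kind of saving a relational-style rewriting aims for.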

Experiments: Benefit of Rewritings
Varying document size (query with predicates)
40% improvement for relational-like query rewritings

Conclusion
A unified logical framework for XML full-text search languages
Algebra admits
- Efficient algorithms for operator evaluation
- Rewritings of queries into more efficient forms
- Facilitate XML joint optimizations of queries on both structure and text search
Future work
- Score-aware logical framework

Thank you!