AQAX: Approximate Query Answering for XML Josh Spiegel, M. Pontikakis, S. Budalakoti, N. Polyzotis Univ. of California Santa Cruz.

Slides:



Advertisements
Similar presentations
Dynamic Sample Selection for Approximate Query Processing Brian Babcock Stanford University Surajit Chaudhuri Microsoft Research Gautam Das Microsoft Research.
Advertisements

Metadata in Carrot II Current metadata –TF.IDF for both documents and collections –Full-text index –Metadata are transferred between different nodes Potential.
Md. Mahbub Hasan University of California, Riverside.
The Big Picture Scientific disciplines have developed a computational branch Models without closed form solutions solved numerically This has lead to.
Kaushik Chakrabarti(Univ Of Illinois) Minos Garofalakis(Bell Labs) Rajeev Rastogi(Bell Labs) Kyuseok Shim(KAIST and AITrc) Presented at 26 th VLDB Conference,
Correlation Search in Graph Databases Yiping Ke James Cheng Wilfred Ng Presented By Phani Yarlagadda.
Fast Algorithms For Hierarchical Range Histogram Constructions
STHoles: A Multidimensional Workload-Aware Histogram Nicolas Bruno* Columbia University Luis Gravano* Columbia University Surajit Chaudhuri Microsoft Research.
Linked Bernoulli Synopses Sampling Along Foreign Keys Rainer Gemulla, Philipp Rösch, Wolfgang Lehner Technische Universität Dresden Faculty of Computer.
Harikrishnan Karunakaran Sulabha Balan CSE  Introduction  Icicles  Icicle Maintenance  Icicle-Based Estimators  Quality & Performance  Conclusion.
Estimating the Selectivity of XML Path Expressions for Internet Scale Applications Ashraf Aboulnaga Alaa R. Alameldeen Jeffrey F. Naughton Computer Sciences.
Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin Nov
Personalizing Search via Automated Analysis of Interests and Activities Jaime Teevan Susan T.Dumains Eric Horvitz MIT,CSAILMicrosoft Researcher Microsoft.
Karl Schnaitter and Neoklis Polyzotis (UC Santa Cruz) Serge Abiteboul (INRIA and University of Paris 11) Tova Milo (University of Tel Aviv) Automatic Index.
Deterministic Wavelet Thresholding for Maximum-Error Metrics Minos Garofalakis Bell Laboratories Lucent Technologies 600 Mountain Avenue Murray Hill, NJ.
Suggestion of Promising Result Types for XML Keyword Search Joint work with Jianxin Li, Chengfei Liu and Rui Zhou ( Swinburne University of Technology,
1 BA 555 Practical Business Analysis Housekeeping Review of Statistics Exploring Data Sampling Distribution of a Statistic Confidence Interval Estimation.
Graph-Based Synopses for Relational Selectivity Estimation Joshua Spiegel and Neoklis Polyzotis University of California, Santa Cruz.
Approximate XML Query Answers Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Bell Labs) Yannis Ioannidis (U. of Athens, Hellas)
ADVISE: Advanced Digital Video Information Segmentation Engine
Progressive Approximate Aggregate Queries with a Multi-Resolution Tree Structure Iosif Lazaridis, Sharad Mehrotra University of California, Irvine SIGMOD.
1998/5/21by Chang I-Ning1 ImageRover: A Content-Based Image Browser for the World Wide Web Introduction Approach Image Collection Subsystem Image Query.
Liang Jin and Chen Li VLDB’2005 Supported by NSF CAREER Award IIS Selectivity Estimation for Fuzzy String Predicates in Large Data Sets.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Nearest Neighbor Retrieval Using Distance-Based Hashing Michalis Potamias and Panagiotis Papapetrou supervised by Prof George Kollios A method is proposed.
Dependency-Based Histogram Synopses for High-dimensional Data Amol Deshpande, UC Berkeley Minos Garofalakis, Bell Labs Rajeev Rastogi, Bell Labs.
1 Wavelet synopses with Error Guarantees Minos Garofalakis Phillip B. Gibbons Information Sciences Research Center Bell Labs, Lucent Technologies Murray.
Depth Estimation for Ranking Query Optimization Karl Schnaitter, UC Santa Cruz Joshua Spiegel, BEA Systems, Inc. Neoklis Polyzotis, UC Santa Cruz.
Relevance Feedback Content-Based Image Retrieval Using Query Distribution Estimation Based on Maximum Entropy Principle Irwin King and Zhong Jin The Chinese.
Approximate XML Query Answers Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Bell Labs) Yannis Ioannidis (U. of Athens, Hellas)
Hashed Samples Selectivity Estimators for Set Similarity Selection Queries.
Query Optimization Allison Griffin. Importance of Optimization Time is money Queries are faster Helps everyone who uses the server Solution to speed lies.
1 Chapter Seven Introduction to Sampling Distributions Section 1 Sampling Distribution.
S. B. Roy, S. A.-Yahia, A. Chawla, G. Das, and C. Yu SIGMOD 2010 Constructing and Exploring Composite Items 2011/4/14 1.
An Integration Framework for Sensor Networks and Data Stream Management Systems.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
XPathLearner: An On-Line Self- Tuning Markov Histogram for XML Path Selectivity Estimation Authors: Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
PMLAB Finding Similar Image Quickly Using Object Shapes Heng Tao Shen Dept. of Computer Science National University of Singapore Presented by Chin-Yi Tsai.
Swarup Acharya Phillip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented By Vinay Hoskere.
Constructing Optimal Wavelet Synopses Dimitris Sacharidis Timos Sellis
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
A Novel Approach for Approximate Aggregations Over Arrays SSDBM 2015 June 29 th, San Diego, California 1 Yi Wang, Yu Su, Gagan Agrawal The Ohio State University.
A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries Surajit Chaudhuri Gautam Das Vivek Narasayya Presented by Sushanth.
Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.
BNCOD07Indexing & Searching XML Documents based on Content and Structure Synopses1 Indexing and Searching XML Documents based on Content and Structure.
Histograms for Selectivity Estimation
Spatio-temporal Pattern Queries M. Hadjieleftheriou G. Kollios P. Bakalov V. J. Tsotras.
1 Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration Fangjiao Jiang Renmin University of China Joint work with Weiyi Meng.
Presented By Anirban Maiti Chandrashekar Vijayarenu
1 Approximate XML Query Answers Presenter: Hongyu Guo Authors: N. polyzotis, M. Garofalakis, Y. Ioannidis.
ApproxHadoop Bringing Approximations to MapReduce Frameworks
CONGRESSIONAL SAMPLES FOR APPROXIMATE ANSWERING OF GROUP BY QUERIES Swaroop Acharya,Philip B Gibbons, VishwanathPoosala By Agasthya Padisala Anusha Reddy.
By: Gang Zhou Computer Science Department University of Virginia 1 Medians and Beyond: New Aggregation Techniques for Sensor Networks CS851 Seminar Presentation.
Histograms for Selectivity Estimation, Part II Speaker: Ho Wai Shing Global Optimization of Histograms.
Chen Li Department of Computer Science Joint work with Liang Jin, Nick Koudas, Anthony Tung, and Rares Vernica Answering Approximate Queries Efficiently.
XCluster Synopses for Structured XML Content Alkis Polyzotis (UC Santa Cruz) Minos Garofalakis (Intel Research, Berkeley)
Text Information Management ChengXiang Zhai, Tao Tao, Xuehua Shen, Hui Fang, Azadeh Shakery, Jing Jiang.
University of Texas at Arlington Presented By Srikanth Vadada Fall CSE rd Sep 2010 Dynamic Sample Selection for Approximate Query Processing.
Written By: Presented By: Swarup Acharya,Amr Elkhatib Phillip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy Join Synopses for Approximate Query Answering.
A paper on Join Synopses for Approximate Query Answering
Structure and Value Synopses for XML Data Graphs
Fast Approximate Query Answering over Sensor Data with Deterministic Error Guarantees Chunbin Lin Joint with Etienne Boursier, Jacque Brito, Yannis Katsis,
Probabilistic Data Management
StreamApprox Approximate Stream Analytics in Apache Spark
StreamApprox Approximate Computing for Stream Analytics
Communication and Memory Efficient Parallel Decision Tree Construction
Structure and Content Scoring for XML
Structure and Content Scoring for XML
Presentation transcript:

AQAX: Approximate Query Answering for XML Josh Spiegel, M. Pontikakis, S. Budalakoti, N. Polyzotis Univ. of California Santa Cruz

Motivation for AQAX XML Data Synopsis query Q Answer Approximate Answer query Q Efficient to compute Expensive ‣ Goal: provide feedback for ad-hoc queries The user can see “previews” of results The user may not need the exact answer ‣ Key for on-line exploration of large XML data sets

System Architecture 4.The client receives the result synopsis and aggregates it in a more manageable working synopsis Visual model: XCluster 4.The user explores the answers by manipulating the working synopsis 2.The server processes the query and generates an approximate answer Summarization model: XCluster 2.The answer is returned to the client in the form of a result synopsis 1.The user formulates a query and selects a level of precision

User Interface Visual Query Approximate Result Result Exploration Result Statistics Synopsis Selection Query Manipulation

XCluster: Synopsis Model ‣ Structural information: node- and edge-counts Node-count: number of elements in cluster Edge-count: average number of children ‣ Value information: value-distribution summaries XClusterDATA

‣ Elements include values of different types ‣ Similarly, queries involve predicates of different types Content Heterogeneity 2003 The history of histograms (abridged) Yannis Ioannidis The history of histograms is long and rich, full of detailed information in every step. It includes... //paper[year>2000][author contains “Ioannidis”]// abstract[ftcontains histograms,history] Numerical String Text RangeSubstring Term Containment

Content Summarization “The history of histograms is long and rich, full of detailed information in every step. It...” TermFreq 0 (history)2 1 (histogram)7 2 (data)6 3 (database)5 4 (information)3 5 (value)2 BucketFreq /3 TextTerm MatrixTerm Histogram ‣ Types of value summaries: Numerical content: Histograms String Content: Pruned Suffix Tries Text Content: End-biased Term Histograms ‣ XCluster provides a unified framework for structure and heterogeneous value content

Query Evaluation XCluster Result XCluster Query ‣ Efficient computation in a single depth-first pass ‣ Result synopsis has bounded size |Q|*|S| ‣ Accuracy depends on “tightness” of nodes A node is tight if it contains elements that have similar structure and values

Synopsis Construction XML Data Reference Summary XCluster with detailed value distributions XCluster ç± Stage 1Stage 2Stage 3 ‣ Goal: cluster elements in nodes according to their structural and value-based similarity ‣ 3-stage build process: Step 1: Construct reference synopsis (fine grained clustering) Step 2: Cluster based on structure and values Step 3: Build value summaries ‣ Use of novel heuristics to determine a good clustering of elements

XCLUSTER: Visual Model ‣ XCluster is used as the visual model for result presentation ‣ The initial view is the coarsest XCluster summary of the answer ‣ The user can explore the results by manipulating the synopsis Structural operations: split/merge nodes Value operations: render distributions, compute aggregates MergeSplit Render/Aggregate

Value Approximation ‣ Error is close to 10% with a modest space budget ‣ Quick convergence to small errors ‣ XCluster captures well the correlation b/w structure and values ‣ Workload: 1000 random twig queries, categorized by type of value predicate ‣ Estimation error is measured as avg. absolute relative error (lower is better)

Structure Approximation ‣ Considerable improvement compared to previous techniques ‣ XCluster captures well the correlations within the XML structure ‣ Workload: 1000 random structure-only twig queries ‣ Error is measured as element simulation distance (lower is better)

References ‣ Approximate XML Query Answers N. Polyzotis, M. Garofalakis, Y. Ioannidis In Proceedings of SIGMOD, 2004 ‣ XCluster Synopses for Structured XML Content N. Polyzotis, M. Garofalakis In Proceedings of ICDE, 2006