Jianxin Li, Chengfei Liu, Rui Zhou Swinburne University of Technology, Australia Wei Wang University of New South Wales, Australia Top-k Keyword Search.

Presentation transcript:

Top-k Keyword Search over Probabilistic XML Data
Jianxin Li, Chengfei Liu, Rui Zhou (Swinburne University of Technology, Australia)
Wei Wang (University of New South Wales, Australia)
The 27th IEEE International Conference on Data Engineering (ICDE 2011), Hannover, Germany, April 2011

Outline
- Introduction
- Problem and Challenge
- Our solution
- Experiments
- Conclusions

Outline
- Introduction
  - Keyword search on deterministic XML
  - Probabilistic XML
- Problem and Challenge
- Our solution
- Experiments
- Conclusions

Keyword search on deterministic XML
Why keyword search on XML (or structured data)?
- It is the most popular way of searching for information.
- There is no need to learn complex structured query languages.
- There is no need to know the underlying schema or content, which is often difficult to know anyway.
LCA (Lowest Common Ancestor) based approaches:
- SLCA (Smallest LCA) [Xu and Papakonstantinou SIGMOD05]
- ELCA (Exclusive LCA) [XRank - Guo et al. SIGMOD03, Xu and Papakonstantinou EDBT08, Zhou et al. EDBT10]
- Other LCA variants, some of which impose conditions on the LCA nodes or refine the returned fragments, e.g., XSEarch (interconnection) [Cohen et al. VLDB03], MLCA [Y Li et al. VLDB2004], XSeek [Liu and Chen SIGMOD07], CVLCA [G Li et al. CIKM07], etc.
[Figure: example XML tree with root r, nodes x1 to x4, and keyword nodes a1 to a3 and b1 to b4]

Probabilistic XML and models
- Uncertain data arises in many settings, e.g., information extraction, NLP, data cleaning, and data integration.
- Much raw data comes from the web, so XML is a natural representation.
- Simple dependencies in the data are easily captured by the parent-child relationship.
A popular model:
- PrXML{ind, mux} (first proposed as ProTDB [Nierman and Jagadish VLDB02])
- ind: independent; mux: mutually exclusive
Other probabilistic XML models:
- PrXML C, where C is a subset of {ind, mux, det, exp, cie} [Abiteboul et al. VLDB Journal 09]
- det: deterministic; exp: explicit; cie: conjunction of independent events

A p-document
[Figure: example p-document with root r, distributional nodes (IND, MUX), and ordinary nodes B1 to B5, C1 to C6, D1, D2, E1, E2]

Possible world semantics
[Figure: the subtree rooted at C1, containing MUX and IND nodes over D1, E1, D2, E2, expanded into its possible worlds. The world probabilities legible in the transcript are:]
  0.1 * 0.7 * 0.9 = 0.063
  0.1 * 0.7 * (1 - 0.9) = 0.007
  0.1 * (1 - 0.7) * 0.9 = 0.027
  0.1 * (1 - 0.7) * (1 - 0.9) = 0.003
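The products on this slide follow the usual rule for an IND node: each child is kept or dropped independently. A minimal Python sketch of that enumeration is given here. It assumes that the factor 0.1 is the probability of reaching the IND node and that its children D2 and E2 have probabilities 0.7 and 0.9; the transcript does not show which value belongs to which node, and the function name `ind_worlds` is hypothetical.

```python
from itertools import product

def ind_worlds(reach_prob, children):
    """Enumerate the possible worlds below one IND node.

    reach_prob: probability that the IND node itself is reached (assumed).
    children:   dict mapping child name -> independent existence probability.
    Returns a dict mapping the set of surviving children to its probability.
    """
    names = list(children)
    worlds = {}
    for keep in product([True, False], repeat=len(names)):
        prob = reach_prob
        for name, kept in zip(names, keep):
            # A kept child contributes its probability, a dropped one 1 - p.
            prob *= children[name] if kept else 1.0 - children[name]
        survivors = frozenset(n for n, kept in zip(names, keep) if kept)
        worlds[survivors] = prob
    return worlds

# Assumed assignment of the slide's numbers to nodes.
worlds = ind_worlds(0.1, {"D2": 0.7, "E2": 0.9})
for survivors, prob in worlds.items():
    print(sorted(survivors), round(prob, 3))
```

The number of worlds doubles with every IND child, which is why the rest of the talk avoids generating possible worlds explicitly.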

Finding information in a p-document
An important issue is how to query a p-document:
- Twig queries: [Kimelfeld et al. VLDB Journal 09], [Chang et al. EDBT 09]
- Keyword queries (search): an open problem, and the focus of this work

Outline
- Introduction
- Problem and Challenges
  - Keyword Search on Probabilistic XML
- Our solution
- Experiments
- Conclusions

Keyword search on probabilistic XML
The setup:
- A p-document encodes a large number of possible worlds. We should always avoid generating all possible worlds from the p-document.
Some questions:
- First, what are the semantics of keyword search on a p-document?
- Second, can we use any traditional method?
- Third, if not, what should we do?

Semantics of keyword search on a p-document
We model the results of a keyword query on a p-document T as a set of 2-tuples (v, f):
- v is an ordinary node in T
- f is the probability (confidence) that v is an SLCA, taken over all possible worlds
The results are defined on possible worlds, but we avoid generating possible worlds when computing them.

Can we use any traditional method?
Naively computing traditional SLCAs on a p-document causes problems:
- Distributional nodes are not answers.
- MUX semantics does not allow two branches to coexist.
- An SLCA's parent may also be an SLCA in some possible worlds.
- Many nodes can be SLCAs, so we may need only the answers with the top-k probabilities (ranking the 2-tuples (v, f) by the confidence field f).
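Once the (v, f) tuples are available, ranking by confidence is a standard top-k selection. The sketch below uses Python's `heapq.nlargest` on made-up tuples; the node names and probabilities are purely illustrative and not taken from the talk, and the helper name `top_k` is hypothetical.

```python
import heapq

def top_k(results, k):
    """Return the k result tuples (v, f) with the largest confidence f."""
    return heapq.nlargest(k, results, key=lambda vf: vf[1])

# Hypothetical (node, SLCA probability) pairs for illustration only.
results = [("v1", 0.29), ("v2", 0.05), ("v3", 0.48), ("v4", 0.12)]
print(top_k(results, 2))
```

This selects from fully computed results; the algorithms presented later aim to avoid computing every probability in the first place.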

Outline
- Introduction
- Problem and Challenge
- Our solution
  - How to compute SLCA probability of a node
  - Two top-k search algorithms
- Experiments
- Conclusions

Computing SLCA Probability of a Node
Idea:
- IND nodes and MUX nodes: no need to compute.
- Ordinary nodes: use the keyword distribution probabilities of the child nodes. (Introduce this first!)

Keyword distribution probabilities
Keyword distribution probabilities (local probabilities):
- For each node v in a p-document, a table records the probabilities of the keyword distribution under v.
- For leaf nodes: one field is 1 and the others are 0.
- For internal nodes: the table is computed bottom-up.

  General table:        {}     {k1}   {k2}   {k1,k2}
                        p0     p1     p2     p3

  Example leaf table:   {}     {k1}   {k2}   {k1,k2}
                        0      1      0      0

Ordinary node
p is an ordinary node with children c1 and c2. The child tables below are the ones that reappear on the IND and MUX slides; they are consistent with every product shown here.

            {}     {k1}   {k2}   {k1,k2}
  c1        0.1    0.3    0.5    0.1
  c2        0.2    0.4    0.3    0.1
  p         p0     p1     p2     p3

  p0 = 0.1 * 0.2 = 0.02
  p1 = 0.1 * 0.4 + 0.3 * 0.2 + 0.3 * 0.4 = 0.22
  p2 = 0.1 * 0.3 + 0.5 * 0.2 + 0.5 * 0.3 = 0.28
  p3 = 1 - p0 - p1 - p2 = 0.48 (or computed directly: p3 = ... = 0.48)
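The four values p0 to p3 can be reproduced by multiplying every pair of child entries and adding the product to the entry for the union of the two keyword sets. The Python sketch below does this with bitmask-encoded keyword sets. It assumes child tables (0.1, 0.3, 0.5, 0.1) and (0.2, 0.4, 0.3, 0.1), which are the values shown on the IND and MUX slides, and it assumes that p itself carries no keyword; the function name `combine` is hypothetical.

```python
from itertools import product

# Keyword subsets are encoded as bitmasks over (k1, k2):
# 0b00 = {}, 0b01 = {k1}, 0b10 = {k2}, 0b11 = {k1, k2}.
def combine(t1, t2):
    """Combine the keyword distribution tables of two independent children."""
    out = [0.0] * len(t1)
    for (s1, p1), (s2, p2) in product(enumerate(t1), enumerate(t2)):
        # The parent sees the union of the keywords found under each child.
        out[s1 | s2] += p1 * p2
    return out

c1 = [0.1, 0.3, 0.5, 0.1]  # assumed table of child c1
c2 = [0.2, 0.4, 0.3, 0.1]  # assumed table of child c2
p = combine(c1, c2)
print([round(x, 2) for x in p])  # [0.02, 0.22, 0.28, 0.48]
```

Because the encoding is a plain bitmask, the same function would work for more than two keywords with tables of length 2^m, at a cost that grows with the square of the table size.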

IND node (similar to Ordinary node)
p is an IND node with children c1 and c2, reached with edge probabilities λ1 and λ2.

  Original child tables:
            {}     {k1}   {k2}   {k1,k2}
  c1        0.1    0.3    0.5    0.1
  c2        0.2    0.4    0.3    0.1

  Child tables after folding in the edge probabilities (the edges then have probability 1):
            {}                   {k1}      {k2}      {k1,k2}
  c1        0.1*λ1 + (1 - λ1)    0.3*λ1    0.5*λ1    0.1*λ1
  c2        0.2*λ2 + (1 - λ2)    0.4*λ2    0.3*λ2    0.1*λ2

The adjusted tables are then combined in the same way as for an ordinary node.
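The adjustment for an IND child can be sketched as a small helper: every entry is scaled by the edge probability, and the missing mass 1 - λ goes to the empty keyword set because an absent child contributes no keywords. The value λ1 = 0.5 used below is a made-up illustration (the slide keeps λ1 symbolic), and the name `fold_edge` is hypothetical.

```python
def fold_edge(table, lam):
    """Fold an IND edge probability into a child's keyword distribution table.

    table: probabilities indexed by keyword subset, entry 0 being {}.
    lam:   probability that the child exists under its IND parent.
    """
    out = [lam * x for x in table]
    # With probability 1 - lam the child is absent and contributes no keywords.
    out[0] += 1.0 - lam
    return out

c1 = [0.1, 0.3, 0.5, 0.1]     # child table as shown on the slide
adjusted = fold_edge(c1, 0.5)  # lam = 0.5 is an assumed example value
print([round(x, 2) for x in adjusted])  # [0.55, 0.15, 0.25, 0.05]
```

The adjusted table still sums to 1, so it can be treated like the table of a child that exists with certainty.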

MUX node
p is a MUX node with children c1 and c2, reached with edge probabilities λ1 and λ2.

  Original child tables:
            {}     {k1}   {k2}   {k1,k2}
  c1        0.1    0.3    0.5    0.1
  c2        0.2    0.4    0.3    0.1

  Child tables scaled by the edge probabilities:
            {}        {k1}      {k2}      {k1,k2}
  c1        0.1*λ1    0.3*λ1    0.5*λ1    0.1*λ1
  c2        0.2*λ2    0.4*λ2    0.3*λ2    0.1*λ2

  Table of p:
  p0 = 0.1*λ1 + 0.2*λ2 + 1 - λ1 - λ2
  p1 = 0.3*λ1 + 0.4*λ2
  p2 = 0.5*λ1 + 0.3*λ2
  p3 = 0.1*λ1 + 0.1*λ2
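Because at most one child of a MUX node exists, its table is a weighted sum of the child tables rather than a product. A minimal Python sketch follows, with made-up values λ1 = 0.6 and λ2 = 0.3 (the slide keeps both symbolic); the name `mux_table` is hypothetical.

```python
def mux_table(children):
    """Keyword distribution table of a MUX node.

    children: list of (table, lam) pairs, where lam is the probability that
              this child is the one chosen; the lams must sum to at most 1.
    """
    size = len(children[0][0])
    out = [0.0] * size
    for table, lam in children:
        for subset, prob in enumerate(table):
            out[subset] += lam * prob
    # Remaining mass: no child is chosen, so no keyword appears under the node.
    out[0] += 1.0 - sum(lam for _, lam in children)
    return out

c1 = [0.1, 0.3, 0.5, 0.1]
c2 = [0.2, 0.4, 0.3, 0.1]
p = mux_table([(c1, 0.6), (c2, 0.3)])  # assumed example edge probabilities
print([round(x, 2) for x in p])  # [0.22, 0.3, 0.39, 0.09]
```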

Progressive computing with multiple children
[Figure: node p (IND, MUX, or ordinary) with children c1, c2, c3 and edge probabilities λ1, λ2, λ3. The children are processed one after another, producing an intermediate result and then the final result.]

Computing SLCA Probability of a Node
Idea: use the keyword distribution probabilities of the children.
p is an ordinary node with children c1 and c2. Assume the following child tables have already been computed:

            {}     {k1}   {k2}   {k1,k2}
  c1        0.1    0.3    0.5    0.1
  c2        0.2    0.4    0.3    0.1

  P_slca(p) = 0.3 * 0.3 + 0.5 * 0.4 = 0.29
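The value 0.29 is consistent with counting exactly the cases in which neither child covers all keywords on its own while the two children together do. The Python sketch below spells that out for two children and two keywords. It assumes child tables (0.1, 0.3, 0.5, 0.1) and (0.2, 0.4, 0.3, 0.1), that both children exist with certainty, and that p itself carries no keyword; `slca_probability` is a hypothetical name, not the authors' routine.

```python
def slca_probability(t1, t2, full=0b11):
    """Probability that the parent of two children is an SLCA.

    Tables are indexed by keyword bitmask (0b01 = {k1}, 0b10 = {k2}).
    The parent is an SLCA when no single child contains all keywords
    but the union over both children does.
    """
    total = 0.0
    for s1, p1 in enumerate(t1):
        for s2, p2 in enumerate(t2):
            if s1 != full and s2 != full and (s1 | s2) == full:
                total += p1 * p2
    return total

c1 = [0.1, 0.3, 0.5, 0.1]  # assumed table of child c1
c2 = [0.2, 0.4, 0.3, 0.1]  # assumed table of child c2
p_slca = slca_probability(c1, c2)
print(round(p_slca, 2))  # 0.29
```

With two keywords only two combinations qualify: c1 holds {k1} and c2 holds {k2} (0.3 * 0.3), or c1 holds {k2} and c2 holds {k1} (0.5 * 0.4).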

Computing SLCA Probability Progressively
[Figure: node p with children c1, c2, c3, processed one after another]
- Intermediate result: the same as the keyword distribution probabilities, plus an extra field.
- Final result: the SLCA probability of node p may be non-zero.

Algorithms
Integrate the SLCA probability computation into algorithms:
- First algorithm: PrStack, a stack-based algorithm that scans all keyword inverted lists once.
- Second algorithm: EagerTopK, which applies some important pruning strategies.

PrStack algorithm
- Scan all keyword inverted lists once, in document order.
- Use extended Dewey codes.
- Compute a node's SLCA probability when the node is popped from the stack.
- A node's SLCA probability is therefore computed after all its children have been processed.
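Dewey codes make ancestor relationships a matter of prefix comparison, which is what lets a stack-based scan in document order find common ancestors cheaply. The Python sketch below shows only that basic property for plain Dewey codes; the extended Dewey encoding used by PrStack is not described in the transcript, so the tuple representation and the name `dewey_lca` are assumptions.

```python
def dewey_lca(a, b):
    """Lowest common ancestor of two nodes given as plain Dewey codes.

    A Dewey code is modeled here as a tuple of child positions from the
    root, so the LCA is the longest common prefix of the two tuples.
    """
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

print(dewey_lca((1, 2, 1), (1, 2, 3, 4)))  # (1, 2)
```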

EagerTopK Algorithm
- First, find the traditional SLCAs (disregarding node types) using the algorithm in [Xu and Papakonstantinou SIGMOD 05].
- Then, starting from these initial SLCAs, trace up towards the root, pick out ordinary nodes as SLCAs, and compute their probabilities.
- Several upper bounds are used for pruning.

Pruning Properties
Pruning a chain of nodes:
- Property 1 (IND and Ordinary nodes)
- Property 2 (MUX nodes)
- Property 3 (All types, looser than Property 1 and 2)
Pruning a single node:
- Property 4 (Ordinary nodes)

Pruning a chain of nodes
Property 1 (IND and Ordinary nodes)
[Figure: root r, node p (IND node or ordinary node) with children c1 and c2; c1 exists and contains all the keywords. The slide states an upper bound (≤) whose formula is not preserved in the transcript.]

Pruning a chain of nodes
Property 2 (MUX nodes)
[Figure: root r, node p (MUX node) with children c1 and c2; c1 exists and contains all the keywords. The slide states an upper bound (≤) whose formula is not preserved in the transcript.]

Pruning a chain of nodes
Property 3 (all types of nodes)
[Figure: root r, node p (any type) with descendants d1 and d2. The slide states upper bounds (≤) whose formulas are not preserved in the transcript.]

Pruning a single node
Property 4 (ordinary nodes)
[Figure: root r, node p (ordinary node) with children c1 and c2. The formula on the slide is labeled with the local SLCA probability of node p and the existence probability of node p; the formula itself is not preserved in the transcript.]

Outline
- Introduction
- Problem and Challenge
- Our solution
- Experiments
- Conclusions

Experiments
Experimental setup:
- Intel P4 3.0GHz CPU, 2G RAM, Windows XP, Java
- Datasets: DBLP (large and shallow), Mondial (deep, complex, and small), and XMark (tunable depth and size)
- Distributional nodes are inserted randomly into the test datasets using the same method as [Kimelfeld et al. SIGMOD 08].
- Keyword queries are selected randomly for each dataset.

[Tables: test datasets; keyword queries]

Time cost and memory cost
[Figures: time cost and memory cost]

Varying k
[Figures: results for varying k]

Varying document size
Four XMark datasets, with sizes from 10MB to 80MB; top k=10

Outline
- Introduction
- Problem and Challenge
- Our solution
- Experiments
- Conclusions

Conclusions
Study: keyword search on probabilistic XML data
Contributions:
- Result semantics for keyword search on a probabilistic XML document: SLCA semantics on a p-document
- SLCA probability computation without generating possible worlds
- Algorithms
  - PrStack: easy to implement
  - EagerTopK: faster to give top-k answers, using a few upper bounds
- Experiments conducted

Thanks! & Questions?