Advisor: Prof. 陳良弼 (Arbee L.P. Chen); Presenter: 鄧雅文 97753034
• Introduction
• Related Work
• Problem Formulation
• Future Work


• Top-k query on certain data
◦ Rank results according to a user-defined score
◦ Important for exploring large databases
◦ E.g., top-2 = {T1, T2}

TID  PID  Score
T1   A    100
T2   B    90
T3   C    80
T4   D    70
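On certain data a top-k query is simply a sort by score. A minimal Python sketch over the table above, assuming the Score column already is the user-defined score (in general the score would be computed from several attributes); `top_k` is a hypothetical helper name:

```python
import heapq

# Certain table from the slide: (TID, PID, score).
table = [("T1", "A", 100), ("T2", "B", 90), ("T3", "C", 80), ("T4", "D", 70)]

def top_k(rows, k):
    """Return the TIDs of the k highest-scoring rows, best first."""
    best = heapq.nlargest(k, rows, key=lambda row: row[2])
    return [tid for tid, _pid, _score in best]

print(top_k(table, 2))  # ['T1', 'T2']
```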

• Uncertain database
◦ How should top-k be defined on uncertain data?
◦ Mutually exclusive rules
▪ E.g., T1 ⊕ T4 (at most one of T1 and T4 appears in any possible world)

TID  PID  Score  Pr.
T1   A
T2   B    90     0.9
T3   C    80     0.6
T4   A    70     0.8
…
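A mutually exclusive rule changes which possible worlds exist: tuples in one rule are alternatives, and any leftover probability means none of them appears. The sketch below enumerates the worlds of such a database. The probabilities of T2, T3 and T4 come from the table; T1's probability is not recoverable from the slide, so 0.1 is a HYPOTHETICAL value (it must be at most 0.2 because T1 ⊕ T4 and Pr(T4) = 0.8). The function name is illustrative.

```python
from itertools import product

def possible_worlds(groups):
    """Enumerate (set of present TIDs, probability) for every possible world.

    groups: list of rules, each a list of (tid, prob). Tuples in the same rule
    are mutually exclusive; if their probabilities sum to less than 1, the
    remainder is the probability that none of them appears.
    """
    options = []
    for group in groups:
        total = sum(p for _, p in group)
        if total > 1 + 1e-9:
            raise ValueError("exclusive tuples' probabilities must sum to at most 1")
        opts = list(group)
        if total < 1 - 1e-9:
            opts.append((None, 1 - total))  # no tuple of this rule appears
        options.append(opts)
    worlds = []
    for choice in product(*options):  # one option per rule
        prob = 1.0
        for _, p in choice:
            prob *= p
        members = frozenset(tid for tid, _ in choice if tid is not None)
        worlds.append((members, prob))
    return worlds

# T1's probability (0.1) is hypothetical; the others are from the slide.
groups = [[("T1", 0.1), ("T4", 0.8)], [("T2", 0.9)], [("T3", 0.6)]]
worlds = possible_worlds(groups)
```

This yields 3 × 2 × 2 = 12 worlds whose probabilities sum to 1, and no world contains both T1 and T4.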

• C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE.
◦ Causes of uncertainty:
▪ Sensor networks, privacy, trajectory prediction, …
◦ The main research areas on uncertain data:
▪ Modeling of uncertain data
▪ Uncertain data management: top-k queries, range queries, NN queries, …
▪ Uncertain data mining: clustering, classification, frequent patterns, outliers, …

• M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE.
◦ Possible worlds

◦ U-Topk query
▪ Return the k tuples that, co-existing in a possible world, form its top-k with the highest probability
▪ E.g., {T1, T2} as U-Top2
◦ U-kRanks query
▪ Return k tuples, each of which is the most probable tuple at its rank (1 to k) over all possible worlds
▪ E.g., {T2, T6} as U-2Ranks
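To make the possible-worlds semantics concrete, the following sketch enumerates every world by brute force and evaluates U-Topk and U-kRanks directly from their definitions. It assumes independent tuples (no exclusion rules) listed in descending score order and reuses the probabilities that appear in the example traces (t1: 0.4, t2: 0.7, t5: 0.6, t6: 1.0). The function names are illustrative; exhaustive enumeration is exponential, so this is only a reference for checking small examples, not how Soliman et al. evaluate the queries.

```python
from collections import defaultdict
from itertools import product

# Independent tuples in descending score order.
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def possible_worlds(tuples):
    """Yield (present TIDs in score order, probability) for each world."""
    for bits in product([True, False], repeat=len(tuples)):
        prob, present = 1.0, []
        for (tid, p), here in zip(tuples, bits):
            prob *= p if here else 1 - p
            if here:
                present.append(tid)
        if prob > 0:  # skip impossible worlds (e.g., a certain tuple missing)
            yield present, prob

def u_topk(tuples, k):
    """U-Topk: the k-tuple vector most likely to be the top-k of a world."""
    vec_prob = defaultdict(float)
    for present, prob in possible_worlds(tuples):
        if len(present) >= k:
            vec_prob[tuple(present[:k])] += prob
    return max(vec_prob.items(), key=lambda kv: kv[1])

def rank_distributions(tuples, k):
    """For each rank i = 1..k, the probability of each tuple being at rank i."""
    dists = [defaultdict(float) for _ in range(k)]
    for present, prob in possible_worlds(tuples):
        for i, tid in enumerate(present[:k]):
            dists[i][tid] += prob
    return dists

def u_kranks(tuples, k):
    """U-kRanks: the most probable tuple at each rank (ties broken arbitrarily)."""
    return [max(d.items(), key=lambda kv: kv[1]) for d in rank_distributions(tuples, k)]
```

On these inputs U-Top2 is (t1, t2) with probability 0.28 and the rank-1 winner is t2 with 0.42, matching the traces. At rank 2, t5 and t6 both come out at 0.324 in this four-tuple input, so the rank-2 winner depends on tie-breaking.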

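Example (figure): find the U-Top2 query answer. Tuples arrive from the storage layer in score order (t1: 0.4, t2: 0.7, t5: 0.6); candidate states wait in a buffer organized as a probability priority queue, and the most probable state is expanded first.
◦ s_{0,0} = {}, p = 1
◦ s_{1,1} = {t1}, p = 0.4; s_{0,1} = {}, p = 0.6
◦ s_{1,2} = {t2}, p = 0.42; s_{0,2} = {}, p = 0.18
◦ s_{2,3} = {t2, t5}, p = …; s_{1,3} = {t2}, p = …
◦ s_{2,2} = {t1, t2}, p = 0.28; s_{1,2} = {t1}, p = 0.12
◦ Complete! {t1, t2} is returned as the top-2 answer.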
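The trace above is a best-first search over partial states. A minimal sketch of that idea, assuming independent tuples in descending score order (the trace's inputs); the original algorithm is described for a more general model, and `u_topk_best_first` is a hypothetical name:

```python
import heapq

# Independent tuples in descending score order (probabilities from the trace).
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def u_topk_best_first(tuples, k):
    """Best-first search for the U-Topk vector over independent tuples.

    A state (chosen, scanned) fixes, for the first `scanned` tuples in score
    order, which ones are present; `chosen` lists the present ones. Extending
    a state multiplies its probability by p or (1 - p), so probabilities never
    grow, and the first complete state (len(chosen) == k) popped from the
    max-priority queue is the most probable top-k vector.
    """
    heap = [(-1.0, (), 0)]  # s_{0,0} = {} with p = 1; heapq is a min-heap
    while heap:
        neg_p, chosen, scanned = heapq.heappop(heap)
        if len(chosen) == k:
            return list(chosen), -neg_p
        if scanned == len(tuples):
            continue  # this branch has fewer than k tuples
        tid, p = tuples[scanned]
        if p > 0:  # next tuple present
            heapq.heappush(heap, (neg_p * p, chosen + (tid,), scanned + 1))
        if p < 1:  # next tuple absent
            heapq.heappush(heap, (neg_p * (1 - p), chosen, scanned + 1))
    return None  # no world has k tuples
```

On these inputs the states are popped in the trace's order ({} 1, {} 0.6, {t2} 0.42, {t1} 0.4, then {t1, t2} 0.28), and t6 is never read.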

• U-kRanks
Example (figure): find the U-2Ranks query answer. For each tuple reported by the storage layer, the state probabilities for i = 1, 2 are updated (the empty state drops from 1 to 0.6 to 0.18) and the tuple's rank probabilities P_{t,i} are computed:
◦ t1 (0.4): state {t1} 0.4, so P_{t1,1} = 0.4
◦ t2 (0.7): P_{t2,1} = 0.42 and P_{t2,2} = 0.28 (state {t1, t2}); top-1: t2 (0.42)
◦ t5 (0.6): P_{t5,2} = …
◦ t6 (1): P_{t6,2} = 0.324; top-2: t6 (0.324)
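For independent tuples, the rank probabilities in the trace can be computed in one scan: a tuple is at rank j + 1 exactly when it is present and exactly j higher-scoring tuples are present. The sketch keeps the distribution of that count (truncated at k). It assumes tuple independence and descending score order, and the function names are illustrative, not taken from the paper.

```python
# Independent tuples in descending score order (probabilities from the trace).
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def rank_probabilities(tuples, k):
    """Return {tid: [P(t at rank 1), ..., P(t at rank k)]}.

    count[j] is the probability that exactly j of the tuples scanned so far
    are present; counts of k or more never matter, so they are dropped.
    """
    count = [1.0] + [0.0] * (k - 1)
    ranks = {}
    for tid, p in tuples:
        ranks[tid] = [p * count[j] for j in range(k)]
        # Fold this tuple into the count distribution (absent or present).
        count = [count[j] * (1 - p) + (count[j - 1] * p if j > 0 else 0.0)
                 for j in range(k)]
    return ranks

def u_kranks(tuples, k):
    """Most probable tuple per rank; exact ties go to the earlier tuple."""
    ranks = rank_probabilities(tuples, k)
    return [max(ranks, key=lambda tid: ranks[tid][i]) for i in range(k)]
```

This reproduces P_{t1,1} = 0.4, P_{t2,1} = 0.42, P_{t2,2} = 0.28 and P_{t6,2} = 0.324 in O(nk) time instead of enumerating worlds.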

• M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD.
◦ PT-k query
▪ Return the set of all tuples whose top-k probability is at least p
▪ E.g., {T1, T2, T5} as PT-2 (with p = 0.4)
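A tuple's top-k probability is the probability that it is present and fewer than k higher-scoring tuples are present. The sketch below evaluates PT-k under that definition, assuming independent tuples in descending score order and reusing the trace probabilities; it is a direct computation, not the pruning algorithms of Hua et al., and `pt_k` is a hypothetical name.

```python
# Independent tuples in descending score order (probabilities from the trace).
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def pt_k(tuples, k, threshold):
    """PT-k: (tid, top-k probability) for every tuple reaching `threshold`."""
    count = [1.0] + [0.0] * (k - 1)  # count[j] = Pr(exactly j earlier tuples present)
    answer = []
    for tid, p in tuples:
        topk_prob = p * sum(count)   # present and fewer than k tuples above it
        if topk_prob >= threshold:
            answer.append((tid, topk_prob))
        count = [count[j] * (1 - p) + (count[j - 1] * p if j > 0 else 0.0)
                 for j in range(k)]
    return answer
```

With k = 2 and p = 0.4 the top-2 probabilities are t1: 0.4, t2: 0.7, t5: 0.432 and t6: 0.396, so the answer is {t1, t2, t5}, matching the slide's example.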

• C. Jin, K. Yi, L. Chen, J. Yu, and X. Lin. Sliding-Window Top-k Queries on Uncertain Streams. In VLDB.
◦ Applicable to all of the top-k definitions above
◦ Maintains compact sets
▪ A compact set of a window guarantees that tuples outside it cannot be in the top-k answer of that window
◦ Both time- and space-efficient
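The idea of discarding tuples that can never reach the top-k of any future window can be sketched with a deliberately simple, conservative rule. In a count-based window, a tuple may be dropped once at least k later-arriving tuples with higher scores have probability 1: they expire after it and outrank it in every possible world, so it cannot appear in any top-k vector under U-Topk, U-kRanks or PT-k. This rule and the class name are assumptions for illustration; Jin et al.'s compact sets are more refined.

```python
from collections import deque

class SlidingWindowCandidates:
    """Keep only window tuples that could still be in some top-k answer.

    Hypothetical helper: uses a conservative pruning rule, not the paper's
    compact-set construction.
    """

    def __init__(self, window_size, k):
        self.window_size = window_size
        self.k = k
        self.seq = 0
        self.kept = deque()  # (seq, tid, score, prob), in arrival order

    def insert(self, tid, score, prob):
        self.seq += 1
        # Expire tuples that have slid out of the count-based window.
        while self.kept and self.kept[0][0] <= self.seq - self.window_size:
            self.kept.popleft()
        self.kept.append((self.seq, tid, score, prob))
        items = list(self.kept)
        # Drop tuples with at least k certain, higher-scoring successors.
        self.kept = deque(
            item for i, item in enumerate(items)
            if sum(1 for later in items[i + 1:]
                   if later[3] == 1.0 and later[2] > item[2]) < self.k
        )

    def candidates(self):
        return [tid for _, tid, _, _ in self.kept]
```

With window size 4 and k = 1, inserting a (50, 0.5) and then b (60, 1.0) leaves only b, and a later certain f (80, 1.0) removes every older tuple still in the window.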

• T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD.
◦ Addresses the tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k
◦ Returns a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors
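The score-distribution view can be illustrated by computing the distribution of the total score of the top-k vector over all possible worlds. The sketch reuses the trace probabilities but attaches HYPOTHETICAL scores (100, 90, 60, 50) that do not appear on the slides, and assumes independent tuples. For a single typical answer (c = 1), a weighted median minimizes the expected absolute distance to the random top-k score; this is a simplification, not the paper's algorithm for c typical vectors.

```python
from collections import defaultdict
from itertools import product

# (tid, score, probability) in descending score order. Probabilities come
# from the example traces; the SCORES are hypothetical.
TUPLES = [("t1", 100, 0.4), ("t2", 90, 0.7), ("t5", 60, 0.6), ("t6", 50, 1.0)]

def topk_score_distribution(tuples, k):
    """Pr(total score of the top-k vector = s), over worlds with >= k tuples."""
    dist = defaultdict(float)
    for bits in product([True, False], repeat=len(tuples)):
        prob, scores = 1.0, []
        for (_tid, score, p), here in zip(tuples, bits):
            prob *= p if here else 1 - p
            if here:
                scores.append(score)
        if prob > 0 and len(scores) >= k:
            dist[sum(scores[:k])] += prob
    return dict(dist)

def typical_score(dist):
    """c = 1 case: a weighted median of the score distribution."""
    total = sum(dist.values())
    running = 0.0
    for score in sorted(dist):
        running += dist[score]
        if running >= total / 2:
            return score
    return None  # empty distribution
```

With these inputs the U-Top2 vector (t1, t2) has total score 190 with probability 0.28, while the typical top-2 score is 150, so the most probable vector is not a typical one.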

• Example:
◦ In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Because of the budget, the coach can send only 3 players. We therefore want these 3 players to have a relatively high probability of performing well across all 3 events.

◦ U-Top3 = {T2, T5, T6}
◦ But U-Top2 = {T1, T2} and U-Top1 = {T1}
◦ How about also considering {T1, T2, T5} as a top-3 answer?

TID  Player  Pr.
T1   A
T2   D
T3   B
T4   C
T5   C
T6   B
T7   D
T8   A

Possible world  Tuples           Pr.
PW1             T1, T2, T3, T4
PW2             T1, T2, T3, T5
PW3             T1, T2, T4, T6
PW4             T1, T2, T5, T6
PW5             T1, T3, T4, T7
PW6             T1, T3, T5, T7
PW7             T1, T4, T6, T7
PW8             T1, T5, T6, T7
PW9             T2, T3, T4, T8
PW10            T2, T3, T5, T8
PW11            T2, T4, T6, T8
PW12            T2, T5, T6, T8
PW13            T3, T4, T7, T8
PW14            T3, T5, T7, T8
PW15            T4, T6, T7, T8
PW16            T5, T6, T7, T8
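The possible-world table can be regenerated from the mutual-exclusion structure alone: each player owns two exclusive tuples and every world holds exactly one tuple per player. The sketch assumes that a lower TID means a higher score (consistent with U-Top1 = {T1} and U-Top3 = {T2, T5, T6}, but not stated on the slide) and groups worlds by their top-3 vector. The Pr. column is not recoverable, so no probabilities are used; function names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Exclusive tuples per player, from the TID/Player table.
PLAYERS = {"A": ["T1", "T8"], "B": ["T3", "T6"], "C": ["T4", "T5"], "D": ["T2", "T7"]}

def tid_num(tid):
    return int(tid[1:])

def possible_worlds(players):
    """Every world picks one tuple per player; return them in PW1..PW16 order."""
    worlds = [sorted(choice, key=tid_num) for choice in product(*players.values())]
    return sorted(worlds, key=lambda w: [tid_num(t) for t in w])

def worlds_by_topk(worlds, k):
    """Map each top-k vector to the 1-based world numbers it comes from,
    ASSUMING a lower TID means a higher score."""
    groups = defaultdict(list)
    for number, world in enumerate(worlds, start=1):
        groups[tuple(world[:k])].append(number)
    return dict(groups)
```

Once the Pr. column is known, summing world probabilities inside each group gives each vector's U-Top3 probability. Note that PW1 and PW2 share the top-3 vector (T1, T2, T3), while (T2, T5, T6) comes only from PW12.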

• We choose the answers of a top-k query based not only on the probability (P) but also on the confidence (C).
◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of a possible top-k answer
▪ E.g., for k = 3, {T1, T2, T3} is a possible top-3 answer with P = …; its C is composed in some way of:
Pr({T1, T2} is the top-2) = … and its confidence,
Pr({T1, T3} is the top-2) = … and its confidence,
Pr({T2, T3} is the top-2) = … and its confidence
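The confidence function is still to be formulated, so the sketch below only computes its ingredients for a candidate answer: P, the probability that the candidate is exactly the top-k set of a possible world, and the top-(k-1) probability of each of its (k-1)-subsets. It assumes independent tuples in descending score order and reuses the trace probabilities; how these numbers combine into C is left open, as on the slide, and the function names are hypothetical.

```python
from itertools import combinations, product

# Independent tuples in descending score order (probabilities from the trace).
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]

def topk_set_probability(tuples, target):
    """Pr(the top-|target| tuples of a possible world are exactly `target`)."""
    k = len(target)
    total = 0.0
    for bits in product([True, False], repeat=len(tuples)):
        prob, present = 1.0, []
        for (tid, p), here in zip(tuples, bits):
            prob *= p if here else 1 - p
            if here:
                present.append(tid)
        if len(present) >= k and set(present[:k]) == set(target):
            total += prob
    return total

def confidence_inputs(tuples, candidate):
    """P of a candidate top-k set and the top-(k-1) probabilities of its
    (k-1)-subsets; combining them into C is not specified here."""
    p = topk_set_probability(tuples, candidate)
    subsets = {sub: topk_set_probability(tuples, sub)
               for sub in combinations(sorted(candidate), len(candidate) - 1)}
    return p, subsets
```

For the candidate {t1, t2, t5}, P = 0.168 and the subsets {t1, t2}, {t1, t5} and {t2, t5} are the top-2 with probabilities 0.28, 0.072 and 0.252.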

• Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated ones as the result set.
◦ E.g., {T1, T3, T5}: P = 0.8, C = 0.4
{T1, T4, T7}: P = 0.5, C = 0.7
{T2, T6, T7}: P = 0.3, C = 0.2 (dominated by both answers above, so it is not returned)
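Returning the non-dominated answers is a two-dimensional skyline over (P, C). A minimal quadratic sketch using the slide's example values; it assumes larger P and larger C are both better, and `non_dominated` is a hypothetical name:

```python
def non_dominated(candidates):
    """Keep candidates that no other candidate dominates in (P, C).

    y dominates x if y is at least as good in both P and C and strictly
    better in at least one of them.
    """
    result = []
    for name, p, c in candidates:
        dominated = any(
            p2 >= p and c2 >= c and (p2 > p or c2 > c)
            for _, p2, c2 in candidates
        )
        if not dominated:
            result.append(name)
    return result

cands = [(("T1", "T3", "T5"), 0.8, 0.4),
         (("T1", "T4", "T7"), 0.5, 0.7),
         (("T2", "T6", "T7"), 0.3, 0.2)]
print(non_dominated(cands))  # [('T1', 'T3', 'T5'), ('T1', 'T4', 'T7')]
```

For many candidates, sorting by P and sweeping while tracking the best C seen so far gives the same result in O(n log n).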

• Formulate the confidence function
• Find an algorithm to generate the result set
• Try to calculate the confidence in an efficient way
• Carry out an empirical study on datasets