Query in Streaming Environment

D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment

Ling Wang1, Tie Hua Zhou1, Kyung Ah Kim2, Eun Jong Cha2, and Keun Ho Ryu1*

1 Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea
{smile2867, thzhou, khryu}@dblab.chungbuk.ac.kr
2 Department of Biomedical Engineering, Chungbuk National University, Chungbuk, Korea
{kimka, ejcha}@chungbuk.ac.kr

Abstract. There has been a concerted effort in recent years to build data stream management systems for specific streaming applications. Because stream processing demands minimal space usage and fast response, the traditional skyline operator is not well suited to it. We propose two approaches to this problem, D-skyline and T-skyline. Both adapt well to data that are large, varied, distributed, and arriving at high speed. Focusing on similarity search in a streaming environment, the proposed methods return approximate results that are close to the exact ones.

Keywords: skyline, similarity search, stream processing.

1 Introduction

In many stream applications [1, 2, 3], similarity search is more practical than exact matching, because both the queries and the data change over time. A stream can grow very long, since new values are continuously appended. The similarity of multiple streams is therefore expressed in terms of the most recent values of each stream, using a sliding window. The naïve approach is to delete old items using timestamps, re-apply the data reduction technique to the new items, and keep only the resulting summary in memory for the final approximate analysis. Because it makes a single pass over the data, this process is efficient in both CPU time and the number of items computed.
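The timestamp-based sliding window described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `SlidingWindow`, the `span` parameter, and the choice of count/min/max/mean as the "summary" are all assumptions, since the paper does not specify which data reduction technique is applied.

```python
from collections import deque


class SlidingWindow:
    """Keep only items whose timestamp lies within the last `span` time units."""

    def __init__(self, span):
        self.span = span          # window length (hypothetical parameter)
        self.items = deque()      # (timestamp, value) pairs, oldest first

    def append(self, timestamp, value):
        """Add a new item, then evict items that have fallen out of the window."""
        self.items.append((timestamp, value))
        cutoff = timestamp - self.span
        while self.items and self.items[0][0] <= cutoff:
            self.items.popleft()

    def summary(self):
        """Recompute a small summary of the current window contents.

        The statistics used here are a stand-in for the unspecified
        data reduction technique.
        """
        values = [v for _, v in self.items]
        if not values:
            return None
        return {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }


window = SlidingWindow(span=3)
for t, v in [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0), (5, 50.0)]:
    window.append(t, v)
print(window.summary())
```

Each item is appended and evicted at most once, so the window is maintained in a single pass over the stream; only the summary needs to be passed on to later analysis.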
Since multi-streams are usually too large to be stored in main memory, skyline algorithms have emerged as an important summarization technique for similarity search that runs in main memory. Several algorithms [4, 5] have been proposed for efficient skyline evaluation on large datasets. These solutions fall into two categories, depending on whether they assume an index. Intuitively, index-based schemes are faster than index-independent strategies because they avoid accessing the entire data collection, yet their applicability is significantly limited by the indexing requirement. In many streaming applications (such as moving sensors and predictive analysis), the data are produced dynamically and may not be indexed. The traditional techniques may therefore not suit streams well. Our proposed D-skyline and T-skyline methods adapt better to data with these characteristics.

2 Skyline Operator and Future Work

D-skyline and T-skyline aim at minimal space usage and fast response to users while keeping result accuracy high. The D-skyline algorithm organizes already computed multi-streams into sub-windows so that candidate tuples for the similarity search can be pruned quickly when they are dominated by other streams. For a candidate query, only the distances that fall under the threshold are calculated and stored in the sub-windows. Within a defined time series, each sub-window records only how often each stream appears; the streams that appear most often are then returned as the approximate result of the similarity search query over the whole sliding window. The remaining items are removed to release space for newly arriving streams.

T-skyline is similar to D-skyline, except that it makes a single pass over the sub-windows and keeps only the top-k items as a list. As a result, T-skyline may use less space than the other methods while still giving accurate answers with a faster response.

We demonstrate that both methods are efficient data reduction techniques for similarity search in distributed processing of multi-streams. Most importantly, they give close approximate results while using as little space as possible. In future work, we will discuss the differences between the two methods in more detail after evaluation experiments.

Acknowledgments. This work was supported by the Korea Institute of Energy Research (KIER) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2012-0000478).

* Corresponding author.
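The sub-window counting idea in Section 2 can be illustrated with a small sketch. It is written under several assumptions and is not the authors' algorithm: the function names, the use of Euclidean distance, and the data layout (one dictionary of stream id to recent values per sub-window) are all hypothetical, and dominance-based pruning is omitted because the text does not define the dominance relation. Passing `top_k` keeps only the k closest streams per sub-window, in the spirit of the T-skyline description.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length value sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_streams(sub_windows, query, threshold, top_k=None):
    """Count how often each stream is close to the query across sub-windows.

    sub_windows: list of dicts mapping stream id -> list of values.
    Returns (stream id, appearance count) pairs, most frequent first.
    """
    appearances = Counter()
    for window in sub_windows:
        close = []
        for stream_id, values in window.items():
            d = euclidean(values, query)
            if d < threshold:            # keep only distances under the threshold
                close.append((d, stream_id))
        if top_k is not None:            # assumed T-skyline-style variant
            close = sorted(close)[:top_k]
        appearances.update(stream_id for _, stream_id in close)
    return appearances.most_common()


query = [1.0, 1.0]
sub_windows = [
    {"s1": [1.0, 1.0], "s2": [1.0, 2.0], "s3": [5.0, 5.0]},
    {"s1": [1.0, 1.5], "s2": [4.0, 4.0], "s3": [5.0, 5.0]},
    {"s1": [1.2, 1.0], "s2": [1.0, 1.8], "s3": [1.0, 1.1]},
]
print(similar_streams(sub_windows, query, threshold=1.5))
print(similar_streams(sub_windows, query, threshold=1.5, top_k=1))
```

Only the per-stream counts survive each sub-window, so the raw values can be discarded as soon as a sub-window has been processed, which is where the space saving would come from.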
References

1. Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Tagging Stream Data for Rich Real-Time Services. Journal of VLDB Endowment, Vol. 2, Issue 1, pp. 73-84 (2009)
2. Vu, T.H.N., Park, N.K., Lee, Y.K., Lee, Y.M., Ryu, K.H.: Online discovery of Heart Rate Variability patterns in mobile healthcare services. Journal of Systems and Software, Vol. 83, Issue 10, pp. 1930-1940 (2010)
3. Lee, Y.K., Shin, J.P., Kim, K.D., Ryu, K.H.: An adaptive data storage and historical query processing for storage-centric sensor network. Journal of Innovative Computing, Information and Control, Vol. 7, No. 5, pp. 2945-2959 (2011)
4. Zhang, S., Mamoulis, N., Cheung, D.W.: Scalable Skyline Computation Using Object-based Space Partitioning. In: 35th SIGMOD international conference on Management of data, pp. 483-494. ACM Press, Providence (2009)
5. Zhang, S., Mamoulis, N., Kao, B., Cheung, D.W.L.: Efficient Skyline Evaluation over Partially Ordered Domains. Journal of VLDB Endowment, Vol. 3, Issue 1, pp. 1255-1266 (2010)