D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun.

Slides:

Advertisements

Similar presentations

Online Mining of Frequent Query Trees over XML Data Streams Hua-Fu Li*, Man-Kwan Shan and Suh-Yin Lee Department of Computer Science.

Advertisements

Pattern Matching against Distributed Datasets within DAME Andy Pasley University of York.

Indexing DNA Sequences Using q-Grams

The A-tree: An Index Structure for High-dimensional Spaces Using Relative Approximation Yasushi Sakurai (NTT Cyber Space Laboratories) Masatoshi Yoshikawa.

指導教授：陳良弼老師報告者：鄧雅文  Introduction  Related Work  Problem Formulation  Future Work.

Chapter 5: Introduction to Information Retrieval

Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.

Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.

Di Yang, Elke A. Rundensteiner and Matthew O. Ward Worcester Polytechnic Institute VLDB 2009, Lyon, France 1 A Shared Execution Strategy for Multiple Pattern.

TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets Chun Chen 1, Feng Li 2, Beng Chin Ooi 2, and Sai Wu 2 1 Zhejiang University, 2 National.

Maintaining Sliding Widow Skylines on Data Streams.

School of Computer Science and Engineering Finding Top k Most Influential Spatial Facilities over Uncertain Objects Liming Zhan Ying Zhang Wenjie Zhang.

ABSTRACT We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result.

Ming Hua, Jian Pei Simon Fraser UniversityPresented By: Mahashweta Das Wenjie Zhang, Xuemin LinUniversity of Texas at Arlington The University of New South.

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.

Association Rule Mining Part 2 (under construction!) Introduction to Data Mining with Case Studies Author: G. K. Gupta Prentice Hall India, 2006.

Themis Palpanas1 VLDB - Aug 2004 Fair Use Agreement This agreement covers the use of all slides on this CD-Rom, please read carefully. You may freely use.

T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) IR Queries.

Probabilistic Similarity Search for Uncertain Time Series Presented by CAO Chen 21 st Feb, 2011.

What ’ s Hot and What ’ s Not: Tracking Most Frequent Items Dynamically G. Cormode and S. Muthukrishman Rutgers University ACM Principles of Database Systems.

1 Continuous k-dominant Skyline Query Processing Presented by Prasad Sriram Nilu Thakur.

A survey on stream data mining

An Efficient IP Lookup Architecture with Fast Update Using Single-Match TCAMs Author: Jinsoo Kim, Junghwan Kim Publisher: WWIC 2008 Presenter: Chen-Yu.

Probabilistic Skyline Operator over sliding Windows Wan Qian HKUST DB Group.

Indexing Spatio-Temporal Data Warehouses Dimitris Papadias, Yufei Tao, Panos Kalnis, Jun Zhang Department of Computer Science Hong Kong University of Science.

Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.

Energy-efficient Self-adapting Online Linear Forecasting for Wireless Sensor Network Applications Jai-Jin Lim and Kang G. Shin Real-Time Computing Laboratory,

Overview of Distributed Data Mining Xiaoling Wang March 11, 2003.

Detecting Distance-Based Outliers in Streams of Data Fabrizio Angiulli and Fabio Fassetti DEIS, Universit `a della Calabria CIKM 07.

International Conference on Computer Convergence Technology 2011 Tie Hua Zhou Database/Bioinformatics Laboratory Chungbuk National.

«Tag-based Social Interest Discovery» Proceedings of the 17th International World Wide Web Conference (WWW2008) Xin Li, Lei Guo, Yihong Zhao Yahoo! Inc.,

Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:

Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach Wenjie Zhang, Xuemin Lin The University of New South Wales & NICTA Ming Hua,

1 ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006 Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6 1.

HPDC 2014 Supporting Correlation Analysis on Scientific Datasets in Parallel and Distributed Settings Yu Su*, Gagan Agrawal*, Jonathan Woodring # Ayan.

Towards Robust Indexing for Ranked Queries Dong Xin, Chen Chen, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign VLDB.

A Query Adaptive Data Structure for Efficient Indexing of Time Series Databases Presented by Stavros Papadopoulos.

Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.

Shape-based Similarity Query for Trajectory of Mobile Object NTT Communication Science Laboratories, NTT Corporation, JAPAN. Yutaka Yanagisawa Jun-ichi.

Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.

Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm Chen, Yi-wen( 陳憶文 ) Graduate Institute of Computer Science ＆ Information Engineering.

Distributed Spatio-Temporal Similarity Search Demetrios Zeinalipour-Yazti University of Cyprus Song Lin

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.

Zhuo Peng, Chaokun Wang, Lu Han, Jingchao Hao and Yiyuan Ba Proceedings of the Third International Conference on Emerging Databases, Incheon, Korea (August.

2005/12/021 Content-Based Image Retrieval Using Grey Relational Analysis Dept. of Computer Engineering Tatung University Presenter: Tienwei Tsai ( 蔡殿偉.

Exact indexing of Dynamic Time Warping

1 Elke. A. Rundensteiner Worcester Polytechnic Institute Elisa Bertino Purdue University 1 Rimma V. Nehme Microsoft.

August 30, 2004STDBM 2004 at Toronto Extracting Mobility Statistics from Indexed Spatio-Temporal Datasets Yoshiharu Ishikawa Yuichi Tsukamoto Hiroyuki.

Stream Monitoring under the Time Warping Distance Yasushi Sakurai (NTT Cyber Space Labs) Christos Faloutsos (Carnegie Mellon Univ.) Masashi Yamamuro (NTT.

Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy Nurjahan BegumLiudmila Ulanova Jun Wang 1 Eamonn Keogh University.

Data and Knowledge Engineering Laboratory Clustered Segment Indexing for Pattern Searching on the Secondary Structure of Protein Sequences Minkoo Seo Sanghyun.

Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.

Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.

Predictive Modeling in Data Management Byung S. Lee Computer Science University of Vermont

BLFS: Supporting Fast Editing/Writing for Large- Sized Multimedia Files Seung Wan Jung 1, Seok Young Ko 2, Young Jin Nam 3, Dae-Wha Seo 1, 1 Kyungpook.

Subgraph Search Over Uncertain Graphs Erşan Demircioğlu.

Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window MADALGO – Center for Massive Data Algorithmics, a Center of the Danish.

ICICLES: Self-tuning Samples for Approximate Query Answering By Venkatesh Ganti, Mong Li Lee, and Raghu Ramakrishnan Shruti P. Gopinath CSE 6339.

Tian Xia and Donghui Zhang Northeastern University

The Stream Model Sliding Windows Counting 1’s

Byung Joon Park, Sung Hee Kim

Do-Gil Lee1*, Ilhwan Kim1 and Seok Kee Lee2

Query in Streaming Environment

Aziz Nasridinov and Young-Ho Park*

DISTRIBUTED CLUSTERING OF UBIQUITOUS DATA STREAMS

Objective of This Course

Xu Zhou Kenli Li Yantao Zhou Keqin Li

Survey on Coverage Problems in Wireless Sensor Networks - 2

MEET-IP Memory and Energy Efficient TCAM-based IP Lookup

Presentation transcript:

D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun Ho Ryu 1 * 1 Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea {smile2867, thzhou, 2 Department of Biomedical Engineering, Chungbuk National University, Chungbuk, Korea {kimka, Abstract. There has been a concerted effort in recent years to build data stream management systems for a specific streaming application. Requirement for the lowest space usage and fast response, the traditional skyline is not suit for streaming data process. Two approaches are proposed to solve this problem, namely D-skyline and T-skyline. These two methods are more excellent to adapt to this kind of data characters that are huge, vary, distributed, and coming in a high-speed rate. Focus on similarity search in streaming environment; our proposed methods give almost " real " results in an approximate way. Keywords: skyline, similarity search, stream processing. 1 Introduction In many stream applications [1, 2, 3], similarity search is more practical than exact match in stream processing, where both query and data are always changed over time. The length of multi-streams could be very large, since new values are continuously appended. Therefore, the similarity of multi-streams is expressed by means of the last values of each stream, using a sliding window approach. The naïve approach is to delete the old items by using timestamp techniques, to re-apply the data reduction mining technique on the new items, and finally store the resulting summary only in the access memory to do the further final approximated result analysis. This process is very efficiently both in CPU time and numeric items calculated by one-pass processing. Since multi-streams are usually too large to be stored in main memory, skyline algorithms for similarity search are used in the sense that emerged as an important summarization technique happens in the main memory. Several algorithms [4, 5] have been proposed targeting the efficient skyline evaluation on large datasets. These solutions always classified into two categories, depending on whether they assume an index. Intuitively, the index-based schemes are faster than index-independent strategies, since they avoid accessing the entire data collection, yet their applicability is significantly limited by the indexing requirement. They may not to be * Corresponding author

indexed which the data are dynamically produced in many streaming applications (such as moving sensors, predicted analysis). Therefore, the traditional techniques may not suit for streams exactly. Our proposed D-skyline and T-skyline is more excellent to adapt to this kind of data characters. 2 Skyline Operator and Future Work D-skyline and T-skyline are focus on requirement for the lowest space usage and fast response to the users in a high accuracy guarantee results. D-skyline algorithm organizes already computed multi-streams into sub-windows such that candidate similar search tuples could quickly prune when they are dominated by some other streams. For a candidate query, only their distance under the threshold could be calculated and stored into sub-windows. In a defined time series, only the frequent times for each stream need to be shown in each sub-window, then the most appeared streams as an approximated result of similar search queries for the whole sliding window domain. Other unsigned items would be removed in order to release space for continuously coming new streams. T-skyline is similar to D-skyline except one pass on the sub-windows, which need top-k items as a list only. One thing is that T-skyline may use the lowest space usage than others and give a very exactly answers to satisfy a more fast response. We demonstrate that both our methods are efficiency as data reducing mining techniques for similarity search over multi-streams distributed processing. The important thing is that they give more exactly approximate results as lower as possible on space usage. In the future, we will show a more detailed discussion on the difference between these two methods after evaluation experiments. Acknowledgments. This work was supported by the Korea Institute of Energy Research (KIER) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No ). References 1.Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Tagging Stream Data for Rich Real- Time Services. Journal of VLDB Endowment, Vol. 2, Issue: 1, pp (2009) 2.Vu, T. H. N., Park, N.K., Lee, Y.K., Lee, Y.M., Ryu, K.H.: Online discovery of Heart Rate Variability patterns in mobile healthcare services. Journal of Systems and Software. Vol. 83 Issue: 10, pp (2010) 3.Lee, Y.K., Shin, J.P., Kim, K.D., Ryu, K.H.: An adaptive data storage and historical query processing for storage-centric sensor network. Journal of Innovative Computing, Information and Control, Vol.7, No.5, pp (2011) 4.Zhang, S., Mamoulis, N., Cheung, D.W.: Scalable Skyline Computation Using Object- based Space Partitioning. In: 35th SIGMOD international conference on Management of data, pp ACM Press, Providence (2009) 5.Zhang, S., Mamoulis, N., Kao, B., Cheung, D.W.L.: Efficient Skyline Evaluation over Partially Ordered Domains. Journal of VLDB Endowment, Vol. 3, Issue: 1, pp (2010)