Continuous Data Stream Processing

Presentation transcript:

Continuous Data Stream Processing
- Music Virtual Channel: extensions
- Data Stream Monitoring: tree pattern mining
- Continuous Query Processing: sequence queries
Date: 2005/10/21
Post-Excellence Project Subproject 6

Continuous Data Stream Management
Music Virtual Channel: Extensions
[Figure: system architecture. Components: Music collections, Internet, Music metadata, Clustering engine, Filtering engine, XML Filtering engine, Music channel simulator, V.C. player, Interface, Profile monitor, Cluster monitor, Channel monitor, Favorite channel, Cluster coordinator, Peer search engine, Profile database, MusicXML database; numbered elements 1, 2, …, N]

An Extension on Virtual Channel
- After a player starts a range (or kNN) search:
  - It updates its profile periodically
  - The search results are continuously maintained
[Figure: range and kNN searches; labels: V.C. player (query), V.C. player (peer)]
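
The continuous search described on this slide can be sketched in Python as follows. This is a brute-force illustration only: the class name `PeerSearch`, its methods, and the use of Euclidean distance over profile vectors are assumptions made for the example, not details given on the slides, and a real peer search engine would presumably use an index instead of a full scan.

```python
import heapq
import math


class PeerSearch:
    """Hypothetical brute-force peer search over player profiles."""

    def __init__(self):
        self.profiles = {}  # player id -> profile vector (tuple of floats)

    def update(self, player, vector):
        # A player periodically reports its latest profile.
        self.profiles[player] = tuple(vector)

    def knn(self, player, k):
        # Re-evaluate the k nearest peers against the current profiles.
        query = self.profiles[player]
        candidates = (
            (math.dist(query, vec), peer)
            for peer, vec in self.profiles.items()
            if peer != player
        )
        return [peer for _, peer in heapq.nsmallest(k, candidates)]

    def in_range(self, player, radius):
        # Re-evaluate the range search: all peers within `radius`.
        query = self.profiles[player]
        return sorted(
            peer
            for peer, vec in self.profiles.items()
            if peer != player and math.dist(query, vec) <= radius
        )
```

Calling `knn` or `in_range` again after each `update` keeps the result current; the point of the indexing work listed under the research issues is to avoid this full re-evaluation.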

An Extension on Virtual Channel
- Compared with the clustering engine
  - A flexible definition of “clusters”
  - Update is more natural than insertion/deletion
  - No need for parameter setting and re-clustering
  - Indexing can relieve the pain of frequent updates
- Compared with the problem of moving objects
  - Movements in a high-dimensional feature space
  - In most cases every object is also a query
  - Prediction of object movement is possible

An Extension on Favorite Channel
- When a music piece is played on a channel:
  - The corresponding MusicXML file can be obtained
  - A query can be a portion of MusicXML or an XQuery

An Extension on Favorite Channel
- Compared with query segments
  - More musical semantics in a query
  - Does not interfere with the music playback
  - Matching on complex tree structures
- A common subquery is still useful

Research Issues
- Peer Search Engine
  - An indexing method to support continuous query processing for high-dimensional moving objects
  - A prediction-based bounding mechanism to reduce the frequency of profile updates
- XML Filtering Engine
  - An online method to enable tree pattern mining over a data stream
  - An indexing mechanism to support XML filtering

Discovering Frequent Tree Patterns over Data Streams
(Submitted for publication)

Problem Definition
- As the query trees stream in, find the subtrees that occur more than θ·N times, where N is the number of trees received so far and 0 ≤ θ ≤ 1
[Figure labels: T1, T2, T3, STMer, Frequent Tree Patterns]

Problem Definition (Cont.)
- Labeled ordered tree
- Induced subtree
[Figure: a tree pattern and a query tree; caption "B DC differs from B CD"; node labels A, B, C, D, E]

An Example
- Given θ = 0.6
- Frequent tree patterns (occurrence > 0.6*1)
- Frequent tree patterns (occurrence > 0.6*2)
- Frequent tree patterns (occurrence > 0.6*3)
[Figure: STMer with the incoming query trees and the frequent pattern trees at each step; node labels A, B, C, D, E, F]
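
The threshold test from the problem definition (occurrence > θ·N) can be written as a short Python sketch. The function name `frequent_patterns` and the pattern counts in the usage note are hypothetical and are not taken from the figure.

```python
def frequent_patterns(counts, n_trees, theta):
    """Return the patterns whose occurrence count exceeds theta * n_trees.

    counts  : dict mapping a pattern (any hashable encoding) to its count
    n_trees : N, the number of query trees received so far
    theta   : minimum support, 0 <= theta <= 1
    """
    if not 0 <= theta <= 1:
        raise ValueError("theta must lie in [0, 1]")
    threshold = theta * n_trees
    # Strict inequality, matching "more than theta * N times".
    return {pattern for pattern, count in counts.items() if count > threshold}
```

For instance, with θ = 0.6 and N = 3 the threshold is 1.8, so a pattern seen twice qualifies while a pattern seen once does not. Because N grows with every arriving tree, a pattern that was frequent earlier can drop out later.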

Main Difficulties
- The properties of data streams:
  - One pass: traditional tree mining methods fail
  - Fast input rate: efficiency is critical
  - Incremental: an incremental algorithm is required
  - Unbounded: approximate counting is needed
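
To illustrate what approximate counting over an unbounded stream can look like, here is a minimal Python sketch of Lossy Counting (Manku and Motwani), a well-known scheme for this setting. The slides do not say which counting scheme the method uses, so treating it as Lossy Counting is an assumption; the class name `LossyCounter` is hypothetical, and the sketch counts plain items where the method would count subtree patterns.

```python
import math


class LossyCounter:
    """Lossy Counting over a stream of hashable items."""

    def __init__(self, epsilon):
        self.epsilon = epsilon               # error parameter
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max_error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been dropped up to (bucket - 1) times before.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            self.entries = {
                key: val for key, val in self.entries.items()
                if val[0] + val[1] > bucket
            }

    def frequent(self, theta):
        # Report items whose stored count is at least (theta - epsilon) * n.
        bound = (theta - self.epsilon) * self.n
        return {key for key, (count, _) in self.entries.items() if count >= bound}
```

Stored counts underestimate true counts by at most epsilon * n, so memory stays bounded at the cost of possibly reporting some items whose true frequency lies between (theta - epsilon) * n and theta * n.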

An Overview of Our Method
- Subtree generation
- Subtree maintenance
[Figure: STMer; labels: T1, "A candidate pool", "Requests on demand"]

String Representation
- DFS order on T
- (label, level) node sequence S
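
A minimal Python sketch of this representation: a depth-first traversal of a labeled ordered tree T that emits one (label, level) pair per node. The nested-tuple tree format and the function names `encode` and `to_string` are assumptions made for the example; the compact string form follows the "A1B2" style used in the later slides.

```python
def encode(tree, level=1):
    """Return the (label, level) sequence S of a tree in DFS (preorder) order.

    tree is a pair (label, children), where children is an ordered list
    of trees in the same format; the root sits at level 1.
    """
    label, children = tree
    sequence = [(label, level)]
    for child in children:  # sibling order is preserved
        sequence.extend(encode(child, level + 1))
    return sequence


def to_string(sequence):
    """Concatenate the pairs into a compact string such as 'A1B2C2'."""
    return "".join(f"{label}{level}" for label, level in sequence)
```

Since the tree is ordered, swapping two siblings changes the sequence, which is why two patterns that differ only in child order get different encodings.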

Subtree Generation
[Figure: data stream, buffer, and T_D as nodes arrive. t1 = (A,1): buffer A1; generated subtree A1. t2 = (B,2): buffer A1B2; generated subtrees B1, A1B2.]

Subtree Generation (Cont.)
[Figure: data stream, buffer, and T_D, continued. After A1 at t1 and B1, A1B2 at t2, node t3 = (C,2) arrives: buffer A1B2C2; generated subtrees C1, A1C2, A1B2C2.]
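
One reading of the example on these slides is that each arriving node produces every induced subtree that contains it, with levels renumbered from the subtree's root. The Python sketch below implements that reading by plain enumeration. It is an illustration under that assumption, not the authors' generation technique, and the function name `subtrees_with_last_node` is hypothetical.

```python
def subtrees_with_last_node(seq):
    """Encode all induced subtrees that contain the last node of seq.

    seq is the (label, level) DFS sequence received so far (the buffer).
    """
    children = [[] for _ in seq]
    path = []  # indices on the path from the root to the current node
    for i, (_, level) in enumerate(seq):
        del path[level - 1:]  # drop nodes at the same or a deeper level
        if path:
            children[path[-1]].append(i)
        path.append(i)
    must = set(path)  # nodes linking a candidate root to the last node

    def rooted(node):
        # All connected node sets rooted at `node`; children on the
        # required path are always included, other children are optional.
        results = [frozenset([node])]
        for child in children[node]:
            child_sets = rooted(child)
            extended = []
            for base in results:
                if child not in must:
                    extended.append(base)  # option: leave this child out
                extended.extend(base | s for s in child_sets)
            results = extended
        return results

    encodings = []
    for root in reversed(path):  # the new node itself first, then ancestors
        base_level = seq[root][1]
        for nodes in rooted(root):
            encodings.append("".join(
                f"{seq[i][0]}{seq[i][1] - base_level + 1}"
                for i in sorted(nodes)))
    return encodings
```

On the stream (A,1), (B,2), (C,2) this yields A1, then B1 and A1B2, then C1, A1C2 and A1B2C2, matching the subtrees listed in the figures. The enumeration is exponential in the worst case, which is presumably what the prefix-tree structure on the following slides is meant to mitigate.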

Subtree Generation (Cont.)
[Figure: APT with root Φ and nodes A1, B1, B2, C1, C2, D1, D2, D3, E1, E2, E3, E4, F2; buffer A1B2; T_D with nodes A, B, C, D, E; path C2D3E4]

Subtree Maintenance
[Figure: buffer A1B2E2; APT (root Φ) with nodes A1, B1, E1, B2, E2; GPT (root Φ) with entries (A1, 5, 0), (B2, 4, 1), (C3, 2, 1), (E2, 1, 3); "+1" count updates; #query trees received = 321]

Experiments on Sensitivity
[Figures: sensitivity to the minimum support and to the error parameter]

Experiments on Comparison
- StreamT (ICDM ’02)

Conclusion
- Contribution
  - A novel technique is proposed for efficient subtree generation
  - A compact structure is employed to reduce the memory requirement of the candidate pool
- Current work
  - Mining closed frequent subtrees over data streams
[Figure: pattern trees with counts: A BC: 2, A B: 5, A C: 2, A: 5]