Mining Dynamics of Data Streams in Multi-Dimensional Space
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
November 9, 2018
Challenges of Stream Data Mining
- Mining query mode: continuous, ad hoc, or progressive?
- Mining mode: batched vs. interactive vs. lazy mining?
- Time constraints: real-time?
- What patterns are to be mined?
  - Finding patterns, anomalies, differences, … in multiple streams
  - Mining dynamics (changes, trends, and evolutions) of data streams
- Multi-level/multi-dimensional processing and data mining
  - Most stream data are at a fairly low level or multi-dimensional in nature
Why Mine Dynamics of Data Streams in Multi-Dimensional Space?
- Dynamics (changes, trends, and evolutions) of data streams
  - Perhaps the most interesting thing in streams
  - Looking only at the current data is not enough. Save something!
- Multi-dimensional stream mining
  - Most real stream data are low-level or multi-dimensional in nature
  - How to examine streams dynamically across multiple dimensions?
  - Finding dynamics: patterns and outliers in a certain dimensional space
Stream Data Mining Tasks
- Multi-dimensional (on-line) analysis of streams
- Clustering data streams
- Classification of data streams
- Mining frequent patterns in data streams
- Mining sequential patterns in data streams
- Mining partial periodicity in data streams
- Mining notable gradients in data streams
- Mining outliers and unusual patterns in data streams
- … more?
Example 1: Multi-Dimensional (OLAP) Analysis
- Analysis of Web click streams
  - Raw data at low levels: seconds, web page addresses, user IP addresses, …
  - Analysts want: changes, trends, unusual patterns, at reasonable levels of detail
  - E.g., "Average clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours."
- Analysis of power consumption streams
  - Raw data: power consumption flow for every household, every minute
  - Patterns one may find: average hourly power consumption for manufacturing companies in Chicago in the last 2 hours today is 30% higher than on the same day a week ago
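The click-stream statement above compares a short recent window with a long one at a rolled-up level (region, category). The following is a minimal sketch of how such a comparison could be computed, not a method given in the slides. The class name `ClickStreamCube`, its methods, the one-minute bucket size, and the assumption that clicks arrive in non-decreasing time order are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque


class ClickStreamCube:
    """Per-minute click counts rolled up to (region, category).

    Hypothetical helper: the names and the one-minute bucket size are
    assumptions for illustration. Clicks are assumed to arrive in
    non-decreasing timestamp order.
    """

    def __init__(self, long_window_min=24 * 60):
        self.long_window_min = long_window_min
        # (region, category) -> deque of (minute, count), oldest first
        self.buckets = defaultdict(deque)

    def add_click(self, ts_seconds, region, category):
        minute = ts_seconds // 60
        q = self.buckets[(region, category)]
        if q and q[-1][0] == minute:
            # Same minute as the newest bucket: increment its count.
            q[-1] = (minute, q[-1][1] + 1)
        else:
            q.append((minute, 1))
        # Drop buckets that have fallen out of the long window.
        while q and q[0][0] <= minute - self.long_window_min:
            q.popleft()

    def avg_per_minute(self, now_seconds, region, category, window_min):
        now_min = now_seconds // 60
        q = self.buckets.get((region, category), ())
        total = sum(c for m, c in q if now_min - window_min < m <= now_min)
        return total / window_min

    def short_vs_long(self, now_seconds, region, category, short_min=15):
        """Ratio of the short-window rate to the long-window rate.

        A value of 1.4 would correspond to "40% higher". Returns None
        when the long window holds no clicks.
        """
        short = self.avg_per_minute(now_seconds, region, category, short_min)
        long_ = self.avg_per_minute(
            now_seconds, region, category, self.long_window_min
        )
        return short / long_ if long_ else None
```

Only rolled-up counts are kept, so memory is bounded by the number of (region, category) cells times the number of minutes in the long window, rather than by the number of raw clicks.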
Example 2: Multi-Dimensional Classification
- Dynamic model update for loan or investment decisions
  - Huge incoming flow of changing information in a multi-dimensional space (many factors)
  - E.g., Should we invest in this company based on the current market situation?
- Classification in a dynamic (volatile) stock market
  - Classification of stocks based on their current streams
  - E.g., Is Lucent going to go up in the next little while?
Example 3: High-Dimensional Clustering
- Network intrusion detection
  - Huge incoming flow of network traffic information, with multi-dimensional features in nature
  - Find bursts of activity/traffic in real time
  - On-line clustering to detect abrupt changes
- What are the changes in e-mail or text information?
  - Clustering based on frequent terms
  - Can we perform such clustering in real time?
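One simple way to illustrate on-line clustering for abrupt-change detection is a single-pass, threshold-based ("leader"-style) clusterer: each point joins the nearest centroid if it lies within a fixed radius, and otherwise starts a new cluster, which can be read as a signal of change. This is a substituted textbook technique used here as a sketch, not the method of the slides. The class name `OnlineClusterer` and the `radius` parameter are assumptions.

```python
import math


class OnlineClusterer:
    """Single-pass, radius-based clustering (hypothetical sketch).

    Each point is seen once. A point farther than `radius` from every
    existing centroid opens a new cluster, which a caller may treat as
    an abrupt change in the stream.
    """

    def __init__(self, radius):
        self.radius = radius
        self.centroids = []  # one list of floats per cluster
        self.counts = []     # number of points absorbed per cluster

    def observe(self, point):
        """Return (cluster_id, is_new_cluster) for one incoming point."""
        best, best_d = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.dist(point, c)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.radius:
            self.centroids.append([float(x) for x in point])
            self.counts.append(1)
            return len(self.centroids) - 1, True
        # Update the running mean of the matched centroid in place.
        self.counts[best] += 1
        n = self.counts[best]
        c = self.centroids[best]
        for j, x in enumerate(point):
            c[j] += (x - c[j]) / n
        return best, False
```

For example, feeding feature vectors of network connections through `observe` and counting how often `is_new_cluster` is true per time window gives a rough burst indicator. The result depends on arrival order and on the chosen radius, so this is a starting point rather than a tuned detector.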
Methodology in Stream Data Mining
- Multi-dimensional (on-line) analysis
- Mining dynamics of data streams
  - Time is a special dimension
  - Tilted time frame (multiple time granularities)
- Stream data reduction and pre-computation
  - What kind of multi-dimensional data should be pre-computed and stored for OLAP analysis?
  - What kind of data should be pre-computed/stored for classification? For clustering? For mining frequent patterns? For mining sequential patterns? Partial periodic patterns? …
- How to do incremental updates? How to find changes?
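A tilted time frame keeps recent data at fine granularity and older data at coarser granularity. The sketch below shows one possible layout under assumed level sizes (4 quarter-hours, 24 hours, 31 days). The slides do not specify the levels, so these sizes, the class name `TiltedTimeFrame`, and the use of a sum as the aggregate are illustrative assumptions.

```python
from collections import deque


class TiltedTimeFrame:
    """Multi-granularity summary of one measure (hypothetical sketch).

    Level sizes are assumptions: the newest 4 quarter-hours, 24 hours,
    and 31 days are retained. Each time a level completes a full cycle,
    the sum of its slots is promoted to the next coarser level.
    """

    def __init__(self, levels=(("quarter", 4), ("hour", 24), ("day", 31))):
        self.names = [name for name, _ in levels]
        self.caps = [cap for _, cap in levels]
        # Bounded deques: the oldest slot is dropped automatically.
        self.slots = [deque(maxlen=cap) for cap in self.caps]
        self.filled = [0] * len(levels)  # slots added since last promotion

    def add(self, value, level=0):
        """Record one finest-granularity aggregate (level 0 by default)."""
        self.slots[level].append(value)
        self.filled[level] += 1
        if self.filled[level] == self.caps[level]:
            self.filled[level] = 0
            if level + 1 < len(self.slots):
                # A full cycle at this level becomes one coarser slot.
                self.add(sum(self.slots[level]), level + 1)

    def snapshot(self):
        """Return {level name: list of retained values, oldest first}."""
        return {n: list(s) for n, s in zip(self.names, self.slots)}
```

With these assumed sizes, at most 4 + 24 + 31 = 59 values are stored per measure, however long the stream runs. Incremental update is a single `add` call per quarter-hour aggregate.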
Questions in Stream Data Mining
- Will stream data mining be real in practice?
- Should we develop general stream data mining principles, or ad hoc, application-oriented methods?
- How are stream data mining methods different from incremental mining?
- How is stream data mining linked with stream data management systems? With continuous query processing?
- Can we do privacy-preserving mining with stream data?
Mining Dynamics of Data Streams
www.cs.uiuc.edu/~hanj
Thank you!