9/15/2008 CTBTO Data Mining/Data Fusion Workshop 1 Spatiotemporal Stream Mining Applied to Seismic+ Data Margaret H. Dunham CSE Department Southern Methodist.

Presentation transcript:

9/15/2008 CTBTO Data Mining/Data Fusion Workshop 1 Spatiotemporal Stream Mining Applied to Seismic+ Data Margaret H. Dunham CSE Department Southern Methodist University Dallas, Texas USA

Outline
- CTBTO Data
- CTBTO Modeling Requirements
- EMM

CTBTO Data
As a data miner, I must first understand your DATA:
- Diverse: seismic, hydroacoustic, infrasound, radionuclide
- Spatial (source and sensor)
- Temporal
- STREAM data

From Sensors to Streams
- Stream data: data captured and sent by a set of sensors.
- A real-time sequence of encoded signals that contain the desired information.
- A continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or by geographic coordinates).
- Stream data is infinite: the data keeps coming.

CTBTO & Data Mining
- Data mining techniques must be defined based on your data and applications.
- Can't use predefined fixed models and prediction/classification techniques.
- Must not redo the massive number of algorithms already created.

CTBTO + DM Requirements
Model:
- Handle different data types (seismic, hydroacoustic, etc.)
- Spatial + temporal (spatiotemporal)
- Hierarchical
- Scalable
- Online
- Dynamic
Anomaly detection:
- Not just a specific wave type or data values
- Relationships between arrival of waves/data
- Combined values of data from all sensors

EMM (Extensible Markov Model)
- Time-varying, discrete, first-order Markov model
- Nodes are clusters of real-world states.
- Learning and validation phases overlap.
- Learning:
  - Transition probabilities between nodes
  - Node labels (centroid or medoid of cluster)
  - Nodes are added and removed as data arrives
- Applications: prediction, anomaly detection
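One way to make the learned transition probabilities concrete, assuming they are estimated from running counts (the symbols $c_{ij}$ and $c_i$ are notation introduced here, not taken from the slides): if $c_{ij}$ counts the observed transitions from node $N_i$ to node $N_j$, and $c_i$ counts all transitions leaving $N_i$, then

```latex
\[
P(N_i \to N_j) = \frac{c_{ij}}{c_i}, \qquad c_i = \sum_{k} c_{ik}
\]
```

Because the counts are updated with every arriving item, the estimate changes over time, which is consistent with the "time-varying" description above.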

Research Objectives
- Apply a proven spatiotemporal modeling technique to seismic data
- Construct an EMM to model sensor data
  - Local EMM at a location or area
  - Hierarchical EMM to summarize lower-level models
  - Represent all data in one vector of values
  - EMM learns normal behavior
- Develop new similarity metrics to include all sensor data types (fusion)
- Apply anomaly detection algorithms

EMM Creation/Learning
Input vectors, in arrival order:
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
[Figure: successive snapshots of the EMM graph over nodes N1, N2, and N3, with transition probabilities (such as 1/3, 2/3, 1/2, 1/1) labeling the arcs.]
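The learning step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the Euclidean distance, the threshold of 2.0, and the choice to label each node by its first member are all assumptions made here, and the sketch never removes nodes. It is not meant to reproduce the node assignments or probabilities in the slide's figure.

```python
import math
from collections import defaultdict


class EMM:
    """Toy Extensible Markov Model: nodes are clusters, arcs carry counts."""

    def __init__(self, threshold):
        self.threshold = threshold        # hypothetical distance cutoff
        self.labels = []                  # one representative vector per node
        self.counts = defaultdict(int)    # (from_node, to_node) -> transitions seen
        self.current = None               # node of the most recent input

    def learn(self, vector):
        """Assign the vector to a node (creating one if needed) and count the arc."""
        nearest, best = None, math.inf
        for node, label in enumerate(self.labels):
            d = math.dist(vector, label)  # Euclidean distance (assumption)
            if d < best:
                nearest, best = node, d
        if nearest is None or best > self.threshold:
            self.labels.append(tuple(vector))  # new node, labeled by its first member
            nearest = len(self.labels) - 1
        if self.current is not None:
            self.counts[(self.current, nearest)] += 1
        self.current = nearest
        return nearest

    def probability(self, i, j):
        """Estimated P(i -> j): count(i -> j) over all arcs leaving node i."""
        leaving = sum(c for (a, _), c in self.counts.items() if a == i)
        return self.counts.get((i, j), 0) / leaving if leaving else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=2.0)
path = [emm.learn(v) for v in vectors]
print(path)                   # [0, 0, 1, 2, 2, 0]
print(emm.probability(0, 1))  # 0.5
```

With these assumed settings the six vectors yield three nodes, and the last vector returns the model to its first node. A different threshold or distance would give a different graph.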

Input Data Representation
- Vector of sensor values (numeric) at precise time points or aggregated over time intervals.
- Values need not come from the same sensor types.
- Similarity/distance between vectors is used to determine creation of new nodes in the EMM.
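As a rough illustration of aggregating over time intervals, the sketch below averages timestamped readings from mixed sensor types into one vector per interval. The function name, the `(timestamp, sensor, value)` record layout, the sensor names, and the choice to fill a silent sensor with 0.0 are assumptions made for this example only.

```python
from collections import defaultdict


def to_interval_vectors(readings, sensors, interval):
    """Average readings per sensor within each time bucket.

    readings: iterable of (timestamp_seconds, sensor_name, value)
    sensors:  fixed ordering of sensor names that defines the vector layout
    interval: bucket width in seconds
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for t, sensor, value in readings:
        bucket = int(t // interval)
        sums[bucket][sensor] += value
        counts[bucket][sensor] += 1
    vectors = {}
    for bucket in sorted(sums):
        vectors[bucket] = [
            # A sensor with no reading in the bucket is filled with 0.0 (assumption).
            sums[bucket][s] / counts[bucket][s] if counts[bucket][s] else 0.0
            for s in sensors
        ]
    return vectors


readings = [
    (0.5, "seismic", 2.0),
    (1.5, "seismic", 4.0),
    (1.0, "infrasound", 10.0),
    (2.5, "seismic", 6.0),
]
vectors = to_interval_vectors(readings, ["seismic", "infrasound"], interval=2.0)
print(vectors)  # {0: [3.0, 10.0], 1: [6.0, 0.0]}
```

Fixing the sensor order up front keeps every vector the same length, which a distance computation between vectors needs.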

Anomaly Detection with EMM
- Objective: detect rare (unusual, surprising) events
- Advantages:
  - Dynamically learns what is normal
  - Based on this learning, can predict what is not normal
  - Do not have to indicate normal behavior a priori
- Applications:
  - Network intrusion
  - Data: IP traffic data, automobile traffic data
- Seismic:
  - Unusual seismic events
  - Automatically filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend; detected unusual weekend traffic pattern.]
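The slides do not spell out the detection rule, so the following is only one plausible reading: treat a transition as anomalous when its probability, estimated from what has been seen so far, is below a cutoff. The function name, the `min_prob` cutoff, and the `warmup` period are hypothetical, and the state labels are made up.

```python
from collections import defaultdict


def flag_anomalies(states, min_prob=0.2, warmup=5):
    """Learn transition counts online and flag low-probability transitions."""
    out_counts = defaultdict(int)    # transitions leaving each state
    link_counts = defaultdict(int)   # (from_state, to_state) -> count
    flagged = []
    prev = None
    for step, state in enumerate(states):
        if prev is not None:
            seen = out_counts[prev]
            # Probability is estimated BEFORE this transition is counted.
            prob = link_counts[(prev, state)] / seen if seen else 0.0
            if step >= warmup and prob < min_prob:
                flagged.append((step, prev, state, prob))
            out_counts[prev] += 1
            link_counts[(prev, state)] += 1
        prev = state
    return flagged


# A repeating A, A, B pattern followed by a state never seen before.
states = ["A", "A", "B", "A", "A", "B", "A", "A", "B", "A", "C"]
flagged = flag_anomalies(states)
print(flagged)  # [(10, 'A', 'C', 0.0)]
```

No definition of "normal" is supplied in advance; the counts accumulated from the stream play that role, in line with the advantages listed above.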

EMM with Seismic Data
- Input: wave arrivals (all or one per sensor)
- Identify states and changes of states in seismic data
- The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
- Initial testing with RDG data
- Use amplitude, period, and wave type

New Distance Measure
- Data = (contents missing from the transcript)
- Different wave type = 100% difference
- For events of the same wave type:
  - 50% weight given to the difference in amplitude
  - 50% weight given to the difference in period
- If the distance is greater than the threshold, a state change is required.
- Amplitude difference: $|\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}| \,/\, \text{amplitude}_{\text{average}}$
- Period difference: $|\text{period}_{\text{new}} - \text{period}_{\text{average}}| \,/\, \text{period}_{\text{average}}$
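The rules on this slide translate fairly directly into code. The sketch below is an illustration under assumptions: the dictionary keys, the function name, and the threshold value of 0.3 are invented here, since the slides give no concrete threshold or record layout.

```python
import math


def seismic_distance(new, node):
    """Distance between a new event and a node's running averages.

    Both arguments are dicts with keys 'wave_type', 'amplitude', 'period';
    for the node, amplitude and period are the node's averages.
    """
    if new["wave_type"] != node["wave_type"]:
        return 1.0  # different wave type counts as 100% difference
    d_amp = abs(new["amplitude"] - node["amplitude"]) / node["amplitude"]
    d_per = abs(new["period"] - node["period"]) / node["period"]
    return 0.5 * d_amp + 0.5 * d_per  # equal weights, as on the slide


node = {"wave_type": "P", "amplitude": 10.0, "period": 2.0}
event = {"wave_type": "P", "amplitude": 12.5, "period": 1.0}
THRESHOLD = 0.3  # hypothetical value

distance = seismic_distance(event, node)
print(distance)              # 0.375
print(distance > THRESHOLD)  # True: a state change would be required
```

Dividing by the node average makes both terms relative changes, so amplitude and period can be combined even though they have different units.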

EMM with Seismic Data
States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.

Preliminary Testing
- RDG data, February 1, 1981: 6 earthquakes
- Find transition times close to known earthquakes
- 9 total nodes
- 652 total transitions
- Found all quakes
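The check "find transition times close to known earthquakes" could look roughly like the sketch below. The function name, the tolerance, and every number in the example are made up for illustration; none of them comes from the RDG data.

```python
def match_events(transition_times, known_times, tolerance):
    """For each known event, return the closest transition within the tolerance."""
    found = {}
    for quake in known_times:
        near = [t for t in transition_times if abs(t - quake) <= tolerance]
        # None marks a known event with no nearby state transition.
        found[quake] = min(near, key=lambda t: abs(t - quake)) if near else None
    return found


# Hypothetical times in seconds since the start of the record.
transitions = [100, 405, 1210, 3000]
known_quakes = [400, 1200, 5000]
matches = match_events(transitions, known_quakes, tolerance=30)
print(matches)  # {400: 405, 1200: 1210, 5000: None}
```

An event counts as found when its entry is not `None`; transitions that match no known event (100 and 3000 here) are left for separate inspection.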

EMM Nodes
[Table: columns are Node #, Average amplitude, Average period, and Phase code. Nine rows, each with phase code P (primary wave). The node numbers and amplitude values did not survive extraction; the only surviving average periods are 0.96 sec, 20.4 sec, and 1.2 sec, in the sixth, seventh, and ninth rows as listed.]

Hierarchical EMM
[Figure: a Summary EMM above Regional EMMs, which in turn sit above Local EMMs.]

Now What?
- DATA NEEDED
- NOISE MAY NOT BE BAD
- KDD CUP
- Interest DM COMMUNITY

References
- Zhigang Li and Margaret H. Dunham, "STIFF: A Forecasting Framework for Spatio-Temporal Data," Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
- Zhigang Li, Liangang Liu, and Margaret H. Dunham, "Considering Correlation Between Variables to Improve Spatiotemporal Forecasting," Proceedings of the PAKDD Conference, May 2003, pp
- Jie Huang, Yu Meng, and Margaret H. Dunham, "Extensible Markov Model," Proceedings IEEE ICDM Conference, November 2004, pp
- Yu Meng and Margaret H. Dunham, "Efficient Mining of Emerging Events in a Dynamic Spatiotemporal," Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp )
- Yu Meng and Margaret H. Dunham, "Mining Developing Trends of Dynamic Spatiotemporal Data Streams," Journal of Computers, Vol 1, No 3, June 2006, pp
- Charlie Isaksson, Yu Meng, and Margaret H. Dunham, "Risk Leveling of Network Traffic Anomalies," International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp
- Margaret H. Dunham and Vijay Kumar, "Stream Hierarchy Data Mining for Sensor Data," Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.