1
Spatiotemporal Stream Mining Applied to Seismic+ Data
Margaret H. Dunham
CSE Department, Southern Methodist University
Dallas, Texas 75275 USA
mhd@engr.smu.edu
9/15/2008 CTBTO Data Mining/Data Fusion Workshop
2
Outline
- CTBTO Data
- CTBTO Modeling Requirements
- EMM
3
CTBTO Data
As a data miner, I must first understand your DATA:
- Diverse: Seismic, Hydroacoustic, Infrasound, Radionuclide
- Spatial (source and sensor)
- Temporal
- STREAM Data
4
From Sensors to Streams
- Stream data: data captured and sent by a set of sensors
- Real-time sequence of encoded signals that contain the desired information
- Continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or geographic coordinates)
- Stream data is infinite: the data keeps coming
11/26/07 – IRADSN’07
5
CTBTO & Data Mining
- Data mining techniques must be defined based on your data and applications.
- Can’t use predefined fixed models and prediction/classification techniques.
- Must not reinvent the many algorithms that already exist.
6
CTBTO + DM Requirements
Model:
- Handle different data types (seismic, hydroacoustic, etc.)
- Spatial + Temporal (Spatiotemporal)
- Hierarchical
- Scalable
- Online
- Dynamic
Anomaly Detection:
- Not just specific wave type or data values
- Relationships between arrival of waves/data
- Combined values of data from all sensors
7
EMM (Extensible Markov Model)
- Time-varying, discrete, first-order Markov model
- Nodes are clusters of real-world states
- Learning and validation phases overlap
- Learning:
  - Transition probabilities between nodes
  - Node labels (centroid or medoid of cluster)
  - Nodes are added and removed as data arrives
- Applications: prediction, anomaly detection
8
Research Objectives
- Apply a proven spatiotemporal modeling technique to seismic data
- Construct EMM to model sensor data
  - Local EMM at location or area
  - Hierarchical EMM to summarize lower-level models
  - Represent all data in one vector of values
  - EMM learns normal behavior
- Develop new similarity metrics to include all sensor data types (Fusion)
- Apply anomaly detection algorithms
9
EMM Creation/Learning
Input vectors, in arrival order:
<18,10,3,3,1,0,0>
<17,10,2,3,1,0,0>
<16,9,2,3,1,0,0>
<14,8,2,3,1,0,0>
<14,8,2,3,0,0,0>
<18,10,3,3,1,1,0>
[Figure: the EMM graph after each input vector, growing from a single node N1 to three nodes N1, N2, and N3, with transition probabilities such as 1/3, 2/3, 1/2, and 1/1 on the edges.]
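The learning step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `EMM`, the Euclidean distance, the running-centroid update, and the threshold value of 2.0 are all assumptions, and node removal is left out. Because the slide does not give the threshold or the distance measure, the sketch is not expected to reproduce the node count in the figure.

```python
import math
from collections import defaultdict


class EMM:
    """Toy Extensible Markov Model: nodes are cluster centroids,
    edges carry transition counts (all names here are illustrative)."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed: max distance to join an existing node
        self.centroids = []                   # node label = running centroid of the cluster
        self.sizes = []                       # number of vectors absorbed by each node
        self.transitions = defaultdict(int)   # (from_node, to_node) -> count
        self.out_counts = defaultdict(int)    # from_node -> total outgoing transitions
        self.current = None                   # index of the node the model is currently in

    def _nearest(self, x):
        """Return (index, distance) of the closest node, or (None, inf) if there are no nodes."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)               # Euclidean distance (an assumption)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def learn(self, x):
        """Absorb one input vector and return the index of the node it was assigned to."""
        node, d = self._nearest(x)
        if node is None or d > self.threshold:
            # Too far from every existing node: extend the model with a new node.
            self.centroids.append([float(v) for v in x])
            self.sizes.append(1)
            node = len(self.centroids) - 1
        else:
            # Close enough: fold the vector into the node's running centroid.
            n = self.sizes[node]
            self.centroids[node] = [(c * n + v) / (n + 1)
                                    for c, v in zip(self.centroids[node], x)]
            self.sizes[node] = n + 1
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
            self.out_counts[self.current] += 1
        self.current = node
        return node

    def probability(self, i, j):
        """Estimated probability of moving from node i to node j."""
        total = self.out_counts.get(i, 0)
        return self.transitions.get((i, j), 0) / total if total else 0.0


# The six input vectors from the slide, fed in arrival order.
vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
emm = EMM(threshold=2.0)
assignments = [emm.learn(v) for v in vectors]
```

Transition probabilities are simply counts divided by the total number of transitions leaving a node, so the model can be updated in constant time per arriving vector, apart from the nearest-node search.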
10
Input Data Representation
- Vector of sensor values (numeric) at precise time points or aggregated over time intervals
- Values need not come from the same sensor types
- Similarity/distance between vectors is used to determine when new nodes are created in the EMM
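As a rough illustration of the "aggregated over time intervals" option, the sketch below turns raw readings into one vector per interval. The function name `to_interval_vectors`, the `(timestamp, sensor, value)` tuple layout, the use of the mean as the aggregate, and the choice to fill a silent sensor with 0.0 are all assumptions made for the example.

```python
from collections import defaultdict


def to_interval_vectors(readings, sensors, interval):
    """Aggregate (timestamp_seconds, sensor_name, value) readings into
    one vector per time interval, with one component per sensor.

    Returns a list of (interval_start, vector) pairs in time order.
    """
    # interval index -> sensor name -> list of values seen in that interval
    buckets = defaultdict(lambda: defaultdict(list))
    for ts, sensor, value in readings:
        buckets[int(ts // interval)][sensor].append(value)

    vectors = []
    for idx in sorted(buckets):
        row = []
        for sensor in sensors:
            values = buckets[idx].get(sensor, [])
            # Mean per sensor; 0.0 when the sensor was silent (an assumption).
            row.append(sum(values) / len(values) if values else 0.0)
        vectors.append((idx * interval, row))
    return vectors


readings = [
    (0, "seismic", 1.0),
    (5, "seismic", 3.0),
    (7, "infrasound", 10.0),
    (12, "seismic", 4.0),
]
vectors = to_interval_vectors(readings, ["seismic", "infrasound"], interval=10)
```

Fixing the order of `sensors` up front keeps every vector the same length, which is what a distance function between vectors needs, even when the components come from different sensor types.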
11
Anomaly Detection with EMM
- Objective: detect rare (unusual, surprising) events
- Advantages:
  - Dynamically learns what is normal
  - Based on this learning, can predict what is not normal
  - Normal behavior does not have to be specified a priori
- Applications:
  - Network intrusion
  - Data: IP traffic data, automobile traffic data
- Seismic:
  - Unusual seismic events
  - Automatically filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend. Detected an unusual weekend traffic pattern.]
11/3/04
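One simple way to turn "learn what is normal, then flag what is not" into code is to score each transition by how often it has been seen so far. The sketch below is an assumption-laden illustration rather than the detection algorithm used in the talk: the function name `flag_rare_transitions`, the `min_prob` cutoff, and the `warmup` period are all hypothetical, and the input is taken to be a sequence of already-assigned state labels.

```python
from collections import defaultdict


def flag_rare_transitions(states, min_prob=0.1, warmup=5):
    """Flag transitions whose probability, as learned so far, is below min_prob.

    states  : sequence of state labels in arrival order
    warmup  : number of initial steps that are never flagged (assumed)
    Returns a list of (step, from_state, to_state, probability) tuples.
    """
    counts = defaultdict(int)   # (from_state, to_state) -> times seen
    out = defaultdict(int)      # from_state -> total outgoing transitions
    flagged = []
    for t in range(1, len(states)):
        a, b = states[t - 1], states[t]
        seen = out[a]
        # Score the transition BEFORE learning from it.
        prob = counts[(a, b)] / seen if seen else 0.0
        if t > warmup and prob < min_prob:
            flagged.append((t, a, b, prob))
        # Learning continues online: no separate training phase.
        counts[(a, b)] += 1
        out[a] += 1
    return flagged


stream = ["noise"] * 10 + ["quake"] + ["noise"] * 5
alerts = flag_rare_transitions(stream)
```

In this toy stream, both the jump into "quake" and the jump back out are transitions the model has never seen, so both are reported, while the long runs of "noise" are filtered out as normal.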
12
EMM with Seismic Data
- Input: wave arrivals (all, or one per sensor)
- Identify states and changes of state in seismic data
- The waveform would first have to be converted into a series of vectors representing the activity at various points in time
- Initial testing with RDG data
- Use amplitude, period, and wave type
13
New Distance Measure
- Data =
- Different wave type = 100% difference
- For events of the same wave type:
  - 50% weight given to the difference in amplitude
  - 50% weight given to the difference in period
- If the distance is greater than the threshold, a state change is required.
- difference in amplitude = |amplitude_new - amplitude_average| / amplitude_average
- difference in period = |period_new - period_average| / period_average
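The rules on this slide translate almost directly into code. The sketch below assumes an event is an `(amplitude, period, wave_type)` tuple and a node is summarized by `(average_amplitude, average_period, wave_type)`; those layouts and the function names are illustrative choices, and the averages are assumed to be positive.

```python
def seismic_distance(event, node):
    """Distance between a new event and an EMM node, following the slide's rules.

    event : (amplitude, period, wave_type) of the new arrival
    node  : (average_amplitude, average_period, wave_type) of the node
    """
    amplitude, period, wave_type = event
    avg_amplitude, avg_period, node_type = node

    # A different wave type counts as a 100% difference.
    if wave_type != node_type:
        return 1.0

    # Relative differences with respect to the node averages.
    d_amplitude = abs(amplitude - avg_amplitude) / avg_amplitude
    d_period = abs(period - avg_period) / avg_period

    # Equal 50% weights for amplitude and period.
    return 0.5 * d_amplitude + 0.5 * d_period


def needs_state_change(event, node, threshold):
    """A state change is required when the distance exceeds the threshold."""
    return seismic_distance(event, node) > threshold


same_type = seismic_distance((2.0, 0.1, "P"), (1.0, 0.1, "P"))
other_type = seismic_distance((2.0, 0.1, "S"), (1.0, 0.1, "P"))
```

Because the differences are relative, the same-type distance is not capped at 1.0: an event with ten times the node's average amplitude scores well above a wave-type mismatch, which is worth keeping in mind when choosing the threshold.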
14
EMM with Seismic Data
States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
15
Preliminary Testing
- RDG data, February 1, 1981: 6 earthquakes
- Find transition times close to known earthquakes
- 9 total nodes
- 652 total transitions
- Found all quakes
16
EMM Nodes

Node #   Average amplitude   Average period   Phase code
1        1.649 m             0.119 sec        P (primary wave)
2        8.353 m             0.803 sec        P (primary wave)
3        23.237 m            0.898 sec        P (primary wave)
4        87.324 m            0.997 sec        P (primary wave)
5        253.333 m           1.282 sec        P (primary wave)
6        270.524 m           0.96 sec         P (primary wave)
7        7.719 m             20.4 sec         P (primary wave)
8        723.088 m           1.962 sec        P (primary wave)
9        1938.772 m          1.2 sec          P (primary wave)
17
Hierarchical EMM
[Figure: hierarchy of models with a Summary EMM at the top, Regional EMMs beneath it, and Local EMMs beneath each Regional EMM.]
18
Now What?
- DATA NEEDED
- NOISE MAY NOT BE BAD
- KDD CUP
- Interest DM COMMUNITY
19
References
- Zhigang Li and Margaret H. Dunham, “STIFF: A Forecasting Framework for Spatio-Temporal Data,” Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp 1-9.
- Zhigang Li, Liangang Liu, and Margaret H. Dunham, “Considering Correlation Between Variables to Improve Spatiotemporal Forecasting,” Proceedings of the PAKDD Conference, May 2003, pp 519-531.
- Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
- Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
- Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
- Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security, Vol 6, No 6, June 2006, pp 258-265.
- Margaret H. Dunham and Vijay Kumar, “Stream Hierarchy Data Mining for Sensor Data,” Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.