Stream Hierarchy Data Mining for Sensor Data. Margaret H. Dunham, SMU, Dallas, Texas 75275; Vijay Kumar, UMKC, Kansas City, Missouri. 11/26/07 – IRADSN’07

Presentation transcript:

Stream Hierarchy Data Mining for Sensor Data
Margaret H. Dunham, SMU, Dallas, Texas
Vijay Kumar, UMKC, Kansas City, Missouri
11/26/07 – IRADSN’07

From Sensors to Streams – An Outline
- Data Stream Overview
- Data Stream Visualization
  - Temporal Heat Map
- Data Stream Modeling
  - Extensible Markov Model
- Data Stream Hierarchy


From Sensors to Streams
- Data captured and sent by a set of sensors is usually referred to as “stream data”.
- It is a real-time sequence of encoded signals that contain the desired information: a continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or by geographic coordinates).
- Stream data is infinite: the data keeps coming.

Data Stream Management Systems (DSMS)
- Software to facilitate querying and managing stream data.
- Retrieve the most recent information from the stream.
- Data aggregation facilitates merging together multiple streams.
- Modeling stream data to “summarize” the stream.
- Visualization is needed to observe, in real time, the spatial and temporal patterns and trends hidden in the data.
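To make two of the DSMS functions above concrete (merging multiple streams and retrieving the most recent information), here is a minimal Python sketch. It is not taken from the slides: the tuple layout `(timestamp, sensor_id, value)`, the helper `merge_streams`, and the class `RecentWindow` are hypothetical names chosen for illustration, and each input stream is assumed to be already ordered by timestamp.

```python
import heapq
from collections import deque


def merge_streams(*streams):
    """Merge timestamp-ordered streams of (timestamp, sensor_id, value)
    tuples into one stream ordered by timestamp (assumed layout)."""
    return heapq.merge(*streams, key=lambda item: item[0])


class RecentWindow:
    """Keeps only the most recent `size` items; older items are dropped,
    since an infinite stream cannot be stored in full."""

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def push(self, item):
        self.items.append(item)

    def latest(self):
        return list(self.items)


# Two small, finite stand-ins for sensor streams.
stream_a = [(1, "a", 0.2), (4, "a", 0.4)]
stream_b = [(2, "b", 0.9), (3, "b", 0.7)]

window = RecentWindow(size=3)
for item in merge_streams(stream_a, stream_b):
    window.push(item)
```

After the loop, `window.latest()` holds only the three newest readings across both sensors; the reading with timestamp 1 has been dropped.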

DSMS Problems
- Stream management development is in a state similar to that of databases prior to the 1970s:
  - Each system/researcher looks at a specific application or system
  - No standards concerning functionality
  - No standard query language
- It is unreasonable to expect that end users will access raw data, data in the DSMS, or even data at a summarized view.
- Domain experts need to “see” a higher level of data.

Our Proposal
Four-level data abstraction to facilitate the creation of actionable intelligence for domain experts evaluating sensor data.


Assumptions for Our Research
- End User:
  - May not be knowledgeable concerning sensors
  - Probably a domain expert
  - May not need to see exact sensor values
  - Concerned with trends and approximate values
  - Needs to see data from MANY sensors at one time
  - Needs to see data continuously in a visualization of the stream

Suppose There Were MANY Sensors
- Traditional line graphs would be very difficult to read.
- Requirements for a new visualization technique:
  - High-level summary of data
  - Handle multiple sensors at once
  - Continuous
  - Temporal
  - Spatial

Temporal Heat Map
- Also called Temporal Chaos Game Representation (TCGR)
- Temporal Heat Map (THM) is a visualization technique for streaming data derived from multiple sensors.
  - It is a two-dimensional structure similar to an infinite table.
  - Each row of the table is associated with one sensor value.
  - Each column of the table is associated with a point in time.
  - Each cell within the THM is a color representation of the sensor value.
- Colors normalized (in our examples)
  - 0 – White
  - 0.5 – Blue
  - Red
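The row/column/cell layout described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names `value_to_rgb` and `heat_map` are hypothetical, linear interpolation between the anchor colors is an assumption, and placing red at the normalized value 1.0 is also an assumption, since the slide text gives values only for white (0) and blue (0.5).

```python
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)


def value_to_rgb(v):
    """Map a normalized sensor value to an RGB color.

    Anchors: 0 -> white, 0.5 -> blue, 1.0 -> red (the 1.0 anchor and the
    linear interpolation between anchors are assumptions)."""
    v = min(max(v, 0.0), 1.0)  # clamp to the normalized range
    if v <= 0.5:
        lo, hi, t = WHITE, BLUE, v / 0.5
    else:
        lo, hi, t = BLUE, RED, (v - 0.5) / 0.5
    return tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))


def heat_map(readings):
    """Build the THM table from `readings`, a list of time steps where each
    time step is a list of normalized values, one per sensor.

    Returns one row per sensor and one column per point in time."""
    n_sensors = len(readings[0])
    return [[value_to_rgb(step[s]) for step in readings]
            for s in range(n_sensors)]


# Two time steps, two sensors.
thm = heat_map([[0.0, 1.0], [0.5, 0.0]])
```

For a live stream, the same mapping would be applied to each arriving time step to append one new column, rather than rebuilding the whole table.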

Cisco – Internal VoIP Traffic Data (10/11/07, NGDM'07)
[Figure: Complete Stream, CiscoEMM.png; axes: Time →, Values →]
VoIP traffic data was provided by Cisco Systems and represents logged VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17: to Mon Nov 17 11:29:

Derwent River (UK)
[Figure: Derwent Temporal Heat Map, derwentrotate.png]


Data Stream Modeling Requirements
- Summarization (synopsis) of data
- Use the data, NOT a sample
- Temporal and spatial
- Dynamic
- Continuous (infinite stream)
- Learn
- Forget
- Sublinear growth rate – clustering

Extensible Markov Model
- Extensible Markov Model (EMM): at any time t, an EMM consists of a Markov chain (MC) with a designated current node, N_n, and algorithms to modify it. The algorithms include:
  - EMMCluster, which defines a technique for matching input data at time t + 1 to existing states in the MC at time t.
  - EMMIncrement, which updates the MC at time t + 1 given the MC at time t and the clustering measure result at time t + 1.
  - EMMDecrement, which removes nodes from the EMM when needed.
- In addition, the EMM has associated data mining functions such as rare event detection and prediction.
Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004.
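The three algorithms named above can be illustrated with a deliberately simplified Python sketch. It is not the published EMM algorithm: the class name `EMM`, the Euclidean nearest-node matching with a fixed `threshold`, and the estimate of transition probabilities from outgoing link counts are all assumptions made for illustration.

```python
import math


class EMM:
    """Simplified EMM sketch: a Markov chain whose nodes are clusters."""

    def __init__(self, threshold):
        self.threshold = threshold  # max distance for matching a node (assumed)
        self.centroids = {}         # node id -> representative vector
        self.node_counts = {}       # node id -> number of visits
        self.link_counts = {}       # (from, to) -> number of transitions
        self.current = None         # designated current node
        self._next_id = 1

    def cluster(self, x):
        """EMMCluster stand-in: nearest existing node within threshold, else None."""
        best, best_d = None, self.threshold
        for node, c in self.centroids.items():
            d = math.dist(x, c)
            if d <= best_d:
                best, best_d = node, d
        return best

    def increment(self, x):
        """EMMIncrement stand-in: match or create a node, then update counts."""
        node = self.cluster(x)
        if node is None:  # no close state: the chain grows by one node
            node = self._next_id
            self._next_id += 1
            self.centroids[node] = tuple(x)
            self.node_counts[node] = 0
        if self.current is not None:
            key = (self.current, node)
            self.link_counts[key] = self.link_counts.get(key, 0) + 1
        self.node_counts[node] += 1
        self.current = node
        return node

    def decrement(self, node):
        """EMMDecrement stand-in: forget a node and every link touching it."""
        self.centroids.pop(node, None)
        self.node_counts.pop(node, None)
        self.link_counts = {k: v for k, v in self.link_counts.items()
                            if node not in k}
        if self.current == node:
            self.current = None

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from outgoing link counts (a simplification)."""
        total = sum(v for (s, _), v in self.link_counts.items() if s == src)
        return self.link_counts.get((src, dst), 0) / total if total else 0.0


emm = EMM(threshold=1.0)
for reading in [(0, 0), (5, 5), (0.2, 0.1), (5, 5.1), (0, 0)]:
    emm.increment(reading)
```

Five readings produce only two nodes here, because similar readings are absorbed by existing nodes rather than creating new ones. A transition with a low estimated probability could be treated as a candidate rare event, though the threshold for that is left open in this sketch.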

EMM Learning
[Figure: successive snapshots of the Markov chain as nodes N1, N2, and N3 are added and the transition probabilities (e.g., 1/1, 1/2, 1/3, 2/3) are updated]

EMM Forgetting
[Figure: Markov chain over nodes N1, N2, N3, N5, and N6 before and after node N2 is removed, with transition probabilities shown on the links]

EMM Sublinear Growth Rate
[Figure: Minnesota Department of Transportation (MnDot) data]


Traditional DBMS Data Abstraction
- Three levels of data abstraction:
  - Physical
  - Logical
  - External
- Data is normally pulled to the user by a query.

Proposed DSMS Data Abstraction
- Abstraction
  - Level 0 – Physical Level: raw data from sensors; cannot be stored.
  - Level 1 – DSMS: sensor data is merged, aggregated, and cleansed. DSMS queries may be processed against this data.
  - Level 2 – Model: summarization (synopsis) of data.
  - Level 3 – Domain Expert Summary: visualization.
- Data is normally pushed to the user.
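The push-style flow through the four levels can be sketched as a chain of Python generators. This is a toy illustration under stated assumptions, not the proposed system: the function names, the `(timestamp, value)` tuple layout, dropping `None` readings as the cleansing step, and bucketing values as the "model" are all hypothetical stand-ins.

```python
def level0_sensor(readings):
    """Level 0: raw readings are yielded once and never stored."""
    for reading in readings:
        yield reading


def level1_dsms(raw):
    """Level 1: cleansing stand-in that drops missing values."""
    for timestamp, value in raw:
        if value is not None:
            yield timestamp, value


def level2_model(clean, bucket_width):
    """Level 2: synopsis stand-in that reduces each value to a bucket id."""
    for timestamp, value in clean:
        yield timestamp, int(value // bucket_width)


def level3_push(summary, on_update):
    """Level 3: push each summarized item to the domain expert's view."""
    for item in summary:
        on_update(item)


seen = []  # stands in for a visualization that receives pushed updates
raw = [(1, 0.12), (2, None), (3, 0.57), (4, 0.91)]
level3_push(level2_model(level1_dsms(level0_sensor(raw)), 0.25), seen.append)
```

The consumer never issues a query: each reading flows upward as it arrives, and only the summarized form reaches `seen`.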

Hierarchy           | Levels | Lowest Level     | Highest Level Abstraction  | Inter-level Data Migration
Memory Hierarchy    | n      | External Storage | Subset/Cache/Buffer        | Fetch/Prefetch
DBMS Data Hierarchy | 3      | Physical Storage | External View              | Fetch, Prefetch
Data Warehouse      | n      | Operational Data | Cube/Multidimensional View | Aggregation
Stream Hierarchy    | 4      | Sensor Data      | Visualization/Triggers     | Automatic Push


Stream Hierarchy Summary
- Except for the inter-level functionality requirements, the functionality of each level is independent of the others and may differ across implementations.
- The model used must capture the time and ordering of data, be able to both learn and forget, and use some variation of clustering.
- Visualization at the domain expert level must capture both time and ordering. In addition, it should be easy to “read” for many sets of sensors.