DataJewel¹: Tightly Integrating Visualization with Temporal Data Mining
Mihael Ankerst, David H. Jones, Anne Kao, Changzhou Wang
¹ US patent pending.

DataJewel: A Novel Architecture for Temporal Data Mining
Motivation:
- In different domains, different kinds of patterns are of interest → an architecture that provides access to many temporal mining algorithms
- Databases are built based on organizational needs → an architecture that links databases together
- Databases can be huge → data has to be compressed
- Current data mining tools are designed for data mining experts → an architecture that is very intuitive and easy to use

Visual Data Mining
Information Visualization + Data Mining → Visual Data Mining

                        Actionable  Evaluation  Flexibility  User Interaction
Data Mining Algorithms      +           +            -              -
Visualization               -           -            +              +

Visual Data Mining Architecture: Tightly Integrated Visualization
[Figure: three ways of combining visualization with a data mining algorithm]
- Preceding Visualization (PV): Data → visualization of the data → DM-Algorithm → Result → Knowledge
- Subsequent Visualization (SV): Data → DM-Algorithm → Result → visualization of the result → Knowledge
- Tightly Integrated Visualization (TIV): Data → DM-Algorithm step 1 → visualization + interaction → … → DM-Algorithm step n → Result → Knowledge
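The three approaches differ only in where the visualization sits relative to the algorithm. The following is a minimal Python sketch of the TIV loop, assuming an algorithm that can be split into discrete steps. The names `run_tiv`, `step`, `visualize`, and `interact` are hypothetical and are not taken from DataJewel.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")


def run_tiv(
    state: State,
    steps: List[Callable[[State], State]],
    visualize: Callable[[State], None],
    interact: Callable[[State], State],
) -> State:
    """Tightly integrated visualization (hypothetical sketch).

    After each algorithm step the intermediate result is visualized and the
    user may adjust it before the next step runs. PV would visualize only
    before the first step, SV only after the last one.
    """
    for step in steps:
        state = step(state)        # run one step of the mining algorithm
        visualize(state)           # show the intermediate result
        state = interact(state)    # let the user inject domain knowledge
    return state


if __name__ == "__main__":
    shown = []
    result = run_tiv(
        [5, 1, 4],
        steps=[sorted, lambda xs: xs[:2]],
        visualize=shown.append,
        interact=lambda s: s,      # the user accepts each intermediate result
    )
    print(result)  # [1, 4]
```

With an identity `interact`, the loop reduces to running the steps in sequence while showing every intermediate state.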

Architecture of DataJewel
- Data source layer: access and link multiple heterogeneous databases and data sources
- Statistical layer: compression, aggregation, sampling
- Data mining layer: an extensible set of data mining algorithms for automatic pattern discovery
- Visualization layer: an extensible set of visualizations for representing the data and the patterns, plus interaction capabilities that let the user incorporate domain expertise
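The layering can be illustrated with a small Python sketch, assuming that raw records are (date, event type) pairs and that algorithms and views are plug-ins registered by name. The class `LayeredPipeline` and its methods are hypothetical and do not describe DataJewel's actual implementation.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical schema: one raw record is (ISO date, event type).
Record = Tuple[str, str]
Aggregate = Dict[Record, int]


class LayeredPipeline:
    """Illustrative four-layer pipeline; not DataJewel's actual code."""

    def __init__(self) -> None:
        self.sources: List[Callable[[], Iterable[Record]]] = []            # data source layer
        self.algorithms: Dict[str, Callable[[Aggregate], Aggregate]] = {}  # data mining layer
        self.views: Dict[str, Callable[[Aggregate], str]] = {}             # visualization layer

    def add_source(self, source: Callable[[], Iterable[Record]]) -> None:
        self.sources.append(source)

    def register_algorithm(self, name: str, fn: Callable[[Aggregate], Aggregate]) -> None:
        self.algorithms[name] = fn

    def register_view(self, name: str, fn: Callable[[Aggregate], str]) -> None:
        self.views[name] = fn

    def aggregate(self) -> Aggregate:
        """Statistical layer: compress raw records from all sources into counts."""
        counts: Counter = Counter()
        for source in self.sources:
            counts.update(source())   # counts each (date, event type) tuple
        return dict(counts)

    def run(self, algorithm: str, view: str) -> str:
        return self.views[view](self.algorithms[algorithm](self.aggregate()))


if __name__ == "__main__":
    pipe = LayeredPipeline()
    pipe.add_source(lambda: [("2002-01-01", "Doors"), ("2002-01-01", "Doors")])
    pipe.add_source(lambda: [("2002-01-01", "Engine")])
    # Toy "algorithm": keep only the most frequent (date, event type) cell.
    pipe.register_algorithm("most_frequent", lambda agg: dict([max(agg.items(), key=lambda kv: kv[1])]))
    pipe.register_view("text", lambda agg: "; ".join(f"{d} {e}: {n}" for (d, e), n in sorted(agg.items())))
    print(pipe.run("most_frequent", "text"))  # 2002-01-01 Doors: 2
```

Extensibility here simply means adding another entry to a registry. No layer needs to know how the others are implemented.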

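The Visualization Component
Example input records:
Time        Event type   Location  …
09/11/2001  Door broken  Seattle   …
09/12/2001  …            …         …
[Figure: CalendarView for January 2002 (weeks laid out S M T W T F S), with a detail view for Tuesday, Jan 1st 2002 listing the event types Doors, Engine, Landing Gear, Lights]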
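As a rough sketch of the layout behind a calendar view, the following Python code groups event records by day and event type and places the counts in a Sunday-first month grid, matching the S M T W T F S header above. The function name `calendar_view` and the (date, event type) input form are assumptions made for illustration.

```python
import calendar
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


def calendar_view(
    events: Iterable[Tuple[date, str]], year: int, month: int
) -> List[List[Optional[Dict[str, int]]]]:
    """Lay out daily event frequencies as a month grid (weeks start on Sunday).

    Each cell is None for padding days outside the month; otherwise it is a
    dict mapping event type to its count on that day.
    """
    counts: Counter = Counter(
        (d.day, event_type)
        for d, event_type in events
        if d.year == year and d.month == month
    )
    grid = []
    weeks = calendar.Calendar(firstweekday=calendar.SUNDAY).monthdayscalendar(year, month)
    for week in weeks:
        row: List[Optional[Dict[str, int]]] = []
        for day in week:
            if day == 0:
                row.append(None)  # day belongs to the previous or next month
            else:
                row.append({etype: n for (d, etype), n in counts.items() if d == day})
        grid.append(row)
    return grid


if __name__ == "__main__":
    evts = [(date(2002, 1, 1), "Doors"), (date(2002, 1, 1), "Engine")]
    print(calendar_view(evts, 2002, 1)[0])  # Jan 1st 2002 is a Tuesday (third cell)
```

A real view would map each count to a color or bar height. The grid above keeps only the data that such a rendering would need.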

The Temporal Mining Component
Goal: mining algorithms should be
- very efficient (producing results in interactive time)
- able to find several types of patterns:
  - single event: recurrence, periodicity, …
  - multiple events: similarity, causality, clustering, …
- tightly integrated with the visualization
Solution:
- The algorithm computes a pattern and updates the visualization by assigning unique colors only to the events contained in the pattern.
- Every algorithm produces its result as an update of the color assignment:
  - CalendarView visualizes both the data and the patterns.
  - The same color assignment interface is used by the user and by the algorithm.
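The key design point is that users and algorithms write to one shared color map. The following minimal Python sketch illustrates that idea. The class `ColorAssignment`, the event keys, and the default gray are hypothetical.

```python
from typing import Dict, Hashable, Iterable

DEFAULT_COLOR = "#d3d3d3"  # hypothetical neutral color for uncolored events


class ColorAssignment:
    """A single color map written to by both the user and the mining algorithms."""

    def __init__(self) -> None:
        self._colors: Dict[Hashable, str] = {}

    def assign(self, events: Iterable[Hashable], color: str) -> None:
        """Give every listed event the same color (same call for user and algorithm)."""
        for event in events:
            self._colors[event] = color

    def reset(self) -> None:
        """Return all events to the default color."""
        self._colors.clear()

    def color_of(self, event: Hashable) -> str:
        return self._colors.get(event, DEFAULT_COLOR)


if __name__ == "__main__":
    colors = ColorAssignment()
    # The user highlights a known issue (domain knowledge)...
    colors.assign([("2002-01-01", "Engine")], "#ff0000")
    # ...and an algorithm marks the events contained in the pattern it found.
    colors.assign([("2002-01-02", "Doors"), ("2002-01-09", "Doors")], "#0000ff")
    print(colors.color_of(("2002-01-09", "Doors")))   # #0000ff
    print(colors.color_of(("2002-01-03", "Lights")))  # #d3d3d3
```

Because both parties use the same `assign` call, the visualization does not need to know whether a color came from a person or from a pattern search.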

Implemented new mining algorithms:
- LongestStreak
- Most Deviations
- Correlated Events
The basic ideas of the algorithms are motivated by control charting (stabilized p-chart).
[Figure: control chart of event frequency over time, with the mean marked]
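The slides name the algorithms but do not define them. The following Python sketch shows only the textbook stabilized p-chart, which they cite as motivation. For period i with x_i events out of n_i opportunities, p_i = x_i / n_i and p̄ = Σx_i / Σn_i, and each point is plotted as z_i = (p_i − p̄) / sqrt(p̄(1 − p̄) / n_i) against fixed limits of ±3. The functions `longest_streak_above_mean` and `most_deviations` are hypothetical readings of the algorithm names and are not the authors' definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def stabilized_p_chart(events: Sequence[int], totals: Sequence[int]) -> List[float]:
    """Standardized proportions z_i = (p_i - p_bar) / sqrt(p_bar (1 - p_bar) / n_i)."""
    p_bar = sum(events) / sum(totals)
    spread = math.sqrt(p_bar * (1 - p_bar))
    if spread == 0:  # p_bar is 0 or 1: every period equals the mean
        return [0.0] * len(events)
    return [(x / n - p_bar) / (spread / math.sqrt(n)) for x, n in zip(events, totals)]


def out_of_control(z: Sequence[float], limit: float = 3.0) -> List[int]:
    """Indices of periods outside the +/- limit control band."""
    return [i for i, value in enumerate(z) if abs(value) > limit]


def longest_streak_above_mean(z: Sequence[float]) -> Tuple[int, int]:
    """Hypothetical LongestStreak: (start, length) of the longest run of z > 0."""
    best_start, best_len, start = 0, 0, None
    for i, value in enumerate(z):
        if value > 0:
            if start is None:
                start = i
            if i - start + 1 > best_len:
                best_start, best_len = start, i - start + 1
        else:
            start = None
    return best_start, best_len


def most_deviations(z_by_type: Dict[str, Sequence[float]], limit: float = 3.0) -> str:
    """Hypothetical Most Deviations: event type with the most out-of-control periods."""
    return max(z_by_type, key=lambda t: len(out_of_control(z_by_type[t], limit)))
```

Standardizing each point makes the control limits constant even when the daily totals n_i differ, which is what makes the p-chart "stabilized".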

The Statistical & Database Component
- Access to data from different databases
- Precompute compressed/aggregated/sampled data
- Use lookup tables to further compress data
- Currently, we can analyze millions of records in real time
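One common way to compress data with lookup tables is dictionary encoding, which replaces each repeated string with a small integer code. The slides do not say which scheme DataJewel uses, so this Python sketch is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Replace each distinct string with a small integer code.

    Returns (codes, lookup), where lookup[code] recovers the original value.
    """
    index: Dict[str, int] = {}
    lookup: List[str] = []
    codes: List[int] = []
    for value in values:
        if value not in index:
            index[value] = len(lookup)   # next free code
            lookup.append(value)
        codes.append(index[value])
    return codes, lookup


def decode(codes: Sequence[int], lookup: Sequence[str]) -> List[str]:
    """Invert dictionary_encode using the lookup table."""
    return [lookup[c] for c in codes]
```

With few distinct values, such as locations or event types, the integer column is far smaller than the repeated strings. It is also cheaper to group and count.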

The Statistical & Database Component
Airline_a and Airline_b each have a Procurement DB and a Maintenance DB. Sample maintenance records:

Date       ATA  Complaint_txt  …
1/1/2000   35   …              …
1/1/2000   35   …              …
1/1/2000   39   …              …

Date       ATA  Complaint_txt  …
12/1/2000  73   …              …
12/1/2000  73   …              …
15/1/2000  49   …              …
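The sample dates include 15/1/2000, which can only be read as day/month/year. Linking and sorting records across databases therefore probably requires normalizing such strings first. The following Python sketch assumes that every source uses day/month/year. A source with a different convention would need its own parser, and `parse_dmy` is a hypothetical name.

```python
from datetime import date


def parse_dmy(text: str) -> date:
    """Parse a day/month/year string such as '15/1/2000' into a date."""
    day, month, year = (int(part) for part in text.strip().split("/"))
    return date(year, month, day)  # raises ValueError for impossible dates


if __name__ == "__main__":
    for raw in ["1/1/2000", "12/1/2000", "15/1/2000"]:
        # prints 2000-01-01, then 2000-01-12, then 2000-01-15
        print(parse_dmy(raw).isoformat())
```

ISO strings (YYYY-MM-DD) sort chronologically as plain text, so they are a convenient shared key once dates are normalized.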

The Statistical & Database Component
Aggregate data with:
    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_a
    GROUP BY Date, ATA
    ORDER BY Date, ATA
Result:
Date       ATA  Freq
12/1/…     …    …
…/1/…      …    …
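The aggregation query can be run unchanged with Python's built-in `sqlite3` module. The three rows below are hypothetical stand-ins for maintenance records. Dates are stored as ISO strings so that `ORDER BY Date` sorts chronologically, which day/month/year strings would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airline_a (Date TEXT, ATA INTEGER, Complaint_txt TEXT)")
conn.executemany(
    "INSERT INTO airline_a VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        ("2000-01-12", 73, "complaint 1"),
        ("2000-01-12", 73, "complaint 2"),
        ("2000-01-15", 49, "complaint 3"),
    ],
)
# One output row per (Date, ATA) pair, with the number of records as Freq.
rows = conn.execute(
    """
    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_a
    GROUP BY Date, ATA
    ORDER BY Date, ATA
    """
).fetchall()
print(rows)  # [('2000-01-12', 73, 2), ('2000-01-15', 49, 1)]
```

The aggregated table has one row per day and ATA chapter rather than one per complaint. This is what allows millions of raw records to be reduced to something the visualization can load quickly.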

The Statistical & Database Component
Aggregate data with:
    SELECT Date, ATA, COUNT(*) AS Freq
    FROM airline_b
    GROUP BY Date, ATA
    ORDER BY Date, ATA
Result:
Date       ATA  Freq
1/1/…      …    …
…/1/…      …    …
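Once each airline's data has been aggregated separately, the results can be linked. The following Python sketch assumes that linking means summing (Date, ATA, Freq) rows over a chosen set of airlines. A one-element set then gives the single-airline views shown in the screenshots. The function `link_aggregates` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple

AggRow = Tuple[str, int, int]   # (ISO date, ATA, Freq) as returned by the query
Key = Tuple[str, int]           # (ISO date, ATA)


def link_aggregates(
    per_airline: Dict[str, Iterable[AggRow]],
    airlines: Optional[Set[str]] = None,
) -> Dict[Key, int]:
    """Sum per-airline (Date, ATA, Freq) rows over the selected airlines.

    airlines=None selects every airline; a one-element set gives a
    single-airline view.
    """
    total: Counter = Counter()
    for name, rows in per_airline.items():
        if airlines is not None and name not in airlines:
            continue
        for day, ata, freq in rows:
            total[(day, ata)] += freq
    return dict(total)
```

Summing precomputed counts avoids re-scanning the raw tables whenever the user changes which airlines are included.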

User-Centric Data Mining
- User selects data source/attributes
- Data is compressed and loaded
- User selects visualization technique
- Data is visualized
- User selects date range
- User interacts with visualization
- User invokes algorithm
- Raw data is shown
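Two of these steps, selecting a date range and showing the raw data behind an aggregated cell, can be sketched in Python as simple filters. The in-memory data shapes and the function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical in-memory forms of the aggregated and raw data.
DailyCounts = Dict[Tuple[date, int], int]   # (day, ATA) -> frequency
RawRecord = Tuple[date, int, str]           # (day, ATA, complaint text)


def select_date_range(counts: DailyCounts, start: date, end: date) -> DailyCounts:
    """Keep only the aggregated cells whose day lies in [start, end]."""
    return {key: n for key, n in counts.items() if start <= key[0] <= end}


def show_raw_data(raw: List[RawRecord], day: date, ata: int) -> List[str]:
    """Drill down from one calendar cell to the raw records behind it."""
    return [text for d, a, text in raw if d == day and a == ata]
```

The range filter works only on the compact aggregates. The raw records are touched only when the user asks to see them.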

DataJewel – Scenario: Mining Algorithm
Using 41 “different” colors…
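The quotation marks around "different" point to a real limit: 41 colors are hard to tell apart. The following Python sketch uses one common approach, spacing hues evenly around the HSV color wheel with the standard `colorsys` module. It is an assumption and not necessarily how DataJewel chose its palette. With 41 colors, neighboring hues differ by only about 8.8 degrees (360 / 41). The function name and the saturation/value defaults are hypothetical.

```python
import colorsys
from typing import List


def distinct_colors(n: int, saturation: float = 0.65, value: float = 0.95) -> List[str]:
    """Return n colors with evenly spaced hues as '#rrggbb' strings."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors
```

The colors are technically distinct, yet adjacent hues are easy to confuse on screen. This illustrates why a pattern can be easier to read when an algorithm colors only the events that belong to it.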

DataJewel – Scenario: Mining Algorithm
Press here to run the mining algorithm.

DataJewel – Scenario: User Interaction

Screenshots
One airline, one model, ATA 73 (engine fuel/control)

Screenshots
One airline, one model, ATA 49 (airborne auxiliary power)

Conclusions
- Data mining algorithms and visualization techniques complement each other well.
- CalendarView is a new visualization technique that represents the frequency of daily events.
- DataJewel uses the same visualization to represent both the data and the patterns. The color assignment interface is used both by the user (to incorporate domain knowledge) and by the computer (to represent the discovered patterns). These two key properties greatly improve the applicability of the system for domain experts.
Future work: user studies, new visualizations, algorithms, …