University of Jyväskylä Department of Mathematical Information Technology ICANNGA 2009 Mining Time Series State Changes with Prototype Based Clustering.

Presentation transcript:

Mining Time Series State Changes with Prototype Based Clustering
Markus Pylvänen, Sami Äyrämö, Tommi Kärkkäinen
University of Jyväskylä, Department of Mathematical Information Technology
ICANNGA 2009

The Problem
Industrial processes produce huge amounts of multivariate time series data.
Manual surveillance requires too many resources.
A malfunction should be detected before it occurs.
– The malfunction state and the preceding states, or even the sequence of states, must be recognized, characterized and detected for proactive surveillance.

Äyrämö, S., Knowledge Mining using Robust Clustering, PhD Thesis, University of Jyväskylä, 2006

About the Domain
Monitoring of wind turbine gears and mechanical drives manufactured for the process industries.
By detecting faults before they occur, it is possible to plan service breaks in advance and maximize the running time of gear units.
No a priori information is available on the operational states.
The visualization tool
– detects and visualizes the state changes in gear units
– gives industrial process experts a simple and understandable view of the process data

Gear unit
The measured gear unit is a 750 kW industrial planetary gear.

The data
The condition of the gear units is monitored by the Moventas Condition Management System (CMaS), which uses several sensors to measure
– count of oil particles
– vibration
– rotation speed
– oil temperature
– oil pressure
Size of the test data: 2029 × 215
One-hour resolution
The domain specialist detected one malfunction in the test data collected from the gear unit.

The method
Time series data can be analyzed with many data mining techniques.
– E.g., clustering and dimension reduction provide information about process states or correlations between measurements.
With sequence mining, the order of state changes can also be recognized.
Combining these with visualization gives an overall view of the different states in the process and the order in which they occurred.
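As a rough illustration of how the order of state changes might be mined once every time step has a cluster label, the sketch below collapses repeated labels and counts contiguous subsequences. The function names and the label sequence are hypothetical and are not taken from the presented tool.

```python
from collections import Counter


def state_changes(labels):
    """Collapse consecutive repeats so that only the order of states remains."""
    changes = []
    for label in labels:
        if not changes or changes[-1] != label:
            changes.append(label)
    return changes


def frequent_subsequences(changes, length):
    """Count every contiguous subsequence of the given length."""
    windows = (
        tuple(changes[i:i + length])
        for i in range(len(changes) - length + 1)
    )
    return Counter(windows)


# Made-up cluster labels, one per hourly measurement (assumption).
labels = [1, 1, 2, 2, 2, 3, 1, 1, 2, 3, 3, 5]
changes = state_changes(labels)
counts = frequent_subsequences(changes, 3)
print(changes)
print(counts.most_common(1))
```

Counting only contiguous windows is a deliberate simplification; a full sequence mining method would also allow gaps between states.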

Mining the Time Series State Changes

Occurrence of Clusters in Timeline
Colors represent clusters.
Each cluster corresponds to a particular state.
Any clustering method can be applied.
Information about within- and between-cluster similarities is lost.
Recurrent sequences are still difficult to recognize.
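A timeline view of this kind could be prepared by turning the per-hour cluster labels into colored segments. The following is a minimal sketch under that assumption; `timeline_segments` and the example labels are hypothetical, and the plotting step itself is left out.

```python
from itertools import groupby


def timeline_segments(labels):
    """Return (cluster, first_index, last_index) for each run of equal labels."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Made-up cluster labels at one-hour resolution (assumption).
labels = [1, 1, 2, 2, 2, 1, 5, 5]
segments = timeline_segments(labels)
for cluster, first, last in segments:
    print(f"cluster {cluster}: hours {first}..{last}")
```

Each segment would be drawn as one colored bar. The segments carry only labels and positions, which is why similarity information between clusters does not appear in such a view.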

Implementation
The MATLAB K-means algorithm was used in the clustering step.
– Prototype-based methods provide natural representatives (prototypes) for clusters.
– Easy to modify for incomplete data sets
– Based on classical statistics, so not robust against gross errors
– Other methods should be tried later, when more data is available.
Dimension reduction was realized using the MATLAB PCA method.
The graphical user interface was programmed in Java using the JFreeChart library.
All the written code is open source and licensed under GPLv3.
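To illustrate why a prototype-based method is easy to adapt to incomplete data, here is a minimal K-means sketch in plain Python. It is not the MATLAB implementation used in the work. It assumes missing values are encoded as `None`, computes distances over the observed components only, and takes the initial prototypes as an argument. All names and data are hypothetical.

```python
import math


def distance(row, prototype):
    """Euclidean distance over the components that are observed in row."""
    total = 0.0
    for value, center in zip(row, prototype):
        if value is not None:
            total += (value - center) ** 2
    return math.sqrt(total)


def kmeans(data, initial_prototypes, iterations=100):
    """Cluster rows that may contain None; return (labels, prototypes)."""
    prototypes = [list(p) for p in initial_prototypes]
    k = len(prototypes)
    labels = [None] * len(data)
    for _ in range(iterations):
        # Assignment step: nearest prototype for every row.
        new_labels = [
            min(range(k), key=lambda j: distance(row, prototypes[j]))
            for row in data
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: per-component mean over the observed values only.
        for j in range(k):
            members = [row for row, label in zip(data, labels) if label == j]
            for d in range(len(prototypes[j])):
                observed = [row[d] for row in members if row[d] is not None]
                if observed:
                    prototypes[j][d] = sum(observed) / len(observed)
    return labels, prototypes


# Made-up two-dimensional measurements with two missing values (assumption).
data = [
    [1.0, 1.0], [1.2, 0.8], [0.9, None],
    [10.0, 10.0], [10.2, 9.9], [None, 10.1],
]
labels, prototypes = kmeans(data, [data[0], data[3]])
print(labels)
print(prototypes)
```

The prototypes are plain means, so a single gross error can pull a prototype away from its cluster, which matches the robustness caveat above.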

Transition network
[Figure: transition network; label: Malfunction state]
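One simple way to derive a transition network from a labelled time series is to count how often one cluster is directly followed by another. The sketch below assumes that approach; the function names and label sequence are hypothetical, with cluster 5 standing in for a malfunction state.

```python
from collections import Counter


def transition_counts(labels):
    """Count transitions between different consecutive cluster labels."""
    counts = Counter()
    for current, following in zip(labels, labels[1:]):
        if current != following:
            counts[(current, following)] += 1
    return counts


def predecessors(counts, state):
    """List the states from which the given state was entered."""
    return sorted({source for (source, target) in counts if target == state})


# Made-up cluster labels (assumption); 5 plays the malfunction cluster.
labels = [1, 1, 2, 2, 3, 1, 2, 5, 5]
counts = transition_counts(labels)
print(dict(counts))
print(predecessors(counts, 5))
```

The counts can serve as edge weights in the network, and the predecessors of the malfunction state are the states worth watching for proactive surveillance.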

Window for Comparing Clusters
Clusters are compared using one of the vibration variables.
The malfunction in cluster 5 can be easily seen.

Conclusions
The prototype software was found to be a promising tool for gear unit monitoring.
More data from normal behavior and malfunctions is required.
More efficient clustering techniques (including missing data treatment) must be evaluated.
The visual design must be enhanced.