
Analytical and Visual Data Mining
Michael Welge, Automated Learning Group, NCSA
University of Illinois at Urbana-Champaign
October 14, 1998

Why Data Mining? - Potential Applications
Database analysis, decision support, and automation:
– Market and sales analysis
– Fraud detection
– Manufacturing process analysis
– Risk analysis and management
– Experimental results analysis
– Scientific data analysis
– Text document analysis

Data Mining: Confluence of Multiple Disciplines
– Database systems, data warehouses, and OLAP
– Machine learning
– Statistics
– Mathematical programming
– Visualization
– High-performance computing

Data Mining: On What Kind of Data?
– Relational databases
– Data warehouses
– Transactional databases
– Advanced database systems:
  – Object-relational
  – Spatial
  – Temporal
  – Text
  – Heterogeneous, legacy, and distributed
  – WWW

Why Do We Need Data Mining?
Leverage the organization's data assets:
– Only a small portion (typically 5%–10%) of the collected data is ever analyzed.
– Data that may never be analyzed continues to be collected, at great expense, out of fear of missing something that may prove important in the future.
– The growth rate of data precludes the traditional "manually intensive" approach.

Why Do We Need Data Mining?
As databases grow, supporting the decision process with traditional query languages becomes infeasible.
– Many queries of interest are difficult to state in a query language (the query formulation problem):
  – "Find all cases of fraud."
  – "Find all individuals likely to buy a Ford Expedition."
  – "Find all documents that are similar to this customer's problem."

Knowledge Discovery Process
Data mining: a step in the knowledge discovery process consisting of particular algorithms (methods) that, under some acceptable objective, produce a particular enumeration of patterns (models) over the data.
Knowledge discovery process: the process of using data mining methods (algorithms) to extract (identify) what is deemed knowledge, according to the specifications of measures and thresholds, using a database along with any necessary preprocessing or transformations.

Data Mining: A KDD Process

Knowledge Discovery Process - Application Domain
First and foremost, you must understand your data and your business. For example, you may wish to increase the response to a direct mail campaign. Do you want to build a model to:
– increase the response rate, or
– increase the value of the response?
Depending on your specific goal, the model you choose may be different.

Knowledge Discovery - Selecting Data
The task of selecting data begins with deciding what data is needed to solve the problem.
Issues:
– Database incompatibility
– Data may be in an obscure form
– Data is incomplete

Knowledge Discovery - Preparing the Data
Data may have to be loaded from legacy systems or external sources, then stored, cleaned, and validated.
Issues:
– Data may be in a format incompatible with its end use
– Data may have many missing, incomplete, or erroneous values
– Field descriptions may be unclear or confusing, or may have different meanings depending on the source
– Data may be stale
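The cleaning step can be sketched in a few lines. This is a minimal illustration, assuming records are Python dicts and that a missing (`None`) or out-of-range value should be replaced by the median of the valid values; the helper `clean_field`, the field name `age`, the valid range, and the records are all hypothetical, and median imputation is only one of several reasonable policies.

```python
from statistics import median


def clean_field(records, field, valid_range):
    """Replace missing or out-of-range values of `field` with the median of the valid values."""
    low, high = valid_range

    def is_valid(value):
        return value is not None and low <= value <= high

    valid_values = [r.get(field) for r in records if is_valid(r.get(field))]
    fill_value = median(valid_values)
    # Invalid records are replaced by modified copies; the input list is not mutated.
    return [
        r if is_valid(r.get(field)) else {**r, field: fill_value}
        for r in records
    ]


# Hypothetical customer records with one missing and one erroneous age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 290},   # erroneous value
    {"id": 4, "age": 41},
    {"id": 5, "age": 29},
]
cleaned = clean_field(records, "age", (0, 120))
print([r["age"] for r in cleaned])  # [34, 34, 34, 41, 29]
```

Validating against an explicit range catches erroneous values as well as missing ones; stale data and conflicting field definitions still need domain knowledge rather than a rule like this.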

Knowledge Discovery - Transforming Data
Considerable planning and knowledge of your data should go into the transformation decision. Data transformations are at the heart of developing a sound model.

Knowledge Discovery - Types of Transformations
– Feature construction: applying a mathematical formula to existing data features
– Feature subset selection: removing columns that are not pertinent, are redundant, or contain uninteresting predictors
– Aggregating data: grouping records together and finding sums, maximums, minimums, or averages
– Binning the data: breaking up continuous ranges into discrete segments
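The four transformation types can be illustrated on a toy table. This is a sketch using only the standard library; the transaction rows, the column names (`store_code`, `unit_price`, `amount_band`), and the bin edges of 50 and 200 are hypothetical choices made for the example.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical transaction records.
rows = [
    {"customer": "A", "amount": 120.0, "quantity": 3, "store_code": 7},
    {"customer": "A", "amount": 30.0, "quantity": 1, "store_code": 7},
    {"customer": "B", "amount": 450.0, "quantity": 9, "store_code": 2},
    {"customer": "C", "amount": 75.0, "quantity": 5, "store_code": 2},
]

# Feature construction: derive a new feature from existing ones with a formula.
rows = [{**r, "unit_price": r["amount"] / r["quantity"]} for r in rows]

# Feature subset selection: drop a column judged not pertinent to the problem.
rows = [{k: v for k, v in r.items() if k != "store_code"} for r in rows]

# Aggregation: group records by customer and summarize the amounts.
amounts = defaultdict(list)
for r in rows:
    amounts[r["customer"]].append(r["amount"])
summary = {
    customer: {"sum": sum(v), "max": max(v), "min": min(v), "mean": sum(v) / len(v)}
    for customer, v in amounts.items()
}


# Binning: map a continuous amount onto discrete segments.
def bin_amount(amount, edges=(50.0, 200.0), labels=("low", "medium", "high")):
    return labels[bisect_right(edges, amount)]


rows = [{**r, "amount_band": bin_amount(r["amount"])} for r in rows]
print(summary["A"])  # {'sum': 150.0, 'max': 120.0, 'min': 30.0, 'mean': 75.0}
print([r["amount_band"] for r in rows])  # ['medium', 'low', 'high', 'medium']
```

With `bisect_right`, a value equal to an edge falls into the higher bin (for example, 50.0 becomes "medium"); choosing the edges themselves is exactly the kind of planning decision the slide refers to.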

Knowledge Discovery - Data Mining
The process of building models differs among:
– Supervised learning (classification, regression, and time-series problems)
– Unsupervised learning (database segmentation)
– Pattern identification and description (link analysis)
Once you have decided on the model type, you must choose a method for building the model (decision tree, neural net, k-nearest neighbor), then the algorithm (for example, backpropagation).
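As an illustration of one of the methods listed above, here is a minimal k-nearest-neighbor classifier for a supervised learning problem. It is a sketch assuming numeric feature vectors, Euclidean distance, and a simple majority vote; the function `knn_predict`, the training points, and the `responder`/`non-responder` labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train = [
    ((1.0, 1.0), "responder"),
    ((1.2, 0.8), "responder"),
    ((0.9, 1.1), "responder"),
    ((5.0, 5.0), "non-responder"),
    ((5.2, 4.9), "non-responder"),
    ((4.8, 5.1), "non-responder"),
]
print(knn_predict(train, (1.1, 1.0)))  # responder
```

Using an odd k with two classes avoids tied votes; `math.dist` requires Python 3.8 or later.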

Knowledge Discovery - Analyze and Deploy
Once the model is built, its implications must be understood. Graphical representations of the relationships between independent and dependent variables may be necessary. Attention should also be focused on important aspects of the model, such as outliers or value.
Model deployment may mean writing a new application, embedding the model into an existing system, or applying it to an existing data set. Model monitoring should be established.

Required Effort for Each KDD Step

What Data Mining Will Not Do
– Automatically find answers to questions you do not ask
– Constantly monitor your database for new and interesting relationships
– Eliminate the need to understand your business and your data
– Remove the need for good data analysis skills

Data Mining Models and Methods

Deviation Detection
– Identify outliers in a dataset
– Typical techniques: probability distribution contrasts, supervised/unsupervised learning
– Hypothetical example: point-of-sale fraud detection
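One simple form of probability distribution contrast is to flag values that lie far from the mean, measured in standard deviations. The sketch below assumes the amounts are roughly normally distributed and uses a threshold of 3 standard deviations; the helper `zscore_outliers` and the point-of-sale amounts are hypothetical. A large outlier also inflates the standard deviation, so robust statistics such as the median absolute deviation are often preferred in practice.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical point-of-sale amounts for one terminal; the last one is suspicious.
amounts = [20, 22, 19, 21, 20, 23, 18, 22, 21, 20, 19, 250]
print(zscore_outliers(amounts))  # [11]
```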

Fraud and Inappropriate Practice Prevention
Background: Through regular review, HR has developed a collaborative relationship with its Sales Associates (SAs). Semi-annual meetings allow each SA's practices to be reviewed against those of similar SAs across the country.
Goal: The approach is aimed at modifying SA behavior to promote better service rather than at investigating and prosecuting SAs, although both strategies are employed.

Fraud and Inappropriate Practice Prevention
Business Objective: The focus of this project was the recent, steady 12% annual rise in overrides. The overall business objective of the project was to find a way to ensure that the overrides were appropriate without negatively affecting the service provided by the SAs.

Fraud and Inappropriate Practice Prevention
Approach:
– Identify potentially fraudulent overrides, or overrides arising from inappropriate practices.
– Develop general profiles of SA practices in order to compare the practice behavior of individual SAs.

Fraud and Inappropriate Practice Prevention

Database Segmentation
– Regroup datasets into clusters that share common characteristics
– Typical techniques: unsupervised learning (SOMs, k-means)
– Hypothetical example: cluster all similar regimes (financial, free-form text)
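A minimal k-means sketch (Lloyd's algorithm) shows how segmentation regroups records by similarity without any labels. It assumes numeric points as tuples, Euclidean distance, and random initialization from the data with a fixed seed; the function `kmeans` and the two-dimensional points are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct input points
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two well-separated groups of records.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, 2)
print(sorted(sorted(c) for c in clusters))
```

k-means can converge to a local optimum that depends on the initial centroids, so in practice it is usually run several times with different seeds.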

Self-Organizing Maps Example - Text Clustering
This data is considered to be confidential and proprietary to Caterpillar and may only be used with prior written consent from Caterpillar.

Predictive Modeling
– Past data predicts future response
– Typical techniques: supervised learning (artificial neural networks, decision trees, naïve Bayesian classifiers)
– Hypothetical example (classification): who is most likely to respond to a direct mailing?
– Hypothetical example (prediction): how will the German Stock Price Index perform over the next 3, 5, and 7 days?
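For the direct-mailing classification example, a naïve Bayesian classifier can be sketched with simple counting. This assumes categorical features that are conditionally independent given the class, with add-one (Laplace) smoothing; the functions `train_naive_bayes` and `posterior`, the two features, and the mailing history are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for categorical features."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> Counter of values
    vocabulary = defaultdict(set)        # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            vocabulary[j].add(value)
    return class_counts, value_counts, vocabulary


def posterior(model, row):
    """Return P(class | row), assuming features are conditionally independent given the class."""
    class_counts, value_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(class)
        for j, value in enumerate(row):
            # Laplace (add-one) smoothed estimate of P(value | class).
            score *= (value_counts[(label, j)][value] + 1) / (n_label + len(vocabulary[j]))
        scores[label] = score
    evidence = sum(scores.values())
    return {label: s / evidence for label, s in scores.items()}


# Hypothetical mailing history: (age band, bought before?) -> responded?
rows = [("young", "yes"), ("young", "no"), ("old", "yes"),
        ("old", "no"), ("young", "yes"), ("old", "no")]
labels = ["yes", "no", "yes", "no", "yes", "no"]
model = train_naive_bayes(rows, labels)
print(posterior(model, ("young", "yes")))
```

Here the prior for each class is 0.5, and the smoothed likelihoods raise the posterior for "yes" to 6/7 (about 0.857) for a young prior buyer.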

Predictive Modeling - Prior Probabilities

Predictive Modeling - Posterior Probabilities
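The figures on these two slides are not recoverable. For reference, the standard link between the prior probability $P(c)$ of a class $c$ and its posterior probability given an observed record $x = (x_1, \dots, x_n)$ is Bayes' rule. The naïve Bayesian classifier additionally assumes that the features are conditionally independent given the class, which gives the second form:

```latex
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)},
\qquad
P(c \mid x_1, \dots, x_n) = \frac{P(c)\prod_{j=1}^{n} P(x_j \mid c)}{\sum_{c'} P(c')\prod_{j=1}^{n} P(x_j \mid c')}
```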

Link Analysis
– Relationships between records/attributes in datasets
– Typical techniques: rule association, sequence discovery
– Hypothetical example (rule association): when people buy a hammer, they also buy nails 50% of the time
– Hypothetical example (sequence discovery): when people buy a hammer, they also buy nails within the next 3 months 18% of the time, and within the subsequent 3 months 12% of the time
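The sequence-discovery example can be sketched as counting, among customers who bought the antecedent item, how many bought the consequent item within a given time window afterward. This assumes purchase histories keyed by customer with day numbers, approximates 3 months as 90 days, and excludes same-day purchases (those fall under basket analysis); the function `sequence_rate` and the histories are hypothetical.

```python
def sequence_rate(histories, first, then, window):
    """Fraction of customers who bought `first` and later bought `then` within `window`.

    `histories` maps customer -> list of (item, day) purchases; `window` is a
    (start, end) pair of day offsets, counted as start < offset <= end.
    """
    start, end = window
    eligible = hits = 0
    for purchases in histories.values():
        first_days = [day for item, day in purchases if item == first]
        if not first_days:
            continue  # customer never bought the antecedent item
        eligible += 1
        t0 = min(first_days)  # earliest purchase of the antecedent
        if any(item == then and start < day - t0 <= end for item, day in purchases):
            hits += 1
    return hits / eligible if eligible else 0.0


histories = {
    "c1": [("hammer", 0), ("nails", 30)],
    "c2": [("hammer", 10), ("nails", 150)],
    "c3": [("hammer", 5)],
    "c4": [("nails", 3)],
    "c5": [("hammer", 0), ("nails", 200)],
    "c6": [("hammer", 0), ("nails", 60)],
}
print(sequence_rate(histories, "hammer", "nails", (0, 90)))    # 0.4
print(sequence_rate(histories, "hammer", "nails", (90, 180)))  # 0.2
```

Each window is evaluated independently, so a customer who bought nails in both windows would count toward both rates.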

Link Analysis (Rule Association)
Given a database, find all associations of the form:
IF <LHS> THEN <RHS>
– Prevalence = frequency with which the LHS and RHS occur together
– Predictability = fraction of the records containing the LHS that also contain the RHS
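Prevalence and predictability correspond to what are usually called support and confidence. The sketch below computes both for a rule and performs a brute-force search over single-item rules with minimum thresholds. It assumes each transaction is a Python set of items; the functions `rule_stats` and `find_rules`, the baskets, and the thresholds are hypothetical. Practical systems use algorithms such as Apriori to handle multi-item rules efficiently.

```python
from itertools import permutations


def rule_stats(transactions, lhs, rhs):
    """Prevalence and predictability of the rule IF lhs THEN rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in transactions if lhs <= t]
    with_both = [t for t in with_lhs if rhs <= t]
    prevalence = len(with_both) / len(transactions)
    predictability = len(with_both) / len(with_lhs) if with_lhs else 0.0
    return prevalence, predictability


def find_rules(transactions, min_prevalence, min_predictability):
    """Brute-force search over all single-item rules IF a THEN b."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        prevalence, predictability = rule_stats(transactions, {a}, {b})
        if prevalence >= min_prevalence and predictability >= min_predictability:
            rules.append((a, b, prevalence, predictability))
    return rules


baskets = [
    {"hammer", "nails"},
    {"hammer"},
    {"hammer", "nails", "glue"},
    {"nails"},
    {"glue"},
    {"hammer", "saw"},
]
print(rule_stats(baskets, {"hammer"}, {"nails"}))  # (0.3333333333333333, 0.5)
for a, b, prev, pred in find_rules(baskets, 0.3, 0.5):
    print(f"IF {a} THEN {b}: prevalence={prev:.0%}, predictability={pred:.0%}")
```

In this toy data, half of the hammer baskets also contain nails, mirroring the slide's "50% of the time" example; the reverse rule has the same prevalence but a higher predictability.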

Rule Association - Basket Analysis

Association Rules - Basket Analysis
– … implies …: prevalence = 4.99%, predictability = 22.89%
– … implies …: prevalence = 0.94%, predictability = 28.14%
– … implies …: prevalence = 2.11%, predictability = 38.22%
– … implies …: prevalence = 1.36%, predictability = 41.02%
– … implies …: prevalence = 1.16%, predictability = 38.01%

Requirements for Successful Data Mining
– There is a sponsor for the application.
– The business case for the application is clearly understood and measurable, and the objectives are likely to be achievable given the resources being applied.
– The application has a high likelihood of having a significant impact on the business.
– Business domain knowledge is available.
– Good-quality, relevant data is available in sufficient quantities.

Requirements for Successful Data Mining
– The right people: business domain, data management, and data mining experts; people who have "been there and done that"
For a first-time project, the following criteria could be added:
– The scope of the application is limited.
– Try to show results within 3–6 months.
– The data sources should be limited to those that are well known, relatively clean, and freely accessible.

Rapid KDD Development Environment

Rapid KDD Development Environment

Why Information Visualization?
– Gain insight into the contents and complexity of the database being analyzed
– Vast amounts of underutilized data
– Time-critical decisions are hampered
– Key information is difficult to find
– Results presentation
– Reduced perceptual, interpretative, and cognitive burden

Typical Data
– Abstract corporate data
– Mostly discrete, not continuous
– Often multidimensional
– Quantitative
– Text
– Historical or real-time

Typical Applications
Historical data analysis:
– Marketing data mining analysis
– Portfolio performance attribution
– Fraud/surveillance analysis
Decision support:
– Financial risk management
– Operations planning
– Military strategic planning

Typical Applications (cont.)
Monitoring real-time status:
– Industrial process control
– Capital markets trading management
– Network monitoring
Management reporting:
– Financial reporting
– Sales and marketing reporting

Marketing Data Mining Analysis

Risk Management

Capital Markets Trading Management

Network Monitoring

Industrial Process Control

Crisis Monitoring
[Figure: ground (student) view and aerial/oracular (instructor) view. Color code for compartment status: normal, ignited, destroyed, extinguished, fire alarm, flooding, engulfed.]

3D Financial Reporting

Statistics Visualizer

Scatter Visualizer

Splat Visualizer

Tree Visualizer

Map Visualizer

Decision Tree