Data Mining Using IBM Intelligent Miner
Presented by: Qiyan (Jennifer) Huang



Outline
Introduction
Mining Process
Main Functionalities of Intelligent Miner
Other Data Mining Products
Data Mining and Privacy
Summary
References

What is Data Mining
Data mining: discovering interesting patterns from large amounts of data
– Knowledge discovery (mining) in databases (KDD), data/pattern analysis, information harvesting, business intelligence, etc.

Evolution of Database Technology
1960s:
– Data collection, database creation
1970s:
– Relational data model, relational DBMS implementation
1980s ~ present:
– RDBMS, advanced data models
1990s to 2000s:
– Data mining and data warehousing, multimedia databases, and Web databases

Data Mining vs. Database Query
Database
– Find all customers who have purchased milk.
– Identify customers who have purchased more than $10,000 in the last month.
Data Mining
– Find all items which are frequently purchased with milk (association rules).
– Identify customers with similar buying habits (clustering).
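The contrast between the two kinds of question can be sketched in a few lines of Python. This is only an illustration: the customer names, the toy transactions, and both helper functions are hypothetical and are not taken from the slides or from Intelligent Miner.

```python
from collections import Counter

# Hypothetical toy transactions: (customer, set of purchased items).
transactions = [
    ("alice", {"milk", "bread"}),
    ("bob", {"milk", "bread", "eggs"}),
    ("carol", {"beer"}),
    ("dave", {"milk", "bread"}),
]

def customers_who_bought(item, data):
    """Database-style query: the question names the exact condition to match."""
    return sorted(customer for customer, items in data if item in items)

def items_bought_with(item, data):
    """Mining-style question: count what co-occurs with the item, most frequent first."""
    counts = Counter()
    for _, items in data:
        if item in items:
            counts.update(items - {item})
    return counts.most_common()

milk_buyers = customers_who_bought("milk", transactions)
milk_companions = items_bought_with("milk", transactions)
```

The query returns exactly the rows that satisfy a stated condition, while the mining-style function surfaces a pattern (items that tend to appear with milk) that the question did not spell out.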

Data Mining Process (KDD)
[Figure: Databases, Data Cleaning, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]
Source: J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2001

About DB2 Intelligent Miner
DB2 Intelligent Miner for Data “focused on the large-scale mining, such as large volumes of data, parallel data mining on Windows NT, Sun Solaris, and OS/390” – IBM

Main Functionalities
Cluster analysis
– Group the data that share similar trends and patterns
Classification
– Predict the outcome based on historical data
Association analysis
– Find frequent patterns
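As an illustration of the first functionality, grouping similar records, here is a minimal k-means sketch on one-dimensional data. k-means is chosen only for brevity; the slides do not say which clustering algorithm Intelligent Miner applies, and the spend values and starting centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
```

With these values the low spenders and the high spenders end up in separate groups, which is the kind of segmentation the clustering slides describe.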

Classification
This follows an example from Quinlan’s ID3.

Classification

Classification

Association
– Association rule: identifies relationships between items
– Example: “30% of customers buy shirts across all the transactions, and 60% of these customers will also buy a tie.”
Confidence: the share of shirt buyers who also buy a tie, here 60%.
Support: the share of all transactions in which shirt and tie are bought together. If that is observed in 12% of all transactions, the support is 12%. (Taken literally, the 30% and 60% figures in the example would imply 30% × 60% = 18%.)
Lift: confidence divided by the support of the rule head (tie). The figure 60% / 30% = 2 holds if ties also appear in 30% of all transactions.
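The three measures can be worked through on a small data set. The sketch below is illustrative only: `rule_metrics` is a hypothetical helper, and the 100 transactions are constructed so that 30% contain a shirt, 60% of those also contain a tie, and (as an added assumption) ties appear in 30% of all transactions. With these counts the joint support comes out at 18%, not the 12% used as an illustration on the slide.

```python
def rule_metrics(transactions, body, head):
    """Return (support, confidence, lift) for the rule body => head."""
    n = len(transactions)
    body_count = sum(1 for t in transactions if body <= t)
    head_count = sum(1 for t in transactions if head <= t)
    both_count = sum(1 for t in transactions if (body | head) <= t)
    support = both_count / n               # share of transactions with body and head
    confidence = both_count / body_count   # share of body transactions that also have head
    lift = confidence / (head_count / n)   # confidence relative to the head's baseline share
    return support, confidence, lift

# Hypothetical data: 100 transactions, 30 with a shirt (18 of them with a tie),
# 12 with a tie only, and 58 with neither item.
transactions = (
    [{"shirt", "tie"}] * 18
    + [{"shirt"}] * 12
    + [{"tie"}] * 12
    + [{"other"}] * 58
)
support, confidence, lift = rule_metrics(transactions, {"shirt"}, {"tie"})
```

A lift above 1 means that buying a shirt makes a tie purchase more likely than its baseline rate; a lift of 1 would mean the two are independent.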

Association
[Table of mined rules. Columns: Support (%), Confidence (%), Type, Lift, Rule Body, Rule Head. Only the rule bodies and heads survive in the transcript.]
Rule Body => Rule Head
[203] + [1207] => [1716]
[203] + [1719] => [1716]
[202] + [802] => [1716]
[203] + [802] => [1716]
[203] + [705] => [1716]
[202] + [1718] => [1716]
[711] + [203] => [710]
[202] + [1702] => [1703]
[202] + [1207] => [1703]
[201] + [711] => [710]
[201] + [1702] => [1703]

Data Mining Products
More than 50 commercial data mining tools
Wide range of pricing
– SAS Institute’s Enterprise Miner ~ $80k
– SPSS Inc. Clementine ~ $75k
– IBM Intelligent Miner ~ $60k
– Desktop products start at a few hundred dollars

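Data Mining Products
Data Mining Product Comparison on Algorithm
Algorithm (columns: IBM, SAS, SPSS)
Neural Network: √ √ √
Decision Tree: √ √ √
Clustering: √ √
Association: √ √
Nearest Neighbour: √
Kohonen Self-Organizing Map: √ √
[Column alignment was lost in the transcript, so rows with fewer than three marks do not show which products they apply to.]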

Data Mining & Privacy
Release a limited subset of data
– Hide attributes that are potentially related to personal information
Release encrypted data
Audit to detect misuse of data
Set up a data mining controller

Summary
Introduction to Data Mining
A KDD Data Mining Process
Functionalities of Intelligent Miner
Commercial Data Mining Tools
Data Mining & Privacy

References
Angoss. Whitepaper. Retrieved Oct 26, 2003.
C. Clifton & D. Marks. Security and Privacy Implications of Data Mining. 1996.
D. W. Abbott, I. P. Matkovsky & J. F. Elder IV. An Evaluation of High-end Data Mining Tools. Elder Research. Retrieved Oct 28.
IBM. DB2 Intelligent Miner. Retrieved Oct 26, 2003.
J. F. Elder & D. W. Abbott. A Comparison of Leading Data Mining Tools. August 1998.
J. Han & M. Kamber. Data Mining: Concepts and Techniques. 2001. Retrieved Nov 10, 2003.
Robert Grossman. Retrieved Oct 20.
SPSS. Retrieved Nov 12, 2003.

Evolution of Database Technology
1960s:
– Data collection, database creation, and network DBMS
1970s:
– Relational data model, relational DBMS implementation
1980s:
– RDBMS, advanced data models
1990s to 2000s:
– Data mining and data warehousing, multimedia databases, and Web databases

Data Mining: On What Kind of Data?
Data sources
– Relational databases
– Data warehouses
– Transactional databases
– WWW
Data types
– Audio
– Image
– Text

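Output: A Decision Tree for “buys_computer”
[Figure: decision tree. Root node: age? with branches <=30, overcast, >40. Internal nodes: student? (branches no, yes) and credit rating? (branches fair, excellent). Leaves: no, yes.]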
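A decision tree of this shape reads naturally as nested conditionals. The sketch below is a guess at the figure, not a transcription: the transcript does not preserve which leaf hangs off which branch, so the leaf assignments follow the common textbook version of the buys_computer example, and the middle age branch (labelled "overcast" in the transcript) is assumed to cover ages 31 to 40 and to be a pure "yes" leaf.

```python
def buys_computer(age, student, credit_rating):
    """Walk the tree: age at the root, then student? or credit rating?."""
    if age <= 30:
        # Youngest branch: the outcome depends on student status (assumed leaf labels).
        return "yes" if student else "no"
    if age > 40:
        # Oldest branch: the outcome depends on credit rating (assumed leaf labels).
        return "yes" if credit_rating == "fair" else "no"
    # Middle age branch, assumed to be a pure "yes" leaf.
    return "yes"

prediction = buys_computer(age=25, student=True, credit_rating="fair")
```

Each root-to-leaf path corresponds to one classification rule, for example "age <= 30 and student, then buys a computer".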

Neural network kk - f weighted sum Input vector x output y Activation function weight vector w  w0w0 w1w1 wnwn x0x0 x1x1 xnxn

Neural network

Applications of Clustering
Pattern Recognition
Image Processing
Economic Science (especially market research)
WWW
– Document classification
– Cluster Weblog data to discover groups of similar access patterns

Data Mining & Privacy
[Figure: Data Mining Tool, Mining Controller, Data warehouse]

Examples of Clustering Applications
Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
Insurance: Identifying groups of motor insurance policy holders with a high average claim cost
City-planning: Identifying groups of houses according to their house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters should be clustered along continent faults

Association
Association and pattern analysis
– Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
– Examples (bracketed values are support and confidence):
buys(x, “diapers”) => buys(x, “beers”) [0.5%, 60%]
major(x, “CS”) ^ takes(x, “DB”) => grade(x, “A”) [1%, 75%]

Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
– Object-oriented and object-relational databases
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW

Steps of a KDD Process
Learning the application domain
– Relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of effort!)
Data reduction and transformation
– Find useful features, dimensionality/variable reduction, invariant representation
Choosing functions of data mining
– Summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
– Visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge

Strengths and Weaknesses
Strengths
– Algorithm breadth
– Graphical output
– Available for PC and mainframe environments
Weaknesses
– No automation
– Data has to reside in IBM’s database system