Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Part I Data Mining Fundamentals Chapter 1 Data Mining: A First View Jason C. H. Chen, Ph.D. Professor.

Presentation transcript:

Part I: Data Mining Fundamentals
Chapter 1: Data Mining: A First View
Jason C. H. Chen, Ph.D., Professor of MIS, School of Business Administration, Gonzaga University, Spokane, WA

1.1 Data Mining: A Definition

Data Mining: A Definition
The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

Induction-based Learning: the process of forming general concept definitions by observing specific examples of the concepts to be learned.
Knowledge Discovery in Databases (KDD): the application of the scientific method to data mining. Data mining is one step of the KDD process.

Data Mining Examples
A telephone company used a data mining tool to analyze its customer data warehouse. The tool found about 10,000 supposedly residential customers who were spending over $1,000 a month on phone bills. After further study, the company discovered that they were really small business owners trying to avoid paying business rates.

Other Data Mining Examples
65% of customers who did not use the credit card in the last six months are 88% likely to cancel their accounts.
If age $25,000 then the minimum loan term is 10 years.
82% of customers who bought a new TV 27" or larger are 90% likely to buy an entertainment center within the next 4 weeks.
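Statements of the form "customers who did X are N% likely to do Y" resemble what is usually called the confidence of a rule. The slides do not say how the percentages were computed, so the sketch below is only one common way to quantify such a rule. The transactions and item names are hypothetical.

```python
# Hypothetical purchase histories; each set is one customer's purchases.
transactions = [
    {"tv", "entertainment_center"},
    {"tv", "entertainment_center"},
    {"tv"},
    {"entertainment_center"},
    {"dvd_player"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing antecedent, the fraction that also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: bought a TV -> bought an entertainment center.
rule_support = support({"tv", "entertainment_center"}, transactions)
rule_confidence = confidence({"tv"}, {"entertainment_center"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

With this toy data, two of the three TV buyers also bought an entertainment center, so the rule's confidence is two thirds.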

What Can Computers Learn?

Four Levels of Learning
Fact: a simple statement of truth.
Concept: a set of objects, symbols, or events grouped together because they share certain characteristics.
Procedure: a step-by-step course of action to achieve a goal. We use procedures in our everyday functioning as well as in the solution of difficult problems.
Principle: the highest level of learning. Principles are general truths or laws that are basic to other truths.
Source: Merrill and Tennyson, 1977; p. 5 of the text

Concepts
Computers are good at learning concepts. Concepts are the output of a data mining session.
Three Concept Views: Classical View, Probabilistic View, Exemplar View

Three Concept Views
Classical View: holds that all concepts have definite defining properties.
Probabilistic View: concepts are represented by properties that are probable of concept members.
Exemplar View: a given instance is judged to be an example of a particular concept if it is similar enough to a set of one or more known examples of the concept.

Figure - A hierarchy of data mining strategies
Data Mining Strategies
  Unsupervised Clustering: no output attributes
  Supervised Learning
    Classification: categorical/discrete output (current behavior)
    Estimation: numeric output
    Prediction: future outcome (categorical/numeric)
  Market Basket Analysis

Supervised Learning
Supervised learning is the process of building classification models using data instances of known origin.
Two purposes:
1. Build a learner (classification) model using data instances of known origin. This is an induction process.
2. Use the model to determine the outcome of new instances of unknown origin. This is a deduction process.
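The two purposes can be sketched in a few lines of Python. The learner here is a one-attribute rule learner (often called 1R), chosen only because it is short; the slides do not prescribe it. The training instances are hypothetical and are not the rows of Table 1.1.

```python
from collections import Counter, defaultdict

# Hypothetical training instances of known origin: (attributes, class).
training = [
    ({"swollen_glands": "yes", "fever": "yes"}, "strep throat"),
    ({"swollen_glands": "yes", "fever": "no"}, "strep throat"),
    ({"swollen_glands": "no", "fever": "yes"}, "cold"),
    ({"swollen_glands": "no", "fever": "yes"}, "cold"),
    ({"swollen_glands": "no", "fever": "no"}, "allergy"),
]


def induce_one_rule(instances):
    """Induction: pick the single attribute whose values best predict the class."""
    best = None
    for attr in instances[0][0].keys():
        # Count class labels for each value of this attribute.
        counts = defaultdict(Counter)
        for attrs, label in instances:
            counts[attrs[attr]][label] += 1
        # Map each attribute value to its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for attrs, label in instances if rule[attrs[attr]] != label)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


def classify(model, attrs):
    """Deduction: apply the induced rule to an instance of unknown origin."""
    attr, rule, _ = model
    return rule[attrs[attr]]


model = induce_one_rule(training)
print(model[0], model[2])
print(classify(model, {"swollen_glands": "yes", "fever": "no"}))
```

Building `model` from labeled instances is the induction step; calling `classify` on an unlabeled instance is the deduction step.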

Supervised Learning: A Decision Tree Example

Decision Tree
A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
Table 1.1 – Hypothetical Training Data for Disease Diagnosis

Figure 1.1 – A decision tree for the data in Table 1.1

Table 1.2 – Data Instances with an Unknown Classification
Table 1.1 – Hypothetical Training Data for Disease Diagnosis

Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
We can translate any decision tree into a set of production rules. They are rules of the form: IF <antecedent conditions> THEN <consequent conditions>
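The three production rules map directly onto nested conditionals. The following is a minimal sketch; the function name, the boolean encoding of Yes/No, and the sample patients are assumptions made for illustration. The patients are not the rows of Table 1.2, which the transcript does not reproduce.

```python
def diagnose(swollen_glands: bool, fever: bool) -> str:
    """Apply the three production rules, testing Swollen Glands first."""
    if swollen_glands:
        return "Strep Throat"
    if fever:
        return "Cold"
    return "Allergy"


# Hypothetical unclassified patients.
new_patients = [
    {"swollen_glands": True, "fever": False},
    {"swollen_glands": False, "fever": True},
    {"swollen_glands": False, "fever": False},
]

diagnoses = [diagnose(p["swollen_glands"], p["fever"]) for p in new_patients]
print(diagnoses)
```

Each path from the root of the tree to a leaf becomes one rule, which is why the rule order in the function mirrors the order of the tests in the tree.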

Unsupervised Clustering
A data mining method that builds models from data without predefined classes (see Table 1.3). Data instances are grouped together based on a similarity scheme defined by the clustering system. With the help of one or several evaluation techniques, it is up to us to decide the meaning of the formed clusters.
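As an illustration of grouping by similarity, here is a small one-dimensional k-means sketch. K-means is one possible clustering scheme, not one named by the slides, and the trades-per-month values are hypothetical rather than taken from Table 1.3.

```python
def k_means_1d(values, k, max_iter=100):
    """Group numbers into k clusters by distance to the nearest cluster mean."""
    centroids = list(values[:k])  # deterministic start: the first k values
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster with the closest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return clusters, centroids


# Hypothetical trades per month for six investors; no class labels are given.
trades_per_month = [2, 3, 4, 20, 22, 25]
clusters, centroids = k_means_1d(trades_per_month, k=2)
print(clusters)
```

The algorithm returns two groups but says nothing about what they mean. Deciding that they might correspond to, say, occasional and frequent traders is the interpretation step left to the analyst.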

Table 1.3 – Acme Investors Incorporated

Possible Questions
Questions for supervised learning:
1. Can I develop a general profile of an online investor? If so, what characteristics distinguish online investors from investors who use a broker?
2. Can I determine if a new customer who does not initially open a margin account is likely to do so in the future?
3. Can I build a model able to accurately predict the average number of trades per month for a new investor?
4. What characteristics differentiate female and male investors?
Questions for unsupervised learning:
1. What attribute similarities group customers of Acme Investors together?
2. What differences in attribute values segment the customer database?

Is Data Mining Appropriate for My Problem?

Data Mining or Data Query?
Shallow Knowledge: factual; tools used: DBMS/SQL.
Multidimensional Knowledge: factual; tools used: OLAP.
Hidden Knowledge: patterns or regularities in data that cannot be easily found; tools used: data mining.
Deep Knowledge: knowledge stored in a database that can only be found if we are given some direction.

Data Mining vs. Data Query: An Example
Use a data query if you already know, at least roughly, what you are looking for.
Use data mining to find regularities in data that are not obvious.
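The query side of this contrast can be sketched with Python's built-in `sqlite3` module. The table, column names, and rows are hypothetical; the condition echoes the telephone company example from earlier in the deck. The point is that the analyst must already suspect the pattern (residential accounts with very large bills) in order to write the `WHERE` clause.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, account_type TEXT, monthly_bill REAL)"
)
rows = [
    ("residential", 45.0),
    ("residential", 1250.0),
    ("business", 1800.0),
    ("residential", 1010.5),
]
conn.executemany(
    "INSERT INTO customers (account_type, monthly_bill) VALUES (?, ?)", rows
)

# A data query: the condition is supplied by the analyst, not discovered.
suspects = conn.execute(
    "SELECT id, monthly_bill FROM customers "
    "WHERE account_type = 'residential' AND monthly_bill > 1000 "
    "ORDER BY id"
).fetchall()
print(suspects)
conn.close()
```

A data mining tool would instead be given the table without the condition and asked to surface unusual groupings itself.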

Expert Systems or Data Mining?

Expert System and Knowledge Engineer
An expert system is a computer program that emulates the problem-solving skills of one or more human experts.
A knowledge engineer is a person trained to interact with an expert in order to capture their knowledge.

A Simple Data Mining Process Model

Figure - A simple data mining process model
Components shown: Operational Database, Data Warehouse, SQL Queries, Data Mining, Interpretation & Evaluation, Result Application

Characteristics of Data Warehouse
Data Warehouse: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes.
Subject-oriented: e.g. customers, patients, students, products.
Integrated: consistent naming conventions, formats, and encoding structures; drawn from multiple data sources.
Time-variant: can study trends and changes.
Non-updatable: read-only, periodically refreshed.
Data Mart: a data warehouse that is limited in scope.

A four-step process for performing a data mining session
1. Assembling the data: operational database (relational databases and flat files) vs. data warehouse.
2. Mining the data (giving the data to a mining tool): instances for building the model or testing the model.
3. Interpreting the results.
4. Result application.
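The four steps can be traced in a short Python sketch. Everything concrete here is an assumption made for illustration: the instances, the 70/30 split, and the "model", which is just a threshold halfway between the two class means standing in for a real mining tool.

```python
# Step 1: assemble the data. Hypothetical instances: (trades_per_month, channel).
instances = [
    (12, "online"), (15, "online"), (2, "broker"), (9, "online"), (1, "broker"),
    (11, "online"), (14, "online"), (10, "online"), (3, "broker"), (13, "online"),
]

# Step 2: mine the data. Use 70% of the instances to build the model
# and hold out the rest to test it.
split = len(instances) * 7 // 10
training, test = instances[:split], instances[split:]


def build_model(training):
    """Learn a threshold halfway between the mean of each class."""
    online = [x for x, label in training if label == "online"]
    broker = [x for x, label in training if label == "broker"]
    return (sum(online) / len(online) + sum(broker) / len(broker)) / 2


def classify(threshold, trades_per_month):
    """Predict the channel from the number of trades per month."""
    return "online" if trades_per_month >= threshold else "broker"


threshold = build_model(training)

# Step 3: interpret the results by checking accuracy on the held-out instances.
correct = sum(1 for x, label in test if classify(threshold, x) == label)
accuracy = correct / len(test)
print(f"threshold={threshold:.2f}, test accuracy={accuracy:.2f}")

# Step 4: apply the result to a new, unlabeled investor.
print(classify(threshold, 8))
```

Keeping the test instances out of model building is what makes the accuracy in step 3 a fair check on the model before it is applied in step 4.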

Data Mining Applications (p.24)
Fraud detection
Health care
Business and finance
Scientific applications
Sports and gaming

Figure - Customer Intrinsic Value (labels shown: A, B, C)