Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Part I Data Mining Fundamentals Chapter 1 Data Mining: A First View Jason C. H. Chen, Ph.D. Professor.


1 Part I: Data Mining Fundamentals
Chapter 1 Data Mining: A First View
Jason C. H. Chen, Ph.D.
Professor of MIS, School of Business Administration
Gonzaga University, Spokane, WA 99223
chen@jepson.gonzaga.edu

2 1.1 Data Mining: A Definition

3 1.1 Data Mining: A Definition
The process of employing one or more computer learning techniques to automatically analyze and extract knowledge from data.

4 Induction-based Learning
The process of forming general concept definitions by observing specific examples of concepts to be learned.
Knowledge Discovery in Databases (KDD)
The application of the scientific method to data mining. Data mining is one step of the KDD process.

5 Data Mining Examples
A telephone company used a data mining tool to analyze its customer data warehouse. The tool found about 10,000 supposedly residential customers who were spending over $1,000 a month on phone bills. After further study, the company discovered that they were really small business owners trying to avoid paying business rates.

6 Other Data Mining Examples
– 65% of customers who did not use the credit card in the last six months are 88% likely to cancel their accounts.
– If age $25,000 then the minimum loan term is 10 years.
– 82% of customers who bought a new TV 27" or larger are 90% likely to buy an entertainment center within the next 4 weeks.
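The slides do not say how percentages like these are computed. As a minimal sketch, assuming they are read as rule support and rule confidence, the arithmetic could look like this. The records, field names, and function names are hypothetical and chosen only for illustration.

```python
# Hypothetical purchase records; field names and values are illustrative only.
records = [
    {"bought_large_tv": True,  "bought_entertainment_center": True},
    {"bought_large_tv": True,  "bought_entertainment_center": True},
    {"bought_large_tv": True,  "bought_entertainment_center": False},
    {"bought_large_tv": False, "bought_entertainment_center": False},
    {"bought_large_tv": False, "bought_entertainment_center": True},
]


def rule_confidence(rows, antecedent, consequent):
    """Fraction of rows satisfying the antecedent that also satisfy the consequent."""
    matching = [r for r in rows if r[antecedent]]
    if not matching:
        return 0.0
    return sum(1 for r in matching if r[consequent]) / len(matching)


def rule_support(rows, antecedent, consequent):
    """Fraction of all rows that satisfy both sides of the rule."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r[antecedent] and r[consequent]) / len(rows)


confidence = rule_confidence(records, "bought_large_tv", "bought_entertainment_center")
support = rule_support(records, "bought_large_tv", "bought_entertainment_center")
print(f"confidence={confidence:.2f} support={support:.2f}")
```

With these five toy records, two of the three TV buyers also bought an entertainment center, so the confidence is 2/3 and the support is 2/5.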

7 1.2 What Can Computers Learn?

8 Four Levels of Learning
– Fact: a simple statement of truth.
– Concept: a set of objects, symbols, or events grouped together because they share certain characteristics.
– Procedure: a step-by-step course of action to achieve a goal. We use procedures in our everyday functioning as well as in the solution of difficult problems.
– Principle: represents the highest level of learning. Principles are general truths or laws that are basic to other truths.
Source: Merrill and Tennyson, 1977; p. 5 of the text

9 Concepts
Computers are good at learning concepts. Concepts are the output of a data mining session.
Three Concept Views
– Classical View
– Probabilistic View
– Exemplar View

10 Three Concept Views
– Classical View: asserts that all concepts have definite defining properties.
– Probabilistic View: concepts are represented by properties that are probable of concept members.
– Exemplar View: states that a given instance is determined to be an example of a particular concept if the instance is similar enough to a set of one or more known examples of the concept.
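One way to make the three views concrete is to write each as a membership test. The sketch below is only one possible reading. The concept (a "good credit risk"), the attribute names, the probabilities, and the thresholds are all hypothetical.

```python
def classical_view(instance, defining_properties):
    """Classical view: membership requires every defining property to hold."""
    return all(instance.get(p, False) for p in defining_properties)


def probabilistic_view(instance, property_probabilities, threshold=0.7):
    """Probabilistic view (one reading): weight each property by how probable
    it is among concept members and accept if enough weight is matched."""
    total = sum(property_probabilities.values())
    matched = sum(p for prop, p in property_probabilities.items() if instance.get(prop))
    return matched / total >= threshold


def exemplar_view(instance, exemplars, max_mismatches=1):
    """Exemplar view: membership requires being similar enough to a known example."""
    def mismatches(exemplar):
        return sum(1 for key, value in exemplar.items() if instance.get(key) != value)
    return any(mismatches(e) <= max_mismatches for e in exemplars)


# Hypothetical applicant and concept description.
applicant = {"owns_home": False, "steady_job": True, "pays_on_time": True}
defining = ["owns_home", "steady_job", "pays_on_time"]
probabilities = {"owns_home": 0.6, "steady_job": 0.9, "pays_on_time": 0.95}
known_good_risks = [{"owns_home": True, "steady_job": True, "pays_on_time": True}]

print(classical_view(applicant, defining))
print(probabilistic_view(applicant, probabilities))
print(exemplar_view(applicant, known_good_risks))
```

The same applicant fails the classical test because one defining property is missing, yet passes the other two, which is the practical difference between the views.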

11 Figure: A hierarchy of data mining strategies
Data Mining Strategies
– Unsupervised Clustering: no output attributes
– Supervised Learning
  – Classification: categorical/discrete output (current behavior)
  – Estimation: numeric output
  – Prediction: future outcome (categorical/numeric)
– Market Basket Analysis

12 Supervised Learning
Supervised learning is the process of building classification models using data instances of known origin.
Two purposes:
1. Build a learner (classification) model using data instances of known origin. This is an induction process.
2. Use the model to determine the outcome of new instances of unknown origin. This is a deduction process.
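The induction and deduction steps can be sketched with a deliberately simple learner that maps each value of a single attribute to its most common class. This is an illustration only, not the method used in the text. The training instances are hypothetical (they are not the data from Table 1.1), and the names `induce` and `deduce` are invented here.

```python
from collections import Counter, defaultdict

# Hypothetical instances of known origin: (attributes, known class).
training = [
    ({"swollen_glands": "yes", "fever": "yes"}, "strep throat"),
    ({"swollen_glands": "yes", "fever": "no"}, "strep throat"),
    ({"swollen_glands": "no", "fever": "yes"}, "cold"),
    ({"swollen_glands": "no", "fever": "yes"}, "cold"),
    ({"swollen_glands": "no", "fever": "no"}, "allergy"),
]


def induce(instances, attribute):
    """Induction: generalize from labeled examples to a value -> class mapping."""
    counts = defaultdict(Counter)
    for attributes, label in instances:
        counts[attributes[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def deduce(model, attribute, instance):
    """Deduction: apply the general model to one new instance of unknown origin."""
    return model.get(instance[attribute])


model = induce(training, "swollen_glands")
new_patient = {"swollen_glands": "yes", "fever": "no"}
prediction = deduce(model, "swollen_glands", new_patient)
print(model)
print(prediction)
```

A single-attribute model like this misclassifies the one allergy case in the toy data, which is why practical learners such as decision trees test more than one attribute.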

13 Supervised Learning: A Decision Tree Example

14 Decision Tree
A tree structure where non-terminal nodes represent tests on one or more attributes and terminal nodes reflect decision outcomes.
Table 1.1 – Hypothetical Training Data for Disease Diagnosis
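A decision tree of this kind can be represented as nested data and classified by walking from the root to a terminal node. The sketch below assumes a tree shape inferred from the production rules on slide 17, because Table 1.1 and Figure 1.1 are not reproduced in this transcript. The dictionary layout and the function name `classify` are illustrative choices.

```python
# Non-terminal nodes are dicts holding an attribute test and its branches;
# terminal nodes are plain strings holding the decision outcome.
tree = {
    "attribute": "swollen_glands",
    "branches": {
        "yes": "strep throat",
        "no": {
            "attribute": "fever",
            "branches": {"yes": "cold", "no": "allergy"},
        },
    },
}


def classify(node, instance):
    """Follow the instance's attribute values from the root to a terminal node."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


patient = {"swollen_glands": "no", "fever": "yes"}
print(classify(tree, patient))
```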

15 Figure 1.1 – A decision tree for the data in Table 1.1

16 Table 1.2 – Data Instances with an Unknown Classification
Table 1.1 – Hypothetical Training Data for Disease Diagnosis

17 Production Rules
IF Swollen Glands = Yes THEN Diagnosis = Strep Throat
IF Swollen Glands = No & Fever = Yes THEN Diagnosis = Cold
IF Swollen Glands = No & Fever = No THEN Diagnosis = Allergy
We can translate any decision tree into a set of production rules. They are rules of the form:
IF antecedent conditions THEN consequent conditions
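The translation from tree to rules can be sketched as a walk over every root-to-leaf path, where the tests along the path become the antecedent and the leaf becomes the consequent. The tree below is reconstructed from the three rules on this slide, and the nested-dictionary layout and function names are assumptions made for illustration.

```python
# Tree reconstructed from the three production rules; dicts are tests, strings are outcomes.
tree = {
    "attribute": "Swollen Glands",
    "branches": {
        "Yes": "Strep Throat",
        "No": {
            "attribute": "Fever",
            "branches": {"Yes": "Cold", "No": "Allergy"},
        },
    },
}


def tree_to_rules(node, conditions=()):
    """Yield one (conditions, outcome) pair per root-to-leaf path."""
    if not isinstance(node, dict):
        yield list(conditions), node
        return
    for value, child in node["branches"].items():
        yield from tree_to_rules(child, conditions + ((node["attribute"], value),))


def format_rule(conditions, outcome):
    """Render a path as an IF ... THEN ... production rule."""
    antecedent = " & ".join(f"{attribute} = {value}" for attribute, value in conditions)
    return f"IF {antecedent} THEN Diagnosis = {outcome}"


rules = [format_rule(conditions, outcome) for conditions, outcome in tree_to_rules(tree)]
for rule in rules:
    print(rule)
```

Each terminal node produces exactly one rule, so a tree with three leaves yields the three rules listed on the slide.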

18 Unsupervised Clustering
A data mining method that builds models from data without predefined classes (see Table 1.3). Data instances are grouped together based on a similarity scheme defined by the clustering system. With the help of one or several evaluation techniques, it is up to us to decide the meaning of the formed clusters.
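The slide does not name a clustering algorithm or a similarity scheme. As a minimal sketch, the example below uses k-means with squared Euclidean distance, which is one common choice and an assumption here. The data points are hypothetical (age, trades per month) pairs and are not taken from Table 1.3.

```python
# Hypothetical (age, trades_per_month) pairs; not the data from Table 1.3.
points = [(25, 12.0), (27, 10.0), (30, 11.0), (55, 1.0), (60, 2.0), (58, 0.5)]


def squared_distance(a, b):
    """Similarity scheme assumed here: smaller squared Euclidean distance = more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))


def k_means(data, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# No class labels are supplied; only starting centroids.
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The algorithm only returns groups. Deciding that one group looks like "young, frequent traders" and the other like "older, infrequent traders" is the interpretation step the slide leaves to the analyst.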

19 Table 1.3 – Acme Investors Incorporated

20 Possible Questions
Questions for supervised learning
1. Can I develop a general profile of an online investor? If so, what characteristics distinguish online investors from investors that use a broker?
2. Can I determine if a new customer who does not initially open a margin account is likely to do so in the future?
3. Can I build a model able to accurately predict the average number of trades per month for a new investor?
4. What characteristics differentiate female and male investors?
Questions for unsupervised learning
1. What attribute similarities group customers of Acme Investors together?
2. What differences in attribute values segment the customer database?

21 1.3 Is Data Mining Appropriate for My Problem?

22 Data Mining or Data Query?
– Shallow Knowledge: is factual; tools used: DBMS/SQL.
– Multidimensional Knowledge: is factual; tools used: OLAP.
– Hidden Knowledge: represents patterns or regularities in data that cannot be easily found; tools used: data mining.
– Deep Knowledge: knowledge stored in a database that can only be found if we are given some direction.

23 Data Mining vs. Data Query: An Example
Use data query if you already know, or almost know, what you are looking for.
Use data mining to find regularities in data that are not obvious.
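The contrast can be sketched in a few lines: a query states its criteria up front, while a mining step searches for a pattern nobody specified. The records, attribute names, and the crude search below are hypothetical, and real data mining tools do far more than this scan.

```python
# Hypothetical customer records, loosely modeled on the telephone company example.
records = [
    {"account": "residential", "daytime_caller": True, "high_bill": True},
    {"account": "residential", "daytime_caller": True, "high_bill": True},
    {"account": "residential", "daytime_caller": False, "high_bill": False},
    {"account": "residential", "daytime_caller": False, "high_bill": False},
    {"account": "business", "daytime_caller": True, "high_bill": True},
    {"account": "business", "daytime_caller": False, "high_bill": False},
]


def query(rows):
    """Data query: the analyst already knows the criteria and states them explicitly."""
    return [r for r in rows if r["account"] == "residential" and r["high_bill"]]


def strongest_pattern(rows, outcome, attributes):
    """Crude stand-in for mining: scan every attribute/value pair and report
    the one whose rows show the highest rate of the outcome."""
    best = None
    for attribute in attributes:
        for value in {r[attribute] for r in rows}:
            subset = [r for r in rows if r[attribute] == value]
            rate = sum(1 for r in subset if r[outcome]) / len(subset)
            if best is None or rate > best[2]:
                best = (attribute, value, rate)
    return best


print(len(query(records)))
print(strongest_pattern(records, "high_bill", ["account", "daytime_caller"]))
```

The query answers the question it was asked. The scan surfaces a regularity (daytime calling goes with high bills in this toy data) that was not part of any question.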

24 1.4 Expert Systems or Data Mining?

25 Expert System and Knowledge Engineer
An expert system is a computer program that emulates the problem-solving skills of one or more human experts.
A knowledge engineer is a person trained to interact with an expert in order to capture their knowledge.

27 1.5 A Simple Data Mining Process Model

28 Figure 1.3 – A simple data mining process model
Components shown: Operational Database, Data Warehouse, SQL Queries, Data Mining, Interpretation & Evaluation, Result Application

29 Characteristics of Data Warehouse
Data Warehouse:
– Definition: a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
– Subject-oriented: e.g. customers, patients, students, products
– Integrated: consistent naming conventions, formats, encoding structures; from multiple data sources
– Time-variant: can study trends and changes
– Non-updatable: read-only, periodically refreshed
Data Mart:
– A data warehouse that is limited in scope

30 A four-step process for performing a data mining session
1. Assembling the data
– Operational database (relational databases and flat files) vs. data warehouse
2. Mining the data (giving the data to a mining tool)
– Instances for building the model or testing the model
3. Interpreting the results
4. Result application

31 1.7 Data Mining Applications (p. 24)
– Fraud detection
– Health care
– Business and finance
– Scientific applications
– Sports and gaming

32 Customer Intrinsic Value (figure with labels A, B, C)

