1 CSE 711: DATA MINING Sargur N. Srihari Phone: 645-6164, ext. 113.


2 CSE 711 Texts
Required text: 1. Witten, I. H., and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann.
Recommended texts: 1. Adriaans, P., and D. Zantinge, Data Mining, Addison-Wesley, 1998.

3 CSE 711 Texts
2. Groth, R., Data Mining: A Hands-on Approach for Business Professionals, Prentice-Hall PTR.
3. Kennedy, R., Y. Lee, et al., Solving Data Mining Problems through Pattern Recognition, Prentice-Hall PTR.
4. Weiss, S., and N. Indurkhya, Predictive Data Mining: A Practical Guide, Morgan Kaufmann, 1998.

4 Introduction
Challenge: how to manage ever-increasing amounts of information.
Solution: data mining and Knowledge Discovery in Databases (KDD).

5 Information as a Production Factor
Most international organizations produce more information in a week than many people could read in a lifetime.

6 Data Mining Motivation
Mechanical production of data creates a need for mechanical consumption of data.
Large databases hold vast amounts of information; the difficulty lies in accessing it.

7 KDD and Data Mining
KDD: extraction of knowledge from data. Official definition: “non-trivial extraction of implicit, previously unknown & potentially useful knowledge from data”.
Data mining: the discovery stage of the KDD process.

8 Data Mining
The process of discovering patterns, automatically or semi-automatically, in large quantities of data. The patterns discovered must be useful: meaningful in that they lead to some advantage, usually an economic one.

9 KDD and Data Mining
[Figure 1.1: Data mining is a multi-disciplinary field, connecting KDD with expert systems, statistics, machine learning, databases, and visualization.]

10 Data Mining vs. Query Tools
SQL: when you know exactly what you are looking for.
Data mining: when you only vaguely know what you are looking for.

11 Practical Applications
KDD is more complicated than initially thought: roughly 80% of the effort goes into preparing the data and 20% into mining it.

12 Data Mining Techniques
Data mining is not so much a single technique as the idea that there is more knowledge hidden in the data than shows itself on the surface.

13 Data Mining Techniques
Any technique that helps to extract more out of data is useful: query tools, statistical techniques, visualization, on-line analytical processing (OLAP), and case-based learning (k-nearest neighbor).

14 Data Mining Techniques
Decision trees, association rules, neural networks, genetic algorithms.

15 Machine Learning and the Methodology of Science
[Figure: the empirical cycle of scientific research: observation, analysis, theory, prediction, and back to observation.]

16 Machine Learning...
[Figure: theory formation. Analysis of a limited number of observations yields the theory ‘All swans are white’, while reality contains an infinite number of swans.]

17 Machine Learning...
[Figure: theory falsification. The theory “All swans are white” makes predictions about an infinite number of swans, and a single contrary observation is enough to falsify it.]

18 A Kangaroo in Mist
[Figure, panels a.) to f.): complexity of search spaces.]

19 Association Rules
Definition: given a set of transactions, where each transaction is a set of items, an association rule is an expression X ⇒ Y, where X and Y are sets of items.

20 Association Rules
Intuitive meaning of such a rule: transactions in the database that contain the items in X tend also to contain the items in Y.

21 Association Rules
Example: 98% of customers who purchase tires and automotive accessories also buy some automotive services. Here, 98% is called the confidence of the rule: the percentage of transactions containing X that also contain Y. The support of the rule X ⇒ Y is the percentage of transactions that contain both X and Y.
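The two measures can be computed directly from their definitions. The sketch below assumes each transaction is a Python `frozenset` of item names; the five transactions are made up for illustration and are not the data behind the 98% figure.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def count(transactions: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions: List[Transaction], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of all transactions that contain both X and Y."""
    return count(transactions, x | y) / len(transactions)


def confidence(transactions: List[Transaction], x: FrozenSet[str], y: FrozenSet[str]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    n_x = count(transactions, x)
    return count(transactions, x | y) / n_x if n_x else 0.0


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories"}),
    frozenset({"tires"}),
    frozenset({"service"}),
]
x = frozenset({"tires", "accessories"})
y = frozenset({"service"})

print(support(transactions, x, y))     # 0.4  (2 of 5 transactions hold X and Y)
print(confidence(transactions, x, y))  # 2 of the 3 transactions with X also hold Y
```

Counting transactions and dividing once keeps the confidence an exact ratio of counts rather than a ratio of two rounded fractions.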

22 Association Rules
Problem: the problem of mining association rules is to find all rules that satisfy a user-specified minimum support and minimum confidence.
Applications include cross-marketing, attached mailing, catalog design, loss leader analysis, add-on sales, store layout, and customer segmentation based on buying patterns.
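As a minimal illustration of the mining problem, the sketch below finds every rule X ⇒ Y that meets both thresholds by brute force: it enumerates all itemsets, keeps those with enough support, and tries every split into X and Y. This is a swapped-in naive method, not Apriori; Apriori avoids enumerating itemsets whose subsets are already infrequent. The transactions are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]  # (X, Y, support, confidence)


def mine_rules(transactions: List[FrozenSet[str]],
               min_support: float, min_confidence: float) -> List[Rule]:
    """Brute-force search for every rule X => Y meeting both thresholds.

    Enumerates all itemsets, so the cost grows exponentially with the number
    of distinct items; this is only suitable for toy data.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    rules: List[Rule] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            n_xy = count(itemset)
            # Skip itemsets that never occur or fall below minimum support.
            if n_xy == 0 or n_xy / n < min_support:
                continue
            # Split the itemset into every non-empty X and Y = itemset - X.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    x = frozenset(lhs)
                    conf = n_xy / count(x)
                    if conf >= min_confidence:
                        rules.append((x, itemset - x, n_xy / n, conf))
    return rules


# Hypothetical toy data, for illustration only.
transactions = [
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories"}),
    frozenset({"tires"}),
    frozenset({"service"}),
]
for x, y, sup, conf in mine_rules(transactions, min_support=0.4, min_confidence=0.9):
    print(sorted(x), "=>", sorted(y), f"support={sup:.1f} confidence={conf:.1f}")
```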

23 Example Data Sets
Contact lens (symbolic); weather (symbolic); weather (numeric and symbolic); iris (numeric attributes, symbolic outcome); CPU performance (numeric attributes, numeric outcome); labor negotiations (missing values); soybean.

24 Contact Lens Data

25 Structural Patterns
Part of a structural description. The example is simplistic because all combinations of possible values are represented in the table.
If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no then recommendation = soft

26 Structural Patterns
In most learning situations, the set of examples given as input is far from complete; part of the job is to generalize to other, new examples.

27 Weather Data

28 Weather Problem
The four attributes have 3, 3, 2, and 2 possible values, which creates 36 possible combinations (3 × 3 × 2 × 2 = 36), of which 14 are present in the set of examples. The following rules are applied in order, and the first one that matches decides:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
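The combination count and the weather rules can be checked mechanically. The sketch below applies the rules as an ordered decision list. The 14 examples are the nominal weather data as given in Witten and Frank's textbook; that table did not survive in this transcript, so treat the transcription as an assumption.

```python
from itertools import product

OUTLOOK = ["sunny", "overcast", "rainy"]
TEMPERATURE = ["hot", "mild", "cool"]
HUMIDITY = ["high", "normal"]
WINDY = [False, True]

# (outlook, temperature, humidity, windy, play), assumed transcription of the textbook table.
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]


def play(outlook: str, humidity: str, windy: bool) -> str:
    """Weather rules as an ordered decision list: the first matching rule wins."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # none of the above


all_combinations = list(product(OUTLOOK, TEMPERATURE, HUMIDITY, WINDY))
observed = {row[:4] for row in WEATHER}
errors = [row for row in WEATHER if play(row[0], row[2], row[3]) != row[4]]

print(len(all_combinations))  # 36
print(len(observed))          # 14 distinct combinations occur in the data
print(len(errors))            # 0: the rules reproduce every example
```

Temperature never appears in the rules, so `play` ignores it; order matters because, for example, a rainy, windy day with normal humidity must be caught by the second rule before the fourth.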

29 Weather Data with Some Numeric Attributes

30 Classification and Association Rules
Classification rules predict the classification of an example, here whether or not to play:
If outlook = sunny and humidity > 83 then play = no

31 Classification and Association Rules
Association rules strongly associate different attribute values. Association rules that derive from the weather table:
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
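Each association rule can be scored against the weather table with the support and confidence measures from the association-rule slides. In this sketch support is reported as a count of instances for which the whole rule holds. The table is the nominal weather data as given in Witten and Frank's textbook, an assumed transcription since it is not in this transcript.

```python
from typing import Dict, List, Tuple

FIELDS = ("outlook", "temperature", "humidity", "windy", "play")
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
DATA: List[Dict[str, str]] = [dict(zip(FIELDS, row)) for row in ROWS]


def matches(instance: Dict[str, str], conditions: Dict[str, str]) -> bool:
    return all(instance[k] == v for k, v in conditions.items())


def evaluate(antecedent: Dict[str, str], consequent: Dict[str, str]) -> Tuple[int, float]:
    """Return (instances where the whole rule holds, confidence of the rule)."""
    covered = [d for d in DATA if matches(d, antecedent)]
    if not covered:
        return 0, 0.0
    correct = [d for d in covered if matches(d, consequent)]
    return len(correct), len(correct) / len(covered)


RULES = [
    ({"temperature": "cool"}, {"humidity": "normal"}),
    ({"humidity": "normal", "windy": "false"}, {"play": "yes"}),
    ({"outlook": "sunny", "play": "no"}, {"humidity": "high"}),
    ({"windy": "false", "play": "no"}, {"outlook": "sunny", "humidity": "high"}),
]
for antecedent, consequent in RULES:
    print(antecedent, "=>", consequent, evaluate(antecedent, consequent))

# For contrast, a rule that holds only part of the time.
print(evaluate({"outlook": "sunny"}, {"play": "no"}))
```

All four listed rules come out with 100% confidence on this table, whereas "outlook = sunny ⇒ play = no" holds for only some of the sunny days.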

32 Rules for Contact Lens Data
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
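Unlike the ordered weather rules, these contact lens rules form an unordered set, and several can fire on the same case. The sketch below encodes them as condition dictionaries (the short keys `tear`, `prescription`, and so on are illustrative names, not from the source) and checks, over all 24 combinations of attribute values, that every case receives exactly one distinct recommendation.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Set, Tuple

AGES = ["young", "pre-presbyopic", "presbyopic"]
PRESCRIPTIONS = ["myope", "hypermetrope"]
ASTIGMATIC = ["no", "yes"]
TEAR_RATES = ["reduced", "normal"]

# (conditions, recommendation) for each rule on the slide, in the same order.
RULES: List[Tuple[Dict[str, str], str]] = [
    ({"tear": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]


def recommendations(case: Dict[str, str]) -> Set[str]:
    """Recommendations of every rule whose conditions all hold for `case`."""
    return {rec for cond, rec in RULES if all(case[k] == v for k, v in cond.items())}


def tally() -> Counter:
    counts: Counter = Counter()
    for age, rx, astig, tear in product(AGES, PRESCRIPTIONS, ASTIGMATIC, TEAR_RATES):
        case = {"age": age, "prescription": rx, "astigmatic": astig, "tear": tear}
        recs = recommendations(case)
        # Several rules may fire, but they must never disagree or all stay silent.
        if len(recs) != 1:
            raise ValueError(f"ambiguous or uncovered case: {case} -> {recs}")
        counts[recs.pop()] += 1
    return counts


print(tally())
```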

33 Decision Tree for Contact Lens Data
[Figure: decision tree whose internal nodes test tear production rate, astigmatism, and spectacle prescription; leaves: none, soft, hard, none.]

34 Iris Data

35 Iris Rules Learned
If petal-length < 2.45 then Iris-setosa
If sepal-width < 2.10 then Iris-versicolor
If sepal-width < 2.45 and petal-length < 4.55 then Iris-versicolor
...
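A minimal sketch of the three iris rules that survive on this slide, assuming they are read as an ordered decision list. The remaining rules are elided ("..."), so the function returns `None` rather than guessing a class for cases they would cover; the measurements in the test are hypothetical.

```python
from typing import Optional


def classify_iris(sepal_width: float, petal_length: float) -> Optional[str]:
    """Apply the visible iris rules in order; the first rule that matches wins."""
    if petal_length < 2.45:
        return "Iris-setosa"
    if sepal_width < 2.10:
        return "Iris-versicolor"
    if sepal_width < 2.45 and petal_length < 4.55:
        return "Iris-versicolor"
    # The learned rule list continues past this point in the source; not reproduced here.
    return None


print(classify_iris(sepal_width=3.5, petal_length=1.4))  # Iris-setosa
print(classify_iris(sepal_width=2.3, petal_length=4.0))  # Iris-versicolor
```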

36 CPU Performance Data

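37 CPU Performance
Numerical prediction: the outcome is predicted as a linear sum of weighted attributes.
Regression equation: $\mathrm{PRP} = w_0 + w_1\,\mathrm{MYCT} + \dots + w_k\,\mathrm{CHMAX}$, where the $w_i$ are weights fitted to the data.
Linear regression can discover linear relationships, not non-linear ones.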
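The limitation in the last line can be seen in the simplest case. The sketch below fits ordinary least squares with a single attribute, y = w0 + w1·x; a multi-attribute equation such as the PRP one solves the same least-squares problem with one weight per attribute. Both data sets are toy values chosen for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = w0 + w1 * x; returns (w0, w1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# A linear relationship (y = 2x + 1) is recovered exactly.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))          # (1.0, 2.0)

# A purely quadratic one (y = x**2) is invisible to a straight line:
# the best-fitting slope is 0 and the line is just the mean of y.
print(fit_line([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # (2.0, 0.0)
```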

38 Linear Regression
[Figure: a simple linear regression for the loan data set; a regression line fitted to debt versus income.]

39 Labor Negotiations Data

40 Decision Trees for Labor Negotiations Data
[Figure: decision tree testing wage increase first year (≤ 2.5 / > 2.5), statutory holidays (> 10 / ≤ 10), and wage increase first year again (< 4 / ≥ 4); leaves: good or bad.]

41 Decision Trees for Labor Negotiations Data (continued)
[Figure: a second, larger tree for the same data. Besides wage increase first year and statutory holidays (same thresholds), it tests working hours per week (≤ 36 / > 36) and health plan contribution (none / half / full); leaves: good or bad.]

42 Soy Bean Data

43 Two Example Rules
If [leaf condition is normal and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown] then diagnosis is rhizoctonia root rot
If [leaf malformation is absent and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown] then diagnosis is rhizoctonia root rot

44 Classification
[Figure: a simple linear classification boundary for the loan data set (axes: income and debt); the shaded region denotes class “no loan”.]
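The slide does not say how the linear boundary was found. As one possible method, offered here as a stand-in, the classic perceptron rule learns a line w·x + b = 0 that separates linearly separable classes. The six (income, debt) points are hypothetical.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, float], int]  # ((income, debt), +1 for loan / -1 for no loan)


def train_perceptron(examples: List[Example], max_epochs: int = 100) -> Tuple[List[float], float]:
    """Learn a linear boundary w . x + b = 0 with the perceptron update rule."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in examples:
            # A point is misclassified when its signed score disagrees with its label.
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w: Sequence[float], b: float, point: Tuple[float, float]) -> str:
    return "loan" if w[0] * point[0] + w[1] * point[1] + b > 0 else "no loan"


# Hypothetical, linearly separable (income, debt) points.
training = [((2, 8), -1), ((3, 9), -1), ((2, 7), -1),
            ((8, 2), 1), ((9, 3), 1), ((7, 2), 1)]
w, b = train_perceptron(training)
print(w, b)
print(predict(w, b, (8, 3)))  # loan
```

The perceptron is guaranteed to stop making mistakes only when the classes are linearly separable; for data like the non-linear case on the next slides, `max_epochs` caps the loop instead.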

45 Clustering
[Figure: a simple clustering of the loan data set (axes: income and debt) into 3 clusters; note that the original labels are replaced by +’s.]
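The slide does not name a clustering algorithm. As one common choice, used here as a stand-in, k-means groups unlabeled points into k clusters by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The points and the hand-picked starting centroids are hypothetical (income, debt) values; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated: List[Point] = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append((sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical unlabeled points; class labels play no part in clustering.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8), (1, 8), (1.5, 9), (2, 8.5)]
centroids, labels = kmeans(points, initial_centroids=[(0, 0), (10, 10), (0, 10)])
print(labels)     # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(centroids)
```

k-means needs k fixed in advance and can end in different clusterings from different starting centroids.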

46 Non-Linear Classification
[Figure: an example of classification boundaries learned by a non-linear classifier (such as a neural network) for the loan data set; regions: loan, no loan.]

47 Nearest Neighbor Classifier
[Figure: classification boundaries for a nearest neighbor classifier for the loan data set; regions: loan, no loan.]
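A nearest neighbor classifier stores the training examples and labels a new point by the class of the closest stored point; its boundaries follow the data rather than a fixed shape. The sketch below generalizes this to k neighbors with a majority vote (k = 1 is the plain nearest neighbor rule). The training points are hypothetical (income, debt) pairs; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Tuple

Example = Tuple[Tuple[float, float], str]  # ((income, debt), class label)


def knn_classify(training: List[Example], query: Tuple[float, float], k: int = 1) -> str:
    """Label `query` by majority vote among its k nearest training points."""
    # Sort by Euclidean distance and keep the k closest examples.
    neighbors = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


training = [((2, 8), "no loan"), ((3, 9), "no loan"), ((2, 7), "no loan"),
            ((8, 2), "loan"), ((9, 3), "loan"), ((7, 2), "loan")]
print(knn_classify(training, (8, 3)))         # loan
print(knn_classify(training, (2.5, 8), k=3))  # no loan
```

There is no training step beyond storing the examples; all the work happens at query time, which is why this is also called case-based learning.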