Data Mining CSCI 307, Spring 2019


Data Mining CSCI 307, Spring 2019, Lecture 2: Describing Patterns, Simple Examples

Data versus Information
Society produces huge amounts of data, from sources such as business, science, medicine, economics, geography, the environment, and sports. This is a potentially valuable resource, but raw data is useless: we need techniques to automatically extract information from it.
Data: recorded facts.
Information: the patterns underlying the data.
Be careful not to read patterns into random noise: https://xkcd.com/2101/

Information is Crucial
Example 1: in vitro fertilization. Given embryos described by 60 features, the problem is to select the embryos that will survive. The data: historical records of embryos and their outcomes.
Example 2: cow culling. Given cows described by 700 features, the problem is to select which cows to cull. The data: historical records and farmers' decisions.

Data Mining
Data mining is the extraction of implicit, previously unknown, and potentially useful information from data. What is needed: programs that detect patterns and regularities in the data. Strong patterns make good predictions, but:
Problem 1: most patterns are not interesting.
Problem 2: patterns may be inexact (or spurious).
Problem 3: the data may be garbled or missing.

Black Box vs. Structural Descriptions
Machine learning can produce different types of patterns. Black box descriptions can be used to predict the outcome in a new situation, but they are opaque: they give no insight into how the prediction is made.
[Diagram: Data Input -> BLACK BOX -> Output, e.g. classification]

Black Box vs. Structural Descriptions
Structural descriptions represent patterns explicitly (e.g., as a set of rules or a decision tree). They can be used to predict the outcome in a new situation, and also to understand and explain how the prediction is derived, which may be even more important. The methods originate from artificial intelligence, statistics, and database research.

Structural Descriptions
Example: if-then rules (from the contact lens data):
If tear production rate = reduced then recommendation = none
Otherwise, if age = young and astigmatic = no then recommendation = soft

Age    Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young  Myope                   No           Reduced               None
Young  Myope                   No           Normal                Soft
Young  Myope                   Yes          Reduced               None
Young  Myope                   Yes          Normal                Hard
…      (the remaining rows continue through the older age groups)
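To make the rule format concrete, here is a minimal Python sketch (my own illustration, not part of the slides) encoding these two rules as an if/elif chain; the function and argument names are assumptions chosen for readability:

```python
def recommend(age, prescription, astigmatic, tear_rate):
    """Apply the two if-then rules above, in order."""
    if tear_rate == "reduced":
        return "none"
    elif age == "young" and astigmatic == "no":
        return "soft"
    return None  # these two rules do not cover every case

print(recommend("young", "myope", "no", "normal"))  # -> soft
```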

The Weather Problem: A Simple Example
Conditions for playing a certain game:

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
…

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

These rules must be checked in order; otherwise some instances will be classified incorrectly.
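A minimal sketch (my own, not from the slides) of this first-match rule evaluation, showing why the order matters:

```python
# Each rule: (condition over an instance dict, predicted class).
rules = [
    (lambda x: x["outlook"] == "sunny" and x["humidity"] == "high", "no"),
    (lambda x: x["outlook"] == "rainy" and x["windy"], "no"),
    (lambda x: x["outlook"] == "overcast", "yes"),
    (lambda x: x["humidity"] == "normal", "yes"),
    (lambda x: True, "yes"),  # default rule: "none of the above"
]

def classify(instance):
    # First matching rule wins; reordering the rules changes the answer.
    for condition, label in rules:
        if condition(instance):
            return label

x = {"outlook": "rainy", "humidity": "normal", "windy": True}
print(classify(x))  # -> "no"; if the humidity rule came first it would say "yes"
```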

Classification versus Association Rules
A classification rule predicts the value of a designated attribute (the classification of an example):
If outlook = sunny and humidity = high then play = no
An association rule predicts the value of an arbitrary attribute, or combination of attributes:
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
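To make "predicting an arbitrary attribute" concrete, here is a small sketch (my addition) that checks an association rule's coverage and accuracy against the standard 14-instance weather data from the textbook:

```python
# The standard 14-instance weather data (Witten & Frank).
DATA = [
    # (outlook, temperature, humidity, windy, play)
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")

def rule_stats(antecedent, consequent):
    """Return (instances where both sides hold, instances the antecedent covers)."""
    rows = [dict(zip(FIELDS, r)) for r in DATA]
    covered = [r for r in rows if all(r[k] == v for k, v in antecedent.items())]
    hits = [r for r in covered if all(r[k] == v for k, v in consequent.items())]
    return len(hits), len(covered)

# "If temperature = cool then humidity = normal" holds for all 4 cool instances:
print(rule_stats({"temperature": "cool"}, {"humidity": "normal"}))  # -> (4, 4)
```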

Weather Data with Mixed Attributes
Some attributes have numeric values.

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

The Contact Lenses Data

Age    Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young  Myope                   No           Reduced               None
Young  Myope                   No           Normal                Soft
Young  Myope                   Yes          Reduced               None
Young  Myope                   Yes          Normal                Hard
Young  Hypermetrope            No           Reduced               None
Young  Hypermetrope            No           Normal                Soft
Young  Hypermetrope            Yes          Reduced               None
Young  Hypermetrope            Yes          Normal                Hard
…      (the Pre-presbyopic and Presbyopic rows follow; 24 rows in all)

Contact Lens Data is Complete
Attributes:
Age: young, pre-presbyopic, presbyopic
Prescription: myope, hypermetrope
Astigmatism: yes, no
Tear production rate: reduced, normal
All possible combinations of attribute values are represented. Question: how many instances is that?
Note: real input sets are not usually complete; they may have missing values, or not all combinations may be present.
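Working the count out: with 3 ages, 2 prescriptions, 2 astigmatism values, and 2 tear production rates, there are 3 × 2 × 2 × 2 = 24 possible instances, exactly the 24 rows of the table above.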

A Complete and Correct Rule Set
If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
In real life, the classifier may not always produce the correct class. This is a large set of rules; would a smaller set be better?

A Decision Tree for this Same Problem
[Figure: pruned decision tree produced by J48; more on this in Chapter 3.]
We might use a tree to determine the outcome instead. Notice it is less cumbersome than the rule set.
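As an illustration (mine, standing in for the missing figure), the pruned tree that J48 typically produces for this data can be written as nested conditionals; treat the exact shape as an assumption:

```python
def tree_recommend(age, prescription, astigmatic, tear_rate):
    # Root test: tear production rate.
    if tear_rate == "reduced":
        return "none"
    # tear_rate == "normal": next test astigmatism.
    if astigmatic == "no":
        return "soft"
    # astigmatic == "yes": next test spectacle prescription.
    if prescription == "myope":
        return "hard"
    return "none"  # hypermetrope

print(tree_recommend("young", "myope", "yes", "normal"))  # -> hard
```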

    Age         Prescription  Astigmatism  Tear rate  Recommendation
8:  Young       Hypermetrope  Yes          Normal     Hard
18: Presbyopic  Myope         No           Normal     None

Here both the attributes and the outcome are nominal (a.k.a. categorical): each value comes from a preset, finite set of possibilities.

Classifying Iris Flowers
This famous dataset's rules are cumbersome, and there might be a better way to classify it. Note that here the attributes are numeric, but the outcome is a category.

     Sepal length  Sepal width  Petal length  Petal width  Type
1    5.1           3.5          1.4           0.2          Iris setosa
2    4.9           3.0          1.4           0.2          Iris setosa
…
51   7.0           3.2          4.7           1.4          Iris versicolor
52   6.4           3.2          4.5           1.5          Iris versicolor
…
101  6.3           3.3          6.0           2.5          Iris virginica
102  5.8           2.7          5.1           1.9          Iris virginica
…

If petal length < 2.45 then Iris setosa
If sepal width < 2.10 then Iris versicolor
If sepal width < 2.45 and petal length < 4.55 then Iris versicolor
...
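As a side note (my addition, using scikit-learn rather than the course's tools), a decision tree learner finds compact tests like these automatically:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
# Print the learned tests; the root split lands on a petal measurement.
print(export_text(clf, feature_names=iris.feature_names))
```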

Predicting CPU Performance
Example: 209 different computer configurations are the instances. In this case both the attributes and the outcome are numeric.

     Cycle time (ns)  Main memory (KB)   Cache (KB)  Channels        Performance
     MYCT             MMIN     MMAX      CACH        CHMIN   CHMAX   PRP
1    125              256      6000      256         16      128     198
2    29               8000     32000     32          8       32      269
…
208  480              512      8000      32          0       0       67
209  480              1000     4000      0           0       0       45

Linear regression function:
PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
More on how to do this in Chapter 4.
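To show what this function does, here is a small sketch (my own) that applies the published coefficients to one configuration; note that a regression fit over all 209 instances is only an approximation for any single one:

```python
def predict_prp(myct, mmin, mmax, cach, chmin, chmax):
    """The linear regression function above for relative performance (PRP)."""
    return (-55.9 + 0.0489 * myct + 0.0153 * mmin + 0.0056 * mmax
            + 0.6410 * cach - 0.2700 * chmin + 1.480 * chmax)

# Instance 1 from the table; the actual PRP is 198, so the fit is not exact.
print(round(predict_prp(125, 256, 6000, 256, 16, 128), 1))
```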

Data from Labor Negotiations
Here the attributes are in rows instead of the usual columns, so each instance (a contract) is a column. Many values are missing ("?").

Attribute                        Type                         1     2     3     …  40
Duration                         (number of years)            1     2     3        2
Wage increase first year         percentage                   2%    4%    4.3%     4.5
Wage increase second year        percentage                   ?     5%    4.4%     4.0
Wage increase third year         percentage                   ?     ?     ?        ?
Cost of living adjustment        {none, tcf, tc}              none  tcf   ?        none
Working hours per week           (number of hours)            28    35    38       40
Pension                          {none, ret-allw, empl-cntr}  none  ?     ?        ?
Standby pay                      percentage                   ?     13%   ?        ?
Shift-work supplement            percentage                   ?     5%    4%       4
Education allowance              {yes, no}                    yes   ?     ?        ?
Statutory holidays               (number of days)             11    15    12       12
Vacation                         {below-avg, avg, gen}        avg   gen   gen      avg
Long-term disability assistance  {yes, no}                    no    ?     ?        yes
Dental plan contribution         {none, half, full}           none  ?     full     full
Bereavement assistance           {yes, no}                    no    ?     ?        yes
Health plan contribution         {none, half, full}           none  ?     full     half
Acceptability of contract        {good, bad}                  bad   good  good     good

The last attribute, acceptability of contract, is the class.

Decision Trees for the Labor Data
This decision tree is simple, but it does not always predict correctly. The tree makes intuitive sense: a bigger wage increase and more holidays are usually positive for an employee.

Decision Trees for the Labor Data
This decision tree is more accurate, but it may not be as intuitive. It likely reflects the compromises made so that a contract is accepted by both the employer and the employees.

Decision Trees for the Labor Data
The simple tree is approximate: it does not classify the training data exactly, because it is a pruned version of the full tree. The full tree is more accurate on the training data, BUT may not actually work better in real life: it may be "overfitted."
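To see pruning versus overfitting concretely, here is a sketch (my illustration with scikit-learn on synthetic data, not the labor data) comparing a fully grown tree with a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data, so memorizing the training set hurts generalization.
X, y = make_classification(n_samples=600, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The full tree wins on training data but typically loses on unseen data.
print("full:   train %.2f  test %.2f" % (full.score(X_tr, y_tr), full.score(X_te, y_te)))
print("pruned: train %.2f  test %.2f" % (pruned.score(X_tr, y_tr), pruned.score(X_te, y_te)))
```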

Soybean Classification: An Early Machine Learning Success Story!

Attribute group  Attribute                Number of values  Sample value
Environment      Time of occurrence       7                 July
                 Precipitation            3                 Above normal
Seed             Condition                2                 Normal
                 Mold growth              2                 Absent
Fruit            Condition of fruit pods  4                 Normal
                 Fruit spots              5                 ?
Leaf             Condition                2                 Abnormal
                 Leaf spot size           3                 ?
Stem             Condition                2                 Abnormal
                 Stem lodging             2                 Yes
Root             Condition                3                 Normal
Diagnosis                                 19                Diaporthe stem canker

A domain expert produced rules (72% correct) that did not perform as well as the computer-generated rules (97.5% correct).

The Role of Domain Knowledge
If leaf condition is normal and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown then diagnosis is rhizoctonia root rot
If leaf malformation is absent and stem condition is abnormal and stem cankers is below soil line and canker lesion color is brown then diagnosis is rhizoctonia root rot
Is "leaf condition is normal" the same as "leaf malformation is absent"? In this domain, "malformation is absent" is a special case of "leaf condition is normal": malformation only comes into play when the leaf condition is not normal.

What About Real Applications?
So far we have seen toy problems: small research examples. We will use them a lot because they make it easier to understand the algorithms and techniques. What about real applications? Use data mining to make a decision, to do a task faster than an expert, to let the expert improve the scheme, and so on.