1 Input and Output
Thanks: I. Witten and E. Frank
2 The weather problem
Conditions for playing an outdoor game

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         Normal    False  Yes
…         …            …         …      …

If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
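The rule list above can be read as a decision list. The following is a minimal sketch, assuming the rules are tried from top to bottom and the first one that matches decides the outcome (the slide does not state the evaluation order explicitly). The names `RULES`, `DEFAULT` and `predict_play` are illustrative only.

```python
# Rules from the slide, encoded as (conditions, prediction) pairs.
# Assumption: rules are tried in order and the first match wins.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"humidity": "normal"}, "yes"),
]
DEFAULT = "yes"  # "If none of the above then play = yes"


def predict_play(instance):
    """Return the prediction of the first rule whose conditions all hold."""
    for conditions, prediction in RULES:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return prediction
    return DEFAULT


# The four rows shown in the table, paired with their recorded outcome.
ROWS = [
    ({"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": False}, "no"),
    ({"outlook": "sunny", "temperature": "hot", "humidity": "high", "windy": True}, "no"),
    ({"outlook": "overcast", "temperature": "hot", "humidity": "high", "windy": False}, "yes"),
    ({"outlook": "rainy", "temperature": "mild", "humidity": "normal", "windy": False}, "yes"),
]

for instance, actual in ROWS:
    print(instance["outlook"], "predicted:", predict_play(instance), "actual:", actual)
```

Temperature is carried along in each instance but no rule tests it, which is why it never affects the prediction here.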
3 Classification vs. Association rules
Classification rule: predicts value of pre-specified attribute (the classification of an example)
Association rule: predicts value of arbitrary attribute or combination of attributes

If outlook = sunny and humidity = high then play = no
If temperature = cool then humidity = normal
If humidity = normal and windy = false then play = yes
If outlook = sunny and play = no then humidity = high
If windy = false and play = no then outlook = sunny and humidity = high
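One way to make the distinction concrete is to store each rule as an antecedent and a consequent and check which attributes the consequent predicts. This is only a sketch: the representation and the helper `is_classification_rule` are assumptions for illustration, with `play` taken as the pre-specified class attribute.

```python
CLASS_ATTRIBUTE = "play"  # assumed to be the pre-specified attribute

# The five example rules from the slide as (antecedent, consequent) pairs.
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, {"play": "no"}),
    ({"temperature": "cool"}, {"humidity": "normal"}),
    ({"humidity": "normal", "windy": False}, {"play": "yes"}),
    ({"outlook": "sunny", "play": "no"}, {"humidity": "high"}),
    ({"windy": False, "play": "no"}, {"outlook": "sunny", "humidity": "high"}),
]


def is_classification_rule(rule, class_attribute=CLASS_ATTRIBUTE):
    """True if the rule predicts only the class attribute and does not test it."""
    antecedent, consequent = rule
    return set(consequent) == {class_attribute} and class_attribute not in antecedent


for rule in RULES:
    kind = "classification" if is_classification_rule(rule) else "association"
    print(kind, rule)
```

Under this reading, the first and third rules have the form of classification rules, while the others predict other attributes or a combination of attributes.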
4 Weather data with mixed attributes
Two attributes with numeric values

Outlook   Temperature  Humidity  Windy  Play
Sunny     85                     False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…         …            …         …      …

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes
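With numeric attributes the rule conditions become comparisons rather than equality tests. The sketch below encodes each condition as an (attribute, operator, value) triple, again assuming first-match-wins evaluation. It uses only the three table rows for which both numeric values are shown; the names are illustrative.

```python
import operator

# Comparison operators that appear in the rules.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# Each rule is (list of (attribute, operator, value) tests, prediction).
RULES = [
    ([("outlook", "=", "sunny"), ("humidity", ">", 83)], "no"),
    ([("outlook", "=", "rainy"), ("windy", "=", True)], "no"),
    ([("outlook", "=", "overcast")], "yes"),
    ([("humidity", "<", 85)], "yes"),
]
DEFAULT = "yes"  # "If none of the above then play = yes"


def predict_play(instance):
    """Return the prediction of the first rule whose tests all succeed."""
    for tests, prediction in RULES:
        if all(OPS[op](instance[attr], value) for attr, op, value in tests):
            return prediction
    return DEFAULT


# Rows from the table that have both temperature and humidity values.
ROWS = [
    ({"outlook": "sunny", "temperature": 80, "humidity": 90, "windy": True}, "no"),
    ({"outlook": "overcast", "temperature": 83, "humidity": 86, "windy": False}, "yes"),
    ({"outlook": "rainy", "temperature": 75, "humidity": 80, "windy": False}, "yes"),
]

for instance, actual in ROWS:
    print(instance["outlook"], "predicted:", predict_play(instance), "actual:", actual)
```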
5 The contact lenses data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
6 A complete and correct rule set

If tear production rate = reduced then recommendation = none
If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
If age = presbyopic and spectacle prescription = myope and astigmatic = no then recommendation = none
If spectacle prescription = hypermetrope and astigmatic = no and tear production rate = normal then recommendation = soft
If spectacle prescription = myope and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = young and astigmatic = yes and tear production rate = normal then recommendation = hard
If age = pre-presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
If age = presbyopic and spectacle prescription = hypermetrope and astigmatic = yes then recommendation = none
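The claim that the rule set is complete and correct can be checked mechanically against the 24 instances of the contact lenses data. The sketch below takes "complete" to mean that every instance is covered by at least one rule, and "correct" to mean that no rule that fires gives a different recommendation from the table. Both readings, and all names in the code, are assumptions made for illustration.

```python
from itertools import product

AGES = ["young", "pre-presbyopic", "presbyopic"]
PRESCRIPTIONS = ["myope", "hypermetrope"]
ASTIGMATIC = ["no", "yes"]
TEAR_RATES = ["reduced", "normal"]
ATTRS = ("age", "prescription", "astigmatic", "tear_rate")

# Recommended lenses, listed in the row order of the contact lenses table.
LABELS = [
    "none", "soft", "none", "hard",  # young, myope
    "none", "soft", "none", "hard",  # young, hypermetrope
    "none", "soft", "none", "hard",  # pre-presbyopic, myope
    "none", "soft", "none", "none",  # pre-presbyopic, hypermetrope
    "none", "none", "none", "hard",  # presbyopic, myope
    "none", "soft", "none", "none",  # presbyopic, hypermetrope
]

# product() varies the last attribute fastest, matching the table's row order.
INSTANCES = [
    dict(zip(ATTRS, combo))
    for combo in product(AGES, PRESCRIPTIONS, ASTIGMATIC, TEAR_RATES)
]

# The nine rules, as (conditions, recommendation) pairs.
RULES = [
    ({"tear_rate": "reduced"}, "none"),
    ({"age": "young", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "pre-presbyopic", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"age": "presbyopic", "prescription": "myope", "astigmatic": "no"}, "none"),
    ({"prescription": "hypermetrope", "astigmatic": "no", "tear_rate": "normal"}, "soft"),
    ({"prescription": "myope", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "young", "astigmatic": "yes", "tear_rate": "normal"}, "hard"),
    ({"age": "pre-presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
    ({"age": "presbyopic", "prescription": "hypermetrope", "astigmatic": "yes"}, "none"),
]


def labels_fired(instance):
    """Set of recommendations made by every rule that matches the instance."""
    return {
        label
        for conditions, label in RULES
        if all(instance[attr] == value for attr, value in conditions.items())
    }


uncovered = [inst for inst in INSTANCES if not labels_fired(inst)]
wrong = [
    inst for inst, label in zip(INSTANCES, LABELS) if labels_fired(inst) - {label}
]
print("instances:", len(INSTANCES))
print("uncovered:", len(uncovered))
print("misclassified:", len(wrong))
```

Because the check collects the recommendations of all matching rules, it does not depend on the order in which the rules are written.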
7 A decision tree for this problem
8 Predicting CPU performance

Cycle time (ns)  Main memory (Kb)  Cache (Kb)  Channels      Performance
MYCT             MMIN   MMAX       CACH        CHMIN  CHMAX  PRP
…

PRP as a function of MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX (numeric coefficients not preserved)
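The slide expresses PRP in terms of the six machine attributes. As a sketch only, assuming the relationship is a weighted sum with an intercept, a prediction could be computed as below. The weights, the intercept and the example machine are hypothetical placeholders chosen to keep the arithmetic easy to follow; they are not the values from the original slide.

```python
FEATURES = ("MYCT", "MMIN", "MMAX", "CACH", "CHMIN", "CHMAX")


def predict_prp(machine, weights, intercept=0.0):
    """Weighted sum of the six attributes plus an intercept (assumed model form)."""
    return intercept + sum(weights[name] * machine[name] for name in FEATURES)


# Hypothetical placeholder weights: only CACH contributes in this toy setup.
weights = {name: 0.0 for name in FEATURES}
weights["CACH"] = 0.5

# Hypothetical machine description, not taken from the data set.
machine = {"MYCT": 100, "MMIN": 256, "MMAX": 6000, "CACH": 16, "CHMIN": 1, "CHMAX": 8}

print(predict_prp(machine, weights, intercept=10.0))
```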
9 Data from labor negotiations

Attribute                        Type                          1      2      3      …  40
Duration                         (Number of years)             1      2      3         2
Wage increase first year         Percentage                    2%     4%     4.3%      4.5
Wage increase second year        Percentage                    ?      5%     4.4%      4.0
Wage increase third year         Percentage                    ?      ?      ?         ?
Cost of living adjustment        {none,tcf,tc}                 none   tcf    ?         none
Working hours per week           (Number of hours)
Pension                          {none,ret-allw,empl-cntr}     none   ?      ?         ?
Standby pay                      Percentage                    ?      13%    ?         ?
Shift-work supplement            Percentage                    ?      5%     4%        4
Education allowance              {yes,no}                      yes    ?      ?         ?
Statutory holidays               (Number of days)
Vacation                         {below-avg,avg,gen}           avg    gen              avg
Long-term disability assistance  {yes,no}                      no     ?      ?         yes
Dental plan contribution         {none,half,full}              none   ?      full
Bereavement assistance           {yes,no}                      no     ?      ?         yes
Health plan contribution         {none,half,full}              none   ?      full      half
Acceptability of contract        {good,bad}                    bad    good
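Many cells in this table contain "?". A minimal sketch of reading such cells, assuming "?" marks a missing value and that percentage cells may or may not carry a "%" sign, might look like this. The helper names `parse_percentage` and `mean_of_known` are illustrative.

```python
def parse_percentage(cell):
    """Convert a cell such as "4.3%" or "4.5" to a float; "?" becomes None."""
    cell = cell.strip()
    if cell == "?":
        return None  # assumption: "?" denotes a missing value
    return float(cell.rstrip("%"))


def mean_of_known(values):
    """Average the values that are present; None if every value is missing."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None


# The wage increase rows as they appear in the table.
first_year = [parse_percentage(c) for c in ["2%", "4%", "4.3%", "4.5"]]
second_year = [parse_percentage(c) for c in ["?", "5%", "4.4%", "4.0"]]
third_year = [parse_percentage(c) for c in ["?", "?", "?", "?"]]

print(first_year)
print(second_year)
print(mean_of_known(third_year))
```

Keeping missing values as `None` instead of substituting a number leaves it to the learning scheme to decide how they should be handled.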
10 Decision trees for the labor data
11 Instance-based representation
Simplest form of learning: rote learning
  Training instances are searched for instance that most closely resembles new instance
  The instances themselves represent the knowledge
  Also called instance-based learning
Similarity function defines what’s “learned”
Instance-based learning is lazy learning
Methods: nearest-neighbor, k-nearest-neighbor, …
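A k-nearest-neighbor classifier shows why this is called lazy learning: nothing is built in advance, and all the work happens when a new instance arrives. The sketch below assumes numeric attributes, Euclidean distance as the similarity function and a majority vote among the k closest training instances. The training points are hypothetical.

```python
import math
from collections import Counter


def k_nearest_neighbors(training, query, k=1):
    """Classify `query` by majority vote among its k closest training instances.

    `training` is a list of (feature_tuple, label) pairs; the stored instances
    themselves are the only "model".
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D training instances with two classes.
training = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"),
    ((7.0, 5.5), "b"),
]

print(k_nearest_neighbors(training, (1.2, 1.1), k=1))
print(k_nearest_neighbors(training, (6.5, 6.0), k=3))
```

Swapping `math.dist` for another distance function changes what counts as "similar", and with it what is effectively learned.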
12 Learning prototypes/Case Based Reasoning
Only those instances involved in a decision need to be stored
13 Representing clusters I
Simple 2-D representation
Venn diagram (overlapping clusters)
14 Representing clusters II
Probabilistic assignment (instances a, b, c, d, e, f, g, h, …)
Dendrogram
NB: dendron is the Greek word for tree