Published by Evan Mills. Modified over 8 years ago.
1
Machine Learning in Practice, Lecture 4
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
2
What is Sleep Apnea? http://www.snoresnomore.com/images/snorin2.gif
4
Plan for Today
- Any questions about the second assignment?
- Announcements
- Quiz 1 Answer Key on Blackboard
- Comments about Assignment 1
- Kwiatkowska Paper
- Error Analysis
- Data Cleansing
- Trees vs Tables
- Weka helpful hints
- ARFF format
5
General Points – Assignment I
Bookkeeping Issues
- Write your name on the assignment
- Write the assignment number
- Format the assignment as an MS Word .doc file; please don't use the .docx format
- Visualize the tree in graphical fashion
  - Use Right-Click and then take a screen shot
  - Embed the figure in the doc file
6
Visualizing the Tree
8
Kwiatkowska Paper
9
Clinical Prediction Rules
- Example application of machine learning
- Rules created by medical practitioners based on their experience
- Can we use machine learning to contribute to the accumulation of medical wisdom?
ER1 = If BMI > 40 and Age > 65 and Gender = male Then OSA = Yes
ER2 = If BMI < 25 and Age < 25 and Gender = female Then OSA = No
10
Methodological Flaw?
- Human-generated rules don't cover most of the data
[Figure labels: ER1, ER2]
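To make the coverage concern concrete, here is a minimal Python sketch that encodes ER1 and ER2 as stated on the earlier slide and measures the fraction of records on which either rule fires. The patient records and the helper names (`er1`, `er2`, `predict`, `coverage`) are hypothetical, invented for illustration; they are not data or code from the paper.

```python
from typing import Optional

# Hypothetical patient records for illustration only (not from the paper).
PATIENTS = [
    {"bmi": 42.0, "age": 70, "gender": "male"},    # matches ER1
    {"bmi": 23.0, "age": 22, "gender": "female"},  # matches ER2
    {"bmi": 31.0, "age": 50, "gender": "male"},    # matches neither rule
    {"bmi": 27.5, "age": 45, "gender": "female"},  # matches neither rule
    {"bmi": 35.0, "age": 68, "gender": "male"},    # matches neither rule
]


def er1(p: dict) -> Optional[str]:
    """ER1: BMI > 40 and Age > 65 and Gender = male -> OSA = Yes."""
    if p["bmi"] > 40 and p["age"] > 65 and p["gender"] == "male":
        return "Yes"
    return None


def er2(p: dict) -> Optional[str]:
    """ER2: BMI < 25 and Age < 25 and Gender = female -> OSA = No."""
    if p["bmi"] < 25 and p["age"] < 25 and p["gender"] == "female":
        return "No"
    return None


def predict(p: dict) -> Optional[str]:
    """Return the conclusion of the first rule that fires, or None."""
    return er1(p) or er2(p)


def coverage(patients: list) -> float:
    """Fraction of patients for which at least one rule fires."""
    covered = sum(1 for p in patients if predict(p) is not None)
    return covered / len(patients)


print(f"Rule coverage: {coverage(PATIENTS):.0%}")
```

Patients for whom `predict` returns `None` are the ones the expert rules say nothing about, which is why accuracy measured only on covered cases can be misleading.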
11
Machine Learning Result
12
ML Wins Some and Loses Others
[Figure labels: ER1, ER2]
13
Compare results
ER1 = If BMI > 40 and Age > 65 and Gender = male Then OSA = Yes
ER2 = If BMI < 25 and Age < 25 and Gender = female Then OSA = No
Note: the paper says this is the tree for set B.
14
Claims from paper…
- Learned rules were largely consistent with human-generated rules
  - Automatic: If BMI > 28.03 and Gender = Male Then OSA = Yes
  - Human: If BMI > 40 and Age > 65 and Gender = Male Then OSA = Yes
- Do you buy their argument?
15
More claims…
- Is this a contradiction?
  - Automatic: If Gender = Male and MP = 2 Then OSA = No
  - Human: If BMI > 40 and Age > 65 and Gender = Male Then OSA = Yes
- What about the relationship between age and MP or BMI and MP?
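One way to probe both claims is to run the rules on specific patients. The Python sketch below encodes the two automatic rules and the human rule quoted on these slides; the patient records and all function names are hypothetical. It illustrates that the human rule is a special case of the learned BMI rule, and that one patient can trigger the MP rule and the human rule with opposite conclusions.

```python
def auto_bmi_rule(p):
    """Automatic: BMI > 28.03 and Gender = Male -> OSA = Yes."""
    return "Yes" if p["bmi"] > 28.03 and p["gender"] == "male" else None


def auto_mp_rule(p):
    """Automatic: Gender = Male and MP = 2 -> OSA = No."""
    return "No" if p["gender"] == "male" and p["mp"] == 2 else None


def human_rule(p):
    """Human: BMI > 40 and Age > 65 and Gender = Male -> OSA = Yes."""
    if p["bmi"] > 40 and p["age"] > 65 and p["gender"] == "male":
        return "Yes"
    return None


def conflicts(p, rule_a, rule_b):
    """True when both rules fire on the patient and disagree."""
    a, b = rule_a(p), rule_b(p)
    return a is not None and b is not None and a != b


# Hypothetical patients, chosen to expose the two situations.
older = {"bmi": 45.0, "age": 70, "gender": "male", "mp": 2}
younger = {"bmi": 30.0, "age": 40, "gender": "male", "mp": 1}

# The human rule fires on `older`, and so does the broader learned BMI rule.
print(human_rule(older), auto_bmi_rule(older))
# The learned BMI rule also fires where the human rule is silent.
print(human_rule(younger), auto_bmi_rule(younger))
# The MP rule and the human rule disagree on `older`.
print(conflicts(older, auto_mp_rule, human_rule))
```

Whether patients like `older` actually occur in the data depends on how MP relates to age and BMI, which is exactly the question the slide raises.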
16
What is the Mallampati classification? http://www.accessmedicine.com/search/searchAMResultImg.aspx?rootterm=mallampati+score&rootID=46310&searchType=1
17
Thought questions
- Would you trust medical “wisdom” that comes from data mining?
- What would be your concerns?
- What would you want to know about how the “wisdom” was learned?
18
Data Cleansing
Obvious things you can fix…
- Inconsistent naming of nominal values
- Names with or without middle initial
- Nickname versus real name
- Typos
- City or street names may change over time
- Street names may change depending on the block
- Inconsistencies in how forms are filled out
- Address and phone number fields in different countries
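As a rough illustration of fixing inconsistent nominal values, the Python sketch below trims whitespace, folds case, and maps known variants (nicknames, middle initials, typos) onto one canonical label. The alias table, the sample values, and the function name `normalize` are hypothetical; a real cleanup would need a table built from the actual data.

```python
import re

# Hypothetical alias table: maps known variants to one canonical label.
ALIASES = {
    "bob smith": "robert smith",        # nickname versus real name
    "robert j. smith": "robert smith",  # with versus without middle initial
    "pittsburg": "pittsburgh",          # typo
}


def normalize(value: str) -> str:
    """Collapse whitespace, fold case, then apply the alias table."""
    cleaned = re.sub(r"\s+", " ", value.strip()).casefold()
    return ALIASES.get(cleaned, cleaned)


raw = ["Robert Smith", "  robert  J. smith", "Bob Smith", "Pittsburg", "Pittsburgh "]
print(sorted({normalize(v) for v in raw}))
```

Five raw spellings collapse to two nominal values here, which keeps a learner from treating each spelling as a separate category.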
21
Trees versus Tables
22
Decision Tables vs Decision Trees
Decision trees
- Open World Assumption
- Only examine some attributes in particular contexts
- Use the majority class within a context to eliminate the closed world requirement
- Divide and Conquer approach
Decision tables
- Closed World Assumption
- Every case enumerated
- No generalization except by limiting the number of attributes
- The 1R algorithm produces the simplest possible Decision Table
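The point about 1R can be sketched in a few lines of Python: for each attribute, build a one-column table that maps every observed value to its majority class, then keep the attribute whose table makes the fewest training errors. The toy rows and the function name `one_r` are hypothetical, and this is a simplified sketch rather than Weka's OneR implementation (for example, it does not discretize numeric attributes).

```python
from collections import Counter, defaultdict

# Hypothetical toy data for illustration only.
ROWS = [
    {"outlook": "sunny", "windy": "false", "play": "no"},
    {"outlook": "sunny", "windy": "true", "play": "no"},
    {"outlook": "overcast", "windy": "false", "play": "yes"},
    {"outlook": "overcast", "windy": "true", "play": "yes"},
    {"outlook": "rainy", "windy": "false", "play": "yes"},
    {"outlook": "rainy", "windy": "true", "play": "no"},
]


def one_r(rows, attributes, target):
    """Return (attribute, value -> class table, training errors) for the best attribute."""
    best = None
    for attr in attributes:
        # Count class labels separately for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # One table row per observed value: predict the majority class.
        table = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for row in rows if table[row[attr]] != row[target])
        if best is None or errors < best[2]:
            best = (attr, table, errors)
    return best


attr, table, errors = one_r(ROWS, ["outlook", "windy"], "play")
print(attr, errors)
print(table.get("foggy"))  # unseen value: the table has no entry for it
```

The lookup for an unseen value such as `"foggy"` returns nothing, which mirrors the closed world assumption: a decision table only answers for cases it enumerates.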
26
Weka Helpful Hints
27
Use the visualize tab to view 3-way interactions
28
Click in one of the boxes to zoom in
29
Use the visualize tab to view 3-way interactions
30
Weka Data Structures
31
ARFF format
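For readers who have not seen an ARFF file, the Python sketch below assembles a tiny one as text. The relation name `weather`, the helper `to_arff`, and the two data rows are illustrative assumptions; `@relation`, `@attribute`, and `@data` are the standard ARFF section keywords.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text: a header of @attribute lines, then comma-separated data rows."""
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical attribute declarations and data rows.
ATTRIBUTES = [
    ("outlook", "{sunny, overcast, rainy}"),  # nominal
    ("temperature", "numeric"),               # numeric
    ("play", "{yes, no}"),                    # nominal class attribute
]
DATA = [("sunny", 85, "no"), ("overcast", 83, "yes")]

arff_text = to_arff("weather", ATTRIBUTES, DATA)
print(arff_text)
```

Each data row lists one value per declared attribute, in the same order as the header.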
32
Types of Attributes
Numeric (continuous)
@attribute temperature numeric
- Real numbers or integers
- Can be compared (less than, greater than, equality, inequality)
- Some algorithms treat numeric scales as ratios or look at “distances”
- Some methods normalize numeric scales
- Some machine learning algorithms treat numbers as nominal values
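As one example of normalizing a numeric scale, the Python sketch below applies min-max scaling, which maps the smallest value to 0 and the largest to 1. This is only one common normalization, chosen here for illustration; the slide does not say which method a given algorithm uses, and the sample temperatures are made up.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


temps = [64, 72, 85]  # hypothetical temperature readings
scaled = min_max_normalize(temps)
print(scaled)
```

After scaling, attributes measured in different units contribute on a comparable range to distance-based methods.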
33
Types of Attributes
Nominal (categorical)
@attribute outlook {sunny, overcast, rainy}
- Finite number of pre-specified values
- Values are just labels (the actual label is not meaningful to the algorithms)
- Values are not ordered and cannot be compared except for equality/inequality
34
Types of Attributes
Strings (just like nominal, makes troubleshooting text processing more convenient)
@attribute description string
- Value can be any string in quotes: "Look, Mom! No hands!"
- Can be converted to a vector of numeric attributes, each representing one word
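The conversion from a string to a vector of per-word numeric attributes can be sketched in Python as a simple word count. The tokenizer, the second sample sentence, and the function names are assumptions made for illustration; Weka's own StringToWordVector filter offers many more options than this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({word for doc in documents for word in tokenize(doc)})
    vectors = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["Look, Mom! No hands!", "No, no, look here"]  # second string is made up
vocab, vectors = bag_of_words(docs)
print(vocab)
for vec in vectors:
    print(vec)
```

Each position in a vector corresponds to one vocabulary word, so every string becomes a fixed-length row of numeric attributes.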
35
Types of Attributes
Date (numeric)
@attribute today date "yyyy-MM-dd'T'HH:mm:ss"
2006-01-24T12:00:00
- Specified as strings but then converted to numbers when the file is read
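The string-to-number conversion for dates can be illustrated in Python. The sketch assumes an ISO-style timestamp with a literal `T` separator, interpreted as UTC, and converts it to milliseconds since 1970-01-01; that unit is the usual Java convention and is treated here as an assumption about what the reader stores internally. The function name `date_to_millis` is hypothetical.

```python
from datetime import datetime, timezone


def date_to_millis(text, fmt="%Y-%m-%dT%H:%M:%S"):
    """Parse a timestamp string as UTC and return milliseconds since the epoch."""
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


noon = date_to_millis("2006-01-24T12:00:00")
later = date_to_millis("2006-01-24T12:00:01")
print(noon, later - noon)
```

Once dates are numbers, a learner can compare them or split on a threshold just as it does with any other numeric attribute.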
36
Reasoning About Time
37
Not Bad Performance with a Simple Split
38
Threshold is Off
39
Ordinal Values
- Weka technically does not have ordinal attributes
- But you can simulate them with “temperature coding”!
- Try to represent “If X less than or equal to .35”?
Values: .2 .25 .28 .31 .35 .45 .47 .52 .6 .63
Bins: A B C D
Coded attributes: A | A or B | A or B or C | A or B or C or D
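The coding trick can be sketched in Python under one loud assumption: the slide does not give the bin boundaries, so the upper bounds used below (.25, .35, .52) are hypothetical and chosen so that a boundary falls at .35. Each value is assigned to bin A, B, C, or D and then encoded as four cumulative 0/1 attributes: "A", "A or B", "A or B or C", "A or B or C or D".

```python
from bisect import bisect_left

BIN_LABELS = ["A", "B", "C", "D"]
# Hypothetical upper bounds for bins A, B, and C; D holds everything larger.
UPPER_BOUNDS = [0.25, 0.35, 0.52]


def bin_index(x):
    """Index of the first bin whose upper bound is >= x (last bin otherwise)."""
    return bisect_left(UPPER_BOUNDS, x)


def cumulative_code(x):
    """Cumulative indicators: position j is 1 when x falls in bins 0..j."""
    idx = bin_index(x)
    return [1 if idx <= j else 0 for j in range(len(BIN_LABELS))]


VALUES = [0.2, 0.25, 0.28, 0.31, 0.35, 0.45, 0.47, 0.52, 0.6, 0.63]
for x in VALUES:
    print(x, BIN_LABELS[bin_index(x)], cumulative_code(x))
```

With these bounds, the test "X less than or equal to .35" becomes an equality test on one nominal attribute: the second indicator ("A or B") equals 1. That is how an ordering can be expressed to a learner that only supports equality on nominal values.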
40
Questions?