Machine Learning CUNY Graduate Center Lecture 1: Introduction

Today
Welcome
Overview of Machine Learning
Class Mechanics
Syllabus Review
Basic Classification Algorithm

My research and background
Speech
–Analysis of Intonation
–Segmentation
Natural Language Processing
–Computational Linguistics
Evaluation Measures
All of this research relies heavily on Machine Learning.

You
Why are you taking this class?
For Ph.D. students:
–What is your dissertation on?
–Do you expect it to require Machine Learning?
What is your background and comfort with
–Calculus
–Linear Algebra
–Probability and Statistics
What is your programming language of preference?
–C++, Java, or Python are preferred

Machine Learning
Automatically identifying patterns in data
Automatically making decisions based on data
Hypothesis:
(Data → Learning Algorithm → Behavior)  ≥  (Data → Programmer → Behavior)
That is, the behavior a learning algorithm derives from data can match or exceed the behavior a programmer would hand-code from the same data.

Machine Learning in Computer Science
(Figure: machine learning at the hub, connected to Biomedical/Chemical Informatics, Financial Modeling, Natural Language Processing, Speech/Audio Processing, Planning, Locomotion, Vision/Image Processing, Robotics, Human Computer Interaction, and Analytics.)

Major Tasks
Regression
–Predict a numerical value from “other information”
Classification
–Predict a categorical value
Clustering
–Identify groups of similar entities
Evaluation

Feature Representations
How do we view data?
(Figure: entities in the world, such as web pages, user behavior, speech or audio, images, wine, or people, pass through feature extraction to produce a feature representation, which is the input to a machine learning algorithm. Our focus: the feature representation and the learning algorithm.)

Feature Representations

Height  Weight  Eye Color  Gender
66      170     Blue       Male
73      210     Brown      Male
72      165     Green      Male
70      180     Blue       Male
74      185     Brown      Male
68      155     Green      Male
65      150     Blue       Female
64      120     Brown      Female
63      125     Green      Female
67      140     Blue       Female
68      145     Brown      Female
66      130     Green      Female
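
To make this concrete (a sketch added here, not from the slides), one common representation is a list of numeric feature vectors, one per entity, with nominal features such as eye color encoded as integers:

```python
# A minimal sketch: the table above as feature vectors plus target values.
# Eye color is nominal, so we encode it as an integer purely for illustration.
EYE = {"Blue": 0, "Brown": 1, "Green": 2}

# Each row: (height, weight, eye color).
X = [
    (66, 170, EYE["Blue"]),  (73, 210, EYE["Brown"]), (72, 165, EYE["Green"]),
    (70, 180, EYE["Blue"]),  (74, 185, EYE["Brown"]), (68, 155, EYE["Green"]),
    (65, 150, EYE["Blue"]),  (64, 120, EYE["Brown"]), (63, 125, EYE["Green"]),
    (67, 140, EYE["Blue"]),  (68, 145, EYE["Brown"]), (66, 130, EYE["Green"]),
]
t = ["Male"] * 6 + ["Female"] * 6  # one target value per row
```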

Classification
Identify which of N classes a data point, x, belongs to.
x is a column vector of features.

Target Values
In supervised approaches, in addition to a data point, x, we will also have access to a target value, t.
Goal of Classification: identify a function y, such that y(x) = t.

Feature Representations
(The height/weight/eye color/gender table from above, shown again.)

Graphical Example of Classification
(A sequence of figures: labeled data points in a two-dimensional feature space, new unlabeled points marked “?”, and the class regions used to label them.)

Decision Boundaries
(Figure: curves separating the classes in feature space.)

Regression
Regression is a supervised machine learning task.
–So a target value, t, is given.
Classification: nominal t
Regression: continuous t
Goal of Regression: identify a function y, such that y(x) = t.
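
As an illustration of a continuous target (added here; fit_line and the data are my own, not from the slides), a one-feature least-squares line fit:

```python
# A minimal sketch, assuming the simplest linear model y(x) = w0 + w1 * x.
def fit_line(xs, ts):
    n = len(xs)
    mx = sum(xs) / n
    mt = sum(ts) / n
    # Closed-form least squares for a single feature.
    w1 = sum((x - mx) * (t - mt) for x, t in zip(xs, ts)) / \
         sum((x - mx) ** 2 for x in xs)
    w0 = mt - w1 * mx
    return w0, w1

w0, w1 = fit_line([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(w0 + w1 * 2.5)  # predicted continuous target at x = 2.5
```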

Differences between Classification and Regression
Similar goals: identify y such that y(x) = t. What are the differences?
–The form of the function, y (naturally).
–Evaluation:
  Root Mean Squared Error
  Absolute Value Error
  Classification Error
  Maximum Likelihood
–Evaluation drives the optimization operation that learns the function, y.
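
A sketch (added for illustration) of three of the evaluation measures named above:

```python
import math

def rmse(preds, targets):
    # Root Mean Squared Error: typical for regression.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds))

def mae(preds, targets):
    # Absolute value error (mean absolute error).
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)

def classification_error(preds, targets):
    # Fraction of points assigned the wrong class.
    return sum(p != t for p, t in zip(preds, targets)) / len(preds)
```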

Graphical Example of Regression
(Figures: data points in the plane, a query point with unknown target marked “?”, and candidate fitted functions.)

Clustering
Clustering is an unsupervised learning task.
–There is no target value to shoot for.
Identify groups of “similar” data points that are “dissimilar” from others.
Partition the data into groups (clusters) that satisfy two constraints:
1. Points in the same cluster should be similar.
2. Points in different clusters should be dissimilar.
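
One standard clustering algorithm that enforces these constraints is k-means (not named on the slide; shown here only as an illustrative sketch). It alternates between assigning points to the nearest cluster center and moving each center to the mean of its cluster:

```python
import random

def kmeans(points, k, iters=20):
    # points: list of (x, y) tuples. A bare-bones sketch with no convergence test.
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters
```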

Graphical Example of Clustering
(Figures: unlabeled points in a two-dimensional space, then the same points grouped into clusters.)

Mechanisms of Machine Learning
Statistical Estimation
–Numerical Optimization
–Theoretical Optimization
Feature Manipulation
Similarity Measures

Mathematical Necessities
Probability
Statistics
Calculus
–Vector Calculus
Linear Algebra
Is this a Math course in disguise?

Why do we need so much math?
Probability Density Functions allow the evaluation of how likely a data point is under a model.
–Want to identify good PDFs. (calculus)
–Want to evaluate against a known PDF. (algebra)

Gaussian Distributions
We use Gaussian distributions all over the place.
(Figures: examples of Gaussian densities.)
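
For reference (added here, not on the slides), the univariate Gaussian density is N(x | μ, σ²) = (1/√(2πσ²)) exp(−(x−μ)²/(2σ²)); evaluating it directly is a one-liner:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(gaussian_pdf(0.0, 0.0, 1.0))  # ≈ 0.3989, the standard normal peak
```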

Class Structure and Policies
Course website: –
Google Group for discussions and announcements: –
–Please sign up for the group ASAP.
–Or put your address on the sign-up sheet, and you will be sent an invitation.

Data Data Data
“There’s no data like more data.”
All machine learning techniques rely on the availability of data to learn from.
There is an ever-increasing amount of data being generated, but it’s not always easy to process.
UCI –
LDC (Linguistic Data Consortium) –

Half time.
Get coffee. Stretch.

Decision Trees
Classification technique.
(Figure: a decision tree whose root tests eye color (blue / brown / green); internal nodes test height and weight against break points such as <66, <140, <150, <145, and <170; leaves predict m or f.)

Decision Trees
Very easy to evaluate: a decision tree is just nested if statements.
(Figure: the same tree as above; see the sketch below.)
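
As a sketch of the “nested if statements” view (the thresholds below are illustrative, not an exact transcription of the tree in the figure):

```python
def classify(color, height, weight):
    # Each internal node tests one feature; each leaf returns a class label.
    if color == "blue":
        return "f" if height < 66 else "m"
    elif color == "brown":
        return "f" if weight < 140 else "m"
    else:  # green
        return "f" if weight < 150 else "m"

print(classify("blue", 64, 130))  # "f" under these illustrative thresholds
```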

A More Formal Definition of a Decision Tree
A tree data structure.
Each internal node corresponds to a feature.
Leaves are associated with target values.
Nodes with nominal features have N children, where N is the number of nominal values.
Nodes with continuous features have two children, for values less than and greater than or equal to a break point.
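
A data-structure sketch matching this definition (the names are my own, added for illustration): nominal nodes keep one child per value, continuous nodes keep a break point with two children, and leaves keep a target value.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    feature: Optional[str] = None        # None for a leaf
    label: Optional[str] = None          # target value, set only at a leaf
    # Nominal feature: one child per nominal value.
    children: Dict[str, "Node"] = field(default_factory=dict)
    # Continuous feature: two children around a break point.
    breakpoint: Optional[float] = None
    lt: Optional["Node"] = None          # child for values < breakpoint
    ge: Optional["Node"] = None          # child for values >= breakpoint
```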

Training a Decision Tree
How do you decide what feature to use?
For continuous features, how do you decide what break point to use?
Goal: optimize classification accuracy.

Example Data Set

Height  Weight  Eye Color  Gender
66      170     Blue       Male
73      210     Brown      Male
72      165     Green      Male
70      180     Blue       Male
74      185     Brown      Male
68      155     Green      Male
65      150     Blue       Female
64      120     Brown      Female
63      125     Green      Female
67      140     Blue       Female
68      145     Brown      Female
66      130     Green      Female

Baseline Classification Accuracy
Select the majority class.
–Here 6/12 Male, 6/12 Female.
–Baseline accuracy: 50%
How good is each branch?
–The improvement to classification accuracy.
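
In code (an added illustration), the baseline simply predicts the most frequent target value:

```python
from collections import Counter

def baseline_accuracy(targets):
    # Accuracy of predicting the majority class for every point.
    return Counter(targets).most_common(1)[0][1] / len(targets)

print(baseline_accuracy(["M"] * 6 + ["F"] * 6))  # 0.5
```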

Training Example
Possible branches: split on eye color (blue / brown / green). Each branch contains 2M / 2F.
50% accuracy before branch. 50% accuracy after branch. 0% accuracy improvement.

Example Data Set (sorted by height)

Height  Weight  Eye Color  Gender
63      125     Green      Female
64      120     Brown      Female
65      150     Blue       Female
66      170     Blue       Male
66      130     Green      Female
67      140     Blue       Female
68      145     Brown      Female
68      155     Green      Male
70      180     Blue       Male
72      165     Green      Male
73      210     Brown      Male
74      185     Brown      Male

Training Example
Possible branches: split on height < 68. Left branch (height < 68): 1M / 5F. Right branch (height ≥ 68): 5M / 1F.
50% accuracy before branch. 83.3% accuracy after branch. 33.3% accuracy improvement.

Example Data Set (sorted by weight)

Height  Weight  Eye Color  Gender
64      120     Brown      Female
63      125     Green      Female
66      130     Green      Female
67      140     Blue       Female
68      145     Brown      Female
65      150     Blue       Female
68      155     Green      Male
72      165     Green      Male
66      170     Blue       Male
70      180     Blue       Male
74      185     Brown      Male
73      210     Brown      Male

Training Example
Possible branches: split on weight < 165. Left branch (weight < 165): 1M / 6F. Right branch (weight ≥ 165): 5M.
50% accuracy before branch. 91.7% accuracy after branch. 41.7% accuracy improvement.
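
A sketch (added here) that scores a candidate break point by the accuracy of predicting the majority class in each branch; on the table above it reproduces 83.3% for height < 68 and 91.7% for weight < 165:

```python
from collections import Counter

# rows: (height, weight, gender) for the 12 people above.
rows = [(66, 170, "M"), (73, 210, "M"), (72, 165, "M"), (70, 180, "M"),
        (74, 185, "M"), (68, 155, "M"), (65, 150, "F"), (64, 120, "F"),
        (63, 125, "F"), (67, 140, "F"), (68, 145, "F"), (66, 130, "F")]

def split_accuracy(rows, feature, threshold):
    # Accuracy after branching: each side predicts its own majority class.
    left  = [r[2] for r in rows if r[feature] <  threshold]
    right = [r[2] for r in rows if r[feature] >= threshold]
    correct = sum(Counter(side).most_common(1)[0][1]
                  for side in (left, right) if side)
    return correct / len(rows)

print(split_accuracy(rows, 0, 68))   # height < 68  -> 0.8333
print(split_accuracy(rows, 1, 165))  # weight < 165 -> 0.9167
```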

Training Example
Recursively train child nodes.
Tree so far: weight < 165? No: 5M (leaf). Yes: height < 68? Yes: 5F (leaf). No: 1M / 1F (train this node further).

Training Example
Finished tree:
weight < 165?
  no:  Male (5M)
  yes: height < 68?
    yes: Female (5F)
    no:  weight < 155?
      yes: Female (1F)
      no:  Male (1M)

Generalization
What is the performance of the tree on the training data?
–Is there any way we could get less than 100% accuracy?
What performance can we expect on unseen data?

Evaluation
Evaluate performance on data that was not used in training.
Isolate a subset of data points to be used for evaluation.
Evaluate generalization performance.
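
A sketch of that holdout idea (an added illustration; the function name and split fraction are my own):

```python
import random

def holdout_split(rows, eval_fraction=0.25, seed=0):
    # Isolate a subset for evaluation; train only on the remainder.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, eval)
```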

Evaluation of our Decision Tree
What is the training performance?
What is the evaluation performance?
–Never classify anyone over 165 pounds as female.
–Never classify anyone under 165 pounds and under 68 inches as male.
–The middle section is trickier.
What are some ways to make these similar?

Pruning
There are many pruning techniques. A simple approach is to require a minimum membership size in each node.
Before pruning: weight < 165? no: 5M; yes: height < 68? yes: 5F; no: weight < 155? yes: 1F; no: 1M.
After pruning: weight < 165? no: 5M; yes: height < 68? yes: 5F; no: a mixed leaf with 1F / 1M.
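
A recursive-partitioning sketch (added here, my own names; greedy on accuracy, continuous features only, and not guaranteed to reproduce the slide's exact tree) with the minimum-membership rule: refuse any split that would create a node smaller than min_size, returning a majority-class leaf instead.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_tree(rows, splits, min_size=2):
    # rows: (height, weight, label) tuples; splits: candidate (feature, value) pairs.
    labels = [r[2] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                       # pure node: leaf
    def accuracy_after(split):
        f, v = split
        sides = [[r[2] for r in rows if r[f] < v],
                 [r[2] for r in rows if r[f] >= v]]
        return sum(Counter(s).most_common(1)[0][1] for s in sides if s)
    f, v = max(splits, key=accuracy_after)     # greedy best split
    left  = [r for r in rows if r[f] <  v]
    right = [r for r in rows if r[f] >= v]
    # Minimum-membership pruning: refuse splits that create tiny nodes.
    if len(left) < min_size or len(right) < min_size:
        return majority(labels)
    return ((f, v), train_tree(left, splits, min_size),
                    train_tree(right, splits, min_size))

rows = [(66, 170, "M"), (73, 210, "M"), (72, 165, "M"), (70, 180, "M"),
        (74, 185, "M"), (68, 155, "M"), (65, 150, "F"), (64, 120, "F"),
        (63, 125, "F"), (67, 140, "F"), (68, 145, "F"), (66, 130, "F")]
print(train_tree(rows, [(0, 66), (0, 68), (1, 155), (1, 165)], min_size=2))
```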

Decision Tree Recap
Training via recursive partitioning.
Simple, interpretable models.
Different node selection criteria can be used.
–Information theory is a common choice.
Pruning techniques can be used to make the model more robust to unseen data.
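
The information-theoretic criterion referenced above is typically entropy: a split is scored by how much it reduces the label entropy, weighted by branch sizes (information gain). A sketch, added here for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of the label distribution, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    # Entropy reduction achieved by splitting parent into left and right.
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

print(entropy(["M"] * 6 + ["F"] * 6))  # 1.0 bit: maximally uncertain
```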

Next Time: Math Primer
Probability
–Bayes Rule
–Naïve Bayes Classification
Calculus
–Vector Calculus
Optimization
–Lagrange Multipliers