INTRODUCTION TO Machine Learning Adapted from: ETHEM ALPAYDIN Lecture Slides for
CHAPTER 1: Introduction
3 Why “Learn”? Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to “learn” to calculate payroll. Learning is used when: human expertise does not exist (navigating on Mars); humans are unable to explain their expertise (speech recognition); the solution changes over time (routing on a computer network); the solution needs to be adapted to particular cases (user biometrics).
4 What We Talk About When We Talk About “Learning” Learning general models from data of particular examples. Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce. Example in retail, from customer transactions to consumer behavior: people who bought “Introduction to Machine Learning (Adaptive Computation and Machine Learning)” also bought “Data Mining: Practical Machine Learning Tools and Techniques”. Build a model that is a good and useful approximation to the data.
5 Data Mining Retail: Market basket analysis, Customer relationship management (CRM) Finance: Credit scoring, fraud detection Manufacturing: Optimization, troubleshooting Medicine: Medical diagnosis Telecommunications: Quality of service optimization Bioinformatics: Motifs, alignment Web mining: Search engines...
6 What is Machine Learning? Optimize a performance criterion using example data or past experience. Role of statistics: inference from a sample. Role of computer science: efficient algorithms to solve the optimization problem and to represent and evaluate the model for inference.
7 What is Machine Learning? An abstract learning machine (algorithm): training data is fed into the learning algorithm to produce a trained machine; a query to the trained machine yields an answer.
8 Some Learning Machines Linear models Kernel methods Neural networks Markov models Decision trees …
9 Some Areas of Application (inputs / training examples): Bioinformatics, Ecology, OCR, HWR, Market analysis, Text categorization, Machine vision, System diagnosis
10 Bioinformatics: Biomedical / Biometrics Medicine: screening; diagnosis and prognosis; drug discovery. Security: face recognition; signature / fingerprint / iris verification; DNA fingerprinting
11 Computer / Internet Computer interfaces: troubleshooting wizards; handwriting and speech; brain waves. Internet: hit ranking; spam filtering; text categorization; text translation; recommendation
Application Types
13 Application Types Learning Association Supervised Learning Classification Regression Unsupervised Learning Reinforcement Learning (unsupervised)
14 Learning Associations Basket analysis: P(Y | X) = probability that somebody who buys X also buys Y, where X and Y are products/services. Example: P(pen | notebook) = 0.7
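The conditional probability above can be estimated directly from transaction data. A minimal sketch, with a made-up basket list (the items and counts are illustrative, not from the slides):

```python
# Hypothetical transaction baskets; estimate P(pen | notebook) as
# (# baskets containing both) / (# baskets containing notebook).
baskets = [
    {"notebook", "pen"},
    {"notebook", "pen", "eraser"},
    {"notebook"},
    {"pen"},
    {"notebook", "pen", "ruler"},
]

def conditional_support(baskets, x, y):
    """Estimate P(Y | X): fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(y in b for b in with_x) / len(with_x)

print(conditional_support(baskets, "notebook", "pen"))  # 3 of 4 notebook baskets -> 0.75
```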
15 Application Types Learning Association Supervised Learning Classification Regression Unsupervised Learning Reinforcement Learning (unsupervised)
16 Classification Example: credit scoring. Differentiating between low-risk and high-risk customers from their income and savings. Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
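The two-threshold discriminant can be written down directly. A minimal sketch; the threshold values θ1 and θ2 are assumptions for illustration, not values from the slides:

```python
# Assumed thresholds for the credit-scoring rule in the slide.
THETA1 = 30_000  # income threshold (assumed)
THETA2 = 5_000   # savings threshold (assumed)

def credit_risk(income, savings):
    """IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk."""
    return "low-risk" if income > THETA1 and savings > THETA2 else "high-risk"

print(credit_risk(45_000, 8_000))  # low-risk
print(credit_risk(45_000, 2_000))  # high-risk
```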
17 Classification: Applications Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style Character recognition: Different handwriting styles. Speech recognition: Temporal dependency. Use of a dictionary or the syntax of the language. Sensor fusion: Combine multiple modalities; eg, visual (lip image) and acoustic for speech Medical diagnosis: From symptoms to illnesses...
18 Face Recognition Training examples of a person Test images AT&T Laboratories, Cambridge UK
19 Example 1 “Sorting incoming fish on a conveyor according to species using optical sensing”: sea bass vs. salmon.
20 Problem Analysis Set up a camera and take some sample images to extract features: Length Lightness Width Number and shape of fins Position of the mouth, etc… This is the set of all suggested features to explore for use in our classifier!
21 Preprocessing Use a segmentation operation to isolate the fish from one another and from the background. Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features. The features are passed to a classifier.
23 Classification Select the length of the fish as a possible feature for discrimination The length is a poor feature alone! Select the lightness as a possible feature.
24 Classification Select the lightness of the fish as a possible feature for discrimination. Move the decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!) Task of decision theory
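Moving the boundary to minimize an asymmetric cost can be sketched as a one-dimensional search. The lightness values and the cost ratio below are assumptions for illustration:

```python
# Sketch: pick a lightness threshold that minimizes total misclassification
# cost, where classifying a sea bass as salmon is penalized more heavily.
salmon  = [2.0, 3.1, 3.5, 4.0]   # lightness values (assumed)
seabass = [4.5, 5.0, 6.2, 7.0]   # lightness values (assumed)

COST_BASS_AS_SALMON = 2.0  # the costly error we want to reduce
COST_SALMON_AS_BASS = 1.0

def total_cost(threshold):
    # Decision rule: lightness < threshold -> salmon, else sea bass.
    bass_errors   = sum(x < threshold for x in seabass)
    salmon_errors = sum(x >= threshold for x in salmon)
    return bass_errors * COST_BASS_AS_SALMON + salmon_errors * COST_SALMON_AS_BASS

candidates = [t / 10 for t in range(20, 71)]  # thresholds 2.0 .. 7.0
best = min(candidates, key=total_cost)
print(best, total_cost(best))  # a threshold between the two classes, cost 0
```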
25 Adopt the lightness and add the width of the fish: feature vector x^T = [x1, x2], where x1 is lightness and x2 is width. We might add other features that are not correlated with the ones we already have.
26 Ideally, the best decision boundary should be the one which provides an optimal performance such as in the following figure:
27 However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input Issue of generalization!
28 Example: Particle Classification Particles on an air filter, with three particle classes P1, P2, P3.
29 Sample data set: measured features and the class label for each particle.
30 Histogram Analysis Histograms of the Area feature (count vs. area) for classes P1, P2, P3.
31 Histogram Analysis Histograms of the Perimeter feature, and a scatter plot of Area vs. Perimeter, for classes P1, P2, P3.
32 Input Space Partitioning Partitioning the (Area, Perimeter) input space into decision regions for classes P1, P2, P3.
33 Classification Systems Sensing: use of a transducer (camera or microphone); the PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer. Segmentation and grouping: patterns should be well separated and should not overlap.
35 Feature extraction: discriminative features; features invariant with respect to translation, rotation, and scale. Classification: use the feature vector provided by the feature extractor to assign the object to a category. Post-processing: exploit context (input-dependent information other than the target pattern itself) to improve performance.
36 The Design Cycle Data collection Feature Choice Model Choice Training Evaluation Computational Complexity
38 Data Collection How do we know when we have collected an adequately large and representative set of examples for training and testing the system? Feature Choice Depends on the characteristics of the problem domain: features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise. Model Choice If we are unsatisfied with the performance of our fish classifier, we may switch to another class of model.
39 Training Use the data to determine the classifier; there are many different procedures for training classifiers and choosing models. Evaluation Measure the error rate (or performance) and switch from one set of features to another.
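Measuring the error rate is a simple count over held-out examples. A minimal sketch; the threshold classifier and the labeled test points are placeholders, not from the slides:

```python
# Sketch of evaluation by error rate on a held-out test set.
def classify(x):
    # Assumed trained rule on the lightness feature.
    return "salmon" if x < 4.2 else "seabass"

test_set = [(3.0, "salmon"), (3.9, "salmon"), (5.1, "seabass"),
            (4.0, "seabass"), (6.3, "seabass")]

errors = sum(classify(x) != label for x, label in test_set)
error_rate = errors / len(test_set)
print(error_rate)  # 1 of 5 misclassified -> 0.2
```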
40 Application Types Learning Association Supervised Learning Classification Regression Unsupervised Learning Reinforcement Learning (unsupervised)
41 Regression [start Matlab demo lecture2.m] Given example pairs (x, y), predict the output y for a new input x.
42 Linear regression Prediction of y as a linear function of x, fitted to the examples.
43 Regression Example: price of a used car. x: car attributes; y: price. y = g(x | θ), where g() is the model and θ the parameters (car type, model, options, …). Linear model: y = wx + w0
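The parameters w and w0 of the linear model above have a closed-form least-squares solution. A minimal sketch; the car data points are made up for illustration:

```python
# Least-squares fit of y = w*x + w0, matching the linear model in the slide.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. car age in years (assumed)
ys = [9.0, 8.1, 6.9, 6.1, 5.0]   # e.g. price in $1000s (assumed)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Closed-form solution: w = cov(x, y) / var(x), w0 = mean_y - w * mean_x
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
w0 = mean_y - w * mean_x
print(w, w0)  # slope is negative: the modeled price drops with age
```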
44 Some Regression Applications Navigating a car: angle of the steering wheel. Kinematics of a robot arm: joint angles α1 = g1(x, y), α2 = g2(x, y) for reaching position (x, y). Response surface design.
45 Response Surface Design Response surface design is useful for modeling and analyzing problems in which a response of interest is influenced by several variables and the objective is to optimize this response. For example: find the levels of temperature (x1) and pressure (x2) that maximize the yield (y) of a process.
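Once a response surface y = g(x1, x2) has been fitted, the optimum can be located by search over the surface. A minimal sketch; the quadratic model and its peak location are assumptions for illustration:

```python
# Sketch: locate the maximizing (temperature, pressure) on a fitted surface.
def yield_model(x1, x2):
    # Assumed fitted response surface with a peak at temperature=80, pressure=2.5.
    return 100 - 0.5 * (x1 - 80) ** 2 - 10 * (x2 - 2.5) ** 2

# Grid search over temperature 60..100 and pressure 1.0..4.0.
grid = [(t, p / 10) for t in range(60, 101) for p in range(10, 41)]
best_t, best_p = max(grid, key=lambda xp: yield_model(*xp))
print(best_t, best_p)  # grid point with the maximum predicted yield
```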
46 Application Types Learning Association Supervised Learning Classification Regression Unsupervised Learning Reinforcement Learning (unsupervised)
47 Unsupervised Learning Learning “what normally happens”. No output labels. Clustering: grouping similar instances. Example applications: customer segmentation in CRM; image compression (color quantization); bioinformatics (learning motifs). A motif is a pattern common to a set of nucleic-acid or amino-acid subsequences that share some biological property of interest (such as being DNA binding sites for a regulatory protein).
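Clustering by grouping similar instances can be sketched with a tiny one-dimensional k-means loop, as might be used for customer segmentation; the data points, initial centers, and k = 2 are all illustrative assumptions:

```python
# Minimal one-dimensional k-means sketch: alternate assignment and update.
data = [1.0, 1.2, 0.8, 8.0, 8.4, 7.6]  # e.g. customer spend levels (assumed)
centers = [0.0, 5.0]                    # initial center guesses (assumed)

for _ in range(10):
    # Assignment step: each point joins its nearest center's cluster.
    clusters = [[], []]
    for x in data:
        idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        clusters[idx].append(x)
    # Update step: each center moves to the mean of its cluster.
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print(centers)  # the two group means, roughly [1.0, 8.0]
```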
48 Application Types Learning Association Supervised Learning Classification Regression Unsupervised Learning Reinforcement Learning (unsupervised)
49 Reinforcement Learning Learning a policy: a sequence of outputs. No supervised output, but a delayed reward. Credit assignment problem. Examples: game playing; robot in a maze; multiple agents, partial observability, ...
50 Reinforcement Learning (RL) In RL problems, an agent interacts with an environment and iteratively learns a policy, i.e., a mapping from states to actions that maximizes a reward signal. Loop: the agent takes action a_t; the environment returns state s_{t+1} and reward r_{t+1}; the agent updates V(s_t) or Q(s_t, a_t).
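The Q(s_t, a_t) update mentioned on the slide can be sketched with the standard tabular Q-learning rule, Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)); the states, actions, and learning-rate values here are illustrative assumptions:

```python
# Sketch of a single tabular Q-learning update step.
from collections import defaultdict

Q = defaultdict(float)       # Q-values default to 0.0
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount (assumed)

def q_update(s, a, r, s_next, actions):
    """Move Q(s, a) toward r + gamma * max over next actions."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

actions = ["left", "right"]
q_update("s0", "right", 1.0, "s1", actions)  # reward propagates into Q
print(Q[("s0", "right")])  # 0.5
```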
51 Learning Autonomous Agent Agent and environment linked by actions and perceptions.
52 Learning Autonomous Agent Agent and environment linked by actions and perceptions, plus a delayed reward.
53 Learning Autonomous Agent Agent and environment linked by actions, perceptions, new behavior, and a delayed reward. The reward won't be given immediately after the agent's action; usually it is given only after achieving the goal. This delayed reward is the only clue for the agent's learning.
54 Resources: Datasets UCI Repository: UCI KDD Archive: Statlib: Delve:
55 Resources: Journals Journal of Machine Learning Research Machine Learning Neural Computation Neural Networks IEEE Transactions on Neural Networks IEEE Transactions on Pattern Analysis and Machine Intelligence Annals of Statistics Journal of the American Statistical Association...
56 Resources: Conferences International Conference on Machine Learning (ICML) ICML05: European Conference on Machine Learning (ECML) ECML05: Neural Information Processing Systems (NIPS) NIPS05: Uncertainty in Artificial Intelligence (UAI) UAI05: Computational Learning Theory (COLT) COLT05: International Joint Conference on Artificial Intelligence (IJCAI) IJCAI05: International Conference on Neural Networks (Europe) ICANN05: