1 Chapter 10 Introduction to Machine Learning

2 Chapter 10 Contents (1) l Training l Rote Learning l Concept Learning l Hypotheses l General to Specific Ordering l Version Spaces l Candidate Elimination

3 Chapter 10 Contents (2) l Inductive Bias l Decision Tree Induction l Overfitting l The Nearest Neighbor Algorithm l Neural Networks l Supervised Learning l Unsupervised Learning l Reinforcement Learning

4 Training l Learning problems usually involve classifying inputs into a set of classifications. l Learning is only possible if there is a relationship between the data and the classifications. l Training involves providing the system with data which has been manually classified. l Learning systems use the training data to learn to classify unseen data.

5 Rote Learning l A very simple learning method. l Simply involves memorizing the classifications of the training data. l Can only classify previously seen data – unseen data cannot be classified by a rote learner.

6 Concept Learning l Concept learning involves determining a mapping from a set of input variables to a Boolean value. l Such methods are known as inductive learning methods. l If a function can be found which maps training data to correct classifications, then it will also work well for unseen data – hopefully! l This process is known as generalization.

7 Hypotheses l A hypothesis is a vector of values, one for each attribute. l In concept learning, each training hypothesis is classified as either positive or negative (true or false). l A ? is used to indicate that any value will be suitable. A Ø is used to indicate that no value will be suitable.

8 Hypotheses - Example l Each hypothesis represents a set of driving conditions. l If a hypothesis is positive, then it represents a safe scenario. l For example: ⟨fast, rain, 10ft, 2 units⟩ l This represents the hypothesis that it is safe to drive fast in rain 10ft behind the next car having drunk 2 units of alcohol. l This would be a negative training example, as clearly it is not safe!

9 General to Specific Ordering l This hypothesis is the most general hypothesis. It represents the idea that it is safe to drive in any conditions: hg = ⟨?, ?, ?, ?⟩ l The following hypothesis is the most specific hypothesis: it says it is not safe to drive in any conditions: hs = ⟨Ø, Ø, Ø, Ø⟩ l We can define a partial order over the set of hypotheses: h1 >g h2 l This states that h1 is more general than h2. l One learning method is to determine the most specific hypothesis that matches all the training data.

10 Partial Order (sort) l H1 = … l H2 = … l These two hypotheses cannot be ordered: neither is more general than the other, which is why >g is only a partial order.

11 More General Hypothesis l Example of one hypothesis being more general than (≥) another.
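
The ? / Ø representation and the more-general-than ordering can be made concrete with a short sketch. The Python below is illustrative only: the string markers for ? and Ø, the function name, and the four driving attributes are assumptions, not code from the chapter.

```python
GENERAL = "?"   # matches any value
NONE = "Ø"      # matches no value

def more_general_or_equal(h1, h2):
    """Return True if hypothesis h1 is at least as general as h2."""
    if NONE in h2:
        # h2 matches no instances at all, so any h1 is at least as general
        return True
    for a, b in zip(h1, h2):
        if a == GENERAL:
            continue                 # ? covers whatever b requires
        if a == NONE or a != b:
            return False             # h1 is stricter here than h2
    return True

# Driving attributes assumed to be <speed, weather, distance, alcohol>
h_general = ["?", "?", "?", "?"]
h_slow_sober = ["slow", "?", "?", "0 units"]
print(more_general_or_equal(h_general, h_slow_sober))   # True
print(more_general_or_equal(h_slow_sober, h_general))   # False
```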

12 Learning Algorithm l Start with the most specific hypothesis, relaxing it until a match is found. l Each positive training example is matched against the current hypothesis; where they disagree, the hypothesis is generalized just enough to match. l After all the positive examples have been processed, the resulting hypothesis here is: it is safe to drive if one drives slowly and doesn't drink.
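
The relax-until-it-matches procedure on this slide can be written as a short Find-S-style routine. This is a hedged sketch: the Python encoding and the training examples are made up; only the overall strategy follows the slide.

```python
GENERAL, NONE = "?", "Ø"   # same markers as in the previous sketch

def generalize(h, example):
    """Relax hypothesis h just enough to cover a positive example."""
    new_h = []
    for h_val, x_val in zip(h, example):
        if h_val == NONE:
            new_h.append(x_val)      # Ø: adopt the example's value
        elif h_val in (GENERAL, x_val):
            new_h.append(h_val)      # already covers this value
        else:
            new_h.append(GENERAL)    # conflicting values: relax to ?
    return new_h

def most_specific_hypothesis(positive_examples, n_attributes):
    h = [NONE] * n_attributes        # start from the most specific hypothesis
    for x in positive_examples:
        h = generalize(h, x)         # relax until it matches
    return h

# Illustrative positive ("safe") examples for <speed, weather, distance, alcohol>
positives = [
    ["slow", "rain", "30ft", "0 units"],
    ["slow", "sun", "50ft", "0 units"],
]
print(most_specific_hypothesis(positives, 4))
# -> ['slow', '?', '?', '0 units']
```

The two positive examples disagree on weather and distance, so those attributes are relaxed to ?, leaving exactly the hypothesis the slide describes: drive slowly and don't drink.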

13 Version Spaces l A version space is the set of hypotheses that correctly map all the training data to their categories. l A simplistic learning method would be to start from a version space of all hypotheses and to systematically remove all the ones that do not match the training data. l Clearly this would not be an efficient learning method!
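
The "enumerate every hypothesis, then eliminate" idea can be sketched directly for a toy domain. The code below assumes tiny attribute domains (which is exactly why the slide calls the method inefficient); the attribute values are invented.

```python
from itertools import product

def matches(h, example):
    return all(hv == "?" or hv == xv for hv, xv in zip(h, example))

def version_space(domains, training_data):
    """training_data is a list of (example, is_positive) pairs."""
    # every conjunctive hypothesis: each attribute is either ? or a concrete value
    all_hypotheses = product(*[["?"] + list(d) for d in domains])
    return [h for h in all_hypotheses
            if all(matches(h, x) == positive for x, positive in training_data)]

domains = [["slow", "fast"], ["rain", "sun"]]
training = [(["slow", "rain"], True), (["fast", "rain"], False)]
for h in version_space(domains, training):
    print(h)   # ('slow', '?') and ('slow', 'rain') survive
```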

14 Candidate Elimination l Candidate elimination aims to derive one hypothesis which matches all training data. l We start with sets containing the most general (hg) and most specific (hs) hypotheses. l As each item of training data is examined, these sets are modified so that all hypotheses in hs and hg match the training data. l When training is finished, the remaining hypothesis should match unseen data.

15 Inductive Bias l All learning methods have an inductive bias. l The inductive bias of a learning method is the set of restrictions on the learning method. l Without inductive bias, a learning method could not learn to generalize. l Occam’s razor is an example of an inductive bias: The best hypothesis to select is the simplest one.

16 Decision Tree Induction (1) l A decision tree takes an input and gives a Boolean output. l Decision trees can represent more complex scenarios than version spaces.

17 Decision Tree Induction (2) l Decision tree induction involves creating a decision tree from a set of training data that can be used to correctly classify the training data. l ID3 is an example of a decision tree learning algorithm. l ID3 builds the decision tree from the top down, selecting the features from the training data that provide the most information at each stage.

18 Decision Tree Induction (3) l ID3 selects attributes based on information gain. l Information gain is the reduction in entropy caused by a decision. l Entropy is defined as: H(S) = −p₁ log₂ p₁ − p₀ log₂ p₀ l p₁ is the proportion of the training data which are positive examples l p₀ is the proportion which are negative examples l The entropy of S is zero when all the examples are positive, or when all the examples are negative. l The entropy reaches its maximum value of 1 when exactly half of the examples are positive, and half are negative.
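
A small sketch of the entropy and information-gain calculations described above, assuming Boolean classifications; the data at the bottom is invented, not the example from the book.

```python
from math import log2

def entropy(labels):
    """H(S) = -p1*log2(p1) - p0*log2(p0) for Boolean labels (0/1)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    p0 = 1.0 - p1
    return sum(-p * log2(p) for p in (p0, p1) if p > 0)

def information_gain(examples, labels, attribute):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = entropy(labels)
    remainder = 0.0
    for value in set(x[attribute] for x in examples):
        subset = [l for x, l in zip(examples, labels) if x[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return total - remainder

examples = [{"genre": "comedy"}, {"genre": "comedy"},
            {"genre": "drama"}, {"genre": "drama"}]
labels = [1, 1, 0, 1]
print(entropy(labels))                               # ~0.811
print(information_gain(examples, labels, "genre"))   # ~0.311
```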

19 Values with greatest gain are placed near the top of the tree l Considering the example on page 279: l Country of origin l Big star: 0.01 l Genre: 0.17

20 The Problem of Overfitting l Black dots represent positive examples, white dots negative. The two lines represent two different hypotheses. l In the first diagram there are just a few items of training data, correctly classified by the hypothesis represented by the darker line. l In the second and third diagrams we see the complete set of data: the simpler hypothesis, which matched the training data less well, matches the rest of the data better than the more complex hypothesis, which overfits.

21 The Nearest Neighbor Algorithm (1) l This is an example of instance-based learning. l Instance-based learning involves storing training data and using it to attempt to classify new data as it arrives. l The nearest neighbor algorithm works with data that consists of vectors of numeric attributes. l Each vector represents a point in n-dimensional space.

22 The Nearest Neighbor Algorithm (2) l When an unseen data item is to be classified, the Euclidean distance is calculated between this item and all training data. l The distance between ⟨x1, x2, …, xn⟩ and ⟨y1, y2, …, yn⟩ is: √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²) l The classification for the unseen data is usually selected as the one that is most common amongst its few nearest neighbors. l Shepard's method allows all training data to contribute to the classification, with each contribution weighted in inverse proportion to its distance from the data item to be classified.
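
A minimal sketch of the nearest-neighbor classifier just described, in plain Python; the vectors and labels are made-up toy data.

```python
from collections import Counter
from math import sqrt

def euclidean(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(query, training, k=3):
    """training: list of (vector, label) pairs."""
    nearest = sorted(training, key=lambda item: euclidean(query, item[0]))[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]   # most common label wins

training = [((1.0, 1.0), "safe"), ((1.2, 0.9), "safe"),
            ((5.0, 4.5), "unsafe"), ((4.8, 5.1), "unsafe")]
print(knn_classify((1.1, 1.0), training))   # "safe"
```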

23 Neural Networks (1) l A neural network is a network of artificial neurons, which is based on the operation of the human brain. l Neural networks usually have their nodes arranged in layers. l One layer is the input layer, and another is an output layer. l There are one or more hidden layers between these two.

24 Neural Networks (2) l The connections between nodes have weights associated with them, which determine the behavior of the network. l Input data is applied to the input layer. l Neurons fire if their inputs are above a certain level. l If one neuron is connected to another, the firing of the first may cause the firing of the second.
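
To make the "layers, weights and firing" description concrete, here is a small sketch of one forward pass through a two-layer network with sigmoid activations; the weights are arbitrary numbers rather than a trained network, and the choice of activation function is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights, biases):
    """One layer: each neuron weights its inputs, adds a bias, and 'fires'
    according to the sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
            for neuron_w, b in zip(weights, biases)]

# 2 inputs -> 2 hidden neurons -> 1 output neuron
hidden_w = [[0.5, -0.4], [0.3, 0.8]]
hidden_b = [0.1, -0.2]
output_w = [[1.2, -0.7]]
output_b = [0.05]

inputs = [0.9, 0.1]
hidden = layer_output(inputs, hidden_w, hidden_b)
print(layer_output(hidden, output_w, output_b))   # roughly [0.60]
```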

25 Supervised Learning l Many neural networks use supervised learning. l Pre-classified training data is provided to the network before it is presented with unseen data. l The training data causes the weights in the network to be set to levels such that unseen data can be classified correctly. l Neural networks are able to learn to classify extremely complex functions.

26 Unsupervised Learning l Unsupervised learning networks learn without requiring human intervention. l No pre-classified training data is required. l The system learns to cluster input data into a set of classifications that are not previously defined. l Example: Kohonen Maps (self-organizing maps).
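
As a rough illustration of unsupervised clustering, the sketch below uses simple competitive learning in the spirit of a Kohonen map, but without the 2-D neighborhood function; the data, learning rate and number of units are all invented.

```python
def closest(units, x):
    """Index of the unit whose weight vector is nearest to input x."""
    return min(range(len(units)),
               key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))

def train(units, data, rate=0.3, epochs=20):
    for _ in range(epochs):
        for x in data:
            i = closest(units, x)
            # move only the winning unit a little towards the input
            units[i] = [u + rate * (v - u) for u, v in zip(units[i], x)]
    return units

data = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
units = [[0.5, 0.4], [0.4, 0.5]]          # two initial cluster units
print(train(units, data))                 # one unit ends up near each cluster
```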

27 Reinforcement Learning l Systems that learn using reinforcement learning are given positive feedback when they classify data correctly, and negative feedback when they classify data incorrectly. l Credit assignment is needed to reward the nodes in a network correctly.
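
A toy sketch of learning purely from +1/−1 feedback: the learner guesses a label, receives a reward, and updates a value estimate for that (input, label) choice. This is a generic illustration of the feedback loop described above, not a specific algorithm from the chapter; the data and parameters are made up.

```python
import random

def train(inputs, true_labels, labels, epochs=50, rate=0.2, explore=0.1):
    # value[(input, label)] estimates how much reward that choice earns
    value = {}
    for _ in range(epochs):
        for x, correct in zip(inputs, true_labels):
            if random.random() < explore:
                choice = random.choice(labels)            # explore
            else:                                         # exploit best estimate
                choice = max(labels, key=lambda l: value.get((x, l), 0.0))
            reward = 1.0 if choice == correct else -1.0   # feedback signal
            old = value.get((x, choice), 0.0)
            value[(x, choice)] = old + rate * (reward - old)
    return value

inputs = ["red", "green", "red", "green"]
true_labels = ["stop", "go", "stop", "go"]
value = train(inputs, true_labels, labels=["stop", "go"])
print(max(["stop", "go"], key=lambda l: value.get(("red", l), 0.0)))  # "stop"
```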