General-Purpose Learning Machine
Seyed Hamid Hamraz
hamid_hamraz@comp.iust.ac.ir
Introduction to Machine Learning
- How can we construct computer programs that automatically improve with experience?
- Machine learning problems are usually reduced to a few function approximation problems.
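As a minimal sketch of this reduction (the notation is assumed here, not taken from the original slide): the learner is given examples of an unknown target function and must produce an approximation of it:

  f : X \to Y, \qquad \text{given } \{(x_i, f(x_i))\}_{i=1}^{n}, \quad \text{find } \hat{f} \approx f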
Introduction to Machine Learning
- Variable (input or output) types:
  - Numeric: e.g. weight, height, etc.
  - Nominal: e.g. booleans, seasons, etc.
- Numeric variables may be either continuous or discrete.
- Output type:
  - Numeric: regression problems
  - Nominal: classification problems
Introduction to Machine Learning
- Machine learning problem types, from the perspective of the feedback provided to the learner:
  - Supervised: training is in the form of labeled <input, output> examples
  - Reinforcement: training is in the form of occasional rewards to the learning system
  - Unsupervised: no explicit training signal is provided
Introduction to GPLM
- Most machine learning work amounts to finding a solution for one specific problem.
- From the user's view, the General-Purpose Learning Machine (GPLM) is a black box that receives supervised training instances and predicts answers for unknown instances.
- A comprehensive class library (possibly a hardware component) for supervised machine learning tasks that can be used by a non-expert user.
GPLM Outer View
[Diagram: the GPLM receives a set of training instances for a particular problem, plus additional learning over time; given an unknown instance for prediction, it returns an output value.]
GPLM Essential Characteristics
- No intellectual cost for adapting the machine to a problem
- Fast and online learning
- Fast and efficient prediction
- Additional (incremental) learning
- No strict limitation on the type of problems it can be applied to
Nominated Learning Methods as the GPLM Internal Engine
- Decision Trees
- Artificial Neural Networks (ANN)
- Instance-Based Learning
Decision Tree Learning
- Approximates nominal-valued functions (classification)
- The learned function is represented by a tree (if-then rules)
- Learning constructs the tree so that the more informative attributes reside in the higher nodes
- Attribute strength is measured with information-theoretic methods (see the sketch below)
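As a reminder of the standard information-theoretic criterion (not spelled out on the original slide), ID3-style tree learners place at each node the attribute with the highest information gain:

  \mathrm{Entropy}(S) = -\sum_{c} p_c \log_2 p_c, \qquad
  \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)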
Play Tennis Decision Tree
Is Decision Tree Learning Suitable for GPLM?
- Intellectual cost not required
- Fast and efficient prediction (linear in the number of inputs)
- Slow learning
- No additional learning
- Only for classification
- Difficulty in dealing with real-valued inputs
Artificial Neural Networks
- Inspired by the complex web of interconnected neurons in the brain
- Robust method for approximating both numeric- and nominal-valued functions
- Feed-forward networks are mostly utilized
- The Backpropagation algorithm is the most commonly used ANN learning technique (sketched below)
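As a minimal sketch (standard form, not from the original slides): Backpropagation performs gradient descent on the network error E, updating each weight by

  w_{ji} \leftarrow w_{ji} - \eta \,\frac{\partial E}{\partial w_{ji}}

where \eta is the learning rate and the partial derivatives are obtained by propagating the output error backwards through the layers.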
ANN for Steering Autonomous Vehicle
Is ANN Suitable for GPLM?
- Fast and efficient estimation
- No limitation on the type of inputs or output
- Requires a knowledgeable user for each problem: finding the appropriate network topology
- Very slow learning
- No additional learning
Instance-Based Learning
- Does not build an explicit representation of the function to be approximated
- Saves the instances during the training session
- Retrieves a few instances similar to the unknown one and estimates the output based on them
- Lazy: does nothing special during the learning session and postpones the processing to estimation time
K-Nearest Neighbors (KNN) Algorithm
- A more specific sub-type of instance-based algorithms
- Instances are represented as points in a multi-dimensional space
- The similarity metric is the Euclidean distance (see the sketch below)
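A minimal, hypothetical Java sketch of the plain KNN retrieval step described above (class and method names are illustrative, not taken from the GPLM code):

import java.util.Arrays;
import java.util.Comparator;

public class SimpleKnn {

    // Plain (unweighted) Euclidean distance between two instances.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the indices of the k training instances closest to the query.
    static int[] kNearest(double[][] training, double[] query, int k) {
        Integer[] idx = new Integer[training.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(training[i], query)));
        int[] result = new int[Math.min(k, idx.length)];
        for (int i = 0; i < result.length; i++) result[i] = idx[i];
        return result;
    }
}

This brute-force search scans every stored instance per query, which is exactly the "laziness" problem the later slides address with indexing structures.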
Is KNN Suitable for GPLM?
- Intellectual cost not required
- Fast and online learning
- Additional learning at any time
- No limitation on the type of inputs or output
- Slow estimation
KNN Fatal Issues for GPLM
- Overcoming laziness
- Output estimation based on retrieved neighbors
  - Regression
  - Classification
- Feature (input) weighting
- Nominal inputs
- Null-valued inputs
KNN Optimization Issues for GPLM
- Finding an appropriate value of K
- Reducing storage requirements
- Noisy training instances
- Missing attributes
- …
Overcoming Laziness
- Indexing structures:

  Dimensionality | Indexing Technique
  1              | B-tree, hash indexing
  2-10           | quad-tree, grid-file, KD-B-tree, R-tree, R*-tree
  10-30          | X-tree, TV-tree, M-tree, Pyramid-Techniques
  > 30           | ?
Approximate Nearest Neighbors
- No feasible indexing method exists for high-dimensional problems
- Machine learning is based on approximation
- Approximate nearest neighbor retrieval can lead to acceptable learning results
- Approximate nearest neighbors can be retrieved far more easily than the exact ones
Implemented KD-Tree
[Figure: the implemented KD-tree, with nodes alternately splitting on the X and Y dimensions, used for approximating f(x,y) with K=1.]
Discussion
- Adding a new instance: …
- Finding the exact nearest neighbor: …
- Finding the container rectangle (approximate nearest neighbor): … (a sketch of this lookup is given below)
- Splitting methods: Equal Distribution, Middle (recursive)
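A minimal, hypothetical sketch of the container-rectangle lookup (the node fields and names are illustrative assumptions, not the actual GPLM KD-tree code): the query descends the tree by comparing the split coordinate at each node, and the instance stored in the reached leaf serves as the approximate nearest neighbor.

class KDNode {
    int splitDim;         // dimension this node splits on
    double splitValue;    // split position, e.g. the middle of the node's rectangle
    KDNode left, right;   // children; both null for a leaf
    double[] instance;    // training instance stored in a leaf
}

class KDLookup {
    // Descend to the leaf (container rectangle) that holds the query point.
    static double[] approximateNearest(KDNode root, double[] query) {
        KDNode node = root;
        while (node.left != null && node.right != null) {
            node = (query[node.splitDim] <= node.splitValue) ? node.left : node.right;
        }
        return node.instance;
    }
}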
Output Estimation
- Numeric-valued output:
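The formula on the original slide is not reproduced here; a standard choice (assumed) is the distance-weighted mean of the k retrieved neighbors:

  \hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{d(x_q, x_i)^2}

(or simply the unweighted mean of the k neighbors' outputs).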
Output Estimation
- Nominal-valued output:
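Again the slide's formula is omitted; a standard choice (assumed) is the, optionally distance-weighted, majority vote over the k neighbors:

  \hat{f}(x_q) = \operatorname*{arg\,max}_{c \in C} \sum_{i=1}^{k} w_i \,\delta\bigl(c, f(x_i)\bigr), \qquad \delta(a,b)=1 \text{ if } a=b, \text{ else } 0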
Feature Weighting
- Each attribute affects the value of the output to a different degree
- The simple (unweighted) distance function ignores this
- Attributes have different effect degrees and different scales
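A common remedy (assumed here, but consistent with the WeightedEuclidianDistance class used in the implementation below) is to attach a weight to each input dimension:

  d_W(x, y) = \sqrt{\sum_{i=1}^{n} w_i \,(x_i - y_i)^2}, \qquad 0 \le w_i \le 1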
Feature Weighting
[Slide formula not recoverable; its annotated parameters included a learning rate and the number of instances.]
Feature Weighting Methods
- Wrapper methods: search the domain space of the weight vector W, receiving feedback from the learning tool itself
- Filter methods: based on information theory
- Filter methods can determine W faster
- Wrapper methods can determine a more suitable W
Implemented Feature Weighting Mechanism
- A wrapper method that binary-searches [0,1] for each weight
- The search is done concurrently for all weights
- A gradient-descent-like method; the gradient is not calculated analytically
- The amount each weight should change at each step is estimated through a race between two candidate values of the vector W
- The race is decided by leave-one-out cross-validation (LOOCV)
- Requires standard methods to avoid local minima
Binary Search Effect Specifier
// localWeights: a local copy of W, updated at the end of each iteration.
// id: index of this thread's weight in the W vector.
// weightHolder: an object holding the values of the vector W; it is shared among all the threads.
float high = 1, low = 0;
while (high - low > precision) {
    // Build two candidate weight vectors that differ only in position id.
    float[] W1 = (float[]) localWeights.clone();
    float[] W2 = (float[]) localWeights.clone();
    W1[id] = high;
    W2[id] = low;
    WeightedEuclidianDistance sm1 = new WeightedEuclidianDistance(W1);
    WeightedEuclidianDistance sm2 = new WeightedEuclidianDistance(W2);
    // Race the two candidates against each other via cross-validation.
    double result = match(sm1, sm2);
    if (result > 0) {
        weightHolder.setWeight(id, high);
        low += (high - low) / division;
    } else {
        weightHolder.setWeight(id, low);
        high -= (high - low) / division;
    }
    updateLocalWeights();
}
Discussion
- The algorithm can be realized in …
- Suitable for parallel, multi-processor programming
- Limits the weights; they can grow and shrink only within a bounded domain
Glass Classification
- 9 inputs; 6-valued output
- Weights {0.5, 0.77, 1.0, 0.9, 0.28, 0.4, 0.77, 0.76, 0.38}
Iris Classification
- 4 inputs; 3-valued output
- Weights {1.0, 0.825, 0.9, 0.34}
Pendigits Classification
- 16 inputs; 10-valued output
- Weights {0.64, 0.81, 0.2, 0.88, 0.58, 0.92, 0.46, 0.75, 0.67, 0.89, 0.63, 0.9, 0.49, 0.75, 0.4, 0.92}
Domains {(0,100), (0,100), (0,100), (0,100)} Weights {0.0, 0.06, 0.17, 1.0}
Domains {(0,100), (0,100), (0,100), (0,1)} Weights {0.0, 0.048, 0.833, 0.423}
Domains {(0,100), (0,100), (0,100), (0,1)} Weights {0.06, 0.0, 0.005, 0.95}
Summary
- GPLM idea introduced: GPLM characteristics
- Candidate machine learning methods as the internal GPLM engine: Decision Trees, ANN, IBL
- KNN issues for applying it to GPLM:
  - Efficient querying (overcoming laziness)
  - Output estimation
  - Feature weighting
- Empirical results presented