Genetic Algorithms. Schematic of a neural network application to identify metabolites by mass spectrometry (MS), developed by Dr. Lars Kangas. Input to the genetic algorithm is a measure of fitness from comparison of in silico and experimental MS. Output is "chromosomes" that are translated into weights for the neural network that is part of the model for metabolite MS.
Very brief history of genetic algorithms: Genetic algorithms were developed by John Holland in the 1960s and 70s, author of "Adaptation in Natural and Artificial Systems". A more recent book on the subject is "An Introduction to Genetic Algorithms" by Melanie Mitchell (MIT Press, Cambridge, MA, 2002).
Natural adaptation: Populations of organisms are subjected to environmental stress. Fitness is manifested by the ability to survive and reproduce. Fitness is passed to offspring by genes that are organized on chromosomes. If environmental conditions change, evolution creates a new population with different characteristics that optimize fitness under the new conditions.
Basic tools of evolution: Recombination (crossover) occurs during reproduction; the chromosome of the offspring is a mixture of chromosomes from the parents. Mutation changes a single gene within a chromosome; to be expressed, the organism must survive and pass the modified chromosome to offspring.
Artificial adaptation: Represent a candidate solution to a problem by a chromosome. Define a fitness function on the domain of all chromosomes. Define the probabilities of crossover and mutation. Select 2 chromosomes for reproduction based on their fitness. Produce new chromosomes by crossover and mutation. Evaluate the fitness of the new chromosomes. This completes a "generation".
Artificial adaptation continued: Over successive generations, create a population of solutions with high fitness. Repeat the whole process several times and merge the best solutions. Simple example: find the position of the maximum of a normal distribution with a mean of 16 and a standard deviation of 4.
Fitness function: f(x) = exp(−(x − 16)² / (2·4²)), a Gaussian with mean 16 and standard deviation 4.
Problem set up: Chromosome = binary representation of integers between 0 and 31 (requires 5 bits); 0 to 31 covers the range where fitness is significantly different from zero. Fitness of a chromosome = value of the fitness function f(x), where x is the decimal equivalent of the 5-bit binary. Crossover probability (rate) = 0.75. Mutation probability (rate) = . Size of population, n = 4.
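As an illustration, here is a minimal Python sketch of this setup; the Gaussian fitness and the 5-bit encoding follow the slides, while the helper names and the random seed are illustrative choices.

```python
import math
import random

def decode(chrom):
    """Convert a 5-bit chromosome (list of 0/1 genes) to its decimal value 0..31."""
    return int("".join(str(b) for b in chrom), 2)

def fitness(x, mean=16.0, sd=4.0):
    """Gaussian fitness with mean 16 and standard deviation 4."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2))

# Random initial population of n = 4 chromosomes
random.seed(1)
population = [[random.randint(0, 1) for _ in range(5)] for _ in range(4)]
for chrom in population:
    x = decode(chrom)
    print(chrom, x, round(fitness(x), 3))
```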
Method to select chromosomes for refinement: Calculate the fitness f(x_i) for each chromosome in the population. Assign each chromosome a discrete probability p_i = f(x_i) / Σ_j f(x_j). Use the p_i to design a roulette wheel: divide the number line between 0 and 1 into segments of length p_i in a specified order. Get r, a random number uniformly distributed between 0 and 1, and choose the chromosome of the line segment containing r.
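A minimal sketch of roulette-wheel selection under these definitions (an illustrative implementation, not WEKA's or any particular library's):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    probabilities = [f / total for f in fitnesses]   # p_i = f(x_i) / sum_j f(x_j)
    r = random.random()                              # uniform random number in [0, 1)
    cumulative = 0.0
    for chrom, p in zip(population, probabilities):
        cumulative += p                              # segments of length p_i on [0, 1]
        if r < cumulative:
            return chrom
    return population[-1]                            # guard against floating-point rounding
```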
1st generation: 5-bit binary numbers chosen randomly
00100 = 4, fitness = 0.011, p_1 = 0.044
01001 = 9, fitness = 0.216, p_2 = 0.861
11011 = 27, fitness = 0.023, p_3 = 0.091
11111 = 31, fitness = 0.001, p_4 = 0.004
Σ_i f(x_i) ≈ 0.25
Assume the pair with the two largest probabilities (01001 and 11011) is selected for replication.
Assume a crossover point (locus) is chosen between the first and second bits. Crossover is selected to induce change; mutation is rejected as the method to induce change. Crossing 01001 and 11011 at this locus gives the children 01011 (= 11) and 11001 (= 25).
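A sketch of the two change operators for bit-string chromosomes, reproducing the crossover of this example (locus between the first and second bit):

```python
import random

def single_point_crossover(parent1, parent2, locus):
    """Swap the tails of two bit-string parents at the chosen crossover point."""
    child1 = parent1[:locus] + parent2[locus:]
    child2 = parent2[:locus] + parent1[locus:]
    return child1, child2

def mutate(chrom, mutation_rate):
    """Flip each bit independently with probability mutation_rate."""
    return [1 - b if random.random() < mutation_rate else b for b in chrom]

# Crossover of 01001 and 11011 between the first and second bit (locus = 1)
c1, c2 = single_point_crossover([0, 1, 0, 0, 1], [1, 1, 0, 1, 1], locus=1)
print(c1, c2)   # [0, 1, 0, 1, 1] (= 11) and [1, 1, 0, 0, 1] (= 25)
```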
Evaluate the fitness of the new population:
00100 = 4, fitness = 0.011, p_1 = 0.020
01011 = 11, fitness = 0.458, p_2 = 0.834
11001 = 25, fitness = 0.080, p_3 = 0.145
11111 = 31, fitness = 0.001, p_4 = 0.002
Σ_i f(x_i) ≈ 0.55, about 2 times that of the 1st generation
Repeat until the fitness of the population is almost uniform; the values of all chromosomes should then be near 16.
Crowding: In the initial chromosome population of this example, one chromosome (01001) has 86% of the selection probability. Crowding can lead to an imbalance of fitness over diversity and limit the ability of the GA to explore new regions of the search space. Solution: penalize the choice of similar chromosomes for mating.
Sigma scaling of fitness f(x): f_scaled(x) = 1 + (f(x) − μ) / (2σ), where μ and σ are the mean and standard deviation of fitness in the population (f_scaled = 1 for all chromosomes when σ = 0). Sigma scaling allows variable selection pressure: in early generations, selection pressure should be low to enable wider coverage of the search space (large σ); in later generations, selection pressure should be higher to encourage convergence to the optimum solution (small σ).
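A minimal sketch of sigma scaling; the factor of 2 and the clipping of negative scaled values at zero follow one common formulation (Mitchell's) and are assumptions rather than details given on the slide.

```python
import statistics

def sigma_scaled(fitnesses):
    """Sigma scaling: f' = 1 + (f - mean) / (2 * sd); all values 1 when sd is 0."""
    mean = statistics.mean(fitnesses)
    sd = statistics.pstdev(fitnesses)
    if sd == 0:
        return [1.0] * len(fitnesses)
    return [max(0.0, 1.0 + (f - mean) / (2.0 * sd)) for f in fitnesses]
```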
Positional bias: Single-point crossover lets nearby loci stay together in children. Uniform (discrete) crossover, in which each gene of a child is drawn from either parent with equal probability, is one of several methods to avoid positional bias.
Genetic algorithm for real-valued variables: Real-valued variables can be converted to a binary representation as in the example of finding the maximum of a normal distribution, but this results in a loss of significance unless one uses a large number of bits. Arithmetic crossover works on the real values directly: given parents x = (x_1, ..., x_n) and y = (y_1, ..., y_n), choose the k-th gene at random and set x_k' = α·x_k + (1 − α)·y_k and y_k' = α·y_k + (1 − α)·x_k, with 0 < α < 1.
More methods for real-valued variables. Discrete crossover: with uniform probability, each gene of the child chromosome is chosen to be the gene of one or the other parent at the same locus; given parents x and y, the child's i-th gene is either x_i or y_i. Normally distributed mutation: choose a random number from a normal distribution with zero mean and a standard deviation comparable to the size of the genes (e.g. σ = 1 for genes scaled between −1 and +1), add it to a randomly chosen gene, and re-scale if needed.
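A sketch of these real-valued operators (arithmetic crossover from the previous slide, plus discrete crossover and normally distributed mutation); re-scaling is implemented here as simple clipping to the gene range, which is one reasonable reading of "re-scale if needed".

```python
import random

def arithmetic_crossover(x, y, alpha):
    """Blend one randomly chosen gene of the two parents, 0 < alpha < 1."""
    k = random.randrange(len(x))
    cx, cy = list(x), list(y)
    cx[k] = alpha * x[k] + (1.0 - alpha) * y[k]
    cy[k] = alpha * y[k] + (1.0 - alpha) * x[k]
    return cx, cy

def discrete_crossover(x, y):
    """Each child gene is copied from one or the other parent with equal probability."""
    return [xi if random.random() < 0.5 else yi for xi, yi in zip(x, y)]

def gaussian_mutation(x, sigma=1.0, lo=-1.0, hi=1.0):
    """Add N(0, sigma) noise to one randomly chosen gene, then clip to the gene range."""
    k = random.randrange(len(x))
    mutated = list(x)
    mutated[k] = min(hi, max(lo, mutated[k] + random.gauss(0.0, sigma)))
    return mutated
```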
Using a GA in training of an ANN: an ANN with 11 weights, 8 to the hidden layer (w_1A, w_1B, w_2A, w_2B, w_3A, w_3B, w_0A, w_0B) and 3 to the output (w_AZ, w_BZ, w_0Z). (Diagram: 3 inputs, hidden nodes A and B, output node Z, with bias weights w_0A, w_0B, w_0Z.)
Chromosome for weight optimization by GA: the 11 weights, scaled to values between −1 and +1. Use the crossover and mutation methods for real numbers to modify the chromosome. Fitness function: mean squared deviation between output and target.
Use feed-forward evaluation to determine the fitness of this new chromosome.
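A sketch of how a chromosome of 11 weights could be scored by feed-forward evaluation. The sigmoid activation, the ordering of the weights within the chromosome, and the use of negative mean squared deviation as the (to-be-maximized) fitness are assumptions for illustration; the slides specify only the 11 weights and the mean-squared-deviation criterion.

```python
import math

def feed_forward(w, x):
    """3 inputs -> hidden nodes A and B -> output Z, using the 11 weights.
    Assumed chromosome order: [w1A, w2A, w3A, w0A, w1B, w2B, w3B, w0B, wAZ, wBZ, w0Z]."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    a = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w[3])
    b = sigmoid(w[4] * x[0] + w[5] * x[1] + w[6] * x[2] + w[7])
    return sigmoid(w[8] * a + w[9] * b + w[10])

def chromosome_fitness(chromosome, data):
    """Fitness = negative mean squared deviation between output and target (higher is better)."""
    mse = sum((feed_forward(chromosome, x) - t) ** 2 for x, t in data) / len(data)
    return -mse
```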
Genetic algorithm for attribute selection: find the best subset of attributes for data mining. A GA is well suited to this task since, given sufficient diversity, it can explore many combinations of attributes.
WEKA's GA applied to attribute selection. Default values: population size = 20, crossover probability = 0.6, mutation probability = . Example: breast-cancer classification using the Wisconsin Breast Cancer Database (breast-cancer.arff): 683 instances, 9 numerical attributes, 2 target classes (benign = 2, malignant = 4).
Examples from the dataset. Tumor characteristics, each scored for severity (1-10): 1. clump-thickness, 2. uniform-cell-size, 3. uniform-cell-shape, 4. marg-adhesion, 5. single-cell-size, 6. bare-nuclei, 7. bland-chromatin, 8. normal-nucleoli, 9. mitoses. The severity scores are the attributes; the last number in a row is the class label:
5,1,1,1,2,1,3,1,1,2
5,4,4,5,7,10,3,2,1,2
3,1,1,1,2,2,3,1,1,2
6,8,8,1,3,4,3,7,1,2
4,1,1,3,2,1,3,1,1,2
8,10,10,8,7,10,9,7,1,4
1,1,1,1,2,10,3,1,1,2
2,1,2,1,2,1,3,1,1,2
2,1,1,1,2,1,1,1,5,2
4,2,1,1,2,1,2,1,1,2
Chromosomes have 9 binary genes; gene k = 1 means the k-th severity score is included in the subset. Fitness: accuracy of naïve Bayes classification using the selected attributes.
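To make the chromosome-to-fitness mapping concrete, here is a hypothetical Python analogue using scikit-learn's GaussianNB; WEKA's WrapperSubsetEval does the equivalent internally, and the 10-fold cross-validation shown here is an illustrative choice, not necessarily WEKA's evaluation scheme.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def subset_fitness(chromosome, X, y):
    """Chromosome = 9 binary genes; gene k = 1 keeps the k-th severity score.
    Fitness = naive Bayes classification accuracy on the selected columns."""
    selected = [k for k, gene in enumerate(chromosome) if gene == 1]
    if not selected:
        return 0.0                      # an empty attribute subset cannot classify
    scores = cross_val_score(GaussianNB(), X[:, selected], y, cv=10)
    return scores.mean()

# Example: score a chromosome that drops the 9th attribute ("mitoses"),
# where X is the (683, 9) array of severity scores and y holds the class labels (2 or 4):
# subset_fitness([1, 1, 1, 1, 1, 1, 1, 1, 0], X, y)
```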
Background on Bayesian classification
Bayes' Rule for binary classification: P(C|x) = p(x|C) P(C) / p(x), i.e. posterior = class likelihood × prior / normalization. Assign the client to the class with the higher posterior. With the posterior normalized, assign to class C when P(C|x) > 0.5; P(C|x) = 0.5 is a discriminant in attribute space. (Slides adapted from Lecture Notes for E. Alpaydın, Introduction to Machine Learning, 2nd ed., MIT Press, 2010.)
Bayes' Rule for binary classification (continued): The prior P(C) is information relevant to classification that is independent of the attributes. The class likelihood p(x|C) is the probability that a member of class C will have attribute vector x.
Example: Bayes' Rule for loan approval. Prior = risk tolerance of the bank (determined from the loan-approval history). Class likelihood = is x like other high-risk applications?
Normalized Bayes' rule for binary classification: p(x) = p(x|C) P(C) + p(x|C̄) P(C̄), so that P(C|x) + P(C̄|x) = 1. Normalization is generally not necessary for classification, since the class with the larger numerator wins either way.
Bayes' Rule for K > 2 classes: P(C_i|x) = p(x|C_i) P(C_i) / Σ_{k=1}^{K} p(x|C_k) P(C_k); choose the class C_i with the maximum posterior.
Estimate the priors and class likelihoods from the data set. With class labels r_i^t (r_i^t = 1 if example x^t belongs to class C_i, 0 otherwise), the estimators are:
P̂(C_i) = Σ_t r_i^t / N (the fraction of examples in a class estimates its prior)
m_i = Σ_t r_i^t x^t / Σ_t r_i^t and S_i = Σ_t r_i^t (x^t − m_i)(x^t − m_i)^T / Σ_t r_i^t
Assume the members of each class are Gaussian distributed; the mean and covariance parameterize the class likelihood.
Naïve Bayes classification: assume the attributes x_j are independent, so the off-diagonal elements of Σ are 0 and p(x|C_i) is the product of the probabilities for each component of x: p(x|C_i) = Π_j p(x_j|C_i). Each class has its own set of means and variances for the components of the attributes in that class.
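A compact sketch of a Gaussian naïve Bayes classifier built directly from the estimators above; the log-space computation and the small variance floor are implementation conveniences, not details from the slides.

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Estimate the prior, per-attribute means, and per-attribute variances for each class."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (len(Xc) / len(X),          # prior P(C_i): fraction of examples in the class
                    Xc.mean(axis=0),           # per-attribute means m_i
                    Xc.var(axis=0) + 1e-9)     # per-attribute variances (diagonal of S_i)
    return model

def predict(model, x):
    """Choose the class maximizing log P(C_i) + sum_j log p(x_j | C_i)."""
    def log_posterior(prior, mean, var):
        return np.log(prior) - 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return max(model, key=lambda c: log_posterior(*model[c]))
```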
Attribute selection using WEKA's genetic algorithm method: Open the file breast-cancer.arff. Click on attribute 10 (class) to see the number of examples in each class.
(Histogram: class distribution of the dataset, benign vs. malignant.)
Click on any other attribute to see the distribution of its severity scores.
(Histogram: distribution of clump-thickness scores, 1-10, over examples in the dataset, with increasing severity along the axis.) Severity of clump thickness is positively correlated with malignancy.
Baseline performance measures use the naïve Bayes classifier on the full set of 9 attributes.
Under the Select attributes tab of the Weka Explorer, press the Choose button under Attribute Evaluator and, under Attribute Selection, find WrapperSubsetEval.
Click on WrapperSubsetEval to bring up the dialog box, which shows ZeroR as the default classifier. Find the NaiveBayes classifier and click OK. The evaluator has now been selected.
Under the Select attributes tab of the Weka Explorer, press the Choose button under Search Method and find GeneticSearch (see the package manager in Weka 3.7). Start the search with the default settings, including "Use full training set".
Fitness function: a linear scaling of the error rate of naïve Bayes classification such that the highest error rate corresponds to a fitness of zero. How is a subset related to a chromosome? Each chromosome is a 9-gene bit string, and gene k = 1 means the k-th attribute is included in the subset (see the sketch below).
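One simple scaling consistent with that description (a sketch only; the exact constants WEKA uses are not given on the slide):

```python
def scaled_fitness(error_rates):
    """Linear scaling of per-subset error rates so the highest error rate maps to fitness 0."""
    worst = max(error_rates)
    return [worst - e for e in error_rates]
```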
Results with Weka 3.6: any subset that includes the 9th attribute (mitoses) has low fitness.
Increasing the number of generations to 100 does not change the attributes selected. The 9th attribute, "mitoses", has been deselected. Return to the Preprocess tab, remove "mitoses", and re-classify.
Performance with the reduced attribute set is slightly improved: the number of misclassified malignant cases decreased by 2.
Weka has other attribute-selection techniques. "Information gain" (InfoGainAttributeEval) is an alternative to WrapperSubsetEval with GA search; Ranker is the only Search Method that can be used with InfoGainAttributeEval.