
1 Machine Learning CS 165B Spring 2012

2 Course outline
–Introduction (Ch. 1)
–Concept learning (Ch. 2)
–Decision trees (Ch. 3)
–Ensemble learning
–Neural Networks (Ch. 4)
–Linear classifiers
–Support Vector Machines
–Bayesian Learning (Ch. 6)
–Genetic Algorithms (Ch. 9)
–Instance-based Learning (Ch. 8)
–Clustering
–Computational learning theory (Ch. 7)

3 Genetic Algorithms - History
–Pioneered by John Holland in the 1970s; became popular in the late 1980s
–Based on ideas from Darwinian evolution
–Can be used to solve a variety of problems that are not easy to solve using other techniques
–Particularly well suited for hard problems where little is known about the underlying search space
–Widely used in business, science, and engineering

4 Classes of Search Techniques
–Enumerative techniques: BFS, DFS, Dynamic Programming
–Guided random search techniques: Tabu Search, Hill Climbing, Simulated Annealing, Evolutionary Algorithms (Genetic Algorithms, Genetic Programming)

5 Background: Biological Evolution
Biological analogy for learning:
–Lamarck: species adapt over time and pass this adaptation to their offspring
–Darwin: consistent, heritable variation among individuals in a population; natural selection of the fittest
–Mendel: a mechanism for inheriting traits; genotype → phenotype mapping (the "code")
–Epigenetics: non-genetic inheritance

6 Evolution in the real world
–Each cell of a living thing contains chromosomes: strings of DNA
–Each chromosome contains a set of genes: blocks of DNA
–Each gene determines some aspect of the organism (like eye colour)
–A collection of genes is sometimes called a genotype
–A collection of aspects (like eye colour) is sometimes called a phenotype
–Reproduction involves recombination of genes from the parents, followed by small amounts of mutation (errors) in copying
–The fitness of an organism is how much it can reproduce before it dies
–Evolution is based on "survival of the fittest"

7 Start with a Dream… Suppose you have a problem You don’t know how to solve it What can you do? Can you use a computer to somehow find a solution for you? This would be nice! Can it be done?

8 A dumb solution A “blind generate and test” algorithm: Repeat Generate a random possible solution Test the solution and see how good it is Until solution is good enough

9 Can we use this dumb idea?
Sometimes, yes:
–if there are only a few possible solutions
–and you have enough time
–then such a method could be used
For most problems, no:
–many possible solutions
–no time to try them all
–so this method cannot be used

10 A "less-dumb" idea (GA)
Generate a set of random solutions
Repeat
  Test each solution in the set (rank them)
  Remove some bad solutions from the set
  Duplicate some good solutions; make small changes to some of them
Until best solution is good enough
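The loop above can be sketched in a few lines of Python. The helper names (`fitness`, `random_solution`, `mutate`) and the toy "max-ones" problem at the bottom are illustrative, not from the slides; they only make the sketch runnable:

```python
import random

def genetic_search(fitness, random_solution, mutate, pop_size=20,
                   keep=10, good_enough=1.0, max_iters=1000):
    """The 'less-dumb' loop: rank, drop bad solutions, vary copies of good ones."""
    population = [random_solution() for _ in range(pop_size)]
    for _ in range(max_iters):
        # Test each solution and rank from best to worst
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) >= good_enough:
            break
        # Remove the bad tail; duplicate good solutions with small changes
        survivors = population[:keep]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - keep)]
    return max(population, key=fitness)

# Toy problem: maximize the number of 1s in a 10-bit string ("max-ones")
def rand_bits():
    return [random.randint(0, 1) for _ in range(10)]

def ones(bits):
    return sum(bits) / 10.0

def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]
```

Because the survivors are carried over unchanged, the best solution found never gets worse from one generation to the next.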

11 GA(Fitness, Fitness_threshold, p, r, m)
–Fitness: assigns an evaluation score to each hypothesis
–Fitness_threshold: termination criterion
–p: number of hypotheses in a generation
–r: fraction of population to be replaced
–m: mutation rate
Initialize: P ← generate p random hypotheses
Evaluate: for each h in P, compute Fitness(h)
While max_h Fitness(h) < Fitness_threshold:
–Create a new generation P_S (5 steps: Select, Crossover, Mutate, Update, Evaluate)
Return the hypothesis in P that has the highest Fitness

12 Create a new generation P_S
–Select: probabilistically select (1 − r)·p members of P to be the non-replaced part of P_S; add r·p members to these "survivors" as follows
–Crossover: probabilistically select (r·p)/2 pairs of hypotheses from P; for each pair ⟨h1, h2⟩, produce two offspring by applying the Crossover operator; add all offspring to P_S
–Mutate: invert a randomly selected bit in m% of randomly chosen members of P_S
–(Apply other operators, e.g., Invert: "switch" portions of each hypothesis)
–Update: P ← P_S
–Evaluate: for each h in P, compute Fitness(h)
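A minimal Python sketch of one such generation step, assuming hypotheses are lists of bits, fitness-proportionate selection, single-point crossover, and one-bit mutation (the function names and these particular operator choices are illustrative):

```python
import random

def next_generation(P, fitness, r, m):
    """One GA generation: Select, Crossover, Mutate, then return P_S."""
    p = len(P)
    weights = [fitness(h) for h in P]  # fitness-proportionate selection

    def select():
        return random.choices(P, weights=weights)[0]

    # Select: (1 - r) * p members survive unchanged into P_S
    Ps = [select() for _ in range(round((1 - r) * p))]

    # Crossover: (r * p) / 2 pairs, each yielding two offspring
    for _ in range(round(r * p / 2)):
        h1, h2 = select(), select()
        cut = random.randrange(1, len(h1))  # single-point crossover
        Ps.append(h1[:cut] + h2[cut:])
        Ps.append(h2[:cut] + h1[cut:])

    # Mutate: flip one random bit in a fraction m of the members
    for i in random.sample(range(len(Ps)), round(m * len(Ps))):
        j = random.randrange(len(Ps[i]))
        h = list(Ps[i])
        h[j] = 1 - h[j]
        Ps[i] = h

    return Ps
```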

13 How to encode a hypothesis/solution? Obviously this depends on the problem! GAs often encode solutions as fixed-length "bit strings" (e.g., 101110, 111111, 000101). Each bit represents some aspect of the proposed solution to the problem. For GAs to work, we need to be able to "test" any string and get a "score" indicating how "good" that solution is.

14 Representing hypotheses Hypothesis representation specifics: –“1”s in all attribute value positions implies “don’t care” –Single “1”: only one attribute value possible –“0”s in all positions: no attribute values possible Can represent rule-sets using concatenation of rules Can construct symbolic strings (computer programs) instead of bit strings

15 Fitness landscapes

16 Representing hypotheses about concepts
Hypotheses: represent conjunctive rules as fixed-length bit strings
–Enumerate the attributes and their possible values
–One approach: for each value a of attribute A, use 1 if A = a is possible, 0 if not
–Multiple 1s among the values of A mean OR (disjunction over values)
–Bits of different attributes are combined by AND (conjunction)
–Same for the target (postconditions)
Example: IF Wind = Strong THEN PlayTennis = Yes
  Outlook (Sunny Overcast Rain)  Wind (Strong Weak)  PlayTennis (Yes No)
  111                            10                  10
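A small sketch of this encoding in Python, assuming the attribute orderings shown above (the helper names `encode` and `matches` are illustrative):

```python
# Attribute orderings assumed for illustration
ATTRS = {"Outlook": ["Sunny", "Overcast", "Rain"],
         "Wind": ["Strong", "Weak"],
         "PlayTennis": ["Yes", "No"]}

def encode(constraints):
    """Encode {attr: set of allowed values} as a bit list; all-1s = don't care."""
    bits = []
    for attr, values in ATTRS.items():
        allowed = constraints.get(attr, set(values))  # absent -> don't care
        bits += [1 if v in allowed else 0 for v in values]
    return bits

def matches(rule_bits, example):
    """True if every attribute value of the example has a 1 in the rule."""
    i = 0
    for attr, values in ATTRS.items():
        if attr == "PlayTennis":
            break  # postcondition: not tested against the inputs
        j = values.index(example[attr])
        if rule_bits[i + j] == 0:
            return False
        i += len(values)
    return True
```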

17 Typical operators
Initial strings:   11101001000   00001010101

Single-point crossover
  Mask:      11111000000
  Offspring: 11101010101   00001001000

Two-point crossover
  Mask:      00111110000
  Offspring: 11001011000   00101000101

Uniform crossover
  Mask:      10011010011
  Offspring: 10001000100   01101011001

Point mutation
  11101001000 → 11101011000
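All three crossover variants are the same mask operation: one offspring copies parent 1 wherever the mask is 1 and parent 2 elsewhere, and the other offspring does the reverse. A sketch in Python:

```python
def mask_crossover(p1, p2, mask):
    """Offspring 1 takes p1 where mask=1 and p2 where mask=0; offspring 2 the reverse."""
    o1 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    o2 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    return o1, o2

def bits(s):
    """Helper: "11101" -> [1, 1, 1, 0, 1]."""
    return [int(c) for c in s]
```

A single-point mask is a run of 1s followed by 0s, a two-point mask has one contiguous block, and a uniform mask is random per bit.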

18 Selecting the most fit hypotheses
Fitness-proportionate selection:
–Also called "roulette wheel selection"
A crowding problem is possible under proportionate selection:
–One genetic structure becomes dominant
–Low diversity in the population
Alternatives to simple fitness-proportionate selection:
–Tournament selection (for a more diverse population): pick h1, h2 at random with uniform probability; with probability p, select the more fit
–Rank selection: sort all hypotheses by fitness; the probability of selection is proportional to rank, not fitness
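Sketches of the three selection schemes in Python (the function names and the default tournament parameter `p=0.8` are illustrative):

```python
import random

def roulette(population, fitness):
    """Fitness-proportionate ('roulette wheel') selection."""
    return random.choices(population, weights=[fitness(h) for h in population])[0]

def tournament(population, fitness, p=0.8):
    """Pick two at random; with probability p return the fitter of the two."""
    h1, h2 = random.sample(population, 2)
    fitter, weaker = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return fitter if random.random() < p else weaker

def rank_select(population, fitness):
    """Selection probability proportional to rank, not raw fitness."""
    ranked = sorted(population, key=fitness)  # worst first -> rank 1
    return random.choices(ranked, weights=range(1, len(ranked) + 1))[0]
```

Rank selection keeps selection pressure bounded even when one individual's fitness dwarfs the rest, which is one way to ease the crowding problem.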

19 Example: GABIL
Representation: encode rule sets into bit strings
–IF a1 = T ∧ a2 = F THEN c = T; IF a2 = T THEN c = F
    a1 a2 c   a1 a2 c
    10 01 1   11 10 0
–Encode multiple rules: the length of a hypothesis grows with the number of rules
Fitness based on the predictive correctness of hypotheses:
–Fitness(h) = (correct(h))², where correct(h) is the fraction of training examples h classifies correctly
Genetic operators:
–Need to accommodate variable-length rule sets: two-point crossover
–Standard mutation

20 Two-point crossover on variable-length rule sets
Examples with various numbers of rules:
–a1 a2 c  a1 a2 c
  10 01 1  11 10 0
–a1 a2 c  a1 a2 c  a1 a2 c
  10 01 1  11 10 0  01 11 1
The result must be a well-formed (WF) bit-string hypothesis; e.g., 10 01 1 11 10 0 00 0 does not parse into a1 a2 c groups.
Idea: find corresponding positions at which to do crossover
–Choose the number of crossover points
–Choose the points in the first parent randomly
–Choose the points in the second parent so as to give WF rules

21 Crossover with variable-length bit strings
Two (initially equal-length) hypotheses:
    a1 a2 c  a1 a2 c
h1: 10 01 1  11 10 0
h2: 01 11 0  10 01 0
Choose crossover points for h1, e.g., after bits 1 and 8.
Now restrict the crossover points in h2 to those that produce bit strings with well-defined semantics, e.g., ⟨1,3⟩, ⟨1,8⟩, ⟨6,8⟩.
Example: if we choose ⟨1,3⟩, the result is
    a1 a2 c
h3: 11 10 0
    a1 a2 c  a1 a2 c  a1 a2 c
h4: 00 01 1  11 11 0  10 01 0

22 GABIL results
92% correctness in basic form. Can extend to many variants:
–Add new genetic operators, also applied probabilistically:
  AddAlternative: generalize the constraint on a_i by changing a 0 to 1
  DropCondition: generalize the constraint on a_i by changing every 0 to 1
–Performance improves to 95%
–Add two bits to determine whether to allow these operators:
    a1 a2 c  a1 a2 c  AA DC
    01 11 0  10 01 0  1  0
–The learning strategy itself also evolves!
Typical parameters: r = 0.6, m = 0.001, p = 100–1000.
Performance of GABIL is comparable to symbolic rule/tree learning methods.

23 Example: Traveling Salesman Problem (TSP)
The traveling salesman must visit every city in his territory exactly once and then return to the starting point. Given the cost of travel between all pairs of cities, how should he plan his itinerary to minimize the total cost of the tour?
TSP is NP-complete.

24 TSP solution by GA
–A vector v = (i1 i2 … in) represents a tour (v is a permutation of {1, 2, …, n})
–The fitness f of a solution is the inverse cost of the corresponding tour
–Initialization: use either some heuristics or a random sample of permutations of {1, 2, …, n}
–Selection: fitness-proportionate

25 TSP Crossover
Build offspring by choosing a sub-sequence of the tour from one parent while preserving the relative order of cities from the other parent (and feasibility).
Example: p1 = (1 2 3 4 5 6 7 8 9) and p2 = (4 5 2 1 8 7 6 9 3)
First, the segments between the cut points are copied into the offspring:
o1 = (x x x 4 5 6 7 x x) and o2 = (x x x 1 8 7 6 x x)

26 TSP Crossover (continued)
Next, starting from the second cut point of one parent, the cities from the other parent are copied in the same order.
The sequence of cities in the second parent, starting after the second cut point, is 9 – 3 – 4 – 5 – 2 – 1 – 8 – 7 – 6.
After removing the cities already copied into the first offspring we get 9 – 3 – 2 – 1 – 8.
This sequence is placed in the first offspring: o1 = (2 1 8 4 5 6 7 9 3); similarly for the second: o2 = (3 4 5 1 8 7 6 9 2).
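This order-preserving crossover can be sketched in Python. Here `cut1` and `cut2` are the two cut positions (0-indexed, segment `[cut1, cut2)`), matching the example above with cuts 3 and 7:

```python
def order_crossover(p1, p2, cut1, cut2):
    """Copy p1[cut1:cut2] into the child, then fill the remaining positions
    with the missing cities in the order they appear in p2, starting just
    after the second cut point and wrapping around."""
    n = len(p1)
    child = [None] * n
    child[cut1:cut2] = p1[cut1:cut2]
    kept = set(p1[cut1:cut2])
    # Cities of p2 in order, starting just after the second cut point
    order = [p2[(cut2 + i) % n] for i in range(n)]
    fill = [c for c in order if c not in kept]
    # Place them after the second cut point, wrapping around
    for i in range(n - (cut2 - cut1)):
        child[(cut2 + i) % n] = fill[i]
    return child
```

Because the child contains each city exactly once, the result is always a legal tour.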

27 TSP Inversion
The sub-string between two randomly selected points in the path is reversed.
Example: (1 2 3 4 5 6 7 8 9) is changed into (1 2 7 6 5 4 3 8 9).
Such simple inversion guarantees that the resulting offspring is a legal tour.
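A sketch of the inversion operator in Python, with the two positions inclusive:

```python
def invert(tour, i, j):
    """Reverse the sub-tour between positions i and j (inclusive)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
```

Since inversion only reorders cities, the offspring is always a permutation, i.e. a legal tour.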

28 Hypothesis-space search by GA
Local minima are not much of a problem:
–Big jumps in the search space
Crowding is a problem:
–Various strategies to overcome over-representation of successful individuals:
  Tournament/rank selection
  Fitness sharing: the fitness of an individual is reduced by the presence of similar individuals

29 Population evolution and schemas
How can we characterize the evolution of the population in a GA?
Schema: a string containing 0, 1, and * (don't care)
–E.g., 0*10
–A schema has many instances: 0*10 represents 0010 and 0110
–An individual represents many schemas: 0010 is an instance of 16 schemas
Characterize the population by
–m(s, t): number of instances of schema s in the population at time t
The schema theorem characterizes E[m(s, t+1)] in terms of
–m(s, t)
–the operators of the genetic algorithm: selection, recombination, mutation
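Schema matching and the count m(s, t) are straightforward to state in Python (schemas and individuals written as strings, with `*` for don't care):

```python
def matches_schema(schema, individual):
    """'*' is don't-care; every other position must match exactly."""
    return all(s == "*" or s == b for s, b in zip(schema, individual))

def m_schema(schema, population):
    """m(s, t): number of instances of schema s in the population."""
    return sum(matches_schema(schema, h) for h in population)
```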

30 Characterizing population change
m(s, t): number of instances of schema s in the population at time t
Find the expected value of m(s, t+1), E[m(s, t+1)], in terms of
–m(s, t)
–the breeding parameters
The schema theorem provides a lower bound on E[m(s, t+1)]:
–First consider the case of selection only
–Then add the effects of crossover and mutation

31 Selection only
f(h): fitness of hypothesis (bit string) h
f̄(t): average fitness of the population of n individuals at time t
m(s, t): number of instances of schema s at time t
û(s, t): average fitness of the instances of s at time t
n: size of the population
p_t: set of bit strings at time t (i.e., the population)
Probability of selecting h in one selection step:
  Pr(h) = f(h) / Σ_{i=1..n} f(h_i) = f(h) / (n · f̄(t))
Probability of selecting an instance of s in one selection step:
  Pr(h ∈ s) = Σ_{h ∈ s ∩ p_t} f(h) / (n · f̄(t)) = (û(s, t) / (n · f̄(t))) · m(s, t)

32 Schema Theorem
After n selections (an entire new generation):
  E[m(s, t+1)] = (û(s, t) / f̄(t)) · m(s, t)
BUT crossover and mutation may reduce the number of instances of s:
  E[m(s, t+1)] ≥ (û(s, t) / f̄(t)) · m(s, t) · (1 − p_c · d(s)/(l − 1)) · (1 − p_m)^o(s)
where
  p_c: probability of the single-point crossover operator
  p_m: probability of the mutation operator
  l: length of the bit strings
  o(s): number of defined (non-"*") bits in s
  d(s): distance between the leftmost and rightmost defined bits in s
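The bound is easy to evaluate numerically; a small sketch whose argument names mirror the symbols above (the sample numbers in the usage below are made up for illustration):

```python
def schema_lower_bound(m_st, u_st, f_bar, p_c, p_m, l, o_s, d_s):
    """Schema-theorem lower bound on E[m(s, t+1)]: selection gain times the
    survival probabilities under single-point crossover and mutation."""
    return (u_st / f_bar) * m_st \
        * (1 - p_c * d_s / (l - 1)) \
        * (1 - p_m) ** o_s
```

With p_c = p_m = 0 the bound reduces to the selection-only expectation; turning crossover and mutation on can only shrink it.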

33 Interpretation of the Schema theorem
Interpret E[m(s, t+1)] as
–proportional to the average fitness of the schema, û(s, t)
–inversely proportional to the fitness of the average individual, f̄(t)
More fit schemas tend to grow in influence under selection, crossover, and mutation, especially if they
–have a small number of defined bits (small o(s))
–have closely bunched defined bits (small d(s))
GAs keep good chunks together:
–GAs explore the search space via short, low-order schemas, which are subsequently used for information exchange during crossover
–Building-block hypothesis: a genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemas, called building blocks

34 Genetic Programming
Ideas analogous to the basic GA, but the individuals are programs.
Fitness: obtained by executing the program on a set of training data.
The population size is maintained across generations.
Genetic programming: the evolution of programs.
Example: programs represented as (parse) trees, e.g., sin(x) + √(x² + y).

35 Crossover
(Figure: crossover exchanges randomly chosen subtrees between two parent parse trees, such as the trees for sin(x) + √(x² + y) and a second expression; the offspring are the parents with those subtrees swapped.)

36 Example: Block Problem
Goal: spell the word UNIVERSAL by stacking labeled blocks.
Need an appropriate representation:
–a key issue that determines the efficiency of learning
–need representations of the terminal arguments: a "natural" representation, if possible
–and primitive functions

37 Example: Block Problem
Terminals:
–CS (current stack): name of the top block on the stack, or F (False) if there is no current stack
–TB (top correct block): name of the topmost correct block on the stack (it and the blocks below it are in correct order)
–NN (next necessary): name of the next block needed above TB in the stack, or F if no more blocks are needed

38 Primitive Functions
–(MS x) (move to stack): if block x is on the table, move x to the top of the stack and return T; otherwise do nothing and return F
–(MT x) (move to table): if block x is in the stack, move the block at the top of the stack to the table and return T; otherwise return F
–(EQ x y) (equal): return T if x equals y, F otherwise
–(NOT x): return T if x = F, else return F
–(DU x y) (do until): execute expression x repeatedly until expression y returns T

39 Learned Program
Trained on 166 test problems.
Using a population of 300 programs, found a solution after 10 generations:
(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))
–(EQ x y): equal
–(DU x y): do until
–(MT x): move to table
–CS: current stack
–(NOT x): not x
–(MS x): move to stack
–NN: next necessary
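As a sanity check, the learned program can be run against a tiny simulator of the block world. Everything below (the `World` class, the thunk-based `DU`, the goal spelling) is an illustrative reconstruction, not Koza's actual setup; note that this `DU` tests its termination expression before each execution of the body:

```python
GOAL = list("universal")  # target spelling, one block per letter

class World:
    def __init__(self, stack, table):
        self.stack, self.table = list(stack), set(table)

    def CS(self):  # current stack: top block, or False if stack is empty
        return self.stack[-1] if self.stack else False

    def NN(self):  # next necessary block, or False if none is needed
        n = len(self.stack)
        if self.stack == GOAL[:n] and n < len(GOAL):
            return GOAL[n]
        return False

    def MS(self, x):  # move x from the table to the top of the stack
        if x in self.table:
            self.table.remove(x)
            self.stack.append(x)
            return True
        return False

    def MT(self, x):  # if x is in the stack, move top of stack to the table
        if x in self.stack:
            self.table.add(self.stack.pop())
            return True
        return False

def DU(do, until, limit=100):  # do-until, with a safety bound on iterations
    for _ in range(limit):
        if until():
            break
        do()
    return True

def EQ(x, y):
    return x == y

def learned_program(w):
    """(EQ (DU (MT CS) (NOT CS)) (DU (MS NN) (NOT NN)))"""
    DU(lambda: w.MT(w.CS()), lambda: not w.CS())   # unstack everything
    DU(lambda: w.MS(w.NN()), lambda: not w.NN())   # rebuild the goal spelling
    return EQ(w.stack, GOAL)
```

Read aloud, the evolved program is brutally simple: tear the whole stack down, then stack the necessary blocks in order.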

40 Other examples of genetic programs
–Design of electronic filter circuits: discovery of circuits competitive with the best human designs
–Image classification

41 Summary of evolutionary programming
–Conducts a randomized, hill-climbing search through the space of hypotheses
–Analogy to biological evolution
–Learning as an optimization problem (optimize fitness)
–Can be parallelized easily

