Digression: Symbolic Regression


1 Digression: Symbolic Regression
Suppose you are a criminologist, and you have some data about recidivism.

Years in Prison   Holds Ph.D.   IQ    Injects Heroin in Eyeballs   Recidivist
10                0             87    1                            1
4                 1             86    0                            0
22                1             186   1                            1
6                 0             108   0                            1
8                 0             143   0                            0
:                 :             :     :                            :

2 Criminology 101
You want a formula that predicts if someone will go back to jail after being released.
The formula will be based on the data collected, so the “independent variables” are
– x_1 = number of years in jail
– x_2 = holds Ph.D.
– x_3 = IQ
– etc.
This is usually done with “regression”. Here is a simpler example, with one independent variable.

3 Symbolic Regression
A simple data set with one independent variable, called x. What’s the relationship between x and y?

x    y
1    2.1
2    3.3
4    3.1
5    1.8
7    3.2
:    :

4 Symbolic Regression
You might try “linear regression”:
y = mx + b
[Plot of y against x]
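For comparison with what follows, here is a minimal sketch of fitting y = mx + b by ordinary least squares, using the standard closed-form solution. The class and method names (`LinearFit`, `fit`) are hypothetical and are not taken from the slides.

```java
// Hypothetical helper, not from the slides: closed-form least-squares line fit.
final class LinearFit {
    // Returns {m, b} minimizing the sum of (y - (m*x + b))^2 over the data.
    static double[] fit(double[] xs, double[] ys) {
        int n = xs.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += xs[i];
            sy += ys[i];
            sxx += xs[i] * xs[i];
            sxy += xs[i] * ys[i];
        }
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx); // slope
        double b = (sy - m * sx) / n;                         // intercept
        return new double[] { m, b };
    }
}
```

On the five (x, y) pairs from slide 3 this gives m = 1/19 (about 0.053) and b = 2.5. Note that the form of the equation is fixed in advance here; only the coefficients are fitted.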

5 Symbolic Regression
You might try “quadratic regression”:
y = ax^2 + bx + c
[Plot of y against x]

6 Symbolic Regression
You might try “exponential regression” (as written, with a fitted exponent b on x, this is strictly a power-law form):
y = ax^b + c
[Plot of y against x]

7 Symbolic Regression
How would you choose?
Maybe there is some underlying “mechanism” that produced the data.
But you may not know…
“Symbolic regression” finds the form of the equation, and the coefficients, simultaneously.

8 How To Do Symbolic Regression?
One way: genetic programming.
“The evolution of computer programs through natural selection.”
The brainchild of John Koza, extending work by John Holland.
A very bizarre idea that actually works!
We will do this.

9 Regression via Genetic Programming
We know how to produce “algebraic expression trees.”
We can even form them randomly.
Koza says “Make a generation of random trees, evaluate their fitnesses, then let the more fit have sex to produce children.”
Maybe the children will be more fit?

10 Expression Trees Again
A one-variable tree is a regression equation:
[Tree diagram: root +, with subtrees (x + 0.5) - x and 2 * x]
y = (((x + 0.5) - x) + (2 * x))
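One possible way to represent such a tree in Java is sketched below. All type names (`Expr`, `Var`, `Const`, `BinOp`, `SlideTree`) are illustrative assumptions; the slides do not show the course's actual classes.

```java
// Hypothetical types for illustration; not the course's actual classes.
interface Expr {
    double eval(double x); // value of this subtree at the given x
    String show();         // fully parenthesized infix form
}

final class Var implements Expr {
    public double eval(double x) { return x; }
    public String show() { return "x"; }
}

final class Const implements Expr {
    private final double value;
    Const(double value) { this.value = value; }
    public double eval(double x) { return value; }
    public String show() { return String.valueOf(value); }
}

final class BinOp implements Expr {
    private final char op;
    private final Expr left, right;
    BinOp(char op, Expr left, Expr right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }
    public double eval(double x) {
        double a = left.eval(x), b = right.eval(x);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default: throw new IllegalStateException("unknown operator " + op);
        }
    }
    public String show() {
        return "(" + left.show() + " " + op + " " + right.show() + ")";
    }
}

final class SlideTree {
    // Builds the tree for y = (((x + 0.5) - x) + (2 * x)).
    static Expr build() {
        Expr x = new Var();
        return new BinOp('+',
                new BinOp('-', new BinOp('+', x, new Const(0.5)), x),
                new BinOp('*', new Const(2.0), x));
    }
}
```

Evaluating is a recursive walk: `SlideTree.build().eval(1.0)` computes (1.5 - 1) + 2, which is 2.5.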

11 Evaluating Expression Trees
y^p = (((x + 0.5) - x) + (2 * x))

x    y^o    y^p     |y^o - y^p|^2
1    2.1    2.5       0.16
2    3.3    4.5       1.44
4    3.1    8.5      29.16
5    1.8    10.5     75.69
7    3.2    14.5    127.69
                    234.14 = “fitness”

Superscripts: “o” for “observed”, “p” for “predicted”
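The fitness column above is a sum of squared differences between observed and predicted values. A minimal sketch of that computation follows; the names `Fitness` and `sumSquaredError` are hypothetical, and the model is passed as a plain function so the block stands on its own.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper, not from the slides.
final class Fitness {
    // Sum over the data of (observed y - predicted y)^2.
    static double sumSquaredError(double[] xs, double[] ys, DoubleUnaryOperator model) {
        double total = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double diff = ys[i] - model.applyAsDouble(xs[i]);
            total += diff * diff;
        }
        return total;
    }
}
```

Called with the slide's five data points and the model `x -> ((x + 0.5) - x) + (2 * x)`, it returns 234.14 up to floating-point round-off. Because this number is an error, a smaller value means a better tree.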

12 A Generation of Random Trees
[Figure: Tree 1, Tree 2, Tree 3, Tree 4, …]

Tree   Fitness
1      335
2      1530
3      950
4      1462
:

(most of these are really rotten!)

13 Choosing Parents
[Figure: Generation 1, showing Tree 1, Tree 2, Tree 3, Tree 4, …, with two trees marked as the chosen parents]

Tree   Fitness
1      335
2      1530
3      950
4      1462
:

Choose these two, randomly, “proportional to their fitness.”
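A common way to choose “proportional to fitness” is roulette-wheel selection, sketched below. One assumption needs flagging: the fitness values on these slides are errors (lower is better), and the slides do not say how they are turned into selection weights. This sketch assumes weight = 1 / (1 + error), so that smaller errors get larger weights. The names `Roulette`, `weight`, and `pick` are hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not from the slides.
final class Roulette {
    // Assumption: converts an error (lower is better) into a weight (higher is better).
    static double weight(double error) {
        return 1.0 / (1.0 + error);
    }

    // Returns index i with probability weight(errors[i]) / (sum of all weights).
    static int pick(double[] errors, Random rng) {
        double total = 0.0;
        for (double e : errors) {
            total += weight(e);
        }
        double r = rng.nextDouble() * total; // a point on the wheel
        for (int i = 0; i < errors.length; i++) {
            r -= weight(errors[i]);
            if (r < 0) {
                return i;
            }
        }
        return errors.length - 1; // guards against floating-point round-off
    }
}
```

With the four errors in the table (335, 1530, 950, 1462), Tree 1 would be picked a little over half the time under this weighting, but every tree keeps some chance of being a parent.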

14 “Sexual Reproduction”
Choose “crossover points” at random.
Then swap the subtrees to make two new child trees.
[Figure: two parent trees in Generation 1 and the two resulting child trees in Generation 2]
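Subtree crossover can be sketched as follows, assuming immutable trees whose nodes are numbered in preorder (node 0 is the root). The names `ExprNode`, `Crossover`, `cross`, and `crossAt` are hypothetical, and picking every node with equal probability is an assumption; the slides only say the points are chosen at random.

```java
import java.util.Random;

// Hypothetical tree type for illustration; leaves have null children.
final class ExprNode {
    final String label;         // "+", "-", "*", "x", or a numeric constant
    final ExprNode left, right;
    ExprNode(String label, ExprNode left, ExprNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }
    static ExprNode leaf(String label) { return new ExprNode(label, null, null); }

    int size() { return left == null ? 1 : 1 + left.size() + right.size(); }

    // Subtree rooted at preorder position i (0 is this node).
    ExprNode subtree(int i) {
        if (i == 0) return this;
        int ls = left.size();
        return i <= ls ? left.subtree(i - 1) : right.subtree(i - 1 - ls);
    }

    // Copy of this tree with the subtree at preorder position i replaced.
    ExprNode replace(int i, ExprNode repl) {
        if (i == 0) return repl;
        int ls = left.size();
        return i <= ls
                ? new ExprNode(label, left.replace(i - 1, repl), right)
                : new ExprNode(label, left, right.replace(i - 1 - ls, repl));
    }

    String show() {
        return left == null ? label : "(" + left.show() + " " + label + " " + right.show() + ")";
    }
}

final class Crossover {
    // Picks one crossover point in each parent uniformly at random, then swaps.
    static ExprNode[] cross(ExprNode mom, ExprNode dad, Random rng) {
        return crossAt(mom, rng.nextInt(mom.size()), dad, rng.nextInt(dad.size()));
    }

    // Swaps mom's subtree at position i with dad's subtree at position j.
    static ExprNode[] crossAt(ExprNode mom, int i, ExprNode dad, int j) {
        return new ExprNode[] {
            mom.replace(i, dad.subtree(j)),
            dad.replace(j, mom.subtree(i))
        };
    }
}
```

For example, crossing (x + 0.5) at position 1 (the leaf x) with (2 * x) at position 0 (the whole tree) yields the children ((2 * x) + 0.5) and x. The parents are left unchanged, so the same tree can be a parent more than once.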

15 The Steps
1. Create Generation 1 by randomly generating 500 trees.
2. Find the fitness of each tree.
3. Choose pairs of parent trees, proportional to their fitness.
4. Crossover to make two child trees, adding them to Generation 2.
5. Continue until there are 500 child trees in Generation 2.
6. Repeat for 50 generations, keeping the best (most fit) tree over all generations.
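The steps above can be sketched end to end as below. This is a compact illustration under several assumptions that are not in the slides: the operators are limited to +, - and *, constants are integers from -5 to 5, selection weight is 1 / (1 + error), and children larger than `maxSize` nodes are discarded in favor of a parent so trees cannot grow without bound. All names (`GpSketch`, `Tree`, `evolve`, and so on) are hypothetical.

```java
import java.util.Random;

final class GpSketch {
    // Immutable expression tree; leaves have null children.
    static final class Tree {
        final char op;          // '+', '-', '*', 'x' (variable) or 'c' (constant)
        final double value;     // used only when op == 'c'
        final Tree left, right;
        Tree(char op, double value, Tree left, Tree right) {
            this.op = op; this.value = value; this.left = left; this.right = right;
        }
        double eval(double x) {
            switch (op) {
                case 'x': return x;
                case 'c': return value;
                case '+': return left.eval(x) + right.eval(x);
                case '-': return left.eval(x) - right.eval(x);
                default:  return left.eval(x) * right.eval(x);
            }
        }
        int size() { return left == null ? 1 : 1 + left.size() + right.size(); }
        Tree subtree(int i) { // subtree at preorder position i
            if (i == 0) return this;
            int ls = left.size();
            return i <= ls ? left.subtree(i - 1) : right.subtree(i - 1 - ls);
        }
        Tree replace(int i, Tree r) { // copy with position i replaced by r
            if (i == 0) return r;
            int ls = left.size();
            return i <= ls ? new Tree(op, value, left.replace(i - 1, r), right)
                           : new Tree(op, value, left, right.replace(i - 1 - ls, r));
        }
    }

    // Step 1 helper: a random tree of at most the given depth.
    static Tree randomTree(Random rng, int depth) {
        if (depth == 0 || rng.nextInt(3) == 0) {
            return rng.nextBoolean() ? new Tree('x', 0, null, null)
                                     : new Tree('c', rng.nextInt(11) - 5, null, null);
        }
        char op = "+-*".charAt(rng.nextInt(3));
        return new Tree(op, 0, randomTree(rng, depth - 1), randomTree(rng, depth - 1));
    }

    // Step 2: sum of squared errors (lower is better).
    static double error(Tree t, double[] xs, double[] ys) {
        double sum = 0;
        for (int i = 0; i < xs.length; i++) {
            double d = ys[i] - t.eval(xs[i]);
            sum += d * d;
        }
        return Double.isFinite(sum) ? sum : Double.MAX_VALUE;
    }

    // Step 3 helper: roulette-wheel pick over precomputed weights.
    static int pick(double[] w, double total, Random rng) {
        double r = rng.nextDouble() * total;
        for (int i = 0; i < w.length; i++) {
            r -= w[i];
            if (r < 0) return i;
        }
        return w.length - 1;
    }

    static Tree evolve(double[] xs, double[] ys, int popSize, int generations,
                       int maxSize, Random rng) {
        Tree[] pop = new Tree[popSize];
        for (int i = 0; i < popSize; i++) pop[i] = randomTree(rng, 4);
        Tree best = pop[0];
        double bestErr = error(best, xs, ys);
        for (int g = 0; g < generations; g++) {
            double[] w = new double[popSize];
            double total = 0;
            for (int i = 0; i < popSize; i++) {
                double e = error(pop[i], xs, ys);
                if (e < bestErr) { bestErr = e; best = pop[i]; }
                w[i] = 1.0 / (1.0 + e); // assumption: error to selection weight
                total += w[i];
            }
            Tree[] next = new Tree[popSize];
            for (int n = 0; n < popSize; n += 2) { // steps 3 to 5
                Tree mom = pop[pick(w, total, rng)], dad = pop[pick(w, total, rng)];
                int i = rng.nextInt(mom.size()), j = rng.nextInt(dad.size());
                Tree c1 = mom.replace(i, dad.subtree(j));
                Tree c2 = dad.replace(j, mom.subtree(i));
                next[n] = c1.size() <= maxSize ? c1 : mom;
                if (n + 1 < popSize) next[n + 1] = c2.size() <= maxSize ? c2 : dad;
            }
            pop = next;
        }
        for (Tree t : pop) { // score the final generation too
            double e = error(t, xs, ys);
            if (e < bestErr) { bestErr = e; best = t; }
        }
        return best; // step 6: best tree seen over all generations
    }
}
```

A call matching the slide's numbers would be `GpSketch.evolve(xs, ys, 500, 50, 200, new Random())`, where the size cap of 200 is an arbitrary choice. Because the best tree is tracked across all generations, more generations can never make the returned tree worse than the best of Generation 1.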

16 How Could This Possibly Work?
No one seems to be able to say…
John Holland proved something called the “schema theorem,” but it really doesn’t explain much.
It’s a highly “parallel” process that recombines “good” building blocks.
It really does work very well for a huge variety of hard problems!

17 Why This, in a Java Course?
Because we’re going to implement it!
Because writing code to implement this isn’t too hard.
Because it illustrates a large number of O-O and Java ideas.
Because it’s fun!
Here is what my implementation looks like:

18 [Image-only slide; no text in the transcript]

