Discrete and Genetic Algorithms in Bioinformatics. Wen-Lian Hsu (許聞廉), Institute of Information Science, Academia Sinica (中央研究院資訊所).

1 Discrete and Genetic Algorithms in Bioinformatics
Wen-Lian Hsu (許聞廉), Institute of Information Science, Academia Sinica

2 Discrete Algorithms
‧ Discrete mathematics lies at the foundation of modern computer science.
‧ Most algorithms taught in computer science are discrete.
‧ Discrete algorithms emphasize worst-case analysis.
‧ Many sequence manipulation algorithms in bioinformatics are discrete.

3 Natural Problems (1)
‧ Natural problems: problems arising from nature, which are guaranteed to have feasible solutions if the data is collected accurately.
– But because of noise in the sampled data, such solutions are hard to come by.
‧ To tackle these problems, one should focus on real data rather than worst-case analysis.

4 Natural Problems (2)
‧ Techniques that take advantage of the natural constraints of these problems do not necessarily work for general data (especially the worst case), but can perform very well on such well-structured problems.
‧ Examples:
– many computational problems arising from biology, speech recognition, and image processing

5 Constraints with Errors
‧ In ordinary constrained optimization problems, one naturally assumes that the constraints are correct.
‧ What if these constraints are inconsistent?
– Then no feasible solution satisfies all of them.
‧ What if every constraint is only partially correct?

6 Explicit Solution Candidates
‧ In ordinary optimization problems, most algorithms do not generate plausible solutions in the interim.
‧ However, there are advantages to having some solution candidates when there are errors in the constraints.

7 Plausible Solution Candidates
‧ For some optimization problems, machine learning approaches generate plausible solutions in the interim.
– Solutions improve as the machine learning approach iteratively refines solution patterns.
– A better solution emerges from the cooperation of plausible solution candidates.

8 Fitness Landscape
‧ Each solution candidate has a fitness score for the optimization problem.
‧ A fitness landscape shows the distribution of fitness over the whole search space.
‧ Solution candidates are ranked by their fitness.

9 Genetic Algorithm
‧ A search technique for finding exact or approximate solutions to optimization problems.
‧ It is based on the principle of evolution
– Survival of the fittest in natural selection
‧ Two basic processes from evolution
– Inheritance (passing of features from one generation to the next)
– Competition (survival of the fittest)

10 Basic Description of GA
‧ The algorithm starts with a set of solutions (represented by chromosomes) called the population.
‧ Solutions from one population are taken and used to form a new population.
‧ The hope is that the new population (offspring) will be better than the old one (parents); this is not guaranteed.
‧ Solutions are selected to form new solutions according to their fitness: the more suitable they are, the more chances they have to reproduce.
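One common way to give fitter solutions "more chances to reproduce" is fitness-proportional (roulette wheel) selection. The slide does not say which selection scheme is used, so the sketch below is an assumption; `select_parents` is a hypothetical helper name, and fitness values are assumed to be non-negative.

```python
import random


def select_parents(population, fitness, k, rng=random):
    """Pick k parents with probability proportional to fitness (roulette wheel)."""
    weights = [fitness(ind) for ind in population]
    if sum(weights) == 0:
        # No individual has positive fitness: fall back to a uniform choice.
        return [rng.choice(population) for _ in range(k)]
    # random.choices samples with replacement, weighted by fitness.
    return rng.choices(population, weights=weights, k=k)
```

An individual with zero fitness is never chosen as long as some other individual has positive fitness.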

11 GA in Pseudo-code
Choose initial population
Evaluate the fitness of each individual in the population
Repeat
– Select best-ranking individuals to reproduce
– Breed a new generation through crossover and mutation (genetic operations), giving birth to offspring
– Evaluate the individual fitness of the offspring
– Replace the worst-ranked part of the population with offspring
Until termination
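The loop above can be sketched in Python on a toy problem. Everything specific here is an assumption for illustration: bit-string chromosomes, one-point crossover, a single bit-flip mutation, a fixed number of generations as the termination rule, and the name `run_ga`.

```python
import random


def run_ga(fitness, length, pop_size=20, generations=50, n_children=10, seed=0):
    """Minimal generational GA over bit strings; returns the best individual found."""
    rng = random.Random(seed)
    # Choose an initial population of random bit strings.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and rank: best individuals first.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # best-ranking half may reproduce
        children = []
        for _ in range(n_children):
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            i = rng.randrange(length)           # mutation: flip one random bit
            child[i] = 1 - child[i]
            children.append(child)
        # Replace the worst-ranked part of the population with the offspring.
        population = population[: pop_size - n_children] + children
    return max(population, key=fitness)
```

For example, `run_ga(sum, 12)` searches for the all-ones string of length 12. Because the best-ranked individuals are never replaced, the best fitness in the population cannot decrease from one generation to the next.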

12 Building Block Hypothesis
‧ Building block: a short, highly fit schema that benefits the solution.
‧ The hypothesis: the global optimal solution is made up of building blocks.
‧ Identify, recombine, and resample small building blocks to form a new solution with potentially higher fitness.
‧ Working with these building blocks reduces the complexity of the problem.

13 The Fitness Function
‧ Plays the role of a judge
‧ Gives a higher score if the individual owns more building blocks
‧ Is refined based on the evolution results
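To make "owns more building blocks" concrete, here is a small sketch under the usual textbook convention that a schema is a string over the chromosome alphabet plus a wildcard `*`. The slides do not define their schema encoding, so the string representation and the function names are assumptions.

```python
def matches(schema, chromosome):
    """True if the chromosome agrees with the schema at every non-wildcard position."""
    return len(schema) == len(chromosome) and all(
        s == "*" or s == c for s, c in zip(schema, chromosome)
    )


def count_building_blocks(chromosome, schemata):
    """A toy fitness: how many of the given schemata the chromosome contains."""
    return sum(matches(s, chromosome) for s in schemata)
```

With schemata `["1**0", "*11*"]`, the chromosome `"1110"` scores 2 and `"1010"` scores 1.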

14 Physical Mapping

15 Cutting and Reassembling a DNA Sequence
‧ Cut a DNA sequence into small pieces in different ways and reassemble them.
‧ The “small” pieces (called clones) are still too large to be sequenced completely.
‧ Biologically, “probes” are used to mark the clones
– each probe could mark several clones
– each clone could contain several probes

16 The Physical Mapping Problem with Noisy Genomic Data
Journal of Computational Biology 10(5), 709-735, 2003
‧ Each row represents a clone; each column represents a probe.
‧ Diagram on the left: the input clone-probe matrix.
‧ Diagram on the right: after the probes are rearranged, the clones are put in their correct positions.
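For error-free data, a probe arrangement is correct when every clone's ones sit next to each other in its row (the consecutive ones property). The sketch below checks that for a given column order; the 0/1 list-of-lists encoding and the function names are assumptions made for illustration.

```python
def has_consecutive_ones(row):
    """True if all 1s in the row form a single contiguous block (or there are none)."""
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)


def reorder_columns(matrix, order):
    """Rearrange the probe columns of a clone-probe matrix into the given order."""
    return [[row[j] for j in order] for row in matrix]


def is_valid_ordering(matrix, order):
    """True if every clone (row) has consecutive ones under this probe order."""
    return all(has_consecutive_ones(r) for r in reorder_columns(matrix, order))
```

For the matrix `[[1, 0, 1], [0, 1, 1]]`, the order `[0, 2, 1]` is valid while the original order `[0, 1, 2]` is not.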

17 Consecutive Ones with Errors

18 False Positives and False Negatives
[Figure: clone-probe matrix with a false positive and a false negative labeled]

19 A Genetic Algorithm for Physical Mapping
‧ A two-stage genetic algorithm
– First stage: generate the neighborhood information among probes
– Second stage: generate the longest sequence of connected probes

20 The First Stage of GA (GA1)
‧ Purpose: find a probe ordering with the highest fitness score for each clone.
‧ Pseudo code
– Randomly generate a population of probe permutations
– Evaluate the fitness of each individual in the population
– Repeat
    Select best-ranking individuals to reproduce
    Breed a new generation through crossover and mutation (genetic operations), giving birth to offspring
    Evaluate the individual fitness of the offspring
    Replace the worst-ranked part of the population with offspring
– Until termination
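The slides do not give the GA1 fitness function. As a stand-in, the sketch below scores a probe permutation by how far each clone is from having consecutive ones: every extra block of ones in a row costs one point, so a score of 0 means all rows are consecutive. The scoring rule and the names are assumptions, not the authors' definition.

```python
def count_one_blocks(row):
    """Number of maximal runs of 1s in a row."""
    blocks, prev = 0, 0
    for v in row:
        if v == 1 and prev == 0:
            blocks += 1
        prev = v
    return blocks


def ordering_fitness(matrix, order):
    """Hypothetical fitness: minus the number of extra 1-blocks over all clones."""
    extra = 0
    for row in matrix:
        reordered = [row[j] for j in order]
        extra += max(count_one_blocks(reordered) - 1, 0)
    return -extra
```

A GA over probe permutations would then try to maximize `ordering_fitness`, with noisy data typically preventing a perfect score of 0.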

21 The First Stage of GA (GA1)
[Figure: clone-probe matrix (● entries) with columns in the probe order 4 1 2 3 5 8 6 9 11 12 13 14 15 17 18]
Two building blocks that make partial consecutive ones

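22 Crossover Operation
P1: 2 3 6 8 1 9 10 12 13 5 11 14 15 17 18
P2: 9 10 11 12 13 14 8 18 17 6 5 3 2 1 15
Child: 2 3 6 8 1 9 10 11 12 13 14 18 17 5 15
Partial children as the child is built: 2 3 6 8 1 9, then 2 3 6 8 1 9 10, up to 2 3 6 8 1 9 10 11 12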
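Reading the slide's digit strings as probe lists (an interpretation, since the numbers run together in the transcript), the child equals the first six probes of P1 followed by the remaining probes in the order they appear in P2. The sketch below implements that reading; the authors' actual operator may differ in how it picks the segment, and `prefix_crossover` is a hypothetical name.

```python
def prefix_crossover(p1, p2, cut):
    """Keep p1's first `cut` probes, then append the unused probes in p2's order."""
    head = list(p1[:cut])
    used = set(head)
    return head + [probe for probe in p2 if probe not in used]


p1 = [2, 3, 6, 8, 1, 9, 10, 12, 13, 5, 11, 14, 15, 17, 18]
p2 = [9, 10, 11, 12, 13, 14, 8, 18, 17, 6, 5, 3, 2, 1, 15]
child = prefix_crossover(p1, p2, cut=6)
```

Because probes already placed are skipped, the child is always a valid permutation of the same probe set.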

23 Mutations
[Figure: two probe orderings illustrating a mutation]
2 3 6 8 5 9 10 12 13 1 11 12 15 17 18
2 3 6 8 1 9 10 12 13 5 11 12 15 17 18
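Read as probe lists, the two orderings on this slide differ only in that probes 5 and 1 trade places, which is what a swap mutation does. The sketch below is a generic swap mutation on a permutation; whether the authors use other mutation types as well is not stated, and `swap_mutation` is a hypothetical name.

```python
import random


def swap_mutation(order, rng=random):
    """Return a copy of the ordering with two randomly chosen positions exchanged."""
    i, j = rng.sample(range(len(order)), 2)  # two distinct positions
    mutated = list(order)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated
```

The result is still a permutation of the same probes, so no repair step is needed after mutation.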

24 Detection of False Negatives
[Figure: clone-probe matrix (● entries) with columns in the probe order 1 to 15]
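The slide shows only a figure, so the detection rule is not spelled out. One plausible heuristic, given here purely as an assumption, is that under a good probe ordering a short gap of zeros between two ones in the same clone row is a candidate false negative. The `max_gap` threshold and the function name are hypothetical.

```python
def false_negative_candidates(row, max_gap=1):
    """Column indices of zeros that sit in a short gap between two 1s in the row."""
    ones = [i for i, v in enumerate(row) if v == 1]
    candidates = []
    for a, b in zip(ones, ones[1:]):
        gap = b - a - 1  # number of zeros between two successive 1s
        if 0 < gap <= max_gap:
            candidates.extend(range(a + 1, b))
    return candidates
```

For the row `[0, 1, 1, 0, 1, 1, 0]` the only candidate is column 3; zeros outside the span of ones are never flagged.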

25 The First Stage of GA (GA1)
Construct the probe neighboring information according to the GA1 results.
‧ Probe ordering result for probe segment 1: 1 2 3 5 6 8 9 10 11 12 13 14 15 17 18
– Neighbors: 5: {3, 6}  6: {5, 8}  8: {6, 9}  …  18: {17}
‧ Probe ordering result for probe segment 2: 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20
– Neighbors: 5: {6}  6: {5, 7}  7: {6, 8}  …  20: {19}
‧ ……
‧ Probe ordering result for probe segment 20: 83 85 86 87 88 89 90 91 92 93 95 96 97 98 99
‧ Probe neighboring information (a neighboring probe list), combining the segments: 5: {3, 6}  6: {5, 7, 8}  7: {6, 8, 9}  …  20: {19}
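The merge step on this slide can be sketched as a union of adjacency sets: each probe's neighbors in one segment ordering are the probes directly before and after it, and the lists from all segments are combined. The segment orderings below are read from the slide's run-together digits, which is an interpretation; `neighbor_lists` is a hypothetical name.

```python
from collections import defaultdict


def neighbor_lists(orderings):
    """Union, over all segment orderings, of each probe's adjacent probes."""
    neighbors = defaultdict(set)
    for order in orderings:
        for a, b in zip(order, order[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


segment1 = [1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18]
segment2 = [5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20]
merged = neighbor_lists([segment1, segment2])
```

With only these two segments, probe 5 gets {3, 6} and probe 6 gets {5, 7, 8}, matching the slide; entries such as 7: {6, 8, 9} presumably also draw on other segments not shown.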

26 The Second Stage of GA (GA2)
‧ Purpose: find the longest connecting probe sequence according to the probe neighboring information.
‧ Pseudo code
– Randomly generate a population of probe permutations
– Evaluate the fitness of each individual in the population
– Repeat
    Select best-ranking individuals to reproduce
    Breed a new generation through crossover and mutation (genetic operations), giving birth to offspring
    Evaluate the individual fitness of the offspring
    Replace the worst-ranked part of the population with offspring
– Until termination

27 The Second Stage of GA (GA2)
Generate a probe ordering according to the probe neighboring information.
‧ Probe neighboring information: 1: {2}  2: {1, 3}  3: {2, 4, 5}  4: {3, 5}  5: {3, 4, 6}  6: {5, 7, 8}  7: {6, 8, 9}  …  99: {97, 98}
‧ Probe orderings shown: 1 2 3 4 5 6 7 … 93 94 95 96 97 98 99 and 2 3 5 4 … 71 72 73 … 55 56 57 … 99 98 97 96
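Since GA2 aims at the longest connecting probe sequence, one natural fitness for a probe permutation is the length of its longest run in which every adjacent pair are listed neighbors. The slides do not give the exact GA2 fitness, so this scoring rule and the name `longest_connected_run` are assumptions.

```python
def longest_connected_run(order, neighbors):
    """Length of the longest stretch of the ordering whose adjacent probes are neighbors."""
    best = current = 1 if order else 0
    for a, b in zip(order, order[1:]):
        # Extend the run if b is a listed neighbor of a, otherwise start a new run at b.
        current = current + 1 if b in neighbors.get(a, ()) else 1
        best = max(best, current)
    return best


neighbors = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5},
             5: {3, 4, 6}, 6: {5, 7, 8}, 7: {6, 8, 9}}
score = longest_connected_run([2, 3, 5, 4, 7, 1, 6], neighbors)
```

Here the run 2 3 5 4 is connected (2-3, 3-5, 5-4 are all neighbor pairs) and nothing after it connects, so the score is 4.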

