Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik, D. Ecker, R. Sampath, Natural Selection Inc. and Ibis Therapeutics Presenter: Elena Zheleva April 2, 2004
Introduction Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Problem Statement Computational Biology problem: given a RNA secondary structure description, search for similar secondary structures Currently, exhaustive search techniques are used to narrow down search space Authors focus on presentation and set of operators to search via evolution
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Background RNA (ribonucleic acid) directs middle steps of protein production single-stranded, certain parts are folded RNA Secondary Structure - accounts for diverse functional activities
Background RNA Secondary Structure: Recurs in multiple genes within a single organism Recurs in across the same gene in several organism Why a computational tool for RNA secondary structure search? Discover new structures Improve understanding of functional and regulatory relationships amongst related RNAs
Background – RNAMotif RNAMotif: mines nucleotide sequence databases for repeating structure motifs RNAMotif Input: descriptor contains details about pairing information, length, sequence
Background - RNAMotif RNAMotif Output: list of real structures RNAMotif may return a very high number of motifs when descriptor is more flexible Input to the EA: RNAMotif Output
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Evolutionary Computation Population Initialization P parent bins B – bin size Bin = a contending solution Each bin contains structures from different organisms Structures chosen at random from RNAMotif Output file Figure 1
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Evolutionary Computation Variation P parent bins are copied to O offspring bins Variables: operator, number of times to apply it Variation Operator 1: structure replacement within a specified organism Replacement taken from RNAMotif Output File Local – neighboring replacement structure Global – random replacement structure Example: P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}
Evolutionary Computation Variation Variation Operator 2: Structure replacement from different organisms Variable: # of structures to be replaced Example: # = 2 P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus} O organisms = {H. Sapiens, C. Griseus, E. Coli, S. Scrofa}
Evolutionary Computation Variation Variation Operator 3: random single-point bin recombination Generates a second parent from RNAMotif output and applies single-point bin recombination Chooses randomly one of the two offsprings Example: P = {H, S, E, G} P = {D, E, O, B} O = {H, S, E, B} O = {D, E, O, G} Variation Operator 4: random multi-point bin recombination
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Evolutionary Computation Fitness Fitness Function Scoring Components: Structure nucleotide sequence similarity Structure length similarity Structure thermodynamic stability similarity These measures are applied pairwise by each structural component and summed into a final bin score
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Evolutionary Computation Selection Selection: For every bin in population, A set of R rival bins is randomly selected Calculate score = # rivals with lower fitness Lower bins are removed Iterations continue until number of generations (G) or CPU time is satisfied, or until expected change of fitness/gen 0
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Results Experiment 1: 7.6x10 possible bins Exhaustive search: 125 days EA examined ~10 bins before converging < 3 minutes 8 4
Results
To test the utility of this method Run on newly discovered genomes (S. Pyogenes) Compare to database which has an alignment for this RNA secondary structure for previously discovered genomes (S. Mutans) Found similar sequence and structure to close organisms
Outline Problem Statement Background Evolutionary Computation Population initialization Variation Fitness Selection Results Conclusion
Conclusion Evolutionary Algorithm can be applied to find RNA secondary structures over a wide range of organisms Converges quickly and reliably Algorithm comes up with a solution which contains information about structural elements for different organisms/genomes