RiboSearch Ben Daniel Ariel Kirshner Naomi Instructor : Dr. Danny Barash Adaya Cohen
Introduction Biological Introduction Method Layout “The merge strategy” Results and Conclusions
RNA: a single-stranded nucleic acid made up of 4 nucleotides. Purines: adenine (A), guanine (G). Pyrimidines: cytosine (C), uracil (U). WC pairs: A-U, G-C.
Biological introduction, old scheme. Proteins carry out all biological functions. RNA: only an intermediate stage between DNA and protein, with no catalytic function. Diagram: DNA → RNA → Protein
Biological introduction, new scheme. Since the discovery of self-splicing RNAs in the early 1980s, a number of new structural and catalytic RNAs have been discovered. Recent studies focusing on non-coding and small RNAs have led to the discovery of RNA molecules that possess essential regulatory functions. Diagram: DNA, RNA, Protein
RNA secondary structure. Elements: hairpin, internal loop, bulge loop, junction, stem (double strand), pseudoknot. The secondary structure of many RNAs is usually more conserved than their sequence.
Riboswitch. Diagram labels: aptamer, coding section, 3’, 5’, expression platform, 5’ UTR, 3’ UTR. Riboswitches are RNA control elements that regulate gene expression without the participation of proteins. They use a mechanism in which small molecules bind to the aptamer/box region, causing a conformational switch. They were first found in the 5’ UTRs of bacteria, with successive discoveries in other prokaryotes. There is evidence suggesting that riboswitches could also be found in eukaryotes.
Riboswitch mechanism: guanine binds to the aptamer region, which causes a conformational change in the expression platform; this in turn regulates guanine metabolism.
G-box: regulates genes related to purine metabolism and transport. Binds purines. Consists of 2 hairpins and 1 internal junction.
RiboSearch. Goal: finding G-boxes in eukaryotic genomes. Method: combining existing search methods into one overall package.
Search methods: Whiffer (CS department, BGU); RNAMotif (Macke et al., 2001); RNAProfile (Pavesi et al., 2004); STR2 (CS department, BGU).
Whiffer. Input: a pattern that consists of sequence information, variable gaps, and base-pairing brackets representing WC pairs. Output: candidate locations that meet the constraints imposed by the method. Example pattern: <<<< [2] TA [5] GTNTCTAC [3] <<<<< [3] CCNNNAA [3] >>>>> [5] >>>>
Whiffer method: uses simple matching based on the constraints, as opposed to dynamic programming.
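The matching idea can be illustrated with a small backtracking matcher. This is a minimal sketch, not Whiffer's implementation. The token meanings are assumptions made for illustration: `[n]` is read as a gap of 0 to n arbitrary nucleotides, `N` as any nucleotide, and a run of `>` must pair base by base with the most recent unmatched run of `<` of the same length. T is treated as U, and the names `tokenize`, `match_at`, and `find_matches` are hypothetical.

```python
# Sketch of constraint-based pattern matching (no dynamic programming).
# Token semantics are ASSUMED for illustration, not taken from Whiffer.

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def tokenize(pattern):
    """Split a pattern such as '<<< [1] GNA [1] >>>' into (kind, value) tokens."""
    tokens = []
    for tok in pattern.upper().replace("T", "U").split():
        if set(tok) == {"<"}:
            tokens.append(("open", len(tok)))
        elif set(tok) == {">"}:
            tokens.append(("close", len(tok)))
        elif tok.startswith("[") and tok.endswith("]"):
            tokens.append(("gap", int(tok[1:-1])))
        else:
            tokens.append(("lit", tok))
    return tokens


def match_at(seq, pos, tokens, stack=()):
    """Match all tokens starting at pos; return the end index or None."""
    if not tokens:
        return pos if not stack else None
    (kind, val), rest = tokens[0], tokens[1:]
    if kind == "gap":
        # Try every allowed gap length, shortest first (backtracking).
        for skip in range(val + 1):
            if pos + skip > len(seq):
                break
            end = match_at(seq, pos + skip, rest, stack)
            if end is not None:
                return end
        return None
    width = len(val) if kind == "lit" else val
    frag = seq[pos:pos + width]
    if len(frag) < width:
        return None
    if kind == "lit":
        ok = all(p in ("N", c) for p, c in zip(val, frag))
        return match_at(seq, pos + width, rest, stack) if ok else None
    if kind == "open":
        # Remember the 5' strand until its closing run is reached.
        return match_at(seq, pos + width, rest, stack + (frag,))
    # kind == "close": pair with the most recent open run, antiparallel.
    if not stack or len(stack[-1]) != width:
        return None
    paired = all((a, b) in WC_PAIRS for a, b in zip(stack[-1], reversed(frag)))
    return match_at(seq, pos + width, rest, stack[:-1]) if paired else None


def find_matches(seq, pattern):
    """Return (start, end) for every start position where the pattern matches."""
    seq = seq.upper().replace("T", "U")
    tokens = tokenize(pattern)
    hits = []
    for start in range(len(seq)):
        end = match_at(seq, start, tokens)
        if end is not None:
            hits.append((start, end))
    return hits


print(find_matches("UUGCGAGAAACGCUU", "<<< [1] GNA [1] >>>"))  # [(2, 13)]
```

Because each token either matches at the current position or fails immediately, the search only backtracks over gap lengths, which keeps the matcher simple at the cost of rejecting anything that deviates from the pattern.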
RNAMotif. Input: a database of nucleotide sequences and a description file that consists of a descriptor section and an optional score section. Output: candidates that meet the conditions of the descriptor and the scoring scheme.
RNAMotif sample descriptor file:
descr
  h5 (minlen=6, maxlen=8)
    ss (minlen=4, maxlen=6)
  h3
score
{
  gcnt = 0;
  glen = 0;
  for( i = 1; i <= NSE; i++ ){
    llen = length( se[i] );
    glen = glen + llen;
    for( j = 1; j <= llen; j++ ){
      b = se[i,j,1];
      if( b == "g" || b == "c" )
        gcnt++;
    }
  }
  SCORE = 1.0 * gcnt / glen;
  if( SCORE < .4 )
    REJECT;
}
Diagram labels: ss, h5, h3
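The score section above computes the G+C fraction over all matched structure elements and rejects candidates below 0.4. The same logic can be sketched in Python for readers unfamiliar with the descriptor language. The function names `gc_fraction` and `passes_gc_filter` are hypothetical, and the input is assumed to be the list of matched element strings (here h5, ss, h3).

```python
def gc_fraction(elements):
    """Fraction of G/C bases over all matched structure elements."""
    total = sum(len(e) for e in elements)
    if total == 0:
        return 0.0
    gc = sum(1 for e in elements for base in e.lower() if base in "gc")
    return gc / total


def passes_gc_filter(elements, threshold=0.4):
    """Mirror of 'if( SCORE < .4 ) REJECT;': keep candidates at or above the threshold."""
    return gc_fraction(elements) >= threshold


# Hypothetical candidate: h5, ss, h3 strings of one hairpin.
candidate = ["GGCGCC", "AAAA", "GGCGCC"]
print(gc_fraction(candidate), passes_gc_filter(candidate))
```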
RNAMotif method: a two-stage algorithm. Stage I, compilation: the motif specification, called a descriptor, is analyzed and converted into a search tree based on the helical nesting of the motif.
RNAMotif method: a two-stage algorithm. Stage II, DFS: a depth-first search of the tree created by the compilation stage. Each time a complete solution to the descriptor is found, the candidate is passed to the optional score section for scoring and ranking. In the absence of a score section, the candidate is accepted.
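The search-then-score flow can be sketched for the single-hairpin descriptor (h5, ss, h3). This is a simplified illustration under stated assumptions, not RNAMotif's algorithm: instead of a compiled search tree, it enumerates the choices depth first (start position, then helix length, then loop length), and a score function that returns `None` stands in for REJECT. The names `search_hairpins` and `gc_score` are hypothetical.

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairs_as_helix(h5, h3):
    """True if h5 pairs with h3 antiparallel using Watson-Crick pairs only."""
    return len(h5) == len(h3) and all(
        (a, b) in WC_PAIRS for a, b in zip(h5, reversed(h3))
    )


def search_hairpins(seq, stem=(6, 8), loop=(4, 6), score=None):
    """Depth-first enumeration of hairpins; optional scoring and ranking."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for start in range(len(seq)):
        for stem_len in range(stem[0], stem[1] + 1):       # choose h5 length
            for loop_len in range(loop[0], loop[1] + 1):   # choose ss length
                h3_start = start + stem_len + loop_len
                h3_end = h3_start + stem_len
                if h3_end > len(seq):
                    break  # longer loops cannot fit either
                h5 = seq[start:start + stem_len]
                ss = seq[start + stem_len:h3_start]
                h3 = seq[h3_start:h3_end]
                if not pairs_as_helix(h5, h3):
                    continue
                if score is None:
                    hits.append((start, stem_len, loop_len, None))  # accepted as is
                    continue
                value = score(h5, ss, h3)
                if value is not None:  # None plays the role of REJECT
                    hits.append((start, stem_len, loop_len, value))
    if score is not None:
        hits.sort(key=lambda h: h[3], reverse=True)  # rank by score
    return hits


def gc_score(h5, ss, h3, threshold=0.4):
    """G+C fraction of the whole hairpin, or None when below the threshold."""
    bases = h5 + ss + h3
    frac = sum(b in "GC" for b in bases) / len(bases)
    return frac if frac >= threshold else None


print(search_hairpins("AAGGGCCCAAAAGGGCCCAA"))
print(search_hairpins("AAGGGCCCAAAAGGGCCCAA", score=gc_score))
```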
RNAProfile input: the number of distinct hairpins the motif has to contain, and a set of unaligned RNA sequences expected to share a common motif.
RNAProfile output: the regions that are most conserved throughout the sequences, according to both the sequence of the regions and the secondary structure they can form under base-pairing and thermodynamic rules.
RNAProfile method: two phases. Phase I: extract from each input sequence a set of candidate regions whose predicted optimal secondary structure contains the number of hairpins given as input. Phase II: compare the selected regions with each other to find the group of most similar ones, formed by one region taken from each sequence.
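Phase II can be illustrated with a toy selection step. The sketch below assumes Phase I has already produced candidate regions for each sequence (it needs a folding algorithm and is not shown), uses a simple positional-identity measure as a stand-in for RNAProfile's actual comparison, and searches exhaustively, which is only practical for tiny inputs. The names `similarity` and `best_group` and the example regions are hypothetical.

```python
from itertools import combinations, product


def similarity(a, b):
    """Toy measure: identical positions divided by the longer region's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def best_group(regions_per_sequence):
    """Pick one region per sequence so that summed pairwise similarity is maximal."""
    best, best_score = None, float("-inf")
    for group in product(*regions_per_sequence):
        total = sum(similarity(a, b) for a, b in combinations(group, 2))
        if total > best_score:
            best, best_score = group, total
    return best, best_score


# Hypothetical Phase I output: two candidate regions for each of three sequences.
regions = [
    ["GGGAAACCC", "AUAUAUAUA"],
    ["GGGAUACCC", "CCCCCCCCC"],
    ["GGGAAACCC", "UUUUUUUUU"],
]
group, total = best_group(regions)
print(group, round(total, 2))
```

The exhaustive product grows exponentially with the number of sequences, so a real tool needs a heuristic; the sketch only shows what "the group of most similar regions, one per sequence" means.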
Method summary. Whiffer: combines sequence and structure similarity; very high specificity, so potential candidates may be ruled out. RNAMotif: similarity based mostly on structural elements, according to the descriptor. RNAProfile: similarity based on both sequence and structure; recommended as a post-processing step.
The merge strategy. Input query: sequence and structure (bracket notation), e.g. (((..((((...)))).)) → parsing → Whiffer and RNAMotif → candidates
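The parsing step can be sketched as converting the bracket notation into base pairs and stems, from which a Whiffer pattern or an RNAMotif descriptor could then be generated. This is a minimal sketch with hypothetical function names, not the RiboSearch parser. The example uses a balanced variant of the slide's structure string (one closing bracket added, which is an assumption), since the string as shown has seven opening and six closing brackets.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) base pairs for a dot-bracket string (0-based)."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def stems(pairs):
    """Group stacked pairs into stems: (first 5' index, last 3' index, length)."""
    result = []
    for i, j in pairs:
        if result:
            s, e, n = result[-1]
            if (s + n, e - n) == (i, j):  # directly stacked on the previous pair
                result[-1] = (s, e, n + 1)
                continue
        result.append((i, j, 1))
    return result


pairs = parse_dot_bracket("(((..((((...)))).)))")
print(stems(pairs))  # [(0, 19, 3), (5, 15, 4)]
```

Each stem maps naturally onto a bracket run in a pattern or onto an h5/h3 pair in a descriptor, and the unpaired stretches between stems give the gap or single-strand lengths.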
Candidates → filtering (the location is contained within a gene; the gene is relevant to the requested function, purine metabolism) → RNAProfile post-processing → final candidates
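The two filtering criteria can be expressed as a simple predicate over genome annotations. The sketch below is illustrative only: the `Gene` record, the keyword test on a free-text function description, and all coordinates and gene names are made up for the example and do not come from the RiboSearch code or data.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    end: int
    function: str  # free-text annotation (assumed format)


def filter_candidates(candidates, genes, keyword="purine"):
    """Keep candidates located inside a gene whose annotation mentions the keyword."""
    kept = []
    for c_start, c_end in candidates:
        for gene in genes:
            inside = gene.start <= c_start and c_end <= gene.end
            relevant = keyword.lower() in gene.function.lower()
            if inside and relevant:
                kept.append(((c_start, c_end), gene.name))
                break  # one matching gene is enough
    return kept


# Made-up annotations and candidate locations.
genes = [
    Gene("geneA", 100, 900, "Purine metabolism"),
    Gene("geneB", 1000, 2000, "Cell wall synthesis"),
]
candidates = [(150, 210), (1100, 1160), (950, 990)]
print(filter_candidates(candidates, genes))
```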
Final candidates → sequence alignment → biological experiments
Results – prokaryote: Bacillus halodurans Merge RNAMotif Whiffer 7 4 Candidates 2 True positives 3 5 False positives False negatives
Results – eukaryote: Arabidopsis thaliana Merge RNAMotif Run #2 Run #1 Whiffer - 70000 30 Candidates 11 17 Final candidates
Results – eukaryote: Arabidopsis thaliana. Most promising candidates.
c2__11199940_11199996

queryGBox               CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTAAATGTCCGACTAT 50
c2__11199940_11199996_  --TTCAGGTC-CATCTTTGGCTAGACCGAAGTCAGATAATTTGGCGTTAT 47
                          *  *  *  **     *  *  ****    * *  *** *     ***

queryGBox               G-------- 51
c2__11199940_11199996_  AGTCCTGAA 56
c3_20894864_20894920

c3_sequences  GGATGAGGAACCAATTGACCCTGGATTTCAAGATT-TACAAAAGAACGTA 49
queryGBox     -------------CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTA 37
                             **    ***    **** ** ***     * ****

c3_sequences  AGCATCC------- 56
queryGBox     AATGTCCGACTATG 51
              *   ***
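The alignments above can be summarized numerically by counting identical columns, which is what the `*` marks indicate. The following is a minimal sketch with a hypothetical function name; it takes the two aligned rows (blocks concatenated) and ignores any column that contains a gap.

```python
def alignment_identity(row_a, row_b, gap="-"):
    """Return (matches, compared): identical columns and gap-free columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = compared = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not compared
        compared += 1
        matches += a == b
    return matches, compared


# Rows of the c2 alignment, both blocks joined.
query = "CGTGGATATGGCACGCAAGTTTCTACCGGGCACCGTAAATGTCCGACTAT" + "G--------"
cand = "--TTCAGGTC-CATCTTTGGCTAGACCGAAGTCAGATAATTTGGCGTTAT" + "AGTCCTGAA"
print(alignment_identity(query, cand))  # (20, 48)
```

For the c2 candidate this gives 20 identical positions out of 48 compared columns, consistent with the 20 `*` marks in the alignment.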
RiboSearch - Conclusions. Filters out false positives. Sequences are far less conserved in eukaryotes than in prokaryotes. The merge strategy is essential when searching eukaryotic genomes.
Our thanks Dr. Danny Barash Adaya Cohen