Fast and effective prediction of miRNA targets. Marc Rehmsmeier, Junior Research Group Bioinformatics of Regulation, CeBiTec, Bielefeld University, Germany
Small interfering RNAs versus small temporal RNAs Hannon. Nature. 418: , 2002.
miRNA/target duplexes Grosshans and Slack. The Journal of Cell Biology, 156(1):17-21, 2002.
A direct approach. Given a miRNA and a potential target, what are the energetically most favourable binding sites? Calculate multiple minimum free energy (mfe) secondary-structure duplexes.
The language of RNA duplexes
hybrid = nil ><< tt (region, region) |||
         unpaired_left_top |||
         closed ... h
unpaired_left_top = ult <<< tt (base, empty) ~~~ unpaired_left_top |||
                    unpaired_left_bot ... h
unpaired_left_bot = ulb <<< tt (empty, base) ~~~ unpaired_left_bot |||
                    edangle ... h
edangle = eds <<< tt (base, base) ~~~ closed |||
          edt <<< tt (base, emptybase) ~~~ closed |||
          edb <<< tt (emptybase, base) ~~~ closed ... h
closed = stacking_region |||
         bulge_top |||
         bulge_bottom |||
         internal_loop |||
         end_loop ... h
stacking_region = sr <<< basepair ~~~ closed
bulge_top = (bt <<< basepair ~~~ tt (uregion, empty)) `topbound` closed
bulge_bottom = (bb <<< basepair ~~~ tt (empty, uregion)) `botbound` closed
internal_loop = (il <<< basepair ~~~ tt (uregion, uregion)) `symbound` closed
end_loop = el <<< basepair ~~~ tt (region, region)
Dynamic Programming recurrences Time/memory complexity: linear in target length
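The linear dependence on target length can be illustrated with a toy recurrence. This is only a sketch and not RNAhybrid's energy model: `PAIR_SCORE`, `max_loop` and `loop_penalty` are made-up illustration parameters, and stacking energies and dangling ends are ignored. It shows that, for a fixed miRNA length and bounded loop sizes, each target position costs a constant amount of work.

```python
# Toy duplex recurrence with hypothetical pair scores (NOT nearest-neighbour energies).
PAIR_SCORE = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def toy_duplex_scores(target, mirna, max_loop=2, loop_penalty=1.0):
    """For each target position i, return the lowest toy score of a duplex
    whose first base pair involves target[i] (inf if no pair is possible)."""
    t = target.upper().replace("T", "U")
    # Reverse the miRNA so that paired positions increase in both sequences.
    q = mirna.upper().replace("T", "U")[::-1]
    m, n = len(t), len(q)
    inf = float("inf")
    # best[i][j]: lowest score of a duplex starting with the pair (t[i], q[j]).
    best = [[inf] * n for _ in range(m)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            pair = PAIR_SCORE.get((t[i], q[j]))
            if pair is None:
                continue
            ext = 0.0  # option: the duplex ends with this pair
            # Option: continue after a stack (a = b = 0), bulge or internal loop.
            for a in range(max_loop + 1):
                for b in range(max_loop + 1):
                    ii, jj = i + 1 + a, j + 1 + b
                    if ii < m and jj < n and best[ii][jj] < inf:
                        ext = min(ext, best[ii][jj] + loop_penalty * (a + b))
            best[i][j] = pair + ext
    return [min(row, default=inf) for row in best]
```

The work per target position is bounded by n * (max_loop + 1)^2, so time and memory grow linearly with the target length m.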
let-7/lin-41 binding sites position: 688, mfe: kcal/mol position: 737, mfe: kcal/mol
Requirements. For prediction of miRNA targets in large databases we need (1) a fast program and (2) good statistics.
Length normalisation of minimum free energies
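The slide does not show the normalisation formula. A minimal sketch, assuming it takes the form e_n = e / ln(m n), with m the target length and n the miRNA length; the function name and arguments are illustrative.

```python
import math


def normalised_mfe(mfe, target_length, mirna_length):
    """Divide an mfe (kcal/mol) by ln(m * n); assumes m * n > 1."""
    return mfe / math.log(target_length * mirna_length)
```

Under this assumption, longer targets, which offer more chances for a low energy, have their energies scaled down before they are compared.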
p-values of individual binding sites
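A hedged sketch of how such a p-value could be computed, assuming that negated normalised energies follow an extreme value (Gumbel) distribution. The `location` and `scale` parameters are hypothetical; in practice they would have to be estimated, for example from hits against random sequences.

```python
import math


def evd_p_value(normalised_energy, location, scale):
    """Probability of a normalised energy at least as low as the observed one,
    under an assumed Gumbel distribution of the negated energies."""
    x = -normalised_energy  # stronger binding corresponds to larger x
    # 1 - exp(-exp(-(x - location) / scale)), written with expm1 for accuracy
    return -math.expm1(-math.exp(-(x - location) / scale))
```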
Poisson statistics of multiple binding sites Probability of k binding sites: with For small p-values: The probability of at least k binding sites:
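The formulas on this slide did not survive. The sketch below uses the standard Poisson distribution. How the rate is obtained from a p-value is an assumption here: if p is the probability of at least one site, then 1 - exp(-λ) = p gives λ = -ln(1 - p), which is close to p for small p-values.

```python
import math


def rate_from_p_value(p):
    """Assumed link between a p-value and the Poisson rate: lambda = -ln(1 - p)."""
    return -math.log1p(-p)


def poisson_pmf(k, lam):
    """Probability of exactly k binding sites under a Poisson model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_at_least(k, lam):
    """Probability of at least k binding sites: 1 - sum of P(i) for i < k."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))
```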
Comparative analysis of orthologous targets
Multi-species p-values Poisson p-values: multi-species p-value: General case: k species
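The formulas on this slide are missing. The figures on the let-7b/NME4 slide (1.5e-08 for k = 2, 5.0e-05 for k = 1.1) are consistent with raising the largest of the per-species p-values to the power k. Under that reading, and assuming independent p-values that are uniformly distributed under the null model, the general case can be written as:

```latex
p_{\max} = \max(p_1, \dots, p_k), \qquad
P_{\text{multi}}
  = \Pr[P_1 \le p_{\max}, \dots, P_k \le p_{\max}]
  = \prod_{i=1}^{k} \Pr[P_i \le p_{\max}]
  = p_{\max}^{\,k}
```

The product step is exactly where independence between species is needed.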
A dependence problem: we should see a p-value as often as it says (blue curve), but we don't (red curve).
let-7b/NME4 (human/mouse) binding sites
-GGCTCAAGCTGCCCTTACCACCCCATCCCCCACGCAGGACCAACTACCTCCGTCAGCAAGAACCCAAGCCCACATCCAAACCTGCCTGTCCCAAACCAC
GGGCTTGCACTGCCTTCTGCACTTCAGGTCT-ACCCATGACCTACTACCTCTGTCAACAAGAAGTCAAGCCCCCATGC---TTCCCATGTCCCCAAAC--
**** ***** * *** ** * ** ** **** ******** **** ****** ******* *** * * ****** ** *
TTACTTCCCTGTTCACCTCTGCCCCACCCCAGCCCAGAGGAGTTTGAGCCACCAACTTCAGTGCCTTTCTGTACCCCAAGCCAGCACAAGATTGGACCAA
-CACTCCCTACTCCCGCTCTACCCAACTCCAGCCCAGGGGAGTCTAAGCCTCAACTCTATGTGCCTTTTTGTATCCTAAGTCAATACAATATTGGACCAT
*** ** * * **** *** ** ********* ***** * **** * * * ******** **** ** *** ** **** *********
TCCTTTTTGCACCAAAGTGCCGGACAACCTTTGTGGTGGGGGGGGGTCTTCACATTATCATAACCTCTCCTCTAAAGGGGAGGCATTAAAATTCACTGTG
GTCCTTGTGTACAAAAGTGCCAGACAACCTTTG GGGCATTGTCA-AAGGTGACTTCACCTGCCTCAAAGGAGAGACATTAAAATTT--TATG
* ** ** ** ******** *********** *** * *** * * * * ** * ***** *** ********** * **
CCCAGCACATGGGTGGTACACTAATTATGACTTCCCCCAGCTCTGAGGTAGAAATGACGCCTTTATGCAAGTTGTAAGGAGTTGAACAGTAAAGAGGAAG
CTTAAAAT
* * *
Multi-species p-value with k = 2: 1.5e-08
Multi-species p-value with k = 1.1: 5.0e-05
k = 1.1 is the effective k
Effective number of orthologous targets
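A worked check of the effective k, assuming the multi-species p-value is the largest per-species p-value raised to the power k. The function name is illustrative, and `p_max` is back-calculated from the k = 2 figure on the let-7b/NME4 slide rather than taken from the source.

```python
import math


def multi_species_p_value(p_values, k=None):
    """Raise the largest p-value to the power k (default: number of species).
    Passing a non-integer k models an effective number of species."""
    if k is None:
        k = len(p_values)
    return max(p_values) ** k


# Largest per-species p-value implied by the k = 2 figure (1.5e-08).
p_max = math.sqrt(1.5e-08)
print(f"{multi_species_p_value([p_max, p_max]):.1e}")         # 1.5e-08
print(f"{multi_species_p_value([p_max, p_max], k=1.1):.1e}")  # 5.0e-05
```

Replacing k = 2 by the effective k = 1.1 treats the two closely related sequences as little more than one independent observation, which makes the p-value far less extreme.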
True and false positives and negatives
                         Positives   Negatives
Classify as Positives    TP          FP
Classify as Negatives    FN          TN
Sensitivity and specificity. p-values control specificity.
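For reference, the standard definitions in terms of the four counts can be sketched as follows; the function names are illustrative. A p-value threshold bounds the rate of false positives among true negatives, which is why p-values control specificity rather than sensitivity.

```python
def sensitivity(tp, fn):
    """Fraction of actual positives that are classified as positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that are classified as negatives."""
    return tn / (tn + fp)
```

For example, 8 true positives and 2 false negatives give a sensitivity of 0.8, while 90 true negatives and 10 false positives give a specificity of 0.9.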
Target prediction workflow (diagram): inputs are the target db and the miRNA registry; RNAhybrid yields individual p-values, Poisson p-values and multi-species p-values; example output for bantam with columns target gene, E-value and #sites (Dm, Dp, Ag).
Prediction of Drosophila miRNA targets: 78 miRNAs; 28,645 3' UTRs (1/3 from D. mel, 1/3 from D. pseu, 1/3 from A. gamb)
Bantam hits
Columns: target, E-value, #sites Dm, #sites Dp, #sites Ag
Targets: nervous fingers, Distal-less, Wrinkled (Hid)
miR-7 hits
Columns: target, E-value, #sites Dm, #sites Dp, #sites Ag
Targets: CG, Twin of m, E(spl) region transcript m, E(spl) region transcript m, CG, CG, Him, CG, Arginine methyltransferase
miR-2 hits
Columns: target, then E-value and #sites for each of miR-2a, miR-2b, miR-2c
Targets: grim, reaper, sickle, plus a number of others
RNAhybrid functionality: length normalisation, Poisson statistics, web server, seed/loop constraints, miRNA-specific statistics, effective k, comparative analysis, multiple binding sites
miRNA target selection: surprise, rank based, p-values, E-values, user guidance. p-values indicate not only biochemical possibility, but also biological function.
Acknowledgements: Peter Steffen, Robert Giegerich, Jan Krüger, Matthias Höchsmann, Alexander Stark, Julius Brennecke, Stephen M. Cohen, Sven Rahmann, Gregor Obernosterer, Robert Heinen, Leonie Ringrose
References Rehmsmeier M, Steffen P, Höchsmann M and Giegerich R. Fast and effective prediction of microRNA/target duplexes. RNA, 10: , bibiserv.techfak.uni-bielefeld.de/rnahybrid