RNA Helical Structures Prediction and Comparison
Liron Barzilay and Tom Susel
Project Advisor: Prof. Ruth Nussinov
Faculty of Life Sciences, Computer Science Department, Bar-Ilan University
Nussinov-Wolfson Structural Bioinformatics Group, Faculty of Exact Sciences, Tel Aviv University
Background: ncRNAs
DNA → mRNA → Protein: 3%, ~30K
DNA → ncRNA → ?: 97%, >60K
Bacteriophage MS2 Capsid
Hepatitis delta virus Ribozyme
E. coli Initiator tRNA
receptor 3 ectodomain complex with double-stranded RNA
Haloarcula marismortui 50S ribosomal subunit
Structure Importance 1: Function
Structure Importance 2: Drugs
Antibiotic Targeting:
RNA Helixes: Shallow groove, Deep groove
Mixture of RNA Helixes and phosphate groups
Background: Structuring
Main Idea: 2D → 3D ?
Background: Reduction to Helixes
Why Helixes?
>50% of all ncRNA nucleotides
Most conserved (even more than loops)
Usually small (<15 BP; commonly 2 BP)
1D ≈ 2D
Idea:
RCSB PDB: ~800 ncRNAs, ~14,500 Helixes, ~125,000 2-BP Helixes (Window 2).
1D → 3D
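One plausible reading of "Window 2" is a sliding window that cuts every helix into overlapping runs of 2 consecutive base pairs. The sketch below illustrates that reading only; the function name `helix_windows` and the convention that `strand_a[i]` pairs with `strand_b[i]` are assumptions, not the project's actual code.

```python
from typing import List, Tuple


def helix_windows(strand_a: str, strand_b: str, window: int) -> List[Tuple[str, str]]:
    """Return every run of `window` consecutive base pairs from one helix.

    Assumption: strand_a[i] pairs with strand_b[i], so both strands are
    listed in pairing order and have the same length.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must contain the same number of paired bases")
    if window < 1:
        raise ValueError("window must be at least 1")
    # A helix of n base pairs yields n - window + 1 overlapping windows.
    return [
        (strand_a[i:i + window], strand_b[i:i + window])
        for i in range(len(strand_a) - window + 1)
    ]


if __name__ == "__main__":
    for a, b in helix_windows("GCAU", "CGUA", 2):
        print(a, b)
```

Under this reading, a helix of n base pairs contributes n - 1 windows of size 2, and a helix shorter than the window contributes none.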
Analysis: Complexity
Modified Residues (Nucleosides): mRNAs: 4-Letter Language. ncRNAs: ~110-Letter Language.
Intra-Chain Direction Reverse: mRNAs: Standard 3’-5’/5’-3’. ncRNAs: ~3% Reversed 3’-5’/3’-5’.
Analysis: Complexity
Broken Linkage: mRNAs: Complete Phosphate Backbone. ncRNAs: ~80% Have One or More Phosphates Missing.
Non-WC BPs: mRNAs: Non-WC BPs Are Rare. ncRNAs: ~40% Non-WC BPs.
Analysis 1: Find Helixes
Manual selection of ncRNA (3D) → find_pairs (2D) → BP Manager (Only Valid Helixes) → Initial DB
Initial DB fields: Strand A, Strand B, Positions A, Positions B (1D, 2D)
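The four fields listed for the Initial DB suggest one record per helix. A minimal sketch of such a record follows; the class name `HelixRecord`, the field types, and the consistency check are hypothetical, and the slide does not say what the BP Manager actually treats as a valid helix.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HelixRecord:
    """One helix entry, mirroring the four fields named on the slide."""
    strand_a: str            # residue letters of the first strand (1D)
    strand_b: str            # residue letters of the paired strand (1D)
    positions_a: List[int]   # residue numbers of strand A in the source structure
    positions_b: List[int]   # residue numbers of strand B in the source structure

    def is_consistent(self) -> bool:
        """Placeholder check: every base in A has one partner in B and one
        position on each strand. This is an assumed criterion only."""
        n = len(self.strand_a)
        return (
            n > 0
            and len(self.strand_b) == n
            and len(self.positions_a) == n
            and len(self.positions_b) == n
        )


if __name__ == "__main__":
    helix = HelixRecord("GC", "GC", [1, 2], [20, 19])
    print(helix.is_consistent())
```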
Analysis 2: Clustering
Input: Strand A, Strand B, Positions A, Positions B
Distance Matrix Engine: All Against All RMSD
Matrix sizes: Window 2: 9240 x 9240. Window 3: 6848 x 6848. Window 6: 2273 x 2273.
Clustering: CLUTO
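The all-against-all step can be pictured as filling a symmetric matrix with pairwise RMSD values. The sketch below is an illustration under a strong assumption: the coordinate lists are already superposed and contain the same atoms in the same order, so no optimal-superposition step is performed. The names `rmsd` and `distance_matrix` are hypothetical and do not describe the actual DM Engine.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD between two equally sized atom lists.

    Assumption: the two structures are already superposed; this function
    does not search for the best rotation or translation.
    """
    if len(a) != len(b) or not a:
        raise ValueError("atom lists must be non-empty and of equal length")
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def distance_matrix(structures: Sequence[Sequence[Point]]) -> List[List[float]]:
    """Fill a symmetric N x N matrix, computing each pair only once."""
    n = len(structures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = rmsd(structures[i], structures[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    demo = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
    ]
    print(distance_matrix(demo))
```

Because the matrix is symmetric with a zero diagonal, N structures need N(N-1)/2 comparisons; for the 9240 x 9240 case that is 42,684,180 RMSD evaluations.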
Analyzing Results
Input: Strand A, Strand B, Clusters DB
Finding the Best Fitted Cluster: Levenshtein Distance (Edit Distance)
Finding the Representative Helix: Inner Cluster RMSD (for closest members)
Output: Helix and Cluster Information
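The best-fit step can be sketched with a standard Levenshtein (edit) distance over the strand sequences. The slide does not say how the two strands or multiple cluster members are combined, so the choices below (sum the distances of both strands, then take the minimum over all members) are assumptions, as are the names `levenshtein` and `best_cluster` and the dictionary layout of the clusters DB.

```python
from typing import Dict, List, Tuple


def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(t) + 1))
    for i, sc in enumerate(s, start=1):
        curr = [i]
        for j, tc in enumerate(t, start=1):
            cost = 0 if sc == tc else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def best_cluster(
    strand_a: str,
    strand_b: str,
    clusters: Dict[str, List[Tuple[str, str]]],
) -> Tuple[str, int]:
    """Return (cluster id, score) for the closest member over all clusters.

    Assumed scoring: edit distance of strand A plus edit distance of strand B.
    """
    best_id, best_score = None, None
    for cluster_id, members in clusters.items():
        for member_a, member_b in members:
            score = levenshtein(strand_a, member_a) + levenshtein(strand_b, member_b)
            if best_score is None or score < best_score:
                best_id, best_score = cluster_id, score
    if best_id is None:
        raise ValueError("clusters DB is empty")
    return best_id, best_score


if __name__ == "__main__":
    db = {"c1": [("GC", "GC"), ("GG", "CC")], "c2": [("AU", "AU")]}
    print(best_cluster("GG", "CU", db))
```

Once a cluster is chosen, the representative helix would then be picked among its closest members by RMSD, as the slide states; that step is not shown here.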
DEMO!
Some Statistics
Code:
C++: ~2200 lines.
Perl: ~1200 lines.
HTML/JavaScript: ~200 lines.
Memory and Multi-Processing Optimized.
Includes wrappers for 3 external tools.
Running Time:
Started with 1.2 GB of ncRNA PDBs.
BP Manager: 2 hours. Output: 500 MB of Helix-Only PDBs.
DM Engine: 8 h x 5 = 40 hours.
Perl Scripts: 30 min x 5 = 2.5 hours.
Output: 305 MB of small PDBs + 2.2 GB of Matrices.
Requirements:
C++ code is for Ubuntu (Multi-Processing).
Perl is OS-Independent.
DM Engine and the associated Perl scripts are very demanding. For window 2 we needed a Quad Core @ 3.0 GHz with 4 GB of memory to complete the run without exhausting memory.