Group Meeting, July 10, 2009, RNA-Informatics, University of Georgia (slides are arranged in the order they were received)

1 1 Group Meeting July 10, 2009 RNA-Informatics University of Georgia ( Slides are arranged according to the order they are received)

2 Presentation by Anuj Srivastava 07/10/09 Evolution of RNA Secondary Structure Guided by Dr. Russell L. Malmberg

3 3 Workflow: data collection from tmRNA db and RNase P db; PCA and correlation on alignment data; stem statistics generation by RNApasta; corresponding rRNA data collection for the above data from RDP; rRNA tree creation by MrBayes; generation of ancestral sequences at nodes by DNAPARS; collection of variable structures and analysis of the variability; mapping. Pasta arc diagram showing stem variability and conservation for tmRNA (legend: entirely conserved, partially conserved, highly variable).

4 4 Matrix Creation GUCUUU GAAGAC GUCUUC GAAGAC GUCUGU GCAGAC ACUCGU GCGGGU ------ UUUGUU AACAAA ------

5 5 Matrix Creation: Matrix of Variability Among Closely Related Species for All 3 RNAs Under Study. Each entry is log2((frequency of the dinucleotide change) / (product of the frequencies of the individual dinucleotides involved in the change)). For example, the value for the AC->AG change is log2(Freq(AC->AG) / (Freq(AC) * Freq(AG))).
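The log-odds entry defined on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `change_log_odds`, the input format (a list of observed changes and a list of observed dinucleotides), and the choice to normalize each frequency by its own total are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

def change_log_odds(changes, dinucleotides):
    """Return {(src, dst): log2(Freq(src->dst) / (Freq(src) * Freq(dst)))}.

    changes: list of (src, dst) dinucleotide changes, one per observation.
    dinucleotides: list of dinucleotides observed in the data set.
    Both inputs and the normalization are assumptions made for this sketch.
    """
    change_counts = Counter(changes)
    di_counts = Counter(dinucleotides)
    total_changes = sum(change_counts.values())
    total_di = sum(di_counts.values())

    scores = {}
    for (src, dst), count in change_counts.items():
        f_change = count / total_changes
        f_src = di_counts[src] / total_di
        f_dst = di_counts[dst] / total_di
        # Skip pairs whose background frequency is unknown (would divide by zero).
        if f_src > 0 and f_dst > 0:
            scores[(src, dst)] = math.log2(f_change / (f_src * f_dst))
    return scores

# Hypothetical toy data, not taken from the tmRNA or RNase P alignments.
toy_changes = [("AC", "AG"), ("AC", "AG"), ("GU", "GC"), ("GU", "GC")]
toy_dinucleotides = ["AC", "AG", "GU", "GC"]
toy_scores = change_log_odds(toy_changes, toy_dinucleotides)
```

A positive entry means the change is seen more often than the background frequencies of its two dinucleotides would suggest; a negative entry means it is seen less often.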

6 6 Matrix Creation: Matrix of Variability With Respect to Ancestral Sequences for All 3 RNAs Under Study.

7 7 Improvement of RNATOPS – handling “ missing stems ” Zhibin Huang

8 8 Background – Holmes 2004 – TKF91 Structure Tree model: point substitutions of nucleotides; insertions and deletions of bases, base pairs, whole stems, and multi-stem structures.

9 9 Background – Diana L. Kolbe and Sean R. Eddy – Local RNA structure alignment with incomplete sequence – trCYK algorithm

10 10 Goal of this project – RNA structure evolution model – RNA structure search across species

11 11 Methods to be used – Tree decomposition-based dynamic programming; – Bayesian method in evolution model;

12 12 Progress I made

13 13 1. About RNATOPS 1) Test on the real data from Holmes’ paper 2) Working on incorporating non-filter (whole structure) search and sub-structure filter search module into new version of RNATOPS.

14 14 2. Stem detection in some of the long loop regions 1) Based on the tool of “murlet” 2) Test result

15 15 Structural RNA Entropy Computation Yingfeng Wang

16 16 Goal : distinguish ncRNA sequences Method : calculate entropy with SCFG

17 17 Current Performance

18 18 A discovery on tRNA: regular distribution

19 19 A discovery on tRNA (cont.): abnormal distribution

20 20 Ab initio Prediction of Protein 3D Structure via Tree Decomposition Joseph Robertson

21 Cartoon Model of a Polypeptide (PDB ID: 1A1X; cores labeled 0–8). This type of model has three elements: alpha helices, beta strands, and loops. Helices and strands are the so-called cores.

22 Graph Tree Decomposition Representation. Graph tree-decomposition of 1A1X:
5
0 0 4 3 5 4 0
1 0 3 3 5 8
3 1 4 5 6 7 8
2 0 3 2 3 0
4 2 3 2 0 1
Where 5 is the number of tree bags, and the record for each bag consists of bag number, parent bag, bag size, and bag contents. Interactions: 0 – 1, 2, 4, 5; 1 – 2; 2 – 3; 3 – 8; 4 – 5; 5 – 6, 7; 6 – 7, 8; 7 – 8.
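As a quick check of the record format, the sketch below loads the five bags and the interaction list for 1A1X and tests the three standard tree-decomposition conditions: every vertex is in some bag, every edge is inside some bag, and the bags holding any one vertex form a connected subtree. The function name `check_tree_decomposition` is hypothetical, and reading the flattened entry "4 – 5 – 6, 7" as the edges 4–5, 5–6, and 5–7 is an assumption.

```python
def check_tree_decomposition(bags, parent, edges):
    """Return (is_valid, width) for a rooted tree decomposition.

    bags:   {bag_id: set of graph vertices}
    parent: {bag_id: parent bag_id}; the root is its own parent
    edges:  iterable of (u, v) graph edges
    """
    vertices = {v for edge in edges for v in edge}
    covered = set().union(*bags.values())

    # Condition 1: every graph vertex appears in at least one bag.
    if not vertices <= covered:
        return False, None

    # Condition 2: both endpoints of every edge share at least one bag.
    for u, v in edges:
        if not any(u in content and v in content for content in bags.values()):
            return False, None

    # Condition 3: the bags holding a vertex form a connected subtree.
    # In a rooted tree, a node set is connected exactly when one of its
    # members has no parent inside the set.
    for v in covered:
        holding = {b for b, content in bags.items() if v in content}
        tops = [b for b in holding if parent[b] == b or parent[b] not in holding]
        if len(tops) != 1:
            return False, None

    width = max(len(content) for content in bags.values()) - 1
    return True, width

# Bag records from the slide: bag number, parent bag, bag size, bag contents.
bags_1a1x = {0: {3, 5, 4, 0}, 1: {3, 5, 8}, 3: {5, 6, 7, 8}, 2: {2, 3, 0}, 4: {2, 0, 1}}
parent_1a1x = {0: 0, 1: 0, 3: 1, 2: 0, 4: 2}
# Interactions from the slide ("4 – 5 – 6, 7" read as 4-5, 5-6, 5-7: an assumption).
edges_1a1x = [(0, 1), (0, 2), (0, 4), (0, 5), (1, 2), (2, 3), (3, 8),
              (4, 5), (5, 6), (5, 7), (6, 7), (6, 8), (7, 8)]

result_1a1x = check_tree_decomposition(bags_1a1x, parent_1a1x, edges_1a1x)
```

Under that reading, all 13 interactions fall inside a bag and the largest bag has four cores, so the decomposition has width 3.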

23 Establishing Topological Constraints: The plane defined by the intersection of adjacent balls can represent a topological division. A core that interacts with different, unshared cores in the adjoining bags is embedded in that plane. Crossing interactions are propagated through the bags on the path between the originating bags containing the cores involved. The interactions among cores are used to constrain the structure on the basis of local topology. Initial constraints are established based on the interactions within bags, and then bags are joined, establishing the topology among successively more distant elements. Interactions that are neither nested nor parallel with respect to other interactions, i.e. crossing interactions, establish further constraints on the arrangement of local topological elements.

24 Refining Topology and Geometry: Cores are then expanded, with a sphere to represent each interaction, which helps establish relative orientation, e.g. N vs. C termini or the sides of a strand. This is done for both shared and local cores.

25 Example: 1A1X [figure: cores 0–8 grouped into tree bags {3, 5, 4, 0}, {0, 2, 1}, {5, 6, 7, 8}, {3, 5, 8}, {0, 3, 2}]

26 Example: 1A1X [figure: node labels for cores 0–8 and bags 0–4]

27 27 Scoring RNA Structural Multiple Alignment Sal LaMarca

28 28 Scoring multiple alignment. Goal 1: find a general scoring matrix for multiple alignment of RNA sequences. Goal 2: find a trend between scoring and gap penalties for multiple alignment. Main methods used – Formulate goal 1 as an optimality problem: find a scoring matrix that yields the highest scores on many different sets of “SEED” multiple alignments of RNA sequences from the RFAM database. – Search for an optimal scoring matrix via an evolutionary algorithm.

29 29 Evolutionary algorithm to search for an optimum scoring matrix: (1) Gather a large set of SEED alignments from the same RNA family, called k. (2) Gather stats on the alignments in k. (3) Randomly initialize a set of scoring matrices S, but seed a few with log-odds scores or other scoring matrices such as RIBOSUM. (4) Evaluate the fitness of each scoring matrix q in S by using q to find the average score of the alignments in k. (5) Perform crossover and mutation, using a system of credits to enforce boundary conditions on the entries of a scoring matrix; this produces the next generation, and step 4 repeats until the termination criteria are satisfied. (6) Return the scoring matrix that has the highest fitness.
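The loop on this slide can be sketched as follows. This is a toy sketch under stated assumptions rather than the presenter's implementation: fitness is taken to be the average sum-of-pairs score, the gap penalty is fixed, sequences use only A, C, G, U and "-", and a simple clip to [-bound, bound] stands in for the slide's "system of credits". All function names and parameter values are hypothetical, and no seeding with log-odds or RIBOSUM matrices is done.

```python
import random
from itertools import combinations

ALPHABET = "ACGU"
# The 10 unordered nucleotide pairs that index a symmetric scoring matrix.
PAIRS = [(a, b) for i, a in enumerate(ALPHABET) for b in ALPHABET[i:]]

def sum_of_pairs(alignment, matrix, gap_penalty=-2.0):
    """Sum-of-pairs score of one alignment (a list of equal-length rows)."""
    total = 0.0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs contribute nothing
            if x == "-" or y == "-":
                total += gap_penalty
            else:
                total += matrix[tuple(sorted((x, y)))]
    return total

def fitness(matrix, alignments):
    """Average score of the alignments under this scoring matrix."""
    return sum(sum_of_pairs(a, matrix) for a in alignments) / len(alignments)

def evolve(alignments, pop_size=20, generations=30, bound=5.0, seed=0):
    """Return the fittest scoring matrix found (entries kept in [-bound, bound])."""
    rng = random.Random(seed)
    population = [{p: rng.uniform(-bound, bound) for p in PAIRS}
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, alignments), reverse=True)
        parents = population[: pop_size // 2]  # the fitter half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each entry comes from one of the two parents.
            child = {p: rng.choice((a[p], b[p])) for p in PAIRS}
            # Mutation: small Gaussian step, clipped to the allowed range
            # (the clip is an assumed stand-in for the credit system).
            for p in PAIRS:
                if rng.random() < 0.1:
                    child[p] = max(-bound, min(bound, child[p] + rng.gauss(0.0, 0.5)))
            children.append(child)
        population = parents + children
    return max(population, key=lambda m: fitness(m, alignments))

# Hypothetical toy "family" with a single three-row alignment.
toy_family = [["GUCUUU", "GUCUUC", "GUCUGU"]]
best_matrix = evolve(toy_family, seed=1)
```

Without some bound on the entries, maximizing the average score has no finite optimum, which is presumably why the slide calls for boundary conditions on the matrix.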

30 30 Progress and Future Work. Progress: formulated the problem as an optimality problem; came up with a model to search for an optimal scoring matrix based on alignments that can be gathered from RFAM. Future Work: pick a family of RNA to start with and write a program to gather SEED alignments from RFAM; choose which stats are relevant to gather; implement the EA; evaluate the scoring matrix found by the EA by using it in other multiple alignment algorithms.

31 31 FPT for Non-MSO Expressible problems (Case Study: Subgraph Isomorphism) Pooya Shareghi

32 32 FPT for Non-MSO Expressible (Case Study: Subgraph Isomorphism) The reviewer mentioned that our result is a corollary of a well-known fact about graph homomorphism. The following is from Grohe 2007, "The Complexity of Homomorphism and Constraint Satisfaction Problems Seen from the Other Side": In graph homomorphism, the mapping does not have to be either injective or surjective. Subgraph isomorphism is a special case of the graph homomorphism problem. Hell & Nešetřil 1990: – HOM(-,D) is in P if all graphs in D are bipartite. – Otherwise, the problem is NP-complete. Friday, July 10, 2009
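The difference between a plain homomorphism and an injective one can be seen with a brute-force search on tiny graphs. The sketch below is exponential and purely illustrative; the function name `find_homomorphism` and the example graphs are hypothetical. An injective homomorphism from G to H corresponds to G being isomorphic to a (not necessarily induced) subgraph of H.

```python
from itertools import product

def find_homomorphism(g_vertices, g_edges, h_vertices, h_edges, injective=False):
    """Return an edge-preserving map from G to H as a dict, or None."""
    # Undirected edges of H, stored in both directions for easy lookup.
    h_adj = {(u, v) for u, v in h_edges} | {(v, u) for u, v in h_edges}
    for image in product(h_vertices, repeat=len(g_vertices)):
        if injective and len(set(image)) < len(image):
            continue  # two vertices of G share an image
        mapping = dict(zip(g_vertices, image))
        if all((mapping[u], mapping[v]) in h_adj for u, v in g_edges):
            return mapping
    return None

# A path on three vertices, a triangle, and a single edge.
path = ([0, 1, 2], [(0, 1), (1, 2)])
triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
single_edge = (["a", "b"], [("a", "b")])

path_to_edge = find_homomorphism(*path, *single_edge)
path_into_edge = find_homomorphism(*path, *single_edge, injective=True)
triangle_to_edge = find_homomorphism(*triangle, *single_edge)
```

The path folds onto the single edge (it is bipartite) but cannot be embedded injectively in it, and the triangle has no homomorphism to the edge at all.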

33 33 Related Works Chekuri et al. 1997, also Freuder 1990: t-HOM(C, -) is in P if C is a class of graphs of bounded treewidth. Dalmau et al. 2002: t-HOM(C, -) is in P if the class C has bounded treewidth modulo homomorphic equivalence. Grohe points out that there is an -time algorithm for solving the existential (w+1)-pebble game that arises in the proof of Dalmau's theorem.

34 34 Discussion We studied an O(k^{t+1} |H|)-time algorithm for k,t-SI(C, -), where C is a class of graphs with bounded treewidth and the mapwidth is k. Dalmau's complexity result does not reflect the effect of mapwidth. 0≤k≤|G|; therefore, if we ignore the parameter k, our algorithm runs in -time, whereas Grohe reports an -time algorithm. It would be interesting to investigate the source of this discrepancy as well.

35 35 miRNA Finding with SCFG and Other Means Tim Shaw

36 Current Status: Currently calculating the mean and average CYK and free energy values for human. Doing a literature search on microRNA hunting techniques (preparing for paper). Dr. Cai proposed trying out an inside algorithm on CYK. Have some results on viruses. Early planning for prokaryote ncRNA gene searching.

37 List of Available Software Programs. SrnaLoop: uses BLAST, filters out canonical sequences using energy and repeat masking, and considers only intragenic regions. Erpin: secondary miRNA structure profiling; uses dynamic programming to align the loop. (Similar to what MiR-abela and MiPred: ab initio SVM and random forest.

38 List of Available Software Programs. MiRScan (I and II): comparative approach; identifies candidate miRNAs using 7 components: extension of base pairing, miRNA base pairing, 3' conservation, 5' conservation, initial pentamer, distance from loop, bulge symmetry.

39 List of Available Software Programs. MiRSeeker: identify conserved regions; identify and rank stem loops (free energy); evaluate pattern of divergence. RNAz and EvoFold: evolution and conservation. VirMir: search for viral miRNAs using RNAfold.

40 Tim's Research Update. RNApasta: fixed some bugs within the system; currently working on pseudoknot removal.

41 41 Ying Zheng 2009-07-09

42 42 Finished Work. Coding: I finished the code in C++ and will test the speed of the new code. Literature review: this week I read two papers, ‘Computational Identification of Drosophila MicroRNA Genes’ and ‘A Critical Assessment of the Performance of Homology Search Methods on Noncoding RNA’, and learned how miRseeker identifies Drosophila miRNAs.

43 43 Next Work. Read more papers about miRNA identification software and summarize them. Finish the ‘inside’ algorithm to compute the total probability. Use the Z-scores we computed to find the Z-threshold. Tim and I will discuss assigning the next work.
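For reference, the inside algorithm sums the probabilities of all parses of a sequence, where CYK keeps only the best one. Below is a minimal Python sketch for a grammar in Chomsky normal form; the slides mention a C++ implementation, which this does not reproduce. The function name `inside_probability`, the rule encoding, and the toy grammar are assumptions made for illustration.

```python
from collections import defaultdict

def inside_probability(seq, unary, binary, start="S"):
    """Total probability that `start` derives `seq` under a CNF SCFG.

    unary:  {(A, terminal): prob} for rules A -> terminal
    binary: {(A, B, C): prob}     for rules A -> B C
    """
    n = len(seq)
    if n == 0:
        return 0.0
    # inside[i][j][A] = probability that A derives seq[i..j] (inclusive).
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]

    # Spans of length 1 come from the terminal rules.
    for i, symbol in enumerate(seq):
        for (lhs, terminal), prob in unary.items():
            if terminal == symbol:
                inside[i][i][lhs] += prob

    # Longer spans: sum over every rule and every split point.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (lhs, left, right), prob in binary.items():
                for k in range(i, j):
                    left_p = inside[i][k].get(left, 0.0)
                    right_p = inside[k + 1][j].get(right, 0.0)
                    inside[i][j][lhs] += prob * left_p * right_p
    return inside[0][n - 1].get(start, 0.0)

# Hypothetical toy grammar: S -> S S (0.3) | 'a' (0.7).
toy_unary = {("S", "a"): 0.7}
toy_binary = {("S", "S", "S"): 0.3}
p_aaa = inside_probability("aaa", toy_unary, toy_binary)
```

For "aaa" the toy grammar has two parse trees, each with probability 0.3² × 0.7³, and the inside value is their sum. An RNA grammar with base-pair rules is not in Chomsky normal form, so it would need either a conversion or a recursion written for its own rule shapes.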

44 44 Algorithms for Base Pair Entropy Detection Amir Manzour

45 45 Detecting Various mRNAs and tRNAs Based on Primary Structure, via characteristics of nucleotide arrangements and base-pairing probabilities such as sequence entropy. 1. A space-adaptive generalized algorithm is needed for scoring different base-pairing probabilities. 2. Functions such as entropy can be used to score a sequence based on calculated base-pairing probabilities.
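One common way to turn base-pairing probabilities into an entropy score is the per-position Shannon entropy over "paired with j" and "unpaired", averaged along the sequence. The sketch below assumes that definition; the slides do not say which one the presenter uses. It also assumes a symmetric matrix `pair_probs` in which entry [i][j] is the probability that positions i and j pair. Both function names are hypothetical.

```python
import math

def positional_entropies(pair_probs):
    """Shannon entropy (in bits) of the pairing distribution at each position."""
    n = len(pair_probs)
    entropies = []
    for i in range(n):
        paired = [pair_probs[i][j] for j in range(n) if j != i]
        # Whatever probability mass is left means position i is unpaired.
        unpaired = max(0.0, 1.0 - sum(paired))
        h = 0.0
        for p in paired + [unpaired]:
            if p > 0.0:
                h -= p * math.log2(p)
        entropies.append(h)
    return entropies

def sequence_entropy_score(pair_probs):
    """Average positional entropy, used here as a single score per sequence."""
    entropies = positional_entropies(pair_probs)
    return sum(entropies) / len(entropies)

# Hypothetical two-position examples: a certain pair and a 50/50 pair.
certain = [[0.0, 1.0], [1.0, 0.0]]
coin_flip = [[0.0, 0.5], [0.5, 0.0]]
score_certain = sequence_entropy_score(certain)
score_coin_flip = sequence_entropy_score(coin_flip)
```

A well-defined structure gives scores near zero, while positions with many competing partners push the score up.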

46 46 Flexible Inside-Outside Algorithm: formulation of base-pairing probabilities and the inside and outside algorithms, adaptive to an arbitrary given set of SCFG rules.

47 47

