1
Lecture 7. Topics in RNA Bioinformatics (Identification of RNA Structures)
The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology
2
Lecture outline
- Sequence-based prediction methods
- RNA footprinting and high-throughput methods
Last update: 17-Oct-2015 CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015
3
Part 1: Sequence-based Prediction Methods
4
RNA structures
- Some RNAs have strong structural features that are closely related to their functions
- Examples: tRNA, snoRNA, rRNA
5
Secondary vs. tertiary structure
Four levels of molecular structure:
- Primary: the sequence
- Secondary: local interactions
- Tertiary: global interactions
- Quaternary: inter-molecule interactions
Both secondary and tertiary RNA structures are meaningful
- However, more work has been devoted to identifying/predicting RNA secondary structures
- Secondary structures are also the focus of this lecture
Last update: 20-Oct-2015
6
Methods for predicting RNA structures
Wikipedia contains a comprehensive list
Main classes:
- Models specific to a particular type of RNA
- Based on a single sequence
  - Minimum free energy (MFE)
  - Partition function
- Based on comparison of multiple sequences
7
Type-specific models
Example: tRNAscan-SE for finding tRNAs and predicting tRNA structures
Three main phases:
- Running tRNAscan and the Pavesi algorithm to find candidate tRNAs
- Using a covariance model to identify the more confident candidates
- Trimming the candidates and predicting the detailed secondary structures
8
Workflow of tRNAscan-SE
Figure: workflow diagram (Step 1, Step 2). Image credit: Lowe and Eddy, Nucleic Acids Research 25(5): , (1997)
9
1a. tRNAscan
Features used:
- Invariant and semi-invariant bases
- Potential base-pairing structures consistent with the cloverleaf secondary structure
  - The aminoacyl arm, the D arm, the anticodon arm and the TΨC arm
- Length and position of potential intron sequences
Image credit: Fichant and Burks, Journal of Molecular Biology 220(3): , (1991)
10
1a. tRNAscan: sequential tests
Image credit: Fichant and Burks, Journal of Molecular Biology 220(3): , (1991)
11
1b. The Pavesi algorithm
Frequency tables based on 231 nuclear tRNA genes
Image credit: Pavesi et al., Nucleic Acids Research 22(7): , (1994)
12
1b. The Pavesi algorithm: workflow
Image credit: Pavesi et al., Nucleic Acids Research 22(7): , (1994)
13
2. Chomsky hierarchy of languages
14
2. Context-free grammar
Use of context-free grammars in RNA secondary structure representation: to capture pairing relationships.
Example:
Productions P = { S0 → S1, S1 → CS2G, S1 → AS2U, S2 → AS3U, S3 → S4S9, S4 → US5A, S5 → CS6G, S6 → AS7, S7 → US7, S7 → GS8, S8 → G, S8 → U, S9 → AS10U, S10 → CS10G, S10 → GS11C, S11 → AS12U, S12 → US13, S13 → C }
One possible derivation:
S0 ⇒ S1 ⇒ CS2G ⇒ CAS3UG ⇒ CAS4S9UG ⇒ CAUS5AS9UG ⇒ CAUCS6GAS9UG ⇒ CAUCAS7GAS9UG ⇒ CAUCAGS8GAS9UG ⇒ CAUCAGGGAS9UG ⇒ CAUCAGGGAAS10UUG ⇒ CAUCAGGGAAGS11CUUG ⇒ CAUCAGGGAAGAS12UCUUG ⇒ CAUCAGGGAAGAUS13UCUUG ⇒ CAUCAGGGAAGAUCUCUUG
Example credit: Sakakibara et al., CPM , (1994)
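The derivation on this slide can be replayed mechanically. The sketch below encodes the productions as a Python dictionary and applies them to the leftmost nonterminal at each step. It also recovers a dot-bracket structure under one stated assumption: that every rule of the form "base, nonterminal, base" emits a base pair. The names `PRODUCTIONS`, `CHOICES`, `leftmost_derivation` and `expand` are illustrative and do not come from the lecture.

```python
# Productions from the slide. A/C/G/U are terminals, S0..S13 are nonterminals.
PRODUCTIONS = {
    "S0": [["S1"]],
    "S1": [["C", "S2", "G"], ["A", "S2", "U"]],
    "S2": [["A", "S3", "U"]],
    "S3": [["S4", "S9"]],
    "S4": [["U", "S5", "A"]],
    "S5": [["C", "S6", "G"]],
    "S6": [["A", "S7"]],
    "S7": [["U", "S7"], ["G", "S8"]],
    "S8": [["G"], ["U"]],
    "S9": [["A", "S10", "U"]],
    "S10": [["C", "S10", "G"], ["G", "S11", "C"]],
    "S11": [["A", "S12", "U"]],
    "S12": [["U", "S13"]],
    "S13": [["C"]],
}

# Index of the rule picked at each step of the slide's derivation.
CHOICES = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0]


def leftmost_derivation(choices, start="S0"):
    """Rewrite the leftmost nonterminal once per choice; return all sentential forms."""
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        idx = next(i for i, sym in enumerate(form) if sym in PRODUCTIONS)
        form[idx:idx + 1] = PRODUCTIONS[form[idx]][choice]
        steps.append("".join(form))
    return steps


def expand(symbol, choices):
    """Return (sequence, dot-bracket) for the subtree rooted at `symbol`.

    `choices` is an iterator over rule indices, consumed in leftmost order.
    """
    if symbol not in PRODUCTIONS:
        return symbol, "."
    rhs = PRODUCTIONS[symbol][next(choices)]
    # Assumption: a rule "terminal nonterminal terminal" emits a base pair.
    is_pair = (len(rhs) == 3 and rhs[0] not in PRODUCTIONS
               and rhs[1] in PRODUCTIONS and rhs[2] not in PRODUCTIONS)
    sequence, structure = "", ""
    for pos, sym in enumerate(rhs):
        sub_seq, sub_struct = expand(sym, choices)
        if is_pair and pos == 0:
            sub_struct = "("
        elif is_pair and pos == 2:
            sub_struct = ")"
        sequence += sub_seq
        structure += sub_struct
    return sequence, structure


steps = leftmost_derivation(CHOICES)
sequence, structure = expand("S0", iter(CHOICES))
print(" => ".join(steps))
print(sequence)
print(structure)
```

The nested brackets show how the grammar's paired emissions translate into two hairpins enclosed by an outer stem, which is the relationship a parse tree makes explicit.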
15
2. Parse tree and RNA structure
Figure credit: Sakakibara et al., The 5th Annual Symposium on Combinatorial Pattern Matching , (1994)
16
2. SCFG and CM
- Stochastic context-free grammar (SCFG): a context-free grammar with probabilistic derivation
- Covariance model (CM): a model for representing RNA sequence and structure profiles, based on SCFG
Last update: 20-Oct-2015
17
2. SCFG and CM
Input: multiple sequence alignment and consensus structure
Construction of guide tree from the consensus structure, with node types:
Node   Description
MATP   Pair
MATL   Single strand, left
MATR   Single strand, right
BIF    Bifurcation
ROOT   Root
BEGL   Begin, left
BEGR   Begin, right
END    End
Output CM:
- White state: consensus
- Gray state: indels
Image credit: INFERNAL user’s guide
18
Based on a single sequence
Minimum free energy (MFE): finding the RNA structure (pairing of bases) that minimizes the free energy
- More pairing
- More stable pairing
  - Strong GC pairing
  - Stable structures such as stacking pairs
19
Turner’s free energy model
The total free energy of an RNA structure is taken to be the sum of the free energies of its sub-structures
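Written as a formula, the additivity assumption looks as follows. The notation is an assumption made here for illustration: loops(S) stands for the set of sub-structures of S (stacked pairs, hairpin loops, bulges, internal loops and multi-branch loops), each with its own tabulated energy.

```latex
\Delta G(S) \;=\; \sum_{L \,\in\, \mathrm{loops}(S)} \Delta G(L)
```

Because each term depends only on one sub-structure, the minimum over all structures can be found by dynamic programming rather than by enumeration.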
20
Turner’s energy parameters
For hairpin loops: Table credit: Mathews et al., Journal of Molecular Biology 288(5): , (1999)
21
Energy minimization
Dynamic programming (without pseudoknots)
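To illustrate the dynamic-programming idea, here is a minimal sketch that swaps in a much simpler objective: Nussinov-style base-pair maximization instead of the Turner energy model. It is not the recursion used by energy-based folding programs, but it has the same structure, with optimal solutions for sub-sequences combined into solutions for longer ones and no pseudoknots allowed. The function name `nussinov`, the `min_loop` parameter and the allowed pair set are assumptions made for this example.

```python
def nussinov(seq, min_loop=3):
    """Maximize the number of nested base pairs in `seq`.

    Returns (pair_count, dot_bracket). `min_loop` is the minimum number of
    unpaired bases enclosed by any pair.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0, ""

    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


score, structure = nussinov("GGGAAAUCC")
print(score, structure)
```

The table has O(n^2) entries and each takes O(n) time to fill, so the sketch runs in O(n^3) time. Energy-based methods keep this overall shape but replace the "+1 per pair" score with loop energies.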
22
Partition function
Issues of MFE:
- The MFE solution may not be optimal
- There can be different structures with similar free energy
The partition function sums over the relative likelihood of all possible secondary structures:
$Q = \sum_{s \in S} e^{-\Delta G(s) / RT}$
- S: possible secondary structures
- ΔG(s): Gibbs free energy change of structure s
- R: gas constant
- T: absolute temperature
Probability of a particular structure s:
$P(s) = e^{-\Delta G(s) / RT} / Q$
Mathews, RNA 10(8): , (2004)
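A small numerical sketch makes the point about near-degenerate structures concrete. It assumes the standard Boltzmann form, in which each structure is weighted by exp(-ΔG/RT) and the partition function is the sum of those weights. The three free energies below are hypothetical values chosen for illustration, not results for any real RNA, and explicit enumeration only works for toy cases; real tools compute the sum by dynamic programming.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal / (mol * K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Return (Q, probabilities) for a list of free energies in kcal/mol."""
    rt = GAS_CONSTANT * temperature
    weights = [math.exp(-g / rt) for g in energies]
    q = sum(weights)
    return q, [w / q for w in weights]


# Hypothetical free energies: the MFE structure, a close competitor,
# and the fully unpaired structure.
energies = [-3.0, -2.8, 0.0]
q, probs = boltzmann_probabilities(energies)
for g, p in zip(energies, probs):
    print(f"dG = {g:5.1f} kcal/mol  P = {p:.3f}")
```

With these numbers the second structure, only 0.2 kcal/mol above the minimum, carries a probability of the same order as the MFE structure, which is why reporting the MFE structure alone can be misleading.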
23
Based on multiple sequences
The conservation of a base and the co-conservation of a base pair in multiple sequences can help resolve ambiguous cases
- In fact, a CM can be trained from a multiple sequence alignment
Main types:
- Joint optimization
- Consensus/alignment of individual structures
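One common way to quantify co-conservation of two alignment columns is mutual information. The slides do not name this measure, so it is offered here only as an illustrative choice. The sketch below computes it for a toy alignment; the four sequences are invented for the example, and the function name `mutual_information` is an assumption.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment."""
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    n = len(alignment)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment (hypothetical): columns 0 and 4 always form a complementary
# pair although the bases themselves vary; columns 1 to 3 are fully conserved.
alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]
print(mutual_information(alignment, 0, 4))
print(mutual_information(alignment, 1, 2))
```

Columns that change together score high even though neither is conserved on its own, while fully conserved columns score zero, so the two signals mentioned on the slide are complementary.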
24
Part 2: RNA Footprinting and High-Throughput Methods
25
RNA footprinting
A traditional way to study RNA secondary structures
- Preferentially cleave or mark nucleotides with a particular structural property
Image credit: Novikova et al., International Journal of Molecular Sciences 14(12): , (2013)
26
Some probes that can be used
Probe                    Size (Dalton)   Structural preference
DMS (dimethyl sulfate)   126             Mark unpaired bases
1M7                      222             Mark unpaired bases
RNase V1                 15,900          Cleave paired bases
RNase ONE                27,000          Cleave unpaired bases
Nuclease S1              32,000          Cleave unpaired bases
Nuclease P1              36,000          Cleave unpaired bases
27
High-throughput RNA footprinting
Enzyme-based:
- After enzymatic treatment, sequence the resulting fragments to identify the cleavage sites, and thus the bases with the structural property
Chemical-probe-based:
- A chemical adduct can terminate reverse transcription
- The termination point can be identified by sequencing
28
Parallel Analysis of RNA Structure (PARS)
Image credit: Kertesz et al., Nature 467(7311): , (2010)
29
DMS-seq
Image credit: Rouskin et al., Nature 505(7485):701-705, (2014)
30
Potential confounding factors
General:
- Expression level of transcripts
  - Need control/comparison
- Sequence bias
- Issues in read alignment
  - Blind tail: fragments that are too short cannot be aligned correctly
- Experimental efficiency
31
Potential confounding factors
Method-specific:
- DMS modifies mainly adenines and cytosines
- Increasing read count towards the 3' end in DMS-seq
- Natural polymerase drop-off in chemical-probe-based methods
- Preference due to secondary vs. tertiary structure (e.g., steric hindrance in enzyme-based methods)
32
Data normalization
Normalization strategies:
- Transcript level: comparison using
  - Standard RNA-seq data
  - Control experiment (with some steps not carried out)
  - Data from two different enzymes (PARS)
- Increasing read count: smoothing by local window
- Polymerase drop-off: modeling it explicitly
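Two of these strategies are easy to sketch. For the two-enzyme comparison, the PARS score of a nucleotide is the log2 ratio of its RNase V1 (paired) read count to its S1 (unpaired) read count, so transcript abundance cancels out. For local smoothing, a simple moving average is used here. The pseudocount, the window size and the function names are assumptions made for this example, and the counts in the usage lines are invented.

```python
import math


def pars_scores(v1_counts, s1_counts, pseudocount=1.0):
    """Per-nucleotide log2(V1 / S1); positive values suggest a paired base.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return [
        math.log2((v + pseudocount) / (s + pseudocount))
        for v, s in zip(v1_counts, s1_counts)
    ]


def smooth(values, window=3):
    """Moving average over a centred window, truncated at the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical read counts for a four-nucleotide stretch.
v1 = [7, 0, 3, 15]
s1 = [1, 3, 3, 0]
scores = pars_scores(v1, s1)
print(scores)
print(smooth(scores))
```

Because both counts come from the same transcript, a highly expressed gene raises numerator and denominator together and the ratio is unaffected, which is the sense in which the second enzyme acts as a control.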
33
Example: Poisson linear model
Modeling local sequence bias
- μ_i: measured read count of nucleotide i
- ν: actual expression level of transcript
- b_ik: the k-th nucleotide within the length-K local sub-sequence around nucleotide i
- β_kh: bias coefficient
Li et al., Genome Biology 11(5):R50, (2010)
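A Poisson log-linear model built from these quantities can be written as below. This is a sketch under stated assumptions rather than the exact equation of the cited paper: the symbol names (μ_i for the read count rate at nucleotide i, ν for the transcript expression level, b_ik for the k-th nucleotide of the local sub-sequence, β_kh for the bias coefficient of nucleotide h at position k) and the use of an indicator function I(·) are chosen here, c_i denotes the observed count, and any intercept term is omitted.

```latex
c_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i \;=\; \nu \;+\; \sum_{k=1}^{K} \sum_{h} \beta_{kh}\, I(b_{ik} = h)
```

The expression level sets the baseline, and each position in the surrounding window shifts the expected count up or down depending on which nucleotide it holds.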
34
In vivo vs. in vitro data
Image credit: Rouskin et al., Nature 505(7485):701-705, (2014)
35
Using structure-probing data
- The high-throughput RNA footprinting (structure-probing) data tell only whether a base is paired, not which base it is paired with
- The data can be used to help RNA secondary structure prediction
36
Using structure-probing data
Several ways to use the data:
- Free energy penalty
- Pseudo free energy terms
- Discrepancy minimization
- Identifying closest structure centroid
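As an illustration of the pseudo free energy idea, the sketch below uses a log-linear form that is widely used for SHAPE-directed folding, in which the extra energy for a nucleotide is slope * ln(reactivity + 1) + intercept. The slides do not specify a formula, so this form, the default slope of 2.6 and intercept of -0.8 kcal/mol, and the function name are all assumptions of this example.

```python
import math


def pseudo_free_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo free energy (kcal/mol) added when a nucleotide is in a paired stack.

    Low reactivity gives a negative value (a bonus for pairing); high
    reactivity gives a positive value (a penalty for pairing).
    """
    return slope * math.log(reactivity + 1.0) + intercept


# Hypothetical normalized reactivities.
for reactivity in (0.0, 0.2, 1.0, 2.5):
    print(f"reactivity {reactivity:.1f} -> {pseudo_free_energy(reactivity):+.2f} kcal/mol")
```

These terms are added to the energy model before folding, so the probing data bias the dynamic program towards structures that agree with them instead of imposing hard constraints.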
37
Example: StructureFold
Overall workflow: Image credit: Tang et al., Bioinformatics 31(16): , (2015)
38
Example: RNApbfold
Minimizing discrepancies between predicted and measured probabilities of bases being unpaired, with:
- Perturbation of the energy parameter values
- Variance terms indicating uncertainty
Washietl et al., Nucleic Acids Research 40(10): , (2012)
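An objective with these ingredients can be sketched as follows. The symbols are assumptions chosen here, not necessarily the paper's notation: ε is the vector of perturbations to the energy parameters, p_i(ε) is the predicted probability that base i is unpaired under the perturbed model, q_i is the measured probability, and τ² and σ_i² are the variance terms for the perturbations and the measurements.

```latex
F(\boldsymbol{\varepsilon}) \;=\;
\sum_{\mu} \frac{\varepsilon_{\mu}^{2}}{\tau^{2}}
\;+\;
\sum_{i=1}^{n} \frac{\bigl(p_i(\boldsymbol{\varepsilon}) - q_i\bigr)^{2}}{\sigma_i^{2}}
```

The first sum keeps the perturbed energy model close to the original one, and the second rewards agreement with the probing data, so the variance terms control how much each source of information is trusted.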
39
Example: RNApbfold
Sample results:
Image credit: Washietl et al., Nucleic Acids Research 40(10): , (2012)
40
Summary
Sequence-based RNA secondary structure prediction
- For specific types of RNA
- Single sequence
  - Minimum free energy (MFE)
  - Partition function
- Multiple sequences
High-throughput RNA structure probing
- Modification of objective function
- Selection of appropriate structures