The Chinese University of Hong Kong

Name: The Chinese University of Hong Kong
Uploaded: 2017-07-14T04:32:21+00:00
Duration: PTM23S34
Channel: Joanna West
Description: The Chinese University of Hong Kong

Lecture 7. Topics in RNA Bioinformatics (Identification of RNA Structures)
The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology

Lecture outline
- Sequence-based prediction methods
- RNA footprinting and high-throughput methods
Last update: 17-Oct-2015 CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Fall 2015

Part 1: Sequence-based Prediction Methods

RNA structures
Some RNAs have strong structural features that are highly related to their functions
(Figure: example structures of tRNA, snoRNA and rRNA)

Secondary vs. tertiary structure
Four levels of molecular structures:
- Primary: The sequence
- Secondary: Local interactions
- Tertiary: Global interactions
- Quaternary: Inter-molecule interactions
Both secondary and tertiary RNA structures are meaningful
- However, more work has been devoted to identifying/predicting RNA secondary structures
- Secondary structures are also the focus of this lecture
Last update: 20-Oct-2015

Methods for predicting RNA structures
Wikipedia contains a comprehensive list of methods
Main classes:
- Models specific to a particular type of RNA
- Based on a single sequence
  - Minimum free energy (MFE)
  - Partition function
- Based on comparison of multiple sequences

Type-specific models
Example: tRNAscan-SE for finding tRNAs and predicting tRNA structures
Three main phases:
1. Running tRNAscan and the Pavesi algorithm to find candidate tRNAs
2. Using a covariance model to identify the more confident candidates
3. Trimming the candidates and predicting the detailed secondary structures

Workflow of tRNAscan-SE
(Figure: tRNAscan-SE workflow, with Step 1 and Step 2 marked)
Image credit: Lowe and Eddy, Nucleic Acids Research 25(5): , (1997)

1a. tRNAscan
Features used:
- Invariant and semi-invariant bases
- Potential base-pairing structures consistent with the cloverleaf secondary structure
  - The aminoacyl arm, the D arm, the anticodon arm and the TΨC arm
- Length and position of potential intron sequences
Image credit: Fichant and Burks, Journal of Molecular Biology 220(3): , (1991)

1a. tRNAscan: sequential tests
Image credit: Fichant and Burks, Journal of Molecular Biology 220(3): , (1991)

1b. The Pavesi algorithm
Frequency tables based on 231 nuclear tRNA genes
Image credit: Pavesi et al., Nucleic Acids Research 22(7): , (1994)

1b. The Pavesi algorithm: workflow
Image credit: Pavesi et al., Nucleic Acids Research 22(7): , (1994)

2. Chomsky hierarchy of languages

2. Context-free grammar
Use of context-free grammars in RNA secondary structure representation: to capture pairing relationships.
Example:
Productions P = {
  S0 → S1, S1 → CS2G, S1 → AS2U, S2 → AS3U, S3 → S4S9, S4 → US5A,
  S5 → CS6G, S6 → AS7, S7 → US7, S7 → GS8, S8 → G, S8 → U,
  S9 → AS10U, S10 → CS10G, S10 → GS11C, S11 → AS12U, S12 → US13, S13 → C
}
One possible derivation:
  S0 ⇒ S1 ⇒ CS2G ⇒ CAS3UG ⇒ CAS4S9UG ⇒ CAUS5AS9UG ⇒ CAUCS6GAS9UG
     ⇒ CAUCAS7GAS9UG ⇒ CAUCAGS8GAS9UG ⇒ CAUCAGGGAS9UG ⇒ CAUCAGGGAAS10UUG
     ⇒ CAUCAGGGAAGS11CUUG ⇒ CAUCAGGGAAGAS12UCUUG ⇒ CAUCAGGGAAGAUS13UCUUG
     ⇒ CAUCAGGGAAGAUCUCUUG
Example credit: Sakakibara et al., CPM , (1994)
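The derivation can be replayed mechanically. The following is a minimal sketch, assuming a leftmost derivation strategy; the `derive` helper and the `choices` argument are hypothetical names introduced only for illustration. It encodes the productions listed on the slide and expands the leftmost nonterminal at each step.

```python
# Productions from the slide. Each nonterminal maps to its alternative
# right-hand sides; a right-hand side is a list of symbols (nonterminals
# "S0".."S13" or terminal bases).
PRODUCTIONS = {
    "S0": [["S1"]],
    "S1": [["C", "S2", "G"], ["A", "S2", "U"]],
    "S2": [["A", "S3", "U"]],
    "S3": [["S4", "S9"]],
    "S4": [["U", "S5", "A"]],
    "S5": [["C", "S6", "G"]],
    "S6": [["A", "S7"]],
    "S7": [["U", "S7"], ["G", "S8"]],
    "S8": [["G"], ["U"]],
    "S9": [["A", "S10", "U"]],
    "S10": [["C", "S10", "G"], ["G", "S11", "C"]],
    "S11": [["A", "S12", "U"]],
    "S12": [["U", "S13"]],
    "S13": [["C"]],
}


def derive(choices, start="S0"):
    """Leftmost derivation (hypothetical helper).

    choices[nt] lists which alternative to use each time nonterminal nt is
    expanded; alternative 0 is used when no choice is given.
    """
    pending = {nt: list(idx) for nt, idx in choices.items()}
    form = [start]
    steps = ["".join(form)]
    while any(sym in PRODUCTIONS for sym in form):
        # Position of the leftmost nonterminal in the sentential form.
        pos = next(i for i, sym in enumerate(form) if sym in PRODUCTIONS)
        nt = form[pos]
        alt = pending[nt].pop(0) if pending.get(nt) else 0
        form[pos:pos + 1] = PRODUCTIONS[nt][alt]
        steps.append("".join(form))
    return steps


# Non-default choices that reproduce the slide's derivation:
# S7 -> G S8 and S10 -> G S11 C (alternative 1 in both cases).
steps = derive({"S7": [1], "S10": [1]})
print(" => ".join(steps))
```

The paired productions (for example S1 → CS2G) emit both bases of a base pair at once, which is how the grammar records the pairing relationships.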

2. Parse tree and RNA structure
Figure credit: Sakakibara et al., The 5th Annual Symposium on Combinatorial Pattern Matching , (1994)

2. SCFG and CM
- Stochastic context-free grammar (SCFG): context-free grammar with probabilistic derivation
- Covariance model (CM): model for representing RNA sequence and structure profiles based on SCFG

2. SCFG and CM
Input: multiple sequence alignment and consensus structure
Construction of guide tree from consensus structure:

Node   Description
MATP   Pair
MATL   Single strand, left
MATR   Single strand, right
BIF    Bifurcation
ROOT   Root
BEGL   Begin, left
BEGR   Begin, right
END    End

Output CM:
- White state: consensus
- Gray state: indels
Image credit: INFERNAL user’s guide

Based on a single sequence
Minimum free energy (MFE): Finding the RNA structure (pairing of bases) that minimizes the free energy
- More pairing
- More stable pairing
  - Strong GC pairing
  - Stable structures such as stacking pairs

Turner’s free energy model
The total free energy of an RNA structure is taken to be the sum of the free energies of its sub-structures

Turner’s energy parameters
For hairpin loops:
Table credit: Mathews et al., Journal of Molecular Biology 288(5): , (1999)

Energy minimization
Dynamic programming (without pseudoknots)
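The dynamic-programming idea can be illustrated with a simplified stand-in. The sketch below is a Nussinov-style recursion that maximizes the number of base pairs; it does not use Turner free energies, so it is not the energy-minimization algorithm itself. The function names and the `min_loop` parameter (minimum number of unpaired bases in a hairpin loop) are assumptions made for illustration. Pseudoknots are excluded because every pair (k, j) splits the problem into two independent sub-intervals.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair."""
    return {a, b} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    best[i][j] holds the optimum for the sub-sequence seq[i..j].
    """
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some k, leaving at least
            # min_loop unpaired bases between them.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(max_pairs("GGGAAAUCC"))
```

An energy-based version keeps the same interval decomposition but replaces the "+1 per pair" score with loop-dependent free energy terms and minimizes instead of maximizing.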

Partition function
Issues of MFE:
- Solution may not be optimal
- There can be different structures with similar free energy
The partition function sums over the relative likelihood of all possible secondary structures:
$$ Q = \sum_{s \in S} e^{-\Delta G(s) / RT} $$
- S: Possible secondary structures
- \Delta G(s): Gibbs free energy change of structure s
- R: Gas constant
- T: Absolute temperature
Probability of a particular structure s:
$$ P(s) = \frac{e^{-\Delta G(s) / RT}}{Q} $$
Mathews, RNA 10(8): , (2004)
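As a small numerical illustration of the Boltzmann weighting, the sketch below computes structure probabilities from an explicit list of free energies. This is only a toy: real tools compute the partition function by dynamic programming without enumerating structures. The energy values, the temperature of 310.15 K (37 °C) and the function name are assumptions chosen for the example.

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temperature=310.15):
    """Return (probabilities, Q) for free energies given in kcal/mol."""
    rt = R * temperature
    weights = [math.exp(-g / rt) for g in energies]
    q = sum(weights)  # partition function over the listed structures
    return [w / q for w in weights], q


# Three hypothetical structures; the first two have similar free energies.
probs, q = boltzmann_probabilities([-5.0, -4.8, -2.0])
for energy, p in zip([-5.0, -4.8, -2.0], probs):
    print(f"dG = {energy:5.1f} kcal/mol  P = {p:.3f}")
```

With these values the lowest-energy structure is only modestly more probable than the runner-up, which is the situation where reporting a single MFE structure hides a lot of uncertainty.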

Based on multiple sequences
The conservation of a base and the co-conservation of a base pair in multiple sequences can help resolve ambiguous cases
- In fact, a CM can be trained from a multiple sequence alignment
Main types:
- Joint optimization
- Consensus/alignment of individual structures
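One common way to quantify co-conservation of two alignment columns is their mutual information. The slide does not name a specific measure, so this choice, the function name and the toy alignment are assumptions for illustration. Columns that change together while remaining complementary score high; fully conserved columns score zero.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 5 covary (G-C, C-G, A-U, U-A),
# columns 1 and 4 are fully conserved.
alignment = ["GCAUGC", "CCAUGG", "ACAUGU", "UCAUGA"]
print(mutual_information(alignment, 0, 5))
print(mutual_information(alignment, 1, 4))
```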

Part 2: RNA Footprinting and High-Throughput Methods

RNA footprinting
A traditional way to study RNA secondary structures
- Preferentially cleave or mark nucleotides with a particular structural property
Image credit: Novikova et al., International Journal of Molecular Sciences 14(12): , (2013)

Some probes that can be used
Probe                    Size (Dalton)   Structural preference
DMS (dimethyl sulfate)   126             Mark unpaired bases
1M7                      222             Mark unpaired bases
RNase V1                 15,900          Cleave paired bases
RNase ONE                27,000          Cleave unpaired bases
Nuclease S1              32,000          Cleave unpaired bases
Nuclease P1              36,000          Cleave unpaired bases

High-throughput RNA footprinting
Enzyme-based:
- After enzymatic treatment, sequence the resulting fragments to identify the cleavage sites
- And thus the bases with the structural property
Chemical-probe-based:
- A chemical adduct can terminate reverse transcription
- The termination point can be identified by sequencing

Parallel Analysis of RNA Structure (PARS)
Image credit: Kertesz et al., Nature 467(7311): , (2010)

DMS-seq
Image credit: Rouskin et al., Nature 505(7485):701-705, (2014)

Potential confounding factors
General:
- Expression level of transcripts
  - Need control/comparison
- Sequence bias
- Issues in read alignment
  - Blind tail: fragments that are too short cannot be aligned correctly
- Experimental efficiency

Potential confounding factors
Method-specific:
- DMS mainly modifies adenines and cytosines
- Increasing read count towards the 3’ end in DMS-seq
- Natural polymerase drop-off in chemical-probe-based methods
- Preference due to secondary vs. tertiary structure (e.g., steric hindrance in enzyme-based methods)

Data normalization
Normalization strategies:
- Transcript level: Comparison using
  - Standard RNA-seq data
  - Control experiment (with some steps not carried out)
  - Data from two different enzymes (PARS)
- Increasing read count: Smoothing by local window
- Polymerase drop-off: Modeling it explicitly
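The two-enzyme strategy can be sketched as a per-nucleotide log ratio: because both libraries come from the same transcript, its expression level largely cancels out. The following is a minimal sketch, assuming V1 (paired) and S1 (unpaired) read-start counts for one transcript. The depth scaling and the pseudocount are illustrative assumptions, not the exact normalization used by PARS.

```python
import math


def log_ratio_scores(v1_counts, s1_counts, pseudocount=1.0):
    """Per-nucleotide log2(V1 / S1) after scaling S1 to the V1 depth.

    Positive scores suggest paired bases, negative scores unpaired bases.
    """
    if len(v1_counts) != len(s1_counts):
        raise ValueError("count vectors must have the same length")
    s1_total = sum(s1_counts)
    # Scale S1 counts so both libraries have the same total depth.
    scale = sum(v1_counts) / s1_total if s1_total else 1.0
    return [
        math.log2((v + pseudocount) / (s * scale + pseudocount))
        for v, s in zip(v1_counts, s1_counts)
    ]


# Hypothetical counts for a four-nucleotide stretch.
scores = log_ratio_scores([30, 10, 0, 0], [0, 5, 10, 5])
print([round(x, 2) for x in scores])
```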

Example: Poisson linear model
Modeling local sequence bias:
$$ \log(\mu_i) = \nu + \alpha + \sum_{k=1}^{K} \sum_{h \in \{A, C, G\}} \beta_{kh} \, I(b_{ik} = h) $$
- \mu_i: measured read count of nucleotide i (mean of the Poisson distribution)
- \nu: actual expression level of transcript (log scale)
- \alpha: constant term
- b_{ik}: the k-th nucleotide within the length-K local sub-sequence around nucleotide i
- \beta_{kh}: bias coefficient
Li et al., Genome Biology 11(5):R50, (2010)
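To make the roles of the terms concrete, the sketch below evaluates a log-linear model of the form log(mu_i) = nu + alpha + sum over k of beta[k][b_ik], where one base per position is the reference with coefficient 0. This form follows the description in Li et al. as best understood here and should be treated as an assumption. The coefficient values and the function name are hypothetical, and fitting the coefficients is not shown.

```python
import math


def expected_count(nu, alpha, beta, window):
    """Expected read count for a nucleotide given its local sequence window.

    nu     : log expression level of the transcript
    alpha  : constant term
    beta   : one dict per window position mapping base -> bias coefficient
    window : the length-K local sub-sequence around the nucleotide
    """
    if len(window) != len(beta):
        raise ValueError("window length must equal the number of positions K")
    log_mu = nu + alpha
    for k, base in enumerate(window):
        # Bases missing from beta[k] (the reference base) contribute 0.
        log_mu += beta[k].get(base, 0.0)
    return math.exp(log_mu)


# Hypothetical coefficients for a window of length K = 2.
beta = [
    {"A": 0.2, "C": -0.1, "G": 0.0},
    {"A": 0.0, "C": 0.3, "G": -0.2},
]
print(expected_count(math.log(10), 0.0, beta, "AC"))
print(expected_count(math.log(10), 0.0, beta, "UU"))
```

Dividing observed counts by the sequence-dependent factor is one way such a model can be used to correct read counts before interpreting them as structural signal.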

In vivo vs. in vitro data
Image credit: Rouskin et al., Nature 505(7485): , (2014)

Using structure-probing data
- The high-throughput RNA footprinting (structure-probing) data only tell whether a base is paired or not, but not which other base it pairs with
- The data can be used to help RNA secondary structure prediction

Using structure-probing data
Several ways to use the data:
- Free energy penalty
- Pseudo free energy terms
- Discrepancy minimization
- Identifying closest structure centroid
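As one illustration of a pseudo free energy term, a widely used form from SHAPE-directed folding (Deigan et al., 2009; not cited on the slide) converts a nucleotide's reactivity into an energy bonus or penalty applied when that nucleotide is placed in a base-pair stack: dG(i) = m * ln(reactivity_i + 1) + b. The default values m = 2.6 and b = -0.8 kcal/mol are the ones commonly quoted for that method. The function name and the handling of missing data are assumptions.

```python
import math


def pseudo_free_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo free energy (kcal/mol) for pairing a nucleotide.

    Low reactivity gives a negative value (pairing is rewarded);
    high reactivity gives a positive value (pairing is penalized).
    Missing data (None) contributes nothing.
    """
    if reactivity is None:
        return 0.0
    return slope * math.log(reactivity + 1.0) + intercept


for r in [0.0, 0.3, 1.0, 2.0, None]:
    print(r, round(pseudo_free_energy(r), 2))
```

The term is added to the thermodynamic energy model, so the folding algorithm itself stays unchanged and only its objective is modified.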

Example: StructureFold
Overall workflow:
Image credit: Tang et al., Bioinformatics 31(16): , (2015)

Example: RNApbfold
Minimizing discrepancies between predicted (p_i) and measured (q_i) probabilities of bases being unpaired:
$$ F(\vec{\epsilon}) = \sum_{\mu} \frac{\epsilon_\mu^2}{\tau_\mu^2} + \sum_{i=1}^{n} \frac{\left(p_i(\vec{\epsilon}) - q_i\right)^2}{\sigma_i^2} $$
- \epsilon_\mu: Perturbation of the energy parameter values
- \tau_\mu^2, \sigma_i^2: Variance terms indicating uncertainty
Washietl et al., Nucleic Acids Research 40(10): , (2012)
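The trade-off in this kind of objective can be sketched numerically. The code below assumes a least-squares form: a penalty on the size of the energy perturbations plus the squared discrepancy between predicted and measured unpaired probabilities, each scaled by a variance. For simplicity it uses a single `tau` and a single `sigma` for all terms and takes the predicted probabilities as given; in the actual method they depend on the perturbations through the partition function. All names and values are hypothetical.

```python
def objective(perturbations, predicted, measured, tau=1.0, sigma=1.0):
    """Penalty on perturbations plus data discrepancy (smaller is better)."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    penalty = sum((e / tau) ** 2 for e in perturbations)
    discrepancy = sum(((p - q) / sigma) ** 2 for p, q in zip(predicted, measured))
    return penalty + discrepancy


# Hypothetical values: two perturbed parameters, two probed nucleotides.
value = objective([1.0, -1.0], [0.2, 0.9], [0.4, 0.9], tau=1.0, sigma=0.5)
print(value)
```

A small `sigma` expresses high trust in the measurements and pushes the predictions towards the data; a small `tau` expresses high trust in the energy model and keeps the perturbations small.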

Example: RNApbfold
Sample results:
Image credit: Washietl et al., Nucleic Acids Research 40(10): , (2012)

Summary
- Sequence-based RNA secondary structure prediction
  - For specific types of RNA
  - Single sequence
    - Minimum free energy (MFE)
    - Partition function
  - Multiple sequences
- High-throughput RNA structure probing
  - Modification of objective function
  - Selection of appropriate structures
