Lab 8.3: RNA Secondary Structure

Slides:



Advertisements
Similar presentations
B. Knudsen and J. Hein Department of Genetics and Ecology
Advertisements

RNA Secondary Structure Prediction
RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
MiRNA in computational biology 1 The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig C. Mello for their discovery of "RNA interference.
Homework Assignments due next session 1.Find a entry of interest in OMIM ( )
RNA Structure Prediction
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
Introduction to Bioinformatics - Tutorial no. 9 RNA Secondary Structure Prediction.
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
RNA Structure Prediction Rfam – RNA structures database RNAfold – RNA secondary structure prediction tRNAscan – tRNA prediction.
Comparative ab initio prediction of gene structures using pair HMMs
RNA Secondary Structure Prediction
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription.
Presenting: Asher Malka Supervisor: Prof. Hermona Soreq.
RNA structure analysis Jurgen Mourik & Richard Vogelaars Utrecht University.
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
1 Ref: Ch. 5 Mount: Bioinformatics i.Protein synthesis: ribosomal RNA transfer RNA messenger RNA ii.Catalysis e.g. ribozymes iii.Regulatory molecules 17.1.
Predicting RNA Structure and Function
RiboSearch Ben Daniel Ariel Kirshner Naomi
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
RNA Secondary Structure Prediction Introduction RNA is a single-stranded chain of the nucleotides A, C, G, and U. The string of nucleotides specifies the.
RNA-Seq and RNA Structure Prediction
RNA informatics Unit 12 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.
IN THE NAME OF GOD. PCR Primer Design Lecturer: Dr. Farkhondeh Poursina.
C OMPUTATIONAL BIOLOGY. O UTLINE Proteins DNA RNA Genetics and evolution The Sequence Matching Problem RNA Sequence Matching Complexity of the Algorithms.
MicroRNA Targets Prediction and Analysis. Small RNAs play important roles The Nobel Prize in Physiology or Medicine for 2006 Andrew Z. Fire and Craig.
1 Bio + Informatics AAACTGCTGACCGGTAACTGAGGCCTGCCTGCAATTGCTTAACTTGGC An Overview پرتال پرتال بيوانفورماتيك ايرانيان.
Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department.
Tools of Bioinformatics
Strand Design for Biomolecular Computation
1 Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
1 Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine Chenghai Xue, Fei Li, Tao He,
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
1 of 38 Data Mining in Ensembl with BioMart. 2 of 38 Simple Text-based Search Engine.
© Wiley Publishing All Rights Reserved. RNA Analysis.
RNA Structure Prediction
Background & Motivation Problem & Feature Construction Experiments Design & Results Conclusions and Future Work Exploring Alternative Splicing Features.
RNA Structure Prediction RNA Structure Basics The RNA ‘Rules’ Programs and Predictions BIO520 BioinformaticsJim Lund Assigned reading: Ch. 6 from Bioinformatics:
Pre-mRNA secondary structures influence exon recognition Michael Hiller Bioinformatics Group University of Freiburg, Germany.
Motif Search and RNA Structure Prediction Lesson 9.
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S
MicroRNA Prediction with SCFG and MFE Structure Annotation Tim Shaw, Ying Zheng, and Bram Sebastian.
Tools in Bioinformatics Genome Browsers. Retrieving genomic information Previous lesson(s): annotation-based perspective of search/data Today: genomic-based.
RNA Structure Prediction
Practice -- BLAST search in your own computer 1.Download data file from the course web page, or Ensemble. Save in the blast\dbs folder. 2.Start a CMD window,
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition Jizhen Zhao, Liming Cai Russell Malmberg Computer Science Plant Biology.
Welcome to the GrameneMart Tutorial A tool for batch data sequence retrieval 1.Select a Gramene dataset to search against. 2.Add filters to the dataset.
Lecture 8.21 Lecture 8.2: RNA Jennifer Gardy Centre for Microbial Diseases and Immunity Research University of British Columbia
Lab 4.11 Lab 4.1: Multiple Sequence Alignment Jennifer Gardy Molecular Biology & Biochemistry Simon Fraser University.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
Genome Annotation (protein coding genes)
Lab 4.1: Multiple Sequence Alignment & Phylogenetic Trees
Mirela Andronescu February 22, 2005 Lab 8.3 (c) 2005 CGDN.
Predicting RNA Structure and Function
RNA Secondary Structure Prediction
Identification and Characterization of pre-miRNA Candidates in the C
Sequence Based Analysis Tutorial
Basic Local Alignment Search Tool
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Basic Local Alignment Search Tool (BLAST)
Problems from last section
Lab 2: Information Retrieval
Welcome - webinar instructions
Presentation transcript:

Lab 8.3: RNA Secondary Structure February, 2006 Jennifer Gardy Centre for Microbial Diseases and Immunity Research University of British Columbia jennifer@cmdr.ubc.ca Lab 8.3 (c) 2006 CGDN

Outline The chocolate receptor – yum! MFOLD: Secondary structure prediction Building a consensus sequence Forcing base pairs Searching for RNA motifs in the genome Lab 8.3

The chocolate receptor Human chocolate receptor found! You suspect that the chocolate receptor is posttranscriptionally regulated in response to complex behavioral stimuli, but how? Lab 8.3

Unusually long 3’ UTR Lab 8.3

Analysis of the 3’ UTR Lab studies show bases 3430-4025 necessary and sufficient for posttranscriptional regulation Cross-linking data shows a protein factor (the Valentine factor) binds this region in at least two, if not more, places Sequence analysis doesn’t turn up anything RNA binding proteins often don’t recognize SEQUENCE, they recognize STRUCTURE Many RNA secondary structures involved in gene regulation are short hairpins with a bulge in the stem Lab 8.3

Hypothesis The Valentine factor proteins binds the chocolate receptor 3’ UTR via a conserved secondary structure RNA motif to regulate expression of the receptor Computational analysis (2° structure prediction) can identify this motif in the chocolate receptor We can use our motif to find more Valentine-regulated genes in the human genome Lab 8.3

2° Structure Prediction with MFOLD MFOLD developed by Mike Zuker in 1989 Uses energy minimization to predict folding of an RNA sequence into a secondary structure: Uses a set of base pairing rules (e.g. G:C, A:U, G:U only) to create “optimal” structure (lowest free energy) and “suboptimal” structures (within 12kcal/mol of optimum) 41% overall accuracy when tested on known RNA structures (Doshi et al., BMC Bioinformatics 5:105) Day 8 wiki > HSCHOCR_utr.txt file (FASTA file of key region involved in gene regulation) – open onscreen Open a 2nd browser to the MFOLD web server: http://www.bioinfo.rpi.edu/applications/mfold/old/rna/ Lab 8.3

Submitting RNA to MFOLD Paste your sequence Use default parameters Scroll wayyyy down and hit “Fold RNA” Lab 8.3

On your own: Viewing MFOLD Predictions – ~15min. Have each team member open one of the top four structures onscreen: Save file locally, at shell prompt, type gv filename Look for conserved 2° structure motifs Hint: Pay special attention to hairpin loops with a bulge! How many motifs do you find in each structure? Lab 8.3

2 Motifs in Optimal Structure Lab 8.3

3 Motifs in Structures 2 & 3 Lab 8.3

From Motif to Sequence 6bp loop, 3bp – bulge – 5bp stem Grab sequences of the motifs found: 3rd motif in structures 2, 3 is shortest Use as a guide for the lengths of the others UUAUCGGAAGCAGUGCCUUCCAUAA GUAUCGGAGACAGUGAUCUCCAUAU UUAUCGGGAGCAGUGUCUUCCAUAA Lab 8.3

From Sequence to Consensus UUAUCGGAAGCAGUGCCUUCCAUAA GUAUCGGAGACAGUGAUCUCCAUAU UUAUCGGGAGCAGUGUCUUCCAUAA Align sequences Note conserved residues Trim non-conserved ends Organize into structure Helps with covariance Will help downstream Change Us to Ts FASTA file is ACTGs UAUCGGAAGCAGUGCCUUCCAUA UAUCGGAGACAGUGAUCUCCAUA UAUCGGGAGCAGUGUCUUCCAUA UAU|C|GGAAG|CAGUGC|CUUCC|AUA UAU|C|GGAGA|CAGUGA|UCUCC|AUA UAU|C|GGGAG|CAGUGU|CUUCC|AUA >>> * >>>>> <<<<< <<< TAT|C|GGAAG|CAGTGC|CTTCC|ATA TAT|C|GGAGA|CAGTGA|TCTCC|ATA TAT|C|GGGAG|CAGTGT|CTTCC|ATA >>> * >>>>> <<<<< <<< Lab 8.3

On your own: Consensus Sequence – ~5 min. What would the motif’s consensus sequence be? Search 3’ UTR FASTA file for more occurrences: >HSCHOCR Human mRNA for chocolate receptor; positions 3430..4025 tatttatcagtgacagagttcactataaatggtgtttttttaatagaata taattatcggaagcagtgccttccataattatgacagttatactgtcggt tttttttaaataaaagcagcatctgctaataaaacccaacagatactgga agttttgcatttatggtcaacacttaagggttttagaaaacagccgtcag ccaaatgtaattgaataaagttgaagctaagatttagagatgaattaaat ttaattaggggttgctaagaagcgagcactgaccagataagaatgctggt tttcctaaatgcagtgaattgtgaccaagttataaatcaatgtcacttaa aggctgtggtagtactcctgcaaaattttatagctcagtttatccaaggt gtaactctaattcccatttgcaaaatttccagtacctttgtcacaatcct aacacattatcgggagcagtgtcttccataatgtataaagaacaaggtag tttttacctaccacagtgtctgtatcggagacagtgatctccatatgtta cactaagggtgtaagtaattatcgggaacagtgtttcccataattt TAT|C|GGAAG|CAGTGC|CTTCC|ATA TAT|C|GGAGA|CAGTGA|TCTCC|ATA TAT|C|GGGAG|CAGTGT|CTTCC|ATA >>> * >>>>> <<<<< <<< TAT C GG--- CAGTG- --TCC ATA Lab 8.3

Are There Even More Motifs? Search for sub-segments of the consensus Our sequence contains FIVE motifs! >>> * >>>>> <<<<< <<< TAT C GG--- CAGTG- --TCC ATA Lab 8.3

Revisit MFOLD with New Information MFOLD allows you to “constrain” base pairing When you have prior information about some of the secondary structure elements in a sequence (e.g. which bases should pair with each other to form a helix) Lab 8.3

On your own: Forcing Base Pairs – ~15min. Force command = F a b c a: residue number of first base pair b: residue number of last base pair c: how many consecutive bases to pair E.g. F 1 9 3 Given the information below, set up constraints in MFOLD to force the formation of our five motifs: Rerun MFOLD with your constraints, view optimal structure C A T A T G G A C CATGACATG 1 3 5 7 9 5 |TAT|C|AGTGA|CAGAGT|TCACT|ATA| 27 55 |TAT|C|GGAAG|CAGTGC|CTTCC|ATA| 77 458 |TAT|C|GGGAG|CAGTGT|CTTCC|ATA| 480 523 |TAT|C|GGAGA|CAGTGA|TCTCC|ATA| 545 570 |TAT|C|GGGAA|CAGTGT|TTCCC|ATA| 592 >>> * >>>>> <<<<< <<< Lab 8.3

MFOLD Constraints Lab 8.3

Comparison – Final (L) vs. Original (R) Lab 8.3

Demonstration: Finding More Motifs in the Genome Are other human genes regulated by Valentine factor binding to a motif in the UTR? Find dataset of human genes to search against: Can we constrain our search to a subset of the genome? Must have 5’ or 3’ UTR What resource could we use to create this dataset? Ensembl’s BioMart Create a description of our motif and use this to search the database RNAMOT Lab 8.3

Creating the Dataset Ensembl > BioMart > Homo sapiens genes Export > Sequence > 5’ UTR Gene ID Description Same for 3’ UTR Gunzip each file Concatenate: cat file1 file2 > file3 Lab 8.3

RNAMOT Simple motif-searching algorithm (1990) Command-line usage Other motif searching methods available (e.g. RNAMotif), but require more complex input scripts Command-line usage Basic command: rnamot –s –s datasettosearchagainst –d motifdescriptor –o results -s: sequences to be scanned for the motif UTRs from Ensembl -d: motif descriptor file -o: where to write the output to Lab 8.3

RNAMOT Descriptor Describes the 2° structure motif we are searching for in terms of its secondary structure elements (SSEs) H = helix (stem), s = single-stranded (loop, bulge) H1 s1 H2 s2 H2 H1 H1 3:3 0 H2 5:5 0 s1 1:1 0 s2 6:6 0 M 0 W 0 Order of SSEs, 5’-3’ Helix 1 has min. length 3, max. length 3, no variation Helix 2 has min. length 5, max. length 5, no variation SS region 1 is 1 base long (bulge) SS region 2 is 6 bases long (loop) Do not permit mismatches Do not permit wobble base pairs (G:U) rnamot –s –s human_UTRs.fasta –d choc_rec.txt –o results.txt Lab 8.3

RNAMOT Analysis of Human UTRs 4560 genes, 5381 motifs Lab 8.3

Take Home Messages MFOLD and other RNA secondary structure prediction tools rarely give the right answer first (or at all) Too many possible structures in the low energy neighbourhood Can be used as a “first-pass” tool Eyeball key conserved motifs Collect sequences to build a consensus Often need to adjust parameters Use prior knowledge to force base pairing Motif-searching tools can be used to identify conserved secondary structure motifs in a sequence database Retrieves more results than sequence-based searches Lab 8.3

Other (Optional) Activities The Valentine factor binding motif in the chocolate receptor is actually IRE - the iron response element. The chocolate receptor is transferrin and the Valentine factor is IRF/IRE-BP. Visit UTRSite to learn about IRE. What is UTRSite? http://www2.ba.itb.cnr.it/UTRSite/ Signal Manager > U0002 Visit Rfam’s IRE entry. What is Rfam? http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00037 Read about the IRE in its biological context PMID: 8710843 Lab 8.3