Non-coding RNA gene finding problems. Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment.

Slides:



Advertisements
Similar presentations
RNA Secondary Structure Prediction
Advertisements

RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
6 - 1 Chapter 6 The Secondary Structure Prediction of RNA.
Functional Non-Coding DNA Part I Non-coding genes and non-coding elements of coding genes BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG.
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
Chapter 4 Transcription and Translation. The Central Dogma.
Searching genomes for noncoding RNA CS374 Leticia Britos 10/03/06.
RNA Secondary Structure Prediction
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Noble Prize.
RNA structure analysis Jurgen Mourik & Richard Vogelaars Utrecht University.
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
[Bejerano Fall10/11] 1.
. Class 5: RNA Structure Prediction. RNA types u Messenger RNA (mRNA) l Encodes protein sequences u Transfer RNA (tRNA) l Adaptor between mRNA molecules.
Structural Alignment of Pseudoknotted RNAs Banu Dost, Buhm Han, Shaojie Zhang, Vineet Bafna.
Predicting RNA Structure and Function
Chapter 17 AP Biology From Gene to Protein.
Chapter 15 Noncoding RNAs. You Must Know The role of noncoding RNAs in control of cellular functions.
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
RNA.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
The many roles of RNA Fran Lewitter Head of Biocomputing Whitehead Institute.
RNA-Seq and RNA Structure Prediction
Protein Synthesis The genetic code – the sequence of nucleotides in DNA – is ultimately translated into the sequence of amino acids in proteins – gene.
Introduction to RNA Bioinformatics Craig L. Zirbel October 5, 2010 Based on a talk originally given by Anton Petrov.
RNA informatics Unit 12 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.
RNA Processing. RNA, ribonucleic acid single strand polymers of ribonucleotides have secondary and tertiary structure.
The Search for Small Regulatory RNA Central Dogma: DNA to RNA to Protein Replication Processing / Translocation hnRNA rRNAtRNA mRNA.
1 Bio + Informatics AAACTGCTGACCGGTAACTGAGGCCTGCCTGCAATTGCTTAACTTGGC An Overview پرتال پرتال بيوانفورماتيك ايرانيان.
Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
RNA folding & ncRNA discovery I519 Introduction to Bioinformatics, Fall, 2012.
1 RNA Bioinformatics Genes and Secondary Structure Anne Haake Rhys Price Jones & Tex Thompson.
Marco Magistri , Journal Club. A non-coding RNA (ncRNA) is any RNA molecule that is not translated into a protein “Structural genes encode proteins.
© Wiley Publishing All Rights Reserved. RNA Analysis.
Lecture 9 CS5661 RNA – The “REAL nucleic acid” Motivation Concepts Structural prediction –Dot-matrix –Dynamic programming Simple cost model Energy cost.
Peptide Bond Formation Walk the Dogma RECALL: The 4 types of organic molecules… CARBOHYDRATES LIPIDS PROTEINS (amino acid chains) NUCLEIC ACIDS (DNA.
[BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 6:
Predicting protein degradation rates Karen Page. The central dogma DNA RNA protein Transcription Translation The expression of genetic information stored.
Basics of RNA structure. RNA functions Storage/transfer of genetic information Genomes many viruses have RNA genomes single-stranded (ssRNA) e.g., retroviruses.
CS5263 Bioinformatics RNA Secondary Structure Prediction.
Questions?. Novel ncRNAs are abundant: Ex: miRNAs miRNAs were the second major story in 2001 (after the genome). Subsequently, many other non-coding genes.
[BejeranoFall15/16] 1 MW 1:30-2:50pm in Clark S361* (behind Peet’s) Profs: Serafim Batzoglou & Gill Bejerano CAs: Karthik Jagadeesh.
RNA Structure Prediction RNA Structure Basics The RNA ‘Rules’ Programs and Predictions BIO520 BioinformaticsJim Lund Assigned reading: Ch. 6 from Bioinformatics:
Javad Jamshidi Fasa University of Medical Sciences, November 2015 Gene Structure and Transcriptio n.
RNA and Gene Expression BIO 224 Intro to Molecular and Cell Biology.
Lecture 4: Transcription in Prokaryotes Chapter 6.
Motif Search and RNA Structure Prediction Lesson 9.
Tracking down ncRNAs in the genomes. How to find ncRNA gene The stability of ncRNA secondary structure is not sufficiently different from the predicted.
The Central Dogma of Molecular Biology DNA  RNA  Protein  Trait.
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition Jizhen Zhao, Liming Cai Russell Malmberg Computer Science Plant Biology.
Lecture 8. Molecular structures The Chinese University of Hong Kong BMEG3102 Bioinformatics.
RNAs. RNA Basics transfer RNA (tRNA) transfer RNA (tRNA) messenger RNA (mRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) ribosomal RNA (rRNA) small interfering.
Biochemistry Free For All
Halfway Feedback (yours)
RNA sequence-structure alignment
Protein Synthesis Part 3
Predicting RNA Structure and Function
RNA Secondary Structure Prediction
RNA Secondary Structure Prediction
Protein Synthesis Part 3
Protein Synthesis Part 3
Dynamic Programming (cont’d)
Profs: Serafim Batzoglou, Gill Bejerano TAs: Cory McLean, Aaron Wenger
mRNA Degradation and Translation Control
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Presentation transcript:

Non-coding RNA gene finding problems

Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment

Central dogma cont’d

Non-coding RNA (ncRNA) –RNA acting as functional molecule. –Not translated into protein. Non-coding RNA gene –The region of DNA coding ncRNA.

Central dogma cont’d ncRNA

Human genome

How many genes do we have? Only about 30,000 to 40,000 protein-coding genes in the human genome [Lander et al. Nature (2001), Venter et al. Science (2001) ]. Total protein coding gene length is only about 1.5 percent of the human genome. (3*10 9 bases)

What did we miss out? Current gene prediction methods only work well for protein coding genes. Non-coding RNA genes are undetected because they do not encode proteins. Modern RNA world hypothesis: –There are many unknown but functional ncRNAs. [Eddy Nature Reviews (2001)] –Many ncRNAs may play important role in the unexplained phenomenon.[Storz Science (2002)]

Question: If there are many ncRNAs, what are they doing? Question: Biologically, why do we need functional ncRNAs in addition to protein?

Why do we need ncRNAs? ncRNAs involve sequence specific recognition of other nucleic acids (e.g. mRNAs, DNAs). ncRNA is an ideal material for this role. –DNA is big and packaged and can do this job. Base complementary allows ncRNA to be sequence specific! For example: –small interfering RNAs (siRNA) is used to protect our genome. –It recognizes invading foreign RNAs/DNAs based on the sequence specificity. –And helps to degrade the foreign RNAs.

What do they do? RNA-protein machine: –Transfer RNA (tRNA). –Ribosomal RNA (rRNA). –RNAs (snRNAs) in spliceosome. Catalytic RNAs (ribozymes): catalyzing some functions. Micro RNAs (miRNAs): regulatory roles. Small interfering RNAs (siRNAs): RNA silencing –The genome’s immune system. [Plasterk, Science (2002)] –The breakthrough of the year by Science magazine in 2002.

What do they do? Riboswitch RNAs: a genetic control element, to control gene expression. –found in prokaryotes and plants. eukaryotes? Small nucleolar RNAs (snoRNAs): help the modification of rRNAs. tmRNA (tRNA like mRNA): direct abnormal protein degradation.

How can we find such ncRNA genes in the genome?

RNA secondary structure ncRNA is not a random sequence. Most RNAs fold into particular base-paired secondary structure. Canonical basepairs: –Watson-Crick basepairs: G - C A - U –Wobble basepair: G – U

RNA secondary structure cont’d Stacks: continuous nested basepairs. (energetically favorable) Non-basepaired loops: –Hairpin loop. –Bulge. –Internal loop. –Multiloop.

RNA secondary structure cont’d Most basepairs are non-crossing basepairs. –Any two pairs (i, j) and (i’,j’)  i < i’ < j’ < j or i’ < i < j < j’ Pseudoknots are the crossing basepairs.

Pseudoknots Pseudoknots are important for certain ncRNAs Violate the non-crossing assumption. Pseudoknots make most problems harder We assume there are no pseudoknots otherwise noted. [Rivas and Eddy (1999)]

ncRNA evolution is constrained by it secondary structure Drastic sequence changes can be tolerated. Compensatory mutations are very common. –One basepair mutates into another basepair. –Doesn’t change its secondary structure. In this talk: ncRNA – conserved structured RNA. tRNA1: tRNA2: Compensatory mutation

Non-coding RNA gene finding de novo prediction : –Find stable secondary structure from genome. [Shapiro et al. (1990)] The stability of ncRNA secondary structure is not sufficiently different from the predicted stability of a random sequence. [Rivas and Eddy (2000)]. –Look transcript signals. [Wassarman et al. (2001), Argaman et al. (2000)] ncRNA transcript signals are not strong. protein coding gene signals (open reading frame, promoter). [Rivas and Eddy (2000)].

RNA secondary structure prediction It is a basic issue in ncRNA analysis It is important information to the biologists. Searching and alignment algorithms are based on these models. RNA secondary structure -- a set of non- crossing base pairs.

Base pair maximization problem A simple energy model is to maximize the number of basepairs to minimize the free energy. [Waterman (1978), Nussinov et al (1978), Waterman and Smith (1978)] G – C, A – U, and G – U are treated as equal stability. Contributions of stacking are ignored.

A dynamic programming solution Let s[1…n] be an RNA sequence. δ(i,j) = 1 if s[i] and s[j] form a complementary base pair, else δ(i,j) = 0. M(i,j) is the maximum number of base pairs in s[i…j]. [Nussinov (1980)]

A dynamic programming solution M(1,n) is the number of base pairs in the optimal basepaired structure for s[1…n]. All these basepairs can be found by tracing back through the matrix M. Filling M needs O(n 3 ) time.

RNA structure: example i j A C G A U U

Zuker-Sankoff minimum energy model Stacks (contiguous nested base pairs) are the dominant stabilizing force – contribute the negative energy Unpaired bases form loops contribute the positive energy. –Hairpin loops, bulge/internal loops, and multiloops. Zuker-Sankoff minimum energy model. [Zuker and Sankoff (1984), Sankoff (1985)] Mfold and ViennaRNA are all based on this model. (this model is also called mfold model)

Zuker-Sankoff minimum energy model :eH(i,j) :a+3*b+4*c :eL(i,j,i’,j’) i j i jJ’ i’ [Lyngsø (1999)] i i+1 j j-1 :eS(i,j,i+1,j-1)

RNA minimum energy problem This problem can be solved by a dynamic programming algorithm in O(n 4 ) time. Lyngsø et al. (1999) revise the energy function for internal loop, proposed an O(n 3 ) time solution.

Zuker-Sankoff model

Recursive functions W(i) holds the minimum energy of a structure on s[1…i]. V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

Recursive functions (Zuker) W(i) holds the minimum energy of a structure on s[1…i]. V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

A recursive solution W(i) holds the minimum energy of a structure on s[1…i]. V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

A recursive solution W(i) holds the minimum energy of a structure on s[1…i]. V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

A recursive solution W(i) holds the minimum energy of a structure on s[1…i]. V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair. WM(i,j) holds the minimum energy of a structure on s[I…j] that is part of multiloop.

Prediction with pseudoknots Base pair maximization allowing crossing pairs can be solved in polynomial time. Ieong et al. (2003) proved that base pairing maximization problem allowing crossing pairs in a planar secondary structure is NP-hard.

Prediction with pseudoknots Prediction allowing generalized pseudoknots with energy functions depending on adjacent basepairs is NP-hard. –Akutsu (2000) (longest common subsequence for multiple sequences (LCS)). –Lyngsø and Pedersen (2000) (3SAT). –similar to Zuker-Sankoff minimum energy model. Pseudoknots in structure-known RNAs. –Biologists are not interested in the approximation solutions. –Most pseudoknots are planar. –Not too many variations. Rivas and Eddy (1999) presented a O(n 6 ) solution allowing most types of pseudoknots in known ncRNAs.