CS273A Lecture 3: Non Coding Genes MW 12:50-2:05pm in Beckman B100

Slides:



Advertisements
Similar presentations
RNA Secondary Structure Prediction
Advertisements

Gene expression From Gene to Protein
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Zhi John Lu, Jason Gloor, and David H. Mathews University of Rochester Medical Center, Rochester, New York Improved RNA Secondary Structure Prediction.
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Presenting: Asher Malka Supervisor: Prof. Hermona Soreq.
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
[Bejerano Fall10/11] 1.
. Class 5: RNA Structure Prediction. RNA types u Messenger RNA (mRNA) l Encodes protein sequences u Transfer RNA (tRNA) l Adaptor between mRNA molecules.
CISC667, F05, Lec19, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) RNA secondary structure.
Predicting RNA Structure and Function
[Bejerano Aut07/08] 1 MW 11:00-12:15 in Redwood G19 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean.
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
CS273A Lecture 5: Genes Enrichment, Gene Regulation I
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
RNA-Seq and RNA Structure Prediction
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
[BejeranoFall14/15] 1 MW 12:50-2:05pm in Beckman B100 Profs: Serafim Batzoglou & Gill Bejerano CAs: Jim Notwell & Sandeep Chinchali.
Introduction to RNA Bioinformatics Craig L. Zirbel October 5, 2010 Based on a talk originally given by Anton Petrov.
Non-coding RNA gene finding problems. Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
RNA folding & ncRNA discovery I519 Introduction to Bioinformatics, Fall, 2012.
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
Lecture 9 CS5661 RNA – The “REAL nucleic acid” Motivation Concepts Structural prediction –Dot-matrix –Dynamic programming Simple cost model Energy cost.
Chapter 17: From Gene to Protein Objectives 1. To understand the central dogma 2.To understand the process of transcription 3.To understand the purpose.
[BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 6:
CS5263 Bioinformatics RNA Secondary Structure Prediction.
Progress toward Predicting Viral RNA Structure from Sequence: How Parallel Computing can Help Solve the RNA Folding Problem Susan J. Schroeder University.
Questions?. Novel ncRNAs are abundant: Ex: miRNAs miRNAs were the second major story in 2001 (after the genome). Subsequently, many other non-coding genes.
[BejeranoFall15/16] 1 MW 1:30-2:50pm in Clark S361* (behind Peet’s) Profs: Serafim Batzoglou & Gill Bejerano CAs: Karthik Jagadeesh.
Doug Raiford Lesson 7.  RNA World Hypothesis  RNA world evolved into the DNA and protein world  DNA advantage: greater chemical stability  Protein.
RNA Structure Prediction RNA Structure Basics The RNA ‘Rules’ Programs and Predictions BIO520 BioinformaticsJim Lund Assigned reading: Ch. 6 from Bioinformatics:
Motif Search and RNA Structure Prediction Lesson 9.
IB Saccharomyces cerevisiae - Jan Major model system for molecular genetics. For example, one can clone the gene encoding a protein if you.
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition Jizhen Zhao, Liming Cai Russell Malmberg Computer Science Plant Biology.
RNA MODIFICATION Eukaryotic mRNA molecules are modified before they exit the nucleus.
RNAs. RNA Basics transfer RNA (tRNA) transfer RNA (tRNA) messenger RNA (mRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) ribosomal RNA (rRNA) small interfering.
Liming Cai’s Research Department of Computer Science The University of Georgia.
Biochemistry Free For All
Fig Prokaryotes and Eukaryotes
CS273A Lecture 2: Protein Coding Genes
Halfway Feedback (yours)
CS273A Lecture 6: Gene Regulation II MW 12:50-2:05pm in Beckman B100
Lecture 21 RNA Secondary Structure Prediction
Lab 8.3: RNA Secondary Structure
From Gene to Protein Chapter 2 and 7 of IB Bio book.
Predicting RNA Structure and Function
LncRNAs exert their effects by diverse mechanisms. LncRNAs exert their effects by diverse mechanisms. (A) lncRNAs can.
Transcription and Gene Expression
RNA Secondary Structure Prediction
RNA Secondary Structure Prediction
Synthetic Biology: Protein Synthesis
Introduction to Bioinformatics II
Dynamic Programming (cont’d)
Profs: Serafim Batzoglou, Gill Bejerano TAs: Cory McLean, Aaron Wenger
Nat. Rev. Gastroenterol. Hepatol. doi: /nrgastro
mRNA Degradation and Translation Control
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Protein synthesis
Noncoding RNA roles in Gene Expression
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure CISC667, S07, Lec19, Liao.
The Structure of the Genome
Unit 1: 1.5 Structure of the Genome
credit: modification of work by NIH
Chem 291C Draft Sample Preliminary Seminar
Small RNAs and Immunity
Presentation transcript:

CS273A Lecture 3: Non Coding Genes MW 12:50-2:05pm in Beckman B100 Profs: Serafim Batzoglou & Gill Bejerano CAs: Jim Notwell & Sandeep Chinchali http://cs273a.stanford.edu [BejeranoFall14/15]

Announcements http://cs273a.stanford.edu/ Lecture slides, problem sets, etc. Course communications via Piazza Auditors please sign up too Second Tutorial this Wed 10/1 during class (B100). UCSC genome browser. We recommend bringing your laptop. http://cs273a.stanford.edu [BejeranoFall14/15]

Translation: The Genetic Code http://cs273a.stanford.edu [BejeranoFall14/15]

Gene Splicing http://cs273a.stanford.edu [BejeranoFall14/15]

Alternative Splicing http://cs273a.stanford.edu [BejeranoFall14/15]

Genes in the Human Genome When you only show one transcript per gene locus: If you ask the GUI to show you all well established gene variants: http://cs273a.stanford.edu [BejeranoFall14/15]

Protein Domains SKSHSEAGSAFIQTQQLHAAMADTFLEHMCRLDIDSAPITARNTGIICTIGPASRSVETLKEMIKSGMNVARMNFSHGTHEYHAETIKNVRTATESFASDPILYRPVAVALDTKGPEIRTGLIKGSGTAEVELKKGATLKITLDNAYMAACDENILWLDYKNICKVVEVGSKVYVDDGLISLQVKQKGPDFLVTEVENGGFLGSKKGVNLPGAAVDLPAVSEKDIQDLKFGVDEDVDMVFASFIRKAADVHEVRKILGEKGKNIKIISKIENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMIIGRCNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIMLSGETAKGDYPLEAVRMQHLIAREAEAAMFHRKLFEELARSSSHSTDLMEAMAMGSVEASYKCLAAALIVLTESGRSAHQVARYRPRAPIIAVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMNVGKAAGFFKKGDVVIVLTGWRPGSGFTNTMRVVPVP A protein domain is a subsequence of the protein that folds independently of the other portions of the sequence, and often confers to the protein one or more specific functions. http://cs273a.stanford.edu [BejeranoFall14/15]

Alt. Splicing and Protein Repertoire Alternative splicing often produces protein variants that have a different domain composition, and thus perform different functions. http://cs273a.stanford.edu [BejeranoFall14/15]

Gene Finding Ib: ab initio Computational Challenge: “Find the genes, the whole genes, and nothing but the genes” What if we want to find all splice variants that are ever made? Can we even do it from sequence alone? http://cs273a.stanford.edu [BejeranoFall14/15]

“non coding” RNAs (ncRNA) http://cs273a.stanford.edu [BejeranoFall14/15]

Central Dogma of Biology:

Active forms of “non coding” RNA long non-coding RNA reverse transcription (of eg retrogenes) microRNA rRNA, snRNA, snoRNA

What is ncRNA? Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein. Known roles for ncRNAs: RNA catalyzes excision/ligation in introns. RNA catalyzes the maturation of tRNA. RNA catalyzes peptide bond formation. RNA is a required subunit in telomerase. RNA plays roles in immunity and development (RNAi). RNA plays a role in dosage compensation. RNA plays a role in carbon storage. RNA is a major subunit in the SRP, which is important in protein trafficking. RNA guides RNA modification. RNA can do so many different functions, it is thought in the beginning there was an RNA World, where RNA was both the information carrier and active molecule.

“non coding” RNAs (ncRNA) Small structural RNAs (ssRNA) http://cs273a.stanford.edu [BejeranoFall14/15]

ssRNA Folds into Secondary and 3D Structures AAUUGCGGGAAAGGGGUCAA CAGCCGUUCAGUACCAAGUC UCAGGGGAAACUUUGAGAUG GCCUUGCAAAGGGUAUGGUA AUAAGCUGACGGACAUGGUC CUAACCACGCAGCCAAGUCC UAAGUCAACAGAUCUUCUGU UGAUAUGGAUGCAGUUCA Computational Challenge: predict 2D & 3D structure from sequence (1D). Waring & Davies. (1984) Gene 28: 277. Cate, et al. (Cech & Doudna). (1996) Science 273:1678.

For example, tRNA

tRNA Activity

ssRNA structure rules Canonical basepairs: Watson-Crick basepairs: G ≡ C A = U Wobble basepair: G − U Stacks: continuous nested basepairs. (energetically favorable) Non-basepaired loops: Hairpin loop Bulge Internal loop Multiloop Pseudo-knots

Ab initio RNA structure prediction: lots of Dynamic Programming Objective: Maximizing # of base pairings (Nussinov et al, 1978) simple model: (i, j) = 1 iff GC|AU|GU fancier model: GC > AU > GU

Dynamic programming Compute S(i,j) recursively (dynamic programming) Compares a sequence against itself in a dynamic programming matrix Three steps

Initialization Example: GGGAAAUCC G A U C the main diagonal Example: GGGAAAUCC the main diagonal the diagonal below L: the length of input sequence

Recursion j G A U C ? 1 Fill up the table (DP matrix) -- diagonal by diagonal i

Traceback The structure is: G A U C 1 2 3 1 2 3 The structure is: Maximum basepairs= 3 base-pair #1: 2 (G) - 9 (C) base-pair #2: 3 (G) - 8 (C) base-pair #3: 6 (A) - 7 (U) What are the other “optimal” structures?

Pseudoknots drastically increase computational complexity

Objective: Minimize Secondary Structure Free Energy at 37 OC: Instead of (i, j), measure and sum energies: Mathews, Disney, Childs, Schroeder, Zuker, & Turner. 2004. PNAS 101: 7287. http://cs273a.stanford.edu [BejeranoFall14/15]

Zuker’s algorithm MFOLD: computing loop dependent energies

RNA structure Base-pairing defines a secondary structure. The base-pairing is usually non-crossing. In this example the pseudo-knot makes for an exception. Bafna

Stochastic context-free grammar S  aSu S  cSg S  gSc S  uSa S  a S  c S  g S  u S  SS 1. A CFG S  aSu  acSgu  accSggu  accuSaggu  accuSSaggu  accugScSaggu  accuggSccSaggu  accuggaccSaggu  accuggacccSgaggu  accuggacccuSagaggu  accuggacccuuagaggu 2. A derivation of “accuggacccuuagaggu” 3. Corresponding structure

ssRNA transcription ssRNAs like tRNAs are usually encoded by short “non coding” genes, that transcribe independently. Found in both the UCSC “known genes” track, and as a subtrack of the RepeatMasker track http://cs273a.stanford.edu [BejeranoFall14/15]

“non coding” RNAs (ncRNA) microRNAs (miRNA/miR) http://cs273a.stanford.edu [BejeranoFall14/15]

MicroRNA (miR) miR match to target mRNA is quite loose. mRNA ~70nt ~22nt miR match to target mRNA is quite loose. mRNA  a single miR can regulate the expression of hundreds of genes. http://cs273a.stanford.edu [BejeranoFall14/15]

MicroRNA Transcription mRNA http://cs273a.stanford.edu [BejeranoFall14/15]

MicroRNA Transcription mRNA http://cs273a.stanford.edu [BejeranoFall14/15]

MicroRNA (miR) Computational challenges: Predict miRs. Predict miR targets. ~70nt ~22nt miR match to target mRNA is quite loose. mRNA  a single miR can regulate the expression of hundreds of genes. http://cs273a.stanford.edu [BejeranoFall14/15]

MicroRNA Therapeutics Idea: bolster/inhibit miR production to broadly modulate protein production Hope: “right” the good guys and/or “wrong” the bad guys Challenge: and not vice versa. ~70nt ~22nt miR match to target mRNA is quite loose. mRNA  a single miR can regulate the expression of hundreds of genes. http://cs273a.stanford.edu [BejeranoFall14/15]

Other Non Coding Transcripts http://cs273a.stanford.edu [BejeranoFall14/15]

lncRNAs (long non coding RNAs) Don’t seem to fold into clear structures (or only a sub-region does). Diverse roles only now starting to be understood. Example: bind splice site, cause intron retention.  Challenges: 1) detect and 2) predict function computationally http://cs273a.stanford.edu [BejeranoFall14/15]

X chromosome inactivation in mammals Dosage compensation X X X Y

Xist – X inactive-specific transcript Avner and Heard, Nat. Rev. Genetics 2001 2(1):59-67

System output measurements We can measure non/coding gene expression! (just remember – it is context dependent) 1. First generation mRNA (cDNA) and EST sequencing: In UCSC Browser: http://cs273a.stanford.edu [BejeranoFall14/15]

2. Gene Expression Microarrays (“chips”) http://cs273a.stanford.edu [BejeranoFall14/15]

3. RNA-seq “Next” (2nd) generation sequencing. http://cs273a.stanford.edu [BejeranoFall14/15]

Transcripts, transcripts everywhere Human Genome Transcribed* (Tx) Tx from both strands* * True size of set unknown http://cs273a.stanford.edu [BejeranoFall14/15]

Or are they? http://cs273a.stanford.edu [BejeranoFall14/15]

The million dollar question Human Genome Transcribed* (Tx) Leaky tx? Functional? Tx from both strands* * True size of set unknown http://cs273a.stanford.edu [BejeranoFall14/15]

Gene Finding II: technology dependence Challenge: “Find the genes, the whole genes, and nothing but the genes” We started out trying to predict genes directly from the genome. When you measure gene expression, the challenge changes: Now you want to build gene models from your observations. These are both technology dependent challenges. The hybrid: what we measure is a tiny fraction of the space-time state space for cells in our body. We want to generalize from measured states and improve our predictions for the full compendium of states. http://cs273a.stanford.edu [BejeranoFall14/15]