CS273A Lecture 3: Non Coding Genes MW 12:50-2:05pm in Beckman B100


1 CS273A Lecture 3: Non Coding Genes MW 12:50-2:05pm in Beckman B100
Profs: Serafim Batzoglou & Gill Bejerano CAs: Jim Notwell & Sandeep Chinchali [BejeranoFall14/15]

2 Announcements http://cs273a.stanford.edu/
Lecture slides, problem sets, etc. Course communications via Piazza; auditors, please sign up too. Second tutorial this Wed 10/1 during class (B100): UCSC genome browser. We recommend bringing your laptop.

3 Translation: The Genetic Code

4 Gene Splicing

5 Alternative Splicing

6 Genes in the Human Genome
When you only show one transcript per gene locus: [figure]
If you ask the GUI to show you all well-established gene variants: [figure]

7 Protein Domains
SKSHSEAGSAFIQTQQLHAAMADTFLEHMCRLDIDSAPITARNTGIICTIGPASRSVETLKEMIKSGMNVARMNFSHGTHEYHAETIKNVRTATESFASDPILYRPVAVALDTKGPEIRTGLIKGSGTAEVELKKGATLKITLDNAYMAACDENILWLDYKNICKVVEVGSKVYVDDGLISLQVKQKGPDFLVTEVENGGFLGSKKGVNLPGAAVDLPAVSEKDIQDLKFGVDEDVDMVFASFIRKAADVHEVRKILGEKGKNIKIISKIENHEGVRRFDEILEASDGIMVARGDLGIEIPAEKVFLAQKMIIGRCNRAGKPVICATQMLESMIKKPRPTRAEGSDVANAVLDGADCIMLSGETAKGDYPLEAVRMQHLIAREAEAAMFHRKLFEELARSSSHSTDLMEAMAMGSVEASYKCLAAALIVLTESGRSAHQVARYRPRAPIIAVTRNHQTARQAHLYRGIFPVVCKDPVQEAWAEDVDLRVNLAMNVGKAAGFFKKGDVVIVLTGWRPGSGFTNTMRVVPVP
A protein domain is a subsequence of the protein that folds independently of the other portions of the sequence, and often confers on the protein one or more specific functions.

8 Alt. Splicing and Protein Repertoire
Alternative splicing often produces protein variants that have a different domain composition, and thus perform different functions.

9 Gene Finding Ib: ab initio
Computational Challenge: “Find the genes, the whole genes, and nothing but the genes.” What if we want to find all splice variants that are ever made? Can we even do it from sequence alone?

10 “non coding” RNAs (ncRNA)

11 Central Dogma of Biology:

12 Active forms of “non coding” RNA
[Figure labels: long non-coding RNA; reverse transcription (of e.g. retrogenes); microRNA; rRNA, snRNA, snoRNA]

13 What is ncRNA?
Non-coding RNA (ncRNA) is an RNA that functions without being translated to a protein.
Known roles for ncRNAs:
RNA catalyzes excision/ligation in introns.
RNA catalyzes the maturation of tRNA.
RNA catalyzes peptide bond formation.
RNA is a required subunit in telomerase.
RNA plays roles in immunity and development (RNAi).
RNA plays a role in dosage compensation.
RNA plays a role in carbon storage.
RNA is a major subunit in the SRP, which is important in protein trafficking.
RNA guides RNA modification.
RNA can perform so many different functions that it is thought that in the beginning there was an RNA World, where RNA was both the information carrier and the active molecule.

14 “non coding” RNAs (ncRNA) Small structural RNAs (ssRNA)

15 ssRNA Folds into Secondary and 3D Structures
AAUUGCGGGAAAGGGGUCAA CAGCCGUUCAGUACCAAGUC UCAGGGGAAACUUUGAGAUG GCCUUGCAAAGGGUAUGGUA AUAAGCUGACGGACAUGGUC CUAACCACGCAGCCAAGUCC UAAGUCAACAGAUCUUCUGU UGAUAUGGAUGCAGUUCA
Computational Challenge: predict 2D & 3D structure from sequence (1D).
Waring & Davies. (1984) Gene 28: 277. Cate, et al. (Cech & Doudna). (1996) Science 273:1678.

16 For example, tRNA

17 tRNA Activity

18 ssRNA structure rules
Canonical basepairs: Watson-Crick basepairs G ≡ C and A = U; wobble basepair G − U.
Stacks: continuous nested basepairs (energetically favorable).
Non-basepaired loops: hairpin loop, bulge, internal loop, multiloop.
Pseudo-knots.

19 Ab initio RNA structure prediction: lots of Dynamic Programming
Objective: maximize the number of base pairings (Nussinov et al., 1978). Simple model: the pairing score δ(i, j) = 1 iff bases i and j form GC|AU|GU. Fancier model: GC > AU > GU.

20 Dynamic programming
Compute S(i,j) recursively (dynamic programming). Compares a sequence against itself in a dynamic programming matrix. Three steps: initialization, recursion, traceback.

21 Initialization
Example: GGGAAAUCC. [Figure: DP matrix with rows and columns labeled by the sequence; the main diagonal and the diagonal below it are initialized.] L: the length of the input sequence.

22 Recursion
Fill up the table (DP matrix) diagonal by diagonal. [Figure: DP matrix for GGGAAAUCC with row index i and column index j.]
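The recursion formula on this slide did not survive in the transcript. As a hedged reconstruction, the textbook form of the Nussinov recurrence is given here; it is an assumption that the slide used exactly this form. S(i, j) is the maximum number of base pairs in the subsequence i..j, and δ(i, j) is 1 when bases i and j can pair (GC, AU, or GU) and 0 otherwise.

```latex
\begin{align*}
S(i,i) &= 0, \qquad S(i,i-1) = 0 \\
S(i,j) &= \max
\begin{cases}
S(i+1,j) & \text{(base $i$ unpaired)} \\
S(i,j-1) & \text{(base $j$ unpaired)} \\
S(i+1,j-1) + \delta(i,j) & \text{($i$ pairs with $j$)} \\
\max_{i<k<j} \bigl[\, S(i,k) + S(k+1,j) \,\bigr] & \text{(two side-by-side substructures)}
\end{cases}
\end{align*}
```

The two initialization equations correspond to the main diagonal and the diagonal below it; the answer for the whole sequence is S(1, L).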

23 Traceback
The structure is: [figure]. Maximum basepairs = 3.
base-pair #1: 2 (G) - 9 (C)
base-pair #2: 3 (G) - 8 (C)
base-pair #3: 6 (A) - 7 (U)
What are the other “optimal” structures?
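The three steps (initialization, recursion, traceback) can be sketched in Python as follows. This is a minimal illustration of base-pair maximization under the simple GC|AU|GU model, not the lecture's own code; the function names `can_pair` and `nussinov` are made up here, and no minimum hairpin-loop length is enforced (the slide's example pairs positions 6 and 7, so none is assumed).

```python
def can_pair(a, b):
    """Simple model: 1 if the two bases form GC, AU, or GU; otherwise 0."""
    return 1 if {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"}) else 0


def nussinov(seq):
    """Return (max_pairs, pairs); pairs use 1-based positions as on the slide."""
    n = len(seq)
    if n == 0:
        return 0, []
    # Initialization: an all-zero matrix covers the main diagonal and the
    # diagonal below it.
    S = [[0] * n for _ in range(n)]
    # Recursion: fill the table diagonal by diagonal.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])
            best = max(best, S[i + 1][j - 1] + can_pair(seq[i], seq[j]))
            for k in range(i + 1, j):  # split into two adjacent substructures
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    # Traceback: recover one optimal structure (others may exist).
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and S[i][j] == S[i + 1][j - 1] + 1:
            pairs.append((i + 1, j + 1))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return S[0][n - 1], sorted(pairs)


score, pairs = nussinov("GGGAAAUCC")
print(score, pairs)
```

For the slide's example the score is 3. The traceback returns one optimal structure, which need not be the one drawn on the slide; enumerating the alternatives is exactly the question the slide asks.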

24 Pseudoknots drastically increase computational complexity

25 Objective: Minimize Secondary Structure Free Energy at 37 °C:
Instead of a 0/1 pairing score δ(i, j), measure and sum energies: Mathews, Disney, Childs, Schroeder, Zuker, & Turner PNAS 101: 7287.

26 Zuker’s algorithm MFOLD: computing loop dependent energies

27 RNA structure
Base-pairing defines a secondary structure. The base-pairing is usually non-crossing. In this example the pseudo-knot makes for an exception. (Bafna)
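"Non-crossing" can be made concrete: two base pairs (i, j) and (k, l) cross when i < k < j < l. The sketch below is an illustration only; the function name `crossing_pairs` and the example positions are hypothetical and not taken from the slide's figure.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs (i, j), (k, l) with i < k < j < l."""
    # Normalize each pair so the smaller position comes first, then sort.
    ordered = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# Hypothetical toy structures (1-based positions).
nested = [(1, 10), (2, 9), (3, 8)]    # a plain stem: no crossings
knotted = [(1, 10), (2, 9), (5, 14)]  # (5, 14) crosses both stem pairs

print(crossing_pairs(nested))   # []
print(crossing_pairs(knotted))  # [((1, 10), (5, 14)), ((2, 9), (5, 14))]
```

A structure with an empty result is nested and fits the dynamic programming scheme above; any crossing indicates a pseudoknot.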

28 Stochastic context-free grammar
1. A CFG:
S → aSu | cSg | gSc | uSa | a | c | g | u | SS
2. A derivation of “accuggacccuuagaggu”:
S → aSu → acSgu → accSggu → accuSaggu → accuSSaggu → accugScSaggu → accuggSccSaggu → accuggaccSaggu → accuggacccSgaggu → accuggacccuSagaggu → accuggacccuuagaggu
3. Corresponding structure: [figure]
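To show how a derivation maps to a structure, here is a small sketch that turns a derivation tree for this grammar into the derived string plus a dot-bracket structure. The tree encoding and the name `render` are assumptions made for illustration, and the production probabilities that would make the grammar stochastic are not modeled.

```python
# Paired productions from the grammar: S → aSu | cSg | gSc | uSa
PAIRED = {("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")}


def render(tree):
    """Return (sequence, dot_bracket) for a derivation tree.

    Hypothetical encoding of the productions:
      "a"                -> S → a    (single unpaired base)
      ("a", subtree, "u") -> S → aSu  (paired bases around a subtree)
      [left, right]      -> S → SS   (two adjacent substructures)
    """
    if isinstance(tree, str):
        if len(tree) != 1 or tree not in "acgu":
            raise ValueError(f"not a terminal production: {tree!r}")
        return tree, "."
    if isinstance(tree, list):
        left, right = tree
        lseq, lstruct = render(left)
        rseq, rstruct = render(right)
        return lseq + rseq, lstruct + rstruct
    x, sub, y = tree
    if (x, y) not in PAIRED:
        raise ValueError(f"no production S → {x}S{y}")
    seq, struct = render(sub)
    return x + seq + y, "(" + struct + ")"


# The derivation from the slide, written as a tree.
left_hairpin = ("g", ("g", "a", "c"), "c")    # derives "ggacc"
right_hairpin = ("c", ("u", "u", "a"), "g")   # derives "cuuag"
slide_tree = ("a", ("c", ("c", ("u", [left_hairpin, right_hairpin], "a"), "g"), "g"), "u")

sequence, structure = render(slide_tree)
print(sequence)
print(structure)
```

Each paired production contributes one matching pair of brackets, each single-base production one dot, and S → SS places two substructures side by side, so the result is a four-pair stem enclosing two small hairpins.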

29 ssRNA transcription ssRNAs like tRNAs are usually encoded by short “non coding” genes, that transcribe independently. Found in both the UCSC “known genes” track, and as a subtrack of the RepeatMasker track [BejeranoFall14/15]

30 “non coding” RNAs (ncRNA) microRNAs (miRNA/miR)

31 MicroRNA (miR)
[Figure labels: ~70nt; ~22nt; mRNA] miR match to target mRNA is quite loose → a single miR can regulate the expression of hundreds of genes.

32 MicroRNA Transcription

33 MicroRNA Transcription

34 MicroRNA (miR)
Computational challenges: Predict miRs. Predict miR targets.
[Figure labels: ~70nt; ~22nt; mRNA] miR match to target mRNA is quite loose → a single miR can regulate the expression of hundreds of genes.
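One common starting point for target prediction, which the slides do not describe and which is introduced here purely as an assumption, is to scan an mRNA for perfect complementarity to the miR "seed" (nucleotides 2-8 from the 5' end). The sketch below uses illustrative toy sequences and hypothetical function names. Because it demands an exact match on only seven bases, it also hints at why a single miR can hit many transcripts.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Return 0-based mRNA positions that perfectly match the miR seed.

    The seed is mirna[seed_start:seed_end], i.e. nucleotides 2-8 by default
    (an assumed heuristic, not something stated on the slides).
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    width = len(site)
    return [p for p in range(len(mrna) - width + 1) if mrna[p:p + width] == site]


# Illustrative toy inputs, written 5' to 3'.
TOY_MIR = "UGAGGUAGUAGGUUGUAUAGUU"                            # 22 nt
TOY_MRNA = "AAA" + "CUACCUC" + "AAAGGG" + "CUACCUC" + "UU"  # two planted sites

print(find_seed_sites(TOY_MIR, TOY_MRNA))  # [3, 16]
```

Real predictors add further evidence such as conservation and site context; this sketch covers only the seed-matching step and ignores wobble pairs and mismatches.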

35 MicroRNA Therapeutics
Idea: bolster/inhibit miR production to broadly modulate protein production.
Hope: “right” the good guys and/or “wrong” the bad guys.
Challenge: and not vice versa.
[Figure labels: ~70nt; ~22nt; mRNA] miR match to target mRNA is quite loose → a single miR can regulate the expression of hundreds of genes.

36 Other Non Coding Transcripts

37 lncRNAs (long non coding RNAs)
Don’t seem to fold into clear structures (or only a sub-region does). Diverse roles only now starting to be understood. Example: bind splice site, cause intron retention. → Challenges: 1) detect and 2) predict function computationally.

38 X chromosome inactivation in mammals
Dosage compensation: XX vs. XY

39 Xist – X inactive-specific transcript
Avner and Heard, Nat. Rev. Genetics (1):59-67

40 System output measurements
We can measure non/coding gene expression! (Just remember: it is context dependent.)
1. First generation mRNA (cDNA) and EST sequencing. In UCSC Browser: [figure]

41 2. Gene Expression Microarrays (“chips”)

42 3. RNA-seq “Next” (2nd) generation sequencing.

43 Transcripts, transcripts everywhere
[Figure labels: Human Genome; Transcribed* (Tx); Tx from both strands*] * True size of set unknown.

44 Or are they?

45 The million dollar question
[Figure labels: Human Genome; Transcribed* (Tx); Tx from both strands*] Leaky tx? Functional? * True size of set unknown.

46 Gene Finding II: technology dependence
Challenge: “Find the genes, the whole genes, and nothing but the genes.” We started out trying to predict genes directly from the genome. When you measure gene expression, the challenge changes: now you want to build gene models from your observations. These are both technology-dependent challenges. The hybrid: what we measure is a tiny fraction of the space-time state space for cells in our body. We want to generalize from measured states and improve our predictions for the full compendium of states.

