CS273A Lecture 5: Genes Enrichment, Gene Regulation I

Slides:



Advertisements
Similar presentations
Methods to read out regulatory functions
Advertisements

Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Transcription of viral DNAs. Lecture 14 Flint et al. pp. 253 – 277.
Lecture 4: DNA transcription
Gene Regulation in Eukaryotes Same basic idea, but more intricate than in prokaryotes Why? 1.Genes have to respond to both environmental and physiological.
Basic Biology for CS262 OMKAR DESHPANDE (TA) Overview Structures of biomolecules How does DNA function? What is a gene? How are genes regulated?
[Bejerano Fall10/11] 1 Thank you for the midterm feedback! Projects will be assigned shortly.
Chris Chander, Luke Adea BioSci D145 Feb. 12, 2015
CS 374: Relating the Genetic Code to Gene Expression Sandeep Chinchali.
[Bejerano Aut08/09] 1 MW 11:00-12:15 in Beckman B302 Profs: Serafim Batzoglou, Gill Bejerano TAs: Cory McLean, Aaron Wenger.
Chapter 12 and 13: Transcription and Translation Lecture 12 October 28, 2003 What’s due? CH6 and CH10 problem set (if you haven’t all ready turned it in)
(CHAPTER 12- Brooker Text)
Step 1 of Protein Synthesis
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
Transcription: Synthesizing RNA from DNA
Transcription AHMP 5406.
[BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 7:
Computational Molecular Biology Biochem 218 – BioMedical Informatics Gene Regulatory.
[BejeranoWinter12/13] 1 MW 11:00-12:15 in Beckman B302 Prof: Gill Bejerano TAs: Jim Notwell & Harendra Guturu CS173 Lecture 8:
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
[BejeranoFall14/15] 1 MW 12:50-2:05pm in Beckman B100 Profs: Serafim Batzoglou & Gill Bejerano CAs: Jim Notwell & Sandeep Chinchali.
Protein Synthesis The genetic code – the sequence of nucleotides in DNA – is ultimately translated into the sequence of amino acids in proteins – gene.
Transcription Nicky Mulder Acknowledgements: Anna Kramvis for lecture material (adapted here)
Transcription transcription Gene sequence (DNA) recopied or transcribed to RNA sequence Gene sequence (DNA) recopied or transcribed to RNA sequence.
Essentials of the Living World Second Edition George B. Johnson Jonathan B. Losos Chapter 13 How Genes Work Copyright © The McGraw-Hill Companies, Inc.
REGULATION of GENE EXPRESSION. GENE EXPRESSION all cells in one organism contain same DNA every cell has same genotype phenotypes differ skin cells have.
Control of Gene Expression Eukaryotes. Eukaryotic Gene Expression Some genes are expressed in all cells all the time. These so-called housekeeping genes.
google. com/search
Eukaryotic Gene Expression The “More Complex” Genome.
NAi_transcription_vo1-lg.mov.
Regulation of Gene Expression Eukaryotes
Controlling the genes Lecture 15 pp Gene Expression Nearly all human cells have a nucleus (not red blood cells) Almost all these nucleated cells.
[BejeranoFall13/14] 1 MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano TAs: Harendra Guturu & Panos.
UNIT 3 Transcriptionand Protein Synthesis. Objectives Discuss the flow of information from DNA to RNA to Proteins Discuss the flow of information from.
How Genes Work Ch. 12.
Gene expression DNA  RNA  Protein DNA RNA Protein Replication Transcription Translation Degradation Initiation Elongation Processing Export Initiation.
Predicting protein degradation rates Karen Page. The central dogma DNA RNA protein Transcription Translation The expression of genetic information stored.
Control of Gene Expression Year 13 Biology. Exceptions to the usual Protein Synthesis Some viruses contain RNA and no DNA. RNA is therefore replicated.
A Biology Primer Part III: Transcription, Translation, and Regulation Vasileios Hatzivassiloglou University of Texas at Dallas.
Introduction to Gene Expression
Transcription … from DNA to RNA.
Control of Gene Expression Chapter 16. Contolling Gene Expression What does that mean? Regulating which genes are being expressed  transcribed/translated.
Transcription in Prokaryotic (Bacteria) The conversion of DNA into an RNA transcript requires an enzyme known as RNA polymerase RNA polymerase – Catalyzes.
Introduction to Molecular Cell Biology Transcription Regulation Dr. Fridoon Jawad Ahmad HEC Foreign Professor King Edward Medical University Visiting Professor.
Eukaryotic Gene Structure. 2 Terminology Genome – entire genetic material of an individual Transcriptome – set of transcribed sequences Proteome – set.
[BejeranoFall15/16] 1 MW 1:30-2:50pm in Clark S361* (behind Peet’s) Profs: Serafim Batzoglou & Gill Bejerano CAs: Karthik Jagadeesh.
Transcription. Recall: What is the Central Dogma of molecular genetics?
RNA and Gene Expression BIO 224 Intro to Molecular and Cell Biology.
The Central Dogma of Molecular Biology replication transcription translation.
Lecture 4: Transcription in Prokaryotes Chapter 6.
CS173 Lecture 9: Transcriptional regulation III
[BejeranoFall15/16] 1 MW 1:30-2:50pm in Clark S361* (behind Peet’s) Profs: Serafim Batzoglou & Gill Bejerano CAs: Karthik Jagadeesh.
Lesson 3 – Gene Expression
Genes in ActionSection 2 Section 2: Regulating Gene Expression Preview Bellringer Key Ideas Complexities of Gene Regulation Gene Regulation in Prokaryotes.
Alessandro Raganelli and Varun Rao.  Prokaryotes and eukaryotes alter gene expression in response to their changing environment  In multicellular eukaryotes,
TRANSCRIPTION (DNA → mRNA). Fig. 17-7a-2 Promoter Transcription unit DNA Start point RNA polymerase Initiation RNA transcript 5 5 Unwound.
Copyright © 2006 Pearson Education, Inc., publishing as Benjamin Cummings Protein Synthesis.
Transcription and RNA processing Fall, Transcription Outline Notes RNA Polymerase Structures Subunits Template versus coding strands Polymerase.
Chapter 15. I. Prokaryotic Gene Control  A. Conserves Energy and Resources by  1. only activating proteins when necessary  a. don’t make tryptophan.
Factors Involved In RNA synthesis and processing Presented by Md. Anower Hossen ID: MS in Biotechnology.
Click to continue How do a few genes build a diversity of body parts? There’s more in the genetic toolkit than just genes! Click your forward cursor to.
CS273A Lecture 6: Gene Regulation II MW 12:50-2:05pm in Beckman B100
CS273A Lecture 7: Genes Enrichment, Gene Regulation I
CS273A Lecture 9: Gene Regulation II
The Human Genome Source Code
Gene Structure.
The Human Genome Source Code
Eukaryotic Gene Regulation
Gene Structure.
Presentation transcript:

CS273A Lecture 5: Genes Enrichment, Gene Regulation I MW  12:50-2:05pm in Beckman B100 Profs: Serafim Batzoglou & Gill Bejerano CAs: Jim Notwell & Sandeep Chinchali http://cs273a.stanford.edu [BejeranoFall14/15]

Announcements http://cs273a.stanford.edu/ Lecture slides, problem sets, etc. Course communications via Piazza Auditors please sign up too PS1 is out. Last Tutorial this Fri. Text processing. Highly recommended! http://cs273a.stanford.edu [BejeranoFall14/15]

TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG Genome Content http://cs273a.stanford.edu [BejeranoFall14/15] 3

Genes long non-coding RNA reverse transcription microRNA rRNA, snRNA, snoRNA

Genes Gene production is conceptually simple Contiguous stretches of DNA transcribe (1 to 1) into RNA Some (coding or non-coding) RNAs are further spliced Some (m)RNAs are then translated into protein (43 to 20+1) Other (nc)RNA stretches just go off to do their thing as RNA The devil is in the details, but by and large – this is it. (non/coding) Gene finding - classical computational challenge: Obtain experimental data Find features in the data (eg, genetic code, splice sites) Generalize from features (eg, predict genes yet unseen) Link to biochemical machinery (eg, spliceosome) http://cs273a.stanford.edu [BejeranoFall14/15]

Coding and non-coding gene production The cell is constantly making new proteins and ncRNAs. These perform their function for a while, And are then degraded. Newly made coding and non coding gene products take their place. To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. The picture within a cell is constantly “refreshing”. http://cs273a.stanford.edu [BejeranoFall14/15]

Cell differentiation To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. That is exactly what happens when cells differentiate during development from stem cells to their different final fates. http://cs273a.stanford.edu [BejeranoFall14/15]

Let’s first ask what is changing? http://cs273a.stanford.edu [BejeranoFall14/15]

Cluster all genes for differential expression Experiment Control (replicates) (replicates) Most significantly up-regulated genes genes Unchanged genes Most significantly down-regulated genes http://cs273a.stanford.edu [BejeranoFall14/15]

Determine cut-offs, examine individual genes Experiment Control (replicates) (replicates) Most significantly up-regulated genes genes Unchanged genes Most significantly down-regulated genes http://cs273a.stanford.edu [BejeranoFall14/15]

Genes usually work in groups Biochemical pathways, signaling pathways, etc. Asking about the expression perturbation of groups of genes is both more appealing biologically, and more powerful statistically (you sum perturbations). http://cs273a.stanford.edu [BejeranoFall14/15]

The Gene Sets to test come from Ontologies TJL-2004

Ask about whole gene sets + Exper. Control Gene set 3 up regulated ES/NES statistic Gene set 2 down regulated - http://cs273a.stanford.edu [BejeranoFall14/15]

Simplest way to ask: Hypergeometric Gene Set 3 Genes measured N = 20,000 Total genes in set 3 K = 11 I’ve picked the top n = 100 diff. expressed genes. Of them k = 8 belong to gene set 3. Under a null of randomly distributed genes, how surprising is it? P-value = Prhyper (k ≥ 8 | N, K, n) + Exper. Control Gene set 3 up regulated A low p-value, as here, suggests gene set 3 is highly enriched among the diff. expressed genes. Now see what (pathway/process) gene set 3 represents, and build a novel testable model around your observations. ES/NES statistic - http://cs273a.stanford.edu [BejeranoFall14/15]

GSEA (Gene Set Enrichment Analysis) Dataset distribution Gene set 1 distribution Gene set 3 distribution Number of genes Gene Expression Level http://cs273a.stanford.edu [BejeranoFall14/15]

Another popular approach: DAVID Input: list of genes of interest (without expression values). http://cs273a.stanford.edu [BejeranoFall14/15]

Multiple Testing Correction run tool Note that statistically you cannot just run individual tests on 1,000 different gene sets. You have to apply further statistical corrections, to account for the fact that even in 1,000 random experiments a handful may come out good by chance. (eg experiment = Throw a coin 10 times. Ask if it is biased. If you repeat it 1,000 times, you will eventually get an all heads series, from a fair coin. Mustn’t deduce that the coin is biased) http://cs273a.stanford.edu [BejeranoFall14/15]

3. RNA-seq “Next” (2nd) generation sequencing. http://cs273a.stanford.edu [BejeranoFall14/15]

What will you test? run tool Also note that this is a very general approach to test gene lists. Instead of a microarray experiment you can do RNA-seq. Advantage: RNA-seq measures all genes(up to your ability to correctly reconstruct them). Microarrays only measure the probes you can fit on them. (Some genes, or indeed entire pathways, may be missing from some microarray designs). http://cs273a.stanford.edu [BejeranoFall14/15]

Single gene in situ hybridization Sall1 http://cs273a.stanford.edu [BejeranoFall14/15]

4. Spatial-temporal maps generation AI: Robotics, Vision http://cs273a.stanford.edu [BejeranoFall14/15]

Cell differentiation To change its behavior a cell can change the repertoire of genes and ncRNAs it makes. But how? http://cs273a.stanford.edu [BejeranoFall14/15]

Closing the loop Some proteins and non coding RNAs go “back” to bind DNA near genes, turning these genes on and off. http://cs273a.stanford.edu [BejeranoFall14/15]

Genes & Gene Regulation Gene = genomic substring that encodes HOW to make a protein (or ncRNA). Genomic switch = genomic substring that encodes WHEN, WHERE & HOW MUCH of a protein to make. [0,1,1,1] B Gene H H Gene N Gene N B Gene [1,0,0,1] [1,1,0,0] http://cs273a.stanford.edu [BejeranoFall14/15]

Transcription Regulation Conceptually simple: The machine that transcribes (“RNA polymerase”) All kinds of proteins and ncRNAs that bind to DNA and to each other to attract or repel the RNA polymerase (“transcription associated factors”). DNA accessibility – making DNA stretches in/accessible to the RNA polymerase and/or transcription associated factors by un/wrapping them around nucleosomes. (Distinguish DNA patterns from proteins they interact with) http://cs273a.stanford.edu [BejeranoFall14/15]

RNA Polymerase Transcription = Copying a segment of DNA into (non/coding) RNA Gene transcription starts at the (aptly named) TSS, or gene transcription start site Transcription is done be RNA polymerase, a complex of 10-12 subunit proteins. There are three types of RNA polymerases in human: RNA pol I synthesizes ribosomal RNAs RNA pol II synthesizes pre-mRNAs and most microRNAs RNA pol III synthesizes tRNAs, rRNA and other ssRNAs TSS RNA Polymerase http://cs273a.stanford.edu [BejeranoFall14/15]

RNA Polymerase is General Purpose RNA Polymerase is the general purpose transcriptional machinery. It generally does not recognize gene transcription start sites by itself, and requires interactions with multiple additional proteins. general purpose context specific http://cs273a.stanford.edu [BejeranoFall14/15]

Terminology Transcription Factors (TF): Proteins that return to the nucleus, bind specific DNA sequences there, and affect transcription. There are 1,200-2,000 TFs in the human genome (out of 20-25,000 genes) Only a subset of TFs may be expressed in a given cell at a given point in time. Transcription Factor Binding Sites: 4-20bp stretches of DNA where TFs bind. There are millions of TF binding sites in the human genome. In a cell at a given point in time, a site can be either occupied or unoccupied. http://cs273a.stanford.edu [BejeranoFall14/15]

Terminology Promoter: The region of DNA 100-1,000bp immediately “upstream” of the TSS, which encodes binding sites for the general purpose RNA polymerase associated TFs, and at times some context specific sites. There are as many promoters as there are TSS’s in the human genome. Many genes have more than one TSS. Enhancer: A region of 100-1,000bp up to 1Mb or more upstream or downstream from the TSS that includes binding sites for multiple TFs. When bound by (the right) TFs an enhancer turns on/accelerates transcription. Note how an enhancer (E) very far away in sequence can in fact get very close to the promoter (P) in space. promoter TSS gene http://cs273a.stanford.edu [BejeranoFall14/15]

TFBS Position Weight Matrix (PWM) Note the strong independence assumption between positions. Holds for most transcription binding profiles in the human genome. http://cs273a.stanford.edu [BejeranoFall14/15]

Promoters http://cs273a.stanford.edu [BejeranoFall14/15]

Enhancers http://cs273a.stanford.edu [BejeranoFall14/15]