[Bejerano Fall10/11] 1.

Presentation transcript:

[Bejerano Fall10/11] 1

2 Lecture 13 Cis-Regulation cont’d GREAT

[Bejerano Fall10/11] 3 Gene Regulation gene (how to) control region (when & where) DNA DNA binding proteins RNA gene Protein coding

[Bejerano Fall10/11] 4 Pol II Transcription Key components: Proteins DNA sequence DNA epigenetics Protein components: General Transcription factors Activators Co-activators

[Bejerano Fall10/11] 5 Enhancers

[Bejerano Fall10/11] 6 Vertebrate Gene Regulation gene (how to) control region (when & where) DNA proximal: in 10^3 letters distal: in 10^6 letters DNA binding proteins

[Bejerano Fall10/11] 7 Gene Expression Domains: Independent

[Bejerano Fall10/11] 8 Distal Transcription Regulatory Elements

[Bejerano Fall10/11] 9 Repressors / Silencers

[Bejerano Fall10/11] 10 What are Enhancers?
What do enhancers encode? Surely a cluster of TF binding sites [but TFBS prediction is hard, fraught with false positives]. What else? DNA structure related properties?
So how do we recognize enhancers? Sequence conservation across multiple species [weak but generic].
Repressors: Verifying repressors is trickier [loss vs. gain of function]. How do you tell an enhancer from a repressor? Duh...

[Bejerano Fall10/11] 11 Insulators

[Bejerano Fall10/11] 12 Cis-Regulatory Components
Low level (“atoms”): promoter motifs (TATA box, etc.); transcription factor binding sites (TFBS)
Mid level: promoters; enhancers; repressors/silencers; insulators/boundary elements; cis-regulatory modules (CRM); locus control regions (LCR)
High level: epigenetic domains/signatures; gene expression domains; gene regulatory networks (GRN)

[Bejerano Fall10/11] 13 Disease Implications: Genes genome gene protein Limb Malformation Over 300 genes already implicated in limb malformations.

[Bejerano Fall10/11] 14 Disease Implications: Cis-Reg genome gene NO protein made Limb Malformation Growing number of cases (limb, deafness, etc).

[Bejerano Fall10/11] 15 Transcription Regulation & Human Disease [Wang et al, 2000]

[Bejerano Fall10/11] 16 Critical regulatory sequences Lettice et al. HMG : Single base changes Knock out

[Bejerano Fall10/11] 17 Other Positional Effects [de Kok et al, 1996]

[Bejerano Fall10/11] 18 Genomewide Association Studies point to non-coding DNA

[Bejerano Fall10/11] 19 WGA Disease

[Bejerano Fall10/11] 20 9p21 Cis effects Follow up study:

[Bejerano Fall10/11] 21 Cis-Regulatory Evolution: E.g., Mobile Elements [Yass is a small town in New South Wales, Australia.] Gene What settings make these “co-option” events happen? Gene

[Bejerano Fall10/11] 22 Britten & Davidson Hypothesis: Repeat to Rewire! [Britten & Davidson, 1971] [Davidson & Erwin, 2006]

[Bejerano Fall10/11] 23 Modular: Most Likely to Evolve? Chimp Human

24 Human Accelerated Regions Human-specific substitutions in conserved sequences [Pollard, K. et al., Nature, 2006] [Prabhakar, S. et al., Science, 2008] [Beniaminov, A. et al., RNA, 2008] Human Chimp

25 Generating Functional Hypotheses from Genome-Wide Measurements of Mammalian Cis-Regulation
Gill Bejerano, Dept. of Developmental Biology & Dept. of Computer Science, Stanford University

26 Human Gene Regulation
All these cells have the same genome.
20,000 genes encode how to make proteins.
1,000,000 genomic “switches” determine which proteins to make and how much of each.
An adult human has hundreds of different cell types.

27 Most Non-Coding Elements likely work in cis…
[Figure: a 9Mb region around IRX1, labeled with gene deserts and regulatory jungles]
“IRX1 is a member of the Iroquois homeobox gene family. Members of this family appear to play multiple roles during pattern formation of vertebrate embryos.”
Every orange tick mark is roughly 100-1,000bp long, each evolves under purifying selection, and does not code for protein.

28 Many non-coding elements tested are cis-regulatory

29 Combinatorial Regulatory Code
2,000 different proteins can bind specific DNA sequences. A regulatory region encodes 3-10 such protein binding sites. When all are bound by proteins, the regulatory region turns “on” and the nearby gene is activated to produce protein.
[Figure labels: Gene; Proteins; DNA; Protein binding site]

30 ChIP-Seq: first glimpses of the regulatory genome in action
[Figure labels: Cis-regulatory peak; Peak Calling]

31 What is the transcription factor I just assayed doing?
[Figure labels: Gene transcription start site; Cis-regulatory peak]
Collect known literature of the form:
Function A: Gene1, Gene2, Gene3, ...
Function B: Gene1, Gene2, Gene3, ...
Function C: ...
Ask whether the binding sites you discovered are preferentially binding (regulating) any one or more of the functions listed above. Form a hypothesis and perform further experiments.

32 Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile
ChIP-seq identified 2,429 SRF binding peaks in human Jurkat cells [1].
SRF is known as a “master regulator of the actin cytoskeleton”.
In the ChIP-Seq peaks, we expect to find binding sites regulating (genes involved in) actin cytoskeleton formation.
[Figure labels: Gene transcription start site; SRF binding ChIP-seq peak]
[1] Valouev A. et al., Nat. Methods,

33 Example: inferring functions of Serum Response Factor (SRF) from its ChIP-seq binding profile
Existing, gene-based method to analyze enrichment: ignore distal binding events, count affected genes, rank by enrichment hypergeometric p-value.
Toy example for one ontology term (e.g. ‘actin cytoskeleton’):
N = 8 genes in genome
K = 3 genes annotated with the term
n = 2 genes selected by proximal peaks
k = 1 selected gene annotated with the term
P = Pr(k ≥ 1 | n = 2, K = 3, N = 8)
[Figure labels: Gene transcription start site; SRF binding ChIP-seq peak; Ontology term]
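The toy hypergeometric test on this slide can be evaluated directly. The sketch below is illustrative only: the function name `hypergeom_tail` is an assumption made for this example, not part of any tool mentioned in the lecture, and it simply sums the upper tail of the hypergeometric distribution using the slide's N, K, n and k.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k): n genes drawn without replacement from N genes,
    K of which carry the annotation (hypothetical helper name)."""
    upper = min(n, K)          # cannot hit more annotated genes than drawn or available
    total = comb(N, n)         # number of ways to pick n genes out of N
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Slide values: N = 8 genes, K = 3 annotated, n = 2 selected, k = 1 annotated hit
p_toy = hypergeom_tail(k=1, n=2, K=3, N=8)
print(f"P(k >= 1) = {p_toy:.4f}")
```

With only two selected genes, seeing at least one annotated gene is unsurprising (18 of the 28 possible gene pairs contain one), which is why such a toy case would not count as enriched.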

34 We have (reduced ChIP-Seq into) a gene list! What is the gene list enriched for?
[Figure labels: Microarray data; Deep sequencing data; Microarray tool]
Pro: A lot of tools out there for the analysis of gene lists.
Con: These tools are built for microarray analysis. Does it matter?

35 SRF Gene-based enrichment results
Original authors can only state: “basic cellular processes, particularly those related to gene expression” are enriched [1].
SRF acts on genes both in nucleus and cytoplasm, that are involved in transcription and various types of binding.
Where’s the signal? Top “actin” term is ranked #28 in the list.
[1] Valouev A. et al., Nat. Methods, 2008

36 Associating only proximal peaks loses a lot of information
[Figure: relationship of binding peaks to nearest genes for eight human (H) and mouse (M) ChIP-seq datasets]
Restricting to proximal peaks often leads to complete loss of key enrichments.

37 Bad Solution: Associating distal peaks brings in many false enrichments
SRF ChIP-seq set has 2,000+ binding events. Throw a random set of 2,000 regions at the genome. What do you get from a gene list analysis?

Term                                    Bonferroni corrected p-value
nervous system development              5x10^-9
system development                      8x10^-9
anatomical structure development        7x10^-8
multicellular organismal development    1x10^-7
developmental process                   2x10^-6

Why bad? Regulatory jungles are often next to key developmental genes: 14% of human genes are tagged ‘multicellular organismal development’, but 33% of base pairs have such a gene nearest upstream/downstream.

38 Real Solution: Do not convert to gene list. Analyze the set of genomic regions
Toy example for one ontology term (‘actin cytoskeleton’):
p = 0.33 of genome annotated with the term
n = 6 genomic regions
k = 5 genomic regions hit annotation
P = Pr_binom(k ≥ 5 | n = 6, p = 0.33)
Since 33% of base pairs are near a ‘multicellular organismal development’ gene, we now expect 33% of genomic regions to hit this term by chance. => Toss 2,000 random regions at genome, get NO (false) enrichments.
GREAT = Genomic Regions Enrichment of Annotations Tool
[Figure labels: Gene transcription start site; Gene regulatory domain; Genomic region (ChIP-seq peak)]
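The region-based binomial test from this slide can be worked through numerically. This is a minimal sketch under the slide's toy numbers; `binom_tail` is a name chosen here for illustration and is not taken from GREAT's code.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n
    genomic regions land in annotated territory covering a fraction p of the genome."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Slide values: p = 0.33 of the genome annotated, n = 6 regions, k = 5 hits
p_region = binom_tail(k=5, n=6, p=0.33)
print(f"P(k >= 5) = {p_region:.4f}")
```

Because the null expectation is set by the fraction of base pairs near annotated genes rather than by the fraction of genes, a term with large regulatory domains needs proportionally more hits before it looks enriched.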

39 How does GREAT know how to assign distal binding peaks to genes?
Future: High-throughput assays based on chromosome conformation capture (3C) methods will elucidate complex regulation mechanisms.
Currently: Flexible computational definitions allow assignment of peaks to nearest gene, nearest two genes, etc.
Default: each gene has a “basal regulatory domain” of 5 kb upstream and 1 kb downstream of its transcription start site, which extends to the basal domain of the nearest genes within 1 Mb.
Though some associations may be missed or incorrect, in general signal richness and robustness are greatly improved by associating distal peaks.
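The default rule described on this slide can be sketched for genes on a single chromosome. This is an illustrative simplification, not GREAT's actual implementation: the `Gene` class, the function names, the choice to measure the 1 Mb limit from the transcription start site, and the omission of chromosome ends are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    name: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"

def basal_domain(gene, upstream=5_000, downstream=1_000):
    """Basal domain: 5 kb upstream and 1 kb downstream of the TSS, strand-aware."""
    if gene.strand == "+":
        return max(0, gene.tss - upstream), gene.tss + downstream
    return max(0, gene.tss - downstream), gene.tss + upstream

def regulatory_domains(genes, max_extension=1_000_000):
    """Extend each basal domain toward neighbouring basal domains, capped at
    max_extension from the TSS (assumed interpretation of 'within 1 Mb')."""
    genes = sorted(genes, key=lambda g: g.tss)
    basal = [basal_domain(g) for g in genes]
    domains = {}
    for i, g in enumerate(genes):
        start, end = basal[i]
        # Closest basal-domain edges of the other genes on each side
        left_limit = max((b[1] for b in basal[:i]), default=0)
        right_limit = min((b[0] for b in basal[i + 1:]), default=g.tss + max_extension)
        ext_start = max(left_limit, g.tss - max_extension, 0)
        ext_end = min(right_limit, g.tss + max_extension)
        # Never shrink below the gene's own basal domain
        domains[g.name] = (min(start, ext_start), max(end, ext_end))
    return domains

def genes_for_peak(position, domains):
    """Names of all genes whose regulatory domain contains the peak position."""
    return sorted(n for n, (s, e) in domains.items() if s <= position < e)

genes = [Gene("A", 10_000, "+"), Gene("B", 500_000, "-"), Gene("C", 3_000_000, "+")]
domains = regulatory_domains(genes)
print(domains)
print(genes_for_peak(100_000, domains))    # falls between A and B
print(genes_for_peak(1_800_000, domains))  # more than 1 Mb from B and from C
```

In this sketch a peak lying between two genes is assigned to both, while a peak deep inside a large gene desert is assigned to neither.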

40 GREAT infers many specific functions of SRF from its binding profile
Top GREAT enrichments of SRF (table columns: Ontology; Term; # Genes; Binomial P-value; Experimental support*):
Gene Ontology: actin cytoskeleton; actin binding. Support: Miano et al
Pathway Commons: TRAIL signaling; Class I PI3K signaling. Support: Bertolotto et al; Poser et al
TreeFam: FOS gene family. Support: Chai & Tarnawski 2002
TF Targets: Targets of SRF; Targets of GABP; Targets of YY1; Targets of EGR1. Support: positive control; ChIP-Seq support; Natesan & Gilman
* Known from literature: the function is known, SOME of the genes are known, and the binding sites highlighted are NOT.
Top gene-based enrichments of SRF: top actin-related term is 28th in the list.
Similar results for GABP, NRSF, Stat3, p300 ChIP-Seq [McLean et al., Nat Biotechnol., 2010]

41 GREAT data integrated (Michael Hiller)
Twenty ontologies spanning broad categories of biology. 44,832 total ontology terms tested in each GREAT run.
Terms per ontology: 2,800; 5,215; 834; 5,781; 427; 456; 150; 1,253; 288; 706; 6,700; 3,079; 911; 615; 19; 222; 9; 6,857; 8,272; 238

42 GREAT implementation (Cory McLean)
Can handle datasets of hundreds of thousands of genomic regions.
Testing a single ontology term takes ~1 ms.
Enables real-time calculation of enrichment results for all ontologies.

43 GREAT web app: input page (Dave Bristor)
Pick a genome assembly.
Input BED regions of interest.
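For readers unfamiliar with the input format, the sketch below shows how the first three BED columns (chromosome, 0-based start, end) could be read into region tuples before any enrichment analysis. It is a generic illustration of the BED format, not code from the GREAT web app; `parse_bed` and the sample coordinates are made up for this example.

```python
def parse_bed(lines):
    """Read (chrom, start, end) from BED lines; start is 0-based, end is exclusive.
    Header lines ('track', 'browser') and comments are skipped."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"invalid BED interval: {line!r}")
        regions.append((chrom, start, end))
    return regions

# Hypothetical peaks, for illustration only
bed_text = """track name=example_peaks
chr1\t1000\t1500\tpeak1
chr2\t20000\t20400\tpeak2
"""
regions = parse_bed(bed_text.splitlines())
print(regions)
```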

44 GREAT web app: output summary
Ontology-specific enrichments.
Additional ontologies, term statistics, multiple hypothesis corrections, etc.

45 GREAT web app: term details page
Frame holding definition of “actin binding”.
Genes annotated as “actin binding” with associated genomic regions.
Genomic regions annotated with “actin binding”.
Drill down to explore how a particular peak regulates Plectin and its role in actin binding.

46 You can also submit any track straight from the UCSC Table Browser
A simple, well documented programmatic interface allows any tool to submit directly to GREAT. See our Help. Inquiries welcome!

47 GREAT web app: export data
HTML output displays all user selected rows and columns.
Tab-separated values also available for additional postprocessing.

48 External Web Stats: Catching On (last 500 entries only)

49 Summary
Current technologies identify cis-regulatory sequences.
GREAT accurately assesses functional enrichments of cis-regulatory sequences using a genomic region-based approach [McLean et al., Nat Biotechnol., 2010].
Online tool available (version 1.5 coming soon, in QA).
GREAT is immediately applicable to all sets with a significant cis-regulatory content:
Regulatory Chromatin Markers (e.g., H3K4me1)
Genome Wide Association Studies (GWAS)
Comparative Genomics sets (e.g., ultraconserved elements)

50 Acknowledgments
GREAT developers: Cory McLean, Dave Bristor, Michael Hiller, Shoa Clarke, Craig Lowe, Aaron Wenger, Gill Bejerano
Other help: Fah Sathira, Marina Sirota, Bruce Schaar, Terry Capellini, Christopher Meyer, Jennifer Hardee