Lecture 4. Topics in Gene Regulation and Epigenomics (Basics) The Chinese University of Hong Kong CSCI5050 Bioinformatics and Computational Biology
Lecture outline Introduction to gene regulation and epigenetics Experimental methods for epigenomics Relevant problems in computational biology and bioinformatics Last update: 30-Jan-2018 CSCI5050 Bioinformatics and Computational Biology | Kevin Yip-cse-cuhk | Spring 2018
Part 1: Introduction to Gene Regulation and Epigenetics
Gene regulation: here defined as the control of the amount and type of gene products. Amount: number of transcripts; number of proteins. Products: RNAs (total, a particular transcript isoform, or with a particular modification); proteins (with a particular form, e.g., activated).
Gene “expression”: a general term used to indicate the production of gene products. More specific terms: transcription rate (number of new transcripts per unit time); transcript level (total number of transcripts in the cell); translation rate; protein level. All these are correlated but not identical, sometimes with only weak correlations.
Gene regulation: expression of genes needs to be tightly regulated (differentiation into different cell types; response to environmental conditions). How are genes regulated? Transcriptional; post-transcriptional; translational; post-translational. Analogy: lighting control.
A simple illustration [Figure: genes G1-G4 and proteins P1-P7 with methyl (Me) and acetyl (Ac) marks, labeled with the mechanisms involved: miRNA-mRNA interactions; protein-RNA interactions; transcription factor binding; DNA methylation; protein-protein interactions and DNA long-range interactions; histone modifications; chromatin accessibility]
More details and other mechanisms Transcriptional regulation Transcription factors Binding to promoter vs. distal elements (e.g., enhancers) Activators vs. repressors Post-transcriptional regulation Capping Poly-adenylation Splicing RNA editing mRNA degradation Translation Translational repression Post-translational Protein modifications (e.g., phosphorylation) Image source: http://www.emunix.emich.edu/~rwinning/genetics/eureg.htm
Epigenetics Wikipedia: “the study of heritable changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence” Heritable: Can pass on to offspring (daughter cells) Same DNA, different outcomes But how can these signals be inherited? Still based on DNA sequence (in some complex way) or not?
Heritability D: DNA; E: epigenetic signals; 1 → 2: proliferation, differentiation, fertilization. (D1, E1) → (D2, E2), where D2 = f(D1) ≈ D1 and E2 = f(D1, E1). Open questions: E2 = f(E1)? E0 = f(D0)?
Active and inactive epigenetic signals DNA methylation Chromatin remodeling Histone modifications RNA transcripts ... (And actually not that simple!) Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)
DNA methylation Methyl group (-CH3) added to cytosine in eukaryotic DNA, usually next to a guanine (in a CpG dinucleotide) Less common, but could also be in CpHpG or CpHpH contexts (H = A, C or T) Could be converted to other forms during TET-mediated active de-methylation Image credit: Song et al., Nature Biotechnology 30(11):1107-1116, (2012)
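The three sequence contexts named on this slide can be told apart by looking at the two bases downstream of a cytosine. A minimal sketch, assuming a 0-based index on a single strand; the function name `cytosine_context` and the "unknown" label for ambiguous or truncated contexts are choices made here for illustration:

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at 0-based index `pos` as CpG, CpHpG or CpHpH.

    Only the given strand is examined; H means A, C or T.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1 : pos + 3]  # up to two bases on the 3' side
    if downstream[:1] == "G":
        return "CpG"
    # Too close to the sequence end, or an ambiguous base such as N.
    if len(downstream) < 2 or any(b not in "ACGT" for b in downstream):
        return "unknown"
    # The first downstream base is now known to be H (A, C or T).
    return "CpHpG" if downstream[1] == "G" else "CpHpH"


example = "ACGTCAGCTT"
contexts = {i: cytosine_context(example, i) for i, b in enumerate(example) if b == "C"}
```

Cytosines on the reverse strand would need the same test applied to the reverse complement.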
DNA methylation Hyper-methylation at CpG islands/promoters is associated with gene repression Regulatory function of DNA methylation at other regions is not as clear Recent studies have suggested links between DNA methylation and Protein binding Transcriptional elongation Splicing Histone modifications Gene imprinting: parent-specific expression Implications in diseases De novo vs. maintenance Image source: http://missinglink.ucsf.edu/lm/genes_and_genomes/methylation.html
Trajectory of global DNA methylation changes Image credit: Saitou et al., Development 139:15-31, (2012)
Chromatin remodeling Chromatin: compact structure of DNA and proteins. DNA wraps around histone proteins to form nucleosomes (~146 bp of DNA around each histone octamer). Nucleosome positioning can be changed dynamically, affecting DNA accessibility (e.g., to binding proteins). Image credit: Wikipedia
Histone modifications Modification of specific residues on histone proteins: acetylation, methylation, phosphorylation, ubiquitination, etc. Nomenclature: H3K4me3 (histone protein H3, lysine 4, tri-methylation) Image credit: Kato et al., IBMS BoneKEy 7:314-324, (2010)
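The nomenclature is regular enough to parse mechanically. A minimal sketch, assuming only the four core histones plus H1, single-letter residue codes, and the abbreviations `me`, `ac`, `ph` and `ub`; histone variants and other modification types are deliberately not handled, and the names `HistoneMark` and `parse_mark` are made up for this example:

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str           # e.g. "H3"
    residue: str           # one-letter amino acid code, e.g. "K" for lysine
    position: int          # residue position in the histone protein
    modification: str      # "me", "ac", "ph" or "ub"
    degree: Optional[int]  # 1-3 for mono-/di-/tri-, None if not stated


_MARK = re.compile(r"^(H1|H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ph|ub)([1-3])?$")


def parse_mark(name: str) -> HistoneMark:
    """Split a name such as 'H3K4me3' into its components."""
    match = _MARK.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification, degree = match.groups()
    return HistoneMark(histone, residue, int(position), modification,
                       int(degree) if degree else None)


mark = parse_mark("H3K4me3")
```

Here `mark` describes histone H3, residue K (lysine) at position 4, methylation of degree 3.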
Histone modifications and regulation Histone modifications give different types of signals in gene regulation (again, a simplified view): Image credit: Zhou et al., Nature Reviews Genetics 12(1):7-18, (2011)
Non-coding RNA
There are different types of functional RNA that are not translated into proteins.

Type                   Abbreviation   Function
Ribosomal RNA          rRNA           Translation
Transfer RNA           tRNA           Translation
Small nuclear RNA      snRNA          Splicing
Small nucleolar RNA    snoRNA         Nucleotide modifications
MicroRNA               miRNA          Gene regulation
Small interfering RNA  siRNA          Gene regulation
…
MicroRNA Short (~22 nucleotides) RNAs that regulate gene expression by promoting mRNA degradation or repressing translation Image credit: Wikipedia, Sun et al., Annual Review of Biomedical Engineering 12:1-27, (2010)
Gene regulation and epigenetics Some mechanisms are known to regulate gene expression. For example: Transcription factor binding can activate or repress transcription miRNA-mRNA binding can promote mRNA cleavage or repress translation Some signals are correlated with expression, but the causal direction is not certain (or not fixed). For example: Promoter DNA methylation and transcriptional repression Histone modifications and gene activation/repression The different mechanisms are not independent.
Part 2: Experimental Methods for Epigenomics
From epigenetics to epigenomics Not focusing on a single gene, but the whole genome Measuring signals genome-wide Studying general (statistical) phenomena / (biological) mechanisms
High-throughput methods (recap) Protein-DNA binding (ChIP-seq, ChIP-exo, ...) DNA long-range interactions (ChIA-PET, Hi-C, TCC, ...) DNA methylation (bisulfite sequencing, RRBS, MeDIP-seq, MBDCap-seq, ...) Open chromatin (DNase-seq, FAIRE-seq, ATAC-seq, ...) Histone modifications (ChIP-seq) Gene expression (RNA-seq, CAGE, ...), isoforms Protein-RNA binding (CLIP-Seq, HITS-CLIP, PAR-CLIP, RIP-seq, ...) ...
ChIP-seq Chromatin immunoprecipitation followed by sequencing Use antibody to “pull down” target DNA, such as DNA bound by a certain protein or with a certain chemical modification Image credit: Mardis, Nature Methods 4:613-614, (2007)
ChIP-exo Higher data resolution by adding a digestion step Image credit: Wikipedia, Rhee et al., Cell 147(6):1408-1419, (2011)
Open chromatin ATAC-seq is winning out because it requires fewer cells: as few as 50,000, up to 1,000-fold fewer than MNase-seq or DNase-seq. Image credit: Meyer and Liu, Nature Reviews Genetics 15(11):709-721, (2014)
DNA methylation: BS and oxBS [Table: read-outs of C, 5mC and 5hmC/5fC under DNA-seq, BS and oxBS] Image credit: Booth et al., Nature Protocols 8(10):1841-1851, (2013)
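Because 5mC and 5hmC both read as C after bisulfite (BS) treatment, while only 5mC reads as C after oxidative bisulfite (oxBS) treatment, the 5hmC level at a site can be estimated by subtraction. A minimal sketch, assuming per-site read counts are already available; the function names and the clipping of negative differences to zero are choices made here, not part of the protocol:

```python
from typing import Dict, Tuple


def unconverted_fraction(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at a cytosine that still read as C after treatment."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no read coverage at this site")
    return c_reads / total


def estimate_levels(bs: Tuple[int, int], oxbs: Tuple[int, int]) -> Dict[str, float]:
    """Estimate 5mC and 5hmC levels at one site.

    bs and oxbs are (C reads, T reads) from the BS and oxBS libraries.
    BS level   = 5mC + 5hmC (both resist conversion)
    oxBS level = 5mC only   (5hmC is oxidized to 5fC and then converted)
    """
    bs_level = unconverted_fraction(*bs)
    oxbs_level = unconverted_fraction(*oxbs)
    # Sampling noise can make the difference negative, so clip it at zero.
    return {"5mC": oxbs_level, "5hmC": max(0.0, bs_level - oxbs_level)}


levels = estimate_levels(bs=(80, 20), oxbs=(60, 40))
```

With low coverage the difference of two noisy fractions is unreliable, so a real analysis would add a statistical test per site.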
Part 3: Problems in Computational Biology and Bioinformatics
Some related CBB Problems Analysis of chromatin patterns Identification of regulatory elements [lecture] Reconstruction of transcription factor (TF) regulatory networks Identification of non-coding RNAs Prediction of miRNA targets Construction of gene expression models Inferring epigenetic signals
Analysis of chromatin patterns Computational tasks: Segmentation of the human genome (single bases/fixed-size bins, or based on annotation; unsupervised clustering or supervised classification) Data aggregation and integration Large-scale correlations Learning of signal shapes Visualization
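Segmenting into fixed-size bins is the simplest of these tasks. A minimal sketch, assuming the signal is a per-base list of values for one chromosome and that each bin is summarized by its mean; `bin_signal` is a name chosen for this example:

```python
from typing import List, Sequence


def bin_signal(signal: Sequence[float], bin_size: int) -> List[float]:
    """Average a per-base signal over consecutive fixed-size bins.

    The last bin may be shorter than bin_size and is averaged over
    the bases it actually contains.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    binned = []
    for start in range(0, len(signal), bin_size):
        chunk = signal[start : start + bin_size]
        binned.append(sum(chunk) / len(chunk))
    return binned


binned = bin_signal([1, 1, 3, 3, 5], bin_size=2)
```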
Genome segmentation Using chromatin state to segment the genome Hidden Markov model Clustering Annotate identified states using biological knowledge Image credit: Ernst and Kellis, Nature Methods 9(3):215-216, (2012)
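To make the hidden Markov model idea concrete, the sketch below decodes the most likely state per bin with the Viterbi algorithm. It is a toy under stated assumptions: two hypothetical states, one binarized mark per bin (1 = present), and hand-picked probabilities. Real segmentation tools learn their parameters from many marks at once.

```python
import math
from typing import Dict, List, Sequence


def viterbi(obs: Sequence[int], states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[int, float]]) -> List[str]:
    """Return the most probable state path for a non-empty observation list.

    Works in log space; all probabilities used must be non-zero.
    """
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s.
            best_prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[best_prev] + math.log(trans_p[best_prev][s])
                             + math.log(emit_p[s][o]))
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters, chosen only for illustration.
states = ["active", "inactive"]
start_p = {"active": 0.5, "inactive": 0.5}
trans_p = {"active": {"active": 0.9, "inactive": 0.1},
           "inactive": {"active": 0.1, "inactive": 0.9}}
emit_p = {"active": {1: 0.8, 0: 0.2}, "inactive": {1: 0.1, 0: 0.9}}

segmentation = viterbi([0, 0, 1, 1, 1, 0, 0], states, start_p, trans_p, emit_p)
```

Consecutive bins that share a state form one segment, which can then be annotated using biological knowledge.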
Global chromatin patterns Many recent findings that relate chromatin patterns with other features Global example: histone modifications, recombination rates and chromosome 1D and 3D structures in C. elegans Image credit: Gerstein et al., Science 330(6012):1775-1787, (2010)
Local chromatin patterns Histone modifications and protein binding at promoters and enhancers in human Image credit: Heintzman et al., Nature Genetics 39(3):311-318, (2007)
Identifying regulatory elements There are different types of protein-binding regions in the DNA Promoters Enhancers Silencers Insulators ... How to locate them in the genome? Image credit: Raab and Kamakaka, Nature Reviews Genetics 11(6):439-446, (2010)
Identifying regulatory elements Some useful information: Genomic location (e.g., promoters are around transcription start sites) Evolutionary conservation (functional regions are more conserved) Protein binding signals and motifs (e.g., EP300 at enhancers, CTCF at insulators) Chromatin features (e.g., DNase I hypersensitivity, H3K4me1 and H3K27ac at active enhancers) Reporter assays ... Difficulty: integrating different types of information
Reconstruction of TF network Goals: identifying TF binding sites; determining the target genes of each TF (in different cell types, in different conditions); deducing how gene expression is regulated by TFs; studying how TFs interact with each other. Methods: (1) from expression data; (2) sequence-based (motif analysis); (3) from binding experiments. Sign of regulation (activation vs. repression) is usually not determined for #2 and #3.
Expression-based methods Input: expression levels of genes (usually from microarrays; often time series data) Output: a network (i.e., directed graph) Each node is a gene (and its protein product) An A→B edge means A is a TF and it regulates B Types: Differential equations Probabilistic networks Boolean networks
Expression-based methods: differential equations Models (y_j: expression level of gene j, a_ji: influence of TF i on gene j): linear, sigmoidal, ... Methods: solve the system of equations to get best-fit parameter values. Difficulties: many parameters when there are many TFs; insufficient training data (L1 (LASSO) regularization can control the number of non-zero variables); long running time.
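The slide's own equations did not survive extraction. One common way to write the two model families and the L1-regularized fit is given here as an assumption rather than as the lecture's exact formulation; the bias $b_j$, the logistic function $\sigma$, the time step $\Delta t$ and the regularization weight $\lambda$ are symbols introduced for illustration:

```latex
\begin{align}
\text{Linear:}\quad
  & \frac{dy_j}{dt} = \sum_i a_{ji}\, y_i \\
\text{Sigmoidal:}\quad
  & \frac{dy_j}{dt} = \sigma\!\Big(\sum_i a_{ji}\, y_i + b_j\Big),
    \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
\text{LASSO fit for gene } j\text{:}\quad
  & \min_{a_{j\cdot}} \sum_t \Big(\frac{y_j(t+\Delta t) - y_j(t)}{\Delta t}
      - \sum_i a_{ji}\, y_i(t)\Big)^{2} + \lambda \sum_i \lvert a_{ji} \rvert
\end{align}
```

The penalty term drives many $a_{ji}$ to exactly zero, which is how it limits the number of regulators per gene when time points are few.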
Boolean networks Considering each gene to be either on or off Treat the gene regulatory network as a Boolean network (similar to an electric circuit) Expression of a gene at time t+1 depends on the expression of genes that regulate it at time t Goal: Find the logical relationships between genes Image credit: Akutsu and Miyano, Pacific Symposium on Biocomputing 4:17-28, (1999)
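The update rule on this slide (state at time t+1 is a logical function of the regulators at time t) can be simulated directly. A minimal sketch with three hypothetical genes A, B and C and made-up rules; `is_consistent` illustrates the inference goal by checking whether a candidate rule explains an observed time series:

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Rule = Callable[[State], bool]


def step(state: State, rules: Dict[str, Rule]) -> State:
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(initial: State, rules: Dict[str, Rule], n_steps: int) -> List[State]:
    """Return the trajectory of states, starting with the initial one."""
    trajectory = [dict(initial)]
    for _ in range(n_steps):
        trajectory.append(step(trajectory[-1], rules))
    return trajectory


def is_consistent(gene: str, rule: Rule, trajectory: List[State]) -> bool:
    """Check that rule(state at t) equals the gene's value at t+1 for all t."""
    return all(rule(trajectory[t]) == trajectory[t + 1][gene]
               for t in range(len(trajectory) - 1))


# Hypothetical network: A is a constant input, A activates B,
# and C is on only when A is on and B is off.
rules: Dict[str, Rule] = {
    "A": lambda s: s["A"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and not s["B"],
}

trajectory = simulate({"A": True, "B": False, "C": False}, rules, n_steps=3)
```

In this toy run C switches on for one time step and then off again once B is on, a pattern that the simpler candidate rule "C = A" cannot explain.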
From binding data Input: binding signals of transcription factors in the whole genome Usually from ChIP-chip or ChIP-seq Or from motifs (Best to combine both) Output: TF regulatory network Difficulties: Finding binding sites Peak calling Motif analysis Associating binding sites with target genes Promoters (e.g., 500bp upstream of transcription start site) More difficult for distal binding sites Expression patterns could help Evaluating functional effects of binding (strong vs. weak, transient binding)
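The promoter-based association step can be sketched as follows, assuming each peak is reduced to a single summit coordinate and the promoter is the 500 bp window upstream of the transcription start site (TSS), taking strand into account. The record types, gene names and coordinates are hypothetical:

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


class Peak(NamedTuple):
    tf: str
    chrom: str
    summit: int   # coordinate of the peak summit


def in_promoter(peak: Peak, gene: Gene, upstream: int = 500) -> bool:
    """True if the summit lies within `upstream` bp upstream of the TSS."""
    if peak.chrom != gene.chrom:
        return False
    if gene.strand == "+":
        return gene.tss - upstream <= peak.summit <= gene.tss
    # On the minus strand, upstream means larger coordinates.
    return gene.tss <= peak.summit <= gene.tss + upstream


def build_network(peaks: List[Peak], genes: List[Gene],
                  upstream: int = 500) -> List[Tuple[str, str]]:
    """Return sorted, de-duplicated (TF, target gene) edges."""
    edges = {(p.tf, g.name) for p in peaks for g in genes
             if in_promoter(p, g, upstream)}
    return sorted(edges)


genes = [Gene("geneX", "chr1", 1000, "+"), Gene("geneY", "chr1", 5000, "-")]
peaks = [Peak("TF1", "chr1", 800), Peak("TF1", "chr1", 1200),
         Peak("TF2", "chr1", 5300), Peak("TF2", "chr2", 900)]
edges = build_network(peaks, genes)
```

The nested loop is quadratic; genome-scale data would call for sorted intervals or an interval tree. Distal binding sites are not handled at all, which is the harder part noted on the slide.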
Combining both types of data Use expression data to infer initial network Identify potential regulators Search for binding motifs of these regulators Incorporate global occurrence of these motifs at gene promoters to refine the network Image credit: Tamada et al., Bioinformatics 19(Suppl.2):ii227-ii236, (2003)
Identification of non-coding RNAs High-throughput experiments have recently shown that a vast amount of DNA is transcribed into RNA. What are these transcripts? Experimental artifacts? Unannotated protein-coding genes? Non-functional transcripts? Functional non-coding RNAs? Pseudogenes?
Prediction of miRNA targets Table credit: Peterson et al., Frontiers in Genetics 5:23, (2014)
Seed match Table source: TargetScan
Construction of expression models Given the many different mechanisms involved in gene regulation, how are they related to each other? Are they redundant? Do they simply add to each other, or have synergistic effects? Which have more impact on final expression levels? What are their time scales? When is each mechanism used?
Construction of expression models Modeling and prediction An indirect way to estimate how good a model is: evaluating the accuracy of predicted expression Prediction of: Expression level (Regression: y_i ≈ f(x_i); Classification: (y_i > t) ≈ f(x_i)) Differential expression
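The two prediction settings can be sketched with a single feature. This is a deliberately simple stand-in, assuming one chromatin feature value x_i per gene, an ordinary least-squares line for the regression case, and the same fitted line thresholded at t for the classification case; the published models use many features and other learners, and the numbers below are invented:

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(xs: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Regression: predicted expression level for each gene."""
    return [slope * x + intercept for x in xs]


def classification_accuracy(ys: Sequence[float], predicted: Sequence[float],
                            t: float) -> float:
    """Classification: fraction of genes whose (y > t) label is predicted correctly."""
    hits = sum((y > t) == (p > t) for y, p in zip(ys, predicted))
    return hits / len(ys)


# Invented feature values (x) and expression levels (y) for four genes.
x_train, y_train = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(x_train, y_train)
accuracy = classification_accuracy(y_train, predict(x_train, slope, intercept), t=4.0)
```

Accuracy measured on the training genes, as here, is optimistic; held-out genes or cross-validation give a fairer estimate of how good the model is.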
Construction of expression models Chromatin features and expression Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
Construction of expression models Model construction and accuracy Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
“Histone code” hypothesis The statistical models are good, but too complex for humans to interpret Is there a simple set of rules (i.e., a “code”) that can easily tell the expression level of a gene? Image credit: Cheng et al., Genome Biology 12(2):R15, (2011)
Inferring epigenetic signals Can we infer epigenetic signals (e.g., open chromatin, DNA methylation or histone modifications) from DNA sequence alone? For a specific cell type
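Sequence-based predictors need the DNA turned into numbers first, and one-hot encoding is a common choice for this. A minimal sketch, assuming the column order A, C, G, T and an all-zero row for any other symbol; both are conventions picked here rather than requirements:

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as one row per base.

    Each row has a 1 in the column of that base (order A, C, G, T);
    any other symbol, such as N, becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


encoded = one_hot("acgN")
```

The resulting matrix has one row per position and four columns, which a model can then scan for sequence patterns.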
Deep learning of functional activity Image credit: Kelley et al., Genome Research 26(7):990-999, (2016)
Summary “Gene expression” is a general term with several possible meanings Gene expression is regulated by many mechanisms, including (but not limited to) Transcription factor binding DNA long-range interactions DNA methylation Chromatin structure Histone modifications MicroRNA-mRNA binding A lot of new genome-wide data Many emerging research topics in CBB