

Zhaohui Steve Qin Department of Biostatistics and Bioinformatics Rollins School of Public Health Emory University 3D Chromosome Organization Statistical challenges and opportunities for analyzing Hi-C data

Transcription regulation

However … Long-range chromosomal interactions, transcriptional factory, chimeric events.

Chromosome folding: How can a two-meter-long polymer fit into a nucleus ten micrometers (10^-5 m) in diameter?
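A quick back-of-the-envelope calculation makes the scale of the question concrete. It uses only the two lengths quoted above; the variable names are illustrative.

```python
# Lengths quoted on the slide, in meters.
dna_length_m = 2.0          # total length of the DNA polymer
nucleus_diameter_m = 1e-5   # ten micrometers

# How many nucleus diameters long would the stretched-out DNA be?
ratio = dna_length_m / nucleus_diameter_m
print(f"The DNA is roughly {ratio:,.0f} times longer than the nucleus is wide")
```

The ratio is on the order of 200,000, which is why the polymer must be folded in a highly organized way.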


“… deep things in science are not found because they are useful; they are found because it was possible to find them” -- Robert Oppenheimer

Chromosome Conformation Capture (3C). Dekker et al., Science 2002. Naumova and Dekker, J of Cell Science. Fine scale: (0-kb)

3C-on-chip/Circular 3C (4C), 5C. Naumova and Dekker, J of Cell Science. Fine scale: (0-kb). Intermediate: (0-Mb)

Naumova and Dekker, J of Cell Science. Fine scale: (0-kb). Intermediate: (0-Mb). Whole genome.

[Figure: chromosome-by-chromosome matrix; axis labels chr1 through chr22, chrX, chrY]

What are the main findings?

In Lieberman-Aiden et al.: The genome can be decomposed into compartments A and B. Folding is consistent with a fractal globule, not an equilibrium globule.

In Sexton et al.: The genome is partitioned into physical domains. Domain structure is closely linked to epigenetic activity.

In Dixon et al.: Topological domains. Stable across cell types. Highly conserved across species. Domain boundaries are enriched with insulators.

In Hou et al.: Domain boundaries differ from domain interiors in gene density and in TF and epigenetic factor concentration.

Challenges: Quality control and pre-processing of the reads. Is there any bias in the data, and if so, how should it be normalized? Is it possible to infer the three-dimensional chromosomal structure from Hi-C data, and if so, how?

Hi-C data preprocessing (Imakaev et al. 2012). Figure labels: restriction enzyme cutting site; restriction enzyme cut fragment; random break; self-ligation reads; dangling reads; PCR amplification reads; random breaking reads; valid reads; downstream analysis.
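The read categories named on this slide can be illustrated with a small filter. This is a minimal sketch under stated assumptions, not the pipeline of Imakaev et al.: the `ReadPair` fields, the strand-orientation rules, and the `max_site_dist` threshold are hypothetical choices based on common Hi-C filtering conventions, and positions are assumed to be genome-wide coordinates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    # Hypothetical record for one mapped paired-end read.
    frag1: int       # index of the restriction fragment hit by read 1
    pos1: int        # mapped coordinate of read 1
    strand1: str     # "+" or "-"
    frag2: int       # index of the restriction fragment hit by read 2
    pos2: int        # mapped coordinate of read 2
    strand2: str     # "+" or "-"
    site_dist: int   # summed distance from both reads to their nearest cut sites


def classify(pair, seen, max_site_dist=500):
    """Assign one read pair to a category (assumed rules, see lead-in)."""
    key = (pair.pos1, pair.strand1, pair.pos2, pair.strand2)
    if key in seen:
        return "pcr_duplicate"      # identical coordinates seen before
    seen.add(key)
    if pair.frag1 == pair.frag2:
        # Order the two reads by coordinate, then look at their orientation.
        if pair.pos1 <= pair.pos2:
            left, right = pair.strand1, pair.strand2
        else:
            left, right = pair.strand2, pair.strand1
        if left == "+" and right == "-":
            return "dangling"       # reads face inward: unligated fragment
        if left == "-" and right == "+":
            return "self_ligation"  # reads face outward: circularized fragment
        return "other"
    if pair.site_dist > max_site_dist:
        return "random_break"       # too far from any cut site
    return "valid"                  # kept for downstream analysis


pairs = [
    ReadPair(1, 100, "+", 1, 300, "-", 60),
    ReadPair(2, 500, "-", 2, 700, "+", 60),
    ReadPair(3, 950, "+", 7, 5100, "-", 120),
]
seen_keys = set()
print(Counter(classify(p, seen_keys) for p in pairs))
```

Only pairs labelled `valid` would be passed on; every other label corresponds to one of the discarded read types in the figure.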

Systematic biases in the data (Yaffe and Tanay, 2011): restriction enzyme, GC content, mappability.

Methods for Hi-C Bias Reduction
Normalization (equal ‘visibility’, no assumption on biases)
- Iterative correction and eigenvector decomposition (ICE) (Imakaev et al., 2012)
- Sequential component normalization (SCN) (Cournac et al., 2012)
Correction (posit a statistical model on biases)
- Yaffe & Tanay’s method (Yaffe & Tanay, 2011): fragment level (4KB, ), 420 parameters
- HiCNorm (Hu et al., 2012): any resolution level (1MB, 10^6), 3 parameters
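The "equal visibility" idea behind the normalization methods can be sketched as matrix balancing: rescale a symmetric contact matrix until every locus has the same total number of contacts. The function below is a simplified illustration in the spirit of iterative correction, not the published ICE implementation; the function name, the fixed iteration count, and the toy matrix are assumptions.

```python
def iterative_correction(matrix, n_iter=50):
    """Balance a symmetric contact matrix so that all row sums become equal.

    Returns the corrected matrix and the per-locus bias factors, with
    matrix[i][j] approximately equal to bias[i] * bias[j] * corrected[i][j].
    """
    n = len(matrix)
    w = [[float(x) for x in row] for row in matrix]
    bias = [1.0] * n
    for _ in range(n_iter):
        sums = [sum(row) for row in w]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break  # empty matrix: nothing to balance
        mean = sum(nonzero) / len(nonzero)
        # Relative visibility of each locus; empty rows are left untouched.
        delta = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                w[i][j] /= delta[i] * delta[j]
        bias = [b * d for b, d in zip(bias, delta)]
    return w, bias


raw = [[1, 2, 3],
       [2, 8, 4],
       [3, 4, 5]]
corrected, bias = iterative_correction(raw)
print([round(sum(row), 4) for row in corrected])  # row sums after balancing
```

No explicit model of the biases is needed here, which is the contrast with the correction methods: those posit a statistical model with a fixed number of parameters for the known bias sources.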

Motivation and the key assumption: The number of paired-end reads spanning two loci is inversely proportional to the 3D spatial distance between them (obtained from fluorescence in situ hybridization (FISH)). Lieberman-Aiden et al., 2009

Bayesian statistical model. u_ij: number of reads between loci i and j. d_ij: 3D Euclidean distance between loci i and j. e_i: number of enzyme cut sites in locus i. g_i: mean GC content in locus i. m_i: mean mappability score in locus i.
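The slide's own equation did not survive in the transcript, so the sketch below shows one plausible way the listed quantities could enter a count model: a Poisson read count whose log rate is linear in the log distance and in the logs of the three bias terms. The functional form, the function names, and all coefficient names (`beta0`, `beta1`, `b_enz`, `b_gcc`, `b_map`) are assumptions for illustration, not a transcription of the author's model.

```python
import math


def expected_count(d, e_i, e_j, g_i, g_j, m_i, m_j,
                   beta0, beta1, b_enz, b_gcc, b_map):
    """Assumed Poisson rate for the read count between loci i and j.

    d is the 3D distance between the loci; e, g and m are the cut-site
    count, mean GC content and mean mappability of each locus.
    """
    log_theta = (beta0
                 + beta1 * math.log(d)
                 + b_enz * math.log(e_i * e_j)
                 + b_gcc * math.log(g_i * g_j)
                 + b_map * math.log(m_i * m_j))
    return math.exp(log_theta)


def poisson_log_pmf(u, theta):
    """Log probability of observing u reads under a Poisson(theta) model."""
    return u * math.log(theta) - theta - math.lgamma(u + 1)


# With beta1 = -1 and no bias effects, doubling the distance halves the rate,
# which matches the inverse-proportionality assumption stated earlier.
near = expected_count(1.0, 1, 1, 1, 1, 1, 1, math.log(100), -1.0, 0, 0, 0)
far = expected_count(2.0, 1, 1, 1, 1, 1, 1, math.log(100), -1.0, 0, 0, 0)
print(near / far)
```

In a Bayesian treatment the distances would be derived from unknown 3D coordinates and sampled together with the coefficients; the two functions above only evaluate the likelihood for given values.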

Real Hi-C data from Lieberman-Aiden et al d(L2, L4) = , d(L2, L3) = , significant

mESC: HindIII vs. NcoI

Two compartment model

Whole Chromosome Model (Lieberman-Aiden et al., 2009; Naumova and Dekker, 2010)

Other Features (Chromosome 2). Tracks: compartment, gene density, gene expression, chromatin accessibility, lamina interaction, DNA replication time, H3K36me3, H3K27me3, H3K4me3, H3K9me3, H4K20me3, RNA polymerase II.

References
Hu M, Deng K, Selvaraj S, Qin ZS, Ren B, Liu JS. (2012) HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics.
Hu M, Deng K, Qin ZS, Dixon J, Selvaraj S, Fang J, Ren B, Liu JS. (2012) Bayesian inference of three-dimensional chromosomal organization. PLoS Computational Biology. 9(1):e
Hou C, Li L, Qin ZS, Corces VG. (2012) Gene Density, Transcription and Insulators Contribute to the Partition of the Drosophila Genome into Physical Domains. Mol Cell (with preview article: Xu and Felsenfeld (2012) Order from Chaos in the Nucleus. Mol Cell).
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature.

Acknowledgements Ming Hu Ke Deng Jun S. Liu Jesse Dixon Siddarth Selvaraj Bing Ren Li Chunhui Hou Victor Corces