Manolis Kellis Broad Institute of MIT and Harvard

Presentation transcript:

Computational personal genomics: selection, regulation, epigenomics, disease Manolis Kellis Broad Institute of MIT and Harvard MIT Computer Science & Artificial Intelligence Laboratory

Understanding human variation and human disease
Disease-associated variant (SNP/CNV/…): CATGACTG vs. CATGCCTG
Gene annotation (Coding, 5’/3’UTR, RNAs) Evolutionary signatures
Non-coding annotation → Chromatin signatures
Roles in gene/chromatin regulation → Activator/repressor signatures
Other evidence of function → Signatures of selection (sp/pop)
Challenge: from loci to mechanism, pathways, drug targets
Goal: A systems-level understanding of genomes and gene regulation:
The regulators: Transcription factors, microRNAs, sequence specificities
The regions: enhancers, promoters, and their tissue-specificity
The targets: TFs → targets, regulators → enhancers, enhancers → genes
The grammars: Interplay of multiple TFs → prediction of gene expression
The parts list = Building blocks of gene regulatory networks

Compare 29 mammals: Reveal constrained positions
NRSF motif
Reveal individual transcription factor binding sites
Within motif instances reveal position-specific bias
More species: motif consensus directly revealed

Chromatin state dynamics across nine cell types
Predicted linking
Correlated activity
Single annotation track for each cell type
Summarize cell-type activity at a glance
Can study 9-cell activity pattern across
Key points to make: Chromatin states enabled us to study the dynamic nature of chromatin across many cell types. By distinguishing 15 different types of chromatin states, we could summarize all significant combinations of 81 different chromatin tracks and 2.4 billion reads in just nine chromatin annotation tracks, one for each cell type. For example, the same gene (WLS) is ‘poised’ in embryonic stem cells (ES), repressed in three other cell types (K562, blood, and liver), and active in the other five cell types. This allows us to now define ‘vectors’ of activity for each region of the genome, based on the chromatin annotation in the nine cell types.
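The activity ‘vectors’ described on this slide can be illustrated with a minimal Python sketch. It assumes a simplified set of state labels, a hypothetical second region, and Pearson correlation as the similarity measure; none of these come from the slides, which do not say how correlated activity was scored.

```python
from math import sqrt

# Assumed set of labels that count as "active"; the slides mention 15 states
# but do not list them, so these names are illustrative only.
ACTIVE_STATES = {"active_promoter", "strong_enhancer", "transcribed"}

def activity_vector(states):
    """Turn per-cell-type state labels into a 0/1 activity vector."""
    return [1 if s in ACTIVE_STATES else 0 for s in states]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# A WLS-like pattern from the slide: poised in one cell type, repressed in
# three, active in five. The cell-type order is arbitrary.
gene_states = ["poised"] + ["repressed"] * 3 + ["active_promoter"] * 5
# Hypothetical candidate enhancer with the same on/off pattern.
enhancer_states = ["repressed"] * 4 + ["strong_enhancer"] * 5

gene_vec = activity_vector(gene_states)
enhancer_vec = activity_vector(enhancer_states)
correlation = pearson(gene_vec, enhancer_vec)
```

A region whose vector tracks a gene's vector closely across cell types would be a candidate for linking; a real analysis would also need a distance constraint and a significance threshold.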

Revisiting disease-associated variants
Disease-associated SNPs enriched for enhancers in relevant cell types
E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator

HaploReg: Automate search for any disease study (compbio.mit.edu/HaploReg)
Start with any list of SNPs or select a GWA study
Mine publicly available ENCODE data for significant hits
Hundreds of assays, dozens of cells, conservation, motifs
Report significant overlaps and link to info/browser
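As a rough illustration of what "significant overlaps" between a SNP list and an annotation track could mean, the sketch below counts SNPs that fall inside enhancer intervals and scores the count with a one-sided binomial test. This is not HaploReg's actual procedure, which the slides do not describe; the coordinates, the single-chromosome simplification, and the background fraction are all made-up assumptions.

```python
from math import comb

def overlaps(pos, intervals):
    """True if pos falls in any half-open [start, end) interval."""
    return any(start <= pos < end for start, end in intervals)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical enhancer intervals and SNP positions on a single chromosome.
enhancers = [(100, 200), (500, 650)]
snps = [120, 150, 300, 510, 900]

hits = sum(overlaps(pos, enhancers) for pos in snps)

# Assumed fraction of the genome covered by this annotation in this cell type.
background_fraction = 0.1
p_value = binomial_tail(hits, len(snps), background_fraction)
```

In practice the background would need to account for linkage disequilibrium and for the non-uniform placement of SNPs on genotyping arrays, so a plain binomial test like this one overstates significance.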

Experimental dissection of regulatory motifs for 10,000s of human enhancers
54000+ measurements (x2 cells, 2x repl)

Example activator: conserved HNF4 motif match
WT expression specific to HepG2
Motif match disruptions reduce expression to background
Non-disruptive changes maintain expression
Effect of random changes depends on their effect on the motif match

Allele-specific chromatin marks: cis-vs-trans effects
Maternal and paternal GM12878 genomes sequenced
Map reads to phased genome, handle SNPs and indels
Correlate activity changes with sequence differences

Brain methylation in 750 Alzheimer patients/controls
500,000 methylation probes, 750 individuals
Brad Bernstein, REMC mapping
Phil de Jager, Roadmap disease epigenomics
[Figure labels: Genome, Epigenome, meQTL, Phenotype, Classification, MWAS]
10+ years of cognitive evaluations, post-mortem brains
93% of functional epigenomic variation is genotype driven!
Global repression in 7,000 enhancers, brain-specific targets

Global hyper-methylation in 1000s of AD-associated loci
[Figure: methylation and P-value for 480,000 probes, ranked by Alzheimer’s association; top 7000 probes marked]
Alzheimer’s-associated probes are hypermethylated
Global effect across 1000s of probes
Rank all probes by Alzheimer’s association
7000 probes increase methylation (repressed)
Enriched in brain-specific enhancers
Near motifs of brain-specific regulators
Complex disease: genome-wide effects
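The ranking step on this slide can be sketched in a few lines of Python. The probe records below are toy values, and the field names (`p_value`, `delta` for the case-minus-control methylation difference) are hypothetical; the slides do not specify the association test or the data layout.

```python
def top_probes(probes, n):
    """Return the n probes with the smallest association p-values."""
    return sorted(probes, key=lambda probe: probe["p_value"])[:n]

def fraction_hypermethylated(probes):
    """Fraction of probes with higher methylation in cases than in controls."""
    if not probes:
        return 0.0
    return sum(1 for probe in probes if probe["delta"] > 0) / len(probes)

# Toy data: delta = mean methylation in patients minus mean in controls.
probes = [
    {"id": "probe_a", "p_value": 1e-8, "delta": 0.04},
    {"id": "probe_b", "p_value": 0.30, "delta": -0.01},
    {"id": "probe_c", "p_value": 1e-6, "delta": 0.02},
    {"id": "probe_d", "p_value": 0.75, "delta": 0.00},
    {"id": "probe_e", "p_value": 1e-5, "delta": -0.03},
]

top = top_probes(probes, 3)
top_ids = [probe["id"] for probe in top]
hyper_in_top = fraction_hypermethylated(top)
hyper_overall = fraction_hypermethylated(probes)
```

Comparing the hypermethylated fraction among top-ranked probes with the fraction over all probes is one simple way to see the directional bias the slide describes.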

Covers computational challenges associated with personal genomics:
- genotype phasing and haplotype reconstruction → resolve mom/dad chromosomes
- exploiting linkage for variant imputation → co-inheritance patterns in human population
- ancestry painting for admixed genomes → result of human migration patterns
- predicting likely causal variants using functional genomics → from regions to mechanism
- comparative genomics annotation of coding/non-coding elements → gene regulation
- relating regulatory variation to gene expression or chromatin → quantitative trait loci
- measuring recent evolution and human selection → selective pressure shaped our genome
- using systems/network information to decipher weak contributions → combinatorics
- challenge of complex multi-genic traits: height, diabetes, Alzheimer's → 1000s of genes

Personal genomics today: 23 and We
Recombination breakpoints
Family Inheritance: Me vs. my brother
[Figure labels: My dad, Dad’s mom, Mom’s dad]
Human ancestry
Disease risk
Genomics: Regions → mechanisms → drugs
Systems: genes → combinations → pathways

Personal genomics tomorrow: Already 100,000s of complete genomes
Health, disease, quantitative traits:
- Genomics regions → disease mechanism, drug targets
- Protein-coding → cracking regulatory code, variation
- Single genes → systems, gene interactions, pathways
Human ancestry:
- Resolve all of human ancestral relationships
- Complete history of all migrations, selective events
- Resolve common inheritance vs. trait association
What’s missing is the computation:
- New algorithms, machine learning, dimensionality reduction
- Individualized treatment from 1000s genes, genome
- Understand missing heritability
- Reveal co-evolution between genes/elements
- Correct for modulating effects in GWAS

Collaborators and Acknowledgements
Chromatin state dynamics: Brad Bernstein, ENCODE consortium
Methylation in Alzheimer’s disease: Phil de Jager, Brad Bernstein, Epigenome Roadmap
Mammalian comparative genomics: Kerstin Lindblad-Toh, Eric Lander, 29 mammals consortium
Massively parallel enhancer reporter assays: Tarjei Mikkelsen, Broad Institute
Funding: NHGRI, NIH, NSF, Sloan Foundation

MIT Computational Biology group Compbio.mit.edu Mike Lin Ben Holmes Soheil Feizi Angela Yen Luke Ward Bob Altshuler Mukul Bansal Chris Bristow Stefan Washietl Pouya Kheradpour Matt Eaton Manolis Kellis Jason Ernst Irwin Jungreis Rachel Sealfon Jessica Wu Daniel Marbach Louisa DiStefano Dave Hendrix Loyal Goff Sushmita Roy Stata3 Stata4

Human constraint outside conserved regions
[Figure: average diversity (heterozygosity) in active regions, aggregated over the genome]
Conserved regions: Non-ENCODE regions show increased diversity → Loss of constraint in human when biochemically inactive
Non-conserved regions: ENCODE-active regions show reduced diversity → Lineage-specific constraint in biochemically active regions
Ward and Kellis, Science 2012
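To make "average diversity (heterozygosity)" concrete, here is a small Python sketch that uses the textbook expected heterozygosity 2p(1-p) per biallelic site and averages it within a class of regions. The allele frequencies are invented, and the slide does not state which estimator the cited study used, so treat the formula choice as an assumption.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)

def mean_heterozygosity(freqs):
    """Average expected heterozygosity over a collection of sites."""
    if not freqs:
        raise ValueError("need at least one site")
    return sum(heterozygosity(p) for p in freqs) / len(freqs)

# Made-up allele frequencies for sites in two classes of non-conserved regions.
active_sites = [0.10, 0.05]     # biochemically active (hypothetical)
inactive_sites = [0.40, 0.50]   # biochemically inactive (hypothetical)

active_diversity = mean_heterozygosity(active_sites)
inactive_diversity = mean_heterozygosity(inactive_sites)
```

Lower average heterozygosity in one class of regions than in a matched background is the kind of signal the slide reads as evidence of constraint; a real comparison would also control for mutation rate and sequence context.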