Big Data Opportunities and Challenges in Human Disease Genetics & Genomics
Manolis Kellis
Broad Institute of MIT and Harvard
MIT Computer Science & Artificial Intelligence Laboratory
Big data Opportunities & Challenges in human disease genetics & genomics
The goal: mechanistic basis of human disease
  Epigenomics: enhancers, networks, regulators, motifs
  Genetics: GWAS, QTLs, molecular epidemiology
The challenges / opportunities:
  Effects are very small; huge number of hypotheses
  Much larger cohorts are needed; consent limitations
  Technologies for privacy vs. excuse for data hoarding
Overcoming the challenges:
  Case study: Schizophrenia, Alzheimer's
  Collaboration & sharing: personal & technological
Bridging the knowledge gap from genetics to disease
[Figure: path from a genetic variant (CATGACTG vs. CATGCCTG) and the environment to disease, via chromatin states (promoter, enhancer, insulator, silencer), circuitry (control regions, factors, target genes: protein, miRNA, ncRNA, e.g. TIMP3), tissues and cell types (retina, heart, cortex, lung, blood, skin, nerve), and intermediate effects (lipids, tension, eye drusen, metabolism, drug response)]
Requires: systematic understanding of genome function
The most complete map of human gene regulation
2.3M regulatory elements across 127 tissue/cell types
High-resolution map of individual regulatory motifs
Circuitry: regulators → regions → motifs → target genes
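The circuitry idea (regulators act on regions through motifs, and regions are linked to target genes) can be pictured as a small data structure. The sketch below is illustrative only: the `Region` class, `build_circuitry` function, and all regulator, motif, region, and gene names are hypothetical, and it simply assumes a regulator reaches a gene whenever one of its motifs occurs in a region linked to that gene.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical regulatory region (e.g. an enhancer) with motif instances and linked genes."""
    name: str
    motifs: set = field(default_factory=set)
    target_genes: set = field(default_factory=set)


def build_circuitry(regulator_motifs, regions):
    """Map each regulator to target genes via regulator -> region (through motif) -> gene."""
    circuitry = {regulator: set() for regulator in regulator_motifs}
    for regulator, motifs in regulator_motifs.items():
        for region in regions:
            # Assumption: any shared motif means the regulator can act on this region.
            if region.motifs & motifs:
                circuitry[regulator] |= region.target_genes
    return circuitry


regulator_motifs = {"REG_A": {"MOTIF_1"}, "REG_B": {"MOTIF_2"}}
regions = [
    Region("enh1", {"MOTIF_1"}, {"GENE_X"}),
    Region("enh2", {"MOTIF_1", "MOTIF_2"}, {"GENE_Y"}),
    Region("enh3", {"MOTIF_3"}, {"GENE_Z"}),
]
circuitry = build_circuitry(regulator_motifs, regions)
```

A real map would weight links by motif strength and tissue-specific activity; plain sets keep the sketch readable.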
Non-coding variants lie in tissue-specific regulatory regions
Yield new insights on relevant tissues and pathways
Enable linking non-coding elements to relevant target genes
Provide a mechanistic basis for developing therapeutics
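One common way to ask which tissue's regulatory regions are enriched for disease variants is a one-sided hypergeometric test per tissue. This is a minimal sketch under that assumption; the slides do not state which test was used, and the function names and toy SNP sets are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def tissue_enrichment(disease_snps, tissue_regions, background_size):
    """Return (tissue, overlap, fold enrichment, p-value), sorted by p-value.

    disease_snps: set of SNP ids associated with the disease.
    tissue_regions: tissue -> set of background SNP ids in that tissue's regulatory regions.
    background_size: total number of SNPs considered.
    """
    n = len(disease_snps)
    results = []
    for tissue, region_snps in tissue_regions.items():
        K = len(region_snps)
        k = len(disease_snps & region_snps)
        fold = (k / n) / (K / background_size)
        results.append((tissue, k, fold, hypergeom_sf(k, background_size, K, n)))
    return sorted(results, key=lambda r: r[3])


# Toy data: SNP ids 0..999; 20 disease SNPs, half inside "brain" regions.
tissues = {"brain": set(range(100)), "liver": set(range(900, 1000))}
disease = set(range(10)) | set(range(500, 510))
ranked = tissue_enrichment(disease, tissues, 1000)
```

Exact integer arithmetic via `math.comb` avoids underflow for toy sizes; genome-scale analyses would typically use a statistics library or log-space sums instead.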
Control regions harbor 1000s of weak-effect disease SNPs
Top GWAS hits explain only a small fraction of trait heritability
Functional enrichments extend well past the genome-wide significance threshold
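To see enrichment "past genome-wide significance", one can split SNPs into disjoint p-value bins and compute fold enrichment for regulatory overlap in each bin separately. The sketch below assumes per-SNP (p-value, in-region) pairs and a genome-wide background fraction; the bin edges, function name, and toy numbers are hypothetical, not values from the talk.

```python
def binned_enrichment(snps, edges, background_fraction):
    """Fold enrichment of regulatory-region overlap in disjoint p-value bins.

    snps: list of (p_value, in_regulatory_region) pairs.
    edges: increasing bin edges; bin i covers edges[i] <= p < edges[i + 1].
    background_fraction: fraction of all SNPs that fall in regulatory regions.
    """
    results = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [in_region for p, in_region in snps if lo <= p < hi]
        if not in_bin:
            results.append(((lo, hi), None))  # no SNPs fall in this bin
            continue
        fraction = sum(in_bin) / len(in_bin)
        results.append(((lo, hi), fraction / background_fraction))
    return results


toy_snps = (
    [(1e-9, True), (1e-10, True)]                                 # genome-wide significant
    + [(1e-6, True), (2e-6, False)]                               # suggestive
    + [(1e-4, True), (2e-4, False), (3e-4, False), (4e-4, False)]
    + [(0.5, False)] * 9 + [(0.6, True)]                          # null-like
)
edges = [0.0, 5e-8, 1e-5, 1e-3, float("inf")]
folds = binned_enrichment(toy_snps, edges, background_fraction=0.1)
```

Using disjoint bins, rather than cumulative thresholds, shows that the sub-threshold SNPs are themselves enriched and are not just riding on the top hits.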
Bayesian integration of weak effects → disease modules
[Figure legend: disease SNP, genetic association, disease gene, highly ranked SNP nearby, poorly ranked SNP nearby]
For a type 1 diabetes (T1D) dataset in dbGaP, our model also identifies a few SNPs and genes relevant to the disease. Here, the model marks the MAZ regulator as relevant: MAZ is not near any significant SNP in the study, but it is important for connecting the disease modules.
MAZ: no direct association, but clusters with many T1D hits
MAZ is indeed a known regulator of insulin expression
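The MAZ example (a gene with no direct association that is pulled in by its many associated neighbors) can be illustrated with a generic network model. The sketch below swaps in a mean-field approximation to a pairwise Markov random field over genes; it is not the model used in the talk, and the prior, coupling strength, gene names, and Bayes factors are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def module_posteriors(log_bf, edges, prior_logit=-3.0, coupling=1.5, n_iter=100):
    """Mean-field estimate of P(gene belongs to the disease module).

    log_bf: gene -> log Bayes factor from nearby SNP associations (0 = no evidence).
    edges: undirected gene-gene links (hypothetical, e.g. shared regulation).
    """
    neighbors = {g: [] for g in log_bf}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    p = {g: sigmoid(prior_logit + log_bf[g]) for g in log_bf}
    for _ in range(n_iter):
        # Synchronous update: local field = prior + own evidence
        # + coupling * expected number of neighbors in the module.
        p = {
            g: sigmoid(prior_logit + log_bf[g] + coupling * sum(p[h] for h in neighbors[g]))
            for g in log_bf
        }
    return p


# "HUB" has no direct evidence but links to four moderately associated genes;
# "ISO" has no evidence and no links.
log_bf = {"HUB": 0.0, "A": 3.0, "B": 3.0, "C": 3.0, "D": 3.0, "ISO": 0.0}
edges = [("HUB", g) for g in "ABCD"]
post = module_posteriors(log_bf, edges)
```

In this toy network the hub ends up with a much higher posterior than the isolated gene even though both have zero direct evidence, which is the qualitative behavior the slide describes for MAZ.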
Brain methylation changes in Alzheimer's patients
Cohorts: ROS (Religious Orders Study) + MAP (Memory and Aging Project); dorsolateral prefrontal cortex (PFC)
Data: genotype (1M SNPs x 700 individuals), methylation (450k probes x 700 individuals), reference chromatin states
Variation in methylation patterns is largely genotype-driven
Global signature of repression in 1000s of regulatory regions: hypermethylation, enhancer states, brain regulator targets
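"Genotype-driven" methylation variation is usually established by meQTL tests: regressing a CpG probe's methylation level on SNP dosage across individuals. The sketch below shows only the core single-pair regression; real pipelines add covariates (age, sex, cell composition, batch) and multiple-testing correction, which are omitted here. The function name and inputs are hypothetical.

```python
import math


def qtl_test(dosage, methylation):
    """Simple linear-regression meQTL test of one CpG probe against one SNP.

    dosage: genotype dosages (0, 1, 2 copies of the alternate allele) per individual.
    methylation: methylation levels (e.g. beta values) for the same individuals.
    Returns (slope, R^2, t statistic with n - 2 degrees of freedom).
    """
    n = len(dosage)
    mx = sum(dosage) / n
    my = sum(methylation) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in methylation)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, methylation))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return slope, r * r, t
```

For example, `qtl_test(dosages, betas)` on 700 individuals returns the per-allele shift in methylation, the variance explained, and a t statistic; running it over all SNP-probe pairs near each probe is the brute-force version of cis-meQTL mapping.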
Scaling of QTL discovery power with sample size
Number of meQTLs continues to increase linearly
Weak-effect meQTLs: median R² < 0.1 after 400 individuals
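Why discoveries keep accumulating: the power to detect a QTL depends on both its variance explained (R²) and the sample size n, so each increase in n brings a new tier of weaker effects over the threshold. A standard approximation uses the Fisher z-transform of the correlation. The significance level below (5e-8) is an assumption for illustration; meQTL studies often use different, study-specific thresholds.

```python
from math import atanh, sqrt
from statistics import NormalDist


def qtl_power(r2, n, alpha=5e-8):
    """Approximate power to detect a QTL explaining a fraction r2 of variance.

    Uses atanh(r) * sqrt(n - 3), which is approximately normal with unit variance.
    Only the upper tail is counted; the lower-tail contribution is negligible here.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_effect = atanh(sqrt(r2)) * sqrt(n - 3)
    return nd.cdf(z_effect - z_crit)


for n in (100, 400, 700, 2000):
    print(n, round(qtl_power(0.05, n), 3))
```

Under these assumptions, an effect with R² = 0.05 is usually missed at 400 individuals but is almost always found at 2,000, consistent with the slide's point that weak-effect meQTLs dominate as samples grow.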
Inflection point in complex trait GWAS
[Figure: schizophrenia GWAS results at successive data freezes: incl. SWE + CLOZUK (~60K), WCPG Hamburg 2012 (~65K), freeze Jan. 2013 (~70K), freeze May 2013 (~80K), incl. replication (~100K)]
Schizophrenia GWAS: number of significant loci
3,500 cases: 0 loci
10,000 cases: 5 loci
35,000 cases: 62 loci!
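A quick way to quantify the inflection is the local scaling exponent d(log loci)/d(log cases) between consecutive sample sizes. The sketch below uses only the three schizophrenia numbers above; the 3,500-case point is skipped because log(0) is undefined. The function name is hypothetical.

```python
from math import log


def loglog_slopes(points):
    """Local scaling exponent between consecutive (cases, loci) points with loci > 0."""
    nonzero = [(cases, loci) for cases, loci in points if loci > 0]
    return [
        (c0, c1, log(l1 / l0) / log(c1 / c0))
        for (c0, l0), (c1, l1) in zip(nonzero, nonzero[1:])
    ]


scz = [(3_500, 0), (10_000, 5), (35_000, 62)]
slopes = loglog_slopes(scz)
```

Going from 10,000 to 35,000 cases (3.5x) multiplied the locus count by 12.4, an exponent of about 2: loci grew roughly with the square of the number of cases over that range, rather than in proportion to it.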
Similar inflection point found in every complex trait!
Significantly associated regions (p < 5e-08) by sample-size multiple
Columns: adult height (per 5000/5000), Crohn's (per 1000/1000), schizophrenia (per 3000/3000)
1x: 2, 1
2x: 4
3x: 7, 5, 6
9x: 68, 51, 62
18x: 180, -
Same story in: type 1 diabetes, type 2 diabetes, serum cholesterol level, every common chronic disease
Larger samples lead to new biological insights:
  Proof that schizophrenia is a heritable, medical disorder
  Genetic architecture similar to non-brain diseases and traits
  Many genes → recognition of key pathways and processes
    Voltage-gated calcium channels (CACNA1C, CACNA1D, CACNA1I, CACNB2)
    Proteins interacting with FMRP, the fragile X gene product
    Neuron organization: postsynaptic density, dendritic spine heads
    Enhancers: brain (angular gyrus, inferior temporal lobe), immune
(Eric Lander)
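One way to see why every complex trait shows an inflection: if a trait has many loci with small effects, most sit just below the detection threshold at modest sample sizes and then cross it together as n grows. The expected number of discoveries is the sum of per-locus power. The architecture below (2,000 loci with exponentially distributed R², mean 1e-4) is entirely hypothetical, and the quantitative-trait power approximation stands in for a proper case-control calculation based on effective sample size.

```python
import random
from math import atanh, sqrt
from statistics import NormalDist


def expected_loci(effects_r2, n, alpha=5e-8):
    """Expected number of loci reaching significance: the sum of per-locus power."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return sum(nd.cdf(atanh(sqrt(r2)) * sqrt(n - 3) - z_crit) for r2 in effects_r2)


# Hypothetical architecture: 2,000 loci, each explaining a tiny share of variance.
rng = random.Random(1)
effects = [rng.expovariate(1 / 1e-4) for _ in range(2000)]  # mean R^2 = 1e-4

for n in (5_000, 20_000, 50_000, 100_000):
    print(n, round(expected_loci(effects, n), 1))
```

Under this toy model, essentially nothing is found at 5,000 samples, a handful of loci appear around 50,000, and on the order of a hundred at 100,000: the same "nothing, then suddenly many" pattern as in the table, driven purely by many weak effects crossing the threshold.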
Big data Opportunities & Challenges in human disease genetics & genomics
The goal: mechanistic basis of human disease
  Epigenomics: enhancers, networks, regulators, motifs
  Genetics: GWAS, QTLs, molecular epidemiology
The challenges / opportunities:
  Effects are very small; huge number of hypotheses
  Much larger cohorts are needed; consent limitations
  Technologies for privacy vs. excuse for data hoarding
Overcoming the challenges:
  Collaboration, consortia, sharing of datasets
  Case study: Schizophrenia, Alzheimer's