A Connected Digital Biomedical Research Enterprise with Big Data

Presentation transcript:

A Connected Digital Biomedical Research Enterprise with Big Data Belinda Seto, Ph.D. Deputy Director National Eye Institute

What is it? Digital research assets: data, workflows, publications, software. To connect these assets: unique identifiers or tags, annotation, community-developed standards, interfaces.

Benefits: increase scientific productivity; enhance collaborations; foster creativity (new tools, algorithms, methods, modeling); enable new discoveries; improve interoperability; facilitate reproducibility.

Gene Expression Data: Volume, Velocity, Variety. [Figure: distribution of the number and types of selected studies released by GEO each year since inception.] Users can explore and download historical submission numbers using the ‘history’ page at http://www.ncbi.nlm.nih.gov/geo/summary/?type=history, and can construct GEO DataSet database queries for specific data types and date ranges using the ‘DataSet type’ and ‘publication date’ fields as described at http://www.ncbi.nlm.nih.gov/geo/info/qqtutorial.html. Published by Oxford University Press 2012. Barrett T et al. Nucl. Acids Res. 2013;41:D991-D995.

Gene Expression Omnibus: a public repository (NLM) of microarray, next-generation sequencing, and functional genomic data, with a web-based interface and apps for query and data download.

Myriad Data Types. Within biomedical research there are many data types: genomic, other ‘omic, imaging, phenotypic, exposure, and clinical. We are victims of our own success: data production outstrips data handling and analysis. Major long-term changes are needed.

Making Big Data Functional. Engender an interdisciplinary approach to data collection and analysis by integrating scientific, algorithmic, and computational work. Drive functional data collection and analysis that has practical value in determining risk alleles.

Integration of Data. Opportunities: understanding biology across scales, from molecules to populations. Challenges: need for access to primary and processed data, machine-readable metadata, and tools to reduce dimensionality.

Integration of Disparate Data Types: Brain Images with Genomic

Brain measures versus epidemiological studies. Using epidemiological studies to find genetic variants that directly affect the brain is difficult: it may require 10,000-30,000 people, e.g., the Psychiatric Genetics Consortium studies. Gene variants (SNPs) may affect brain measures directly, and many brain measures relate to disease status, so searching with brain measures may be easier.

Finding Genetic Variants Influencing Brain Structure. [Figure: example sequences CTAGTCAGCGCT and CTAGTAAGCGCT, which differ at a single SNP; genotypes C/C, A/C, and A/A plotted against intracranial volume. Labels: SNP, genotype, phenotype, association.] The way you do this is relatively simple: obtain a large group of subjects and, for each person, measure a phenotype. You then obtain DNA, find a specific location in the genome, and determine the letter, or genotype, of each person. You then test whether there is a relationship between the genotype and the phenotype. In this toy example, there is a clear additive effect of the A allele on intracranial volume: the more A alleles you have, the bigger your intracranial volume.
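The additive test described on this slide can be sketched as a simple regression of the phenotype on the number of A alleles each person carries. This is a minimal illustration, not the analysis used in the studies cited in this talk: the function name, the toy genotype counts, and the volume values are all hypothetical, and a real analysis would also adjust for covariates such as age and sex.

```python
import numpy as np

def additive_association(allele_counts, phenotype):
    """Least-squares fit of phenotype = intercept + beta * allele_count."""
    x = np.asarray(allele_counts, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    # Design matrix: a column of ones (intercept) and the allele count.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, beta = coef
    return float(intercept), float(beta)

# Hypothetical toy data: number of A alleles (C/C = 0, A/C = 1, A/A = 2)
# and intracranial volume in arbitrary units.
allele_counts = [0, 0, 1, 1, 2, 2]
volumes = [1400.0, 1420.0, 1450.0, 1470.0, 1500.0, 1520.0]

intercept, beta = additive_association(allele_counts, volumes)
print(f"estimated change in volume per A allele: {beta:.1f}")
```

A positive slope corresponds to the pattern in the toy example: each additional A allele is associated with a larger intracranial volume.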

Genome-Wide Association Studies (GWAS). Identify loci for phenotypes or diseases using genotyping arrays that cover the entire genome. Study the association of polymorphisms with complex human traits. Combine results by meta-analysis across multiple studies.
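The slide does not say how results are combined across studies. One common choice is a fixed-effect, inverse-variance-weighted meta-analysis of the per-study effect estimates for a SNP, sketched below under that assumption. The function name and the example numbers are hypothetical.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance-weighted pooling of per-study effect estimates."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value

# Hypothetical effect estimates for one SNP from three studies.
pooled_beta, pooled_se, p_value = fixed_effect_meta(
    betas=[0.12, 0.08, 0.15],
    standard_errors=[0.05, 0.04, 0.07],
)
print(pooled_beta, pooled_se, p_value)
```

Pooling shrinks the standard error below that of any single study, which is why meta-analysis helps small per-SNP effects reach significance.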

Genome-wide Association Study. One SNP: the “candidate gene” approach, e.g., BDNF. Genome-wide: screening 500,000 to 2,000,000 SNPs. [Figure: -log10(P-value) for association with intracranial volume plotted against position along the genome; genotypes C/C, A/C, A/A.] The 3-billion-base-pair genome has millions of variants. Genome-wide association provides a means of studying a large portion of the common variation: it looks at millions of SNPs at a time, testing each individually for association with a phenotype of interest. Importantly, it is an unbiased search in which you do not put in your hopes and desires about the genome; the genome guides you to the area of association. The question is: where in the genome is a variant associated with a trait? P-values < 5x10^-8 are needed to achieve genome-wide significance. An NIH-funded database of genotypes and phenotypes enables searches to find where in the genome a variant is associated with a trait.
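The 5x10^-8 threshold is commonly motivated as a Bonferroni correction of a 0.05 significance level for roughly one million independent tests. The one-million figure is an assumption stated here for illustration; it does not appear on the slide. The sketch below shows the arithmetic and the corresponding height on a -log10(P-value) plot.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

# Assumed: about one million independent common-variant tests.
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

# Height of the significance line on a -log10(P-value) plot.
line_height = -math.log10(threshold)

print(threshold, line_height)
```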

Applications of GWAS: identifying genetic variants that affect brain measures (volumetric, fiber integrity, connectivity); risk genes; early biomarkers of disease.

What is a risk gene? A common genetic variant related to a brain measure, a disease, or a trait such as obesity, found by searching the genome. 99.9% of DNA is the same for all people; DNA variation causes changes in predisposition to disease and in brain structure. One type of variation is a single nucleotide polymorphism (SNP): a single-letter change in the DNA code. [Figure: 23 pairs of chromosomes; in a particular part of chromosome 5 there are many genes; within a gene there are exons, introns, and SNPs.]

GRIN2B Risk Allele. GRIN2B: glutamate receptor, signaling pathway. A genetic polymorphism of the GRIN2B gene is associated with reductions of brain white matter integrity, bipolar disorder, and obsessive compulsive disorder.

A GRIN2B genetic variant is associated with a 2.8% temporal lobe volume deficit. GRIN2B is over-represented in AD; it could be considered an Alzheimer’s disease risk gene, but this needs replication. Jason L. Stein, Xue Hua, Jonathan H. Morra, Suh Lee, April J. Ho, Alex D. Leow, Arthur W. Toga, Jae Hoon Sul, Hyun Min Kang, Eleazar Eskin, Andrew J. Saykin, Li Shen, Tatiana Foroud, Nathan Pankratz, Matthew J. Huentelman, David W. Craig, Jill D. Gerber, April Allen, Jason J. Corneveaux, Dietrich A. Stephan, Jennifer Webster, Bryan M. DeChairo, Steven G. Potkin, Clifford R. Jack Jr, Michael W. Weiner, Paul M. Thompson, and the ADNI (2010). Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer's Disease, NeuroImage, 2010.

The GRIN2B genetic variant is associated with brain volume in these regions: 2.8% more temporal lobe atrophy (Stein et al., NeuroImage, 2010).

Alzheimer’s risk gene carriers (CLU-C) have lower fiber integrity even when young (N=398), 50 years before disease typically hits. Effects occurred in multiple regions, including several known to degenerate in AD: the corpus callosum, fornix, cingulum, SLF, and ILF (Liu et al., 2009; Stricker et al., 2009). This suggests that the CLU-C-related variability found here might create a local vulnerability important for disease onset. [Figure: voxels where CLU allele C (at rs11136000) is associated with lower FA after adjusting for age, sex, and kinship in 398 young adults (68 T/T; 220 C/T; 110 C/C). FDR critical p = 0.023. Left hemisphere is shown on the right.] Braskie et al., Journal of Neuroscience, May 4, 2011.
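The "FDR critical p" reported in the figure caption is the largest voxelwise p-value that still counts as significant under false discovery rate control. The slide does not name the procedure; the sketch below assumes the standard Benjamini-Hochberg step-up rule, with a hypothetical function name and made-up p-values.

```python
import numpy as np

def bh_critical_p(p_values, q=0.05):
    """Largest p-value declared significant by Benjamini-Hochberg at level q.

    Returns None when no p-value passes.
    """
    p = np.sort(np.asarray(p_values, dtype=float))
    m = p.size
    # The k-th smallest p-value is compared with q * k / m.
    thresholds = q * np.arange(1, m + 1) / m
    passing = np.nonzero(p <= thresholds)[0]
    if passing.size == 0:
        return None
    return float(p[passing[-1]])

# Hypothetical voxelwise p-values.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
critical_p = bh_critical_p(example_p, q=0.05)
print(critical_p)
```

Every voxel with a p-value at or below the returned critical value is declared significant; a larger critical value indicates a stronger overall signal in the map.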

The effect is even stronger for carriers of a schizophrenia risk gene variant, trkA-T (N=391 people). NTRK1 is a high-affinity receptor for the neurotrophin NGF. We found that healthy T-allele carriers at rs6336 in the NTRK1 gene had lower FA broadly throughout the brain. In a recent meta-analysis, this allele was associated with a 1.64 times greater probability of developing schizophrenia in Caucasians versus the C allele. We found significant effects of NTRK1-T in regions including the ILF, IFO, cingulum, and genu of the corpus callosum, which most consistently showed lower FA in schizophrenia patients versus controls in a recent meta-analysis (Ellison-Wright I, Bullmore E. Meta-analysis of diffusion tensor imaging studies in schizophrenia. Schizophr Res. 2009;108(1-3):3-10). [Figure: (a) p-values indicate where NTRK1 allele T carriers (at rs6336) have lower FA after adjusting for age, sex, and kinship in 391 young adults (31 T+; 360 T-). FDR critical p = 0.038. (b) Voxels that replicate in 2 independent halves of the sample (FDR-corrected). Left is shown on the right.] Braskie et al., Journal of Neuroscience, May 2012.

Neural Fiber Integrity: Fractional Anisotropy (FA). FA is computed from the eigenvalues of the diffusion tensor in diffusion tensor MRI. FA = 0 means diffusion is isotropic, i.e., equal in all directions. FA = 1 means diffusion is restricted to only one direction. FA is sensitive to fiber density, axonal diameter, and myelination of white matter.
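For reference, FA is computed from the three eigenvalues of the diffusion tensor using the standard definition FA = sqrt(3/2) * sqrt(sum((l_i - mean)^2)) / sqrt(sum(l_i^2)). The short sketch below implements that formula; the function name and the example eigenvalues are illustrative only.

```python
import numpy as np

def fractional_anisotropy(eigenvalues):
    """Fractional anisotropy from the three diffusion tensor eigenvalues."""
    lam = np.asarray(eigenvalues, dtype=float)
    sum_of_squares = np.sum(lam ** 2)
    if sum_of_squares == 0.0:
        # No diffusion at all: define FA as 0 to avoid dividing by zero.
        return 0.0
    mean_diffusivity = lam.mean()
    numerator = np.sum((lam - mean_diffusivity) ** 2)
    return float(np.sqrt(1.5 * numerator / sum_of_squares))

print(fractional_anisotropy([1.0, 1.0, 1.0]))  # isotropic diffusion
print(fractional_anisotropy([1.0, 0.0, 0.0]))  # diffusion along one axis only
print(fractional_anisotropy([1.7, 0.3, 0.2]))  # illustrative anisotropic values
```

The first two calls reproduce the limiting cases on the slide: equal eigenvalues give an FA of 0, and a single nonzero eigenvalue gives an FA of 1.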

SNPs can predict variance in brain integrity. [Figure labels: neuro-chemical genes, neuro-developmental genes, neuro-degenerative risk genes; COMT, NTRK1, ErbB4, BDNF, HFE, CLU.] A significant fraction of variability in white matter structure of the corpus callosum (measured with DTI) is predictable from SNPs. Kohannim O, et al. Predicting white matter integrity from multiple common genetic variants. Neuropsychopharmacology 2012, in press.

Big Data. 26,000 whole-brain MR images; > 500,000 single nucleotide polymorphisms (SNPs). Analyze each voxel of the entire brain and search for genetic variants across the whole genome at each brain voxel. Select only the most associated SNP at each voxel, by analyzing P-values through an inverse beta transformation.
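The slide does not spell out the "inverse beta transformation". One plausible reading, sketched here as an assumption, relies on the fact that the smallest of n independent uniform p-values follows a Beta(1, n) distribution, so the minimum p-value at each voxel can be corrected with that distribution's CDF, 1 - (1 - p)^n. The function name, the array layout (voxels by SNPs), and the use of an effective number of independent tests are all hypothetical.

```python
import numpy as np

def best_snp_per_voxel(p_values, n_effective_tests):
    """Pick the most associated SNP at each voxel and correct its p-value.

    p_values: array of shape (n_voxels, n_snps).
    n_effective_tests: assumed number of independent SNP tests per voxel.
    """
    p = np.asarray(p_values, dtype=float)
    best_snp = p.argmin(axis=1)
    p_min = p.min(axis=1)
    # Beta(1, n) CDF at p_min, i.e. 1 - (1 - p_min)^n, computed stably
    # for very small p-values with log1p and expm1.
    corrected = -np.expm1(n_effective_tests * np.log1p(-p_min))
    return best_snp, corrected

# Hypothetical p-values for 2 voxels and 2 SNPs.
toy_p = [[0.5, 0.9],
         [0.2, 0.1]]
best_snp, corrected = best_snp_per_voxel(toy_p, n_effective_tests=2)
print(best_snp, corrected)
```

Without a correction of this kind, always keeping the smallest p-value at each voxel would make every voxel look more significant than it is.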

Genetic clustering boosts GWAS power. Many top hits now reach genome-wide significance (N=472) and replicate. Several SNPs affect multiple ROIs, so one can form a network of SNPs that affect similar ROIs. The network has a small-world, scale-free topology (for more, see Chiang et al., J. Neurosci., 2012).

Population-Level Data Integration: Electronic Medical Records, Genotypes and Phenotypes

eMERGE Goal: research to combine DNA biorepositories with electronic medical records (EMR) for large-scale association studies of genetics and phenotypes; to incorporate genetic variants into the EMR for use in clinical care.

Network Members

eMERGE Innovation. Algorithms for electronic phenotyping of clinical conditions identified in EMRs. Discoveries of genetic variants in biorepository samples.

Big Data to Knowledge