The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL FFGWAS Fast Functional Genome Wide Association AnalysiS of Surface-based Imaging Genetic Data Chao Huang.


FFGWAS: Fast Functional Genome Wide Association AnalysiS of Surface-based Imaging Genetic Data
Chao Huang
Department of Biostatistics, Biomedical Research Imaging Center, The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Joint work with Hongtu Zhu

Outline
-- Motivation for Imaging Genetics
-- Statistical Methods for Imaging Genetics
-- Fast Functional Genome-wide Association Analysis
-- Conclusion

Motivation for Imaging Genetics

Imaging Genetics
[Diagram linking G, B, E, and D, with a "Selection" label]
E: environmental factors; G: genetic markers; D: disease

Imaging Data: PET, EEG/MEG, Calcium, CT

Neuroimaging Phenotype
-- Multivariate, smooth functions, and piecewise smooth functions
-- Dimension varies from 100 to 500,000

Multi-Omic Data (Ritchie et al., 2015, Nature Reviews Genetics)

Motivation
Imaging genetics allows for the identification of how common/rare genetic polymorphisms influencing molecular processes (e.g., serotonin signaling) bias neural pathways (e.g., amygdala reactivity) mediating individual differences in complex behavioral processes (e.g., trait anxiety) related to disease risk in response to environmental adversity.
(Hariri AR, Holmes A. Genetics of emotional regulation: the role of the serotonin transporter in neural function. Trends Cogn Sci. 10:182–191)

Statistical Methods for Imaging Genetics

Statistical Methods (Hibar et al., HBM 2012)

High Dimensional Regression Model
[Model equation with terms labeled Phenotype, Genotype, and Error; the equation itself is not preserved in this transcript]

Challenges (Hibar et al., HBM 2012)

Fast Voxel-wise Genome-wide Analysis (Huang et al., NeuroImage 2015)
(I) Spatially Heteroscedastic Linear Model
(II) Global Sure Independence Screening Procedure
(III) Detection Procedure
Issues to be addressed:
-- Spatially correlated functional data
-- Multivariate imaging phenotypes

Fast Functional Genome-wide Association AnalysiS

Data Structure
[Figure: hippocampal surface data for Person No. 1 through Person No. 100, paired with a genetic variation table with one row per person and columns SNP1, SNP2, ..., SNP2000]

FFGWAS
(1) Multivariate Varying Coefficient Model
(2) Global Sure Independence Screening Procedure
(3) Significant Voxel-locus and Cluster-locus Detection
[Diagram labels: SNP 1, ..., SNP N]

Multivariate Varying Coefficient Model

Multivariate Varying Coefficient Model
We need to test: [hypotheses not preserved in this transcript]
We first consider a local Wald-type statistic: [formula and definitions not preserved in this transcript]
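
Since the slide's formulas did not survive, the following is only a generic sketch of what a varying coefficient model, its test, and a local Wald-type statistic typically look like. All notation here is an assumption made for illustration (subject i, locus g, surface location s, J imaging measures per location); it is not the authors' notation and may differ from their exact formulation.

```latex
% Hypothetical notation for illustration only; not taken from the slides.
\begin{align*}
  y_i(s) &= B(s)^{\top} x_i + \gamma_g(s)\, z_{ig} + \varepsilon_i(s),
    \qquad y_i(s),\ \gamma_g(s) \in \mathbb{R}^{J},\quad i = 1, \dots, n, \\
  H_0 &: \gamma_g(s) = 0 \ \text{for all } s
    \quad \text{versus} \quad
  H_1 : \gamma_g(s) \neq 0 \ \text{for some } s, \\
  W_g(s) &= \hat{\gamma}_g(s)^{\top}\,
    \widehat{\operatorname{Cov}}\{\hat{\gamma}_g(s)\}^{-1}\,
    \hat{\gamma}_g(s).
\end{align*}
```

Here x_i collects the non-genetic covariates, z_{ig} is the genotype of subject i at locus g, and the coefficient functions B(s) and \gamma_g(s) vary smoothly over the surface, which is why a bandwidth has to be chosen when estimating them.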

Big-data Challenges
Several big-data challenges arise from the calculation of the local Wald-type statistic:
-- Calculating [...] across all loci and vertices can be computationally expensive.
-- Bandwidth selection in [...] across all loci can also be computationally expensive.
-- Holding all [...] on the computer hard drive requires substantial computer resources.
-- Speeding up the calculation of [...].

FFGWAS
To solve these computational bottlenecks, we propose three solutions:
-- Calculate [...] under the null hypothesis for all loci.
-- Divide all loci into K groups based on their minor allele frequency (MAF), and select a common optimal bandwidth for each group.
-- Develop a GSIS procedure to eliminate many ‘noisy’ loci based on a global Wald-type statistic.
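
The MAF-grouping step can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 allele counts and that the K groups are formed by quantile binning; the slide does not say how the groups are cut, so the binning rule, the function names, and the toy data are all hypothetical.

```python
import numpy as np

def minor_allele_frequency(genotypes):
    """MAF per SNP for an (n_subjects, n_snps) matrix of 0/1/2 allele counts."""
    freq = genotypes.mean(axis=0) / 2.0   # frequency of the counted allele
    return np.minimum(freq, 1.0 - freq)   # fold so that the value is <= 0.5

def group_loci_by_maf(maf, n_groups):
    """Label each locus 0..n_groups-1 using quantile cut points (an assumption)."""
    edges = np.quantile(maf, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.searchsorted(edges, maf, side="right")

# Toy data: 100 subjects, 200 SNPs with allele frequencies between 0.05 and 0.5.
rng = np.random.default_rng(0)
true_freq = rng.uniform(0.05, 0.5, size=200)
genotypes = rng.binomial(2, true_freq, size=(100, 200))

maf = minor_allele_frequency(genotypes)
groups = group_loci_by_maf(maf, n_groups=4)
```

A bandwidth would then be selected once per group label rather than once per locus, which is where the saving comes from.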

A Global Sure Independence Screening
(1) The global Wald-type statistic at locus g is defined as [...].
(2) Calculate the p-values of [...] for all loci.
(3) Sort the -log10(p) values of [...] and select the top N0 loci, which form the candidate significant locus set.
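
Step (3) of the screening can be sketched as follows. The sketch assumes the per-locus p-values of the global statistic have already been computed by some other routine; the function name and the example p-values are hypothetical.

```python
import numpy as np

def select_top_loci(p_values, n0):
    """Return indices of the n0 loci with the largest -log10(p), best first."""
    p = np.asarray(p_values, dtype=float)
    # Clip to avoid log10(0) when a p-value underflows to zero.
    neg_log_p = -np.log10(np.clip(p, 1e-300, 1.0))
    order = np.argsort(-neg_log_p, kind="stable")
    return order[:n0]

# Five made-up loci; the two smallest p-values sit at indices 1 and 3.
p_values = [0.20, 1e-8, 0.03, 5e-5, 0.60]
candidates = select_top_loci(p_values, n0=2)
```

Only the loci in `candidates` would be carried forward to the more expensive detection procedure.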

Detection Procedure (wild bootstrap method)
(1) The first one is to detect significant voxel-locus pairs.
(2) The second one is to detect significant cluster-locus pairs.
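
The slide names the wild bootstrap but gives no details, so the following is a generic sketch rather than the authors' procedure. It assumes a single locus, a univariate measure per vertex, an ordinary least-squares fit at each vertex, Rademacher weights, and a maximum-statistic correction across vertices; all function names and the toy data are hypothetical.

```python
import numpy as np

def vertex_stats(Y, X, z):
    """Per-vertex squared t-type statistic for genotype z, adjusting for X."""
    Q, _ = np.linalg.qr(X)
    z_res = z - Q @ (Q.T @ z)            # genotype with covariates projected out
    Y_res = Y - Q @ (Q.T @ Y)            # images with covariates projected out
    zz = z_res @ z_res
    beta = (z_res @ Y_res) / zz          # genotype effect at each vertex
    resid = Y_res - np.outer(z_res, beta)
    dof = Y.shape[0] - X.shape[1] - 1
    sigma2 = (resid ** 2).sum(axis=0) / dof
    return beta ** 2 * zz / sigma2

def wild_bootstrap_pvalues(Y, X, z, n_boot=200, seed=0):
    """Corrected p-value per vertex from the bootstrap maximum statistic."""
    rng = np.random.default_rng(seed)
    observed = vertex_stats(Y, X, z)
    Q, _ = np.linalg.qr(X)
    fitted = Q @ (Q.T @ Y)               # fit under the null (no genotype term)
    resid = Y - fitted
    max_stats = np.empty(n_boot)
    for b in range(n_boot):
        # One random sign per subject, shared by all vertices, so the
        # spatial correlation of each subject's residuals is kept.
        signs = rng.choice([-1.0, 1.0], size=Y.shape[0])
        Y_star = fitted + signs[:, None] * resid
        max_stats[b] = vertex_stats(Y_star, X, z).max()
    exceed = (max_stats[:, None] >= observed[None, :]).sum(axis=0)
    return (1 + exceed) / (n_boot + 1)

# Toy data: 80 subjects, 30 vertices, genotype effect on the first 5 vertices.
rng = np.random.default_rng(1)
n, V = 80, 30
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
z = rng.binomial(2, 0.3, size=n).astype(float)
Y = rng.normal(size=(n, V))
Y[:, :5] += 1.5 * z[:, None]
p_adj = wild_bootstrap_pvalues(Y, X, z)
```

Sharing one sign per subject across vertices is the design choice that matters here: it lets the bootstrap maximum reflect the spatial dependence in the data instead of treating vertices as independent.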

Simulation Studies and Real Data Analysis

Simulation Studies: Data Generation
Covariate Data (non-genetic data)
-- Generated from either U(0,1) or the Bernoulli distribution with success probability 0.5.
Genetic Data
-- Linkage Disequilibrium (LD) blocks (Haploview & PLINK)
1. Generate 2,000 blocks;
2. Randomly select 10 SNPs in each block;
3. Choose the first 100 SNPs as the causal SNPs.
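
A minimal sketch of the covariate generation and the SNP bookkeeping described above. The sample size, the number of covariates, the intercept column, and the variable names are assumptions; the LD-block genotypes themselves come from Haploview and PLINK in the slides and are not simulated here.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_subjects = 100  # hypothetical sample size

# One continuous covariate from U(0,1) and one binary covariate from
# Bernoulli(0.5), as described on the slide; the intercept is an assumption.
continuous_cov = rng.uniform(0.0, 1.0, size=n_subjects)
binary_cov = rng.binomial(1, 0.5, size=n_subjects)
covariates = np.column_stack([np.ones(n_subjects), continuous_cov, binary_cov])

# SNP bookkeeping implied by the three steps: 2,000 blocks x 10 SNPs each,
# with the first 100 SNPs designated as causal.
n_blocks, snps_per_block, n_causal = 2000, 10, 100
n_snps = n_blocks * snps_per_block
causal_index = np.arange(n_causal)
```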

Simulation Studies: Data Generation
Imaging Data
Step 1: Fitting the model without genetic predictors (estimates of [...])
Step 2: Specifying the affected regions of interest (ROIs) associated with causal SNPs (true values)
Step 3: Generating imaging data with prespecified parameters and ROIs

Simulation Studies
Simulation results comparing FFGWAS and FVGWAS in identifying significant voxel-SNP pairs.
Simulation settings: the green and red regions in the figure represent, respectively, the hippocampal surface and the affected ROI associated with the causal SNPs among the first [...] SNPs.

Imaging Genetics for ADNI
PI: Dr. Michael W. Weiner
A longitudinal prospective study with 1,700 participants aged 55 to 90 years.
Goals:
-- detecting AD at the earliest stage and marking its progress through biomarkers;
-- developing new diagnostic methods for AD intervention, prevention, and treatment.
Data:
-- Clinical Data, including Clinical and Cognitive Assessments
-- Genetic Data, including Illumina SNP genotyping and WGS
-- MRI (fMRI, DTI, T1, T2)
-- PET (PIB, Florbetapir PET, and FDG-PET)
-- Chemical Biomarker

ADNI Data Analysis: Dataset Description
798 MRI scans of AD (186), MCI (388), and healthy controls (224) from ADNI-1. These scans, of 462 males and 336 females, were performed on 1.5 T MRI scanners. The typical protocol includes the following parameters: (i) repetition time (TR) = 2400 ms; (ii) inversion time (TI) = 1000 ms; (iii) flip angle = 8°; (iv) field of view (FOV) = 24 cm with a 256 x 256 x 170 acquisition matrix in the x-, y-, and z-dimensions; (v) voxel size: 1.25 x 1.26 x 1.2 mm³.
Covariates: gender, age, APOE ε4, and the top 5 principal component (PC) scores of the SNP data

Imaging Data Preprocessing

ADNI Data Analysis
[Tables: Top 10 SNPs (Right Hippocampus) and Top 10 SNPs (Left Hippocampus), each with columns SNP, CHR, BP, and -log10(p); the table entries are not preserved in this transcript]

ADNI Data Analysis
[Figure panels: Right Hippocampus, Left Hippocampus]

ADNI Data Analysis: Left Hippocampus
[Figure: significant loci with a zoomed view; panels: Left Hippocampus, Right Hippocampus]

ADNI Data Analysis: Left Hippocampus
[Figure panels: Left Hippocampus, Right Hippocampus]
Top 1 SNP: rs[...]; closest gene: HRH4
-- HRH4 (Histamine Receptor H4) is a protein-coding gene.
-- Diseases associated with HRH4: cerebellar degeneration
-- An important paralog of this gene: CHRM4
Top 1 SNP: rs[...]; closest gene: C3orf58
-- C3orf58 (Chromosome 3 Open Reading Frame 58) is a protein-coding gene.
-- Diseases associated with C3orf58: hypoxia
(Mirshafiey & Naddafi, Am J Alzheimers Dis Other Demen. 2013)

ADNI Data Analysis
[Figure: -log10(p) values (color scale 0 to 12) on the hippocampus (left and right) corresponding to the top 2 SNPs. Panel labels: Left, Right; rs657132 and three further rs IDs that are truncated in this transcript]

ADNI Data Analysis
[Figure: -log10(p) values of significant clusters on the hippocampus (left and right) corresponding to the top 2 SNPs. Panel labels: Left, Right; rs657132 and three further rs IDs that are truncated in this transcript]

Conclusion
-- We have developed an FFGWAS pipeline for efficiently carrying out whole-genome analyses of multimodal imaging data.
-- Our FFGWAS consists of a multivariate varying coefficient model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods.
-- Two key advantages of using FFGWAS are (i) much smaller computational complexity and (ii) GSIS for screening many noisy SNPs.
-- We have successfully applied FFGWAS to hippocampal surface data and genetic data from the ADNI study.

Software for FFGWAS