1 Associating Genomic Variations with Phenotypes Model comparison, rare variants, and analysis pipeline Qunyuan Zhang Division of Statistical Genomics.

Presentation transcript:

1 Associating Genomic Variations with Phenotypes Model comparison, rare variants, and analysis pipeline Qunyuan Zhang Division of Statistical Genomics & Genome Institute Washington University School of Medicine

2 Data & Question
Relationship between X and Y?
Genotypes: SNP, insertion, deletion, duplication, inversion, translocation, …
Phenotypes: quantitative, categorical

3 Linkage & Association
Association: (Y, X)
Linkage: (Y, Q), where Q is unobservable
[Diagram labels: Genotypes, Phenotype, Putative QTL; r1, Q, r2]

4 A Fixed-effect Mixture Model For Linkage
Commonly used in plant genetics
[Diagram labels: P1 × P2, F1, F2; SNP A, r1, Q, r2, SNP B]

5 A Variance-component Model For Linkage
Commonly used in human genetics
Covariance terms: background IBD matrix, QTL IBD matrix, diagonal unit matrix
[Diagram labels: SNP A, r1, Q, r2, SNP B]

6 Variance-component Model = Random-effect Linear Model Random effects
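
The equations on slides 5 and 6 did not survive in this transcript. A common textbook form of the variance-component linkage model, consistent with the three matrices named on slide 5, is sketched below. The symbols (\mu, g, q, e, \Phi, \Pi and the variance parameters) are assumptions chosen for illustration, not the author's notation.

```latex
\begin{aligned}
y &= \mu + g + q + e, \\
g &\sim N(0,\ \sigma_g^2 \Phi), \quad
q \sim N(0,\ \sigma_q^2 \Pi), \quad
e \sim N(0,\ \sigma_e^2 I), \\
\operatorname{Var}(y) &= \sigma_g^2 \Phi + \sigma_q^2 \Pi + \sigma_e^2 I .
\end{aligned}
```

Here \Phi stands for the background IBD matrix, \Pi for the QTL IBD matrix at the tested position, and I for the diagonal unit matrix. Because g, q and e are all random effects, this is a random-effect linear model, and under this sketch a linkage test amounts to testing H0: \sigma_q^2 = 0.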

7 From Linkage to Association
Linkage model: QTL effect(s)
Family-based association model: marker effect(s), fixed effect(s)

8 A Simple Association Model For Unrelated Subjects
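
The model equation for this slide is missing from the transcript. A minimal sketch of one simple association model for unrelated subjects is an ordinary least-squares regression of a quantitative phenotype on genotype dosage (0/1/2), testing whether the slope is zero. The function name, the additive coding, and the toy data are assumptions for illustration, not the author's implementation.

```python
import math


def simple_association(genotypes, phenotypes):
    """Regress phenotype on genotype dosage by ordinary least squares.

    Returns (intercept, slope, t_statistic) for the model
    y = beta0 + beta1 * x + e, where H0 is beta1 = 0.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual sum of squares, used for the standard error of the slope.
    rss = sum((y - beta0 - beta1 * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta1 / se if se > 0 else math.inf
    return beta0, beta1, t_stat


# Toy data (hypothetical): six unrelated subjects, one SNP.
geno = [0, 0, 1, 1, 2, 2]
pheno = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
b0, b1, t = simple_association(geno, pheno)
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom to obtain a p-value; that step is left out to keep the sketch free of external libraries.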

9 Covariate(s): Adjusting For Confounder(s)
Observed confounders: age, sex, etc.
Hidden confounders: population structure
Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry
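
To illustrate why adjusting for a confounder matters, the sketch below removes the linear effect of a single covariate from both phenotype and genotype and then regresses residual on residual. For a linear model this gives the same genotype coefficient as fitting the genotype and the covariate together. The function names and toy data are assumptions; a real analysis with several covariates (age, sex, principal components) would use a multiple regression routine instead.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Residuals of y after removing its linear dependence on x."""
    intercept, slope = ols(x, y)
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def adjusted_effect(genotype, phenotype, covariate):
    """Genotype effect on phenotype after adjusting both for one covariate."""
    res_y = residuals(covariate, phenotype)
    res_x = residuals(covariate, genotype)
    return ols(res_x, res_y)[1]


# Toy data (hypothetical): the covariate could be a population cluster label.
covariate = [0, 0, 0, 1, 1, 1]
genotype = [0, 0, 1, 1, 2, 2]
# Phenotype built as 2 * covariate + 0.5 * genotype, so the true effect is 0.5.
phenotype = [2 * c + 0.5 * g for c, g in zip(covariate, genotype)]

unadjusted = ols(genotype, phenotype)[1]
adjusted = adjusted_effect(genotype, phenotype, covariate)
```

In this toy example the unadjusted slope is inflated because genotype frequency differs between the two covariate groups, while the adjusted slope recovers the value used to build the data.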

10 Modeling Hidden Genetic Correlation Between Subjects
Model terms: marker fixed effect(s), covariate fixed effect(s), genetic background random effects
Family data, pedigree => IBD matrix
Population data, hidden, marker data => IBS matrix
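
For population data, the IBS matrix is computed from marker genotypes. The sketch below uses one simple definition, the average proportion of alleles shared identical by state across markers; the function name and the 0/1/2 coding are assumptions, and missing genotypes are not handled.

```python
def ibs_matrix(genotypes):
    """Pairwise identity-by-state similarity between subjects.

    genotypes[i][m] is the allele count (0, 1 or 2) of subject i at marker m.
    Two subjects share 2 - |g_i - g_j| alleles IBS at a marker; the result
    is averaged over markers and scaled to the range [0, 1].
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    ibs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(2 - abs(a - b)
                         for a, b in zip(genotypes[i], genotypes[j]))
            ibs[i][j] = ibs[j][i] = shared / (2 * n_markers)
    return ibs


# Toy data (hypothetical): three subjects, three markers.
G = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]
K = ibs_matrix(G)
```

A matrix of this kind can play the role that the pedigree-derived IBD matrix plays for family data, as the covariance structure of the genetic background random effects.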

11 Modeling Rare Variants
Common variants: tested individually, H0: β1 = 0. One p-value per variant.
Rare variants: tested as an entire group (burden test), usually by gene, H0: β1 = β2 = … = βk = 0. One p-value per group of variants.
- Can incorporate variable selection, with loose criteria
- β can be treated as random effects (variance-components test) and can be weighted by prior information

12 Collapsing Model Collapsing multiple variables into one
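
A minimal sketch of collapsing, assuming the common indicator form in which a subject is coded 1 if it carries at least one rare allele anywhere in the group and 0 otherwise. The function name and toy genotypes are hypothetical; the slide's own formula is not preserved in the transcript.

```python
def collapse(genotypes):
    """Collapse several rare variants into one carrier indicator per subject.

    genotypes[i][j] is the rare-allele count of subject i at variant j.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


# Toy data (hypothetical): four subjects, three rare variants in one gene.
rare_genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 2],
    [0, 1, 1],
]
carrier = collapse(rare_genotypes)
```

The collapsed variable can then be tested like a single marker, giving one p-value for the whole group.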

13 Weighted Sum Model Weighted sum score
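
A minimal sketch of a weighted sum score, S_i = sum over variants j of w_j * g_ij. The weight used here, 1 / sqrt(p(1 - p)) with p the sample allele frequency, is one frequency-based choice that up-weights rarer variants; it is an assumption for illustration, not necessarily the weighting used in the talk. Function names and data are hypothetical.

```python
import math


def allele_frequencies(genotypes):
    """Sample allele frequency of each variant (columns of the matrix)."""
    n_subjects = len(genotypes)
    n_variants = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n_subjects)
            for j in range(n_variants)]


def frequency_weights(freqs):
    """Assumed weighting: rarer variants receive larger weights."""
    return [1 / math.sqrt(p * (1 - p)) if 0 < p < 1 else 0.0 for p in freqs]


def weighted_sum_scores(genotypes, weights):
    """One score per subject: weighted sum of allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data (hypothetical): four subjects, two variants.
genos = [
    [1, 0],
    [0, 0],
    [0, 1],
    [0, 1],
]
freqs = allele_frequencies(genos)
weights = frequency_weights(freqs)
scores = weighted_sum_scores(genos, weights)
```

The scores replace the individual variants as a single predictor in the association model.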

14 Weighting Variants
- Based on allele frequency: continuous or binary (0, 1) weight, variable threshold
- Based on functional annotation/prediction
- Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.)
- Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from the data; requires a permutation test
- Any combination …
Grouping Variants
- By gene
- By transcript
- By exon
- By gene set / pathway
- By protein domain
- …
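
Grouping is a bookkeeping step that decides which variants are tested together. The sketch below groups variant IDs by an annotation field; the annotation layout, field names, and identifiers are hypothetical.

```python
from collections import defaultdict


def group_variants(annotations, key="gene"):
    """Group variant IDs by one annotation field (gene, exon, pathway, ...)."""
    groups = defaultdict(list)
    for variant_id, info in annotations.items():
        groups[info[key]].append(variant_id)
    return dict(groups)


# Hypothetical annotation table: variant ID -> annotation fields.
annotations = {
    "var1": {"gene": "GENE_A", "exon": "GENE_A:exon1"},
    "var2": {"gene": "GENE_A", "exon": "GENE_A:exon2"},
    "var3": {"gene": "GENE_B", "exon": "GENE_B:exon1"},
}
by_gene = group_variants(annotations, key="gene")
by_exon = group_variants(annotations, key="exon")
```

Changing the key switches the unit of analysis without touching the testing code.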

15 Modeling More Data Types
Generalized Linear (Mixed) Model
Link function: for binary Y, logistic model
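
The formula for this slide is not preserved. A standard way to write a generalized linear mixed model, with the logistic link as the binary case, is sketched below; the symbols (g, X, \beta, Z, u, G) are assumed notation, not taken from the slides.

```latex
\begin{aligned}
g\bigl(\operatorname{E}[\,y \mid u\,]\bigr) &= X\beta + Zu, \qquad u \sim N(0,\ G), \\
\text{binary } y:\quad
g(p) &= \operatorname{logit}(p) = \log\frac{p}{1-p}, \qquad p = \Pr(y = 1 \mid u).
\end{aligned}
```

Dropping the random term Zu gives an ordinary generalized linear model, and taking g as the identity with normal errors gives back the linear (mixed) model.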

16 Longitudinal Data (quantitative)
- Fixed effect, time as covariate
- Repeated measures, random effect, correlation within subjects
[Figure axis: Time]

17 Longitudinal Data (binary)
- Linear model, time as covariate
- Survival analysis, CoxPH model, etc.
[Figure axis: Time]

18 Tools
SAS Procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST
R Functions/Packages: lm(), glm(); gee, nlme, kinship2/coxme, lme4, survival
Other Programs: SOLAR, MMAP, EMMA, EMMAX, SKAT

19 Pipeline
Input (data + options)
Job generating/submitting module (LSF bsub): job1, job2, …, job N
- Options.jobi => self-programmed modules (SAS, R, …)
- Options.jobi => external program modules (MMAP, SKAT, …)
Job number controlling module
Results: Result 1, Result 2, …, Result N
Job status monitoring module (all done?)
- No: wait …
- Yes: result summarizing module
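
The control flow on this slide (submit jobs, cap the number running, poll until all are done, then summarize) can be sketched as a small loop. This is an illustration under stated assumptions, not the author's pipeline: `run_pipeline`, `submit`, and `is_done` are hypothetical names, and the cluster interaction is replaced by stand-in callables so the example runs anywhere.

```python
import time


def run_pipeline(jobs, submit, is_done, max_running=300, poll_seconds=0.0):
    """Submit jobs with a cap on concurrency and wait until all finish.

    submit(job) starts a job; is_done(job) reports whether it has finished.
    Both are supplied by the caller (stand-ins here; a real pipeline would
    talk to the cluster scheduler instead).
    """
    pending = list(jobs)
    running = []
    finished = []
    while pending or running:
        # Job number control: top up to the allowed number of running jobs.
        while pending and len(running) < max_running:
            job = pending.pop(0)
            submit(job)
            running.append(job)
        # Job status monitoring: move completed jobs to the finished list.
        still_running = []
        for job in running:
            if is_done(job):
                finished.append(job)
            else:
                still_running.append(job)
        running = still_running
        if running:
            time.sleep(poll_seconds)  # "Wait ..." before polling again.
    return finished


# Stand-in usage: every job finishes immediately after submission.
submitted = []
done = run_pipeline(
    ["job1", "job2", "job3"],
    submit=submitted.append,
    is_done=lambda job: True,
    max_running=2,
)
```

Once the loop returns, every job has finished and a summarizing step can collect the per-job results.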

20 gwas.sh and options.gwa

gwas.sh:
    #!/bin/sh
    OPFILE=$1
    ...
    …

options.gwa:
    [DATA]
    database=SAS
    genotype_dir=/dsg1/gwas/fhsgeno
    genotype_file=
    phenotype_file=fhs100
    markerinfo_file=mapall
    marker_selection=MAF>0.01
    pedigree_file=pediall
    subjectID=subject
    pedgreeID=famid
    markername=snp
    …
    [ANALYSIS]
    phenolist_file=
    pheno_list=bmi/qt
    covariates=
    program=SASGLM
    analysis=mixed
    [OUTPUT]
    output_dir=/dsguser/qunyuan/fhs/bmi
    output_file=
    output_replace=no
    [RUN]
    clusterjobname=bmimixed
    memsize=1000M
    maxjobn=300
    …

Phenotype list:
    Phenotype        covar     program  analysis  run
    Bmi        qt    age,sex   SASGLM   mixed     YES
    Obes       ql    NA        SASGLM   gee       YES
    HD         ql    age       SASGLM   gee       NO
    Age        …
    Sex        …
    …

Program list:
    Program  language  location                 Maintainer
    SASGLM   SAS       /dsg1/code/sas/glm.sas   Q.Zhang
    GSTAT    R         /dsg1/code/R/gstat.R     Q.Zhang
    MMAP     C         /dsg1/code/sas/mmap.sh   J. Czajkowski
    …
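
The options file has an INI-like layout (bracketed sections with key=value lines), so it can be read with a standard parser. The sketch below is an assumption about how such a file could be consumed, not how gwas.sh actually reads it; only a subset of the keys shown on the slide is included, and the elided lines ("…") are left out because their content is unknown.

```python
import configparser

# Subset of the options shown on the slide; elided entries are omitted.
OPTIONS_TEXT = """
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
phenotype_file=fhs100
marker_selection=MAF>0.01
[ANALYSIS]
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
"""


def read_options(text):
    """Parse INI-style option text into a dict of sections."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


options = read_options(OPTIONS_TEXT)
# "bmi/qt" is read here as phenotype name / trait type (an interpretation).
phenotype, trait_type = options["ANALYSIS"]["pheno_list"].split("/")
max_jobs = int(options["RUN"]["maxjobn"])
```

Values such as maxjobn would then feed the job number controlling module, and program/analysis would select which analysis module each job runs.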

21 Thanks!