Published by Cornelius Bates. Modified over 9 years ago.
1 Associating Genomic Variations with Phenotypes
Model comparison, rare variants, and analysis pipeline
Qunyuan Zhang
Division of Statistical Genomics & Genome Institute
Washington University School of Medicine
2 Data & Question
Relationship between X and Y?
Genotypes (X): SNP, insertion, deletion, duplication, inversion, translocation, …
Phenotypes (Y): quantitative, categorical
3 Linkage & Association
Association: (Y, X)
Linkage: (Y, Q); Q is unobservable
[Diagram labels: Genotypes, Phenotype, Putative QTL; r1, Q, r2]
4 A Fixed-effect Mixture Model For Linkage
Commonly used in plant genetics
[Diagram labels: P1 × P2, F1, F2; SNP A, SNP B; r1, Q, r2]
5 A Variance-component Model For Linkage
Commonly used in human genetics
[Equation labels: background IBD matrix, QTL IBD matrix, diagonal unit matrix]
[Diagram labels: SNP A, SNP B; r1, Q, r2]
6 Variance-component Model = Random-effect Linear Model
Random effects
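The equations on slides 5 and 6 did not survive in this transcript. A standard form of the variance-component linkage model that matches the three labelled matrices might look like the following. The symbols are assumptions, not the author's notation: Π for the QTL IBD matrix, 2Φ for the background (genome-wide expected) IBD matrix, I for the diagonal unit matrix.

```latex
y = X\beta + q + g + e, \qquad
q \sim N(0,\ \Pi\,\sigma_q^2), \quad
g \sim N(0,\ 2\Phi\,\sigma_g^2), \quad
e \sim N(0,\ I\,\sigma_e^2)

\operatorname{Var}(y) = \Pi\,\sigma_q^2 + 2\Phi\,\sigma_g^2 + I\,\sigma_e^2
```

Written this way, the variance components are the variances of random effects in a linear model, and a linkage test corresponds to testing whether the QTL variance \sigma_q^2 is zero.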
7 From Linkage to Association
Linkage model: QTL effect(s)
Family-based association model: marker effect(s), fixed effect(s)
8 A Simple Association Model For Unrelated Subjects
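The model equation for this slide is missing from the transcript. A minimal sketch, assuming the usual single-marker linear regression y = b0 + b1*x + e with genotype x coded as 0/1/2 allele copies, is shown below. The function name and the toy data are hypothetical.

```python
import math


def simple_association(x, y):
    """Fit y = b0 + b1*x + e by least squares; return (b0, b1, t statistic for b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # marker effect
    b0 = mean_y - b1 * mean_x           # intercept
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


# Toy data (made up for illustration): six unrelated subjects.
genotype = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 1.2, 1.9, 2.1, 3.1, 2.9]
b0, b1, t_stat = simple_association(genotype, phenotype)
```

The test of H0: b1 = 0 would compare the t statistic against a t distribution with n - 2 degrees of freedom.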
9 Covariate(s): Adjusting For Confounder(s)
Observed confounders: age, sex, etc.
Hidden confounders: population structure
Population structure can be estimated by:
- PCA
- Clustering
- Admixture/ancestry
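To illustrate why adjustment matters, here is a small sketch with a single covariate c (for example, an ancestry score). It uses residualization: regress both the genotype and the phenotype on c, then regress residual on residual, which gives the same marker coefficient as fitting y on x and c jointly. All names and numbers are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(c, v):
    """Residuals of v after regressing it on covariate c (with intercept)."""
    n = len(c)
    slope = ols_slope(c, v)
    mc, mv = sum(c) / n, sum(v) / n
    return [vi - mv - slope * (ci - mc) for ci, vi in zip(c, v)]


def adjusted_marker_effect(x, y, c):
    """Marker effect on y after adjusting both x and y for covariate c."""
    return ols_slope(residuals(c, x), residuals(c, y))


# Toy data: y depends on c (effect 2.0) and on x (effect 0.5); x is correlated with c.
c = [0, 0, 0, 1, 1, 1]
x = [0, 1, 0, 1, 2, 1]
y = [1 + 2.0 * ci + 0.5 * xi for ci, xi in zip(c, x)]

unadjusted = ols_slope(x, y)
adjusted = adjusted_marker_effect(x, y, c)
```

In this toy example the unadjusted slope is inflated by the confounder, while the adjusted slope recovers the simulated marker effect of 0.5.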
10 Modeling Hidden Genetic Correlation Between Subjects
Fixed effect(s): marker, covariate
Random effects: genetic background
Family data, pedigree => IBD matrix
Population data, hidden, marker data => IBS matrix
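As a rough sketch of how an IBS matrix can be built from marker data alone, the code below averages the number of alleles shared identical-by-state over markers, with genotypes coded as 0/1/2 allele copies. This is one simple definition among several; the slide does not say which estimator the author used, and the function name is hypothetical.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS sharing; genotypes[i][j] is subject i's allele count (0/1/2) at marker j."""
    n = len(genotypes)
    m = len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Two unphased genotypes share 2 - |a - b| alleles at each marker.
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            k[i][j] = k[j][i] = shared / (2.0 * m)
    return k


# Toy data: subjects 0 and 1 are identical, subject 2 is opposite at two markers.
toy_genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
K = ibs_matrix(toy_genotypes)
```

A matrix like K would then play the role of the covariance structure for the genetic-background random effects in a mixed model.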
11 Modeling Rare Variants
Common variants: tested individually, H0: β1 = 0. One p-value per variant.
Rare variants: tested as an entire group (burden test), usually by gene, H0: β1 = β2 = … = βk = 0. One p-value per group of variants.
Can be combined with variable selection, using loose criteria.
β can be treated as random effects (variance-components test) and can be weighted by prior information.
12 Collapsing Model
Collapsing multiple variables into one
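The slide's formula is not in the transcript. One common reading of collapsing is an indicator of whether a subject carries any rare allele within the group; the sketch below assumes that reading, and the function name is hypothetical.

```python
def collapse(genotypes):
    """Return 1 for each subject carrying at least one rare allele in the group, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


# Toy data: rows are subjects, columns are rare variants in one gene (0/1/2 copies).
gene_genotypes = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 2],
    [0, 0, 0],
]
carrier = collapse(gene_genotypes)
```

The single collapsed variable can then be tested like one marker, which yields one p-value per group of variants.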
13 Weighted Sum Model
Weighted sum score
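The score formula is missing from the transcript. A generic form, with assumed notation (x_ij for subject i's genotype at variant j, w_j for the weight of variant j, S_i for the score), could be written as:

```latex
S_i = \sum_{j=1}^{k} w_j\, x_{ij}, \qquad
y_i = \beta_0 + \beta_1 S_i + e_i, \qquad
H_0:\ \beta_1 = 0
```

With all weights equal to 1 the score reduces to a count of rare alleles per subject.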
14 Weighting Variants
Based on allele frequency: continuous or binary (0, 1) weight, variable threshold;
Based on functional annotation/prediction;
Based on sequencing quality (coverage, mapping quality, genotyping quality, validated or not, etc.);
Data-driven: using both genotype and phenotype data, learning weights (including effect directions) from data, requiring a permutation test;
Any combination …
Grouping Variants
By gene
By transcript
By exon
By gene set / pathway
By protein domain
……
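The frequency-based options can be sketched as follows. Both weighting functions are illustrative choices, not the author's: a binary weight from a frequency threshold, and a continuous weight that grows as the allele gets rarer. The threshold of 0.2 is arbitrary and only suits the toy data.

```python
import math


def allele_frequencies(genotypes):
    """Frequency of the counted allele per variant; genotypes are 0/1/2 copies."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]


def threshold_weights(freqs, max_freq):
    """Binary (0, 1) weight: 1 if the variant is observed and at or below the threshold."""
    return [1 if 0 < q <= max_freq else 0 for q in freqs]


def inverse_sd_weights(freqs):
    """Continuous weight 1/sqrt(q(1-q)); monomorphic variants get weight 0."""
    return [1.0 / math.sqrt(q * (1 - q)) if 0 < q < 1 else 0.0 for q in freqs]


def weighted_sum_scores(genotypes, weights):
    """Per-subject weighted sum of allele counts."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Toy data: four subjects, two variants.
toy = [[0, 0], [1, 0], [0, 1], [0, 1]]
freqs = allele_frequencies(toy)                 # [0.125, 0.25]
binary_w = threshold_weights(freqs, max_freq=0.2)
continuous_w = inverse_sd_weights(freqs)
scores = weighted_sum_scores(toy, binary_w)
```

A variable-threshold approach would repeat the binary weighting over several thresholds, which is one reason such tests typically rely on permutation for their p-values.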
15 Modeling More Data Types
Generalized Linear (Mixed) Model
Link function
For binary Y: logistic model
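The slide's equations are not in the transcript. In generic textbook notation (an assumption, not the author's symbols), a generalized linear mixed model with link function g, fixed effects β and random effects u can be written as:

```latex
g\big(E[\,y \mid u\,]\big) = X\beta + Zu, \qquad u \sim N(0, G)

\text{binary } Y:\quad
\operatorname{logit}(p) = \log\frac{p}{1-p} = X\beta + Zu,
\qquad p = P(Y = 1 \mid u)
```

Dropping the Zu term gives the ordinary generalized linear model, and the identity link recovers the linear (mixed) model used for quantitative traits.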
16 Longitudinal Data (quantitative)
Fixed effect: time as covariate
Repeated measures: random effect, correlation within subjects
[Plot axis: Time]
17 Longitudinal Data (binary)
Linear model, time as covariate
Survival analysis, CoxPH model, etc.
[Plot axis: Time]
18 Tools
SAS procedures: REG, LOGISTIC, GENMOD, MIXED, HPMIXED, GLIMMIX, PHREG/LIFETEST
R functions/packages: lm(), glm(), gee, nlme, kinship2/coxme, lme4, survival
Other programs: SOLAR, MMAP, EMMA, EMMAX, SKAT
19 Pipeline
Input (data + options)
Job generating/submitting module (LSF bsub): job 1, job 2, …, job N
  Options.jobi => self-programmed modules (SAS, R, …)
  Options.jobi => external program modules (MMAP, SKAT, …)
  Result 1, Result 2, …, Result N
Job number controlling module
Job status monitoring module (all done?)
  No: wait …
  Yes: result summarizing module
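The control flow of the flowchart (submit, cap the number of running jobs, poll until all are done) can be sketched as below. This is not the author's implementation: `submit` and `is_done` are hypothetical callables that would wrap the scheduler (for example an LSF `bsub` call and a status query), and `FakeCluster` is a stand-in used only so the sketch runs without a cluster.

```python
import time
from collections import deque


def run_pipeline(jobs, submit, is_done, max_running, poll_seconds=0.0):
    """Submit jobs with at most max_running active; return handles in completion order."""
    pending = deque(jobs)
    running = []
    finished = []
    while pending or running:
        # Job number controlling: top up to the cap.
        while pending and len(running) < max_running:
            running.append(submit(pending.popleft()))
        # Job status monitoring: split running jobs into done and not done.
        still_running = []
        for handle in running:
            (finished if is_done(handle) else still_running).append(handle)
        running = still_running
        if running:
            time.sleep(poll_seconds)  # "Wait ..." branch of the flowchart
    return finished


class FakeCluster:
    """Stand-in scheduler: every job finishes on its second status poll."""

    def __init__(self):
        self.polls = {}
        self.active = set()
        self.peak = 0

    def submit(self, name):
        self.polls[name] = 0
        self.active.add(name)
        self.peak = max(self.peak, len(self.active))
        return name

    def is_done(self, name):
        self.polls[name] += 1
        done = self.polls[name] >= 2
        if done:
            self.active.discard(name)
        return done


cluster = FakeCluster()
job_names = ["job%d" % i for i in range(1, 8)]
done_jobs = run_pipeline(job_names, cluster.submit, cluster.is_done, max_running=3)
```

Once `run_pipeline` returns, every job is done and a summarizing step could collect Result 1 through Result N.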
20 gwas.sh
#!/bin/sh
OPFILE=$1...
…

options.gwa
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
phenotype_file=fhs100
markerinfo_file=mapall
marker_selection=MAF>0.01
pedigree_file=pediall
subjectID=subject
pedgreeID=famid
markername=snp
…
[ANALYSIS]
phenolist_file=
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[OUTPUT]
output_dir=/dsguser/qunyuan/fhs/bmi
output_file=
output_replace=no
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
…

Phenotype   covar     program   analysis   run
Bmi qt      age,sex   SASGLM    mixed      YES
Obes ql     NA        SASGLM    gee        YES
HD ql       age       SASGLM    gee        NO
Age         …
Sex         …
…

Program   language   location                 Maintainer
SASGLM    SAS        /dsg1/code/sas/glm.sas   Q.Zhang
GSTAT     R          /dsg1/code/R/gstat.R     Q.Zhang
MMAP      C          /dsg1/code/sas/mmap.sh   J. Czajkowski
…
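The options file shown on this slide has an INI-like layout, so a reader could load a file of this shape with Python's standard `configparser`. This is only a sketch under that assumption; the original pipeline reads the file from `gwas.sh`, and how it does so is not shown. The excerpt below copies a subset of the keys from the slide, and reading `bmi/qt` as name/trait type is a guess.

```python
import configparser

# Subset of the options shown on the slide; elided entries are left out.
OPTIONS_TEXT = """
[DATA]
database=SAS
genotype_dir=/dsg1/gwas/fhsgeno
genotype_file=
marker_selection=MAF>0.01
subjectID=subject
[ANALYSIS]
pheno_list=bmi/qt
covariates=
program=SASGLM
analysis=mixed
[RUN]
clusterjobname=bmimixed
memsize=1000M
maxjobn=300
"""


def load_options(text):
    """Parse INI-style option text into {section: {key: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. subjectID
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


options = load_options(OPTIONS_TEXT)
max_jobs = int(options["RUN"]["maxjobn"])
# Assumption: pheno_list entries have the form name/type.
phenotype, trait_type = options["ANALYSIS"]["pheno_list"].split("/")
```

Empty entries such as `genotype_file=` parse as empty strings, which a caller can treat as "not set".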
21 Thanks!