Washington State University

Slides:

Advertisements

Similar presentations

GBS & GWAS using the iPlant Discovery Environment

Advertisements

Genome-wide association mapping Introduction to theory and methodology

PAG 2011 TASSEL Terry Casstevens 1, Peter Bradbury 2,3, Zhiwu Zhang 1, Yang Zhang 1, Edward Buckler 1,2,4 1 Institute.

Lab 13: Association Genetics. Goals Use a Mixed Model to determine genetic associations. Understand the effect of population structure and kinship on.

Population Stratification

Lab 13: Association Genetics December 5, Goals Use Mixed Models and General Linear Models to determine genetic associations. Understand the effect.

The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop GWAS/QTL Apps Overview.

GenABEL: an R package for Genome Wide Association Analysis

Statistical Genomics Zhiwu Zhang Washington State University Lecture 26: Kernel method.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 19: SUPER.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 25: Ridge Regression.

Washington State University

Statistical Genomics Zhiwu Zhang Washington State University Lecture 29: Bayesian implementation.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 16: CMLM.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 7: Impute.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 9: Linkage Disequilibrium.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 20: MLMM.

Statistical Genomics Zhiwu Zhang Washington State University Lecture 11: Power, type I error and FDR.

Genome Wide Association Studies Zhiwu Zhang Washington State University.

Warm Up What did you learn from your research about sickle cell anemia? We will have a share out about what you learned.

Washington State University

Lecture 10: GWAS by correlation

Lecture 28: Bayesian Tools

Washington State University

Washington State University

Hardware Accelerator Test Bench for Error-Correcting Algorithms

Lecture 22: Marker Assisted Selection

Lecture 10: GWAS by correlation

Washington State University

Washington State University

Genome Wide Association Studies using SNP

Washington State University

Lecture 12: Population structure

Washington State University

Washington State University

Washington State University

Monte Carlo Methods in Scientific Computing

Washington State University

Washington State University

Washington State University

Lecture 10: GWAS by correlation

Washington State University

Washington State University

Lecture 23: Cross validation

Lecture 23: Cross validation

Washington State University

Washington State University

Lecture 10: GWAS by correlation

Lecture 16: Likelihood and estimates of variances

Washington State University

GWAS/QTL Apps Overview

Course calendar (page 1 of 2)

Lecture 26: Bayesian theory

Washington State University

Lecture 11: Power, type I error and FDR

Washington State University

Lecture 11: Power, type I error and FDR

Sherlock: Detecting Gene-Disease Associations by Matching Patterns of Expression QTL and GWAS Xin He, Chris K. Fuller, Yi Song, Qingying Meng, Bin Zhang,

Washington State University

Lecture 27: Bayesian theorem

Washington State University

Lecture 18: Heritability and P3D

Washington State University

Lecture 17: Likelihood and estimates of variances

Washington State University

Lecture 23: Cross validation

Lecture 29: Bayesian implementation

Lecture 22: Marker Assisted Selection

Washington State University

Results from a GWAS of prostate cancer in the KP population (8,399 cases and 38,745 controls), highlighting key chromosomal regions. Results from a GWAS.

Presentation transcript:

Washington State University Statistical Genomics Lecture 21: FarmCPU Zhiwu Zhang Washington State University

Administration Homework 5, due April 12, Wednesday, 3:10PM Final exam: May 4 (Thursday), 120 minutes (3:10-5:10PM), 50

Outline History of method and software development FarmCPU

Problems in GWAS Computing difficulties: millions of markers, individuals, and traits False positives, ex: “Amgen scientists tried to replicate 53 high-profile cancer research findings, but could only replicate 6”, Nature, 2012, 483: 531 False negatives

GWAS Stream Q PC PC+K EMMA EMMAx Q+K MLMM CMLM SELECT P3D GCTA ECMLM FST-LMM GEMMA FarmCPU GenAbel BLINK

t test Computing speed Power | type I error GLM GenABEL FaST-LMM CMLM Speed improvement Power improvement GLM GenABEL Computing speed FaST-LMM CMLM ECMLM GEMMA Select P3D/EMMAX SUPER EMMA MLMM MLM Power | type I error

Usage of Software Packages Leading Authors Corresponding authors Language Released Citation PUMA Gabriel E. Hoffman Jason G. Mezey C++ 2013 27 TATES Sophie van der Sluis Fortran 76 GAPIT Lipka AE Zhang Z R 2012 284 MLMM Vincent S Nordborg M R/python 226 GEMMA Zhou X Stephens M 445 FastLMM Christoph L, Listgarten J, Heckerman D 2011 348 Qxpak M. Pérez-Enciso 2004 141 EMMAX Kang HM Sabatti C & Eskin E 2010 813 GCTA Jian Y 1338 GenABEL Aulchenko YS 2007 990 TASSEL Bradbury, Zhang, and Kroon Bradbury PJ Java 2006 1596 PLINK Purcell S 12111 65%

Why human geneticists not go beyond PLINK?

MLM was more enriched on Flowering time genes

Model Development Si: Testing marker Adjustment on marker Q: Population structure K: Kinship Adjustment on covariates S: Pseudo QTNs

SUPER algorithm y = PC + SNP + e Bins y = PC + Kinship + e -2LL QTNs y = PC + Kinship + SNP + e

FarmCPU algorithm y = PC + SNP + e Bins y = PC + Kinship + e -2LL QTNs y = PC + QTNs + SNP + e

t test Computing speed Power | type I error GLM GenABEL BLINK FarmCPU Speed improvement Power improvement GLM GenABEL BLINK FarmCPU Computing speed FaST-LMM CMLM ECMLM GEMMA Select P3D/EMMAX SUPER EMMA MLMM MLM Power | type I error

FARM-CPU (Fixed And Random Model Circuitous Probability Unification) Fixed model y = M1 + … + Mt + mi + e SNP p1 … NA pl Mt Pt1 Ptj Ptk Ptl Pt M2 P21 P2j P2k P2l P2 M1 P11 P1j P1k P1l P1 m1 mj mk ml Substitution FARM-CPU (Fixed And Random Model Circulative Probability Unification) Keywords: substitution, test, screen, storage, memory, mutation, markers, formula, optimization, processer, unit, background, The shaded area is the storage of p values for markers (dark shaded) and mutations (shadow shaded). The Manhattan plot (with red dots) area is the processer to optimize bin size and the bin selected as pseudo mutations (M) connected by the green wires. The equation is the processer to test marker (m) one at a time with mutation (M) as covariates. The p values of M are processed xx unit (non-shaded area) to get average P values which are connect by the blue wires to substitute the Nas for the corresponding markers which do not have P values in the test as they are confounded to M. Random model y = u + e with Var(u)∝SVD(M) Optimization

Re-analysis of Arabidopsis data Xiaolei Liu

Flowering time genes enriched

Associations on flowering time

It is time for human geneticists to move forward

Substitution makes difference

Converge fast

FarmCPU is computing efficient Testing 60K SNPs

Half million individuals, half million SNPs three days But, PINK new version is faster

ZZLab.Net

Summary History of method and software development FarmCPU