Presentation is loading. Please wait.

Presentation is loading. Please wait.

Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Similar presentations


Presentation on theme: "Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,"— Presentation transcript:

1 Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director, Center for Human Genomics Wake Forest University School of Medicine GWA ─ promising but challenging

2 Outline The need for genome-wide association studies The reality of genome-wide association studies Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

3 The need for GWA Current understanding of disease etiology is limited Therefore, candidate genes or pathways are insufficient Current understanding of functional variants is limited Therefore, the focusing on nonsynonymous changes is not sufficient Results from linkage studies are often inconsistent and broad Therefore, the utility of identified linkage regions is limited GWA studies offer an effective and objective approach Better chance to identify disease associated variants Improve understanding of disease etiology Improve ability to test gene-gene interaction and predict disease risk

4 GWA is promising Many diseases and traits are influenced by genetic factors i.e., they are caused by sequence variants in the genome Over 6 millions SNPs are known in the genome i.e., some SNPs will be directly or indirectly associated with causal variants The cost of SNP Genotyping is reduced i.e., it is affordable to genotype a large number of SNPs in the genome Large numbers of cases and controls are available i.e., there is statistical power to detect variants with modest effect When the above conditions are met… …associated SNPs will have different frequencies between cases and controls

5 GWA is challenging Many diseases and traits are influenced by genetic factors But probably due to multiple modest risk variants They confer a stronger risk when they interact True associated SNPs are not necessary highly significant Too many SNPs are evaluated False positives due to multiple tests Single studies tend to be underpowered False negatives Considerable heterogeneity among studies Phenotypic and genetic heterogeneity False positives due to population stratification

6 Reality of GWA AMD, IBD, T1D, etc. Parkinson’s, nicotine dependence, T2D, etc. Prostate cancer, breast cancer, and other ongoing studies Heart diseases, lung diseases, psychiatric diseases, inflammatory diseases, cancers, and many other studies that are in planning stages

7 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

8 Genome coverage Two major platforms for GWA Illumina: HumanHap300, HumanHap550, and HumanHap1M Affymetrix: GeneChip 100K, 500K, and 1M Genome-wide coverage The percentage of known SNPs in the genome that are in LD with the genotyped SNPs Calculated based on HapMap Calculated based on ENCODE

9 Genome coverage Genome-wide coverage Genome coverage of common SNPs (MAF ≥ 0.05) Genome coverage of rare SNPs Genome coverage using multi-markers Pe’er, 2006

10 Genome coverage Genome coverage for common SNPs (MAF ≥ 0.05) Pe’er, 2006

11 Genome coverage Genome coverage for common SNPs (MAF ≥ 0.05) Genome coverage for common and rare SNPs Pe’er, 2006

12 Genome coverage Genome coverage of common SNPs (MAF ≥ 0.05) Genome coverage of common and rare SNPs Genome coverage using multi-markers Pe’er, 2006

13 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

14 Strategies for pre-association analysis Quality control Filter SNPs by genotype call rates Filter SNPs by minor allele frequencies Filter SNPs by testing for Hardy-Weinberg Equilibrium

15 Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Evaluate whether there is an upward bias in association tests

16 Q-Q plot Clayton, 2006 Adjust for stratificationFilter by call rateAll SNPs

17 Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Correct for stratification by adjusting association statistics at each SNP by a uniform overall inflation factor Is susceptible to over or under adjustment

18 Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Structure (STRUCTURE) Used to assign the samples to discrete subpopulation clusters and then aggregate evidence of association within each cluster Estimate individual proportion of ancestry and treat it as a covariate Computationally intensive when there are a large number of AIMs

19 Strategies for pre-association analysis Quality control Quantile-quantile plot (Q-Q plot) Population stratification Genomic control Structure (STRUCTURE) Principal component analysis (EIGENSTRAT) Identify several eigenvectors (ancestries or geographic regions) Adjust genotypes and phenotypes along each eigenvector Compute association statistics using adjusted genotypes and phenotypes No need for AIMs

20 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

21 Strategies for association analysis Single SNP analysis using pre-specified genetic models 2 x 3 table (2-df) Additive model (1-df), and test for additivity All possible genetic models

22 Strategies for association analysis Single SNP analysis using pre-specified genetic models Haplotype analysis Two-marker and three-marker slide Multi-marker Within haplotype block Between two recombination hot spots

23 Strategies for association analysis Single SNP analysis using pre-specified genetic models Haplotype analysis Gene-gene and gene-environment interactions Interaction with main effect Logistic regression Interaction without main effect: data mining Classification and recursive tree (CART) Multifactor Dimensionality Reduction (MDR)

24 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

25 Sample size and false positives Estimate sample size Sample size OR MAF Type I error Power Quanto Effective sample size

26 Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests Bonferroni correction  Nominal significance level = study-wide significance / number of tests  Nominal significance level = 0.05/500,000 = 10 -7 Effective number of tests  Take LD into account Permutation procedure  Permute case-control status  Mimic the actual analyses  Obtain empirical distribution of maximum test statistic under null hypothesis

27 Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests False discovery rate (FDR) Expected proportion of false discoveries among all discoveries Offers more power than Bonferroni Holds under weak dependence of the tests

28 Sample size and false positives Estimate sample size False positives: too many dependent tests Adjust for number of tests False discovery rate (FDR) Bayesian approach Taking a priori into account, False-Positive Report Probability (FPRP)

29 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

30 Confirmation in independent study populations The above approaches may limit the number of false positives Confirmation is needed to dissect true from false positives Replication, examine the results from the 2 nd stage only Joint analysis, combining data from 1 st stage with 2 nd stage Multiple stages

31 Replication vs. joint analysis Skol, 2006

32 Multiple stages 1 st stage # of true sig. SNPs (80% power) # of total sig. SNPs (  = 0.01) # of Risk SNPs # of SNPs tested % of true sig. SNPs 20 500,000 16 5,016 0.38% 2 nd stage 16 5,016 13 63 21% 3 rd stage 13 63 10 100%

33 Important issues in genome-wide association studies Genome coverage Strategies for pre-association analysis Strategies for association analysis Sample size and false positives (Type I and II errors) Confirmation in independent study populations Increase the magnitude of effects of a specific gene

34 Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotype, e.g. aggressive or early onset

35 Study aggressive cases

36 Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history

37 Study cases with family history Antoniou and Easton, 2003

38 Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history Controls that are disease free

39 Disease free controls

40 Increase the magnitude of effects of a specific gene Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset Cases with family history Controls that are disease free Increase their effects by studying a homogeneous population Lower levels of genetic heterogeneity

41 Summary GWA studies are promising but difficult There are many important issues in GWA The impact of these issues can be minimized by a well- designed study


Download ppt "Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,"

Similar presentations


Ads by Google