Genome-Wide Association Studies (GWAS) ulty/zhang/Webpages/zhang/courses/epi243_07/lectures/Genome-

Presentation transcript:

Genome-Wide Association Studies (GWAS)
Slides 1-35 modified from: ulty/zhang/Webpages/zhang/courses/epi243_07/lectures/Genome-Wide_Association_Studies_(GWAS).ppt

Association Studies of Genetic Factors
- 1st generation: very small studies (<100 cases); usually not an epidemiologic study design; 1-2 SNPs
- 2nd generation: small studies (100-500 cases); more epi focus; a few SNPs
- 3rd generation: large molecular epi studies (>500 cases); proper epi design; pathways
- 4th generation: consortium-based pooled analyses (>2000 cases); GxE analyses
- 5th generation: post-GWS studies
Boffetta, 2007

International Lung Cancer Consortium (ILCCO)
Goodman, Thun, Benhamou, Chen, Berwick, Schwarts, Le Marchand, Kiyohara, McLaughlin, Zhang, Wiencke, Yang, Stucker, Boffetta, Spitz, Tajima, Risch, Brennan, Wichmann, Wild, Landi, Vineis, Harris, Christiani, Lan, Hong, Lazarus
- 3 cohort studies
- 17 population-based case-control studies
- 13 hospital-based case-control studies
- 2 studies with mixed controls
- 1 cross-sectional study

Issues in genetic association studies
- Many genes: ~25,000 genes, many of which can be candidates
- Many SNPs: ~12,000,000 SNPs; the ability to predict functional SNPs is limited
- Methods to select SNPs:
  - Only functional SNPs in a candidate gene
  - Systematic screen of SNPs in a candidate gene
  - Systematic screen of SNPs in an entire pathway
  - Genomewide screen
  - Systematic screen for all coding changes

Introduction
A genome-wide association study is an approach that involves rapidly scanning markers across the complete sets of DNA, or genomes, of many people to find genetic variations associated with a particular disease. Once new genetic associations are identified, researchers can use the information to develop better strategies to detect, treat and prevent the disease. Such studies are particularly useful in finding genetic variations that contribute to common, complex diseases, such as asthma, cancer, diabetes, heart disease and mental illnesses.

Definition of GWAS
A genome-wide association study is defined as any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease (such as cancer) or condition.

Potential of GWAS
Whole genome information, when combined with epidemiological, clinical and other phenotype data, offers the potential for increased understanding of basic biological processes affecting human health, improvement in the prediction of disease and patient care, and ultimately the realization of the promise of personalized medicine. In addition, rapid advances in understanding the patterns of human genetic variation and maturing high-throughput, cost-effective methods for genotyping are providing powerful research tools for identifying genetic variants that contribute to health and disease.

Potential of GWAS

Selection of SNPs (Genome-wide association studies)
- Molecular: higher requirements (Affymetrix and Illumina)
- Analytical: highest requirements (data management, automation)
- Advantages
  - No biological assumptions, so novel genes/pathways can be identified
  - Excellent chance to identify risk alleles
  - Utility in individual risk assessment
- Disadvantages
  - High costs
  - Multiple-testing concerns

SNP Selection

Affymetrix® Genome-Wide Human SNP Array
The new Affymetrix® Genome-Wide Human SNP Array 6.0 features 1.8 million genetic markers, including more than 906,600 single nucleotide polymorphisms (SNPs) and more than 946,000 probes for the detection of copy number variation. The SNP Array 6.0 represents more genetic variation on a single array than any other product, providing maximum panel power and the highest physical coverage of the genome.

The need for GWA
- Current understanding of disease etiology is limited
  - Therefore, candidate genes or pathways are insufficient
- Current understanding of functional variants is limited
  - Therefore, focusing on nonsynonymous changes is not sufficient
- Results from linkage studies are often inconsistent and broad
  - Therefore, the utility of identified linkage regions is limited
- GWA studies offer an effective and objective approach
  - Better chance to identify disease-associated variants
  - Improve understanding of disease etiology
  - Improve ability to test gene-gene interaction and predict disease risk
Xu JF, 2007

GWA is promising
- Many diseases and traits are influenced by genetic factors
  - i.e., they are caused by sequence variants in the genome
- Over 12 million SNPs are known in the genome
  - i.e., some SNPs will be directly or indirectly associated with causal variants
- The cost of SNP genotyping is reduced
  - i.e., it is affordable to genotype a large number of SNPs in the genome
- Large numbers of cases and controls are available
  - i.e., there is statistical power to detect variants with modest effect
- When the above conditions are met, associated SNPs will have different frequencies between cases and controls

GWA is challenging
- Many diseases and traits are influenced by genetic factors
  - But probably due to multiple modest risk variants
  - They confer a stronger risk when they interact
  - True associated SNPs are not necessarily highly significant
- Too many SNPs are evaluated
  - False positives due to multiple tests
- Single studies tend to be underpowered
  - False negatives
- Considerable heterogeneity among studies
  - Phenotypic and genetic heterogeneity
  - False positives due to population stratification
Xu, 2007

Genome coverage
- Two major platforms for GWA
  - Illumina: HumanHap300, HumanHap550, and HumanHap1M
  - Affymetrix: GeneChip 100K, 500K, 1M, and 2.3M
- Genome-wide coverage
  - The percentage of known SNPs in the genome that are in LD with the genotyped SNPs
  - Calculated based on HapMap (Haplotype map)
  - Calculated based on ENCODE (Encyclopedia of DNA Elements), which aims to identify all functional elements in the human genome
Xu, 2007

Strategies for pre-association analysis
- Quality control
  - Filter SNPs by genotype call rates
  - Filter SNPs by minor allele frequencies
  - Filter SNPs by testing for Hardy-Weinberg Equilibrium: $(p + q)^2 = p^2 + 2pq + q^2 = 1$
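The three quality-control filters above can be sketched for a single SNP as follows. This is a minimal illustration, not a prescribed pipeline: the function names and the thresholds (call rate 0.95, minor allele frequency 0.01, HWE p-value 1e-4) are assumptions chosen for the example, and the HWE check is a plain 1-df chi-square test on genotype counts.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, p-value) for a 1-df Hardy-Weinberg test on genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # from p^2 + 2pq + q^2 = 1
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """Apply call-rate, MAF and HWE filters; all thresholds are illustrative assumptions."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(p, 1.0 - p)
    if call_rate < min_call_rate or maf < min_maf:
        return False
    _, hwe_p = hwe_chi_square(n_aa, n_ab, n_bb)
    return hwe_p >= min_hwe_p


if __name__ == "__main__":
    print(passes_qc(25, 50, 25, n_missing=0))  # genotypes exactly at HWE proportions
    print(passes_qc(50, 0, 50, n_missing=0))   # no heterozygotes at all
```

In case-control data the HWE test is often run on controls only, since a true association can itself distort genotype proportions in cases; that choice is left to the caller here.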

Data Analysis
- Single SNP analysis using pre-specified genetic models
  - 2 x 3 table (2-df)
  - Additive model (1-df), and test for additivity
  - All possible genetic models (recessive, dominant)
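As a rough sketch of the first two options, the code below computes the 2-df genotype chi-square on a 2 x 3 table and a 1-df Cochran-Armitage trend statistic, used here as one common way to test an additive model. The function names and the genotype scores (0, 1, 2) are assumptions for illustration; the slide does not specify a particular test.

```python
import math


def genotype_chi_square(cases, controls):
    """2-df Pearson chi-square on a 2 x 3 table of genotype counts (AA, AB, BB)."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    chi2 = 0.0
    for r, s in zip(cases, controls):
        column = r + s
        if column == 0:
            continue  # an empty genotype column contributes nothing
        exp_case = column * n_cases / total
        exp_control = column * n_controls / total
        chi2 += (r - exp_case) ** 2 / exp_case
        chi2 += (s - exp_control) ** 2 / exp_control
    p_value = math.exp(-chi2 / 2.0)  # chi-square survival function, 2 df
    return chi2, p_value


def trend_chi_square(cases, controls, scores=(0, 1, 2)):
    """1-df Cochran-Armitage trend statistic with additive scores."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    columns = [r + s for r, s in zip(cases, controls)]
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * n for t, n in zip(scores, columns))
    sum_t2n = sum(t * t * n for t, n in zip(scores, columns))
    denom = n_cases * n_controls * (total * sum_t2n - sum_tn ** 2)
    if denom == 0:
        return 0.0, 1.0  # no genotype variation, nothing to test
    chi2 = total * (total * sum_tr - n_cases * sum_tn) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


if __name__ == "__main__":
    cases, controls = (10, 20, 30), (30, 20, 10)
    print(genotype_chi_square(cases, controls))
    print(trend_chi_square(cases, controls))
```

When the genotype effect is close to additive, the 1-df trend test usually gives a smaller p-value than the 2-df test for the same table, which is the motivation for pre-specifying the model.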

Data Analysis
- Haplotype analysis
- Gene-gene and gene-environment interactions
  - Interaction with main effect
    - Logistic regression
  - Interaction without main effect: data mining
    - Classification and regression tree (CART)
    - Multifactor Dimensionality Reduction (MDR)

Sample size needs as a function of genotype prevalence and OR for main effects
Boffetta, 2007
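The dependence of sample size on genotype prevalence and odds ratio can be illustrated with the standard two-proportion formula for an unmatched case-control study with equal numbers of cases and controls. This is a generic textbook approximation, not the calculation behind the cited figure; the function names and the defaults (two-sided alpha 0.05, power 0.80) are assumptions.

```python
import math
from statistics import NormalDist


def prevalence_in_cases(p0, odds_ratio):
    """Genotype prevalence among cases implied by control prevalence p0 and the OR."""
    return odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))


def cases_needed(p0, odds_ratio, alpha=0.05, power=0.80):
    """Approximate number of cases (with an equal number of controls) required."""
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 gives no effect to detect")
    p1 = prevalence_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    for prevalence in (0.05, 0.10, 0.30):
        for odds_ratio in (1.2, 1.5, 2.0):
            print(prevalence, odds_ratio, cases_needed(prevalence, odds_ratio))
```

The printed grid shows the general pattern: required sample size rises sharply as the odds ratio approaches 1 and as the genotype becomes rare.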

False Positives
- False positives: too many dependent tests
- Adjust for number of tests
  - Bonferroni correction
    - Nominal significance level = study-wide significance / number of tests
    - Nominal significance level = 0.05/500,000 = 1 × 10^-7
  - Effective number of tests
    - Take LD into account
  - Permutation procedure
    - Permute case-control status
    - Mimic the actual analyses
    - Obtain empirical distribution of maximum test statistic under null hypothesis
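The Bonferroni arithmetic and the permutation procedure can be sketched as below. The sketch is deliberately simplified: the per-SNP statistic (absolute difference in mean allele count between cases and controls) is a stand-in assumption for whatever test the real analysis uses, and the function names, the 200 permutations and the fixed seed are illustrative choices.

```python
import random


def bonferroni_threshold(study_wide_alpha, n_tests):
    """Nominal per-test significance level under Bonferroni correction."""
    return study_wide_alpha / n_tests


def mean_difference(genotypes, status):
    """Stand-in statistic: |mean allele count in cases - mean allele count in controls|."""
    cases = [g for g, s in zip(genotypes, status) if s == 1]
    controls = [g for g, s in zip(genotypes, status) if s == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def max_stat_permutation(snps, status, n_perm=200, seed=0):
    """Study-wide adjusted p-values from the null distribution of the maximum statistic."""
    rng = random.Random(seed)
    observed = [mean_difference(g, status) for g in snps]
    shuffled = list(status)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute case-control status, keeping genotypes (and LD) intact
        null_max.append(max(mean_difference(g, shuffled) for g in snps))
    return [(1 + sum(m >= obs for m in null_max)) / (1 + n_perm) for obs in observed]


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 500_000))
    status = [1] * 20 + [0] * 20
    snps = [
        [2] * 20 + [0] * 20,  # strongly associated with status
        [0, 1] * 20,          # unrelated to status
    ]
    print(max_stat_permutation(snps, status))
```

Because every permutation reuses the same genotype matrix, correlation between SNPs due to LD is carried into the null distribution, which is why this approach is less conservative than Bonferroni when tests are dependent.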

False Positives
- False discovery rate (FDR)
  - Expected proportion of false discoveries among all discoveries
  - Offers more power than Bonferroni
  - Holds under weak dependence of the tests
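One widely used way to control the FDR is the Benjamini-Hochberg step-up procedure, sketched below. The slide does not name a specific procedure, so treating FDR control as Benjamini-Hochberg is an assumption, as are the function name and the example p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which tests are declared discoveries at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under rank * q / m.
        if p_values[index] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    fdr_hits = benjamini_hochberg(p_values, q=0.05)
    bonferroni_hits = [p <= 0.05 / len(p_values) for p in p_values]
    print(sum(fdr_hits), "discoveries with FDR control")
    print(sum(bonferroni_hits), "discoveries with Bonferroni")
```

On these example values the FDR procedure declares two discoveries where Bonferroni declares one, a small illustration of the extra power mentioned above.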

False Positives
- Bayesian approach
  - Takes the prior probability of a true association into account: False-Positive Report Probability (FPRP)
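The FPRP is commonly written as alpha(1 - pi) / (alpha(1 - pi) + (1 - beta)pi), where alpha is the significance level reached, 1 - beta is the power, and pi is the prior probability that the association is real. The short sketch below evaluates that expression; the function name and the example priors are assumptions for illustration.

```python
def false_positive_report_probability(alpha, power, prior):
    """Probability that a 'significant' finding is a false positive, given the prior."""
    false_positive_mass = alpha * (1.0 - prior)  # null is true and the test rejects
    true_positive_mass = power * prior           # association is real and the test detects it
    return false_positive_mass / (false_positive_mass + true_positive_mass)


if __name__ == "__main__":
    for prior in (0.5, 0.01, 0.001):
        print(prior, false_positive_report_probability(alpha=0.05, power=0.8, prior=prior))
```

With a low prior, as for an arbitrary SNP in a genome-wide scan, a nominally significant result at alpha = 0.05 is far more likely to be false than true, which is the argument for much stricter thresholds.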

Confirmation in independent study populations
- The approach may limit the number of false positives
- Confirmation is needed to distinguish true from false positives
  - Replication: examine the results from the 2nd stage only
  - Joint analysis: combine data from the 1st stage with the 2nd stage
  - Multiple stages

Issues of GWAS
- Population stratification
- Multiple testing: false positives
- Gene-environment interaction
- High costs

Kingsmore, 2008

Hypothesis
The overall hypothesis is that multiple sequence variants in the genome are associated with the risk of lung cancer among non-smokers. Specifically, we hypothesize that a number of common nonsmoking lung cancer risk-modifying SNPs are in strong LD with the SNPs arrayed on the 500K GeneChip®.

Figure: Theoretical model of gene-gene/environmental interaction pathway for lung cancer.
Diagram labels: exposures (tobacco consumption, occupational exposures, environmental exposure); environmental carcinogens/procarcinogens (PAHs, xenobiotics, arene, alkine, etc.); active carcinogens and detoxified carcinogens; DNA damage, either repaired or not repaired when a DNA repair gene is defective; loss of cell cycle control (G0, G1, S, G2, M); outcomes (normal cell, programmed cell death, carcinogenesis).
Genes shown: CYP1A1, GSTP1, mEH, NQO1, XRCC1, GSTM1, P53, Cyclin D1, P16.
Polymorphisms shown: Ile105Val, Ala114Val, Tyr113His, His139Arg, Pro187Ser, MspI, Ile462Val, Arg194Trp, Arg399Gln, Arg280His, Null, Ala146Thr, Arg72Pro, G870A.

Figure 1. The effects of SNPs on the risk of lung cancer among smokers and non-smokers (axis: OR).

Flow cytometry analysis
FACSCalibur sorting

Fortessa cytometer
The analyzer can be configured with up to 5 lasers to detect up to 20 parameters simultaneously to support ever increasing demands in multicolor flow cytometry. A wide range of up to 34 laser choices is available as excitation sources, including blue, red, violet, yellow-green, and UV.

Excitation Optics
The excitation optics consist of multiple fixed-wavelength lasers, beam-shaping optics, and individual pinholes, which result in spatially separated beam spots. A final lens focuses the laser light into the gel-coupled cuvette flow cell. Since the optical pathway and the sample core stream are fixed, alignment is constant from day to day and from experiment to experiment.

Collection Optics
Emitted light from the gel-coupled cuvette is delivered by fiber optics to the detector arrays. The collection optics are set up in patented octagon- and trigon-shaped optical pathways that maximize signal detection from each laser-illuminated beam spot. Bandpass filters in front of each PMT allow spectral selection of the collected wavelengths. Importantly, this arrangement allows filter and mirror changes within the optical array to be made easily and requires no additional alignment for maximum signal strength.

FACSAria
- Three lasers provide excitation at 407, 488, and 633 nm for analysis of up to 10 fluorescence channels plus forward and side scatter
- Digital electronics
- Sort up to four populations simultaneously

Spectral overlap

Figure 2: Fluorescein emission profile with two filters overlaid. The standard filter for fluorescein is a 530/30 filter, which passes light between 515 and 545 nm. The second filter, 585/42, is a common filter for the fluorescent molecule phycoerythrin (PE) and passes light between 564 and 606 nm. The overlap of the fluorescein emission into the PE detector indicates that approximately 12% of the fluorescein signal is being measured in the PE detector. Figure generated using the Invitrogen spectral viewer.
Compensation is the process of correcting for the spillover of a primary signal into each secondary channel in which it is measured.
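For two colors, compensation amounts to solving a small linear system. The sketch below uses the roughly 12% fluorescein-into-PE spillover quoted above; the function name, the default of zero PE spillover into the fluorescein detector, and the example intensities are assumptions for illustration, and real instruments estimate these coefficients from single-stained controls.

```python
def compensate_two_colors(fitc_detector, pe_detector,
                          fitc_into_pe=0.12, pe_into_fitc=0.0):
    """Recover true fluorescein and PE signals from the two detector readings.

    Model: fitc_detector = true_fitc + pe_into_fitc * true_pe
           pe_detector   = fitc_into_pe * true_fitc + true_pe
    """
    determinant = 1.0 - fitc_into_pe * pe_into_fitc
    true_fitc = (fitc_detector - pe_into_fitc * pe_detector) / determinant
    true_pe = (pe_detector - fitc_into_pe * fitc_detector) / determinant
    return true_fitc, true_pe


if __name__ == "__main__":
    # A cell stained with fluorescein only: 12% of its signal shows up in the PE detector.
    print(compensate_two_colors(fitc_detector=1000.0, pe_detector=120.0))
    # A double-positive cell with true signals of 1000 (fluorescein) and 500 (PE).
    print(compensate_two_colors(fitc_detector=1000.0, pe_detector=620.0))
```

After compensation the fluorescein-only cell reads as zero in the PE channel, which is the behavior single-stained controls are used to verify.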