Genome-Wide Association Study (GWAS)


Genome-Wide Association Study (GWAS) Presented by Karen Xu

What you need to know: basic genetic concepts behind GWAS; genotyping technologies and common study designs; statistical concepts for GWAS analysis; replication, interpretation, and follow-up of association results.

Central Goal of Human Genetics: to identify genetic risk factors for common, complex diseases.

Goal of GWAS: to use genetic risk factors to predict who is at risk, and to identify the biological underpinnings of disease susceptibility in order to develop new prevention and treatment strategies.

Application in pharmacology: identifying DNA sequence variations associated with drug metabolism and efficacy, as well as with adverse effects. Example: determining the appropriate dose of warfarin. Personalized medicine.

Concepts underlying the study design. SNP (single nucleotide polymorphism): a single base-pair change in the DNA sequence that occurs with high frequency in the human genome. SNP (common) vs. mutation (rare). Example: cystic fibrosis is caused by mutations in the CFTR gene. Linkage analysis: genotyping families affected by cystic fibrosis using a collection of genetic markers across the genome, and examining how these markers segregate with the disease across multiple families.

Common Disease/Common Variant Hypothesis: common disorders are likely influenced by genetic variation that is also common in the population. 1. If common genetic variants influence disease, the effect size (or penetrance) for any one variant must be small relative to that found for rare disorders. 2. If common alleles have small genetic effects (low penetrance), but common disorders show heritability (inheritance in families), then multiple common alleles must influence disease susceptibility.

Figure 1. Spectrum of Disease Allele Effects. Bush WS, Moore JH (2012) Chapter 11: Genome-Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822

Capturing Common Variation. 1. The location and density of commonly occurring SNPs are needed to identify the genomic regions and individual sites that must be examined by genetic studies. 2. Population-specific differences in genetic variation must be cataloged so that studies of phenotypes in different populations can be conducted with the proper design. 3. Correlations among common genetic variants must be determined so that genetic studies do not collect redundant information.

International HapMap Project: used a variety of sequencing techniques to discover and catalog SNPs in populations of European descent, the Yoruba population of African origin, Han Chinese individuals from Beijing, and Japanese individuals from Tokyo. It has since been expanded to include 11 human populations.

Linkage Disequilibrium: a property of SNPs on a contiguous stretch of genomic sequence that describes the degree to which an allele of one SNP is inherited or correlated with an allele of another SNP within a population. It can be thought of as linkage between markers on a population scale.
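
As a rough illustration of how the "degree of correlation" between two SNPs can be quantified, the sketch below computes the standard LD measures D, D' and r^2 from phased haplotype counts. The function name and the example counts are hypothetical and are not taken from the slides or from HapMap data; real studies would usually estimate these values with dedicated software.

```python
def ld_from_haplotype_counts(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r2) for two biallelic SNPs.

    The four arguments are counts of phased haplotypes: n_AB carries
    allele A at the first SNP and allele B at the second, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at SNP 2
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic in the sample")

    # D: deviation of the observed haplotype frequency from the
    # frequency expected if the two alleles were independent.
    d = p_AB - p_A * p_B

    # D': |D| scaled by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the alleles at the two SNPs.
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


# Hypothetical sample of 100 haplotypes.
d, d_prime, r2 = ld_from_haplotype_counts(40, 10, 10, 40)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D' = 0.60, r^2 = 0.36
```

An r^2 close to 1 means one SNP is a near-perfect proxy for the other, which is the property that makes one SNP usable as a stand-in for another that was not genotyped.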

Figure 2. Linkage and Linkage Disequilibrium. Bush WS, Moore JH (2012) Chapter 11: Genome-Wide Association Studies. PLoS Comput Biol 8(12): e1002822. doi:10.1371/journal.pcbi.1002822 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002822

Direct vs. Indirect Association. LD creates two possible positive outcomes from a genetic association study. 1. Direct association: the SNP influencing a biological system that leads to the phenotype is directly genotyped in the study. 2. Indirect association: the influential SNP is not directly typed, but a tag SNP in high LD with the influential SNP is typed. Therefore, a significant SNP association from a GWAS should not be assumed to be the causal variant.

Genotyping Technologies. Chip-based microarray technology. Illumina: DNA molecules and primers are first attached on a slide and amplified with polymerase so that local clonal DNA colonies, later coined "DNA clusters", are formed. To determine the sequence, four types of reversible terminator bases (RT-bases) are added and non-incorporated nucleotides are washed away. A camera takes images of the fluorescently labeled nucleotides; then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle to begin.

Study Design: case-control vs. quantitative design. There are two primary classes of phenotypes: categorical or quantitative. From the statistical perspective, quantitative traits are preferred, but they are not required for a successful study.

Association Test. 1. Single-locus analysis. When a well-defined phenotype has been selected for a study population, and genotypes are collected using sound techniques, the statistical analysis can begin. Quantitative traits are analyzed using ANOVA (analysis of variance); the null hypothesis is that there is no difference between the trait means of any genotype group. Dichotomous case/control traits are analyzed using logistic regression; the null hypothesis is that there is no association between the phenotype and genotype. http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html
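
To make the quantitative-trait case concrete, here is a minimal sketch of the one-way ANOVA F statistic for a single SNP, with individuals grouped by genotype. The function name, the genotype labels, and the trait values are made up for illustration; the sketch assumes every group is non-empty and that the trait varies within at least one group.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of trait-value lists,
    one list per genotype group."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individuals around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical trait values for the three genotype groups of one SNP.
trait_by_genotype = {
    "AA": [1.0, 2.0, 3.0],
    "AG": [2.0, 3.0, 4.0],
    "GG": [3.0, 4.0, 5.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(trait_by_genotype.values()))
print(f_stat, df_b, df_w)  # 3.0 2 6
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom is evidence against the null hypothesis of equal trait means. The p-value is omitted here to keep the sketch dependency-free; in practice a statistics library would supply it.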

Statistical replication. Replication studies should be conducted in an independent dataset drawn from the same population as the GWAS. Once an effect is confirmed in the target population, other populations may be sampled to determine if the SNP has an ethnic-specific effect. Identical phenotype criteria should be used in both GWAS and replication studies. A similar effect should be seen in the replication set for the same SNP, or for a SNP in high LD with the GWAS-identified SNP.

Meta-analysis of multiple analysis results. Meta-analysis was developed to examine and refine significance and effect size estimates from multiple studies examining the same hypothesis in the published literature. However, it is rare to find multiple studies that match perfectly on all criteria. Study heterogeneity is often statistically quantified in a meta-analysis to determine the degree to which studies differ.
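
One common way to combine per-study results is an inverse-variance fixed-effect meta-analysis, with Cochran's Q and the I^2 statistic used to quantify heterogeneity. The slides do not say which method was intended, so the sketch below is only one plausible reading; the function name and the effect sizes are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates (betas) and standard errors (ses).

    Returns (pooled_beta, pooled_se, q, i2), where q is Cochran's Q and
    i2 is the share of variation attributed to heterogeneity (0 to 1).
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = sqrt(1.0 / w_sum)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled_beta, pooled_se, q, i2


# Hypothetical log odds ratios for the same allele in two studies.
beta, se, q, i2 = fixed_effect_meta([0.0, 0.4], [0.1, 0.1])
print(f"beta = {beta:.2f}, Q = {q:.1f}, I^2 = {i2:.3f}")
# beta = 0.20, Q = 8.0, I^2 = 0.875
```

The effects must refer to the same allele in every study before they are pooled; otherwise the signs of the estimates are not comparable.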

Data Imputation. To conduct a meta-analysis properly, the effect of the same allele across multiple distinct studies must be assessed. This can prove difficult if different studies use different genotyping platforms (which use different SNP marker sets). As this is often the case, GWAS datasets can be imputed to generate results for a common set of SNPs across all studies. Genotype imputation exploits known LD patterns and haplotype frequencies from the HapMap or 1000 Genomes Project to estimate genotypes for SNPs not directly genotyped in the study [50].

Logistic regression: predicting the likelihood that Y is equal to 1 (rather than 0) given certain values of X. Example: we try to predict whether or not a small business will succeed based on the number of years of experience the owner has in the field prior to starting the business. We presume that people with more experience will be more likely to succeed. As X (the number of years of experience) increases, the probability that Y will be equal to 1 (success in the business) will tend to increase.
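
The business example above can be sketched as a one-predictor logistic regression fitted by Newton-Raphson. This is a minimal, dependency-free illustration rather than how a GWAS package would fit the model; the data are invented, and the function names are hypothetical. In a GWAS setting, X would typically be a genotype coded 0, 1 or 2 and Y the case/control status.

```python
from math import exp


def sigmoid(z):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, iterations=25, tol=1e-10):
    """Fit P(Y=1 | X=x) = sigmoid(b0 + b1*x) by Newton-Raphson.

    Assumes the outcomes are not perfectly separated by x, otherwise
    the maximum-likelihood estimates do not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix.
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Invented data: years of prior experience and business success (1) or failure (0).
years = [0, 1, 2, 3, 4, 5, 6, 7]
success = [0, 0, 1, 0, 1, 1, 0, 1]

b0, b1 = fit_logistic(years, success)
print("slope is positive:", b1 > 0)
print("P(success | 0 years) =", round(sigmoid(b0), 3))
print("P(success | 7 years) =", round(sigmoid(b0 + b1 * 7), 3))
```

A positive slope b1 corresponds to the statement that the probability of Y = 1 tends to increase with X; testing whether b1 differs from zero is the association test described for case/control traits.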

Logistic Regression
