Genome wide association studies (A Brief Start)


Genome wide association studies (A Brief Start) Source: PLoS Comput Biol. 2012 Dec; 8(12): e1002822. Published online 2012 Dec 27. doi: 10.1371/journal.pcbi.1002822. PMCID: PMC3531285. Chapter 11: Genome-Wide Association Studies, by William S. Bush and Jason H. Moore; Fran Lewitter and Maricel Kann, Editors. Also drawn from Zhiwu Zhang's lecture and labs.

GWAS The idea is an epidemiological study of common diseases using the genome. Essentially, a GWAS searches the genome for small variations, called single nucleotide polymorphisms or SNPs, that occur more frequently in people with a particular disease than in people without the disease, or vice versa. It then applies significance testing to see whether there is any association between the disease and the location of the genetic variation. First we need to understand what a SNP is.

SNP Most humans have very similar genomes, but there are locations on the genome where people commonly differ. SNPs are single base-pair changes in the DNA sequence that occur with high frequency in the human genome. SNPs are typically used as markers of a genomic region, and the large majority of them have minimal impact on biological systems. Some SNPs, however, have functional consequences, causing amino acid changes, changes to mRNA transcript stability, or changes to transcription factor binding affinity. SNPs are by far the most abundant form of genetic variation in the human genome. SNPs typically have two alleles, meaning that within a population there are two commonly occurring base-pair possibilities at a SNP location.

SNP versus Mutation The frequency of a SNP is given in terms of the minor allele frequency, that is, the frequency of the less common allele. A SNP whose minor allele G has a frequency of 0.35 means that 35% of the allele copies (chromosomes) in a population carry G, while the more common allele (the major allele) accounts for the remaining 65%. Mutations: Mendelian (single-gene) disorders are largely caused by extremely rare genetic variants that ultimately induce a detrimental change to protein function, which leads to the disease state. Variants with such low frequency in the population are sometimes referred to as mutations, though they can be structurally equivalent to SNPs: single base-pair changes in the DNA sequence. In the genetics literature, the term SNP is generally applied to common single base-pair changes, and the term mutation is applied to rare genetic variants.
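The minor allele frequency can be computed directly from genotype counts, since every individual carries two copies of each autosomal SNP. The sketch below is illustrative only: the function name and the genotype counts are hypothetical, chosen so that the G allele has a frequency near 0.35 as in the example above.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return (freq_a, freq_b, maf) from counts of the AA, AB and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)  # two allele copies per person
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each AA individual contributes two A copies, each AB individual one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    freq_b = 1.0 - freq_a
    return freq_a, freq_b, min(freq_a, freq_b)


# Hypothetical sample of 1000 people: 423 A/A, 454 A/G, 123 G/G.
freq_a, freq_g, maf = minor_allele_frequency(423, 454, 123)
print(f"A: {freq_a:.2f}  G: {freq_g:.2f}  MAF: {maf:.2f}")
```

Note that the frequency counts allele copies, not people: in this hypothetical sample more than half of the individuals carry at least one G.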

SNP and GWAS Because GWAS examine SNPs across the whole genome, they represent a promising way to study complex, common diseases in which many genetic variations contribute to a person's risk. This approach has already identified SNPs related to several complex conditions including diabetes, heart abnormalities, Parkinson disease, and Crohn disease. The hope is that further studies will improve our understanding of more common diseases.

CV/CD hypothesis The common variant/common disease hypothesis states that common disorders are likely influenced by common genetic variation. If common genetic variants influence disease, the effect size for any one variant must be small relative to that found for rare disorders. If common disorders show heritability (inheritance in families), then multiple common alleles must influence disease susceptibility. As such, the total genetic risk due to common genetic variation must be spread across multiple genetic factors. These two points suggest that traditional family-based genetic studies are not likely to be successful for complex diseases, prompting a shift toward population-based studies.

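The HapMap Project We need to know where SNPs occur and at what density. We also need to know how SNP frequencies, and the correlations between SNPs, differ across populations of different ancestry. Hence the International HapMap Project was launched to catalog common SNPs and their patterns of correlation in several populations. Its data indicate that a subset of about 500,000 SNPs captures most of the common variation in people of European descent.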

LD: Linkage Disequilibrium LD is the property of an allele at one SNP being correlated with an allele at another SNP along a contiguous stretch of the genome. When alleles at different SNPs are independent we have linkage equilibrium; when they are dependent we call it LD. Common measures are D' and r-squared, both defined from allele and haplotype proportions. The idea is that a genotyped SNP may be associated with disease only indirectly, through LD with an untyped causal variant, so causality is almost impossible to prove in these studies. Because of the small effect sizes and indirect associations, large-scale studies are required.
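As a rough illustration of how LD is quantified, the sketch below computes the standard coefficient D together with two commonly used normalizations, D' and r-squared, from the frequency of one haplotype and the two allele frequencies. The function name and the example frequencies are hypothetical; in practice these proportions would be estimated from phased genotype data.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for alleles A and B at two SNPs.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and of allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical case: allele B (frequency 0.2) is only ever seen together with A.
d, d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

The two measures answer different questions: D' reaches 1 whenever one of the four possible haplotypes is absent, while r-squared reaches 1 only when each SNP perfectly predicts the other.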

Genotyping Technology Two primary platforms have been used for most GWAS: products from Illumina (San Diego, CA) and Affymetrix (Santa Clara, CA). The Affymetrix platform prints short DNA sequences as a spot on the chip that recognizes a specific SNP allele. Alleles (i.e. nucleotides) are detected by differential hybridization of the sample DNA. Illumina, on the other hand, uses a bead-based technology with slightly longer DNA sequences to detect alleles. The Illumina chips are more expensive to make but provide better specificity. The choice of chip also depends on the study population: a study of Africans needs a chip with more SNPs and better overall genomic coverage than a study of Europeans. This is because African genomes have had more time to recombine and therefore have less LD between alleles at different SNPs, so more SNPs are needed to capture the variation across the African genome. Next-generation sequencing methods, which are expected to replace genotyping chips, will provide all the DNA sequence variation in the genome. It is time now to retool for this new onslaught of data.

Design The most common designs are case-control (binary response) and quantitative (continuous response). Quantitative traits are easier to analyze: ANOVA-like methods are applied to the presence or absence of an allele at each SNP (the response can be anything that is measured, such as HDL or LDL). For yes/no phenotypes we can use 2 by 2 tables with a chi-square test, or logistic regression. The case-control design asks whether an allele of a genetic variant is found more often than expected in individuals with the phenotype of interest (e.g. with the disease being studied). Early calculations of statistical power indicated that this approach could be better than linkage studies at detecting weak genetic effects.

Common Data The most common approach in GWA studies is the case-control setup, which compares two large groups of individuals: a healthy control group and a case group affected by a disease. Every individual is genotyped at a large set of SNPs. For each of these SNPs it is then investigated whether the allele frequency differs significantly between the case and the control group. In such setups, the fundamental unit for reporting effect sizes is the odds ratio. If the allele frequency in the case group is higher than in the control group, the odds ratio is greater than 1; if it is lower, the odds ratio is less than 1. Additionally, a P-value for the significance of the odds ratio is typically calculated using a simple chi-squared test. Finding odds ratios that are significantly different from 1 is the objective of the GWA study, because this shows that a SNP is associated with disease.
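The odds ratio and chi-squared test for a single SNP can be sketched as follows, assuming the data are summarized as allele counts in cases and controls. The function name and the counts are hypothetical, and the test shown is the plain Pearson chi-squared statistic with one degree of freedom and no continuity correction, which is one common choice rather than the only one.

```python
import math


def allelic_association(case_a, case_b, ctrl_a, ctrl_b):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    case_a, case_b: counts of allele A and allele B among cases
    ctrl_a, ctrl_b: counts of allele A and allele B among controls
    """
    if min(case_a, case_b, ctrl_a, ctrl_b) <= 0:
        raise ValueError("all four cells must be positive")
    n = case_a + case_b + ctrl_a + ctrl_b
    # Odds of carrying A among cases divided by the odds among controls.
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    # Pearson chi-squared statistic for a 2x2 table (1 degree of freedom).
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    chi2 = numerator / denominator
    # Upper tail of the chi-squared distribution with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts: cases 60 A / 40 B, controls 40 A / 60 B.
odds_ratio, chi2, p_value = allelic_association(60, 40, 40, 60)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide scan the same calculation is repeated for every SNP, so the resulting P-values need to be judged against a threshold that accounts for the number of tests performed.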

Data The most common type of data appears to be in the form of 2 by 2 tables. Let's say we have two groups, disease and not disease, and we are focusing on the presence and absence of allele G. Essentially, we calculate the chi-square test for every SNP.

            Disease    Not disease
G           2000       8000
Not G       (counts not given)

Other types of Data Instead of being disease or not disease, the phenotype could be a measured trait, such as height or biomass. In that case we model the data with a linear model, Y = SNP effect + error, and perform an ANOVA-type analysis. However, other factors also contribute to the model, and Dr. Zhiwu Zhang talked to us about these kinds of models.
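A minimal sketch of the single-SNP linear model for a quantitative trait is given below. It assumes an additive coding of the genotype (0, 1 or 2 copies of one allele), which is a common convention but is not stated in the slides; the function name and the data are hypothetical. It returns the estimated SNP effect, the intercept, and the t statistic for the effect.

```python
import math


def snp_linear_fit(genotypes, phenotypes):
    """Least-squares fit of phenotype = intercept + slope * genotype + error."""
    n = len(genotypes)
    if n < 3 or n != len(phenotypes):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotype does not vary in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx  # estimated effect of one extra allele copy
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom, then the slope's standard error.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(genotypes, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, intercept, t_stat


# Hypothetical data: genotype as allele count, phenotype as a measured trait.
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, intercept, t_stat = snp_linear_fit(genotypes, phenotypes)
print(f"SNP effect = {slope:.3f}, intercept = {intercept:.3f}, t = {t_stat:.2f}")
```

The t statistic would be compared with a t distribution on n - 2 degrees of freedom to obtain a P-value; that step is left out here to keep the sketch free of external libraries.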

Fixed and Random Effect Models GLM for GWAS: Y = SNP + Q (or PCs) + e, where SNP and Q (or PCs) are fixed effects. MLM for GWAS: Y = SNP + Q (or PCs) + Kinship + e, where SNP and Q (or PCs) are fixed effects and Kinship is a random effect.

GLM to GLiM The mixed model is clearly the better approach, since it models the systematic variation in the data better. However, it has been studied in depth only for continuous responses and not so much for binary or categorical responses. Hence, the direction is going from General Linear Mixed models to Generalized Linear Mixed models, using logistic regression: logit P(Y=1 | X) = SNP + Q + K, where SNP and Q are fixed effects and K (kinship) is a random effect.
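To make the logistic part concrete, the sketch below fits the fixed-effect piece only, logit P(Y=1 | x) = b0 + b1 * x for a single SNP, by Newton-Raphson. It deliberately omits Q and the kinship random effect, so it is an ordinary logistic regression rather than a generalized linear mixed model; the function name, the 0/1 genotype coding, and the data are all hypothetical.

```python
import math


def logistic_snp_fit(genotypes, status, max_iter=50, tol=1e-10):
    """Fit logit P(Y=1 | x) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("information matrix is singular")
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: 40 non-carriers (10 cases) and 40 carriers (20 cases).
genotypes = [0] * 40 + [1] * 40
status = [1] * 10 + [0] * 30 + [1] * 20 + [0] * 20
b0, b1 = logistic_snp_fit(genotypes, status)
print(f"intercept = {b0:.3f}, log odds ratio for the SNP = {b1:.3f}")
```

With a single binary predictor the fitted coefficient b1 equals the log of the odds ratio from the corresponding 2 by 2 table, which links this model back to the case-control analysis. Adding the kinship term requires specialized mixed-model software and is beyond this sketch.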