Molecular & Genetic Epi 217 Association Studies


Molecular & Genetic Epi 217 Association Studies John Witte

Association Studies

Association Studies: The use of association studies is expanding rapidly, reflecting several attractive properties: ease, since one need not collect large pedigrees; and the potential to be more powerful than conventional linkage-based approaches.

Linkage vs. Association (Risch & Merikangas, Science 1996)

Association Study Approaches (Direct vs. Indirect): candidate genes, either functional variants or all common variants; all common variants in the genome (GWAS); and all variants in the genome (sequencing), which is expensive but includes rare variants.

Genomics Revolution: The Human Genome Project took 13 years and $3B to produce 1 sequence. Now: 1 week and $10K, more than 500 times faster at less than 1/100,000th of the cost. Soon: 1 hour and $1K (#1 Innovation, 2010). This is improving our ability to study the genomics of health and disease.
"Computing has, famously, increased in potency according to Moore's law. This says that computers double in power roughly every two years, an increase of more than 30 times over the course of a decade, with concomitant reductions in cost. [...] calculates that the cost of DNA sequencing at the institute has fallen to a hundred-thousandth of what it was a decade ago (see chart 1). The genome sequenced by the International Human Genome Sequencing Consortium (actually a composite from several individuals) took 13 years and cost $3 billion. Now, using the latest sequencers from Illumina, of San Diego, California, a human genome can be read in eight days at a cost of about $10,000. Nor is that the end of the story. Another Californian firm, Pacific Biosciences, of Menlo Park, has a technology that can read genomes from single DNA molecules. It thinks that in three years' time this will be able to map a human genome in 15 minutes for less than $1,000." (The Economist, 2010)
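The ratios on this slide can be checked with simple arithmetic. This minimal sketch treats the quoted figures as exact and assumes 52 weeks per year and a 2-year doubling time for Moore's law; the function names are illustrative only.

```python
# Figures quoted on the slide, treated as exact for illustration.
HGP_YEARS = 13          # Human Genome Project duration
HGP_COST_USD = 3e9      # $3B for one sequence
NOW_WEEKS = 1           # "Now: 1 week"
NOW_COST_USD = 10e3     # "$10K"


def speedup(years, weeks):
    """How many times faster a `weeks`-long run is than a `years`-long one (52 weeks/year assumed)."""
    return years * 52 / weeks


def cost_fraction(old_cost, new_cost):
    """New cost as a fraction of the old cost."""
    return new_cost / old_cost


def moore_growth(years, doubling_time=2.0):
    """Fold increase in computing power if it doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


print(speedup(HGP_YEARS, NOW_WEEKS))                 # 676.0, i.e. > 500 times faster
print(cost_fraction(HGP_COST_USD, NOW_COST_USD))     # about 3.3e-06, i.e. < 1/100,000th
print(moore_growth(10))                              # 32.0, i.e. > 30 times per decade
```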

Control Selection: A critical aspect of association studies is that controls should be selected from the cases' source population. That is, controls should be those individuals who, had they developed the disease, would have become cases in the study.

Population Stratification: Confounding bias that may occur if one's sample comprises sub-populations with different allele frequencies and different disease rates. Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate, so cases and controls will have different allele frequencies regardless of whether the locus is causal. (Diagram: the sub-population is associated with both the gene and the disease.)
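The mechanism can be illustrated with expected counts. This is a minimal sketch under stated assumptions: two hypothetical strata, Hardy-Weinberg equilibrium within each stratum, allele-level counting, everyone in each stratum sampled, and disease risk that depends only on the stratum (so the locus has no causal effect). The Mantel-Haenszel odds ratio is used here as one standard way to adjust for stratum; all names and numbers are illustrative.

```python
# Hypothetical strata: (size n, risk-allele frequency p, disease risk).
STRATA = [(1000, 0.8, 0.20),   # high-risk stratum, allele common
          (1000, 0.2, 0.05)]   # low-risk stratum, allele rare


def allele_table(n, p, risk):
    """Expected allele counts (a, b, c, d):
    a = case risk alleles, b = case other alleles,
    c = control risk alleles, d = control other alleles."""
    alleles = 2 * n
    return (alleles * risk * p, alleles * risk * (1 - p),
            alleles * (1 - risk) * p, alleles * (1 - risk) * (1 - p))


def odds_ratio(table):
    a, b, c, d = table
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Stratum-adjusted OR: sum(a*d/N) / sum(b*c/N) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


tables = [allele_table(*s) for s in STRATA]
pooled = tuple(map(sum, zip(*tables)))   # ignore stratification: add the tables

print([round(odds_ratio(t), 6) for t in tables])  # within-stratum ORs are 1
print(round(odds_ratio(pooled), 4))               # crude OR is about 2.36 (spurious)
print(round(mantel_haenszel_or(tables), 6))       # adjusted OR is 1
```

In this toy setup the risk allele is common in the high-risk stratum, so the spurious association is positive; when the allele is rarer in the high-risk group, as with Gm3;5,13,14 below, the spurious association is inverse.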

Example of Population Stratification: Higher levels of Native American heritage defined sub-populations with higher risks of NIDDM (non-insulin-dependent diabetes mellitus) and lower frequencies of the immunoglobulin haplotype Gm3;5,13,14.
Sub-population   Gm3;5,13,14   NIDDM
Pima Indians     0.01          0.39
Mixed            0.47          0.27
Caucasians       0.67          0.13
Cases were more likely to have higher Native American heritage, and less likely to carry the haplotype. Ignoring stratification gave a false inverse association (OR = 0.3); adjusting for heritage gave OR = 0.8 (95% CI = 0.6-1.2). (Knowler et al., Am J Hum Genet 1988; Cardon & Palmer, 2003)

Family-Based Association Studies: relatives serve as the comparison group, e.g., siblings, parents, or cousins. (Figure: pedigrees with genotyped individuals marked G.)

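Continuum of Association Study Designs: population-based, "ethnicity"-matched, structured association, and family-based. Designs toward the population-based end risk bias from population stratification (gene, subpopulation, and disease are all associated). Designs toward the family-based end risk overmatching, because relatives share genes and environment, which reduces efficiency. The tradeoff is bias versus efficiency. Recruitment issues also matter.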

Association Analysis
Genotype   Cases   Controls   OR
GG         A       D          1
GT         B       E          BD/AE
TT         C       F          CD/AF
A simple chi-square test comparing genotype frequencies between cases and controls (2 d.f.) is called a co-dominant analysis.
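The table above translates directly into code. This minimal sketch takes hypothetical genotype counts in the order (GG, GT, TT), computes the two ORs against the GG reference exactly as in the table, and computes the 2-d.f. Pearson chi-square. It uses the fact that the chi-square survival function with 2 d.f. is exactly exp(-x/2), so no statistics library is needed. Zero cells would cause division by zero and are not handled.

```python
import math


def codominant_analysis(cases, controls):
    """cases = (A, B, C), controls = (D, E, F): counts of GG, GT, TT.
    Returns genotype ORs vs. GG and the 2-d.f. Pearson chi-square test."""
    A, B, C = cases
    D, E, F = controls
    or_gt = (B * D) / (A * E)   # BD/AE
    or_tt = (C * D) / (A * F)   # CD/AF

    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    chi2 = 0.0
    for obs_case, obs_ctrl in zip(cases, controls):
        col_total = obs_case + obs_ctrl
        for obs, row_total in ((obs_case, n_case), (obs_ctrl, n_ctrl)):
            expected = row_total * col_total / n
            chi2 += (obs - expected) ** 2 / expected
    p_value = math.exp(-chi2 / 2)   # exact chi-square tail probability for 2 d.f.
    return {"OR_GT": or_gt, "OR_TT": or_tt, "chi2": chi2, "p": p_value}


# Hypothetical counts, for illustration only.
result = codominant_analysis(cases=(100, 80, 20), controls=(120, 70, 10))
print(result)   # OR_GT ~ 1.37, OR_TT = 2.4, chi2 ~ 5.82, p ~ 0.055
```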

Genetic Model: the genotype ORs depend on the genetic model.
Genotype   OR
GG         1
GT         r
TT         R
R = r = 1: T is not a risk allele
R > r = 1: recessive
R = r > 1: dominant
R = r² > 1: log-additive
(Assuming a positive association.)
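The rules above can be written as a small lookup. This is a sketch: `psi` is a hypothetical effect-size parameter, and `classify` simply encodes the slide's four cases (anything else is labeled "other"). The `CODING` table shows the usual genotype codings for logistic regression; under the log-additive coding (number of T alleles), a per-allele OR of r implies an OR of r² for TT.

```python
import math


def implied_ors(model, psi):
    """Genotype ORs (GG, GT, TT) implied by a genetic model.
    psi is a hypothetical effect size (> 1 for a positive association)."""
    table = {
        "null": (1.0, 1.0, 1.0),
        "recessive": (1.0, 1.0, psi),
        "dominant": (1.0, psi, psi),
        "log_additive": (1.0, psi, psi ** 2),
    }
    return table[model]


def _is_one(x):
    return math.isclose(x, 1.0)


def classify(r, R):
    """Name the model matching OR r (GT vs GG) and R (TT vs GG), per the slide."""
    if _is_one(r) and _is_one(R):
        return "not a risk allele"
    if _is_one(r) and R > 1:
        return "recessive"
    if r > 1 and math.isclose(R, r):
        return "dominant"
    if r > 1 and math.isclose(R, r * r):
        return "log-additive"
    return "other"


# Covariate coding of each genotype for a logistic regression.
CODING = {
    "dominant":     {"GG": 0, "GT": 1, "TT": 1},
    "recessive":    {"GG": 0, "GT": 0, "TT": 1},
    "log_additive": {"GG": 0, "GT": 1, "TT": 2},   # count of T alleles
}

print(classify(*implied_ors("dominant", 1.5)[1:]))   # dominant
print(classify(2.0, 4.0))                            # log-additive
```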

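Tests of Association: If the genetic model is known, collapse genotypes into a 2x2 table for a 1-d.f. test, use a trend test for the log-additive model, or use logistic regression (genotype coding; covariates). We rarely know the genetic model, so fit all three models (dominant, recessive, log-additive) and compare the fit of each with the co-dominant (2-d.f.) model using a likelihood-ratio (LR) test. The LR test cannot be used to compare these models with each other, because they are not nested. Is the model with the best fit and smallest P the best? Choosing the smallest P across models inflates the type I error, so use a permutation test of the maximum statistic here (the MAX test).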
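A MAX test can be sketched as follows. This is a simplified illustration, not the exact procedure from the lecture: instead of logistic-regression LR tests, it uses Pearson 1-d.f. chi-squares on the collapsed dominant and recessive 2x2 tables and one common form of the Cochran-Armitage trend test (weights 0, 1, 2) for the log-additive model. It takes the maximum and calibrates it by permuting case/control labels. Genotypes are coded as the number of T alleles; the data and seed are hypothetical.

```python
import random


def genotype_counts(genotypes, status):
    """Counts of genotype 0/1/2 among cases (status 1) and controls (status 0)."""
    cases, controls = [0, 0, 0], [0, 0, 0]
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    return cases, controls


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def trend_chi2(cases, controls, w=(0, 1, 2)):
    """Cochran-Armitage trend statistic T^2 / Var(T), 1 d.f."""
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    T = sum(wi * (r * S - s * R) for wi, r, s in zip(w, cases, controls))
    var = (R * S / N) * (
        sum(wi * wi * ni * (N - ni) for wi, ni in zip(w, n))
        - 2 * sum(w[i] * w[j] * n[i] * n[j] for i in range(3) for j in range(i + 1, 3))
    )
    return 0.0 if var == 0 else T * T / var


def model_stats(genotypes, status):
    cases, controls = genotype_counts(genotypes, status)
    dom = chi2_2x2(cases[1] + cases[2], cases[0], controls[1] + controls[2], controls[0])
    rec = chi2_2x2(cases[2], cases[0] + cases[1], controls[2], controls[0] + controls[1])
    return {"dominant": dom, "recessive": rec, "log_additive": trend_chi2(cases, controls)}


def max_test(genotypes, status, n_perm=1000, seed=1):
    """Observed MAX statistic and its permutation p-value."""
    observed = max(model_stats(genotypes, status).values())
    rng = random.Random(seed)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max(model_stats(genotypes, labels).values()) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Usage: with per-subject lists `genotypes` (0/1/2) and `status` (1 = case), `max_test(genotypes, status)` returns the largest of the three model statistics and a p-value that accounts for having looked at all three.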

Candidate Gene Studies: Selection of candidates: Linkage regions? Biological support? "I am interested in a candidate gene and have samples ready to study. Which SNPs do I genotype?"

Candidate Gene: Where Do I Start? Location: What chromosome? What position on the chromosome? Exons/UTRs: How many exons? Which UTR regions? Size: How large is the gene? Use the UCSC Genome Browser.

SNP Picking: Things to Consider. Validation: What is the quality of the SNPs? Informativity: Are these SNPs informative in my population? How common are they? Where are they located? Potentially functional: Do these SNPs have a potential biological impact (e.g., missense variants)? Previously associated: Have previous studies found SNPs in the candidate gene associated with the outcome?

SNP Picking: Validation

SNP Picking: Informative

SNP Picking: Potentially Functional

SNP Picking: Previously Associated

MTHFR Summary: Chromosome 1: 11,780,053-11,800,381. Size: 20,329 bp. Exons: 12. Potentially functional: 5 missense SNPs, of which 3 have MAF > 5%. Previously associated: 3 (C677T, A1298C, A2756G).

MTHFR SNPs: 102 SNPs across MTHFR. Too many SNPs to genotype! http://genome.ucsc.edu/cgi-bin/hgGateway

Too Many MTHFR SNPs. Solution: Tag SNP Selection. SNPs are correlated (i.e., in linkage disequilibrium, LD). (Figure: six SNPs on a set of haplotypes.) Pairwise tagging: SNP 1, SNP 3, and SNP 6 each capture the other SNPs in their group at high r², so 3 tags in total are tested for association. Carlson et al. (2004) AJHG 74:106
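Pairwise tagging can be sketched as a greedy set cover. This is a simplified sketch in the spirit of Carlson et al. (2004), not their exact algorithm: it computes r² from phased 0/1 haplotypes, then repeatedly picks the not-yet-tagged SNP that captures the most untagged SNPs at r² ≥ threshold (ties go to the lowest index). The `force_in` parameter mimics forcing in functional or previously associated SNPs. The haplotypes are a hypothetical toy example with three perfectly correlated SNP pairs.

```python
from itertools import product


def r_squared(haps, i, j):
    """r^2 between SNPs i and j from phased haplotypes coded 0/1."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:                      # monomorphic SNP: no information
        return 0.0
    d = p_ij - p_i * p_j                # LD coefficient D
    return d * d / denom


def greedy_tags(haps, threshold=0.8, force_in=()):
    """Greedy pairwise tag selection; returns tag SNP indices in selection order."""
    m = len(haps[0])
    # Each SNP "covers" itself plus every SNP with r^2 >= threshold.
    covers = {s: {t for t in range(m) if r_squared(haps, s, t) >= threshold} | {s}
              for s in range(m)}
    untagged = set(range(m))
    tags = []
    for s in force_in:                  # forced SNPs are always genotyped
        tags.append(s)
        untagged -= covers[s]
    while untagged:
        best = max(sorted(untagged), key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: SNPs (0,1), (2,3), (4,5) are perfectly correlated pairs.
toy_haps = [(a, a, b, b, c, c) for a, b, c in product((0, 1), repeat=3)]
print(greedy_tags(toy_haps))                  # [0, 2, 4]: 3 tags for 6 SNPs
print(greedy_tags(toy_haps, force_in=(1,)))   # [1, 2, 4]
```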

Coverage: Measurement Error in Tag SNPs. Start from the complete set of SNPs; a subset of these makes up the genotyping set. For a given SNP, compute r² between that SNP and each SNP in the genotyping set, and take the highest r² value. This is called the maximum r².

Common Measures of Coverage: Threshold measures, e.g., 73% of SNPs in the complete set are in LD with at least one SNP in the genotyping set at r² > 0.8. Average measures, e.g., average maximum r² = 0.84.
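Both coverage measures follow from the maximum r² of each SNP. This minimal sketch assumes a precomputed pairwise r² matrix for the complete SNP set (the 4-SNP matrix and the genotyping set below are hypothetical) and reports the threshold measure (fraction of SNPs with maximum r² > 0.8) and the average maximum r².

```python
def max_r2_per_snp(r2, genotyped):
    """For every SNP in the complete set, its highest r^2 with any genotyped SNP.
    r2 is a square list-of-lists matrix; genotyped holds SNP indices."""
    return [max(r2[s][t] for t in genotyped) for s in range(len(r2))]


def coverage(max_r2, threshold=0.8):
    """Threshold measure (fraction with max r^2 > threshold) and average max r^2."""
    frac = sum(v > threshold for v in max_r2) / len(max_r2)
    avg = sum(max_r2) / len(max_r2)
    return frac, avg


# Hypothetical pairwise r^2 matrix for 4 SNPs; SNPs 0 and 2 are genotyped.
R2 = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
genotyped = [0, 2]

best = max_r2_per_snp(R2, genotyped)   # [1.0, 0.9, 1.0, 0.6]
print(coverage(best))                  # 75% of SNPs at r^2 > 0.8; average max r^2 = 0.875
```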

Coverage and Sample Size: If n is the sample size required for direct association (genotyping the causal SNP itself), the sample size required for indirect association is n* = n / r². For r² = 0.8, the increase is 25%; for r² = 0.5, the increase is 100%.
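The n* = n / r² rule reflects the fact that testing a marker in LD with the causal variant reduces the test's noncentrality parameter by roughly a factor of r². A minimal sketch of the arithmetic, with a hypothetical direct-association sample size:

```python
def indirect_sample_size(n_direct, r2):
    """Approximate sample size for indirect association: n* = n / r^2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def percent_increase(r2):
    """Percent increase in sample size relative to direct association."""
    return (1 / r2 - 1) * 100


print(indirect_sample_size(1000, 0.8))   # about 1250 (hypothetical n = 1000)
print(percent_increase(0.8))             # about 25%
print(percent_increase(0.5))             # 100%
```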

Tag SNPs Database Resources http://www.hapmap.org http://gvs.gs.washington.edu/GVS/index.jsp

HapMap: Re-sequencing discovered millions of additional SNPs, which were deposited in dbSNP. SNPs from dbSNP were then genotyped, aiming for 1 SNP every 5 kb. This supported SNP validation (Is the SNP polymorphic? At what frequency?), haplotype and linkage disequilibrium estimation, and selection of LD tagging SNPs.

HapMap Phase III Populations:
ASW: African ancestry in Southwest USA
CEU: Utah residents with Northern and Western European ancestry from the CEPH collection
CHB: Han Chinese in Beijing, China
CHD: Chinese in Metropolitan Denver, Colorado
GIH: Gujarati Indians in Houston, Texas
JPT: Japanese in Tokyo, Japan
LWK: Luhya in Webuye, Kenya
MEX: Mexican ancestry in Los Angeles, California
MKK: Maasai in Kinyawa, Kenya
TSI: Toscani in Italia
YRI: Yoruba in Ibadan, Nigeria

Tag SNPs: HapMap

Tag SNPs: HapMap & Haploview http://www.broad.mit.edu/mpg/haploview/

Tag SNPs: HapMap Summary: Identified 33 common MTHFR SNPs (MAF > 5%) among Caucasians. Forced in the 3 potentially functional or previously associated SNPs. Selected tags by pairwise tagging: 15 tag SNPs captured all 33 MTHFR SNPs (mean r² = 0.97). Note: the number of SNPs required varies from gene to gene and from population to population.

1000 Genomes Project

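Taster Project: 3 SNPs in the TAS2R38 Gene. Three SNPs, each with two alleles (P/A, A/V, V/I), give 8 possible haplotypes: PAV, PAI, PVV, PVI, AAV, AAI, AVV, AVI. Haplotype definition: each individual has two haplotypes, which together form a diplotype. A haplotype is to an allele as a diplotype is to a genotype.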

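TAS2R38: 3 SNPs Form Haplotypes. Taster: PAV. Non-taster: AVI. A third haplotype, AAV, is the result of recombination: it carries the A of the non-taster haplotype at the first SNP and the AV of the taster haplotype at the second and third SNPs. This allows us to compare the effect of the first SNP with that of the second and third. Such recombinant haplotypes are rare, and not all combinations are observed.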
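The haplotype bookkeeping on these two slides can be made concrete. This minimal sketch enumerates the 2³ = 8 possible haplotypes from the three SNP allele pairs, builds a single-crossover recombinant between the non-taster (AVI) and taster (PAV) haplotypes, and counts unordered diplotypes (pairs of haplotypes). The function names and the unordered-pair convention are illustrative assumptions.

```python
from itertools import product

# Alleles at the three TAS2R38 SNPs, as listed on the slide.
SNP_ALLELES = [("P", "A"), ("A", "V"), ("V", "I")]


def all_haplotypes():
    """Every combination of one allele per SNP: 2 * 2 * 2 = 8 haplotypes."""
    return ["".join(h) for h in product(*SNP_ALLELES)]


def single_crossover(left_parent, right_parent, k):
    """Recombinant with the first k SNP alleles of left_parent and the rest of right_parent."""
    return left_parent[:k] + right_parent[k:]


def diplotype(h1, h2):
    """An individual's pair of haplotypes, as an unordered pair."""
    return tuple(sorted((h1, h2)))


haps = all_haplotypes()
print(haps)                               # ['PAV', 'PAI', 'PVV', 'PVI', 'AAV', 'AAI', 'AVV', 'AVI']
print(single_crossover("AVI", "PAV", 1))  # 'AAV': non-taster A, then taster AV
print(len({diplotype(a, b) for a in haps for b in haps}))  # 36 possible diplotypes
```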

TAS2R38 Haplotype Function