1
Genomic Analysis: GWAS
2
Genetic Markers
Genetic marker: a locus used to identify a chromosome or locate other genes on a genetic map
Many different types, including SNPs, VNTRs (microsatellites), RFLPs, etc.
3
Genetic Markers: SNPs Single Nucleotide Polymorphisms
Polymorphic single bases
Four possible states at any single base in a genome
  Usually only two are observed, ancestral and variant
Advantages:
  Low mutation rate (stable)
  High abundance (every bases)
  Easy to type
Disadvantages:
  Rate heterogeneity
  Ascertainment bias
  Low information content
4
Genetic Markers: VNTRs
Short tandem repeats, microsatellites, STRs
Mononucleotide, dinucleotide, trinucleotide, etc.
Allele lengths are variable ((TA)3, (TTAA)12, (AGT)33, etc.)
  Many possible variants in a population
Most often occur in non-coding regions
Advantages:
  Low ascertainment bias
  Easy to identify
  Highly informative
Disadvantages:
  High mutation frequency
  Complex mutation behavior
  Difficult to automate genotyping
5
SNPs: Single nucleotide polymorphisms
Responsible for 90% of all human genetic variation
~12,000,000 documented SNPs in the NCBI database
Categorized as coding (in an exon) or noncoding (the majority)
  Coding SNPs can be synonymous or nonsynonymous
Most SNPs are completely neutral
Often used as markers for pinpointing disease-causing polymorphisms
6
Finding ‘phenotypic’ SNPs
Many genes
  ~25,000 genes; many can be candidates, and many may contribute to particular phenotypes
Many SNPs
  ~12,000,000 SNPs; the ability to predict functional SNPs is limited
Methods to select candidate SNPs (narrow → broad):
  Only functional SNPs in a candidate gene
  Systematic screen of SNPs in a candidate gene
  Systematic screen of SNPs in an entire metabolic pathway
  Systematic screen for all coding changes (exome screening)
7
Genomic Medicine Exome sequencing
Exome: the coding sequences of all annotated protein-coding genes; ~1% of the genome
Accomplished via target-capture methods
What's the major potential drawback?
8
Genomic Medicine
First application of exome sequencing to a syndrome of unknown cause
Miller syndrome, thought to be recessive
  Suggests that affected individuals require two variants (one on each chromosome)
Exomes of four individuals sequenced, including a pair of siblings
Narrowed to a single gene, DHODH (dihydroorotate dehydrogenase), involved in the biosynthesis of pyrimidines
All four individuals were compound heterozygous for missense mutations
All parents were carriers
Ng et al. 2010, Nature Genetics 42, 30-35
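The recessive-model reasoning on this slide can be sketched as a simple filter: keep only genes in which every affected individual carries at least two candidate variants. The sketch below is a toy illustration, not the pipeline used by Ng et al. The function name, sample names, gene names and variant calls are all hypothetical. It also ignores phase, so two heterozygous variants in one gene are assumed to sit on different chromosomes.

```python
# Toy recessive-model filter. Every sample name, gene name and variant call
# here is made up for illustration; none of it is data from Ng et al. 2010.

def candidate_genes(variants_by_individual, min_variants=2):
    """Return genes in which every affected individual carries at least
    `min_variants` candidate alleles (a homozygous call counts as two).

    Phase is ignored: two heterozygous calls in one gene are assumed to lie
    on different chromosomes (compound heterozygosity).
    """
    per_individual = []
    for calls in variants_by_individual.values():
        counts = {}
        for gene, zygosity in calls:
            counts[gene] = counts.get(gene, 0) + (2 if zygosity == "hom" else 1)
        per_individual.append({g for g, n in counts.items() if n >= min_variants})
    if not per_individual:
        return []
    # A candidate gene must pass the filter in every affected individual.
    return sorted(set.intersection(*per_individual))


affected = {
    "sib1": [("GENE_A", "het"), ("GENE_A", "het"), ("GENE_B", "het")],
    "sib2": [("GENE_A", "het"), ("GENE_A", "het"), ("GENE_C", "hom")],
    "case3": [("GENE_A", "het"), ("GENE_A", "het"), ("GENE_C", "het")],
    "case4": [("GENE_A", "hom"), ("GENE_B", "het"), ("GENE_B", "het")],
}

print(candidate_genes(affected))
```

With this made-up input only GENE_A survives, because it is the only gene with two candidate alleles in all four individuals.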
9
Finding ‘responsible’ SNPs in an ocean of variation
Many genes
  ~25,000 genes; many can be candidates, and many may contribute to particular phenotypes
Many SNPs
  ~12,000,000 SNPs; the ability to predict functional SNPs is limited
Methods to select candidate SNPs (narrow → broad):
  Only functional SNPs in a candidate gene
  Systematic screen of SNPs in a candidate gene
  Systematic screen of SNPs in an entire pathway
  Systematic screen for all coding changes
  Genome-wide screen (GWAS)
10
Introduction to genomic analysis
A genome-wide association study (GWAS) is an approach that involves rapidly scanning markers across the complete sets of DNA, or genomes, of many people to find genetic variations associated with a particular phenotype.
Once associations are identified, researchers can use them to develop better strategies to detect, treat and prevent the disease.
The aim is to find genetic variations that contribute to common, complex diseases, such as asthma, cancer, diabetes, heart disease and mental illnesses.
11
Potential of GWAS
Whole-genome information, when combined with epidemiological, clinical and other phenotype data, offers the potential for:
  increased understanding of basic biological processes affecting human health
  improvement in the prediction of disease and patient care
  the promise of personalized medicine
12
Potential of GWAS
13
How to do GWAS What do you need?
The human genome reference
A map of human genetic variation
A set of technologies that can quickly and accurately analyze whole or partial (exome) samples for genetic variants
  This is typically accomplished using low-coverage genome (or exome) sequencing (4-20X)
A typical GWAS is based on a case-control design in which SNPs are genotyped across a population...
14
How to do GWAS
A typical GWAS is based on a case-control design in which SNPs are genotyped across a population...
  and the strength of association between each SNP and the disease in question is calculated
15
How to do GWAS
A typical GWAS is based on a case-control design in which SNPs are genotyped across a population...
  and the strength of association between each SNP and the disease in question is calculated
Usually visualized via a Manhattan plot, in which SNPs from each chromosome are plotted along with their association value
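As a rough sketch of how the points of a Manhattan plot can be computed: each SNP gets an x coordinate (its position, offset so that chromosomes sit side by side) and a y coordinate, usually -log10 of its association p-value. The function name, chromosome lengths and p-values below are hypothetical placeholders, and the plotting step itself is left out.

```python
import math


def manhattan_points(results, chrom_lengths):
    """Convert (chromosome, position, p_value) records into (x, y) points.

    x is a cumulative genome coordinate so chromosomes are laid end to end;
    y is -log10(p), so stronger associations plot higher.
    """
    offsets = {}
    running = 0
    for chrom, length in chrom_lengths:  # a list, to preserve chromosome order
        offsets[chrom] = running
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in results]


# Hypothetical chromosome lengths and association results (not real data).
chrom_lengths = [("chr1", 1000), ("chr2", 800)]
results = [("chr1", 100, 0.5), ("chr1", 900, 1e-8), ("chr2", 50, 0.01)]

points = manhattan_points(results, chrom_lengths)
for x, y in points:
    print(x, round(y, 2))
```

The resulting points could then be passed to any plotting library, typically with alternating colors per chromosome.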
16
The basic idea
The A allele is associated (4/14, 29%) with individuals exhibiting the disease phenotype
G G G A A G A A A A G G A A A G G A G G G G G G G G
17
Age-related macular degeneration
Study cohort: 2172 unrelated individuals of European descent, at least 60 years old
  1238 with AMD, 934 controls
Each individual harbors two alleles
  2476 AMD alleles
  1868 non-AMD alleles
Null hypothesis: alleles will be randomly distributed in the population, i.e. no association of any allele with AMD
Alternative hypothesis: some allele will be positively associated with AMD
18
Age-related macular degeneration
Single SNP identified by GWAS, rs
4344 alleles recovered, two variants C/T
χ² test suggests association, p = 1.2 × 10^-62

Allele          Cases with AMD   Controls   Total alleles
C               1522             670        2192
T               954              1198       2152
Total alleles   2476             1868       4344
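The test on this slide can be reproduced approximately from the allele counts alone. The sketch below assumes a plain Pearson chi-square test on the 2x2 allele table (1 degree of freedom, no continuity correction). The slide does not say which variant of the test was used, so small differences from the quoted p-value are expected. The function name is hypothetical. For 1 degree of freedom the p-value can be computed with the standard library as erfc(sqrt(chi2 / 2)).

```python
import math


def allele_chi_square(case_a, ctrl_a, case_b, ctrl_b):
    """Pearson chi-square for a 2x2 allele-count table (1 df, no continuity
    correction). Returns (chi2, p_value, allelic odds ratio for allele a)."""
    n = case_a + ctrl_a + case_b + ctrl_b
    row_a, row_b = case_a + ctrl_a, case_b + ctrl_b
    col_case, col_ctrl = case_a + case_b, ctrl_a + ctrl_b
    # Shortcut formula for a 2x2 table: N(ad - bc)^2 / (product of margins).
    chi2 = n * (case_a * ctrl_b - ctrl_a * case_b) ** 2 / (
        row_a * row_b * col_case * col_ctrl
    )
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_a * ctrl_b) / (ctrl_a * case_b)
    return chi2, p_value, odds_ratio


# Allele counts from the table above: C = 1522 cases / 670 controls,
# T = 954 cases / 1198 controls.
chi2, p_value, odds_ratio = allele_chi_square(1522, 670, 954, 1198)
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}, OR = {odds_ratio:.2f}")
```

Under these assumptions the statistic comes out near 279 and the p-value is on the order of 10^-62, the same order of magnitude as the value quoted on the slide. The odds ratio is not on the slide; it is included only as a common companion measure of effect size.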
19
Age-related macular degeneration
bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState= &virtModeType=default&virtMode=0&nonVirtPosition=&position=chr1%3A &hgsid= _LVSGMKr7pmucs4DrZ7CkYbaFVbni ;v=rs ;vdb=variation;vf=762175
20
Complex traits vs. Mendelian traits
Traits for which a molecular cause is known (2002)
Complex trait: any phenotype that does not exhibit classic Mendelian inheritance attributable to a single locus
  Although these traits may exhibit familial tendencies
Why would a trait be non-Mendelian?
  Codominance, incomplete dominance
  Multiple alleles
  Polygenic characteristics
  Environmental effects
21
GWAS in practice (2007)
22
GWAS in practice
23
GWAS in practice
25
GWAS in practice
26
GWAS in practice
27
GWAS in practice (2014)
28
Post-GWAS: Finding the causal locus
GWAS is really just a starting point: it typically narrows the causal region down to a few million or a few hundred thousand bp
SNPs occur frequently
  If the locus is narrowed down to 500,000 bp, that's ~2500 SNPs (about one every 200 bp)
One way to proceed: identify genes in the region and determine plausibility
  Location of SNPs and function of the locus
  Relatively easy in some cases
  Not so much in others (regulatory function?)
Functional annotation databases exist to classify genes according to their roles in the cell
29
Association signals in the IL23R gene region on chromosome 1p31
Association signals in the IL23R gene region on chromosome 1p31. (A) Genomic locations of genes on chromosome 1p31 between 67,260,000 and 67,580,000 base pairs (Build 35). (B) The negative log10 association P-values (Cochran-Mantel-Haenszel chi-square test) from the combined Jewish and non-Jewish case-control cohorts are plotted for genotyped markers in the region.
30
GWAS is promising
Many diseases and traits are influenced by genetic factors
  i.e., they are caused by sequence variants in the genome
Over 12 million SNPs are known in the genome
  i.e., some SNPs will be directly or indirectly associated with causal variants
The cost of SNP genotyping has fallen
  i.e., it is affordable to genotype a large number of SNPs in the genome
Large numbers of cases and controls are available
  i.e., there is statistical power to detect variants with modest effect
31
GWAS is challenging
Many diseases and traits are influenced by genetic factors
  But probably due to multiple modest-risk variants
  They confer a stronger risk when they interact
  Truly associated SNPs are not necessarily highly significant
Too many SNPs are evaluated
  False positives due to multiple tests
Single studies tend to be underpowered
  False negatives
Considerable heterogeneity among studies
  Phenotypic and genetic heterogeneity
  False positives due to population stratification
Xu, 2007
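The multiple-testing problem above can be made concrete with a Bonferroni correction, one common (and conservative) way to control false positives. The slide does not name a specific method, so this is only an illustrative assumption. The function names are hypothetical, and the figure of one million tests is a conventional round number, not a value from the slides.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Return the p-values that survive Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p for p in p_values if p < threshold]


# Assuming roughly one million independent tests gives the widely quoted
# genome-wide significance level of about 5e-8.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(threshold)

# Small made-up example: four tests, so the cutoff is 0.05 / 4 = 0.0125.
print(significant([1e-9, 0.01, 0.04, 0.2]))
```

Because the cutoff shrinks as more SNPs are tested, truly associated SNPs with modest effects can fail to reach it, which is one reason single studies tend to be underpowered.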
33
Components of a GWAS (simple)
34
Components of a GWAS (not so simple)