Disease Genomics. What is genomics? Looking at the properties of the genome as a whole – “seeing the wood for the trees”; identifying patterns by considering many data points at once.

Presentation transcript:

Disease Genomics

What is genomics? Looking at the properties of the genome as a whole – “seeing the wood for the trees”; identifying patterns by considering many data points at once. – Examining large-scale properties requires a model of what is expected just by chance, the null hypothesis.

What is disease genomics?
OED definition of disease: a condition of the body, or of some part or organ of the body, in which its functions are disturbed or deranged.
Disease genomics takes a whole-genome view of genetic disorders so that we can:
– Identify the underlying genetic determinants
– Gain insights into the pathoetiology of the disease
– Select the appropriate treatment
– Prevent disease

Preventive Medicine
Empower people to make the appropriate life-style choices
– 23andMe, Coriell Study
Treat the cause of the disease rather than the symptoms
– E.g. peptic ulcers
“All medicine may become pediatrics” (Paul Wise, Professor of Pediatrics, Stanford Medical School, 2008)
Effects of environment, accidents, aging, penetrance …
– Somatic change: understanding how the genome changes over a lifetime
– Cancer
Health care costs can be greatly reduced if we
– Invest in preventive medicine
– Target the cause of disease rather than symptoms

23andMe © 23andMe 2009

23andMe Spittoon

23andMe Research Reports

Human genetic variation
Substitutions
  ACTGACTGACTGACTGACTG
  ACTGACTGGCTGACTGACTG
– Single Nucleotide Polymorphisms (SNPs): base pair substitutions found in >1% of the population
Insertions/deletions (indels)
  ACTGACTGACTGACTGACTG
  ACTGACTGACTGACTGACTGACTG
– Copy Number Variants (CNVs): indels > 1 kb in size

Human genetic variation
Variation can have an effect on function
– Non-synonymous substitutions can change the amino acid encoded by a codon or give rise to premature stop codons
– Indels can cause frame-shifts
– Mutations may affect splice sites or regulatory sequence outside of genes or within introns

How much genetic variation does an individual possess?
1000 Genomes Project: A map of human genome variation from population-scale sequencing, Nature 467:1061–1073
Compared to the human genome reference sequence, which is itself constructed from 13 individuals

Penetrance of genetic variants
Highly penetrant Mendelian single gene diseases
– Huntington’s Disease is caused by excess CAG repeats in the huntingtin gene
– Autosomal dominant, 100% penetrant, invariably lethal
Reduced penetrance: some genes lead to a predisposition to a disease
– BRCA1 & BRCA2 genes can lead to familial breast or ovarian cancer
– Disease alleles lead to an 80% overall lifetime chance of a cancer, but 20% of patients with the rare defective genes show no cancers
Complex diseases requiring alleles in multiple genes
– Many cancers (solid tumors) require somatic mutations that induce cell proliferation, mutations that inhibit apoptosis, mutations that induce angiogenesis, and mutations that cause metastasis
– Cancers are also influenced by environment (smoking, carcinogens, exposure to UV)
– Atherosclerosis (obesity, genetic and nutritional cholesterol)
Some complex diseases have multiple causes
– Genetic vs. spontaneous vs. environment vs. behavior
Some complex diseases can be caused by multiple pathways
– Type 2 Diabetes can be caused by reduced beta-cells in pancreas, reduced production of insulin, reduced sensitivity to insulin (insulin resistance) as well as environmental conditions (obesity, sedentary lifestyle, smoking etc.).

Adapted from Nature 461, (2009) The search for disease-causing variants

Inheritance models

Healthy Disease

Identifying the genetic causes of highly penetrant disorders
– de novo mutations
– Mendelian disorders

de novo mutations
Humans have an exceptionally high mutation rate of between 7.6 × 10^-9 and 2.2 × 10^-8 per bp per generation
An average newborn is calculated to have acquired 50 to 100 new mutations in their genome
– of which about 0.86 are novel non-synonymous mutations
The high frequency of de novo mutations may explain the high frequency of disorders that cause reduced fecundity.
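The link between the per-bp rate and the per-newborn count is a single multiplication, sketched below. The genome size is an assumption that does not appear on the slide: a diploid genome of roughly 6.4 × 10^9 bp. Under that assumption the quoted rates give a range of the same order as the 50 to 100 new mutations stated above.

```python
# Expected de novo mutations per newborn = per-bp rate x number of bases inherited.
# ASSUMPTION (not from the slide): diploid genome of ~6.4e9 bp (2 x ~3.2e9 bp).
DIPLOID_GENOME_BP = 6.4e9


def expected_de_novo(rate_per_bp: float, genome_bp: float = DIPLOID_GENOME_BP) -> float:
    """Expected number of new mutations for a given per-bp, per-generation rate."""
    return rate_per_bp * genome_bp


low = expected_de_novo(7.6e-9)   # lower bound of the quoted rate
high = expected_de_novo(2.2e-8)  # upper bound of the quoted rate
print(f"roughly {low:.0f} to {high:.0f} new mutations per newborn")
```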

Look at the epidemiology of the disease for clues
[Table; cell values not preserved in this transcript. Columns: Prevalence (%), Age onset, Mortality, Fertility, Heritability, Paternal age effect. Rows: Autism, Anorexia nervosa, Schizophrenia, Bipolar affective disorder, Unipolar depression, Anxiety disorders.]
Source: Uher, R. The role of genetic variation in the causation of mental illness: an evolution-informed framework. Molecular Psychiatry (2009) Dec;14(12).

How do we identify the de novo mutation responsible?
1000 Genomes Project: A map of human genome variation from population-scale sequencing, Nature 467:1061–1073
Compared to the human genome reference sequence, which is itself constructed from 13 individuals

Identifying a causative de novo mutation
Patient with idiopathic disorder (Veltman and colleagues, Nat Genet Dec;42(12))
(1) Sequence genome: ~22,000 variants (exome re-sequencing)
(2) Select only coding mutations: ~5,640 coding variants
(3) Exclude known variants seen in healthy people: ~143 novel coding variants
(4) Sequence parents and exclude their private variants: ~5 de novo novel coding variants
(5) Look at affected gene function and mutational impact (e.g. MSGTCASTTR → MSGTNASTTR)
For 6/9 patients, they were able to identify a single likely-causative mutation
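The filtering funnel in steps (2) to (4) can be sketched as three successive list filters. This is only an illustration under stated assumptions: the `Variant` record, its field names and the toy variants are hypothetical, and a real pipeline would derive these flags from annotation databases and parental sequence data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; field names are illustrative, not from the study.
    name: str
    is_coding: bool        # step 2: does it change coding sequence?
    seen_in_healthy: bool  # step 3: already known from healthy people?
    in_parent: bool        # step 4: also present in either parent?


def filter_de_novo(variants):
    """Apply steps 2-4 in order and return the list left after each step."""
    coding = [v for v in variants if v.is_coding]
    novel = [v for v in coding if not v.seen_in_healthy]
    de_novo = [v for v in novel if not v.in_parent]
    return coding, novel, de_novo


# Toy input: four made-up variants from one patient.
toy_variants = [
    Variant("v1", is_coding=False, seen_in_healthy=False, in_parent=False),
    Variant("v2", is_coding=True, seen_in_healthy=True, in_parent=True),
    Variant("v3", is_coding=True, seen_in_healthy=False, in_parent=True),
    Variant("v4", is_coding=True, seen_in_healthy=False, in_parent=False),
]

coding, novel, de_novo = filter_de_novo(toy_variants)
print(len(toy_variants), len(coding), len(novel), len(de_novo))
print([v.name for v in de_novo])
```

Step (5), judging gene function and mutational impact, is left out because it needs expert review rather than a boolean filter.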

Mendelian disease
Definition: diseases in which the phenotypes are largely determined by the action, or lack of action, of mutations at individual loci.
Rare: about 1% of all live-born individuals
4 types of inheritance:
– Autosomal dominant
– Autosomal recessive
– X-linked dominant
– X-linked recessive

Mendelian disease

Definitions
SNP (“Single Nucleotide Polymorphism”): a mutation found in >1% of the population that produces a single base pair change in the DNA sequence
Locus: location on the genome
Alleles: alternate forms of a SNP
Genotype: both alleles at a locus form a genotype
Haplotype: the pattern of alleles on a chromosome
Genetic association: correlation between alleles/genotype/haplotype and a phenotype of interest

Single Nucleotide Polymorphisms (SNPs)

Recombination
[Figure: gamete-producing cells with haplotypes AX and ax; after recombination, gametes aX and Ax]
X/x: unobserved causative mutation
A/a: distant marker
B/b: linked marker

Linkage Disequilibrium & Allelic Association
Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles. This is linkage disequilibrium (LD).
It is important for allelic association because it means we don’t need to assess the exact aetiological variant: we can see a trait-SNP association with a neighbouring variant that is in LD with it.
[Figure: markers 1, 2, 3, … n along a chromosome, showing LD between a marker and the disease variant D]
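The "non-zero correlation between the alleles" is usually quantified with the disequilibrium coefficient D and the squared correlation r². The sketch below computes both from a list of two-locus haplotypes. The allele labels and the toy haplotype counts are made up for illustration.

```python
from collections import Counter


def ld_stats(haplotypes, a="A", b="B"):
    """Return (D, r^2) for two biallelic loci.

    Each haplotype is a 2-character string: the allele at locus 1 followed by
    the allele at locus 2. D = p_AB - p_A * p_B. Both loci must be polymorphic,
    otherwise r^2 is undefined (division by zero).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0] == a) / n
    p_b = sum(c for h, c in counts.items() if h[1] == b) / n
    p_ab = counts[a + b] / n
    d = p_ab - p_a * p_b
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Toy sample of 10 chromosomes: alleles A and B mostly travel together.
toy_haplotypes = ["AB"] * 4 + ["ab"] * 4 + ["Ab", "aB"]
d, r2 = ld_stats(toy_haplotypes)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")
```

An r² near 1 means a genotyped marker is a good proxy for an untyped neighbour, which is the property association studies rely on.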

SNPs can be used to track the segregation of regions of DNA
[Figure: aligned DNA sequences from several individuals (Individual 1 to Individual 7 and others) followed over time. The sequences differ at SNPs in Locus 1 and Locus 2; with more time and recombination, alleles at the two loci appear in new combinations.]

SNPs can be used to associate regions of DNA with a trait (disease)
Locus 1     Case   Control
C allele    0      5
T allele    3      2
Locus 2     Case   Control
A allele    2      3
G allele    1      4

Genetic Case Control Study
[Figure: genotypes of cases and controls: C/G, T/G, T/A, C/A, T/A, T/G, C/A, T/G, C/G, C/A]
Allele T is ‘associated’ with disease

Measures of Association: The Odds Ratio
Odds are related to probability: odds = p/(1-p)
– If probability of horse winning race is 50%, odds are 1/1
– If probability of horse winning race is 25%, odds are 1/3 for win or 3 to 1 against win
If probability of exposed person getting disease is 25%, odds = p/(1-p) = 25/75 = 1/3
We can calculate an odds ratio = cross-product ratio (“ad/bc”)

Odds ratio example: Association of a SNP with the occurrence of Myocardial Infarction
                          Disease present   Disease absent
Variant allele present    813               3,061
Variant allele absent     794               3,667
Total                     1,607             6,728
OR = odds in exposed / odds in unexposed = (813 / 3,061) / (794 / 3,667) = (813 × 3,667) / (794 × 3,061) = 1.23
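The odds and cross-product calculations can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and `Fraction` is used only so that the horse-race odds come out as exact fractions.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1 - p)


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc for a 2x2 table.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    return (a * d) / (b * c)


print(odds(Fraction(1, 2)))  # probability 50%
print(odds(Fraction(1, 4)))  # probability 25%

# Myocardial infarction table: variant allele present vs absent.
mi_or = odds_ratio(813, 3061, 794, 3667)
print(round(mi_or, 2))
```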

Family-based Linkage Analysis
[Figure: pedigree with genotypes a/A, A/A and a/a. Legend: Healthy; Disease; non-viable so not observed]
Where is ???

Family Based Tests of Association
Testing for Transmission Disequilibrium
Related individuals are from the same family
We assume we’re tracking the same causative mutation within the family

Example

Log of the Odds (LOD) score used to define disease locus

Problems
Difficult to gather large enough families to get power for testing
Recombination events near disease locus may be rare
Resolution often 1-10 Mb
Difficult to get parents for late onset / psychiatric conditions

Genome-wide Association Studies (GWAS)
Looking for the segregation of disease (case/control) with particular genotypes across a whole population
A lot of recombination within the population, so you can very finely map loci
Based on the common-disease, common-variant hypothesis
– Only makes sense for moderate effect sizes (odds ratio < 1.5)

Technology makes it feasible
– Affymetrix: 500K; 1M chip arrived (randomly distributed SNPs)
– Illumina: 550K chip (gene-based)
GWAS
– Good for moderate effect sizes (odds ratio < 1.5)
– Particularly useful in finding genetic variations that contribute to common, complex diseases

Whole Genome Association
Scan entire genome: 500,000s of SNPs
Identify local regions of interest; examine genes, SNP density, regulatory regions, etc.
Replicate the finding

Common disease common variant (CDCV) hypothesis

QQ-plots Log QQ plot

Tests of association
Genotype coding, counted separately in cases and controls: major allele homozygote (0), heterozygote (1), minor allele homozygote (2)
Genotypic test: treat genotype as a factor with 3 levels and perform a 2x3 chi-square test
Trend test (Cochran-Armitage): assumes an additive effect of the allele
– Loses power if additive assumption not true
Allelic test: count alleles rather than individuals and perform a 2x2 chi-square test
– Out of favour because it is sensitive to deviation from Hardy-Weinberg equilibrium (HWE) and its risk estimates are not interpretable
Logistic regression
– Easily incorporates inheritance model (additive, dominant, etc.)
– Can be used to model multiple loci
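As an illustration of one of these tests, here is a minimal standard-library sketch of the Cochran-Armitage trend test with additive weights 0, 1, 2 matching the genotype coding above. The genotype counts are invented for the example, and the p-value uses the chi-square distribution with 1 degree of freedom, which is a large-sample approximation.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2x3 genotype table.

    cases / controls: counts of the three genotypes, ordered as major
    homozygote, heterozygote, minor homozygote.
    Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = sum(controls)
    weighted_cases = sum(w * r for w, r in zip(weights, cases))
    weighted_totals = sum(w * m for w, m in zip(weights, totals))
    weighted_sq_totals = sum(w * w * m for w, m in zip(weights, totals))
    # Trend statistic and its variance term (see any categorical-data text).
    t_stat = n * weighted_cases - n_cases * weighted_totals
    spread = n * weighted_sq_totals - weighted_totals ** 2
    chi2 = n * t_stat ** 2 / (n_cases * n_controls * spread)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: the minor allele is enriched among cases.
chi2, p_value = cochran_armitage_trend(cases=[20, 50, 30], controls=[40, 45, 15])
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

Changing `weights` to (0, 1, 1) or (0, 0, 1) would test a dominant or recessive model instead, which is the sense in which the additive version loses power when the additive assumption does not hold.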

Genome-Wide Scan for Type 2 Diabetes in a Scandinavian Cohort

HapMap
Rationale: there are ~10 million common SNPs in the human genome
– We can’t afford to genotype them all in each association study
– But maybe we can genotype them once to catalogue the redundancies and use a smaller set of ‘tag’ SNPs in each association study
Samples
– Four populations, 270 individuals total
Genotyping
– 5 kb initial density across genome (600K SNPs)
– Second phase to ~1 kb across genome (4 million)
– All data in public domain
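One simple way to picture how ‘tag’ SNPs exploit redundancy is a greedy selection over a pairwise r² matrix, sketched below. This is not the HapMap project's own procedure; the r² ≥ 0.8 cut-off is a commonly used convention assumed here, and the matrix is a toy example with two LD blocks.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    r2 is a symmetric matrix (list of lists) with 1.0 on the diagonal, so
    every SNP at least tags itself and the loop always makes progress.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the SNP covering the most untagged SNPs (ties: lowest index).
        best = max(
            range(n),
            key=lambda i: (sum(1 for j in untagged if r2[i][j] >= threshold), -i),
        )
        tags.append(best)
        untagged -= {j for j in untagged if r2[best][j] >= threshold}
    return tags


# Toy r^2 matrix: SNPs 0-2 form one LD block, SNPs 3-4 another.
toy_r2 = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.85, 1.0],
]
tags = greedy_tag_snps(toy_r2)
print(tags)
```

In this toy case two genotyped SNPs stand in for all five, which is the saving the rationale above describes.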

Haplotypes Nature Genetics 37, (2005)

Published Genome-Wide Associations through 12/2009: 658 published GWA at p < 5 × 10^-8 (NHGRI GWA Catalog)
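The 5 × 10^-8 cut-off is the usual genome-wide significance level. One common way to motivate it, assumed here rather than stated on the slide, is a Bonferroni correction of a 0.05 error rate over roughly one million independent tests:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# ASSUMPTION: about one million independent common-variant tests genome-wide.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.0e}")
```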

Population Stratification can be a problem
Imagine a sample of individuals drawn from a population consisting of two distinct subgroups which differ in allele frequency.
If the prevalence of disease is greater in one sub-population, then this group will be over-represented amongst the cases.
Any marker which is also of higher frequency in that subgroup will appear to be associated with the disease.
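A small numerical sketch makes the confounding concrete. All numbers are invented: subgroup 1 supplies most of the cases and also carries the marker allele at a higher frequency, while within each subgroup the allele frequency is identical in cases and controls, so there is no true association. Expected allele counts are used instead of random sampling to keep the result deterministic.

```python
def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio from counts of the marker (alt) and other (ref) allele."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


# Hypothetical subgroups: (marker allele frequency, case chromosomes, control chromosomes).
subgroups = {
    "subgroup 1": (0.6, 800, 200),  # higher disease prevalence, higher allele frequency
    "subgroup 2": (0.2, 200, 800),
}

pooled = [0.0, 0.0, 0.0, 0.0]
stratum_or = {}
for name, (freq, n_case, n_ctrl) in subgroups.items():
    # Expected counts: same allele frequency in cases and controls of a subgroup.
    cells = (n_case * freq, n_case * (1 - freq), n_ctrl * freq, n_ctrl * (1 - freq))
    stratum_or[name] = odds_ratio(*cells)
    pooled = [total + cell for total, cell in zip(pooled, cells)]

pooled_or = odds_ratio(*pooled)
print({name: round(value, 2) for name, value in stratum_or.items()})
print(round(pooled_or, 2))
```

The odds ratio is 1 inside each subgroup but well above 1 once the subgroups are pooled, so the apparent association comes entirely from the mixture.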

Traditional Issues Persist
Allelic heterogeneity
– When multiple disease variants exist at the same gene, a single marker may not capture them well enough.
– Haplotype-based association analysis is good theoretically, but it hasn’t shown its advantage in practice.
Locus heterogeneity
– Multiple genes may influence the disease risk independently. As a result, for any single gene, a fraction of the cases may be no different from the controls.
Effect modification (a.k.a. interaction) between two genes may exist with weak/no marginal effects.
– It is unknown how often this happens in reality. But when this happens, analyses that only look at marginal effects won’t be useful.
– Detecting interaction effects with reasonable power often requires a larger sample size than detecting marginal effects.

Localization
Linkage analysis yields broad chromosome regions harbouring many genes
– Resolution comes from recombination events (meioses) in families assessed
– ‘Good’ in terms of needing few markers, ‘poor’ in terms of finding specific variants involved
Association analysis yields fine-scale resolution of genetic variants
– Resolution comes from ancestral recombination events
– ‘Good’ in terms of finding specific variants, ‘poor’ in terms of needing many markers

Linkage vs Association
Linkage
1. Family-based
2. Matching/ethnicity generally unimportant
3. Few markers for genome coverage (microsatellites)
4. Can be weak design
5. Good for initial detection; poor for fine-mapping
6. Powerful for rare variants
Association
1. Families or unrelateds
2. Matching/ethnicity crucial
3. Many markers required for genome coverage (10^5 – 10^6 SNPs)
4. Powerful design
5. OK for initial detection; good for fine-mapping
6. Powerful for common variants; rare variants generally impossible