Rare Mendelian diseases versus common multi-factorial diseases e.g., cystic fibrosis is one of the most common life-shortening childhood-onset inherited.

Presentation transcript:

rare Mendelian diseases versus common multi-factorial diseases
e.g., cystic fibrosis is one of the most common life-shortening, childhood-onset inherited diseases in the United States, affecting 1 in 3900 births; one of every 31 individuals is a carrier of the recessive disease allele
e.g., in the United States, the lifetime risk of developing cancer is slightly less than 1 in 2 for men and slightly more than 1 in 3 for women; although there are subclasses of cancer, like early-onset breast cancer, that obey Mendelian rules, they make up a negligible fraction of the overall disease burden
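
The carrier figure quoted for cystic fibrosis is roughly what Hardy-Weinberg proportions predict from the incidence of 1 in 3900. A minimal Python sketch, assuming random mating and a single class of disease allele (simplifying assumptions that the slide does not state; the function names are illustrative only):

```python
import math

def carrier_frequency(incidence):
    """Carrier frequency 2pq for a recessive disease under Hardy-Weinberg.

    incidence is the frequency of affected births, taken to equal q**2.
    """
    q = math.sqrt(incidence)   # disease allele frequency
    p = 1.0 - q                # normal allele frequency
    return 2.0 * p * q

def carrier_frequency_approx(incidence):
    """Common shortcut 2q, valid when q is small so that p is close to 1."""
    return 2.0 * math.sqrt(incidence)

if __name__ == "__main__":
    incidence = 1 / 3900
    print("1 in", round(1 / carrier_frequency(incidence), 1))
    print("1 in", round(1 / carrier_frequency_approx(incidence), 1))
```

Under these assumptions the exact form 2pq gives about 1 in 32 and the shortcut 2q gives about 1 in 31, in line with the figure on the slide.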

1980: DNA markers are the key to identifying Mendelian disease genes
Botstein D, White RL, Skolnick M, Davis RW. Am J Hum Genet 32:
the fact that we cannot experiment on people does not preclude us from doing genetics; all we need are more DNA markers to differentiate individuals; the markers themselves need not cause the disease; they need only be sufficiently close to the gene that does; markers can take many forms, including RFLPs, microsatellites, SNPs, etc.
figure: (a) disease cosegregates with marker A; (b) disease cosegregates with marker B before a recombination event, and with marker C after it
the power to localize genes depends on having a sufficient number of recombinations

1989: successful cloning of the CFTR gene responsible for cystic fibrosis
Rommens JM, …, Tsui LC, Collins FS. Science 245:
the ΔF508 mutation is one of over 1500 eventually found in this gene, but it is still the most important; this was the first disease gene identified by its chromosomal position instead of by its hypothesized function; Huntington's was mapped before CF, but the unexpected nature of that mutation (a triplet repeat) meant it took longer to solve

Online Mendelian Inheritance in Man currently lists 2284 phenotypes whose molecular basis is known
the success of what came to be known as positional cloning was a tribute to an admission of ignorance; we did not know enough human biology to guess the likely gene for a disease, so we focused instead on determining where the gene was on the chromosome; in the overwhelming majority of cases, the answer turned out to be a completely unknown gene that no scientist had hypothesized

1990-6: birth and death of sib pair analysis for linkage-based studies of common multi-factorial diseases
Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science 273:
linkage analysis had been used successfully to find genes for Mendelian diseases; in 1990, Risch popularized a method (sib pairs) to find genes for complex multi-factorial diseases; that method failed, and in 1996 Risch and Merikangas proposed a different, more powerful method
association studies were to be performed on functional polymorphisms in as many candidate genes as technically feasible, the entire genome if need be, regardless of how impractical that was; at least the number of patients would no longer be a limiting factor

past, present, and (near) future genetic studies of human diseases
the biggest change from before is that we are now doing these studies on the general population instead of rare families
we must study common diseases in the general population because the rare mutations that cause Mendelian subcategories of disease are not typically responsible for those diseases in the general population
the problem, however, is that without families we lose statistical power and must compensate by gathering more data

population bottleneck, subsequent recombination, linkage disequilibrium
although the parameters and details of the human population bottleneck are still not settled, the order-of-magnitude estimates are that our species collapsed to 15,000 individuals 70,000 years ago; assuming few new mutations, the only thing that would have happened since that time is recombination, and we can model any particular individual's genome as a mosaic of segments from these 15,000 ancestral genomes
figure: the mutation is red; chromosomal stretches derived from the common ancestor are yellow; the new stretches due to recombination are blue
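
The mosaic picture can be made roughly quantitative. A minimal Python sketch, assuming a 25-year generation time and a recombination rate of 1e-8 per base pair per generation (about 1 cM per Mb); both values are illustrative assumptions that do not come from the slide, and the function names are hypothetical:

```python
def generations_since(years, years_per_generation=25.0):
    """Number of generations elapsed, for an assumed generation time."""
    return years / years_per_generation

def ld_remaining(recomb_fraction, generations):
    """Fraction of the initial disequilibrium D left after some generations.

    Uses the standard decay D_t = D_0 * (1 - c)**t, where c is the
    recombination fraction between the two loci.
    """
    return (1.0 - recomb_fraction) ** generations

def expected_breakpoint_spacing_bp(generations, recomb_per_bp=1e-8):
    """Mean spacing between breakpoints accumulated along one lineage."""
    return 1.0 / (generations * recomb_per_bp)

if __name__ == "__main__":
    t = generations_since(70_000)
    print("generations:", t)
    print("mean spacing (bp):", round(expected_breakpoint_spacing_bp(t)))
    # Two loci 10 kb apart have c of roughly 10_000 * 1e-8 = 1e-4.
    print("LD remaining at 10 kb:", round(ld_remaining(1e-4, t), 3))
```

With these assumed values the ancestral segments come out at tens of kilobases, which is the scale at which nearby markers remain informative about one another.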

we need not test all the functional polymorphisms, just enough markers to be within linkage disequilibrium
linkage disequilibrium (LD) is the non-random association of alleles at two nearby loci; there are many reasons why this might happen, but for the HapMap, the assumption is that human variation is intrinsically limited because of the recent population bottleneck
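
Two standard summaries of LD between a pair of biallelic loci are the coefficient D and the squared correlation r². A minimal Python sketch computing both from haplotype and allele frequencies (the function name and the example frequencies are illustrative, not taken from the slides):

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b                        # zero if alleles are independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom

if __name__ == "__main__":
    # Perfect LD: allele A is always found together with allele B.
    print(ld_stats(0.5, 0.5, 0.5))
    # No LD: the haplotype frequency equals the product of allele frequencies.
    print(ld_stats(0.25, 0.5, 0.5))
```

An r² near 1 means that genotyping one locus is nearly as informative as genotyping the other, which is the property that marker-based studies rely on.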

common-disease-common-variant versus common-disease-rare-variant
CDCV hypothesis: a few common allelic variants account for most of the genetic variance in disease susceptibility
Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet 17:
CDRV hypothesis: a large number of rare allelic variants account for the genetic variance in disease susceptibility
Terwilliger JD, Weiss KM. Linkage disequilibrium mapping of complex disease: fantasy or reality? Curr Opin Biotechnol 9:
for complex reasons having to do with human population history, linkage disequilibrium mapping would only work in diseases where the CDCV hypothesis is valid; the best justification for the HapMap was that one common variant has more public health impact than many rare variants, so it makes sense to find these first

multiple rare alleles contribute to low plasma HDL cholesterol levels
Cohen JC, …, Hobbs HH. Science 305:
of 128 individuals with low plasma HDL-C, 21 (16%) had variants not present in the high HDL-C group; conversely, only 3 (2%) of the individuals with high plasma HDL-C had variants not present in the low HDL-C group (P < )
figure: HDL-C levels; candidate genes ABCA1, APOA1, LCAT; non-synonymous variants in 3 out of 128 outlier samples (high HDL-C); 15 non-synonymous variants in 21 out of 128 outlier samples (low HDL-C)
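
The 21-of-128 versus 3-of-128 comparison can be checked with a one-sided Fisher exact test on the 2×2 table of carriers and non-carriers. The sketch below is a plain-Python illustration; the choice of test and the function name are assumptions, and the study itself may have used a different procedure:

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """Upper-tail Fisher exact p-value for the table [[a, b], [c, d]].

    Gives the hypergeometric probability, with all margins fixed, of
    seeing a or more carriers in the first row.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, upper + 1))
    return tail / total

if __name__ == "__main__":
    # Rows: low and high HDL-C groups; columns: carriers, non-carriers.
    p = fisher_one_sided(21, 128 - 21, 3, 128 - 3)
    print(f"one-sided p = {p:.2e}")
```

The resulting p-value is well below 0.001, so an excess this large in the low HDL-C group is unlikely to be a sampling accident.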

International HapMap Consortium
International HapMap Consortium. Nature 437:
phase I genotyped at least one common SNP (frequency greater than 0.05) in every 5-kb interval for 269 individuals from 3 population groups in Africa (YRI), Europe (CEU), and Asia (CHB+JPT)
figure: solid line represents ENCODE region data; dashed line represents a neutral model with constant population size, random mating, and no ascertainment bias

7 tag SNPs capture all the common variation in a locus on chromosome 2
left plot shows the 7 haplotypes and their respective counts, with colored circles indicating SNP positions where a haplotype has the less common allele; groups of SNPs captured by a single tag SNP (r² ≥ 0.8) using a pairwise tagging algorithm have the same color
right plot shows the SNPs mapped to a genealogical tree relating the seven haplotypes for this region
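
Pairwise tagging can be illustrated with a simple greedy heuristic: repeatedly pick the SNP that captures the most still-untagged SNPs at r² ≥ 0.8. This is a sketch only; the greedy rule, the function names, and the toy haplotypes are assumptions, and the algorithm used by the HapMap Consortium may differ in detail:

```python
def r_squared(haps, i, j):
    """r^2 between SNP i and SNP j, from haplotypes coded as 0/1 tuples."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(h[i] * h[j] for h in haps) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:              # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ij - p_i * p_j
    return d * d / denom

def greedy_tag_snps(haps, threshold=0.8):
    """Pick tag SNPs until every SNP is captured at r^2 >= threshold."""
    m = len(haps[0])
    # covers[i] is the set of SNPs that SNP i would capture, itself included.
    covers = {i: {j for j in range(m)
                  if i == j or r_squared(haps, i, j) >= threshold}
              for i in range(m)}
    untagged = set(range(m))
    tags = []
    while untagged:
        # Most newly captured SNPs wins; ties go to the lowest index.
        best = max(range(m), key=lambda i: (len(covers[i] & untagged), -i))
        tags.append(best)
        untagged -= covers[best]
    return tags

if __name__ == "__main__":
    # Toy data: 4 haplotypes over 4 SNPs; SNPs 0 and 1 are perfectly correlated.
    haps = [(0, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1)]
    print(greedy_tag_snps(haps))
```

In the toy data SNP 0 stands in for SNP 1, so three tags cover four SNPs; real regions with strong LD give much larger savings.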

Wellcome Trust Case Control Consortium genome-wide association studies
Wellcome Trust Case Control Consortium. Nature 447:
genotyped 500,000 SNPs from HapMap in a British population of 2,000 affected individuals for each of 7 major diseases, with another 3,000 shared individuals as controls
of the 14 variants for which there was strong prior evidence of association with the studied diseases, all but two (APOE and INS) were reproduced by this analysis

genome-wide scan in seven diseases
y-axis represents statistical significance as -log10 of the p-value; the chromosomes are shown in alternating colors; significant SNPs with p-value < 1 × 10⁻⁵ are in green
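
The -log10 scale turns very small p-values into plottable heights, and the number of SNPs tested determines how small a p-value must be to stand out. A minimal Python sketch; the Bonferroni correction and the 0.05 family-wise level are illustrative assumptions, not the criterion used in the study:

```python
import math

def neg_log10(p_value):
    """Height of a SNP on a genome-wide scan plot."""
    return -math.log10(p_value)

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

if __name__ == "__main__":
    cutoff = bonferroni_threshold(0.05, 500_000)
    print("per-SNP cutoff:", cutoff)
    print("height on plot:", round(neg_log10(cutoff), 2))
```

With 500,000 SNPs a Bonferroni cutoff sits near 1 × 10⁻⁷, or about 7 on the -log10 axis, which is why genome-wide scans demand far stronger evidence than a single-marker test.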

a doubling in relative risk for a disease is not as bad as it sounds
Couzin J, Kaiser J. Science 316:
with the notable exception of macular degeneration, there is at most a doubling in relative risk; the 120% increase in relative risk for inflammatory bowel disease only raises the absolute risk from 0.5% to 1.1%
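
The inflammatory bowel disease figures follow from multiplying the baseline risk by the relative risk. A minimal Python sketch of that arithmetic (the function names are illustrative):

```python
def absolute_risk(baseline_risk, relative_increase_pct):
    """Absolute risk in carriers, given baseline risk and % increase in relative risk."""
    return baseline_risk * (1 + relative_increase_pct / 100)

def extra_cases_per_1000(baseline_risk, relative_increase_pct):
    """Additional affected people per 1000 carriers compared with baseline."""
    gain = absolute_risk(baseline_risk, relative_increase_pct) - baseline_risk
    return gain * 1000

if __name__ == "__main__":
    risk = absolute_risk(0.005, 120)      # 0.5% baseline, +120% relative risk
    print(f"absolute risk: {risk:.1%}")
    print("extra cases per 1000:", round(extra_cases_per_1000(0.005, 120), 1))
```

A 120% relative increase on a 0.5% baseline therefore means about 6 extra cases per 1000 carriers, which is why a large relative risk can still translate into a small absolute one.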

but gene therapy remains elusive 19 years after the cystic fibrosis gene
Jesse Gelsinger (June 18, 1981 to September 17, 1999) was the first person identified as having died in a clinical trial for gene therapy. He was only 18 years old. Gelsinger suffered from ornithine transcarbamylase (OTC) deficiency, a disease of the liver whose victims are unable to metabolize ammonia, a byproduct of protein breakdown. Gelsinger was injected with adenoviruses containing the corrected gene in the hope that it would manufacture the much-needed enzyme. He died four days later, having suffered a massive immune response triggered by the viral vector used to transport the gene into his cells. This led to multiple organ failure and brain death.
Food and Drug Administration investigators concluded that scientists involved in the trial, including lead researcher Dr. James M. Wilson (University of Pennsylvania), broke several rules of conduct: (1) inclusion of Gelsinger as a substitute for another volunteer who had dropped out, despite his having high ammonia levels that should have led to his exclusion from the trial; (2) failure by the university to report that 2 other patients had experienced serious side effects from the therapy; (3) failure to mention the deaths of monkeys given a similar treatment, as should have been done for the informed consent. The university paid the parents an undisclosed amount.

Alleles with small effect sizes: To separate true signals from noise, researchers have to set an exceptionally high threshold that a marker needs to exceed before it is acceptable as a likely disease-causing candidate. By increasing the numbers of samples in their disease and control groups, researchers will steadily dial down the statistical noise until even disease genes with small effects stand out above the crowd. However, the logistical challenge of collecting a large number of carefully ascertained patients will always be a serious obstacle.
Rare variants: Our catalogue of human genetic variation, i.e. the HapMap, is largely restricted to common variants, since rare variants are much harder to identify. The instrumentation has restrictions on how many different SNPs one can analyze with a single chip. Everyone agrees that some non-trivial fraction of the genetic risk of common diseases will be the result of rare variants, especially as the latest results in a variety of diseases failed to provide unambiguous support for the CDCV hypothesis. The problem is not so much the cost of sequencing itself, as that is plummeting due to massive investment in rapid sequencing technologies, but rather the interpretation of the resultant data.
Population differences: Markers that are associated with disease in one population can never be assumed to show the same associations in other human groups. This will be especially true for rare variants. The more difficult challenge will be in collecting large numbers of ancestry-homogeneous samples of validated disease patients and healthy controls.

Epistatic interactions: Most genetic approaches assume that genetic risk is additive, in other words, that the presence of two risk factors in an individual will increase risk by the sum of the two factors. There is however no reason to expect that this will always be the case. Epistatic interactions, in which combined risk is greater (or less) than the sum of the risk from individual genes, are difficult to identify by genome scans and even harder to untangle.
Copy number variation: CNVs are now known to account for a substantial fraction of human genetic variation, and have been shown to play an important role in gene expression variation and in human evolution. It seems highly likely that CNVs will be responsible for a non-trivial proportion of disease risk, but only time will tell.
Epigenetic inheritance: Although epigenetic inheritance does occur, the degree to which it influences human physical variation and disease risk is essentially unknown. It needs to be established that epigenetically inherited variations actually contribute to a non-trivial fraction of human disease risk before we can include them in our systematic scans.
Disease heterogeneity: Lumping patients with fundamentally different conditions into a single patient cohort is a recipe for failure, even if there are strong genetic risk factors for each of the separate conditions, since each will be drowned out by noise from the other. Geneticists cannot fix this problem alone. It will take a combined effort of clinicians and biomedical researchers to stratify the disease into useful subcategories.