MALD Mapping by Admixture Linkage Disequilibrium.

Introduction. Admixture: a genetic mix of two or more different populations (for example, African-Americans). Linkage disequilibrium: a non-random association between alleles at two different loci.

Why do we need MALD? Problem: We want to find the cause of a complex genetic disorder or disease. The disease is the result of several different genes, each of which has a small effect (or none) on its own, but which together cause the disease phenotype.

Why do we need MALD? Solution 1 (failed): Use linkage mapping methods in order to find the cause. Problem: Linkage mapping will only find the allele with the biggest influence on the disease, which is not necessarily its cause. Individuals with the allele may be healthy, and individuals without it may be sick.

Why do we need MALD? Solution 2 (not feasible): Use genome-wide association studies to find the causative alleles in the affected individuals. Problem: Genome-wide association studies require genotyping markers across the entire genome for each of the thousands of individuals in the study; a single study would cost several million dollars.

??? We need a model that enables us to infer information about the entire genome of an individual without having to genotype the entire genome and without having to use thousands of people…

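MALD – some basic concepts. Linkage disequilibrium (LD) refers to a non-random association between 2 alleles at different locations in the genome. LD does not by itself imply that the loci are physically linked: besides close linkage (little recombination between the loci), it can result from other effects such as selection (including epistasis), genetic drift, or admixture.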

Linkage Disequilibrium. Linkage disequilibrium is usually measured by the covariance between the alleles at the two loci: $D = h_{12} - p_1 p_2$. Here $p_1$, $p_2$ denote the marginal allele frequencies at the two loci and $h_{12}$ denotes the haplotype frequency in the joint distribution of both alleles.

Linkage Disequilibrium – cont. Normal frequencies

Linkage Disequilibrium – cont. Linkage equilibrium

Linkage Disequilibrium – cont.

Linkage Disequilibrium – cont. Using the following notation:

Linkage Disequilibrium – cont. D_AB is hard to interpret: its sign is arbitrary, and its range depends on the allele frequencies.

Linkage Disequilibrium – cont. D'_AB is a scaled version of D_AB. Better estimates exist, which will not be discussed here.
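
The two measures can be illustrated with a short Python sketch. It assumes the raw coefficient is the haplotype frequency minus the product of the allele frequencies, and it uses Lewontin's normalisation for the scaled version (D divided by the largest magnitude it could take for the given allele frequencies). The function names `ld_d` and `ld_d_prime` are hypothetical and chosen only for this example.

```python
def ld_d(h_ab: float, p_a: float, p_b: float) -> float:
    """Raw LD coefficient: haplotype frequency minus the product of allele frequencies."""
    return h_ab - p_a * p_b


def ld_d_prime(h_ab: float, p_a: float, p_b: float) -> float:
    """Scaled LD (Lewontin's D'): D divided by its maximum possible magnitude.

    The result lies in [-1, 1] and keeps the sign of D.
    """
    d = ld_d(h_ab, p_a, p_b)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # If an allele is fixed (frequency 0 or 1), no LD is possible.
    return 0.0 if d_max == 0 else d / d_max


# Illustrative values only: both alleles at frequency 0.5, haplotype AB at 0.4.
print(ld_d(0.4, 0.5, 0.5))        # about 0.15
print(ld_d_prime(0.4, 0.5, 0.5))  # about 0.6
```

The same D of about 0.15 would give a scaled value of 1 if the allele frequencies were 0.3 and 0.5 with a haplotype frequency of 0.3, which is why the scaled version is easier to compare across marker pairs.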

Why does linkage equilibrium hold for most loci? Generation t, initial configuration:

Why does linkage equilibrium hold for most loci? Generation t+1, without recombination:

Why does linkage equilibrium hold for most loci? Generation t+1, with recombination:

Why does linkage equilibrium hold for most loci? Generation t+1, Overall:

Why does linkage equilibrium hold for most loci? r = the probability of recombination
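
A standard textbook form of this argument is sketched below, assuming random mating, a large population, and allele frequencies $p_A$, $p_B$ that do not change between generations. The symbols $h_{AB}(t)$ (frequency of the AB haplotype in generation $t$) and $D_{AB}(t)$ are notation chosen for this sketch. Without recombination (probability $1-r$) the haplotype is passed on intact; with recombination (probability $r$) the two alleles come from different parental haplotypes and so are drawn independently.

```latex
\begin{align*}
h_{AB}(t+1) &= (1-r)\,h_{AB}(t) + r\,p_A\,p_B \\
D_{AB}(t+1) &= h_{AB}(t+1) - p_A\,p_B = (1-r)\,D_{AB}(t) \\
D_{AB}(t)   &= (1-r)^{t}\,D_{AB}(0)
\end{align*}
```

Under these assumptions LD shrinks by a factor of $1-r$ every generation, so it vanishes quickly for loosely linked loci and persists only between loci that rarely recombine.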

Admixture Linkage Disequilibrium. In an admixed group, each individual's genome is a mix of segments from both parental populations, and alleles in the mix can be traced back to their original parental populations. The LD that admixture creates across these segments is called admixture linkage disequilibrium (ALD).
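
A minimal Python sketch of how mixing alone creates LD, assuming two parental populations that are each in linkage equilibrium and a single mixing event. The function names and the example frequencies are hypothetical. For this model the result equals m(1 - m) times the product of the allele-frequency differences at the two loci, so markers with no frequency difference between the parental populations carry no ALD.

```python
def admixture_ld(m: float, p_a1: float, p_b1: float, p_a2: float, p_b2: float) -> float:
    """LD between alleles A and B right after mixing two populations.

    m is the fraction of the mixed gene pool contributed by population 1;
    p_a1, p_b1 are the allele frequencies in population 1, p_a2, p_b2 in population 2.
    Both parental populations are assumed to be in linkage equilibrium.
    """
    # Haplotype frequency in the mixture: a weighted average of the parental values.
    h_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    # Allele frequencies in the mixture.
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return h_ab - p_a * p_b


def ld_after(d0: float, r: float, generations: int) -> float:
    """LD remaining after some generations of random mating with recombination rate r."""
    return d0 * (1 - r) ** generations


# Illustrative numbers only: 20% contribution from population 1,
# with large frequency differences at both loci.
d0 = admixture_ld(0.2, 0.9, 0.8, 0.1, 0.2)
print(d0)                       # about 0.0768
print(ld_after(d0, 0.01, 10))   # somewhat smaller after 10 generations
```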

MALD – basic concepts cont. MALD relies on the differences in allele frequency between the parental populations. Using these differences allows us to focus on regions of the genome rather than on specific genes.

Admixture. Admixture is the result of a mixture of 2 or more populations. African-Americans: 80% African, 20% European. Latinos: 50% Native American, 50% European. Caribbean: 50% European, 30% Native American, 20% Western African.

MALD. MALD uses the allele frequencies of regions in the genome that exhibit LD to trace those regions to a parental population. When a genetic disease has a higher frequency in one of the parental populations, affected individuals from the admixed population are expected to carry an excess of that population's ancestry around the disease locus. This variation can be detected, and the disease locus found.

MALD. MALD can be separated into 5 steps: Choosing a cohort of people affected by the disease from an admixed population.

Choosing the group. The group needs to be at least 2nd-generation admixed in order to rule out data resulting from recombination.* A set of markers that identifies the origin of the alleles must be available. The individuals need to be at least 10% admixed.

Conservation of ALD

Which disease is suitable for MALD? The disease must have a large difference in frequency between the parental populations (~60%). The disease should be complex (otherwise we can use linkage mapping).

MALD. The selected group of people is genotyped with a set of polymorphic markers.

Markers used for MALD. Markers must be evenly spaced and sufficiently dense (at most 1.5 cM apart). Markers must be able to differentiate between alleles from the parental populations. Markers should not show LD within the parental populations. Markers must have high Shannon Information Content (SIC).

Markers used for MALD. The number and spacing of the markers required depend on the amount of admixture. More admixture (as in Latinos) means more fragmented LD segments, which means more markers are needed in order to find the origin of each locus.
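
The spacing requirement can be checked mechanically. The sketch below assumes marker positions are available as genetic-map positions in cM on a single chromosome; the function names are hypothetical, and the 1.5 cM default is the limit quoted for the marker set above.

```python
def spacing_gaps(positions_cm: list[float]) -> list[float]:
    """Distances in cM between neighbouring markers, after sorting by position."""
    ordered = sorted(positions_cm)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def meets_spacing(positions_cm: list[float], max_gap_cm: float = 1.5) -> bool:
    """True if no two neighbouring markers are further apart than max_gap_cm."""
    return all(gap <= max_gap_cm for gap in spacing_gaps(positions_cm))


# Illustrative positions only.
print(meets_spacing([0.0, 1.0, 2.5, 3.0]))  # gaps of 1.0, 1.5 and 0.5 cM
print(meets_spacing([0.0, 2.0]))            # a single 2.0 cM gap
```

For a more heavily admixed population the same check would be run with a smaller `max_gap_cm`, since the ancestry segments are shorter.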

Shannon Information Content. A measurement used in information theory. The SIC of a marker is the amount of information gained from using this marker. SIC is a much better measure of the quality of a marker than simple LD (D), since it also takes into consideration the amount of information gained by not finding the marker allele.

Shannon Information Content. A SIC value of 0.035 for a marker is considered sufficient for MALD.
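
One way to make the idea concrete is the mutual information between the ancestry of a chromosome and the allele observed at the marker. This is an assumption made for illustration: the exact definition and logarithm base behind the 0.035 threshold are not given here, so values from this sketch should not be compared directly with that number. The function name `marker_information` and its parameters are hypothetical.

```python
import math


def _entropy(p: float) -> float:
    """Entropy (in nats) of a two-allele marker with allele frequency p."""
    # Terms with zero probability contribute nothing.
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def marker_information(m: float, p1: float, p2: float) -> float:
    """Mutual information between ancestry and allele at one biallelic marker.

    m  : fraction of ancestry from parental population 1 (assumed known)
    p1 : frequency of the reference allele in population 1
    p2 : frequency of the reference allele in population 2
    Both alleles contribute, so seeing the non-reference allele is informative too.
    """
    p_mix = m * p1 + (1 - m) * p2
    # Uncertainty about the allele overall, minus uncertainty once ancestry is known.
    return _entropy(p_mix) - (m * _entropy(p1) + (1 - m) * _entropy(p2))


# Illustrative values only.
print(marker_information(0.2, 0.9, 0.1))  # large frequency difference: informative
print(marker_information(0.2, 0.3, 0.3))  # no frequency difference: essentially zero
```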

MALD. The patchwork of ancestral chromosome segments is assessed for every individual. Chromosome regions that have an elevated frequency of the ancestry with the higher disease incidence are identified. The cause at each locus is identified.
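
The second step, looking for regions with elevated ancestry, can be sketched in Python as follows. The sketch assumes that local ancestry has already been inferred for every case at every locus. It uses a simple standardised deviation from the genome-wide average, which is an illustrative stand-in rather than the likelihood-based statistics that dedicated MALD software computes. The function name and input layout are hypothetical.

```python
from statistics import mean, pstdev


def ancestry_excess(local_ancestry: list[list[float]]) -> list[float]:
    """Standardised ancestry deviation at each locus, computed across cases.

    local_ancestry[i][j] is the fraction (0, 0.5 or 1) of individual i's two
    chromosomes at locus j inferred to come from the higher-risk parental population.
    """
    n_loci = len(local_ancestry[0])
    # Average ancestry across all cases, locus by locus.
    locus_means = [mean(row[j] for row in local_ancestry) for j in range(n_loci)]
    genome_mean = mean(locus_means)
    spread = pstdev(locus_means)
    if spread == 0:
        return [0.0] * n_loci  # no locus stands out
    return [(x - genome_mean) / spread for x in locus_means]


# Toy data: 4 cases, 5 loci; every case carries only high-risk ancestry at locus 2.
cases = [
    [1.0, 0.5, 1.0, 0.5, 1.0],
    [0.5, 1.0, 1.0, 1.0, 0.5],
    [1.0, 0.5, 1.0, 0.5, 0.5],
    [0.5, 1.0, 1.0, 1.0, 1.0],
]
scores = ancestry_excess(cases)
print(scores.index(max(scores)))  # index of the locus with the largest excess
```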

The Power of MALD. Theoretically, MALD enables us to choose only cases (affected individuals), without the need for controls. MALD can locate the causative allele to a resolution of 10 cM* (~100 genes), a region which can later be analyzed by association studies and further research. * Depending on the amount of admixture and density of markers.

The Power of MALD. Sample sizes are considerably smaller for MALD analysis. The number of SNPs to be mapped is considerably smaller. An individual can be genotyped for a MALD study for a few hundred dollars. Feasible!

Limitations and Guidelines. Assessing LD is tricky: LD may result from natural selection in the parental population, and such LD must be excluded so that it does not give false positives. Errors in assessing the allele frequencies in the parental populations can have the same effect. Ethical considerations must be taken into account, since the information might be misused.

Criteria for declaring significance in a MALD study
The Bayesian statistic for detecting genome-wide significant association should be >2.
The deviation of European ancestry compared with the genome average should be seen in cases only, and not in controls.
The signal should remain when the marker that contributes most strongly to disease is removed.
Markers that are in linkage disequilibrium with each other in ancestral European and western African populations should be excluded from the mapping by admixture linkage disequilibrium (MALD) marker set.
The region of association should be statistically significant based on two different Markov chain Monte Carlo analysis-software packages.
The P-values for case–control association studies should be obtained by carrying out permutation testing. The statistic at the disease locus must be more extreme, and therefore more significant, than for any other locus throughout the genome in 100 random permutations of the case and control labels.
The statistic for association should increase in significance when marker density at the locus is increased, or when more affected samples are added to the study.
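
The permutation criterion can be sketched in Python. The sketch assumes a simple statistic (the difference in mean ancestry between cases and controls at each locus) in place of whatever statistic a real MALD package reports, and all function names are hypothetical. The 100 permutations follow the criterion as stated above.

```python
import random
from statistics import mean


def locus_stats(ancestry: list[list[float]], is_case: list[bool]) -> list[float]:
    """Case-minus-control difference in mean ancestry at every locus."""
    cases = [row for row, c in zip(ancestry, is_case) if c]
    controls = [row for row, c in zip(ancestry, is_case) if not c]
    n_loci = len(ancestry[0])
    return [
        mean(r[j] for r in cases) - mean(r[j] for r in controls)
        for j in range(n_loci)
    ]


def passes_permutation_criterion(
    ancestry: list[list[float]],
    is_case: list[bool],
    locus: int,
    n_perm: int = 100,
    seed: int = 0,
) -> bool:
    """True if the observed statistic at `locus` beats the genome-wide maximum
    obtained under every random permutation of the case/control labels."""
    rng = random.Random(seed)
    observed = abs(locus_stats(ancestry, is_case)[locus])
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between ancestry and disease status
        if max(abs(s) for s in locus_stats(ancestry, labels)) >= observed:
            return False
    return True
```

A fixed seed keeps the check reproducible; shuffling the labels preserves the numbers of cases and controls, so both groups are never empty as long as the original data contain both.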

MALD – the future. Using databases such as the International HapMap Project, SNPs of each parental population can be assessed, and we will be able to reach ~90% information about the origin of each locus. Maps for other populations will be built, allowing ALD studies in Latinos, Hawaiians and Aborigines. SIC values will become less important as the cost of denser genotyping drops. Diseases that exhibit a 10-20% frequency difference between parental populations will be assessed.

Questions?