Linkage analysis: basic principles Manuel Ferreira & Pak Sham Boulder Advanced Course 2005.


Outline
1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
5. Nonparametric Linkage Analysis

1. Aim

For a heritable trait...
Linkage: localizes the region of the genome where a locus (or loci) that regulates the trait is likely to be harboured.
Family-specific phenomenon: affected individuals in a family share the same ancestral predisposing DNA segment at a given trait locus.
Association: identifies a locus that regulates the trait.
Population-specific phenomenon: affected individuals in a population share the same ancestral predisposing DNA segment at a given trait locus.

2. Human Genome

DNA structure
A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups.
Attached to carbon atom 1' of each sugar is a nitrogenous base: A, C, G or T.
Nucleotide: 1 phosphate group + 1 sugar + 1 base.
Two DNA molecules are held together in anti-parallel fashion by hydrogen bonds between bases [Watson-Crick rules], forming an antiparallel double helix.
Only one strand is read during gene transcription.
A gene is a segment of DNA which is transcribed to give a protein or RNA product.

DNA polymorphisms
RFLPs
Minisatellites
Microsatellites: >100,000; many alleles, e.g. (CA)n repeats; very informative; evenly distributed; easily automated
SNPs: 10,054,521 (25 Jan '05); most with 2 alleles (up to 4); not very informative; evenly distributed; easily automated

DNA organization: Mitosis
[Figure: haploid gametes (♀, ♂), each with (22 + 1) chromosomes, fuse into a one-cell diploid zygote. Chromosome 1, with loci A and B on the maternal and paternal copies, is followed through G1 phase, S phase and M phase as the zygote goes from 1 cell to >1 cell.]

DNA recombination: Meiosis
[Figure: a diploid gamete precursor cell, 2 × (22 + 1) chromosomes, carrying paternal (♂) and maternal (♀) copies of chr1 with loci A and B, gives rise to haploid gamete precursors and then haploid gametes, which are labelled NR (nonrecombinant) or R (recombinant).]

DNA recombination between linked loci: Meiosis
[Figure: a diploid gamete precursor, 2 × (22 + 1) chromosomes, carrying paternal (♂) and maternal (♀) copies of a chromosome with linked loci A and B, gives rise to haploid gamete precursors and then haploid gametes; the gametes shown are labelled NR (nonrecombinant).]

Human Genome - summary
DNA is a linear sequence of nucleotides partitioned into 23 chromosomes.
Two copies of each chromosome (2 x 22 autosomes + XY), of paternal and maternal origin.
During meiosis in gamete precursors, recombination can occur between maternal and paternal homologs.
Recombination fraction between loci A and B (θ): the proportion of gametes produced that are recombinant for A and B.
If A and B are very far apart: 50% R : 50% NR, θ = 0.5
If A and B are very close together: <50% R, 0 ≤ θ < 0.5
Recombination fraction (θ) can be converted to genetic distance (cM):
Haldane: e.g. θ = 0.17, cM = 20.8
Kosambi: e.g. θ = 0.17, cM = 17.7
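
The conversion from θ to map distance can be sketched with the standard Haldane and Kosambi map functions. The function names below are illustrative and not from the slides; both formulas are only defined for 0 ≤ θ < 0.5.

```python
import math

def haldane_cm(theta):
    """Haldane map distance in cM (assumes no crossover interference)."""
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_cm(theta):
    """Kosambi map distance in cM (allows for some interference)."""
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

# Reproduces the slide's worked example for theta = 0.17
print(round(haldane_cm(0.17), 1), round(kosambi_cm(0.17), 1))  # 20.8 17.7
```

For small θ the two functions nearly agree with θ itself (1% recombination is about 1 cM); they diverge as θ approaches 0.5.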

3. Principles of Linkage Analysis

Linkage Analysis requires genetic markers
[Figure: chromosomes carrying markers M1, M2, ..., Mn; the trait locus Q lies at recombination fraction θ from a marker.]

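Linkage Analysis: Parametric vs. Nonparametric
[Figure, adapted from Weiss & Terwilliger 2000: on a chromosome, marker M and gene Q are related through recombination; Q and other genetic factors (A, D) and environmental factors (C, E) act on the phenotype (Phe) through the mode of inheritance; M and Phe are related through correlation.]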

4. Parametric Linkage Analysis

Linkage with informative phase known meiosis
Chromosome: marker M, alleles 1..6. Gene: Q, alleles 1, 2. Autosomal dominant, Q1 predisposing allele.
Grandparents: M2M5 Q2Q2 and M1M6 Q1Q?
Parents: M1M2 Q1Q2 (informative, phase known: M1Q1/M2Q2) and M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Gametes from the informative parent: NR: M1Q1, M2Q2; R: M1Q2, M2Q1
θ MQ = 1/6 = 0.17 (~20.8 cM)
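
With phase known, counting recombinants is a matter of comparing each transmitted haplotype with the two parental haplotypes. A minimal sketch follows, with an illustrative function name. Only five offspring haplotypes are legible in this transcript, so the sixth gamete in the list is an assumption (taken to be nonrecombinant) added to match the slide's θ = 1/6.

```python
def classify_gametes(gametes, phase):
    """Label a transmitted haplotype 'NR' if it matches one of the two
    parental haplotypes in `phase`, otherwise 'R'."""
    return ["NR" if g in phase else "R" for g in gametes]

# Known phase of the informative parent (M1M2 Q1Q2)
phase = ("M1Q1", "M2Q2")

# Haplotypes transmitted by the informative parent; the last entry is an
# ASSUMED sixth gamete, not legible in the transcript.
gametes = ["M1Q1", "M2Q2", "M1Q1", "M2Q2", "M2Q1", "M1Q1"]

labels = classify_gametes(gametes, phase)
n_rec = labels.count("R")
theta_hat = n_rec / len(labels)  # proportion of recombinant gametes
print(labels, round(theta_hat, 2))
```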

Linkage with informative phase unknown meiosis
Grandparents: Q2Q2 and Q1Q?
Parents: M1M2 Q1Q2 (informative, phase unknown) and M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Possible phase M1Q1/M2Q2: NR: M1Q1, M2Q2 (P = 1-θ); R: M1Q2, M2Q1 (P = θ)
Possible phase M1Q2/M2Q1: R: M1Q1, M2Q2 (P = θ); NR: M1Q2, M2Q1 (P = 1-θ)
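
When phase is unknown, the usual approach is to average the likelihood over the two possible phases, each given prior probability 1/2. The sketch below assumes that equal weighting and uses illustrative names. The counts (5 gametes of one class, 1 of the other) are hypothetical and chosen to echo the phase-known example.

```python
import math

def lod_phase_unknown(n_a, n_b, theta):
    """LOD for one phase-unknown informative parent.

    n_a: gametes that are NR under phase 1 (and R under phase 2)
    n_b: gametes that are R under phase 1 (and NR under phase 2)
    """
    phase1 = (1.0 - theta) ** n_a * theta ** n_b
    phase2 = theta ** n_a * (1.0 - theta) ** n_b
    likelihood = 0.5 * phase1 + 0.5 * phase2   # average over the two phases
    null = 0.5 ** (n_a + n_b)                  # likelihood at theta = 0.5
    return math.log10(likelihood / null)

# Hypothetical counts: 5 gametes of one class, 1 of the other
print(round(lod_phase_unknown(5, 1, 1 / 6), 2))
```

The same counts give a lower LOD than in the phase-known case, because half of the prior weight sits on the phase under which most gametes would be recombinant.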

Parametric LOD score calculation
Overall LOD score for a given θ is the sum of all family LOD scores at θ, e.g. LOD = 3 for θ = 0.28
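
As a sketch of this calculation, the standard phase-known LOD for a family with R recombinant and NR nonrecombinant gametes is log10[θ^R (1-θ)^NR / 0.5^(R+NR)]. The family counts below are hypothetical and the names are illustrative. The family LODs are summed over a grid of θ values and the maximum is reported.

```python
import math

def family_lod(n_rec, n_nonrec, theta):
    """Phase-known LOD for one family at recombination fraction theta."""
    likelihood = theta ** n_rec * (1.0 - theta) ** n_nonrec
    null = 0.5 ** (n_rec + n_nonrec)
    return math.log10(likelihood / null)

# Hypothetical (recombinant, nonrecombinant) counts for three families
families = [(1, 5), (0, 4), (2, 8)]

# Evaluate the summed LOD on a grid theta = 0.01, 0.02, ..., 0.50
grid = [i / 100 for i in range(1, 51)]
total_lod = {t: sum(family_lod(r, nr, t) for r, nr in families) for t in grid}

best_theta = max(total_lod, key=total_lod.get)
print(best_theta, round(total_lod[best_theta], 2))
```

With phase-known data the maximum falls at the pooled proportion of recombinants (here 3/20). Summing per-family LODs matters more when families contribute different likelihood forms, such as phase-unknown meioses.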

Parametric Linkage Analysis - summary
For each marker, estimate the θ that yields the highest LOD score across all families.
Markers with a significant parametric LOD score (>3) are said to be linked to the trait locus with recombination fraction θ.
This θ (and the LOD) will depend upon the mode of inheritance (MOI) assumed.
MOI determines the genotype at the trait locus Q and thus determines the number of meioses which are recombinant or nonrecombinant.
Limited to Mendelian diseases.
[Figure: markers M1, M2, ..., Mn and trait locus Q, separated by θ.]

Practical
Grandparents: Q1Q1 and Q2Q?
Parents: M1M2 Q1Q1 and M3M4 Q1Q2
Offspring: M2M3 Q1Q1, M1M4 Q1Q2, M1M4 Q1Q1, M2M4 Q1Q2
Possible phase M3Q1/M4Q2: NR: M3Q1, M4Q2 (P = 1-θ); R: M3Q2, M4Q1 (P = θ)
Possible phase M3Q2/M4Q1: R: M3Q1, M4Q2 (P = θ); NR: M3Q2, M4Q1 (P = 1-θ)
1. Identify informative individual(s)
2. Reconstruct possible phase(s)
3. Classify gametes as R or NR
4. Count R and NR gametes
5. Express
6. Express LOD score

Practical II Talk example Practical example Graph each…

Approach
Parametric: genotype marker locus & genotype trait locus (the latter inferred from phenotype according to a specific disease model). Parameter of interest: θ between marker and trait loci.
Nonparametric: genotype marker locus & phenotype. If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have similar genotypes at a marker in the vicinity of the trait locus, and vice versa. Interest: correlation between phenotypic similarity and marker genotypic similarity. No need to specify mode of inheritance, allele frequencies, etc.

Phenotypic similarity between relatives
Squared trait differences
Squared trait sums
Trait cross-product
Trait variance-covariance matrix
Affection concordance
[Figure: axes T1 and T2, the trait values of the two relatives.]
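
The first three measures can be sketched for a single pair of relatives as follows. The function name and the `mean` argument are illustrative; the sketch assumes the squared sum and the cross-product are computed on traits centred on a known population mean.

```python
def pair_similarity(t1, t2, mean=0.0):
    """Pairwise phenotypic (dis)similarity measures for trait values t1, t2."""
    d1, d2 = t1 - mean, t2 - mean          # mean-centred trait values
    return {
        "squared_difference": (t1 - t2) ** 2,  # large when the pair is dissimilar
        "squared_sum": (d1 + d2) ** 2,         # large when both deviate the same way
        "cross_product": d1 * d2,              # positive when the pair is concordant
    }

print(pair_similarity(1.0, -0.5))
```

The three quantities are related: the cross-product equals (squared sum - squared difference) / 4, so a statistic built on any two of them carries the information in the third.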

Genotypic similarity between relatives
IBS: alleles shared Identical By State "look the same"; they may have the same DNA sequence but are not necessarily derived from a known common ancestor.
IBD: alleles shared Identical By Descent are a copy of the same ancestral allele.
Example: parents M1Q1/M2Q2 and M3Q3/M3Q4; sibs M1Q1/M3Q3 and M1Q1/M3Q4. At the marker the sibs share 2 alleles IBS but only 1 allele IBD.
Inheritance vector (M)

Genotypic similarity between relatives
Sib pair M1Q1/M3Q3 and M2Q2/M3Q4: number of alleles IBD = 0, proportion of alleles IBD = 0
Sib pair M1Q1/M3Q3 and M1Q1/M3Q4: number of alleles IBD = 1, proportion of alleles IBD = 0.5
Sib pair M1Q1/M3Q3 and M1Q1/M3Q3: number of alleles IBD = 2, proportion of alleles IBD = 1
Inheritance vector (M)
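
A minimal sketch of IBD counting for a sib pair, assuming each sib is recorded as the pair of founder haplotypes it inherited (one from each parent). The function names are illustrative. When IBD status is uncertain, the expected proportion is the usual weighted sum of the IBD probabilities.

```python
def ibd_count(sib1, sib2):
    """Number of alleles shared IBD by two sibs.

    Each sib is a tuple (haplotype from parent 1, haplotype from parent 2),
    where the labels identify distinct founder haplotypes.
    """
    return int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])

def pi_hat(p0, p1, p2):
    """Expected proportion of alleles IBD from P(IBD = 0), P(IBD = 1), P(IBD = 2)."""
    return 0.5 * p1 + 1.0 * p2

# The three sib pairs from the slide
print(ibd_count(("M1Q1", "M3Q3"), ("M2Q2", "M3Q4")))  # 0
print(ibd_count(("M1Q1", "M3Q3"), ("M1Q1", "M3Q4")))  # 1
print(ibd_count(("M1Q1", "M3Q3"), ("M1Q1", "M3Q3")))  # 2

# With no marker information, sibs have prior IBD probabilities 1/4, 1/2, 1/4
print(pi_hat(0.25, 0.5, 0.25))  # 0.5
```

Note that the second pair is IBS 2 at the marker (both carry M1 and M3) but only IBD 1, because the two M3 alleles sit on different founder haplotypes.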

Genotypic similarity between relatives - 2^(2n)

Statistics that incorporate both phenotypic and genotypic similarities
[Figure: axes are genotypic similarity and phenotypic similarity.]

Haseman-Elston regression – Quantitative traits
Phenotypic dissimilarity = b × Genotypic similarity + c
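
A sketch of this regression, taking the squared trait difference of each sib pair as the phenotypic dissimilarity and the proportion of alleles IBD as the genotypic similarity, fitted by ordinary least squares. The data are hypothetical and the function name is illustrative. Under linkage the slope b is expected to be negative, since pairs sharing more alleles IBD should differ less.

```python
def ols(x, y):
    """Ordinary least squares fit of y = b * x + c; returns (b, c)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return b, mean_y - b * mean_x

# Hypothetical sib pairs: proportion of alleles IBD and squared trait difference
pi_hat = [0.0, 0.5, 1.0, 0.5, 0.0, 1.0]
sq_diff = [4.0, 2.1, 0.3, 1.9, 3.8, 0.5]

b, c = ols(pi_hat, sq_diff)
print(round(b, 2), round(c, 2))  # negative slope suggests linkage
```

In practice the test is one-sided on b, with a significance assessment that this sketch omits.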

VC ML method – Quantitative & Categorical traits
H1 vs. H0, e.g. LOD = 3

Individual LOD scores can be expressed as P values (pointwise)
LOD → (x 4.6) → Chi-sq (n df) → P value
Genome-wide linkage analysis (e.g. VC)
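
The factor of 4.6 is 2 ln(10), which turns a base-10 log likelihood ratio into a likelihood-ratio chi-square. The sketch below assumes 1 df and a one-sided test (the chi-square tail probability is halved), which is a common convention for linkage statistics but an assumption here. The function names are illustrative.

```python
import math

def lod_to_chisq(lod):
    """Likelihood-ratio chi-square corresponding to a LOD score."""
    return 2.0 * math.log(10.0) * lod   # about 4.6 x LOD

def pointwise_p(lod):
    """One-sided pointwise P value, ASSUMING a 1-df chi-square."""
    chisq = lod_to_chisq(lod)
    # Upper tail of a 1-df chi-square is erfc(sqrt(x / 2)); halve it for one side
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))

for lod in (2.2, 3.0, 3.6):
    print(lod, round(lod_to_chisq(lod), 1), f"{pointwise_p(lod):.1e}")
```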

Statistics for selected samples
Mean IBD sharing statistics (Risch & Zhang 1995, 1996)
[Figure: selected pairs on axes T1 and T2; for each of two selected groups, H0 (no linkage) and H1 (linkage) are stated in terms of mean IBD sharing.]

Other Linkage statistics
Dependent variable: phenotypes. Independent variable: IBD sharing.
Extensions to Haseman Elston (Wright 1997, Drigalenko 1998, Elston et al. 2000, Forrest 2001, Visscher & Hopper 2001, Xu et al. 2000, Sham & Purcell 2001)
VC ML with mixture distribution (Eaves et al. 1996)
Dependent variable: IBD sharing. Independent variable: phenotypes.
Pedwide-regression Analysis ("reverse HE") (Sham et al. 2002)
Reverse VC ML (Sham et al. 2000)
Statistics for affection traits
Based on IBD scoring functions, e.g. S all (Whittemore & Halpern 1994, Kong & Cox 1997)
Mixed statistic (Forrest & Feingold 2000)

Nonparametric Linkage Analysis - summary
No need to specify mode of inheritance.
Models phenotypic and genotypic similarity of relatives.
Requires an expression of phenotypic similarity and a calculation of IBD.
HE and VC are the most popular statistics used for linkage of quantitative traits.
Other statistics are available, especially for affection traits.
Type I error? Power?

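Type I error
[Figure: LOD score distribution with threshold k, marking the type I error and true positive regions. The threshold can be theoretical (Lander & Kruglyak 1995) or empirical.]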

Theoretical genome-wide thresholds
Genome-wide threshold for suggestive linkage: the LOD score that occurs by chance alone on average once per scan. LOD = 2.2, Chi-sq = 10.1, Pointwise P = 7.4 x 10^-4
Genome-wide threshold for significant linkage: the LOD score that occurs by chance alone on average once per 20 scans. LOD = 3.6, Chi-sq = 16.7, Pointwise P = 2.2 x 10^-5

Empirical genome-wide thresholds
Genome-wide threshold for suggestive linkage: the LOD score that occurs by chance alone on average once per scan.
Genome-wide threshold for significant linkage: the LOD score that occurs by chance alone on average once per 20 scans.
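
One way to turn these definitions into an empirical procedure is sketched below. It assumes a set of genome scans has already been simulated under no linkage, each summarised as a list of the LOD scores of its distinct linkage peaks. The `null_scans` data, the function name and the toy numbers are all hypothetical; a real analysis would use many more replicates.

```python
import math

def empirical_threshold(null_scans, rate):
    """Smallest observed peak LOD t such that, on average, no more than
    `rate` peaks per null scan exceed t."""
    peaks = sorted((lod for scan in null_scans for lod in scan), reverse=True)
    # Number of peaks allowed above the threshold across all scans
    allowed = math.floor(rate * len(null_scans) + 1e-9)
    if allowed >= len(peaks):
        return 0.0          # too few null peaks to constrain the threshold
    return peaks[allowed]

# Toy null scans (hypothetical peak LODs from 4 simulated scans)
null_scans = [[1.0, 0.5], [2.0], [0.3], [3.0, 1.5]]

suggestive = empirical_threshold(null_scans, rate=1.0)    # once per scan
stricter = empirical_threshold(null_scans, rate=0.25)     # once per 4 scans
print(suggestive, stricter)
```

For the significant threshold the rate would be 1/20 = 0.05, which needs at least 20 null scans before any peak is allowed above the threshold.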