Linkage analysis Jan Hellemans 6

Finding causal mutations
- 2 opposing strategies
  - sequence then select
  - select then sequence
- Sequencing
  - traditional Sanger sequencing: only possible after selection
  - massively parallel sequencing: possible prior to or after selection
    - RNA sequencing
    - exome sequencing
    - genome sequencing

Finding causal mutations
- Selection
  - positional (prior to sequencing)
    - linkage analysis
    - GWAS
    - structural variations (e.g. microdeletions)
  - functional (prior to & after sequencing)
    - candidate genes selected based on known function or involvement in related disorders
    - filtering of variants based on functional predictions
  - overlap (after sequencing)
    - looking for genes/variants that occur in multiple independent patients
  - mostly a combination is used
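The overlap criterion can be illustrated with a small Python sketch. It assumes that each patient's variants have already been filtered down to a set of gene names; the patient IDs and gene names are hypothetical placeholders, not data from this course.

```python
from collections import Counter

def genes_shared_by_patients(patient_genes, min_patients=2):
    """Return genes with candidate variants in at least `min_patients` patients.

    patient_genes maps a patient ID to the set of genes that still carry
    candidate variants after filtering (hypothetical input format).
    """
    counts = Counter()
    for genes in patient_genes.values():
        counts.update(set(genes))  # count each gene at most once per patient
    return sorted(gene for gene, n in counts.items() if n >= min_patients)

# Hypothetical filtered gene lists for three unrelated patients
candidates = {
    "patient_1": {"GENE_A", "GENE_B", "GENE_C"},
    "patient_2": {"GENE_B", "GENE_D"},
    "patient_3": {"GENE_B", "GENE_C"},
}
print(genes_shared_by_patients(candidates))  # ['GENE_B', 'GENE_C']
```

The threshold (`min_patients=2` here) is a choice for the analyst, not a fixed rule.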

exome sequencing

Aims
- Interpret microsatellite results
- Add genotypes to pedigrees
- Create pedigree and genotype files
- Calculate and interpret LOD scores
- Delineate linkage intervals
- Basic principles of linkage analysis
- Analyze other types of markers
- Association studies
- Learn how to work with specific pedigree programs

Starting linkage analysis

Preparations
- Clearly define the phenotype
  - if it is not specific enough, you may be analyzing different disorders that map to different genomic loci
  - LOD scores are additive across families, so families linked to different loci dilute each other's evidence
- Find suitable families
  - larger is better
  - more patients is better
- Collect genomic DNA from as many family members as possible
- Determine the type of inheritance
- Calculate the power to prove linkage with the available material (SLink, not part of this course)
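Because LOD scores are additive, the evidence from several families is combined by summing their LOD scores at each recombination fraction (θ). The Python sketch below uses made-up LOD values (not results from this course) to show how a family whose disease maps to a different locus pulls the combined score down.

```python
def combined_lod(family_lods):
    """Sum per-family LOD scores that were evaluated on the same theta grid."""
    return [sum(values) for values in zip(*family_lods)]

thetas = [0.0, 0.05, 0.1, 0.2, 0.3]          # recombination fractions
family_a = [1.5, 1.4, 1.2, 0.8, 0.4]          # hypothetical LODs, linked family
family_b = [1.2, 1.1, 0.9, 0.6, 0.3]          # hypothetical LODs, linked family
family_c = [-2.5, -1.2, -0.6, -0.2, 0.0]      # hypothetical family, different locus

for label, families in [("A+B", [family_a, family_b]),
                        ("A+B+C", [family_a, family_b, family_c])]:
    total = combined_lod(families)
    best = max(range(len(thetas)), key=lambda i: total[i])
    print(label, "max LOD", round(total[best], 2), "at theta", thetas[best])
```

With families A and B only, the maximum is 2.7 at θ = 0; adding family C lowers it to about 1.5 at θ = 0.1.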

Linkage analysis types
- Directed linkage analysis
  - evaluate linkage at a specific locus, such as a candidate gene
  - common approach: evaluate an intragenic, a 5' and a 3' marker, often microsatellites
- Genome-wide linkage analysis
  - screen for linkage with markers spread across the entire genome
  - microsatellites: ~400 markers spaced at about 10 cM
  - SNPs: 500k SNP array
- Homozygosity mapping
  - screen only affected individuals in inbred families
  - select homozygous markers (typically SNP markers)
  - very efficient technology
- Fine mapping
  - some linked markers are known, but the borders of the linkage interval still need to be defined
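The core idea of homozygosity mapping (look for stretches of consecutive homozygous SNPs in affected individuals from inbred families, then intersect them) can be sketched as follows. This is a simplified illustration, assuming a hypothetical "AA"/"AB"/"BB" genotype encoding and a toy minimum run length of 3 SNPs; real analyses work with far longer runs and tolerate occasional genotyping errors.

```python
def homozygous_runs(genotypes, positions, min_snps=3):
    """Return (start_bp, end_bp) of runs of >= min_snps consecutive homozygous SNPs.

    Anything that is not "AA" or "BB" (heterozygous call or no-call) ends a
    run here, which is a simplification.
    """
    runs, start = [], None
    for i, gt in enumerate(list(genotypes) + [None]):  # None closes a trailing run
        if gt in ("AA", "BB"):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    return runs

def shared_regions(runs_a, runs_b):
    """Intersect two lists of (start, end) intervals."""
    shared = []
    for a_start, a_end in runs_a:
        for b_start, b_end in runs_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start <= end:
                shared.append((start, end))
    return shared

positions = [100, 200, 300, 400, 500, 600, 700]            # toy positions (bp)
patient_1 = ["AA", "BB", "AA", "AA", "AB", "BB", "AA"]
patient_2 = ["AB", "BB", "AA", "AA", "BB", "AB", "AA"]
print(shared_regions(homozygous_runs(patient_1, positions),
                     homozygous_runs(patient_2, positions)))  # [(200, 400)]
```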

Exercise – Part 1
- 2 inbred families with a recessive disorder
- Homozygosity mapping based on 500k SNP arrays identified 2 candidate regions on chromosome 4
  - Patient 1 homozygous for: 6.052 Mb – … Mb and … Mb – … Mb
  - Patient 2 homozygous for: … Mb – … Mb
- Task: find microsatellite markers to confirm linkage

Find additional flanking markers
- Find the physical position of the marker in NCBI > UniSTS
- NCBI Map Viewer:
  - go to Homo sapiens and to the right chromosome
  - Maps & Options: show
    - DeCode, Généthon & Marshfield (genetic maps)
    - Genes
  - set the region: e.g. 2 Mb up- and downstream of your marker
  - click 'Data as table view'
  - click on the STS link next to a marker to see its details
- Select markers that
  - map to only 1 genomic location
  - have a PCR product with an extended size range (a single product size means the marker is not polymorphic)
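The marker selection criteria can be expressed as a simple filter. The `StsMarker` record, its fields and the marker names are hypothetical: they stand for values read off the NCBI table view by hand, not for an NCBI API.

```python
from dataclasses import dataclass

@dataclass
class StsMarker:
    name: str
    n_locations: int    # number of genomic locations reported for the STS
    min_size: int       # smallest reported PCR product (bp)
    max_size: int       # largest reported PCR product (bp)
    position_mb: float  # physical position on the chromosome (Mb)

def usable_markers(markers, center_mb, window_mb=2.0):
    """Keep markers near center_mb that map uniquely and show a size range."""
    return [
        m.name
        for m in markers
        if m.n_locations == 1                      # maps to a single locus
        and m.max_size > m.min_size                # one size only = not polymorphic
        and abs(m.position_mb - center_mb) <= window_mb
    ]

# Hypothetical records, as one might transcribe them from the table view
markers = [
    StsMarker("MARKER_1", 1, 180, 210, 13.5),
    StsMarker("MARKER_2", 2, 150, 170, 13.9),  # maps to two locations
    StsMarker("MARKER_3", 1, 240, 240, 14.2),  # single product size
    StsMarker("MARKER_4", 1, 100, 130, 18.0),  # outside the 2 Mb window
]
print(usable_markers(markers, center_mb=14.0))  # ['MARKER_1']
```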

Exercise – Part 1 > possible solution
- Markers in 1st candidate region
  - D4S3017 (21.078Mb)
  - D4S3044 (25.189Mb)
  - D4S1618 (33.857Mb)
  - D4S3350 (33.857Mb)
  - D4S2988 (36.889Mb)
- Markers in 2nd candidate region
  - D4S1582 (10.311Mb)
  - D4S2906 (12.321Mb)
  - D4S2944 (13.141Mb)
  - D4S1602 (14.059Mb)
  - D4S2960 (15.437Mb)
- Order primers & analyze them on all family members

Analyzing microsatellite data

Microsatellites > basics
- Repeats of short sequences (e.g. 2 bp): NNNNAC(AC)nACNNNN
- The number of repeats is variable (unstable sequence)
- The number of repeats determines the allele
- The number of repeats corresponds to a specific length of the PCR product:
  - allele 1: NNNNACACACACACNNNN (5 × AC → 18 bp)
  - allele 2: NNNNACACACACACACNNNN (6 × AC → 20 bp)
  - allele 3: NNNNACACACACACACACNNNN (7 × AC → 22 bp)
  - ...
- Determine the length (on a sequencer) to know the allele
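In the toy example the product length is linear in the repeat number: 8 bp of flanking sequence (NNNN on each side) plus 2 bp per AC unit. A minimal Python sketch of that arithmetic, with the flank and unit lengths as assumed parameters:

```python
def product_size(n_repeats, unit_len=2, flank_len=8):
    """PCR product length for a given repeat count (toy model from the slide)."""
    return flank_len + n_repeats * unit_len

def repeats_from_size(size_bp, unit_len=2, flank_len=8):
    """Infer the repeat count from an integer product length."""
    repeats, remainder = divmod(size_bp - flank_len, unit_len)
    if remainder:
        raise ValueError(f"{size_bp} bp is not flank + whole repeat units")
    return repeats

for n in (5, 6, 7):
    print(n, "x AC ->", product_size(n), "bp")
print(repeats_from_size(22))  # 7
```

Real products have longer, primer-defined flanks, so `flank_len` differs from marker to marker.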

Microsatellites > basics

Microsatellites > determine size
- Use an internal size standard (labelled in another color)
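Sizing against an internal standard converts a peak's migration position into base pairs using standard fragments of known size run alongside the sample. The sketch below uses plain linear interpolation between the two flanking standard peaks; sizing software typically fits more elaborate curves, and the scan positions are invented.

```python
from bisect import bisect_left

def size_fragment(scan, ladder):
    """Estimate fragment size (bp) by linear interpolation between the two
    size-standard peaks that flank it.

    ladder: list of (scan_position, size_bp) pairs sorted by scan position.
    """
    scans = [s for s, _ in ladder]
    if not scans[0] <= scan <= scans[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = max(1, bisect_left(scans, scan))  # index of the upper flanking standard
    (s0, b0), (s1, b1) = ladder[i - 1], ladder[i]
    return b0 + (scan - s0) * (b1 - b0) / (s1 - s0)

# Hypothetical internal size standard peaks: (scan position, known size)
ladder = [(1000, 200.0), (1100, 220.0), (1200, 240.0)]
print(size_fragment(1125, ladder))  # 225.0
```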

Microsatellites > heterozygotes

Microsatellites > stutter peaks
- Repeats are difficult to copy: the polymerase slips
- Some amplicons have 1 repeat less; a few even lose multiple repeats
- Short repeat units are more prone to slippage and show more pronounced stutter peaks
- The largest product is the correct one
- Distance between peaks = length of a repeat unit

Microsatellites > stutter peaks (figure: allelic peak, 1st stutter peak, 2nd stutter peak)

Microsatellites > stutter peaks
- Allelic peaks are the highest
- Stutter peaks are lower
(figure: alleles A1 and A2)
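The stutter rules (allelic peaks are the highest; stutter peaks sit one repeat unit shorter and are lower) can be turned into a rough allele caller. This is a sketch, assuming a dinucleotide repeat, a 0.5 bp tolerance and hypothetical peak heights; it ignores +A peaks.

```python
def call_alleles(peaks, unit_len=2, tol=0.5):
    """Return sizes of peaks judged to be alleles rather than stutter.

    peaks: list of (size_bp, height). A peak is treated as stutter when a
    taller peak sits about one repeat unit further (larger size).
    """
    alleles = []
    for size, height in peaks:
        is_stutter = any(
            abs(other_size - (size + unit_len)) <= tol and other_height > height
            for other_size, other_height in peaks
        )
        if not is_stutter:
            alleles.append(size)
    return sorted(alleles)

# Hypothetical heterozygote (dinucleotide repeat), two stutter peaks per allele
peaks = [(216.0, 300), (218.1, 900), (220.0, 2500),
         (226.0, 250), (228.0, 800), (230.1, 2100)]
print(call_alleles(peaks))  # [220.0, 230.1]
```

A heterozygote whose alleles differ by exactly one repeat unit can fool this rule, because the smaller allele then looks like a stutter peak of the larger one; such plots need visual inspection.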

Microsatellites > stutter peaks (figure: alleles A1 and A2)

Microsatellites > +A peaks
- Taq polymerase tends to add an extra A at the 3' end
- The proportion of products with or without this extra A varies
- Do not confuse with stutter peaks (+A peaks differ by only 1 bp)
(figure labels: allelic peak, 1st stutter peak, 2nd stutter peak, each with a matching "+ A" peak)
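The spacing between two peaks is what separates +A peaks from stutter: about 1 bp for the extra A versus one repeat unit for stutter. A small sketch, assuming a dinucleotide repeat and a hypothetical 0.3 bp tolerance:

```python
def classify_spacing(smaller_bp, larger_bp, unit_len=2, tol=0.3):
    """Guess how two neighbouring peaks of one marker are related."""
    diff = larger_bp - smaller_bp
    if abs(diff - 1.0) <= tol:
        return "+A"          # same product with/without the non-templated A
    if abs(diff - unit_len) <= tol:
        return "stutter"     # one repeat unit shorter (polymerase slippage)
    return "unrelated"       # e.g. a second allele, or stutter combined with +A

print(classify_spacing(220.0, 221.0))   # +A
print(classify_spacing(218.0, 220.1))   # stutter
print(classify_spacing(220.0, 230.0))   # unrelated
```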

Microsatellites > complex plots (stutter & +A) (figure: alleles A1 and A2)

Microsatellites > multiplex
- Combine multiple markers in a single analysis (saves cost)
  - different size ranges
  - multiple colors
- Commercial kits: e.g. 16 markers per lane
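Whether two markers can share a lane follows from the two criteria above: different dye colors, or non-overlapping size ranges. The sketch assumes markers described as (dye, minimum size, maximum size) and a hypothetical 10 bp safety margin between same-color markers; the dye names are only examples.

```python
def can_multiplex(marker_a, marker_b, min_gap_bp=10):
    """marker = (dye, min_size_bp, max_size_bp); min_gap_bp is an assumed margin."""
    dye_a, lo_a, hi_a = marker_a
    dye_b, lo_b, hi_b = marker_b
    if dye_a != dye_b:
        return True  # different dyes are read in separate color channels
    # same color: size ranges must be separated by at least min_gap_bp
    return lo_b - hi_a >= min_gap_bp or lo_a - hi_b >= min_gap_bp

print(can_multiplex(("FAM", 100, 140), ("HEX", 110, 150)))  # True
print(can_multiplex(("FAM", 100, 140), ("FAM", 160, 200)))  # True
print(can_multiplex(("FAM", 100, 140), ("FAM", 135, 170)))  # False
```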

Microsatellite plots examples

Genotyping pedigrees

- Screen one or multiple markers for some or all family members
- For every marker:
  - make a list of all occurring allele sizes
  - due to technical variation in sizing, the same allele can have a slightly different size in different measurements (-0.4 bp to +0.4 bp); give all alleles within this range the same allele number
  - add the allele numbers to the pedigree at the corresponding individual/marker combination
  - find the right phase
- Advanced software like GeneMapper can generate tables with allele numbers for every sample/marker
- Advanced pedigree programs like Progeny can store genotype information for family members
- Verify inheritance
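The binning step (sizes within ±0.4 bp of each other get the same allele number) can be sketched as below. Numbering the bins from the shortest size upward is an assumption made here for illustration, and the sizes are invented.

```python
def bin_alleles(sizes, tolerance=0.4):
    """Map each measured size to an allele number.

    A size joins the current bin if it lies within 2 * tolerance of the
    bin's smallest size (i.e. both could be the same allele +/- tolerance).
    """
    allele_of = {}
    allele = 0
    bin_start = None
    for size in sorted(set(sizes)):
        if bin_start is None or size - bin_start > 2 * tolerance:
            allele += 1        # start a new bin
            bin_start = size
        allele_of[size] = allele
    return allele_of

sizes = [221.8, 222.1, 224.0, 223.9, 228.2, 222.3]  # hypothetical measured sizes
alleles = bin_alleles(sizes)
print([alleles[s] for s in sizes])  # [1, 1, 2, 2, 3, 1]
```

Because the numbers depend on which sizes are present, all samples of one marker should be binned together so that allele numbers stay consistent across the pedigree.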

Exercise – Part 2
- Genotype 3 markers in all available individuals of 2 families
- Pedigrees & microsatellite plots in ExercisePart2-GenotypingData.pdf
- Add allele numbers for the 3 markers to the pedigree
- Interpret the genotyped pedigrees: linked?

Family 1

Family 2

Exercise – Part 2 > Conclusions
- D4S1582: Mendelian error, cannot be interpreted
- D4S2944: linked
- D4S3017: not linked (unaffected individuals with the same genotype as a patient)
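A Mendelian error means that a child carries an allele that neither parent can have transmitted. A minimal trio check is sketched below, assuming genotypes as pairs of allele numbers with 0 for unknown; dedicated tools such as PedCheck do this for whole pedigrees.

```python
def mendelian_consistent(child, father, mother):
    """True if the child's genotype can be formed from one allele of each parent.

    Genotypes are (allele, allele) tuples of allele numbers; 0 means unknown.
    Unknown alleles are treated as compatible with anything.
    """
    if 0 in child:
        return True  # simplification: skip children with missing calls

    def can_give(parent, allele):
        return 0 in parent or allele in parent

    a, b = child
    return ((can_give(father, a) and can_give(mother, b))
            or (can_give(father, b) and can_give(mother, a)))

print(mendelian_consistent((1, 3), (1, 2), (3, 4)))  # True
print(mendelian_consistent((1, 5), (1, 2), (3, 4)))  # False
print(mendelian_consistent((2, 2), (1, 2), (0, 0)))  # True (mother untyped)
```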

Calculate LOD scores

EasyLinkage
- EasyLinkage: a user interface for linkage analysis
  - Bioinformatics Feb 1;21(3):405-7 PMID:
  - Bioinformatics Sep 1;21(17):3565-7 PMID:
- Interface for many linkage analysis programs
- Input
  - pedigree file (LINKAGE format)
  - genotype file(s)
  - marker information (already provided for popular markers)
  - settings

Pedigree file
- Naming requirement for EasyLinkage: p_xxx.pro
  - e.g. p_SMMD.pro
- Format:
  - tab-delimited text file
  - 1 individual per row
  - columns:
    1. family ID
    2. person ID
    3. father ID
    4. mother ID
    5. sex (1 = male, 2 = female, 0 = unknown)
    6. affection status (1 = unaffected, 2 = affected, 0 = unknown)
    7. DNA availability (optional, relevant for power calculations)
    8. liability class (to be provided if multiple liability classes are used)
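Writing a pedigree file in this layout is straightforward. The sketch assumes the common LINKAGE convention of 0 for the parents of founders and uses a hypothetical three-person family; the optional columns 7 and 8 can simply be appended to each row.

```python
def pedigree_lines(individuals):
    """Format pedigree rows as tab-delimited lines (one individual per row).

    Each row: (family, person, father, mother, sex, affection[, dna, liability]).
    Founders use "0" for father and mother (assumed LINKAGE convention).
    """
    lines = []
    for row in individuals:
        sex, affection = row[4], row[5]
        if sex not in (0, 1, 2) or affection not in (0, 1, 2):
            raise ValueError(f"invalid sex/affection code in {row}")
        lines.append("\t".join(str(field) for field in row))
    return lines

def write_pedigree(path, individuals):
    """Write the lines to e.g. p_SMMD.pro."""
    with open(path, "w") as handle:
        handle.write("\n".join(pedigree_lines(individuals)) + "\n")

# Hypothetical nuclear family: unaffected parents, affected daughter
family = [
    ("F1", "1", "0", "0", 1, 1),
    ("F1", "2", "0", "0", 2, 1),
    ("F1", "3", "1", "2", 2, 2),
]
for line in pedigree_lines(family):
    print(line)
```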

Genotype files
- Person IDs have to match exactly those provided in the pedigree file
- Naming requirement for EasyLinkage: MarkerName_xxx.abi
  - e.g. D1S1609_SMMD.abi
- Format:
  - tab-delimited text file
  - 1 individual per row
  - columns (for microsatellite-based analysis):
    1. marker (same as in the file name and matching a marker in an available marker set)
    2. custom information (content doesn't matter, but the column must be present)
    3. individual ID (matching the person ID in the pedigree file)
    4. & 5. genotypes for the 2 alleles (unknown = 0)
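A matching genotype file can be generated the same way. In this sketch the filler in column 2 ("-") is arbitrary, as the format allows, and the allele numbers are hypothetical; the ID check enforces that every person also appears in the pedigree file.

```python
def genotype_filename(marker, project):
    """EasyLinkage naming scheme: MarkerName_xxx.abi."""
    return f"{marker}_{project}.abi"

def genotype_lines(marker, genotypes, pedigree_ids):
    """Format one marker's genotypes as tab-delimited lines.

    genotypes maps person ID -> (allele1, allele2), with 0 for unknown.
    """
    lines = []
    for person, (allele1, allele2) in genotypes.items():
        if person not in pedigree_ids:
            raise ValueError(f"person {person!r} is missing from the pedigree file")
        lines.append("\t".join([marker, "-", person, str(allele1), str(allele2)]))
    return lines

pedigree_ids = {"1", "2", "3"}                   # must match the pedigree file exactly
calls = {"1": (1, 2), "2": (3, 4), "3": (1, 3)}  # hypothetical allele numbers
print(genotype_filename("D1S1609", "SMMD"))      # D1S1609_SMMD.abi
for line in genotype_lines("D1S1609", calls, pedigree_ids):
    print(line)
```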

Marker information
- Contains the chromosome and position of every marker
- Already available for a number of commercial SNP arrays and for the microsatellite markers from
  - Généthon
  - Marshfield
  - DeCode
- Custom marker sets can be created (see manual)

EasyLinkage settings
- Choose a program:
  - FastLink: parametric, single-point
  - SuperLink: parametric, single-/multipoint
  - SPLink: nonparametric, single-point
  - Genehunter: nonparametric/parametric, single-/multipoint
  - Genehunter Plus: nonparametric/parametric, single-/multipoint
  - Genehunter MOD: nonparametric/parametric, single-/multipoint
  - Genehunter Imprinting: nonparametric/parametric, single-/multipoint
  - GeneHunter TwoLocus: parametric, two-locus, single-/multipoint
  - Merlin: nonparametric/parametric, single-/multipoint
  - SimWalk: nonparametric, single-/multipoint
  - Allegro: nonparametric/parametric, single-/multipoint & simulation, single-/multipoint
  - PedCheck: Mendelian error check
  - FastSLink: simulation, single-/multipoint

EasyLinkage settings
- Parametric vs non-parametric
- Single-point vs multipoint
- Frequency of the disease allele
- Penetrance vectors (wt/wt, wt/mt, mt/mt)
  - standard dominant: 0, 1, 1
  - standard recessive: 0, 0, 1
  - reduced penetrance: replace 1 by the penetrance (e.g. 0.9)
  - phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)
  - example: …% chance to show a similar phenotype despite a normal genotype; 90% chance to show the phenotype with 1 mutant allele (dominant with incomplete penetrance); 99% likelihood to present with the phenotype if both alleles are mutant
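The recipe for penetrance vectors (start from the standard vector, replace 1 by the penetrance and 0 by the phenocopy rate) can be written as a small helper. The function name and arguments are hypothetical; vectors whose heterozygote and homozygote values differ, as in the example above, would simply be entered directly.

```python
def penetrance_vector(mode, penetrance=1.0, phenocopy=0.0):
    """Return penetrances for (wt/wt, wt/mt, mt/mt) genotypes.

    Starts from the standard vectors (dominant 0,1,1; recessive 0,0,1),
    replaces 1 by the penetrance and 0 by the phenocopy rate.
    """
    if mode == "dominant":
        base = (0, 1, 1)
    elif mode == "recessive":
        base = (0, 0, 1)
    else:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    return tuple(penetrance if p == 1 else phenocopy for p in base)

print(penetrance_vector("dominant"))                       # (0.0, 1.0, 1.0)
print(penetrance_vector("dominant", penetrance=0.9))       # (0.0, 0.9, 0.9)
print(penetrance_vector("recessive", phenocopy=0.1))       # (0.1, 0.1, 1.0)
```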

Evaluate calculated LOD scores
- Maximum LOD scores can be seen in EasyLinkage
- Details about LOD scores at different recombination fractions can be found in text files generated by EasyLinkage
  - process them in Excel (generate graphs, ...)
- Standard rules for LOD scores
  - LOD > 3: significant linkage
  - 2 < LOD < 3: suggestive linkage
  - -2 < LOD < 2: uninformative
  - LOD < -2: significant absence of linkage
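For the simplest case, phase-known meioses that are all informative for the marker, the two-point LOD score has a closed form, LOD(θ) = log10[θ^R (1 − θ)^(N−R) / 0.5^N] for R recombinants among N meioses. The sketch below computes it and applies the rules of thumb listed for LOD scores; real pedigrees with unknown phase, missing genotypes and penetrance models need programs such as SuperLink.

```python
import math

def lod_phase_known(recombinants, meioses, theta):
    """Two-point LOD for phase-known, fully informative meioses:
    log10( theta^R * (1 - theta)^(N - R) / 0.5^N ).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if theta == 0.0:
        # any recombinant is impossible at theta = 0
        return float("-inf") if recombinants else meioses * math.log10(2)
    nonrecombinants = meioses - recombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1 - theta)
            - meioses * math.log10(0.5))

def interpret_lod(lod):
    """Rules of thumb for LOD scores (handling of exact boundaries is a choice)."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"
    if lod > -2:
        return "uninformative"
    return "significant absence of linkage"

# 10 informative meioses without recombinants: maximal at theta = 0
print(round(lod_phase_known(0, 10, 0.0), 2))       # 3.01
print(interpret_lod(lod_phase_known(0, 10, 0.0)))  # significant linkage
print(round(lod_phase_known(1, 10, 0.1), 2))       # 1.6
```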

Interpreting LOD plots

Exercise – Part 3
- Generate one pedigree file containing all family members of both families (use global IDs)
- Generate a genotype file for each of the tested markers
- Run a SuperLink analysis with the right settings
- Evaluate the results

Exercise – Part 3 > Results

Strengthen the evidence
- Analyze more family members
- Analyze more families
- Analyze flanking markers
  - look for more informative markers that result in higher LOD scores
  - a series of flanking markers allows multipoint linkage analysis
  - a series of linked markers gives more confidence (subjective)
  - flanking markers can also be used to fine-map the linkage interval

Determine the linkage interval (figure: markers along the chromosome scored as linked (L), not linked (NL) or uninformative (?), with the candidate region indicated)
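One way to formalize this: split the ordered markers at every not-linked (NL) marker and take the block containing the most linked (L) markers; the NL markers on either side are its borders. This heuristic and the marker names are illustrative assumptions. A `None` border would mean that no NL marker has been typed on that side yet, so fine mapping is still needed.

```python
def linkage_interval(markers):
    """Return the (left, right) NL markers bounding the best-supported block.

    markers: list of (name, status) in chromosomal order, where status is
    "L" (linked), "NL" (not linked) or "?" (uninformative).
    """
    best = None
    seg_start = 0
    for i in range(len(markers) + 1):
        if i == len(markers) or markers[i][1] == "NL":
            segment = markers[seg_start:i]
            n_linked = sum(1 for _, status in segment if status == "L")
            if n_linked and (best is None or n_linked > best[0]):
                left = markers[seg_start - 1][0] if seg_start > 0 else None
                right = markers[i][0] if i < len(markers) else None
                best = (n_linked, left, right)
            seg_start = i + 1
    return None if best is None else (best[1], best[2])

# Hypothetical scoring pattern, closed by an NL marker on the right
statuses = ["L", "L", "NL", "?", "?", "L", "L", "L", "L", "L", "?", "?", "NL"]
markers = [(f"M{i + 1}", s) for i, s in enumerate(statuses)]
print(linkage_interval(markers))  # ('M3', 'M13')
```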

Exercise 2: find the linkage interval

Post linkage
- Create a list of all the genes within the linkage interval
  - NCBI Map Viewer
  - UCSC (also for non-coding RNAs)
- Evaluate known gene functions for relevance to the investigated phenotype
- Sequence genes
  - start with those that seem the most relevant to the disorder
  - start with the coding regions
  - screen the entire region with capture sequencing
- Finding a mutation and proving its causality is the ultimate proof
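Listing the genes in the interval is an overlap query on gene coordinates. The sketch assumes a table of (name, chromosome, start, end) rows, for example exported from a genome browser; the gene names, coordinates and interval are hypothetical.

```python
def genes_in_interval(genes, chrom, start_bp, end_bp):
    """Return names of genes that overlap [start_bp, end_bp] on chrom.

    genes: iterable of (name, chrom, gene_start, gene_end) rows
    (export format assumed here).
    """
    return [name for name, c, g_start, g_end in genes
            if c == chrom and g_start <= end_bp and g_end >= start_bp]

# Hypothetical annotation rows (not real gene coordinates)
annotation = [
    ("GENE_A", "chr4", 9_950_000, 10_050_000),   # overlaps the left border
    ("GENE_B", "chr4", 12_900_000, 13_300_000),  # fully inside
    ("GENE_C", "chr4", 16_000_000, 16_050_000),  # outside
    ("GENE_D", "chr5", 11_000_000, 11_100_000),  # other chromosome
]
print(genes_in_interval(annotation, "chr4", 10_000_000, 15_000_000))  # ['GENE_A', 'GENE_B']
```

Genes that only partly overlap the interval are kept on purpose, since their coding exons may still fall inside it.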