Linkage analysis Jan Hellemans 6

Finding causal mutations
- 2 opposing strategies
  - sequence then select
  - select then sequence
- Sequencing
  - traditional Sanger sequencing: only possible after selection
  - massively parallel sequencing: possible prior to or after selection
    - RNA sequencing
    - exome sequencing
    - genome sequencing

Finding causal mutations
- Selection
  - positional (prior to sequencing)
    - linkage analysis
    - GWAS
    - structural variations (e.g. microdeletions)
  - functional (prior to & after sequencing)
    - candidate genes selected based on known function or involvement in related disorders
    - filtering of variants based on functional predictions
  - overlap (after sequencing)
    - looking for genes / variants that occur in multiple independent patients
  - in practice, a combination of these is mostly used

exome sequencing

Aims
- Interpret microsatellite results
- Add genotypes to pedigrees
- Create pedigree and genotype files
- Calculate and interpret LOD scores
- Delineate linkage intervals
- Basic principles of linkage analysis
- Analyze other types of markers
- Association studies
- Learn how to work with specific pedigree programs

Starting linkage analysis

Preparations
- Clearly define the phenotype
  - If the definition is not specific enough, then you may be analyzing different disorders that map to different genomic loci
  - LOD scores are additive
- Find suitable families
  - larger is better
  - more patients is better
- Collect genomic DNA from as many family members as possible
- Determine the type of inheritance
- Calculate the power to prove linkage with the available material (SLink, not part of this course)

Linkage analysis types
- Directed linkage analysis
  - Evaluate linkage at a specific locus such as a candidate gene
  - Common approach: evaluate an intragenic, a 5’ and a 3’ marker (often microsatellites)
- Genome-wide linkage analysis
  - Screen for linkage with markers spread across the entire genome
  - Microsatellites: ~400 markers spaced at about 10cM
  - SNPs: 500k SNP array
- Homozygosity mapping
  - Screen only affected individuals in inbred families
  - Select homozygous markers (typically SNP markers)
  - Very efficient technology
- Fine mapping
  - Some linked markers are known, but the borders of the linkage interval still need to be defined

Exercise – Part 1
- 2 inbred families with a recessive disorder
- Homozygosity mapping based on 500k SNP arrays identified 2 candidate regions
- Chromosome 4
  - Patient 1 homozygous for
    - 6.052Mb – 14.488Mb
    - 21.008Mb – 37.477Mb
  - Patient 2 homozygous for
    - 11.186Mb – 37.219Mb
- Task: find microsatellite markers to confirm linkage
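The two candidate regions in this exercise are the stretches where both patients are homozygous. A minimal sketch of that step, assuming each homozygous stretch is given as a (start, end) pair in Mb; the function name `intersect` is illustrative and not part of any tool mentioned in the course:

```python
def intersect(regions_a, regions_b):
    """Return the overlaps between two lists of (start, end) intervals in Mb."""
    overlaps = []
    for a_start, a_end in regions_a:
        for b_start, b_end in regions_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only real overlaps
                overlaps.append((start, end))
    return sorted(overlaps)


# Homozygous stretches on chromosome 4, taken from the exercise.
patient1 = [(6.052, 14.488), (21.008, 37.477)]
patient2 = [(11.186, 37.219)]

shared = intersect(patient1, patient2)
print(shared)  # [(11.186, 14.488), (21.008, 37.219)]
```

Microsatellite markers chosen inside and just outside these shared stretches can then be used to confirm linkage in all family members.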

Find additional flanking markers
- Find the physical position of a marker in NCBI > UniSTS
- NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/
  - Go to Homo sapiens and to the right chromosome
  - Maps & options: show
    - DeCode, Généthon & Marshfield (genetic maps)
    - Genes
  - Set region: e.g. 2Mb up- and downstream of your marker
  - Click ‘Data as table view’
  - Click on STS behind a marker to see its details
- Select markers that
  - locate to only 1 genomic location
  - have a PCR product with an extended size range (a single size means the marker is not polymorphic)

http://www.ncbi.nlm.nih.gov/projects/mapview

Exercise – Part 1 > possible solution
- Markers in 1st candidate region
  - D4S3017 (21.078Mb)
  - D4S3044 (25.189Mb)
  - D4S1618 (33.857Mb)
  - D4S3350 (33.857Mb)
  - D4S2988 (36.889Mb)
- Markers in 2nd candidate region
  - D4S1582 (10.311Mb)
  - D4S2906 (12.321Mb)
  - D4S2944 (13.141Mb)
  - D4S1602 (14.059Mb)
  - D4S2960 (15.437Mb)
- Order primers & analyze them on all family members

Analyzing microsatellite data

Microsatellites > basics
- Repeats of short sequences (e.g. 2bp): NNNNAC(AC)nACNNNN
- Number of repeats is variable (unstable sequence)
- Number of repeats determines the allele
- Number of repeats corresponds to a specific length of PCR product:
  - allele 1: NNNNACACACACACNNNN (5*AC → 18bp)
  - allele 2: NNNNACACACACACACNNNN (6*AC → 20bp)
  - allele 3: NNNNACACACACACACACNNNN (7*AC → 22bp)
  - ...
- Determine the length to know the allele (sequencer)
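The relation between repeat number and product length on this slide is simple arithmetic: product length = flanking sequence + number of repeats × repeat length. A small sketch using the toy amplicon from the slide (4bp flanks on each side, AC repeat); the function names are illustrative:

```python
def product_length(n_repeats, repeat_unit="AC", flank_5="NNNN", flank_3="NNNN"):
    """Length in bp of the PCR product for a given number of repeats."""
    return len(flank_5) + n_repeats * len(repeat_unit) + len(flank_3)


def repeats_from_length(length, repeat_unit="AC", flank_5="NNNN", flank_3="NNNN"):
    """Inverse calculation: number of repeats for a measured product length."""
    repeat_bp = length - len(flank_5) - len(flank_3)
    if repeat_bp < 0 or repeat_bp % len(repeat_unit) != 0:
        raise ValueError("length does not correspond to a whole number of repeats")
    return repeat_bp // len(repeat_unit)


for n in (5, 6, 7):
    print(n, "repeats ->", product_length(n), "bp")
```

In real data the flanks are the primer-to-repeat distances of the specific marker, so only the spacing between alleles (one repeat length) carries over directly.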

Microsatellites > basics

Microsatellites > determine size
- Use internal size standard (other color)
[Figure labels: 220bp, 225bp, 230bp]

Microsatellites > heterozygotes
[Figure labels: 220bp, 223bp, 225bp, 230bp]

Microsatellites > stutter peaks
- Repeats are difficult to copy: the polymerase slips
- Some amplicons have 1 repeat less; a few even lose multiple repeats
- Small repeats are more prone to slippage and show more pronounced stutter peaks
- Largest product is the correct one
- Distance between peaks = length of a repeat

Microsatellites > stutter peaks
[Figure labels: allelic peak, 1st stutter peak, 2nd stutter peak]

Microsatellites > stutter peaks
- Allelic peaks are the highest
- Stutter peaks are lower
[Figure labels: A1, A2]

Microsatellites > stutter peaks
[Figure labels: A1, A2]

Microsatellites > +A peaks
- Taq polymerase tends to add an extra A at the 3’ end
- Variable degree of products with or without this extra A
- Do not confuse with stutter peaks (only 1bp difference)
[Figure labels: allelic peak, 1st stutter peak, 2nd stutter peak; allelic peak + A, 1st stutter peak + A, 2nd stutter peak + A]
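To keep stutter and +A peaks apart when reading a plot, it can help to list where each peak is expected. The sketch below assumes the allele size passed in is the product without the extra A, that stutter products are shorter by whole repeats, and that the +A product is exactly 1bp longer; `expected_peaks` is an illustrative helper, not a GeneMapper function:

```python
def expected_peaks(allele_size, repeat_length=2, n_stutters=2):
    """Expected peak sizes (bp) around one allele: stutter and +A products."""
    peaks = {}
    for k in range(n_stutters + 1):
        label = "allelic peak" if k == 0 else f"stutter peak {k}"
        size = allele_size - k * repeat_length  # each stutter loses one repeat
        peaks[label] = size
        peaks[label + " + A"] = size + 1  # extra A added by Taq polymerase
    return peaks


for label, size in expected_peaks(230).items():
    print(f"{label}: {size}bp")
```

Peaks spaced by the repeat length point to stutter, while a companion peak 1bp to the right points to the +A product.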

Microsatellites > complex plots (stutter & +A)
[Figure labels: A1, A2]

Microsatellites > multiplex
- Combine multiple markers in a single analysis ($$$)
  - Different size range
  - Multicolor
- Commercial kits: e.g. 16 markers / lane

Microsatellite plots examples

Genotyping pedigrees

- Screen one or multiple markers for some or all family members
- For every marker:
  - Make a list of all occurring allele sizes
  - Due to technical variation in sizing, the same allele can have a slightly different size in different measurements (-0.4bp to +0.4bp). Give all alleles within this range the same allele number
  - Add the allele numbers to the pedigree at the corresponding individual/marker combination
  - Find the right phase
- Advanced software like GeneMapper can generate tables with allele numbers for every sample / marker
- Advanced pedigree programs like Progeny can store genotype information for family members
- Verify inheritance
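The binning step for one marker can be sketched as follows. This is a simplified illustration, not how GeneMapper bins alleles: sizes are sorted, and a new allele number starts whenever a size lies more than twice the tolerance (0.8bp by default) above the first size in the current bin. The sizes in the example are made up.

```python
def assign_allele_numbers(sizes, tolerance=0.4):
    """Give measured sizes (bp) that fall within the sizing tolerance the same allele number."""
    numbers = {}
    bin_start = None
    current = 0
    for size in sorted(set(sizes)):
        # Two measurements of one allele can differ by at most 2 * tolerance.
        if bin_start is None or size - bin_start > 2 * tolerance:
            current += 1
            bin_start = size
        numbers[size] = current
    return [numbers[size] for size in sizes]


measured = [220.1, 219.8, 222.0, 224.3, 221.9]  # hypothetical sizes for one marker
print(assign_allele_numbers(measured))  # [1, 1, 2, 3, 2]
```

The resulting allele numbers are what gets written next to each individual in the pedigree.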

Exercise – Part 2
- Genotype 3 markers in all available individuals of 2 families
- Pedigrees & microsatellite plots in ExercisePart2-GenotypingData.pdf
- Add allele numbers for the 3 markers to the pedigree
- Interpret the genotyped pedigrees: linked?

Family 1

Family 2

Exercise – Part 2 > Conclusions
- D4S1582
  - Mendelian error → cannot be interpreted
- D4S2944
  - Linked
- D4S3017
  - Not linked: unaffected individuals with the same genotype as a patient
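A Mendelian error such as the one seen for D4S1582 means that a child's genotype cannot be explained by one allele from each parent. A minimal trio check is sketched below, under the assumption that genotypes are pairs of allele numbers with 0 for unknown; dedicated tools such as PedCheck handle whole pedigrees, and `is_mendelian_consistent` is only an illustrative helper.

```python
def is_mendelian_consistent(child, father, mother):
    """True if the child can have received one allele from each parent.

    Genotypes are tuples of two allele numbers; 0 means unknown.
    Trios with any unknown allele cannot be checked and are accepted.
    """
    if 0 in child + father + mother:
        return True
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


print(is_mendelian_consistent((1, 2), (1, 3), (2, 4)))  # True
print(is_mendelian_consistent((1, 1), (1, 3), (2, 4)))  # False
```

A failed check usually points to a sizing or binning mistake, a sample swap, or a wrong family relationship, so the marker is re-examined before it is interpreted.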

Calculate LOD scores

EasyLinkage
- EasyLinkage = UI for linkage analysis
  - http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
  - Bioinformatics. 2005 Feb 1;21(3):405-7 PMID: 15347576
  - Bioinformatics. 2005 Sep 1;21(17):3565-7 PMID: 16014370
- Interface for many linkage analysis programs
- Input
  - Pedigree file (linkage format)
  - Genotype file(s)
  - Marker information (already provided for popular markers)
  - Settings

Pedigree file
- Naming requirements for EasyLinkage: p_xxx.pro
  - e.g. p_SMMD.pro
- Format:
  - Tab-delimited text file
  - 1 individual per row
  - Columns:
    - 1: family ID
    - 2: person ID
    - 3: father ID
    - 4: mother ID
    - 5: sex (1=male, 2=female, 0=unknown)
    - 6: affection status (1=unaffected, 2=affected, 0=unknown)
    - 7: DNA availability (optional, relevant for power calculations)
    - 8: liability class (to be provided if multiple liability classes are used)
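The column layout above can be produced with a few lines of Python. This sketch writes only the six mandatory columns for a hypothetical nuclear family; coding founders with father and mother ID 0 follows the usual linkage-format convention and should be checked against the EasyLinkage manual.

```python
import csv
import io

COLUMNS = ["family", "person", "father", "mother", "sex", "affection"]


def pedigree_text(individuals):
    """Serialise individuals to tab-delimited text, one individual per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for individual in individuals:
        writer.writerow([individual[column] for column in COLUMNS])
    return buffer.getvalue()


# Hypothetical family: unaffected parents (founders) and one affected daughter.
family = [
    {"family": 1, "person": 1, "father": 0, "mother": 0, "sex": 1, "affection": 1},
    {"family": 1, "person": 2, "father": 0, "mother": 0, "sex": 2, "affection": 1},
    {"family": 1, "person": 3, "father": 1, "mother": 2, "sex": 2, "affection": 2},
]

text = pedigree_text(family)
print(text)

# To save it under the naming scheme from the slide:
# with open("p_SMMD.pro", "w") as handle:
#     handle.write(text)
```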

Genotype files
- Person IDs have to match exactly with those provided in the pedigree file
- Naming requirements for EasyLinkage: MarkerName_xxx.abi
  - e.g. D1S1609_SMMD.abi
- Format:
  - Tab-delimited text file
  - 1 individual per row
  - Columns (for microsatellite-based analysis):
    - 1: marker (same as in file name and matching a marker in an available marker set)
    - 2: custom information (content doesn’t matter, but column must be present)
    - 3: individual ID (match person ID in pedigree file)
    - 4 & 5: genotypes for 2 alleles (unknown=0)
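A matching sketch for one genotype file per marker, following the five columns listed above. The genotypes and the placeholder text in the custom column are made up for illustration; only the file-name pattern and column order come from the slide.

```python
def genotype_filename(marker, suffix):
    """File name following the MarkerName_xxx.abi scheme."""
    return f"{marker}_{suffix}.abi"


def genotype_lines(marker, genotypes, custom="x"):
    """One tab-delimited row per individual: marker, custom, ID, allele 1, allele 2."""
    lines = []
    for person_id, (allele_1, allele_2) in genotypes.items():
        fields = [marker, custom, str(person_id), str(allele_1), str(allele_2)]
        lines.append("\t".join(fields))
    return lines


# Hypothetical allele numbers; person 4 was not genotyped (unknown = 0).
genotypes = {1: (1, 2), 2: (1, 3), 3: (1, 1), 4: (0, 0)}

print(genotype_filename("D1S1609", "SMMD"))
for line in genotype_lines("D4S2944", genotypes):
    print(line)
```

The person IDs used as dictionary keys must be the same IDs that appear in column 2 of the pedigree file.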

Marker information
- Contains information on the chromosome and position of every marker
- Already available for a number of commercial SNP arrays and for the microsatellite markers from
  - Généthon
  - Marshfield
  - DeCode
- Custom marker sets can be created (see manual)

EasyLinkage settings
- Choose a program:
  - FastLink: parametric, single-point
  - SuperLink: parametric, single-/multipoint
  - SPLink: nonparametric, single-point
  - Genehunter: nonpara-/parametric, single-/multipoint
  - Genehunter Plus: nonpara-/parametric, single-/multipoint
  - Genehunter MOD: nonpara-/parametric, single-/multipoint
  - Genehunter Imprinting: nonpara-/parametric, single-/multipoint
  - GeneHunter TwoLocus: parametric, two-locus, single-/multipoint
  - Merlin: nonpara-/parametric, single-/multipoint
  - SimWalk: nonparametric, single-/multipoint
  - Allegro: nonpara-/parametric, single-/multipoint & simulation, single-/multipoint
  - PedCheck: Mendelian error check
  - FastSLink: simulation, single-/multipoint

EasyLinkage settings
- Parametric vs. non-parametric
- Single-point vs. multipoint
- Frequency of the disease allele
- Penetrance vectors (wt/wt, wt/mt, mt/mt)
  - Standard dominant: 0 1 1
  - Standard recessive: 0 0 1
  - Reduced penetrance: replace 1 by the penetrance (e.g. 0.9)
  - Phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)
  - Example: 0.01 0.9 0.99
    - 1% chance to show a similar phenotype despite a normal genotype
    - 90% chance to show the phenotype with 1 mutant allele (dominant with incomplete penetrance)
    - 99% chance to show the phenotype if both alleles are mutant
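The rules for building a penetrance vector can be written down directly. The sketch below also estimates the expected frequency of the phenotype in the population, which assumes Hardy-Weinberg genotype proportions; that assumption and both function names are mine, not EasyLinkage settings.

```python
def penetrance_vector(mode, penetrance=1.0, phenocopy=0.0):
    """Penetrance vector (wt/wt, wt/mt, mt/mt) for a simple inheritance model."""
    if mode == "dominant":
        return (phenocopy, penetrance, penetrance)
    if mode == "recessive":
        return (phenocopy, phenocopy, penetrance)
    raise ValueError("mode must be 'dominant' or 'recessive'")


def prevalence(disease_allele_freq, vector):
    """Expected fraction of affected individuals, assuming Hardy-Weinberg proportions."""
    q = disease_allele_freq
    p = 1 - q
    f_wt, f_het, f_mut = vector
    return p * p * f_wt + 2 * p * q * f_het + q * q * f_mut


print(penetrance_vector("dominant"))             # (0.0, 1.0, 1.0)
print(penetrance_vector("recessive"))            # (0.0, 0.0, 1.0)
print(penetrance_vector("dominant", 0.9, 0.1))   # (0.1, 0.9, 0.9)
print(prevalence(0.01, (0.01, 0.9, 0.99)))
```

Comparing the computed prevalence with the known frequency of the disorder is a quick sanity check on the chosen allele frequency and penetrances.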

Evaluate calculated LOD scores
- Maximum LOD scores can be seen in EasyLinkage
- Details about LOD scores at different recombination fractions can be found in text files generated by EasyLinkage
  - process in Excel (generate graphs, ...)
- Standard rules for LOD scores
  - LOD > 3 → significant linkage
  - 2 < LOD < 3 → suggestive linkage
  - -2 < LOD < 2 → uninformative
  - LOD < -2 → significant absence of linkage
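For intuition about what the programs compute, the simplest textbook case is a two-point LOD score for phase-known, fully informative meioses: Z(θ) = log10[ θ^R (1-θ)^(N-R) / 0.5^N ], with R recombinants among N meioses. The sketch below implements that special case together with the interpretation rules from the slide; real pedigrees need the likelihood methods of the programs listed earlier, and the treatment of scores exactly equal to 2, 3 or -2 is my assumption. The meiosis counts are hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        return float("-inf")  # a recombinant excludes theta = 0
    return math.log10(likelihood_linked / likelihood_unlinked)


def classify_lod(lod):
    """Interpretation following the standard rules on the slide."""
    if lod > 3:
        return "significant linkage"
    if lod > 2:
        return "suggestive linkage"
    if lod < -2:
        return "significant absence of linkage"
    return "uninformative"


# LOD scores are additive: two hypothetical families without recombinants.
family_lods = [lod_score(0, 6, 0.0), lod_score(0, 5, 0.0)]
total = sum(family_lods)
for lod in family_lods:
    print(round(lod, 2), classify_lod(lod))
print(round(total, 2), classify_lod(total))
```

Neither family reaches the threshold on its own, but the summed score does, which is why collecting more families and more family members pays off.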

Interpreting LOD plots

Exercise – Part 3
- Generate one pedigree file containing all family members of both families (use global IDs)
- Generate a genotype file for each of the tested markers
- Run SuperLink analysis with the right settings
- Evaluate results

Exercise – Part 3 > Results

Strengthen the evidence
- Analyze more family members
- Analyze more families
- Analyze flanking markers
  - Look for more informative markers that result in higher LOD scores
  - A series of flanking markers allows for multipoint linkage analysis
  - A series of linked markers gives more confidence (subjective)
  - Flanking markers can also be used to fine-map the linkage interval

Determine the linkage interval
[Figure: markers along the chromosome scored L, NL or ?, with the candidate region marked]
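One way to read such a marker series is sketched below. It assumes L means linked, NL not linked and ? uninformative, that the linked markers form a single block, and that the candidate region extends to the nearest NL marker on each side, since ? markers cannot exclude anything. Marker names and positions are hypothetical.

```python
def candidate_region(markers):
    """Return the names of the NL markers that border the linked block.

    markers: list of (name, position_mb, status) sorted by position,
    with status 'L', 'NL' or '?'. A border is None if no NL marker
    was typed on that side. Returns None if no marker is linked.
    """
    linked = [i for i, marker in enumerate(markers) if marker[2] == "L"]
    if not linked:
        return None
    first, last = linked[0], linked[-1]
    left = next((markers[i][0] for i in range(first - 1, -1, -1)
                 if markers[i][2] == "NL"), None)
    right = next((markers[i][0] for i in range(last + 1, len(markers))
                  if markers[i][2] == "NL"), None)
    return left, right


series = [
    ("M1", 10.0, "NL"), ("M2", 11.0, "?"), ("M3", 12.0, "L"),
    ("M4", 13.0, "L"), ("M5", 14.0, "?"), ("M6", 15.0, "NL"),
]
print(candidate_region(series))  # ('M1', 'M6')
```

Typing additional informative markers between a ? marker and the linked block is what narrows the interval during fine mapping.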

Exercise 2: find the linkage interval

Post linkage
- Create a list of all the genes within the linkage interval
  - NCBI map viewer
  - UCSC (also for non-coding RNAs)
- Evaluate known gene functions for relevance to the investigated phenotype
- Sequence genes
  - Start with those that seem the most relevant to the disorder
  - Start with the coding regions
- Screen the entire region with capture sequencing
- Finding a mutation and proving its causality is the ultimate proof
