Tutorial #5 by Ma’ayan Fishelson


The Elston-Stewart Algorithm
References:
Kenneth Lange, “Mathematical and Statistical Methods for Genetic Analysis”
Jurg Ott, “Analysis of Human Genetic Linkage”
http://www.accessexcellence.com/AB/GG/
http://www.nhgri.nih.gov/DIR/VIP/Glossary/index.html
http://www.tokyo-med.ac.jp/genet/index-e.htm

The Given Problem
Input: a pedigree plus phenotype information about some of the people. These people are called typed.
Output: the probability of the observed data, given some probability model for the transmission of alleles.
[Figure: example pedigree with the labels “founder”, “leaf”, and “typed” (genotype 1/2).]

Q: What is the probability of the observed data composed of?
A: There are three types of probability functions: founder probabilities, penetrance probabilities, and transmission probabilities.

Founder Probabilities – One Locus
Founders are individuals whose parents are not in the pedigree. We need to assign probabilities to their genotypes, which is done by assuming Hardy-Weinberg equilibrium.
Notation: d is the mutant allele, h is the normal allele.
Suppose the gene frequency of d is 0.05 and founder 1 has genotype d/h. Then:
Pr(d/h) = 2 * 0.05 * 0.95 = 0.095
Genotypes of different founders are treated as independent. For founder 1 with genotype d/h and founder 2 with genotype h/h:
Pr(d/h, h/h) = Pr(d/h) * Pr(h/h) = (2 * 0.05 * 0.95) * (0.95)^2
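
The founder calculation above can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (two alleles, Hardy-Weinberg equilibrium, unordered genotypes); the function name `hardy_weinberg` and the variable names are made up for the example.

```python
from itertools import combinations_with_replacement

def hardy_weinberg(freqs):
    """Return {unordered genotype: probability} under Hardy-Weinberg equilibrium."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] * freqs[b]
        # A heterozygote can arise in two orders, hence the factor 2.
        probs[(a, b)] = p if a == b else 2 * p
    return probs

allele_freqs = {"d": 0.05, "h": 0.95}   # values from the slide
genotype_probs = hardy_weinberg(allele_freqs)

pr_dh = genotype_probs[("d", "h")]      # 2 * 0.05 * 0.95
pr_hh = genotype_probs[("h", "h")]      # 0.95 ** 2

# Different founders are treated as independent, so probabilities multiply.
pr_two_founders = pr_dh * pr_hh
```

The three genotype probabilities sum to 1, which is a quick sanity check on any allele-frequency table passed in.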

Founder Probabilities – Multiple Loci
According to linkage equilibrium, the probability of the multi-locus genotype of founder k is:
Pr(xk) = Pr(xk1) * … * Pr(xkn)
Example: founder 1 has genotype d/h at the disease locus and 1/2 at a marker locus.
Pr(d/h, 1/2) = Pr(d/h) * Pr(1/2)   (linkage equilibrium)
             = Pr(d) * Pr(h) * Pr(1) * Pr(2)   (Hardy-Weinberg, for ordered genotypes)
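
A minimal sketch of the multi-locus founder probability, assuming ordered genotypes (so no factor of 2 for heterozygotes) and linkage equilibrium. The marker allele frequencies 0.3 and 0.7 are hypothetical; the slide does not give them.

```python
from math import prod

def ordered_genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of an ordered genotype (maternal, paternal)."""
    a, b = genotype
    return freqs[a] * freqs[b]

def founder_prob(multilocus_genotype, locus_freqs):
    """Linkage equilibrium: the per-locus probabilities multiply."""
    return prod(
        ordered_genotype_prob(g, f)
        for g, f in zip(multilocus_genotype, locus_freqs)
    )

disease_freqs = {"d": 0.05, "h": 0.95}   # from the one-locus slide
marker_freqs = {"1": 0.3, "2": 0.7}      # hypothetical values

p_founder = founder_prob(
    [("d", "h"), ("1", "2")],
    [disease_freqs, marker_freqs],
)
```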

Penetrance Probabilities
Penetrance: the probability of the phenotype, given the genotype. It can be, for example, sex-dependent, age-dependent, or environment-dependent.
E.g., dominant disease, complete penetrance:
Pr(affected | d/d) = 1.0
Pr(affected | d/h) = 1.0
Pr(affected | h/h) = 0
E.g., recessive disease, incomplete penetrance:
Pr(affected | d/d) = 0.7

Transmission Probabilities
Transmission probability: the probability of a child having a certain genotype given the parents’ genotypes, Pr(xc | xm, xf).
If we split the ordered genotype xc into the maternal allele xcm and the paternal allele xcf, we get:
Pr(xc | xm, xf) = Pr(xcm | xm) * Pr(xcf | xf)
The inheritance from each parent is independent.

Transmission Probabilities – One locus
The transmission is according to Mendel’s 1st law.
[Figure: father 1 (d/h) and mother 2 (h/h) with child 3.]
Pr(Xc = d/h | Xm = h/h, Xf = d/h) = Pr(Xcm = h | Xm = h/h) * Pr(Xcf = d | Xf = d/h) = 1 * ½ = ½
We also need to add the inheritance probability of the other phase (maternal allele d, paternal allele h), but it is zero because the mother carries no d allele.

Transmission Probabilities – One locus
Different children are independent given the genotypes of their parents.
[Figure: parents 1 (d/h) and 2 (h/h) with children 3, 4, and 5.]
Pr(X3 = d/h, X4 = h/h, X5 = d/h | X1 = d/h, X2 = h/h) = (1 * ½) * (1 * ½) * (1 * ½) = 1/8
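
The one-locus transmission rule and the independence of siblings can be checked with a short sketch. The helper names `gamete_prob` and `transmission_prob` are illustrative; genotypes are written as unordered pairs, so both phases are summed explicitly.

```python
from fractions import Fraction

def gamete_prob(allele, parent_genotype):
    """Probability that a parent transmits `allele` (Mendel's first law)."""
    return Fraction(parent_genotype.count(allele), 2)

def transmission_prob(child, mother, father):
    """Pr(child | mother, father) for an unordered child genotype.

    Sums over both phases: (maternal, paternal) = (a, b) and (b, a).
    """
    a, b = child
    p = gamete_prob(a, mother) * gamete_prob(b, father)
    if a != b:
        p += gamete_prob(b, mother) * gamete_prob(a, father)
    return p

father = ("d", "h")   # individual 1 on the slide
mother = ("h", "h")   # individual 2 on the slide

p_child = transmission_prob(("d", "h"), mother, father)

# Children are independent given the parents, so probabilities multiply.
children = [("d", "h"), ("h", "h"), ("d", "h")]
p_sibship = Fraction(1)
for c in children:
    p_sibship *= transmission_prob(c, mother, father)
```

Using `Fraction` keeps the results exact, so the values can be compared directly with the ½ factors on the slide.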

Transmission Probabilities – Multiple Loci
Let’s look at paternal inheritance, for example. We generate all possible recombination sequences (s1, s2, …, sn), where sl = 1 or sl = -1 (2^n sequences for n loci).
Each sequence determines a selection of paternal alleles p1, p2, …, pn and therefore a probability of inheritance.
[Equations: the rule selecting p1, …, pn from a sequence, and the resulting probability of inheritance.]
We need to sum the probabilities of all 2^n recombination sequences.
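
The sum over recombination sequences can be sketched as follows. The probability model here is an assumption, not taken from the slide: the first locus picks either parental haplotype with probability ½, and consecutive loci switch haplotype independently with the recombination fraction of that interval (no interference). The encoding sl = 1 for the parent's first haplotype and sl = -1 for the second is also an assumption.

```python
from itertools import product

def sequence_prob(s, thetas):
    """Probability of one recombination sequence s under the assumed model."""
    p = 0.5  # either haplotype is equally likely at the first locus
    for prev, cur, theta in zip(s, s[1:], thetas):
        p *= theta if prev != cur else 1.0 - theta
    return p

def haplotype_transmission_prob(haplotype, hap_a, hap_b, thetas):
    """Sum the probabilities of all 2^n sequences that yield `haplotype`."""
    n = len(haplotype)
    total = 0.0
    for s in product((1, -1), repeat=n):
        selected = tuple(hap_a[l] if s[l] == 1 else hap_b[l] for l in range(n))
        if selected == tuple(haplotype):
            total += sequence_prob(s, thetas)
    return total

# Hypothetical father with haplotypes d-1 and h-2 and theta = 0.1 between loci.
hap_a, hap_b, thetas = ("d", "1"), ("h", "2"), [0.1]

p_nonrecombinant = haplotype_transmission_prob(("d", "1"), hap_a, hap_b, thetas)
p_recombinant = haplotype_transmission_prob(("d", "2"), hap_a, hap_b, thetas)
```

When the parent is homozygous at some locus, several sequences select the same alleles, which is why the probabilities are summed rather than looked up.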

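Calculating the Likelihood of Family Data - Summary
The likelihood of the data is the probability of the observed data (the known phenotypes), given certain values for the unknown recombination fractions.
Each person i has an ordered multi-locus genotype xi = (xi1, xi2, …, xin) and a multi-locus phenotype gi. For a pedigree with m people:
$$L = \Pr(g) = \sum_{x} \Pr(g \mid x)\,\Pr(x) = \sum_{x_1} \cdots \sum_{x_m} \; \prod_{i=1}^{m} \Pr(g_i \mid x_i) \prod_{k \,\in\, \text{founders}} \Pr(x_k) \prod_{c \,\in\, \text{non-founders}} \Pr\!\left(x_c \mid x_{\text{mother}(c)}, x_{\text{father}(c)}\right)$$
where x = (x1, …, xm) and g = (g1, …, gm). The three products are the penetrance, founder, and transmission probabilities.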

Computational Problem
Computing the likelihood requires a multiple sum over all possible genotype combinations for all members of the pedigree.
Solution: the Elston-Stewart algorithm provides a means of evaluating the multiple sum in a streamlined fashion for simple pedigrees, which makes the computation more efficient.

Simple Pedigree
No consanguineous marriages, i.e., marriages between blood-related individuals (no loops in the pedigree).
There is one pair of founders from which the whole pedigree is generated.

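“Peeling” Order
Assume that the individuals in the pedigree are ordered such that parents precede their children. Then the pedigree likelihood can be represented as:
$$P(x) = \sum_{g_1} P(x_1 \mid g_1)\,P(g_1 \mid \cdot) \sum_{g_2} P(x_2 \mid g_2)\,P(g_2 \mid \cdot) \cdots \sum_{g_m} P(x_m \mid g_m)\,P(g_m \mid \cdot)$$
where $P(g_i \mid \cdot)$ is P(gi) if i is a founder, or $P(g_i \mid g_{\text{fa}(i)}, g_{\text{mo}(i)})$ otherwise, with $g_{\text{fa}(i)}, g_{\text{mo}(i)}$ the genotypes of i’s parents. (Here gi is the genotype and xi the phenotype of individual i.)
In this way, we first sum over all possible genotypes of the children and only then over the possible genotypes of the parents.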
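
To see why summing over the children first helps, here is a sketch for a single nuclear family at one locus. It compares the full multiple sum with the peeled form, in which each child is summed out inside the loop over parental genotypes. The penetrance table (dominant disease, complete penetrance) and the phenotype labels are assumptions chosen for the example; this is not the full Elston-Stewart algorithm for a general pedigree.

```python
from itertools import product

ALLELE_FREQ = {"d": 0.05, "h": 0.95}
GENOTYPES = [("d", "d"), ("d", "h"), ("h", "h")]
PENETRANCE = {("d", "d"): 1.0, ("d", "h"): 1.0, ("h", "h"): 0.0}  # assumed model

def prior(g):
    """Hardy-Weinberg probability of an unordered founder genotype."""
    a, b = g
    p = ALLELE_FREQ[a] * ALLELE_FREQ[b]
    return p if a == b else 2 * p

def pen(phenotype, g):
    """P(phenotype | genotype); untyped people contribute a factor of 1."""
    if phenotype == "unknown":
        return 1.0
    return PENETRANCE[g] if phenotype == "affected" else 1.0 - PENETRANCE[g]

def trans(child, mother, father):
    """P(child genotype | parental genotypes), summing over both phases."""
    a, b = child
    gam = lambda allele, parent: parent.count(allele) / 2
    p = gam(a, mother) * gam(b, father)
    if a != b:
        p += gam(b, mother) * gam(a, father)
    return p

def likelihood_brute_force(ph_m, ph_f, ph_children):
    """Multiple sum over every joint genotype assignment (3^(2+k) terms)."""
    total = 0.0
    for gm, gf in product(GENOTYPES, repeat=2):
        for gcs in product(GENOTYPES, repeat=len(ph_children)):
            term = prior(gm) * pen(ph_m, gm) * prior(gf) * pen(ph_f, gf)
            for ph, gc in zip(ph_children, gcs):
                term *= pen(ph, gc) * trans(gc, gm, gf)
            total += term
    return total

def likelihood_peeled(ph_m, ph_f, ph_children):
    """Sum out each child separately, then the parents (9 * 3k inner terms)."""
    total = 0.0
    for gm, gf in product(GENOTYPES, repeat=2):
        term = prior(gm) * pen(ph_m, gm) * prior(gf) * pen(ph_f, gf)
        for ph in ph_children:
            term *= sum(pen(ph, gc) * trans(gc, gm, gf) for gc in GENOTYPES)
        total += term
    return total

kids = ["affected", "unaffected", "unknown"]
lik_bf = likelihood_brute_force("unaffected", "affected", kids)
lik_peel = likelihood_peeled("unaffected", "affected", kids)
```

Both functions return the same value; the peeled version does work linear in the number of children, while the brute-force version grows exponentially with it.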

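An Example for “Peeling” Order
[Figure: pedigree with individuals 1–7.]
h(gi) = P(xi | gi) P(gi)
h(gm, gf, gc) = P(xc | gc) P(gc | gm, gf)
According to the Elston-Stewart algorithm:
[Equation: the likelihood of the pictured pedigree written with these h terms.]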

Elston-Stewart “Peeling” Order
As can be seen, this “peeling” order “clips off” branches (sibships) of the pedigree one after the other, in a bottom-up order.
[Figure: the pedigree of individuals 1–7 at successive stages of peeling.]

Elston-Stewart – Computational Complexity
The computational complexity of the algorithm is linear in the number of people but exponential in the number of loci.
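
The exponential dependence on the number of loci comes from the size of each person's genotype space. A small sketch, assuming 4 alleles per locus (a hypothetical figure) and ordered genotypes, counts the multi-locus genotypes one individual can have:

```python
from math import prod

def ordered_genotype_count(alleles_per_locus):
    """Number of ordered multi-locus genotypes for one individual."""
    return prod(a * a for a in alleles_per_locus)

# Genotype-space size for 1 to 5 loci with 4 alleles each (assumed).
counts = [ordered_genotype_count([4] * n) for n in range(1, 6)]
```

Each additional locus multiplies the count by 16 in this setting, whereas each additional person only adds one more sum of that size to the peeling.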

[Figure: fragment of a network for individuals a, b, and c at two loci. Locus 1 variables: Ga[1,p], Ga[1,m], Gb[1,p], Gb[1,m], Gc[1,p], Gc[1,m], Sc[1,p], Sc[1,m], Pa[1], Pb[1], Pc[1]. Locus 2 variables: Ga[2,p], Ga[2,m], Gb[2,p], Gb[2,m], Gc[2,p], Gc[2,m], Sc[2,p], Sc[2,m], Pa[2], Pb[2], Pc[2].]
Let’s look at an example of the model that describes our likelihood function. We have 3 types of variables in this model.
Genetic loci variables (the orange nodes) represent genotypes. For each individual i and locus a we have 2 genetic loci variables that represent the alleles inherited from i’s father and mother at locus a.
Phenotype variables (yellow nodes): for each individual i and phenotype a we have a node that represents the value of phenotype a for individual i.
Selector variables (green nodes) denote the inheritance pattern in the pedigree. For each individual i and locus a we have 2 selector variables that indicate the haplotype of the parent from which i got his alleles. For example, the paternal selector of person c at locus 1 (Sc[1,p]) indicates whether c got his paternal allele at locus 1 from his father’s paternal haplotype or from his father’s maternal haplotype.
Here we can see a fragment of a network that describes the parents-child interaction in a 2-locus analysis. On the top are all the nodes that describe the 1st locus, and on the bottom all the nodes that describe the 2nd locus.
Each node is associated with a probability function of the node given its parents. All the probability functions are logical functions, except for those of the selector variables, which represent the recombination model.
In the same way as Sc[1,p], we have the variable Sc[2,p] for locus 2. If theta1 is the recombination fraction between these 2 loci, then the probability that these 2 selectors have different values is theta1, the probability that a recombination occurred, because it means that the 2 alleles came from different haplotypes.
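
The two kinds of probability functions described above can be sketched as small Python tables. The encoding is an assumption made for the example: selector value 0 means the parent's paternal haplotype and 1 the parent's maternal haplotype, the selector at the first locus is uniform, and theta1 = 0.1 is a hypothetical value.

```python
def selector_cpt(theta):
    """Pr(selector at locus 2 = cur | selector at locus 1 = prev)."""
    return {
        (prev, cur): (theta if prev != cur else 1.0 - theta)
        for prev in (0, 1)
        for cur in (0, 1)
    }

def child_allele(parent_paternal_allele, parent_maternal_allele, selector):
    """Logical function of a genetic loci variable: copy the selected allele."""
    return parent_paternal_allele if selector == 0 else parent_maternal_allele

theta1 = 0.1  # hypothetical recombination fraction between the two loci
cpt = selector_cpt(theta1)

# Probability that Sc[1,p] and Sc[2,p] differ, with a uniform first selector.
p_differ = 0.5 * cpt[(0, 1)] + 0.5 * cpt[(1, 0)]
```

The selector table is the only place where the recombination fraction enters; the allele function is deterministic given the parent's two alleles and the selector.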

Variation on the Elston-Stewart Algorithm in Fastlink
The pedigree traversal order in Fastlink is a modification of the Elston-Stewart algorithm. Assume no multiple marriages…
Nuclear family graph:
Vertices: each nuclear family is a vertex.
Edges: if some individual is a child in nuclear family x and a parent in nuclear family y, then x and y are connected by an edge x-y, which is called a “down” edge w.r.t. x and an “up” edge w.r.t. y.
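
A minimal sketch of building the nuclear family graph from a pedigree, following the definition above. The input format (a dict from child to a (father, mother) pair), the function name, and the small pedigree are assumptions for illustration and are not Fastlink's actual data structures.

```python
from collections import defaultdict

def nuclear_family_graph(parents):
    """Build the graph from `parents`, mapping child -> (father, mother).

    Founders do not appear as keys. Each nuclear family is identified by
    its (father, mother) couple.
    """
    families = defaultdict(list)
    for child, couple in parents.items():
        families[couple].append(child)

    down = defaultdict(set)  # down[x]: families where a child of x is a parent
    up = defaultdict(set)    # up[y]: families where a parent of y is a child
    for x, children in families.items():
        for y in families:
            if any(c in y for c in children):
                down[x].add(y)
                up[y].add(x)
    return dict(families), dict(down), dict(up)

# Hypothetical pedigree: 1 and 2 have children 3 and 4; 3 and 5 have 6 and 7.
parents = {3: (1, 2), 4: (1, 2), 6: (3, 5), 7: (3, 5)}
families, down, up = nuclear_family_graph(parents)
```

Individual 3 is a child in family (1, 2) and a parent in family (3, 5), so the edge between them is a down edge for (1, 2) and an up edge for (3, 5).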

Traversal Order
One individual A is chosen to be the “proband”. For each genotype g, the probability is computed that A has genotype g, conditioned on the known phenotypes for the rest of the pedigree and the assumed recombination fractions.
The first family visited is a family containing the proband, preferably a family in which he is a child.
Visit(w) {
    While w has an unvisited neighbor x reachable via an up edge:
        Visit(x);
    While w has an unvisited neighbor y reachable via a down edge:
        Visit(y);
    Update w;
}

Traversal Order - Updates
If nuclear family w is reached via a down edge from z, the parent in w that nuclear families w and z share is updated.
If nuclear family w is reached via an up edge from z, the child in w that w and z share is updated.
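
The Visit procedure and its update rule can be sketched as a recursive traversal that records the order in which families are updated. The adjacency-dict format, the family labels, and marking a family as visited on entry are assumptions made for the example; the numerical update itself is omitted.

```python
def traversal_order(start, up, down):
    """Return [(family, edge it was reached by)] in the order of updates.

    `up[w]` and `down[w]` list the neighbors of w reachable via up and down
    edges. A family reached via "down" updates the parent it shares with the
    caller; one reached via "up" updates the shared child. The starting
    family (edge None) is updated last.
    """
    visited = set()
    order = []

    def visit(w, via=None):
        visited.add(w)  # assumed: a family counts as visited on entry
        for x in sorted(up.get(w, ())):
            if x not in visited:
                visit(x, "up")
        for y in sorted(down.get(w, ())):
            if y not in visited:
                visit(y, "down")
        order.append((w, via))  # "Update w" happens after all neighbors

    visit(start)
    return order

# Hypothetical three-generation chain: A above B above C; the proband is in B.
up = {"B": ["A"], "C": ["B"]}
down = {"A": ["B"], "B": ["C"]}
order = traversal_order("B", up, down)
```

Because each family is updated only after its unvisited neighbors, information from the whole pedigree has been folded in by the time the proband's family is updated.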

Example 1
[Figure: an example pedigree and the corresponding nuclear family graph.]

Example 2
[Figure: an example pedigree and the corresponding nuclear family graph.]