2
Fast identification and statistical evaluation of segmental homologies in comparative maps
Peter Calabrese (1), Sugata Chakravarty (2) and Todd Vision (3)
(1) Department of Mathematics, University of Southern California; Departments of (2) Operations Research and (3) Biology, University of North Carolina at Chapel Hill
3
comparative maps
Spaghetti diagram: Livingstone et al (1999) Genetics 152:1183
Crop circle: Gale & Devos (1998) PNAS 95:1972
4
some terms
Feature: a gene or some other marker
Segment: a string or substring of features
Homology: descent from a common ancestor
Block: a pair of segments that are putatively homologous. These are what we seek!
5
local genome alignment
Consider each chromosome to be a string of features
Assign common letters to homologous features
Identify segments sharing multiple pairs of common letters
Differences from DNA/protein alignment
–high frequency of gaps relative to matches
–inversions may occur within the alignment
6
homology matrix
gene  1  2  3  4  5  6  7  8
1     -  0  0  0  1  0  0  0
2     0  -  0  0  0  1  0  0
3     0  0  -  0  0  0  1  0
4     0  0  0  -  0  0  0  1
5     1  0  0  0  -  0  0  0
6     0  1  0  0  0  -  0  0
7     0  0  1  0  0  0  -  0
8     0  0  0  1  0  0  0  -
1234 5678
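As a minimal sketch of how such a matrix arises, the snippet below assigns a common letter to homologous features and marks every pair of distinct positions that share a letter. The function name `homology_matrix` and the family labels a to d are hypothetical, chosen only so that genes 1-4 pair with genes 5-8 as in the matrix on this slide.

```python
def homology_matrix(families):
    """Return a square matrix: 1 where two distinct features share a
    family label, 0 otherwise, and None on the diagonal."""
    n = len(families)
    return [
        [None if i == j else int(families[i] == families[j]) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical labels: genes 1-4 and genes 5-8 carry the same four letters.
families = list("abcdabcd")
matrix = homology_matrix(families)

# "Dots" are the 1-entries, reported with 1-based gene numbers.
dots = [(i + 1, j + 1)
        for i in range(len(families))
        for j in range(len(families))
        if matrix[i][j] == 1]
print(dots)
```

The resulting dots lie on two diagonals, one on each side of the main diagonal, which is the signature of a duplicated segment within a single genome.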
7
duplication and multiplication
There is not necessarily a one-to-one alignment
8
genome rearrangements
Inversion
Reciprocal translocation
Homologous segments may be small
9
Non-homologous features may be abundant within homologous segments
Bancroft (2001) TIG 17, 89, after Ku et al (2000) PNAS 97, 9121
10
We must allow some non-colinearity in marker order between segmental homologs
11
homology matrix for Arabidopsis
12
going beyond eyeballing
LineUp – Hampson et al (2003)
–Designed for genetic maps with error
ADHoRe – Vandepoele et al (2002)
–Designed for unambiguous marker order data
Both provide automatic detection of blocks
For statistics, both employ permutation tests
–Computationally intensive
–p-values are approximate
13
FISH: Fast Identification of Segmental Homology
Block identification
–Dynamic programming provides speed and optimality guarantee
–Can be generalized to multiple alignments
Statistical assessment
–Null model of duplication and transposition
–Closed-form equation for calculating p-values (i.e. no permutation testing)
14
from homology matrix to graph
nodes
–represent dots in the homology matrix
15
from homology matrix to graph
nodes
–represent dots in the homology matrix
edges
–connect nodes with nearest neighbors
–are unidirectional
–have an associated distance
–must be shorter than some threshold
16
from homology matrix to graph
nodes
–represent dots in the homology matrix
edges
–connect nodes with nearest neighbors
–are unidirectional
–have an associated distance
–must be shorter than some threshold
paths
–traverse shortest available edges
–can be efficiently computed
–can be considered candidate blocks
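The graph construction can be sketched roughly as follows. This is an illustration under stated assumptions, not the FISH implementation: "unidirectional" is taken to mean that edges only point toward larger x, each node keeps a single edge to its nearest such neighbor, distance is Manhattan, and the names `build_edges` and `d_max` are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two matrix cells given as (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def build_edges(nodes, d_max):
    """Map each node to (nearest neighbor, distance).

    Assumptions: edges point only toward larger x, and an edge exists
    only if the distance does not exceed the threshold d_max.
    """
    edges = {}
    for a in nodes:
        candidates = [b for b in nodes
                      if b[0] > a[0] and manhattan(a, b) <= d_max]
        if candidates:
            # Ties on distance are broken by coordinate order.
            nearest = min(candidates, key=lambda b: (manhattan(a, b), b))
            edges[a] = (nearest, manhattan(a, nearest))
    return edges

# Four colinear dots plus one isolated dot (made-up coordinates).
nodes = [(1, 5), (2, 6), (3, 7), (4, 8), (7, 1)]
edges = build_edges(nodes, d_max=3)
for source, (target, distance) in sorted(edges.items()):
    print(source, "->", target, "distance", distance)
```

Following the edges from (1, 5) gives a path through the four colinear dots, which is the kind of path treated as a candidate block; the isolated dot at (7, 1) has no edge within the threshold.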
17
null model
Within a genome: homologies are due to the duplication of individual features followed by insertion into a (uniformly) random position
Between genomes: homologies are due to the above process plus the transposition of randomly chosen features into randomly chosen positions
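A small simulation can make the within-genome part of the null model concrete. The sketch below assumes that the feature to be duplicated is drawn uniformly from the features currently present and that the copy is inserted at a uniformly random position; the function name and parameters are hypothetical.

```python
import random

def simulate_duplications(n_features, n_duplications, seed=0):
    """Start from features 0..n_features-1 in order, then repeatedly
    duplicate one feature and insert the copy at a random position."""
    rng = random.Random(seed)
    genome = list(range(n_features))
    for _ in range(n_duplications):
        copy = rng.choice(genome)               # feature to duplicate (assumed uniform)
        position = rng.randint(0, len(genome))  # insertion point, both ends allowed
        genome.insert(position, copy)
    return genome

genome = simulate_duplications(n_features=10, n_duplications=2)
print(genome)
```

Features that appear more than once in the result are homologous to each other, so each repeated label contributes dots to the homology matrix without any underlying segmental duplication.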
18
computing neighborhood size
$h$ = # nodes / # cells in matrix
$n$ = # cells in neighborhood
Prob(neighborhood has at least 1 node): $p = 1 - (1-h)^n$
Threshold distance for $p = T$ under Manhattan distance ($\Delta x + \Delta y$): $d_T = 0.5 + \sqrt{\frac{\log(1-p)}{\log(1-h)} + 0.25}$
T is analogous to a gap parameter
–small T: few false positive edges, short blocks
–large T: more false positive edges, longer blocks
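The threshold calculation can be checked numerically. The sketch below inverts p = 1 - (1-h)^n to get the neighborhood size n, then applies the threshold-distance formula from this slide; the values of h and T are made up for illustration, and the function names are hypothetical.

```python
import math

def neighborhood_cells(h, p):
    """Solve p = 1 - (1 - h)**n for n, the number of cells in the neighborhood."""
    return math.log(1 - p) / math.log(1 - h)

def threshold_distance(h, p):
    """Threshold Manhattan distance d_T = 0.5 + sqrt(n + 0.25)."""
    n = neighborhood_cells(h, p)
    return 0.5 + math.sqrt(n + 0.25)

h = 0.001   # assumed node density: 1 dot per 1000 cells
T = 0.05    # assumed probability that a neighborhood holds a chance node
n = neighborhood_cells(h, T)
d_T = threshold_distance(h, T)
print(f"n = {n:.2f} cells, d_T = {d_T:.2f}")
```

Rearranging the formula gives n = d_T (d_T - 1), so the neighborhood grows roughly with the square of the threshold distance.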
19
neighborhood geometry
20
blocks of nearest neighbors
21
block statistics
Chen-Stein Theorem: number of blocks with $k$ nodes is approximately Poisson
Expected number of blocks = $c\,p_u$
Conservative matrix-wide p-value: $\Pr(X \ge k) < 1 - e^{-c\,p_u}$
where $c$ is the # of cells in the matrix and $p_u = h(nh)^{k-1}$
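The closed-form bound is cheap to evaluate. The sketch below computes p_u = h(nh)^(k-1), the expected number of blocks c p_u, and the bound 1 - e^(-c p_u); the input values are invented for illustration and the function name is hypothetical.

```python
import math

def block_p_value_bound(c, h, n, k):
    """Return (expected number of blocks, matrix-wide p-value bound)
    for blocks of k nodes, using p_u = h * (n * h)**(k - 1)."""
    p_u = h * (n * h) ** (k - 1)
    expected = c * p_u
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    bound = -math.expm1(-expected)
    return expected, bound

# Assumed inputs: a 1000 x 1000 matrix, density 0.001, 50-cell neighborhood.
expected, bound = block_p_value_bound(c=1_000_000, h=0.001, n=50, k=5)
print(f"expected blocks = {expected:.5f}, p-value bound = {bound:.5f}")
```

Each additional node multiplies p_u by nh, so the bound falls off geometrically with block size whenever nh is below 1.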
22
identifying blocks
Let edge from $i$ to $j$ have weight $w_{ij} = 1$
Initialize: score of block terminating at $i$, $S_i = 0$
Recursion for block scores: $S_j = \max_{i \,:\, j \in T_i} (S_i + w_{ij})$
Dynamic programming can be used to find all maximally extended blocks
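The recursion can be sketched as a single pass over the nodes in sorted order. The code below is a minimal illustration, not the FISH source: it assumes every edge points toward larger x (so sorting by coordinate gives a valid processing order), takes the edges as a ready-made `successors` dictionary, and treats a block as "maximally extended" when its last node has no outgoing edge. All names are hypothetical.

```python
def block_scores(nodes, successors):
    """Dynamic programming: S_j = max over predecessors i of (S_i + 1).

    successors maps a node to the nodes reachable by one edge.
    Returns the scores and back-pointers for traceback.
    """
    order = sorted(nodes)  # valid order because edges go to larger x
    score = {v: 0 for v in order}
    back = {v: None for v in order}
    for i in order:
        for j in successors.get(i, []):
            if score[i] + 1 > score[j]:
                score[j] = score[i] + 1
                back[j] = i
    return score, back

def maximal_blocks(nodes, successors):
    """Trace back from every node that ends a path of at least one edge."""
    score, back = block_scores(nodes, successors)
    has_successor = {i for i, js in successors.items() if js}
    blocks = []
    for end in sorted(nodes):
        if end in has_successor or score[end] == 0:
            continue
        path = [end]
        while back[path[-1]] is not None:
            path.append(back[path[-1]])
        blocks.append(path[::-1])
    return blocks

nodes = [(1, 5), (2, 6), (3, 7), (4, 8), (7, 1)]
successors = {(1, 5): [(2, 6)], (2, 6): [(3, 7)], (3, 7): [(4, 8)]}
blocks = maximal_blocks(nodes, successors)
print(blocks)
```

Because every edge has weight 1, a block with score S contains S + 1 nodes, and each node and edge is visited once.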
23
simulation experiment
How often are blocks of size k observed under the null model compared with expectation?
k   obs      stderr   upbound  lowbound
2   45.8     0.06     47.6     40.1
3   2.28     0.02     2.39     1.78
4   0.113    0.003    0.120    0.079
5   0.006    0.001    0.006    0.004
6   0.0003   0.0002   0.0003   0.0002
24
FISH v.1.0
http://www.bio.unc.edu/faculty/vision/lab/FISH
–source code
–compiled executables
–documentation
–sample data
Adjustable parameters (e.g. T)
Reports statistics on blocks
Is fast and memory-efficient
25
applications
Automated pairwise alignment of genome maps as part of Phytome project
Prediction of gene content in regions of unsequenced genomes
Studies of genome evolution, especially duplication and gene order rearrangement
26
future work
Biologically motivated neighborhood geometries (ADHoRe)
Non-discrete marker positions (LineUp)
–Genetic versus physical maps
–Map uncertainty
Robustness to deviation from null model (permutation tests)
Extension to homologies among 3 or more segments
27
Thanks!
Sugata Chakravarty
Peter Calabrese
U.S. National Science Foundation
29
http://www.bio.unc.edu/vision/faculty/lab/FISH
30
Ghosts
Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627