FISH Fast Identification of Segmental Homology University of North Carolina at Chapel Hill Shian-Gro Wu
Outline Introduction Input data How it works –From markers to features –From features to grid –From grid to blocks
Introduction FISH is software for the fast identification and statistical evaluation of segmental homologs. [Figure labels: genome, contig, gene (marker)]
Introduction [Figure: pipeline overview for contigA and contigB, in four stages: markers, features, points, blocks]
Input data Each map file lists the names and transcriptional orientation (if known) of all the markers on one contig. Example columns: gene name, transcriptional orientation; each row is one marker. [Example rows truncated in the source: gene names beginning At1g]
Input data Each match file lists all the homologies between markers in a pair of contigs. Example columns: gene name, gene name, match score. [Example rows truncated in the source: each row pairs At1g01010 with a gene name beginning At1g]
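The two input files described above could be read along these lines. This is only a sketch under stated assumptions: the slides do not give the exact file syntax, so whitespace-separated columns, the `+`/`-` orientation symbols, and the names `Marker`, `Match`, `parse_map` and `parse_matches` are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Marker(NamedTuple):
    name: str
    orientation: Optional[str]  # None when the orientation is unknown


class Match(NamedTuple):
    name_a: str
    name_b: str
    score: float


def parse_map(text: str) -> List[Marker]:
    """Read one marker per line: name, then an optional orientation (assumed layout)."""
    markers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        orientation = fields[1] if len(fields) > 1 else None
        markers.append(Marker(fields[0], orientation))
    return markers  # file order is taken as physical order on the contig


def parse_matches(text: str) -> List[Match]:
    """Read one homology per line: two marker names and a match score (assumed layout)."""
    matches = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        matches.append(Match(fields[0], fields[1], float(fields[2])))
    return matches
```

The marker names used with these helpers (for example `m1`, `x1`) would be placeholders; real files carry gene names such as At1g01010.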
From markers to features Step 1 –Positions and transcriptional orientations (when known) of the markers are read from a set of map files, one map file per contig. Markers within each map file must be ordered according to their physical positions on the contig. –Individual homologies between markers are read from a set of match files. There is at least one, and no more than two, such files for each pair of contigs. Example: for contigs A, B, C the pairs are A&A, A&B, A&C, B&A, B&B, …
From markers to features Step 2 –FISH performs detandemization, in which multiple markers may be collapsed into single features. –Two thresholds are involved: MIN Score and MAX Dist. [Figure: markers a b c d e f g h become features A B (B) C D (C) E F]
From markers to features Conditions illustrated: 1. ScoreAB > MIN Score. 2. ScoreAC > MIN Score and ScoreBC > MIN Score. [Figure: markers A, B and C with pairwise scores ScoreAB, ScoreAC and ScoreBC, and the MAX Dist range]
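One way to read the two conditions above is sketched below. It is an interpretation, not FISH's actual procedure: it assumes that two markers on the same contig are collapsed when they lie within MAX Dist of each other (measured here in marker positions, which is itself an assumption) and either match each other above MIN Score or both match a common third marker above MIN Score. The function name `detandemize` and the score dictionary layout are hypothetical.

```python
def detandemize(markers, scores, min_score, max_dist):
    """Return one feature label per marker, numbered in order of first appearance.

    markers: marker names in physical order on one contig.
    scores:  dict mapping frozenset({name1, name2}) to a match score.
    """
    parent = list(range(len(markers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def score(a, b):
        return scores.get(frozenset((a, b)), 0.0)

    # Every marker that takes part in at least one match.
    partners = {name for pair in scores for name in pair}

    for i in range(len(markers)):
        last = min(i + max_dist, len(markers) - 1)
        for j in range(i + 1, last + 1):
            a, b = markers[i], markers[j]
            direct = score(a, b) > min_score            # condition 1
            shared = any(                                # condition 2
                score(a, c) > min_score and score(b, c) > min_score
                for c in partners
                if c not in (a, b)
            )
            if direct or shared:
                parent[find(j)] = find(i)               # collapse into one feature

    labels = {}
    result = []
    for i in range(len(markers)):
        root = find(i)
        labels.setdefault(root, len(labels))
        result.append(labels[root])
    return result
```

With markers a to h, direct matches between b and c and between d and f, and a MAX Dist of 2 positions, the labels come out as 0 1 1 2 3 2 4 5, the same pattern as A B (B) C D (C) E F in the figure.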
From features to grid In order to identify segmental homologies, FISH computes a grid for each pair of contigs. Points in the grid represent matches between pairs of features. [Figure: grid of contigA features fA1 to fA4 against contigB features fB1 to fB4, with example points A1B2 and B2A4]
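The grid described above can be pictured as a set of coordinate pairs. The sketch below is illustrative only: the function name `grid_points`, the 1-based positions, and the feature names are assumptions chosen to mirror the labels in the figure.

```python
def grid_points(features_a, features_b, matches):
    """Turn feature-level matches into grid points (position on A, position on B).

    features_a, features_b: feature names in physical order on each contig.
    matches: iterable of (feature on A, feature on B) pairs.
    """
    # 1-based positions, so that fA1 sits at position 1 as in the figure.
    position_a = {name: i for i, name in enumerate(features_a, start=1)}
    position_b = {name: i for i, name in enumerate(features_b, start=1)}
    points = set()
    for name_a, name_b in matches:
        if name_a in position_a and name_b in position_b:
            points.add((position_a[name_a], position_b[name_b]))
    return points


features_a = ["fA1", "fA2", "fA3", "fA4"]
features_b = ["fB1", "fB2", "fB3", "fB4"]
points = grid_points(features_a, features_b, [("fA1", "fB2"), ("fA4", "fB2")])
```

A set is used so that repeated matches between the same two features yield a single point.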
From features to grid Each position in the grid, whether or not a point is present, is called a cell. For two different contigs: cells(contigA, contigB) = features(contigA) * features(contigB). For a contig compared with itself: cells(contigC, contigC) = features(contigC) * [features(contigC) - 1] / 2. [Figure: grids for A against B and for C against C]
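As a quick check of the two cell-count formulas, here is a small sketch; the function name `cell_count` and the example feature counts are made up for illustration.

```python
def cell_count(n_features_a, n_features_b=None):
    """Number of cells in a grid.

    Two arguments: grid for two different contigs.
    One argument: grid for a contig compared with itself, where only
    distinct unordered pairs of features are counted.
    """
    if n_features_b is None:
        return n_features_a * (n_features_a - 1) // 2
    return n_features_a * n_features_b


between = cell_count(4, 5)  # 4 * 5 = 20 cells
within = cell_count(4)      # 4 * 3 / 2 = 6 cells
```

The self-comparison count is smaller because a feature is not paired with itself and each pair is counted once.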
From features to grid [Table: number of markers and features per contig, and number of points and cells for contig1 against contig2; values not recoverable from the source]
From grid to blocks Defining the neighborhood size –FISH measures the distance between two points (X_i, Y_i) and (X_j, Y_j) using the Manhattan distance, |X_i - X_j| + |Y_i - Y_j|. –In order to be considered neighbors, two points must be closer than a threshold d_T [formula missing in the source], where m is the number of points and n is the number of cells.
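The neighbor test can be sketched as follows. The threshold is passed in as a plain parameter `d_t` because its formula is not given here; the function names `manhattan` and `neighbors` are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two grid points given as (x, y) tuples."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def neighbors(point, points, d_t):
    """All other points strictly closer to `point` than the threshold d_t."""
    return [q for q in points if q != point and manhattan(point, q) < d_t]


example_distance = manhattan((1, 2), (4, 6))              # |1 - 4| + |2 - 6| = 7
close_by = neighbors((1, 1), [(1, 1), (2, 2), (5, 5)], 3)  # only (2, 2) is within 3
```

The comparison is strict because the slide says points must be closer than the threshold.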
From grid to blocks [Figure: d_T plotted against m/n for T = 0.05, where m is the number of points and n is the number of cells]
Result
From grid to blocks Choosing among multiple neighbors –A point may be in the neighborhood of more than one other point. –FISH ranks the cells within each neighborhood and chooses the neighbor with the highest rank [ranking formula missing in the source], where n is the number of cells in the point's neighborhood, d_c is the distance of the cell from the point under consideration, and w is the weight.
Reference –User's Manual for Fast Identification of Segmental Homology, http:// –Fast identification and statistical evaluation of segmental homologies in comparative maps, abstract/19/suppl_1/i74
Thank You