Whole Genome Comparison
Kelley Crouse and Greg Matuszek
Objective
- Implement a parallel program for genome and chromosome comparisons
Background
- MUMmer: a serial implementation using a suffix tree
- A parallel implementation using a variant of the Smith-Waterman local alignment algorithm
Disadvantages
- Neither implementation handles larger genomes and chromosomes quickly
- The parallel version is hindered by its data structure
How we plan to implement
- A suffix tree will be built from the first sequence
- The second sequence will be fragmented, and the fragments will be sent out to the workers
- Each worker will compare its fragment against the suffix tree and report the location(s) of similarity back to the farmer
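The farmer/worker plan above can be sketched roughly as follows. This is only an illustration under several assumptions: the names `farmer`, `worker`, and `find_all` are hypothetical, Python's built-in substring search stands in for the suffix-tree lookup, only exact matches are reported, and a thread pool stands in for the real worker processes (an actual implementation would more likely use separate processes or message passing).

```python
from concurrent.futures import ThreadPoolExecutor


def find_all(reference, fragment):
    """Return every start position of fragment in reference.

    Stand-in for the suffix-tree lookup: uses str.find, exact matches only.
    """
    positions = []
    start = reference.find(fragment)
    while start != -1:
        positions.append(start)
        start = reference.find(fragment, start + 1)
    return positions


def worker(reference, offset, fragment):
    """Compare one fragment against the reference and report back."""
    return offset, find_all(reference, fragment)


def farmer(reference, query, fragment_size, n_workers=4):
    """Fragment the query, hand fragments to workers, collect their reports."""
    fragments = [
        (i, query[i:i + fragment_size])
        for i in range(0, len(query), fragment_size)
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(worker, reference, offset, fragment)
            for offset, fragment in fragments
        ]
        reports = [future.result() for future in futures]
    # Map each fragment's offset in the query to its hit positions
    # in the reference, keeping only fragments that matched somewhere.
    return {offset: hits for offset, hits in reports if hits}
```

For example, `farmer("ACGTTGCAACGT", "ACGTGGGG", 4)` splits the query into `ACGT` and `GGGG`; only the first fragment occurs in the reference.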
What is a Suffix Tree?
- The tree represents all suffixes of a given string
- It is used to search for a sub-string within a string
- By comparing a test string T against the suffix tree of a string S, it is possible to locate all matching regions between the two strings
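To make the idea concrete, here is a minimal sketch of a suffix trie, the uncompressed relative of a suffix tree. It is a simplification chosen for brevity: construction takes quadratic time and space, whereas a real suffix tree compresses single-child paths and can be built in linear time. The names `Node`, `build_suffix_trie`, and `find_all` are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # next character -> child Node
        self.starts = []    # start positions in S of suffixes passing through here


def build_suffix_trie(s):
    """Insert every suffix of s, character by character, into a trie."""
    root = Node()
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(i)
    return root


def find_all(root, t):
    """Return the positions in S where the test string t occurs."""
    node = root
    for ch in t:
        node = node.children.get(ch)
        if node is None:
            return []  # t falls off the trie, so it is not a sub-string of S
    return sorted(node.starts)
```

Searching for T walks one path from the root, so the lookup cost depends on the length of T rather than on the length of S.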
Suffix Tree - Bananas
- Each suffix of “Bananas” is represented within the suffix tree
- A test sub-string T can be compared to “Bananas” by following the paths from the root toward each leaf
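The property the tree relies on can be checked directly without building it: a sub-string occurs in the word exactly where it is a prefix of one of the word's suffixes. The sketch below lists the suffixes of "bananas" (lowercased here for simplicity) and tests a sub-string against each; the helper names `suffixes` and `occurrences` are hypothetical.

```python
def suffixes(s):
    """Return (start position, suffix) for every suffix of s."""
    return [(i, s[i:]) for i in range(len(s))]


def occurrences(s, t):
    """Positions where t is a prefix of a suffix of s, i.e. where t occurs in s."""
    return [i for i, suffix in suffixes(s) if suffix.startswith(t)]


for position, suffix in suffixes("bananas"):
    print(position, suffix)

print(occurrences("bananas", "ana"))  # [1, 3]: prefixes of "ananas" and "anas"
```

A suffix tree stores these seven suffixes along shared paths, so the same check becomes a single walk down from the root.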
Fragmenting the Second Sequence
- Random fragmenting
  - Difficult to assemble alignment
  - Allows for small and large fragments
- Specific length fragments
  - Restricted to one fragment size
  - Alignment is easier to assemble
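The two fragmenting strategies can be compared with a short sketch. The function names, the `(offset, fragment)` return shape, and the size bounds are assumptions made for illustration. Both versions cut the sequence into non-overlapping pieces and record each piece's offset, since the offset is what lets the alignment be reassembled later.

```python
import random


def fixed_fragments(seq, size):
    """Cut seq into consecutive fragments of one fixed size (the last may be shorter)."""
    return [(i, seq[i:i + size]) for i in range(0, len(seq), size)]


def random_fragments(seq, min_size, max_size, rng=None):
    """Cut seq into consecutive fragments whose sizes are drawn at random."""
    rng = rng or random.Random()
    fragments = []
    i = 0
    while i < len(seq):
        size = rng.randint(min_size, max_size)  # inclusive on both ends
        fragments.append((i, seq[i:i + size]))
        i += size
    return fragments
```

With fixed sizes, the offset of fragment number k is simply k times the size, which is one reason assembly is easier; with random sizes, the offsets have to be carried along with every fragment.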
What we hope to gain
- The ability to identify conserved regions between genomes (and chromosomes)
- The ability to compare large genomes and chromosomes quickly and accurately