Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms


Yifeng Zheng, Stephen Fisher, Shirley Cohen, Sheng Guo, Junhyong Kim and Susan B. Davidson

Background

Phylogenetics: the science of identifying and understanding evolutionary relationships between different species.

Cyberinfrastructure for Phylogenetic Research project (CIPRes)
– Design efficient data storage and query capabilities for managing phylogenetic trees.
– Evaluate existing phylogenetic tree reconstruction algorithms.
  Building "gold standards" by simulating very large phylogenetic trees, as well as sequences for each species in the tree, according to models that are carefully curated by experts.
– ...

The Crimson system focuses on providing data management support for CIPRes simulation.

Technical Challenges

Phylogenetic trees may contain millions of species, each associated with sequences of thousands of characters. Managing and querying this data efficiently is important.

Data management strategies developed for XML are not suitable for phylogenetic tree management.
– Unlike the XML documents used in web and commercial applications, which are relatively shallow, phylogenetic trees can be very deep. According to a survey of 200,000 XML documents by Mignet, Barbosa and Veltri in WWW 2003, the average depth of XML was reported to be 4 and the deepest was 135. Simulated phylogenetic trees have an average depth greater than 1,000, and the deepest can exceed 1 million.
– Queries used with phylogenetic trees are also very different from the path-oriented or restructuring queries supported by XPath and XQuery.

Our Solution

– Data storage and index strategy: an extension of the Dewey labeling scheme.
– A query evaluation algorithm that achieves high performance.
– A user-friendly data management system: the Crimson system.
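The Dewey labeling scheme named under Our Solution can be illustrated with a minimal sketch. This shows only plain Dewey labeling, not Crimson's extension, whose details are not given here. The tree encoding (a dict from node name to child list) and the function names `dewey_labels` and `common_prefix` are assumptions made for illustration.

```python
def dewey_labels(children, root):
    """Plain Dewey labels: the i-th child of a node labelled p gets p + (i,)."""
    labels = {root: ()}
    stack = [root]
    while stack:  # explicit stack, since phylogenetic trees can be very deep
        node = stack.pop()
        for i, child in enumerate(children.get(node, []), start=1):
            labels[child] = labels[node] + (i,)
            stack.append(child)
    return labels


def common_prefix(a, b):
    """Longest common prefix of two labels, i.e. the label of the
    least common ancestor of the two labelled nodes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


# Hypothetical toy tree: root has children x and C; x has children A and B.
children = {"root": ["x", "C"], "x": ["A", "B"]}
labels = dewey_labels(children, "root")
lca_of_a_and_b = common_prefix(labels["A"], labels["B"])  # label of x
```

In plain Dewey labeling a label's length equals the node's depth, so labels become very long on trees whose depth reaches the thousands or more. That is one way to see why a scheme sized for shallow XML documents needs extending for simulation trees.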
System Architecture

[Figure: system architecture diagram. Labels, in extraction order: Tree Projector Sampling Benchmark Manager Projection Tree Repository Species Repository Query Repository Manager GUI Manager Input Query Tree Viewer Query History Data Loader Simulation Tree Sampling Species with Sequences Sampling Strategy Phylogenetic Trees]

Phylogenetic Queries

The phylogenetic reconstruction problem is NP-hard, so current algorithms can only handle a relatively small input set. To benchmark these reconstruction algorithms, we must therefore be able to efficiently sample a subset of species according to various criteria, and project the tree pattern induced by the sample in the simulation tree.

– Sampling a set of species according to a given time
  Guarantees that the sampling results are derived from an evolutionary time period. Given a tree T with edge weights representing time, sampling a set of species according to a given time t returns a subset of T's leaf set such that all species whose evaluation time (the weighted distance from the root to the species) is t have the same number of descendant species sampled.

– Tree projection
  Determines the relationship among a set of species by appealing to an authoritative tree. Given a tree T and a subset S of its leaves, the tree projection of T over S is a "subtree" T' in which each edge is a subpath of a path from the root of T to a node in S and each internal node has at least two children.

[Figure: Extended Dewey Labeling]

[Figure: Performance Results]

References:
Cyberinfrastructure for Phylogenetic Research (CIPRES) project (www.phylo.org)
Susan B. Davidson, Junhyong Kim, Yifeng Zheng: Efficiently Supporting Structure Queries on Phylogenetic Trees. SSDBM 2005: 93-102
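The tree projection definition above can be made concrete with a small sketch. It is a straightforward bottom-up traversal written for illustration, not the label-based query evaluation algorithm Crimson uses. The tree encoding, the function name `project`, and the nested `(name, subtrees)` result format are all assumptions, and edge weights are left out.

```python
def project(children, root, sample):
    """Project the tree over the sampled leaves.

    Returns a nested (name, [subtrees]) tuple, or None when no sampled
    leaf lies below `root`. Nodes left with a single surviving child are
    suppressed, so every internal node of the result has >= 2 children.
    """
    sample = set(sample)
    # Pre-order with an explicit stack; reversed, it visits children first.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    projected = {}
    for node in reversed(order):
        kids = children.get(node, [])
        if not kids:  # leaf: keep it only if it was sampled
            projected[node] = (node, []) if node in sample else None
            continue
        kept = [projected[k] for k in kids if projected[k] is not None]
        if len(kept) >= 2:
            projected[node] = (node, kept)
        elif len(kept) == 1:
            projected[node] = kept[0]  # unary node: merge it into its child
        else:
            projected[node] = None
    return projected[root]


# Hypothetical tree with leaves A..E and internal nodes r, a, b, c.
children = {"r": ["a", "b"], "a": ["A", "B"], "b": ["C", "c"], "c": ["D", "E"]}
result = project(children, "r", {"A", "C", "D"})
# result == ("r", [("A", []), ("b", [("C", []), ("D", [])])])
```

Here nodes a and c each retain only one sampled descendant, so they disappear and their edges merge into the edge above them. In a weighted tree the merged edge would presumably carry the sum of the original weights; the sketch does not model that.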

