Zach Miller Computer Sciences Department University of Wisconsin-Madison Bioinformatics Applications and Workloads
Collaboration with the BMRB The BioMagResBank is a repository for data from NMR spectroscopy on proteins. Two main efforts: - Weekly BLAST run - Protein Structure Determination
BLAST A framework written in Perl completely automates the process: - Requires no previous setup - Downloads and installs BLAST - Retrieves and formats all databases - Retrieves input queries from a URL
BLAST - Input can be in .tar, .zip, .gz, or .Z files - Automatically splits input - Creates Condor jobs and a .dag file - Is very fault tolerant because DAGMan oversees the run - When all results are complete, it packages the results and log files
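The splitting and .dag generation steps above could be sketched roughly as follows. This is an illustrative Python sketch, not the Perl framework itself. It assumes the queries are in FASTA format, and the submit-file names `blast.sub` and `package.sub`, the node names, and the `chunk` variable are all hypothetical. Only the `JOB`, `VARS`, `RETRY`, and `PARENT ... CHILD` keywords are standard DAGMan syntax.

```python
def split_fasta(text, per_chunk):
    """Split FASTA text into chunks of at most per_chunk sequences each."""
    records, current = [], []
    for line in text.splitlines():
        # A header line starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return ["\n".join(records[i:i + per_chunk]) + "\n"
            for i in range(0, len(records), per_chunk)]


def dag_text(n_chunks, retries=3):
    """Build DAGMan input: one node per chunk plus a final packaging node."""
    names = [f"blast{i}" for i in range(n_chunks)]
    lines = []
    for i, name in enumerate(names):
        lines.append(f"JOB {name} blast.sub")            # hypothetical submit file
        lines.append(f'VARS {name} chunk="chunk{i}.fa"')  # hypothetical variable
        lines.append(f"RETRY {name} {retries}")
    lines.append("JOB package package.sub")              # hypothetical submit file
    if names:
        # Packaging runs only after every BLAST node has succeeded.
        lines.append(f"PARENT {' '.join(names)} CHILD package")
    return "\n".join(lines) + "\n"
```

Each chunk would be written to its own file and the generated text saved as the .dag file that DAGMan is given. Making packaging a child of every BLAST node is one way to get the "package when all results are complete" behavior.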
BLAST - Resulting tarballs can be capped at a configurable maximum size for more reliable transfer - After the tarballs are created, they are automatically sent to an FTP server
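One way to keep tarballs under a size limit is to group result files greedily before archiving. The Python sketch below illustrates the idea under stated assumptions: the function names and the `results` prefix are hypothetical, and the cap is applied to the summed input file sizes, which only approximates the size of the finished compressed tarball.

```python
import os
import tarfile


def plan_batches(sizes, max_bytes):
    """Greedily group (name, size) pairs so each group's total stays <= max_bytes.

    A single file larger than max_bytes ends up in a group of its own.
    """
    batches, current, total = [], [], 0
    for name, size in sizes:
        # Start a new batch when adding this file would exceed the cap.
        if current and total + size > max_bytes:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches


def write_tarballs(paths, max_bytes, prefix="results"):
    """Write <prefix>_000.tar.gz, <prefix>_001.tar.gz, ... and return their names."""
    sizes = [(p, os.path.getsize(p)) for p in paths]
    names = []
    for i, batch in enumerate(plan_batches(sizes, max_bytes)):
        name = f"{prefix}_{i:03d}.tar.gz"
        with tarfile.open(name, "w:gz") as tar:
            for path in batch:
                tar.add(path, arcname=os.path.basename(path))
        names.append(name)
    return names
```

The planning step is kept separate from the archiving step so the grouping can be checked without touching the filesystem.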
BLAST - We’ve been doing the run every week for about a year with almost no human intervention - Very easy to add new databases or sets of input sequences!
Protein Structure - Collaboration with Jurgen Doreleijers of the BMRB and Aart Nederveen from Utrecht University in the Netherlands - Recalculated the structures of over 500 proteins using state-of-the-art techniques - The applications used were CNS and CYANA
Protein Structure - DAGMan used to manage workflow and to provide fault-tolerance. - Using periodic_remove in the submit file to keep the job from “misbehaving” combines nicely with DAGMan’s RETRY feature.
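The combination described above can be illustrated with a small Python sketch that generates a submit file and the matching DAG node lines. The executable name, the `protein` variable, the log file name, and the time limit are hypothetical, and the `periodic_remove` expression shown is just one common idiom (remove a job that has been running too long), not necessarily the one used in this work. When the job is removed, DAGMan treats the node as failed and `RETRY` resubmits it.

```python
def submit_text(executable, max_run_seconds):
    """Return submit-file text that removes a job stuck in the running state."""
    return "\n".join([
        "universe = vanilla",
        f"executable = {executable}",   # hypothetical wrapper script
        "arguments = $(protein)",       # filled in by VARS in the DAG
        "log = structure.log",
        # JobStatus == 2 means running; remove the job once it has been
        # in that state longer than the allowed wall-clock time.
        "periodic_remove = (JobStatus == 2) && "
        f"((time() - EnteredCurrentStatus) > {max_run_seconds})",
        "queue",
    ]) + "\n"


def dag_node(name, submit_file, protein, retries):
    """Return the DAG lines for one protein, retried up to `retries` times."""
    return [
        f"JOB {name} {submit_file}",
        f'VARS {name} protein="{protein}"',
        f"RETRY {name} {retries}",
    ]
```

With this pairing, the submit file decides when a single attempt has gone wrong, while the DAG decides how many attempts each protein gets.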
Protein Structure - The effort used about hours of compute time - We accomplished the run in about 60 hours of real time - The framework I created lets you compute the structures of as many proteins as you like, making the process easy, automatic, and repeatable.
Protein Structure - Groups often use different parameters and protocols in structure determination and only calculate a few structures - Comparing structures from different groups is then difficult
Protein Structure - Our work was significant because it computed not just a few but over 500 structures - All were computed with the same parameters, making the results very internally consistent (besides being more accurate on their own due to the state-of-the-art techniques)
Web Portal - Currently supports only BLAST - Being used by a handful of users from the biochem department at the UW - Interest is growing, so we’ll soon be adding more applications
Questions? Thank You!