Zach Miller Computer Sciences Department University of Wisconsin-Madison Bioinformatics Applications.


Zach Miller Computer Sciences Department University of Wisconsin-Madison Bioinformatics Applications and Workloads

Collaboration with the BMRB The BioMagResBank is a repository for data from NMR spectroscopy on proteins. Two main efforts: - Weekly BLAST run - Protein Structure Determination

BLAST - A framework written in Perl completely automates the process: - Requires no previous setup - Downloads and installs BLAST - Retrieves and formats all DBs - Retrieves input queries from a URL

BLAST - Input can be in .tar, .zip, .gz, or .Z files - Automatically splits input - Creates Condor jobs and a .dag file - Is very fault tolerant by using DAGMan to oversee the run - When all results are complete, it packages the results and log files
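The split-and-describe step above can be sketched as follows. This is a minimal illustration in Python, not the Perl framework from the talk: the function names, the `blastN.sub` and `package.sub` file names, and the chunk size are all hypothetical. It splits FASTA input into fixed-size chunks and writes a DAGMan input file in which a final packaging node runs only after every BLAST node has finished.

```python
from typing import List


def split_fasta(text: str, per_chunk: int) -> List[str]:
    """Split FASTA text into chunks of at most per_chunk sequences each."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A header line (">...") starts a new record.
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return [
        "\n".join(records[i:i + per_chunk]) + "\n"
        for i in range(0, len(records), per_chunk)
    ]


def dag_text(n_chunks: int) -> str:
    """Build a DAGMan input file: one node per chunk plus a packaging node."""
    # Node and submit-file names here are hypothetical placeholders.
    lines = [f"JOB blast{i} blast{i}.sub" for i in range(n_chunks)]
    lines.append("JOB package package.sub")
    if n_chunks > 0:
        parents = " ".join(f"blast{i}" for i in range(n_chunks))
        # The packaging node waits for all BLAST nodes to succeed.
        lines.append(f"PARENT {parents} CHILD package")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    chunks = split_fasta(">a\nAC\n>b\nGT\n>c\nTT\n", per_chunk=2)
    print(len(chunks), "chunks")
    print(dag_text(len(chunks)), end="")
```

Each chunk would be written to its own input file and paired with its own submit file; that part is omitted because the talk does not describe the file layout.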

BLAST - Resulting tarballs can be configured to be no larger than a certain size for more reliable transfer - After tarballs are created, they are automatically sent to an FTP server
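One way to cap tarball size is to group result files greedily before archiving. The sketch below is an assumption about how such a limit could work, not the framework's actual logic: `group_by_size` and `write_tarballs` are hypothetical names, and the limit is applied to the uncompressed file sizes, so the compressed tarballs come out at or below it only approximately. A single file larger than the limit gets a tarball of its own.

```python
import os
import tarfile
from typing import List, Sequence, Tuple


def group_by_size(files: Sequence[Tuple[str, int]], max_bytes: int) -> List[List[str]]:
    """Greedily pack (name, size) pairs into groups of at most max_bytes each."""
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in files:
        # Start a new group when adding this file would exceed the limit.
        if current and used + size > max_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups


def write_tarballs(paths: Sequence[str], max_bytes: int, prefix: str) -> List[str]:
    """Write size-bounded gzip tarballs and return their file names."""
    sized = [(p, os.path.getsize(p)) for p in paths]
    written = []
    for index, group in enumerate(group_by_size(sized, max_bytes)):
        out_name = f"{prefix}-{index:03d}.tar.gz"
        with tarfile.open(out_name, "w:gz") as tar:
            for path in group:
                tar.add(path, arcname=os.path.basename(path))
        written.append(out_name)
    return written
```

Smaller tarballs mean that a failed transfer costs only one small retry rather than a resend of the whole result set.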

BLAST - We’ve been doing the run every week for about a year with almost no human intervention - Very easy to add new databases or sets of input sequences!

Protein Structure - Collaboration with Jurgen Doreleijers of the BMRB and Aart Nederveen from Utrecht University in the Netherlands - Recalculated the structure of over 500 proteins using state-of-the-art techniques - Both CNS and CYANA were used

Protein Structure - DAGMan used to manage workflow and to provide fault-tolerance. - Using periodic_remove in the submit file to keep the job from “misbehaving” combines nicely with DAGMan’s RETRY feature.
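The combination of periodic_remove and RETRY can be sketched as below. This is an illustrative Python helper that renders a submit description and a DAG fragment; the helper names, the executable name, the one-hour limit, and the retry count are assumptions, and the removal expression is only one example of a policy, not the one used in the run. The idea is that a job running longer than the limit is removed from the queue, DAGMan sees the node as failed, and RETRY resubmits it.

```python
def submit_text(executable: str, arguments: str, max_run_seconds: int) -> str:
    """Render a submit description that removes a job stuck in the running state."""
    # JobStatus == 2 means "running"; EnteredCurrentStatus is when it got there.
    policy = (
        "(JobStatus == 2) && "
        f"((time() - EnteredCurrentStatus) > {max_run_seconds})"
    )
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "log = job.log",
        f"periodic_remove = {policy}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


def dag_node_text(node: str, submit_file: str, retries: int) -> str:
    """Render a DAG fragment: a node plus how many times DAGMan may retry it."""
    return f"JOB {node} {submit_file}\nRETRY {node} {retries}\n"


if __name__ == "__main__":
    # Hypothetical names for one structure calculation.
    print(submit_text("run_structure.sh", "protein0", 3600), end="")
    print(dag_node_text("structure0", "structure0.sub", 3), end="")
```

Putting the time limit in the submit file and the retry count in the DAG keeps the two concerns separate: the job decides when it has misbehaved, and DAGMan decides how many more attempts it gets.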

Protein Structure - The effort used about hours of compute time - We accomplished the run in about 60 hours of real time - The framework I created makes it simple to compute the structure of as many proteins as you like, in a way that is easy, automatic, and repeatable.

Protein Structure - Groups often use different parameters and protocols in structure determination and only calculate a few structures - Comparing structures from different groups is then difficult

Protein Structure - Our work was significant because it computed not just a few but over 500 structures - All were computed with the same parameters, making the results very internally consistent (besides being more accurate on their own due to the state-of-the-art techniques)

Web Portal - Currently supports only BLAST - Being used by a handful of users from the biochem department at the UW - Interest is growing, so we’ll soon be adding more applications

Questions? Thank You!