WEB-BASED BIOINFORMATICS PIPELINES FOR BIOLOGISTS
Integrative Services for Genomic Analysis (ISGA)
Chris Hemmerich, Center for Genomics and Bioinformatics


JUSTIFICATION AND HISTORY

ISGA BACKGROUND
- Provide a high-throughput microbial annotation service to local biologists
- Reliable and pipelined execution
- Efficient maintenance
- Provide privacy and security for data
- High-quality (automated) annotation
- Biologists able to customize parameters
- Able to incorporate new programs and pipelines

ERGATIS (ERGATIS.SOURCEFORGE.NET)
- Web-based analysis pipeline tool
- Wraps tools and utilities in "components"
- Ability to add new components
- Build new and customize existing pipelines
- In-depth monitoring of pipelines
- Underlying Workflow package supports SGE
- XML/BSML common data exchange format
- Includes prokaryotic annotation pipeline

ERGATIS WORKFLOW

A SLIGHT CORRECTION

WHY NOT EXPOSE ERGATIS?
- Insufficient accounts and permissions
- Shared interface for building and customizing pipelines
- Users must submit and retrieve results through the filesystem
- Pipeline monitoring interface is slow and complex
- Information of use to biologists is lost in "noise"
  - High number of components in a pipeline
  - Complexity of configuration interface

OUR SOLUTION
- Develop an alternative interface for biologists that uses the Ergatis backend
- Administrators also use Ergatis
- New interface features
  - Accounts and permission system
  - File management
- Simplify pipelines and component management by reducing functionality
- Provide form validation, documentation and other features to improve usability

THE GOAL

ISGA: WHIRLWIND TOUR

PIPELINE CUSTOMIZATION
- Ability to toggle some clusters on/off
- Some clusters contain parallel programs that can be independently toggled
- Ability to edit component parameters
- Ability to save customizations to use with later data sets
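One way to picture a saved customization is as a small record holding the clusters that were switched off and the parameters that were edited, which can be stored and reapplied to a later data set. The sketch below is only an illustration under that assumption: the Customization class, its field names, the JSON storage format and the effective_parameters helper are all hypothetical and are not taken from the ISGA code.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Customization:
    """Hypothetical record of one user's saved pipeline settings."""
    name: str
    disabled_clusters: list = field(default_factory=list)
    # Parameter overrides, shaped as {component: {parameter: value}}.
    parameters: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize the record so it can be stored and reused later."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Customization":
        """Rebuild a record from its stored JSON form."""
        return cls(**json.loads(text))


def effective_parameters(defaults: dict, custom: Customization) -> dict:
    """Overlay the user's overrides on a copy of the pipeline defaults."""
    merged = {component: dict(params) for component, params in defaults.items()}
    for component, overrides in custom.parameters.items():
        merged.setdefault(component, {}).update(overrides)
    return merged


# Example: save a customization, reload it, and apply it to the defaults.
saved = Customization(
    name="no start-site analysis",
    disabled_clusters=["Alternative Start Site Analysis"],
    parameters={"RNAmmer": {"molecules": "ssu"}},
)
reloaded = Customization.from_json(saved.to_json())
defaults = {"RNAmmer": {"molecules": "ssu,lsu,tsu"}}
print(effective_parameters(defaults, reloaded))
```

Copying the defaults before merging keeps the pipeline's stock settings untouched, so the same saved customization can be applied to any number of later runs.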

PIPELINE BUILDER

RUN STATUS

ISGA PIPELINE EXECUTION
- ISGA writes configuration and pipeline definition files to the Ergatis installation
- ISGA then triggers execution through Ergatis and receives the pipeline id in return
- Status is updated directly from Ergatis XML files
- Selected output is copied to ISGA, and the rest is available for download if needed
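The status step above, reading state out of XML files rather than querying a service, might look roughly like the sketch below. The XML layout used here (a pipeline element whose component children carry name and state attributes), the state names and the summarize_status helper are assumptions made for illustration; the real Ergatis files may be structured quite differently.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed layout, for illustration only; real Ergatis XML may differ.
SAMPLE = """<pipeline id="1234">
  <component name="overlap_analysis.default" state="complete"/>
  <component name="hmmpfam.post_overlap_analysis" state="running"/>
  <component name="ber.post_overlap_analysis" state="incomplete"/>
</pipeline>"""


def summarize_status(xml_text):
    """Reduce per-component states to one overall pipeline status."""
    root = ET.fromstring(xml_text)
    states = [c.get("state") for c in root.iter("component")]
    if any(s in ("failed", "error") for s in states):
        overall = "failed"
    elif states and all(s == "complete" for s in states):
        overall = "complete"
    elif any(s == "running" for s in states):
        overall = "running"
    else:
        overall = "pending"
    return {
        "pipeline_id": root.get("id"),
        "overall": overall,
        "counts": dict(Counter(states)),
    }


print(summarize_status(SAMPLE))
```

Collapsing many component states into one status is the kind of reduction that lets a biologist-facing page stay simple while the full detail remains available in Ergatis.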

ISGA TOOLBOX
- Includes a GBrowse instance for visualizing annotation results
- BLAST support for pipeline results as query or database
- Text search against annotation results
- Tools can be executed over SGE and monitored

ADMINISTRATIVE TOOLS
- Lightly monitor status in ISGA, with a link to the Ergatis page
- Notification when a pipeline fails; ISGA will pick up a resumed pipeline
- Ability to redirect ISGA to a cloned Ergatis pipeline or cancel (with user notification)
- Disable new job submissions

UNDER THE HOOD
[Architecture diagram. Labels: pipeline builder; genome browser; monitor pipelines; download results; blast search; ISGA Web Interface; bioinformatics tools; input and results; Shared Storage; PostgreSQL Database; pipeline specification; user account; annotation results; XML configuration; workflow engine; Ergatis; Sun Grid Engine; computation nodes; job scheduler; ISGA Backend]

UNDER THE HOOD (CONTINUED)
- Perl & jQuery
- Persistence = PostgreSQL & YAML & XML
- Mason
- MasonX::WebApp
- Hacked up HTML::FormEngine

ADDING AN ERGATIS PIPELINE TO ISGA

64 Ergatis Components

FIRST: UNDERSTAND THE PIPELINE
- ISGA takes a description of an Ergatis pipeline
  - YAML
  - Database Schema
  - Ergatis component.config files
- Document input and output of all components
- Which components are optional?
  - Can the user upload previously generated data in their stead?
  - Can alternative data from the pipeline be used?
  - Is the pipeline still useful without this functionality?

SIMPLIFICATION
- Our microbial annotation pipeline is composed of 64 Ergatis components
- Impossible to diagram for you on a slide or for a biologist on our web page
- Many of these components are file format conversions, program iterations, database preparation, etc.
- They are not relevant to a high-level view of the pipeline and offer no useful parameters for a biologist to customize

CLUSTERS OF ERGATIS COMPONENTS
- Break the pipeline into biologically meaningful clusters of one or more components
- This is as much art as science and may depend on your audience
- Example: 'Alternative Start Site Analysis'
  - overlap_analysis.default
  - start_site_curation.default
  - translate_sequence.translate_new_model
  - parse_evidence.hypothetical
  - hmmpfam.post_overlap_analysis
  - parse_evidence.hmmpfam_post
  - wu-blastp.post_overlap_analysis
  - bsml2fasta.post_overlap_analysis
  - bsml2featurerelationships.post_overlap
  - xdformat.post_overlap_analysis
  - ber.post_overlap_analysis
  - parse_evidence.ber_post
  - translate_sequence.final_polypeptides
  - bsml2fasta.final_cds
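The grouping described above can be pictured as a mapping from a cluster name to the ordered Ergatis components it hides. In the sketch below the first cluster uses the component names from the slide; the second cluster ("rRNA Prediction" with RNAmmer.default) and the components_to_run helper are hypothetical, added only so that toggling has something to act on.

```python
# Cluster name -> ordered list of Ergatis components it stands for.
CLUSTERS = {
    "Alternative Start Site Analysis": [
        "overlap_analysis.default",
        "start_site_curation.default",
        "translate_sequence.translate_new_model",
        "parse_evidence.hypothetical",
        "hmmpfam.post_overlap_analysis",
        "parse_evidence.hmmpfam_post",
        "wu-blastp.post_overlap_analysis",
        "bsml2fasta.post_overlap_analysis",
        "bsml2featurerelationships.post_overlap",
        "xdformat.post_overlap_analysis",
        "ber.post_overlap_analysis",
        "parse_evidence.ber_post",
        "translate_sequence.final_polypeptides",
        "bsml2fasta.final_cds",
    ],
    # Hypothetical second cluster, not taken from the slides.
    "rRNA Prediction": ["RNAmmer.default"],
}


def components_to_run(clusters, disabled=()):
    """Expand the enabled clusters into the flat component list to execute."""
    unknown = set(disabled) - set(clusters)
    if unknown:
        raise KeyError(f"unknown cluster(s): {sorted(unknown)}")
    return [
        component
        for name, components in clusters.items()
        if name not in disabled
        for component in components
    ]


# The biologist toggles one cluster; fourteen components drop out together.
print(len(components_to_run(CLUSTERS)))
print(components_to_run(CLUSTERS, disabled=["Alternative Start Site Analysis"]))
```

The user only ever sees and toggles the cluster name, while the backend still receives the full component list in pipeline order.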

COMPONENT CUSTOMIZATION
- Scripts and XML files are unchanged
- ISGA stores the configuration template for each component
- Components with editable parameters have a YAML definition that is used to build the web form
- These values are incorporated into the configuration template

COMPONENT TEMPLATE

--- !perl/ISGA::ComponentBuilder
Name: RNAmmer
Description: 'RNAmmer predicts 5s/8s, 16s/18s, and …'
Params:
  - { templ: 'select',
      NAME: 'molecules',
      TITLE: 'rRNA Molecules',
      REQUIRED: 1,
      OPTION: ['ssu (5/8s rRNA)', 'lsu (16/18s rRNA)', 'tsu (23/28s rRNA)', 'ssu and lsu', …],
      OPT_VAL: ['ssu', 'lsu', 'tsu', 'ssu,lsu', …],
      VALUE: 'ssu,lsu,tsu',
      DESCRIPTION: 'Declare what rRNA molecule types to search for.',
      CONFIGLINE: '___molecule___' }
RunBuilderParams:
  - { templ: 'hidden',
      NAME: 'project_id_root',
      TITLE: 'Project Id Root',
      REQUIRED: 1,
      DESCRIPTION: 'The Id root used in bsml id generation',
      CONFIGLINE: '___project_id_root___' }
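The CONFIGLINE entries above suggest that each form value is dropped into the stored configuration template at a matching placeholder. The sketch below shows that substitution in its simplest form. The template text, the fill_template helper and the rule of falling back to VALUE as a default are assumptions for illustration; only the NAME, VALUE, REQUIRED and CONFIGLINE keys come from the slide.

```python
# Parameter definitions reduced to the keys the substitution needs.
PARAMS = [
    {"NAME": "molecules", "REQUIRED": 1, "VALUE": "ssu,lsu,tsu",
     "CONFIGLINE": "___molecule___"},
    {"NAME": "project_id_root", "REQUIRED": 1,
     "CONFIGLINE": "___project_id_root___"},
]

# Hypothetical configuration template; the real Ergatis format differs.
TEMPLATE = "molecules = ___molecule___\nid_root = ___project_id_root___\n"


def fill_template(template, params, values):
    """Replace each parameter's placeholder with the submitted value."""
    filled = template
    for param in params:
        # Fall back to the definition's default when nothing was submitted.
        value = values.get(param["NAME"], param.get("VALUE"))
        if value is None:
            if param.get("REQUIRED"):
                raise ValueError(f"missing required parameter: {param['NAME']}")
            value = ""
        filled = filled.replace(param["CONFIGLINE"], str(value))
    return filled


# 'molecules' keeps its default; 'project_id_root' comes from the run form.
print(fill_template(TEMPLATE, PARAMS, {"project_id_root": "ecoli"}))
```

Because substitution only touches the placeholders, the underlying scripts and XML stay unchanged, which matches the approach described on the previous slide.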

FUTURE ISGA WORK
- Incorporate additional pipelines
  - Small prokaryotic assembly pipeline
  - Comparative genomics
  - Functional genomics
- Add additional features
  - Make pipelines modular components of ISGA
  - Implement pipeline versioning
  - Pipeline and data sharing
  - Ergatis Cloud Support?

ISGA
Qunfeng Dong, Kashi Revanna, Chris Hemmerich, Aaron Buechlein, Ram Podicheti