GMOD Projects at the Center for Genomics and Bioinformatics Chris Hemmerich - Indiana University, Bloomington.


A Simple Web Interface for Configuring GBrowse: WebGBrowse Ram Podicheti

WebGBrowse
- A web interface for configuring GBrowse installations
  - Upload a GFF file
  - Upload an optional config file to use as a starting point
  - Add, edit, and remove tracks using web forms
  - Extensive help is embedded in the forms, including a tutorial
  - Preview your changes at any point in GBrowse
- Makes GBrowse more feasible for small projects
  - We host the GBrowse server, so no installation is required
  - Configuration is done online through forms
  - Use one configuration for multiple GFF files
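As a rough illustration of what an uploaded GFF file offers a track-configuration form, the sketch below counts the feature types in GFF3 input. It is not WebGBrowse code: the function name `count_feature_types` is hypothetical, and the only thing taken as given is the standard GFF3 layout of nine tab-separated columns with the feature type in the third.

```python
from collections import Counter
from typing import Iterable

GFF3_COLUMNS = 9  # seqid, source, type, start, end, score, strand, phase, attributes


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count feature types (column 3) in GFF3 lines.

    Hypothetical helper for illustration only; not part of WebGBrowse.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) != GFF3_COLUMNS:
            continue  # skip malformed lines rather than guessing
        counts[fields[2]] += 1
    return counts


sample = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    "chr1\tsrc\tgene\t200\t300\t.\t-\t.\tID=g2\n",
]
feature_counts = count_feature_types(sample)
```

Each distinct feature type found this way is a natural candidate for its own track, which is one reason a single configuration can be reused across GFF files that share feature types.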

WebGBrowse
- Available for download and local installation
- Support, make feature requests, contribute
  - We want to help you help us add support for more features
- Pending GMOD component
  - Migration of development environment

Podicheti, R., Gollapudi, R. & Dong, Q.* WebGBrowse – a web server for GBrowse. Bioinformatics, 2009.

Web-based Bioinformatics Pipelines for Biologists: ISGA
Chris Hemmerich, Aaron Buechlein, Ram Podicheti, Jeong-Hyeon Choi, Boshu Liu

ISGA: Driving Forces
- Workflow management system that can meet the needs of a small sequencing center
- Flexible pipeline definition
  - Design new pipelines
  - Incorporate new programs as components
- Support distributed computing environments
  - Potential need to grow beyond local computing resources
- Minimize CGB staff involvement in pipeline running
  - Free resources for building new pipelines

Workflow Management
- Ergatis
  - Institute for Genome Sciences, U. Maryland
  - Build pipelines from existing programs
  - Supports distributed computing environments
  - Robust monitoring of pipeline execution

Orvis J, Crabtree J, Galens K, Gussman A, Inman JM, Lee E, Nampally S, Riley D, Sundaram JP, Felix V, Whitty B, Mahurkar A, Wortman J, White O, Angiuoli SV. Ergatis: a web interface and scalable software system for bioinformatics workflows. Bioinformatics. Jun 15;26(12).

Ergatis Workflow
- 10+ readily available pipelines, more in the community
- 220 components in svn, more in the community
- XML component and pipeline definition
- XML/BSML common data exchange format
  - Optional, but recommended for reusable components
  - Conversion tools for FASTA, GFF, Chado, etc.
  - Isolates format changes from other programs

Ergatis: Pipeline List

Ergatis: Pipeline Monitor

Ergatis: Configure Component

Ergatis Architecture

Biologist Interface Requirements
- Support single-lab biologists
  - Self-sufficient but have limited bioinformatics resources
  - Embrace tools that don't require extensive training
- Ability to run pre-configured pipelines quickly
  - Option to customize specific tools in a pipeline
- Interface that encourages exploration
  - Remove complexity and information users don't need
  - Inline help
  - Immediately detect errors and allow users to correct them
- Return output in useful formats
  - Simple tools for visualizing and searching large result sets

ISGA Design
- Simplify pipelines
  - Hide housekeeping components
  - Group components into clusters representing processes
- Support customization
  - Disable components where possible
  - Replace components with pre-computed data where possible
  - Edit scientifically-active program parameters
- Help and validation for all forms
- Users and data privacy
- Provide download and upload
- Incorporate visualization & analysis tools
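The design goals of hiding housekeeping components, grouping the rest into clusters, and letting users disable optional steps can be sketched with a small data model. This is only an illustration under assumed names: `Component`, `Pipeline`, `user_view`, and the example component names are all hypothetical and do not come from ISGA or Ergatis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """One pipeline step. All field names are hypothetical."""
    name: str
    cluster: str                # user-facing process this step belongs to
    housekeeping: bool = False  # hidden from the biologist's view
    optional: bool = False      # whether the user may disable the step
    enabled: bool = True


@dataclass
class Pipeline:
    name: str
    components: List[Component] = field(default_factory=list)

    def disable(self, component_name: str) -> None:
        """Disable an optional component; refuse for required ones."""
        for comp in self.components:
            if comp.name == component_name:
                if not comp.optional:
                    raise ValueError(f"{component_name} cannot be disabled")
                comp.enabled = False
                return
        raise KeyError(component_name)

    def user_view(self) -> Dict[str, List[str]]:
        """Group visible, enabled components by cluster, in pipeline order."""
        view: Dict[str, List[str]] = {}
        for comp in self.components:
            if comp.housekeeping or not comp.enabled:
                continue
            view.setdefault(comp.cluster, []).append(comp.name)
        return view


def example_pipeline() -> Pipeline:
    """Build a made-up annotation pipeline for demonstration."""
    return Pipeline(
        name="example annotation",
        components=[
            Component("split_input", "Input preparation", housekeeping=True),
            Component("gene_prediction", "Gene prediction"),
            Component("trna_search", "Gene prediction", optional=True),
            Component("format_conversion", "Annotation", housekeeping=True),
            Component("blast_search", "Annotation", optional=True),
        ],
    )
```

Keeping the full component list intact and deriving the simplified view from it means the underlying workflow definition does not have to change when the interface hides or disables a step.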

Why develop ISGA as a separate package?
- ISGA only re-implements the web interface of Ergatis
  - Ergatis libraries, component definitions, and method of running and monitoring pipelines are used by ISGA as-is
- ISGA adds and removes Ergatis features
  - Accessing component information
  - Building pipelines from components
- A hybrid ISGA/Ergatis interface wouldn't serve anyone
  - ISGA biologist users need to be given limited functionality for simplicity and security
  - Ergatis bioinformatician users need full functionality and a complex interface to work efficiently

Workflow

Pipeline Builder

Run Status

ISGA Architecture

Under the Hood
[Architecture diagram; labels recovered from the slide]
- ISGA Web Interface: pipeline builder, genome browser, monitor pipelines, download results, blast search
- Shared Storage: bioinformatics tools, input and results
- PostgreSQL Database: pipeline specification, user account, annotation results
- ISGA Backend: Ergatis (workflow engine, XML configuration), Sun Grid Engine (job scheduler, computation nodes)

Usage
- > 100 pipelines run
- > 60 users
- Two external sites evaluating local ISGA installations that we know of

What’s new?
- Celera assembly pipeline
- Ability to accept parameters with pipeline inputs
- Ability to iterate components over a list of pipeline inputs
- Conversion scripts for Hawkeye visualization
- Installation instructions
- Administration improvements
  - Online configuration
  - User classes and pipeline quotas
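The two input-related features, parameters supplied alongside an input and a component iterated over a list of inputs, can be pictured as expanding one component into one job per input. The sketch below is an assumption about the general idea, not ISGA's implementation; `PipelineInput`, `expand_jobs`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineInput:
    """An input file plus parameters supplied with it (hypothetical model)."""
    path: str
    parameters: Dict[str, str] = field(default_factory=dict)


def expand_jobs(
    component: str,
    defaults: Dict[str, str],
    inputs: List[PipelineInput],
) -> List[dict]:
    """Create one job per input; per-input parameters override defaults."""
    jobs = []
    for item in inputs:
        params = {**defaults, **item.parameters}
        jobs.append(
            {"component": component, "input": item.path, "parameters": params}
        )
    return jobs


jobs = expand_jobs(
    "assembly",
    {"min_length": "100"},
    [
        PipelineInput("library_a.fasta"),
        PipelineInput("library_b.fasta", {"min_length": "250"}),
    ],
)
```

Here the second input carries its own `min_length`, so its job runs with that value while the first falls back to the component default.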

Parameterized Inputs

Input Iterator

What’s in the works?
- Pipelines
  - SHORE SNP Calling (ISGA)
  - Gene clustering over microbial phylogenies (Ergatis)
  - Transcriptome annotation pipeline (Ergatis)
  - Methyl-seq (Ergatis)
- Features
  - Pipeline reproducibility and provenance
  - User groups and sharing
  - Modular pipeline and toolbox installation
  - ISGA pipelines as standalone Ergatis templates
  - ISGA pipeline over Amazon EC2 via CloVR

Cloud Resources through CloVR
- Execute Ergatis pipelines over an SGE instance hosted on Amazon EC2 machine images
- CloVR manages creation and shutdown of cloud images as part of the pipeline
- Upload input as part of the pipeline or access data hosted at Amazon
- Results are retrieved to the local machine
- Ergatis assumes a shared filesystem, so some modification is required to manage file transfers
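The lifecycle described on this slide (start cloud resources, upload input, run the pipeline, retrieve results, shut down) can be sketched with a context manager that guarantees shutdown even when a run fails. Nothing here uses the CloVR or Amazon EC2 APIs: `FakeCloud` is a hypothetical stand-in that only records the order of calls, and every method name is an assumption made for illustration.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeCloud:
    """Hypothetical stand-in that logs calls instead of contacting a cloud."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.log: List[str] = []

    def start_cluster(self) -> None:
        self.log.append("start cluster")

    def upload(self, path: str) -> None:
        self.log.append(f"upload {path}")

    def run_pipeline(self, inputs: List[str]) -> List[str]:
        self.log.append("run pipeline")
        if self.fail:
            raise RuntimeError("pipeline failed")
        return [f"{path}.results" for path in inputs]

    def download(self, path: str) -> None:
        self.log.append(f"download {path}")

    def shutdown_cluster(self) -> None:
        self.log.append("shutdown cluster")


@contextmanager
def cloud_cluster(cloud: FakeCloud) -> Iterator[FakeCloud]:
    """Start the cluster and always shut it down, even on errors."""
    cloud.start_cluster()
    try:
        yield cloud
    finally:
        cloud.shutdown_cluster()


def run_remote_pipeline(cloud: FakeCloud, inputs: List[str]) -> List[str]:
    """Upload inputs, run the pipeline, and copy results back."""
    with cloud_cluster(cloud) as cluster:
        for path in inputs:
            cluster.upload(path)  # explicit transfer: no shared filesystem
        results = cluster.run_pipeline(inputs)
        for path in results:
            cluster.download(path)
    return results
```

The explicit `upload` and `download` steps stand in for the file transfers that a shared filesystem would otherwise make unnecessary, and tying shutdown to the `finally` clause avoids leaving billed instances running after a failed pipeline.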

CloVR Architecture

Using CloVR with ISGA
- ISGA/Ergatis pipelines can be ported to ISGA/CloVR
- ISGA installation communicates with local Ergatis and CloVR
- EC2 presents challenges for billing customers

ISGA with CloVR Architecture

Acknowledgements

Funding
- Indiana Metabolomics and Cytomics Initiative (METACyt) – Lilly Endowment, Inc.
- National Institutes of Health under grant 5 RC2 HG

CGB
- Genomics: John Colbourne, Keithanne Mockaitis
- Bioinformatics: Haixu Tang, Jeong-Hyeon Choi, Aaron Buechlein, Ram Podicheti
- Computing: Phillip Steinbachs, Jon Burgoyne
- ISGA Alumni: Qunfeng Dong, Kashi Revanna

External Projects
- Joshua Orvis & Ergatis team
- Sam Angiuoli & CloVR team
- Anup Mahurkar & Workflow team