Evolution of the CIPRES Science Gateway, a Public Resource for Phylogenetics. Mark A. Miller San Diego Supercomputer Center.

Presentation transcript:

How to fail at creating a Gateway
- Allow the development staff to focus on their personal goal: creating the coolest, most generic software package ever.
- Ignore new researchers in your community and focus on an existing user base.
- Focus on updating an existing Gateway’s capabilities.
- Focus on low-end computational use cases and classroom use.
- Fail to anticipate the emerging needs of biologists for genomics tools.
- Fail to grasp the importance of access to parallel codes for compute-intensive jobs.
Case study: The Next Generation Biology Workbench (NGBW). The NGBW has since closed. It targeted low-end use cases and, in the end, supported primarily advanced high school students.

A better path for Gateway development
- Engage only in user- and use-case-driven development.
- Listen to user requests for new features.
- Expand capacity to meet growing user demands.
- Be driven by the high-end users; help with one-off solutions when necessary.
- Refactor infrastructure as use cases drive the need for changes.
- Build features only in response to user requests, or when usage patterns break the existing infrastructure.
Case study: The CIPRES Science Gateway. The current CIPRES Science Gateway is built on the same software as the NGBW, but this project always held to user-driven development.

CIPRES has been successful:
- Over 15,000 users have submitted 550,000+ TeraGrid/XSEDE jobs since Dec.
- An average of ~350 new XSEDE users registered in each of the last 12 months.
- 100 million core hours of TeraGrid/XSEDE time distributed to scientists.
- Supported at least 1800 publications.
- Used for curriculum delivery by at least 76 instructors.

Tactics for Gateway Success: Step 1: identify a user population in need

Phylogenetics is the study of the diversification of life on the planet Earth, both past and present, and the relationships among living things through time.

Evolutionary relationships can be inferred from DNA sequence comparisons: 1. Align sequences to determine evolutionary equivalence: 2. Infer evolutionary relationships based on some set of assumptions:

Biology in the new world of abundant DNA sequence data requires a new kind of cyberinfrastructure!
- Sequence alignment and tree inference are NP-hard.
- Even with heuristics, community codes scale exponentially with the number of species and columns.
- Phylogenetics codes that were historically run in desktop environments must be moved to high performance computing resources.
- The need for access to HPC resources will increase for the foreseeable future.
- Scientists who do not have HPC access will have to tailor their questions to available resources, and risk being left out of the discovery process.
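To give a feel for why exhaustive tree search is out of reach, the number of distinct unrooted bifurcating topologies for n labeled taxa is (2n-5)!! = 3 × 5 × ... × (2n-5), a standard combinatorial result. The sketch below is an illustration only and is not taken from the slides; the function name is hypothetical.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating topologies for n labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # (2n-5)!! is the product of the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, unrooted_tree_count(n))
```

Ten taxa already give about two million topologies, which is why the community codes rely on heuristics and, increasingly, on parallel HPC resources.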

Tactics for Gateway Success: Step 1: identify a user population in need Community pressure causes CIPRES project to provide public access to their compute engine via a Portal. Construction begins….

Workflow for the CIPRES Gateway (diagram labels): Assemble Sequences; Upload to Portal; Run Alignment; Run Tree Inference; Download; Post-Tree Analysis; Store; CIPRES Gateway.

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs

Usage Epochs in CIPRES History

Original architecture. Restricted command line set

Usage Epochs in CIPRES History Make all command line options available

Usage Epochs in CIPRES History: The generic software package from the failed NGBW project allowed us to expose all command line options to users in about 3 months.

Usage Epochs in CIPRES History: make parallel codes available
The generic software package from the failed NGBW project allowed us to submit jobs “easily” to TeraGrid/XSEDE resources and to local HPC resources.

Usage Epochs in CIPRES History
Linear growth in usage has continued every month since. It has just been a matter of helping the software keep up with the changing use cases.

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs
Step 3: let user behavior and usage-created needs drive improvements

Motivation: too many users
Create a tool set that gives the ability to:
- halt submissions from a given user account
- monitor usage by each account automatically
- let users track their SU consumption
- forecast the SU cost of a job for users
- charge to a user’s personal XSEDE allocation
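A minimal sketch of what such a tool set might compute is shown below. It assumes that one SU corresponds to one core-hour scaled by a resource charge factor and that each account has a cap; the names `Account`, `forecast_su`, and `can_submit` are hypothetical and do not come from the CIPRES code base.

```python
from dataclasses import dataclass


@dataclass
class Account:
    user: str
    su_limit: float        # hypothetical per-account cap, in SUs
    su_used: float = 0.0   # SUs consumed so far
    halted: bool = False   # set by staff to block all submissions


def forecast_su(cores: int, max_hours: float, charge_factor: float = 1.0) -> float:
    """Worst-case SU cost: cores x requested wall-clock hours x charge factor."""
    return cores * max_hours * charge_factor


def can_submit(account: Account, cores: int, max_hours: float) -> bool:
    """Reject the job if the account is halted or the forecast exceeds the cap."""
    if account.halted:
        return False
    return account.su_used + forecast_su(cores, max_hours) <= account.su_limit


def record_usage(account: Account, cores: int, actual_hours: float) -> float:
    """Charge the account for the hours actually used and return the new total."""
    account.su_used += forecast_su(cores, actual_hours)
    return account.su_used
```

Forecasting from the requested maximum run time is deliberately pessimistic: the user sees an upper bound before submitting, and the account is charged only for the hours actually consumed.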

Help users track their resource consumption: Notify users of their usage level

Motivation: users running 2-week jobs
Issue: During service interruptions, the app lost track of the job, and results had to be fetched manually.
Response: Create a system of daemons that return results robustly even with system outages.

Job submission and results retrieval are managed by daemons
Diagram components: CIPRES DB (Tasks table, Running tasks table) and Execution Hosts.
submitJobsD: 1. Find all “new” tasks. 2. Submit to the correct execution host. 3. Set status to “submitted”.
checkJobsD: 1. Find all “submitted” tasks. 2. Ask the execution host if the job is done. 3. If yes, set status to “done”.
loadResultsD: 1. Find all “done” tasks. 2. Transfer results to the CIPRES DB. 3. Remove the job from the “WorkQ”.
Completion callback (diagram labels): “curl, task is done”; change status in the Running tasks table to “done”.
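The three daemons can be read as independent passes over a shared task table, each advancing tasks by one status. The sketch below illustrates that pattern with in-memory stand-ins; it is not the CIPRES implementation. `Task`, `FakeHost`, and the `*_pass` functions are hypothetical names, and adding a job to the work queue at submission time is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: int
    status: str = "new"            # "new" -> "submitted" -> "done"
    results: Optional[str] = None  # filled in once results are loaded


class FakeHost:
    """Hypothetical execution host that finishes every job immediately."""

    def __init__(self):
        self.outputs = {}

    def submit(self, task_id):
        self.outputs[task_id] = f"output of task {task_id}"

    def is_done(self, task_id):
        return task_id in self.outputs

    def fetch(self, task_id):
        return self.outputs[task_id]


def submit_jobs_pass(tasks, host, work_queue):
    """submitJobsD: find 'new' tasks, submit them, mark them 'submitted'."""
    for task in tasks:
        if task.status == "new":
            host.submit(task.task_id)
            work_queue.add(task.task_id)  # assumption: queued at submission
            task.status = "submitted"


def check_jobs_pass(tasks, host):
    """checkJobsD: ask the host about 'submitted' tasks, mark finished ones 'done'."""
    for task in tasks:
        if task.status == "submitted" and host.is_done(task.task_id):
            task.status = "done"


def load_results_pass(tasks, host, work_queue):
    """loadResultsD: transfer results of 'done' tasks, then remove them from the queue."""
    for task in tasks:
        if task.status == "done" and task.task_id in work_queue:
            task.results = host.fetch(task.task_id)
            work_queue.discard(task.task_id)
```

Because each pass only looks at persisted status, a daemon that dies during an outage can simply be restarted and will pick up where the table says things stand.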

Motivation: user input file sizes grew from KB to MB, and output from MB to GB, stressing the system.
Software improvement was required to:
- Keep large files from being read into memory multiple times.
- Point to files instead of storing them in the DB.
- Store identical files in the DB only once.
- Sunset accounts that have been inactive for more than 1 year.
- Move GB+ files outside the web application/database system.
- Limit users to 150 GB of data storage.
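One common way to store identical files only once, and to avoid pulling a large file into memory, is to key each file by a hash of its contents computed in chunks. The sketch below shows that technique under stated assumptions; the slides do not say how CIPRES implemented it, and `FileStore` and `file_digest` are hypothetical names. The `index` dictionary stands in for a database table that points to files rather than holding them.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB chunks so it is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class FileStore:
    """Content-addressed store: identical files share a single copy on disk."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = {}  # digest -> stored path (stand-in for a DB table)

    def add(self, path):
        digest = file_digest(path)
        if digest not in self.index:
            destination = self.root / digest
            shutil.copyfile(path, destination)
            self.index[digest] = destination
        return digest
```

A user record then stores only the digest, so re-uploading the same alignment costs one hash computation instead of another full copy.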

What happens when job output is GB in size?
loadResultsD: 1. Find all “done” tasks. 2. Transfer results to the CIPRES DB. 3. Remove the job from the “WorkQ”.
After 5 minutes, the transfer is still in progress, and the job is still in the WorkQ and marked “done”. loadResultsD finds it and starts the transfer again. Soon multiple transfers are in progress, and the system chokes.

Solution: compress and move large files to cloud storage for direct return to the user via hyperlink
loadResultsD: 1. Find all “done” tasks. 2. Ask how big the results are. 3. Move large results out of the system; transfer all others. 4. Remove the job from the “WorkQ”.
500+ users have required file downloads by this transfer mechanism.
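The revised loadResultsD steps can be sketched as a single pass that checks the result size before deciding where the data goes. This is an illustration only: the 1 GiB cutoff is an assumption (the slides say only "GB"), and `result_size`, `transfer_to_db`, and `move_to_cloud` are hypothetical callables supplied by the caller.

```python
LARGE_RESULT_BYTES = 1024 ** 3  # hypothetical cutoff; the slides only say "GB"


def load_results_by_size(done_tasks, work_queue, result_size,
                         transfer_to_db, move_to_cloud):
    """One pass of the revised loadResultsD.

    Returns a dict mapping task id to download link for results that
    were moved out of the system instead of being loaded into the DB.
    """
    links = {}
    for task_id in list(done_tasks):
        if task_id not in work_queue:
            continue  # already handled by an earlier pass
        if result_size(task_id) >= LARGE_RESULT_BYTES:
            # Large output: hand it to external storage, keep only a link.
            links[task_id] = move_to_cloud(task_id)
        else:
            transfer_to_db(task_id)
        work_queue.discard(task_id)
    return links
```

Asking for the size first keeps the slow multi-gigabyte copies out of the database path, which is where repeated transfers were piling up.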

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs
Step 3: let user behavior and usage-created needs drive improvements
Step 4: manage challenges that threaten productivity of high-end users

Other issues also arose
- GridFTP proved unreliable at high load. Response: move to local Lustre file systems.
- Under load, a MySQL bug prevented the DB connections from releasing, choking the web app. Response: refactor how the DB manages files.

Other issues also arose
- The Lustre file system is not good for many Biology codes, so we moved to NFS.
- Lustre failures on long jobs cause a surge in resource use.

The issue with issues: Dealing with these issues occurred in fire drill mode; users were stymied and frustrated. On average, 30-45% of developer time is spent dealing with these issues. Some days/weeks all forward progress is halted. But on the other hand, making your existing users happy is the first priority…..

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs
Step 3: let user behavior and usage-created needs drive improvements
Step 4: manage challenges that threaten productivity of high-end users
Step 5: stay in touch with your community

Provide many points of contact

When a project belongs to the community…

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs
Step 3: let user behavior and usage-created needs drive improvements
Step 4: manage challenges that threaten productivity of high-end users
Step 5: stay in touch with your community
Step 6: embrace customer service

Set aside time for user issues

The goals are:
- No more than 24 h response time.
- Foster a supportive and helpful culture.
- Make it clear that trouble reports are a gift to CIPRES, not an annoyance.

Tactics for Gateway Success:
Step 1: identify a user population in need
Step 2: commit to responding to users’ needs
Step 3: let user behavior and usage-created needs drive improvements
Step 4: manage challenges that threaten productivity of high-end users
Step 5: stay in touch with your community
Step 6: embrace customer service
Step 7: innovate as funds permit

There are highly evolved legacy desktop/browser applications that help with matrix assembly, but have no tree inference tools or are underpowered. Example shown: raxmlGUI.

There are projects that offer powerful and distinct user experiences, and are interested in incorporating powerful tree inference tools into an existing application:

We received funding to create a public CIPRES RESTful API (CRA) to help with these use cases.
Diagram labels: raxmlGUI; CSG; XSEDE parallel codes.

Use Cases: MorphoBank and REST Services
Diagram labels: MorphoBank; MB-DB; Character Recording; Character Matrix Assembly; Team Data Sharing; Character Quantification; Character Visualization; Character Matrix Publication; CRA; XSEDE parallel codes.
MorphoBank provides powerful visual tools for creating and sharing data matrices among large teams.
But it has no concept of trees or tree inference.
The CIPRES RESTful API allows users to proceed with their workflow within the MorphoBank environment.

Use Cases: Mesquite and REST Services
Diagram labels: Mesquite (desktop); Tree Display; Tree Editing; Tree Reconciliation; Sequence Editing; Sequence Assembly; Tree Analysis; CRA; XSEDE parallel codes.
Mesquite provides powerful visual tools for pre- and post-tree tasks on the desktop.
But its tree inference is limited by the desktop hardware.
The CIPRES RESTful API provides the needed compute power without leaving the app.

Many advanced developers find the workflow supported by the CIPRES browser too restrictive!

Use Cases: Individual developers and REST Services
Advanced phylogenetic researchers want:
- to run many jobs simultaneously
- to create ad hoc workflows
Advanced phylogenetic researchers don’t want:
- to assemble and click each job one at a time
- to manually port the output of one job to the subsequent job in their workflow

Use Cases: Individual developers and REST Services
Diagram labels: Scripting Tools; CRA; XSEDE parallel codes.
Assuming modest scripting skills, an advanced researcher can accomplish this goal by using the CIPRES RESTful API and avoiding the clumsy browser interface.
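The kind of script this use case has in mind might look like the sketch below: several datasets run at once, and each alignment is passed straight into tree inference without manual copying. It deliberately does not reproduce the CIPRES REST API. `run_tool` is a hypothetical callable that a user would write to wrap submit, poll, and download for one job, and the tool names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(run_tool, sequences,
                 align_tool="aligner", tree_tool="tree_builder"):
    """Chain two jobs: the alignment output becomes the tree inference input."""
    alignment = run_tool(align_tool, sequences)
    return run_tool(tree_tool, alignment)


def run_many(run_tool, datasets, max_workers=4):
    """Run one pipeline per dataset concurrently; returns {dataset name: tree}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_pipeline, run_tool, data)
            for name, data in datasets.items()
        }
        return {name: future.result() for name, future in futures.items()}


def fake_run_tool(tool, data):
    """Stand-in for a real remote job; just records which tool saw which input."""
    return f"{tool}({data})"


if __name__ == "__main__":
    print(run_many(fake_run_tool, {"genes_a": "seqs_a", "genes_b": "seqs_b"}))
```

Threads suit this pattern because each worker spends nearly all of its time waiting on a remote job rather than computing locally.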

The REST API was released in October 2014 and formally announced in January.
It is available through:
- MorphoBank
- Influenza Research Database
- Virus Pathogen Resource (ViPR)
- Tree-Based Alignment Selector (TBAS)
- raxmlGUI
Coming soon:
- Mesquite
- siMBa
- BioKepler

Advantages of offering REST services: Preserves the investment in creating and learning to use complex software environments. Makes interaction with the application more flexible for individuals with scripting skills.

But where are the individual scripters we expected?

Perhaps the REST API has too high a barrier to entry.
Diagram labels, browser path: Web Form; Parameter map; Front-end validation (JavaScript; Struts); Backend validation; Command line.
Diagram labels, REST path: REST client; Parameter map; Backend validation; Command line.
Diagram labels, shared: Tool XML.
What next? JavaScript GUI (added to the diagram).

Jupyter notebook screenshot labels: descriptive text; code cells; cell controls.

The Jupyter notebook has the following properties:
- Interleaving text and live code makes it easy to modify and share workflows.
- The information is stored as an easily sharable file that can be used in any Jupyter implementation with the proper software installed.
- Many scripting languages are supported.
- It supports interactive creation and modification of figures, and GUI interactions.

Create a CIPRES Notebook environment where:
- Notebooks in R and Python are supported (at least).
- A standard collection of phylogenetics scripting packages is available in each language.
- A forum is provided for notebook storage, exchange, and publishing.
- Users can submit to virtual HPC clusters on XSEDE resources.

Challenges:
- How to allow users to submit command lines without major security issues.
- How to make sure jobs are configured correctly and efficiently.
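One general way to approach the first challenge is to never accept a raw command line at all: accept named parameters, check each against an allow-list, and build an argument vector that is executed without a shell. The sketch below illustrates that idea only; it is not a description of how CIPRES handles it, and the parameter names, allowed values, and `--name value` flag style are all hypothetical.

```python
# Hypothetical allow-list: parameter name -> collection of permitted values.
ALLOWED_PARAMS = {
    "mode": {"fast", "thorough"},
    "replicates": range(1, 1001),
}


def build_argv(executable, params):
    """Validate user parameters and return an argv list (no shell string)."""
    argv = [executable]
    for name, value in params.items():
        if name not in ALLOWED_PARAMS:
            raise ValueError(f"unknown parameter: {name!r}")
        if value not in ALLOWED_PARAMS[name]:
            raise ValueError(f"value not permitted for {name!r}: {value!r}")
        argv += [f"--{name}", str(value)]
    return argv
```

Because the result is a list intended for something like `subprocess.run(argv)` without `shell=True`, a value such as `"fast; rm -rf /"` is rejected by the allow-list and would not be interpreted by a shell in any case.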

Workflow for the CIPRES Notebook Environment (diagram labels): Assemble Sequences; Upload to Portal; Run Alignment; Run Tree Inference; Download; Post-Tree Analysis; Store; CIPRES Gateway.

The expanded workflow becomes more tractable in the Notebook Environment because users have the ability to recruit tools, and design their own workflows. Will the barrier to entry be too high?

How will SciGaP help us?
For all apps: As we delve into providing access via the CIPRES Notebook, CIPRES job submissions and middleware can be taken over by SciGaP. This would allow all Gateway developers (Terri and Kenneth, for example) to focus primarily on creating the new interface, while the heavy lifting required of the production application is taken over by SciGaP. Recall that in our team, 30-45% of developer time is spent putting out fires in the middleware. We would love to give those issues to SciGaP.
7/13/2014

Acknowledgements
- Terri Schwartz: Lead developer
- Wayne Pfeiffer: HPC expertise
- Paul Hoover: Database/Backend
- Mona Wong: Interface