Evolution of the CIPRES Science Gateway, a Public Resource for Phylogenetics. Mark A. Miller San Diego Supercomputer Center.

1

2 Evolution of the CIPRES Science Gateway, a Public Resource for Phylogenetics. Mark A. Miller San Diego Supercomputer Center

3 How to fail at creating a Gateway Allow the development staff to focus on their personal goal: creating the coolest, most generic software package ever. Ignore new researchers in your community and focus on an existing user base. Focus on updating an existing Gateway’s capabilities. Focus on low end computational use cases/ classroom use. Fail to anticipate the emerging needs of Biologists for genomics tools. Fail to grasp the importance of access to parallel codes for compute-intensive jobs. Case study: The Next Generation Biology Workbench

4 How to fail at creating a Gateway Allow the development staff to focus on their personal goal: creating the coolest, most generic software package ever. Ignore new researchers in your community and focus on an existing user base. Focus on updating an existing Gateway’s capabilities. Focus on low-end computational use cases/classroom use. Fail to anticipate the emerging needs of Biologists for genomics tools. Fail to grasp the importance of access to parallel codes for compute-intensive jobs. Case study: The Next Generation Biology Workbench (NGBW). The NGBW closed in 2013. It targeted low-end use cases, and in the end supported primarily advanced high school students.

5 A better path for Gateway development Engage only in user- and use case-driven development. Listen to user requests for new features. Expand capacity to meet growing user demands. Be driven by the high-end users; help with one-off solutions when necessary. Refactor infrastructure as use cases drive need for changes. Build features only in response to user requests, or when usage patterns break the existing infrastructure. Case study: The CIPRES Science Gateway

6 A better path for Gateway development Engage only in user- and use case-driven development. Listen to user requests for new features. Expand capacity to meet growing user demands. Be driven by the high-end users; help with one-off solutions when necessary. Refactor infrastructure as use cases drive need for changes. Build features only in response to user requests, or when usage patterns break the existing infrastructure. Case study: The CIPRES Science Gateway. The current CIPRES Science Gateway is built on the same software as the NGBW. But this project always held to user-driven development.

7 CIPRES has been successful: Over 15,000 users have submitted 550,000+ TeraGrid/XSEDE jobs since December 2009. An average of ~350 new XSEDE users registered in each of the last 12 months. 100 million core hours of TeraGrid/XSEDE time have been distributed to scientists. CIPRES has supported at least 1800 publications and has been used for curriculum delivery by at least 76 instructors.

8 Tactics for Gateway Success: Step 1: identify a user population in need

9 Phylogenetics is the study of the diversification of life on the planet Earth, both past and present, and the relationships among living things through time.

10 Evolutionary relationships can be inferred from DNA sequence comparisons: 1. Align sequences to determine evolutionary equivalence. 2. Infer evolutionary relationships based on some set of assumptions.

11 Biology in the new world of abundant DNA sequence data requires a new kind of cyberinfrastructure! Sequence alignment and tree inference are NP-hard. Even with heuristics, community codes scale exponentially with the number of species and columns. Phylogenetics codes that were historically run in desktop environments must be moved to high-performance computing resources. The need for access to HPC resources will increase for the foreseeable future. Scientists who do not have HPC access will have to tailor their questions to available resources, and risk being left out of the discovery process.
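
The difficulty of tree inference can be made concrete with a standard combinatorial fact that the slides do not state: for n labeled taxa there are (2n-5)!! distinct unrooted, fully bifurcating tree topologies. The sketch below only counts topologies; the function name is illustrative and it says nothing about how any particular phylogenetics code searches that space.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted bifurcating topologies for n labeled taxa: (2n-5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Product of the odd numbers 3, 5, ..., 2n-5 (empty product when n == 3).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        print(n, unrooted_tree_count(n))
```

Ten taxa already give 2,027,025 candidate topologies, which is why exhaustive search stops being an option almost immediately and heuristics on parallel hardware become necessary.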

12 Tactics for Gateway Success: Step 1: identify a user population in need Community pressure causes CIPRES project to provide public access to their compute engine via a Portal. Construction begins….

13 Workflow for the CIPRES Gateway: Assemble Sequences; Upload to Portal; Run Alignment; Run Tree Inference; Download; Post-Tree Analysis; Store. [Diagram label: CIPRES Gateway]

14 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs

15 Usage Epochs in CIPRES History

16 Original architecture. Restricted command line set

17 Usage Epochs in CIPRES History Make all command line options available

18 Usage Epochs in CIPRES History The Generic software package from the failed NGBW project allowed us to expose all command line options to users in about 3 months.

19 Usage Epochs in CIPRES History Make parallel codes available

20 Usage Epochs in CIPRES History Make parallel codes available The Generic software package from the failed NGBW project allowed us to submit jobs “easily” to TeraGrid/XSEDE resources, and to local HPC resources.

21 Usage Epochs in CIPRES History Linear growth in usage has continued every month since. It has just been a matter of trying to help the software keep up with the changing use cases.

22 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs Step 3: let user behavior/usage-created needs drive improvements

23 Motivation: Too Many Users. Create a tool set that gives the ability to: halt submissions from a given user account; monitor usage by each account automatically; let users track their SU consumption; forecast the SU cost of a job for users; charge to a user’s personal XSEDE allocation.
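
Two items in this tool set, forecasting the SU cost of a job and halting submissions from an account, can be sketched as follows. This is a minimal illustration, not the CIPRES implementation: it assumes a service unit is cores × wall-clock hours × a resource-specific charge factor, and the `Account` fields, the 30,000 SU cap, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Account:
    user: str
    su_used: float = 0.0
    su_limit: float = 30000.0  # hypothetical per-account cap
    halted: bool = False       # set by an administrator to block submissions


def forecast_su(cores: int, max_hours: float, charge_factor: float = 1.0) -> float:
    """Worst-case SU cost, assuming SU = cores * hours * charge factor."""
    return cores * max_hours * charge_factor


def can_submit(account: Account, cores: int, max_hours: float,
               charge_factor: float = 1.0) -> Tuple[bool, str]:
    """Decide whether a job may be submitted, with a message for the user."""
    if account.halted:
        return False, "submissions are halted for this account"
    cost = forecast_su(cores, max_hours, charge_factor)
    if account.su_used + cost > account.su_limit:
        return False, f"job may cost {cost:.0f} SU, which exceeds the remaining allocation"
    return True, f"job may cost up to {cost:.0f} SU"


if __name__ == "__main__":
    acct = Account("alice", su_used=29000.0)
    print(can_submit(acct, cores=64, max_hours=10))
    print(can_submit(acct, cores=64, max_hours=24))
```

Forecasting from the requested maximum run time gives an upper bound; the actual charge would be computed from the real run time once the job finishes.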

24 Help users track their resource consumption: Notify users of their usage level

25 Motivation: users running 2-week jobs. Issue: During service interruptions, the app lost track of the job, and results had to be fetched manually. Response: Create a system of daemons that return results robustly even with system outages.

26 Job Submissions/Results Retrieval is managed by daemons. [Diagram: CIPRES DB (Tasks, Running tasks) and Execution Hosts.] submitJobsD: 1. Find all “new” tasks. 2. Submit to correct execution host. 3. Set status to “submitted”. checkJobsD: 1. Find all “submitted” tasks. 2. Ask execution host if job is done. 3. If yes, set status to “done”. loadResultsD: 1. Find all “done” tasks. 2. Transfer results to CIPRES DB. 3. Remove job from “WorkQ”. When a task is done, the execution host uses curl to change its status in the Running task table to “done”.
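
The three daemons form a small state machine over task status. The sketch below models one pass of each daemon in memory, assuming that `loadResultsD` selects tasks that are both “done” and still in the WorkQ. `Task`, `FakeHost`, and the function names are hypothetical stand-ins; the real daemons work against the CIPRES database and remote execution hosts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Task:
    task_id: int
    status: str = "new"            # new -> submitted -> done
    results: Optional[str] = None


class FakeHost:
    """Stand-in for an execution host (hypothetical interface)."""

    def __init__(self) -> None:
        self.finished: Dict[int, bool] = {}

    def submit(self, task_id: int) -> None:
        self.finished[task_id] = False

    def finish(self, task_id: int) -> None:
        self.finished[task_id] = True

    def is_done(self, task_id: int) -> bool:
        return self.finished.get(task_id, False)

    def fetch(self, task_id: int) -> str:
        return f"results for task {task_id}"


def submit_jobs_d(tasks: List[Task], host: FakeHost, work_q: Set[int]) -> None:
    for t in tasks:
        if t.status == "new":
            host.submit(t.task_id)
            work_q.add(t.task_id)
            t.status = "submitted"


def check_jobs_d(tasks: List[Task], host: FakeHost) -> None:
    for t in tasks:
        if t.status == "submitted" and host.is_done(t.task_id):
            t.status = "done"


def load_results_d(tasks: List[Task], host: FakeHost, work_q: Set[int]) -> None:
    for t in tasks:
        if t.status == "done" and t.task_id in work_q:
            t.results = host.fetch(t.task_id)
            work_q.discard(t.task_id)


if __name__ == "__main__":
    tasks, host, work_q = [Task(1), Task(2)], FakeHost(), set()
    submit_jobs_d(tasks, host, work_q)
    host.finish(1)                      # only task 1 completes
    check_jobs_d(tasks, host)
    load_results_d(tasks, host, work_q)
    print([(t.task_id, t.status, t.results) for t in tasks], work_q)
```

Because each pass reads only persisted status and acts on whatever it finds, a daemon that was down during an outage simply picks up the same tasks on its next pass, which is the robustness property the slide describes.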

27 Motivation: Users’ input file sizes grew from KB to MB, and output from MB to GB, stressing the system. Software improvement was required to: Keep large files from being read into memory multiple times. Point to files instead of storing them in the DB. Store identical files in the DB only once. Sunset accounts that have been inactive for more than 1 year. Move GB+ files outside the web application/database system.

28 Motivation: Users’ input file sizes grew from KB to MB, and output from MB to GB, stressing the system. Software improvement was required to: Keep large files from being read into memory multiple times. Point to files instead of storing them in the DB. Store identical files in the DB only once. Sunset accounts that have been inactive for more than 1 year. Move GB+ files outside the web application/database system. Limit users to 150 GB of data storage.
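
Three of these improvements (avoid loading large files into memory, point to files rather than store them in the DB, store identical files once) fit naturally into a content-addressed file store. The following is a sketch of that general technique, not a description of how CIPRES implements it; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os
import shutil


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so it is never held in memory whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_once(path: str, store_dir: str) -> str:
    """Copy the file into the store under its digest unless it is already there.

    Returns the stored path; a database row would hold this pointer
    instead of the file contents.
    """
    os.makedirs(store_dir, exist_ok=True)
    dest = os.path.join(store_dir, file_digest(path))
    if not os.path.exists(dest):
        shutil.copyfile(path, dest)
    return dest
```

Two users who upload the same alignment end up with two database rows pointing at one file on disk.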

29 Help users track their resource consumption: Notify users of their usage level

30 What happens when job output is GB in size? [Diagram: CIPRES DB (Tasks, Running tasks) and Execution Hosts.] submitJobsD: 1. Find all “new” tasks. 2. Submit to correct execution host. 3. Set status to “submitted”. checkJobsD: 1. Find all “submitted” tasks. 2. Ask execution host if job is done. 3. If yes, set status to “done”. loadResultsD: 1. Find all “done” tasks. 2. Transfer results to CIPRES DB. 3. Remove job from “WorkQ”. When a task is done, the execution host uses curl to change its status in the Running task table to “done”.

31 What happens when job output is GB in size? [Diagram: CIPRES DB (Tasks, Running tasks) and Execution Hosts.] loadResultsD: 1. Find all “done” tasks. 2. Transfer results to CIPRES DB. 3. Remove job from “WorkQ”. After 5 minutes, the transfer is still in progress, and the job is still in the WorkQ and marked “done”. loadResultsD finds it and starts the transfer again. Soon multiple transfers are in progress, and the system chokes.

32 Solution: Compress and move large files to cloud storage for direct return to user via hyperlink. [Diagram: CIPRES DB (Tasks, Running tasks) and Execution Hosts.] loadResultsD: 1. Find all “done” tasks. 2. Ask how big the results are. 3. Move large results out of the system, transfer all others. 4. Remove job from “WorkQ”.

33 Solution: Compress and move large files to cloud storage for direct return to user via hyperlink. [Diagram: CIPRES DB (Tasks, Running tasks) and Execution Hosts.] loadResultsD: 1. Find all “done” tasks. 2. Ask how big the results are. 3. Move large results out of the system, transfer all others. 4. Remove job from “WorkQ”. 500+ users have required file downloads by this transfer mechanism.
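
The revised `loadResultsD` pass can be sketched as a size check that routes each result set to the database or to cloud storage. The 1 GB threshold and all names below are hypothetical. The `in_flight` guard is an addition of this sketch rather than something the slides describe; it shows one common way to keep a later pass from restarting a transfer that is still running.

```python
from typing import Dict, List, Set, Tuple

LARGE_BYTES = 1024 ** 3  # hypothetical threshold: 1 GB


def plan_transfer(size_bytes: int, threshold: int = LARGE_BYTES) -> str:
    """Route large results out of the system; keep small ones in the DB."""
    return "cloud" if size_bytes >= threshold else "db"


def load_results_pass(work_q: Set[int], sizes: Dict[int, int],
                      in_flight: Set[int]) -> List[Tuple[int, str]]:
    """One daemon pass: return the (task_id, destination) transfers it starts."""
    started = []
    for task_id in sorted(work_q):
        if task_id in in_flight:
            continue  # a previous pass is still transferring this task
        in_flight.add(task_id)
        started.append((task_id, plan_transfer(sizes[task_id])))
    return started


if __name__ == "__main__":
    work_q = {1, 2}
    sizes = {1: 10 * 1024 ** 2, 2: 5 * 1024 ** 3}
    in_flight: Set[int] = set()
    print(load_results_pass(work_q, sizes, in_flight))  # first pass starts both
    print(load_results_pass(work_q, sizes, in_flight))  # second pass starts none
```

When a transfer completes, the caller would remove the task from both `work_q` and `in_flight`.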

34 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs Step 3: let user behavior/usage-created needs drive improvements Step 4: manage challenges that threaten productivity of high-end users

35 Other issues also arose: GridFTP proved unreliable at high load. Response: move to local Lustre file systems. Under load, a MySQL bug prevented the DB connections from releasing, choking the web app. Response: refactor how the DB manages files.

36 Other issues also arose: The Lustre file system is not good for many Biology codes, so we moved to NFS.

37 Other issues also arose: The Lustre file system is not good for many Biology codes, so we moved to NFS. Lustre failures on long jobs cause a surge in resource use.

38 The issue with issues: These issues were dealt with in fire-drill mode; users were stymied and frustrated. On average, 30-45% of developer time is spent dealing with these issues. Some days or weeks, all forward progress is halted. But on the other hand, making your existing users happy is the first priority.

39 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs Step 3: let user behavior/usage-created needs drive improvements Step 4: manage challenges that threaten productivity of high-end users Step 5: stay in touch with your community

40 Provide many points of contact

41 When a project belongs to the community…

42 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs Step 3: let user behavior/usage-created needs drive improvements Step 4: manage challenges that threaten productivity of high-end users Step 5: stay in touch with your community Step 6: embrace customer service

43 Set aside time for user issues

44 The goals are: No more than 24 h response time. Foster a supportive and helpful culture. Make it clear that trouble reports are a gift to CIPRES, not an annoyance.

45 Tactics for Gateway Success: Step 1: identify a user population in need Step 2: commit to responding to users’ needs Step 3: let user behavior/usage-created needs drive improvements Step 4: manage challenges that threaten productivity of high-end users Step 5: stay in touch with your community Step 6: embrace customer service Step 7: innovate as funds permit

46 There are highly evolved legacy desktop/browser applications that help with matrix assembly, but have no tree inference tools or are underpowered: raxmlGUI

47 There are projects that offer powerful and distinct user experiences, and are interested in incorporating powerful tree inference tools into an existing application:

48 We received funding to create a public CIPRES RESTful API (CRA) to help with these use cases. [Diagram: raxmlGUI, CSG, XSEDE, Parallel codes]

49

50 Use Cases: MorphoBank and REST Services MorphoBank provides powerful visual tools for creating and sharing data matrices among large teams. [Diagram: MorphoBank and MB-DB: Character Recording, Character Matrix Assembly, Team Data Sharing, Character Quantification, Character Visualization, Character Matrix Publication]

51 Use Cases: MorphoBank and REST Services But it has no concept of trees or tree inference. [Same diagram.]

52 Use Cases: MorphoBank and REST Services The CIPRES RESTful API allows users to proceed with their workflow within the MorphoBank environment. [Diagram adds: CRA, XSEDE, Parallel codes]

53

54 Use Cases: Mesquite and REST Services Mesquite provides powerful visual tools for pre- and post-tree tasks on the desktop. [Diagram: Mesquite (Desktop): Tree Display, Tree Editing, Tree Reconciliation, Sequence Editing, Sequence Assembly, Tree Analysis]

55

56 Use Cases: Mesquite and REST Services But its tree inference is limited by the desktop hardware. [Same diagram.]

57 Use Cases: Mesquite and REST Services The CIPRES RESTful API provides the needed compute power without leaving the app. [Diagram adds: CRA, XSEDE, Parallel codes]

58 Many advanced developers find the workflow supported by the CIPRES browser too restrictive.

59 Use Cases: Individual developers and REST Services Advanced phylogenetic researchers want: to run many jobs simultaneously; to create ad hoc workflows. Advanced phylogenetic researchers don’t want: to assemble and click each job one at a time; to manually port the output of one job to the subsequent job in their workflow.

60 Use Cases: Individual developers and REST Services Assuming modest scripting skills, an advanced researcher can accomplish this goal using the CIPRES RESTful API to avoid the clumsy browser interface. [Diagram: Scripting Tools, CRA, XSEDE, Parallel codes]
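
The scripting pattern this use case calls for (many jobs at once, with the output of one job fed to the next) can be sketched independently of any particular API. In the sketch below, `run_job` is a hypothetical callable standing in for "submit a job to the REST service and wait for its output"; the tool names `ALIGNER` and `TREE_INFERENCE` are placeholders, and no actual CIPRES endpoints or parameters are shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# (tool name, input text) -> output text; supplied by the caller.
RunJob = Callable[[str, str], str]


def align_then_infer(run_job: RunJob, sequences: str) -> str:
    """Chain two jobs: alignment output becomes tree-inference input."""
    alignment = run_job("ALIGNER", sequences)
    return run_job("TREE_INFERENCE", alignment)


def run_many(run_job: RunJob, datasets: Dict[str, str],
             max_workers: int = 4) -> Dict[str, str]:
    """Run the two-step workflow for every dataset concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(align_then_infer, run_job, seqs)
                   for name, seqs in datasets.items()}
        return {name: fut.result() for name, fut in futures.items()}


def fake_run_job(tool: str, text: str) -> str:
    """Offline stand-in so the sketch runs without a network connection."""
    return f"{tool}({text})"


if __name__ == "__main__":
    print(run_many(fake_run_job, {"geneA": "seqsA", "geneB": "seqsB"}))
```

Threads suit this pattern because each worker spends nearly all its time waiting on a remote job rather than computing locally.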

61 The REST API was released in October 2014, and announced formally in January 2015. It is available through: MorphoBank; Influenza Research Database; Virus Pathogen Resource (ViPR); Tree-Based Alignment Selector (TBAS); raxmlGUI. Coming soon: Mesquite; siMBa; BioKepler.

62 Advantages of offering REST services: Preserves the investment in creating and learning to use complex software environments. Makes interaction with the application more flexible for individuals with scripting skills.

63 But where are the individual scripters we expected?

64 Perhaps the REST API has too high a barrier to entry.

65 [Diagram: Tool XML. Browser path: Web Form, Parameter map, Front end Validation (JavaScript; Struts), Backend validation, Command Line. REST path: REST Client, Parameter map, Backend validation, Command Line.]
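
The backend step in this diagram, turning a submitted parameter map into a command line only after validating it against a tool definition, might look like the sketch below. `TOOL_SPEC` is a hypothetical dictionary standing in for a tool XML file; the program name, flags, and limits are invented for illustration and do not describe any real CIPRES tool.

```python
import shlex
from typing import Any, Dict, List

TOOL_SPEC: Dict[str, Any] = {  # hypothetical stand-in for a tool XML definition
    "program": "mytool",
    "params": {
        "threads": {"flag": "-T", "type": int, "min": 1, "max": 64},
        "model": {"flag": "-m", "type": str, "choices": {"GTR", "JC"}},
    },
}


def build_command(spec: Dict[str, Any], param_map: Dict[str, str],
                  infile: str) -> List[str]:
    """Validate every parameter against the spec, then build an argv list."""
    argv = [spec["program"]]
    for name, raw in param_map.items():
        rule = spec["params"].get(name)
        if rule is None:
            raise ValueError(f"unknown parameter: {name}")
        value = rule["type"](raw)  # int("abc") raises ValueError
        if "choices" in rule and value not in rule["choices"]:
            raise ValueError(f"{name}: {value!r} is not an allowed value")
        if "min" in rule and value < rule["min"]:
            raise ValueError(f"{name}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            raise ValueError(f"{name}: above maximum {rule['max']}")
        argv += [rule["flag"], str(value)]
    argv.append(infile)
    return argv


if __name__ == "__main__":
    argv = build_command(TOOL_SPEC, {"threads": "4", "model": "GTR"}, "in.phy")
    print(shlex.join(argv))  # shlex.join requires Python 3.8+
```

Because only whitelisted parameters with validated values reach the argument list, and the list is never passed through a shell, a value such as `GTR; rm -rf /` is rejected rather than executed. The same backend check serves both the web form and the REST client.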

66 Perhaps the REST API has too high a barrier to entry. What next?

67 Perhaps the REST API has too high a barrier to entry. [Diagram: Tool XML. Browser path: Web Form, Parameter map, Front end Validation (JavaScript; Struts), Backend validation, Command Line. REST path: JavaScript GUI, REST Client, Parameter map, Backend validation, Command Line.]

68 Use Cases: Individual developers and REST Services Advanced phylogenetic researchers want: to run many jobs simultaneously; to create ad hoc workflows. Advanced phylogenetic researchers don’t want: to assemble and click each job one at a time; to manually port the output of one job to the subsequent job in their workflow.

69 Descriptive text Code cells Cell Controls

70 The Jupyter notebook has the following properties: Interleaving text and live code makes it easy to modify and share workflows. The information is stored as an easily sharable file that can be used in any Jupyter implementation with the proper software installed. Many scripting languages are supported. It supports interactively creating and modifying figures, and GUI interactions.

71 Create a CIPRES Notebook environment where: Notebooks in R and Python are supported (at least). A standard collection of phylogenetics scripting packages is available in each language. A forum is provided for notebook storage, exchange, and publishing. Users can submit to virtual HPC clusters on XSEDE resources.

72 Challenges: How to allow users to submit command lines without major security issues. How to make sure jobs are configured correctly and efficiently.

73 Workflow for the CIPRES Notebook Environment: Assemble Sequences; Upload to Portal; Run Alignment; Run Tree Inference; Download; Post-Tree Analysis; Store. [Diagram label: CIPRES Gateway]

74 The expanded workflow becomes more tractable in the Notebook Environment because users have the ability to recruit tools, and design their own workflows. Will the barrier to entry be too high?

75 How will SciGaP help us? 7/13/2014

76 How will SciGaP help us? For all apps: As we delve into providing access via the CIPRES Notebook, CIPRES job submissions and middleware can be taken over by SciGaP. This would allow all Gateway developers (Terri and Kenneth, for example) to focus primarily on creating the new interface, while the heavy lifting required of the production application is taken over by SciGaP. Recall that in our team, 30-45% of developer time is spent on putting out fires in the middleware. We would love to give those issues to SciGaP.

77 Acknowledgements: Terri Schwartz – Lead developer; Wayne Pfeiffer – HPC Expertise; Paul Hoover – Database/Backend; Mona Wong – Interface

