Published by Felicia Wilkinson. Modified over 9 years ago.
1
BioCoRE and GEMS: Cyber Infrastructure for Cyber Chemistry
Jesús A. Izaguirre, Computer Science & Engineering, University of Notre Dame
with Kirby Vandivort, NIH Resource for Macromolecular Modeling and Bioinformatics, University of Illinois
2
BioCoRE and GEMS 3 October 2004
Overview I
Chemical applications such as virtual screening, protein kinetics and structure, and analysis and validation of molecular simulations require enormous resources that can be provided by CyberInfrastructure
Successful solution of these problems requires collaborative approaches, also facilitated by CyberInfrastructure
3
Overview II
To make CyberInfrastructure effective, the following issues must be addressed:
Users of CyberInfrastructure need a data-centric way of managing their computations and data
Distributed databases on the grid need to address the problem of reliability and fault tolerance of data
4
Overview III
We will study examples of collaborative software that address these issues, primarily:
–BioCoRE: A Collaboratory for Structural Biology
–GEMS: Grid Enabled Molecular Simulations Toolset and Database
5
Sample CyberScience Projects
Collaborative Biophysics: BioCoRE (K. Schulten, Illinois)
Virtual Screening: The Screensaver Project (W.G. Richards, Oxford)
Protein Kinetics: Folding@Home (V. Pande, Stanford)
Distributed Database of Molecular Simulations: BioSimGrid (M. Sansom, Oxford)
6
What is BioCoRE?
BioCoRE: a collaborative work environment for biomedical research, research management and training.
BioCoRE assists the entire research process, from talking with collaborators to performing simulations and collecting data, to preparing papers and reports.
7
Sharing Documents
With BioFS and WebDAV, scientists can exchange and edit files from anywhere with a web connection.
8
Setting Up and Running Simulations
NAMDCFG: A “Simulation Setup Wizard”
Online help and error checking for NAMD input files
Job submission to supercomputers simplified
Job status monitored for easy retrieval
Job data archived for future reference
9
Sharing Molecular Views
Using VMD and BioCoRE, collaborators may exchange and manipulate 3-D models of molecules
Emphasis on collaborative sessions.
Streamlined process of sharing views.
10
Communicating
Control Panel provides instant messaging and notifications
BioCoRE also provides message boards, a Web site library, and a lab book
11
Programming Interface
Provides a way for users to interact with BioCoRE programmatically.
Communication (Control Panel), shared states (VMD), WebDAV
12
Availability
Free
Can be accessed from the Illinois site, or the server software can be installed locally
Server software can be modified if necessary
http://www.ks.uiuc.edu/Research/biocore/
13
Virtual Screening
Combinatorial Complexity
Lead Exploration
Screen docking affinities based on a scoring function (interaction energies, RMSD, etc.)
Modeled as an all pairs problem
Logically independent computational requirements are well suited for wide area grid distribution
[Figure labels: Leads (ligands) L0001, L0002, L0003, L0004, L0005]
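The all pairs formulation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a real docking score, the receptor names are invented, and only the ligand labels L0001 to L0005 come from the slide. The point is that every (receptor, ligand) pair is an independent task, so the list of pairs can be handed to any grid scheduler.

```python
from itertools import product


def toy_score(receptor: str, ligand: str) -> float:
    """Hypothetical stand-in for a docking score (lower is better).

    A real scoring function would use interaction energies, RMSD, etc.
    """
    return (sum(map(ord, receptor)) * 31 + sum(map(ord, ligand)) * 17) % 100 / 10.0


def all_pairs(receptors, ligands):
    """Enumerate every (receptor, ligand) task; each one is logically independent."""
    return list(product(receptors, ligands))


def screen(receptors, ligands, score=toy_score):
    """Score every pair and return (score, receptor, ligand) tuples, best first."""
    results = [(score(r, l), r, l) for r, l in all_pairs(receptors, ligands)]
    return sorted(results)


# Hypothetical receptor conformations; ligand labels taken from the slide.
receptors = ["receptor-conf1", "receptor-conf2"]
ligands = ["L0001", "L0002", "L0003", "L0004", "L0005"]
ranked = screen(receptors, ligands)
```

With R receptor structures and L ligands the task count is R × L, which is why adding protein conformations or ligands grows the workload multiplicatively.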
14
CyberInfrastructure Needs for Virtual Screening I
Incorporate protein (receptor) flexibility
–Use multiple protein structures (hierarchical representations and algorithms)
Iterative refinement of results
–Add new protein conformations to improve docking
–Use higher resolution models for promising hits (integration of data and work flow)
–Monitor status of results (not just jobs running)
15
CyberInfrastructure Needs for Virtual Screening II
Manage computation and storage in the grid
–Declarative rather than imperative specification
Automate usage of algorithms / tools
–Select software and optimal parameters for algorithms (recommender system)
–Example: MDSimAid (http://mdsimaid.cse.nd.edu) selects optimal MD simulation protocol (limited options)
16
BioSimGrid
Mark S. P. Sansom, Oxford
Trajectory data stored in relational database tables per Data Schema
Semi-automated deposition of trajectory files for certain formats (CHARMM, NAMD, etc.)
Trajectory analysis modules
Future goal to distribute database
Database for biomolecular simulations
–Specifically: molecular dynamics trajectories
Facilitate validation and analysis of simulations
Provides “independence” from the specific simulation semantics (configuration parameters, architecture, simulation tools, etc.)
17
CyberInfrastructure Needs for Distributed Databases I
Metadata for trajectories
–Simulation protocol, software, etc.
Distribution on the grid
–Storage fault tolerance / reliability
–Scalable solution: reduce storage requirements and centralization
18
CyberInfrastructure Needs for Distributed Databases II
Data-driven model for the user
–Data organized around key themes (trajectories, molecules)
Generic tools for developers
–Applicable to different applications
19
Solving Integration Problem
We need to capture the data flow and the work flow
–Ecce project
–XML metadata
–Component architectures (e.g., JavaBeans, Common Component Architecture)
20
Solving Integration Problem
BioCoRE (K. Schulten, Illinois)
–Use of programming interface
–Provides multiple services to applications (web file system, job management, shared visualization)
21
Solving Grid Management
Current grid tools are task oriented: run this particular simulation code with these input files, etc.
–Web portals are an incremental improvement over command-line or stand-alone applications
Problem: Controlling multiple resources
–For example, create 10,000 tasks and keep track of the data, as might be needed for virtual screening or @home applications
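The bookkeeping problem can be illustrated with a small sketch that tracks results rather than jobs. Everything here is an assumption made for illustration: the `ResultTracker` class, its state names, and the (target, ligand) keys are hypothetical and are not part of any grid tool named in the talk.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResultTracker:
    """Tracks the status of each result, keyed by the data it describes."""

    status: dict = field(default_factory=dict)

    def create(self, keys):
        """Register results as pending without overwriting known ones."""
        for key in keys:
            self.status.setdefault(key, "pending")

    def mark(self, key, state):
        """Record a new state ("done", "failed", ...) for a known result."""
        if key not in self.status:
            raise KeyError(f"unknown result {key!r}")
        self.status[key] = state

    def summary(self):
        """Count results per state."""
        return Counter(self.status.values())

    def missing(self):
        """List results that still need computation."""
        return [k for k, s in self.status.items() if s != "done"]


tracker = ResultTracker()
# 100 hypothetical targets x 100 hypothetical ligands = 10,000 results.
tracker.create((t, l) for t in range(100) for l in range(100))
tracker.mark((0, 0), "done")
tracker.mark((0, 1), "failed")
```

Keying the table by the data item rather than by a job identifier means a resubmitted job updates the same entry, so the question "which results are still missing?" has a direct answer.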
22
Solving Grid Management with GIPSE
GIPSE: Grid Interface for Parameter-driven Simulation Environments
–Shift focus from management to research
–Result-driven interface
–Scripting capabilities
23
Solving Grid Management with GIPSE
Simplify process
–XML data format
–Missing “glue”
Powerful searches
–Optimizations
–Control loops
[Figure labels: GEMS Toolset; HIV-1 Protease]
24
Solving Grid Management with GIPSE
Manage data
–Storage
–Database retrieval
Monitor progress
–Status
–Application-specific
[Figure labels: GEMS Toolset; HIV-1 Protease]
25
GEMS Database Toolset
Grid Enabled Molecular Simulation
–Data centric
–Wide area distributed storage
–Researchers have data and resource autonomy
–Simulation configuration, input data files, and output data files identified via XML
–Centralized SQL locator
–Availability via replication
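As a rough illustration of identifying a simulation's configuration, inputs, and outputs via XML, the sketch below writes and reads a small descriptor with Python's standard `xml.etree.ElementTree`. The element and attribute names (`simulation`, `configuration`, `parameter`, `input`, `output`) and the file names are invented for this example; the actual GEMS schema is not given in the slides.

```python
import xml.etree.ElementTree as ET


def build_descriptor(sim_id, config, inputs, outputs):
    """Serialize one simulation record to XML (hypothetical schema)."""
    root = ET.Element("simulation", id=sim_id)
    cfg = ET.SubElement(root, "configuration")
    for name, value in sorted(config.items()):
        ET.SubElement(cfg, "parameter", name=name).text = str(value)
    for tag, files in (("input", inputs), ("output", outputs)):
        for path in files:
            ET.SubElement(root, tag, path=path)
    return ET.tostring(root, encoding="unicode")


def read_descriptor(xml_text):
    """Parse a descriptor back into plain Python values (all strings)."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "config": {p.get("name"): p.text for p in root.iter("parameter")},
        "inputs": [e.get("path") for e in root.findall("input")],
        "outputs": [e.get("path") for e in root.findall("output")],
    }


# Hypothetical identifiers, parameter names, and file names.
xml_text = build_descriptor(
    "sim-001",
    {"timestep_fs": 2, "cutoff_A": 10},
    ["protein.pdb", "protein.psf"],
    ["traj.dcd"],
)
info = read_descriptor(xml_text)
```

A descriptor of this kind is what a central locator could index, while the data files themselves stay on distributed storage.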
26
Reliability and Leveraged Availability via Runtime Imaging
Reliability of data storage is increased
User can trade off availability versus storage volume
Workspace data has 2-way redundancy by default
Archival data has a 2-way redundancy of fewer snapshots, but saves the computational images
For each computational run through the GEMS portal, a comprehensive runtime image is created from which the simulation can automatically be regenerated.
Runtime images include executable version and location, library requirements, hardware requirements, input files, and configuration parameters
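The contents of a runtime image, as listed above, map naturally onto a small record type. The sketch below is an assumption about how such a record might look, not the GEMS implementation: the `RuntimeImage` class, its field names, and the SHA-256 fingerprint are all hypothetical choices made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RuntimeImage:
    """Everything needed to regenerate one run (hypothetical field names)."""

    executable_location: str
    executable_version: str
    libraries: tuple      # library requirements
    hardware: str         # hardware requirements
    input_files: tuple
    parameters: tuple     # sorted (name, value) pairs

    def fingerprint(self) -> str:
        """Stable identifier: identical images give identical digests."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical values throughout.
image = RuntimeImage(
    executable_location="/opt/md/bin/md_engine",
    executable_version="1.0",
    libraries=("libfft",),
    hardware="x86, 1 GB RAM",
    input_files=("protein.pdb", "protein.psf"),
    parameters=(("cutoff_A", 10), ("timestep_fs", 2)),
)
```

Because the image is small compared with trajectory snapshots, storing it alongside fewer archived snapshots is one way to read the slide's trade-off: lost output can be recomputed from the image instead of being held in additional replicas.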
27
Integration of Distributed Data Into New Simulations
A grid distributed “make” based on a computational requirement over a set parameter sweep
–Example: optimize MD simulation protocol
Before starting the sweep, a query determines data points that are up to date and those that require computation (including regeneration)
–Example: keep current list of results of virtual screening as more computations are performed or targets and ligands added
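The “make”-like query can be sketched as follows. This is a minimal illustration under assumptions: the in-memory `store` dictionary stands in for the distributed database, the `version` field stands in for whatever freshness criterion is actually used, and the parameter names are invented.

```python
from itertools import product


def point(**params):
    """Canonical key for one data point: sorted (name, value) pairs."""
    return tuple(sorted(params.items()))


def sweep(parameters):
    """Yield every point of the parameter sweep in canonical form."""
    names = sorted(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield tuple(zip(names, values))


def plan(parameters, store, current_version):
    """Split the sweep into up-to-date points and points needing computation."""
    up_to_date, to_compute = [], []
    for p in sweep(parameters):
        entry = store.get(p)
        if entry is not None and entry["version"] == current_version:
            up_to_date.append(p)
        else:
            to_compute.append(p)  # missing or stale: compute or regenerate
    return up_to_date, to_compute


# Hypothetical sweep over an MD protocol: 3 timesteps x 2 cutoffs = 6 points.
parameters = {"timestep_fs": [1, 2, 4], "cutoff_A": [8, 10]}
store = {
    point(timestep_fs=1, cutoff_A=8): {"version": 2},
    point(timestep_fs=2, cutoff_A=8): {"version": 2},
    point(timestep_fs=4, cutoff_A=8): {"version": 1},  # stale result
}
fresh, todo = plan(parameters, store, current_version=2)
```

Only the points in `todo` are submitted to the grid; adding a new parameter value later enlarges the sweep without recomputing the points already in `fresh`.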
28
Example: Validating Simulations
Locate specific published simulation configurations for benchmarking
Select pertinent input data files (pdb, psf, force fields, etc.) for direct use in a new simulation for the purpose of comparison/contrast.
Researcher B wants to vary certain parameters of Researcher A’s published simulation to test her new MD integrator
29
Acknowledgments
Collaborators in GIPSE and GEMS:
–Aaron Striegel
–Doug Thain
–Jeff Peng
Students
–Paul Brenner
–Santanu Chatterjee
Funding from NSF Career and Biocomplexity
Klaus Schulten
BioCoRE Team:
–Robert Brunner
–Michael Bach
–David Brandon
BioCoRE funding from NIH