GridChem Refactoring: Workflows in GridChem Sudhakar Pamidighantam


1 GridChem Refactoring: Workflows in GridChem Sudhakar Pamidighantam spamidig@ncsa.edu

2 Acknowledgements Rion Dooley, TACC; Suresh Marru, IU; Yang Liu, NCSA; Raman Sandhu, IU; Nikhil Singh, NCSA; Gaurang Mehta, ISI; NSF and TeraGrid

3 GridChem Refactoring Outline GridChem functioning New needs and requirements Refactoring ParamChem Future usage

4 GridChem Functioning CCG Virtual Organization: allocations, consulting, user services. Three-tier architecture: client, middleware, and resources. Job-based interactions with services. Data archived on storage resources. A previous talk has some details: http://teragridforum.org/mediawiki/images/e/ed/Sudhakar.NCSA12.4.08.ppt

5 GridChem Usage

6 Science from GridChem Gateway

7 GridChem New Needs New projects and collaborations. Multiscale modeling for Material Science, Prof. Duane Johnson, UIUC. QM-QMC coupled applications. ParamChem, Profs. MacKerell (UMB), Roitberg (UF), Connolly (UKy), Pamidighantam (UIUC/NCSA). QM-MM-MD coupled applications. Advanced users requiring potential energy hypersurface computations, ab initio MD, and parameter sweeps. Workflows.

8 GridChem Refactoring Code refactoring is the process of changing a computer program's source code without modifying its external functional behavior in order to improve some of the nonfunctional attributes of the software. Advantages include improved code readability and reduced complexity to improve the maintainability of the source code, as well as a more expressive internal architecture or object model to improve extensibility. “By continuously improving the design of code, we make it easier and easier to work with. This is in sharp contrast to what typically happens: little refactoring and a great deal of attention paid to expediently adding new features. If you get into the hygienic habit of refactoring continuously, you'll find that it is easier to extend and maintain code.” Joshua Kerievsky, Refactoring to Patterns [1] [1] http://en.wikipedia.org/wiki/Code_refactoring

9 Client New Features The new client has no server dependencies. The client library contains the new service stubs, which need Axis2 dependencies. Axis2 handles fast communication with the service and streaming data. DTO (Data Transfer Object) classes are now implemented as the service beans that hold information sent from the service. They are a little less redundant: for example, with the new software bean, the client no longer has to parse through abstract data formats to figure out the multitude of application/HPC resource combinations.
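
As a rough illustration of the bean idea on this slide, the sketch below shows a typed transfer object whose application/resource combinations can be read directly, with no generic format to parse. The class and field names (`SoftwareBean`, `ResourceBean`, `queues`) and the host names are hypothetical, not the actual GridChem classes, and the sketch is in Python rather than the Java used by the real client.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceBean:
    """Hypothetical bean describing one HPC resource."""
    host_name: str
    queues: List[str] = field(default_factory=list)


@dataclass
class SoftwareBean:
    """Hypothetical bean: one application and the resources it runs on."""
    name: str
    resources: List[ResourceBean] = field(default_factory=list)

    def hosts(self) -> List[str]:
        # The application/resource combinations are plain attributes,
        # so the client does not need to parse any intermediate format.
        return [r.host_name for r in self.resources]


bean = SoftwareBean(
    name="Gaussian",
    resources=[
        ResourceBean("hpc-a.example.org", ["normal"]),
        ResourceBean("hpc-b.example.org", ["debug", "normal"]),
    ],
)
print(bean.hosts())
```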

10

11 GridChem Client Nanocad 3D

12 GridChem Refactoring Preprocessing Tools GDIS Tubegen JMol MolGen

13 GridChem-Xbaya Composer

14 Server New Features The Axis2 implementation also makes the service available via HTTP, SOAP, and RPC. The database underwent a cleanup and schema changes, which makes the services quicker.

15 Server New Features Single service distribution into Tomcat 5.5. Uses JNDI and Tomcat’s connection pooling for much better stability; JNDI provides standardised naming for directory services. Removed GAT; the service now uses the Java CoG Kit based GridFTP client directly, for a performance increase on remote directory listings, file transfers, and I/O operations.

16 Server New Features Multiple file upload support with a service-side user cache: several applications and workflows require multiple files for execution, and users can upload input files and browse or retrieve previously uploaded files. Job queue prediction via QBets is available on the client side; QBets provides a way to select resources automatically. Resource monitoring plugs into the TeraGrid GPIR and IIS instances for accurate, effort-free resource monitoring, discovery, and access; GPIR and IIS provide system and service information for various resources (HPC systems).
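
One way to picture automatic resource selection from queue predictions is to choose the resource with the shortest predicted wait. This is only a sketch under that assumption: the wait times below are made-up inputs standing in for values a prediction service would supply, and `pick_resource` is a hypothetical helper, not part of any QBets interface.

```python
from typing import Dict


def pick_resource(predicted_wait_s: Dict[str, float]) -> str:
    """Return the resource whose predicted queue wait is smallest."""
    if not predicted_wait_s:
        raise ValueError("no predictions available")
    return min(predicted_wait_s, key=predicted_wait_s.get)


# Made-up wait-time predictions in seconds for three hypothetical systems.
predictions = {"cluster-a": 5400.0, "cluster-b": 900.0, "cluster-c": 3600.0}
print(pick_resource(predictions))
```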

17 Server New Features Job updates are done via a RESTful trigger service: the batch scripts call out to the service with a secret service key, and the service updates the job status immediately. This provides state and session preservation. Database access times on all queries have been improved to sub-second performance. The overall service memory footprint has been cut down by 2 orders of magnitude from the current production version. Support has been added for ingesting TeraGrid users with their profile, project, and resource information, and for allowing them to use the CCG infrastructure with their current allocations. This will provide these services across TeraGrid user communities.
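
The trigger idea can be sketched as a handler that checks the shared key before touching the job record. Everything named here is an assumption for illustration: the key value, the job identifiers, the status strings, and the use of a dictionary in place of the service database. The real service is a web endpoint, which this sketch reduces to a plain function returning an HTTP-style status code.

```python
import hmac
from typing import Dict

SERVICE_KEY = "example-secret"                      # hypothetical shared secret
job_status: Dict[str, str] = {"job-42": "QUEUED"}   # stand-in for the database


def handle_trigger(job_id: str, new_status: str, key: str) -> int:
    """Process one callout from a batch script; return an HTTP-style code."""
    # Constant-time comparison avoids leaking the key through timing.
    if not hmac.compare_digest(key.encode(), SERVICE_KEY.encode()):
        return 403
    if job_id not in job_status:
        return 404
    job_status[job_id] = new_status
    return 200


# A batch script reporting that job-42 has started running.
print(handle_trigger("job-42", "RUNNING", SERVICE_KEY))
```

Because the batch script pushes the change itself, the status is current as soon as the call returns, with no polling interval to wait for.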

18 Server New Features CGI scripts are now bundled with the service and deployed into Tomcat rather than an Apache server. Software access control has been implemented: the BlackList table holds a list of users who are denied access to a Software record.
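
A deny-list check of this kind can be sketched with an in-memory SQLite table. Only the table name `BlackList` comes from the slide; the column names, the sample row, and the `may_use` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BlackList (userName TEXT, softwareName TEXT);
    INSERT INTO BlackList VALUES ('alice', 'Gaussian');
""")


def may_use(conn: sqlite3.Connection, user: str, software: str) -> bool:
    """Access is allowed unless a matching BlackList row exists."""
    row = conn.execute(
        "SELECT 1 FROM BlackList WHERE userName = ? AND softwareName = ?",
        (user, software),
    ).fetchone()
    return row is None


print(may_use(conn, "alice", "Gaussian"))
print(may_use(conn, "bob", "Gaussian"))
```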

19 Workflow Selection/Execution XBaya is a graphical client program for workflow composition, monitoring, and more. Different Web services can be invoked at different steps. Data from intermediate steps is stored in databases for future reuse. Each step in the workflow can be monitored.

20 Integration of Xbaya and GridChem Data

21 ParamChem Middleware Requirements Broad Goals Parameterization User (community) management Parameterization process management Workflows Data management Archival and Retrieval requirements

22 Cyberenvironments for Parameterization Computational Reference Data Generation

23 Molecular Force Field Cyberenvironments Parameter Initialization and Optimization Workflow Parameter definitions Workflow For Empirical Parameter Optimization Model/Reference Data Definition Merit Function Specification Consistency Checker Optimization Methods Choice Optimization Job Launcher Update Parameter Database with new set Job Manager Optimization Incomplete? Parameter testing Model Successful Testing Optimization Monitor Optimization Job Completed? Parameter Sensitivity Analysis Notification of End of Workflow Expert Interface
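
The loop implied by the boxes above (reference data, merit function, optimization step, "Optimization Incomplete?" check) can be sketched on a toy problem. This is not the ParamChem procedure: the one-parameter model `k * x`, the reference values, and plain gradient descent are all stand-ins chosen so the example is small enough to follow.

```python
from typing import Sequence


def merit(k: float, xs: Sequence[float], ref: Sequence[float]) -> float:
    """Sum of squared deviations between the toy model k*x and reference data."""
    return sum((k * x - y) ** 2 for x, y in zip(xs, ref))


def optimize(xs: Sequence[float], ref: Sequence[float], k: float = 0.0,
             step: float = 0.02, tol: float = 1e-10,
             max_iter: int = 200) -> float:
    """Repeat optimization steps until the merit function drops below tol."""
    for _ in range(max_iter):
        if merit(k, xs, ref) < tol:       # optimization no longer incomplete
            break
        grad = sum(2 * x * (k * x - y) for x, y in zip(xs, ref))
        k -= step * grad                  # update the parameter with the new value
    return k


xs = [1.0, 2.0, 3.0]
ref = [2.0, 4.0, 6.0]    # made-up reference data, generated with k = 2
k_opt = optimize(xs, ref)
print(round(k_opt, 4))
```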

24 Parameterization Menus and Data structures

25 CHARMM-Gaussian Workflow in XBaya, a workflow management system

26 ParamChem Web Services Client ↔ Objects ↔ Database Interaction. WS Resources, DTO Objects, Hibernate, Database, hb.xml. Client. DTO (Data Transfer Object): serialized transfer through XML. DAO (Data Access Object): how to get the DB objects. hb.xml (Hibernate Data Map): describes object/column data mapping. Business Model. DAO.
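
The division of labour between a DAO and a DTO can be sketched as follows. This is an illustration only: Python's `sqlite3` stands in for Hibernate, the `Users` table layout and the `UserDTO`/`UserDAO` names are assumptions, and the XML shape is invented for the example rather than taken from the service.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UserDTO:
    """Hypothetical transfer object; the fields are chosen for illustration."""
    user_id: int
    login_name: str

    def to_xml(self) -> str:
        # Serialize each field as a child element for transfer to the client.
        root = ET.Element("user")
        for key, value in asdict(self).items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")


class UserDAO:
    """Hides the SQL; callers only ever see DTOs."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def find(self, user_id: int) -> Optional[UserDTO]:
        row = self.conn.execute(
            "SELECT userID, loginName FROM Users WHERE userID = ?", (user_id,)
        ).fetchone()
        return UserDTO(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (userID INTEGER PRIMARY KEY, loginName TEXT)")
conn.execute("INSERT INTO Users VALUES (1, 'demo')")
print(UserDAO(conn).find(1).to_xml())
```

In the real stack the column-to-field mapping written by hand in `find` is what the Hibernate mapping file describes declaratively.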

27 ParamChem Data Models Users ParamProjects Resources UserProjectResource SoftwareResources ComputeResources NetworkResources StorageResources Resources resourceID Type hostName IPAddress siteID userID paramprojectID resourceID loginName SUsLocalUserUsed Workflows WFID WFName userID projID RegWFID cost UsersResources JobID JobName userID projID softID cost WF Node/Job DataResources

28 ParamChem Resource following CCG Class Dependencies

29 ParamChem Middleware Services (PMS) Use Cases Authentication Workflow Selection/Creation Workflow Configuration Workflow Submission Workflow Resource Monitoring Workflow Monitoring Data Resource Monitoring/Organization Workflow Results Retrieval/Organization …

30 PMS Authentication (follows GridChem Middleware Services) WSDL (Web Service Definition Language) is a language for describing how to interface with XML-based services. It describes network services as a pair of endpoints operating on messages with either document-oriented or procedure-oriented information. The service interface is called the port type. WSDL FILE: <definitions name="MathService" targetNamespace="http://www.globus.org/namespaces/examples/core/MathService_instance" xmlns="http://schemas.xmlsoap.org/wsdl/" … http://www.gridchem.org:8668/space/GMS/usecase Retrieve UserProjects (GetResourceProperty Port Type [PT]) Contact PMS Creates Session, Session RP and EPR, Ind.User.Comm Sends EPR (like a cookie, but more than that) Login Request (username:passwd) Validates, Loads UserProjects, Data, WFRegistries Sends acknowledgement ParamChem Client, PMS

31 PMS Authentication follows GMS_WS http://www.gridchem.org:8668/space/GMS/usecase Selects project LoadVO port type (w. MAC address) Verifies user/project/MACaddr Load UserResources RP Retrieve UserResources [as userVO/Profile] (GetResourceProperty port type PT) ParamChem Client, PMS Validates, Loads UserProjects, Data, WFs Sends acknowledgement

32 PMS Workflow Submission Create WF object PredictWFStartTime PT + WF DTO ----> Node/Job DTOs Node/JobStart Prediction RP PT = portType RP = Resource Properties DTO = Data Transfer Object Completion: Email from batch system to PMS server cron@PMS → DB Submission Xbaya GFAC CoGKit GAT “gsi-ssh” If decision OK, Submit Workflow PT + WFDTO and JobDTOs Create WF object API—Submit Store WF Object Send Acknowledgement Need to check that allocation time is available and that the workflow is sane and executable. ParamChem Client, PMS
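
The two pre-submission checks named on this slide (allocation time is available; the workflow is sane and executable) can be sketched as below. "Sane" is interpreted here as "the node dependencies contain no cycle", which is an assumption, as are the node names, the cost figures, and the `can_submit` helper. The standard-library `graphlib` module requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def can_submit(deps: Dict[str, List[str]], cost: Dict[str, float],
               balance: float) -> bool:
    """Accept a workflow only if it is acyclic and fits the allocation.

    deps maps each node to the nodes that must finish before it starts.
    """
    try:
        # static_order() raises CycleError when the nodes cannot be ordered.
        order = list(TopologicalSorter(deps).static_order())
    except CycleError:
        return False
    return sum(cost[node] for node in order) <= balance


# Hypothetical two-node workflow: a Gaussian job feeding a CHARMM job.
deps = {"gaussian": [], "charmm": ["gaussian"]}
cost = {"gaussian": 40.0, "charmm": 10.0}   # made-up service-unit estimates
print(can_submit(deps, cost, balance=100.0))
```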

33 ParamChem Middleware Services Monitoring Parse XML, Display PT = portType RP = Resource Properties DTO = Data Transfer Object DB = Database cron@PMS server Xbaya Monitoring Server cron@HPC Servers Job Launcher Notifications VO Admin email parses email → DB (status + cost) Request for Job, Resource Status Alloc. Balance UserResource RP Updated from DB ParamChem Client, PMS, Resources/Kits/DB Send info Discover Applications (Software Resources) Discover Data (Data Resources) Monitor Workflow Schedulers (Primary) Monitor System (Secondary) Monitor Queues (Tertiary) Workflow Status Updates (automated) Node/Job Status Display

34 PMS Workflow Status Workflow Status WFDTO.status WF XBaya Launcher Status Update Estimate Start time Scheduler emails/notifications Notifications: Client, Management, email, IM ParamChem Client, PMS, Resources/Kits/DB

35 PMS Data Organization Retrieval (MSS) GetResourceProperty PT FileDTO(?) LoadFile PT (project folder + job) Validates project folder owned by user. Send new listing PT = portType RP = Resource Properties DTO = Data Transfer Object MSS = Mass Storage System Job Completion, Workflow Completion Send Output to MSS LoadFile PT MSS query UserFiles RP + FileDTO object Retrieve Root Dir. Listing on MSS with CoGKit or GAT or “gsi-ssh” API file request Store locally Create FileDTO Load into UserData RP RetrieveFiles PT (+ file rel. path) Retrieve file: CoGKit or GAT or “gsi-ssh” GetResourceProperty PT ParamChem Client, PMS, Resources/Kits/DB

36 ParamChem File Retrieval PT = portType RP = Resource Properties DTO = Data Transfer Object MSS = Mass Storage System Create FileDTO (?) Load into UserData RP RetrieveJobOutput PT (+ JobDTO) Job Record from DB. Running: from Resource. Complete: from MSS. Retrieve file: CoGKit or GAT or “gsiftp” GetResourceProperty PT ParamChem Client, PMS, Resources/Kits/DB
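
The retrieval rule on this slide (output of a running job comes from the compute resource, output of a completed job comes from the MSS) amounts to a small dispatch on the job record. The state strings and return values below are hypothetical labels; the slide does not say what happens for other job states, so the sketch raises an error for them.

```python
def output_source(job_state: str) -> str:
    """Pick where to fetch job output from, following the rule on the slide."""
    if job_state == "RUNNING":
        return "resource"   # read directly from the compute resource
    if job_state == "COMPLETE":
        return "mss"        # read from the mass storage system
    raise ValueError(f"no output location defined for state {job_state!r}")


print(output_source("RUNNING"))
print(output_source("COMPLETE"))
```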

37 ParamChem Web Services WSRF (Web Services Resource Framework) Compliant WSRF Specifications: WS-ResourceProperties (WSRF-RP) WS-ResourceLifetime (WSRF-RL) WS-ServiceGroup (WSRF-SG) WS-BaseFaults (WSRF-BF) %ps -aux | grep ws /usr/java/jdk1.5.0_05/bin/java \ -Dlog4j.configuration=container-log4j.properties \ -DGLOBUS_LOCATION=/usr/local/globus \ -Djava.endorsed.dirs=/usr/local/globus/endorsed \ -DGLOBUS_HOSTNAME=derrick.tacc.utexas.edu \ -DGLOBUS_TCP_PORT_RANGE=62500,64500 \ -Djava.security.egd=/dev/urandom \ -classpath /usr/local/globus/lib/bootstrap.jar: /usr/local/globus/lib/cog-url.jar: /usr/local/globus/lib/axis-url.jar org.globus.bootstrap.Bootstrap org.globus.wsrf.container.ServiceContainer -nosec Logging Configuration Where to find Globus Where to get random seed for encryption key generation Classpath (required jars)

38 model dto credential job notification file file.task job.task user exceptions resource persistence synch query test util dao gpir crypt enumerators gat proxy GMS_WS client audit pms. Classes for WSRF service implementation (PT). Cmd line tests to mimic client requests. Data Access Obj – queries DB via persistent classes (Hibernate). Data Transfer Obj – (Job, File, Hardware, Software, User) XML. How to handle errors (exceptions). CCG Service business model (how to interact). Contains user’s credentials for job submission, file browsing, … Oversees correct handling of user data (get/put file). Define Job & util & enumerations (SubmitTask, KillTask, …). CCGResource & Util, synched by GPIR, abstract classes NetworkRes., ComputeRes., SoftwareRes., StorageRes., VisualizationRes. User (has attributes – Preference/Address). DB operations (CRUD), OR maps, pool mgmt, DB session. Classes that communicate with other web services. Periodically update DB with GPIR info (GPIR calls). JUnit service test (gms.properties): authen., VO retrieval, Res. Query, Synch, Job Mgmt, File Mgmt, Notification. Contains utility and singleton classes for the service. Encryption of login password. Mapping from GMS_WS enumeration classes ↔ DB. GAT util classes: GATContext & GAT Preferences generation. Classes that deal with CoGKit configuration. Autonomous notification via email, IM, text message.

