NetSchedule Push-Pull Model
[Figure: clients push jobs into the server's queues (Queue 1, Queue 2, …), each holding jobs (Job 1, Job 2, Job 3, …); worker nodes pull jobs.]
The NetSchedule server maintains several FIFO queues.
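The push-pull model can be pictured with a small in-process sketch. It assumes a hypothetical `FifoJobServer` class (not the NetSchedule API) that keeps one FIFO per queue name. Submitters push to the tail, and workers pull from the head.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-process model of a server holding several named FIFO
// queues. Not the NetSchedule API; it only illustrates push-pull ordering.
class FifoJobServer {
public:
    // Push: append a job to the tail of the named queue (created on demand).
    void Push(const std::string& queue, const std::string& job) {
        queues_[queue].push_back(job);
    }

    // Pull: remove and return the job at the head of the named queue,
    // or std::nullopt if the queue is empty or unknown.
    std::optional<std::string> Pull(const std::string& queue) {
        auto it = queues_.find(queue);
        if (it == queues_.end() || it->second.empty()) return std::nullopt;
        std::string job = it->second.front();
        it->second.pop_front();
        return job;
    }

private:
    std::map<std::string, std::deque<std::string>> queues_;
};
```

Because each queue is independent, jobs in `queue1` keep their submission order no matter how many jobs are pushed to `queue2`.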
Architecture
[Figure: client-submitters put jobs (Job 1, Job 2, Job 3) into NetSchedule queues (Queue 1, Queue 2, …), and a client waiting for its job calls GetStatus(). Worker nodes call GetJob() and PutResult(). Both an active worker node and a worker node waiting for a new job receive notifications over UDP/IP.]
Push-pull design (the NetSchedule server is passive)
Worker nodes can subscribe to queue event notifications
Notifications use a connectionless network protocol (UDP)
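The "passive server plus notifications" idea can be sketched in-process. In this hypothetical `PassiveQueue` (not a toolkit class), a plain callback stands in for the UDP datagram. The server never hands a job to a worker; it only sends a wake-up hint, and the worker then pulls with `GetJob()`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a passive queue that only *notifies* subscribers.
// In NetSchedule the hint travels over UDP; here it is a direct callback.
class PassiveQueue {
public:
    using Listener = std::function<void(const std::string& queueName)>;

    explicit PassiveQueue(std::string name) : name_(std::move(name)) {}

    // A waiting worker registers interest in queue events.
    void Subscribe(Listener listener) { listeners_.push_back(std::move(listener)); }

    // Submitting stores the job, then fires a fire-and-forget notification.
    void Submit(const std::string& job) {
        jobs_.push_back(job);
        for (const auto& notify : listeners_) notify(name_);
    }

    // Workers pull; the server does not track who was notified.
    std::optional<std::string> GetJob() {
        if (jobs_.empty()) return std::nullopt;
        std::string job = jobs_.front();
        jobs_.pop_front();
        return job;
    }

private:
    std::string name_;
    std::deque<std::string> jobs_;
    std::vector<Listener> listeners_;
};
```

Since the notification carries no job, a lost datagram costs only latency: a worker that misses it still finds the job on its next `GetJob()` call.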
Job Timeout Protection
[Figure: worker node 1 takes job N from NetSchedule, which starts a timer for job N. Worker node 1 fails; the timer for job N expires, the job is rescheduled, and worker node 2 takes job N.]
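Timeout protection amounts to recording a deadline when a job is handed out and requeuing it if no result arrives in time. The sketch below uses an explicit logical clock (`now`) so it is deterministic. `TimedJobQueue` and its methods are hypothetical names, not the NetSchedule implementation.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <optional>
#include <string>

// Hypothetical sketch of job timeout protection with a logical clock.
class TimedJobQueue {
public:
    explicit TimedJobQueue(long timeout) : timeout_(timeout) {}

    void Submit(const std::string& job) { pending_.push_back(job); }

    // Hand out the oldest pending job and start its timer.
    std::optional<std::string> GetJob(long now) {
        if (pending_.empty()) return std::nullopt;
        std::string job = pending_.front();
        pending_.pop_front();
        deadlines_[job] = now + timeout_;
        return job;
    }

    // Accept a result only while the job is still marked as running.
    bool PutResult(const std::string& job) {
        return deadlines_.erase(job) == 1;
    }

    // Called periodically: requeue every running job whose timer expired.
    void CheckTimeouts(long now) {
        for (auto it = deadlines_.begin(); it != deadlines_.end();) {
            if (now >= it->second) {
                pending_.push_back(it->first);
                it = deadlines_.erase(it);
            } else {
                ++it;
            }
        }
    }

private:
    long timeout_;
    std::deque<std::string> pending_;
    std::map<std::string, long> deadlines_;  // running job -> deadline
};
```

This sketch does not remember which worker holds a job, so a late result from the failed node would still be accepted once the job is reassigned; a real server would track the owner as well.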
Running an arbitrary cmd-line app remotely, as a GRID Worker Node
[Figure, BEFORE: on the client, my_app.pl -x -i foo.asn runs locally. AFTER: the client submits the job either through remote_app_dispatcher.cgi on the web server (some_app.exe via the CRADispatcherClient API) or directly (some_app.exe via the CRemoteApp* API). Job info goes to NetSchedule; job data goes to NetCache: cmd-line args, STDIN and input files on the way in, and the exit code, STDOUT and STDERR on the way back. On the GRID worker node, remote_app runs my_app.pl. A ready-to-use client utility: ns_submit_remote_job.exe.]
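On the worker side, running a command-line application boils down to launching it and collecting its exit code and output. Below is a minimal POSIX sketch using `popen`/`pclose`. `RunCommand` and `CmdResult` are hypothetical helpers, not the `CRemoteApp*` API. The sketch does not feed STDIN, capture STDERR separately, or stage input files, though the slide lists all of these as job data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Result fields a remote launcher has to send back to the client.
struct CmdResult {
    int exit_code = -1;   // -1 if the command could not be run or was killed
    std::string std_out;
};

// Hypothetical helper: run a shell command line, capture STDOUT and exit code.
CmdResult RunCommand(const std::string& cmdline) {
    CmdResult result;
    FILE* pipe = popen(cmdline.c_str(), "r");
    if (pipe == nullptr) return result;
    char buf[256];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, pipe)) > 0) {
        result.std_out.append(buf, n);
    }
    int status = pclose(pipe);  // wait status of the shell
    if (status != -1 && WIFEXITED(status)) result.exit_code = WEXITSTATUS(status);
    return result;
}
```

For example, `RunCommand("echo hello")` yields exit code 0 and the output `"hello\n"`.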
NetCache/NetSchedule In Grid Framework
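Client(s): submit a job, poll the job status or progress message, retrieve the result.
NetSchedule server(s): control the job queue.
NetCache server: stores the input and output BLOBs.
Worker node: get a new job, get the input data (BLOB), do the job, submit the output BLOB, submit the job result.
1. The client puts the input data into NetCache. 1.1 NetCache returns the data key (inp_blob_key).
2. The client submits the job (inp_blob_key) to NetSchedule. 2.1 NetSchedule returns the job_key. From now on the client can poll the job status or progress message (job_key).
3. The worker node gets the job from NetSchedule.
4. The worker node gets the input data (inp_blob_key) from NetCache. While working, it can put progress message(s).
5. The worker node puts the result into NetCache. 5.1 NetCache returns the data key (out_blob_key).
6. The worker node puts the job result (out_blob_key) to NetSchedule.
7. The client gets the result (job_key) from NetSchedule. 7.1 It receives the out_blob_key.
8. The client gets the output data (out_blob_key) from NetCache. 8.1 It receives the output data.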
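The eight steps can be traced with two in-memory stand-ins: a `BlobStore` playing NetCache and a `JobQueue` playing NetSchedule. Both are hypothetical classes; the point is that only keys travel through the queue, while the data itself stays in the blob store. The "job" here simply upper-cases its input.

```cpp
#include <cassert>
#include <cctype>
#include <deque>
#include <map>
#include <string>

// Stand-in for NetCache: stores BLOBs under generated keys (hypothetical).
struct BlobStore {
    std::map<std::string, std::string> blobs;
    int next = 0;
    std::string Put(const std::string& data) {          // returns a blob key
        std::string key = "blob" + std::to_string(next++);
        blobs[key] = data;
        return key;
    }
    std::string Get(const std::string& key) const { return blobs.at(key); }
};

// Stand-in for NetSchedule: holds only keys and job state (hypothetical).
struct JobQueue {
    struct Job { std::string inp_key, out_key; bool done = false; };
    std::map<std::string, Job> jobs;
    std::deque<std::string> pending;
    int next = 0;
    std::string Submit(const std::string& inp_key) {    // returns a job key
        std::string job_key = "job" + std::to_string(next++);
        jobs[job_key] = Job{inp_key, "", false};
        pending.push_back(job_key);
        return job_key;
    }
    std::string GetJob() {                // empty string if nothing is pending
        if (pending.empty()) return "";
        std::string key = pending.front();
        pending.pop_front();
        return key;
    }
    void PutResult(const std::string& job_key, const std::string& out_key) {
        jobs[job_key].out_key = out_key;
        jobs[job_key].done = true;
    }
};

// Worker node: steps 3 to 6 for one job.
void WorkerStep(JobQueue& ns, BlobStore& nc) {
    std::string job_key = ns.GetJob();                        // 3
    if (job_key.empty()) return;
    std::string data = nc.Get(ns.jobs[job_key].inp_key);      // 4
    for (char& c : data) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    std::string out_key = nc.Put(data);                       // 5, 5.1
    ns.PutResult(job_key, out_key);                           // 6
}
```

A client run would be: `Put` the input (1, 1.1), `Submit` its key (2, 2.1), poll `jobs[job_key].done`, read `out_key` (7, 7.1), then `Get` the output (8, 8.1).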
Creating and using a (multithreaded) GRID Worker Node
[Figure: the client and the GRID worker node exchange job info and small data through NetSchedule, and large job data through NetCache.]
Worker node side: class CMyNode : public IWorkerNodeJob, implementing virtual Do(CWorkerNodeJobContext&); the node is declared with NCBI_WORKER_NODE_MAIN(CMyNode, 1.0.1);
CWorkerNodeJobContext: string GetJobInput(), istream GetIStream(), SetJobOutput(string), PutProgressMessage(string), ostream GetOStream()
Client side: class CGridClient uses these two classes to submit the job and to get the results:
CGridSubmitter: SetJobInput(string), ostream GetOStream()
CGridJobStatus: EJobStatus GetStatus(), string GetProgressMessage(), string GetJobOutput(), int GetReturnCode(), istream GetIStream()
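A job body written against streams is easy to exercise locally. The sketch below assumes a hypothetical `MockJobContext` that only mirrors the roles listed for the job context (input stream, output stream, progress messages) with `std::stringstream`; it is not `CWorkerNodeJobContext`. `SumJob` has the same shape as `Do()`: it reads input, reports progress, writes output, and returns an exit code.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical local stand-in for the worker-side job context.
class MockJobContext {
public:
    explicit MockJobContext(const std::string& input) : in_(input) {}
    std::istream& GetIStream() { return in_; }
    std::ostream& GetOStream() { return out_; }
    void PutProgressMessage(const std::string& msg) { progress_.push_back(msg); }

    std::string Output() const { return out_.str(); }
    const std::vector<std::string>& Progress() const { return progress_; }

private:
    std::istringstream in_;
    std::ostringstream out_;
    std::vector<std::string> progress_;
};

// Hypothetical job: sum the integers read from the input stream.
int SumJob(MockJobContext& context) {
    std::istream& is = context.GetIStream();
    long sum = 0;
    int value;
    while (is >> value) sum += value;
    context.PutProgressMessage("input consumed");
    context.GetOStream() << sum;
    return 0;  // exit code reported back to the client
}
```

With input `"1 2 3 4"`, `SumJob` writes `10` to the output stream and records one progress message.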
Performance & Latency
Stress test results:
NetSchedule queue performance provides low overhead compared to the conventional CGI model.
Submit 5000 jobs... Done. Elapsed: sec. Avg time: sec.
GetStatus 5000 jobs... Elapsed: sec. Avg time: sec.
Take-Return jobs... Returned 2500 jobs. Jobs processed: 2500. Elapsed: sec. Avg time: sec.
Test environment: Linux-to-Linux, 2-CPU machine, one active submitter.
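The stress-test log reports total elapsed time and average time per operation. Below is a small harness in that format, using `std::chrono::steady_clock`. `TimeOps` is a hypothetical helper. When it is pointed at an in-process container, as in the usage below, it measures no network round trips, so its numbers are not comparable to NetSchedule's.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <deque>
#include <functional>

struct Timing {
    int ops = 0;
    double elapsed_sec = 0.0;
    double avg_sec = 0.0;
};

// Hypothetical helper: run op() n times, report elapsed and average time.
Timing TimeOps(int n, const std::function<void()>& op) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) op();
    std::chrono::duration<double> d = std::chrono::steady_clock::now() - start;
    Timing t;
    t.ops = n;
    t.elapsed_sec = d.count();
    t.avg_sec = n > 0 ? t.elapsed_sec / n : 0.0;
    std::printf("Elapsed: %.6f sec. Avg time: %.9f sec.\n", t.elapsed_sec, t.avg_sec);
    return t;
}
```

Usage mirroring the log: time 5000 submissions with `TimeOps(5000, [&] { q.push_back(1); })`, then 2500 takes with `TimeOps(2500, [&] { q.pop_front(); })` on a `std::deque<int> q`.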
Worker node sample
Sample is based on <include/connect/services/grid_worker_*.hpp>. See also <include/connect/services/grid_client*.hpp>.

    #include <connect/services/grid_worker_app.hpp>

    USING_NCBI_SCOPE;

    class CSampleJob : public IWorkerNodeJob
    {
    public:
        CSampleJob() {}
        virtual ~CSampleJob() {}

        int Do(CWorkerNodeJobContext& context)
        {
            // 1. Get the input data from the client
            CNcbiIstream& is = context.GetIStream();
            int count;
            is >> count;
            // ......

            // 2. Do some time-consuming work here
            // SleepMicroSec(3000);

            // 3. Return the result to the client
            CNcbiOstream& os = context.GetOStream();
            os << dvec.size() << ' ';
            // ....
            context.CommitJob();
            return 0;
        }
    };

    class CSampleJobFactory : public IWorkerNodeJobFactory
    {
    public:
        virtual ~CSampleJobFactory() {}
        virtual IWorkerNodeJob* CreateInstance(void)
        {
            return new CSampleJob;
        }
        virtual string GetJobVersion() const
        {
            return "Sample Job worker node version 1.0.1";
        }
    };

    int main(int argc, const char* argv[])
    {
        CGridWorkerApp app(new CSampleJobFactory);
        return app.AppMain(argc, argv);
    }
Documentation and Examples
GRID Overview:
Running an existing application remotely:
Configuration file for a remote application server (launcher):
Configuration file for a remote application client (jobs submitter/consumer):
NetSchedule server configuration:
High availability
All central components (queue and data storage) are duplicated.
All components are controlled by the NCBI load balancer.
Protection against back-end (remote CGI) failures, by timeout or via explicit rescheduling.
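Duplicating a component helps only if callers can fall back to a healthy copy. The deck assigns that routing to the NCBI load balancer. The client-side loop below is a hypothetical illustration of the same idea, not how the load balancer works: each replica is a callable that may throw, and the first successful answer wins.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// A replica answers a request or throws if it is unavailable (hypothetical).
using Replica = std::function<std::string(const std::string& request)>;

// Try each replica in order; return the first answer, or nullopt if all fail.
std::optional<std::string> CallWithFailover(const std::vector<Replica>& replicas,
                                            const std::string& request) {
    for (const Replica& replica : replicas) {
        try {
            return replica(request);
        } catch (const std::exception&) {
            // This replica is down; fall through to the next one.
        }
    }
    return std::nullopt;
}
```

With replicas `{down, up}`, the call fails over to `up`. With `{down, down}`, it returns `std::nullopt`.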
Features
Back-end servers can run requests for more than 30 seconds (longer than the web server timeout).
Easy application migration (a minor code tweak and recompilation).
Back-end machines can send progress messages (feedback to the user).
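Long-running requests and progress feedback both rely on the same pattern. The worker updates a shared status record, and the front end answers each short poll from that record, so no single HTTP request has to stay open for the whole job. `StatusBoard` and `JobStatus` below are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class JobStatus { Pending, Running, Done };

// Hypothetical status record shared by worker (writer) and client (poller).
class StatusBoard {
public:
    void Start(const std::string& job) { entries_[job].status = JobStatus::Running; }
    void Progress(const std::string& job, const std::string& msg) { entries_[job].progress = msg; }
    void Finish(const std::string& job) { entries_[job].status = JobStatus::Done; }

    // Each client poll is a quick lookup, independent of job duration.
    JobStatus Status(const std::string& job) const {
        auto it = entries_.find(job);
        return it == entries_.end() ? JobStatus::Pending : it->second.status;
    }
    std::string LastProgress(const std::string& job) const {
        auto it = entries_.find(job);
        return it == entries_.end() ? std::string() : it->second.progress;
    }

private:
    struct Entry {
        JobStatus status = JobStatus::Pending;
        std::string progress;
    };
    std::map<std::string, Entry> entries_;
};
```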
Worker node API
High-level design (use of C++ streams, compatibility with ASN.1 serialization)
Support for SMP (a node can run jobs in parallel)
Remote administrative access to worker nodes (shutdown, availability check, statistics)
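SMP support means one node runs several jobs at once. In the sketch below, a few `std::thread`s pull job indices from one shared queue. The mutex protects only that queue; each job writes its own result slot, so results need no lock. `RunJobsInParallel` is a hypothetical helper, not the toolkit's thread pool, and the "job" just squares its input.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical SMP worker: num_threads threads share one job queue.
std::vector<long> RunJobsInParallel(const std::vector<int>& inputs, int num_threads) {
    std::queue<std::size_t> todo;
    for (std::size_t i = 0; i < inputs.size(); ++i) todo.push(i);
    std::vector<long> results(inputs.size(), 0);
    std::mutex mtx;

    auto worker = [&]() {
        for (;;) {
            std::size_t idx;
            {
                std::lock_guard<std::mutex> lock(mtx);  // guard the shared queue
                if (todo.empty()) return;
                idx = todo.front();
                todo.pop();
            }
            long n = inputs[idx];
            results[idx] = n * n;  // job body runs outside the lock
        }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return results;
}
```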
Converting a CGI into a GRID Worker Node
[Figure, BEFORE: the client (web browser, GET, POST) sends HTTP requests to the web server, which runs my_cgi.cgi. AFTER: the web server runs cgi2rcgi, which passes job info to NetSchedule and job data to NetCache; the GRID worker node runs remote_cgi, the original CGI (my_cgi.cgi) rebuilt after a 2-line code change.]
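In the converted setup, the front-end relay has to turn an HTTP request into job input that the worker can replay. The sketch below packs the CGI environment variables and the POST body into one string and unpacks them again. `CgiRequest`, `PackRequest`, `UnpackRequest` and the line-based format are all assumptions for illustration; this is not cgi2rcgi's actual wire format, and it assumes that variable names and values contain no newlines.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical representation of what a CGI reads from its request.
struct CgiRequest {
    std::map<std::string, std::string> env;  // e.g. REQUEST_METHOD, QUERY_STRING
    std::string body;                        // POST data, may contain newlines
};

// Format (assumed): count line, then name/value line pairs, then raw body.
std::string PackRequest(const CgiRequest& req) {
    std::ostringstream os;
    os << req.env.size() << '\n';
    for (const auto& [name, value] : req.env) os << name << '\n' << value << '\n';
    os << req.body;
    return os.str();
}

CgiRequest UnpackRequest(const std::string& data) {
    std::istringstream is(data);
    CgiRequest req;
    std::size_t count = 0;
    is >> count;
    is.ignore();  // skip the newline after the count
    for (std::size_t i = 0; i < count; ++i) {
        std::string name, value;
        std::getline(is, name);
        std::getline(is, value);
        req.env[name] = value;
    }
    std::ostringstream rest;
    rest << is.rdbuf();  // everything left is the body
    req.body = rest.str();
    return req;
}
```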
How to convert an existing CGI?
    #include <misc/grid_cgi/remote_cgiapp.hpp>
    // ...

    class CRemoteCgiAppSample : public CRemoteCgiApp
    {
        // ...
    };

    void CRemoteCgiAppSample::Init()
    {
        // Standard CGI framework initialization
        CRemoteCgiApp::Init();
        // ...
    }

    int CRemoteCgiAppSample::ProcessRequest(CCgiContext& ctx)
    {
        // ...
        PutProgressMessage("Work in progress");
        // ...
    }