NetSchedule Push-Pull Model


1 NetSchedule Push-Pull Model
NetSchedule server maintains several FIFO queues.
Diagram labels: Queue 1; Queue 2; Job 1, Job 2, ….., Job 3; Push Job; Pull Job.
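The push-pull idea on this slide can be sketched in a few lines of standard C++. This is only an illustrative in-memory model, not the NetSchedule implementation or its API: the class name ToyScheduler and its Push and Pull methods are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for a server that keeps several named
// FIFO queues. Not the real NetSchedule API.
class ToyScheduler {
public:
    // Push: a submitter appends a job to the tail of the named queue.
    void Push(const std::string& queue, const std::string& job) {
        queues_[queue].push_back(job);
    }

    // Pull: a worker takes the oldest job from the named queue.
    // Returns false when the queue is unknown or empty.
    bool Pull(const std::string& queue, std::string& job) {
        std::map<std::string, std::deque<std::string> >::iterator it =
            queues_.find(queue);
        if (it == queues_.end() || it->second.empty()) {
            return false;
        }
        job = it->second.front();
        it->second.pop_front();
        return true;
    }

private:
    std::map<std::string, std::deque<std::string> > queues_;
};

// Usage sketch: jobs leave each queue in the order they were pushed.
inline void DemoFifoOrder() {
    ToyScheduler s;
    s.Push("Queue 1", "Job 1");
    s.Push("Queue 1", "Job 2");
    std::string job;
    assert(s.Pull("Queue 1", job) && job == "Job 1");
    assert(s.Pull("Queue 1", job) && job == "Job 2");
}
```

The server side never initiates anything here: submitters push, workers pull, and each queue only preserves arrival order.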

2 Architecture
Push-Pull design (NetSchedule server is passive).
Worker nodes can subscribe for queue event notifications.
Using connection-less network protocol.
Diagram labels: NetSchedule (Queue 1, Queue 2, ….); Job 1, Job 2, Job 3; Client-submitter; Client waiting for job: Notification (UDP/IP), GetStatus(); Worker Node (Active): GetJob(), PutResult(); Worker Node (Waiting for a new job): Notification (UDP/IP), GetJob(), PutResult().
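One way to picture the passive-server design is a notification that carries no job data: it only tells a subscriber that something changed, and the subscriber then pulls the job itself. The sketch below models that with plain callbacks in place of UDP datagrams. ToyPassiveServer, Subscribe, and Submit are hypothetical names for this illustration; only GetJob, PutResult, and GetStatus echo the labels on the slide, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a passive server. A callback stands in for the
// connection-less (UDP/IP) notification; it carries no job payload.
class ToyPassiveServer {
public:
    typedef std::function<void()> Listener;

    // A worker node (or waiting client) registers interest in queue events.
    void Subscribe(const Listener& listener) { listeners_.push_back(listener); }

    // A submitter adds a job; the server only announces that the queue changed.
    void Submit(const std::string& job) {
        jobs_.push_back(job);
        for (std::size_t i = 0; i < listeners_.size(); ++i) {
            listeners_[i]();
        }
    }

    // The worker pulls the job itself after being notified.
    bool GetJob(std::string& job) {
        if (jobs_.empty()) return false;
        job = jobs_.front();
        jobs_.pop_front();
        return true;
    }

    // The worker stores the result under the job name.
    void PutResult(const std::string& job, const std::string& result) {
        results_[job] = result;
    }

    // The client polls; returns false while no result is stored yet.
    bool GetStatus(const std::string& job, std::string& result) const {
        std::map<std::string, std::string>::const_iterator it = results_.find(job);
        if (it == results_.end()) return false;
        result = it->second;
        return true;
    }

private:
    std::vector<Listener> listeners_;
    std::deque<std::string> jobs_;
    std::map<std::string, std::string> results_;
};

// Usage sketch: the notification wakes the worker, which then pulls the job.
inline void DemoNotifyThenPull() {
    ToyPassiveServer server;
    bool notified = false;
    server.Subscribe([&notified]() { notified = true; });
    server.Submit("Job 1");
    std::string job;
    assert(notified && server.GetJob(job) && job == "Job 1");
}
```

Because the notification is only a hint, a lost one costs latency rather than correctness in this model: a worker that polls GetJob later still finds the job in the queue.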

3 Job Timeout Protection
Diagram labels: NetSchedule; Worker node 1; Worker node 2; Job N; Timer: Job N; Node Failure!; Job N expired and rescheduled.
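The timeout protection shown in the diagram can be approximated as follows: when a worker takes a job, the queue records a deadline; if no result arrives before the deadline, the job goes back to the pending queue for another worker. This is a minimal sketch under assumed names (ToyTimeoutQueue, Submit, RescheduleExpired) and an assumed integer clock, not NetSchedule's actual mechanism.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical queue with per-job run timers. Time is a plain integer
// supplied by the caller so the behaviour is deterministic.
class ToyTimeoutQueue {
public:
    explicit ToyTimeoutQueue(long timeout) : timeout_(timeout) {}

    void Submit(const std::string& job) { pending_.push_back(job); }

    // A worker takes the oldest pending job; its timer starts now.
    bool GetJob(long now, std::string& job) {
        if (pending_.empty()) return false;
        job = pending_.front();
        pending_.pop_front();
        deadline_[job] = now + timeout_;
        return true;
    }

    // A worker reports completion, which stops the timer.
    // Returns false if the job is not running (for example, it already expired).
    bool PutResult(const std::string& job) { return deadline_.erase(job) == 1; }

    // Move every running job whose deadline has passed back to the pending
    // queue. Returns the number of rescheduled jobs.
    int RescheduleExpired(long now) {
        int count = 0;
        std::map<std::string, long>::iterator it = deadline_.begin();
        while (it != deadline_.end()) {
            if (it->second <= now) {
                pending_.push_back(it->first);
                it = deadline_.erase(it);
                ++count;
            } else {
                ++it;
            }
        }
        return count;
    }

private:
    long timeout_;
    std::deque<std::string> pending_;
    std::map<std::string, long> deadline_;
};

// Usage sketch: node 1 takes Job N and fails; node 2 gets it after expiry.
inline void DemoNodeFailure() {
    ToyTimeoutQueue q(10);
    q.Submit("Job N");
    std::string job;
    assert(q.GetJob(0, job));            // worker node 1 takes Job N
    assert(q.RescheduleExpired(10) == 1); // no result arrived in time
    assert(q.GetJob(11, job));           // worker node 2 takes Job N
    assert(q.PutResult(job));
}
```

In this sketch a late result from the failed node is rejected once the job has expired, which is one possible policy; the slide does not say how the real server treats late results.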

4 Running an arbitrary cmd-line app remotely, as a GRID Worker Node Network
Diagram labels: CLIENT; WEB Server; GRID Infrastructure; GRID Worker Node; remote_app_dispatcher.cgi; NetSchedule; NetCache; remote_app; my_app.pl; Job Info; Job Data.
Inputs: cmd-line args, STDIN, input files.
Outputs: exit code, STDOUT, STDERR.
Example command: my_app.pl -x -i foo.asn
Client options: some_app.exe via CRADispatcherClient API; some_app.exe via CRemoteApp* API.
E.g. a ready-to-use utility: ns_submit_remote_job.exe

5 NetCache/NetSchedule In Grid Framework
Components:
Client(s): submit job; poll job status/progress msg; retrieve the result.
NetCache Server: input & output BLOBs.
NetSchedule Server(s): control job queue.
Worker Node: get new job; get input data (BLOB); do the job; submit output BLOB; submit job result.
Steps:
1. Put Input Data
1.1 Data key (inp_blob_key)
2. Job Submit (inp_blob_key)
2.1 (job_key)
3. Get Job
4. Get Input Data (inp_blob_key)
5. Put Result
5.1 Data key (out_blob_key)
6. Put Job Result (out_blob_key)
7. Get Result (job_key)
7.1 (out_blob_key)
8. Get Output Data (out_blob_key)
8.1 Output data
Also shown: Put Progress Message(s); Poll Job Status Or Progress Msg (job_key).
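The numbered steps on this slide separate job bookkeeping from bulk data: the queue only ever sees keys, while the BLOB store holds the payloads. A rough in-memory sketch of that flow is shown below. The classes ToyCache and ToyJobQueue, the function RunWorkerOnce, the key formats, and the "reverse the input" job are all assumptions made for illustration; only the key names inp_blob_key, out_blob_key, and job_key come from the slide.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <map>
#include <string>

// Hypothetical BLOB store standing in for NetCache.
class ToyCache {
public:
    // Store data and return the key under which it can be fetched.
    std::string Put(const std::string& data) {
        std::string key = "blob_" + std::to_string(next_++);
        blobs_[key] = data;
        return key;
    }
    std::string Get(const std::string& key) const { return blobs_.at(key); }
private:
    int next_ = 1;
    std::map<std::string, std::string> blobs_;
};

// Hypothetical job queue standing in for NetSchedule. It stores keys only.
class ToyJobQueue {
public:
    std::string Submit(const std::string& inp_blob_key) {            // steps 2, 2.1
        std::string job_key = "job_" + std::to_string(next_++);
        input_[job_key] = inp_blob_key;
        pending_.push_back(job_key);
        return job_key;
    }
    bool GetJob(std::string& job_key, std::string& inp_blob_key) {   // step 3
        if (pending_.empty()) return false;
        job_key = pending_.front();
        pending_.pop_front();
        inp_blob_key = input_[job_key];
        return true;
    }
    void PutResult(const std::string& job_key,
                   const std::string& out_blob_key) {                // step 6
        output_[job_key] = out_blob_key;
    }
    bool GetResult(const std::string& job_key,
                   std::string& out_blob_key) const {                // steps 7, 7.1
        std::map<std::string, std::string>::const_iterator it = output_.find(job_key);
        if (it == output_.end()) return false;
        out_blob_key = it->second;
        return true;
    }
private:
    int next_ = 1;
    std::deque<std::string> pending_;
    std::map<std::string, std::string> input_;
    std::map<std::string, std::string> output_;
};

// One worker iteration. The "job" here simply reverses the input string.
inline bool RunWorkerOnce(ToyJobQueue& queue, ToyCache& cache) {
    std::string job_key, inp_blob_key;
    if (!queue.GetJob(job_key, inp_blob_key)) return false;          // step 3
    std::string data = cache.Get(inp_blob_key);                      // step 4
    std::reverse(data.begin(), data.end());                          // do the job
    std::string out_blob_key = cache.Put(data);                      // steps 5, 5.1
    queue.PutResult(job_key, out_blob_key);                          // step 6
    return true;
}

// Usage sketch: the client side of steps 1, 2, 7, and 8.
inline std::string DemoRoundTrip(const std::string& input) {
    ToyCache cache;
    ToyJobQueue queue;
    std::string job_key = queue.Submit(cache.Put(input));            // steps 1 to 2.1
    RunWorkerOnce(queue, cache);
    std::string out_blob_key;
    bool done = queue.GetResult(job_key, out_blob_key);              // steps 7, 7.1
    assert(done);
    return cache.Get(out_blob_key);                                  // steps 8, 8.1
}
```

Keeping only keys in the queue means a large input never travels through the scheduler, which matches the slide's split between job control and BLOB storage.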

6 Creating and using a (multithreaded) GRID Worker Node Network
Diagram labels: CLIENT; GRID Infrastructure; GRID Worker Node; NetSchedule: Job Info & Small data; NetCache: Job Data (large).
Class CGridClient uses these two classes to submit the job and to get the results:
CGridSubmitter
    SetJobInput(string)
    ostream GetOStream()
CGridJobStatus
    string GetJobOutput()
    int GetReturnCode()
    istream GetIstream()
    EJobStatus GetStatus()
    string GetProgressMessage()
Worker node:
class CMyNode::IWorkerNodeJob: virtual Do(CWorkerNodeJobContext&)
NCBI_WORKER_NODE_MAIN (CMyNode, 1.0.1);
CWorkerNodeJobContext
    string GetJobInput()
    istream GetIStream()
    SetJobOutput(string)
    PutProgressMessage(string)
    stream GetOStream()

7 Performance & Latency
Stress test results:
Submit 5000 jobs: Done. Elapsed: 2.147625 sec. Avg time: 0.000430 sec.
GetStatus 5000 jobs: Elapsed: 1.428061 sec. Avg time: 0.000286 sec.
Take-Return jobs: Returned 2500 jobs. Jobs processed: 2500. Elapsed: 1.848050 sec. Avg time: 0.000739 sec.
Test environment: Linux-to-Linux, 2 CPU machine, one active submitter.
NetSchedule queue performance guarantees low overhead compared to the conventional CGI model.
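The reported averages are consistent with dividing each elapsed time by the number of jobs in that test. The small helper below reproduces that arithmetic; the function names are hypothetical, and the tolerance of one microsecond only reflects the six-decimal rounding on the slide.

```cpp
#include <cassert>
#include <cmath>

// Average seconds per job: total elapsed time divided by the job count.
inline double AvgSeconds(double elapsed_sec, int jobs) {
    assert(jobs > 0);
    return elapsed_sec / jobs;
}

// Jobs per second: the inverse view of the same measurement.
inline double JobsPerSecond(double elapsed_sec, int jobs) {
    assert(elapsed_sec > 0.0);
    return jobs / elapsed_sec;
}

// True when a computed average agrees with a reported one after rounding
// to six decimal places.
inline bool MatchesReported(double computed, double reported) {
    return std::fabs(computed - reported) < 1e-6;
}
```

For example, MatchesReported(AvgSeconds(2.147625, 5000), 0.000430) holds, and JobsPerSecond(2.147625, 5000) comes to roughly 2,300 submissions per second with one active submitter.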

8 Worker node sample
#include
USING_NCBI_SCOPE;

class CSampleJob : public IWorkerNodeJob
{
public:
    CSampleJob() {}
    virtual ~CSampleJob() {}

    int Do(CWorkerNodeJobContext& context)
    {
        // 1. Get an input data from the client
        CNcbiIstream& is = context.GetIStream();
        int count;
        is >> count;
        ......
        // 2. Doing some time consuming job here
        // SleepMicroSec(3000);
        // 3. Return the result to the client
        CNcbiOstream& os = context.GetOStream();
        os << dvec.size() << ' ';
        ....
        context.CommitJob();
        return 0;
    }
};

class CSampleJobFactory : public IWorkerNodeJobFactory
{
public:
    virtual ~CSampleJobFactory() {}
    virtual IWorkerNodeJob* CreateInstance(void)
    {
        return new CSampleJob;
    }
    virtual string GetJobVersion() const
    {
        return "Sample Job worker node version 1.0.1";
    }
};

int main(int argc, const char* argv[])
{
    CGridWorkerApp app(new CSampleJobFactory);
    return app.AppMain(argc, argv);
}

Sample is based on
See also

9 Documentation and Examples
GRID Overview: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=toolkit&part=ch_grid
Running an existing application remotely: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=toolkit&part=ch_grid#ch_grid._Wrapping_an_existing_1
Configuration file for a remote application server (launcher): http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/app/grid/remote_app/remote_app.ini
Configuration file for a remote application client (jobs submitter/consumer): http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/sample/app/netschedule/remote_app_client_sample.ini
NetSchedule server configuration: http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/app/netschedule/netscheduled.ini

10 ----------------

11 High availability
All central components (queue and data storage) are duplicated.
All components are controlled by NCBI load balancer.
Protection against back-end (remote CGI) failures: by timeout or via explicit rescheduling.

12 Features
Back-end servers can run requests for more than 30 seconds (WEB timeout).
Easy application migration (minor code tweak and recompilation).
Back-end machines can send progress messages (feedback to the user).

13 Worker node API
High level design (use of C++ streams, compatibility with ASN.1 serialization).
Support of SMP (node can run parallel jobs).
Remote administrative access to worker nodes (shutdown, availability check, statistics).

14 Converting a CGI into a GRID Worker Node Network
Diagram labels: CLIENT (Web browser, GET, POST); WEB Server; GRID Infrastructure; GRID Worker Node; my_cgi.cgi; cgi2rcgi; NetSchedule; NetCache; remote_cgi; HTTP; Job Info; Job Data.
Original CGI executable (my_cgi.cgi).
Original CGI, rebuilt after a 2-line code change (my_cgi.cgi).

15 How to convert an existing CGI?
#include …………..

class CRemoteCgiAppSample : public CRemoteCgiApp
{
    …….
};

void CRemoteCgiAppSample::Init()
{
    // Standard CGI framework initialization
    CRemoteCgiApp::Init();
    ……….
}

int CRemoteCgiAppSample::ProcessRequest( CCgiContext& ctx )
{
    ……………..
    PutProgressMessage("Work in progress");
    ……………...
}

