1 Architecture
Components shown in the diagram:
- NetSchedule server with job queues (Queue 1, Queue 2, …)
- Client-submitter
- Client waiting for its job
- Worker node (active), running Job 1, Job 2 and Job 3
- Worker node waiting for a new job
Calls and messages shown: GetStatus(), GetJob(), PutResult(), and notifications over UDP/IP.
- Push-pull design: the NetSchedule server is passive
- Worker nodes can subscribe to queue event notifications
- Notifications use a connectionless network protocol

2 NetSchedule Queue Design
Clients & worker nodes talk to a network communication front-end. Each queue has an in-memory finite state machine (FSM) backed by a database; the FSM keeps one bit vector per job status, with one bit per job:

Queue 1 FSM (in-memory) + database
  Pending: 0 0 1 …
  Running: 0 1 0 …
  …
  Done:    1 0 0

Queue 2 FSM (in-memory) + database
  Pending: 0 1 1 …
  Running: 0 0 0 …
  …
  Done:    1 0 0

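3
Components:
- NetCache server: keeps input & output BLOBs
- CGI: submits the job, waits for the result (off-line), renders the result as HTML
- NetSchedule server: controls the job queue
- Worker node: gets a new job, gets the input data (BLOB), does the job, submits the output BLOB, submits the job result
Message sequence:
1. CGI → NetCache: put input data. 1.1 NetCache returns the data key (inp_blob_key).
2. CGI → NetSchedule: job submit (inp_blob_key). 2.1 NetSchedule returns the job key (job_key).
3. Worker node → NetSchedule: get job.
4. Worker node → NetCache: get input data (inp_blob_key).
5. Worker node → NetCache: put result. 5.1 NetCache returns the data key (out_blob_key).
6. Worker node → NetSchedule: put job result (out_blob_key).
7. CGI → NetSchedule: get result (job_key). 7.1 NetSchedule returns out_blob_key.
8. CGI → NetCache: get output data (out_blob_key). 8.1 NetCache returns the output data.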

4 NetSchedule performance
Operation      Jobs   Elapsed (sec)   Avg time (sec)
Submit         5000   2.147625        0.000430
GetStatus      5000   1.428061        0.000286
Take-Return    2500   1.848050        0.000739
(Take-Return: 2500 jobs returned, 2500 jobs processed.)
Test environment: Linux-to-Linux, 2-CPU machine, one active submitter.
Performance optimization factors:
- In-memory finite state machine for job status tracking with O(1) access time
- Real-time database without inter-process communication overhead
- Optimized communication protocol, using datagrams (UDP) for message notification

5 Fault tolerance
- Run under the NCBI Load Balancer:
  - NetSchedule: balanced (2 instances)
  - NetCache: balanced (2 instances) (NetCache can be replaced with MS-SQL)
- Timeout protection against worker node failure
- Use of a transactional database manager (Berkeley DB)

6 Job Timeout Protection
Diagram: NetSchedule, worker node 1, worker node 2, Job N.
Estimated job execution time (based on input analysis): 50 seconds

7 Job Timeout Protection
Diagram: NetSchedule, worker node 1, worker node 2, Job N, with a timer for Job N.
A worker node fails while running Job N ("Node Failure!"). Job N's timer expires, and Job N is rescheduled.

8 Worker node API
- High-level design (uses C++ streams; compatible with ASN.1 serialization)
- SMP support (a node can run jobs in parallel)
- Remote administrative access to worker nodes (shutdown, availability check)

9 Worker node sample

    #include <...>  // NCBI grid worker node header (path not preserved)

    USING_NCBI_SCOPE;

    class CSampleJob : public IWorkerNodeJob
    {
    public:
        CSampleJob() {}
        virtual ~CSampleJob() {}

        int Do(CWorkerNodeJobContext& context)
        {
            // 1. Get the input data from the client
            CNcbiIstream& is = context.GetIStream();
            int count;
            is >> count;
            // ... (rest of the input parsing is elided on the slide)

            // 2. Do some time-consuming work here
            // SleepMicroSec(3000);

            // 3. Return the result to the client
            CNcbiOstream& os = context.GetOStream();
            os << dvec.size() << ' ';  // dvec comes from the elided code above
            // ...
            context.CommitJob();
            return 0;
        }
    };

    class CSampleJobFactory : public IWorkerNodeJobFactory
    {
    public:
        virtual ~CSampleJobFactory() {}

        // Create a new job object for each job the node runs
        virtual IWorkerNodeJob* CreateInstance(void)
        {
            return new CSampleJob;
        }

        virtual string GetJobVersion() const
        {
            return "Sample Job worker node version 1.0.1";
        }
    };

    int main(int argc, const char* argv[])
    {
        CGridWorkerApp app(new CSampleJobFactory);
        return app.AppMain(argc, argv, 0, eDS_Default, "grid_worker_sample.ini");
    }

The sample is based on … See also …

