Enterprise Job Scheduling for Clustered Environments. Stratos Paulakis, Vassileios Tsetsos, and Stathes Hadjiefthymiades, Pervasive Computing Research.

Enterprise Job Scheduling for Clustered Environments
Stratos Paulakis, Vassileios Tsetsos, and Stathes Hadjiefthymiades
Pervasive Computing Research Group
Communication Networks Laboratory
Department of Informatics and Telecommunications
University of Athens – Greece
Santorini '07 @ Greece

Outline
- Introduction
- System Design
- System Implementation
- Performance Evaluation & Comparison with Quartz Scheduler
- Conclusions

Introduction
- System-level scheduling vs. application-level scheduling
- PoLoS platform for LBS (location-based services)
  – IST FP5 project
  – The Scheduler was a core architectural component (time-triggered SMS/WAP services)
- Clustering: a modern solution for scalability and fault tolerance in enterprise systems
- Objective
  – Design and implementation of a cluster-aware version of the original Scheduler

Functional Requirements
- Time accuracy and low delay
  – Jobs should start executing as close as possible to their registered time
  – Delay tolerance depends on the application
- Efficiency and scalability
  – High throughput is mandatory in large-scale applications
- Robustness through job persistence
  – System crashes should not result in data loss
  – Persistence imposes a performance overhead
- High availability and fault tolerance
  – Near-zero downtime
  – No missed job executions
- Logging
  – Billing, administration, SLAs

Technical Requirements
- Asynchronous decoupling
  – The scheduling process should run independently of job execution
- Parallel job execution
  – Multi-threaded job execution
- Load balancing
  – Maximizing utilization of available resources
  – Client-side or server-side
- Clustering
  – Addresses most of the requirements above
  – Challenge: the global timer is a singleton object
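The asynchronous decoupling and parallel execution requirements can be illustrated with a minimal, single-process Python sketch: one timer thread only moves due jobs into a queue, and a pool of worker threads executes them. The class and method names (`DecoupledScheduler`, `schedule`, `shutdown`) are hypothetical and are not taken from PoLoS or JBoss; a clustered deployment would presumably need a queue shared across nodes instead of the in-memory `queue.Queue` used here.

```python
import heapq
import queue
import threading
import time
import traceback


class DecoupledScheduler:
    """Hypothetical sketch: a timer thread enqueues due jobs, workers run them."""

    def __init__(self, workers=4, poll_s=0.005):
        self._timers = []            # min-heap of (fire_time, seq, job)
        self._seq = 0                # tie-breaker so callables are never compared
        self._lock = threading.Lock()
        self._ready = queue.Queue()  # hand-off point between timer and executors
        self._stop = threading.Event()
        self._poll_s = poll_s        # coarser polling increases timer delay
        self._timer = threading.Thread(target=self._tick, daemon=True)
        self._workers = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]
        self._timer.start()
        for w in self._workers:
            w.start()

    def schedule(self, delay_s, job):
        """Register a zero-argument callable to fire after delay_s seconds."""
        with self._lock:
            heapq.heappush(self._timers, (time.monotonic() + delay_s, self._seq, job))
            self._seq += 1

    def _tick(self):
        # The timer never executes jobs itself, so slow jobs cannot delay triggers.
        while not self._stop.is_set():
            now = time.monotonic()
            with self._lock:
                while self._timers and self._timers[0][0] <= now:
                    _, _, job = heapq.heappop(self._timers)
                    self._ready.put(job)
            time.sleep(self._poll_s)

    def _work(self):
        while True:
            job = self._ready.get()
            if job is None:          # shutdown sentinel
                return
            try:
                job()
            except Exception:        # a failing job must not kill the worker
                traceback.print_exc()

    def shutdown(self):
        """Stop the timer first, then let workers drain the queue and exit."""
        self._stop.set()
        self._timer.join()
        for _ in self._workers:
            self._ready.put(None)
        for w in self._workers:
            w.join()
```

Because the timer thread only enqueues, adding workers raises execution parallelism without touching trigger accuracy; the trade-off is that jobs can wait in the queue when workers are saturated.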

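Related Systems
| Feature | Kronova | Quartz | Flux |
|---|---|---|---|
| Custom Java jobs | √ | √ | √ |
| Simple time scheduling (start/stop time, period) | √ | √ | √ |
| Event-driven scheduling | √ | √ | √ |
| API | √ | √ | √ |
| JMX-compatible | X | √ | X |
| Logging | √ | √ | √ |
| JTA-enabled | X | √ | X |
| Tracking data | X | √ | X |
| Clustering | √ | √ | √ |
| Load balancing | √ | √ | √ |
| Fault tolerance (fail-over) | Poor | Moderate (the only point of failure is the DB) | Poor |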

Architecture
[Figure: cluster architecture with Node A, Node B, and Node C (master); each node has Scheduling, Management, Caching, and Execution subsystems, and a Queue is shown.]

Caching Subsystem
- Distributed in-memory cache
- Synchronous and asynchronous replication
- Optimistic and pessimistic locking
- The database is updated asynchronously
- Implementation: JBossCache
  – Aspect-oriented programming techniques for cache updates
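The difference between synchronous and asynchronous replication, together with the asynchronous database update, can be sketched in plain Python. This is not JBossCache's API: `Replica`, `ReplicatedCache`, `put`, and `flush` are hypothetical names, the replicas are in-process dictionaries rather than remote nodes, and a dictionary stands in for the database. In synchronous mode `put` returns only after every replica holds the value; in asynchronous mode both replication and the database write happen later on a background thread. Locking is deliberately left out.

```python
import queue
import threading


class Replica:
    """Stand-in for the cache on another cluster node (hypothetical)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class ReplicatedCache:
    def __init__(self, replicas, synchronous=True):
        self.local = {}
        self.db = {}                   # stand-in for the database
        self.replicas = replicas
        self.synchronous = synchronous
        self._pending = queue.Queue()  # updates awaiting async replication/persistence
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self.local[key] = value
        if self.synchronous:
            # Caller blocks until every replica has applied the update.
            for r in self.replicas:
                r.apply(key, value)
        # The database is always updated asynchronously (write-behind).
        self._pending.put((key, value))

    def _drain(self):
        while True:
            key, value = self._pending.get()
            if not self.synchronous:
                for r in self.replicas:
                    r.apply(key, value)
            self.db[key] = value
            self._pending.task_done()

    def flush(self):
        """Wait until all queued replication and database writes are done."""
        self._pending.join()
```

The sketch makes the cost visible: synchronous mode puts every replica update on the caller's path, while asynchronous mode returns immediately at the price of a window in which replicas and the database lag behind the local cache.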

Scheduling Subsystem
- JMX Timer
  – JBoss implementation
  – The Java 5.0 JMX Timer class has limitations in multi-threading (high timer delays)
- Singleton design pattern
  – Only one instance may exist across the cluster
- Recovers job trigger times after a master node crash
[Figure: scheduling flow with the labels "Cache job data", "Put job into queue", and "Job trigger".]
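When a new master takes over the singleton timer, the trigger times of periodic jobs can be recomputed from cached job data. The function below is a hypothetical sketch, not the PoLoS recovery code: it assumes each job is stored with an integer start time, a period in the same unit (e.g. seconds), and optionally the time of its last completed run, and it returns the next trigger time plus the number of runs that fell due during the outage. Whether missed runs are executed immediately or skipped is a policy choice the source does not specify.

```python
def recover_trigger(start, period, now, last_run=None):
    """Hypothetical helper: (next_fire_time, missed_runs) for a periodic job.

    Fire times are start, start + period, start + 2 * period, ...
    All arguments are integers in the same time unit.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if now <= start:
        return start, 0
    # Index of the first fire time at or after `now` (integer ceiling division).
    next_index = -(-(now - start) // period)
    # Index of the last run known to have executed (-1 if none ran yet).
    last_index = -1 if last_run is None else (last_run - start) // period
    # Fire times strictly before `now` that never executed.
    missed = max(0, next_index - (last_index + 1))
    return start + next_index * period, missed
```

For example, `recover_trigger(0, 60, 250, last_run=120)` returns `(300, 2)`: the runs due at 180 and 240 were missed while no master was active. Integer arithmetic avoids floating-point rounding at exact period boundaries.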

Performance Evaluation Setup
- Setup
  – Cluster with 2 nodes (AMD 64-bit, 3.4 GHz, 1 GB RAM)
  – JBoss 4.0.4, MySQL 5.0
  – JMeter for workload generation, JProfiler for performance profiling
- Metrics
  – Maximum throughput: a measure of scalability
  – Delays: average total delay; T1: timer delay, T2: persistence delay, T3: queuing delay
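The delay components can be computed from per-job timestamps. This sketch assumes a hypothetical instrumentation scheme not described in the source: each job records when it was due, when the timer fired, when its data was persisted, and when a worker started it. Under that assumption the components telescope, so each job's total delay is exactly T1 + T2 + T3, and the average total delay is the sum of the three averages.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobTiming:
    # Hypothetical timestamps in milliseconds, in the assumed order of events.
    due: float        # registered trigger time
    fired: float      # timer actually fired
    persisted: float  # job data written to the cache/database
    started: float    # a worker took the job from the queue

    @property
    def t1(self):     # timer delay
        return self.fired - self.due

    @property
    def t2(self):     # persistence delay
        return self.persisted - self.fired

    @property
    def t3(self):     # queuing delay
        return self.started - self.persisted

    @property
    def total(self):
        return self.started - self.due


def average_delays(timings):
    """Average each component over a non-empty list of JobTiming records."""
    return {
        "T1": mean(t.t1 for t in timings),
        "T2": mean(t.t2 for t in timings),
        "T3": mean(t.t3 for t in timings),
        "total": mean(t.total for t in timings),
    }
```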

Performance Evaluation Setup II
- PoLoS, synchronous replication
- PoLoS, asynchronous replication
- Clustered Quartz
- JMeter parameters:
  – Discrete user requests: 1000, 2000, …, 4000
  – Ramp-up period: 120 seconds
  – Repetitions: 10
  – Period: variable (60, 120, 180, …, 300 seconds)
  – Job logic: a logging command

Maximum Throughput
- Original non-clustered scheduler: ~3000 jobs/min
- PoLoS sync: 2998 jobs/min (6000 user requests)
- PoLoS async: 2810 jobs/min (6000 user requests)
- Quartz: 240 jobs/min (4000 user requests)
  – Transaction isolation errors during persistence → high latency
  – The server crashed above 4000 user requests
- Job period = 120 sec
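A quick consistency check on these figures, assuming (the source does not state it explicitly) that each user request registers one job repeating with the stated 120-second period: N registered jobs then trigger N × 60 / 120 = N / 2 times per minute. The helper names below are hypothetical.

```python
def offered_load_per_min(registered_jobs, period_s):
    """Steady-state trigger rate, assuming one periodic job per user request."""
    return registered_jobs * 60 / period_s


def achieved_fraction(throughput_per_min, registered_jobs, period_s):
    """Share of the offered load that the scheduler actually executed."""
    return throughput_per_min / offered_load_per_min(registered_jobs, period_s)
```

Under that assumption, 6000 requests offer 3000 triggers/min, so PoLoS sync's 2998 jobs/min covers almost all of them, whereas Quartz's 240 jobs/min at 4000 requests is 12% of the 2000 triggers/min offered.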

Delays (job period = 60 sec, 1000 jobs)
| | T2 |
|---|---|
| PoLoS sync | 3.2 ms |
| PoLoS async | 0.75 ms |
| Quartz | 91905 ms |

Delay Distribution (2000 jobs with period 60 sec)
| | T1 | T2 |
|---|---|---|
| PoLoS sync | 41549 ms | 7.1 ms |
| PoLoS async | 490 ms | 1 ms |
| Quartz | 120 ms | 198550 ms |

Conclusions
- The JBoss JMX timer resulted in lower timer delays (T1), but the message-driven beans (MDBs) could not operate at such a high rate
- Asynchronous replication is much more efficient than synchronous replication
- Once the maximum throughput is reached, delays increase dramatically
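The last observation matches what a simple deterministic fluid model predicts: below capacity, jobs do not pile up in the queue, but once the trigger rate exceeds the rate at which jobs can be executed, the backlog, and with it the queuing delay, grows linearly for as long as the overload lasts. The sketch ignores arrival randomness (which already adds some delay below capacity), and the rates used with it are hypothetical, not measurements from the evaluation.

```python
def fluid_queuing_delay_s(arrival_per_min, capacity_per_min, elapsed_min):
    """Queuing delay (seconds) seen by a job arriving after `elapsed_min`
    minutes of constant load, in a deterministic fluid approximation."""
    if capacity_per_min <= 0:
        raise ValueError("capacity must be positive")
    excess = max(0.0, arrival_per_min - capacity_per_min)
    backlog_jobs = excess * elapsed_min            # jobs still waiting in the queue
    return 60.0 * backlog_jobs / capacity_per_min  # time to drain them (FIFO)
```

With a hypothetical capacity of 3000 jobs/min, a load of 2000 jobs/min gives zero delay at any time, while 3200 jobs/min gives 40 s after 10 minutes and 80 s after 20 minutes.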

Thank you! Questions? http://p-comp.di.uoa.gr
