PADS
Conservative Simulation using Distributed-Shared Memory
Teo, Y. M., Ng, Y. K. and Onggo, B. S. S.
Department of Computer Science, National University of Singapore
Objectives
- Improve the performance of SPaDES/Java by reducing overhead in:
  - synchronization of events
  - distributed communications
- Study the memory requirements of parallel simulations.
Presentation Outline
- Parallel Simulation
- Null Message Protocol
- Performance Improvement
- Memory Requirement
- Conclusion
Parallel Simulation
- Sequential simulations execute on a single thread on one processor.
- Ideally, parallelizing the simulation should improve its wall-clock performance, since the workload is distributed.
- Causality must be maintained throughout a parallel simulation
  => event synchronization protocols
  => additional inter-process communication
  => a new bottleneck!
Null Message Protocol
- First designed by Chandy and Misra (1979).
- Prevents deadlock between logical processes (LPs).
- LP i sends a null message to each of its neighbours at the end of every simulation pass, with timestamp = local virtual time of LP i.
- The timestamp T on a null message indicates that the source LP will not send any message with a timestamp earlier than T to other LPs.
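The rule on this slide can be sketched in a few lines of Java. This is a minimal illustration, not the SPaDES/Java implementation: the class `NullMsgLP`, its fields and the method names are all hypothetical. It also assumes the usual conservative reading of the timestamp, namely that a receiver keeps the latest promise per input channel and treats the minimum over all channels as its safe bound.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical logical process; the names are illustrative, not SPaDES/Java API.
final class NullMsgLP {
    final int id;
    double localVirtualTime = 0.0;
    final List<NullMsgLP> outNeighbours = new ArrayList<>();
    // Latest timestamp promised on each input channel, keyed by sender id.
    final Map<Integer, Double> channelClock = new HashMap<>();

    NullMsgLP(int id) { this.id = id; }

    // Open a channel from this LP to the receiver; its clock starts at 0.
    void connectTo(NullMsgLP receiver) {
        outNeighbours.add(receiver);
        receiver.channelClock.put(id, 0.0);
    }

    // End of a simulation pass: send a null message stamped with the
    // local virtual time to every neighbour.
    void endOfPass() {
        for (NullMsgLP n : outNeighbours) {
            n.receiveNull(id, localVirtualTime);
        }
    }

    // A newer promise on a channel replaces the older one.
    void receiveNull(int sender, double timestamp) {
        channelClock.merge(sender, timestamp, Double::max);
    }

    // No input channel will deliver anything earlier than this bound.
    double safeBound() {
        double bound = Double.POSITIVE_INFINITY;
        for (double t : channelClock.values()) {
            bound = Math.min(bound, t);
        }
        return bound;
    }
}

final class NullMsgDemo {
    public static void main(String[] args) {
        NullMsgLP a = new NullMsgLP(1);
        NullMsgLP b = new NullMsgLP(2);
        NullMsgLP c = new NullMsgLP(3);
        a.connectTo(c);
        b.connectTo(c);
        a.localVirtualTime = 4.0;
        b.localVirtualTime = 7.0;
        a.endOfPass();
        b.endOfPass();
        System.out.println("LP 3 safe bound: " + c.safeBound());
    }
}
```

Because every LP sends on every pass, the number of null messages grows with the number of passes times the number of channels, which is the overhead the later slides set out to reduce.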
Null Message Protocol
[Figure: LPs, with labels Clock = 4 and FEL: 4, 7]
Performance Improvement
- The Chandy-Misra-Bryant (CMB) protocol performs poorly due to high null message overhead.
- It transmits null messages on every simulation pass, so the null message ratio (NMR) approaches 1 for nearly all of [0, T).
- Optimizations incorporated:
  - Carrier-null message scheme
  - Flushing mechanism
  - Demand-driven null message algorithm
  - Remote communications using JavaSpace
Carrier-Null Message Algorithm
- Problem: cyclic topologies.
- Use the carrier-null message algorithm (Wood, Turner, 1996).
- Avoids transmission of redundant null messages in such cycles.
Performance Improvement
Demand-driven null messaging + flushing
[Figure: Logical Process (A) with Output Channel (A) and Logical Process (B) with Request Channel (B); labels: REQ 30, FEL, Flusher]
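A rough Java sketch of the two ideas named on this slide follows. It rests on assumptions the slides do not confirm: that "demand-driven" means an LP sends a null message only when a blocked neighbour requests one, and that "flushing" means an arriving null message discards older buffered ones. The class `DemandDrivenLP` and its methods are hypothetical, and the timestamp carried by the request in the figure (REQ 30) is not modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical LP for illustration; not the SPaDES/Java implementation.
final class DemandDrivenLP {
    double localVirtualTime;
    int nullMessagesSent = 0;
    // Timestamps of null messages received and not yet consumed.
    final Deque<Double> nullBuffer = new ArrayDeque<>();

    DemandDrivenLP(double localVirtualTime) {
        this.localVirtualTime = localVirtualTime;
    }

    // Demand-driven (assumed): a null message is sent only when a neighbour asks.
    void onRequest(DemandDrivenLP requester) {
        nullMessagesSent++;
        requester.onNullMessage(localVirtualTime);
    }

    // Flushing (assumed): an arriving null message makes any buffered
    // null message with an older or equal timestamp redundant.
    void onNullMessage(double timestamp) {
        nullBuffer.removeIf(old -> old <= timestamp);
        nullBuffer.addLast(timestamp);
    }
}

final class DemandDrivenDemo {
    public static void main(String[] args) {
        DemandDrivenLP a = new DemandDrivenLP(0.0);
        DemandDrivenLP b = new DemandDrivenLP(0.0);
        for (int pass = 1; pass <= 10; pass++) {
            a.localVirtualTime = pass * 3.0;   // A advances without sending anything
            if (pass % 5 == 0) {
                a.onRequest(b);                // B is blocked and asks A for a guarantee
            }
        }
        System.out.println("null messages sent by A: " + a.nullMessagesSent);
        System.out.println("null messages buffered at B: " + b.nullBuffer.size());
    }
}
```

In this toy run A completes ten passes but sends a null message only on the two passes where B asks, whereas the basic protocol would have sent one per pass.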
Performance Evaluation
- Experiments were conducted on a PC cluster of 8 nodes running RedHat Linux version 7.0.
- Each node is a 400 MHz Pentium II processor with 256 MB of memory; the nodes are connected through a 100 Mbps switch.
- Two benchmark programs:
  - PHOLD system
  - Linear Pipeline
PHOLD (3x3, m)
Closed system
[Figure: 3x3 arrangement of nodes]
Linear Pipeline (4, )
Open system
[Figure: customer population, four service centers, depart]
PHOLD (n x n, m)
[Plot; series: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging]
Linear Pipeline (n, )
[Plot; series: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging]
Performance Summary
Percentage reduction in NMR:

Protocol                    PHOLD system    Linear Pipeline
CMB                         -               -
Carrier-null                30%             0%
Flushing incorporated       42%             23%
Demand-driven null msg      55%             35%
Distributed Communications
- Originally, SPaDES/Java used the RMI library to transmit messages between remote LPs.
- The serialization phase, however, is a bottleneck.
- Previous performance optimization effort: message deflation.
- The only way to overcome the remote communication overhead is to send fewer messages.
- How? Target the null messages.
JavaSpaces
- A special Java-Jini service developed by Sun Microsystems, Inc., built on top of Java's RMI and mimicking a tuple space.
- An abstract platform for developing complex distributed applications.
- Distributed data persistence.
- Holds objects, known as entries, with variable attribute types.
- Key concept: matching of attribute types and values.
JavaSpaces
- 4 generic operations: write, read, take and notify.
[Figure: Client and Notifier, with arrows labelled write, read, take and notify]
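To make the matching idea concrete, here is a small in-memory stand-in for a tuple space, written in plain Java. It is not the JavaSpaces API: `MiniSpace` and `ProcessEntry` are hypothetical names, `notify` is left out, and the transaction, lease and timeout arguments of the real JavaSpace operations are omitted. The one behaviour it borrows is that a null field in a template acts as a wildcard.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical entry type with two attributes.
final class ProcessEntry {
    final Integer sender;    // null in a template means "match anything"
    final Integer receiver;  // null in a template means "match anything"

    ProcessEntry(Integer sender, Integer receiver) {
        this.sender = sender;
        this.receiver = receiver;
    }

    // True if every non-null field of the template equals this entry's field.
    boolean matches(ProcessEntry template) {
        return (template.sender == null || template.sender.equals(sender))
            && (template.receiver == null || template.receiver.equals(receiver));
    }
}

// Toy space: write adds an entry, read copies nothing and leaves the
// entry in place, take removes the first matching entry.
final class MiniSpace {
    private final List<ProcessEntry> entries = new ArrayList<>();

    synchronized void write(ProcessEntry e) {
        entries.add(e);
    }

    synchronized ProcessEntry read(ProcessEntry template) {
        for (ProcessEntry e : entries) {
            if (e.matches(template)) return e;
        }
        return null; // no match
    }

    synchronized ProcessEntry take(ProcessEntry template) {
        Iterator<ProcessEntry> it = entries.iterator();
        while (it.hasNext()) {
            ProcessEntry e = it.next();
            if (e.matches(template)) {
                it.remove();
                return e;
            }
        }
        return null; // no match
    }
}

final class MiniSpaceDemo {
    public static void main(String[] args) {
        MiniSpace space = new MiniSpace();
        space.write(new ProcessEntry(2, 1));
        space.write(new ProcessEntry(1, 2));
        // LP 1 takes whatever is addressed to it, regardless of sender.
        ProcessEntry forLp1 = space.take(new ProcessEntry(null, 1));
        System.out.println("LP 1 took an entry from sender " + forLp1.sender);
    }
}
```

A receiver therefore needs no direct connection to the sender: it only has to describe the entries it wants, which is what lets a single space replace point-to-point message passing.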
Distributed Communications
- Replace the RMI communication module in SPaDES/Java with one running on a single JavaSpace.
- Use a FrontEndSpace, which permits crash recovery of entries in the space.
- Processes and null messages transmitted between remote hosts go through the FrontEndSpace as space entries.
Space Communications: Processes
[Figure: LP1 and LP2 and the space at Time = 0 and Time = t > 0; SProcess entries labelled receiver = 1; sender = 2, receiver = 1; and receiver = 2]
Space Communications: Null Messages
[Figure: LP1, LP2, LP3 and LP4 and the space; entries labelled NullMsg (sender = 2) and Req (sender = 2)]
Performance Evaluation: PHOLD (n x n, m)
[Plot; series: RMI, JavaSpace (4 procs), JavaSpace (8 procs)]
Overall Performance Evaluation: PHOLD (n x n, m)
[Plot; series: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging, JavaSpace (4 procs), JavaSpace (8 procs)]
Performance Summary
Percentage reduction in NMR:

Protocol                    Reduction
CMB                         -
Carrier-null                30%
Flushing incorporated       42%
Demand-driven null msg      55%
JavaSpace (4 processors)    63%
JavaSpace (8 processors)    74%
Memory Requirement

$$M_{prob} = \sum_{i=1}^{n} \mathrm{MaxQueueSize}(LP_i)$$

$$M_{ord} = \sum_{i=1}^{n} \mathrm{MaxFELSize}(LP_i)$$

$$M_{sync} = \sum_{i=1}^{n} \mathrm{MaxNullMsgBufferSize}(LP_i)$$
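Each of the three quantities is a sum, over the n LPs, of a per-LP maximum. A small Java sketch of that arithmetic follows. The class and method names are hypothetical, and the per-LP figures are invented purely for illustration: they are not measurements from SPaDES/Java.

```java
// Hypothetical helper; names and numbers are illustrative only.
final class MemoryRequirement {
    // Sum of per-LP maxima, the shape shared by all three formulas.
    static long sumOfMaxima(long[] perLpMax) {
        long total = 0;
        for (long v : perLpMax) {
            total += v;
        }
        return total;
    }
}

final class MemoryRequirementDemo {
    public static void main(String[] args) {
        // Invented maxima for n = 4 LPs, one array per formula.
        long[] maxQueueSize = {12, 7, 9, 4};          // MaxQueueSize(LP_i)
        long[] maxFelSize = {5, 6, 3, 8};             // MaxFELSize(LP_i)
        long[] maxNullMsgBufferSize = {2, 0, 4, 1};   // MaxNullMsgBufferSize(LP_i)

        System.out.println("M_prob = " + MemoryRequirement.sumOfMaxima(maxQueueSize));
        System.out.println("M_ord  = " + MemoryRequirement.sumOfMaxima(maxFelSize));
        System.out.println("M_sync = " + MemoryRequirement.sumOfMaxima(maxNullMsgBufferSize));
    }
}
```

Since the per-LP maxima need not occur at the same simulation time, a sum of maxima is an upper bound on what is held at any one instant rather than an exact peak.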
Achievements & Conclusion
- Enhanced the performance of SPaDES/Java through various synchronization protocols, achieving an NMR of < 30%.
- Implemented a new discrete-event simulation library based on the concept of shared memory in a JavaSpace.
- Implemented a TSA in SPaDES/Java that can be used as a test bench for memory usage studies in parallel simulations.
Acknowledgments
- Port of Singapore Authority (PSA)
- Ministry of Education, Singapore
- Constructive feedback from referees