WormBench A Configurable Application for Evaluating Transactional Memory Systems MEDEA Workshop 26.10.2008 Ferad Zyulkyarov 1, 2, Sanja Cvijic 3, Osman.

1 WormBench: A Configurable Application for Evaluating Transactional Memory Systems
MEDEA Workshop, 26.10.2008
Ferad Zyulkyarov (1, 2), Sanja Cvijic (3), Osman Unsal (1), Adrian Cristal (1), Eduard Ayguade (1, 2), Tim Harris (4), Mateo Valero (1, 2)
(1) Barcelona Supercomputing Center, (2) Universitat Politecnica de Catalunya, (3) Belgrade University, (4) Microsoft Research Cambridge UK

2 Outline
Transactional Memory
Idea
Motivation
WormBench Features
WormBench main components
WormBench input – run configuration
Analysis
Modeling STAMP’s genome
Conclusion

3 Transactional Memory atomic { }

4 Idea
Inspired by the Snake game
Worms are active objects
Worms move in a BenchWorld
On every move, worms do computation

5 Motivation - General
We don’t know exactly how to write TM applications
Converting applications from locks 1:1 is not the correct approach
  –For example, is it the same as converting a lock-based application 1:1 into message-passing synchronization?

6 Motivation - Existing TM Applications (1/2)
STAMP [IISWC’2008]
  –Specific to TL2 [ISCA’2007]
  –Does not have a lock-based implementation
  –tm_write() and tm_read() are carefully used, thus assuming a perfect compiler
STMBench7 [EuroSys’2007]
  –Suitable for STM
  –Data structures too big (700.000 bytes); transactions too long (10 tx/s)

7 Motivation - Existing TM Applications (2/2)
SPLASH-2 [ISCA’1995]
  –Embarrassingly parallel
  –Fine grain locking
  –Not suitable for the intended TM usage pattern (coarse grain locking)
Haskell STM Benchmark [CF’2007]
  –Implemented in a declarative language
  –Depends on constraints enforced by the language and type system (TVar, monads)

8 WormBench’s Goal
Unify the features of existing TM applications
A tool for instrumenting multi-threaded applications
A set of run configurations to serve as a baseline for evaluating TM systems against each other and against locks
A specific run configuration that stresses a particular design or implementation aspect of a TM system, such as the sizes of internally used buffers

9 WormBench Features (1/2)
Implemented in the imperative language C#
  –Compiled with Bartok
Follows object-oriented programming concepts
Critical sections are marked with atomic
  –Can be used to test the compiler infrastructure
Represents a typical parallel application with shared data
Highly configurable through run configurations

10 WormBench Features (2/2)
Suitable for HTM, STM and hybrid TM variants
No assumptions about TM system design and implementation
Lock-based and transactional implementations for comparison purposes
Sanity check verification for the underlying TM system
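A lock-based and a transactional variant can share one source when every critical section goes through a single construct. The sketch below is a Python stand-in, not WormBench's C# code: `atomic()` is a hypothetical helper that takes one coarse global lock, which mimics the coarse lock-based version rather than a real TM runtime.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in: one coarse global lock plays the role of the TM runtime.
_global_lock = threading.RLock()


@contextmanager
def atomic():
    """Run the enclosed block as one critical section."""
    with _global_lock:
        yield


counter = {"moves": 0}


def worm_thread(n_moves):
    for _ in range(n_moves):
        with atomic():  # corresponds to `atomic { ... }` on the slides
            counter["moves"] += 1


def run(n_threads=4, n_moves=1000):
    """Start n_threads worms and return the total number of recorded moves."""
    counter["moves"] = 0
    threads = [threading.Thread(target=worm_thread, args=(n_moves,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["moves"]
```

Swapping the body of `atomic()` for a transactional runtime would leave the worm code unchanged, which is the property that makes a lock-versus-TM comparison possible.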

11 Main Objects in WormBench
BenchWorld
  –BenchWorldNode
Worm
  –Body
  –Head
Message

12 Example
Worm
  –Body length 8
  –Head size 4
Operations
  –Sum - ahead
  –Average - right
  –Min - ahead
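The example can be made concrete with a small Python sketch. Several things here are assumptions rather than details from the slides: the BenchWorld is a square grid of integers, a head of size h covers an h x h block of nodes, and the world wraps around at its edges. `make_world` and `head_values` are hypothetical helper names.

```python
from statistics import mean


def make_world(size):
    """BenchWorld as a size x size grid; node (r, c) holds the value r * size + c."""
    return [[r * size + c for c in range(size)] for r in range(size)]


def head_values(world, row, col, head_size):
    """Values under an assumed head_size x head_size head with top-left node (row, col).

    Indices wrap around, so the head never leaves the world (an assumption).
    """
    n = len(world)
    return [world[(row + dr) % n][(col + dc) % n]
            for dr in range(head_size)
            for dc in range(head_size)]


world = make_world(8)
values = head_values(world, 2, 3, 4)  # head size 4, as in the example
ops = {"sum": sum(values), "average": mean(values), "min": min(values)}
```

All three operations only read the nodes under the head, which is why they fall into the read-only group.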

13 WormBench Input – Run Configuration
Size of the BenchWorld
Number of worms (number of threads)
Body length of each worm
Head size of each worm
The number and type of worm operations that each worm has to perform while moving
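The run configuration parameters listed above map naturally onto a small record type. The following Python sketch is illustrative only: the class and field names (`RunConfig`, `WormConfig`, `world_size`) are hypothetical, and a square BenchWorld is assumed; the slides do not show WormBench's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class WormConfig:
    body_length: int
    head_size: int
    operations: list = field(default_factory=list)  # operation names, in execution order


@dataclass
class RunConfig:
    world_size: int  # assumed square: world_size x world_size nodes
    worms: list = field(default_factory=list)

    @property
    def num_threads(self):
        # One thread per worm, as stated on the slide.
        return len(self.worms)

    def validate(self):
        for worm in self.worms:
            if worm.body_length < 1 or worm.head_size < 1:
                raise ValueError("body length and head size must be positive")
            if worm.head_size > self.world_size:
                raise ValueError("head does not fit into the BenchWorld")
        return True


config = RunConfig(
    world_size=128,
    worms=[WormConfig(8, 4, ["sum", "average", "min"]) for _ in range(4)],
)
```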

14 Instantiates Common Sync Scenarios (1/2)
Object access serializability
  –Guarding a shared variable with locks
Two-phase locking and its derivatives
  –Locking protocol that attempts non-blocking fine grain locking while avoiding deadlock
Multiple granularity locking
  –Fine grain locking technique used to lock a region in a collection or hierarchical data structure
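As an illustration of the two-phase locking scenario, here is a minimal Python sketch. It is a blocking variant that avoids deadlock by acquiring locks in a fixed global order, not the non-blocking protocol the slide mentions; `Node` and `locked_update` are hypothetical names.

```python
import threading


class Node:
    def __init__(self, node_id, value):
        self.node_id = node_id
        self.value = value
        self.lock = threading.Lock()


def locked_update(nodes, update):
    """Two-phase locking: take every lock (growing phase), update, release all (shrinking phase).

    Acquiring in node_id order gives all threads the same lock order, which rules out deadlock.
    """
    ordered = sorted(nodes, key=lambda n: n.node_id)
    for node in ordered:
        node.lock.acquire()
    try:
        update(nodes)
    finally:
        for node in reversed(ordered):
            node.lock.release()


def swap(nodes):
    a, b = nodes
    a.value, b.value = b.value, a.value


def run(rounds=1000):
    x, y = Node(0, "x"), Node(1, "y")
    # The two threads name the nodes in opposite order; sorting prevents the classic deadlock.
    t1 = threading.Thread(target=lambda: [locked_update([x, y], swap) for _ in range(rounds)])
    t2 = threading.Thread(target=lambda: [locked_update([y, x], swap) for _ in range(rounds)])
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return x.value, y.value
```

Because every swap runs under both locks, the two threads together perform an even number of swaps and the values end where they started.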

15 Instantiates Common Sync Scenarios (2/2)
Dining Philosophers
  –Deadlock scenario
Barrier synchronization
  –Worms wait until the whole group (or all worms) reaches a certain point in execution
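The barrier scenario can be sketched with Python's standard `threading.Barrier`. This only illustrates the synchronization pattern; how WormBench implements it is not shown on the slides, and the event log is a hypothetical device for observing the order.

```python
import threading


def run(n_worms=4):
    """Each worm logs its arrival, waits at the barrier, then logs that it left."""
    barrier = threading.Barrier(n_worms)
    events = []  # list.append is atomic in CPython

    def worm(worm_id):
        events.append(("arrive", worm_id))
        barrier.wait()  # blocks until all n_worms threads have arrived
        events.append(("leave", worm_id))

    threads = [threading.Thread(target=worm, args=(i,)) for i in range(n_worms)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

No worm can log "leave" before every worm has logged "arrive", regardless of how the threads are scheduled.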

16 Retry or Conditional Atomic Retry Mostly neglected utilization of retry or conditional atomic.

17 Currently Available Worm Operations (1/2)
Read-only
  –Sum
  –Average
  –Min
  –Max
  –Median
I/O
  –Checkpoint
  –Undo

18 Currently Available Worm Operations (2/2)
Read dominated
  –Replace min with average
  –Replace max with average
  –Replace median with average
  –Replace min and max
Write dominated
  –Sort
  –Transpose
Leave message (for complex synchronization scenarios)
  –Goto node message
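To show the difference between read-dominated and write-dominated operations, here is a Python sketch of three of them on the nodes under a worm's head. The exact semantics are assumptions: the head is modeled as a square list of lists, "replace min with average" overwrites only the first minimum, and sort writes values back in row-major order.

```python
def flatten(matrix):
    return [value for row in matrix for value in row]


def replace_min_with_average(matrix):
    """Read-dominated: read every node, write only the node holding the minimum."""
    values = flatten(matrix)
    average = sum(values) / len(values)
    index = values.index(min(values))  # first minimum in row-major order
    row, col = divmod(index, len(matrix[0]))
    matrix[row][col] = average


def sort_nodes(matrix):
    """Write-dominated: rewrite every node in ascending row-major order."""
    values = sorted(flatten(matrix))
    width = len(matrix[0])
    for i, value in enumerate(values):
        matrix[i // width][i % width] = value


def transpose(matrix):
    """Write-dominated: swap nodes across the main diagonal (square head assumed)."""
    n = len(matrix)
    for r in range(n):
        for c in range(r + 1, n):
            matrix[r][c], matrix[c][r] = matrix[c][r], matrix[r][c]
```

For example, starting from `[[4, 1], [3, 2]]`, replacing the minimum gives `[[4, 2.5], [3, 2]]`, sorting gives `[[2, 2.5], [3, 4]]`, and transposing gives `[[2, 3], [2.5, 4]]`.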

19 Worm Operations – Execution Distribution
Op                 B[1.1]H[1.1]  B[4.4]H[4.4]  B[8.8]H[8.8]  B[1.8]H[1.8]
Sum                0.42          0.433         0.194         0.31
Avg                0.42          0.278         0.324         0.434
Median             0.839         3.649         9.354         5.14
Min                0.315         0.588         0.278         0.372
Max                0.42          0.588         0.33          0.537
Rep Max with Avg   1.364         0.711         0.427         0.743
Rep Min with Avg   0.735         0.742         0.537         0.702
Rep Med with Avg   2.518         4.793         11.412        6.689
Rep Max and Min    2.099         0.588         0.634         0.929
Rep Med with Min   2.728         5.009         11.186        7.06
Rep Med and Max    2.518         5.257         11.387        7.122
Sort               1.679         6.586         11.257        7.184
Transpose          1.154         3.247         2.369         3.365
Checkpoint         1.12          1.45          1.982         1.522
Undo               1.06          1.32          1.85          1.488
Total              19.389        35.239        63.521        42.597

21 Worm Operations – TM Characteristics
Body length   1        2        4        8
Op            R   W    R   W    R   W    R   W
1             11  3        4        6        10
2             11  3        4        6        10
3             11  3        4        6        10
4             11  3        4        6        10
5             11  3        4        6        10
6             14  4        5        7        11
7             14  4        5        7        11
8             14  4        5        7        11
9             14  4        5        7        11
10            14  4        5        7        11
11            14  4        5        7        11
12            16  4        5        7        11
13            16  4        5        7        11
14            11  3        4        6
15            11  3        4        6
Worm head size is fixed to 1 and body length is 1, 2, 4, 8 (R = reads, W = writes)

23 Worm Operations – TM Characteristics
Head size     1        2        4        8
Op            R   W    R   W    R   W    R   W
1             11  3    14  3    26  3    74  3
2             11  3    14  3    26  3    74  3
3             11  3    14  3    26  3    74  3
4             11  3    14  3    26  3    74  3
5             11  3    14  3    26  3    74  3
6             14  4    17  4    29  4    77  4
7             14  4    17  4    29  4    77  4
8             14  4    17  4    29  4    77  4
9             14  4    17  5    29  5    77  5
10            14  4    17  5    29  5    77  5
11            14  4    17  5    29  5    77  5
12            16  4    19  7    31  19   79  67
13            16  4    19  7    31  19   79  67
14            11  3    14  3    26  3    74  3
15            11  3    14  3    26  3    74  3
Body length is fixed to 1 and head size is 1, 2, 4, 8 (R = reads, W = writes)
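One reading of this table, offered as an observation rather than a claim from the slides: taking each row as (reads, writes) pairs per head size, the read counts grow as the square of the head size plus a constant per operation, which fits a head covering h x h nodes. The Python sketch below checks that arithmetic for three representative rows transcribed from the table.

```python
HEAD_SIZES = [1, 2, 4, 8]

# (reads, writes) per head size, transcribed from the table for operations 1, 6 and 12.
ROWS = {
    1: [(11, 3), (14, 3), (26, 3), (74, 3)],
    6: [(14, 4), (17, 4), (29, 4), (77, 4)],
    12: [(16, 4), (19, 7), (31, 19), (79, 67)],
}


def read_offsets(row):
    """Reads minus head_size**2 per column; a constant means reads grow as h*h + const."""
    return [reads - h * h for (reads, _), h in zip(row, HEAD_SIZES)]


def write_offsets(row):
    """Writes minus head_size**2 per column."""
    return [writes - h * h for (_, writes), h in zip(row, HEAD_SIZES)]


offsets = {op: read_offsets(row) for op, row in ROWS.items()}
```

The offsets are constant within each row (10, 13 and 15). For operation 12 the writes follow the same pattern, with `write_offsets(ROWS[12])` equal to 3 in every column, which is consistent with a write-dominated operation that rewrites every node under the head.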

25 Analyzing Sample Run Configurations
Lock vs Transactions
Change in BenchWorld size
Change in worms’ body length and size
Initialization of worms for smaller BenchWorld

26 Lock vs Transactions

27 Throughput ~ Worms ~ BenchWorld Relationship between throughput, worms’ size and BenchWorld

28 Initializing Worms for Smaller BenchWorld
How the conflict rate is affected when worms are initialized for a smaller BenchWorld.
Averaged Results
Worms initialized for 128x128 and run in 128x, 256x, 512x and 1024x size BenchWorlds

29 Modeling Genome App. From STAMP
To obtain the results shown in Table IV we used the following run configuration:
  –Worms body length = 1
  –Worms head size = 4
  –BenchWorld of size 52x52
  –Worm operations: randomly generated stream of worm operations, where the ratio between the worm operations was Operations (1:2:3:4:5:6:7:8:9:10:11:12:13:14:15) = Ratio (1:1:1:0:0:2:1:1:1:1:1:1:2:0:0)
(Gen. = genome, WB = WormBench)
T#   Commit Rate      Read per TX        Write per TX     Speedup
     Gen.    WB       Gen.     WB        Gen.    WB       Gen.    WB
1    1       1        36.362   31.480    1.374   1.962    1       1
2    0.998            34.260   31.609    1.373   1.962    2.177   1.4
4    0.994   0.995    37.974   31.815    1.371   1.962    3.474   2.2
8    0.985   0.987    46.219   32.300    1.377   1.963    5.435   2.867
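A randomly generated operation stream with the ratio above can be sketched in Python with weighted sampling. This is an illustration of the idea, not WormBench's generator; `operation_stream` and the `seed` parameter are hypothetical.

```python
import random

# Ratio from the slide for operations 1..15; a weight of 0 excludes the operation.
RATIO = [1, 1, 1, 0, 0, 2, 1, 1, 1, 1, 1, 1, 2, 0, 0]


def operation_stream(length, seed=None):
    """Draw `length` operation numbers (1..15) with probability proportional to RATIO."""
    rng = random.Random(seed)
    return rng.choices(range(1, 16), weights=RATIO, k=length)


stream = operation_stream(10_000, seed=42)
```

Operations 4, 5, 14 and 15 never appear in the stream, and operations 6 and 13 are drawn about twice as often as the others.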

30 Future Work
Toolset that automatically generates a run configuration representing user-defined transactional and runtime behavior, e.g.:
  –Commit rate = 80%
  –Reads per TX = 6
  –Writes per TX = 2
  –Runtime = 100 moves/ms
Implement BenchWorld as
  –Linked list
  –Sparse matrix

31 Future Work
Understand how the messaging works in BenchWorld
Prepare a baseline set of run configurations to benchmark TM systems (HTM, STM and hybrid TMs)
Fine grain version using two-phase locking

32 Conclusion
WormBench is a highly configurable workload for TM
Independent of TM design and implementation
Critical sections defined by language-level atomic blocks
Coarse lock-based version
Sanity check for the overall TM system
But still small: it does not exercise language extensions for TM and their semantics

33 The End

