Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm.

1 Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System. Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm. By: Priya Limaye

2 Locality What is locality of reference?

3 Locality What is locality of reference? sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; }

4 Locality What is locality of reference? sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; } Temporal locality: recently accessed data and instructions are likely to be accessed again in the near future (here, sum, i, and the loop body are reused on every iteration).

5 Locality What is locality of reference? sum = 0; for (int i = 0; i < 10; i++) { sum = sum + number[i]; } Spatial locality: data and instructions close to recently accessed data and instructions are likely to be accessed in the near future (here, number[i + 1] sits right next to number[i] in memory).

6 Locality What is locality of reference? – Recently accessed data and instructions, and the data and instructions near them, are likely to be accessed in the near future. – Grab a larger chunk than you immediately need. – Once you’ve grabbed a chunk, keep it.
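To make spatial locality concrete, here is a hedged C++ sketch (the `Matrix` type and both function names are illustrative, not from the talk). Both functions add the same numbers, but the row-major loop walks memory sequentially and uses each fetched cache line fully, while the column-major loop strides across rows and, for large matrices, lands on a different cache line on almost every access.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A matrix stored in row-major order: element (r, c) lives at data[r * cols + c].
struct Matrix {
    std::size_t rows, cols;
    std::vector<long> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0) {}
    long& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Good spatial locality: consecutive iterations touch adjacent addresses.
long sum_row_major(Matrix& m) {
    long sum = 0;  // reused on every iteration: temporal locality
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            sum += m.at(r, c);
    return sum;
}

// Poor spatial locality: consecutive iterations are 'cols' elements apart.
long sum_col_major(Matrix& m) {
    long sum = 0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            sum += m.at(r, c);
    return sum;
}
```

The two results are always equal; only the order of memory accesses, and therefore the cache behaviour, differs.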

7 Locality in a multiprocessor Computation depends on data local to the processor – Each processor uses data from its own cache – Once data is brought into the cache, it stays there

8 Locality in a multiprocessor [Diagram: two CPUs, each with its own cache, connected to a shared memory that holds a counter]

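9 Counter: Shared Memory [Diagram: a single counter in shared memory, value 0]

10 Counter: Shared Memory [Diagram: a CPU reads the counter; memory and that CPU's cache both hold 0]

11 Counter: Shared Memory [Diagram: the CPU increments the counter; the value is now 1]

12 Counter: Shared Memory [Diagram: a second CPU reads the counter and caches 1 as well] Read: OK

13 Counter: Shared Memory [Diagram: one CPU increments the counter to 2, which invalidates the other CPU's cached copy]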
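The invalidation traffic in the shared-counter slides can be modeled with a toy write-invalidate protocol. This C++ sketch is a simplification under stated assumptions: it tracks a single cache line, records only which CPUs hold a valid copy, and counts misses and invalidations rather than time. `CacheLine` and `increment_shared` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Toy model of one cache line under a write-invalidate coherence protocol.
class CacheLine {
public:
    explicit CacheLine(int ncpus) : ncpus_(ncpus) {}

    // A read hits if this CPU already holds a valid copy; otherwise it misses,
    // fetches the line, and then shares it with any other holders.
    void read(int cpu) {
        assert(cpu >= 0 && cpu < ncpus_);
        if (!holders_.count(cpu)) { ++misses; holders_.insert(cpu); }
    }

    // A write needs the only valid copy: every other holder is invalidated.
    void write(int cpu) {
        read(cpu);  // fetch the line first if this CPU does not have it
        for (int other : holders_)
            if (other != cpu) ++invalidations;
        holders_ = {cpu};
    }

    int misses = 0;
    int invalidations = 0;

private:
    int ncpus_;
    std::set<int> holders_;  // CPUs holding a valid copy of the line
};

// Shared counter: every CPU does a read-modify-write on the same line.
void increment_shared(CacheLine& line, int cpu) {
    line.read(cpu);
    line.write(cpu);
}
```

With two CPUs taking turns, every increment misses and every increment after the first invalidates the other CPU's copy; a single CPU incrementing alone misses once and never invalidates.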

14 Comparing counter 1. Scales well on older architectures 2. Performs worse on a shared-memory multiprocessor

15 Counter: Array Sharing requires moving the counter back and forth between CPU caches. Split the counter into an array. Each CPU gets its own counter.

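16 Counter: Array [Diagram: memory holds an array of two counters, (0, 0)]

17 Counter: Array [Diagram: the first CPU increments its own element; the array is now (1, 0)]

18 Counter: Array [Diagram: the second CPU increments its own element; the array is now (1, 1)]

19 Counter: Array [Diagram: reading the counter adds all elements, (1 + 1) = 2] Read Counter: Add All Counters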
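A minimal C++ sketch of the array counter, assuming CPUs are identified by a small integer index passed in by the caller (a kernel would use the current processor's ID). Each slot is a `std::atomic<long>` so that concurrent increments and reads are well defined; `ArrayCounter` is an illustrative name.

```cpp
#include <atomic>
#include <cassert>
#include <vector>

// Counter split into one slot per CPU: each CPU increments only its own
// slot, and a read adds up every slot.
class ArrayCounter {
public:
    explicit ArrayCounter(int ncpus) : slots_(ncpus) {
        for (auto& s : slots_) s.store(0, std::memory_order_relaxed);
    }

    void increment(int cpu) {
        assert(cpu >= 0 && cpu < static_cast<int>(slots_.size()));
        slots_[cpu].fetch_add(1, std::memory_order_relaxed);  // no other CPU writes this slot
    }

    long value() const {  // "Add All Counters"
        long total = 0;
        for (const auto& s : slots_) total += s.load(std::memory_order_relaxed);
        return total;
    }

private:
    std::vector<std::atomic<long>> slots_;  // packed: neighbouring slots can share a cache line
};
```

Note that the slots sit next to each other in memory, so several of them can fall on the same cache line.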

20 Counter: Array This solves the sharing problem. What about performance?

21 Comparing counter Does not perform better than the shared counter.

22 Counter: Array This solves the sharing problem. What about performance? What about false sharing?

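23 Counter: False Sharing [Diagram: memory holds the two-element counter array (0, 0); both elements are on one cache line]

24 Counter: False Sharing [Diagram: both CPUs have loaded that line; each cache holds (0, 0)]

25 Counter: False Sharing [Diagram: the line is shared between the two caches]

26 Counter: False Sharing [Diagram: the first CPU increments its own element, giving (1, 0), which invalidates the other cache's copy of the whole line]

27 Counter: False Sharing [Diagram: both caches hold (1, 0) again; the line is shared once more]

28 Counter: False Sharing [Diagram: the second CPU increments its own element, giving (1, 1), again invalidating the other copy even though neither CPU touched the other's element]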

29 Solution? Use a padded array. Different elements map to different cache lines.

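30 Counter: Padded Array [Diagram: memory holds the padded counter array (0, 0), each element on its own cache line]

31 Counter: Padded Array [Diagram: each CPU increments its own element, giving (1, 1)] Updates are independent of each other.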
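The padded array can be sketched in C++ with `alignas`, under the assumption of 64-byte cache lines (common, but check the target machine). Aligning each slot to a full line forces the compiler to round its size up to 64 bytes, so two slots can never share a line. `PaddedSlot` and `PaddedArrayCounter` are illustrative names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Each slot occupies (at least) one whole cache line.
struct alignas(kCacheLine) PaddedSlot {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedSlot) == kCacheLine, "one slot per cache line");

class PaddedArrayCounter {
public:
    explicit PaddedArrayCounter(int ncpus) : slots_(ncpus) {}

    void increment(int cpu) {
        assert(cpu >= 0 && cpu < static_cast<int>(slots_.size()));
        // Touches only this CPU's line, so other CPUs' cached lines stay valid.
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    long value() const {
        long total = 0;
        for (const auto& s : slots_) total += s.value.load(std::memory_order_relaxed);
        return total;
    }

private:
    std::vector<PaddedSlot> slots_;  // C++17 allocates over-aligned elements correctly
};
```

The trade-off is memory: each counter slot now costs a full cache line instead of eight bytes.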

32 Comparing counter The padded array works better.

33 Locality in the OS Serious performance impact. Difficult to retrofit. Tornado – Ground-up design – Object-oriented approach – Natural locality

34 Tornado Object-oriented approach. Clustered objects. Protected procedure calls. Semi-automatic garbage collection – Simplified locking protocol

35 Object-Oriented Approach [Diagram: a process table with entries Process 1, Process 2, …]

36 Object-Oriented Approach [Diagram: the process table; Process 1 with a lock]

37 Object-Oriented Approach [Diagram: the process table; Process 1 with a lock, and Process 2]

38 Object-Oriented Approach [Diagram: the process table; Process 1 and Process 2, each with its own lock]

39 Object-Oriented Approach class ProcessTableEntry { data; lock; code; }

40 Object-Oriented Approach Each resource is represented by a different object. Requests to virtual resources are handled independently – No shared data structure access – No shared locks
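The `ProcessTableEntry` idea can be sketched in C++ as an object that bundles its data, its own lock, and the code that uses them. The fields (`pid_`, `state_`) are hypothetical; the point is that each entry guards itself, so there is no table-wide lock for two processes to fight over.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// One process-table entry as an object: data + its own lock + code.
class ProcessTableEntry {
public:
    explicit ProcessTableEntry(int pid) : pid_(pid) { assert(pid > 0); }

    // Takes only this entry's lock, so updating process 1 never waits for
    // work on process 2.
    void set_state(const std::string& state) {
        std::lock_guard<std::mutex> guard(lock_);
        state_ = state;
    }

    std::string state() const {
        std::lock_guard<std::mutex> guard(lock_);
        return state_;
    }

    int pid() const { return pid_; }  // immutable after construction, no lock needed

private:
    const int pid_;
    std::string state_ = "new";
    mutable std::mutex lock_;  // protects state_ only
};
```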

41 Object-Oriented Approach [Diagram: a page fault exception arrives at the Process object]

42 Object-Oriented Approach [Diagram: the Process object passes the page fault to a Region object]

43 Object-Oriented Approach [Diagram: the Region passes it on to an FCM (File Cache Manager)]

44 Object-Oriented Approach [Diagram: on a page fault exception, the Process searches its Regions, each backed by an FCM, for the one responsible for the faulting address; the diagram also shows the HAT (Hardware Address Translation) object]

45 Object-Oriented Approach [Diagram: the page fault flows from Process to Region to FCM, which works with a COR (Cached Object Representative) and DRAM (the memory manager)]
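The page-fault path above can be sketched as a chain of small C++ objects. The class names follow the slides, but every interface here is an assumption made for illustration: 4 KiB pages, page-aligned regions, a bump allocator standing in for the DRAM manager, and a stub COR that only counts how many pages it was asked to fill.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

using Addr = std::uint64_t;
constexpr Addr kPageSize = 4096;  // assumption: 4 KiB pages

// DRAM manager: hands out physical page frames (a bump allocator here).
class DramManager {
public:
    Addr alloc_frame() { return next_frame_++ * kPageSize; }
private:
    Addr next_frame_ = 1;
};

// COR (Cached Object Representative): stub for the file's backing store.
class COR {
public:
    void fill_page(Addr /*file_offset*/, Addr /*frame*/) { ++fills; }
    int fills = 0;
};

// FCM (File Cache Manager): keeps the file's resident pages; on a miss it
// gets a frame from the DRAM manager and asks the COR to fill it.
class FCM {
public:
    FCM(DramManager& dram, COR& cor) : dram_(dram), cor_(cor) {}
    Addr get_page(Addr file_offset) {
        auto it = resident_.find(file_offset);
        if (it != resident_.end()) return it->second;  // already in memory
        Addr frame = dram_.alloc_frame();
        cor_.fill_page(file_offset, frame);
        resident_[file_offset] = frame;
        return frame;
    }
private:
    DramManager& dram_;
    COR& cor_;
    std::map<Addr, Addr> resident_;  // file offset -> physical frame
};

// HAT (Hardware Address Translation): installs virtual-to-physical mappings.
class HAT {
public:
    void install(Addr vpage, Addr frame) { mappings[vpage] = frame; }
    std::map<Addr, Addr> mappings;
};

// Region: a page-aligned range of the address space backed by one FCM.
struct Region {
    Addr start, length, file_offset;
    FCM* fcm;
    bool contains(Addr a) const { return a >= start && a < start + length; }
};

class Process {
public:
    explicit Process(HAT& hat) : hat_(hat) {}
    void add_region(const Region& r) {
        assert(r.fcm != nullptr && r.start % kPageSize == 0);
        regions_.push_back(r);
    }
    // Search for the responsible Region, get the page from its FCM, map it
    // through the HAT. Returns false if no Region covers the address.
    bool handle_page_fault(Addr addr) {
        const Addr vpage = addr - addr % kPageSize;
        for (const Region& r : regions_) {
            if (!r.contains(addr)) continue;
            Addr frame = r.fcm->get_page(r.file_offset + (vpage - r.start));
            hat_.install(vpage, frame);
            return true;
        }
        return false;
    }
private:
    HAT& hat_;
    std::vector<Region> regions_;
};
```

Each object only touches its own state, which is what lets faults on different regions (or different processes) proceed without sharing data structures or locks.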

46 Object-Oriented Approach Multiple implementations for system objects. The objects used for a resource can be changed dynamically. Provides the foundation for other Tornado features.
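A small C++ sketch of "multiple implementations, changed dynamically": callers see only the `CounterObj` interface, and a hypothetical `Resource` holder swaps a shared implementation for a per-CPU one while carrying over the current value. This swap is not synchronized against concurrent callers; it only illustrates the interface idea, not how Tornado performs the switch.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// One interface, several implementations.
class CounterObj {
public:
    virtual ~CounterObj() = default;
    virtual void increment(int cpu) = 0;
    virtual long value() const = 0;
};

// Implementation A: a single shared count (fine when contention is low).
class SharedCount : public CounterObj {
public:
    explicit SharedCount(long start = 0) : count_(start) {}
    void increment(int) override { ++count_; }
    long value() const override { return count_; }
private:
    long count_;
};

// Implementation B: one slot per CPU (better when many CPUs update it).
class PerCpuCount : public CounterObj {
public:
    PerCpuCount(int ncpus, long start) : slots_(ncpus, 0) {
        assert(ncpus > 0);
        slots_[0] = start;  // carry over the previous total
    }
    void increment(int cpu) override { ++slots_.at(cpu); }
    long value() const override {
        long total = 0;
        for (long s : slots_) total += s;
        return total;
    }
private:
    std::vector<long> slots_;
};

// Hypothetical holder: swaps the implementation without changing callers.
class Resource {
public:
    Resource() : impl_(std::make_unique<SharedCount>()) {}
    CounterObj& counter() { return *impl_; }
    void switch_to_per_cpu(int ncpus) {
        impl_ = std::make_unique<PerCpuCount>(ncpus, impl_->value());
    }
private:
    std::unique_ptr<CounterObj> impl_;
};
```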

47 Clustered Objects Improve locality for widely shared objects. Appear as a single object – Composed of multiple component objects. Have a representative ('rep') for processors – Defines the degree of clustering. Common clustered object reference for clients.

48 Clustered Objects

49 Clustered Objects: Implementation

50 A translation table per processor – Located at the same virtual address on every processor – Each entry holds a pointer to a rep. A clustered object reference is just a pointer into the table. Reps are created on demand when first accessed – Special global miss-handling object
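The translation-table mechanism can be approximated in user-level C++. In this sketch (an assumption, not Tornado's code) a clustered object reference is an index that means the same thing in every processor's table, each table entry points to that processor's rep, and an empty entry is filled on first use by a global miss handler. `ClusteredObjectTables` and `CountRep` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Example rep type: each rep keeps its own count.
struct CountRep {
    long count = 0;
};

template <typename Rep>
class ClusteredObjectTables {
public:
    using Ref = std::size_t;  // clustered object reference: an index into the tables
    using MissHandler = std::function<std::shared_ptr<Rep>(Ref, int)>;

    ClusteredObjectTables(int ncpus, MissHandler miss)
        : tables_(ncpus), miss_(std::move(miss)) {}

    // Allocate an entry with the same index in every processor's table.
    // No reps exist yet; they are created lazily.
    Ref create() {
        for (auto& table : tables_) table.emplace_back();
        return next_ref_++;
    }

    // Fast path: one lookup in this processor's table. On a miss, the global
    // miss handler supplies this processor's rep, which is then cached.
    Rep& deref(Ref ref, int cpu) {
        assert(cpu >= 0 && cpu < static_cast<int>(tables_.size()));
        std::shared_ptr<Rep>& slot = tables_[cpu].at(ref);
        if (!slot) slot = miss_(ref, cpu);
        return *slot;
    }

private:
    std::vector<std::vector<std::shared_ptr<Rep>>> tables_;  // one table per CPU
    MissHandler miss_;
    Ref next_ref_ = 0;
};
```

Because the miss handler decides which rep a processor gets, it also controls the degree of clustering: returning a fresh rep per CPU gives one rep per processor, while returning the same `shared_ptr` for a group of CPUs makes them share a rep.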

51 Counter: Clustered Object [Diagram: the counter as a clustered object; a CPU's object reference resolves to rep 1]

52 Counter: Clustered Object [Diagram: the CPU increments rep 1, which now holds 1]

53 Counter: Clustered Object [Diagram: a second CPU's reference resolves to its own rep, rep 2, which it increments; rep 1 and rep 2 each hold 1] Updates are independent of each other.

54 Clustered Objects Degree of clustering: multiple reps per object – How to maintain consistency? Coordination between reps – Shared memory – Remote PPCs

55 Counter: Clustered Object [Diagram: a CPU's object reference resolves to rep 1, which holds 1]

56 Counter: Clustered Object [Diagram: a CPU reads the counter through its object reference] Read Counter

57 Counter: Clustered Object [Diagram: the read adds all reps' counts, (1 + 1) = 2] Add All Counters
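The clustered counter walkthrough can be sketched in C++ as follows. Assumptions: CPUs are modeled as integer indices and the sketch is single-threaded, so it shows only the division of work, not synchronization. An update touches only the calling CPU's rep, reps are created on first use, and a read coordinates through shared memory by visiting every rep. `ClusteredCounter` is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class ClusteredCounter {
public:
    explicit ClusteredCounter(int ncpus) : reps_(ncpus) {}

    // Update: touches only the calling CPU's rep.
    void increment(int cpu) { rep_for(cpu).count += 1; }

    // Read: visit every rep through shared memory and add the counts.
    long value() const {
        long total = 0;
        for (const auto& rep : reps_)
            if (rep) total += rep->count;  // CPUs that never used the counter have no rep
        return total;
    }

    // Number of reps created so far (one per CPU that has used the counter).
    int rep_count() const {
        int n = 0;
        for (const auto& rep : reps_)
            if (rep) ++n;
        return n;
    }

private:
    struct Rep { long count = 0; };

    // Reps are created on demand the first time a CPU uses the counter.
    Rep& rep_for(int cpu) {
        assert(cpu >= 0 && cpu < static_cast<int>(reps_.size()));
        auto& slot = reps_[cpu];
        if (!slot) slot = std::make_unique<Rep>();
        return *slot;
    }

    std::vector<std::unique_ptr<Rep>> reps_;
};
```

Updates stay local and cheap, while reads pay for visiting every rep; that trade-off suits a counter that is incremented far more often than it is read.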

58 Clustered Objects: Benefits Facilitate multiprocessor optimizations, e.g. replication and partitioning of data structures. Preserve the object-oriented design. Enable incremental optimization. Allow several different implementations.

59 Synchronization Two kinds of locking issues – Locking – Existence guarantees

60 Synchronization: Locking Locking is encapsulated within individual objects. Clustered objects are used to limit contention. Spin-then-block locks are used – Highly efficient – Reduce the cost of a lock/unlock pair
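A spin-then-block lock can be sketched in portable C++: spin for a bounded number of attempts (cheap when the holder releases soon), then sleep on a condition variable so a long wait does not burn CPU time. This is a generic user-level illustration, not Tornado's kernel lock; `SpinThenBlockLock` and the spin limit of 100 are arbitrary assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

class SpinThenBlockLock {
public:
    void lock() {
        for (int i = 0; i < kSpinLimit; ++i)
            if (try_lock()) return;                  // fast path: free or released quickly
        std::unique_lock<std::mutex> guard(m_);
        ++waiters_;
        cv_.wait(guard, [this] { return try_lock(); });  // slow path: block until we get it
        --waiters_;
    }

    bool try_lock() {
        bool expected = false;
        return held_.compare_exchange_strong(expected, true, std::memory_order_acquire);
    }

    void unlock() {
        assert(held_.load(std::memory_order_relaxed) && "unlock of a lock that is not held");
        held_.store(false, std::memory_order_release);
        std::lock_guard<std::mutex> guard(m_);      // pairs with the waiter's check
        if (waiters_ > 0) cv_.notify_one();
    }

private:
    static constexpr int kSpinLimit = 100;  // assumption: arbitrary spin budget
    std::atomic<bool> held_{false};
    std::mutex m_;                          // protects waiters_ and the sleep/wake handshake
    std::condition_variable cv_;
    int waiters_ = 0;
};
```

The uncontended lock/unlock pair is a single compare-and-swap plus a store and a brief internal mutex acquisition; the condition variable only matters once a waiter has given up spinning.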

61 Synchronization: Existence guarantees Conventionally, all references to an object are protected by a lock – Eliminates races where one thread is accessing the object while another is deallocating it – Requires a complex global hierarchy of locks. Tornado: semi-automatic garbage collection – A clustered object reference can be used at any time – Eliminates the need for these locks

62 Garbage Collection Distinguishes between temporary references and persistent references – Temporary: clustered object references held privately by a thread – Persistent: stored in shared memory; can persist beyond the lifetime of a thread

63 Garbage Collection 1. Remove all persistent references – Normal cleanup 2. Remove all temporary references – Event-driven kernel – Maintain a counter for each processor – Delete the object once the counter is zero 3. Destroy the object itself

64 Garbage Collection [Diagram: an object holding 259; Process 1 reads it]

65 Garbage Collection [Diagram: Process 1's read increments the counter (Counter++)]

66 Garbage Collection [Diagram: Counter = 1; Process 2 deletes the object]

67 Garbage Collection [Diagram: the garbage collector (GC) may free the object only if counter = 0]

68 Garbage Collection [Diagram: Process 1 finishes its read (Counter--)]

69 Garbage Collection [Diagram: Counter = 0, so the GC can now free the object]
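The garbage-collection walkthrough can be sketched in C++ as deferred deletion. Simplifying assumptions: the counts here are kept per object (with one count per CPU), whereas the slides describe a counter maintained for each processor, and the sketch is single-threaded. "Delete" only hides the object from new users; the memory is freed once every CPU's count of in-progress operations has reached zero. `DeferredObject` is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

template <typename T>
class DeferredObject {
public:
    DeferredObject(int ncpus, std::unique_ptr<T> obj)
        : active_(ncpus, 0), obj_(std::move(obj)) {}

    // Start an operation on this CPU (Counter++); nullptr once deleted.
    T* begin_op(int cpu) {
        if (deleted_ || !obj_) return nullptr;  // new users can no longer find it
        ++active_.at(cpu);
        return obj_.get();
    }

    // Finish an operation (Counter--), then see whether the object can go.
    void end_op(int cpu) {
        assert(active_.at(cpu) > 0);
        --active_[cpu];
        try_collect();
    }

    // "Delete": remove the object from view now, free its memory later.
    void remove() {
        deleted_ = true;
        try_collect();
    }

    bool reclaimed() const { return !obj_; }

private:
    void try_collect() {
        if (!deleted_) return;
        for (int n : active_)
            if (n != 0) return;  // someone is still using the object
        obj_.reset();            // GC: counter = 0, safe to destroy
    }

    std::vector<int> active_;  // in-progress operations, one count per CPU
    std::unique_ptr<T> obj_;
    bool deleted_ = false;
};
```

In the slides' scenario, Process 1's read keeps the object alive after Process 2 deletes it; the object is freed only when that read completes.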

70 Interprocess Communication Uses Protected Procedure Calls (PPCs): a call from a client object to a server object – A clustered object call that crosses from the client's protection domain into the server's. Advantages – Client requests are serviced on the local processor – Client and server share processors, similar to handoff scheduling – Each client request has one thread in the server

71 PPC: Implementation On-demand creation of server threads. Maintains a list of worker threads. Implemented as a trap plus some queue manipulation – Dequeue a worker thread from the ready workers – Enqueue the caller thread on the worker – Return from the trap into the server. Registers are used to pass parameters.
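The PPC queue manipulation can be modeled, very loosely, as a single-threaded C++ simulation. In Tornado this is a trap with arguments in registers; here a `Worker` merely stands in for a server thread, the handler runs synchronously, and all names (`PpcServer`, `Worker`, `workers_created`) are hypothetical. The sketch shows on-demand worker creation and the dequeue/run/re-enqueue cycle.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Stand-in for a server worker thread.
struct Worker {
    int id;
};

class PpcServer {
public:
    using Handler = std::function<long(long)>;

    explicit PpcServer(Handler h) : handler_(std::move(h)) {
        assert(handler_ && "PPC server needs a handler");
    }

    // Client call: take a ready worker (creating one on demand if none is
    // free), run the request, then return the worker to the ready list.
    long call(long arg) {
        if (ready_.empty()) ready_.push_back(Worker{created_++});  // on-demand creation
        Worker w = ready_.front();   // dequeue a worker from the ready workers
        ready_.pop_front();
        // In Tornado the caller would be enqueued on w and the trap would
        // return into the server; this model just calls the handler directly.
        long result = handler_(arg);
        ready_.push_back(w);         // worker becomes ready again
        return result;
    }

    int workers_created() const { return created_; }

private:
    Handler handler_;
    std::deque<Worker> ready_;
    int created_ = 0;
};
```

Sequential calls reuse the same worker; a new one is created only when a request arrives while every existing worker is busy (for example, a nested call from inside a handler).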

72 Performance

73 Performance: Summary Strong basic design. Highly scalable. Locality and locking overhead are the major sources of slowdown.

74 Conclusion The object-oriented approach and clustered objects exploit locality and concurrency. The OO design has some overhead, but it is low compared to the performance advantages. Tornado scales extremely well and achieves high performance on shared-memory multiprocessors.

75 References – http://web.cecs.pdx.edu/~walpole/class/cs510/papers/05.pdf – Presentation by Holly Grimes, CS 533, Winter 2008 – http://en.wikipedia.org/wiki/Locality_of_reference