1 Scalable and transparent parallelization of multiplayer games. Bogdan Simion, MASc thesis, Department of Electrical and Computer Engineering
2 Multiplayer games: captivating, highly popular, dynamic artifacts
6 Multiplayer games
- Long playing times:
1. World of Warcraft: “I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now.”
2. Halo2: “My longest playing streak was last summer, about 19 hours playing Halo2 on my XBox.”
- More than 100k concurrent players
- Game server is the bottleneck
7 Server scaling
- Game code parallelization is hard: complex and highly dynamic code
- Concurrency issues (data races) require conservative synchronization
- Deadlocks
8 State of the art
- Parallel programming paradigms: lock-based (pthreads), transactional memory
- Previous parallelization of Quake: lock-based [Abdelkhalek et al. '04] shows that false sharing is a challenge
10 Transactional Memory vs. Locks
Advantages:
- Simpler programming task
- Transparently ensures correct execution: shared data access tracking; detects conflicts and aborts conflicting transactions
Disadvantages:
- Software (STM) access tracking overheads
- Never shown to be practical for real applications
11 Contributions
- Case study of parallelization for games: a synthetic version of Quake (SynQuake)
- We compare two approaches: lock-based and STM parallelizations
- We showcase the first realistic application where STM outperforms locks
12 Outline
- Application environment: SynQuake game (data structures, server architecture)
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions
13 Environment: SynQuake game
- Simplified version of Quake
- Entities: players, resources (apples), walls
- Emulated quests
14 SynQuake
- Players can move and interact (eat, attack, flee, go to quest)
- Apples: food objects, increase life
- Walls: immutable, limit movement
- Contains all the features found in Quake
15 Game map representation: fast retrieval of game objects via a spatial data structure, the areanode tree
16 Areanode tree
[Figure, built up over slides 16-20: the game map is split into regions A and B, then further into A1, A2, B1, B2; the areanode tree mirrors this hierarchy, with the root covering the whole map and the leaves A1, A2, B1, B2 covering the finest regions]
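The descent through an areanode tree might look like the following C sketch; the node layout and the rule that an object straddling a split line stays at the internal node are assumptions based on these slides, not SynQuake's actual code:

```c
/* Sketch of an areanode tree node: each internal node splits its
   region in half along one axis; leaves have no children.
   Names are illustrative, not SynQuake's. */
typedef struct AreaNode {
    int axis;                  /* 0 = split on x, 1 = split on y */
    int split;                 /* coordinate of the splitting line */
    struct AreaNode *child[2]; /* 0 for leaves */
} AreaNode;

/* Descend to the deepest node whose region fully contains the box
   [lo, hi]; an object straddling the split line stays higher up. */
AreaNode *locate(AreaNode *n, int lo[2], int hi[2]) {
    while (n->child[0]) {
        if (hi[n->axis] < n->split)
            n = n->child[0];           /* entirely below the split */
        else if (lo[n->axis] >= n->split)
            n = n->child[1];           /* entirely above the split */
        else
            break;                     /* straddles: stored here */
    }
    return n;
}
```

An object near a region boundary therefore lands at an internal node, which is exactly why actions later have to lock (or transactionally read) parent nodes as well as leaves.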
21 Server frame
[Figure, built up over slides 21-24: each server frame runs three stages separated by barriers: (1) Receive & Process Requests, fed by client requests; (2) Admin, executed by a single thread; (3) Form & Send Replies, producing client updates]
25 Parallelization in games: Quake, lock-based synchronization [Abdelkhalek et al. 2004]
26 Parallelization: request processing
[Figure: the three-stage server frame; parallelization is applied in stage 1, Receive & Process Requests]
27 Outline
- Application environment: SynQuake game
- Parallelization issues: false sharing, load balancing
- Experimental results
- Conclusions
28 Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
29 Collision detection. Player actions: move, shoot, etc. Calculate the action bounding box.
30 Action bounding box
[Figure, built up over slides 30-33: players P1, P2, P3 with short-range (move) and long-range (shoot) action bounding boxes; P1's and P2's boxes overlap, marking potential conflicts]
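One way to picture the action bounding box: it is the region of the map an action could touch this frame, so a shoot covers far more than a move. A hedged C sketch; the MOVE_RANGE/SHOOT_RANGE values are made up, not SynQuake's:

```c
/* Axis-aligned bounding box on the game map. */
typedef struct { int x0, y0, x1, y1; } BBox;

/* Hypothetical action ranges; the actual game's values differ. */
#define MOVE_RANGE  1
#define SHOOT_RANGE 10

/* Bounding box of everything a player at (x, y) could touch this
   frame: a small box for a move, a large one for a shoot. */
BBox action_bbox(int x, int y, int shooting) {
    int r = shooting ? SHOOT_RANGE : MOVE_RANGE;
    BBox b = { x - r, y - r, x + r, y + r };
    return b;
}

/* Two actions may conflict only if their boxes overlap. */
int overlaps(BBox a, BBox b) {
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}
```

Two distant players moving never overlap, but the same two players shooting can, which is the slides' point that long-range actions are the main source of cross-thread conflicts.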
34 Player assignment
[Figure: players P1, P2, P3 handled by threads T1, T2, T3]
- If players P1, P2, P3 are assigned to distinct threads → synchronization required
- Long-range actions have a higher probability of causing conflicts
35 False sharing
[Figure, built up over slides 35-38: a player's move range is small but its shoot range is large; with locks, the action bounding box must conservatively cover both up front, so two players can conflict on map regions neither actually touches (false sharing); with TM, accesses are tracked as they happen, so the effective footprint is smaller]
40 Synchronization algorithm: Locks. Hold locks on parents for as little time as possible; deadlock-free algorithm.
41 Synchronization algorithm: Locks
[Figure, built up over slides 41-44: players P1..P6 on the map and the areanode tree (Root; A, B; leaves A1, A2, B1, B2) holding them; a player's area of interest overlaps some leaves, which are locked for the whole action, while parent nodes are locked only temporarily]
46 Synchronization: Locks vs. STM
Locks:
1. Determine overlapping leaves (L)
2. LOCK(L)
3. Process L
4. For each node P in overlapping parents: LOCK(P); process P; UNLOCK(P)
5. UNLOCK(L)
STM:
1. BEGIN_TRANSACTION
2. Determine overlapping leaves (L)
3. Process L
4. For each node P in overlapping parents: process P
5. COMMIT_TRANSACTION
STM acquires ownership gradually, reducing false sharing; consistency is ensured transparently by the STM.
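A minimal C rendering of the lock-based column, assuming pthread mutexes and that the leaf array is already sorted in a global (e.g. address) order so concurrent actions acquire leaves deadlock-free; Node and process_action are illustrative names, not the server's:

```c
#include <pthread.h>

/* One areanode with its lock and some per-node state. */
typedef struct Node {
    pthread_mutex_t lock;
    int work_done;            /* stand-in for real per-node work */
} Node;

/* Lock all overlapping leaves for the whole action, but hold each
   overlapping parent only while processing it (slide 40's rule:
   hold locks on parents as little as possible). */
void process_action(Node **leaves, int nl, Node **parents, int np) {
    for (int i = 0; i < nl; i++)                 /* 2. LOCK(L)   */
        pthread_mutex_lock(&leaves[i]->lock);
    for (int i = 0; i < nl; i++)                 /* 3. process L */
        leaves[i]->work_done++;
    for (int i = 0; i < np; i++) {               /* 4. parents   */
        pthread_mutex_lock(&parents[i]->lock);
        parents[i]->work_done++;
        pthread_mutex_unlock(&parents[i]->lock);
    }
    for (int i = 0; i < nl; i++)                 /* 5. UNLOCK(L) */
        pthread_mutex_unlock(&leaves[i]->lock);
}
```

The STM column would collapse to a single transaction around the same body, with no explicit lock calls; the STM's own access tracking plays the role of the mutexes.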
48 Load balancing issues
- Assign tasks to threads; balance workload
- Cross-border conflicts → synchronization
[Figure: the map split into four quadrants handled by threads T1-T4; players P1-P4 near quadrant borders perform move and shoot actions that cross boundaries]
50 Load balancing goals. Tradeoff: balance workload among threads vs. reduce synchronization/true sharing.
51 Load balancing policies: a) Round-robin, b) Spread, c) Static locality-aware
[Figures, built up over slides 51-53: map regions colored by owning thread (threads 1-4) under each policy]
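The round-robin and static locality-aware policies could be sketched as simple region-to-thread maps. A sketch under the assumption of 4 threads and a square grid of regions; the function names are hypothetical:

```c
#define NWORKERS 4   /* assumed thread count */

/* Round-robin: stripe regions across threads in index order.
   Balances counts well but scatters neighboring regions across
   threads, maximizing cross-border synchronization. */
int assign_round_robin(int region_idx) {
    return region_idx % NWORKERS;
}

/* Static locality-aware: split an n x n grid of regions into four
   contiguous quadrants, one per thread. Neighboring regions mostly
   share a thread, so fewer actions cross a thread boundary. */
int assign_locality(int rx, int ry, int n) {
    int half = n / 2;
    return (ry >= half) * 2 + (rx >= half);
}
```

Neither static policy adapts when players cluster in one quadrant, which motivates the dynamic locality-aware balancer on the next slides.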
54 Locality-aware load balancing. Dynamically detect player hotspots and adjust workload assignments; a compromise between load balancing and reducing synchronization.
55 Dynamic locality-aware LB
[Figure: the game map and its graph representation used for dynamic partitioning]
57 Experimental results
- Test scenarios
- Scaling: with and without physics computation
- The effect of load balancing on scaling
- The influence of locality-awareness
58 Quest scenarios
[Figure, over slides 58-59: a 1024 x 1024 game map showing the locations of Quests 1-4]
60 Scalability
61 Processing times – without physics
62 Processing times – with physics
63 Load balancing
64 Quest scenarios (4 quadrants) [Figure: static vs. dynamic assignment of map regions to threads 1-4]
65 Quest scenarios (4 splits) [Figure: static vs. dynamic assignment of map regions to threads 1-4]
66 Quest scenarios (1 quadrant) [Figure: static vs. dynamic assignment of map regions to threads 1-4]
67 Locality-aware load balancing (locks)
68 Conclusions
- First application where STM outperforms locks: overall performance of STM is better at 4 threads in all scenarios
- Reduced false sharing through on-the-fly collision detection
- Locality-aware load balancing reduces true sharing, but only for STM
69 Thank you!
70 Splitting components (1 center quest)
71 Load balancing (short range actions)
72 Locality-aware load balancing (STM)