Cristiana Amza University of Toronto
Once Upon a Time … locks were painful for dynamic and complex applications, e.g., Massively Multiplayer Games
Massively Multiplayer Games: Support many concurrent players and a low update interval to players
So, game developers said … “Forget locks! We’ll use our secret sauce!”
State-of-the-art in Game Code: Ad-hoc parallelization with segments/shards (e.g., World of Warcraft, Ultima Online); sequential code with admission control (e.g., Quake III)
Ad-hoc Partitioning (segments) Countries, rooms
Artificial Admission Control Admission control Gateways E.g., airports, doors
But, gamers said … “We want to interact, and we hate lag!”
Problem with State-of-the-art: Flocking. Players move to one area (e.g., quests) and overload the server hosting the hotspot.
So I said … Forget painful locks ! Transactional Memory will make game developers and players happy ! Story endorsed by Intel (fall of 2006).
Our Goals Parallelize server code into transactions Easy to thread any game Dynamic load balance of tx on any platform e.g., clusters, multi-cores, mobile devices … Beats locks any day !
Ideal solution: Contiguous world Seamless partition Players can “see” across partition boundaries Players can smoothly transfer Regardless of game map
Challenge: On Multi-core Inter-thread conflicts Mostly at the boundary
Roadmap The game Parallelization Using TM Compiler code transformations for TM Runtime TM design choices Dynamic load balancing of tx in game
Game Benchmark (SimMud) Interactions: player - Obj, player - player Players can move and interact Food objects Terrain fixed, restricts movement
Game Benchmark (SimMud) Actions: move, eat, fight Quest: flocking of players to a spot on the game map
Flocking in SimMud [Figure: game map split across servers S1, S2, S3, S4, with a quest hotspot]
Parallelization of Server Code Process Requests Form & Send Replies Rx Tx Select Read-only phase Read-Write phase
Example: “Move” Request Move(){ region1->removePlayer( p ); region2->addPlayer( p ); }
Parallelize Move Request: Insert “atomic” keyword in code; the compiler makes it a transaction.
Ex:
#pragma omp critical / __tm_atomic
{
    region1->removePlayer( p );
    region2->addPlayer( p );
}
Ex: SimMud Data Structure
struct Region
{
    int x, y;
    int width, height;
    set_t* players;
    set_t* objects;
};
Example Code for Action Move
void movePlayer( Player* p, int new_x, int new_y )
{
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
}
Manual Transformations (Locks)
void movePlayer( Player* p, int new_x, int new_y )
{
    lock_player( p );
    Region* r_old = getRegion( p->x, p->y );
    Region* r_new = getRegion( new_x, new_y );
    lock_regions( r_old, r_new );
    if( isVacant_position( r_new, new_x, new_y ) )
    {
        set_remove( r_old->players, p );
        set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    unlock_regions( r_old, r_new );
    unlock_player( p );   /* same argument as lock_player( p ) */
}
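One detail the lock version hides inside lock_regions: if two threads move players in opposite directions between the same two regions, taking the two locks in different orders can deadlock. A minimal sketch of one way to avoid that, assuming one std::mutex per region (the slides do not show the lock layout; Region, Player and movePlayerLocked here are hypothetical stand-ins, not the SimMud code):

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Hypothetical stand-ins for the SimMud types.
struct Player { int id; int x, y; };

struct Region {
    std::mutex lock;          // assumed: one lock per region
    std::set<int> players;    // ids of the players inside this region
};

// Moves p from r_old to r_new while holding both region locks.
// Returns false if p was not registered in r_old.
bool movePlayerLocked(Player& p, Region& r_old, Region& r_new,
                      int new_x, int new_y) {
    if (&r_old == &r_new) {
        // Same region: take the single lock once.
        std::lock_guard<std::mutex> g(r_old.lock);
        if (r_old.players.count(p.id) == 0) return false;
        p.x = new_x; p.y = new_y;
        return true;
    }
    // std::lock acquires both mutexes with a deadlock-avoidance algorithm,
    // so the order in which callers name the regions does not matter.
    std::lock(r_old.lock, r_new.lock);
    std::lock_guard<std::mutex> g1(r_old.lock, std::adopt_lock);
    std::lock_guard<std::mutex> g2(r_new.lock, std::adopt_lock);
    if (r_old.players.erase(p.id) == 0) return false;
    r_new.players.insert(p.id);
    p.x = new_x; p.y = new_y;
    assert(r_old.players.count(p.id) == 0);
    return true;
}
```

The sketch omits the per-player lock and the vacancy check from the slide; it only illustrates the lock-ordering concern.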
Manual Transformations (TM)
void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}
My Story: TM will make game developers and players happy! So far, the developers should be!
It Gets Worse for Locks: A move may impact objects within a bounding box (short-range or long-range). To lock all impacted objects, the code first needs to search for them. [Figure: top view of world with short-range and long-range bounding boxes around objects]
Area Node Tree (e.g., Quake III): Each region corresponds to a leaf. [Figure: top view of world]
Area Node Tree (e.g., Quake III): Each region corresponds to a leaf. Lock all leaf nodes in the bounding box atomically. [Figure: top view of world with overlapping regions]
Area Node Tree – Even Worse! Objects are linked to leaf nodes; if an object crosses a leaf boundary, it is linked to the parent node. [Figure: top view of world with non-overlapping regions; region leaves with object lists; objects crossing a boundary]
Area Node Tree – Even Worse! Need to lock parent nodes; false sharing; the whole tree may be locked. [Figure: top view of world with non-overlapping regions; region leaves with object lists; objects crossing a boundary]
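To see why a small object can force a lock on the whole tree, here is a simplified sketch: a one-dimensional tree that halves the world at each level (the real area node tree splits in two dimensions; linkDepth and leavesCovered are hypothetical helper names). An object is linked to the deepest node that fully contains it, so an object straddling the central split ends up at the root.

```cpp
#include <cassert>

// Depth of the deepest node that fully contains the interval [lo, hi] in a
// binary tree that halves [0, world) at every level. Depth 0 is the root,
// depth max_depth is a leaf.
int linkDepth(double lo, double hi, double world, int max_depth) {
    assert(0.0 <= lo && lo <= hi && hi < world);
    double node_lo = 0.0, node_hi = world;
    int depth = 0;
    while (depth < max_depth) {
        const double mid = (node_lo + node_hi) / 2.0;
        if (hi < mid)        node_hi = mid;  // fits in the left child
        else if (lo >= mid)  node_lo = mid;  // fits in the right child
        else break;                          // straddles the split: stay here
        ++depth;
    }
    return depth;
}

// Number of leaves that sit under a node at the given depth, i.e. how many
// regions are effectively serialized when that node is locked.
int leavesCovered(int depth, int max_depth) {
    assert(0 <= depth && depth <= max_depth);
    return 1 << (max_depth - depth);
}
```

With a world of width 16 and four leaves, an object spanning [7, 9] links to the root, so locking it covers all four leaves even though the object touches only two.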
My Story TM will make game developers and players happy ! Lock down a whole box/tree, vs just read/write what you need in TM. Players should be happy too !
Compiler/Runtime TM Support. Compiler: automatic source transformations to tx. Runtime: track accesses, resolve conflicts between transactions, adapt to application pattern.
Manual Transformations (TM)
void movePlayer( Player* p, int new_x, int new_y )
{
    #pragma omp critical
    {
        Region* r_old = getRegion( p->x, p->y );
        Region* r_new = getRegion( new_x, new_y );
        if( isVacant_position( r_new, new_x, new_y ) )
        {
            set_remove( r_old->players, p );
            set_insert( r_new->players, p );
            p->x = new_x; p->y = new_y;
        }
    }
}
Automatic Transformations (TM)
void tm_movePlayer( tm_Player* p, int new_x, int new_y )
{
    Begin_transaction;
    tm_Region* r_old = tm_getRegion( p->x, p->y );
    tm_Region* r_new = tm_getRegion( new_x, new_y );
    if( tm_isVacant_position( r_new, new_x, new_y ) )
    {
        tm_set_remove( r_old->players, p );
        tm_set_insert( r_new->players, p );
        p->x = new_x; p->y = new_y;
    }
    Commit_transaction;
}
Automatic Transformations (TM)
struct tm_Region
{
    tm_int x, y;
    tm_int width, height;
    tm_set_t* players;   // recursively re-type
    tm_set_t* objects;   // nested structures
};
Compiler TM code translation #pragma begin/end Re-type variables: tm_shared<> or tm_private<>
TM Runtime (libTM) Access Tracking: tm_type<> Operator overloading for intercepting reads and writes Access Granularity: basic-type level Conflict detection and resolution Several design choices
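As an illustration of what "operator overloading for intercepting reads and writes" can look like, here is a minimal sketch. It is not libTM's code: the AccessLog counters, currentLog() and tm_Position are assumptions made for the example, and a real runtime would do conflict detection where this sketch only counts accesses.

```cpp
#include <cassert>

// Hypothetical per-thread bookkeeping, standing in for the runtime's tracking.
struct AccessLog {
    int reads = 0;
    int writes = 0;
};

inline AccessLog& currentLog() {
    static thread_local AccessLog log;
    return log;
}

// Wrapper for a shared variable of basic type T.
template <typename T>
class tm_shared {
public:
    tm_shared(T v = T()) : value_(v) {}

    // Every read goes through the conversion operator.
    operator T() const {
        ++currentLog().reads;
        return value_;
    }

    // Every write goes through assignment from T.
    tm_shared& operator=(const T& v) {
        ++currentLog().writes;
        value_ = v;
        return *this;
    }

private:
    T value_;
};

// Re-typed structure: fields keep their names, only their types change.
struct tm_Position {
    tm_shared<int> x, y;
};
```

Because interception happens at the level of each basic-typed field, code such as p.x = 3 or p.x + p.y compiles unchanged after re-typing, which is what makes the automatic source transformation practical.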
TM Conflict Resolution Choices: Pessimistic (reader/writer locks); Read Optimistic (only writer locks); Fully Optimistic (~no locks); Adaptive
Pessimistic A transaction (tx) locks an object before use Waits for locks held by other tx Releases all locks at the end
Reader-writer locks, held from BEGIN to END of the transaction: Reader lock excludes writers; writer lock excludes readers/writers
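The compatibility rules above can be sketched as a small single-threaded model. This is an assumption-laden illustration, not the runtime's implementation: RWLockState and PessimisticTx are hypothetical names, the state is not thread-safe, and a failed acquire returns false where a real transaction would wait.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Lock state of one shared object (model only, no atomics).
struct RWLockState {
    int readers = 0;
    bool writer = false;

    bool tryRead() {                 // a reader is excluded by a writer
        if (writer) return false;
        ++readers;
        return true;
    }
    bool tryWrite() {                // a writer is excluded by readers and writers
        if (writer || readers > 0) return false;
        writer = true;
        return true;
    }
    void releaseRead()  { assert(readers > 0); --readers; }
    void releaseWrite() { assert(writer); writer = false; }
};

// A pessimistic transaction locks each object before use and
// releases every lock it holds at the end.
class PessimisticTx {
public:
    bool read(RWLockState& l) {
        if (!l.tryRead()) return false;
        held_.push_back({&l, false});
        return true;
    }
    bool write(RWLockState& l) {
        if (!l.tryWrite()) return false;
        held_.push_back({&l, true});
        return true;
    }
    void end() {
        for (auto& h : held_) {
            if (h.second) h.first->releaseWrite();
            else          h.first->releaseRead();
        }
        held_.clear();
    }

private:
    std::vector<std::pair<RWLockState*, bool>> held_;  // (lock, held as writer?)
};
```

Lock upgrades (a transaction that reads and then writes the same object) are left out of the sketch.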
Read Optimistic Writers take locks, readers do not A write invalidates (aborts) all readers a) Encounter-time: at the write T1: BEGIN_TRANSACTION... WRITE A... COMMIT_TRANSACTION T2: BEGIN_TRANSACTION READ A... INVALID T3: BEGIN_TRANSACTION... READ A... INVALID
Read Optimistic T1: BEGIN_TRANSACTION... WRITE A... COMMIT_TRANSACTION T2: BEGIN_TRANSACTION READ A... COMMIT_TRANSACTION T3: BEGIN_TRANSACTION... READ A... INVALID Writers take locks, readers do not A write invalidates (aborts) all readers b) Commit-time: at commit
Fully Optimistic T1: BEGIN_TRANSACTION... WRITE A... COMMIT_TRANSACTION T2: BEGIN_TRANSACTION WRITE A... COMMIT_TRANSACTION T3: BEGIN_TRANSACTION... READ A... INVALID A write invalidates (aborts) all active readers, but supports multiple writers Commit-time: at commit
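The difference between encounter-time and commit-time invalidation can be sketched with a visible-readers set per variable. This is a simplified model under stated assumptions: Tx, SharedVar, Policy and the tx* functions are hypothetical names, writer locks and data values are omitted, and the multiple-writer case of the fully optimistic scheme is not modeled.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class Policy { EncounterTime, CommitTime };

struct Tx;

struct SharedVar {
    std::set<Tx*> readers;            // visible-readers set
};

struct Tx {
    bool valid = true;                // cleared when a writer invalidates us
    std::vector<SharedVar*> readSet;
    std::vector<SharedVar*> writeSet;
};

void txRead(Tx& t, SharedVar& v) {
    v.readers.insert(&t);
    t.readSet.push_back(&v);
}

// Marks every other active reader of v as invalid.
void invalidateReaders(SharedVar& v, const Tx& writer) {
    for (Tx* r : v.readers)
        if (r != &writer) r->valid = false;
}

void txWrite(Tx& t, SharedVar& v, Policy p) {
    t.writeSet.push_back(&v);
    if (p == Policy::EncounterTime) invalidateReaders(v, t);  // at the write
}

// Returns true if the transaction commits, false if it was invalidated.
bool txCommit(Tx& t, Policy p) {
    const bool ok = t.valid;
    if (ok && p == Policy::CommitTime)
        for (SharedVar* v : t.writeSet) invalidateReaders(*v, t);  // at commit
    for (SharedVar* v : t.readSet) v->readers.erase(&t);  // no longer active
    t.readSet.clear();
    t.writeSet.clear();
    return ok;
}
```

Under the commit-time policy a reader that finishes before the writer commits survives, which matches T2 in the commit-time timelines above.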
Implementation Details Meta-data kept with tm_shared<> var Lock, visible-readers set
Implementation Details Validation of each read Recoverability: Undo-logging / Write-buffering Private thread data (needs to be searchable) Necessary for fully optimistic
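The two recoverability options can be contrasted in a few lines. The sketch below is illustrative only (UndoLogTx and WriteBufferTx are hypothetical names, and it handles int locations with no concurrency control): undo-logging writes in place and restores on abort, while write-buffering keeps writes in private data that every read has to search.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Undo-logging: update memory in place, remember old values for abort.
class UndoLogTx {
public:
    void write(int& loc, int v) {
        log_.push_back({&loc, loc});   // save the old value first
        loc = v;
    }
    int read(int& loc) const { return loc; }
    void commit() { log_.clear(); }
    void abort() {
        // Restore in reverse order so the oldest saved value wins.
        for (auto it = log_.rbegin(); it != log_.rend(); ++it)
            *it->first = it->second;
        log_.clear();
    }

private:
    std::vector<std::pair<int*, int>> log_;
};

// Write-buffering: keep writes private, publish them at commit.
class WriteBufferTx {
public:
    void write(int& loc, int v) { buf_[&loc] = v; }
    int read(int& loc) const {
        auto it = buf_.find(&loc);     // search private data on every read
        return it != buf_.end() ? it->second : loc;
    }
    void commit() {
        for (auto& kv : buf_) *kv.first = kv.second;
        buf_.clear();
    }
    void abort() { buf_.clear(); }     // nothing to restore

private:
    std::map<int*, int> buf_;
};
```

Since buffered writes never touch shared memory before commit, several transactions can hold tentative writes to the same location at once, which is why the fully optimistic scheme needs this form.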
Factors Determining Trade-offs Conflict type w-w conflicts favor fully optimistic Conflict-span long domino-effect (no progress) for read optimistic
Evaluation of Design Trade-offs No. of threads: 4
Roadmap The game Parallelization Using TM Compiler code transformations for TM Runtime TM design choices Dynamic load balancing of tx in game
Parallel Server Phase Types Process Requests Form & Send Replies Rx Tx Select Read-only phase Read-Write phase Load balancing
Dynamic Load Management Region: grid unit Dynamic load balancing Reassign regions from one server/thread to another
Conflicts vs Load Management Locality, fewer conflicts Keep adjacent regions on same thread Global reshuffle Block partition
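One rough way to quantify the locality argument is to count adjacent region pairs owned by different threads, since those borders are where inter-thread conflicts are expected. A small sketch, assuming a rectangular grid of region owners (borderPairs is a hypothetical helper, not a metric taken from the talk):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// owner[row][col] is the thread that owns that region.
// Returns the number of horizontally or vertically adjacent region pairs
// whose two regions are owned by different threads.
int borderPairs(const std::vector<std::vector<int>>& owner) {
    int count = 0;
    const std::size_t rows = owner.size();
    for (std::size_t r = 0; r < rows; ++r) {
        const std::size_t cols = owner[r].size();
        assert(cols == owner[0].size());   // grid must be rectangular
        for (std::size_t c = 0; c < cols; ++c) {
            if (c + 1 < cols && owner[r][c] != owner[r][c + 1]) ++count;
            if (r + 1 < rows && owner[r][c] != owner[r + 1][c]) ++count;
        }
    }
    return count;
}
```

On a 4x4 grid with four threads, a block partition into quadrants gives 8 cross-thread pairs, while a scattered assignment such as (row + col) % 4 gives 24.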
Overload due to Quest
Reassign Load & Minimize Conflicts
Locality-Aware Load Balancing SimMud game map with quest in upper left Recorded dynamic load balancing
Dynamic Load-balancing Algorithms. Lightest: shed regions to the lightest loaded thread. Spread: best load spread across all threads. Locality aware: keep nearby regions on the same thread.
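A minimal sketch of the Lightest policy as described on the slide, under assumptions of my own: load is measured in players per region, a thread sheds regions only while it is above a threshold, and a move is skipped if it would push the receiver over the threshold. RegionLoad, threadLoads and shedToLightest are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct RegionLoad {
    int id;
    int players;   // load contributed by this region
    int owner;     // thread currently responsible for the region
};

// Total load per thread.
std::vector<int> threadLoads(const std::vector<RegionLoad>& regions, int n_threads) {
    assert(n_threads > 0);
    std::vector<int> load(n_threads, 0);
    for (const RegionLoad& r : regions) {
        assert(0 <= r.owner && r.owner < n_threads);
        load[r.owner] += r.players;
    }
    return load;
}

// While a thread is over the threshold, hand its regions to the thread that
// is currently lightest. Returns the number of regions reassigned.
int shedToLightest(std::vector<RegionLoad>& regions, int n_threads, int threshold) {
    int moved = 0;
    for (int from = 0; from < n_threads; ++from) {
        for (RegionLoad& r : regions) {
            const std::vector<int> load = threadLoads(regions, n_threads);
            if (load[from] <= threshold) break;        // no longer overloaded
            if (r.owner != from) continue;
            const int to = static_cast<int>(
                std::min_element(load.begin(), load.end()) - load.begin());
            if (to == from || load[to] + r.players > threshold) continue;
            r.owner = to;
            ++moved;
        }
    }
    return moved;
}
```

Note that this policy ignores where the shed regions sit on the map, so it can scatter neighbouring regions across threads; that is the weakness the locality-aware algorithm targets.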
Locality-aware (Quad-tree). Split task when: Load > thresh. Reassign tasks: reduce conflicts. Can approximate!
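The split rule can be sketched as a recursive quad-tree decomposition over a grid of per-region loads. This is an illustrative sketch, assuming a square grid whose side is a power of two and a load measured in players; Task, blockLoad and splitTasks are hypothetical names, and the reassignment step is not shown.

```cpp
#include <cassert>
#include <vector>

// A task is a square block of regions with its total load.
struct Task {
    int x, y, size;
    int load;
};

using Grid = std::vector<std::vector<int>>;   // grid[row][col] = players

int blockLoad(const Grid& grid, int x, int y, int size) {
    int sum = 0;
    for (int row = y; row < y + size; ++row)
        for (int col = x; col < x + size; ++col)
            sum += grid[row][col];
    return sum;
}

// Splits a block into four quadrants while its load exceeds thresh;
// blocks of a single region are never split further.
void splitTasks(const Grid& grid, int x, int y, int size, int thresh,
                std::vector<Task>& out) {
    assert(size >= 1);
    const int load = blockLoad(grid, x, y, size);
    if (load <= thresh || size == 1) {
        out.push_back({x, y, size, load});
        return;
    }
    const int h = size / 2;
    splitTasks(grid, x,     y,     h, thresh, out);
    splitTasks(grid, x + h, y,     h, thresh, out);
    splitTasks(grid, x,     y + h, h, thresh, out);
    splitTasks(grid, x + h, y + h, h, thresh, out);
}
```

Because each task is a contiguous square, handing whole tasks to threads keeps nearby regions together, and only the hot area around a quest gets split finely.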
Task Splitting [Figure: regions A to J grouped into tasks BCD, AEF and GHIJ]
Task Re-assignment: Assign tasks to reduce conflicts; keep Load < threshold. [Figure: tasks assigned to threads T0, T1 and T2]
Dynamic Load-balancing Algorithms: All algorithms implemented on a cluster (single thread on each node) and on a multi-core (with multiple threads)
Results on Multi-core. Load balancing algorithms: Static, Lightest, Spread, Locality (Quad-tree). Metrics: number of clients per thread, border conflicts, client update latency.
Thread Load on Multi-core
Border Conflicts on Multi-core
Client Update Latency on Multi-core
Conclusion Support for seamless world partitioning Compiler & Runtime parallelization support Tx much simpler than locks Locality aware dynamic load balancing Can apply in server clusters, P2P mobile environments and multi-cores
I need your help. “When TM first beat locks” is a good story I need a more sophisticated game to make the story happen !
Backup Slides
Client Update Latency on Cluster [Figure: STATIC vs LOCALITY, most loaded and least loaded servers] All dynamic load balancing algorithms perform similarly
Number of Player Migrations: Locality aware has the fewest migrations
Average Execution Time / Request (when App changes access pattern)
Trade-offs Private thread data Per-thread data copy overhead (-) Search private data on read (-) No need to restore data on abort (+) Allows multiple concurrent writers (+)
Trade-offs (contd) Locks Aborts due to deadlock (-) No other aborts (+)
A WAN distributed server system Quest lasts during sec
TM code translation (cont.) Based on Omni OpenMP compiler
Average Execution Time / Request