Slide 1 Atomic Quake – Use Case of Transactional Memory in an Interactive Multiplayer Game Server Ferad Zyulkyarov BSC-Microsoft Research Center 17.10.2008.

Presentation transcript:

Slide 1 Atomic Quake – Use Case of Transactional Memory in an Interactive Multiplayer Game Server Ferad Zyulkyarov BSC-Microsoft Research Center Workshop on Language and Runtime Support for Concurrent Programming Cambridge UK

Slide 2 Outline
- Quake Overview
- Using Transactions for Synchronization
- Runtime Characteristics

Slide 3 Quake
- Interactive first-person shooter game
- Very popular among gamers
- A challenge for modern multi-core processors

Slide 4 Quake Server
The Server
- Where computations are done
- Maintains the game plot
- Handles player interactions
- Preserves game consistency
The Clients
- Send requests to convey their actions
- Render the graphics and sound

Slide 5 Parallelization Methodology
Execution is divided into 3 phases:
1. Physics update
2. Read and process client requests
3. Send replies
Request processing is the important phase: it prepares the next frame. [Abdelkhalek, IPDPS’2004]

Slide 6 Processing Requests - The Move Operation
Motion Indicators
– angles of the player’s view
– forward, sideways, and upwards motion indicators
– flags for other actions the player may be able to initiate (e.g. jumping)
– the amount of time the command is to be applied, in milliseconds
Execution
– simulate the motion of the player’s figure in the game world in the direction and for the duration specified
– determine and protect the game objects close to the player that it may interact with
    - Compute bounding box
    - Add all the objects inside the bounding box to a list
– execute actions the client might initiate (shoot, weapon exchange)
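The "compute bounding box, then add the objects inside it to a list" step can be sketched as follows. This is only an illustration under stated assumptions: the types `vec3` and `bbox` and the functions `move_bounds` and `collect_touching` are hypothetical names, not taken from the Quake source, and the linear scan stands in for whatever spatial lookup the server really uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not the Quake definitions. */
typedef struct { float x, y, z; } vec3;
typedef struct { vec3 min, max; } bbox;

static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Box swept by a player with half-extents `half` that starts at `start`
   and moves with velocity `vel` (units per second) for `msec` milliseconds. */
bbox move_bounds(vec3 start, vec3 vel, vec3 half, int msec)
{
    assert(msec >= 0);
    float t = (float)msec / 1000.0f;
    vec3 end = { start.x + vel.x * t, start.y + vel.y * t, start.z + vel.z * t };
    bbox b;
    b.min.x = minf(start.x, end.x) - half.x;
    b.min.y = minf(start.y, end.y) - half.y;
    b.min.z = minf(start.z, end.z) - half.z;
    b.max.x = maxf(start.x, end.x) + half.x;
    b.max.y = maxf(start.y, end.y) + half.y;
    b.max.z = maxf(start.z, end.z) + half.z;
    return b;
}

/* Two axis-aligned boxes overlap when they overlap on every axis. */
static int overlaps(const bbox *a, const bbox *b)
{
    return a->min.x <= b->max.x && a->max.x >= b->min.x &&
           a->min.y <= b->max.y && a->max.y >= b->min.y &&
           a->min.z <= b->max.z && a->max.z >= b->min.z;
}

/* Writes the indices of the objects that overlap `area` into `out`
   (at most `cap` of them) and returns how many were written. */
size_t collect_touching(const bbox *area, const bbox *objs, size_t n,
                        size_t *out, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < cap; i++)
        if (overlaps(area, &objs[i]))
            out[count++] = i;
    return count;
}
```

The list produced by `collect_touching` is the set of objects that the move may interact with, and therefore the set that has to be protected while the request is processed.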

Slide 7 Synchronization
Shared Data Structure          Synchronization
Per player buffer (arrays)     Each with a single lock
Global state buffer (array)    Single lock
Game objects (binary tree)     Fine grain locking
Fine grain locking
– Locks the leaves corresponding to the computed bounding box
– If an object is on the parent, locks the object only, but NOT the parent node.
22% overhead with 8 threads and 176 clients [Abdelkhalek, IPDPS’2004]

Slide 8 Areanode Tree Maps the location of each object inside the virtual world to a fast-access binary tree, the areanode tree. Child nodes split the region represented by the parent into two halves.
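A minimal sketch of such a tree, assuming a two-dimensional world that is always split along its longer side and a fixed depth of 4. The names `areanode`, `create_areanode`, `find_node` and the constants are illustrative assumptions, not the definitions used in the Quake server.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 4
#define MAX_NODES 31              /* 2^(MAX_DEPTH + 1) - 1 */

/* Hypothetical node layout for illustration only. */
typedef struct areanode {
    int axis;                     /* 0 = split on x, 1 = split on y, -1 = leaf */
    float dist;                   /* position of the splitting plane */
    float mins[2], maxs[2];       /* region covered by this node */
    struct areanode *children[2]; /* [0] = upper half, [1] = lower half */
} areanode;

static areanode nodes[MAX_NODES];
static int num_nodes;

/* Recursively halves the region until MAX_DEPTH is reached. */
areanode *create_areanode(int depth, const float mins[2], const float maxs[2])
{
    assert(num_nodes < MAX_NODES);
    areanode *n = &nodes[num_nodes++];
    n->mins[0] = mins[0]; n->mins[1] = mins[1];
    n->maxs[0] = maxs[0]; n->maxs[1] = maxs[1];

    if (depth == MAX_DEPTH) {
        n->axis = -1;
        n->dist = 0.0f;
        n->children[0] = n->children[1] = NULL;
        return n;
    }

    /* Split along the longer side of the region. */
    n->axis = (maxs[0] - mins[0] > maxs[1] - mins[1]) ? 0 : 1;
    n->dist = 0.5f * (mins[n->axis] + maxs[n->axis]);

    float lo_maxs[2] = { maxs[0], maxs[1] };
    float hi_mins[2] = { mins[0], mins[1] };
    lo_maxs[n->axis] = n->dist;
    hi_mins[n->axis] = n->dist;

    n->children[0] = create_areanode(depth + 1, hi_mins, maxs);
    n->children[1] = create_areanode(depth + 1, mins, lo_maxs);
    return n;
}

/* Descends to the deepest node whose region fully contains the object.
   An object that crosses a splitting plane stays on the inner node. */
areanode *find_node(areanode *n, const float omins[2], const float omaxs[2])
{
    while (n->axis != -1) {
        if (omins[n->axis] > n->dist)
            n = n->children[0];
        else if (omaxs[n->axis] < n->dist)
            n = n->children[1];
        else
            break;
    }
    return n;
}
```

In this sketch an object that straddles a splitting plane is attached to the inner node rather than to a leaf, which is the case the fine grain locking scheme has to treat specially.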

Slide 9 Using Transactions
- General Overview
- Challenges
- I/O
- Where Transactions Fit
- Error Handling Inside Transactions
- Failure Atomicity

Slide 10 Atomic Quake – General Overview
- 27,400 lines of C code
- 56 files
- 63 atomic blocks
- Irregular parallelism
    – Requests are dispatched to their handler functions.
- 98% in transactions (single thread, 100% loaded)

Slide 11 Complex Atomic Block Structure
- Calls to internal functions
- Calls to external libraries
- Nesting up to 9 levels
- I/O and system calls (memory allocation)
- Error handling
- Privatization

Slide 12 Example Callgraph Inside Atomic Block SV_RunCmd is the function that dispatches the execution to the appropriate request handler function. Nodes drawn as clouds represent calls to functions whose call graphs are as complicated as this one.

Slide 13 Challenge – Unstructured Use of Locks
Locks
    for (i=0; i<sv_tot_num_players/sv_nproc; i++){
        LOCK(cl_msg_lock[c - svs.clients]);
        if (!c->send_message) {
            UNLOCK(cl_msg_lock[c - svs.clients]);
            continue;
        }
        if (!sv.paused && !Netchan_CanPacket (&c->netchan)) {
            UNLOCK(cl_msg_lock[c - svs.clients]);
            continue;
        }
        if (c->state == cs_spawned) {
            if (frame_threads_num > 1) LOCK(par_runcmd_lock);
            if (frame_threads_num > 1) UNLOCK(par_runcmd_lock);
        }
        UNLOCK(cl_msg_lock[c - svs.clients]);
    }
Atomic Block
    bool first_if = false;
    bool second_if = false;
    for (i=0; i<sv_tot_num_players/sv_nproc; i++){
        atomic {
            if (!c->send_message) {
                first_if = true;
            } else {
                if (!sv.paused && !Netchan_CanPacket(&c->netchan)){
                    second_if = true;
                } else {
                    if (c->state == cs_spawned) {
                        if (frame_threads_num > 1) {
                            atomic { }
                        } else {
                            ;
                        }
                    }
                }
            }
        }
        if (first_if) {
            ;
            first_if = false;
            continue;
        }
        if (second_if) {
            ;
            second_if = false;
            continue;
        }
    }
Problems with the atomic version: extra code, complicated conditional logic.
Solution: explicit “commit”.

Slide 14 Challenges – Thread Local Memory
The call to the pthread library serializes the transaction.
    void foo1() {
        atomic {
            foo2();
        }
    }

    __attribute__((tm_callable))
    void foo2() {
        int thread_id = pthread_getspecific(THREAD_KEY);
        /* Continue based on the value of thread_id */
        return;
    }
Hoisted the library call.
    void foo1() {
        int thread_id = pthread_getspecific(THREAD_KEY);
        atomic {
            foo2(thread_id);
        }
    }

    __attribute__((tm_callable))
    void foo2(int thread_id) {
        /* Continue based on the value of thread_id */
        return;
    }
!!! Thread private data should be considered in TM implementation and language extension. !!!
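The hoisting idea can be tried out with plain POSIX threads. The sketch below is an assumption-laden stand-in: a mutex replaces the `atomic` block (standard C has no such construct), the thread id is stored directly in the thread-specific pointer, and `thread_key`, `record_hit` and `process_request` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 4

static pthread_key_t thread_key;   /* assumed to be created once at startup */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int per_thread_hits[MAX_THREADS];

/* Work done inside the critical section; it receives the id as an argument
   instead of calling into the pthread library itself. */
static void record_hit(int thread_id)
{
    per_thread_hits[thread_id]++;
}

void process_request(void)
{
    /* Hoisted: the library call happens before the critical section. */
    int thread_id = (int)(intptr_t)pthread_getspecific(thread_key);
    assert(thread_id >= 0 && thread_id < MAX_THREADS);

    pthread_mutex_lock(&big_lock);     /* stands in for "atomic {" */
    record_hit(thread_id);
    pthread_mutex_unlock(&big_lock);   /* stands in for "}" */
}
```

With a real TM compiler the benefit is that the atomic block no longer contains an uninstrumented library call; with the mutex used here the restructuring only shortens the critical section.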

Slide 15 Challenges - Conditional Synchronization
Locks
    pthread_mutex_lock(mutex);
    if (!condition)
        pthread_cond_wait(cond, mutex);
    pthread_mutex_unlock(mutex);
Retry [Harris et al. PPoPP’2005]
    atomic {
        if (!condition)
            retry;
    }
Retry not implemented by Intel C compiler. Left as is in the Quake code.

Slide 16 I/O in Transactions
I/O used to print information messages only to the server console
– Client connected
– Client killed
Commented all the I/O code out.
    void SV_NewClient_f(client_t cl) {
        Con_Printf("Client %s joined the game", cl->name);
    }

    __attribute__((tm_pure)) Con_Printf(char*);
    void SV_NewClient_f(client_t cl) {
        Con_Printf("Client %s joined the game", cl->name);
    }
Solvable with ad hoc.
    void SV_NewClient_f(client_t cl) {
        tm_escape {
            Con_Printf("Client %s joined the game", cl->name);
        }
    }
Why not tm_escape?

Slide 17 Where Transactions Fit?
Guarding different types of objects with separate locks.
    switch(object->type) { /* Lock phase */
        case KEY:    lock(key_mutex);    break;
        case LIFE:   lock(life_mutex);   break;
        case WEAPON: lock(weapon_mutex); break;
        case ARMOR:  lock(armor_mutex);  break;
    }

    pick_up_object(object);

    switch(object->type) { /* Unlock phase */
        case KEY:    unlock(key_mutex);    break;
        case LIFE:   unlock(life_mutex);   break;
        case WEAPON: unlock(weapon_mutex); break;
        case ARMOR:  unlock(armor_mutex);  break;
    }
With a transaction, the lock phase and the unlock phase disappear:
    atomic {
        pick_up_object(object);
    }

Slide 18 Algorithm for Locking Leafs of a Tree
1: Lock Parent
2: Lock Children
3: Unlock Parent
4: If children have children go to 2
RESULT: only the leaves remain locked.
More complicated example of fine grain locking: applied in region-based locking in Quake.

Slide 19 Code for Lock Leafs Phase
    while (!stack.is_empty()) {
        parent = stack.pop();
        if (parent.has_children()) {
            for (child = parent.first_child();
                 child != NULL;
                 child = child.next_sibling()) {
                lock(child);
                stack.push(child);
            }
            unlock(parent);
        }
        // else this is a leaf and the leaf is locked.
    }
    /* Follows code for releasing locks, as complicated as for acquiring */
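A runnable variant of the lock-leaves phase, written as a sketch with POSIX mutexes. The `node` layout, the `held` bookkeeping flag and the helper names are assumptions made for this example; the explicit stack mirrors the pseudocode on the slide.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_STACK 64

/* Hypothetical tree node; `held` exists only so the example can be checked. */
typedef struct node {
    pthread_mutex_t lock;
    int held;
    struct node *first_child;
    struct node *next_sibling;
} node;

void node_init(node *n)
{
    pthread_mutex_init(&n->lock, NULL);
    n->held = 0;
    n->first_child = NULL;
    n->next_sibling = NULL;
}

void add_child(node *parent, node *child)
{
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

static void node_lock(node *n)   { pthread_mutex_lock(&n->lock); n->held = 1; }
static void node_unlock(node *n) { n->held = 0; pthread_mutex_unlock(&n->lock); }

/* Locks every leaf below `root`, stores the leaves in `leaves`, and returns
   their number. Inner nodes are held only until their children are locked. */
size_t lock_leaves(node *root, node **leaves, size_t cap)
{
    node *stack[MAX_STACK];
    size_t top = 0, count = 0;

    node_lock(root);
    stack[top++] = root;
    while (top > 0) {
        node *parent = stack[--top];
        if (parent->first_child != NULL) {
            for (node *c = parent->first_child; c != NULL; c = c->next_sibling) {
                node_lock(c);
                assert(top < MAX_STACK);
                stack[top++] = c;
            }
            node_unlock(parent);
        } else {
            assert(count < cap);   /* a leaf: it stays locked */
            leaves[count++] = parent;
        }
    }
    return count;
}

/* Release phase: unlock exactly the leaves recorded by lock_leaves. */
void unlock_leaves(node **leaves, size_t count)
{
    for (size_t i = 0; i < count; i++)
        node_unlock(leaves[i]);
}
```

Even in this reduced form the caller has to carry the list of locked leaves from the acquire phase to the release phase, which is the bookkeeping a single atomic block makes unnecessary.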

Slide 20 Equivalent Synchronization with TM atomic { UPDATE LEAVES CODE }

Slide 21 Error Handling Inside Transactions
Approach: When an error happens, commit the transaction and handle the error outside the atomic block.
Locks
    void Z_CheckHeap (void)
    {
        memblock_t *block;
        LOCK;
        for (block=mainzone->blocklist.next;;block=block->next){
            if (block->next == &mainzone->blocklist)
                break; // all blocks have been hit
            if ( (byte *)block + block->size != (byte *)block->next)
                Sys_Error("Block size does not touch the next block");
            if ( block->next->prev != block)
                Sys_Error("Next block doesn't have proper back link");
            if (!block->tag && !block->next->tag)
                Sys_Error("Two consecutive free blocks");
        }
        UNLOCK;
    }
Transactions
    void Z_CheckHeap (void) {
        memblock_t *block;
        int error_no = 0;
        atomic{
            for (block=mainzone->blocklist.next;;block=block->next){
                if (block->next == &mainzone->blocklist)
                    break; // all blocks have been hit
                if ((byte *)block + block->size !=
                    (byte *)block->next) {
                    error_no = 1;
                    break; /* makes the transaction commit */
                }
                if (block->next->prev != block) {
                    error_no = 2;
                    break;
                }
                if (!block->tag && !block->next->tag) {
                    error_no = 3;
                    break;
                }
            }
        }
        if (error_no == 1)
            Sys_Error ("Block size does not touch the next block");
        if (error_no == 2)
            Sys_Error ("Next block doesn't have proper back link");
        if (error_no == 3)
            Sys_Error ("Two consecutive free blocks");
    }
Commit and Abort Handlers
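The "record the error, leave the critical section, then report" pattern can be sketched in standard C as below. A mutex stands in for the atomic block, and the flat `memblock` array, `check_heap` and `heap_error_text` are simplified, hypothetical stand-ins for the zone allocator structures on the slide.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified block descriptor: position and size inside the zone,
   tag == 0 means the block is free. Hypothetical, for illustration. */
typedef struct {
    size_t offset;
    size_t size;
    int tag;
} memblock;

static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 if the heap is consistent, otherwise an error number.
   No error is reported while the lock is held. */
int check_heap(const memblock *blocks, size_t n)
{
    int error_no = 0;

    pthread_mutex_lock(&zone_lock);          /* stands in for "atomic {" */
    for (size_t i = 0; i + 1 < n; i++) {
        if (blocks[i].offset + blocks[i].size != blocks[i + 1].offset) {
            error_no = 1;
            break;                           /* leave the critical section */
        }
        if (!blocks[i].tag && !blocks[i + 1].tag) {
            error_no = 3;
            break;
        }
    }
    pthread_mutex_unlock(&zone_lock);        /* stands in for "}" */

    return error_no;
}

/* Maps the error number to the message, outside the critical section. */
const char *heap_error_text(int error_no)
{
    switch (error_no) {
    case 0:  return "heap is consistent";
    case 1:  return "Block size does not touch the next block";
    case 3:  return "Two consecutive free blocks";
    default: return "unknown heap error";
    }
}
```

The caller decides what to do with the error number after `check_heap` returns, so any I/O or termination happens with no lock held and no transaction open.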

Slide 22 Failure Atomicity
Original
    void PR_ExecuteProgram (func_t fnum, int tId){
        f = &pr_functions_array[tId][fnum];
        pr_trace_array[tId] = false;
        exitdepth = pr_depth_array[tId];
        s = PR_EnterFunction (f, tId);
        while (1){
            s++; // next statement
            st = &pr_statements_array[tId][s];
            a = (eval_t *)&pr_globals_array[tId][st->a];
            b = (eval_t *)&pr_globals_array[tId][st->b];
            c = (eval_t *)&pr_globals_array[tId][st->c];
            st = &pr_statements[s];
            a = (eval_t *)&pr_globals[st->a];
            b = (eval_t *)&pr_globals[st->b];
            c = (eval_t *)&pr_globals[st->c];
            if (--runaway == 0)
                PR_RunError ("runaway loop error");
            pr_xfunction_array[tId]->profile++;
            pr_xstatement_array[tId] = s;
            if (pr_trace_array[tId])
                PR_PrintStatement (st);
            }
            if (ed==(edict_t*)sv.edicts && sv.state==ss_active)
                PR_RunError("assignment to world entity");
        }
    }
With failure atomicity
    void PR_ExecuteProgram (func_t fnum, int tId){
        f = &pr_functions_array[tId][fnum];
        pr_trace_array[tId] = false;
        exitdepth = pr_depth_array[tId];
        s = PR_EnterFunction (f, tId);
        while (1){
            s++; // next statement
            st = &pr_statements_array[tId][s];
            a = (eval_t *)&pr_globals_array[tId][st->a];
            b = (eval_t *)&pr_globals_array[tId][st->b];
            c = (eval_t *)&pr_globals_array[tId][st->c];
            st = &pr_statements[s];
            a = (eval_t *)&pr_globals[st->a];
            b = (eval_t *)&pr_globals[st->b];
            c = (eval_t *)&pr_globals[st->c];
            if (--runaway == 0)
                abort;
            pr_xfunction_array[tId]->profile++;
            pr_xstatement_array[tId] = s;
            if (pr_trace_array[tId])
                PR_PrintStatement (st);
            }
            if (ed==(edict_t*)sv.edicts && sv.state==ss_active)
                abort;
        }
    }

Slide 23 The Benefit of Failure Atomicity
Original
– PR_RunError dumps the stack trace and terminates the server.
Using failure atomicity
– Abort reverts the updates to the global variables
– The effect is as if the client packet was lost
– Server continues to run
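The effect of an abort can be imitated without TM support by snapshotting the state a request may touch and restoring it on failure. The sketch below is only an emulation under stated assumptions: `globals`, `run_request` and the runaway limit are hypothetical, and a real TM runtime would track the writes itself instead of copying the whole array.

```c
#include <assert.h>
#include <string.h>

#define NUM_GLOBALS 8

/* Stand-in for the interpreter's global variables. */
static int globals[NUM_GLOBALS];

/* Applies `n` updates to the globals. If more than `runaway_limit` steps
   would be executed, all updates are undone and 0 is returned, as if the
   request had never arrived. Returns 1 when the request completes. */
int run_request(const int *writes, int n, int runaway_limit)
{
    int saved[NUM_GLOBALS];

    assert(n >= 0);
    memcpy(saved, globals, sizeof saved);        /* snapshot acts as undo log */

    for (int i = 0; i < n; i++) {
        if (i >= runaway_limit) {
            memcpy(globals, saved, sizeof saved); /* "abort": revert updates */
            return 0;
        }
        globals[i % NUM_GLOBALS] += writes[i];
    }
    return 1;                                     /* "commit" */
}
```

A caller that sees 0 can simply drop the request and keep serving the other clients, which matches the behaviour described on the slide.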

Slide 24 Privatization Example
The code assumes that after the memory block is returned, it will not be returned again until it is freed. Then the memory block is modified (zeroed).
    void* buffer;
    atomic {
        buffer = Z_TagMalloc(size, 1);
    }
    if (!buffer)
        Sys_Error("Runtime Error: Not enough memory.");
    else
        memset(buffer, 0, size);

Slide 25 Runtime Characteristics
- Experimental Methodology
- General Performance
- Per Atomic Block

Slide 26 Experimental Methodology
- Added extra synchronization so that all the threads perform request processing at the same time. This is equivalent to a 100% loaded server.
- Used a small map for 2 players, representing a high-conflict scenario.

Slide 27 General Performance *In STM_LOCK atomic blocks are guarded by global reentrant lock. The rationale is to count the STM overhead in.

Slide 28 Overall Transactional Characteristics Table1: Transactional characteristics. Table 2: Read and writes set size (in bytes).

Slide 29 Conclusion
TM is not mature enough. We need
– Rich language extensions for TM.
– Strict semantics for TM language primitives.
– Stable toolset (compilers, profilers, debuggers).
– Compatibility across tools and TM implementations.
– Improved performance on a single thread and graceful degradation on conflicts.
– Support for external library calls, system calls, and I/O.
– Interoperation with locks.
Atomic Quake is a rich transactional application to push the research in TM ahead.

Slide 30 The End