ATLAS (a.k.a. RAMP Red) Parallel Programming with Transactional Memory Njuguna Njoroge and Sewook Wee Transactional Coherence and Consistency Computer.


1 ATLAS (a.k.a. RAMP Red): Parallel Programming with Transactional Memory
Njuguna Njoroge and Sewook Wee
Transactional Coherence and Consistency, Computer Systems Lab, Stanford University
http://tcc.stanford.edu/prototypes

2 Why we built ATLAS
- Multicore processors expose the challenges of multithreaded programming
  - Transactional Memory simplifies parallel programming
    - As simple as coarse-grain locks
    - As fast as fine-grain locks
- Currently missing for evaluating TM
  - Fast TM prototypes to develop software on
- FPGAs' improving capabilities are attractive for CMP prototyping
  - Fast: can operate at > 100 MHz
  - More logic, memory and I/Os
  - Larger libraries of pre-designed IP cores
- ATLAS: 8-processor Transactional Memory system
  - 1st FPGA-based hardware TM system
  - Member of the RAMP initiative: RAMP Red

3 ATLAS provides …
- Speed
  - > 100x speed-up over SW simulator [FPGA 2007]
- Rich software environment
  - Linux OS
  - Full GNU environment (gcc, glibc, etc.)
- Productivity
  - Guided performance tuning
  - Standard GDB environment + deterministic replay

4 TCC's Execution Model
- Transaction
  - Building block of a program
  - Critical region
  - Executed atomically & isolated from others

5 In TCC, All Transactions All The Time [PACT 2004]
[Figure: execution timelines for CPU 0, CPU 1 and CPU 2. Each CPU repeats Execute Code, Arbitrate, Commit. One CPU commits st 0xbeef; a CPU that had already executed ld 0xbeef must Undo and Re-Execute Code. Loads of other addresses (0xaaaa, 0xbbbb, 0xdddd, 0xeeee) appear on the timelines as well.]

6 CMP Architecture for TCC
- Speculatively Read Bits: ld 0xdeadbeef
- Speculatively Written Bits: st 0xcafebabe
- Violation Detection: compare incoming address to R bits
- Commit: read pointers from Store Address FIFO, flush addresses with W bits set
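The R/W-bit bookkeeping on this slide can be sketched in C. This is a minimal software model under stated assumptions, not the ATLAS hardware: the direct-mapped layout, `NUM_LINES`, `LINE_SHIFT`, and every `tcc_*` name are hypothetical, and an overflow is simply reported as a failed access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  64  /* hypothetical cache size, in lines */
#define LINE_SHIFT 5   /* hypothetical 32-byte cache lines */

typedef struct {
    uint32_t tag[NUM_LINES];   /* line address held in each slot */
    bool     r[NUM_LINES];     /* speculatively read bits */
    bool     w[NUM_LINES];     /* speculatively written bits */
    size_t   fifo[NUM_LINES];  /* store address FIFO: slots whose W bit is set */
    size_t   fifo_len;
} tcc_cache;

static uint32_t line_of(uint32_t addr) { return addr >> LINE_SHIFT; }

/* Claim the slot for addr; fails if it already holds other speculative state. */
static bool claim(tcc_cache *c, uint32_t addr, size_t *slot) {
    size_t s = line_of(addr) % NUM_LINES;
    if ((c->r[s] || c->w[s]) && c->tag[s] != line_of(addr)) return false;
    c->tag[s] = line_of(addr);
    *slot = s;
    return true;
}

/* Speculative load: set the R bit. Returns false on overflow. */
bool tcc_load(tcc_cache *c, uint32_t addr) {
    size_t s;
    if (!claim(c, addr, &s)) return false;
    c->r[s] = true;
    return true;
}

/* Speculative store: set the W bit and remember the slot in the FIFO once. */
bool tcc_store(tcc_cache *c, uint32_t addr) {
    size_t s;
    if (!claim(c, addr, &s)) return false;
    if (!c->w[s]) {
        c->w[s] = true;
        c->fifo[c->fifo_len++] = s;
    }
    return true;
}

/* Violation detection: does an incoming committed address hit a set R bit? */
bool tcc_violates(const tcc_cache *c, uint32_t incoming) {
    size_t s = line_of(incoming) % NUM_LINES;
    return c->r[s] && c->tag[s] == line_of(incoming);
}

/* Commit: walk the FIFO, emit each written line address, clear all R/W bits.
   Returns the number of addresses written to out (at most NUM_LINES). */
size_t tcc_commit(tcc_cache *c, uint32_t *out) {
    size_t n = c->fifo_len;
    for (size_t i = 0; i < n; i++)
        out[i] = c->tag[c->fifo[i]] << LINE_SHIFT;
    for (size_t s = 0; s < NUM_LINES; s++)
        c->r[s] = c->w[s] = false;
    c->fifo_len = 0;
    return n;
}
```

In this model, one CPU's `tcc_commit` output would be fed to `tcc_violates` on every other CPU's cache; a hit means that CPU must roll back and re-execute its transaction.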

7 ATLAS 8-way CMP on BEE2 Board
- User FPGAs
  - 4 FPGAs for a total of 8 TCC CPUs
  - PPC, TCC caches, BRAMs and busses run @ 100 MHz
- Control FPGA
  - Linux PPC @ 300 MHz
    - Launch TCC apps here
    - Handle system services for TCC PowerPCs
  - Fabric runs @ 100 MHz

8 ATLAS Software Overview
[Figure: software stack. TM Application; TM API and ATLAS Profiler; ATLAS Subsystem; Linux OS; ATLAS HW on BEE2]
- A TM application can be written easily with the TM API
- The ATLAS profiler provides runtime profiling and guided performance tuning
- The ATLAS subsystem provides Linux OS support for the TM application

9 ATLAS subsystem
[Figure: Linux PPC and TCC PPC0, TCC PPC1, TCC PPC2, …, TCC PPC7. Labeled steps: Transfers initial context; Invokes parallel work; Joins parallel work; Exit with app. stats. Other labels: Commit, Violation.]

10 ATLAS System Support
- TCC PPC requests OS support (TLB miss, system call)
- Linux PPC regenerates and services the request
  - Serialize if the request is irrevocable
- Linux PPC replies back to the requestor
[Figure labels: System Call, Page-out, Linux PPC, TCC PPC]

11 Coding with TM API: histogram

    int main(int argc, char* argv[]) {
        … sequential code …
        TM_PARALLEL(run, NULL, numCpus);
        … sequential code …
    }

    // static scheduling with interleaved access to A[]
    void* run(void* args) {
        int i = TM_GET_THREAD_ID();
        for (; i < NUM_LOOP; i += TM_GET_NUM_THREAD()) {
            TM_BEGIN();
            bucket[A[i]]++;   // shared update, executed as one transaction
            TM_END();
        }
        return NULL;
    }

OpenTM will provide high-level (OpenMP style) pragmas
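To try the histogram loop without ATLAS hardware, the TM macros can be replaced by sequential stand-ins. The sketch below is an assumption-laden emulation: the macro bodies, the sample data in `A[]`, and `build_histogram` are all hypothetical and are not the ATLAS definitions. It runs the emulated threads one after another, so each loop body is trivially atomic.

```c
#include <stddef.h>

/* Hypothetical sequential stand-ins for the TM API; NOT the ATLAS macros. */
static int tm_thread_id, tm_num_threads;

#define TM_GET_THREAD_ID()  (tm_thread_id)
#define TM_GET_NUM_THREAD() (tm_num_threads)
#define TM_BEGIN()          ((void)0)  /* one thread at a time: already atomic */
#define TM_END()            ((void)0)
#define TM_PARALLEL(fn, args, n)                                   \
    do {                                                           \
        tm_num_threads = (n);                                      \
        for (tm_thread_id = 0; tm_thread_id < (n); tm_thread_id++) \
            (void)(fn)(args);                                      \
    } while (0)

#define NUM_LOOP    12
#define NUM_BUCKETS 4

/* Made-up sample input; every value is a valid bucket index. */
static const int A[NUM_LOOP] = {0, 1, 2, 3, 0, 1, 2, 0, 1, 0, 3, 0};
static int bucket[NUM_BUCKETS];

/* Same loop as on the slide: static scheduling, interleaved access to A[]. */
static void *run(void *args) {
    (void)args;
    for (int i = TM_GET_THREAD_ID(); i < NUM_LOOP; i += TM_GET_NUM_THREAD()) {
        TM_BEGIN();
        bucket[A[i]]++;
        TM_END();
    }
    return NULL;
}

/* Build the histogram with n emulated threads. */
void build_histogram(int n) {
    for (int b = 0; b < NUM_BUCKETS; b++) bucket[b] = 0;
    TM_PARALLEL(run, NULL, n);
}
```

Calling `build_histogram(4)` and `build_histogram(1)` fills `bucket[]` with the same counts, which is the property the transactional version has to preserve when the threads really do run concurrently.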

12 Guided Performance Tuning
- TAPE: light-weight runtime profiler [ICS 2005]
- Tracking most significant violations (longest loss time)
  - Violated object address
  - PC where object was read
  - Loss time & number of occurrences
  - Committing thread's ID and transaction PC
- Tracking most significant overflows (longest duration)
  - Overflows: when speculative state can no longer stay in TCC$
  - PC where the overflow occurs
  - Overflow duration & number of occurrences
  - Type of overflow (LRU or Write Buffer)
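The violation fields listed on this slide map naturally onto a small record. The sketch below is only an illustration of "track the most significant violations by loss time": the slide does not describe TAPE's actual data structures or eviction policy, so the table size, the field widths, the replacement rule, and all `tape_*` names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPE_SLOTS 4  /* hypothetical number of violations kept */

typedef struct {
    uint32_t addr;          /* violated object address */
    uint32_t read_pc;       /* PC where the object was read */
    uint64_t loss_cycles;   /* accumulated loss time */
    uint32_t occurrences;   /* number of occurrences */
    uint8_t  committer_id;  /* committing thread's ID (most recent) */
    uint32_t committer_pc;  /* committing transaction's PC (most recent) */
} violation_rec;

typedef struct {
    violation_rec slot[TAPE_SLOTS];
    size_t used;
} violation_table;

/* Record one violation. A repeat of the same (address, read PC) pair is
   accumulated; otherwise the record takes a free slot, or replaces the
   entry with the smallest loss if the new loss is larger (assumed policy). */
void tape_record(violation_table *t, uint32_t addr, uint32_t read_pc,
                 uint64_t loss, uint8_t committer_id, uint32_t committer_pc) {
    size_t min = 0;
    for (size_t i = 0; i < t->used; i++) {
        violation_rec *r = &t->slot[i];
        if (r->addr == addr && r->read_pc == read_pc) {
            r->loss_cycles += loss;
            r->occurrences++;
            r->committer_id = committer_id;
            r->committer_pc = committer_pc;
            return;
        }
        if (r->loss_cycles < t->slot[min].loss_cycles) min = i;
    }
    violation_rec fresh = {addr, read_pc, loss, 1, committer_id, committer_pc};
    if (t->used < TAPE_SLOTS)
        t->slot[t->used++] = fresh;
    else if (loss > t->slot[min].loss_cycles)
        t->slot[min] = fresh;
}

/* Most significant violation so far (longest loss time), or NULL if none. */
const violation_rec *tape_worst(const violation_table *t) {
    const violation_rec *worst = NULL;
    for (size_t i = 0; i < t->used; i++)
        if (worst == NULL || t->slot[i].loss_cycles > worst->loss_cycles)
            worst = &t->slot[i];
    return worst;
}
```

A fixed-size table keeps the profiler's footprint bounded, which fits the "light-weight" goal; an overflow tracker could follow the same shape with duration in place of loss time.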

13 Deterministic Replay
- All Transactions All The Time
  - TM 101: a transaction is executed atomically and in isolation
  - TM's illusion: a transaction starts after older transactions finish
- Only need to record "the order of commit"
  - Minimal runtime overhead & footprint size = 1 B / transaction
[Figure: Logging execution: T0, T1 and T2 commit their write-sets over time, and the commit order is recorded as LOG: T0 T1 T2. Replay execution: the token arbiter enforces the commit order specified in LOG.]
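The log-then-replay idea can be sketched as a one-byte-per-transaction commit log plus an arbiter check. This is a minimal sketch under assumptions: it takes the logged byte to be the committing thread's ID, and `LOG_CAPACITY`, `commit_log`, `log_commit`, and `replay_try_commit` are hypothetical names rather than ATLAS interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 1024  /* hypothetical log size, in transactions */

typedef struct {
    uint8_t order[LOG_CAPACITY];  /* 1 byte per transaction: committer's ID */
    size_t  len;                  /* transactions logged */
    size_t  next;                 /* replay cursor */
} commit_log;

/* Logging run: call when a thread wins arbitration and commits.
   Returns false if the log is full. */
bool log_commit(commit_log *log, uint8_t tid) {
    if (log->len == LOG_CAPACITY) return false;
    log->order[log->len++] = tid;
    return true;
}

/* Replay run: the arbiter grants the commit token only to the thread
   that the log names next; everyone else has to wait and retry. */
bool replay_try_commit(commit_log *log, uint8_t tid) {
    if (log->next >= log->len || log->order[log->next] != tid) return false;
    log->next++;
    return true;
}
```

Because transactions are atomic and isolated, forcing the same commit order is enough to reproduce the run; nothing inside a transaction needs to be logged.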

14 Useful Features of Replay
- Monitoring code in the transaction
  - Remember we only record the transaction order
- Verification
  - Log is not written in stone
  - Complete runtime scenario coverage is possible
- Choice of running Replay on
  - ATLAS itself
    - HW support for other debugging tools (see next slide)
  - Local machine (your favorite desktop or workstation)
    - Runs natively on faster local machine, sequentially
    - Seamless access to existing debugging tools

15 GDB support
- Current status
  - GDB integrated with local machine replay
  - GDB provides debuggability while guaranteeing deterministic replay
- Below are work-in-progress
  - Breakpoint
    - Thread-local BP vs. global BP
    - Stop the world by controlling commit token
  - Stepping
    - Backward stepping: transaction is ready to roll back
    - Transaction stepping
  - Unlimited data-watch (ATLAS only)
    - Separate monitor TCC cache to register data-watches

16 Conclusion: ATLAS provides
- Speed
  - > 100x speed-up over SW simulator [FPGA 2007]
- Software environment
  - Linux OS
  - Full GNU environment (gcc, glibc, etc.)
- Productivity
  - TAPE: guided performance tuning
  - Deterministic replay
  - Standard GDB environment
- Future Work
  - High-level language support (Java, Python, …)

17 Questions and Answers
- tcc_fpga_xtreme@mailman.stanford.edu
- ATLAS Team Members
  - System Hardware – Njuguna Njoroge, PhD Candidate
  - System Software – Sewook Wee, PhD Candidate
  - High level languages – Jiwon Seo, PhD Candidate
  - HW Performance – Lewis Mbae, BS Candidate
- Past contributors
  - Interconnection Fabric – Jared Casper, PhD Candidate
  - Undergrads – Justy Burdick, Daxia Ge, Yuriy Teslar

