Presentation is loading. Please wait.

Presentation is loading. Please wait.

Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University.

Similar presentations


Presentation on theme: "Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University."— Presentation transcript:

1 Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University of Texas at Austin

2 Intro Processors now scaling via cores, not clock rate 8 cores now, 64 tomorrow, 4096 next week? Parallel programming increasingly important Software must keep up with hardware advances But parallel programming is really hard Deadlock, priority inversion, data races, etc. STM is here We would like HTM

3 Difficult HTM Problems Enforcing atomicity and isolation requires conflict detection and rollback TM Hardware only applies to memory and processor state I/O, System calls may have effects that are not isolated, cannot be rolled back mov $norad, %eax mov $canada, %ebx launchmissiles %eax, %ebx

4 Outline Handling kernel I/O with minimal hardware User-level system call rollback Conclusions

5 Outline Handling kernel I/O with minimal hardware User-level system call rollback Conclusions

6 cxspinlocks Cooperative transactional spinlocks allow kernel to take advantage of limited TM hardware Optimistically execute with transactions Fall back on locking when hardware TM is not enough I/O, page table operations overflow? Correctness provided by isolation of lock variable Transactional threads read lock Non-transactional threads write lock

7 cxspinlock guarantees Multiple transactional threads in critical region Non-transactional thread excludes all others

8 cxspinlocks in action Thread 1 cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2 cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite

9 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA Transactional threads read unlocked lock variable

10 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA Transactional threads read unlocked lock variable

11 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA Transactional threads read unlocked lock variable

12 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite

13 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite

14 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA

15 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA

16 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA

17 cxspinlocks in action Thread 1: TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite lockA unlocked lockA: readwrite lockA Hardware restarts transactions for I/O

18 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite lockA

19 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite lockA contention managed CAS

20 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite lockA contention managed CAS

21 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite contention managed CAS

22 cxspinlocks in action Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite locked lockA: readwrite

23 cxspinlocks in action Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite locked lockA: readwrite

24 cxspinlocks in action Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite locked lockA: readwrite

25 cxspinlocks in action Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite locked lockA: readwrite conditionally add variable to read set

26 cxspinlocks in action Thread 1: Non-TX cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite locked lockA: readwrite conditionally add variable to read set

27 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite conditionally add variable to read set

28 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite lockA

29 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: TX cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite lockA

30 cxspinlocks in action Thread 1: cx_optimistic(lockA); modify_data(); if(condition) { perform_IO(); } cx_end(lock); Thread 2: cx_optimistic(lockA); modify_data(); cx_end(lock); readwrite unlocked lockA: readwrite

31 Implementing cxspinlocks Return codes: Correctness Hardware returns status code from xbegin to indicate when hardware has failed (I/O) xtest: Performance Conditionally add a memory cell (e.g. lock variable) to the read set based on its value xcas: Fairness Contention managed CAS Non-transactional threads can wait for transactional threads Simple hardware primitives support complicated behaviors without implementing them

32 Outline Handling kernel I/O with minimal hardware User-level system call rollback Conclusions

33 User-level system call rollback Open nesting requires user-level syscall rollback Many calls have clear inverses mmap, munmap Even simple calls have many side effects e.g. file write Even simple calls might be irreversible

34 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

35 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

36 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

37 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

38 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

39 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

40 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

41 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

42 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

43 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

44 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

45 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

46 Users can’t always roll back fd_A = open(“file_A”); unlink(“file_A”); void *map_A = mmap(fd=fd_A, size=4096); close(fd_A); xbegin; modify_data(); fd_B = open(“file_B”); xbegin_open; void *map_B = mmap(fd=fd_B, start=map_A, size=4096); xend_open( abort_action=munmap); xrestart;

47 Kernel participation required Users can’t track all syscall side effects Must also track all non-tx syscalls Kernel must manage transactional syscalls Kernel enhances user-level programming model Seamless transactional system calls Strong isolation for system calls?

48 Related Work Other I/O solutions Hardware open nesting [Moravan 06] Unrestricted transactions [Blundell 07] Transactional Locks Speculative lock elision [Rajwar 01] Transparently Reconciling Transactions & Locks [Welc 06] Simple hardware models Hybrid TM [Damron 06, Shriraman 07] Hardware-accelerated STM [Saha 2006]

49 Conclusions Kernel can use even minimal hardware TM designs May not get full programming model Simple primitives support complex behavior User-level programs can’t roll back system calls Kernel must participate


Download ppt "Solving Difficult HTM Problems Without Difficult Hardware Owen Hofmann, Donald Porter, Hany Ramadan, Christopher Rossbach, and Emmett Witchel University."

Similar presentations


Ads by Google