Presenters: Shih-Tung Huang, Tsung-Cheng Lin, Kuan-Fu Kuo
2015/6/26
EICE team
dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core
Chi-Neng Wen, Shu-Hsuan Chou and Tien-Fu Chen
National Chung-Cheng University, Chia-Yi, Taiwan
{wcn93, csh93, th
International Symposium on Pervasive Systems, Algorithms, and Networks
2 Abstract (1)
Traditional debug facilities fall short of the debugging needs of multicore parallel programming. Synchronization bugs caused by race conditions are particularly difficult to detect with software debugging tools. This work presents a fast and feasible hardware-assisted solution for non-intrusive debugging on many-core systems. The key idea is to keep track of data accesses to shared memory areas, and of the related lock synchronization activity, using dedicated data structures in the proposed debugging IP (dIP). A page-based shared variable cache keeps shared variables as long as possible, and an inexpensive pluggable off-chip RAM can efficiently eliminate false positives.
3 Abstract (2)
To reduce debugging traffic blocking, this work provides a thread library that marks shared memory and lock events and transmits them to the dIP through a small hardware co-processor (eXtend dIP, XdIP) attached to each core. The experimental results show how worst-case debugging traffic blocking grows with the number of cores, and that adding tolerance buffers in the XdIP eases it effectively. Moreover, real workloads (SPLASH-2, MPEG-4, and H.264) run under the dIP's non-intrusive race detection with an average slowdown of only 4.7% to 12.2%. Finally, the hardware cost of the dIP stays low as the number of cores grows.
4 What's the problem
Data race detection in multi-cores
Software methods
  Cause the probe effect
Hardware methods
  Need a lot of memory (or hardware area) to log core behavior
  Cause false positives
The method proposed in this paper
  Is not a software method
  Uses related work [3] to avoid the probe effect
  Uses centralized race detection: hardware area does not grow sharply as cores are added
5 Related work
The probe effect was introduced in related work [1].
Related work [4] (the lock-set algorithm) is used for data race detection.
Related work [3] separates the debugging data path from the usual data path to avoid the probe effect.
Race detection (multi-core)
  Software [5][6]
  Hardware [7][8][9]
  This paper's method: lock-set algorithm [4] and related work [3]
6 Proposed MPSoC framework
Every core has an XdIP, which acts as a co-processor for that core.
The XdIP sends debug events to the dIP through the Debug I/F.
The interconnection follows the standard of related work [3].
The Data I/F carries the usual data path.
The Debug I/F carries the debug event path.
7 XdIP architecture
The architecture is quite simple.
A filter selects debug events (lock and memory access information) and places them in a buffer, where they are packed and wait to be sent to the dIP.
The filter is configured by software.
Events are monitored and transferred within each core.
When the buffer is full, the XdIP notifies the dIP to stall all cores while the events are transferred.
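The filter-then-buffer behaviour described on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the class name `XdIPModel`, the `capacity` parameter, and the event kinds `"lock"`, `"unlock"`, and `"mem"` are hypothetical names chosen for illustration.

```python
from collections import deque


class XdIPModel:
    """Toy software model of one core's XdIP: a filter feeding a bounded buffer."""

    def __init__(self, capacity, watched_kinds=("lock", "unlock", "mem")):
        self.capacity = capacity                  # buffer depth (hypothetical parameter)
        self.watched_kinds = set(watched_kinds)   # filter, configured by software
        self.buffer = deque()                     # events waiting to be sent to the dIP
        self.stall_requests = 0                   # how often the buffer filled up

    def observe(self, kind, payload):
        """Handle one core event; return True if a stall of all cores is requested."""
        if kind not in self.watched_kinds:
            return False                          # filtered out, never buffered
        self.buffer.append((kind, payload))
        if len(self.buffer) >= self.capacity:
            self.stall_requests += 1
            return True                           # buffer full: ask the dIP to stall
        return False

    def drain(self, max_events):
        """Send up to max_events buffered events toward the dIP, oldest first."""
        sent = []
        while self.buffer and len(sent) < max_events:
            sent.append(self.buffer.popleft())
        return sent
```

In this model a larger `capacity` makes `observe` return `True` less often for the same event stream, which is the intuition behind adding buffers to reduce stalls.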
8 Data race detection flow
First, the table manager accepts debug events from the XdIP and maintains the shared variable cache, the lock-set table, and the core-status table.
Second, the rule logic checks whether a data race has occurred. If so, an alert is raised to notify the exception handler so that the detected race can be handled.
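The two-step flow (update the tables, then apply the rule) can be sketched in software. The rule used below is a simplified lock-set check, assumed here for illustration: each shared address keeps a candidate set of locks, which is intersected with the locks held at every access, and an alert is recorded when the set becomes empty after more than one thread has touched the address. The class `DetectorModel` and its field names are hypothetical, and the actual rule logic of the dIP may differ.

```python
class DetectorModel:
    """Toy model of the flow: table update first, then a lock-set style rule."""

    def __init__(self):
        self.held = {}        # core-status: thread id -> set of locks currently held
        self.candidates = {}  # variable table: address -> candidate lock set
        self.threads = {}     # variable table: address -> threads that accessed it
        self.alerts = []      # (thread, address) pairs flagged as potential races

    def on_event(self, thread, kind, target):
        locks = self.held.setdefault(thread, set())
        # Step 1: the table manager updates lock state.
        if kind == "lock":
            locks.add(target)
            return
        if kind == "unlock":
            locks.discard(target)
            return
        # kind == "mem": update the variable entry for this address.
        if target not in self.candidates:
            self.candidates[target] = set(locks)   # first access seeds the set
        else:
            self.candidates[target] &= locks       # keep only commonly held locks
        self.threads.setdefault(target, set()).add(thread)
        # Step 2: the rule logic raises an alert if no common lock remains.
        if not self.candidates[target] and len(self.threads[target]) > 1:
            self.alerts.append((thread, target))
```

For example, if threads A and B both access an address while holding S1, no alert is recorded; a later access by B without S1 empties the candidate set and is flagged.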
9 dIP architecture
Blocks 1 to 5 correspond to the data race detection flow.
Block 6 orders debug events (by SqID).
Block 7 is the external RAM used on cache misses.
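Ordering debug events by SqID can be pictured as a small reorder stage. The sketch below assumes that SqIDs are unique consecutive integers starting from a known value, which the slides do not state; `SqIDReorderer` and its methods are hypothetical names.

```python
import heapq


class SqIDReorderer:
    """Release events strictly in SqID order, holding back early arrivals."""

    def __init__(self, first_id=0):
        self.next_id = first_id   # next SqID allowed to leave (assumed consecutive)
        self.pending = []         # min-heap of (sq_id, event)

    def push(self, sq_id, event):
        """Accept one event and return the list of events now ready, in order."""
        heapq.heappush(self.pending, (sq_id, event))
        released = []
        while self.pending and self.pending[0][0] == self.next_id:
            released.append(heapq.heappop(self.pending)[1])
            self.next_id += 1
        return released
```

Events that arrive ahead of their turn stay in the heap until every lower SqID has been seen, so the rule logic always observes accesses in one global order.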
10 Three tables
The page-based variable table records the latest access state of each variable.
The lock-key table records how many lock-sets and how many lock-keys are available.
The core-status table records each core's state (thread, lock set, SqID).
Fully associative
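A page-based, fully associative variable table backed by external RAM can be sketched as follows. Several details are assumptions made for illustration only: the 4 KiB page size, the least-recently-used replacement policy, the `last_thread` field, and the name `PageVariableCache` do not come from the slides.

```python
from collections import OrderedDict

PAGE_BITS = 12  # assumed 4 KiB pages; the slides do not give a page size


class PageVariableCache:
    """Fully associative page table with spill to a modeled external RAM."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = OrderedDict()   # page number -> {offset: access state}
        self.external_ram = {}         # backing store for evicted pages

    def lookup(self, address):
        """Return the mutable access-state record for this address."""
        page = address >> PAGE_BITS
        offset = address & ((1 << PAGE_BITS) - 1)
        if page in self.entries:
            self.entries.move_to_end(page)            # mark as most recently used
        else:
            if len(self.entries) >= self.num_entries:
                victim, state = self.entries.popitem(last=False)
                self.external_ram[victim] = state     # spill instead of dropping
            # On a miss, reload the page from external RAM if it was spilled.
            self.entries[page] = self.external_ram.pop(page, {})
        return self.entries[page].setdefault(offset, {"last_thread": None})
```

Because evicted pages are spilled rather than discarded, the access history of a variable survives a cache miss, which is what lets an external RAM avoid the false positives that lost history would cause.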
11 Overall proposed framework
12 Allocation/de-allocation of lock-keys
Allocation
  Thread A executes W_lock S1, and the XdIP sends the event to the dIP.
  The dIP allocates a lock-key to thread A, and thread A saves the lock-key number with S1.
De-allocation
  Thread A executes W_unlock S1; at the same time, the lock-key is sent to the dIP so that it can be de-allocated.
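The allocation and de-allocation steps can be modeled as a small free-list allocator. This is a sketch under assumptions: the class `LockKeyTable`, the lowest-key-first order, and raising an error when no key is free are illustrative choices, since the slides do not say how the dIP behaves when lock-keys run out.

```python
class LockKeyTable:
    """Toy lock-key table: hands out keys on lock, reclaims them on unlock."""

    def __init__(self, num_keys):
        # Stored in reverse so that pop() hands out key 0 first.
        self.free_keys = list(range(num_keys - 1, -1, -1))
        self.owner = {}   # lock-key -> (thread, lock name)

    def allocate(self, thread, lock_name):
        """Model a W_lock event: return the lock-key the thread saves with the lock."""
        if not self.free_keys:
            raise RuntimeError("no lock-key available")   # assumed behaviour
        key = self.free_keys.pop()
        self.owner[key] = (thread, lock_name)
        return key

    def deallocate(self, key):
        """Model a W_unlock event: the thread sends its saved key back."""
        if key not in self.owner:
            raise KeyError(key)
        del self.owner[key]
        self.free_keys.append(key)
```

Usage follows the slide: `key = table.allocate("A", "S1")` on W_lock S1, and `table.deallocate(key)` on W_unlock S1, after which the same key can be reused by another thread.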
13 Data race detection rule
(Figure: example with core1 and core2)
14 Experiments
When an XdIP buffer is full, the dIP stalls all cores to remain non-intrusive.
Stalls reduce system performance; an experiment with the SPLASH-2 benchmarks shows the stall ratio.
Solution: add buffers in the XdIP.
15 Experiments
Four different benchmarks
Worst-case performance degradation is 12.25%
Compared with related work [9]