Intel Software College

1 Intel Software College
Correcting Threading Errors with Intel® Thread Checker for Explicit Threads

2 Objectives
After successful completion of this module you will be able to:
  Use Thread Checker to detect and identify a variety of threading correctness issues in Pthreads* threaded applications
  Determine whether library functions are thread-safe

3 Agenda
What is Intel® Thread Checker?
Detecting race conditions
Thread Checker as threading assistant
Some other threading errors
Checking library thread-safety
Other features of Thread Checker

4 Motivation
Developing threaded applications can be a complex task
A new class of problems is caused by the interaction between concurrent threads
  Data races or storage conflicts
    More than one thread accesses memory without synchronization
  Deadlocks
    Thread waits for an event that will never happen

5 Correcting Threading Errors with Intel® Thread Checker
Debugging tool for threaded software
  Finds threading bugs in POSIX*, Win32*, and OpenMP* threaded software
Quickly locates bugs that can take days to find using traditional methods and tools
  Isolates problems, not the symptoms
  The bug does not have to occur for it to be found!
Plug-in to VTune™ Performance Analyzer
  Same look, feel, and interface as VTune™
  Remote Data Collection for Linux
Instructor notes: Thread Checker helps to find all of those common pitfalls. Introduce OpenMP* briefly: it is supported in the Intel compilers.

6 Intel® Thread Checker Features
Supports different compilers
  IA32 and EM64T: gcc or Intel® Compilers
    Binary or source instrumentation
    32-bit compilation on EM64T only
  IPF: Intel® Compilers
    Source instrumentation only
View (drill down to) source code for diagnostics
One-click help for diagnostics
  Possible causes and solution suggestions
API for user-defined synchronization primitives
  If used instead of Pthreads* or OpenMP* libraries

7 Thread Checker: Analysis
Dynamic analysis, performed as the software runs
  Data (workload) driven execution
Includes monitoring of:
  Thread and sync APIs used
  Thread execution order
    Scheduler impacts results
  Memory accesses between threads
Code path must be executed to be analyzed

8 Thread Checker: Before You Start
Instrumentation: background
  Adds calls to a library to record information
    Thread and sync APIs
    Memory accesses
  Increases execution time and size
Use small data sets (workloads)
  Execution time and space are expanded
  Multiple runs over different paths yield best results
Workload selection is important!

9 Workload Guidelines
Execute problem code once per thread to be identified
Use the smallest possible working data set
  Minimize data set size
    Smaller image sizes
  Minimize loop iterations or time steps
    Simulate minutes rather than days
  Minimize update rates
    Lower frames per second
Finds threading errors faster!

10 Building for Thread Checker
Compile
  Generate symbolic information (-g)
  Disable optimization (-O0)
Link
  Preserve symbolic information
Instructor notes: Symbols let Thread Checker show source code. Optimization rearranges the generated code relative to the source, so disable it to make Thread Checker report the correct source lines.

11 Binary Instrumentation
Build with a supported compiler
Running the application
  Must be run from within Thread Checker
    Uses Remote Data Collection (RDC)
  Application is instrumented when executed
  External libraries are instrumented at run-time

12 Remote Data Collection
Linux* target system must be running the Threading Tools server
  Allows launch of application, transmission of results, and transfer of source files
Windows* platform runs the VTune* interface
  Set up remote activities
  View results

13 Source Instrumentation
Intel® C++ or Fortran Compilers
  Compile with -tcheck
Running the application
  Start in VTune™ environment
    Uses Remote Data Collection (RDC)
  Start from command line
    Data collected in threadchecker.thr results file
    View results (.thr file) in VTune™ environment
Additional libraries not instrumented or analyzed
More detailed diagnostics

14 Starting Thread Checker
[Screenshot callouts]
1) Must select: Threading Wizards
2) To see these wizards: Intel® Thread Checker Wizard, Intel® Thread Profiler Wizard, Advanced Activity Configuration
If not using RDC, there is no need to set up an activity or an application to launch

15 Collecting Data
Compile with source instrumentation
  Intel compilers and -tcheck flag
Run on Linux target system
Transfer threadchecker.thr results file to Thread Checker platform (Windows*)
  Source code should be made available
Open results file in Thread Checker

16 Thread Checker Diagnostics

17 Diagnostics Grouping

18 Source Code Viewer

19 Diagnostic Help
1) Right-click here . . .
2) More help!

20 Lab 1a - Potential Energy
Build and run serial version
Build threaded version
Run application in Thread Checker to identify threading problems

21 Dependence Analysis
Consider the serial code
    S1: A = 1.0;
    S2: B = A;
    S3: A = 1/3 * (C - D);
    S4: A = (B * 3.8) / 2.7;
Flow dependence between S1 and S2
  Value of A computed in S1 is used in S2
Anti-dependence between S2 and S3
  Value of A is read in S2 before it is written in S3
Output dependence between S3 and S4
  Value of A assigned in S3 must occur before assignment in S4
Instructor notes: This topic deals with optimization techniques, especially auto-parallelization, for serial codes. The slide sets the background for the race condition diagnostics examined in the next slide. As clarification, and to lead into the next slide, note these alternate explanations of the dependence definitions:
  Flow: write before read
  Anti: read before write
  Output: write before write
Any of these dependences requires that the statements involved execute in the same relative order to guarantee proper execution and results of the code.

22 Thread Checker Dependencies
Output dependence
  Write-Write conflict: one thread updates a variable that is subsequently updated by another thread
Anti-dependence
  Read-Write conflict: one thread reads a variable that is subsequently updated by another thread
Flow dependence
  Write-Read conflict: one thread updates a variable that is subsequently read by another thread
Instructor notes: When the code runs in parallel, Thread Checker makes a best guess about which kind of dependence is involved in a data contention diagnostic. The guess may be based on the order of execution in the serial code (if that can be determined) or on the order in which the threads executed the statements involved. Thus, some pairs of lines may have both R/W (anti) and W/R (flow) dependences noted in the results. (Serial execution order can be estimated better for OpenMP codes, especially if the dependence occurs on different iterations of a loop, since OpenMP code is expected to preserve serial consistency.)

23 Race Conditions
Execution order is assumed but cannot be guaranteed
  Concurrent access of the same variable by multiple threads
Most common error in multithreaded programs
May not be apparent at all times
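A minimal sketch of the kind of race described here, assuming a shared counter incremented by several Pthreads threads. The names (shared_counter, run_racy_counter) and the thread and iteration counts are illustrative and do not come from the lab code. Because the increment is an unsynchronized read-modify-write, updates can be lost, and the final value may differ from run to run.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define ITERATIONS  100000

static long shared_counter;   /* shared and unprotected: this is the bug */

static void *increment_unsafely(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        shared_counter++;     /* read-modify-write with no synchronization */
    return NULL;
}

/* Runs the racy workload once and returns the final counter value. */
long run_racy_counter(void)
{
    pthread_t threads[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, increment_unsafely, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;    /* may be less than NUM_THREADS * ITERATIONS */
}
```

The result can happen to be correct on a given run, which is why this class of error "may not be apparent at all times".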

24 Solving Race Conditions
Solution: Scope variables to be local to threads
When to use
  Value computed is not used outside parallel region
  Temporary or “work” variables
How to implement
  OpenMP scoping clauses (private, shared)
  Declare variables within threaded functions
  Allocate variables on thread stack
  TLS (Thread Local Storage) API
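One way the "declare variables within threaded functions" approach might look with Pthreads is sketched below. The array, the slice_t type, and the function names are assumptions made for illustration. Each thread accumulates into a variable on its own stack and writes only to its own result slot, so no two threads touch the same work variable.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define N 1000                 /* divisible by NUM_THREADS */

static int values[N];

typedef struct {
    int  first, last;          /* half-open index range [first, last) */
    long partial;              /* written only by the owning thread */
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    long local_sum = 0;        /* work variable on this thread's stack */

    for (int i = s->first; i < s->last; i++)
        local_sum += values[i];
    s->partial = local_sum;
    return NULL;
}

long parallel_sum(void)
{
    pthread_t threads[NUM_THREADS];
    slice_t slices[NUM_THREADS];
    long total = 0;

    for (int i = 0; i < N; i++)
        values[i] = i + 1;     /* 1, 2, ..., N */

    for (int t = 0; t < NUM_THREADS; t++) {
        slices[t].first = t * (N / NUM_THREADS);
        slices[t].last = (t + 1) * (N / NUM_THREADS);
        slices[t].partial = 0;
        pthread_create(&threads[t], NULL, sum_slice, &slices[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += slices[t].partial;   /* combined after the thread has finished */
    }
    return total;
}
```

The partial results are combined only after each join, so the final addition needs no lock.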

25 Solving Race Conditions
Solution: Control shared access with critical regions
When to use
  Value computed is used outside parallel region
  Shared value is required by each thread
How to implement
  Mutual exclusion and synchronization
  Lock, semaphore, condition variable, ...
Rule of thumb: Use one lock per data element
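As a sketch of the mutual exclusion option, the code below protects a shared counter with a single Pthreads mutex, following the one-lock-per-data-element rule of thumb. The counter, the lock name, and the workload sizes are illustrative assumptions.

```c
#include <pthread.h>

#define NUM_THREADS 4
#define ITERATIONS  100000

static long shared_counter;
/* One lock dedicated to the one shared data element. */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment_safely(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;                  /* critical region */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

long run_locked_counter(void)
{
    pthread_t threads[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, increment_safely, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Locking on every iteration is correct but slow. Where the algorithm allows it, accumulating locally and taking the lock once per thread is usually the cheaper design.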

26 Lab 1b - Potential Energy
Fix errors found by Thread Checker

27 Implementation Assistant
When implementing threads
  Obvious shared and private variables can be identified and handled
  Should you analyze remaining variables for dependencies?
    What if parallel code is hundreds of lines long?
    What about variable use in called functions?
    Can you tell if pointers refer to the same memory location?
Use Thread Checker as a threading assistant
  Speculatively insert threading (OpenMP prototype?)
  Compile and run program in Thread Checker
  Review diagnostics
  Update directives and/or restructure
Let Thread Checker do the “heavy lifting”
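The "update directives" step might end in something like the sketch below. It assumes a hypothetical loop (sum_of_squares is not from the labs) that was first parallelized speculatively with a bare parallel for. In that prototype every thread would write tmp and sum, which is the kind of conflict the diagnostics are meant to surface. The private and reduction clauses are the resulting fix.

```c
/* Returns x[0]^2 + x[1]^2 + ... + x[n-1]^2. */
double sum_of_squares(const double *x, int n)
{
    double tmp = 0.0;   /* work variable: must be private to each thread */
    double sum = 0.0;   /* result used after the loop: combined by reduction */

    /* A speculative first attempt would be a bare "parallel for" here. */
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = x[i] * x[i];
        sum += tmp;
    }
    return sum;
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.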

28 Deadlock
Caused by a thread waiting on some event that will never happen
Most common cause is locking hierarchies
  Always lock and unlock in the same order
  Avoid hierarchies if possible
ThreadA: L1, then L2
    void *threadA(void *arg)
    {
        pthread_mutex_lock(&L1);
        pthread_mutex_lock(&L2);
        processA(data1, data2);
        pthread_mutex_unlock(&L2);
        pthread_mutex_unlock(&L1);
        return(0);
    }
ThreadB: L2, then L1
    void *threadB(void *arg)
    {
        pthread_mutex_lock(&L2);
        pthread_mutex_lock(&L1);
        processB(data1, data2);
        pthread_mutex_unlock(&L1);
        pthread_mutex_unlock(&L2);
        return(0);
    }
Instructor notes: Always acquire L1, then L2, in both threadA and threadB.
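A sketch of the corrected pattern, in which both threads acquire L1 before L2. The helper update_both and the simple integer updates stand in for processA and processB, whose bodies the slide does not show.

```c
#include <pthread.h>

#define ROUNDS 1000

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;
static int data1, data2;

/* Every caller takes L1 first and L2 second, so no cycle can form. */
static void update_both(int amount)
{
    pthread_mutex_lock(&L1);
    pthread_mutex_lock(&L2);
    data1 += amount;
    data2 += amount;
    pthread_mutex_unlock(&L2);
    pthread_mutex_unlock(&L1);
}

static void *threadA(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        update_both(1);
    return NULL;
}

static void *threadB(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        update_both(2);
    return NULL;
}

/* Returns the common final value of data1 and data2, or -1 if they differ. */
int run_ordered_locking(void)
{
    pthread_t a, b;

    data1 = data2 = 0;
    pthread_create(&a, NULL, threadA, NULL);
    pthread_create(&b, NULL, threadB, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return (data1 == data2) ? data1 : -1;
}
```

Funnelling both threads through one locking helper makes the order hard to get wrong in later edits.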

29 Deadlock: Another Example
Add lock per element
Lock only elements, not the whole array
    typedef struct {
        // some data things
        SomeLockType mutex;
    } shape_t;
    shape_t Q[1024];
    void swap(shape_t *A, shape_t *B)
    {
        lock(A->mutex);
        lock(B->mutex);
        // Swap data between A & B
        unlock(B->mutex);
        unlock(A->mutex);
    }
Thread 1: swap(&Q[34], &Q[986]);  Grabs mutex 34
Thread 4: swap(&Q[986], &Q[34]);  Grabs mutex 986
Each thread then waits for the mutex that the other thread holds.
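One common remedy, which the slide itself does not show, is to impose a global order on the element locks, for example by address, so that both calls lock the same element first. The sketch below assumes Pthreads mutexes in place of SomeLockType. The value field and the shape_init and swap_shapes names are illustrative.

```c
#include <pthread.h>
#include <stdint.h>

typedef struct {
    int value;                 /* stands in for "some data things" */
    pthread_mutex_t mutex;
} shape_t;

void shape_init(shape_t *s, int value)
{
    s->value = value;
    pthread_mutex_init(&s->mutex, NULL);
}

/* Swaps the data of two elements, always locking the lower address first. */
void swap_shapes(shape_t *a, shape_t *b)
{
    if (a == b)
        return;                /* nothing to swap; avoids locking twice */

    shape_t *first  = ((uintptr_t)a < (uintptr_t)b) ? a : b;
    shape_t *second = (first == a) ? b : a;

    pthread_mutex_lock(&first->mutex);
    pthread_mutex_lock(&second->mutex);

    int tmp = a->value;
    a->value = b->value;
    b->value = tmp;

    pthread_mutex_unlock(&second->mutex);
    pthread_mutex_unlock(&first->mutex);
}
```

With this ordering, swap_shapes(&Q[34], &Q[986]) and swap_shapes(&Q[986], &Q[34]) both lock element 34 first, so neither thread can hold one mutex while waiting for the other.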

30 Thread Stalls
Thread waits for an inordinate amount of time
  Usually on a resource
Commonly caused by dangling locks
  Be sure threads release all locks held

31 What’s Wrong?
    int data;
    void *threadFunc(void *arg)
    {
        int localData;
        pthread_mutex_lock(&lock);
        if (data == DONE_FLAG)
            return((void *)1);
        localData = data;
        pthread_mutex_unlock(&lock);
        process(localData);
        return(0);
    }
Lock never released
Instructor notes: Entering and leaving a critical section MUST be a paired operation. If data == DONE_FLAG, the return occurs without ever leaving the critical section.
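A sketch of one possible correction, which releases the mutex on the early-return path as well. The DONE_FLAG value, the lock definition, and the stub process() are assumptions, since the slide does not define them.

```c
#include <pthread.h>
#include <stdint.h>

#define DONE_FLAG (-1)         /* assumed sentinel value */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int data;
static int processed;          /* records what the stub process() received */

static void process(int value)
{
    processed = value;         /* stub standing in for the real work */
}

void *threadFunc(void *arg)
{
    int localData;

    (void)arg;
    pthread_mutex_lock(&lock);
    if (data == DONE_FLAG) {
        pthread_mutex_unlock(&lock);   /* release before the early return */
        return (void *)(intptr_t)1;
    }
    localData = data;
    pthread_mutex_unlock(&lock);

    process(localData);        /* work on the private copy, outside the lock */
    return NULL;
}
```

An alternative with the same effect is a single exit point: set a result variable inside the critical section and unlock once before returning.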

32 Lab 2 - Deadlock
Use Intel® Thread Checker to find and correct the potential deadlock problem.
Instructor notes: Review the lab solution with the whole class.

33 Thread Safe Routines
All routines called concurrently from multiple threads must be thread safe
How to test for thread safety?
  Use OpenMP and Thread Checker for analysis
    OpenMP simulator is systematic
    Use sections to simulate concurrent execution

34 Thread Safety Example
Check for safety issues between
  Multiple instances of routine1()
  Instances of routine1() and routine2()
Set up sections to test all permutations
Still need to provide data sets that exercise relevant portions of code
    #pragma omp parallel sections
    {
        #pragma omp section
            routine1(&data1);
        #pragma omp section
            routine1(&data2);
        #pragma omp section
            routine2(&data3);
    }
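A compilable sketch of such a harness, testing one pairing per sections region. The bodies of routine1() and routine2() are stand-ins, since the real ones would be the library routines under test, and test_pairs is a hypothetical name.

```c
/* Stand-ins for the routines under test; each touches only its argument. */
static void routine1(int *data) { *data += 1; }
static void routine2(int *data) { *data *= 2; }

int test_pairs(void)
{
    int a = 1, b = 2;

    /* Pairing 1: two instances of routine1() run concurrently. */
    #pragma omp parallel sections
    {
        #pragma omp section
        routine1(&a);
        #pragma omp section
        routine1(&b);
    }
    /* Here a == 2 and b == 3. */

    /* Pairing 2: routine1() runs concurrently with routine2(). */
    #pragma omp parallel sections
    {
        #pragma omp section
        routine1(&a);
        #pragma omp section
        routine2(&b);
    }
    /* Here a == 3 and b == 6. */

    return a + b;
}
```

Each section gets its own data argument, so any conflict reported for the harness would point at state hidden inside the routines rather than at the harness itself.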

35 Two Ways to Ensure Thread Safety
Routines can be written to be reentrant
  Any variables changed by the routine must be local to each invocation
  Don’t modify shared variables
Routines can use mutual exclusion to avoid conflicts with other threads
  If accessing shared variables cannot be avoided
What if third-party libraries are not thread safe?
  Will likely need to control threads’ access to the library
It is better to make a routine reentrant than to add synchronization
  Avoids potential overhead
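The two approaches can be contrasted on a small, hypothetical ID generator (none of these function names come from the course material). The first version keeps hidden static state and is not thread safe. The second is reentrant because the caller supplies the state. The third keeps the shared state but guards it with a mutex.

```c
#include <pthread.h>

/* Not thread safe: hidden static state is shared by every caller. */
int next_id(void)
{
    static int last_id = 0;
    return ++last_id;
}

/* Reentrant: all modified state lives in storage supplied by the caller. */
int next_id_r(int *last_id)
{
    return ++(*last_id);
}

/* Thread safe through mutual exclusion: shared state, guarded by a lock. */
int next_id_locked(void)
{
    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
    static int last_id = 0;
    int id;

    pthread_mutex_lock(&id_lock);
    id = ++last_id;
    pthread_mutex_unlock(&id_lock);
    return id;
}
```

The reentrant version needs no lock at all as long as each thread passes its own state, which is the overhead the slide refers to avoiding.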

36 Lab 3 – Thread Safety
Use OpenMP framework to call library routines concurrently
Three library calls = 6 combinations to test
  A:A, B:B, C:C, A:B, A:C, B:C

37 Instrumentation Levels
Higher levels increase memory usage and analysis time, but provide more details
Binary instrumentation lowers level from default until successful
Manually adjust level of instrumentation to increase speed or control amount of information gathered
Instrumentation Level: Description
  Full Image: Each instruction in the module is instrumented and checked to see whether it might generate a diagnostic message.
  Custom Image: Same as “Full Image” except that the user can disable selected functions from instrumentation.
  All Functions: Turns on full instrumentation for those parts of a module that were compiled with debugging information.
  Custom Functions: Same as “All Functions” except that the user can disable selected functions from instrumentation.
  API Imports: Only the system API functions that the tool needs to instrument are instrumented. No user code is instrumented.
  Module Imports: Disables instrumentation. This is the default on system images, images without base relocations, and images not containing debug information.

38 Large Diagnostics Counts
What do you do if you have 5000 diagnostics?
  Where do you begin debugging?
  Are all the diagnostic messages equally important or serious?
Suggestions for organizing and prioritizing
  Add “1st Access” column
  Group by “1st Access”
  Sort by “Short Description” column

39 Large Diagnostics Counts
Add the “1st Access” column if it is not already present

40 Large Diagnostics Counts

41 Large Diagnostics Counts
Groups errors reported for the same source line; each group can be seen as the same issue

42 Large Diagnostics Counts
Sort on the “Short Description” column

43 Summary
Threading errors are easy to introduce
Debugging these errors by traditional techniques is hard
Intel® Thread Checker catches these errors
  Errors do not have to occur to be detected
  Greatly reduces debugging time
  Improves robustness of the application

45 Additional Material

46 Remote Data Collection
Linux* target system must be running the Threading Tools server
  Allows launch of application, transmission of results, and transfer of source files
Windows* platform runs the VTune* interface
  Set up remote activities
  View results

47 Linux* ITT RDC Server
On the Linux* system, run one of:
    /opt/intel/itt/bin/32/ittserver
    /opt/intel/itt/bin/64/ittserver
Starts the ITT RDC server

48 Starting Thread Checker
[Screenshot callouts]
1) Must select: Threading Wizards
2) To see these wizards: Intel® Thread Checker Wizard, Intel® Thread Profiler Wizard, Advanced Activity Configuration

49 Thread Checker RDC: Windows*
[Screenshot callouts]
Linux* System
Linux* application

