Executive Briefing: Multicore-Enabling SaaS Applications
September 3, 2008
© 2008 CILK ARTS, Inc. Cilk++, Cilk, Cilkscreen, and Cilk Arts are trademarks of Cilk Arts, Inc.

Agenda
∙Emergence of multicore processors
∙Key challenges facing developers
∙When can multicore help?
∙Data races: a new type of bug
∙Questions to ask when going multicore
∙Programming tools & techniques

About CILK ARTS
∙Launched in March
∙Headquartered in Burlington, MA.
∙Funded by Stata Venture Partners, software industry executives, founders, and grants from the NSF and DARPA.
∙First product is Cilk++, based on 15 years of research at MIT
Mission: To provide the easiest, quickest, and most reliable way to optimize application performance on multicore processors.

Emergence of Multicore and Impact on SaaS

Moore’s Law
Figure: Intel CPU Introductions
Transistor count is still rising, … but clock speed is bounded at ~5 GHz.
Source: Herb Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Dr. Dobb's Journal, 30(3), March

Power Density
Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.

Vendor Solution
∙To scale performance, put many processor cores on a chip.
∙Intel predicts 80+ cores by 2011!
Figure: Intel 45nm quad-core processor

SaaS Opportunity
∙Increase throughput
  Quantitative finance: increase volume of portfolios analyzed overnight
∙Reduce response time
  Engineering simulation: accelerate structural analysis of assembly
∙Improve user experience
  Multiplayer games: increased galaxy size
∙Reduce data center power consumption

Multicore and SaaS
∙Application response time?
∙Processor utilization?
[Diagram: processors P1 through P8; Computer Operation 1, User Work, Computer Operation 2]

Multicore and SaaS
∙For CPU-constrained applications, multithreading improves response time and boosts utilization
[Diagram: processors P1 through P8 running repeated sequences of Computer Operation 1, User Work, Computer Operation 2]

Key Challenges Facing Developers

Multicore Challenges
Development Time
∙How will you get your product out in time?
∙Where will you find enough parallel-programming talent?
∙Will you be forced to redesign your application?
Software Reliability
∙Can you debug your parallel application?
∙How will you test it effectively before release?
Application Performance
∙How can you minimize response time?
∙Will your solution scale as the number of processor cores increases?
∙Can you identify performance bottlenecks?

Can a Multicore CPU Help My App?

Work & Span
∙Work: total amount of time spent in all the instructions
∙Span: time along the critical path, the longest chain of dependent instructions
∙Parallelism: ratio of work to span
∙In this example: Work = 18, Span = 9, Parallelism = 18/9 = 2, i.e., little gain beyond 2 processors

Can Multicore Help?
∙The more parallelism is available in an application, the more a multicore processor can help.
Work: T_1 = 58
Span: T_∞ = 9 (same as previous example)
Parallelism: T_1/T_∞ ≈ 6.44

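The work and span figures above translate directly into a ceiling on speedup. The following is a minimal sketch, not part of the original briefing: the function name `speedup_bound` is hypothetical, and it assumes the standard bounds that P processors cannot finish faster than T_1/P (all the work must be done) or faster than T_∞ (the critical path is serial).

```cpp
#include <algorithm>
#include <cassert>

// Upper bound on speedup with `processors` cores (hypothetical helper).
// Assumes T_P >= T_1 / P and T_P >= T_inf, so the speedup T_1 / T_P
// can exceed neither P nor the parallelism T_1 / T_inf.
double speedup_bound(double work, double span, int processors) {
    assert(work > 0 && span > 0 && processors > 0);
    const double parallelism = work / span;  // T_1 / T_inf
    return std::min(static_cast<double>(processors), parallelism);
}
```

With Work = 18 and Span = 9, the bound on 8 processors is 2, which is why that example gains little beyond 2 processors. With Work = 58 and Span = 9, the bound on 8 processors is about 6.44. These are ceilings only; actual speedup also depends on the scheduler and the hardware.
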
Data Races: A New Type of Bug in Multicore Programming

Race Bugs
Definition. A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.
Example (B and C are logically parallel, after A and before D):
  A: int x = 0;
  B: x++;
  C: x++;
  D: assert(x == 2);
The same example expanded into loads and stores:
  A: x = 0;
  B: r1 = x; r1++; x = r1;
  C: r2 = x; r2++; x = r2;
  D: assert(x == 2);

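To see why the assertion can fail, it helps to replay one unlucky ordering of the loads and stores by hand. The sketch below is an illustration added here, not code from the briefing: `run_increments` is a hypothetical name, and it runs on a single thread so the outcome is deterministic. It simulates an interleaving and does not itself contain a race.

```cpp
#include <cassert>

// Replays the loads and stores of two logically parallel `x++` strands
// (B and C) in one fixed order. If `interleaved` is true, both strands
// load x before either one stores its result.
int run_increments(bool interleaved) {
    int x = 0;                 // A
    int r1, r2;
    if (interleaved) {
        r1 = x;                // B loads 0
        r2 = x;                // C loads 0
        assert(r1 == r2);      // both strands saw the same stale value
        r1++; x = r1;          // B stores 1
        r2++; x = r2;          // C stores 1, overwriting B's update
    } else {
        r1 = x; r1++; x = r1;  // B runs to completion: x == 1
        r2 = x; r2++; x = r2;  // then C: x == 2
    }
    return x;                  // D would check x == 2
}
```

`run_increments(false)` returns 2, so the final assertion would hold. `run_increments(true)` returns 1, so it would fail. In a real parallel run the ordering is chosen by timing, which is what makes such bugs hard to reproduce.
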
Coping with Race Bugs
∙Although locking can “solve” race bugs, lock contention can destroy all parallelism.
∙Making local copies of the nonlocal variables can remove contention, but at the cost of restructuring program logic.
∙Cilk++ provides hyperobjects to mitigate data races on nonlocal variables without the need for locks or code restructuring.
IDEA: Different parallel branches may see different views of the hyperobject.

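The "different views" idea can be imitated by hand in standard C++. The sketch below is not the Cilk++ hyperobject API, and every name in it (`parallel_sum`, `views`, `branches`) is hypothetical. It gives each parallel branch a private accumulator and merges the accumulators after the branches join, which is exactly the manual restructuring that hyperobjects are said to make unnecessary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `branches` parallel branches. Each branch adds into
// its own private view, so no two branches write the same location and
// no lock is needed. The views are combined after all branches join.
long parallel_sum(const std::vector<int>& data, unsigned branches) {
    assert(branches > 0);
    std::vector<long> views(branches, 0);  // one view per branch
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + branches - 1) / branches;

    for (unsigned b = 0; b < branches; ++b) {
        workers.emplace_back([&, b] {
            const std::size_t begin = std::min(data.size(), b * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            views[b] = local;              // only branch b writes views[b]
        });
    }
    for (auto& w : workers) w.join();

    // Merge the views in branch order.
    return std::accumulate(views.begin(), views.end(), 0L);
}
```

A call such as `parallel_sum(values, 4)` produces the same total as a serial loop, because integer addition gives the same result regardless of how the range is split. The cost is visible in the code: the single shared accumulator had to be replaced by an array of views plus an explicit merge step.
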
Questions to Ask

Development Time
1. To multicore-enable my application, how much logical restructuring of my application must I do?
2. Can I easily train programmers to use the multicore software platform?
3. Can I maintain just one code base, or must I maintain separate serial and parallel versions?
4. Can I avoid rewriting my application every time a new processor generation increases the core count?
5. Can I easily multicore-enable ill-structured and irregular code, or is the multicore software platform limited to data-parallel applications?
6. Does the multicore software platform properly support modern programming paradigms, such as objects, templates, and exceptions?
7. What does it take to handle global variables in my application?

Application Performance
8. How can I tell if my application exhibits enough parallelism to exploit multiple processors?
9. Does the multicore software platform address response-time bottlenecks, or just offer more throughput?
10. Does application performance scale up linearly as cores are added, or does it quickly reach diminishing returns?
11. Is my multicore-enabled code just as fast as my original serial code when run on a single processor?
12. Does the multicore software platform's scheduler load-balance irregular applications efficiently to achieve full utilization?
13. Will my application "play nicely" with other jobs on the system, or do multiple jobs cause thrashing of resources?
14. What tools are available for detecting multicore performance bottlenecks?

Software Reliability
15. How much harder is it to debug my multicore-enabled application than to debug my original application?
16. Can I use my standard, familiar debugging tools?
17. Are there effective debugging tools to identify and localize parallel-programming errors, such as data-race bugs?
18. Must I use a parallel debugger even if I make an ordinary serial programming error?
19. What changes must I make to my release-engineering processes to ensure that my delivered software is reliable?
20. Can I use my existing unit tests and regression tests?

Programming Tools & Techniques

Parallel C++ Options
Pthreads & WinAPI threads
∙An API for creating and manipulating O/S threads.
∙Programmer writes thread-interaction protocols.
Intel’s Threading Building Blocks
∙A C++ template library with automatic scheduling of tasks.
∙Programmer writes explicit “continuations.”
OpenMP
∙Open-source language extensions to C++.
∙Programmer inserts pragmas into code.
Cilk++
∙Faithful extension of C++.
∙Programmer inserts keywords into code that do not destroy serial semantics.
∙Provably good scheduler and a race-detection tool.

Cilk++: Smooth Path to Multicore for Legacy Applications

Cilk++
Cilk++ provides a smooth evolution from serial programming to parallel programming.
Cilk++ is a remarkably simple set of extensions for C++ and a powerful runtime system for multicore applications.

CILK ARTS Solution
Development Time
∙Minimal application changes
∙Can be learned in days by programmers without multithreading expertise
∙Seamless path forward (and backward)
Software Reliability
∙Multithreaded version as reliable as the original
∙No fundamental change to release engineering
Application Performance
∙Best-in-class performance
∙Linear scaling as cores are added
∙Minimal overhead on a single core

CILK ARTS Solution
Serial code:
  int fib (int n) {
    if (n<2) return (n);
    else {
      int x,y;
      x = fib(n-1);
      y = fib(n-2);
      return (x+y);
    }
  }
Cilk++ source:
  int fib (int n) {
    if (n<2) return (n);
    else {
      int x,y;
      x = cilk_spawn fib(n-1);
      y = fib(n-2);
      cilk_sync;
      return (x+y);
    }
  }
[Diagram: Cilk++ toolchain. Components: Cilk++ Compiler, Conventional Compiler, Cilk++ Hyperobject Library, Linker, Binary, Cilk++ Runtime System, Cilk++ Race Detector, Conventional Regression Tests, Parallel Regression Tests. Outcomes: Reliable Single-Threaded Code, Reliable Multi-Threaded Code, Exceptional Performance.]

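The Cilk++ source above needs the Cilk++ compiler. For readers without it, the spawn-and-sync shape can be approximated in standard C++ with `std::async`. This is a rough analogue under stated assumptions, not Cilk++: `std::launch::async` starts a new thread per task instead of using the Cilk++ runtime scheduler, and the `depth` cutoff and the names `fib_serial` and `fib_async` are introduced here only to keep the number of threads small.

```cpp
#include <cassert>
#include <future>

// Serial version, equivalent to the serial code on the slide.
int fib_serial(int n) {
    if (n < 2) return n;
    return fib_serial(n - 1) + fib_serial(n - 2);
}

// Parallel analogue. `depth` (an assumption of this sketch) limits how
// many recursion levels launch a task; below it the code runs serially.
int fib_async(int n, int depth = 3) {
    assert(n >= 0);
    if (n < 2) return n;
    if (depth <= 0) return fib_serial(n);
    // Analogue of `x = cilk_spawn fib(n-1);`
    std::future<int> x =
        std::async(std::launch::async, fib_async, n - 1, depth - 1);
    // The parent continues with the other branch, as in `y = fib(n-2);`
    const int y = fib_async(n - 2, depth - 1);
    // x.get() waits for the spawned branch, playing the role of cilk_sync.
    return x.get() + y;
}
```

`fib_async(20)` returns the same value as `fib_serial(20)`. Unlike the Cilk++ version, removing the parallel constructs here means editing the function body, and the hand-tuned cutoff would need revisiting as core counts change.
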
Thank You!
∙Free e-Book
∙We are currently accepting applications for our Early Visibility program
∙For more info about Cilk++ and resources for multicoders: