
© 2008 CILK ARTS, Inc.1 Executive Briefing: Multicore- Enabling SaaS Applications September 3, 2008 Cilk++, Cilk, Cilkscreen, and Cilk Arts are trademarks of Cilk Arts, Inc.

© 2008 CILK ARTS, Inc.2 Agenda ∙Emergence of multicore processors ∙Key challenges facing developers ∙When can multicore help? ∙Data races: a new type of bug ∙Questions to ask when going multicore ∙Programming tools & techniques

© 2008 CILK ARTS, Inc.3 About CILK ARTS ∙Launched in March. ∙Headquartered in Burlington, MA. ∙Funded by Stata Venture Partners, software industry executives, founders, and grants from the NSF and DARPA. ∙First product is Cilk++, based on 15 years of research at MIT. Mission: To provide the easiest, quickest, and most reliable way to optimize application performance on multicore processors.

© 2008 CILK ARTS, Inc.4 Emergence of Multicore and Impact on SaaS

© 2008 CILK ARTS, Inc.5 Source: Herb Sutter, “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Dr. Dobb's Journal, 30(3), March 2005. Transistor count is still rising, but clock speed is bounded at ~5 GHz. [Chart: Intel CPU introductions plotted against Moore’s Law]

© 2008 CILK ARTS, Inc.6 Power Density Source: Patrick Gelsinger, Intel Developer’s Forum, Intel Corporation, 2004.

© 2008 CILK ARTS, Inc.7 Vendor Solution ∙To scale performance, put many processor cores on a chip. ∙Intel predicts 80+ cores by 2011! Intel 45nm quad-core processor

© 2008 CILK ARTS, Inc.8 SaaS Opportunity ∙Increase throughput  Quantitative finance: increase volume of portfolios analyzed overnight ∙Reduce response time  Engineering simulation: accelerate structural analysis of assembly ∙Improve user experience  Multiplayer games: increased galaxy size ∙Reduce data center power consumption

© 2008 CILK ARTS, Inc.9 Multicore and SaaS ∙Application response time? ∙Processor utilization? [Figure: eight processors P1–P8; a single-threaded job (Computer Operation 1, Computer Operation 2, User Work) occupies only one of them.]

© 2008 CILK ARTS, Inc.10 Multicore and SaaS ∙For CPU-constrained applications, multithreading improves response time and boosts utilization. [Figure: the same operations (Computer Operation 1, Computer Operation 2, User Work) spread across processors P1–P8.]

© 2008 CILK ARTS, Inc.11 Key Challenges Facing Developers

© 2008 CILK ARTS, Inc.12 Multicore Challenges Development Time ∙How will you get your product out in time? ∙Where will you find enough parallel-programming talent? ∙Will you be forced to redesign your application? Software Reliability ∙Can you debug your parallel application? ∙How will you test it effectively before release? Application Performance ∙How can you minimize response time? ∙Will your solution scale as the number of processor cores increases? ∙Can you identify performance bottlenecks?

© 2008 CILK ARTS, Inc.13 Can a Multicore CPU Help My App?

© 2008 CILK ARTS, Inc.14 Work & Span ∙Work: total amount of time spent in all the instructions ∙Span: Critical path ∙Parallelism: ratio of work to span

© 2008 CILK ARTS, Inc.15 Work & Span ∙Work: total amount of time spent in all the instructions ∙Span: Critical path ∙Parallelism: ratio of work to span ∙In this example:  Work = 18  Span = 9  Parallelism = 2  i.e., little gain beyond 2 processors

© 2008 CILK ARTS, Inc.16 Can Multicore Help? ∙The more parallelism is available in an application, the more a multicore processor can help. Work: T1 = 58 Span: T∞ = 9 (same as previous example) Parallelism: T1/T∞ = 6.44

© 2008 CILK ARTS, Inc.17 Data Races: A New Type of Bug in Multicore Programming

© 2008 CILK ARTS, Inc.18 Race Bugs Definition. A determinacy race occurs when two logically parallel instructions access the same memory location and at least one of the instructions performs a write.

Example:
    int x = 0;
    // in parallel:
    x++;        x++;
    assert(x == 2);

Each x++ really executes as three machine steps, so the two parallel increments can interleave:
    r1 = x;        r2 = x;
    r1++;          r2++;
    x = r1;        x = r2;

If both reads happen before either write, both threads write 1 and the assert fails.

© 2008 CILK ARTS, Inc.19 Coping with Race Bugs ∙Although locking can “solve” race bugs, lock contention can destroy all parallelism. ∙Making local copies of the nonlocal variables can remove contention, but at the cost of restructuring program logic. ∙Cilk++ provides hyperobjects to mitigate data races on nonlocal variables without the need for locks or code restructuring. IDEA: Different parallel branches may see different views of the hyperobject.

© 2008 CILK ARTS, Inc.20 20 Questions to Ask

© 2008 CILK ARTS, Inc.21 Development Time 1.To multicore-enable my application, how much logical restructuring of my application must I do? 2.Can I easily train programmers to use the multicore software platform? 3.Can I maintain just one code base, or must I maintain separate serial and parallel versions? 4.Can I avoid rewriting my application every time a new processor generation increases the core count? 5.Can I easily multicore-enable ill-structured and irregular code, or is the multicore software platform limited to data-parallel applications? 6.Does the multicore software platform properly support modern programming paradigms, such as objects, templates, and exceptions? 7.What does it take to handle global variables in my application?

© 2008 CILK ARTS, Inc.22 Application Performance 8.How can I tell if my application exhibits enough parallelism to exploit multiple processors? 9.Does the multicore software platform address response-time bottlenecks, or just offer more throughput? 10.Does application performance scale up linearly as cores are added, or does it quickly reach diminishing returns? 11.Is my multicore-enabled code just as fast as my original serial code when run on a single processor? 12.Does the multicore software platform's scheduler load-balance irregular applications efficiently to achieve full utilization? 13.Will my application "play nicely" with other jobs on the system, or do multiple jobs cause thrashing of resources? 14.What tools are available for detecting multicore performance bottlenecks?

© 2008 CILK ARTS, Inc.23 Software Reliability 15.How much harder is it to debug my multicore-enabled application than to debug my original application? 16.Can I use my standard, familiar debugging tools? 17.Are there effective debugging tools to identify and localize parallel-programming errors, such as data-race bugs? 18.Must I use a parallel debugger even if I make an ordinary serial programming error? 19.What changes must I make to my release- engineering processes to ensure that my delivered software is reliable? 20.Can I use my existing unit tests and regression tests?

© 2008 CILK ARTS, Inc.24 Programming Tools & Techniques

© 2008 CILK ARTS, Inc.25 Parallel C++ Options Pthreads & WinAPI threads ∙An API for creating and manipulating O/S threads. ∙Programmer writes thread-interaction protocols. Intel’s Threading Building Blocks ∙A C++ template library with automatic scheduling of tasks. ∙Programmer writes explicit “continuations.” OpenMP ∙Open-source language extensions to C++. ∙Programmer inserts pragmas into code. Cilk++ ∙Faithful extension of C++. ∙Programmer inserts keywords into code that do not destroy serial semantics. ∙Provably good scheduler and a race-detection tool.

© 2008 CILK ARTS, Inc.26 Cilk++: Smooth Path to Multicore for Legacy Applications

© 2008 CILK ARTS, Inc.27 Cilk++ Cilk++ provides a smooth evolution from serial programming to parallel programming. Cilk++ is a remarkably simple set of extensions for C++ and a powerful runtime system for multicore applications.

© 2008 CILK ARTS, Inc.28 CILK ARTS Solution Development Time ∙Minimal application changes ∙Can be learned in days by programmers without multithreading expertise ∙Seamless path forward (and backward) Software Reliability ∙Multithreaded version as reliable as the original ∙No fundamental change to release engineering Application Performance ∙Best-in-class performance ∙Linear scaling as cores are added ∙Minimal overhead on a single core

© 2008 CILK ARTS, Inc.29 CILK ARTS Solution

Serial code:
    int fib(int n) {
        if (n < 2) return n;
        else {
            int x, y;
            x = fib(n-1);
            y = fib(n-2);
            return x + y;
        }
    }

Cilk++ source (the only changes are the two keywords):
    int fib(int n) {
        if (n < 2) return n;
        else {
            int x, y;
            x = cilk_spawn fib(n-1);
            y = fib(n-2);
            cilk_sync;
            return x + y;
        }
    }

[Diagram: the serial code passes through a conventional compiler and conventional regression tests, yielding reliable single-threaded code; the Cilk++ source passes through the Cilk++ compiler, is linked with the Cilk++ runtime system and hyperobject library, is checked by the Cilk++ race detector against parallel regression tests, and yields a binary that is reliable multithreaded code with exceptional performance.]

© 2008 CILK ARTS, Inc.30 Thank You! ∙Free e-Book ∙We are currently accepting applications for our Early Visibility program ∙For more info about Cilk++ and resources for multicoders:  