Dr. Alexandra Fedorova School of Computing Science SFU


CMPT 886: Special Topics in Operating Systems and Computer Architecture

2 Meet the Instructor
- Ph.D. in Computer Science from Harvard, 2006
- Dissertation on operating system design for multicore processors
- Concurrently with Ph.D., an intern at Sun Labs (3 years)
- 9 US patent applications
- First semester at SFU: Spring 2007
- Industrial partnership with Sun Microsystems

3 Course Topic
- Multicore processors
  - New type of computer architecture
  - Dominates new processor market
  - Desktops, servers, mobile devices, etc.
  - Almost all chips will be multicore soon
- Many research problems to solve
  - How to design software for these chips?
  - How to design the chips themselves?
  - How to structure hardware/software interaction?

4 Today
- Introduction to multicore processors
- Examples of research problems
- Overview of the course

5 Conventional vs. Multicore
- Conventional processor
  - Single core
  - Dedicated caches
  - One thread at a time
- Multicore processors
  - At least two cores
  - Shared caches
  - Many threads simultaneously
[Figure: cache hierarchy of a conventional processor and of a multicore processor (labels: L1 cache, L2 cache)]

6 The Multicore Revolution
- Most new processors are multicore
- Most processors shipped are multicore:
  - 2006: 75% for desktops, 85% for servers
  - 2007: 90% for desktop and mobile, 100% for servers
- Everyone’s doing it
  - Sun Microsystems: Rock, Niagara 1, Niagara 2
  - IBM: Power4, Power5, Power6, Cell
  - AMD: Quad Core (Barcelona)
  - Embedded: ARM

7 Why Multicore?
- Power consumption is a huge problem
- Multicore chips potentially produce a lot more computation per unit of power

8 Superior Performance/Watt
- Example:
  - Reduce CPU clock frequency by 20%
  - Power consumption drops by about 50%!
  - Put two cores running at 0.8x frequency on the same chip
  - Get 1.6 times the computation at the same power consumption
[Figure: Core 0 and Core 1, each at 0.8x frequency and 0.5x power, each with an L1 cache, sharing an L2 cache]
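The 20% / 50% figures in this example are consistent with a common first-order model of dynamic power, P ~ C * V^2 * f. The sketch below assumes (this is not stated on the slide) that supply voltage is scaled in proportion to clock frequency, so that power grows roughly with the cube of frequency. The function names are illustrative only.

```c
#include <assert.h>

/* Relative dynamic power of one core whose clock is scaled by freq_scale.
 * ASSUMPTION (not from the slides): P ~ C * V^2 * f with supply voltage
 * scaled in proportion to frequency, so power grows with the cube of f. */
double relative_power(double freq_scale)
{
    assert(freq_scale > 0.0);
    return freq_scale * freq_scale * freq_scale;
}

/* Relative throughput of n_cores identical cores, assuming perfectly
 * parallel work whose speed scales linearly with clock frequency. */
double relative_throughput(int n_cores, double freq_scale)
{
    assert(n_cores > 0);
    return n_cores * freq_scale;
}
```

Under these assumptions, relative_power(0.8) is 0.512, close to the 0.5x on the slide, and two such cores give relative_throughput(2, 0.8) = 1.6 for a total power of about 1.02x. The same model gives 1.2^3 = 1.728 for a 20% clock increase, which is in line with a ≈75% power increase.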

9 Why Multicore?
- Increasing processor clock speed (GHz) is inefficient
  - Increase clock speed by 20%
  - Power increases by ≈75%
  - How much does performance increase?

10 Multicore vs. Unicore
- Multicore:
  - 1.6x throughput increase
  - No power consumption increase
- Single-core:
  - 1.2x throughput increase
  - 1.75x power increase
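To compare the two options directly, the throughput and power figures above can be reduced to throughput per unit of power. This is a small sketch using only the numbers on the slide; the function names are made up for illustration, and the baseline is a single core at 1.0 throughput and 1.0 power.

```c
#include <assert.h>

/* Throughput delivered per unit of power, both relative to a baseline
 * single core (baseline = 1.0 throughput at 1.0 power). */
double perf_per_watt(double rel_throughput, double rel_power)
{
    assert(rel_power > 0.0);
    return rel_throughput / rel_power;
}

/* Figures taken from the slide: two slower cores vs. one faster core. */
double multicore_efficiency(void)  { return perf_per_watt(1.6, 1.0);  }
double singlecore_efficiency(void) { return perf_per_watt(1.2, 1.75); }
```

With these inputs the multicore option delivers 1.6 units of throughput per unit of power, while the faster single core delivers about 0.69, a difference of roughly 2.3x.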

11 Transistors are used for parallelism: multicore processors
- Transistor density still rising
- Clock speed isn’t
Source: Sutter, "The Free Lunch Is Over"

12 Multicore Potential
- Multicores offer potential to compute more efficiently
- Applications and systems are not ready to realize that potential
- What needs to be done?
  - A fundamental shift to parallel programming
  - New ways to manage resources in the operating system

13 What’s Important to Remember?
- Massive parallelism
  - Good or bad?
  - Good: We can use processor more efficiently
  - Bad: We don’t know how to make the most out of it.
- Shared resources
  - Execution: functional units, queues, register files
  - Memory: L1 cache, L2 cache, interconnects
  - Good or bad?
  - Good: More efficient resource utilization (the reason for multicore)
  - Bad: Contention for resources
[Figure: Core 0 and Core 1, each with an L1 cache, sharing an L2 cache]

14 Problems Addressed in Research
- How to manage resource allocation?
  - Operating system solutions
  - Architectural (hardware) solutions
- How to take advantage of parallelism?
  - Make concurrent programming easier (languages, performance tools, etc.)
  - Make concurrent programming automatic (automatic parallelization)

15 Managing Resource Allocation
- New OS structures
- Extensions to hardware architecture
- Analytical performance modeling
- New ways to write applications: can the application tell the OS how it uses resources?
- New algorithms (attention, theoreticians and AI researchers!)

16 Operating Systems for Multicore Processors
- A is a database application (needs lots of L1 cache)
- B is a web server (needs lots of L1 cache)
- C is a cryptographic thread (needs little L1 cache)
- Threads running concurrently compete for resources
- Degree of contention depends on what the threads are doing
[Figure: threads A, B, and C; Core 0 and Core 1, each with an L1 cache, sharing an L2 cache]

17 Challenges
- How to find out threads’ resource requirements?
- How to find out if threads will compete?
- How to find out the effect of contention on performance?
- What is the best way to schedule threads?
[Figure: threads A, B, and C; Core 0 and Core 1, each with an L1 cache, sharing an L2 cache]
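One way to make the scheduling question concrete is a toy heuristic: given an estimate of each thread's cache demand, co-schedule the thread that oversubscribes the shared cache the least. The sketch below is purely illustrative and is not a method from the course. The demand numbers, the capacity, and the names overcommit and pick_co_runner are all hypothetical, and a real scheduler would first have to solve the harder problem of obtaining the demand estimates.

```c
#include <assert.h>
#include <stddef.h>

/* Amount by which two co-running threads oversubscribe a shared cache.
 * Demands and capacity use the same (hypothetical) unit, e.g. KB. */
static unsigned overcommit(unsigned demand_a, unsigned demand_b, unsigned capacity)
{
    unsigned total = demand_a + demand_b;
    return total > capacity ? total - capacity : 0;
}

/* Pick the index of the thread that contends least with the thread that is
 * already running (index `running`). Returns `running` if there is no
 * other thread to choose from. */
size_t pick_co_runner(const unsigned *demand, size_t n, size_t running, unsigned capacity)
{
    size_t best = running;
    unsigned best_cost = 0;

    assert(demand != NULL && running < n);
    for (size_t i = 0; i < n; i++) {
        if (i == running)
            continue;
        unsigned cost = overcommit(demand[running], demand[i], capacity);
        if (best == running || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

For example, with made-up demands of 48, 40, and 4 units for A, B, and C and a 64-unit cache, the heuristic pairs either A or B with C rather than pairing A with B.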

18 Problems Addressed in Research
- How to manage resource allocation?
  - Operating system solutions
  - Architectural (hardware) solutions
- How to take advantage of parallelism?
  - Make concurrent programming easier (languages, performance tools, etc.)
  - Make concurrent programming automatic (automatic parallelization)

19 Support for Concurrent Programming
- Writing parallel code is difficult
  - Most people think serially
  - Deciding how to divide the work between threads is not always trivial
- Parallel entities need to synchronize or communicate
- A new paradigm for synchronization

20 Synchronization Hurts Performance
- If the lock is not available, threads wait
- Execution becomes serialized
[Figure label: shared data]
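The effect described above can be seen in a minimal POSIX threads sketch: several threads update one shared counter, and the mutex lets only one of them into the critical section at a time. The thread count, iteration count, and function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define N_THREADS 4
#define N_ITERS   100000

static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < N_ITERS; i++) {
        pthread_mutex_lock(&counter_lock);   /* waits here if another thread holds the lock */
        shared_counter++;                    /* critical section: one thread at a time */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs N_THREADS workers to completion and returns the final count. */
long run_workers(void)
{
    pthread_t threads[N_THREADS];

    shared_counter = 0;
    for (int i = 0; i < N_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Built with a pthread-enabled toolchain (for example with the -pthread compiler flag), run_workers() returns N_THREADS * N_ITERS. The result is correct, but because every increment takes the same lock, the threads spend much of their time waiting on each other instead of running in parallel.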

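21 Coarse vs. Fine Synchronization

    /* The slide overlays both locking schemes on one function;
     * real code would use one or the other. */
    int update_shared_counters(int *counters, int n_counters)
    {
        int i;

        coarse_lock_acquire(counters_lock);        /* coarse: one lock for the whole loop */
        for (i = 0; i < n_counters; i++) {
            fine_lock_acquire(counter_locks[i]);   /* fine: one lock per counter */
            counters[i]++;
            fine_lock_release(counter_locks[i]);
        }
        coarse_lock_release(counters_lock);
        return 0;                                  /* return value not shown on the slide */
    }

- Coarse locks are easy to program
  - But perform poorly
- Fine locks perform well
  - But are difficult to program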

22 Transactional Memory To the Rescue!
- Can we have the best of both worlds?
  - Good performance
  - Ease of programming
- The answer is: Transactional Memory (TM)

23 Transactional Memory (TM)
- Programming model:
  - Extension to the language
  - Runtime and/or hardware support
- Lets you do synchronization without locks
  - Performance of fine-grained locks
  - Ease of programming of coarse-grained locks

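24 Transactional Memory vs. Locks

    int update_shared_counters(int *counters, int n_counters)
    {
        int i;

        ATOMIC_BEGIN();                                 /* start of transactional section */
        /* coarse_lock_acquire(counters_lock);          lock calls of the locking version */
        for (i = 0; i < n_counters; i++) {
            /* fine_lock_acquire(counter_locks[i]);     are replaced by the transaction */
            counters[i]++;
            /* fine_lock_release(counter_locks[i]); */
        }
        /* coarse_lock_release(counters_lock); */
        ATOMIC_END();                                   /* end of transactional section */
        return 0;                                       /* return value not shown on the slide */
    }

- Transactional section
  - Looks like a coarse-grained lock
  - Acts like a fine-grained lock
  - Performance degrades only if there is a conflict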

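25 The Backend of TM
[Figure: concurrent transactions issuing reads and writes to locations A through E (read A, write B, read B, write A, write D, read C, write C, read E, write E, read D); conflicting transactions abort and restart]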

26 State of TM
- Still evolving
  - More work needed to make it usable and well performing
- It is very real
  - Sun’s new Rock processor has TM support
  - Intel is very active

27 Summary
- Multicore systems
  - They are everywhere: servers, desktops, small devices
  - Must understand them
- Plenty of research on multicore systems
  - System software (OS, compilers, runtimes)
  - Architecture
  - Analytical modeling
  - Applications

29 Class Structure
- Learn about multicore research
  - Read and critique papers
  - Paper summaries, presentations
- Learn how to do multicore research
  - Discuss papers, think about new ideas
  - Analyze papers
  - Learn how to use research tools (2 homeworks)
- Do multicore research
  - A research project

30 Research Project
- A unique experience: getting a project done from start to end
- Goal: generate a publication
  - Last year: two publications out of four projects
- Gives you confidence as a grad student
- Improves your resume
- Challenging!
- You will learn a lot!

31 Your Expectations
- Expect to work hard
  - But you’ll be glad you did this later
- Papers will be difficult to read at first (3-5 hours/paper)
  - Will get easier later
- Reward: You will be comfortable leading your own research in this area

32 Final Project
- You can create your own topic
- Or choose from a list of existing topics
  - Some projects are very well specified (like an undergraduate course project)
  - Others are more open-ended (hint: an opportunity to be creative)
- We have the systems and tools you’ll need for the project

33 Final Project (cont.)
- Submit a project proposal in early February
- Complete the project by early April
- You have only two months
  - Have to work hard!
  - Expect to dedicate ≈15-20 hrs/week

34 Will I Succeed in this Course?
- You have to work independently!
  - Take full responsibility for your project
  - I will help, but I cannot do it for you
  - I do not have all the answers
- You will succeed if you are prepared to work hard
  - What you can or cannot do now does not matter
  - The course is designed to train you

35 Course Web Site
- Syllabus
- Wiki
- Multicore portal
- Technical documentation

