1
Cloud Computing Lecture #1: Parallel and Distributed Computing. Jimmy Lin, The iSchool, University of Maryland. Monday, January 28, 2008. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details. Material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License).
2
iSchool Today’s Topics: course overview; introduction to parallel and distributed processing.
3
What’s the course about? Integration of research and teaching: team leaders get help on a tough problem; team members gain valuable experience. Criteria for success, at the end of the course: each team will have a publishable result; each team will have a paper suitable for submission to an appropriate conference or journal. Along the way: build a community of hadoopers at Maryland; generate lots of publicity; have lots of fun!
4
Hadoop Zen. Don’t get frustrated (take a deep breath)… This is bleeding-edge technology. Be patient… This is the first time I’ve taught this course. Be flexible… There is lots of uncertainty along the way. Be constructive… Tell me how I can make everyone’s experience better.
5
Things to go over: course schedule; course objectives; assignments and deliverables; evaluation.
6
My Role: to hack alongside everyone; to substantively contribute ideas where appropriate; to serve as a facilitator and a resource; to make sure everything runs smoothly.
7
Outline: Web-Scale Problems; Parallel vs. Distributed Computing; Flynn's Taxonomy; Programming Patterns.
8
Web-Scale Problems. Characteristics: lots of data; lots of crunching (each computation not necessarily complex in itself). Examples: obviously, problems involving the Web; empirical and data-driven research (e.g., in HLT); the “post-genomics era” in life sciences; high-quality animation; the serious hobbyist.
9
It all boils down to: divide-and-conquer; throwing more hardware at the problem. Simple to understand… a lifetime to master…
10
Parallel vs. Distributed. Parallel computing generally means: vector processing of data; multiple CPUs in a single computer. Distributed computing generally means: multiple CPUs across many computers.
11
Flynn’s Taxonomy (instruction streams vs. data streams)
                      Single instruction (SI)           Multiple instruction (MI)
Single data (SD)      SISD: single-threaded process     MISD: pipeline architecture
Multiple data (MD)    SIMD: vector processing           MIMD: multi-threaded programming
12
SISD [Diagram: one processor, fed by one instruction stream, working through a single stream of data items (D)]
13
SIMD [Diagram: one processor, fed by one instruction stream, working on several data streams at once, each a sequence D0, D1, D2, D3, D4, …, Dn]
14
MIMD [Diagram: two processors, each with its own instruction stream and its own stream of data items (D)]
15
Parallel vs. Distributed. Parallel: multiple CPUs within a shared-memory machine. Distributed: multiple machines, each with its own memory, connected over a network. [Diagram labels: shared memory; network connection for data transfer; two processors, each with its own instruction and data streams]
16
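Divide and Conquer [Diagram: the “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”, which produces a partial result r1, r2, r3; the partial results are combined into the “Result”]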
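The partition, work, and combine steps of the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`partition`, `worker`, `combine`, `divide_and_conquer`) and the task itself (summing squares) are hypothetical and not from the lecture, and threads stand in for workers. In standard CPython, threads will not speed up CPU-bound work like this, so the sketch shows the structure of the pattern, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_units):
    """Split `work` into at most n_units contiguous chunks (the w1, w2, w3 of the diagram)."""
    size = max(1, -(-len(work) // n_units))  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(unit):
    """Process one work unit and return a partial result (r1, r2, r3)."""
    # Hypothetical task chosen for illustration: sum of squares.
    return sum(x * x for x in unit)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    units = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map hands one unit to each worker call and keeps the input order.
        partials = list(pool.map(worker, units))
    return combine(partials)


if __name__ == "__main__":
    print(divide_and_conquer(list(range(10))))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would move the workers into separate processes; moving them onto separate machines is where the rest of the lecture's problems begin.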
17
Different Workers: different threads in the same core; different cores in the same CPU; different CPUs in a multi-processor system; different machines in a distributed system.
18
Parallelization Problems. How do we assign work units to workers? What if we have more work units than workers? What if workers need to share partial results? How do we aggregate partial results? How do we know all the workers have finished? What if workers die? What is the common theme of all of these problems?
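A few of these questions can be made concrete with a small sketch. It assumes a thread pool as the set of workers, and it models a worker "dying" as the task raising an exception. The helper `run_all` and its parameters are hypothetical names introduced here for illustration. The pool queues work units when there are more units than workers, the loop tracks which units are still pending, and a failed unit is resubmitted up to `max_attempts` times.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_all(units, task, n_workers=2, max_attempts=3):
    """Run task(unit) for every unit; resubmit units whose worker raised."""
    results = {}
    attempts = {i: 0 for i in range(len(units))}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}  # maps each in-flight future to the index of its unit
        for i, unit in enumerate(units):
            attempts[i] += 1
            pending[pool.submit(task, unit)] = i
        # We know all workers have finished when nothing is pending.
        while pending:
            for future in as_completed(list(pending)):
                i = pending.pop(future)
                try:
                    results[i] = future.result()
                except Exception:
                    if attempts[i] >= max_attempts:
                        raise  # give up on this unit
                    attempts[i] += 1
                    pending[pool.submit(task, units[i])] = i
    # Aggregate partial results in the original unit order.
    return [results[i] for i in range(len(units))]
```

Even in this toy form, the bookkeeping (who has which unit, who has reported, who needs a retry) is all coordination between workers, which is the theme the next slide names.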
19
General Theme? Parallelization problems arise from: communication between workers; access to shared resources (e.g., data). Thus, we need a synchronization system! This is tricky: finding bugs is hard; fixing them is even harder.
20
Multi-Threaded Programming. Difficult because: we don’t know the order in which threads run; we don’t know when threads interrupt each other. Thus, we need: semaphores (lock, unlock); condition variables (wait, notify, broadcast); barriers. Still, lots of problems: deadlock, livelock; race conditions... Moral of the story: be careful!
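As a minimal sketch of two of these primitives, the following uses Python's `threading.Lock` and `threading.Barrier`; the function name `count_with_lock` and the shared-counter task are assumptions made for illustration. Each thread waits at the barrier so that all threads start together, then increments a shared counter under the lock. Without the lock, the read-modify-write in `counter += 1` may interleave between threads and lose updates, which is a race condition.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10_000):
    """Increment a shared counter from several threads, safely."""
    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def work():
        nonlocal counter
        barrier.wait()  # block until all n_threads have reached this point
        for _ in range(n_increments):
            with lock:  # acquire (lock) ... release (unlock) around the update
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock())
```

The lock makes the result deterministic: the return value is always `n_threads * n_increments`, whatever order the threads happen to run in.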
21
Patterns for Parallelism. Several programming methodologies exist to build parallelism into programs. Here are some…
22
Master/Workers: the master initially owns all data; the master creates workers and assigns tasks; the master waits for workers to report back. [Diagram: master and workers]
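The three steps on this slide map onto a short sketch. It assumes threads as workers, a round-robin assignment of the data, and a hypothetical task (counting characters); the name `master` and its parameters are illustrative only. Each worker writes to its own report slot, so no lock is needed here.

```python
import threading


def master(data, n_workers=3):
    """The master owns `data`, hands each worker a slice, and waits for reports."""
    reports = [None] * n_workers  # one report slot per worker

    def worker(worker_id, assigned):
        # Hypothetical task: count the characters in the assigned strings.
        reports[worker_id] = sum(len(s) for s in assigned)

    workers = []
    for worker_id in range(n_workers):
        assigned = data[worker_id::n_workers]  # round-robin assignment
        t = threading.Thread(target=worker, args=(worker_id, assigned))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()  # the master waits for every worker to report back
    return sum(reports)


if __name__ == "__main__":
    print(master(["ab", "cde", "f", "ghij"]))
```

The assignment is fixed up front, so a slow worker cannot hand its leftover tasks to an idle one.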
23
Producer/Consumer Flow: producers create work items; consumers process them; stages can be daisy-chained. [Diagram: producers (P) feeding consumers (C)]
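A daisy chain can be sketched with one thread per stage and a `queue.Queue` between neighbouring stages. The stage names, the squaring step, and the `DONE` sentinel are assumptions introduced for illustration. The middle stage is a consumer of the first queue and a producer for the second.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out_q, n):
    """Create n work items, then signal that no more will come."""
    for i in range(n):
        out_q.put(i)
    out_q.put(DONE)


def square_stage(in_q, out_q):
    """Consume from one queue and produce for the next (the daisy chain)."""
    while True:
        item = in_q.get()
        if item is DONE:
            out_q.put(DONE)  # pass the end-of-stream marker downstream
            break
        out_q.put(item * item)


def final_consumer(in_q, results):
    """Consume items until the end-of-stream marker arrives."""
    while True:
        item = in_q.get()
        if item is DONE:
            break
        results.append(item)


def run_pipeline(n):
    q1, q2, results = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=producer, args=(q1, n)),
        threading.Thread(target=square_stage, args=(q1, q2)),
        threading.Thread(target=final_consumer, args=(q2, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(5))
```

With a single thread per stage and FIFO queues, items leave the pipeline in the order they entered it.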
24
Work Queues: any available consumer should be able to process data from any producer; work queues break the 1:1 relationship between producers and consumers. [Diagram: producers (P) and consumers (C) connected through a shared queue]
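A shared queue between several producers and several consumers might look like the sketch below. The function name `run_work_queue`, its parameters, and the use of `None` as a shutdown sentinel are assumptions for illustration. Any consumer may pick up an item from any producer, and `queue.Queue` handles the locking around the queue itself.

```python
import queue
import threading


def run_work_queue(n_producers=3, n_consumers=2, items_per_producer=4):
    shared = queue.Queue()  # the single queue all producers and consumers use
    processed = []
    processed_lock = threading.Lock()  # protects the shared results list

    def producer(pid):
        for k in range(items_per_producer):
            shared.put((pid, k))

    def consumer():
        while True:
            item = shared.get()
            if item is None:  # sentinel: this consumer should stop
                break
            with processed_lock:
                processed.append(item)

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    producers = [
        threading.Thread(target=producer, args=(p,)) for p in range(n_producers)
    ]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()  # all work items are now in the queue
    for _ in consumers:
        shared.put(None)  # one sentinel per consumer, queued after all work
    for t in consumers:
        t.join()
    return sorted(processed)


if __name__ == "__main__":
    print(run_work_queue())
```

Which consumer handles which item varies from run to run; only the sorted set of processed items is deterministic.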
25
And finally… The above solutions represent general patterns. In reality: lots of one-off solutions and custom code; the burden is on the programmer to manage everything. Can we push the complexity onto the system? MapReduce… for next time.