CS10 The Beauty and Joy of Computing
Lecture #19: Distributed Computing (2010-11-08)
UC Berkeley EECS Lecturer SOE Dan Garcia

Researchers at Indiana U used data mining techniques to uncover evidence that "political campaigns and special interest groups are giving the impression of a broad grass-roots movement"…

Lecture Overview
- Basics
  - Memory
  - Network
- Distributed Computing
  - Themes
  - Challenges
- Solution! MapReduce
  - How it works
  - Our implementation

Memory Hierarchy
[Figure: the memory hierarchy pyramid. Level 1 sits closest to the processor; Levels 2, 3, … n sit at increasing distance from the processor, with the size of memory growing at each lower level.]

Memory Hierarchy Details
- If a level is closer to the processor, it is:
  - Smaller
  - Faster
  - More expensive
  - A subset of the lower levels; it contains the most recently used data
- The lowest level (usually disk) contains all available data (does it go beyond the disk?)
- The memory hierarchy abstraction presents the processor with the illusion of a very large & fast memory (quantified in the sketch below)
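That illusion can be quantified with the standard average memory access time (AMAT) formula: AMAT = hit time + miss rate × miss penalty. A minimal sketch; the latencies and hit rate below are illustrative assumptions, not figures from the lecture:

```python
# Average Memory Access Time (AMAT): a standard model of why a small, fast
# cache in front of a big, slow memory looks "large AND fast" on average.
# The latencies and hit rate below are illustrative assumptions, not measurements.

def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# A cache that hits 95% of the time at 1 ns, backed by 100 ns main memory:
print(amat(hit_time_ns=1, miss_rate=0.05, miss_penalty_ns=100))  # 6.0 ns

# The processor sees ~6 ns on average: much closer to the cache's 1 ns
# than to main memory's 100 ns, even though most data lives in the slow level.
```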

Networking Basics
- The source encodes and the destination decodes the content of the message (toy example below)
- Switches and routers use the destination address to deliver the message, dynamically
[Figure: a source and a destination host, each with a network interface device, connected through the Internet.]
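As a toy illustration of encode/decode between a source and a destination, here is a minimal sketch using Python's socket module on a single machine; a connected socket pair stands in for the network, so no real switches or routers are involved:

```python
import socket

# A connected pair of sockets stands in for the network; on a real network,
# switches and routers would route the message by its destination address.
source, destination = socket.socketpair()

message = "Hello, distributed world!"
source.sendall(message.encode("utf-8"))        # the source encodes the content
source.close()

received = destination.recv(1024).decode("utf-8")  # the destination decodes it
print(received)
destination.close()
```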

Networking Facts and Benefits
- Networks connect computers, sub-networks, and other networks
- Networks connect computers all over the world (and in space!)
- Computer networks...
  - support asynchronous and distributed communication
  - enable new forms of collaboration

Performance Needed for Big Problems
- Performance terminology
  - the FLOP: FLoating point OPeration
  - "flops" = # FLOP/second is the standard metric for computing power
- Example: Global Climate Modeling
  - Divide the world into a grid (e.g. 10 km spacing)
  - Solve fluid dynamics equations for each point & minute; requires about 100 FLOPs per grid point per minute
  - Weather prediction (7 days in 24 hours): 56 Gflops
  - Climate prediction (50 years in 30 days): 4.8 Tflops
- Perspective
  - Intel Core i7 980 XE desktop processor: ~100 Gflops
  - Climate prediction would take ~5 years (checked in the sketch below)
en.wikipedia.org/wiki/FLOPS
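That last figure is simple arithmetic: total work = required rate × simulation window, then divide by what one desktop sustains. A quick back-of-the-envelope check in Python, using the slide's own numbers:

```python
SECONDS_PER_DAY = 86_400

# Climate prediction: sustain ~4.8 Tflops for 30 days of wall-clock time.
required_rate = 4.8e12                      # FLOP/second
window = 30 * SECONDS_PER_DAY               # seconds available
total_work = required_rate * window         # ~1.24e19 FLOPs in all

# One Intel Core i7 980 XE sustains ~100 Gflops.
desktop_rate = 100e9                        # FLOP/second
seconds_needed = total_work / desktop_rate
years_needed = seconds_needed / (365 * SECONDS_PER_DAY)
print(f"~{years_needed:.1f} years")         # ~3.9 years: the slide's "~5 years" ballpark
```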

What Can We Do? Use Many CPUs!
- Supercomputing, like the systems listed in top500.org
  - Multiple processors "all in one box / room" from one vendor, often communicating through shared memory
  - This is often where you find exotic architectures
- Distributed computing
  - Many separate computers (each with an independent CPU, RAM, HD, NIC) that communicate through a network
    - Grids (heterogeneous computers across the Internet)
    - Clusters (mostly homogeneous computers all in one room)
      - Google uses commodity computers to exploit the "knee in curve" price/performance sweet spot
  - It's about being able to solve "big" problems, not "small" problems faster
    - These problems can be data-intensive (mostly) or CPU-intensive

Distributed Computing Themes
- Let's network many disparate machines into one compute cluster
  - These could all be the same (easier) or very different machines (harder)
- Common themes (sketched below)
  - A "dispatcher" gives out jobs & collects results
  - "Workers" (get, process, return) until done
- Examples
  - BOINC, render farms
  - Google clusters running MapReduce
en.wikipedia.org/wiki/Distributed_computing
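A minimal single-machine sketch of the dispatcher/worker pattern, using Python's multiprocessing pool in place of a real network of machines; squaring a number stands in for real work:

```python
from multiprocessing import Pool

def work(job: int) -> int:
    """Worker: get a job, process it, return the result."""
    return job * job          # stand-in for real computation

if __name__ == "__main__":
    jobs = list(range(10))                  # the dispatcher's queue of jobs
    with Pool(processes=4) as pool:         # four workers
        results = pool.map(work, jobs)      # hand out jobs, collect results in order
    print(results)                          # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```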

Peer Instruction
1. Writing & managing a distributed program is relatively straightforward; just hand out & gather data
2. The majority of the world's computing power lives in supercomputer centers

a) FF
b) FT
c) TF
d) TT

Peer Instruction Answer
1. FALSE. The heterogeneity of the machines, handling machines that fail, machines that falsify data.
2. FALSE. Have you considered how many PCs + game devices exist? Not even close.

Answer: a) FF

Distributed Computing Challenges
- Communication is the fundamental difficulty
  - Distributing data, updating a shared resource, communicating results, handling failures
  - Machines have separate memories, so they need a network, which introduces inefficiencies: overhead, waiting, etc.
- Need to parallelize algorithms and data structures
  - Must look at problems from a parallel standpoint
  - Best for problems whose compute times >> overhead (modeled in the sketch below)
en.wikipedia.org/wiki/Embarrassingly_parallel
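One simple way to see the "compute times >> overhead" rule: model parallel time as compute/n + overhead, so speedup = compute / (compute/n + overhead). A sketch with illustrative numbers; the 5-second overhead and 100-worker count are assumptions for the example:

```python
def speedup(compute_s: float, overhead_s: float, n_workers: int) -> float:
    """Speedup = compute / (compute/n + overhead)."""
    return compute_s / (compute_s / n_workers + overhead_s)

for compute in (10, 10_000):                          # total sequential compute, seconds
    s = speedup(compute, overhead_s=5, n_workers=100)
    print(f"compute={compute:>6}s, overhead=5s: ~{s:.0f}x speedup")
# compute=    10s: ~2x  (overhead dominates; not worth distributing)
# compute= 10000s: ~95x (compute >> overhead; near-linear speedup)
```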

Google's MapReduce Simplified
- We told you "the beauty of pure functional programming is that it's easily parallelizable"
  - Do you see how you could parallelize this?
  - The reducer should be associative and commutative
- Imagine 10,000 machines ready to help you compute anything you could cast as a MapReduce problem! (see the word-count sketch below)
  - This is the abstraction Google is famous for authoring
  - It hides LOTS of the difficulty of writing parallel code!
  - The system takes care of load balancing, dead machines, etc.
[Figure: MapReduce dataflow from Input, through map and reduce, to Output. Note: only two data types!]
en.wikipedia.org/wiki/MapReduce
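A minimal single-machine sketch of the programming model, counting words: map emits (key, value) pairs, a shuffle groups values by key, and an associative, commutative reducer combines each group. This mimics the abstraction only; the real system runs these phases across thousands of machines:

```python
from collections import defaultdict
from functools import reduce

def mapper(document: str):
    """Map: emit a (key, value) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def reducer(a: int, b: int) -> int:
    """Reduce: addition is associative & commutative, so pairs can combine in any order."""
    return a + b

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map phase: each document could be processed on a different machine.
pairs = [pair for doc in documents for pair in mapper(doc)]

# Shuffle: group all values by key.
groups = defaultdict(list)
for key, value in pairs:
    groups[key].append(value)

# Reduce phase: each key's group could be reduced on a different machine.
counts = {key: reduce(reducer, values) for key, values in groups.items()}
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```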

MapReduce Advantages/Disadvantages
- Now it's easy to program for many CPUs
  - Communication management is effectively gone
  - Fault tolerance and monitoring are built in: machine failures, suddenly-slow machines, etc. are handled
  - Can be much easier to design and program!
  - Can cascade several (many?) MapReduce tasks
- But... it further restricts the problems you can solve
  - It might be hard to express your problem in MapReduce
  - Data parallelism is key: you need to be able to break the problem up by data chunks
  - Google's full MapReduce is closed-source C++; Hadoop is an open-source Java-based rewrite

What contributes to overhead the most?
a) Dividing the problem up
b) Shipping it out / getting it back
c) Verifying the result
d) Putting results together
e) Depends on the problem

Summary
- Systems and networks enable and foster computational problem solving
- MapReduce is a great distributed computing abstraction
  - It removes the onus of worrying about load balancing, failed machines, and data distribution from the programmer of the problem
  - (and puts it on the authors of the MapReduce framework)