What is cloud computing? Why is this different? Jimmy Lin, The iSchool, University of Maryland. Monday, March 30, 2009.

Presentation transcript:

What is cloud computing? Why is this different? Jimmy Lin, The iSchool, University of Maryland. Monday, March 30, 2009. This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States license. See for details. Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under Creative Commons Attribution 3.0 License).


What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications

1. “Web-Scale” Problems. Characteristics: definitely data-intensive; may also be processing-intensive. Examples: crawling, indexing, searching, and mining the Web; data warehouses; sensor networks; “post-genomics” life sciences research; other scientific data (physics, astronomy, etc.); Web 2.0 applications; …

How much data? Internet Archive has 2 PB of data + 20 TB/month. Google processes 20 PB a day (2008). “All words ever spoken by human beings” ~ 5 EB. CERN’s LHC will generate […] PB a year. Sanger anticipates 6 PB of data in […]. “640K ought to be enough for anybody.”
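
To get a feel for the “20 PB a day” figure, the arithmetic below converts it to a sustained rate. This is only a back-of-the-envelope sketch: it assumes decimal petabytes, and the per-disk read rate (`ASSUMED_DISK_RATE`) is a hypothetical round number, not something taken from the slides.

```python
import math

PB = 10**15                      # assumption: decimal petabyte
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_rate_bytes_per_sec(bytes_per_day):
    """Average rate needed to get through a daily volume of data."""
    return bytes_per_day / SECONDS_PER_DAY

def disks_needed(rate_bytes_per_sec, disk_bytes_per_sec):
    """How many disks must stream in parallel to sustain the rate."""
    return math.ceil(rate_bytes_per_sec / disk_bytes_per_sec)

# 20 PB a day is the figure quoted on the slide.
rate = sustained_rate_bytes_per_sec(20 * PB)

# Hypothetical sequential read rate of a single disk: 100 MB/s.
ASSUMED_DISK_RATE = 100 * 10**6

print(f"{rate / 10**9:.1f} GB/s sustained")
print(disks_needed(rate, ASSUMED_DISK_RATE), "disks reading in parallel")
```

Under these assumptions no single machine comes close, which is one way to read the later “throw more machines at it” slide.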

Maximilien Brice, © CERN

There’s nothing like more data! s/inspiration/data/g; (Banko and Brill, ACL 2001) (Brants et al., EMNLP 2007)

What to do with more data? Answering factoid questions: pattern matching on the Web works amazingly well. Example: “Who shot Abraham Lincoln?” → “X shot Abraham Lincoln”. Learning relations: start with seed instances, search for patterns on the Web, then use the patterns to find more instances. Seed instances: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879). Text found on the Web: “Wolfgang Amadeus Mozart ( )”, “Einstein was born in 1879”. Patterns: “PERSON (DATE –”, “PERSON was born in DATE”. (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; …)
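
The “PERSON was born in DATE” pattern from this slide can be sketched as a regular expression. This is a toy illustration, not the method of the cited papers: the regex, the `extract_birth_years` helper, and the sample text are all assumptions made up for the example, and a capitalized-word heuristic stands in for real named-entity recognition.

```python
import re

# Hypothetical surface pattern: "PERSON was born in DATE".
# PERSON is approximated as one or more capitalized words, DATE as a 4-digit year.
BORN_IN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<date>\d{4})"
)

def extract_birth_years(text):
    """Return (person, year) pairs for every match of the pattern in text."""
    return [(m.group("person"), int(m.group("date")))
            for m in BORN_IN.finditer(text)]

# Made-up snippet standing in for text found on the Web.
snippet = ("Einstein was born in 1879. "
           "Some say Wolfgang Amadeus Mozart was born in 1756.")

for person, year in extract_birth_years(snippet):
    print(f"Birthday-of({person}, {year})")
```

With more text, a simple pattern like this has more chances to match some phrasing of the fact, which is the point of the “more data” argument.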

How do I make money? Petabytes of valuable customer data… sitting idle in existing data warehouses, overflowing out of existing data warehouses, or simply being thrown away. Sources of data: OLTP, user behavior logs, call-center logs, Web crawls, public datasets, … Structured data (today) vs. unstructured data (tomorrow). How can an organization derive value from all this data?

2. Large Data Centers. Web-scale problems? Throw more machines at it! Centralization of resources in large data centers. Necessary ingredients: fiber, juice, and land. What do Oregon, Iceland, and abandoned mines have in common? Important issues: efficiency, redundancy, utilization, security, management overhead.

Source: Harper’s (Feb, 2008)

Maximilien Brice, © CERN

Key Technology: Virtualization. Traditional stack: hardware, operating system, app. Virtualized stack: hardware, hypervisor, then one or more OS instances, each running its own apps.

3. Different Computing Models. Utility computing: why buy machines when you can rent cycles? Examples: Amazon’s EC2, GoGrid, AppNexus. Platform as a Service (PaaS): give me a nice API and take care of the implementation. Example: Google App Engine. Software as a Service (SaaS): just run it for me! Example: Gmail. “Why do it yourself if you can pay someone to do it for you?”

4. Web Applications. What is the nature of future software applications? From the desktop to the browser: SaaS == Web-based applications. Examples: Google Maps, Facebook. How do we deliver highly-interactive Web-based applications? AJAX (asynchronous JavaScript and XML): a hack on top of a mistake built on sand, all held together by duct tape and chewing gum? For better, or for worse…

What is the course about? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications

Web-Scale Problems? Don’t hold your breath for: biocomputing, nanocomputing, quantum computing, … It all boils down to: divide-and-conquer; throwing more hardware at the problem. Simple to understand… a lifetime to master…

Divide and Conquer: “Work” is partitioned into units w1, w2, w3; each unit goes to a “worker”; the workers return partial results r1, r2, r3, which are combined into the “Result”.
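
The partition / worker / combine picture can be sketched in a few lines. This is a minimal illustration rather than anything from the lecture: the function names are invented, the “work” is just summing squares, and the workers are threads in a standard-library pool (in CPython, threads show the structure but do not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def partition(work, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(work), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(work[start:end])
        start = end
    return chunks

def worker(chunk):
    """Turn one work unit (w_i) into one partial result (r_i)."""
    return sum(x * x for x in chunk)

def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)

def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns the partial results in chunk order.
        partials = list(pool.map(worker, chunks))
    return combine(partials)

print(divide_and_conquer(list(range(10))))
```

This only works so neatly because the combine step (addition) does not care how the work was split; problems without that property need more coordination between workers.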

Different Workers: different threads in the same core; different cores in the same CPU; different CPUs in a multi-processor system; different machines in a distributed system.

Haven’t we been here before? (Quick tour through parallel and distributed computing)

Flynn’s Taxonomy, by instruction streams (single, SI, or multiple, MI) and data streams (single, SD, or multiple, MD). SISD: single-threaded process. MISD: pipeline architecture. SIMD: vector processing. MIMD: multi-threaded processes.

SISD [figure: one processor applying one instruction stream to one data stream (D)]

SIMD [figure: one processor applying one instruction stream to multiple data streams (D0, D1, D2, D3, D4, …, Dn)]

MIMD [figure: two processors, each with its own instruction stream and its own data stream (D)]

Source: MIT Open Courseware

[Figure: one processor (instructions, data) attached to memory (instructions and data), with an interface to the external world]

[Figure: four processors, each with an interface to the external world, sharing one memory (instructions and data)]

[Figure: four processors, each with an interface to the external world, connected through a network to two memories (instructions and data)]

[Figure: four nodes, each pairing its own memory (instructions and data) with processors and an interface to the external world, connected by a network]

Choices, Choices, Choices: commodity vs. “exotic” hardware; scale “up” or scale “out”; number of machines vs. processors vs. cores; bandwidth of memory vs. disk vs. network; different programming models.

Parallelization Problems How do we assign work units to workers? What if we have more work units than workers? What if workers need to share partial results? How do we aggregate partial results? How do we know all the workers have finished? What if workers die? What is the common theme of all of these problems?
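
A few of these questions (more work units than workers, knowing when everyone has finished, workers dying) can be made concrete with a small sketch. Everything here is illustrative: `run_with_retries` and `flaky_square` are hypothetical names, a raised exception stands in for a dead worker, and a standard-library thread pool stands in for a cluster.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_with_retries(task, units, n_workers=2, max_attempts=3):
    """Run task on every unit with a small pool, resubmitting failed units."""
    results = {}
    attempts = {i: 0 for i in range(len(units))}
    pending = list(range(len(units)))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while pending:
            # More units than workers: the pool queues the extra units.
            futures = {pool.submit(task, units[i]): i for i in pending}
            pending = []
            # as_completed() tells us when each unit has finished.
            for fut in as_completed(futures):
                i = futures[fut]
                attempts[i] += 1
                try:
                    results[i] = fut.result()
                except Exception:
                    if attempts[i] >= max_attempts:
                        raise              # give up on this unit
                    pending.append(i)      # the "worker died": try again
    # Aggregate the partial results in the original unit order.
    return [results[i] for i in range(len(units))]

_failed_once = set()
_failed_lock = threading.Lock()

def flaky_square(x):
    """Hypothetical task whose first attempt on a multiple of 3 fails."""
    with _failed_lock:
        first_time = x % 3 == 0 and x not in _failed_once
        if first_time:
            _failed_once.add(x)
    if first_time:
        raise RuntimeError(f"worker died on unit {x}")
    return x * x

print(run_with_retries(flaky_square, [1, 2, 3, 4, 5, 6]))
```

Even this toy version needs shared bookkeeping (the attempt counts, the lock around `_failed_once`), which hints at the common theme the slide asks about.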

General Theme? Parallelization problems arise from: communication between workers; access to shared resources. Thus, we need a synchronization system! This is tricky: finding bugs is hard; fixing bugs is even harder.

Managing Multiple Workers. Difficult because we (often) don’t know the order in which workers run, where the workers are running, or when workers interrupt each other. Thus, we need: semaphores (lock, unlock); condition variables (wait, notify, broadcast); barriers. Still, lots of problems: deadlock, livelock, race conditions, ... Moral of the story: be careful!
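
Two of the primitives named here, a lock and a barrier, can be sketched with Python's `threading` module. This is a minimal illustration with invented names (`count_with_lock`): the barrier makes all workers start their loop together, and the lock protects the shared counter, whose read-modify-write update would otherwise be open to a race condition.

```python
import threading

def count_with_lock(n_threads=4, increments=10_000):
    """Have n_threads workers each add `increments` to one shared counter."""
    counter = 0
    lock = threading.Lock()
    barrier = threading.Barrier(n_threads)

    def work():
        nonlocal counter
        barrier.wait()           # block until every worker has arrived
        for _ in range(increments):
            with lock:           # only one worker updates at a time
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                 # wait until all workers have finished
    return counter

print(count_with_lock())
```

The same module also offers `threading.Condition`, with `wait`, `notify`, and `notify_all`, which corresponds to the wait / notify / broadcast operations on the slide.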

Source: Ricardo Guimarães Herrmann

“Design Patterns”

[Figure: master and slaves]

[Figure: producers (P) and consumers (C)]

[Figure: producers (P) and consumers (C) connected through a shared queue, with items labelled W]
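
The producer / consumer arrangement around a shared queue can be sketched with the standard-library `queue.Queue`, which handles its own locking. The names (`run_pipeline`, `SENTINEL`) and the doubling “work” are assumptions for illustration only; items must not be `None`, since `None` is used here as the shutdown marker.

```python
import queue
import threading

SENTINEL = None  # assumption: marks "no more work" for one consumer

def producer(q, items):
    """Put every item on the shared queue (blocks while the queue is full)."""
    for item in items:
        q.put(item)

def consumer(q, results, lock):
    """Take items off the shared queue until a sentinel arrives."""
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        with lock:
            results.append(item * 2)   # stand-in for real work

def run_pipeline(batches, n_consumers=3):
    q = queue.Queue(maxsize=8)         # bounded shared queue
    results, lock = [], threading.Lock()
    consumers = [threading.Thread(target=consumer, args=(q, results, lock))
                 for _ in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(q, batch))
                 for batch in batches]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()                       # all work has been queued
    for _ in consumers:
        q.put(SENTINEL)                # one shutdown marker per consumer
    for t in consumers:
        t.join()
    return sorted(results)

print(run_pipeline([[1, 2, 3], [4, 5, 6]]))
```

The queue decouples the two sides: producers never need to know how many consumers exist, and the bound on the queue keeps fast producers from running arbitrarily far ahead.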

Rubber, meet road…

Source: Wikipedia

Rubber, Meet Road. Existing tools: pthreads and OpenMP for multi-threaded programming; MPI for cluster computing; Condor, PBS, SGE, etc. for higher-level job management. The reality: lots of one-off solutions and custom code; write your own dedicated library, then program with it; burden on the programmer to explicitly manage everything.

What’s different now?

Source: MIT Open Courseware

Questions?