Cloud Computing Part #2 Materials adopted from the slides by Jimmy Lin, The iSchool, University of Maryland Zigmunds Buliņš, Mg. sc. ing 1.

Slides:



Advertisements
Similar presentations
Cloud Computing at GES DISC Presented by: Long Pham Contributors: Aijun Chen, Bruce Vollmer, Ed Esfandiari and Mike Theobald GES DISC UWG May 11, 2011.
Advertisements

Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Master/Slave Architecture Pattern Source: Pattern-Oriented Software Architecture, Vol. 1, Buschmann, et al.
Today’s topics Single processors and the Memory Hierarchy
C-Store: Data Management in the Cloud Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Jun 5, 2009.
Intro to Cloud Computing Andrew Rau-Chaplin - Adapted from What is Cloud Computing? (and an intro to parallel/distributed processing), Jimmy Lin, The iSchool.
Introduction to MapReduce Data-Intensive Information Processing Applications ― Session #1 Jimmy Lin University of Maryland Tuesday, January 26, 2010 This.
INTRODUCTION TO CLOUD COMPUTING CS 595 LECTURE 1.
What is cloud computing? Why is this different? Jimmy Lin The iSchool University of Maryland Monday, March 30, 2009 This work is licensed under a Creative.
Cloud Computing Lecture #1 Parallel and Distributed Computing Jimmy Lin The iSchool University of Maryland Monday, January 28, 2008 This work is licensed.
Introduction to MapReduce Data-Intensive Information Processing Applications ― Session #1 Jimmy Lin University of Maryland Tuesday, January 26, 2010 This.
Internet (large) scale Applications L. Grewe. What do I mean? Examples include Web, , Search, content delivery networks (e.g., Akamai, and Limelight),
Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Wednesday,
AN INTRODUCTION TO CLOUD COMPUTING Web, as a Platform…
Cloud Computing Part #1 1. 2
M.A.Doman Model for enabling the delivery of computing as a SERVICE.
INTRODUCTION TO CLOUD COMPUTING Cs 595 Lecture 5 2/11/2015.
Lecture 1 – Parallel Programming Primer CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed.
CS 470/570:Introduction to Parallel and Distributed Computing.
Security Difficulties of E-Learning in Cloud Computing
Project Proposal (Title + Abstract) Due Wednesday, September 4, 2013.
Lecture 1 Big Data + Cloud Computing 张奇 复旦大学 COMP Data Intensive Computing 1.
Introduction to Parallel Processing 3.1 Basic concepts 3.2 Types and levels of parallelism 3.3 Classification of parallel architecture 3.4 Basic parallel.
Cloud Computing Source:
1 Introduction to Cloud Computing Jian Tang 01/19/2012.
Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.
KUAS.EE Parallel Computing at a Glance. KUAS.EE History Parallel Computing.
CSCI-2950u :: Data-Intensive Scalable Computing Rodrigo Fonseca (rfonseca)
Cloud Computing Saneel Bidaye uni-slb2181. What is Cloud Computing? Cloud Computing refers to both the applications delivered as services over the Internet.
PhD course - Milan, March /09/ Some additional words about cloud computing Lionel Brunie National Institute of Applied Science (INSA) LIRIS.
Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.
MapReduce April 2012 Extract from various presentations: Sudarshan, Chungnam, Teradata Aster, …
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
1 Chapter 1 Parallel Machines and Computations (Fundamentals of Parallel Processing) Dr. Ranette Halverson.
Lecture 8: Design of Parallel Programs Part III Lecturer: Simon Winberg.
M.A.Doman Short video intro Model for enabling the delivery of computing as a SERVICE.
Introduction, background, jargon Jakub Yaghob. Literature T.G.Mattson, B.A.Sanders, B.L.Massingill: Patterns for Parallel Programming, Addison- Wesley,
Cloud Computing A set of Internet-based application.
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
Presented by: Mostafa Magdi. Contents Introduction. Cloud Computing Definition. Cloud Computing Characteristics. Cloud Computing Key features. Cost Virtualization.
Introduction to MapReduce Data-Intensive Information Processing Applications ― Session #1 Jimmy Lin University of Maryland Tuesday, January 26, 2010 This.
Data-Intensive Text Processing with MapReduce J. Lin & C. Dyer Chapter 1.
Intro to Parallel and Distributed Processing Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google.
Early Adopter: Integration of Parallel Topics into the Undergraduate CS Curriculum at Calvin College Joel C. Adams Chair, Department of Computer Science.
Cloud Computing Lecture #1 What is Cloud Computing? (and an intro to parallel/distributed processing) Jimmy Lin The iSchool University of Maryland Modified.
Hwajung Lee. Maximilien Brice, © CERN  Fact: Processor population is exploding. Technology has dramatically reduced the price of processors.  Geographic.
Parallel Computing.
CS- 492 : Distributed system & Parallel Processing Lecture 7: Sun: 15/5/1435 Foundations of designing parallel algorithms and shared memory models Lecturer/
Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.
MapReduce: Simplified Data Processing on Large Clusters By Dinesh Dharme.
Cloud Computing Talal Alsubaie DBA Saudi FDA. You Have a System (Website)
Parallel Computing Presented by Justin Reschke
Lecture 1 – Introduction CSE 490h – Introduction to Distributed Computing, Winter 2008 Except as otherwise noted, the content of this presentation is licensed.
Background Computer System Architectures Computer System Software.
Next Generation of Apache Hadoop MapReduce Owen
PRESENTED BY– IRAM KHAN ISHITA TRIPATHI GAURAV AGRAWAL GAURAV SINGH HIMANSHU AWASTHI JAISWAR VIJAY KUMAR JITENDRA KUMAR VERMA JITENDRA SINGH KAMAL KUMAR.
From infrastructure to applications Where cloud computing is at and where it’s headed.
CLOUD COMPUTING When it's smarter to rent than to buy.. Presented by D.Datta Sai Babu 4 th Information Technology Tenali Engineering College.
Intro to Parallel and Distributed Processing Some material adapted from slides by Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google.
COMP7330/7336 Advanced Parallel and Distributed Computing MapReduce - Introduction Dr. Xiao Qin Auburn University
Amazon Web Services. Amazon Web Services (AWS) - robust, scalable and affordable infrastructure for cloud computing. This session is about:
Introduction to Cloud Computing 1. 2 Performance progress 2010: 2.57 petaflops 2005: teraflops 2000: 4.94 teraflops 1995: 170 gigaflops 15,100.
These slides are based on the book:
Lecture 1 – Parallel Programming Primer
buses, crossing switch, multistage network.
Multi-Processing in High Performance Computer Architecture:
Different Architectures
buses, crossing switch, multistage network.
Hwajung Lee ITEC452 Distributed Computing Lecture 1 Introduction to Distributed Systems.
Presentation transcript:

Cloud Computing Part #2 Materials adopted from the slides by Jimmy Lin, The iSchool, University of Maryland Zigmunds Buliņš, Mg. sc. ing 1

Source: 2

What is Cloud Computing? 1. Web-scale problems 2. Large data centers 3. Different models of computing 4. Highly-interactive Web applications 3

1. Web-Scale Problems  Characteristics: Definitely data-intensive May also be processing intensive  Examples: Crawling, indexing, searching, mining the Web “Post-genomics” life sciences research Other scientific data (physics, astronomers, etc.) Sensor networks Web 2.0 applications … 4

How much data?  Wayback Machine has 2 PB + 20 TB/month (2006)  Google processes 20 PB a day (2008)  “all words ever spoken by human beings” ~ 5 EB  NOAA has ~1 PB climate data (2007)  CERN’s LHC will generate 15 PB a year (2008) 640K ought to be enough for anybody. 5

Maximilien Brice, © CERN 6

7

What to do with more data?  Answering factoid questions Pattern matching on the Web Works amazingly well  Learning relations Start with seed instances Search for patterns on the Web Using patterns to find more instances Who shot Abraham Lincoln?  X shot Abraham Lincoln Birthday-of(Mozart, 1756) Birthday-of(Einstein, 1879) Wolfgang Amadeus Mozart ( ) Einstein was born in 1879 PERSON (DATE – PERSON was born in DATE (Brill et al., TREC 2001; Lin, ACM TOIS 2007) (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … ) 8

2. Large Data Centers  Web-scale problems? Throw more machines at it!  Clear trend: centralization of computing resources in large data centers Necessary ingredients: fiber, juice, and space  Important Issues: Redundancy Efficiency Utilization Management 9

Source: Harper’s (Feb, 2008) 10

Maximilien Brice, © CERN 11

Key Technology: Virtualization Hardware Operating System App Traditional Stack Hardware OS App Hypervisor OS Virtualized Stack 12

3. Different Computing Models  Utility computing Why buy machines when you can rent cycles? Examples: Amazon’s EC2, GoGrid, AppNexus  Platform as a Service (PaaS) Give me nice API and take care of the implementation Example: Google App Engine, Heroku  Software as a Service (SaaS) Just run it for me! Example: Gmail “Why do it yourself if you can pay someone to do it for you?” 13

4. Web Applications  What is the nature of software applications? From the desktop to the browser SaaS = Web-based applications Examples: Google Maps, Facebook  How do we deliver highly-interactive Web-based applications? AJAX (asynchronous JavaScript and XML) For better, or for worse… 14

What is the course about?  MapReduce: the “back-end” of cloud computing Batch-oriented processing of large datasets  Ajax: the “front-end” of cloud computing Highly-interactive Web-based applications  Computing “in the clouds” Amazon’s EC2/S3 as an example of utility computing 15

Amazon Web Services  Elastic Compute Cloud (EC2) Rent computing resources by the hour Basic unit of accounting = instance-hour Additional costs for bandwidth  Simple Storage Service (S3) Persistent storage Charge by the GB/month Additional costs for bandwidth 16

Simple Storage Service  Pay for what you use: $0.20 per GByte of data transferred, $0.15 per GByte-Month for storage used, Second Life Update: ○ 1TBytes, 40,000 downloads in 24 hours - $200 17

Some cloud providers 18

Cloud Computing Zen  Don’t get frustrated (take a deep breath)… This is bleeding edge technology Those moments  Be patient… This is the second first time I’ve taught this course  Be flexible… There will be unanticipated issues along the way  Be constructive… Tell me how I can make everyone’s experience better 19

Source: Wikipedia 20

Web-Scale Problems?  Don’t hold your breath: Biocomputing Nanocomputing Quantum computing …  It all boils down to… Divide-and-conquer Throwing more hardware at the problem Simple to understand… a lifetime to master… 21

Divide and Conquer “Work” w1w1 w2w2 w3w3 r1r1 r2r2 r3r3 “Result” “worker” Partition Combine 22

Different Workers  Different threads in the same core  Different cores in the same CPU  Different CPUs in a multi-processor system  Different machines in a distributed system 23

Choices, Choices, Choices  Commodity vs. “exotic” hardware  Number of machines vs. processor vs. cores  Bandwidth of memory vs. disk vs. network  Different programming models 24

Flynn’s Taxonomy Instructions Single (SI)Multiple (MI) Data Multiple (MD) SISD Single-threaded process MISD Pipeline architecture SIMD Vector Processing MIMD Multi-threaded Programming Single (SD) 25

SISD DDDDDDD Processor Instructions 26

SIMD D0D0 Processor Instructions D0D0 D0D0 D0D0 D0D0 D0D0 D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D1D1 D2D2 D3D3 D4D4 … DnDn D0D0 27

MIMD DDDDDDD Processor Instructions DDDDDDD Processor Instructions 28

Memory Typology: Shared Memory Processor 29

Memory Typology: Distributed MemoryProcessorMemoryProcessor MemoryProcessorMemoryProcessor Network 30

Memory Typology: Hybrid Memory Processor Network Processor Memory Processor Memory Processor Memory Processor 31

Parallelization Problems  How do we assign work units to workers?  What if we have more work units than workers?  What if workers need to share partial results?  How do we aggregate partial results?  How do we know all the workers have finished?  What if workers die? What is the common theme of all of these problems? 32

General Theme?  Parallelization problems arise from: Communication between workers Access to shared resources (e.g., data)  Thus, we need a synchronization system!  This is tricky: Finding bugs is hard Solving bugs is even harder 33

Managing Multiple Workers  Difficult because (Often) don’t know the order in which workers run (Often) don’t know where the workers are running (Often) don’t know when workers interrupt each other  Thus, we need: Semaphores (lock, unlock) Conditional variables (wait, notify, broadcast) Barriers  Still, lots of problems: Deadlock, livelock, race conditions,...  Moral of the story: be careful! Even trickier if the workers are on different machines 34

Patterns for Parallelism  Parallel computing has been around for decades  Here are some “design patterns” … 35

Master/Slaves slaves master 36

Producer/Consumer Flow CP P P C C CP P P C C 37

Work Queues C P P P C C shared queue WWWWW 38

Rubber Meets Road  From patterns to implementation: pthreads, OpenMP for multi-threaded programming MPI for clustering computing …  The reality: Lots of one-off solutions, custom code Write you own dedicated library, then program with it Burden on the programmer to explicitly manage everything  MapReduce to the rescue! 39

Map/Reduce (1)  Document  Query 40

Map/Reduce (2)  Query result 41

Map/Reduce (3)  Reduce 42

Map/Reduce (4)  Reduce… 43

Map/Reduce (5)  Reduce… 44

Map/Reduce performance  Sorting bytes records (~1TB) 45