Published by Lizbeth Carr. Modified over 8 years ago.
1
Lecture 1
Book: Hadoop in Action by Chuck Lam
Online course: “Cloud Computing Concepts”, lecture notes by Indranil Gupta
2
Content
  Introduction
  Clouds
  MapReduce
  Understanding Hadoop and MapReduce
3
Grading policy
  Attendance – 10%
  Quizzes – 20%
  Midterm – 20%
  Assignments – 20%
  Final – 30%
  TOTAL – 100 points
4
Many Cloud Providers
AWS: Amazon Web Services
  EC2: Elastic Compute Cloud
  S3: Simple Storage Service
  EBS: Elastic Block Store
Microsoft Azure
Google Compute Engine
Rightscale, Salesforce, EMC, Gigaspaces, 10gen, Datastax, Oracle, VMware, Yahoo, Cloudera
And many, many more!
5
Two Categories of Clouds
A cloud can be either (i) a public cloud or (ii) a private cloud
Private clouds are accessible only to company employees
Public clouds provide service to any paying customer:
  Amazon S3 (Simple Storage Service)
  Amazon EC2 (Elastic Compute Cloud)
  Google App Engine/Compute Engine
6
What is a Cloud?
  It’s a cluster!
  It’s a supercomputer!
  It’s a datastore!
  It’s a Superman!
  None of the above
  All of the above
Cloud = Lots of storage + computing cycles nearby
7
What is a Cloud?
A single-site cloud (aka “datacenter”) consists of
  Compute nodes (grouped into racks)
  Switches, connecting racks
  A network topology, e.g. hierarchical
  Storage nodes connected to the network
  Front-end for submitting jobs and receiving client requests
  Software services
A geographically distributed cloud consists of
  Multiple such sites
  Each site perhaps with different structure and services
8
A Cloudy History of Time
9
On-demand Access: *aaS
On-demand: renting a cab vs. renting a car or buying one
HaaS: Hardware as a Service
  Access to barebones hardware machines. Not always a good idea because of security risks
IaaS: Infrastructure as a Service
  Access to flexible computing and storage infrastructure. Ex: Amazon Web Services (AWS: EC2 and S3)
PaaS: Platform as a Service
  Access to flexible computing and storage infrastructure, coupled with a software platform
SaaS: Software as a Service
  Access to software services (Service Oriented Architectures). Ex: Google Docs, MS Office on demand
10
A Cloud...
A cloud consists of
  Hundreds to thousands of machines in a datacenter (server side)
  Thousands to millions of machines accessing these services (client side)
Servers communicate amongst one another
Clients communicate with servers
Clients also communicate with each other
11
A Cloud... IS a Distributed System
Servers communicate amongst one another -> Distributed System
  Essentially a cluster!
Clients communicate with servers
  Also a distributed system!
Clients may also communicate with each other
  In peer-to-peer systems like BitTorrent
  Also a distributed system!
12
Four Features of Clouds = All Distributed Systems Features!
I. Massive Scale: many servers
II. On-demand Nature: access (multiple) servers anywhere
III. Data-Intensive Nature: lots of data => need a cluster (multiple machines) to store it
IV. New Cloud Programming Paradigms: Hadoop/MapReduce, NoSQL all need clusters
13
Distributed System = Many Processes Sending and Receiving Messages
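The definition above can be illustrated with a minimal Python sketch. It is only an analogy: two threads in one program stand in for processes on separate machines, and in-memory queues stand in for the network. The names `worker` and `run_exchange` are made up for this example, and unlike a real network, these queues never lose, delay, or reorder messages.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives; reply with each square."""
    while True:
        msg = inbox.get()          # block until a message is received
        if msg is None:            # sentinel message: stop this worker
            break
        outbox.put(msg * msg)      # send a reply message


def run_exchange(numbers):
    """Send each number to the worker and collect the replies in order."""
    inbox, outbox = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(inbox, outbox))
    t.start()
    for n in numbers:
        inbox.put(n)               # send a request message
    inbox.put(None)                # tell the worker to shut down
    t.join()
    return [outbox.get() for _ in numbers]


if __name__ == "__main__":
    print(run_exchange([1, 2, 3]))
```

The two sides share no variables; everything they know about each other arrives as a message, which is the property the slide's definition emphasizes.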
14
Many Challenges Abound...
Failures: no longer the exception, but rather the norm
Scalability: 1000s of machines, terabytes of data
Asynchrony: clock skew and clock drift
Concurrency: 1000s of machines interacting with each other, accessing the same data
...
15
Hadoop
Doug Cutting saw an opportunity and led the charge to develop an open source version of Google's MapReduce system, called Hadoop. Today, Hadoop is a core part of the computing infrastructure for many web companies, such as Yahoo, Facebook, LinkedIn, and Twitter.
An effective programmer today must have knowledge of relational databases, networking, and security, all of which were considered optional skills a couple of decades ago. Similarly, a basic understanding of distributed data processing will soon become an essential part of every programmer’s toolbox.
16
What is MapReduce?
17
Map
18
Reduce
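The bodies of the Map and Reduce slides are not captured in this transcript, so the following is only a hedged sketch of the general idea, using word count as the example. It runs on a single machine in plain Python and is not Hadoop's API; the names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are assumptions made for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the step a framework performs between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run Map over every line, group by word, then Reduce each group."""
    pairs = [pair for line in lines for pair in map_fn(line)]
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the cat", "The dog"]))
```

Because each call to `map_fn` looks at one line only and each call to `reduce_fn` looks at one key only, a framework such as Hadoop can spread those calls across many machines.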
20
Thank You