Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net.

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net

Yahoo ！ Cloud computing

babycenter epicurious Search Results of the Future yelp.com answers.com LinkedIn webmd Gawker New York Times

What’s in the Horizontal Cloud? Common Approaches to QA, Production Engineering, Performance Engineering, Datacenter Management, and Optimization Common Approaches to QA, Production Engineering, Performance Engineering, Datacenter Management, and Optimization ID & Account Management Monitoring & QoS Shared Infrastructure Metering, Billing, Accounting Horizontal Cloud Services Edge Content Services e.g., YCS, YCPI Provisioning & Virtualization e.g., EC2 Batch Storage & Processing e.g., Hadoop & Pig Operational Storage e.g., S3, MObStor, Sherpa Other Services Messaging, Workflow, virtual DBs & Webserving Security Simple Web Service API’s

Yahoo! Cloud Stack Provisioning (Self-serve) Horizontal Cloud Services …YCSYCPI Brooklyn EDGE Monitoring/Metering/Security Horizontal Cloud Services …Hadoop BATCH Horizontal Cloud Services …SherpaMOBStor STORAGE Horizontal Cloud Services VM/OS… APP Horizontal Cloud Services VM/OSyApache WEB Data Highway Serving Grid PHPApp Engine

Web Data Management Large data analysis (Hadoop) Structured record storage (PNUTS/Sherpa) Blob storage (SAN/NAS) Scan oriented workloads Focus on sequential disk I/O $ per cpu cycle CRUD Point lookups and short scans Index organized table and random I/Os $ per latency Object retrieval and streaming Scalable file storage $ per GB

The World Has Changed Web serving applications need: Scalability! Preferably elastic Flexible schemas Geographic distribution High availability Reliable storage Web serving applications can do without: Complicated queries Strong transactions

PNUTS / SHERPA To Help You Scale Your Mountains of Data

Yahoo! Serving Storage Problem Small records – 100KB or less Structured records – lots of fields, evolving Extreme data scale - Tens of TB Extreme request scale - Tens of thousands of requests/sec Low latency globally - 20+ datacenters worldwide High Availability - outages cost $millions Variable usage patterns - as applications and users change 9

The PNUTS/Sherpa Solution The next generation global-scale record store Record-orientation: Routing, data storage optimized for low-latency record access Scale out: Add machines to scale throughput (while keeping latency low) Asynchrony: Pub-sub replication to far-flung datacenters to mask propagation delay Consistency model: Reduce complexity of asynchrony for the application programmer Cloud deployment model: Hosted, managed service to reduce app time-to-market and enable on demand scale and elasticity 10

E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E What is PNUTS/Sherpa? E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Parallel database Geographic replication Structured, flexible schema Hosted, managed infrastructure A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E 11

What Will It Become? E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) Parallel database Geographic replication Indexes and views Structured, flexible schema Hosted, managed infrastructure

What Will It Become? E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E E 75656 C A 42342 E B 42521 W C 66354 W D 12352 E F 15677 E Indexes and views

Scalability Thousands of machines Easy to add capacity Restrict query language to avoid costly queries Geographic replication Asynchronous replication around the globe Low-latency local access High availability and fault tolerance Automatically recover from failures Serve reads and writes despite failures Design Goals 14 Consistency Per-record guarantees Timeline model Option to relax if needed Multiple access paths Hash table, ordered table Primary, secondary access Hosted service Applications plug and play Share operational cost

Technology Elements PNUTS Query planning and execution Index maintenance Distributed infrastructure for tabular data Data partitioning Update consistency Replication YDOT FS Ordered tables Applications Tribble Pub/sub messaging YDHT FS Hash tables Zookeeper Consistency service YCA: Authorization PNUTS API Tabular API 15

Data Manipulation Per-record operations Get Set Delete Multi-record operations Multiget Scan Getrange 16

Tablets—Hash Table Apple Lemon Grape Orange Lime Strawberry Kiwi Avocado Tomato Banana Grapes are good to eat Limes are green Apple is wisdom Strawberry shortcake Arrgh! Don’t get scurvy! But at what price? How much did you pay for this lemon? Is this a vegetable? New Zealand The perfect fruit NameDescriptionPrice $12 $9 $1 $900 $2 $3 $1 $14 $2 $8 0x0000 0xFFFF 0x911F 0x2AF3 17

Tablets—Ordered Table 18 Apple Banana Grape Orange Lime Strawberry Kiwi Avocado Tomato Lemon Grapes are good to eat Limes are green Apple is wisdom Strawberry shortcake Arrgh! Don’t get scurvy! But at what price? The perfect fruit Is this a vegetable? How much did you pay for this lemon? New Zealand $1 $3 $2 $12 $8 $1 $9 $2 $900 $14 NameDescriptionPrice A Z Q H

Flexible Schema Posted dateListing idItemPrice 6/1/07424252Couch$570 6/1/07763245Bike$86 6/3/07211242Car$1123 6/5/07421133Lamp$15 Color Red Condition Good Fair

Storage units Routers Tablet Controller REST API Clients Local region Remote regions Tribble Detailed Architecture 20

Tablet Splitting and Balancing 21 Each storage unit has many tablets (horizontal partitions of the table) Tablets may grow over time Overfull tablets split Storage unit may become a hotspot Shed load by moving tablets to other servers Storage unit Tablet

QUERY PROCESSING 22

Accessing Data 23 SU 1 Get key k 2 3 Record for key k 4

Bulk Read 24 SU Scatter/ gather server SU 1 {k1, k2, … kn} 2 Get k 1 Get k 2 Get k 3

Storage unit 1Storage unit 2Storage unit 3 Range Queries in YDOT Clustered, ordered retrieval of records Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1 Router Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Lime Mango Orange Strawberry Tomato Watermelon Apple Avocado Banana Blueberry Canteloupe Grape Kiwi Lemon Lime Mango Orange Strawberry Tomato Watermelon Grapefruit…Pear? Grapefruit…Lime? Lime…Pear? Storage unit 1 Canteloupe Storage unit 3 Lime Storage unit 2 Strawberry Storage unit 1

Updates 1 Write key k 2 7 Sequence # for key k 8 SU 3 Write key k 4 5 SUCCESS 6 Write key k Routers Message brokers 26

ASYNCHRONOUS REPLICATION AND CONSISTENCY 27

Asynchronous Replication 28

Goal: Make it easier for applications to reason about updates and cope with asynchrony What happens to a record with primary key “Alice”? Consistency Model 29 Time Record inserted Update Delete Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Update As the record is updated, copies may get out of sync.

Example: Social Alice UserStatus AliceBusy West East UserStatus AliceFree UserStatus Alice??? UserStatus Alice??? UserStatus AliceBusy UserStatus Alice___ Busy Free Record Timeline

Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Current version Stale version Read Consistency Model 31 In general, reads are served using a local copy

Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Read up-to-date Current version Stale version Consistency Model 32 But application can request and get current version

Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Read ≥ v.6 Current version Stale version Consistency Model 33 Or variations such as “read forward”—while copies may lag the master record, every copy goes through the same sequence of changes

Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Write Current version Stale version Consistency Model 34 Achieved via per-record primary copy protocol (To maximize availability, record masterships automaticlly transferred if site fails) Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors)

Time v. 1 v. 2 v. 3v. 4 v. 5 v. 7 Generation 1 v. 6 v. 8 Write if = v.7 ERROR Current version Stale version Consistency Model 35 Test-and-set writes facilitate per-record transactions

Consistency Techniques Per-record mastering Each record is assigned a “master region” May differ between records Updates to the record forwarded to the master region Ensures consistent ordering of updates Tablet-level mastering Each tablet is assigned a “master region” Inserts and deletes of records forwarded to the master region Master region decides tablet splits These details are hidden from the application Except for the latency impact!

37 Mastering A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E A 42342 E B 42521 W C 66354 W D 12352 E E 75656 C F 15677 E Tablet master

Bulk Insert/Update/Replace Client Source Data Bulk manager 1.Client feeds records to bulk manager 2.Bulk loader transfers records to SU’s in batches Bypass routers and message brokers Efficient import into storage unit

Bulk Load in YDOT YDOT bulk inserts can cause performance hotspots Solution: preallocate tablets

Index Maintenance How to have lots of interesting indexes and views, without killing performance? Solution: Asynchrony! Indexes/views updated asynchronously when base table updated

SHERPA IN CONTEXT 41

Types of Record Stores Query expressiveness Simple Feature rich Object retrieval Retrieval from single table of objects/records SQL S3 PNUTS Oracle

Types of Record Stores Consistency model Best effort Strong guarantees Eventual consistency Timeline consistency ACID S3 PNUTS Oracle Program centric consistency Object-centric consistency

Types of Record Stores Data model Flexibility, Schema evolution Optimized for Fixed schemas CouchDB PNUTS Oracle Consistency spans objects Object-centric consistency

Types of Record Stores Elasticity (ability to add resources on demand) Inelastic Elastic Limited (via data distribution) VLSD (Very Large Scale Distribution /Replication) Oracle PNUTS S3

Data Stores Comparison User-partitioned SQL stores Microsoft Azure SDS Amazon SimpleDB Multi-tenant application databases Salesforce.com Oracle on Demand Mutable object stores Amazon S3 Versus PNUTS More expressive queries Users must control partitioning Limited elasticity Highly optimized for complex workloads Limited flexibility to evolving applications Inherit limitations of underlying data management system Object storage versus record management

Application Design Space Records Files Get a few things Scan everything Sherpa MObStor Everest Hadoop YMDB MySQL Filer Oracle BigTable 47

Alternatives Matrix Elastic Operability Global low latency Availability Structured access Sherpa Y! UDB MySQL Oracle HDFS BigTable Dynamo Updates Cassandra Consistency model SQL/ACID 48

Further Reading Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008) Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008) Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni Asynchronous View Maintenance for VLSD Databases, Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan SIGMOD 2009 (to appear) Cloud Storage Design in a PNUTShell Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava Beautiful Data, O’Reilly Media, 2009 (to appear)

QUESTIONS? 50

Hadoop

Problem How do you scale up applications? Run jobs processing 100’s of terabytes of data Takes 11 days to read on 1 computer Need lots of cheap computers Fixes speed problem (15 minutes on 1000 computers), but… Reliability problems In large clusters, computers fail every day Cluster size is not fixed Need common infrastructure Must be efficient and reliable

Solution Open Source Apache Project Hadoop Core includes: Distributed File System - distributes data Map/Reduce - distributes application Written in Java Runs on Linux, Mac OS/X, Windows, and Solaris Commodity hardware

Hardware Cluster of Hadoop Typically in 2 level architecture Nodes are commodity PCs 40 nodes/rack Uplink from rack is 8 gigabit Rack-internal is 1 gigabit

Distributed File System Single namespace for entire cluster Managed by a single namenode. Files are single-writer and append-only. Optimized for streaming reads of large files. Files are broken in to large blocks. Typically 128 MB Replicated to several datanodes, for reliability Access from Java, C, or command line.

Block Placement Default is 3 replicas, but settable Blocks are placed (writes are pipelined): On same node On different rack On the other rack Clients read from closest replica If the replication for a block drops below target, it is automatically re-replicated.

How is Yahoo using Hadoop? Started with building better applications Scale up web scale batch applications (search, ads, …) Factor out common code from existing systems, so new applications will be easier to write Manage the many clusters

Running Production WebMap Search needs a graph of the “known” web Invert edges, compute link text, whole graph heuristics Periodic batch job using Map/Reduce Uses a chain of ~100 map/reduce jobs Scale 1 trillion edges in graph Largest shuffle is 450 TB Final output is 300 TB compressed Runs on 10,000 cores Raw disk used 5 PB

Terabyte Sort Benchmark Started by Jim Gray at Microsoft in 1998 Sorting 10 billion 100 byte records Hadoop won the general category in 209 seconds 910 nodes 2 quad-core Xeons @ 2.0Ghz / node 4 SATA disks / node 8 GB ram / node 1 gb ethernet / node 40 nodes / rack 8 gb ethernet uplink / rack Previous records was 297 seconds

Hadoop clusters We have ~20,000 machines running Hadoop Our largest clusters are currently 2000 nodes Several petabytes of user data (compressed, unreplicated) We run hundreds of thousands of jobs every month

Research Cluster Usage

Who Uses Hadoop? Amazon/A9 AOL Facebook Fox interactive media Google / IBM New York Times PowerSet (now Microsoft) Quantcast Rackspace/Mailtrust Veoh Yahoo! More at http://wiki.apache.org/hadoop/PoweredBy

Q&A For more information: Website: http://hadoop.apache.org/core Mailing lists: core-dev@hadoop.apache core-user@hadoop.apache

QUESTIONS? 64

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net.

Similar presentations

Presentation on theme: "Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net.

Similar presentations

Presentation on theme: "Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net."— Presentation transcript:

Similar presentations

About project

Feedback