1 Sept 7, 2011 COMP6111A Fall 2011 HKUST Lin Gu Cloud Computing Systems.

Presentation transcript:

1 Sept 7, 2011 COMP6111A Fall 2011 HKUST Lin Gu Cloud Computing Systems

2 Internet-Scale Computing
We know how to solve “some” problems on a global scale
–Examples: DNS, MAC and IP assignment, web search, web , …
Each web search query essentially involves an Internet of data
–Main players: AltaVista, Inktomi, Google
–Conservatively assume 20 billion web documents, 4KB/doc → 80TB data
–“grep” would take more than one day on extremely fast hard drives. Traditional RDB? Probably slower.
What if we had only half a second?
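The back-of-envelope estimate on this slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the document count and size come from the slide, while the 500 MB/s sequential read rate (`ASSUMED_THROUGHPUT`) is an assumption standing in for an "extremely fast" hard drive.

```python
def full_scan_seconds(num_docs: int, bytes_per_doc: int, throughput_bytes_per_s: float) -> float:
    """Time to read every document once, sequentially, from a single drive."""
    total_bytes = num_docs * bytes_per_doc
    return total_bytes / throughput_bytes_per_s


NUM_DOCS = 20_000_000_000     # 20 billion documents (from the slide)
BYTES_PER_DOC = 4_000         # 4KB per document, in decimal units as on the slide
ASSUMED_THROUGHPUT = 500e6    # 500 MB/s: an assumption, not a figure from the slide

total_tb = NUM_DOCS * BYTES_PER_DOC / 1e12
seconds = full_scan_seconds(NUM_DOCS, BYTES_PER_DOC, ASSUMED_THROUGHPUT)

print(f"corpus size: {total_tb:.0f} TB")
print(f"single-drive scan: {seconds / 86_400:.2f} days")
```

Under these assumptions a single sequential pass takes well over a day, which is why a half-second budget rules out scanning the raw data at query time.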

3 How to Search for a “Planet”?
Luiz Andre Barroso, Jeffrey Dean, Urs Holzle. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, vol. 23, no. 2, Mar./Apr. 2003.
Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. Above the Clouds: A Berkeley View of Cloud Computing. UC Berkeley Technical Report UCB/EECS, Feb.
Birman, K., Chockler, G., and van Renesse, R. Toward a cloud computing research agenda. SIGACT News 40, 2 (Jun. 2009).

4 How are data processed in a datacenter?
Let’s look at a working example: the Google search engine
Not a typical business application, but it provides insights

5 How to Search for a “Planet”?
The search engine’s mission: flip through 20 billion documents, locate all the files containing all sensible variants of all keywords, calculate the relevance of all the matches, compute the query-specific representative “excerpt” for every matching document, and sort the resulting 1 million documents… all in 0.5 seconds!
And do this over and over, every second, for 600 million users around the world!
Google search engine
–Built on commodity components, searching in less than 0.5 seconds!
–Hundreds of engineers, years of hard work, and innovation
Luiz Andre Barroso et al. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, vol. 23, no. 2, Mar./Apr. 2003

6 How to Search for a “Planet”?
The system builds up from commodity components
Hundreds of engineers, years of hard work, and innovation
The system must scale
–The search-oriented architecture evolves to support new online services such as social networking
Many parts of the system are different from traditional distributed system solutions
–“Compatibility” is a non-goal and non-concern

7 A Closer Look at the Problem
Indices
–Index the data to transform 80TB of raw data into multiple TBs of inverted index
–Each query “only” reads hundreds of MBs of data
–Results returned for each indexed term are merged and ranked
Still a significant computation task
–Billions of CPU cycles
Must handle thousands of queries per second at peak
–Conservatively assume 1B Internet users, each issuing one search per day → roughly 11,600 queries per second on average
How many machines do we need? Can we synchronize them?
In addition, enormous computation for constructing the index
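The load estimate on this slide, and a first rough answer to "how many machines do we need?", can be worked out as follows. This is a sketch under stated assumptions: the user count and search rate come from the slide, but `ASSUMED_CYCLES_PER_QUERY` and `ASSUMED_CPU_HZ` are hypothetical values chosen only to make "billions of CPU cycles" concrete.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_qps(users: int, searches_per_user_per_day: float) -> float:
    """Average query rate if searches were spread evenly over the day."""
    return users * searches_per_user_per_day / SECONDS_PER_DAY


def cpus_needed(qps: float, cycles_per_query: float, cpu_hz: float) -> int:
    """CPUs required to keep up with the load, ignoring I/O, peaks, and failures."""
    return math.ceil(qps * cycles_per_query / cpu_hz)


ASSUMED_CYCLES_PER_QUERY = 2e9   # assumption: "billions of CPU cycles" taken as 2 billion
ASSUMED_CPU_HZ = 2e9             # assumption: a 2 GHz commodity CPU

qps = average_qps(1_000_000_000, 1)
cpus = cpus_needed(qps, ASSUMED_CYCLES_PER_QUERY, ASSUMED_CPU_HZ)

print(f"average load: {qps:,.0f} queries per second")
print(f"CPUs needed at that rate: {cpus:,}")
```

Even this optimistic figure is an average; peak load, replication for availability, and index construction all push the real machine count higher.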

8 Google’s Cluster Architecture
Goals
A high-performance distributed system for search
–Thousands of machines collaborate to handle the workload
Price-performance ratio
Scalability
Energy efficiency and cooling
High availability
Luiz Andre Barroso, Jeffrey Dean, Urs Holzle. Web Search for a Planet: The Google Cluster Architecture. IEEE Micro, vol. 23, no. 2, Mar./Apr. 2003

9 Google’s Cluster Architecture
Parallelism
Crucial to performance (both throughput and latency)
Data centric parallelization
–MapReduce
–Data dependence
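To make "data centric parallelization" concrete, here is a single-process sketch of the MapReduce programming model named on this slide, using word counting as the job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative, and nothing here is distributed; in a real system each map and reduce call could run on a different machine because the calls do not depend on one another.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Emit (word, 1) for every word; each document is processed independently."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(docs):
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count({1: "search for a planet", 2: "a planet of data"})
print(counts["planet"], counts["a"], counts["data"])
```

The only coupling between workers is the shuffle step, which is where the "data dependence" bullet becomes visible: a key cannot be reduced until every mapper has emitted its values for that key.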

10 Google’s Cluster Architecture
Reliability from software
Hardware: unreliable commodity PCs
–Good for price-performance ratio
Reliability from redundancy
–Replicate data and functions
Automatically handles failure
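"Reliability from redundancy" can be illustrated with a small failover routine: if every replica can answer the same request, a failed machine costs one retry rather than an outage. This is a minimal sketch, not a description of Google's software; `ReplicaDown`, `query_with_failover`, and the two toy replicas are hypothetical names introduced for the example.

```python
import random


class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request (hypothetical error type)."""


def query_with_failover(replicas, request, rng=random):
    """Try replicas in random order and return the first successful answer."""
    order = list(replicas)
    rng.shuffle(order)               # random order also spreads load across replicas
    last_error = None
    for replica in order:
        try:
            return replica(request)
        except ReplicaDown as err:
            last_error = err         # remember the failure and try the next replica
    raise RuntimeError("all replicas failed") from last_error


def healthy(request):
    return f"results for {request!r}"


def broken(request):
    raise ReplicaDown("machine unreachable")


print(query_with_failover([broken, healthy, broken], "cloud computing"))
```

The request only fails when every replica is down, so adding replicas of cheap, unreliable machines raises availability without requiring any single machine to be reliable.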

11 Query Processing
How to serve a query
–The browser issues a query
–DNS lookup
–HTTP handling
–GWS (Google Web Server)
–Backend
–HTTP response
[Figure: a Google.com query travels over HTTP to data centers in San Jose, London, and Hong Kong; inside the data centers: GWS and backend]

12 Query Processing
Query backend and query execution
–Index server → hit lists
–Intersection
–Calculate relevance scores and rank
–Document servers: form title, URL, summary (snippet)
–Ancillary tasks (e.g., spelling check)
–And ads inserted
Question: how many servers would be allocated for the index server conglomerate? How many for document servers, spell checking, etc.?
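The index-server steps listed on this slide (hit lists, intersection, relevance ranking) can be sketched with a toy inverted index. The scoring below is a deliberately simple placeholder (total term frequency), not the ranking the real system uses, and `build_index` and `search` are illustrative names.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to its hit list: {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return doc ids containing every query term, best score first."""
    terms = query.lower().split()
    if not terms:
        return []
    hit_lists = [index.get(term, {}) for term in terms]
    # Intersection: keep only documents that appear in every hit list.
    matches = set(hit_lists[0])
    for hits in hit_lists[1:]:
        matches &= set(hits)
    # Placeholder relevance score: summed term frequency across query terms.
    scored = [(sum(hits[d] for hits in hit_lists), d) for d in matches]
    return [d for _, d in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


DOCS = {
    1: "google cluster architecture",
    2: "cluster computing on commodity cluster hardware",
    3: "web search for a planet",
}
index = build_index(DOCS)
print(search(index, "cluster"))
print(search(index, "cluster hardware"))
```

A query touches only the hit lists for its own terms, which is why each query reads a small fraction of the index rather than the whole corpus.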

13 Query Processing
Scalable architecture (relates to parallelism)
–Data partitioning and replication → shards and replicas
–Data (documents, indices) increase → add shards
–User base expands → add machines for each shard
Question: What about latency? Would latency increase with multiple-tier query processing? How long is the latency?
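The two scaling knobs on this slide, more shards for more data and more replicas per shard for more users, can be sketched as a small in-memory cluster. Assigning documents to shards by hash is an assumption made for this example, and `shard_for` and `Cluster` are hypothetical names.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Stable document-to-shard assignment (hash partitioning is an assumption)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class Cluster:
    def __init__(self, num_shards: int, replicas_per_shard: int):
        # shards[s][r] is replica r of shard s; replicas of a shard hold the same data.
        self.shards = [[{} for _ in range(replicas_per_shard)] for _ in range(num_shards)]
        self._next = [0] * num_shards

    def add(self, doc_id: str, text: str) -> None:
        s = shard_for(doc_id, len(self.shards))
        for replica in self.shards[s]:
            replica[doc_id] = text

    def search(self, term: str) -> list:
        results = []
        for s, replicas in enumerate(self.shards):
            # Every shard must be asked, but any one replica of it can answer;
            # round-robin spreads the query load across the replicas.
            replica = replicas[self._next[s] % len(replicas)]
            self._next[s] += 1
            results.extend(d for d, text in replica.items() if term in text.split())
        return sorted(results)


cluster = Cluster(num_shards=4, replicas_per_shard=2)
cluster.add("doc-a", "google cluster architecture")
cluster.add("doc-b", "commodity cluster hardware")
cluster.add("doc-c", "web search for a planet")
print(cluster.search("cluster"))
```

Because a query fans out to every shard and waits for all of them, latency is governed by the slowest shard, which is one way to approach the latency question raised on the slide.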

14 Hardware
Based on commodity x86 products
Racks of servers
–40 to 80 servers/rack
–Each rack has two sides, about 40U/side
–Not targeting the top-performance servers; “large” (80GB) hard drives
Expect servers to work for two or three years

15 Hardware
Switches
–Each side of a rack has a 100Mbps Ethernet switch that connects to a core gigabit switch via one or two gigabit uplinks
–The core gigabit switch connects all racks together
Routing
Fiber links
Today we have 10Gbps switches. How would this change the way we compute?
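One way to reason about this network is its oversubscription ratio: the bandwidth servers on one rack side could demand, divided by the uplink bandwidth out of that side. The sketch below assumes 20 to 40 servers per side (half of the 40 to 80 servers per rack given on the previous slide); `oversubscription` is an illustrative helper, not a term from the slides.

```python
def oversubscription(servers: int, server_link_mbps: float,
                     uplinks: int, uplink_mbps: float) -> float:
    """Ratio of aggregate server bandwidth to aggregate uplink bandwidth."""
    return (servers * server_link_mbps) / (uplinks * uplink_mbps)


# 100Mbps per server and one or two gigabit uplinks are from the slide;
# 20 and 40 servers per rack side are inferred, so treat them as assumptions.
for servers in (20, 40):
    for uplinks in (1, 2):
        ratio = oversubscription(servers, 100, uplinks, 1000)
        print(f"{servers} servers, {uplinks} uplink(s): {ratio:.1f}:1")
```

A ratio above 1:1 means traffic leaving the rack can be bottlenecked at the uplink, so computation benefits from staying close to the data it reads. Faster switches relax that constraint, which is the point of the slide's closing question.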

16 Energy Efficiency
Calculation
–PC: 90W DC, 120W AC
–Rack: 10kW
–Power density: 400W/square ft, rising to 700W/square ft or more for high-end servers
–Typical datacenter’s power density: 150W/square ft
Solution: cooling and/or additional space
Reducing power consumption also lowers operational cost
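The figures on this slide fit together as a short calculation. In the sketch below the per-server wattage and the 150W/square ft datacenter figure come from the slide; the 80 servers per rack is the upper end of the range given earlier, and the 25 square ft rack footprint is an assumption, chosen because it is what a 10kW rack at 400W/square ft implies.

```python
AC_WATTS_PER_SERVER = 120        # from the slide
SERVERS_PER_RACK = 80            # upper end of the 40 to 80 range given earlier
ASSUMED_RACK_FOOTPRINT_SQFT = 25 # assumption: implied by 10kW at 400W/square ft
TYPICAL_DENSITY_W_PER_SQFT = 150 # from the slide


def rack_power_watts(servers: int, watts_per_server: float) -> float:
    return servers * watts_per_server


def power_density(rack_watts: float, footprint_sqft: float) -> float:
    return rack_watts / footprint_sqft


rack_watts = rack_power_watts(SERVERS_PER_RACK, AC_WATTS_PER_SERVER)
density = power_density(rack_watts, ASSUMED_RACK_FOOTPRINT_SQFT)

print(f"rack power: {rack_watts / 1000:.1f} kW")
print(f"power density: {density:.0f} W/square ft")
print(f"ratio to a typical datacenter: {density / TYPICAL_DENSITY_W_PER_SQFT:.2f}x")
```

The result lands close to the slide's rounded 10kW and 400W/square ft, and at more than twice the typical density it shows why the slide lists extra cooling or extra floor space as the remedy.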

17 Availability
Fault tolerance
–Multiple levels of load balancing, sharding, and replication
Disaster recovery
–Highly distributed geographically

18 Summary
Review the goals
A high-performance distributed system for search
–Hardware, networking, parallelization, software
Price-performance ratio
–Commodity PC servers, software reliability
Scalability
–Sharding, replication
Energy efficiency and cooling
High availability
–Redundancy, automatic failover, globally distributed system
Goals accomplished?

19 Summary
Design for price-performance ratio
Data centric parallelization
–Abundant thread-level parallelism
–Achieves very high throughput and low latency
Partition and replicate data and logic
–For reliability and performance
Multi-level load balancing
“Simple” is beautiful
Orchestrate global computing resources for global users

20 Questions and Limitations
How close are we to a good cloud computing infrastructure?
Like any system, the Google system as described in the paper has limitations
Can we improve?

21 Questions and Limitations
Update friendliness
–The consistency of the system relies on the fact that frequent data accesses (e.g., querying the index servers) are reads
Timeliness
–Multiple levels of load balancing, sharding, and replication
Hardware
–Is the current hardware hierarchy the ultimate design for Internet-based computing?

22 Questions and Limitations
Architecture
–Multiple-issue out-of-order execution is “beyond the point of diminishing returns”. What architectural designs can help further enhance the performance?
–The paper provides a few speculations
Data dependence
–The limitation of sharding
General review of the design context
–Has the design context changed?
Perfect solution?

23 Summary
The Google search system is a good example of solutions to Internet-scale problems
Today, many applications are more complex than search
There are many new challenges and opportunities as we gradually implement the idea of cloud computing