CS 142 Lecture Notes: Datacenters


Slide 1: Google Datacenter

Slide 2: Datacenter Organization

Single server:
● 8-24 cores
● DRAM: 100 ns
● Disk: 10 ms

Rack (~50 machines):
● DRAM: 300 µs
● Disk: 10 ms

Row/cluster (30+ racks):
● DRAM: 500 µs
● Disk: 10 ms
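The point of these numbers is the gap between local and remote access. A minimal Python sketch (my own illustration, using the slide's approximate figures) makes the ratios explicit:

```python
# Approximate access latencies from the slide above, in nanoseconds.
LATENCY_NS = {
    "local DRAM":        100,         # same server
    "DRAM in same rack": 300_000,     # ~300 µs, one network hop away
    "DRAM in same row":  500_000,     # ~500 µs, across the cluster
    "disk (any level)":  10_000_000,  # ~10 ms; disk dominates the network cost
}

base = LATENCY_NS["local DRAM"]
for name, ns in LATENCY_NS.items():
    print(f"{name:20s} {ns:>12,} ns  ({ns / base:>9,.0f}x local DRAM)")
```

Even within one rack, remote DRAM is roughly 3000x slower than local DRAM, which is why datacenter applications work hard to keep data on the machine that processes it.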

Slide 3: Google Containers

Slide 4: Microsoft Containers

Slide 5: Microsoft Containers, cont'd

Slide 6: Failures are Frequent

Typical first year for a new datacenter (Jeff Dean, Google):
● ~0.5 overheating events (power down most machines in < 5 mins, ~1-2 days to recover)
● ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
● ~1 rack move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
● ~1 network rewiring (rolling ~5% of machines down over a 2-day span)
● ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
● ~5 racks go wonky (40-80 machines see 50% packet loss)
● ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
● ~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
● ~3 router failures (have to immediately pull traffic for an hour)
● Dozens of minor 30-second blips for DNS
● ~1000 individual machine failures
● Thousands of hard drive failures
● Slow disks, bad memory, misconfigured machines, flaky machines, etc.
● Long-distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
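Taken together, these rates mean failure handling cannot be an exceptional path. A back-of-the-envelope sketch (my own illustration; "thousands" of drive failures is assumed here to be ~3000) converts the yearly counts into time between events:

```python
# Yearly event counts taken from the list above ("thousands" assumed ~3000).
YEARLY_EVENTS = {
    "individual machine failures": 1000,
    "hard drive failures":         3000,
    "rack failures":               20,
}

HOURS_PER_YEAR = 365 * 24  # 8760
for event, count in YEARLY_EVENTS.items():
    # Strip the plural "s" for readable output.
    print(f"~1 {event[:-1]} every {HOURS_PER_YEAR / count:,.1f} hours")
```

At these rates a machine dies roughly every 9 hours and a disk roughly every 3, so recovery must be automatic; no human can be in the loop for routine failures.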

Slide 7: How Many Datacenters?
● 1-10 datacenter servers/human?
● ~100,000 servers/datacenter
● 80-90% of general-purpose computing will soon be in datacenters?

              U.S.            World
Servers       0.3-3B          7-70B
Datacenters   3,000-30,000    70,000-700,000
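The table follows from the first two estimates above: multiply population by servers per human, then divide by 100,000 servers per datacenter. A quick sketch (hypothetical, reproducing that arithmetic) regenerates the ranges:

```python
# Reproduce the slide's table from its own assumptions.
SERVERS_PER_HUMAN = (1, 10)       # slide's estimate: 1-10 servers per person
SERVERS_PER_DATACENTER = 100_000  # slide's estimate

for region, population in [("U.S.", 0.3e9), ("World", 7e9)]:
    lo, hi = (population * s for s in SERVERS_PER_HUMAN)
    print(f"{region:6s} servers: {lo:,.0f}-{hi:,.0f}   "
          f"datacenters: {lo / SERVERS_PER_DATACENTER:,.0f}-"
          f"{hi / SERVERS_PER_DATACENTER:,.0f}")
```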


Slide 9: Sun Containers

Slide 10: Sun Containers, cont'd