TOTAL 23 SLIDES BELOW The network is Reliable An informal survey of real-world communications failures BY PETER BAILIS AND KYLE KINGSBURY.



CONTENTS
Abstract
Survey reports of network reliability under different circumstances
Conclusion

ABSTRACT
“The network is reliable” is one of the fallacies of distributed computing.
The degree of network reliability is critical for systems to function robustly.
The degree of network reliability is hard to determine.

SURVEY REPORTS OF NETWORK RELIABILITY UNDER DIFFERENT CIRCUMSTANCES

LARGE DEPLOYMENTS & ISSUES
What are large deployments? Globally run distributed systems whose infrastructure spans hundreds of thousands of servers.
What is considered a serious issue in large deployments? Partitions: a network partition is a split of the network into disconnected parts, typically caused by the failure of a network device.

LARGE DEPLOYMENTS & ISSUES (CONTD.)
Example: behavior of network failures in Microsoft datacenters
Average failure rate: 5.2 devices/day and 40.8 links/day.
Each failure causes a loss of packets.
Median time to repair is approximately five minutes.
Redundancy improves median traffic by 43%.
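To put the daily failure rates above in perspective, they can be converted into an average gap between failures and a yearly count. This is a small arithmetic sketch; the function names are made up for illustration, and it assumes failures are spread evenly over time, which real failures are not.

```python
# Failure rates quoted on the slide for the Microsoft datacenter study.
DEVICE_FAILURES_PER_DAY = 5.2
LINK_FAILURES_PER_DAY = 40.8
MINUTES_PER_DAY = 24 * 60


def minutes_between_failures(failures_per_day: float) -> float:
    """Average gap between failures, assuming an even spread over the day."""
    return MINUTES_PER_DAY / failures_per_day


def failures_per_year(failures_per_day: float, days: int = 365) -> float:
    """Yearly failure count implied by a constant daily rate."""
    return failures_per_day * days


if __name__ == "__main__":
    rates = [("device", DEVICE_FAILURES_PER_DAY), ("link", LINK_FAILURES_PER_DAY)]
    for name, rate in rates:
        gap = minutes_between_failures(rate)
        yearly = failures_per_year(rate)
        print(f"{name}: one failure every {gap:.1f} min, about {yearly:.0f} per year")
```

At these rates a link fails somewhere in the network roughly every 35 minutes, so failure handling is a routine code path rather than an exceptional one.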

LARGE DEPLOYMENTS & ISSUES (CONTD.)
Example: network failures in HP's managed networks
Based on an analysis of support ticket data.
Connectivity-related tickets accounted for 11.4% of tickets, 14% of which were of the highest priority level.
Median incident duration: 2 hours 45 minutes for the highest-priority tickets and 4 hours 18 minutes for all tickets.

LARGE DEPLOYMENTS & ISSUES (CONTD.)
Example: the first year of a new Google cluster involves
Five faulty racks (40–80 machines seeing 50% packet loss)
Eight network maintenances (four of which might cause 30-minute random connectivity losses)
Three router failures (traffic has to be pulled immediately for an hour)

LARGE DEPLOYMENTS & ISSUES (CONTD.)
How do these companies try to cope with network partitions?
Google (per Dean): “easy-to-use” abstractions
PNUTS: weaker consistency alternatives

DATACENTER NETWORK FAILURES
Pictured: a Google datacenter
Main causes of failures: 1) Power failure 2) Misconfiguration 3) Firmware bugs 4) Topology changes 5) Cable damage 6) Malicious traffic

CLOUD NETWORKS
What are cloud networks?
Key issues: 1) Transient latency 2) Dropped packets 3) Full network partitions

CLOUD NETWORKS (CONTD.)
What happens when two nodes are connected to the Internet but unable to see each other?
What can we learn from this case?

HOSTING PROVIDERS
Can hosting providers offer reliable networks?
E.g. Freistil IT: a specific data center saw 50%–100% packet loss, which led the GlusterFS distributed file system to enter split-brain undetected.
Why?
What is the main issue?
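One common guard against split-brain is to let a node accept writes only while it can see a strict majority of the cluster. The sketch below shows that generic majority-quorum rule; it is not a description of how GlusterFS behaves, and `has_quorum` and its parameters are hypothetical names chosen for illustration.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this node plus the peers it can reach form a strict majority.

    reachable_peers: number of OTHER nodes this node can currently contact.
    cluster_size: total number of nodes configured in the cluster.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    if not 0 <= reachable_peers < cluster_size:
        raise ValueError("reachable_peers must be between 0 and cluster_size - 1")
    visible = reachable_peers + 1  # count this node itself
    return visible > cluster_size // 2


if __name__ == "__main__":
    # A five-node cluster partitioned into groups of three and two nodes.
    print("majority side may write:", has_quorum(reachable_peers=2, cluster_size=5))
    print("minority side may write:", has_quorum(reachable_peers=1, cluster_size=5))
```

Because at most one side of a partition can hold a strict majority, two sides can never both accept writes; the cost is that the minority side becomes unavailable for writes until the partition heals.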

WIDE AREA NETWORKS (WAN)
Why are WAN failures particularly interesting?
Example: CENIC, average partition duration (over 5 years): software-related failures (SRF) 6 minutes; hardware-related failures (HRF) 8.2 hours.
Conclusion: graceful degradation under partition or increased latency is especially important for WANs.
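Graceful degradation can take many forms. One minimal sketch, assuming that stale reads are acceptable to the application, is to serve the last known value when the remote site cannot be reached. `DegradingReader` and the `fetch` callable are hypothetical; `fetch` is assumed to enforce its own timeout and to raise `TimeoutError` or `ConnectionError` on failure.

```python
from typing import Callable, Dict, Tuple


class DegradingReader:
    """Reads through a remote fetch function, falling back to stale cached data."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    def read(self, key: str) -> Tuple[str, bool]:
        """Return (value, is_fresh); is_fresh is False when served from cache."""
        try:
            value = self._fetch(key)
        except (TimeoutError, ConnectionError):
            if key in self._cache:
                # Partition or high latency: degrade to the last value we saw.
                return self._cache[key], False
            raise  # nothing cached, so the failure must surface to the caller
        self._cache[key] = value
        return value, True
```

Returning the freshness flag alongside the value lets the caller decide whether a stale answer is good enough for the operation at hand.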

GLOBAL ROUTING FAILURES
Can an Internet system with a high level of redundancy be safe?
1) Firewall configuration error, e.g. CloudFlare
2) Firmware bug, e.g. Juniper Networks
3) BGP misconfiguration, e.g. Pakistan Telecom

NICS AND DRIVERS
Firmware bug (NIC problem): e.g. BCM5709 (chip model)
Misconfiguration (driver problem): e.g. bnx2

APPLICATION-LEVEL FAILURES
What causes messages to be dropped or delayed?
1) Crashes 2) Program errors 3) Scheduler latency 4) Overloaded processes
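From a peer's point of view, all four causes look the same: heartbeats stop arriving on time. The sketch below is a simple timeout-based failure detector, with hypothetical names, that makes this visible. A node delayed by scheduler latency or overload is suspected exactly like a crashed one, and the suspicion is withdrawn when its next heartbeat arrives.

```python
from typing import Dict


class HeartbeatMonitor:
    """Suspects a node when no heartbeat has arrived within timeout_s seconds."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record(self, node: str, now: float) -> None:
        """Note that a heartbeat from `node` arrived at time `now` (seconds)."""
        self._last_seen[node] = now

    def suspected(self, node: str, now: float) -> bool:
        """True if `node` was never heard from or its last heartbeat is too old."""
        last = self._last_seen.get(node)
        return last is None or now - last > self.timeout_s


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=2.0)
    monitor.record("a", now=0.0)
    print(monitor.suspected("a", now=1.5))  # within the timeout
    print(monitor.suspected("a", now=5.0))  # late: crashed or merely slow?
    monitor.record("a", now=5.1)            # the delayed heartbeat finally arrives
    print(monitor.suspected("a", now=5.2))
```

Timestamps are passed in explicitly so the behavior is easy to test; a real monitor would more likely read a monotonic clock.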

CONCLUSION
Where do communication failures occur?
Processes, servers, NICs, switches, local and wide area networks, etc.

CONCLUSION (CONTD.)
Does a reliable network exist?
It depends on: 1) Cautious engineering 2) Aggressive network advances 3) Lots of investment

CONCLUSION (CONTD.)
What can we do?
Consider the risk before a partition occurs.

QUESTIONS TIME ! LOL!


THANK YOU FOR YOUR PATIENCE