Network Automation
Albert Greenberg, Nick Feamster, Richard Mortier, Mark Poepping, Lun Li, Sharad Agarwal, Changhoon Kim, Ranveer Chandra, et al.



2 What is network automation?
The performance of the following network tasks with minimal human involvement:
  – Provisioning
  – Detection
  – Diagnosis
  – Remediation
Corollary: Humans become involved with network operation at higher levels (i.e., not repeatedly doing the same painful tasks)

3 Some Questions
Why automate?
What to automate? (desired end states)
How do we get there?
Robotize current methodology, or rethink?
Self-correction (like biological systems, e.g., DNA)
What are the roadblocks?
Are our network element building blocks and their behavior fit for automation?
Big guard rails?

4 Why Automate?
Human cost
  – Are we talking about making operators redundant?
  – No…it’s more about automating folklore?
  – Care costs >> Ops costs, so self-help >> self-managing?
Reliability!!!
  – Continuous high-quality service – very high availability
  – Faster detection, remediation, etc.
Scale!!!
  – How else to keep up with feature creep?
  – “Every case is a special case” (we don’t really believe this)

5 What to Automate?
Proactive Piece
  – Is-ness spec driving automation?
Reactive Piece
  – Detection (See)
      Possible to monitor and detect network problems?
      What datasets are needed?
      How to do correlation of those datasets? (metadata)
      The role of detection vs. statistical analysis
  – Diagnosis (Know)
      Again, what data needs to be collected to make this possible?
      Statistics-based vs. model-based?
  – Remediation (Restore)
      Do we want automated scripts?
      How far along this spectrum to go? (Many answers.)

6 Vision
Network operators plug in boxes, and walk away…sort of
  – A small set of policies trigger programs which write programs which write programs which … realizes the network
  – A small set of probes provide all measurements and event collection/correlation needed to support internal metrics and external SLAs
Knowledge database
  – Operators become specialists: forensics, software development, etc. (operation at a higher level, less fire-fighting)
Caveat: there will always be a need for amazing people, but doing more introspective work (design, test, certification … and automation override when needed)

7 Roadblocks
Cost
Complexity
Data
Knowledge
Human factors

8 Obstacle 1: Cost
Automation costs money and time
  – Worth detecting if there’s nothing to do about it?
  – Worth automating if the operation only happens once?
Alternate solution 1: Monkeys
  – At what point is it time to automate the corner case?
Alternate solution 2: Overprovision
  – Perhaps we can ride out the storm… (or expect failures and design low-cost systems so that they don’t really matter)
  – The server community has seen that repeatable, simple components plus software can provide both very low cost and a resilient whole (e.g., Google switching and computing platform)
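The question "worth automating if the operation only happens once?" can be framed as a break-even count. The following is a minimal sketch, assuming costs are expressed in a single unit such as operator-hours; the function name, parameters, and any numbers used with it are hypothetical and do not come from the slides.

```python
import math


def break_even_runs(build_cost, manual_cost_per_run, automated_cost_per_run):
    """Return the smallest number of runs at which automating is strictly
    cheaper than doing the task by hand, or None if it never pays off.

    All three arguments are assumed to be in the same unit
    (for example operator-hours); the model ignores maintenance cost.
    """
    saving_per_run = manual_cost_per_run - automated_cost_per_run
    if saving_per_run <= 0:
        # Automation saves nothing per run, so the build cost is never recovered.
        return None
    # Automation wins once n * saving_per_run > build_cost.
    return math.floor(build_cost / saving_per_run) + 1
```

For example, with a 40-hour build cost, 2 hours per manual run, and 0.5 hours per automated run, `break_even_runs(40, 2, 0.5)` gives 27 runs. A task that happens once falls far short of that, which is the slide's point about corner cases.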

9 Obstacle 2: Complexity
How to manage it?
  – Dummy boxes and lots of wires/stitching
  – Monolithic box with complexity in configuration
Fewer types of boxes, templates, ways to do essentially the same thing?
  – Coke’s network vs Pepsi’s network?

10 Obstacle 3: Data
Lots of inputs
  – Topology
  – Configuration
  – Fault events (measured and logged)
  – Performance events (e.g., active measurements)
  – Version numbers
  – Fiber mappings
Metadata
  Crucial! Version numbers, gaps in data collection, collection method, staleness…
  If this data becomes inconsistent, big surprises!
Challenges
  – Correlation: what to do when data isn’t correlated?
  – Privacy and sharing issues
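One way to read the metadata point is that every record should carry its own collection time, collection method, and version, so that stale or mismatched data is filtered out before any correlation is attempted. The sketch below assumes a hypothetical `Measurement` record and a hypothetical `usable` filter; the field names and the example sources in the comments are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    source: str           # hypothetical label, e.g. "topology" or "fault_log"
    value: object         # the measured payload itself
    collected_at: float   # collection time, seconds since the epoch
    method: str           # how it was collected (hypothetical, e.g. "snmp_poll")
    schema_version: str   # version of the format the record was written in


def usable(records, now, max_age_s, required_version):
    """Split records into usable ones and rejected ones with a reason."""
    ok, rejected = [], []
    for r in records:
        if now - r.collected_at > max_age_s:
            rejected.append((r, "stale"))
        elif r.schema_version != required_version:
            rejected.append((r, "version mismatch"))
        else:
            ok.append(r)
    return ok, rejected
```

Keeping the rejection reason alongside each record makes gaps in data collection visible instead of letting them silently skew a diagnosis.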

11 Obstacle 4: Human Nature/Corner Cases
Operators are used to touching routers
Automation effectively adds a “shim”
Humans will likely want a way to bypass the configuration database
  – How to maintain consistency between human tweakage and the database?
  – How to evolve the automation database? (when does a corner case become “normal”?)
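The consistency question can be made concrete as a drift check: compare what the configuration database intends with what is actually running on the device, and report every difference. This is a minimal sketch, assuming both sides can be flattened into key/value dictionaries; `config_drift`, `ABSENT`, and the setting names in the usage note are hypothetical, not part of any real router API.

```python
# Sentinel marking a setting that exists on only one side.
ABSENT = object()


def config_drift(intended, running):
    """Compare intended (database) settings with running (device) settings.

    Returns a dict mapping each differing key to a tuple
    (intended_value, running_value); ABSENT stands in for a missing side.
    """
    drift = {}
    for key in sorted(set(intended) | set(running)):
        want = intended.get(key, ABSENT)
        have = running.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift
```

For instance, comparing `{"mtu": 9000, "ntp": "10.0.0.1"}` from the database against `{"mtu": 1500, "acl_tmp": "permit any"}` on the device reports three drifted keys: a changed `mtu`, a missing `ntp`, and an `acl_tmp` that only a human added. Whether such an entry is reverted or promoted into the database is the "when does a corner case become normal" decision, which stays with the operator.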