Automate all the things! David Wilde david.wilde@aarnet.edu.au
Disclaimer
Description of this talk, as seen on https://tnc18.geant.org/
Automate all the things!
AARNet is on a journey. In the beginning, network engineering was done by old-school artisans. Configurations were lovingly hand-crafted. Routers and switches petted and cared for. Loving attention was paid to each device under monitoring. This was fine…. until it wasn’t fine any more. We’re in 2018 now. It turns out that our CloudStor synch&share service got to the point where the number of users shot up from 500 to 50,000 over the course of a couple of years, and we ended up manually configuring hundreds of containers; that was really spreading the love too thin. Our systems have become cattle, not pets. AARNet network engineers are reluctantly handing in their CLI licences and learning to navigate Ansible + python + Git + Jenkins. Operator error is reduced, time to deployment is minimised, troubleshooting tools are improved, integration with applications is achieved. It all sounds wonderful. But - it’s still a journey. What have we achieved to date? What has turned out to be harder than expected? All shall be revealed…
© AARNet Pty Ltd
Disclaimer
I LIED: NOT ALL SHALL BE REVEALED
Description of this talk, as seen on https://tnc18.geant.org/
Automate all the things!
AARNet is on a journey. BLAH BLAH BLAH BLAH BLAH BLAH ... All shall be revealed…
Why automate?
'cos your boss told you to? https://www.cio.com/article/3124505/big-data/september-2016-big-data-on-campus.html
Really why?
Why automate? Number of services in 2014
Why automate? Number of services in 2016
Why automate? Number of services in 2017
So, why?
Why: really
Because we’re LAZY
- Don’t want to do anything twice
- Don’t want to deal with the manual effort coming from operational inconsistencies
- Don’t want to fix problems due to human error
(And because it’s fun)
How
How
Warning - unfortunate truth follows: Automation is hard. (Although not always technically hard.)
- Cultural change: no longer touching the router
- Skills change: network engineer >> coder
- Organisational change: agile, scrum, devops, SRE
Automation – Initial (non-destructive) steps
- Document your procedures and business processes
- Verify these processes. Verify again.
- Audit the network against your source of truth.
- Start small. Be willing to change.
Grey Box Monitoring & Analysis
- Is your source of truth valid? Or is the network the source of truth? Alarm in case of mismatch!
- Start thinking like a coder. Run regression checks. Check before and after maintenance windows.
- Do you monitor DOM statistics on an interface today?
- Do you know what type of optic is in a port, so you know if the DOM stats are within spec or not?
- Do you know if your firewall rules for protecting your RE are what you expect? Do you know if they are applied on the Lo0 interface?
- Do you know if your CoS/QoS policy is applied correctly?
- Do you know if your interface IPv4/IPv6 addresses have correct forward and reverse entries in DNS?
- Do you know when your box has produced a core-dump?
- Do you check daily or do you only look when there is a fault that you saw?
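The "audit the network against your source of truth" and "alarm in case of mismatch" steps above can be sketched in a few lines of Python. This is only an illustration, not AARNet's tooling: the device names, the attribute keys (`lo0_filter`, `cos_policy`) and the `audit` function are all hypothetical, and the observed state is hard-coded here where a real audit would collect it from the devices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    device: str
    key: str
    expected: object
    actual: object


def audit(source_of_truth, observed):
    """Compare intended state with observed state, device by device."""
    mismatches = []
    # Walk the union of devices so that unknown or missing devices are reported too.
    for device in sorted(set(source_of_truth) | set(observed)):
        expected = source_of_truth.get(device, {})
        actual = observed.get(device, {})
        for key in sorted(set(expected) | set(actual)):
            if expected.get(key) != actual.get(key):
                mismatches.append(
                    Mismatch(device, key, expected.get(key), actual.get(key))
                )
    return mismatches


# Hypothetical data: what the source of truth says, and what the network reports.
SOURCE_OF_TRUTH = {
    "rtr1": {"lo0_filter": "protect-re", "cos_policy": "core"},
    "rtr2": {"lo0_filter": "protect-re", "cos_policy": "core"},
}
OBSERVED = {
    "rtr1": {"lo0_filter": "protect-re", "cos_policy": "core"},
    "rtr2": {"lo0_filter": None, "cos_policy": "core"},
}

if __name__ == "__main__":
    for m in audit(SOURCE_OF_TRUTH, OBSERVED):
        print(f"ALARM {m.device}: {m.key} expected {m.expected!r}, found {m.actual!r}")
```

Running the same comparison before and after a maintenance window gives a simple regression check: any mismatch that appears only in the second run was introduced by the change.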
Smart Small
“Network services audit tool”
Note – a network engineer would have produced a wall of ASCII, fed by the command line…
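One check a small audit tool could run is the forward and reverse DNS question raised earlier. The sketch below is an assumption about how such a check might look, not a description of the tool on the slide: `check_dns` and the sample records are hypothetical, and the DNS answers are passed in as plain dictionaries so the logic can be tried without touching a resolver.

```python
import ipaddress


def _norm(name):
    """Normalise a hostname: drop the trailing dot and ignore case."""
    return name.rstrip(".").lower()


def check_dns(interfaces, forward, reverse):
    """Return a list of forward/reverse DNS problems for interface addresses.

    interfaces: expected hostname -> address string
    forward:    hostname -> set of address strings (A/AAAA answers)
    reverse:    address string -> hostname (PTR answer)
    """
    # Parse addresses so that different IPv6 spellings compare equal.
    ptr_by_ip = {ipaddress.ip_address(a): _norm(h) for a, h in reverse.items()}
    problems = []
    for name, addr in sorted(interfaces.items()):
        ip = ipaddress.ip_address(addr)
        answers = {ipaddress.ip_address(a) for a in forward.get(name, ())}
        if ip not in answers:
            problems.append(f"{name}: forward record for {ip} is missing")
        ptr = ptr_by_ip.get(ip)
        if ptr is None:
            problems.append(f"{ip}: reverse record is missing")
        elif ptr != _norm(name):
            problems.append(
                f"{ip}: reverse record points to {ptr}, expected {_norm(name)}"
            )
    return problems


# Hypothetical records using documentation address ranges and example.net names.
INTERFACES = {
    "ge-0-0-0.rtr1.example.net": "192.0.2.1",
    "ge-0-0-1.rtr1.example.net": "2001:db8::1",
}
FORWARD = {
    "ge-0-0-0.rtr1.example.net": {"192.0.2.1"},
    "ge-0-0-1.rtr1.example.net": {"2001:0db8:0:0:0:0:0:1"},
}
REVERSE = {
    "192.0.2.1": "ge-0-0-0.rtr1.example.net.",
    "2001:db8::1": "old-name.example.net.",
}

if __name__ == "__main__":
    for problem in check_dns(INTERFACES, FORWARD, REVERSE):
        print(problem)
```

Keeping the lookups outside the function is a deliberate choice: the same check can then be fed from live DNS queries, a zone file export, or test fixtures.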
Next Level 31337
Grey Box Monitoring & Analysis
1. Build a virtual model http://www.eve-ng.net/
2. Build a physical lab http://www.flickriver.com/photos/anachrocomputer/3080420597/
3. Build UNIT TESTS https://aws.amazon.com/blogs/mobile/automated-device-testing-with-aws-device-farm-and-jenkins/
4. The holy grail: CI/CD
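Step 3 above, building unit tests, could look something like the following for code that generates configuration. This is a minimal sketch under stated assumptions: `render_interface` is a hypothetical, vendor-neutral stand-in for whatever templates or scripts produce the real configuration, and the stanza format is invented for the example.

```python
import unittest


def render_interface(name, description, ipv4):
    """Render a hypothetical interface stanza from structured data."""
    if "/" not in ipv4:
        raise ValueError(f"{ipv4!r} needs a prefix length, e.g. 192.0.2.0/31")
    return "\n".join(
        [
            f"interface {name}",
            f"  description {description}",
            f"  ip address {ipv4}",
        ]
    )


class RenderInterfaceTest(unittest.TestCase):
    def test_renders_all_fields(self):
        text = render_interface("ge-0/0/0", "link to rtr2", "192.0.2.0/31")
        self.assertEqual(
            text.splitlines(),
            [
                "interface ge-0/0/0",
                "  description link to rtr2",
                "  ip address 192.0.2.0/31",
            ],
        )

    def test_rejects_address_without_prefix_length(self):
        # A missing prefix length should fail in the test run, not on the router.
        with self.assertRaises(ValueError):
            render_interface("ge-0/0/0", "link to rtr2", "192.0.2.0")
```

Saved as a file, the tests run with `python -m unittest <file>`. A CI server such as Jenkins can run the same command on every commit, which is the first piece of step 4.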
Useful links
Virtual testbed
- EVE-NG (http://www.eve-ng.net)
- VRNetLab (https://github.com/plajjan/vrnetlab)
- VIRL (http://virl.cisco.com)
- Wistar (https://github.com/Juniper/wistar)
- GNS3 (http://www.gns3.com)
CI/CD
- Jenkins (https://jenkins.io/)
- Travis (https://travis-ci.org/)
- TeamCity (http://www.jetbrains.com/teamcity/)
Automation
- Ansible (https://www.ansible.com/)
- SaltStack (https://saltstack.com/)
- Puppet (https://puppet.com/)
- Chef (https://www.chef.io/)
- Juniper PyEZ (https://github.com/Juniper/py-junos-eznc)
- VMX (https://www.juniper.net/us/en/products-services/routing/mx-series/vmx/)
Join the conversation
Slack channel for NRENs: https://nren.slack.com/
Thank you David Wilde david.wilde@aarnet.edu.au