Testing in Production: Key to Data Driven Quality #GHC14
Jyoti Jacob, Senior Software Engineer, Microsoft
10/9/2014
Agenda
- Why and what is Testing in Production?
- Types of testing: Analytics vs. Synthetics
- Flighting and experimentation
- TiP in other companies
- My key takeaways
Traditional Software Development
- Scrub dates
- Longer deployment cycles
Credits: Patrick Patterson, Director of Test, Apps, Microsoft
Challenges of a Service
- Meet availability SLA
- Good customer experience
- Faster detection, faster recovery
- Failure is not an option, yet it is inevitable
- Unpredictable user interactions
- Environment and partner dependencies
- Agility = faster deployment
- Learn from the service
- Easier to deploy, easier to get to the customer
- Telemetry data
- Production is difficult to mimic
Development in a Service
- Design: scenarios, metrics, experimentation
- Develop: implementation, validation
- Deploy: TiP, flighting
- Evaluate: analyze, adjust
Result: data driven decisions, continuous development, faster deployment, and availability with agility. Each feature is released separately.
Credits: Patrick Patterson, Director of Test, Apps, Microsoft
Testing in Production (TiP)
"Testing in production (TiP) is a set of software methodologies that derive quality assessments not from test results run in a lab but from where your services actually run – in production."
– Seth Eliot, Principal Knowledge Engineer, Microsoft
Types of Testing
Synthetics
Robot-initiated actions for availability:
- Goal is availability
- Simulate customer scenarios
- Should trigger alerts on failure
- E.g. active monitoring, customer simulation, fault injection
- Useful for maintaining the SLA
Probes run every few minutes; a failing probe triggers an alert.
Example alert: "Production synthetic transaction: HomePage availability drops to 4% in content farm <farm info>"
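A minimal sketch of such a synthetic probe, assuming a hypothetical HomePage URL, rolling-window availability threshold, and alert hook (illustrative only, not the talk's implementation):

# Synthetic availability probe: hit a key page on a schedule, track a rolling
# success rate, and raise an alert when availability drops below a threshold.
# PROBE_URL, WINDOW, THRESHOLD, and alert() are illustrative assumptions.
import time
from collections import deque
from urllib.error import URLError
from urllib.request import urlopen

PROBE_URL = "https://example.com/HomePage"  # hypothetical page under probe
WINDOW = 20                                 # number of recent probes considered
THRESHOLD = 0.95                            # alert if availability falls below 95%
INTERVAL_SECONDS = 120                      # "triggered every few minutes"

results = deque(maxlen=WINDOW)

def probe_once() -> bool:
    """Simulate a customer request; success == HTTP 200 within 5 seconds."""
    try:
        with urlopen(PROBE_URL, timeout=5) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

def alert(message: str) -> None:
    """Stand-in for a real alerting pipeline (pager, monitoring system)."""
    print(f"ALERT: {message}")

while True:
    results.append(probe_once())
    availability = sum(results) / len(results)
    if len(results) == WINDOW and availability < THRESHOLD:
        alert(f"Production synthetic transaction: HomePage availability at {availability:.0%}")
    time.sleep(INTERVAL_SECONDS)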
Analytics
Data-driven validation:
- Data analysis and alerts based on conditions
- Real users mostly == varying actions; measures the true customer experience
- Analyzing logs and data
- Alerts are not always urgent; they can be based on a threshold, a condition, or recurrence over a period
- E.g. an API failed for a specific locale on an unknown browser
Diagram: PII-scrubbed logs (request start, success, failure, perf latency) feed detection (including machine learning) that raises alerts; used for issue, perf, and usage analysis.
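A minimal sketch of this kind of log-driven detection, assuming a hypothetical log record shape and failure-rate threshold (not from the talk):

# Analytics-style detection over PII-scrubbed request logs: group by
# (api, locale, browser) and alert when a failure-rate threshold is crossed.
from collections import defaultdict

FAILURE_RATE_THRESHOLD = 0.10   # alert if more than 10% of requests fail
MIN_REQUESTS = 50               # ignore buckets with too little traffic

def detect_issues(log_records):
    """log_records: iterable of dicts such as
    {"api": "GetProfile", "locale": "de-DE", "browser": "UnknownBrowser",
     "success": False, "latency_ms": 840}  (illustrative field names)."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in log_records:
        key = (rec["api"], rec["locale"], rec["browser"])
        totals[key] += 1
        if not rec["success"]:
            failures[key] += 1

    alerts = []
    for key, total in totals.items():
        rate = failures[key] / total
        if total >= MIN_REQUESTS and rate > FAILURE_RATE_THRESHOLD:
            api, locale, browser = key
            alerts.append(f"{api} failing at {rate:.0%} for locale {locale} on {browser}")
    return alerts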
Analytics Dashboard
API Analysis Example
Feature Flighting Deployment
Deployment (a.k.a. feature toggle; deployment does not equate to release)

if (featureEnabled) { DoSomething(); }

You can check features in to production, but the code path will never be hit if the feature is not enabled. The flag acts as a "Big Red Switch" that can be flipped across Test, Pre-Production, and Production.
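A minimal feature-flag sketch along these lines: the new code path ships to production but stays dark until the flag is flipped, optionally for only a percentage of users. The flag name, config source, and hashing scheme below are illustrative assumptions:

# Feature flag with a percentage rollout. In practice FLAGS would come from a
# config service that can be changed without redeploying (the "big red switch").
import hashlib

FLAGS = {
    "new_homepage": {"enabled": True, "rollout_percent": 10},
}

def is_enabled(flag_name: str, user_id: str) -> bool:
    """Return True if the flag is on for this user (stable per-user bucketing)."""
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Hash the user id so the same user always lands in the same 0-99 bucket.
    bucket = int(hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout_percent"]

def render_homepage(user_id: str) -> str:
    if is_enabled("new_homepage", user_id):
        return "new homepage"   # flighted code path, dark for most users
    return "old homepage"       # default experience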
Feature Feedback Even during Design
A/B Testing (a.k.a. Online Controlled Experimentation)
- Most users get the original experience (control)
- Some users are offered the new experience or feature (experiment)
- For success: the fewest possible variables should change
- Removes random noise and assumptions; detrimental (confounding) variables are masked out
- E.g. data shows that when ice cream consumption increases, drowning increases: correlation, not causation (both rise in summer)
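A minimal A/B sketch: deterministically split users into control and experiment, then compare a success metric with a two-proportion z-test. The 50/50 split, the conversion metric, the example counts, and the 1.96 significance cutoff are illustrative assumptions, not figures from the talk:

# Stable variant assignment plus a simple significance check on conversion rates.
import hashlib
import math

def assign_variant(experiment: str, user_id: str) -> str:
    """Stable assignment so a user always sees the same variant."""
    h = int(hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest(), 16)
    return "treatment" if h % 2 == 0 else "control"

def two_proportion_z(successes_c, n_c, successes_t, n_t):
    """Return the z statistic for the difference in success rates."""
    p_c, p_t = successes_c / n_c, successes_t / n_t
    p_pool = (successes_c + successes_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    return (p_t - p_c) / se

# Example (made-up numbers): 1,000 users per variant; 120 vs 160 conversions.
z = two_proportion_z(successes_c=120, n_c=1000, successes_t=160, n_t=1000)
print(f"z = {z:.2f}; |z| > 1.96 suggests a significant difference at ~95% confidence")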
TiP in Other Companies
- Data-driven quality: "Netflix is a log generating company that also happens to stream movies" – Adrian Cockcroft
- A/B and multivariate testing for experimentation
- Many have adopted some form of dark or ramped deployment
- Shadowing, e.g. Google
My Key Takeaways
What not to do:
- Expose PII
- Run too many synthetic transactions
- Expose test data to customers
Key learnings:
- "It is a capital mistake to theorize before one has data" – Sherlock Holmes
- Engineers need easy access to data and debug boxes, but with security considerations
- Synthetics + Analytics + measurements against key performance indicators (KPIs) = quality assessments
Questions at the end of the session.
Got Feedback?
Rate and review the session using the GHC Mobile App. To download, visit www.gracehopper.org