Published by Elinor Gardner. Modified over 6 years ago.
2
Machine Learning Turbo-Charges the Ops Portion of DevOps
Sampanna Salunke, Consulting Member Technical Staff, Oracle Management Cloud
March 2017
4
The Product Area I Work On
Our Vision
Complete, integrated suite of systems management solutions
Designed for heterogeneous applications and infrastructure
Rapid time to value
Services: Application Performance Monitoring, Infrastructure Monitoring, Log Analytics, IT Analytics, Security Monitoring & Analytics, Orchestration, Compliance
Also covers: On Premise
5
Program Agenda
1. Defining terms
2. Why Machine Learning is Perfect for (Dev)Ops
3. Making Machine Learning Smarter
4. Q&A
7
Defining Terms (source: Wikipedia)
Machine Learning
Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed. Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data.
DevOps
DevOps (a clipped compound of "software DEVelopment" and "information technology OPerationS") is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes.
9
IT Organizations are Drowning in Data
Too many tools
Too much data
No insight
10
Rate of Change Increasing Due to DevOps Automation
Continuous Integration: Develop, Build, Package, Deploy
11
Machine Learning is Perfect for (Dev)Ops
Structured, Time-Series Data: User Performance Metrics, Server-side Performance Metrics (App & Infrastructure), Configurations, Events/Alerts, Transaction Payloads
Unstructured Text Data: Log Records
Characteristics of this data:
Massive volume
Highly patterned
Predictable format
Exists in identifiable silos
Exhibits long-term trends
Sources constantly change
12
Machine Learning Powers Oracle Management Cloud
[Diagram: Unified Platform spanning the monitored tiers]
Tiers: End User Experience, Application, Middle Tier, Data Tier, Virtualization Tier, Infrastructure Tier
Data collected: Real Users, Synthetic Users, App metrics, Transactions, Server metrics, Diagnostics, Logs, Host metrics, VM metrics, Container metrics, CMDB, Tickets, Alerts
Machine learning capabilities: Anomaly detection, Clustering, Forecasting, Correlation
14
To us, “smarter” means 3 things…
Enhance Algorithms
Increase Breadth
Increase Depth
15
Threshold Based Alerting is Being Eclipsed
Before you shout at me: threshold-based alerting is a must for many situations, especially for user-facing application response times (for example, a page should always load in less than a second).
For everything else, the standard was to set thresholds manually or by percentile.
Manual thresholds are becoming increasingly impractical: what should the thresholds be, and who is going to set them?
Percentile-based alerting had its day, but its alert volume does not scale.
If alerts are set at the 99.9th percentile, then 1 million metrics produce about 1,000 alerts per sample.
If those metrics are sampled every 5 minutes, that is 1,000 alerts every 5 minutes, or 200 alerts per minute. That is NOT OK.
OMC, and indeed the industry, is incrementally replacing thresholds with high-low channels derived from a time-series model such as Holt-Winters.
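The alert-volume arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's back-of-the-envelope estimate; the function names are illustrative, and it assumes every metric independently exceeds its percentile threshold at the nominal rate.

```python
def alerts_per_sample(num_metrics: int, percentile: float) -> float:
    """Expected alerts each time all metrics are sampled once.

    Assumes a metric exceeds its p-th percentile threshold
    (100 - p) percent of the time.
    """
    return num_metrics * (100.0 - percentile) / 100.0


def alerts_per_minute(num_metrics: int, percentile: float,
                      sample_interval_minutes: float) -> float:
    """Expected alert rate when samples arrive every sample_interval_minutes."""
    return alerts_per_sample(num_metrics, percentile) / sample_interval_minutes


# The slide's numbers: 1 million metrics, 99.9th percentile, 5-minute sampling.
per_sample = alerts_per_sample(1_000_000, 99.9)
per_minute = alerts_per_minute(1_000_000, 99.9, 5)
print(round(per_sample), round(per_minute))
```

Tightening the percentile only scales the problem down linearly, which is why the slide argues for model-derived channels instead.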
16
OMC’s Baselining & Anomaly Detection
Begin with the Basics
Distribution Based Unseasonal Model
Daily + Weekly Additive Holt-Winters Modeling
Automatic Season Detection
Tune Based on Validation
Robust to Sparse Pattern Variability
Robust to Small Anomalies
Graceful Transition from Daily-to-Weekly
Evaluation Model Segmentation
[Chart: CPU Utilization over Time, with these annotations]
No seasonality detected.
Daily seasonality detected. Baselines are wide because the metric has a weekly pattern.
Weekly seasonality detected, and baselines are much tighter around the observed values.
Anomalies because observations are higher than expected.
Anomalies because observations are lower than expected.
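To make the idea of a high-low channel around an additive Holt-Winters model concrete, here is a minimal single-season sketch. It is not OMC's implementation: the smoothing constants, the band built from a smoothed absolute error, the `min_half_width` floor, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple


def holt_winters_channel(series: List[float], period: int,
                         alpha: float = 0.3, beta: float = 0.05,
                         gamma: float = 0.2, width: float = 3.0,
                         min_half_width: float = 1.0
                         ) -> List[Tuple[int, float, float]]:
    """One-step-ahead (index, low, high) bands from additive Holt-Winters."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    first, second = series[:period], series[period:2 * period]
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / (period * period)
    season = [x - level for x in first]   # additive seasonal offsets
    dev = 0.0                             # smoothed absolute forecast error
    channel = []
    for t in range(period, len(series)):
        s = season[t % period]
        forecast = level + trend + s
        half = max(width * dev, min_half_width)
        channel.append((t, forecast - half, forecast + half))
        # Update the model only after the band for time t is fixed.
        x = series[t]
        dev = gamma * abs(x - forecast) + (1 - gamma) * dev
        new_level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % period] = gamma * (x - new_level) + (1 - gamma) * s
        level = new_level
    return channel


def find_anomalies(series: List[float],
                   channel: List[Tuple[int, float, float]]) -> List[int]:
    """Indices whose observation falls outside its band."""
    return [t for t, low, high in channel if not low <= series[t] <= high]


# A perfectly periodic toy metric with one injected spike at index 30.
series = [10.0, 20.0, 30.0, 20.0] * 10
series[30] += 50.0
flagged = find_anomalies(series, holt_winters_channel(series, period=4))
print(flagged)
```

On the toy data the first flagged index is the injected spike. A production model would add the daily-plus-weekly seasonality and season detection the slide lists, which this sketch leaves out.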
17
9x Improvement in False Positive Rate by Addressing Common Corner Cases
Graceful Day-to-Week Transition
Before: Weekdays and weekends are allowed to be imbalanced.
After: Select days to keep weekday-weekend balance.
Sparse Pattern Variability
Before: Flagged as an anomaly due to load/measurement variability.
After: Computing baselines at higher scale (hourly, configurable) solves this problem.
Small Anomalies
Before: Anomalies are out-of-band samples.
After: Anomalies are statistically significant out-of-band samples.
18
Scalability
Incremental updates to baseline models
Learning algorithms improve with more data.
Storing months of data for millions of targets is expensive.
Models are updated incrementally, so a model can reflect months of learning even when the data actually stored covers only a short duration.
Segmenting models when evaluating data
Testing incoming data for anomalies needs to be fast.
To speed up processing, models are cached.
But time-series models like Holt-Winters consume a lot of memory.
To reduce memory costs, the model is segmented, and only the part of the model required for processing the current time is cached.
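The two ideas above, incremental updates and segmented evaluation, can be sketched with a much simpler model than Holt-Winters. The example below is a stand-in, not OMC's design: it keeps running statistics per hour-of-week slot using Welford's algorithm, so raw samples never need to be stored, and evaluation fetches only the slot for the current time. The class and method names are hypothetical.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, Optional


class SlotStats:
    """Running count, mean and variance for one slot (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class SegmentedBaseline:
    """Model split into 168 hour-of-week segments."""

    def __init__(self) -> None:
        # Stands in for a persistent store; a cache would hold one segment.
        self.slots: Dict[int, SlotStats] = {}

    @staticmethod
    def slot_for(ts: datetime) -> int:
        return ts.weekday() * 24 + ts.hour

    def update(self, ts: datetime, value: float) -> None:
        """Fold one observation into the model, then discard it."""
        self.slots.setdefault(self.slot_for(ts), SlotStats()).update(value)

    def segment(self, ts: datetime) -> Optional[SlotStats]:
        """Return only the part of the model needed to evaluate ts."""
        return self.slots.get(self.slot_for(ts))


# Three Mondays at 09:00 update the same segment; no raw history is kept.
model = SegmentedBaseline()
monday_9am = datetime(2017, 3, 6, 9)
for week, value in enumerate([10.0, 12.0, 14.0]):
    model.update(monday_9am + timedelta(weeks=week), value)
seg = model.segment(monday_9am)
print(seg.n, seg.mean, seg.std())
```

The memory saving comes from the lookup in `segment`: evaluating a sample touches one small object rather than the whole model.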
19
Baselining Laid Foundation for Early Warning
Derivative of Baseline Algorithm
Hybrid Long & Short Term Modeling
Configurable Horizon & Sensitivity
Sensitivity can be Controlled via Confidence
Forecast Mirrors Baseline when Observations are In Line with Expectation
Forecast Becomes Baseline + Trend of Errors when Observations Deviate
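One way to read "baseline + trend of errors" is sketched below. This is an interpretation, not OMC's algorithm: it assumes the recent errors (observation minus baseline) are extrapolated with an ordinary least-squares line, and that a fixed `tolerance` decides whether observations are in line with expectation. Both choices and all names are assumptions.

```python
from typing import List, Tuple


def linear_fit(ys: List[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of ys against x = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def early_warning_forecast(recent_obs: List[float],
                           recent_baseline: List[float],
                           future_baseline: List[float],
                           tolerance: float) -> List[float]:
    """Forecast the next len(future_baseline) steps."""
    errors = [o - b for o, b in zip(recent_obs, recent_baseline)]
    if all(abs(e) <= tolerance for e in errors):
        # Observations match expectation: the forecast mirrors the baseline.
        return list(future_baseline)
    # Observations deviate: add the extrapolated error trend to the baseline.
    slope, intercept = linear_fit(errors)
    n = len(errors)
    return [b + intercept + slope * (n + h)
            for h, b in enumerate(future_baseline)]


in_line = early_warning_forecast([50, 51, 49, 50], [50] * 4, [50, 50], 2.0)
drifting = early_warning_forecast([50, 52, 54, 56], [50] * 4, [50, 50], 2.0)
print(in_line, drifting)
```

In the drifting case the errors grow by 2 per step, so the forecast continues that climb above the flat baseline, which is the early-warning behavior the slide describes.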
20
OMC’s Forecasting Capability
Begin with the Basics
Robust Linear Regression for Unseasonal
Automatic Season Detection
Tolerance Intervals
Tune Based on Validation
Season Specific Trending-Uncertainty
Regime Change Detection
Seasonal Pattern Trending
Temporal Weighting
[Chart: Traditional Linear Forecast compared with OMC]
21
2x Improvement in Forecast Accuracy by Addressing Common Corner Cases
Season Specific Trending-Uncertainty
Low Seasons: Flat & Predictable
High Seasons: Trend & Fluctuate
Sparse High Seasons: Flat & Predictable
Regime Change Detection
Before: Legacy Linear Fit
After: Regime Change Identified
22
2x Improvement in Forecast Accuracy by Addressing Common Corner Cases
Seasonal Pattern Trending
Temporal Weighting
Before: Un-Weighted
After: Temporally Weighted
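The effect of temporal weighting on a trend fit can be illustrated with weighted least squares. The slides do not say how OMC weights samples; the exponential decay with a `half_life` parameter used here is an assumption, as are the function name and the toy data.

```python
from typing import List, Optional


def trend_slope(ys: List[float], half_life: Optional[float] = None) -> float:
    """Slope of a least-squares line through ys against x = 0..n-1.

    With half_life set, a sample's weight halves for every half_life
    steps of age, so recent points dominate the fit.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    if half_life is None:
        weights = [1.0] * n
    else:
        weights = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
    w_sum = sum(weights)
    x_mean = sum(w * i for i, w in enumerate(weights)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxx = sum(w * (i - x_mean) ** 2 for i, w in enumerate(weights))
    sxy = sum(w * (i - x_mean) * (y - y_mean)
              for i, (w, y) in enumerate(zip(weights, ys)))
    return sxy / sxx


# Toy metric: flat for 20 samples, then rising by 1 per sample for 10.
history = [10.0] * 20 + [10.0 + k for k in range(1, 11)]
unweighted = trend_slope(history)
weighted = trend_slope(history, half_life=3.0)
print(unweighted, weighted)
```

The un-weighted fit averages the old flat stretch into the trend and understates the recent growth, while the weighted fit tracks the recent slope much more closely.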
23
Data Unification & Normalization Enables Greater Breadth
Sources: Application Performance Monitoring, Security Monitoring & Analytics, Infrastructure Monitoring, Log Analytics, Orchestration, Compliance
All feed the Oracle Management Cloud Data Store.
The norm is repository-by-repository projects, which are slow and incremental.
By centralizing data, we are able to deliver ML-driven features more quickly.
Pipeline stages shown: Convert to Time Series (Clustering & Rollup), Baselining & Anomaly Detection, IT Analytics
26
For More Information/Questions
cloud.oracle.com/management
community.oracle.com/mgmtcloud
blogs.oracle.com/cloud
#MgmtCloud
@OracleMgmtCloud