THE NEED FOR CONTEXT Applying Machine Learning to Incident Response Matt
Who Am I? Product manager – fraud prevention to infosec; former math(s) geek; solution skeptic; frequent ranter
Who I Am Not. A data scientist; a security practitioner; a marketer
Then Why Me And This Topic? Data Science is a very broad field, new for security. Machine Learning is currently beloved by InfoSec marketing teams. Product managers need to be the realists.
Confidential and Proprietary
Today’s Topics De-mystifying the buzzwords Ultimate goals of machine learning in your organisation Domain expertise is a must Context must be applied before the algorithms Significant strides being made (and where) Prompt response seen from attackers
DE-MYSTIFYING THE BUZZWORDS
Big Data Analytics “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it” — Dan Ariely
Machine Learning
Baselining & Anomaly Detection
Peer Group Analysis
Artificial Intelligence
WHY? What information is going to be produced? Why is it important to the security team? How easily can it be explained? Who will be capable of digesting and acting upon it?
ULTIMATE GOALS OF ML IN YOUR ORGANISATION
Do You Have Data Scientists On Staff?
Yes. – You need data and a toolkit – Unfinished results are okay – Unsupervised learning helps – Alerting data scientists is okay
No. – You need simple results – Value is prioritisation – Key question: “Is this normal?” – Just fewer alerts please
Data Scientists On Staff - Unsupervised Purpose: Reveal hidden info. No target variable. Learns patterns that segregate data into groups/clusters. “Discovered” groups reveal hidden structure in data.
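To make "no target variable" concrete, here is a minimal sketch of clustering with plain k-means. It is an illustration only, not the method used in the talk: the per-host features (logins per day, GB uploaded per day) are made-up values, and seeding the centroids with the first k points is a simplification that a real toolkit would replace with better initialisation.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means. Returns (centroids, labels).
    Assumption: the first k points serve as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

# Hypothetical per-host features: (logins per day, GB uploaded per day).
hosts = [(5, 0.1), (40, 9.5), (6, 0.2), (4, 0.1), (42, 10.0), (38, 9.0)]
centroids, labels = kmeans(hosts, k=2)
print(labels)  # no labels were supplied; the two groups are "discovered"
```

The algorithm is never told which hosts are unusual. It only reports that the data splits into two groups, and an expert still has to decide whether that split means anything.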
Data Scientists On Staff - Results A lot of dead-ends (but that’s okay). Reveals extensive misconfigurations and unpredictable behaviour. Identifies valuable areas to explore deeper with the experts. Leads to supervised learning algorithms the security team can use.
No DS Staff – Simple Results Purpose: Make predictions. Known target variable. Learns patterns corresponding to target values. Makes predictions on new data (blind to actual outcomes).
No DS Staff – “Is This Normal?” (Group) Anomalous findings in a specific data set (example: DNS)
No DS Staff – “Is This Normal?” (Asset) Basic counting has immense value. Live Security Platinum – 1 instance; dwm – 1,200 instances
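The counting idea above can be sketched in a few lines: count how many assets run each process and surface the ones seen almost nowhere else. The asset names, process file names and the `max_assets` threshold are hypothetical placeholders, not data from the talk.

```python
from collections import Counter

def rare_processes(process_by_asset, max_assets=1):
    """Return process names seen on at most `max_assets` assets, sorted."""
    counts = Counter()
    for processes in process_by_asset.values():
        counts.update(set(processes))  # count each process once per asset
    return sorted(name for name, n in counts.items() if n <= max_assets)

# Hypothetical inventory: asset name -> running processes.
inventory = {
    "asset-01": ["dwm.exe", "explorer.exe"],
    "asset-02": ["dwm.exe", "explorer.exe", "live_security_platinum.exe"],
    "asset-03": ["dwm.exe", "explorer.exe"],
}
print(rare_processes(inventory))
```

There is no model here at all. A process that appears on one asset out of many is worth a look simply because it is rare.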
No DS Staff – “Is This Normal?” (User) Straying from the baseline for a specific individual
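One simple way to express "straying from the baseline" is a z-score against the user's own history. This is a sketch under assumptions: the daily counts are invented, the threshold of 3 standard deviations is an arbitrary illustrative choice, and real baselines usually need to account for seasonality and drift.

```python
from statistics import mean, stdev

def deviates_from_baseline(history, today, threshold=3.0):
    """True if today's value is more than `threshold` standard deviations
    away from the mean of this user's own history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return today != mu
    return abs(today - mu) / sigma > threshold

# Hypothetical daily count of servers one user logged in to.
history = [12, 15, 11, 14, 13, 12, 16]
print(deviates_from_baseline(history, 14))  # an ordinary day
print(deviates_from_baseline(history, 90))  # far outside this user's norm
```

The comparison is against the individual, not the organisation: 90 logins might be routine for an administrator and alarming for someone in finance.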
No DS Staff – “Is This Normal?” Viewing the anomalies and commonalities with the alerts – Same asset has a unique process running – Large spike in firewall traffic from primary user 12 hours earlier
No DS Staff – Rare ‘Alert’ Use Case Premise 1: Malicious links have something(s) in common. Premise 2: The commonalities are hard to spot with the naked eye. Premise 3: Given enough data, algorithms can find these hidden common factors and learn to separate good links from bad ones – this is machine learning!
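The three premises can be illustrated with a deliberately tiny supervised example: turn each link into numbers, learn the average "shape" of good and bad links, and label a new link by whichever average it sits closer to. This is a nearest-centroid classifier chosen for brevity, not the technique from the talk. The four features, the training URLs (all on reserved example domains) and their labels are assumptions made up for the sketch.

```python
from math import dist
from urllib.parse import urlparse

def features(url):
    """Hypothetical features: URL length, digit count, dots and hyphens in the host."""
    host = urlparse(url).hostname or ""
    return (
        len(url),
        sum(ch.isdigit() for ch in url),
        host.count("."),
        host.count("-"),
    )

def train(labelled_urls):
    """Nearest-centroid training: the mean feature vector for each label."""
    grouped = {}
    for url, label in labelled_urls:
        grouped.setdefault(label, []).append(features(url))
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, url):
    """Label a new URL by its closest class centroid."""
    x = features(url)
    return min(centroids, key=lambda label: dist(x, centroids[label]))

# Made-up training set with a known target variable ("good" / "bad").
training = [
    ("https://example.com/login", "good"),
    ("https://docs.example.org/guide", "good"),
    ("http://secure-login-example.com.account-verify-8842.example.net/a9f3/77210", "bad"),
    ("http://update-billing-3391.example.info.session-55120.example.net/x/9981", "bad"),
]
model = train(training)
print(predict(model, "http://confirm-payment-7731.example.org.id-20931.example.net/k/4410"))
print(predict(model, "https://example.org/about"))
```

With four training links this proves nothing about real phishing. The point is the workflow: the features encode domain knowledge, and the algorithm only finds the separation that those features make visible.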
3 Most Important Factors In An “ML Solution” 1. Implements the analysis techniques without bias 2. Understands the domain enough to understand the data 3. Combines techniques and adds context to quickly explain results
DOMAIN EXPERTISE
Interesting… Useless…
Not All Data Is Relevant
An Absolute Must
APPLYING CONTEXT BEFORE MATH(S)
Trading Noise
Context to Understand WHO: John Hand, cloud operations. WHERE: Primary asset ‘mac-7345’. WHAT: Massive spike in firewall traffic to AWS. WHEN: Friday, 23rd February
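A rough sketch of how the who/where/what/when context might be attached to an alert before anyone reads it. The `Alert` fields, the `DIRECTORY` lookup table and the account name "jhand" are hypothetical stand-ins for whatever identity and asset sources an organisation actually has.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    user_id: str  # account name as it appears in the raw event
    asset: str
    what: str
    when: str

# Hypothetical lookup table standing in for an identity directory.
DIRECTORY = {"jhand": ("John Hand", "cloud operations")}

def explain(alert, directory=DIRECTORY):
    """Turn a raw alert into a who/where/what/when summary line."""
    name, team = directory.get(alert.user_id, (alert.user_id, "unknown team"))
    return (
        f"WHO: {name}, {team} | WHERE: primary asset '{alert.asset}' | "
        f"WHAT: {alert.what} | WHEN: {alert.when}"
    )

alert = Alert("jhand", "mac-7345", "massive spike in firewall traffic to AWS",
              "Friday, 23rd February")
print(explain(alert))
```

Once the alert names a person in cloud operations, a spike in traffic to AWS reads very differently from the same spike tied to an anonymous IP address.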
Root Cause… Sooner
1. Data was generated – Understand why it was generated – Automate the explanation
2. Analyse the root cause instead of data – You shouldn’t have to make sense of a raw log line – If you know what was actually done, you can decide if it was misuse/abuse
SIGNIFICANT STRIDES
Gartner: UBA (or UEBA) Definition: User and entity behaviour analytics (UEBA) evaluates the activity of users and other entities (for example, applications, IP addresses, devices and networks) in combination with resource access to discover security infractions. UEBA profiles the behaviour of individuals, groups of individuals and, optionally, other entities (for example, devices) to discover malicious behaviour. “It achieves a better signal-to-noise ratio than security information and event management (SIEM) or data loss prevention (DLP)…”
But It Will Take Time…
Still Early Days
Look out for solutions: – Promising too much: “works for credit card fraud and insider threats” – Lacking focus: “identifies anomalies in any data set” – Selling on buzzwords: “Big Data anomaly detection approach using identity as a threat surface along with contextual access, intelligent security analytics…”
Run a POC – Your team needs to see it analyse your data
ATTACKER RESPONSE
Just Don’t Rely Too Heavily On ML
The Importance of The Feedback Loop
Remember to always ask “WHY?” What information is going to be produced? Why is it important to the security team? How easily can it be explained? Who will be capable of digesting and acting upon it?
THANK YOU