Published by Lucy Allison. Modified over 9 years ago.
1
THE NEED FOR CONTEXT
Applying Machine Learning to Incident Response
Matt Hathaway
@TheWay99 #MLforIR
2
Who Am I?
Product manager
–fraud prevention to infosec
Former math(s) geek
Solution skeptic
Frequent ranter
Who I Am Not.
A data scientist
A security practitioner
A marketer
3
Then Why Me And This Topic?
Data Science is a very broad field, new for security
Machine Learning is currently beloved by InfoSec marketing teams
Product managers need to be the realists
Confidential and Proprietary
4
Today’s Topics
De-mystifying the buzzwords
Ultimate goals of machine learning in your organisation
Domain expertise is a must
Context must be applied before the algorithms
Significant strides being made (and where)
Prompt response seen from attackers
5
DE-MYSTIFYING THE BUZZWORDS
6
Big Data Analytics
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.” (Dan Ariely)
7
Machine Learning
8
Baselining & Anomaly Detection
9
Peer Group Analysis
10
Artificial Intelligence
11
WHY?
What information is going to be produced?
Why is it important to the security team?
How easily can it be explained?
Who will be capable of digesting and acting upon it?
12
ULTIMATE GOALS OF ML IN YOUR ORGANISATION
13
Do You Have Data Scientists On Staff?
Yes.
–You need data and a toolkit
–Unfinished results are okay
–Unsupervised learning helps
–Alerting data scientists is okay
No.
–You need simple results
–Value is prioritisation
–Key question: “Is this normal?”
–Just fewer alerts please
14
Data Scientists On Staff - Unsupervised
Purpose: Reveal hidden info
No target variable
Learns patterns that segregate data into groups/clusters
“Discovered” groups reveal hidden structure in data
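The idea of learning groups with no target variable can be sketched with a small k-means loop. This is only an illustration, not the method used in the talk: the two features per host and every value in `hosts` are hypothetical, and the first `k` points are used as starting centroids purely to keep the run deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: no labels are given, only distances between points."""
    # Deterministic start (an assumption for this sketch): first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical features per host: (logins per day, outbound MB per day).
hosts = [(5, 20), (200, 900), (6, 25), (4, 18), (210, 950), (190, 880)]
labels, centroids = kmeans(hosts, k=2)
```

The "discovered" groups are just the `labels`; deciding what each group means still falls to someone who knows the environment.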
15
16
Data Scientists On Staff - Results
A lot of dead-ends (but that’s okay)
Reveals extensive misconfigurations and unpredictable behaviour
Identifies valuable areas to explore deeper with the experts
Leads to supervised learning algorithms the security team can use
17
No DS Staff – Simple Results
Purpose: Make predictions
Known target variable
Learns patterns corresponding to target values
Makes predictions on new data (blind to actual outcomes)
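A minimal sketch of supervised learning in the sense above, assuming a nearest-centroid classifier as a stand-in for whatever algorithm a real product would use. The features, labels and every row in `history` are hypothetical.

```python
import math

def train(samples):
    """Learn one centroid per known target value from (features, label) pairs."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(axis) / len(rows) for axis in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Predict the label whose centroid is closest to the new observation."""
    return min(model, key=lambda label: math.dist(features, model[label]))

# Hypothetical labelled history:
# (failed logins, distinct source countries) -> known outcome.
history = [
    ((1, 1), "benign"), ((0, 1), "benign"), ((2, 1), "benign"),
    ((30, 4), "malicious"), ((25, 3), "malicious"), ((40, 5), "malicious"),
]
model = train(history)
verdict = predict(model, (28, 4))  # new data, actual outcome unknown
```

The known target variable is the label column in `history`; without it, only the unsupervised approach is available.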
18
No DS Staff – “Is This Normal?” (Group)
Anomalous findings in a specific data set (example: DNS)
19
No DS Staff – “Is This Normal?” (Asset)
Basic counting has immense value
Live Security Platinum
–1 instance
dwm
–1,200 instances
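The counting on this slide needs nothing more than a tally of process names across assets. A minimal sketch follows, assuming one (asset, process) observation per running instance; the asset names and the `max_instances` threshold are hypothetical, while the two process names and their counts come from the slide.

```python
from collections import Counter

def rare_processes(observations, max_instances=5):
    """Return processes seen at most max_instances times across all assets."""
    counts = Counter(name for _asset, name in observations)
    return {name: n for name, n in counts.items() if n <= max_instances}

# Synthetic data shaped like the slide: 1,200 instances of dwm, 1 of the rare one.
observations = [(f"asset-{i}", "dwm") for i in range(1200)]
observations.append(("asset-17", "Live Security Platinum"))

rare = rare_processes(observations)
```

Anything in `rare` is worth a look precisely because almost nobody else is running it.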
20
No DS Staff – “Is This Normal?” (User)
Deviations from the baseline for a specific individual
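One simple way to measure straying from an individual's baseline is a z-score against that user's own history. This is a sketch under assumptions: the metric (daily logins), the values in `daily_logins`, and the choice of a z-score are all illustrative and not taken from the talk.

```python
from statistics import mean, stdev

def deviation_from_baseline(history, today):
    """How many standard deviations today's value sits from the user's own mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation, needs >= 2 values
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

# Hypothetical baseline: one user's logins per day over the past week.
daily_logins = [4, 5, 6, 5, 4, 6, 5]

normal_day = deviation_from_baseline(daily_logins, 5)
odd_day = deviation_from_baseline(daily_logins, 40)
```

A large score only says the day was unusual for this user; it says nothing about whether it was malicious.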
21
No DS Staff – “Is This Normal?”
Viewing the Anomalies and Commonalities with the alerts
–Same asset has a unique process running
–Large spike in firewall traffic from primary user 12 hours earlier
22
No DS Staff – Rare ‘Alert’ Use Case
Premise 1: Malicious links have something(s) in common
Premise 2: The commonalities are hard to spot with the naked eye
–http://hometrendsdinnerware.org/wp-settings/image.htm
–http://agorregi.com/main/secure/dropbox/login/
–http://realenergy.ro/fonts/update.htmy/image.htm
–http://daviddavis.es/wp-content/gdoc/index.html
Premise 3: Given enough data, algorithms can find these hidden common factors and learn to separate good links from bad ones
–This is machine learning!
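A first step toward Premise 3 is turning each link into features a program can compare. The sketch below only tokenises the URL paths from the slide and counts tokens shared by more than one link. It is feature extraction, not a trained model: a real classifier would also need known-good links to learn from. The function names and the `min_urls` threshold are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

def path_tokens(url):
    """Split a URL path on '/', '-', '.' and '_' into a set of lowercase tokens."""
    path = urlsplit(url).path.lower()
    return {tok for tok in re.split(r"[/\-._]+", path) if tok}

def shared_tokens(urls, min_urls=2):
    """Tokens that appear in the paths of at least min_urls different URLs."""
    doc_freq = Counter()
    for url in urls:
        doc_freq.update(path_tokens(url))
    return {tok: n for tok, n in doc_freq.items() if n >= min_urls}

# The four links listed on the slide.
malicious = [
    "http://hometrendsdinnerware.org/wp-settings/image.htm",
    "http://agorregi.com/main/secure/dropbox/login/",
    "http://realenergy.ro/fonts/update.htmy/image.htm",
    "http://daviddavis.es/wp-content/gdoc/index.html",
]
common = shared_tokens(malicious)
```

With only four links the overlap is thin, which is the point of the "given enough data" caveat.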
23
3 Most Important Factors In “ML Solution”
1. Implements the analysis techniques without bias
2. Understands the domain well enough to understand the data
3. Combines techniques and adds context to quickly explain results
24
DOMAIN EXPERTISE
25
Interesting… Useless…
26
Not All Data Is Relevant
27
An Absolute Must
28
APPLYING CONTEXT BEFORE MATH(S)
29
Trading Noise
30
Context to Understand
WHO: John Hand, cloud operations
WHERE: Primary asset ‘mac-7345’
WHAT: Massive spike in firewall traffic to AWS
WHEN: Friday, 23rd February
31
Root Cause… Sooner
1. Data was generated
–Understand why it was generated
–Automate the explanation
2. Analyse the root cause instead of data
–You shouldn’t have to make sense of a raw log line
–If you know what was actually done, you can decide if it was misuse/abuse
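Automating the explanation can be as simple as joining a raw event to who and what it belongs to before anyone reads it. The sketch below is an assumption about how that might look: the event fields, the IP address, the byte count and both lookup tables are hypothetical, while the user, asset, destination and date reuse the "Context to Understand" example.

```python
def explain(event, asset_by_ip, user_by_asset):
    """Turn a raw firewall event into a who/where/what/when sentence."""
    asset = asset_by_ip.get(event["src_ip"], "unknown asset")
    user = user_by_asset.get(asset, "unknown user")
    gigabytes = event["bytes_out"] / 1e9
    return (
        f"{user} on {asset} sent {gigabytes:.1f} GB "
        f"to {event['dest']} on {event['when']}"
    )

# Hypothetical lookup tables, e.g. built from DHCP and login records.
asset_by_ip = {"10.0.0.5": "mac-7345"}
user_by_asset = {"mac-7345": "John Hand (cloud operations)"}

# Hypothetical raw event; only dest and when mirror the slide.
event = {
    "src_ip": "10.0.0.5",
    "dest": "AWS",
    "bytes_out": 52_000_000_000,
    "when": "Friday, 23rd February",
}
message = explain(event, asset_by_ip, user_by_asset)
```

An analyst reading `message` can judge misuse directly instead of decoding a log line first.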
32
SIGNIFICANT STRIDES
33
Gartner: UBA (or UEBA)
Definition: User and entity behaviour analytics (UEBA) evaluates the activity of users and other entities (for example, applications, IP addresses, devices and networks) in combination with resource access to discover security infractions. UEBA profiles the behaviour of individuals, groups of individuals and, optionally, other entities (for example, devices) to discover malicious behaviour.
“It achieves a better signal-to-noise ratio than security information and event management (SIEM) or data loss prevention (DLP)…”
34
But It Will Take Time…
35
Still Early Days
Look out for solutions:
–Promising too much: “works for credit card fraud and insider threats”
–Lacking focus: “identifies anomalies in any data set”
–Selling on buzzwords: “Big Data anomaly detection approach using identity as a threat surface along with contextual access, intelligent security analytics…”
Run a POC
–Your team needs to see it analyse your data
36
ATTACKER RESPONSE
37
Just Don’t Rely Too Heavily On ML
38
The Importance of The Feedback Loop
39
Remember to always ask “WHY?”
What information is going to be produced?
Why is it important to the security team?
How easily can it be explained?
Who will be capable of digesting and acting upon it?
40
THANK YOU
MATTHEW_HATHAWAY@RAPID7.COM