Published by Cory Woods. Modified over 6 years ago.
1
Using Sequence Statistics to Fight Advanced Persistent Threats
Ted Dunning
2
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, ZooKeeper & others
VP of Incubator at Apache Foundation
Hashtags today: #hs16dublin #mapr
3
Agenda
What’s this persistent threat stuff?
  Examples
  What attackers do
  How they do it
  Examples
Sequence statistics
  Really geeking with gas now!
Detection techniques
  Specifics
Summary
4
Agenda of All Security Talks
Terror
Faint hope
More terror
Practical suggestions
Summary
5
Operation Ababil – Brobots on Parade
Dork attack to find unpatched default Joomla sites
  Especially web servers with high-bandwidth connections
  Basically just Google searches for default strings
Joomla site is compromised and turned into an attack Brobot
C&C network checks in occasionally
  Note: C&C is an incoming request and looks like a normal web request
Later, on command, multiple Brobots direct Gb/s of attack traffic
  Attacks come from white-listed sites
6
Attack Sequence
10
Outline of an Advanced Persistent Threat
Advanced
  Common use of zero-day for preliminary attacks
  Often attributed to state-level actors
  Modern privateers blur the line
Persistent
  Result of first attack is heavily muffled, no immediate exploit
  Remote access toolset (RAT) installed
Threat
  On command, data is exfiltrated covertly or en masse
  Or the compromised host is used for other nefarious purposes
11
APT in Summary
Attack, penetrate, pivot, exfiltrate or exploit
If you are a high-value target, attack is likely and stealthy
  High-value = telecom, banks, utilities, retail targets, web100 … and all their vendors
Conventional multi-factor auth is easily breached
Penetration and pivot are critical counter-measure opportunities
  In 2010, RAT would contact command and control (C&C)
  In 2016, C&C looks like normal traffic
Once exfiltration or exploit starts, you may no longer have a business
12
So are we totally screwed?
Not entirely!
14
Event Sequences Provide Clues
Event sequences appear in many places
Headers
  Header types, ordering in requests
IP address accesses
  Source and destination, sequences of either
TLS options
  Which options, which values, which algorithms
Incoming component request ordering and timing
  Body first; CSS, scripts and images next
  But which are cached, what is round-trip time?
15
Sequences and Cooccurrences
All of these characteristics form symbolic sequences
Current systems use hand-crafted rules about particular state
  But hand-crafting depends on human knowledge
We can do much, much better by considering cooccurrence and ordering of symbols in these sequences
Log-likelihood ratio test (jargon alert) is a key tool
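As a minimal sketch of what a "symbolic sequence" can look like, the following turns a raw HTTP request into the ordered tuple of its header names. The function name `header_symbols` and both sample requests are illustrative assumptions, not material from the talk.

```python
def header_symbols(raw_request: str) -> tuple:
    """Return the header names of one request, lowercased, in arrival order."""
    lines = raw_request.replace("\r\n", "\n").split("\n")
    symbols = []
    for line in lines[1:]:            # skip the request line ("GET / HTTP/1.1")
        if not line.strip():          # a blank line ends the header block
            break
        name, sep, _ = line.partition(":")
        if sep:                       # ignore malformed lines without a colon
            symbols.append(name.strip().lower())
    return tuple(symbols)

# Hypothetical requests: same headers, different ordering.
browser = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: X\r\nAccept: */*\r\n\r\n"
script = "GET / HTTP/1.1\r\nUser-Agent: X\r\nHost: example.com\r\n\r\n"

print(header_symbols(browser))
print(header_symbols(script))
```

Two clients that send the same headers in a different order produce different sequences, which is the kind of signal a hand-written rule about header values would miss.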
16
A core technique
Many of these easy problems reduce to finding interesting coincidences
This can be summarized as a 2 x 2 table
Actually, many of these tables

            A      Other
  B         k11    k12
  Other     k21    k22
17
How do you do that?
This is well handled using the G-test
  See Wikipedia
  Original application in linguistics, now cited > 2000 times
Available in Elasticsearch, in Solr, in Mahout
Available in R, C, Java, Python
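For reference, the G statistic for a 2 x 2 table of counts can be written as follows. The row, column and grand totals are defined inline; terms with a zero count contribute nothing, using the convention 0 ln 0 = 0. This is the standard textbook form, given here as a reading aid rather than as the exact formulation used in the talk.

```latex
G = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} k_{ij} \,
    \ln \frac{k_{ij} \, k_{**}}{k_{i*} \, k_{*j}},
\qquad
k_{i*} = \sum_{j} k_{ij}, \quad
k_{*j} = \sum_{i} k_{ij}, \quad
k_{**} = \sum_{i,j} k_{ij}
```

When rows and columns are independent, G is approximately chi-squared distributed with one degree of freedom, so large values flag cooccurrences that are unlikely to be coincidence.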
18
Which one is the anomalous co-occurrence?
          A       not A
  B       13      1000
  not B   1000    100,000

          A       not A
  B       1       0
  not B   0       2

          A       not A
  B       1       0
  not B   0       10,000

          A       not A
  B       10      0
  not B   0       100,000
19
Which one is the anomalous co-occurrence?
          A       not A
  B       13      1000
  not B   1000    100,000
  0.90

          A       not A
  B       1       0
  not B   0       2
  1.95

          A       not A
  B       1       0
  not B   0       10,000
  4.52

          A       not A
  B       10      0
  not B   0       100,000
  14.3

Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, vol. 19, no. 1 (1993).
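The four scores can be checked with a few lines of Python. This is a sketch under two assumptions: the cells missing from the extracted slide text are zeros (plus a second 1000 in the first table), and the numbers shown are the square root of the G statistic. Under those assumptions the values below come out at roughly 0.90, 1.95, 4.52 and 14.3.

```python
import math

def llr(k11, k12, k21, k22):
    """G statistic (log-likelihood ratio) for a 2 x 2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g = 0.0
    for k, i, j in cells:
        if k > 0:                     # 0 * log(0) is taken as 0
            g += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * g

# Assumed cell values (k11, k12, k21, k22); zeros are not in the extracted text.
tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 2),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
]
scores = [math.sqrt(llr(*t)) for t in tables]
for table, score in zip(tables, scores):
    print(table, round(score, 2))
```

The first table has by far the largest raw cooccurrence count (13) and the lowest score, because A and B are both common enough there that 13 joint occurrences are about what chance predicts.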
20
How to Count (header-like documents)
For each “document”:
  For each “word” A:
    left[A]++
    For each “word” B after that (within window):
      count[A,B]++
      right[B]++
      total++
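A runnable sketch of this counting loop is below. The function name `count_pairs`, the `window` parameter and the sample documents are assumptions. One deliberate difference from the pseudocode is labeled in the code: `left[A]` is incremented once per pair rather than once per word, so that the row and column totals always add up to `total`.

```python
from collections import Counter

def count_pairs(documents, window=3):
    """Count ordered pairs (A, B) where B follows A within `window` positions."""
    count, left, right = Counter(), Counter(), Counter()
    total = 0
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:
                count[a, b] += 1
                left[a] += 1      # assumption: counted per pair, not per word
                right[b] += 1     # pairs in which B is the second symbol
                total += 1
    return count, left, right, total

# Hypothetical "documents": header-name sequences from two requests.
docs = [["host", "user-agent", "accept"], ["host", "accept"]]
count, left, right, total = count_pairs(docs, window=2)
print(count[("host", "accept")], left["host"], right["accept"], total)
```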
21
We wanted this 2 x 2 table for each A,B
But we only counted k11 directly
But we did count
  k*1 = k11 + k21 (how many A’s we saw)
  k1* = k11 + k12 (how many B’s we saw)
  k** = k11 + k21 + k12 + k22 (how many pairs in total)

            A      Other
  B         k11    k12
  Other     k21    k22
22
How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B):
  k11 = count[A,B]
  k1x = left[A]
  kx1 = right[B]
  kxx = total
  k12 = k1x - k11
  k21 = kx1 - k11
  k22 = kxx - k11 - k12 - k21
  queue.add(A, (LLR(k11,k12,k21,k22), B))
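The scoring step could be sketched in Python as follows, with `heapq.nlargest` standing in for the per-A priority queue. The names `top_pairs` and `xlogx` and the sample counts are assumptions. The `llr` function uses the "sum of x log x" form of the G statistic, and the marginal counts are assumed to be per-pair counts so that no cell goes negative.

```python
import heapq
import math
from collections import defaultdict

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)

def llr(k11, k12, k21, k22):
    """G statistic via G = 2 * (cells - rows - cols + total), each term x*log(x)."""
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    return max(0.0, 2.0 * (cells - rows - cols + xlogx(k11 + k12 + k21 + k22)))

def top_pairs(count, left, right, total, n=10):
    """For each A, keep the n highest-scoring (score, B) entries."""
    scored = defaultdict(list)
    for (a, b), k11 in count.items():
        k12 = left[a] - k11
        k21 = right[b] - k11
        k22 = total - k11 - k12 - k21
        scored[a].append((llr(k11, k12, k21, k22), b))
    return {a: heapq.nlargest(n, items) for a, items in scored.items()}

# Hypothetical pair counts with consistent marginals.
count = {("a", "b"): 9, ("a", "c"): 1, ("a", "e"): 2,
         ("d", "b"): 1, ("d", "c"): 5, ("d", "e"): 2}
left = {"a": 12, "d": 8}
right = {"b": 10, "c": 6, "e": 4}
result = top_pairs(count, left, right, total=20, n=2)
print(result["a"])
```

Note that the plain G statistic is unsigned: a pair that occurs much less often than chance predicts scores high as well, so a real pipeline may want to check the direction of the deviation before acting on a score.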
23
How to Count (cooccurrence)
for each (C,B) = (“context”, “word”):
  if (!filter(C) && !filter(B)):
    right[B]++
    for each A in history(C):
      count[A,B]++
      left[A]++
    history(C) += B
    total++
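A direct Python transcription of this loop, as a sketch: the function name `cooccurrence_counts`, the `filtered` set that stands in for `filter()`, and the sample events (source IPs as contexts, request types as words) are all assumptions. History is kept unbounded here for brevity; a real deployment would presumably cap or age it.

```python
from collections import Counter, defaultdict

def cooccurrence_counts(events, filtered=frozenset()):
    """events: iterable of (context, word) pairs in arrival order."""
    count, left, right = Counter(), Counter(), Counter()
    history = defaultdict(list)        # context -> words seen so far
    total = 0
    for c, b in events:
        if c in filtered or b in filtered:
            continue                   # skip contexts or words we choose to ignore
        right[b] += 1
        for a in history[c]:
            count[a, b] += 1           # A appeared earlier in the same context
            left[a] += 1
        history[c].append(b)
        total += 1
    return count, left, right, total

# Hypothetical event stream.
events = [("ip1", "login"), ("ip1", "admin"), ("ip2", "login"),
          ("ip1", "export"), ("ip2", "admin")]
count, left, right, total = cooccurrence_counts(events)
print(count[("login", "admin")], left["login"], right["admin"], total)
```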
24
It really can be that simple
Seriously... It really can be that simple
25
Basic techniques
Counting – often the hardest part
LLR – the basic tool
Order models
  Ordered cooccurrences
  Transition probabilities
  Recurrent neural networks
Ploughing a quiet field
  Reimage servers often
  Force attackers to pivot repeatedly
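Of the order models listed, transition probabilities are the simplest to illustrate. The sketch below fits a first-order model with add-alpha smoothing and scores a sequence by its log-probability. The function names, the smoothing choice and the training data are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def fit_transitions(sequences, alpha=1.0):
    """Count symbol-to-symbol transitions over a corpus of sequences."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab, alpha

def log_prob(model, seq):
    """Smoothed log-probability of the transitions in `seq`."""
    counts, vocab, alpha = model
    v = len(vocab) + 1                 # one extra slot for unseen symbols
    lp = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, {})
        n = sum(row.values())
        lp += math.log((row.get(nxt, 0) + alpha) / (n + alpha * v))
    return lp

# Hypothetical corpus: 50 requests with the usual header order.
normal = ("host", "user-agent", "accept")
model = fit_transitions([normal] * 50)
odd = ("user-agent", "host", "accept")
print(log_prob(model, normal), log_prob(model, odd))
```

A sequence made of familiar symbols in an unfamiliar order gets a much lower log-probability than the usual ordering, which is the property an ordering-based detector relies on.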
26
Example 1 - Ababil
Defense has to happen here
27
Spot the Important Difference?
Attacker request Real request
29
This could only be found at scale
30
Overall Outline Again
Tradecraft error!
31
Large corpus analysis of source IPs wins big
33
Example 2 - Common Point of Compromise
Scenario:
  Merchant 0 is compromised, leaks account data during compromise
  Fraud committed elsewhere during exploit
  High background level of fraud
  Limited detection rate for exploits
Goal:
  Find merchant 0
Meta-goal:
  Screen algorithms for this task without leaking sensitive data
34
Example 2 - Common Point of Compromise
Card data is stolen from Merchant 0 That data is used in frauds at other merchants
35
Simulation Setup
36
Simulation Strategy
For each consumer
  Pick consumer parameters such as transaction rate, preferences
  Generate transactions until end of sim-time
  If merchant 0 during compromise time, possibly mark as compromised
  For all transactions, possibly mark as fraud; probability depends on history
  Merchants are selected using hierarchical Pitman-Yor
Restate data
  Flatten transaction streams
  Sort by time
Tunables
  Compromise probability, transaction rates, background fraud, detection probability
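A much-simplified sketch of such a simulator follows. Every name and default value is an assumption, and two simplifications are deliberate: merchants are drawn from a single shared Pitman-Yor-style process rather than a hierarchical one, and elevated fraud starts immediately after a card is compromised instead of during a separate exploit period.

```python
import random

def pick_merchant(counts, rng, discount=0.5, strength=1.0):
    """Pitman-Yor style choice: popular merchants are favoured, new ones still appear."""
    n = sum(counts.values())
    k = len(counts)
    if n == 0 or rng.random() < (strength + discount * k) / (strength + n):
        merchant = k                   # open a new merchant id
        counts[merchant] = 1
        return merchant
    merchants = list(counts)
    weights = [counts[m] - discount for m in merchants]
    merchant = rng.choices(merchants, weights=weights)[0]
    counts[merchant] += 1
    return merchant

def simulate(n_consumers=200, sim_days=100.0, compromise=(30.0, 40.0),
             p_compromise=0.5, p_background=0.001, p_exploit=0.05,
             p_detect=0.8, seed=0):
    """Return time-sorted (time, consumer, merchant, detected_fraud) tuples."""
    rng = random.Random(seed)
    merchant_counts = {}
    transactions = []
    for consumer in range(n_consumers):
        rate = rng.uniform(0.2, 2.0)   # consumer parameter: transactions per day
        compromised = False
        t = rng.expovariate(rate)
        while t < sim_days:
            merchant = pick_merchant(merchant_counts, rng)
            # Fraud probability depends on history: was this card compromised earlier?
            p_fraud = p_exploit if compromised else p_background
            fraud = rng.random() < p_fraud and rng.random() < p_detect
            in_window = compromise[0] <= t < compromise[1]
            if merchant == 0 and in_window and rng.random() < p_compromise:
                compromised = True
            transactions.append((t, consumer, merchant, fraud))
            t += rng.expovariate(rate)
    transactions.sort()                # restate: flatten streams, sort by time
    return transactions

txns = simulate(n_consumers=50, sim_days=50.0, seed=1)
print(len(txns), sum(1 for _, _, _, fraud in txns if fraud))
```

Because the output contains only synthetic consumers and merchant ids, candidate detection algorithms can be screened on it without touching real card data.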
38
Really truly bad guys
39
Historical cooccurrence gives high S/N
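One way to read "historical cooccurrence" is sketched below: for each merchant, build a 2 x 2 table of cards that did or did not visit it before their first detected fraud, against cards that did or did not later show fraud, and score the table with a signed square root of the G statistic. The function names, the tuple layout and the toy data are assumptions, not the talk's actual implementation.

```python
import math
from collections import Counter

def xlogx(x):
    return 0.0 if x == 0 else x * math.log(x)

def signed_root_llr(k11, k12, k21, k22):
    """sqrt(G), negative when the (row 1, column 1) cell is under-represented."""
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    g = max(0.0, 2.0 * (cells - rows - cols + xlogx(k11 + k12 + k21 + k22)))
    sign = 1.0 if k11 * (k21 + k22) >= k21 * (k11 + k12) else -1.0
    return sign * math.sqrt(g)

def merchant_scores(transactions):
    """transactions: time-sorted (time, card, merchant, fraud) tuples."""
    history, fraud_cards = {}, set()
    for _, card, merchant, fraud in transactions:
        seen = history.setdefault(card, set())
        if card in fraud_cards:
            continue                   # only history before the first fraud counts
        if fraud:
            fraud_cards.add(card)
        else:
            seen.add(merchant)
    with_m, fraud_with_m = Counter(), Counter()
    for card, merchants in history.items():
        for m in merchants:
            with_m[m] += 1
            fraud_with_m[m] += card in fraud_cards
    scores = {}
    for m, n_m in with_m.items():
        k11 = fraud_with_m[m]                      # visited m, later fraud
        k12 = n_m - k11                            # visited m, no fraud
        k21 = len(fraud_cards) - k11               # fraud, never visited m
        k22 = len(history) - n_m - k21             # neither
        scores[m] = signed_root_llr(k11, k12, k21, k22)
    return scores

# Toy data: cards 0-9 shop at "m0", cards 10-19 at "y", everyone at "x".
txns, t = [], 0
for card in range(20):
    for merchant in ("m0" if card < 10 else "y", "x"):
        txns.append((t, card, merchant, False)); t += 1
for card in list(range(8)) + [10]:                 # later frauds at merchant "z"
    txns.append((t, card, "z", True)); t += 1
scores = merchant_scores(txns)
print(max(scores, key=scores.get))
```

In the toy data the merchant visited by everyone scores zero and the merchant whose customers later see fraud stands out, even though no fraud was ever recorded at that merchant itself.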
40
Summary
The world can be seen as sequences of symbols
We can find patterns
Those patterns can nail opponents
Many patterns only appear at scale
You can do this
42
Short Books by Ted Dunning & Ellen Friedman
Published by O’Reilly in 2014 and 2015 For sale from Amazon or O’Reilly Free e-books currently available courtesy of MapR
43
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) Free copies at book signing today (oops… that was earlier)
44
Thank You!
45
Q & A
Engage with us!
@mapr, maprtech, mapr-technologies, MapR