Using Sequence Statistics to Fight Advanced Persistent Threats


1 Using Sequence Statistics to Fight Advanced Persistent Threats
Ted Dunning

2 Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, Zookeeper & others
VP of Incubator at Apache Foundation
Hashtags today: #hs16dublin #mapr

3 Agenda
What’s this persistent threat stuff?
Examples
What attackers do
How they do it
Examples
Sequence statistics
Really geeking with gas now!
Detection techniques
Specifics
Summary

4 Agenda of All Security Talks
Terror
Faint hope
More terror
Practical suggestions
Summary

5 Operation Ababil – Brobots on Parade
Dork attack to find unpatched default Joomla sites
  Especially web servers with high bandwidth connections
  Basically just Google searches for default strings
Joomla compromised into attack Brobot
C&C network checks in occasionally
  Note C&C is incoming request and looks like normal web requests
Later, on command, multiple Brobots direct Gb/s of attack
Attacks come from white-listed sites

6 Attack Sequence

7 Attack Sequence

8 Attack Sequence

9 Attack Sequence

10 Outline of an Advanced Persistent Threat
Common use of zero-day for preliminary attacks
Often attributed to state-level actors
Modern privateers blur the line
Persistent
  Result of first attack is heavily muffled, no immediate exploit
  Remote access toolset (RAT) installed
Threat
  On command, data is exfiltrated covertly or en masse
  Or the compromised host is used for other nefarious purposes

11 APT in Summary
Attack, penetrate, pivot, exfiltrate or exploit
If you are a high-value target, attack is likely and stealthy
High-value = telecom, banks, utilities, retail targets, web100 … and all their vendors
Conventional multi-factor auth is easily breached
Penetration and pivot are critical counter-measure opportunities
In 2010, RAT would contact command and control (C&C)
In 2016, C&C looks like normal traffic
Once exfiltration or exploit starts, you may no longer have a business

12 So are we totally screwed?

13 So are we totally screwed?
Not entirely!

14 Event Sequences Provide Clues
Event sequences appear in many places
Headers
  Header types, ordering in requests
IP address accesses
  Source and destination, sequences of either
TLS options
  Which options, which values, which algorithms
Incoming component request ordering and timing
  Body first, CSS, scripts and images next
  But which are cached, what is round-trip time?
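As an illustration of how one of these sources becomes a symbol sequence, here is a minimal Python sketch that reduces a raw HTTP/1.x request to its ordered list of header names. The function name `header_symbols` and the sample requests are hypothetical and are not taken from the talk.

```python
def header_symbols(raw_request: str) -> list[str]:
    """Return the header names of a raw HTTP/1.x request, in arrival order."""
    head = raw_request.split("\r\n\r\n", 1)[0]   # drop any body
    header_lines = head.split("\r\n")[1:]        # skip the request line
    return [line.split(":", 1)[0].strip().lower()
            for line in header_lines if ":" in line]


# Two made-up requests that differ only in which headers they send and in what order.
browser = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: demo\r\nAccept: */*\r\n\r\n"
script = "GET / HTTP/1.1\r\nAccept: */*\r\nHost: example.com\r\n\r\n"
print(header_symbols(browser))
print(header_symbols(script))
```

Each list can then be treated as a "document" of "words" for the counting steps later in the deck.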

15 Sequences and Cooccurrences
All of these characteristics form symbolic sequences
Current systems use hand-crafted rules about particular state
But hand-crafting depends on human knowledge
We can do much, much better by considering cooccurrence and ordering of symbols in these sequences
Log-likelihood ratio test (jargon alert) is a key tool

16 A core technique
Many of these easy problems reduce to finding interesting coincidences
This can be summarized as a 2 x 2 table
Actually, many of these tables

          A      Other
B         k11    k12
Other     k21    k22

17 How do you do that?
This is well handled using G-test
See wikipedia
See
Original application in linguistics now cited > 2000 times
Available in ElasticSearch, in Solr, in Mahout
Available in R, C, Java, Python
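For reference, the G statistic for a 2 x 2 table of counts is usually written as follows. This is the standard textbook form rather than a formula taken from the slides; the symbols E_ij (expected count) and N (grand total) are introduced here only for illustration.

```latex
G = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} k_{ij} \ln \frac{k_{ij}}{E_{ij}},
\qquad
E_{ij} = \frac{k_{i*}\, k_{*j}}{N},
\qquad
k_{i*} = \sum_{j} k_{ij},
\quad
k_{*j} = \sum_{i} k_{ij},
\quad
N = \sum_{i,j} k_{ij}
```

Terms with k_ij = 0 contribute nothing to the sum, by the usual convention that 0 ln 0 = 0.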

18 Which one is the anomalous co-occurrence?

          A        not A
B         13       1000
not B              100,000

          A        not A
B         1
not B              2

          A        not A
B         1
not B              10,000

          A        not A
B         10
not B              100,000

19 Which one is the anomalous co-occurrence?

          A        not A
B         13       1000
not B              100,000
0.90

          A        not A
B         1
not B              2
1.95

          A        not A
B         1
not B              10,000
4.52

          A        not A
B         10
not B              100,000
14.3

Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, vol. 19, no. 1 (1993)
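The four scores can be checked with a short Python sketch. It rests on two assumptions that the transcript does not confirm: the cells left blank are zeros, and the cell missing from the first table (A, not B) is 1000. It also reads the slide's scores as the square root of the G statistic, which is an inference from the numbers. The function names are hypothetical.

```python
from math import log, sqrt


def x_log_x(x):
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * log(x) if x > 0 else 0.0


def llr(k11, k12, k21, k22):
    """G statistic (log-likelihood ratio) for a 2 x 2 contingency table."""
    n = k11 + k12 + k21 + k22
    cells = x_log_x(k11) + x_log_x(k12) + x_log_x(k21) + x_log_x(k22)
    rows = x_log_x(k11 + k12) + x_log_x(k21 + k22)
    cols = x_log_x(k11 + k21) + x_log_x(k12 + k22)
    # Rounding can push a true zero slightly negative, so clamp at 0.
    return max(0.0, 2.0 * (cells + x_log_x(n) - rows - cols))


def root_llr(k11, k12, k21, k22):
    """Square root of G, the scale the slide's scores appear to use."""
    return sqrt(llr(k11, k12, k21, k22))


# ASSUMPTION: blank cells are 0; the missing cell in the first table is 1000.
tables = [(13, 1000, 1000, 100_000), (1, 0, 0, 2), (1, 0, 0, 10_000), (10, 0, 0, 100_000)]
for table in tables:
    print(table, round(root_llr(*table), 2))
```

Under those assumptions the printed values agree with the slide's 0.90, 1.95, 4.52 and 14.3 to rounding. The first table has the largest raw overlap but the lowest score, because 13 co-occurrences are close to what chance alone would produce at those marginal counts.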

20 How to Count (header-like documents)
For each “document”:
  For each “word” A:
    left[A]++
    For each “word” B after that (within window):
      count[A,B]++
      right[B]++
      total++
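The counting loop might look like this in Python. It is a sketch under one stated assumption: `left`, `right` and `total` are all incremented once per counted pair, so the marginal counts always add up to the total. The slide's exact nesting is not recoverable from the transcript. The name `count_pairs` and the default window size are hypothetical.

```python
from collections import Counter


def count_pairs(documents, window=5):
    """Count ordered pairs (A, B) where B follows A within `window` positions."""
    count, left, right = Counter(), Counter(), Counter()
    total = 0
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:
                count[a, b] += 1
                # ASSUMPTION: marginals are per pair, not per word occurrence.
                left[a] += 1
                right[b] += 1
                total += 1
    return count, left, right, total


# Two made-up header orderings treated as "documents".
docs = [["Host", "User-Agent", "Accept"], ["Host", "Accept", "User-Agent"]]
count, left, right, total = count_pairs(docs, window=2)
print(total, count[("Host", "User-Agent")], left["Host"], right["Accept"])
```

Counting per pair keeps every derived 2 x 2 cell non-negative, which the scoring step relies on.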

21 We wanted this 2 x 2 table for each A,B
But we only counted k11 directly
But we did count
  k*1 = k11 + k21 (how many A’s we saw)
  k1* = k11 + k12 (how many B’s we saw)
  k** = k11 + k21 + k12 + k22 (how many pairs in total)

          A      Other
B         k11    k12
Other     k21    k22

22 How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B)
  k11 = count[A,B]
  k1x = left[A]
  kx1 = right[B]
  kxx = total
  k12 = k1x - k11
  k21 = kx1 - k11
  k22 = kxx - k11 - k12 - k21
  queue.add(A, (LLR(k11,k12,k21,k22), B))
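A Python sketch of this scoring step follows. It assumes the four count structures are dictionaries keyed the way the pseudocode suggests, and it swaps the per-A priority queue for `heapq.nlargest` over a list, which is a simplification. The names `top_pairs` and `k`, and the toy counts, are hypothetical.

```python
import heapq
from collections import defaultdict
from math import log


def llr(k11, k12, k21, k22):
    """G statistic for a 2 x 2 table, using the convention 0 * ln(0) = 0."""
    def xlx(x):
        return x * log(x) if x > 0 else 0.0
    n = k11 + k12 + k21 + k22
    cells = xlx(k11) + xlx(k12) + xlx(k21) + xlx(k22)
    margins = xlx(k11 + k12) + xlx(k21 + k22) + xlx(k11 + k21) + xlx(k12 + k22)
    return max(0.0, 2.0 * (cells + xlx(n) - margins))


def top_pairs(count, left, right, total, k=10):
    """Return {A: [(score, B), ...]} holding the k highest-LLR partners of each A."""
    scored = defaultdict(list)
    for (a, b), k11 in count.items():
        k12 = left[a] - k11            # A followed by something other than B
        k21 = right[b] - k11           # B preceded by something other than A
        k22 = total - k11 - k12 - k21  # pairs involving neither A nor B
        scored[a].append((llr(k11, k12, k21, k22), b))
    return {a: heapq.nlargest(k, pairs) for a, pairs in scored.items()}


# Toy counts: "a" is almost always followed by "b".
count = {("a", "b"): 10, ("a", "c"): 1, ("a", "d"): 1,
         ("e", "b"): 1, ("e", "c"): 10, ("e", "d"): 10}
left = {"a": 12, "e": 21}
right = {"b": 11, "c": 11, "d": 11}
top = top_pairs(count, left, right, total=33, k=2)
print(top["a"])
```

LLR is large for pairs that are unexpectedly rare as well as for pairs that are unexpectedly frequent, so a production version would probably also compare k11 with its expected value before treating a high score as a positive association.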

23 How to Count (cooccurrence)
for each (C,B)=(“context”, “word”):
  if (!filter(C) && !filter(B)):
    right[B]++
    for each A in history(C):
      count[A,B]++
      left[A]++
    history(C) += B
    total++
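A streaming version of this loop could be sketched in Python as below. Two assumptions go beyond the slide: each context keeps only a bounded history (a `deque` with `maxlen`), and `left`, `right` and `total` are incremented once per counted pair so that they stay consistent with `count`. The class name, parameters and sample events are hypothetical.

```python
from collections import Counter, defaultdict, deque


class CooccurrenceCounter:
    """Streaming pair counts keyed by a per-context history of recent symbols."""

    def __init__(self, history_size=10, is_filtered=lambda symbol: False):
        self.is_filtered = is_filtered
        # ASSUMPTION: bounded history; the slide does not say how history is trimmed.
        self.history = defaultdict(lambda: deque(maxlen=history_size))
        self.count, self.left, self.right = Counter(), Counter(), Counter()
        self.total = 0

    def observe(self, context, word):
        """Record one (context, word) event, pairing it with the context's history."""
        if self.is_filtered(context) or self.is_filtered(word):
            return
        for earlier in self.history[context]:
            self.count[earlier, word] += 1
            self.left[earlier] += 1
            self.right[word] += 1
            self.total += 1
        self.history[context].append(word)


# Made-up events: the context is a source IP, the word is the request it made.
counter = CooccurrenceCounter(history_size=2)
for ip, request in [("10.0.0.1", "GET /"), ("10.0.0.1", "GET /site.css"),
                    ("10.0.0.2", "GET /"), ("10.0.0.1", "GET /app.js")]:
    counter.observe(ip, request)
print(counter.total, dict(counter.count))
```

Keying the history by context means that symbols from unrelated sources are never paired with each other.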

24 It really can be that simple
Seriously... It really can be that simple

25 Basic techniques
Counting – often the hardest part
LLR – the basic tool
Order models
  Ordered cooccurrences
  Transition probabilities
  Recurrent neural networks
Ploughing a quiet field
  Reimage servers often
  Force attackers to pivot repeatedly
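Of the order models listed, transition probabilities are the simplest to sketch. The Python below fits a first-order model with add-alpha smoothing and scores a sequence by its average log-probability per transition. The smoothing scheme, the start and end markers, and all names are assumptions made for illustration, not details from the talk.

```python
from collections import Counter, defaultdict
from math import log


def fit_transitions(sequences, alpha=1.0):
    """Fit a first-order transition model and return a scoring function."""
    counts = defaultdict(Counter)
    vocab = {"<s>", "</s>"}
    for seq in sequences:
        padded = ["<s>", *seq, "</s>"]
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
    size = len(vocab) + 1  # one extra bucket for symbols never seen in training

    def avg_log_prob(seq):
        """Average smoothed log-probability per transition; lower is more unusual."""
        padded = ["<s>", *seq, "</s>"]
        log_prob = 0.0
        for prev, cur in zip(padded, padded[1:]):
            row = counts.get(prev, Counter())
            log_prob += log((row[cur] + alpha) / (sum(row.values()) + alpha * size))
        return log_prob / (len(padded) - 1)

    return avg_log_prob


# Made-up training data: fifty requests with the same header ordering.
score = fit_transitions([["Host", "User-Agent", "Accept"]] * 50)
print(score(["Host", "User-Agent", "Accept"]))
print(score(["Accept", "Host", "User-Agent"]))
```

A sequence whose ordering was never seen in training gets a much lower score than the usual ordering, which is the kind of signal an ordering-based detector would threshold on.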

26 Example 1 - Ababil
Defense has to happen here

27 Spot the Important Difference?
Attacker request Real request

28 Spot the Important Difference?
Attacker request Real request

29 This could only be found at scale

30 Overall Outline Again
Tradecraft error!

31 Large corpus analysis of source IPs wins big

32

33 Example 2 - Common Point of Compromise
Scenario:
  Merchant 0 is compromised, leaks account data during compromise
  Fraud committed elsewhere during exploit
  High background level of fraud
  Limited detection rate for exploits
Goal:
  Find merchant 0
Meta-goal:
  Screen algorithms for this task without leaking sensitive data

34 Example 2 - Common Point of Compromise
Card data is stolen from Merchant 0
That data is used in frauds at other merchants

35 Simulation Setup

36 Simulation Strategy
For each consumer
  Pick consumer parameters such as transaction rate, preferences
  Generate transactions until end of sim-time
  If merchant 0 during compromise time, possibly mark as compromised
  For all transactions, possibly mark as fraud, probability depends on history
  Merchants are selected using hierarchical Pitman-Yor
Restate data
  Flatten transaction streams
  Sort by time
Tunables
  Compromise probability, transaction rates, background fraud, detection probability
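The strategy above can be sketched as a small Python simulator. This is a heavily simplified stand-in, not the simulator used in the talk: merchants are drawn from fixed Zipf-like weights instead of a hierarchical Pitman-Yor process, "history" is reduced to a single compromised flag per consumer, and the detection-probability tunable is left out. Every name and default value is hypothetical.

```python
import random


def simulate(n_consumers=200, n_merchants=50, sim_days=100.0, window=(30.0, 40.0),
             p_compromise=0.5, p_background_fraud=0.001, p_exploit_fraud=0.05, seed=0):
    """Return a time-sorted list of (time, consumer, merchant, is_fraud) tuples."""
    rng = random.Random(seed)
    merchants = list(range(n_merchants))
    # ASSUMPTION: Zipf-like popularity stands in for hierarchical Pitman-Yor.
    weights = [1.0 / (m + 1) for m in merchants]
    streams = []
    for consumer in range(n_consumers):
        rate = rng.uniform(0.2, 2.0)  # this consumer's transactions per day
        compromised = False
        stream = []
        t = rng.expovariate(rate)
        while t < sim_days:
            merchant = rng.choices(merchants, weights)[0]
            # Fraud probability depends on history: here, only on prior compromise.
            p_fraud = p_exploit_fraud if compromised else p_background_fraud
            stream.append((t, consumer, merchant, rng.random() < p_fraud))
            in_window = window[0] <= t < window[1]
            if merchant == 0 and in_window and rng.random() < p_compromise:
                compromised = True
            t += rng.expovariate(rate)
        streams.append(stream)
    # Restate data: flatten the per-consumer streams and sort by time.
    return sorted(tx for stream in streams for tx in stream)


transactions = simulate(seed=1)
print(len(transactions), transactions[0])
```

Because the generator is seeded and records no real account data, candidate detection algorithms can be compared on its output without exposing anything sensitive, which matches the meta-goal stated for this example.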

37

38 Really truly bad guys

39 Historical cooccurrence gives high S/N

40 Summary
The world can be seen as sequences of symbols
We can find patterns
Those patterns can nail opponents
Many patterns only appear at scale
You can do this

41

42 Short Books by Ted Dunning & Ellen Friedman
Published by O’Reilly in 2014 and 2015
For sale from Amazon or O’Reilly
Free e-books currently available courtesy of MapR

43 Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly) Free copies at book signing today (oops… that was earlier)

44 Thank You!

45 Q & A
Engage with us!
@mapr, maprtech, mapr-technologies, MapR

