1
Low Hanging Fruit Tastes Just as Good
Dan Ruef Emily Sarneso
2
Copyright 2016 Carnegie Mellon University This material is based upon work funded and supported by the Department of Defense under Contract No. FA C with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. [Distribution Statement A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at This material was prepared for the exclusive use of Flocon conference attendees and may not be used for any other purpose without the written consent of Carnegie Mellon®, CERT®, CERT Coordination Center® and FloCon® are registered marks of Carnegie Mellon University. DM
3
Agenda
What is low hanging fruit?
Tools
Flow Baselining
DNS Baselining
Results
4
What is low hanging fruit?
5
The Myth of Low Hanging Fruit
“a thing or person that can be won, obtained, or persuaded with little effort.”
“… declaring that an unfamiliar task will yield low-hanging fruit is almost always an admission that you have little insight about what you’re setting out to do. And any estimate of how much work it’ll take to do something you’ve never tried before is likely to be off by degrees of magnitude.” [1]
[1]
6
Tools
7
Publicly Released Tools
CERT NetSA Security Suite (to nobody’s surprise)
  Analysis Pipeline 5.6
  netsa-python 1.5
  pyfixbuf
  schemaTools 1.3
  YAF
  Libfixbuf
  SiLK
  super_mediator
tools.netsa.cert.org
8
Tools
[Diagram labels: Flow, DNS, Flow + DNS, Retrospective analysis, Alerts/results]
9
Baselining?
10
Baseline
A history of an IP’s traffic behavior observed for a specific timeframe.
Common metrics:
  Number of packets, bytes, records
  Active times of day
  Number of applications, protocols, unique destination IPs (dIPs)
11
Baselining Only Sounds Easy
Get some numbers that describe the past
Compare the current numbers to them
Which fields? Bytes, packets, records, bytes per packet
Which values? Sum, average, maximum, standard deviation, multiplier
Over what time? Per hour, per 8 hours, per day, per month
How are they grouped? Any day, day of week, hour of day
12
Our Baseline
Average sum of bytes per day of week * 10, per internal IP address:
  Get the sum of bytes per IP per day
  Group sums by day of week
  Average these sums per day of week
  Multiply average by 10 to create threshold
[Table: one column per day of the week, Sunday through Saturday; rows: Daily Sum …, Average * 10]
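The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Pipeline configuration used in the talk: the function name, the record layout, and the IP address are assumptions, and the byte counts reuse the example values shown later in the deck.

```python
from collections import defaultdict

MULTIPLIER = 10  # the "* 10" from the slide


def thresholds_from_daily_sums(records, multiplier=MULTIPLIER):
    """Average the daily byte sums per (IP, day of week), then scale them.

    `records` is an iterable of (ip, day_of_week, daily_sum) tuples,
    one tuple per IP per calendar day (hypothetical layout).
    """
    grouped = defaultdict(list)
    for ip, day_of_week, daily_sum in records:
        grouped[(ip, day_of_week)].append(daily_sum)
    return {
        key: multiplier * sum(sums) / len(sums)
        for key, sums in grouped.items()
    }


# Hypothetical internal IP; three Sundays and three Mondays of daily sums.
records = [
    ("10.0.0.5", "SUNDAY", 12345),
    ("10.0.0.5", "SUNDAY", 23456),
    ("10.0.0.5", "SUNDAY", 34567),
    ("10.0.0.5", "MONDAY", 45678),
    ("10.0.0.5", "MONDAY", 56789),
    ("10.0.0.5", "MONDAY", 67890),
]

thresholds = thresholds_from_daily_sums(records)
for (ip, day_of_week), limit in sorted(thresholds.items()):
    print(ip, day_of_week, round(limit))
```

Keeping the multiplier as a parameter makes it easy to experiment with a tighter or looser threshold than the factor of 10 chosen here.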
13
Targeted End Results of Analysis
Bytes per port for traffic sent to any new IP address on the day that an internal IP sent out an anomalous amount of traffic for that day of the week.
Bytes per port for traffic sent to any IP address that received more traffic than had ever been sent on the day that an internal IP sent out an anomalous amount of traffic for that day of the week.
Piece of cake… OK… a whole cake… but feasible if cut into slices.
14
Average outgoing bytes sent in one day per day of the week
Pull outgoing flow data sent from the internal network:
rwfilter --start-date=2016/05/01 --end-date=2016/05/ type=out,outweb --sipset=internal.set --pass-destination=flowFiles/ rw
[Diagram labels: All flows used to establish baselines; Analysis Pipeline]
IP | Day of Week | Daily Sum
 | SUNDAY | 12345, 23456, 34567
 | MONDAY | 45678, 56789, 67890
…
15
Analysis Pipeline
WHAT?
  Streaming analysis of flow and other records
WHEN?
  Run on live incoming data or past flow files retrospectively
  Retrospectively to establish baseline
  Live to search for anomalous traffic
HOW?
  Filter traffic
  Build state
  Evaluate: Compare state to thresholds
  Send alerts (Evaluations) / updates (Statistics)
  Can group state by field tuples
16
Process
Create Daily Sums per IP
Convert alerts to IPFIX
Compute Average Sum per IP per Day
Make SiLK bags from daily averages
Explain bags
17
Step 1 - Daily Sums Using Pipeline
Process traffic to external IPs
Group traffic by: {Day of Week, SIP}
State is sum of BYTES field
Update at the end of each day
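As a rough picture of the state this step builds, the sketch below sums bytes per source IP per calendar day and tags each sum with its day of week. It is plain Python rather than Pipeline configuration; the flow tuple layout, the addresses, and the use of UTC day boundaries are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone

# date.weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
             "FRIDAY", "SATURDAY", "SUNDAY"]


def daily_sums(flows):
    """Sum bytes per (source IP, calendar day).

    `flows` holds (sip, start_epoch_seconds, bytes) tuples (hypothetical
    layout). Returns one (sip, day_of_week, total_bytes) record per IP per
    day, which mirrors "update at the end of each day".
    """
    totals = defaultdict(int)
    for sip, start_epoch, nbytes in flows:
        # Assumption: days are split on UTC boundaries.
        day = datetime.fromtimestamp(start_epoch, tz=timezone.utc).date()
        totals[(sip, day)] += nbytes
    return [
        (sip, DAY_NAMES[day.weekday()], total)
        for (sip, day), total in sorted(totals.items())
    ]


# Hypothetical outgoing flows from two internal hosts.
flows = [
    ("10.0.0.5", 1462060800, 1200),  # 2016-05-01 00:00:00 UTC
    ("10.0.0.5", 1462100000, 800),   # later the same day
    ("10.0.0.5", 1462147200, 500),   # 2016-05-02 00:00:00 UTC
    ("10.0.0.9", 1462060800, 300),
]

sums = daily_sums(flows)
for record in sums:
    print(record)
```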
18
Step 2 – Convert alerts to IPFIX
Use pyfixbuf to convert Pipeline’s alerts of daily sums to IPFIX
Input: Pipeline’s alert log
Output: One IPFIX file of alert records
Pipeline can ingest any IPFIX record natively
Converted record format:
[sourceIPv4Address, octetTotalCount, dayOfWeek]
[ , , SATURDAY]
19
Step 3 - Compute Average Sum per IP per Day
Send converted alerts through Pipeline to compute averages
Group by [SIP, Day of Week]
Compute the average of the daily byte counts
Statistics run over a single file, update at the end.
Result: We now have a baseline average bytes per IP per day of week
20
Basic vs. Custom Thresholds
Basic check construction
  FOREACH SIP
  SUM BYTES >
  TIME WINDOW 1 DAY
Same threshold for all IPs
More than BYTES (1MB) in 1 DAY will trigger an alert
Custom thresholds
  FOREACH SIP
  SUM BYTES > “thresholds.bag”
  TIME WINDOW 1 DAY
Unique for every IP address
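The difference between the two check styles can be illustrated with a small Python sketch, where a dictionary stands in for the bag of per-IP thresholds. The function names, the sample byte counts, the reading of "1MB" as 10**6 bytes, and the choice to skip IPs that have no bag entry are all assumptions for illustration; they are not taken from Pipeline's behavior.

```python
BASIC_THRESHOLD = 1_000_000  # assumption: "1MB" read as 10**6 bytes


def basic_alerts(daily_bytes, threshold=BASIC_THRESHOLD):
    """Same threshold for every IP."""
    return sorted(ip for ip, total in daily_bytes.items() if total > threshold)


def custom_alerts(daily_bytes, thresholds):
    """Per-IP thresholds looked up in a bag-like mapping (IP -> bytes).

    Assumption: an IP with no entry in `thresholds` is not alerted on.
    """
    return sorted(
        ip
        for ip, total in daily_bytes.items()
        if ip in thresholds and total > thresholds[ip]
    )


# Hypothetical one-day byte sums per source IP.
daily_bytes = {
    "10.0.0.5": 2_500_000,  # a busy server, normal for this host
    "10.0.0.7": 5_000_000,  # no baseline yet
    "10.0.0.9": 900_000,    # a quiet host sending far more than usual
}
# Hypothetical per-IP thresholds (average * 10).
thresholds = {"10.0.0.5": 3_000_000, "10.0.0.9": 500_000}

print(basic_alerts(daily_bytes))
print(custom_alerts(daily_bytes, thresholds))
```

In this made-up data the basic check flags the two high-volume hosts, while the custom check flags only the quiet host that exceeded its own baseline.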
21
Step 4 - Convert Daily Averages to Bags
Python script to consume alerts of averages
Each alert contains: IP, Day of Week, Daily Bytes Average
Use netsa-python to build seven bags, one for each day of the week
  Key: IP
  Value: Daily Bytes Average
We multiplied these averages by 10 to limit anomalous counts
22
Storage/Time Impact
One month of outgoing flow files for the internal subnet: 4.0 GB
Our baseline consists of the past 6 months (May – October ‘16)
Step 1 – Calculate sums: 1,417,661,775 flows processed in 9 minutes and 4 seconds
Step 2 – Convert to IPFIX: records; 4.9 MB IPFIX file
Step 4 – Make SiLK Bags of thresholds: entries; 128K of bags
23
Transition to Live Pipeline Anomaly Detection
Create one FILTER for each day of the week
  Sunday traffic internal to external
Group state by SIP.
Alert if any SIP sends more than its threshold within 1 day.
Include each SIP in an alert once per day
How do we update the bags?
24
Results from Live Pipeline
25
We have alerts…now what?
IP | Day of Week | Baseline | Date | Bytes | Over
 | Tuesday | 97.83MB | | 289.15MB | 191.32MB
Pull flows and build an IPSet of all IPs sent data to in the “past”:
rwfilter --start-date=2016/05/01 --end-date=2016/10/31 --saddress= pass-dest=past rw --not-dipset=internalIPs.set
rwset --dip-file=past set past rw
Pull flows and build an IPSet of the IPs the sent data to on
rwfilter --start-date=2016/11/08 --saddress= pass-dest=nov8_ rw --not-dipset=internalIPs.set
rwset --dip-file=anom set nov8_ rw
Did talk to any new external IPs on ?
rwsettool --difference --output-path=diff.set anom set past set
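The logic behind the set difference can be mirrored with ordinary Python sets: the destinations seen on the anomalous day, minus internal addresses, minus everything seen during the baseline period. This is only a model of the idea and does not read SiLK IPSet files; the internal network and all addresses below are hypothetical.

```python
import ipaddress

# Assumption: the internal network is a single block.
INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def external_only(ips):
    """Drop destinations that fall inside the internal network."""
    return {ip for ip in ips if ipaddress.ip_address(ip) not in INTERNAL}


def new_destinations(past_dips, anomalous_day_dips):
    """External IPs contacted on the anomalous day but never in the past."""
    return external_only(anomalous_day_dips) - external_only(past_dips)


# Hypothetical destinations of one internal host.
past_dips = {"192.0.2.10", "198.51.100.4", "10.1.2.3"}
anomalous_day_dips = {"192.0.2.10", "203.0.113.7", "10.9.9.9"}

new_ips = new_destinations(past_dips, anomalous_day_dips)
print(sorted(new_ips))
```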
26
Processing alerts
Did communicate more than usual to any IP?
Run Pipeline in batch mode over past flows:
pipeline --country-code-file=country_codes.pmap --site-config-file=silk.conf --alert-log-file=pastLog.txt --aux-alert-file=pastAux.txt --name-files --silk --configuration=getPastSums.conf past rw
Run Pipeline in batch mode over single day flows
Compare alert files
Configuration:
FILTER all
END FILTER
STATISTIC anomPerDipPerDay
  FOREACH DIP
  SUM BYTES
  UPDATE EVERY DAY
END STATISTIC
27
Results
New External IPs
Baseline per dIP per day
We can also use the DNS data we have collected to figure out the domains associated with those IPs.
28
Updating the Bags
Key artifacts of computing baselines:
  IPFIX file of daily sums
  7 Bags of thresholds
To incorporate new data into baselines:
  Retrieve new flow files
  Compute their daily sums
  Convert alerts to IPFIX file
  Combine new IPFIX file with existing IPFIX file
  Compute averages
  Make new bags from averages
  Overwrite existing bag files; live Pipeline will reload
[Diagram labels: New Flow Files For Baselines; Daily Sums Alert File; New Daily Sums IPFIX File; Existing Daily Sums IPFIX File; New IPFIX with ALL daily sums; Updated Averages Alert File; Sunday Bag Files; Pipeline reloads bag files when updated]
29
DNS Baselining
30
DNS Baselining – Why?
Assumption: If we’re not already infected, malicious things will be new at some point
Many attacks use recently registered domain names
Change can be potentially interesting
[Diagram labels: New Domains, Bad Domains]
New domains aren’t always bad, but bad domains are usually new
31
DNS Baselining – How?
[Diagram labels: Flow, De-duplicated DNS, Flow + DNS, Retrospective analysis, Alerts/results]
32
DNS Baselining – Create whitelists
Domains only
Domain IP pairs
Unique SLDs
Unique SLDs + TLDs
Populate whitelist for 2-4 weeks
Continuously update whitelist
33
DNS Baselining – Live
Start alerting:
  New Domains
  New Domain IP pairs
  New SLDs
  New SLDs + TLDs
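One way to picture the whitelist-then-alert approach is a class that keeps four "seen" sets and reports which kinds of novelty each DNS answer introduces. This is a minimal sketch, not the tooling used in the talk: the class and function names are hypothetical, and the SLD split naively takes the last two labels, which is wrong for suffixes such as `co.uk` (a real version would presumably consult a public suffix list).

```python
def split_sld_tld(domain):
    """Naive split: last label is the TLD, the one before it is the SLD."""
    labels = domain.rstrip(".").lower().split(".")
    if len(labels) < 2:
        return labels[0], ""
    return labels[-2], labels[-1]


class DnsBaseline:
    """Tracks domains, domain/IP pairs, SLDs, and SLD+TLD pairs seen so far."""

    def __init__(self):
        self.domains = set()
        self.pairs = set()
        self.slds = set()
        self.sld_tlds = set()

    def observe(self, domain, ip):
        """Record one DNS answer and return the kinds of novelty it shows.

        While populating the whitelist, ignore the return value; once
        alerting starts, a non-empty list is an alert. The sets keep
        growing either way, matching "continuously update whitelist".
        """
        domain = domain.rstrip(".").lower()
        sld, tld = split_sld_tld(domain)
        checks = [
            ("new domain", self.domains, domain),
            ("new domain/IP pair", self.pairs, (domain, ip)),
            ("new SLD", self.slds, sld),
            ("new SLD+TLD", self.sld_tlds, (sld, tld)),
        ]
        novel = []
        for label, seen, key in checks:
            if key not in seen:
                novel.append(label)
                seen.add(key)
        return novel


baseline = DnsBaseline()
# Learning period (hypothetical data): results are discarded.
baseline.observe("www.example.com", "192.0.2.1")
# Live period: each non-empty result would be an alert.
print(baseline.observe("www.example.com", "192.0.2.2"))
print(baseline.observe("example.net", "192.0.2.1"))
```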
34
DNS Baselining Results
35
DNS Baselining – Future Work
Determine best length of time for whitelist creation
Compare with other data feeds
“So the next time you call someone’s job easy — or tell an employee to go pick some low-hanging fruit — stop yourself. Respect the work that you’ve never done before. Remind yourself that other people’s jobs aren’t so simple. Results rarely come without effort. If momentum and experience is on your side, what is hard can masquerade as easy, but never forget that not having done something before doesn’t make it easy. It usually makes it hard.”
36
Author Software Engineering Institute
Contact Information
4/14/2018
Dan Ruef
Emily Sarneso
Development team:
Public Mailing List: