Published by Griffin Hopkins. Modified over 6 years ago.
1
Pre-flight checks
Can boot VM and log in
SSH client and connectivity
Text editor – vim?
Internet: BSIDES-SECURE BSIDES2018!
VM login: teapot/workshop
2
My log obeys commands – Parse!
An introduction to log parsing
Joash Lewis @http_error_418
3
What can you get from logs?
[12/Dec/2015:18:25: ] "GET /administrator/ HTTP/1.1" "-" "Mozilla/5.0 (Windows NT 6.0; rv:34.0) Gecko/ Firefox/34.0" "-"
who, when, what, where, how
How can you extract the useful information from a flat string?
< 30 minutes
4
Regular expressions!
How is an IPv4 address written?
– numbers, dot, numbers, dot, numbers, dot, numbers
A regular expression can find this pattern; then we can store it
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Not strictly accurate (a regex for fully conformant IPv4 is complicated)
< 30 minutes
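A minimal Python sketch of the loose pattern above, to show both that it finds an address and why it is "not strictly accurate". The log fragment is a made-up example using a documentation-range address, not taken from the workshop data.

```python
import re

# Loose IPv4 pattern from the slide: four groups of 1-3 digits separated by dots.
LOOSE_IPV4 = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# Hypothetical log fragment (invented for illustration).
line = '192.0.2.10 - - "GET /administrator/ HTTP/1.1" 404'

match = LOOSE_IPV4.search(line)
source_ip = match.group(0) if match else None
print(source_ip)

# The pattern is loose: it also accepts values no real address can hold.
accepts_nonsense = LOOSE_IPV4.fullmatch("999.999.999.999") is not None
print(accepts_nonsense)
```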
5
Core regex concepts
Metacharacters
  \d (digit), \w (alphanumeric and underscore)
  . – any character (if you want a literal . it must be escaped: \.)
Character classes
  [abc] matches one of a, b or c
  [A-Fa-f0-9] matches one of a-f, A-F or 0-9 inclusive (hexadecimal!)
  [^a-f] matches any character that is NOT a-f
Repetition
  \d* - any digit, any number of times (including zero)
  \d+ - any digit, at least once
< 30 minutes
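The concepts above can be tried out quickly in Python's built-in re module. This is only an illustrative sketch; all input strings are invented.

```python
import re

# Metacharacters: \d is a digit, \w is a letter, digit or underscore.
digits = re.findall(r"\d+", "port 8080, pid 42")
words = re.findall(r"\w+", "user_1 logged-in")

# Character classes: a hexadecimal run, and "anything that is NOT a-f".
hex_runs = re.findall(r"[A-Fa-f0-9]+", "DEADbeef-zz")
not_a_to_f = re.findall(r"[^a-f]", "abcxyz")

# Repetition: * allows zero occurrences, + needs at least one.
star_ok = re.fullmatch(r"ab\d*", "ab") is not None
plus_ok = re.fullmatch(r"ab\d+", "ab") is not None

# An unescaped . matches any character; \. matches only a literal dot.
dot_any = re.fullmatch(r"a.c", "abc") is not None
dot_literal = re.fullmatch(r"a\.c", "abc") is not None

print(digits, words, hex_runs, not_a_to_f)
print(star_ok, plus_ok, dot_any, dot_literal)
```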
6
Extracting matches
A basic regex has a Boolean result: does input match pattern? yes/no
Capture groups can store sections of the input into variables
Variables can be stored into a database for search
Regex supports ‘named capture groups’
SIEMs use named capture groups to determine indexing and display of data
(?<group_name>\d+)
< 30 minutes
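A small sketch of the difference between the Boolean result and named captures, using Python. Note that Python's re module spells named groups (?P<name>...) rather than (?<name>...); the message format and field names here are invented for illustration.

```python
import re

# Python spelling of a named capture group: (?P<group_name>\d+)
pattern = re.compile(r"status=(?P<status>\d+) bytes=(?P<bytes>\d+)")

line = "status=404 bytes=1234"  # hypothetical message

match = pattern.search(line)

# The Boolean result: did the input match at all?
matched = match is not None

# The named captures: a dict of field name -> captured text,
# roughly what a SIEM would index.
fields = match.groupdict() if match else {}

print(matched, fields)
```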
7
Practice time
https://bpaste.net/show/546db0c4f931
www.regex101.com
Task: capture the source IP, HTTP method and URI into named fields
Useful regex mini patterns:
  .*abc - match any character any number of times (including zero), followed by ‘abc’
  .*abc is GREEDY and will match up to the LAST occurrence of ‘abc’
  .*?abc is LAZY and will match up to the FIRST occurrence of ‘abc’
  [^\s]+ - match any character except whitespace, at least once
Should be ~30 minutes in
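A sketch of the greedy/lazy difference, followed by one possible (not the only) answer to the task, in Python. The access-log line is a hypothetical combined-format example rather than the workshop sample, and the field names src_ip, method and uri are assumptions.

```python
import re

text = "xxabcyyabczz"

# Greedy: .* grabs as much as it can, so the match ends at the LAST 'abc'.
greedy = re.match(r".*abc", text).group(0)

# Lazy: .*? grabs as little as it can, so the match ends at the FIRST 'abc'.
lazy = re.match(r".*?abc", text).group(0)

print(greedy, lazy)

# Hypothetical access-log line (invented for illustration).
line = ('203.0.113.7 - - [01/Jan/2018:00:00:00 +0000] '
        '"GET /administrator/ HTTP/1.1" 404 1234')

# Source IP = first run of non-whitespace; then lazily skip to the first
# quote that is followed by an upper-case method, a space and the URI.
task = re.compile(
    r'^(?P<src_ip>[^\s]+) .*?"(?P<method>[A-Z]+) (?P<uri>[^\s]+)'
)

m = task.match(line)
fields = m.groupdict() if m else {}
print(fields)
```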
8
Screw it, we’ll do it live
2 terminals
Edit /etc/logstash/conf.d/workshop.conf with your regex
sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/
head access.log | nc
Now try making one for bind.log!
  Source IP and port
  Query target hostname, verb and type
30-60 mins
9
SIEMulation
SIEMs often have built-in shorthand for common data types
In Logstash, full IPv4 pattern is:
(?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
In shorthand:
%{IPV4}
%{IPV4:variable_name}
Saves time, easier to read
Syntax differs between platforms, but basic concept is the same
30-60 mins
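To see what the longer pattern buys you, here is a Python sketch that rebuilds the same expression from one octet sub-pattern and compares it with the loose \d{1,3} version. Python's re module is used only as a stand-in for the SIEM's regex engine; the test strings are invented.

```python
import re

# One octet, as in the full pattern above: 250-255, 200-249, or 0-199.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})"

# Four octets joined by literal dots, not preceded or followed by a digit.
STRICT_IPV4 = re.compile(
    r"(?<![0-9])(?:" + "[.]".join([OCTET] * 4) + r")(?![0-9])"
)
LOOSE_IPV4 = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

good = STRICT_IPV4.search("src=192.0.2.10 ok")
good_ip = good.group(0) if good else None

# Out-of-range values: the loose pattern accepts them, the strict one does not.
strict_rejects_999 = STRICT_IPV4.search("999.999.999.999") is None
strict_rejects_256 = STRICT_IPV4.search("256.1.1.1") is None
loose_accepts_999 = LOOSE_IPV4.search("999.999.999.999") is not None

print(good_ip, strict_rejects_999, strict_rejects_256, loose_accepts_999)
```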
10
Scaling out
Build your regexes incrementally
  If it’s not working, remove a section (paste to a notepad so you don’t lose it) and test again. If still not working, take out a bit more
Keep an eye on performance
  Some regex operations are computationally expensive (e.g. lookahead, lookbehind)
  regex-performance/
  Test with a decent sized sample (1000+)
~60 minutes
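A rough way to keep an eye on performance is to time candidate patterns over a sample. The Python sketch below uses 1000 synthetic lines as a placeholder; real testing should use real log data, and the two patterns and the helper name time_pattern are only illustrative. Timings vary by machine, so no expected numbers are given.

```python
import re
import time

# Synthetic sample of 1000 lines (placeholder for a real log sample).
sample = [f'192.0.2.{i % 255} "GET /page/{i} HTTP/1.1"' for i in range(1000)]

def time_pattern(pattern, lines):
    """Return (number of matching lines, elapsed seconds) for one pattern."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    hits = sum(1 for line in lines if compiled.search(line))
    return hits, time.perf_counter() - start

# Anchored and specific, versus a leading greedy .* that has to backtrack.
hits_a, secs_a = time_pattern(r'^[^\s]+ "(?P<method>[A-Z]+) ', sample)
hits_b, secs_b = time_pattern(r'.*"(?P<method>[A-Z]+) ', sample)

print(f"anchored: {hits_a} hits in {secs_a:.6f}s")
print(f"greedy:   {hits_b} hits in {secs_b:.6f}s")
```

Checking the hit count as well as the time catches the case where a "faster" regex is only faster because it stopped matching.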
11
Surprise! One size does NOT fit all
Some data sources are kind and one regex can match every message (hooray Apache access log). Some are not.
Different approaches:
Conditional matching
  Match part of the message to determine message type
  Apply regex specific to that message type
All of the regexes. All of them.
  Try every regex in sequence until one matches
  Important to identify ASAP when a particular message does not match
  Order regexes by frequency of message type
mins
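Both approaches can be sketched in a few lines of Python. The LOGIN/LOGOUT message formats, the field names and the function names are all invented for illustration; a real source would have its own message types.

```python
import re

# Hypothetical message types, most frequent first.
PATTERNS = [
    ("login", re.compile(r"^LOGIN user=(?P<user>\w+) src=(?P<src_ip>[^\s]+)$")),
    ("logout", re.compile(r"^LOGOUT user=(?P<user>\w+)$")),
]

def parse_sequential(message):
    """Try every regex in order until one matches."""
    for name, pattern in PATTERNS:
        match = pattern.match(message)
        if match:
            return name, match.groupdict()
    # Flag non-matching messages explicitly so they can be spotted early.
    return "unmatched", {}

# Conditional matching: look at the first token, then apply only the
# regex that belongs to that message type.
BY_KEYWORD = {"LOGIN": PATTERNS[0], "LOGOUT": PATTERNS[1]}

def parse_conditional(message):
    keyword = message.split(" ", 1)[0]
    entry = BY_KEYWORD.get(keyword)
    if entry is None:
        return "unmatched", {}
    name, pattern = entry
    match = pattern.match(message)
    return (name, match.groupdict()) if match else ("unmatched", {})

print(parse_sequential("LOGOUT user=teapot"))
print(parse_conditional("LOGIN user=teapot src=192.0.2.10"))
print(parse_conditional("REBOOT now"))
```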
12
Bring your own beer
Log, I meant bring your own log.
Not got one? Let’s try PFSense’s Filterlog (firewall log)
mins
13
Only do the work you have to
Unless your source is bespoke, there’s a good chance someone has done the hard work already
  BUT do not assume they did a good job. TEST, TEST, TEST. This goes even for vendor-supplied content.
  Adapting content from other SIEMs is very workable
Log formats are USUALLY documented somewhere. RTFM if you can.
Prioritise. Yes, a regex can extract every scrap of data, but it’s not always wise. SIEMs have limitations, and your time is valuable.
~ 110 minutes
14
Regex is not always the answer
Two kinds of log data: with formal structure, and without
  XML, CSV, JSON
If your SIEM has an interpreter for the data structure:
  It MAY be more efficient/performant than regex
  It is LIKELY to be easier on you
ORDER MATTERS
  XML and JSON do not (inherently) mandate order
  You do NOT want to try handling arbitrary ordering with regex. That way lies madness.
  I’m not kidding: match-open-tags-except-xhtml-self-contained-tags
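A short Python sketch of why a structure-aware parser is easier on you: two hypothetical JSON events carry the same fields in a different order, and the parser does not care. The field names are invented; Python's json module stands in for whatever interpreter your SIEM provides.

```python
import json

# Same event, keys in a different order (hypothetical data).
raw_a = '{"src_ip": "192.0.2.10", "method": "GET", "uri": "/administrator/"}'
raw_b = '{"uri": "/administrator/", "src_ip": "192.0.2.10", "method": "GET"}'

event_a = json.loads(raw_a)
event_b = json.loads(raw_b)

# Fields are looked up by name, so key order is irrelevant.
same_event = event_a == event_b
src_ip = event_b["src_ip"]

print(same_event, src_ip)
```

A single regex written for raw_a would need extra alternations to cope with raw_b, and more again for every further ordering.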
15
Happy logging! www.unsafehex.com
hexistentialist dot com