Log Analysis with GAWK: Back to Basics
Who Am I?
Brad Isbell
20 years in IT
Range Operations Lead / SimSpace Inc.
Instructor / Sun Microsystems, DCITA, Stevenson University
Contractor / DISA, DOL, NELO, UMUC, FEMA
CyberPatriot Mentor
MIS, OSCP, CISSP, CEH, Sec+, Linux+
Who are we?
Log Analysis with GAWK
Why? I have Splunk/ELK
What happens when you don’t?
Simple, Efficient, Common
What is GAWK?
GNU AWK
AWK: Aho, Weinberger, Kernighan
Data-Driven Text Processing and Reporting Language
Pattern Search + Action
GAWK Terminology: Records and Fields
Records: One Line (by default, each line of input is one record)
Fields: Records Contain Fields of Data
Well Formatted Data
Logs are (usually) well formatted
How are records defined?
Can you break the records into fields?
What is/are the field separator(s)?
How generically can you describe the data?
Well Formatted Data (Squid Proxy Logs)
210.40.18.54|63731|1566469606|GET|http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab?8303c1e235fc3944|HTTP/1.1|200|4622|237|4385|-|Microsoft-CryptoAPI/6.1
Field Separator = |
Fields 1 & 2: Client IP & Source Port
Field 3: Timestamp (epoch seconds)
Fields 4 & 5: HTTP Verb & URL
Fields 7 to 10: Status Code & Bytes Transferred
Fields 11 & 12: Referrer & User Agent
Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:15:46.521849 IP 192.168.207.5.62019 > 18.215.66.244.443: Flags [.], ack 64, win 2047, options [nop,nop,TS val 1271398963 ecr 10276941], length 0
00:15:46.602773 IP 192.168.207.5.63365 > 172.217.11.3.443: UDP, length 1350
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28
Which fields are the source IP / destination IP?
Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .]
Fields 1 & 2 (Timestamp): 00:15:46 467385
Field 3 (Network Protocol): IP
Fields 4, 5, 6, 7 (Source IP): 173 194 207 189
Field 8 (Source Port): 443
Fields 10, 11, 12, 13 (Destination IP): 192 168 207 5
Field 14 (Destination Port): 50740:
Field 15 (Transport Protocol): UDP,
Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .:,]
Fields 1, 2, 3, 4 (Timestamp): 00 15 46 467385
Field 5 (Network Protocol): IP
Fields 6, 7, 8, 9 (Source IP): 173 194 207 189
Field 10 (Source Port): 443
Fields 12, 13, 14, 15 (Destination IP): 192 168 207 5
Field 16 (Destination Port): 50740
Field 17: empty, because the ":" and the space after it each count as a separator
Field 18 (Transport Protocol): UDP
Well Formatted Data (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .:,]+ (a run of one or more separator characters counts as a single separator)
Fields 12, 13, 14, 15 (Destination IP): 192 168 207 5
Field 16 (Destination Port): 50740
Field 17 (Transport Protocol): UDP
Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28
Field Separator = [ .:,]+
IF Field 5 == "IP" AND Field 17 == "UDP": UDP Packet
IF Field 5 == "IP" AND Field 17 == "Flags": TCP Packet
IF Field 5 == "ARP": ARP Packet
Notice: packet length is always the last field
Challenges in Describing Data: Web Server Access Logs
192.168.1.116 - - [16/May/2019:03:16:27 -0400] "GET /favicon.ico HTTP/1.1" 301 184 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"
192.168.1.116 - - [16/May/2019:03:16:27 -0400] "GET /favicon.ico HTTP/1.1" 301 184 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"
Challenges in Describing Data: Custom Logs
02/27/16 12:25:53 PM,182.100.67.173,root,chester
02/27/16 12:27:32 PM,182.100.67.173,root,mko0,lp-.;[=
The last value of the second record contains a comma, so splitting on "," yields five fields instead of four.
Base64-encoding that value keeps the separator out of the data:
02/27/16 12:25:53 PM,182.100.67.173,root,Y2hlc3Rlcg==
02/27/16 12:27:32 PM,182.100.67.173,root,bWtvMCxscC0uO1s9
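GAWK Syntax
$ gawk -F <field separator> ' SEARCH { ACTION } ' FILE
Default field separator: a single space, which splits on runs of spaces, tabs, and newlines (roughly [ \t\n]+) and ignores leading and trailing whitespace
SEARCH: Which Records (a condition or a POSIX extended regular expression; gawk does not use PCRE)
ACTION: What To Do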
Fields and Records
Fields are stored in variables: $1, $2, $3 …
$0 = The Entire Record
NF = Total Number of Fields
$NF = The Last Field
NR = Record Number
Demonstration
gawk '{print $0}' contacts.txt
gawk '{print NF}' contacts.txt
gawk '{print $NF}' contacts.txt
gawk '{print NF}' pcap.txt
gawk '{print $NF}' pcap.txt
Search Syntax
/REGEX/ : Record matches the regular expression
$1 == "STRING" : First field exactly equals the string
$1 != "STRING" : First field does not equal the string
$1 ~ /REGEX/ : First field matches the regular expression
BEGIN : Perform Action Before Reading Records
END : Perform Action After Reading Records
Demonstration
gawk '/Amelia/ {print $0}' contacts.txt
gawk '/A/ {print $0}' contacts.txt
gawk '$4 == "A" {print $0}' contacts.txt
gawk '$1 ~ /^A/ {print $3}' contacts.txt
gawk '/gmail\.com/ {print $1}' contacts.txt
gawk -F'[ :,.]+' '$17 == "UDP" {print $NF}' pcap.txt
Variables
Undeclared (created on first use; an unset variable acts as an empty string or 0)
Names: Letters, Digits, Underscore
Cannot Begin with Digit
Demonstration
gawk -F'[ ,.:]+' '{print $17}' pcap.txt
gawk -F'[ ,.:]+' '$17 == "UDP" {UDP = UDP + $NF} END {print UDP}' pcap.txt
gawk -F'[ ,.:]+' '$17 == "UDP" {UDP += $NF} $17 == "Flags" {TCP += $NF} END {print "UDP: " UDP "\nTCP: " TCP}' pcap.txt
gawk -f pcap-1.gawk pcap.txt
Arrays (associative: indexed by strings)
proto["UDP"] = 1371
proto["TCP"] = 63
for (p in proto) print p ": " proto[p]
Demonstration pcap-2.gawk
BEGIN { FS = "[ ,.:]+" }
$17 == "UDP" { proto["UDP"] += $NF }
$17 == "Flags" { proto["TCP"] += $NF }
END { for (p in proto) print p ": " proto[p] }
printf
printf "%s\n", $1
printf "%-15s%s\n", $1, $3
%s : string
%15s : string right justified in 15 characters
%-15s : string left justified in 15 characters
%d : decimal integer
%f : floating point
%10.2f : float right justified in 10 characters, 2 places after the decimal point
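Additional Functions
Convert Epoch Timestamp to Human Readable: strftime("%c", TIMESTAMP)
Regex Substitution: gensub(SEARCH, REPLACE, INSTANCE, INPUT)
gensub returns the modified string and leaves INPUT unchanged; INSTANCE is "g" for every match or a number for a single match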
Demonstration
head access.log | gawk -F'|' '{print $3}'
head access.log | gawk -F'|' '{print strftime("%c",$3)}'
head access.log | gawk -F'|' '{print gensub("https?://([^/]+)/.*","\\1",1,$5)}'
QUESTIONS:
Which client is generating the most traffic?
gawk -F'|' '{ LENGTH[$1] += $9 } END { for (IP in LENGTH) printf "%-15s : %d\n", IP, LENGTH[IP] }' access.log
Which website is that IP going to? -> squid-1.gawk