Log Analysis with GAWK: Back to Basics

Who Am I?
Brad Isbell
20 years in IT
Range Operations Lead / SimSpace Inc.
Instructor / Sun Microsystems, DCITA, Stevenson University
Contractor / DISA, DOL, NELO, UMUC, FEMA
CyberPatriot Mentor
MIS, OSCP, CISSP, CEH, Sec+, Linux+

Who are we?

Log Analysis with GAWK
Why? I have Splunk/ELK.
What happens when you don't?
Simple, Efficient, Common

What is GAWK?
GNU AWK
AWK: Aho, Weinberger, Kernighan
Data-Driven Text Processing and Reporting Language
Pattern Search + Action

GAWK Terminology
Records and Fields
Records: One Line
Fields: Records Contain Fields of Data

Well Formatted Data
Logs are (usually) well formatted
How are records defined?
Can you break the records into fields?
What is/are the field separator(s)?
How generically can you describe the data?

Well Formatted Data (Squid Proxy Logs)
210.40.18.54|63731|1566469606|GET|http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab?8303c1e235fc3944|HTTP/1.1|200|4622|237|4385|-|Microsoft-CryptoAPI/6.1
Client IP & Source Port: 210.40.18.54|63731
Timestamp: 1566469606
HTTP Verb & URL: GET and the requested URL
Status Code & Bytes Transferred: 200 and the size counters that follow it (4622|237|4385)
Referrer & User Agent: -|Microsoft-CryptoAPI/6.1
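
A minimal sketch of how the pipe-delimited record above splits into fields. The shell variable names (`record`, `squid_fields`) are illustrative, and plain `awk` is used instead of `gawk` on the assumption that any POSIX awk behaves the same here.

```shell
# Sample record copied from the slide, with the URL rejoined onto one line.
record='210.40.18.54|63731|1566469606|GET|http://ctldl.windowsupdate.com/msdownload/update/v3/static/trustedr/en/disallowedcertstl.cab?8303c1e235fc3944|HTTP/1.1|200|4622|237|4385|-|Microsoft-CryptoAPI/6.1'

# -F'|' makes the pipe character the field separator.
squid_fields=$(printf '%s\n' "$record" | awk -F'|' '{
    print "fields:     " NF
    print "client:     " $1 ":" $2
    print "timestamp:  " $3
    print "verb:       " $4
    print "status:     " $7
    print "user agent: " $NF
}')
printf '%s\n' "$squid_fields"
```

Because no field in this log format contains a literal `|`, a single-character separator is enough to describe the record.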

Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:15:46.521849 IP 192.168.207.5.62019 > 18.215.66.244.443: Flags [.], ack 64, win 2047, options [nop,nop,TS val 1271398963 ecr 10276941], length 0
00:15:46.602773 IP 192.168.207.5.63365 > 172.217.11.3.443: UDP, length 1350
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28

Well Formatted Data? (tcpdump)
Which fields are the source IP / destination IP?

Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .]
Fields 1 & 2 (Timestamp): 00:15:46 467385
Field 3 (Network Protocol): IP
Fields 4, 5, 6, 7 (Source IP): 173 194 207 189
Field 8 (Source Port): 443
Fields 10, 11, 12, 13 (Destination IP): 192 168 207 5
Field 14 (Destination Port): 50740:
Field 15 (Transport Protocol): UDP,

Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .:,]
Fields 1, 2, 3, 4 (Timestamp): 00 15 46 467385
Field 5 (Network Protocol): IP
Fields 6, 7, 8, 9 (Source IP): 173 194 207 189
Field 10 (Source Port): 443
Fields 12, 13, 14, 15 (Destination IP): 192 168 207 5
Field 16 (Destination Port): 50740
Field 17: empty, because ":" and the following space are two separate separators
Field 18 (Transport Protocol): UDP

Well Formatted Data (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP,
Field Separator = [ .:,]+ (a run of separator characters counts as one separator, so no empty fields)
Fields 12, 13, 14, 15 (Destination IP): 192 168 207 5
Field 16 (Destination Port): 50740
Field 17 (Transport Protocol): UDP
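
The effect of the three separators can be checked directly. The sketch below reports which field number holds the UDP keyword under each one. The variable names `line`, `sep` and `fs_demo` are illustrative, and plain `awk` stands in for `gawk` on the assumption that any POSIX awk treats these separators the same way.

```shell
line='00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21'

fs_demo=$(
    for sep in '[ .]' '[ .:,]' '[ .:,]+'; do
        # Scan every field and report the one that holds UDP (with or without a trailing comma).
        printf '%s\n' "$line" | awk -F"$sep" -v sep="$sep" '{
            for (i = 1; i <= NF; i++)
                if ($i ~ /^UDP,?$/) print sep " -> field " i " = " $i
        }'
    done
)
printf '%s\n' "$fs_demo"
```

The field number moves from 15 to 18 to 17 as the separator changes, which is why field numbers only mean something together with the separator that produced them.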

Well Formatted Data? (tcpdump)
00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28
With Field Separator = [ .:,]+
IF Field 5 == "IP" AND Field 17 == "UDP": UDP Packet
IF Field 5 == "IP" AND Field 17 == "Flags": TCP Packet
IF Field 5 == "ARP": ARP Packet
Notice: packet length is always the last field
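
A sketch of the classification rules as awk pattern/action pairs, assuming the `[ .:,]+` separator. Under that separator the network protocol lands in field 5 and the UDP/Flags keyword in field 17, so the field numbers would differ with another separator. The variable names and the final catch-all "other" rule are additions for illustration.

```shell
# Three sample packets from the slide, one per line.
packets='00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28'

# Each rule prints the packet type and its length ($NF), then skips the remaining rules.
packet_types=$(printf '%s\n' "$packets" | awk -F'[ .:,]+' '
    $5 == "IP" && $17 == "UDP"   { print "UDP", $NF; next }
    $5 == "IP" && $17 == "Flags" { print "TCP", $NF; next }
    $5 == "ARP"                  { print "ARP", $NF; next }
                                 { print "other", $NF }
')
printf '%s\n' "$packet_types"
```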

Challenges in Describing Data
Web Server Access Logs
192.168.1.116 - - [16/May/2019:03:16:27 -0400] "GET /favicon.ico HTTP/1.1" 301 184 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"
192.168.1.116 - - [16/May/2019:03:16:27 -0400] "GET /favicon.ico HTTP/1.1" 301 184 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"

Challenges in Describing Data
Custom Logs
02/27/16 12:25:53 PM,182.100.67.173,root,chester
02/27/16 12:27:32 PM,182.100.67.173,root,mko0,lp-.;[=
Problem: the second password contains the field separator (a comma), so the record splits into too many fields.
One fix: Base64-encode the free-text field so it can never contain the separator.
02/27/16 12:25:53 PM,182.100.67.173,root,Y2hlc3Rlcg==
02/27/16 12:27:32 PM,182.100.67.173,root,bWtvMCxscC0uO1s9
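
The difference shows up as soon as the fields are counted. A minimal sketch using the four sample lines from the slides; the variable names are illustrative, and plain `awk` is assumed to behave like `gawk` for this case.

```shell
plain='02/27/16 12:25:53 PM,182.100.67.173,root,chester
02/27/16 12:27:32 PM,182.100.67.173,root,mko0,lp-.;[='

encoded='02/27/16 12:25:53 PM,182.100.67.173,root,Y2hlc3Rlcg==
02/27/16 12:27:32 PM,182.100.67.173,root,bWtvMCxscC0uO1s9'

# Print the field count and the last field of every record.
plain_counts=$(printf '%s\n' "$plain" | awk -F',' '{ print "NF=" NF " last=" $NF }')
encoded_counts=$(printf '%s\n' "$encoded" | awk -F',' '{ print "NF=" NF " last=" $NF }')

printf 'plain:\n%s\nencoded:\n%s\n' "$plain_counts" "$encoded_counts"
```

The plain log yields 4 fields for the first record and 5 for the second, while the encoded log yields 4 for both.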

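GAWK Syntax
    $ gawk -F <field separator> 'SEARCH { ACTION }' FILE
Default field separator: [ \t\n]+ (runs of whitespace; leading and trailing whitespace is ignored)
SEARCH: Which Records (a regular expression or other pattern; gawk uses POSIX extended regex syntax with GNU extensions, not PCRE)
ACTION: What To Do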

Fields and Records
Fields are stored in variables: $1, $2, $3 …
$0 = The entire record
NF = Total Number of Fields
$NF = The Last Field
NR = Record Number
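
A short sketch of these variables in action. The two-line input is hypothetical (it is not the contacts.txt used in the talk), and plain `awk` is assumed to behave like `gawk` here.

```shell
# Hypothetical input: two records with different field counts.
sample='alpha beta gamma
one two'

fields_demo=$(printf '%s\n' "$sample" |
    awk '{ print NR ": NF=" NF ", first=" $1 ", last=" $NF ", record=[" $0 "]" }')
printf '%s\n' "$fields_demo"
```

NF is a count, while $NF uses that count as a field index, which is why it always yields the last field regardless of record length.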

Demonstration
    gawk '{print $0}' contacts.txt
    gawk '{print NF}' contacts.txt
    gawk '{print $NF}' contacts.txt
    gawk '{print NF}' pcap.txt
    gawk '{print $NF}' pcap.txt

Search Syntax
/REGEX/ : Record matches the regular expression
$1 == "STRING" : First field is exactly equal to the string
$1 != "STRING" : First field is not equal to the string
$1 ~ /REGEX/ : First field matches the regular expression
BEGIN : Perform Action Before Reading Records
END : Perform Action After Reading Records
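
The sketch below exercises each search form once. The three records are hypothetical (name, role, email) and are not the talk's contacts.txt; plain `awk` is assumed to behave like `gawk` for these patterns.

```shell
people='Amelia admin amelia@example.com
Bob user bob@example.org
Alice user alice@example.com'

search_demo=$(printf '%s\n' "$people" | awk '
    BEGIN          { print "start" }                # runs before any record is read
    /example\.org/ { print "regex anywhere: " $1 }  # regex tested against the whole record
    $2 == "admin"  { print "exact match: " $1 }     # string equality on field 2
    $2 != "admin"  { print "not admin: " $1 }       # string inequality on field 2
    $1 ~ /^A/      { print "field regex: " $1 }     # regex tested against field 1 only
    END            { print "records: " NR }         # runs after the last record
')
printf '%s\n' "$search_demo"
```

Every rule is tested against every record, in the order written, so one record can trigger several actions.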

Demonstration
    gawk '/Amelia/ {print $0}' contacts.txt
    gawk '/A/ {print $0}' contacts.txt
    gawk '$4 == "A" {print $0}' contacts.txt
    gawk '$1 ~ /^A/ {print $3}' contacts.txt
    gawk '/gmail\.com/ {print $1}' contacts.txt
    gawk -F'[ :,.]+' '$17 == "UDP" {print $NF}' pcap.txt

Variables
Undeclared: created on first use, starting out empty (0 in a numeric context)
Names: Letters, Digits, Underscore
Names Cannot Begin with a Digit
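
A small sketch of undeclared variables: `total` is never initialised yet accumulates from zero, and `unused` is never assigned and prints as an empty string. The input numbers and variable names are made up for illustration; plain `awk` is assumed to behave like `gawk` here.

```shell
var_demo=$(printf '10\n20\n12\n' |
    awk '{ total += $1 } END { print "total=" total ", unused=[" unused "]" }')
printf '%s\n' "$var_demo"
```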

Demonstration
    gawk -F'[ ,.:]+' '{print $17}' pcap.txt
    gawk -F'[ ,.:]+' '$17 == "UDP" {UDP = UDP + $NF} END {print UDP}' pcap.txt
    gawk -F'[ ,.:]+' '$17 == "UDP" {UDP += $NF} $17 == "Flags" {TCP += $NF} END {print "UDP: " UDP "\nTCP: " TCP}' pcap.txt
    gawk -f pcap-1.gawk pcap.txt

Arrays
    proto["UDP"] = 1371
    proto["TCP"] = 63
    for (p in proto) print p ": " proto[p]

Demonstration
pcap-2.gawk
    BEGIN { FS = "[ ,.:]+" }
    $17 == "UDP" { proto["UDP"] += $NF }
    $17 == "Flags" { proto["TCP"] += $NF }
    END { for (p in proto) print p": " proto[p] }
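
The same rules can be tried without a pcap.txt file. This sketch feeds the five sample tcpdump lines from the earlier slides through them inline; the variable names are illustrative, and plain `awk` is assumed to behave like `gawk` here.

```shell
pcap='00:15:46.467385 IP 173.194.207.189.443 > 192.168.207.5.50740: UDP, length 21
00:15:46.521767 IP 18.215.66.244.443 > 192.168.207.5.62019: Flags [P.], seq 1:64, ack 63, win 11, options [nop,nop,TS val 10276941 ecr 1271398822], length 63
00:15:46.521849 IP 192.168.207.5.62019 > 18.215.66.244.443: Flags [.], ack 64, win 2047, options [nop,nop,TS val 1271398963 ecr 10276941], length 0
00:15:46.602773 IP 192.168.207.5.63365 > 172.217.11.3.443: UDP, length 1350
00:50:14.519922 ARP, Request who-has 192.168.2.100 tell 192.168.2.104, length 28'

proto_totals=$(printf '%s\n' "$pcap" | awk '
    BEGIN { FS = "[ ,.:]+" }
    $17 == "UDP"   { proto["UDP"] += $NF }   # sum UDP payload lengths
    $17 == "Flags" { proto["TCP"] += $NF }   # TCP lines carry a Flags keyword
    END { for (p in proto) print p ": " proto[p] }   # iteration order is unspecified
')
printf '%s\n' "$proto_totals"
```

For these five lines the totals are 1371 bytes of UDP and 63 bytes of TCP; the ARP line matches neither rule. Since `for (p in proto)` does not guarantee an order, pipe the output through `sort` when the order matters.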

printf
    printf "%s\n", $1
    printf "%-15s%s\n", $1, $3
%s : string
%15s : string right justified in 15 characters
%-15s : string left justified in 15 characters
%d : decimal integer
%f : float
%10.2f : float right justified in 10 characters, 2 places after the decimal point
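
A sketch of the format specifiers with fixed values, using square brackets to make the padding visible. The values are made up for illustration, and plain `awk` is assumed to format them the same way as `gawk`.

```shell
printf_demo=$(awk 'BEGIN {
    printf "[%s]\n", "abc"                           # plain string
    printf "[%5s]\n", "abc"                          # right justified in 5 characters
    printf "[%-5s]\n", "abc"                         # left justified in 5 characters
    printf "[%d]\n", 42                              # decimal integer
    printf "[%10.2f]\n", 3.14159                     # width 10, 2 decimal places
    printf "%-15s : %d\n", "210.40.18.54", 4622      # column layout for an IP and a count
}')
printf '%s\n' "$printf_demo"
```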

Additional Functions
Convert Epoch Timestamp to Human Readable
    strftime("%c", TIMESTAMP)
Regex Substitution
    gensub(SEARCH, REPLACE, INSTANCE, INPUT)

Demonstration
    head access.log | gawk -F'|' '{print $3}'
    head access.log | gawk -F'|' '{print strftime("%c",$3)}'
    head access.log | gawk -F'|' '{print gensub("https?://([^/]+)/.*","\\1",1,$5)}'
QUESTIONS:
Which client is generating the most traffic?
    gawk -F'|' '{ LENGTH[$1] += $9 } END { for (IP in LENGTH) printf "%-15s : %d\n", IP, LENGTH[IP] }' access.log
Which website is that IP going to? -> squid-1.gawk