Netflow and Botnets
Steven M. Bellovin, Columbia University


1 Netflow and Botnets
Steven M. Bellovin, Columbia University

2 Hypothesis
Most hosts are either clients or servers
  – P2P traffic is an exception
Bots talk to other bots, and thus to the command-and-control node
By looking for unusual traffic flows (client-to-client traffic that isn’t P2P), we can find bots

3 Methodology
Use Netflow data to identify clients and servers
Classify nodes as clients or servers
Build a traffic matrix from the data to see which clients talk to which other clients
Exclude P2P traffic, which is generally identifiable based on flow size
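The methodology above can be sketched in a few lines. This is a minimal illustration, not the tooling described in the talk: the `(src, dst, nbytes)` record shape, the `roles` dictionary, and the `is_p2p` predicate are all hypothetical, and the size threshold in the example is a placeholder, since the slides say only that P2P is generally identifiable by flow size.

```python
from collections import Counter

def client_traffic_matrix(flows, roles, is_p2p):
    """Sum bytes between pairs of hosts that are both labelled 'client'.

    flows:  iterable of (src, dst, nbytes) tuples (hypothetical simplified record)
    roles:  dict mapping host -> 'client' or 'server' (from an earlier classification step)
    is_p2p: caller-supplied predicate on a flow tuple; the actual rule is an assumption
    """
    matrix = Counter()
    for flow in flows:
        src, dst, nbytes = flow
        # Keep only client-to-client traffic.
        if roles.get(src) != "client" or roles.get(dst) != "client":
            continue
        # Drop flows the caller's rule attributes to P2P.
        if is_p2p(flow):
            continue
        matrix[(src, dst)] += nbytes
    return matrix

flows = [
    ("10.0.0.1", "10.0.0.9", 400),        # client -> server
    ("10.0.0.1", "10.0.0.2", 120),        # client -> client, small flow
    ("10.0.0.2", "10.0.0.3", 5_000_000),  # client -> client, large flow
]
roles = {
    "10.0.0.1": "client",
    "10.0.0.2": "client",
    "10.0.0.3": "client",
    "10.0.0.9": "server",
}
# Placeholder rule: treat anything over 1 MB as P2P (illustrative threshold only).
suspicious = client_traffic_matrix(flows, roles, is_p2p=lambda f: f[2] > 1_000_000)
```

Whatever remains in `suspicious` is client-to-client traffic that the P2P rule did not explain, which is the kind of flow the hypothesis singles out for a closer look.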

4 Netflow
Originally from Cisco; now implemented by most router vendors
  – Also an IETF “Proposed Standard”
Records “flow information” (src/dst pairs of addresses and port numbers, length, timing, etc.) for “connections” through a given router
Intended for accounting and for traffic engineering

5 Problems with Netflow
Flows are unidirectional; need two records for complete picture
This is a consequence of Internet topology; most inter-ISP connections follow asymmetric paths
Routers often deliver sampled data; can miss flow start/end packets
Does not give unambiguous indication of client versus server
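One way to cope with unidirectional records is to group them under a direction-independent key, so that both halves of a conversation land together and one-sided conversations stand out. The sketch below assumes a hypothetical record layout (a dict with `src`, `sport`, `dst`, `dport`, `proto`, `bytes`); it is not the format of any particular Netflow export.

```python
from collections import defaultdict

def conversation_key(rec):
    """Return a key that is identical for both directions of a conversation."""
    a = (rec["src"], rec["sport"])
    b = (rec["dst"], rec["dport"])
    # Sorting the two endpoints removes the direction from the key.
    return (rec["proto"],) + tuple(sorted((a, b)))

def pair_flows(records):
    """Group unidirectional records by conversation."""
    conversations = defaultdict(list)
    for rec in records:
        conversations[conversation_key(rec)].append(rec)
    return conversations

def one_sided(conversations):
    """Keys of conversations where only one endpoint was ever seen sending."""
    return [
        key
        for key, recs in conversations.items()
        if len({(r["src"], r["sport"]) for r in recs}) < 2
    ]

records = [
    {"src": "192.0.2.10", "sport": 51515, "dst": "198.51.100.5", "dport": 80, "proto": 6, "bytes": 300},
    {"src": "198.51.100.5", "sport": 80, "dst": "192.0.2.10", "dport": 51515, "proto": 6, "bytes": 9000},
    {"src": "192.0.2.11", "sport": 50001, "dst": "198.51.100.5", "dport": 443, "proto": 6, "bytes": 120},
]
conversations = pair_flows(records)
lonely = one_sided(conversations)
```

A one-sided conversation does not by itself indicate anything unusual: with asymmetric paths or sampling, the return direction may simply never have been recorded at this router.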

6 Strategy
Build tools at Columbia
  – Easy access to machines and data
Use existing archive of CU netflow data
  – Unclear if there are botnets present; get classification right first
Get other netflow archives (e.g., from predict.org)
Bring nominally-working code to AT&T to experiment with large-scale datasets
Compare with previous results from AT&T as check on correctness

7 Node Classification
Must use heuristics
  – Flag field in netflow data doesn’t show client vs. server
  – Timestamp not useful because of sampling
Current strategy: look at port number distribution
  – Clients usually use ports 48K-64K
Considering using node degree
  – But: problems with low-activity hosts?
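The port-distribution heuristic might look roughly like the following. This is only a sketch under stated assumptions: "48K-64K" is read here as ports 49152 through 65535, the `(src, sport, dst, dport)` record shape is hypothetical, and the 0.8 and 0.2 cutoffs are illustrative values that do not come from the talk.

```python
from collections import defaultdict

# Assumed reading of "48K-64K": 48*1024 .. 64*1024 - 1.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 48 * 1024, 64 * 1024 - 1

def classify_hosts(flows, client_cutoff=0.8, server_cutoff=0.2):
    """Label each host by the share of its own ports that fall in the high range.

    flows: iterable of (src, sport, dst, dport) tuples (hypothetical record shape)
    client_cutoff, server_cutoff: illustrative thresholds, not from the source
    """
    high = defaultdict(int)   # flows where the host's own port is in the high range
    total = defaultdict(int)  # all flows the host appears in
    for src, sport, dst, dport in flows:
        for host, port in ((src, sport), (dst, dport)):
            total[host] += 1
            if EPHEMERAL_LOW <= port <= EPHEMERAL_HIGH:
                high[host] += 1

    labels = {}
    for host, n in total.items():
        share = high[host] / n
        if share >= client_cutoff:
            labels[host] = "client"
        elif share <= server_cutoff:
            labels[host] = "server"
        else:
            labels[host] = "ambiguous"
    return labels

flows = [
    ("A", 50000, "B", 80),
    ("A", 51000, "B", 80),
    ("C", 52000, "B", 443),
    ("A", 53000, "C", 22),
]
labels = classify_hosts(flows)
```

In this toy data, host C uses a high port in one flow and port 22 in another, so it lands between the two cutoffs. A host seen in only a handful of flows will produce an unreliable share, which is the same low-activity concern the slide raises about node degree.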

8 Classification is Hard
Simple heuristics have not been satisfactory
Building visualization tools to help us understand the data

9 Client: Port Number by Volume

10 Client: Port Number Scatter Plot

11 Server: Port Number by Volume

12 Server: Port Number Scatter Plot

13 Ambiguous Host

14 Ambiguous Host Scatter Plot
Is this the sort of host we’re looking for?

15 Current Status
Have basic tools built
Working with visualization tools to understand the data
Next steps:
  – Refine classification algorithms
  – Confirm analysis of bots in sample data
  – Try tools on larger dataset

