Published by Jacob Barton. Modified over 9 years ago.
1 The Research on Analyzing Time-Series Data and Anomaly Detection in Internet Flow
Yoshiaki HARADA
Graduate School of Information Science and Electrical Engineering (ISEE), Kyushu University
2 Contents
Background
Purpose
Background Knowledge
  AS and Internet routing
  Property of Internet Flow
Analysis method
Progress of this research
Conclusion and Future Work
3 Background
The Internet is growing as a global information infrastructure:
  always-on connections from laptop PCs, cellular phones, etc.
  many services, such as music and video delivery
  telemedicine and distance learning
A reliable Internet system is required.
We should grasp the tendencies of Internet flows in order to manage a reliable Internet infrastructure.
4 Background
It is difficult to grasp the tendencies of Internet flows:
  The amount of flow is increasing as the Internet develops.
  A lot of garbage traffic, such as DDoS attacks and illegal accesses, flows through the Internet.
  Physical hazards occur, such as electrical power failures and router failures.
Expert engineers are required to manage the Internet system.
  This takes a great deal of time and effort.
5 Purpose
A method is required that automatically detects anomalies and tendencies in Internet flow.
  There is much research on macro-level analysis of Internet flow.
  It is difficult to grasp detailed biases and anomalies because Internet flows are complicated.
I propose a micro-level analysis method that segments network flows by port number, AS number, area information, country, etc.
  I can analyze Flow Data in detail.
  Fewer false alarms can reduce management cost.
I propose detecting anomalies in network traffic and visualizing them.
6 Background knowledge
AS (Autonomous System)
  A collection of IP networks and routers under the control of one entity (or sometimes more) that presents a common routing policy to the Internet.
  Examples: an Internet Service Provider (ISP), a very large organization.
  AS numbers are currently 16-bit integers, which allow for a maximum of 65536 assignments.
[Figure: routers connecting AS 1, AS 2, AS 3, and AS 4]
7 BGP
BGP is the core routing protocol of the Internet.
It works by maintaining a table of IP networks, or "prefixes", which designate network reachability among autonomous systems (AS).
We find the destination AS number by referring to the prefix.

BGP table:
   Network           Next Hop         Metric LocPrf Weight Path
*>i3.0.0.0           210.138.15.145             300      0 2497 2497 701 703 80 i
*>i4.0.0.0           210.138.15.145             300      0 2497 2497 3356 i
*>i4.23.112.0/22     210.138.15.145             300      0 2497 2497 174 21889 i
*>i4.23.180.0/24     210.138.15.145             300      0 2497 2497 701 6128 30576 i

The Network column gives the reachable prefix (IP address); the last AS number in the Path column is the destination AS number.
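The prefix lookup described on this slide could be sketched in Python as a longest-prefix match over the four sample routes, taking the last AS number in each path as the destination AS. The helper names `parse_prefix` and `lookup_origin_as` are hypothetical, and treating prefixes written without a mask length as classful networks is an assumption made for illustration, not the author's implementation.

```python
import ipaddress

# Sample routes copied from the BGP table on this slide: (prefix, AS path).
ROUTES = [
    ("3.0.0.0", "2497 2497 701 703 80"),
    ("4.0.0.0", "2497 2497 3356"),
    ("4.23.112.0/22", "2497 2497 174 21889"),
    ("4.23.180.0/24", "2497 2497 701 6128 30576"),
]


def parse_prefix(text):
    """Parse a prefix; assume a classful mask when no length is written."""
    if "/" in text:
        return ipaddress.ip_network(text)
    first_octet = int(text.split(".")[0])
    if first_octet < 128:
        length = 8
    elif first_octet < 192:
        length = 16
    else:
        length = 24
    return ipaddress.ip_network(f"{text}/{length}")


# Each entry pairs a network with the last AS in its path (the destination AS).
TABLE = [(parse_prefix(p), int(path.split()[-1])) for p, path in ROUTES]


def lookup_origin_as(address):
    """Return the destination AS of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(address)
    matches = [(net.prefixlen, asn) for net, asn in TABLE if ip in net]
    if not matches:
        return None
    # The tuple with the largest prefix length is the most specific route.
    return max(matches)[1]
```

A linear scan is enough for four routes; a full BGP table would more likely need a radix tree or a similar structure.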
8 Flow Data
Flow Data
  is a collection of unidirectional packets used by the same application
  is exported by routers
  includes information such as the source and destination IP addresses, port numbers, number of packets, etc.
  is enormous in quantity, so we use sampled data
[Figure: an example of Flow Data (from Kyushu University)]
9 Analysis method
We propose building the database hierarchically to enhance scalability.
  I export the Flow Data and the BGP routing information maintained on the server, and calculate AS numbers from the Flow Data.
  I build a database that includes the necessary data (AS number, port number, number of packets, etc.).
  I categorize the database by country, area, and port number.
  I sort the database and calculate correlations for the data whose tendencies we want to see.
  I refer to the categorized database and visualize it.
  I perform calculations on the database and detect anomalies.
[Figure: processing stages: analyzing traffic, categorize, visualize, anomaly detection]
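The categorization step could be sketched as follows. This is a minimal illustration under stated assumptions: the `FlowRecord` fields, the `categorize` and `rollup` helpers, and the AS-to-country and country-to-area tables are all hypothetical (the AS numbers are taken from the private-use range and the assignments are made up).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    dst_as: int    # destination AS number derived from the BGP table
    dst_port: int  # destination port number
    packets: int   # number of packets in the flow


# Hypothetical lookup tables: private-use AS numbers, made-up assignments.
AS_TO_COUNTRY = {64512: "JP", 64513: "CN", 64514: "DE"}
COUNTRY_TO_AREA = {"JP": "Asia", "CN": "Asia", "DE": "Europe"}


def categorize(records):
    """Count flows and packets per (area, country, port) key."""
    table = defaultdict(lambda: {"flows": 0, "packets": 0})
    for r in records:
        country = AS_TO_COUNTRY.get(r.dst_as, "unknown")
        area = COUNTRY_TO_AREA.get(country, "unknown")
        entry = table[(area, country, r.dst_port)]
        entry["flows"] += 1
        entry["packets"] += r.packets
    return dict(table)


def rollup(table, level):
    """Sum flow counts at one level of the key: 0 = area, 1 = country, 2 = port."""
    totals = defaultdict(int)
    for key, entry in table.items():
        totals[key[level]] += entry["flows"]
    return dict(totals)
```

Keeping the finest-grained table and deriving coarser views from it with `rollup` is one possible reading of the hierarchical database; the slides do not specify the actual layout.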
10 Analysis method: BGP table and Flow Data
I use the BGP table exported from QGPOP and the Flow Data exported from Kyushu University.
Flow Data
  For each sampled day, I analyze the data collected during minutes 0-5 of every hour.
  The sampling rate is 10%.
[Figure: network diagram showing QGPOP (BGP table), Kyushu University (Flow Data), KOREN (Korea Advanced Research Network), SINET (information communication network dedicated to academic research), IIJ (Internet Initiative Japan), and the connected universities and research institutes]
11 Analysis method 1: Detailed analysis and categorization
I assign an AS number to each IP address by referring to the BGP table and the Flow Data.
I categorize the Flow Data by port number (communication purpose), country, and area information (Asia, Europe, etc.).
I analyze the distribution of port numbers in each country.
  The distribution of port numbers may be unbiased (close to uniform) in countries from which illegal accesses are frequent,
  because illegal accesses use various (random) port numbers.
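One way to quantify how unbiased a country's port distribution is would be its normalized Shannon entropy. The slides do not name a measure, so this choice, the `(country, port)` input format, and the helper names are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict


def port_distributions(flows):
    """Map each country to a Counter of destination ports.

    `flows` is an iterable of (country, port) pairs (an assumed format).
    """
    dist = defaultdict(Counter)
    for country, port in flows:
        dist[country][port] += 1
    return dist


def normalized_entropy(counter):
    """Shannon entropy of the port counts, scaled to the range [0, 1].

    1.0 means the observed ports are used equally often;
    values near 0 mean that a few ports dominate.
    """
    total = sum(counter.values())
    if total == 0 or len(counter) < 2:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counter.values())
    return entropy / math.log2(len(counter))
```

Normalizing by the number of observed ports is only one option; dividing by `math.log2(65536)` instead would compare each country against all possible port numbers.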
12 Time change of number of flows in Asia
Almost all of the traffic was exchanged with Japan, and the number of flows in Japan has been increasing over the year.
This figure shows the change over time in the number of flows for the top 5 countries, in decreasing order of amount.
13 Time change of number of flows in Asia
This figure shows the change over time in the number of flows for the top 4 countries, in decreasing order of amount, excluding Japan.
The number of flows in China has been increasing over the year.
14 Analyzing distribution of port number
I analyze the distribution of port numbers used together with port 53 flows.
  I analyze the destination port numbers accessed by hosts that accessed the DNS server.
  Each host is identified by its IP address in the Flow Data.
[Figure: a host accesses the DNS server on port 53 and other destinations on ports ?? and XX; the results are stored in a database]
[Table: database rows keyed by date (first row 2007/01/04), with columns for port numbers 20, 22, 25, 53, 80, 443 and for the well-known, registered, and private and dynamic port ranges]
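The analysis on this slide could be sketched as two small helpers: one counts the destination ports used by hosts that also sent port 53 flows, and one assigns a port to the ranges named in the table. The `(src_ip, dst_port)` record format and the function names are assumptions; the range boundaries follow the standard IANA port ranges.

```python
from collections import Counter

DNS_PORT = 53


def ports_used_by_dns_hosts(flows):
    """Count destination ports of flows sent by hosts that also sent port 53 flows.

    `flows` is a list of (src_ip, dst_port) pairs (an assumed record format).
    """
    dns_hosts = {src for src, port in flows if port == DNS_PORT}
    return Counter(
        port for src, port in flows if src in dns_hosts and port != DNS_PORT
    )


def port_class(port):
    """Classify a port number by IANA range."""
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private and dynamic"
```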
15 The distribution of port 53 flows and port 25 flows
Flow Data from every Wednesday, 2007/01/04 to 02/22 (one sample every hour)
The horizontal axis shows the number of flows on port 25.
The vertical axis shows the number of flows on port 53.
The number of port 53 flows increases with the number of port 25 flows (positive correlation).
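A positive correlation of this kind can be measured with the Pearson correlation coefficient. The slides do not say which coefficient was used, so the following is a minimal sketch under that assumption; the function name `pearson` is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the hourly flow counts on port 25 and `ys` the counts on port 53 for the same hours; a result close to 1 corresponds to the positive correlation seen in the scatter plot.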
16 Analysis method 2: Anomaly detection
We work with the database compiled from the Flow Data.
We smooth the data with an exponential smoothing method to make visualization easier.
Flow Data has periodicity (daily or weekly), so we use the Holt-Winters method.
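17 Anomaly detection: Data smoothing
When I analyze Flow Data over the long term, I use the Exponentially Weighted Moving Average (EWMA) method.
  EWMA applies weighting factors that decrease exponentially: the weight of each older data point decreases exponentially.
  $$\hat{Y}_i = \alpha Y_i + (1 - \alpha)\,\hat{Y}_{i-1}$$
Flow Data is periodic, so we adopt the Holt-Winters method for short-term analysis.
  The Holt-Winters method extends EWMA to periodic data with period $m$:
  $$\hat{Y}_{t+1} = a_t + b_t + c_{t+1-m}$$
  $$a_t = \alpha\,(Y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$$
  $$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\,b_{t-1}$$
  $$c_t = \gamma\,(Y_t - a_t) + (1 - \gamma)\,c_{t-m}$$
  Here $Y_t$ is the observed value, $\hat{Y}$ is the smoothed or predicted value, $a_t$ is the baseline (level), $b_t$ is the linear trend, $c_t$ is the seasonal component, and $\alpha$, $\beta$, $\gamma$ are smoothing factors between 0 and 1.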
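The two smoothing methods could be sketched in Python as follows, using the standard additive Holt-Winters form. The function names and the initialization (level set to the mean of the first period, trend set to zero, seasonal terms set to the first period's deviations from its mean) are assumptions; the slides do not describe how the author initialized the method.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average; the first output equals the first input."""
    smoothed = []
    for y in values:
        if not smoothed:
            smoothed.append(float(y))
        else:
            smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def holt_winters(values, m, alpha, beta, gamma):
    """Additive Holt-Winters; returns one-step-ahead forecasts.

    forecasts[t] is the prediction for values[t] made from earlier data.
    Forecasts for the first period are None because that period is used
    for initialization (an assumed initialization scheme).
    """
    if len(values) < m:
        raise ValueError("need at least one full period of data")
    level = sum(values[:m]) / m                # a: baseline
    trend = 0.0                                # b: linear trend
    season = [y - level for y in values[:m]]   # c: one term per phase of the cycle
    forecasts = [None] * m
    for t in range(m, len(values)):
        y = values[t]
        # season[t % m] still holds c_{t-m} at this point.
        forecasts.append(level + trend + season[t % m])
        prev_level = level
        level = alpha * (y - season[t % m]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y - level) + (1 - gamma) * season[t % m]
    return forecasts
```

For hourly flow counts with a daily cycle, `m` would be 24; the smoothing factors have to be tuned for the data at hand.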
18 Anomaly detection
I smooth the Flow Data using the EWMA or Holt-Winters method and calculate a threshold.
When a value exceeds the threshold, I consider that point an anomaly.
[Figure: number of flows versus time over one cycle (one day), with a threshold area bounded by a high threshold level and a low threshold level; a point outside the area is marked as an anomaly]
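The slide does not state how the threshold is calculated. As a minimal sketch, assume the threshold area is the EWMA baseline plus or minus `k` times an exponentially smoothed absolute deviation; the function name, the parameters `alpha`, `k`, and `warmup`, and this band definition are all assumptions for illustration.

```python
def detect_anomalies(values, alpha=0.3, k=3.0, warmup=5):
    """Return indices whose value falls outside a band around the EWMA baseline.

    Band (an assumption): baseline +/- k * smoothed absolute deviation.
    The first `warmup` points only train the baseline and are never flagged.
    """
    anomalies = []
    mean = float(values[0])  # EWMA baseline
    dev = 0.0                # EWMA of the absolute deviation from the baseline
    for i in range(1, len(values)):
        y = values[i]
        error = abs(y - mean)
        if i >= warmup and error > k * dev:
            # Outside the high or low threshold level: report, do not learn from it.
            anomalies.append(i)
        else:
            mean = alpha * y + (1 - alpha) * mean
            dev = alpha * error + (1 - alpha) * dev
    return anomalies
```

Skipping the update for flagged points keeps a single spike from widening the band. For periodic data, the same test could be applied to the difference between each value and its Holt-Winters forecast instead of the EWMA baseline.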
19 Visualization
I am developing a tool that detects anomalies and visualizes them.
The tool should be able to analyze only the specific Flow Data selected by the user (port number, country, etc.).
  Internet traffic includes communication with a large number of packets, such as port 8000 (DVTS).
  We want to grasp the tendencies not only of all Flow Data but also of the Flow Data restricted to a certain country, AS, or port number.
It should be a versatile tool.
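The user-driven selection could look like the sketch below. The dictionary keys `country`, `asn`, and `port` and the function name `select_flows` are hypothetical; the actual tool's record format is not given in the slides.

```python
def select_flows(flows, country=None, asn=None, port=None):
    """Return the flows that match every criterion that is not None.

    Each flow is a dict with the assumed keys "country", "asn", and "port".
    """
    criteria = {"country": country, "asn": asn, "port": port}
    active = {key: value for key, value in criteria.items() if value is not None}
    return [f for f in flows if all(f[key] == value for key, value in active.items())]
```

Calling `select_flows(flows)` with no criteria returns all Flow Data, while `select_flows(flows, port=8000)` would isolate a packet-heavy application such as DVTS so that it can be viewed separately or excluded from the overall tendency.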
20 Conclusion and future work
Implementation of Flow Data analysis
  The program that categorizes Flow Data by country, AS number, and port number is complete.
  I will develop a program to find the correlation between port numbers.
Anomaly detection and visualization
  I smooth the database produced by the analysis program, calculate the threshold, and detect anomalies in the Flow Data.
  I am developing a tool to visualize not only all data and anomalies but also the data selected by the user.
  I will conduct a verification experiment on Flow Data that includes an electrical power failure.