
1 The Research on Analyzing Time-Series Data and Anomaly Detection in Internet Flow
Yoshiaki HARADA
Graduate School of Information Science and Electrical Engineering (ISEE), Kyushu University

2 Contents
- Background
- Purpose
- Background knowledge
  - AS and Internet routing
  - Properties of Internet flow
- Analysis method
- Progress of this research
- Conclusion and future work

3 Background
The Internet is growing into a global information infrastructure:
- always-on connections from laptop PCs, cellular phones, etc.
- many services, such as music and video delivery
- telemedicine and distance learning
A reliable Internet system is therefore required. To manage a reliable Internet infrastructure, we need to grasp the tendencies of flows in the Internet.

4 Background
It is difficult to grasp the tendencies of Internet flows:
- The amount of flow is increasing as the Internet develops.
- A lot of garbage traffic, such as DDoS attacks and illegal accesses, flows through the Internet.
- Physical hazards occur, such as electrical power failures and router failures.
Expert engineers are required to manage the Internet system:
- This takes a great deal of time and effort.

5 Purpose
A method is required that automatically detects anomalies and tendencies in Internet flow.
- There is much existing research on macro-level analysis of Internet flow. With that approach it is difficult to grasp detailed biases and anomalies, because Internet flows are complicated.
I propose a micro-level analysis method that segments network flows by port number, AS number, area information, country, etc., so that Flow Data can be analyzed in detail.
- Reducing false alarms can reduce management cost.
I also propose detecting anomalies in network traffic and visualizing them.

6 Background knowledge
AS (Autonomous System)
- A collection of IP networks and routers under the control of one entity (or sometimes more) that presents a common routing policy to the Internet. Examples: an Internet Service Provider (ISP), a very large organization.
- AS numbers are currently 16-bit integers, which allow for a maximum of 65536 assignments.
[Figure: AS:1, AS:2, AS:3, and AS:4, with routers]

7 BGP table
BGP
- BGP is the core routing protocol of the Internet.
- It works by maintaining a table of IP networks, or 'prefixes', which designate network reachability among autonomous systems (AS).
- We find the destination AS number by looking up the prefix.

   Network           Next Hop         Metric LocPrf Weight Path
*>i3.0.0.0           210.138.15.145              300      0 2497 2497 701 703 80 i
*>i4.0.0.0           210.138.15.145              300      0 2497 2497 3356 i
*>i4.23.112.0/22     210.138.15.145              300      0 2497 2497 174 21889 i
*>i4.23.180.0/24     210.138.15.145              300      0 2497 2497 701 6128 30576 i

The table is annotated with two labels: "reachable prefix (IP address)" and "destination AS number".
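
The prefix lookup described on this slide can be sketched as a longest-prefix match. This is a minimal illustration, not the author's program: the function name `destination_as` is hypothetical, the entries printed without a mask (3.0.0.0 and 4.0.0.0) are assumed to be classful /8 networks, and the last AS in each path is assumed to be the destination AS, which is the usual reading of an AS path.

```python
import ipaddress

# Prefixes and AS paths copied from the example table on the slide.
# Assumption: entries shown without a mask are classful /8 networks.
BGP_TABLE = {
    "3.0.0.0/8": [2497, 2497, 701, 703, 80],
    "4.0.0.0/8": [2497, 2497, 3356],
    "4.23.112.0/22": [2497, 2497, 174, 21889],
    "4.23.180.0/24": [2497, 2497, 701, 6128, 30576],
}


def destination_as(ip, table=BGP_TABLE):
    """Return the last AS on the path of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_path = None, None
    for prefix, path in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the most specific (longest) prefix that contains the address.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_path = net, path
    # Assumption: the destination AS is the last AS number in the path.
    return best_path[-1] if best_path else None


print(destination_as("4.23.113.7"))  # matched by 4.23.112.0/22
print(destination_as("4.1.2.3"))     # matched only by 4.0.0.0/8
```

A linear scan is enough for a four-row example; a full routing table would normally call for a trie or a similar index.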

8 Flow Data
Flow Data:
- is a collection of unidirectional packets used by the same application
- is exported by routers
- includes information such as source and destination IP addresses, port numbers, the number of packets, etc.
- is enormous in quantity, so we use sampled data
[Figure: an example of Flow Data (from Kyushu University)]

9 Analysis method
We propose building the database hierarchically to enhance scalability.
- I export the Flow Data and the BGP routing information held on the server, and calculate the AS number for each flow.
- I build a database that includes the necessary data (AS number, port number, number of packets, etc.).
- I categorize the database by country, area, and port number.
- I sort the database and calculate correlations for the data whose tendencies we want to see.
- I refer to the categorized database and visualize it.
- I run calculations on the database and detect anomalies.
[Figure: analyzing traffic, categorize, visualize, anomaly detection]
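
The hierarchical database described on this slide could look roughly like the sketch below, which counts flows at three levels (area, country, and country plus port). Everything here is illustrative: the record fields (`as_number`, `port`, `packets`), the two lookup tables, and the function name `build_database` are hypothetical and are not taken from the presentation.

```python
from collections import Counter

# Hypothetical lookup tables for illustration only; real ones would be
# derived from registry data.
AS_TO_COUNTRY = {2497: "JP", 4134: "CN", 3356: "US"}
COUNTRY_TO_AREA = {"JP": "Asia", "CN": "Asia", "US": "North America"}


def build_database(flows):
    """Count flows per area, per country, and per (country, port)."""
    by_area, by_country, by_country_port = Counter(), Counter(), Counter()
    for flow in flows:
        country = AS_TO_COUNTRY.get(flow["as_number"], "unknown")
        area = COUNTRY_TO_AREA.get(country, "unknown")
        by_area[area] += 1
        by_country[country] += 1
        by_country_port[(country, flow["port"])] += 1
    return by_area, by_country, by_country_port


# Made-up flow records, each already tagged with an AS number.
flows = [
    {"as_number": 2497, "port": 53, "packets": 2},
    {"as_number": 2497, "port": 80, "packets": 10},
    {"as_number": 4134, "port": 80, "packets": 4},
    {"as_number": 3356, "port": 25, "packets": 1},
]
by_area, by_country, by_country_port = build_database(flows)
print(by_area.most_common())
```

Keeping the coarse levels as separate counters means a query such as "flows per area" never has to rescan the per-port data, which is one way to read the scalability goal.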

10 Analysis method: BGP table and Flow Data
I use the BGP table exported from QGPOP and the Flow Data exported from Kyushu University.
Flow Data
- For each sampled day, I analyze the data collected during minutes 0-5 of every hour. The sampling rate is 10%.
[Figure: network diagram showing KOREN (Korea Advanced Research Network), SINET (information communication network dedicated to academic research), QGPOP, IIJ (Internet Initiative Japan), Kyushu University, and other universities and research institutes; the BGP table comes from QGPOP and the Flow Data from Kyushu University]

11 Analysis method 1: detailed analysis and categorization
- I assign an AS number to each IP address by referring to the BGP table and the Flow Data.
- I categorize Flow Data by port number (communication purpose), country, and area information (Asia, Europe, etc.).
- I analyze the distribution of port numbers in each country. The distribution may be unbiased in countries from which illegal port numbers are frequently accessed, because illegal accesses use various (random) port numbers.
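
One way to put a number on how biased a port distribution is would be Shannon entropy. This is an illustrative choice, not a measure named in the presentation: low entropy means traffic concentrates on a few ports, while high entropy means ports are spread out, as the slide expects for random port scans. The function name `port_entropy` and the sample data are hypothetical.

```python
import math
from collections import Counter


def port_entropy(ports):
    """Shannon entropy (in bits) of a list of observed port numbers."""
    counts = Counter(ports)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up port observations per country.
ports_by_country = {
    "A": [80, 80, 80, 80, 53, 80, 80, 80],          # concentrated on port 80
    "B": [1337, 4021, 50123, 9, 8081, 22, 3390, 61000],  # spread out
}
for country, ports in ports_by_country.items():
    print(country, round(port_entropy(ports), 3))
```

Comparing entropy values across countries only makes sense for similar sample sizes, since the maximum possible entropy grows with the number of distinct ports observed.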

12 Change over time in the number of flows in Asia
Almost all of the traffic is exchanged with Japan, and the number of flows for Japan has been increasing over the year. This figure shows the change over time in the number of flows for the top 5 countries, in decreasing order of amount.

13 Change over time in the number of flows in Asia
This figure shows the change over time in the number of flows for the top 4 countries, in decreasing order of amount, excluding Japan. The number of flows for China has been increasing over the year.

14 Analyzing the distribution of port numbers
I analyze the distribution of port numbers used together with port 53 flows. That is, I analyze the destination port numbers accessed by hosts that accessed the DNS server.
- The host is identified by its IP address in the Flow Data.
[Figure: a host accessing the DNS server on port 53 and other destinations on other ports (port:??, port:XX)]
[Table: database of flow counts per date, starting at 2007/01/04, with columns for port numbers 20, 22, 25, 53, 80, and 443 and for the well-known, registered, and private and dynamic port ranges; the cell values are fused together in the transcript]
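
The port 53 analysis on this slide can be sketched in two steps: find the hosts that sent a flow to port 53, then tally the other destination ports those hosts used. The flow representation (a `(source_ip, destination_port)` pair) and both function names are hypothetical; the three port ranges are the standard IANA ones.

```python
from collections import Counter


def port_category(port):
    """Classify a port number into the three IANA ranges."""
    if not 0 <= port <= 65535:
        raise ValueError(f"not a valid port number: {port}")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private and dynamic"


def ports_used_by_dns_clients(flows):
    """Count non-DNS destination ports used by hosts that also used port 53."""
    dns_hosts = {src for src, port in flows if port == 53}
    return Counter(port for src, port in flows if src in dns_hosts and port != 53)


# Made-up flows: (source IP address, destination port).
flows = [
    ("10.0.0.1", 53),
    ("10.0.0.1", 80),
    ("10.0.0.2", 80),      # never touched port 53, so it is ignored
    ("10.0.0.1", 50000),
]
usage = ports_used_by_dns_clients(flows)
print({port: (count, port_category(port)) for port, count in usage.items()})
```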

15 The distribution of port 53 flows and port 25 flows
Data: every Wednesday's Flow Data from 2007/01/04 to 02/22 (one sample every hour).
- The horizontal axis shows the number of flows on port 25.
- The vertical axis shows the number of flows on port 53.
The number of port 53 flows increases with the number of port 25 flows (positive correlation).
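
A positive correlation like the one on this slide is commonly quantified with the Pearson correlation coefficient. The presentation does not say which coefficient was used, so this is only a sketch under that assumption; the function name `pearson` and the hourly counts are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up hourly flow counts for the two ports.
port25_flows = [120, 150, 90, 200]
port53_flows = [300, 340, 260, 420]
print(pearson(port25_flows, port53_flows))
```

A value near +1 corresponds to the pattern on the slide, where port 53 flows rise together with port 25 flows.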

16 Analysis method 2: anomaly detection
- We work with the database compiled from Flow Data. We smooth the data with an exponential smoothing method to make visualization easier. Flow Data is periodic (daily or weekly), so we use the Holt-Winters method.

17 Anomaly detection: data smoothing
- For long-term analysis of Flow Data, I use the Exponentially Weighted Moving Average (EWMA) method, which applies weighting factors that decrease exponentially, so each older data point counts for exponentially less:
$\hat{Y}_i = \alpha Y_{i-1} + (1 - \alpha)\,\hat{Y}_{i-1}$
- Flow Data is periodic, so we adopt the Holt-Winters method for short-term analysis. The Holt-Winters method extends EWMA to periodic data:
$\hat{Y}_{t+1} = a_t + b_t + c_{t+1-m}$
$a_t = \alpha\,(Y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$
$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\,b_{t-1}$
$c_t = \gamma\,(Y_t - a_t) + (1 - \gamma)\,c_{t-m}$
Here $Y_t$ is the observed value and $\hat{Y}$ the predicted value; $a_t$ is the level, $b_t$ the trend, $c_t$ the periodic component, $m$ the length of one cycle, and $\alpha$, $\beta$, $\gamma$ are smoothing factors between 0 and 1.
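
The two smoothing methods on this slide can be sketched as follows, using the standard additive Holt-Winters form, in which the periodic term is subtracted from the observation in the level update and the trend update uses the difference between the current and previous level. The initialisation (first forecast seeded with the first observation; level, trend, and periodic terms seeded from the first cycle) is an assumption made for this sketch, and the function names are hypothetical.

```python
def ewma_forecast(values, alpha):
    """One-step-ahead EWMA: f[i] = alpha * y[i-1] + (1 - alpha) * f[i-1]."""
    forecasts = [values[0]]  # assumption: seed with the first observation
    for y in values[:-1]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


def holt_winters_forecast(values, m, alpha, beta, gamma):
    """One-step-ahead additive Holt-Winters forecasts with cycle length m."""
    # Assumed initialisation from the first cycle: level = mean,
    # trend = 0, periodic term = deviation from the mean.
    level = sum(values[:m]) / m
    trend = 0.0
    season = [y - level for y in values[:m]]
    forecasts = list(values[:m])  # no forecast exists for the first cycle
    for t in range(m, len(values)):
        # season[t % m] still holds c_{t-m} at this point.
        forecasts.append(level + trend + season[t % m])
        y = values[t]
        prev_level = level
        level = alpha * (y - season[t % m]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y - level) + (1 - gamma) * season[t % m]
    return forecasts


print(ewma_forecast([10, 20, 30], alpha=0.5))
print(holt_winters_forecast([1, 3, 1, 3, 1, 3], m=2, alpha=0.5, beta=0.5, gamma=0.5))
```

On a perfectly periodic series the Holt-Winters forecasts reproduce the observations, which makes a convenient sanity check for the update order.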

18 Anomaly detection
I smooth Flow Data using the EWMA or Holt-Winters method, and calculate thresholds.
- When a value exceeds the threshold, I regard that point as an anomaly.
[Figure: number of flows versus time over 1 cycle (one day), with a high threshold level, a low threshold level, the threshold area, and a point marked as an anomaly]
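
The threshold test on this slide might be sketched as below. The presentation does not say how the thresholds are calculated, so this sketch simply assumes a band of fixed half-width `margin` around an EWMA baseline; `margin`, `alpha`, and the function name `detect_anomalies` are all hypothetical.

```python
def detect_anomalies(values, alpha, margin):
    """Return indices whose value falls outside baseline +/- margin."""
    anomalies = []
    baseline = values[0]  # assumption: the first observation is normal
    for i in range(1, len(values)):
        high, low = baseline + margin, baseline - margin
        if values[i] > high or values[i] < low:
            anomalies.append(i)
        else:
            # Only in-band points update the baseline, so a spike does
            # not drag the thresholds along with it.
            baseline = alpha * values[i] + (1 - alpha) * baseline
    return anomalies


# Made-up flow counts with one spike.
counts = [100, 102, 98, 101, 300, 99, 100]
print(detect_anomalies(counts, alpha=0.5, margin=20))
```

Excluding anomalous points from the baseline update is a design choice; updating on every point would make the band adapt faster but could hide a sustained anomaly.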

19 Visualization
I am developing a tool that detects anomalies and visualizes them.
- The tool should be able to analyze only the specific Flow Data selected by the user (by port number, country, etc.). Internet traffic contains communication with a large number of packets, such as port 8000 (DVTS). We want to grasp tendencies not only in all Flow Data but also in Flow Data restricted to a certain country, AS, or port number.
- It should be a versatile tool.

20 Conclusion and future work
Implementation of Flow Data analysis
- The program that categorizes Flow Data by country, AS number, and port number is complete.
- I will develop a program to find the correlation between port numbers.
Anomaly detection and visualization
- I will smooth the database produced by the analysis program, calculate the thresholds, and detect anomalies in Flow Data.
- I will develop a tool that visualizes not only all data and anomalies but also the data selected by the user.
- I will conduct a verification experiment on Flow Data that includes an electrical power failure.

