CS519 BGP Project Report Kai-Wen Chung (kc279) San-Yiu Cheng (sc345)
How to Proceed with BGP Analysis
- Collect raw data
- Import into database
- Query database and analyze data
Collect Raw Data MAE-EAST ( ~ ) ( ~ )
Database Schema Original Schema
Database Schema (cont.)
- Record size
  - Message: 94 bytes/record
  - MsgPath: 18 bytes/record
- Number of records
  - Message: 104,841,405 (98.1 ~ 98.11)
  - MsgPath: 251,442,478 (98.1 ~ 98.11)
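As a rough check on these figures, raw storage can be estimated as record count times record size. The sketch below uses only the numbers on this slide. It ignores indexes, page overhead, and transaction logs (an assumption of the sketch), so the real footprint in the database would be larger.

```python
# Rough raw-data size estimate: records x bytes per record.
# Assumption: no indexes, page overhead, or transaction logs are counted.
RECORDS = {"Message": 104_841_405, "MsgPath": 251_442_478}
BYTES_PER_RECORD = {"Message": 94, "MsgPath": 18}


def raw_size_bytes(table: str) -> int:
    """Return record count times record size for one table."""
    return RECORDS[table] * BYTES_PER_RECORD[table]


total = sum(raw_size_bytes(t) for t in RECORDS)
for t in RECORDS:
    print(f"{t}: {raw_size_bytes(t) / 1e9:.2f} GB")
print(f"Total: {total / 1e9:.2f} GB")
```

With these inputs the raw records alone come to roughly 14.4 GB (decimal gigabytes), before any index or log space is counted.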
Database Schema (cont.)
- Database space allocation: 20GB
- About 12 hours to import one month of raw data (about 10,000,000 messages and 20,000,000 paths)
- Data volume will soon reach the space limit
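The import time quoted above implies a throughput that can be used to estimate larger imports. This is a minimal sketch using the slide's approximate figures. It assumes the import rate stays constant as the tables grow, which may not hold in practice.

```python
# Import throughput implied by the slide:
# about 10M messages plus 20M paths imported in about 12 hours.
MESSAGES = 10_000_000
PATHS = 20_000_000
HOURS = 12

rows_per_second = (MESSAGES + PATHS) / (HOURS * 3600)
print(f"{rows_per_second:.0f} rows/s")


def import_hours(n_messages: int, n_paths: int) -> float:
    """Estimated import time in hours, assuming the same constant rate."""
    return (n_messages + n_paths) / rows_per_second / 3600


# Example: the record counts reported for 98.1 ~ 98.11.
print(f"{import_hours(104_841_405, 251_442_478):.0f} hours")
```

At that rate the records reported for 98.1 ~ 98.11 would need roughly 140 hours of import time.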
Our Solution
- Allocate larger space
  - Move database from SQLServer -> Sparrow
  - Total 70GB
- Modify data schema to reduce record size
Data Schema Modification
Record Size
- Message: 52 bytes/record
- MsgPath: 14 bytes/record
- Size reduction
  - Message: 46.9%
  - MsgPath: 22.2%
- Faster data importing
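The reduction percentages can be recomputed from the old record sizes (94 and 18 bytes) and the new ones (52 and 14 bytes). The sketch below does only that arithmetic. The MsgPath result matches the 22.2% on the slide. The Message result from these two sizes comes to about 44.7%, slightly below the 46.9% shown, so the slide's figure may rest on a different baseline.

```python
def reduction_pct(old_bytes: int, new_bytes: int) -> float:
    """Percentage by which the record size shrank."""
    return (old_bytes - new_bytes) / old_bytes * 100


print(f"Message: {reduction_pct(94, 52):.1f}%")
print(f"MsgPath: {reduction_pct(18, 14):.1f}%")
```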
Current Status
- Database
  - P3-500 with 128MB RAM, with Windows 2000 Server and SQL Server 2000 installed
- Imported data
  - ~ About 21GB in DB
  - About 34GB in DB
Current Database Issues
- SQL Server performance
  - A query can take several hours to run
- Space problem
  - 70GB is only enough for 1 ~ 2 months of 2003 data
  - We need a “terabyte” database to accommodate all data from 2002 and 2003
Summary of Data
- Total space used: ~55GB (1998 and 03/2003)
- Number of messages: ~220.5 million (1998 and 03/2003)
- Number of datasets: ~30,000 (1998 and 03/2003)
Summary of Data (cont.)
- A small number of IP addresses dominate the routing table
  - 15 source IP addresses account for about 68% of the PeerIp field values in the Messages
  - 15 destination IP addresses account for about 47% of the NextHop field values in the Messages
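A concentration figure like the ones above can be computed with a grouped count. The sketch below is illustrative only: the table name `Message`, its two-column layout, and the toy rows are assumptions, and an in-memory SQLite database stands in for SQL Server (where `TOP` would replace `LIMIT`).

```python
import sqlite3


def top_n_share(conn, column, n=15):
    """Fraction of Message rows whose `column` value is among the n most frequent."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if column not in {"PeerIp", "NextHop"}:
        raise ValueError(f"unexpected column: {column}")
    total = conn.execute("SELECT COUNT(*) FROM Message").fetchone()[0]
    top = conn.execute(
        f"SELECT COALESCE(SUM(cnt), 0) FROM ("
        f"  SELECT COUNT(*) AS cnt FROM Message"
        f"  GROUP BY {column} ORDER BY cnt DESC LIMIT ?)",
        (n,),
    ).fetchone()[0]
    return top / total if total else 0.0


# Hypothetical toy data, not taken from the project database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Message (PeerIp TEXT, NextHop TEXT)")
rows = (
    [("10.0.0.1", "10.1.0.1")] * 6
    + [("10.0.0.2", "10.1.0.2")] * 3
    + [("10.0.0.3", "10.1.0.3")]
)
conn.executemany("INSERT INTO Message VALUES (?, ?)", rows)

print(top_n_share(conn, "PeerIp", n=2))   # share held by the 2 busiest peers
print(top_n_share(conn, "NextHop", n=1))  # share held by the busiest next hop
```

Doing the aggregation inside the database returns a single number instead of pulling millions of rows to the client, which matters at the message counts reported here.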
Summary of Data (cont.)
- Advertisement vs. withdrawal messages
  - There are about 220 million messages
  - ~31.5% of all messages are withdrawal messages
  - ~68.5% of all messages are advertisement messages
Data Analysis
Data Analysis (cont.)
Some Advice
- Optimize your queries
  - Some queries will take several hours to execute
- Test on bgpbaby first
  - This is a smaller version of bgpdata (~1GB)
- Don’t try to execute all your queries on the last day
  - The SQL Server database will be overwhelmed
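One way to follow the first two points is to inspect a query's plan on a small copy of the data before running it on the full database. The sketch below is only an illustration under stated assumptions: an in-memory SQLite database stands in for a small test copy such as bgpbaby, and the `Message` table layout, the index name `idx_message_peerip`, and the sample rows are hypothetical. SQL Server has its own plan tools, which are not shown here.

```python
import sqlite3

# Stand-in for a small test database; table layout and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Message (PeerIp TEXT, NextHop TEXT)")
conn.executemany(
    "INSERT INTO Message VALUES (?, ?)",
    [(f"10.0.0.{i % 20}", f"10.1.0.{i % 5}") for i in range(1000)],
)

QUERY = "SELECT COUNT(*) FROM Message WHERE PeerIp = ?"


def query_plan(conn, sql, params):
    """Return SQLite's query-plan description lines for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the readable detail


# Plan without an index: the whole table has to be scanned.
plan_before = query_plan(conn, QUERY, ("10.0.0.1",))

# Plan after indexing the filtered column: the index can be searched instead.
conn.execute("CREATE INDEX idx_message_peerip ON Message (PeerIp)")
plan_after = query_plan(conn, QUERY, ("10.0.0.1",))

print(plan_before)
print(plan_after)
```

Checking the plan on the small database first is cheap, and it shows whether a query will scan an entire table before it is pointed at hundreds of millions of rows.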