Dynamics of Malicious Software in the Internet
Tatehiro Kaiwa, University of Aizu. E-mail: m5081224@u-aizu.ac.jp
I’d like to start my presentation. My name is Tatehiro Kaiwa. The title of my thesis is "Dynamics of Malicious Software in the Internet".
Outline
Random Network and Scale-free Network
Observed Arrivals of E-mail
Simulation Model of Worm Spread Dynamics
Local Network Structure Inference
Mathematical Model of Outbreak
Hub Defense Strategy
Conclusion
This is the outline. First, I introduce the relationship between random networks and scale-free networks. Next, I explain why I observed the arrival times of e-mail. Then I explain my simulation model and its results. After that, I explain the mathematical model and validate a method based on it. Finally, I give the conclusion.
Two Models of Network
Random Network. Degree distribution: bell curve
Scale-free Network. Degree distribution: power-law
In this world, there are various networks, such as highways, social networks, the Internet, and so on. We can categorize the structure of such networks into two models. One is the random network model and the other is the scale-free network model. Comparing the degree distributions of these models, we can see a difference like this. (While pointing to the figure.) These figures show the degree distribution of each model. The degree distribution of a random network follows a bell curve, whereas that of a scale-free network follows a power law. (From here on, explain while pointing to the figure.) In a random network, most nodes have roughly the same number of links, and nodes with a very large number of links do not exist. In contrast, a power-law degree distribution predicts that most nodes have only a few links and that a few nodes have many links. In fact, many networks in the world are categorized as scale-free networks.
Scale-free and Preferential Attachment
A scale-free network is a network with a power-law degree distribution.
Why do such networks exist? A keyword for understanding this is preferential attachment. Networks grow daily. When a new node joins a network, it does not connect to another node selected at random. In real networks, linking is never random. A subtle law of preferential attachment governs the evolution of a network. As a result, some nodes that are attached to preferentially become special nodes called hub nodes. Because a newly added node preferentially attaches to hub nodes, hub nodes gain more and more links. It has recently been found that the structure of various networks is scale-free. Examples are airline route maps, the networks over which epidemics spread, and the connections in the Internet. Therefore, it is possible that the e-mail network in the Internet is also a scale-free network.
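As a rough illustration of preferential attachment, the following Python sketch grows a network in which each new node links to existing nodes with probability proportional to their current number of links. The function names `grow_preferential` and `degrees` and the parameters `n`, `m`, and `seed` are illustrative assumptions and do not come from the talk.

```python
import random
from collections import Counter

def grow_preferential(n, m, seed=0):
    """Grow an undirected graph with n nodes (n > m). Each new node adds
    m links to distinct existing nodes chosen in proportion to their degree."""
    rng = random.Random(seed)
    edges = []
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            edges.append((i, j))
    # Every node appears in `stubs` once per link it has, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def degrees(edges):
    """Count the number of links attached to each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

if __name__ == "__main__":
    deg = degrees(grow_preferential(2000, 2))
    ordered = sorted(deg.values())
    print("median degree:", ordered[len(ordered) // 2])
    print("max degree:", ordered[-1])
```

Most nodes end up with only a few links while a handful of early nodes become hubs, which is the qualitative behaviour described above.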
Structure of E-mail Network
Degree distribution of an e-mail network.
*k: the number of links.
Reference: Holger Ebel, Lutz-Ingo Mielsch, and Stefan Bornholdt, “Scale-free topology of e-mail networks”, Physical Review E 66, 2002
This figure shows the degree distribution of k, where k is the number of links a node has. These data were observed at Kiel University. We can see that this network is scale-free because the curve decreases linearly on a double-logarithmic plot. Therefore, I propose that we consider the propagation of a worm on a scale-free network. It is not clear how a worm propagates on a scale-free network. The only networks whose topology we know are local networks such as this example. We do not have data to analyze the structure of the network outside a local network. Next, I will explain why we cannot analyze the structure of the network outside a local network.
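A straight line on a double-logarithmic plot means that the number of nodes with k links behaves like k raised to a constant negative power. As a minimal sketch, the slope of that line can be estimated by least squares on the logarithms. The function name `loglog_slope` and the synthetic counts below are assumptions made for illustration; they are not the Kiel University data, and a plain least-squares fit is only a rough estimate on real data.

```python
import math

def loglog_slope(degree_counts):
    """Least-squares slope of log(count) against log(k).
    degree_counts maps a degree k (> 0) to the number of nodes with that degree."""
    pts = [(math.log(k), math.log(c))
           for k, c in degree_counts.items() if k > 0 and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx

if __name__ == "__main__":
    # Made-up counts that follow count = 10000 * k^-2 exactly.
    synthetic = {k: 10000.0 * k ** -2 for k in range(1, 50)}
    print(loglog_slope(synthetic))  # close to -2
```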
Spoofed From-field
The From-field of an e-mail message that a worm sends varies and/or is spoofed.
It is almost impossible to identify where a worm sent an e-mail from or how many worms sent the observed e-mails.
The only correct data we can obtain from received e-mails are the arrival intervals.
Because the From-field of an e-mail message that a worm sends often varies and/or is spoofed, it is difficult to identify the PC that a mass-mailing worm has infected, and the only correct data we can obtain from received e-mails are the arrival times. Thus, the arrival time is the only means by which we can estimate a network structure. I thought that I could estimate a network structure from the arrival intervals. Therefore, I used observed arrival-time data of worms in the wild.
Observed Arrivals of E-mail
There are log data* of the times at which e-mail messages with a worm attached arrived at the University of Aizu.
* http://web-int/labs/istc/ipc/Security/virus/index.html
Generally, we can observe the arrival times of e-mail messages with a worm attached as in this image. (While pointing to the figure.) When there are several infected PCs, they send e-mail messages with a worm attached to other addresses independently. The only e-mail messages we can observe are those sent to one of our addresses. Because the From-field is spoofed, we cannot identify who sent an observed e-mail. Since I use the arrival intervals of these messages, I use the log data at the University of Aizu. We can see the data at this URL.
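Turning a log of arrival times into arrival intervals can be sketched as follows. The timestamp format and the sample values are made up for illustration, since the talk does not describe the format of the University of Aizu log; the function name `arrival_intervals` is likewise an assumption.

```python
from datetime import datetime

def arrival_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps in seconds between consecutive arrivals."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    # Hypothetical log entries, not real observations.
    log = ["2004-03-22 10:00:00", "2004-03-22 10:30:00", "2004-03-22 10:40:00"]
    print(arrival_intervals(log))  # [1800.0, 600.0]
```

Shrinking intervals indicate that more infected senders are reaching the observer, which is the signal used in the rest of the talk.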
Simulation Model of Worm Spread Dynamics
I assume that the log data were observed as in this image. Each PC object represents an e-mail address. If two nodes hold each other's address, the nodes are considered to have a link between them. I set the number of nodes in the community to 10000. I assume that the connectivity between the nodes in the community obeys a power law generated by the Barabasi-Albert model. The Barabasi-Albert model is one of the scale-free models. Each node also holds many addresses other than its links to nodes in the community. I suppose that the number of addresses outside the community is proportional to the number of links the node has to nodes in the community. The probability with which an infected node infects one of its neighbors is c. To fit the simulation results to the observed data, I estimated these two parameters, from the results of many experiments, to be nine times the number of links and 0.01, respectively. In this simulation, a node does not become infected until 2 hours after it receives an e-mail with a worm attached. This corresponds to the interval at which a receiver runs a mailer program.
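The model above can be sketched in Python roughly as follows. Several details are assumptions made only to obtain something runnable, because the talk does not specify them: each infectious node sends one message per time step to an address picked uniformly from its address book, one time step is treated as one minute (so the 2-hour delay is 120 steps), each new node attaches with `m = 2` links, and all function and parameter names are hypothetical. This is not the author's simulator.

```python
import random

def ba_graph(n, m, rng):
    """Adjacency lists for a graph grown by degree-proportional attachment."""
    adj = [[] for _ in range(n)]
    stubs = []  # each node appears once per link it has

    def link(a, b):
        adj[a].append(b)
        adj[b].append(a)
        stubs.extend((a, b))

    for i in range(m + 1):  # fully connected seed of m + 1 nodes
        for j in range(i):
            link(i, j)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            link(new, t)
    return adj

def simulate(n=10000, m=2, c=0.01, ext_factor=9, delay=120, steps=2000, seed=1):
    """Return the number of infectious nodes at each time step."""
    rng = random.Random(seed)
    adj = ba_graph(n, m, rng)
    first = rng.randrange(n)
    exposed = {first}  # nodes that are infectious or waiting to become so
    active = [first]   # nodes currently sending worm e-mails
    pending = {}       # node -> step at which it starts sending
    history = []
    for step in range(steps):
        # Nodes whose waiting period has passed start sending.
        for v in [v for v, t in pending.items() if t <= step]:
            del pending[v]
            active.append(v)
        for v in active:
            k = len(adj[v])
            # Address book: k community links plus ext_factor * k outside addresses.
            pick = rng.randrange((1 + ext_factor) * k)
            if pick >= k:
                continue  # the mail went outside the community
            target = adj[v][pick]
            if target not in exposed and rng.random() < c:
                exposed.add(target)
                pending[target] = step + delay
        history.append(len(active))
    return history

if __name__ == "__main__":
    history = simulate()
    print("infectious nodes at the end:", history[-1])
```

Under these assumptions, a sender with k community links reaches one particular neighbor with probability 1 / (10 k) per message, so an observer surrounded by high-degree neighbors sees worm e-mails less often.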
Comparison between Simulation and Observed Data
This figure compares the result of our simulation with the observed data for a worm, Netsky.P. Netsky.P is one of the worms that infected many PCs in 2004.
Arrival Intervals of Simulation
Panels: i) mk: 115.619, ii) mk: 92.15, iii) mk: 61.95
*mk: mean number of links that the neighbors have.
Each graph shows the result for a different observer. Time 0 is the start time of the simulation. The red line shows the arrival intervals, and the blue points show the times at which the neighbors of each observer are infected. The arrival intervals in the top-left figure decrease more slowly than in the others. This shows that the number of infected neighbors increases slowly and that the neighbors have many links. The arrival intervals in the bottom-left figure decrease more rapidly than in the others. This shows that the number of infected neighbors increases rapidly and that the neighbors have few links. The arrival intervals in the top-right figure decrease more slowly than in the bottom-left one, even though the number of infected neighbors increases at a similar rate. This is because the mean number of links of the neighbors is larger in the top-right case than in the bottom-left one. From these results, we can estimate the structure of an observer's neighborhood by comparing several observers.
Mathematical Model of Outbreak
We assume that the network is sufficiently large and that there is no infection loop in the network. S is the total number of infected nodes when the infection starts from a randomly selected node. We can obtain the expectation of S, E[S], from this equation. The parameter M is the number of links that a randomly selected node has. However, each of these links connects to a node that is liable to be infected. The parameter M_e is therefore the number of links of the node that a randomly selected link connects to. Because hub nodes have extremely many links, the probability that a randomly selected link connects to a hub node is high. So, in a long-tailed distribution such as a power-law distribution, E[M_e] is greater than or equal to E[M]. The longer the tail, the larger E[M_e] becomes. From this equation, we can see that E[S] diverges when 2 - E[M_e] equals 0. In that case we cannot prevent the outbreak. To decrease E[M_e], we need to decrease the variance of M. To decrease the variance of M, we need to protect the hub nodes, which have many links. I simulated the effect of this method.
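The equation shown on the slide is not reproduced in this transcript. As a hedged reconstruction, one standard loop-free (tree) approximation that is consistent with the statements above, and that assumes the infection is passed along every link, is the following; it may differ in detail from the equation on the slide.

```latex
\begin{align}
E[S]   &= 1 + \frac{E[M]}{2 - E[M_e]}, \qquad E[M_e] < 2, \\
E[M_e] &= \frac{E[M^2]}{E[M]} = E[M] + \frac{\operatorname{Var}(M)}{E[M]}.
\end{align}
```

The first line counts the seed node plus its M neighbors, each of which starts a subtree with E[M_e] - 1 onward links per node, so each subtree has expected size 1 / (2 - E[M_e]). The second line shows why E[M_e] is at least E[M] and why reducing the variance of M reduces E[M_e].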
Hub Defense Strategy (1)
Difference in the number of immune hub nodes.
*h = number of immune hub nodes
These graphs show the effect of the hub defense strategy as the number of immune hub nodes changes. The bottom one shows the result in the early period. We can see that we obtain a sufficient advantage in the early period even when the number of immune hub nodes is not large.
Hub Defense Strategy (2)
Comparison between Hub Defense and Random Defense
r = number of immune nodes selected randomly. h = number of immune hub nodes.
The purple line shows the effect when I make 3000 randomly selected nodes immune. We can see that we gain no advantage by protecting randomly selected nodes, because the outbreak still occurs in the early period. We gain more advantage by protecting 1000 hub nodes than by protecting 3000 randomly selected nodes.
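The comparison between hub defense and random defense can be illustrated with the quantity E[M_e] from the mathematical model, used here as a rough proxy for outbreak potential. The sketch below is not the author's simulation: it assumes a graph grown by preferential attachment with `m = 2` links per new node, treats immune nodes as removed, and uses hypothetical function names.

```python
import random

def ba_graph(n, m, rng):
    """Adjacency lists for a graph grown by degree-proportional attachment."""
    adj = [[] for _ in range(n)]
    stubs = []  # each node appears once per link it has

    def link(a, b):
        adj[a].append(b)
        adj[b].append(a)
        stubs.extend((a, b))

    for i in range(m + 1):  # fully connected seed of m + 1 nodes
        for j in range(i):
            link(i, j)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            link(new, t)
    return adj

def mean_neighbor_degree(adj, immune):
    """E[M_e] = E[M^2] / E[M] over non-immune nodes, counting only
    links whose two ends are both non-immune."""
    degs = [sum(1 for u in adj[v] if u not in immune)
            for v in range(len(adj)) if v not in immune]
    total = sum(degs)
    return sum(d * d for d in degs) / total if total else 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    adj = ba_graph(10000, 2, rng)
    by_degree = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    hubs = set(by_degree[:1000])                         # h = 1000
    randomly = set(rng.sample(range(len(adj)), 3000))    # r = 3000
    print("no defense:", mean_neighbor_degree(adj, set()))
    print("h = 1000  :", mean_neighbor_degree(adj, hubs))
    print("r = 3000  :", mean_neighbor_degree(adj, randomly))
```

Removing the highest-degree nodes cuts the tail of the degree distribution directly, whereas a random selection mostly hits low-degree nodes and leaves most hubs in place.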
Conclusion
By observing arrival intervals, we can estimate the damage caused by a worm and the network structure around the observer.
We can confirm that the hub defense strategy is an effective method in this network even when the number of immune hub nodes is not large.
Thank you
That’s all. Thank you for listening.