Code Red Worm Propagation Modeling and Analysis Cliff Changchun Zou, Weibo Gong, Don Towsley
Outline Introduction Background on code red worm Models Simulation Numerical Analysis Conclusion
Introduction Easy access and wide usage of the Internet make it a target for malicious activities such as “worms” – A “worm” is defined as an autonomous program that spreads through computer networks by searching for, attacking, and infecting remote computers automatically
Introduction (cont.) The Internet has become a powerful mechanism for propagating malicious software programs The Code Red worm and Nimda worm incidents of 2001 – showed us how vulnerable our networks are – and how fast a virulent worm can spread
Introduction (cont.) In order to defend against future worms, we need to understand various properties of worms – propagation of worms – impact of patching, human countermeasures... – impact of network traffic, network topology, etc
Introduction (cont.) Before 2001, few models existed for Internet worm propagation – Homogeneous: an infected host is equally likely to infect any other susceptible host – Non-homogeneous: random graph, two-dimensional lattice, tree-like hierarchical graph; today these are no longer valid for worm modeling
Introduction (cont.) The Code Red worm incident of July 2001 stimulated activities to model and analyze Internet worm propagation – Staniford et al. – Moore – Weaver Previous work on worm modeling neglects the dynamic effect of human countermeasures on worm behavior
Introduction (cont.) Human countermeasures: – Using anti-virus software or special programs – Patching or upgrading susceptible computers – Setting up filters on firewalls or routers to filter or block the virus or worm traffic – Disconnecting networks or computers
Introduction (cont.) Analysis of the Code Red incident of July 19th, 2001 in this paper identifies two factors affecting Code Red propagation: (1) the dynamic countermeasures taken by ISPs and users (2) the slowed-down worm infection rate, because the rampant propagation of Code Red caused congestion and trouble for some routers
Background on code red worm On June 18th, 2001 a serious Windows IIS vulnerability was discovered The first version of the Code Red worm emerged on July 13th, 2001 – it did not propagate well Code Red version 2 began to spread around 10:00 UTC on July 19th
Background on code red worm (cont.) It generated 100 threads: – 100th thread: defaced the web site on the infected host – Other 99 threads: each randomly chose one IP address and tried to set up a connection on port 80 with the target machine – If the victim was not a web server or the connection failed, the thread randomly generated another IP address to probe – Timeout of the Code Red connection request: 21 s
Background on code red worm (cont.) Three independent observed data sets are available on the Code Red incident of July 19 th – Goldsmith and Eichman
Background on code red worm (cont.) – Moore et al:
Background on code red worm (cont.) We are interested in the following issues: – How can we explain these Code Red worm propagation curves shown in Fig. 1, 2, and Fig. 3? – What factors affect the spreading behavior of an Internet worm? – Can we derive a more accurate model for an Internet worm?
Model Code Red Worm Propagation Similar to biological viruses 1. Stochastic models: for small-scale systems with simple virus dynamics 2. Deterministic models: for large-scale systems under the assumption of mass action
Model Code Red Worm Propagation Some definitions in epidemiological modeling – Susceptible hosts – Infectious hosts – Removed hosts
Classical simple epidemic model Each host stays in one of two states: Susceptible or Infectious Susceptible -> Infectious
Classical simple epidemic model This model for a finite population is dJ(t)/dt = βJ(t)[N − J(t)], where J(t) is the number of infected hosts at time t, N is the size of the population, and β is the infection rate
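The logistic growth described by the simple epidemic model can be sketched numerically. The Python below is only an illustration, assuming the model takes the standard form dJ(t)/dt = βJ(t)[N − J(t)]. The function names, the population size, and the reading of k as βN are assumptions made here, not values taken from the slides.

```python
import math


def closed_form(t, n, beta, j0):
    """Logistic solution of dJ/dt = beta * J * (N - J) with J(0) = j0."""
    return n / (1.0 + (n - j0) / j0 * math.exp(-beta * n * t))


def euler(t_end, n, beta, j0, dt=0.001):
    """Integrate the same equation with fixed forward Euler steps."""
    j = float(j0)
    for _ in range(int(round(t_end / dt))):
        j += dt * beta * j * (n - j)
    return j


# Hypothetical values, chosen only for illustration.
N = 1000.0
BETA = 1.8 / N  # assumption: k = beta * N = 1.8
J0 = 1.0

# Print time, closed-form J(t), and the Euler approximation side by side.
for t in (0, 2, 4, 6, 8):
    print(t, round(closed_form(t, N, BETA, J0), 1), round(euler(t, N, BETA, J0), 1))
```

Both columns trace the same S-shaped curve: slow start, near-exponential growth, then saturation as J(t) approaches N. The closed form is convenient for fitting, while the Euler loop is the pattern that extends to models without an analytical solution.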
Classical simple epidemic model
Using value k=1.8
Classical general epidemic model: Kermack-McKendrick model Considers the removal process of infectious hosts Susceptible -> Infectious -> Removed I(t) denotes the number of infectious hosts at time t. R(t) denotes the number of hosts removed from the infectious population by time t
Classical general epidemic model: Kermack-McKendrick model The Kermack-McKendrick model is dS(t)/dt = −βS(t)I(t), dI(t)/dt = βS(t)I(t) − γI(t), dR(t)/dt = γI(t), where β is the infection rate, γ is the rate of removal of infectious hosts, S(t) is the number of susceptible hosts at time t, and N = S(t) + I(t) + R(t) is the size of the population.
Classical general epidemic model: Kermack-McKendrick model Define ρ ≡ γ/β to be the relative removal rate
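The role of the relative removal rate ρ can be illustrated with a small numerical sketch. It assumes the standard Kermack-McKendrick equations dS/dt = −βSI, dI/dt = βSI − γI, dR/dt = γI, integrated with forward Euler. In this form dI/dt = β[S − ρ]I, so I(t) can only grow while S(t) > ρ. All parameter values and names below are hypothetical.

```python
def kermack_mckendrick(n, beta, gamma, i0, t_end, dt=0.01):
    """Return a list of (t, S, I, R) rows from a forward Euler integration."""
    s, i, r = n - i0, float(i0), 0.0
    rows = [(0.0, s, i, r)]
    for step in range(1, int(round(t_end / dt)) + 1):
        new_infections = beta * s * i * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        rows.append((step * dt, s, i, r))
    return rows


# Hypothetical parameters, for illustration only.
N, BETA, I0 = 1000.0, 0.001, 1.0

# Case 1: rho = 0.2 / 0.001 = 200 < S(0), so an outbreak occurs.
outbreak = kermack_mckendrick(N, BETA, gamma=0.2, i0=I0, t_end=50.0)
peak = max(outbreak, key=lambda row: row[2])
print("rho = 200: I(t) peaks at t =", round(peak[0], 2), "with S =", round(peak[1], 1))

# Case 2: rho = 2.0 / 0.001 = 2000 > S(0), so I(t) never grows.
no_outbreak = kermack_mckendrick(N, BETA, gamma=2.0, i0=I0, t_end=50.0)
print("rho = 2000: max I(t) =", max(row[2] for row in no_outbreak))
```

In the first case the peak of I(t) falls where S(t) crosses ρ. In the second case the initial susceptible population is already below ρ and the infection dies out.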
Classical general epidemic model: Kermack-McKendrick model
A New Internet Worm Model: Two Factor Worm Model Human countermeasures result in removing both susceptible and infectious computers from circulation. The large-scale worm propagation caused congestion and trouble for some Internet routers, thus slowing down the Code Red scanning process.
A New Internet Worm Model: Two Factor Worm Model Susceptible -> Infectious -> Removed; Susceptible -> Removed
A New Internet Worm Model: Two Factor Worm Model To account for the slowed-down worm scan rate, the infection rate β must be modeled as a function of time, β(t). The removal process consists of two parts: removal of infectious hosts, R(t), and removal of susceptible hosts, Q(t).
A New Internet Worm Model: Two Factor Worm Model Classical simple epidemic model: dS(t)/dt = −βS(t)I(t) Two-factor worm model: dS(t)/dt = −β(t)S(t)I(t) − dQ(t)/dt
A New Internet Worm Model: Two Factor Worm Model Note that S(t) + I(t) + R(t) + Q(t) = N holds for any time t. Substituting S(t) = N − I(t) − R(t) − Q(t) gives dI(t)/dt = β(t)[N − R(t) − I(t) − Q(t)]I(t) − dR(t)/dt
Simulation
Description N hosts that can reach each other directly 3 states: susceptible, infectious, or removed – Susceptible -> infectious -> removed – Susceptible -> removed At the beginning, several hosts are infectious and the others are susceptible An infectious host sends out a sequence of infection attempts during its lifetime.
Description Capture the impact of cleaning, patching and filtering on worm propagation – At each discrete time t, randomly choose some non-immunized hosts to immunize, regardless of whether they are infectious or still susceptible – C(t) denotes the total number of removed hosts – J(t) includes both infectious hosts and previously infected hosts that have been immunized before t – C(t) = a*J(t), 0 <= a < 1
Description Capture the slowed-down worm infection process by varying the infection delay time D(t) – p(t) = J(t) / N – X(t) ~ N(k1 p(t)^n, k2 p(t)^n) – D(t) = D(0) + max(floor(X(t)), 0) – n is used to adjust the sensitivity of D(t)
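The simulation described on these slides can be sketched as a small discrete-time program. This is a minimal illustration, not the authors' simulator. It assumes that the second argument of N(·, ·) is a variance, that one delay D(t) is drawn per time step and shared by all attempts in that step, and that each attempt targets a uniformly random host. All function and variable names are hypothetical, and the population is kept small so the sketch runs quickly.

```python
import math
import random

SUSCEPTIBLE, INFECTIOUS, REMOVED = 0, 1, 2


def infection_delay(p, d0, k1, k2, n_exp, rng):
    """D(t) = D(0) + max(floor(X), 0), X ~ Normal(mean=k1*p^n, variance=k2*p^n)."""
    scale = p ** n_exp
    x = rng.gauss(k1 * scale, math.sqrt(k2 * scale))  # gauss() takes a std dev
    return d0 + max(math.floor(x), 0)


def simulate(n_hosts, i0, a, d0, k1, k2, n_exp, t_end, seed=0, vary_delay=True):
    """Return rows (t, J(t), infectious count, C(t)) for t = 1..t_end."""
    rng = random.Random(seed)
    state = [SUSCEPTIBLE] * n_hosts
    next_attempt = {}  # infectious host -> time of its next infection attempt
    for host in range(i0):
        state[host] = INFECTIOUS
        next_attempt[host] = d0
    j_total = i0  # J(t): every host infected so far, immunized or not
    removed = 0   # C(t): hosts immunized so far
    rows = []
    for t in range(1, t_end + 1):
        p = j_total / n_hosts
        delay = infection_delay(p, d0, k1, k2, n_exp, rng) if vary_delay else d0
        # Hosts whose next attempt is due probe one random target each.
        for host in [h for h, due in next_attempt.items() if due <= t]:
            target = rng.randrange(n_hosts)
            if state[target] == SUSCEPTIBLE:
                state[target] = INFECTIOUS
                next_attempt[target] = t + delay
                j_total += 1
            next_attempt[host] = t + delay
        # Immunize random non-immunized hosts until C(t) = floor(a * J(t)).
        while removed < int(a * j_total):
            host = rng.randrange(n_hosts)
            if state[host] != REMOVED:
                next_attempt.pop(host, None)  # an infectious host stops probing
                state[host] = REMOVED
                removed += 1
        rows.append((t, j_total, len(next_attempt), removed))
    return rows


# Hypothetical small population; D(0), k1, k2, n and I(0) follow the slides.
classic = simulate(2000, 10, a=0.0, d0=10, k1=150, k2=70, n_exp=2,
                   t_end=400, seed=1, vary_delay=False)
two_factor = simulate(2000, 10, a=0.5, d0=10, k1=150, k2=70, n_exp=2,
                      t_end=400, seed=1, vary_delay=True)
print("final J(t), classical setting:", classic[-1][1])
print("final J(t), both factors:", two_factor[-1][1])
```

Setting `a=0.0` with `vary_delay=False` reduces the sketch to the classical simple epidemic setting, so the two factors can be switched on separately and their effect on the J(t) curve compared.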
Experiment (para.) – The classical simple epidemic model: D(t) = D(0), a = 0 – Consider the effects of patching and filtering but with constant infection rate: D(t) = D(0), a = 0.5 – Consider only the decreased infection rate a = 0.5 – Other parameters: N = , D(0) = 10, k1 = 150, k2 = 70, n = 2, I(0) = 10 (infected hosts at beginning)
Experiment (result.)
Experiment (conclude.) Match the observed data better than the original Code Red worm simulation the propagation speed decreases when the total number of infected hosts reaches only about 50% of the population by adjusting the parameters in our simulation, we can adjust the curve to match real data and then understand more of the characteristics of the worms we investigate. The worm propagation is almost a deterministic process
Numerical Analysis (para.) Dynamic parameters – β(t): infection rate – R(t): # removed hosts from the infectious population – Q(t): # removed hosts from the susceptible population Kermack-McKendrick model: β(t) = constant; no removal process from the susceptible population; dR(t)/dt = γI(t)
Numerical Analysis For the general two-factor worm model, we analyze the model based on numerical solutions of the differential equations using Matlab Simulink Determine the dynamical equations For R(t), use the same assumption as the Kermack-McKendrick model: dR(t)/dt = γI(t)
Numerical Analysis Determine the dynamical equations For Q(t), the removal process of the susceptible hosts looks similar to a typical epidemic propagation: dQ(t)/dt = µS(t)J(t) For β(t): β(t) = β0[1 − I(t)/N]^η – β0 is the initial infection rate – η is used to adjust the sensitivity of the infection rate to the number of infectious hosts I(t) – η = 0 means constant infection rate
Numerical Analysis (model.)
dS(t)/dt = −β(t)S(t)I(t) − dQ(t)/dt
dR(t)/dt = γI(t)
dQ(t)/dt = µS(t)J(t)
β(t) = β0[1 − I(t)/N]^η
N = S(t) + I(t) + R(t) + Q(t)
I(0) = I0 << N, S(0) = N − I0, R(0) = Q(0) = 0
Numerical Analysis N = , I0 = 1, η = 3, γ = 0.05, µ = 0.06/N, β0 = 0.8/N
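The two-factor model can also be integrated outside Simulink. The sketch below uses plain forward Euler in Python with dI/dt = β(t)S(t)I(t) − γI(t) and J(t) = I(t) + R(t). The value of N is missing from the slide, so `N_ASSUMED` is a placeholder assumption made here. The other parameters are the ones listed above, and the function name and step size are hypothetical.

```python
def two_factor_model(n, i0, beta0, gamma, mu, eta, t_end, dt=0.01):
    """Return rows (t, S, I, R, Q) from a forward Euler integration."""
    s, i, r, q = n - i0, float(i0), 0.0, 0.0
    rows = [(0.0, s, i, r, q)]
    for step in range(1, int(round(t_end / dt)) + 1):
        beta_t = beta0 * (1.0 - i / n) ** eta  # beta(t) = beta0 * [1 - I/N]^eta
        dq = mu * s * (i + r)                  # dQ/dt = mu * S * J, J = I + R
        dr = gamma * i                         # dR/dt = gamma * I
        ds = -beta_t * s * i - dq              # dS/dt = -beta(t) * S * I - dQ/dt
        di = beta_t * s * i - dr               # follows from S + I + R + Q = N
        s, i, r, q = s + ds * dt, i + di * dt, r + dr * dt, q + dq * dt
        rows.append((step * dt, s, i, r, q))
    return rows


# Placeholder assumption: the slide does not give N.
N_ASSUMED = 1_000_000

ROWS = two_factor_model(
    n=N_ASSUMED, i0=1, beta0=0.8 / N_ASSUMED, gamma=0.05,
    mu=0.06 / N_ASSUMED, eta=3, t_end=80.0,
)
PEAK = max(ROWS, key=lambda row: row[2])
print("I(t) peaks at t =", round(PEAK[0], 2), "with I =", round(PEAK[2]))
```

The peak time printed by the sketch depends on the placeholder N and on the step size, so it should be compared with the reported curves only after substituting the actual population size.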
Numerical Analysis (discuss.) I(t) reaches its maximum value at t = 29, then decreases dI(t)/dt = d(N − S(t) − R(t) − Q(t))/dt = β(t)S(t)I(t) + dQ(t)/dt − dR(t)/dt − dQ(t)/dt = [β(t)S(t) − γ]I(t) The maximum of I(t) is reached at time tc, when S(tc) = γ/β(tc) β(t)S(t) − γ < 0 for t > tc, thus I(t) decreases for t > tc The behavior of the number of infectious hosts I(t) can explain why the Code Red scan attempts dropped during the last several hours of July 19th
Numerical Analysis (comp.)
Conclusion We present a more accurate Internet worm model and use it to model Code Red worm propagation Two major factors affect Internet worm propagation: – the effect of human countermeasures – the slowing down of the worm infection rate These are combined in the two-factor worm model
Conclusion Internet worm models have their limitations – only suitable for modeling a continuously spreading worm, or the continuously spreading period of a worm – cannot predict arbitrary stopping or restarting events of a worm For the prediction and damage assessment of future viruses and worms, more research is needed to find an analytical way to determine the parameters (γ, µ, β0, n and η) beforehand.
Q&A