Formalizing Sensitivity in Program Models for Intrusion Detection Henry Hanping Feng Yong Huang University of Massachusetts Jonathon T. Giffin Somesh Jha Barton P. Miller University of Wisconsin Wenke Lee Georgia Institute of Technology
11 May 2004 Henry Feng & Jonathon Giffin 2 Important Ideas Formalizing program models facilitates understanding & comparison. Exposing additional program state improves monitoring speed & model accuracy. –VPStatic model: reads program’s call stack –Dyck model: instruments binary code
11 May 2004 Henry Feng & Jonathon Giffin 3 Model-Based Intrusion Detection Build model of correct program behavior Model: automaton specifying all valid system call sequences Runtime monitor ensures execution does not violate model Operating System User Process
11 May 2004 Henry Feng & Jonathon Giffin 4 Model-Based Intrusion Detection Model must be fast to operate Model must accurately represent program –Context-sensitive models restrict impossible paths Operating System User Process
11 May 2004 Henry Feng & Jonathon Giffin 5 Code Example char *filename; pid_t[2] pid; int prepare (int index) { char buf[20]; pid[index] = getpid(); strcpy(buf, filename); return open(buf, O_RDWR); } getpid open
11 May 2004 Henry Feng & Jonathon Giffin 6 Code Example void action (void) { uid_t uid = getuid(); int handle; if (uid != 0) { handle = prepare(1); read(handle, …); } else { handle = prepare(0); write(handle, …); } close(handle); } getuid writeread close prepare
11 May 2004 Henry Feng & Jonathon Giffin 7 NFA Model getuid writeread close prepare getpid open Function action Function prepare
11 May 2004 Henry Feng & Jonathon Giffin 8 NFA Model getuid writeread close getpid open Function action Function prepare
11 May 2004 Henry Feng & Jonathon Giffin 9 Impossible Path Exploit void action (void) { uid_t uid = getuid(); int handle; if (uid != 0) { handle = prepare(1); read(handle, …); } else { handle = prepare(0); write(handle, …); } close(handle); } getuid writeread close prepare
11 May 2004 Henry Feng & Jonathon Giffin 10 pop Y push Y pop X PDA Model getuid writeread close getpid open push X Function action Function prepare
11 May 2004 Henry Feng & Jonathon Giffin 11 PDA Problems Impossible paths still exist –Non-determinism indicates missing execution information PDA run-time state explosion –ε-edge identifiers maintained on a stack –Stack non-determinism is expensive –post* algorithm: cubic in automaton size X push Y getuid getpid push X
11 May 2004 Henry Feng & Jonathon Giffin 12 Determinize PDA Expand the input alphabet by exposing the stack operations and the target state of the transition –f a,p,z indicates consume input a, push z on the stack, and transition to state p. –g a,p,z for pop operations. –e a,p for operations with no stack activity. Result in a Deterministic PDA (or DPDA). Exposing only stack operations we get PDA with deterministic stack operations (or sDPDA).
11 May 2004 Henry Feng & Jonathon Giffin 13 NFA State non-determinism is cheap. State non-determinism unlink
11 May 2004 Henry Feng & Jonathon Giffin 14 Non-Deterministic PDA Stack non-determinism is expensive. State non-determinism unlink push Ypush X Stack non-determinism
11 May 2004 Henry Feng & Jonathon Giffin 15 Deterministic PDA (DPDA) State non-determinism unlink push Ypush X Stack non-determinism
11 May 2004 Henry Feng & Jonathon Giffin 16 Stack-Deterministic PDA (sDPDA) State non-determinism unlink push Ypush X Stack non-determinism
11 May 2004 Henry Feng & Jonathon Giffin 17 Input Symbol Processing Complexity n is state count m is transition count k is PDA input alphabet size r is PDA stack alphabet size Model Time Complexity Space Complexity Input Alphabet Size PDAO(nm 2 ) k DPDAO(1) (knr) sDPDAO(n) (kr)
11 May 2004 Henry Feng & Jonathon Giffin 18 VPStatic A variant of VtPath model. –DPDA. –Generated by static analysis of binary. Use Addr(state) to expose states.
11 May 2004 Henry Feng & Jonathon Giffin 19 Determinizing via Observation Extract return addresses from call stack into virtual stack list for each system call. Generate a bunch of input symbols for each system call.
11 May 2004 Henry Feng & Jonathon Giffin 20 Addr(C0) Addr(C1) e(open,Addr(Sopen)) Determinizing via Observation char *filename; pid_t[2] pid; int prepare (int index) { char buf[20]; pid[index] = getpid(); strcpy(buf, filename); return open(buf, O_RDWR); } … Addr(C1) … e(none,Exit(prepare)) g(none,Addr(C1),Addr(C1)) e(none,Addr(C0)) f(none,Entry(prepare),Addr(C0)) e(open,Addr(Sopen)) getpid open
11 May 2004 Henry Feng & Jonathon Giffin 21 pop Y ]Y]Y [Y[Y ]X]X [X[X push Y pop X Dyck Model getuid writeread close getpid open push X
11 May 2004 Henry Feng & Jonathon Giffin 22 Dyck Model getuid [ X getpid open ] X read close getuid [ Y getpid open ] Y write close Matching brackets are alphabet symbols –Expose stack operations to runtime monitor –Language of bracket symbols is a Dyck language –Rewrite binary to generate bracket symbols
11 May 2004 Henry Feng & Jonathon Giffin 23 Determinizing via Binary Rewriting Insert code to generate bracket symbols around function call sites Notify monitor of stack activity Determinizes stack operations void action (void) { uid_t uid = getuid(); int handle; if (uid != 0) { precall(X); handle = prepare(1); postcall(X); read(handle, …); } else { precall(Y); handle = prepare(0); postcall(Y); write(handle, …); } close(handle); }
11 May 2004 Henry Feng & Jonathon Giffin 24 push Ypush X [Y[Y [X[X Dyck Model Dyck model stack-determinizes PDA Stack deterministism push Ypush X Stack non-determinism Only one valid stack configuration
11 May 2004 Henry Feng & Jonathon Giffin 25 Test Programs Program Number of Instructions htzipd 110,096 gzip 57,271 cat 52,601
11 May 2004 Henry Feng & Jonathon Giffin 26
11 May 2004 Henry Feng & Jonathon Giffin 27
11 May 2004 Henry Feng & Jonathon Giffin 28 Questions? Henry Hanping Feng, Yong Huang University of Massachusetts—Amherst Jonathon T. Giffin, Somesh Jha, Barton P. Miller University of Wisconsin—Madison Wenke Lee Georgia Institute of Technology