SSD951: Secure Software Development Language-based Security Dr. Shahriar Bijani Shahed University Fall 2016
Slide References: David Aspinall, Secure Programming Lecture 15: Information Leakage, Edinburgh University, March 2016; Sabelfeld and Myers, Language-Based Information-Flow Security, Cornell University, 2003; Drayton Benner, A Lattice Model of Secure Information Flow by Dorothy Denning, 2000; Dorothy Denning & Peter Denning, Certification of Programs for Secure Information Flow, Communications of the ACM (CACM), 1977; Anupam Datta, Language-based Security: Information Flow Control, Carnegie Mellon University, 2009.
Introduction
End-to-end Security: End-to-end security (confidentiality, integrity) is a general need. The end-to-end security requirement means protection at all levels, so we need application-level protection.
Traditional Approach to Security: Typical security solutions are not enough. Encryption. Pro: secures a communication channel. Con: does not secure the endpoints, where data enters or leaves. (Application) Firewalls. Pro: stop some bad things entering programs. Con: massive leakage via application ports (21, 23, 80). Access control (ACLs) in the OS. Pro: isolates users, files, processes. Con: what if one part of a process should be protected from other parts of the same process?
Traditional Approach to Security: Anti-virus. Pro: good with known malware, recognized by signature. Con: little use against zero-day exploits. Code signing. Pro: digital signatures identify the code producer/packager. Con: they don’t actually guarantee that the code is secure. Sandboxing and OS-based monitoring. Pro: can block low-level accesses. Cons: cannot block information transfer within applications; pure sandboxes are too strict (they may prevent information sharing). The above mechanisms check the release of data but not its propagation.
Language-based Attempts: Java offers a bytecode verifier, a sandbox mode, and stack inspection. These are not intended to control information flow, so they are insufficient.
Language-based security (LBS): Idea: prevent application-level attacks inside the application. Advantages: Semantics-based security specification: an exact and precise definition of what is required, based on the definitions and data used inside the program. Static enforcement is sometimes possible if we can examine the code (white-box technique) and use programmer annotations and/or special type systems; otherwise, force run-time monitoring if needed.
Another Approach: Information Flow Analysis. Information Flow (IF) analysis: how does information flow inside programs? Is there a secret “going out” of the system? Information Flow policies: the focus can be on confidentiality or integrity. Information Flow controls: mechanisms that implement the above policies. Active research field (studied for ~40 years). IF-based compilers: JIF (Java) 2001-2009 (Cornell University); FlowCaml (ML) 2002 (INRIA). Limited impact on practice!
Language-based security (LBS) Language-based security (LBS) approaches: Taint tracking (dynamic) Type checking (static)
Dynamic taint tracking: Idea: add security labels to data inputs (sources) and data outputs (sinks), and propagate labels during computation (cf. dynamic typing). Labels are: Tainted (data from taint sources, e.g., user input, and data arising from or influenced by tainted data) and Untainted (data that is safe to output or use in sensitive ways).
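The label propagation described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not a real taint-tracking library: the names `Labeled`, `source`, `literal`, and `sink` are hypothetical, and only string concatenation is tracked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """A string value paired with a taint label (hypothetical helper)."""
    value: str
    tainted: bool

    def __add__(self, other: "Labeled") -> "Labeled":
        # Propagation rule: the result is tainted if either operand is tainted.
        return Labeled(self.value + other.value, self.tainted or other.tainted)


def source(user_input: str) -> Labeled:
    """Taint source: everything that comes from the user is tainted."""
    return Labeled(user_input, tainted=True)


def literal(text: str) -> Labeled:
    """Constants written by the programmer are untainted."""
    return Labeled(text, tainted=False)


def sink(data: Labeled) -> str:
    """Sensitive sink: refuse tainted data at run time."""
    if data.tainted:
        raise ValueError("tainted data reached a sensitive sink")
    return data.value


# Data influenced by tainted data becomes tainted itself.
query = literal("SELECT * FROM users WHERE name = '") + source("bob") + literal("'")
```

Calling `sink(query)` raises `ValueError`, while `sink(literal("ok"))` returns the string. The check happens only when the data reaches the sink, which is the run-time character of the approach.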
Dynamic taint tracking: Disadvantages: “Preventing code injection exploits using dynamic taint tracking is like letting a thief in your house and checking his bag for stolen goods at the very moment he tries to leave. It might work, but only if you never lose track of the gangster and if you really know your house. However, I would prefer a solution that does not let thieves in my house in the first place.” (Martin Johns, on dynamic taint tracking, 2007.) A further disadvantage: implicit flows.
Language-based Security (LBS): Many security models are based on abstract formalisms, typically state machines [Bell-LaPadula73, Goguen-Meseguer82,84, Rushby81]. Challenge: accurately relating a formal security specification to concrete implementations. Denning & Denning proceed from a starting point that was new at the time (1977): language-based security. They define security certification of programs at the language level. Goal: if program p is certified by the compiler, then it is secure. Certification is a compile-time, completely automated process based on the well-known “attribute grammar” compiler concept.
Definitions Confinement: the ability to prevent capabilities (and authority) from being transmitted improperly Noninterference: no data visible publicly is affected by confidential data “High” security versus “low” security: the idea that some code and data is associated with being inaccessible and other code and data is public (these are not technical terms)
Terminology: Covert Channels Channel: a mechanism for signaling information through a computing system Covert channel: a channel whose primary purpose is not information transfer
Types of Covert Channels Implicit flows: signal information through the control structure of a program Termination channel: signal information through the termination or nontermination of computation Timing channel: signal information through the time at which an action occurs rather than through the data associated with the action
Types of Covert Channels Probabilistic channel: signal information by changing the probability distribution of observable data Resource exhaustion channel: signal information by the possible exhaustion of a finite, shared resource Power channel: embed information in the power consumed by the computer
Security properties What kinds of properties do we want to ensure programs or computing systems satisfy?
Safety properties “Nothing bad ever happens” or “Something bad must not happen” E.g.: system should not crash A property that can be enforced using only history of program Amenable to purely run-time enforcement Examples: access control (e.g. checking file permissions on file open) memory safety (process does not read/write outside its own memory space) type safety (data accessed in accordance with type)
Liveness properties “Something good eventually happens” or “Something good must happen” Example: availability “The email server will always respond to mail requests in less than one second” “Every packet sent must be received at its destination” Violated by denial of service attacks Can’t enforce purely at run time Tactic: restrict to a safety property “web server will respond to page requests in less than 10 sec or report that it is overloaded.”
“Information Flows”: The attribute “$x \Rightarrow y$” means that information flows from x to y; this is the attribute calculated during certification. Explicit flow: e.g., “y := x” implies $x \Rightarrow y$. Implicit flow: “y := 1; if x=0 then y:=0”. Assuming x is 0 or 1, x = y after completion, so $x \Rightarrow y$. Generally, control structures in the language cause such indirect (implicit) flows. Transitive: $x \Rightarrow y$ and $y \Rightarrow z$ implies $x \Rightarrow z$. Defn.: a program statement specifies a flow if its execution could result in that flow. N.b., this is weaker than “does result in flow”.
Security Requirements: Program p is secure iff flow $x \Rightarrow y$ results from executing p only when $\underline{x} \rightarrow \underline{y}$, where $\underline{x}$ is the security class of x and $\rightarrow$ is the flow permitted by policy. Security Definition (1st shot): flow $x \Rightarrow y$ results from executing p only when $\underline{x} \rightarrow \underline{y}$. Undecidable: is there a flow from x to y in “if f(x) halts then y:=0”? Security Definition: flow $x \Rightarrow y$ is specified by p only when $\underline{x} \rightarrow \underline{y}$; note that “is specified by” is weaker than “results from executing”. Living with imprecision: “if x=0 then if x≠0 then y:=z” is disallowed if $\underline{z} \not\rightarrow \underline{y}$.
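The two kinds of flow on this slide can be written out as runnable Python. The function names are illustrative only; the bodies transcribe the slide's own statements “y := x” and “y := 1; if x=0 then y:=0”.

```python
def explicit_flow(x: int) -> int:
    y = x          # y := x, information flows from x to y directly
    return y


def implicit_flow(x: int) -> int:
    y = 1          # y := 1
    if x == 0:     # which branch runs depends on x ...
        y = 0      # ... so y reveals x although x is never copied into y
    return y
```

For x restricted to 0 or 1, `implicit_flow(x)` returns x itself, exactly as if it had been assigned. A tracker that only follows assignments would not see this flow.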
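Information Flow Policy as Lattice: $\underline{x}$ denotes the security level of storage object “x”. “$\underline{x} \rightarrow \underline{y}$” means that information flow is permitted by policy from object x to object y. Least upper bound: $\underline{x} \oplus \underline{y}$. Greatest lower bound: $\underline{x} \otimes \underline{y}$.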
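Certification Mechanism: [Figure: parse tree of an assignment statement of the form c := a * 2 + b (Stmt, Var c, Exp with operators + and *).] Calculate flows “upwards” from the leaves: the constant 2 has the lowest class L, so a * 2 has class $\underline{a} \oplus L = \underline{a}$; the whole expression has class $\underline{a} \oplus \underline{b}$; the question to certify is whether $\underline{a} \oplus \underline{b} \rightarrow \underline{c}$.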
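The “calculate flows upwards” step can be sketched as a bottom-up walk over an expression tree. This sketch makes several assumptions that are not on the slide: only two classes L and H with L below H, expressions encoded as nested tuples, the example statement c := a * 2 + b, and the hypothetical helper names `join`, `can_flow`, `expr_class`, and `certify_assign`.

```python
LEVELS = {"L": 0, "H": 1}   # assumed two-point lattice, L below H


def join(a: str, b: str) -> str:
    """Least upper bound of two classes in the linear order L < H."""
    return a if LEVELS[a] >= LEVELS[b] else b


def can_flow(a: str, b: str) -> bool:
    """Policy check: may information of class a flow into class b?"""
    return LEVELS[a] <= LEVELS[b]


def expr_class(expr, env):
    """Compute the class of an expression bottom-up.

    expr is an int constant, a variable name, or a tuple (op, left, right).
    """
    if isinstance(expr, int):
        return "L"                      # constants carry the lowest class
    if isinstance(expr, str):
        return env[expr]                # class bound to the variable
    _op, left, right = expr
    return join(expr_class(left, env), expr_class(right, env))


def certify_assign(target, expr, env):
    """An assignment is certified if the expression's class may flow to the target."""
    return can_flow(expr_class(expr, env), env[target])


tree = ("+", ("*", "a", 2), "b")        # a * 2 + b
```

With `env = {"a": "L", "b": "H", "c": "H"}` the assignment to c is certified; lowering c to "L" makes `certify_assign("c", tree, env)` return False.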
The Model: $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. N = { a, b, … }: a set of logical storage objects or information containers: files, segments, program variables, and also users. P = processes. “Processes are the active agents responsible for all information flow.”
The Model (cont.): $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. SC = { A, B, … } is a set of security classes. The security classes are disjoint classes of information. Every object belongs to a security class. An example would be { public knowledge, confidential, secret, top secret, only available to teenage hackers }.
The Model (cont.): $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. Binding of objects to security classes can be static or dynamic. With static binding, the security class of an object never changes. With dynamic binding, the object’s security class can change based on the contents of the object. A process can also be bound to a security class.
The Model (cont.): $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. $\oplus$ is a class-combining (binary) operator that is associative and commutative. Let A and B be security classes. $A \oplus B$ refers to the security class of the result of any binary function on values a and b with classes $\underline{a} = A$ and $\underline{b} = B$. $\oplus$ is function independent.
The Model (cont.): $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. $\rightarrow$ is a flow relation. $A \rightarrow B$ if and only if information in class A is allowed to flow to class B. Information can be passed by copying, assignment, I/O, parameter passing, message sending, etc. The model is concerned with information flow on “legitimate” and “storage” channels, not “covert” channels.
The Model (cont.): $FM = \langle N, P, SC, \oplus, \rightarrow \rangle$. The purpose of coming up with a flow model FM is for us now to be able to say that “FM is secure if and only if execution of a sequence of operations cannot give rise to a flow that violates the relation ‘$\rightarrow$’.”
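One concrete instance of the model's security classes, class-combining operator, and flow relation can be sketched as follows. The choice of classes is an assumption made for illustration: each class is a set of categories, combining is set union, and flow is permitted from a class to any superset. The category names and the helpers `all_classes`, `combine`, and `can_flow` are hypothetical.

```python
from itertools import combinations

CATEGORIES = ("medical", "financial")   # illustrative categories


def all_classes():
    """SC: every subset of the categories is a security class."""
    return [frozenset(c)
            for r in range(len(CATEGORIES) + 1)
            for c in combinations(CATEGORIES, r)]


def combine(a: frozenset, b: frozenset) -> frozenset:
    """Class-combining operator: the result carries the categories of both operands."""
    return a | b


def can_flow(a: frozenset, b: frozenset) -> bool:
    """Flow relation: a may flow to b if b carries at least the categories of a."""
    return a <= b
```

In this instance the operator is associative and commutative, and every class may flow into its combination with any other class, which matches the properties the model asks for.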
Universally bounded lattice: What is a universally bounded lattice? “a structure consisting of a finite partially ordered set together with least upper and greatest lower bound operators on the set.” You already know what a partially ordered set is (Lecture 02): a set with a relation R that is reflexive, transitive, and antisymmetric.
Universally Bounded Lattice (cont.) So, what are least upper and greatest lower bounds? Suppose <= is the relation. C is an upper bound of A and B if A <= C and B <= C. C is a least upper bound of A and B if for any upper bound D of A and B, C <= D. Lower bounds and greatest lower bounds work the same way.
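The definitions of upper bound and least upper bound translate directly into a brute-force search over a finite poset. The sketch below is illustrative: the function names are hypothetical, and the divisors of 12 ordered by divisibility are used only as a familiar example of a partial order.

```python
def least_upper_bound(elements, leq, a, b):
    """Return the least upper bound of a and b under leq, or None if none exists."""
    uppers = [c for c in elements if leq(a, c) and leq(b, c)]
    # C is least if it is below every other upper bound D.
    least = [c for c in uppers if all(leq(c, d) for d in uppers)]
    return least[0] if least else None


def greatest_lower_bound(elements, leq, a, b):
    """Same search with the order reversed."""
    return least_upper_bound(elements, lambda x, y: leq(y, x), a, b)


DIVISORS = [1, 2, 3, 4, 6, 12]


def divides(a: int, b: int) -> bool:
    return b % a == 0


# A poset that is not a lattice: a and b have two incomparable upper bounds c and d.
PAIRS = {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}


def leq_pairs(x: str, y: str) -> bool:
    return x == y or (x, y) in PAIRS
```

Under divisibility, 4 and 6 have least upper bound 12 and greatest lower bound 2. In the second poset, `least_upper_bound("abcd", leq_pairs, "a", "b")` returns None, which is why a lattice has to require that these bounds exist.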
Derivation of Lattice Structure: 1) We show that $\langle SC, \rightarrow \rangle$ is a poset. Reflexive: $A \rightarrow A$ (for consistency's sake). Transitive: if $A \rightarrow B$ and $B \rightarrow C$, then $A \rightarrow C$ (for consistency's sake). Antisymmetric: if $A \rightarrow B$ and $B \rightarrow A$, then A = B (otherwise, you have a superfluous security class, so this assumption can be made without loss of generality).
Derivation of Lattice Structure (cont.): 2) We assume SC is finite because we are hopefully dealing with the real world. 3) We can assume that there exists a lower bound L on SC without loss of generality. If needed, we can insert L with no objects. Or, perhaps we could fill it with constants. 4) We show that $\oplus$ is a least upper bound operator.
Derivation of Lattice Structure (cont.): $A \oplus B$ is an upper bound of A and B because, from the definition, information must be able to flow from A or B into $A \oplus B$. $A \oplus B$ is a least upper bound because an upper bound C of A and B can get information from A and B in the same way as $A \oplus B$, so preventing information from flowing from $A \oplus B$ to C does not make sense.
Derivation of Lattice Structure (cont.): Similar to the $\oplus$ operator, we can define the $\otimes$ operator such that $A \otimes B$ is the greatest lower bound of A and B. The greatest lower bound of SC we call L, and the least upper bound of SC we call H. Thus, we have established that SC, “$\rightarrow$”, and “$\oplus$” form a universally bounded lattice with greatest lower bound L and least upper bound H.
Enforcement of Security The goal of deriving this information flow model is for it to help us enforce security. To do this, we must monitor all flow causing operations. We must monitor explicit flow (assignment, I/O) and implicit flow. An example of implicit flow: if a = 0 then b := c can cause information to flow from a to b whether or not the line b := c is executed.
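Enforcement of Security (cont.): We want to represent a program or statement S in a way that easily allows us to evaluate whether or not it is secure. Define S recursively: (1) S is an elementary statement (assignment, I/O); (2) S = S1; S2 (a sequence); (3) S = c: S1, …, Sm (a conditional structure in which the execution of S1, …, Sm is conditioned on the value of c).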
Enforcement of Security (cont.) For elementary statements, S is secure if any explicit flow caused by S is secure. For S = S1; S2 , S is secure if both S1 and S2 are secure. For S = c: S1, …, Sm , S is secure if each Sk is secure and all implicit flows from c are secure.
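The three security rules above can be turned into a small recursive checker. This is a sketch under assumptions that are not in the slides: two classes L and H, static binding given by a dictionary `env`, and a hypothetical tuple encoding of statements as `("assign", target, sources)`, `("seq", S1, S2)`, and `("cond", condition_variables, [S1, ..., Sm])`.

```python
LEVELS = {"L": 0, "H": 1}   # assumed two-point lattice, L below H


def can_flow(a: str, b: str) -> bool:
    return LEVELS[a] <= LEVELS[b]


def join(classes) -> str:
    """Least upper bound of any number of classes; L if there are none."""
    return max(classes, key=LEVELS.get, default="L")


def assigned(stmt) -> set:
    """Variables that may be assigned somewhere inside stmt."""
    kind = stmt[0]
    if kind == "assign":
        return {stmt[1]}
    if kind == "seq":
        return assigned(stmt[1]) | assigned(stmt[2])
    return set().union(*(assigned(s) for s in stmt[2]))   # cond


def secure(stmt, env) -> bool:
    kind = stmt[0]
    if kind == "assign":                 # explicit flow: sources -> target
        _, target, sources = stmt
        return can_flow(join(env[v] for v in sources), env[target])
    if kind == "seq":                    # S1; S2: both parts must be secure
        return secure(stmt[1], env) and secure(stmt[2], env)
    if kind == "cond":                   # c: S1, ..., Sm
        _, cond_vars, branches = stmt
        guard = join(env[v] for v in cond_vars)
        return (all(secure(s, env) for s in branches)
                and all(can_flow(guard, env[t])      # implicit flows from c
                        for s in branches for t in assigned(s)))
    raise ValueError(f"unknown statement kind: {kind}")


# "if a = 0 then b := c"
example = ("cond", ["a"], [("assign", "b", ["c"])])
```

With a bound to H and b bound to L the example is rejected because of the implicit flow from a to b, even though the explicit flow from c to b may be fine. Because the checker looks only at the structure of the program, it also rejects statements whose flows could never occur at run time.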
Mechanisms for Static Binding: Mechanisms for static binding can operate at run-time or at compile-time. Access Control Mechanisms operate at run-time. The Data Mark Machine also operates at run-time. The Certification Mechanism operates at compile-time.
Dynamic Security Enforcement
Static Certification
A Security Type System
A Security Type System: Compositional Rule
A Security Type System: Examples
Semantic-based Security
Non-Interference: Programs have secret (high) and public (low) inputs and outputs. Noninterference holds when the public outputs are not affected by the secret inputs; if they are, there is a leak of information!
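For programs with small, finite input domains, noninterference can be checked by brute force: fix the public input, vary the secret input, and see whether the public output changes. The sketch below assumes a program is modelled as a Python function from (public input, secret input) to a public output; the names `noninterferent`, `safe`, and `leaky` are hypothetical. It is a test over the given inputs, not a proof for all inputs.

```python
def noninterferent(program, public_inputs, secret_inputs) -> bool:
    """True if, for each public input, the public output is the same for every secret."""
    for low in public_inputs:
        outputs = {program(low, high) for high in secret_inputs}
        if len(outputs) > 1:      # the secret influenced what the public can see
            return False
    return True


def safe(low: int, high: int) -> int:
    return low + 1                # ignores the secret entirely


def leaky(low: int, high: int) -> int:
    # Implicit flow: the branch depends on the secret.
    return low + (1 if high > 0 else 0)
```

Over public inputs 0..2 and secret inputs 0..2, `safe` passes the check and `leaky` fails it.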