Air Force Institute of Technology

Air Force Institute of Technology
Using Logic-Based Reduction for Adversarial Component Recovery* J. Todd McDonald, Eric D. Trias, Yong C. Kim, and Michael R. Grimaila Center for Cyberspace Research Air Force Institute of Technology WPAFB, OH *The views expressed in this article are those of the authors and do not reflect the official policy or position of the United States Air Force, Department of Defense, or the U.S. Government

Outline Protection Context Polymorphic Variation as Protection
Hiding Properties of Interest Framework and Experimental Results

Protection Context Embedded Systems / “Hardware”
Increasingly represented as reprogrammable logic (i.e., software!) We used to like hardware because it offered “hard” solutions for protection (physical anti-tamper, etc.) Our beginning point: what happens if hardware-based protections fail? Hardware protection: I try to keep you from physically getting the netlist/machine code Software protection: I give you a netlist/machine code listing and ask you questions pertaining to some protection property of interest Protection/exploitation both exist in the eye of the beholder

Protection Context Critical military / commercial systems vulnerable to malicious reverse engineering attacks Financial loss National security risk Reverse Engineering and Digital Circuit Abstractions

Polymorphic Variation as Protection
Experimental Approach: Consider practical / real-world / theoretic circuit properties related to security Use a variation process to create polymorphic circuit versions Polymorphic = many forms of circuits with semantically equivalent or semantically recoverable functionality Characterize algorithmic effects: Empirically demonstrate properties Prove as intractable Prove as undecidable

Two Roads Met in the Woods… and I Went Down Both…
Semantic Changing Semantic Preserving Black-Box Refinement Semantic Transformation Polymorphic Generation Polymorphic Generation Obfuscation Program Encryption Random Program Model BBR = fake inputs, fake outputs: original I/O hidden in plain sight ST = P + E (our approach) or homomorphic transform (Gentry, Sander/Tschudin) Polymorphic generation = whitebox variation VBB is model for describing limits on the right side “ i.e.if you don’t change I/O Same I/O Polynomial slowdown VBB :: variant leaks no more info than TT or oracle of original would Barak/goldreich For semantic preserving, proving anything useful about obfuscation Preventing all info leakage impossible We are looking to see when variation might produce hiding properties of interest Property = something I can measure or characterize What can I prove / not prove under RPM? What can I measure? What can I characterize? What are the limits if I am only allowed to retain functionality?

Defining Obfuscation Since we can’t hide all information leakage….
Can we protect intent? Tampering with code in order to get specific results Manipulating input in order to get specific results Correlating input/output with environmental context Can we impede identical exploits on functionally equivalent versions? Can we define and measure any useful definition of hiding short of absolute proof and not based solely on variant size?

Hierarchy of Obfuscating Transforms
Functional Hiding Logical View Control Hiding Component Hiding Signal Hiding Topology Hiding (Gate Replacement) Not all white-box variants are obfuscations We are experimenting to know when a white-box variant manifests an obfuscation property of interest We call this the hierarchy of obfuscating transformations… very similar to Collberg et al. very similar to ISCAS-85: case study on reverse engineering circuits WE have accomplished initial experiments aimed at measuring/ evaluating variants based on: Topology:: change in graph or gate type relative to original Signals:: change in full logic signal (truth table column) relative to the original Components:: jamie parham/ jason williams syntax Hierarchy: we postualte that properties build on each other Meaning, you can’t achieve component hiding without also signal and topology hiding The focus of this paper is on component identification and hiding Metrics may be influenced by decisions or choices in the variation process: Some more random, some more deterministic Some more “redundant”, some more “non-redundant” Function hiding is measure more theoretical models (VBB, RPM) versus metrics associated with components, signals, topology Side Channel Properties Physical Manifestation

Polymorphic Variation as Protection
Algorithm and Variant Characterization: Selection: Random Deterministic Mixture Replacement

Framework and Experimental Results
When does (random/deterministic) iterative selection and replacement: 1) Manifest hiding properties of interest? 2) Cause an adversarial reverse engineering task to become intractable or undecidable? What role does logic reduction and adversarial reversal play in the outcome (ongoing) Are there circuits which will fail despite the best variation we can produce? (yes)

Components Components are building block for virtually all real-world circuits Given: circuit C gate set G input set I integer k > 1, where k is the number of components Set M of components {c1,…, ck} partitions G and I into k disjoint sets of inputs and/or gates. Four base cases Based on input/output boundary of component and the parent circuit propagated through a circuit structure to produce output signals) 3) Component Recovery: reproducing the architectural or component level relationships of the original circuit (this is in essence the well-studied problem of module identification [8, 11]). In [11], for example, module identification consists of two parts: first, enumerating potential sub-circuits, and second, matching known component functions to those sub-circuits. We refer to this entire process of successful enumeration and matching as component identification or component recovery. In [7], Hansen et al. list similar reverse engineering techniques and goals based on deriving the following: (known/standard) library modules or program components, repeated modules, expected global structures, computed functions, control functions, bus structures, and common names. The underlying goal of such analysis assumes that logic gates grouped into higher level modules makes reverse engineering easier. Of interest, we note that Hansen's definition of black box relates to circuit understanding on a large scale: when all other derivation methods fail, we must consider a circuit (or sub-circuit) a black box if the function is unknown or if the circuit consists of truly random logic. We note that size alone of the variant may or may not have any bearing on the ability of any adversary to accomplish goals related to these protective categories. We may observe that, in general, larger intermediate gate size may result in longer analysis times or may consume larger resources (memory and storage), but no definitive link exists between protection and size per se. Our interest concerns whether definable properties of protection emerge at certain sizes or whether they emerge based on the method by which we create a circuit variant using the experimental choices described in Section 2.3.

Component Recovery Since component recovery plays a major role in reverse engineering study and research, we focus on this adversarial goal as it relates to analyzing circuit variants. By narrowing our focus, we setup a test environment that makes failure (to hide components) easy to identify. Even though measuring absolute success requires further work, we report here the results of experimental studies along these partial lines. Given a circuit C, its gate set G, its input set I, and an integer k > 1, where k is the number of components, a set M of components {c1,…, ck} partitions G and I into k disjoint sets of inputs and/or gates. Figure 3 illustrates the four base cases by which a component might partition a circuit in relationship to other components or gates. Consider a circuit whose component set consists of four disjoint gate sets, M = {A,B,C,D}. Component A receives input from the circuit boundary and produces output received by another component; component B receives input from a component and produces output received by another component; component C receives input from another component and produces output at the circuit boundary; and component D receives input from the circuit boundary and produces output at the circuit boundary. In terms of component hiding that occurs as an artifact of white-box structural variation, component D in Figure 3 represents at least one instance of a "Kobayashi maru" or no-win scenario for an obfuscating algorithm. Because semantic preserving algorithms produce variants that must maintain the function of the original circuit, gates within component D have the greatest probability of recovery when compared to scenarios A, B, and C. This is because no function preserving obfuscator can alter the I/O relationships of component D, thereby giving analysis algorithms an advantage to distinguish gates within D from the rest of the circuit. It should also follow intuitively that components that fit the pattern of component B have a greater chance of hiding because they are fully contained within the circuit boundary. A test in the fictional Star Trek movie universe for Star Fleet cadets where the computer is allowed to cheat so that it always wins

Independent Components and Induced Redundancy
ORIGINAL WHITE-BOX VARIANTS How much of this induced redundancy between independent components cannot be removed and how do we characterize the efficiency of the algorithms that would remove them, assuming no truth-table based synthesis could occur at the circuit level? In Section 3.1 we briefly discuss the techniques we use to minimize the circuit seen in Figure 6 and we give results related to size and levels as a measure of success. From a graph-based perspective, we can also apply a simple failure test: if component merging takes place as a result of our variation process, we may use graph analysis to determine whether such merging remains after we apply logic reduction. REDUCED VARIANTS

Observing Independent Component Hiding

Case Study

Conclusions

Questions ?

Hiding Properties of Interest
General Intuition and Hardness of Obfuscation The ONLY true “Virtual Black Box” “The How” Semantic Behavior

Framework and Experimental Results
Is perfect or near topology recovery useful (therefore, is topology hiding useful)? In some cases, yes Foundation for other properties (signal / component hiding) For certain attacks, it is all that is required Accomplishing topology hiding Change basis type (normalizing distributions, removing all original) Guarantee every gate is replaced at least once Multiple / overlapping replacement = diffusion Topology: Gate fan-in Gate fan-out Gate type

Experiment 1: Measuring “Replacement” Basis Change
120 gates ( 4 ANDs + 79 NANDs + 19 NORs + 18 XORs + 40 inverters ) Decomposed 230 gates ( 60 ANDs NANDs + 19 NORs + 40 inverters ) Decomposed NOR 843 gates ( 843 NORs)

Experiment 1a: Measuring “Replacement” Basis Change
 = {NOR}   = {AND, NAND, OR, XOR, NXOR}

Experiment 1b: Measuring “Replacement” Basis Change
 = {NAND}   = {AND, NOR, OR, XOR, NXOR}

Experiment 2: Measuring “Replacement” Uniform Basis Distribution
ISCAS-85 c1355 C1355 506 gates ( 56 ANDs NANDs + 2 ORs + 32 buffers + 40 inverters ) Decomposed 550 gates ( 96 ANDs NANDs + 6 ORs + 32 buffers + 40 inverters ) Decomposed NAND 730 gates ( 730 NANDs )

 = {NAND}   = {AND, NAND, OR, NOR, XOR, NXOR} “Single 4000 Iteration Experiment”

 = {NAND}   = {AND, NAND, OR, NOR, XOR, NXOR} “Multiple 4000 Iteration Experiments”

Experiment 3: Measuring “Replacement” Smart Random Selection
ISCAS-85 c432 Iterative Smart Random 2-Gate Selection Algorithm: Selection Strategy: Replacement Strategy: Smart Two Gate Random Random Equivalent

Experiment 3: Measuring “Replacement” Smart Random Selection
 = {NOR}   = {AND, NAND, OR, XOR, NXOR}

Things We’ve Learned Along the Way
What algorithmic factors influence hiding properties the most? Iteration number Selection size Replacement circuit generation (redundant vs. non-redundant) Ongoing work in: Increasing selection size Determinist generation Integrated logic reduction Formal models: term rewriting systems, abstract interpretation, graph partitioning

Obfuscation Comparison Models

Experiment 1a: Measuring “Replacement”
% of ORIGINAL GATES 675 600 600 600

Experiment 1a: Measuring “Replacement”
 = {NOR}   = {AND, NAND, OR, XOR, NXOR} # of NORs ~7500 # of Iterations ISCAS-85 c1355

Experiment 2: Measuring “Replacement”
 = {NAND}   = {AND, NAND, OR, NOR, XOR, NXOR} “Single 4000 Iteration Experiment”

Experiment 2: Measuring “Replacement”
 = {NAND}   = {AND, NAND, OR, NOR, XOR, NXOR} “Multiple 4000 Iteration Experiments”

Air Force Institute of Technology

Similar presentations

Presentation on theme: "Air Force Institute of Technology"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Air Force Institute of Technology

Similar presentations

Presentation on theme: "Air Force Institute of Technology"— Presentation transcript:

Similar presentations

About project

Feedback