SRS Architecture Study
Partha Pal, Franklin Webber


2 Outline
Study goals
SRS Technologies
Top down
Bottom up
Strawman
Issues, challenges
[Figure: level of service over time, from the start of a focused attack. Curves: w/o attack; undefended; Survivable (OASIS Dem/Val); Regenerative.]
Self-Regenerative Survivable System:
–Self: organic decision making
–Regenerative: better than graceful degradation or simple recovery; reversing the trend

3 Study Plan
Understand how to incorporate the new technologies in a distributed information system that not only tolerates the effects of cyber-attacks, but also attempts to stop and reverse the loss of resources and capabilities.
Bottom up: start with the new (SRS) capabilities, build a partial architectural framework, and then see what other capabilities, mechanisms and services are needed to complete the architecture.
–Offers a high level of resistance to attacks (protection), improves visibility of attacker activity and attack effects (detection), and is able to adapt to changes caused by the attacker (reaction)
–This is a study in the abstract, leading to an abstract architecture that will need a concrete context to realize
Top down: start with a high watermark survivability architecture, identify where SRS capabilities could benefit it, and re-organize the architecture to integrate the selected capabilities, mechanisms and services.
–If the high watermark is implemented then it provides a concrete context, but “grandfathering” may impact the choice and integration of new capabilities
Combine and contrast the abstract architecture with the more concrete case to create a Strawman Self-regenerative Survivable System Architecture.
–Balances pros and cons of both approaches
3rd generation assumptions are still valid: absolute prevention, and accurate and on-time detection, are impossible to achieve.

4 Summary of SRS Technology Study
Process
–Sent a questionnaire to each original SRS project (i.e., all except Asbestos)
–General outline: Claims; Key Capabilities; Benefits and Other Distinguishing Factors; Assumptions; Use Cases and Interface Issues
–Customized for issues we thought especially important or were confused about
–All responded, some very quickly, some needed gentle prodding. Thank you!
General Observations
–Varying degrees of maturity; some projects started with existing technology
–At least half of the projects offer multiple technologies that could be used independently
–Less overlap than we expected: many technologies seem complementary
–Unsurprisingly, not a lot of support for integration

5 Biologically-Inspired Diversity Projects
Genesis
–A toolkit offering a variety of transformations
–Based on Strata and is portable
DAWSON
–A toolkit offering a variety of transformations
–Based on Windows DLLs
Comparisons
–Some overlap in randomization techniques
–Genesis also offers highly-attack-resistant runtime transformations that incur Strata’s overhead
–DAWSON also offers Windows-specific transformations
–May be combined but value and difficulty are unclear

6 Cognitive Immunity and Self-Healing Projects
Learning and Repair
–Daikon: learns program constraints from a set of traces
–Kvasir: monitors program to create traces for Daikon
–Archie: checks program constraints at runtime
–Repair Tool: repairs damage to conform to constraints
–Tools existed before SRS but are being improved
RMPL (Concurrent Model-Based Execution)
–A language expressing temporal properties without fully specifying an order of execution, and probabilistic assumptions and choices
–An executive that plans, dispatches methods and replans when necessary

7 Cognitive Immunity and Self-Healing, cont’d
AWDRAT
–Language to specify behavior (Architectural Model)
–Language to describe Method Selection Metadata
–Tools to instrument Java to monitor and control behavior
–An executive that detects anomalies by Architectural Differencing, combines other observations to update a Trust Model, and selects methods to maximize utility and/or minimize costs
Cortex
–A “taste-tester” framework for redundant components
–Scyllarus: situation assessment
–CIRCA: generates controllers from models

8 Cognitive Immunity and Self-Healing, cont’d
Comparisons
–Learning and Repair tools are complementary to others
–Cortex learning by taste testing is also complementary
–AWDRAT and RMPL address some of the same issues, but AWDRAT is middleware to defend an existing application, while RMPL is a language and environment for building new applications
–They are geared to different application domains: RMPL to embedded/autonomous vehicle systems, AWDRAT to information processing systems
–AWDRAT’s Trust Modeling is complementary to others

9 Granular Scalable Redundancy Projects
Steward
–Scalable support for Byzantine fault-tolerant state-machine replication: BFT-like protocol for LANs; Paxos-like protocol for WANs; library for threshold crypto
CMU
–Byzantine fault-tolerant data storage using scalable asynchronous protocols: read/write (R/W); query/update (Q/U)
QuickSilver
–Tempest (time-critical; probabilistic; SlingShot protocol)
–QuickSilver (scale to many groups; virtual synchronous protocol)
–Cayuga (efficient automata for searching publication histories)
–ChunkySpread (dynamic IP multicast)

10 Granular Scalable Redundancy, cont’d
Comparisons among protocols
–Significantly different attack (fault) models
–Significantly different assumptions about applications
–CMU’s Q/U protocol makes the weakest assumptions about the attacker but applies to a more restricted class of applications than Steward, SlingShot or QuickSilver

11 Reasoning about Insider Threat Projects
PMOP
–Framework for monitoring operator behavior, recognizing and blocking bad actions
HDSM (High-Dimensional Search and Modeling)
–Insider Modeler and Analyzer, currently used offline
–Search engine for high-dimensional space of sensor data
–Response Engine
Asbestos
–New x86 OS with efficient support for trustworthy isolation in hosts and processes running untrusted code

12 Reasoning about Insider Threat, cont’d
Comparisons
–All are complementary to each other
–PMOP seems to be AWDRAT’s Architectural Differencing applied to operators rather than components
–HDSM’s search engine is complementary to other SRS technologies but the Response Engine overlaps in scope with AWDRAT executives

13 Top Down Approach
What can we learn about the architecture of SRS systems by trying to transform a high watermark survivable system into an SRS system?
[Diagram: two intersecting sets]
–SRS Technologies: understanding of their capabilities, assumptions, limitations and maturity
–DPASA Architecture applied to the JBI Exemplar used in OASIS Dem/Val, and its limitations and shortcomings, as identified by developers’ experiences, testing and validation, out-of-lab deployment, and multiple red team exercises
Our study found that there is a sizable intersection that pushes the high watermark more towards an SRS system!
–Much better than finding that the technologies do not address the identified problems or that, even if they do, the “self” and “regenerative” aspects bring no gain
These changes are incremental improvements over the current DPASA architecture. Changing the architecture substantially (e.g., implementing JBI CAPI using QuickSilver) without appropriate forethought is not likely to lead to a more survivable system, because the system will lose the well-tested interaction of existing protection, detection and adaptive response mechanisms.

14 Limitations and Shortcomings of the DPASA Architecture
–Recovery supported only for some key components
–Availability seems to be the most attractive target for the adversary
–Interpretation of observations, deduction and decision making require expertise
–More options needed for adaptive response
–Lack of support for improving the system on the fly
The last three are more tightly inter-related among themselves and more SRS oriented, but SRS technologies may help in all but the last one.

15 Improving Recovery
[Figure labels: Full recovery; Restart with state loss]
State:
–Partially implemented: some clients and some PSQ (those committed to MySQL)
Connection:
–Reasonably handled
Group view:
–PSQ: view among servers is handled well; view of servers from clients takes a long time
–SM: dependent on Spread; could be broken in a bad way
Improvement possibilities
–Need “safe” state transfer or carry over
Can SRS technologies help?
–Replace Spread transmitter?
–Implement the (in memory) data structures maintained by PSQ servers as Q/U objects using CMU protocol?
–Clients and DC: use Asbestos for protecting check-pointed state?
SM and PSQ are redundant and maintain some replicated state.
SRS technologies provide supporting infrastructure.
Self: who makes the decision to recover (or not to recover), and when?
Regenerative: recovering to “operational” without any other “changes” is still in the realm of “delaying the eventual degradation”.

16 Some Details
[Diagram: quads q1 to q4. Each SM (q1sm to q4sm) has Sig, Vrfy and Voting stages and an XMTR (label: SXMTR) on the SPREAD GCS; each quad also has a PSQ (q1psq to q4psq) and a DC (q1dc to q4dc). A second version shows an XMTR labeled SsXMTR over Steward or QuickSilver.]
q1sm wants to multicast message M: q1sm signs M and hands it to its XMTR, which returns success only if all XMTRs in the group acknowledge receiving M.
A combination of managed switches and ADF policies defines who can talk to whom and over which port and protocol.
It is not clear whether the unavailability observed is purely an implementation problem, but switching over to a Steward or QuickSilver transport may still be advantageous:
–Maintaining the state machine replication abstraction is advantageous for state recovery
–Simpler XMTR
–Can handle more quads
[Diagram: PSQ servers q1psq to q4psq, each with Sig, Vrfy and Voting stages and Q/U Objects; the client’s PSQ request arrives over a socket at a Q/U client; Q/U protocol and object synching between servers.]
The way a client’s PSQ messages are handled by our PSQ servers is similar to using CMU’s Q/U protocols: imagine the subscription info as a Q/U object, replicated at each PSQ server, part of which is maintained in memory. One difference is that instead of the client, one PSQ server acts as its proxy.
Using the Q/U object abstraction and associated protocol will help state recovery of a restarted PSQ server, since different clients may have interacted with different quads while the recovering quad was down.
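The all-acknowledge rule described for the XMTR can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Xmtr` class, the `reachable` flag and the shared-key HMAC that stands in for whatever signature scheme the SMs actually use are assumptions for illustration, not the DPASA implementation.

```python
import hashlib
import hmac


def sign(key, message):
    """Stand-in signature: HMAC-SHA256 with a shared key (an assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()


class Xmtr:
    """Hypothetical transmitter for one group member."""

    def __init__(self, name, key):
        self.name = name
        self.key = key
        self.peers = []        # the other Xmtr instances in the group
        self.reachable = True  # set to False to simulate an unavailable member
        self.delivered = []    # messages this member has accepted

    def receive(self, message, signature):
        """Acknowledge only if this member is up and the signature verifies."""
        if not self.reachable:
            return False
        if not hmac.compare_digest(sign(self.key, message), signature):
            return False
        self.delivered.append(message)
        return True

    def multicast(self, message, signature):
        """Report success only if every peer acknowledges the message."""
        acks = [peer.receive(message, signature) for peer in self.peers]
        return all(acks)


def make_group(names, key):
    """Build a fully connected group of transmitters."""
    group = {name: Xmtr(name, key) for name in names}
    for member in group.values():
        member.peers = [p for p in group.values() if p is not member]
    return group
```

In this sketch a single unreachable member makes `multicast` report failure for the whole group, which is one way to picture why availability of this path is sensitive to any one quad.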

17 Making Availability Compromises More Difficult
Unavailability triggered by corruption (not brute force DOS!):
–Non-redundant and homogeneous perimeter (PIX FW, routers)
–Corrupt references
–Attacks on Java: serialization bombs, garbage collection/lease
–SQL Injection
Diagram annotations on access: need privileged access on inside host(s); from outside.
[Diagram: attacker network reaching the perimeter PIX and HUB in front of the client LANs (MAF, ANIDS, AMC CONUS LAN; WNIDS, CombOPS, Wing Ops LAN) and quad 4 (q4sm, q4ps, q4cor, q4psq, q4dc, q4NIDS, q4ap). A second version adds a taste tester (T tester) at the PIX, labeled “spl hw”.]
Options for the perimeter:
–Redundant and fail over
–Monitor all legs
–Other diversity (costly)
–Taste tester?
–Dynamic diversity using Genesis? May not be a memory exploit

18 Availability cont’d
Corrupt references:
–[Diagram: Q1SM tells q2sm, q3sm and q4sm that Q1’s IP=127.0.0.1, Q2’s IP=127.0.0.1, Q3’s IP=127.0.0.1, Q4’s IP=127.0.0.1]
–Registering client gets 127.0.0.1 for all quads
–Flaw: Q1SM’s unsolicited statement about other quads’ IP addresses is believed by everybody
Attacks on Java: serialization bombs, garbage collection/lease mechanisms
–Send a serialized packet with a huge number in the size field; the JVM tries to allocate the amount of memory given in the size field and throws OOME
–Variations: send a number of such packets to consume available memory. Packets may or may not be well formed, but to keep the memory allocated “serialization” must succeed
–Tell an RMI server that a client reference has been released; need to guess the client-side refs
Code/implementation problem; possible fixes:
–Create variants of JVM or other libraries using Genesis or DAWSON tools
–Enforce a size rule? Use AOP to implement a check before allocation?
–Use the Daikon toolset to learn the max size of serialized packets, enforce it as an invariant and fix when violated?
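The “enforce a size rule before allocation” idea can be sketched as follows. The slide concerns Java serialization inside the JVM; this Python version only illustrates the principle on a hypothetical length-prefixed frame format, and the limit `MAX_PAYLOAD_BYTES` is an assumed value (the slide suggests such a bound might be learned, e.g. with Daikon).

```python
import io
import struct

MAX_PAYLOAD_BYTES = 64 * 1024  # hypothetical bound; could be learned from traces


class FrameTooLarge(Exception):
    """Raised when the declared size violates the size rule."""


def read_frame(stream, max_bytes=MAX_PAYLOAD_BYTES):
    """Read one frame: a 4-byte big-endian length followed by the payload."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated length header")
    (declared,) = struct.unpack(">I", header)
    # Check the attacker-controlled size field BEFORE reserving any memory.
    if declared > max_bytes:
        raise FrameTooLarge(f"declared {declared} bytes, limit is {max_bytes}")
    payload = stream.read(declared)
    if len(payload) != declared:
        raise EOFError("truncated payload")
    return payload
```

A frame that declares four gigabytes is rejected after reading only its four header bytes, so the receiver never attempts the allocation the attack relies on.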

19 Availability cont’d
SQL Injection
[Diagram: Client, PSQ Proxy, PSQ Server, RDB and DC. The client sends an encrypted message containing data D; roles and other checks and a session check are applied on the way; the PSQ creates a query from D.]
–DPASA has the notion of escrow
–MySQL in all quads, but on different OS
Improving Prevention (& detection)
–Strictly control what is executed on the RDB
–Vet D
–Create a white list
–Use diverse DBs (hoping some will behave differently)
–Can SRS diversity techniques help? Genesis tainting?
Improving Detection and Response
–Add taste tester; two possibilities: at PSQ level or at the RDB level
[Diagram: the same path with taste testers added (T taster PSQ, T taster RDB)]
Cost… Applicability, Extendibility …
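“Vet D”, “create a white list” and “strictly control what is executed on the RDB” can be pictured with a small Python sketch. It uses the standard-library `sqlite3` module as a stand-in for the MySQL databases named on the slide; the `subscriptions` table, the `TOPIC_PATTERN` white list and the function names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical white list for D: short identifiers only.
TOPIC_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,64}")


def vet(d):
    """Reject any D that does not match the white list."""
    if not TOPIC_PATTERN.fullmatch(d):
        raise ValueError("data D rejected by white list")
    return d


def find_subscribers(conn, d):
    """Build the query from a fixed template; D is only ever a bound parameter."""
    rows = conn.execute(
        "SELECT client FROM subscriptions WHERE topic = ? ORDER BY client",
        (vet(d),),
    ).fetchall()
    return [row[0] for row in rows]


def demo_database():
    """In-memory database with a few sample rows (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE subscriptions (client TEXT, topic TEXT)")
    conn.executemany(
        "INSERT INTO subscriptions VALUES (?, ?)",
        [("CombOPS", "weather"), ("MAF", "weather"), ("CAF", "targets")],
    )
    return conn
```

The two measures are independent: the white list rejects suspicious input early, and the parameter binding keeps D from being interpreted as SQL even if the white list were too permissive.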

20 More Organic Decision Making
Example scenario (DPASA Operators):
–Q1sm invites Combat Ops, but does not see all heartbeats
–Q2sm sees heartbeats from 4 out of 5 Combat Ops components
–Q3sm shows some missing heartbeats from Combat Ops
–Q4sm same as Q3sm
–GUI up, but cannot subscribe
–No significant alerts in Emerald
–Combat Ops got bad references for Q1, Q3 and Q4? Most likely not all at once
–Try to push right references; try refreshing these first
–If that fails, try refreshing with q3 blocked?
Organic Decision Making: within the system, by the system
Issues to be addressed by the architecture
–Detection: Arch differencing; deviation from spec
–Interpretation: models, JHU A-DAGs
–Deductive analysis, hypothesis testing: HDSM? Cortex
–Response selection: RMPL? Cortex
Open questions
–At which granularity does the cost outweigh the benefits?
–Most DPASA implemented components have some of these in “code”: should they be made explicit?
–Should we add these as architectural elements at key components SM, PS, PSQ and LC?

21 More Maneuvering Room for Defense
Beyond restart process, reboot, and graceful degradation (block or isolate, reduce quorum size, etc.)
–More spares, distributed widely (Scalable redundancy)
–Restart a variant (Genesis, DAWSON)
–Reboot a new system (Asbestos?)
–Change transport (from QuickSilver to SlingShot, accept the weaker guarantees)
SRS technologies provide the infrastructure or mechanisms, but the management?
–Policies, decision making: when to restart a variant, when to reboot with what restrictions, which transport?
–SRS cognitive capabilities (reasoning about the system) will likely fall short in reasoning about SRS technologies
–Carrying over state and keys?

22 Improving the System on the Fly
Even if improvement-causing changes are identified along with the right time to apply them, the system must be “architected” to take the changes
–Authorized vs unauthorized changes
–Risk of automation: a new attack avenue
–Different kinds of “change”
Code changes
–Restart: state and key issues
Policy or configuration changes
–IP Tables, ADF, rate limiting, size checking
»Hooks exist, can be done manually
Protocol/transport changes
This is an architecture and implementation issue: the solution will likely be dependent on the technologies being used

23 A Futuristic DPASA++ System
[Diagram: the DPASA network. Quads q1 to q4, each with sm, ps, cor, psq, dc, NIDS and ap components; client LANs (ENV LAN, PLANNING LAN, Wing Ops LAN, AMC CONUS LAN) with ENIDS, WxHaz, ChemHaz, EDC, JEES, SCRBT, TAP, SWDIST, AODBSVR, TAPDB, PNIDS, AODB, Target, CAF, ANIDS, MAF, WNIDS and CombOPS; an LC is also marked. Other labels: Emerald, Auto-action, Arch Difference, HD Search.]
Color code: removal of existing component/feature; enhancement of existing component/feature; addition of new component/feature
–Taste testers: at key service providers such as PSQ (using existing redundancy) and maybe even at the perimeter router
–Enhanced SMs: eliminate advisors, more decision support interfaces
–Diverse variants of JVM and libraries
–OS support of isolation: keys, check-pointed data, etc.
–LCs enhanced with Arch Diff and Cognitive Executive
–Use Genesis, DAWSON, Asbestos, RMPL/AWDRAT technologies

24 Bottom Up Approach: Self-Regeneration Feedback Loop
[Diagram: feedback loop between Controller and Application; labels: service specification, service deviation, resource allocation]
“Service” may include the app’s functional correctness and/or quality of service delivery

25 Feedback Loop Including Resources
[Diagram: feedback loop among Controller (knowledge, analysis, strategy), Application and Resource; labels: service specification, service measurement (two arrows), service deviation, resource configuration, resource allocation]
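A toy Python version of this loop is sketched below. It is a sketch under stated assumptions, not the study's design: the `Application` and `Controller` classes, the replica-based resource model, the `per_replica_rate` figure and the one-replica-per-step strategy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Application:
    replicas: int                   # resources currently allocated to the app
    per_replica_rate: float = 25.0  # hypothetical service per replica (requests/s)

    def measure_service(self):
        """Service measurement: here simply the aggregate request rate."""
        return self.replicas * self.per_replica_rate


@dataclass
class Controller:
    specification: float  # service specification (required requests/s)
    spare_pool: int       # resources the controller may still allocate

    def step(self, app):
        """One pass of the loop: measure, compute the deviation, reallocate."""
        deviation = self.specification - app.measure_service()
        if deviation > 0 and self.spare_pool > 0:
            # Strategy (assumed): add one replica per step while service is short.
            app.replicas += 1
            self.spare_pool -= 1
        return deviation
```

If an attack removes two of four replicas, three successive calls to `step` report deviations of 50, 25 and 0 while the controller draws two replicas from its spare pool, which is the “reversing the trend” behavior the feedback loop is meant to capture.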

26 Using SRS Technologies in Feedback Loop
Service specification:
–RMPL, AWDRAT, Daikon
Service measurement:
–Archie, RMPL, Architectural Differencing, PMOP
Resource configuration:
–Genesis, DAWSON, Repair Tool, Cortex, HDSM
Resource allocation:
–RMPL, AWDRAT
Controller:
–Knowledge: Cortex
–Analysis: Trust Modeling, HDSM
–Strategy: RMPL, AWDRAT

27 Using SRS Technologies for Distributivity
Self-Regenerative System will likely distribute
–Application and/or
–Resources and/or
–Controller
For coordinating distributed redundant application services and resources
–Steward, Q/U, R/W, QuickSilver (virtual synchrony)
For coordinating distributed redundant controllers
–SlingShot (probabilistic time-critical)

28 Design Choices for Feedback Loops
Hierarchy
–Loops may be placed within application components, resources, and/or controllers of larger loops
–Loops may share resources and/or controllers
–Controllers often share data: synthesized from lower layers; inherited from higher layers
–Trade speed for smarts: small loops are fast and dumb; large loops slow and aware
Coordination
–Replicated controllers allow easier analysis of defensive properties
–Autonomous, decentralized controllers reduce the cost of coordination

29 Example: Multiple Components, Nested and Distributed Controllers, Shared Resources
[Diagram: three Controllers, two Components and two Resources]

30 Design Rules-Of-Thumb
Use purely local reaction only when accurate self-accusation is possible
–“Organic” decision-making
–Examples: if uncaught exception, restart thread; if seg fault, start new variant
Controller scope should follow some boundary defined by access controls
–Example: a LAN bounded by firewalls
For every resource, some controller scope should monitor all its uses
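The first rule's example, “if uncaught exception, restart thread”, might look like the following Python sketch. The `run_supervised` helper, its `max_restarts` parameter and the escalation by raising `RuntimeError` are assumptions made for illustration.

```python
import threading


def run_supervised(task, max_restarts=3):
    """Run task in a thread and restart it locally if it dies with an exception.

    Returns (result, attempts). When the restart budget is exhausted, the
    failure is escalated by raising RuntimeError, standing in for a hand-off
    to a controller with a wider scope.
    """
    attempts = 0
    while True:
        outcome = {}

        def body():
            try:
                outcome["result"] = task()
            except Exception as exc:
                # The thread observes its own failure: accurate self-accusation.
                outcome["error"] = exc

        worker = threading.Thread(target=body)
        worker.start()
        worker.join()
        attempts += 1
        if "error" not in outcome:
            return outcome["result"], attempts
        if attempts > max_restarts:
            raise RuntimeError("local recovery exhausted") from outcome["error"]
```

The reaction stays purely local because the failure signal, an uncaught exception in the worker itself, leaves no doubt about which component is at fault; anything more ambiguous is passed upward instead of being handled here.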

31 Natural Architectural Fragments
Use AWDRAT, RMPL, or Cortex as Controller framework for
–the entire system or a significant subsystem, and/or
–one object or process
Use Genesis or DAWSON to create alternate method implementations used in AWDRAT or RMPL
Use Asbestos to compartmentalize data for multiple clients in Q/U protocol or multiple groups in QuickSilver protocol
Construct a Unified Communication Service from multicast protocols
–Runtime selection of alternate communication protocol with different properties
Apply Learning and Repair technology to other SRS components

32 Conclusion
–Various SRS technologies would have allowed improvement to our DPASA system defenses.
–Taken collectively, SRS technologies address most parts of the problem of self-regenerative control.
–Underlying SRS ideas seem sound but many implementations are immature.
–SRS technologies do not show how to distribute and scale self-regenerative control loops.

Backup

34 Placeholder for Strawman
Componentization of defense
–Protection, detection and adaptation
–Organic decision making
Unified Communication Service
Architecture:
–Organizing defense-enabled components over the UCS substrate: layered vs monolithic; loose confederation vs logical centralization (DPASA is layered and logically centralized)
–Deliberative inter-component adaptations
FixIt
