Slide 1 ISTORE Software Runtime Architecture
Aaron Brown, David Oppenheimer, Kimberly Keeton, Randi Thomas, Jim Beck, John Kubiatowicz, and David Patterson
http://iram.cs.berkeley.edu/istore
1999 Winter IRAM Retreat

Slide 2 ISTORE Runtime Software Architecture
Runtime system goals for the ISTORE meta-appliance
(1) Provide mechanisms that allow network service applications to exploit introspection (monitor + adapt)
(2) Allow appliance designer to tailor runtime system policies and interfaces
How the goals are achieved
(1) Introspection: layered local and global runtime system libraries that manipulate and react to monitoring data
(2) Specialization: runtime system is extensible using domain-specific languages (DSLs)

Slide 3 Roadmap
Layered software structure
Example of introspection
Runtime system extensibility using DSLs
Conclusion

Slide 4 Layered software structure
[Diagram: layered software stack spanning the device nodes and front-end node(s), connected by a switch to the LAN/WAN. Layers, bottom to top: HW Device / HW Device (NIC), Device Interface & RTOS, Local Runtime, Distributed Global Runtime, Parallel App. Worker Code, Application Front-End Code.]

Slide 5 Device Interface Layer
[Diagram: same layered stack, highlighting the Device Interface & RTOS layer running above the HW Device / HW Device (NIC) on the device nodes and front-end node(s).]

Slide 6 Device interface layer
Microkernel OS modules
Traditional OS services
–Networking, memory management, process scheduling, threads, …
Device-specific monitoring
–Raw access patterns
–Utilization statistics
–Environmental parameters
–Indications of impending failure
Self-characterization of performance, functional capabilities (see the sketch after this slide)
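
As a rough illustration of the kind of per-device monitoring interface such a microkernel module might export to the layers above it, here is a hypothetical C++ sketch. All type, field, and function names are illustrative assumptions, not ISTORE definitions.

    // Hypothetical per-device monitoring interface exported by a microkernel
    // monitoring module to the local runtime layer.
    #include <cstdint>
    #include <iostream>

    struct DeviceHealth {            // environmental / failure-related parameters
        double   utilization;        // fraction of time the device was busy
        uint64_t ecc_retries;        // ECC retries since the last sample
        double   temperature_c;
    };

    struct DeviceCapabilities {      // self-characterization of the device
        double   peak_bandwidth_mb_s;
        uint64_t capacity_blocks;
    };

    class DeviceMonitor {
    public:
        virtual ~DeviceMonitor() = default;
        virtual DeviceCapabilities capabilities() const = 0;
        virtual DeviceHealth sample_health() = 0;    // called periodically
    };

    // Trivial fake device, included only to make the sketch self-contained.
    class FakeDisk : public DeviceMonitor {
    public:
        DeviceCapabilities capabilities() const override { return {40.0, 1u << 24}; }
        DeviceHealth sample_health() override { return {0.35, 2, 41.0}; }
    };

    int main() {
        FakeDisk d;
        std::cout << "utilization " << d.sample_health().utilization << "\n";
    }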

Slide 7 Local runtime layer
[Diagram: same layered stack, highlighting the Local Runtime layer running above the Device Interface & RTOS on each node.]

Slide 8 Local runtime layer
Non-distributed mechanisms needed by network service applications
Feeds information to global layer or performs local operations on behalf of global layer
Example mechanisms
–Application-specific filtering/aggregation of device monitoring data (see the sketch after this slide)
»Example: OLTP server vs. DSS server
–Data layout and naming
»Example: record-based interface for DB, file-based for web server
–Device scheduling
»Example: maximize TPS vs. maximize disk bandwidth utilization
–Caching
»Coherence essential vs. coherence unnecessary
»More efficient caching implementation possible in the second case
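
A hypothetical C++ sketch of application-specific filtering/aggregation of raw monitoring data: an OLTP-oriented filter summarizes random request counts, while a DSS-oriented filter summarizes sequential scan volume. The types and functions are illustrative only, not ISTORE interfaces.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    struct AccessSample { uint64_t block; uint32_t length; bool is_write; };

    // An OLTP appliance cares about small random requests; a DSS appliance
    // cares about sequential scan bandwidth.
    struct OltpSummary { uint64_t random_ios = 0; uint64_t writes = 0; };
    struct DssSummary  { uint64_t sequential_blocks = 0; };

    OltpSummary aggregate_oltp(const std::vector<AccessSample>& window) {
        OltpSummary s;
        uint64_t prev_end = UINT64_MAX;
        for (const auto& a : window) {
            if (a.block != prev_end) ++s.random_ios;   // non-sequential request
            if (a.is_write) ++s.writes;
            prev_end = a.block + a.length;
        }
        return s;
    }

    DssSummary aggregate_dss(const std::vector<AccessSample>& window) {
        DssSummary s;
        uint64_t prev_end = UINT64_MAX;
        for (const auto& a : window) {
            if (a.block == prev_end) s.sequential_blocks += a.length;
            prev_end = a.block + a.length;
        }
        return s;
    }

    int main() {
        std::vector<AccessSample> window = { {100, 8, false}, {108, 8, false}, {5000, 2, true} };
        std::cout << aggregate_oltp(window).random_ios << " random I/Os, "
                  << aggregate_dss(window).sequential_blocks << " sequential blocks\n";
    }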

Slide 9 Global runtime layer
[Diagram: same layered stack, highlighting the Distributed Global Runtime layer spanning the device nodes and front-end node(s) above each node's Local Runtime.]

Slide 10 Global runtime layer
Aggregate, process, react to monitoring data
Relies on local per-device runtime mechanisms to provide monitoring data, implement control actions
Provides application interface that hides distributed implementation of runtime services
Example services
–High-level services
»Load balancing: replicate and/or migrate heavily used data objects when a disk becomes over-utilized (see the sketch after this slide)
»Availability: replicate data from failed or failing component to restore required redundancy
»Plug-and-play: integrate new devices into the system
–Low-level services used to implement high-level global services
»Distributed directory tracks data and metadata objects
»Migration, replication, caching
»Inter-brick communication
»Distributed transactions
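
A hypothetical C++ sketch of how the load-balancing service might react to an over-utilized disk: replicate its hottest object onto a lightly loaded device and record the new replica in the distributed directory. The directory, monitoring, and replication functions here are toy stand-ins, not ISTORE code.

    #include <iostream>
    #include <string>
    #include <vector>

    struct Object { std::string name; int device; double heat; };

    // Toy stand-ins for the real low-level services (directory, monitoring, replication).
    std::vector<Object> directory = { {"cust_table_part3", 0, 0.7},
                                      {"index_b", 0, 0.2} };
    double disk_utilization(int device) { return device == 0 ? 0.95 : 0.30; }
    int  pick_lightly_loaded_device()   { return 1; }
    void copy_object(const Object& o, int to) {
        std::cout << "copying " << o.name << " to device " << to << "\n";
    }

    // High-level service: when a disk becomes over-utilized, replicate its
    // hottest object onto a lightly loaded device and update the directory.
    void balance(int device, double threshold = 0.9) {
        if (disk_utilization(device) < threshold) return;
        const Object* hottest = nullptr;
        for (const Object& o : directory)
            if (o.device == device && (!hottest || o.heat > hottest->heat))
                hottest = &o;
        if (!hottest) return;
        std::string name = hottest->name;          // copy before the directory grows
        int target = pick_lightly_loaded_device();
        copy_object(*hottest, target);
        directory.push_back({name, target, 0.0});  // record the new replica globally
    }

    int main() { balance(0); }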

Slide 11 Distributed application worker code
[Diagram: same layered stack, highlighting the Parallel App. Worker Code layer running on top of the Distributed Global Runtime.]

Slide 12 Distributed application worker code
Runs on top of global runtime system
Written by appliance designer
Application-specific
–Database
»scan, sort, join, aggregate, update record, delete record, ...
–Transformational web proxy
»fetch web page (from disk or remote site), apply transformation filter, update user preferences database, ...
System administration tools implemented at this level
–Customized runtime system defines administrative interface tailored to application

Slide 13 Application front-end code
[Diagram: same layered stack, highlighting the Application Front-End Code layer running on the front-end node(s).]

Slide 14 Application front-end code
Runs on front-end interface bricks
Accepts requests from LAN/WAN connection
–Incoming requests made using standard high-level protocols
»HTTP, NFS, SQL, ODBC, …
Invokes and coordinates appropriate worker code components that execute on internal bricks
–Takes into account locality and load balancing
–Database: front-end performs SQL query optimization, invokes distributed relational operators on data storage devices
–Transformational proxy: front-end invokes distiller thread on appropriate device brick (see the sketch after this slide)
»if data is cached, invoke on disk node
»otherwise, fetch data from web and invoke on compute node or disk node
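
A hypothetical C++ sketch of the front-end dispatch decision for the transformational proxy: run the distiller on the disk node that already caches the page, otherwise fetch from the web and use a lightly loaded brick. All names and the sample URL are illustrative stand-ins, not ISTORE interfaces.

    #include <iostream>
    #include <optional>
    #include <string>

    struct Brick { std::string name; };

    // Toy stand-ins for the global runtime's directory and scheduler.
    std::optional<Brick> disk_brick_caching(const std::string& url) {
        if (url == "http://example.com/cached.html") return Brick{"disk-brick-7"};
        return std::nullopt;                      // not cached anywhere
    }
    Brick least_loaded_compute_brick() { return Brick{"compute-brick-2"}; }
    void invoke_distiller(const Brick& b, const std::string& url) {
        std::cout << "distilling " << url << " on " << b.name << "\n";
    }

    // Front-end logic: prefer the disk node that caches the page (locality);
    // otherwise fetch from the web and distill on a lightly loaded brick.
    void handle_request(const std::string& url) {
        if (auto brick = disk_brick_caching(url)) {
            invoke_distiller(*brick, url);
        } else {
            Brick b = least_loaded_compute_brick();
            std::cout << "fetching " << url << " from the web\n";
            invoke_distiller(b, url);
        }
    }

    int main() {
        handle_request("http://example.com/cached.html");
        handle_request("http://example.com/other.html");
    }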

Slide 15 Roadmap
Layered software structure
Example of introspection
Runtime system extensibility using DSLs
Conclusion

Slide 16 From introspection to adaptation
Example: slowly-failing data disk in large DB system
(1) Detect problem
(2) Repair problem while continuing to handle incoming requests
(3) Return to normal system operation
[Diagram: intelligent HW components + continuous monitoring + extensible, application-tailored runtime system => adaptive, self-maintaining appliance]

Slide 17 Failing disk: detection
Microkernel monitoring module, continuously monitoring the disk’s health, detects an exceptional condition, e.g.
–ECC failures
–Media errors
–Increased rates of ECC retries
Notifies global fault handling mechanism (see the sketch after this slide)
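
A hypothetical C++ sketch of the detection step: a monitoring module watches ECC retry rates and media errors and notifies a global fault handler when a threshold is crossed. Names and the threshold value are illustrative assumptions, not ISTORE code.

    #include <cstdint>
    #include <iostream>

    struct HealthSample { uint64_t ecc_retries; uint64_t media_errors; double seconds; };

    void notify_global_fault_handler(int device_id, const char* reason) {
        std::cout << "device " << device_id << " flagged: " << reason << "\n";
    }

    class DiskHealthWatcher {
    public:
        explicit DiskHealthWatcher(int device_id) : device_(device_id) {}
        void on_sample(const HealthSample& s) {
            if (s.media_errors > 0) {
                notify_global_fault_handler(device_, "media errors");
                return;
            }
            double rate = s.ecc_retries / s.seconds;     // retries per second
            if (rate > kRetryRateThreshold)
                notify_global_fault_handler(device_, "elevated ECC retry rate");
        }
    private:
        static constexpr double kRetryRateThreshold = 5.0;  // illustrative value
        int device_;
    };

    int main() {
        DiskHealthWatcher w(3);
        w.on_sample({2, 0, 10.0});    // 0.2 retries/s: healthy
        w.on_sample({120, 0, 10.0});  // 12 retries/s: impending failure
    }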

Slide 18 Failing disk: reaction
Global fault handling mechanism…
–Prevents system from sending more work to the failed device
»Modifies global directory to remove entries corresponding to failed component’s data
–Application-specific response to impending failure
»Transactional system: discard work currently in progress on failing device, reissue to another data replica
»Non-transactional system w/o coherent replicas: checkpoint computation, restore on another data replica
»Transformational web proxy: do nothing
–Instructs disk runtime system to shut disk down
»Disk device is considered failed

Slide 19 Failing disk: return to normal operation
Global fault handling mechanism...
–Rebuilds data redundancy
»By allocating space for a new replica on a functioning disk and copying data to it from existing replicas
»Using an application-specific data replication mechanism
Where to allocate new replicas, how to copy data, how to lay out data for new replicas, how to update global directory
Example in upcoming slide
Life returns to normal
–Degree of fault-tolerance has been restored
Failed component can be replaced during regularly-scheduled maintenance

Slide 20 Roadmap
Layered software structure
Example of introspection
Runtime system extensibility using DSLs
Conclusion

Slide 21 Runtime system extensibility
Two ways of looking at system
–Partitioned on functional/mechanism boundaries
»Collection of libraries: failure detection, transactions, ...
»Mechanisms are isolated
[Diagram: application above isolated libraries (libfail, librepl, libtrxn, libcache) above the OS]

Slide 22 Runtime system extensibility
Two ways of looking at system
–Partitioned on functional/mechanism boundaries
»Collection of libraries: failure detection, transactions, ...
»Mechanisms are isolated
–Partitioned on global system properties
»This is how the programmer thinks about the system (high-level)
»e.g. application-specific data availability policy
Failure detection (which devices to monitor, …)
Replication (used to restore redundancy)
Transactions (how to restart work in progress)
Caching (how to handle dirty cached objects during failure)
[Diagram: application above isolated libraries (libfail, librepl, libtrxn, libcache) above the OS]

Slide 23 Runtime system extensibility
Two ways of looking at system
–Partitioned on functional/mechanism boundaries
»Collection of libraries: failure detection, transactions, ...
»Mechanisms are isolated
–Partitioned on global system properties
»This is how the programmer thinks about the system (high-level)
»e.g. application-specific data availability policy
Failure detection (which devices to monitor, …)
Replication (used to restore redundancy)
Transactions (how to restart work in progress)
Caching (how to handle dirty cached objects during failure)
[Diagram: a policy compiler combines the base libraries (libfail, librepl, libtrxn, libcache) into a customized runtime system library between the application and the OS]

Slide 24 Extensibility using DSLs
DSLs are languages specialized for a particular task
Each ISTORE DSL
–Encapsulates high-level semantics of one system behavior
–Allows declarative specification of
»Behavior of one aspect of the system (a “policy”)
»Interfaces to coordinated mechanisms that implement the policy
–Is compiled into an implementation that might coordinate several local and/or global base runtime system mechanisms
»May be implemented as background and/or foreground tasks
Analysis tools can potentially infer unspecified emergent system behaviors from the specifications
–e.g. what impact will a new redundancy policy have on transaction commit time
Extensions compiled together with local and global base mechanisms form the distributed runtime system

Slide 25 Extensibility using DSLs: Example
Avail::FailureDetected(Device d) {
  Object o;  ObjList objs;
  Transaction t;  TxnList txns;
  Replica x, c, r;

  Directory::MarkDeviceDisabled(d);
  Admin::AlertFailure(d);
  objs = Directory::GetObjects(d);                        // objs stored on failed device
  foreach o (objs) {
    x = Directory::GetReplica(o, d);                      // find o’s replica on d
    Directory::DeleteReplica(x);                          // delete from global directory
    txns = Txn::GetActiveTxns(x);
    foreach t (txns) {
      Txn::AbortTxn(t);                                   // abort pending txns for o on d
    }
    c = Directory::GetReplica(o);                         // find still-accessible copy
    r = LoadBalancer::AllocateReplica(o);                 // get space for new replica
    LocalRuntime::CopyObject(c, c->device, r, r->device); // copy it
    Directory::AddReplica(r, r->device);                  // update directory
    foreach t (txns) {
      Txn::IssueTxn(t, r);                                // reissue txns on new replica
    }
  }
}

Slide 26 Extensibility using DSLs (cont.)
Similar specification written for each extension to base library
Other examples of extensible system behaviors
–Transaction response time requirements
–Prioritizing operations based on type of data processed
–Resource allocation
–Backup policy
–Exported administrative interface

Slide 27 Why use DSLs?
Possible choices
–Each appliance designer writes runtime system from scratch
»Similar to exokernel operating systems
–All designers use single parameterized runtime system library
»Similar to tunable kernel parameters in modern OSs
–Designer writes high-level specification of system behavior
»DSL compiler automatically translates specification into runtime system extensions that coordinate base mechanisms
»Advantages include
Programmability
Performance
Reliability, verifiability, safety
Artificial diversity

Slide 28 DSL advantages (cont.)
Programmability
–High-level specification close to designer’s abstraction level
»Easier to write, reason about, maintain, modify runtime system code
»Simple enough to allow site-specific customization at installation time
Performance
–Aggressive DSL compiler can take advantage of high-level semantics of specification language
–Base library mechanisms can be highly optimized; optimization complexity hidden from appliance designer
–Web example: infer that TCP checksums should be stored with web pages (see the sketch after this slide)
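
To illustrate the "store the checksum with the page" idea, here is a hypothetical C++ sketch that computes an Internet-style (RFC 1071) checksum once when a page is stored, so it need not be recomputed on every send. This is only an illustration of the optimization; the slide's point is that a DSL compiler could infer such opportunities automatically rather than having them hand-coded.

    #include <cstdint>
    #include <iostream>
    #include <string>

    // RFC 1071-style ones'-complement sum over the page body.
    uint16_t internet_checksum(const std::string& data) {
        uint32_t sum = 0;
        for (std::size_t i = 0; i + 1 < data.size(); i += 2)
            sum += (uint8_t(data[i]) << 8) | uint8_t(data[i + 1]);
        if (data.size() % 2) sum += uint8_t(data.back()) << 8;  // odd trailing byte
        while (sum >> 16) sum = (sum & 0xFFFF) + (sum >> 16);   // fold carries
        return uint16_t(~sum);
    }

    struct StoredPage {
        std::string body;
        uint16_t cached_checksum;   // computed once at store time
    };

    StoredPage store_page(std::string body) {
        uint16_t sum = internet_checksum(body);
        return {std::move(body), sum};
    }

    int main() {
        StoredPage p = store_page("<html>hello</html>");
        std::cout << "checksum stored with page: " << p.cached_checksum << "\n";
    }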

Slide 29 DSL advantages (cont.)
Reliability
–Automatically generate code that’s easy to forget or get wrong
»Example: synchronization operations to serialize accesses to a distributed data structure (see the sketch after this slide)
Verifiability
–Of input code (DSL specification)
»More abstract form of semantic checking
»e.g. DSL supports types natural to the behavior being specified => type-checking verifies some semantic constraints, e.g. “ensure no unencrypted objects are written to disk”
–Of output code (coordinated use of base mechanisms)
»DSL compiler writer is satisfied the DSL compiler is correct => appliance designer inherits the verification effort
Safety (prevent runtime errors)
–Whole classes of general programming errors not possible
»DSLs hide details: runtime memory management, IPC, ...
»Compiler automatically adds code: synchronization, ...
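
A hypothetical C++ sketch of the kind of synchronization a DSL compiler could emit automatically around directory updates, so the appliance designer never writes (or forgets) the locking code. The class and method names are illustrative only.

    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <mutex>
    #include <string>

    class ReplicaDirectory {
    public:
        // Generated code serializes all mutations of the shared structure.
        void add_replica(const std::string& object, int device) {
            std::lock_guard<std::mutex> guard(lock_);   // compiler-inserted
            replicas_.insert({object, device});
        }
        void remove_device(int device) {
            std::lock_guard<std::mutex> guard(lock_);   // compiler-inserted
            for (auto it = replicas_.begin(); it != replicas_.end(); ) {
                if (it->second == device) it = replicas_.erase(it);
                else ++it;
            }
        }
        std::size_t size() const {
            std::lock_guard<std::mutex> guard(lock_);
            return replicas_.size();
        }
    private:
        mutable std::mutex lock_;
        std::multimap<std::string, int> replicas_;
    };

    int main() {
        ReplicaDirectory dir;
        dir.add_replica("cust_table_part3", 0);
        dir.add_replica("cust_table_part3", 1);
        dir.remove_device(0);
        std::cout << dir.size() << " replica(s) remain\n";
    }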

Slide 30 DSL advantages (cont.)
Artificial diversity (see the sketch after this slide)
–Potentially allow system to continue operation in face of internal bugs or malicious attack
»Multiple implementations of a component run simultaneously on different data replicas
»Continuously check each other with respect to high-level behavior
»Non-traditional fault-tolerance, but related to process pairs
–Potentially usable to enhance performance
»Select best-performing implementation(s) for future use; periodically reevaluate choice
–Examples of possible implementation differences
»Low-level: runtime memory layout, code ordering and layout
»High-level: system resource usage (recompute vs. use stored data, general space/time/bandwidth tradeoffs)
[Diagram: the DSL compiler turns one specification into Implementation 1, Implementation 2, Implementation 3, ...]
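
A hypothetical C++ sketch of the cross-checking idea behind artificial diversity: two independently generated implementations of the same operation run on the same request and their high-level results are compared before being trusted. The operation (a column sum) and all names are purely illustrative.

    #include <iostream>
    #include <numeric>
    #include <stdexcept>
    #include <vector>

    // Two "diverse" implementations of the same aggregate (sum of a column).
    long sum_impl_a(const std::vector<int>& col) {
        return std::accumulate(col.begin(), col.end(), 0L);
    }
    long sum_impl_b(const std::vector<int>& col) {
        long total = 0;
        for (int i = int(col.size()) - 1; i >= 0; --i) total += col[i];  // reverse order
        return total;
    }

    // Run both implementations and accept the result only if they agree.
    long checked_sum(const std::vector<int>& col) {
        long a = sum_impl_a(col);
        long b = sum_impl_b(col);
        if (a != b) throw std::runtime_error("implementations disagree");
        return a;
    }

    int main() {
        std::vector<int> column = {3, 1, 4, 1, 5, 9};
        std::cout << "checked sum = " << checked_sum(column) << "\n";
    }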

Slide 31 ISTORE software summary
ISTORE software architecture provides an extensible runtime environment for distributed network service application code
–Layered local and global mechanism libraries provide introspection and self-maintenance
–Mechanisms can be customized using DSL-based specifications of application policy
»DSL code coordinates base mechanisms to implement application semantics and interfaces
»DSL-based extension offers significant advantages in programmability, performance, reliability, safety, diversity

Slide 32 ISTORE summary
Network services are increasing in importance
–Self-maintaining scalable storage appliances match the needs of these services
ISTORE provides a flexible architecture for implementing storage-based network service apps
–Modular, intelligent, fault-tolerant hardware platform is easy to configure, scale, and administer
–Runtime system allows applications to leverage intelligent hardware, achieve introspection, and provide self-maintenance through
»Layered runtime software structure
»DSL-based extensibility that allows easy application-specific customization

Slide 33 Agenda
Overview of ISTORE: Motivation and Architecture
Hardware Details and Prototype Plans
Software Architecture
Discussion and Feedback

Slide 34 Backup slides

Slide 35 What ISTORE is not
An extensible operating system
–Use commodity OS, only add hardware monitoring module
»MM could just be a device driver => no need for microkernel OS
»ISTORE could be built on top of an extensible operating system for even greater flexibility
An attempt to make commodity OSs extensible
–Extensible runtime system allows designer to customize higher-level operations than OS extensions do
–Closest to an extensible distributed operating system built on top of a commodity single-node operating system
A multiple-protection-domain system
–Assumes non-malicious programmer
–If user-downloaded code is permitted, the sandbox must be implemented as part of the (trusted) application
–DSLs specify resource allocation/scheduling policies; appliance designer responsible for ensuring fairness
A framework for building generic servers

Slide 36 ISTORE boot process
(1) Initially, undifferentiated ISTORE system
(2) On boot, each device block contacts system boot server
(3) Device blocks download customized runtime system and application worker code
–Front-end blocks also download application front-end code
Runtime system libraries structured as shared libraries => hot upgrade (see the sketch after this slide)
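
A hypothetical C++ sketch of the "shared libraries => hot upgrade" idea on a POSIX node: load the downloaded runtime library with dlopen and later swap in a newer version without rebooting. The library paths and entry-point name are illustrative assumptions, not ISTORE file names.

    #include <dlfcn.h>
    #include <cstdio>

    using InitFn = int (*)();   // entry point each runtime library is assumed to export

    void* load_runtime(const char* path) {
        void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
        if (!handle) { std::fprintf(stderr, "dlopen: %s\n", dlerror()); return nullptr; }
        if (auto init = reinterpret_cast<InitFn>(dlsym(handle, "istore_runtime_init")))
            init();             // hand control to the freshly loaded runtime
        return handle;
    }

    int main() {
        void* v1 = load_runtime("/opt/istore/libruntime-v1.so");   // initial boot download
        // ... later, the boot server pushes an upgraded library ...
        void* v2 = load_runtime("/opt/istore/libruntime-v2.so");   // hot upgrade
        if (v1) dlclose(v1);    // retire the old version once v2 has taken over
        if (v2) dlclose(v2);
        return 0;
    }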

Slide 37 Example Appliances
E-commerce
Web search engine
Transformational web/PDA proxy
Election server
Mail server
News server
NFS server
Database server: OLTP, DSS, mixed OLTP-DSS
Video server

