Slide 1: ISTORE: Introspective Storage for Data-Intensive Network Services
Aaron Brown, David Oppenheimer, Kimberly Keeton, Randi Thomas, Noah Treuhaft, John Kubiatowicz, Kathy Yelick, and David Patterson
Computer Science Division, University of California, Berkeley
http://iram.cs.berkeley.edu/istore/

Slide 2: Technical Problem to Tackle?
Build HW/SW to provide a scalable, available, maintainable ("SAM") server that is dedicated to a single data-intensive application
Current state-of-the-art emphasis is on cost-performance, with "SAM" largely ignored
– Cost of administration per year is typically 3X the cost of the disks
– RAID and Tandem-like availability are based on "fail fast"; nothing fails fast today
– Themis.cs outage: relied on humans watching the system's behavior to invoke the proper response; what if they take a vacation day or go to the dentist?

Slide 3: ISTORE-1 Hardware Prototype
Based on intelligent disk bricks (64 nodes)
– fast embedded CPU performs local monitoring tasks, runs parallel application code
– diagnostic hardware provides fail-fast behavior, self-testing, additional monitoring ⇒ "Introspection"
Intelligent Chassis: scalable redundant switching, power, environmental monitoring
[Figure: Intelligent Disk "Brick" — disk; CPU, memory, redundant NICs]

Slide 4: A Software Framework for Introspection
ISTORE hardware provides device monitoring
ISTORE software framework should simplify writing introspective applications
– Rule-based adaptation engine encapsulates the mechanisms for collecting and processing monitoring data
– Maintainability information is stored in a database
– Policy compiler and mechanism libraries help turn application adaptation goals into rules and reaction code
– Together these provide a high-level, abstract interface to the system's monitoring and adaptation mechanisms

Slide 5: What is our plan for success?
One-year view
– Run data-intensive kernels (sort, hash-join, ...) on a small cluster of PCs to establish a performance level
– 3 OSes (Linux, FreeBSD, NetBSD) for genetic diversity
– Install Berkeley DB to collect maintainability data
– Learn about adaptability theory from Michael Jordan
– Invent SAM benchmarks
– Construct ISTORE-1
Three-year view
– Based on lessons learned, construct ISTORE-2
– Policy-based monitoring and reaction to SAM events

Slide 6: Status and Conclusions
ISTORE's focus is on introspective systems
– a new perspective on systems research priorities
Proposed framework for building introspection
– intelligent, self-monitoring plug-and-play hardware
– software that provides a higher level of abstraction for the construction of introspective systems
  » flexible, powerful rule system for monitoring
  » policy specification automates generation of adaptation
Status
– ISTORE-1 hardware prototype is being constructed now
– software prototyping is just starting

Slide 7: Backup Slides

Slide 8: Related Work
Hardware:
– CMU and UCSB Active Disks
Software:
– Adaptive databases: MS AutoAdmin, Informix NoKnobs
– Adaptive OSes: MS Millennium, adaptive VINO
– Adaptive storage: HP AutoRAID, attribute-managed storage
– Active databases: UFL Gator, TriggerMan
ISTORE unifies many of these techniques in a single system

Slide 9: ISTORE-1 Hardware Design
Brick
– processor board
  » mobile Pentium II, 366 MHz, 128 MB SODRAM
  » PCI and ISA buses/controllers, SuperIO (serial ports)
  » Flash BIOS
  » 4 x 100 Mb Ethernet interfaces
  » Adaptec Ultra2-LVD SCSI interface
– disk: one 18.2 GB, 10,000 RPM low-profile SCSI disk
– diagnostic processor

Slide 10: ISTORE-1 Hardware Design (2)
Network
– primary data network
  » hierarchical, highly redundant switched Ethernet
  » uses 16 20-port 100 Mb switches at the leaves; each brick connects to 4 independent switches
  » root switching fabric is two ganged 25-port Gigabit switches (PacketEngines PowerRails)
– diagnostic network

Slide 11: Diagnostic Support
Each brick has a diagnostic processor
– Goal: a small, independent, trusted piece of hardware running hand-verifiable monitoring/control software
  » monitoring: CPU watchdog, environmental conditions
  » control: reboot/power-cycle the main CPU; inject simulated faults (power, bus transients, memory errors, network interface failure, ...)
Separate "diagnostic network" connects the diagnostic processors of each brick
– provides an independent network path to the diagnostic CPU
  » works when the brick CPU is powered off or has failed
  » separate failure modes from the Ethernet interfaces
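
The slides do not show the monitoring/control software itself; the sketch below is a minimal, hypothetical illustration (invented function names and thresholds, Python used only for exposition) of the kind of hand-verifiable watchdog pass such a diagnostic processor could run: sample sensors and a main-CPU heartbeat, and power-cycle the main CPU when a simple threshold rule is violated.

```python
# Hypothetical sketch of a diagnostic-processor watchdog pass (not ISTORE code).
# Sensor and actuator functions are stubs standing in for hardware access.
import time
import random

TEMP_LIMIT_C = 55.0          # assumed environmental threshold
HEARTBEAT_TIMEOUT_S = 30.0   # assumed main-CPU watchdog timeout

def read_temperature_c():
    """Stub for the brick's environmental (temperature) sensor."""
    return 40.0 + random.uniform(-5.0, 20.0)

def power_cycle_main_cpu():
    """Stub for driving the brick's power-control hardware."""
    print("diagnostic: power-cycling main CPU")

def check_once(last_heartbeat_time):
    """One hand-verifiable pass: sample sensors, apply simple threshold rules."""
    temp = read_temperature_c()
    silent_for = time.time() - last_heartbeat_time
    if temp > TEMP_LIMIT_C:
        print(f"diagnostic: over-temperature ({temp:.1f} C)")
        power_cycle_main_cpu()
    elif silent_for > HEARTBEAT_TIMEOUT_S:
        print(f"diagnostic: main CPU silent for {silent_for:.0f} s")
        power_cycle_main_cpu()

# Simulate a brick whose main CPU last checked in 45 seconds ago.
check_once(last_heartbeat_time=time.time() - 45.0)
```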

Slide 12: Diagnostic Support Implementation
Not-so-small embedded Motorola 68k processor
– provides the flexibility needed for a research prototype
  » can communicate with the main CPU via serial port, if desired
– can still run just a small, simple monitoring and control program if desired (no OS, networking, etc.)
CAN (Controller Area Network) diagnostic interconnect
– one brick per "shelf" of 8 acts as a gateway from CAN to the redundant switched Ethernet fabric
– CAN connects directly to automotive environmental monitoring sensors (temperature, fan RPM, ...)

Slide 13: ISTORE Research Agenda
ISTORE goal = create a hardware/software framework for building introspective servers
– Hardware
– Software: a toolkit that allows programmers to easily define the system's adaptive behavior
  » provides abstractions for manipulating and reacting to monitoring data

Slide 14: Rule-based Adaptation
ISTORE's adaptation framework is built on the model of an active database
– the "database" includes:
  » hardware monitoring data: device status, access patterns, performance stats
  » software monitoring data: app-specific quality-of-service metrics, high-level workload patterns, ...
– applications define views and triggers over the DB
  » views select and aggregate the data of interest to the application
  » triggers are rules that invoke application-specific reaction code when their predicates are satisfied
– an SQL-like declarative language is used to specify views and trigger rules
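
The SQL-like rule language itself is not shown in these slides; purely as a rough illustration of the views-and-triggers model, the sketch below (hypothetical API, Python for exposition only) defines a "view" that aggregates per-brick latency and a "trigger" whose predicate fires application-specific reaction code.

```python
# Illustrative sketch of the views-and-triggers model (hypothetical API, not ISTORE's).
from statistics import mean

class MonitoringDB:
    """Toy stand-in for the conceptual 'active database' of monitoring data."""
    def __init__(self):
        self.samples = []          # e.g. {"disk": "brick12", "latency_ms": 9.3}
        self.triggers = []

    def insert(self, sample):
        self.samples.append(sample)
        for predicate, reaction in self.triggers:
            if predicate(self):
                reaction(self)

    def view_avg_latency(self, disk):
        """A 'view': select and aggregate the data of interest to the application."""
        xs = [s["latency_ms"] for s in self.samples if s["disk"] == disk]
        return mean(xs) if xs else 0.0

    def create_trigger(self, predicate, reaction):
        """A 'trigger': reaction code invoked when the predicate is satisfied."""
        self.triggers.append((predicate, reaction))

db = MonitoringDB()
db.create_trigger(
    predicate=lambda d: d.view_avg_latency("brick12") > 20.0,
    reaction=lambda d: print("adaptation: migrating hot data off brick12"),
)
db.insert({"disk": "brick12", "latency_ms": 25.0})  # fires the trigger
```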

Slide 15: Benefits of Views and Triggers
Allow applications to focus on adaptation, not monitoring
– hide the mechanics of gathering and processing monitoring data
– can be dynamically redefined as the situation changes, without altering adaptation code
Can be implemented without a real database (see the sketch below)
– views and triggers are implemented as device-local and distributed filters and reaction rules
– the defined views and triggers control the frequency, granularity, and types of data gathered by HW monitoring
– no materialized database necessary
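
One way to read the "no materialized database" point: a view/trigger definition can be lowered into per-device filters that decide locally which samples are worth forwarding at all. A minimal sketch of that idea, with invented names (not ISTORE's implementation):

```python
# Sketch of a device-local filter derived from a view/trigger definition.
# Hypothetical names; illustrates pushing filtering down to the brick.
def make_local_filter(metric, threshold):
    """Only forward samples that could possibly satisfy the trigger predicate."""
    def keep(sample):
        return sample.get("metric") == metric and sample["value"] > threshold
    return keep

# The (hypothetical) adaptation engine installs the filter on each brick:
brick_filter = make_local_filter(metric="latency_ms", threshold=20.0)

samples = [
    {"metric": "latency_ms", "value": 8.0},    # dropped locally
    {"metric": "latency_ms", "value": 31.0},   # forwarded to the trigger
    {"metric": "temp_c", "value": 44.0},       # not of interest; dropped
]
forwarded = [s for s in samples if brick_filter(s)]
print(forwarded)
```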

Slide 16: Raising the Level of Abstraction: Policy Compiler and Mechanism Libraries
Rule-based adaptation doesn't go far enough
– the application designer must still write views, triggers, and adaptation code by hand
  » but the designer thinks in terms of system policies
Solution: the designer specifies policies to the system; the system implements them
– a policy compiler automatically generates views, triggers, and adaptation code
– it uses preexisting mechanism libraries to implement adaptation algorithms
– claim: this is feasible for the common adaptation mechanisms needed by data-intensive network service applications
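
The policy language is left as an open question (see Slide 17); purely as an illustration of the compilation step, the hypothetical sketch below (policy format, names, and library contents all invented) lowers a high-level policy statement into the kind of trigger rule described on the previous slides, drawing its reaction from a mechanism library.

```python
# Hypothetical illustration of a policy compiler (not ISTORE's design).
# A high-level policy is lowered into a (predicate, reaction) trigger rule.

MECHANISM_LIBRARY = {
    "replicate": lambda target: print(f"mechanism: adding replica of {target}"),
    "migrate":   lambda target: print(f"mechanism: migrating load off {target}"),
}

def compile_policy(policy):
    """Turn {'when': (metric, op, limit), 'do': mechanism, 'on': target} into a rule."""
    metric, op, limit = policy["when"]
    mechanism = MECHANISM_LIBRARY[policy["do"]]
    def predicate(stats):
        value = stats.get(metric, 0.0)
        return value > limit if op == ">" else value < limit
    def reaction(stats):
        mechanism(policy["on"])
    return predicate, reaction

# The designer states the goal; the compiler produces the rule:
predicate, reaction = compile_policy(
    {"when": ("read_latency_ms", ">", 50.0), "do": "migrate", "on": "brick07"}
)
stats = {"read_latency_ms": 72.0}
if predicate(stats):
    reaction(stats)
```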

Slide 17: Open Research Issues
Defining appropriate software abstractions
– how should views and triggers be declared?
– what is the system's "schema"?
  » how should heterogeneous hardware be integrated?
  » can it be extended by the user to include new types and statistics?
– what should the policy language look like?
– what level of policies can be expressed?
  » how much of the implementation can the system figure out automatically?
  » to what extent can the system reason about policies and their interactions?
– what functions should mechanism libraries provide?

Slide 18: More Open Research Issues
Implementing an introspective system
– what default policies should the system supply?
– what are the internal and external interfaces?
– debugging
  » visualization of states, triggers, ...
  » simulation/coverage analysis of policies and adaptation code
  » appropriate administrative interfaces
Measuring an introspective system
– what are the right benchmarks for maintainability, availability, scalability?
O(>=1000)-node scalability
– how to write applications that scale and run well despite a continual state of partial failure?

Slide 19: Motivation: Technology Trends
Disks, systems, and switches are getting smaller
Convergence on "intelligent" disks (IDISKs)
– MicroDrive + system-on-a-chip => tiny IDISK nodes
Inevitability of enormous-scale systems
– by 2006, an O(10,000) IDISK-node cluster with 90 TB of storage could fit in one rack
[Images: IBM MicroDrive (340 MB, 5 MB/s); World's Smallest Web Server (486/66, 16 MB RAM, 16 MB ROM)]

Slide 20: Disk Limit
Continued advance in capacity (60%/yr) and bandwidth (40%/yr)
Slow improvement in seek and rotation time (8%/yr)
Time to read the whole disk:

  Year | Sequentially | Randomly (1 sector/seek)
  1990 | 4 minutes    | 6 hours
  2000 | 12 minutes   | 1 week (!)

Does the 3.5" form factor make sense in 5-7 years?
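
The table rows follow from simple arithmetic on disk parameters. A back-of-the-envelope check of the year-2000 row; the disk parameters below are assumptions chosen to roughly match the slide's rounded figures, not numbers taken from the talk:

```python
# Back-of-the-envelope check of "time to read the whole disk" for a ~2000-era disk.
# All parameters are assumed, not from the slides.
capacity_bytes = 18e9            # ~18 GB disk
sequential_bytes_s = 25e6        # ~25 MB/s sustained transfer rate
sector_bytes = 512
random_io_s = 0.017              # ~17 ms seek + rotational latency per random sector

sequential_s = capacity_bytes / sequential_bytes_s
random_s = (capacity_bytes / sector_bytes) * random_io_s

print(f"sequential: {sequential_s / 60:.0f} minutes")   # ~12 minutes
print(f"random:     {random_s / 86400:.1f} days")       # ~7 days, i.e. about a week
```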

Slide 21: ISTORE-II Hardware Vision
System-on-a-chip enables a computer, memory, and redundant network interfaces without significantly increasing the size of the disk
Target for +5-7 years:
1999 IBM MicroDrive:
– 1.7" x 1.4" x 0.2" (43 mm x 36 mm x 5 mm)
– 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006 MicroDrive?
– 9 GB, 50 MB/s (1.6X/yr capacity, 1.4X/yr BW)
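
The 2006 figures follow from compounding the stated growth rates over the seven years from 1999; a quick check of that arithmetic:

```python
# Check of the 2006 MicroDrive projection using the growth rates stated on the slide.
capacity_1999_mb = 340
bw_1999_mb_s = 5
years = 7  # 1999 -> 2006

capacity_2006_mb = capacity_1999_mb * 1.6 ** years   # ~9,100 MB, i.e. ~9 GB
bw_2006_mb_s = bw_1999_mb_s * 1.4 ** years           # ~53 MB/s, i.e. ~50 MB/s

print(f"2006 capacity:  ~{capacity_2006_mb / 1000:.0f} GB")
print(f"2006 bandwidth: ~{bw_2006_mb_s:.0f} MB/s")
```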

Slide 22: 2006 ISTORE
ISTORE node
– Add a 20% pad to the MicroDrive size for packaging and connectors
– Then double the thickness to add IRAM
– 2.0" x 1.7" x 0.5" (51 mm x 43 mm x 13 mm)
Crossbar switches growing by Moore's Law
– 2x / 1.5 yrs ⇒ 4X transistors / 3 yrs
– Crossbars grow by N² ⇒ 2X switch / 3 yrs
– 16 x 16 in 1999 ⇒ 64 x 64 in 2005
ISTORE rack (19" x 33" x 84") (480 mm x 840 mm x 2130 mm)
– 1 tray (3" high) ⇒ 16 x 32 ⇒ 512 ISTORE nodes
– 20 trays + switches + UPS ⇒ 10,240 ISTORE nodes (!)
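
The rack count is straightforward packing arithmetic; the 16 x 32 per-tray layout and the totals are on the slide, while the node orientation in the tray is my assumption:

```python
# Packing arithmetic behind the rack numbers on the slide.
# Node orientation is assumed; the 16 x 32 layout and totals come from the slide.
node_l, node_w, node_t = 2.0, 1.7, 0.5       # node dimensions in inches
tray_depth, tray_width = 33.0, 19.0          # rack footprint in inches

per_row = int(tray_depth // node_l)          # 16 nodes along the 33" depth
rows = 32                                    # 32 rows of 0.5"-thick nodes span 16" of the 19" width
nodes_per_tray = per_row * rows              # 512
nodes_per_rack = nodes_per_tray * 20         # 20 trays -> 10,240

print(nodes_per_tray, nodes_per_rack)
```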

