Emergence of Extreme Networked Devices
David Culler
Computer Science Division, U.C. Berkeley
USC, Feb 28, 2001

2/28/2001Emerging Extremes2 The Expanding Computing Spectrum Servers Workstations Personal Computers Internet Services PDAs / HPCs/ smartphones
Convergence at the middle
Common platform
  – powerful microproc (choice of 3), dram (3), disk (2)
  – deep I/O hierarchy, OS layering
Common system abstraction
  – collection of threads sharing a large virtual address space
  – GUI orientation
  – blocking interfaces
Concurrency as threads
Services as local call / remote thread
  – RPC, RMI, DCOM, HTTP
Ample resources easily abstracted
  – open loop
  – transparent allocation and usage

[Figure: the computing spectrum (Servers, Workstations, Personal Computers, Internet Services, PDAs / HPCs / smartphones) extended at the extremes by Open Internet Services, Planetary Services, and Microscopic sensor networks]

Convergence at the Extremes
Concurrency intensive
  – data streams and real-time events, not command-response
Communications-centric
Limited resources (relative to load)
Huge variation in load
  – population usage & physical stimuli
  – robustness
Hands-off (no UI)
Dynamic configuration, discovery
  – self-organized and reactive control
Similar execution model
  – event driven
  – components
Complementary roles
  – tiny semi-autonomous devices empowered by infrastructure
  – infrastructure services connected to the real world

Outline
Emerging Extremes
Robust Framework for Open Scalable Internet Services
  – the garden path: threads to non-blocking I/O and RPC
  – structured event-driven alternatives
  – controllers within a graph of stages
Tiny OS for Wireless Embedded Sensor Networks

Ninja: Open Infrastructure Services
Systematic framework for building robust, composable services
Focus here on execution model
[Figure: Clients, Servers, and Open Infrastructure Services]

Variation in Load – slashdot effect
[Figure: USGS Web Server Traffic, October 16, 1999 Hector Mine Earthquake]

Inherent Variation: Gnutella Router Traffic
Matt Welsh

Toward Robust Behavior Under Load
Traditional Capacity Planning
  – over-provision by factor over typical (increasing 4 -> 10-15)
  – cluster-based replication is, at least, cost-effective
  – peaks occur when it matters most
Content-distribution
  – potential replication proportional to use
Still want graceful degradation when instance is overloaded
[Figure: throughput (op/s) and response time (s) versus load]

Threads as THE building block
Two primitives: Masking I/O Latency, Remote Services
Freely compose these two primitives
But... threads are a limited resource

Service “test problem”
Threaded server: dispatch( ) or create( )
  – task arrival rate: A tasks / sec (A: popularity)
  – latency: L sec (L: I/O, network, or service composition depth)
  – # concurrent tasks in server: T = A x L
  – task completion rate: S tasks / sec
  – closed loop implies S = A

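The relation T = A x L can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the sample arrival rate and thread limit are assumptions, and the 10 ms latency is borrowed from the measurement setup on the next slide.

```python
def concurrent_tasks(arrival_rate: float, latency_s: float) -> float:
    """T = A x L: tasks resident in the server in steady state."""
    return arrival_rate * latency_s


def max_arrival_rate(max_threads: int, latency_s: float) -> float:
    """Largest A a thread-per-task server can sustain with at most max_threads threads."""
    return max_threads / latency_s


# Illustrative numbers (assumed): 1000 tasks/sec, each taking 10 ms.
A = 1000.0
L = 0.010
T = concurrent_tasks(A, L)

# If the platform tolerates only 100 threads (assumed), A is capped accordingly.
A_cap = max_arrival_rate(100, L)
print(f"T = {T:.0f} concurrent tasks, A capped at {A_cap:.0f} tasks/sec")
```

Raising either popularity (A) or composition depth (L) raises the number of threads the server must hold, which is why a thread limit turns directly into a throughput limit.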
Threads are a “limited resource”
Fix L = 10 ms; for each T, measure max A = S
Cluster parallelism just raises the threshold
Platform: Ultra 170 and E450, Solaris 7.2, jdk

Alternative: queues, events, typed msgs
[Figure: single-threaded server with explicit request queue]
Queues absorb load and decouple operations
  – svr chooses when to assign resources to request event
Bounded resources at request interface
  – impose load-conditioning or admission control
Provide non-blocking interface
Client retains control of its thread
  – chooses when to block
  – permits negotiation protocol
  – key to service composition

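A minimal sketch of the explicit request queue described above, assuming a single-threaded server that drains the queue when it chooses to. The class and method names (`RequestQueue`, `submit`, `drain`) are hypothetical and not taken from the talk.

```python
import queue


class RequestQueue:
    """Bounded, non-blocking request interface in front of a single-threaded server."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)

    def submit(self, request) -> bool:
        """Never blocks the client: returns False when the request is rejected."""
        try:
            self._q.put_nowait(request)
            return True
        except queue.Full:
            return False  # admission control: the caller decides what to do next

    def drain(self, handler) -> int:
        """Server side: process queued requests, return how many were handled."""
        handled = 0
        while True:
            try:
                request = self._q.get_nowait()
            except queue.Empty:
                return handled
            handler(request)
            handled += 1


rq = RequestQueue(capacity=2)
accepted = [rq.submit(r) for r in ("req-1", "req-2", "req-3")]
served = []
count = rq.drain(served.append)
print(accepted, served, count)
```

Because `submit` returns immediately, the client keeps control of its own thread and can retry, back off, or go elsewhere when a request is refused.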
Event-per-task saturates gracefully
Better and more robust performance
  – use cluster parallelism to match desired throughput
Can decompose task into multiple events
  – circulate or pipeline
but...

Down-side of monolithic event approach
Lose familiar programming model
  – thread steps through each stage in the task
  – need a handler per stage
Difficult software engineering
  – composing and scheduling
Does not naturally exploit SMP parallelism
  – must pipeline multiple event handler blocks
Whenever the thread blocks, the whole structure stalls
  – throughput ~ 1/L

State-of-practice: bounded thread pool
Only allow K threads to “accept” connections
  – some OS’s have fixed hard limit
Additional requests time out
Choose K < T max xput
Choose K large enough to hide L
[Figure: threaded server; task arrival rate A tasks / sec, task completion rate S tasks / sec]

A “third road”
Building block
  – bounded internal thread pool
  – queue-based interface
  – subset of task stages
    » request event processing in familiar style
  – can chunk request stream for efficiency
Compose Service as a graph of “stages”
  – modularity
  – stages can be replicated across nodes
Stage “control loop” manages threads
[Figure: service as a graph of stages: read header, exec, cache check, cache miss, write resp]

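The building block above can be sketched as a small class: a bounded input queue, a fixed internal thread pool, and an optional link to the next stage. This is an illustration under assumptions, not the framework from the talk; the `Stage` class, its parameters, and the two example stages are hypothetical.

```python
import queue
import threading


class Stage:
    """One stage: bounded queue in front of a bounded internal thread pool."""

    def __init__(self, name, handler, threads=2, capacity=64, downstream=None):
        self.name = name
        self.handler = handler
        self.inbox = queue.Queue(maxsize=capacity)
        self.downstream = downstream
        for _ in range(threads):
            threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, event) -> bool:
        """Queue-based, non-blocking interface; False means the stage is full."""
        try:
            self.inbox.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        # Inside a stage the code is written in the familiar blocking thread style.
        while True:
            event = self.inbox.get()
            try:
                result = self.handler(event)
                if self.downstream is not None and result is not None:
                    self.downstream.inbox.put(result)  # blocks when full: backpressure
            finally:
                self.inbox.task_done()


# A two-stage graph: read_header -> write_resp (stage names follow the slide's figure).
responses = []
write_resp = Stage("write_resp", responses.append)
read_header = Stage("read_header", lambda raw: raw.strip().upper(), downstream=write_resp)

for raw in (" get /a ", " get /b "):
    read_header.enqueue(raw)

read_header.inbox.join()  # every header has been parsed and handed downstream
write_resp.inbox.join()   # every response has been written
print(sorted(responses))
```

Each stage bounds its own threads and queue, so overload shows up as a full queue at one stage rather than as unbounded thread creation across the whole service.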
Well-conditioned Service Architecture
Abstract system I/F as non-blocking stages
  – careful engineering at the system interface
Describe stages as modular state machines
Associate thread manager with stages
Build service as composition of stages
  – can be dynamic
Matt Welsh

Example: HTTP throughput
[Figure: SPECweb99 static workload, 4 classes]

Response Time

Reactive Stage Thread Pool Sizing
Two packet types
  – ping: fast
  – query: 20 ms delay
Thread Governor
  – observes queue length
  – over threshold => add threads
[Figure: clients]

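The governor rule (observe queue length, add a thread when it exceeds a threshold) can be sketched as a small controller. The class name, the cap on pool size, and the sample queue lengths are assumptions for illustration; the slide does not say how or whether threads are removed, so this sketch never shrinks the pool.

```python
class ThreadGovernor:
    """Grows a stage's thread pool when its queue backs up."""

    def __init__(self, threshold: int, max_threads: int, start_threads: int = 1):
        self.threshold = threshold
        self.max_threads = max_threads  # assumed upper bound, not from the slide
        self.threads = start_threads

    def observe(self, queue_length: int) -> int:
        """Called periodically with the current queue length; returns the pool size."""
        if queue_length > self.threshold and self.threads < self.max_threads:
            self.threads += 1
        return self.threads


governor = ThreadGovernor(threshold=10, max_threads=3)
pool_sizes = [governor.observe(n) for n in (2, 15, 30, 40, 50, 1)]
print(pool_sizes)
```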
Scalable Persistent Data Structures
[Figure: clustered service. Services, each linked with the DDS lib, connect over a System Area Network to storage “bricks”. DDS brick layers: distributed hash table, “RPC” skeletons, single-node HT, buffer cache, I/O core (network), I/O core (disk), operating system]
Steve Gribble

Scalable Throughput

Robust under load

Outline
Emerging Extremes
Robust Framework for Open Scalable Internet Services
  – modular generalized state machines
  – constrained use of threads
  – thread-manager as controller
Tiny OS for Wireless Embedded Sensor Networks
  – characteristics of the other extreme
  – current platforms
  – events and primitive threads in a graph of components
  – exploring open problems

Emerging Microscopic Devices
CMOS trend is not just Moore’s law
Micro Electro Mechanical Systems (MEMS)
  – a rich array of sensors is becoming cheap and tiny
Low-power Wireless Communication
Imagine, all sorts of chips that are connected to the physical world and to cyberspace!
[Figure: radio block diagram: LNA, mixer, PLL, baseband filters, I, Q]

Characteristics of Network Sensors
Small physical size and low power consumption
Concurrency-intensive operation
  – flow-thru, not wait-command-respond
Limited physical parallelism and controller hierarchy
  – primitive direct-to-device interface
Diversity in design and usage
  – application specific, not general purpose
  – huge device variation
  => efficient modularity
  => migration across HW/SW boundary
Robust operation
  – numerous, unattended, critical
  => narrow interfaces
[Figure: sensors, actuators, network, storage]

Current Example
1” x 1.5” motherboard
  – ATMEL 4 MHz, 8-bit MCU, 512 bytes RAM, 8K pgm flash
  – 900 MHz radio (RF Monolithics) ft. range
  – ATMEL network pgming assist
  – radio signal strength control and sensing
  – I2C EEPROM (logging)
  – base-station ready
  – stackable expansion connector
    » all ports, i2c, pwr, clock…
Several sensor boards
  – basic protoboard
  – tiny weather station (temp, light, hum, press)
  – vibrations (acc, temp, ...)
  – accelerometers
  – magnetometers

Basic Power Breakdown…

Component     Active       Idle          Sleep
CPU           5 mA         2 mA          5 μA
Radio         7 mA (TX)    4.5 mA (RX)   5 μA
EE-Prom       3 mA         0             0
LED’s         4 mA         0             0
Photo Diode   200 μA       0             0
Temperature   200 μA       0             0

Battery: Panasonic CR mAh

But what does this mean?
  – Lithium battery runs for 35 hours at peak load and years at minimum load!
    » three orders of magnitude difference!
  – A one byte transmission uses the same energy as approx cycles of computation.
  – Idleness is not enough, sleep!

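The "three orders of magnitude" claim can be reproduced from the table alone. The sketch below assumes that peak load means every component active at once (radio transmitting) and that minimum load means everything asleep; both readings are assumptions. The battery capacity is not legible in the slide, so `lifetime_hours` takes it as a parameter and no value is filled in.

```python
# Current draw in mA, transcribed from the table (μA values converted to mA).
CURRENT_MA = {
    "CPU":         {"active": 5.0, "idle": 2.0, "sleep": 0.005},
    "Radio":       {"active": 7.0, "idle": 4.5, "sleep": 0.005},  # TX / RX / sleep
    "EEPROM":      {"active": 3.0, "idle": 0.0, "sleep": 0.0},
    "LEDs":        {"active": 4.0, "idle": 0.0, "sleep": 0.0},
    "Photo diode": {"active": 0.2, "idle": 0.0, "sleep": 0.0},
    "Temperature": {"active": 0.2, "idle": 0.0, "sleep": 0.0},
}


def total_current_ma(mode: str) -> float:
    """Sum the draw of all components in the given mode."""
    return sum(component[mode] for component in CURRENT_MA.values())


def lifetime_hours(capacity_mah: float, current_ma: float) -> float:
    """Ideal battery lifetime; capacity must be supplied by the caller."""
    return capacity_mah / current_ma


peak_ma = total_current_ma("active")
sleep_ma = total_current_ma("sleep")
ratio = peak_ma / sleep_ma  # lifetime ratio is independent of battery capacity
print(f"peak {peak_ma:.1f} mA, sleep {sleep_ma * 1000:.0f} uA, ratio about {ratio:.0f}x")
```

Under these assumptions the ratio comes out between 1,000 and 10,000, consistent with the gap between hours at peak load and years asleep.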
An Operating System for Tiny Devices?
Traditional approaches
  – command processing loop (wait request, act, respond)
  – monolithic event processing
  – bring full thread/socket POSIX regime to platform
Alternative
  – provide framework for concurrency and modularity
  – never poll, never block
  – interleaving flows, events, energy management
  – allow appropriate abstractions to emerge

Tiny OS Concepts
Scheduler + Graph of Components
  – constrained two-level scheduling model: threads + events
Component:
  – Commands
  – Event Handlers
  – Frame (storage)
  – Tasks (concurrency)
Constrained Storage Model
  – frame per component, shared stack, no heap
Very lean multithreading
Efficient Layering
[Figure: Messaging Component with internal state and internal thread; commands and events labeled init, Power(mode), TX_packet(buf), TX_packet_done(success), RX_packet_done(buffer), init, power(mode), send_msg(addr, type, data), msg_rec(type, data), msg_send_done]

Application = Component Graph
Example: ad hoc, multi-hop routing of photo sensor readings
[Figure: component graph spanning HW and SW, layered as bit, byte, packet, and application levels: RFM, Radio byte, Radio Packet, UART, Serial Packet, ADC, Temp, photo, clocks, Active Messages, Route map, router, sensor appln]

TOS Execution Model
Commands request action
  – ack/nack at every boundary
  – call cmd or post task
Events notify occurrence
  – HW intrpt at lowest level
  – may signal events
  – call cmds
  – post tasks
Tasks provide logical concurrency
  – preempted by events
Migration of HW/SW boundary
[Figure: radio stack from bit to byte to packet: RFM (event-driven bit-pump), Radio byte (event-driven byte-pump), Radio Packet (event-driven packet-pump), active message (message-event driven), application comp; annotations: encode/decode, crc, data processing]

Dynamics of Events and Threads
Bit event filtered at byte layer
Bit event => end of byte => end of packet => end of msg send
Thread posted to start sending next message
Radio takes clock events to detect recv

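The chain of bit event, byte event, packet event, and posted task can be sketched as follows. This is a simplified model, not TinyOS code: every class and method name is hypothetical, events are modeled as direct upward calls, and real preemption of tasks by hardware events is not simulated.

```python
from collections import deque


class Scheduler:
    """FIFO task queue; each task runs to completion."""

    def __init__(self):
        self.tasks = deque()

    def post(self, task):
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()


class ByteLayer:
    """Handles bit events; signals a byte event upward only every 8th bit."""

    def __init__(self, on_byte):
        self.bits = []
        self.on_byte = on_byte

    def bit_event(self, bit: int):
        self.bits.append(bit)
        if len(self.bits) == 8:  # most bit events are filtered here
            value = int("".join(str(b) for b in self.bits), 2)
            self.bits = []
            self.on_byte(value)


class PacketLayer:
    """Handles byte events; posts a task when a whole packet has arrived."""

    def __init__(self, scheduler, length, on_packet):
        self.scheduler = scheduler
        self.length = length
        self.on_packet = on_packet
        self.buf = []

    def byte_event(self, value: int):
        self.buf.append(value)
        if len(self.buf) == self.length:
            packet, self.buf = bytes(self.buf), []
            # Keep the event handler short: defer processing to a task.
            self.scheduler.post(lambda: self.on_packet(packet))


scheduler = Scheduler()
received = []
packet_layer = PacketLayer(scheduler, length=2, on_packet=received.append)
byte_layer = ByteLayer(on_byte=packet_layer.byte_event)

for byte in b"Hi":                  # stand-in for the radio's bit-level events
    for shift in range(7, -1, -1):
        byte_layer.bit_event((byte >> shift) & 1)

pending_before_run = len(scheduler.tasks)
scheduler.run()
print(pending_before_run, received)
```

Sixteen bit events produce two byte events, one packet event, and one posted task, which is the filtering effect the slide describes.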
Storage Breakdown (C Code)
3450 B code
226 B data

Empirical Breakdown of Effort
Can take apart time, power, space, …
50 cycle thread overhead, 10 cycle event overhead

Components            Packet reception work breakdown   Percent CPU utilization   Energy (nJ/bit)
AM                    0.05%                             0.20%                     0.33
Packet                1.12%                             0.51%                     7.58
Radio handler         26.87%                            12.16%
Radio decode thread   5.48%                             2.48%                     37.2
RFM                   66.48%                            30.08%
Radio Reception
Idle                  -                                 54.75%                    -
Total

Working Across Levels
Encoding
  – DC-balanced SECDED
Proximity detection
  – signal strength or error rates
Low power listening
Fair and efficient network access
Security
Tiny virtual machines
Larger challenges

Low-Power Listening
Costs about as much to listen as to xmit, even when nothing is received
Only way to save power is to turn radio off when there is nothing to hear
Can turn radio on/off in about 1 bit
  – can detect transmission at cost of ~2 bit times
Small sub-msg recv sampling (10x)
Application-level synchronization rendezvous to determine when to sample (10x)
[Figure: Xmit (preamble, message) vs. Recv (sleep) timing]
Jason Hill

Managing local contention
Channel utilization ~70%
Throughput per node is fair
Highly correlated traffic, no collision detection
  – sensor events and beacons
Randomize initial listen period, simple backoff
Alec Woo

Managing aggregate contention
Hidden nodes between each pair of “levels”
  – CSMA is not enough
RTS/CTS acks too costly (power & BW)
P[msg-to-base] drops rapidly with hops
  – investment in packet increases with distance
Local rate control to approx. fairness
Priority to forwarding, adjust own data rate
Additive increase, multiplicative decrease
Listen for retransmission as ack
~ ½ of packets get through 4 levels out

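The additive-increase, multiplicative-decrease rule for a node's own data rate can be written in a few lines. The constants below (step of 0.1, halving on loss, rate normalized to 1.0) are illustrative assumptions, as is the function name; the slide gives only the policy. Hearing the parent retransmit the packet is treated as the implicit ack.

```python
def adjust_rate(rate: float, heard_retransmission: bool,
                increase: float = 0.1, decrease: float = 0.5,
                max_rate: float = 1.0) -> float:
    """AIMD update of a node's own origination rate (fraction of max_rate)."""
    if heard_retransmission:
        # Implicit ack: the parent forwarded our packet, so probe for more bandwidth.
        return min(max_rate, rate + increase)
    # No retransmission overheard: assume loss from contention and back off sharply.
    return rate * decrease


rate = 0.5
history = []
for acked in (True, True, False, True):
    rate = adjust_rate(rate, acked)
    history.append(round(rate, 3))
print(history)
```

Only the node's own traffic is throttled in this sketch; forwarded traffic keeps priority, matching the "priority to forwarding" bullet.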
Authentication / Security
RC-5 shared key crypto in 1.7 kb
Modified TESLA protocol for confidential & authenticated base broadcast
Easy to compromise a node, but hard to get most of them

What’s in a program?
HW + collection of components supports space of applications
Application-Specific Virtual Machine
  – code-density, not portability
  – small byte-code interpreter component
  – accepts clock & message event capsules
  – hides split-phase operations below interpreter
Capsules define specific query / logic
  – filter criteria
  – diffusion primitives
  – ...

Thoughts about robust Algorithms
Active Dynamic Route Determination
  – when a node hears a new route beacon, it records the “parent”, retransmits from SELF, and ignores additional messages for the epoch
Radio cell structure very unpredictable
Builds and maintains good breadth-first forest
Each node maintains O(1) state
Fundamental operation is pruning retransmission
  – monotonic variables
  – message signature caches
Takes energy to retain structure

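The beacon rule above can be simulated on a small connectivity graph. This sketch idealizes the radio: beacons are delivered reliably and in breadth-first order, which real radio cells do not guarantee. The function name and the example topology are hypothetical. Each node keeps only its parent for the current epoch, which is the O(1) state the slide mentions.

```python
from collections import deque


def build_routes(neighbors: dict, base: str) -> dict:
    """Simulate one epoch of beacon flooding; returns each node's chosen parent."""
    parent = {base: None}
    frontier = deque([base])  # nodes that still have to retransmit the beacon
    while frontier:
        sender = frontier.popleft()
        for node in neighbors[sender]:
            if node not in parent:       # first beacon heard this epoch
                parent[node] = sender    # record parent
                frontier.append(node)    # retransmit from SELF later
            # otherwise: ignore additional beacons for this epoch
    return parent


# Hypothetical radio connectivity (symmetric links).
neighbors = {
    "base": ["a", "b"],
    "a": ["base", "c"],
    "b": ["base", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
routes = build_routes(neighbors, "base")
print(routes)
```

Following parent pointers from any node leads back to the base, so the recorded parents form a breadth-first tree rooted there; with several base stations the same rule would yield a forest.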
Larger Challenges
Programming support for systems of generalized state machines
  – language, debugging, verification
Programming the unstructured aggregate
Resilient aggregators
Understanding how an extreme system is behaving and what its envelope is
  – adversarial simulation

Tides of Innovation
[Figure: alternating integration and innovation over time (axes: Time, Log R) across Mainframe, Minicomputer, Personal Computer, Workstation, Server; marker at 2/2001]

Summary
The extremes of the computing spectrum present tremendous opportunities for innovation
Systems challenges
  – variation in load, unpredictability, hands-off embedded operation
  – limited resources, concurrency intensive, power constrained
  – self-organizing and adaptive
More in common with each other than with the “average” devices
New kinds of software system structures
  – modular event-driven structures
  – intrinsic feedback and control

Historical Perspective
New eras of computing start when the previous era is so strong it is hard to imagine that things could be different
  – mainframe -> mini
  – mini -> workstation -> PC
  – PC -> ???
It is often smaller than what came before.
  – most think of the new technology as “just a toy”
The new dominant use was almost completely absent.
It is likely to come from the extremes.

Mean Response Time (A)
Closed system, but limited bandwidth

Threaded non-blocking disk-read service

Example: Disk-read “Stage”