1
Overheads for Computers as Components, 2nd ed.
Introduction
What are embedded computing systems? Challenges in embedded computing system design. Design methodologies.
© 2008 Wayne Wolf
2
Definition
Embedded computing system: any device that includes a programmable computer but is not itself a general-purpose computer. Take advantage of application characteristics to optimize the design: don’t need all the general-purpose bells and whistles.
3
Embedding a computer
[Figure: an embedded computer, made up of a CPU and memory, with analog input and analog output.]
4
Examples
Cell phone. Printer. Automobile: engine, brakes, dash, etc. Airplane: engine, flight controls, nav/comm. Digital television. Household appliances.
5
Early history
Late 1940’s: MIT Whirlwind computer was designed for real-time operations. Originally designed to control an aircraft simulator. First microprocessor was Intel 4004 in early 1970’s. HP-35 calculator used several chips to implement a microprocessor in 1972.
6
Early history, cont’d.
Automobiles used microprocessor-based engine controllers starting in 1970’s. Control fuel/air mixture, engine timing, etc. Multiple modes of operation: warm-up, cruise, hill climbing, etc. Provides lower emissions, better fuel efficiency.
7
Microprocessor varieties
Microcontroller: includes I/O devices, on-board memory. Digital signal processor (DSP): microprocessor optimized for digital signal processing. Typical embedded word sizes: 8-bit, 16-bit, 32-bit.
8
Application examples
Simple control: front panel of microwave oven, etc. Canon EOS 3 has three microprocessors. 32-bit RISC CPU runs autofocus and eye control systems. Digital TV: programmable CPUs + hardwired logic for video/audio decode, menus, etc.
9
Automotive embedded systems
Today’s high-end automobile may have 100 microprocessors: 4-bit microcontroller checks seat belt; microcontrollers run dashboard devices; 16/32-bit microprocessor controls engine.
10
BMW 850i brake and stability control system
Anti-lock brake system (ABS): pumps brakes to reduce skidding. Automatic stability control (ASC+T): controls engine to improve stability. ABS and ASC+T communicate. ABS was introduced first, so ASC+T needed to interface to the existing ABS module.
11
BMW 850i, cont’d.
[Figure: ABS unit and hydraulic pump connected to four brakes, each with its own sensor.]
12
Characteristics of embedded systems
Sophisticated functionality. Real-time operation. Low manufacturing cost. Low power. Designed to tight deadlines by small teams.
13
Functional complexity
Often have to run sophisticated algorithms or multiple algorithms. Cell phone, laser printer. Often provide sophisticated user interfaces.
14
Real-time operation
Must finish operations by deadlines. Hard real time: missing deadline causes failure. Soft real time: missing deadline results in degraded performance. Many systems are multi-rate: must handle operations at widely varying rates.
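The hard/soft distinction can be sketched as a small check. This is only an illustration: the function name `check_deadline` and the result strings are hypothetical, not taken from the slides.

```python
def check_deadline(finish_time, deadline, hard):
    """Classify one operation against its deadline.

    finish_time and deadline use the same (arbitrary) time unit.
    hard=True models a hard real-time operation, hard=False a soft one.
    """
    if finish_time <= deadline:
        return "met"
    # A missed hard deadline is a failure; a missed soft deadline
    # only degrades performance.
    return "failure" if hard else "degraded"


on_time = check_deadline(finish_time=9, deadline=10, hard=True)
late_hard = check_deadline(finish_time=11, deadline=10, hard=True)
late_soft = check_deadline(finish_time=11, deadline=10, hard=False)
```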
15
Non-functional requirements
Many embedded systems are mass-market items that must have low manufacturing costs. Limited memory, microprocessor power, etc. Power consumption is critical in battery-powered devices. Excessive power consumption increases system cost even in wall-powered devices.
16
Design teams
Often designed by a small team of designers. Often must meet tight deadlines. 6-month market window is common. Can’t miss back-to-school window for calculator.
17
Why use microprocessors?
Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. Microprocessors are often very efficient: can use same logic to perform many different functions. Microprocessors simplify the design of families of products.
18
The performance paradox
Microprocessors use much more logic to implement a function than does custom logic. But microprocessors are often at least as fast: heavily pipelined; large design teams; aggressive VLSI technology.
19
Power
Custom logic uses less power, but CPUs have advantages: modern microprocessors offer features to help control power consumption; software design techniques can help reduce power consumption. Heterogeneous systems: some custom logic for well-defined functions, CPUs + software for everything else.
20
Platforms
Embedded computing platform: hardware architecture + associated software. Many platforms are multiprocessors. Examples: single-chip multiprocessors for cell phone baseband; automotive network + processors.
21
The physics of software
Computing is a physical act. Software doesn’t do anything without hardware. Executing software consumes energy, requires time. To understand the dynamics of software (time, energy), we need to characterize the platform on which the software runs.
22
What does “performance” mean?
In general-purpose computing, performance often means average-case performance, which may not be well-defined. In real-time systems, performance means meeting deadlines. Missing the deadline by even a little is bad. Finishing ahead of the deadline may not help.
23
Characterizing performance
We need to analyze the system at several levels of abstraction to understand performance: CPU. Platform. Program. Task. Multiprocessor.
24
Challenges in embedded system design
How much hardware do we need? How big is the CPU? Memory? How do we meet our deadlines? Faster hardware or cleverer software? How do we minimize power? Turn off unnecessary logic? Reduce memory accesses?
25
Challenges, etc.
Does it really work? Is the specification correct? Does the implementation meet the spec? How do we test for real-time characteristics? How do we test on real data? How do we work on the system? Observability, controllability? What is our development platform?
26
Design methodologies
A procedure for designing a system. Understanding your methodology helps you ensure you didn’t skip anything. Compilers, software engineering tools, computer-aided design (CAD) tools, etc., can be used to: help automate methodology steps; keep track of the methodology itself.
27
Design goals
Performance. Overall speed, deadlines. Functionality and user interface. Manufacturing cost. Power consumption. Other requirements (physical size, etc.).
28
Levels of abstraction
[Figure: levels of abstraction, in order: requirements, specification, architecture, component design, system integration.]
29
Top-down vs. bottom-up
Top-down design: start from most abstract description; work to most detailed. Bottom-up design: work from small components to big system. Real design uses both techniques.
30
Stepwise refinement
At each level of abstraction, we must: analyze the design to determine characteristics of the current state of the design; refine the design to add detail.
31
Requirements
Plain language description of what the user wants and expects to get. May be developed in several ways: talking directly to customers; talking to marketing representatives; providing prototypes to users for comment.
32
Functional vs. non-functional requirements
Functional requirements: output as a function of input. Non-functional requirements: time required to compute output; size, weight, etc.; power consumption; reliability; etc.
33
Our requirements form
34
Example: GPS moving map requirements
Moving map obtains position from GPS, paints map from local database.
[Figure: map display showing I-78 and Scotch Road with latitude and longitude readouts.]
35
GPS moving map needs
Functionality: For automotive use. Show major roads and landmarks. User interface: At least 400 x 600 pixel screen. Three buttons max. Pop-up menu. Performance: Map should scroll smoothly. No more than 1 sec power-up. Lock onto GPS within 15 seconds. Cost: $120 street price = approx. $30 cost of goods sold.
36
GPS moving map needs, cont’d.
Physical size/weight: Should fit in hand. Power consumption: Should run for 8 hours on four AA batteries.
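The cost and power requirements can be turned into rough budgets. The sketch below is a back-of-the-envelope calculation only: the street price, cost of goods sold, run time, and battery count come from the slides, while the AA cell voltage (1.5 V) and capacity (2.0 Ah) are assumptions chosen for illustration and vary with battery chemistry and load.

```python
# Figures taken from the requirements.
STREET_PRICE_USD = 120.0
COGS_USD = 30.0
RUN_TIME_HOURS = 8.0
NUM_CELLS = 4

# ASSUMPTIONS (not in the slides): nominal AA cell voltage and capacity.
CELL_VOLTAGE_V = 1.5
CELL_CAPACITY_AH = 2.0


def price_to_cost_ratio(street_price, cogs):
    """How many times the cost of goods sold the street price is."""
    return street_price / cogs


def average_power_budget_w(num_cells, cell_voltage, cell_capacity_ah, hours):
    """Average power the system may draw to last `hours` on the batteries."""
    energy_wh = num_cells * cell_voltage * cell_capacity_ah
    return energy_wh / hours


ratio = price_to_cost_ratio(STREET_PRICE_USD, COGS_USD)
budget_w = average_power_budget_w(
    NUM_CELLS, CELL_VOLTAGE_V, CELL_CAPACITY_AH, RUN_TIME_HOURS
)
print(f"street price is {ratio:.1f}x cost of goods sold")
print(f"average power budget: {budget_w:.2f} W (under the assumed cell figures)")
```

Under these assumed cell figures the whole system, display included, has to average about 1.5 W, which shows how directly a run-time requirement constrains the hardware choice.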
37
GPS moving map requirements form
38
Specification
A more precise description of the system: should not imply a particular architecture; provides input to the architecture design process. May include functional and non-functional elements. May be executable or may be in mathematical form for proofs.
39
GPS specification
Should include: what is received from GPS; map data; user interface; operations required to satisfy user requests; background operations needed to keep the system running.
40
Architecture design
What major components go into satisfying the specification? Hardware components: CPUs, peripherals, etc. Software components: major programs and their operations. Must take into account functional and non-functional specifications.
41
GPS moving map block diagram
[Figure: block diagram with GPS receiver, search engine, database, renderer, user interface, and display.]
42
GPS moving map hardware architecture
[Figure: hardware architecture with display, frame buffer, CPU, GPS receiver, memory, and panel I/O.]
43
GPS moving map software architecture
[Figure: software architecture with position, database search, renderer, pixels, user interface, and timer.]
44
Designing hardware and software components
Must spend time architecting the system before you start coding. Some components are ready-made, some can be modified from existing designs, others must be designed from scratch.
45
System integration
Put together the components. Many bugs appear only at this stage. Have a plan for integrating components that uncovers bugs quickly and tests as much functionality as early as possible.
46
Summary
Embedded computers are all around us. Many systems have complex embedded hardware and software. Embedded systems pose many design challenges: design time, deadlines, power, etc. Design methodologies help us manage the design process.
47
Introduction
Object-oriented design. Unified Modeling Language (UML).
48
System modeling
Need languages to describe systems: useful across several levels of abstraction; understandable within and between organizations. Block diagrams are a start, but don’t cover everything.
49
Object-oriented design
Object-oriented (OO) design: A generalization of object-oriented programming. Object = state + methods. State provides each object with its own identity. Methods provide an abstract interface to the object.
50
Objects and classes
Class: object type. Class defines the object’s state elements but state values may change over time. Class defines the methods used to interact with all objects of that type. Each object has its own state.
51
OO design principles
Some objects will closely correspond to real-world objects. Some objects may be useful only for description or implementation. Objects provide interfaces to read/write state, hiding the object’s implementation from the rest of the system.
52
UML
Developed by Booch et al. Goals: object-oriented; visual; useful at many levels of abstraction; usable for all aspects of design.
53
UML object
[Figure: UML object d1: Display (object name: class name) with attributes pixels: array[] of pixels, elements, and menu_items; a comment notes that pixels is a 2-D array.]
54
UML class
[Figure: UML class with class name Display, attributes pixels, elements, and menu_items, and operations mouse_click() and draw_box.]
55
The class interface
The operations provide the abstract interface between the class’s implementation and other classes. Operations may have arguments, return values. An operation can examine and/or modify the object’s state.
56
Choose your interface properly
If the interface is too small/specialized: object is hard to use for even one application; even harder to reuse. If the interface is too large: class becomes too cumbersome for designers to understand; implementation may be too slow; spec and implementation are probably buggy.
57
Relationships between objects and classes
Association: objects communicate but one does not own the other. Aggregation: a complex object is made of several smaller objects. Composition: aggregation in which owner does not allow access to its components. Generalization: define one class in terms of another.
58
Class derivation
May want to define one class in terms of another. Derived class inherits attributes, operations of base class.
[Figure: UML generalization drawn from Derived_class to Base_class.]
59
Class derivation example
[Figure: base class Display with attributes pixels, elements, and menu_items and operations pixel(), set_pixel(), mouse_click(), and draw_box; derived classes BW_display and Color_map_display.]
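A minimal sketch of this derivation in Python follows. The class, attribute, and operation names come from the slide; the constructor arguments, the `color_map` attribute, the `color_at()` helper, and what each derived class actually does are assumptions made for illustration, since the slides do not specify them.

```python
class Display:
    """Base class: state (pixels, elements, menu_items) plus operations."""

    def __init__(self, width, height):
        # pixels is a 2-D array, one row per scan line.
        self.pixels = [[0] * width for _ in range(height)]
        self.elements = []
        self.menu_items = []

    def pixel(self, x, y):
        return self.pixels[y][x]

    def set_pixel(self, x, y, value):
        self.pixels[y][x] = value


class BW_display(Display):
    """Derived class; assumed here to store only 0 or 1 per pixel."""

    def set_pixel(self, x, y, value):
        super().set_pixel(x, y, 1 if value else 0)


class Color_map_display(Display):
    """Derived class; assumed here to look pixel values up in a color map."""

    def __init__(self, width, height, color_map):
        super().__init__(width, height)
        self.color_map = color_map  # hypothetical: pixel value -> (r, g, b)

    def color_at(self, x, y):
        return self.color_map[self.pixel(x, y)]


bw = BW_display(4, 4)
bw.set_pixel(1, 2, 255)          # stored as 1 by the overridden operation

cm = Color_map_display(4, 4, {0: (0, 0, 0), 3: (255, 0, 0)})
cm.set_pixel(0, 0, 3)            # inherited unchanged from Display
```

Both derived classes inherit the attributes and `pixel()` from `Display`; only the behavior that differs is redefined or added.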
60
Multiple inheritance
[Figure: derived class Multimedia_display inherits from base classes Speaker and Display.]
61
Links and associations
Link: describes relationship between objects. Association: describes relationship between classes.
62
Link example
Link defines the contains relationship.
[Figure: a message set object (count = 2) linked to two message objects, one with msg = msg1, length = 1102 and one with msg = msg2, length = 2114.]
63
Association example
[Figure: classes message (msg: ADPCM_stream, length: integer) and message set (count: integer) joined by the contains association; multiplicity 0..* is the number of contained messages and 1 is the number of containing message sets.]
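The contains association, and the links from the previous slide, can be sketched as two Python classes. The class and attribute names follow the figure; representing `msg` as a string (rather than an ADPCM stream) and deriving `count` from the list of contained messages are simplifying assumptions, as is the `add()` helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    msg: str      # stands in for the ADPCM_stream payload (assumption)
    length: int


@dataclass
class MessageSet:
    # 0..* contained messages; each message belongs to this one set.
    messages: List[Message] = field(default_factory=list)

    @property
    def count(self) -> int:
        # Derived from the links so it cannot drift out of sync.
        return len(self.messages)

    def add(self, message: Message) -> None:
        self.messages.append(message)


# The two links from the link example.
message_set = MessageSet()
message_set.add(Message(msg="msg1", length=1102))
message_set.add(Message(msg="msg2", length=2114))
```

The association lives at the class level (a `MessageSet` may contain any number of `Message` objects); each `add()` call creates one link between specific objects.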
64
Stereotypes
Stereotype: recurring combination of elements in an object or class. Example: <<foo>>
65
Behavioral description
Several ways to describe behavior: internal view; external view.
66
State machines
[Figure: two states named a and b joined by a transition; labels mark the state, the state name, and the transition.]
67
Event-driven state machines
Behavioral descriptions are written as event-driven state machines. Machine changes state when receiving an input. An event may come from inside or outside of the system.
68
Types of events
Signal: asynchronous event. Call: synchronized communication. Timer: activated by time.
69
Signal event
[Figure: declaration of the <<signal>> mouse_click with attributes leftorright: button and x, y: position; the event description shows a transition from state a to state b labeled mouse_click(x,y,button).]
70
Call event
[Figure: transition from state c to state d labeled draw_box(10,5,3,2,blue).]
71
Timer event
[Figure: transition from state e to state f labeled tm(time-value).]
72
Example state machine
[Figure: state machine running from start to finish, with transitions labeled input/output. mouse_click(x,y,button)/find_region(region) leads to region found. region = menu/which_menu(i) leads to got menu item, then call_menu(i) leads to called menu item. region = drawing/find_object(objid) leads to found object, then highlight(objid) leads to object highlighted.]
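One way to read this state machine is as a transition table. The sketch below is an interpretation of the flattened figure, not a transcription of it: the state and action names come from the slide, but which transitions need an input event and which fire unconditionally is an assumption, and the `run()` helper is hypothetical.

```python
# (state, event) -> (action, next_state).
# An event of None marks a transition assumed to fire without input.
TRANSITIONS = {
    ("start", "mouse_click"): ("find_region", "region found"),
    ("region found", "region = menu"): ("which_menu", "got menu item"),
    ("got menu item", None): ("call_menu", "called menu item"),
    ("called menu item", None): (None, "finish"),
    ("region found", "region = drawing"): ("find_object", "found object"),
    ("found object", None): ("highlight", "object highlighted"),
    ("object highlighted", None): (None, "finish"),
}


def run(events):
    """Drive the machine from start to finish; return the actions taken."""
    state = "start"
    pending = list(events)
    actions = []
    while state != "finish":
        if (state, None) in TRANSITIONS:
            key = (state, None)
        else:
            if not pending:
                raise ValueError(f"no event available in state {state!r}")
            key = (state, pending.pop(0))
            if key not in TRANSITIONS:
                raise ValueError(f"event {key[1]!r} not valid in state {state!r}")
        action, state = TRANSITIONS[key]
        if action is not None:
            actions.append(action)
    return actions


menu_actions = run(["mouse_click", "region = menu"])
drawing_actions = run(["mouse_click", "region = drawing"])
```

Keeping the behavior in a table makes the machine easy to check against the diagram: each table row corresponds to one arrow.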
73
Sequence diagram
Shows sequence of operations over time. Relates behaviors of multiple objects.
74
Sequence diagram example
[Figure: sequence diagram with lifelines m: Mouse, d1: Display, and u: Menu; in time order the messages are mouse_click(x,y,button), which_menu(x,y,i), and call_menu(i).]
75
Summary
Object-oriented design helps us organize a design. UML is a transportable system design language. Provides structural and behavioral description primitives.
76
Introduction
Example: model train controller.
© 2000 Morgan Kaufmann
77
Purposes of example
Follow a design through several levels of abstraction. Gain experience with UML.
78
Model train setup
[Figure: console and power supply driving the track; each train carries a receiver (rcvr) and a motor. The message fields are labeled header, address, command, and ECC.]
79
Requirements
Console can control 8 trains on 1 track. Throttle has at least 63 levels. Inertia control adjusts responsiveness with at least 8 levels. Emergency stop button. Error detection scheme on messages.
80
Requirements form
81
Digital Command Control
DCC created by model railroad hobbyists, picked up by industry. Defines the way in which model trains and controllers communicate. Leaves many system design aspects open, allowing competition. This is a simple example of a big trend: cell phones, digital TV rely on standards.
82
DCC documents
Standard S-9.1, DCC Electrical Standard. Defines how bits are encoded on the rails. Standard S-9.2, DCC Communication Standard. Defines packet format and semantics.
83
DCC electrical standard
Voltage moves around the power supply voltage; adds no DC component. 1 is 58 µs, 0 is at least 100 µs.
[Figure: waveform over time showing a logic 1 lasting 58 µs and a logic 0 lasting >= 100 µs.]
84
DCC communication standard
Basic packet format: PSA(sD)+E. P: preamble. S: packet start bit = 0. A: address data byte. s: data byte start bit. D: data byte (data payload). E: packet end bit = 1.
85
DCC packet types
Baseline packet: minimum packet that must be accepted by all DCC implementations. Address data byte gives receiver address. Instruction data byte gives basic instruction. Error correction data byte gives ECC.
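A baseline packet in the PSA(sD)+E format can be assembled bit by bit. The sketch below makes three assumptions that the slides do not state, so check them against Standard S-9.2 before relying on them: the preamble is taken to be ten 1 bits, the data byte start bit s is taken to be 0, and the error byte is taken to be the XOR of the address and instruction bytes. The function names are hypothetical.

```python
PREAMBLE_BITS = 10  # ASSUMPTION: preamble length is not given in the slides


def byte_to_bits(value):
    """Return the 8 bits of value, most significant bit first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte")
    return [(value >> shift) & 1 for shift in range(7, -1, -1)]


def baseline_packet(address, instruction):
    """Build P S A (s D)+ E for an address byte and an instruction byte."""
    error = address ^ instruction        # ASSUMPTION: XOR check byte
    bits = [1] * PREAMBLE_BITS           # P: preamble
    bits += [0]                          # S: packet start bit = 0
    bits += byte_to_bits(address)        # A: address data byte
    for data in (instruction, error):
        bits += [0]                      # s: data byte start bit (assumed 0)
        bits += byte_to_bits(data)       # D: data byte
    bits += [1]                          # E: packet end bit = 1
    return bits


packet = baseline_packet(address=0x03, instruction=0x64)
print("".join(str(b) for b in packet))
```

With these assumptions a baseline packet is 38 bits long: 10 preamble bits, the start bit, the address byte, two 9-bit (s + D) groups, and the end bit.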
86
Conceptual specification
Before we create a detailed specification, we will make an initial, simplified specification. Gives us practice in specification and UML. Good idea in general to identify potential problems before investing too much effort in detail.
87
Basic system commands
88
Typical control sequence
[Figure: sequence diagram between :console and :train_rcvr; messages in order: set-inertia, set-speed, set-speed, estop, set-speed.]
89
Message classes
[Figure: class command with derived classes set-speed (value: integer), set-inertia (value: unsigned-integer), and estop.]
90
Roles of message classes
Implemented message classes are derived from the message class. Attributes and operations will be filled in for the detailed specification. Implemented message classes specify message type by their class. May have to add type as parameter to data structure in implementation.
91
Subsystem collaboration diagram
Shows relationship between console and receiver (ignores role of track).
[Figure: :console sends 1..n: command to :receiver.]
92
System structure modeling
Some classes define non-computer components. Denote by name*. Choose important systems at this point to show basic relationships.
93
Major subsystem roles
Console: read state of front panel; format messages; transmit messages. Train: receive message; interpret message; control the train.
94
Console system classes
[Figure: console class diagram relating panel, formatter, transmitter, receiver*, and sender*; every association end has multiplicity 1.]
95
Console class roles
panel: describes analog knobs and interface hardware. formatter: turns knob settings into bit streams. transmitter: sends data on track.
96
Train system classes
[Figure: train class diagram. A train set contains 1..t trains; each train has one motor interface, one receiver, and one controller; detector* and pulser* are the associated physical-object classes.]
97
Train class roles
receiver: digitizes signal from track. controller: interprets received commands and makes control decisions. motor interface: generates signals required by motor.
98
Detailed specification
We can now fill in the details of the conceptual specification: more classes; behaviors. Sketching out the spec first helps us understand the basic relationships in the system.
99
Train speed control
Motor controlled by pulse width modulation.
[Figure: pulse waveform switching between + and - V; the duty cycle sets the motor speed.]
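For an ideal pulse-width-modulated signal, the average voltage seen by the motor is the supply voltage scaled by the duty cycle. The sketch below shows that arithmetic; the function names and the 12 V supply in the example are illustrative assumptions, and a real motor driver has losses this ignores.

```python
def duty_cycle(on_time, period):
    """Fraction of each period during which the pulse is on (0.0 to 1.0)."""
    if period <= 0 or not 0 <= on_time <= period:
        raise ValueError("need 0 <= on_time <= period and period > 0")
    return on_time / period


def average_voltage(supply_voltage, duty):
    """Average output of an ideal PWM signal switching between 0 and V."""
    return supply_voltage * duty


# Example: pulse on for 1 time unit out of every 4, 12 V supply (assumed).
duty = duty_cycle(on_time=1, period=4)
v_avg = average_voltage(12.0, duty)
```

Widening the pulse raises the duty cycle and therefore the average voltage, which is how a digital output controls an analog quantity such as motor speed.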
100
Console physical object classes
knobs*: train-knob: integer; speed-knob: integer; inertia-knob: unsigned-integer; emergency-stop: boolean.
pulser*: pulse-width: unsigned-integer; direction: boolean.
sender*: send-bit().
detector*: read-bit() : integer.
101
Panel and motor interface classes
panel: train-number() : integer; speed() : integer; inertia() : integer; estop() : boolean; new-settings().
motor-interface: speed: integer.
102
Class descriptions
panel class defines the controls. new-settings() behavior reads the controls. motor-interface class defines the motor speed held as state.
103
Transmitter and receiver classes
transmitter: send-speed(adrs: integer, speed: integer); send-inertia(adrs: integer, val: integer); set-estop(adrs: integer).
receiver: current: command; new: boolean; read-cmd(); new-cmd() : boolean; rcv-type(msg-type: command); rcv-speed(val: integer); rcv-inertia(val: integer).
104
Class descriptions
transmitter class has one behavior for each type of message sent. receiver class provides methods to: detect a new message; determine its type; read its parameters (estop has no parameters).
105
Formatter class
formatter: current-train: integer; current-speed[ntrains]: integer; current-inertia[ntrains]: unsigned-integer; current-estop[ntrains]: boolean; send-command(); panel-active() : boolean; operate().
106
Formatter class description
Formatter class holds the state for each train and the setting for the current train. The operate() operation performs the basic formatting task.
107
Control input cases
Use a soft panel to show current panel settings for each train. Changing train number: must change soft panel settings to reflect current train’s speed, etc. Controlling throttle/inertia/estop: read panel, check for changes, perform command.
108
Control input sequence diagram
[Figure: sequence diagram with lifelines :knobs, :panel, :formatter, and :transmitter. On a change in speed/inertia/estop: read panel (panel-active), panel settings returned, send-command, then send-speed, send-inertia, or send-estop. On a change in train number: read panel, train number returned, new-settings, then set-knobs.]
109
Formatter operate behavior
[Figure: state diagram. From idle, panel-active() is evaluated; a new train number leads to update-panel(), any other change leads to send-command().]
110
Panel-active behavior
[Figure: flowchart. Test panel*:read-train(); if true (T), current-train = train-knob, update-screen, changed = true. If false (F), test panel*:read-speed(); if true, current-speed = throttle, changed = true. The remaining controls continue in the same pattern.]
111
Controller class
controller: current-train: integer; current-speed[ntrains]: integer; current-direction[ntrains]: boolean; current-inertia[ntrains]: unsigned-integer; operate(); issue-command().
112
Setting the speed
Don’t want to change speed instantaneously. Controller should change speed gradually by sending several commands.
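Gradual speed change can be sketched as a ramp: the controller moves from the current speed to the target in bounded steps, issuing one command per step. The function name `speed_steps` and the `max_step` parameter are hypothetical; how the inertia setting would determine the step size is not specified in the slides, so it is left to the caller here.

```python
def speed_steps(current, target, max_step):
    """Return the intermediate speeds to command, ending at target."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    steps = []
    while current != target:
        delta = target - current
        # Clamp each change to at most max_step in either direction.
        delta = max(-max_step, min(max_step, delta))
        current += delta
        steps.append(current)
    return steps


speeding_up = speed_steps(current=0, target=10, max_step=4)
slowing_down = speed_steps(current=10, target=3, max_step=5)
```

Each value in the returned list would correspond to one command sent toward the motor, so a smaller `max_step` gives a smoother but slower change.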
113
Sequence diagram for set-speed command
[Figure: sequence diagram with lifelines :receiver, :controller, :motor-interface, and :pulser*; messages in order: new-cmd, cmd-type, rcv-speed, set-speed, then five set-pulse messages.]
114
Controller operate behavior
[Figure: state diagram. Wait for a command from receiver, then receive-command(), then issue-command().]
115
Refined command classes
[Figure: class command with attributes type: 3-bits, address: 3-bits, and parity: 1-bit; derived classes set-speed (type=010, value: 7-bits), set-inertia (type=001, value: 3-bits), and estop (type=000).]
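The refined command classes map naturally onto a packed bit field. The field widths and type codes below come from the slide; the field order (type, address, value, parity), the use of even parity, and the `encode()` function are assumptions made for this sketch, since the slide does not give a bit layout.

```python
TYPE_CODES = {"estop": 0b000, "set-inertia": 0b001, "set-speed": 0b010}
VALUE_BITS = {"estop": 0, "set-inertia": 3, "set-speed": 7}


def encode(command, address, value=0):
    """Pack a command as type(3) | address(3) | value(n) | parity(1)."""
    nbits = VALUE_BITS[command]
    if not 0 <= address < 8:
        raise ValueError("address must fit in 3 bits")
    if not 0 <= value < (1 << nbits):
        raise ValueError(f"value must fit in {nbits} bits")
    word = (TYPE_CODES[command] << 3) | address
    word = (word << nbits) | value
    # ASSUMPTION: even parity, so the bit makes the total count of 1s even.
    parity = bin(word).count("1") & 1
    return (word << 1) | parity


speed_word = encode("set-speed", address=5, value=63)
inertia_word = encode("set-inertia", address=1, value=7)
estop_word = encode("estop", address=2)
print(format(speed_word, "014b"))   # 14 bits: 3 + 3 + 7 + 1
```

A 3-bit address is enough for the 8 trains in the requirements, and estop carries no value field, so its encoding is only 7 bits long under this layout.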
116
Summary
Separate specification and programming. Small mistakes are easier to fix in the spec. Big mistakes in programming cost a lot of time. You can’t completely separate specification and architecture. Make a few tasteful assumptions.
117
Instruction sets
Computer architecture taxonomy. Assembly language.
118
von Neumann architecture
Memory holds data, instructions. Central processing unit (CPU) fetches instructions from memory. Separation of CPU and memory distinguishes a programmable computer. CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
119
Overheads for Computers as Components 2nd ed.
CPU + memory memory address CPU PC 200 data 200 ADD r5,r1,r3 ADD r5,r1,r3 IR © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
120
Overheads for Computers as Components 2nd ed.
Harvard architecture address CPU data memory data PC address program memory data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
121
Overheads for Computers as Components 2nd ed.
von Neumann vs. Harvard Harvard can’t use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
122
Overheads for Computers as Components 2nd ed.
RISC vs. CISC Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
123
Instruction set characteristics
Fixed vs. variable length. Addressing modes. Number of operands. Types of operands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
124
Overheads for Computers as Components 2nd ed.
Programming model Programming model: registers visible to the programmer. Some registers are not visible (IR). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
125
Multiple implementations
Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
126
Assembly language
One-to-one with instructions (more or less). Basic features: One instruction per line. Labels provide names for addresses (usually in first column). Instructions often start in later columns. Comments run to end of line.
127
ARM assembly language example
label1  ADR r4,c
        LDR r0,[r4]    ; a comment
        ADR r4,d
        LDR r1,[r4]
        SUB r0,r0,r1   ; comment
128
Overheads for Computers as Components 2nd ed.
Pseudo-ops Some assembler directives don’t correspond directly to instructions: Define current address. Reserve storage. Constants. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
129
Overheads for Computers as Components 2nd ed.
CPUs Input and output. Supervisor mode, exceptions, traps. Co-processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
130
Overheads for Computers as Components 2nd ed.
I/O devices Usually includes some non-digital component. Typical digital interface to CPU: status reg CPU mechanism data reg © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
131
Overheads for Computers as Components 2nd ed.
Application: 8251 UART Universal asynchronous receiver transmitter (UART) : provides serial communication. 8251 functions are integrated into standard PC interface chip. Allows many communication parameters to be programmed. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
132
Overheads for Computers as Components 2nd ed.
Serial communication Characters are transmitted separately: no char bit 0 bit 1 bit n-1 ... start stop time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
133
Serial communication parameters
Baud (bit) rate. Number of bits per character. Parity/no parity. Even/odd parity. Length of stop bit (1, 1.5, 2 bits). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
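The parity parameter above can be illustrated with a short sketch. This is only an assumed helper, not 8251 driver code: the function name parity_bit and its arguments are hypothetical, and it simply computes the bit a transmitter would append for even or odd parity.

```c
/* Hypothetical helper: returns the parity bit for the low `nbits` bits of `data`.
 * odd == 0 selects even parity (total number of 1s, including the parity bit, is even);
 * odd != 0 selects odd parity. */
unsigned parity_bit(unsigned data, unsigned nbits, int odd) {
    unsigned ones = 0;
    for (unsigned i = 0; i < nbits; i++)
        ones += (data >> i) & 1u;          /* count the 1 bits in the character */
    return (ones & 1u) ^ (odd ? 1u : 0u);  /* 1 if a parity bit is needed to fix the count */
}
```

For a 7-bit character 0x41 (two 1 bits), even parity needs no extra 1, so the helper returns 0; with odd parity it returns 1.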
134
Overheads for Computers as Components 2nd ed.
8251 CPU interface 8251 status (8 bit) CPU xmit/ rcv data (8 bit) serial port © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
135
Overheads for Computers as Components 2nd ed.
Programming I/O Two types of instructions can support I/O: special-purpose I/O instructions; memory-mapped load/store instructions. Intel x86 provides in, out instructions. Most other CPUs use memory-mapped I/O. I/O instructions do not preclude memory-mapped I/O. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
136
ARM memory-mapped I/O
Define location for device:
DEV1    EQU 0x1000
Read/write code:
        LDR r1,=DEV1   ; set up device adrs
        LDR r0,[r1]    ; read DEV1
        MOV r0,#8      ; set up value to write
        STR r0,[r1]    ; write value to device
137
Peek and poke
Traditional HLL interfaces:
int peek(char *location) {
    return *location;
}
void poke(char *location, char newval) {
    (*location) = newval;
}
138
Busy/wait output
Simplest way to program device. Use instructions to test when device is ready.
current_char = mystring;
while (*current_char != '\0') {
    poke(OUT_CHAR,*current_char);
    while (peek(OUT_STATUS) != 0);
    current_char++;
}
139
Simultaneous busy/wait input and output
while (TRUE) {
    /* read */
    while (peek(IN_STATUS) == 0);
    achar = (char)peek(IN_DATA);
    /* write */
    poke(OUT_DATA,achar);
    poke(OUT_STATUS,1);
    while (peek(OUT_STATUS) != 0);
}
140
Overheads for Computers as Components 2nd ed.
Interrupt I/O Busy/wait is very inefficient. CPU can’t do other work while testing device. Hard to do simultaneous I/O. Interrupts allow a device to change the flow of control in the CPU. Causes subroutine call to handle device. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
141
Overheads for Computers as Components 2nd ed.
Interrupt interface intr request status reg CPU intr ack mechanism IR PC data/address data reg © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
142
Overheads for Computers as Components 2nd ed.
Interrupt behavior Based on subroutine call mechanism. Interrupt forces next instruction to be a subroutine call to a predetermined location. Return address is saved to resume executing foreground program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
143
Interrupt physical interface
CPU and device are connected by CPU bus. CPU and device handshake: device asserts interrupt request; CPU asserts interrupt acknowledge when it can handle the interrupt. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
144
Example: character I/O handlers
void input_handler() {
    achar = peek(IN_DATA);
    gotchar = TRUE;
    poke(IN_STATUS,0);
}
void output_handler() {
}
145
Example: interrupt-driven main program
while (TRUE) {
    if (gotchar) {
        poke(OUT_DATA,achar);
        poke(OUT_STATUS,1);
        gotchar = FALSE;
    }
}
146
Example: interrupt I/O with buffers
Queue for characters: head tail a head tail © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
147
Buffer-based input handler
void input_handler() {
    char achar;
    if (full_buffer())
        error = 1;
    else {
        achar = peek(IN_DATA);
        add_char(achar);
    }
    poke(IN_STATUS,0);
    if (nchars == 1) {
        poke(OUT_DATA,remove_char());
        poke(OUT_STATUS,1);
    }
}
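The handler above calls full_buffer(), add_char(), and remove_char() and reads nchars, none of which the slides define. One possible way to write them is a fixed-size circular queue. This is a sketch under assumptions: BUF_SIZE, empty_buffer(), and the head/tail indexing are hypothetical choices.

```c
#include <stdbool.h>

#define BUF_SIZE 8             /* hypothetical capacity, not from the slides */

static char buffer[BUF_SIZE];  /* storage for queued characters */
static int head = 0;           /* index of the oldest queued character */
static int tail = 0;           /* index where the next character is stored */
static int nchars = 0;         /* number of characters currently queued */

bool full_buffer(void)  { return nchars == BUF_SIZE; }
bool empty_buffer(void) { return nchars == 0; }

/* Append a character; the caller is expected to check full_buffer() first. */
void add_char(char c) {
    buffer[tail] = c;
    tail = (tail + 1) % BUF_SIZE;   /* wrap around at the end of the array */
    nchars++;
}

/* Remove and return the oldest character; the caller checks empty_buffer() first. */
char remove_char(void) {
    char c = buffer[head];
    head = (head + 1) % BUF_SIZE;
    nchars--;
    return c;
}
```

On real hardware, state shared between a handler and the foreground program would normally also be declared volatile and updated with interrupts masked; the sketch leaves that out to keep the queue logic visible.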
148
Overheads for Computers as Components 2nd ed.
I/O sequence diagram :foreground :input :output :queue empty a empty b bc c © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
149
Debugging interrupt code
What if the handler forgets to save and restore the registers it changes? Foreground program can exhibit mysterious bugs. Bugs will be hard to repeat because they depend on interrupt timing.
150
Priorities and vectors
Two mechanisms allow us to make interrupts more specific: Priorities determine what interrupt gets CPU first. Vectors determine what code is called for each type of interrupt. Mechanisms are orthogonal: most CPUs provide both. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
151
Prioritized interrupts
device 1 device 2 device n interrupt acknowledge L1 L2 .. Ln CPU © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
152
Interrupt prioritization
Masking: interrupt with priority lower than current priority is not recognized until pending interrupt is complete. Non-maskable interrupt (NMI): highest-priority, never masked. Often used for power-down. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
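The masking rule can be written as a one-line test. The sketch below assumes that larger numbers mean higher priority and uses a hypothetical NMI_LEVEL constant to stand for the non-maskable interrupt; neither convention comes from the slides.

```c
#include <stdbool.h>

#define NMI_LEVEL 255u   /* hypothetical encoding for the non-maskable interrupt */

/* Returns true if a request at priority `request` should be recognized
 * while the CPU is running at priority `current`.
 * Assumption: a larger number means a higher priority. */
bool recognize_interrupt(unsigned request, unsigned current) {
    if (request == NMI_LEVEL)
        return true;           /* NMI is never masked */
    return request > current;  /* lower or equal priority stays pending */
}
```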
153
Example: Prioritized I/O
:interrupts :foreground :A :B :C B C A A,B © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
154
Overheads for Computers as Components 2nd ed.
Interrupt vectors Allow different devices to be handled by different code. Interrupt vector table: Interrupt vector table head handler 0 handler 1 handler 2 handler 3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
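In software terms, an interrupt vector table behaves like an array of function pointers indexed by the vector the device supplies. The sketch below is only an illustration: the handler bodies, the last_handled trace variable, and the dispatch() function are hypothetical, and a real CPU performs this lookup in hardware.

```c
#include <stddef.h>

#define NUM_VECTORS 4                  /* the slide draws handlers 0 through 3 */

typedef void (*handler_t)(void);

static int last_handled = -1;          /* hypothetical trace of which handler ran */

static void handler0(void) { last_handled = 0; }
static void handler1(void) { last_handled = 1; }
static void handler2(void) { last_handled = 2; }
static void handler3(void) { last_handled = 3; }

/* The vector table: entry i holds the address of the handler for vector i. */
static handler_t vector_table[NUM_VECTORS] = {
    handler0, handler1, handler2, handler3
};

/* Call the handler selected by `vector`.
 * Returns 0 on success, -1 if the vector is out of range or has no handler. */
int dispatch(unsigned vector) {
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return -1;
    vector_table[vector]();
    return 0;
}
```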
155
Interrupt vector acquisition
:CPU :device receive request receive ack receive vector © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
156
Generic interrupt mechanism
Assume priority selection is handled before this point.
intr? N: continue execution. Y: check priority.
intr priority > current priority? N: ignore. Y: ack.
vector? Y: call table[vector]. N: check timeout.
timeout? Y: bus error. N: keep waiting for vector.
157
Overheads for Computers as Components 2nd ed.
Interrupt sequence CPU acknowledges request. Device sends vector. CPU calls handler. Software processes request. CPU restores state to foreground program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
158
Sources of interrupt overhead
Handler execution time. Interrupt mechanism overhead. Register save/restore. Pipeline-related penalties. Cache-related penalties. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
159
Overheads for Computers as Components 2nd ed.
ARM interrupts ARM7 supports two types of interrupts: Fast interrupt requests (FIQs). Interrupt requests (IRQs). Interrupt table starts at location 0. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
160
ARM interrupt procedure
CPU actions: Save PC. Copy CPSR to SPSR. Force bits in CPSR to record interrupt. Force PC to vector. Handler responsibilities: Restore proper PC. Restore CPSR from SPSR. Clear interrupt disable flags. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
161
Overheads for Computers as Components 2nd ed.
ARM interrupt latency Worst-case latency to respond to interrupt is 27 cycles: Two cycles to synchronize external request. Up to 20 cycles to complete current instruction. Three cycles for data abort. Two cycles to enter interrupt handling state. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
162
Overheads for Computers as Components 2nd ed.
C55x interrupts Latency is between 7 and 13 cycles. Maskable interrupt sequence: Interrupt flag register is set. Interrupt enable register is checked. Interrupt mask register is checked. Interrupt flag register is cleared. Appropriate registers are saved. INTM set to 1, DBGM set to 1, EALLOW set to 0. Branch to ISR. Two styles of return: fast and slow. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
163
Overheads for Computers as Components 2nd ed.
Supervisor mode May want to provide protective barriers between programs. Avoid memory corruption. Need supervisor mode to manage the various programs. SHARC does not have a supervisor mode. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
164
Overheads for Computers as Components 2nd ed.
ARM supervisor mode Use SWI instruction to enter supervisor mode, similar to subroutine: SWI CODE_1 Sets PC to 0x08. Argument to SWI is passed to supervisor mode code. Saves CPSR in SPSR. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
165
Overheads for Computers as Components 2nd ed.
Exception Exception: internally detected error. Exceptions are synchronous with instructions but unpredictable. Build exception mechanism on top of interrupt mechanism. Exceptions are usually prioritized and vectorized. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
166
Overheads for Computers as Components 2nd ed.
Trap Trap (software interrupt): an exception generated by an instruction. Call supervisor mode. ARM uses SWI instruction for traps. SHARC offers three levels of software interrupts. Called by setting bits in IRPTL register. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
167
Overheads for Computers as Components 2nd ed.
Co-processor Co-processor: added function unit that is called by instruction. Floating-point units are often structured as co-processors. ARM allows up to 16 designer-selected co-processors. Floating-point co-processor uses units 1, 2. C55x uses co-processors as well. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
168
C55x image/video hardware extensions
Available in 5509 and 5510. Equivalent C-callable functions for other devices. Available extensions: DCT/IDCT. Pixel interpolation Motion estimation. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
169
Overheads for Computers as Components 2nd ed.
DCT/IDCT 2-D DCT/IDCT is computed from two 1-D DCT/IDCT. Put data in different banks to maximize throughput. block Column DCT interim DCT Row DCT © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
170
C55 DCT/IDCT coprocessor extensions
Load, compute, transfer to accumulators: ACy=copr(k8,ACx,Xmem,Ymem) Compute, transfer, mem write: ACy=copr(k8,ACx,ACy), Lmem=ACz Special: ACy=copr(k8,ACx,ACy) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
171
Software pipelined load/compute/store for DCT
Iteration i-1 Iteration i Dual_load 8 compute 4 Long_store 4 empty 3 empty Iteration i+1 Dual_load 8 compute 4 Long_store 4 empty 3 empty Dual_load op_i(0), load_i+1(0,1) op_i(1), store_i-1(0,1) op_i(2), store_i-1(2,3) op_i(2), store_i-1(4,5) op_i(2), store_i-1(6,7) op_i(2), load_i+1(2,3) … 4 empty 3 Dual_load 8 compute empty 4 Long_store © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
172
Overheads for Computers as Components 2nd ed.
C55 motion estimation Search strategy: Full vs. non-full. Accuracy: Full-pixel vs. half-pixel. Number of returned motion vectors: 1 (one 16x16) vs. 4 (four 8x8). Algorithms: 3-step algorithm (distance 4,2,1). 4-step algorithm (distance 8,4,2,1). 4-step with half-pixel refinement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
173
Four-step motion estimation breakdown
for (i=0; i<4; i++) {
    compute 3 upper differences for d[i];
    compute 3 middle differences for d[i];
    compute 3 lower differences for d[i];
    compute minimum value;
    move to next d;
}
174
C55 motion estimation accelerator
Includes 3 16-bit pixel data paths, 3 16-bit absolute differences (ADs). Basic operation: [ACx,ACy] = copr(k8,ACx,ACy,Xmem,Ymem,Coeff)
k8 = control bits (enable AD units, etc.)
ACx, ACy = accumulated absolute differences
Xmem, Ymem = pointers to odd, even lines of the search window
Coeff = pointer to two adjacent pixels from reference window
175
Overheads for Computers as Components 2nd ed.
C55 pixel interpolation Given four pixels A, B, C, D, interpolate three half-pixels: A U M R B C D © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
176
Pixel interpolation coprocessor operations
Load pixels and compute: ACy=copr(k8,ACx,Lmem)
Load pixels, compute, and store: ACy=copr(k8,ACx,Lmem) || Lmem=ACz
177
Overheads for Computers as Components 2nd ed.
CPUs Caches. Memory management. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
178
Overheads for Computers as Components 2nd ed.
Caches and CPUs address data cache main memory CPU controller cache address data data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
179
Overheads for Computers as Components 2nd ed.
Cache operation Many main memory locations are mapped onto one cache entry. May have caches for: instructions; data; data + instructions (unified). Memory access time is no longer deterministic. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
180
Overheads for Computers as Components 2nd ed.
Terms Cache hit: required location is in cache. Cache miss: required location is not in cache. Working set: set of locations used by program in a time interval. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
181
Overheads for Computers as Components 2nd ed.
Types of misses Compulsory (cold): location has never been accessed. Capacity: working set is too large. Conflict: multiple locations in working set map to same cache entry. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
182
Memory system performance
h = cache hit rate. $t_{cache}$ = cache access time, $t_{main}$ = main memory access time. Average memory access time: $t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
183
Multiple levels of cache
L2 cache CPU L1 cache © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
184
Multi-level cache access time
$h_1$ = L1 cache hit rate. $h_2$ = rate for miss on L1, hit on L2. Average memory access time: $t_{av} = h_1 t_{L1} + h_2 t_{L2} + (1 - h_1 - h_2)\,t_{main}$. The three weights sum to 1, so every access is counted exactly once.
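The access-time formulas are easy to evaluate numerically. The sketch below assumes the definition given on the slide, where h2 is the fraction of all accesses that miss in L1 and hit in L2, so the fraction going to main memory is 1 - h1 - h2. The function names are hypothetical.

```c
/* Single-level cache: t_av = h * t_cache + (1 - h) * t_main */
double avg_access_time(double h, double t_cache, double t_main) {
    return h * t_cache + (1.0 - h) * t_main;
}

/* Two-level cache, assuming h2 = fraction of accesses that miss L1 and hit L2:
 * t_av = h1 * t_L1 + h2 * t_L2 + (1 - h1 - h2) * t_main */
double avg_access_time_2level(double h1, double h2,
                              double t_l1, double t_l2, double t_main) {
    return h1 * t_l1 + h2 * t_l2 + (1.0 - h1 - h2) * t_main;
}
```

For example, with h1 = 0.5, h2 = 0.25 and access times of 1, 4, and 16 cycles (illustrative numbers only), the average is 0.5 + 1 + 4 = 5.5 cycles.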
185
Overheads for Computers as Components 2nd ed.
Replacement policies Replacement policy: strategy for choosing which cache entry to throw out to make room for a new memory location. Two popular strategies: Random. Least-recently used (LRU). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
186
Overheads for Computers as Components 2nd ed.
Cache organizations Fully-associative: any memory location can be stored anywhere in the cache (almost never implemented). Direct-mapped: each memory location maps onto exactly one cache entry. N-way set-associative: each memory location can go into one of n sets. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
187
Cache performance benefits
Keep frequently-accessed locations in fast cache. Cache retrieves more than one word at a time. Sequential accesses are faster after first access. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
188
Overheads for Computers as Components 2nd ed.
Direct-mapped cache valid tag data 1 0xabcd byte byte byte ... byte cache block tag index offset = hit value © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
189
Overheads for Computers as Components 2nd ed.
Write operations Write-through: immediately copy write to main memory. Write-back: write to main memory only when location is removed from cache. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
190
Direct-mapped cache locations
Many locations map onto the same cache block. Conflict misses are easy to generate: Array a[] uses locations 0, 1, 2, … Array b[] uses locations 1024, 1025, 1026, … Operation a[i] + b[i] generates conflict misses. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
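The conflict can be reproduced with a small simulation. The sketch below assumes a direct-mapped cache of 1024 one-word blocks, so that locations 0 and 1024 share a block. The cache size, the two-pass loop, and all names are assumptions made for illustration.

```c
#include <stdbool.h>

#define NUM_BLOCKS 1024u   /* assumed cache size: 1024 one-word blocks */

typedef struct {
    bool valid;            /* does this block hold anything yet? */
    unsigned tag;          /* which memory location currently occupies the block */
} cache_line;

static cache_line cache[NUM_BLOCKS];
static unsigned misses = 0;

/* Empty the cache and clear the miss counter. */
void cache_reset(void) {
    for (unsigned i = 0; i < NUM_BLOCKS; i++)
        cache[i].valid = false;
    misses = 0;
}

/* Simulate one access to location `addr`; returns true on a hit. */
bool cache_access(unsigned addr) {
    unsigned index = addr % NUM_BLOCKS;   /* block selected by the low address bits */
    unsigned tag = addr / NUM_BLOCKS;     /* remaining bits identify the location */
    if (cache[index].valid && cache[index].tag == tag)
        return true;
    cache[index].valid = true;            /* miss: replace the block's contents */
    cache[index].tag = tag;
    misses++;
    return false;
}

/* Read a[i] and b[i] for i = 0..n-1, twice over, and return the total misses. */
unsigned run_sum(unsigned a_base, unsigned b_base, unsigned n) {
    cache_reset();
    for (int pass = 0; pass < 2; pass++) {
        for (unsigned i = 0; i < n; i++) {
            cache_access(a_base + i);
            cache_access(b_base + i);
        }
    }
    return misses;
}
```

With a[] at 0 and b[] at 1024, every access evicts the word the other array just loaded, so the second pass misses as often as the first. Moving b[] to a base that maps to different blocks (1040 in the test values) leaves only the compulsory misses of the first pass.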
191
Set-associative cache
A set of direct-mapped caches: Set 1 Set 2 Set n ... hit data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
192
Example: direct-mapped vs. set-associative
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
193
Direct-mapped cache behavior
After 001 access: block tag data After 010 access: block tag data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
194
Direct-mapped cache behavior, cont’d.
After 011 access: block tag data After 100 access: block tag data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
195
Direct-mapped cache behavior, cont’d.
After 101 access: block tag data After 111 access: block tag data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
196
2-way set-associative cache behavior
Final state of cache (twice as big as direct-mapped): set blk 0 tag blk 0 data blk 1 tag blk 1 data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
197
2-way set-associative cache behavior
Final state of cache (same size as direct-mapped): set blk 0 tag blk 0 data blk 1 tag blk 1 data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
198
Overheads for Computers as Components 2nd ed.
Example caches StrongARM: 16 Kbyte, 32-way, 32-byte block instruction cache. 16 Kbyte, 32-way, 32-byte block data cache (write-back). SHARC: 32-instruction, 2-way instruction cache. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
199
Memory management units
Memory management unit (MMU) translates addresses: main memory logical address memory management unit physical address CPU © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
200
Memory management tasks
Allows programs to move in physical memory during execution. Allows virtual memory: memory images kept in secondary storage; images returned to main memory on demand during execution. Page fault: request for location not resident in memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
201
Overheads for Computers as Components 2nd ed.
Address translation Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses. Two basic schemes: segmented; paged. Segmentation and paging can be combined (x86). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
202
Overheads for Computers as Components 2nd ed.
Segments and pages memory page 1 segment 1 page 2 segment 2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
203
Segment address translation
segment base address logical address + segment lower bound range error range check segment upper bound physical address © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
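The segment translation in the diagram amounts to an add followed by a bounds check. A minimal sketch, assuming the bounds are expressed as physical addresses; the struct layout and function name are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;    /* segment base address */
    uint32_t lower;   /* segment lower bound (assumed to be a physical address) */
    uint32_t upper;   /* segment upper bound (assumed to be a physical address) */
} segment;

/* Adds the logical address to the segment base and range-checks the result.
 * Returns true and stores the physical address in *phys on success;
 * returns false (range error) if the result falls outside the segment. */
bool seg_translate(const segment *s, uint32_t logical, uint32_t *phys) {
    uint32_t addr = s->base + logical;
    if (addr < s->lower || addr > s->upper)
        return false;
    *phys = addr;
    return true;
}
```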
204
Page address translation
offset page i base concatenate page offset © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
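Page translation replaces the page number with that page's base address and concatenates the unchanged offset. The sketch below assumes 4-kbyte pages and a tiny flat table with made-up base addresses; PAGE_BITS, NUM_PAGES, and the table contents are all hypothetical.

```c
#include <stdint.h>

#define PAGE_BITS 12u   /* assumed page size: 4 kbytes */
#define NUM_PAGES 4u    /* hypothetical tiny page table */

/* Made-up physical base address for each logical page (all page-aligned). */
static const uint32_t page_base[NUM_PAGES] = {
    0x00080000u, 0x00013000u, 0x00042000u, 0x00007000u
};

/* Splits the logical address into page number and offset, looks up the page
 * base, and concatenates base and offset.
 * Returns 0 on success, -1 if the page number is not in the table. */
int page_translate(uint32_t logical, uint32_t *phys) {
    uint32_t page = logical >> PAGE_BITS;
    uint32_t offset = logical & ((1u << PAGE_BITS) - 1u);
    if (page >= NUM_PAGES)
        return -1;
    *phys = page_base[page] | offset;   /* base is page-aligned, so OR concatenates */
    return 0;
}
```

Because pages have a fixed size and aligned bases, no addition or bounds check on the offset is needed, unlike the segment case.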
205
Page table organizations
descriptor page descriptor flat tree © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
206
Caching address translations
Large translation tables require main memory access. TLB: cache for address translation. Typically small. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
207
Overheads for Computers as Components 2nd ed.
ARM memory management Memory region types: section: 1 Mbyte block; large page: 64 kbytes; small page: 4 kbytes. An address is marked as section-mapped or page-mapped. Two-level translation scheme. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
208
ARM address translation
Translation table base register 1st index 2nd index offset 1st level table descriptor concatenate concatenate 2nd level table descriptor physical address © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
209
Overheads for Computers as Components 2nd ed.
CPUs CPU performance CPU power consumption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
210
Elements of CPU performance
Cycle time. CPU pipeline. Memory system. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
211
Overheads for Computers as Components 2nd ed.
Pipelining Several instructions are executed simultaneously at different stages of completion. Various conditions can cause pipeline bubbles that reduce utilization: branches; memory system delays; etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
212
Overheads for Computers as Components 2nd ed.
Performance measures Latency: time it takes for an instruction to get through the pipeline. Throughput: number of instructions executed per time period. Pipelining increases throughput without reducing latency. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
213
Overheads for Computers as Components 2nd ed.
ARM7 pipeline ARM 7 has 3-stage pipe: fetch instruction from memory; decode opcode and operands; execute. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
214
ARM pipeline execution
time            1       2        3         4         5
add r0,r1,#5    fetch   decode   execute
sub r2,r3,r6            fetch    decode    execute
cmp r2,#3                        fetch     decode    execute
215
Overheads for Computers as Components 2nd ed.
Pipeline stalls If every step cannot be completed in the same amount of time, pipeline stalls. Bubbles introduced by stall increase latency, reduce throughput. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
216
ARM multi-cycle LDMIA instruction
ldmia r0,{r2,r3}: fetch, decode, ex ld r2, ex ld r3
sub r2,r3,r6: fetch, decode, ex sub
cmp r2,#3: fetch, decode, ex cmp
(time runs left to right; the second execute cycle of ldmia delays the instructions behind it)
217
Overheads for Computers as Components 2nd ed.
Control stalls Branches often introduce stalls (branch penalty). Stall time may depend on whether branch is taken. May have to squash instructions that already started executing. Don’t know what to fetch until condition is evaluated. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
218
Overheads for Computers as Components 2nd ed.
ARM pipelined branch fetch decode ex bne bne foo sub r2,r3,r6 foo add r0,r1,r2 ex bne fetch decode ex add time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
219
Delayed branch
To increase pipeline efficiency, a delayed branch mechanism requires that the n instructions after the branch are always executed, whether the branch is taken or not. SHARC supports delayed and non-delayed branches. Specified by bit in branch instruction. 2 instruction branch delay slot.
220
Example: ARM execution time
Determine execution time of FIR filter: for (i=0; i<N; i++) f = f + c[i]*x[i]; Only branch in loop test may take more than one cycle. BLT loop takes 1 cycle best case, 3 worst case. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
221
FIR filter ARM code
; loop initiation code
        MOV r0,#0        ; use r0 for i, set to 0
        MOV r8,#0        ; use a separate index for arrays
        ADR r2,N         ; get address for N
        LDR r1,[r2]      ; get value of N
        MOV r2,#0        ; use r2 for f, set to 0
        ADR r3,c         ; load r3 with address of base of c
        ADR r5,x         ; load r5 with address of base of x
; loop body
loop    LDR r4,[r3,r8]   ; get value of c[i]
        LDR r6,[r5,r8]   ; get value of x[i]
        MUL r4,r4,r6     ; compute c[i]*x[i]
        ADD r2,r2,r4     ; add into running sum
; update loop counter and array index
        ADD r8,r8,#4     ; add one word offset to array index
        ADD r0,r0,#1     ; add 1 to i
; test for exit
        CMP r0,r1
        BLT loop         ; if i < N, continue loop
loopend ...
222
FIR filter performance by block
Block            Variable   # instructions   # cycles
Initialization   tinit      7                7
Body             tbody      4                4
Update           tupdate    2                2
Test             ttest      2                [2,4]
$t_{loop} = t_{init} + N(t_{body} + t_{update}) + (N-1)\,t_{test,worst} + t_{test,best}$
The worst case for the test is when the loop test succeeds (branch taken). The best case is when the loop test fails.
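The loop-time formula can be evaluated directly. The sketch below reads the per-block counts in the table as cycle counts and uses the [2,4] range for the test block; the enum names and the function fir_loop_cycles are hypothetical.

```c
/* Cycle counts taken from the table above (read here as cycles per block). */
enum {
    T_INIT = 7,        /* loop initiation code */
    T_BODY = 4,        /* loop body */
    T_UPDATE = 2,      /* counter and index update */
    T_TEST_BEST = 2,   /* loop test fails (final iteration, branch not taken) */
    T_TEST_WORST = 4   /* loop test succeeds (branch taken) */
};

/* t_loop = t_init + N*(t_body + t_update) + (N-1)*t_test_worst + t_test_best
 * Assumes n >= 1, i.e. the loop body runs at least once. */
unsigned fir_loop_cycles(unsigned n) {
    return T_INIT + n * (T_BODY + T_UPDATE) + (n - 1u) * T_TEST_WORST + T_TEST_BEST;
}
```

For N = 10 this gives 7 + 60 + 36 + 2 = 105 cycles: the branch is taken nine times at the worst-case cost and falls through once at the best-case cost.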
223
Overheads for Computers as Components 2nd ed.
C55x pipeline C55x has 7-stage pipe: fetch; decode; address: computes data/branch addresses; access 1: reads data; access 2: finishes data read; Read stage: puts operands on internal busses; execute. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
224
Overheads for Computers as Components 2nd ed.
C55x organization C, D busses Dual operand read D bus Single operand read B bus Dual-multiply coefficient 3 data read busses 16 Data read from memory 3 data read address busses 24 program address bus 24 Instruction fetch program read bus Instruction unit Program flow unit Address unit Data unit 32 Writes 2 data write busses 16 2 data write address busses 24 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
225
Overheads for Computers as Components 2nd ed.
C55x pipeline hazards Processor structure: Three computation units. 14 operators. Can perform two operations per instruction. Some combinations of operators are not legal. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
226
Overheads for Computers as Components 2nd ed.
C55x hazards A-unit ALU/A-unit ALU. A-unit swap/A-unit swap. D-unit ALU,shifter,MAC/D-unit ALU,shifter,MAC D-unit shifter/D-unit shift, store D-unit shift, store/D-unit shift, store D-unit swap/D-unit swap P-unit control/P-unit control © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
227
Memory system performance
Caches introduce indeterminacy in execution time. Depends on order of execution. Cache miss penalty: added time due to a cache miss. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
228
Overheads for Computers as Components 2nd ed.
Types of cache misses Compulsory miss: location has not been referenced before. Conflict miss: two locations are fighting for the same block. Capacity miss: working set is too large. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
229
Overheads for Computers as Components 2nd ed.
CPU power consumption Most modern CPUs are designed with power consumption in mind to some degree. Power vs. energy: heat depends on power consumption; battery life depends on energy consumption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
230
CMOS power consumption
Voltage drops: power consumption proportional to $V^2$. Toggling: more activity means more power. Leakage: basic circuit characteristics; can be eliminated by disconnecting power.
231
CPU power-saving strategies
Reduce power supply voltage. Run at lower clock frequency. Disable function units with control signals when not in use. Disconnect parts from power supply when not in use. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
232
C55x low power features
Parallel execution units allow longer idle shutdown times. Multiple data widths: 16-bit ALU vs. 40-bit ALU. Instruction cache minimizes main memory accesses. Power management: Function unit idle detection. Memory idle detection. User-configurable IDLE domains allow programmer control of what hardware is shut down.
233
Power management styles
Static power management: does not depend on CPU activity. Example: user-activated power-down mode. Dynamic power management: based on CPU activity. Example: turning off unused function units.
234
Application: PowerPC 603 energy features
Provides doze, nap, sleep modes. Dynamic power management features: Uses static logic. Can shut down unused execution units. Cache organized into subarrays to minimize amount of active circuitry. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
235
PowerPC 603 activity
Percentage of time units are idle for SPEC integer/floating-point:
unit              Specint92   Specfp92
D cache           29%         28%
I cache           29%         17%
load/store        35%         17%
fixed-point       38%         76%
floating-point    99%         30%
system register   89%         97%
236
Overheads for Computers as Components 2nd ed.
Power-down costs Going into a power-down mode costs: time; energy. Must determine if going into mode is worthwhile. Can model CPU power states with power state machine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
237
Application: StrongARM SA-1100 power saving
Processor takes two supplies: VDD is main 3.3V supply. VDDX is 1.5V. Three power modes: Run: normal operation. Idle: stops CPU clock, with logic still powered. Sleep: shuts off most of chip activity; 3 steps, each about 30 ms; wakeup takes > 10 ms. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
238
SA-1100 power state machine
[Figure: power state machine with states run (Prun = 400 mW), idle (Pidle = 50 mW), and sleep (Psleep = 0.16 mW). Transition times shown in the figure: 10 ms, 160 ms, 90 ms, 10 ms, 90 ms.]
239
Overheads for Computers as Components 2nd ed.
CPUs Example: data compressor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
240
Overheads for Computers as Components 2nd ed.
Goals Compress data transmitted over serial line. Receives byte-size input symbols. Produces output symbols packed into bytes. Will build software module only here. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
241
Collaboration diagram for compressor
[Figure: collaboration diagram. :input sends message 1..n: input symbols to :data compressor, which sends message 1..m: packed output symbols to :output.]
242
Overheads for Computers as Components 2nd ed.
Huffman coding Early statistical text compression algorithm. Select non-uniform size codes. Use shorter codes for more common symbols. Use longer codes for less common symbols. To allow decoding, codes must have unique prefixes. No code can be a prefix of a longer valid code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
243
Huffman example
character   P
a           .45
b           .24
c           .11
d           .08
e           .07
f           .05

[Figure: Huffman tree over these characters; internal node probabilities are .12, .19, .31, .55, and 1 at the root.]
244
Example Huffman code
Read code from root to leaves:

a   1
b   01
c   0000
d   0001
e   0010
f   0011
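The example code can be checked mechanically. The sketch below stores the six codes and probabilities from the example as strings and doubles (a representation chosen here for readability, not the packed form a real coder would use) and verifies the prefix property and the average code length.

```c
#include <stdbool.h>
#include <string.h>

#define NSYMS 6

/* Codes and probabilities for a..f, copied from the example. */
static const char  *huff_codes[NSYMS] = { "1", "01", "0000", "0001", "0010", "0011" };
static const double huff_probs[NSYMS] = { 0.45, 0.24, 0.11, 0.08, 0.07, 0.05 };

/* True if no code is a prefix of another code. */
bool is_prefix_free(void) {
    for (int i = 0; i < NSYMS; i++)
        for (int j = 0; j < NSYMS; j++)
            if (i != j &&
                strncmp(huff_codes[i], huff_codes[j], strlen(huff_codes[i])) == 0)
                return false;
    return true;
}

/* Expected number of bits per symbol: sum of P(symbol) * code length. */
double average_code_length(void) {
    double sum = 0.0;
    for (int i = 0; i < NSYMS; i++)
        sum += huff_probs[i] * (double)strlen(huff_codes[i]);
    return sum;
}
```

For this table the average works out to 0.45(1) + 0.24(2) + 0.31(4) = 2.17 bits per symbol, compared with 3 bits for a fixed-length code over six symbols.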
245
Huffman coder requirements table
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
246
Building a specification
Collaboration diagram shows only steady-state input/output. A real system must: Accept an encoding table. Allow a system reset that flushes the compression buffer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
247
data-compressor class
Attributes: buffer: data-buffer; table: symbol-table; current-bit: integer.
Operations: encode(): boolean, data-buffer; flush(); new-symbol-table().
248
data-compressor behaviors
encode: Takes one-byte input, generates packed encoded symbols and a Boolean indicating whether the buffer is full. new-symbol-table: installs new symbol table in object, throws away old table. flush: returns current state of buffer, including number of valid bits in buffer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
249
Auxiliary classes
data-buffer: databuf[databuflen] : character; len : integer; insert(); length() : integer.
symbol-table: symbols[nsymbols] : data-buffer; len : integer; value() : symbol; load().
250
Auxiliary class roles
data-buffer holds both packed and unpacked symbols. Longest Huffman code for 8-bit inputs is 256 bits. symbol-table indexes the encoded version of each symbol. load() puts data in a new symbol table.
251
Overheads for Computers as Components 2nd ed.
Class relationships data-compressor 1 1 1 1 data-buffer symbol-table © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
252
Encode behavior
[Figure: activity diagram. Input symbol goes to encode, then the test "buffer filled?". T: create new buffer, add to buffers, return true. F: add to buffer, return false.]
253
Insert behavior
[Figure: activity diagram. Input symbol goes to the test "fills buffer?". F: pack into this buffer. T: pack bottom bits into this buffer, top bits into overflow buffer. Then update length.]
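The packing step of insert can be sketched in C as appending bits to a byte array. This is a simplified illustration: the buffer size, the MSB-first bit order, and the function names are assumptions, and instead of writing to an overflow buffer it only reports how many bits did not fit.

```c
#include <string.h>

#define DATABUFLEN 32  /* buffer size in bytes (assumed) */

typedef struct {
    unsigned char databuf[DATABUFLEN];
    int len;  /* number of valid bits in databuf */
} data_buffer;

void buffer_init(data_buffer *b) {
    memset(b->databuf, 0, sizeof b->databuf);
    b->len = 0;
}

/* Append one bit, filling each byte from its most significant bit down.
   Returns 0 on success, -1 if the buffer is already full. */
int buffer_put_bit(data_buffer *b, int bit) {
    if (b->len >= DATABUFLEN * 8)
        return -1;
    if (bit)
        b->databuf[b->len / 8] |= (unsigned char)(0x80u >> (b->len % 8));
    b->len++;
    return 0;
}

/* Append a code written as a string of '0' and '1' characters.
   Returns the number of bits that did not fit (0 if the whole code fit). */
int buffer_put_code(data_buffer *b, const char *code) {
    size_t n = strlen(code);
    for (size_t i = 0; i < n; i++)
        if (buffer_put_bit(b, code[i] == '1') != 0)
            return (int)(n - i);
    return 0;
}
```

A full implementation would hand the leftover bits to a second buffer, as the insert behavior describes, rather than returning a count.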
254
Overheads for Computers as Components 2nd ed.
Program design In an object-oriented language, we can reflect the UML specification in the code more directly. In a non-object-oriented language, we must either: add code to provide object-oriented features; diverge from the specification structure. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
255
C++ classes
class data_buffer {
    char databuf[databuflen];
    int len;
    int length_in_chars() { return len/bitsperbyte; }
public:
    void insert(data_buffer,data_buffer&);
    int length() { return len; }
    int length_in_bytes() { return (int)ceil(len/8.0); }
    int initialize();
    ...
256
C++ classes, cont’d.
class data_compressor {
    data_buffer buffer;
    int current_bit;
    symbol_table table;
public:
    boolean encode(char,data_buffer&);
    void new_symbol_table(symbol_table);
    int flush(data_buffer&);
    data_compressor();
    ~data_compressor();
};
257
C code
struct data_compressor_struct {
    data_buffer buffer;
    int current_bit;
    sym_table table;
};
typedef struct data_compressor_struct data_compressor, *data_compressor_ptr;

boolean data_compressor_encode(data_compressor_ptr mycmptrs, char isymbol, data_buffer *fullbuf) ...
258
Overheads for Computers as Components 2nd ed.
Testing Test by encoding, then decoding: symbol table result input symbols encoder decoder compare © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
259
Overheads for Computers as Components 2nd ed.
Code inspection tests Look at the code for potential problems: Can we run past end of symbol table? What happens when the next symbol does not fill the buffer? Does fill it? Do very long encoded symbols work properly? Very short symbols? Does flush() work properly? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
260
Bus-Based Computer Systems
Busses. Memory devices. I/O devices: serial links timers and counters keyboards displays analog I/O © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
261
Overheads for Computers as Components 2nd ed.
The CPU bus Bus allows CPU, memory, devices to communicate. Shared communication medium. A bus is: A set of wires. A communications protocol. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
262
Overheads for Computers as Components 2nd ed.
Bus protocols Bus protocol determines how devices communicate. Devices on the bus go through sequences of states. Protocols are specified by state machines, one state machine per actor in the protocol. May contain asynchronous logic behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
263
Four-cycle handshake
[Figure: device 1 drives enq to device 2, and device 2 drives ack back to device 1. The timing diagram shows enq and ack over time steps 1 through 4.]
264
Four-cycle handshake, cont’d.
Device 1 raises enq. Device 2 responds with ack. Device 2 lowers ack once it has finished. Device 1 lowers enq. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
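The four steps can be modeled as a small state machine over the two wires. This is a software sketch for illustration only: the struct, the phase counter, and the function names are assumptions, and real devices would each run their own state machine in hardware.

```c
#include <stdbool.h>

typedef struct {
    bool enq;    /* driven by device 1 */
    bool ack;    /* driven by device 2 */
    int  phase;  /* steps completed in the current handshake, 0..3 */
} handshake;

void handshake_init(handshake *h) {
    h->enq = false;
    h->ack = false;
    h->phase = 0;
}

/* Perform the next step of the four-cycle handshake. */
void handshake_step(handshake *h) {
    switch (h->phase) {
    case 0: h->enq = true;  break;  /* 1: device 1 raises enq */
    case 1: h->ack = true;  break;  /* 2: device 2 responds with ack */
    case 2: h->ack = false; break;  /* 3: device 2 lowers ack once finished */
    case 3: h->enq = false; break;  /* 4: device 1 lowers enq */
    }
    h->phase = (h->phase + 1) % 4;
}
```

After four calls both wires are low again and the pair is ready for the next transfer.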
265
Microprocessor busses
Clock provides synchronization. R/W is true when reading (R/W’ is false when reading). Address is a-bit bundle of address lines. Data is n-bit bundle of data lines. Data ready signals when n-bit data is ready. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
266
Overheads for Computers as Components 2nd ed.
Timing diagrams © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
267
Overheads for Computers as Components 2nd ed.
Bus read © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
268
State diagrams for bus read
Get data Senddata Done Release ack See ack Ack Adrs Adrs Wait Wait device CPU start © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
269
Overheads for Computers as Components 2nd ed.
Bus wait state © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
270
Overheads for Computers as Components 2nd ed.
Bus burst read © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
271
Overheads for Computers as Components 2nd ed.
Bus multiplexing device data enable CPU data adrs adrs Adrs enable © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
272
Overheads for Computers as Components 2nd ed.
DMA Direct memory access (DMA) performs data transfers without executing instructions. CPU sets up transfer. DMA engine fetches, writes. DMA controller is a separate unit. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
273
Overheads for Computers as Components 2nd ed.
Bus mastership By default, CPU is bus master and initiates transfers. DMA must become bus master to perform its work. CPU can’t use bus while DMA operates. Bus mastership protocol: Bus request. Bus grant. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
274
Overheads for Computers as Components 2nd ed.
DMA operation CPU sets DMA registers for start address, length. DMA status register controls the unit. Once DMA is bus master, it transfers automatically. May run continuously until complete. May use every nth bus cycle. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
275
Bus transfer sequence diagram
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
276
System bus configurations
Multiple busses allow parallelism: Slow devices on one bus. Fast devices on separate bus. A bridge connects two busses. CPU slow device bridge memory slow device high-speed device © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
277
Overheads for Computers as Components 2nd ed.
Bridge state diagram © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
278
Overheads for Computers as Components 2nd ed.
ARM AMBA bus Two varieties: AHB is high-performance. APB is lower-speed, lower cost. AHB supports pipelining, burst transfers, split transactions, multiple bus masters. All devices are slaves on APB. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
279
Overheads for Computers as Components 2nd ed.
Memory components Several different types of memory: DRAM. SRAM. Flash. Each type of memory comes in varying: Capacities. Widths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
280
Overheads for Computers as Components 2nd ed.
Random-access memory Dynamic RAM is dense, requires refresh. Synchronous DRAM is dominant type. SDRAM uses clock to improve performance, pipeline memory accesses. Static RAM is faster, less dense, consumes more power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
281
Overheads for Computers as Components 2nd ed.
SDRAM operation © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
282
Overheads for Computers as Components 2nd ed.
Read-only memory ROM may be programmed at factory. Flash is dominant form of field-programmable ROM. Electrically erasable, must be block erased. Random access, but write/erase is much slower than read. NOR flash is more flexible. NAND flash is more dense. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
283
Overheads for Computers as Components 2nd ed.
Timers and counters Very similar: a timer is incremented by a periodic signal; a counter is incremented by an asynchronous, occasional signal. Rollover causes interrupt. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
284
Overheads for Computers as Components 2nd ed.
Watchdog timer Watchdog timer is periodically reset by system timer. If watchdog is not reset, it generates an interrupt to reset the host. host CPU interrupt watchdog timer reset © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
285
Switch debouncing
A switch must be debounced to eliminate multiple contacts caused by mechanical bouncing:
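Debouncing is often done in software as well as in hardware. The sketch below accepts a new switch value only after it has been seen on several consecutive samples; the sample count of 4, the struct, and the function names are assumptions chosen for illustration.

```c
#include <stdbool.h>

#define DEBOUNCE_SAMPLES 4  /* consecutive differing samples required (assumed) */

typedef struct {
    bool stable;  /* last accepted switch state */
    int  count;   /* consecutive samples that differ from stable */
} debouncer;

void debounce_init(debouncer *d, bool initial) {
    d->stable = initial;
    d->count = 0;
}

/* Feed one raw sample, taken at a fixed polling interval.
   Returns the debounced switch state. */
bool debounce_sample(debouncer *d, bool raw) {
    if (raw == d->stable) {
        d->count = 0;  /* a bounce back to the old value restarts the count */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;
        d->count = 0;
    }
    return d->stable;
}
```

The polling interval times DEBOUNCE_SAMPLES should exceed the switch's bounce time, which depends on the part used.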
286
Overheads for Computers as Components 2nd ed.
Encoded keyboard An array of switches is read by an encoder. N-key rollover remembers multiple key depressions. row © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
287
Overheads for Computers as Components 2nd ed.
LED Must use resistor to limit current: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
288
Overheads for Computers as Components 2nd ed.
7-segment LCD display May use parallel or multiplexed input. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
289
Types of high-resolution display
Liquid crystal display (LCD) is dominant form. Plasma, OLED, etc. Frame buffer holds current display contents. Written by processor. Read by video. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
290
Overheads for Computers as Components 2nd ed.
Touchscreen Includes input and output device. Input device is a two-dimensional voltmeter: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
291
Touchscreen position sensing
ADC voltage © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
292
Digital-to-analog conversion
Use resistor tree: bit bn drives Vout through R, bn-1 through 2R, bn-2 through 4R, bn-3 through 8R.
293
Flash A/D conversion
N-bit result requires 2^N comparators.
[Figure: Vin feeds a bank of comparators whose outputs go to an encoder.]
294
Dual-slope conversion
Use a counter to measure the time required to charge/discharge a capacitor. Charging, then discharging eliminates non-linearities.
[Figure: Vin drives the charge/discharge circuit, which is timed by a timer.]
295
Overheads for Computers as Components 2nd ed.
Sample-and-hold Samples data: converter Vin © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
296
Bus-Based Computer Systems
Designing with microprocessors. Development and debugging. System-level performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
297
Overheads for Computers as Components 2nd ed.
System architectures Architectures and components: software; hardware. Some software is very hardware-dependent. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
298
Hardware platform architecture
Contains several elements: CPU; bus; memory; I/O devices: networking, sensors, actuators, etc. How big/fast must each one be?
299
Software architecture
Functional description must be broken into pieces: division among people; conceptual organization; performance; testability; maintenance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
300
Hardware and software architectures
Hardware and software are intimately related: software doesn’t run without hardware; how much hardware you need is determined by the software requirements: speed; memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
301
Overheads for Computers as Components 2nd ed.
Evaluation boards Designed by CPU manufacturer or others. Includes CPU, memory, some I/O devices. May include prototyping section. CPU manufacturer often gives out evaluation board netlist---can be used as starting point for your custom board design. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
302
Overheads for Computers as Components 2nd ed.
Adding logic to a board Programmable logic devices (PLDs) provide low/medium density logic. Field-programmable gate arrays (FPGAs) provide more logic and multi-level logic. Application-specific integrated circuits (ASICs) are manufactured for a single purpose. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
303
Overheads for Computers as Components 2nd ed.
The PC as a platform Advantages: cheap and easy to get; rich and familiar software environment. Disadvantages: requires a lot of hardware resources; not well-adapted to real-time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
304
Typical PC hardware platform
CPU memory device CPU bus interface bus high-speed bus DMA controller intr ctrl timers low-speed bus bus interface device © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
305
Overheads for Computers as Components 2nd ed.
Typical busses PCI: standard for high-speed interfacing 33 or 66 MHz. PCI Express. USB (Universal Serial Bus), Firewire (IEEE 1394): relatively low-cost serial interface with high speed. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
306
Overheads for Computers as Components 2nd ed.
Software elements IBM PC uses BIOS (Basic I/O System) to implement low-level functions: boot-up; minimal device drivers. BIOS has become a generic term for the lowest-level system software. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
307
Overheads for Computers as Components 2nd ed.
Example: StrongARM StrongARM system includes: CPU chip (3.686 MHz clock) system control module ( kHz clock). Real-time clock; operating system timer general-purpose I/O; interrupt controller; power manager controller; reset controller. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
308
Debugging embedded systems
Challenges: target system may be hard to observe; target may be hard to control; may be hard to generate realistic inputs; setup sequence may be complex. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
309
Overheads for Computers as Components 2nd ed.
Host/target design Use a host system to prepare software for target system: target system serial line host system © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
310
Overheads for Computers as Components 2nd ed.
Host-based tools Cross compiler: compiles code on host for target system. Cross debugger: displays target state, allows target system to be controlled. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
311
Software debuggers
A monitor program residing on the target provides basic debugger functions. Debugger should have a minimal footprint in memory. User program must be careful not to destroy the debugger program, but the debugger should be able to recover from some damage caused by user code.
312
Overheads for Computers as Components 2nd ed.
Breakpoints A breakpoint allows the user to stop execution, examine system state, and change state. Replace the breakpointed instruction with a subroutine call to the monitor program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
313
ARM breakpoints
Uninstrumented code:
0x400 MUL r4,r6,r6
0x404 ADD r2,r2,r4
0x408 ADD r0,r0,#1
0x40c B loop

Code with breakpoint:
0x400 MUL r4,r6,r6
0x404 ADD r2,r2,r4
0x408 ADD r0,r0,#1
0x40c BL bkpoint
314
Breakpoint handler actions
Save registers. Allow user to examine machine. Before returning, restore system state. Safest way to execute the instruction is to replace it and execute in place. Put another breakpoint after the replaced breakpoint to allow restoring the original breakpoint. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
315
Overheads for Computers as Components 2nd ed.
In-circuit emulators A microprocessor in-circuit emulator is a specially-instrumented microprocessor. Allows you to stop execution, examine CPU state, modify registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
316
Overheads for Computers as Components 2nd ed.
Logic analyzers A logic analyzer is an array of low-grade oscilloscopes: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
317
Logic analyzer architecture
UUT sample memory microprocessor system clock vector address controller state or timing mode clock gen keypad display © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
318
Overheads for Computers as Components 2nd ed.
Boundary scan Simplifies testing of multiple chips on a board. Registers on pins can be configured as a scan chain. Used for debuggers, in-circuit emulators. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
319
Overheads for Computers as Components 2nd ed.
How to exercise code Run on host system. Run on target system. Run in instruction-level simulator. Run on cycle-accurate simulator. Run in hardware/software co-simulation environment. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
320
Debugging real-time code
Bugs in drivers can cause non-deterministic behavior in the foreground program. Bugs may be timing-dependent.
321
System-level performance analysis
Performance depends on all the elements of the system: CPU. Cache. Bus. Main memory. I/O device. CPU memory cache © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
322
Bandwidth as performance
Bandwidth applies to several components: Memory. Bus. CPU fetches. Different parts of the system run at different clock rates. Different components may have different widths (bus, memory). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
323
Bandwidth and data transfers
Video frame: 320 x 240 x 3 = 230,400 bytes. Must transfer in 1/30 sec. At 1 byte/µsec, a frame takes 0.23 sec. Too slow. Increase bandwidth: increase bus width; increase bus clock rate.
324
Bus bandwidth
T: # bus cycles. P: time/bus cycle. Total time for transfer: t = TP. D: data payload length. O1 + O2 = overhead O. W: bus width.
[Figure: one bus transfer of width W made up of overhead O1, data payload D, and overhead O2.]
Tbasic(N) = (D+O)N/W
325
Bus burst transfer bandwidth
T: # bus cycles. P: time/bus cycle. Total time for transfer: t = TP. D: data payload length. O1 + O2 = overhead O. B: number of data transfers per burst.
[Figure: a burst of B transfers (1, 2, …, B) of width W with overhead O.]
Tburst(N) = (BD+O)N/(BW)
326
Memory aspect ratios
[Figure: the same memory capacity in three aspect ratios: 64 M x 1, 16 M x 4, 8 M x 8.]
327
Memory access times
Memory component access times come from the chip data sheet. Page modes allow faster access for successive transfers on the same page. If data doesn’t fit naturally into physical words: A = [(E/w)mod W]+1
328
Bus performance bottlenecks
Transfer 320 x 240 video 30 frames/sec = 612,000 bytes/sec. Is performance bottleneck bus or memory? CPU memory © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
329
Bus performance bottlenecks, cont’d.
Bus: assume 1 MHz bus, D=1, O=3: Tbasic = (1+3)612,000/2 = 1,224,000 cycles = 1.224 sec. Memory: try burst mode B=4, width w=0.5. Tmem = (4*1+4)612,000/(4*0.5) = 2,448,000 cycles = sec.
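The two transfer-time formulas are easy to evaluate in code. The sketch below implements Tbasic and Tburst as written in the formulas and adds a cycles-to-seconds helper; the function and parameter names are chosen here for illustration.

```c
/* Bus cycles to transfer n units: Tbasic(N) = (D + O) N / W. */
double t_basic(double n, double d, double o, double w) {
    return (d + o) * n / w;
}

/* Bus cycles with bursts of b transfers: Tburst(N) = (B D + O) N / (B W). */
double t_burst(double n, double b, double d, double o, double w) {
    return (b * d + o) * n / (b * w);
}

/* Convert a cycle count to seconds for a given clock rate in Hz. */
double cycles_to_seconds(double cycles, double clock_hz) {
    return cycles / clock_hz;
}
```

With the numbers from the example, t_basic(612000, 1, 3, 2) gives 1,224,000 cycles, which is 1.224 sec on a 1 MHz bus, and t_burst(612000, 4, 1, 4, 0.5) gives 2,448,000 cycles.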
330
Performance spreadsheet
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
331
Overheads for Computers as Components 2nd ed.
Parallelism Speed things up by running several units at once. DMA provides parallelism if CPU doesn’t need the bus: DMA + bus. CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
332
Bus-Based Computer Systems
Example: alarm clock © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
333
Overheads for Computers as Components 2nd ed.
Alarm clock interface Alarm on Alarm off buzzer PM Alarm ready light set time set alarm hour minute button © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
334
Overheads for Computers as Components 2nd ed.
Operations Set time: hold set time, depress hour, minute. Set alarm time: hold set alarm, depress hour, minute. Turn alarm on/off: depress alarm on/off. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
335
Alarm clock requirements
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
336
Alarm clock class diagram
1 1 1 1 Lights* Display Mechanism 1 1 1 Buttons* Speaker* 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
337
Alarm clock physical classes
Lights* Buttons* Speaker* digit-val() digit-scan() alarm-on-light() PM-light() set-time(): boolean set-alarm(): boolean alarm-on(): boolean alarm-off(): boolean minute(): boolean hour(): boolean buzz() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
338
Overheads for Computers as Components 2nd ed.
Display class Display time[4]: integer alarm-indicator: boolean PM-indicator: boolean set-time() alarm-light-on() alarm-light-off() PM-light-on() PM-light-off() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
339
Overheads for Computers as Components 2nd ed.
Mechanism class Mechanism Seconds: integer PM: boolean tens-hours, ones-hours: boolean tens-minutes, ones-minutes: boolean alarm-ready: boolean alarm-tens-hours, alarm-ones-hours: boolean alarm-tens-minutes, alarm-ones-minutes: scan-keyboard() update-time() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
340
Update-time behavior update seconds with rollover
display.set-time(current time) F Time >= alarm and alarm-on? Rollover? F T T update hh:mm with rollover alarm.buzzer(true) PM->AM AM->PM PM=true PM=false © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
341
Scan-keyboard behavior
Set-time and not set-alarm and hours compute button activations Alarm-on Increment time tens w. rollover and AM/PM alarm-ready= true Alarm-off alarm-ready= false alarm.buzzer(false) Increment time ones w. rollover and AM/PM save button states Set-time and not set-alarm and minutes © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
342
Overheads for Computers as Components 2nd ed.
System architecture Includes: periodic behavior (clock); aperiodic behavior (buttons, buzzer activation). Two major software components: interrupt-driven routine updates time; foreground program deals with buttons, commands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
343
Interrupt-driven routine
Timer probably can’t handle one-minute interrupt interval. Use software variable to convert interrupt frequency to seconds. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
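A software variable that divides the interrupt rate down to seconds might look like the following. This is a sketch under assumptions: the 100 Hz tick rate and all names are hypothetical, and installing timer_isr as the actual interrupt handler is platform-specific and not shown.

```c
#define TICKS_PER_SECOND 100  /* timer interrupt rate in Hz (assumed) */

static volatile unsigned      tick_count = 0;       /* ticks since the last full second */
static volatile unsigned long seconds_counted = 0;  /* seconds since start */

/* Body of the timer interrupt handler: count ticks and roll over once per second. */
void timer_isr(void) {
    if (++tick_count >= TICKS_PER_SECOND) {
        tick_count = 0;
        seconds_counted++;
    }
}

/* Read the current seconds count from the foreground program. */
unsigned long seconds_elapsed(void) {
    return seconds_counted;
}
```

The update-time behavior would then run each time the seconds count advances.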
344
Foreground program
Operates as while loop:
while (TRUE) {
    read_buttons(button_values);
    process_command(button_values);
    check_alarm();
}
345
Overheads for Computers as Components 2nd ed.
Testing Component testing: test interrupt code on the platform; can test foreground program using a mock-up. System testing: relatively few components to integrate; check clock accuracy; check recognition of buttons, buzzer, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
346
Program design and analysis
Software components. Representations of programs. Assembly and linking. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
347
Software state machine
State machine keeps internal state as a variable, changes state based on inputs. Uses: control-dominated code; reactive systems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
348
State machine example
[Figure: state diagram with states idle, seated, belted, and buzzer. Transition labels (input/output): seat/timer on; belt/-; no belt and no timer/-; Belt/buzzer on; belt/buzzer off; no belt/timer on; no seat/-; no seat/buzzer off.]
349
C implementation
#define IDLE 0
#define SEATED 1
#define BELTED 2
#define BUZZER 3

switch (state) {
    case IDLE:
        if (seat) { state = SEATED; timer_on = TRUE; }
        break;
    case SEATED:
        if (belt) state = BELTED;
        else if (timer) state = BUZZER;
        …
}
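A complete version of the seat belt state machine might look like the sketch below. The transitions are one reading of the state diagram, with the seated-to-buzzer transition taken from the C fragment; the enum, the struct, the priority given to "no seat", and the function names are assumptions rather than the original code.

```c
#include <stdbool.h>

typedef enum { IDLE, SEATED, BELTED, BUZZER } belt_state;

typedef struct {
    belt_state state;
    bool timer_on;   /* set by "timer on" actions; never cleared in this sketch */
    bool buzzer_on;  /* set and cleared by "buzzer on" / "buzzer off" actions */
} belt_fsm;

void belt_fsm_init(belt_fsm *m) {
    m->state = IDLE;
    m->timer_on = false;
    m->buzzer_on = false;
}

/* Advance the machine by one step given the current inputs:
   seat = seat occupied, belt = belt fastened, timer = timer expired. */
void belt_fsm_step(belt_fsm *m, bool seat, bool belt, bool timer) {
    switch (m->state) {
    case IDLE:
        if (seat) { m->state = SEATED; m->timer_on = true; }
        break;
    case SEATED:
        if (!seat)      m->state = IDLE;
        else if (belt)  m->state = BELTED;
        else if (timer) { m->state = BUZZER; m->buzzer_on = true; }
        break;
    case BELTED:
        if (!seat)      m->state = IDLE;
        else if (!belt) { m->state = SEATED; m->timer_on = true; }
        break;
    case BUZZER:
        if (!seat)      { m->state = IDLE;   m->buzzer_on = false; }
        else if (belt)  { m->state = BELTED; m->buzzer_on = false; }
        break;
    }
}
```

Keeping the state in a struct rather than in globals lets the same code be exercised on a host before it runs on the target.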
350
Signal processing and circular buffer
Commonly used in signal processing: new data constantly arrives; each datum has a limited lifetime. Use a circular buffer to hold the data stream. time t time t+1 d1 d2 d3 d4 d5 d6 d7 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
351
Overheads for Computers as Components 2nd ed.
Circular buffer x1 x2 x3 x4 x5 x6 t1 t2 t3 Data stream x1 x5 x6 x2 x7 x3 x4 Circular buffer © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
352
Overheads for Computers as Components 2nd ed.
Circular buffers Indexes locate currently used data, current input data: d5 d1 input use d2 d2 input d3 d3 d4 d4 use time t1+1 time t1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
353
Circular buffer implementation: FIR filter
int circ_buffer[N], circ_buffer_head = 0;
int c[N]; /* coefficients */
…
int f, ibuf, ic;
/* walk the buffer from the head, wrapping at the end of the array */
for (f=0, ibuf=circ_buffer_head, ic=0; ic<N; ibuf=(ibuf==N-1 ? 0 : ibuf+1), ic++)
    f = f + c[ic]*circ_buffer[ibuf];
354
Overheads for Computers as Components 2nd ed.
Queues Elastic buffer: holds data that arrives irregularly. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
355
Buffer-based queues
#define Q_SIZE 32
#define Q_MAX (Q_SIZE-1)
int q[Q_SIZE], head, tail; /* Q_SIZE slots, indexed 0..Q_MAX */

void initialize_queue() { head = tail = 0; }

void enqueue(int val) {
    if (((tail+1)%Q_SIZE) == head) error(); /* queue full */
    q[tail]=val;
    if (tail == Q_MAX) tail = 0; else tail++;
}

int dequeue() {
    int returnval;
    if (head == tail) error(); /* queue empty */
    returnval = q[head];
    if (head == Q_MAX) head = 0; else head++;
    return returnval;
}
356
Models of programs
Source code is not a good representation for programs: clumsy; leaves much information implicit. Compilers derive intermediate representations to manipulate and optimize the program.
357
Data flow graph
DFG: data flow graph. Does not represent control. Models basic block: code with no entry or exit points except at its beginning and end. Describes the minimal ordering requirements on operations.
358
Single assignment form
Original basic block:
x = a + b;
y = c - d;
z = x * y;
y = b + d;

Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;
359
Data flow graph
Single assignment form:
x = a + b;
y = c - d;
z = x * y;
y1 = b + d;

[Figure: DFG with inputs a, b, c, d. a + b produces x, c - d produces y, x * y produces z, and b + d produces y1.]
360
DFGs and partial orders
Partial order: a+b, c-d; b+d, x*y. Can do pairs of operations in any order.
[Figure: the DFG with inputs a, b, c, d and results x, y, z, y1.]
361
Control-data flow graph
CDFG: represents control and data. Uses data flow graphs as components. Two types of nodes: decision; data flow. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
362
Overheads for Computers as Components 2nd ed.
Data flow node Encapsulates a data flow graph: Write operations in basic block form for simplicity. x = a + b; y = c + d © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
363
Overheads for Computers as Components 2nd ed.
Control cond T v1 v4 value v3 v2 F Equivalent forms © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
364
CDFG example
if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
    case c1: bb4(); break;
    case c2: bb5(); break;
    case c3: bb6(); break;
}

[Figure: CDFG in which decision node cond1 leads to bb1() on T and bb2() on F, both followed by bb3(); decision node test1 then selects bb4(), bb5(), or bb6() for c1, c2, or c3.]
365
for loop
for loop:
for (i=0; i<N; i++)
    loop_body();

Equivalent:
i=0;
while (i<N) {
    loop_body();
    i++;
}

[Figure: CDFG with node i=0, decision node i<N with T and F branches, and node loop_body().]
366
Overheads for Computers as Components 2nd ed.
Assembly and linking Last steps in compilation: each HLL module is compiled to assembly, each assembly module is assembled, and the assembled modules are linked into a single executable. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
367
Multiple-module programs
Programs may be composed from several files. Addresses become more specific during processing: relative addresses are measured relative to the start of a module; absolute addresses are measured relative to the start of the CPU address space. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
368
Overheads for Computers as Components 2nd ed.
Assemblers Major tasks: generate binary for symbolic instructions; translate labels into addresses; handle pseudo-ops (data, etc.). Generally one-to-one translation. Assembly labels: ORG 100 label1 ADR r4,c © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
369
Overheads for Computers as Components 2nd ed.
Symbol table Assembly code: ADD r0,r1,r2; xx ADD r3,r4,r5; CMP r0,r3; yy SUB r5,r6,r7. Symbol table: xx = 0x8; yy = 0x10. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
370
Symbol table generation
Use program location counter (PLC) to determine address of each location. Scan program, keeping count of PLC. Addresses are generated at assembly time, not execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
371
Overheads for Computers as Components 2nd ed.
Symbol table example The PLC advances as the assembler scans each instruction: ADD r0,r1,r2; xx ADD r3,r4,r5; CMP r0,r3; yy SUB r5,r6,r7. Reaching label xx records xx = 0x8; reaching label yy records yy = 0x10. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
372
Overheads for Computers as Components 2nd ed.
Two-pass assembly Pass 1: generate symbol table Pass 2: generate binary instructions © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
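Pass 1 can be sketched in C as a scan that advances the PLC and records every label it meets. This is only an illustration, not a real assembler: the struct and function names are hypothetical, every instruction is assumed to occupy the same number of bytes, and the start address passed by the caller is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 16

/* One source line: an optional label plus the instruction text (not parsed here). */
struct asm_line {
    const char *label;   /* NULL when the line carries no label */
    const char *text;
};

struct symbol {
    const char *name;
    unsigned address;
};

/* Pass 1: walk the program, advancing the PLC and recording each label.
   Returns the number of symbols written to table (at most MAX_SYMBOLS). */
size_t build_symbol_table(const struct asm_line *prog, size_t n_lines,
                          unsigned start, unsigned instr_size,
                          struct symbol *table)
{
    unsigned plc = start;
    size_t n_syms = 0;
    for (size_t i = 0; i < n_lines; i++) {
        if (prog[i].label != NULL && n_syms < MAX_SYMBOLS) {
            table[n_syms].name = prog[i].label;
            table[n_syms].address = plc;
            n_syms++;
        }
        plc += instr_size;   /* assumption: fixed-size instructions */
    }
    return n_syms;
}

/* Pass 2 would call this to turn a label operand into an address.
   Returns 1 and stores the address on success, 0 if the label is unknown. */
int lookup_symbol(const struct symbol *table, size_t n_syms,
                  const char *name, unsigned *address)
{
    for (size_t i = 0; i < n_syms; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *address = table[i].address;
            return 1;
        }
    }
    return 0;
}
```

With 4-byte instructions and an assumed start address of 0x4, the four-instruction program from the symbol table slides yields xx = 0x8 and yy = 0x10. A label that is not found in pass 2 corresponds to an external reference that must be left for the linker.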
373
Relative address generation
Some label values may not be known at assembly time. Labels within the module may be kept in relative form. Must keep track of external labels---can’t generate full binary for instructions that use external labels. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
374
Overheads for Computers as Components 2nd ed.
Pseudo-operations Pseudo-ops do not generate instructions: ORG sets program location. EQU generates symbol table entry without advancing PLC. Data statements define data blocks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
375
Overheads for Computers as Components 2nd ed.
Linking Combines several object modules into a single executable module. Jobs: put modules in order; resolve labels across modules. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
376
Externals and entry points
xxx ADD r1,r2,r3 B a yyy %1 a ADR r4,yyy ADD r3,r4,r5 external reference © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
377
Overheads for Computers as Components 2nd ed.
Module ordering Code modules must be placed in absolute positions in the memory space. Load map or linker flags control the order of modules. module1 module2 module3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
378
Overheads for Computers as Components 2nd ed.
Dynamic linking Some operating systems link modules dynamically at run time: shares one copy of library among all executing programs; allows programs to be updated with new versions of libraries. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
379
Program design and analysis
Compilation flow. Basic statement translation. Basic optimizations. Interpreters and just-in-time compilers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
380
Overheads for Computers as Components 2nd ed.
Compilation Compilation strategy (Wirth): compilation = translation + optimization Compiler determines quality of code: use of CPU resources; memory access scheduling; code size. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
381
Basic compilation phases
HLL parsing, symbol table machine-independent optimizations machine-dependent optimizations assembly © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
382
Statement translation and optimization
Source code is translated into intermediate form such as CDFG. CDFG is transformed/optimized. CDFG is translated into instructions with optimization decisions. Instructions are further optimized. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
383
Arithmetic expressions
Expression: a*b + 5*(c-d). DFG: a and b feed *; c and d feed -; 5 and the result of - feed a second *; the two products feed +. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
384
Arithmetic expressions, cont’d.
DFG nodes are numbered in evaluation order: 1 (*), 2 (-), 3 (*), 4 (+). Code:
ADR r4,a
MOV r1,[r4]
ADR r4,b
MOV r2,[r4]
MUL r3,r1,r2 ; node 1: a*b
ADR r4,c
MOV r1,[r4]
ADR r4,d
MOV r5,[r4]
SUB r6,r1,r5 ; node 2: c-d
MUL r7,r6,#5 ; node 3: 5*(c-d)
ADD r8,r7,r3 ; node 4: a*b + 5*(c-d)
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
385
Control code generation
if (a+b > 0) x = 5; else x = 7; a+b>0 x=5 x=7 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
386
Control code generation, cont’d.
ADR r5,a
LDR r1,[r5]
ADR r5,b
LDR r2,[r5]
ADD r3,r1,r2
BLE label3 ; condition false: go to the else block
LDR r3,#5 ; true block: x = 5
ADR r5,x
STR r3,[r5]
B stmtent
label3 LDR r3,#7 ; false block: x = 7
ADR r5,x
STR r3,[r5]
stmtent ...
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
387
Overheads for Computers as Components 2nd ed.
Procedure linkage Need code to: call and return; pass parameters and results. Parameters and returns are passed on stack. Procedures with few parameters may use registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
388
Overheads for Computers as Components 2nd ed.
Procedure stacks growth proc1 proc1(int a) { proc2(5); } FP frame pointer proc2 5 accessed relative to SP SP stack pointer © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
389
Overheads for Computers as Components 2nd ed.
ARM procedure linkage APCS (ARM Procedure Call Standard): r0-r3 pass parameters into procedure. Extra parameters are put on stack frame. r0 holds return value. r4-r7 hold register variables. r11 is frame pointer, r13 is stack pointer. r10 holds limiting address on stack size to check for stack overflows. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
390
Overheads for Computers as Components 2nd ed.
Data structures Different types of data structures use different data layouts. Some offsets into data structure can be computed at compile time, others must be computed at run time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
391
One-dimensional arrays
C array name points to 0th element: a a[0] a[1] = *(a + 1) a[2] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
392
Two-dimensional arrays
Row-major layout (as used by C): an array with N rows of M elements is stored row by row: a[0,0] a[0,1] ... a[1,0] a[1,1] ... Element a[i,j] is at a[i*M+j]. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
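The index formula a[i*M+j] can be checked with a short C sketch. It assumes that M is the number of elements per row and that rows are stored back to back, which is how C lays out arrays; the function names are hypothetical.

```c
#include <stddef.h>

/* Offset of element (i, j) in an array stored row by row, n_cols elements per row. */
size_t row_major_index(size_t i, size_t j, size_t n_cols)
{
    return i * n_cols + j;
}

/* Read element (i, j) from a flat buffer that holds the rows back to back. */
int get_element(const int *a, size_t i, size_t j, size_t n_cols)
{
    return a[row_major_index(i, j, n_cols)];
}
```

Because j is bounded by the row length only at run time, this offset generally has to be computed at run time unless both indices are constants.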
393
Overheads for Computers as Components 2nd ed.
Structures Fields within structures are static offsets: struct mystruct { int field1; char field2; }; struct mystruct a, *aptr = &a; field1 occupies the first 4 bytes, so field2 is at byte offset 4: *((char *)aptr + 4). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
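A small C sketch of a static field offset follows. It uses the standard offsetof macro rather than a hard-coded 4, since the size of int (and therefore the offset of field2) depends on the target; the helper name is hypothetical.

```c
#include <stddef.h>

struct mystruct {
    int field1;
    char field2;
};

/* Read field2 through a byte offset from the start of the structure.
   The offset is a compile-time constant, so no run-time address
   computation beyond one add is needed. */
char read_field2_by_offset(const struct mystruct *p)
{
    const char *base = (const char *)p;
    return *(base + offsetof(struct mystruct, field2));
}
```

The cast to a char pointer matters: pointer arithmetic on a struct pointer steps in units of the whole structure, not bytes.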
394
Expression simplification
Constant folding: 8+1 = 9 Algebraic: a*b + a*c = a*(b+c) Strength reduction: a*2 = a<<1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
395
Overheads for Computers as Components 2nd ed.
Dead code elimination Dead code:
#define DEBUG 0
if (DEBUG) dbg(p1);
Can be eliminated by analysis of control flow, constant folding. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
396
Overheads for Computers as Components 2nd ed.
Procedure inlining Eliminates procedure linkage overhead: int foo(a,b,c) { return a + b - c;} z = foo(w,x,y); becomes: z = w + x - y; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
397
Overheads for Computers as Components 2nd ed.
Loop transformations Goals: reduce loop overhead; increase opportunities for pipelining; improve memory system performance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
398
Overheads for Computers as Components 2nd ed.
Loop unrolling Reduces loop overhead, enables some other optimizations. for (i=0; i<4; i++) a[i] = b[i] * c[i]; becomes: for (i=0; i<2; i++) { a[i*2] = b[i*2] * c[i*2]; a[i*2+1] = b[i*2+1] * c[i*2+1]; } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
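The two loops on this slide can be written as a pair of C functions to confirm that they compute the same result. The function names are hypothetical; the bodies follow the slide.

```c
/* Original loop: four iterations, one multiply each. */
void multiply_rolled(int *a, const int *b, const int *c)
{
    for (int i = 0; i < 4; i++)
        a[i] = b[i] * c[i];
}

/* Unrolled by a factor of 2: two iterations, so half as many
   loop tests and increments for the same four multiplies. */
void multiply_unrolled(int *a, const int *b, const int *c)
{
    for (int i = 0; i < 2; i++) {
        a[i*2] = b[i*2] * c[i*2];
        a[i*2 + 1] = b[i*2 + 1] * c[i*2 + 1];
    }
}
```

The unrolled body is larger, which is the code-size cost paid for the reduced loop overhead.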
399
Loop fusion and distribution
Fusion combines two loops into one: for (i=0; i<N; i++) a[i] = b[i] * 5; for (j=0; j<N; j++) w[j] = c[j] * d[j]; becomes: for (i=0; i<N; i++) { a[i] = b[i] * 5; w[i] = c[i] * d[i]; } Distribution breaks one loop into two. Changes optimizations within loop body. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
400
Overheads for Computers as Components 2nd ed.
Loop tiling Breaks one loop into a nest of loops. Changes order of accesses within array. Changes cache behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
401
Overheads for Computers as Components 2nd ed.
Loop tiling example Original: for (i=0; i<N; i++) for (j=0; j<N; j++) c[i] = a[i,j]*b[i]; Tiled with 2x2 tiles: for (i=0; i<N; i+=2) for (j=0; j<N; j+=2) for (ii=i; ii<min(i+2,N); ii++) for (jj=j; jj<min(j+2,N); jj++) c[ii] = a[ii,jj]*b[ii]; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
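A runnable C version of the tiling example is sketched below. It assumes the inner loops start at the tile origin (ii = i, jj = j) and are bounded by N, which is what tiling requires; the array size, tile size, and function names are illustrative choices.

```c
#define N 5      /* odd on purpose, so the last tile in each dimension is partial */
#define TILE 2

static int min_int(int x, int y) { return x < y ? x : y; }

/* Untiled loop nest from the slide. */
void untiled(int c[N], int a[N][N], const int b[N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i] = a[i][j] * b[i];
}

/* Tiled version: the outer loops step from tile to tile, the inner
   loops visit the elements of one TILE x TILE block of a. */
void tiled(int c[N], int a[N][N], const int b[N])
{
    for (int i = 0; i < N; i += TILE)
        for (int j = 0; j < N; j += TILE)
            for (int ii = i; ii < min_int(i + TILE, N); ii++)
                for (int jj = j; jj < min_int(j + TILE, N); jj++)
                    c[ii] = a[ii][jj] * b[ii];
}
```

Both versions visit the same (i, j) pairs; only the order changes, and with it the pattern of cache accesses to a.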
402
Overheads for Computers as Components 2nd ed.
Array padding Add array elements to change mapping into cache. Before: rows a[0,0] a[0,1] a[0,2] and a[1,0] a[1,1] a[1,2]. After: the same rows, each followed by one extra padding element. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
403
Overheads for Computers as Components 2nd ed.
Register allocation Goals: choose register to hold each variable; determine lifespan of variable in the register. Basic case: within basic block. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
404
Register lifetime graph
w = a + b; (t=1) x = c + w; (t=2) y = c + d; (t=3) Figure: lifetime of each variable (a, b, c, d, w, x, y) plotted against time steps 1, 2, 3. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
405
Instruction scheduling
Non-pipelined machines do not need instruction scheduling: any order of instructions that satisfies data dependencies runs equally fast. In pipelined machines, execution time of one instruction depends on the nearby instructions: opcode, operands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
406
Overheads for Computers as Components 2nd ed.
Reservation table A reservation table relates instructions/time to CPU resources. Time/instr A B instr1 X instr2 X X instr3 X instr4 X © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
407
Overheads for Computers as Components 2nd ed.
Software pipelining Schedules instructions across loop iterations. Reduces instruction latency in iteration i by inserting instructions from iteration i+1. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
408
Instruction selection
May be several ways to implement an operation or sequence of operations. Represent operations as graphs, match possible instruction sequences onto graph. + + * + * * MUL ADD expression templates MADD © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
409
Overheads for Computers as Components 2nd ed.
Using your compiler Understand various optimization levels (-O1, -O2, etc.) Look at mixed compiler/assembler output. Modifying compiler output requires care: correctness; loss of hand-tweaked code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
410
Interpreters and JIT compilers
Interpreter: translates and executes program statements on-the-fly. JIT compiler: compiles small sections of code into instructions during program execution. Eliminates some translation overhead. Often requires more memory. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
411
Program design and analysis
Program-level performance analysis. Optimizing for: Execution time. Energy/power. Program size. Program validation and testing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
412
Program-level performance analysis
Need to understand performance in detail: Real-time behavior, not just typical. On complex platforms. Program performance ≠ CPU performance: Pipeline, cache are windows into program. We must analyze the entire program. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
413
Complexities of program performance
Varies with input data: Different-length paths. Cache effects. Instruction-level performance variations: Pipeline interlocks. Fetch times. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
414
How to measure program performance
Simulate execution of the CPU. Makes CPU state visible. Measure on real CPU using timer. Requires modifying the program to control the timer. Measure on real CPU using logic analyzer. Requires events visible on the pins. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
415
Program performance metrics
Average-case execution time. Typically used in application programming. Worst-case execution time. A component in deadline satisfaction. Best-case execution time. Task-level interactions can cause best-case program behavior to result in worst-case system behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
416
Elements of program performance
Basic program execution time formula: execution time = program path + instruction timing Solving these problems independently helps simplify analysis. Easier to separate on simpler CPUs. Accurate performance analysis requires: Assembly/binary code. Execution platform. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
417
Data-dependent paths in an if statement
if (a || b) { /* T1 */
  if ( c ) /* T2 */
    x = r*s+t; /* A1 */
  else y=r+s; /* A2 */
  z = r+s+u; /* A3 */
}
else {
  if ( c ) /* T3 */
    y = r-t; /* A4 */
}
Paths:
T1=F, T3=F: no assignments
T1=F, T3=T: A4
T1=T, T2=F: A2, A3
T1=T, T2=T: A1, A3
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
418
Overheads for Computers as Components 2nd ed.
Paths in a loop for (i=0, f=0; i<N; i++) f = f + c[i] * x[i]; i=0 f=0 N i=N Y f = f + c[i] * x[i] i = i + 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
419
Overheads for Computers as Components 2nd ed.
Instruction timing Not all instructions take the same amount of time. Multi-cycle instructions. Fetches. Execution times of instructions are not independent. Pipeline interlocks. Cache effects. Execution times may vary with operand value. Floating-point operations. Some multi-cycle integer operations. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
420
Measurement-driven performance analysis
Not so easy as it sounds: Must actually have access to the CPU. Must know data inputs that give worst/best case performance. Must make state visible. Still an important method for performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
421
Overheads for Computers as Components 2nd ed.
Feeding the program Need to know the desired input values. May need to write software scaffolding to generate the input values. Software scaffolding may also need to examine outputs to generate feedback-driven inputs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
422
Trace-driven measurement
Instrument the program. Save information about the path. Requires modifying the program. Trace files are large. Widely used for cache analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
423
Overheads for Computers as Components 2nd ed.
Physical measurement In-circuit emulator allows tracing. Affects execution timing. Logic analyzer can measure behavior at pins. Address bus can be analyzed to look for events. Code can be modified to make events visible. Particularly important for real-world input streams. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
424
Overheads for Computers as Components 2nd ed.
CPU simulation Some simulators are less accurate. Cycle-accurate simulator provides accurate clock-cycle timing. Simulator models CPU internals. Simulator writer must know how CPU works. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
425
SimpleScalar FIR filter simulation
int x[N] = {8, 17, … }; int c[N] = {1, 2, … }; main() { int i, k, f; for (k=0; k<COUNT; k++) for (i=0; i<N; i++) f += c[i]*x[i]; }
N | total sim cycles | sim cycles per filter execution
100 | 25854 | 259
1,000 | 155759 | 156
10,000 | | 145
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
426
Performance optimization motivation
Embedded systems must often meet deadlines. Faster may not be fast enough. Need to be able to analyze execution time. Worst-case, not typical. Need techniques for reliably improving execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
427
Programs and performance analysis
Best results come from analyzing optimized instructions, not high-level language code: non-obvious translations of HLL statements into instructions; code may move; cache effects are hard to predict. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
428
Overheads for Computers as Components 2nd ed.
Loop optimizations Loops are good targets for optimization. Basic loop optimizations: code motion; induction-variable elimination; strength reduction (x*2 -> x<<1). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
429
Overheads for Computers as Components 2nd ed.
Code motion for (i=0; i<N*M; i++) z[i] = a[i] + b[i]; Before: the loop test i<N*M recomputes N*M on every iteration. After: X = N*M is computed once before the loop and the test becomes i<X. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
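Code motion on this loop can be sketched in C as follows. The function names and the parameters n and m are illustrative; a modern optimizing compiler will often perform this hoisting on its own, so the hand-written form mainly shows what the transformation does.

```c
/* Before: n * m appears in the loop test, so it may be
   re-evaluated on every iteration. */
void add_arrays(int *z, const int *a, const int *b, int n, int m)
{
    for (int i = 0; i < n * m; i++)
        z[i] = a[i] + b[i];
}

/* After code motion: the loop-invariant product is computed once,
   before the loop, and the test compares against the saved value. */
void add_arrays_hoisted(int *z, const int *a, const int *b, int n, int m)
{
    int x = n * m;
    for (int i = 0; i < x; i++)
        z[i] = a[i] + b[i];
}
```

The move is legal because neither n nor m changes inside the loop body.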
430
Induction variable elimination
Induction variable: loop index. Consider loop: for (i=0; i<N; i++) for (j=0; j<M; j++) z[i,j] = b[i,j]; Rather than recompute i*M+j for each array in each iteration, share induction variable between arrays, increment at end of loop body. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
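The shared induction variable can be sketched in C on flat arrays. ROWS and COLS stand in for N and M, and the name offset is hypothetical; the sketch assumes both arrays have the same shape, which is what lets them share one offset.

```c
#define ROWS 3
#define COLS 4

/* Straightforward version: the offset i*COLS + j is recomputed,
   with a multiply, for each array in every iteration. */
void copy_indexed(int *z, const int *b)
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            z[i * COLS + j] = b[i * COLS + j];
}

/* After induction variable elimination: one offset is shared by both
   arrays and incremented at the end of the loop body, so the multiply
   disappears from the inner loop. */
void copy_shared_offset(int *z, const int *b)
{
    int offset = 0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) {
            z[offset] = b[offset];
            offset++;
        }
}
```

The replacement works because i*COLS + j increases by exactly one on each inner iteration, including across the boundary from one row to the next.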
431
Overheads for Computers as Components 2nd ed.
Cache analysis Loop nest: set of loops, one inside other. Perfect loop nest: no conditionals in nest. Because loops use large quantities of data, cache conflicts are common. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
432
Array conflicts in cache
Figure: array data at main-memory addresses 1024 and 4099 (b[0,0] at 4099) maps to the same cache location. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
433
Array conflicts, cont’d.
Array elements conflict because they are in the same line, even if not mapped to same location. Solutions: move one array; pad array. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
434
Performance optimization hints
Use registers efficiently. Use page mode memory accesses. Analyze cache behavior: instruction conflicts can be handled by rewriting code, rescheduling; conflicting scalar data can easily be moved; conflicting array data can be moved, padded. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
435
Energy/power optimization
Energy: ability to do work. Most important in battery-powered systems. Power: energy per unit time. Important even in wall-plug systems---power becomes heat. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
436
Measuring energy consumption
Execute a small loop, measure current: I while (TRUE) a(); © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
437
Sources of energy consumption
Relative energy per operation (Catthoor et al): memory transfer: 33 external I/O: 10 SRAM write: 9 SRAM read: 4.4 multiply: 3.6 add: 1 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
438
Cache behavior is important
Energy consumption has a sweet spot as cache size changes: cache too small: program thrashes, burning energy on external memory accesses; cache too large: cache itself burns too much power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
439
Overheads for Computers as Components 2nd ed.
Cache sweet spot [Li98] © 1998 IEEE © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
440
Overheads for Computers as Components 2nd ed.
Optimizing for energy First-order optimization: high performance = low energy. Not many instructions trade speed for energy. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
441
Optimizing for energy, cont’d.
Use registers efficiently. Identify and eliminate cache conflicts. Moderate loop unrolling eliminates some loop overhead instructions. Eliminate pipeline stalls. Inlining procedures may help: reduces linkage, but may increase cache thrashing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
442
Overheads for Computers as Components 2nd ed.
Efficient loops General rules: Don’t use function calls. Keep loop body small to enable local repeat (only forward branches). Use unsigned integer for loop counter. Use <= to test loop counter. Make use of compiler---global optimization, software pipelining. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
443
Single-instruction repeat loop example
STM #4000h,AR2 ; load pointer to source
STM #100h,AR3 ; load pointer to destination
RPT #(1024-1)
MVDD *AR2+,*AR3+ ; move
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
444
Optimizing for program size
Goal: reduce hardware cost of memory; reduce power consumption of memory units. Two opportunities: data; instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
445
Data size minimization
Reuse constants, variables, data buffers in different parts of code. Requires careful verification of correctness. Generate data using instructions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
446
Overheads for Computers as Components 2nd ed.
Reducing code size Avoid function inlining. Choose CPU with compact instructions. Use specialized instructions where possible. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
447
Program validation and testing
But does it work? Concentrate here on functional verification. Major testing strategies: Black box doesn’t look at the source code. Clear box (white box) does look at the source code. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
448
Overheads for Computers as Components 2nd ed.
Clear-box testing Examine the source code to determine whether it works: Can you actually exercise a path? Do you get the value you expect along a path? Testing procedure: Controllability: provide program with inputs. Execute. Observability: examine outputs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
449
Controlling and observing programs
firout = 0.0; for (j=curr, k=0; j<N; j++, k++) firout += buff[j] * c[k]; for (j=0; j<curr; j++, k++) firout += buff[j] * c[k]; if (firout > 100.0) firout = 100.0; if (firout < -100.0) firout = -100.0; Controllability: Must fill circular buffer with desired N values. Other code governs how we access the buffer. Observability: Want to examine firout before limit testing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
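One way to get both controllability and observability is to wrap the filter in a function whose inputs the test supplies and whose pre-limit value the test can read back. The sketch below is an assumption-laden illustration: the function name, the raw output parameter, the tap count, and the lower limit of -100.0 (chosen to mirror the upper limit) are not taken from the slide.

```c
#include <stddef.h>

#define N_TAPS 4

/* FIR output over a circular buffer whose oldest sample sits at index curr.
   When raw is non-NULL, the value before limiting is stored there so a
   test can observe it. */
double fir_limited(const double buff[N_TAPS], const double c[N_TAPS],
                   int curr, double *raw)
{
    double firout = 0.0;
    int j, k;
    /* First pass: from curr to the end of the buffer. */
    for (j = curr, k = 0; j < N_TAPS; j++, k++)
        firout += buff[j] * c[k];
    /* Second pass: wrap around to the start of the buffer. */
    for (j = 0; j < curr; j++, k++)
        firout += buff[j] * c[k];
    if (raw != NULL)
        *raw = firout;
    /* Limit testing (lower bound is an assumed mirror of the upper bound). */
    if (firout > 100.0) firout = 100.0;
    if (firout < -100.0) firout = -100.0;
    return firout;
}
```

A test controls the filter by choosing buff, c, and curr directly, and checks the unclipped sum through raw even when the returned value has been limited.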
450
Execution paths and testing
Paths are important in functional testing as well as performance analysis. In general, an exponential number of paths through the program. Show that some paths dominate others. Heuristically limit paths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
451
Choosing the paths to test
Possible criteria: Execute every statement at least once. Execute every branch direction at least once. Equivalent for structured programs. Not true for gotos. not covered © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
452
Overheads for Computers as Components 2nd ed.
Basis paths Approximate CDFG with undirected graph. Undirected graphs have basis paths: All paths are linear combinations of basis paths. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
453
Cyclomatic complexity
Cyclomatic complexity is a bound on the size of basis sets: e = # edges n = # nodes p = number of graph components M = e – n + 2p. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
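The formula is simple enough to state as a one-line C function; the function name is hypothetical.

```c
/* Cyclomatic complexity M = e - n + 2p for a graph with
   e edges, n nodes, and p connected components. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}
```

As a worked case, a single if-then-else has four nodes (condition, then block, else block, join) and four edges in one component, so M = 4 - 4 + 2 = 2, matching its two paths.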
454
Overheads for Computers as Components 2nd ed.
Branch testing Heuristic for testing branches. Exercise true and false branches of conditional. Exercise every simple condition at least once. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
455
Branch testing example
Correct: if (a || (b >= c)) { printf("OK\n"); } Incorrect: if (a && (b >= c)) { printf("OK\n"); } Test: a = F, (b >= c) = T. Example: Correct: [0 || (3 >= 2)] = T. Incorrect: [0 && (3 >= 2)] = F. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
456
Another branch testing example
Correct: if ((x == good_pointer) && (x->field1 == 3)) { printf("got the value\n"); } Incorrect: if ((x = good_pointer) && (x->field1 == 3)) { printf("got the value\n"); } Incorrect code changes pointer. Assignment returns new LHS in C. Test that catches error: (x != good_pointer) && (x->field1 == 3) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
457
Overheads for Computers as Components 2nd ed.
Domain testing Heuristic test for linear inequalities. Test on each side + boundary of inequality. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
458
Overheads for Computers as Components 2nd ed.
Def-use pairs Variable def-use: Def when value is assigned (defined). Use when used on right-hand side. Exercise each def-use pair. Requires testing correct path. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
459
Overheads for Computers as Components 2nd ed.
Loop testing Loops need specialized tests to be tested efficiently. Heuristic testing strategy: Skip loop entirely. One loop iteration. Two loop iterations. # iterations much below max. n-1, n, n+1 iterations where n is max. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
460
Overheads for Computers as Components 2nd ed.
Black-box testing Complements clear-box testing. May require a large number of tests. Tests software in different ways. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
461
Black-box test vectors
Random tests. May weight distribution based on software specification. Regression tests. Tests of previous versions, bugs, etc. May be clear-box tests of previous versions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
462
How much testing is enough?
Exhaustive testing is impractical. One important measure of test quality---bugs escaping into field. Good organizations can test software to give very low field bug report rates. Error injection measures test quality: Add known bugs. Run your tests. Determine % injected bugs that are caught. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
463
Program design and analysis
Software modem. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
464
Overheads for Computers as Components 2nd ed.
Theory of operation Frequency-shift keying: separate frequencies for 0 and 1. 1 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
465
Overheads for Computers as Components 2nd ed.
FSK encoding Generate waveforms based on current bit: bit-controlled waveform generator © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
466
Overheads for Computers as Components 2nd ed.
FSK decoding zero filter detector 0 bit A/D converter one filter detector 1 bit © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
467
Overheads for Computers as Components 2nd ed.
Transmission scheme Send data in 8-bit bytes. Arbitrary spacing between bytes. Byte starts with 0 start bit. Receiver measures length of start bit to synchronize itself to remaining 8 bits. start (0) bit 1 bit 2 bit 3 bit 8 ... © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
468
Overheads for Computers as Components 2nd ed.
Requirements © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
469
Overheads for Computers as Components 2nd ed.
Specification Line-in* Receiver 1 1 input() sample-in() bit-out() Transmitter Line-out* 1 1 bit-in() sample-out() output() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
470
Overheads for Computers as Components 2nd ed.
System architecture Interrupt handlers for samples: input and output. Transmitter. Receiver. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
471
Overheads for Computers as Components 2nd ed.
Transmitter Waveform generation by table lookup. float sine_wave[N_SAMP] = { 0.0, 0.5, 0.866, 1, 0.866, 0.5, 0.0, -0.5, -0.866, -1.0, -0.866, -0.5, 0}; © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
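Table-lookup waveform generation for FSK can be sketched in C as below. Several details are assumptions made for illustration: the table holds one period of 12 samples at 30-degree steps (the repeated closing 0 is left out so the index can wrap cleanly), and the step sizes ZERO_STEP and ONE_STEP as well as the function name are hypothetical.

```c
#define N_SAMP 12

/* One sine period sampled every 30 degrees. */
static const float sine_wave[N_SAMP] = {
    0.0f, 0.5f, 0.866f, 1.0f, 0.866f, 0.5f,
    0.0f, -0.5f, -0.866f, -1.0f, -0.866f, -0.5f
};

/* Assumed table strides: a larger stride walks the table faster,
   which produces a higher output frequency. */
#define ZERO_STEP 1
#define ONE_STEP 2

/* Fill out[] with n samples of the waveform that encodes one bit. */
void generate_bit(int bit, float *out, int n)
{
    int step = bit ? ONE_STEP : ZERO_STEP;
    int phase = 0;
    for (int i = 0; i < n; i++) {
        out[i] = sine_wave[phase];
        phase = (phase + step) % N_SAMP;
    }
}
```

Using one table with two strides keeps the data size small, at the cost of limiting the two frequencies to integer multiples of the base rate.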
472
Overheads for Computers as Components 2nd ed.
Receiver Filters (FIR for simplicity) use circular buffers to hold data. Timer measures bit length. State machine recognizes start bits, data bits. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
473
Overheads for Computers as Components 2nd ed.
Hardware platform CPU. A/D converter. D/A converter. Timer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
474
Component design and testing
Easy to test transmitter and receiver on host. Transmitter can be verified with speaker outputs. Receiver verification tasks: start bit recognition; data bit recognition. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
475
System integration and testing
Use loopback mode to test components against each other. Loopback in software or by connecting D/A and A/D converters. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
476
Processes and operating systems
Multiple tasks and multiple processes. Specifications of process timing. Preemptive real-time operating systems. Processes and UML. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
477
Overheads for Computers as Components 2nd ed.
Reactive systems Respond to external events. Engine controller. Seat belt monitor. Requires real-time response. System architecture. Program implementation. May require a chain reaction among multiple processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
478
Overheads for Computers as Components 2nd ed.
Tasks and processes A task is a functional description of a connected set of operations. (Task can also mean a collection of processes.) A process is a unique execution of a program. Several copies of a program may run simultaneously or at different times. A process has its own state: registers; memory. The operating system manages processes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
479
Why multiple processes?
Multiple tasks means multiple processes. Processes help with timing complexity: multiple rates (multimedia, automotive); asynchronous input (user interfaces, communication systems). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
480
Overheads for Computers as Components 2nd ed.
Multi-rate systems Tasks may be synchronous or asynchronous. Synchronous tasks may recur at different rates. Processes run at different rates based on computational needs of the tasks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
481
Example: engine control
Tasks: spark control crankshaft sensing fuel/air mixture oxygen sensor Kalman filter engine controller © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
482
Typical rates in engine controllers
Variable Full range time (ms) Update period (ms) Engine spark timing 300 2 Throttle 40 Air flow 30 4 Battery voltage 80 Fuel flow 250 10 Recycled exhaust gas 500 25 Status switches 100 20 Air temperature Seconds 400 Barometric pressure 1000 Spark (dwell) 1 Fuel adjustment 8 Carburetor Mode actuators © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
483
Overheads for Computers as Components 2nd ed.
Real-time systems Perform a computation to conform to external timing constraints. Deadline frequency: Periodic. Aperiodic. Deadline type: Hard: failure to meet deadline causes system failure. Soft: failure to meet deadline causes degraded response. Firm: late response is useless but some late responses can be tolerated. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
484
Timing specifications on processes
Release time: time at which process becomes ready. Deadline: time at which process must finish. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
485
Release times and deadlines
[Figure: two timelines for process P1. Aperiodic process: released by an initiating event. Periodic process: initiated at the start of each period.]
486
Rate requirements on processes
Period: interval between process activations. Rate: reciprocal of period. Initiation rate may be higher than period---several copies of process run at once. [Figure: copies P11, P12, P13, P14 of one process on CPU 1 through CPU 4 along a time axis.]
487
Overheads for Computers as Components 2nd ed.
Timing violations What happens if a process doesn’t finish by its deadline? Hard deadline: system fails if missed. Soft deadline: user may notice, but system doesn’t necessarily fail. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
488
Example: Space Shuttle software error
Space Shuttle’s first launch was delayed by a software timing error: Primary control system PASS and backup system BFS. BFS failed to synchronize with PASS. Change to one routine added delay that threw off start time calculation. 1 in 67 chance of timing problem. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
489
Overheads for Computers as Components 2nd ed.
Task graphs Tasks may have data dependencies---must execute in certain order. Task graph shows data/control dependencies between processes. Task: connected set of processes. Task set: One or more tasks. P1 P2 P5 P3 P6 P4 task 1 task 2 task set © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
490
Communication between tasks
Task graph assumes that all processes in each task run at the same rate, tasks do not communicate. In reality, some amount of inter-task communication is necessary. It’s hard to require immediate response for multi-rate communication. MPEG system layer MPEG audio MPEG video © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
491
Process execution characteristics
Process execution time Ti. Execution time in absence of preemption. Possible time units: seconds, clock cycles. Worst-case, best-case execution time may be useful in some cases. Sources of variation: Data dependencies. Memory system. CPU pipeline. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
492
Utilization
CPU utilization: fraction of the CPU that is doing useful work. Often calculated assuming no scheduling overhead. Utilization: $U = \frac{\text{CPU time for useful work}}{\text{total available CPU time}} = \frac{\sum_{t_1 \le t \le t_2} T(t)}{t_2 - t_1} = \frac{T}{t}$
493
State of a process
A process can be in one of three states: executing on the CPU; ready to run; waiting for data. [State diagram: states executing, ready, waiting; transition labels: gets CPU, preempted, needs data, gets data, gets data and CPU.]
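The three states can be sketched as a small transition function. This is only an illustration: the enum and function names are hypothetical, and the arcs are an assumed reading of the diagram labels (ready to executing on "gets CPU", executing to ready on "preempted", executing to waiting on "needs data", waiting to ready on "gets data", waiting to executing on "gets data and CPU").

```c
/* The three process states named on the slide. */
typedef enum { EXECUTING, READY, WAITING } proc_state;

/* Events that label the arcs of the state diagram (names are illustrative). */
typedef enum {
    GETS_CPU, PREEMPTED, NEEDS_DATA, GETS_DATA, GETS_DATA_AND_CPU
} proc_event;

/* Return the next state, or the current state if the event does not apply. */
proc_state next_state(proc_state s, proc_event e)
{
    switch (s) {
    case READY:
        if (e == GETS_CPU) return EXECUTING;
        break;
    case EXECUTING:
        if (e == PREEMPTED) return READY;
        if (e == NEEDS_DATA) return WAITING;
        break;
    case WAITING:
        if (e == GETS_DATA) return READY;
        if (e == GETS_DATA_AND_CPU) return EXECUTING;
        break;
    }
    return s;   /* event ignored in this state */
}
```

For example, `next_state(EXECUTING, NEEDS_DATA)` yields `WAITING`, and an event that has no arc from the current state leaves the state unchanged.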
494
The scheduling problem
Can we meet all deadlines? Must be able to meet deadlines in all cases. How much CPU horsepower do we need to meet our deadlines? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
495
Scheduling feasibility
Resource constraints make schedulability analysis NP-hard. Must show that the deadlines are met for all timings of resource requests. P1 P2 I/O device © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
496
Simple processor feasibility
Assume: No resource conflicts. Constant process execution times. Require: $T \ge \sum_i T_i$. Can’t use more than 100% of the CPU. [Figure: execution times T1, T2, T3 within period T.]
497
Overheads for Computers as Components 2nd ed.
Hyperperiod Hyperperiod: least common multiple (LCM) of the task periods. Must look at the hyperperiod schedule to find all task interactions. Hyperperiod can be very long if task periods are not chosen carefully. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
498
Hyperperiod example
Long hyperperiod: P1 7 ms. P2 11 ms. P3 15 ms. LCM = 1155 ms. Shorter hyperperiod: P1 8 ms. P2 12 ms. P3 16 ms. LCM = 48 ms.
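The hyperperiod can be computed by folding an LCM over the task periods. A minimal sketch, assuming all periods are nonzero integers in the same time unit; the function names `gcd_ul` and `hyperperiod` are hypothetical.

```c
/* Greatest common divisor by Euclid's algorithm. */
unsigned long gcd_ul(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Hyperperiod = LCM of all task periods (same time unit, all nonzero). */
unsigned long hyperperiod(const unsigned long *periods, int n)
{
    unsigned long l = 1;
    for (int i = 0; i < n; i++)
        l = l / gcd_ul(l, periods[i]) * periods[i];   /* divide first to limit overflow */
    return l;
}
```

Called on the periods {7, 11, 15} it returns 1155: periods with no common factors multiply out, which is why the first task set has such a long hyperperiod.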
499
Simple processor feasibility example
P1 period 1 ms, CPU time 0.1 ms. P2 period 1 ms, CPU time 0.2 ms. P3 period 5 ms, CPU time 0.3 ms. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
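A worked check of this example, assuming each process runs once per period and taking the hyperperiod as the analysis window:

```latex
\begin{align*}
\text{hyperperiod} &= \operatorname{lcm}(1, 1, 5)\ \text{ms} = 5\ \text{ms} \\
\text{CPU time per hyperperiod} &= 5(0.1) + 5(0.2) + 1(0.3) = 1.8\ \text{ms} \\
U &= 1.8 / 5 = 0.36
\end{align*}
```

Utilization is well below 1, so the simple feasibility condition is met.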
500
Overheads for Computers as Components 2nd ed.
Cyclostatic/TDMA Schedule in time slots. Same process activation irrespective of workload. Time slots may be equal size or unequal. T1 T2 T3 T1 T2 T3 P P © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
501
Overheads for Computers as Components 2nd ed.
TDMA assumptions Schedule based on least common multiple (LCM) of the process periods. Trivial scheduler -> very small scheduling overhead. P1 P1 P1 P2 P2 PLCM © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
502
Overheads for Computers as Components 2nd ed.
TDMA schedulability Always same CPU utilization (assuming constant process execution times). Can’t handle unexpected loads. Must schedule a time slot for aperiodic events. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
503
TDMA schedulability example
TDMA period = 10 ms. P1 CPU time 1 ms. P2 CPU time 3 ms. P3 CPU time 2 ms. P4 CPU time 2 ms. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
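A worked check, assuming each process gets exactly one slot per TDMA period:

```latex
\[
\sum_i T_i = 1 + 3 + 2 + 2 = 8\ \text{ms} \le 10\ \text{ms},
\qquad U = 8 / 10 = 0.8
\]
```

The slots fit, leaving 2 ms of each period unassigned.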
504
Overheads for Computers as Components 2nd ed.
Round-robin Schedule process only if ready. Always test processes in the same order. Variations: Constant system period. Start round-robin again after finishing a round. T1 T2 T3 T2 T3 P P © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
505
Round-robin assumptions
Schedule based on least common multiple (LCM) of the process periods. Best done with equal time slots for processes. Simple scheduler -> low scheduling overhead. Can be implemented in hardware. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
506
Round-robin schedulability
Can bound maximum CPU load. May leave unused CPU cycles. Can be adapted to handle unexpected load. Use time slots at end of period. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
507
Schedulability and overhead
The scheduling process consumes CPU time. Not all CPU time is available for processes. Scheduling overhead must be taken into account for exact schedule. May be ignored if it is a small fraction of total execution time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
508
Running periodic processes
Need code to control execution of processes. Simplest implementation: process = subroutine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
509
while loop implementation
Simplest implementation has one loop. No control over execution timing. while (TRUE) { p1(); p2(); } © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
510
Timed loop implementation
Encapsulate the set of all processes in a single function that implements the task set. Use a timer to control execution of the task. No control over timing of individual processes. void pall(){ p1(); p2(); }
511
Multiple timers implementation
Each task has its own function. Each task has its own timer. May not have enough timers to implement all the rates. void pA(){ /* rate A */ p1(); p3(); } void pB(){ /* rate B */ p2(); p4(); p5(); }
512
Timer + counter implementation
Use a software count to divide the timer. Only works for clean multiples of the timer period. int p2count = 0; void pall(){ p1(); if (p2count >= 2) { p2(); p2count = 0; } else p2count++; p3(); }
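The same idea generalizes to a table with one divider per process. This is a sketch under stated assumptions, not the slide's code: the `timed_proc` struct, the `table` array, and the `runs` counters are hypothetical, and the process bodies only count their activations so the behavior can be observed.

```c
#define NPROC 3

/* Hypothetical process table: each entry runs once every `divide` timer ticks. */
typedef struct {
    void (*run)(void);   /* process body, written as a subroutine */
    unsigned divide;     /* run every `divide` ticks; 1 = every tick */
    unsigned count;      /* ticks elapsed since the last run */
} timed_proc;

static unsigned runs[NPROC];   /* activation counts, for observation only */

static void p1(void) { runs[0]++; }
static void p2(void) { runs[1]++; }
static void p3(void) { runs[2]++; }

static timed_proc table[NPROC] = {
    { p1, 1, 0 },   /* every tick */
    { p2, 3, 0 },   /* every third tick */
    { p3, 1, 0 },   /* every tick */
};

/* Call once per timer interrupt. */
void pall(void)
{
    for (int i = 0; i < NPROC; i++) {
        if (++table[i].count >= table[i].divide) {
            table[i].run();
            table[i].count = 0;
        }
    }
}
```

After nine calls to `pall()`, `p1` and `p3` have each run nine times and `p2` three times. As on the slide, only integer multiples of the timer period can be expressed.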
513
Implementing processes
All of these implementations are inadequate. Need better control over timing. Need a better mechanism than subroutines. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
514
Processes and operating systems
© 2000 Morgan Kaufman Overheads for Computers as Components
515
Overheads for Computers as Components
Operating systems The operating system controls resources: who gets the CPU; when I/O takes place; how much memory is allocated. The most important resource is the CPU itself. CPU access controlled by the scheduler. © 2000 Morgan Kaufman Overheads for Computers as Components
517
Operating system structure
OS needs to keep track of: process priorities; scheduling state; process activation record. Processes may be created: statically before system starts; dynamically during execution. © 2000 Morgan Kaufman Overheads for Computers as Components
518
Embedded vs. general-purpose scheduling
Workstations try to avoid starving processes of CPU access. Fairness = access to CPU. Embedded systems must meet deadlines. Low-priority processes may not run for a long time. © 2000 Morgan Kaufman Overheads for Computers as Components
519
Priority-driven scheduling
Each process has a priority. CPU goes to highest-priority process that is ready. Priorities determine scheduling policy: fixed priority; time-varying priorities. © 2000 Morgan Kaufman Overheads for Computers as Components
520
Priority-driven scheduling example
Rules: each process has a fixed priority (1 highest); highest-priority ready process gets CPU; process continues until done. Processes P1: priority 1, execution time 10 P2: priority 2, execution time 30 P3: priority 3, execution time 20 © 2000 Morgan Kaufman Overheads for Computers as Components
521
Priority-driven scheduling example
Release times: P2 ready at t=0, P1 ready at t=15, P3 ready at t=18. [Timeline, time axis 10 to 60: P2, P1, P2, P3.]
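The timeline can be reproduced with a small tick-by-tick simulation. This sketch assumes preemptive scheduling, as the timeline suggests (P2 gives up the CPU when P1 becomes ready at t=15), and positive integer execution times; the `proc` struct and `simulate` function are hypothetical names.

```c
typedef struct {
    int priority;    /* 1 is highest, as on the slide */
    int release;     /* time at which the process becomes ready */
    int remaining;   /* execution time still needed (must start > 0) */
    int finish;      /* completion time, filled in by simulate() */
} proc;

/* Advance one time unit at a time; the highest-priority ready process runs. */
void simulate(proc *p, int n)
{
    int done = 0;
    for (int t = 0; done < n; t++) {
        int pick = -1;
        for (int i = 0; i < n; i++) {
            if (p[i].remaining > 0 && p[i].release <= t &&
                (pick < 0 || p[i].priority < p[pick].priority))
                pick = i;
        }
        if (pick < 0)
            continue;                   /* nothing ready: CPU idle this tick */
        if (--p[pick].remaining == 0) {
            p[pick].finish = t + 1;     /* process completes at end of tick */
            done++;
        }
    }
}
```

With P1 (priority 1, release 15, time 10), P2 (priority 2, release 0, time 30), and P3 (priority 3, release 18, time 20), the simulation gives finish times of 25, 40, and 60: P2 runs from 0 to 15, P1 from 15 to 25, P2 again from 25 to 40, then P3 from 40 to 60.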
523
Process initiation disciplines
Periodic process: executes on (almost) every period. Aperiodic process: executes on demand. Analyzing aperiodic process sets is harder---must consider worst-case combinations of process activations. © 2000 Morgan Kaufman Overheads for Computers as Components
524
Timing requirements on processes
Period: interval between process activations. Rate: reciprocal of period. Initiation time: time at which process becomes ready. Deadline: time at which process must finish.
527
Interprocess communication
Interprocess communication (IPC): OS provides mechanisms so that processes can pass data. Two types of semantics: blocking: sending process waits for response; non-blocking: sending process continues. © 2000 Morgan Kaufman Overheads for Computers as Components
528
Overheads for Computers as Components
IPC styles Shared memory: processes have some memory in common; must cooperate to avoid destroying/missing messages. Message passing: processes send messages along a communication channel---no common address space. © 2000 Morgan Kaufman Overheads for Computers as Components
529
Overheads for Computers as Components
Shared memory Shared memory on a bus: memory CPU 1 CPU 2 © 2000 Morgan Kaufman Overheads for Computers as Components
530
Race condition in shared memory
Problem when two CPUs try to write the same location: CPU 1 reads flag and sees 0. CPU 2 reads flag and sees 0. CPU 1 sets flag to one and writes location. CPU 2 sets flag to one and overwrites location. © 2000 Morgan Kaufman Overheads for Computers as Components
531
Atomic test-and-set
Problem can be solved with an atomic test-and-set: single bus operation reads memory location, tests it, writes it. ARM test-and-set provided by SWP:
        ADR r0,SEMAPHORE   ; get semaphore address
        MOV r1,#1          ; value that marks the flag as taken
GETFLAG SWP r1,r1,[r0]     ; atomically swap r1 with the flag
        CMP r1,#0          ; old value 0 means the flag was free
        BNE GETFLAG        ; flag was already set, try again
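The same protocol can be sketched in C using the standard C11 `atomic_flag` type from `<stdatomic.h>`, which provides an atomic test-and-set. This is an illustration rather than the slide's method: the variable and function names (`semaphore_flag`, `get_flag`, `release_flag`, `write_location`) are hypothetical.

```c
#include <stdatomic.h>

static atomic_flag semaphore_flag = ATOMIC_FLAG_INIT;  /* clear = free */
static int shared_location;                            /* data guarded by the flag */

/* Spin until the previous value of the flag was clear. */
void get_flag(void)
{
    while (atomic_flag_test_and_set(&semaphore_flag))
        ;   /* flag was already set: another writer holds it */
}

/* Make the flag available to the next writer. */
void release_flag(void)
{
    atomic_flag_clear(&semaphore_flag);
}

/* Write the shared location only while holding the flag. */
void write_location(int value)
{
    get_flag();
    shared_location = value;
    release_flag();
}
```

Because the test and the set happen in one indivisible operation, two writers cannot both see the flag as 0, which is the race described for plain reads and writes.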
532
Overheads for Computers as Components
Critical regions Critical region: section of code that cannot be interrupted by another process. Examples: writing shared memory; accessing I/O device. © 2000 Morgan Kaufman Overheads for Computers as Components
533
Overheads for Computers as Components
Semaphores Semaphore: OS primitive for controlling access to critical regions. Protocol: Get access to semaphore with P(). Perform critical region operations. Release semaphore with V(). © 2000 Morgan Kaufman Overheads for Computers as Components
534
Overheads for Computers as Components
Message passing Message passing on a network: CPU 1 CPU 2 message message message © 2000 Morgan Kaufman Overheads for Computers as Components
535
Process data dependencies
One process may not be able to start until another finishes. Data dependencies defined in a task graph. All processes in one task run at the same rate. P1 P2 P3 P4 © 2000 Morgan Kaufman Overheads for Computers as Components
536
Other operating system functions
Date/time. File system. Networking. Security. © 2000 Morgan Kaufman Overheads for Computers as Components
537
Processes and operating systems
Scheduling policies: RMS; EDF. Scheduling modeling assumptions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
538
Overheads for Computers as Components 2nd ed.
Metrics How do we evaluate a scheduling policy: Ability to satisfy all deadlines. CPU utilization---percentage of time devoted to useful work. Scheduling overhead---time required to make scheduling decision. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
539
Rate monotonic scheduling
RMS (Liu and Layland): widely-used, analyzable scheduling policy. Analysis is known as Rate Monotonic Analysis (RMA). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
540
RMA model
All processes run on a single CPU. Zero context switch time. No data dependencies between processes. Process execution time is constant. Deadline is at end of period. Highest-priority ready process runs.
541
Process parameters
$T_i$ is computation time of process i; $\tau_i$ is period of process i. [Figure: timeline for process $P_i$ marking period $\tau_i$ and computation time $T_i$.]
542
Rate-monotonic analysis
Response time: time required to finish process. Critical instant: scheduling state that gives worst response time. Critical instant occurs when all higher-priority processes are ready to execute. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
543
Overheads for Computers as Components 2nd ed.
Critical instant P1 P2 P3 interfering processes P1 P2 P3 critical instant P4 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
544
Overheads for Computers as Components 2nd ed.
RMS priorities Optimal (fixed) priority assignment: shortest-period process gets highest priority; priority inversely proportional to period; break ties arbitrarily. No fixed-priority scheme does better. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
545
Overheads for Computers as Components 2nd ed.
RMS example P2 period P2 P1 period P1 P1 P1 5 10 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
546
RMS CPU utilization
Utilization for n processes is $U = \sum_i T_i / \tau_i$, the sum over processes of computation time divided by period. As number of tasks approaches infinity, maximum utilization approaches 69%.
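The 69% figure is the limit of the Liu and Layland utilization bound. With $T_i$ the computation time and $\tau_i$ the period of process i, a set of n processes is guaranteed schedulable under RMS when:

```latex
\[
U = \sum_{i=1}^{n} \frac{T_i}{\tau_i} \le n\left(2^{1/n} - 1\right),
\qquad
\lim_{n \to \infty} n\left(2^{1/n} - 1\right) = \ln 2 \approx 0.693
\]
```

The bound is 1.000 for n = 1, about 0.828 for n = 2, and about 0.780 for n = 3. It is a sufficient condition: a process set above the bound may still be schedulable, but that has to be checked separately.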
547
RMS CPU utilization, cont’d.
RMS cannot use 100% of CPU, even with zero context switch overhead. Must keep idle cycles available to handle worst-case scenario. However, as long as utilization stays within the RMS bound, all processes are guaranteed to meet their deadlines.
548
Overheads for Computers as Components 2nd ed.
RMS implementation Efficient implementation: scan processes; choose highest-priority active process. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
549
Earliest-deadline-first scheduling
EDF: dynamic priority scheduling scheme. Process closest to its deadline has highest priority. Requires recalculating process priorities at every timer interrupt.
550
EDF analysis
EDF can use 100% of CPU. But EDF may fail to meet a deadline.
551
Overheads for Computers as Components 2nd ed.
EDF implementation On each timer interrupt: compute time to deadline; choose process closest to deadline. Generally considered too expensive to use in practice. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
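The per-interrupt selection step might look like the following. A minimal sketch, assuming absolute deadlines measured in timer ticks and a linear scan over a small process table; the `edf_proc` struct and `edf_choose` function are hypothetical names.

```c
typedef struct {
    int ready;        /* nonzero if the process can run now */
    long deadline;    /* absolute deadline, in timer ticks */
} edf_proc;

/* Return the index of the ready process closest to its deadline, or -1 if none is ready. */
int edf_choose(const edf_proc *p, int n, long now)
{
    int best = -1;
    long best_slack = 0;
    for (int i = 0; i < n; i++) {
        if (!p[i].ready)
            continue;
        long slack = p[i].deadline - now;   /* time to deadline */
        if (best < 0 || slack < best_slack) {
            best = i;
            best_slack = slack;
        }
    }
    return best;
}
```

The scan touches every process on every interrupt, which illustrates the run-time cost the slide refers to; a sorted queue would reduce selection time at the price of more bookkeeping.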
552
Fixing scheduling problems
What if your set of processes is unschedulable? Change deadlines in requirements. Reduce execution times of processes. Get a faster CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
553
Priority inversion
Priority inversion: low-priority process keeps high-priority process from running. Improper use of system resources can cause scheduling problems: Low-priority process grabs I/O device. High-priority process needs I/O device, but can’t get it until low-priority process is done. Can cause deadlock.
554
Solving priority inversion
Give priorities to system resources. Have process inherit the priority of a resource that it requests. Low-priority process inherits priority of device if higher. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
555
Overheads for Computers as Components 2nd ed.
Data dependencies Data dependencies allow us to improve utilization. Restrict combination of processes that can run simultaneously. P1 and P2 can’t run simultaneously. P1 P2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
556
Context-switching time
Non-zero context switch time can push limits of a tight schedule. Hard to calculate effects---depends on order of context switches. In practice, OS context switch overhead is small (hundreds of clock cycles) relative to many common task periods (from microseconds to milliseconds).
557
Processes and operating systems
Interprocess communication. Operating system performance. Power management. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
558
Interprocess communication
OS provides interprocess communication mechanisms: various efficiencies; communication power. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
568
Overheads for Computers as Components 2nd ed.
Signals in UML More general than Unix signal---may carry arbitrary data: someClass <<signal>> aSig <<send>> sigbehavior() p : integer © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
569
Evaluating RTOS performance
Simplifying assumptions: Context switch costs no CPU time. We know the exact execution time of processes. WCET/BCET don’t depend on context switches.
570
Scheduling and context switch overhead
Process execution times and deadlines: P1 execution time 3, deadline 5; P2 execution time 3, deadline 10. With context switch overhead of 1, no feasible schedule: $2T_{P1} + T_{P2} = 2 \times (1+3) + (1+3) = 12$, which exceeds the deadline of 10.
571
Process execution time
Process execution time is not constant. Extra CPU time can be good. Extra CPU time can also be bad: Next process runs earlier, causing new preemption. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
572
Overheads for Computers as Components 2nd ed.
Processes and caches Processes can cause additional caching problems. Even if individual processes are well-behaved, processes may interfere with each other. Worst-case execution time with bad behavior is usually much worse than execution time with good cache behavior. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
573
Effects of scheduling on the cache
Schedule 1 (LRU cache): Process WCET Avg. CPU time P1 8 6 P2 4 3 P3 Schedule 2 (half of cache reserved for P1): © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
574
Overheads for Computers as Components 2nd ed.
Power optimization Power management: determining how system resources are scheduled/used to control power consumption. OS can manage for power just as it manages for time. OS reduces power by shutting down units. May have partial shutdown modes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
575
Power management and performance
Power management and performance are often at odds. Entering power-down mode consumes energy, time. Leaving power-down mode consumes energy, time.
576
Simple power management policies
Request-driven: power up once request is received. Adds delay to response. Predictive shutdown: try to predict how long you have before next request. May start up in advance of request in anticipation of a new request. If you predict wrong, you will incur additional delay while starting up. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
577
Probabilistic shutdown
Assume service requests are probabilistic. Optimize expected values: power consumption; response time. Simple probabilistic: shut down after time Ton, turn back on after waiting for Toff. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
578
Advanced Configuration and Power Interface
ACPI: open standard for power management services. applications device drivers OS kernel power management ACPI BIOS Hardware platform © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
579
ACPI global power states
G3: mechanical off. G2: soft off. G1: sleeping state, with substates: S1: low wake-up latency with no loss of context; S2: low latency with loss of CPU/cache state; S3: low latency with loss of all state except memory; S4: lowest-power state with all devices off. G0: working state.
580
Processes and operating systems
Telephone answering machine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
581
Overheads for Computers as Components 2nd ed.
Theory of operation Compress audio using adaptive differential pulse code modulation (ADPCM). analog time ADPCM 3 2 1 -1 -2 -3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
582
Overheads for Computers as Components 2nd ed.
ADPCM coding Coded in a small alphabet with positive and negative values. {-3,-2,-1,1,2,3} Minimize error between predicted value and actual signal value. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
583
ADPCM compression system
quantizer inverse quantizer integrator encoder samples inverse quantizer integrator decoder © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
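The encoder/decoder structure can be sketched as follows. This is deliberately simplified: it uses a fixed step size (plain DPCM), whereas real ADPCM adapts the step, and the `STEP` value and function names are hypothetical. It does follow the block diagram, with the encoder running its own inverse quantizer and integrator so that its prediction matches what the decoder reconstructs.

```c
#define STEP 4   /* hypothetical fixed step size; real ADPCM adapts it */

/* Quantizer: map a prediction error to the nearest code in {-3,-2,-1,1,2,3}. */
int quantize(int error)
{
    int mag = (error < 0) ? -error : error;
    int code = (mag + STEP / 2) / STEP;   /* round to nearest step */
    if (code < 1) code = 1;               /* alphabet has no zero code */
    if (code > 3) code = 3;
    return (error < 0) ? -code : code;
}

/* Inverse quantizer: turn a code back into a signal difference. */
int inverse_quantize(int code)
{
    return code * STEP;
}

/* Encoder: quantize the error between each sample and the predicted value. */
void adpcm_encode(const int *samples, int *codes, int n)
{
    int predicted = 0;   /* integrator output */
    for (int i = 0; i < n; i++) {
        codes[i] = quantize(samples[i] - predicted);
        predicted += inverse_quantize(codes[i]);   /* same update the decoder makes */
    }
}

/* Decoder: inverse quantizer followed by integrator. */
void adpcm_decode(const int *codes, int *out, int n)
{
    int value = 0;
    for (int i = 0; i < n; i++) {
        value += inverse_quantize(codes[i]);
        out[i] = value;
    }
}
```

Samples whose successive differences are within three steps are tracked closely; larger jumps are clipped to the ends of the alphabet and the decoder catches up over the following samples.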
584
Telephone system terms
Subscriber line: line to phone. Central office: telephone switching system. Off-hook: phone active. On-hook: phone inactive. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
585
Real and simulated subscriber line
Real subscriber line: 90V RMS ringing signal; companded analog signals; lightning protection, etc. Simulated subscriber line: microphone input; speaker output; switches for ring, off-hook, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
586
Overheads for Computers as Components 2nd ed.
Requirements © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
587
Overheads for Computers as Components 2nd ed.
Comments on analysis DRAM requirement influenced by DRAM price. Details of user interface protocol could be tested on a PC-based prototype. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
588
Answering machine class diagram
[Class diagram: associations, with multiplicities 1 and *, among Microphone*, Controls, Record, Outgoing-message, Line-in*, Incoming-message, Playback, Line-out*, Lights, Buttons*, and Speaker*.]
589
Physical interface classes
Microphone*: sample(). Line-in*: sample(), ring-indicator(). Line-out*: sample(), pick-up(). Buttons*: record-OGM, play. Lights*: messages, num-messages. Speaker*: sample().
590
Message classes
Message: length, start-adrs, next-msg, samples. Outgoing-message: length=30 sec. Incoming-message: msg-time.
591
Operational classes
Controls: operate(). Record: record-msg(). Playback: playback-msg().
592
Overheads for Computers as Components 2nd ed.
Software components Front panel module. Speaker module. Telephone line module. Telephone input and output modules. Compression module. Decompression module. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
593
Controls activate behavior
Compute buttons, line activations Activations? Play OGM Record OGM Play ICM Erase Answer Play OGM Wait for timeout Allocate ICM Erase Record ICM © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
594
Record-msg/playback-msg behaviors
record-msg: nextadrs = 0; loop: msg.samples[nextadrs] = sample(source); exit when End(source) is true (T), otherwise (F) repeat. playback-msg: nextadrs = 0; loop: speaker.samples() = msg.samples[nextadrs]; nextadrs++; exit when nextadrs=msg.length is true (T), otherwise (F) repeat.
595
Overheads for Computers as Components 2nd ed.
Hardware platform CPU. Memory. Front panel. 2 A/Ds: subscriber line, microphone. 2 D/A: subscriber line, speaker. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
596
Component design and testing
Must test performance as well as functionality. Compression time shouldn’t dominate other tasks. Test for error conditions: memory overflow; try to delete empty message set, etc.
597
System integration and testing
Can test partial integration on host platform; full testing requires integration on target platform. Simulate phone line for tests: it’s legal; easier to produce test conditions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
598
Overheads for Computers as Components 2nd ed.
Multiprocessors Why multiprocessors? CPUs and accelerators. Multiprocessor performance analysis. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
599
Overheads for Computers as Components 2nd ed.
Why multiprocessors? Better cost/performance. Match each CPU to its tasks or use custom logic (smaller, cheaper). CPU cost is a non-linear function of performance. cost performance © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
600
Why multiprocessors? cont’d.
Better real-time performance. Put time-critical functions on less-loaded processing elements. Remember RMS utilization---extra CPU cycles must be reserved to meet deadlines. cost deadline w. RMS overhead deadline performance © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
601
Why multiprocessors? cont’d.
Using specialized processors or custom logic saves power. Desktop uniprocessors are not power-efficient enough for battery-powered applications. [Aus04] © 2004 IEEE Computer Society © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
602
Why multiprocessors? cont’d.
Good for processing I/O in real-time. May consume less energy. May be better at streaming data. May not be able to do all the work on even the largest single CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
603
Overheads for Computers as Components 2nd ed.
Accelerated systems Use additional computational unit dedicated to some functions? Hardwired logic. Extra CPU. Hardware/software co-design: joint design of hardware and software architectures. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
604
Accelerated system architecture
accelerator request result CPU data data memory I/O © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
605
Accelerator vs. co-processor
A co-processor executes instructions. Instructions are dispatched by the CPU. An accelerator appears as a device on the bus. The accelerator is controlled by registers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
606
Accelerator implementations
Application-specific integrated circuit. Field-programmable gate array (FPGA). Standard component. Example: graphics processor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
607
Overheads for Computers as Components 2nd ed.
System design tasks Design a heterogeneous multiprocessor architecture. Processing element (PE): CPU, accelerator, etc. Program the system. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
608
Accelerated system design
First, determine that the system really needs to be accelerated. How much faster is the accelerator on the core function? How much data transfer overhead? Design the accelerator itself. Design CPU interface to accelerator. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
609
Accelerated system platforms
Several off-the-shelf boards are available for acceleration in PCs: FPGA-based core; PC bus interface. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
610
Accelerator/CPU interface
Accelerator registers provide control registers for CPU. Data registers can be used for small data objects. Accelerator may include special-purpose read/write logic. Especially valuable for large data transfers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
611
System integration and debugging
Try to debug the CPU/accelerator interface separately from the accelerator core. Build scaffolding to test the accelerator. Hardware/software co-simulation can be useful. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
612
Overheads for Computers as Components 2nd ed.
Caching problems Main memory provides the primary data transfer mechanism to the accelerator. Programs must ensure that caching does not invalidate main memory data. CPU reads location S. Accelerator writes location S. CPU writes location S. BAD © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
613
Overheads for Computers as Components 2nd ed.
Synchronization As with cache, main memory writes to shared memory may cause invalidation: CPU reads S. Accelerator writes S. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
614
Multiprocessor performance analysis
Effects of parallelism (and lack of it): Processes. CPU and bus. Multiple processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
615
Overheads for Computers as Components 2nd ed.
Accelerator speedup Critical parameter is speedup: how much faster is the system with the accelerator? Must take into account: Accelerator execution time. Data transfer time. Synchronization with the master CPU. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
616
Accelerator execution time
Total accelerator execution time: $t_{accel} = t_{in} + t_x + t_{out}$, where $t_{in}$ is the data input time, $t_x$ is the accelerated computation time, and $t_{out}$ is the data output time.
617
Accelerator speedup
Assume loop is executed n times. Compare accelerated system to non-accelerated system: $S = n(t_{CPU} - t_{accel}) = n[t_{CPU} - (t_{in} + t_x + t_{out})]$, where $t_{CPU}$ is the execution time on the CPU.
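The two formulas above can be checked with a few lines of C. This is a minimal sketch, not code from the slides: the struct and function names are hypothetical, and all times are assumed to be in the same unit. Note that S as written is a time difference (time saved over n iterations), not a ratio.

```c
#include <assert.h>

/* Hypothetical container for the three accelerator time components.
   All times share one unit (for example, microseconds). */
typedef struct {
    double t_in;   /* time to move input data to the accelerator */
    double t_x;    /* accelerated computation time */
    double t_out;  /* time to move results back */
} accel_times;

/* t_accel = t_in + t_x + t_out */
double accel_exec_time(accel_times a) {
    return a.t_in + a.t_x + a.t_out;
}

/* S = n * (t_CPU - t_accel): time saved over n loop iterations.
   A negative result means the accelerated system is slower. */
double accel_time_saved(int n, double t_cpu, accel_times a) {
    assert(n >= 0);
    return n * (t_cpu - accel_exec_time(a));
}
```

With t_in = 2, t_x = 5, t_out = 1, and t_CPU = 20, each iteration saves 12 time units, so ten iterations save 120.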
618
Single- vs. multi-threaded
One critical factor is available parallelism: single-threaded/blocking: CPU waits for accelerator; multithreaded/non-blocking: CPU continues to execute along with accelerator. To multithread, CPU must have useful work to do. But software must also support multithreading. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
619
Total execution time
[Figure: single-threaded and multi-threaded schedules of processes P1, P2, P3, P4 and accelerator call A1.]
620
Execution time analysis
Single-threaded: Count execution time of all component processes. Multi-threaded: Find longest path through execution. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
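The two rules above can be sketched in C as a sum and a longest-path computation. This is an illustration under stated assumptions, not the author's code: the `task_graph` type is hypothetical, tasks must be listed in topological order, and the multi-threaded case assumes that independent tasks can always run in parallel.

```c
#include <assert.h>

#define MAX_TASKS 16

/* Hypothetical task graph; tasks are listed in topological order. */
typedef struct {
    int n;                           /* number of tasks */
    double time[MAX_TASKS];          /* execution time of each task */
    int pred[MAX_TASKS][MAX_TASKS];  /* pred[i][j] != 0: task j must finish
                                        before task i starts (j < i) */
} task_graph;

/* Single-threaded: add up the execution time of every task. */
double single_threaded_time(const task_graph *g) {
    assert(g->n >= 0 && g->n <= MAX_TASKS);
    double total = 0.0;
    for (int i = 0; i < g->n; i++)
        total += g->time[i];
    return total;
}

/* Multi-threaded: length of the longest path through the graph. */
double multi_threaded_time(const task_graph *g) {
    assert(g->n >= 0 && g->n <= MAX_TASKS);
    double finish[MAX_TASKS];
    double longest = 0.0;
    for (int i = 0; i < g->n; i++) {
        double start = 0.0;  /* a task starts when its last predecessor ends */
        for (int j = 0; j < i; j++)
            if (g->pred[i][j] && finish[j] > start)
                start = finish[j];
        finish[i] = start + g->time[i];
        if (finish[i] > longest)
            longest = finish[i];
    }
    return longest;
}
```

For example, if P1 (2 units) is followed by P2 (3) and A1 (4) in parallel, then P3 (1) and P4 (2), the single-threaded time is 12 and the longest path is 9.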
621
Sources of parallelism
Overlap I/O and accelerator computation. Perform operations in batches, read in second batch of data while computing on first batch. Find other work to do on the CPU. May reschedule operations to move work after accelerator initiation. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
622
Data input/output times
Bus transactions include: flushing register/cache values to main memory; time required for CPU to set up transaction; overhead of data transfers by bus packets, handshaking, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
623
Scheduling and allocation
Must: schedule operations in time; allocate computations to processing elements. Scheduling and allocation interact, but separating them helps. Alternatively allocate, then schedule. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
624
Example: scheduling and allocation
Task graph Hardware platform © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
625
Overheads for Computers as Components 2nd ed.
First design Allocate P1, P2 -> M1; P3 -> M2. M1 P1 P1C P2 P2C M2 P3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
626
Overheads for Computers as Components 2nd ed.
Second design Allocate P1 -> M1; P2, P3 -> M2: M1 P1 P1C M2 P2 P3 time © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
627
Example: adjusting messages to reduce delay
Task graph: Network: 3 4 execution time Transmission time = 4 allocation P1 P2 M1 M2 M3 d1 d2 P3 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
628
Overheads for Computers as Components 2nd ed.
Initial schedule M1 P1 M2 P2 M3 P3 network d1 d2 Time = 15 time 5 10 15 20 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
629
Overheads for Computers as Components 2nd ed.
New design Modify P3: reads one packet of d1, one packet of d2 computes partial result continues to next packet © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
630
Overheads for Computers as Components 2nd ed.
New schedule M1 P1 M2 P2 M3 P3 P3 P3 P3 network d1 d2 d1 d2 d1 d2 d1 d2 Time = 12 time 5 10 15 20 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
631
Buffering and performance
Buffering may sequentialize operations. Next process must wait for data to enter buffer before it can continue. Buffer policy (queue, RAM) affects available parallelism. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
632
Overheads for Computers as Components 2nd ed.
Buffers and latency Three processes separated by buffers: B1 A B2 B B3 C © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
633
Buffers and latency schedules
A[0] A[1] … B[0] B[1] C[0] C[1] A[0] B[0] C[0] A[1] B[1] C[1] … Must wait for all of A before getting any B © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
634
Overheads for Computers as Components 2nd ed.
Multiprocessors Consumer electronics systems. Cell phones. CDs and DVDs. Audio players. Digital still cameras. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
635
Consumer electronics use cases
Multimedia: stored in compressed form, uncompressed on viewing. Data storage and management: keep track of your multimedia, etc. Communication: download, upload, chat. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
636
Non-functional requirements for CE
Often battery-operated, strict power budget. Very inexpensive. User interface must be capable but inexpensive.
637
Overheads for Computers as Components 2nd ed.
CE devices and hosts Many devices talk to host system. PC host does things that are hard to do on the device. Increasingly, CE devices communicate directly over the network, avoiding the host for access. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
638
Platforms and operating systems
Many CE devices use a DSP for signal processing and a RISC CPU for other tasks. I/O devices include buttons, screen, USB. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
639
Overheads for Computers as Components 2nd ed.
Flash file systems Flash is widely used for mass storage. Flash wears out on writing (up to 1 million cycles). Directory is most often written, wears out first. Flash file system has layer that moves contents to levelize wear. Hides wear leveling from API. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
640
Overheads for Computers as Components 2nd ed.
Cell phones Most popular CE device in history; most widely used computing device. 1 billion sold per year. Handset talks to cell. Cells hand off handset as it moves. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
641
Overheads for Computers as Components 2nd ed.
Cell phone platforms Today’s cell phones use analog front end, digital baseband processing. Future cell phones will perform IF processing with DSP. Baseband processing in DSP: Voice compression. Network protocol. Other processing: Multimedia functions. User interface. File system. Applications (contacts, etc.) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
642
Overheads for Computers as Components 2nd ed.
CD/MP3 player Audio CPU memory Jog memory Analog out display Error corrector focus, tracking, sled, motor drive Servo CPU Analog in amp DAC head FE, TE, amp I2S memory © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
643
CD medium
Rotational speed: m/s (CLV). Track pitch: 1.6 microns. Diameter: 120 mm. Pit length: microns. Pit depth: 0.11 microns. Pit width: 0.5 microns. Laser wavelength: 780 nm.
644
Overheads for Computers as Components 2nd ed.
CD mechanism Laser, lens, sled: CD focus track detectors diffraction grating sled laser track © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
645
Overheads for Computers as Components 2nd ed.
Laser focus Focus controlled by vertical position of lens. Unfocused beam causes irregular spot: Out of focus In focus Out of focus © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
646
Laser pickup
Main detectors A, B, C, D; side spot detectors E, F. Level: A+B+C+D. Focus error: (A+C)-(B+D). Tracking error: E-F.
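The three pickup signals are simple sums and differences of the detector readings. A minimal C sketch follows; the `pickup_sample` type and the use of integer readings are assumptions made for illustration.

```c
/* Hypothetical container for one reading of the six photodetectors. */
typedef struct {
    int a, b, c, d;  /* main detectors */
    int e, f;        /* side spot detectors */
} pickup_sample;

/* Level: A+B+C+D */
int pickup_level(pickup_sample s) {
    return s.a + s.b + s.c + s.d;
}

/* Focus error: (A+C)-(B+D); zero when the spot is evenly spread. */
int focus_error(pickup_sample s) {
    return (s.a + s.c) - (s.b + s.d);
}

/* Tracking error: E-F; zero when the beam is centered on the track. */
int tracking_error(pickup_sample s) {
    return s.e - s.f;
}
```

The servo loops would drive the focus and tracking errors toward zero.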
647
Overheads for Computers as Components 2nd ed.
Servo control Four main signals: focus 245 kHz; tracking 245 kHz; sled 800 Hz; Disc motor. Optical pickup © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
648
Overheads for Computers as Components 2nd ed.
EFM Eight-to-fourteen modulation: Fourteen-bit code guarantees a maximum distance between transitions. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
649
Error correction
CD capacity: 6.99 GB raw, 700 MB formatted. Reed-Solomon code: $g(x) = (x - a)(x - a^2) \cdots (x - a^{n-k-1})(x - a^{n-k})$. Produces data, erasure bits. Time to solve varies greatly depending on noise. CD interleaves Reed-Solomon blocks to reduce effects of large data gaps.
650
Control and error correction
Skips caused by physical disturbance. Wait for disturbance to subside. Retry. Read errors caused by disc/servo problems. Detect error. Choose location for retry. Fail and interpolate. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
651
Overheads for Computers as Components 2nd ed.
MPEG audio standards Layer 1: Lossless compression of subbands + optional simple masking model Layer 2: More advanced masking model. Layer 3: Additional processing for lower bit rates. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
652
MPEG audio rates
Input sampling rates: 32, 44.1, 48 kHz. Output bit rates: 32, 48, 64, 96, 112, 128, 192, 256, 384 kbits/sec. Output can be mono, dual-channel (bilingual, etc.), stereo.
653
Overheads for Computers as Components 2nd ed.
Other standards Dolby Digital (AC-3): Uses modified discrete cosine transform. ATRAC (MiniDisc): Uses subband + modified DCT. MPEG-2 AAC. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
654
Overheads for Computers as Components 2nd ed.
MPEG Layer 1 384 samples/block at all frequencies. Equals 8 ms at 48 kHz. Optional masking model. Driven by separate FFT for better accuracy. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
655
Overheads for Computers as Components 2nd ed.
MPEG Layer 1 data frame Bit allocation codes specify word length in each subband. Scale factors give gain for each band. header CRC bit allocation scale factors subband samples aux data © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
656
Overheads for Computers as Components 2nd ed.
MPEG Layer 1 encoder Choose Scale factor Filter bank mux requantize * 0101.. Masking model FFT © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
657
Overheads for Computers as Components 2nd ed.
MPEG Layer 1 decoder Scale factor demux inverse quantize Inverse filter bank 0101.. * * expand Step size © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
658
Overheads for Computers as Components 2nd ed.
Decoding is easier than encoding, but requires: decompression; filtering. Basic CD standard for data discs. No standards for MP3 disc file structure: player must understand Windows, Mac, Unix discs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
659
Overheads for Computers as Components 2nd ed.
Audio players Audio players may use flash, hard disk, or CD for mass storage. Decompression requires small amount of CPU: 10% of ARM7. File system must be compatible (FAT). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
660
Overheads for Computers as Components 2nd ed.
Digital still cameras DSC must determine exposure before taking picture. After taking picture: Improve image quality. Compress. Save as file. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
661
Digital still camera architecture
DSC uses CPU for general-purpose processing, DSP for image processing. Internal memory buffers the passes on the image. Display is lower resolution than image sensor. Image must be downsampled. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
662
Overheads for Computers as Components 2nd ed.
Image capture Before taking picture: Determine exposure. Determine focus. Optimize white balance. Bayer pattern © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
663
Overheads for Computers as Components 2nd ed.
Image processing Must perform basic processing to get usable picture: Bayer->RGB interpolation. DSCs perform many functions formerly performed by photoprocessors for film: Image sharpening. Color balance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
664
Overheads for Computers as Components 2nd ed.
File management EXIF standard gives format for digital pictures: Format of data in a file. Directory structure. EXIF file includes: Image (JPEG, etc.) Thumbnail. Metadata (camera type, date/time, etc.) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
665
Overheads for Computers as Components 2nd ed.
Accelerators Example: video accelerator © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
666
Overheads for Computers as Components 2nd ed.
Concept Build accelerator for block motion estimation, one step in video compression. Perform two-dimensional correlation: Frame 1 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
667
Block motion estimation
MPEG divides frame into 16 x 16 macroblocks for motion estimation. Search for best match within a search range. Measure similarity with sum-of-absolute-differences (SAD): $\sum_{i,j} |M(i,j) - S(i - o_x, j - o_y)|$
668
Overheads for Computers as Components 2nd ed.
Best match Best match produces motion vector for motion block: © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
669
Full search algorithm
bestx = 0; besty = 0;
bestsad = MAXSAD;
for (ox = -SEARCHSIZE; ox < SEARCHSIZE; ox++) {
  for (oy = -SEARCHSIZE; oy < SEARCHSIZE; oy++) {
    int result = 0;
    for (i = 0; i < MBSIZE; i++) {
      for (j = 0; j < MBSIZE; j++) {
        /* accumulate the absolute difference for this pixel pair */
        result += iabs(mb[i][j] - search[i - ox + XCENTER][j - oy + YCENTER]);
670
Full search algorithm, cont’d.
      }
    }
    /* keep the offset with the smallest SAD seen so far */
    if (result <= bestsad) {
      bestsad = result;
      bestx = ox; besty = oy;
    }
  }
}
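For reference, here is a self-contained version of the full search that compiles on its own. It is a sketch under stated assumptions, not the slide's exact code: the offset loops are inclusive (17 offsets per dimension), the search area is assumed to be 33 x 33 pixels, the standard `abs` replaces `iabs`, and the `motion_vector` type is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MBSIZE 16
#define SEARCHSIZE 8
#define SEARCHDIM (MBSIZE + 2 * SEARCHSIZE + 1)  /* assumed: 33 pixels per side */
#define XCENTER SEARCHSIZE
#define YCENTER SEARCHSIZE

/* Hypothetical result type: best offset and its SAD. */
typedef struct { int x, y, sad; } motion_vector;

motion_vector full_search(unsigned char mb[MBSIZE][MBSIZE],
                          unsigned char search[SEARCHDIM][SEARCHDIM]) {
    motion_vector best = {0, 0, INT_MAX};
    /* Assumption: inclusive bounds, so offsets run from -8 to +8. */
    for (int ox = -SEARCHSIZE; ox <= SEARCHSIZE; ox++) {
        for (int oy = -SEARCHSIZE; oy <= SEARCHSIZE; oy++) {
            int result = 0;
            for (int i = 0; i < MBSIZE; i++) {
                for (int j = 0; j < MBSIZE; j++) {
                    int m = mb[i][j];
                    int s = search[i - ox + XCENTER][j - oy + YCENTER];
                    result += abs(m - s);  /* sum of absolute differences */
                }
            }
            /* <= keeps the last offset visited among equal SADs */
            if (result <= best.sad) {
                best.sad = result;
                best.x = ox;
                best.y = oy;
            }
        }
    }
    return best;
}
```

The array index `i - ox + XCENTER` ranges from 0 to 31 for these bounds, so it stays inside the assumed 33-pixel search area.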
671
Computational requirements
Let MBSIZE = 16, SEARCHSIZE = 8. Search area is 8 + 8 + 1 = 17 in each dimension. Must perform: $n_{ops} = (16 \times 16) \times (17 \times 17) = 73{,}984$ ops per macroblock. CIF format has 352 x 288 pixels -> 22 x 18 macroblocks.
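The operation count can be reproduced with a short calculation. This sketch uses hypothetical function names and assumes 2 x SEARCHSIZE + 1 candidate offsets per dimension, as in the (17 x 17) term above.

```c
#include <assert.h>

/* Difference operations needed to search one macroblock:
   (mbsize x mbsize) pixels at each of (2*searchsize + 1)^2 offsets. */
long ops_per_macroblock(int mbsize, int searchsize) {
    assert(mbsize > 0 && searchsize >= 0);
    long offsets = 2L * searchsize + 1;
    return (long)mbsize * mbsize * offsets * offsets;
}

/* Number of whole macroblocks in a frame of the given size. */
long macroblocks_per_frame(int width, int height, int mbsize) {
    assert(mbsize > 0);
    return (long)(width / mbsize) * (height / mbsize);
}
```

For MBSIZE = 16 and SEARCHSIZE = 8 this gives 73,984 operations per macroblock; a CIF frame has 396 macroblocks, so one frame needs 29,297,664 operations.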
672
Accelerator requirements
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
673
Accelerator data types, basic classes
Motion-vector Macroblock Search-area x, y : pos pixels[] : pixelval pixels[] : pixelval PC Motion-estimator memory[] compute-mv() © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
674
Overheads for Computers as Components 2nd ed.
Sequence diagram :PC :Motion-estimator compute-mv() Search area memory[] memory[] macroblocks memory[] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
675
Architectural considerations
Requires large amount of memory: macroblock has 256 pixels; search area has 1,089 pixels. May need external memory (especially if buffering multiple macroblocks/search areas). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
676
Motion estimator organization
PE 0 search area network PE 1 comparator ctrl Address generator ... Motion vector macroblock network PE 15 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
677
Overheads for Computers as Components 2nd ed.
Pixel schedules M(0,0) S(0,2) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
678
Overheads for Computers as Components 2nd ed.
System testing Testing requires a large amount of data. Use simple patterns with obvious answers for initial tests. Extract sample data from JPEG pictures for more realistic tests. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
679
Networking for Embedded Systems
Why we use networks. Network abstractions. Example networks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
680
Overheads for Computers as Components 2nd ed.
Network elements distributed computing platform: PE PE communication link network PE PEs may be CPUs or ASICs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
681
Networks in embedded systems
initial processing more processing PE sensor PE PE actuator © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
682
Overheads for Computers as Components 2nd ed.
Why distributed? Higher performance at lower cost. Physically distributed activities---time constants may not allow transmission to central site. Improved debugging---use one CPU in network to debug others. May buy subsystems that have embedded processors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
683
Overheads for Computers as Components 2nd ed.
Network abstractions International Standards Organization (ISO) developed the Open Systems Interconnection (OSI) model to describe networks: 7-layer model. Provides a standard way to classify network components and operations. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
684
OSI model
application: end-use interface
presentation: data format
session: application dialog control
transport: connections
network: end-to-end service
data link: reliable data transport
physical: mechanical, electrical
685
Overheads for Computers as Components 2nd ed.
OSI layers Physical: connectors, bit formats, etc. Data link: error detection and control across a single link (single hop). Network: end-to-end multi-hop data communication. Transport: provides connections; may optimize network resources. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
686
Overheads for Computers as Components 2nd ed.
OSI layers, cont’d. Session: services for end-user applications: data grouping, checkpointing, etc. Presentation: data formats, transformation services. Application: interface between network and end-user programs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
687
Hardware architectures
Many different types of networks: topology; scheduling of communication; routing. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
688
Point-to-point networks
One source, one or more destinations, no data switching (serial port): PE 3 PE 1 PE 2 link 1 link 2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
689
Overheads for Computers as Components 2nd ed.
Bus networks Common physical connection: PE 1 PE 2 PE 3 PE 4 header address data ECC packet format © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
690
Bus arbitration
Fixed: same order of resolution every time. Fair: every PE has same access over long periods. Round-robin: rotate top priority among PEs. [Figure: with A, B, and C all requesting in two successive rounds, fixed arbitration grants A B C, then A B C; round-robin grants A B C, then B C A.]
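The difference between the two policies comes down to where the search for a requester starts. The following C sketch is illustrative only: the bitmask encoding of requests and the function names are assumptions, and real bus arbiters are implemented in hardware.

```c
#include <assert.h>

/* requests: bit i is set when PE i is requesting the bus.
   Returns the first requesting PE at or after `top`, wrapping around,
   or -1 if nobody is requesting. */
int arbitrate_from(unsigned requests, int n_pes, int top) {
    assert(n_pes > 0 && n_pes <= 32);
    assert(top >= 0 && top < n_pes);
    for (int k = 0; k < n_pes; k++) {
        int pe = (top + k) % n_pes;
        if (requests & (1u << pe))
            return pe;
    }
    return -1;
}

/* Fixed priority: PE 0 always has top priority. */
int arbitrate_fixed(unsigned requests, int n_pes) {
    return arbitrate_from(requests, n_pes, 0);
}
```

For round-robin, the caller rotates `top` between rounds. With three PEs all requesting, a round with top = 0 grants 0, 1, 2, and the next round with top = 1 grants 1, 2, 0.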
691
Overheads for Computers as Components 2nd ed.
Crossbar out4 out3 out2 out1 in1 in2 in3 in4 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
692
Crossbar characteristics
Non-blocking. Can handle arbitrary multi-cast combinations. Size proportional to $n^2$.
693
Overheads for Computers as Components 2nd ed.
Multi-stage networks Use several stages of switching elements. Often blocking. Often smaller than crossbar. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
694
Message-based programming
Transport layer provides message-based programming interface: send_msg(adrs,data1); Data must be broken into packets at source, reassembled at destination. Data-push programming: make things happen in network based on data transfers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
695
Overheads for Computers as Components 2nd ed.
I2C bus Designed for low-cost, medium data rate applications. Characteristics: serial; multiple-master; fixed-priority arbitration. Several microcontrollers come with built-in I2C controllers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
696
Overheads for Computers as Components 2nd ed.
I2C physical layer master 1 master 2 data line SDL clock line SCL slave 1 slave 2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
697
Overheads for Computers as Components 2nd ed.
I2C data format SCL ... ... ... SDL ack start MSB © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
698
I2C electrical interface
Open collector interface: + SDL + SCL © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
699
Overheads for Computers as Components 2nd ed.
I2C signaling Sender pulls down bus for 0. Sender listens to bus---if it tried to send a 1 and heard a 0, someone else is simultaneously transmitting. Transmissions occur in 8-bit bytes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
700
Overheads for Computers as Components 2nd ed.
I2C data link layer Every device has an address (7 bits in standard, 10 bits in extension). Bit 8 of address signals read or write. General call address allows broadcast. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
701
Overheads for Computers as Components 2nd ed.
I2C bus arbitration Sender listens while sending address. When sender hears a conflict, if its address is higher, it stops signaling. Low-priority senders relinquish control early enough in clock cycle to allow bit to be transmitted reliably. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
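The arbitration rule can be simulated bit by bit. This is a simplified model, not a bus driver: it assumes 7-bit addresses sent MSB first, treats the bus as a wired AND (any sender driving 0 pulls the line low), and uses a hypothetical function name.

```c
#include <assert.h>

/* addr[m] is the 7-bit address that master m is trying to send.
   Returns the index of the master still signaling after the address
   phase. If several masters send the same address, the lowest index
   among them is returned. */
int i2c_arbitrate(const unsigned char *addr, int n_masters) {
    assert(n_masters > 0 && n_masters <= 16);
    unsigned active = (1u << n_masters) - 1u;  /* all masters start active */
    for (int bit = 6; bit >= 0; bit--) {
        int bus = 1;  /* line floats high unless someone drives 0 */
        for (int m = 0; m < n_masters; m++)
            if ((active & (1u << m)) && !((addr[m] >> bit) & 1))
                bus = 0;
        /* a master that sent 1 but hears 0 stops signaling */
        for (int m = 0; m < n_masters; m++)
            if ((active & (1u << m)) && ((addr[m] >> bit) & 1) && bus == 0)
                active &= ~(1u << m);
    }
    for (int m = 0; m < n_masters; m++)
        if (active & (1u << m))
            return m;
    return -1;
}
```

With addresses 0x52, 0x1A, and 0x3C, the master sending 0x1A wins because it drives 0 on the first bit where the addresses differ.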
702
I2C transmissions
multi-byte write: S adrs data data P
read from slave: S adrs 1 data P
write, then read: S adrs data S adrs 1 data P
703
Overheads for Computers as Components 2nd ed.
Ethernet Dominant non-telephone LAN. Versions: 10 Mb/s, 100 Mb/s, 1 Gb/s Goal: reliable communication over an unreliable medium. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
704
Overheads for Computers as Components 2nd ed.
Ethernet topology Bus-based system, several possible physical layers: A B C © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
705
Overheads for Computers as Components 2nd ed.
CSMA/CD Carrier sense multiple access with collision detection: sense collisions; exponentially back off in time; retransmit. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
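The back-off step can be sketched as follows. The constants are not from the slides: they assume the truncated binary exponential back-off of standard Ethernet, where the window stops growing after 10 collisions and the sender gives up after 16 attempts. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define BACKOFF_LIMIT 10  /* assumed: window stops doubling here */
#define ATTEMPT_LIMIT 16  /* assumed: give up after this many collisions */

/* Number of possible waiting times after the given number of
   collisions: 2^min(collisions, BACKOFF_LIMIT). */
unsigned backoff_window(int collisions) {
    assert(collisions >= 1);
    int exponent = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;
    return 1u << exponent;
}

/* Picks how many slot times to wait before retransmitting,
   or returns -1 when the sender should give up.
   rand() % window is slightly biased; acceptable for a sketch. */
int backoff_slots(int collisions) {
    if (collisions >= ATTEMPT_LIMIT)
        return -1;
    return (int)((unsigned)rand() % backoff_window(collisions));
}
```

After the third collision, for example, the sender waits between 0 and 7 slot times, chosen at random.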
706
Exponential back-off times
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
707
Ethernet packet format
preamble start frame source adrs dest adrs length data payload padding CRC © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
708
Overheads for Computers as Components 2nd ed.
Ethernet performance Quality-of-service tends to non-linearly decrease at high load levels. Can’t guarantee real-time deadlines. However, may provide very good service at proper load levels. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
709
Overheads for Computers as Components 2nd ed.
Fieldbus Used for industrial control and instrumentation---factories, etc. H1 standard based on MB/s twisted pair medium. High Speed Ethernet (HSE) standard based on 100 Mb/s Ethernet. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
710
Overheads for Computers as Components 2nd ed.
Networks Network-based design. Communication analysis. System performance analysis. Internet. Internet-enabled systems. Vehicles as networks. Sensor networks © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
711
Communication analysis
First, understand delay for single message. Delay for multiple messages depends on: network protocol; devices on network. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
712
Message delay
Assume: single message; no contention. Delay: $t_m = t_x + t_n + t_r$ = xmtr overhead + network xmit time + rcvr overhead.
713
Example: I2C message delay
Network transmission time dominates. Assume 100 kbits/sec, one 8-bit byte. Number of bits in packet: $n_{packet}$ = start + address + data + stop = 1 + 8 + 8 + 1 = 18 bits. Time required to transmit: $1.8 \times 10^{-4}$ sec. 20 instructions on 8 MHz controller adds $2.5 \times 10^{-6}$ sec delay on xmtr, rcvr.
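The numbers in this example follow from two divisions and a sum. A minimal C sketch, with hypothetical function names and the assumption that the controller executes one instruction per clock cycle:

```c
#include <assert.h>

/* Network transmission time for a packet of the given size. */
double transmit_time(int bits, double bits_per_sec) {
    assert(bits_per_sec > 0.0);
    return bits / bits_per_sec;
}

/* Software overhead, assuming one instruction per clock cycle. */
double overhead_time(int instructions, double clock_hz) {
    assert(clock_hz > 0.0);
    return instructions / clock_hz;
}

/* Message delay = xmtr overhead + network xmit time + rcvr overhead. */
double message_delay(double t_x, double t_n, double t_r) {
    return t_x + t_n + t_r;
}
```

For 18 bits at 100 kbits/sec and 20 instructions at 8 MHz on each side, the total delay is about 1.85 x 10^-4 sec, of which the network accounts for 1.8 x 10^-4 sec.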
714
Multiple messages
If messages can interfere with each other, analysis is more complex. Model total message delay: $t_y = t_d + t_m$ = wait time for network + message delay.
715
Overheads for Computers as Components 2nd ed.
Arbitration and delay Fixed-priority arbitration introduces unbounded delay for all but highest-priority device. Unless higher-priority devices are known to have limited rates that allow lower devices to transmit. Round-robin arbitration introduces bounded delay proportional to N. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
716
Further complications
Acknowledgment time. Transmission errors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
717
Priority inversion in networks
In many networks, a packet cannot be interrupted. Result is priority inversion: low-priority message holds up higher-priority message. Doesn’t cause deadlock, but can slow down important communications. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
718
Overheads for Computers as Components 2nd ed.
Multihop networks In multihop networks, one node receives message, then retransmits to destination (or intermediate). hop 1 hop 2 A B C Network 1 Network 2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
719
System performance analysis
System analysis is difficult in general. multiprocessor performance analysis is hard; communication performance analysis is hard. Simple example: uncertainty in P1 finish time -> uncertainty in P2 start time. P1 P2 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
720
Overheads for Computers as Components 2nd ed.
Analysis challenges P2 and P3 can delay each other, even though they are in separate tasks. Delays in P1 propagate to P2, then P3, then to P4. P1 P2 P3 P4 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
721
Overheads for Computers as Components 2nd ed.
Lower bounds on system Computational requirements: sum up process requirements over least-common multiple of periods, average over one period. Communication requirements: Count all transmissions in one period. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
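The computational lower bound can be worked out as follows. This is a sketch with hypothetical type and function names; it assumes integer periods and execution times in the same unit and at least one process.

```c
#include <assert.h>

/* Hypothetical description of a periodic process. */
typedef struct {
    long period;     /* time between activations */
    long exec_time;  /* CPU time needed per activation */
} process_t;

static long gcd_long(long a, long b) {
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least-common multiple of all process periods. */
long hyperperiod(const process_t *p, int n) {
    assert(n > 0);
    long l = 1;
    for (int i = 0; i < n; i++) {
        assert(p[i].period > 0);
        l = l / gcd_long(l, p[i].period) * p[i].period;
    }
    return l;
}

/* Total CPU time demanded over one least-common multiple of periods. */
long demand_over_hyperperiod(const process_t *p, int n) {
    long h = hyperperiod(p, n);
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (h / p[i].period) * p[i].exec_time;
    return sum;
}

/* Average demand per unit time: a lower bound on the computing
   capacity the platform must supply. */
double utilization_lower_bound(const process_t *p, int n) {
    return (double)demand_over_hyperperiod(p, n) / (double)hyperperiod(p, n);
}
```

For processes with (period, execution time) of (4, 1), (6, 2), and (12, 3), the least-common multiple is 12 and the demand is 10 time units, so the platform needs at least 10/12 of one such CPU.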
722
Hardware platform design
Need to choose: number and types of PEs; number and types of networks. Evaluate a platform by allocating processes, scheduling processes and communication. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
723
I/O-intensive systems
Start with I/O devices, then consider computation: inventory required devices; identify critical deadlines; choose devices that can share PEs; analyze communication times; choose PEs to go with devices.
724
Computation-intensive systems
Start with shortest-deadline tasks: Put shortest-deadline tasks on separate PEs. Check for interference on critical communications. Allocate low-priority tasks to common PEs wherever possible. Balance loads wherever possible. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
725
Overheads for Computers as Components 2nd ed.
Internet Protocol Internet Protocol (IP) is basis for Internet. Provides an internetworking standard: between two Ethernets, Ethernet and token ring, etc. Higher-level services are built on top of IP. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
726
IP in communication
[Figure: node A and node B each run the full stack (application, presentation, session, transport, network, data link, physical); the router between them implements only the network, data link, and physical layers. IP is marked on the path between the nodes.]
727
Overheads for Computers as Components 2nd ed.
IP packet Includes: version, service type, length time to live, protocol source and destination address data payload Maximum data payload is 65,535 bytes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
728
Overheads for Computers as Components 2nd ed.
IP addresses 32 bits in early IP, 128 bits in IPv6. Typically written in form xxx.xx.xx.xx. Names (foo.baz.com) translated to IP address by domain name server (DNS). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
729
Overheads for Computers as Components 2nd ed.
Internet routing Best effort routing: doesn’t guarantee data delivery at IP layer. Routing can vary: session to session; packet to packet. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
730
Higher-level Internet services
Transmission Control Protocol (TCP) provides connection-oriented service. Quality-of-service (QoS) guaranteed services are under development. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
731
The Internet service stack
FTP, HTTP, SMTP, telnet (over TCP)
SNMP (over UDP, the User Datagram Protocol)
TCP and UDP (over IP)
732
Internet-enabled embedded system
Internet-enabled embedded system: any embedded system that includes an Internet interface (e.g., refrigerator). Internet appliance: embedded system designed for a particular Internet task (e.g. ). Examples: Cell phone. Laser printer. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
733
Overheads for Computers as Components 2nd ed.
Example: Javacam Hardware platform: parallel-port camera; National Semi NS486SXF; 1.5 Mbytes memory. Uses memory-efficient Java Nanokernel. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
734
Overheads for Computers as Components 2nd ed.
Javacam architecture Web browser QuickCam applet HTTP Quickcam server QuickCam Java VM Java nanokernel 486 © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
735
Overheads for Computers as Components 2nd ed.
Vehicles as networks 1/3 of cost of car/airplane is electronics/avionics. Dozens of microprocessors are used throughout the vehicle. Network applications: Vehicle control. Instrumentation. Communication. Passenger entertainment systems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
736
Overheads for Computers as Components 2nd ed.
CAN bus First used in 1991. Serial bus, 1 Mb/sec up to 40 m. Synchronous bus. Logic 0 dominates logic 1 on bus. Arbitrated with CSMA/AMP: Arbitration on message priority. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
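Because logic 0 dominates logic 1, priority arbitration falls out of the wiring: every contending node sends its identifier bit by bit, and a node that sends 1 but reads 0 backs off. The sketch below simulates that behavior; the function name `arbitrate` is hypothetical, and it assumes a non-empty list of distinct identifiers that fit in `width` bits.

```python
def arbitrate(identifiers, width=11):
    """Return the identifier that wins bitwise bus arbitration."""
    contenders = set(identifiers)
    for bit in range(width - 1, -1, -1):      # most significant bit first
        sent = {ident: (ident >> bit) & 1 for ident in contenders}
        bus = min(sent.values())              # any dominant 0 forces the bus to 0
        # A node that sent recessive 1 but reads dominant 0 drops out.
        contenders = {i for i in contenders if sent[i] == bus}
    (winner,) = contenders
    return winner
```

The effect is that the numerically lowest identifier always wins, so lower identifiers carry higher priority.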
737
Overheads for Computers as Components 2nd ed.
CAN data frame 11 bit destination address. RTR bit determines read/write from/to destination. Any node can detect bus error, interrupt packet for retransmission. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
738
Overheads for Computers as Components 2nd ed.
CAN controller Controller implements physical and data link layers. No network layer needed---bus provides end-to-end connections. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
739
Overheads for Computers as Components 2nd ed.
Other vehicle busses FlexRay is next generation: Time triggered protocol. 10 Mb/s. Local Interconnect Network (LIN) connects devices in a small area (e.g., door). Passenger entertainment networks: Bluetooth. Media Oriented Systems Transport (MOST). © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
740
Overheads for Computers as Components 2nd ed.
Avionics Anything permanently attached to the aircraft must be certified by FAA/national agency. Traditional architecture uses separate electronics for each instrument/device. Line replaceable unit (LRU) can be physically removed and replaced. Federated architecture shares processors across a subsystem (nav/comm, etc.) © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
741
Overheads for Computers as Components 2nd ed.
Sensor networks Wireless networks, small nodes. Ad hoc network: organizes itself without a system administrator. Must be able to declare membership in network, find other networks. Must be able to determine routes for data. Must update configuration as nodes enter/leave.
742
Overheads for Computers as Components 2nd ed.
Node capabilities Must be able to turn radio on/off quickly with low power overhead. Communication can take roughly 100x the power of computation. Radios should operate at several different power levels to avoid interference with other nodes. Must buffer, route network traffic.
743
Overheads for Computers as Components 2nd ed.
Networks Example: elevator controller. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
744
Overheads for Computers as Components 2nd ed.
Terminology Elevator car: holds passengers. Hoistway: elevator shaft. Car control panel: buttons in each car. Floor control panel: elevator request, etc. per floor. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
745
Overheads for Computers as Components 2nd ed.
Elevator system [Figure: two hoistways (Hoistway 1, Hoistway 2) serving five floors.]
746
Overheads for Computers as Components 2nd ed.
Theory of operation Each floor has control panel, display. Each car has control panel: one button per floor; emergency stop. Controlled by a single controller. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
747
Elevator position sensing
[Figure: position sensor reading fine and coarse position markings.]
748
Overheads for Computers as Components 2nd ed.
Elevator control Elevator control has up and down. To stop, disable both. Master controller: reads elevator positions; reads requests; schedules elevators; controls movement; controls doors. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
749
Elevator system requirements
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
750
Elevator system class diagram
[Class diagram relating Controller, Car, Floor, Coarse-sensor*, Fine-sensor*, Master-control-panel*, Car-control-panel*, Floor-control-panel*, and Motor*; multiplicities are 1 except where marked N or F.]
751
Overheads for Computers as Components 2nd ed.
Physical interfaces Sensor* Car-control-panel* hit: boolean Floors[1..F]: boolean emergency-stop: boolean open-door, close-door: Coarse-sensor* Fine-sensor* Master-control-panel... Motor* Floor-control-panel* speed: {o,s,f} up, down: boolean © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
752
Overheads for Computers as Components 2nd ed.
Car and Floor classes
Car: request-lights[1..F]: boolean; current-floor: integer.
Floor: up-light, down-light: boolean.
753
Overheads for Computers as Components 2nd ed.
Controller class
Controller
Attributes: car-floor[1..H]: integer; emergency-stop[1..H]: integer.
Operations: scan-cars(); scan-floors(); scan-master-panel(); operate().
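A rough Python rendering of the Controller class is sketched below. Everything beyond the attribute and operation names is an assumption for illustration: the emergency-stop flags are stored as booleans rather than integers, scan-master-panel() is omitted, inputs are passed in as plain lists, and the nearest-request rule in `operate` is only a placeholder for real scheduling. The commands follow the up/down motor control described earlier, with "stop" meaning both are disabled.

```python
class Controller:
    def __init__(self, hoistways):
        self.car_floor = [1] * hoistways               # car-floor[1..H]
        self.emergency_stop = [False] * hoistways      # emergency-stop[1..H]
        self.requests = set()                          # pending floor requests

    def scan_cars(self, readings):
        """readings: one (floor, stop_pressed) pair per hoistway."""
        for h, (floor, stop_pressed) in enumerate(readings):
            self.car_floor[h] = floor
            self.emergency_stop[h] = stop_pressed

    def scan_floors(self, requested_floors):
        self.requests |= set(requested_floors)

    def operate(self):
        """Return one motor command per hoistway: 'up', 'down', or 'stop'."""
        commands = []
        for h, floor in enumerate(self.car_floor):
            if self.emergency_stop[h] or not self.requests:
                commands.append("stop")
                continue
            # Placeholder policy: head for the nearest requested floor.
            target = min(self.requests, key=lambda f: abs(f - floor))
            if target == floor:
                self.requests.discard(target)          # request served
                commands.append("stop")
            else:
                commands.append("up" if target > floor else "down")
        return commands
```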
754
Overheads for Computers as Components 2nd ed.
Architecture Computation and I/O occur at: floor control panels/displays; elevator cars; system controller. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
755
Panels and cab controller
Panels are straightforward---no real-time requirements. Cab controller: read buttons and send events to system controller; read sensor inputs and send to system controller. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
756
Overheads for Computers as Components 2nd ed.
System controller Must take inputs from many sources: car controllers; floors. Must control cars to hard real-time deadlines. User interface, scheduling are soft deadlines. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
757
Overheads for Computers as Components 2nd ed.
Testing Build an elevator simulator using an FPGA: simulate multiple elevators; simulate real-time control demands. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
758
System design techniques
Design methodologies. Requirements and specification. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
759
Overheads for Computers as Components 2nd ed.
Design methodologies Process for creating a system. Many systems are complex: large specifications; multiple designers; interface to manufacturing. Proper processes improve: quality; cost of design and manufacture. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
760
Overheads for Computers as Components 2nd ed.
Product metrics Time-to-market: beat competitors to market; meet marketing window (back-to-school). Design cost. Manufacturing cost. Quality. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
761
Overheads for Computers as Components 2nd ed.
Mars Climate Orbiter Lost at Mars in September 1999. Requirements problem: requirements did not specify units. Lockheed Martin used English units; JPL expected metric. Not caught by manual inspections.
762
Overheads for Computers as Components 2nd ed.
Design flow Design flow: sequence of steps in a design methodology. May be partially or fully automated. Use tools to transform, verify design. Design flow is one component of methodology. Methodology also includes management organization, etc. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
763
Overheads for Computers as Components 2nd ed.
Waterfall model Early model for software development: requirements -> architecture -> coding -> testing -> maintenance
764
Overheads for Computers as Components 2nd ed.
Waterfall model steps Requirements: determine basic characteristics. Architecture: decompose into basic modules. Coding: implement and integrate. Testing: exercise and uncover bugs. Maintenance: deploy, fix bugs, upgrade. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
765
Waterfall model critique
Only local feedback---may need iterations between coding and requirements, for example. Doesn’t integrate top-down and bottom-up design. Assumes hardware is given. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
766
Overheads for Computers as Components 2nd ed.
Spiral model [Figure: spiral passing repeatedly through requirements, design, and test, growing outward from system feasibility through specification, prototype, and initial system to enhanced system.]
767
Overheads for Computers as Components 2nd ed.
Spiral model critique Successive refinement of system. Start with mock-ups, move through simple systems to full-scale systems. Provides bottom-up feedback from previous stages. Working through stages may take too much time. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
768
Successive refinement model
initial system: specify -> architect -> design -> build -> test
refined system: specify -> architect -> design -> build -> test
769
Hardware/software design flow
requirements and specification -> architecture -> software design and hardware design (in parallel) -> integration -> testing
770
Co-design methodology
Must architect hardware and software together: provide sufficient resources; avoid software bottlenecks. Can build pieces somewhat independently, but integration is major step. Also requires bottom-up feedback. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
771
Hierarchical design flow
Embedded systems must be designed across multiple levels of abstraction: system architecture; hardware and software systems; hardware and software components. Often need design flows within design flows. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
772
Hierarchical HW/SW flow
system: spec -> architecture -> HW, SW -> integrate -> test
hardware: spec -> HW architecture -> detailed design -> integration -> test
software: spec -> SW architecture -> detailed design -> integration -> test
773
Concurrent engineering
Large projects use many people from multiple disciplines. Work on several tasks at once to reduce design time. Feedback between tasks helps improve quality, reduce number of later design problems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
774
Concurrent engineering techniques
Cross-functional teams. Concurrent product realization. Incremental information sharing. Integrated product management. Supplier involvement. Customer focus. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
775
AT&T PBX concurrent engineering
Benchmark against competitors. Identify breakthrough improvements. Characterize current process. Create new process. Verify new process. Implement. Measure and improve. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
776
Requirements analysis
Requirements: informal description of what customer wants. Specification: precise description of what design team should deliver. Requirements phase links customers with designers. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
777
Overheads for Computers as Components 2nd ed.
Types of requirements Functional: input/output relationships. Non-functional: timing; power consumption; manufacturing cost; physical size; time-to-market; reliability. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
778
Overheads for Computers as Components 2nd ed.
Good requirements Correct. Unambiguous. Complete. Verifiable: is each requirement satisfied in the final system? Consistent: requirements do not contradict each other. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
779
Good requirements, cont’d.
Modifiable: can update requirements easily. Traceable: know why each requirement exists; go from source documents to requirements; go from requirement to implementation; back from implementation to requirement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
780
Overheads for Computers as Components 2nd ed.
Setting requirements Customer interviews. Comparison with competitors. Sales feedback. Mock-ups, prototypes. Next-bench syndrome (HP): design a product for someone like you. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
781
Overheads for Computers as Components 2nd ed.
Specifications Capture functional and non-functional properties: verify correctness of spec; compare spec to implementation. Many specification styles: control-oriented vs. data-oriented; textual vs. graphical. UML is one specification/design language. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
782
Overheads for Computers as Components 2nd ed.
SDL Used in telecommunications protocol design. Event-oriented state machine model. [Figure: SDL fragment for a telephone: from state on-hook, the input "caller goes off-hook" leads to state dial tone with the output "caller gets dial tone".]
783
Overheads for Computers as Components 2nd ed.
Statecharts Ancestor of UML state diagrams. Provided composite states: OR states; AND states. Composite states reduce the size of the state transition graph. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
784
Overheads for Computers as Components 2nd ed.
Statechart OR state [Figure: a traditional state machine with states S1, S2, S3, S4 and inputs i1, i2, shown next to the equivalent statechart in which OR state s123 groups S1, S2, and S3.]
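One way to see why an OR state shrinks the transition graph: a transition attached to the composite state s123 applies to every substate, so a single i2 entry replaces three. The sketch below looks up transitions by walking from a state to its enclosing OR state. The i1 edges shown are assumed for illustration, since the figure's exact edges are not recoverable here, and the names `PARENT`, `TRANSITIONS`, and `step` are hypothetical.

```python
# Which OR state encloses each substate.
PARENT = {"S1": "s123", "S2": "s123", "S3": "s123"}

TRANSITIONS = {
    ("S1", "i1"): "S2",      # assumed i1 edges between substates
    ("S2", "i1"): "S3",
    ("s123", "i2"): "S4",    # one entry covers S1, S2, and S3
}

def step(state, event):
    """Return the next state, checking the state first, then its OR state."""
    scope = state
    while scope is not None:
        target = TRANSITIONS.get((scope, event))
        if target is not None:
            return target
        scope = PARENT.get(scope)
    return state             # no matching transition: stay put
```

A flat encoding of the same machine would need a separate i2 transition out of each of S1, S2, and S3.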
785
Overheads for Computers as Components 2nd ed.
Statechart AND state [Figure: a traditional state machine with product states S1-3, S1-4, S2-3, S2-4 and state S5, shown next to the equivalent statechart in which AND state sab combines the components {S1, S2} and {S3, S4}; inputs a, b, c, d, r.]
786
Overheads for Computers as Components 2nd ed.
AND-OR tables Alternate way of specifying complex conditions: cond1 or (cond2 and !cond3)
Columns are ORed together; entries within a column are ANDed ("-" means don’t care):
cond1 | T | -
cond2 | - | T
cond3 | - | F
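An AND-OR table can be evaluated mechanically: the condition holds if any column is satisfied, and a column is satisfied when every non-blank entry matches. The sketch below encodes the table for cond1 or (cond2 and !cond3); the data layout (one dict per column, don't-care entries simply left out) and the names `TABLE` and `evaluate` are choices made for this example.

```python
# One dict per column; a missing key is a don't-care entry.
TABLE = [
    {"cond1": True},
    {"cond2": True, "cond3": False},
]

def evaluate(table, values):
    """OR across columns, AND down each column."""
    return any(
        all(values[name] == wanted for name, wanted in column.items())
        for column in table
    )
```

For instance, `evaluate(TABLE, {"cond1": False, "cond2": True, "cond3": False})` satisfies the second column only.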
787
Overheads for Computers as Components 2nd ed.
TCAS II specification TCAS II: aircraft collision avoidance system. Monitors aircraft and air traffic info. Provides audio warnings and directives to avoid collisions. Leveson et al. used the RSML language to capture the TCAS specification.
788
Overheads for Computers as Components 2nd ed.
RSML State description: a state (state1) with its inputs, state description, and outputs. Transition bus for transitions between many states (a, b, c, d in the figure).
789
TCAS top-level description
power-off power-on Inputs: TCAS-operational-status {operational,not-operational} fully-operational C own-aircraft other-aircraft i:[1..30] standby mode-s-ground-station i:[1..15] © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
790
Own-Aircraft AND state
CAS Inputs: own-alt-radio: integer; standby-discrete-input: {true,false}; own-alt-barometric: integer, etc. Parallel components: Effective-SL and Alt-SL (states 1 through 7), Alt-layer, Climb-inhibit, Descend-inhibit, Increase-climb-inhibit, Increase-Descend-inhibit, Advisory-Status. Outputs: sound-aural-alarm: {true,false}; aural-alarm-inhibit: {true,false}; combined-control-out: enumerated, etc.
791
Overheads for Computers as Components 2nd ed.
CRC cards Well-known method for analyzing a system and developing an architecture. CRC: classes; responsibilities of each class; collaborators are other classes that work with a class. Team-oriented methodology. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
792
Overheads for Computers as Components 2nd ed.
CRC card format
Front: Class name; Superclasses; Subclasses; Responsibilities; Collaborators.
Back: Class name; Class’s function; Attributes.
793
Overheads for Computers as Components 2nd ed.
CRC methodology Develop an initial list of classes. Simple description is OK. Team members should discuss their choices. Write initial responsibilities/collaborators. Helps to define the classes. Create some usage scenarios. Major uses of system and classes. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
794
CRC methodology, cont’d.
Walk through scenarios. See what works and doesn’t work. Refine the classes, responsibilities, and collaborators. Add class relationships: superclass, subclass.
795
Overheads for Computers as Components 2nd ed.
CRC cards for elevator Real-world classes: elevator car, passenger, floor control, car control, car sensor. Architectural classes: car state, floor control reader, car control reader, car control sender, scheduler. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
796
Elevator responsibilities and collaborators
© 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
797
System design techniques
Quality assurance. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
798
Overheads for Computers as Components 2nd ed.
Quality assurance Quality judged by how well product satisfies its intended function. May be measured in different ways for different kinds of products. Quality assurance (QA) makes sure that all stages of the design process help to deliver a quality product. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
799
Therac-25 radiation therapy machine (Leveson and Turner)
Six known accidents: radiation overdoses leading to death and serious injury. Radiation gun controlled by PDP-11. Four major software components: stored data; scheduler; set of tasks; interrupt services. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
800
Overheads for Computers as Components 2nd ed.
Therac-25 tasks Treatment monitor controlled and monitored setup and delivery of treatment in eight phases. Servo task controlled radiation gun. Housekeeper task took care of status interlocks and limit checks. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
801
Treatment monitor task
Treat was main monitor task. Eight subroutines. Treat rescheduled itself after every subroutine. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
802
Overheads for Computers as Components 2nd ed.
Software timing race Timing-dependent use of mode and energy: if keyboard handler sets completion behavior before operator changes mode/energy data, Datent task will not detect the change, but Hand task will. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
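The flavor of this race can be shown with a toy replay of events. This is emphatically not the Therac-25 code: the event names, the single `mode` value, and the rule that the Datent-like task latches the value once after entry is marked complete are all simplifying assumptions. The point is only that the outcome depends on event order.

```python
def run(events):
    """Replay ordered events; return (mode latched by Datent, mode shown by Hand)."""
    shared = {"mode": None, "entry_complete": False}
    datent_mode = None
    for kind, value in events:
        if kind == "operator_sets_mode":
            shared["mode"] = value
        elif kind == "keyboard_marks_complete":
            shared["entry_complete"] = True
        elif kind == "datent_polls":
            # Datent latches the mode once entry is complete and never re-reads it.
            if shared["entry_complete"] and datent_mode is None:
                datent_mode = shared["mode"]
    # The Hand-like task always sees the latest shared value.
    return datent_mode, shared["mode"]

RACY = [
    ("operator_sets_mode", "xray"),
    ("keyboard_marks_complete", None),
    ("datent_polls", None),
    ("operator_sets_mode", "electron"),   # edit arrives after the latch
]
SAFE = [
    ("operator_sets_mode", "xray"),
    ("operator_sets_mode", "electron"),
    ("keyboard_marks_complete", None),
    ("datent_polls", None),
]
```

In the racy ordering the two tasks end up disagreeing about the mode, which is the kind of inconsistency the slide describes.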
803
Software timing errors
Changes to parameters made by operator may show on screen but not be sensed by Datent task. One accident caused by entering mode/energy, changing mode/energy, returning to command line in 8 seconds. Skilled operators typed faster, more likely to exercise bug. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
804
Leveson and Turner observations
Performed limited safety analysis: guessed at error probabilities, etc. Did not use mechanical backups to check machine operation. Used overly complex programs written in unreliable styles. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
805
Overheads for Computers as Components 2nd ed.
ISO 9000 Developed by the International Organization for Standardization (ISO). Applies to a broad range of industries. Concentrates on process. Validation based on extensive documentation of organization’s process.
806
CMU Capability Maturity Model
Five levels of organizational maturity: Initial: poorly organized process, depends on individuals. Repeatable: basic tracking mechanisms. Defined: processes documented and standardized. Managed: makes detailed measurements. Optimizing: measurements used for improvement. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
807
Overheads for Computers as Components 2nd ed.
Verification Verification and testing are important throughout the design flow. Early bugs are more expensive to fix. [Figure: cost to fix versus time, with curves for a requirements bug and a coding bug.]
808
Verifying requirements and specification
Requirements: prototypes; prototyping languages; pre-existing systems. Specifications: usage scenarios; formal techniques.
809
Overheads for Computers as Components 2nd ed.
Design review Uses meetings to catch design flaws. Simple, low-cost. Proven by experiments to be effective. Use other people in the project/company to help spot design problems. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
810
Overheads for Computers as Components 2nd ed.
Design review players Designers: present design to rest of team, make changes. Review leader: coordinates process. Review scribe: takes notes of meetings. Review audience: looks for bugs. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
811
Before the design review
Design team prepares documents used to describe the design. Leader recruits audience, coordinates meetings, distributes handouts, etc. Audience members familiarize themselves with the documents before they go to the meeting. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
812
Overheads for Computers as Components 2nd ed.
Design review meeting Leader keeps meeting moving; scribe takes notes. Designers present the design: use handouts; explain what is going on; go through details. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
813
Design review audience
Look for any problems: Is the design consistent with the specification? Is the interface correct? How well is the component’s internal architecture designed? Did they use good design/coding practices? Is the testing strategy adequate? © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
814
Overheads for Computers as Components 2nd ed.
Follow-up Designers make suggested changes. Document changes. Leader checks on results of changes, may distribute to audience for further review or additional reviews. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
815
Overheads for Computers as Components 2nd ed.
Measurements Measurements help ground our beliefs: Do our practices really work? Do they work where we think they work? Types of measurements: bugs found at different stages of design; bugs as a function of time; bugs in different types of components; how bugs are found. © 2008 Wayne Wolf Overheads for Computers as Components 2nd ed.
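The measurements listed above amount to tallying bug reports along different axes. A minimal sketch, assuming each report is a dict with hypothetical keys such as `stage`, `component`, and `found_by`:

```python
from collections import Counter

def tally(bug_reports, field):
    """Count bug reports grouped by one field, e.g. 'stage' or 'found_by'."""
    return Counter(report[field] for report in bug_reports)
```

Calling `tally(reports, "stage")` gives bugs found at each design stage, while `tally(reports, "found_by")` shows how bugs are being found.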