Slide 1 ISTORE-1 Update David Patterson University of California at Berkeley UC Berkeley IRAM Group UC Berkeley ISTORE Group July 2000

Slide 2 Perspective on Post-PC Era The Post-PC Era will be driven by 2 technologies: 1) "Gadgets": Tiny Embedded or Mobile Devices –ubiquitous: in everything –e.g., successor to PDA, cell phone, wearable computers 2) Infrastructure to Support such Devices –e.g., successor to Big Fat Web Servers, Database Servers

Slide 3 Outline Motivation for the ISTORE project –AME: Availability, Maintainability, Evolutionary growth ISTORE’s research principles & techniques –Introspection –SON: Storage-Oriented Node In Cluster –RAIN: Redundant Array of Inexpensive Network switches –Benchmarks for AME A Case for SON vs. CPUs Applications, near term and future Conclusions and future work

Slide 4 Lampson: Systems Challenges Systems that work –Meeting their specs –Always available –Adapting to changing environment –Evolving while they run –Made from unreliable components –Growing without practical limit Credible simulations or analysis Writing good specs Testing Performance –Understanding when it doesn't matter "Computer Systems Research: Past and Future," keynote address, 17th SOSP, Dec. 1999. Butler Lampson, Microsoft

Slide 5 Hennessy: What Should the "New World" Focus Be? Availability –Both appliance & service Maintainability –Two functions: »Enhancing availability by preventing failure »Ease of SW and HW upgrades Scalability –Especially of service Cost –per device and per service transaction Performance –Remains important, but it's not SPECint "Back to the Future: Time to Return to Longstanding Problems in Computer Systems?" Keynote address, FCRC, May 1999. John Hennessy, Stanford

Slide 6 Is Maintenance Key? Rule of Thumb: cost of maintenance ≈ 10X cost of hardware Causes of crashes on Digital VAX systems between 1985 and 1993 [Murp95]

Slide 7 The real scalability problems: AME Availability –systems should continue to meet quality of service goals despite hardware and software failures Maintainability –systems should require only minimal ongoing human administration, regardless of scale or complexity: Today, cost of maintenance = 10X cost of purchase Evolutionary Growth –systems should evolve gracefully in terms of performance, maintainability, and availability as they are grown/upgraded/expanded These are problems at today’s scales, and will only get worse as systems grow

Slide 8 Principles for achieving AME (1) No single points of failure Redundancy everywhere Performance robustness is more important than peak performance –“performance robustness” implies that real-world performance is comparable to best-case performance Performance can be sacrificed for improvements in AME –resources should be dedicated to AME »compare: biological systems spend > 50% of resources on maintenance –can make up performance by scaling system

Slide 9 Principles for achieving AME (2) Introspection –reactive techniques to detect and adapt to failures, workload variations, and system evolution –proactive techniques to anticipate and avert problems before they happen

Slide 10 Hardware Techniques (1): SON SON: Storage Oriented Nodes (in clusters) Distribute processing with storage –If AME really important, provide resources! –Most storage servers limited by speed of CPUs!! –Amortize sheet metal, power, cooling, network for disk to add processor, memory, and a real network? –Embedded processors 2/3 perf, 1/10 cost, power? –Serial lines, switches also growing with Moore’s Law; less need today to centralize vs. bus oriented systems Advantages of cluster organization –Truly scalable architecture –Architecture that tolerates partial failure –Automatic hardware redundancy

Slide 11 Hardware techniques (2) Heavily instrumented hardware –sensors for temp, vibration, humidity, power, intrusion –helps detect environmental problems before they can affect system integrity Independent diagnostic processor on each node –provides remote control of power, remote console access to the node, selection of node boot code –collects, stores, processes environmental data for abnormalities –non-volatile “flight recorder” functionality –all diagnostic processors connected via independent diagnostic network
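A minimal sketch of the kind of monitoring loop such a diagnostic processor might run; read_sensors(), raise_alarm(), the sensor names, and the thresholds are hypothetical placeholders, not the ISTORE firmware interface.

```python
# Minimal sketch of a diagnostic-processor monitoring loop (illustrative only).
import collections
import time

FLIGHT_RECORDER = collections.deque(maxlen=10_000)   # rolling "flight recorder" log

THRESHOLDS = {"temp_c": 45.0, "vibration_g": 0.5, "humidity_pct": 80.0}

def read_sensors():
    """Placeholder: would poll the node's temperature/vibration/humidity sensors."""
    return {"temp_c": 38.2, "vibration_g": 0.1, "humidity_pct": 40.0}

def raise_alarm(name, value):
    """Placeholder: would report over the independent diagnostic network."""
    print(f"ALARM: {name} = {value}")

def monitor_once():
    sample = read_sensors()
    sample["t"] = time.time()
    FLIGHT_RECORDER.append(sample)            # keep history for post-mortem analysis
    for name, limit in THRESHOLDS.items():
        if sample[name] > limit:
            raise_alarm(name, sample[name])   # catch environmental problems early

if __name__ == "__main__":
    monitor_once()
```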

Slide 12 Hardware techniques (3) On-demand network partitioning/isolation –Internet applications must remain available despite failures of components, therefore can isolate a subset for preventative maintenance –Allows testing, repair of online system –Managed by diagnostic processor and network switches via diagnostic network

Slide 13 Hardware techniques (4) Built-in fault injection capabilities –Power control to individual node components –Injectable glitches into I/O and memory busses –Managed by diagnostic processor –Used for proactive hardware introspection »automated detection of flaky components »controlled testing of error-recovery mechanisms –Important for AME benchmarking (see next slide)
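A rough sketch of a fault-injection campaign in this spirit; power_off(), glitch_bus(), and check_recovery() are hypothetical stand-ins for the per-component power control and bus-glitch hardware described above.

```python
# Rough sketch of an automated fault-injection campaign (illustrative only).
import random

COMPONENTS = ["disk", "dram", "nic0", "nic1"]

def power_off(component):
    """Placeholder: would cut power to one component of the node."""
    print(f"injecting: power fault on {component}")

def glitch_bus(bus):
    """Placeholder: would drive a transient error onto an I/O or memory bus."""
    print(f"injecting: glitch on {bus} bus")

def check_recovery():
    """Placeholder: would verify that the OS/application error path fired."""
    return random.random() > 0.1

def run_campaign(trials=10):
    unhandled = 0
    for _ in range(trials):
        if random.random() < 0.5:
            power_off(random.choice(COMPONENTS))
        else:
            glitch_bus(random.choice(["memory", "io"]))
        if not check_recovery():
            unhandled += 1        # flaky component or missing error-recovery path
    print(f"{unhandled}/{trials} injections not handled")

if __name__ == "__main__":
    run_campaign()
```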

Slide 14 ISTORE-1 hardware platform 80-node x86-based cluster, 1.4TB storage –cluster nodes are plug-and-play, intelligent, network-attached storage "bricks" »a single field-replaceable unit to simplify maintenance –each node is a full x86 PC w/256MB DRAM, 18GB disk –more CPU than NAS; fewer disks/node than cluster ISTORE Chassis: 80 nodes, 8 per tray; 2 levels of switches (100 Mbit/s and 2 x 1 Gbit/s); environment monitoring: UPS, redundant PS, fans, heat and vibration sensors... Intelligent Disk "Brick": portable PC CPU (Pentium II/266 + DRAM), redundant NICs (4 100 Mb/s links), diagnostic processor, disk, half-height canister

Slide 15 ISTORE-1 Status 10 nodes manufactured; 60 boards fabbed, 25 to go Boots OS; Diagnostic Processor interface SW complete PCB backplane: not yet designed Finish 80-node system: Summer 2000

Slide 16 A glimpse into the future? System-on-a-chip enables computer, memory, redundant network interfaces without significantly increasing size of disk ISTORE HW in 5-7 years: –building block: 2006 MicroDrive integrated with IRAM »9GB disk, 50 MB/sec from disk »connected via crossbar switch –If low power, 10,000 nodes fit into one rack! O(10,000) scale is our ultimate design point

Slide 17 Hardware Technique (5): RAIN Switches for ISTORE-1 are a substantial fraction of space, power, and cost, for just 80 nodes! Redundant Array of Inexpensive Disks (RAID): replace large, expensive disks by many small, inexpensive disks, saving volume, power, cost Redundant Array of Inexpensive Network switches (RAIN): replace large, expensive switches by many small, inexpensive switches, saving volume, power, cost? –ISTORE-1: Replace 2 16-port 1-Gbit switches by a fat tree of 8 8-port switches, or 24 4-port switches?

Slide 18 “Hardware” techniques (6) Benchmarking –One reason for 1000X processor performance was ability to measure (vs. debate) which is better »e.g., Which most important to improve: clock rate, clocks per instruction, or instructions executed? –Need AME benchmarks “what gets measured gets done” “benchmarks shape a field” “quantification brings rigor”

Slide 19 Availability benchmark methodology Goal: quantify variation in QoS metrics as events occur that affect system availability Leverage existing performance benchmarks –to generate fair workloads –to measure & trace quality of service metrics Use fault injection to compromise system –hardware faults (disk, memory, network, power) –software faults (corrupt input, driver error returns) –maintenance events (repairs, SW/HW upgrades) Examine single-fault and multi-fault workloads –the availability analogues of performance micro- and macro-benchmarks
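A minimal sketch of the single-fault measurement loop described above: run a steady workload, sample a QoS metric every second, inject one fault partway through, and keep the QoS-vs-time trace; measure_qos(), inject_fault(), and the 0.7 degradation factor are illustrative placeholders, not the real benchmark harness.

```python
# Sketch of a single-fault availability-benchmark run (illustrative only).
import random

def measure_qos():
    """Placeholder for e.g. requests/sec reported by the workload generator."""
    return 100.0 + random.gauss(0, 2)

def inject_fault():
    """Placeholder for e.g. failing one disk of a software RAID volume."""
    print("fault injected")

def run_single_fault_benchmark(duration_s=60, fault_at_s=20, repair_s=25):
    trace = []
    for t in range(duration_s):
        if t == fault_at_s:
            inject_fault()
        degraded = fault_at_s <= t < fault_at_s + repair_s   # until reconstruction ends
        qos = measure_qos() * (0.7 if degraded else 1.0)
        trace.append((t, qos))
    return trace            # raw result: QoS as a function of time around the fault

if __name__ == "__main__":
    for t, qos in run_single_fault_benchmark():
        print(t, round(qos, 1))
```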

Slide 20 Benchmark Availability? Methodology for reporting results Results are most accessible graphically –plot change in QoS metrics over time –compare to "normal" behavior »99% confidence intervals calculated from no-fault runs
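A small sketch of deriving that "normal behavior" band: take several no-fault runs, compute a 99% confidence interval per time step under a normal approximation, and flag the points of a faulted run that fall outside it; the sample traces are made up.

```python
# Sketch of a 99% no-fault confidence band and deviation detection (illustrative).
import statistics

Z_99 = 2.576   # z value for a two-sided 99% interval under a normal approximation

def confidence_band(no_fault_runs):
    """no_fault_runs: list of equal-length QoS traces from fault-free runs."""
    band = []
    for samples in zip(*no_fault_runs):
        mean = statistics.mean(samples)
        sem = statistics.stdev(samples) / len(samples) ** 0.5
        band.append((mean - Z_99 * sem, mean + Z_99 * sem))
    return band

def deviations(faulted_run, band):
    """Time steps where the faulted run leaves the no-fault confidence band."""
    return [t for t, (q, (lo, hi)) in enumerate(zip(faulted_run, band))
            if not (lo <= q <= hi)]

if __name__ == "__main__":
    normals = [[100, 101, 99, 100], [99, 100, 100, 101], [101, 99, 100, 100]]
    faulted = [100, 70, 72, 100]
    print(deviations(faulted, confidence_band(normals)))   # -> [1, 2]
```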

Slide 21 Example single-fault result Compares Linux and Solaris reconstruction (graphs of QoS over time for each) –Linux: minimal performance impact but longer window of vulnerability to second fault –Solaris: large perf. impact but restores redundancy fast

Slide 22 Software techniques Fully-distributed, shared-nothing code –centralization breaks as systems scale up O(10000) –avoids single-point-of-failure front ends Redundant data storage –required for high availability, simplifies self-testing –replication at the level of application objects »application can control consistency policy »more opportunity for data placement optimization
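An illustrative sketch (not the ISTORE code) of replication at the level of application objects; the write quorum stands in for an application-chosen consistency policy, and the node names are made up.

```python
# Sketch of object-level replication with a pluggable consistency policy.
class ReplicatedStore:
    def __init__(self, nodes, replicas=3, write_quorum=2):
        self.nodes = nodes                    # node name -> {key: value} "brick"
        self.replicas = replicas
        self.write_quorum = write_quorum      # app-chosen consistency policy

    def _placement(self, key):
        names = sorted(self.nodes)            # app could optimize placement here
        start = hash(key) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.replicas)]

    def put(self, key, value, down=()):
        acks = 0
        for name in self._placement(key):
            if name in down:                  # unreachable replica; repaired later
                continue
            self.nodes[name][key] = value     # would be a network RPC to that brick
            acks += 1
        return acks >= self.write_quorum      # success as defined by the policy

    def get(self, key):
        for name in self._placement(key):     # read from any live replica
            if key in self.nodes[name]:
                return self.nodes[name][key]
        return None

if __name__ == "__main__":
    store = ReplicatedStore({f"node{i}": {} for i in range(5)})
    dead = store._placement("img42")[0]
    print(store.put("img42", b"...", down={dead}), store.get("img42"))
```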

Slide 23 Software techniques (2) “River” storage interfaces –NOW Sort experience: performance heterogeneity is the norm »e.g., disks: outer vs. inner track (1.5X), fragmentation »e.g., processors: load (1.5-5x) –So demand-driven delivery of data to apps »via distributed queues and graduated declustering »for apps that can handle unordered data delivery –Automatically adapts to variations in performance of producers and consumers –Also helps with evolutionary growth of cluster
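A toy sketch of demand-driven delivery in the River spirit: blocks are assumed replicated so either producer can serve any block, each producer pulls work at its own pace, and a slow disk or loaded node simply ends up serving fewer blocks; names and timings are illustrative.

```python
# Sketch of demand-driven, "River"-style data delivery (illustrative only).
import queue
import threading
import time

todo = queue.Queue()                  # block IDs still to be delivered
delivered = queue.Queue()             # (producer, block) records for the application

def producer(name, delay_s):
    """Models one storage brick; delay_s models its current speed/load."""
    while True:
        try:
            block = todo.get_nowait()
        except queue.Empty:
            return
        time.sleep(delay_s)           # "read" the block from local disk
        delivered.put((name, block))

if __name__ == "__main__":
    for b in range(60):
        todo.put(b)
    for name, delay in [("fast-brick", 0.005), ("slow-brick", 0.02)]:
        threading.Thread(target=producer, args=(name, delay), daemon=True).start()
    counts = {}
    for _ in range(60):
        name, _block = delivered.get()
        counts[name] = counts.get(name, 0) + 1
    print(counts)                     # the faster brick delivers most of the blocks
```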

Slide 24 Software techniques (3) Reactive introspection –Use statistical techniques to identify normal behavior and detect deviations from it –Policy-driven automatic adaptation to abnormal behavior once detected »initially, rely on human administrator to specify policy »eventually, system learns to solve problems on its own by experimenting on isolated subsets of the nodes one candidate: reinforcement learning
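One possible sketch of the statistical piece: learn a running mean and standard deviation for a metric and hand large deviations to a policy hook; the 3-sigma threshold, warmup period, and the adapt() action are assumptions, not ISTORE policy.

```python
# Sketch of reactive introspection via a learned baseline (illustrative only).
import math

class Baseline:
    """Running mean/variance (Welford's algorithm) for one observed metric."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)
    def stddev(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def adapt(metric, value):
    print(f"policy: {metric} abnormal at {value}; e.g. shed load or fail over")

def observe(metric, samples, threshold=3.0, warmup=30):
    base = Baseline()
    for x in samples:
        deviated = (base.n >= warmup and base.stddev() > 0
                    and abs(x - base.mean) > threshold * base.stddev())
        if deviated:
            adapt(metric, x)          # deviation from learned "normal" behavior
        else:
            base.update(x)            # only normal samples refine the baseline

if __name__ == "__main__":
    normal = [100 + (i % 5) for i in range(100)]
    observe("req_latency_ms", normal + [250] + normal)
```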

Slide 25 Software techniques (4) Proactive introspection –Continuous online self-testing of HW and SW »in deployed systems! »goal is to shake out “Heisenbugs” before they’re encountered in normal operation »needs data redundancy, node isolation, fault injection –Techniques: »fault injection: triggering hardware and software error handling paths to verify their integrity/existence »stress testing: push HW/SW to their limits »scrubbing: periodic restoration of potentially “decaying” hardware or software state self-scrubbing data structures (like MVS) ECC scrubbing for disks and memory
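A minimal sketch of scrubbing as described above: periodically re-read each block, compare its checksum with the one stored at write time, and repair from a replica on mismatch; the in-memory block store and helper functions are hypothetical.

```python
# Sketch of proactive data scrubbing against "decaying" state (illustrative only).
import zlib

stored_checksums = {}                      # block_id -> checksum written at store time
blocks = {0: b"alpha", 1: b"beta", 2: b"gamma"}
replicas = dict(blocks)                    # healthy copies held on another node

def checksum(data):
    return zlib.crc32(data)

def read_block(bid):     return blocks[bid]
def read_replica(bid):   return replicas[bid]
def rewrite_block(bid, data):
    blocks[bid] = data

def scrub_pass():
    repaired = 0
    for bid in list(blocks):
        if checksum(read_block(bid)) != stored_checksums[bid]:
            rewrite_block(bid, read_replica(bid))   # restore from redundant copy
            repaired += 1
    return repaired

if __name__ == "__main__":
    for bid, data in blocks.items():
        stored_checksums[bid] = checksum(data)
    blocks[1] = b"bxta"                    # simulate silent corruption / bit rot
    print("repaired:", scrub_pass())       # -> repaired: 1
    assert blocks[1] == b"beta"
```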

Slide 26 A Case for Storage Oriented Nodes Advantages of SON: 1 v. 2 Networks Physical Repair/Maintenance Die size vs. Clock rate, Complexity Silicon Die Cost ~ Area^4 Cooling ~ (Watts/chip)^N Size, Power Cost of System v. Cost of Disks Cluster advantages: dependability, scalability Advantages of CPU: Apps don't parallelize, so 1 very fast CPU much better in practice than N fast CPUs Leverage Desktop MPU investment Software Maintenance: 1 Large system with several CPUs easier to install SW than several small computers
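Spelling out the power laws named on this slide as formulas; the 3x area ratio is just an illustrative number, not from the slide.

```latex
\text{Die cost} \;\propto\; \text{Area}^{4}, \qquad
\text{Cooling cost} \;\propto\; (\text{Watts/chip})^{N}
\;\;\Rightarrow\;\;
\frac{\text{Cost}_{\text{big}}}{\text{Cost}_{\text{small}}}
  = \left(\frac{A_{\text{big}}}{A_{\text{small}}}\right)^{4},
\quad \text{e.g. a die with } 3\times \text{ the area costs } 3^{4} = 81\times \text{ as much silicon.}
```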

Slide 27 SON: 1 vs. 2 networks Current computers all have LAN + Disk interconnect (SCSI, FCAL) –LAN is improving fastest, most investment, most features –SCSI, FCAL poor network features, improving slowly, relatively expensive for switches, bandwidth –Two sets of cables, wiring? Why not single network based on best HW/SW technology?

Slide 28 SON: Physical Repair Heterogeneous system with server components (CPU, backplane, memory cards, interface cards, power supplies, ...) and disk array components (disks, cables, controllers, array controllers, power supplies, ...) –Keep all components available somewhere as FRUs Small number of module types based on a hot-pluggable interconnect (LAN), with Field Replaceable Units: node, power supplies, network cables –Replace a node (disk, CPU, memory, NI) if any part fails –Preventative maintenance via isolation, fault insertion

Slide 29 SON: Complexity v. Perf Complexity increase: –HP PA-8500: issues 4 instructions per clock cycle, 56-instruction out-of-order execution window, 4Kbit branch predictor, 9-stage pipeline, 512 KB I cache, 1024 KB D cache (> 80M transistors just in caches) –Intel SA-110: 16 KB I$, 16 KB D$, 1 instruction per cycle, in-order execution, no branch prediction, 5-stage pipeline Complexity costs in development time, development power, die size, cost –550 MHz HP PA-8500: … mm², 0.25 micron/4M, $330, 60 Watts –233 MHz Intel SA-110: … mm², 0.35 micron/3M, $18, 0.4 Watts

Slide 30 Cost of System v. Disks Examples show the cost of the way we build current systems (CPU, 2 networks, many disks/CPU, …): Date / Cost / Disks / Disks per CPU –NCR WorldMark: 10/97, $8.3M, 1312 disks, ~10 disks/CPU –Sun Enterprise 10k: 3/98, $5.2M, 668 disks –Sun Enterprise 10k: 9/99, $6.2M, 1732 disks –IBM Netfinity Cluster: 7/00, $7.8M, 7040 disks, ~55 disks/CPU And these Data Base apps are CPU bound!!! Also potential savings in space, power –ISTORE-1: with big switches, it's 2-3 racks for 80 CPUs/disks (3/8 rack unit per CPU/disk themselves) –ISTORE-2: 4X density improvement?

Slide 31 SON: Cluster Advantages Truly scalable architecture Architecture that tolerates partial failure Automatic hardware redundancy

Slide 32 SON: Cooling cost v. Peak Power What is relationship? –Feet per second of air flow? –Packaging costs? –Fan failure?

Slide 33 The Case for CPU But: Assume apps that parallelize: WWW services, Vision, Graphics Leverage investment in Embedded MPU, System on a Chip Improved maintenance is a research target: e.g., many disks lower reliability, but RAID is better Advantages of CPU: Apps don't parallelize, so N very fast CPUs much better in practice than 2N fast CPUs Leverage Desktop MPU investment Software Installation: 1 large system with several CPUs easier to keep SW up-to-date than several small computers

Slide 34 Initial Applications ISTORE is not one super-system that demonstrates all these techniques! –Initially provide middleware, library to support AME goals Initial application targets –cluster web/e-mail servers »self-scrubbing data structures, online self-testing »statistical identification of normal behavior –information retrieval for multimedia data »self-scrubbing data structures, structuring performance-robust distributed computation

Slide 35 ISTORE Successor does Human Quality Vision? Malik at UCB thinks vision research at critical juncture; have about right algorithms, awaiting faster computers to test them 10,000 nodes with System-On-A-Chip + Microdrive + network –1 to 10 GFLOPS/node => 10,000 to 100,000 GFLOPS –High Bandwidth Network –1 to 10 GB of Disk Storage per Node => can replicate images per node – Need AME advances to keep 10,000 nodes useful

Slide 36 ISTORE Continued Funding New NSF Information Technology Research, larger funding (>$500K/yr): 1400 letters, 920 preproposals, 134 full proposals encouraged, 240 full proposals submitted, 60 funded (pending approval) Rumor: We're in the top 5%

Slide 37 Conclusions: ISTORE Availability, Maintainability, and Evolutionary growth are key challenges for server systems –more important even than performance ISTORE is investigating ways to bring AME to large-scale, storage-intensive servers –via clusters of network-attached, computationally- enhanced storage nodes running distributed code –via hardware and software introspection –we are currently performing application studies to investigate and compare techniques Availability benchmarks a powerful tool? –revealed undocumented design decisions affecting SW RAID availability on Linux and Windows 2000 Exciting applications for large systems that can be maintained

Slide 38 Backup Slides

Slide 39 State of the art Cluster: NCR WorldMark (diagram: BYNET switched network connecting 32 nodes; each node has 4 Procs + Mem on bus bridges with strings of SCSI disks) TPC-D, TD V2, 10/97 –32 nodes x 4 CPUs (… MHz), 1 GB DRAM, 41 disks per node (128 cpus, 32 GB, 1312 disks, 5.4 TB) –CPUs, DRAM, encl., boards, power: $5.3M –Disks+cntlr: $2.2M –Disk shelves: $0.7M –Cables: $0.1M –HW total: $8.3M

Slide 40 State of the Art SMP: Sun E10000 (diagram: data crossbar switch with 4 address buses; 16 boards, each with Procs + Mem on an Xbar bridge, and strings of SCSI disks) TPC-D, Oracle 8, 3/98 –SMP, … MHz CPUs, 64GB DRAM, 668 disks (5.5TB) –Disks, shelf: $2.1M –Boards, encl.: $1.2M –CPUs: $0.9M –DRAM: $0.8M –Power: $0.1M –Cables, I/O: $0.1M –HW total: $5.2M

Slide 41 State of the Art SMP: Sun E10000 (diagram: data crossbar switch with 4 address buses; 16 boards, each with Procs + Mem on an Xbar bridge, and strings of FC-AL disks) TPC-C, Oracle 8i, 9/99 –SMP, … MHz CPUs, 64GB DRAM, 1732 disks (15.5TB) –Disks, shelf: $3.6M –Boards, encl.: $0.9M –CPUs: $0.9M –DRAM: $0.6M –Power: $0.1M –Cables, I/O: $0.1M –HW total: $6.2M

Slide 42 State of the art Cluster: IBM Netfinity (diagram: Giganet 1-Gbit switched Ethernet connecting 32 nodes; each node has 4 Procs + Mem on PCI bus bridges with strings of SCSI disks) TPC-C, DB2, 7/00 –32 nodes x 4 CPUs (… MHz), 0.5 GB DRAM, 220 disks per node (128 cpus, 16 GB, 7040 disks, 116 TB) –CPUs: $0.6M –Caches: $0.5M –DRAM: $0.6M –Disks: $3.8M –Disk shelves: $1.6M –Disk cntrl.: $0.4M –Racks: $0.1M –Cables: $0.1M –Switches: $0.1M –HW total: $7.8M

Slide 43 Attacking Computer Vision Analogy: Computer Vision Recognition in 2000 is like Computer Speech Recognition in 1985 –Pre-1985, community searching for good algorithms: classic AI vs. statistics? –By 1985, reached consensus on statistics –Field focuses and makes progress, uses special hardware –Systems become fast enough that one can train systems rather than preload information, which accelerates progress –By 1995, speech recognition systems starting to deploy –By 2000, widely used, available on PCs

Slide 44 Computer Vision at Berkeley Jitendra Malik believes he has an approach that is very promising 2-step process: 1) Segmentation: Divide image into regions of coherent color, texture and motion 2) Recognition: combine regions and search image database to find a match Algorithms for 1) work well, just slowly (300 seconds per image using a PC) Algorithms for 2) being tested this summer using hundreds of PCs; will determine accuracy

Slide 45 Human Quality Computer Vision Suppose the Algorithms Work: What would it take to match Human Vision? At 30 images per second: segmentation –Convolution and Vector-Matrix Multiply of Sparse Matrices (10,000 x 10,000, 10% nonzero/row) –32-bit Floating Point –300 seconds on a PC (assuming 333 MFLOPS) => 100 GFLOPs/image –30 Hz => 3000 GFLOPS machine to do segmentation
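The arithmetic behind the segmentation estimate, using only the figures on the slide:

```latex
300\,\text{s} \times 333\,\text{MFLOPS} \approx 10^{11}\ \text{FLOPs}
  = 100\ \text{GFLOPs per image};
\qquad
30\,\tfrac{\text{images}}{\text{s}} \times 100\,\tfrac{\text{GFLOPs}}{\text{image}}
  = 3000\ \text{GFLOPS}.
```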

Slide 46 Human Quality Computer Vision At 1 image/second: object recognition –Humans can remember 10,000 to 100,000 objects per category (e.g., 10k faces, 10k Chinese characters, high school vocabulary of 50k words, ...) –To recognize a 3D object, need ~10 2D views –100 x 100 x 8 bits (or fewer) per view => 10,000 x 10 x 100 x 100 bytes, or 10^9 bytes –Pruning using color and texture and by organizing shapes into an index reduces shape matches to 1000 –Compare 1000 candidate merged regions with 1000 candidate object images –If 10 hours on a PC (333 MFLOPS) => ~12,000 GFLOPS –Use storage to reduce computation?
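The corresponding arithmetic for recognition, again from the slide's own figures; the ~12,000 GFLOPS value is derived from them, not quoted:

```latex
10^{4}\ \text{objects} \times 10\,\tfrac{\text{views}}{\text{object}}
  \times (100 \times 100)\,\tfrac{\text{bytes}}{\text{view}} = 10^{9}\ \text{bytes};
\qquad
10\,\text{h} \times 3600\,\tfrac{\text{s}}{\text{h}} \times 333\,\text{MFLOPS}
  \approx 1.2\times 10^{13}\ \text{FLOPs}
  \;\Rightarrow\; \approx 12{,}000\ \text{GFLOPS to finish in } 1\,\text{s}.
```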