1 Berkeley RAD Lab Technical Overview
Armando Fox, Randy Katz, Michael Jordan, Dave Patterson, Scott Shenker, Ion Stoica
March 2006

2 RAD Lab
The 5-Year Vision: a single person can go from vision to a next-generation IT service ("the Fortune 1 million")
- E.g., over a long holiday weekend in 1995, Pierre Omidyar created eBay v1.0
The Challenges:
- Develop the new service: today, easy prototyping ≠ easy operations
- Assess: measure, test, and debug the new service in a realistic distributed environment: how will it scale?
- Deploy: scale up a new, geographically distributed service
- Operate a service that could quickly scale to millions of users with <1 operator
The Vehicle: an interdisciplinary center creates the core technical competency to demo 10X-100X improvements
- Researchers are leaders in machine learning, networking, and systems
- Industrial participants: leading companies in HW, systems SW, and online services
"RAD Lab" = Reliable, Adaptive, Distributed systems

3 Founding the RAD Lab
Looked for 3 to 4 founding companies to fund 5 years at $0.5M/year each
- Google, Microsoft, and Sun Microsystems signed up
- Affiliate companies ($0.1M/yr): HP, IBM, others
Founding-company model:
- Prefer founding-partner technology in prototypes
- Designate employees to act as consultants
- Put IP in the public domain
- 3-year project review by founding partners
Budget: $2.5-$3M/yr, ~65% industry, ~25% state, ~10% federal
- 30 grad students + 10 undergrads + 6 faculty + 2 staff

4 Steps vs. Process
Steps: traditional, static handoff model, N groups
Process: support DADO evolution (Develop, Assess, Deploy, Operate), 1 group
[Diagram: the DADO cycle repeating: Develop → Assess → Deploy → Operate → Develop → ...]

5 Key Ingredients: Visualization & Statistical Machine Learning (SML)
Too much data for a human to troubleshoot manually
- E.g., Amazon: tens of metrics, 100s-1000s of machines
Visualization exploits human visual processing
SML finds patterns in large quantities of data

6 Operations example: combining visualization & machine learning
Idea: use end-user behavior as a "failure detector"
Approach: combine visualization with SML analysis so operators see anomalies too
Experiment: does the distribution of hits to various pages match the "historical" distribution?
- Each minute, compare hit counts of the top N pages to hit counts over the last 6 hours, using Bayesian networks and a χ² test, on real Ebates.com data
To learn more, see "Combining Visualization and Statistical Analysis to Improve Operator Confidence and Efficiency for Failure Detection and Localization," in Proc. 2nd IEEE Int'l Conf. on Autonomic Computing, June 2005, by Peter Bodik, Greg Friedman, Lukas Biewald, Helen Levine (Ebates.com), George Candea, Kayur Patel, Gilman Tolle, Jon Hui, Armando Fox, Michael I. Jordan, and David Patterson.
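As a rough illustration, here is a minimal sketch of the hit-distribution check described above: compare the last minute's hit counts on the top-N pages against the proportions observed over the previous 6 hours with a chi-squared test (the Bayesian-network half of the analysis is omitted). The function name, page data, and significance threshold are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of the slide's check: compare the last minute's
# hit counts on the top-N pages against the distribution seen over the
# previous 6 hours with a chi-squared test. Page data and the 0.001
# threshold are illustrative assumptions, not values from the paper.
from scipy.stats import chisquare

def hits_anomalous(recent_hits, historical_hits, alpha=0.001):
    """Flag the current minute if its hit distribution over the top
    pages deviates significantly from the historical distribution."""
    total_recent = sum(recent_hits)
    total_hist = sum(historical_hits)
    # Expected counts: historical proportions scaled to the recent volume
    expected = [h / total_hist * total_recent for h in historical_hits]
    _, p_value = chisquare(f_obs=recent_hits, f_exp=expected)
    return p_value < alpha

# Example: hit counts for the top 3 pages
historical = [6000, 3000, 1000]   # previous 6 hours
recent = [20, 10, 70]             # last minute: page 3 suddenly hot
print(hits_anomalous(recent, historical))   # True -> possible failure
```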

7 Visualization
Visualizing user behavior is completely different from typical operator tools, which usually animate the architecture
Win trust in SLT by leveraging operator expertise and human visual pattern recognition
[Figure: hit counts for the top 40 pages plotted against time, in 5-minute intervals]
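A hedged sketch of how such a view might be produced: a heat map of per-page hit counts over time, in which a page whose traffic suddenly changes stands out to the eye. The synthetic data, the injected anomaly, and the styling are all assumptions for illustration.

```python
# Hypothetical heat-map view of hit counts: rows are the top 40 pages,
# columns are 5-minute intervals over 6 hours. All data is synthetic.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
hits = rng.poisson(lam=50, size=(40, 72))   # 40 pages x 72 five-minute bins
hits[7, 50:] = 5                            # one page suddenly goes quiet

plt.imshow(hits, aspect="auto", cmap="hot")
plt.xlabel("Time (5-minute intervals)")
plt.ylabel("Top 40 pages")
plt.colorbar(label="Hits per interval")
plt.title("Per-page hit counts (synthetic)")
plt.show()
```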

8 Build Academic MPP from FPGAs
"Research Accelerator for Multiple Processors" (RAMP)
As ≈25 CPUs will fit in a Field Programmable Gate Array (FPGA), build a 1000-CPU system from ≈40 FPGAs?
- 32-bit simple "soft core" RISC at 150 MHz in 2004 (Virtex-II)
- FPGA generations every 1.5 yrs: ≈2X CPUs, ≈1.2X clock rate
HW research community does the logic design ("gate shareware") to create an out-of-the-box MPP
- E.g., 1000-processor, standard-ISA binary-compatible, 64-bit, cache-coherent system at ≈100 MHz/CPU in 2007
RAMPants: Arvind (MIT), Krste Asanović (MIT), Derek Chiou (Texas), James Hoe (CMU), Christos Kozyrakis (Stanford), Shih-Lien Lu (Intel), Mark Oskin (Washington), David Patterson (Berkeley, Co-PI), Jan Rabaey (Berkeley), and John Wawrzynek (Berkeley, PI)
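To make the scaling claims concrete, a back-of-the-envelope projection under the slide's stated assumptions (≈25 cores per FPGA at 150 MHz in 2004, a new FPGA generation every 1.5 years bringing ≈2X cores and ≈1.2X clock). The heavier 64-bit, cache-coherent target core explains the slide's more conservative 100 MHz/CPU figure for 2007.

```python
# Projection from the slide's assumptions: ~25 soft cores per FPGA at
# 150 MHz in 2004, with each 1.5-year FPGA generation giving ~2x the
# cores and ~1.2x the clock. Purely illustrative arithmetic.
import math

def project(year, base_year=2004, base_cores=25, base_clock_mhz=150.0):
    gens = int((year - base_year) / 1.5)
    return base_cores * 2 ** gens, base_clock_mhz * 1.2 ** gens

for year in (2004, 2007):
    cores, clock = project(year)
    fpgas = math.ceil(1000 / cores)
    print(f"{year}: ~{cores} cores/FPGA at ~{clock:.0f} MHz -> "
          f"~{fpgas} FPGAs for a 1000-CPU system")
# 2004: ~25 cores/FPGA at ~150 MHz -> ~40 FPGAs for a 1000-CPU system
# 2007: ~100 cores/FPGA at ~216 MHz -> ~10 FPGAs for a 1000-CPU system
```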

9 Why Is RAMP Good for Research MPP?

                                SMP              Cluster          Simulate           RAMP
Scalability (1k CPUs)           C                A                A                  A
Cost (1k CPUs)                  F ($40M)         C ($2-3M)        A+ ($0M)           A ($0.1-0.2M)
Cost of ownership               A                D                A                  A
Power/Space (kilowatts, racks)  D (120 kW, 12)   D (120 kW, 12)   A+ (0.1 kW, 0.1)   A (1.5 kW, 0.3)
Community                       D                A                A                  A
Observability                   D                C                A+                 A+
Reproducibility                 B                D                A+                 A+
Reconfigurability               D                C                A+                 A+
Credibility                     A+               A+               F                  B+/A-
Performance (clock)             A (2 GHz)        A (3 GHz)        F (0 GHz)          C (0.1-0.2 GHz)
GPA                             C                B-               B                  A-

10 RAMP 1 Hardware
BEE2: Berkeley Emulation Engine 2, by John Wawrzynek and Bob Brodersen with students Chen Chang and Pierre Droz
Board: 5 Virtex-II FPGAs, 18 banks of DDR2-400 memory, 20 10GigE connectors; completed Dec. 2004 (14x17 inch, 22-layer PCB)
Box: 8 compute modules in an 8U rack-mount chassis
1.5 W/computer, 5 cu. in./computer, $100/computer
1000 CPUs: ≈1.5 kW, ≈¼ rack, ≈$100,000
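The slide's system totals follow directly from the per-computer figures; a trivial check:

```python
# Scale the per-emulated-computer figures on the slide to 1000 CPUs.
watts, dollars = 1.5, 100                  # per computer (from the slide)
n = 1000
print(f"{n * watts / 1000:.1f} kW, ${n * dollars:,}")   # 1.5 kW, $100,000
```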

11 RAMP in RADS: Internet in a Box
Building blocks also → distributed computing
RAMP vs. clusters (Emulab, PlanetLab):
- Scale: RAMP O(1000) vs. clusters O(100)
- Private use: $100k → every group has one
- Develop/debug: reproducibility, observability
- Flexibility: modify modules (router, SMP, OS)
Explore via repeatable experiments while varying parameters and configurations, vs. observations on a single (aging) cluster that is often idiosyncratic

12 Planned Apps & Courses
ResearchIndex: reputation & ranking system for CS research papers and digests
- Seeking suggestions/collaboration on this & other possible apps, to get experience with Develop & Deploy
- Seeking datasets corresponding to larger (real) apps as well, to increase experience with Assess & Operate
Courses:
- CS 294, Fall 06: MS/PhD-level projects contributing to RAD Lab infrastructure in all areas (DADO)
- CS 294, Fall 07: prototype services to run in "production mode" on the RAD Lab platform; improve the platform/environment based on lessons from deployment
- CS 294, Fall 08: "Web 2.0"-style services on the RAD Lab platform (e.g., joint with the Haas Business School)
- Undergrad courses, >2008: software engineering assignments are network services running on the RADS platform

13 RAD Lab: Interdisciplinary Center for Reliable, Adaptive, Distributed Systems
Capability (desired): 1 person can invent & run the next-gen IT service
- Develop using primitives that enable functions (MapReduce) and services (Craigslist); see the sketch below
- Assess using deterministic replay and statistical debugging
- Deploy via "Internet-in-a-Box" FPGAs
- Operate SLT-friendly, control-theory-friendly architectures and operator-centric visualization and analysis tools
Base technology: server hardware, system software, middleware, networking
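For a sense of the kind of primitive the Develop bullet alludes to, a minimal MapReduce-style word count sketched in plain Python; the function names are illustrative assumptions, not the RAD Lab toolkit's actual API.

```python
# Minimal MapReduce-style word count: a map phase emits (word, 1) pairs
# and a reduce phase sums them. Illustrative only; not RAD Lab code.
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in doc.split()]

def reduce_phase(pairs):
    """Sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["reliable adaptive distributed", "adaptive distributed systems"]
print(reduce_phase(chain.from_iterable(map_phase(d) for d in docs)))
# {'reliable': 1, 'adaptive': 2, 'distributed': 2, 'systems': 1}
```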

14 Industrial collaboration
Historically a UCB strength
Industrial research labs are ideal partners:
- High-quality research staff => symmetric collaboration
- Ties to product groups => work on relevant problems
- Access to real data sets => realistic evaluation of prototypes
Goal: ongoing transfer of software, technology & people
- "BSD License" for RAD Lab technology, intended to ease adoption by industrial partners
RAD Lab targets: SML & control theory, visualization, development of service-oriented architectures & apps

15 RAD Lab Timeline
2005: Launch RAD Lab
2006: Collect workloads; Internet in a Box
2007: SLT/CT distributed architectures, iBoxes, annotation layer, class testing
2008: Development toolkit 1.0, tuple space, class testing; mid-project review
2009: RAD Lab software suite 1.0, class testing
2010: End-of-project party