Revolution on Demand: Push-button Specialized Supercomputers
Chita Das, Yale Patt, Kevin Skadron, Karin Strauss, and Steven Swanson

Man on the Moon Goals
  Push-button, drop-in co-processor for every application
    o 10,000X speedup over general-purpose for $10K
  Push-button, application-specific reconfiguration
    o 1000X speedup in a commodity node
  Stay within today's power envelope, chip area, system form factor
  Scalable
    o All form factors -- handheld to datacenter
    o Scale out
    o Scale forward
  Man on Neptune goal
    o Custom data center for every application
    o Reconfigure entire system

Scenario
  Go to website, give algorithm, data size, target perf, physical constraints (form factor, power, cooling), push button
  FedEx delivers the next day
  Or, for cloud service, you just get an IP address
  (Mostly) transparent software layer

Problem Statement
  Many applications still require orders-of-magnitude improvement in performance, perf/W, or perf/$ to enable new capabilities
    o Mobile: Speech recognition, language translation, data analysis, diagnostics (think Star Trek tricorder), situational awareness
    o Desktop/Notebook: Still the sweet spot for programmers and users: Video processing, rich UIs/VR, interactive problem solving/data analysis (e.g., MATLAB)
    o Data center: Large-scale problems
      - Computational fluid dynamics
      - Drug discovery
      - Simulation and modeling (weather, geology, tsunami prediction, multi-agent modeling, etc.)
      - Mining and machine learning (graph analytics, EXAMPLES...)

Problem statement, cont.
  Performance within a processing node matters
    o Sweet spots: mobile, single computer, single rack, small data center
    o Infrastructure, utility costs
    o Compute vs. communication balance
  Specialization provides 10X-10,000X improvements in performance as well as perf/W, perf/$
    o Specialized computational units, memory hierarchies, interconnects, etc.
    o Examples: SIMD/MIMD/task pipelines; custom operations (FFT, IDCT, transcendentals, etc.); support for fine-grained synch...
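
A back-of-the-envelope check on the 10X-10,000X figure, sketched in Python. It uses Amdahl's law, which the slides do not mention; the function names are illustrative, not from the talk. The point is only that the whole-application speedup depends on how much of the runtime the specialized unit actually covers.

```python
def overall_speedup(accelerated_fraction: float, unit_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only a fraction
    of the original runtime runs on a unit that is unit_speedup times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if unit_speedup <= 0.0:
        raise ValueError("unit_speedup must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / unit_speedup)


def required_fraction(target_speedup: float, unit_speedup: float) -> float:
    """Fraction of the original runtime that must be accelerated to reach
    target_speedup, obtained by solving Amdahl's law for the fraction."""
    if unit_speedup <= 1.0 or not 1.0 <= target_speedup <= unit_speedup:
        raise ValueError("need 1 <= target_speedup <= unit_speedup and unit_speedup > 1")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / unit_speedup)


if __name__ == "__main__":
    # A 10,000X unit that covers 99% of the runtime yields under 100X overall.
    print(overall_speedup(0.99, 10_000.0))
    # Coverage needed for 1000X overall with a 10,000X unit (about 99.91%).
    print(required_fraction(1_000.0, 10_000.0))
```

Under this model, the remaining general-purpose part of the application quickly becomes the limit, which is one reason to care about system software and interconnects and not only the accelerator.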

Problem statement, cont.
  Specialization will also be necessary as general-purpose scaling stops
    o Power delivery, cooling, pin bandwidth all hitting walls due to fundamental physical limits
    o Beyond Moore's Law

Some Key Requirements
  Programs should be portable across diverse hardware
    o Same code should be portable across platforms and generations
      - Separate correctness from performance
    o System software (compiler/runtime/OS) must automatically map to specific resources
  Programmers should be able to drill down to optimize for specific hardware
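
One possible reading of these requirements, as a minimal Python sketch. Everything here is hypothetical and not from the talk: the registry, the `implement` decorator, `dispatch`, and the backend names "reference", "simd", and "fpga". A portable reference implementation defines what correct means, tuned versions are registered per backend, and a small runtime layer maps each call to whatever hardware is present.

```python
from typing import Callable, Dict, Sequence

# Hypothetical registry: kernel name -> {backend name -> implementation}.
_IMPLS: Dict[str, Dict[str, Callable]] = {}


def implement(kernel: str, backend: str):
    """Register the decorated function as `kernel` for `backend`."""
    def register(fn: Callable) -> Callable:
        _IMPLS.setdefault(kernel, {})[backend] = fn
        return fn
    return register


def dispatch(kernel: str, available: Sequence[str]) -> Callable:
    """Return the implementation for the first available backend that has one
    (backends are listed in preference order); otherwise fall back to the
    portable reference version."""
    impls = _IMPLS[kernel]
    for backend in available:
        if backend in impls:
            return impls[backend]
    return impls["reference"]


@implement("dot", "reference")
def dot_reference(a, b):
    # Portable definition: this fixes the meaning of the kernel.
    return sum(x * y for x, y in zip(a, b))


@implement("dot", "simd")
def dot_simd(a, b):
    # Stand-in for a hand-tuned version: four elements per step, then a tail.
    # Assumes len(a) == len(b).
    total = 0
    n = len(a) - len(a) % 4
    for i in range(0, n, 4):
        total += a[i] * b[i] + a[i + 1] * b[i + 1] + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3]
    for i in range(n, len(a)):
        total += a[i] * b[i]
    return total


if __name__ == "__main__":
    a, b = [1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]
    fast = dispatch("dot", ["fpga", "simd"])   # no "fpga" version exists, so "simd" is chosen
    assert fast(a, b) == dot_reference(a, b)   # tuned version is checked against the reference
```

Application code calls `dispatch` and never names a device, while a programmer who wants to drill down adds another `@implement` entry for specific hardware.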

Research Questions
  How do we pick which heterogeneous resources to use?
    o How to identify application-specific needs
    o How to generate appropriate specialized hardware units
  How do we connect them?
  How do we abstract it to the software?
    o Interface design/abstractions?
    o Programming models?
  How do we build it in a cost-effective manner?
    o Huge design automation and VLSI challenges
  How do we automate all this?

Architecting Components of Heterogeneous Systems
  Components must have
    o Clean interfaces
    o Scalability
    o Reusability
    o Composability
  A menu of components
    o Form factors
    o Memory interfaces
  Select from a menu of components and press go
    o Humans are involved only at the highest level of design
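
A toy illustration of "select from a menu and press go", in Python. The menu entries, their numbers, and the `press_go` function are invented for illustration. The selection model is also an assumption: system performance is taken to be the slowest chosen component, and the search is brute force over one choice per category. A real flow would involve far richer models and design automation.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    perf: float       # relative throughput, arbitrary units (made-up values)
    power_w: float    # power draw in watts (made-up values)
    cost_usd: float   # unit cost in dollars (made-up values)


def press_go(menu: Dict[str, List[Component]],
             max_power_w: float,
             max_cost_usd: float) -> Optional[Tuple[Component, ...]]:
    """Pick one component per category. Among combinations that fit the power
    and cost limits, return the one with the highest bottleneck performance,
    breaking ties by lower cost. Return None if nothing fits."""
    best, best_key = None, None
    for combo in product(*menu.values()):
        power = sum(c.power_w for c in combo)
        cost = sum(c.cost_usd for c in combo)
        if power > max_power_w or cost > max_cost_usd:
            continue
        key = (min(c.perf for c in combo), -cost)
        if best_key is None or key > best_key:
            best, best_key = combo, key
    return best


# Hypothetical menu, loosely following the categories named in the slides.
MENU = {
    "compute": [Component("cpu-core", 1, 10, 200),
                Component("asip-core", 20, 15, 1500),
                Component("asic-fft", 100, 5, 6000)],
    "memory": [Component("ddr-generic", 10, 5, 100),
               Component("streaming-mem", 50, 8, 800)],
    "interconnect": [Component("bus", 5, 2, 50),
                     Component("custom-noc", 80, 6, 900)],
}

if __name__ == "__main__":
    design = press_go(MENU, max_power_w=40, max_cost_usd=10_000)
    print([c.name for c in design] if design else "no feasible design")
```

The human input is limited to the constraints passed to `press_go`, which mirrors the idea that people are involved only at the highest level of design.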

Opportunities for specialization
  Computing resources
    o ISA specialization
    o ASIC/ASIP cores
    o Reconfigurable logic blocks
    o Specialized serial/CPU/control cores
    o Non-silicon computation
  Communication resources
    o Specialized NoC, NIC
  Storage resources
    o Specialized/programmable memory interfaces
    o Access pattern-specific memory organization
      - Streaming, scatter/gather, etc.
  System topology
  Software! Software! Software!
    o Programming models and system software

Abstracting Heterogeneity to Software
  Abstractions are key to integrating heterogeneity into the larger system
    o Language?
    o Hardware capability descriptions?
  Well-defined boundaries
    o Minimize changes needed in upper layers to leverage heterogeneity
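
The slide leaves the form of a hardware capability description open. As one possible shape only, here is a minimal Python sketch in which upper layers ask what a unit can do instead of naming the unit. The `Capability` type, its fields, the `units_for` query, and the example entries are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Capability:
    unit: str                         # hypothetical unit name
    operations: FrozenSet[str]        # operations the unit accelerates
    access_patterns: FrozenSet[str]   # memory access patterns it handles well


def units_for(caps: List[Capability], operation: str, access_pattern: str) -> List[str]:
    """Return the names of units whose description covers both the requested
    operation and the requested memory access pattern, in listing order."""
    return [c.unit for c in caps
            if operation in c.operations and access_pattern in c.access_patterns]


# Example descriptions (invented): a narrow accelerator and a general core.
CAPS = [
    Capability("fft-engine", frozenset({"fft"}), frozenset({"streaming"})),
    Capability("general-core", frozenset({"fft", "sort"}),
               frozenset({"streaming", "scatter/gather"})),
]

if __name__ == "__main__":
    print(units_for(CAPS, "fft", "streaming"))
    print(units_for(CAPS, "sort", "scatter/gather"))
```

Keeping the query at this level is one way to keep the boundary well defined: adding a new unit means adding a description, not changing the layers above.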

Driver applications: There's an App (and chip) for that!
  Large scale
    o Climate modeling
    o Multiscale modeling of the human body
      - From proteins to gross mechanics (muscles, bones)
    o Genomics and drug discovery
    o Graph analytics
    o Video analytics
    o Multi-agent simulations
  Portable
    o A supercomputing laptop for signal intelligence
  Embedded
    o "Tricorder"...

Approach
  Select some specific, strategic application drivers
  Develop specialized accelerators for those applications
    o Emphasize design reuse
    o System-level building blocks
    o Distill HW/SW co-design principles
  Develop new SW abstractions, maybe DSLs
  Build and deploy in a small-scale "real" system
    o Custom supercomputers-in-a-rack
    o Maybe Blue Gene-style cards in backplanes
  Iterate!

Must Work with Real Applications (and Developers)
  Must work with real applications
    o Continuous feedback loop between our research and outcomes for strategic applications
    o Gain experience in exploiting heterogeneity throughout the system
    o Enable specialization in a wider range of applications
      - Reduce the design automation (DA) cycle for specialized hardware
    o Understand the cost of abstraction (improve efficiency)
  Must work with real developers
  Translational research is how we build credibility
  We want the hardware to be transparent, not the benefits!

Other Challenges
  Design Automation
    o Automatically synthesizing specialized units
    o There's a wealth of technology to leverage
  Manufacturing
    o NREs (non-recurring engineering costs) are large
    o They need to become almost zero.

Timeline
  5 years
    o Implementation of N prototype systems
    o 1000x performance improvement
  10 years
    o First fully automated design completed
  15 years
    o Customized computing systems become a de facto line item for startups and research grants in computation-intensive fields.