Dagstuhl Seminar on Dark Silicon: From Embedded to HPC Feb 3, 2016


HPC Group

What is Dark Silicon in HPC?
- Shortage of power, strict constraints
- Overprovisioning of hardware resources with respect to power
- Chip-level: we already have dark silicon
- Job-level, cluster-level: workload dependent, dim silicon
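To make the overprovisioning point concrete, here is a minimal arithmetic sketch. It assumes a fixed site power budget and a uniform per-node power draw, which is a simplification the slides do not state. The function names and all numbers are hypothetical and chosen only for illustration.

```python
def max_active_nodes(power_budget_w: float, node_power_w: float, installed_nodes: int) -> int:
    """Nodes that can run at node_power_w each without exceeding the budget."""
    if node_power_w <= 0:
        raise ValueError("node_power_w must be positive")
    return min(installed_nodes, int(power_budget_w // node_power_w))


def dark_fraction(power_budget_w: float, node_power_w: float, installed_nodes: int) -> float:
    """Fraction of installed nodes that must stay powered off ("dark")."""
    active = max_active_nodes(power_budget_w, node_power_w, installed_nodes)
    return 1.0 - active / installed_nodes


# Hypothetical system: 1 MW budget, 5000 installed nodes.
BUDGET_W = 1_000_000
NODES = 5000

# At an assumed 300 W per node, only part of the machine can be powered.
full_power_active = max_active_nodes(BUDGET_W, 300, NODES)

# Capping each node at an assumed 200 W lets every node run, at reduced
# per-node power: one reading of "dim" rather than "dark" silicon.
capped_active = max_active_nodes(BUDGET_W, 200, NODES)

print(full_power_active, capped_active)
```

In practice per-node power varies with the workload, which is why the slide calls the job-level and cluster-level picture workload dependent.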

Main challenges arising from DS?
- Focus in HPC is performance: we care about FLOPS, time to solution, and accuracy of the science. This differs from the embedded community.
- Stay in a power band: fluctuations of a few megawatts may arise (power stability)
- Performance variability, reproducibility
- System throughput and power utilization: how do we manage resources?
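The "power band" constraint can be illustrated with a small check over a series of site power readings. This is only a sketch: the band limits, the sample values, and the function names are assumptions made for the example, not figures from the seminar.

```python
from typing import List, Sequence


def band_violations(samples_mw: Sequence[float], low_mw: float, high_mw: float) -> List[int]:
    """Indices of power samples that fall outside the band [low_mw, high_mw]."""
    if low_mw > high_mw:
        raise ValueError("low_mw must not exceed high_mw")
    return [i for i, p in enumerate(samples_mw) if p < low_mw or p > high_mw]


def max_swing(samples_mw: Sequence[float]) -> float:
    """Largest difference between any two samples (0.0 for an empty series)."""
    return max(samples_mw) - min(samples_mw) if samples_mw else 0.0


# Hypothetical site power readings in megawatts and an assumed 8-12 MW band.
readings = [9.5, 10.2, 12.4, 10.0, 7.8]
outside = band_violations(readings, low_mw=8.0, high_mw=12.0)
swing = max_swing(readings)

print(outside, round(swing, 1))
```

A resource manager would act on such violations, for example by delaying job starts or adjusting power caps. How to do that without hurting throughput is the open question the slide raises.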

Challenges…
- Performance Variability:
  - Embrace it and build runtime systems to address this. At the cost of performance?
  - Use it to our advantage: need more control?
  - Minimize in hardware?
  - How do we tolerate variability?
- Current approaches for programming the machines and performance tuning will no longer work.
- We need to redefine the metrics on how we measure things.
- We assume homogeneity at the moment.

Fundamental Techniques
- Monitoring at large scale (TUM, LRZ, LLNL, ETH, Dresden, JSC …): granularity, what to gather, how to use it
- Resource Management: RMAP/Flux (LLNL, TUM, ETH, UniBo…)
- Runtime Systems: GEOPM (Intel), Conductor (LLNL), SDSC, PowSched (UOregon)
- Tracing, Visualization (LLNL, JSC, Dresden, Oregon, SDSC, …)
- Deeper connection with workflows and applications. Also, refactoring of applications (work stealing, oversubscription, auto tuning)

Opportunities and Impact on the Embedded Community?
- Measurement and analysis tools
- Learning from the operating systems community about hardware management
- Real-time boundaries: can we introduce them in HPC and have a QoS definition?
- Programming models tailored to code (right techniques for programming)
- Portability, development cycle
- Hardware-software co-design
- Redesign the stack: hardware, job, cluster, site

Going Forward: Big Data and HPC
- HPC applications are handling more data, which creates a new set of challenges…
- Data-intensive computing, data-aware computing
- Another Dagstuhl?