Future of Scientific Computing
Marvin Theimer, Software Architect, Windows Server High Performance Computing Group, Microsoft Corporation

Supercomputing Goes Personal
System: Cray Y-MP C916 | Sun | NewEgg.com
Architecture: 16 x Vector, 4GB, Bus | 24 x 333MHz UltraSPARC II, 24GB, SBus | 4 x 2.2GHz x64, 4GB, GigE
OS: UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops: ~10 | ~10 | ~10
Top500 #: 1 | 500 | N/A
Price: $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers: Government Labs | Large Enterprises | Every Engineer & Scientist
Applications: Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media

Molecular Biologist's Workstation
High-end workstation with internal cluster nodes
8 Opteron, 20 Gflops workstation/cluster for O($10,000)
Turn-key system purchased from a standard OEM
Pre-installed set of bioinformatics applications
Run interactive workstation applications that offload computationally intensive tasks to the attached cluster nodes (see the sketch below)
Run workflows consisting of visualization and analysis programs that process the outputs of simulations running on the attached cluster nodes
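A rough illustration of the offload pattern in Python; this is a sketch only. The process pool stands in for the attached cluster nodes (a real system would dispatch via MPI or the cluster's job scheduler), and score_sequence is a made-up placeholder kernel, not a real bioinformatics routine.

from concurrent.futures import ProcessPoolExecutor

def score_sequence(seq):
    # Placeholder for a compute-intensive bioinformatics kernel.
    return sum(ord(c) for c in seq) % 97

def main():
    sequences = ["ACGT" * 1000, "GATTACA" * 500, "TTGCA" * 800]
    # The interactive workstation application stays responsive while the
    # heavy scoring work is farmed out to 8 workers ("8 Opteron" nodes).
    with ProcessPoolExecutor(max_workers=8) as pool:
        scores = list(pool.map(score_sequence, sequences))
    print(scores)

if __name__ == "__main__":
    main()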

The Future: Supercomputing on a Chip
IBM Cell processor: 256 Gflops today
4-node personal cluster => 1 Tflops
32-node personal cluster => Top100
Intel many-core chips: "100's of cores on a chip in 2015" (Justin Rattner, Intel)
"4 cores"/Tflop => 25 Tflops/chip
(The arithmetic behind these figures is worked out below.)
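The slide's numbers follow from simple arithmetic, checked here in Python; the 100-core figure is the slide's own quoted projection, not a measured value.

CELL_GFLOPS = 256                         # IBM Cell processor peak, per the slide
print(4 * CELL_GFLOPS / 1000, "Tflops")   # 4-node personal cluster -> ~1 Tflops
print(32 * CELL_GFLOPS / 1000, "Tflops")  # 32-node personal cluster -> ~8 Tflops

cores_per_chip = 100                      # "100's of cores on a chip in 2015"
cores_per_tflop = 4                       # "4 cores"/Tflop
print(cores_per_chip / cores_per_tflop, "Tflops per chip")   # 25.0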

The Continuing Trend Towards Decentralized, Dedicated Resources
Grids of personal & departmental clusters
Personal workstations & departmental servers
Minicomputers
Mainframes

The Evolving Nature of HPC

Departmental Cluster (conventional scenario):
IT owns large clusters due to complexity and allocates resources on a per-job basis
Users submit batch jobs via scripts
In-house and ISV apps, many based on MPI
Focus: scheduling multiple users' applications onto scarce compute cycles; cluster systems administration

Personal/Workgroup Cluster (emerging scenario):
Clusters are pre-packaged OEM appliances, purchased and managed by end users
Desktop HPC applications transparently and interactively make use of cluster resources
Focus: desktop development tools integration; interactive applications; compute grids: distributed systems management

HPC Application Integration (future scenario):
Multiple simulations and data sources integrated into a seamless application workflow
Network topology and latency awareness for optimal distribution of computation
Structured data storage with rich meta-data
Applications and data potentially span organizational boundaries
Focus: data-centric, "whole-system" workflows; data grids: distributed data management

[Slide graphic: progression from manual, batch execution to interactive computation and visualization]

Exploding Data Sizes
Experimental data: TBs → PBs
Modeling data:
Today: 10's to 100's of GB per simulation is the common case; applications mostly run in isolation
Tomorrow: 10's to 100's of TBs, all of it to be archived; whole-system modeling and multi-application workflows

How Do You Move A Terabyte?*

Context | Speed (Mbps) | Rent ($/month) | $/Mbps | $/TB sent | Time/TB
Home phone | 0.04 | 40 | 1,000 | 3,086 | 6 years
Home DSL | 0.6 | 70 | 117 | 360 | 5 months
T1 | 1.5 | 1,200 | 800 | 2,469 | 2 months
T3 | 43 | 28,000 | 651 | 2,010 | 2 days
OC3 | 155 | 49,000 | 316 | 976 | 14 hours
OC192 | 9,600 | 1,920,000 | 200 | 617 | 14 minutes
100 Mbps (LAN setting) | 100 | | | | 1 day
Gbps (LAN setting) | 1,000 | | | | 2.2 hours
10 Gbps (LAN setting) | 10,000 | | | | 13 minutes
FedEx (shipped disk) | ~100 effective | | | 50 | 24 hours

*Material courtesy of Jim Gray
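The Time/TB column is pure arithmetic: one terabyte is roughly 8×10^12 bits, divided by the link rate. A small Python check over a subset of the table's rows:

def hours_to_move(terabytes, mbps):
    """Hours to move `terabytes` of data over a link of `mbps` megabits/second."""
    return terabytes * 8e12 / (mbps * 1e6) / 3600.0

for name, mbps in [("Home DSL", 0.6), ("T3", 43), ("OC3", 155),
                   ("100 Mbps LAN", 100), ("Gbps LAN", 1000), ("OC192", 9600)]:
    print(f"{name:14s} {hours_to_move(1.0, mbps):9.1f} hours")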

Anticipated HPC Grid Topology
Islands of high connectivity:
Simulations done on personal & workgroup clusters
Data stored in data warehouses
Data analysis best done inside the data warehouse
Wide-area data sharing/replication via FedEx?
[Slide diagram: data warehouse, workgroup cluster, personal cluster]

Data Analysis and Mining
Traditional approach: keep data in flat files; write C or Perl programs to compute specific analysis queries
Problems with this approach:
Imposes significant development times
Scientists must reinvent DB indexing and query technologies
Have to copy the data from the file system to the compute cluster for every query
Results from the astronomy community:
Relational databases can yield speed-ups of one to two orders of magnitude
SQL + application/domain-specific stored procedures greatly simplify creation of analysis queries (a minimal illustration follows below)
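A minimal sketch of the point, using SQLite from Python as a stand-in for a relational data warehouse; the objects table and its ra/dec/magnitude columns are invented for illustration (echoing the astronomy example), not taken from any particular survey schema.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, ra REAL, dec REAL, magnitude REAL)")
conn.executemany("INSERT INTO objects (ra, dec, magnitude) VALUES (?, ?, ?)",
                 [(10.2, -5.1, 18.3), (10.4, -5.0, 21.7), (200.1, 33.9, 17.2)])
conn.execute("CREATE INDEX idx_mag ON objects (magnitude)")

# The analysis query runs inside the database engine, against an index,
# instead of a hand-written C/Perl scan over flat files copied to the cluster.
count, mean_mag = conn.execute(
    "SELECT COUNT(*), AVG(magnitude) FROM objects WHERE magnitude < 20").fetchone()
print("bright objects:", count, "mean magnitude:", mean_mag)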

Is That the End of the Story?
[Slide diagram: relational data warehouse, workgroup cluster, personal cluster]

Too Much Complexity
[Slide diagram: relational data warehouse, workgroup cluster, personal cluster]
Distributed systems issues: security, system management, directory services, storage management
Digital experimentation: experiment management, provenance (data & workflows), version management (data & workflows)
Parallel application development:
Chip-level, node-level, cluster-level, LAN grid-level, WAN grid-level parallelism
OpenMP, MPI, HPF, Global Arrays, … (a small MPI-flavored example appears below)
Component architectures
Performance configuration & tuning
Debugging/profiling/tracing/analysis
Domain science: 2004 NAS supercomputing report: O(35) new computational scientists graduated per year
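To give a concrete flavor of just the node-level (MPI) layer in that stack, here is a small sketch using mpi4py (an assumption; the slide names only MPI itself). Each rank integrates part of [0, 1] by the midpoint rule and the partial sums are combined on rank 0.

from mpi4py import MPI

def f(x):
    return x * x

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 1_000_000                 # total midpoint-rule intervals over [0, 1]
h = 1.0 / n
local = sum(f((i + 0.5) * h) for i in range(rank, n, size)) * h

total = comm.reduce(local, op=MPI.SUM, root=0)
if rank == 0:
    print("integral of x^2 over [0, 1] ~", total)   # exact value is 1/3

# Run with, e.g.:  mpiexec -n 4 python integrate.py   (filename is illustrative)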

Separating the Domain Scientist from the Computer Scientist
Computer scientist: parallel/distributed file systems, relational data warehouses, dynamic systems management, Web Services & HPC grids
Computational scientist: parallel domain application development (concrete vs. abstract concurrency)
Domain scientist: (interactive) scientific workflow, integrated with collaboration-enhanced office automation tools (concrete vs. abstract workflow)
Example (a toy orchestration sketch follows below):
Write scientific paper (Word)
Record experiment data (Excel)
Individual experiment run (workflow orchestrator)
Analyze data (SQL Server)
Share paper with co-authors (SharePoint)
Collaborate with co-authors (NetMeeting)
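A toy sketch of the abstract-workflow idea in Python: the domain scientist declares steps and their dependencies, and an orchestrator decides the order (and, in a real system, where each step runs). The step names and the graphlib-based scheduler are illustrative assumptions, not a description of any actual workflow product.

from graphlib import TopologicalSorter   # Python 3.9+

def simulate():  print("run simulation on the workgroup cluster")
def analyze():   print("analyze results inside the data warehouse (SQL)")
def record():    print("record experiment data in a spreadsheet")
def write_up():  print("draft the paper and share it with co-authors")

# Abstract workflow: each step lists the steps it depends on.
workflow = {
    "simulate": set(),
    "analyze":  {"simulate"},
    "record":   {"analyze"},
    "write_up": {"record"},
}
steps = {"simulate": simulate, "analyze": analyze,
         "record": record, "write_up": write_up}

# Concrete execution: a real orchestrator would dispatch each step to the
# appropriate resource; here we simply call them in dependency order.
for name in TopologicalSorter(workflow).static_order():
    steps[name]()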

Scientific Information Worker: Past and Future

Past → Future:
Buy lab equipment → Buy hardware & software
Keep lab notebook → Automatic provenance
Run experiments by hand → Workflow with 3rd-party domain packages
Assemble & analyze data (using a stat package) → Excel & Access/SQL Server
Collaborate by phone/e-mail; write up results with LaTeX → Office tool suite with collaboration support
Metaphor: physical experimentation, "do it yourself", lots of disparate systems/pieces → digital experimentation, turn-key desktop supercomputer, single integrated system