Grids and Computational Science

Presentation on theme: "Grids and Computational Science" – Presentation transcript:

1 Grids and Computational Science
ERDC Grid Tutorial, August
Geoffrey Fox
IPCRES Laboratory for Community Grids
Computer Science, Informatics, Physics
Indiana University, Bloomington IN

2 Abstract of PET and Computational Science Presentation
We describe HPCC and Grid trends and how they could be folded into a PET computational environment: a peer-to-peer Grid of services supporting science and hence the DoD warfighter.
We describe what works (MPI), what sort of works (objects), what is known (parallel algorithms), what is active (data mining, visualization), what failed (good parallel environments), what is inevitable (petaflops), what is simple but important (XML), what is getting more complicated (applications), and the future (the Web and Grids).

3 Trends of Importance
Resources of increasing performance: computers, storage, sensors, networks
Applications of increasing sophistication: size, multiple scales, multiple disciplines
New algorithms and mathematical techniques
Computer science: compilers, parallelism, objects, components
Grid and Internet concepts and technologies enabling new applications

4 Projected Top 500 Until Year 2009
[Chart: projected performance through 2009 of the 1st, 10th, 100th, and 500th Top 500 entries and the sum of all 500, with Japan's Earth Simulator marked]

5 Top 500 June 2001
[Chart: the Top 500 list as of June 2001]

6 Top 500 by Vendor: Systems
[Chart: Top 500 systems by vendor, June 2001]

7 Top 500 by Vendor Total Power
June 2001

8 PACI 13.6 TF Linux TeraGrid Caltech Argonne SDSC NCSA
[Network diagram of the PACI 13.6 TF Linux TeraGrid. Site capacities: Caltech 32 nodes (0.5 TF, 0.4 TB memory, 86 TB disk); Argonne 64 nodes (1 TF, 0.25 TB memory, 25 TB disk); SDSC 256 nodes (4.1 TF, 2 TB memory, 225 TB disk); NCSA 500 nodes (8 TF, 4 TB memory, 240 TB disk). Sites are linked by OC-12/OC-48 connections through Cisco and Juniper switch/routers to ESnet, vBNS, Abilene, Calren, MREN, and Starlight, with Myrinet Clos-spine interconnects and HPSS/UniTree storage at the sites.]

9 Caltech Hypercube
[Photos: the Caltech/JPL Mark II hypercube (1985); Chuck Seitz's machine (1983); the hypercube topology drawn as a cube]

10 From the New York Times 1984
One of today's fastest computers is the Cray 1, which can do 20 million to 80 million operations a second. But at $5 million, they are expensive and few scientists have the resources to tie one up for days or weeks to solve a problem.
"Poor old Cray and Cyber (another supercomputer) don't have much of a chance of getting any significant increase in speed," Fox said. "Our ultimate machines are expected to be at least 1,000 times faster than the current fastest computers." (80 gigaflops predicted; Livermore just installed gflops)
But not everyone in the field is as impressed with Caltech's Cosmic Cube as its inventors are. The machine is nothing more nor less than 64 standard, off-the-shelf microprocessors wired together, not much different than the innards of 64 IBM personal computers working as a unit.
The Caltech Hypercube was "just a cluster of PCs"!

11 From the New York Times 1984
"We are using the same technology used in PCs (personal computers) and Pacmans," Seitz said. The technology is an 8086 microprocessor capable of doing 1/20th of a million operations a second with 1/8th of a megabyte of primary storage. Sixty-four of them together will do 3 million operations a second with 8 megabytes of storage.
Computer scientists have known how to make such a computer for years but have thought it too pedestrian to bother with. "It could have been done many years ago," said Jack B. Dennis, a computer scientist at the Massachusetts Institute of Technology who is working on a more radical and ambitious approach to parallel processing than Seitz and Fox. "There's nothing particularly difficult about putting together 64 of these processors," he said. "But many people don't see that sort of machine as on the path to a profitable result."
So clusters were already seen as a trivial architecture in 1984. The architecture is unchanged; unfortunately, after nearly 20 years of research, the programming model is also the same (message passing).

12 Technology Trends and Principles
All performance and capability measures of infrastructure continue to improve.
Gilder's law says that network bandwidth increases 3 times faster than CPU performance (Moore's law): "The Telecosm eclipses the Microcosm." See George Gilder, Telecosm: How Infinite Bandwidth Will Revolutionize Our World (Free Press, September 2000). A rough comparison of the two growth rates is sketched below.
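As an illustrative (not authoritative) reading of these growth rates: if CPU performance doubles roughly every 18 months (Moore's law) and bandwidth grows 3 times faster, bandwidth doubles roughly every 6 months. A minimal sketch, with these doubling periods as assumptions:

# Sketch: compound growth of CPU speed vs. network bandwidth.
# Assumed doubling periods: 18 months (Moore's law) and 6 months
# (bandwidth growing "3x faster" per Gilder) -- illustrative only.

def growth(months, doubling_period):
    """Multiplicative growth factor after `months` months."""
    return 2 ** (months / doubling_period)

for years in (1, 5, 10):
    months = 12 * years
    cpu = growth(months, 18.0)
    net = growth(months, 6.0)
    print(f"{years:2d} years: CPU x{cpu:9.1f}, bandwidth x{net:12.1f}, "
          f"ratio x{net / cpu:9.1f}")

After 10 years the ratio is about 100: on these assumptions, moving data becomes relatively cheaper than computing on it, which is the economic argument for Grids of linked services.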

13 Small Devices Increasing in Importance
There is growing interest in wireless portable displays at the confluence of the cell phone and personal digital assistant markets.
By 2005, 60 million Internet-ready cell phones are projected to be sold each year.
65% of all broadband Internet accesses are projected to be via non-desktop appliances.

14 The HPCC Track
The 1990 HPCC 10-year initiative was largely aimed at enabling large-scale simulations for a broad range of computational science and engineering problems.
It was in many ways a success, and we have methods and machines that can (begin to) tackle most 3D simulations; the ASCI simulations are particularly impressive.
DoE is still putting substantial resources into basic software and algorithms, from adaptive meshes to PDE solver libraries.
Machines are still increasing in performance exponentially and should achieve petaflops in the next 7-10 years.
The earthquake community needs to harness these capabilities; Japan's Earth Simulator activity (GEOFEM) is a major effort.

15 Some HPCC Difficulties
An intellectual failure: we never produced a better programming model than message passing; HPCC code is hard work. The "high point" of ASCI software is GridFTP.
An institutional problem: we do not have a way to produce complex sustainable software for a niche (1%) market like HPCC. POOMA support just disappeared one day (and POOMA was the foundation of the first proposal GEM wrote).
One must adopt commodity standards and produce "small" sustainable modules.
Note that distributed memory is becoming dominant again with the bizarre clustered-SMP architecture; it is not clear that it is "wise" to exploit the advantages of shared-memory architectures.

16 My HPCC Advice to HPCMO
KISS: Keep It Simple and Sustainable.
Use MPI, and OpenMP if needed for performance on shared-memory nodes (a minimal message-passing sketch follows this slide).
The following are well understood as the way to get high-performance parallel simulations: adaptive meshes, load balancing, PDE solvers (including fast multipoles), particle dynamics.
Other areas such as data mining, visualization, and data assimilation are quite advanced but still significant research.
Use broad community expertise.
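The deck recommends MPI; as a minimal sketch of the message-passing pattern in Python with mpi4py (assuming an MPI installation plus the mpi4py and numpy packages; the tutorial itself implies C/Fortran MPI, so this is an illustration, not the tutorial's code):

# Minimal MPI message-passing sketch (run with: mpiexec -n 2 python ring.py)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank holds a chunk of data; rank 0 sends a boundary value to rank 1,
# the same calculate/communicate pattern used in domain-decomposed solvers.
local = np.full(4, float(rank))
if rank == 0 and size > 1:
    comm.Send(local[-1:], dest=1, tag=0)       # send my right boundary
elif rank == 1:
    halo = np.empty(1)
    comm.Recv(halo, source=0, tag=0)           # receive neighbor's boundary
    local[0] = 0.5 * (local[0] + halo[0])      # simple update using the halo
print(f"rank {rank}: {local}")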

17 Use of Object Technologies
The claimed commercial success of object and component technology has not translated into clear success in HPCC.
Object technologies do not naturally support either high performance or parallelism: C++ can be high performance, but CORBA and Java are not.
There is no agreed HPCC component architecture with which to produce more modern libraries (DoE has a very large CCA – Common Component Architecture – effort).
Fortran will continue to decline in importance and interest; the community should prefer not to use it, as its use will not attract the best students.

18 Application Structure
New applications are typically multi-scale and multi-disciplinary, i.e. a given simulation is made of multiple components with different time/length scales and/or multiple authors from possibly multiple fields.
I am not aware of a systematic "computational renormalization group" – a methodology that links different scales together.
However, composition of modules is an area where technology of growing sophistication is becoming available; it is needed commercially to integrate corporate functions.
CCA's "small grain size" is controversial; Gateway is an example of clearly successful large-grain-size integration.
Integration of data and simulation is one example of composition which is "understood".

19 Object Size & Distributed/Parallel Simulations
All interesting systems consist of linked entities: particles, grid points, people, or groups thereof.
Linkage translates into message passing: cars on a freeway, phone calls, forces between particles.
The amount of communication tends to be proportional to the surface area of an entity, whereas simulation time is proportional to its volume. So communication/computation scales as surface/volume and decreases in importance as entity size increases (see the worked ratio below).
In parallel computing, communication is synchronized; in distributed computing one has "self-contained objects" (whole programs) which can be scheduled asynchronously.
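To make the surface/volume argument quantitative (a sketch of the standard estimate, not taken verbatim from the slides): give each processor a cubic subdomain of n x n x n grid points, with t_flop the time per local update and t_word the time to communicate one boundary value. Then

\[
T_{\mathrm{calc}} \propto n^{3}\, t_{\mathrm{flop}}, \qquad
T_{\mathrm{comm}} \propto 6 n^{2}\, t_{\mathrm{word}}, \qquad
\frac{T_{\mathrm{comm}}}{T_{\mathrm{calc}}} \;\propto\; \frac{6}{n}\,\frac{t_{\mathrm{word}}}{t_{\mathrm{flop}}}
\]

so doubling the subdomain edge halves the relative communication cost. This is why large-grain distributed objects (whole services) can tolerate slow wide-area networks while fine-grain parallelism cannot.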

20 Complex System simulations
Networks of particles and (partial differential equation) grid points interact "instantaneously", and simulations reduce to iterating calculate/communicate phases: calculate the next positions/values at a given time or iteration number (massively parallel), and then update. Scaling parallelism is guaranteed (see the loop sketch below).
Complex (phenomenological) systems are made of agents evolving with irregular time steps; such event-driven simulations do not parallelize.
This lack of global time synchronization in "complex systems" defeats the natural parallelism of classic HPCC approaches.
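A schematic of the two loop structures (hypothetical entity/event interfaces, illustration only): the time-stepped loop advances every entity by the same dt each sweep, so each sweep is data-parallel; the event-driven loop pops the globally earliest event from one queue, an inherently sequential step.

import heapq
import itertools

# Time-stepped: every entity advances together; each sweep is data-parallel.
def time_stepped(entities, steps, dt):
    for _ in range(steps):
        updates = [e.next_state(dt) for e in entities]   # parallelizable
        for e, u in zip(entities, updates):              # synchronized update
            e.apply(u)

# Event-driven: a global priority queue orders events in time. `initial_events`
# is an iterable of (time, event) pairs, and `event.execute(t)` returns
# follow-on (time, event) pairs -- a hypothetical interface.
def event_driven(initial_events, t_end):
    counter = itertools.count()            # tie-breaker for equal times
    queue = [(t, next(counter), e) for t, e in initial_events]
    heapq.heapify(queue)
    while queue:
        t, _, event = heapq.heappop(queue)     # globally earliest event: serial
        if t > t_end:
            break
        for t2, e2 in event.execute(t):        # may schedule new events
            heapq.heappush(queue, (t2, next(counter), e2))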

21 Los Alamos Delphi Initiative
Aims at large complex-systems simulation of global and national scope in size and significance.
Demonstrates the success of new methods (SDS – Sequential Dynamical Systems) that parallelize well and outperform previous approaches.
Their general applicability (e.g. to earthquakes) is not clear, though they could be relevant to cellular-automata-like models of earthquakes.
Target systems include: national traffic systems, epidemics, forest fires, cellular and other communication networks (e.g. the Internet), electrical/gas/water grids, business processes, battles.

22 Some Problem Classes
Hardest: smallish objects with irregular time synchronization (Delphi).
Classic HPCC: synchronized objects with regular time structure (communication overhead decreases as problem size increases).
Internet technology and commercial application integration: large objects with modest communications and without difficult time synchronization; compose them as independent (pipelined) services. This includes some approaches to linking multi-disciplinary simulations.

23 What is a Grid or Web Service?
There are generic Grid system services: security, collaboration, persistent storage, universal access.
An application service is a capability used either by another service or by a user. It has input and output ports; data comes from sensors or from other services.
Portals are the user (web browser) interfaces to Grid services.
Gateway makes running jobs on remote computers a Grid service. It is invoked by other services, e.g. a CFD service which includes meshing, human or other advice on the code, simulation, and visualization services.

24 Sensors/Image Processing Service
Consider NASA Space Operations (CSOC) as a Grid service: spacecraft management (with a web front end) has planning, real-time control, and decision-making services, and each tracking station is a service, a special case of the sensor service.
All sensors have the same top-level structure as a Grid service but are "specialized" (sub-classed) to each field (see the sketch below).
Image processing is a pipeline of filters, which can be grouped into different services; these link to other filters, sensors, and storage devices.
Data storage is an important system service.
Major services are built hierarchically from "basic" services.
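The sub-classing idea can be sketched as follows (hypothetical class and method names, purely illustrative of "same top-level structure, specialized per field"):

from abc import ABC, abstractmethod

class GridService(ABC):
    """Top-level structure shared by all services: named in/out ports."""
    def __init__(self, in_ports, out_ports):
        self.in_ports = list(in_ports)    # where data arrives
        self.out_ports = list(out_ports)  # where results are published

    @abstractmethod
    def handle(self, port, message):
        """Process a message arriving on an input port."""

class SensorService(GridService):
    """Specialization for sensors: one data-out port, universal access."""
    def __init__(self, instrument):
        super().__init__(in_ports=["control"], out_ports=["data"])
        self.instrument = instrument      # e.g. a tracking station

    def handle(self, port, message):
        # A real service would decode the control message and reconfigure
        # the instrument; here we just acknowledge it.
        return {"instrument": self.instrument, "ack": message}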

25 Distributed Sensor Service
[Diagram: a sensor Grid service with input ports and an output port providing universal sensor access to people and computers, composed into a distributed sensor service]

26 Is a Grid Service a New Idea?
Not really: (in the case of a sensor) it is like the concept of a control computer handling data from some device. BUT now all control computers are "distributed objects" and web servers, and all non-binary data is defined in XML.
There is a universal way of discovering and defining services, with universal input and output streams which can be carried over multiple protocols (IIOP (CORBA), RMI (Java), SOAP (Web)).
Further, we have in the portal a universal user interface.
Further, we have linked the concepts of libraries (subroutine calls) and processes (linked by piping files).

27 Integration of Grid Services
[Diagram: multidisciplinary control integrating Grid services over an object Grid programming environment – an image processing server with a parallel database proxy, database, and sensor control; a Grid gateway supporting a seamless interface; a data mining server with an Origin 2000 (MPP) proxy; a NetSolve linear algebra server with a matrix solver and agent-based choice of compute engine; an IBM SP (MPP) proxy – all fronting classic HPCC resources]

28 Overall Grid/Web Architecture
General vision? The NCSA vision:
[Diagram: a layered architecture – twenty-first-century applications with science portals and workbenches on top; access services and technology (Access Grid, community portals, next-generation Web, education, business and commerce services); computational services; Grid services (resource independent); Grid fabric (resource dependent); networking, devices, and systems at the bottom. Vertical axes indicate increasing performance and convenience.]

29 The Application Service Model
As the bandwidth of communication between services increases, one can support smaller services.
A service "is a component" and is a replacement for a library in cases where performance allows.
Services are a sustainable model of software development: each service has a documented capability with standards-compliant interfaces.
XML defines interfaces at several levels: WSDL at the Grid level and XSIL or equivalent for scientific data formats (see the parsing sketch below).
A service can be written in Perl, Python, a Java servlet, an Enterprise JavaBean, CORBA (C++ or Fortran), etc.
The communication protocol can be RMI (Java), IIOP (CORBA), SOAP (HTTP, XML), etc.
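To illustrate the "XML defines the data interface" point, a minimal sketch using Python's standard-library XML parser on an XSIL-style fragment (the element and attribute names below are illustrative, not the official XSIL schema):

import xml.etree.ElementTree as ET

# An XSIL-style description of a scientific dataset: parameters plus a
# typed, shaped array. Names here are assumptions for illustration.
doc = """
<Dataset Name="seismic-run-42">
  <Param Name="dt" Unit="s">0.01</Param>
  <Param Name="stations">64</Param>
  <Array Name="displacement" Type="float64" Shape="64 10000"/>
</Dataset>
"""

root = ET.fromstring(doc)
params = {p.get("Name"): p.text for p in root.findall("Param")}
array = root.find("Array")
print(root.get("Name"), params, array.get("Shape"))

Because the interface is plain XML, any of the listed languages and protocols can produce or consume it; that language neutrality is what makes the service a sustainable unit.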

30 Services support Communities
Grid communities (HPCMO, PET, Vicksburg, environmental science, high-school classes) are groups of communicating individuals sharing resources implemented as Grid services.
The Access Grid from Argonne/NCSA is the best audio/video conferencing technology.
Peer-to-peer networking describes a set of technologies supporting community building, with an emphasis on less structured groups than the classic "users of a supercomputer".
Peer-to-peer Grids combine the two technologies and support "small worlds": optimized networks with short links between community members (a small-world sketch follows this slide).
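A minimal sketch of the "small world" property the slide alludes to, assuming the networkx package (Watts-Strogatz rewiring shortens the average path length while keeping local clustering):

import networkx as nx

# Watts-Strogatz model: start from a ring where each node links to its
# k nearest neighbors, then rewire each edge with probability p.
n, k = 200, 4
for p in (0.0, 0.1):
    G = nx.connected_watts_strogatz_graph(n, k, p, tries=100, seed=1)
    print(f"p={p}: avg path length = "
          f"{nx.average_shortest_path_length(G):.1f}, "
          f"clustering = {nx.average_clustering(G):.2f}")

A few random long-range links (p = 0.1) collapse the average path length from tens of hops to a handful, which is why a peer-to-peer Grid can keep every community member a short route from every other.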

31 Classic Grid Architecture
[Diagram: a three-tier architecture in which clients, servers, and resources are typically separate – users and devices access portals; a middle tier of brokers and service providers (e.g. NetSolve, NEOS) handles security and composition; the resource tier holds databases and compute resources]

32 Peer-to-Peer Network
[Diagram: peers linked to "all" other peers in the community; each peer is a jack of all trades combining user, resource, service, and routing roles – clients, servers, and resources are typically integrated]

33 Peer-to-Peer Grid
[Diagram: a peer-to-peer Grid combining user, resource, service, and routing roles with GMS services; message or event routing is dynamic, from peers or from servers]

34 HPCMO HPCC and Grid Strategy I
Decide which services are well enough understood and useful enough to be encapsulated as application services: parallel FEM solvers, visualization, parallel particle dynamics, access to sensor or GIS data, image processing filters.
Make each service as small as possible – smaller is simpler and more sustainable, but comes with higher communication needs.
Establish teams to design and build services.
Use a framework offering the needed Grid system services.
Build an HPCMO electronic community with collaboration tools, resources, and HPCMO-wide networking.

35 HPCMO HPCC and Grid Strategy II
Some capabilities – such as a fast multipole or adaptive mesh package – should be built as classic libraries or templates.
Other services – such as data mining or support of multi-scale simulations – need research using a toolkit approach, if one can design a general structure.
We need "hosts" for major services – access and storage of sensor data.
We need funds to build and sustain "infrastructure" and research services.
Use electronic community tools to enhance HPCMO collaboration.

