Digital Science Center Overview

Presentation transcript:

Digital Science Center Overview
July 8, 2012, Beihang University, Beijing
Geoffrey Fox gcf@indiana.edu
http://www.infomall.org https://portal.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

FutureGrid key Concepts I
FutureGrid is an international testbed modeled on Grid'5000
June 27, 2012: 225 projects, 920 users
Supporting international computer science and computational science research in cloud, grid and parallel computing (HPC), in industry and academia
The FutureGrid testbed provides its users:
- A flexible development and testing platform for middleware and application users looking at interoperability, functionality, performance or evaluation
- Reproducibility: each use of FutureGrid is an experiment that is reproducible
- A rich education and teaching platform for advanced cyberinfrastructure (computer science) classes

FutureGrid key Concepts II
Rather than loading images onto VMs, FutureGrid supports cloud, grid and parallel computing environments by provisioning software as needed onto "bare metal" using Moab/xCAT (this needs to be generalized)
- Image library for MPI, OpenMP, MapReduce (Hadoop, (Dryad), Twister), gLite, Unicore, Globus, Xen, ScaleMP (distributed shared memory), Nimbus, Eucalyptus, OpenNebula, KVM, Windows, ...
- Provisioning is done either statically or dynamically
- Growth comes from users depositing novel images in the library
FutureGrid has ~4400 distributed cores with a dedicated network and a Spirent XGEM network fault and delay generator
(Diagram: Image1, Image2, ..., ImageN; Choose, Load, Run)
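The choose/load/run cycle in that diagram can be sketched as follows. This is only an illustration: every function name here is hypothetical, not a real FutureGrid, RAIN, Moab or xCAT interface.

```python
# Hypothetical sketch of the "Choose -> Load -> Run" provisioning cycle.
# None of these functions are real FutureGrid, Moab or xCAT APIs; they only
# illustrate the workflow described on the slide.

IMAGE_LIBRARY = {
    "hadoop-mpi": "MPI + Hadoop stack for iterative MapReduce experiments",
    "openstack-kvm": "OpenStack controller/compute image on KVM",
    "scalemp-shm": "ScaleMP distributed shared memory environment",
}

def choose_image(name):
    """Pick an image that a user has deposited in the shared library."""
    if name not in IMAGE_LIBRARY:
        raise KeyError(f"{name} not in image library")
    return name

def load_onto_bare_metal(image, nodes, dynamic=True):
    """Provision the image onto physical nodes, statically or dynamically."""
    mode = "dynamic" if dynamic else "static"
    print(f"Provisioning '{image}' onto {len(nodes)} bare-metal nodes ({mode})")

def run_experiment(image, nodes):
    """Run a reproducible experiment on the freshly provisioned nodes."""
    print(f"Running experiment with '{image}' on {nodes}")

if __name__ == "__main__":
    img = choose_image("hadoop-mpi")
    hosts = ["i%03d" % n for n in range(4)]
    load_onto_bare_metal(img, hosts)
    run_experiment(img, hosts)
```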

Compute Hardware
Name | System type | # CPUs | # Cores | TFLOPS | Total RAM (GB) | Secondary Storage (TB) | Site | Status
india | IBM iDataPlex | 256 | 1024 | 11 | 3072 | 180 | IU | Operational
alamo | Dell PowerEdge | 192 | 768 | 8 | 1152 | 30 | TACC |
hotel | | 168 | 672 | 7 | 2016 | 120 | UC |
sierra | | | | | 2688 | 96 | SDSC |
xray | Cray XT5m | | | 6 | 1344 | | |
foxtrot | | | 64 | 2 | | 24 | UF |
Bravo | Large disk & memory | 32 | 128 | 1.5 | 3072 (192 GB per node) | 192 (12 TB per server) | |
Delta | Large disk & memory with Tesla GPUs | 32 CPUs + 32 GPUs | 192 + 14336 GPU | ? | 1536 (192 GB per node) | 9 | |
TOTAL | | | 4384 | | | | |

5 Use Types for FutureGrid
225 approved projects (~920 users), June 27, 2012
USA, China, India, Pakistan, lots of European countries; industry, government, academia
- Training, Education and Outreach (8%): semester and short events; promising for small universities
- Interoperability test-beds (3%): grids and clouds; standards; from the Open Grid Forum (OGF)
- Domain science applications (31%): life science highlighted (18%), non life science (13%)
- Computer science (47%): largest current category
- Computer systems evaluation (27%): XSEDE (TIS, TAS), OSG, EGI
Clouds are meant to need less support than other models; FutureGrid needs more user support .......

https://portal.futuregrid.org/projects

Distribution of FutureGrid Technologies and Areas 220 Projects

GPUs in the Cloud: Xen PCI Passthrough
Pass the PCI-E GPU device through to the DomU
Use the NVIDIA Tesla CUDA programming model
Requires Intel VT-d or AMD IOMMU extensions and the Xen pci-back driver
Work at ISI East (USC)
FutureGrid "delta" has 16 nodes with 192 GB memory, each with 2 GPUs (Tesla C2075, 6 GB)
http://futuregrid.org
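A minimal way to confirm from inside the DomU that the passed-through Tesla is actually visible, assuming the NVIDIA driver is installed in the guest; this is a generic sanity check, not FutureGrid tooling:

```python
# Minimal sanity check inside a DomU: confirm the passed-through GPU is visible.
# Assumes the NVIDIA driver (nvidia-smi) is installed in the guest.
import subprocess

def visible_nvidia_pci_devices():
    """List PCI devices the guest can see that mention NVIDIA."""
    out = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if "NVIDIA" in line]

def visible_cuda_gpus():
    """Ask the NVIDIA driver which GPUs it can use (e.g. the Tesla C2075)."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=True)
    return out.stdout.strip().splitlines()

if __name__ == "__main__":
    print("PCI view :", visible_nvidia_pci_devices())
    print("CUDA view:", visible_cuda_gpus())
```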

FutureGrid Tutorials
Cloud Provisioning Platforms:
- Using Nimbus on FutureGrid [novice]
- Nimbus One-click Cluster Guide
- Using OpenStack Nova on FutureGrid
- Using Eucalyptus on FutureGrid [novice]
- Connecting private network VMs across Nimbus clusters using ViNe [novice]
- Using the Grid Appliance to run FutureGrid Cloud Clients [novice]
Cloud Run-time Platforms:
- Running Hadoop as a batch job using MyHadoop [novice]
- Running SalsaHadoop (one-click Hadoop) on HPC environment [beginner]
- Running Twister on HPC environment
- Running SalsaHadoop on Eucalyptus
- Running FG-Twister on Eucalyptus
- Running One-click Hadoop WordCount on Eucalyptus [beginner]
- Running One-click Twister K-means on Eucalyptus
Image Management and Rain:
- Using Image Management and Rain [novice]
Storage:
- Using HPSS from FutureGrid [novice]
Educational Grid Virtual Appliances:
- Running a Grid Appliance on your desktop
- Running a Grid Appliance on FutureGrid
- Running an OpenStack virtual appliance on FutureGrid
- Running Condor tasks on the Grid Appliance
- Running MPI tasks on the Grid Appliance
- Running Hadoop tasks on the Grid Appliance
- Deploying virtual private Grid Appliance clusters using Nimbus
- Building an educational appliance from Ubuntu 10.04
- Customizing and registering Grid Appliance images using Eucalyptus
High Performance Computing:
- Basic High Performance Computing
- Running Hadoop as a batch job using MyHadoop
- Performance Analysis with Vampir
- Instrumentation and tracing with VampirTrace
Experiment Management:
- Running interactive experiments [novice]
- Running workflow experiments using Pegasus
- Pegasus 4.0 on FutureGrid Walkthrough [novice]
- Pegasus 4.0 on FutureGrid Tutorial [intermediary]
- Pegasus 4.0 on FutureGrid Virtual Cluster [advanced]

aaS and Roles/Appliances
If you package a capability X as XaaS, it runs on a separate VM and you interact with it via messages
- SQLaaS offers databases via messages, similar to the old JDBC model
If you build a role or appliance with X, then X is built into the VM and you just need to add your own code and run
- A generalized worker role builds in I/O and scheduling
FutureGrid will take capabilities (MPI, MapReduce, Workflow, ...) and offer them as roles or aaS (or both)
- Perhaps workflow has a controller aaS with a graphical design tool, while the runtime is packaged in a role?
- Package parallelism as a virtual cluster
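To make the aaS versus role distinction concrete, here is a hedged Python sketch; the SQLaaS endpoint URL and helper names are invented for illustration and are not a real FutureGrid service:

```python
# Illustrative contrast between "X as a Service" and "X built into a role/appliance".
# The endpoint URL and the capability itself are hypothetical examples.
import json
import urllib.request

def query_as_a_service(sql):
    """XaaS style: the capability runs in a separate VM and we talk to it with messages."""
    req = urllib.request.Request(
        "http://sqlaas.example.org/query",            # hypothetical SQLaaS endpoint
        data=json.dumps({"sql": sql}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def query_in_role(connection, sql):
    """Role/appliance style: the database engine is baked into this VM's image,
    so the 'call' is a local library invocation (a DB-API cursor here)."""
    cur = connection.cursor()
    cur.execute(sql)
    return cur.fetchall()
```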

Selected List of Services Offered to the FutureGrid User
PaaS: Hadoop, Twister, Hive, HBase
IaaS: Bare Metal, Nimbus, Eucalyptus, OpenStack, OpenNebula, ViNe
Grid: Genesis II, Unicore, SAGA, Globus
HPCC: MPI, OpenMP, ScaleMP, CUDA
XSEDE Software
Others: FG RAIN, Portal, Inca, Ganglia, Experiment Management (Pegasus)

Science Computing Environments
- Large scale supercomputers: multicore nodes linked by a high performance, low latency network; increasingly with GPU enhancement; suitable for highly parallel simulations
- High throughput systems, such as the European Grid Initiative (EGI) or Open Science Grid (OSG), typically aimed at pleasingly parallel jobs; can use "cycle stealing"; the classic example is LHC data analysis
- Grids federate resources as in EGI/OSG, or enable convenient access to multiple backend systems including supercomputers
- Portals make access convenient, and workflow integrates multiple processes into a single job
- Specialized machines: visualization, shared memory parallelization, etc.

Clouds and Grids/HPC
- Synchronization/communication performance: Grids > Clouds > classic HPC systems
- Clouds naturally execute grid workloads effectively, but are less clearly suited to closely coupled HPC applications
- Service oriented architectures, portals and workflow appear to work similarly in both grids and clouds
- Maybe for the immediate future, science will be supported by a mixture of:
  - Clouds, with some practical differences between private and public clouds in size and software
  - High throughput systems (moving to clouds as convenient)
  - Grids for distributed data and access
  - Supercomputers ("MPI engines") going to exascale

What Applications Work in Clouds
- Pleasingly parallel applications of all sorts, with roughly independent data or spawning independent simulations
  - The long tail of science, and integration of distributed sensors
- Commercial and science data analytics that can use MapReduce (some such applications) or its iterative variants (most other data analytics applications)
Which science applications are using clouds?
- Many demonstrations: conferences, OOI, HEP, ...
- Venus-C (Azure in Europe): 27 applications, not using Scheduler, Workflow or MapReduce (except roll your own)
- 50% of applications on FutureGrid are from life science, but there is more computer science than total applications
- Locally, the Lilly corporation is a major commercial cloud user (for drug discovery), but the Biology department is not
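As a small illustration of the first category, a pleasingly parallel parameter sweep with fully independent tasks can be expressed in a few lines; the simulate function below is a stand-in, not one of the applications named above:

```python
# Sketch of a pleasingly parallel parameter sweep: every task is independent,
# so the work maps naturally onto cloud VMs or worker roles. The "simulate"
# function is a placeholder, not one of the applications mentioned on the slide.
from concurrent.futures import ProcessPoolExecutor

def simulate(parameter):
    """Stand-in for one independent simulation or analysis task."""
    return parameter, sum(i * parameter for i in range(1000))

if __name__ == "__main__":
    sweep = [0.1 * k for k in range(100)]           # independent parameter values
    with ProcessPoolExecutor() as pool:             # workers could equally be cloud VMs
        for param, result in pool.map(simulate, sweep):
            pass                                    # collect or store each independent result
```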

4 Forms of MapReduce
(a) Map Only (pleasingly parallel): BLAST analysis, parametric sweeps
(b) Classic MapReduce (Input, map, reduce, Output): High Energy Physics (HEP) histograms, distributed search
(c) Iterative MapReduce (map/reduce with iterations): expectation maximization, clustering (e.g. K-means), linear algebra, PageRank
(d) Loosely Synchronous (P_ij communication pattern): classic MPI, PDE solvers and particle dynamics
Forms (a)-(c) are the domain of MapReduce and its iterative extensions (Science Clouds); form (d) is the domain of MPI and exascale systems
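As a concrete instance of form (c), here is a minimal iterative-MapReduce-style K-means in plain Python; the data and cluster count are made up, and a real Twister or Hadoop job would distribute the map and reduce phases rather than run them in one process:

```python
# Minimal iterative MapReduce style K-means (form (c) above). Plain Python stands
# in for the distributed runtime: a real Twister/Hadoop job would run map() on
# partitions of the data and reduce() per cluster key, iterating until converged.
import random

def map_phase(points, centroids):
    """Map: assign each point to its nearest centroid, emit (cluster, point)."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: (p - centroids[k]) ** 2)
        yield nearest, p

def reduce_phase(pairs, k):
    """Reduce: average the points assigned to each cluster to get new centroids."""
    sums, counts = [0.0] * k, [0] * k
    for cluster, p in pairs:
        sums[cluster] += p
        counts[cluster] += 1
    return [sums[i] / counts[i] if counts[i] else random.random() for i in range(k)]

def kmeans(points, k=3, iterations=10):
    centroids = random.sample(points, k)
    for _ in range(iterations):                      # the "Iterations" loop in form (c)
        centroids = reduce_phase(map_phase(points, centroids), k)
    return centroids

if __name__ == "__main__":
    data = [random.gauss(c, 0.2) for c in (0.0, 5.0, 9.0) for _ in range(200)]
    print(kmeans(data, k=3))
```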

Intermediate step in DA-PWC with 6 clusters
- MDS is used to project from the high dimensional space to 3D
- Each of the 100K points is a sequence; the clusters are fungi families; there are 140 clusters at the end of the iteration
- The N = 100K point run takes about 10^5 core hours
- Cost scales between O(N) and O(N^2)
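As a purely illustrative reading of that scaling claim, extrapolating only from the ~10^5 core-hour figure above (the exponent bounds are assumptions for illustration, not measured DA-PWC results):

```python
# Illustrative extrapolation of the O(N)..O(N^2) scaling claim, anchored on the
# slide's figure of ~1e5 core hours for N = 1e5 points. The exponent bounds are
# assumptions, not measured DA-PWC behavior.
base_n, base_cost = 1e5, 1e5          # points, core hours (from the slide)
target_n = 1e6                        # a hypothetical 10x larger problem
for exponent in (1.0, 2.0):           # O(N) and O(N^2) bounds
    cost = base_cost * (target_n / base_n) ** exponent
    print(f"O(N^{exponent:g}): ~{cost:.0e} core hours")
```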