Some Remarks on Use of Clouds to Support Long Tail of Science, XSEDE 2012, Chicago IL, July 18 2012. Geoffrey Fox.


Some remarks on Use of Clouds to Support Long Tail of Science, July 18 2012, XSEDE 2012, Chicago IL. Geoffrey Fox, Informatics, Computing and Physics, Pervasive Technology Institute, Indiana University Bloomington

What Applications work in Clouds
Pleasingly parallel applications of all sorts, with roughly independent data or spawning independent simulations
– Long tail of science and integration of distributed sensors
Commercial and science data analytics that can use MapReduce (some such applications) or its iterative variants (most other data analytics applications)
Which science applications are using clouds?
– Many demonstrations: conferences, OOI, HEP, …
– Venus-C (Azure in Europe): 27 applications, not using Scheduler, Workflow or MapReduce (except roll-your-own)
– 50% of applications on FutureGrid are from Life Science, but there is more computer science than total applications
– Locally, the Lilly corporation is a major commercial cloud user (for drug discovery), but the Biology department is not
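The "pleasingly parallel" pattern above can be sketched in a few lines: independent tasks that share no data and exchange no messages, so they scale out trivially on cloud instances. This is a minimal illustration; the `analyze` function and its inputs are hypothetical stand-ins for a per-file analysis or an independent simulation.

```python
# Sketch: a pleasingly parallel workload -- independent tasks with no
# communication between them, the pattern that maps well to clouds.
from multiprocessing import Pool

def analyze(sample):
    # Stand-in for an independent simulation or per-file analysis step.
    return sample * sample

if __name__ == "__main__":
    samples = range(8)            # e.g. one input file or parameter set per task
    with Pool(4) as pool:
        # Each task runs independently; no inter-task messages are needed.
        results = pool.map(analyze, samples)
    print(results)                # [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the tasks never communicate, the same structure works whether the pool is four local processes or hundreds of cloud VMs.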

27 Venus-C Azure "Long Tail" Applications, chosen from ~70
Chemistry (3): Lead Optimization in Drug Discovery; Molecular Docking
Civil Eng. and Arch. (4): Structural Analysis; Building Information Management; Energy Efficiency in Buildings; Soil Structure Simulation
Earth Sciences (1): Seismic Propagation
ICT (2): Logistics and Vehicle Routing; Social Network Analysis
Mathematics (1): Computational Algebra
Medicine (3): Intensive Care Unit Decision Support; IM Radiotherapy Planning; Brain Imaging
Mol., Cell. & Gen. Bio. (7): Genomic Sequence Analysis; RNA Prediction and Analysis; Systems Biology; Loci Mapping; Micro-array Quality
Physics (1): Simulation of Galaxy Configuration
Biodiversity & Biology (2): Biodiversity Maps of Marine Species; Gait Simulation
Civil Protection (1): Fire Risk Estimation and Fire Propagation
Mech., Naval & Aero. Eng. (2): Vessel Monitoring; Bevel Gear Manufacturing Simulation
VENUS-C Final Review: The User Perspective, 11–12/7, EBC Brussels

Internet of Things and the Cloud
It is projected that there will be 24 billion devices on the Internet. Most will be small sensors that send streams of information into the cloud, where it will be processed, integrated with other streams, and turned into knowledge that helps our lives in a multitude of small and big ways.
The cloud will become increasingly important as a controller of, and resource provider for, the Internet of Things.
Beyond today's use for smart phone and gaming console support, "smart homes" and "ubiquitous cities" build on this vision, and we can expect growth in cloud-supported/controlled robotics.
Some of these "things" will be supporting science
– All will be studied by Computer Science
Natural parallelism over "things"

Sensors as a Service
[Figure: sensors, including an Open Source Sensor (IoT) Cloud, send their output to Sensor Processing as a Service (which could use MapReduce)]

What to use in Clouds: Cloud PaaS
HDFS-style file system to collocate data and computing
Queues to manage multiple tasks
Tables to track job information
MapReduce and Iterative MapReduce to support parallelism
Services for everything
Portals as user interface
Appliances and Roles as customized images
Software tools like Google App Engine, memcached
Workflow to link multiple services (functions)
Data-parallel languages like Pig; more successful than HPF?
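The MapReduce item above names a programming model rather than a product, and its shape is easy to show in plain Python: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase combines each group. This is only an illustration of the model on one machine, not a distributed implementation; the word-count example is the conventional toy problem.

```python
# Minimal sketch of the MapReduce pattern: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield word, 1                     # emit (key, value) pairs

def reduce_phase(pairs):
    groups = defaultdict(list)                # "shuffle": group values by key
    for key, value in pairs:
        groups[key].append(value)
    # reduce: combine each key's values (here, sum the counts)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(map_phase(["big data", "big clouds"]))
print(counts)   # {'big': 2, 'data': 1, 'clouds': 1}
```

In a real framework the map and reduce tasks run on many nodes and the shuffle moves data between them; the iterative variants mentioned on the slide repeat this cycle while caching loop-invariant data.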

What to use in Grids and Supercomputers? HPC PaaS
Services, Portals and Workflow, as in clouds
MPI and GPU/multicore threaded parallelism
GridFTP and high-speed networking
Wonderful libraries supporting parallel linear algebra, particle evolution, partial differential equation solution
Globus, Condor, SAGA, Unicore, Genesis for Grids
Parallel I/O for high performance in an application
Wide-area file system (e.g. Lustre) supporting file sharing
This is a rather different style of PaaS from clouds – could we unify Cloud and HPC PaaS as SciencePaaS?

aaS versus Roles/Appliances
If you package a capability X as XaaS, it runs on a separate VM and you interact with it via messages
– SQLaaS offers databases via messages, similar to the old JDBC model
If you build an X-role or X-appliance, then X is built into the VM and you just need to add your own code and run
– The Azure Generic Worker role builds in I/O and scheduling
Let's take all useful capabilities – MPI, MapReduce, Workflow, … – and offer them as roles or aaS (or both)
Perhaps workflow has a controller aaS with a graphical design tool, while the runtime is packaged in a role?
Use virtual clusters to support parallelism
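The role idea above can be sketched as a loop the platform supplies: the role image handles pulling tasks from a queue and scheduling them, and the user only plugs in their own function. This is a toy single-process sketch, not the Azure API; the queue, task shapes, and `worker_role` name are hypothetical.

```python
# Sketch of the "generic worker role" pattern: the role supplies the
# polling/scheduling loop; the user supplies only their own code.
import queue

def worker_role(task_queue, user_code):
    results = []
    while True:
        try:
            task = task_queue.get_nowait()    # role handles I/O and scheduling
        except queue.Empty:
            break                             # no more work queued
        results.append(user_code(task))       # user plugs in only this function
    return results

tasks = queue.Queue()
for t in [1, 2, 3]:
    tasks.put(t)
print(worker_role(tasks, lambda t: t * 10))   # [10, 20, 30]
```

The contrast with XaaS is in the interface: here the user's code runs inside the role's VM, whereas an XaaS capability would sit in a separate VM reached only by messages.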

Data Intensive Applications
Applications tend to be new, and so can consider emerging technologies such as clouds
They do not have lots of small messages, but rather large reduction (aka collective) operations
– New optimizations, e.g. for huge messages
– e.g. Expectation Maximization (EM) is dominated by broadcasts and reductions
Not clearly a single exascale job, but rather many smaller (though not sequential) jobs, e.g. to analyze groups of sequences
Algorithms are not clearly robust enough to analyze lots of data
Current standard algorithms, such as those in the R library, were not designed for big data?
– Need to develop appliances and libraries supporting data intensive applications
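The broadcast/reduction structure above can be made concrete with a toy 1-D k-means step standing in for EM-style iteration: a small model (the centers) is conceptually broadcast to every data partition, each partition does local work, and one large reduction combines the partial sums. This is a sketch under those assumptions; the data and partitioning are placeholders, and a real run would distribute the partitions across nodes.

```python
# Sketch: one iteration dominated by a broadcast of a small model and a
# large reduction of partial sums -- few big collectives, not many small
# messages (toy 1-D k-means as a stand-in for EM-style iteration).
def kmeans_step(partitions, centers):
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for part in partitions:          # "broadcast": every partition sees centers
        for x in part:               # local map: assign point to nearest center
            k = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            sums[k] += x
            counts[k] += 1
    # the reduction: combine partial sums into the updated model
    return [s / c if c else centers[i]
            for i, (s, c) in enumerate(zip(sums, counts))]

print(kmeans_step([[1.0, 2.0], [8.0, 9.0]], [0.0, 10.0]))   # [1.5, 8.5]
```

Only the small `centers` list and the per-center partial sums ever cross partition boundaries, which is why such algorithms fit the Iterative MapReduce model named earlier rather than fine-grained MPI messaging.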

Computational Science as a Service
A traditional computer center has a variety of capabilities supporting (scientific computing / scholarly research) users
– Let's call this Computational Science as a Service
IaaS, PaaS and SaaS are lower-level parts of these capabilities, but commercial clouds do not include:
1) Developing roles/appliances for particular users
2) Supplying custom SaaS aimed at user communities
3) Community portals
4) Integration across disparate resources for data and compute (i.e. grids)
5) Data transfer and network link services
6) Archival storage, preservation
7) Consulting on the use of particular appliances and SaaS, i.e. on particular software components
8) Debugging and other problem solving
9) Administrative issues such as (local) accounting
This allows us to develop a new model of a computer center where commercial companies operate the base hardware/software
A combination of XSEDE, Internet2 and the computer center could supply 1) to 9)?

FutureGrid Key Concepts I
FutureGrid is an international testbed modeled on Grid5000
– July 2012: 223 projects, 968 users
Supporting international Computer Science and Computational Science research in cloud, grid and parallel computing (HPC)
FutureGrid has ~4400 distributed cores (including GPUs, large memory and disk) with a dedicated network and a Spirent XGEM network fault and delay generator
The FutureGrid testbed provides to the long tail of science:
– A user-customizable, interactive environment supporting Grid, Cloud and HPC software with and without virtualization
– A rich education and teaching platform for advanced cyberinfrastructure (computer/computational science) classes
– A library of images and appliances

FutureGrid Key Concepts II
9 partners
Rather than loading images onto VMs as in Amazon or Azure, FutureGrid supports Cloud, Grid and Parallel computing environments by provisioning software as needed onto bare metal using RAIN software
Supports appliances, with tutorials on their use
Use it at the Virtual Science Cloud Summer School, July 30 – August
– Several application talks
– 10 HBCU faculty from ADMI
[Figure: choose an image (Image1, Image2, …, ImageN), load it, run it]

5 Use Types for FutureGrid (red = latest)
223 approved projects (968 users), July 2012
– USA, China, India, Pakistan, lots of European countries
– Industry, Government, Academia
Training, Education and Outreach (8%) (10%)
– Semester and short events; interesting outreach to HBCUs
Interoperability test-beds (3%) (2%)
– Grids and Clouds; standards; from the Open Grid Forum (OGF)
Domain science applications (31%) (26%)
– Life science highlighted (18%) (14%), non-life science (13%) (12%)
Computer science (47%) (57%)
– Largest current category
Computer systems evaluation (27%) (29%)
– XSEDE (TIS, TAS), OSG, EGI
– See Andrew Grimshaw's discussion of XSEDE testing in the book chapter
