Https://portal.futuregrid.org 2020-2025 Scientific Computing Environments ( Distributed Computing in an Exascale era) August 7 2013 Geoffrey Fox

Slides:



Advertisements
Similar presentations
Large Scale Computing Systems
Advertisements

1 Challenges and New Trends in Data Intensive Science Panel at Data-aware Distributed Computing (DADC) Workshop HPDC Boston June Geoffrey Fox Community.
Current NIST Definition NIST Big data consists of advanced techniques that harness independent resources for building scalable data systems when the characteristics.
International Conference on Cloud and Green Computing (CGC2011, SCA2011, DASC2011, PICom2011, EmbeddedCom2011) University.
Clouds from FutureGrid’s Perspective April Geoffrey Fox Director, Digital Science Center, Pervasive.
Microsoft Technical Computing Modeling the world with greater fidelity Wolfgang Dreyer, TC - Microsoft Germany.
Authors: Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox Publish: HPDC'10, June 20–25, 2010, Chicago, Illinois, USA ACM Speaker: Jia Bao Lin.
Parallel Programming Henri Bal Rob van Nieuwpoort Vrije Universiteit Amsterdam Faculty of Sciences.
Master of Arts in Data Science Geoffrey Fox for Data Science Program March
April 2009 OSG Grid School - RDU 1 Open Science Grid John McGee – Renaissance Computing Institute University of North Carolina, Chapel.
Parallel Data Analysis from Multicore to Cloudy Grids Indiana University Geoffrey Fox, Xiaohong Qiu, Scott Beason, Seung-Hee.
Master of Arts in Data Science
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Cyberinfrastructure Supporting Social Science Cyberinfrastructure Workshop October Chicago Geoffrey Fox
Big Data in the Cloud: Research and Education September PPAM 2013 Warsaw Geoffrey Fox
3DAPAS/ECMLS panel Dynamic Distributed Data Intensive Analysis Environments for Life Sciences: June San Jose Geoffrey Fox, Shantenu Jha, Dan Katz,
1 Challenges Facing Modeling and Simulation in HPC Environments Panel remarks ECMS Multiconference HPCS 2008 Nicosia Cyprus June Geoffrey Fox Community.
Big Data and Clouds: Challenges and Opportunities NIST January Geoffrey Fox
X-Informatics Introduction: What is Big Data, Data Analytics and X-Informatics? January Geoffrey Fox
X-Informatics Cloud Technology (Continued) March Geoffrey Fox Associate.
Open Science Grid For CI-Days Internet2: Fall Member Meeting, 2007 John McGee – OSG Engagement Manager Renaissance Computing Institute.
Big Data. What is Big Data? Big Data Analytics: 11 Case Histories and Success Stories
Science Clouds and FutureGrid’s Perspective June Science Clouds Workshop HPDC 2012 Delft Geoffrey Fox
Low-Latency Accelerated Computing on GPUs
OpenQuake Infomall ACES Meeting Maui May Geoffrey Fox
Remarks on Big Data Clustering (and its visualization) Big Data and Extreme-scale Computing (BDEC) Charleston SC May Geoffrey Fox
Data Science at Digital Science October Geoffrey Fox Judy Qiu
ICETE 2012 Joint Conference on e-Business and Telecommunications Hotel Meliá Roma Aurelia Antica, Rome, Italy July
Some remarks on Use of Clouds to Support Long Tail of Science July XSEDE 2012 Chicago ILL July 2012 Geoffrey Fox.
X-Informatics MapReduce February Geoffrey Fox Associate Dean for Research.
SALSASALSASALSASALSA Cloud Panel Session CloudCom 2009 Beijing Jiaotong University Beijing December Geoffrey Fox
1 Multicore for Science Multicore Panel at eScience 2008 December Geoffrey Fox Community Grids Laboratory, School of informatics Indiana University.
Looking at Use Case 19, 20 Genomics 1st JTC 1 SGBD Meeting SDSC San Diego March Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox
Computing Research Testbeds as a Service: Supporting large scale Experiments and Testing SC12 Birds of a Feather November.
Security: systems, clouds, models, and privacy challenges iDASH Symposium San Diego CA October Geoffrey.
SALSASALSASALSASALSA Digital Science Center February 12, 2010, Bloomington Geoffrey Fox Judy Qiu
Big Data to Knowledge Panel SKG 2014 Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China August Geoffrey Fox
HPC in the Cloud – Clearing the Mist or Lost in the Fog Panel at SC11 Seattle November Geoffrey Fox
1 Cloud Systems Panel at HPDC Boston June Geoffrey Fox Community Grids Laboratory, School of informatics Indiana University
Training Data Scientists DELSA Workshop DW4 May Washington DC Geoffrey Fox Informatics, Computing.
Remarks on MOOC’s SC13 Birds of a Feather November Geoffrey Fox Informatics, Computing and Physics.
Directions in eScience Interoperability and Science Clouds June Interoperability in Action – Standards Implementation.
Big Data Open Source Software and Projects ABDS in Summary II: Layer 5 I590 Data Science Curriculum August Geoffrey Fox
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Data Science at Digital Science Center.
Big Data Analytics with Excel Peter Myers Bitwise Solutions.
Geoffrey Fox Panel Talk: February
Panel: Beyond Exascale Computing
Volunteer Computing for Science Gateways
Geoffrey Fox, Shantenu Jha, Dan Katz, Judy Qiu, Jon Weissman
Extreme Big Data Examples
NSF start October 1, 2014 Datanet: CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science Indiana University.
Digital Science Center I
I590 Data Science Curriculum August
Data Science Curriculum March
Biology MDS and Clustering Results
6 October 2016 Irmingard Eder Data Scientist, Munich Re
Hilton Hotel Honolulu Tapa Ballroom 2 June 26, 2017 Geoffrey Fox
Cloud Evolution Dennis Gannon
Scalable Parallel Interoperable Data Analytics Library
Clouds from FutureGrid’s Perspective
The two faces of Cyberinfrastructure: Grids (or Web 2
Discussion: Cloud Computing for an AI First Future
Services, Security, and Privacy in Cloud Computing
Department of Intelligent Systems Engineering
PolarGrid and FutureGrid
Panel on Research Challenges in Big Data
Digital Science Center
Cloud versus Cloud: How Will Cloud Computing Shape Our World?
Big Data, Simulations and HPC Convergence
Convergence of Big Data and Extreme Computing
Presentation transcript:

Scientific Computing Environments ( Distributed Computing in an Exascale era) August Geoffrey Fox School of Informatics and Computing Community Grids Laboratory Indiana University Bloomington

Scientific Computing Environments ( Distributed Computing in an Exascale era) 1)The components of national research computing in exascale era with mix of high end machines, clouds (whatever commercial companies offer broadly or publicly), university centers, high throughput systems and with growing amounts of distributed and "repositorized" data serving High End and Long Tail researchers. 2)The nature of an environment like XSEDE in the Exascale era; i.e. the nature of a distributed system of facilities including one or more exascale machines. Should it be relatively tightly coupled like XSEDE or more loosely coupled like DoE leadership systems (or both!)

Considerations Both of these major topics can be considered with attention to A) What services do 2025 science projects need from cyberinfrastructure; examples are -- Collaboration; On-demand computing; Digital observatory; High-speed scratch and persistent storage; Data preservation; Identity, profile, group management, reproducibility of results, versioning, and documentation of results B) What are requirements in are there changes in distributed system requirements outside details of exascale machines with their novel architecture e.g. a)Will big data lead to new requirements b)Will feeding/supporting exascale machine lead to new requirements c)Will supporting long tail of science lead to new requirements d)Can we do more to make people use central services rather than building their own

Speakers Miron Livny, Wisconsin Shantenu Jha, Rutgers Dennis Gannon, Microsoft Ioan Raicu, IIT Jim Pepin, Clemson 4

Network v Data v Compute Growth Moore’s Law Unnormalized Slope of #1 Top 500 > Total Data > Moore > IP Traffic 5

Trends Network, Data, Computing Data likely to get larger and produced all over the world i.e. stay distributed Network rise underlies MOOC’s and Cloud computing Not obvious that data/network increase any larger than computing Cisco network traffic < Moore’s Law IDC total data > (little bit) Moore’s Law Some areas of data like genomics and social images have seen huge (one time?) increases 6

7 Meeker/Wu May Internet Trends D11 Conference Zettabyte = 1000 Exabytes Exabyte = 1000 Petabytes Petabyte = 1000 Terabyte Terabyte = 1000 Gigabytes Gigabyte = 1000 Megabytes

8 Faster than Moore’s Law Slower?

9 Meeker/Wu May Internet Trends D11 Conference Factor of 2 per year faster than Computing 2013 JUST to date (May 2013)

Image based Computations Deep Learning with COTS HPC, Adam Coates, Brody Huval, Tao Wang, David J. Wu, Andrew Y. Ng and Bryan Catanzaro ICML 2013 (Stanford AI group) CoatesHuvalWangWuNgCatanzaro_icml2013.pdf 64 GPU’s on 16 nodes; MPI Speed up of 32; GPU parallelism “perfect” Train 11 BILLION parameters in 3 days on just 10 million 200 by 200 images from YouTube (note 500 million per day on FaceBook etc.) MPI Parallelism over pixels; GPU uses optimized Matrix-Matrix multiplication with Parallelism over Neuron banks and Images Earlier paper NIPS2012 using MapReduce variant with Google (Dean) had MUCH poorer performance on Intel style cores Next: Neural networks for driving: 100 million ~1000 by 1000 images 10

Is Big Data Changing Requirements? Will Compute/Data/Network ratios change? – Not obvious but needs more study I expect “Data Science” to grow and increased use of large scale data analytics as in deep learning and image clustering (100 million images, 10 million clusters) – Richer set of data areas and new users like AI/Image processing Compute requirements unclear for data analytics – Status of bringing data to computing still unclear – NIST BigData effort defining use cases and associated reference architecture So changes due to Big Data just because we haven’t got it right now However applications like LHC analysis and Long Tail Science will keep high throughput computing structure 11