Opportunities and Challenges in e_Science Fabrizio Gagliardi & Carlos Hulot Microsoft Corporation


Opportunities and Challenges in e_Science Fabrizio Gagliardi & Carlos Hulot Microsoft Corporation

Outline  Introductory remarks  Reviewing the emergence of e_Science: the intensive computing side; the massive data side  The opportunity of e_Science  The challenges of e_Science  A Microsoft contribution  Conclusions

Introductory remarks Who am I? A computer scientist who has spent 30 years at CERN (and in other scientific laboratories) developing HPC systems for physics and other sciences. Started in real-time computing, data acquisition and networking. Pioneered ES, AI, MPP systems, cluster computing and, in the last 7 years, Grid computing. Initiator of EU-DataGrid, EGEE and more than 10 other HPC and Grid projects (mostly within the EU IST programmes). Co-founder of the Global Grid Forum (started in Amsterdam in 2001, together with EU-DataGrid). See my recent article in IEEE Spectrum magazine (July 2006).

Introductory remarks (2) Joined Microsoft on 1 November 2005. Director in the Technical Computing team led by Tony Hey (Corporate VP). My mission: promoting Microsoft computing into science, and science into Microsoft computing, by exploring and building important collaborations with science in Europe, the Middle East, Africa and Latin America.

A New Science Paradigm  Thousand years ago: Experimental Science - description of natural phenomena  Last few hundred years: Theoretical Science - Newton's Laws, Maxwell's Equations, …  Last few decades: Computational Science - simulation of complex phenomena  Today: e-Science or Data-centric Science - unifying theory, experiment, and simulation, using massive computing and large-scale data exploration and mining: data captured by instruments, data generated by simulations, data generated by sensor networks  Scientists mostly work on computers (With thanks to Jim Gray)

Accelerating Discovery [Diagram: multidisciplinary research linking Life Sciences, Math and Physical Sciences, Social Sciences, Earth Sciences, Computer & Information Sciences, and New Materials, Technologies & Processes]

7 CERN LHC 40 million particle collisions every second, reduced by online computers to a few hundred "good" events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec: ~15 PetaBytes per year for all four experiments
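
The data-rate and yearly-volume figures above can be sanity-checked with simple arithmetic; the ~1e7 live seconds of data taking per year used below is an assumption for illustration, not a number from the slide.

```python
# Back-of-envelope check of the LHC data-rate figures on this slide.
# Assumed: ~1e7 live seconds of data taking per year (not stated on the slide).
LIVE_SECONDS_PER_YEAR = 1e7

def petabytes_per_year(rate_mb_per_s: float, live_seconds: float = LIVE_SECONDS_PER_YEAR) -> float:
    """Convert a sustained recording rate in MB/s into PB per year (1 PB = 1e9 MB)."""
    return rate_mb_per_s * live_seconds / 1e9

low = petabytes_per_year(100)    # low end of the 100-1,000 MB/s range
high = petabytes_per_year(1000)  # high end of the range
print(low, high)  # 1.0 10.0
```

With these assumptions a single experiment records roughly 1-10 PB per year, so ~15 PB per year across all four experiments is in the right ballpark.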

8 Technology evolution has helped… Cray Y-MP C916: 16 x vector, 4 GB, bus; UNICOS; ~10 GFlops; Top500 #1; $40,000,000; customers: government labs; applications: classified, climate, physics research. Sun HPC10000: 24 x 333 MHz UltraSPARC II, 24 GB, SBus; Solaris 2.5.1; ~10 GFlops; Top500 #500; $1,000,000 (40x drop); customers: large enterprises; applications: manufacturing, energy, finance, telecom. Small form factor PCs: 4 x 2.2 GHz Athlon64, 4 GB, GigE; Windows Server 2003 SP1; ~10 GFlops; Top500 N/A; < $4,000 (250x drop); customers: every engineer & scientist; applications: bioinformatics, materials sciences, digital media.

Top 500 Architectures / Systems

Enabling Grids for E-sciencE INFSO-RI High Energy Physics (LCG) LCG depends on two major science Grid infrastructures (plus regional Grids): EGEE - Enabling Grids for E-sciencE, and OSG - the US Open Science Grid. Scale (June 2006): ~200 sites in 40 countries, ~ CPUs, > 10 PB storage, > jobs per day, > 100 Virtual Organizations

Grids in Biomedical Sciences A multiplication of projects around the world - example: the National Bioinformatics Initiative in Holland. The example of EGEE: more than 20 applications in medical imaging, bioinformatics and drug discovery; large-scale deployment of in silico drug discovery initiatives. Impact of mutations on drug efficiency against H5N1. In silico docking on malaria, running on 5 grid infrastructures, is breaking the world record for in silico docking throughput. [Charts: docking and binding energy statistics (kcal/mol) for compounds against T01 and its E119A mutant]

12 Future ITER fusion reactor Applications with distributed calculations: Monte Carlo, separate estimates, …  Multiple ray tracing: e.g. TRUBA  Stellarator optimization: VMEC  Transport and kinetic theory: Monte Carlo codes

13 The data deluge e_Science is now dominated by huge amounts of data. Many discoveries are hidden in those data, but how do we organize, mine and understand them? And how do we address these issues in a scientist-friendly environment? This is where commodity computing tools developed by Microsoft for business and industry could help…

14 Data, Data, Data (slide courtesy of Carole Goble)

15 Let's put it in context… "Six weeks in the laboratory can save you six minutes at the computer" - Jeremy Zucker, Tom Knight (courtesy of Carole Goble)

16 [Figure courtesy of Carole Goble]

17 The opportunity in e_Science Replacing experimental activity (or part of it) with computing simulation and modelling based on large distributed computing infrastructures is what is now called e_Science. Allowing the sharing of resources - not only computing, but also data and people's knowledge - is what motivated the emergence of grid computing and the establishment of international virtual organisations, which replace local resident scientists. This is a major paradigm shift, which requires scientists to become experts in complex computing methods.

18 The challenges (still) in e_Science The applied scientist is obliged to also become a computer scientist. Far too much time is spent developing often over-engineered computing solutions, distracting applied scientists from their primary mission. This has shifted the conventional scientific computing paradigm, and could limit scientific discovery in the future and produce major setbacks.

19 The Problem for the e-Scientist Data ingest; managing petabytes; common schemas. How to organize it? How to reorganize it? How to coexist & cooperate with others?  Data query and visualization tools  Support/training  Performance: execute queries in a minute; batch (big) query scheduling. [Diagram: experiments & instruments, simulations, literature and other archives feed facts to the e-Scientist, who poses questions and gets back answers]

20 Can "here and now" technologies accelerate discovery? Can "business" tools and techniques for dealing with data be used in scientific research, to allow researchers to be scientists and not computer scientists…

21 [Diagram: a cycle linking Computational Modeling, Real-world Data, Interpretation & Insight, Persistent Distributed Data, and Workflow, Data Mining & Algorithms]

23 Conclusion We need to advance in making computing easy to use, so that scientists can concentrate their energy on their science rather than on the computing tools. Only in this way will e_Science be successful in accelerating discovery and producing new breakthroughs. Microsoft is making its first significant contributions, with a contribution to Grid standards (the OGF HPC profile) and its first HPC cluster product, MS CCS.

24 Windows Compute Cluster Server 2003 Launched in June 2006!

25 Microsoft Compute Cluster Server  Vision  A solution for applications that use compute-intensive tasks.  To help such applications scale out across a cluster of computers.  Mission Statement  Empowering end users by allowing them to easily harness distributed computing resources to solve complex problems.  Platform  Based on Windows Server 2003 SP1 64-bit Edition.  Support for Ethernet, InfiniBand and other interconnects (via Winsock Direct).  Administration  Simplified setup and administration.  Administration based on images + scripts.  Security based on Active Directory.  Job scheduling and resource administration.  Development  Cluster scheduler accessible via .NET and DCOM.  MPI2 stack with improved performance and security for parallel applications.  Visual Studio 2005 - OpenMP, parallel debugger.
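
The "compute-intensive tasks scaled out across a cluster" idea on this slide can be sketched in miniature. The snippet below is a generic task-farming illustration using Python's standard library; it is not the WCCS scheduler, its .NET API, or MPI.

```python
# Generic task-farming sketch: independent work items fanned out to a
# pool of workers, results gathered back. This stands in for the pattern
# the slide describes; it is not the WCCS/.NET scheduler API.
from multiprocessing import Pool

def simulate(task_id: int) -> int:
    """Placeholder for a compute-intensive task (here: a trivial sum)."""
    return sum(range(task_id * 1000))

if __name__ == "__main__":
    # 4 local worker processes stand in for 4 compute nodes.
    with Pool(processes=4) as pool:
        results = pool.map(simulate, range(8))
    print(len(results))  # one result per task
```

On a real cluster the scheduler plays the role of the pool: it assigns tasks to nodes and collects their results, so the scientist only describes the work items.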

26 Topology of WCCS

27 Communication Components  Computers in a cluster can be connected in one of the six communication topologies:  Star  Crossbar  Ring  2D Hypercube  Fully Connected  Mesh / Grid
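
Two of the topologies named above are easy to express as neighbor functions. This is an illustrative sketch of the interconnect patterns only (assumed for illustration); WCCS does not expose such an API.

```python
# Neighbor computation for two of the topologies on this slide:
# a ring, and a hypercube (each node connects to the nodes whose
# rank differs from its own in exactly one bit).

def ring_neighbors(rank: int, n: int) -> list[int]:
    """Neighbors of `rank` in an n-node ring."""
    return [(rank - 1) % n, (rank + 1) % n]

def hypercube_neighbors(rank: int, dims: int) -> list[int]:
    """Neighbors of `rank` in a `dims`-dimensional hypercube (2**dims nodes):
    flip each bit of the rank in turn."""
    return [rank ^ (1 << d) for d in range(dims)]

print(ring_neighbors(0, 6))       # [5, 1]
print(hypercube_neighbors(0, 2))  # [1, 2]
```

A fully connected topology would simply make every other rank a neighbor; the choice trades link count against communication latency.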

28 Some Details about Security  Permissions on files and folders on the file server that is connected to both the head nodes and the compute nodes.  Secure movement of files from personal computers back and forth to the secure file server.  Authentication of users on compute nodes so that jobs can be run remotely on these computers.  User management  Human and programming interfaces  Program run levels  User level, kernel, Admin mode  Dynamic access to resources

29 WCCS Components  Head Node  Compute Node  Job Scheduler  Management Infrastructure  Compute Cluster Administrator and Job Manager  Command Line Interface

30 Installing and Configuring the Head Node

31 Installing and Configuring the Head Node  Configuring the Cluster

32 Installing and Configuring the Head Node  Selecting Network Topology

33 Services on Nodes  Head Node  Compute Cluster Management Service  Compute Cluster Scheduler Service  Compute Cluster SDM Store Service  Compute Cluster MPI Service  Compute Cluster Node Manager Service  Compute Nodes  Compute Cluster Management Service  Compute Cluster MPI Service  Compute Cluster Node Manager Service
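
Comparing the two service lists on this slide as sets makes the head node's extra responsibilities explicit: only the head node runs the scheduler and the SDM store.

```python
# Service lists copied from the slide; a set difference shows what
# the head node runs that compute nodes do not.
head_node = {
    "Compute Cluster Management Service",
    "Compute Cluster Scheduler Service",
    "Compute Cluster SDM Store Service",
    "Compute Cluster MPI Service",
    "Compute Cluster Node Manager Service",
}
compute_node = {
    "Compute Cluster Management Service",
    "Compute Cluster MPI Service",
    "Compute Cluster Node Manager Service",
}

print(sorted(head_node - compute_node))
# ['Compute Cluster SDM Store Service', 'Compute Cluster Scheduler Service']
```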

34 Cluster Control

35 Running a sample program on the cluster

36 Management of WCCS  Remote Desktop Sessions

37 Management of WCCS  System Monitor  This page displays performance monitoring data for the cluster

38 Job Activation  State transitions during job execution on a compute node

39 Job life cycle in WCCS
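
The job life cycle sketched on these slides can be modeled as a small state machine. The state names and transitions below are a simplified sketch for illustration, not the exact WCCS job states.

```python
# Toy model of a scheduler's job life cycle (simplified; the real WCCS
# states and transitions differ in detail).
VALID_TRANSITIONS = {
    "Queued":    {"Running", "Cancelled"},
    "Running":   {"Finished", "Failed", "Cancelled"},
    "Finished":  set(),   # terminal
    "Failed":    set(),   # terminal
    "Cancelled": set(),   # terminal
}

class Job:
    def __init__(self, name: str):
        self.name = name
        self.state = "Queued"  # jobs enter the scheduler queue first

    def transition(self, new_state: str) -> None:
        """Move to `new_state`, rejecting transitions the scheduler forbids."""
        if new_state not in VALID_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

job = Job("sample")
job.transition("Running")
job.transition("Finished")
print(job.state)  # Finished
```

Modeling the life cycle this way makes the diagram's key point concrete: once a job reaches a terminal state, no further transitions are allowed.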

40 Create a new Job

41 Windows Compute Cluster Server 2003 Developing with Visual Studio 2005

42 Microsoft Academic Programs  WCCS 2003: access for academia, free for non-commercial use

43 Windows Compute Cluster Server 2003 Thank you! Carlos Hulot, New Technologies & Platform Manager, Microsoft Brasil. Microsoft HPC website; public newsgroup: nntp://microsoft.public.windows.hpc; Comunidade Acadêmica | Brasil