Published by Antoine Tibbals. Modified over 9 years ago.
1
Opportunities and Challenges in e_Science
Fabrizio Gagliardi & Carlos Hulot
Microsoft Corporation
2
Outline
- Introductory remarks
- Reviewing emergence of e_Science
  - the intensive computing side
  - the massive data side
- The opportunity of e_Science
- The challenges of e_Science
- A Microsoft contribution
- Conclusions
3
Introductory remarks
Who am I?
- A computer scientist who has spent 30 years at CERN (and in other scientific laboratories) developing HPC systems for physics and other sciences
- Started in real-time, data acquisition and networking
- Pioneered ES, AI, MPP systems, cluster computing and, in the last 7 years, Grid computing
- Initiator of EU-DataGrid, EGEE and more than 10 other HPC and Grid projects (mostly within the EU IST programmes)
- Co-founder of the Global Grid Forum (started in Amsterdam in 2001 together with EU-DataGrid)
- See my latest article in IEEE Spectrum Magazine (July 2006)
4
Introductory remarks 2
- Joined Microsoft on 1 November 2005
- My mission: promoting Microsoft Computing into Science and Science into Microsoft Computing by exploring and building important collaborations with science in Europe, the Middle East, Africa and Latin America
- Director in the Technical Computing team led by Tony Hey (Corporate VP)
5
A New Science Paradigm
- A thousand years ago: Experimental Science
  - description of natural phenomena
- Last few hundred years: Theoretical Science
  - Newton’s Laws, Maxwell’s Equations …
- Last few decades: Computational Science
  - simulation of complex phenomena
- Today: e-Science or Data-centric Science
  - unify theory, experiment, and simulation
  - using massive computing and large data exploration and mining:
    - Data captured by instruments
    - Data generated by simulations
    - Data generated by sensor networks
  - Scientists mostly work on computers
(With thanks to Jim Gray)
6
Accelerating Discovery
Multidisciplinary Research:
- Life Sciences
- New Materials, Technologies & Processes
- Math and Physical Science
- Social Sciences
- Earth Sciences
- Computer & Information Sciences
7
CERN LHC
- 40 million particle collisions every second
- Reduced by online computers to a few hundred “good” events per second
- These events are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec
- ~15 PetaBytes per year for all four experiments
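As a rough consistency check of the LHC figures, the sketch below converts the yearly volume into an average recording rate and computes the online reduction factor. It assumes decimal units (1 MB = 10**6 bytes, 1 PB = 10**15 bytes) and a 365-day year, neither of which is stated on the slide. The function names are illustrative only.

```python
# Consistency check of the LHC data figures quoted on the slide.
# Assumptions (not from the slide): decimal units and a 365-day year.

SECONDS_PER_YEAR = 365 * 24 * 3600


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average recording rate if the yearly volume were spread evenly over a year."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


def reduction_factor(collisions_per_s: float, kept_events_per_s: float) -> float:
    """Number of collisions that occur for each event that is kept."""
    return collisions_per_s / kept_events_per_s


if __name__ == "__main__":
    rate = average_rate_mb_per_s(15)
    print(f"15 PB/year averages about {rate:.0f} MB/s")
    # "A few hundred" is not a precise number; 300 is an illustrative assumption.
    factor = reduction_factor(40e6, 300)
    print(f"Online selection keeps roughly 1 in {factor:,.0f} collisions")
```

Under these assumptions, 15 PB per year corresponds to an average of roughly 476 MB/s, which falls inside the quoted 100-1,000 MB/s range.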
8
Technology evolution has helped…
Year: 1991 | 1998 | 2005
System: Cray Y-MP C916 | Sun HPC10000 | Small Form Factor PCs
Architecture: 16 x Vector, 4GB, Bus | 24 x 333MHz Ultra-SPARCII, 24GB, SBus | 4 x 2.2GHz Athlon64, 4GB, GigE
OS: UNICOS | Solaris 2.5.1 | Windows Server 2003 SP1
GFlops: ~10
Top500 #: 1 | 500 | N/A
Price: $40,000,000 | $1,000,000 (40x drop) | < $4,000 (250x drop)
Customers: Government Labs | Large Enterprises | Every Engineer & Scientist
Applications: Classified, Climate, Physics Research | Manufacturing, Energy, Finance, Telecom | Bioinformatics, Materials Sciences, Digital Media
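The price drops quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes that the single "~10 GFlops" figure applies to all three systems, and it uses $4,000 as the 2005 price even though the slide gives it only as an upper bound ("< $4,000").

```python
# Price per GFlop for the three systems on the slide.
# Assumption: the single "~10 GFlops" figure applies to every system.
GFLOPS = 10

SYSTEMS = [
    ("Cray Y-MP C916 (1991)", 40_000_000),
    ("Sun HPC10000 (1998)", 1_000_000),
    ("Small Form Factor PCs (2005)", 4_000),  # slide says "< $4,000"; upper bound used
]


def price_per_gflop(price_usd: float, gflops: float = GFLOPS) -> float:
    """Dollars paid per GFlop of performance."""
    return price_usd / gflops


def drop_factors(prices: list) -> list:
    """Ratio between each price and the next one in the list."""
    return [previous / current for previous, current in zip(prices, prices[1:])]


if __name__ == "__main__":
    for name, price in SYSTEMS:
        print(f"{name}: ${price_per_gflop(price):,.0f} per GFlop")
    print("Drop factors:", drop_factors([price for _, price in SYSTEMS]))
```

With these inputs the two ratios come out at 40 and 250, matching the "40x drop" and "250x drop" annotations.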
9
Top 500 Architectures / Systems
10
Enabling Grids for E-sciencE INFSO-RI-508833
High Energy Physics (LCG)
LCG depends on two major science Grid infrastructures (plus regional Grids):
- EGEE - Enabling Grids for E-Science
- OSG - US Open Science Grid
Scale (June 2006):
- ~ 200 sites in 40 countries
- ~ 25 000 CPUs
- > 10 PB storage
- > 35 000 jobs per day
- > 100 Virtual Organizations
11
Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688
Grids in Biomedical Sciences
- A multiplication of projects around the world
  – Example: the National Bioinformatics Initiative in Holland
- The example of EGEE
  – More than 20 applications in medical imaging, bioinformatics and drug discovery
  – Large scale deployment of in silico drug discovery initiatives
- Impact of mutations on drug efficiency against H5N1
  [Figure: T01 (E119A) energy statistics; number of compounds versus docking energy and binding energy (kcal/mol)]
- In silico docking on malaria on 5 grid infrastructures is breaking the world record for in silico docking throughput
12
Future ITER Fusion reactor
Applications with distributed calculations: Monte Carlo, separate estimates, …
- Multiple Ray Tracing: e.g. TRUBA
- Stellarator Optimization: VMEC
- Transport and Kinetic Theory: Monte Carlo Codes
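To illustrate why Monte Carlo calculations with separate estimates suit distributed infrastructures, here is a minimal sketch. It is a toy estimate of pi, not one of the fusion codes named on the slide. Each call to `estimate_pi` depends only on its own seed, so on a grid or cluster each call could run as an independent job; here the jobs run in a plain loop. All names are hypothetical.

```python
import random


def estimate_pi(seed: int, samples: int) -> float:
    """One independent Monte Carlo job: estimate pi from random points in the unit square."""
    rng = random.Random(seed)  # private generator, so jobs share no state
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def combine(estimates: list) -> float:
    """Merge separate estimates of equal sample size by averaging them."""
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Each seed stands for one job that could be sent to a different node.
    partial = [estimate_pi(seed, 50_000) for seed in range(4)]
    print("Separate estimates:", partial)
    print("Combined estimate:", combine(partial))
```

Because the jobs exchange no data until the final averaging step, adding more nodes increases the number of samples without adding communication cost.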
13
The data deluge
- e_Science is now dominated by huge amounts of data
- Many discoveries are hidden in those data, but…
- How to organize, mine and understand the data?
- How to address the above issues in a scientist-friendly environment? This is where commodity computing tools developed by Microsoft for business and industry could help…
14
Data, Data, Data
Courtesy of Carole Goble
15
Let’s put it in context…
“Six weeks in the laboratory can save you six minutes at the computer”
Jeremy Zucker, Tom Knight
Courtesy of Carole Goble
16
Courtesy of Carole Goble
17
The opportunity in e_Science
- Replacing experimental activity (or part of it) with computing simulation and modelling based on large distributed computing infrastructures is what is now called e_Science
- Allowing sharing of resources, not only computing but also data and people’s knowledge, is what motivated the emergence of grid computing and the establishment of international virtual organisations, which replace local resident scientists
- This is a major paradigm shift, which requires scientists to become experts in complex computing methods
18
The challenges (still) in e_Science
- The applied scientist is obliged to become a computer scientist as well
- Far too much time is spent developing often over-engineered computing solutions, distracting the applied scientist from their primary mission
- This has shifted the conventional scientific computing paradigm and could limit scientific discovery in the future and produce major setbacks
19
The Problem for the e-Scientist
- Data ingest
- Managing Petabytes
- Common schemas
- How to organize it?
- How to reorganize it?
- How to coexist & cooperate with others?
- Data Query and Visualization tools
- Support/training
- Performance
  - Execute queries in a minute
  - Batch (big) query scheduling
[Diagram labels: Experiments & Instruments; Simulations; Literature; Other Archives; facts; questions; answers]
20
Can “Here and Now” technologies accelerate discovery?
Can “Business” tools and techniques for dealing with [...] be used in scientific research to allow researchers to be scientists and not computer scientists…
21
[Diagram labels: Computational Modeling; Real-world Data; Interpretation & Insight; Persistent Distributed Data; Workflow, Data Mining & Algorithms]
23
Conclusion
- We need to make computing easier to use, so that scientists can concentrate their energy on their science rather than on the computing tools
- Only in this way will e_Science be successful in accelerating discovery and producing new breakthroughs
- Microsoft is making its first significant contributions, with a contribution to Grid standards (OGF HPC profile) and its first HPC cluster products (MS CCS)
24
Windows Compute Cluster Server 2003
Launched in June 2006!!!
25
www.microsoft.com/hpc
Microsoft Compute Cluster Server
- Vision: a solution for applications that run compute-intensive tasks, helping them scale using a cluster of computers.
- Mission Statement: empowering end users by allowing them to easily harness distributed computing resources to solve complex problems.
- Platform
  - Based on Windows Server 2003 SP1 64-bit Edition.
  - Support for Ethernet, InfiniBand and others (better than Winsock Direct).
- Administration
  - Setup and administration simplified.
  - Administration based on images + scripts.
  - Security based on Active Directory.
  - Job scheduling and resource administration.
- Development
  - Cluster scheduler via .NET and DCOM.
  - MPI2 stack with better performance and security for parallel applications.
  - Visual Studio 2005: OpenMP, Parallel Debugger.
26
Topology of WCCS
27
Communication Components
Computers in a cluster can be connected in one of six communication topologies:
- Star
- Crossbar
- Ring
- 2D Mesh / Grid
- Hypercube
- Fully Connected
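To make the trade-off between topologies concrete, the sketch below builds the point-to-point links for several of the topologies named on this slide and counts them. It is a generic graph illustration, not WCCS configuration code. The crossbar is left out because it relies on a switch rather than direct node-to-node links, and all function names are hypothetical.

```python
from itertools import combinations


def star(n: int) -> set:
    """Node 0 is the hub; every other node links only to it."""
    return {(0, i) for i in range(1, n)}


def ring(n: int) -> set:
    """Each node links to its successor, wrapping around at the end."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def fully_connected(n: int) -> set:
    """Every pair of nodes shares a direct link."""
    return set(combinations(range(n), 2))


def hypercube(dim: int) -> set:
    """Nodes are linked when their binary labels differ in exactly one bit."""
    links = set()
    for node in range(1 << dim):
        for bit in range(dim):
            neighbour = node ^ (1 << bit)
            if node < neighbour:
                links.add((node, neighbour))
    return links


def mesh_2d(rows: int, cols: int) -> set:
    """Nodes on a grid link to their right and lower neighbours."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                links.add((node, node + 1))
            if r + 1 < rows:
                links.add((node, node + cols))
    return links


if __name__ == "__main__":
    # Link counts for an 8-node cluster in each topology.
    for name, links in [("star", star(8)), ("ring", ring(8)),
                        ("2D mesh 2x4", mesh_2d(2, 4)), ("hypercube", hypercube(3)),
                        ("fully connected", fully_connected(8))]:
        print(f"{name}: {len(links)} links")
```

Fewer links mean cheaper cabling but longer paths between nodes, while a fully connected layout minimises hops at the cost of a link count that grows quadratically with cluster size.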
28
Some Details about Security
- Permissions on files and folders on the file server that is connected to both the head nodes and the compute nodes.
- Secure movement of files from personal computers back and forth to the secure file server.
- Authentication of users on compute nodes so that jobs can be run remotely on these computers.
- User management
  - Human and programming interfaces
- Program run levels
  - User level, kernel, Admin mode
- Dynamic access to resources
29
WCCS Components
- Head Node
- Compute Node
- Job Scheduler
- Management Infrastructure
- Compute Cluster Administrator and Job Manager
- Command Line Interface
30
Installing and Configuring Head Node
[Diagram labels: Head Node; Node]
31
Installing and Configuring Head Node: Configuring the Cluster
32
Installing and Configuring Head Node: Selecting Network Topology
33
Services on Nodes
Head Node:
- Compute Cluster Management Service
- Compute Cluster Scheduler Service
- Compute Cluster SDM Store Service
- Compute Cluster MPI Service
- Compute Cluster Node Manager Service
Compute Nodes:
- Compute Cluster Management Service
- Compute Cluster MPI Service
- Compute Cluster Node Manager Service
34
Cluster Control
35
Run a sample code on the Cluster
36
Management of WCCS: Remote Desktop Sessions
37
Management of WCCS: System Monitor
This page displays performance monitoring data for the cluster
38
Job Activation
State transition during job execution on compute node
39
Job life cycle in WCCS
40
Create a new Job
41
Windows Compute Cluster Server 2003
Developing using Visual Studio 2005
42
Microsoft Academic Programs
WCCS 2003 access for academia: free for non-commercial use
43
Windows Compute Cluster Server 2003
Thank you!!!
Carlos Hulot
New Technologies & Platform Manager, Microsoft Brasil
cahulot@microsoft.com
www.microsoft.com/hpc
- Microsoft HPC website: http://www.microsoft.com/hpc
- Public Newsgroup: nntp://microsoft.public.windows.hpc
- Comunidade Acadêmica | Brasil (Academic Community, Brazil): http://www.microsoft.com/brasil/comunidadeacademica