Slide 1: The Status of the EU DataGrid Project
Presented by Bob Jones, CERN Technical Coordinator (bob.jones@cern.ch)
CERN, November 2001
Slide 2: Main project goals and characteristics
- To build a significant prototype of the LHC computing model
- To collaborate with and complement other European and US projects
- To develop a sustainable computing model applicable to other sciences and industry: biology, Earth observation, etc.
- Specific project objectives:
  - Middleware for fabric and Grid management (mostly funded by the EU): evaluation, testing and integration of existing middleware, plus research and development of new software as appropriate
  - Large-scale testbed (mostly funded by the partners)
  - Production-quality demonstrations (partially funded by the EU)
- Open source and technology transfer: Global Grid Forum, Industry and Research Forum
Slide 3: Main Partners
- CERN – International (Switzerland/France)
- CNRS – France
- ESA/ESRIN – International (Italy)
- INFN – Italy
- NIKHEF – The Netherlands
- PPARC – UK
Slide 4: Participants
- Main partners: CERN, INFN (Italy), CNRS (France), PPARC (UK), NIKHEF (Netherlands), ESA (Earth Observation)
- Other sciences: KNMI (NL), biology, medicine
- Industrial participation: CS SI (France), DataMat (Italy), IBM (UK)
- Associated partners: Czech Republic, Finland, Germany, Hungary, Spain, Sweden (mostly computer scientists)
- Formal collaboration with the USA established
- Industry and Research Project Forum with representatives from Denmark, Greece, Israel, Japan, Norway, Poland, Portugal, Russia, Switzerland
Slide 5: Project Scope
- 9.8 million euros of EU funding over three years
- 90% for middleware and applications (HEP, Earth observation and biology)
- Three-year phased development and demonstrations (2001-2003)
- Possible extensions (time and funds) on the basis of first successful results: DataTAG (2002-2003), CrossGrid (2002-2004), GridStart (2002-2004), ...
Slide 6: Programme of work
Middleware:
- WP1 Grid Workload Management – F. Prelz (INFN)
- WP2 Grid Data Management – P. Kunszt (CERN)
- WP3 Grid Monitoring Services – S. Fisher (RAL)
- WP4 Fabric Management – O. Barring (CERN)
- WP5 Mass Storage Management – J. Gordon (RAL)
Testbed:
- WP6 Testbed Integration – F. Etienne (CNRS)
- WP7 Network Services – C. Michau (CNRS)
Scientific applications:
- WP8 HEP Applications – F. Carminati (CERN)
- WP9 Earth Observation Applications – L. Fusco (ESA/ESRIN)
- WP10 Biology Applications – C. Michau (CNRS)
Dissemination:
- WP11 – M. Lancia (CNR)
Project management:
- WP12 – F. Gagliardi (CERN)
Slide 7: WP1 – Grid Workload Management
Goal:
- Define and implement a suitable architecture for distributed scheduling and resource management in a Grid environment
Issues:
- Optimal co-allocation of data, CPU and network for specific grid/network-aware jobs
- Distributed scheduling (data/code migration) of unscheduled/scheduled jobs (a matchmaking sketch follows below)
- Uniform interface to various local resource managers
- Priorities and policies on resource (CPU, data, network) usage
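To make the co-allocation and scheduling idea concrete, here is a minimal matchmaking sketch: filter computing elements by a job's hard requirements, then rank the survivors by data proximity and free capacity. All class names and attributes are hypothetical illustrations, not the WP1 broker's actual design or API.

```python
# Hypothetical matchmaking sketch: pick a computing element (CE) that meets a
# job's requirements, preferring CEs close to the input data. Names and
# attributes are made up for illustration; this is not EDG code.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ComputingElement:
    name: str
    os: str
    free_cpus: int
    close_storage: set[str]   # storage elements with fast local access

@dataclass
class Job:
    required_os: str
    min_cpus: int
    input_replicas: set[str]  # storage elements holding the input data

def match(job: Job, ces: list[ComputingElement]) -> ComputingElement | None:
    # Keep only CEs that satisfy the hard requirements.
    candidates = [ce for ce in ces
                  if ce.os == job.required_os and ce.free_cpus >= job.min_cpus]
    if not candidates:
        return None
    # Rank: prefer CEs close to the input data, then by free capacity.
    def rank(ce: ComputingElement) -> tuple[int, int]:
        return (len(ce.close_storage & job.input_replicas), ce.free_cpus)
    return max(candidates, key=rank)

if __name__ == "__main__":
    ces = [ComputingElement("ce.cern.ch", "linux", 12, {"se.cern.ch"}),
           ComputingElement("ce.in2p3.fr", "linux", 40, {"se.in2p3.fr"})]
    job = Job(required_os="linux", min_cpus=4, input_replicas={"se.cern.ch"})
    print(match(job, ces).name)   # -> ce.cern.ch (data proximity wins)
```

A real broker would of course also weigh network cost and site policies; the point of the sketch is only the filter-then-rank structure.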
Slide 8: WP2 – Grid Data Management
Goal:
- Provide tools and middleware infrastructure to coherently manage and share petabyte-scale information volumes in high-throughput, production-quality Grid environments
Issues:
- Data replication: how to maintain consistent, up-to-date catalogues of application data and its replicas (see the replica-catalogue sketch below)
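The core bookkeeping problem is a mapping from each logical file name to the physical copies registered on storage elements. The toy class below shows only that data structure; it is not the GDMP or EDG replica-catalogue interface, and the file names and URLs are invented.

```python
# Illustrative replica-catalogue sketch: map a logical file name (LFN) to the
# physical copies registered on storage elements. Not the EDG/GDMP interface.
from collections import defaultdict

class ReplicaCatalogue:
    def __init__(self):
        self._replicas = defaultdict(set)   # LFN -> set of physical URLs

    def register(self, lfn: str, physical_url: str) -> None:
        """Record that a copy of the logical file exists at physical_url."""
        self._replicas[lfn].add(physical_url)

    def unregister(self, lfn: str, physical_url: str) -> None:
        """Remove a stale or deleted copy from the catalogue."""
        self._replicas[lfn].discard(physical_url)

    def lookup(self, lfn: str) -> set:
        """Return every known physical location of a logical file."""
        return set(self._replicas.get(lfn, set()))

catalogue = ReplicaCatalogue()
catalogue.register("lfn:higgs-candidates.root", "gsiftp://se.cern.ch/data/f1")
catalogue.register("lfn:higgs-candidates.root", "gsiftp://se.in2p3.fr/data/f1")
print(catalogue.lookup("lfn:higgs-candidates.root"))
```

The hard part named on the slide, keeping such a catalogue consistent while replicas are created and deleted at many sites, is exactly what the in-memory toy glosses over.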
Slide 9: WP3 – Grid Monitoring Services
Goals:
- Provide tools and infrastructure to enable end-user and administrator access to status and error information in a Grid environment
- Permit job performance optimisation and problem tracing, to facilitate high-performance Grid computing
Issues:
- How to provide a scalable, structured information system capable of integrating with the various Grid components (a producer/consumer sketch follows below)
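One common pattern for such an information system is producer/consumer publication: sites publish typed status records, and clients query them. The sketch below is purely illustrative of that pattern and is not WP3's actual design; record kinds and attributes are invented.

```python
# Generic producer/consumer sketch for grid status information: producers
# publish typed records, consumers query them. Illustrative only.
import time
from typing import Any

class InformationSystem:
    def __init__(self):
        self._records: list = []

    def publish(self, source: str, kind: str, **attributes: Any) -> None:
        """A producer (e.g. a site sensor) publishes one status record."""
        self._records.append(
            {"source": source, "kind": kind, "time": time.time(), **attributes})

    def query(self, kind: str, **constraints: Any) -> list:
        """A consumer asks for all records of a kind matching the constraints."""
        return [r for r in self._records
                if r["kind"] == kind
                and all(r.get(k) == v for k, v in constraints.items())]

infosys = InformationSystem()
infosys.publish("ce.cern.ch", "ce_status", free_cpus=12, queue="short")
infosys.publish("se.cern.ch", "se_status", free_tb=4.2)
print(infosys.query("ce_status", queue="short"))
```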
Slide 10: WP4 – Fabric Management
Goals:
- Facilitate high-performance Grid computing through effective local site management
- Permit job performance optimisation and problem tracing at local sites
- Building on the partners' experience in managing clusters of several hundred nodes, provide all the tools necessary to manage a site offering Grid services on thousands of nodes
Issues:
- How to install the reference platform and EDG software on large numbers of hosts with minimal human intervention per node
- How to ensure node configurations stay consistent and how to handle updates to the software suites
Slide 11: WP5 – Mass Storage Management
Goals:
- Provide extra functionality through common user and data export/import interfaces to all the different local mass storage systems used by the project partners
- Ease integration of local mass storage systems with the Grid data management system by using these interfaces and publishing information
Issues:
- How to interface the many mass storage systems to the Grid and provide mechanisms for interrogating their status (see the interface sketch below)
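The "common interface" idea amounts to a uniform set of operations (stage in, stage out, status) in front of heterogeneous site-specific systems. The sketch below is a hypothetical illustration of that pattern; the method names and the placeholder backend are not WP5's real interfaces.

```python
# Hypothetical uniform interface in front of heterogeneous mass storage
# systems (tape robots, disk pools, HSM software). Method names and the
# example backend are illustrative, not WP5's actual API.
from abc import ABC, abstractmethod

class MassStorageSystem(ABC):
    @abstractmethod
    def stage_in(self, path: str) -> str:
        """Bring a file online (e.g. from tape) and return a local URL."""

    @abstractmethod
    def stage_out(self, local_url: str, path: str) -> None:
        """Archive a file into the mass storage system."""

    @abstractmethod
    def status(self, path: str) -> dict:
        """Report whether the file is on disk, on tape, or being staged."""

class ExampleTapeBackend(MassStorageSystem):
    # One concrete backend per site-specific system; bodies are placeholders.
    def stage_in(self, path: str) -> str:
        return f"file:///stage-cache{path}"
    def stage_out(self, local_url: str, path: str) -> None:
        pass
    def status(self, path: str) -> dict:
        return {"path": path, "state": "ONLINE"}

def fetch(mss: MassStorageSystem, path: str) -> str:
    # Grid data management only sees the common interface, never the backend.
    return mss.stage_in(path)

print(fetch(ExampleTapeBackend(), "/cern.ch/data/run1234.raw"))
```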
Slide 12: WP6 – Integration Testbed
Goals:
- Plan, organise and enable testbeds for the end-to-end application experiments, which will demonstrate the effectiveness of the DataGrid in production-quality operation over high-performance networks
- Integrate successive releases of the software components from each of the development work packages
- Demonstrate, by the end of the project, testbeds operating as production facilities for real end-to-end applications over large trans-European and potentially global high-performance networks
Issues:
- How to bring together software components from multiple sites into a coherent, working testbed, deployed at multiple sites, on which the application groups can perform useful work
Slide 13: WP7 – Networking Services
Goals:
- Review the network service requirements of DataGrid, then make detailed plans in collaboration with the European and national actors involved
- Establish and manage the DataGrid network facilities
- Monitor the traffic and performance of the network; develop models and provide tools and data for planning future networks that satisfy the requirements of data-intensive grids
- Deal with the distributed security aspects of DataGrid
Issues:
- Dealing with the various national bodies and other institutions to ensure that sufficient network capacity, with the appropriate characteristics, is available to the application groups
- Working in close coordination with GEANT
Slide 14: Earth Observation and Biology Science Applications
Earth Observation (WP9):
- A good opportunity to exploit Earth Observation science applications that require large computational power and access to large data files distributed over geographically dispersed archives (e.g. numerical weather and climate models)
Biology Science (WP10):
- Production, analysis and data mining of data produced within genome-sequencing projects or in high-throughput projects for the determination of three-dimensional macromolecular structures
- Production, storage, comparison and retrieval of measures of genetic expression levels obtained through micro-array-based gene-profiling systems, or through techniques that involve the massive production of non-textual data such as still images or video
Slide 15: Status
- EU contract signed on 29 December 2000
- Project started on 1 January 2001
- Work ramping up at CERN and the collaborating institutes (initial Globus installation, tests and prototype production)
- International testbed infrastructure being deployed
- The Architecture Task Force has produced the second version of the architecture document
- First (internal) milestone at PM9: Testbed 1
Slide 16: Testbed Schedule
- Testbed 0 (early 2001): international Testbed 0 infrastructure deployed; Globus 1 only, no EDG middleware
- Testbed 1 (now): first release of EU DataGrid software to defined users within the project: HEP experiments (WP8), Earth observation (WP9) and biology applications (WP10)
- Testbed 2 (September 2002): builds on Testbed 1 to extend the facilities of DataGrid
- Testbed 3 (March 2003) and Testbed 4 (September 2003)
Slide 17: DataGrid status
- Preliminary architecture defined – enough to deploy Testbed 1
- First middleware delivery: GDMP, first workload management system, fabric management tools, Globus installation (including certification and authorisation), Condor tools
- First application test cases ready; long-term cases defined
- Integration team actively building Testbed 1:
  - WP8/WP9/WP10: PierGiorgio Cerello, Eric Van Herwijnen, Julian Lindford, Andrea Parrini, Yannick Legre
  - WP6: Brian Coghlan, Flavia Donno, Eric Fede, Fabio Hernandez, Nadia Lajili, Charles Loomis, Pietro Paolo Martucci, Andrew McNab, Sophie Nicoud, Yannik Patois, Anders Waananen
  - WP1/WP2/WP3/WP4/WP5/WP7: Elisabetta Ronchieri, Shahzad Muzaffar, Alex Martin, Maite Barroso Lopez, Jean Philippe Baud, Frank Bonnassieux
Slide 18: EU DataGrid Architecture (layer diagram)
- Local computing: local application, local database
- Grid application layer: job management, data management, metadata management, object-to-file mapping
- Collective services: grid scheduler, replica manager, information & monitoring
- Underlying grid services: computing element services, storage element services, replica catalogue, authorisation/authentication and accounting, service index, SQL database services
- Grid fabric services: resource management, configuration management, node installation & management, monitoring and fault tolerance, fabric storage management
Slide 19: Testbed 1 Approach
- Software integration: combines software from each middleware work package with the underlying external toolkits (e.g. Globus); performed by the integration team at CERN on a cluster of 10 Linux PCs
- Basic integration tests: performed by the integration team to verify basic functionality
- Validation tests: application groups use Testbed 1 to exercise their application software, e.g. LHC experiments run jobs using their offline software suites on Testbed 1 sites
Slide 20: Detailed Testbed 1 Schedule
- October 1: intensive integration starts, based on Globus 2
- November 1: first beta release of DataGrid (CERN and Lyon); timing depends on the changes needed to move from Globus 1 to Globus 2
- November 15: initial limited application testing finished; DataGrid ready for deployment on partner sites (~5 sites)
- November 30: widespread deployment; code/machines split for development; Testbed 1 open to all applications (~40 sites)
- December 29: we are done!
Slide 21: Testbed 1 Sites
- First round (15 November): CERN, Lyon, RAL, Bologna
- Second round (30 November):
  - Netherlands: NIKHEF
  - UK: see John Gordon's talk
  - Italy: 6-7 sites: Catania, Legnaro/Padova, Milan, Pisa, Rome, Turin, Cagliari?
  - France: Ecole Polytechnique
  - Russia: Moscow
  - Spain: Barcelona?
  - Scandinavia: Lund?
  - WP9 (GOME): ESA, KNMI, IPSL, ENEA
Slide 22: Licenses & Copyrights
- Package repository and website: provides access to the packaged Globus, DataGrid and required external software; all software is packaged as source and binary RPMs
- Copyright statement: Copyright (c) 2001 EU DataGrid – see http://www.edg.org/license.html
- License: will be the same as (or very similar to) the Globus license, a BSD-style license that puts few restrictions on use
- Condor-G (used by WP1): not open source or redistributable; through a special agreement it can be redistributed within DataGrid
- LCFG (used by WP4): uses the GPL
Slide 23: Security
- The EDG software supports many Certification Authorities (CAs) from the various partners involved in the project (http://marianne.in2p3.fr/datagrid/ca/ca-table-ca.html), but not the Globus CA
- For a machine to participate as a Testbed 1 resource, all the CAs must be enabled; all CA certificates can be installed without compromising local site security
- Each host running a Grid service needs to be able to authenticate users and other hosts; the site manager retains full control over security for local nodes (see the authorisation sketch below)
- A Virtual Organisation (VO) represents a community of users; there are 6 VOs for Testbed 1: 4 HEP (ALICE, ATLAS, CMS, LHCb), 1 Earth observation, 1 biology
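To show how trusted CAs and VOs fit together at a host, here is a small authorisation sketch: accept only certificates issued by a trusted CA, then map the certificate subject to a local account via its VO. It mirrors the grid-mapfile idea in spirit but is not the actual EDG mechanism; all CA names, subjects and accounts are invented.

```python
# Illustrative VO-based authorisation: trust check on the issuing CA, then
# subject -> VO -> local account mapping. Not EDG code; names are made up.
TRUSTED_CAS = {"/C=CH/O=CERN/CN=Example CERN CA", "/C=FR/O=CNRS/CN=Example CNRS CA"}

VO_MEMBERS = {
    "/O=cern/CN=Alice Physicist": "atlas",
    "/O=in2p3/CN=Bob Biologist": "biomed",
}

VO_LOCAL_ACCOUNT = {"atlas": "atlas001", "biomed": "bio001"}

def authorise(issuer: str, subject: str) -> str:
    """Return the local account to run as, or raise if access is denied."""
    if issuer not in TRUSTED_CAS:
        raise PermissionError(f"certificate issuer not trusted: {issuer}")
    vo = VO_MEMBERS.get(subject)
    if vo is None:
        raise PermissionError(f"subject not registered in any VO: {subject}")
    return VO_LOCAL_ACCOUNT[vo]

print(authorise("/C=CH/O=CERN/CN=Example CERN CA", "/O=cern/CN=Alice Physicist"))
```

The slide's point that site managers keep full control corresponds to the fact that the trust list and account mapping live on, and are edited by, the local site.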
Slide 24: Node configuration and installation tools
- For the reference platform (Linux Red Hat 6.2)
- Initial installation tool based on system-image cloning
- LCFG (Edinburgh University) for software updates and maintenance
- LCFG flow (diagram): LCFG configuration files on the server node are compiled by mkxprof into one XML profile per client node and published through a web server; on each client node, rdxprof fetches its profile over HTTP into a local DBM file, from which ldxprof and the LCFG components configure the node (see the sketch below)
Slide 25: Middleware components
- Job Description Language (JDL): script describing the job parameters (an illustrative example follows below)
- User Interface (UI): sends the job to the RB and receives the results
- Resource Broker (RB): locates and selects the target Computing Element (CE)
- Job Submission Service (JSS): submits the job to the target CE
- Logging and Bookkeeping (L&B): records job status information
- Grid Information Service (GIS): information index about the state of the Grid fabric
- Replica Catalogue: list of data sets and their duplicates held on Storage Elements (SEs)
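A JDL file is a classad-style text describing the job. The sketch below renders a small JDL-like description from Python values; the attribute names are typical of classad-based job descriptions, but treat the exact set and syntax as illustrative rather than the definitive EDG format.

```python
# Render a small classad/JDL-like job description. Attribute names and syntax
# are illustrative of the style, not the authoritative EDG JDL specification.

class Expr(str):
    """Marks a value as a raw classad expression rather than a quoted string."""

def render(value) -> str:
    if isinstance(value, Expr):
        return str(value)
    if isinstance(value, str):
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(render(v) for v in value) + "}"
    return str(value)

def to_jdl(attrs: dict) -> str:
    body = "".join(f"  {key} = {render(value)};\n" for key, value in attrs.items())
    return "[\n" + body + "]"

job = {
    "Executable": "simulate.sh",
    "Arguments": "run1234",
    "StdOutput": "sim.out",
    "StdError": "sim.err",
    "InputSandbox": ["simulate.sh", "geometry.dat"],
    "OutputSandbox": ["sim.out", "sim.err"],
    "Requirements": Expr('other.OpSys == "Linux"'),  # matched against CE attributes
    "Rank": Expr("other.FreeCPUs"),                  # the broker prefers higher values
}
print(to_jdl(job))
```

Requirements and Rank are the hooks the Resource Broker uses: the first filters candidate CEs, the second orders them.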
Slide 26: A Job Submission Example (diagram)
- The user describes the job in JDL and submits it from the User Interface (UI) together with the input sandbox; the submit event is recorded by Logging & Bookkeeping
- The Resource Broker consults the Information Service and the Replica Catalogue to choose a Computing Element close to the required data on a Storage Element
- The Job Submission Service sends the job, its input sandbox and a brokerinfo file to the chosen Computing Element
- Job status is tracked through Logging & Bookkeeping; on completion the output sandbox is returned to the user via the UI (a toy walk-through follows below)
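The short, self-contained sketch below walks through the same flow in code, with each class named after the component it imitates. Everything here is a stand-in for illustration; none of it is the actual EDG implementation.

```python
# Toy walk-through of the submission flow in the diagram above. Every class is
# a simplified stand-in, not EDG code.

class LoggingAndBookkeeping:
    def __init__(self):
        self.events = []

    def record(self, job_id: str, status: str) -> None:
        self.events.append((job_id, status))

class ComputingElement:
    name = "ce.example.org"

    def run(self, jdl: str, input_sandbox: list) -> list:
        # Pretend to execute the job and produce an output sandbox.
        return ["job.out", "job.err"]

class ResourceBroker:
    def match(self, jdl: str) -> ComputingElement:
        # A real broker consults the Information Service and the Replica
        # Catalogue here; this toy always returns the same CE.
        return ComputingElement()

def submit(jdl, input_sandbox, rb, lb):
    lb.record("job-1", "SUBMITTED")               # UI hands the job to the RB
    ce = rb.match(jdl)                            # RB picks a CE
    lb.record("job-1", f"SCHEDULED on {ce.name}")
    output_sandbox = ce.run(jdl, input_sandbox)   # JSS dispatches to the CE
    lb.record("job-1", "DONE")                    # user retrieves the output
    return output_sandbox

lb = LoggingAndBookkeeping()
print(submit('[ Executable = "simulate.sh"; ]', ["simulate.sh"], ResourceBroker(), lb))
print(lb.events)
```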
Slide 27: Iterative Releases
- Planned intermediate release schedule: Testbed 1 (October 2001), Release 1.1 (January 2002), Release 1.2 (March 2002), Release 1.3 (May 2002), Release 1.4 (July 2002), Testbed 2 (September 2002)
- A similar schedule will be organised for 2003
- Each release includes feedback from use of the previous release by the application groups and planned improvements/extensions by the middleware work packages; use of the software infrastructure feeds into the architecture group
Slide 28: Software Infrastructure
- Toolset for aiding the development and integration of middleware: code repositories (CVS), browsing tools (CVSweb), build tools (autoconf, make, etc.), document builders (doxygen), coding standards and checking tools (e.g. CodeChecker), nightly builds (see the sketch below)
- Guidelines, examples and documentation show the software developers how to use the toolset
- Development facility: test environment for the software (small sets of PCs at a few partner sites)
- Provided and managed by WP6: setting up the toolset and organising the development facility
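As a rough idea of what "nightly builds" with this toolset implies, here is a minimal driver: check each module out of CVS, run the autoconf-style build, and write a pass/fail report. The module names, repository setup (CVSROOT assumed to be configured) and build targets are assumptions for the example, not the project's actual scripts.

```python
# Minimal nightly-build driver sketch: CVS checkout, configure, make, check,
# then write a report. Module names and build steps are assumed for the
# example; adapt to the real repository layout.
import datetime
import subprocess

MODULES = ["workload", "datamgmt"]          # hypothetical CVS module names

def run(cmd, cwd=None) -> bool:
    """Run one build step; return True if it exited successfully."""
    return subprocess.run(cmd, cwd=cwd).returncode == 0

def nightly() -> None:
    stamp = datetime.date.today().isoformat()
    with open(f"nightly-{stamp}.txt", "w") as report:
        for module in MODULES:
            ok = (run(["cvs", "checkout", module])      # assumes CVSROOT is set
                  and run(["./configure"], cwd=module)  # autoconf-generated script
                  and run(["make"], cwd=module)
                  and run(["make", "check"], cwd=module))
            report.write(f"{module}: {'OK' if ok else 'FAILED'}\n")

if __name__ == "__main__":
    nightly()
```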
Slide 29: Future Plans
- Tighter connection to the applications' principal architects
- Closer integration of the software components
- Improve the software-infrastructure toolset and test suites
- Evolve the architecture on the basis of Testbed results
- Enhance synergy with the US via DataTAG, iVDGL and InterGrid
- Promote early standards adoption through participation in GGF working groups
- First EU project review at the end of February 2002
- Final software release by the end of 2003