Setting up a Pan-European Datagrid using QCDgrid technology
Chris Johnson, James Perry, Lorna Smith and Jean-Christophe Desplat
EPCC, The University Of Edinburgh
The ENACTS "Demonstrator"

This Talk
A summary of the title:
– ENACTS Demonstrator
– Pan-European Datagrid
– QCDgrid

ENACTS
European Network for Advanced Computing Technology for Science.
EC-funded project with 14 members, set up to ensure that Europe did not lag behind the US in grid technology.
ENACTS originally consisted mainly of reports, with little technical work.

14 ENACTS partners Please see:

ENACTS Demonstrator: Partners involved
EPCC, Edinburgh, UK
– Chris Johnson
– Jean-Christophe Desplat
– James Perry
Parallab, Bergen, Norway
– Jacko Koster
– Jan-Frode Myklebust
– Csaba Anderlik
TCD, Dublin, Ireland
– Geoff Bradley
– Bob Crosbie

ENACTS Demonstrator…
Objective:
– "To enable the formation of a pan-European HPC metacentre…"
The Demonstrator is part of Phase II of the activity and its specific objective is:
– "to draw together the results from all of the Phase I technology studies and evaluate their practical consequences for operating a pan-European metacentre and constructing a best-practice model for collaborative working amongst individual facilities".

ENACTS Demonstrator
ENACTS Phase I
– consisted mainly of reports
ENACTS Phase II
– contained the Demonstrator activity
Phase I identified technologies such as
– Globus, replica management, LDAP database and XML metadata.
All these technologies are inherent in the QCDgrid system.

Metacentre
A "virtual organisation" with data described by metadata.
– Users submit data from any site
– The data is stored on "the grid"
– All data is stored reliably
– All data is easy to retrieve

Our Demonstrator
Set up QCDgrid across the three sites to create our metacentre.
Use a genuine scientific scenario.
Use an XML schema for metadata.
Ensure the data is portable between the systems involved.

Summary so far…
The ENACTS Demonstrator is an EC-funded project involving 3 partners, setting up a pan-European metacentre using QCDgrid technology.

What is QCDgrid?
It's not QCD-specific!
QCDgrid was written to manage the QCD data belonging to the UK QCD community (UKQCD)
– 2 previous All-Hands talks (James Perry, EPCC).
The original grid consisted of 6 geographically dispersed sites, holding around 5 terabytes of data.
The amount of data is expected to grow dramatically when QCDOC comes online later in 2004.

What is QCDgrid?
QCDgrid is a layer of software written on top of the Globus Toolkit.
– Uses security infrastructure and basic grid operations such as data transfer
– Also uses more advanced features such as the replica catalogue

How does QCDgrid work?
A control thread runs on one storage element
– constantly scans the grid
– ensures all storage elements are working
– ensures all files are stored in at least 2 suitable locations.
When a new file is added it is rapidly replicated across the grid onto 2 or more geographically separate sites.
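The control thread itself is part of the middleware rather than something users run, but the replication policy above can be sketched in shell pseudocode. In this sketch qcdgrid-list is the real listing command mentioned later in the talk, while the node names, replica-exists and replicate-file are hypothetical placeholders standing in for replica-catalogue lookups and Globus-based copies:

#!/bin/bash
# Conceptual sketch only -- not the actual QCDgrid control thread.
# Policy: every file must be held on at least MIN_COPIES working storage elements.
MIN_COPIES=2
NODES="node-a.example.org node-b.example.org node-c.example.org"   # hypothetical sites

while true; do
  for file in $(qcdgrid-list); do                 # list every logical file on the grid
    copies=0
    for node in $NODES; do
      if replica-exists "$node" "$file"; then     # placeholder replica-catalogue lookup
        copies=$((copies + 1))
      fi
    done
    if [ "$copies" -lt "$MIN_COPIES" ]; then
      replicate-file "$file"                      # placeholder for a Globus copy to another site
    fi
  done
  sleep 300                                       # re-scan the grid periodically
done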

QCDgrid: Dealing with node loss
If a storage element is lost unexpectedly, all files that were held on the failed system are replicated elsewhere.
QCDgrid can cope with the loss of an entire site.
If the control node is lost, control reverts to a secondary node.

How is QCDgrid used?
Assuming QCDgrid is set up, the control thread and metadata database are running, and each user has a valid certificate…
The user issues the usual initialisation commands:
– grid-proxy-init
– source the correct set-up files (sets a few paths, classpaths, etc.)
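A typical session start, as a minimal sketch (the location and name of the set-up script are site-specific; the path below is illustrative only):

grid-proxy-init                      # create a Globus proxy from the user's certificate
source /path/to/qcdgrid-setup.sh     # illustrative path: sets paths, classpaths, etc.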

Submitting files (command line)
User submits a data file (datafile.dat):
– put-file-on-qcdgrid datafile.dat
AND an accompanying metadata file which describes the above data file:
– put-file-on-qcdgrid datafile.xml
– exist:/db> put datafile.xml
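Putting the two submissions together, a minimal command-line session might look like the following (file names are the illustrative ones from the slide; the commented lines show the equivalent 'put' issued from inside the eXist client):

put-file-on-qcdgrid datafile.dat     # replicate the data file onto the grid
put-file-on-qcdgrid datafile.xml     # store the accompanying metadata file
# from the eXist client prompt, the metadata can also be loaded into the database:
#   exist:/db> put datafile.xml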

Submitting files (better still…)
Using the GUI: the user runs the Java GUI and submits both the data file and the metadata file at the same time.
The metadata file has a tag for the name of the corresponding data file
– marries the two
– every data file should have an associated metadata file.
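For illustration only, a minimal metadata document might link to its data file as shown below; the element names here are made up and the real UKQCD schema is considerably richer:

cat > datafile.xml <<'EOF'   # illustrative element names only
<metadata>
  <data-file>datafile.dat</data-file>   <!-- names the data file this record describes -->
  <collaboration>UKQCD</collaboration>
</metadata>
EOF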

Metadata Browser
Can submit, search and retrieve data using this Java browser.

Metadata/Datagrid Integration
QCDgrid software deals with storage and replication of data.
The eXist database deals with cataloguing of data using metadata.
All can be controlled using the command line or the GUI.

More on QCDgrid
Other commands available with QCDgrid:
– qcdgrid-list lists all files on the grid.
– get-file-from-qcdgrid retrieves files from the grid.
– i-like-this-file attempts to store a file local to the user.
There are also several commands for administering nodes, etc.
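An illustrative retrieval session using these commands might look like this (argument forms are indicative only; the file name follows the earlier example):

qcdgrid-list                         # list every logical file on the grid
get-file-from-qcdgrid datafile.dat   # fetch a copy of a file to the local machine
i-like-this-file datafile.dat        # ask the grid to keep a copy close to this user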

QCDgrid Sites
[Map showing the QCDgrid and ENACTS sites]

Our deployment of QCDgrid
From the UK (all sites using the e-Science CA) to Europe (all sites using different CAs).
Moving from a homogeneous Linux environment to a mixed one (Linux/Solaris).
Moving from Globus Toolkit (GT) 2.0 to GT 2.4.

How difficult was it? Certificates
– Some certificate issuers took several weeks to issue certificates.
– Different policies on issuing certificates, e.g. non-human users (project accounts).
– Not too many difficulties using multiple certificates.

How difficult was it? Moving to a heterogeneous environment
– Installing Globus 2.x is difficult on Solaris – this led to the Solaris node being unable to submit data.
– A few minor problems getting system-specific functions to work (e.g. the df command).
– Usual minor compilation issues – did require the gcc compiler.

How difficult was it? Globus
– This presented the biggest difficulty!
– Installation difficulties and firewall issues meant it was several months before a "hello world" job would run from any site to any other.
– Migrating from GT 2.0 to GT 2.4 caused major difficulties: we had to re-write the replica schema and remove some error-handling functionality.

Users – Scientific scenario
Appealed to the QCD community
– Given more time we could have found a different discipline.
Two users from TCD as well as those involved in the project itself.
The code used was the MILC code.
Monte Carlo simulation to investigate string-breaking.
Three Monte Carlo chains, one on each node.

User feedback
Generally impressed with the functionality.
Some frustration in getting certificates.
Difficulty persuading users to use metadata, although they agreed it was useful.
Would make more use of file-sharing using such a system.
Liked machine-independent data.
Wanted the grid to do job submission.

Conclusions
We have created a pan-European datagrid (metacentre) using QCDgrid technology.
The system works well…
…Globus is the limiting factor.
Users were impressed with the system in use.
The system was not tested with many users, but we can see no reason why it would not scale to many users/nodes if Globus allows.

Acknowledgements
James Perry.
Jean-Christophe Desplat, Jacko Koster, Jan-Frode Myklebust, Csaba Anderlik, Geoff Bradley, Bob Crosbie.
Craig McNeile and Bálint Joó.
Mike Peardon and Jimmy Juge.

References
– ENACTS
– QCDgrid (and code)
– MILC code