Published by Annabella Fitzgerald. Modified over 9 years ago.

Non-LHC and Non-US-Collider Experiments’ Requirements
Dan Tovey, University of Sheffield
GridPP Collaboration Meeting, 5th November 2001

‘Other’ Experiments
- Representing non-LHC and non-US-collider experiments. Includes ANTARES, MINOS and UKDMC.
- In general such experiments have few resources to devote exclusively to Grid activities (although much effort is targeted at e-Science related issues, e.g. analysis code development).
- At present, analysis of data is predominantly carried out locally or at central facilities: no requirement as yet to move to large-scale distributed data processing.
- That said, keen interest exists in testing and making use of Grid tools if they will improve data handling within existing analysis frameworks.
- Situation likely to change in the next few years given larger data rates and mass uptake of Grid technology by central facilities.

Application 1: Transfer of Data Between Mass Storage Facilities
- Experiments in general need to transfer large volumes of data quickly and conveniently between physically separated sites.
- Sites may or may not possess high-speed network connections (e.g. RAL and Boulby Mine).
- In the former case a grid-based transfer protocol may be appropriate. In the latter, data needs to be transferred by some means other than the network.
- This problem is common to many HEP experiments and mirrors that faced by the LHC experiments.
- It is hoped that common solutions can be found (e.g. using EDG testbed components such as GridFTP and WP5 protocols).
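
Whether the network or some other means is the better route between two sites comes down to simple arithmetic. The sketch below shows one way to make the comparison. The data volume, link speeds and courier time are illustrative assumptions, not figures from the talk, and the function names are hypothetical.

```python
def transfer_hours(volume_gb: float, rate_mbit_s: float) -> float:
    """Hours needed to move volume_gb gigabytes at a sustained rate in Mbit/s."""
    bits = volume_gb * 8e9                # 1 GB taken as 10^9 bytes
    seconds = bits / (rate_mbit_s * 1e6)  # 1 Mbit taken as 10^6 bits
    return seconds / 3600.0


def prefer_network(volume_gb: float, rate_mbit_s: float, courier_hours: float) -> bool:
    """True if the network moves the data at least as fast as shipping media."""
    return transfer_hours(volume_gb, rate_mbit_s) <= courier_hours


# Assumed numbers: a 500 GB data set, a 48-hour courier, and two link speeds.
for rate in (2.0, 100.0):
    hours = transfer_hours(500.0, rate)
    choice = "network" if prefer_network(500.0, rate, 48.0) else "physical media"
    print(f"{rate:6.1f} Mbit/s: {hours:7.1f} h -> {choice}")
```

With these assumed inputs the slow link takes several hundred hours, so shipping media wins, while the fast link finishes well inside the courier time.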

Example Use Case (Data Discovery and Transfer)
1. Large subsets of data held in data storage facilities at collaborating institutes.
2. User wishes to transfer large subsets of this data (not necessarily co-located) to CPU location (datastore) prior to analysis.
3. User logs onto local machine.
4. User accesses collaboration-wide web page providing front-end to generic data discovery and transfer tool.
5. User logs onto site (password required; automatic authentication is not required at this stage).
6. Software presents query form to user.
7. User specifies datasets of interest by ‘run’ properties (e.g. ‘I wish to download all calibration data taken between 01/06/02 and 01/07/02 by detector XXX’). Specification by run number or (if necessary) file name is also possible.
8. Software accesses collaboration metadata catalogue to match query to file names. Metadata catalogue probably updated manually in the first instance as part of the data bookkeeping process. Entries (in plain-text format) give e.g. run type, run number, start time, end time for each file.

Example Use Case (Data Discovery and Transfer), continued
9. Software queries replica catalogue to discover location of required files.
10. Software starts up transfer protocol (e.g. GridFTP).
11. Software initiates FTP-like connection between source site(s) and destination site (not necessarily local to user). Source and destination sites must be members of a list of ‘approved’ collaboration datastores (i.e. it is not possible to transfer data to an arbitrary location, for security reasons).
12. Software ‘gets’ files efficiently, reliably and securely from source(s) to destination.
13. Software notifies user of status of transfer via front-end (e.g. total data volume, total volume transferred, volume remaining, estimated time required, time taken, estimated time remaining, current mean transfer rate).
14. Software notifies user if faults occur: keeps trying until time-out, then returns to user with meaningful error message (i.e. suspected reason for error) if still failing. Must permit automatic partial transfer if faults only occur for certain files / locations (i.e. fully transferred files remain, partially transferred files deleted).
15. Software updates replica catalogue and transfer log file.
16. Software notifies user when transfer complete.
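
The fault-handling behaviour in the transfer steps of this use case (retry on fault, keep complete files, delete partial ones, report progress and a meaningful error) can be sketched as follows. This is an illustration under stated assumptions, not the proposed tool: `transfer_all`, `copy` and `cleanup` are hypothetical names, the real transfer call (e.g. a GridFTP client) is stood in for by a callable, and a fixed number of attempts replaces the time-out described in the use case.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TransferReport:
    transferred: List[str] = field(default_factory=list)   # fully copied files
    failed: Dict[str, str] = field(default_factory=dict)   # file name -> last error
    bytes_done: int = 0


def transfer_all(files: Dict[str, int],
                 copy: Callable[[str], None],
                 cleanup: Callable[[str], None],
                 max_attempts: int = 3,
                 wait_s: float = 0.0) -> TransferReport:
    """Copy each file, retrying on failure.

    files maps file name to size in bytes. copy(name) is assumed to raise
    OSError on a fault; cleanup(name) is assumed to delete any partial copy
    at the destination.
    """
    report = TransferReport()
    total = sum(files.values())
    for name, size in files.items():
        for attempt in range(1, max_attempts + 1):
            try:
                copy(name)
            except OSError as exc:
                cleanup(name)  # partially transferred files are deleted
                report.failed[name] = f"attempt {attempt}: {exc}"
                time.sleep(wait_s)
            else:
                report.failed.pop(name, None)  # an earlier fault was recovered
                report.transferred.append(name)
                report.bytes_done += size
                break
        # Status line for the front-end: volume transferred and remaining.
        print(f"{report.bytes_done}/{total} bytes transferred, "
              f"{total - report.bytes_done} remaining")
    return report
```

Files that still fail after the last attempt stay in `report.failed` with the last error text, while files already copied are left in place, which gives the automatic partial transfer the use case asks for. Updating the replica catalogue and the transfer log is not covered here.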

Data Transfer Requirements
1. MSM software should be supplied capable of transferring certain specified data sets, but not others, onto specific physical tapes. Specification must be possible on the basis of file metadata as well as physical file name.
2. MSM software should decide which tapes are most suitable for this purpose on the basis of time taken to prepare tapes and/or total number of tapes required.
3. A common translation module for file metadata is needed, such that content, format and status of given transfer tapes can be assessed automatically by any specified system (there may be more than one) and used to position to and read files or segments of files from those tapes.
4. A simple-to-use FTP-like protocol with web-based front-end, suitable for reliable, transparent, efficient and secure transfer of large datasets between multiple specified collaboration sites. Software must discover names and locations of files from specified run metadata using metadata and replica catalogues.
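
The discovery half of the last requirement (run metadata to file names, then file names to locations) might look like the minimal sketch below. The column layout of the plain-text catalogue, the file names, the site names and all function names are assumptions made for illustration; the talk only says that entries give run type, run number, start time and end time for each file.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    file_name: str
    run_type: str
    run_number: int
    start: datetime
    end: datetime


def parse_catalogue(text: str) -> List[Entry]:
    """Parse an assumed layout: file_name run_type run_number start end."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, run_type, number, start, end = line.split()
        entries.append(Entry(name, run_type, int(number),
                             datetime.fromisoformat(start),
                             datetime.fromisoformat(end)))
    return entries


def find_files(entries: List[Entry], run_type: str,
               first: datetime, last: datetime) -> List[str]:
    """File names of the given run type taken wholly within [first, last]."""
    return [e.file_name for e in entries
            if e.run_type == run_type and e.start >= first and e.end <= last]


def locate(file_names: List[str], replicas: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Look up the sites holding each file; an empty list means no known replica."""
    return {name: replicas.get(name, []) for name in file_names}


# Hypothetical catalogue contents.
CATALOGUE = """\
# file_name run_type run_number start end
run0101.dat calibration 101 2002-06-03T10:00 2002-06-03T12:00
run0102.dat physics     102 2002-06-04T09:00 2002-06-05T09:00
run0103.dat calibration 103 2002-07-02T08:00 2002-07-02T09:30
"""
REPLICAS = {"run0101.dat": ["site_a", "site_b"], "run0102.dat": ["site_a"]}

wanted = find_files(parse_catalogue(CATALOGUE), "calibration",
                    datetime(2002, 6, 1), datetime(2002, 7, 1))
print(locate(wanted, REPLICAS))
```

Keeping the metadata match and the replica lookup as separate functions mirrors the two catalogues named in the requirement, so either could later be swapped for a real catalogue service without touching the other.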

Collaborative Project: Generic Data Discovery and Transfer Tool
- Last requirement (generic data discovery and transfer tool) is common to several experiments (ANTARES, UKDMC and MINOS).
- Therefore hoped that a common solution can be found.
- Generic nature of the requirement suggests that the solution will also be of interest to other groups, in particular UKQCD.
- ‘Mainstream’ experiments (e.g. BaBar and LHC collaborations) have similar data transfer requirements, so the tool may be of further interest here.
- Have therefore proposed a collaborative project between several experiments, including ANTARES, UKDMC and MINOS and possibly also UKQCD, BaBar and others.
- The project will deliver, on a 1-2 year timescale, a fully functioning web-based data discovery and transfer tool providing an automated interface to appropriate grid applications (metadata and replica cataloguing and file transfer services).

Application 2: Remote Control of Underground Experiments
- Novel application distinct from others proposed within GridPP.
- UKDMC and MINOS identified need for remote access to and control of underground experiments.
- Involves remote configuring, monitoring and debugging of DAQ code (possibly also remote high-level trigger for low-background experiments).
- Methodology is similar to that suggested for a Global Accelerator Network for running the next generation of colliders.
- There may also be commonality with grid-based remote control applications specified by AstroGrid and other collaborations.

Application 3: Fast Access to Remote Data Sets
- Simple grid-like application identified by MINOS.
- Would like to perform interactive ROOT analyses in UK on selected data sets held at location(s) in US.
- Would involve accessing and merging data from remotely held files => already possible in PAW (Manchester), ROOT also?
- Also desire to perform batch reduction in UK on remotely held files, possibly using grid-open type command => AFS alternative?

Summary
- The ‘Other’ Experiments are keen to make full use ASAP of tools provided by GridPP and other initiatives in order to simplify existing analysis procedures.
- Interested in developing full grid-based analyses in the longer term (> 2 years).
- We want to learn to walk (before we can run)!