All-sky search for continuous gravitational waves: tests in a grid environment
Cristiano Palomba, INFN Roma1
Plan of the talk:
- Computational issues
- Computing model
- Introduction to grid computing
- All-sky search
- Tests of the all-sky Hough Transform on the grid
- Conclusions and next steps
Mathematics of Gravitation II, Warsaw, International Banach Center, 1st-10th September 2003
Computational issues
- The full-sky search for periodic sources of gravitational waves is computationally very demanding and needs large (distributed) computational resources.
- We want to explore as large a portion of the parameter space as possible.
- A full-sky hierarchical search (i.e. source position completely unknown), with signal frequency up to 2 kHz and a minimum source decay time t ~ 10^4 years, requires a computing power of at least the order of 1 Tflop (see Sergio Frasca's talk).
- Low granularity: the whole data analysis problem can be divided into several smaller, independent tasks, for instance by splitting the frequency band we want to explore into small subsets.
Computing model
The analysis fits well in a distributed computing environment.
[Diagram: detector data -> computing environment -> events]
We are following two approaches:
- Standard master/slaves farm: based on PBS, managed by a Supervisor
- Grid computing: geographically distributed farms, with job distribution managed by a Resource Broker
Introduction to grid computing
A grid is a collaborative set of computing, data storage and network resources belonging to different administrative domains.
It enables the coordinated and coherent use of widely distributed resources in a way that is completely transparent to the user.
[Figure (by Christophe Jacquet): the classical view (mini computer, microcomputer, cluster, mainframe) and the new one]
Introduction to grid computing: some of the major projects
- European DataGrid (EDG)
- LHC Computing Grid (LCG): … - cern.ch/lcg
- CrossGrid
- DataTAG
- GriPhyN
- PPDG
- iVDGL
- TERAGRID (NSF)
Introduction to grid computing: EDG (I)
Purpose: to build on emerging grid technologies to develop a sustainable computing model for the effective sharing of computing resources and data.
Main partners:
- CERN - International (Switzerland/France)
- CNRS - France
- ESA/ESRIN - International (Italy)
- INFN - Italy
- NIKHEF - The Netherlands
- PPARC - UK
Introduction to grid computing: EDG (II)
Assistant partners
Industrial partners:
- Datamat (Italy)
- IBM-UK (UK)
- CS-SI (France)
Research and academic institutes:
- CESNET (Czech Republic)
- Commissariat à l'énergie atomique (CEA) - France
- Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
- Consiglio Nazionale delle Ricerche (Italy)
- Helsinki Institute of Physics - Finland
- Institut de Fisica d'Altes Energies (IFAE) - Spain
- Istituto Trentino di Cultura (IRST) - Italy
- Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
- Royal Netherlands Meteorological Institute (KNMI)
- Ruprecht-Karls-Universität Heidelberg - Germany
- Stichting Academisch Rekencentrum Amsterdam (SARA) - Netherlands
- Swedish Research Council - Sweden
Introduction to grid computing: EDG basics (I)
[Diagram: arrows labelled Request, Result, Request and Data connect the components below]
- Client (User Interface)
- Application Server (Computing Element: Gatekeeper + Worker Nodes)
- Data Server (Storage Element)
- Basic Services: Resource Broker, Information Service, Replica Catalog
Introduction to grid computing: EDG basics (III)
[Layer diagram: OS & Net services, Basic Services, High level Grid Middleware, Applications]
- Fabric management: farm installation, configuration, management, monitoring, 'gridification', …
- Based on Globus 2.0
- Authentication, authorization: based on X.509 public key certificates.
- gridFTP: tool for secure and efficient file transfer.
- Replica Catalog: resolution of logical file names into physical file names.
- MDS: publishes information about grid resources.
- Condor-G: job scheduler.
- Resource Broker: matchmaking between job requirements and available resources.
- Job submission system: wrapper for Condor-G; uses JDL scripts.
- Information Index: collects information concerning grid resources; read by the RB.
- Logging & Bookkeeping: stores information about the status and history of submitted jobs.
- Replica Manager, GDMP: tools for the creation and management of file replicas.
- Virgo Data Analysis Software
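The matchmaking role of the Resource Broker can be illustrated with a toy sketch. This is not EDG code: the `Resource` and `Job` classes, their fields and the ranking rule (prefer the site with the most free CPUs) are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical summary of what a site publishes about itself."""
    name: str
    free_cpus: int
    free_disk_gb: float


@dataclass
class Job:
    """Hypothetical job requirements."""
    name: str
    min_cpus: int
    min_disk_gb: float


def match(job, resources):
    """Return the name of the best resource that satisfies the job, or None."""
    candidates = [
        r for r in resources
        if r.free_cpus >= job.min_cpus and r.free_disk_gb >= job.min_disk_gb
    ]
    if not candidates:
        return None
    # Assumed ranking rule: prefer the candidate with the most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus).name
```

A real broker reads the resource status from the Information Index instead of receiving it as an argument, and supports much richer requirement and rank expressions.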
Introduction to grid computing: Job submission mechanism
[Diagram components: User Interface; Resource Broker with II, IS, OS and L&B; PBS; three Gatekeepers, one with Worker Nodes 1-3 and two with Worker Node 1; Storage Element]
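Jobs are described to the submission system through JDL scripts. The sketch below generates a minimal JDL description from Python. The attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`) are standard JDL attributes, while the helper `make_jdl` and every file name passed to it are hypothetical.

```python
def make_jdl(executable, arguments, input_files, output_files):
    """Render a minimal JDL job description as a string."""

    def quoted_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join('"%s"' % x for x in items) + "}"

    # Standard output and error are always shipped back with the results.
    outputs = ["std.out", "std.err"] + list(output_files)
    lines = [
        'Executable = "%s";' % executable,
        'Arguments = "%s";' % arguments,
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = %s;" % quoted_list(input_files),
        "OutputSandbox = %s;" % quoted_list(outputs),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: one Hough Transform job for one reference frequency.
print(make_jdl("hough.sh", "--f0 500.0", ["hough.sh"], ["candidates.dat"]))
```

Generating the description programmatically is convenient when hundreds of nearly identical jobs differ only in their arguments.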
All-sky search: all-sky Hough Transform
- It is the first incoherent step in the hierarchical procedure for the search of periodic signals.
- It is the ‘heaviest’ step, because it works on the whole parameter space to be explored.
- It has been implemented using look-up tables (LUT), which has proved to be the most efficient approach.
- A LUT is a C array containing the coordinates of all the points of all the circles that can be drawn for a given source frequency.
- Several symmetries can be exploited when building a LUT. In particular, a LUT can be reused for a range of frequencies at least of the order of the Doppler band of the initial frequency.
- The time needed to build the LUT is negligible with respect to the time needed to build the Hough maps.
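The LUT idea can be sketched in a simplified setting. The toy example below is in Python with NumPy rather than C, and works on a flat pixel grid rather than on the sky, so it is not the implementation described above. The functions `build_lut` and `hough_map`, the integer radii and the square map are all assumptions made for illustration. The circle pixels are computed once per radius and then reused for every peak, so building a map needs only additions.

```python
import numpy as np


def build_lut(max_radius):
    """Precompute, for each integer radius, the pixel offsets of a circle."""
    lut = []
    for r in range(max_radius + 1):
        # Sample the circle densely enough to leave no gaps between pixels.
        n = max(8, 2 * int(np.ceil(2 * np.pi * r)))
        theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
        pts = np.stack(
            [np.rint(r * np.cos(theta)), np.rint(r * np.sin(theta))], axis=1
        ).astype(int)
        # Keep each pixel once, so a circle gives one vote per pixel.
        lut.append(np.unique(pts, axis=0))
    return lut


def hough_map(peaks, lut, size):
    """Add one vote per circle pixel for each (cx, cy, radius) peak."""
    hmap = np.zeros((size, size), dtype=np.int32)
    for cx, cy, r in peaks:
        xy = lut[r] + (cx, cy)  # shift the tabulated circle to its centre
        inside = np.all((xy >= 0) & (xy < size), axis=1)
        xy = xy[inside]
        hmap[xy[:, 1], xy[:, 0]] += 1
    return hmap


# Three circles of radius 5 that all pass through pixel (x=15, y=15).
lut = build_lut(5)
hmap = hough_map([(10, 15, 5), (20, 15, 5), (15, 10, 5)], lut, 31)
print(hmap[15, 15], hmap.max())
```

Pixels where many circles overlap collect the most votes and become the candidates for the following steps.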
All-sky search: all-sky Hough Transform (II)
All-sky search: implementation on the grid
1. The f-t peak-map, containing the list of peaks for each time slice, is produced from the SFDB.
2. The peak-map is split into several frequency sub-bands, depending on the number of available nodes; each sub-band will be processed by one Worker Node.
   E.g.: assuming we have 100 nodes, the band Hz could be divided into 1000 sub-bands of 1.5 Hz each; each processor will process 5 of these.
3. These input files are distributed among the Storage Elements.
4. Several ‘packets’ of jobs are submitted from the User Interface; each packet will be executed on a Worker Node using one of the above input files. Each packet will consist of about 1500 jobs.
4a. Each job extracts from the input file a Doppler band around the current reference frequency and calculates the HT. Each job will last about 30 min (in case of a crash, only the last short job will be restarted). The LUT will be re-calculated about 8 times for each input file.
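The numbers in the steps above can be checked with a short sketch. The helper names are hypothetical. The band edges of 0 and 1500 Hz are placeholders, since only the total width of 1500 Hz follows from 1000 sub-bands of 1.5 Hz each. The count of 200 processors (for instance two per node) is an assumption chosen to reproduce the quoted 5 sub-bands per processor, and is not stated in the source.

```python
def split_band(f_min, f_max, n_sub):
    """Split [f_min, f_max] into n_sub equal, contiguous sub-bands."""
    width = (f_max - f_min) / n_sub
    return [(f_min + i * width, f_min + (i + 1) * width) for i in range(n_sub)]


def assign_round_robin(n_items, n_workers):
    """Distribute item indices over workers as evenly as possible."""
    assignment = {w: [] for w in range(n_workers)}
    for i in range(n_items):
        assignment[i % n_workers].append(i)
    return assignment


def packet_hours(n_jobs, minutes_per_job):
    """Total running time of one packet of sequential jobs, in hours."""
    return n_jobs * minutes_per_job / 60.0


bands = split_band(0.0, 1500.0, 1000)      # placeholder band edges
per_cpu = assign_round_robin(len(bands), 200)  # assumed processor count
print(len(per_cpu[0]), packet_hours(1500, 30))
```

With about 1500 jobs of about 30 min each, one packet amounts to roughly 750 hours of processing, i.e. about a month on a single processor. This is why restarting only the last short job after a crash matters.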
All-sky search: implementation on the grid (II)
5. The candidates (position, frequency and spin-down values) are stored in a file; the files are replicated among Storage Elements. About 10^9 candidates will be selected after the first incoherent step.
6. Steps 1-5 are repeated for the second observation period.
All subsequent steps do not need a distributed computing environment.
All-sky search: Test of the Hough Transform on the Grid
Three sites involved: ROMA, NAPOLI, BOLOGNA.
26 processors used.
All-sky search: Test of the Hough Transform on the Grid - description
test-1: intensive job submission
Hundreds of jobs are submitted sequentially on the grid.
Targets:
- measure the performance (time needed to complete the jobs)
- measure the reliability (fraction of dead jobs)
- measure the overhead time due to the RB activity
- check the homogeneity in the distribution of jobs
test-2: submission of packets of jobs
Several ‘packets of jobs’ are submitted on the grid.
Targets:
- measure the performance (expected lower overhead?)
- measure the reliability (expected lower fraction of dead jobs)
Each job calculates the Hough Transform for a single initial source frequency.
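The quantities targeted by the two tests could be extracted from per-job records roughly as follows. The record layout (keys `site`, `status`, `submit`, `start`, `end`, with times in seconds) and the function `summarize` are hypothetical, and do not reflect the actual Logging & Bookkeeping output.

```python
from collections import Counter


def summarize(jobs):
    """Compute death rate, mean overhead, mean run time and jobs per site."""
    done = [j for j in jobs if j["status"] == "done"]
    dead = len(jobs) - len(done)
    # Overhead: time between submission and the start of execution.
    overhead = [j["start"] - j["submit"] for j in done]
    run_time = [j["end"] - j["start"] for j in done]
    return {
        "death_rate": dead / len(jobs),
        "mean_overhead": sum(overhead) / len(done) if done else None,
        "mean_run_time": sum(run_time) / len(done) if done else None,
        # Homogeneity check: how many jobs each site received.
        "jobs_per_site": Counter(j["site"] for j in jobs),
    }
```

A death rate is then the fraction of submitted jobs that never reach the `done` state, and a strongly uneven `jobs_per_site` count would indicate an inhomogeneous distribution of jobs.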
All-sky search: Test of the Hough Transform on the Grid - results
Death rate ~ 3/1000
Death rate < 1/1000
Conclusions
- Emerging grid technologies appear to be very effective for data analysis in the search for periodic gravitational signals.
- Our data analysis method can be adapted very easily to work on a computing grid.
- Other kinds of searches for gravitational signals can benefit from the grid (e.g. coalescing binaries).
- Soon we will move from tests to production.
- Grid software is becoming more and more robust and reliable.
The European DataGrid project (EDG) is an EU-funded project which was started mainly to answer the computing needs of future LHC experiments. It has so far produced several software releases, which enable the ‘gridification’ of a local cluster of computers.
Differences with respect to, e.g., Condor, SETI, etc.
Introduction to grid computing: EDG basics (II)
Different kinds of services:
- Information Service: gives information about the availability of services on the grid (e.g. X CPUs and Y GB of disk space are available at a given site)
- Authentication & Authorization: allow a user to log on to the grid and use its resources
- Job submission service: responsible for job submission according to the user's needs/requests and the available resources
- Replica Management: responsible for the management of file replicas
- Logging & Bookkeeping: