Astrophysics on the OSG (LIGO, SDSS, DES)
  Kent Blackburn, LIGO Laboratory, California Institute of Technology
  Open Science Grid Consortium Meeting, University of Florida, January 23, 2006
Outline and Contributors
  - LIGO on the OSG: Kent Blackburn, Duncan Brown, Albert Lazzarini, David Meyers
  - SDSS, NEO & DES on the OSG: Nickolai Kuropatkin, Neha Sharma, Chris Stoughton, James Annis, Steve Kent
Gravitational Wave Physics on the OSG
  Laser Interferometer Gravitational-Wave Observatory (LIGO)
  LIGO Scientific Collaboration (LSC)
LIGO on the Open Science Grid
  - Search for gravitational waves
    - Hanford, WA and Livingston, LA
    - Plus GEO, TAMA and VIRGO
  - LIGO Scientific Collaboration
    - ~40 institutions worldwide
    - ~400 individuals contributing
  - LIGO Data Grid (LDG)
    - Nine grid sites
    - Over 2000 CPUs
    - Multi-petabyte data archive at Caltech
  - Scientific data collection is grouped into temporal “Science Runs”
    - Currently in Science Run 5
    - Goal: collect one year “plus” of design-sensitivity data
    - One terabyte of data each day
  - Analysis is carried out primarily on the LIGO Data Grid (LDG)
    - Stepping out onto the OSG
LIGO Data Analysis Classifications
  - Principal classifications of searches
    - Binary Inspiral (neutron stars and black holes)
      - Consumes the bulk of LIGO Data Grid resources
    - Burst (supernovae and other unmodeled events)
      - Coincidence between different data streams is necessary
    - Stochastic Background (similar to the CMB)
      - Computationally least demanding, but requires cross-correlation
    - Periodic (pulsars, rotating neutron stars)
      - Signal is sinusoidal in the reference frame of the source
      - All-sky survey could promote global warming (Order FLOPS)
  - Binary Inspiral search selected for initial adoption onto the OSG
    - Workflow well suited to the Open Science Grid
    - Already using a similar set of grid technologies within the LIGO Data Grid
    - Simple parametric parallelization of algorithms
    - Optimal filtering of data against tens of thousands of waveforms
    - Computationally demanding but interesting on the scale of the OSG
  - Expect other searches to follow once the OSG trailblazing work is done
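The "optimal filtering against a bank of waveforms" idea and its parametric parallelism can be illustrated with a minimal sketch. This is not the LIGO inspiral pipeline: it assumes white noise, works in the time domain, and uses toy chirps in place of real inspiral waveforms. The names `chirp_template`, `filter_one`, and `search_bank` are hypothetical and exist only for this illustration.

```python
import math


def chirp_template(f0, n=64):
    """Toy stand-in for an inspiral waveform: a sinusoid whose frequency
    rises linearly from f0 to 2*f0 (in cycles per sample) over n samples."""
    rate = f0 / n
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * rate * t * t)) for t in range(n)]


def normalize(template):
    """Scale a template to unit norm so statistics are comparable across the bank."""
    norm = math.sqrt(sum(x * x for x in template))
    return [x / norm for x in template]


def filter_one(data, template):
    """Slide one unit-norm template over the data.

    Returns (peak statistic, offset of the peak). Under the white-noise
    assumption the plain inner product is the matched-filter statistic.
    """
    unit = normalize(template)
    best = (0.0, 0)
    for offset in range(len(data) - len(unit) + 1):
        stat = abs(sum(d * u for d, u in zip(data[offset:], unit)))
        if stat > best[0]:
            best = (stat, offset)
    return best


def search_bank(data, bank):
    """Filter the data against every template in the bank.

    Each filter_one call depends only on (data, template), so the loop is
    the kind of parametric sweep that maps onto independent grid jobs.
    """
    results = {params: filter_one(data, tmpl) for params, tmpl in bank.items()}
    best_params = max(results, key=lambda p: results[p][0])
    return best_params, results[best_params]


if __name__ == "__main__":
    bank = {f0: chirp_template(f0) for f0 in (0.05, 0.08, 0.11, 0.14)}
    data = [0.0] * 200
    for i, x in enumerate(bank[0.11]):  # inject one template at sample 70
        data[70 + i] += x
    print(search_bank(data, bank))
```

Because every template is filtered independently, a bank of tens of thousands of waveforms splits into as many jobs as there are CPUs available, with only the per-template peaks collected at the end.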
Binary Inspiral Search Experiences on the Open Science Grid
  - First attempt at the July 2005 OSG Consortium Meeting in Milwaukee, Wisconsin
    - Unsuccessful at submitting a binary inspiral workflow at any OSG site
    - Authentication was the primary reason for failures (LIGO VO not part of 0.2.1)
    - Other issues discovered with the version of VDS distributed in
  - First successful completion of a binary inspiral workflow on October 1st, 2005
    - On LIGO’s OSG Integration Testbed cluster at Caltech
    - Eight-node, dual-CPU cluster with two terabytes of disk space
    - Running a “patched” version of VDS on top of OSG
    - Used a test workflow involving ~38 GB of LIGO data and about 700 DAG nodes
  - Followed up by running at LIGO’s OSG production sites at PSU (PBS) and UWM (Condor), once the VDS patch was applied at each
  - Collaborated with several CMS resources to further test outside LIGO’s VO
    - Worked with clusters at San Diego, Nebraska and Caltech
    - All clusters added LIGO’s VOMS to allow authentication
    - Updated OSG with VDS patches
    - Mixed results due to the size of the LIGO data sets transferred for this test workflow
  - Worked with the Deployment and Integration Teams to ensure LIGO’s functional requirements appeared in the OSG 0.4 software stack (just announced!)
Greatly Simplified LIGO DAG
LIGO’s Next Move on the OSG
  - The OSG release should greatly improve the OSG for LIGO’s binary inspiral workflow
  - A workflow geared toward actually conducting a scientific study would involve at least DAG nodes and close to two terabytes of data
    - Recent OSG-motivated activities in LIGO have produced a nearly 10:1 reduction in data through improved data selection and compression
    - Need to develop more flexible workflows that don’t challenge the limited data storage resources typical of a present-day OSG site
  - Pegasus is used to construct concrete DAGs from abstract DAX workflows
    - Flexibility here to recognize and adapt to OSG site specifics could facilitate greater utilization of the OSG as an abstract “Grid”
  - Develop the ability to benefit from Storage Resource Management (SRM)
    - Typical LIGO data analyses benefit from being able to repeat the analysis on the same data set with improved calibration and selection criteria
    - LIGO is currently bringing up an SE on our local ITB cluster at Caltech to experiment with SRM
Astronomy on the OSG
  Sloan Digital Sky Survey (SDSS)
  Experimental Astrophysics Group (EAG)
  Fermi National Accelerator Laboratory
Near Earth Objects
  - Near Earth Objects (NEOs) are comets and asteroids nudged by the gravitational attraction of planets into orbits that pass by the Earth's neighborhood
    - Composed of water ice and dust, formed early in the history of the Solar System
  - The scientific interest in comets and asteroids is due to their being remnants of the early Solar System; the interest in NEOs is their potential for hitting the Earth…
  - 37 Near Earth Object candidates are identified in the SDSS imaging data
    - Apparent magnitudes r = 19 to 21 and proper motions of 1.3 to 18 degrees per day
    - The Earth collision rate for this population (size greater than 20 m) is estimated to be one per century
How to find Near Earth Objects
NEO Workflow
NEO Job Statistics
  - Total jobs: 180
  - Total input data: 9 GB × 180 = 1620 GB
  - Total output data: 12 KB × 180 = 2160 KB
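The totals above follow directly from the per-job figures; a short check, assuming decimal units (1 GB = 10^6 KB) for the input-to-output ratio, makes the data-heavy nature of the NEO jobs explicit. The function name `neo_totals` is hypothetical.

```python
JOBS = 180
INPUT_GB_PER_JOB = 9
OUTPUT_KB_PER_JOB = 12
KB_PER_GB = 10**6  # assumption: decimal units


def neo_totals(jobs, input_gb_per_job, output_kb_per_job):
    """Return (total input in GB, total output in KB) for a batch of identical jobs."""
    return jobs * input_gb_per_job, jobs * output_kb_per_job


total_input_gb, total_output_kb = neo_totals(JOBS, INPUT_GB_PER_JOB, OUTPUT_KB_PER_JOB)
# Ratio of bytes moved in to bytes produced, per job and in total (they are equal).
input_to_output_ratio = (total_input_gb * KB_PER_GB) / total_output_kb

print(total_input_gb, "GB in,", total_output_kb, "KB out, ratio", input_to_output_ratio)
```

Each job reads several hundred thousand times more data than it writes, which is why data transfer, not computation, dominates the NEO workflow on a grid site.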
Quasar Spectra Fitting using SDSS
  - Quasars are supermassive black holes: swirling clouds of gas and plasma falling into a black hole, glowing at many different wavelengths
  - We measure the spectrum of the light to determine the properties of each quasar
  - The SDSS provides us with 50,000 quasar spectra
  - We make fits to these spectra that include the following components:
    - Power-law continuum, decreasing as e - A
    - Balmer continuum due to ionized hydrogen, with a characteristic bump from 2000 to 4000 Angstroms
    - Strong emission lines from ionized gas, such as hydrogen, nitrogen, oxygen, and magnesium
    - Many faint emission lines from iron
    - Starlight from the galaxy that surrounds the quasar
Example Quasar Spectrum with Fit
Quasar Fit Production: Science using the Generic Grid Gofer (GGG)
  - Two main parts: Job Manager and DAG Creator
  - All jobs are stored in the “jobs” table; available grid sites are stored in the “pool” table
  - The Job Manager takes jobs from the database, creates Condor DAG files, and submits them automatically to sites from the pool
  - All completed stages of a job are recorded in the database, together with submission time and execution time
Workflow in Generic Grid Gofer Nickolai Kuropatkin
Astronomy Experiences on the Grid
  - Experience tells us that the Grid is more suitable for CPU-intensive jobs
    - … achieve parallelism … more jobs … finish sooner
    - Running locally would limit the number of jobs run simultaneously
    - On OSG, can run several run-reruns, and camcols within a run-rerun, in parallel
  - Current workflow also will facilitate further analysis

                                Spectra (CPU intensive)   NEO (data & CPU intensive)
  Grid match                    Ideal for Grid            Grid not very happy
  Total no. of jobs             ~
  Data input/job                1 Megabyte                9 Gigabytes
  Data output/job               2 Megabytes               12 Kilobytes
  Avg. rate of job completion   per day                   per day ?
Future Grid Projects in Astronomy
  - In the coming year the Experimental Astrophysics Group (EAG) has 4 projects planned for the Open Science Grid:
    - The simulation effort for the Dark Energy Survey (DES)
    - Genetic algorithm fitting of Sloan Digital Sky Survey (SDSS) quasar spectra
    - Search for Near Earth Asteroids (NEOs) in the SDSS imaging data
    - The co-addition of the SDSS Southern Stripe