Grist: Grid Data Mining for Astronomy Joseph Jacob, Daniel Katz, Craig Miller, and Harshpreet Walia (JPL) Roy Williams, Matthew Graham, and Michael Aivazis.

Slides:



Advertisements
Similar presentations
Trying to Use Databases for Science Jim Gray Microsoft Research
Advertisements

3 September 2004NVO Coordination Meeting1 Grid-Technologies NVO and the Grid Reagan W. Moore George Kremenek Leesa Brieger Ewa Deelman Roy Williams John.

September 13, 2004NVO Summer School1 VO Protocols Overview Tom McGlynn NASA/GSFC T HE US N ATIONAL V IRTUAL O BSERVATORY.
September 13, 2004NVO Summer School1 VO Protocols Overview Tom McGlynn NASA/GSFC T HE US N ATIONAL V IRTUAL O BSERVATORY.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
The Australian Virtual Observatory e-Science Meeting School of Physics, March 2003 David Barnes.
Grid Astronomy with Image Federation Roy Williams Michael Feldmann California Institute of Technology.
CASDA Virtual Observatory CSIRO ASTRONOMY AND SPACE SCIENCE Arkadi Kosmynin 11 March 2014.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
1 Introduction to XML. XML eXtensible implies that users define tag content Markup implies it is a coded document Language implies it is a metalanguage.
A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
Virtual Observatory Architecture Data Services Registry Services Compute Services Roy Williams Caltech US-VO co-director.
Science Applications of Montage: An Astronomical Image Mosaic Engine ESTO G. Bruce Berriman, John Good and Anastasia Laity.
The Montage Image Mosaic Service: Custom Mosaics on Demand ESTO John Good, Bruce Berriman Mihseh Kong and Anastasia Laity IPAC, Caltech
Montage: Experiences in Astronomical Image Mosaicking on the TeraGrid ESTO.
Leicester Database & Archive Service J. D. Law-Green, J. P. Osborne, R. S. Warwick X-Ray & Observational Astronomy Group, University of Leicester What.
An Astronomical Image Mosaic Service for the National Virtual Observatory
A Grid-Enabled Engine for Delivering Custom Science- Grade Images on Demand
An Astronomical Image Mosaic Service for the National Virtual Observatory / ESTO.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
CREATING A MULTI-WAVELENGTH GALACTIC PLANE ATLAS WITH AMAZON WEB SERVICES G. Bruce Berriman, John Good IPAC, California Institute of Technolog y Ewa Deelman,
Why Build Image Mosaics for Wide Area Surveys? An All-Sky 2MASS Mosaic Constructed on the TeraGrid A. C. Laity, G. B. Berriman, J. C. Good (IPAC, Caltech);
Supported by the National Science Foundation’s Information Technology Research Program under Cooperative Agreement AST with The Johns Hopkins University.
The Japanese Virtual Observatory (JVO) Yuji Shirasaki National Astronomical Observatory of Japan.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Scaling NVO Services to the Teragrid Roy Williams Conrad Steenberg Craig Miller Matthew Graham Joe Jacob Julian Bunn.
WSRF Supported Data Access Service (VO-DAS)‏ Chao Liu, Haijun Tian, Dan Gao, Yang Yang, Yong Lu China-VO National Astronomical Observatories, CAS, China.
Functions and Demo of Astrogrid 1.1 China-VO Haijun Tian.
1 The Terabyte Analysis Machine Jim Annis, Gabriele Garzoglio, Jun 2001 Introduction The Cluster Environment The Distance Machine Framework Scales The.
Master Thesis Defense Jan Fiedler 04/17/98
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Science with the Virtual Observatory Brian R. Kent NRAO.
Prototype system of the Japanese Virtual Observatory The Japanese Virtual Observatory (JVO) aims at providing easy access to federated astronomical databases.
Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology SDMIV 24 October 2002 Edinburgh KE ToolsS Data.
JVO JVO Portal Japanese Virtual Observatory (JVO) Prototype 2 Masahiro Tanaka, Yuji Shirasaki, Satoshi Honda, Yoshihiko Mizumoto, Masatoshi Ohishi (NAOJ),
Asynchronous services from NVO Roy Williams Conrad Steenberg Craig Miller Matthew Graham Joe Jacob Julian Bunn.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Federation and Fusion of astronomical information Daniel Egret & Françoise Genova, CDS, Strasbourg Standards and tools for the Virtual Observatories.
Wiss. Beirat AIP, ClusterFinder & VO-Methods H. Enke German Astrophysical Virtual Observatory ClusterFinder VO Methods for Astronomical Applications.
San Diego Supercomputer Center National Partnership for Advanced Computational Infrastructure SRB + Web Services = Datagrid Management System (DGMS) Arcot.
EScience May 2007 From Photons to Petabytes: Astronomy in the Era of Large Scale Surveys and Virtual Observatories R. Chris Smith NOAO/CTIO, LSST.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
GO-ESSP Workshop, LLNL, Livermore, CA, Jun 19-21, 2006, Center for ATmosphere sciences and Earthquake Researches Construction of e-science Environment.
Some Grid Science California Institute of Technology Roy Williams Paul Messina Grids and Virtual Observatory Grids and and LIGO.
State Key Laboratory of Resources and Environmental Information System China Integration of Grid Service and Web Processing Service Gao Ang State Key Laboratory.
Hyperatlas Coregistered Federated Imagery Roy Williams Bruce Berriman George Djorgovski John Good Reagan Moore Caltech CACR Caltech IPAC Caltech Astronomy.
GEON2 and OpenEarth Framework (OEF) Bradley Wallet School of Geology and Geophysics, University of Oklahoma
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Ruth Pordes November 2004TeraGrid GIG Site Review1 TeraGrid and Open Science Grid Ruth Pordes, Fermilab representing the Open Science.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
Virtual Observatories, Astronomical Telescopes and Instrumentation (August 2002)1 An Architecture for Access To A Compute Intensive Image Mosaic Service.
German Astrophysical Virtual Observatory Overview and Results So Far W. Voges, G. Lemson, H.-M. Adorf.
NOVA A Networked Object-Based EnVironment for Analysis “Framework Components for Distributed Computing” Pavel Nevski, Sasha Vanyashin, Torre Wenaus US.
AstroGrid NAM 2001 Andy Lawrence Cambridge NAM 2001 Andy Lawrence Cambridge Belfast Cambridge Edinburgh Jodrell Leicester MSSL.
August 2003 At A Glance The IRC is a platform independent, extensible, and adaptive framework that provides robust, interactive, and distributed control.
12 Oct 2003VO Tutorial, ADASS Strasbourg, Data Access Layer (DAL) Tutorial Doug Tody, National Radio Astronomy Observatory T HE US N ATIONAL V IRTUAL.
Web based spectrum databases and utilities László Dobos Tamás Budavári István Csabai MAGPOP kick-off meeting, January Cassis.
Distributed Archives Interoperability Cynthia Y. Cheung NASA Goddard Space Flight Center IAU 2000 Commission 5 Manchester, UK August 12, 2000.
VO Data Access Layer IVOA Cambridge, UK 12 May 2003 Doug Tody, NRAO.
VIEWS b.ppt-1 Managing Intelligent Decision Support Networks in Biosurveillance PHIN 2008, Session G1, August 27, 2008 Mohammad Hashemian, MS, Zaruhi.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
Joseph JaJa, Mike Smorul, and Sangchul Song
Montage: An On-Demand Image Mosaic Service for the NVO
Google Sky.
Presentation transcript:

Grist: Grid Data Mining for Astronomy Joseph Jacob, Daniel Katz, Craig Miller, and Harshpreet Walia (JPL) Roy Williams, Matthew Graham, and Michael Aivazis (Caltech CACR) George Djorgovski and Ashish Mahabal (Caltech Astronomy) Robert Nichol and Dan Vandenberk (CMU) Jogesh Babu (Penn State) Pasadena, CA, July 12-15, 2004

2 Grist Relation to Other Projects Grist NSF TeraGrid NASA Information Power Grid NSF Virtual Observatory NSF Data Mining

3 Motivation Massive, complex, distributed datasets Exponential growth in data volume; In astronomy the data volume now doubles every 18 months Multi-terabyte surveys; Petabytes on horizon Need to support multi-wavelength science Distributed data, computers, and expertise Need to enable domain experts to deploy services Need framework for interoperability

4 Selected Image Archives IRAS1 GB1 arcminAll Sky4 Infrared Bands DPOSS4 TB1 arcsecAll Northern Sky1 Near-IR, 2 Visible Bands 2MASS10 TB1 arcsecAll Sky3 Near-Infrared Bands SDSS-DR25 TB0.4 arcsec 3,324 square degrees (16% of Northern Sky) 1 Near-IR, 4 Visible Bands IRAS2MASS DPOSS SDSS IRAS SDSS-DR2 DPOSS 2MASS Total Image Size Comparison Wavelength Comparison

5 Grist Objectives Establish a service framework for astronomy Comply with grid and web services standards Comply with NVO standards Deploy a collection of useful algorithms as services within this framework Data access, data mining, statistics, visualization, utilities Organize these services into a workflow Controllable from a remote graphical user interface Virtual data: pre-computed vs. dynamically-computed data products Science Palomar-Quest exploration of time-variable sky to search for new classes of transients Quasar search Outreach “Hyperatlas” federation of multi-wavelength imagery: Quest, DPOSS, 2MASS, SDSS, FIRST, etc. Multi-wavelength images served via web portal Building a Grid and Web-services architecture for astronomical image processing and data mining.

6 Approach Building services that are useful to astronomers Implementing NVO protocols on TeraGrid Processing and exposing data from a real sky survey (Palomar-Quest) Workflow of SOAP-based Grid services with GUI manager

7 Broader Impact Connecting the Grid and Astronomy communities Federation of big data in science through distributed services Managing a workflow of services

8 The Old Way: Developer sells or gives away software to clients, who download, port, compile and run it on their own machines. Grid Services provide more flexibility. Components can: Remain on a server controlled by the authors, so the software is always the latest version. Remain on a server close to the data source for efficiency. Be on a machine owned by the client, so the client controls level of service for themselves. Why Grid Services?

9 Grid Paradigms Explored in Grist Services replace programs Separation of control from data flow Separation of metadata from data Database records replace files Streams replace files

10 Grist Services Data Access (via NVO) Data Federation (Images and Catalogs) Data Mining (K-Means Clustering, PCA, SOM, anomaly detection, search, etc.) Source Extraction (Sextractor) Image Subsetting Image Mosaicking (yourSky, Montage, SWarp) Atlasmaker (Virtual Data) WCS transformations (xy2sky, sky2xy) Density Estimation (KDE) Statistics (R - VOStatistics) Utilities (Catalog and image manipulation, etc.) Visualization (Scatter plot, etc.) Computed on client Computed on server; image sent back to client

11 ~billion sources, ~hundred dimensions Example Workflow Scenario Dimensionality Reduction and Subsetting

12 Another Example Workflow Scenario Quasar and Transient Search

13 Quasar and Transient Search (cont.)

14 Workflow GUI Illustrations

15 Service Composition Palomar-Quest Survey Time-domain astronomy 50 Gbyte every clear night Yale Data exposure through NVO spigots Resampling Scaling Compression Federated images Browse, Education 2MASS (infrared) Sloan (optical) fusion Data Mining Fast transient search Stacked multi-temporal images for faint sources Density estimation Astronomy knowledge z >4 quasars Transients Source detection NCSA Caltech Control of services by workflow manager GUI

16 NVO Standards National Virtual Observatory (NVO) defines standards for astronomical data VOTable for catalogs Datacube for binary data 1D: time series, spectra 2D: images, frequency-time spectra 3D+: volume (voxel) datasets, hyper-spectral images. NVO defines standards for serving data Cone Search Simple Image Access Protocol (SIAP)

17 Grid Implementation Grid services standards are evolving. Fast! Infrastructure based on Globus Toolkit (GT2) Enables secure (user authenticated on all required resources with a single “certificate”) remote job execution and file transfers. Open Grid Services Architecture (OGSA/OGSI) combines web services and globus (GT3) WSRF: WS-Resource Framework (GT4) Grid Services converging with Web Services Standards XML: eXtensible Markup Language SOAP: Simple Object Access Protocol WSDL: Web Services Description Languages Therefore, as a start, Grist is implementing SOAP web services (Tomcat, Axis,.NET), while we track the new standards

18 Grist Technology Progression Standalone service Service factory - single service Control a single service factory from workflow manager Start, stop, query state, modify state, restart Service factory - multiple services Chaining multiple service factories Control multiple service workflow from workflow manager Handshaking mechanism for when services are connected in workflow manager (confirm services are compatible, exchange service handles, etc.) Service roaming

19 Open Issues in Grist (1) Stateful Services Client starts a service, goes away, comes back later Where to store state? Distributed: On client (cookies?) Centralized: On server Service Lifetime Client starts a service then goes away and never returns Receiving Status Client pull Server push But what if client goes away? Maintaining State On client On server

20 Open Issues in Grist (2) Chaining Services Controlled by central master (the client) No central control once chain is set up Service Roaming Protocols for shipping large datasets (memory limitations, etc) Authentication

21 “Hyperatlas” Partnership Collaboration between Caltech, SDSC, JPL, and the Astronomy domain experts for each survey. Objectives: Agree on a standard layout and grid for image plates to enable multi-wavelength science. Share the work and share the resulting image plates. Involve the science community to ensure high quality plates are produced.

22 Image Reprojection and Mosaicking FITS format encapsulates the image data with keyword-value pairs that describe the image and specify how to map pixels to the sky

23 Science Drivers for Mosaics Many important astrophysics questions involve studying regions that are at least a few degrees across.  Need high, uniform spatial resolution  BUT cameras give high resolution or wide area but not both => need mosaics  required for research and planning Mosaics can reveal new structures & open new lines of research Star formation regions, clusters of galaxies must be studied on much larger scales to reveal structure and dynamics Mosaicking multiple surveys to the same grid – image federation – required to effectively search for faint, unusual objects, transients, or unknown objects with unusual spectrum.

24 yourSky Custom Mosaic Portal San Diego Supercomputing Center Storage Resource Broker 2MASS Atlas: 1.8 Million images ~4 TB Caltech Center for Advanced Computing Research HPSS DPOSS: 2500 images ~3 TB yourSky can access all of the publicly released DPOSS and 2MASS images for custom mosaic construction.

25 Montage: Science Quality Mosaics Delivers custom, science grade image mosaics User specifies projection, coordinates, spatial sampling, mosaic size, image rotation Preserve astrometry & flux Background modeled and matched across images Modular “toolbox” design Loosely-coupled engines for Image Reprojection, Background Matching, Co-addition Control testing and maintenance costs Flexibility; e.g custom background algorithm; use as a reprojection and co-registration engine Implemented in ANSI C for portability Public service will be deployed on the TeraGrid Order mosaics through web portal

26 Sample Montage Mosaics 2MASS 3-color mosaic of galactic plane 44 x 8 degrees; 36.5 GB per band; CAR projection 158,400 x 28,800 pixels; covers 0.8% of the sky 4 hours wall clock time on cluster of 4 x 1.4-GHz Linux boxes 100 µm sky; aggregation of COBE and IRAS maps (Schlegel, Finkbeiner and Davis, 1998) 360 x 180 degrees; CAR projection

27 Summary Grist is architecting a framework for astronomical grid services Services for data access, mining, federation, mosaicking, statistics, and visualization Track evolving Grid and NVO standards