20 Spatial Queries for an Astronomer's Bench (mark) María Nieto-Santisteban 1 Tobias Scholl 2 Alexander Szalay 1 Alfons Kemper 2 1. The Johns Hopkins University,

Slides:



Advertisements
Similar presentations
Spatial (or N-Dimensional) Search in a Relational World Jim Gray, Microsoft Alex Szalay, Johns Hopkins U.
Advertisements

Spatial (or N-Dimensional) Search in a Relational World Jim Gray.
Lehrstuhl Informatik III: Datenbanksysteme Astrometric Matching - E-Science Workflow 1 Lehrstuhl Informatik III: 1 Datenbanksysteme 1 Fakultät für Informatik.
Trying to Use Databases for Science Jim Gray Microsoft Research
World Wide Telescope mining the Sky using Web Services Information At Your Fingertips for astronomers Jim Gray Microsoft Research Alex Szalay Johns Hopkins.
Registries Work Package 2 Requirements, Science Cases, Use Cases, Test Cases Charter: Focus on science case scenarios, and use cases related specifically.
9 September 2005NVO Summer School Aspen Astronomical Dataset Query Language (ADQL) Ray Plante T HE US N ATIONAL V IRTUAL O BSERVATORY.
Lehrstuhl Informatik III: Datenbanksysteme Grid-based Data Stream Processing in e-Science 1 Richard Kuntschke 1, Tobias Scholl 1, Sebastian Huber 1, Alfons.
Metadata in the TAP context (1) The Problem: learn about which tables, tablesets,... are available from a TAP server for each of the tables / tablesets,
Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.
SkewReduce YongChul Kwon Magdalena Balazinska, Bill Howe, Jerome Rolia* University of Washington, *HP Labs Skew-Resistant Parallel Processing of Feature-Extracting.
Using the Optimizer to Generate an Effective Regression Suite: A First Step Murali M. Krishna Presented by Harumi Kuno HP.
László Dobos 1,2, Tamás Budavári 2, Nolan Li 2, Alex Szalay 2, István Csabai 1 1 Eötvös Loránd University, Budapest,
Parallel Database Systems The Future Of High Performance Database Systems David Dewitt and Jim Gray 1992 Presented By – Ajith Karimpana.
C-Store: Introduction to TPC-H Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Mar 20, 2009.
Organizing the Extremely Large LSST Database for Real-Time Astronomical Processing ADASS London, UK September 23-26, 2007 Jacek Becla 1, Kian-Tat Lim 1,
CASJOBS: A WORKFLOW ENVIRONMENT DESIGNED FOR LARGE SCIENTIFIC CATALOGS Nolan Li, Johns Hopkins University.
Petascale Data Intensive Computing for eScience Alex Szalay, Maria Nieto-Santisteban, Ani Thakar, Jan Vandenberg, Alainna Wonders, Gordon Bell, Dan Fay,
A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D.
Probabilistic Cross-Identification of Astronomical Sources Tamás Budavári Alexander S. Szalay María Nieto-Santisteban The Johns Hopkins University.
Chapter 1 Introduction 1.1A Brief Overview - Parallel Databases and Grid Databases 1.2Parallel Query Processing: Motivations 1.3Parallel Query Processing:
The Montage Image Mosaic Service: Custom Mosaics on Demand ESTO John Good, Bruce Berriman Mihseh Kong and Anastasia Laity IPAC, Caltech
1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman
C&A 10April06 1 Point Source Detection and Localization Using the UW HealPixel database Toby Burnett University of Washington.
Benchmarks Title: A Measure of Transaction Processing Power Authors: Anon Et. Al. Datamation, 1985.
The Calibration Process
Panel Summary Andrew Hanushevsky Stanford Linear Accelerator Center Stanford University XLDB 23-October-07.
Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
Presenter: Joshan V John Robert Dyer, Hoan Anh Nguyen, Hridesh Rajan & Tien N. Nguyen Iowa State University, USA Instructor: Christoph Csallner 1 Joshan.
FLANN Fast Library for Approximate Nearest Neighbors
N Tropy: A Framework for Analyzing Massive Astrophysical Datasets Harnessing the Power of Parallel Grid Resources for Astrophysical Data Analysis Jeffrey.
Concepts of Database Management, Fifth Edition
László Dobos, Tamás Budavári, Alex Szalay, István Csabai Eötvös University / JHU Aug , 2008.IDIES Inaugural Symposium, Baltimore1.
Amdahl Numbers as a Metric for Data Intensive Computing Alex Szalay The Johns Hopkins University.
Hopkins Storage Systems Lab, Department of Computer Science A Workload-Driven Unit of Cache Replacement for Mid-Tier Database Caching Xiaodan Wang, Tanu.
Big Data in Science (Lessons from astrophysics) Michael Drinkwater, UQ & CAASTRO 1.Preface Contributions by Jim Grey Astronomy data flow 2.Past Glories.
Memory/Storage Architecture Lab Computer Architecture Performance.
Functions and Demo of Astrogrid 1.1 China-VO Haijun Tian.
1 The Terabyte Analysis Machine Jim Annis, Gabriele Garzoglio, Jun 2001 Introduction The Cluster Environment The Distance Machine Framework Scales The.
RISICO on the GRID architecture First implementation Mirko D'Andrea, Stefano Dal Pra.
1 Efficient Search Ranking in Social Network ACM CIKM2007 Monique V. Vieira, Bruno M. Fonseca, Rodrigo Damazio, Paulo B. Golgher, Davi de Castro Reis,
 Agenda 2/20/13 o Review quiz, answer questions o Review database design exercises from 2/13 o Create relationships through “Lookup tables” o Discuss.
The Sloan Digital Sky Survey ImgCutout: The universe at your fingertips Maria A. Nieto-Santisteban Johns Hopkins University
1 Seoul National University Performance. 2 Performance Example Seoul National University Sonata Boeing 727 Speed 100 km/h 1000km/h Seoul to Pusan 10 hours.
Recent spatial work by Jim Gray and Alex Szalay Bob Mann.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
ApproxHadoop Bringing Approximations to MapReduce Frameworks
January 23, 2016María Nieto-Santisteban – AISRP 2003 / Pittsburgh1 High-Speed Access for an NVO Data Grid Node María A. Nieto-Santisteban, Aniruddha R.
Chapter. 3: Retrieval Evaluation 1/2/2016Dr. Almetwally Mostafa 1.
Lec2.1 Computer Architecture Chapter 2 The Role of Performance.
Lecture 3 With every passing hour our solar system comes forty-three thousand miles closer to globular cluster 13 in the constellation Hercules, and still.
EGRE 426 Computer Organization and Design Chapter 4.
Developing GRID Applications GRACE Project
Kriging for Estimation of Mineral Resources GISELA/EPIKH School Exequiel Sepúlveda Department of Mining Engineering, University of Chile, Chile ALGES Laboratory,
A Methodology for automatic retrieval of similarly shaped machinable components Mark Ascher - Dept of ECE.
Introduction to emulators Tony O’Hagan University of Sheffield.
Spatial Searches in the ODM. slide 2 Common Spatial Questions Points in region queries 1.Find all objects in this region 2.Find all “good” objects (not.
Astronomical Data Processing & Workflow Scheduling in cloud
Database Performance Measurement
Java 9: The Quest for Very Large Heaps
Cross-matching the sky with database server cluster
Sky Query: A distributed query engine for astronomy
Efficient Distribution-based Feature Search in Multi-field Datasets Ohio State University (Shen) Problem: How to efficiently search for distribution-based.
Efficient Catalog Matching with Dropout Detection
CSE 1020:Software Development
CS222/CS122C: Principles of Data Management UCI, Fall 2018 Notes #03 Row/Column Stores, Heap Files, Buffer Manager, Catalogs Instructor: Chen Li.
Computer Organization and Design Chapter 4
LSST, the Spatial Cross-Match Challenge
Presentation transcript:

20 Spatial Queries for an Astronomer's Bench (mark) María Nieto-Santisteban 1 Tobias Scholl 2 Alexander Szalay 1 Alfons Kemper 2 1. The Johns Hopkins University, 2. Technische Universität München

Maria A. Nieto-Santisteban / ADASS 2007 Motivation Most astronomical data is or must be online Data & catalogs are growing No single best data management solution  Different projects approach the data access challenge in different ways  Different users and science problems have different needs A well defined “test” suite to provide objective comparison would be useful

Maria A. Nieto-Santisteban / ADASS 2007 Benchmarking Key criteria for a domain-specific benchmark Relevant: It must measure the peak performance and price/performance of systems when performing typical operations within that problem domain Portable: It should be easy to implement the benchmark on many different systems and architectures Scaleable: The benchmark should apply to small and large computer systems. It should be possible to scale the benchmark up to larger systems, and to parallel computer systems as computer performance and architecture evolve Simple: The benchmark must be understandable, otherwise it will lack credibility “The Benchmark Handbook: For Database and Transaction Processing Systems,” Jim Gray Repeatable

Maria A. Nieto-Santisteban / ADASS 2007 Data Corpus (+ outputs) TablenColrSize(B)nRowtSize(GB)Notes ROSAT DS K0.02RASS FIRST DS K0.07 FP153252~ 0Footprint SDSS DR6 DS M89PhotoTag DS M27I < 20.0 FP253243~ 0Footprint 2MASS DS M211PSC … * raw data size without indexes

Maria A. Nieto-Santisteban / ADASS 2007 The “20” Spatial Queries Single/Multi Catalog & Regions  Cone search: Find objects within a circle  Find objects within a circle satisfying a high multi- dimensional constraint  Find the closest neighbor  Find objects within a region  Find objects in/outside masked regions  Find objects near the edges of a region  Compute the area of a region  Find surveys covering a given region  Find the intersection between several surveys  Count objects from a list of regions

Maria A. Nieto-Santisteban / ADASS 2007 The “20” Spatial Queries  Find these 1k - 100k objects in these catalogs  For all catalogs, extract a random sample of existing objects within a given region  Cross-match 2 catalogs within a given region  Cross-match n catalogs, n > 2, within a given region  Find objects which are in A, B, and C but not in D  Given a sparse grid, find the closest grid point for all objects in the catalog  Find multiple detections of the same object with given magnitudes variations  Find all quasars within a region and compute their distance to surroundings galaxies  more... open to discussion …

Maria A. Nieto-Santisteban / ADASS 2007 Query Specification Inspired by the TPC Benchmark tm H spec. Queries are defined using: The business question, using natural language sets the context and functionality, and specifies the inputs and outputs parameters Substitution parameters Random input parameters Random output columns Validation Provide a few queries with known results  Functional query, defines the business question using SQL

Maria A. Nieto-Santisteban / ADASS 2007 An Example: Cone Search Query Business Question: The query describes sky position and an angular distance (both in degrees), defining a cone on the sky. The response returns a list of astronomical sources from the SDSSDR6 catalog, ds4 in the corpus, whose positions lie within the cone. The response shall include: objid, Ra, Dec, d[deg], modelMag_u|r|i|z|g Substitution Parameters: Runs for 1000 random points Runs for Radius = 4”, 16”, 30”, 1’, 4’, 30’, 1 deg, 4 deg Returns 10, 20, 30 additional randomly selected columns

Maria A. Nieto-Santisteban / ADASS 2007 An Example: Cone Search Query Validation:  RA = 195.0, DEC = 2.5, Radius = 1’ objidradecd[deg]mMag_imMag_gmMag_rmMag_umMag_z rows

Maria A. Nieto-Santisteban / ADASS 2007 The Metrics Quantitative (min, max, avg, geometric mean, stdev)  Elapsed time [s], CPU [s], I/O  Throughput: Data [Mbs], Work [Jobs/s]  Total space: Data + Index [GB]  Number of concurrent users Qualitative (capabilities)  Query flexibility Single catalog Multi-catalog Inter-catalog  Data flow: upload and download, MB, GB?  Distributed access to data and services  Workflow definition  Data, results & code sharing  Job management

Maria A. Nieto-Santisteban / ADASS 2007 Final Remarks Benchmarking is difficult and controversial Different hardware, different software, changing environments, … Scalability is not guaranteed