
MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database
Milena Ivanova, Niels Nes, Romulo Goncalves, Martin Kersten
CWI, Amsterdam
Presented at SSDBM, July 2007, Banff, Canada

SkyServer provides public access to SDSS for astronomers, students, and the general public
SDSS: a project to make a map of a large part of the Universe
–230 million object images
–1 million spectra
–4TB catalog data
–9TB images

DBDBD’07, Eindhoven M. Ivanova et al., CWI
SkyServer Schema
[Figure: SkyServer schema diagram, annotated as follows]
–446 columns, >370 million rows
–Vertical fragment of 100+ popular columns
–Materialized join of Photo and Spectra

Outline
MonetDB/SQL
SkyServer porting lessons
Query log lessons
Evaluation
Outlook

MonetDB Background
[Figure: the PhotoObjAll table (columns Ra, Dec, U, …) decomposed into one BAT per column (Ra BAT, Dec BAT, U BAT), each BAT with a head (H) and a tail (T) column]
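The decomposition in the figure can be illustrated with a small sketch. It splits a row-oriented table into one binary table per column, each pairing a head (row identifier) with a tail (value). The sample rows and the helper names `decompose` and `reconstruct` are hypothetical and only meant to show the idea; this is not MonetDB code.

```python
from typing import Dict, List, Tuple

# Hypothetical sample rows; the values are made up for illustration.
ROWS = [
    {"ra": 195.0, "dec": 2.5, "u": 19.1},
    {"ra": 195.1, "dec": 2.6, "u": 20.4},
    {"ra": 10.3, "dec": -1.2, "u": 18.7},
]

BAT = List[Tuple[int, float]]  # (head = row id, tail = column value)


def decompose(rows: List[dict]) -> Dict[str, BAT]:
    """Split rows into one (head, tail) table per column."""
    bats: Dict[str, BAT] = {}
    for oid, row in enumerate(rows):
        for col, val in row.items():
            bats.setdefault(col, []).append((oid, val))
    return bats


def reconstruct(bats: Dict[str, BAT], oid: int, columns: List[str]) -> dict:
    """Rebuild part of a row by looking up the same head in each column table."""
    return {col: dict(bats[col])[oid] for col in columns}


bats = decompose(ROWS)
# A predicate on ra touches only the ra table, not dec or u.
hits = [oid for oid, ra in bats["ra"] if ra > 100.0]
```

The point of the layout is that a query over a few of the 446 columns reads only those columns.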

MonetDB Architecture
[Figure: components SQL, XQuery, MAL, Tactical Optimizer and MonetDB Kernel, within the MonetDB Server]

Query:
    select count(*) from photoobjall;

MAL plan:
    function user.s3_1():void;
        X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0);
        X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1);
        X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2);
        X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1);
        X8 := algebra.kunion(X1,X6);
        X11 := algebra.kdifference(X8,X9);
        X12 := algebra.kunion(X11,X9);
        X14 := bat.reverse(X13);
        X15 := algebra.kdifference(X12,X14);
        X16 :=
        X18 := algebra.markT(X15,X16);
        X19 := bat.reverse(X18);
        X20 := aggr.count(X19);
        sql.exportValue(1,"sys.","count_","int",32,0,6,X20,"");
    end s3_1;
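One way to read the MAL plan is sketched below in Python, with each BAT modelled as a dict from head to tail. The reading that the three `sql.bind` calls fetch the base column, pending inserts and pending updates, and that `sql.bind_dbat` fetches the deleted row ids, is an assumption made for this illustration, as are the helper functions, which only mimic the operator names. The `markT`/`reverse` renumbering steps are omitted because they do not change the count.

```python
def kunion(a: dict, b: dict) -> dict:
    """Union on head values; entries of `a` win when a head occurs in both."""
    out = dict(a)
    for head, tail in b.items():
        out.setdefault(head, tail)
    return out


def kdifference(a: dict, b: dict) -> dict:
    """Entries of `a` whose head does not occur as a head in `b`."""
    return {head: tail for head, tail in a.items() if head not in b}


def reverse(bat: dict) -> dict:
    """Swap head and tail."""
    return {tail: head for head, tail in bat.items()}


def count_visible(base: dict, inserts: dict, updates: dict, deleted: dict) -> int:
    x8 = kunion(base, inserts)       # base rows plus inserted rows
    x11 = kdifference(x8, updates)   # drop stale versions of updated rows
    x12 = kunion(x11, updates)       # add the updated versions back
    x14 = reverse(deleted)           # deleted row ids become heads
    x15 = kdifference(x12, x14)      # drop deleted rows
    return len(x15)


# Hypothetical contents: three base rows, one insert, one update, one delete.
n = count_visible(
    base={0: 100, 1: 101, 2: 102},
    inserts={3: 103},
    updates={1: 999},
    deleted={0: 2},  # position -> deleted row id
)
```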

SkyServer with MonetDB
Goal: to provide a SkyServer mirror with similar functionality using MonetDB
Three phases: 1%, 10%, entire SDSS data set
Can we:
–Do better in terms of performance and functionality?
–Improve query processing by novel parallelism and query cracking techniques?
–Extend functionality to support, e.g., LOFAR?

Portability Lessons
Need for rich SQL environment (PSM)
Cast to SQL:2003 standard
–Replacement of data types and operations
–Specific extensions ignored or replaced
Avoid data redundancy
–Auxiliary tables replaced by views: 10% size reduction
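The "auxiliary tables replaced by views" point can be sketched as follows. The example uses Python's built-in sqlite3 module rather than MonetDB, and the table layout, the `mode = 1` filter and the sample rows are assumptions chosen for illustration. The view stores no rows of its own, so it adds nothing to the database size and always reflects the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photoobjall (objid INTEGER PRIMARY KEY, mode INTEGER, ra REAL, dec REAL);
    INSERT INTO photoobjall VALUES (1, 1, 195.0, 2.5), (2, 2, 195.1, 2.6), (3, 1, 10.3, -1.2);

    -- A view instead of a second, materialized copy of the selected rows.
    CREATE VIEW photoprimary AS
        SELECT objid, ra, dec FROM photoobjall WHERE mode = 1;
    """
)

primary_ids = [r[0] for r in conn.execute("SELECT objid FROM photoprimary ORDER BY objid")]

# New base rows show up in the view without any extra maintenance step.
conn.execute("INSERT INTO photoobjall VALUES (4, 1, 11.0, 0.0)")
primary_ids_after = [r[0] for r in conn.execute("SELECT objid FROM photoprimary ORDER BY objid")]
```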

Spatial Search Lesson
HTM (Hierarchical Triangular Mesh)
–Implemented in C++, C#
–Good for point-near-point and point-in-region queries
Zones
–Implemented in SQL
–Good for point-near-point (x3)
–Efficient for batch-oriented spatial join (x32)
–Enables SQL optimizer usage
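A minimal sketch of the zones idea for point-near-point search, in Python rather than SQL: the sky is cut into horizontal declination stripes, each object is stored under its stripe number, and a search only inspects the stripes and the right-ascension window that can contain matches before applying an exact distance test. The zone height, the function names and the sample data are assumptions, and the sketch ignores right-ascension wraparound at 0/360 degrees and is only adequate for small radii away from the poles.

```python
import math
from bisect import bisect_left, bisect_right
from collections import defaultdict

ZONE_HEIGHT = 0.5  # degrees; hypothetical choice


def zone_of(dec: float) -> int:
    """Index of the declination stripe containing dec."""
    return int(math.floor((dec + 90.0) / ZONE_HEIGHT))


def unit_vector(ra: float, dec: float):
    ra_r, dec_r = math.radians(ra), math.radians(dec)
    return (math.cos(dec_r) * math.cos(ra_r),
            math.cos(dec_r) * math.sin(ra_r),
            math.sin(dec_r))


def separation_deg(ra1, dec1, ra2, dec2) -> float:
    """Angular separation in degrees via the dot product of unit vectors."""
    dot = sum(a * b for a, b in zip(unit_vector(ra1, dec1), unit_vector(ra2, dec2)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def build_zone_index(points):
    """points: iterable of (obj_id, ra, dec). Each zone is kept sorted by ra."""
    zones = defaultdict(list)
    for obj_id, ra, dec in points:
        zones[zone_of(dec)].append((ra, dec, obj_id))
    for bucket in zones.values():
        bucket.sort()
    return zones


def nearby(zones, ra: float, dec: float, radius: float):
    """Ids of objects within `radius` degrees of (ra, dec)."""
    # Widen the ra window because meridians converge away from the equator.
    ra_pad = radius / math.cos(math.radians(min(abs(dec) + radius, 89.9)))
    hits = []
    for z in range(zone_of(dec - radius), zone_of(dec + radius) + 1):
        bucket = zones.get(z, [])
        lo = bisect_left(bucket, (ra - ra_pad,))
        hi = bisect_right(bucket, (ra + ra_pad, float("inf")))
        for cand_ra, cand_dec, obj_id in bucket[lo:hi]:
            if separation_deg(ra, dec, cand_ra, cand_dec) <= radius:
                hits.append(obj_id)
    return sorted(hits)


# Hypothetical objects; radius of 0.05 degrees is 3 arcminutes.
index = build_zone_index([(1, 195.0, 2.5), (2, 195.02, 2.51), (3, 196.0, 2.5), (4, 195.0, 3.2)])
found = nearby(index, 195.0, 2.5, 0.05)
```

Because the stripe number and the ra window are plain range predicates, the same search can be written as an ordinary SQL join, which is what lets the SQL optimizer take part.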

Query Log Lessons
Query logs important for both application and science
Analysed 1.2M queries, August 2006
Spatial access prevails (83%)
Small core of photo and spectro tables accessed
–64% photo, 44% spectro, 27% both
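The table-access percentages overlap, so they do not sum to 100. Assuming all three figures are shares of the same set of logged queries and "both" means a query touching photo and spectro tables together (the slide does not say so explicitly), inclusion-exclusion gives the breakdown sketched here.

```python
# Shares in percent, as read from the slide.
photo, spectro, both = 64, 44, 27

either = photo + spectro - both   # queries touching photo or spectro tables
photo_only = photo - both         # photo tables but not spectro
spectro_only = spectro - both     # spectro tables but not photo
neither = 100 - either            # queries touching neither group
```

Under that assumption, 81% of the queries touch the photo or spectro core and 19% touch neither.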

Common Patterns
Limited number of query patterns
–Correlation to web site interface
Most popular query (25%)
    SELECT top 10 p.objID, p.run, p.rerun, p.camcol, p.field, p.obj, p.type,
           p.ra, p.dec, p.u, p.g, p.r, p.i, p.z,
           p.Err_u, p.Err_g, p.Err_r, p.Err_i, p.Err_z
    FROM fGetNearbyObjEq(195,2.5,3) n, PhotoPrimary p
    WHERE n.objID = p.objID;
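The shape of the most popular query can be sketched in Python: a table function returns the ids of objects near a sky position, and those ids are joined back to the photo table, keeping at most ten rows. The sketch assumes the three arguments of `fGetNearbyObjEq` are right ascension and declination in degrees and a radius in arcminutes; the brute-force scan, the function names and the sample rows are illustrative stand-ins, not the SkyServer implementation.

```python
import math


def angular_sep_arcmin(ra1, dec1, ra2, dec2) -> float:
    """Great-circle separation (haversine formula), in arcminutes."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 60.0


def nearby_obj_eq(photo, ra, dec, radius_arcmin):
    """Stand-in for the table function: ids of objects inside the cone."""
    return [row["objID"] for row in photo
            if angular_sep_arcmin(ra, dec, row["ra"], row["dec"]) <= radius_arcmin]


def popular_query(photo, ra, dec, radius_arcmin, top=10):
    """Join the nearby ids back to the photo rows and keep at most `top` rows."""
    by_id = {row["objID"]: row for row in photo}
    ids = nearby_obj_eq(photo, ra, dec, radius_arcmin)
    return [by_id[obj_id] for obj_id in ids][:top]


# Hypothetical photo rows with only a few of the selected columns.
PHOTO = [
    {"objID": 1, "ra": 195.0, "dec": 2.5, "r": 18.2},
    {"objID": 2, "ra": 195.03, "dec": 2.52, "r": 19.0},
    {"objID": 3, "ra": 195.2, "dec": 2.5, "r": 17.5},
]
rows = popular_query(PHOTO, 195, 2.5, 3)
```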

Query Coverage
[Figure: query coverage on the sky]

Spatial Overlap
24% queries overlap
Mean sequence length of 9.4, max of 6200
Overlap and equality patterns for script-based interaction
Zoom in/zoom out patterns for manual interaction
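The slide does not define how overlap sequences were measured. One plausible reading, sketched here under stated assumptions, treats each spatial query as a bounding box in (ra, dec) and counts maximal runs of consecutive queries in which each query overlaps its predecessor. The box representation, the function names and the sample log are hypothetical, and right-ascension wraparound is ignored.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (ra_min, ra_max, dec_min, dec_max)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def overlap_sequences(queries: List[Box]) -> List[int]:
    """Lengths of maximal runs (two or more queries) of overlapping neighbours."""
    lengths = []
    run = 0
    prev = None
    for q in queries:
        if prev is not None and boxes_overlap(prev, q):
            run += 1
        else:
            if run > 1:
                lengths.append(run)
            run = 1
        prev = q
    if run > 1:
        lengths.append(run)
    return lengths


# Hypothetical log: a sliding scan of three boxes, two isolated queries,
# the second of which is then repeated exactly (an "equality" pattern).
LOG = [
    (10.0, 11.0, 0.0, 1.0),
    (10.5, 11.5, 0.5, 1.5),
    (11.0, 12.0, 1.0, 2.0),
    (50.0, 51.0, 0.0, 1.0),
    (80.0, 81.0, 0.0, 1.0),
    (80.0, 81.0, 0.0, 1.0),
]
lengths = overlap_sequences(LOG)
mean_length = sum(lengths) / len(lengths)
```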

Evaluation on 1GB

Evaluation on 100GB
‘Color-cut’ for low-z quasars
    SELECT g, run, rerun, camcol, field, objID
    FROM Galaxy
    WHERE ( (g <= 22)
        and (u - g >= -0.27) and (u - g < 0.71)
        and (g - r >= -0.24) and (g - r < 0.35)
        and (r - i >= -0.27) and (r - i < 0.57)
        and (i - z >= -0.35) and (i - z < 0.7) );
Moving asteroids
    SELECT objID, sqrt(power(rowv,2) + power(colv,2)) as velocity
    FROM PhotoObj
    WHERE power(rowv,2) + power(colv,2) > 50
        and rowv >= 0 and colv >= 0;
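The predicates of the two benchmark queries can be restated as plain Python functions, which makes the logic easy to check on individual rows. The thresholds are copied from the SQL; the function names and the sample magnitudes used to exercise them are hypothetical.

```python
import math


def is_color_cut_match(u: float, g: float, r: float, i: float, z: float) -> bool:
    """WHERE clause of the color-cut query: a magnitude limit plus four color ranges."""
    return (g <= 22
            and -0.27 <= u - g < 0.71
            and -0.24 <= g - r < 0.35
            and -0.27 <= r - i < 0.57
            and -0.35 <= i - z < 0.7)


def asteroid_velocity(rowv: float, colv: float):
    """Velocity of a row passing the moving-asteroids filter, else None."""
    speed_sq = rowv ** 2 + colv ** 2
    if speed_sq > 50 and rowv >= 0 and colv >= 0:
        return math.sqrt(speed_sq)
    return None


# Hypothetical rows.
match = is_color_cut_match(u=20.0, g=19.8, r=19.7, i=19.6, z=19.5)
too_faint = is_color_cut_match(u=22.7, g=22.5, r=22.4, i=22.3, z=22.2)
fast = asteroid_velocity(6.0, 8.0)
slow = asteroid_velocity(3.0, 4.0)
```

The asteroid filter compares the squared speed with 50, so the square root is only needed for rows that already qualify.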

Status
Staircase to the sky
–1GB: done
–100GB: towards completion
–Entire 4TB DR6: in progress
Web site

Inspirations
Self-organization vs. hard-coded zoning
–Adaptive segmentation (ICDE’08)
–Adaptive replication (EDBT’08)
Results caching and reuse
Workload-driven optimization