slide 1 PS1 PSPS Object Data Manager Design PSPS Critical Design Review November 5-6, 2007 IfA

slide 2 Detail Design: General Concepts, Distributed Database architecture, Ingest Workflow, Prototype

slide 3 Zones (spatial partitioning and indexing algorithm)
- Partition and bin the data into declination zones: ZoneID = floor((dec + 90) / zoneHeight)
- A few tricks are required to handle spherical geometry
- Place the data close together on disk: clustered index on ZoneID and RA
- Fully implemented in SQL
- Efficient nearby searches and, especially, cross-match
- Plays a fundamental role in addressing the critical requirements: data volume management, association speed, and spatial capabilities
[Figure: the sky divided into declination (Dec) zones, indexed along Right Ascension (RA)]

slide 4 Zoned Table
Columns: ObjID | ZoneID* | RA | Dec | CX | CY | CZ | …
* ZoneHeight = 8 arcsec in this example; ZoneID = floor((dec + 90) / zoneHeight)
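
As a sketch of how such a zoned table could be declared in T-SQL (column names and types are illustrative, not the actual ODM schema), with the clustered index on (zoneID, ra) that keeps spatially nearby rows adjacent on disk:

    -- Illustrative zoned-table layout; zoneHeight = 8 arcsec as in the slide.
    CREATE TABLE ObjectsZ (
        objID  bigint NOT NULL,
        ra     float  NOT NULL,     -- degrees
        [dec]  float  NOT NULL,     -- degrees
        cx     float  NOT NULL,     -- unit-vector components for exact distance tests
        cy     float  NOT NULL,
        cz     float  NOT NULL,
        -- persisted computed column so it can carry the clustered index
        zoneID AS CAST(FLOOR(([dec] + 90.0) * 3600.0 / 8.0) AS int) PERSISTED
    );

    -- Rows in the same declination zone and nearby in RA end up physically
    -- close on disk, which is what makes zone-based searches fast.
    CREATE UNIQUE CLUSTERED INDEX cix_ObjectsZ ON ObjectsZ (zoneID, ra, objID);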

slide 5 SQL CrossNeighbors
SELECT *
FROM prObj1 z1
JOIN zoneZone ZZ ON ZZ.zoneID1 = z1.zoneID
JOIN prObj2 z2  ON ZZ.zoneID2 = z2.zoneID
WHERE z2.ra  BETWEEN z1.ra  - ZZ.alpha AND z1.ra  + ZZ.alpha
  AND z2.dec BETWEEN z1.dec - @theta   AND z1.dec + @theta
  AND (z1.cx*z2.cx + z1.cy*z2.cy + z1.cz*z2.cz) > COS(RADIANS(@theta))
-- @theta is the search radius in degrees; the slide used a fixed numeric radius.
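
The query above relies on a zoneZone helper table that pairs every zone with the zones that can contain matches within the search radius, carrying an RA window alpha that widens toward the poles. The slides do not show how it is built; a conservative sketch, assuming a Zones(zoneID, decMin, decMax) helper table and a simplified alpha (the published Zones algorithm uses a tighter formula), could be:

    DECLARE @theta float, @zoneHeight float;
    SET @theta = 1.0 / 3600.0;        -- 1 arcsec search radius, in degrees
    SET @zoneHeight = 8.0 / 3600.0;   -- 8 arcsec zones, in degrees

    -- For each zone, keep the zones close enough in declination to contain a
    -- match, and widen the RA window by 1/cos(dec), capped near the poles.
    SELECT z1.zoneID AS zoneID1,
           z2.zoneID AS zoneID2,
           @theta / COS(RADIANS(CASE WHEN d.maxAbsDec > 89.9 THEN 89.9
                                     ELSE d.maxAbsDec END)) AS alpha
    INTO   zoneZone
    FROM   Zones z1
    JOIN   Zones z2
      ON   z2.zoneID BETWEEN z1.zoneID - CEILING(@theta / @zoneHeight)
                         AND z1.zoneID + CEILING(@theta / @zoneHeight)
    CROSS APPLY (SELECT CASE WHEN ABS(z2.decMin) > ABS(z2.decMax)
                             THEN ABS(z2.decMin) ELSE ABS(z2.decMax) END AS maxAbsDec) AS d;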

slide 6 Good CPU Usage

slide 7 Partitions
- SQL Server 2005 introduces technology to handle tables that are partitioned across different disk volumes and managed by a single server
- Partitioning makes management and access of large tables and indexes more efficient:
  - Enables parallel I/O
  - Reduces the amount of data that needs to be accessed
  - Related tables can be aligned and collocated in the same place, speeding up JOINs

slide 8 Partitions
- Two key elements:
  - Partitioning function: specifies how the table or index is partitioned
  - Partitioning scheme: using a partitioning function, the scheme specifies the placement of the partitions on file groups
- Data can be managed very efficiently using partition switching:
  - Add a table as a partition to an existing table
  - Switch a partition from one partitioned table to another
  - Reassign a partition to form a single table
- Main requirement: the table must be constrained on the partitioning column
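
A hedged T-SQL sketch of these two elements plus a partition switch; the file groups fg1-fg4, boundary values, and table names are placeholders rather than the real ODM configuration:

    -- Partition function: maps objID values to partitions
    -- (RANGE RIGHT: each boundary value starts a new partition).
    CREATE PARTITION FUNCTION pf_objid (bigint)
    AS RANGE RIGHT FOR VALUES (10000000000, 20000000000, 30000000000);

    -- Partition scheme: places each partition of pf_objid on a file group
    -- (fg1..fg4 are assumed to exist already).
    CREATE PARTITION SCHEME ps_objid
    AS PARTITION pf_objid TO (fg1, fg2, fg3, fg4);

    -- A table created on the scheme is automatically partitioned by objID.
    CREATE TABLE P2PsfFits_part (
        objID    bigint NOT NULL,
        detectID bigint NOT NULL,
        CONSTRAINT pk_P2PsfFits_part PRIMARY KEY CLUSTERED (objID, detectID)
    ) ON ps_objid (objID);

    -- Staging table for one declination range: same structure, same file group
    -- as partition 2, and constrained on the partitioning column
    -- (the "main requirement" on the slide).
    CREATE TABLE P2PsfFits_stage (
        objID    bigint NOT NULL,
        detectID bigint NOT NULL,
        CONSTRAINT pk_P2PsfFits_stage PRIMARY KEY CLUSTERED (objID, detectID),
        CONSTRAINT ck_stage_range CHECK (objID >= 10000000000 AND objID < 20000000000)
    ) ON fg2;

    -- Metadata-only operation: the loaded staging table becomes partition 2.
    ALTER TABLE P2PsfFits_stage SWITCH TO P2PsfFits_part PARTITION 2;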

slide 9 Partitions
- For the PS1 design, "partitions" mean file-group partitions
- Tables are partitioned into ranges of ObjectID, which correspond to declination ranges
- ObjectID boundaries are selected so that each partition has a similar number of objects

slide 10 Distributed Partitioned Views
- Tables participating in a Distributed Partitioned View (DPV) reside in different databases, which may live on different instances or on different (linked) servers
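
For illustration only, a distributed partitioned view over hypothetical slice servers could be declared like this; each member table carries a CHECK constraint on objID so the optimizer can prune slices that cannot hold the requested range:

    -- On the main server; Slice1..Slice3 are linked servers, each holding one
    -- objID range of the detections (illustrative names and ranges).
    CREATE VIEW Detections
    AS
    SELECT * FROM Slice1.PS1_p1.dbo.Detections_p1   -- CHECK (objID <  10000000000)
    UNION ALL
    SELECT * FROM Slice2.PS1_p2.dbo.Detections_p2   -- CHECK (objID >= 10000000000 AND objID < 20000000000)
    UNION ALL
    SELECT * FROM Slice3.PS1_p3.dbo.Detections_p3;  -- CHECK (objID >= 20000000000)

A query such as SELECT * FROM Detections WHERE objID BETWEEN @lo AND @hi is then answered only by the slice(s) whose CHECK constraint overlaps that range.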

slide 11 Concept: Slices
- In the PS1 design, the bigger tables are partitioned across servers
- To avoid confusion with the file-group partitioning, we call these server-level partitions "Slices"
- Data is glued together using Distributed Partitioned Views
- The ODM will manage slices; using slices improves system scalability
- For the PS1 design, tables are sliced into ranges of ObjectID, which correspond to broad declination ranges; each slice is subdivided into partitions that correspond to narrower declination ranges
- ObjectID boundaries are selected so that each slice has a similar number of objects

slide 12 Detail Design Outline: General Concepts, Distributed Database architecture, Ingest Workflow, Prototype

slide 13 PS1 Distributed DB system
[Architecture diagram: the main PS1 database (PartitionsMap, Objects, LnkToObj, Meta, and Detections as a partitioned view) is connected via linked servers to slice databases P1…Pm, each holding [Objects_p], [LnkToObj_p], and [Detections_p] partitioned tables plus Meta; a LoadAdmin server coordinates load-support databases (objZoneIndx, Orphans, Detections, LnkToObj staging tables); a Query Manager (QM) and Web Based Interface (WBI) sit on top. Legend: database, full table, [partitioned table], output table, partitioned view.]

slide 14 Design Decisions: ObjID
- Objects have their positional information encoded in their objID: fGetPanObjID (ra, dec, zoneH)
- ZoneID is the most significant part of the ID
- This gives scalability, performance, and spatial functionality
- Object tables are range-partitioned according to their object ID
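
The slides give the signature fGetPanObjID(ra, dec, zoneH) but not the exact bit layout; a hypothetical encoding in the same spirit (zoneID in the most significant digits, scaled RA below it) might look like:

    -- Hypothetical sketch only: the real fGetPanObjID layout is defined by the ODM.
    CREATE FUNCTION dbo.fGetPanObjID_sketch (@ra float, @dec float, @zoneH float)
    RETURNS bigint
    AS
    BEGIN
        DECLARE @zoneID bigint, @raBin bigint;
        SET @zoneID = FLOOR((@dec + 90.0) / @zoneH);   -- most significant part
        SET @raBin  = FLOOR(@ra * 1000000.0);          -- RA in micro-degrees (< 3.6e8)
        RETURN @zoneID * 1000000000 + @raBin;
    END;

With such an encoding, objects sort by zone and then by RA, so range-partitioning on objID corresponds directly to declination ranges.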

slide 15 ObjectID Clusters Data Spatially
ObjectID is built from the object's position: ZID = (Dec + 90) / ZH gives the zone, which fills the most significant digits, and RA fills the remainder. ObjectID is unique provided objects are separated by more than a minimum angular distance set by the RA resolution of the ID (a small fraction of an arcsec).

slide 16 Design Decisions: DetectID
- Detections have their positional information encoded in the detection identifier: fGetDetectID (dec, observationID, runningID, zoneH)
- Primary key (objID, detectID) aligns detections with objects within partitions
- Provides efficient access to all detections associated with one object
- Provides efficient access to all detections of nearby objects
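
Analogously, a detection identifier could be packed as follows; this is a hypothetical layout with illustrative field widths, not the actual fGetDetectID:

    CREATE FUNCTION dbo.fGetDetectID_sketch
        (@dec float, @observationID bigint, @runningID bigint, @zoneH float)
    RETURNS bigint
    AS
    BEGIN
        DECLARE @zoneID bigint;
        SET @zoneID = FLOOR((@dec + 90.0) / @zoneH);
        -- zone in the most significant digits, then the exposure, then a
        -- running number within the exposure (field widths are assumptions)
        RETURN @zoneID * CAST(10000000000 AS bigint)
               + (@observationID % 100000) * 100000
               + (@runningID % 100000);
    END;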

slide 17 DetectionID Clusters Data in Zones
DetectID is assembled from ZID = (Dec + 90) / ZH, the ObservationID, and a running ID, so detections sort by zone first.

slide 18 ODM Capacity
The PS1 ODM shall be able to ingest a total of:
- 1.5 × 10^11 P2 detections
- 8.3 × 10^10 cumulative sky (stack) detections
- 5.5 × 10^9 celestial objects
together with their linkages.

slide 19 PS1 Table Sizes - Monolithic
Projected sizes in TB by survey year (Year 1 / Year 2 / Year 3 / Year 3.5) for Objects (2.31 TB), StackPsfFits, StackToObj, StackModelFits, P2PsfFits, P2ToObj, other tables, and indexes (+20%), with the grand total.

slide 20 What goes into the main Server
[Diagram: the main PS1 database holds the full Objects, LnkToObj, Meta, and PartitionsMap tables, plus the distributed partitioned views over the slices P1…Pm reached through linked servers. Legend: database, full table, [partitioned table], output table, distributed partitioned view.]

slide 21 What goes into slices
[Diagram: each slice database P1…Pm holds its own partitioned tables [Objects_p], [LnkToObj_p], and [Detections_p], together with copies of PartitionsMap and Meta; the main PS1 database keeps the full Objects, LnkToObj, Meta, and PartitionsMap tables.]

slide 22 What goes into slices (continued)
[Same diagram as slide 21.]

slide 23 Duplication of Objects & LnkToObj
- Objects are distributed across slices
- Objects, P2ToObj, and StackToObj are duplicated in the slices to parallelize "inserts" & "updates"
- Detections belong in their object's slice
- Orphans belong to the slice where their position would allocate them; orphans near slice boundaries will need special treatment
- Objects keep their original object identifier, even though positional refinement might change their zoneID and therefore the most significant part of their identifier

slide 24 Glue = Distributed Views
[Diagram: the Detections view on the main PS1 database is a distributed partitioned view gluing together the [Detections_p] tables on slices P1…Pm via linked servers.]

slide 25 Partitioning in Main Server
- The main server is partitioned (Objects) and collocated (LnkToObj) by objID
- Slices are likewise partitioned (Objects) and collocated (LnkToObj) by objID
[Diagram: Query Manager (QM) and Web Based Interface (WBI) on top of the main PS1 database and the slices P1…Pm, connected through linked servers.]
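
Collocation can be expressed by creating both tables on the same partition scheme, keyed on objID; a sketch assuming the ps_objid scheme from the earlier partitioning example (table layouts are illustrative):

    -- Both tables partitioned on the same scheme and column, so joins on objID
    -- can be resolved partition-by-partition (collocated).
    CREATE TABLE Objects_part (
        objID bigint NOT NULL PRIMARY KEY CLUSTERED,
        ra    float  NOT NULL,
        [dec] float  NOT NULL
    ) ON ps_objid (objID);

    CREATE TABLE P2ToObj_part (
        objID    bigint NOT NULL,
        detectID bigint NOT NULL,
        CONSTRAINT pk_P2ToObj_part PRIMARY KEY CLUSTERED (objID, detectID)
    ) ON ps_objid (objID);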

slide 26 PS1 Table Sizes - Main Server
Projected sizes in TB by survey year (Year 1 / Year 2 / Year 3 / Year 3.5) for Objects (2.31 TB), StackToObj, P2ToObj, other tables, and indexes (+20%), with the grand total; StackPsfFits, StackModelFits, and P2PsfFits are not stored on the main server.

slide 27 PS1 Table Sizes - Each Slice
Projected sizes in TB per slice by survey year, with the number of slices growing over time (m = 4 in Year 1, m = 8 in Year 2, m = 10 in Year 3, m = 12 in Year 3.5), for Objects, StackPsfFits, StackToObj, StackModelFits, P2PsfFits, P2ToObj, other tables, and indexes (+20%), with the per-slice total.

slide 28 PS1 Table Sizes - All Servers
Projected total sizes in TB across all servers by survey year (Year 1 / Year 2 / Year 3 / Year 3.5) for Objects, StackPsfFits, StackToObj, StackModelFits, P2PsfFits, P2ToObj, other tables, and indexes (+20%), with the grand total.

slide 29 Detail Design Outline: General Concepts, Distributed Database architecture, Ingest Workflow, Prototype

slide 30 PS1 Distributed DB system
[Same architecture diagram as slide 13: main PS1 database, slices P1…Pm, LoadAdmin with load-support databases, Query Manager (QM), and Web Based Interface (WBI).]

slide 31 “Insert” & “Update”  SQL Insert and Update are expensive operations due to logging and re-indexing  In the PS1 design, Insert and Update have been re- factored into sequences of: Merge + Constrain + Switch Partition  Frequency f1: daily f2: at least monthly f3: TBD (likely to be every 6 months)

slide 32 Ingest Workflow
[Workflow diagram: incoming CSV detections are zoned (DZone), cross-matched X(1") against the ObjectsZ zone table into DXO_1a; unmatched detections (NoMatch) get a wider X(2") match into DXO_2a; the Resolve step then produces P2PsfFits, P2ToObj, and Orphans.]

slide 33 frequency = f1
[Diagram: daily (f1) flow from the LOADER (ObjectsZ, P2PsfFits, P2ToObj, Orphans) into SLICE_1 (Objects_1, Stack*_1, P2ToObj_1, P2ToPsfFits_1, Orphans_1) and the MAIN server (Metadata+, Objects, P2ToObj, StackToObj), in three numbered steps.]

slide 34 frequency = f2
[Diagram: at frequency f2 the refreshed Objects are pushed from the LOADER toward SLICE_1 and the MAIN server (Metadata+, Objects, P2ToObj, StackToObj).]

slide 35 frequency = f2 (continued)
[Diagram: continuation of the f2 flow; the refreshed Objects_1 is installed in SLICE_1 and Objects on the MAIN server.]

slide 36 frequency = f3
[Diagram: at frequency f3 a snapshot of Objects is taken on the MAIN server (Metadata+, Objects, P2ToObj, StackToObj).]

slide 37 Batch Update of a Partition
[Diagram: partitions A1, A2, A3, … and the newly loaded data are merged via SELECT INTO into staging tables B1, B2, B3, each built with a WHERE clause on its partition range plus a primary-key index, and then switched back into the partitioned table.]
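
A sketch of one such batch update for a single partition, assuming the P2PsfFits_part table and ps_objid scheme from the earlier sketches, a pre-created empty staging table A2 with identical structure, and a NewDetections table holding the freshly loaded rows; the objID range is a placeholder:

    -- 1. Switch partition 2 out into the empty staging table A2 (metadata only;
    --    A2 must have the same structure and live on partition 2's file group).
    ALTER TABLE P2PsfFits_part SWITCH PARTITION 2 TO A2;

    -- 2. Merge the old rows with the newly loaded rows for the same objID range.
    --    (B2 must also end up on partition 2's file group for the final switch.)
    SELECT *
    INTO   B2
    FROM   (SELECT * FROM A2
            UNION ALL
            SELECT * FROM NewDetections
            WHERE  objID >= 10000000000 AND objID < 20000000000) u;

    -- 3. Constrain: primary-key index plus the range check required for switching.
    ALTER TABLE B2 ADD CONSTRAINT pk_B2 PRIMARY KEY CLUSTERED (objID, detectID);
    ALTER TABLE B2 ADD CONSTRAINT ck_B2 CHECK (objID >= 10000000000 AND objID < 20000000000);

    -- 4. Switch the merged table back in as partition 2.
    ALTER TABLE B2 SWITCH TO P2PsfFits_part PARTITION 2;

The two switch steps are metadata-only; the real cost is in the SELECT INTO and the index/constraint builds.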

slide 38 Scaling-out
- Apply a ping-pong strategy to sustain query performance during ingest: 2 × (1 main + m slices)
[Diagram: duplicated main PS1 database and duplicated slices P1…Pm behind the Query Manager (QM), connected through linked servers. Legend: database, duplicate, full table, [partitioned table], partitioned view, duplicate partitioned view.]

slide 39 Scaling-out
- More robustness, fault tolerance, and reliability calls for 3 × (1 main + m slices)
[Diagram: as slide 38, with a third copy of the main database and slices.]

slide 40 Adding New Slices
SQL Server range-partitioning capabilities make it easy:
- Recalculate partitioning limits
- Transfer data to new slices
- Remove data from old slices
- Define and apply a new partitioning scheme
- Add new partitions to the main server
- Apply the new partitioning scheme to the main server
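
On each server the repartitioning itself uses the standard split and merge operations; a sketch against the pf_objid/ps_objid objects from the earlier example, with placeholder boundary values and file-group name:

    -- Tell the scheme which file group the next new partition should use,
    -- then split an existing range at a new objID boundary.
    ALTER PARTITION SCHEME ps_objid NEXT USED fg5;
    ALTER PARTITION FUNCTION pf_objid() SPLIT RANGE (25000000000);

    -- Removing a boundary merges two neighbouring partitions
    -- (used after data has been moved off a retired slice).
    ALTER PARTITION FUNCTION pf_objid() MERGE RANGE (30000000000);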

slide 41 Adding New Slices

slide 42 Detail Design Outline: General Concepts, Distributed Database architecture, Ingest Workflow, Prototype

slide 43 ODM Assoc/Update Requirement
- The PS1 ODM shall update the derived attributes for objects when new P2, P4 (stack), and cumulative sky detections are being correlated with existing objects.

slide 44 ODM Ingest Performance
The PS1 ODM shall be able to ingest the data from the IPP at two times the nominal daily arrival rate*
* The nominal daily data rate from the IPP is defined as the total data volume to be ingested annually by the ODM divided by 365.
Nominal daily data rates:
- 1.5 × 10^11 / 3.5 / 365 ≈ 1.2 × 10^8 P2 detections / day
- 8.3 × 10^10 / 3.5 / 365 ≈ 6.5 × 10^7 stack detections / day

slide 45 Number of Objects
- Object counts are tabulated for miniProto, myProto, Prototype, and PS1, broken down into SDSS* stars, SDSS* galaxies, and Galactic-plane objects, with totals reaching the 10^9 level
- * "SDSS" includes a mirror of the 11.3° < Dec < 30° objects to Dec < 0°
- Total CSV data loaded: 300 GB
- CSV bulk-insert load: 8 MB/s; binary bulk insert: 18-20 MB/s
- Creation started October 15th, 2007 and finished October 29th, 2007 (??)
- Includes 10 epochs of P2PsfFits detections and 1 epoch of Stack detections

slide 46 Time to Bulk Insert from CSV
Measured per file (columns: File, Rows, Row Size, GB, Minutes, Minutes/GB) for stars_plus_xai.csv, galaxy0_xal.csv, galaxy0_xam.csv, galaxy0_xan.csv, gp_6.csv, gp_10.csv, gp_11.csv, and one day of P2PsfFits.
- CSV bulk-insert speed ~ 8 MB/s
- Binary bulk-insert speed ~ 18-20 MB/s
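
Both load paths go through BULK INSERT; a sketch with placeholder file paths and staging-table names, contrasting the CSV and native-format options measured above:

    -- CSV load (~8 MB/s in the prototype measurements)
    BULK INSERT LoadDB.dbo.P2PsfFits_stage
    FROM 'D:\load\p2psffits_day1.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

    -- Native-format load (~18-20 MB/s); data previously exported with bcp ... -n
    BULK INSERT LoadDB.dbo.P2PsfFits_stage
    FROM 'D:\load\p2psffits_day1.bin'
    WITH (DATAFILETYPE = 'native', TABLOCK);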

slide 47 Prototype in Context
Survey: Objects / Detections
- SDSS DR6: 3.8 × 10^8 objects
- 2MASS: 4.7 × 10^8 objects
- USNO-B: 1.0 × 10^9 objects
- Prototype: 1.3 × 10^9 objects
- PS1 (end of survey): 5.5 × 10^9 objects, with detections at the 10^11 level

slide 48 Size of Prototype Database (billions of rows)
Row counts are split across Main, Slice1, Slice2, Slice3, and Loader for Objects, StackPsfFits (6.49 × 10^9 rows), StackToObj (6.49 × 10^9), StackModelFits (0.87 × 10^9), P2PsfFits, and P2ToObj, plus extra tables and a grand total.

slide 49 Size of Prototype Database (GB)
Table sizes in GB are split across Main, Slice1, Slice2, Slice3, and Loader for Objects, StackPsfFits, StackToObj, StackModelFits, P2PsfFits, and P2ToObj, plus extra tables and allocated/free space; the grand total amounts to several TB of data in a distributed database.

slide 50 Well-Balanced Partitions
Partition balance (Server, Partition, Rows, Fraction, Dec Range): each of the three main-server partitions holds ~432.5 million objects, and each of the three partitions on Slice 1, Slice 2, and Slice 3 holds ~144 million objects, so every partition covers a similar fraction of the catalog over a contiguous declination range roughly 7-9 degrees wide.

slide 51 Ingest and Association Times
Task / Measured Minutes:
- Create Detections Zone Table: 39.62
- X(0.2") 121M × 1.3B: 65.25
- Build #noMatches Table: 1.50
- X(1") 12k × 1.3B: 0.65
- Build #allMatches Table (121M): 6.58
- Build Orphans Table: 0.17
- Create P2PsfFits Table: 11.63
- Create P2ToObj Table: 14.00
- Total of Measured Times: 140.40

slide 52 Ingest and Association Times
Task / Estimated Minutes:
- Compute DetectionID, HTMID: 30
- Remove NULLs: 15
- Index P2PsfFits on ObjID: 15
- Slices pulling data from loader: 5
- Resolve 1 detection - N objects: 10
- Total of Estimated Times: 75
Estimates are a mix of educated guesses and wild guesses.

slide 53 Total Time to I/A Daily Data
Times in hours (binary ingest / CSV ingest):
- Ingest 121M detections: 0.32 (binary) / 0.98 (CSV)
- Total of measured times: 2.34
- Total of estimated times: 1.25
- Total time to ingest/associate daily data: ≈3.9 (binary) / ≈4.6 (CSV)
Requirement: less than 12 hours (more than 2800 detections/s)
Detection processing rate: 8600 to 7400 detections/s
Margin on requirement: 3.1 to 2.6
Using multiple loaders would improve performance.

slide 54 Insert slices
Task / Estimated Minutes:
- Import P2PsfFits (binary out/in): 20.45
- Import P2ToObj (binary out/in): 2.68
- Import Orphans: 0.00
- Merge P2PsfFits: 58
- Add constraint P2PsfFits: 193
- Merge P2ToObj: 13
- Add constraint P2ToObj: 54
- Total: 362
About 6 h with 8 partitions/slice (~1.3 × 10^9 detections/partition); the merge and constraint times are educated guesses.

slide 55 Detections Per Partition
Columns: Years, Total Detections, Slices, Partitions per Slice, Total Partitions, Detections per Slice; as slices and partitions are added over the survey, the per-partition detection counts remain of order 10^9.

slide 56 Total Time for slice
Time in hours:
- Total of measured times: 0.25
- Total of estimated times: 5.3
- Total time for daily insert: 6
Daily insert may operate in parallel with daily ingest and association.
Requirement: less than 12 hours. Margin on requirement: 2.0.
Using more slices will improve insert performance.

slide 57 Summary
- Ingest + association: < 4 h using 1 loader (daily)
  - Scales with the number of servers
  - Current margin on requirement: 3.1
  - Room for improvement
- Detection insert into slices (daily): 6 h with 8 partitions/slice
  - May happen in parallel with loading
- Detection links into main (< monthly): unknown; 6 h available
- Objects insert & update into slices (< monthly): unknown; 6 hours available
- Objects into main server (< monthly): unknown; 12 h available; transfer can be pipelined as soon as objects have been processed

slide 58 Risks
- Insert & update times at the slices may be underestimated; more empirical evaluation with parallel I/O is needed
- Disk-storage estimates and layout may be underestimated; merges and index builds require 2× the data size