1
DC2 Postmortem Association Pipeline
2
AP Architecture
3 main phases for each visit:
- Load: current knowledge of the FOV into memory
- Compute:
  - match difference sources to objects
  - match moving object predictions to difference sources
  - create and update objects based on match results
- Store: updated/new objects, difference sources
3
AP Architecture: Database + Files
- Historical Object Catalog:
  - master copy in the database: updated/appended to, never read in DC2
  - thin slice of the object catalog in files: primary purpose is efficient retrieval of positions for spatial matching; kept in sync with the database copy and updated by the AP (updates are easy since positions never change)
- Difference sources and MOPS predictions live in the database and are read into memory by custom C++ code when needed
- Spatial cross-match is done in custom C++ code; all other processing (Object catalog updates) is implemented using SQL scripts, in-memory tables, etc.
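The slides do not specify the on-disk layout of the per-chunk position files, so the following is only a minimal sketch, assuming each chunk file is a flat binary sequence of (id, ra, dec) records; the struct name, field types, and file name are all hypothetical.

#include <cstdio>
#include <vector>

// One position record in a chunk file (hypothetical layout).
struct ObjectRecord {
    long long id;    // object id
    double ra, dec;  // position in degrees
};

// Reads every record from a chunk file; returns an empty vector if the
// file cannot be opened.
std::vector<ObjectRecord> readChunkFile(const char* path) {
    std::vector<ObjectRecord> records;
    if (std::FILE* f = std::fopen(path, "rb")) {
        ObjectRecord rec;
        while (std::fread(&rec, sizeof(rec), 1, f) == 1)
            records.push_back(rec);
        std::fclose(f);
    }
    return records;
}

int main() {
    // "chunk_1234.pos" is a placeholder path, not an actual DC2 file name.
    std::vector<ObjectRecord> objects = readChunkFile("chunk_1234.pos");
    std::printf("read %zu object positions\n", objects.size());
    return 0;
}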
4
AP Performance: Data Volume
DC2:
- 417,327 objects in the Object catalog (all in a single FOV)
- FOV is ~1 deg²
- ~5k difference sources per FOV worst case
- 22 moving object predictions worst case
Production:
- up to 10 million objects per FOV; 14 to 49 billion objects total (DR1 - DR11)
- FOV is 10 deg²
- 100k difference sources per FOV worst case
- 2.5k moving object predictions per FOV worst case
5
AP Architecture: Load
- The sky is partitioned into chunks (ra/dec boxes). For each chunk:
  - 1 file stores objects from the input Object catalog (i.e. the product of Deep Detect) within the chunk
  - 1 delta file stores new objects accumulated during visit processing
- Multiple slices read these chunk files in parallel
- DC2:
  - one slice only
  - files read/written over NFS
  - all visits have essentially the same FOV (in terms of chunks)
- Objects are loaded into a shared memory region
- Master creates a spatial index for objects on the fly; all slices must run on the same machine as the master
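The slide only says chunks are ra/dec boxes; a minimal sketch of one possible position-to-chunk mapping follows, assuming a fixed-width grid (the chunk width and names are assumptions, not the actual DC2 scheme).

#include <cmath>
#include <cstdio>

// Identifies the ra/dec box a position falls into.
struct ChunkId { int raIdx, decIdx; };

// chunkWidthDeg is an assumed value; the actual DC2 partitioning may differ.
ChunkId chunkFor(double raDeg, double decDeg, double chunkWidthDeg = 1.0) {
    int raIdx  = static_cast<int>(std::floor(raDeg / chunkWidthDeg));
    int decIdx = static_cast<int>(std::floor((decDeg + 90.0) / chunkWidthDeg));
    return {raIdx, decIdx};
}

int main() {
    ChunkId c = chunkFor(213.7, -12.3);
    std::printf("position falls in chunk (%d, %d)\n", c.raIdx, c.decIdx);
    return 0;
}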
6
AP Performance: Load
Reading chunk files (object positions)
- Timing included for completeness but not very meaningful
- Data lives on an NFS volume - contention with other pipelines
- But the same chunks are read every visit, so reads come from cache
7
AP Performance: Load
Building zone index for objects
- 0.33 - 0.34 seconds on average
- Increases over consecutive visits since new objects are being created
- visit 1: 417,327 objects; visit 62: ~450k objects (depends on run)
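The slides name the structure a "zone index" but do not spell out its layout, so this is only a minimal sketch in the usual zones style: objects bucketed by fixed-height declination stripe and sorted by ra within each stripe. The zone height, type names, and use of std::map are assumptions.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <map>
#include <vector>

struct ObjectPos { long id; double ra, dec; };  // position in degrees

// Objects bucketed into fixed-height declination zones, sorted by ra
// within each zone so a match can scan a small ra window.
struct ZoneIndex {
    double zoneHeightDeg;
    std::map<int, std::vector<ObjectPos>> zones;  // zone number -> objects

    explicit ZoneIndex(double heightDeg) : zoneHeightDeg(heightDeg) {}

    int zoneOf(double decDeg) const {
        return static_cast<int>(std::floor((decDeg + 90.0) / zoneHeightDeg));
    }

    void build(const std::vector<ObjectPos>& objects) {
        for (const auto& o : objects)
            zones[zoneOf(o.dec)].push_back(o);
        for (auto& z : zones)
            std::sort(z.second.begin(), z.second.end(),
                      [](const ObjectPos& a, const ObjectPos& b) { return a.ra < b.ra; });
    }
};

int main() {
    std::vector<ObjectPos> objects = {{1, 10.00, -5.00}, {2, 11.50, -5.02}, {3, 10.20, 20.00}};
    ZoneIndex index(1.0 / 60.0);  // 1-arcminute zones (assumed height)
    index.build(objects);
    std::printf("%zu objects in %zu zones\n", objects.size(), index.zones.size());
    return 0;
}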
8
AP Architecture: Compute
- Read difference sources coming from IPDP (via database)
- Build spatial index for difference sources
- Match difference sources against objects (spatial match only)
- Write difference source to object matches to database
- Read moving object predictions from database
- Match them to difference sources that didn't match a known variable object (spatial match only)
- Create objects from difference sources that didn't match any object (DC2: moving object predictions not taken into account)
- Write MOPS prediction to difference source matches and new objects to database
Everything runs on the master process. Index building and matching can be multi-threaded (OpenMP) if necessary, but aren't for DC2.
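Since the match step is spatial-only, a brute-force reference version is easy to sketch; the real code restricts candidates to the declination zones (and ra window) within the match radius of each difference source rather than comparing every pair. The match radius, type names, and function names below are assumptions.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

struct Pos { long id; double ra, dec; };  // position in degrees

// Angular separation between two positions, in arcseconds.
double separationArcsec(const Pos& a, const Pos& b) {
    const double d2r = 3.14159265358979323846 / 180.0;
    double c = std::sin(a.dec * d2r) * std::sin(b.dec * d2r) +
               std::cos(a.dec * d2r) * std::cos(b.dec * d2r) *
               std::cos((a.ra - b.ra) * d2r);
    c = std::max(-1.0, std::min(1.0, c));  // guard against rounding
    return std::acos(c) / d2r * 3600.0;
}

// Brute-force reference for the spatial-only match step.
std::vector<std::pair<long, long>> matchSourcesToObjects(
        const std::vector<Pos>& diaSources,
        const std::vector<Pos>& objects,
        double radiusArcsec) {
    std::vector<std::pair<long, long>> matches;  // (diaSourceId, objectId)
    for (const auto& s : diaSources)
        for (const auto& o : objects)
            if (separationArcsec(s, o) <= radiusArcsec)
                matches.emplace_back(s.id, o.id);
    return matches;
}

int main() {
    std::vector<Pos> objects    = {{1, 10.0000, -5.0000}, {2, 10.0010, -5.0005}};
    std::vector<Pos> diaSources = {{100, 10.0001, -5.0001}};
    auto m = matchSourcesToObjects(diaSources, objects, 1.0 /* arcsec, assumed radius */);
    std::printf("%zu match(es)\n", m.size());
    return 0;
}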
9
AP Performance: Compute
Reading difference sources from database
10
AP Performance: Compute
Building zone index for difference sources
11
AP Performance: Compute
Matching difference sources to objects
12
AP Performance: Compute
rlp0128/rlp0130: matched objects were matched an average of 1.69/2.01 times in r and 2.12/2.45 times in u over 62 visits
13
AP Performance: Compute
Writing difference source matches to database
14
AP Performance: Compute
Reading MOPS predictions from database
15
AP Performance: Compute
Matching MOPS predictions to difference sources
- ~0.1 ms (~20 predictions, and error ellipses are clamped to 10 arcseconds)
Creating new objects
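The slide only states that prediction error ellipses are clamped to 10 arcseconds; the sketch below shows one plausible reading of that, a per-prediction match radius capped at 10 arcseconds. The struct, field names, and the use of the semi-major axis as the radius are assumptions.

#include <algorithm>
#include <cstdio>

// Hypothetical representation of a MOPS prediction; only the fields
// needed for the radius calculation are shown.
struct MopsPrediction {
    long id;
    double ra, dec;             // predicted position, degrees
    double errSemiMajorArcsec;  // error-ellipse semi-major axis, arcseconds
};

// Per-prediction match radius, clamped to 10 arcseconds as stated on the slide.
double matchRadiusArcsec(const MopsPrediction& p, double clampArcsec = 10.0) {
    return std::min(p.errSemiMajorArcsec, clampArcsec);
}

int main() {
    MopsPrediction p{42, 150.0, 2.5, 37.0};  // large error ellipse
    std::printf("match radius = %.1f arcsec\n", matchRadiusArcsec(p));  // clamped to 10.0
    return 0;
}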
16
AP Performance: Compute
Writing MOPS predictions and new objects to database
17
AP Architecture: Store
- Multiple slices write chunk delta files in parallel
  - these files contain object positions for objects accumulated during visit processing
  - DC2: 1 slice, data lives on NFS
- Master launches MySQL scripts that use the database outputs of the compute phase:
  - Update Object catalog. For DC2 this is:
    - # of times an object was observed in a given filter
    - latest observation time of an object
  - Insert new objects into Object catalog
  - Append difference sources for the visit to the DIASource table
  - Append various per-visit result tables to historical tables (for debugging)
  - Drop per-visit tables
18
AP Performance: Store
Writing chunk delta files (positions for objects created during visit processing)
- Wild swings in timing due to NFS contention
- IPDP is often loading science/template exposures as AP ends (AP was configured to write chunk deltas to the NFS volume)
19
AP Performance: Store
Updating historical Object catalog
- Long times again due to NFS contention
- AP writes out SQL script files to NFS and the mysql client reads them from NFS (while IPDP is loading exposures)
- The last visit in a run has timing with no interference from other pipelines (~0.4s)
20
AP Performance: Store
Database cleanup:
- append per-visit tables to per-run accumulator tables
- drop per-visit tables
Suspected culprit: NFS again
21
Conclusions
- For the small amounts of data in DC2, AP performs to spec (1.6s - 10s) despite some configuration problems
- Don't use NFS
- Matching is fast, but we need to run with more input data to make strong claims about performance
- Need to plug in algorithms that make use of non-spatial data to really test the AP design