
Association Pipeline
Take the difference sources and moving object predictions for a visit
Compare them to current knowledge of the sky
  – objects, historical difference sources
Use the results to
  – determine if something interesting happened (send out alerts)
  – improve knowledge of the sky (create new objects and update existing ones)

Outline
Spatial Matching
  – algorithm & implementation
Architecture
DC3 Discussion

Spatial Matches
Difference sources $D_v$ vs. objects $O_v$
  – $d \in D_v$ and $o \in O_v$ match iff $\mathrm{distance}(d, o) < R_v$
  – avoid alerting on known variables, capture variability of known objects
MOPS predictions $P_v$ vs. difference sources $D_v$
  – $m \in P_v$ and $d \in D_v$ match iff $d$ falls within the positional error ellipse of $m$
  – avoid alerting on known movers; don't create entries for moving objects in the stationary object catalog
  – only match against difference sources that did not match a known variable object
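The criterion $\mathrm{distance}(d, o) < R_v$ needs an angular distance on the sphere. The slides don't say how Match.h computes it, so below is a minimal Python sketch (the pipeline itself is C++), assuming (ra, dec) positions in degrees and using the haversine formula, which stays accurate at the small separations relevant to matching. `angular_distance` and `distance_match` are illustrative names, not pipeline functions.

```python
import math


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two (ra, dec) positions in degrees.

    Haversine formula: numerically well behaved for small separations, and the
    ra difference enters only through sin^2, so wrap-around at 0/360 is harmless.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # min() guards against h creeping above 1 through rounding.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def distance_match(d, o, radius):
    """True iff difference source d and object o, both (ra, dec) tuples, satisfy
    distance(d, o) < R_v, with R_v passed as `radius` in degrees."""
    return angular_distance(d[0], d[1], o[0], o[1]) < radius
```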

Algorithm 1
For now, assume that the objects $O_v$, difference sources $D_v$ and predictions $P_v$ for the FOV are in memory, and focus on matching only.
Create a ZoneIndex (ZoneTypes.h) for $O_v$. Goal: support fast proximity searches
  – bucket sort positions by declination to obtain a set of constant-height zones
  – within each zone, sort positions by right ascension
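Building a zone index takes only a few lines. This Python sketch is not the ZoneTypes.h code: the `ZoneIndex` class is hypothetical, positions are assumed to be (ra, dec) in degrees, and zones are numbered upward from dec = -90.

```python
import math
from collections import defaultdict


class ZoneIndex:
    """Hypothetical zone index: positions bucket-sorted into constant-height
    declination zones, each zone then sorted by right ascension."""

    def __init__(self, zone_height):
        self.zone_height = zone_height
        self.zones = {}  # zone id -> list of (ra, dec, payload), sorted by ra

    def zone_of(self, dec):
        # Zone 0 starts at dec = -90; clamp so that dec = +90 lands in the last zone.
        num_zones = math.ceil(180.0 / self.zone_height)
        return min(int(math.floor((dec + 90.0) / self.zone_height)), num_zones - 1)

    def build(self, entries):
        """entries: iterable of (ra, dec, payload) tuples."""
        buckets = defaultdict(list)
        for ra, dec, payload in entries:
            buckets[self.zone_of(dec)].append((ra, dec, payload))  # bucket sort on dec
        # Within each zone, sort on right ascension.
        self.zones = {z: sorted(b, key=lambda e: e[0]) for z, b in buckets.items()}
        return self
```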

Algorithm 2
Choosing a zone height ≥ $R_v$ means that for a point p in zone Z, only the zones immediately above and below Z need to be searched for potential matches.
Furthermore, for each Z one can compute $\alpha$ such that any 2 points within Z separated by more than $\alpha$ in ra are separated by a distance ≥ $R_v$.
Given a point p = (ra, dec), look at the entries in the range $[ra - \alpha, ra + \alpha]$ in these 3 zones. Zone entries are sorted on right ascension, so binary search locates the candidates.
Finally, compute the distance between p and the small set of candidates -- done! Well, except that…
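The search can be sketched in Python as follows. The half-width $\alpha$ relies on the standard result that a circle of angular radius $r$ centred at declination $\delta$ spans at most $\arcsin(\sin r / \cos\delta)$ in ra when $|\delta| + r < 90°$. Taking the largest $|\delta|$ in the zone makes the bound valid for every point of that zone. All function names are hypothetical, and ra wrap-around at 0/360 is ignored for brevity.

```python
import bisect
import math
from collections import defaultdict


def angular_distance(ra1, dec1, ra2, dec2):
    """Haversine great-circle distance, all angles in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def zone_alpha(radius, dec_min, dec_max):
    """ra half-width (degrees) covering all neighbours within `radius` of any
    point whose declination lies in [dec_min, dec_max]."""
    dec = max(abs(dec_min), abs(dec_max))
    if dec + radius >= 90.0:
        return 180.0  # the circle may reach a pole: every ra is a candidate
    r, d = math.radians(radius), math.radians(dec)
    return math.degrees(math.asin(math.sin(r) / math.cos(d)))


def zone_id(dec, zone_height):
    return int(math.floor((dec + 90.0) / zone_height))


def build_zones(entries, zone_height):
    """zone id -> (sorted list of ras, entries sorted by ra); entries are (ra, dec, payload)."""
    buckets = defaultdict(list)
    for e in entries:
        buckets[zone_id(e[1], zone_height)].append(e)
    zones = {}
    for z, b in buckets.items():
        b.sort(key=lambda e: e[0])
        zones[z] = ([e[0] for e in b], b)
    return zones


def find_matches(zones, zone_height, radius, ra, dec):
    """Payloads of all entries within `radius` degrees of (ra, dec)."""
    assert zone_height >= radius, "3-zone search needs zone height >= match radius"
    z = zone_id(dec, zone_height)
    dec_min = -90.0 + z * zone_height
    alpha = zone_alpha(radius, dec_min, dec_min + zone_height)
    out = []
    for zz in (z - 1, z, z + 1):  # zone above, own zone, zone below
        if zz not in zones:
            continue
        ras, entries = zones[zz]
        lo = bisect.bisect_left(ras, ra - alpha)   # binary search on sorted ra
        hi = bisect.bisect_right(ras, ra + alpha)
        for e_ra, e_dec, payload in entries[lo:hi]:
            if angular_distance(ra, dec, e_ra, e_dec) < radius:  # exact check
                out.append(payload)
    return out
```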

Algorithm 3
If difference sources are processed in random order, the cache miss rate will be high (4.0e6 objects, index > 100 MB).
So, create a ZoneIndex for the difference sources as well (it is re-used later).
Process difference sources one zone at a time, in ra order. Zones can be processed in parallel.
Within a zone, when moving from a difference source at ra to the next one at ra + $\Delta$, use a linear search to find the entries in the range $[ra + \Delta - \alpha, ra + \Delta + \alpha]$, starting from those in the range $[ra - \alpha, ra + \alpha]$.
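Here is a minimal Python sketch of this sweep for one pair of zones, assuming the difference sources and the objects are both already sorted by ra. Because sources are visited in increasing ra, both window edges only ever move forward, so the total work is linear in the zone sizes plus the number of candidates examined. `zone_sweep` and its `match` predicate are hypothetical names, and ra wrap-around is ignored.

```python
def zone_sweep(sources, objects, alpha, match):
    """Match the ra-sorted difference sources of one zone against the ra-sorted
    objects of one (possibly adjacent) zone.

    sources, objects: lists of (ra, dec, payload), sorted by ra (degrees).
    alpha: half-width of the ra search window (degrees).
    match: predicate deciding whether a (source, object) pair really matches.
    Returns a list of (source payload, object payload) pairs.
    """
    lo = hi = 0
    pairs = []
    for s in sources:
        ra = s[0]
        # Move the lower edge past objects whose ra is now below ra - alpha.
        while lo < len(objects) and objects[lo][0] < ra - alpha:
            lo += 1
        # Move the upper edge over objects that have entered the window.
        while hi < len(objects) and objects[hi][0] <= ra + alpha:
            hi += 1
        # objects[lo:hi] is exactly the ra range [ra - alpha, ra + alpha].
        for o in objects[lo:hi]:
            if match(s, o):
                pairs.append((s[2], o[2]))
    return pairs
```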

Algorithm 4
See distanceMatch() in Match.h for details.
A similar algorithm is used to find DiaSources within the error ellipses of moving object predictions. Differences:
  – No fixed $R_v$: error ellipses have wildly varying sizes. Compute a bounding box in (zone, ra) for each ellipse
  – Find the ra range of the bounding box by treating the ellipse as a circle with radius equal to its semi-major axis
  – Don't create a ZoneIndex for the ellipses; just sort them on the lowest zone of their bounding box
  – Cache issues are less severe (100,000 difference sources worst case, rather than 10 million objects)
See ellipseMatch() and ellipseGroupedMatch() in Match.h for the matching algorithm, and EllipseTypes.h for the ellipse representation.
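The ellipse variant can be sketched in Python as follows. This is not the Match.h/EllipseTypes.h code. The `Ellipse` fields (semi-axes and a position angle measured east of north, all in degrees), the small-angle tangent-plane containment test and `ellipse_match` are assumptions made for illustration. It does follow the recipe above: a (zone, ra) bounding box derived from a circle whose radius is the semi-major axis, no zone index for the ellipses, only a sort on their lowest zone.

```python
import bisect
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ellipse:
    """Hypothetical error ellipse of a prediction; all angles in degrees."""
    ra: float
    dec: float
    semi_major: float
    semi_minor: float
    pa: float             # position angle of the major axis, east of north
    payload: object = None

    def contains(self, ra, dec):
        # Local tangent plane (small-angle approximation): x east, y north.
        dra = (ra - self.ra + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        x = dra * math.cos(math.radians(self.dec))
        y = dec - self.dec
        # Rotate into the ellipse frame: u along the major axis, v along the minor.
        t = math.radians(self.pa)
        u = x * math.sin(t) + y * math.cos(t)
        v = x * math.cos(t) - y * math.sin(t)
        return (u / self.semi_major) ** 2 + (v / self.semi_minor) ** 2 <= 1.0

    def bounding_box(self, zone_height):
        """(lowest zone, highest zone, ra_min, ra_max) of the enclosing circle."""
        r = self.semi_major
        zmin = math.floor((max(self.dec - r, -90.0) + 90.0) / zone_height)
        zmax = math.floor((min(self.dec + r, 90.0) + 90.0) / zone_height)
        if abs(self.dec) + r >= 90.0:
            alpha = 180.0
        else:
            alpha = math.degrees(math.asin(math.sin(math.radians(r)) /
                                           math.cos(math.radians(self.dec))))
        return zmin, zmax, self.ra - alpha, self.ra + alpha


def ellipse_match(ellipses, sources, zone_height):
    """(ellipse payload, source payload) pairs; sources are (ra, dec, payload).
    The ra box test ignores wrap-around at 0/360."""
    zones = defaultdict(list)
    for s in sources:
        zones[math.floor((s[1] + 90.0) / zone_height)].append(s)
    for z in zones:
        zones[z].sort(key=lambda s: s[0])
    ras = {z: [s[0] for s in zs] for z, zs in zones.items()}
    # No zone index for the ellipses: just sort them on their lowest zone.
    boxed = sorted(((e.bounding_box(zone_height), e) for e in ellipses),
                   key=lambda p: p[0][0])
    pairs = []
    for (zmin, zmax, ra_min, ra_max), e in boxed:
        for z in range(zmin, zmax + 1):
            if z not in zones:
                continue
            lo = bisect.bisect_left(ras[z], ra_min)
            hi = bisect.bisect_right(ras[z], ra_max)
            for s in zones[z][lo:hi]:
                if e.contains(s[0], s[1]):
                    pairs.append((e.payload, s[2]))
    return pairs
```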

Algorithm 5
All these match routines work on generic structures (Ellipse, ZoneEntry) that contain position information and references to further data (e.g. a full DiaSource).
Each takes 2 functor parameters that can be used to filter out a particular difference source, object, or prediction from the match.
Each routine also has a MatchListProcessor or MatchPairProcessor parameter:
  – MatchListProcessor: operator() takes e.g. a DiaSource and the list of all matching objects. In DC2 the only action is to keep a record of the matches found, but more complicated matching logic could be implemented here
  – MatchPairProcessor: same as MatchListProcessor, but works on a single match at a time
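The functor-based design can be sketched in Python, with plain callables standing in for the C++ functors. Everything here (`generic_match`, `pairwise`, `RecordingProcessor`) is hypothetical. A brute-force candidate loop replaces the zone search to keep the focus on the filter and processor hooks, and invoking the list processor only for items with at least one match is an assumption.

```python
def generic_match(firsts, seconds, first_filter, second_filter, is_match, list_processor):
    """For every item of `firsts` that passes `first_filter`, gather the items of
    `seconds` that pass `second_filter` and satisfy `is_match`, then hand the item
    and its whole match list to `list_processor` (MatchListProcessor style)."""
    candidates = [s for s in seconds if second_filter(s)]
    for f in firsts:
        if not first_filter(f):
            continue
        matches = [s for s in candidates if is_match(f, s)]
        if matches:
            list_processor(f, matches)


def pairwise(pair_processor):
    """Adapt a MatchPairProcessor-style callable (one match at a time) to the
    list-processor interface."""
    def process(f, matches):
        for s in matches:
            pair_processor(f, s)
    return process


class RecordingProcessor:
    """DC2-style list processor: only keeps a record of the matches found."""

    def __init__(self):
        self.records = []

    def __call__(self, f, matches):
        self.records.append((f, list(matches)))
```

Because filtering and post-processing are injected, the same match loop can serve objects, difference sources and predictions without knowing their full types.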

Timing
~0.33 seconds to build a ZoneIndex for 400k Objects (parallelizable)
~5 ms to match 2.6k difference sources to 400k Objects (1.8k matches)
DC2: a maximum of 22 moving object predictions per FOV; matching those to difference sources takes ~0.1 ms
Results at -O0 (unoptimized build)

Architecture: High Level
Load phase (lsst.ap.LoadStage)
  – read positions for objects within the FOV from files (the main Object table is in an RDBMS; the association pipeline keeps the files and the table in sync)
  – build a ZoneIndex for the objects
Compute phase (triggered by detection)
  – read difference sources coming from detection (lsst.dps.IOStage)
  – build a spatial index for the difference sources (lsst.ap.MatchDiaSourcesStage)
  – match difference sources against objects (lsst.ap.MatchDiaSourcesStage)
  – write match results to the database (lsst.dps.IOStage)
  – read moving object predictions from the database (lsst.dps.IOStage)
  – match them to difference sources that didn't match a known variable object (lsst.ap.MatchMopsPredsStage)
  – create objects from difference sources that didn't match any object (lsst.ap.MatchMopsPredsStage)
  – write matches and new objects to the database (lsst.dps.IOStage)
Store phase (lsst.ap.StoreStage)
  – write positions of new objects to chunk delta files
  – execute MySQL scripts that update the Object table, insert new objects into it, append per-visit tables to historical tables and finally drop the per-visit tables
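Reduced to its data flow, the matching logic of the compute phase might look like the Python sketch below. The two match callables stand in for the spatial matches done in MatchDiaSourcesStage and MatchMopsPredsStage; their signatures are hypothetical, and all I/O stages are omitted. As on the Spatial Matches slide, new objects come only from difference sources that matched neither an object nor a prediction, so known movers stay out of the stationary object catalog.

```python
def compute_phase(dia_source_ids, match_objects, match_predictions):
    """Hypothetical sketch of the compute-phase matching logic.

    match_objects(ids) and match_predictions(ids) each return a dict mapping a
    difference source id to the list of object / prediction ids it matched.
    Returns (object matches, prediction matches, ids to create new objects from).
    """
    ids = list(dia_source_ids)
    # 1. Match every difference source against the known objects.
    object_matches = match_objects(ids)
    # 2. A difference source that matched an object reflects the variability of a
    #    known object; only the remaining sources are compared with predictions.
    unmatched = [i for i in ids if not object_matches.get(i)]
    prediction_matches = match_predictions(unmatched)
    # 3. Sources that matched neither an object nor a prediction become new objects.
    new_object_ids = [i for i in unmatched if not prediction_matches.get(i)]
    return object_matches, prediction_matches, new_object_ids
```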

Architecture: Details 1
Objects are divided into declination stripes and physically partitioned into chunks.
  – Stripes are a fraction of a field of view high (less than one FOV, to minimize wasted I/O around the circular FOV)
  – Chunks are one stripe height wide
LoadStage only reads chunks overlapping the circular FOV, keeping I/O per visit low.
CircularRegion/RectangularRegion classes represent FOVs, ZoneStripeChunkDecomposition maps positions to zones, stripes and chunks, and computeChunkIds() maps FOVs to overlapping chunks.
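A stripe/chunk decomposition with a conservative chunk lookup for a circular FOV can be sketched in Python. The class below is a hypothetical stand-in for ZoneStripeChunkDecomposition and computeChunkIds(). The rule for the number of chunks per stripe (chosen so that chunks are at least one stripe height wide even at the stripe's poleward edge) and the (stripe, chunk) id pairs are assumptions. The lookup uses the FOV's bounding box, so it may return a few chunks that touch the box but not the circle.

```python
import math


class StripeChunkDecomposition:
    """Hypothetical decomposition: stripes of constant dec height, each cut into
    equal-width ra chunks roughly one stripe height wide."""

    def __init__(self, stripe_height):
        self.h = stripe_height
        self.num_stripes = math.ceil(180.0 / stripe_height)

    def stripe(self, dec):
        return min(int((dec + 90.0) // self.h), self.num_stripes - 1)

    def num_chunks(self, stripe):
        dec_lo = -90.0 + stripe * self.h
        dec_hi = min(dec_lo + self.h, 90.0)
        # Use the poleward edge, where the stripe's circumference is smallest.
        dmax = max(abs(dec_lo), abs(dec_hi))
        return max(1, int(360.0 * math.cos(math.radians(dmax)) // self.h))

    def chunk(self, ra, dec):
        """(stripe, chunk) id of a position in degrees."""
        s = self.stripe(dec)
        n = self.num_chunks(s)
        return s, min(int((ra % 360.0) // (360.0 / n)), n - 1)

    def chunks_for_circle(self, ra, dec, radius):
        """Conservative set of (stripe, chunk) ids overlapping a circular FOV."""
        if abs(dec) + radius >= 90.0:
            alpha = 180.0  # FOV reaches a pole: whole stripes overlap
        else:
            alpha = math.degrees(math.asin(math.sin(math.radians(radius)) /
                                           math.cos(math.radians(dec))))
        ids = set()
        for s in range(self.stripe(max(dec - radius, -90.0)),
                       self.stripe(min(dec + radius, 90.0)) + 1):
            n = self.num_chunks(s)
            w = 360.0 / n
            lo = math.floor((ra - alpha) / w)
            hi = math.floor((ra + alpha) / w)
            if alpha >= 180.0 or hi - lo + 1 >= n:
                ids.update((s, c) for c in range(n))
            else:
                # k % n handles FOVs that straddle ra = 0/360.
                ids.update((s, k % n) for k in range(lo, hi + 1))
        return ids
```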

Architecture: Details 2
For each spatial chunk there is a:
  – chunk file that contains object positions, ids and variability probabilities as of the beginning of a run. These are read-only.
  – chunk delta file that contains new objects created during visit processing (new objects are also stored in the database). These are rewritten every visit.
Multiple slices load stripes of object chunks in parallel. In DC2 there is:
  – no way to send this data back to the master via the pipeline framework
  – no way to communicate it to other slices
Solution: shared memory
[Figure: Slice 0, Slice 1]

Architecture: Details 3
Each slice stores loaded object positions into a shared memory chunk store
  – works so long as the master and workers live on the same machine
  – allows parallel I/O in workers and single-threaded matching by the master
  – allows multiple association pipeline instances to co-exist, so that Load, Compute and Store can be overlapped
This has a non-trivial effect on the code:
  – cannot store pointers in shared memory (different processes may map the shared memory segment to different virtual addresses)
  – instead, must store offsets relative to the beginning of the shared memory area
  – requires hand-rolled code for a lot of things normally taken for granted: associative containers, memory allocation, etc.
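The offset-instead-of-pointer rule can be illustrated in Python with a linked list stored in a flat byte buffer. In a real deployment the buffer would be a shared segment (for example the `buf` memoryview of `multiprocessing.shared_memory.SharedMemory`, Python 3.8+); here a `bytearray` keeps the sketch self-contained. The `OffsetList` layout and its bump allocator are hypothetical.

```python
import struct

HEADER = struct.Struct("<qq")  # offset of the list head, top of the bump allocator
NODE = struct.Struct("<qdd")   # offset of the next node, ra, dec (24 bytes)
NULL = -1                      # "null pointer" expressed as an offset


class OffsetList:
    """Singly linked list of (ra, dec) entries living inside one flat buffer.

    Links are byte offsets from the start of the buffer, never addresses, so any
    process that maps the same bytes, at any virtual address, can follow them.
    """

    def __init__(self, buf, initialize=False):
        self.buf = buf
        if initialize:
            HEADER.pack_into(buf, 0, NULL, HEADER.size)

    def _alloc(self, size):
        # Hand-rolled bump allocation: memory is never freed in this sketch.
        head, top = HEADER.unpack_from(self.buf, 0)
        if top + size > len(self.buf):
            raise MemoryError("shared area exhausted")
        HEADER.pack_into(self.buf, 0, head, top + size)
        return top

    def push(self, ra, dec):
        off = self._alloc(NODE.size)
        head, top = HEADER.unpack_from(self.buf, 0)
        NODE.pack_into(self.buf, off, head, ra, dec)  # new node links to old head
        HEADER.pack_into(self.buf, 0, off, top)       # head now points at new node
        return off

    def __iter__(self):
        off = HEADER.unpack_from(self.buf, 0)[0]
        while off != NULL:
            nxt, ra, dec = NODE.unpack_from(self.buf, off)
            yield ra, dec
            off = nxt
```

Copying the bytes to a different buffer, which simulates another process mapping the segment at a different address, leaves every link valid.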

Architecture: Details 4
This is mostly hidden from view by the Chunk and SharedSimpleObjectChunkManager classes. The chunk manager:
  – tracks visits that are in flight
  – allocates memory for chunks
  – registers new visits
  – tracks which chunks are being used by a visit, and enforces 1 owner per chunk
  – allows a visit to wait until it has acquired ownership of all chunks overlapping the visit FOV (making sure concurrent pipelines don't step on each other)
  – allows skipping the read of chunks that a previous visit is still holding in memory
  – either commits or rolls back all changes to the in-memory chunk store for a given visit
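The ownership rules can be sketched with a Python `threading.Condition`. `ChunkOwnershipManager` is hypothetical and is not the SharedSimpleObjectChunkManager API; it also works within one process, whereas the real manager lives in shared memory. A visit acquires all of its chunks at once or waits, so two visits can never each hold part of the other's chunk set, and chunks still resident in memory are not reported for re-reading.

```python
import threading


class ChunkOwnershipManager:
    """Hypothetical chunk ownership tracker: at most one owning visit per chunk."""

    def __init__(self):
        self._cond = threading.Condition()
        self._owner = {}        # chunk id -> owning visit id
        self._visits = {}       # in-flight visit id -> set of owned chunk ids
        self._resident = set()  # chunks whose data is already held in memory

    def register_visit(self, visit):
        with self._cond:
            if visit in self._visits:
                raise ValueError(f"visit {visit} already in flight")
            self._visits[visit] = set()

    def acquire_chunks(self, visit, chunk_ids, timeout=None):
        """Block until `visit` owns every chunk in chunk_ids (all or nothing).

        Returns the chunks that still have to be read from disk, or None if the
        timeout expired before all chunks became free.
        """
        wanted = set(chunk_ids)
        with self._cond:
            ok = self._cond.wait_for(
                lambda: all(self._owner.get(c, visit) == visit for c in wanted),
                timeout)
            if not ok:
                return None
            for c in wanted:
                self._owner[c] = visit
            self._visits[visit] |= wanted
            to_read = wanted - self._resident  # skip chunks already in memory
            self._resident |= wanted
            return to_read

    def end_visit(self, visit):
        """Release the visit's chunks (after commit or rollback) and wake waiters."""
        with self._cond:
            for c in self._visits.pop(visit):
                del self._owner[c]
            self._cond.notify_all()
```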

Architecture: Details 5
A Chunk:
  – supports inserting/removing/updating entries
  – has copy-on-write semantics to allow for rollback: removing an entry really means flagging it as DELETED; updating an entry means flagging the existing version as DELETED and inserting a modified copy
  – provides access to entry flags
  – supports reading and writing of chunk and chunk delta files, with and without gzip compression
  – allows a series of inserts/deletes to be marked as committed or rolled back
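The copy-on-write behaviour can be sketched in Python. The `Chunk` class and `DELETED` flag value below are hypothetical, entries are plain dicts, and file I/O is omitted. Since nothing is overwritten in place, a rollback only has to clear the DELETED flags set since the last commit and truncate the entries appended after it.

```python
DELETED = 0x1  # hypothetical flag bit


class Chunk:
    """Hypothetical copy-on-write chunk of object entries."""

    def __init__(self, entries=()):
        self.entries = [dict(e) for e in entries]
        self.flags = [0] * len(self.entries)
        self._committed_size = len(self.entries)
        self._deleted_since_commit = []  # indexes flagged DELETED since last commit

    def insert(self, entry):
        self.entries.append(dict(entry))
        self.flags.append(0)
        return len(self.entries) - 1

    def remove(self, i):
        # Removal only flags the entry; its data stays in place for rollback.
        if self.flags[i] & DELETED:
            raise KeyError(f"entry {i} already deleted")
        self.flags[i] |= DELETED
        self._deleted_since_commit.append(i)

    def update(self, i, **changes):
        # Update = flag the old version DELETED + append a modified copy.
        self.remove(i)
        return self.insert({**self.entries[i], **changes})

    def live(self):
        return [e for e, f in zip(self.entries, self.flags) if not f & DELETED]

    def commit(self):
        self._committed_size = len(self.entries)
        self._deleted_since_commit.clear()

    def rollback(self):
        for i in self._deleted_since_commit:
            self.flags[i] &= ~DELETED
        del self.entries[self._committed_size:]
        del self.flags[self._committed_size:]
        self._deleted_since_commit.clear()
```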

C++/Python Boundary
LoadStage constructs a VisitProcessingContext, which contains per-visit state and parameters. It is SWIGed and passed between stages on a Clipboard.
Stages.h declares functions for each of the high-level steps outlined in the previous slides; each takes a VisitProcessingContext as a parameter.
Pipeline logic is almost entirely in C++. The exception is StoreStage, which generates SQL scripts from a template and then calls mysql to run them.

DC3 Code Items
More extensive Python/C++ interface?
Get rid of the shared memory chunk store?
  – would simplify the code a lot!
  – but would also lose functionality: the ability to spread visits across pipeline instances, and overlapping of Load, Compute and Store
Transactions/fault tolerance:
  – support this for DC3? At what granularity?
Go to a shared-nothing architecture with message passing?
  – needs support from the middleware for passing data from master to slice and back. Depending on what is parallelized, even slice-to-slice communication could be necessary. Could use MPI directly.
  – better fit with the current pipeline model, can scale beyond a single server
  – but it is not clear this is necessary unless the algorithms get heavier: by 2014 expect many (32+) cores per server

DC3 Algorithms
Use source classification information from detection (or compute it in association)
  – http://dev.lsstcorp.org/htmldocs/SourceClassificationTable.pdf
Vary the match radius on a per-difference-source basis?
Probabilistic matching (make use of error ellipses for difference sources and objects)?
Take magnitudes of difference sources into account? How?
The cadence of observations often results in pairs of visits to the same FOV (within 30 min)
  – take pairs of DiaSources from both visits and do a full orbital fit against the orbits intersecting the FOV (http://listserv.lsstcorp.org/mailman/private/lsst-data/2007-June/003268.html)
  – cross-match new objects from both visits to avoid generating back-to-back alerts for a new moving object (http://listserv.lsstcorp.org/mailman/private/lsst-data/2007-June/003264.html)
