1
A Total Recall of Data Usage for the MWA Long Term Archive
Chen Wu & the archive team: Dave Pallot, Andreas Wicenec
2
Agenda
- Overall dataflow
- Data usage
- Exploit data usage for performance modelling
- GLEAM archive
- Future work
MWA Project Meeting, Perth, June 2016
3
Overall dataflow
[Figure: MWA dataflow across three tiers. Tier 0 (MRO, Western Australia); Tier 1 (Perth, Western Australia), hosting the Long-Term Archive at Pawsey, Perth (Disk + Tape, 3 PB / year); Tier 2 (Asia Pacific & North America: USA, India, New Zealand). Other labelled components: Tile, Receiver, PFB, Beam former, Correlator, Data Capture, NGAS Client, Online archive, 400 MB/s, Online processing, Galaxy / Magnus, pipelines, QA, RV, MS & Images, Science Archive, Mirrored Archive, UI, VO, Web, Vis + Voltage.]
4
Total archive statistics
- Number of files archived: ~20 million
  - Visibility: ~11 million
  - Voltage: ~9 million
- Number of files requested by MWA users
  - Visibility: ~14.5 million
- Volume archived: ~10.2 PB
  - Visibilities: ~8.2 PB
  - Voltage: ~2.0 PB
- Volume requested by MWA users
  - Visibilities: ~10.9 PB
- Request > Ingest: more visibility data has been requested (~10.9 PB, ~14.5 million files) than archived (~8.2 PB, ~11 million files).
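The "Request > Ingest" observation can be quantified as a ratio of totals. A minimal sketch using the approximate visibility figures from this slide (the variable and function names are illustrative):

```python
# Approximate visibility figures from the "Total archive statistics" slide.
files_archived_vis = 11.0e6     # visibility files archived
files_requested_vis = 14.5e6    # visibility files requested by MWA users
volume_archived_vis_pb = 8.2    # visibility volume archived (PB)
volume_requested_vis_pb = 10.9  # visibility volume requested (PB)

def ratio(requested, archived):
    """Ratio of totals: above 1 means more was served than was ingested."""
    return requested / archived

file_ratio = ratio(files_requested_vis, files_archived_vis)
volume_ratio = ratio(volume_requested_vis_pb, volume_archived_vis_pb)
print(f"files:  {file_ratio:.2f}x")
print(f"volume: {volume_ratio:.2f}x")
```

Both ratios come out at about 1.3, i.e. roughly a third more visibility data has been served to users than has been ingested. Because these are ratios of totals, they say nothing about how requests are spread across individual files.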
5
Ingest volume breakdown
Project    Volume (TB)
G0002      (not shown)
G0009      (not shown)
G0008      (not shown)
G0024      (not shown)
D0000      817.34
D0006      681.05
D0004      419.06
D0012      281.67
D0005      278.32
D0011      234.47
G0017      227.11
G0001      219.53
C100       104.51
G0016      79.34
C001       74.98
C102       66.98
G0018      60.33
G0011      43.81
G0004      42.27
D0002      39.29
G0021      35.82
OA002      33.56
G0010      31.37
D0008      30.47
C106       28.15
G0020      23.64
G0015      19.78
C104       17.14
Others     121.82
6
Request volume breakdown
Project    Volume (TB)
G0008      (not shown)
G0009      (not shown)
D0000      (not shown)
D0006      (not shown)
G0001      607.97
G0017      321.27
G0016      302.90
G0002      219.09
D0002      179.39
D0005      121.07
G0004      106.17
G0011      102.48
G0010      68.72
D0007      65.81
G0015      65.50
G0020      41.43
G0003      40.73
G0012      39.92
D0008      39.31
G0018      29.45
D0011      15.13
A0001      13.34
G0023      7.74
G0024      5.28
C001       3.87
D0004      1.40
D0009      1.38
C122       0.44
G0021      0.30
G0025      0.16
C100       0.11
G0019      (not shown)
OA002      0.09
D0012      0.04
D0010      0.02
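Comparing the ingest and request breakdowns per project shows which data sets are re-used most. A minimal sketch, using only projects whose volume appears on both slides; note the two breakdowns may not cover exactly the same time window, so the ratios are indicative only. The function name is hypothetical.

```python
# Volumes (TB) copied from the ingest and request breakdown slides;
# only projects with a value shown on both slides are included.
ingest_tb = {
    "G0001": 219.53, "G0017": 227.11, "G0016": 79.34, "D0002": 39.29,
    "D0005": 278.32, "D0011": 234.47, "D0004": 419.06, "C100": 104.51,
}
request_tb = {
    "G0001": 607.97, "G0017": 321.27, "G0016": 302.90, "D0002": 179.39,
    "D0005": 121.07, "D0011": 15.13, "D0004": 1.40, "C100": 0.11,
}

def request_ratio(ingest, request):
    """Return {project: requested volume / ingested volume}, highest first."""
    common = ingest.keys() & request.keys()
    ratios = {p: request[p] / ingest[p] for p in common if ingest[p] > 0}
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))

ratios = request_ratio(ingest_tb, request_tb)
for project, r in ratios.items():
    print(f"{project:6s} {r:8.3f}")
```

On these figures D0002 and G0016 stand out as heavily re-requested (above 3.5x), while projects such as D0004 and C100 have been archived but barely downloaded.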
7
Request region breakdown
8
File size distribution
9
Daily usage
[Figure: daily archive usage; reference levels marked at ~800 MB/s and ~400 MB/s]
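To put the daily usage levels in context, the sketch below converts the 400 MB/s and 800 MB/s levels into terabytes per day, and converts the 3 PB / year archive growth from the dataflow slide into an average rate. It assumes decimal units (1 TB = 10^6 MB, 1 PB = 10^9 MB) and a 365-day year; binary units would shift the numbers by a few percent.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ignores leap years

def mb_per_s_to_tb_per_day(rate_mb_s):
    """Sustained rate in MB/s -> volume per day in TB (1 TB = 1e6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6

def pb_per_year_to_mb_per_s(volume_pb):
    """Yearly volume in PB -> average rate in MB/s (1 PB = 1e9 MB)."""
    return volume_pb * 1e9 / SECONDS_PER_YEAR

print(mb_per_s_to_tb_per_day(400))
print(mb_per_s_to_tb_per_day(800))
print(round(pb_per_year_to_mb_per_s(3), 1))
```

Under these assumptions, 400 MB/s sustained for a day moves 34.56 TB and 800 MB/s moves 69.12 TB, while 3 PB / year of ingest averages about 95 MB/s.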
10
Observation time – Access time matrix
[Figure: colour map of the number of requests by observation time and access time; annotated: 24 Feb 2016, MWA workshop 17th/18th of May]
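A matrix like this can be built by binning each request on two time axes. The sketch assumes a request log of (observation date, access date) pairs, one per file request, binned by calendar month; the log layout, function names and sample dates are hypothetical toy data.

```python
from collections import Counter
from datetime import date

def month_key(d):
    """Bin a date to (year, month)."""
    return (d.year, d.month)

def obs_access_matrix(requests):
    """Count requests per (observation month, access month) cell."""
    return Counter((month_key(obs), month_key(acc)) for obs, acc in requests)

# Toy log: (observation_date, access_date) per request.
log = [
    (date(2013, 8, 1), date(2015, 3, 2)),
    (date(2013, 8, 9), date(2015, 3, 20)),
    (date(2014, 6, 3), date(2015, 3, 5)),
    (date(2015, 3, 1), date(2015, 3, 2)),
]
matrix = obs_access_matrix(log)
for (obs, acc), n in sorted(matrix.items()):
    print(f"observed {obs[0]}-{obs[1]:02d}  accessed {acc[0]}-{acc[1]:02d}  requests={n}")
```

Cells on the diagonal are data requested in the month they were observed; cells far off the diagonal are requests for older observations. Plotting the counts as a colour map gives the kind of view shown on this slide.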
11
HSM storage @ MWA LTA Pawsey
12
Disk “hits” ratio curve
Disk cache eviction policies, parameterised as AGE_WEIGHT = constant + multiplier * <file_age_in_day>
The curve comes from a “simulation” that replays the MWA data access stream of 33 million successful ingestions and requests.
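The hit-ratio simulation can be sketched as a replay of the ingest/request stream against a finite disk cache. This is a minimal sketch, not the simulator behind the slide: it assumes a file's age is the number of days since it was last ingested or requested, that a miss recalls the file from tape onto disk, and that the file with the largest AGE_WEIGHT is evicted first. The `Event` record, the function names and the toy trace are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    day: float      # time of the event, in days
    kind: str       # "ingest" or "request"
    file_id: str
    size: int       # any consistent unit (e.g. bytes)

def age_weight(age_days, constant, multiplier):
    """AGE_WEIGHT = constant + multiplier * file_age_in_day."""
    return constant + multiplier * age_days

def disk_hit_ratio(events, capacity, constant=0.0, multiplier=1.0):
    """Fraction of requests served from disk for a cache of `capacity`.

    Assumed semantics: age = days since last ingest/request, a miss
    recalls the file to disk, the largest AGE_WEIGHT is evicted first.
    """
    on_disk = {}  # file_id -> (last_touch_day, size)
    used = 0
    hits = requests = 0
    for ev in events:
        if ev.kind == "request":
            requests += 1
            if ev.file_id in on_disk:
                hits += 1
        # Ingests and tape recalls both (re)place the file on disk.
        if ev.file_id in on_disk:
            used -= on_disk[ev.file_id][1]
        on_disk[ev.file_id] = (ev.day, ev.size)
        used += ev.size
        # Evict the highest-weighted other files until the cache fits.
        while used > capacity:
            candidates = [
                (age_weight(ev.day - last, constant, multiplier), fid)
                for fid, (last, _) in on_disk.items() if fid != ev.file_id
            ]
            if not candidates:
                break  # the current file alone exceeds the capacity
            _, victim = max(candidates)
            used -= on_disk.pop(victim)[1]
    return hits / requests if requests else 0.0

# Toy trace; sweeping the capacity gives points on a hit-ratio curve.
trace = [
    Event(0, "ingest", "A", 1), Event(1, "ingest", "B", 1),
    Event(2, "request", "A", 1), Event(3, "ingest", "C", 1),
    Event(4, "request", "B", 1), Event(5, "request", "A", 1),
]
for cap in (1, 2, 3):
    print(cap, disk_hit_ratio(trace, cap))
```

With a single AGE_WEIGHT term and a positive multiplier, the ranking reduces to oldest-first (effectively LRU), and the constant shifts every weight equally; it would only change the outcome once other terms or thresholds are added, which this sketch leaves out.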
13
Evolution of the MWA LTA
14
GLEAM Archive
Over 1 million images, 20,000 MeasurementSets, 250 TB
[Figure: NGAS Client, IVOA Interface, GLEAM VO Server, GLEAM Archive Store 04 and GLEAM Archive Store 06, with in-archive processing modes: Interactive, Batch, Continuous]
In-archive processing
- Interactive processing: cutout and regridding, NGAS Tasks
- Batch (re-)processing: process all files currently in the archive that satisfy some conditions, e.g.
  - Compress all visibility files that are (1) from the EoR project and (2) observed last Friday (MWA)
  - Rescale the flux of all GLEAM Phase 1 snapshot images ingested in the past two weeks
  - Make movies from images formed in DEC -26 strip scans
  - Re-index the WCS headers of all images ingested since last November
- Incremental processing: asynchronously, continuously and selectively process newly ingested files, e.g.
  - After a snapshot image tar is ingested, decompress it, compute the sky coverage of each FITS image, and update the VO database indexes accordingly
  - As soon as a 32 MHz image is ingested, if its robustness is 0, send a copy to RRI in India before transferring it to RDSI
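The batch and incremental modes differ mainly in when a selection rule runs: once over everything already archived, or on each file as it is ingested. The sketch below illustrates both with plain Python (a list scan, and a `queue.Queue` consumed by a worker thread). It is not the NGAS task or job API; the metadata fields (`project`, `kind`, `obs_date`, `bandwidth_mhz`, `robust`), the "EoR" label and the RRI/RDSI transfer stubs are hypothetical.

```python
import queue
import threading
from datetime import date, timedelta

def last_friday(today):
    """Most recent Friday strictly before `today` (Monday=0 ... Friday=4)."""
    days_back = (today.weekday() - 4) % 7 or 7
    return today - timedelta(days=days_back)

def batch_select(archive, predicate):
    """Batch (re-)processing: every file currently archived that matches."""
    return [meta["file_id"] for meta in archive if predicate(meta)]

def incremental_worker(inbox, predicate, action):
    """Incremental processing: handle newly ingested files as they arrive."""
    while True:
        meta = inbox.get()
        try:
            if meta is None:  # sentinel: stop the worker
                return
            if predicate(meta):
                action(meta)
        finally:
            inbox.task_done()

# Batch example over hypothetical metadata records.
today = date(2016, 6, 1)  # a Wednesday
friday = last_friday(today)
archive = [
    {"file_id": "v1", "project": "EoR", "kind": "visibility", "obs_date": date(2016, 5, 27)},
    {"file_id": "v2", "project": "EoR", "kind": "visibility", "obs_date": date(2016, 5, 26)},
    {"file_id": "v3", "project": "GLEAM", "kind": "visibility", "obs_date": date(2016, 5, 27)},
]

def is_eor_last_friday(meta):
    return (meta["project"] == "EoR" and meta["kind"] == "visibility"
            and meta["obs_date"] == friday)

to_compress = batch_select(archive, is_eor_last_friday)

# Incremental example: 32 MHz images with robustness 0 go to RRI before RDSI.
transfers = []

def is_32mhz_robust0(meta):
    return meta["bandwidth_mhz"] == 32 and meta["robust"] == 0

def route(meta):
    transfers.append((meta["file_id"], "RRI"))   # stub for the copy to India
    transfers.append((meta["file_id"], "RDSI"))  # stub for the RDSI transfer

inbox = queue.Queue()
worker = threading.Thread(target=incremental_worker,
                          args=(inbox, is_32mhz_robust0, route))
worker.start()
for meta in ({"file_id": "img1", "bandwidth_mhz": 32, "robust": 0},
             {"file_id": "img2", "bandwidth_mhz": 32, "robust": -1},
             None):
    inbox.put(meta)
worker.join()
print(to_compress, transfers)
```

Because a single worker consumes the queue in order, files are handled as they arrive, and the routing function fixes the order of the transfers (RRI before RDSI).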
15
Future work
NGAS Public Release v8.0
- Improved stability with a large number of unit tests
- File container support, Dockerisation, etc.
MWA 2/3
- Optimal co-design and co-configuration of SW/HW
- Benchmark storage and compute systems
- Profile in-archive processing tasks (flagging, compression, etc.)
- NGAS Dashboard: log data analytics (Spark + Scala + ELK)
R & Development
- Real-time erasure coding for fault-tolerant storage
- NGAS job framework for in-archive processing