A Total Recall of Data Usage for the MWA Long Term Archive

1 A Total Recall of Data Usage for the MWA Long Term Archive
Chen Wu & the archive team: Dave Pallot, Andreas Wicenec

2 Agenda
Overall dataflow
Data usage
Exploit data usage for performance modelling
GLEAM archive
Future work
MWA Project Meeting, Perth, June 2016

3 Overall dataflow
[Diagram: dataflow across three tiers]
Tier 0 (MRO, Western Australia)
Tier 1 (Perth, Western Australia)
Tier 2 (Asia Pacific & North America)
Diagram labels: Tile, Receiver, Beam former, PFB, Correlator, Data Capture, Online processing, Online archive, 400 MB/s, NGAS Client, Long-Term Archive (3 PB / year), Pawsey, Perth (Disk + Tape), Galaxy / Magnus, Science Archive, UI VO Web, Vis + Voltage, Mirrored Archive, pipelines, RV MS & Images QA, USA, India, New Zealand

4 Total archive statistics
Number of files archived: ~ 20 million
    Visibility: ~ 11 million
    Voltage: ~ 9 million
Number of files requested by MWA users
    Visibility: ~ 14.5 million
Volume archived: ~ 10.2 PB
    Visibilities: ~ 8.2 PB
    Voltage: ~ 2.0 PB
Volume requested by MWA users
    Visibilities: ~ 10.9 PB
Request > Ingest
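The "Request > Ingest" note can be checked directly from the visibility figures on this slide. The following is a minimal sketch using the slide's approximate values; the function and variable names are illustrative and do not come from the presentation.

```python
# Approximate visibility figures as quoted on the slide.
FILES_ARCHIVED_MILLION = 11.0
FILES_REQUESTED_MILLION = 14.5
VOLUME_ARCHIVED_PB = 8.2
VOLUME_REQUESTED_PB = 10.9


def request_to_ingest_ratio(requested: float, archived: float) -> float:
    """Return how many times the requested amount exceeds the archived amount."""
    return requested / archived


file_ratio = request_to_ingest_ratio(FILES_REQUESTED_MILLION, FILES_ARCHIVED_MILLION)
volume_ratio = request_to_ingest_ratio(VOLUME_REQUESTED_PB, VOLUME_ARCHIVED_PB)

# A ratio above 1 means more was requested than was ever ingested,
# so at least some files must have been requested more than once.
print(f"files: {file_ratio:.2f}x, volume: {volume_ratio:.2f}x")
# prints "files: 1.32x, volume: 1.33x"
```

Both ratios come out at roughly 1.3, so by count and by volume the visibility data has been requested about a third more than it has been ingested.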

5 Ingest volume breakdown
Project   Volume (TB)
G0002     (value missing in transcript)
G0009     (value missing in transcript)
G0008     (value missing in transcript)
G0024     (value missing in transcript)
D0000     817.34
D0006     681.05
D0004     419.06
D0012     281.67
D0005     278.32
D0011     234.47
G0017     227.11
G0001     219.53
C100      104.51
G0016     79.34
C001      74.98
C102      66.98
G0018     60.33
G0011     43.81
G0004     42.27
D0002     39.29
G0021     35.82
OA002     33.56
G0010     31.37
D0008     30.47
C106      28.15
G0020     23.64
G0015     19.78
C104      17.14
Others    121.82

6 Request volume breakdown
Project   Volume (TB)
G0008     (value missing in transcript)
G0009     (value missing in transcript)
D0000     (value missing in transcript)
D0006     (value missing in transcript)
G0001     607.97
G0017     321.27
G0016     302.90
G0002     219.09
D0002     179.39
D0005     121.07
G0004     106.17
G0011     102.48
G0010     68.72
D0007     65.81
G0015     65.50
G0020     41.43
G0003     40.73
G0012     39.92
D0008     39.31
G0018     29.45
D0011     15.13
A0001     13.34
G0023     7.74
G0024     5.28
C001      3.87
D0004     1.40
D0009     1.38
C122      0.44
G0021     0.30
G0025     0.16
C100      0.11
G0019     (value missing in transcript)
OA002     0.09
D0012     0.04
D0010     0.02

7 Request region breakdown

8 File size distribution

9 Daily usage ( ~ )
[Plot: daily usage, with 800 MB/s and 400 MB/s marked]

10 Observation time – Access time matrix
Colour map – # of requests
[Plot annotations: 24 Feb 2016, MWA workshop 17th/18th of May]

11 HSM storage @ MWA LTA Pawsey

12 Disk “hits” ratio curve
Disk cache eviction policies: AGE_WEIGHT = constant + multiplier * <file_age_in_day>
“Simulation” using the MWA data access stream consisting of 33 million successful ingestions + requests
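The kind of replay this slide describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the archive team's simulator: it assumes the file with the highest AGE_WEIGHT is evicted first, that file age means days since the file was last ingested or requested, and that every ingest or request leaves the file on disk. The event format and all function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event format: (day, "ingest" or "request", file_id, size).
Event = Tuple[int, str, str, int]


def age_weight(constant: float, multiplier: float, age_days: float) -> float:
    """AGE_WEIGHT = constant + multiplier * <file_age_in_day>, as on the slide."""
    return constant + multiplier * age_days


def simulate_hit_ratio(events: Iterable[Event], capacity: int,
                       constant: float = 0.0, multiplier: float = 1.0) -> float:
    """Replay an access stream against a disk cache and return hits / requests."""
    on_disk: Dict[str, Tuple[int, int]] = {}  # file_id -> (size, last access day)
    used = 0
    hits = 0
    requests = 0
    for day, op, file_id, size in events:
        if op == "request":
            requests += 1
            if file_id in on_disk:
                hits += 1
        # Assumption: both ingest and request leave the file on disk.
        if file_id in on_disk:
            used -= on_disk[file_id][0]
        on_disk[file_id] = (size, day)
        used += size
        # Assumption: evict the highest-weight file until the cache fits,
        # never evicting the file that was just touched.
        while used > capacity and len(on_disk) > 1:
            victim = max(
                (f for f in on_disk if f != file_id),
                key=lambda f: age_weight(constant, multiplier, day - on_disk[f][1]),
            )
            used -= on_disk.pop(victim)[0]
    return hits / requests if requests else 0.0


# Tiny made-up access stream; every file has size 1.
events = [
    (0, "ingest", "a", 1), (1, "ingest", "b", 1), (2, "ingest", "c", 1),
    (3, "request", "b", 1), (4, "request", "a", 1),
    (5, "request", "c", 1), (6, "request", "a", 1),
]
for capacity in (2, 3):
    print(capacity, simulate_hit_ratio(events, capacity))
```

Sweeping `capacity` (or `constant` and `multiplier`) over a real access log and plotting the returned ratio would give a curve of the kind the slide title refers to.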

13 Evolution of the MWA LTA

14 GLEAM Archive
Over 1 million images, 20,000 MeasurementSet, 250 TB
[Diagram labels: NGAS Client, IVOA Interface, GLEAM VO Server, GLEAM Archive Store 04, GLEAM Archive Store 06]
In-archive processing: Interactive, Batch, Continuous
Interactive processing
    Cutout and regridding, NGAS Tasks
Batch (re-)processing - Process all files satisfying some conditions currently in the archive, e.g.:
    Compress all visibility files that are (1) EoR project and (2) Observed on last Friday (MWA)
    Rescale flux of all snapshot images of GLEAM Phase 1 that are ingested in the past two weeks
    Make movies from images formed in DEC -26 strip scans
    Re-index all WCS headers of images ingested from last November
Incremental processing - Asynchronously, continuously, and selectively processing "newly" ingested files
    After a snapshot image tar is ingested, decompress it, and for each FITS image, compute its sky coverage, and update VO database indexes accordingly
    As soon as a 32MHz image is ingested, if its Robustness is 0, send a copy to RRI at India before transferring it to RDSI
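The difference between the batch and incremental modes above can be sketched in a few lines. This is an illustration only and does not use the NGAS API: `ArchivedFile`, `batch_process`, and `IncrementalProcessor` are hypothetical names, the "EoR" project label is a placeholder, and the rules run synchronously for brevity even though the slide describes incremental processing as asynchronous.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ArchivedFile:
    """Hypothetical archive record; the fields are illustrative."""
    file_id: str
    project: str
    kind: str        # e.g. "visibility" or "image"
    observed: date


Predicate = Callable[[ArchivedFile], bool]
Task = Callable[[ArchivedFile], None]


def batch_process(archive: Iterable[ArchivedFile],
                  predicate: Predicate, task: Task) -> int:
    """Batch mode: run a task over every file already in the archive that matches."""
    count = 0
    for f in archive:
        if predicate(f):
            task(f)
            count += 1
    return count


class IncrementalProcessor:
    """Incremental mode: run matching rules on each newly ingested file."""

    def __init__(self) -> None:
        self._rules: List[Tuple[Predicate, Task]] = []

    def register(self, predicate: Predicate, task: Task) -> None:
        self._rules.append((predicate, task))

    def on_ingest(self, f: ArchivedFile) -> int:
        """Apply every rule whose predicate matches; return how many fired."""
        fired = 0
        for predicate, task in self._rules:
            if predicate(f):
                task(f)
                fired += 1
        return fired
```

A batch job such as "compress all visibility files from one project observed on a given day" becomes one `batch_process` call with a predicate over `project`, `kind`, and `observed`, while a rule such as "update the indexes whenever an image arrives" is registered once and fires on each `on_ingest` call.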

15 Future work
NGAS Public Release v8.0
    Improved stability with a large number of unit tests
    File container support, Dockerisation, etc.
MWA 2/3
    Optimal co-design and co-configuration of SW/HW
    Benchmark storage and compute systems
    Profile in-archive processing tasks (flagging, compression, etc.)
    NGAS Dashboard
    Log data analytics (Spark + Scala + ELK)
R & Development
    Real-time erasure coding for fault-tolerance storage
    NGAS job framework for in-archive processing
