MWA Data Capture and Archiving
Dave Pallot, MWA Conference, Melbourne, Australia, 7th December 2011

Talking Points
Data Capture and Archive System:
- Systems overview
- Correlator data capture
- RTS data capture
- On-site data operations
- The Next Generation Archiving System (NGAS)
- Archive details

Data Capture and Archive System
Capture the data products from the MWA (at the MRO) and transport them to the petabyte-scale storage facility at the Pawsey Centre (Perth) for later retrieval, processing and analysis.

Data Capture and Archive System
Correlator:
- 24x GPU-X machines
- ~32 MB/s (0.5 s integrations, 40 kHz channels, 32-bit)
On-site storage:
- ~48 TB of transportable storage
Pawsey:
- 15 PB reserved for the MWA
- 96 GPU nodes for data processing

System Flow
1. Monitor & Control tells the correlator to capture data.
2. The correlator dumps visibility data to a configurable storage location.
3. Monitor & Control tells the correlator to stop data capture.
4. Visibility files are produced, collected and transported to Pawsey for archiving (NGAS).
5. Observations, with their visibilities, are accessed and images are produced.
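
The sequence above is essentially a start/stop capture loop driven by Monitor & Control. The sketch below illustrates it in Python; every function name here is a hypothetical stand-in for the real M&C, correlator and NGAS interfaces, which are not shown on the slide.

```python
# Illustrative sketch of the capture sequence only; the functions are
# hypothetical stand-ins for the real M&C, correlator and NGAS interfaces.
import glob
import os
import time

def correlator_start_capture(obs_id, output_dir):
    print(f"M&C -> correlator: start capture for {obs_id} into {output_dir}")

def correlator_stop_capture(obs_id):
    print(f"M&C -> correlator: stop capture for {obs_id}")

def ngas_archive(path):
    print(f"transporting {path} to Pawsey via NGAS")

def run_observation(obs_id, duration_s, storage_dir):
    correlator_start_capture(obs_id, storage_dir)        # step 1
    time.sleep(duration_s)                                # step 2: correlator dumps visibilities
    correlator_stop_capture(obs_id)                       # step 3
    for path in glob.glob(os.path.join(storage_dir, f"{obs_id}_*.vis")):
        ngas_archive(path)                                # step 4: collect and archive

run_observation("obs_0001", duration_s=2, storage_dir="/tmp")
```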

Correlator Data Capture
Data capture modes:
- Save all
- Save all on trigger
Save All mode:
- Dump all visibility data to a single data file per machine for the fixed duration of a single observation.
- The size of each visibility file is dependent on the output block size and the duration of the observation.

Correlator Data Capture (cont.)
Save All on Trigger mode:
- Stream data to a circular disk buffer and only produce a visibility data file (flush the buffers) when triggered, i.e. when something interesting happens.
- The telescope is continuously on. The trigger is activated by an expert who is external to the capture process.
- The architecture allows automatic detection and triggering via various pipelines.
- If there is no trigger then no visibility data is flushed to file.
- Once the trigger fires, the observation has ended.

Correlator Data Capture (cont.)
- Circular buffer size of possibly hundreds of GB on disk.
  Example: 100 GB / 32 MB/s ≈ 52 minutes of buffered data.
- The circular buffer size can be configured.
- The buffer must be defragmented into a contiguous block to get maximum I/O performance (see the sketch below).
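
To make the circular-buffer idea concrete, here is a minimal sketch (not the MWA code) of a disk-backed ring of fixed-size blocks: the newest visibility blocks continually overwrite the oldest ones in a preallocated file, and a trigger flushes the buffered history out oldest-first into a single contiguous file. All constants and names are illustrative.

```python
# Minimal sketch of a circular disk buffer with trigger flush (illustrative only).
import os

BLOCK = 32 * 1024 * 1024   # ~32 MB written per second, as on the slide
N_BLOCKS = 8               # tiny for the sketch; ~3125 blocks gives the 100 GB example

class CircularDiskBuffer:
    def __init__(self, path, n_blocks=N_BLOCKS, block=BLOCK):
        self.path, self.n_blocks, self.block = path, n_blocks, block
        self.next_slot = 0   # slot that the next write will overwrite
        self.filled = 0      # number of slots currently holding valid data
        with open(path, "wb") as f:
            f.truncate(n_blocks * block)   # preallocate the buffer file

    def write(self, data: bytes):
        """Append the latest visibility block, overwriting the oldest slot."""
        assert len(data) == self.block
        with open(self.path, "r+b") as f:
            f.seek(self.next_slot * self.block)
            f.write(data)
        self.next_slot = (self.next_slot + 1) % self.n_blocks
        self.filled = min(self.filled + 1, self.n_blocks)

    def flush(self, out_path):
        """On trigger: copy slots oldest-first into one contiguous visibility file."""
        oldest = (self.next_slot - self.filled) % self.n_blocks
        with open(self.path, "rb") as src, open(out_path, "wb") as dst:
            for i in range(self.filled):
                src.seek(((oldest + i) % self.n_blocks) * self.block)
                dst.write(src.read(self.block))
        self.filled = 0
```

At ~32 MB/s, a 100 GB buffer holds roughly 100,000 MB / 32 MB/s ≈ 3125 s ≈ 52 minutes of history, matching the figure on the slide; flushing oldest-first is also what produces the contiguous, defragmented output the slide refers to.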

Correlator Data Capture (cont.)
In both modes:
- One visibility file per machine per observation is produced (a total of 24 files per observation).
- Both modes use the same data format and filename conventions.
- No special treatment of data files once they are produced. (The data buffers do receive special treatment, but that is hidden.)
- Files will have unique identifiers in the file name to link them to the metadata in our databases.

RTS Data Capture
Accumulate and generate images on the GPU?
- Avoids accessing visibilities from disk storage.
- Performance reasons (concurrent disk access).
Images are dumped to a separate location from the visibilities.
- Visibilities can be purged if the RTS images are bad.
- The images will not be transported, as they can be reconstructed.
Requires more discussion:
- Mitch (RTS group), M&C group, Curtin.

On-site Data Operations
Facility to process images from the archiving node on-site:
- Tools to access visibilities from local storage.
- Imaging/processing will be done outside of the MWA data pipeline.
Ability to "flag" bad data:
- Flagged data can be purged before transportation.
- Who makes that decision?

Data Transport
Data transport from the MRO to Perth?
- Transportable disk array
  - 48 TB of storage
  - Interim measure
- 10 Gb/s NBN fibre link from the MWA to Pawsey
  - Termination location and timeframe are uncertain
Transportation and archive coordination:
- NGAS

Next Generation Archiving System (NGAS)
- Distributed storage software solution.
- Operates transparently across physically and logically separated locations:
  - Reliable communications (HTTP interface)
  - Supports archive replication and mirroring
  - Access to data on-site and through the archive
- Scalable: it can coordinate multiple petabytes of storage.
- Lots of tools.
- Proven architecture for archiving large data sets:
  - National Radio Astronomy Observatory (NRAO)
  - Atacama Large Millimeter/submillimeter Array (ALMA)
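
Because NGAS is driven over plain HTTP, a client only needs standard web requests to archive and fetch files. The sketch below uses the standard NGAS command names (ARCHIVE, RETRIEVE, STATUS); the host, port, file name and exact headers/parameters are assumptions to verify against the deployed NGAS version.

```python
# Hedged sketch of driving an NGAS node over its HTTP interface.
# Host, port and file names are hypothetical; verify command parameters
# against your NGAS version before relying on them.
import requests

NGAS = "http://ngas.example.org:7777"    # hypothetical archive node
FNAME = "obs_0001_gpubox01.fits"         # hypothetical visibility file

# Push-archive: POST the file body to the ARCHIVE command
with open(FNAME, "rb") as f:
    resp = requests.post(
        f"{NGAS}/ARCHIVE",
        data=f,
        headers={"Content-Disposition": f'attachment; filename="{FNAME}"'},
    )
    resp.raise_for_status()

# Retrieve the same file later by its file id
resp = requests.get(f"{NGAS}/RETRIEVE", params={"file_id": FNAME}, stream=True)
resp.raise_for_status()
with open("retrieved_" + FNAME, "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):
        out.write(chunk)

# Ask the node for its status report
print(requests.get(f"{NGAS}/STATUS").text)
```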

Archive
Standard features you would expect from an archive:
- Performance/usage trends, retrieval, storage, etc.
Features specific to the MWA:
- Sky maps, temperature plots, etc.
- Will evolve over time.
Comprehensive metadata search tool:
- RA/Dec, source, gains, frequency, date/time, temperatures, etc.
Pawsey supercomputer node:
- Generate images from a composite set of visibilities.
- Fully configurable pipeline plug-in architecture attached to the archive.
- Reduces I/O, storage and processing constraints for single users.
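
As a purely hypothetical illustration of the kind of metadata query such a search tool supports (an RA/Dec box plus a date range), here is a small sketch against an invented table; the schema, column names and observation values are not the real MWA database.

```python
# Hypothetical metadata search sketch: schema and values are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE observation (
                    obs_id    INTEGER PRIMARY KEY,
                    source    TEXT,
                    ra_deg    REAL,
                    dec_deg   REAL,
                    freq_mhz  REAL,
                    start_utc TEXT)""")
conn.execute("INSERT INTO observation VALUES "
             "(1000000001, 'FieldA', 0.0, -27.0, 154.0, '2012-05-01T18:00:00')")

# Search by RA/Dec box and date range, as the slide's search criteria suggest
rows = conn.execute("""SELECT obs_id, source, start_utc FROM observation
                       WHERE ra_deg  BETWEEN ? AND ?
                         AND dec_deg BETWEEN ? AND ?
                         AND start_utc BETWEEN ? AND ?""",
                    (-1.0, 1.0, -28.0, -26.0,
                     '2012-05-01T00:00:00', '2012-06-01T00:00:00')).fetchall()
print(rows)
```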

Current State of Play
- Raised a purchase order for the 48 TB transportable storage array and controllers.
  - Due to arrive in the new year.
- Data capture modes ready for the first "Quarter T" roll-out.
  - May-June 2012.
- First cut of the archive subsystems (NGAS).
  - Implementation, benchmarking, commissioning, interfaces.
  - April 2012.

Thank You