Data Intensive Astronomy Group Talk II ICRAR Con 4 September 2015 Chen Wu

1 Agenda
– MWA data system
– MWA data usage
– MWA storage modelling
– GLEAM archive
– GLEAM VO usage
– In-archive processing

2 MWA Data System

3 MWA Archive

4

5 MWA ingest and retrieval (24 Jan – Aug 2014)

6 Data access by region: 14 million successful retrievals

7 MWA usage

8 Distribution of file freshness and staleness

9 MWA Pawsey Hierarchical Storage Management (DMF) [architecture diagram]
Diagram elements: MWA data flowing to the long-term archive (LTA) at Pawsey over a 10 Gbps link; archive/stage front-end (FE) nodes; fast disk storage and bulk disk storage (x32); tape library x2 (CSIRO: library + cables + optics); DM-PUT / DM-GET stage and release operations; CXFS with API and POSIX access; Science DB; M&C DB.

10 Staging time: AGE_WEIGHT = constant + multiplier × file age
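As a rough illustration of the weighting policy cited above, the sketch below computes an age-based weight per file and ranks candidates by it. It assumes the documented DMF behaviour that the weight grows linearly with file age; the constant, multiplier, age unit (days) and file names are illustrative, not the actual MWA configuration.

```python
from datetime import datetime, timezone

def age_weight(last_access: datetime, constant: float = 1.0,
               multiplier: float = 0.1) -> float:
    """Illustrative DMF-style AGE_WEIGHT: constant + multiplier * age.

    Age is measured in days here (an assumption about the unit)."""
    age_days = (datetime.now(timezone.utc) - last_access).total_seconds() / 86400.0
    return constant + multiplier * age_days

# Rank candidate files by weight; higher weight = better candidate for
# migration/release from the fast disk cache. File names are hypothetical.
files = {
    "1096151312.ms.tar": datetime(2014, 9, 1, tzinfo=timezone.utc),
    "gleam_snapshot.fits": datetime(2015, 8, 30, tzinfo=timezone.utc),
}
for name, last_access in sorted(files.items(),
                                key=lambda kv: age_weight(kv[1]), reverse=True):
    print(f"{name}: weight = {age_weight(last_access):.2f}")
```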

11 Storage performance modelling: trace-driven "simulation" using the MWA data access stream (25 million successful requests).
Example: in the access sequence (a, b, c, d, c, a), the reuse distance of the second access to a is 3 (the distinct files b, c, d touched in between); with a cache of three or fewer files, a has already been evicted by then and must be staged from the tape.
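A minimal sketch of that simulation idea, assuming an LRU-managed disk cache in front of the tape library; the cache capacity and the toy access stream are illustrative, and a real run over 25 million requests would need a more efficient reuse-distance structure than the quadratic one used here.

```python
def simulate(access_stream, cache_capacity):
    """Count disk hits vs. tape stages for an LRU cache of `cache_capacity`
    files: a re-access hits only if fewer than `cache_capacity` distinct
    files were touched since the previous access to the same file."""
    last_seen = {}            # file -> index of its previous access
    hits = stages = 0
    for i, f in enumerate(access_stream):
        if f in last_seen:
            # distinct files accessed between the two accesses to f
            distance = len(set(access_stream[last_seen[f] + 1:i]))
            if distance < cache_capacity:
                hits += 1
            else:
                stages += 1   # evicted in the meantime; staged from tape
        else:
            stages += 1       # first access always comes from tape
        last_seen[f] = i
    return hits, stages

# The slide's example: the final access to 'a' is staged from tape.
print(simulate(list("abcdca"), cache_capacity=3))   # -> (1, 5)
```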

12 GLEAM archive [architecture diagram]
Components: IVOA interface, GLEAM VO server, web interface, GLEAM archive stores 04 and 06, NGAS client.
Holdings: over 800,000 images, 20,000 MeasurementSets, 220 TB.
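A minimal sketch of how a client could query the GLEAM VO server through the IVOA Simple Image Access protocol; the endpoint URL below is a placeholder (the real service address is not given on the slide), while POS, SIZE and FORMAT are standard SIA query parameters.

```python
import requests

# Placeholder endpoint; the real GLEAM SIA/TAP URL would come from the VO registry.
SIA_URL = "https://example.org/gleam/sia"

params = {
    "POS": "201.37,-43.02",   # RA,Dec in degrees
    "SIZE": "1.0",            # search region size in degrees
    "FORMAT": "image/fits",
}
resp = requests.get(SIA_URL, params=params, timeout=30)
resp.raise_for_status()
print(resp.text[:500])        # VOTable listing the matching images
```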

13 GLEAM usage on an all-sky view (all-sky view in Aladin Lite!)

14 In-archive processing
Some "real" requirements from both MWA and GLEAM:
– Interactive processing: cutout and regridding (NGAS Tasks)
– Batch (re-)processing: process all files currently in the archive that satisfy some conditions (see the sketch below), e.g.
  - Compress all visibility files that (1) belong to the EoR project and (2) were observed last Friday (MWA)
  - Rescale the flux of all GLEAM Phase 1 snapshot images ingested in the past two weeks
  - Make movies from images formed in the DEC -26 strip scans
  - Re-index the WCS headers of all images ingested since last November
– Incremental processing: asynchronously, continuously, and selectively process "newly" ingested files, e.g.
  - After a snapshot image tar is ingested, decompress it and, for each FITS image, compute its sky coverage and update the VO database indexes accordingly
  - As soon as a 32 MHz image is ingested, if its robustness is 0, send a copy to RRI in India before transferring it to RDSI
– NGAS Job Framework: in the same spirit as MapReduce; File → Object → Container or DROP
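A minimal sketch of the batch-selection idea behind the Job Framework: pick every archived file whose metadata satisfy the given conditions, then map a task over the selection. The metadata fields, the in-memory file table, and the compress stand-in are all illustrative; this is not the actual NGAS API.

```python
from datetime import date, timedelta

# Illustrative stand-in for the archive's metadata table.
archive = [
    {"file_id": "obs1.ms.tar", "project": "EoR",   "obs_date": date(2015, 8, 28), "type": "visibility"},
    {"file_id": "img1.fits",   "project": "GLEAM", "obs_date": date(2015, 8, 20), "type": "image"},
]

def select(files, **conditions):
    """Return the files whose metadata satisfy every given condition."""
    return [f for f in files if all(f.get(k) == v for k, v in conditions.items())]

def compress(f):
    print(f"compressing {f['file_id']} ...")   # stand-in for the real compression task

# "Compress all visibility files of the EoR project observed last Friday"
last_friday = date(2015, 9, 4) - timedelta(days=7)   # illustrative reference date
for f in select(archive, project="EoR", type="visibility", obs_date=last_friday):
    compress(f)
```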

15 Data re-processing Web UI

16 Conclusion
– MWA data system
– MWA data usage
– MWA storage modelling
– GLEAM archive
– GLEAM VO usage
– In-archive processing

17 Thank you! Q & A