ALMA Archive Operations Impact on the ARC Facilities.


Data volume One year of nominal ALMA operations is 200 TB; we have planned for 200 TB in total during the 'early' years. There are no backups: the ARCs are the backups. Replication to the ARCs will be done on physical media if necessary, and over the network once that becomes possible.
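The 200 TB/year figure implies a fairly modest average ingest rate; a quick back-of-the-envelope check (assuming uniform, sustained ingest over the year):

```python
# Average data rate implied by 200 TB of data per year (decimal TB).
TB = 1e12  # bytes

annual_volume_bytes = 200 * TB
seconds_per_year = 365 * 24 * 3600

avg_rate_mb_s = annual_volume_bytes / seconds_per_year / 1e6
print(f"Average ingest rate: {avg_rate_mb_s:.1f} MB/s")  # ~6.3 MB/s
```

Peak rates are of course much higher, which is why the network-transfer slide below speaks of "periods of average data rate".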

Deliverables The ARCs will receive the software, plus support for the procurement of the hardware and the Oracle licenses. We are also including the ARCs in the planning of the archive operational concepts, under the assumption that both the hardware and the software are very similar across sites. The database is Oracle; if an ARC for whatever reason decides to deviate from this, it has to carry the development and additional operational costs itself. Essentially this means that it is technically feasible but.... The hardware is essentially the same story, although a bit more relaxed (in particular if we manage to replicate over the network). The NGAS software plays an essential role in the operational concepts.

High level concepts The SCO is the hub for bulk data and meta-data; the OSF archive is hidden. Data are first replicated to the SCO and from there to the ARCs. In general everything is replicated to the ARCs; in practice, part of the monitor and log data might be irrelevant. Proposals are submitted to the SCO and replicated to the ARCs; the OT submission interface talks to the SCO.

Nominal operations Database replication to the various sites will be done using Oracle Streams replication technology; this means that meta-data will be available at the ARCs within seconds. Bulk-data replication will be done using the NGAS mirroring service (over the network) or the NGAS cloning service (on hard disks). The NGAS archives at the ARCs are virtually independent from the SCO NGAS, i.e. they do not share a database or any other resources, but they 'know' of each other. Network transfer: during periods of average data rate the bulk data should arrive at the ARCs within a few minutes as well (limited by network bandwidth).
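At its core, a mirroring pass like NGAS's is a periodic diff-and-fetch between the hub archive and the mirror. A minimal sketch of that idea follows; note that `fetch_file_list` and `retrieve` are hypothetical stand-ins, not the real NGAS API:

```python
# Sketch of one mirroring pass: find files registered at the hub (SCO)
# but not yet present at the mirror (ARC), then pull them over.
# The hub/arc objects and their methods are illustrative placeholders.

def mirror_pass(hub, arc):
    hub_files = set(hub.fetch_file_list())   # file IDs known at the SCO
    arc_files = set(arc.fetch_file_list())   # file IDs already mirrored
    missing = hub_files - arc_files          # what still needs to travel
    for file_id in sorted(missing):
        arc.retrieve(file_id, source=hub)    # fetch over the network
    return missing
```

Because each pass only moves the difference, the two archives stay independent (separate databases, separate resources) while converging on the same contents, which matches the "virtually independent but they know of each other" description above.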

Nominal operations Media transfer: assuming that we send media twice a week and that they are delivered within one week, the maximum delay would be 1.5 weeks after the observation. This still has to be defined and implemented. Important data could still be replicated over the network. Access to the data is always transparently possible, i.e. a user accessing an ARC can request data even if they have not yet been replicated.
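The 1.5-week worst case follows directly from the shipping cadence: with two shipments per week, data can sit for up to half a week waiting for the next shipment, plus up to one week in transit:

```python
# Worst-case media-transfer delay under the stated assumptions:
# shipments twice a week, delivery within one week.
shipments_per_week = 2
wait_for_shipment_weeks = 1 / shipments_per_week  # up to 0.5 weeks on the shelf
delivery_weeks = 1.0                              # courier transit time
max_delay_weeks = wait_for_shipment_weeks + delivery_weeks
print(max_delay_weeks)  # 1.5
```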

Hardware A full-blown ALMA ARC will consist of four 19" racks:
3 x 8 NGAS servers (4 HU, 24 disks)
3 NGAS disk-handling and front-end servers (3 HU, 16 disks)
2 database machines (3 HU, 16 disks)
potentially an additional disk array for the database
a terminal server for all machines
a network switch

Hardware This equipment requires sufficient cooling and good racks: one of the 4 HU NGAS servers weighs more than 100 kg, i.e. each of the three NGAS racks will weigh approximately one ton. Since one rack at current disk capacity holds about one year of ALMA data, it should be possible to keep the full ALMA archive stable in terms of total space required by replacing the disks with new double-capacity disks every 2-3 years (assuming no increase in the data rate). This not only saves space, but also power and maintenance.
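The constant-footprint argument is easy to verify numerically: if disk capacity doubles every few years while the data rate stays flat, the cumulative archive always fits in roughly the original rack space. A small illustration, assuming (hypothetically) a 2.5-year capacity-doubling period and one rack holding one year of data at today's density:

```python
import math

# Racks needed each year if disks are swapped for double-capacity models
# every 2.5 years (assumed) and the data rate stays constant.
years = 10
doubling_period_years = 2.5

racks = []
for y in range(1, years + 1):
    capacity_per_rack = 2 ** int(y // doubling_period_years)  # relative to year 1
    racks.append(math.ceil(y / capacity_per_rack))            # cumulative data / rack capacity

print(racks)  # stays at 1-2 racks instead of growing to 10
```

Without the disk swaps the rack count would grow linearly with the archive; with them it stays bounded, which is the point being made on the slide.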

Deployment The layout of the ARCs is essentially the same as for the SCO. We have just asked for prices for the first full installation of hardware at the OSF, which is very similar to 0.75 of an ARC: the price is about 135,000 $ including 110 TB of disk space, for 8 x 24-slot machines plus 14 x 16-slot machines, with only 35% of the slots filled. A 1 TB disk is about 300 $ (really low dollar :-().

Prices Totals per TB: 300 $/disk + 240 $/slot in the computer = 540 $/TB, including the auxiliary machines (disk handling, database, front-end). No infrastructure is included (racks, cooling, network, UPS...). By the time we have to procure the hardware for the ARCs, this should have gone down to about 270 $/TB. We have to buy about one year's worth of capacity (200 TB) initially, and that should by then fit in half the number of machines/slots.
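The per-TB totals are a simple sum, and the projected figure is a halving of today's cost. A quick check of the arithmetic (the 240 $/slot share is taken as the 540 - 300 remainder; the initial-outlay product is my own extrapolation from the slide's numbers, not a quoted price):

```python
# Per-TB cost decomposition and the projected ARC procurement cost.
disk_cost_per_tb = 300    # $/TB for the bare disk
slot_cost_per_tb = 240    # $/TB share of server/slot cost (540 - 300)
total_per_tb = disk_cost_per_tb + slot_cost_per_tb      # 540 $/TB today

projected_per_tb = total_per_tb / 2                     # expected at ARC procurement time
initial_capacity_tb = 200                               # one year's worth of data
initial_disk_outlay = projected_per_tb * initial_capacity_tb
print(total_per_tb, projected_per_tb, initial_disk_outlay)
```

At 270 $/TB the disks-plus-slots for the initial 200 TB come to roughly 54,000 $, excluding the infrastructure items (racks, cooling, network, UPS) that the slide explicitly leaves out.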