The Astro-Wise Federation … staying connected Lorentz center, Monday, 31 March 2008.

Slides:



Advertisements
Similar presentations
Computing Infrastructure
Advertisements

ITEC474 INTRODUCTION.
Retrieval of Information from Distributed Databases By Ananth Anandhakrishnan.
IB in the Wide Area How can IB help solve large data problems in the transport arena.
Content  Overview of Computer Networks (Wireless and Wired)  IP Address, MAC Address and Workgroups  LAN Setup and Creating Workgroup  Concept on.
Lofar Survey workshop 8/3/2007 Lofar -Surveys and AstroWise OmegaCEN NOVA – Kapteyn Institute -RuG 8 March 2007 Lofar Survey Workshop OmegaCEN NOVA – Kapteyn.
1 A Framework for Lazy Replication in P2P VoD Bin Cheng 1, Lex Stein 2, Hai Jin 1, Zheng Zhang 2 1 Huazhong University of Science & Technology (HUST) 2.
The Google File System. Why? Google has lots of data –Cannot fit in traditional file system –Spans hundreds (thousands) of servers connected to (tens.
Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.
Algorithms (Contd.). How do we describe algorithms? Pseudocode –Combines English, simple code constructs –Works with various types of primitives Could.
VO-lecture Valentijn 17/1/2007 Astro-Wise Wide Field imaging Facilitate: handling, calibration, quality control, pipelining, user tuned research, archiving,
Data Grid: GRASP Mike Smorul. Grid Retrieval and Search Platform Based on concepts developed in the Earth Science Data Interface (ESDI) developed at the.
Nikolay Tomitov Technical Trainer SoftAcad.bg.  What are Amazon Web services (AWS) ?  What’s cool when developing with AWS ?  Architecture of AWS 
On-Demand Media Streaming Over the Internet Mohamed M. Hefeeda, Bharat K. Bhargava Presented by Sam Distributed Computing Systems, FTDCS Proceedings.
F Fermilab Database Experience in Run II Fermilab Run II Database Requirements Online databases are maintained at each experiment and are critical for.
Distributed Databases
Microsoft Load Balancing and Clustering. Outline Introduction Load balancing Clustering.
Ian M. Fisk Fermilab February 23, Global Schedule External Items ➨ gLite 3.0 is released for pre-production in mid-April ➨ gLite 3.0 is rolled onto.
COMPUTER CONCEPTS.
Indexing structures for files D ƯƠ NG ANH KHOA-QLU13082.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
2/10/2000 CHEP2000 Padova Italy The BaBar Online Databases George Zioulas SLAC For the BaBar Computing Group.
Suggested Exercise 9 Sarah Diesburg Operating Systems CS 3430.
ILDG5QCDgrid1 QCDgrid status report UKQCD data grid Chris Maynard.
Astro-Wise workshop Nov 2005 Astro-Wise workshop Lorentz center Nov 2005 Munchen, Napoli, Paris, Bonn, Bochum, Nijmegen, Leiden, Groningen Lorentz.
COMPUTER MODELS FOR SKY IMAGE ANALYSIS OF THE INASAN ZVENIGOROD OBSERVATORY Sergei Pirogov ( Institute of Astronomy, Russian Academy of Sciences) VIIth.
Chapter 8 Implementing Disaster Recovery and High Availability Hands-On Virtual Computing.
Windows 2000 Operating System -- Active Directory Service COSC 516 Yuan YAO 08/29/2000.
CDF data production models 1 Data production models for the CDF experiment S. Hou for the CDF data production team.
3rd June 2004 CDF Grid SAM:Metadata and Middleware Components Mòrag Burgon-Lyon University of Glasgow.
Block1 Wrapping Your Nugget Around Distributed Processing.
Open Search Office Web Services Database Doc Mgt Sys Pipeline Index Geospatial Analysis Text Search Faceting Caching Query parsing Clustering Synonyms.
Astro-WISE & Grid Fokke Dijkstra – Donald Smits Centre for Information Technology Andrey Belikov – OmegaCEN, Kapteyn institute University of Groningen.
GStore: GSI Mass Storage ITEE-Palaver GSI Horst Göringer, Matthias Feyerabend, Sergei Sedykh
10 June 2002Towards an International VO - Garching bei Munchen 1 ASTRO-WISE An Astronomical Wide-field Imaging System for Europe Konrad Kuijken, Edwin.
Databases E. Leonardi, P. Valente. Conditions DB Conditions=Dynamic parameters non-event time-varying Conditions database (CondDB) General definition:
1 Chapter Overview Preparing to Upgrade Performing a Version Upgrade from Microsoft SQL Server 7.0 Performing an Online Database Upgrade from SQL Server.
A Brief Documentation.  Provides basic information about connection, server, and client.
7. Replication & HA Objectives –Understand Replication and HA Contents –Standby server –Failover clustering –Virtual server –Cluster –Replication Practicals.
1 Computing Challenges for the Square Kilometre Array Mathai Joseph & Harrick Vin Tata Research Development & Design Centre Pune, India CHEP Mumbai 16.
Oracle's Distributed Database Bora Yasa. Definition A Distributed Database is a set of databases stored on multiple computers at different locations and.
Fast Crash Recovery in RAMCloud. Motivation The role of DRAM has been increasing – Facebook used 150TB of DRAM For 200TB of disk storage However, there.
CLASS Information Management Presented at NOAATECH Conference 2006 Presented by Pat Schafer (CLASS-WV Development Lead)
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
{ Cloud computing. Exciting and relatively new technologies allow computing to be a part of our everyday lives. Cloud computing allows users to save their.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
AstroGrid NAM 2001 Andy Lawrence Cambridge NAM 2001 Andy Lawrence Cambridge Belfast Cambridge Edinburgh Jodrell Leicester MSSL.
CHAPTER 7 CLUSTERING SERVERS. CLUSTERING TYPES There are 2 types of clustering ; Server clusters Network Load Balancing (NLB) The difference between the.
Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.
EURO-VO: GRID and VO Lofar Information System Design OmegaCEN Kapteyn Institute TARGET- Computing Center University Groningen Garching, 10 April 2008 Lofar.
Lofar Information System on GRID A.N.Belikov. Lofar Long Term Archive Prototypes: EGEE Astro-WISE Requirements to data storage Tiers Astro-WISE adaptation.
M-Connect John Pinder. Agenda Problems? Future development SQL server Oracle Web based Data dictionary Central repository.
A Scalable Distributed Datastore for BioImaging R. Cai, J. Curnutt, E. Gomez, G. Kaymaz, T. Kleffel, K. Schubert, J. Tafas {jcurnutt, egomez, keith,
Workflows and Data Management. Workflow and DM Run3 and after: conditions m LHCb major upgrade is for Run3 (2020 horizon)! o Luminosity x 5 ( )
Cloud Computing By Reedy McGeady. What is Cloud Computing? Cloud Computing is using another organisations computer, which are known as hosts.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
EGEE is a project funded by the European Union under contract IST Test di GPFS a Catania IV Workshop INFN Grid – Bari Ottobre
SURENDRA INSTITUTE OF ENGINEERING & MANAGEMENT PRESENTED BY : Md. Mubarak Hussain DEPT-CSE ROLL
Apache Ignite Data Grid Research Corey Pentasuglia.
Experiences with Large Data Sets
Network Requirements Javier Orellana
TYPES OF SERVER. TYPES OF SERVER What is a server.
2018 Huawei H Real Questions Killtest
Future Data Architecture Cloud Hosting at USGS
Distributed Databases
COMPASS Database SPACE TELESCOPE SCIENCE INSTITUTE Gretchen Greene
Sarah Diesburg Operating Systems CS 3430
Distributed Systems and Concurrency: Distributed Systems
Presentation transcript:

The Astro-Wise Federation … staying connected Lorentz center, Monday, 31 March 2008

Components  Databases  Dataservers  Distributed Processing Units  Users  Networks

The components of the federation  Databases Identical copies at all AstroWise nodes  Dataservers Multiple dataservers per node Cache frequently accessed data from other nodes  DPUs – Distributed Processing Units Process data using database and dataservers at the “local” node  Networks  Users

Databases  Hub-and-spoke replication using Oracle Streams  Groningen as hub  Bonn, München, Napoli as spokes  Databases can be used locally if network to hub is down  Active since about one year

Databases  15 GByte metadata  60 GByte sources and associations excluding USNO and SDSS  50 GByte indexes  600 GByte filespace including USNO and SDSS

Databases  Groningen581 GByte Includes SDSS, USNO  Bonn1 GByte Copying from scratch after crash  München70 GByte Raw FITS headers missing  Napoli7 GByte Sources and Associations still missing

Databases – what next  Upgrade database hardware Groningen  Upgrade to Oracle 11g Groningen, München  Increase network bandwidth Napoli  Copy remaining data  Local accounts for federated users

Dataservers  Peer-to-peer  Write locally, read anywhere  Local copy in cache of remote files  Connected for more than a year  Today connected between Bonn, Groningen, München, Napoli and Nijmegen

Dataservers  25 Dataservers  363 TByte total online storage  23 TByte in use  2 million files

DPUs  Distributed Processing Unit  Clusters with 8 nodes up to 100’s of nodes  Front end to GRID processing  Active for more than a year  Available in Bonn, Groningen, München, Napoli and soon in Nijmegen

Networks  Speed from or to Groningen Bonn2~5 MByte/s München2~5 MByte/s Napoli<0.5 MByte/s  5MByte/s radio link to be installed Nijmegen>5 MByte/s ?  Databases and dataservers share the same bandwidth

Discussion  Questions?  Comments?  Suggestions?

The components of the federation  Databases Identical copies at all AstroWise nodes  Dataservers Multiple dataservers per node Cache frequently accessed data from other nodes  DPUs – Distributed Processing Units Process data using database and dataservers at the “local” node  Networks  Users

Today’s federation  5 active nodes (Bonn, Groningen, München, Napoli, Nijmegen)  4 databases (Bonn, Groningen, München, Napoli) 150GB of sources and associations  Includes USNO and SDSS 5GB of metadata 50GB of indexes  10 dataservers 22TB of science and calibration images  5 DPU’s  Number of users 20-30

Tomorrow’s federation  6 active nodes (+Leiden)  6 DPU’s (+Nijmegen)  4 databases 150GB of sources and associations  Includes USNO and SDSS 5GB of metadata 50GB of indexes  18 dataservers 22TB of science and calibration images

Databases  Query locally and add data locally  Peer to peer copies using Streams  Metadata is almost instantly available  100,000 sources with 300 double precision parameters take less than a minute to appear on a different node (roughly 4MByte/s depending on the network bandwidth)  Each database knows each filename in the federation

Remote database not accessible  Remote database can be down or The network in between can be down  Local queries are not affected  Data that is added locally will be copied as soon as the remote database is up again  Users will not be aware that a remote database is down, but administrators will

Dataservers  Peer to peer access without copies  Dataservers know of each other  Store locally and retrieve from anywhere  Cache frequently used remote data because of bandwidth A small fraction of the data is accessed very frequently when processing Local network speed is about ten times or more the network speed between nodes

Dataserver not accessible  Any dataserver can be down or The network to it can be down  If a copy in the local cache is available it will be used  If data is not available in a local cache processing can only succeed if the dataserver can be accessed again

DPU  DPU is independent of other DPU’s  DPU functions independently of which database it is told to use  DPU will retrieves data from the dataservers that are federated