The ROB SDO data system
All of the Sun, all of the time: distributing 1TB/day from the Solar Dynamics Observatory satellite, 24/7 for 5+ years
ROB for BNC 2009 Brussels
The Presentation
  The people
  The science - studying the Sun
  The system - satellite, network, data centres
  In practice - getting it all to work

The people
  David Boyes, Véronique Delouille, Benjamin Mampaey, Tobias Berghoff, Cis Verbeeck, Jean-François Hochedez (Royal Observatory of Belgium)
Why look at the Sun?
  Weather - it affects us but can be forecast
  Science - there is still a lot to find out!
    Why is the solar corona so hot?
    What drives mass ejections?
    Why is there an 11-year cycle?
    … and much more
Why have satellites?
  The Earth's atmosphere blocks a lot - ultraviolet and higher-energy radiation
  At these wavelengths the structure of the Sun is revealed
What are the effects on Earth?
  Radiation - effects are immediate, so the goal is to predict them
  Particles - arrive in hours or days, so we can give warning
  This is solar weather forecasting - ROB is a Regional Warning Center
  Graphic: NASA
What's up there?
  Telescopes to observe the Sun's atmosphere at multiple UV wavelengths (AIA)
  Telescopes to measure specific wavelengths, allowing calculation of magnetic fields and seismic activity (HMI)
  Wide-band UV sensor to measure the total UV spectrum (EVE)
Some numbers about SDO
  A massive increase in data quantity and precision - 1000 to 10000 times as much data as current satellites
  Flies at 38 000 km, in geosynchronous orbit
  AIA - images at 10 wavelengths from visible to 131Å, one image every 1.25s
  HMI - one magnetic image every 45s
  EVE - irradiance time series from 10 to 1050Å
  Images are 4Kx4K - 32MB per image
  A lot of data - more than 1TB/day
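The image and volume figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes "32MB" means 32 MiB and "1TB" means 10^12 bytes, and the function names are made up for this example. It shows that a 4Kx4K image of that size works out to 2 bytes per pixel, and converts a daily volume into the sustained network rate it implies.

```python
SECONDS_PER_DAY = 86_400


def bytes_per_pixel(image_bytes: int, width: int, height: int) -> float:
    """Average storage per pixel for an image of the given size."""
    return image_bytes / (width * height)


def images_per_day(cadence_s: float) -> float:
    """Number of images produced per day at one image every cadence_s seconds."""
    return SECONDS_PER_DAY / cadence_s


def sustained_mbps(bytes_per_day: float) -> float:
    """Constant rate in Mb/s (10^6 bits per second) needed to move a daily volume."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    # Assumption: 32MB is read as 32 MiB, 4K as 4096 pixels.
    print(bytes_per_pixel(32 * 2**20, 4096, 4096))  # bytes per pixel
    print(images_per_day(1.25))                     # AIA images per day
    # Assumption: 1TB is read as 10^12 bytes.
    print(round(sustained_mbps(1e12), 1))           # Mb/s for 1TB/day
```

Read this way, 1TB/day corresponds to a little under 100Mb/s around the clock, so "more than 1TB/day" quickly becomes a sustained load of hundreds of Mb/s.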
The challenges of SDO
  Huge bandwidth
  Lots of data to be made available
  Too much data for humans to absorb
The solution
  A worldwide network of data stores holding the current quarter and popular data
  Joined by a high-speed network
  Pushing a full copy of the data to as wide an area as possible in compact form
  Software system (netDRMS with internal PostgreSQL) at each data store provides virtual storage for file requests from users
    Transparent access to any data, if needed going down to the original source data
    Local users have the impression they have file access
  Web-based mediation interface for remote use
  Automatic processing by high-performance computing
What's down here
  Ground station - main station at White Sands
  Mission Operations Center at Goddard
  Joint Science Operations Centers (JSOC) - Stanford and Colorado
  Knowledge base - Lockheed; Virtual Observatory - Harvard
  Storage at White Sands, JSOC and Data Centres
  Compute clusters and data servers at Data Centres
  A network of Data Centres...
The Data Centres
What this enables
  Many groups working in parallel on the unprecedented flow of data
  Simultaneous access and processing of bulk data in many high-performance systems
  Online access for forecasters to complete data to refine their techniques
  Completely open and low-cost access to all data, both for researchers with specific interests and for researchers with limited budgets
Does it work?
  Yes it does - e.g. two weeks at 320Mb/s from Harvard
Network requirements
  Throughput
    One set of data takes around 200Mb/s
    Requires 320Mb/s to handle catch-ups
    Practical limit is network chain topology
  Availability
    More than five years, probably ten years, of operation
    24/7, 365 days a year
    Must maintain full performance for the backbone data system even with subsystem failures
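Why the link needs headroom above the 200Mb/s data rate can be seen from a simple backlog calculation. This is a minimal sketch, assuming the data keeps arriving at a constant rate during and after an outage and that the link then runs at its full rate; the function name and the six-hour outage are hypothetical.

```python
def catch_up_hours(outage_hours: float,
                   data_mbps: float = 200.0,
                   link_mbps: float = 320.0) -> float:
    """Hours needed to clear the backlog built up during an outage.

    The backlog grows at data_mbps while the link is down and shrinks at
    (link_mbps - data_mbps) once it is back, because new data keeps arriving.
    """
    spare_mbps = link_mbps - data_mbps
    if spare_mbps <= 0:
        raise ValueError("link rate must exceed data rate, or the backlog never clears")
    return outage_hours * data_mbps / spare_mbps


if __name__ == "__main__":
    # Hypothetical example: a six-hour outage with the slide's 200 and 320 Mb/s.
    print(catch_up_hours(6.0))
```

Under these assumptions every hour of outage costs about 1.7 hours of catching up at 320Mb/s; with no headroom at all the backlog would never clear.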
In practice - Bandwidth-Delay product
  There are simply a lot of bytes in the cable - this is the Bandwidth-Delay (BD) product
  The problem with TCP is that the buffer size must be >= 2 * bandwidth * delay, and the actual size is adaptive
  For example, 200Mb/s and 0.1s → 5MB (200Mb/s x 0.1s = 2.5MB in flight, doubled), and you need about twice that for adaptation
  Standard Linux buffer size is 64K!
  Plus you can run into congestion control limits - designed to share traffic fairly!
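The buffer arithmetic on this slide can be written out as a short calculation. The sketch below applies the slide's rule of thumb (buffer >= 2 * bandwidth * delay) literally, and also inverts it to show what a fixed 64K buffer can carry. The function names are illustrative, and "64K" is assumed to mean 65536 bytes.

```python
def required_buffer_bytes(bandwidth_mbps: float, delay_s: float) -> float:
    """Buffer size from the slide's rule: 2 * bandwidth * delay, in bytes."""
    bytes_per_second = bandwidth_mbps * 1e6 / 8
    return 2 * bytes_per_second * delay_s


def max_rate_mbps(buffer_bytes: float, delay_s: float) -> float:
    """Inverse of the same rule: the rate in Mb/s a fixed buffer can sustain."""
    return buffer_bytes / (2 * delay_s) * 8 / 1e6


if __name__ == "__main__":
    need = required_buffer_bytes(200, 0.1)
    print(f"buffer needed: {need / 1e6:.1f} MB")
    # Assumption: the 64K default means 65536 bytes.
    print(f"64K buffer sustains about {max_rate_mbps(65536, 0.1):.2f} Mb/s")
```

With the slide's numbers the required buffer is 5MB, while a single connection limited to a 64K buffer would manage only a few Mb/s over the same path.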
In practice - BD product
  Fixes …
    Use an improved scp (HPN-scp)
    Use multiple sessions
    Use another protocol
    Use a tool which combines these (e.g. GridFTP)
  We use multiple sessions in user space
    Raw bandwidth tests used many more than needed
    Production system has a tool which interfaces with the data system
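A rough way to see why multiple sessions help is to count how many buffer-limited connections add up to the required total. This is an idealised sketch, not a description of the production tool: it assumes each session is limited only by its own buffer, uses the 2 * bandwidth * delay rule from the previous slide, and ignores congestion control and packet loss. The function name is hypothetical.

```python
import math


def sessions_needed(target_mbps: float, delay_s: float, buffer_bytes: int) -> int:
    """Number of parallel sessions whose buffers together cover the target rate.

    Total buffering required is 2 * bandwidth * delay (in bytes); each session
    contributes buffer_bytes towards it.
    """
    total_bytes = 2 * (target_mbps * 1e6 / 8) * delay_s
    return max(1, math.ceil(total_bytes / buffer_bytes))


if __name__ == "__main__":
    # 200Mb/s over a 0.1s path with 64K (assumed 65536-byte) buffers per session.
    print(sessions_needed(200, 0.1, 65536))
    # The same path with a hypothetical 4 MiB buffer per session.
    print(sessions_needed(200, 0.1, 4 * 2**20))
```

The count falls quickly as the per-session buffer grows, which is why buffer tuning and parallel sessions are usually considered together.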
In practice - routing
  Check it - you might be surprised
  Different networks have quite legitimately different behaviour
  What didn't get noticed with e-mail and web pages can still be a problem
    The odd few minutes' delay for e-mail doesn't get noticed
    Low speed at 2am probably won't get noticed
In practice - reliability
  Can't use terms like guarantee - things will go wrong
  Can't be qualitative - this is way beyond normal hardware reliability
    You still need quality, duplication, spares and conservative ratings
  Must get quantitative - failure analysis and point of failure identification
    Time to repair (night shifts!) is critical
  Must be able to detect failures
    A single failure will not show up in system performance
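One common way to "get quantitative" is the textbook steady-state availability model, sketched below. It is not taken from the presentation: it assumes independent failures, and the MTBF and repair-time figures are made up purely for illustration. It shows how time to repair and duplication each change the expected downtime.

```python
HOURS_PER_YEAR = 8760


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant(availabilities: list[float]) -> float:
    """Availability of parallel components where any one suffices.

    Assumes failures are independent, which real systems only approximate.
    """
    unavailable = 1.0
    for a in availabilities:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


def downtime_hours_per_year(a: float) -> float:
    """Expected hours of outage per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical component: fails every 2000 hours on average.
    fast = availability(2000, 4)    # repaired within 4 hours
    slow = availability(2000, 48)   # repaired after a weekend
    for label, a in (("4h repair", fast), ("48h repair", slow)):
        single = downtime_hours_per_year(a)
        pair = downtime_hours_per_year(redundant([a, a]))
        print(f"{label}: single {single:.1f} h/yr, duplicated {pair:.3f} h/yr")
```

The duplicated figure only holds if a failed component is noticed and repaired; an undetected failure silently turns the pair back into a single point of failure.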
In practice - reliability
  This is how the Belnet connections at ROB deliver reliability
  Single failures do not affect data flow, regardless of which HA node is active
  You must check that no failure has occurred
In practice - last mile
  It's here you will have the most problems
  Both ends will need work
    Firewalls
    Routers
    Just where is the cable, really?
    A server is not quite as good as the manufacturer said
  Again, situations which might have gone unnoticed will make themselves known
  But you are right there to fix them...
Where it's at
  The data network is running
  The data transfer system is being tested
  The system is being documented
  So ... it's looking good
Further reading
  SDO at the ROB - http://wissdom.oma.be
  Belnet - http://www.belnet.be
  SDO - http://sdo.gsfc.nasa.gov
  ESnet Network Performance Knowledge Base - http://fasterdata.es.net
  High Performance Enabled SSH/SCP - http://www.psc.edu/networking/projects/hpn-ssh
Thanks for your interest and success with your projects