TT-GISC – Brasilia – October 2015 Rémy Giraud (Météo-France)


1 TT-GISC – Brasilia – 13-16 October 2015 Rémy Giraud (Météo-France)
The 24h cache in the cloud

2 A short history
- For GISC-to-GISC communication, and in particular for the exchange of « GlobalExchange » data, the current architecture is based on an any-to-any, unicast-based solution.
- The solution is:
  - Politically very challenging (some bilateral links are virtually impossible to set up)
  - Technically difficult to establish, to monitor and to maintain
  - Financially unattractive, as the same data is sent multiple times on an expensive network
- In 2014, at TT-GISC and ET-CTS, a solution using a cloud-based approach was presented.
- It was then agreed to run a pilot in 2015 to assess whether this option was viable.
- Suitability of this solution will be established based on:
  - The technical outcome of the pilot
  - The result of a questionnaire (it appears that some organisations forbid the storage of their data in the cloud)
  - The financial and contractual aspects
  - The agreement of the GISCs to proceed with such a solution

3 The should and the may
- What we should have now…
- What we may have tomorrow…

4 The pilot
- The pilot is managed by ET-CTS (R. Giraud) with the support of H. Kiehl (DWD).
- From a technical point of view, we are using AFD (Automatic File Distributor). The dataflow is presented in the next slide.
- As it is a test, it only uses the Internet.
- The pilot is limited to data exchange; metadata is not included.
- The timeline of the pilot:
  - 2015
    - May: ET-CTS conference call to present and discuss the plan. Based on the ET-CTS outcome, communication to TT-GISC for volunteers. ECMWF, as part of its own evaluation of a cloud-based approach for its dissemination, has agreed to « loan » two VMs for up to a year starting in May 2015.
    - June: installation of the system and configuration of AFD.
    - July: interested GISCs will be invited to join the pilot from July.
    - September: each GISC will be able to join on a piecemeal approach. All GISCs in the pilot will be on board. Issue the questionnaire. We will then run a three-month evaluation.
  - 2016
    - January: we will prepare the report for the ET-CTS 2016 session.

5 The potential dataflow
The file system of the VM in the cloud:
/data
  /GISC A
    /incoming
    /outgoing
    /24h cache
  /GISC B
    /incoming
    /outgoing
    /24h cache
  /GISC I
    /incoming
    /outgoing
    /24h cache
- GISC A uploads files to its incoming directory; GISC B uploads files to its incoming directory.
- AFD copies the files to the other GISCs' outgoing directories and to the cache.
- AFD pushes the files to the other GISCs.
- The files in /incoming are deleted after processing.
- The files in /outgoing are available if a GISC wants to download data again.
- The files in the cache are kept as a reference of the 24h cache.
- CRON clears files older than 24h.
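The dataflow on this slide can be illustrated with a short Python sketch. It is not how AFD itself is configured or implemented; it only mimics the two steps described (fan-out from one GISC's incoming directory, then a periodic purge of files older than 24 hours). The directory names `incoming`, `outgoing` and `cache`, the function names, and the choice to store the reference copy in the sender's own cache directory are assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

CACHE_SECONDS = 24 * 3600  # the 24h retention window mentioned on the slide


def distribute(root: Path, sender: str, gisc_names: List[str]) -> int:
    """Fan out every file in <root>/<sender>/incoming.

    Each file is copied to the outgoing directory of every other GISC and
    to the sender's cache directory (an assumed layout), then removed from
    incoming. Returns the number of files processed.
    """
    incoming = root / sender / "incoming"
    cache = root / sender / "cache"
    cache.mkdir(parents=True, exist_ok=True)
    processed = 0
    for path in sorted(incoming.iterdir()):
        if not path.is_file():
            continue
        for other in gisc_names:
            if other == sender:
                continue  # a GISC does not receive its own files back
            outgoing = root / other / "outgoing"
            outgoing.mkdir(parents=True, exist_ok=True)
            # shutil.copy gives the copy a fresh modification time,
            # so the 24h window starts when the file arrives.
            shutil.copy(path, outgoing / path.name)
        shutil.copy(path, cache / path.name)
        path.unlink()  # files in incoming are deleted after processing
        processed += 1
    return processed


def purge_older_than(directory: Path,
                     max_age: float = CACHE_SECONDS,
                     now: Optional[float] = None) -> int:
    """Delete files whose modification time is older than max_age seconds.

    This plays the role of the cron job on the slide. Returns the number
    of files removed.
    """
    now = time.time() if now is None else now
    removed = 0
    for path in directory.iterdir():
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed += 1
    return removed
```

In this sketch `distribute` would run whenever new files arrive, and `purge_older_than` would be scheduled periodically on each outgoing and cache directory.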

6 The status on October 12th 2015
- Thirteen GISCs are pushing data to the cloud server (Moscow, Brasilia, Offenbach, Beijing, Toulouse, Exeter, Melbourne, Tokyo, Seoul, Jeddah*, Tehran*, Washington*, Pretoria*)
- One GISC has confirmed its willingness to join the pilot (Delhi)
- One GISC hasn’t made its decision clear (Casablanca)
- The protocol used is either FTP or SFTP
- Some are sending GFNC files (files like A_*.txt) and others are sending CCCC files. In all cases, bulletins are stored as individual files in the cache
- We have started experimenting with grouping files in .tar.bz2 archives (tests have shown that this should halve the overall size to be transferred and divide the number of files by at least three)
- We plan to include a trial of AMQP later in the process
(*) These GISCs are sending their data via a 3rd party (Offenbach and Exeter)
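As an illustration of the .tar.bz2 grouping experiment, here is a minimal Python sketch using the standard `tarfile` module. The function names `bundle` and `unbundle` are hypothetical, and the pilot's actual batching rules (how many files per archive, how often) are not given in the slides, so none are assumed here.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle(files: List[Path], archive: Path) -> Path:
    """Group individual bulletin files into one bzip2-compressed tar archive."""
    with tarfile.open(archive, "w:bz2") as tar:
        for path in files:
            # Store only the file name, not the local directory structure.
            tar.add(path, arcname=path.name)
    return archive


def unbundle(archive: Path, destination: Path) -> List[str]:
    """Unpack an archive back into individual files in destination.

    Only regular files are written, and any directory part of a member
    name is dropped, so an archive cannot write outside destination.
    """
    destination.mkdir(parents=True, exist_ok=True)
    names = []
    with tarfile.open(archive, "r:bz2") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = Path(member.name).name
            source = tar.extractfile(member)
            (destination / name).write_bytes(source.read())
            names.append(name)
    return names
```

A receiving side that wants to keep bulletins as individual files in the cache would call `unbundle` on arrival; the size reduction actually achieved depends on the data and is not something this sketch measures.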

7 Some questions and answers so far
Questions from the participants:
- What file names should we use?
  - Att. II 15 describes two methods: CCCCNNNNNNNN.[a,b,ua,ub] and the Global File Naming Convention
  - Answer: both are possible
  - Question 1: is it the job of the distribution software to be “WMO aware”?
  - Question 2: should it just receive/push files without knowing their content?
- What happens with duplicates?
  - Answer: nothing. Everything received is resent.
- What should I send? All GTS data? A subset?
  - Remark 1: The “24h cache” definition is apparently still unclear
  - Remark 2: MSS are used to push/receive data; MSS don’t know WIS and don’t look at metadata to know what they should do with the files. So defining and handling the content of the cache is a challenge
  - Remark 3: MSS and WIS software should have an interface to facilitate this definition, e.g. WIS software could report once a day which TTAAii are for GlobalExchange
- Should the « distribution » software be “WMO aware”?
  - If a CCCCNNNNNNNN.a file is received, what should happen?
  - Answer 1: Nothing, it is just a file; the cloud doesn’t care about its content
  - Answer 2: If I want to keep a clean “24h cache” in the cloud, the cloud needs to unpack it
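To make the “WMO aware” question concrete, here is a small Python sketch of the minimum a server would need in order to tell the two naming methods apart. The regular expressions are simplifying assumptions based only on the patterns quoted in these slides (CCCCNNNNNNNN.[a,b,ua,ub] and files like A_*.txt); they are not a full implementation of the naming rules, and the function name `classify` is hypothetical.

```python
import re

# Assumed pattern: four letters (CCCC), eight digits (NNNNNNNN) and one
# of the extensions listed on the slide.
CCCC_FILE = re.compile(r"^[A-Z]{4}\d{8}\.(a|b|ua|ub)$")

# Assumed pattern: the slide only says GFNC files look like A_*.txt.
GFNC_FILE = re.compile(r"^A_.+\.txt$")


def classify(filename: str) -> str:
    """Return 'cccc', 'gfnc' or 'unknown' for a received file name."""
    if CCCC_FILE.match(filename):
        return "cccc"
    if GFNC_FILE.match(filename):
        return "gfnc"
    return "unknown"
```

Under Answer 1 the server would never call such a function and would forward every file untouched; under Answer 2 a `cccc` result would trigger unpacking of the file before its bulletins are stored in the cache.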

8 The VDC by Interoute
- ECMWF is paying for the cloud servers for one year.
- Two virtual servers are available (one in Paris, one in Berlin). So far, only the server in Paris is used.
- A basic web interface is provided to configure the VMs and the storage, but it does the job
- As part of the configuration:
  - Firewall: very limited. No support for “established” TCP connections, so to allow FTP a large range of ports must be opened
  - NAT
  - Load balancing (not used)
- Very good online support
- Network access performance is very good. In theory, unlimited 10 Gb/s access to the Internet (and if needed to the RMDCN) for free!
- For the time being, this solution is very cost-effective (the cost per VM is approximately 4k€ per year) and proves to be an easy way to “multicast” from any GISC to all others (while using a “unicast” protocol)

9 Still to do… as part of the test
- Complete the connection with all GISCs, either directly or using a relay
  - Delhi and Casablanca (either directly or via a 3rd party)
  - Aim at having for the first time (?) the full 24h cache available!
- Improve configuration
  - Handle urgent messages (e.g. creating a separate directory for that purpose)
- Test new features
  - AMQP
- Run the test over a few months to assess the reliability of the solution
- Assess performance and gather statistics

10 Still to do… after the test
- Prepare questionnaire for all WIS centers (to be issued before the end of 2015)
- Gather and consider responses to the questionnaire
- Prepare reports for ET-CTS, TT-GISC, CBS in 2016

11 Some ideas for the questionnaire (1)
For the GISCs:
- Assess support for this solution
- Agree on a method to procure the VMs
  - Do we need two independent providers for redundancy purposes (one being Interoute for the RMDCN access and another cloud provider on the Internet)?
  - Use of the RMDCN contract? Do we need another contract? Which one?
- Agree on the features required of the software on the cache server:
  - How much “WMO awareness” do we need?
    - None: the server in the cloud will just receive and send files, unaware of their content
    - Some: Extract CCCC files and push bulletins using GFNC, create CCCC files? create .tar.bz2 files?
    - Full: Handle deduplication? Filter data not intended for “Global Exchange”
  - Depending on the requirements, select the appropriate software
- How to manage the VMs and the software used?
  - For the “network” RMDCN, ECMWF is the technical and administrative interface. Interoute is in charge of the configuration and monitoring
  - For the “cloud”, we need a similar design. Interoute may provide the VMs but won’t take care of the software. Is ECMWF ready to extend its role to the “cloud” function? Who else could do that?
  - Any volunteer among the GISCs (or DCPC/NC)?
- Interest in managing the servers/software on behalf of the community. For free? For a fee? For a limited period of time?
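For the “Full” level of WMO awareness, deduplication is one of the features raised. The slides do not say how it would be done; the Python sketch below shows one possible approach, which is to remember a SHA-256 digest of each payload for 24 hours and drop exact repeats. The class name `DuplicateFilter`, its methods, and the choice to compare whole payloads byte for byte are all assumptions for illustration.

```python
import hashlib
from typing import Dict

CACHE_SECONDS = 24 * 3600  # same 24h window as the cache


class DuplicateFilter:
    """Remember payload digests for a time window and flag exact repeats."""

    def __init__(self, window_seconds: float = CACHE_SECONDS) -> None:
        self.window = window_seconds
        self._seen: Dict[str, float] = {}  # digest -> time first seen

    def is_new(self, payload: bytes, now: float) -> bool:
        """Return True if payload was not seen within the window."""
        # Forget digests that have fallen out of the window.
        self._seen = {d: t for d, t in self._seen.items()
                      if now - t <= self.window}
        digest = hashlib.sha256(payload).hexdigest()
        if digest in self._seen:
            return False
        self._seen[digest] = now
        return True
```

A server using this sketch would forward a file only when `is_new` returns True. Payloads that differ by a single byte are treated as distinct, so this is only a lower bound on what “Full” awareness might require.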

12 Some ideas for the questionnaire (2)
For the DCPCs/NCs:
- Assess support for this solution
  - In particular: is storing your data in the cloud compliant with your policy?
- Interest in managing the servers/software on behalf of the community. For free? For a fee? For a limited period of time?

13 The thank you slide
- All the GISCs part of the pilot
  - Email sent to the GISC list on 11th August
  - In two weeks, answers from 12 of them
  - Two months later: 13 are up and running (at least partially)
  - Exchange of emails with the 2 remaining
  - Are we going to see the “real” 24h cache?
- ECMWF for supporting this pilot by offering the VMs for one year
- Holger Kiehl (DWD) is a tremendous support:
  - Contact for GISC Offenbach
  - Configuration and hardening of the VMs
  - Configuration of AFD
  - Improving AFD

14 Recommended text
TT-GISC:
- Recognizes that the “cache in the cloud” is a very promising solution to obtain a shared and uniform cache between all the GISCs
- Welcomes the participation of 13 GISCs and urges the two remaining to join the pilot, either directly or via a third party
- Tasks ET-CTS to prepare the questionnaire to assess the possibility of using this solution operationally

15 Questions ?

