Big-Data around the world


1 Big-Data around the world
Data Transfer Nodes tool design. Enzo Capone, Head of Research Engagement and Support. Trondheim, 11 June 2018.

2 Science is based on data
Science is based on data. Once data has been produced, it will most likely need to be shared in order to be used to best effect. That means it has to be moved from the originating facility or location to every place where it needs to be available.

3 Data movement is at the basis of modern, collaborative science.
And today, ~80% of all scientific data transfer uses TCP/IP, which is itself optimised for a general-purpose, multi-user, bandwidth-sharing usage model.

4 Unfortunately, campus infrastructures are not optimised for the large and possibly long-lasting flows that this data movement causes. Those flows have to compete with general user traffic for the existing resources, under a fair-share regime.
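To put a rough number on the fair-share problem, here is a minimal sketch. It assumes an idealised equal split of the link between one science flow and a number of competing flows, which real TCP only approximates. The 10 Gbit/s link, the 10 TB dataset and the flow counts are illustrative assumptions, not figures from the talk.

```python
def fair_share_gbps(link_gbps: float, competing_flows: int) -> float:
    """Idealised equal split of a link between one science flow and its competitors."""
    return link_gbps / (competing_flows + 1)


def transfer_hours(dataset_tb: float, rate_gbps: float) -> float:
    """Hours needed to move dataset_tb terabytes (1 TB = 8000 Gbit) at rate_gbps."""
    gigabits = dataset_tb * 8000
    return gigabits / rate_gbps / 3600


if __name__ == "__main__":
    # Assumed scenario: a 10 TB dataset on a 10 Gbit/s campus uplink.
    for flows in (0, 9, 99):
        rate = fair_share_gbps(10.0, flows)
        hours = transfer_hours(10.0, rate)
        print(f"{flows:3d} competing flows: {rate:6.2f} Gbit/s, {hours:7.1f} h for 10 TB")
```

Even under this simplified model, a transfer that would take a couple of hours on an empty link stretches to roughly a day once it shares the link with nine other flows.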

5 The way out
Some solutions: ScienceDMZ; dedicated Data Transfer Nodes. Tuning, testing and validation is needed… The scientific community has found some solutions, such as ScienceDMZ and optimised Data Transfer Nodes. But once these tools are deployed, or even to help with their deployment, tuning, testing and validation are needed for them to perform as intended.

6 How GÉANT DTNs can help
Dedicated hardware. Highly-optimized software and configuration. Optimal network topology/location. No firewalling, only simple ACLs. Bottleneck-free.
We then came up with the idea of creating a specialised tool for this purpose. We have deployed servers on dedicated hardware, meaning they are devoted solely to the tests, which are therefore not affected by other tasks. They are equipped with very fast NVMe disks and a highly-optimized operating system, software and configuration. They are connected directly to the core of our backbone, with no firewalling to limit performance, which makes them completely bottleneck-free.

7 Network topology Trunks (multi VLANs)
The servers are deployed in our PoPs in London and Paris, directly connected to our core routers through 100G and 10G network ports. The ports are configured to carry multiple VLANs if needed, and can also be connected to the LHCONE network, so that the sites connected to it can run tests from the inside.
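One common ingredient of host tuning for ports like these is sizing TCP buffers to the bandwidth-delay product (BDP) of the path. The sketch below only does the arithmetic. The round-trip times are assumed values for illustration and are not measurements of the GÉANT servers, and the talk does not say which settings were actually applied.

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    bits_in_flight = bandwidth_gbps * 1e9 * rtt_ms / 1e3
    return round(bits_in_flight / 8)


if __name__ == "__main__":
    # Port speeds come from the slide; the RTTs are hypothetical examples.
    for gbps in (10, 100):
        for rtt in (10, 50, 150):
            mib = bdp_bytes(gbps, rtt) / 2**20
            print(f"{gbps:3d} Gbit/s, RTT {rtt:3d} ms -> about {mib:8.1f} MiB in flight")
```

A single TCP flow whose buffers are smaller than this value cannot fill the path, however fast the port is.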

8 perfSONAR or GÉANT DTNs? No need to choose…
Aspect | perfSONAR | GÉANT DTNs
Typical user | Network/system manager | Site manager/user
Measurement | Network low-level (latency, jitter, bandwidth, etc.) | Data transfer application throughput (disk-to-disk, full stack)
Hardware | Dedicated | Existing storage system
Software | Specialised toolkit | Users’ own software and applications
Scope | Your network (LAN/WAN) | Your storage system
Some will probably think: “this looks a lot like perfSONAR…”. If you look at their characteristics side by side, you’ll see that this is not an alternative to perfSONAR, whose scope and intended usage are different. It is rather a complementary tool for a first-aid approach to network issues. perfSONAR provides low-level measurements aimed primarily at network managers, using a dedicated server and specialised software. The tool we have in mind provides a higher-level, disk-to-disk throughput analysis, using the full stack of your existing storage system, your own applications, and your data-moving tools and protocols. But the main difference is that the scope of perfSONAR is your network infrastructure (LAN/WAN), while our tool targets your storage system as a whole (including the LAN/WAN components).
This can come in handy for:
Basic network troubleshooting
Network validation
Measuring the achievable throughput of your storage system (not only of your network!)
Testing the whole software/application/storage stack
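To make "disk-to-disk throughput" concrete, here is a minimal sketch that times a file copy and reports the observed rate. It is only a local stand-in: both files sit on the same machine and the operating system cache may flatter the result. A real test against the DTNs would run the site's own transfer tool across the network. The function name and file size are hypothetical.

```python
import os
import shutil
import tempfile
import time


def disk_to_disk_mbps(size_mb: int = 64) -> float:
    """Copy a temporary file of size_mb MiB and return the observed rate in Mbit/s."""
    size_bytes = size_mb * 1024 * 1024
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "source.bin")
        dst = os.path.join(workdir, "copy.bin")
        # Build the source file from one random 1 MiB block repeated size_mb times.
        with open(src, "wb") as handle:
            handle.write(os.urandom(1024 * 1024) * size_mb)
        start = time.perf_counter()
        shutil.copyfile(src, dst)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        if os.path.getsize(dst) != size_bytes:
            raise RuntimeError("copy is incomplete")
        return size_bytes * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"observed copy rate: {disk_to_disk_mbps():.0f} Mbit/s")
```

The point of measuring this way is that the figure includes the read, the transfer and the write, so a slow disk or application shows up even when the network is healthy.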

9 Use case example (diagram: NREN A, NREN B)
A simple use case scenario sees two DTNs at two sites experiencing performance issues during their transfers. With our tool, the user can split the path in two and test each location individually against our DTNs. It is then possible to focus the troubleshooting on the location whose behaviour is consistent between the two situations, that is, the one that still performs poorly when tested against our DTNs. This makes the case for a subsequent, deeper perfSONAR-based analysis, for example.
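The split-path comparison can be sketched as a small decision helper. Everything here is hypothetical: the function name, the site labels, the sample throughput figures and the 80% tolerance are assumptions chosen for illustration, not part of the GÉANT tool.

```python
def suspect_sites(to_reference_gbps: dict[str, float],
                  expected_gbps: float,
                  tolerance: float = 0.8) -> list[str]:
    """Return the sites whose transfer to the reference DTN stays below
    tolerance * expected_gbps, i.e. the sites worth troubleshooting first."""
    threshold = tolerance * expected_gbps
    return sorted(site for site, rate in to_reference_gbps.items() if rate < threshold)


if __name__ == "__main__":
    # Assumed results of testing each site against the reference DTN (Gbit/s).
    half_path_results = {"site_in_nren_a": 9.1, "site_in_nren_b": 2.3}
    print(suspect_sites(half_path_results, expected_gbps=10.0))
```

If both sites pass, the problem more likely lies in the path between them. If both fail, each site needs its own investigation.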

10 Help us to help you!
We ask you to be part of this: participate in the testing, suggest new tools, provide feedback. We can install any software that matches your data movement tool of choice, so that it works with your existing data infrastructure.

11 Let’s build the service together!
Contact. We want to build this service with our users, for our users. So please contact us, and help us shape this tool in the way that is most useful for all of you.

