Published by Godfrey Bradford. Modified over 9 years ago.
1
FTP versus HTTPS in EOSDIS Data Access
WGISS 40 – September 30, 2015
Andrew Mitchell
2
Agenda
User Registration System – URS
–Earthdata Login
Requiring Registration for Data Access at EOSDIS
–FTP/HTTP Comparison
URS Guidance and Policy
FTP retirement at Data Centers
–Lessons Learned
Backup: File Transfer Protocol (FTP/HTTP)
–Engineering Perspective
–Performance Study
3
NASA USER REGISTRATION – EARTHDATA LOGIN
4
Earthdata Login
5
Capturing User’s Area of Interest
6
Study Areas & Application Domains

NASA - Primary study area*
–Air sea interaction
–Atmospheric aerosols
–Biological Oceanography
–Clouds
–Cryospheric studies
–Geophysics
–Global biosphere
–Human dimensions of global change
–Hydrologic cycle
–Land processes
–Physical Oceanography
–Polar processes
–Radiation budget
–Sea ice
–Tropospheric chemistry
–Upper atmospheric composition
–Upper atmospheric dynamics
–Other

ESA - Primary Application Domain*
–Atmosphere
–Sea-Ice
–Geodesy
–Geology
–Hazards
–Hydrology
–Ice
–Land Environment
–Methods
–Oceanography
–Renewable Resources
–Topographic Mapping
–Other
–Calibration/Validation
–Coastal Zones
7
Federated User Identity Study
Performing a study of other (non-OAuth2) Single Sign-On technologies that will allow Earthdata Login to become interoperable with user registration systems from other systems and agencies.
8
Architecture
–LDAP store
–LDAP proxy (via LDAP store)
–HTTP-accessible RESTish API
–FTP clients
–HTTP clients
–Web-based user maintenance
9
REQUIRING REGISTRATION FOR DATA ACCESS AT EOSDIS
FTP and HTTP comparison
10
Impact of requiring authentication with FTP at DAACs

Advantages
–Minimal impact to existing users
–Minimal impact to data centers
–No changes to firewall rules or similar configuration
–*Direct support for anonymous access
–Maturity of capability / protocol

Disadvantages
–Multiple flavors deployed at the data centers (5 different FTP servers)
–No direct support for LDAP authentication on some of the flavors
–Not authenticated securely: some flavors are unable to support secure authentication
–Prohibited at LP DAAC due to DoI regulations
–Does not integrate well with REST API for support of OpenID or OGC
11
Impact of requiring authentication with HTTP at DAACs

Advantages
–Comprehensive support from the user community: protocol is well established and mature, all data centers use the same HTTP server (Apache)
–Modules can be applied to support many extensions and metrics gathering unavailable to certain ftpds
–Easily accommodates a REST API and provides well established LDAP modules for simple configuration and integration
–Permitted as a transfer protocol by the DoI
–Supports a secure authentication mechanism (HTTPS)

Disadvantages
–End user scripts will have to change, as will manual access to the files they access
–Data center configurations will have to change (on the firewall and the Apache server)
–DAACs custom code will have to change
–Data Center customizations and extensions will need to be modified
12
URS GUIDANCE & POLICY
13
Guidance for EOSDIS DAACs, Subsystems And Applications
Purpose: To provide guidance and clarify the integration requirements for the URS into EOSDIS systems and components.
Scope: This guidance applies to all EOSDIS DAACs, subsystems (ECHO, GCMD, Earthdata, GIBS, etc.) and related EOSDIS services and applications (including Reverb, ASTER GDEM Explorer, ASF Vertex, etc.).
Guidance: URS will be implemented by DAACs, subsystems and related services for the following capabilities:
–Downloading science data files from HTTP, HTTPS and FTP services.
–Web services and tools allowing access to science data files (e.g. OPeNDAP, Web Coverage Services, analysis tools, DAAC-unique ordering tools).
–Online collaboration and comment tools (e.g. Wikis, Forums, Code Repositories).
–Other tools and services that currently have optional or required user registration.
Registration is NOT required for:
–Read-access to Web pages and documentation.
–Data discovery services such as Reverb, Earthdata Search Client (EDSC), Global Change Master Directory keyword services, CMR and DAAC-unique search clients.
Note: This portion of the policy applies up until the point where science data downloads are performed, or write operations such as saving search parameters or inputting or updating metadata records are performed.
14
Evolution and Transition Planning
URS is available and this guidance will go into immediate effect.
–A staggered approach will be used to implement URS throughout DAACs, subsystems and applications.
–Schedules and transition plans for implementation will be negotiated between affected systems and ESDIS.
Milestones and Timeline
–In 2015, HTTPS Access with URS 4 (SSO) must be available for all current equivalent FTP/HTTP Access.
–DAACs, subsystems and applications are allowed to run HTTPS access and FTP/HTTP* access in parallel.
15
FTP RETIREMENT AT DATA CENTERS
Lessons Learned
16
Near Real Time Data Access (LANCE)
HTTPS File Distribution Requirements for LANCE
LANCE Elements shall integrate with the URS and restrict access to NRT data to users with valid URS accounts.
URL structure should be decided by the data providers.
From a user's perspective, it should be possible to get all the files simply by using curl or wget,
–e.g.: wget -r https://foo.nasa.gov/data/OMI/OMTO3/2007/05/11
–which would download all the OMTO3 data files and the Manifest for the date 2007/05/11.
–To get the entire month use: wget -r https://foo.nasa.gov/data/OMI/OMTO3/2007/05
–To get the entire year I could use: wget -r -nd https://foo.nasa.gov/data/OMI/OMTO3/2007
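The date-based URL layout above lends itself to scripting. The following is a minimal Python sketch of how a user script might build the day, month and year URLs and the matching wget argument lists. The host foo.nasa.gov is the placeholder from the slide; the function names and the BASE_URL constant are assumptions made for illustration, not part of any LANCE interface.

```python
from datetime import date

# Placeholder host copied from the slide; a real LANCE element would differ.
BASE_URL = "https://foo.nasa.gov/data"


def lance_url(instrument, product, day, granularity="day"):
    """Build a directory URL in the <instrument>/<product>/YYYY/MM/DD layout."""
    if granularity not in ("year", "month", "day"):
        raise ValueError("granularity must be 'year', 'month' or 'day'")
    parts = [BASE_URL, instrument, product, f"{day.year:04d}"]
    if granularity in ("month", "day"):
        parts.append(f"{day.month:02d}")
    if granularity == "day":
        parts.append(f"{day.day:02d}")
    return "/".join(parts)


def wget_command(url, flatten=False):
    """Return the recursive wget invocation as an argument list."""
    command = ["wget", "-r"]
    if flatten:
        # -nd tells wget not to recreate the remote directory tree locally.
        command.append("-nd")
    return command + [url]
```

The argument list can be passed to subprocess.run once credentials are configured; authentication against URS is deliberately left out of this sketch.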
18
LP DAAC migration to HTTP
The LP DAAC switched from FTP to HTTP for data access on June 4, 2013.
This change was advertised on the LP DAAC Web site as a News item. For users who do not regularly visit our page, we encourage them to consider subscribing to the RSS News Feed (https://lpdaac.usgs.gov/news_feed) so as not to miss out on future announcements.
The News Item for the FTP to HTTP switch is available at (https://lpdaac.usgs.gov/lp_daac_discontinue_anonymous_ftp_june_4_2013).
Note: The cURL command handles HTTP and has been used by some to update their scripted access to Data Pool.
LP DAAC provides a good model for HTTPS data distribution: https://lpdaac.usgs.gov/data_access/data_pool
19
User Feedback
“I think that the data should be delivered by a ftp server, because in my case, here in PARAGUAY the internet signal is not stable. During downloads, my connection was interrupted many times forcing me to restart the request process and download it again.”
“We used to receive order by email as ftp, currently it is only http, which is taking more time in downloading, can we go back to ftp option ?”
“The problem I have with the http protocol is I don't know how to automate my wget script to get new data. With ftp I can use a wildcard at the end of the full file path. With the current naming of the .hdf files, MYD11C1.A2013153.005.2013155051730.hdf I don't know the filenames ahead of time, so I cannot even use a brute force, name every file to get approach. Is there some way you can recommend to automatically get these data? Can I request an automatic push to my incoming ftp site?”
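The wildcard question in the last comment has a common workaround over HTTP: fetch the directory listing page, extract the link targets, and apply the wildcard locally. The sketch below assumes the server returns an HTML index page with one anchor per file (as Apache's default directory index does); the class and function names are illustrative only.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a directory listing page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def match_listing(listing_html, pattern):
    """Return the file names in an HTML listing that match a shell-style wildcard."""
    parser = LinkCollector()
    parser.feed(listing_html)
    # Keep only the last path component so absolute and relative links both work.
    names = [link.rsplit("/", 1)[-1] for link in parser.links]
    return sorted(name for name in names if fnmatchcase(name, pattern))
```

Calling match_listing(page, "MYD11C1.A2013153.*.hdf") on the text of a listing page would return the file for that day without knowing its production timestamp in advance; each match can then be appended to the directory URL and downloaded.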
20
Summary
Understanding that many of our users use scripts to get data from our anonymous FTP servers, this will require social as well as technical changes.
We are gathering use cases and lessons learned from other DAACs in addition to providing ‘recipes’, reference software to automate authenticated HTTPS downloads, bulk download web clients, user tutorials and documentation.
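As an illustration of what one such recipe could look like (this is not the reference software mentioned above), the Python sketch below resumes an interrupted HTTPS download with an HTTP Range request instead of starting over, which addresses the unstable-connection complaint in the user feedback. It assumes the server supports byte ranges; authentication is omitted, and the function names are hypothetical. The command-line equivalents are wget -c and curl -C -.

```python
import os
import urllib.request


def resume_request(url, local_path):
    """Build a GET request that asks only for the bytes not yet on disk."""
    already = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    if already > 0:
        # Open-ended range: everything from the first missing byte onward.
        request.add_header("Range", f"bytes={already}-")
    return request, already


def append_response(response, local_path, already, chunk_size=64 * 1024):
    """Write the response body, appending only if the server sent a partial reply."""
    # 206 means the Range header was honoured; any other status is a full body,
    # so the partial file is overwritten rather than extended.
    mode = "ab" if already and response.status == 206 else "wb"
    with open(local_path, mode) as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

A caller would pass the request to urllib.request.urlopen and hand the resulting response to append_response, repeating both steps after each dropped connection until the file is complete.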
21
Summary
URS is also being enhanced to work with multiple web services (e.g. OGC, OAI-PMH, OPeNDAP, REST/SOAP).
How to get HTTPS directory listings fast: https://wiki.earthdata.nasa.gov/display/HDD/HTTP+Data+Distribution+Home
Some DAACs will be exempt from the HTTP requirement (via waivers)
–Our CDDIS DAAC is serving over 1.8M files and 380 Gbytes/day to over 13K distinct users via FTP.
22
FILE TRANSFER PROTOCOL ENGINEERING PERSPECTIVE
Backup - FTP versus HTTP
23
FTP/HTTP Comparison
24
FTP/HTTP Comparison (cont’d)
Legend: Performance (speed), Security
25
FILE TRANSFER PROTOCOL PERFORMANCE STUDY
Backup
26
Study Background
Sending files over a high-speed network doesn’t guarantee that the end-to-end performance will match the network capacity or meet user expectations.
When transferring data, network latency (round-trip time or RTT) and packet loss can limit the transmission rate, in conjunction with the file transfer protocol used and the characteristics and tuning parameters of the end systems.
EOSDIS performed a study of a set of file transfer protocols from ESDIS Networks to determine how each one performed in different network environments
–All protocols studied use TCP for transport
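The dependence on RTT, packet loss and host buffer size can be made concrete with two standard back-of-the-envelope bounds for a single TCP stream. These are not taken from the EOSDIS study: the first is the window-per-round-trip limit, and the second is the widely used Mathis approximation for loss-limited throughput. The function names and parameter choices below are assumptions for illustration.

```python
import math


def window_limited_bps(window_bytes, rtt_s):
    """At most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s


def loss_limited_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(3 / 2)):
    """Mathis approximation: rate <= (MSS / RTT) * C / sqrt(p)."""
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


def single_stream_ceiling_bps(link_bps, window_bytes, mss_bytes, rtt_s, loss_rate):
    """Rough upper bound for one TCP stream: the smallest applicable limit."""
    bounds = [link_bps, window_limited_bps(window_bytes, rtt_s)]
    if loss_rate > 0:
        bounds.append(loss_limited_bps(mss_bytes, rtt_s, loss_rate))
    return min(bounds)
```

For example, a 64 KiB window over a 100 ms path caps a single stream at roughly 5.2 Mbit/s regardless of link capacity, which is why buffer tuning matters on long paths.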
27
Study Summary
High speed networks don’t come with high speed end-to-end performance guarantees
–File transfer protocol performance is impacted by file size, host buffer size and TCP behavior
  –Network latency (round-trip time, RTT) and packet loss
Most common file transfer protocols were designed when network capacity was much less than today
–FTP over TCP/IP was developed in the 1980s
–Single TCP stream
New file transfer protocols are designed to better adapt to changes in high speed network environments
–Multiple, parallel TCP streams
Other strategies are being employed to increase performance
–Increasing packet size
–Encrypting only sensitive data
28
Study Conclusions
No single file transfer protocol works best in every network environment
Data delivery requirements should be used to determine choice of file transfer protocol
–Multi-stream protocols (bbFTP and GridFTP) are best at sending larger files over WANs (long RTT, higher packet loss)
–Efficient, single stream protocols (FTP, HTTP) work best at sending smaller files over LANs (short RTT, lower packet loss)
–Encryption processing software overhead lowers throughput
  –Increased CPU load
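The multi-stream idea can be illustrated over plain HTTP by dividing one large file into byte ranges, each fetched on its own TCP connection. The Python sketch below shows only the range arithmetic. It does not describe how bbFTP or GridFTP work internally, and the function names are hypothetical.

```python
def split_ranges(total_bytes, streams):
    """Divide a file of total_bytes into contiguous inclusive (first, last) ranges."""
    if total_bytes <= 0 or streams <= 0:
        raise ValueError("total_bytes and streams must be positive")
    # Never create more streams than there are bytes to fetch.
    streams = min(streams, total_bytes)
    base, extra = divmod(total_bytes, streams)
    ranges = []
    start = 0
    for index in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if index < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(byte_range):
    """Format an inclusive (first, last) pair as an HTTP Range header value."""
    first, last = byte_range
    return f"bytes={first}-{last}"
```

Each range would be requested by a separate worker and written at its own offset in the output file. Consistent with the conclusions above, the extra connections are mainly worthwhile for large files on long, lossy paths.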