Published by Frederica Payne. Modified over 9 years ago.
1
10 May 2007 HTTP - www.gridsite.org - Andrew.McNab@manchester.ac.uk
User data via HTTP(S)
Andrew McNab, University of Manchester
2
Outline
Protocol structure
Advantages
“Missing” features
Performance
APIs
SlashGrid
Summary
3
Protocol structure
HTTP uses a single control and data channel
  – cf. separate control channel in (Grid)FTP
Multiple requests can be sent down the same TCP connection
  – Each request starts with a block of headers, giving request URI, cookies etc.
  – No need to rebuild TCP connection: more requests are cheap
  – RFCs define many headers: partial fetches, redirections etc.
HTTPS puts HTTP inside an encrypted SSL/TLS stream
  – SSL session reuse avoids need to rebuild SSL context even if TCP connection is closed
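The connection reuse described on this slide can be sketched with Python's standard library. This is only an illustration, not GridSite code: the handler class, the helper names and the paths /a, /b, /c are made up for the demo. It starts a throwaway local server and sends three GET requests over one connection, recording the client-side port each time to show that the same TCP socket was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical demo handler: replies with the request path."""

    protocol_version = "HTTP/1.1"  # enables persistent (keep-alive) connections

    def do_GET(self):
        body = self.path.encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(host, port, paths):
    """Send one GET per path down a single TCP connection.

    Returns (bodies, local_ports); the local ports are all equal if the
    socket was reused rather than reopened.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    bodies, local_ports = [], []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read in full before the next request is sent.
            bodies.append(resp.read().decode("ascii"))
            local_ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
    return bodies, local_ports


def run_demo():
    """Start a local server on a free port and fetch three paths from it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_over_one_connection(
            "127.0.0.1", server.server_port, ["/a", "/b", "/c"]
        )
    finally:
        server.shutdown()
        server.server_close()
```

Calling run_demo() returns the three response bodies together with the local port used for each request; identical ports indicate that no new TCP connection was opened between requests.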
4
Advantages
Simple protocol, with independent TCP connections
  – No special effort needed for firewalls
Clients exist in almost all languages / environments
Very good quality implementations due to the Web
  – e.g. Apache with a huge developer community
Integrates seamlessly into Web portals
  – e.g. poster 58, with HTTPS added to DPM
Reuses users' “common knowledge” about the Web
5
What's missing?
GSI Proxies?
  – Clients can use GSI proxies without modification
  – GridSite adds GSI proxy support to the Apache webserver
Third-party transfers?
  – COPY is defined in the WebDAV RFC
  – Implemented by GridSite using cookies (“onetime passcodes”)
Multichannel / parallel transfers?
  – Make parallel partial requests for blocks of a file
  – Apache supports these partial requests out of the box
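A minimal sketch of the "parallel partial requests" idea, using the standard HTTP Range header from Python's standard library. The function names and the choice of four blocks are assumptions made for this example, and the caller is assumed to know the file size already (for instance from a HEAD request). It is not how GridSite or any particular client implements parallel transfers.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def byte_ranges(total_size, n_blocks):
    """Split [0, total_size) into up to n_blocks contiguous (first, last) pairs.

    Both ends are inclusive, matching the HTTP Range header convention.
    """
    if total_size <= 0 or n_blocks <= 0:
        return []
    block = -(-total_size // n_blocks)  # ceiling division
    return [(start, min(start + block, total_size) - 1)
            for start in range(0, total_size, block)]


def fetch_block(url, first, last):
    """Fetch one block with a partial GET; a 206 status means the range was honoured."""
    req = Request(url, headers={"Range": f"bytes={first}-{last}"})
    with urlopen(req, timeout=60) as resp:
        if resp.status != 206:
            raise RuntimeError(f"server ignored Range header (status {resp.status})")
        return resp.read()


def parallel_fetch(url, total_size, n_blocks=4, fetch=fetch_block):
    """Fetch the blocks concurrently and join them in their original order."""
    ranges = byte_ranges(total_size, n_blocks)
    if not ranges:
        return b""
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in the order of `ranges`, not completion order.
        parts = pool.map(lambda r: fetch(url, r[0], r[1]), ranges)
        return b"".join(parts)
```

The fetch parameter exists so the splitting and reassembly logic can be exercised without a network, by passing a function that slices an in-memory byte string.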
6
“GridHTTP”
A profile for using HTTP in a Grid environment
  – Doesn't define any new headers etc.
Clients use GSI Proxies over HTTPS to authenticate
Can request an HTTP data transfer, using the “Upgrade” header
  – Server may redirect to an HTTP version of the file
  – Includes a onetime passcode HTTP cookie in the response
Client makes an HTTP GET request using the passcode cookie
  – Naïve clients like curl respond this way automatically!
For third-party transfers, instead of GET, the client issues COPY to the destination site with the passcode, so it can pull the file instead.
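The request/redirect/cookie sequence on this slide can be sketched as two small helpers. This is an assumption-laden illustration, not the GridSite implementation: the slide does not give the value sent in the Upgrade header or the cookie's name, so the token "GridHTTP/1.0" and the cookie name used in the usage note are placeholders, and the set of redirect status codes accepted is a guess.

```python
from http.cookies import SimpleCookie
from urllib.parse import urljoin, urlsplit

# Assumption: the slide only says an "Upgrade" header is used; this token is illustrative.
UPGRADE_TOKEN = "GridHTTP/1.0"


def initial_request_headers():
    """Headers for the authenticated HTTPS request that asks for a plain-HTTP transfer."""
    return {"Upgrade": UPGRADE_TOKEN}


def follow_up_request(https_url, status, headers):
    """Turn the server's redirect into the plain-HTTP GET described on the slide.

    `headers` is a dict holding the response's Location and Set-Cookie values.
    Returns (url, request_headers), or None if the server did not redirect to
    an http:// URL, in which case the client just reads the body over HTTPS.
    """
    if status not in (301, 302, 303, 307) or "Location" not in headers:
        return None
    target = urljoin(https_url, headers["Location"])
    if urlsplit(target).scheme != "http":
        return None
    cookie = SimpleCookie()
    cookie.load(headers.get("Set-Cookie", ""))
    # Send back only name=value pairs, dropping attributes such as path.
    pairs = "; ".join(f"{name}={morsel.value}" for name, morsel in cookie.items())
    return target, ({"Cookie": pairs} if pairs else {})
```

For example, a 302 response with Location "http://node42.site.name/dir/file.dat" and a Set-Cookie of "PASSCODE=abc123; path=/dir/file.dat" (a made-up cookie) yields a GET for that URL with the header "Cookie: PASSCODE=abc123". A client that already follows redirects and stores cookies performs the same steps without any GridHTTP-specific code, which is the point the slide makes about curl.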
7
Firewalls
Some sites block outgoing port 80 (HTTP) and port 443 (HTTPS)
  – Risk of denial of service attacks on mainstream Web sites?
Some sites also use transparent HTTP caches on port 80 to reduce interactive web traffic
  – This is decreasing due to “Web 2.0” and uncacheable pages
To sidestep this, we advocate using two unused, reserved ports:
  – Port 488 (“gss-http”) for HTTPS
  – Port 777 (“multiling-http”) for HTTP
Apache virtual hosts can readily listen on multiple ports
8
Performance (1)
Mean of 5 * 100MB from Manchester to 21 EGEE sites
[Plot: mean GridHTTP time / mean GridFTP time vs mean GridHTTP time. 960s is ncp.edu.pk; 9s is man.ac.uk]
9
Performance (2)
Mean of 5 * 100MB from Manchester to 17 EGEE sites
[Plot: mean GridHTTP time / mean HTTPS time vs mean GridHTTP time. 500s is indiacms.res.in; 10s is man.ac.uk]
10
APIs
Languages / environments are a big issue for new applications
  – i.e. not everyone uses C++!
Many environments have “native” HTTP(S) support
  – e.g. libxml, ROOT, PHP, Java, Gnome
(Virtually) all languages have HTTP(S) libraries
  – e.g. curl supports everything from Ada to wxWidgets
Command-line (wget, curl, ...) and file browser tools for (virtually) all operating systems
Since GridHTTP uses standard HTTP concepts like cookies, standard client libraries work without modification.
11
SlashGrid
This is the simplest API: a POSIX-like filesystem
  – open(), read(), write(), mkdir(), ftruncate(), unlink(), stat(), readdir(), rename()
Now part of GridSite
  – Uses FUSE kernel module on Linux, which is included in SL 4.4 and available for all 2.4.x/2.6.x kernels
HTTP(S) to retrieve remote files, with GSI proxy if available
URLs mapped to local paths:
  – /grid/https/node42.site.name/dir/file.dat
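The URL-to-path mapping shown above can be sketched as follows. The function names are hypothetical, and the slide gives only the one example path, so how SlashGrid treats explicit port numbers or query strings is not covered here: this sketch keeps the host part as written and ignores any query string.

```python
from urllib.parse import urlsplit

SLASHGRID_ROOT = "/grid"  # mount point shown on the slide


def url_to_slashgrid_path(url, root=SLASHGRID_ROOT):
    """Map http(s)://host/path to <root>/<scheme>/<host>/<path>."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    return f"{root}/{parts.scheme}/{parts.netloc}{parts.path}"


def read_remote(url, n_bytes=-1):
    """Read a remote file with ordinary file I/O through the mounted filesystem.

    Only works on a machine where the filesystem is mounted at the root above.
    """
    with open(url_to_slashgrid_path(url), "rb") as f:
        return f.read(n_bytes)
```

With this mapping, "https://node42.site.name/dir/file.dat" becomes "/grid/https/node42.site.name/dir/file.dat", and any program that can open a local file can read the remote one without linking an HTTP library.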
12
Summary
HTTP(S) are viable protocols for bulk data transfer
Considerable advantages in terms of ubiquity of client tools and quality of servers
“Missing” features provided using headers etc. defined by the RFCs
  – In particular, the “GridHTTP” profile
A wide variety of APIs available for ~all languages
SlashGrid is a POSIX-like filesystem HTTP(S) client
  – That uses GSI proxies if available