GIS in the cloud: implementing a Web Map Service on Google App Engine
Jon Blower, Reading e-Science Centre, University of Reading, United Kingdom
Scalability is a major concern
Web Map Services are quite “heavy” on the server
Many infrastructures are now moving into an operational phase
How to serve many simultaneous clients?
– Tile caching is a widely-used option, but reduces flexibility
Load is typically “bursty”, so we’d like to be able to scale the back end up and down
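Tile caching works because a fixed tile grid turns every map request into a discrete, cacheable key. As a rough illustration (not code from this project; names are ours), a global EPSG:4326 pyramid with 2×2^z by 2^z tiles at zoom z maps a longitude/latitude to tile indices like this:

```java
// Illustrative sketch: map a lon/lat to tile indices in a global
// EPSG:4326 tile pyramid (2*2^zoom columns, 2^zoom rows at each zoom).
public class TileIndex {
    static int[] tileOf(double lon, double lat, int zoom) {
        int cols = 2 << zoom;   // 2 * 2^zoom columns spanning 360 degrees
        int rows = 1 << zoom;   // 2^zoom rows spanning 180 degrees
        int x = (int) Math.floor((lon + 180.0) / 360.0 * cols);
        int y = (int) Math.floor((90.0 - lat) / 180.0 * rows);
        // Clamp so the edge coordinates (lon = 180, lat = -90) stay in range
        x = Math.min(Math.max(x, 0), cols - 1);
        y = Math.min(Math.max(y, 0), rows - 1);
        return new int[] { x, y };
    }

    public static void main(String[] args) {
        int[] t = tileOf(0.0, 0.0, 2);
        System.out.println("zoom 2 tile: " + t[0] + "," + t[1]);
    }
}
```

Because every client asking for the same area at the same zoom computes the same (x, y, zoom) key, duplicate requests hit the cache; the flexibility lost is that arbitrary bounding boxes no longer match any key.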
Cloud computing
– Hardware
– Operating system
– Application container
– Specific applications
Google App Engine overview
Virtual application-hosting environment
– Python, Java (servlets)
Machines are brought up and taken down automatically
Built-in services, including:
– User authentication
– Distributed memcache
– Distributed persistent store
– Image transformation
Free within certain quotas
Aims of this project
Develop a fully-functional GAE WMS for high-resolution global raster data
– NASA Blue Marble composite image
Efficient for tiling clients, but supports untiled access too
Supports multiple coordinate reference systems
Use Java FOSS (free and open-source software)
Use resources efficiently to stay within the free quotas
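Supporting multiple coordinate reference systems means reprojecting between them on the server. As a minimal sketch (not the project's actual reprojection code), the spherical Web Mercator (EPSG:3857) forward projection from longitude/latitude is:

```java
// Illustrative sketch: spherical Web Mercator (EPSG:3857) forward projection.
public class Mercator {
    static final double R = 6378137.0; // WGS84 semi-major axis, metres

    static double[] toMercator(double lon, double lat) {
        double x = Math.toRadians(lon) * R;
        double y = R * Math.log(Math.tan(Math.PI / 4.0 + Math.toRadians(lat) / 2.0));
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        double[] p = toMercator(180.0, 0.0);
        System.out.println("x = " + p[0] + ", y = " + p[1]);
    }
}
```

A production WMS would normally delegate this to a library such as GeoTools rather than hand-rolling projections, but the arithmetic shows why per-pixel reprojection makes dynamic map rendering CPU-heavy.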
Development challenges
Much harder than anticipated!
Coding restrictions:
– No local file output
– Can’t spawn threads
– Can’t access some Java packages (e.g. most of java.awt, all of javax.imageio)
– Limited RAM
Deployment issues:
– Uploading data
Blue Marble image courtesy of NASA Earth Observatory
Testing
Three “modes”:
– Fully-dynamic (all images generated from scratch)
– Self-caching (duplicate requests are served from cache)
– Static tiles (all images pre-generated)
All images 256x256 pixels
Apache JMeter scripts
– Many client threads, requesting images in random order from a preselected list
– Single client machine
GAE is a “black box”, so we can’t control all aspects of the experiments
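The “self-caching” mode follows a simple pattern: the first request for a tile renders it, and duplicates are answered from cache. A minimal sketch of that pattern, using a `ConcurrentHashMap` as a stand-in for GAE’s distributed memcache (the renderer below is a hypothetical placeholder, not the project’s code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "self-caching" mode: render once, serve duplicates
// from cache. ConcurrentHashMap stands in for GAE's distributed memcache.
public class SelfCachingTiles {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    byte[] getTile(String key) {
        // computeIfAbsent renders only on a cache miss; concurrent
        // requests for the same key see a single rendered result.
        return cache.computeIfAbsent(key, SelfCachingTiles::renderTile);
    }

    static byte[] renderTile(String key) {
        // Placeholder: a real WMS would reproject and encode an image here
        return ("tile:" + key).getBytes();
    }
}
```

With GAE’s real memcache the cache is shared across instances but entries can be evicted at any time, so the renderer must always be able to regenerate a tile from scratch.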
Results: Throughput
[Chart comparing fully-dynamic, self-caching and static-tile modes]
“Service not available” errors increase with load
Results: Latency
[Chart comparing fully-dynamic, self-caching and static-tile modes]
Unpredictable latency spikes!
Some notes about quotas
The outgoing bandwidth quota (1GB/day) runs out fastest
– Hence serving JPEGs is more cost-effective than PNGs
– Can serve roughly 100,000 256x256 JPEGs per day for free
But it’s easy to code in such a way that some per-minute quotas are also exceeded
– E.g. the quota on output from the distributed data store
Quotas can be increased!
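The bandwidth arithmetic behind the JPEG-vs-PNG point is simple division; the tile sizes below are illustrative assumptions (a 256x256 JPEG of imagery is often around 10 KB, a PNG several times larger), not measured values from the project:

```java
// Rough daily tile budget under a 1 GB/day outgoing bandwidth quota.
// Average tile sizes are illustrative assumptions, not measured values.
public class QuotaMath {
    static long tilesPerDay(long quotaBytes, long avgTileBytes) {
        return quotaBytes / avgTileBytes;
    }

    public static void main(String[] args) {
        long quota = 1L << 30; // 1 GiB per day
        System.out.println("JPEG (~10 KB): " + tilesPerDay(quota, 10 * 1024));
        System.out.println("PNG  (~60 KB): " + tilesPerDay(quota, 60 * 1024));
    }
}
```

Under these assumptions the JPEG budget is on the order of 100,000 tiles/day, while PNGs of the same imagery would exhaust the quota several times sooner.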
Conclusions 1
Successfully implemented a full WMS for raster images
Significant usage at zero running cost
Performance and scalability are acceptable for many applications
– But latency spikes are an issue
Testing with distributed clients would be instructive
Conclusions 2: further potential?
Hard to host lots of images in the same instance using our method
– Relies on storing data in local files, with a tight quota
Restrictions on the Java servlet environment make it hard to run standard software stacks
– E.g. GeoServer
Expansion to vector datasets is probably hard
– Would need a spatial index on top of the distributed data store
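One well-known way to layer a crude spatial index on a key-value store is a quadkey (as used in the Bing Maps tile system): tile coordinates become a string whose prefixes correspond to enclosing quadrants, so a prefix scan over keys retrieves everything in a region. A sketch of the encoding (illustrative, not an implementation from this project):

```java
// Sketch: quadkey encoding. Each character encodes one zoom level's
// quadrant (0-3), so key prefixes correspond to enclosing quadrants,
// letting a key-value store's prefix scan act as a spatial index.
public class QuadKey {
    static String quadKey(int x, int y, int zoom) {
        StringBuilder sb = new StringBuilder();
        for (int z = zoom; z > 0; z--) {
            int mask = 1 << (z - 1);
            int digit = 0;
            if ((x & mask) != 0) digit += 1; // east half of the quadrant
            if ((y & mask) != 0) digit += 2; // south half of the quadrant
            sb.append(digit);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(quadKey(3, 5, 3));
    }
}
```

This handles point and tile data reasonably; arbitrary vector geometries that straddle quadrant boundaries are where the real difficulty (and the “probably hard” above) comes in.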
Thank you!
All code, the full paper, results and more details about the experiments: