Published by Shonda Powers. Modified over 9 years ago.
1
Caching Willem Visser RW334
2
Overview
AppEngine Datastore
No Caching
Naïve Caching
Cache invalidation
Cache updating
Memcached
Beyond your code
3
AppEngine Python Datastore
Datastore
  – db: old and will be going away at some point
  – ndb (https://developers.google.com/appengine/docs/python/ndb/): new and supports some cool features

    from google.appengine.ext import ndb

    class Stuff(ndb.Model):
        title = ndb.StringProperty(required=True)
        content = ndb.StringProperty(required=True)
        date = ndb.DateTimeProperty(auto_now_add=True)
4
NDB
A Python class defines the model
Each entity has a key, which in turn has a parent, up to the root, which has no parent
  – Entities in this chain are in the same group
  – Entities in the same group have consistency guarantees

    stuff_title = self.request.get('stuff_name')
    stuff = Stuff(parent=ndb.Key("Things", stuff_title or "*notitle*"),
                  title=stuff_title,  # title is a required property of Stuff
                  content=self.request.get('content'))
    stuff.put()
5
NDB (2)
Queries and Indexes
There are very many ways to query
Complex queries might need complex indexes
  – NDB creates simple indexes automatically
  – Complex ones can be defined in index.yaml
GQL is similar to SQL
A query is only executed when its results are accessed

    # ndb.gql() builds the query; list() runs it.
    # Stuff has a `date` property (not `created`), so order by that.
    stuff = ndb.gql("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
    stuff = list(stuff)
6
No Caching
Every db_read hits the database
Database reads tend not to be the fastest thing
This can therefore be very inefficient
7
Example No Caching

    def top_stuff():
        # Property names follow the Stuff model: `date` and `content`.
        stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
        return list(stuff)

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            content = …
            a = Stuff(title = title, content = content)
            a.put()
            self.redirect("/")
8
Naïve Caching

    if request not in cache:
        cache[request] = db_read()
    return cache[request]

This will do wonders for performance
If the cache is too large it might start to slow down a bit
Above, the db_read is avoided, but the rendered HTML could also be cached if rendering takes a lot of time
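The naïve pattern can be tried outside App Engine with a plain dictionary. The following is a minimal sketch in plain Python 3, not the course's code: `db_read` is a hypothetical stand-in for a slow datastore query, and `DB_READS` exists only to count how often it runs.

```python
import time

DB_READS = 0   # counts how often the (simulated) database is hit
cache = {}     # maps a request to its cached result

def db_read(request):
    """Hypothetical stand-in for a slow datastore query."""
    global DB_READS
    DB_READS += 1
    time.sleep(0.01)  # pretend the query takes a while
    return "result for " + request

def cached_read(request):
    # Only go to the database when this request has not been seen before.
    if request not in cache:
        cache[request] = db_read(request)
    return cache[request]

for _ in range(5):
    cached_read("/")
print(DB_READS)  # 1
```

Five page views cost a single database read; the other four are served from the dictionary.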
10
Example

    CACHE = {}

    def top_stuff():
        key = 'top'
        stuff = CACHE.get(key)   # .get() returns None on a miss; CACHE[key] would raise KeyError
        if not stuff:
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            content = …
            a = Stuff(title = title, content = content)
            a.put()
            self.redirect("/")
11
New data?
Will the previous solution work?
What happens if you add new data?
  – It is added to the DB and then we redirect to /
  – render_front calls top_stuff
  – However, the cache is hit and we get the old data
The cache must be invalidated when new data comes in
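The stale read described on this slide can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the list `DB` is a hypothetical stand-in for the datastore, and `post` only mimics what the handler's `post` method does.

```python
DB = ["first post"]   # hypothetical stand-in for the datastore
CACHE = {}

def top_stuff():
    # Naive caching: query the "database" only on a cache miss.
    if "top" not in CACHE:
        CACHE["top"] = list(reversed(DB))[:10]   # newest first
    return CACHE["top"]

def post(title):
    DB.append(title)   # new data goes to the database only

before = top_stuff()         # cache miss: fills the cache
post("second post")
after = top_stuff()          # cache hit: the new post is missing
print(after)                 # ['first post']
print("second post" in DB)   # True
```

The database holds the new post, but the front page keeps showing the cached list until something invalidates it.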
12
Clear Cache

    CACHE = {}

    def top_stuff():
        …

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            content = …
            a = Stuff(title = title, content = content)
            a.put()
            CACHE.clear()   # invalidate: the next page view reads from the DB
            self.redirect("/")
13
Cache Stampede
If one user writes new data
  – The cache gets cleared
Now lots of users all access the site at the same time
  – All of them do db_reads since the cache is empty
This hammers the DB and slows everybody down
  – Depending on settings the DB might also block or even crash
Without any caching this could also happen
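A stampede can be simulated with threads. The sketch below is an illustration in plain Python 3 under stated assumptions: `db_read` is a hypothetical slow query, each thread plays one user, and the barrier only serves to release all users at the same moment right after the cache was cleared.

```python
import threading
import time

CACHE = {}
db_reads = 0
counter_lock = threading.Lock()   # protects the counter only, not the cache

def db_read():
    """Hypothetical slow query; counts how often it runs."""
    global db_reads
    with counter_lock:
        db_reads += 1
    time.sleep(0.2)               # the query is slow
    return ["newest", "older"]

def top_stuff():
    stuff = CACHE.get("top")
    if not stuff:                 # miss: everybody who gets here queries the DB
        stuff = db_read()
        CACHE["top"] = stuff
    return stuff

def simulate(n_users):
    CACHE.clear()                        # a write just cleared the cache
    start = threading.Barrier(n_users)   # release all users at once

    def user():
        start.wait()
        top_stuff()

    threads = [threading.Thread(target=user) for _ in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return db_reads

reads = simulate(20)
print(reads)   # number of DB reads caused by a single cache clear
```

Because every user checks the cache before the first slow read has finished, many of them (often all of them) end up querying the database for the same data.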
14
Cache Refresh

    def top_stuff(update = False):
        key = 'top'
        stuff = CACHE.get(key)   # None on a miss instead of a KeyError
        if (not stuff or update):
            logging.error("DB QUERY")
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
            stuff = list(stuff)
            CACHE[key] = stuff
        return stuff

    class MainPage(Handler):
        def render_front(self, …):
            stuff = top_stuff()
            self.render("front.html", stuff = stuff, …)

        def get(self):
            self.render_front()

        def post(self):
            title = …
            content = …
            a = Stuff(title = title, content = content)
            a.put()
            top_stuff(True)   # refresh the cache once, on the write
            self.redirect("/")
15
Cache Update
Most aggressive solution
  – No DB reads!
On new data, store it in the DB and also directly in the cache, without reading from the DB
The DB is now just backup storage in case something goes wrong, such as a server going down
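One way to picture the update approach is to patch the cached list on every write. This is a minimal sketch, not the course's implementation: `DB`, `query_top` and the string "posts" are hypothetical stand-ins, and the only database read happens on a cold start.

```python
DB = []        # hypothetical stand-in for the datastore
CACHE = {}
db_reads = 0   # counts simulated database queries

def query_top():
    global db_reads
    db_reads += 1
    return list(reversed(DB))[:10]   # ten newest, newest first

def top_stuff():
    stuff = CACHE.get("top")
    if stuff is None:                # only true on a cold start
        stuff = query_top()
        CACHE["top"] = stuff
    return stuff

def post(title):
    current = top_stuff()                     # cache hit, except on a cold start
    DB.append(title)                          # write to the database ...
    CACHE["top"] = ([title] + current)[:10]   # ... and straight into the cache

top_stuff()                  # cold start: the one and only DB read
for i in range(12):
    post("post %d" % i)
print(db_reads)              # 1
print(top_stuff()[0])        # post 11
print(len(top_stuff()))      # 10
```

The cache stays identical to what a fresh query would return, yet neither page views nor submits read from the database after the first request.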
16
Cache Comparisons

Cache Approach   DB_Read on page view   DB_Read on submit   Wrong results
None             Always                 None
Naïve            Cache miss             None                Yes
Refresh          Seldom                 Once
Update           None
17
Sharing a Cache
What if we have more than one server?
Do we keep a cache for each server, or share one cache amongst the servers?
A cache for each server can have suboptimal behavior if the caches are not synchronized
  – Data might be in the cache on server 1 and not on server 2, for example
A good solution is to use a very fast shared cache
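The difference between per-server caches and one shared cache shows up in the number of database reads. The sketch below is an illustration only: `Server` is a hypothetical class, a plain dictionary plays the role of the shared cache, and a one-element list serves as a counter for simulated queries.

```python
class Server:
    """Hypothetical web server that caches query results in `cache`."""

    def __init__(self, cache, db_reads):
        self.cache = cache
        self.db_reads = db_reads   # one-element list used as a shared counter

    def top_stuff(self):
        if "top" not in self.cache:
            self.db_reads[0] += 1                  # simulated database query
            self.cache["top"] = ["newest", "older"]
        return self.cache["top"]

def db_reads_for(servers, counter):
    # One page view on every server; return how many DB reads that cost.
    for server in servers:
        server.top_stuff()
    return counter[0]

# One private cache per server: every server misses once.
c1 = [0]
per_server = db_reads_for([Server({}, c1) for _ in range(3)], c1)

# One cache shared by all servers.
c2 = [0]
shared_cache = {}
shared = db_reads_for([Server(shared_cache, c2) for _ in range(3)], c2)

print(per_server, shared)  # 3 1
```

With private caches each server pays for its own miss; with a shared cache the first server's read benefits all the others.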
18
Memcached
See http://memcached.org/
Very fast, in-memory, key-value store
Caching technology behind very many websites
Support for it within AppEngine

    from google.appengine.api import memcache
    …
    def top_stuff(update = False):
        key = 'top'
        stuff = memcache.get(key)   # returns None on a miss
        if (update) or (not stuff):
            stuff = db.GqlQuery("SELECT * FROM Stuff ORDER BY date DESC LIMIT 10")
            stuff = list(stuff)
            memcache.set(key, stuff)
        return stuff
19
NDB and Caching
Two caches controlled by policies
  – In-context cache (microseconds)
      Only lives for the current HTTP request
      Writes go to the datastore and the cache; reads check the cache first
  – Memcache (milliseconds)
      All nontransactional contexts cache here
      All contexts share the same memcache
      Within a transaction memcache is not used
Can be configured by policies
  – Some standard ones are available
20
More Caching
Some caches also live outside the developer's immediate control
Browser Cache
  – Single user
Proxy Cache
  – Multiple users
Gateway Cache
  – Distributed by Content Delivery Networks
HTTP 1.1 supports the "Cache-Control" header
  – Allows developers to control how things are cached
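Setting the header is a one-line affair in most frameworks. The following sketch uses only the Python 3 standard library rather than App Engine; the handler name `CachedPage`, the helper `fetch_cache_control` and the 60-second lifetime are assumptions chosen for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class CachedPage(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<h1>front page</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # Any cache (browser, proxy, gateway) may reuse this response for 60 s.
        self.send_header("Cache-Control", "public, max-age=60")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def fetch_cache_control():
    server = HTTPServer(("127.0.0.1", 0), CachedPage)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        conn.request("GET", "/")
        value = conn.getresponse().getheader("Cache-Control")
        conn.close()
        return value
    finally:
        server.shutdown()
        server.server_close()

header = fetch_cache_control()
print(header)  # public, max-age=60
```

`public` lets shared caches such as proxies store the page, while `private` would restrict it to the single user's browser cache; `no-store` switches caching off altogether.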