Published by Margery Glenn. Modified over 9 years ago.
1
Finding cacheable areas in your Web Site using Python and Selenium
David Elfi
Intel
2
What does this session talk about?
-Python
-Performance
-Web applications
-Hands-on session
3
Caching
Hot topic in web applications because it gives
-Better response time across geographically distributed users
-Better scalability
Difficult to focus on at development time
Help developers improve response time
4
Source: Steve Souders – Cache is King!
5
What to do
Find text areas repeated in a web resource (page, JSON response, other dynamic resources) in order to split them into different responses
Use the Cache-Control, Expires and ETag HTTP headers for caching control
Identify all the dependencies for a given URL
-Even AJAX calls
6
Proposed Solution
Take snapshots at different points in time
-Use Selenium to:
  -Download ALL the content
  -Run the JS code needed for Ajax
Compare the snapshots looking for similarities
-Split the similar text into different HTTP responses
7
Solution – Snapshots
Selenium through a forward proxy
[Diagram labels: Proxy, Twisted, Data, Web Server, Store Content]
8
Running Selenium – Snapshots
Call Selenium from Python
Use of WebDriver

    >>> from selenium import webdriver
    >>> br = webdriver.Firefox()
    >>> br.get("http://www.intel.com")
    >>> br.close()
9
Twisted Proxy - Snapshots

    from twisted.web import http, proxy

    class CacheProxyClient(proxy.ProxyClient):
        def connectionMade(self):
            # Connection made. Prepare object properties.
            proxy.ProxyClient.connectionMade(self)

        def handleHeader(self, key, value):
            # Save response header.
            proxy.ProxyClient.handleHeader(self, key, value)

        def handleResponsePart(self, buf):
            # Store response data.
            proxy.ProxyClient.handleResponsePart(self, buf)

        def handleResponseEnd(self):
            # Finished response transmission. Store it.
            proxy.ProxyClient.handleResponseEnd(self)

    class CacheProxyClientFactory(proxy.ProxyClientFactory):
        protocol = CacheProxyClient

    class CacheProxyRequest(proxy.ProxyRequest):
        # Twisted on Python 3 looks the scheme up as bytes;
        # b"http" also equals "http" on Python 2.
        protocols = {b"http": CacheProxyClientFactory}

    class CacheProxy(proxy.Proxy):
        requestFactory = CacheProxyRequest

    class CacheProxyFactory(http.HTTPFactory):
        protocol = CacheProxy
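The slide leaves the bodies of the four callbacks as comments. A minimal sketch of what they might delegate to is shown here, assuming responses are kept in memory in a plain dict keyed by URL. ResponseRecorder and its method names are hypothetical and are not part of Twisted or of the original talk.

```python
class ResponseRecorder(object):
    """Collects one HTTP response as the proxy callbacks deliver it."""

    def __init__(self, url):
        # Would be created from connectionMade()
        self.url = url
        self.headers = {}
        self.chunks = []

    def add_header(self, key, value):
        # Would be called from handleHeader()
        self.headers[key.lower()] = value

    def add_part(self, buf):
        # Would be called from handleResponsePart()
        self.chunks.append(buf)

    def finish(self, store):
        # Would be called from handleResponseEnd(); each call appends one
        # (headers, body) snapshot to the list kept for this URL.
        snapshot = (dict(self.headers), b"".join(self.chunks))
        store.setdefault(self.url, []).append(snapshot)


store = {}
rec = ResponseRecorder("http://example.com/")
rec.add_header(b"Content-Type", b"text/html")
rec.add_part(b"<html>")
rec.add_part(b"</html>")
rec.finish(store)
```

Keeping a list per URL means repeated runs accumulate the snapshots that the comparison step needs.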
10
Selenium + Twisted - Snapshots
Run Selenium using Proxy

    >>> from selenium import webdriver
    >>> fp = webdriver.FirefoxProfile()
    >>> fp.set_preference("network.proxy.type", 1)
    >>> fp.set_preference("network.proxy.http", "localhost")
    >>> fp.set_preference("network.proxy.http_port", 8080)
    >>> br = webdriver.Firefox(firefox_profile=fp)
11
Selenium + Twisted - Snapshots
Configure Twisted and run Selenium in an internal Twisted thread

    from twisted.internet import endpoints, reactor

    endpoint = endpoints.serverFromString(
        reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
    d = endpoint.listen(CacheProxyFactory())
    reactor.callInThread(runSelenium, url_str)
    reactor.run()
12
All together running
13
Comparison method
[Figure: snapshots 1, 2, 3, ..., n go through the comparison method to produce the output]
14
Comparison

    ''' Equal sequence searcher '''
    from difflib import SequenceMatcher

    def matchingString(s1, s2):
        '''Compare two strings and return their matching blocks concatenated'''
        matcher = SequenceMatcher(None, s1, s2)
        output = ""
        for (i, _, n) in matcher.get_matching_blocks():
            output += s1[i:i+n]
        return output

    def matchingStringSequence(seq):
        '''Compare pairwise, folding each snapshot into the running result'''
        try:
            matching = seq[0]
            for s in seq[1:]:
                matching = matchingString(matching, s)
            return matching
        except (TypeError, IndexError):
            # seq is None, empty, or does not hold strings
            return ""
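The functions above compare snapshots character by character. As a variation (an assumption here, not the method shown in the talk), the same folding idea can be applied line by line, which tends to keep whole markup lines together. The names common_lines and stable_lines and the sample snapshots are hypothetical.

```python
from difflib import SequenceMatcher
from functools import reduce


def common_lines(a, b):
    """Return the lines that appear, in the same order, in both lists."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for i, _, n in matcher.get_matching_blocks():
        out.extend(a[i:i + n])
    return out


def stable_lines(snapshots):
    """Fold common_lines over all snapshots; what survives never changed."""
    line_lists = [s.splitlines() for s in snapshots]
    if not line_lists:
        return []
    return reduce(common_lines, line_lists)


snapshots = [
    "<header>Site</header>\n<p>Visits: 10</p>\n<footer>Contact</footer>",
    "<header>Site</header>\n<p>Visits: 11</p>\n<footer>Contact</footer>",
    "<header>Site</header>\n<p>Visits: 12</p>\n<footer>Contact</footer>",
]
print(stable_lines(snapshots))
# → ['<header>Site</header>', '<footer>Contact</footer>']
```

The surviving lines are candidates for a separately cached response, while the line that differs in every snapshot stays in the dynamic one.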
15
Next Steps
Split similar texts into different HTTP responses
Set Cache-Control
-public
-private
-no-cache
Set Expires
-Depending on how long the response should be cached
Set ETag
-If the response is big and does not change too often
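A minimal sketch of how the three headers listed above might be generated for a response that has been split out as cacheable. The function name caching_headers, its parameters, and the choice of a truncated SHA-256 digest as the ETag are assumptions made for illustration, and the talk does not prescribe them.

```python
import hashlib
import time
from email.utils import formatdate


def caching_headers(body, max_age, shared=True, now=None):
    """Build Cache-Control, Expires and ETag for a response body (bytes)."""
    now = time.time() if now is None else now
    scope = "public" if shared else "private"
    return {
        # public: any cache may store it; private: only the user's browser
        "Cache-Control": "%s, max-age=%d" % (scope, max_age),
        # HTTP-date (GMT) at which the response stops being fresh
        "Expires": formatdate(now + max_age, usegmt=True),
        # Quoted validator derived from the content, for conditional requests
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16],
    }


headers = caching_headers(b"<footer>Contact</footer>", max_age=3600, now=0)
for name, value in sorted(headers.items()):
    print("%s: %s" % (name, value))
```

Because the ETag depends only on the body, an unchanged fragment keeps the same validator between requests, so a server can answer a matching If-None-Match with 304 Not Modified instead of resending the body.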
16
Advanced Features to be done
Detect cache invalidation time from snapshots
SSL support
Wait for all AJAX calls
Selenium scripting
-Authenticated URLs
-Full feature sequence
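The first item is listed as future work, so the following is only one possible starting point. It assumes snapshots are stored as (timestamp, content) pairs sorted by time, and it treats the shortest gap between two observed changes as a conservative freshness lifetime. Both function names are hypothetical.

```python
def change_times(snapshots):
    """Timestamps at which the content differs from the previous snapshot."""
    times = []
    for (_, previous), (t, current) in zip(snapshots, snapshots[1:]):
        if current != previous:
            times.append(t)
    return times


def shortest_stable_interval(snapshots):
    """Smallest gap between two observed changes, or None if unknown."""
    times = change_times(snapshots)
    if len(times) < 2:
        return None
    return min(b - a for a, b in zip(times, times[1:]))


# Timestamps in seconds; content changes at t=120 and again at t=300
history = [(0, "a"), (60, "a"), (120, "b"), (180, "b"), (300, "c")]
print(change_times(history))              # → [120, 300]
print(shortest_stable_interval(history))  # → 180
```

The estimate is only as fine as the sampling interval, so snapshots taken too far apart can miss changes and overstate how long the content stays valid.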
17
Summary
If cacheable areas were not identified before development, this code could save time and effort in finding them
Cacheable areas need to be analyzed to find the best caching method (server cache, CDN, browser caching)
Refactoring to maximize cacheable data is the next step
19
Thank you!
david.r.elfi@intel.com
@elfoTech