Finding cacheable areas in your Web Site using Python and Selenium David Elfi Intel.



What does this session talk about?
- Python
- Performance
- Web applications
- Hands-on session

Caching
- Hot topic in web applications because it gives:
  - Better response time across geographic distribution
  - Better scalability
- Difficult to focus on at development time
- Goal: help developers improve response time

Source: Steve Souders, "Cache is King!"

What to do
- Find text areas that repeat in a web resource (page, JSON response, other dynamic resources) so they can be split into separate responses
- Use the Cache-Control, Expires and ETag HTTP headers to control caching
- Identify all the dependencies of a given URL
  - Even AJAX calls

Proposed Solution
- Take snapshots at different points in time
  - Use Selenium because:
    - It downloads ALL the content
    - JS code needs to run for Ajax
- Compare the snapshots, looking for similarities
  - Split the similar text into separate HTTP responses
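The first step above could be sketched as a small loop that records timestamped snapshots. This is only an illustration: `take_snapshots` and the `fetch` callable are hypothetical names, and in the talk's setup the fetching is done by Selenium rather than by the stand-in used here.

```python
import time


def take_snapshots(fetch, url, count=3, interval=1.0):
    """Call fetch(url) `count` times, `interval` seconds apart.

    Returns a list of (timestamp, content) pairs. `fetch` is any callable
    returning the page content as a string (hypothetical; the talk uses
    Selenium behind a proxy for this step).
    """
    snapshots = []
    for n in range(count):
        snapshots.append((time.time(), fetch(url)))
        if n < count - 1:
            # No need to wait after the last snapshot.
            time.sleep(interval)
    return snapshots


def fake_fetch(url):
    """Stand-in fetcher: a page whose body embeds the current clock."""
    return "<h1>Home</h1><p>%.6f</p>" % time.time()


if __name__ == "__main__":
    for stamp, content in take_snapshots(fake_fetch, "http://example.test/", 3, 0.1):
        print(stamp, content)
```

Keeping the timestamp next to each snapshot costs nothing and leaves room for later analysis of how long content stays unchanged.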

Solution – Snapshots
- Selenium through a forward proxy
[Diagram labels: Proxy (Twisted), Web Server, Data Store, Content]

Running Selenium – Snapshots
- Call Selenium from Python
- Use of WebDriver

    >>> from selenium import webdriver
    >>> br = webdriver.Firefox()
    >>> br.get("...")  # URL not preserved in the transcript
    >>> br.close()

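Twisted Proxy - Snapshots

    from twisted.web import http, proxy

    # The slide only outlines the method bodies. The calls to the base
    # class are an addition so that the proxy keeps forwarding traffic.
    class CacheProxyClient(proxy.ProxyClient):
        def connectionMade(self):
            # Connection made. Prepare object properties.
            proxy.ProxyClient.connectionMade(self)

        def handleHeader(self, key, value):
            # Save response header.
            proxy.ProxyClient.handleHeader(self, key, value)

        def handleResponsePart(self, buf):
            # Store response data.
            proxy.ProxyClient.handleResponsePart(self, buf)

        def handleResponseEnd(self):
            # Finished response transmission. Store it.
            proxy.ProxyClient.handleResponseEnd(self)

    class CacheProxyClientFactory(proxy.ProxyClientFactory):
        protocol = CacheProxyClient

    class CacheProxyRequest(proxy.ProxyRequest):
        # A bytes key works on both Python 2 and Python 3.
        protocols = {b"http": CacheProxyClientFactory}

    class CacheProxy(proxy.Proxy):
        requestFactory = CacheProxyRequest

    class CacheProxyFactory(http.HTTPFactory):
        protocol = CacheProxy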

Selenium + Twisted - Snapshots
- Run Selenium using the proxy

    >>> from selenium import webdriver
    >>> fp = webdriver.FirefoxProfile()
    >>> fp.set_preference("network.proxy.type", 1)
    >>> fp.set_preference("network.proxy.http", "localhost")
    >>> fp.set_preference("network.proxy.http_port", 8080)
    >>> br = webdriver.Firefox(firefox_profile=fp)

Selenium + Twisted - Snapshots
- Configure Twisted and run Selenium in an internal Twisted thread

    from twisted.internet import endpoints, reactor

    endpoint = endpoints.serverFromString(
        reactor, "tcp:%d:interface=%s" % (8080, "localhost"))
    d = endpoint.listen(CacheProxyFactory())
    reactor.callInThread(runSelenium, url_str)
    reactor.run()

All together running

[Diagram: snapshots 1, 2, ..., n -> Comparison method -> Output]

Comparison

    '''Equal sequence searcher'''
    from difflib import SequenceMatcher

    def matchingString(s1, s2):
        '''Compare two strings and return their matching blocks, concatenated.'''
        matcher = SequenceMatcher(None, s1, s2)
        output = ""
        for (i, _, n) in matcher.get_matching_blocks():
            output += s1[i:i+n]
        return output

    def matchingStringSequence(seq):
        '''Compare pairwise, folding each snapshot into the running result.'''
        try:
            matching = seq[0]
            for s in seq[1:]:
                matching = matchingString(matching, s)
            return matching
        except (TypeError, IndexError):
            # Not a sequence of strings, or an empty sequence.
            return ""
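As a usage illustration, the two functions above can be applied to three snapshots of the same page. The functions are repeated in compact form so the block runs on its own; the snapshot strings are invented for this example.

```python
from difflib import SequenceMatcher


def matchingString(s1, s2):
    """Concatenate the blocks that s1 and s2 have in common."""
    matcher = SequenceMatcher(None, s1, s2)
    return "".join(s1[i:i + n] for i, _, n in matcher.get_matching_blocks())


def matchingStringSequence(seq):
    """Fold matchingString over all snapshots, left to right."""
    try:
        matching = seq[0]
        for s in seq[1:]:
            matching = matchingString(matching, s)
        return matching
    except (TypeError, IndexError):
        return ""


# Three invented snapshots: the markup is stable, the timestamp changes.
snapshots = [
    "<h1>News</h1><p>Updated 10:01</p>",
    "<h1>News</h1><p>Updated 10:02</p>",
    "<h1>News</h1><p>Updated 11:47</p>",
]
common = matchingStringSequence(snapshots)
print(common)
```

The stable markup survives in `common`, while the parts of the timestamp that differ between snapshots are dropped. What remains is the candidate for a separately cached response.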

Next Steps
- Split similar texts into separate HTTP responses
- Set Cache-Control
  - public
  - private
  - no-cache
- Set Expires
  - Depending on how long the response should be cached
- Set ETag
  - If the response is big and does change too often
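To make the header choices above concrete, here is a minimal standard-library sketch that builds the three headers for one response body. `caching_headers` and its parameters are hypothetical names, and deriving the ETag from a SHA-256 digest is an assumption of this example, not something the slides prescribe.

```python
import hashlib
import time
from email.utils import formatdate


def caching_headers(body, max_age, shared=True, now=None):
    """Return Cache-Control, Expires and ETag headers for `body` (bytes).

    max_age: seconds the response may be reused; 0 means "revalidate first".
    shared:  True -> "public" (any cache may store it),
             False -> "private" (single-user caches such as the browser).
    now:     reference time as a Unix timestamp (defaults to the clock).
    """
    now = time.time() if now is None else now
    if max_age <= 0:
        cache_control = "no-cache"
    else:
        scope = "public" if shared else "private"
        cache_control = "%s, max-age=%d" % (scope, max_age)
    return {
        "Cache-Control": cache_control,
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now + max(max_age, 0), usegmt=True),
        # Validator derived from the exact bytes sent (assumption).
        "ETag": '"%s"' % hashlib.sha256(body).hexdigest()[:16],
    }


if __name__ == "__main__":
    for name, value in caching_headers(b"<h1>News</h1>", 3600).items():
        print("%s: %s" % (name, value))
```

A framework-specific version would attach these values to its own response object; the dictionary form is used here only to stay independent of any web framework.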

Advanced Features to be done
- Detect cache invalidation time from snapshots
- SSL support
- Wait for all AJAX calls
- Selenium scripting
  - Authenticated URLs
  - Full feature sequence
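One possible reading of the first item, detecting cache invalidation time, is to measure how long the content stays unchanged across timestamped snapshots. The sketch below is an assumption about how that might work, not the author's design; `unchanged_intervals` is a hypothetical name and the sample data is invented.

```python
def unchanged_intervals(snapshots):
    """Given (timestamp, content) pairs in time order, return the duration
    of each run during which the content stayed identical.

    A run is measured from the first snapshot showing a version to the
    first snapshot showing the next version. The last, still-open run is
    skipped because its end has not been observed.
    """
    durations = []
    run_start = None
    previous = None
    for stamp, content in snapshots:
        if run_start is None:
            run_start, previous = stamp, content
        elif content != previous:
            durations.append(stamp - run_start)
            run_start, previous = stamp, content
    return durations


# Invented samples: the page changes at t=30 and again at t=90.
samples = [(0, "A"), (10, "A"), (30, "B"), (60, "B"), (90, "C"), (100, "C")]
lifetimes = unchanged_intervals(samples)
suggested_max_age = min(lifetimes) if lifetimes else None
print(lifetimes, suggested_max_age)
```

Taking the shortest observed run as a max-age candidate is a deliberately conservative choice, and the estimate is only as fine-grained as the interval between snapshots.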

Summary
- If caching areas have not been identified before development, this code could save time and effort in doing so
- Caching areas need to be analyzed to find the best caching method (server cache, CDN, browser caching)
- Refactoring to maximize cacheable data is the next step

Thank you