Apache Traffic Server Cache Toolkit API 9 Apr 2014 at ApacheCon NA 2014 1.

Slides:



Advertisements
Similar presentations
More on File Management
Advertisements

Part IV: Memory Management
Apache Traffic Server Extensible Host Resolution at ApacheCon NA 2014.
Free Space and Allocation Issues
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11.
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree.
The Zebra Striped Network File System Presentation by Joseph Thompson.
Serverless Network File Systems. Network File Systems Allow sharing among independent file systems in a transparent manner Mounting a remote directory.
File Management Chapter 12. File Management A file is a named entity used to save results from a program or provide data to a program. Access control.
Adding scalability to legacy PHP web applications Overview Mario A. Valdez-Ramirez.
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Quick Review of material covered Apr 8 B+-Tree Overview and some definitions –balanced tree –multi-level –reorganizes itself on insertion and deletion.
Hands-On Microsoft Windows Server 2003 Administration Chapter 5 Administering File Resources.
© 2004, The Trustees of Indiana University 1 OneStart Workflow Basics Brian McGough, Manager, Systems Integration, UITS Ryan Kirkendall, Lead Developer.
Implementing ISA Server Caching. Caching Overview ISA Server supports caching as a way to improve the speed of retrieving information from the Internet.
1.1 CAS CS 460/660 Introduction to Database Systems File Organization Slides from UC Berkeley.
Hands-On Microsoft Windows Server 2003 Administration Chapter 6 Managing Printers, Publishing, Auditing, and Desk Resources.
1 Lecture 10: TM Implementations Topics: wrap-up of eager implementation (LogTM), scalable lazy implementation.
NovaBACKUP 10 xSP Technical Training By: Nathan Fouarge
ATS SSL Updates ATS Summit Spring 2015 Susan Hinrichs.
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
Database Storage Considerations Adam Backman White Star Software DB-05:
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
IT The Relational DBMS Section 06. Relational Database Theory Physical Database Design.
Oracle10g RAC Service Architecture Overview of Real Application Cluster Ready Services, Nodeapps, and User Defined Services.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
SWEN 302: AGILE METHODS Roma Klapaukh & Alex Potanin.
File Processing - Indexing MVNC1 Indexing Jim Skon.
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
DBMS Implementation Chapter 6.4 V3.0 Napier University Dr Gordon Russell.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
File Storage Organization The majority of space on a device is reserved for the storage of files. When files are created and modified physical blocks are.
Serverless Network File Systems Overview by Joseph Thompson.
Distributed Backup And Disaster Recovery for AFS A work in progress Steve Simmons Dan Hyde University.
® IBM Software Group © 2007 IBM Corporation Best Practices for Session Management
6 th Annual Focus Users’ Conference 6 th Annual Focus Users’ Conference Import Testing Data Presented by: Adrian Ruiz Presented by: Adrian Ruiz.
Making Watson Fast Daniel Brown HON111. Need for Watson to be fast to play Jeopardy successfully – All computations have to be done in a few seconds –
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Operating Systems Lecture 14 Segments Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of Software Engineering.
1 Secure Peer-to-Peer File Sharing Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan MIT Laboratory.
Implementing ISA Server Caching
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
Chapter 5 Index and Clustering
Module 4: Managing Access to Resources. Overview Overview of Managing Access to Resources Managing Access to Shared Folders Managing Access to Files and.
11.1 Silberschatz, Galvin and Gagne ©2005 Operating System Principles 11.5 Free-Space Management Bit vector (n blocks) … 012n-1 bit[i] =  1  block[i]
Accelerating PHP Applications Ilia Alshanetsky O’Reilly Open Source Convention August 3rd, 2005.
20 Copyright © 2008, Oracle. All rights reserved. Cache Management.
Version Control and SVN ECE 297. Why Do We Need Version Control?
Module 4: Managing Access to Resources. Overview Overview of Managing Access to Resources Managing Access to Shared Folders Managing Access to Files and.
Session 11: Cookies, Sessions ans Security iNET Academy Open Source Web Development.
Apache Solr Dima Ionut Daniel. Contents What is Apache Solr? Architecture Features Core Solr Concepts Configuration Conclusions Bibliography.
Explore engage elevate Data Migration Without Tears Mike Feingold Empoint Ltd Tuesday 10th November 2015.
WHAT'S THE DIFFERENCE BETWEEN A WEB APPLICATION STREAMING NETWORK AND A CDN? INSTART LOGIC.
Varnish Cache and its usage in the real world Ivan Chepurnyi Owner EcomDev BV.
File System Consistency
Jonathan Walpole Computer Science Portland State University
Module 11: File Structure
Module 4: Managing Access to Resources
CSE-291 (Cloud Computing) Fall 2016
Web Caching? Web Caching:.
Zhilin Huang Disk Hot Swap Zhilin Huang
Storage Virtualization
Chapter 9: Virtual-Memory Management
Hash-Based Indexes Chapter 11
Morgan Kaufmann Publishers Memory Hierarchy: Cache Basics
TSConfig & Lua Config Better together
Creating and Managing Folders
Presentation transcript:

Apache Traffic Server Cache Toolkit API 9 Apr 2014 at ApacheCon NA

Speaker Alan M. Carroll, Apache Member, PMC – Started working on Traffic Server in summer – Implemented Transparency, IPv6, other stuff (like cache!) – Works for Network Geographics Provides ATS and other development services 9 Apr 2014 Network Geographics at ApacheCon NA

Outline Basics of the current cache and API The Cache Toolkit API In practice – using the toolkit to build solutions to current user issues. 9 Apr 2014 Network Geographics at ApacheCon NA

CURRENT CACHE AND API What do we have now? 9 Apr 2014Network Geographics at ApacheCon NA 20144

Cache Structure Cache span – storage on a specific physical device or file. Cache volume – administrative unit of cache storage Cache stripe – intersection of cache span and cache volume 9 Apr 2014 Network Geographics at ApacheCon NA

Spans, Volumes, Stripes The default is to split each physical span in to a stripe for each volume. The volumes can be differently sized leading to differently sized stripes. The stripes are not currently visible from outside the ATS core, administrative control is of volumes. 9 Apr 2014Network Geographics at ApacheCon NA 20146

Stripes Cache stripes are the internal base storage. Each stripe has its own directory of objects. Object storage is a circular buffer. Each object is stored entirely in a single stripe. 9 Apr 2014 Network Geographics at ApacheCon NA

Stripe Assignment Storing an object requires selecting the stripe. This is called stripe assignment. Reading an object requires remembering where it was assigned for writing. 9 Apr 2014 Network Geographics at ApacheCon NA

Keys and IDs Each object has a cache key which is a string. – Default is the URL for the object. Cache keys are hashed to create a cache id. The cache id is used as a locator in a stripe directory. Stripe assignment is independent of cache keys and cache ids. 9 Apr 2014 Network Geographics at ApacheCon NA

Notes Cache misses are usually fast, no I/O required. Objects can be evicted from cache due to overwrite via the circular buffer. – But, no fragmentation or GC issues. – No concept of “full” or “empty”. Data is never updated, only written anew. 9 Apr 2014 Network Geographics at ApacheCon NA

Fragments Serialized object data is stored in a fragment. Fragments are grouped for write if small but read individually. Each fragment has a Doc which is the header. All metadata for an object is stored in the first fragment. Every stored fragment has an entry in the stripe directory. 9 Apr 2014 Network Geographics at ApacheCon NA

Stripe Directory 9 Apr 2014 Network Geographics at ApacheCon NA

Large Objects Large objects are stored in multiple chained fragments in a single stripe. – There is a target maximum fragment size Because an object is always in a single stripe, object size is limited by stripe size, not volume size. – Therefore also by cache span size. – Therefore adding disks doesn’t let you store larger objects, only more objects. 9 Apr 2014 Network Geographics at ApacheCon NA

Multi-Fragment Object The cache key is converted to a cache ID which is then used to locate a directory entry (“Dir”). This in turn is used to locate the first fragment with the metadata (“First Doc”) for the object. “Earliest Doc” is the first fragment with content. Each alternate has a different Earliest Doc. 9 Apr 2014Network Geographics at ApacheCon NA

Alternates An object can have multiple stored versions. Each version is an alternate. Alternates default based on the Vary field. – Can be changed via configuration or plugin. Alternates have the same first fragment but different earliest fragments. 9 Apr 2014 Network Geographics at ApacheCon NA

Current API Cacheability. Cache key. Alternate selection. Limited ability to scan objects. – Metadata (HttpInfo) API appears unimplemented. No access to cache data structures. 9 Apr 2014 Network Geographics at ApacheCon NA

API Problems Can’t interact with stripes. Very limited inspection of cache state. Unstable, partially implemented, and undocumented. 9 Apr 2014 Network Geographics at ApacheCon NA

CACHE TOOLKIT API What to build 9 Apr 2014Network Geographics at ApacheCon NA

Motivations Inadequacies of current API Traffic Server increasingly used in more specialized / vertical ways – Leads to desire for very custom extensions Always good to get increased code cleanliness and modularity – Mechanisms used for the API can also be used by the core for better modularity. 9 Apr 2014 Network Geographics at ApacheCon NA

Demotivations Users don’t want – Extra testing and design for the general case. – Constantly porting changes to new versions. – Publishing code – Legacy maintenance – Hiring expensive expertise like me We don’t want – Code complexity used in just one deployment – Similarly for configuration complexity 9 Apr 2014 Network Geographics at ApacheCon NA

Back Story History – Started as tiered storage design effort. Initial concept at previous ATS Denver Summit. – Presented at Yahoo! ATS Summit. Everyone wanted different tiered features. Then they wanted different cache features. Since I made the mistake of learning a little bit of how the cache worked, I was assigned to do this. 9 Apr 2014 Network Geographics at ApacheCon NA

A Brilliant Idea Avoid these issues and get the good stuff with a general (“toolkit”) API for the cache that can be used with plugins. “Brilliant!” – Leif “Supremely clever!” – Brian “Inspired!” – James “Is it done yet?” - users 9 Apr 2014 Network Geographics at ApacheCon NA

Goals Make cache state, data, operations visible. Override cache actions dynamically. Initiate cache actions. Provides the tools for users to build the cache feature they want. 9 Apr 2014 Network Geographics at ApacheCon NA

Visibility making what was unseen visible Expose internal structures – Directories, directory entries. – Cache volumes, disks, stripes. Opaque pointer with accessor functions. – Plugin gets pointer then calls functions for data. Iteration functions for search. Useful itself but also required by rest of API. 9 Apr 2014 Network Geographics at ApacheCon NA

Hooks Getting in to the action Cache events – Wrap, disk fail, life cycle, etc. Object events – Directory miss, disk miss, evacuation, delete (?) Special case support for pure statistic hook? Stripe assignment. 9 Apr 2014 Network Geographics at ApacheCon NA

Stripe Assignment Stripe assignment controls accessibility and storage for objects. Parallel write. Parallel search on multiple stripes for read. – Callback per read complete, can accept or wait. Retry option for read – Sequential stripe searching. – Alternate key searches. 9 Apr 2014 Network Geographics at ApacheCon NA

Cache Operations Copy objects from stripe A to stripe B – More efficient to parallel store if possible Probe, delete directory entries. Evacuation marking. Read, write cache objects. Alternate control. 9 Apr 2014 Network Geographics at ApacheCon NA

Reading from Cache The plugin is called once the client request has been parsed. It should then initiate cache reads (or fail and mark the request as a cache miss). When a read completes the plugin can accept that as that source for a cache hit, discard the read, and/or initiate additional read operations. The plugin can reject any completed read as a miss. 9 Apr 2014Network Geographics at ApacheCon NA

Miscellaneous Allow storage and volume definitions to have key/value pairs that are visible to the API. Cache key / ID control – General string input. – Alternative hashing. Enable stripes made out of ram. – Potentially run no-disk cache. 9 Apr 2014 Network Geographics at ApacheCon NA

USE CASES Out in the fields 9 Apr 2014Network Geographics at ApacheCon NA

Awesome Yet Practical Not just a cool API, but useful for actual deployment problems! Design was use case driven. So let’s look at some of them. 9 Apr 2014 Network Geographics at ApacheCon NA

Monitoring Problem: Can’t optimize cache use if we have no knowledge of what’s happening How much of my content is still valid? How many directory misses with disk I/O are occurring? What is the average chain length in the directory? How are objects distributed across stripes? 9 Apr 2014 Network Geographics at ApacheCon NA

Monitoring Solution: Use API to explore / monitor cache state. – Hooks to Watch for interesting events. Generate statistics. Generate logging data for post processing. – API to examine raw cache state One off plugins for debugging. Build external tools for examination. (product idea, don’t steal!) 9 Apr 2014 Network Geographics at ApacheCon NA

Annotation Problem: want better information about cache contents. Monitoring can be used to accumulate more extensive data about cache contents. – Live information. TS data. Custom tags/tables. – Offline log processing. 9 Apr 2014 Network Geographics at ApacheCon NA

Tiered Storage Problem: – Two or more types of storage of significantly different access speeds. – Want to control content location Solution: – Create cache volumes based on storage type – Assign objects to stripes based on type of object and stripe storage. 9 Apr 2014 Network Geographics at ApacheCon NA

Tiered Storage 2 Can use multiple write to store on multiple tiers. – Demoting means just delete in directory (no I/O) Can still copy if needed. Can search tiers in parallel or serial 9 Apr 2014 Network Geographics at ApacheCon NA

Persistence Control Problem: some content should be kept for long periods, longer than a stripe wrap. Solution: segregate content by persistent vs. transitory across stripes. – Large stable objects only evicted by other large stable objects, not small temporary objects. – Can read failover from long term to short term – Potentially do generational GC 9 Apr 2014 Network Geographics at ApacheCon NA

Cache Key Flexibility Problem: Cache lookup commits to a single stripe. – Variant keys can causes misses – Any change in stripe assignment hides objects Solution: Use parallel read or retry – Read from a different stripe – Use a different key 9 Apr 2014 Network Geographics at ApacheCon NA

Large Hot Objects Problem: A small number of large objects that are frequently accessed, causing excessive disk load on a stripe. Solution: Write the objects to multiple stripes and use stripe assignment to rotate access across the stripes. 9 Apr 2014 Network Geographics at ApacheCon NA

Export If you can examine the cache via an API, you can export it. Cache state is becoming valuable to many users, export can preserve or copy it. Import can be done without changes, if with difficulty – But should try to make it easier 9 Apr 2014 Network Geographics at ApacheCon NA

Cache Cleaning Problem: clean non-stale objects Solutions: – List of URLs (or URL regex) to purge. – Force cache miss on read if URL match. – Background scan to pre-emptively delete. 9 Apr 2014 Network Geographics at ApacheCon NA Note: This can be done in the current API. It will just become much easier.

APPENDIX Scripts and Resources 9 Apr 2014Network Geographics at ApacheCon NA

State of Development Some work has been done on prototyping some of the features. – Learned much about the cache implementation Design is mostly finalized. Positive response so far from the community. Project is funded by Comcast. Development should start late spring this year. 9 Apr 2014 Network Geographics at ApacheCon NA

Versioning / Custom Metadata Speculative feature! Fragment type is not really used currently – Overload it to allow custom parsing of metadata – Use it to version objects/stripes. Need _flen too, but that’s OK. This would be quite complex – Need API to unpack alternates. 9 Apr 2014 Network Geographics at ApacheCon NA

Resources ATS has online documentation, a wiki, mailing lists, bug tracker, and IRC channel. Access these via – Active community – become involved! NG Consulting services – 9 Apr 2014 Network Geographics at ApacheCon NA