A chicken in every pot: a persistent snapshot memory scaled in time Liuba Shrira and Hao Xu Brandeis University.

Storage systems: the 7-year itch. 1984: rotational delay (FFS). 1991: large memory (LFS). 1998: cheaper disk (Elephant). 2005: a chicken in every pot: a snapshot box on the side.

Trends Hardware: disk is cheap ($1/GB) and getting cheaper. Software industry: Forbes (12/2004) says the need for keeping past state is growing.

Trends cont. - A casino chases a card counter - IT dept. chased by Sarbanes Oxley - Hippocratic DB audited about patient privacy preservation Need to analyze past activity

SNAP: a snapshot system for an object storage system. Goal: storage system capability for back-in-time execution (BITE): the application runs against read-only snapshots without synchronization; analysis in retrospect.

Baseline Requirements for BITE Consistent snapshots: same (old) invariants hold BITE of general code: after-the-fact ad-hoc analysis ( vs predefined SQL access methods) App chooses the snapshot: snapshot state meaningful to app (vs “some time in the past” ) High time “resolution”: fine-grained past analysis (vs backup for recovery)

Over long time-scales.. Living with the past: how close? today: too close (Temporal DB, CVFS) or too far (warehouse - Netezza) Snapshots can be of long-term importance, or transient today: uniform - apps can not discriminate Inherent tension: latency of access vs cost of representation (space and time) today: limited adaptation - compress or not

Capturing past states. Two ways: Cheap: no-overwrite update; the past stays put, copy the new: less to write, but bloated DB, and the past inherits the same rep. Opportunistic: in-place update; the past is copied out and separated: more to write, but can write smartly, can tailor the past rep, and the DB stays clustered (vigor).

Our requirements: Non-disruptive past: just right distance - separated At adaptive distance: e.g. faster BITE on more recent states Discriminated past: application classifies, snapshot system filters: Some snapshots outlive others, some can be accessed faster Flexible classification: e.g. after the fact

Snapshot system operations Request to take a snapshot (declaration): sid: snapshot_request (filter_spec) Request to access a snapshot v: snapshot_access (sid) Request to specify a filter for a snapshot v: lazy_filter (sid,filter_spec) T1, T2, S1, T3, T4, T5, S2,…
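A minimal sketch of the three operations, for illustration only. The class name `SnapshotSystem`, the `commit` helper, and the string filter specs are hypothetical; the sketch also assumes that a snapshot reflects exactly the transactions committed before its declaration, as the history T1, T2, S1, T3, T4, T5, S2 suggests.

```python
from typing import Dict, List, Optional


class SnapshotSystem:
    """Toy model of the snapshot operations; not the SNAP implementation."""

    def __init__(self) -> None:
        self.history: List[str] = []               # commit order, e.g. "T1", "S1"
        self.filters: Dict[int, Optional[str]] = {}
        self.next_sid = 1

    def commit(self, txn: str) -> None:
        """Record a committed transaction (hypothetical helper)."""
        self.history.append(txn)

    def snapshot_request(self, filter_spec: Optional[str] = None) -> int:
        """Declare a snapshot, optionally with a filter spec; return its sid."""
        sid = self.next_sid
        self.next_sid += 1
        self.filters[sid] = filter_spec
        self.history.append(f"S{sid}")
        return sid

    def lazy_filter(self, sid: int, filter_spec: str) -> None:
        """Attach or change a filter spec after the fact."""
        if sid not in self.filters:
            raise KeyError(f"unknown snapshot {sid}")
        self.filters[sid] = filter_spec

    def snapshot_access(self, sid: int) -> List[str]:
        """Return the transactions whose effects snapshot sid reflects."""
        cut = self.history.index(f"S{sid}")
        return [h for h in self.history[:cut] if h.startswith("T")]


snap = SnapshotSystem()
snap.commit("T1")
snap.commit("T2")
s1 = snap.snapshot_request()            # S1, filter deferred
for t in ("T3", "T4", "T5"):
    snap.commit(t)
s2 = snap.snapshot_request("long")      # S2, filter known at declaration
snap.lazy_filter(s1, "short")           # S1 classified after the fact
```

Here `snap.snapshot_access(s1)` sees only T1 and T2, while `snap.snapshot_access(s2)` sees T1 through T5.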

Baseline storage system General interface: pages and a page table transactions access objects on pages Server: DB disk: slotted pages of objects physical oid (page#,o#) and a page table Transaction Log Cache: pages and modified object cache

Storage system, cont. optimistic CC+ARIES Clients fetch pages, run transactions, send modified objects to server Server validates, commits (WAL), caches committed modifications no-FORCE, no-STEAL

The snapshot system Archive separated from DB: Archive i/o sequential, DB random Copy-on-write (COW): copy out snapshot states into archive just before updating DB during cleaning.

Snapshot interface Same as DB - Snapshot Pages Snapshot Page Table So BITE is transparent: BITE on snapshot S(v) uses PageTable(v)

Snapshot system: below the interface: Some S(v) pages are in the archive, some in DB, and pages in the archive can have different representations

BITE (v): namespace redirection
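A small sketch of how copy-on-write and namespace redirection fit together, under simplifying assumptions: the class `SnapshotStore` and its methods are hypothetical, the page set is fixed, and the pre-state is copied out at write time rather than during cleaning as in SNAP.

```python
class SnapshotStore:
    """Toy copy-on-write store with per-snapshot page redirection."""

    def __init__(self, pages):
        self.db = dict(pages)   # current database pages
        self.archive = []       # append-only archive (sequential writes)
        self.recorded = []      # recorded[v] = {page id: archive address}

    def declare_snapshot(self):
        """Start a new snapshot span and return its number."""
        self.recorded.append({})
        return len(self.recorded) - 1

    def write(self, page, contents):
        """Overwrite a page, copying out its pre-state once per span."""
        if self.recorded and page not in self.recorded[-1]:
            self.archive.append(self.db[page])
            self.recorded[-1][page] = len(self.archive) - 1
        self.db[page] = contents

    def read(self, page, v=None):
        """Read from the current DB (v is None) or from snapshot v (BITE)."""
        if v is None:
            return self.db[page]
        # Redirect: the first span at or after v that recorded the page
        # holds the state the page had when v was declared.
        for table in self.recorded[v:]:
            if page in table:
                return self.archive[table[page]]
        # Never overwritten since v: the snapshot page is still in the DB.
        return self.db[page]


store = SnapshotStore({"P": "p0", "Q": "q0"})
v0 = store.declare_snapshot()
store.write("P", "p1")
v1 = store.declare_snapshot()
store.write("Q", "q1")
```

In this run, snapshot `v0` reads P from its own recorded pages, Q from the page that `v1` recorded, and snapshot `v1` reads P straight from the DB, so the same code can run against either a snapshot or the current state.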

Creating non-disruptive snapshots: (i/o bound system) Archiving snapshot states when cleaning can slow down cleaning compared to a system without snapshots. Copying to the archive disk (sequential I/O) in parallel to database I/O (random) can partially hide archiving cost behind database I/O.

Creating snapshots: how well can you hide? Is determined by: how much is archived: compactness of snapshot representation, frequency, snapshot update workload (overwriting) cost of archiving, sequential, other archive traffic – BITE

Creating snapshots: some issues Issue: avoid overwriting snapshot states (without blocking, pinning etc) Issue: update snapshot meta data efficiently (large, dynamic page tables ) Issue: filter out long-lived snaps (focus here)

New techniques for copy-out snapshots: - VMOB: in-memory versioned data structure preserves snapshot states w/out blocking - LPT: incrementally archived page table with logarithmic reconstruction cost - Filtering: exploit smart representation for past states (focus here)

Filtering: motivation Want unlimited past at high resolution but some snapshots are transient others of long-term interest to application application needs to discriminate between snapshots

Thresher: a filtering system for SNAP

Snapshot representation What can representation do for filtering? life-time based allocation – avoids fragmentation diff-based encoding – reduces cost of copying adaptive combination - real winner

Example: hierarchical snapshots at multiple time granularity. ICU patient monitoring DB takes snapshots: minute by minute: vital sign monitor readings; hourly: includes nurse's writeup summarizing monitor readings; daily: includes doctor's notes summarizing nurse's checkups. Doctors' snapshots have a longer lifetime than nurses'…

Brief overview: snapshot creation Some notation: Snapshot span Recorded pages example:..v4, T: w (x_P), T’: w (y_S), v5, T’’.. Span of v4 : T, T’ Pages recorded by snapshot v4: P, S

Incremental snapshot creation: Archived snapshot pages are dispersed through the archive: v4 records P, S; v5 records P, Q; … Archived snapshot page tables (PT): PT(v4): addr(P4), addr(S4); PT(v5): addr(P5), addr(Q5); … Another talk: how to construct archived page tables (APT): Construct APT(v4) = recorded(v4) + Construct APT(v5)
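The recurrence Construct APT(v4) = recorded(v4) + Construct APT(v5) can be sketched as a plain recursion. This is a naive, linear-cost illustration and not the logarithmic LPT scheme; the function name `construct_apt` and the string addresses are hypothetical, and snapshots are assumed to be numbered consecutively.

```python
def construct_apt(recorded, v, latest):
    """Build the archived page table for snapshot v.

    recorded maps a snapshot number to {page: archive address} for the
    pages copied out during that snapshot's span. Pages absent from the
    result have not been overwritten since v and are still in the DB.
    """
    if v > latest:
        return {}
    table = dict(construct_apt(recorded, v + 1, latest))
    table.update(recorded[v])   # v's own recorded pages take precedence
    return table


recorded = {
    4: {"P": "P4", "S": "S4"},
    5: {"P": "P5", "Q": "Q5"},
}
apt_v4 = construct_apt(recorded, 4, 5)
apt_v5 = construct_apt(recorded, 5, 5)
```

For the example spans, `apt_v4` maps P to P4, S to S4, and Q to Q5: snapshot v4 depends on a page that was recorded by v5.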

Filtering example: filter out short-lived v5. [Diagram: archive timeline; v4 (doctor's) records P, S; v5 (nurse's) records P, Q; then v6] Filter: retain long-lived v4, reclaim v5: reclaim P5, retain Q5 (v4 needs it). Filtering incremental snapshots creates fragmentation.

Problem: fragmentation fragmented archive, over time: non sequential archive writes or random reads to copy out long lived states

Our approach: filter-spec Filter spec determines relative snapshot lifetime “App knows best”: the app supplies a filter spec the system filters

avoid fragmentation with filter-spec Known at snapshot declaration – use lifetime-based allocation After the fact - use a flexible rep to filter lazily rep allows adaptive trade-off: cost of filtering vs cost of BITE

App specifies filter at declaration. [Diagram: two archive areas; long-lived area holds P4, S4, Q5; short-lived area holds P5] Invariant: to reclaim w/out fragmentation, short-lived areas store no long-lived pages
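One way to picture lifetime-based allocation is the following sketch, which decides the archive area for each copied-out page. The function `place_pages` and the labels "long" and "short" are hypothetical, and the sketch assumes every page already existed when the earliest listed snapshot was declared.

```python
def place_pages(recorded, lifetime):
    """Choose an archive area for every recorded page.

    recorded: list of (sid, set of pages) in declaration order.
    lifetime: sid -> "long" or "short", known at declaration.
    A page recorded by snapshot v is also needed by each earlier snapshot
    that has no copy of that page recorded between itself and v.
    """
    placement = {}
    for i, (sid, pages) in enumerate(recorded):
        for page in sorted(pages):
            needed_by = [sid]
            for earlier_sid, earlier_pages in reversed(recorded[:i]):
                if page in earlier_pages:
                    break   # that snapshot and older ones use this earlier copy
                needed_by.append(earlier_sid)
            is_long = any(lifetime[s] == "long" for s in needed_by)
            placement[(page, sid)] = "long" if is_long else "short"
    return placement


placement = place_pages(
    [(4, {"P", "S"}), (5, {"P", "Q"})],
    {4: "long", 5: "short"},
)
```

With v4 long-lived and v5 short-lived, P4, S4, and Q5 land in the long-lived area and only P5 lands in the short-lived area, so reclaiming v5 frees a whole area without leaving holes.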

FilterTree: filter pages for free

After-the-fact (lazy) filtering Some applications want to defer filter specification Lazy filtering requires copying We can specialize representation (compact) to reduce copying cost

Compact representation: diffs Two components filtered separately: compact diffs – reduce cost of copying (diffs clustered by page) checkpoints – accelerate BITE (page-based snapshots system-declared, can use FilterTree)

Adaptive trade-off Like recovery log: less frequent checkpoints increase compactness more frequent checkpoints accelerate BITE
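The trade-off can be illustrated with a toy archive that stores a full checkpoint every `interval` versions and a diff otherwise. This is a sketch under assumptions not stated in the talk: diffs are taken forward from the previous version, a page is modeled as a dict of object values, and objects are never deleted. The names `build_archive` and `reconstruct` are hypothetical.

```python
def build_archive(versions, interval):
    """Store a full checkpoint every `interval` versions, diffs otherwise."""
    checkpoints, diffs = {}, {}
    for i, state in enumerate(versions):
        if i % interval == 0:
            checkpoints[i] = dict(state)
        else:
            prev = versions[i - 1]
            # Keep only the objects whose value changed since version i-1.
            diffs[i] = {k: v for k, v in state.items() if prev.get(k) != v}
    return checkpoints, diffs


def reconstruct(checkpoints, diffs, i, interval):
    """Rebuild version i; also return how many diffs had to be applied."""
    base = i - i % interval
    state = dict(checkpoints[base])
    for j in range(base + 1, i + 1):
        state.update(diffs[j])
    return state, i - base


versions = [
    {"a": 0, "b": 0},
    {"a": 1, "b": 0},
    {"a": 1, "b": 2},
    {"a": 3, "b": 2},
    {"a": 3, "b": 4},
]
frequent = build_archive(versions, 2)   # more checkpoints, faster BITE
sparse = build_archive(versions, 4)     # fewer checkpoints, more compact
state_f, applied_f = reconstruct(*frequent, 3, 2)
state_s, applied_s = reconstruct(*sparse, 3, 4)
```

Both settings rebuild the same version 3, but the frequent setting applies one diff while the sparse setting applies three, at the price of storing three checkpoints instead of two.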

Lazy filtering: checkpoints filtered for free [Figure: FilterTree for checkpoints (B1, B2, B3, …); archive regions G1, G2 (diffs) for diff extents E1, E2, E3]

But some applications want more: lazy filtering and faster BITE e.g. - app runs BITE on batch of recent snapshots to decide which ones to retain - needs fast BITE to keep up..

Combined hybrid Faster BITE in recent window and Lazy filtering

Hybrid: checkpoints and checkpoint filtered for free

Status Implemented: SNAP and Thresher for the Thor storage system. Performance results: encouraging. Here is a 5,000-foot view:

Performance metrics Cost of filtering: non-disruptiveness = rate-of-drain / rate-of-pour; t_clean determines rate-of-drain; workload parameter: overwriting. Compactness of diff-based rep: retention relative to page-based rep; R_diff - fixed; R_ckp - tunable by frequency of checkpoints; workload parameter: density. BITE: page-based snapshots vs diff-based vs DB

Non-disruptiveness Storage system w/hybrid snapshots vs w/out snapshots (Thor) How much drop in rate-of-drain / rate-of-pour

Experimental configuration Workloads: extend multiuser OO7 to control density and overwriting. System configuration: single client, medium OO7 – small DB 185MB; multiple clients – large DB 140GB

FilterTree: free!

Non-disruptiveness/ single client “summertime …life is easy”

Non-disruptiveness/multi user: “DB works harder”

Summary: non-disruptive snapshot memory Unlimited filtered past is cheaper than you may think... A chicken in every pot.. Every storage system can have a snapshot box on the side..

To get there: Generalize: ARIES/STEAL (underway); file systems (need extended interfaces). Beyond: upgrades (have techniques); provenance (need ideas)..
