Gorilla: A Fast, Scalable, In-Memory Time Series Database

Facebook, VLDB 2015, 王夏青

Abstract
Gorilla: Facebook’s in-memory TSDB (time series database)
Key: strike the right balance between efficiency, scalability, and reliability
Optimized to remain highly available for writes and reads, even in the face of failures
At the expense of possibly dropping small amounts of data on the write path

Abstract; Introduction; Background & Requirements; Comparison with TSDB Systems; Gorilla Architecture; New Tools on Gorilla; Experience & Future Work; Conclusion

Introduction
Motivation:
Large-scale internet services aim to be highly available and responsive for their users
An important requirement: accurately monitor the health and performance of the underlying system and quickly identify and diagnose problems
Scale: thousands of individual systems running on many thousands of machines, often across multiple geo-replicated datacenters

Introduction
Constraints:
Writes dominate
State transitions
High availability
Fault tolerance
Gorilla:
New TSDB that satisfies these constraints
Functions as a write-through cache of the most recent data entering the monitoring system
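The write-through cache role can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `WriteThroughCache`, the use of a plain dict as a stand-in for the long-term store, and the 26-hour retention default (taken from the requirements listed later in the deck). It is not Gorilla's actual interface.

```python
class WriteThroughCache:
    """Hypothetical sketch: every write goes to the long-term store and to memory."""

    def __init__(self, backing_store, retention_seconds=26 * 3600):
        self.backing_store = backing_store  # stand-in for the long-term store
        self.retention = retention_seconds  # how much recent data stays in memory
        self.recent = {}                    # key -> list of (timestamp, value)

    def write(self, key, ts, value):
        # Write-through: persist first, then keep a copy of the recent point in memory.
        self.backing_store.setdefault(key, []).append((ts, value))
        self.recent.setdefault(key, []).append((ts, value))

    def evict(self, now):
        # Drop in-memory points older than the retention window.
        cutoff = now - self.retention
        for key, points in self.recent.items():
            self.recent[key] = [p for p in points if p[0] >= cutoff]

    def read(self, key, start, end, now):
        # Recent ranges are served from memory; older ranges fall back to the store.
        source = self.recent if start >= now - self.retention else self.backing_store
        return [p for p in source.get(key, []) if start <= p[0] <= end]


store = {}
cache = WriteThroughCache(store)
cache.write("web.requests", 100, 1.0)
print(cache.read("web.requests", 0, 200, now=200))
```

The point of the design is that queries over recent data never touch the slower store, while older data remains available there.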

Introduction
Insight:
Users of monitoring systems do not place much emphasis on individual data points but rather on aggregate analysis
Monitoring systems do not store any user data, so traditional ACID guarantees are not a core requirement
Recent data points are of higher value than older points

Introduction
Challenge:
High data insertion rate
Total data quantity
Real-time aggregation
Reliability requirements

Background & Requirements
Operational Data Store (ODS)
Monitoring system
Read performance issues

Background & Requirements
2 billion unique time series identified by a string key
700 million data points added per minute
Store data for 26 hours
More than 40,000 queries per second at peak
Reads succeed in under one millisecond
Support time series with 15-second granularity
Two in-memory, not co-located replicas
Always serve reads even when a single server crashes
Ability to quickly scan over all in-memory data
Support at least 2x growth per year
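A quick back-of-the-envelope calculation shows why these numbers make compression necessary. The insertion rate and retention period come from the list above; the 16 bytes per point is an assumption (an 8-byte time stamp plus an 8-byte double, with no compression and no per-key overhead counted).

```python
POINTS_PER_MINUTE = 700_000_000  # from the requirements above
RETENTION_HOURS = 26             # from the requirements above
BYTES_PER_POINT = 16             # assumption: 8-byte time stamp + 8-byte double, uncompressed

points_retained = POINTS_PER_MINUTE * 60 * RETENTION_HOURS
raw_bytes = points_retained * BYTES_PER_POINT

print(f"points retained: {points_retained:,}")          # 1,092,000,000,000
print(f"uncompressed size: {raw_bytes / 1e12:.3f} TB")  # 17.472 TB
```

Roughly a trillion points and about 17.5 TB per replica before compression, which is a lot to hold in memory without a compact encoding.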

Comparison with TSDB Systems
Existing solutions:
OpenTSDB
Whisper (Graphite)
InfluxDB

Gorilla Architecture

Gorilla Architecture
Monitoring data:
3-tuple of a string key, a 64-bit integer time stamp and a double-precision floating point value
A new time series compression algorithm
In-memory data structures arranged to allow fast and efficient scans of all data while maintaining constant-time lookup of individual time series

Gorilla Architecture
Compressing time stamps
Compressing values
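The slide only names the two compression steps. In the Gorilla paper, time stamps are stored as delta-of-deltas and each value is XORed with the previous value, so that regular intervals and slowly changing values cost very few bits. The sketch below illustrates those two ideas without packing any bits. The function names are made up for this example, and the bit-width ranges in `timestamp_bits` are reproduced from memory of the paper, so treat them as an assumption.

```python
import struct


def delta_of_deltas(timestamps):
    """Differences between consecutive deltas of a time stamp sequence."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def timestamp_bits(dod):
    """Bits needed for one delta-of-delta (ranges assumed, see lead-in)."""
    if dod == 0:
        return 1          # single '0' bit
    if -63 <= dod <= 64:
        return 2 + 7      # '10' prefix + 7-bit value
    if -255 <= dod <= 256:
        return 3 + 9      # '110' prefix + 9-bit value
    if -2047 <= dod <= 2048:
        return 4 + 12     # '1110' prefix + 12-bit value
    return 4 + 32         # '1111' prefix + 32-bit value


def float_bits(value):
    """Reinterpret a double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_stream(values):
    """XOR of each value's bit pattern with the previous value's."""
    bits = [float_bits(v) for v in values]
    return [prev ^ cur for prev, cur in zip(bits, bits[1:])]


def meaningful_window(xor):
    """(leading zeros, trailing zeros) of a nonzero 64-bit XOR result."""
    leading = 64 - xor.bit_length()
    trailing = (xor & -xor).bit_length() - 1
    return leading, trailing


# Points arriving roughly every 60 seconds, with slight jitter.
dods = delta_of_deltas([0, 60, 120, 180, 241, 300])
print(dods)                                   # [0, 0, 1, -2]
print(sum(timestamp_bits(d) for d in dods))   # 20

xors = xor_stream([12.0, 12.0, 24.0])
print(xors[0])                                # 0: identical values XOR to zero
print(meaningful_window(xors[1]))             # (11, 52): one meaningful bit
```

Four time stamps that would take 256 bits as raw 64-bit integers need 20 bits here, and a repeated value collapses to a zero XOR, which is what makes the encoding effective on monitoring data.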

Gorilla Architecture
In-memory data structures:
Timeseries Map (TSmap)
Shared pointers
Read-write spin lock & 1-byte spin lock
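One way to picture the TSmap is a dense list of time series (for fast scans) paired with a map from key to position in that list (for constant-time lookup). The sketch below follows that idea under stated assumptions: the class and method names are hypothetical, Python object references stand in for shared pointers, and `threading.Lock` stands in for both the read-write spin lock on the map and the 1-byte spin lock on each series.

```python
import threading


class TimeSeries:
    def __init__(self, key):
        self.key = key
        self.points = []              # (timestamp, value) pairs
        self.lock = threading.Lock()  # stand-in for the per-series 1-byte spin lock

    def append(self, ts, value):
        with self.lock:
            self.points.append((ts, value))


class TSMap:
    def __init__(self):
        self._series = []              # dense list: supports scans over all series
        self._index = {}               # key -> position in _series: constant-time lookup
        self._lock = threading.Lock()  # stand-in for the read-write spin lock

    def get_or_create(self, key):
        with self._lock:
            pos = self._index.get(key)
            if pos is None:
                pos = len(self._series)
                self._series.append(TimeSeries(key))
                self._index[key] = pos
            return self._series[pos]

    def scan(self):
        # Copy the list of references under the lock, then let the caller
        # walk the series without holding the map lock.
        with self._lock:
            return list(self._series)


tsmap = TSMap()
tsmap.get_or_create("cpu.load").append(0, 0.5)
tsmap.get_or_create("cpu.load").append(15, 0.7)
tsmap.get_or_create("mem.free").append(0, 1024.0)
print([(s.key, len(s.points)) for s in tsmap.scan()])
```

Holding the map lock only long enough to copy references keeps scans from blocking writers for long, while each series is protected separately by its own small lock.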

Gorilla Architecture
On-disk structures:
GlusterFS
A Gorilla host -> multiple shards
A single directory per shard
Each directory contains: key lists, append-only logs, complete block files, checkpoint files
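The per-shard layout can be sketched as follows. The four kinds of files come from the slide; everything else is hypothetical: the shard count, the CRC32-based key-to-shard mapping, the root path, and the file names are illustration only and are not taken from Gorilla.

```python
import zlib
from pathlib import PurePosixPath

NUM_SHARDS = 8  # hypothetical shard count


def shard_for_key(key, num_shards=NUM_SHARDS):
    """Map a time series key to a shard (hash choice is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_layout(root, shard_id):
    """Paths for the four kinds of files kept in one shard directory."""
    shard_dir = PurePosixPath(root) / f"shard_{shard_id}"
    return {
        "key_list": shard_dir / "key_list",      # string key -> integer id
        "append_log": shard_dir / "append_log",  # recent points, appended as they arrive
        "block_files": shard_dir / "blocks",     # completed, compressed blocks
        "checkpoints": shard_dir / "checkpoints",  # markers for finished block files
    }


shard_id = shard_for_key("web.requests.latency")
for kind, path in shard_layout("/data/gorilla", shard_id).items():
    print(kind, path)
```

Keeping one directory per shard means a shard can be picked up by a different host simply by reading that one directory from the shared file system.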

Gorilla Architecture
Tolerating single-node, temporary failures with zero observable downtime
Localized failures (such as a network cut to an entire region)
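Combined with the two non-co-located replicas listed in the requirements, this failure handling can be sketched as best-effort writes to both replicas and reads that fall over to whichever replica answers. The `Replica` class and both helper functions are hypothetical names for this illustration, not Gorilla's API.

```python
class Replica:
    def __init__(self, region):
        self.region = region
        self.alive = True
        self.data = {}  # key -> list of (timestamp, value)

    def read(self, key):
        if not self.alive:
            raise ConnectionError(f"replica in {self.region} is unreachable")
        return list(self.data.get(key, []))


def write_both(replicas, key, point):
    # Best effort: a replica that is down simply misses the point,
    # trading a small amount of data loss for write availability.
    for replica in replicas:
        if replica.alive:
            replica.data.setdefault(key, []).append(point)


def read_with_failover(replicas, key):
    # Try replicas in order and return the first successful answer.
    for replica in replicas:
        try:
            return replica.read(key)
        except ConnectionError:
            continue
    raise ConnectionError("no replica available")


primary, secondary = Replica("region-a"), Replica("region-b")
write_both([primary, secondary], "web.errors", (0, 3.0))
primary.alive = False  # simulate a network cut to one region
write_both([primary, secondary], "web.errors", (60, 5.0))
print(read_with_failover([primary, secondary], "web.errors"))
```

Reads keep succeeding while one region is cut off, and the unreachable replica ends up missing the points written during the outage.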

New Tools on Gorilla
Correlation engine
Charting
Aggregations

Experience & Future Work
Fault tolerance:
Network cuts
Disaster readiness
Configuration changes and code pushes
Bugs
Single-node failures

Site-wide error rate debugging

Lessons learned
Prioritize recent data over historical data
Read latency matters
High availability trumps resource efficiency

Future work
Add a second, larger flash-based data store between in-memory Gorilla and HBase
Rewrite the write path to wait longer before writing to HBase

Conclusion
Gorilla: a new in-memory time series database deployed at Facebook
Functions as a write-through cache for monitoring data
Described a new compression scheme that allows us to efficiently store monitoring data
Reduces production query latency
Enables new monitoring tools
Verified Gorilla’s fault tolerance capabilities

Q&A THANKS
