Gorilla: A Fast, Scalable, In-Memory Time Series Database


1 Gorilla: A Fast, Scalable, In-Memory Time Series Database
Facebook, VLDB 2015
王夏青

2 Abstract
Gorilla: Facebook’s in-memory TSDB (time series database)
Key: strike the right balance between efficiency, scalability, and reliability
Optimized to remain highly available for writes and reads, even in the face of failures
At the expense of possibly dropping small amounts of data on the write path

3 Abstract
Introduction
Background & Requirements
Comparison with TSDB Systems
Gorilla Architecture
New Tools on Gorilla
Experience & Future Work
Conclusion

4 Introduction
Motivation:
Large-scale internet services must be highly available and responsive for their users
An important requirement: accurately monitor the health and performance of the underlying system, and quickly identify and diagnose problems
Scale: thousands of individual systems running on many thousands of machines, often across multiple geo-replicated datacenters

5 Introduction
Constraints:
Writes dominate
State transitions
High availability
Fault tolerance
Gorilla:
A new TSDB that satisfies these constraints
Functions as a write-through cache of the most recent data entering the monitoring system
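
The write-through cache role named on this slide can be sketched as follows. This is a minimal illustration, not Gorilla's implementation: the class name `WriteThroughCache`, the `retention_seconds` parameter, and the plain dictionary standing in for the long-term store are all assumptions made for the example.

```python
from collections import defaultdict


class WriteThroughCache:
    """Hypothetical sketch: every write goes to the backing store and to memory."""

    def __init__(self, backing_store, retention_seconds):
        self.backing_store = backing_store          # stand-in for the long-term store
        self.retention_seconds = retention_seconds  # how much recent data stays in memory
        self.recent = defaultdict(list)             # key -> [(timestamp, value), ...]

    def write(self, key, timestamp, value):
        # Write-through: persist the point, then keep a copy in memory.
        self.backing_store.setdefault(key, []).append((timestamp, value))
        self.recent[key].append((timestamp, value))
        # Drop in-memory points that fell out of the retention window.
        cutoff = timestamp - self.retention_seconds
        self.recent[key] = [p for p in self.recent[key] if p[0] > cutoff]

    def read(self, key, start, end):
        # Serve from memory when the range is recent enough, else fall back.
        points = self.recent.get(key, [])
        if points and start >= points[0][0]:
            return [p for p in points if start <= p[0] <= end]
        return [p for p in self.backing_store.get(key, []) if start <= p[0] <= end]
```

Reads for recent ranges never touch the backing store, which is the property that makes such a cache useful for dashboards and alerting on fresh data.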

6 Introduction
Insight:
Users of monitoring systems do not place much emphasis on individual data points but rather on aggregate analysis
The system does not store any user data, so traditional ACID guarantees are not a core requirement
Recent data points are of higher value than older points

7 Introduction
Challenge:
High data insertion rate
Total data quantity
Real-time aggregation
Reliability requirements

8 Background & Requirements
Operational Data Store (ODS)
Monitoring system
Read performance issues

9 Background & Requirements
2 billion unique time series identified by a string key
700 million data points added per minute
Store data for 26 hours
More than 40,000 queries per second at peak
Reads succeed in under one millisecond
Support time series with 15 second granularity
Two in-memory, not co-located replicas
Always serve reads even when a single server crashes
Ability to quickly scan over all in-memory data
Support at least 2x growth per year
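
A rough back-of-the-envelope calculation shows what these requirements imply. The insertion rate and retention period come from the slide; the 16 bytes per point is an assumption that counts only an uncompressed 8-byte timestamp and 8-byte value, ignoring keys and index overhead.

```python
POINTS_PER_MINUTE = 700_000_000   # from the slide
RETENTION_HOURS = 26              # from the slide
BYTES_PER_RAW_POINT = 16          # assumption: 8-byte timestamp + 8-byte double, uncompressed

points_per_second = POINTS_PER_MINUTE / 60
points_retained = POINTS_PER_MINUTE * 60 * RETENTION_HOURS
raw_bytes = points_retained * BYTES_PER_RAW_POINT

print(f"{points_per_second:,.0f} points/s")        # 11,666,667 points/s
print(f"{points_retained:,} points retained")      # 1,092,000,000,000 points retained
print(f"{raw_bytes / 1e12:.1f} TB uncompressed")   # 17.5 TB uncompressed
```

Holding roughly a trillion points in memory at that raw size is what motivates the compression scheme covered later in the deck.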

10 Comparison with TSDB Systems
Existing solutions:
OpenTSDB
Whisper (Graphite)
InfluxDB

11 Gorilla Architecture

12 Gorilla Architecture
Monitoring data:
3-tuple of a string key, a 64-bit timestamp integer, and a double-precision floating point value
A new time series compression algorithm
In-memory data structures arranged to allow fast and efficient scans of all data while maintaining constant-time lookup of individual time series
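
The 3-tuple data model can be written down directly. This is only a sketch of the record shape: the names `DataPoint` and `pack_sample` and the example key `web.cpu.user` are hypothetical, and the big-endian layout is an arbitrary choice for illustration.

```python
import struct
from typing import NamedTuple


class DataPoint(NamedTuple):
    key: str          # string key identifying the time series
    timestamp: int    # 64-bit integer timestamp
    value: float      # double-precision floating point value


def pack_sample(point: DataPoint) -> bytes:
    """Pack only the timestamp and value: 8 + 8 = 16 bytes, big-endian."""
    return struct.pack(">qd", point.timestamp, point.value)


point = DataPoint("web.cpu.user", 1_440_000_000, 12.5)
assert len(pack_sample(point)) == 16
```

The 16 bytes per uncompressed sample is the baseline that the compression algorithm has to improve on.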

13 Gorilla Architecture

14 Gorilla Architecture
Compressing timestamps
Compressing values
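
The Gorilla paper compresses timestamps with delta-of-delta encoding and values by XORing each value with its predecessor. The sketch below computes only those intermediate quantities; the variable-length bit encoding that follows them is omitted, and the function names are hypothetical.

```python
import struct


def delta_of_deltas(timestamps):
    """Return the delta-of-delta sequence for timestamps after the first two."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]


def float_to_bits(value):
    """Reinterpret a double as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_with_previous(values):
    """XOR each value's bit pattern with the previous value's bit pattern."""
    bits = [float_to_bits(v) for v in values]
    return [b ^ a for a, b in zip(bits, bits[1:])]


timestamps = [0, 60, 120, 180, 241]
print(delta_of_deltas(timestamps))            # [0, 0, 1]

values = [12.0, 12.0, 24.0]
print([hex(x) for x in xor_with_previous(values)])
```

Points that arrive at a fixed interval produce a delta-of-delta of zero, and repeated values produce an XOR of zero, so both common cases can be stored in very few bits.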

15 Gorilla Architecture

16 Gorilla Architecture

17 Gorilla Architecture

18 Gorilla Architecture

19 Gorilla Architecture

20 Gorilla Architecture

21 Gorilla Architecture

22 Gorilla Architecture
In-memory data structures:
Timeseries Map (TSmap)
Shared pointers
Read-write spin lock & 1-byte spin lock
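
One way to get both fast scans and constant-time lookup is to pair a list with a dictionary that indexes into it. The sketch below illustrates that idea under simplifying assumptions: the class name `TSMap` and its methods are hypothetical, a single `threading.Lock` stands in for both spin locks, and deleted slots are simply left as `None`.

```python
import threading


class TSMap:
    """Hypothetical sketch: a list for scans plus a dict for constant-time lookup."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for the read-write spin lock
        self._series = []               # scan order; None marks a deleted slot
        self._index = {}                # key -> position in _series

    def get_or_create(self, key):
        with self._lock:
            pos = self._index.get(key)
            if pos is None:
                pos = len(self._series)
                self._series.append({"key": key, "points": []})
                self._index[key] = pos
            return self._series[pos]

    def append(self, key, timestamp, value):
        series = self.get_or_create(key)
        # Simplification: the slide mentions a per-series 1-byte spin lock here.
        with self._lock:
            series["points"].append((timestamp, value))

    def delete(self, key):
        with self._lock:
            pos = self._index.pop(key, None)
            if pos is not None:
                self._series[pos] = None   # tombstone the slot

    def scan(self):
        # Walk the list in order, skipping tombstoned slots.
        with self._lock:
            return [s for s in self._series if s is not None]
```

Keeping the scan structure separate from the lookup structure means a full scan never has to walk hash buckets, and a lookup never has to walk the list.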

23 Gorilla Architecture

24 Gorilla Architecture
On-disk structures:
GlusterFS
A Gorilla host -> multiple shards
A single directory per shard
Each directory: key lists, append-only logs, complete block files, checkpoint files
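
As the Gorilla paper describes them, checkpoint files mark that a complete block file was fully flushed, so a block without a checkpoint cannot be trusted after a crash. The sketch below illustrates that convention only; the file names `block.N` and `checkpoint.N`, the directory name `shard_0007`, and both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def flush_block(shard_dir: Path, block_id: int, payload: bytes) -> None:
    """Write a complete block file, then mark it with a checkpoint file."""
    (shard_dir / f"block.{block_id}").write_bytes(payload)
    (shard_dir / f"checkpoint.{block_id}").touch()


def trusted_blocks(shard_dir: Path):
    """Return the ids of block files that have a matching checkpoint file."""
    ids = []
    for block in shard_dir.glob("block.*"):
        block_id = int(block.name.split(".")[1])
        if (shard_dir / f"checkpoint.{block_id}").exists():
            ids.append(block_id)
    return sorted(ids)


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard_0007"
    shard.mkdir()
    flush_block(shard, 1, b"compressed block bytes")
    (shard / "block.2").write_bytes(b"partial")   # simulated crash before checkpoint
    print(trusted_blocks(shard))                  # [1]
```

On restart, data for an untrusted block would have to be recovered from the append-only log instead.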

25 Gorilla Architecture
Tolerating single-node, temporary failures with zero observable downtime
Localized failures (such as a network cut to an entire region)
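
With two replicas that are not co-located, a reader can hide a regional failure by retrying against the other replica. This is a minimal sketch of that pattern, not Gorilla's client logic: `ReplicaDown`, `read_with_failover`, and the two region functions are hypothetical, and the returned data is canned.

```python
class ReplicaDown(Exception):
    """Raised by a replica that cannot serve the request."""


def read_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown as err:
            last_error = err
    raise ReplicaDown(f"no replica could serve {key!r}") from last_error


def region_a(key):
    raise ReplicaDown("region A unreachable")   # simulated network cut


def region_b(key):
    return [(0, 1.0), (60, 1.5)]                # canned data for illustration


print(read_with_failover([region_a, region_b], "web.cpu.user"))
```

The caller sees the same result whether or not the first region is reachable, which is what "zero observable downtime" means from the reader's side.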

26 New Tools on Gorilla
Correlation engine
Charting
Aggregations
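
A correlation engine needs a similarity score between a time series under investigation and many candidates. The sketch below uses the Pearson product-moment correlation coefficient, which is the measure named in the Gorilla paper; ranking by absolute value and treating constant series as zero are assumptions of this example, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0   # a constant series has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Sort candidate series by absolute correlation with the target, strongest first."""
    scored = [(name, pearson(target, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


errors = [1, 2, 3, 4]
candidates = {"deploys": [2, 4, 6, 8], "idle": [5, 5, 5, 5]}
print(rank_by_correlation(errors, candidates)[0][0])   # deploys
```

Such a search is only practical when every candidate series can be scanned quickly, which is where the in-memory layout pays off.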

27 Experience & Future Work
Fault tolerance:
Network cuts
Disaster readiness
Configuration changes and code pushes
Bugs
Single node failures

28 Experience & Future Work
Site wide error rate debugging

29 Experience & Future Work
Lessons learned:
Prioritize recent data over historical data
Read latency matters
High availability trumps resource efficiency

30 Experience & Future Work
Add a second, larger data store, based on flash storage, between in-memory Gorilla and HBase
Rewrite the write path to wait longer before writing to HBase

31 Conclusion
Gorilla: a new in-memory time series database deployed at Facebook
Functions as a write-through cache for monitoring data
Described a new compression scheme that allows us to efficiently store monitoring data
Reduces production query latency
Enables new monitoring tools
Verified Gorilla’s fault tolerance capabilities

32 Q&A THANKS

