Distributed Data Stores – Facebook
Presented by Ben Gooding, University of Arkansas – April 21, 2015
TAO - Facebook’s Distributed Data Store
- Geographically distributed data store
- Provides efficient and timely access to the social graph
- Read-optimized
Background
- A single Facebook page may aggregate and filter hundreds of items from the social graph
- Each user’s content is custom tailored, so it is infeasible to aggregate and filter at data-creation time; content is aggregated and filtered when it is viewed
- Originally used MySQL and PHP scripts, with results cached
- Problems with this approach:
  - Inefficient edge lists – a change to a single edge requires the entire list to be reloaded
  - Distributed control logic – control logic runs on clients that do not communicate with each other, which creates failure modes
  - Expensive read-after-write consistency
Goals
- Provide basic access to the nodes and edges of a constantly changing graph in data centers across multiple regions
- Optimize heavily for reads, explicitly favoring efficiency and availability over consistency
- Handle most application needs while allowing a scalable and efficient implementation
Data Model
- Objects are nodes in the graph
- Associations are directed edges between objects
- Actions can be represented as either nodes or edges
- Associations naturally model actions that can happen at most once, or recorded state transitions
- Bi-directional relationships are represented as two separate associations; association types that have inverses are configured with an inverse association type (see the sketch below)
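As a rough illustration of the model above, the sketch below defines object and association records and shows how an association type with a configured inverse produces two directed edges. The field names and the inverse-type table are assumptions for illustration, not TAO's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TaoObject:
    id: int                       # node: a globally identified object
    otype: str                    # object type, e.g. "user" or "checkin"
    data: Dict[str, str] = field(default_factory=dict)

@dataclass
class Association:
    id1: int                      # source object id
    atype: str                    # association (edge) type, e.g. "likes"
    id2: int                      # destination object id
    time: int                     # timestamp, useful for ordering recent edges
    data: Dict[str, str] = field(default_factory=dict)

# Hypothetical inverse-type configuration: bi-directional relationships are
# stored as two separate associations, one per direction.
INVERSE_TYPE = {"likes": "liked_by", "friend_of": "friend_of"}

def edges_to_write(assoc: Association) -> List[Association]:
    """Return the association plus its inverse edge, if its type has one configured."""
    edges = [assoc]
    inverse = INVERSE_TYPE.get(assoc.atype)
    if inverse is not None:
        edges.append(Association(assoc.id2, inverse, assoc.id1, assoc.time, dict(assoc.data)))
    return edges
```

For example, writing Association(1, "likes", 2, 1000) would also produce Association(2, "liked_by", 1, 1000) under this configuration.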
Architecture – Storage Layer
- Handles a larger volume of data than can be stored on a single MySQL server
- Data is divided into logical shards; each shard is contained in a logical database
- Database servers handle multiple shards
- The number of shards exceeds the number of servers, which allows shards to be load-balanced across servers (see the sketch below)
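A minimal sketch of the shard mapping described above, using a simple modulo scheme; the constants are made up (in TAO the shard id is actually embedded in the object id, and an object's associations are stored on the shard of their source object).

```python
NUM_SHARDS = 4096    # logical shards: deliberately far more shards than servers
NUM_SERVERS = 64     # database servers, each hosting many shards

def shard_for_object(object_id: int) -> int:
    """Map an object to its logical shard."""
    return object_id % NUM_SHARDS

def shard_for_association(id1: int) -> int:
    """Associations are stored on the shard of their source object, id1."""
    return shard_for_object(id1)

# The shard-to-server assignment is a level of indirection: because shards
# outnumber servers, shards can be reassigned to balance load without
# resharding the underlying data.
shard_to_server = {shard: shard % NUM_SERVERS for shard in range(NUM_SHARDS)}

def server_for_shard(shard: int) -> int:
    return shard_to_server[shard]
```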
Architecture – Caching Layer
- Consists of multiple cache servers that together form a tier
- A tier is collectively capable of responding to any TAO request
- Each request maps to a single cache server using a sharding scheme
- Clients issue requests directly to a cache server
- Cache entries are filled on demand and evicted using a least-recently-used policy (sketched below)
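A sketch of a demand-filled cache server with least-recently-used eviction, plus the shard-based mapping of a request to one server in the tier. The class and function names are illustrative, not TAO's actual interfaces.

```python
from collections import OrderedDict
from typing import Callable, List

class CacheServer:
    def __init__(self, capacity: int, fill: Callable[[str], str]):
        self.capacity = capacity
        self.fill = fill                       # called on a miss: fill on demand
        self.entries = OrderedDict()

    def get(self, key: str) -> str:
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        value = self.fill(key)                 # demand fill from the next layer down
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value

def server_for_request(shard: int, tier: List[CacheServer]) -> CacheServer:
    """Within a tier, each request's shard maps to exactly one cache server."""
    return tier[shard % len(tier)]
```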
Architecture – Leaders and Followers
- Large tiers are problematic: they are prone to hot spots
- Leaders read from and write to the storage layer
- Followers forward read misses and writes to a leader
- Clients communicate with a follower and never contact a leader directly
- Care must be taken to keep TAO caches consistent: each shard has one leader, and all writes to that shard go through the leader
- The leader is always consistent; followers are eventually consistent (see the sketch below)
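The sketch below traces the read-miss and write paths for a single shard: a follower forwards misses and writes to the shard's leader, the leader talks to storage, and invalidation messages bring the other followers back in line. It is a simplification under assumed names; TAO's actual consistency messages are richer than a plain invalidation.

```python
class Database:
    """Stand-in for the storage layer (one logical shard)."""
    def __init__(self):
        self.rows = {}
    def read(self, key):
        return self.rows.get(key)
    def write(self, key, value):
        self.rows[key] = value

class Leader:
    """One leader per shard; the only cache that reads from or writes to storage."""
    def __init__(self, db: Database):
        self.db = db
        self.cache = {}
        self.followers = []

    def read(self, key):
        if key not in self.cache:            # miss: go to the storage layer
            self.cache[key] = self.db.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.db.write(key, value)            # the leader is always consistent
        self.cache[key] = value
        for follower in self.followers:
            follower.invalidate(key)         # followers converge eventually

class Follower:
    """Clients talk only to followers; misses and writes are forwarded to the leader."""
    def __init__(self, leader: Leader):
        self.leader = leader
        self.cache = {}
        leader.followers.append(self)

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.leader.read(key)        # forward the read miss
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.leader.write(key, value)        # forward the write to the leader
        self.cache[key] = value

    def invalidate(self, key):
        self.cache.pop(key, None)
```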
Architecture – Scaling Geographically
- Follower tiers can be thousands of miles apart, so network round-trip times can become a bottleneck
- Each TAO follower must be local to a tier of databases holding a complete copy of the social graph; a full copy at every data center would be expensive
- Data center locations are therefore clustered into only a few regions
- The local leader forwards writes to the master database for the shard
- Writes that fail while the master is being switched are reported to the client as failed and are not retried
- The master/slave design ensures that all reads can be satisfied within a single region, at the expense of returning potentially stale data to clients (see the routing sketch below)
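A sketch of region-aware routing under this master/slave design, assuming one master region per shard; the region names and the per-shard master table are hypothetical.

```python
NUM_SHARDS = 4096
MASTER_REGION = {shard: "region-a" for shard in range(NUM_SHARDS)}  # hypothetical

def region_for_read(shard: int, local_region: str) -> str:
    # Reads are always satisfied within the local region, even though the
    # local replica may be slightly stale.
    return local_region

def region_for_write(shard: int, local_region: str) -> str:
    # The local leader forwards the write to the region whose database is the
    # master for this shard; the change then replicates back asynchronously.
    return MASTER_REGION[shard]
```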
Architecture - Visualized
Consistency and Fault Tolerance
- Availability and performance take priority, so TAO is eventually consistent
- Database failure: a slave database is promoted to become the new master
- Leader failure: followers route read and write requests around it; read misses go directly to the database, and writes go to a random leader in the tier (sketched below)
- Follower failure: other followers pick up the load
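A sketch of the leader-failure handling listed above: read misses bypass the failed leader and go straight to the database, while writes are redirected to a random leader in the tier. The is_healthy/read/write methods are assumed for illustration.

```python
import random

def read_on_miss(key, leader, database):
    if leader.is_healthy():
        return leader.read(key)
    return database.read(key)             # route the read miss around the failed leader

def write(key, value, leader, leader_tier):
    if leader.is_healthy():
        return leader.write(key, value)
    stand_in = random.choice(leader_tier) # another leader in the tier handles the write
    return stand_in.write(key, value)
```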
Production Workload
- Random sample of 6.5 million requests over 40 days
- Reads are far more common than writes: only 0.2% of requests involved writes
- Most edge queries have empty results
- Query frequency, node connectivity, and data size have distributions with long tails
Performance
- Availability: over a 90-day period, only a fraction 4.9 x 10^-6 of queries failed
- Hit rates and latency: the read hit rate was 96.4%; average write latency was 12.1 msec within the same region and 74.4 msec for remote regions
- Replication lag: slave storage lags behind the master by less than 1 second 85% of the time, less than 3 seconds 99% of the time, and less than 10 seconds 99.8% of the time
Questions