A tutorial on building large-scale services


1 A tutorial on building large-scale services
James Ge (戈君)

2 As a large-scale online service
Stable 24/7; aggregates user feedback promptly and iterates quickly; highly efficient testing, deployment, and maintenance; massive parallel development.

3 From the framework perspective

4 Building a service
TCP/IP guarantees reliable data transmission; however, building a service needs more abstractions:
What is the format of the transmitted data?
Can multiple requests be sent over one TCP connection simultaneously?
How do we talk to a cluster of many machines?
What should we do when the connection is broken?
What if the server does not respond?
...

5 RPC
RPC abstracts network communication as "clients calling functions on servers". Data needs to be serialized, which protobuf does well. Creating and reusing connections is transparent to users, but users can choose among connection types: short, pooled, or single. Machines are discovered through naming services. RPCs are retried when the connection is broken. When the server does not respond within the given time, the client fails with a timeout error.
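The retry and timeout rules above can be sketched without any RPC library. This is a minimal sketch under stated assumptions: `call_with_retry`, `CallStatus`, and the attempt callback are hypothetical names, and a broken connection is modeled as a status returned by the attempt rather than a real socket error. The sketch retries only broken connections and treats the timeout as a deadline for the whole call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical outcome of a single RPC attempt.
enum class CallStatus { kOk, kConnectionBroken, kTimeout };

using Clock = std::chrono::steady_clock;

// Runs `attempt` until it succeeds, the retry budget is spent, or the
// overall deadline passes. Only broken connections are retried: a timeout
// may mean the server is still processing, so it is reported as-is.
CallStatus call_with_retry(const std::function<CallStatus(Clock::time_point)>& attempt,
                           int max_retry,
                           std::chrono::milliseconds timeout) {
    const Clock::time_point deadline = Clock::now() + timeout;
    CallStatus status = CallStatus::kConnectionBroken;
    for (int i = 0; i <= max_retry; ++i) {
        if (Clock::now() >= deadline) {
            return CallStatus::kTimeout;  // all attempts share one deadline
        }
        status = attempt(deadline);
        if (status != CallStatus::kConnectionBroken) {
            return status;  // success or timeout: stop retrying
        }
    }
    return status;  // still broken after max_retry retries
}
```

Not retrying on timeout is a design choice of this sketch; retrying a request the server may already be executing raises the idempotence issue discussed later.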

6 Serve multiple protocols on one port, or access all sorts of services
An industrial-grade RPC framework with 1,000,000+ instances (not counting clients) and thousands of kinds of services. Servers can handle requests synchronously or asynchronously. Clients can access servers synchronously, asynchronously, or semi-synchronously, or use combo channels to simplify sharded or parallel access. Services can be debugged over HTTP, and profilers can be run through it. Better latency and throughput.
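Serving several protocols on one port requires identifying the protocol from the first bytes of a connection. Below is a minimal sketch of that idea, assuming two protocols: HTTP, recognized by its request method, and a hypothetical binary protocol whose frames start with the made-up 4-byte magic `RPC1`. `sniff_protocol` and `Protocol` are illustrative names, not the framework's API.

```cpp
#include <cassert>
#include <string_view>

// Result of inspecting the first bytes received on a new connection.
enum class Protocol { kHttp, kBinaryRpc, kNeedMoreData, kUnknown };

namespace {
// Hypothetical signatures: HTTP request lines begin with a method name;
// the binary protocol is assumed to start every frame with "RPC1".
constexpr std::string_view kHttpMethods[] = {"GET ", "POST ", "PUT ", "DELETE ", "HEAD "};
constexpr std::string_view kBinaryMagic = "RPC1";

enum class Match { kYes, kMaybe, kNo };

// kMaybe means `data` is a proper prefix of `sig`: wait for more bytes.
Match match_prefix(std::string_view data, std::string_view sig) {
    if (data.size() >= sig.size()) {
        return data.substr(0, sig.size()) == sig ? Match::kYes : Match::kNo;
    }
    return sig.substr(0, data.size()) == data ? Match::kMaybe : Match::kNo;
}
}  // namespace

// Tries each known protocol in turn on the bytes read so far.
Protocol sniff_protocol(std::string_view data) {
    bool maybe = false;
    for (std::string_view method : kHttpMethods) {
        Match m = match_prefix(data, method);
        if (m == Match::kYes) return Protocol::kHttp;
        maybe = maybe || m == Match::kMaybe;
    }
    Match m = match_prefix(data, kBinaryMagic);
    if (m == Match::kYes) return Protocol::kBinaryRpc;
    maybe = maybe || m == Match::kMaybe;
    return maybe ? Protocol::kNeedMoreData : Protocol::kUnknown;
}
```

Once a connection's protocol is identified, a server would typically remember it so later messages on the same connection skip the sniffing step.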

7 Different color = different thread
[Figure: client-side and server-side request flow; different colors denote different threads. Client side: Channels 1 to 3, each with a naming service (NS) and load balancer (LB), write to Sockets; an Event Dispatcher parses and processes responses. Server side: an Acceptor and an Event Dispatcher parse and process requests for Service 1 and Service 2. Annotations: work-stealing scheduling; ABA-free; one bthread per request; no locking; wait-free; KeepWrite; concurrency within one fd; bthread swap (saving a context switch); locate the context in O(1) time without global contention.]
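Two annotations in the figure, "ABA-free" and "locate context in O(1) time", can be illustrated with versioned ids. This is a minimal single-threaded sketch, not the framework's implementation: `VersionedTable` is a hypothetical name, and an id is assumed to pack a 32-bit version above a 32-bit slot index. Lookup is one array access, and a stale id whose slot has been reused carries an old version, so it is rejected instead of silently hitting the new occupant.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical context table; id = (version << 32) | slot index.
// Single-threaded sketch: a concurrent version would need atomic versions.
// T must be default-constructible. Pointers from find() are invalidated
// when insert() grows the table.
template <typename T>
class VersionedTable {
public:
    uint64_t insert(T value) {
        uint32_t index;
        if (!free_.empty()) {
            index = free_.back();  // reuse a freed slot
            free_.pop_back();
        } else {
            index = static_cast<uint32_t>(slots_.size());
            slots_.push_back(Slot{});
        }
        Slot& s = slots_[index];
        s.value = std::move(value);
        s.in_use = true;
        return (static_cast<uint64_t>(s.version) << 32) | index;
    }

    // O(1): returns the value only if `id` still refers to the live occupant.
    T* find(uint64_t id) {
        uint32_t index = static_cast<uint32_t>(id);
        uint32_t version = static_cast<uint32_t>(id >> 32);
        if (index >= slots_.size()) return nullptr;
        Slot& s = slots_[index];
        return (s.in_use && s.version == version) ? &s.value : nullptr;
    }

    void erase(uint64_t id) {
        if (find(id) == nullptr) return;  // already gone or stale
        uint32_t index = static_cast<uint32_t>(id);
        slots_[index].in_use = false;
        ++slots_[index].version;  // invalidates every outstanding copy of `id`
        free_.push_back(index);
    }

private:
    struct Slot {
        T value{};
        uint32_t version = 0;
        bool in_use = false;
    };
    std::vector<Slot> slots_;
    std::vector<uint32_t> free_;
};
```

For example, a late response carrying the id of an already-finished call finds a bumped version and is dropped.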

8 The built-in services

9 bvar
QPS and counting; percentiles and CDF; latency; min and max.
System-wide stats and per-second stats, which can be computed over any time window.
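Per-second stats over an arbitrary window can be kept with one bucket per second in a ring buffer. The sketch below is an illustration only and makes no claim about how bvar does it. `WindowedCounter` is a hypothetical name, and time is passed in explicitly as whole seconds so the behavior is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical windowed counter: one bucket per second in a ring buffer.
// qps() averages over the last `window` seconds ending at `now_sec`.
class WindowedCounter {
public:
    explicit WindowedCounter(int max_window_seconds)
        : buckets_(max_window_seconds, Bucket{}) {}

    void add(int64_t now_sec, int64_t n = 1) { bucket_for(now_sec).count += n; }

    // Requires 1 <= window <= max_window_seconds and now_sec >= 0.
    double qps(int64_t now_sec, int window) const {
        int64_t total = 0;
        for (int64_t s = now_sec - window + 1; s <= now_sec; ++s) {
            if (s < 0) continue;
            const Bucket& b = buckets_[index(s)];
            if (b.second == s) total += b.count;  // skip stale buckets
        }
        return static_cast<double>(total) / window;
    }

private:
    struct Bucket {
        int64_t second = -1;  // which second this bucket currently holds
        int64_t count = 0;
    };

    std::size_t index(int64_t sec) const {
        return static_cast<std::size_t>(sec % static_cast<int64_t>(buckets_.size()));
    }
    Bucket& bucket_for(int64_t sec) {
        Bucket& b = buckets_[index(sec)];
        if (b.second != sec) {  // slot held an older second: reset it
            b.second = sec;
            b.count = 0;
        }
        return b;
    }
    std::vector<Bucket> buckets_;
};
```

The same bucket layout works for min, max, or sums: only the per-bucket aggregate and the merge step change.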

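10 bvar
```cpp
#include <bvar/bvar.h>

// Exposes count, QPS, and latency statistics under the given name.
bvar::LatencyRecorder g_reader_hole_latency("ds_common_log_channel_reader_hole");

void table_search() {
    // ...
    base::Timer tm;  // header not shown in the original
    tm.start();
    channel_reader_hole();  // the operation being measured, defined elsewhere
    tm.stop();
    g_reader_hole_latency << tm.u_elapsed();  // record elapsed microseconds
}
```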
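For readers without bvar, the same time-then-record pattern can be sketched with the standard library alone. `SimpleLatencyRecorder` and `timed` are hypothetical names. This recorder keeps every sample, so it suits a demonstration but not a hot path, and it is not how bvar computes its statistics.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical latency recorder; samples are in microseconds.
class SimpleLatencyRecorder {
public:
    SimpleLatencyRecorder& operator<<(int64_t us) {
        samples_.push_back(us);
        return *this;
    }
    std::size_t count() const { return samples_.size(); }
    int64_t max() const {
        return samples_.empty() ? 0 : *std::max_element(samples_.begin(), samples_.end());
    }
    double average() const {
        if (samples_.empty()) return 0.0;
        int64_t sum = 0;
        for (int64_t s : samples_) sum += s;
        return static_cast<double>(sum) / samples_.size();
    }
    // Nearest-rank percentile, p in (0, 100].
    int64_t percentile(double p) const {
        if (samples_.empty()) return 0;
        std::vector<int64_t> sorted = samples_;
        std::sort(sorted.begin(), sorted.end());
        auto rank = static_cast<std::size_t>(std::ceil(p / 100.0 * sorted.size()));
        return sorted[std::max<std::size_t>(rank, 1) - 1];
    }

private:
    std::vector<int64_t> samples_;
};

// Times one call of `fn` and records the elapsed microseconds.
template <typename Fn>
void timed(SimpleLatencyRecorder& rec, Fn&& fn) {
    auto start = std::chrono::steady_clock::now();
    fn();
    auto elapsed = std::chrono::steady_clock::now() - start;
    rec << std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
}
```

`steady_clock` is used rather than `system_clock` because it never jumps backwards, which matters for elapsed-time measurements.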

11 From the architectural perspective

12 A typical (isomorphic) service
[Diagram: a Client sends RPCs to identical Servers; its Load Balancer picks a server from the list provided by the Naming Service.]
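Because every server in an isomorphic service can handle any request, the simplest load balancer just rotates through the list obtained from the naming service. A minimal round-robin sketch follows. `RoundRobinBalancer` is a hypothetical name and the addresses in the usage check are made up.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical round-robin load balancer over identical servers.
class RoundRobinBalancer {
public:
    explicit RoundRobinBalancer(std::vector<std::string> servers)
        : servers_(std::move(servers)) {}

    // Returns the next server address, or an empty string if none is known.
    // The atomic counter lets several threads call select() concurrently.
    std::string select() {
        if (servers_.empty()) return {};
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return servers_[i % servers_.size()];
    }

private:
    std::vector<std::string> servers_;  // list obtained from the naming service
    std::atomic<std::size_t> next_{0};
};
```

Other policies, such as random choice or weighting by observed latency, fit the same `select()` interface.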

13 Add a server
[Diagram: a new Server registers with the Naming Service at start and remains Inactive until the client's Load Balancer syncs with the Naming Service, either periodically or event-driven; the new server then joins the Active servers.]
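Registration and client-side sync can be sketched with an in-process registry that bumps a version on every change, so a client copies the list only when something changed. `NamingService` and `ClientView` are hypothetical names. A real naming service sits behind the network and would be polled periodically or would push events.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-process naming service; every change bumps the version.
class NamingService {
public:
    void register_server(const std::string& addr) {
        if (servers_.insert(addr).second) ++version_;
    }
    void unregister_server(const std::string& addr) {
        if (servers_.erase(addr) > 0) ++version_;
    }
    uint64_t version() const { return version_; }
    std::vector<std::string> list() const {
        return std::vector<std::string>(servers_.begin(), servers_.end());
    }

private:
    std::set<std::string> servers_;
    uint64_t version_ = 0;
};

// Client-side view; sync() is called periodically or on a change event.
class ClientView {
public:
    // Returns true if the server list was updated.
    bool sync(const NamingService& ns) {
        if (ns.version() == seen_version_) return false;  // nothing new
        servers_ = ns.list();
        seen_version_ = ns.version();
        return true;
    }
    const std::vector<std::string>& servers() const { return servers_; }

private:
    std::vector<std::string> servers_;
    uint64_t seen_version_ = 0;
};
```

Until the client's next sync, a newly registered server receives no traffic from that client, which corresponds to the Inactive state in the diagram.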

14 Remove a server
[Diagram: the Server unregisters from the Naming Service and becomes Inactive; after the client's Load Balancer syncs (periodically or event-driven) and stops sending it traffic, the server is Removable.]
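Clients may keep sending requests to an unregistered server until their next sync, so a server cannot safely shut down the moment it unregisters. One way to decide when it becomes removable is sketched below, assuming the server knows an upper bound on the client sync interval. `RemovableServer` is a hypothetical name, and the clock is passed in explicitly to keep the logic testable.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical graceful-removal state for one server. After unregistering,
// it waits at least one client sync interval and for in-flight requests to
// finish. on_unregistered()/removable() are assumed to run on one control
// thread; request hooks may run on any thread.
class RemovableServer {
public:
    using Clock = std::chrono::steady_clock;

    explicit RemovableServer(Clock::duration client_sync_interval)
        : grace_(client_sync_interval) {}

    void on_request_start() { in_flight_.fetch_add(1); }
    void on_request_end() { in_flight_.fetch_sub(1); }

    void on_unregistered(Clock::time_point now) {
        unregistered_at_ = now;
        unregistered_ = true;
    }

    bool removable(Clock::time_point now) const {
        return unregistered_ && now - unregistered_at_ >= grace_ &&
               in_flight_.load() == 0;
    }

private:
    Clock::duration grace_;
    std::atomic<int> in_flight_{0};
    bool unregistered_ = false;
    Clock::time_point unregistered_at_{};
};
```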

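15 When a server crashes
[Diagram: one Server crashes; the Client retries the RPC on another Server chosen by the Load Balancer from the Naming Service's list.]
Idempotence sometimes has to be handled properly, because a retried request may already have been executed by the crashed server.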
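One common way to make a non-idempotent operation safe to retry is to give each request an id that stays the same across retries and to cache the result per id. The sketch below assumes this deduplication scheme, which the slide does not prescribe. `DedupAccount` and the request ids are hypothetical. In the crash scenario the retry lands on a different server, so a real deployment would keep this table in shared storage rather than in process memory.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical deduplicating handler for a non-idempotent operation
// (adding to a balance). A retry with the same request id replays the
// stored answer instead of applying the deposit twice.
class DedupAccount {
public:
    int64_t deposit(const std::string& request_id, int64_t amount) {
        auto it = done_.find(request_id);
        if (it != done_.end()) return it->second;  // retry: replay old answer
        balance_ += amount;
        done_.emplace(request_id, balance_);
        return balance_;
    }
    int64_t balance() const { return balance_; }

private:
    int64_t balance_ = 0;
    std::unordered_map<std::string, int64_t> done_;  // unbounded in this sketch
};
```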

16 When the NS crashes
[Diagram: the client's Load Balancer fails to sync with the Naming Service.]
RPCs are unaffected, because clients keep using the last server list they received; however, that list is no longer updated.
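The reason RPCs survive a naming-service outage is that the client keeps its last known list when a refresh fails. A minimal sketch follows. `CachedServerList` is a hypothetical name, and the `fetch` callback stands in for whatever call queries the naming service, returning `std::nullopt` when it is unreachable.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using ServerList = std::vector<std::string>;

// Hypothetical client-side cache of the server list.
class CachedServerList {
public:
    explicit CachedServerList(std::function<std::optional<ServerList>()> fetch)
        : fetch_(std::move(fetch)) {}

    // Returns true if the list was refreshed. On failure the previous list
    // is kept, so RPCs continue against the last known servers.
    bool refresh() {
        std::optional<ServerList> latest = fetch_();
        if (!latest) return false;
        servers_ = std::move(*latest);
        return true;
    }
    const ServerList& servers() const { return servers_; }

private:
    std::function<std::optional<ServerList>()> fetch_;
    ServerList servers_;
};
```

The trade-off is the one the slide names: while the naming service is down, added or removed servers are not reflected on the client.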

17 Experiments
[Diagram: requests from Clients pass through an Experiment Framework, which chooses experiments (exps) according to probabilities and layers, then reach Servers through the Load Balancer and Naming Service. Many modules (Module 1 to Module N) are developed by different teams in parallel, each with its own experiments; experiments are either independent or mutually exclusive.]
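A common way to choose experiments "according to probabilities and layers" is sketched below, assuming a layered design: experiments within one layer are mutually exclusive, and experiments in different layers are independent. The slide does not specify the mechanism. Each hypothetical `Layer` hashes the user id with its own salt into 100 buckets, and its experiments own disjoint bucket ranges, so a range's width is its probability. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical experiment: owns buckets [begin_bucket, end_bucket).
struct Experiment {
    std::string name;
    int begin_bucket;  // inclusive
    int end_bucket;    // exclusive; ranges must not overlap within a layer
};

// Hypothetical layer: a per-layer salt decorrelates assignments across layers.
struct Layer {
    std::string salt;
    std::vector<Experiment> experiments;
};

// Returns the experiment this user falls into for `layer`, or "" for control.
// A user lands in exactly one bucket, so within a layer at most one
// experiment applies. Note: std::hash is only stable within one build; a
// production system would use a fixed hash function.
std::string assign(const Layer& layer, const std::string& user_id) {
    std::size_t h = std::hash<std::string>{}(layer.salt + ":" + user_id);
    int bucket = static_cast<int>(h % 100);
    for (const Experiment& e : layer.experiments) {
        if (bucket >= e.begin_bucket && bucket < e.end_bucket) return e.name;
    }
    return "";
}
```

Layers let teams developing different modules run experiments at the same time without coordinating bucket ranges with each other.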

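18 Collect data
[Diagram: on each Machine, an Agent gathers the Server's logs and data (via http) and forwards them to Message Queues and downstream Services.]
Collected data includes serving logs, user-defined variables, tracing logs, profiling results, and more.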
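The per-machine agent's job of tagging, batching, and forwarding can be sketched as follows. `CollectorAgent`, `Record`, and `Batch` are hypothetical names, and the message queue is modeled as a `std::deque` of batches rather than a real queue client.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record: what was collected and from which kind of source.
struct Record {
    std::string source;   // e.g. "serving_log", "vars", "trace"
    std::string payload;
};

using Batch = std::vector<Record>;

// Hypothetical collection agent: batches records and pushes a batch to the
// queue when it is full or when flush() is called (e.g. from a timer).
class CollectorAgent {
public:
    CollectorAgent(std::deque<Batch>& queue, std::size_t batch_size)
        : queue_(queue), batch_size_(batch_size) {}

    void collect(std::string source, std::string payload) {
        pending_.push_back(Record{std::move(source), std::move(payload)});
        if (pending_.size() >= batch_size_) flush();
    }

    void flush() {
        if (pending_.empty()) return;
        queue_.push_back(std::move(pending_));
        pending_.clear();  // put the moved-from vector in a known empty state
    }

private:
    std::deque<Batch>& queue_;
    std::size_t batch_size_;
    Batch pending_;
};
```

Batching trades a little delivery latency for far fewer writes to the queue, which is usually the right trade-off for logs and metrics.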

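19 Dev cycles
[Diagram: new features are added in the Dev repo; auto tests run periodically, a failure rejects the commit, and passing code becomes the Last good version. A Releasing branch is cut from it, undergoes more tests, and goes to Online deployment. A Data Dashboard analyzes the running service: unstable features are turned off, and an unstable release is rolled back to the Previous online version.]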
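Turning off an unstable feature without a full rollback usually relies on runtime switches. A minimal sketch follows. `FeatureFlags` and the flag name are hypothetical, and flags are assumed to be defined once at startup so that later reads and toggles never modify the map's structure.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <string>

// Hypothetical runtime feature switches. define() runs at startup; after
// that, enabled() and set() may be called concurrently because they only
// touch the atomic values, not the map structure.
class FeatureFlags {
public:
    void define(const std::string& name, bool enabled) { flags_[name] = enabled; }

    // Unknown flags read as disabled.
    bool enabled(const std::string& name) const {
        auto it = flags_.find(name);
        return it != flags_.end() && it->second.load(std::memory_order_relaxed);
    }

    // Returns false if the flag was never defined.
    bool set(const std::string& name, bool on) {
        auto it = flags_.find(name);
        if (it == flags_.end()) return false;
        it->second.store(on, std::memory_order_relaxed);
        return true;
    }

private:
    std::map<std::string, std::atomic<bool>> flags_;
};
```

Code paths check `enabled("...")` at the branch point, so when the dashboard shows a problem, operators flip one flag instead of redeploying.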

