Published by Juan Carlos Fidalgo. Modified over 5 years ago.
1
A tutorial on building large-scale services
James Ge (戈君)
2
A large-scale online service needs:
- 24/7 stability
- Timely aggregation of user feedback, and fast iteration
- Highly efficient testing, deployment, and maintenance
- Massive parallel development
3
From the framework perspective
4
Building a service
TCP/IP guarantees reliable data transmission; however, building a service needs more abstractions:
- What is the format of the transmitted data?
- Can multiple requests be sent through one TCP connection simultaneously?
- How do I talk to a cluster of many machines?
- What should I do when the connection is broken?
- What if the server does not respond?
- ...
5
RPC
Abstract network communication as "clients accessing functions on servers":
- Data needs to be serialized, which protobuf does pretty well.
- Creating and reusing connections is transparent to users, but users can choose among connection types: short, pooled, or single.
- Machines are discovered by naming services.
- RPC retries when the connection is broken.
- When the server does not respond within the given time, the client fails with a timeout error.
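The retry and timeout behavior above can be sketched as a small client-side loop. This is a minimal, hypothetical sketch, not the framework's API: `Attempt`, `Transport`, and `CallWithRetry` are illustrative names, and the transport is assumed to report whether one attempt succeeded, hit a broken connection, or got no response within the remaining time budget. All retries share one overall deadline rather than each getting a fresh timeout.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Outcome of a single attempt, as reported by a hypothetical transport.
enum class Attempt { kOk, kConnectionBroken, kNoResponse };

// Final outcome of the whole RPC.
enum class RpcStatus { kOk, kTimeout, kFailed };

struct RpcResult {
    RpcStatus status;
    int attempts;  // number of times the transport was invoked
};

// Sends the request once; `remaining` is what is left of the overall deadline.
using Transport = std::function<Attempt(std::chrono::milliseconds remaining)>;

// Retries broken connections up to `max_retry` extra times, but never past the
// overall deadline; a server that does not answer fails the call as a timeout.
RpcResult CallWithRetry(const Transport& send, std::chrono::milliseconds timeout,
                        int max_retry) {
    assert(max_retry >= 0);
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + timeout;
    int attempts = 0;
    while (true) {
        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
        if (remaining.count() <= 0) return {RpcStatus::kTimeout, attempts};
        ++attempts;
        switch (send(remaining)) {
            case Attempt::kOk:
                return {RpcStatus::kOk, attempts};
            case Attempt::kNoResponse:
                return {RpcStatus::kTimeout, attempts};
            case Attempt::kConnectionBroken:
                if (attempts > max_retry) return {RpcStatus::kFailed, attempts};
                break;  // connection broken: try again
        }
    }
}
```

With `max_retry = 3`, a transport that breaks twice and then succeeds yields `kOk` after three attempts, while a server that never answers yields `kTimeout` after one.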
6
An industrial-grade RPC framework, with 1,000,000+ instances (not counting clients) and thousands of kinds of services:
- Serve multiple protocols on one port, or access all sorts of services.
- Servers can handle requests synchronously or asynchronously.
- Clients can access servers synchronously, asynchronously, or semi-synchronously, or use combo channels to simplify sharded or parallel access.
- Debug services via HTTP and run profilers.
- Better latency and throughput.
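Serving several protocols on one port implies that the server recognizes the protocol from the first bytes of each connection. A minimal sketch, assuming that binary RPC requests begin with the 4-byte magic `"PRPC"` (as in the baidu_std protocol) and that HTTP/1.x requests begin with a method name; `Protocol`, `Sniff`, and the short method list are hypothetical and cover only a few cases.

```cpp
#include <cassert>
#include <string_view>

enum class Protocol { kUnknown, kNeedMoreData, kBaiduStd, kHttp };

namespace {
constexpr std::string_view kRpcMagic = "PRPC";
constexpr std::string_view kHttpMethods[] = {"GET ", "POST ", "PUT ", "HEAD ", "DELETE "};

// 1: head starts with candidate; 0: head is a proper prefix of candidate
// (wait for more bytes); -1: mismatch.
int MatchPrefix(std::string_view head, std::string_view candidate) {
    if (head.size() >= candidate.size())
        return head.substr(0, candidate.size()) == candidate ? 1 : -1;
    return candidate.substr(0, head.size()) == head ? 0 : -1;
}
}  // namespace

// Guesses the protocol from the bytes received so far on a connection.
Protocol Sniff(std::string_view head) {
    bool need_more = false;
    switch (MatchPrefix(head, kRpcMagic)) {
        case 1: return Protocol::kBaiduStd;
        case 0: need_more = true; break;
        default: break;
    }
    for (std::string_view method : kHttpMethods) {
        const int m = MatchPrefix(head, method);
        if (m == 1) return Protocol::kHttp;
        if (m == 0) need_more = true;
    }
    return need_more ? Protocol::kNeedMoreData : Protocol::kUnknown;
}
```

Returning `kNeedMoreData` matters because a single TCP read may deliver only the first byte or two of a request.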
7
[Figure: client-side and server-side threading model; in the original, different colors denote different threads. Components shown: an acceptor, Channels 1-3 with naming service (NS) and load balancer (LB), sockets, event dispatchers, parsing, request and response processing, and Services 1 and 2. Annotations: work-stealing scheduling; 1 bthread for 1 request; bthread swap (saving a CS); no locking; ABA-free; wait-free; KeepWrite; concurrency within an fd; locate context in O(1) time without global contention.]
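One way to "locate context in O(1) time" while staying ABA-free is to hand out 64-bit ids that pack a slot index with a version number. The sketch below is a hypothetical, single-threaded illustration of that id scheme only (`VersionedPool` is not a framework class); a production version would make the version check atomic so that lookups need no global lock.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Id layout: high 32 bits = version, low 32 bits = slot index.
// Lookup is one array access plus a version comparison: O(1), no map.
// Recycling a slot bumps its version, so a stale id that still names the
// previous occupant is rejected instead of silently matching (no ABA).
template <typename T>
class VersionedPool {
public:
    using Id = std::uint64_t;

    Id Insert(T value) {
        std::uint32_t index;
        if (!free_.empty()) {
            index = free_.back();
            free_.pop_back();
        } else {
            assert(slots_.size() < UINT32_MAX);
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.push_back(Slot{});
        }
        slots_[index].value = std::move(value);
        slots_[index].used = true;
        return (static_cast<Id>(slots_[index].version) << 32) | index;
    }

    // Returns the stored value, or nullptr if the id is stale or unknown.
    T* Find(Id id) {
        const auto index = static_cast<std::uint32_t>(id);
        const auto version = static_cast<std::uint32_t>(id >> 32);
        if (index >= slots_.size()) return nullptr;
        Slot& s = slots_[index];
        return (s.used && s.version == version) ? &s.value : nullptr;
    }

    bool Erase(Id id) {
        if (Find(id) == nullptr) return false;
        const auto index = static_cast<std::uint32_t>(id);
        slots_[index].used = false;
        ++slots_[index].version;  // invalidates every outstanding copy of id
        free_.push_back(index);
        return true;
    }

private:
    struct Slot {
        T value{};
        std::uint32_t version = 0;
        bool used = false;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
};
```

After an entry is erased and its slot reused, the old id and the new id share a slot index but differ in version, so only the new id resolves.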
8
The built-in services
9
bvar
- QPS and counting
- Percentiles and CDF
- Latency
- Min and max
- System-wide stats
- Per-second stats, which can be computed within any time window
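Per-second stats over an arbitrary window can be kept in a ring of one-second buckets. bvar itself provides templates such as `bvar::Window` and `bvar::PerSecond` for this; the following is an independent, single-threaded sketch, with the hypothetical class `WindowedCounter` and time passed in explicitly as whole seconds (assumed non-negative) so the behavior is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts events in a ring of `window` one-second buckets, so the total and
// the average rate over the last `window` seconds can be read at any time.
class WindowedCounter {
public:
    explicit WindowedCounter(int window) : buckets_(window) { assert(window > 0); }

    void Add(std::int64_t now_sec, std::int64_t n = 1) {
        Bucket& b = buckets_[Index(now_sec)];
        if (b.sec != now_sec) {  // bucket holds an older second: reuse it
            b.sec = now_sec;
            b.count = 0;
        }
        b.count += n;
    }

    // Sum over the last `window` seconds, including the current one.
    std::int64_t Sum(std::int64_t now_sec) const {
        const auto window = static_cast<std::int64_t>(buckets_.size());
        std::int64_t total = 0;
        for (const Bucket& b : buckets_)
            if (b.sec > now_sec - window && b.sec <= now_sec) total += b.count;
        return total;
    }

    double PerSecond(std::int64_t now_sec) const {
        return static_cast<double>(Sum(now_sec)) / buckets_.size();
    }

private:
    struct Bucket {
        std::int64_t sec = -1;  // which second this bucket currently holds
        std::int64_t count = 0;
    };
    std::size_t Index(std::int64_t sec) const {
        return static_cast<std::size_t>(sec) % buckets_.size();
    }
    std::vector<Bucket> buckets_;
};
```

Memory stays constant regardless of traffic, and old seconds fall out of the window simply by being overwritten.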
10
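bvar
```cpp
#include <bvar/bvar.h>

// Records latencies of channel_reader_hole(); bvar derives latency, max
// latency, percentiles and QPS from it under this name prefix.
bvar::LatencyRecorder g_reader_hole_latency("ds_common_log_channel_reader_hole");

void table_search() {
    ...
    base::Timer tm;  // butil::Timer in the open-source code base
    tm.start();
    channel_reader_hole();
    tm.stop();
    g_reader_hole_latency << tm.u_elapsed();  // elapsed time in microseconds
}
```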
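To make the numbers a `LatencyRecorder` reports concrete, here is a toy recorder that accepts samples with the same `<<` style and computes count, average, maximum, and nearest-rank percentiles. `ToyLatencyRecorder` is hypothetical and deliberately naive: it stores every sample and sorts on each query, a cost that bvar is designed to avoid.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Single-threaded toy: keeps every latency sample (microseconds).
class ToyLatencyRecorder {
public:
    ToyLatencyRecorder& operator<<(std::int64_t latency_us) {
        samples_.push_back(latency_us);
        return *this;
    }

    std::int64_t count() const { return static_cast<std::int64_t>(samples_.size()); }

    std::int64_t max_latency() const {
        return samples_.empty() ? 0 : *std::max_element(samples_.begin(), samples_.end());
    }

    double average() const {
        if (samples_.empty()) return 0.0;
        const std::int64_t sum =
            std::accumulate(samples_.begin(), samples_.end(), std::int64_t{0});
        return static_cast<double>(sum) / samples_.size();
    }

    // Nearest-rank percentile: smallest sample with at least `ratio` of all
    // samples at or below it. `ratio` must be in (0, 1].
    std::int64_t percentile(double ratio) const {
        assert(ratio > 0.0 && ratio <= 1.0);
        if (samples_.empty()) return 0;
        std::vector<std::int64_t> sorted(samples_);
        std::sort(sorted.begin(), sorted.end());
        auto rank = static_cast<std::size_t>(std::ceil(ratio * sorted.size()));
        rank = std::min(std::max<std::size_t>(rank, 1), sorted.size());
        return sorted[rank - 1];
    }

private:
    std::vector<std::int64_t> samples_;
};
```

For samples of 100, 200, 300, 400, and 1000 us, the average is 400 us while the 99th percentile is 1000 us, which is why tail percentiles are tracked alongside the mean.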
11
From the architectural perspective
12
A typical (isomorphic) service
[Figure: the client's load balancer gets the server list from the naming service and sends RPCs to a pool of identical servers.]
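In this picture, the load balancer picks one server per RPC from the list supplied by the naming service. A minimal round-robin sketch; the class name and the addresses in the usage are hypothetical, and round robin is only one of several possible policies.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Picks servers in turn. The list is immutable after construction, so
// concurrent Select() calls only contend on one relaxed atomic counter.
class RoundRobinBalancer {
public:
    explicit RoundRobinBalancer(std::vector<std::string> servers)
        : servers_(std::move(servers)) {}

    // Returns nullptr when there is no server to choose from.
    const std::string* Select() {
        if (servers_.empty()) return nullptr;
        const std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        return &servers_[i % servers_.size()];
    }

private:
    std::vector<std::string> servers_;
    std::atomic<std::size_t> next_{0};
};
```

With three servers, four consecutive calls return the first, second, third, and first server again.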
13
Add a server
[Figure: a new server (inactive) registers itself with the naming service at startup; the client's load balancer syncs with the naming service, either periodically or event-driven, after which the new server joins the active ones.]
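Registration and client-side sync can be sketched with a versioned server list: the naming service bumps a version on every change, and the client copies the list only when the version has moved, whether the sync is triggered by a timer or by a change event. `NamingService` and `ServerListCache` are hypothetical in-process stand-ins for a real naming service and its client.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Servers register at startup and unregister before removal;
// every effective change bumps the version.
class NamingService {
public:
    void Register(const std::string& addr) {
        if (servers_.insert(addr).second) ++version_;
    }
    void Unregister(const std::string& addr) {
        if (servers_.erase(addr) > 0) ++version_;
    }
    std::uint64_t version() const { return version_; }
    std::vector<std::string> List() const {
        return std::vector<std::string>(servers_.begin(), servers_.end());
    }

private:
    std::set<std::string> servers_;
    std::uint64_t version_ = 0;
};

// Client-side copy, refreshed by a periodic timer or a change notification.
class ServerListCache {
public:
    // Returns true if the cached list was updated.
    bool Sync(const NamingService& ns) {
        if (ns.version() == version_) return false;  // nothing new
        servers_ = ns.List();
        version_ = ns.version();
        return true;
    }
    const std::vector<std::string>& servers() const { return servers_; }

private:
    std::vector<std::string> servers_;
    std::uint64_t version_ = 0;
};
```

A newly registered server is invisible to a client until its next sync, which is why a server may briefly be registered but still inactive.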
14
Remove a server
[Figure: the server unregisters from the naming service; once the client's load balancer has synced (periodically or event-driven), the server becomes inactive and can be removed.]
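Removing a server safely means it keeps serving until clients have synced the new list and its in-flight requests have finished. A hypothetical sketch of that sequence, assuming a fixed grace period (`grace_ms`) is long enough for clients to sync; `DrainingServer` and its states are illustrative names, and times are passed in as milliseconds.

```cpp
#include <cassert>
#include <cstdint>

// Active -> (unregistered) Draining -> Removable.
class DrainingServer {
public:
    enum class State { kActive, kDraining, kRemovable };

    explicit DrainingServer(std::int64_t grace_ms) : grace_ms_(grace_ms) {}

    // Requests are still accepted while draining: clients that have not
    // synced with the naming service yet may keep sending traffic here.
    bool BeginRequest() {
        if (state_ == State::kRemovable) return false;
        ++in_flight_;
        return true;
    }

    void EndRequest(std::int64_t now_ms) {
        assert(in_flight_ > 0);
        --in_flight_;
        Update(now_ms);
    }

    // Step 1: unregister from the naming service.
    void Unregister(std::int64_t now_ms) {
        if (state_ == State::kActive) {
            state_ = State::kDraining;
            unregistered_at_ms_ = now_ms;
        }
        Update(now_ms);
    }

    // Step 2: removable once the grace period has passed and nothing is in flight.
    void Update(std::int64_t now_ms) {
        if (state_ == State::kDraining && in_flight_ == 0 &&
            now_ms - unregistered_at_ms_ >= grace_ms_) {
            state_ = State::kRemovable;
        }
    }

    State state() const { return state_; }

private:
    std::int64_t grace_ms_;
    std::int64_t unregistered_at_ms_ = 0;
    int in_flight_ = 0;
    State state_ = State::kActive;
};
```

Waiting for both conditions avoids failing requests from clients that still hold the old list.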
15
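When a server crashes
[Figure: an RPC to a crashed server fails, and the client, through its load balancer, retries it on another server.]
Since a retried request may already have been executed, idempotence sometimes needs to be handled properly.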
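Retrying on another server is only safe if executing a request twice has the same effect as executing it once. A common way to make a non-idempotent write retryable is a client-generated request id that the client reuses across retries and the server side remembers. The sketch is hypothetical: `Account` stands for state shared by all servers (for example a database), since the retry may land on a different server than the crashed one, and its dedup table is unbounded here, while a real one would expire entries.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// A non-idempotent operation (adding to a balance) made safe to retry.
class Account {
public:
    // Applies the deposit once per request id; a duplicate replays the
    // result recorded the first time instead of depositing again.
    std::int64_t Deposit(const std::string& request_id, std::int64_t amount) {
        auto it = applied_.find(request_id);
        if (it != applied_.end()) return it->second;
        balance_ += amount;
        applied_.emplace(request_id, balance_);
        return balance_;
    }

    std::int64_t balance() const { return balance_; }

private:
    std::int64_t balance_ = 0;
    std::unordered_map<std::string, std::int64_t> applied_;  // request id -> result
};
```

If the first server applied the deposit and then crashed before replying, the retry with the same id returns the original result and the balance is not credited twice.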
16
When the NS crashes
[Figure: the client's load balancer fails to sync with the naming service.]
RPCs are unaffected; however, the server list is no longer updated.
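RPCs keep working when the naming service is down because clients keep the last server list they synced. A hypothetical sketch of that fallback: `FetchResult` models one sync attempt, with `std::nullopt` standing for a failed one, and `ResilientServerList` simply keeps the previous list on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// std::nullopt = the naming service could not be reached.
using FetchResult = std::optional<std::vector<std::string>>;

class ResilientServerList {
public:
    void OnSync(const FetchResult& fetched) {
        if (fetched) {
            servers_ = *fetched;  // fresh list from the naming service
        } else {
            ++failed_syncs_;      // keep serving from the last good list
        }
    }

    const std::vector<std::string>& servers() const { return servers_; }
    std::size_t failed_syncs() const { return failed_syncs_; }

private:
    std::vector<std::string> servers_;
    std::size_t failed_syncs_ = 0;
};
```

The trade-off matches the slide: traffic continues, but servers added or removed during the outage are not seen until syncing succeeds again.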
17
Experiments
[Figure: an experiment framework chooses experiments for client requests according to probabilities and layers. Many modules (Module 1 ... Module N) are developed by different teams in parallel, each with its own experiments. Labels: "Independent", "Mutually exclusive".]
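Choosing experiments "according to probabilities and layers" is commonly done by hashing a stable key, such as a user id, into buckets separately for each layer. A hypothetical sketch, assuming 100 buckets per layer and a per-layer salt: within a layer, experiments own disjoint bucket ranges, so a request gets at most one of them (mutually exclusive), while different salts make assignments in different layers approximately independent. FNV-1a is used instead of `std::hash` because its output is stable across builds.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Experiment {
    std::string name;
    int percent;  // share of this layer's traffic, 0..100
};

class Layer {
public:
    Layer(std::string salt, std::vector<Experiment> exps)
        : salt_(std::move(salt)), exps_(std::move(exps)) {
        int total = 0;
        for (const Experiment& e : exps_) total += e.percent;
        assert(total <= 100);  // experiments in one layer cannot overlap
    }

    // Returns the experiment chosen for this user, or "" for the default path.
    std::string Choose(const std::string& user_id) const {
        const int bucket = static_cast<int>(Fnv1a(salt_ + ":" + user_id) % 100);
        int upper = 0;
        for (const Experiment& e : exps_) {
            upper += e.percent;
            if (bucket < upper) return e.name;
        }
        return "";
    }

private:
    static std::uint64_t Fnv1a(const std::string& s) {
        std::uint64_t h = 14695981039346656037ull;
        for (unsigned char c : s) {
            h ^= c;
            h *= 1099511628211ull;
        }
        return h;
    }

    std::string salt_;
    std::vector<Experiment> exps_;
};
```

Because assignment depends only on the salt and the user id, a user sees the same experiments on every request, and each team can add a layer without coordinating bucket ranges with other layers.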
18
Collect data
[Figure: on each machine, an agent collects data from the server (over HTTP) and from its logs: serving logs, user-defined variables, tracing logs, profiling results, and more. The data flows through message queues to downstream services.]
19
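Dev cycles
[Figure: new features are added to the dev repo; automated tests run periodically, failures are analyzed and the offending commits rejected, and passing revisions become "last good". A releasing branch goes through more tests before online deployment, while a data dashboard monitors the running service. Unstable features are turned off, and an unstable release is rolled back to the previous online version.]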