Arrested by the CAP Handling Data in Distributed Systems Aviran Mordo, VP of Engineering, Wix.com Twitter: @aviranm linkedin/aviran aviransplace.com
Service A Service B System A, two systems
What is this arrow? Service A → Service B. The arrow represents a distributed system.
Microservices = Distributed System. eCom Catalog sub-system
Over 800 Microservices (unique) in Production
Hello Aviran Mordo, VP of Engineering, Wix.com @aviranm
Wix.com in Numbers: 130M website builders (+2M monthly); 600M monthly visitors; multiple clouds & data centers (Google, Amazon); over 800 microservices; 2000 employees (~50% R&D); #5 among the best software companies to work for worldwide (according to Glassdoor)
AGENDA: Avoiding database transactions; Handling database schema changes; Read consistency in a distributed system; Dealing with multiple datacenters
01 Avoid DB Transactions
Create an Invoice
Create an Invoice: header, multiple line items (master – details tables)
Create an Invoice: header and multiple line items, saved as a transaction
Create a Site: multiple pages. Just like an invoice with multiple line items, we save a site with multiple pages.
How do we save multiple pages in a transaction (without DB transaction)?
Replace DB Transaction with Logical Transaction
Saving a Wix Site’s Data (logical DB transaction). Browser → Editor Server: save page(s) to the Site Pages DB, then save the header to the Site Header DB. Save each page as an atomic operation. Finalize the transaction by sending the site header (the list of page IDs, i.e. pointers to pages). This can generate orphaned pages, which is not a problem in practice.
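The save flow above can be sketched roughly as follows. This is a minimal illustration, not Wix's implementation: `InMemoryStore`, `save_site`, and `load_site` are hypothetical names, the stores are in-memory stand-ins for the Site Pages DB and Site Header DB, and random UUIDs are used here only as placeholder page IDs.

```python
import uuid


class InMemoryStore:
    """Hypothetical stand-in for a key-value table in a real database."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


def save_site(pages_db, header_db, site_id, pages):
    # Step 1: save each page on its own, as an atomic operation.
    page_ids = []
    for content in pages:
        page_id = uuid.uuid4().hex  # placeholder ID scheme (assumption)
        pages_db.put(page_id, content)
        page_ids.append(page_id)
    # Step 2: saving the header (pointers to pages) finalizes the logical
    # transaction. Until this write succeeds, readers still see the old site.
    header_db.put(site_id, page_ids)
    return page_ids


def load_site(pages_db, header_db, site_id):
    # Readers only follow pointers stored in the header, so pages that were
    # saved without a header update (orphans) are never visible.
    page_ids = header_db.get(site_id) or []
    return [pages_db.get(page_id) for page_id in page_ids]
```

If the process dies between step 1 and step 2, the new pages stay in the pages store as orphans, and the header keeps pointing at the previous, complete version of the site.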
Master-Master Replication across DCs: MySQL Active – Active. The Pages MySQL in DC-1 and the Pages MySQL in DC-2 replicate to each other. Replicating data across DCs can produce conflicts.
Write Traffic may Flow to Both Datacenters: one browser saves a page to the Pages MySQL in DC-1 while another browser saves a page to the Pages MySQL in DC-2.
Replication Conflict between the Pages MySQL in DC-1 and DC-2. Wix users change millions of pages every day. MySQL strategy: stop replication, or ignore the conflict (drop incoming).
Avoiding Replication Conflicts (Pages MySQL in DC-1 and DC-2). Page ID is a content-based hash: • Immutable data • Idempotent operation. DB conflicts can be safely ignored as content is identical.
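A small sketch of why a content-based ID makes conflicts harmless. The slides do not say which hash function is used, so SHA-256 is an assumption here, and the two dictionaries are hypothetical stand-ins for the Pages MySQL tables in each DC.

```python
import hashlib


def page_id(content: bytes) -> str:
    # The ID is derived only from the content (SHA-256 is an assumption).
    return hashlib.sha256(content).hexdigest()


def save_page(table: dict, content: bytes) -> str:
    pid = page_id(content)
    # Insert-if-absent: pages are immutable, so an existing row with this ID
    # already holds exactly the same bytes and saving again changes nothing.
    table.setdefault(pid, content)
    return pid


def replicate(source: dict, target: dict) -> None:
    # "Ignore conflict (drop incoming)": a key collision is simply skipped,
    # which is safe because both sides hold identical content for that key.
    for pid, content in source.items():
        target.setdefault(pid, content)
```

Saving the same page in both DCs produces the same key and the same value on each side, so dropping the incoming row during replication loses nothing.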
02 Database & Schema Changes
No Downtime
Database Changes: Add Fields; Remove Fields; Complete Schema / Database Change. Altering very large tables may take a very long time and cause downtime.
Database Changes
1. Add Fields
1.1. For adding metadata (non-indexed fields): use a blob field for schema flexibility (JSON works really well).
1.2. If the fields are searchable (indexed fields): use another table and join by primary key.
2. Remove Fields: stop using them in the code. Do not do any DB schema changes.
3. Complete Schema / Database Change: lazy migration.
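Points 1.1 and 1.2 can be illustrated with a small sketch. It uses Python's built-in `sqlite3` purely so the example runs on its own; the table and column names (`product`, `metadata`, `product_sku`) are hypothetical and not taken from the talk.

```python
import json
import sqlite3


def create_schema(conn):
    # Main table: fixed columns plus a JSON blob for non-indexed metadata (1.1).
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)"
    )
    # A field added later that must be searchable lives in its own table,
    # keyed by the same primary key, so the big table is never altered (1.2).
    conn.execute(
        "CREATE TABLE product_sku (product_id INTEGER PRIMARY KEY, sku TEXT)"
    )
    conn.execute("CREATE INDEX idx_product_sku ON product_sku (sku)")


def add_product(conn, product_id, name, metadata, sku):
    conn.execute(
        "INSERT INTO product (id, name, metadata) VALUES (?, ?, ?)",
        (product_id, name, json.dumps(metadata)),
    )
    conn.execute(
        "INSERT INTO product_sku (product_id, sku) VALUES (?, ?)",
        (product_id, sku),
    )


def find_by_sku(conn, sku):
    row = conn.execute(
        "SELECT p.id, p.name, p.metadata FROM product p "
        "JOIN product_sku s ON s.product_id = p.id WHERE s.sku = ?",
        (sku,),
    ).fetchone()
    if row is None:
        return None
    # New metadata keys appear here without any schema change.
    return {"id": row[0], "name": row[1], **json.loads(row[2])}
```

Adding another non-indexed attribute later only means writing one more key into the JSON blob; no `ALTER TABLE` on the large table is needed.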
Feature Toggles
Feature Toggle = Code branch. FT open: new code; otherwise: old code. Mitigate risk by gradually exposing a feature. http://github.com/wix/petri
Feature Toggle = Code branch. Not just a Boolean, can also be a state. Can have criteria: company employees; specific users / group; percentage of traffic; by GEO; by language; by user-agent; user-profile based; any other context… Mitigate risk by gradually exposing a feature. http://github.com/wix/petri
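A toggle with criteria might look something like the sketch below. It is not Petri's API: `Toggle`, `bucket`, `is_open`, and `render_checkout` are hypothetical names, and only three of the listed criteria (specific users, company employees, percentage of traffic) are modelled.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Toggle:
    name: str
    employees_only: bool = False
    allowed_users: frozenset = frozenset()
    percent_of_traffic: int = 0  # 0..100


def bucket(toggle_name: str, user_id: str) -> int:
    # Stable bucket in 0..99, so a given user always gets the same decision.
    digest = hashlib.sha256(f"{toggle_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_open(toggle: Toggle, user_id: str, is_employee: bool = False) -> bool:
    if user_id in toggle.allowed_users:
        return True
    if toggle.employees_only:
        return is_employee
    return bucket(toggle.name, user_id) < toggle.percent_of_traffic


def render_checkout(toggle: Toggle, user_id: str, is_employee: bool = False) -> str:
    # The toggle is a code branch: new code if open, old code otherwise.
    if is_open(toggle, user_id, is_employee):
        return "new checkout"
    return "old checkout"
```

Raising `percent_of_traffic` step by step exposes the new branch gradually, and setting it back to 0 returns everyone to the old code without a deployment.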
New DB Schema with Data Migration. Plan a lazy migration path controlled by a feature toggle. Deploy the new schema/DB.
Lazy migration steps (backward compatibility is a must!)
#1 Write to old / Read from old
#2 Write to both (first old, then new) / Read from old. Warning! This is a distributed transaction: fail on write to old, "ignore" failure on new.
#3 Write to both / Read from new, fall back to old
#4 Write only to new / Read from new, fall back to old. Point of no return: your old DB is now read-only and will not change.
#5 Eagerly migrate data in the background
#6 Write and read to new; remove migration code
http://www.aviransplace.com/2015/12/15/safe-database-migration-pattern-without-downtime/
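The migration steps can be sketched as a repository whose behavior depends on the current phase, which in practice would be driven by a feature toggle. This is an illustrative sketch only: `Phase`, `MigratingRepo`, and `migrate_in_background` are hypothetical names, and plain dictionaries stand in for the old and new databases.

```python
from enum import IntEnum


class Phase(IntEnum):
    OLD_ONLY = 1             # write to old / read from old
    WRITE_BOTH_READ_OLD = 2  # write to both (old first) / read from old
    WRITE_BOTH_READ_NEW = 3  # write to both / read from new, fall back to old
    WRITE_NEW_READ_NEW = 4   # write only to new / read from new, fall back to old
    NEW_ONLY = 6             # write and read to new (step 5 is the background job)


class MigratingRepo:
    def __init__(self, old, new, phase):
        self.old, self.new, self.phase = old, new, phase

    def write(self, key, value):
        if self.phase <= Phase.WRITE_BOTH_READ_NEW:
            self.old[key] = value  # a failure here fails the whole write
        if self.phase >= Phase.WRITE_BOTH_READ_OLD:
            try:
                self.new[key] = value
            except Exception:
                if self.phase >= Phase.WRITE_NEW_READ_NEW:
                    raise  # new is now the source of truth
                # Steps 2 and 3: "ignore" the failure on new (log it in practice).

    def read(self, key):
        if self.phase <= Phase.WRITE_BOTH_READ_OLD:
            return self.old.get(key)
        value = self.new.get(key)
        if value is None and self.phase < Phase.NEW_ONLY:
            value = self.old.get(key)  # not migrated yet: fall back to old
        return value


def migrate_in_background(old, new):
    # Step 5: copy whatever is still missing; never overwrite newer data.
    for key, value in old.items():
        new.setdefault(key, value)
```

Moving from one phase to the next is then a toggle change rather than a deployment, and every phase before the point of no return can be rolled back by switching the toggle to the previous phase.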
Remove old DB
03 Consistent Read
Glasses.com: store owner and customer. In this use case we have two actors that need different consistency levels.
Store owner updates a product’s details: UpdateProduct(…) → Product Service → save data to Master DB → replicate to Slave DB
Customer wants to view a product: GetProduct(…) → Product Service → read data from Slave DB (replicated from Master DB)
Store owner wants to view a product for update: GetProduct(…) → Product Service → read data from Slave DB. Usually not an issue...
Store owner wants to view a product for update: ...unless there’s a replication lag.
Store owner wants to view a product for update. Separate API for consistent reads: GetConsistentProduct(…) → Product Service → read data from Master DB. Good for read after write.
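A minimal sketch of a service that exposes both read paths. The method names mirror the API names on the slides, but the class itself is hypothetical, and dictionaries stand in for the Master DB and Slave DB; replication is simulated by copying manually.

```python
class ProductService:
    def __init__(self, master, replica):
        self.master = master    # stand-in for the Master DB
        self.replica = replica  # stand-in for the Slave DB

    def update_product(self, product_id, details):
        # Writes always go to the master; the replica catches up later.
        self.master[product_id] = details

    def get_product(self, product_id):
        # Regular read: served by the replica, may lag behind the master.
        return self.replica.get(product_id)

    def get_consistent_product(self, product_id):
        # Consistent read: served by the master, good for read after write.
        return self.master.get(product_id)
```

A customer browsing the store can tolerate the stale answer from `get_product`, while a store owner who has just saved a change needs `get_consistent_product` to see it immediately.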
04 Multiple Datacenters
Multiple Data Centers: DC-1 and DC-2 each run a Product Service with a Master DB replicating to a local Slave DB, and the DCs replicate to each other. GetConsistentProduct(…) may arrive at either DC and reads data there.
Cross DC Replication Lag: GetConsistentProduct(…) served in DC-1 and in DC-2 can return inconsistent data while cross-DC replication lags.
Cross DC Flows: a Load Balancer sits in front of the Product Service in each DC (DC-1, DC-2); each DC has a Master DB replicating to a Slave DB, with replication between the DCs.
Option #1 Pin APIs to Active DC
Configure Master DC in the LB; configure API-level stickiness. The Load Balancer in each DC (DC-1, DC-2) forwards GetConsistentProduct(…) to the Product Service in the master DC. Pros: fine-grained control over API; no changes for the service. Cons: complicated LB configuration; multiple connection strings (one for master and one for replica DB).
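The routing decision the load balancer has to make can be expressed as a tiny function. This is a conceptual sketch, not real load-balancer configuration: `MASTER_DC`, `STICKY_APIS`, and `route` are hypothetical names, and choosing DC-1 as the master is an assumption for the example.

```python
MASTER_DC = "DC-1"                      # assumed master DC for this example
STICKY_APIS = {"GetConsistentProduct"}  # APIs pinned to the master DC


def route(api_name, local_dc, master_dc=MASTER_DC, sticky_apis=STICKY_APIS):
    # API-level stickiness: pinned APIs always go to the master DC,
    # everything else is served by the DC that received the request.
    if api_name in sticky_apis:
        return master_dc
    return local_dc
```

Every API that needs consistent data has to be added to the pinned set, which is where the "complicated LB configuration" drawback comes from.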
Option #2 Separate read/write Services
Configure Master DC in the LB; configure service-level stickiness. Each DC (DC-1, DC-2) runs a Product Write Service and a Product Read Service behind its Load Balancer; GetConsistentProduct(…) goes to the Product Write Service in the master DC. Separating services also helps scaling, and there is no need for two DB connection strings (one for master, one for replica) as in the previous example. Pros: no multiple DB connection strings; simpler LB configuration; fits microservices architecture best practice; better for scaling read services. Cons: more complicated system (adding another microservice); additional service for the client to talk with.
Option #3 Pin DB to Service using SQLProxy
Configure Master DC in the SQL Proxy. In each DC (DC-1, DC-2) the Load Balancer forwards to the Product Service, which reaches the Master DB and Slave DB through a SQL Proxy; the proxy sends GetConsistentProduct(…) queries to the master DC. Pros: simple microservice DB configuration; DB replication lag monitoring; adds DB maintenance flexibility. Cons: adds DB access latency; takes control away from the developers.
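The proxy's role can be sketched as a component that owns the choice of database on behalf of the service. This is a conceptual model rather than the configuration of any real proxy product: `SqlProxy`, `report_lag`, and `pick_backend` are hypothetical names, and falling back to the master when the local replica lags too far is an assumption added for illustration, not something the slides state.

```python
class SqlProxy:
    def __init__(self, local_dc, master_dc, max_lag_seconds=5.0):
        self.local_dc = local_dc
        self.master_dc = master_dc
        self.max_lag_seconds = max_lag_seconds  # assumed threshold
        self.local_replica_lag = 0.0

    def report_lag(self, seconds):
        # The proxy sits on the DB path, so it can monitor replication lag.
        self.local_replica_lag = seconds

    def pick_backend(self, consistent=False):
        # Consistent reads are pinned to the master DB in the master DC.
        if consistent:
            return (self.master_dc, "master")
        # Assumption: avoid a badly lagging replica by going to the master.
        if self.local_replica_lag > self.max_lag_seconds:
            return (self.master_dc, "master")
        return (self.local_dc, "replica")
```

The service itself keeps a single connection to the proxy, which is why its DB configuration stays simple while developers lose direct control over which database answers a query.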
Option #4 Redirect Client
Client Routing: the browser sends GetProduct(…) to the Product Service in either DC (DC-1 or DC-2) and sends GetConsistentProduct(…) to the Product Service in the master DC. Pros: fine-grained control over API; simpler DC configuration. Cons: complicated client configuration; traffic changes require updating all clients with the new config.
RECAP: Option 1 – API-level cross DC; Option 2 – Separate Service; Option 3 – ProxySQL (pin to DC); Option 4 – Client routing
WHAT WE DO AT WIX: Option 1 – API-level cross DC; Option 2 – Separate Service; Option 3 – ProxySQL (pin to DC); Option 4 – Client routing
Informing users of eventual-consistency processes: "Your changes are being applied; it may take a few minutes to show up on the site…"
Client API should remain simple. The client calls GetProduct(…) on the API server. Is the caller a store owner? Yes: GetConsistentProduct(…) on the Product Write Service (Master DB). No: GetProduct(…) on the Product Read Service (Slave DB, replicated from Master). Client API should not be aware of consistency concerns.
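The decision shown on this slide can be sketched as a single function in the API server. All names here (`ProductReadService`, `ProductWriteService`, `api_get_product`) are hypothetical stand-ins, and dictionaries replace the Master and Slave DBs.

```python
class ProductReadService:
    def __init__(self, replica):
        self.replica = replica  # stand-in for the Slave DB

    def get_product(self, product_id):
        return self.replica.get(product_id)


class ProductWriteService:
    def __init__(self, master):
        self.master = master  # stand-in for the Master DB

    def get_consistent_product(self, product_id):
        return self.master.get(product_id)


def api_get_product(product_id, is_store_owner, read_service, write_service):
    # The client always calls the same endpoint. The API server decides,
    # based on who is asking, which consistency level the caller needs.
    if is_store_owner:
        return write_service.get_consistent_product(product_id)
    return read_service.get_product(product_id)
```

Because the branch lives on the server, the rule for who gets consistent reads can change without touching any client.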
Arrow -> Distributed System: Avoiding database transactions; Handling database schema changes; Read consistency in a distributed system; Dealing with multiple datacenters
Thank You Download presentation at: http://wix.to/sUCZAGY twitter@aviranm linkedin/aviran aviransplace.com