LINKEDIN COMMUNICATION ARCHITECTURE Ruslan Belkin, Sean Dawson.

Slides:



Advertisements
Similar presentations
MQ Series Cross Platform Dominant Messaging sw – 70% of market Messaging API same on all platforms Guaranteed one-time delivery Two-Phase Commit Wide EAI.
Advertisements

Building LinkedIn’s Real-time Data Pipeline
Module 13: Performance Tuning. Overview Performance tuning methodologies Instance level Database level Application level Overview of tools and techniques.
Adam Jorgensen Pragmatic Works Performance Optimization in SQL Server Analysis Services 2008.
The Top 10 Reasons Why Federated Can’t Succeed And Why it Will Anyway.
SSRS 2008 Architecture Improvements Scale-out SSRS 2008 Report Engine Scalability Improvements.
Unveiling ProjectWise V8 XM Edition. ProjectWise V8 XM Edition An integrated system of collaboration servers that enable your AEC project teams, your.
© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
A Java Architecture for the Internet of Things Noel Poore, Architect Pete St. Pierre, Product Manager Java Platform Group, Internet of Things September.
HyperContent 2.0 JA-SIG Winter Conference December 5, 2005 Alex Vigdor, Columbia University.
DEV392: Extending SharePoint Products And Technologies Through Web Parts And ASP.NET Clint Covington, Program Manager Data And Developer Services - Office.
 Java  Python  Bigtable(Bt) is a distributed storage system for managing structured data that is designed to scale to a very large size.  Query Language.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
One.box Distributed home service interface. Core Components Pop3 client Router Storage Pop3 Server.
Extensible Scalable Monitoring for Clusters of Computers Eric Anderson U.C. Berkeley Summer 1997 NOW Retreat.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
EEC-681/781 Distributed Computing Systems Lecture 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Chapter 1 and 2 Computer System and Operating System Overview
September 2011 At A Glance The API provides a common interface to the GMSEC software information bus. Benefits Isolates both complexity of applications.
Module 8: Monitoring SQL Server for Performance. Overview Why to Monitor SQL Server Performance Monitoring and Tuning Tools for Monitoring SQL Server.
Passage Three Introduction to Microsoft SQL Server 2000.
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
Web Application Architecture: multi-tier (2-tier, 3-tier) & mvc
SQL Server Notification Services Andy Potter Senior System Consultant SQL Server Notification Services Intellinet.
How WebMD Maintains Operational Flexibility with NoSQL Rajeev Borborah, Sr. Director, Engineering Matt Wilson – Director, Production Engineering – Consumer.
© 2011 IBM Corporation 11 April 2011 IDS Architecture.
© 2009 IBM Corporation 1 RTC ClearQuest Importer and Synchronizer Lorelei Ngooi – RTC ClearQuest Synchronizer Lead.
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
Word Wide Cache Distributed Caching for the Distributed Enterprise.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
Institute of Computer and Communication Network Engineering OFC/NFOEC, 6-10 March 2011, Los Angeles, CA Lessons Learned From Implementing a Path Computation.
Lecture On Database Analysis and Design By- Jesmin Akhter Lecturer, IIT, Jahangirnagar University.
Taiwan Network Information Center Introduction to TWNIC RMS (Resource Management System) 15 th APNIC NIR Meeting David Chen Feb 26,
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
CH2 System models.
Module 7: Fundamentals of Administering Windows Server 2008.
® IBM Software Group © 2007 IBM Corporation J2EE Web Component Introduction
Microsoft SharePoint Server 2010 for the Microsoft ASP.NET Developer Yaroslav Pentsarskyy
Kelly Boccia Abi Natarajan Konstantin Livitski Senthil Anand Subbanan Meyyappan 1.
Module 9: Preparing to Administer a Server. Overview Introduction to Administering a Server Configuring Remote Desktop to Administer a Server Managing.
The Client/Server Database Environment Ployphan Sornsuwit KPRU Ref.
The Replica Location Service The Globus Project™ And The DataGrid Project Copyright (c) 2002 University of Chicago and The University of Southern California.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.
Process Architecture Process Architecture - A portion of a program that can run independently of and concurrently with other portions of the program. Some.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF.
Mailjet and Microsoft Azure Offer All-in-One Infrastructure and Deliverability while Saving IT and Enterprise Time and Money with Scalability MICROSOFT.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
MISSION CRITICAL COMPUTING Siebel Database Considerations.
Group Communication Theresa Nguyen ICS243f Spring 2001.
+ Logentries Is a Real-Time Log Analytics Service for Aggregating, Analyzing, and Alerting on Log Data from Microsoft Azure Apps and Systems MICROSOFT.
Message Store CORE SYSTEMS MANAGEMENT AND AVAILABILITY INTEGRATION – COPPERPOINT.
IPS Infrastructure Technological Overview of Work Done.
© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Database Growth: Problems & Solutions.
Interstage BPM v11.2 1Copyright © 2010 FUJITSU LIMITED INTERSTAGE BPM ARCHITECTURE BPMS.
OSIsoft High Availability PI Replication Colin Breck, PI Server Team Dave Oda, PI SDK Team.
The Holmes Platform and Applications
MQ Series Cross Platform Dominant Messaging sw – 70% of market
Module 9: Preparing to Administer a Server
Netscape Application Server
Open Source distributed document DB for an enterprise
SmartHOTEL Planner Add-In for Outlook: Office 365 Integration Enhances Room Planning, Booking, and Guest Management for Small Hotels and B&Bs OFFICE 365.
Software Architecture in Practice
#01 Client/Server Computing
The Top 10 Reasons Why Federated Can’t Succeed
MQ Series Cross Platform Dominant Messaging sw – 70% of market
Module 9: Preparing to Administer a Server
TN19-TCI: Integration and API management using TIBCO Cloud™ Integration
#01 Client/Server Computing
Presentation transcript:

LINKEDIN COMMUNICATION ARCHITECTURE Ruslan Belkin, Sean Dawson

Learn how LinkedIn built and evolved scalable communication platform for the world’s largest professional network

Agenda  LinkedIn Communication Platform at a glance Evolution of LinkedIn Communication System Evolution of the Network Updates System  Scaling the system: from 0 to 23M members  Q&A

Communication Platform – Quick Tour

Communication Platform – The Numbers  23M members  130M connections  2M messages per day  250K invitations per day

Communication Platform – The Setup  Sun™ x86 platform and Sparc production hardware running Solaris™ Operating System  100% Java programming language  Tomcat and Jetty as application servers  Oracle and MySQL as DBs  ActiveMQ for JMS  Lucene as a foundation for search  Spring as a glue  Spring HTTP-RPC as communication protocol  Mac for development

Communication Platform – Overview  The Communication Service Permanent Message storage InBox messages s Batching, delayed delivery Bounce, cancellation Actionable content Rich content  The network updates service Short-lived notifications (events) Distribution across various affiliations and groups Time decay Events grouping and prioritization

Communication Service  How is it different: Workflow oriented Messages reference other objects in the system Incorporates delivery Batching of messages Message cancellation Delayed delivery, customer service review queues, abuse controls Supports reminders and bounce notifications to users  Has undergone continuous improvements throughout life of LinkedIn

Message Creation

Message Delivery

Communication Service  Message Creation Clients post messages via asynchronous Java Communications API using JMS Messages then are routed via routing service to the appropriate mailbox or directly for processing Multiple member or guest databases are supported  Message Delivery Message delivery is triggered by clients or by scheduled processes Delivery actions are asynchronous Messages can be batched for delivery into a single message Message content is processed through the JavaServer Page™ (JSP™) technology for pretty formatting The scheduler can take into account the time, delivery preferences, system load Bounced messages are processed and redelivered if needed Reminder system works the same way as message delivery system

Communication Service – Scheduler

Network Updates Service – Motivation  Homepage circa 2007  Poor UI Cluttered Where does new content go?  Poor Backend Integration Many different service calls Takes a long time to gather all of the data

Network Updates Service – Motivation  Homepage circa 2008  Clean UI Eliminates contention for homepage real estate  Clean Backend Single call to fetch updates Consistent update format

Network Updates Service – Iteration 1 API

 Pull-based architecture  Collectors Responsible for gathering data Parallel collection to improve performance  Resolvers Fetch state, batch lookup queries, etc… Use EHCache to cache global data (e.g., member info)  Rendering Transform each object into its XML representation

Network Updates Service – Iteration 1  Lessons learned: Centralizing updates into a single service leaves a single point of failure Be prepared to spend time tuning the HttpConnectionManager (timeouts, max connections) While the system was stabilizing, it was affecting all users; should have rolled the new service out to a small subset! Don’t use “Least Frequently Used” (LFU) in a large EHCache—very bad performance!

Network Updates Service – Iteration 2  Hollywood Principle: “Don’t call me, I’ll call you”  Push update when an event occurs  Reading is much quicker since we don’t have to search for the data!  Tradeoffs Distributed updates may never be read More storage space needed

Network Updates Service – Iteration 2 (Push)

Network Updates Service – Iteration 2 (Read)

Network Updates Service – Iteration 2  Pushing Updates Updates are delivered via JMS Aggregate data stored in 1 CLOB column for each target user Incoming updates are merged into the aggregate structure using optimistic locking to avoid lock contention  Reading Updates Add a new collector that reads from the Update Database Use Digesters to perform arbitrary transformations on the stream of updates (e.g, collapse 10 updates from a user into 1)

Network Updates Service – Iteration 2  Lessons learned: Underestimated the volume of updates to be processed CLOB block size was set to 8k, leading to a lot of wasted space (which isn’t reclaimed!) Real-time monitoring/configuration with Java Management Extension (JMX™) specification was extremely helpful

Network Updates Service – JMX Monitoring  We use JMX™ extensively for monitoring and runtime configuration  For the Network Updates Service, we have statistics for: EHCache (cache hits, misses) ThreadPoolExecutor health (number of concurrent threads) ActiveMQ (queue size, load) Detailed timing information for update persistence  Extensive runtime configuration: Ability to tune update thresholds Ability to randomly throw out (or disable) updates when load is high (configurable by type)

JMX Interface – Cache Monitoring

JMX Interface – Network Updates Configuration

Network Updates Service – Iteration 3  Updating a CLOB is expensive  Goal: Minimize the number of CLOB updates Use an overflow buffer Reduce the size of the updates

Network Updates Service – Iteration 3  Add VARCHAR(4000) column that acts as a buffer  When the buffer is full, dump it to the CLOB and reset  Avoids over 90% of CLOB updates (depending on type), while still retaining the flexibility for more storage

Scaling the system  What you learn as you scale: A single database does not work Referential integrity will not be possible Cost becomes a factor: databases, hardware, licenses, storage, power Any data loss is a problem Data warehousing and analytics becomes a problem Your system becomes a target for spamming exploits, data scraping, etc.  What to do: Partition everything: by user groups by domain by function Caching is good even when it’s only modestly effective Give up on 100% data integrity Build for asynchronous flows Build with reporting in mind Expect your system to fail at any point Never underestimate growth trajectory

LinkedIn Communication Architecture Ruslan Belkin Sean Dawson Slides will be posted at We are hiring, join our team!

LinkedIn Spring Extensions  Automatic context instantiation from multiple spring files  LinkedIn Spring Components  Property expansion  Automatic termination handling  Support for Builder Pattern  Custom property editors: Timespan (30s, 4h34m, etc.) Memory Size, etc.