Data Infrastructure at LinkedIn
Shirshanka Das
XLDB
Me
– UCLA: Ph.D. (distributed protocols in content delivery networks)
– PayPal: Web frameworks and Session Stores
– Yahoo!: Serving Infrastructure, Graph Indexing, Real-time Bidding in Display Ad Exchanges
– LinkedIn (Distributed Data Systems team): distributed data transport and storage technology (Kafka, Databus, Espresso, …)
Outline
– LinkedIn Products
– Data Ecosystem
– LinkedIn Data Infrastructure Solutions
– Next Play
LinkedIn By The Numbers
– 120,000,000+ users in August 2011
– 2 new user registrations per second
– 4 billion People Searches expected in 2011
– 2+ million companies with LinkedIn Company Pages
– 81+ million unique visitors monthly *
– 150K domains feature the LinkedIn Share Button
– 7.1 billion page views in Q
– 1M LinkedIn Groups
* Based on comScore, Q
Member Profiles
Signal - faceted stream search
People You May Know
Data Ecosystem
Three Paradigms: Simplifying the Data Continuum
– Online – activity that should be reflected immediately: Member Profiles, Company Profiles, Connections, Communications
– Nearline – activity that should be reflected soon: Signal, Profile Standardization, News Recommendations, Search, Communications
– Offline – activity that can be reflected later: People You May Know, Connection Strength, News Recommendations, Next best idea
Data Infrastructure Toolbox (Online)
– Capabilities: key-value access; rich structures (e.g. indexes); change capture capability; search platform; graph engine
– Systems, Analysis: (shown in the original slide)

Data Infrastructure Toolbox (Nearline)
– Systems, Analysis: (shown in the original slide)

Data Infrastructure Toolbox (Offline)
– Capabilities: machine learning, ranking, relevance; analytics on social gestures
– Systems, Analysis: (shown in the original slide)
Laying out the tools
LinkedIn Data Infrastructure Solutions
Focus on four systems in Online and Nearline
Data Transport
– Kafka
– Databus
Online Data Stores
– Voldemort
– Espresso
Kafka: High-Volume Low-Latency Messaging System
Kafka: Architecture
– The web tier pushes events to a broker tier, organized by topic (Topic 1 … Topic N); brokers rely on sequential writes and sendfile
– Zookeeper handles offset management and topic/partition ownership
– Consumers pull events through the Kafka client library, iterating over a topic from a stored offset (Iterator 1 … Iterator n)
Scale: 100 MB/sec / 200 MB/sec; billions of events, TBs per day; inter-colo: a few seconds; typical retention: weeks
Guarantees: at-least-once delivery; very high throughput; low latency; durability
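The pull-based consumption and per-partition offsets above map naturally onto a simple consumer loop. Below is a minimal sketch using today's Kafka Java client (which post-dates this talk; LinkedIn's original client library differed); the broker address, group id, and topic name are made-up examples.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ActivityEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // broker tier (hypothetical host)
        props.put("group.id", "news-feed-relevance");      // consumer group owning topic partitions
        props.put("enable.auto.commit", "false");          // commit offsets only after processing
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("page-views"));   // hypothetical topic
            while (true) {
                // Consumers PULL batches of events; the offset tracks position per partition.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.offset(), record.value());
                }
                consumer.commitSync();   // commit after processing => at-least-once delivery
            }
        }
    }

    static void process(long offset, String event) { /* hand off to the downstream system */ }
}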
Databus: Timeline-Consistent Change Data Capture
Databus at LinkedIn
– A relay captures changes from the source DB into an event window; clients receive on-line changes through the Databus client library (Consumer 1 … Consumer n)
– A bootstrap service built from the captured changes serves a compressed delta since time T, or a consistent snapshot at time U, to consumers that fall behind the relay
Features: transport independent of data source (Oracle, MySQL, …); portable change event serialization and versioning; start consumption from an arbitrary point
Guarantees: transactional semantics; timeline consistency with the data source; durability (by data source); at-least-once delivery; availability; low latency
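On the consuming side, the client library drives application callbacks with change events in commit order. The sketch below uses hypothetical interface names (not the actual Databus client API) to illustrate how at-least-once delivery and timeline consistency are typically handled by a consumer.

// Hypothetical interfaces sketching the consumer callback model described above;
// the real Databus client library's class names and signatures differ.
interface ChangeEvent {
    long sequenceNumber();     // commit sequence number from the source's transaction log
    String sourceTable();
    byte[] avroPayload();      // portable, versioned change record
}

interface ChangeStreamConsumer {
    void onStartTransaction(long scn);
    void onEvent(ChangeEvent event);   // may be re-delivered: at-least-once semantics
    void onEndTransaction(long scn);   // events of one source transaction arrive together
}

class SecondaryIndexUpdater implements ChangeStreamConsumer {
    private long lastAppliedScn = -1;

    public void onStartTransaction(long scn) {
        // begin a local batch for this source transaction
    }

    public void onEvent(ChangeEvent e) {
        if (e.sequenceNumber() <= lastAppliedScn) {
            return;   // idempotent: skip events replayed after a restart
        }
        // apply the change in commit order => timeline consistency with the source
    }

    public void onEndTransaction(long scn) {
        lastAppliedScn = scn;   // checkpoint; a restart resumes (or bootstraps) from here
    }
}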
Voldemort: Highly-Available Distributed Data Store
Voldemort: Architecture
Highlights
– Open source
– Pluggable components
– Tunable consistency / availability
– Key/value model, server-side “views”
In production
– Data products
– Network updates, sharing, page view tracking, rate-limiting, more…
Future: SSDs, multi-tenancy
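The key/value model shows up directly in the client API. A minimal sketch along the lines of Voldemort's published Java client; the bootstrap URL, store name, key, and value are made up.

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class VoldemortExample {
    public static void main(String[] args) {
        // Bootstrap from any node to learn the cluster and store metadata.
        StoreClientFactory factory = new SocketStoreClientFactory(
                new ClientConfig().setBootstrapUrls("tcp://voldemort-host:6666"));
        StoreClient<String, String> client = factory.getStoreClient("member-updates");

        client.put("member:42", "{\"lastSharedArticle\": 1234}");
        Versioned<String> value = client.get("member:42");   // value carries a vector-clock version
        System.out.println(value.getValue());
    }
}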
Espresso: Indexed Timeline-Consistent Distributed Data Store
Espresso: Key Design Points
Hierarchical data model
– InMail, Forums, Groups, Companies
Native change data capture stream
– Timeline consistency
– Read after write
Rich functionality within a hierarchy
– Local secondary indexes
– Transactions
– Full-text search
Modular and pluggable
– Off-the-shelf: MySQL, Lucene, Avro
Application View
Partitioning
Partition Layout: Master, Slave
– Example cluster: 3 storage engine nodes, 12 partitions, 2-way replication; each node masters some partitions and holds slave copies of others (e.g. Node 1: M: P.1 – Active, S: P.5 – Active)
– The Cluster Manager maintains the database's partition-to-node mapping (Partition P.1 → Node 1, …, Partition P.12 → Node 3) and, per node, the list of master and slave partitions with their state
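To make the layout concrete, here is a toy assignment of 12 partitions across 3 nodes with one master and one slave copy each (2-way replication). This only illustrates the bookkeeping; it is not Espresso's actual placement algorithm.

import java.util.Map;
import java.util.TreeMap;

public class PartitionLayoutSketch {
    public static void main(String[] args) {
        int partitions = 12, nodes = 3;
        Map<Integer, int[]> assignment = new TreeMap<>();   // partition -> {masterNode, slaveNode}
        for (int p = 0; p < partitions; p++) {
            int master = p % nodes;
            int slave = (p + 1) % nodes;                     // slave always lands on a different node
            assignment.put(p + 1, new int[] {master + 1, slave + 1});
        }
        // The Cluster Manager owns a mapping like this; routers consult it to find the
        // current master of a partition, and a slave is promoted if the master fails.
        assignment.forEach((p, ns) ->
                System.out.printf("P.%d  master=Node %d  slave=Node %d%n", p, ns[0], ns[1]));
    }
}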
Espresso: API
REST over HTTP
Get messages for bob
– GET /MailboxDB/MessageMeta/bob
Get MsgId 3 for bob
– GET /MailboxDB/MessageMeta/bob/3
Get the first page of messages for bob that are unread and in the inbox
– GET /MailboxDB/MessageMeta/bob/?query="+isUnread:true +isInbox:true"&start=0&count=15
Espresso: API Transactions
Add a message to bob's mailbox; transactionally update mailbox aggregates, insert into metadata and details.

POST /MailboxDB/*/bob HTTP/1.1
Content-Type: multipart/binary; boundary=
Accept: application/json

Content-Type: application/json
Content-Location: /MailboxDB/MessageStats/bob
Content-Length: 50

{"total": "+1", "unread": "+1"}

Content-Type: application/json
Content-Location: /MailboxDB/MessageMeta/bob
Content-Length: 332

{"from": "…", "subject": "…", …}

Content-Type: application/json
Content-Location: /MailboxDB/MessageDetails/bob
Content-Length: 542

{"body": "…"}
—
Espresso: System Components
Espresso at LinkedIn
First applications
– Company Profiles
– InMail
Next
– Unified Social Content Platform
– Member Profiles
– Many more…
Espresso: Next Steps
– Launched first application Oct 2011
– Open source 2012
– Multi-datacenter support
– Log-structured storage
– Time-partitioned data
Next Play
The Specialization Paradox in Distributed Systems
– Good: build specialized systems so you can do each thing really well
– Bad: rebuild distributed routing, failover, cluster management, monitoring, tooling
Generic Cluster Manager: Helix
– Generic distributed state model (sketch below)
– Centralized config management
– Automatic load balancing
– Fault tolerance
– Health monitoring
– Cluster expansion and rebalancing
– Open source 2012
– Used by Espresso, Databus, and Search
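In Helix, a node's behavior for each partition is described by a state model: a set of states plus transition callbacks that the cluster manager invokes. The sketch below is written in the style of the Apache Helix participant API as later open-sourced; exact package names and annotations may differ from the version described in this talk, and the callback bodies are placeholders.

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelInfo;
import org.apache.helix.participant.statemachine.Transition;

// MASTER/SLAVE state model sketch: the cluster manager drives each partition replica
// through these transitions to realize the layout it has computed.
@StateModelInfo(initialState = "OFFLINE", states = {"MASTER", "SLAVE", "OFFLINE"})
public class MasterSlaveStateModel extends StateModel {

    @Transition(from = "OFFLINE", to = "SLAVE")
    public void onBecomeSlaveFromOffline(Message msg, NotificationContext ctx) {
        // open the partition's store and start catching up from the change stream
    }

    @Transition(from = "SLAVE", to = "MASTER")
    public void onBecomeMasterFromSlave(Message msg, NotificationContext ctx) {
        // stop consuming the change stream and start accepting writes for this partition
    }

    @Transition(from = "MASTER", to = "SLAVE")
    public void onBecomeSlaveFromMaster(Message msg, NotificationContext ctx) {
        // relinquish mastership during rebalancing or planned failover
    }
}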
Stay tuned for
Innovation
– Nearline processing
– Espresso ecosystem
– Storage / indexing
– Analytics engine
– Search
Convergence
– Building blocks for distributed data management systems
Thanks!
Appendix
Espresso: Routing
– Router is a high-performance HTTP proxy
– Examines the URL, extracts the partition key
– Per-db routing strategy: hash-based; route to any (for schema access); range (future)
– Routing function maps the partition key to a partition (sketch below)
– Cluster Manager maintains the mapping of partitions to hosts: single master, multiple slaves
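A minimal sketch of the hash-based strategy, assuming the partition key is the third path segment as in the /MailboxDB/MessageMeta/bob examples earlier; this is an illustration, not the router's actual code.

import java.util.Map;

public class HashRoutingSketch {
    private final int numPartitions;
    private final Map<Integer, String> partitionToMaster;   // kept up to date via the Cluster Manager

    HashRoutingSketch(int numPartitions, Map<Integer, String> partitionToMaster) {
        this.numPartitions = numPartitions;
        this.partitionToMaster = partitionToMaster;
    }

    String route(String path) {
        // e.g. "/MailboxDB/MessageMeta/bob/3" -> partition key "bob"
        String partitionKey = path.split("/")[3];
        int partition = Math.floorMod(partitionKey.hashCode(), numPartitions);
        return partitionToMaster.get(partition);             // host currently mastering the partition
    }
}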
Espresso: Storage Node
Data store (MySQL)
– Stores each document as an Avro-serialized blob
– Blob indexed by (partition key {, sub-key})
– Row also contains limited metadata: etag, last-modified time, Avro schema version
Document schema specifies per-field index constraints
Lucene index per partition key / resource
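The Avro-serialized blob can be produced with the standard Avro Java API. The MessageMeta schema and fields below are invented for illustration, and the sketch ignores the row metadata (etag, last-modified time, schema version).

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;

public class DocumentBlobSketch {
    // Made-up document schema; real Espresso document schemas also carry index constraints.
    static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"MessageMeta\",\"fields\":["
          + "{\"name\":\"from\",\"type\":\"string\"},"
          + "{\"name\":\"subject\",\"type\":\"string\"},"
          + "{\"name\":\"isUnread\",\"type\":\"boolean\"}]}");

    static byte[] serialize(String from, String subject, boolean isUnread) throws IOException {
        GenericRecord doc = new GenericData.Record(SCHEMA);
        doc.put("from", from);
        doc.put("subject", subject);
        doc.put("isUnread", isUnread);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(doc, encoder);
        encoder.flush();
        return out.toByteArray();   // stored as the blob value, keyed by (partition key {, sub-key})
    }
}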
Espresso: Replication
– MySQL replication of mastered partitions
– MySQL “slave” is a MySQL instance with a custom storage engine
  – the custom storage engine just publishes to Databus
– Per-database commit sequence number
– Replication is Databus
  – supports existing downstream consumers
– Storage node consumes from Databus to update secondary indexes and slave partitions