
Scalable Data

Scale

▪ #2 site on the Internet (time on site)
▪ >200 billion monthly page views
▪ Over 1 million developers in 180 countries
▪ Over 300 million active users
▪ More than 2^32 photos …
▪ 100 million search queries per day
▪ >3.9 trillion feed actions processed per day
▪ 2 billion pieces of content per week
▪ 6 billion minutes per day

Growth Rate
[Chart: active users, in millions (M)]

Social Networks

OSNs (online social networks) are popular!
▪ OSNs have become wildly popular over the last few years: FB > 800M users, Twitter > 230M, etc.
▪ Distributed across the planet
▪ Changed how content is created and consumed: inherently long-tailed, as only ‘friends’ are interested
▪ Explosion of smartphones: photos and HD videos are easy to shoot and share

Scaling Social Networks
▪ Much harder than typical websites, where...
  ▪ Typically 1-2% online: easy to cache the data
  ▪ Partitioning & scaling relatively easy
▪ What do you do when everything is interconnected?

[Diagram: many repeated per-item lookups, alternating "name, status, privacy, profile photo" and "name, status, privacy, video thumbnail"]

System Architecture

Overall architecture: Facebook
▪ Facebook has 2 datacenters, 1 per coast
  ▪ Reads spread across both
  ▪ Writes go only to the West Coast; replicated periodically (~10 minutes) to the East Coast
▪ >2000 MySQL servers, >25TB RAM for memcached
▪ Challenge: inconsistency due to stale data
  ▪ I change my status message => friends on the East Coast datacenter don’t see the change for 10 min
  ▪ What if an East Coast person changes their own status?
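The stale-read problem on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `TwoDatacenters` class, its method names, and the snapshot-style copy are hypothetical and are not Facebook's replication mechanism. The only figure taken from the slide is the roughly 10-minute replication interval.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

REPLICATION_INTERVAL_S = 600  # ~10 minutes, the interval quoted on the slide


@dataclass
class TwoDatacenters:
    """Hypothetical model: West accepts all writes, East is a periodic replica."""

    west: Dict[str, str] = field(default_factory=dict)
    east: Dict[str, str] = field(default_factory=dict)
    last_sync: float = 0.0

    def write(self, key: str, value: str) -> None:
        # All writes land on the West Coast only.
        self.west[key] = value

    def read(self, key: str, coast: str, now: float) -> Optional[str]:
        # Reads are served by whichever coast the reader is routed to.
        self._maybe_replicate(now)
        store = self.west if coast == "west" else self.east
        return store.get(key)

    def _maybe_replicate(self, now: float) -> None:
        # Assumption: replication is modeled as a full snapshot copy
        # once the interval has elapsed.
        if now - self.last_sync >= REPLICATION_INTERVAL_S:
            self.east = dict(self.west)
            self.last_sync = now


dc = TwoDatacenters()
dc.write("status:alice", "old status")
dc.read("status:alice", "east", now=600)      # triggers a replication pass
dc.write("status:alice", "new status")
stale = dc.read("status:alice", "east", now=700)   # East has not caught up yet
fresh = dc.read("status:alice", "west", now=700)   # West sees the write at once
print(stale, "|", fresh)
```

In this toy model an East Coast reader keeps seeing the old value until the next replication pass, which is the inconsistency the slide describes.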

Web at 100 feet: georeplication & CDNs
Source: “How Facebook Works”, Technology Review, Jul/Aug

Architecture
▪ Load Balancer (assigns a web server)
▪ Web Server (PHP assembles data)
▪ Memcache (fast, simple)
▪ Database (slow, persistent)

Memcache
▪ Simple in-memory hash table
▪ Supports get/set, delete, multiget, multiset
▪ Not a write-through cache
▪ Pros and Cons
  ▪ The Database Shield!
  ▪ Low latency, very high request rates
  ▪ Can be easy to corrupt, inefficient for very small items
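Because the cache is not write-through, the application itself has to keep the cache and the database in step. One common way to do that is the look-aside pattern sketched below: read from the cache, fall back to the database on a miss, and delete the cached key on every write. The slide does not say which pattern was used, so this is an assumption; `LookAsideCache`, `read_status`, and `write_status` are hypothetical names, and a plain dict stands in for the database.

```python
from typing import Dict, Iterable, Optional


class LookAsideCache:
    """Toy in-memory stand-in for a memcache-style hash table (hypothetical)."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def set(self, key: str, value: str) -> None:
        self._items[key] = value

    def delete(self, key: str) -> None:
        self._items.pop(key, None)

    def multiget(self, keys: Iterable[str]) -> Dict[str, str]:
        # Return only the keys that are present; misses are simply absent.
        return {k: self._items[k] for k in keys if k in self._items}


def read_status(user: str, cache: LookAsideCache, db: Dict[str, str]) -> str:
    """Read path: try the cache first, fall back to the database on a miss."""
    cached = cache.get(user)
    if cached is not None:
        return cached
    value = db[user]        # slow, persistent store
    cache.set(user, value)  # populate the cache for later readers
    return value


def write_status(user: str, value: str, cache: LookAsideCache, db: Dict[str, str]) -> None:
    """Write path: update the database, then invalidate the cached copy."""
    db[user] = value
    cache.delete(user)      # the next read repopulates from the database


db = {"alice": "at the beach"}
cache = LookAsideCache()
print(read_status("alice", cache, db))   # miss, then filled from the database
write_status("alice", "back home", cache, db)
print(read_status("alice", cache, db))   # miss again after invalidation
```

Deleting rather than updating the cached value on a write keeps the write path simple, at the cost of one extra database read afterwards.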

Memcache Optimization
▪ Multithreading and efficient protocol code - 50k req/s
▪ Polling network drivers - 150k req/s
▪ Breaking up stats lock - 200k req/s
▪ Batching packet handling - 250k req/s
▪ Breaking up cache lock - future

Network Incast
▪ [Diagram: PHP Client, Switch, Memcache] Many small get requests
▪ [Diagram: PHP Client, Switch, Memcache] Many big data packets

Memcache Clustering
[Diagram: a PHP Client fetches objects from three Memcache servers holding 3, 4, and 3 objects]
▪ 1 round trip per server
▪ 3 round trips total
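The round-trip count in the diagram follows from batching: the client groups the keys it needs by the server that owns them and issues one multiget per server. The sketch below shows that grouping. The modulo-of-CRC32 placement and the function names are assumptions made for illustration; the slide does not say how keys are mapped to servers, and real clients often use consistent hashing instead.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def server_for(key: str, num_servers: int) -> int:
    """Pick a server index for a key (assumed placement: CRC32 modulo N)."""
    return zlib.crc32(key.encode("utf-8")) % num_servers


def plan_multiget(keys: Iterable[str], num_servers: int) -> Dict[int, List[str]]:
    """Group keys by owning server so each server receives one batched request."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        batches[server_for(key, num_servers)].append(key)
    return dict(batches)


def round_trips(keys: Iterable[str], num_servers: int) -> int:
    """One round trip per server that owns at least one requested key."""
    return len(plan_multiget(keys, num_servers))


keys = [f"user:{i}" for i in range(10)]
for server, batch in sorted(plan_multiget(keys, 3).items()):
    print(f"server {server}: {len(batch)} objects in one round trip")
print("round trips total:", round_trips(keys, 3))
```

The number of round trips is bounded by the number of servers rather than the number of objects, which is why spreading a fixed set of keys over more servers increases the request count per page.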

MySQL has played a role from the beginning
▪ Thousands of MySQL servers in two datacenters
[Diagram: Scribe]

Photos

Photos + Social Graph = Awesome!

Photos: Scale
▪ 20 billion photos x 4 sizes each = 80 billion stored images
▪ Would wrap around the world more than 10 times!
▪ Over 40M new photos per day
▪ 600K photos served / second
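The figures on this slide can be checked with a little arithmetic. The sketch below assumes that "x4" means each photo is stored in four sizes and that the 600K-per-second figure is a serving rate rather than an upload rate; neither reading is spelled out on the slide.

```python
PHOTOS = 20_000_000_000           # 20 billion photos (from the slide)
SIZES_PER_PHOTO = 4               # assumed meaning of "x4"
NEW_PHOTOS_PER_DAY = 40_000_000   # over 40M new photos per day (from the slide)
SERVED_PER_SECOND = 600_000       # assumed to be a serving rate
SECONDS_PER_DAY = 24 * 60 * 60

stored_images = PHOTOS * SIZES_PER_PHOTO
uploads_per_second = NEW_PHOTOS_PER_DAY / SECONDS_PER_DAY
reads_per_upload = SERVED_PER_SECOND / uploads_per_second

print(f"stored images:      {stored_images:,}")
print(f"uploads per second: {uploads_per_second:.0f}")
print(f"reads per upload:   {reads_per_upload:.0f}")
```

Under these assumptions, uploads average a few hundred per second while serving runs three orders of magnitude higher, so the workload is overwhelmingly read-dominated.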

Photos Scaling - The easy wins
▪ Upload tier: handles uploads, scales images, stores on NFS
▪ Serving tier: images served from NFS via HTTP
▪ However...
  ▪ File systems are not good at supporting a large number of files
  ▪ Metadata is too large to fit in memory, causing too many I/Os for each file read
  ▪ Limited by I/O, not storage density
▪ Easy wins
  ▪ CDN
  ▪ Cachr (HTTP server + caching)
  ▪ NFS file handle cache