Wikimedia architecture Ryan Lane Wikimedia Foundation Inc.

Slides:



Advertisements
Similar presentations
Module 13: Implementing ISA Server 2004 Enterprise Edition: Site-to-Site VPN Scenario.
Advertisements

CHAPTER 15 WEBPAGE OPTIMIZATION. LEARNING OBJECTIVES How to test your web-page performance How browser and server interactions impact performance What.
ITIS 3110 Jason Watson. Replication methods o Primary/Backup o Master/Slave o Multi-master Load-balancing methods o DNS Round-Robin o Reverse Proxy.
Brion Vibber Google Tech Talk April 28, 2006 Wikimedia Foundation, Inc. Wikipedia and MediaWiki What are those crazy wiki people up to anyway?
Building Wikipedia Scalable LAMP on a shoestring budget Brion VibberGatorJUG
Upcoming MediaWiki goodies (aka, Wikipedia takes over everything) Brion Vibber Wikimedia Foundation, Inc. MySQL Users Conference Santa Clara, CA April.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
World Wide Web Caching: Trends and Technology Greg Barish and Katia Obraczka USC Information Science Institute IEEE Communications Magazine, May 2000 Presented.
AKAMAI Content Delivery Services AKAMAI Content Delivery Services CIS726 : PRESENTATION Avinash Ponugoti Avinash Ponugoti Nagarjuna Nagulapati Sathish.
Caching and Content Distribution Networks. Web Caching r As an example, we use the web to illustrate caching and other related issues browser Web Proxy.
Web Cache. Introduction what is web cache?  Introducing proxy servers at certain points in the network that serve in caching Web documents for faster.
Efficient Content Distribution on Internet. Who pays for showing a Web page to a user? Receiving side –Users pay to small ISPs, who pay to big ISPs, who.
11 SERVER CLUSTERING Chapter 6. Chapter 6: SERVER CLUSTERING2 OVERVIEW  List the types of server clusters.  Determine which type of cluster to use for.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
22-Aug-15 | 1 |1 | Help! I need more servers! What do I do? Scaling a PHP application.
Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.
IT 210 The Internet & World Wide Web introduction.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
1 Apache. 2 Module - Apache ♦ Overview This module focuses on configuring and customizing Apache web server. Apache is a commonly used Hypertext Transfer.
World Wide Web Caching: Trends and Technologys Gerg Barish & Katia Obraczka USC Information Sciences Institute, USA,2000.
Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 17 This presentation © 2004, MacAvon Media Productions Multimedia and Networks.
On the use of Reliable Multicast for Content Distribution Vassilis Chatzigiannakis
Scalability at GROU.PS Emre Sokullu. Disclaimer We’re not fully there yet We hire:
Internet Real-Time Laboratory Arezu Moghadam and Suman Srinivasan Columbia University in the city of New York 7DS System Design 7DS system is an architecture.
11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.
Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 17 This presentation © 2004, MacAvon Media Productions Multimedia and Networks.
By Ruizhe Ma, Avinash Madineni Sidoine Lafleur Kamgang Nov,
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
Scalable Data Scale #2 site on the Internet (time on site) >200 billion monthly page views Over 1 million developers in 180 countries.
Microsoft Cloud Solution.  What is the cloud?  Windows Azure  What services does it offer?  How does it all work?  How to go about using it  Further.
Web Cache. What is Cache? Cache is the storing of data temporarily to improve performance. Cache exist in a variety of areas such as your CPU, Hard Disk.
The Site Architecture You Can Edit Varnish Mobile? Ryan Lane Wikimedia Foundation Membase? Swift.
Wikimedia architecture Ryan Lane Wikimedia Foundation Inc.
MirrorManager: The Fedora Mirror System Matt Domsch Fedora Mirror Wrangler Linux Technology Strategist Office of the CTO Dell, Inc.
1 Distributed DNS best practices to build redundant, reliable & scalable DNS architecture By Ladislav Vobr SE/SOP/I&eS Etisalat, UAE.
Chapter 1 Getting Started with ASP.NET Objectives Why ASP? To get familiar with our IDE (Integrated Development Environment ), Visual Studio. Understand.
Web Programming Language
Univa Grid Engine Makes Work Management Automatic and Efficient, Accelerates Deployment of Cloud Services with Power of Microsoft Azure MICROSOFT AZURE.
REPLICATION & LOAD BALANCING
Scalable Web Apps Target this solution to brand leaders responsible for customer engagement and roll-out of global marketing campaigns. Implement scenarios.
High Availability Linux (HA Linux)
Ad-blocker circumvention System
Practical Censorship Evasion Leveraging Content Delivery Networks
E-commerce | WWW World Wide Web - Concepts
E-commerce | WWW World Wide Web - Concepts
1. Public Network - Each Rackspace Cloud Server has two networks
Azure-Powered Augmented Reality Storytelling Platform for Kids Makes Learning Adaptive, Fun “Azure and its associated storage, content delivery, and virtual.
SUBMITTED BY: NAIMISHYA ATRI(7TH SEM) IT BRANCH
Processes The most important processes used in Web-based systems and their internal organization.
Scalable Web Apps Target this solution to brand leaders responsible for customer engagement and roll-out of global marketing campaigns. Implement scenarios.
Internet Networking recitation #12
Internet Applications
Distributed Content in the Network: A Backbone View
Designed for Big Data Visual Analytics, Zoomdata Allows Business Users to Quickly Connect, Stream, and Visualize Data in the Microsoft Azure Platform MICROSOFT.
Packet Switching To improve the efficiency of transferring information over a shared communication line, messages are divided into fixed-sized, numbered.
Architecture of Large-Scale Websites
The Only Digital Asset Management System on Microsoft Azure, MediaValet Is Uniquely Equipped to Meet Any Company’s Needs MICROSOFT AZURE ISV PROFILE: MEDIAVALET.
Data Security for Microsoft Azure
Multimedia and Networks
Edge computing (1) Content Distribution Networks
Keep Your Digital Media Assets Safe and Save Time by Choosing ImageVault to be Your Digital Asset Management Solution, Hosted in Microsoft Azure Partner.
Domain Name System Refs: Chapter 9 RFC 1034 RFC 1035.
Engineering a Content Delivery Network
Computer Networks Primary, Secondary and Root Servers
LOAD BALANCING INSTANCE GROUP APPLICATION #1 INSTANCE GROUP Overview
Client-Server Model: Requesting a Web Page
Web Servers (IIS and Apache)
AKAMAI Content Delivery Services
Engineering a Content Delivery Network
Yale Digital Conference 2019
Presentation transcript:

Wikimedia architecture Ryan Lane Wikimedia Foundation Inc.

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Overview Intro Our technical operations Global architecture Application servers Caching Storage Load balancing Content Delivery Network (CDN)

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Top five worldwide sites

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Our operations Currently managed by ~6 ops engineers Historically ad-hoc, “fire fighting mode” Technical staff spread out globally Always someone awake...but no on-call Working to engage community operations contributors

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Operations communication Most communication public on IRC Documentation in a wiki ( Sensitive communication via private lists and resource trackers

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Architecture: LAMP...

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture...on steroids.

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture The Wiki software All Wikimedia projects run on a MediaWiki platform Designed primarily for Wikimedia sites Very scalable, very good localization Open Source PHP software (GPL)

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture MediaWiki optimization We try to optimize by... not doing anything stupid caching expensive operations focusing on the hot spots in the code (profiling!) If a MediaWiki feature is too expensive, it doesn’t get enabled on Wikipedia

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture MediaWiki caching Caches everywhere Most of this data is cached in Memcached, a distributed object cache

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture MediaWiki profiling

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Core databases One master, many replicated slaves Load balanced reads to slaves, write operations to master Separate database per wiki Separate big, popular wikis from smaller wikis (sharding)

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Squid caching Caching reverse HTTP proxy Serves most of our traffic Split into three groups Hit rates: 85% for Text, 98% for Media, since the use of CARP

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture CARP

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Squid cache invalidation Wiki pages are edited at an unpredictable rate Users should always see current revision Invalidation through expiry times not acceptable Purge implemented using multicast UDP based HTCP protocol

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Varnish Caching Currently used for delivering static content such as javascript and css files (bits) Will eventually replace the Squid architecture 2-3 times more efficient than Squid

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Media storage Current solution is not scalable Looking at a number of solutions Would love help finding open source solution

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Thumbnail generation stat() on each request is too expensive, so assume every file exists If a thumbnail doesn’t exist, ask the application servers to render it

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Load Balancing: LVS- DR Linux Virtual Server Direct Routing mode All real servers share the same IP address The load balancer divides incoming traffic over the real servers Return traffic goes directly!

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Content Distribution Network (CDN) 2 clusters on 2 different continents: Primary cluster in Tampa, Florida Secondary caching-only cluster in Amsterdam Adding a new datacenter in Virginia, soon

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Geographic Load Balancing Most users use DNS resolver close to them Map IP address of resolver to a country code Deliver CNAME of close datacenter entry based on country Using PowerDNS with a Geobackend

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Closing notes We rely heavily on open source Always looking for efficiencies Looking for more efficient management tools Looking for more contributors

Ryan Lane, Wikimedia Foundation Inc. Wikimedia architecture Questions, comments? Ryan Lane IRC: Freenode, #wikimedia-tech