How WebMD Maintains Operational Flexibility with NoSQL Rajeev Borborah, Sr. Director, Engineering Matt Wilson – Director, Production Engineering – Consumer.

Slides:



Advertisements
Similar presentations
How We Manage SaaS Infrastructure Knowledge Track
Advertisements

PCT303 – Content Publishing in SharePoint Eugene Rosenfeld Black Blade Associates
Implementing Tableau Server in an Enterprise Environment
SSRS 2008 Architecture Improvements Scale-out SSRS 2008 Report Engine Scalability Improvements.
Chapter 4 Infrastructure as a Service (IaaS)
1 Storage Today Victor Hatridge – CIO Nashville Electric Service (615)
SQL Server Replication
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
Adding scalability to legacy PHP web applications Overview Mario A. Valdez-Ramirez.
2 June 2015 © Enterprise Storage Group, Inc. 1 The Case for File Server Consolidation using NAS Nancy Marrone Senior Analyst The Enterprise Storage Group,
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Nikolay Tomitov Technical Trainer SoftAcad.bg.  What are Amazon Web services (AWS) ?  What’s cool when developing with AWS ?  Architecture of AWS 
Oracle Application Express Architecture
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo How to Scale a Database System.
How to Cluster both Servers and Storage W. Curtis Preston President The Storage Group.
WDK Driver Test Manager. Outline HCT and the history of driver testing Problems to solve Goals of the WDK Driver Test Manager (DTM) Automated Deployment.
11 SERVER CLUSTERING Chapter 6. Chapter 6: SERVER CLUSTERING2 OVERVIEW  List the types of server clusters.  Determine which type of cluster to use for.
Copyright © 2002 Wensong Zhang. Page 1 Free Software Symposium 2002 Linux Virtual Server: Linux Server Clusters for Scalable Network Services Wensong Zhang.
Platform as a Service (PaaS)
Google AppEngine. Google App Engine enables you to build and host web apps on the same systems that power Google applications. App Engine offers fast.
Sitefinity Performance and Architecture
Web Application Architecture: multi-tier (2-tier, 3-tier) & mvc
Cross Platform Mobile Backend with Mobile Services James
1 Oracle 9i AS Availability and Scalability Margaret H. Mei Senior Product Manager, ST.
SANPoint Foundation Suite HA Robert Soderbery Sr. Director, Product Management VERITAS Software Corporation.
Word Wide Cache Distributed Caching for the Distributed Enterprise.
Cloud Computing for the Enterprise November 18th, This work is licensed under a Creative Commons.
What makes Facebook do what it does? By Gavin Mais.
Lecture 8 – Platform as a Service. Introduction We have discussed the SPI model of Cloud Computing – IaaS – PaaS – SaaS.
© 2011 Cisco All rights reserved.Cisco Confidential 1 APP server Client library Memory (Managed Cache) Memory (Managed Cache) Queue to disk Disk NIC Replication.
STEALTH Content Store for SharePoint using Caringo CAStor  Boosting your SharePoint to the MAX! "Optimizing your Business behind the scenes"
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Sofia, Bulgaria | 9-10 October SQL Server 2005 High Availability for developers Vladimir Tchalkov Crossroad Ltd. Vladimir Tchalkov Crossroad Ltd.
Meet with the AppEngine Márk Gergely eu.edge. What is AppEngine? It’s a tool, that lets you run your web applications on Google's infrastructure. –Google's.
Windows Azure Conference 2014 Deploy your Java workloads on Windows Azure.
1 Moshe Shadmon ScaleDB Scaling MySQL in the Cloud.
Distributed Information Systems. Motivation ● To understand the problems that Web services try to solve it is helpful to understand how distributed information.
11 CLUSTERING AND AVAILABILITY Chapter 11. Chapter 11: CLUSTERING AND AVAILABILITY2 OVERVIEW  Describe the clustering capabilities of Microsoft Windows.
Windows Azure. Azure Application platform for the public cloud. Windows Azure is an operating system You can: – build a web application that runs.
CHAPTER 7 CLUSTERING SERVERS. CLUSTERING TYPES There are 2 types of clustering ; Server clusters Network Load Balancing (NLB) The difference between the.
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT By Jyothsna Natarajan Instructor: Prof. Yanqing Zhang Course: Advanced Operating Systems.
IPS Infrastructure Technological Overview of Work Done.
Scalable Data Scale #2 site on the Internet (time on site) >200 billion monthly page views Over 1 million developers in 180 countries.
noun ; Software Defined Enterprise/SDE/ The enterprise who leverages software to flank their traditional business offerings, or to create entirely new.
Building Cloud Solutions Presenter Name Position or role Microsoft Azure.
(re)-Architecting cloud applications on the windows Azure platform CLAEYS Kurt Technology Solution Professional Microsoft EMEA.
Narasimha Reddy Gopu Jisha J. Agenda Introduction to AlwaysOn * AlwaysOn Availability Groups (AG) & Listener * AlwaysOn Failover * AlwaysOn Active Secondaries.
BIG DATA/ Hadoop Interview Questions.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
Amazon Web Services. Amazon Web Services (AWS) - robust, scalable and affordable infrastructure for cloud computing. This session is about:
Presented by Alexey Vedishchev Developing Web-applications with Grails framework American University of Nigeria, 2016 Intro To MVC Architecture.
Platform as a Service (PaaS)
Data Services for Service Oriented Architecture in Finance
Build /26/2018 6:17 AM Building Resilient, Scalable Services with Microsoft Azure Service Fabric Érsek © 2015 Microsoft Corporation.
Platform as a Service (PaaS)
Platform as a Service (PaaS)
Distributed Cache Technology in Cloud Computing and its Application in the GIS Software Wang Qi Zhu Yitong Peng Cheng
N-Tier Architecture.
Docker Birthday #3.
Maximum Availability Architecture Enterprise Technology Centre.
Platform as a Service.
Couchbase Server is a NoSQL Database with a SQL-Based Query Language
CHAPTER 3 Architectures for Distributed Systems
Chapter 7: Consistency & Replication IV - REPLICATION MANAGEMENT -Sumanth Kandagatla Instructor: Prof. Yanqing Zhang Advanced Operating Systems (CSC 8320)
Dev Test on Windows Azure Solution in a Box
Big Data - in Performance Engineering
Lecture 1: Multi-tier Architecture Overview
OPS-7: Building and Deploying a Highly Available Application
Windows Azure Hybrid Architectures and Patterns
Running C# in the browser
Presentation transcript:

How WebMD Maintains Operational Flexibility with NoSQL Rajeev Borborah, Sr. Director, Engineering Matt Wilson – Director, Production Engineering – Consumer Web

In some situations, failure is not an option

Apps as well as infrastructure need to fail…

Apps as well as infrastructure need to fail…gracefully

Sometimes apps and infrastructure don’t fail gracefully

Well, that’s why we have jobs

About WebMD Technology WebMD, Medscape, MedicineNet, eMedicine, UK cobrand Serving nearly 1.2 Billion Pageviews a month, 136 million unique visitors Running over 200 separate applications, vast majority in- house developed Environments: Dev/Devint, QA01/02, QA00, Production/DR Two main data centers, geographically diverse OS: mix of Linux and Windows Datastores: Sql Server, Oracle, Mongo, Vertica and mysql Web: mix of Apache and IIS App: mix of Tomcat, ASP,.Net 2.x - 4.x Service: ActiveMQ, Memcache, Couchbase

Anatomy of a Request

What We Had Architecture – 9 Years Old – Key Assumption – Everything Will Change Presentation Data Rendering Business Logic Data Source – Provider Model Very Stable and Robust

Before The Change Widget Page Widget Data Provider XSLT Provider File Repository NAS Request DCTMAssetProvider DB

Two Problems Change in Architecture Change in Infrastructure NAS Single Point of Failures Switchover To backup took finite Time Scaling Up required more investment in Expensive NAS hardware The Solution DB Scaling Up – still had Single Point of failure Site non functional if DB is not available The Solution

We formed a team from Dev, Ops, QA and Project Management

Phases Phase I -Reduce NAS Access Phase II - Database Scaling Phase III – Change Driven Push Caching

NAS Problem Problem Constraint: – Still need a proven storage method – Ubiquitous protocol Solution Constraint: – 200 apps to update …or not – Use NAS as a backup method

What can replace a NAS Filestore? Does the solution need to provide SMB / NFS interface? Can we use something else

What can replace a NAS Filestore? Does the solution need to provide SMB / NFS interface? Can we use something else Remember, we have 200 apps to update

What can replace a NAS Filestore? Does the solution need to provide SMB / NFS interface? Can we use something else Remember, we have 200 apps to update Looked at Scality, Cassandra, MongoDB, Couchbase

What can replace a NAS Filestore? Does the solution need to provide SMB / NFS interface? Can we use something else Remember, we have 200 apps to update Looked at Scality, Cassandra, MongoDB, Couchbase

Integrating Couchbase Can the architecture Support It? How difficult would it be to do? Will it disrupt the planned business roadmap? How are we going to do this? How do we Minimize the Risk?

Key Development Goals Transparent Change to Existing Architecture No Rewrite of Business Rules Backward Compatible Performance Same or Better Reliability and Scalability Better than Before Short time to Execute, Test and Deploy

NAS SQL Server Page Request Widget Data Provider XSLT Provider File Repository DCTMAssetProvider Persistent Data Provider File Data Provider Couch Data Provider File With Couch Data Provider Couchbase

Controlling the Cache :

Couchbase Design Cache Key – File Name – Too Long –MD5 Hash Data Storage – Binary – Can Store Other Objects – Performance – No Query Requirements

Filling up the Cache Two Choices – The Runtime Web Application could populate cache after the initial fetch from NAS – The Publishing System could push into the Couchbase

Filling the Cache Publishing System NAS SQL Server Couchbase

Why we Choose This No Additional load on the Runtime System. Performance is not impacted Do not have to deal with synchronizing multiple writes Ties in with our Phase III goals

Reliability Testing Negative testing – No NAS – No Couchbase – No Database – Combinations of above

Testing Results NAS Failure system runs at 20% better performance, no downtime or user impact Couchbase failure – system runs at performance before architectural changes. No downtime or user impact DB Failure – Site is impacted

What We Were Able to Achieve Performance Improvement – 20% NAS Access Reduced – 87% Reliability – Multiple Levels – Local Cache – Central Cache – NAS Scalability – Ability to Add Servers without requiring additional NAS hardware

Phases Phase I -Reduce NAS Access  Phase II - Database Scaling Phase III – Change Driven Push Caching Phase IV – Reduce Database Access

DB Problem Problem Constraints: – Improve Availability – Improve Scalability Solution Constraints: – No code updates. Needs to “just work”

Like changing the tires while the bus is moving

* My * DB Problem is easy to solve

Data Access Pattern “narrow-read” workload – many copies on many nodes Database workload does not exceed server hardware All data is read only Load balancing works better than clustering in this pattern

DB Solution

What’s Next Phase I -Reduce NAS Access Phase II - Database Scaling  Phase III – Change Driven Push Caching

The Cache Based on Polling Time Intervals on the Web Application Polling Queues are too longs at times Web Servers not in synch

How We are Doing It Move Cache Refresh to be Changed Based Changes Notification to Web App Web App Refreshes Only those caches which actually have changed

What it going to look like Publishing System NAS SQL Server Couchbase Message Queue Web Server

Was it worth it? Fixed DB problem by using Peer to Peer replication and Load Balancing – no code changes Fixed NAS problem by adding caching layer to reduce calls to NAS Fixing cache pull model with push model for the content publishing system – reduces publishing times to all web servers in seconds and all web servers have the same cached content Serving content is now faster – less latency Able to virtualize the web servers

Questions?