GENERAL SCALABILITY CONSIDERATIONS


Overview of scalability
As the number of users grows, maintain:
– Low latency
– High throughput
– High reliability

Latency
Latency = total time between when an operation is initiated and when the operation completes
[Figure labels: Latency (measured in seconds); Responsiveness (measured in seconds)]

Throughput
Throughput = number of operations completed per unit time
[Figure: web pages sending operations to a web server; link labels 10/min, 4/min, 2/min; Throughput: 32/minute]

Reliability
Reliability = percentage of operations successfully completed
[Figure: web pages sending operations to a web server; link labels 1/10 failure, 0/10 failure, 2/4 failure, 0/2 failure; Reliability: 29/32, about 90%]
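
As a rough illustration of the two definitions above, the sketch below computes throughput and reliability from operation counts. The totals (32 operations in one minute, 3 of them failed) come from the slide figures; the function and variable names are made up for this example.

```python
def throughput(completed_ops: int, elapsed_minutes: float) -> float:
    """Operations completed per minute."""
    return completed_ops / elapsed_minutes


def reliability(attempted_ops: int, failed_ops: int) -> float:
    """Fraction of attempted operations that completed successfully."""
    return (attempted_ops - failed_ops) / attempted_ops


# Totals taken from the slides: 32 operations in one minute, 3 failures.
ops_per_minute = throughput(32, 1.0)
success_rate = reliability(32, 3)
print(f"{ops_per_minute:.0f} operations/minute, {success_rate:.1%} reliable")
```

With these inputs the success rate is 29/32, which is where the "about 90%" figure comes from.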

Scalability
Scalability means that even when the number of users grows into the thousands or millions, your website still maintains:
– Low latency
– High throughput
– High reliability

Very rough reasonable goals

One single-core server
– Reasonable # "simultaneous" users: Hundreds or maybe thousands
– Latency: Low hundreds of milliseconds
– Throughput: A few hundred operations per second
– Reliability: 99%

One multi-core server
– Reasonable # "simultaneous" users: Thousands or tens of thousands
– Latency: Around 100 milliseconds
– Throughput: Thousands of operations per second
– Reliability: 99%

A cluster of a few multi-core computers
– Reasonable # "simultaneous" users: Tens or hundreds of thousands
– Latency: Under 100 milliseconds
– Throughput: Tens of thousands of operations per second
– Reliability: 99.99%

A small datacenter with a few dozen multi-core computers
– Reasonable # "simultaneous" users: Millions
– Latency: A few dozen milliseconds (assuming a great network connection)
– Throughput: Hundreds of thousands of operations per second
– Reliability: %

Techniques to improve scalability
– Minimal size of messages
– Minimal number of messages
– Minimal amount of computation
– Local computation
– Replication
– Aggressive caching
– Aggressive indexing

Minimal size of messages
When client and server communicate:
– Only send data needed at that moment
– Use a concise data format (e.g., probably JSON)
For example, suppose that an app needs to retrieve a list of courses in response to a query in order to show a list of links:

Option #1: 565 bytes
CS 361 cscaffid Intro to SE Blah blah blah blah blah blah blah blah blah
CS 494 cscaffid Web development Blah blah blah blah blah blah blah blah blah
CS 496 cscaffid Cloud+Mobile development Blah blah blah blah blah blah blah blah blah

Option #2: 108 bytes
[{n:"CS361",t:"Intro to SE"},
 {n:"CS494",t:"Web development"},
 {n:"CS496",t:"Cloud+Mobile development"}]
1. Combine fields if appropriate (e.g., dept and number)
2. Omit fields if not needed (e.g., description)
3. Shorten field names if appropriate (e.g., n and t)
4. Use JSON if feasible
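
The four steps above can be sketched in Python roughly as follows. The long field names (dept, number, instructor, title, description) and the shortened description text are assumptions made for this example, so the byte counts it prints will not match the slide figures. Strict JSON also requires quoted keys, so the compact output is slightly longer than the shorthand shown in Option #2.

```python
import json

# Hypothetical full records; the field names are assumptions for illustration.
courses = [
    {"dept": "CS", "number": 361, "instructor": "cscaffid",
     "title": "Intro to SE", "description": "Blah blah blah"},
    {"dept": "CS", "number": 494, "instructor": "cscaffid",
     "title": "Web development", "description": "Blah blah blah"},
    {"dept": "CS", "number": 496, "instructor": "cscaffid",
     "title": "Cloud+Mobile development", "description": "Blah blah blah"},
]


def to_compact(course: dict) -> dict:
    """Combine dept and number, drop unneeded fields, shorten the keys."""
    return {"n": f"{course['dept']}{course['number']}", "t": course["title"]}


verbose_payload = json.dumps(courses)
# separators=(",", ":") drops the spaces json.dumps inserts by default.
compact_payload = json.dumps([to_compact(c) for c in courses],
                             separators=(",", ":"))

print(len(verbose_payload.encode("utf-8")), "bytes verbose")
print(len(compact_payload.encode("utf-8")), "bytes compact")
```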

Minified JS and CSS
Online services can squeeze the whitespace and other wasted characters out of your JS
– Search for JS "minifier" or "minimizer"
– E.g.,
Ditto for CSS
– E.g.,

Minimal number of messages
Eliminate unnecessary messages
– E.g., eliminate unnecessary images from the UI
Combine messages if feasible
– E.g., if you need to query CS and ECE courses, design the server to handle both queries at once
Defer messages if feasible
– E.g., give the user the option to defer logging in until it's absolutely necessary

Minimal amount of computation
Avoid "feature bloat"
– Only implement the features you need
– This will also enhance usability!
Avoid blithely copy-pasting code
– E.g., it's simplest to do certain things at the top of every web page in your site (send JS, open a db connection), even when a given page doesn't actually need them

Minimal amount of computation
Use the right data structures
– E.g., if you need to use an associative array, then use an associative array
Use the right APIs
– E.g., there is an AJAX API for retrieving JSON as an object; don't try to write such an API yourself
– Your version will be buggy and slow!
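
To make the data-structure point concrete, here is a minimal sketch comparing a linear scan with an associative-array lookup (a Python dict). The function names are invented for this example; the course data reuses the titles from the earlier slides.

```python
# Course titles keyed by course number: an associative array.
titles_by_number = {
    "CS361": "Intro to SE",
    "CS494": "Web development",
    "CS496": "Cloud+Mobile development",
}

# The same data as a plain list of (number, title) pairs.
title_pairs = list(titles_by_number.items())


def find_title_by_scan(pairs, number):
    """Linear scan: the work grows with the length of the list."""
    for candidate, title in pairs:
        if candidate == number:
            return title
    return None


def find_title_by_key(table, number):
    """Hash lookup: roughly constant time on average."""
    return table.get(number)


print(find_title_by_scan(title_pairs, "CS494"))
print(find_title_by_key(titles_by_number, "CS494"))
```

Both calls return the same title; the difference only shows up as the collection grows.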

Minimal amount of computation
Retrieve only the data you need
– E.g., if you need one row, use a WHERE clause in SQL (rather than retrieving all rows & looping)
– Looping just creates unnecessary computation!
Use SQL aggregate functions when practical
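
A small sketch of the difference, using Python's built-in sqlite3 module with an in-memory database. The table layout and the enrollment numbers are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE courses (number TEXT PRIMARY KEY, title TEXT, enrolled INTEGER)"
)
conn.executemany(
    "INSERT INTO courses VALUES (?, ?, ?)",
    [
        ("CS361", "Intro to SE", 120),          # enrollment figures are made up
        ("CS494", "Web development", 45),
        ("CS496", "Cloud+Mobile development", 60),
    ],
)


def title_by_looping(number):
    """Wasteful: fetch every row, then search in application code."""
    for row_number, title, _enrolled in conn.execute("SELECT * FROM courses"):
        if row_number == number:
            return title
    return None


def title_by_where(number):
    """Better: let the database return just the one row."""
    row = conn.execute(
        "SELECT title FROM courses WHERE number = ?", (number,)
    ).fetchone()
    return row[0] if row else None


# Aggregate function: the database does the summing, not a Python loop.
total_enrolled = conn.execute("SELECT SUM(enrolled) FROM courses").fetchone()[0]
print(title_by_where("CS494"), total_enrolled)
```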

Local computation
If a computation uses a very large amount of data, then move the computation to the data, instead of the data to the computation.
Example: Find the city with maximal rainfall in the US
Option #1:
– Server sends rainfall for 4500 cities to browser
– Browser loops through cities to choose maximum
Option #2:
– Server loops through cities to choose maximum
– Server sends just the maximum to the browser
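
The two options can be compared with a short sketch. The rainfall values are randomly generated stand-ins for the 4500 cities, and the function names and response shapes are assumptions made for this example.

```python
import json
import random

random.seed(0)
# Made-up data standing in for the 4500 cities in the example.
rainfall_by_city = {
    f"city{i}": round(random.uniform(0, 100), 1) for i in range(4500)
}


def response_option_1() -> str:
    """Ship every row; the browser would have to find the maximum itself."""
    return json.dumps(rainfall_by_city)


def response_option_2() -> str:
    """Compute the maximum where the data lives; ship only the answer."""
    city = max(rainfall_by_city, key=rainfall_by_city.get)
    return json.dumps({"city": city, "rainfall": rainfall_by_city[city]})


print(len(response_option_1()), "bytes sent in option 1")
print(len(response_option_2()), "bytes sent in option 2")
```

Option #2 sends one small object regardless of how many cities the server holds.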

Replication
Make copies of your computation and data
[Figure: a web page sending operations to several replicated web servers; link labels 10/min, 4/min, 2/min; Throughput: 32/minute]

Replication
You can also replicate your database
[Figure: web server connected to a database]
Databases can be configured to automatically "mirror" contents

Shopping for a hosting service
When leasing space from a "hosting service":
– You pay them $X per month
– They let you use Y machines
If you want replication, look for:
– Load balancing: automatic routing of traffic evenly across the machines you lease
– Mirroring: automatic copying of data updates from one server to another ("master/slave")
– Failover: automatic routing (and restart) around machines that crash
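
To give a feel for what load balancing and failover mean, here is a toy round-robin balancer that skips machines marked as down. It is only a sketch under simplifying assumptions: real hosting services detect crashes and restart machines automatically, and the class, method, and server names here are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in turn, routing around any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Failover: stop routing traffic to a crashed machine."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Resume routing to a machine once it has restarted."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Load balancing: return the next healthy server in the rotation."""
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical names
balancer.mark_down("web2")
print([balancer.next_server() for _ in range(4)])
```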

Learning about replication
If you really want to get your hands dirty with the details of replication:
– CS496: Mobile + Cloud Software Development
– CS440: (Advanced) Database Management

Aggressive caching & indexing
Caching: If a computation or transmission is expensive, then do it once, save the result, and reuse the result later
Indexing: If you have lots of data, create a data structure that makes it easier to find the data
These will each be covered by a whole lecture
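
As a preview, both ideas fit in a few lines of Python. This is a minimal sketch, not what the later lectures will necessarily use: the "expensive" function is a trivial stand-in, and the course list (including the ECE entry) is made up for illustration.

```python
from collections import defaultdict
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def expensive_square(n: int) -> int:
    """Stand-in for a costly computation; counts how often it really runs."""
    global call_count
    call_count += 1
    return n * n


expensive_square(12)
expensive_square(12)  # served from the cache; the function body does not rerun
print("real computations:", call_count)

# Indexing: build a lookup structure once so later searches avoid a full scan.
courses = [("CS", 361), ("CS", 494), ("ECE", 271), ("CS", 496)]
courses_by_dept = defaultdict(list)
for dept, number in courses:
    courses_by_dept[dept].append(number)

print(courses_by_dept["CS"])
```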