Lessons Learnt Developing Web Applications

Presentation transcript:

Lessons Learnt Developing Web Applications Satyadeep Musuvathy Architect, Yahoo!

Balance
Balance between System, Data and Operations. If even one of the legs is not balanced, the stool will topple.

Systems
Now let's talk about systems.

Systems Evolve
Design for evolution. There will always be one more "feature"!
Not only do systems evolve, but teams do too. You want to make it easier for new people to join and to reduce the amount of harm that individual errors can do to the system.

Designing For Evolution: Have a clear separation of concerns
Separate Web and API interaction from the business logic. (Slide captions: "Your web page, Sir", "I canz do business", "API, Savvy?")
Separate the "web" part from the "business logic" of the application. Most solutions will have mechanisms for both humans (web apps) and machines (web services/APIs) to interact with the data. One way to test the separation is to check that both run off the same implementation of the business logic. In fact, formalize the interface between the "web" part and the business logic. You should be able to run them on two different machines with minor changes (say, adding a marshalling layer). This allows you to evolve each part independently as needs change. Note: you might start with both on the same machine, but unless you design it this way, it is difficult to scale later.
Taking this idea further, one option is to have the business layer serve out JSON or XML data and build a rich Ajax-based front end on that data. Gmail is a prominent example of this architecture. Look at OData as an emerging standard to help with this in a more formalized manner.
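The separation described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `Order`, `OrderService`, `render_web_page` and `render_api_response` are hypothetical names, not part of the original talk. The point is that the web front end and the API are thin adapters over one shared implementation of the business logic.

```python
import html
import json
from dataclasses import dataclass, asdict


@dataclass
class Order:
    order_id: int
    item: str
    quantity: int


class OrderService:
    """Business logic only: no HTML, no JSON, no HTTP details (hypothetical example)."""

    def __init__(self):
        self._orders = {}
        self._next_id = 1

    def place_order(self, item, quantity):
        # Validation lives here, so web and API callers get the same rules.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        order = Order(self._next_id, item, quantity)
        self._orders[order.order_id] = order
        self._next_id += 1
        return order


def render_web_page(service, item, quantity):
    """Adapter for humans: calls the shared logic, returns an HTML fragment."""
    order = service.place_order(item, quantity)
    return f"<p>Order {order.order_id}: {order.quantity} x {html.escape(order.item)}</p>"


def render_api_response(service, item, quantity):
    """Adapter for machines: calls the same logic, returns JSON."""
    order = service.place_order(item, quantity)
    return json.dumps(asdict(order))
```

Because both adapters only depend on `OrderService`, a marshalling layer could later be placed between them and the service without changing the business rules.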

Designing For Evolution: Manage state carefully
Web applications are stateful. Memory is a "scarce" resource, so worry about "OOM". (Slide figure: State A, State B, State C.)
Worry about the amount of information held in one session. More than anything else, this limits the number of sessions one can support concurrently on a given host. Remember that most users just close the browser instead of logging out. This means that most sessions are reaped by timeout and hence consume resources (read: memory) long after the user has gone away. In fact, the scarcest resource you have to manage is not CPU but memory. This means you could get more mileage by running the more UI-centric processes on dedicated boxes, with far fewer business-layer servers servicing both the UI and API layers.
This is one of the reasons why process-per-request systems like PHP are very good: you can run a lot of these processes on a machine, and when the request is done, the memory is simply thrown away when the process dies. This also means that you have to keep the session data in an external store, and no longer have to worry about its memory overhead in the web process. You can also use an external session store on Java-based systems, but it is not the default.
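A minimal sketch of the external session store idea, assuming a hypothetical `ExternalSessionStore` class that stands in for a real out-of-process store. It keeps session payloads outside the request handlers and reaps sessions by timeout, since most users never log out. The injectable `clock` parameter is an assumption added here so that expiry can be tested without waiting.

```python
import time


class ExternalSessionStore:
    """Stand-in for an out-of-process session store (hypothetical example)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # session_id -> (expires_at, payload)

    def save(self, session_id, payload):
        # Every save refreshes the timeout for that session.
        self._data[session_id] = (self._clock() + self._ttl, payload)

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            # The user went away without logging out; drop the session.
            del self._data[session_id]
            return None
        return payload

    def reap_expired(self):
        """Remove all timed-out sessions and return how many were removed."""
        now = self._clock()
        expired = [sid for sid, (exp, _) in self._data.items() if now >= exp]
        for sid in expired:
            del self._data[sid]
        return len(expired)
```

Keeping the payload small matters either way: the store only moves the memory cost out of the web process, it does not remove it.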

Designing For Evolution: Aggressively differentiate sync and async jobs
Design and scale the synchronous aspects separately from the asynchronous jobs. Not all operations need to be synchronous. (Slide figure: Synchronous vs. Asynchronous.)
Differentiate aggressively between work that "has to be done now" within the synchronous flow of a request and work that can be scheduled for later (asynchronous processing). You can then scale just the synchronous part of the system independently of the rest, which can run on more dedicated batch-processing stacks like Hadoop. The more work you can do as asynchronous operations, the better you can scale your infrastructure and the higher its utilization will be. Synchronous traffic is highly spiky, and you want to scale a smaller set of hardware for it rather than the whole system.
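The split between synchronous and asynchronous work can be sketched with a queue and a background worker. This is a toy, in-process illustration under assumptions: `handle_signup`, `worker`, and the "send_welcome_email" job are hypothetical, and a real deployment would more likely hand the deferred jobs to a separate batch or messaging system.

```python
import queue
import threading

background_jobs = queue.Queue()  # work that can be done later
sent_emails = []                 # records what the worker processed


def handle_signup(username):
    """Synchronous path: only the work the user must wait for."""
    account = {"username": username, "active": True}
    # Everything else is deferred instead of being done inside the request.
    background_jobs.put(("send_welcome_email", username))
    return account


def worker():
    """Asynchronous path: drains deferred jobs until it sees the None sentinel."""
    while True:
        job = background_jobs.get()
        try:
            if job is None:
                break
            kind, arg = job
            if kind == "send_welcome_email":
                sent_emails.append(arg)
        finally:
            background_jobs.task_done()
```

A possible usage: start `threading.Thread(target=worker)`, call `handle_signup("alice")` from the request path, and put `None` on the queue to stop the worker. The request returns as soon as the account exists, regardless of how long the deferred job takes.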

Data

Application Data: Most systems are I/O bound
In most cases I/O throughput defines "perceived" performance.
After memory, the next scarcest resource is I/O bandwidth. In fact, a major part of "perceived" application performance is I/O throughput. Define your data flows accordingly. Small things like compressing the data between the web app and the browser can have a drastic impact on the perceived performance of your application. This also means that it is cheaper to perform more computation closer to the data and forward the smaller result sets for upstream processing than to send more data up and try to filter or compute at the front-end server layer.
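Both points, compressing what goes over the wire and filtering close to the data, can be illustrated with the Python standard library. The row layout and the helper names `filter_near_data` and `compress_response` are assumptions made for this sketch; the actual savings depend entirely on the data.

```python
import gzip
import json


def filter_near_data(rows, status):
    """Reduce the result set where the data lives, before sending it upstream."""
    return [row for row in rows if row["status"] == status]


def compress_response(body):
    """Compress a response body (bytes) before it crosses the network."""
    return gzip.compress(body)


# Hypothetical data set: 1000 rows, of which every tenth is "active".
rows = [
    {"id": i, "status": "active" if i % 10 == 0 else "inactive"}
    for i in range(1000)
]

full_payload = json.dumps(rows).encode("utf-8")
active_payload = json.dumps(filter_near_data(rows, "active")).encode("utf-8")
compressed_payload = compress_response(active_payload)

# Sizes shrink at each step: full set, filtered set, filtered and compressed.
sizes = (len(full_payload), len(active_payload), len(compressed_payload))
```

In a web setting the compression step is normally handled by the server or framework when the client sends an `Accept-Encoding: gzip` header, so the function above only shows the effect.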

Application Data: Make data "shardable"
"Shardable" data will allow you to scale out your data demands as the application grows.
Design your databases so that they are shardable. You don't have to do this from day one, but unless you plan for it, it is very difficult to do later. This becomes important as your data starts outgrowing the capabilities of a single instance. It also allows you to scale the data layer independently of the other layers in the stack.
Worry about your index behavior at the database level. Database systems like Oracle will dynamically change query plans based on the perceived load. This can really bite you unless you look out for it. Typical solutions involve either adding additional indexes or adding hints to your queries so that the correct set of indexes is picked. This happens especially with complex join-related queries.
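One common way to make data shardable is to route every record by a stable hash of its key. The sketch below assumes hypothetical names (`shard_for`, `ShardedStore`) and uses in-memory dictionaries in place of real database instances. Note that simple modulo routing forces data to move when the shard count changes, which is one reason to plan the scheme early.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy data layer: each dict stands in for one database instance."""

    def __init__(self, shard_count):
        self._shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key):
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def shard_sizes(self):
        """Number of records held by each shard."""
        return [len(shard) for shard in self._shards]
```

The application only talks to `put` and `get`, so the number of shards can grow with the data without touching the other layers.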

Application Data: Consider multiple stores for data
"Divide and conquer": consider shipping a copy of the data to a grid or to dedicated machines for batch or "secondary" tasks. (Slide figure: Grid, Database.)
Think about having different stores of data for different needs. For example, you might keep your core information in an RDBMS, but for large analysis or batch jobs it might be cheaper to ship a "flattened" copy over to a Hadoop cluster and process it there rather than loading the database. On the same note, if you have multiple kinds of applications working on related data sets, see if you can partition the processing across clusters of database servers. This only makes sense if you have huge data sets being worked on.
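As a rough illustration of shipping a "flattened" copy, the sketch below joins two normalized tables into self-contained rows and writes them as CSV, a format batch systems can read without querying the database. The table layout and the function names `flatten` and `to_csv` are assumptions for this example only.

```python
import csv
import io

FIELDS = ["order_id", "customer_name", "country", "amount"]


def flatten(customers, orders):
    """Join orders with their customer so each output row stands on its own."""
    rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        rows.append({
            "order_id": order["order_id"],
            "customer_name": customer["name"],
            "country": customer["country"],
            "amount": order["amount"],
        })
    return rows


def to_csv(rows):
    """Serialize flattened rows to CSV text for export to a batch cluster."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The join is paid once at export time, so the batch jobs never have to load the transactional database.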

Operations
Operations is frequently underrated, but unless you have an idea of how your current system is behaving, it will be very difficult to know which parts need scaling or are underperforming.

Operations: Utilization is very spiky
Plan for peak loads, but try to distribute processing over time to minimize over-provisioning.
Remember, traffic usage patterns are very spiky. It is not uncommon for the highs to be 10x the normal load or more. You need to plan for the highs, but if you simply plan for 10x hardware across the board, this will be very inefficient. This is why you need to plan for scaling each layer independently to get a better return on investment. For example, you might notice that there are certain days in your week or certain weeks in a month when usage is really high. You might want to schedule your data-intensive operations around these times (that is, outside them) so as to better spread the load on your data and business logic layers.
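The argument for scaling each layer independently can be made concrete with a small calculation. All numbers below are invented for illustration: the layer names, the baseline server counts, and the per-layer peak multipliers are assumptions, not measurements from the talk.

```python
import math


def servers_needed(normal_servers, peak_multiplier):
    """Servers per layer at peak, given each layer's own peak multiplier."""
    return {
        layer: math.ceil(count * peak_multiplier[layer])
        for layer, count in normal_servers.items()
    }


# Hypothetical baseline: servers needed per layer under normal load.
normal = {"web": 4, "business": 2, "database": 2}

# Option 1: provision 10x across the board.
uniform = servers_needed(normal, {"web": 10, "business": 10, "database": 10})

# Option 2: assume only the web layer really sees the 10x spike, because
# deferred and batch work has been moved away from peak hours.
per_layer = servers_needed(normal, {"web": 10, "business": 4, "database": 2})

uniform_total = sum(uniform.values())      # 80 servers
per_layer_total = sum(per_layer.values())  # 52 servers
```

Under these made-up assumptions the per-layer plan needs 52 servers instead of 80, and the gap widens as more work is spread over time.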

Operations: Constantly monitor systems
Constantly monitor the system for CPU, memory, disk and I/O. Have the system "raise" events for critical issues rather than relying on parsing log files.
Have a mechanism to look at CPU, I/O, disk space, memory utilization and so on. It is far cheaper to act on changes when you have an idea of your growth and usage profile than to react to every outage.
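Raising events instead of parsing logs can be as simple as comparing sampled metrics against thresholds and invoking a callback. The sketch below is an assumption-laden illustration: `check_metrics`, the metric names, and the threshold values are hypothetical, and how the metrics are sampled is left out.

```python
def check_metrics(metrics, thresholds, on_event):
    """Call on_event for every metric at or above its threshold.

    metrics and thresholds map metric names to numbers (for example,
    utilization percentages). Returns the list of events raised.
    """
    raised = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            event = {"metric": name, "value": value, "limit": limit}
            on_event(event)  # e.g. page someone or notify a dashboard
            raised.append(event)
    return raised


# Hypothetical thresholds, in percent utilization.
THRESHOLDS = {"cpu": 90, "memory": 85, "disk": 80, "io": 75}
```

Recording the sampled values over time, not just the events, is what gives you the growth and usage profile mentioned above.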

Operations: Have a failover plan
Plan and test backup systems. Look for and prevent domino effects of failure.
A good failover mechanism is your friend. Have a failover process and test it! An untested failover is not a failover :-)
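A failover path only counts if it is exercised. The sketch below shows one possible shape, with hypothetical names (`ServiceUnavailable`, `call_with_failover`, `broken_primary`, `standby`): backends are tried in order, and the failure case is deliberately simulated so the standby path can be tested before a real outage.

```python
class ServiceUnavailable(Exception):
    """Raised by a backend that cannot serve the request."""


def call_with_failover(backends, request):
    """Try each backend in order and return the first successful response."""
    for backend in backends:
        try:
            return backend(request)
        except ServiceUnavailable:
            continue  # fall through to the next backend
    raise ServiceUnavailable(f"all {len(backends)} backends failed")


def broken_primary(request):
    """Simulated outage, used to exercise the failover path."""
    raise ServiceUnavailable("primary is down")


def standby(request):
    return f"standby handled {request}"
```

Calling `call_with_failover([broken_primary, standby], "req-1")` in a regular test run checks that the standby really takes over. Limiting the list of backends also bounds how far a failure can cascade.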

Thank You! Q & A