CWIC Developers Meeting
January 29th 2014
Calin Duma
Service Level Agreements: High-Availability, Reliability and Performance


Agenda
– What are SLAs
– Why use SLAs
– Joint SLOs / SLAs dependencies
– How to establish joint SLOs / SLAs
– CWIC Data Providers SLO challenges
– Initial Sample Approach
– CWIC Start performance challenges
– CWIC Start metrics options
– Joint metrics to consider

What are SLAs
Service Level Agreements:
– Specify service level requirements between a service provider and a service consumer
– Often take the form of a legal contract with penalties for non-compliance
– Concrete and measurable service level objectives (SLOs) are used to test that SLAs are being met
In general there is a recognized gap between the expected service levels and the delivered ones:
– Availability: downtime per year (e.g. 5 minutes translates to an SLO of about 99.999% uptime)
– Reliability: advertised component failure rates, which can be mitigated by fault-tolerant software and system design
– Performance: response time (completion - submission) and throughput (concurrent requests) oriented SLOs
Response times increase as throughput increases
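The relationship between yearly downtime and an uptime percentage can be sketched as below. This is a minimal illustration only; the function names are hypothetical and a 365-day year is assumed.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def uptime_percent(downtime_minutes_per_year: float) -> float:
    """Convert yearly downtime (minutes) into an availability percentage."""
    return 100.0 * (1.0 - downtime_minutes_per_year / MINUTES_PER_YEAR)


def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Convert an availability percentage into the yearly downtime budget (minutes)."""
    return MINUTES_PER_YEAR * (1.0 - uptime_pct / 100.0)


if __name__ == "__main__":
    print(f"5 min/year of downtime -> {uptime_percent(5):.4f}% uptime")
    print(f"99.999% uptime -> {allowed_downtime_minutes(99.999):.3f} min/year of downtime")
```

Working in both directions is useful in practice: the first function checks a measured downtime against an SLO, the second turns an SLO into a downtime budget.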

Why use SLAs
CWIC is gaining popularity and offers the potential for excellent exposure of data islands (India, China, Brazil etc.)
We should provide better end-user service:
– Service consumers know what to expect when using GCMD, CWIC and CWIC Start (and other clients)
We should establish SLOs for our applications:
– Involves hardware resources, infrastructure platforms (OS, Web Application stack) and custom code
– Teams are motivated to work toward agreed-upon targets
– Can dictate and provide empirical data for future hardware and software needs

Joint SLOs / SLAs dependencies
CWIC Start depends on GCMD and CWIC
CWIC depends on GCMD and 5 providers:
– NASA, INPE, GHRSST, USGSLSI and CCMEO
In order to have availability, reliability and performance SLOs we would have to coordinate among 8 components:
1. CWIC Start
2. GCMD
3. CWIC
4. NASA / ECHO
5. INPE
6. GHRSST
7. USGSLSI
8. CCMEO
If any of the above components is down or slow, the end-user will be subject to a sub-optimal experience
Complexity will increase when more providers are added

How to establish joint SLOs / SLAs
Although usage of our services is free, we can still provide a reasonable user experience and set realistic user expectations
True joint SLOs / SLAs would be at most the SLO / SLA of the weakest component and are therefore not desirable
CWIC, GCMD, CWIC Start and ECHO can work together on joint SLOs / SLAs
CWIC can collect existing provider SLAs where applicable or help providers think about SLAs
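The "at most the weakest component" point can be illustrated with a small calculation. The sketch below assumes that a request needs every component in the chain and that components fail independently, so the joint availability is the product of the individual ones. All availability figures are hypothetical, not measured values.

```python
from math import prod


def joint_availability(availabilities):
    """Availability of a chain where every component must be up.

    Assumes independent failures, so the result is the product of the
    individual availabilities (each expressed as a fraction in [0, 1]).
    """
    return prod(availabilities)


# Hypothetical figures for illustration only.
components = {
    "CWIC Start": 0.999,
    "GCMD": 0.999,
    "CWIC": 0.999,
    "one data provider": 0.99,
}

joint = joint_availability(components.values())
weakest = min(components.values())

if __name__ == "__main__":
    print(f"joint availability:  {joint:.4f}")
    print(f"weakest component:   {weakest:.4f}")
```

Under these assumptions the joint figure is never higher than the weakest component, and it drops further as more components are added to the chain.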

CWIC data providers SLOs challenges
Similar to ECHO’s challenges of dealing with its 11 data partners
The ECHO model is something we can learn from:
– Provide individual availability notices on the CWIC WGISS home page
– If providers do not communicate down times or availability, collect statistics with monitoring technologies / APIs
– Collect CWIC Start and CWIC metrics that can capture current SLOs for all external dependencies*
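Collecting availability statistics from monitoring could look roughly like the sketch below. It assumes a monitor that periodically probes each provider and records whether the probe succeeded; the record layout and the sample data are hypothetical, and the actual probing (HTTP checks, monitoring APIs) is left out.

```python
from collections import defaultdict


def availability_by_provider(probes):
    """Compute the fraction of successful probes per provider.

    probes: iterable of (provider_name, succeeded) pairs, one per health check.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for provider, succeeded in probes:
        totals[provider] += 1
        if succeeded:
            successes[provider] += 1
    return {provider: successes[provider] / totals[provider] for provider in totals}


# Hypothetical probe results for illustration only.
sample_probes = [
    ("INPE", True), ("INPE", True), ("INPE", False), ("INPE", True),
    ("GHRSST", True), ("GHRSST", True),
]

if __name__ == "__main__":
    for provider, ratio in availability_by_provider(sample_probes).items():
        print(f"{provider}: {ratio:.0%} of probes succeeded")
```

Probe-based figures only approximate real availability, since an outage between two probes goes unnoticed; a shorter probe interval narrows that gap.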

Initial Sample Approach

CWIC Start Performance Challenges
CWIC Start had performance issues due to:
– A distributed search memory leak
– Inconsistent OpenLayers map rendering
– A potential memory leak due to high load generated by search bots
It was and still is very challenging to pinpoint performance problems due to:
– Ruby on Rails running on top of JRuby, and the difficulty of using memory profilers that point to the actual Ruby code
– Clustered / load-balanced deployment, with requests from the same user being serviced on different hosts
– Difficulties in collecting host-level performance metrics such as free physical memory, swap utilization, CPU and network IO

CWIC Start metrics options
We are investigating Real User Monitoring (RUM) metrics that capture the user browser experience:
– Google Analytics (~26 subjects with hundreds of dimensions / specific descriptive attributes)
– W3C Navigation Timing to complement GA
– New Relic: excellent back end code instrumentation targeting SLAs and detailed performance metrics
We added semantic logging and detailed durations to make it easy to trace requests on a cluster:
Example: [813d9f df507a10eac] [ ] Started GET "/datasetssearch?standard=csw" for at :22:
Example: HttpRequest.submit RESPONSE, DURATION (uCPU sCPU usCPU real): ( )
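The idea of tagging every log line with a request identifier and recording CPU and wall-clock durations can be sketched as follows. This is not the CWIC Start implementation (which runs on Ruby on Rails); it is a Python illustration with hypothetical names. The exact log layout is an assumption, and "usCPU" is assumed here to mean user plus system CPU time.

```python
import logging
import os
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("request_trace")


def format_duration(label, request_id, user_cpu, system_cpu, real):
    """Build one log line; the layout is a hypothetical imitation of the example."""
    total_cpu = user_cpu + system_cpu  # assumption: usCPU = user + system
    return (
        f"[{request_id}] {label}, DURATION (uCPU sCPU usCPU real): "
        f"({user_cpu:.6f} {system_cpu:.6f} {total_cpu:.6f} {real:.6f})"
    )


@contextmanager
def traced(label, request_id):
    """Log CPU and wall-clock time spent inside the with-block."""
    cpu_start = os.times()
    real_start = time.perf_counter()
    try:
        yield
    finally:
        cpu_end = os.times()
        real = time.perf_counter() - real_start
        logger.info(
            format_duration(
                label,
                request_id,
                cpu_end.user - cpu_start.user,
                cpu_end.system - cpu_start.system,
                real,
            )
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    request_id = uuid.uuid4().hex  # one id per incoming request
    with traced("HttpRequest.submit RESPONSE", request_id):
        sum(range(100_000))  # stand-in for the real work
```

Because the identifier appears on every line, the lines belonging to one request can be pulled out of the merged logs of all hosts in the cluster with a simple text search.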

Joint metrics to consider
The CWIC ExtJS application is an excellent start
Questions to answer:
– Who is using CWIC and GCMD CSW (clientId?)
– Who is using CWIC and GCMD OpenSearch (clientId)
– Who is using CWIC without GCMD interaction
– Granule metadata and data downloads via CWIC provided links
– Percentage of direct downloads vs. provider welcome page redirects
– Average response times

Joint metrics to consider cont.
Questions to answer (CSW and OpenSearch):
– Number of errors due to provider internal errors
– Number of errors due to CWIC internal errors
– Number of errors due to provider unavailable
– CWIC specific performance metrics per provider
– GCMD specific performance metrics
– Others?
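Several of the questions above reduce to counting and averaging over request records. The sketch below assumes each request is logged as a (protocol, provider, outcome, seconds) tuple; the record layout, the outcome labels and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical outcome labels matching the error categories listed above.
OK = "ok"
PROVIDER_ERROR = "provider_internal_error"
CWIC_ERROR = "cwic_internal_error"
PROVIDER_DOWN = "provider_unavailable"


def summarize(records):
    """Return (error counts, average response time per provider).

    records: iterable of (protocol, provider, outcome, seconds) tuples.
    """
    errors = Counter()
    durations = defaultdict(list)
    for protocol, provider, outcome, seconds in records:
        durations[provider].append(seconds)
        if outcome != OK:
            errors[(protocol, provider, outcome)] += 1
    averages = {provider: mean(values) for provider, values in durations.items()}
    return errors, averages


# Hypothetical sample data for illustration only.
sample_records = [
    ("CSW", "INPE", OK, 0.5),
    ("CSW", "INPE", PROVIDER_DOWN, 1.5),
    ("OpenSearch", "USGSLSI", OK, 2.0),
    ("OpenSearch", "USGSLSI", CWIC_ERROR, 4.0),
]

if __name__ == "__main__":
    errors, averages = summarize(sample_records)
    for key, count in errors.items():
        print(key, count)
    for provider, seconds in averages.items():
        print(f"{provider}: average {seconds:.2f} s")
```

Keying the error counter by protocol, provider and outcome keeps CSW and OpenSearch figures separate while still allowing totals per provider or per error type to be derived later.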