Published by Delilah Bryan. Modified over 9 years ago.
1
2015 CWIC Developers Meeting
February 19th, 2015
Calin Duma, cv.duma@gmail.com
Doug Newman, douglas.j.newman@nasa.gov
Service Level Agreements: High-Availability, Reliability and Performance
2
Agenda
  – Why are we talking about SLAs again?
  – What are SLAs
  – Why use SLAs
  – Joint SLOs / SLAs dependencies
  – How to establish joint SLOs / SLAs
  – CWIC data providers SLOs challenges
  – Initial Sample Approach
  – CWIC Smart Performance Metrics Approach
  – CWIC Smart PROD metrics options
  – Joint metrics to consider
3
Why are we talking about SLAs again?
Revisiting a topic discussed last year:
  – CWIC, GCMD and CWIC-Smart made a lot of progress in spearheading OpenSearch and facet specification compliance
  – We are individually collecting metrics and have some idea of the service levels that we currently offer
  – We need to quantify them via individual and common metrics
Ongoing CWIC-Smart performance tests:
  – Ongoing task to quantify CWIC-Smart performance (response time and throughput) under load
  – Translates into performance under load for CWIC and GCMD
  – Load is placed from the ECHO workload environment on GCMD TEST (gcmddemo) and CWIC TEST (cwictest)
  – Need the ability to infer production metrics from workload metrics, based on:
      – CPU, RAM, disk IO and network IO of the TEST / WORKLOAD and PROD environments for all components (CWIC-Smart, GCMD and CWIC)
      – Performance expectations for both GCMD and CWIC in each environment
4
What are SLAs
Service Level Agreements:
  – Specify service level requirements between a service provider and a service consumer
  – Often take the form of a legal contract with penalties for non-compliance
  – Concrete and measurable Service Level Objectives (SLOs) are used to test that SLAs are being met
In general there is a recognized gap between the expected service levels and the delivered ones:
  – Availability: downtime per year (e.g. about 5 minutes of downtime per year corresponds to an SLO of 99.999% uptime)
  – Reliability: advertised component failure rates; can be mitigated by fault-tolerant software and system design
  – Performance: response time (completion time minus submission time) and throughput (concurrent requests) oriented SLOs
Response times increase as throughput increases
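The availability figure above can be checked with a little arithmetic. The following is a minimal sketch, assuming a year of 365.25 days; the function names are illustrative and not part of any CWIC or GCMD tooling.

```python
# Assumption: a year is taken as 365.25 days for the downtime budget.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(uptime_percent):
    """Yearly downtime budget (in minutes) implied by an uptime SLO."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def uptime_percent(downtime_minutes):
    """Uptime percentage implied by a given yearly downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Print the downtime budget for a few common "nines" targets.
    for slo in (99.0, 99.9, 99.99, 99.999):
        print(f"{slo}% uptime allows {allowed_downtime_minutes(slo):.1f} minutes of downtime per year")
```

Under this assumption a 99.999% SLO leaves a budget of a little over five minutes per year, which matches the example on the slide.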
5
Why use SLAs
CWIC is gaining popularity and is providing world-wide exposure of data islands (India, China, Brazil etc.)
We should provide outstanding end-user service:
  – Service consumers know what to expect when using GCMD, CWIC and CWIC Smart (and other clients)
We should establish SLOs for our applications:
  – Involves hardware resources, infrastructure platforms (OS, web application stack) and custom code
  – Teams are motivated to work toward agreed-upon targets
  – Can dictate and provide empirical data for future hardware and software needs
6
Joint SLOs / SLAs dependencies
CWIC Smart depends on GCMD OpenSearch and CWIC OpenSearch
CWIC depends on GCMD and 7 providers:
  – NASA, INPE, GHRSST, NOAA-NODC, USGSLSI, EUMETSAT and CCMEO
In order to have availability, reliability and performance SLOs we would have to coordinate among 10 components:
  1. CWIC Smart
  2. GCMD
  3. CWIC
  4. NASA / ECHO
  5. INPE
  6. GHRSST
  7. NOAA-NODC
  8. USGSLSI
  9. EUMETSAT
  10. CCMEO
If any of the above components is down or slow, the end user will be subject to a sub-optimal experience
Complexity will increase when more providers are added
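To illustrate why a chain of dependencies is hard to cover with one SLO, here is a minimal sketch of the availability of a request that needs every component to be up. It assumes failures are independent and that all components sit in series (a worst case, since a real search may touch only one provider). The 99% figures are hypothetical placeholders, not measured values.

```python
from math import prod


def joint_availability(availabilities):
    """Availability of a serial chain, assuming independent failures."""
    return prod(availabilities)


# Hypothetical: ten components, each individually available 99% of the time.
components = [0.99] * 10

if __name__ == "__main__":
    joint = joint_availability(components)
    print(f"Joint availability of {len(components)} components: {joint:.4f}")
```

Under these assumptions the joint figure can never exceed that of the weakest component, and it drops further with every provider added.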
7
How to establish joint SLOs / SLAs
Although usage of our services is free, we can still provide a reasonable user experience and set realistic user expectations
True joint SLOs / SLAs would be the SLO / SLA of the weakest component and are therefore not desirable
CWIC, GCMD, CWIC Smart and ECHO can work together on joint SLOs / SLAs
CWIC can obtain existing provider SLAs where available, create basic ones from requests/responses, or help providers think about SLAs
8
CWIC data providers SLOs challenges
Similar to ECHO’s challenges in dealing with its data partners
The ECHO model is something we can learn from:
  – Provide individual availability notices on the CWIC WGISS home page
  – If providers do not communicate downtimes or availability, collect statistics with monitoring technologies / APIs
  – Collect CWIC Smart and CWIC metrics that can capture current SLOs for all external dependencies*
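Where a provider does not publish availability, statistics can be derived from periodic probes. The sketch below shows only the aggregation step and assumes some monitoring job already records one result per probe. The `Probe` record and the provider names are hypothetical, not an existing CWIC data structure.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One monitoring check against a provider endpoint (hypothetical record)."""
    provider: str
    ok: bool
    response_ms: float


def observed_availability(probes):
    """Fraction of successful probes per provider."""
    totals = {}
    for probe in probes:
        ok_count, total = totals.get(probe.provider, (0, 0))
        totals[probe.provider] = (ok_count + int(probe.ok), total + 1)
    return {name: ok / total for name, (ok, total) in totals.items()}


if __name__ == "__main__":
    sample = [
        Probe("provider_a", True, 310.0),
        Probe("provider_a", False, 0.0),
        Probe("provider_b", True, 120.0),
    ]
    print(observed_availability(sample))
```

The result is only as good as the probe interval: short outages between probes go unseen, so this gives a basic observed SLO and does not replace a provider-supplied one.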
9
Initial Sample Approach
10
CWIC Smart Performance Metrics Approach
CWIC Smart objective:
  – Determine the maximum throughput (concurrent requests) that does not increase the average response time above x milliseconds
Challenges:
  – Headless tests vs. browser tests
  – Environments where we can map concurrency to CPUs / CPU cores; otherwise we execute sequential requests
  – Framework (Ruby on Rails) concurrency: Global Interpreter Lock vs. JRuby threads scheduled by the JVM (native threads)
  – Result aggregation and analytics
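The objective can be read as finding the highest concurrency level whose average response time stays within a threshold. The sketch below assumes load-test results are already available as (concurrency, average response time in ms) pairs. The numbers are invented for illustration and are not CWIC Smart measurements.

```python
def max_concurrency_within(measurements, threshold_ms):
    """Highest concurrency level whose average response time stays within threshold_ms.

    Scans levels in ascending order and stops at the first one that exceeds
    the threshold, on the assumption that response time grows with load.
    Returns None when even the lowest level is too slow.
    """
    best = None
    for concurrency, avg_ms in sorted(measurements):
        if avg_ms > threshold_ms:
            break
        best = concurrency
    return best


if __name__ == "__main__":
    # Hypothetical load-test results: (concurrent requests, average ms).
    results = [(1, 120.0), (2, 130.0), (4, 180.0), (8, 420.0), (16, 900.0)]
    print(max_concurrency_within(results, threshold_ms=200.0))
```

Stopping at the first violation means a noisy measurement at a low level caps the answer, so repeated runs per level are worth considering.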
11
CWIC Smart PROD metrics options
Real User Monitoring (RUM) metrics:
  – Google Analytics (~26 subjects with hundreds of dimensions / specific descriptive attributes)
  – W3C Navigation Timing to complement GA
  – New Relic: excellent back-end code instrumentation targeting SLAs and detailed performance metrics
We added semantic logging and detailed durations on many events to make it easy to trace requests on a cluster
Splunk reports can currently be used for analytics
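As an illustration of semantic logging with durations, the sketch below emits one key="value" line per event, tagged with a request id so that a request can be followed across cluster nodes. It is written in Python for brevity and is not the CWIC Smart implementation. The field names, `format_event` and `timed_event` are all hypothetical.

```python
import time
from contextlib import contextmanager


def format_event(event, request_id, duration_ms, **fields):
    """Render one log line as key="value" pairs (hypothetical field names)."""
    pairs = {
        "event": event,
        "request_id": request_id,
        "duration_ms": round(duration_ms, 1),
        **fields,
    }
    return " ".join(f'{key}="{value}"' for key, value in pairs.items())


@contextmanager
def timed_event(event, request_id, sink=print, **fields):
    """Time the wrapped block and send one formatted line to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        sink(format_event(event, request_id, elapsed_ms, **fields))


if __name__ == "__main__":
    with timed_event("provider_search", "req-001", provider="example"):
        time.sleep(0.01)  # stand-in for the real work being measured
```

Flat key="value" lines are a shape that log analytics tools can usually split into fields, which keeps per-event durations easy to report on.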
12
Joint metrics to consider
The CWIC ExtJS application is an excellent start
Questions to answer:
  – Who is using CWIC and GCMD OpenSearch (clientId)?
  – Who is using CWIC OpenSearch without GCMD interaction?
  – Is it worth tracking browse and granule metadata and data downloads from CWIC Smart?
  – Percentage of direct downloads vs. provider welcome page redirects (by provider)
  – Average response times
13
Joint metrics to consider cont.
Questions to answer:
  – Number of errors due to provider internal errors
  – Number of errors due to CWIC internal errors
  – Number of errors due to provider unavailable
  – CWIC specific performance metrics per provider
  – GCMD specific performance metrics
  – Others?
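Several of these questions reduce to grouping request records by provider and outcome. The following is a minimal sketch, assuming each request is logged as a record with a provider, an outcome and a response time. The record layout and the outcome labels are hypothetical and simply mirror the three error categories listed above.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels mirroring the error categories on the slide.
ERROR_OUTCOMES = {"provider_error", "cwic_error", "provider_unavailable"}


def summarize(records):
    """Per-provider request count, average response time and error counts."""
    counts = defaultdict(int)
    total_ms = defaultdict(float)
    errors = defaultdict(Counter)
    for record in records:
        provider = record["provider"]
        counts[provider] += 1
        total_ms[provider] += record["response_ms"]
        if record["outcome"] in ERROR_OUTCOMES:
            errors[provider][record["outcome"]] += 1
    return {
        provider: {
            "requests": counts[provider],
            "avg_response_ms": total_ms[provider] / counts[provider],
            "errors": dict(errors[provider]),
        }
        for provider in counts
    }


if __name__ == "__main__":
    sample = [
        {"provider": "provider_a", "outcome": "ok", "response_ms": 100.0},
        {"provider": "provider_a", "outcome": "provider_error", "response_ms": 300.0},
        {"provider": "provider_b", "outcome": "provider_unavailable", "response_ms": 50.0},
    ]
    print(summarize(sample))
```

Failed requests are included in the average here; whether they belong in a response-time SLO is a decision the teams would need to make.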