CMR ECHO Transition and Client Collaboration
Winter ESIP 2016
Jason Gilman
This material is based upon work supported by the National Aeronautics and Space Administration under Contract Number NNG15HZ39C.
ECHO & CMR
At the EOSDIS Technical Interchange Meeting back in November, Dana Shum and Katie Baynes spoke about some of the changes underway to modernize the legacy ECHO components of the CMR. We're moving core capabilities from ECHO technologies to the more modern approaches used in the CMR. This move brings a lot of benefits, which I'll detail along with some new features and REST APIs. Some of the older SOAP-based APIs are also going away. I'm going to go into more detail here and explain why we're making this move, how we're doing it, and what benefits you'll see.
ECHO Prehistory
To understand where ECHO and the CMR are today we have to jump back to the origins. Originally ECHO was a single monolithic application with a SOAP API. Providers sent their metadata over FTP to an ingest process. Users would search for and order data through a client called WIST, which was a copy of an older client that had run at individual archive centers. Note: Describe the purpose of the colors here. Grey – legacy components. Blue – clients.
ECHO Middle Ages
The next step in ECHO's evolution brought some separate services, shown in pink. We added REST APIs for ingesting, searching for, and ordering data, along with new response formats. Catalog REST and Elasticsearch were added as a new way to ingest and search. The Reverb client, developed around this time, was a significant improvement over the previous WIST client.
CMR and ECHO Together
The next stage brings the addition of the CMR, here in green, which provides ingest and search capabilities. Every diagram I've shown so far has only added new things; nothing has gone away, with the exception of WIST. Providers can still use FTP ingest, they can ingest through Catalog REST, and starting just recently they can ingest directly into the CMR. It all ends up in the CMR and is available for searching. All of the existing clients, like Reverb, can still work through their existing APIs. You can see here we have a mix of different technologies and approaches from three different eras all working together: legacy monolithic Java applications in grey, Ruby applications in red, and the newer microservices of the CMR.
Legacy System Problems
It's great that we've been able to keep using our legacy applications for so long, but there are a lot of limitations to their continued use.
Cost
It's expensive to maintain this system of legacy technologies. There are additional hardware costs to run the legacy applications. The legacy technologies require developers to be familiar with a larger set of code bases, which means fixes take longer. Throughout the entire lifetime of the CMR we've had a team working on it, but we also have several people dedicated to the legacy systems. Those additional developers are a cost both in terms of salary and in terms of opportunity: they're spending time maintaining an old system when they could be adding new value to the earth science community through new features.
Complexity
Maintaining a system with that many legacy components adds a lot of complexity. The size of the boxes in the earlier diagram doesn't represent the complexity of an individual box. The CMR services are relatively simple microservices that focus on doing one thing well; they're easy to deploy and run. The legacy components are very complex, decades-old applications with tens to hundreds of thousands of lines of code. They use different, older technologies, which means developers have to be familiar with all of the different technologies in use.
Holding Us Back
The limitations of the legacy components prevent us from making some fundamental improvements in the CMR. The CMR relies on legacy ECHO code for orders, authentication, authorization, and other things. The next couple of diagrams will help demonstrate why that's an issue.
Reliability
Degraded Service Events (DSE) since CMR went live
This is a diagram showing counts of degraded service events (aka outages) since the CMR went live, categorized by cause. (Explain each column.) You can see that the largest cause of outages was in ECHO code and related software, while the newer CMR approaches have not yet been the source of any outages. When one of these outages occurs, it's not as if just the legacy component itself goes down. The legacy systems are still a fundamental part of the CMR; when they don't work, the CMR doesn't work, so users can't perform searches and data providers can't ingest data.
CMR – caused by a code problem in the CMR, or a software issue in CMR application servers and third-party libraries.
ECHO Scalability – caused by the system's inability to scale to handle many requests.
Process – caused by someone making a mistake in a process that was not automated.
Shared Infrastructure – caused by the infrastructure (load balancers, puppet management, NASA network, etc.).
ECHO – caused by a code problem in ECHO, or a software issue in ECHO application servers and third-party libraries.
Granule Search Performance
This graph shows the performance improvement when the CMR went live. The CMR is fast in spite of ECHO; we have to jump through hoops to keep it that way. We need to enforce access control rules during a query, and fetching that data from ECHO takes about a minute. We have to make sure to cache it, keep the cache warm, and only ever fetch it on a background thread. These legacy systems just weren't designed for the performance requirements the CMR has.
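The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual CMR code: the class and parameter names are hypothetical, and the real implementation's refresh interval and data shape are not shown in the slides.

```python
import threading
import time

class WarmCache:
    """Sketch of the pattern above: fetch slow data (e.g. ECHO access
    control rules) only on a background thread, so the query path always
    reads a pre-warmed copy and never blocks on the slow fetch."""

    def __init__(self, fetch, refresh_secs=60):
        self._fetch = fetch
        self._value = fetch()  # warm the cache once at startup
        self._lock = threading.Lock()
        thread = threading.Thread(
            target=self._refresh_loop, args=(refresh_secs,), daemon=True)
        thread.start()

    def _refresh_loop(self, refresh_secs):
        while True:
            time.sleep(refresh_secs)
            value = self._fetch()  # slow call happens off the query path
            with self._lock:
                self._value = value

    def get(self):
        # Queries read the cached copy instantly instead of waiting
        # on a minute-long fetch from the legacy system.
        with self._lock:
            return self._value
```

A query handler would call `get()` on every request and always see a recent copy of the rules, at the cost of the data being up to `refresh_secs` stale.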
Transition Goals
We are going to transition the legacy components of ECHO into the CMR.
Incremental Rollout We’re not going to do a big bang release. We’re going to go capability by capability and transition each one to the CMR.
Minimize Impact* We’re going to do what we can to minimize the impact of the change to clients and users. That means your data isn’t going to disappear. That also means you shouldn’t see downtime during the switchover. If you’re using one of the current supported REST APIs it should seem like nothing has changed. * Legacy SOAP APIs are retiring in September 2016
Safety Net
We're planning an incremental rollout using techniques that have worked before. But part of making a big transition like this is identifying risks and planning for what could go wrong. When transitioning a capability, if we find a serious bug that somehow slipped through testing in various environments, we have a toggle we can throw to switch back to the old implementation. That should be transparent to clients and users. We probably won't have to do it, but it's insurance in case of problems.
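Conceptually, the toggle works like the sketch below. All names here are hypothetical; the slides don't describe the actual mechanism, only that each capability can be flipped back to its legacy implementation at runtime without clients noticing.

```python
# Hypothetical per-capability routing table. Flipping a value back to
# "legacy" reverts that one capability without touching the others.
legacy_handlers = {
    "orders": lambda: "order created via ECHO",
    "access-control": lambda: "ACL check via ECHO",
}
cmr_handlers = {
    "orders": lambda: "order created via CMR",
}
TOGGLES = {"orders": "cmr", "access-control": "legacy"}

def handler_for(capability):
    """Route a request to the new CMR implementation or the legacy one,
    based on the current toggle setting. Callers (and clients) see the
    same API either way."""
    impl = TOGGLES.get(capability, "legacy")
    if impl == "cmr":
        return cmr_handlers[capability]
    return legacy_handlers[capability]
```

Because the routing decision sits behind the API, switching an entry from "cmr" back to "legacy" is invisible to clients.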
Transition Plan
For Each Capability
1. Design and implement the CMR component.
2. Integrate use of the CMR component into the legacy code.
3. Switch over live.
Before Capability This is the current state of the CMR and ECHO. It’s the same diagram I showed before.
After Capability Added
Note that clients can continue to use existing APIs during the transition without problems or interruption. This is a temporary increase in complexity on the path to a simplified experience.
Eventually… (~September 2016)
Eventually, after transitioning all the components and retiring the legacy parts, we'll arrive at a much simpler architecture.
Benefits
Reduced cost
Improved reliability and performance
Easily add new capabilities
Consistent APIs and features
Data Stored as Immutable Revisions
Speaking of consistent APIs and features, one nice aspect of the transition is that we will store the transferred data the same way we store granules and collections. You've probably heard that the CMR stores every update of a granule or collection as a separate immutable revision. We don't overwrite your data; we create a new copy in our database and eventually age off the old copies to save space. The MMT app exposes this to let you view your update history, see who made changes, revert to a previous revision, and even undelete. Those are some great features. We're going to store all of the transitioned ECHO data the same way, so you'll have the same capabilities with ECHO data like access control rules, orders, data quality summaries, and order options.
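To make the revision model concrete, here is a toy sketch of immutable revision storage. The class and field names are hypothetical (the CMR's actual schema is not shown in the slides), but it illustrates why history, revert, and undelete all fall out naturally once updates append rather than overwrite.

```python
class RevisionStore:
    """Toy sketch: every save appends a new immutable revision instead of
    overwriting. A delete is just a tombstone revision, so it can be undone,
    and reverting copies an old revision's metadata into a new one."""

    def __init__(self):
        self._revisions = {}  # concept id -> list of revision dicts

    def save(self, concept_id, metadata, user):
        revs = self._revisions.setdefault(concept_id, [])
        revs.append({
            "revision": len(revs) + 1,
            "metadata": metadata,
            "user": user,                    # who made the change
            "deleted": metadata is None,     # tombstone marker
        })

    def latest(self, concept_id):
        return self._revisions[concept_id][-1]

    def delete(self, concept_id, user):
        # A delete appends a tombstone; older revisions remain untouched.
        self.save(concept_id, None, user)

    def revert(self, concept_id, revision, user):
        # Reverting (or undeleting) is just saving an old revision's
        # metadata as a brand-new revision.
        old = self._revisions[concept_id][revision - 1]
        self.save(concept_id, old["metadata"], user)
```

In this model "age off the old copies" is simply dropping the oldest entries from each list, which never disturbs the latest revision.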
We need your help.
Transition off of legacy APIs:
SOAP -> REST
FTP Ingest -> REST Ingest
Provide feedback:
Use EDSC and MMT
Feedback on CMR APIs
API Versioning
Finally, I'm going to discuss something we've been thinking about as we grow the CMR and add new APIs: API versioning.
API Improvements vs. API Stability
There's a natural push and pull between the stability of an API and the ability to improve it. As more people use our APIs, it gets harder to change them; we don't want to break existing clients. But we also want to be able to improve them over time. If the CMR is going to be around for a long time, it has to be able to change.
Solution!
API versioning and client collaboration are part of the solution here. API versioning and similar techniques let us add new things and make forward progress without breaking clients, so we get both improvements and stability. Client collaboration is needed as we go forward: we need to know what's working and what's not, and eventually we need cooperation when we're going to retire old versions.
API Versioning Strategies
URI Path
https://cmr…/search/v2/collections
Popular (Twitter, GitHub, etc.)
Problems:
URIs should be stable.
Granularity problems.
Query Parameter or Custom Header
https://cmr…/search/collections?ver=2
Easy to use.
Problems:
Not RESTful; doesn't take advantage of built-in HTTP capabilities.
Granularity problems.
Content Negotiation
GET https://cmr…/search/collections
Accept: application/reference2+json
RESTful; URLs stay the same.
Request and response content versioned separately.
Problems:
More difficult to specify.
Content Negotiation via URL ext.
GET https://cmr…/search/collections.ref2_json
Easy to specify for clients that can't set a header.
Problems:
Doesn't handle request content.
Changes the URL.
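From a client's point of view, the last two strategies look like this. This is only a sketch: the host name is a placeholder, and the versioned media type and extension shown on the slides (`application/reference2+json`, `.ref2_json`) are examples under discussion, not a final API.

```python
from urllib.request import Request

BASE = "https://cmr.example.gov"  # placeholder host, not a real endpoint

# Content negotiation: the URL stays stable; the Accept header names
# the versioned response format the client wants.
negotiated = Request(
    BASE + "/search/collections",
    headers={"Accept": "application/reference2+json"},
)

# URL extension: clients that can't set headers encode the same version
# choice in a path suffix instead, at the cost of changing the URL.
by_extension = Request(BASE + "/search/collections.ref2_json")
```

The combination approach mentioned on the next slide would accept either form, letting full HTTP clients negotiate via headers while simpler clients use the extension.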
What will we use?
We're currently researching approaches. We may use a combination: content negotiation plus URL extensions.
Feedback requested!
Client Collaboration
CMR Wiki https://wiki.earthdata.nasa.gov/display/CMR
Client Developer Forum https://wiki.earthdata.nasa.gov/display/CMR/CMR+Client+Developer+Forum
CMR Designs in Wiki
Also note that we hold design reviews where people can give feedback.
CMR Client Developer Email List cmr-client-developers@lists.nasa.gov Sign up here: https://lists.nasa.gov/mailman/listinfo/cmr-client-developers
This material is based upon work supported by the National Aeronautics and Space Administration under Contract Number NNG15HZ39C.