“Smart Containers”
Charles F. Vardeman II, Da Huo, Michelle Cheatham, James Sweet, and Jaroslaw Nabrzyski
Observation: Scientific computing is adopting cloud infrastructure for economic reasons. Source:
Observation: Cloud infrastructures are adopting “containerization” for infrastructure deployment Source:
Observation: Scientific Data requires Context Source:
Why “Smart Containers”?
Source: Krzysztof Janowicz, Frank van Harmelen, James A. Hendler, and Pascal Hitzler. “Why the Data Train Needs Semantic Rails.” AI Magazine.
“A major paradigm shift introduced by the Semantic Web is to focus on the creation of smart data instead of smart applications. The rationale behind this shift is the insight that smart data will make future applications more (re)usable, flexible, and robust, while smarter applications fail to improve data along the same dimensions…”
Smart Containers IN THEORY: What is it?
- Put data, metadata, and provenance in the “same world”
- Enhance data by linking it to other things
Smart Containers IN THEORY: How does it work?
- Add machine-readable labels
- Link things together into a knowledge graph
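The two steps above (label things, then link them) can be sketched with a toy in-memory graph. This is a minimal sketch, assuming a plain Python set of (subject, predicate, object) triples as a stand-in for a real RDF store; the `ex:` identifiers are hypothetical, and the PROV-O/RDFS-style predicate names are illustrative strings, not the Smart Containers vocabulary.

```python
from collections import deque

# Toy knowledge graph: a set of (subject, predicate, object) triples.
# "ex:" identifiers are hypothetical; predicates only mimic PROV-O/RDFS names.
graph = {
    # Step 1: machine-readable labels on data, software and documents
    ("ex:dataset42", "rdfs:label", "Dwarf star simulation output"),
    ("ex:notebook7", "rdfs:label", "Astrophysics analysis notebook"),
    ("ex:paper3", "rdfs:label", "Preprint describing the run"),
    # Step 2: links that put data, metadata and provenance in the same graph
    ("ex:dataset42", "prov:wasGeneratedBy", "ex:run1"),
    ("ex:run1", "prov:used", "ex:notebook7"),
    ("ex:paper3", "dcterms:references", "ex:dataset42"),
}


def neighbors(node):
    """Nodes one link away from `node`, following edges in either direction.
    Label triples are skipped because their objects are text, not nodes."""
    outgoing = {o for s, p, o in graph if s == node and p != "rdfs:label"}
    incoming = {s for s, p, o in graph if o == node}
    return outgoing | incoming


def follow_your_nose(start):
    """Breadth-first walk over links, returning every reachable node in visit order."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(neighbors(node)):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


print(follow_your_nose("ex:paper3"))
# → ['ex:paper3', 'ex:dataset42', 'ex:run1', 'ex:notebook7']
```

Starting from a paper, the walk reaches the dataset, the run that produced it, and the notebook the run used, which is the "follow your nose" behavior the slides describe.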
Smart Containers IN THEORY: Why do I need it?
- To break down silos and find relationships among data, software, documents, and more
- So you can ask big questions that cross disciplinary boundaries
- So machines can do the grunt work for you while you focus on the science:
  - Identify software dependencies and set up your compute environment
  - Automatically capture provenance and metadata
  - Find connections so you can “follow your nose” to more information
You: “I’d like to run an astrophysics simulation of a dwarf star dying. Anything out there?”
Machine: “I found an executable notebook. Would you like me to set it up and run it?”
Smart Containers IN PRACTICE: Anatomy of a Smart Container
(Figure: Docker image, Docker Engine, Docker container)
1. The SC Python wrapper is added to a standard Docker container.
2. Provenance and metadata are written directly to the image label:
   - Machine-readable
   - Enables discovery in large repositories
3. The container is provisioned as a “Smart Container”:
   - API to write metadata
   - Metadata storage and standardization
   - Specification of data location
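Step 2 (writing provenance and metadata directly to the image label) can be sketched as follows. This is a hedged illustration, assuming the record is stored as compact JSON under a single label; the key `org.example.smartcontainer` and the record's field names are hypothetical, since the slides do not give the actual key or schema. Only `docker build --label KEY=VALUE` and `docker image inspect` are real Docker CLI features.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical label key (reverse-DNS style, as Docker recommends for labels).
SC_LABEL_KEY = "org.example.smartcontainer"


def provenance_record(agent: str, command: list, data_location: str) -> dict:
    """Build a small JSON-LD-style record; the field names are illustrative."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@type": "prov:Activity",
        "prov:wasAssociatedWith": agent,
        "prov:startedAtTime": datetime.now(timezone.utc).isoformat(),
        "command": list(command),
        "dataLocation": data_location,
    }


def label_args(record: dict) -> list:
    """Serialize the record as one `--label KEY=VALUE` pair for `docker build`.

    Label values are plain strings, so the record is stored as compact JSON.
    Passing these arguments as a list to subprocess avoids shell quoting.
    """
    value = json.dumps(record, separators=(",", ":"), sort_keys=True)
    return ["--label", f"{SC_LABEL_KEY}={value}"]


def read_label(labels: dict) -> Optional[dict]:
    """Recover the record from an image's label map, e.g. the JSON printed by
    `docker image inspect --format '{{json .Config.Labels}}' IMAGE`."""
    raw = labels.get(SC_LABEL_KEY)
    return json.loads(raw) if raw is not None else None


record = provenance_record("ex:alice", ["python", "simulate.py"], "/data/run1")
build_cmd = ["docker", "build", *label_args(record), "-t", "dwarf-sim", "."]
```

Because the label travels with the image, any tool that can list labels can read the record without running the container, which is what makes discovery across large repositories feasible.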
Smart Containers IN PRACTICE: Building a new container
1. Use the Smart Container command line tool* (it replaces the Docker command line tool).
2. Provenance is captured automatically.
3. Customize by adding any additional metadata you want (or don’t, and go with the default!).
*Smart Containers can also be used within an infrastructure through an API.
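A minimal sketch of a command-line front end that replaces `docker` and captures provenance as a side effect. The function name `sc`, its parameters, and the record fields are hypothetical; the real Smart Container tool's interface is not specified in the slides. The Docker call is injectable so the wrapper can be exercised without a Docker daemon.

```python
import subprocess
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Sequence, Tuple

Runner = Callable[[Sequence[str]], int]


def run_docker(args: Sequence[str]) -> int:
    """Default runner: hand the arguments to the real Docker CLI."""
    return subprocess.run(["docker", *args]).returncode


def sc(args: Sequence[str], agent: str,
       extra: Optional[Dict[str, object]] = None,
       runner: Runner = run_docker) -> Tuple[int, dict]:
    """Hypothetical `sc` wrapper: forward `args` to Docker unchanged and
    return the exit code together with an automatically captured record."""
    started = datetime.now(timezone.utc).isoformat()
    code = runner(args)
    record = {
        "prov:wasAssociatedWith": agent,
        "prov:startedAtTime": started,
        "prov:endedAtTime": datetime.now(timezone.utc).isoformat(),
        "command": ["docker", *args],
        "exitCode": code,
    }
    # Step 3: optional user metadata; automatically captured fields win.
    for key, value in (extra or {}).items():
        record.setdefault(key, value)
    return code, record
```

For example, `sc(["build", "-t", "dwarf-sim", "."], agent="ex:alice", extra={"dcterms:subject": "astrophysics"})` would run `docker build` and return a record with the default fields plus the user's subject. Using `setdefault` is one design choice: user metadata can add fields but cannot overwrite the automatically captured ones.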
Smart Containers IN PRACTICE: Searching for a container
1. You: execute a search.
2. Machine: searches the knowledge graph of available containers and returns matches.
3. You: select the one you want (“I’d like this one!”).
4. Machine: identifies dependencies, pulls together any additional containers you need, and runs your selection.
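Steps 2 and 4 (match, then resolve dependencies before running) can be sketched with a keyword filter and a topological sort. This assumes a hypothetical catalogue held in a plain dict; in the deck this information lives in the knowledge graph, and the container names here are made up. `graphlib.TopologicalSorter` is the standard-library class (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical catalogue standing in for the knowledge graph of containers.
CATALOG = {
    "notebook-results-a": {"keywords": {"astrophysics", "dwarf star", "notebook"},
                           "needs": ["jupyter-base"]},
    "notebook-blank": {"keywords": {"astrophysics", "dwarf star", "notebook"},
                       "needs": ["jupyter-base"]},
    "jupyter-base": {"keywords": {"jupyter"}, "needs": ["python-sci"]},
    "python-sci": {"keywords": {"python", "numpy"}, "needs": []},
}


def search(terms: set) -> list:
    """Step 2: return containers whose keywords include every search term."""
    return sorted(name for name, meta in CATALOG.items() if terms <= meta["keywords"])


def launch_order(selected: str) -> list:
    """Step 4: collect the selection's transitive dependencies and order them
    so that each container starts only after the ones it needs."""
    deps, stack = {}, [selected]
    while stack:
        name = stack.pop()
        if name not in deps:
            deps[name] = CATALOG[name]["needs"]
            stack.extend(deps[name])
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(deps).static_order())


print(search({"astrophysics", "notebook"}))  # step 2
print(launch_order("notebook-blank"))        # step 4, after the user picks one
```

The topological sort is a swapped-in, generic technique for ordering dependencies; the slides do not say how the discovery engine orders them.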
Smart Containers IN PRACTICE: Feature Summary
- Adds a machine-readable metadata label to the image tarball (tar.gz) that can be read without opening or running the container
- The discovery engine finds dependencies and retrieves them; you just run the Smart Container
- Can move your code to the data, or the data to your code
- Key access: you can put collaborators “on the list” to access your container when public sharing is not appropriate (e.g., when HIPAA data is involved)
Smart Containers IN PRACTICE: Example
Search: astrophysics simulation of a dwarf star dying
Results (2):
- executable notebook, with results A
- executable notebook, blank
Workflow:
1. Select and launch the notebook with results.
2. Find and run a container with the notebook software.
3. Send the notebook to the user’s container.
4. Send the webpage to the container’s web interface.
5. Make changes and work with the notebook.
6. Capture provenance and send it back to the web service as “executable notebook, with results B”.
New search: astrophysics simulation of a dwarf star dying
Results (3):
- executable notebook, with results A
- executable notebook, with results B
- executable notebook, blank
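The bookkeeping in this example, where one search result leads to a new, linked result, can be modeled in a few lines. This sketch assumes a hypothetical in-memory index for the web service; the container launch, notebook hand-off, and user edits (steps 1 to 5) are elided, and `derived_from` merely plays the role of a provenance link such as PROV-O's `prov:wasDerivedFrom`.

```python
from dataclasses import dataclass
from typing import List, Optional

TOPIC = "astrophysics simulation of a dwarf star dying"


@dataclass
class Entry:
    title: str
    topic: str
    derived_from: Optional[str] = None  # stands in for prov:wasDerivedFrom


# Hypothetical web-service index, seeded with the two results from the slide.
REPOSITORY: List[Entry] = [
    Entry("executable notebook, with results A", TOPIC),
    Entry("executable notebook, blank", TOPIC),
]


def find_notebooks(topic: str) -> List[str]:
    """Search: titles of every indexed notebook for a topic."""
    return [e.title for e in REPOSITORY if e.topic == topic]


def run_and_publish(selected: str, new_title: str) -> Entry:
    """Step 6 only: record the new result and link it to the notebook it
    came from. Launching and editing the notebook are not modeled."""
    source = next(e for e in REPOSITORY if e.title == selected)
    result = Entry(new_title, source.topic, derived_from=source.title)
    # Listed right after its source, matching the order shown on the slide.
    REPOSITORY.insert(REPOSITORY.index(source) + 1, result)
    return result
```

After `run_and_publish("executable notebook, with results A", "executable notebook, with results B")`, repeating the search returns three notebooks instead of two, and the new entry carries a link back to its source.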
Smart Containers IN PRACTICE: Beyond the container
(Figure: knowledge graph) Image source:
- The Smart Container knowledge graph links out to other knowledge graphs
- Combine data from many different locations, including local data and Wikidata
- Enables aggregate searches and federated queries
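A federated query of this kind can be written with the SPARQL 1.1 `SERVICE` keyword, which lets one query join local triples with a remote endpoint such as Wikidata's. The sketch below only builds the query and a SPARQL-protocol GET URL; it assumes a hypothetical local endpoint URL and assumes the local graph links containers to Wikidata items via `dcterms:subject`, neither of which the slides specify.

```python
from urllib.parse import urlencode

# Hypothetical endpoint serving the local Smart Container knowledge graph.
LOCAL_ENDPOINT = "http://localhost:3030/sc/sparql"
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def federated_query(limit: int = 10) -> str:
    """Local pattern: container -> subject (assumed to be a Wikidata item).
    Remote pattern: English label of that item, fetched via SERVICE."""
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?container ?topic ?topicLabel WHERE {{
  ?container dcterms:subject ?topic .
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?topic rdfs:label ?topicLabel .
    FILTER(LANG(?topicLabel) = "en")
  }}
}}
LIMIT {limit}"""


def request_url(query: str, endpoint: str = LOCAL_ENDPOINT) -> str:
    """SPARQL 1.1 Protocol query via GET: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


print(request_url(federated_query(5)))
```

Sending the query to the local endpoint, rather than to Wikidata, keeps the local data local; only the `SERVICE` block is forwarded, provided the local store supports federation.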
Thank You