“Smart Containers” Charles F. Vardeman II, Da Huo, Michelle Cheatham, James Sweet, and Jaroslaw Nabrzyski https://github.com/crcresearch/smartcontainers

Observation: Scientific computing is adopting cloud infrastructure due to economics.

Observation: Cloud infrastructures are adopting “containerization” for infrastructure deployment.

Observation: Scientific data requires context.

Why “Smart Containers”?

“A major paradigm shift introduced by the Semantic Web is to focus on the creation of smart data instead of smart applications. The rationale behind this shift is the insight that smart data will make future applications more (re)usable, flexible, and robust, while smarter applications fail to improve data along the same dimensions…”
Source: Krzysztof Janowicz, Frank van Harmelen, James A. Hendler, and Pascal Hitzler. “Why the Data Train Needs Semantic Rails.” AI Magazine.

Smart Containers IN THEORY: What is it?
Diagram: data, metadata, provenance
1. Put data, metadata and provenance in the “same world”
2. Enhance data by linking to other things

Smart Containers IN THEORY: How does it work?
1. Add machine-readable labels
2. Link things together into a knowledge graph
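The two steps on this slide (label things, then link them) can be illustrated with a minimal sketch. It assumes a knowledge graph is just a set of (subject, predicate, object) triples; every identifier and predicate name here is hypothetical and is not taken from the Smart Containers code base.

```python
# Minimal sketch: a knowledge graph as a set of (subject, predicate, object) triples.
# All "ex:" identifiers and predicate names are made up for illustration.

def add_label(graph, subject, predicate, obj):
    """Attach one machine-readable label (a triple) to the graph."""
    graph.add((subject, predicate, obj))


def neighbors(graph, node):
    """Return every node directly linked to `node`, in either direction."""
    linked = set()
    for s, _, o in graph:
        if s == node:
            linked.add(o)
        elif o == node:
            linked.add(s)
    return linked


graph = set()
# Step 1: add machine-readable labels.
add_label(graph, "ex:dataset-1", "ex:generatedBy", "ex:simulation-run-7")
add_label(graph, "ex:simulation-run-7", "ex:usedSoftware", "ex:container-42")
# Step 2: the shared identifiers link the labels into one graph.
add_label(graph, "ex:container-42", "ex:describedIn", "ex:paper-1")

print(sorted(neighbors(graph, "ex:simulation-run-7")))
```

Because the dataset and the container both mention the same run identifier, they become connected without either one knowing about the other.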

Smart Containers IN THEORY: Why do I need it?
- To break down silos, and find relationships among data, software, documents and more
- So you can ask big questions that cross disciplinary boundaries
- So machines can do the grunt work for you while you focus on the science!
  - Identify software dependencies and set up your compute environment
  - Automatically capture provenance/metadata
  - Find connections so you can “follow your nose” to more information
You: “I’d like to run an astrophysics simulation of a dwarf star dying. Anything out there?”
Machine: “I found an executable notebook, would you like me to set it up and run it?”

Smart Containers IN PRACTICE: Anatomy of a Smart Container
Diagram: Docker Image, Docker Engine, Docker Container
1. SC Python wrapper is added to standard Docker container
2. Provenance and metadata are written directly to image label
   - Machine-readable
   - Enables discovery in large repositories
3. Container is provisioned as a “Smart Container”:
   - API to write metadata
   - Metadata storage and standardization
   - Specification of data location
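As a rough illustration of step 2, provenance can be packed into a single string and stored as an image label, since Docker label values are plain strings. The sketch below assumes a JSON-LD style document using W3C PROV terms; the slides do not say which serialization the tool uses, and the label key, the `urn:example:` identifiers, and the function names are all hypothetical.

```python
import json

# Hypothetical label key; the real tool may use a different one.
LABEL_KEY = "org.example.smartcontainer.provenance"


def build_provenance(image_name, creator, base_image):
    """Describe an image with a few W3C PROV terms (assumed vocabulary)."""
    return {
        "@context": {"prov": "http://www.w3.org/ns/prov#"},
        "@id": f"urn:example:image:{image_name}",
        "@type": "prov:Entity",
        "prov:wasAttributedTo": {"@id": f"urn:example:agent:{creator}"},
        "prov:wasDerivedFrom": {"@id": f"urn:example:image:{base_image}"},
    }


def to_label(metadata):
    """Serialize metadata to one compact string, as a label value must be."""
    return {LABEL_KEY: json.dumps(metadata, sort_keys=True, separators=(",", ":"))}


def read_label(labels):
    """Recover the metadata from a label mapping without running anything."""
    return json.loads(labels[LABEL_KEY])


labels = to_label(build_provenance("astro-sim", "alice", "python-base"))
restored = read_label(labels)
print(restored["prov:wasDerivedFrom"]["@id"])
```

Keeping the whole record in one label means a repository crawler only has to read image metadata to index it, which is the discovery property the slide points to.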

Smart Containers IN PRACTICE: Building a new container
1. Use Smart Container command line tool* (replaces Docker command line tool)
2. Provenance is captured automatically
3. Customize by adding any additional metadata you want (or don’t and go with the default!)
*Smart Containers can also be used in an infrastructure through an API

Smart Containers IN PRACTICE: Searching for a container
1. You: execute a search
2. Machine: searches knowledge graph of available containers and returns matches
3. You: select the one you want (“I’d like this one!”)
4. Machine: identifies dependencies, pulls together any additional containers you need and runs your selection
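The search-then-resolve flow on this slide can be sketched as follows. This is a toy stand-in, assuming an in-memory catalog with keyword sets and a `requires` list per container; the catalog contents, field names, and scoring rule are invented for illustration and do not describe the actual discovery engine.

```python
# Hypothetical in-memory catalog standing in for the knowledge graph.
CATALOG = {
    "notebook-dwarf-star": {
        "keywords": {"astrophysics", "simulation", "dwarf", "star"},
        "requires": ["jupyter-runtime"],
    },
    "jupyter-runtime": {"keywords": {"notebook", "runtime"}, "requires": ["python-base"]},
    "python-base": {"keywords": {"python"}, "requires": []},
    "genome-aligner": {"keywords": {"biology", "alignment"}, "requires": ["python-base"]},
}


def search(catalog, query):
    """Step 2: rank containers by how many query words match their keywords."""
    terms = set(query.lower().split())
    scored = [(len(terms & entry["keywords"]), name) for name, entry in catalog.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


def resolve(catalog, name, seen=None, order=None):
    """Step 4: list a container and its dependencies, dependencies first."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if name in seen:  # already handled; also guards against cycles
        return order
    seen.add(name)
    for dep in catalog[name]["requires"]:
        resolve(catalog, dep, seen, order)
    order.append(name)
    return order


matches = search(CATALOG, "astrophysics simulation of a dwarf star dying")
start_order = resolve(CATALOG, matches[0])  # step 3: the user picks the first match
print(matches, start_order)
```

The depth-first walk returns dependencies before the container that needs them, so the list can be used directly as a start-up order.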

Smart Containers IN PRACTICE: Feature Summary
- Adds a machine-readable metadata label to the tar.gz that can be read without opening or running the container
- Discovery Engine finds dependencies and retrieves them; you just run the Smart Container
- Can move your code to the data, or the data to your code
- Key access: can put collaborators “on the list” to access your container if public sharing is not appropriate (e.g. when HIPAA data is involved)

Smart Containers IN PRACTICE: Example
Search: astrophysics simulation of a dwarf star dying
Results (2):
- executable notebook, with results A
- executable notebook, blank
1. Select and fire notebook with results
2. Find and run container with notebook software
3. Send notebook to user’s container
4. Send webpage to container’s web interface
5. Make changes and work with notebook
6. Capture provenance and send back to web service as “executable notebook, with results B”
New Search: astrophysics simulation of a dwarf star dying
Results (3):
- executable notebook, with results A
- executable notebook, with results B
- executable notebook, blank
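The effect of this example, where a saved run shows up as a new search result, can be sketched with a small registry. The sketch assumes each catalog entry records a topic and an optional `derived_from` link; the field names and functions are hypothetical and only mirror the behavior described on the slide.

```python
TOPIC = "astrophysics simulation of a dwarf star dying"

# Hypothetical registry entries matching the two initial results.
catalog = [
    {"title": "executable notebook, with results A", "topic": TOPIC, "derived_from": None},
    {"title": "executable notebook, blank", "topic": TOPIC, "derived_from": None},
]


def search(entries, topic):
    """Return the titles of all entries registered under a topic."""
    return sorted(e["title"] for e in entries if e["topic"] == topic)


def save_run(entries, source_title, new_title):
    """Register a modified notebook and record which entry it came from."""
    source = next(e for e in entries if e["title"] == source_title)
    derived = {"title": new_title, "topic": source["topic"], "derived_from": source_title}
    entries.append(derived)
    return derived


before = search(catalog, TOPIC)
saved = save_run(catalog, "executable notebook, with results A",
                 "executable notebook, with results B")
after = search(catalog, TOPIC)
print(len(before), len(after), saved["derived_from"])
```

The `derived_from` field is the captured provenance: results B can always be traced back to results A.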

Smart Containers IN PRACTICE: Beyond the container
- Smart Container knowledge graph (KG) links out to other knowledge graphs
- Combine data in many different locations, including local data and Wikidata
- Enables aggregate searches and federated queries
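To show why linking out matters, the sketch below answers a question that neither graph can answer alone. It merges two small triple sets in memory as a stand-in for a federated query; a real federated query would send sub-queries to remote endpoints instead of copying their data. All `ex:` and `ext:` identifiers are hypothetical and are not real Wikidata IDs.

```python
def match(graph, s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


# Local container graph: knows what each container is about.
local = {
    ("ex:container-42", "ex:about", "ext:WhiteDwarf"),
    ("ex:container-7", "ex:about", "ext:Protein"),
}
# External graph: knows how topics relate to broader classes.
external = {
    ("ext:WhiteDwarf", "ext:subclassOf", "ext:Star"),
    ("ext:Protein", "ext:subclassOf", "ext:Molecule"),
}


def containers_about(kind, *graphs):
    """Find containers whose topic is a subclass of `kind` in any graph."""
    merged = set().union(*graphs)
    topics = {s for s, _, _ in match(merged, p="ext:subclassOf", o=kind)}
    return sorted(s for s, _, o in match(merged, p="ex:about") if o in topics)


print(containers_about("ext:Star", local, external))
```

Querying the local graph alone returns nothing for "ext:Star", because the class hierarchy lives only in the external graph.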

Thank You