Having Your Cake and Eating It Too With Apache OODT and Apache Solr Andrew F. Hart Paul M. Ramirez.


About Myself… Software Engineer –NASA Jet Propulsion Laboratory –“Data Management” Committer: –OODT, SIS, Gora, Streams (Incubating) Mentor: Streams (Incubating)

What We’ll Cover Overview of OODT & Solr Projects Strategies for Combining OODT and Solr Detailed Deployment/Config. Example Where to Learn More & Participate

Apache OODT Object Oriented Data Technology Origin in NASA mission data systems Components for –Information integration –Data cataloging and archiving –Configurable workflow processing

Apache OODT Apache –Incubation: 2010, Graduation: 2011 –29 Committers –Latest Release: 0.5 (Dec. 26, 2012)

Apache OODT Karoo Array Telescope (KAT-7)

Apache OODT Virtual Pediatric Intensive Care Unit

Apache OODT Regional Climate Model Evaluation System

Apache OODT Commonalities between systems –Lots of data –Defined processing steps / algorithms Archives important (… search important)

Apache OODT Strengths of OODT for the above use cases –Loosely coupled components –Standard protocols, well-defined interfaces –Highly configurable –Vetted, reliable code

Apache Solr Search + Web Services –Powerful features –Flexible formats –Highly configurable

Apache Solr The White House

Apache Solr Netflix

Apache Solr NASA Planetary Data System

OODT & Solr Why use these projects together? Archives often need search capability Similarities / Compatibilities –XML-based configuration –Environment (Java, Tomcat)

Example Integration “Standard” Data Archive Pipeline

Example Integration “Standard” Data Archive Pipeline + Search

OODT Products Typically 1-1 with Files Each uniquely identifiable (GUID) Support for higher-level “ProductType” –A way to define collections

OODT Metadata Annotations for products Key:{Val|Multival} Common across all OODT components Two general classes: –System –User

OODT Metadata System Metadata –Added automatically by OODT Components –Used to track state –Used to encode relationships between data

OODT Metadata User Metadata –Specified as “policy” –Can be product-level, or productType-level –Used to extract & persist information from files as they are ingested (become products)
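
The Key:{Val|Multival} model described on these slides can be pictured with a small sketch. This is an illustration in Python, not OODT's actual Java metadata class: every key maps to a list of values, so single-valued and multi-valued keys are handled the same way. The class name, method names, and the key names `ProductId` and `Instrument` are hypothetical.

```python
from collections import defaultdict


class ProductMetadata:
    """Toy multi-valued metadata container (illustrative, not the OODT API)."""

    def __init__(self):
        # Every key maps to a list, so single and multi-valued keys look alike.
        self._values = defaultdict(list)

    def add(self, key, value):
        """Append one value to a key, keeping any values already present."""
        self._values[key].append(str(value))

    def get_all(self, key):
        """Return every value for a key (empty list if the key is absent)."""
        return list(self._values.get(key, []))

    def get(self, key):
        """Return the first value for a key, or None if the key is absent."""
        values = self._values.get(key)
        return values[0] if values else None

    def keys(self):
        """Return all metadata keys in sorted order."""
        return sorted(self._values)


# Hypothetical example: one single-valued key and one multi-valued key.
met = ProductMetadata()
met.add("ProductId", "abc-123")
met.add("Instrument", "camera-a")
met.add("Instrument", "camera-b")
```

Storing a list for every key avoids a special case for multi-valued fields when the metadata is later mapped to search-index fields.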

OODT Metadata Metadata (Policy) Example (external)

Solr Schema XML document Define what will be indexed (“Fields”) Provide high-level context hints –Data type, behavior, pre-processing Extremely flexible, extensible

Solr Schema Solr Schema Example (external)

Making the Connection SolrIndexer Tool –Part of the File Manager component tools –Map OODT Metadata to Solr Fields –Create Solr documents from OODT products –Note: only talking about metadata
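
What "map OODT Metadata to Solr Fields" amounts to can be sketched in a few lines. The Python below is not the SolrIndexer's code; it only shows the idea of turning one product's metadata into one Solr-style document. The function name, the metadata keys, the Solr field names, and the use of `id` as the unique key are all assumptions made for the example.

```python
def to_solr_document(product_id, metadata, field_map):
    """Build a Solr-style document (a plain dict) from product metadata.

    metadata:  dict of metadata key -> list of string values
    field_map: dict of metadata key -> Solr field name (hypothetical mapping)
    """
    doc = {"id": product_id}
    for met_key, solr_field in field_map.items():
        values = metadata.get(met_key, [])
        if not values:
            continue  # mapped keys with no values are left out of the document
        # A single value becomes a scalar; several values stay a list.
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc


# Hypothetical metadata for one product.
metadata = {
    "ProductName": ["scene_001.dat"],
    "Instrument": ["camera-a", "camera-b"],
    "InternalState": ["RECEIVED"],  # not in the mapping, so not indexed
}
field_map = {"ProductName": "name", "Instrument": "instrument"}
doc = to_solr_document("abc-123", metadata, field_map)
```

Only metadata is copied into the document, which matches the note on the slide: the product files themselves are not indexed.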

SolrIndexer Tool org.apache.oodt.cas.filemgr.tools Available since the 0.4 release; 0.5+ is recommended because it adds stability improvements Several modes of operation

SolrIndexer Tool

Invocation Examples: Ingest all products from the specified File Manager instance
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --all \
         --fmUrl \
         --solrUrl

SolrIndexerTool Invocation Examples: Ingest all products from the specified ProductType(s)
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --types urn:some:ProductType \
         --fmUrl \
         --solrUrl

SolrIndexerTool Invocation Examples: Ingest a single product by its unique product id
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --product 19bcb4b e1-b581-8b d \
         [--delete] \
         --fmUrl \
         --solrUrl

SolrIndexerTool Invocation Examples: Force optimization of the Solr index
    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
         -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
         org.apache.oodt.cas.filemgr.tools.SolrIndexer \
         --optimize --solrUrl
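
When the indexer is called from a script, the four invocation modes above can be assembled by one small helper. The sketch below is in Python and only builds the argument list; the flags and class name are the ones shown on the slides, while the function name, its parameters, and the two URLs are placeholders chosen for the example. It assumes a single product type per run, since the slides do not show how several types are passed.

```python
def solr_indexer_command(config_path, lib_dir, solr_url, fm_url=None,
                         all_products=False, product_type=None,
                         product_id=None, delete=False, optimize=False):
    """Return the argument list for one SolrIndexer run (one mode per call)."""
    cmd = [
        "java",
        "-DSOLR_INDEXER_CONFIG=" + config_path,
        "-Djava.ext.dirs=" + lib_dir,
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
    ]
    if optimize:
        cmd.append("--optimize")
    elif all_products:
        cmd.append("--all")
    elif product_type:
        cmd += ["--types", product_type]
    elif product_id:
        cmd += ["--product", product_id]
        if delete:
            cmd.append("--delete")
    else:
        raise ValueError("choose one mode: all, type, product, or optimize")
    # The optimize example on the slides passes only --solrUrl.
    if fm_url and not optimize:
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd


# Placeholder URLs; substitute the real File Manager and Solr endpoints.
cmd = solr_indexer_command(
    "/path/to/indexer.properties",
    "/path/to/cas/filemgr/lib/",
    "http://localhost:8080/solr",
    fm_url="http://localhost:9000",
    all_products=True,
)
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths and URLs.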

indexer.properties Configuration file for the SolrIndexer Specifies the mapping between OODT product metadata and Solr fields Additional “pre-processing” features

indexer.properties Example indexer.properties file (external)

Use Case I Building a searchable data archive “Long-term” / “Lights-out” archive Products & metadata immutable Many NASA mission data systems use this model Want to make it easily searchable

Use Case I “Standard” Data Archive Pipeline + Search

Use Case II Building an interactively editable, searchable data archive Data and metadata mutable Want to dynamically select product(s) to edit based on metadata

Use Case II Interactively Editable Data Archive Pipeline + Search

Use Case II Interactively Editable Data Archive Pipeline + Search Solr catalog out of sync!

Synchronization Two ways (at least) to solve this: A. Modify the OODT Curator Services B. Treat OODT Curator Services as “black box” and write “wrapper” service to invoke Curator Services AND update Solr (via scripted call to SolrIndexer, for example)

Modify Curator Services Services implemented in JAX-RS /curator/src/main/java/org/apache/oodt/cas/curation/service [curator_url]/services/metadata/update Options: –Utilize Solr Java API –Wrap call to OODT SolrIndexer tool

Use Case II-A Modified Curator Services to Simultaneously update Solr

Example Interactive event tagging

Wrap Curator Services Curator Service/API is “black box” Develop custom service that: –Issues POST request to Curator service –Updates Solr index via, e.g.: Utilize Solr Java API Wrap call to OODT SolrIndexer tool
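
The "black box" wrapper can be sketched as one function that performs both steps in order. The Python below is an illustration only, and it swaps in Solr's HTTP JSON update interface in place of the Solr Java API named on the slide. The endpoint path `/services/metadata/update` comes from the slides; the form parameter names sent to the Curator, the use of `id` as the Solr unique key, the one-to-one naming of metadata keys and Solr fields, and the `/update` JSON handler with atomic `set` updates (which needs a Solr version that supports atomic updates and stored fields) are assumptions.

```python
import json
import urllib.parse
import urllib.request


def http_post(url, body, content_type):
    """Send a POST request and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def update_and_reindex(curator_url, solr_url, product_id, changes,
                       post=http_post):
    """Update metadata through the Curator, then mirror the change in Solr.

    changes: dict of metadata key -> new value (key names are hypothetical).
    post:    callable used for HTTP, injectable so the flow can be tested.
    """
    # Step 1: the Curator stays a black box; we only POST to its endpoint.
    form = urllib.parse.urlencode({"id": product_id, **changes}).encode("utf-8")
    status = post(curator_url + "/services/metadata/update", form,
                  "application/x-www-form-urlencoded")
    if status != 200:
        raise RuntimeError("Curator update failed with HTTP %d" % status)

    # Step 2: send a Solr atomic update so only the changed fields are touched.
    doc = {"id": product_id}
    for key, value in changes.items():
        doc[key] = {"set": value}
    payload = json.dumps([doc]).encode("utf-8")
    return post(solr_url + "/update?commit=true", payload, "application/json")
```

Solr is updated only after the Curator call succeeds, so a failed edit does not leave the index ahead of the catalog. If the Solr call itself fails, the index falls behind again, and a scripted SolrIndexer run for that product is one way to repair it.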

Use Case II-B Wrapping OODT Curation Services with Custom UI & Services

Example

Lessons Solr complements OODT File Manager RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI “Best” approach depends on situation

Next Steps Develop “SolrCatalog” for OODT File Manager? –Pros: Reduction in “moving parts” –Cons: Restrictive? Implement Use Case II-A as optional mode for Curator web service layer

Learning More Solr – OODT – https://cwiki.apache.org/confluence/display/OODT/Home

Thanks! Questions?