Rya Working Group: Back-end persistence methods for Rya

Goal
Quick overview of the different persistence methods that exist
– Accumulo
    Rya: “Rdf” y “Accumulo”
– MongoDB
    Still in its infancy
– Future?
    Embedded Rya
    Rya on AWS
Why branch out beyond Accumulo?
– Rya could serve as a NoSQL Sail interface
– A lot of the work in developing Accumulo-backed Rya would also apply to other NoSQL datastores

Accumulo Backed Rya
Rya’s sweet spot
General indexing strategy
– 3 tables
– A couple of secondary index options
    Pre-computed joins (PCJs)
    Geo, Free Text, Temporal indexing
Numerous features:
– Better load balancing on ingest (prepend hash)
– Query optimizations
    PCJs, Join selectivity, Coarse query optimization based off of cardinalities

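A rough sketch of the "3 tables" and "prepend hash" ideas, for discussion only. It assumes the three tables hold the SPO, POS, and OSP permutations of each triple (the slide only says "3 tables"). The delimiter, the bucket count, and the function names `index_keys`, `choose_table`, and `salted` are hypothetical and not taken from the Rya code base.

```python
import hashlib

DELIM = "\x00"  # hypothetical field separator inside a row key


def index_keys(s, p, o):
    """Build one row key per index table for a single triple."""
    return {
        "spo": DELIM.join((s, p, o)),
        "pos": DELIM.join((p, o, s)),
        "osp": DELIM.join((o, s, p)),
    }


def choose_table(s=None, p=None, o=None):
    """Pick the table whose key order turns the bound terms into a prefix."""
    if s is not None and (p is not None or o is None):
        return "spo"  # patterns: s??, sp?, spo
    if p is not None:
        return "pos"  # patterns: ?p?, ?po
    if o is not None:
        return "osp"  # patterns: ??o, s?o
    return "spo"      # nothing bound: full scan of any table


def salted(row_key, buckets=16):
    """Prepend a hash-derived bucket so sequential keys spread over tablets."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02x}{DELIM}{row_key}"


if __name__ == "__main__":
    keys = index_keys("urn:alice", "urn:knows", "urn:bob")
    for table, key in keys.items():
        print(table, repr(salted(key)))
    print(choose_table(p="urn:knows", o="urn:bob"))
```

One trade-off worth noting: because the bucket here is derived from the whole key, a prefix scan can no longer compute it and would have to fan out across every bucket. That is the price of the smoother ingest distribution in this sketch.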
Mongo Backed Rya
Relatively new
– Only since Rya 3.9
General indexing strategy
– Triple stored as a single document
– Composite indices
    SPO, POS, OSP
– Support for some other indices
    Free Text, Geo
    Working on temporal
– Other indices are stored in same documents as original triples
Not too many features
– Ingest performance not well understood
– No support for any query optimizations
    May be a good thing for GSoC

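To make the Mongo layout concrete, here is a minimal sketch of "one document per triple" with the three composite indices, and with secondary-index data kept in the same document. The field names (`subject`, `predicate`, `object`, `geo`, `text`) and the helper `triple_document` are assumptions for illustration, not the actual Rya schema. The index specs are plain lists of (field, direction) pairs, which is the shape pymongo's `create_index` accepts.

```python
def triple_document(s, p, o, geo=None, text=None):
    """Build the single document that represents one triple.

    Optional secondary-index fields live in the same document as the
    triple itself, as the slide describes.
    """
    doc = {"subject": s, "predicate": p, "object": o}
    if geo is not None:
        doc["geo"] = geo    # hypothetical field for Geo indexing
    if text is not None:
        doc["text"] = text  # hypothetical field for Free Text indexing
    return doc


# Composite index specs: (field, direction) pairs, 1 = ascending.
COMPOSITE_INDEXES = {
    "spo": [("subject", 1), ("predicate", 1), ("object", 1)],
    "pos": [("predicate", 1), ("object", 1), ("subject", 1)],
    "osp": [("object", 1), ("subject", 1), ("predicate", 1)],
}


if __name__ == "__main__":
    doc = triple_document(
        "urn:alice", "urn:livesIn", "urn:paris",
        geo={"type": "Point", "coordinates": [2.35, 48.85]},
    )
    print(doc)
    for name, spec in COMPOSITE_INDEXES.items():
        print(name, spec)
```

With a live collection, each spec could be passed to `collection.create_index(spec)`; that step is left out so the sketch runs without a MongoDB server.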
Backend Agnostic Features
Inference
REST service
Anything else?

Future?
Other backend options
– Embedded Rya
    Have thought about including LevelDB as a backend option
– AWS integration
    Could include support for Dynamo as a backend option
More features for existing Rya backends
– Improve MongoDB support
    Query optimizations
        PCJ support, Cardinality estimation?
    Support for more indexing options
        Free Text lags well behind the Accumulo implementation
        Temporal indexing
        Beef up Geo-Indexing support
    Maybe start pursuing ingest optimizations
Thoughts? Ideas?
– Let’s capture this in Jira!