Registry Replication
Registry calls are forwarded by a registry service to a single registry instance (i.e. replica) per VDB. If a replica cannot be contacted or a better alternative becomes available, the registry service will switch (not the producer/consumer). Information in the old replica (whether it’s still working, or fails and recovers) will go stale, which causes two problems:
– It may send erroneous removeProducer messages to consumers (because it’s no longer receiving showResourceSignOfLife messages for the producer).
– It may replicate out-of-date entries to other replicas, reinserting producer/consumer entries that have actually been deleted elsewhere.
The first can be solved by treating addProducer/removeProducer messages as notifications to the consumer: the consumer must decide for itself whether or not to act on them. The second can be solved by adding sequence numbers generated by the producers and consumers to each registry update call.
SF, RB, JW

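The sequence-number idea for the second problem could look roughly like the sketch below. It is an illustration only, not the R-GMA design: the class `RegistryReplica`, the method `apply_update`, the resource identifiers, and the use of tombstones to remember deletions are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    seq: int                 # sequence number supplied by the producer/consumer
    details: Optional[dict]  # None marks a deletion (tombstone, an assumption here)


class RegistryReplica:
    """Hypothetical replica that ignores updates older than what it already holds."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def apply_update(self, resource_id: str, seq: int, details: Optional[dict]) -> bool:
        """Apply a direct or replicated update; return False if it is stale."""
        current = self.entries.get(resource_id)
        if current is not None and seq <= current.seq:
            return False  # out-of-date entry replayed by a stale replica: drop it
        self.entries[resource_id] = Entry(seq, details)
        return True

    def is_registered(self, resource_id: str) -> bool:
        entry = self.entries.get(resource_id)
        return entry is not None and entry.details is not None


replica = RegistryReplica()
replica.apply_update("producer-1", 1, {"table": 7})  # registration
replica.apply_update("producer-1", 2, None)          # deletion
# A stale replica replays the old registration; it must not reinsert the entry.
stale_applied = replica.apply_update("producer-1", 1, {"table": 7})
```

Because the deletion is kept as an entry with a higher sequence number, the replayed registration is rejected rather than reinserted. How long such tombstones would need to be kept is left open here.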
Schema Replication
So that table names can be re-used, use table numbers instead of table names in system calls where mutual agreement on the table definition is required. The user only sees names (except in one or two direct calls to the registry and schema). Replace SELECT strings with parsed select statements.
Numbers are assigned using the “majority voting” algorithm outlined before.
The producer stores the table definition for each table number it is publishing (to check inserts), so it is immune to schema changes.
(We would like to use the registry replication mechanism for the schema too, but it is still not clear how to handle recovering schemas and how to ensure deletions are propagated to all replicas.)
JW, SH

External tuple stores
User will be able to specify a logical name for the tuple store when they create a producer. In secure installations, it will be prefixed with the user’s DN, so it only has to be unique to them; in insecure systems (single namespace?)???
Stores can only be re-used by the user that created them, and only if the predicate (and current table definition) matches; only one producer per store.
User can query R-GMA for the real name of the database (it is up to the sysadmin whether or not they are then granted access).
Still need some way of cleaning up unwanted stores. Not too many new operations please! (JW)
SF, AW

Chunking
Consumer service sends execute() and abort() calls to the producer for all types of query (so these replace startStreaming() and stopStreaming() too).
One-time and continuous queries both stream results back from producer service to consumer service (how it does this is a design issue; it doesn’t affect the interface).
For ODPs I suggest the user code must implement start(), pop(maxCount) and abort(), and the ODP service will call these to retrieve the tuples in chunks and stream them to the consumer.
JW, SH

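The suggested ODP contract of start(), pop(maxCount) and abort() might be sketched like this. The class `RangeODP`, the function `stream_query`, the `send_chunk` callback, and the convention that an empty chunk means "no more tuples" are assumptions for illustration; the slide does not define how the end of the data is signalled.

```python
from typing import Callable, List, Tuple

Row = Tuple  # one tuple of the result set


class RangeODP:
    """Hypothetical ODP user code that publishes the integers 0..n-1."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.next = 0
        self.aborted = False

    def start(self) -> None:
        self.next = 0
        self.aborted = False

    def pop(self, max_count: int) -> List[Row]:
        """Return up to max_count tuples; an empty list means no more data (assumed)."""
        if self.aborted:
            return []
        end = min(self.next + max_count, self.n)
        chunk = [(i,) for i in range(self.next, end)]
        self.next = end
        return chunk

    def abort(self) -> None:
        self.aborted = True


def stream_query(odp, send_chunk: Callable[[List[Row]], None], max_count: int = 3) -> None:
    """Service-side loop: pull chunks from the user code and forward each one."""
    odp.start()
    while True:
        chunk = odp.pop(max_count)
        if not chunk:
            break
        send_chunk(chunk)


received: List[List[Row]] = []
stream_query(RangeODP(7), received.append)
```

Here `received` ends up holding three chunks of sizes 3, 3 and 1. A real service would call abort() on the user code when the consumer service sends abort().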
Security
See chapter 10 in the current spec.
SH, MC

Mediation 1/2
Move mediation into a separate module in the Consumer Service for now, i.e. remove the getPlansForQuery call from the registry, amend getProducersForQuery to return whatever registry details the mediator will require, and change registerContinuousQuery to just register and not return plans.
Can we make decisions now on changes to the current spec (ch 9)?
– Changes to definition of producer predicate (see next slide)
– Changes to definition of simple/complex query (just add union?)
– Whether or not Secondary Producers are picked for continuous queries (no?)
– Any changes to producer-selection rules for query types/query plans? (no?)
– Any changes to when warnings are applied? (no?)
MC, AW, JW, RB

Mediation 2/2 (producer predicates)
WHERE (col1 op11 value11 AND col2 op21 value21 AND ...)
   OR (col1 op12 value12 AND col2 op22 value22 AND ...)
   OR ...
'op' may be any one of =, >, >=, < and <= (as in a simple query); OR is allowed provided the query is translated into the above form (disjunctive normal form).
String ranges are also allowed (e.g. column > 'A').
They also have proposals to allow ‘column IN (value1, value2, ...)’; however, they estimate the registry database would be twice as complex to support this. I don't think it's worth it at this stage. In any case I don't see why it couldn't be translated into a statement involving ORs which can be stored in the existing proposed database (MC).
MC, AW, JW, RB

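MC's point that an IN term can be rewritten with ORs can be illustrated with a short sketch. The tuple representation of terms, the helper names `expand_in` and `to_sql`, and the column names `site` and `load` are hypothetical; `to_sql` uses Python's `repr` purely for display and is not a safe SQL quoting routine.

```python
from itertools import product


def expand_in(conjunction):
    """Rewrite one AND-group that may contain IN terms as a list of AND-groups.

    Each term is (column, op, value); for op == "IN" the value is a list.
    The returned groups are meant to be ORed together (disjunctive normal form).
    """
    choices = []
    for column, op, value in conjunction:
        if op == "IN":
            # column IN (a, b) becomes the alternatives column = a, column = b
            choices.append([(column, "=", v) for v in value])
        else:
            choices.append([(column, op, value)])
    # One AND-group per combination of alternatives.
    return [list(combo) for combo in product(*choices)]


def to_sql(disjuncts) -> str:
    """Render the OR of AND-groups in the form shown on the slide."""
    groups = []
    for group in disjuncts:
        groups.append("(" + " AND ".join(f"{c} {op} {v!r}" for c, op, v in group) + ")")
    return " OR ".join(groups)


predicate = [("site", "IN", ["RAL", "CERN"]), ("load", ">", 5)]
dnf = expand_in(predicate)
where_clause = "WHERE " + to_sql(dnf)
# where_clause == "WHERE (site = 'RAL' AND load > 5) OR (site = 'CERN' AND load > 5)"
```

The number of AND-groups is the product of the IN list lengths, so long IN lists on several columns would grow the stored predicate quickly.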
Run-time monitoring / config.
Don’t know yet.
SH, MC

Time-stamps
R-GMA TIMESTAMP type is represented as an ISO8601-compatible string, YYYY-MM-DDTHH:mm:ss.sZ, where “.s” means 0-9 decimal places, on INSERT and in a result set. Always UTC and no abbreviations allowed.
Declared with precision, e.g. TIMESTAMP(5), in CREATE TABLE and stored in the schema. Default precision is zero.
Producer database MUST support at least the requested precision or fail the declareTable.
The requested precision must be stored in the registry so that mediation can ensure that a secondary producer does not lose precision. If a secondary producer’s precision is less than that of one of the primary producers from which it would otherwise consume, it will just ignore this primary producer. (Sounds like bunk to me: can’t the Secondary Producer get the precision from the schema and fail the declareTable just like a primary producer? JW)
People should not request more precision than they need.
SF, AW

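A minimal sketch of producing this string form for a given declared precision follows. Several details are assumptions not stated on the slide: the input is taken as integer nanoseconds since the Unix epoch, extra digits are truncated rather than rounded, and the decimal point is omitted when the precision is zero. The function names are hypothetical.

```python
from datetime import datetime, timezone


def format_timestamp(epoch_ns: int, precision: int = 0) -> str:
    """Render nanoseconds since the Unix epoch as YYYY-MM-DDTHH:mm:ss[.s]Z in UTC."""
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    seconds, nanos = divmod(epoch_ns, 1_000_000_000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if precision == 0:
        return base + "Z"  # assumed: no decimal point at precision zero
    fraction = f"{nanos:09d}"[:precision]  # truncate, never round up
    return f"{base}.{fraction}Z"


def loses_precision(secondary_precision: int, primary_precision: int) -> bool:
    """True if a secondary producer would drop digits published by a primary producer."""
    return secondary_precision < primary_precision


# 1,000,000,000 seconds after the epoch plus 123,456,789 ns, as TIMESTAMP(5).
example = format_timestamp(1_000_000_000 * 1_000_000_000 + 123_456_789, 5)
# example == "2001-09-09T01:46:40.12345Z"
```

`loses_precision` is the comparison the mediator (or, per JW's comment, the secondary producer's declareTable) would need in order to detect a mismatch.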
Data integrity 1/2
Q: I think R-GMA must make some statement about preservation of data values through R-GMA for each data type, e.g. string/integer values are guaranteed not to change, timestamps/real/floats might degrade by up to... (or "best endeavours").
A: Best endeavours (hmmm JW)
Q: But... do we remove quotes from strings (I don't think we quote strings in the XML ResultSet) and do we resolve embedded quotes that have been doubled up?
A: We should do them correctly, i.e. escape or double quotes as needed and then remove them again to ensure that they are transmitted unchanged.
SF, AW

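The "double quotes as needed and then remove them again" answer amounts to a round trip like the one sketched below. The helper names are hypothetical, and the sketch covers only SQL-style doubling of single quotes, not XML escaping.

```python
def quote_string(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def unquote_string(literal: str) -> str:
    """Reverse quote_string: strip the outer quotes and undouble embedded ones."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted string literal")
    return literal[1:-1].replace("''", "'")


original = "O'Brien's node"
literal = quote_string(original)       # 'O''Brien''s node'
round_trip = unquote_string(literal)   # identical to original
```

The property that matters for data integrity is that `unquote_string(quote_string(s)) == s` for every string `s`, including strings that already contain doubled quotes.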
Data integrity 2/2
Q: How do we represent floating point numbers in XML result sets (i.e. do we ever use scientific notation etc.)?
A: Use the normal 1.234E-05 etc., i.e. E, an optional minus sign and up to 3 digits. No spaces inside the number.
Q: How do we represent values that don't fit into one of our types? I would suggest using a VARCHAR.
A: This can only happen for derived types where the database type has no mapping onto an R-GMA type. I guess the string representation is the best we can do.
SF, AW

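One way to produce and check this floating point form is sketched below. The function names and the regular expression are assumptions. In particular, the slide does not say whether a plain decimal without an exponent is acceptable (the pattern here allows it), and special values such as infinity and NaN are not handled.

```python
import re

# Optional minus sign, digits with an optional fraction, then optionally
# E, an optional minus sign and up to 3 exponent digits. No spaces anywhere.
FLOAT_PATTERN = re.compile(r"-?\d+(\.\d+)?(E-?\d{1,3})?")


def format_float(value: float, digits: int = 16) -> str:
    """Format with `digits` decimals in the mantissa, e.g. 1.234E-05."""
    # Python writes positive exponents as E+03; the slide allows only a minus sign.
    return f"{value:.{digits}E}".replace("E+", "E")


def is_valid_float(text: str) -> bool:
    return FLOAT_PATTERN.fullmatch(text) is not None


short = format_float(1.234e-05, 3)  # "1.234E-05"
full = format_float(0.1)            # 17 significant digits, parses back to 0.1
```

The default of 16 decimals gives 17 significant digits, which is enough for a 64-bit float to survive the round trip through text unchanged; fewer digits trade exactness for readability.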
Popping data from Consumer
Keep hasAborted() as a consumer operation. Put an end-of-result-set flag on the result set. Deprecate isExecuting() (and count() and popAll()??). To check if the result is complete, call hasAborted() after the pop() loop.
Another thought (JW) – should we add an API method to all APIs to make it easy to glue result sets together?
RB, SH

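The proposed usage pattern (pop until the end-of-result-set flag, then call hasAborted()) might look like this from the user's side. `ResultSet`, `FakeConsumer`, `fetch_all` and the attribute `end_of_results` are hypothetical stand-ins so the loop can run without R-GMA; the sketch also assumes the flag is set once no more data will arrive, including after an abort.

```python
from typing import List, Tuple


class ResultSet:
    """Hypothetical result set carrying the proposed end-of-result-set flag."""

    def __init__(self, rows: List[Tuple], end_of_results: bool) -> None:
        self.rows = rows
        self.end_of_results = end_of_results


class FakeConsumer:
    """Stand-in for a consumer; hands out pre-built chunks of tuples."""

    def __init__(self, chunks: List[List[Tuple]], aborted: bool = False) -> None:
        self._chunks = list(chunks)
        self._aborted = aborted

    def pop(self, max_count: int) -> ResultSet:
        rows = self._chunks.pop(0) if self._chunks else []
        # The flag is set on the result set that empties the queue.
        return ResultSet(rows[:max_count], end_of_results=not self._chunks)

    def has_aborted(self) -> bool:
        return self._aborted


def fetch_all(consumer, max_count: int = 100):
    """Pop until the end-of-result-set flag is seen, then check for an abort."""
    rows = []
    while True:
        result = consumer.pop(max_count)
        rows.extend(result.rows)
        if result.end_of_results:
            break
    # The result is complete only if the query did not abort.
    return rows, not consumer.has_aborted()


rows, complete = fetch_all(FakeConsumer([[(1,), (2,)], [(3,)]]))
```

`fetch_all` also shows one form the "glue result sets together" helper could take: it concatenates the rows of successive result sets into a single list.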
Error Numbers
No progress.
JW, AD

Naming virtual databases
Don’t know yet…
AD, MC

A couple more questions
Don’t know yet…
AW, AD