HYPERLEDGER Fabric Pluggable/Queryable State Database

1 HYPERLEDGER Fabric Pluggable/Queryable State Database

2 Ledger v1
State Database: latest written key/values for use in transaction simulation (current v1). Acts as an 'index' of the blockchain for runtime queries.
Blockchain (File system): immutable source of truth, stored as blocks of transactions (Txn Reads[] Writes[]).
Transaction documents can be saved for historical reporting (proposed).

Key document:
    {
      "asset_name": "marble1",
      "owner": "jerry",
      "date": "9/6/2016",
      "version": "2:1"
    }

Transaction document:
    {
      "txId": "tx ",
      "transaction_height": "2:1",
      "function": "UpdateAsset('marble1','jerry')",
      "date": "9/6/2016",
      "rwset": {
        "reads": [
          {"key": "marble1", "version": "1:1"}
        ],
        "writes": [
          {"key": "marble1",
           "value": {
             "asset_name": "marble1",
             "owner": "jerry",
             "date": "9/6/2016"
           }
          }
        ]
      }
    }

3 State Database - Queryability
- In a key/value database such as RocksDB, the content is a blob and is only queryable by key.
  - Does not meet chaincode, auditing, and reporting requirements for many use cases.
- In a document database such as CouchDB, the content is JSON and fully queryable.
  - Meets a large percentage of chaincode, auditing, and simple reporting requirements.
  - For deeper reporting and analytics, replicate to an analytics engine such as Spark.
  - Compatible with the existing chaincode programming model; no changes are required if chaincode models key/value as JSON.
- SQL data stores are also possible, but require a more complicated relational transformation layer, as well as schema management.
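The difference between "queryable by key" and "queryable by content" can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the in-memory map stands in for the state database, and `keysByOwner` is a hypothetical helper, not part of any Fabric or CouchDB API. A real document database answers such a query from its own indexes rather than by scanning.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// keysByOwner scans every value, tries to decode it as JSON, and returns the
// keys whose "owner" field equals owner. Values that are opaque blobs cannot
// be inspected this way and remain reachable by key only.
func keysByOwner(state map[string][]byte, owner string) []string {
	var keys []string
	for k, raw := range state {
		var doc map[string]interface{}
		if err := json.Unmarshal(raw, &doc); err != nil {
			continue // not JSON: content is not queryable
		}
		if doc["owner"] == owner {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Hypothetical state: two JSON documents and one opaque blob.
	state := map[string][]byte{
		"marble1": []byte(`{"asset_name":"marble1","owner":"jerry"}`),
		"marble2": []byte(`{"asset_name":"marble2","owner":"tom"}`),
		"blob1":   {0x00, 0x01, 0x02},
	}
	fmt.Println(string(state["marble1"]))    // lookup by key works for any value
	fmt.Println(keysByOwner(state, "jerry")) // lookup by content needs JSON values
}
```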

4 v1 Transaction Lifecycle (with state database plugin)
Participants: SDK, Endorsing Peer, Ordering Service, Committing Peer.
1) SDK submits the proposal to the Endorsing Peer.
2) Endorsing Peer simulates the proposal in the peer:
   - Plugin queries State DB
   - RWSet is built in the peer
3) SDK submits the transaction with simulation results (Transaction Reads[] Writes[]) to the Ordering Service.
4) Committing Peer receives batches of transactions from the Ordering Service.
5) Committing Peer validates each transaction and commits:
   - Plugin queries State DB during the MVCC check
   - Plugin pushes the WriteSet to State DB
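Step 5 of the lifecycle above (MVCC check, then push the write set) can be sketched as follows. This is a minimal illustration, assuming string versions such as "1:1" and an in-memory map in place of the state database; the type and method names (`StateDB`, `RWSet`, `ValidateAndCommit`) are hypothetical and not taken from the Fabric code base.

```go
package main

import "fmt"

// Read records the version of a key as seen during simulation.
type Read struct{ Key, Version string }

// Write is a key/value pair produced by simulation.
type Write struct{ Key, Value string }

// RWSet is the simulation result carried by a transaction.
type RWSet struct {
	Reads  []Read
	Writes []Write
}

type versioned struct{ Value, Version string }

// StateDB is an in-memory stand-in for the pluggable state database.
type StateDB map[string]versioned

// ValidateAndCommit performs the MVCC check: every key in the read set must
// still be at the version observed during simulation. Only then is the write
// set pushed to the database under newVersion.
func (db StateDB) ValidateAndCommit(rw RWSet, newVersion string) bool {
	for _, r := range rw.Reads {
		if db[r.Key].Version != r.Version {
			return false // key changed since simulation: transaction is invalid
		}
	}
	for _, w := range rw.Writes {
		db[w.Key] = versioned{Value: w.Value, Version: newVersion}
	}
	return true
}

func main() {
	db := StateDB{"marble1": {Value: `{"owner":"tom"}`, Version: "1:1"}}
	rw := RWSet{
		Reads:  []Read{{Key: "marble1", Version: "1:1"}},
		Writes: []Write{{Key: "marble1", Value: `{"owner":"jerry"}`}},
	}
	fmt.Println(db.ValidateAndCommit(rw, "2:1")) // read version matches, writes applied
	fmt.Println(db.ValidateAndCommit(rw, "3:1")) // same read set is now stale, rejected
}
```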

5 Ledger Query - Options
Option 1 – Leverage investment of existing databases
- Keep Blockchain file storage.
- Leverage existing databases for state database/queries.
- Pluggable model to support various database engines.
Option 2 – Custom query engine
- Keep RocksDB embedded key/value state database.
- Build indexes and query engine on top of RocksDB.
Option 3 – Custom blockchain database
- Build a custom database optimized for a converged Blockchain/State Database.
- Single copy of data (blocks, transactions, keys, history… all natively linked and queryable).
From Option 1 to Option 3: less modular, more effort.

6 Ledger Query - Options
Option 1 – Leverage investment of existing databases (focus of this chartdeck)
- Keep Blockchain file storage.
- Leverage existing databases for state database/queries.
- Pluggable model to support various database engines.
Option 2 – Custom query engine
- Keep RocksDB embedded key/value state database.
- Build indexes and query engine on top of RocksDB.
Option 3 – Custom blockchain database
- Build a custom database optimized for a converged Blockchain/State Database.
- Single copy of data (blocks, transactions, keys, history… all natively linked and queryable).
From Option 1 to Option 3: less modular, more effort.

7 Pluggable state database – How delivered?
- Golang does not support dynamic link libraries. Therefore, the peer needs to be re-compiled to plug in a new database.
  - This is something a vendor/provider would likely do.
- Ledgernext code will be refactored so that it is obvious which interfaces need to be implemented.
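A compile-time plug-in of this kind usually comes down to one interface plus a single constructor that a vendor build swaps out. The sketch below shows the shape only; `StateDatabase`, its two methods, and `newStateDatabase` are hypothetical names chosen for illustration and are not the actual ledgernext interfaces.

```go
package main

import "fmt"

// StateDatabase is a hypothetical plug-in interface: the minimum a state
// database would need to offer for simulation reads and commit-time writes.
type StateDatabase interface {
	GetState(key string) (value []byte, version string, found bool)
	ApplyWrites(writes map[string][]byte, version string) error
}

// memDB is a trivial in-memory implementation used as the default here.
type memDB struct {
	values   map[string][]byte
	versions map[string]string
}

func newMemDB() *memDB {
	return &memDB{values: map[string][]byte{}, versions: map[string]string{}}
}

func (m *memDB) GetState(key string) ([]byte, string, bool) {
	v, ok := m.values[key]
	return v, m.versions[key], ok
}

func (m *memDB) ApplyWrites(writes map[string][]byte, version string) error {
	for k, v := range writes {
		m.values[k] = v
		m.versions[k] = version
	}
	return nil
}

// Compile-time check that memDB satisfies the interface.
var _ StateDatabase = (*memDB)(nil)

// newStateDatabase is the one place a re-compiled build would change in
// order to return a different implementation.
func newStateDatabase() StateDatabase { return newMemDB() }

func main() {
	db := newStateDatabase()
	_ = db.ApplyWrites(map[string][]byte{"marble1": []byte(`{"owner":"jerry"}`)}, "2:1")
	value, version, found := db.GetState("marble1")
	fmt.Println(string(value), version, found)
}
```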

8 Pluggable state database with query – How used?
- Provide a new SDK API for query execution (outside chaincode), e.g. lookups, reporting, auditing.
  - SDK API can be secured with an ACL.
  - Note: remove the 'query' chaincode API in order to discourage building reporting applications within chaincode.
- Within 'invoke' chaincode, no changes are required if normal GetState/PutState operations are utilized.
  - Golang has native struct JSON marshaling, e.g. Stub.PutState("marble1", marble1_json)
- Within 'invoke' chaincode, rich query can be used to identify keys (documents) to update. Two limitations to be aware of:
  - Cannot Query/Write and then Re-Query in the same transaction, since the simulation results are not in the DB yet.
  - The endorser/committer architecture cannot prevent phantom reads, therefore this solution can only be used by applications not sensitive to phantom reads.
    - A phantom read occurs when the result set at simulation time does not match the result set at commit time, due to in-flight transactions.
    - Example: Update/Transfer all assets owned by 'tom'. A new asset for 'tom' may arrive between the simulation and commit phases, and would be missed.
    - Cannot be solved unless we re-query at commit time and compare result sets.
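The "native struct JSON marshaling" point can be illustrated with the standard `encoding/json` package. This is a sketch under assumptions: `stateWriter` and `memStub` are hypothetical stand-ins for the chaincode stub so the example runs on its own, and the `Marble` fields simply mirror the marble documents shown in this deck.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Marble mirrors the JSON key document used in the slides.
type Marble struct {
	AssetName string `json:"asset_name"`
	Owner     string `json:"owner"`
	Date      string `json:"date"`
}

// stateWriter is a hypothetical subset of the chaincode stub.
type stateWriter interface {
	PutState(key string, value []byte) error
}

// memStub is an in-memory stand-in for the stub, for illustration only.
type memStub map[string][]byte

func (s memStub) PutState(key string, value []byte) error {
	s[key] = value
	return nil
}

// saveMarble marshals the struct to JSON and stores it under its asset name.
func saveMarble(stub stateWriter, m Marble) error {
	b, err := json.Marshal(m)
	if err != nil {
		return err
	}
	return stub.PutState(m.AssetName, b)
}

func main() {
	stub := memStub{}
	m := Marble{AssetName: "marble1", Owner: "jerry", Date: "9/6/2016"}
	if err := saveMarble(stub, m); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(stub["marble1"]))
}
```

Because the stored value is plain JSON, a document database behind the stub can index and query its fields without any change to the chaincode.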

9 CouchDB Example
- Use existing Marbles chaincode (uses JSON data model already).
- Ledgernext plug-in implementation for CouchDB. PutState() persists to CouchDB.

Key document:
    {
      "asset_name": "marble1",
      "color": "blue",
      "size": 35,
      "owner": "jerry",
      "version": "2:1"
    }

Scenario:
- Create marble owned by Tom
- Transfer marble to Jerry
- Query ledger using CouchDB 2.0 query language

Query marble1 current state:
    POST /marbles_app/_find
    {"selector":{"asset_name":"marble1"}}

Query for all Jerry's marbles:
    POST /marbles_app/_find
    {"selector":{"owner":"jerry"}}

Full set of query operators, filtering, and sorting also available as of CouchDB 2.0.

Query full history/provenance of marble1:
    POST /marbles_app/_find
    {
      "selector": {
        "rwset.writes": {
          "$elemMatch": {
            "value.asset_name": "marble1"
          }
        }
      }
    }

Query full history of Jerry's transactions:
    POST /marbles_app/_find
    {
      "selector": {
        "rwset.writes": {
          "$elemMatch": {
            "value.owner": "jerry"
          }
        }
      }
    }
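The selector documents on this slide are ordinary JSON, so a plug-in or client could build them with `encoding/json` rather than string concatenation. The helpers below are hypothetical and only produce the request body for the database's `_find` endpoint; sending it over HTTP is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// selectorByField builds a current-state query such as
// {"selector":{"owner":"jerry"}}.
func selectorByField(field, value string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"selector": map[string]interface{}{field: value},
	})
}

// historySelector builds a query over transaction documents that matches any
// write whose value has the given field set to the given value.
func historySelector(field, value string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"selector": map[string]interface{}{
			"rwset.writes": map[string]interface{}{
				"$elemMatch": map[string]interface{}{"value." + field: value},
			},
		},
	})
}

func main() {
	current, err := selectorByField("owner", "jerry")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	history, err := historySelector("asset_name", "marble1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(current))
	fmt.Println(string(history))
}
```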

10 Pluggable state database – More details

11 Pluggable state database - Objectives
- Enrich Query API for Blockchain
  - Leverage state-of-the-art databases to extend query capabilities against current state and transactions.
  - Both SQL and NoSQL flavors.
  - Ensure the API supports plugging in a different state database, for example by a vendor building on top of fabric.
- Query support opportunities
  - Current state
  - Historical point in time
  - Provenance
  - Geo-location
  - Text
- Speed delivery and quality by leveraging investments in existing database engines
  - Allow new R&D to focus on blockchain-specific capabilities.
  - Do not re-invent the wheel.
  - To the degree possible, embed the database and leverage its capabilities within fabric, rather than requiring DBA skills.

It would require much investment to invent our own database, query language, etc.
Enriched Query API without introducing yet another paradigm to application development. There is an already existing, well understood impedance mismatch between application and data store that NoSQL, JSON, and XML extensions already address. Why introduce yet another paradigm and its associated query language and administration?
Allow new R&D to focus on the added value proposition of blockchain (trust, privacy, and inter-organization shared ledger).
Most likely, only the state database would be pluggable, not the actual block storage that resides on the file system in ledgernext.

12 Pluggable state database - The Challenge
How to support the v1 endorsement/simulation model, when most databases do not support simulation result sets?
That is, how to make uncommitted updates to an arbitrary database and determine the ReadWriteSet that is required for endorsement and commit validation? Not possible with most databases…

Proposed Solution:
- Query the database for key values during simulation, using the database's rich query language.
- Perform simulation updates in a private workspace (peer memory) using normal chaincode APIs, e.g. PutState().
- Get the ReadWriteSet from endorser simulation (Reads come from DB queries, Writes come from the simulation private workspace).
- Endorsement, Consensus, and Validation use the transaction ReadWriteSet simulation results as normal.
- Push Writes to the database during the Commit phase.
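The proposed solution can be sketched as a small simulator: reads go to the committed database and are recorded with their versions, while writes are buffered in a private workspace and never touch the database. All names here (`Simulator`, `versioned`, `RWSet`) are hypothetical, and an in-memory map stands in for the real state database.

```go
package main

import (
	"fmt"
	"sort"
)

type versioned struct{ Value, Version string }

// Read and Write are the entries of the resulting ReadWriteSet.
type Read struct{ Key, Version string }
type Write struct{ Key, Value string }

// Simulator reads from the committed state and buffers writes in memory.
type Simulator struct {
	db     map[string]versioned // committed state, never modified here
	reads  map[string]string    // key -> version observed
	writes map[string]string    // key -> simulated new value
}

func NewSimulator(db map[string]versioned) *Simulator {
	return &Simulator{db: db, reads: map[string]string{}, writes: map[string]string{}}
}

// GetState always returns the committed value. Earlier PutState calls in the
// same simulation are not visible, which is the "cannot write and then
// re-read" limitation of this approach.
func (s *Simulator) GetState(key string) string {
	v := s.db[key]
	s.reads[key] = v.Version
	return v.Value
}

// PutState records the write in the private workspace only.
func (s *Simulator) PutState(key, value string) { s.writes[key] = value }

// RWSet returns the recorded reads and writes, sorted by key.
func (s *Simulator) RWSet() ([]Read, []Write) {
	var reads []Read
	for k, ver := range s.reads {
		reads = append(reads, Read{Key: k, Version: ver})
	}
	sort.Slice(reads, func(i, j int) bool { return reads[i].Key < reads[j].Key })
	var writes []Write
	for k, val := range s.writes {
		writes = append(writes, Write{Key: k, Value: val})
	}
	sort.Slice(writes, func(i, j int) bool { return writes[i].Key < writes[j].Key })
	return reads, writes
}

func main() {
	db := map[string]versioned{"marble1": {Value: "owner=Tom", Version: "5"}}
	sim := NewSimulator(db)
	fmt.Println(sim.GetState("marble1"))
	sim.PutState("marble1", "owner=Jerry")
	reads, writes := sim.RWSet()
	fmt.Println(reads, writes)
	fmt.Println(db["marble1"].Value) // committed state is untouched by simulation
}
```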

13 Pluggable state database – CouchDB (NoSQL document DB)
- Assumption: start with single-node CouchDB as the first pluggable NoSQL document database.
- CouchDB is a JSON document store. The JSON document will serve as the chaincode application key's value.
  - Support for complex objects.
  - Chaincode manipulates the JSON document.
  - The JSON document gets persisted in the ledger (block's WriteSet as well as state database).
- Use CouchDB 2.0's rich query language or, as needed, CouchDB's map/reduce views.
- These charts will use a simple asset transfer scenario.
- Good CouchDB primer:

14 Pluggable database – transaction lifecycle for asset transfer example
1. Get the current key state in chaincode, using the normal chaincode API, e.g. GetState():
    Stub.GetState("marble1")
2. The database plug-in generates a database query to retrieve the JSON value, e.g.
    GET /<ChaincodeID>/marble1
    RESPONSE: { "_id":"marble1", "asset_name":"marble1", "owner":"Tom", "date":"9/5/2016", "txId":"f81d4fae-7dec-11d0-a765-00a0c91e6bf6", "_rev":"5" }
3. The database plug-in removes 'internal' fields before passing the value to chaincode ("_id", "txId", "_rev"):
    OldValue = { "asset_name":"marble1", "owner":"Tom", "date":"9/5/2016" }
4. Simulate the transaction updates as normal in chaincode:
    NewValue = { "asset_name":"marble1", "owner":"Jerry", "date":"9/6/2016" }
    Stub.PutState("marble1", []byte(NewValue))
5. Simulation results in a ReadWriteSet as normal:
    "Reads": [{"key": "marble1", "version": "5"}]
    "Writes": [{"key": "marble1", "value": { "asset_name":"marble1", "owner":"Jerry", "date":"9/6/2016" } }]
6. Endorsement, Consensus, and Committer Validation as normal.
7. During final commit, the database plug-in converts the read set into a query for the MVCC check, and the write set into a database update, and re-adds the 'internal' fields for persistence ("_id", "txId", "_rev"):
    PUT /<ChaincodeID>/marble1
    { "_id":"marble1", "asset_name":"marble1", "owner":"Jerry", "date":"9/6/2016", "txId":"0b1f4cc8-75d6-11e6-8b77-86f30ca893d3", "_rev":"5" }
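Removing the 'internal' fields on the way out and re-adding them at commit can be sketched with `encoding/json`. The two helpers below are hypothetical, not the plug-in's real code; they assume the stored value is a JSON object and that the revision is a string, as in the example documents above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// internalFields are added by the plug-in and hidden from chaincode.
var internalFields = []string{"_id", "txId", "_rev"}

// stripInternal removes the internal fields from a stored document and
// returns the chaincode-visible value plus the revision used as the version.
func stripInternal(raw []byte) (value []byte, rev string, err error) {
	var doc map[string]interface{}
	if err = json.Unmarshal(raw, &doc); err != nil {
		return nil, "", err
	}
	rev, _ = doc["_rev"].(string)
	for _, f := range internalFields {
		delete(doc, f)
	}
	value, err = json.Marshal(doc)
	return value, rev, err
}

// addInternal re-adds the internal fields before the document is persisted.
func addInternal(value []byte, id, txID, rev string) ([]byte, error) {
	var doc map[string]interface{}
	if err := json.Unmarshal(value, &doc); err != nil {
		return nil, err
	}
	if doc == nil {
		doc = map[string]interface{}{}
	}
	doc["_id"], doc["txId"], doc["_rev"] = id, txID, rev
	return json.Marshal(doc)
}

func main() {
	stored := []byte(`{"_id":"marble1","asset_name":"marble1","owner":"Tom",` +
		`"date":"9/5/2016","txId":"f81d4fae-7dec-11d0-a765-00a0c91e6bf6","_rev":"5"}`)
	value, rev, err := stripInternal(stored)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(value), rev)

	newValue := []byte(`{"asset_name":"marble1","owner":"Jerry","date":"9/6/2016"}`)
	doc, err := addInternal(newValue, "marble1", "0b1f4cc8-75d6-11e6-8b77-86f30ca893d3", rev)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(doc))
}
```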

15 Querying for multiple keys
Example: Transfer all of Tom's assets to Jerry
    Stub.GetStateMultipleKeysUsingQuery("{"selector": {"owner": "Tom"}}")
The database plug-in generates a database query, e.g.
    POST /<ChaincodeID>/_find
    {"selector": {"owner": "Tom"}}
3 documents are returned and placed in the simulation read-set:
    {"_id":"marble1", "asset_name":"marble1", "owner":"Tom", "date":"9/5/2016", "txId":"0b1f4cc8-75d6-11e6-8b77-86f30ca893d3", "_rev":"5"}
    {"_id":"marble2", "asset_name":"marble2", "owner":"Tom", "date":"8/6/2016", "txId":"2a1f4cc8-75d6-11e6-8b77-86f30ca893b2", "_rev":"8"}
    {"_id":"marble3", "asset_name":"marble3", "owner":"Tom", "date":"8/22/2016", "txId":"5d1f4cc8-75d6-11e6-8b77-86f30ca893e1", "_rev":"10"}
- The plug-in needs to specify which fields to use for primary key, version, and transaction id (_id, _rev, txId in the above example).
- Then in chaincode, use SetStateMultipleKeys() as normal, to change the owner of all assets to 'Jerry' within the respective JSON values (documents).
- Note: risk of phantom reads, for example if a fourth asset was transferred to Tom between simulation time and commit time… this asset will be missed. Cannot be solved unless we re-query at commit time and compare result sets.
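The remedy mentioned in the note (re-query at commit time and compare result sets) can be sketched as follows. This only illustrates the comparison; the map of key to owner stands in for the state database, and `keysOwnedBy` and `sameKeys` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
)

// keysOwnedBy plays the role of the rich query: it returns the sorted keys
// of all assets currently owned by owner.
func keysOwnedBy(owners map[string]string, owner string) []string {
	var keys []string
	for k, o := range owners {
		if o == owner {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

// sameKeys reports whether two sorted result sets contain the same keys.
// A difference means a phantom read occurred between simulation and commit.
func sameKeys(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	state := map[string]string{"marble1": "Tom", "marble2": "Tom", "marble3": "Tom"}

	simulated := keysOwnedBy(state, "Tom") // result set at simulation time

	state["marble4"] = "Tom" // an in-flight transaction commits in between

	atCommit := keysOwnedBy(state, "Tom") // re-query at commit time
	fmt.Println(simulated, atCommit, sameKeys(simulated, atCommit))
}
```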

16 Transaction documents
Write a transaction document to shadow the blockchain ledger, for reporting/audit queries only.

State Database (RocksDB by default): latest written key/values for use in transaction simulation (current v1).
Blockchain (File system): blocks of transactions (Txn Reads[] Writes[]).
Transaction documents can be saved for reporting (future).

Key document:
    {
      "asset_name": "marble1",
      "owner": "Jerry",
      "date": "9/6/2016",
      "txId": "tx "
    }

Transaction document:
    {
      "txId": "tx ",
      "block": 2,
      "block_position": 1,
      "function": "UpdateAsset('marble1','jerry')",
      "state": "completed",
      "date": "9/6/2016",
      "rwset": {
        "reads": [
          {"key": "marble1", "version": "tx "}
        ],
        "writes": [
          {"key": "marble1", "version": "tx ",
           "value": {
             "asset_name": "marble1",
             "owner": "jerry",
             "date": "9/6/2016"
           }
          }
        ]
      }
    }

The entire block is the real transaction at the end of the day. Therefore, technically speaking, the state database does not need to provide atomic writes across key documents, as long as we ensure that everything is in a consistent state before processing the next block. Lack of atomic writes means:
- there will be a higher number of simulated transactions that eventually fail the MVCC check during commit validation
- we may therefore serialize readers (endorser/simulation) and writers (committer), which impacts performance (config opti
- we have to work harder to ensure everything is in a consistent state ourselves, as opposed to delegating this to the database
- if we somehow cannot get the database into such a consistent state, the peer will have to be rebuilt.

17 Rich queries
With the history of transactions in CouchDB, we can also create views and perform interesting state, point in time, provenance, and audit queries, for example:
- Show full history for marble1
- Show full history for all Jerry's assets and transactions
- Which transactions was marble1 involved in?
- Which blocks contain those transactions?
- What assets and owners were involved in transaction 0b1f4cc8-75d6-11e6-8b77?

Key document:
    {
      "_id": "marble1",
      "asset_name": "marble1",
      "owner": "Jerry",
      "date": "9/6/2016",
      "txId": "0b1f4cc8-75d6-11e6-8b77-86f30ca893d3",
      "_rev": "5"
    }

Transaction document:
    {
      "txId": "0b1f4cc8-75d6-11e6-8b77-86f30ca893d3",
      "block": 1,
      "block_position": 1,
      "function": "UpdateAsset('marble1','Jerry')",
      "state": "completed",
      "rwset": {
        "Reads": [
          {"key": "marble1", "version": "5"}
        ],
        "Writes": [
          {"key": "marble1", "version": "5",
           "value": {
             "asset_name": "marble1",
             "owner": "Jerry",
             "date": "9/6/2016"
           }
          }
        ]
      }
    }

18 Pluggable database – RDBMS
- Could provide a generic Go "database/sql" plug-in.
- Solution-specific tables are custom-defined alongside chaincode.
  - Customized and optimized for the use case and expected queries.
  - These charts will use a simple asset transfer scenario and an ASSETS table as an example.
- Option to deliver ledger tables in the database as well (BLOCK, BLOCK_TRANSACTIONS).
  - Either as shadow tables to query the blockchain ledger, or perhaps even as the primary blockchain ledger.

19 Pluggable database – transaction lifecycle for asset transfer
1. Get the current key state in chaincode, using a query language specific to the database plug-in:
    Stub.GetStateUsingQuery("123", "Select asset_id, owner, date, version from assets where asset_id = ?", "123")
2. The database plug-in is responsible for converting the result set into a JSON 'value' for use in chaincode (could also use a table API metaphor):
    asset_id  owner  date      Version (txId)
    123       Tom    9/5/2016  f81d4fae-7dec-11d0-a765-00a0c91e6bf6

    OldValue = { "asset_id":"123", "owner":"Tom", "date":"9/5/2016" }
3. Simulate the transaction updates as normal in chaincode:
    NewValue = { "asset_id":"123", "owner":"Jerry", "date":"9/6/2016" }
    Stub.PutState("123", []byte(NewValue))
4. Simulation results in a ReadWriteSet as normal:
    "Reads": [{"key": "123", "version": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"}]
    "Writes": [{"key": "123", "value": { "asset_id":"123", "owner":"Jerry", "date":"9/6/2016" } }]
5. Endorsement, Consensus, and Committer Validation as normal.
6. During final commit, a generator provided by the plug-in is used to convert the Read set into a query for the MVCC check, and the Write set into the data manipulation language of the specific database:
    update assets set owner='Jerry', date='9/6/2016', Version='0b1f4cc8-75d6-11e6-8b77-86f30ca893d3' where asset_id = '123'

    asset_id  owner  date      Version (txId)
    123       Jerry  9/6/2016  0b1f4cc8-75d6-11e6-8b77-86f30ca893d3
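The commit-time "generator" could look roughly like the function below, which turns one write-set entry into a parameterized UPDATE statement. This is a sketch under assumptions: the function name and signature are hypothetical, `?` placeholders are used (the placeholder style depends on the SQL driver), and the table and column names are assumed to come from trusted plug-in configuration rather than from user input.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// updateStatement converts one write (key plus decoded JSON value) into an
// UPDATE statement with bound parameters. Columns are sorted so the output is
// deterministic; the key column is used only in the WHERE clause.
func updateStatement(table, keyColumn, versionColumn, key, newVersion string,
	value map[string]string) (string, []interface{}) {

	cols := make([]string, 0, len(value))
	for c := range value {
		if c != keyColumn {
			cols = append(cols, c)
		}
	}
	sort.Strings(cols)

	sets := make([]string, 0, len(cols)+1)
	args := make([]interface{}, 0, len(cols)+2)
	for _, c := range cols {
		sets = append(sets, c+" = ?")
		args = append(args, value[c])
	}
	sets = append(sets, versionColumn+" = ?")
	args = append(args, newVersion, key)

	query := fmt.Sprintf("UPDATE %s SET %s WHERE %s = ?",
		table, strings.Join(sets, ", "), keyColumn)
	return query, args
}

func main() {
	newValue := map[string]string{"asset_id": "123", "owner": "Jerry", "date": "9/6/2016"}
	query, args := updateStatement("assets", "asset_id", "version",
		"123", "0b1f4cc8-75d6-11e6-8b77-86f30ca893d3", newValue)
	fmt.Println(query)
	fmt.Println(args...)
}
```

The returned query and arguments have the shape expected by `Exec` in Go's `database/sql` package, which keeps values out of the SQL text.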

20 Querying for multiple keys
Example: Transfer all of Tom's assets to Jerry
    Stub.GetStateMultipleKeysUsingQuery("Select asset_id, owner, date, version from assets where owner = ?", "Tom")

    asset_id  owner  date       Version (txId)
    123       Tom    9/5/2016   0b1f4cc8-75d6-11e6-8b77-86f30ca893d3
    456       Tom    8/6/2016   2a1f4cc8-75d6-11e6-8b77-86f30ca893b2
    789       Tom    8/22/2016  5d1f4cc8-75d6-11e6-8b77-86f30ca893e1

- The solution needs to specify which result set columns to use for primary key and version (asset_id and version columns in the above example).
- Then in chaincode, use SetStateMultipleKeys() as normal, to change the owner of all assets to 'Jerry' (within the respective JSON values).

21 Tables after asset transfer committed
- Asset table custom defined by chaincode developer.
- Asset history table custom defined by chaincode developer (populated by db trigger on asset table?).
- Block and transaction tables delivered with database plug-in.
- Commit() will update asset state, history, and block tables in one atomic transaction.

ASSETS table (latest state, custom table)
    asset_id  owner  date      Version (txId)
    123       Jerry  9/6/2016  0b1f4cc8-75d6-11e6-8b77-86f30ca893d3

ASSET_HISTORY table (write once, custom table)
    asset_id  owner  date      Version (txId)
    123       Tom    9/5/2016  f81d4fae-7dec-11d0-a765-00a0c91e6bf6
    123       Jerry  9/6/2016  0b1f4cc8-75d6-11e6-8b77-86f30ca893d3

BLOCKS (write once)
    block  hash        prior_block_hash  date      block_blob
    1      bbbbbbbb    aaaaaaaa          9/5/2016  binary_data
    2      cccccccccc  bbbbbbbb          9/6/2016  binary_data

BLOCK_TRANSACTIONS (write once)
    tx_id                                 block  block_position  function                    RWSet                 date
    f81d4fae-7dec-11d0-a765-00a0c91e6bf6  1                      CreateAsset('123','Tom')    Reads:[…] Writes:[…]  9/5/2016
    0b1f4cc8-75d6-11e6-8b77-86f30ca893d3  2                      UpdateAsset('123','Jerry')  Reads:[…] Writes:[…]  9/6/2016

22 Queries
With asset state and history in the database, we can now perform interesting state, point in time, and provenance queries.
- Show full history for asset 123:
    select * from ASSET_HISTORY where asset_id = '123' order by date desc
- Show full history for Jerry:
    where owner = 'Jerry'
- Easily correlate assets, transactions, and blocks for other audit queries, e.g.
  - Which transactions was asset 123 involved in?
  - Which blocks contain those transactions?
  - What assets and owners were involved in transaction 0b1f4cc8-75d6-11e6-8b77?
- Improve query performance with traditional table indexes.

(The slide repeats the ASSETS, ASSET_HISTORY, BLOCKS, and BLOCK_TRANSACTIONS tables from slide 21 with identical contents.)

23 Restrictions
- Cannot Insert/Update/Delete using database syntax.
  - Must use normal PutState() methods in order to build the ReadWriteSet required by the v1 endorsement/simulation model.
- Normal Read/Write transactions only. Cannot Read/Write/Read/Write the same key multiple times in one transaction (since the simulated updates are not in the database yet).

24 Alternate options with this approach
- Instead of having the write SQL generated on the committer side based on the JSON key value, the SQL could potentially be an output of chaincode simulation in a new 'ExecutionSet' that is included in the endorsed action. Each of the committers would simply execute the SQL that comes out of simulation.
- It is assumed that the BLOCK and BLOCK_TRANSACTION tables shadow the actual blockchain ledger for query purposes. But if the database provides blockchain attributes such as immutable tables and atomic updates, the primary blockchain ledger could be in the database, ensuring that the blockchain ledger and state database are always in sync.

25 Other approaches to investigate
1) Explore data source(s) where it is possible to intercept at different stages of simulation/validation/commit, for example read the database's uncommitted transaction log to achieve simulation.
2) Custom build a richer data model on top of the existing key/value data store:
- A JSON data model and a simple query language on the JSON data model
- RWSet to be the delta change in JSON records
- Include the primary key in every RWSet
- Use additional column families for indexing secondary fields

DatamodelDef:
    {
      "schemaName": "personalDetails",
      "schema": {
        "name": {
          "firstName": "string",
          "lastName": "string"
        },
        "address": "string",
        "age": "int"
      },
      "PrimaryKey": ["name.firstName", "name.lastName"],
      "indexOn": ["age"]
    }

updateQuery example (insert):
    {
      "operation": "insert",
      "schemaName": "personalDetails",
      "data": {
        "name": {
          "firstName": "abc",
          "lastName": "xyz"
        },
        "age": 30
      }
    }

updateQuery example (update):
    {
      "operation": "update",
      "schemaName": "personalDetails",
      "data": {
        "address": "New City"
      },
      "query": {
        "type": "PrimaryKey",
        "PK": {
          "name": {
            "firstName": "abc",
            "lastName": "xyz"
          }
        }
      }
    }

searchQuery example:
    {
      "type": "RangeQuery",
      "field": "age",
      "range": [20, 40]
    }
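A range query over an indexed secondary field, as in the searchQuery example, can be sketched with a sorted index and a binary search. Everything here is an assumption made for illustration: the index is an in-memory slice rather than a column family, the range is treated as inclusive on both ends, and the primary keys are made-up strings.

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry maps one secondary-field value (age) to a record's primary key.
type indexEntry struct {
	Age int
	PK  string
}

// buildAgeIndex returns the entries sorted by age, then by primary key.
func buildAgeIndex(ages map[string]int) []indexEntry {
	index := make([]indexEntry, 0, len(ages))
	for pk, age := range ages {
		index = append(index, indexEntry{Age: age, PK: pk})
	}
	sort.Slice(index, func(i, j int) bool {
		if index[i].Age != index[j].Age {
			return index[i].Age < index[j].Age
		}
		return index[i].PK < index[j].PK
	})
	return index
}

// rangeQuery returns the primary keys of all entries with lo <= age <= hi.
// It binary-searches for the first match and then walks the sorted index.
func rangeQuery(index []indexEntry, lo, hi int) []string {
	start := sort.Search(len(index), func(i int) bool { return index[i].Age >= lo })
	var pks []string
	for i := start; i < len(index) && index[i].Age <= hi; i++ {
		pks = append(pks, index[i].PK)
	}
	return pks
}

func main() {
	// Hypothetical records keyed by "firstName.lastName".
	ages := map[string]int{"abc.xyz": 30, "def.uvw": 45, "ghi.rst": 20, "jkl.opq": 19}
	index := buildAgeIndex(ages)
	fmt.Println(rangeQuery(index, 20, 40))
}
```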

