Bitcoin Bitcoin is a cryptocurrency. The platform that hosts Bitcoin is a p2p system. Bitcoin can be abstracted as a digital file that records the account balance of each user (i.e. a ledger). The “file” is stored on all the machines in the Bitcoin network. Alice5 Bob3 Carole7 …… ledger
Bitcoin is not issued by any authority. People use Bitcoin because they think it has value. AliceBob
Transactions in the BTC network When a user transfer BTC to another user, the transfer is broadcast to all the nodes in the BTC network. The transfer record is known to all the nodes in the network. Each node of the BTC network knows all the transaction records of the BTC network. Each node of the BTC network knows the balance of each user of the BTC by looking through all the transaction records.
Distributed vs centralised In the traditional banking system, the bank keeps the ledger of all its customers. In the BTC network, each node has a copy of the ledger. For a bank, a customer only knows her own transactions In BTC, a transaction is known to everyone. For a bank, the customer can trust the bank. If something goes wrong, the customer can sue the bank. In the BTC network, the nodes do not trust each other. Cryptography is used to ensure that bad behaviour (e.g. stealing BTC) can be prevented or cannot affect the functioning of the BTC network. BTC lost due to software bugs cannot be recovered.
SHA-256 hash function Produce a 256-bit digest (hash). Partition the message into 512-bit blocks. Pad the message if required The initialization vector is a constant. Each 512-bit message block is processed in turn
message hash IV 256 bits 512 bits 256 bits 512 bits padding hash 256 bits … … hash 256 bits 512 bits 256 bits hash
hash pointer A hash pointer indicates the location where some information is stored. The value of a hash pointer is the hash of the information. Given a hash pointer, we can retrieve the information and check whether the information has been modified.
Block chain A block chain is a linked list with hash pointers. data H( ) data H( ) data H( ) … data H( )
Merkle tree (hash tree) A Merkle tree is a tree in which the value of a non-leaf node is the hash of the concatenation of the values of the node’s children Data can be stored in a hash tree to make retrieval more efficient compared with a linked list. H( ) data
public-key cryptography When encrypting information, we need to use one key to encrypt the information and one key to decrypt information. In public-key cryptography, different keys are used for encrypting and decrypting information. A pair of keys (public/private keys) are used. The information encrypted using one of the keys in the pair can only be decrypted using the other key in the pair. A user generates a pair of public/private keys. The user keep the private key as a secret while making the public key available to the public.
Public-key cryptography is also used in authentication. The possession of the private key is used as a proof that the entity is the subject associated with the public key that corresponds to the private key. If Bob wants to make sure that he is talking to Alice, Bob can ask Alice to encrypt a string with Alice’s private key. If Bob can decrypt the encrypted message correctly with Alice’s public key, Bob regards the person that encrypted message is Alice. You must keep your private key secret.
Digital Signature The digital signature of a data item is generated by compute the hash of the data item encrypt (i.e. sign) the hash with the private key of the owner of the data item A digital signature can be verified by compute the hash of the data item, H(data) decrypt the digital signature with the public key of the owner of the data item to obtain the hash calculated by the owner H’(data) check whether H(data) = H’(data)
Transactions in BTC Each user has at least one public/private key pair. The public key is used as the address for receiving BTC payment. When Alice sends money to Bob, Alice creates a transaction to record this transfer of money. The transaction has Bob’s public key to indicate that Bob receives the payment. The transaction is signed by Alice using Alice’s private key to ensure that Alice is the sender. Encrypt the hash of the transaction. Alice broadcasts the transaction to all the nodes in the BTC network.
The transaction have one or several inputs. Each input specifies the amount of BTC that Alice has gained from other transactions (i.e. Alice actually has the BTC to spend). When a BTC network node receives the transaction, the node needs to first verify whether the inputs are valid (i.e. whether Alice indeed has gained these BTCs in previous transactions). The node needs to verify the signature of Alice attached to the transaction to ensure that the transaction is valid. How? This is to prevent the case that Eva uses the transaction that credit Alice as the input of her transaction to give Alice’s money to Eva herself. The node also needs to make sure that Alice has not spent the BTC that she gained in the input transactions (i.e. no double spending).
Transaction input … Alice Bob Transaction Eva Alice Transaction Allen Alice It can be seen that you can trace the origin of the fund involved in all the transactions in the BTC network.
Transaction Alice Bob Transaction James Alice Transaction Amy Alice Transaction Bob Amy Transaction Steve Amy Transaction Howard James
Transaction Alice Bob Transaction James Alice Transaction Amy Alice Transaction Bob Amy Transaction Steve Amy Transaction Howard James
Transaction Alice Bob Transaction James Alice Transaction Amy Alice Transaction Bob Amy Transaction Steve Amy Transaction Howard James
When a machine joins the BTC network as a “permanent” node, the machine obtains a copy of all the transactions that have happened since the BTC network came into existence. The machine needs to verify the validity of all the transactions. In the BTC network, the machines do not trust each other. The validation can take several tens of hours. However, it only needs to be done once by a node.
A transaction has one or several outputs. Each output specifies the amount to be paid to an address (i.e. the public key of the receiver). Alice’s address can appear in the output. For example, if Alice is buying a product that costs 1 BTC from Bob and the input is 3 BTC, apart from paying 1 BTC to Bob, Alice pays herself 2 BTC as the change.
To figure out the balance of your BTC account, you need to scan though all the transactions that pay you and you have not used these transactions as the inputs of other transactions. So, BTC is not a file recording the balances of all the users. It is a log containing all the transactions in the BTC network.
Anonymity Your account in BTC network is represented as a public key. You can create any number of public keys to represent your account. Users can generate public keys themselves. A public key is 520 bits. So, the chance that two users generate the same public key is negligible. If you use Tor to access the BTC network, your identity can be hidden. However, if you are not careful when making transactions, people can link several public keys together. That is, the public keys all represent the same user. You use those transactions that are paid to several public keys as the inputs of one transaction.
Double spending Each transaction can only be used once as an input of another transaction. Each node needs to check whether a transaction has been used when the transaction is used as the input. Transaction Steve Alice Transaction Alice Bob Transaction Alice Tom Invalid transaction
Transactions are propagated in the BTC network. As different transactions might be propagated along different routes, BTC nodes might have different perceptions on which transaction has been spent. T1: Eva uses a transaction to pay Bob. T2: Eva uses the same transaction to pay herself. If T1 and T2 are propagated along different routes in the BTC network, different nodes in the BTC network would have different perception on whether T1 happens before T2 or vice versa. As a result, some nodes would think T1 is invalid while others would think that T2 is invalid. To prevent the above double spending scenario, BTC nodes need to agree on the order of the events (i.e. whether T1 or T2 happens first). Paxos? BTC uses a cryptography-based technique to achieve consensus among a set of nodes that do not trust each other
BTC mining Transactions are grouped together and placed in a block. The transactions in a block are regarded as happening at the same time. The blocks are ordered to form a block chain. Transactions in a block at the back of the chain are regarded as happen after the transactions in the blocks at the front of the chain. The pointers between the blocks are hash pointers Block tx 123 tx 234 … tx 345 Block tx 456 tx 567 … tx 678 Block tx 789 tx 89A … tx ABC Block tx BCD tx CDE … tx DEF …
Each BTC node constructs its own block of transactions (in theory) and place the transactions that it has heard of in the block. The BTC node needs to carry out various checks on the transactions (as described earlier) before adding the transactions to it block. Due to the propagation delay of the transactions, BTC nodes might come up with blocks that contain different (or partially different) set of transactions. In order to add a block into the block chain, a node needs to solve a cryptography puzzle. The puzzle is hard to solve in terms of the computation required to solve the puzzle. The solution is easy to check.
As the computation for solving the puzzle is expensive, the BTC node needs to be given incentive for creating the block and maintain the block chain (i.e. carrying out various checks to confirms that the transactions to be included in the block are valid). The nodes are rewarded with BTC for creating a block AND solving the puzzle that allows the block to be added to the block chain. The current reward is 25 BTC for one valid block.
Coinbase transaction A coinbase transaction is the transaction that specifies that the node creating a block is being awarded 25 BTC. A coinbase transaction does not have any input and it does not need to be signed. A coinbase transaction has a coinbase parameter that can have any value. The coinbase parameter can be adjusted by a node when the node tries to solve the cryptographic puzzle.
The data structure of a block Hash of a block = hash(hash of the previous block | hash of transaction tree | nonce) Nonce is a 32-bit value Pre: H( ) tx_tree root: H ( ) Nonce: Hash: H( ) txPay Alice 25 BTC Coinbase: xxx… Pre: H( ) tx_tree root: H ( ) Nonce: Hash: tx
The hash of a block is calculated using SHA-256 that generated a 256-bit number. The puzzle is to find a nonce that makes the hash of a block to less than a given value. The given value is chosen by the BTC nodes to control the difficulty of the puzzle to ensure that on average a valid block can be found (i.e. a node can solve the puzzle by finding the right nonce) every 10 minutes. If the hash function is secure, the only way to succeed in solving the puzzle is to try enough nonce. Hash space Target space
Bitcoin mining farm in China
Once a node finds a nonce that meets the requirement, the node publishes the block. Other nodes can check the validity of the node by computing the harsh of the block with the nonce in the published block. A valid block is added to the chain.
There is a little probability that more than one node find the nonces that meets the requirement at almost the same time. Each node adds the first block that it hears to the block chain. As a result, there are might be several branches in the block chain. The tie is broken when the next valid block is found as all nodes would try to add their block to the longest chain in the BTC network. As a result, the creator of the losing branch has to give up its block (i.e. the creator would not get any reward) and try to create another block. The transactions in the losing block will be regarded as unconfirmed transactions, and they need to be included in other valid blocks.
The amount of computation required for solving a puzzle makes it rare for nodes to find the solutions of the puzzle at the same time. It is even rarer for this to happen several times in a row. Therefore, the branches in the block chain will disappear after a few blocks. That is, a consensus will be reached among the nodes on which blocks should be in the block chain.
As a block in a block chain might later be discarded and the transactions in the discarded blocks become unconfirmed transactions, transactions at the end of the block chain are regarded as “risky”. Normally, people would wait for six more blocks to be added to the back of a block before the transactions in the block are regarded as confirmed. At the moment, it takes about 10 minutes for a block to be added to the block chain. So, to confirm a transaction, it takes about one hour. This is much slower than a credit card transaction.
As branches might occur in the block chain, an attacker might explore this feature to launch a double spending attack. Eva sends money to Bob. Then, Eva sends money to herself, computes a block that includes the transaction that pays herself, and add the block to the block chain before the Eva Bob transaction is added to the block chain. This would make Eva Bob transaction an invalid transaction. This attack can only succeed if Eva can make sure that her branch is the longest chain. Eva has to produce more than one valid block to add to her branch in order to make her branch being the longest in the block chain. Eva can only do this with over 50% probability to succeed if she possesses more than 50% of the computing power of the BTC network.
Mining pools It is almost impossible for an individual miner to compute the nonce for creating a valid block. Miners are pooling their computing resources together to form mining pools to solve the puzzle together. The largest pool Ghash.io has over 50% computing power of the BTC network at one stage. To prevent the pool’s resource is misused, the pool has voluntarily restricted its own size.
Summary Protection against invalid transaction is based on cryptographic functions. Protection against double-spending is by consensus. You are never 100% sure that a transaction is in the consensus branch. Guarantee is probabilistic.
References Nakamoto, Satoshi (October 2008). "Bitcoin: A Peer-to-Peer Electronic Cash System“, blockchain.info
Reviews How does BTC ensures that a user’s BTC is not spent by other people? Why is it important for the nodes in the BTC network to agree on the order in which transactions occur? What is the difference between a transaction chain and a block chain? How do the nodes of the BTC network reach consensus on the which blocks are in the block chain? How does the security of the block chain, the number of the diversity of the miners and the value of the bitcoin influence each other?
Reviews For an attacker with more than 50% of the computing power of the BTC network, is it possible for the attacker to steal bitcoins from existing addresses supress transactions from the block chain or from the BTC’s p2p network change block reward destroy confidence in bitcoin launch double spend attack How is a transaction being handled by the BTC network? How is the validity of a transaction being checked? Why is it hard to solve the puzzle for creating a valid block?