Now we have discussed the permission model of blockchain. And we have looked into the details of a bitcoin network as an example of permission blockchain. And, we have seen different aspects of a bitcoin network where users can join in the network and start doing the transaction.
So, in this particular discussion we will discuss one interesting property of a blockchain network or rather a set of algorithms, which help in achieving consensus in a distributed or decentralized network.
we look into the details different methods of consensus and how they are applicable for a general blockchain environment and with a special case of the bitcoin network.
- A procedure to reach a common agreement in a distributed or decentralized multi-agent platform
- Important for a message-passing system
So, this concept of consensus is an interesting topic in a decentralized or distributed network. So, consensus means it is a procedure to reach a common agreement in a decentralized or distributed platform. So, here is an example of a typical example of a consensus.
The mechanism you can assume, that in an army there are 4 generals and generals are taking some kind of decisions. And when the generals are taking the decisions in a decentralized way; that means, they have their own policies to take the decision they can either make an attack or can retreat from the attack.
Now, in a consensus algorithm, these generals individually express their opinion on their individual opinion and from their individual opinion by applying some kind of choice function, which can be the majority decision in this particular case.
So, by applying such kind of choice function the environment or the system finally, decides what to do next. So, here in this particular An example that 3 generals they are making a choice towards attack. with the majority, the principle the system can come to a consensus that they should make an attack collectively.
Now, this kind of consensus algorithm is important for a message-passing environment in a distributed system.
- Reliability and fault tolerance in a distributed system
- Ensure correct operations in the presence of faulty individuals
- Commit a transaction in a database
- State machine replication
- Clock synchronization
Let us first look into why we require consensus. So, in a traditional or conventional distributed system we apply consensus to ensure reliability and fault-tolerance.
So, by mean reliability and fault-tolerance, it is like that in a decentralized environment when you have multiple individuals parties and they can take their own decision, then it may happen that some nodes are, some parties are, some individuals are working as maliciously or they are working as a faulty individual.
in those particular cases, it is important to come to a common decision or a common viewpoint. So, having a common viewpoint in an environment where people can behave maliciously or people can crash or work in a faulty way is being a difficult thing.
under this kind of distributed environment, our objective is to ensure reliability; that means, to ensure correct operations in the presence of faulty individuals.
So, there are multiple examples for a consensus in a distributed system if you think of a distributed database, where the transactions are going on from multiple points of sales say; from multiple ATMs, multiple banking sectors the transactions are coming in during that time consensuses as important aspects say, for example, you want to transact some money from one branch to another bank branch another account in another bank branch.
during that time you need to come to a consensus that all the bank branches need to decide that well this transaction is a valid transaction and after deciding that this transaction is a valid transaction they should commit the transaction, then another An example of consensus is state machines in replication.
State machine replication is an important aspect of any distributed protocol. So, say for example, if you want to run some kind of distributed protocol over a network. Every individual node runs the current protocol.
The current version of the protocol and this stored the state of the protocol in a different state machine. So, the entire execution part of the protocol can be represented as a state Machine. Now, these state machines need to be replicated into multiple nodes.
Every individual node can reach a common point or common output of that protocol. So, the state machine replication is an example of consensus, then another example of consensus is the clock synchronization or distributed clock synchronization.
Say you have multiple clocks in your network and every individual node tries to find out that, which is the most updated clock or which is the most current clock.
they should make a consensus among themselves and come to a current come to a single clock and by applying this kind of clock synchronous asynchronous clock architecture across the network, they can do further operations. So, these are the typical examples of consensus in a traditional distributed system environment
WHY CONSENSUS CAN BE DIFFICULT IN CERTAIN SCENARIOS
- Consider a message passing system, and a general behaves maliciously
So, let us look into that while achieving consensus can be difficult in a certain scenario or typical in a message-passing system.
So, here is the example that we have considered earlier, so we are considering that example again that we have now multiple generals.Now these generals are utilizing a message passing environment to communicate their viewpoint to others.
Now, you can just think that every individual general is making a telephone call to another general and then communicating their viewpoint to others. Now, let us look into this diagram.
First general(anti clockwise situation like G, G1, G3, G2 ) sends its information to neighboring generals 1 and 2. Now whenever the general is sending the information the general makes a telephone call to the other 2 generals say G 1 and G 2, and a and inform his viewpoint that they should now attack. Similarly this general says the general G 3 he again makes a phone call to G 2 and G 1 and sends his decision that the soldiers should now make an attack.
But, this particular general says G 1 this is a malicious general and this general makes a malicious call like he makes a call to 1 general say G 4 and asks for retreat. Whereas, whenever he is making a call to general G 3 he is saying as an attack.
Now in a complete distributed or a decentralized environment the generals may get confused say, for example, whenever these general is informing G 1 is informing G 4 to retreat G 4 can get are confused that whether or what he should do?
That way in a decentralized or a distributed platform achieving consensus over a message-passing system can be difficult, when you have this kind of malicious node or you have nodes which start working maliciously. So, we have a technical term for this kind of node we call them as the byzantine time nodes. We call this kind of failure as byzantine time failure.
So, later on, we will discuss this kind of byzantine failures in detail, but in brief that in the presence of this kind of byzantine failures the system can behave maliciously or it may be difficult to achieve a consensus in a distributed environment.
- If there is no failure, it is easy and trivial to reach in a consensus
- Broadcast the personal choice to all
- Apply a choice function, say the maximum of all the values
but as we know that if there is no failure in the system, then it may be easy and trivial to reach a consensus. So, in a genetic the algorithm can be something like this like you broadcast the personal choice to all, and then you apply your choice function say in this case, if your choice is the maximum of all the receive values, then you can achieve consensus.
So, here are 4 nodes individual nodes make a choice of 10 20 30 or 40 and they informed they are individual choices to all other nodes in the network, and whenever every node receives all the choices from all the neighbors they can apply on a max function to find out that what is the maximum? So, you can easily see that everyone will reach the value 40, if they apply the maximum function of all the received values.
whenever we are saying that achieving consensus in this particular architecture is easy and straightforward it is under certain scenarios.this scenario search first of all The system needs to be faultless there should not be any failure in the system.
So, every individual node can receive the message correctly and, but the second requirement is that the system should behave in a synchronous way.
what is meant by the synchronous way we will look into the formal definition of a synchronous message passing system, but by definition synchronous message passing system is something, where it is expected that you will receive all the messages within some predefined time interval.
So, here in this particular architecture every individual node is expected to receive all the messages from the peers in a nonfailure environment, within a predefined timeout and every node can wait for that much of time and in a synchronous system it is expected that everyone will receive the message.
Once they receive the message they can find out the maximum of all the received values from different messages and take that as a value for the consensus.
There can be various types of faults in a distributed system.
- Crash Fault: A node suddenly crashes or becomes unavailable in the middle of a communication
- Network or Partitioned faults: A network fault occurs (say the link failure) and the network gets partitioned
- Byzantine Fault: A node starts behaving maliciously
However, achieving consensus can be non-trivial in case of a distributed environment due to the presence of multiple types of failures. Now typically in a distributed system we consider 3 different types of failures the first one is called the crash fault.
A crash fault is something like a node suddenly crashing. the node suddenly crashes or the nodes become unavailable in the middle of communication. So, you are not expected to receive any message from that particular node.
This is one type of typical fault that can be a kind of hardware fault or a software fault due to which the node or the process, which is communicating with another one that particular process, fails.
The second type of fault is network fault or the partition fault a network fault is something when a network link fails. a network failure may result in a partition in the network.
we can see an example of that. So, say assume that there are multiple nodes in the network and these nodes are interconnected to each other.
We have multiple nodes in the network and these individual nodes are interconnected now. So, we have a few other nodes that are interconnected with each other. Now in particular node 1 if this node fails then the if link(between 1 and 9) fails then the entire network is partitioned into 2 different parts.
We have one partition left and we have a second partition on the right side. So, the entire network gets partition and you are not expected to receive any message from any node of the left side partition to any node of right side partition or the vice versa.So, this message communication will not happen, because you had a bottleneck link and that particular bottleneck link has failed.
Under this kind of network failure, because this kind of partition can happen in the network we say that it is a type of a partition fault we used the term partition fault to denote this kind of network failure.
This kind of network failure can hamper reaching the consensus. So, if a network failure makes a partition of the network the nodes in one partition they become unavailable to the nodes in another partition. So, all the nodes cannot communicate with each other and come to a general consensus.
These are the 2 typical types of faults which are coming from either the hardware failure or software failure, but a third type of faults that are the byzantine faults is the kind of more difficult fault to handle in a distributed environment. So, the byzantine fault is just like the example that I have shown you earlier here: the node starts behaving maliciously.
Now, whenever a node starts behaving maliciously you do not know what would be the action for that node.
The node can send a positive vote or sometimes the node can send the negative vote. In the case of say the crash fault or in case of a network or a partition fault, it is still expected or you can find out or you can make an expectation that what is going to be the effect of a network fault or a crash fault, but the effect of a byzantine fault is difficult to guess, because it completely depends on how maliciously the node is behaving and what the node is doing.
Sometimes the node can give a vote against a consensus or sometimes the node can give a vote in front of the consensus.
So, that is why handling byzantine nodes become difficult in a typical distributed system, but while we are dealing with different kind of consensus protocols you have to deal with these 3 different types of faults
DISTRIBUTED CONSENSUS – PROPERTIES
- Termination :Every correct individual decides some value at the end of the consensus protocol
- Validity: if all the individuals proposes the same value, then all correct individuals decide on that value
- Integrity: Every correct individual decides at most one value, and the decided value must be proposed by some individuals
- Agreement: every correct individual must agree on the same value
Now, whenever we are talking about distributed consensus protocols we have to need to satisfy certain properties of the protocols. So, the first property is called termination.
So, the termination properties say that a every correct individual decides and some value at the end of the consensus protocol that means, whoever be the corrector a non-faulty node in the network the non-faulty the node must terminate the protocol, and decide on one value, and that value should be correct value.
Then the second property is the validity property says that if all the individuals propose the same value, then all correct individuals decide on that value.
That is the basic idea of this validity property which says that if all the individuals in the node they propose on the same value like the example that we have shown earlier, that if all the individual propose a value 10 then every correct node should come to a consensus with value 10 they should not deviate from that particular value. So, that is the validity property of consensus protocol the third properties integrity.
The integrity property says that every correct individual decides at most one value. And the decided value must be proposed by some individuals.
The integrity property ensures that the consensus value should not deviate from the values which are proposed by individuals in the network. So, you should not get a value of 20 in the consensus, if none of the nodes in the network proposes a value of 20.
So, this is the integrity of the third property of a consensus protocol, the fourth property of the consensus, the protocol is agreement, the agreement properties say that every correct Individual must agree on the same value.
That means, and that is the most important property of a consensus, protocol that all the individuals in the network must agree on the same value. So, whenever they agree on the same value after termination, we call that the system has reached the consensus.
SYNCHRONOUS VS ASYNCHRONOUS SYSTEM
- Synchronous Message passing System: the message must be received within a predefined time interval
- Strong guarantee on message transmission delay
- Asynchronous Message Passing System: there is no upper bound on the message transmission delay or the message reception time
- No timing constraint, message can be delayed for arbitrary period of times
While dealing with consensus with we consider 2 different types of message-passing systems; one type of message passing the system we call as the synchronous message passing system and the second one is the asynchronous message-passing system.
So, in case of synchronous message passing a system so, the message must be received within a predefined time interval. So, we have a kind of strong guarantee on the message passing delay and you know a priori that what can be the maximum delay of message passing for this particular network.
Now, this kind of synchronicity give you simplification in designing the protocol, in the sense that you can think of that you will wait for a certain duration that duration will be the maximum lifetime or maximum bound on the message delay and if you wait for that amount of duration, it is guaranteed that you will receive all the messages within that time.
So, this is the synchronous message passing system, but in case of asynchronous message passing systems on the contrary we do not have any upper bound on the message transmission delay or the message reception time.
You do not have any such constraint message can be arbitrarily delayed or the message delivery time can be arbitrarily long. So, in case of an asynchronous message passing system you cannot expect that if you wait for a finite duration you will receive all the messages with some certain probability or with some guaranteed probability.
Now, designing a distributed system in a synchronous environment is much easier because you have that particular strong assumption on the message passing delay. And that is why if you wait for that amount of duration, it is expected that you will receive all the messages, but in case of an asynchronous message passing system because you do not have that much delay.
You have to also deal with the kind of fault that may come due to the asynchronous nature of the system.
That means, it may happen that you are waiting for a certain duration and you may not receive any vote for some of the neighbors, because the message which is coming from those nodes in the network have observed some kind of unbounded or some time of very long delay.
If you wait for a certain amount you may not receive the messages from all the neighbors. So, having consensus under this kind of environment is much more difficult compared to designing a consensus protocol for synchronous distributed systems.
FLP85 (Impossibility Result): in a purely asynchronous distributed system, the consensus problem is impossible (with a deterministic solution) to solve if in the presence of a single crash failure.
- Results by Fischer, Lynch and Patterson (most influential paper awarded in ACM PODC 2001)
- Randomized algorithms may exist
Now, there is an interesting result we called this result as FLP 85 or sometimes people call it an impossibility result.
The impossibility result states that in a purely asynchronous distributed system, the consensus problem is impossible with a deterministic solution, if there is a single even a single crash failure in the system.
So, in case of a purely distributed environment or purely asynchronous distributed environment, if there is a single fault in the system you cannot design any kind of consensus protocol or better to say any kind of deterministic.
A consensus protocol for that particular the system, but in this particular impossibility theorem node that terms deterministic consensus protocol, it is true that will not be able to design any kind of deterministic consensus protocol or we cannot have any kind of deterministic the solution, but we can always design some kind of randomizing solution or some kind of probabilistic solution for purely asynchronous
distributed system in the presence of failures But, this particular environment gives on bound on what type of consensus protocol you can design for a distributed environment.
And this is the foundational paper foundational work for a distributed system, which was initially proposed by Fischer Lynch and Patterson in 1985 fan in one of the top Asian Conference and Distributed System ACM Policy 2001 this paper got the most influential paper award.
I suggest all of you look into the formal proof for the impossibility result. We are not going into that details if you are interested. You can look into this paper and look into the formal proof that why it is impossible to achieve a consensus in a purely asynchronous distributed system in the presence of failure.
But, ideally, this gives us an idea about what type of system you can design a consensus protocol.
- Various consensus algorithms has been explored by the distributed system community
- Byzantine fault tolerance(BFT)
Now whenever talk about synchronous the consensus they exist the various time of type of consensus algorithms, that people have a design in a traditional distributed system environment; like the Paxos raft byzantine fault-tolerance BFT type of consensus algorithms people have tried to implement many of this type of consensus algorithm for asynchronous the system as well by apply probabilistic nature inside it.
So, we look into all those different types of consensus algorithms later, but ideally this gives us an algorithm that what type of consensus you can design for a distributed algorithm.
CORRECTNESS OF A DISTRIBUTED CONSENSUS PROTOCOL
- Safety: correct individuals must not agree on an incorrect value
- Nothing bad happened
- Liveliness (or Liveness): Every correct value must be accepted eventually
- Something good eventually happens
Now, the correctness of a distributed consensus is the algorithm that can be characterized by these 2 properties: safety and liveness. The safety properties say that the correct individual must not agree on an incorrect value. That means nothing bad has happened in the System.
The safety property ensures that you will never convert to an incorrect value or the correct individuals in the network, they will never convert to an incorrect value. And the liveliness property or in some books are different manuals. They like it as a liveness property. The liveness property states that every correct value must be accepted eventually; that means, something good will eventually happen.if you have proposed some good values.
that good value will be committed eventually, although there can be some time lag or there can be some delay in reaching into the consensus, but after the consensus protocol terminates you will be expected to get a consensus value out of that.
So, these are the 2 correctness properties for a distributed consensus that we need to ensure whenever we design a distributed consensus algorithm.
CONSENSUS IN AN OPEN SYSTEM
- The tradition distributed consensus protocols are based on
- Message passing (when individuals are connected over the internet)
- Shared memory (when a common memory place is available to read and write the shared variables that everyone can access)
- Message passing requires a closed environment – everyone need to know the identity of others
Now, let us look into the consensus protocols from the perspective of a blockchain environment.
So, the type of blockchain environment like the bitcoin that we have discussed till now, their kind of open environment.
Open environment in the sense like any node can join the network any type, but a type of distributed systems that we are considering till now, like the synchronous distributed system and the type of consensus algorithms, that have been explored for a distributed system the traditional distributed system that kind of consensus algorithm realize in a message-passing environment.
A message-passing environment means the every individual transfer some kind of messages to others and then for waiting for some finite amount of time they receive all the information from the peers, and after receiving all the information, from the peers whatever information they have received till now they process that information, and based on that information they reached to a or they design a consensus the parameter to come into the consensus.
But, because this kind of traditional distributed consensus algorithm like raft Paoxs or Byzantine fault-tolerant algorithm, they rely on this kind of message passing environment, they are designed particularly for a close Environments.
In a message-passing environment you need to know which node you want to transmit a message. Now, the moment you have this constant like the moment you need to know which node you want to send a message to, your node needs to know the identity of all those nodes.
That way this kind of distributed message passing algorithms for having a consensus was mostly design for a closed environment, but for blockchain type of environment in bitcoin applications, you are mostly talking about open environment or earlier you have used the term as a permissionless environment, where any node can join in the network at any time.
Now, under this kind of environment having a consensus protocol based on the message passing argument is difficult because you do not know who are the nodes in your periphery to whom you want to send a message
- Shared memory is not suitable for internet grade computing
- Where do you put the shared memory?
- Bitcoin is an open environment
- Anyone can join in the bitcoin network anytime
- How do you ensure consensus in such an open system?- a key challenge
Now, in case of a consensus in an open system we have to a broad type of algorithms the shared memory is one architect that people have exploded, but shared memory architecture it is not suitable for internet grid computing, because you need to put a memory which should be readable and writable by every individual nodes in the network.
In general, the shared memory algorithms we apply whenever we try to reach consensus among multiple distributed the process inside a system.
But, we apply the message passing in the other case, but as we have looked into that message passing is not feasible in an open environment.
Because in an open environment like bitcoin anyone can join the bitcoin network any time and under this kind of open environment the challenges of how will you ensure consensus under this particular scenario.
WHY DO WE REQUIRE CONSENSUS IN BITCOIN NETWORK
- Bitcoin is a peer-to-peer network
- Alice broadcast a transaction in this peer-to-peer network
- All the nodes in this network need to agree on the correctness of this transaction
So, let us look into how to ensure consensus and even before going to that, let us look into why we require the consensus in a bitcoin network, whenever we are utilizing blockchain as the backbone of a bitcoin.
As we have looked into bitcoin as a peer to peer network and in this peer to peer network say let us take an example, that Alice broadcast her transactions and Alice made a transaction to bob and see broadcast this transaction in this peer to peer network. Now, remember that this kind of broadcast is different from traditional message passing.
The system, for a traditional message passing the system you need to know which are the peer nodes to whom you need to send a message, but this kind of broadcasting works like flooding.
It works like flooding in the sense like you send the message to the subset of the people whom you know, then those nodes will further send a message to another subset of the people, those people will again send it to another subset of the people this way by flooding the message in the network the the message will get propagated.
The architecture of this flooding environment is different from a traditional message passing environment, where you need to know who are the neighbors to whom you need to send a message? Now, flooding is more costly compared to a normal message passing environment that is why we restrict the flooding to certain applications.
Now, in the case of a bitcoin peer to peer network whenever Alice is proposing a transaction Alice broadcast the transaction in the peer to peer networks. Now here we require the consensus, because all the nodes in this network need to agree on the correctness of these transactions.
How will you ensure that the transaction is actually coming from Alice and not from an attacker? The attacker may also impersonate Alice and can send a transaction in the network in the name of Alice. So, the network needs to find an outcome to a consensus that this particular transaction is coming from Alice and not from an attacker
- A node does not know all the peer in the network – this is an open network
- Some nodes can also initiate malicious transaction
As I have mentioned that in this particular peer to peer network architecture node does not know all the pairs in the network, because this is an open environment.in under that open environment, some nodes can also initiate malicious transactions and you need to prevent the system from having this kind of malicious transactions.
CONSENSUS IN A BITCOIN NETWORK
- Every node has block of transaction that has already reached into the consensus (block of committed transaction)
- The node also has list of outstanding transaction that need to be validated against the block of committed transaction
Now, in the case of the bitcoin network every node has a block of transactions that has already reached into the consensus. So, that is the blocks of committed transactions, which is stored in the form of a blockchain. And every node also has a list of outstanding transactions that need to be validated against the existing transactions in the block; that means the existing blockchain.
So, that is an example. you already have a block of transactions
These are the individual blocks of transactions block 23 22 24, inside every block the architecture we have looked earlier. So, you have a Merkle root under that all the transactions are organized in the Merkle tree. Now, every node has this blockchain and they have a set of new transactions.
So, these are a set of new transactions.
Now, these new transactions need to be committed to the block then need to be added to the blockchain. So, to commit these existing transactions the network needs to validate that these transactions are coming from the authenticated person and not from an attacker. And for this we require the consensus protocol in a bitcoin Network.
CONSENSUS IN BITCOIN
So, whenever we talk about the consensus in a bitcoin network. So, this is an interesting fact that why we are applying something called a blockchain. So, you can always have a per transaction consensus. So, you take every individual transaction and then validate that transaction.
But, if you do for per transaction consensus it is inefficient because you have to run the transaction algorithm for every individual transaction. So, rather than running the transaction running the consensus algorithm for every individual transaction, you run the algorithm on a block of transactions.
So, that way you can make the consensus algorithm much more efficient. You do not need to run the consensus for every individual transaction rather you are running for a block of transaction.
Here comes the concept of a blockchain that, we are representing the transactions in the form of blocks or and we are not representing it in the form of a transactions chain rather we are representing it in the form of a blockchain for an efficient implementation of a consensus.
That is the objective of the Bitcoin consensus algorithm. earlier we have looked into the concept of miners who actually participate in the consensus algorithm.
Here we have 3 individuals or 3 miners M 1 M 2 and M 3. So, everyone has one blockchain in hand now everyone has constructed or observed these transactions from the clients, now they construct a new block of transactions.
M 1 has constructed a new block of transactions from the transactions that he or she have heard of, M 2 has constructed another block of transactions, M 3 has constructed the third block of transactions.
And interestingly it is not necessary that every miner will hear the same set of transactions, there can be always the difference between the set of transactions that would be heard by the individual miners and that is why the proposed blocks are different for different miners.
The bitcoin consensus algorithm the objective is that among these 3 blocks which block I will add to this existing blockchain. This is the problem of the Bitcoin consensus Algorithm.
In the next discussion, we will look into that how the bitcoin proof up for work The algorithm actually achieves the consensus under an open environment.
READ MORE ARTICLE: R3 CORDA
follow us on facebook