Byzantine Broadcast in Dolev-Strong Protocol
In this article, we study a classic result from 1983 by D.Dolev and H.Strong on reaching an agreement in a distributed system between a set of nodes in the presence of faulty (Byzantine) nodes .
Motivation: Understanding the Byzantine General Problem and Dolev-Strong Protocol is the gateway to getting a feel for how consensus protocol works, including more complicated models used in the blockchain space and distributed systems. So in the spirit of building up your muscles, understanding this protocol will make life easier for you.
1. The Byzantine General Problem
The Problem
How to make absolutely sure that multiple entities, which are separated by distance, are in absolute full agreement before an action is taken?
In other words, how can individual parties find a way to guarantee full consensus?
Example
Imagine you're a General in a Byzantine army and you're planning to attack an enemy city.
You have the city surrounded by several other generals, each of them separated several miles from the other and you must decide as a group whether to attack or retreat. Some generals may prefer to attack while others prefer to retreat. The important thing is that all generals agree on a common decision.
A coordinated attack will lead to successful victory. An uncoordinated attack will lead to defeat.
Approach
You have decided to attack the city but you have no means of communication to tell other generals your decision. How do you make sure then, that other generals reach a consensus and all attack together at the same time?
You could send messengers on horseback but you need also to have a reply from each of your generals confirming that they have received your message, which means they have to send messenger again to you on horseback.
Questions
- What if other generals are Traitors, and they send back to you a message that they're willing to attack while their intention is to Retreat?
- How do other generals know that the messages that they received from you are genuine and haven't been intercepted or altered by enemies? (Solved by digital signatures)
- What if a messenger is captured by the enemy and an imposter messenger with a fake reply is sent back to you?
- What if one of your messengers is killed by other armies before or after delivering the message?
2. Goals And Assumptions
Goals
- Consistency (Agreement): All nodes agree on the same value .
- Liveness (Validity): Every submitted value to at least one node is eventually added to all nodes' history.
Assumptions
- Permissioned Settings: We assume we have a priori known set of nodes before the start of the protocol.
- Public Key Infrastructure: Each node has a distinct public-private key pair ( / ). is known to all nodes at the start of the protocol.
- Synchronous: Every message sent by one node at time arrives to recipient by time step . All nodes share a global clock.
- All honest nodes: We will assume at the beginning that we don't have Byzantine or faulty nodes (we'll relax this assumption later).
Plan
Solve the problem under these assumptions and then relax the last assumption of having all honest nodes.
3. Solving Via Round-Robin Leaders
A Lazy Protocol
If every value submitted by a node always arrives at exactly the same time at every node, then the protocol is valid. But if a node only submits a transaction to a subset of the nodes, or if the network delays because values arrive at different nodes in different time steps, then consistency will generally be violated.
Coordination via Rotating Leaders
The protocol above highlights the need for nodes to coordinate. A solution is to use a rotating leader for each step . At time step , Node 1 sends value to both nodes . At time step , Node 2 will be the new sender and sends value to the nodes it's connected to.
We keep repeatedly iterating through the nodes in a round-robin order. In the last time step, all the nodes will have the same value , with both consistency and validity achieved.
Under the 4 previous assumptions (Permissioned, PKI, Synchronous, and All honest nodes) Rotating leader seems to be a good protocol to achieve consistency and validity.
4. Faulty/Byzantine Nodes
A node that is not honest is called a faulty node.
Types of faulty nodes
Failure of Rotating Leader Protocol
If we relax the last assumption and consider a system of nodes with faulty nodes, our previous protocol of rotating leaders will fail. Byzantine nodes can ignore your protocol and do whatever they want, so at a time step where a Byzantine node is the leader, it may send trickier values to other nodes.
5. Byzantine Broadcast Problem
Cross-Checking
Our simple rotating leaders' protocol achieves consistency and liveness when but not when . The idea: stick with the rotating leader approach, but instead of letting nodes blindly accept the new value received from the leader, add a cross-checking functionality.
Byzantine Broadcast + Rotating Leaders = Fault-tolerant System
Goals of Byzantine Broadcast
- Termination: Each honest node eventually halts with some output .
- Agreement: All honest nodes choose the same value (whether or not the sender is honest).
- Validity: If the sender is honest, all honest nodes equal .
Implementation
We already know how to solve the Byzantine Broadcast problem in the case of . The sender sends a private value and eventually all nodes output whatever they heard from the sender.
This doesn't work in case because a Byzantine sender can send different values to different nodes leading to inconsistency. To solve this we add cross-checking: when a sender sends a private input , all honest non-senders compare their values to check if they received the same value. The tricky part is that there may be a Byzantine non-sender who lies during the cross-checking phase.
Using the PKI assumption, we can implement the cross-checking functionality with digital signatures.
6. Testing the Protocol
The case
Let's test our protocol with nodes, one of them is faulty ():
- Agreement: We only care about the case when the sender is Byzantine. Because , every other non-sender is honest. In the first time step, the Byzantine sender sends different values. In the second time step, every honest non-sender echoes the value received. In the third time step, all honest non-senders have the same information and carry out the same majority computation.
- Validity: We only care about the case when the sender is honest. The sender sends with its digital signature. Each honest non-sender receives one vote for signed by the honest sender, then receives at least one more vote echoed by another honest non-sender. Using majority vote, all honest non-senders output .
The case (the protocol fails)
A conspiracy and coordination between two byzantine nodes will break the protocol. We need more than one round of cross-checking. This is exactly what happens in Dolev-Strong Protocol.
7. The Dolev-Strong Protocol
The Dolev-Strong protocol gives a solution to state machine replication. It works under the assumption of the Synchronous, Permissioned model with PKI.
Validity Proof
Assume the sender is honest. At time step , the sender sends a signed copy to all non-senders. Because the signature can't be forged, all nodes receive the same output before the beginning of the next step.
Agreement Proof
Assume the sender is byzantine. We need to prove that if a non-sender gets convinced of a value by a message , all other honest non-senders get convinced of that value before the end of the protocol.
- : If becomes convinced of a new value , there is no time for to notify other nodes. The only hope is that other nodes already added independently.
- : The protocol has not yet ended, so if a non-sender gets convinced of a new value by a message , it sends this value and adds its signature. Every other honest non-sender receives this message prior to time step .
This article uses extensively lectures by Tim Roughgarden on "Foundation of Blockchain" and other research papers.