Distributed Systems and Blockchain Consensus Mechanism

Introduction#

With the increasing complexity of internet systems, most systems have transitioned from monolithic architectures to distributed architectures. In distributed systems, especially those based on blockchain technology, data consistency and consensus mechanisms are highly important.

This article will introduce the concepts of consistency and consensus in distributed systems and their practical applications and developments in blockchain.

Distributed Systems#

Consistency Problem#

As business scenarios become more complex, a single business is often provided by a cluster of servers located in different physical locations with different operational states. Achieving consistency in such distributed systems has become an important problem in the field of distributed computing.

In general, there are three specifications for achieving consistency in distributed systems:

Termination
Agreement
Validity

Distributed transactions need to ensure that consistent results can be achieved within a limited time frame. The result must be a proposal put forward by a node, and different nodes must reach the same decision.

Strong Consistency#

Achieving strong consistency is easy in ideal conditions, such as in a monolithic application or when all nodes have ideal performance, network bandwidth, etc. However, in real-world business scenarios, achieving strong consistency comes at a high cost. It requires ensuring the absolute stability of the system, no delays in communication between systems, and strong consistency can also reduce system performance and scalability.

In the case of strong consistency, the data in all nodes is always the same at any given time. Strong consistency usually includes two types: sequential consistency and linearizability.

Sequential Consistency#

Sequential consistency requires that the global execution order of all processes is consistent with the order of each process itself, but it does not require a global order of processes in physical time. Therefore, this is a relatively practical approach.

Linearizability#

Linearizability adds a rule that requires global ordering of processes between processes in addition to sequential consistency. It requires that the operations of all processes at all times are synchronized in real-time. This type of absolute consistency is often difficult to achieve in practice and usually requires the use of global locks or complex synchronization algorithms at the cost of sacrificing performance.

Weak Consistency#

In real-world business scenarios, real-time synchronization and absolute consistency are often not necessary. Therefore, it is acceptable to tolerate partial access or achieve consistency after a certain period of time. These weakened forms of consistency in certain aspects are called weak consistency.

Consensus Mechanisms#

Consensus mechanisms refer to the mechanisms by which multiple nodes in a distributed system reach consensus on a transaction. The following are some theories and principles related to achieving consensus:

FLP impossibility theorem
CAP theorem
ACID principles
BASE theory
Two-phase commit

FLP Impossibility Theorem#

The FLP impossibility theorem is a theory proposed by Fischer, Lynch, and Patterson, which states that it is impossible to achieve consensus in a reliable but asynchronous system where nodes may fail (e.g., crash) within a finite time frame.

Asynchrony refers to the existence of time differences between nodes in the system, making it impossible to determine whether the lack of response to a message is due to node failure or failure during transmission. Therefore, it is impossible to determine whether a message is lost.

CAP Theorem#

In engineering practice, it is often necessary to weaken certain requirements to meet the needs of real-world business scenarios. The CAP theorem is designed to address this issue. CAP stands for:

Consistency
Availability
Partition tolerance

A distributed system cannot guarantee all three of these properties simultaneously. At most, it can guarantee two of them. So, what are the practical applications of this theorem?

AP system: In scenarios such as static websites and non-real-time databases, consistency can be weakened, such as achieving consistency after a certain period of time after a new version is released.
CP system: In scenarios where consistency is absolutely critical, such as bank transfers, availability can be weakened, such as refusing service when the system fails or encounters failures.
AC system: Two-phase commit and some relational databases weaken network partitioning, such as ZooKeeper.

ACID Principles#

Distributed database transactions need to sacrifice some availability to achieve consistency and must adhere to the ACID principles, which are as follows:

Atomicity: All operations of a transaction must be executed or none of them should be executed. If a failure occurs, the transaction should be rolled back.
Consistency: The state of the system should be consistent before and after the transaction. There should be no intermediate states.
Isolation: Multiple transactions can be executed concurrently but should be isolated from each other.
Durability: State changes should be permanent.

BASE Principles#

The BASE principles are as follows:

Basically Available: The system should be basically available even in the presence of failures.
Soft State: The state of the system can change over time.
Eventual Consistency: The system will eventually become consistent.

This is an approach that sacrifices strong consistency to achieve consistency in the system over time.

Two-Phase Commit#

The two-phase commit decomposes the transaction submission process into two phases: pre-commit and commit, to avoid conflicts. However, it still faces issues such as synchronization blocking, single point of failure, and data consistency.

The TCC transaction mechanism mainly consists of the following phases:

Try Phase
Confirm Phase
Cancel Phase

In the Try phase, the business is checked and business resources are reserved. In the Confirm phase, the resources are used to execute the business. In the Cancel phase, the execution is canceled and resources are released. This approach adds some business processing to the two-phase commit, but it increases the complexity of the code because it is split into three interfaces.

The three-phase commit introduces a timeout mechanism and adds a pre-commit attempt phase in the first phase of the two-phase commit, which mainly solves the problems of single point of failure and blocking.

Consensus Algorithms#

Based on fault tolerance types (whether there are malicious nodes), consensus algorithms can be divided into two types: Crash Fault Tolerance (CFT) and Byzantine Fault Tolerance (BFT).

CFT (Crash Fault Tolerance)#

CFT refers to scenarios in distributed systems where there are faulty nodes but no malicious nodes. In such scenarios, messages may be lost or duplicated, but they will not be incorrect. Achieving consensus in such conditions is a common requirement in the real world.

Paxos#

The Paxos algorithm is similar to the two-phase commit. It defines three logical nodes: proposers, acceptors, and learners. The proposers propose proposals, the acceptors vote and accept the proposals, and the learners obtain the results and broadcast them.

Only proposals proposed by proposers can be approved, and all nodes can compete to become proposers. However, in each round of consensus, only one proposer proposes a proposal, which ensures a certain level of fairness.

However, Paxos can only guarantee consensus under certain conditions, such as when more than half of the nodes participate.

Raft#

Due to the difficulty of implementing the Paxos algorithm, many variants have emerged, such as Fast Paxos and Multi-Paxos. One of the more representative variants is the Raft algorithm.

Raft divides the consensus process into leader election, log replication, and safety, and defines three logical nodes: leaders, candidates, and followers.

All nodes start in the follower state, and if they want to participate in leader election, they become candidates and request an election. If they receive votes from more than half of the nodes, they become leaders in the current term.

The leader handles all requests, replicates logs to followers, and periodically sends heartbeat messages to all followers. If a failure occurs and the heartbeat message is not received within the timeout period, a new election process is initiated.

BFT (Byzantine Fault Tolerance)#

Byzantine Fault Tolerance (BFT)#

Byzantine fault tolerance algorithms are mainly used to handle scenarios where there are malicious nodes in the network. They primarily address the Byzantine Generals' Problem and can achieve consensus when the number of malicious nodes is less than or equal to one-third. However, the complexity of these algorithms is very high (exponential).

Practical Byzantine Fault Tolerance (PBFT)#

PBFT is an optimization of the BFT algorithm. It uses cryptographic techniques such as RSA signatures, message verification, and digests, and combines them with related algorithms such as Paxos. As a result, the complexity of the algorithm is reduced to quadratic.

In the implementation of the PBFT algorithm, a random or rotating node is selected as the primary node. The primary node receives client requests within its view and broadcasts them to other nodes (using a three-phase commit mechanism, as mentioned above). When all nodes have completed processing the request, they return the results to the client. If at least 2f + 1 identical results are received from different nodes, consensus is achieved.

Pre-prepare: The primary node receives the message, signs it, and broadcasts it to other nodes.
Prepare: Other nodes receive the message, verify it, sign it, and broadcast it to other nodes. Other nodes also verify it.
Commit: The message is signed and broadcasted to indicate the commit status. If it is validated by 2f + 1 nodes, the system achieves consensus.

Others#

In addition to PBFT, PoW, PoS, HotStuff, and other consensus algorithms are widely used in blockchain projects such as Bitcoin, Ethereum, and Libra, and are continuously being optimized. Due to their low efficiency, Byzantine fault tolerance algorithms are mostly used in public chain environments, while consortium chains often use non-Byzantine fault tolerance methods supplemented by access control mechanisms to balance performance and security.

Conclusion#

The above is a summary of the concepts and practical applications of distributed systems and blockchain consensus mechanisms. In the future, more in-depth analysis will be conducted on various consensus algorithms widely used in the industry.

References#

区块链原理、设计与应用

分布式事务，这一篇就够了

理解 TCC、2PC 和 3PC

【共识专栏】共识的分类（上）

【共识专栏】共识的分类（下）