TiKV

TiKV is an open-source distributed key-value database based on the designs of Google Spanner and HBase, but much simpler, with no dependency on any distributed file system. Its primary features include geo-replication, horizontal scalability, consistent distributed transactions, and Coprocessor support.

History of TiKV

Inspired by Google Spanner, PingCAP released the first version of TiKV along with TiDB in 2016. TiKV was accepted by the Cloud Native Computing Foundation (CNCF) as a Sandbox project in August 2018 and as an Incubating project in May 2019.

Architecture of TiKV

TiKV implements a multi-raft-group replica mechanism based on the design of Google Spanner. A Region is the basic unit of key-value data movement and refers to a data range in a Store. Each Region is replicated to multiple nodes, and these replicas form a Raft group. A replica of a Region is called a Peer; typically there are 3 Peers in a Region, one of which is the leader, which provides the read and write services. The PD component balances all the Regions automatically to guarantee that read and write throughput is balanced among all the nodes in the TiKV cluster. With PD and carefully designed Raft groups, TiKV excels in horizontal scalability and can easily scale to store more than 100 TB of data.

  • Placement Driver (PD): the cluster manager of TiKV, which periodically checks replication constraints to balance load and data automatically.
  • Store: each Store contains a RocksDB instance and stores data on the local disk.
  • Region: the basic unit of key-value data movement. Each Region is replicated to multiple Nodes, and these replicas form a Raft group.
  • Node: a physical node in the cluster. Within each Node there are one or more Stores, and within each Store there are many Regions.
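
To make the terms above concrete, here is a minimal Rust sketch of the Region/Peer model. The types, fields, and key layout are illustrative assumptions, not TiKV's actual internal definitions:

```rust
// Illustrative sketch of the Region/Peer model described above.
// These types are hypothetical, not TiKV's real internals.

/// A replica of a Region living on one Store.
struct Peer {
    store_id: u64,
    is_leader: bool,
}

/// A contiguous key range, replicated as one Raft group.
struct Region {
    id: u64,
    start_key: Vec<u8>, // inclusive
    end_key: Vec<u8>,   // exclusive; empty means "unbounded"
    peers: Vec<Peer>,   // typically 3
}

impl Region {
    /// Whether a key falls into this Region's range.
    fn contains(&self, key: &[u8]) -> bool {
        key >= self.start_key.as_slice()
            && (self.end_key.is_empty() || key < self.end_key.as_slice())
    }
}

fn main() {
    let region = Region {
        id: 1,
        start_key: b"a".to_vec(),
        end_key: b"m".to_vec(),
        peers: vec![
            Peer { store_id: 1, is_leader: true },
            Peer { store_id: 2, is_leader: false },
            Peer { store_id: 3, is_leader: false },
        ],
    };
    assert!(region.contains(b"hello"));
    assert!(!region.contains(b"zebra"));

    // The leader Peer serves reads and writes for this Region.
    let leader = region.peers.iter().find(|p| p.is_leader).unwrap();
    println!("leader for region {} is on store {}", region.id, leader.store_id);
}
```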


Region and RocksDB

There is a RocksDB database within each Store, and it stores data on the local disk. All the Region data are stored in the same RocksDB instance in each Store, and all the logs used for the Raft consensus algorithm are stored in another RocksDB instance in each Store. This is because the performance of sequential I/O is better than that of random I/O. With separate RocksDB instances for Raft logs and Region data, TiKV combines the write operations of Raft logs and TiKV Regions into single I/O operations to improve performance.
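
As a rough illustration of this split, here is a sketch assuming the community `rocksdb` Rust crate. The paths, key layout, and values are made-up examples, not TiKV's real on-disk format:

```rust
// Sketch: one RocksDB instance for Raft logs, another for Region data,
// mirroring the split described above. Keys and paths are illustrative.
use rocksdb::{WriteBatch, DB};

fn main() -> Result<(), rocksdb::Error> {
    let raft_db = DB::open_default("/tmp/tikv-demo-raft")?; // sequential Raft log writes
    let kv_db = DB::open_default("/tmp/tikv-demo-kv")?;     // Region key-value data

    // Append a Raft log entry (hypothetical key encoding: region 1, index 42).
    let mut raft_batch = WriteBatch::default();
    raft_batch.put(b"raft_r1_i42", b"put foo=bar");
    raft_db.write(raft_batch)?; // batched log entries go down as one write

    // Once the entry is committed by the Raft group, apply it to the data instance.
    let mut kv_batch = WriteBatch::default();
    kv_batch.put(b"foo", b"bar");
    kv_db.write(kv_batch)?;
    Ok(())
}
```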

Consensus algorithm

When building a distributed system, one principal goal is to build in fault tolerance. That is, if one particular node in the network goes down, or if there is a network partition, the system should continue to operate in a consistent way; i.e., the nodes in the system should reach a consensus on the state of the system. The consensus should be considered final once it is reached, even if some nodes were in faulty states at the time of the decision.

Distributed consensus algorithms often take the form of a replicated state machine and log. Each state machine accepts inputs from its log and represents the value(s) to be replicated, for example, a change to a hash table. They allow a collection of machines to work as a coherent group that can survive the failures of some of its members.

Two well-known distributed consensus algorithms are Paxos and Raft. Paxos is used in systems like Google's Chubby, and Raft is used in systems like TiKV and etcd. Raft is generally seen as more understandable and simpler to implement than Paxos.

In TiKV we harness Raft for distributed consensus. We found it much easier to understand both the algorithm itself and how it will behave in even truly perverse scenarios.
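
To illustrate the replicated state machine model (not Raft itself; leader election and log replication are omitted), here is a toy Rust sketch. The `Command` and `StateMachine` types are hypothetical:

```rust
// A toy replicated state machine: each node applies the same log of
// commands in the same order, so all nodes converge to the same state.
// The consensus algorithm's job is to make every node agree on this log.
use std::collections::HashMap;

enum Command {
    Put { key: String, value: String },
    Delete { key: String },
}

#[derive(Default)]
struct StateMachine {
    data: HashMap<String, String>,
}

impl StateMachine {
    fn apply(&mut self, cmd: &Command) {
        match cmd {
            Command::Put { key, value } => {
                self.data.insert(key.clone(), value.clone());
            }
            Command::Delete { key } => {
                self.data.remove(key);
            }
        }
    }
}

fn main() {
    // An agreed-upon log of inputs, e.g. changes to a hash table.
    let log = vec![
        Command::Put { key: "a".into(), value: "1".into() },
        Command::Put { key: "b".into(), value: "2".into() },
        Command::Delete { key: "a".into() },
    ];

    let mut node1 = StateMachine::default();
    let mut node2 = StateMachine::default();
    for cmd in &log {
        node1.apply(cmd);
        node2.apply(cmd);
    }
    assert_eq!(node1.data, node2.data); // identical state on every node
}
```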

Distributed transaction

As TiKV is a distributed transactional key-value database, transactions are a core feature of TiKV. In this chapter we will talk about general implementations of distributed transactions and some implementation details in TiKV.

A database transaction, by definition, must be atomic, consistent, isolated, and durable. Database practitioners often refer to these properties of database transactions using the acronym ACID.

Transactions provide an "all-or-nothing" proposition: each unit of work performed in a database must either complete in its entirety or have no effect whatsoever. Furthermore, the system must isolate each transaction from other transactions, results must conform to existing constraints in the database, and transactions that complete successfully must be written to durable storage.

A distributed transaction is a database transaction in which two or more network hosts are involved. Usually, hosts provide transactional resources, while the transaction manager is responsible for creating and managing a global transaction that encompasses all operations against such resources. Distributed transactions, like any other transactions, must have all four ACID properties.

A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC), sketched below.
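
Here is a toy sketch of the 2PC shape in Rust. The `Participant` type and its methods are hypothetical; TiKV's actual transaction model (based on Google's Percolator) layers timestamps and locking on top of this idea:

```rust
// Toy two-phase commit: the coordinator asks every participant (host) to
// prepare, and commits only if all of them vote yes; otherwise it aborts.
// This shows the protocol shape only, not TiKV's real implementation.

struct Participant {
    name: &'static str,
    can_commit: bool, // e.g. locks acquired, constraints satisfied
}

impl Participant {
    fn prepare(&self) -> bool {
        // Phase 1: vote yes only if the local work can be made durable.
        self.can_commit
    }
    fn commit(&self) {
        println!("{}: commit", self.name);
    }
    fn abort(&self) {
        println!("{}: abort", self.name);
    }
}

fn two_phase_commit(participants: &[Participant]) -> bool {
    // Phase 1 (prepare): all participants must vote yes.
    if participants.iter().all(|p| p.prepare()) {
        // Phase 2 (commit): apply the transaction everywhere.
        participants.iter().for_each(|p| p.commit());
        true
    } else {
        // Any "no" vote aborts the whole transaction: all or nothing.
        participants.iter().for_each(|p| p.abort());
        false
    }
}

fn main() {
    let hosts = [
        Participant { name: "host-a", can_commit: true },
        Participant { name: "host-b", can_commit: true },
    ];
    assert!(two_phase_commit(&hosts));
}
```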

Why TiKV?

Because it has the following properties:

  • High scalability
  • Low and stable latency
  • Easy to use
  • Easy to maintain
  • Adjustable consistency
  • Consistent distributed transactions

Useful links: