Over the last few years, NoSQL databases have emerged as a cutting-edge alternative to traditional relational database systems. Prior to the NoSQL disruption, organizations relied on traditional database systems for all kinds of workloads. A primary drawback of relational databases is that they do not perform well at scale, and the only option for scaling them is to add more resources such as RAM, CPU and disks to the existing machine (referred to as vertical scaling). However, even with vertical scaling, relational databases still struggle to perform well at scale, primarily due to design limitations around third normal form and table joins.
The introduction of NoSQL databases changed this perception about database systems. Most NoSQL databases are based on a distributed architecture, where the system is comprised of a number of low-cost commodity servers. NoSQL databases are known for their scaling ability, whereby we can add more low-cost commodity servers (referred to as horizontal scaling) to improve performance. However, NoSQL databases have their own limitations, such as no ACID compliance and a lack of transactional features, and hence they cannot be thought of as a straightforward replacement for traditional relational database systems.
There are a number of popular NoSQL databases in the market today. In this article, I will talk about one such NoSQL database, Cassandra, which is being widely adopted by businesses. We will explore the framework of Cassandra and some of the key terminologies used in Cassandra.
Before we begin the discussion about Cassandra, we need to understand the foundation behind the design of a distributed database system. That foundation is the CAP theorem (illustrated in the following diagram), which states that it is impossible for a distributed computing system to simultaneously guarantee Consistency, Availability and Partition Tolerance. Therefore, each distributed system must make a trade-off and choose at most two of these three properties.
As per the CAP theorem, a distributed system can guarantee Consistency and Availability (CA) while trading off Partition Tolerance, guarantee Consistency and Partition Tolerance (CP) while trading off Availability, or guarantee Availability and Partition Tolerance (AP) while trading off Consistency.
Cassandra is a highly available and partition-tolerant (AP) distributed database system with tunable (eventual) consistency. In Cassandra, we can tune the consistency on a per-query basis (which would be a topic for another discussion).
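To get a feel for what "tunable consistency" means in practice, the following sketch (plain Python, not Cassandra code) shows how common consistency levels translate into the number of replica acknowledgements a read or write must wait for. The `QUORUM` formula (a strict majority of replicas) is the standard one; the function itself is purely illustrative.

```python
# Illustrative sketch, not real Cassandra driver code: how consistency
# levels map to the number of replica acknowledgements required.

def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a read/write."""
    levels = {
        "ONE": 1,                                  # fastest, weakest
        "QUORUM": replication_factor // 2 + 1,     # strict majority
        "ALL": replication_factor,                 # strongest, least available
    }
    return levels[consistency_level]

# With a replication factor of 3, a QUORUM read or write
# needs acknowledgements from 2 of the 3 replicas.
print(required_acks("QUORUM", 3))  # 2
print(required_acks("ALL", 5))     # 5
```

Because QUORUM reads and QUORUM writes each touch a majority of replicas, any read quorum overlaps any write quorum, which is what lets us trade latency for stronger consistency query by query.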
Cassandra is an open-source distributed database system, built with the understanding that system and hardware failures do occur. Cassandra is designed so that there is no single point of failure. It addresses the problem of system/hardware failure by implementing a peer-to-peer distributed database across homogeneous nodes (servers/machines), where data is distributed across all the nodes in the Cassandra cluster as shown in the following diagram.
In Cassandra, there is no master/slave concept. Each node (server) in a Cassandra cluster is treated equally and performs the same functions. A client read or write request can be routed to any node in the cluster (which then acts as the coordinator for that particular request), regardless of whether that node owns the requested data. Each node exchanges information (using a peer-to-peer gossip protocol) with the rest of the cluster every second, which makes it possible for the nodes to know which node owns which data and the status of the other nodes in the cluster. If a node fails, any other available node will serve the client's request. Cassandra guarantees availability and partition tolerance by replicating data across the nodes in the cluster. We can control the number of replicas (copies of data) maintained in the cluster through the "Replication Factor" (more details in an upcoming section). There is no primary or secondary replica; each replica is equal, and a client request can be served from any available replica.
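The replica placement described above can be sketched in a few lines. This is a simplified model of SimpleStrategy-style placement (ignoring rack and datacenter awareness): the first replica lands on the node that owns the partition's token, and the remaining RF - 1 replicas go to the next nodes walking clockwise around the ring. The node indices here are hypothetical.

```python
# Simplified sketch of replica placement on a ring of nodes:
# the owner gets the first copy, and the next (RF - 1) nodes
# clockwise around the ring get the remaining copies.

def replica_nodes(owner_index: int, num_nodes: int, replication_factor: int):
    """Return indices of the nodes holding replicas of a partition."""
    return [(owner_index + i) % num_nodes for i in range(replication_factor)]

# In a 5-node cluster with Replication Factor 3, data owned by node 4
# is also replicated to nodes 0 and 1 (wrapping around the ring).
print(replica_nodes(4, 5, 3))  # [4, 0, 1]
```

Since every replica is equal, any of the returned nodes can serve a request for that partition, which is what gives the cluster its fault tolerance.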
Since Cassandra is a distributed database, it needs to distribute data across all the nodes (servers) in a cluster to achieve scalability. Cassandra also needs to replicate the same data across multiple nodes in the cluster to guarantee availability and fault (partition) tolerance.
In Cassandra, data is distributed across the cluster using a mechanism called "consistent hashing", which is based upon a token ring. A token ring is a range of values (in Cassandra it ranges from -2^(63) to 2^(63)-1) which is divided into a set of token ranges, and these token ranges are then assigned to the nodes in the cluster. When we write/insert data, the partitioning column (in Cassandra, each row/record has a partitioning column) is hashed using a hashing function, and the hashed value is matched against the token ranges of the nodes to determine which node in the cluster the data should be placed on. This mechanism of hashing against token ranges helps distribute the data evenly across the cluster.
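The hashing-and-lookup step can be illustrated with a toy model. Real Cassandra uses the Murmur3 hash over the -2^(63) to 2^(63)-1 ring; this sketch substitutes MD5 reduced to a small 0–200 ring purely so the numbers are readable, and assumes (for illustration) that each node owns tokens from its own value up to, but not including, the next node's value.

```python
import hashlib

# Toy model of consistent hashing. Assumptions (not real Cassandra
# internals): MD5 instead of Murmur3, a 0..199 ring instead of the
# full 64-bit range, and 5 nodes at evenly spaced tokens.

RING_SIZE = 200
NODE_TOKENS = [0, 40, 80, 120, 160]  # node i owns [token_i, token_{i+1})

def token_for(partition_key: str) -> int:
    """Hash a partition key onto the toy ring."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

def owning_node(partition_key: str) -> int:
    """Find the node whose token range contains the key's token."""
    token = token_for(partition_key)
    # Owner = node with the largest token <= the key's token.
    return max(i for i, t in enumerate(NODE_TOKENS) if t <= token)

key = "user:42"  # hypothetical partition key
print(token_for(key), "-> node", owning_node(key))
```

Because the hash output is uniformly spread over the ring, keys land roughly evenly across the five ranges, which is the even-distribution property the article describes.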
In a Cassandra cluster, the nodes need to be distributed throughout the entire possible token range, from -2^(63) to 2^(63)-1. Each node in the cluster must be assigned a token. This token determines the position of the node in the token ring and its range of data. For instance, if we have a token ring ranging from 0 to 200 and 5 nodes in the cluster, then the tokens assigned to the nodes would be 0, 40, 80, 120 and 160 respectively. The fundamental idea behind token assignment is to balance the data distribution across the nodes in the cluster.
In earlier versions of Cassandra (prior to 1.2), we had to manually assign a token to each node while configuring a Cassandra cluster or adding/removing nodes. In those versions, we generated the tokens based on the number of nodes in the cluster and assigned a single token (and hence a single token range) per node. This type of token assignment has limitations with respect to cluster maintenance, such as adding or removing nodes and rebalancing data across nodes. The following diagram illustrates the manual method of token range generation for an 8-node cluster using the Cassandra token ring (values ranging from -2^(63) to 2^(63)-1).
For a better understanding, the following diagram represents the token range distribution for a 5-node cluster with tokens ranging from 0 to 200.
In version 1.2, Cassandra introduced virtual nodes (vnodes), which eliminate the need for manual token assignment and management. With vnodes, the token ring (-2^(63) to 2^(63)-1) is automatically divided into a number of small token ranges, and each node owns multiple small token ranges rather than just one large one. In a vnode configuration, we just need to specify the number of tokens to be generated for a node (while configuring the cluster or while adding a node), and Cassandra will automatically calculate or recalculate (when adding or removing a node) and distribute the token ranges across the cluster, as well as rebalance the data based on the token range distribution. In a vnode configuration, each node has a high number of small token ranges rather than a single, very large token range (which was the case with earlier versions).
The following diagram illustrates the high number of small token ranges (A to Q) generated by the vnodes method using the Cassandra token ring (values ranging from -2^(63) to 2^(63)-1).
In a vnode setup, by default each node is assigned 256 small token ranges. For instance, if we have a 10-node Cassandra cluster, there will be 2560 (256 * 10) small token ranges generated from the Cassandra token ring (-2^(63) to 2^(63)-1), as opposed to 10 huge token ranges in earlier versions.
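The 2560 figure above is just the node count multiplied by the per-node token count (the `num_tokens` setting, which defaults to 256 in this era of Cassandra). A minimal sketch of that arithmetic:

```python
# Total vnode token ranges in a cluster = nodes * tokens per node.
# 256 matches the num_tokens default discussed in the text.

DEFAULT_NUM_TOKENS = 256

def total_token_ranges(num_nodes: int,
                       tokens_per_node: int = DEFAULT_NUM_TOKENS) -> int:
    """Total small token ranges carved out of the ring for a cluster."""
    return num_nodes * tokens_per_node

print(total_token_ranges(10))  # 2560
```

Because each node holds many small, randomly placed ranges, the load of a failed or newly added node is spread over many peers instead of burdening a single neighbor.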
We have covered the high-level framework of the Cassandra distributed database in the previous sections. Let's talk about some of the key structures/components of a Cassandra cluster, as listed below.
In this article, we have explored the Cassandra distributed database at a high level. We have discussed the foundation behind the design of Cassandra and familiarized ourselves with a number of key terminologies used with it. In the next article, we will take a deep dive into Cassandra's internal database architecture and try to understand how it functions internally.