Vector Clock & Applications

7 min readJun 19, 2021

Overview

Ordering operations in distributed systems is challenging task. As there is no shared global physical clock in distributed systems, it’s difficult to resolve conflicts in the state of the system. Vector clocks are used to tackle the above challenge and determine the order of operations in a distributed system.

Logical Clock

Logical Clock, measures the passing of time in terms of logical op- erations, not wall-clock time. The simplest possible logical clock is a counter, which is incremented before an operation is executed. Doing so ensures that each operation has a distinct logical times- tamp. If two operations execute on the same process, then neces- sarily one must come before the other, and their logical timestamps will reflect that.

Imagine 3 processes P1, P2 & P3. Every process in the system has its own local logical clock implemented with a numerical counter that follows specific rules:

The counter is initialized with 0.
The process increments its counter before executing an operation.
When the process sends a message, it increments its counter and sends a copy of it in the message.
When the process receives a message, its counter is updated to 1 plus the maximum of its current logical timestamp and the message’s timestamp.

It guarantees if an operation ‘X’ happened before operation ‘Y’ then logical timestamp of X should be less than Y, but the vice-versa is not true. In above example operation E doesn’t happen before C but it still has a lesser timestamp. Hence, we cannoy comment anything about C by just looking at the clock. But vector clocks do ensure this, let’s see how they work.

Vector Clock

A vector clock is a logical clock that guarantees that if two opera- tions can be ordered by their logical timestamps, then one must have happened-before the other. A vector clock is implemented with an array of counters, one for each process in the system. And similarly to how Lamport clocks are used, each process has its own local copy of the clock.

For example, if the system is composed of 3 processes 𝑃 1, 𝑃 2, and 𝑃 3, each process has a local vector clock implemented with an ar- ray7 of 3 counters [𝐶𝑃 1, 𝐶𝑃 2, 𝐶𝑃 3]. The first counter in the array is associated with 𝑃 1, the second with 𝑃 2, and the third with 𝑃 3.

A process updates its local vector clock based on the following rules:

Initially, the counters in the array are set to 0.
When an operation occurs, the process increments its own counter in the array by one.
When the process sends a message, it increments its own counter in the array by one and sends a copy of the array with the message.
When the process receives a message,it merges the array it received with the local one by taking the maximum of the two arrays element-wise. Finally, it increments its own counter in the array by one.

Vector clock timestamps can be partially ordered, (given two operations X and Y with timestamps 𝑇1 and 𝑇2), if:

If every counter of X is less than or equal to corresponding counter in Y.
There is atleast one counter in X that is strictly less than corresponding counter in Y.

In above example E [0,0,1] anc C[2,1,0] do not satisty above constraints anc cannot be ordered. Hence E & C are concurrent.

Inconsistency Resolution in Databases

Vector clocks act act as a versioning system fot every operation and are commonly used in BASE databases like Cassandra, DynamoDB etc. The problem of version conflict in such databases are common because of concurrent updates they receive and these databases are highly distributed. Vector clocks can tell wether states are conflicted or not and conflict can be resolved by some other entity or client.

Imagine a database cluster with 3 write nodes or servers [S1, S2, S3] i.e any of these 3 servers can cater to a write. Below is the scenario

A client writes to S1 some data item as D1.
Another client reads D1 and updates it to D2 and writes it back. S1 handles this write.
Now two clients read D2 concurrently and update it. One of them update data to D3 and write it, which is handled by S2. On the otherhand second updates it to D4 and write it.

The vector clocks of D3[2,1,0] & D4[2,0,1] indicate that these two are concurrent events and show a conflict as these vector violate our two constraints discussed above.

Conflicts can be resolved by either client or some consensus algorithms.

Collaborative Editing

Ellis and Gibbs’s paper involves the use of a vector clock to order the operations generated on different sites (each site or user is associated with a single document replica). The paper defines a Convergence Property and a Precedence Property and a notion of quiescence to describe the correctness of the algorithm. The authors define the properties as follows:

The Precedence Property states that if one operation, o, precedes another, p, then at each site the execution of o happens before the execution of p.

A groupware session is quiescent iff all generated operations have been executed at all sites, that is, there are no requests in transit or waiting to be executed by a site process.

The Convergence Property states that site objects are identical at all sites at quiescence

The Distributed Operational Transformation (dOPT) algorithm

For every change that a site makes to its replica, a request is generated and sent to other sites. To achieve the two properties the design uses a request queue Qi and a request log Li, where the subscript i is the site identifier.

The request queue contains operation requests either sent by remote sites. These are requests that are waiting to be processed, and the queue acts as a buffer where all incoming requests are stored for further processing.
The request log, on the other hand, is a log of requests that have been executed by the site. The log is a list of requests ordered by the order in which the requests were executed.

Each request has the form <i, s, o, p>, where i represents the site identifier and s represents the state vector of the site i. The state vector is essentially a vector clock that specify when an operation was executed on site j and its relation to the operations in the request queue in site i. o represents the operation to be performed (insert or delete). Finally, p specifies the priority of the operation.

The algorithm uses state vectors to order events causally. Given two state vectors si and sj:

si = sj if each component of si is equal to the corresponding component of sj.
si < sj if each component of si is less than or equal to the corresponding component of sj and at least one component of si is less than the corresponding component in sj.
si > sj if at least one component of si is greater than the corresponding component in sj.

The algorithm defines three possible execution states:

Operation request generation,
Operation request reception, and
Operation execution.

During operation request generation the site i generates an insert or delete operation. The operation is not executed immediately; the local data is not modified during operation request generation. Once the request is generated, it is appended to the site’s request queue Qi and broadcast to all other sites.

Generate operation <i ,si , o, p>
Qi := Qi + <i ,si , o, p>

A request generated on a site j is eventually received by site i which then moves to the “operation request reception” state. In this state, the received operations are appended to the site’s request queue.

Receive operation request from remote site j: <j ,sj , oj, pj >
Qi := Qi + <j ,sj , oj, pj >

During operation execution, requests from the request queue are processed. The order of execution of requests in the request queue is determined by the total order of events in the request queue as determined by the comparison of the state vectors. Briefly, in this step the operation from the request queue is chosen based on the executed operations in the request log. We locate the operation older than the current state vector at site i. Transformation is performed based on the operation logs in the request log. Comparison of the state vectors follows the conditions stated previously:

If sj > sj, this means that the site j has executed operations which site i has not seen yet. So this operation will have to stay in the queue till all operations between i and j have been executed.
If sj = sj, the two state vectors are identical and operation oj can be executed without transformation.
If sj < sj, site i has executed operations not seen by site j. The operation can be applied immediately, but requires operations to be transformed because other changes not visible to site j have already been executed by site i.