Distributed and Consistent Hashing – Part 1

Windows Azure Cache (WA Cache) is a distributed in-memory cache. WA Cache provides a simple <Key, Value> API, for example Cache.Put(key, value) and Cache.Get(key). You can think of the WA Cache API as the API of a Hash. This blog post series takes the reader from a traditional one-node Hash to a distributed Hash. This is the first post of the series.

Hashing:

A Hash is an in-memory data structure indexed by key. It is generally implemented as an array of collections. The array is referred to as the Hash Directory (or simply directory), and each collection is referred to as a Hash Bucket (or simply bucket). The array index of a bucket is called its Bucket Id.

Since a key can be of any data type, it is first converted to a number called the HashCode. There is a variety of schemes to generate a HashCode for a given key. The HashCode is then converted to a bucket id by a separate scheme. The simplest scheme is: BucketIndex = HashCode % BucketCount.

In all the examples in this article, we will use numbers as data, and HashCode = Key.

In the diagrams, blue boxes are Hash directories and green boxes are Hash buckets. If we insert 0 to 10, the Hash looks like the diagram below.

Figure 1 : Normal Hash
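As a concrete reference, here is a minimal sketch of such a single-level hash in Java. The class and method names are illustrative (not WA Cache APIs); keys are integers and HashCode = Key, as in our examples.

import java.util.ArrayList;
import java.util.List;

// A minimal single-level hash: an array (the directory) of lists (the buckets).
class SimpleHash {
    private final List<List<Integer>> buckets;

    SimpleHash(int bucketCount) {
        buckets = new ArrayList<>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // HashCode = Key in our examples; BucketIndex = HashCode % BucketCount.
    private int bucketIndex(int key) {
        return key % buckets.size();
    }

    void put(int key) {
        List<Integer> bucket = buckets.get(bucketIndex(key));
        if (!bucket.contains(key)) {
            bucket.add(key);
        }
    }

    boolean get(int key) {
        return buckets.get(bucketIndex(key)).contains(key);
    }
}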

What I have described above is single-level hashing. Let us now delve into multi-level hashing. In multi-level hashing, the leaf level has hash buckets and all other levels are hash directories.

An example of a two-level hash is given below. The data is 0 to 10.

Figure 2 : Multi Level Hash
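One possible way to index such a two-level hash in code (a sketch only; the exact split used in the figure may differ) is to use one part of the HashCode to pick the second-level directory and another part to pick the bucket inside it:

// Hypothetical two-level indexing: the top-level directory entry is chosen first,
// then a bucket inside the second-level directory it points to.
class TwoLevelIndex {
    static int directoryIndex(int hashCode, int directoryCount) {
        return hashCode % directoryCount;                         // which second-level directory
    }

    static int bucketIndex(int hashCode, int directoryCount, int bucketsPerDirectory) {
        return (hashCode / directoryCount) % bucketsPerDirectory; // which bucket inside it
    }
}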

Thus far, we have discussed in-memory hashing (of one or more levels) where all the memory is within a single node. If we were to hash data whose size is in gigabytes (or terabytes), the only way to use this kind of hashing is to have a machine with that much memory/RAM, which, as you know, is very costly. The idea that immediately strikes is: why not use memory from multiple nodes? That is distributed hashing. The capacity of a distributed hash is the sum of the capacities of the node hashes.

Now, with multiple nodes coming into the picture, hashing has to have a provision to identify the node on which to place/retrieve data. Hence, we need (and assume) a unique identifier for each node – the Node Id.

The Distributed Hashing API is no different from the regular Hashing API. That means we need a way to find the node on which a key would be stored. Once we identify the node, we use regular hashing within that node. As you can see, we now have communication over a network in a typical client-server system: each node runs the server side, and the actual user application runs the client side.

Instead of jumping to the perfect solution, let us go step by step, improving as we go.

Simple Client – Simple Server:

In this scheme, the client identifies the node by applying the modulo operator with the node count:

Node Id = Hash Code % Node Count

The client talks to the identified server, and the server then uses its own hashing scheme to get/put the user-supplied key.

Figure 3 : Distributed Hash – Simple Client and Simple Server
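In code, the client-side node selection under this scheme is just the modulo calculation (a sketch; server addresses and the network transport are elided, and the names are illustrative):

// Simple client: pick the node by HashCode % NodeCount, then talk to that node.
class SimpleClient {
    private final int nodeCount;

    SimpleClient(int nodeCount) {
        this.nodeCount = nodeCount;
    }

    // HashCode = Key in our examples.
    int nodeId(int key) {
        return key % nodeCount;
    }
}

For example, with two nodes, key 3 maps to node 3 % 2 = 1.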

Simple, isn't it? Let us find the problems in this model.

The application has been inserting data into the hash, and the distributed hash has run out of capacity (how one detects that the hash has run out of capacity is altogether a separate topic, and I will not digress into it; let us assume there is a mechanism to detect this and inform the user). To increase the capacity, one more server has been added. Just after adding the server, the system looks like this:

Figure 4 : Distributed Hash – Simple Client and Simple Server – Cluster Expansion

Now the client wants to get key 3. Node Id = 3 (Hash Code) % 3 (Node Count) = 0, so the client talks to Server 0. But Server 0 cannot find the key 3, because the key was placed on Server 1 back when there were only two servers (3 % 2 = 1). The client is now confused by the 'key does not exist' answer, as it did put the key 3 into the hash. What's wrong? The client's identification of the node is tied to the server count.

Remedy 1 – Reapply the hashing scheme on the existing data whenever there is a change in the server/node count, and redo the placement. Let us call this procedure 'data movement'. Here is how the distributed hash looks after data movement:

Figure 5 : Distributed Hash – Simple Client and Simple Server – Data Movement

Now the client's get of key 3 succeeds, as Server 0 has the key and returns it. Let us look at the problems with this approach.

Red-colored data represents moved data. As you can observe, around 70% of the data has participated in data movement. That is a huge amount and is inefficient. The problem gets aggravated if nodes go down and come up frequently.
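A small sketch makes the cost visible: it counts how many of the inserted keys change owner when the node count changes under the modulo scheme (the exact fraction depends on the key set, but for small examples most keys move).

// Counts keys whose owning node changes when the node count grows from oldCount
// to newCount under the modulo placement scheme (HashCode = Key in our examples).
class MovementCost {
    static int movedKeys(int[] keys, int oldCount, int newCount) {
        int moved = 0;
        for (int key : keys) {
            if (key % oldCount != key % newCount) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(movedKeys(keys, 2, 3) + " of " + keys.length + " keys moved");
    }
}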

Taking a step back and rethinking the placement of data on nodes may yield better results.

Scheme: Divide the hash code space into 'Node Count' contiguous ranges; each range is owned by exactly one node (range i by node i). The client applies the same range calculation when doing a get/put to identify the node.

A Hash Code is a number; if we consider it a 32-bit integer, the hash code space is [MIN_INT32, MAX_INT32]. For simplicity of our discussion, let us take the Hash Code as a 4-bit unsigned integer, which makes the hash code space [0, 15].
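One way to implement the range split in code (a sketch; the figures may draw the boundaries slightly differently when the space does not divide evenly) is to scale the HashCode into the node range:

// Range-based placement over a hash code space [0, spaceSize - 1]:
// the space is split into nodeCount contiguous ranges and range i is owned by node i.
// (The multiplication is fine for our small 4-bit example; a real implementation
// would need to watch for overflow with 32-bit hash codes.)
class RangeOwner {
    static int ownerNode(int hashCode, int spaceSize, int nodeCount) {
        int owner = (hashCode * nodeCount) / spaceSize;   // which contiguous range the code falls in
        return Math.min(owner, nodeCount - 1);            // guard the top edge against rounding
    }
}

With spaceSize = 16 and nodeCount = 2, this gives Server 0 the range [0-7] and Server 1 the range [8-15].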

When there is only one node, the whole range is owned by that one node (Server 0).

Figure 6 : Distributed Hash – Range Based – One Node Cluster

Let us add one more node. As the number of ranges equals the number of nodes, we have to divide the hash code space into two ranges. As a result, Server 0 now owns [0-7] and Server 1 owns [8-15].

Figure 7 : Distributed Hash – Range Based – Two Node Cluster

We retain the practice of coloring red the data items that participated in data movement. The data moved now is less. Let us add one more node. As a result, we need to divide the whole range into three sub-ranges and move the data.

Figure 8 : Distributed Hash – Range Based – Three Node Cluster – Data Movement

We have reduced the data movement volume, but the data distribution is skewed. Let me introduce another way of distributing data, which may or may not solve the distribution skew.

Instead of assigning servers contiguous node ids, let us assign identifiers with gaps between them, but still within the hash code space. In our example, the hash code space is [0, 15]; let us use server ids 4, 10, and eventually 15. A key is owned by the server whose id is nearest to the key's HashCode. The hash with the two nodes with ids 4 and 10 can be depicted as:

Figure 9 : Distributed Hash – Sparse Node Id and Nearest Node Id

Let us introduce another node with id 15. We treat the hash code space as circular, and hence key 0 is nearer to Server 15 than to Server 4. With this scheme of distribution, the hash is:

Figure 10 : Distributed Hash – Sparse Node Id and Nearest Node Id – Data Movement
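A sketch of the owner lookup under this sparse-node-id scheme, treating the hash code space as circular (node ids 4, 10, and 15 from the example; the helper names are illustrative):

// Nearest-node placement on a circular hash code space [0, spaceSize - 1]:
// a key is owned by the node whose id is closest to the key's HashCode,
// measuring distance around the ring in both directions.
class NearestNodeOwner {
    static int ownerNode(int hashCode, int[] nodeIds, int spaceSize) {
        int owner = nodeIds[0];
        int best = circularDistance(hashCode, nodeIds[0], spaceSize);
        for (int id : nodeIds) {
            int d = circularDistance(hashCode, id, spaceSize);
            if (d < best) {
                best = d;
                owner = id;
            }
        }
        return owner;
    }

    static int circularDistance(int a, int b, int spaceSize) {
        int d = Math.abs(a - b);
        return Math.min(d, spaceSize - d);   // wrap around the circle
    }

    public static void main(String[] args) {
        int[] nodeIds = {4, 10, 15};
        // Key 0 is at circular distance 1 from node 15 but 4 from node 4,
        // so node 15 owns it, as described above.
        System.out.println(ownerNode(0, nodeIds, 16));   // prints 15
    }
}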

The beauty of the data distribution schemes discussed so far is that the client's way of identifying the server where data should be placed/found is the same as the server's way of distributing data. In other words, the client and the server have a common, agreed-upon way of distributing data. If we get rid of this common understanding, we will find other ways of distributing data. That decoupling, and how clients learn the server's distribution model, is the topic of the next blog post.
