Consistent Hashing

Problem:

  1. your database is full, your db system needs to scale up.
  2. your database needs to be partitioned.

How do you partition your database? there are 3 ways to partition your data depending on how you structure your stuff (read more here).

Today we will be talking about just one: horizontal partitioning.

so for example your database (that is currently full) is a nosql document database. you have outgrown the number of rows that one collection can support, and you need more.

There are two ways to perform this split.

  1. range based partitioning - split the rows of your table by something (can be by alphabet, can be by some number range etc)
  2. hash based partioning - generate 'database indexes' for your rows using a hash function, and send your data to those databases

We will talk about hash based partitioning (because i'm still trying to get to the subject of consistent hashing). This works by taking, say, a unique value from your row (e.g. user id, or some other unique value), and mapping it into a number by running it through a function (hash function).

// mutable
function hashFunction(url) {
    let total = 0;
    for (let i=0, i < url.length, i++) {
        // url = "Mia"
        const character = url[i]; // "M"
        const ascii = character.charCodeAt(0); // 77
        total += ascii;
    }
    // some magic using ascii map
    return total;
}

// immutable
function hashFunction2(url) {
    // .map .filter .reduce
    return url.reduce((current, total, index) => {
        const character = url[index]; // "M"
        const ascii = character.charCodeAt(0); // 77
        return total + ascii
    }, 0)
}

const sample = "Mia";
hashFunction(sample); // 297

http://tinyurl.com/tangliqun_5673 --> hashFunction() --> 3

// store
const hashNum = hashFunction('http://tinyurl.com/tangliqun_5673') // 3
const objectToStore = {
  url: 'http://tinyurl.com/tangliqun_5673',
  fullURL: 'http://google.com',
}
const database = [
  undefined,
  undefined,
  undefined,
  objectToStore,
  undefined,
  undefined,
  undefined,
  undefined,
]

database.forEach(obj => {
  if (obj.url === 'http://tinyurl.com/tangliqun_5673') return 'i found it'
})

// ----------------
// finding a url
// ----------------
// 1. user gives me "http://tinyurl.com/tangliqun_5673"

// 2. i run the url through a hash function and get 3
const hashNum = hashFunction('http://tinyurl.com/tangliqun_5673') // 3
// "http://tinyurl.com/tangliqun_5673" === 3

// 3. i look at the database at item 3
const mySearchData = database[3]

// 4, i return the fullURL to user
const toUser = mySearchData.fullURL

this hash function MUST always return the same mapping given the same value.

Some key properties of the hash function used in Hash based partitioning:

we now have a number, which we can use to direct your data to an allocated database.

only now you're stuck with 10 databases, no more, no less. BECAUSE your hash function only returns numbers 1-10, if you add a database called 'database 11', nothing will ever be sent there, because the index doesn't get generated! What do you do? you seriously shouldn't consider rehashing ALL YOUR DATA into numbers 1-11 and then generate all hash mappings again!

This is where consistent hasing (finally) comes in. a good way to think about it is to imagine a circle, upon which your databases sit in evenly spaced distances.

consistent hashing 1

In Consistent Hashing, when the hash table is resized (e.g. a new cache host is added to the system), only ‘k/n’ keys need to be remapped where ‘k’ is the total number of keys and ‘n’ is the total number of servers. Recall that in a caching system using the ‘mod’ as the hash function, all keys need to be remapped.

so now instead of remapping 10 databases worth of data, we plan ahead:

consistent hashing new database

how it works with an additional database:

ref

from medium