Untitled

Three main types

Partitioning Methods

split rows by 'difference' (horizontal partitioning)
the 'difference' is a criterion you choose, usually a range of some column's values
- in a DB of zip codes, ZIP codes 0-999 can go in one table and ZIP codes 1000-1999 in another, so the ranges don't overlap.
- in a DB of URLs, rows can be split by first letter: an 'a' table, a 'b' table, a 'c' table...

if the value used for range partitioning isn't chosen carefully, the scheme leads to unbalanced servers.
- e.g. if all of Tokyo falls in the first zip code range, that DB will receive far more load than the other DBs
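
a minimal sketch of the range scheme above, to show both the routing and how a skewed key causes imbalance. The shard names, the range boundaries and the helper names (`shard_for_zip`, `load_per_shard`) are assumptions made up for illustration, not part of any real system.

```python
import bisect
from collections import Counter

# Assumed layout: each shard owns a half-open range [lower, upper) of ZIP codes.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]  # exclusive upper bounds, sorted
SHARD_NAMES = ["zip_0_999", "zip_1000_1999", "zip_2000_2999"]  # hypothetical


def shard_for_zip(zip_code: int) -> str:
    """Return the shard whose range contains zip_code."""
    if zip_code < 0 or zip_code >= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"zip code {zip_code} is outside every range")
    # bisect_right finds the first upper bound strictly greater than zip_code.
    index = bisect.bisect_right(RANGE_UPPER_BOUNDS, zip_code)
    return SHARD_NAMES[index]


def load_per_shard(zip_codes) -> Counter:
    """Count how many requests land on each shard."""
    return Counter(shard_for_zip(z) for z in zip_codes)


# A skewed workload: most requests hit one dense range.
requests = [5, 10, 999, 120, 1500]
print(load_per_shard(requests))
```

with this sample workload four of the five requests land on the first shard, which is the imbalance described above.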

split data by feature (vertical partitioning)
each feature of our system gets its own partition
e.g. Instagram
- data related to users - server 1
- photos they upload - server 2
- people they follow - server 3
more straightforward to implement
has low impact on the application

it may be necessary to further partition a feature-specific DB across several servers (because it's big, e.g. photos or log data are a single feature, but can't be handled by only 1 server)
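
a rough sketch of feature-based routing, including one feature that has outgrown a single server. The server names and the `server_for` helper are hypothetical, and splitting the big feature by `entity_id % number_of_servers` is just one assumed way to do the further partitioning.

```python
# Assumed mapping from feature to the servers that hold it.
# The notes only say "server 1/2/3"; the names below are made up.
FEATURE_SERVERS = {
    "users": ["server-1"],
    "photos": ["server-2a", "server-2b"],  # assumed: photos no longer fit on one server
    "follows": ["server-3"],
}


def server_for(feature: str, entity_id: int) -> str:
    """Pick the server for one entity of a given feature."""
    servers = FEATURE_SERVERS[feature]
    # With one server the modulo is always 0, so small features stay put.
    return servers[entity_id % len(servers)]


print(server_for("users", 42))
print(server_for("photos", 7))
```

the application only needs to know which feature it is touching, which is why this scheme has a low impact on application code.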

use a directory-based lookup server (directory-based partitioning)
a loosely coupled approach that works around the issues of the schemes above
create a lookup service (directory server) that knows your current partitioning scheme and abstracts it away from the DB access code
to find where a particular data entity resides, we query the directory server, which holds the mapping from each tuple key to its DB server (DB server index)
this means we can add servers to the DB pool or change the partitioning scheme without any impact on the application.
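
a small in-memory sketch of the directory idea, to show why the application code stays the same when data moves. `DirectoryService`, `fetch`, `migrate` and the server names are all hypothetical; a real directory would be a separate service and would need its own availability story.

```python
from typing import Dict


class DirectoryService:
    """In-memory stand-in for the lookup service (key -> DB server)."""

    def __init__(self) -> None:
        self._key_to_server: Dict[str, str] = {}

    def assign(self, key: str, server: str) -> None:
        self._key_to_server[key] = server

    def lookup(self, key: str) -> str:
        return self._key_to_server[key]


def fetch(directory: DirectoryService, servers: Dict[str, Dict[str, str]], key: str) -> str:
    """Application-side read: ask the directory first, then read from that server."""
    return servers[directory.lookup(key)][key]


def migrate(directory: DirectoryService, servers: Dict[str, Dict[str, str]],
            key: str, new_server: str) -> None:
    """Move one key to another server and update the directory entry."""
    old_server = directory.lookup(key)
    servers.setdefault(new_server, {})[key] = servers[old_server].pop(key)
    directory.assign(key, new_server)


# Each "server" is modelled as a plain dict for the sake of the sketch.
servers = {"db-0": {"user:1": "alice"}}
directory = DirectoryService()
directory.assign("user:1", "db-0")

print(fetch(directory, servers, "user:1"))
migrate(directory, servers, "user:1", "db-1")  # add a server to the pool
print(fetch(directory, servers, "user:1"))     # same call, data now lives on db-1
```

`fetch` is called the same way before and after the migration; only the directory entry changed.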

WIP