Partitioning Methods
Three main types:
1. Horizontal Partitioning / Range-Based Partitioning
- split rows across tables (or servers) by the value of a chosen key
- the key and its ranges are something you decide
- in a DB of ZIP codes, ZIP codes 0-999 can go in one table and 1000-1999 in another
- in a DB of URLs, rows can be split by first letter: an 'a' table, a 'b' table, a 'c' table...
problems with this approach
- if the key whose range is used for partitioning isn't chosen carefully, the scheme leads to unbalanced servers
- e.g. if all of Tokyo falls in the ZIP code range 0-999, that DB will receive a much higher load than the other DBs
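The range lookup above can be sketched in a few lines of Python. This is only an illustration: the boundaries, the table names, and the helpers `table_for_zip` and `load_per_table` are made up for these notes, and a real system would likely keep the ranges in config rather than in code.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical exclusive upper bounds for each range, in ascending order.
BOUNDARIES = [1000, 2000, 3000]
# Hypothetical table names, one per range (same order as BOUNDARIES).
TABLES = ["zip_0_999", "zip_1000_1999", "zip_2000_2999"]


def table_for_zip(zip_code: int) -> str:
    """Return the table that holds the given ZIP code."""
    if not 0 <= zip_code < BOUNDARIES[-1]:
        raise ValueError(f"ZIP code out of range: {zip_code}")
    # bisect_right counts how many boundaries are <= zip_code,
    # which is exactly the index of the range the code falls into.
    return TABLES[bisect_right(BOUNDARIES, zip_code)]


def load_per_table(zip_codes) -> Counter:
    """Count how many rows land in each table, to spot unbalanced ranges."""
    return Counter(table_for_zip(z) for z in zip_codes)
```

Running `load_per_table` over a sample of real keys is a cheap way to check for the imbalance problem before committing to a set of ranges.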
2. Vertical Partitioning
- split data by feature
- each feature of the system gets its own partition
- e.g. Instagram
  - data related to users - server 1
  - photos they upload - server 2
  - people they follow - server 3
- more straightforward to implement
- has low impact on the application
problem with this approach
- it may be necessary to further partition a feature-specific DB across several servers (because it's big, e.g. photos or log data are a single feature but can't be handled by only one server)
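A minimal sketch of the feature-to-server routing, using the Instagram example. The server names, `server_for`, and `server_for_photo` are hypothetical, and the modulo split for photos is just one assumed way to spread a feature that has outgrown a single server.

```python
# Hypothetical mapping: one server per feature, as in the Instagram example.
FEATURE_TO_SERVER = {
    "users": "server-1",
    "photos": "server-2",
    "follows": "server-3",
}

# Assumption: photos outgrew one server, so that feature is split again.
PHOTO_SERVERS = ["server-2a", "server-2b"]


def server_for(feature: str) -> str:
    """Route a request to the server that owns the whole feature."""
    if feature not in FEATURE_TO_SERVER:
        raise ValueError(f"unknown feature: {feature!r}")
    return FEATURE_TO_SERVER[feature]


def server_for_photo(photo_id: int) -> str:
    """Pick a photo server by ID; modulo is only one illustrative choice."""
    return PHOTO_SERVERS[photo_id % len(PHOTO_SERVERS)]
```

The first function shows why this scheme is easy to implement (one dictionary lookup); the second shows the extra layer that creeps in once a single feature needs more than one server.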
3. Directory-Based Partitioning
- use a directory server
- a loosely coupled approach that works around the issues in the schemes above
- create a lookup service (directory server) that knows the current partitioning scheme and abstracts it away from the DB access code
- to find where a particular data entity resides, we query the directory server, which holds the mapping from each tuple key to its DB server (DB server index)
- this means we can add servers to the DB pool or change the partitioning scheme without having an impact on the application
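A rough in-memory sketch of the directory idea. `DirectoryService`, `assign`, `lookup`, and `fetch` are names invented for illustration; a real directory server would be a separate, replicated service, not a dictionary inside the application.

```python
from typing import Dict


class DirectoryService:
    """Toy stand-in for a directory server: maps each key to its DB server."""

    def __init__(self) -> None:
        self._key_to_server: Dict[str, str] = {}

    def assign(self, key: str, server: str) -> None:
        """Record (or change) which server holds the given key."""
        self._key_to_server[key] = server

    def lookup(self, key: str) -> str:
        """Return the server holding the key; raises KeyError if unknown."""
        return self._key_to_server[key]


def fetch(directory: DirectoryService, key: str) -> str:
    """Application-side access code: it only ever asks the directory."""
    server = directory.lookup(key)
    # A real implementation would open a connection to `server` here.
    return f"read {key} from {server}"
```

Usage: after `directory.assign("user:42", "db-1")`, `fetch(directory, "user:42")` reads from `db-1`; re-assigning the key to `db-2` changes where `fetch` goes without touching `fetch` itself, which is the decoupling the notes describe.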
Partitioning Criteria
WIP