Hash Tables
the simplest way to put it
what you want
- i have a value.
- i want to make it so that it's really fast to find it
what you can do
- use a hash table / hash map.
what does it do
- you use a value from the document/object that you want to store (and entire string, a user email address, any identifier that's unique)
- the hash function generates an index from your value, e.g.
const dbIndex = hashFunction("tliqun@gmail.com"); // 5 - you put your document/object at 'slot' 5 in your database (where slow could be 'row', or 'table' etc. row is the fastest, table allows you to store more than 1 of the same index, useful because.... (read below))
- when you need to find it again, just use the same value again to find the document/object. it takes O(1) time (1 step), where the 1 step is the hash function.
- of course, there are caveats. read below.
what happens if the index for valueA and valueB return the same index?
hash functions will sometimes return the same index for a given input, which is logical, because there's probably less indices than values in your system. a good hash function will at least have clashes that are equally distributed, e.g. eventually there are ~10 values in each of the indices.
how do i solve it?
2 methods:
- open addressing: where you jump to the next available empty slot using another function (probing). examples:
- linear probing: you check the next index, then the next etc
- quadratic probing: you check the next index, then 2 indices away, then 4 indices away, then 16 indices away...
- plus 3 rehash
- double hashing: you hash your value again, this time with something else added (e.g. the result of the first hash) and hope for the best
- closed addressing: where you use the same slot, but your slot stores more than 1 data
- store it as an array in the slot. you need a lotta memory to keep empty space if you use arrays.. that's why you...
- store it as a linked list. each slot only takes up as much memory as is needed to store the data. this makes it a linked hash map.
- eventually your data takes O(n) time to find, which sorta defeats the purpose of a hash table
everything eventually slows down. how do i tell when it's time to scale up (add more rows)?
you look at your load factor.
Load Factor = Total number of items stored / size of the array
seems there's a default of 0.75 in java
once your load factor goes above the number you set, it's time to expand the array (and rehash all your values).