Hash Tables

the simplest way to put it

what you want

  1. i have a value.
  2. i want to make it so that it's really fast to find it

what you can do

what does it do

  1. you use a value from the document/object that you want to store (and entire string, a user email address, any identifier that's unique)
  2. the hash function generates an index from your value, e.g. const dbIndex = hashFunction("tliqun@gmail.com"); // 5
  3. you put your document/object at 'slot' 5 in your database (where slow could be 'row', or 'table' etc. row is the fastest, table allows you to store more than 1 of the same index, useful because.... (read below))
  4. when you need to find it again, just use the same value again to find the document/object. it takes O(1) time (1 step), where the 1 step is the hash function.
  5. of course, there are caveats. read below.

what happens if the index for valueA and valueB return the same index?

hash functions will sometimes return the same index for a given input, which is logical, because there's probably less indices than values in your system. a good hash function will at least have clashes that are equally distributed, e.g. eventually there are ~10 values in each of the indices.

how do i solve it?

2 methods:

everything eventually slows down. how do i tell when it's time to scale up (add more rows)?

you look at your load factor.

Load Factor = Total number of items stored / size of the array

seems there's a default of 0.75 in java

once your load factor goes above the number you set, it's time to expand the array (and rehash all your values).