-Intro to Hash Tables
-Handling Collisions
-Hash Table Keys and Values
Intro to Hash Tables
Hash tables are used to store key-value pairs.
They are similar to arrays but there is no required order.
Hash tables have a great speed so they are widely used.
Hash tables have different names in other languages but the functionality is the same.
Python hash tables are called dictionaries.
JavaScript hash tables are called objects and maps.
function hash(key, arrayLen) {
let total = 0;
for (let char of key) {
// map "a" to 1, "b" to 2, "c" to 3, etc.
let value = char.charCodeAt(0) - 96
total = (total + value) % arrayLen;
}
return total;
}
Handling Collisions
There are two ways to handle hash table collisions.
- Separate Chaining
- Linear Probing
Separate Chaining
is when the hash table has each index in the array has stored values with more data similar to an array or a linked list.
Linear Probing
When a collision is found, a search is performed throughout the array to find the next empty slot.
Hash Table Keys and Values
Keys - Loops through the hash table array and returns an array of keys in the table
Values - Loops through the hash table array and returns an array of values in the table
keys(){
let keysArr = [];
for(let i = 0; i < this.keyMap.length; i++){
if(this.keyMap[i]){
for(let j = 0; j < this.keyMap[i].length; j++){
if(!keysArr.includes(this.keyMap[i][j][0])){
keysArr.push(this.keyMap[i][j][0])
}
}
}
}
return keysArr;
}
values(){
let valuesArr = [];
for(let i = 0; i < this.keyMap.length; i++){
if(this.keyMap[i]){
for(let j = 0; j < this.keyMap[i].length; j++){
if(!valuesArr.includes(this.keyMap[i][j][1])){
valuesArr.push(this.keyMap[i][j][1])
}
}
Top comments (6)
I don't think your implementation is correct.
When implementing keys() you shouldn't check for duplicates.
Just iterate linearly add add to keysArr. That's because keysArr.includes() is an O(N) operation, so your keys() implementation is O(N^2) which is really bad.
In values() it's the same. You don't need to check for duplicates, in fact duplicates are possible.
I assume you're using separate chaining. There are also other techniques like coalesced chaining where you would keep a certain amount of space preallocated for collisions, but if that runs out you rehash.
Also probing can be linear, quadratic or you can use double hashing (a second hash function) which can reduce clustering.
From a user's perspective it's important to understand what can cause collisions so that a hash table doesn't become a linked list. To avoid that sort of issue, I generally only use primitives as keys or very carefully crafted custom objects (like UUID). I don't casually use a custom object as a key.
Probably in a language like C probing techniques might be better because you can get a more compact RAM by allocating all at once (giving you better cache performance) whereas separate chaining would come with a fragmentation cost.
In languages like Java (and probably V8 and .Net as well) separate chaining is not so bad because allocating is very cheap with a copy collector and memory is compacted (for long lived memory) so lookups will be very fast since the next object is probably already in the CPU cache.
That's probably why Java doesn't mind using a list to resolve collisions. Though there's a space penalty for pointers.
I will save this for future purposes. Thank you so much!
if anyone wants to read more about this: geeksforgeeks.org/hash-table-data-...
Awesome introduction to hashing!
Jumping from a high level definition of a hash table straight into collision handling is confusing for somebody who is new to hash tables.