Open Addressing
It is also known as Close Hashing
In Open Addressing Hashtables, all elements occupy the hash table itself. That is, each table entry contains either an element of the dynamic set (one key per bucket) or NIL.
When searching for an element, we systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. No lists and no elements are stored outside the table, unlike in chaining. Thus, in open addressing, the hash table can “fill up” so that no further insertions can be made; one consequence is that the load factor can never exceed 1
Of course, we could store the linked lists for chaining inside the hash table, in the otherwise unused hash-table slots, but the advantage of open addressing is that it avoids pointers altogether.
Instead of following pointers, we compute the sequence of slots to be examined. The extra memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory, potentially yielding fewer collisions and faster retrieval.
Properties
Load Factor
Theoretical maximum load factor of 1.
The size of the hash table array must always be at least as large as the number of keys in the hash table.
Collisions
To sum up, collisions are dealt with by searching for another empty bucket within the hash table array itself.
- There is NO help from external data structures.
The hash functions used are slightly different from the ones in the closed addressing, but they still respect the SUHA. We face this topic in the next section.
Hash functions

Implementation Techniques
To perform insertion using open addressing, we successively examine, or probe, the hash table until we find an empty slot in which to put the key. The most common types of probing are:
- Linear Probing
- Quadratic Probing
- Double hashing
- Hopscotch hashing
- Robin Hood hashing
- Cuckoo hashing
- 2-Choice hashing
Operations
We assume all the elements
Search/ Inspection/ Probing
To search, inspect or probe an open addressing hashtable all mean the same:
Given an element
, a hash function is computed and its result is a cell's address inside the given hashtable T, such that . The content of the cell is, then, examined (usually for insertion purposes).
Since all the possible elements are in the table, the table must be as large as
For this reason when probing we always start from the index
After computing the hash function
- Successful Search: If the cell
contains , the search succeeded. - Unsuccessful Search: Else if the cell
contains and the search failed. - Busy Cell: Else if the cell
contains a key different from - We compute another cell's index, based on
, the order of inspection , and the previously inspected cells.
- We compute another cell's index, based on
We keep inspecting
- Until we happen to be in the first two cases,
- Or we have already scanned the whole table:
inspections.
Now we can see the hash function, must avoid traversing the whole table for each probe operation. not only depends on the key but also on the order of inspection. This is section is specified in the section Hash Function - Open Addressing
hash_search(T,k)
i = 0
found = false
repeat:
j = h(k, i) # Compute the hash function
if T[j] == k:
found = true
else:
i=i+1
until found or T[j]==NIL or i==m
if found:
# Key in table
return j
else:
# Key not in table
return NIL
$$
**Final Time Complexity**: $T(n) = $
### Hash Insert
Now that we defined how to search, we can proceed with insertion.
```python
hash_insert_v1(T, k)
i = 0
found = false
repeat:
j = h(k, i) # Compute the hash function
if T[j] == NIL:
T[j] = k # We can insert only if it's an empty cell
found = true #Set to true
else:
i=i+1
until found or i==m
if found:
return j
else:
raise_error ("overflow hash table")
$$
**Final Time Complexity**: $T(n) = \mathcal{O}(n)$
* The worst case is upon insertion of an element, when table is already full.
* Usually, when the table reaches a certain low percentage of empty cells, we need to restructure the table and
re-insert all the keys.
* By changing $m$, all the keys also change position and this heavily impacts the performances.
### Delete
Deletion from an open-address hash table is difficult.
#### Deletion Problem
When we delete a key from slot $i$, we cannot simply mark that slot as empty by storing $NIL$ in it. If
we did, we might be unable to retrieve any key $k$ during whose insertion we had probed slot $i$ and found it occupied.
We can solve this problem by marking the slot, storing in it the special value $DELETED$ instead of $NIL$, where $DELETED \neq NIL$.
We would then modify the procedure `hash_insert` to treat such a slot as if it were empty so that we can insert a
new key there.
We do not need to modify `hash_search`, since it will pass over $DELETED$ values while searching.
When we use the special value $DELETED$, however, search times no longer depend on the load factor $\alpha$, and for
this reason chaining is more commonly selected as a collision resolution technique when keys must be deleted.
Example: Say we want to perform these operations in order
* `hash_insert_v1(T,k=118)`
* `hash_delete(T, k=39)`
* `hash_search(T, k=118)`
with $T$ as
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | $NIL$ |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | $39$ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
Would result in:
1. $h(118,0) = 2$, Collision
2. $h(118,0) = 7$, Collision
3. $h(118,0) = 3$, Empty
1. We proceed with insertion
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | **118** |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | $39$ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
Now, we want to remove the key $k=39$
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | **118** |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | ~39~ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
And finally we look for $k=118$:
1. $h(k,0) = 2$, is not the right one
2. $h(k,1) = 7 \Rightarrow NIL$
1. Then $k=118 \notin T$ since we used the value $NIL$ to substitute the previous one upon deletion.
2. By construction of the algorithm this stops the search.
### Rewriting Insert
```python
hash_insert_v2(T, k)
int i = 0, found = 0; #We can set found to a boolean too
do:
int j = h(k, i); # Compute the hash function
if(T[j] == NULL || T[j] == DELETED):
T[j] = k; #We can insert only if it's an empty cell
found = 1; #Set to true/1
else:
i++;
while(i != m && !found); # same as (i==m || found)
if(found):
return j;
else:
raise_error (“overflow hash table”);
$$
**Final Time Complexity**: $T(n) = \mathcal{O}(n)$
## Analysis of Open Addressing
We will use the following assumptions:
1. Simple Uniform Hashing
2. No deletions
The analysis will be done in terms of load factor $0 < alpha \leq 1$ because we only have the cells in our table. Otherwise,
we would have an overflow.
### Theorem - Unsuccessful Search
Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected
number of probes in an **unsuccessful search** is at most $1/(1-\alpha)$, assuming uniform hashing
#### Demonstration
$$\alpha < 1 \Rightarrow \text{ There are some empty cells } \Rightarrow \text{ Search stops when we find one }$$
If we consider the number of inspections at each iteration:
1. First one, $1$
2. Second one, $\frac{n}{m} = \alpha$
3. Third one, $\frac{n}{m} \cdot \frac{n-1}{m-1} \leq \alpha^{2}$
1. Where $\frac{n-1}{m-1} \leq \alpha$
4. n-th one, $\frac{n}{m} \cdot \frac{n-1}{m-1}$
Then, we can find the total inspections as the geometric series
$$
1 + \alpha + \alpha^{2} + \ldots \leq \sum_{i=0}^{\infty} \alpha^{i} \stackrel{|\alpha|<1} \Longrightarrow \sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1-\alpha}
$$
#### Interpretation
If $\alpha$ is constant an unsuccessful search is executed $T_{avg}(n)= \Theta(1)$
* If the table is half full, $\alpha = 0.5 = 1/2 \Rightarrow \text{ Average Inspections } = 1/(1-1/2) = 2$
* If the table is $90%$ full, $\alpha = 0.9 = 9/10 \Rightarrow \text{ Average Inspections } = 1/(1-9/10) = 10$
#### Corollary 11.7
Inserting an element into an open-address hash table with load factor $\alpha$ requires at most $1/(1-\alpha)$
probes on average, assuming uniform hashing.
Demonstration: An element is inserted only if there is room in the table, and thus $\alpha < 1$. Inserting a key
requires an unsuccessful search followed by placing the key into the first empty slot found.
Thus, the expected number of probes is at most $1/(1-\alpha)$ as the previous theorem states.
### Theorem - Successful Search
Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected
number of probes in an **unsuccessful search** is at most $\frac{1}{\alpha} \cdot ln(\frac{1}{1-\alpha})$, assuming
uniform hashing and assuming that each key in the table is equally likely to be searched for.
#### Interpretation
If $\alpha$ is constant a successful search is executed $T_{avg}(n)= \Theta(1)$
* If the table is half full, $\alpha = 0.5 = 1/2 \Rightarrow \text{ Average Inspections } \leq 1.387$
* If the table is $90%$ full, $\alpha = 0.9 = 9/10 \Rightarrow \text{ Average Inspections } \leq 2.559$
## Restructuring
When $\alpha > 0.5$ we double its size, and proceed with inserting the currently existing elements into the new table.
This may call for:
* Rehashing of all these key values
* Transferring all the records
This is because the average time for operations can become quite substantial.
Subsequent dictionary operations will be more efficient and can more than make up for the overhead in creating
the larger table. Usually it doubles the size. [1]
### Conclusions
Pros:
* No size overhead apart from the hash table array.
* Better memory locality and cache performance. All elements laid out linearly in memory.
* Performs better than closed addressing when the number of keys is known in
advance and the churn is low.
Cons:
* Chaining results better performing than this implementation ($\alpha > 2$ )
---
### Extra Credits
* [1] [Restructuring](https://gtl.csa.iisc.ac.in/dsa/node51.html)hash_search(T,k)
i = 0
found = false
repeat:
j = h(k, i) # Compute the hash function
if T[j] == k:
found = true
else:
i=i+1
until found or T[j]==NIL or i==m
if found:
# Key in table
return j
else:
# Key not in table
return NIL
$$
**Final Time Complexity**: $T(n) = $
### Hash Insert
Now that we defined how to search, we can proceed with insertion.
```python
hash_insert_v1(T, k)
i = 0
found = false
repeat:
j = h(k, i) # Compute the hash function
if T[j] == NIL:
T[j] = k # We can insert only if it's an empty cell
found = true #Set to true
else:
i=i+1
until found or i==m
if found:
return j
else:
raise_error ("overflow hash table")
$$
**Final Time Complexity**: $T(n) = \mathcal{O}(n)$
* The worst case is upon insertion of an element, when table is already full.
* Usually, when the table reaches a certain low percentage of empty cells, we need to restructure the table and
re-insert all the keys.
* By changing $m$, all the keys also change position and this heavily impacts the performances.
### Delete
Deletion from an open-address hash table is difficult.
#### Deletion Problem
When we delete a key from slot $i$, we cannot simply mark that slot as empty by storing $NIL$ in it. If
we did, we might be unable to retrieve any key $k$ during whose insertion we had probed slot $i$ and found it occupied.
We can solve this problem by marking the slot, storing in it the special value $DELETED$ instead of $NIL$, where $DELETED \neq NIL$.
We would then modify the procedure `hash_insert` to treat such a slot as if it were empty so that we can insert a
new key there.
We do not need to modify `hash_search`, since it will pass over $DELETED$ values while searching.
When we use the special value $DELETED$, however, search times no longer depend on the load factor $\alpha$, and for
this reason chaining is more commonly selected as a collision resolution technique when keys must be deleted.
Example: Say we want to perform these operations in order
* `hash_insert_v1(T,k=118)`
* `hash_delete(T, k=39)`
* `hash_search(T, k=118)`
with $T$ as
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | $NIL$ |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | $39$ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
Would result in:
1. $h(118,0) = 2$, Collision
2. $h(118,0) = 7$, Collision
3. $h(118,0) = 3$, Empty
1. We proceed with insertion
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | **118** |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | $39$ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
Now, we want to remove the key $k=39$
| index $i$ | key $k$ |
|----------- |---------- |
| $0$ | $15$ |
| $1$ | $25$ |
| $2$ | $NIL$ |
| $3$ | **118** |
| $\ldots$ | $\ldots$ |
| $5$ | $NIL$ |
| $6$ | $NIL$ |
| $7$ | ~39~ |
| $\ldots$ | $\ldots$ |
| $n$ | $NIL$ |
And finally we look for $k=118$:
1. $h(k,0) = 2$, is not the right one
2. $h(k,1) = 7 \Rightarrow NIL$
1. Then $k=118 \notin T$ since we used the value $NIL$ to substitute the previous one upon deletion.
2. By construction of the algorithm this stops the search.
### Rewriting Insert
```python
hash_insert_v2(T, k)
int i = 0, found = 0; #We can set found to a boolean too
do:
int j = h(k, i); # Compute the hash function
if(T[j] == NULL || T[j] == DELETED):
T[j] = k; #We can insert only if it's an empty cell
found = 1; #Set to true/1
else:
i++;
while(i != m && !found); # same as (i==m || found)
if(found):
return j;
else:
raise_error (“overflow hash table”);
$$
**Final Time Complexity**: $T(n) = \mathcal{O}(n)$
## Analysis of Open Addressing
We will use the following assumptions:
1. Simple Uniform Hashing
2. No deletions
The analysis will be done in terms of load factor $0 < alpha \leq 1$ because we only have the cells in our table. Otherwise,
we would have an overflow.
### Theorem - Unsuccessful Search
Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected
number of probes in an **unsuccessful search** is at most $1/(1-\alpha)$, assuming uniform hashing
#### Demonstration
$$\alpha < 1 \Rightarrow \text{ There are some empty cells } \Rightarrow \text{ Search stops when we find one }$$
If we consider the number of inspections at each iteration:
1. First one, $1$
2. Second one, $\frac{n}{m} = \alpha$
3. Third one, $\frac{n}{m} \cdot \frac{n-1}{m-1} \leq \alpha^{2}$
1. Where $\frac{n-1}{m-1} \leq \alpha$
4. n-th one, $\frac{n}{m} \cdot \frac{n-1}{m-1}$
Then, we can find the total inspections as the geometric series
$$
1 + \alpha + \alpha^{2} + \ldots \leq \sum_{i=0}^{\infty} \alpha^{i} \stackrel{|\alpha|<1} \Longrightarrow \sum_{i=0}^{\infty} \alpha^{i} = \frac{1}{1-\alpha}
$$
#### Interpretation
If $\alpha$ is constant an unsuccessful search is executed $T_{avg}(n)= \Theta(1)$
* If the table is half full, $\alpha = 0.5 = 1/2 \Rightarrow \text{ Average Inspections } = 1/(1-1/2) = 2$
* If the table is $90%$ full, $\alpha = 0.9 = 9/10 \Rightarrow \text{ Average Inspections } = 1/(1-9/10) = 10$
#### Corollary 11.7
Inserting an element into an open-address hash table with load factor $\alpha$ requires at most $1/(1-\alpha)$
probes on average, assuming uniform hashing.
Demonstration: An element is inserted only if there is room in the table, and thus $\alpha < 1$. Inserting a key
requires an unsuccessful search followed by placing the key into the first empty slot found.
Thus, the expected number of probes is at most $1/(1-\alpha)$ as the previous theorem states.
### Theorem - Successful Search
Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected
number of probes in an **unsuccessful search** is at most $\frac{1}{\alpha} \cdot ln(\frac{1}{1-\alpha})$, assuming
uniform hashing and assuming that each key in the table is equally likely to be searched for.
#### Interpretation
If $\alpha$ is constant a successful search is executed $T_{avg}(n)= \Theta(1)$
* If the table is half full, $\alpha = 0.5 = 1/2 \Rightarrow \text{ Average Inspections } \leq 1.387$
* If the table is $90%$ full, $\alpha = 0.9 = 9/10 \Rightarrow \text{ Average Inspections } \leq 2.559$
## Restructuring
When $\alpha > 0.5$ we double its size, and proceed with inserting the currently existing elements into the new table.
This may call for:
* Rehashing of all these key values
* Transferring all the records
This is because the average time for operations can become quite substantial.
Subsequent dictionary operations will be more efficient and can more than make up for the overhead in creating
the larger table. Usually it doubles the size. [1]
### Conclusions
Pros:
* No size overhead apart from the hash table array.
* Better memory locality and cache performance. All elements laid out linearly in memory.
* Performs better than closed addressing when the number of keys is known in
advance and the churn is low.
Cons:
* Chaining results better performing than this implementation ($\alpha > 2$ )
---
### Extra Credits
* [1] [Restructuring](https://gtl.csa.iisc.ac.in/dsa/node51.html)