Hash Tables

Hash tables are used to efficiently store and retrieve unique data. Ideally, these operations would run in $O(1)$, but that may not be the case. In this section, we use a large fixed-size vector to store the data.

Hash Function

Given an element $x$, we use a hash function to map $x$ into an index $i$, $0 \leq i < n$, where $n$ is the size of the table. If the hash function runs in $O(1)$, we can store and retrieve $x$ at the index $i$ of the table in $O(1)$. If two elements are mapped to the same index, however, a collision will occur, requiring more computational time. To reduce the chance of collision, we must use a large table and a good hash function that spreads evenly the data across the indices. How good a hash function is depends on the particular data.

We can make a simple and universal hash function by converting the binary data of $x$ into a large binary number, and then computing this number modulo $n$. To lower the chances of collision, we should choose a prime number for $n$, as suggested in the table below.

Intended size	Hash table size
1,000	1,009
10,000	10,007
100,000	100,003
1,000,000	1,000,003

const int MAX_SIZE = 10007;

int hashcode(void *value, int value_size) {
    int index, code;
    for (index = 0, code = 0; index < value_size; index++) {
        unsigned char *byte = index + (unsigned char *) value;
        code = (code << 8 + *byte) % MAX_SIZE;
    }
    return code;
}

Collision

A collision happens when two elements are mapped to the same index by a hash function. There are many strategies to deal with it, and we present two simple strategies. Let $n$ be the size of the hash table, and suppose that it will store $O(n)$ elements in the worst case.

Linked list. We store a simple linked list in each entry of the table, which will contain all elements that have the same hash value. In the worst case, retrieving an element would cost $O(n)$.

Static collision. We use a hash function that produces a sequence of different indices. If there is a collision with the first index, we try the next one and so forth, until there is no collision. Given a hash function $f$, we can create this new hash function $g$ as:

\[g(x, t) = f(x) + k \cdot t \mod{n},\]

where $k$ and $n$ are relative primes. In the worst case, retrieving an element would cost $O(n)$.

Structure

We present two different structures for the hash table, one for each collision strategy. In the examples below we use a simple int value as data, but in a real scenario a more complex object would be stored.

Linked list strategy

typedef struct node_s {
    int value;
    struct node_s *next;
} node_t;

node_t hashtable[MAX_SIZE];

void init(node_t table[]) {
    int i;
    for (i = 0; i < MAX_SIZE; i++) {
        table[i].next = NULL;
    }
}

Static collision strategy

#include <assert.h>

typedef struct {
    int value;
    char used;
} elem_t;

elem_t hashtable[MAX_SIZE];

void init(elem_t table[]) {
    int i;
    for (i = 0; i < MAX_SIZE; i++) {
        table[i].used = 0;
    }
}

int get_index(elem_t hash[], int value) {
    int code = hashcode(&value, sizeof (int));
    int index = code;
    while (hash[index].used && hash[index].value != value) {
        index = (index + 2) % MAX_SIZE;
        assert(index != code); /* overflow */
    }
    return index;
}

Insertion

Insert an element in the position computed by the hash function.

Linked list strategy

#include <stdlib.h>

void insert(node_t table[], int value) {
    int index = hashcode(&value, sizeof (int));
    node_t *node = (node_t *) malloc(sizeof (node_t));
    node->value = value;
    node->next = table[index].next;
    table[index].next = node;
}

Static collision strategy

void insert(elem_t table[], int value) {
    int index = get_index(table, value);
    table[index].value = value;
    table[index].used = 1;
}

Search

Search and retrieve an element from the hash table.

Linked list strategy

node_t * get_prior(node_t *node, int value) {
    while (node->next != NULL && node->next->value != value) {
        node = node->next;
    }
    return node;
}

int * search(node_t table[], int value) {
    int index = hashcode(&value, sizeof (int));
    node_t *prior = get_prior(&table[index], value);
    return prior->next != NULL ? &prior->next->value : NULL;
}

Static collision strategy

int * search(elem_t table[], int value) {
    int index = get_index(table, value);
    return table[index].used ? &table[index].value : NULL;
}

Removal

Remove an element from the hash table, if it is present.

Linked list strategy

#include <stdlib.h>

void remove(node_t table[], int value) {
    int index = hashcode(&value, sizeof (int));
    node_t *prior = get_prior(&hash->table[index], key);
    if (prior->next != NULL) {
        node_t *node = prior->next;
        prior->next = node->next;
        free(node);
    }
}

void remove_all(node_t table[]) {
    int i;
    for (i = 0; i < MAX_SIZE; i++) {
        while (table[i].next != NULL) {
            node_t *node = table[i].next;
            table[i].next = node->next;
            free(node);
        }
    }
}

Static collision strategy

void remove(elem_t table[], int value) {
    int index = get_index(table, value);
    table[index].used = 0;
}

Data structures
Vectors \| Linked Lists \| Stacks \| Queues \| Hash Tables \| Heaps \| B-trees \| Segment Tree \| Union-Find Disjoint Set

Repository of Algorithms

A selection of common and useful algorithms written in ANSI C

Hash Tables

Hash Function

Collision

Structure

Insertion

Search

Removal