Hashing Techniques in Data Structures

1. What is Hashing?

Definition: Hashing is a technique used to map large data sets into smaller tables using a hash function that generates an index.

Real Time Biotechnology Usage: Used in genomic databases, DNA indexing systems, protein sequence search systems like BLAST.

/* Basic Hash Table Implementation */
#include
#define SIZE 10
int hashTable[SIZE];

int hash(int key){
    return key % SIZE;
}

void insert(int key){
    int index = hash(key);
    hashTable[index] = key;
}

int main(){
    insert(25);
    insert(35);
    return 0;
}

2. Division Method

Definition: h(k) = k mod m

Element Formula Index Collision
1212 % 102No
2222 % 102Yes
3232 % 102Yes
4545 % 105No
5555 % 105Yes
6565 % 105Yes
7878 % 108No
8888 % 108Yes
9999 % 109No
1919 % 109Yes

Biotechnology Usage: Used in DNA fragment storage systems and genome indexing.

#include
#define SIZE 10
int table[SIZE];

int hash(int key){
    return key % SIZE;
}

void insert(int key){
    int index = hash(key);
    while(table[index] != 0){
        index = (index + 1) % SIZE;
    }
    table[index] = key;
}

3. Mid-Square Method

Definition: Square the key and extract middle digits.

ElementFormulaIndexCollision
1111²=121 → 22No
2121²=441 → 44No
3131²=961 → 66No
4141²=1681 → 88No
5151²=2601 → 00No
6161²=3721 → 22Yes
7171²=5041 → 44Yes
8181²=6561 → 66Yes
9191²=8281 → 88Yes
1313²=169 → 66Yes

Biotechnology Usage: Used in molecular sequence indexing.

int hash(int key){
    int square = key * key;
    return (square/10) % 10;
}

4. Folded Method

Definition: Split key into parts and sum them.

ElementFormulaIndexCollision
123412+34=46 →66No
234523+45=68 →88No
345634+56=90 →00No
456745+67=112 →22No
567856+78=134 →44No
678967+89=156 →66Yes
789078+90=168 →88Yes
890189+01=90 →00Yes
901290+12=102 →22Yes
112211+22=33 →33No

Biotechnology Usage: Used in chromosome ID mapping systems.

int hash(int key){
    int part1 = key / 100;
    int part2 = key % 100;
    return (part1 + part2) % 10;
}

5. Multiplication Method

Definition: h(k) = floor(m (kA mod 1))

ElementFormulaIndexCollision
10floor(10*(10*0.618 mod1))8No
20floor(10*(20*0.618 mod1))6No
30floor(10*(30*0.618 mod1))4No
40floor(10*(40*0.618 mod1))2No
50floor(10*(50*0.618 mod1))0No
60floor(10*(60*0.618 mod1))8Yes
70floor(10*(70*0.618 mod1))6Yes
80floor(10*(80*0.618 mod1))4Yes
90floor(10*(90*0.618 mod1))2Yes
100floor(10*(100*0.618 mod1))0Yes

Biotechnology Usage: Used in high-speed genome indexing and drug discovery databases.

int hash(int key){
    float A = 0.618;
    return (int)(10 * ((key * A) - (int)(key * A)));
}

Theoretical Overview: How to Overcome Collisions

Collision Resolution Techniques:

In biotechnology systems like DNA sequence databases and protein indexing, chaining and double hashing are widely used to efficiently manage large biological datasets.