Hashing Techniques

1. What is Hashing?

Definition: Hashing is a technique used to map large data sets into smaller tables using a hash function that generates an index.

Real Time Biotechnology Usage: Used in genomic databases, DNA indexing systems, protein sequence search systems like BLAST.

/* Basic Hash Table Implementation */
#include
#define SIZE 10
int hashTable[SIZE];

int hash(int key){
    return key % SIZE;
}

void insert(int key){
    int index = hash(key);
    hashTable[index] = key;
}

int main(){
    insert(25);
    insert(35);
    return 0;
}

2. Division Method

Definition: h(k) = k mod m

Element	Formula	Index	Collision
12	12 % 10	2	No
22	22 % 10	2	Yes
32	32 % 10	2	Yes
45	45 % 10	5	No
55	55 % 10	5	Yes
65	65 % 10	5	Yes
78	78 % 10	8	No
88	88 % 10	8	Yes
99	99 % 10	9	No
19	19 % 10	9	Yes

Biotechnology Usage: Used in DNA fragment storage systems and genome indexing.

#include
#define SIZE 10
int table[SIZE];

int hash(int key){
    return key % SIZE;
}

void insert(int key){
    int index = hash(key);
    while(table[index] != 0){
        index = (index + 1) % SIZE;
    }
    table[index] = key;
}

3. Mid-Square Method

Definition: Square the key and extract middle digits.

Element	Formula	Index	Collision
11	11²=121 → 2	2	No
21	21²=441 → 4	4	No
31	31²=961 → 6	6	No
41	41²=1681 → 8	8	No
51	51²=2601 → 0	0	No
61	61²=3721 → 2	2	Yes
71	71²=5041 → 4	4	Yes
81	81²=6561 → 6	6	Yes
91	91²=8281 → 8	8	Yes
13	13²=169 → 6	6	Yes

Biotechnology Usage: Used in molecular sequence indexing.

int hash(int key){
    int square = key * key;
    return (square/10) % 10;
}

4. Folded Method

Definition: Split key into parts and sum them.

Element	Formula	Index	Collision
1234	12+34=46 →6	6	No
2345	23+45=68 →8	8	No
3456	34+56=90 →0	0	No
4567	45+67=112 →2	2	No
5678	56+78=134 →4	4	No
6789	67+89=156 →6	6	Yes
7890	78+90=168 →8	8	Yes
8901	89+01=90 →0	0	Yes
9012	90+12=102 →2	2	Yes
1122	11+22=33 →3	3	No

Biotechnology Usage: Used in chromosome ID mapping systems.

int hash(int key){
    int part1 = key / 100;
    int part2 = key % 100;
    return (part1 + part2) % 10;
}

5. Multiplication Method

Definition: h(k) = floor(m (kA mod 1))

Element	Formula	Index	Collision
10	floor(10(100.618 mod1))	8	No
20	floor(10(200.618 mod1))	6	No
30	floor(10(300.618 mod1))	4	No
40	floor(10(400.618 mod1))	2	No
50	floor(10(500.618 mod1))	0	No
60	floor(10(600.618 mod1))	8	Yes
70	floor(10(700.618 mod1))	6	Yes
80	floor(10(800.618 mod1))	4	Yes
90	floor(10(900.618 mod1))	2	Yes
100	floor(10(1000.618 mod1))	0	Yes

Biotechnology Usage: Used in high-speed genome indexing and drug discovery databases.

int hash(int key){
    float A = 0.618;
    return (int)(10 * ((key * A) - (int)(key * A)));
}

Theoretical Overview: How to Overcome Collisions

Collision Resolution Techniques:

1. Linear Probing: Find next empty slot sequentially.
2. Quadratic Probing: Use quadratic formula for new index.
3. Double Hashing: Use second hash function.
4. Chaining: Use linked list at each index.
5. Rehashing: Increase table size and recompute.