Hashing is a method of indexing and sorting data. The idea behind hashing is to allow large amounts of data to be indexed using keys commonly created by formulas.
This is done by taking the help of some function or algorithm which is called a hash function to map data to some encrypted value which is termed as “hash code” or “hash”.
The use of hashing is best applicable to the problems where the search is performed quite often. Hashing provides better time complexity than other data structures for the implementation of search. This time complexity depends on some other factors as well, which will be discussed in further sections.
In the above example, the hash function is responsible for converting the given string into numbers using some formulas. The next step is to store these strings in an array or list.
Suppose we have a python list. We will store the string “Apple” in the 18th index of the list. Similarly, “Application” is stored in the 20th index and “Appmillers” is stored in the 22nd index. This makes accessing the elements easier.
- Hash Function – Hash function is a function that can be used to map data of arbitrary size to data of fixed size.
- Key – Key is the data input by the user in the form of strings.
- Hash Value – Hash value is the value returned by the hashing function. This is the value that is generated when the given string is converted to another form, integer for example.
- Hash Table – Hash Table is a data structure which stores data in an associative manner. In a hash table, data is stored in an array format, where each data value has its own unique index value.
- Collision – A collision occurs when two different keys to a hash function produce the same output.
The mod function holds two parameters – the number input by the user and the number of cells in the array. The mod function returns the remainder when the given number is divided by the number of elements in the array. Then, that number can be stored at the index equal to the value returned by the mod function.
For example, if there’s an array of size 24 and the mod function is given 400 as the key, then the value generated by the mod function will be 16 (400 divided by 24 leaves the remainder 16). Hence, we will store the key at the 16th position in our list/array.
#Mod Function for Hashing def mod(number, cellNumber): return number % cellNumber print(mod(400, 24)) #Output 16
The ASCII function holds two parameters – the string input by the user and the number of cells in the array. The ASCII function sums the ASCII value of each character in the string and divides it by the total number of elements in the list. The remainder of the same is returned and the string is stored at that index of the list.
For example, let the user enter a string “ABC” as the key to be hashed. The ASCII value of “A” is 65, “B” is 66, and “C” is 67. The function sums this up to 198. When 198 is divided by 24, 6 is the remainder received. Hence, we store “ABC” at the 6th index of the list.
#ASCII Function for Hashing def modASCII(string, cellNumber): total = 0 for i in string: total += ord(i) return total % cellNumber print(modASCII("ABC", 24)) #Output 6
Characteristics of a Good Hash Function
There are four main characteristics to judge a good hash function:
- The hash value is fully determined by the data being hashed.
- The hash function uses all the input data.
- The hash function uniformly distributes the data across the entire set of possible hash values.
- The hash function generates very different hash values for similar strings.
Types of Collision Resolution Techniques
A Hash Collision situation is when the resultant hashes for two or more data elements in the data set, map to the same location in the hash table. In such a situation, two or more data elements would qualify to be stored/mapped to the same location in the hash table.
In the above example, the ASCII method is used for hashing. The values generated for the first two strings are the same. This results in a collision since two strings compete for the 2nd index in the list.
Collisions are resolved using a list of elements to store objects with the same key together. In the method of Direct chaining, each cell in a hash table is made to point to a linked list of records that have the same values as generated by the hash function.
This collision resolution technique requires a hash table with fixed and known sizes. During insertion, if a collision is encountered, alternative cells are tried until an empty bucket is found. These techniques require the size of the hash table to be supposedly larger than the number of objects to be stored.
There are three types of open addressing techniques:
- Linear Probing
- Quadratic Probing
- Double Hashing
For executing the technique of Linear probing, we take a hash table of fixed size, and every time a hash collision is encountered, we linearly traverse the table in a cyclic manner to find the next empty slot.
The general idea behind quadratic probing remains the same. The only difference is that we look at the Q(i)th increment at each iteration when looking for an empty slot, where Q(i) is some quadratic expression of the index i.
Double Hashing is based upon the idea that in the event of a collision we use another hashing function with the key-value as an input to find where in the open addressing scheme the data should actually be placed at.
Hashing vs Other Data Structures Time and Space Complexity
|Operations||Array/Python List||Linked List||Tree||Hashing|