×

Trie in Python

Trie Python Feature

A Trie is a special data structure used to store strings that can be visualized like a graph. A Trie is a tree-based data structure that organizes information in a hierarchy. It constitutes nodes and edges.

The Trie data structure satisfies the following properties:

  • It is used to store or search strings in a time and space efficient way.
  • Any node in a trie can store non repetitive multiple characters.
  • Every node stores link of the next character of the string.
  • Every node has to keep track of ‘end of string’.
Image 211

In the above example, if we follow the hierarchy of the Trie and level down its edges, the following strings would be formed – AIR, AIT, BAR, BIL, BM.

The Trie data structure is important for solving various problems such as the spelling checker, auto-completion of words, etc.

Operations on Trie

Creation of Trie

To create a Trie in python, we first need to declare a “TrieNode” class. Within this class, we will initialize the __init__ function. Then, we create the “Trie” class and overwrite its __init__ function as well.

Implementation of Creation of a Trie

The “TrieNode” class represents the nodes or data elements of the trie data structure. Each object of the TrieNode class stores two data items – a list of its children nodes and a boolean value endOfString which is initially set to False.

The Trie class represents the Trie data structure itself. Within the __init__ function of this class, we declare the root node of the Trie data structure which serves as a pointer to the entire tree.

#Creation of TrieNode and Trie classes 
class TrieNode:
  def __init__(self):
    self.children = {}
    self.endOfString = False

class Trie:
    def __init__(self):
      self.root = TrieNode()

newTrie = Trie()

Time and Space Complexity

The time complexity for the creation of the Trie is O(1) because no additional time is required to initialize data items. The space complexity is O(1) as well because no additional memory is required.

Insertion of a String in a Trie

While inserting a string in Trie, we have to follow the hierarchical order in which the strings will be interpreted at the time of traversal. Then, we have to look for the common elements between the string to be inserted and the already existing strings.

Once, we find the common elements, we then have to arrange the new characters in a way that does not disturb the order of storage of other strings. Lastly, we update the endOfString boolean variable for every newly added word.

Image 212

Implementation of Insertion of a String in a Trie

For the implementation of insertion in a Trie data structure, we create the insertString function that takes the word to be inserted as its parameter. Then, we code the algorithm of the program in such a way that it tackles the following four conditions that may be encountered while insertion.

  • The Trie is blank – If the given Trie is empty, then we simply insert the required string by creating new nodes for every character of that string. Then, we set the endOfString variable to true after inserting the last alphabet of the string.
  • New string’s prefix is common to another string’s prefix – We will not insert the prefix and the following common characters in the Trie. When we find different characters, we’ll insert them in seperate nodes.
  • New string’s prefix is already present as a complete string – We use the common elements of the other string and add new nodes to hold the remaining characters of the new word. When the word is completed, we insert a node with endOfString variable set to True.
  • String to be inserted is already present in the Trie – In this case, we do not insert any node in the Trie.
#Function to insert new strings to the Trie
def insertString(self, word):
        current = self.root
        for i in word:
            ch = i
            node = current.children.get(ch)
            if node == None:
                node = TrieNode()
                current.children.update({ch:node})
            current = node
        current.endOfString = True
        print("The string has been successfully inserted.")

#Initializing the Trie
newTrie = Trie()
newTrie.insertString("App")
newTrie.insertString("Appl")

#Output
The string has been successfully inserted.
The string has been successfully inserted.

Time and Space Complexity

The time complexity of insertion of a string in a Trie is O(m) where “m” is the length of the word to be inserted. This is because we check common elements for every character of the word while inserting. The space complexity is O(m) as well because, for the worst-case scenario, we’ll have to create new nodes for every character of the string.

Searching a String in a Trie

Since data is entered in a hierarchical order in the Trie data structure, we search for a string character by character. Every prefix character should match our search for the next alphabet to be compared.

Once, we find the common elements, we then have to check whether or not the word is complete by investigating the endOfString variable.

Image 213

Implementation of Searching a String in a Trie

For the implementation of searching in a Trie data structure, we create the searchString function that takes the word to be searched as its parameter. Then, we code the algorithm of the program in such a way that it tackles the following three conditions that may be encountered while performing a search.

  • String does not exist in the Trie – We’ll compare the first character with the root node of the Trie. So, we simple terminate the search at this point and return a message “String does not exist in the Trie.”
  • String exists in the Trie – We’ll compare the first character if the string with the root node. If they are equal, then we move on to its children an compare the next character of the word. This process is continued until all the characters are compared. If each of them exist in the Trie, then we return the message “String is found in the given Trie.”
  • String is a prefix of another string but it does not exist in the Trie – For this case, we check the endOfString variable. If it coincides with the size of the word and all the elements exist in the prefix, then the string is found. Otherwise, declare that the search failed.
#Function to search a string in a Trie
def searchString(self, word):
        currentNode = self.root
        for i in word:
            node = currentNode.children.get(i)
            if node == None:
                return "String does not exist in the Trie."
            currentNode = node

        if currentNode.endOfString == True:
            return "String is found in the given Trie."
        else:
            return "String does not exist in the Trie."

#Initializing the Trie
newTrie = Trie()
newTrie.insertString("App")
newTrie.insertString("Appl")

print(newTrie.searchString("App"))
print(newTrie.searchString("Ap"))

#Output
String is found in the given Trie.
String does not exist in the Trie.

Time and Space Complexity

The time complexity of search of a string in a Trie is O(m) where “m” is the length of the word to be searched. This is because we check common elements for every character of the word while searching. The space complexity is O(1) because no additional memory is required.

Deleting a String from a Trie

For the deletion of a string from a Trie, first, we find out whether or not the string exists. If the string to be deleted is found, we remove those nodes from the Trie which do not disturb the other strings present in it. Deletion from a Trie always begins from the leaf node towards the root node.

However, we have to keep in mind that we delete entire strings and not just prefixes of the word. This can be determined by the endOfString variable which indicates the end of a given string.

Image 214

Implementation of Deletion of a String from a Trie

For the implementation of deletion of a string from a Trie, we create the function deleteString which takes three arguments – the root node of the Trie, the word to be deleted, and the index for characters in the word. The algorithm for deletion is made in such a way that it works for any of the four cases that may be encountered while deletion.

  • Some other string has the same prefix as the one we want to delete – We remove the enddOfString node of the word to be deleted. Then, we subsequently delete character nodes. If we visit a node that affects some other string of the Trie, we terminate the process.
  • The string is a prefix of another string – We will not delete this string as it is the prefix of another string. Instead, we will just adjust the endOfString variable to False so that the Trie no longer recognizes this string.
  • Some other string is the prefix of the string to be deleted – We set the endOfString variable to False for the string to be deleted. Then, we delete all nodes except the prefix string. Finally, we modify the endOfString character of the prefix string.
  • No node depends on the string – In this case, we delete the endOfString node of the string. Then we delete the character of the string from within the node.
#Function to delete a string from a Trie
def deleteString(root, word, index):
    ch = word[index]
    currentNode = root.children.get(ch)
    canNodeBeDeleted = False

    if len(currentNode.children) > 1:
        deleteString(currentNode, word, index+1)
        return False
    
    if index == len(word) - 1:
        if len(currentNode.children) >= 1:
            currentNode.endOfString = False
            return False
        else:
            root.children.pop(ch)
            return True
    
    if currentNode.endOfString == True:
        deleteString(currentNode, word, index+1)
        return False

    canNodeBeDeleted = deleteString(currentNode, word, index+1)
    if canNodeBeDeleted == True:
        root.children.pop(ch)
        return True
    else:
        return False

#Initializing the Trie
newTrie = Trie()
newTrie.insertString("App")
newTrie.insertString("Appl")

deleteString(newTrie.root, "App", 0)
print(newTrie.searchString("App"))

#Output
String does not exist in the Trie.

Time and Space Complexity

The time complexity of deletion of a string from a Trie is O(m) where “m” is the length of the word to be deleted. This is because we check common elements for every character of the word while deletion so that other strings should not be affected. The space complexity is O(1) because no additional memory is required.