A chatbot is one of the most important applications of Natural Language Processing. In the introductory article, we discussed chatbots briefly. Chatbots are growing immensely in popularity, so they are well worth learning about.
A good working definition of a chatbot is a software program that processes and mimics human conversation (written or spoken), allowing people to interact with it as if they were communicating with a real person. Chatbots range from the very simplistic to the very sophisticated.
Some of the chatbots you most commonly interact with are the ones you see when you visit a website: a small pop-up opens the chatbot, which asks whether you need help with your query. The digital assistants on your phone are also chatbots, albeit far more sophisticated ones.
Why were chatbots created?
Digitization is transforming the way we live today. With internet access becoming easier and cheaper, smartphones are now in nearly everyone's hands. Chatbots help build a bridge between businesses and customers.
Businesses can leverage the power of chatbots to reduce the number of human representatives, thereby cutting costs. This is one of the main reasons we see a boom in chatbots today. It is also easier for customers to get answers to their queries quickly and efficiently. In short, chatbots were invented to save time and money, and they are now growing immensely.
A basic chatbot
In this tutorial, we build a basic chatbot. So far we have learned the concepts of vectorization (BoW and TF-IDF), and in the previous article we covered cosine similarity. Keeping these two concepts in mind, we will build a basic chatbot that can answer user queries.
The most important thing to keep in mind while building chatbots is that the corpus on which the chatbot will be trained should be relevant and exhaustive. By relevant I mean that if you are building a chatbot for an E-commerce website, then the chatbot should be trained on an E-commerce corpus, not on a medical corpus.
By exhaustive I mean that the corpus should not be just a few documents; it should be vast and extensive, because chatbots need a lot of data to perform well. The data we will be using for this section can be found at http://jmcauley.ucsd.edu/data/amazon/qa/. On that page, head to the “Per-category files” section and select “Electronics” to download the data.
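The per-category files on that page are typically gzip-compressed, so you may need to unpack the download once before running the rest of the code. A small sketch, assuming the archive is saved as qa_Electronics.json.gz:

import gzip
import shutil

# one-time decompression of the downloaded archive (file names are assumptions)
with gzip.open("qa_Electronics.json.gz", "rb") as f_in, open("qa_Electronics.json", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)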
Building a chatbot with Python
The data we have for this tutorial is Amazon’s Q&A data for the Electronics category. Suppose we are building this chatbot for Amazon’s Electronics store; this data is then both relevant and extensive. It contains question–answer pairs stored one record per line in a JSON-like (Python dictionary) format.
Each line of the file is a single record containing, among other fields, a question and its answer.
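For illustration, a record looks roughly like the following. The field names match the dataset’s documentation, but the values (and the product ID) here are invented:

{'questionType': 'yes/no',
 'asin': 'B00XXXXXXX',
 'answerTime': 'Jun 27, 2014',
 'unixTime': 1403852400,
 'question': 'does it fit nook glowlight?',
 'answer': 'no. the nook color or color tablet'}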
We begin by importing the necessary libraries and modules.
import numpy as np                                             # numerical operations
from sklearn.feature_extraction.text import TfidfTransformer  # TF-IDF weighting
from sklearn.metrics.pairwise import cosine_similarity        # similarity between vectors
from sklearn.feature_extraction.text import CountVectorizer   # Bag of Words
import ast                                                     # to parse the dictionary-style records
We now have to read in the data and build lists of questions and their answers. For this purpose, we run the following code.
Let’s see what each line in the code snippet below does. In the first and second lines, we initialize two empty lists to store our questions and answers respectively. On the third line, we open the file in read mode. Please make sure that you pass the appropriate file path as a parameter here.
On the fourth line, we begin a for loop that iterates over every line in the file. On the fifth line, we use the literal_eval function from the ast library. As the example record above shows, each line is a dictionary written out as a string; literal_eval parses it into an actual Python dictionary.
On the sixth and seventh lines, we extract the question and answer from each record, convert them to lowercase, and append them to the empty lists we defined before.
questions = []
answers = []
with open("/content/drive/MyDrive/Datasets/qa_Electronics.json", 'r') as f:
    for line in f:
        data = ast.literal_eval(line)  # parse the string record into a dictionary
        questions.append(data['question'].lower())
        answers.append(data['answer'].lower())
After running the above code, we can now observe the lists we just created. First, we look at the first five questions in the data.
questions[:5]
Output:
['is this cover the one that fits the old nook color? which i believe is 8x5.',
'does it fit nook glowlight?',
'would it fit nook 1st edition? 4.9in x 7.7in ?',
"will this fit a nook color that's 5 x 8?",
'will this fit the samsung galaxy tab 4 nook 10.1']
Now, we look at the first five corresponding answers below.
answers[:5]
Output:
['yes this fits both the nook color and the same-shaped nook tablet',
'no. the nook color or color tablet',
"i don't think so. the nook color is 5 x 8 so not sure anything smaller would stay locked in, but would be close.",
'yes',
"no, the tab is smaller than the 'color'"]
As we have seen, we successfully loaded the dataset and extracted the questions and answers into a format that is easier to work with. What we now have is a corpus of questions and answers on which our chatbot will be trained. After training, it will be ready to answer simple questions.
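As a quick optional sanity check (the exact count depends on your copy of the data), we can confirm that the two lists line up:

print(len(questions), len(answers))    # the two counts should be equal
assert len(questions) == len(answers)  # every question has a corresponding answer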
We now want to convert our text into a suitable mathematical format, so we perform vectorization. First, we convert all the questions to a BoW model using the CountVectorizer. Observe the code snippet below.
On the first line, we initialize our vectorizer i.e. a CountVectorizer. Note that we are also passing a parameter that will filter and remove all the stopwords from our text. On the next line, we fit this vectorizer to our data and transform it.
vectorizer = CountVectorizer(stop_words='english')
X_vec = vectorizer.fit_transform(questions)
If we check the X_vec variable, we see that it is a sparse matrix; each row holds the Bag of Words representation of one question.
Now we apply TF-IDF vectorization to this transformed data. This gives us a better representation of our data using the Term Frequency – Inverse Document Frequency method. In the code snippet below, we first initialize the transformer, then fit it and transform the previous representation into the new one.
tfidf = TfidfTransformer()
X_tfidf = tfidf.fit_transform(X_vec)
We see that X_tfidf too is a sparse matrix, so our question data is now transformed and ready.
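If you want to see what the TF-IDF weighting actually does, here is a minimal toy example on an invented three-sentence corpus (none of these sentences come from the tutorial’s data):

toy_corpus = ["the camera battery", "the camera lens", "battery charger"]
toy_vec = CountVectorizer(stop_words='english')
toy_counts = toy_vec.fit_transform(toy_corpus)
toy_tfidf = TfidfTransformer().fit_transform(toy_counts)
print(toy_vec.get_feature_names_out())  # ['battery' 'camera' 'charger' 'lens'] (get_feature_names() on older scikit-learn)
print(toy_tfidf.toarray().round(2))     # 'charger' appears in only one sentence, so it gets the highest weight in its row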
We now define a function that will take in the customer’s query and find the most similar question to it. Let’s check out the function line-by-line.
On the first line, we define our function, which takes a single parameter: the customer’s question (passed in as a one-element list). On the second line, we declare three variables as global, so the function reuses the objects we created earlier instead of creating local copies.
On the third line, we take the customer’s query and convert it into a BoW representation using the vectorizer initialized previously. On the next line, we transform this query further into a TF-IDF representation; note that we reuse the transformer already fitted on our corpus rather than refitting it on the single query. These are the same steps we took to get the representation of our question dataset.
On the fifth line, we compute the cosine similarity between the input query and every question in our corpus, take the highest value (the best match), apply the inverse cosine to obtain an angle, and convert that angle from radians to degrees.
On the sixth line, we start an if-else statement: if the angle is greater than a threshold of 60 degrees, our chatbot discards the query. We can customize this threshold according to our domain and business needs.
However, if the angle is below the threshold, we return the answer corresponding to the question in our corpus that best matches the customer’s query.
def conversation(im):
    global tfidf, answers, X_tfidf
    Y_vec = vectorizer.transform(im)  # BoW representation of the query (im is a one-element list)
    Y_tfidf = tfidf.transform(Y_vec)  # reuse the IDF weights fitted on the corpus; do not refit on the query
    cos_sim = np.rad2deg(np.arccos(max(cosine_similarity(Y_tfidf, X_tfidf)[0])))  # angle to the best match
    if cos_sim > 60:
        return "sorry, I did not quite understand that"
    else:
        return answers[np.argmax(cosine_similarity(Y_tfidf, X_tfidf)[0])]  # answer of the most similar question
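To make the 60-degree threshold concrete: an angle of 60 degrees corresponds to a cosine similarity of 0.5, so the chatbot only answers when the best match scores at least 0.5. A quick check, plus a sample call (the exact reply depends on your copy of the corpus):

print(np.rad2deg(np.arccos(0.5)))  # 60.0 -> the cut-off similarity
print(np.rad2deg(np.arccos(1.0)))  # 0.0  -> an identical question
print(conversation(["does it fit nook glowlight?"]))  # note: the function expects a list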
Finally, we define our main function. On the second line, we get the customer’s name, and on the third line we greet them. On the fourth line, we start a while loop.
Inside the while loop, we first read the customer’s query and then use an if-else statement: the chatbot keeps answering queries until the customer types “bye”, at which point it says goodbye and breaks out of the loop. The full function is shown below.
def main():
    usr = input("Please enter your username: ")
    print("support: Hi " + usr + ", welcome to Q&A support. How can I help you?")
    while True:
        im = input("{}: ".format(usr))  # get the input query
        if im.lower() == 'bye':
            print("Q&A support: bye!")
            break
        else:
            print("Q&A support: " + conversation([im]))
To start conversing with the chatbot, just call the main function.
main()
As our chatbot starts working, we first enter our name and then our queries. Below is a short conversation I had with the chatbot, along with the replies it gave.
Please enter your username: Elton
support: Hi Elton, welcome to Q&A support. How can I help you?
Elton: is the transformer in stock
Q&A support: 1 brand new and 3 used.
Elton: where are the batteries
Q&A support: batteries are in the film packs not the actual camera.
Elton: does it have a radio
Q&A support: yes it does
Elton: dimensions of my parcel
Q&A support: shipping (box) dimension is approximately 55"x5"x13". weights around 32lbs.
Elton: bye
Q&A support: bye!
As you can see, our basic chatbot performs quite well, even though it is far from sophisticated. Try conversing with it yourself, and remember that it can only answer queries and questions related to electronics, because that is the data we trained it on.
Final Thoughts
In this fun article, we first got acquainted with chatbots and why they are growing in popularity. Then, using the concepts we learned previously, we built a basic chatbot by training it on Amazon’s Electronics Q&A dataset.
Thanks for reading.