Build a Simple Chatbot with Tensorflow, Python and MongoDB

This post describes a small project to build a chatbot, undertaken to learn about some of the latest neural network software libraries and tools. Given the increasing popularity of chatbots and their growing usefulness, building one seemed like a reasonable endeavor. The result is nothing complicated, but it is enough to better understand how contemporary tools are used to build one.

This material assumes that MongoDB is already installed and that a Python 3.6 environment is installed and usable. Basic knowledge of NoSQL, machine learning, and coding is also useful. The code and data used for this example are located on GitHub.


This is a simple experiment combining Tensorflow, Python, and MongoDB. The requirements are:

  • A chatbot needs to demonstrate a simple conversation capability.
  • A limited set of coherent responses should be returned demonstrating a basic understanding of the user input.
  • Define context in a conversation, reducing the number of possible responses to be more contextually relevant.


Use MongoDB to store documents containing:

  • a defined classification or name of user input; this is the intent of the input/response interaction
  • a list of possible responses to send back to the user
  • a context value of the intent, used to guide or filter which response lists make sense to return
  • a set of patterns of potential user input; the patterns are used to build the model that predicts the probabilities of intent classifications used to determine responses

A utility will be implemented to build models from the database content. The model will be loaded by a simple chatbot framework. Execution of the framework allows a user to chat with the bot.

The chatbot framework loads a prebuilt predictive model and connects to MongoDB to retrieve documents which contain possible responses and context information. It also drives a simple command line interface to:

  • capture user input
  • process the input and predict an intent category using the loaded model
  • randomly pick a response from the intent document

In the database, each document contains a structure including:

  • the name of the intent
  • a list of sentence patterns used to build the predictive model
  • a list of potential responses associated with the intent
  • a value indicating the context used to filter the number of intents used for a response (contextSet – to define the context; contextFilter – used to filter out unrelated intents)

For example, the ‘greeting’ intent document in MongoDB is defined as:

    {
        "_id" : ObjectId("5a160efe21b6d52b1bd58ce5"),
        "name" : "greeting",
        "patterns" : [ "Hi", "How are you", "Is anyone there?", "Hello", "Good day" ],
        "responses" : [ "Hi, thanks for visiting", "Hi there, how can I help?", "Hello", "Hey" ],
        "contextSet" : ""
    }


The pattern sentences are used to build the model that predicts possible responses. Patterns are grouped into intents, meaning each sentence refers to a conversational context. The MongoDB database is populated with a number of documents following the structure shown above.

This example uses a number of documents to talk about AI. To populate the database with content for the model, from a mongo prompt:

> use Intents
> db.ai_intents.insert({
    "name" : "greeting",
    "patterns" : [ "Hi", "How are you", "Is anyone there?", "Hello", "Good day" ],
    "responses" : [ "Hi, thanks for visiting", "Hi there, how can I help?", "Hello", "Hey" ],
    "contextSet" : ""
})

The other documents are inserted in the same way. Various tools such as NoSQLBooster are useful when working with MongoDB databases.

There is a mongodump export of the documents used in this example in the GitHub repository.

Building the Prediction Model

The first part of building the model is to read the data out of the database. The PyMongo library is used throughout this code.

from pymongo import MongoClient

# connect to db and read in intent collection
client = MongoClient('mongodb://localhost:27017/')
db = client.Intents
ai_intents = db.ai_intents

The ai_intents variable references the document collection. Next, parse information into arrays of all stemmed words, intent classifications, and documents with words for a pattern tagged with the classification (intent) name. A tokenizer is used to strip out punctuation. Each document from the ai_intents collection in the database is extracted into a cursor using ai_intents.find().

from nltk.tokenize import RegexpTokenizer

words = []
classes = []
documents = []

# tokenizer will parse words and leave out punctuation
tokenizer = RegexpTokenizer(r"[\w']+")

# loop through each pattern for each intent
for intent in ai_intents.find():
    for pattern in intent['patterns']:
        tokens = tokenizer.tokenize(pattern) # tokenize pattern
        words.extend(tokens) # add tokens to list

        # add tokens to document for specified intent
        documents.append((tokens, intent['name']))

        # add intent name to classes list
        if intent['name'] not in classes:
            classes.append(intent['name'])

From this categorized information, a training set can be generated. The final data_set variable will contain, for each pattern, a bag of words and an array indicating which intent it belongs to. In the bag, each word that occurs in the pattern is flagged with a 1. The output_row identifies the intent whose pattern is being evaluated.

# assumed setup: data_set collects the rows, output_empty has one 0 per intent
data_set = []
output_empty = [0] * len(classes)

for document in documents:
    # stem the pattern words for each document element
    # (stemmer is assumed to be an NLTK stemmer instance created earlier)
    pattern_words = [stemmer.stem(word.lower()) for word in document[0]]
    # create a bag of words array: 1 if the word occurs in the pattern, else 0
    bag = [1 if w in pattern_words else 0 for w in words]

    # output is a '0' for each intent and '1' for current intent
    output_row = list(output_empty)
    output_row[classes.index(document[1])] = 1

    data_set.append([bag, output_row])
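To make the encoding concrete, here is a minimal standalone sketch of the same idea on a toy data set. It is not the project's code: the three sample documents are made up, and lowercasing stands in for the stemmer so the sketch runs without NLTK. It also assumes the vocabulary is normalized the same way as the pattern words and de-duplicated, since the membership test only works when both sides are in the same form.

```python
# Toy stand-in for the (tokens, intent name) pairs built from the database.
documents = [
    (["Hi"], "greeting"),
    (["Good", "day"], "greeting"),
    (["What", "is", "AI"], "AIdefinition"),
]


def normalize(word):
    # Assumption: lowercasing stands in for the stemmer used in the article.
    return word.lower()


# Vocabulary and intent names: unique values in a stable, sorted order.
words = sorted({normalize(w) for tokens, _ in documents for w in tokens})
classes = sorted({name for _, name in documents})

data_set = []
for tokens, name in documents:
    pattern_words = {normalize(w) for w in tokens}
    # 1 where the vocabulary word occurs in this pattern, 0 elsewhere
    bag = [1 if w in pattern_words else 0 for w in words]
    # one-hot row marking the intent this pattern belongs to
    output_row = [0] * len(classes)
    output_row[classes.index(name)] = 1
    data_set.append([bag, output_row])

print(words)        # ['ai', 'day', 'good', 'hi', 'is', 'what']
print(data_set[2])  # [[1, 0, 0, 0, 1, 1], [1, 0]]
```

Each row pairs a fixed-length input vector with a one-hot output vector, which is the shape the network expects for training.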

The last part is to create the model. With Tensorflow and TFLearn, it is simple to create a basic deep neural network and train it on the data set to create a predictive model from the sentence patterns defined in the intent documents. TFLearn uses numpy arrays, so the data_set list needs to be converted to a numpy array. Then the data_set is partitioned into the input data array and the possible outcome arrays for each input.

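import numpy as np

# bag and output_row differ in length, so store the rows as objects
data_set = np.array(data_set, dtype=object)

# split into training inputs (bags) and training outputs (intent rows)
train_x = list(data_set[:,0])
train_y = list(data_set[:,1])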

The neural network is defined by setting its shape and number of layers. The final regression layer defines how the model is fit. TFLearn and Tensorflow then produce the predictive model by training a deep neural network on the defined training data. Finally, the model is saved to a file, and the words, classes and training data are saved with pickle, for use by the chatbot framework.

import pickle
import tflearn

# Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# Define model and setup tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
# Start training (apply gradient descent algorithm)
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model.tflearn')

# save the words, classes and training data for the chatbot framework
with open("training_data", "wb") as f:
    pickle.dump({'words':words, 'classes':classes, 'train_x':train_x, 'train_y':train_y}, f)

Running the above code reads the intent documents, builds a predictive model and saves all the information to be loaded by the chatbot framework.

Building the Chatbot Framework

The flow of execution for the chatbot framework is:

  1. load training data generated during the model building
  2. build a neural net matching the size and shape of the one used to build the model
  3. load the predictive model into the network
  4. prompt the user for input to interact with the chatbot
  5. for each user input, classify which intent it belongs to and pick a random response for that intent
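Step 1 can be sketched as a pickle round trip. This is only an illustration under stated assumptions: the dictionary keys and the "training_data" file name follow the model-building code, but the values are made-up toy data and the file is written to a temporary directory so the sketch runs on its own.

```python
import os
import pickle
import tempfile

# Toy values; the real file is produced by the model-building script.
training_data = {
    "words": ["ai", "hello", "is", "what"],
    "classes": ["AIdefinition", "greeting"],
    "train_x": [[0, 1, 0, 0], [1, 0, 1, 1]],
    "train_y": [[0, 1], [1, 0]],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "training_data")
    # what the model builder does: write the dictionary to disk
    with open(path, "wb") as handle:
        pickle.dump(training_data, handle)
    # what the chatbot framework does: read it back
    with open(path, "rb") as handle:
        data = pickle.load(handle)

words = data["words"]
classes = data["classes"]
train_x = data["train_x"]
train_y = data["train_y"]
```

The lengths of train_x[0] and train_y[0] are what the framework needs to rebuild a network with the same input and output sizes as the one used for training.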


Code for the chatbot driver is simple. Since the amount of data being used in this example is small, it is loaded into memory. An infinite loop is started to prompt a user for input to start the dialog.

# connect to mongodb and set the Intents database for use
client = MongoClient('mongodb://localhost:27017/')
db = client.Intents

model = load_model()

The model created previously is loaded with a simple neural network defining the same dimensions used to create the model. It is now ready for use to classify input from a user.

The user input is analyzed to classify which intent it likely belongs to. The intent is then used to select a response belonging to the intent. The response is displayed back to the user. To perform the classification, the user input is processed by the following functions:

clean_up_sentence function

  1. tokenized into an array of words
  2. each word in the array is stemmed to match stemming done in model building

bow function

  1. create an array the size of the word array loaded in from the model, which contains all the words used in the model
  2. from the cleaned up sentence, assign a 1 to each bag of words array element that matches a word from the model
  3. convert the array to numpy format
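The two functions above can be sketched with the standard library only. This is a simplified illustration, not the project's code: lowercasing stands in for the NLTK stemmer, the vocabulary is a made-up toy list, and the result stays a plain Python list (the article's version converts it to a numpy array before passing it to the model).

```python
import re

# Same token pattern as the tokenizer used when building the model.
TOKEN_PATTERN = re.compile(r"[\w']+")


def clean_up_sentence(sentence):
    # Tokenize, then normalize each token.
    # Assumption: lowercasing stands in for the stemmer used in the article.
    return [token.lower() for token in TOKEN_PATTERN.findall(sentence)]


def bow(sentence, words):
    # One slot per vocabulary word: 1 if it occurs in the sentence, else 0.
    sentence_words = set(clean_up_sentence(sentence))
    return [1 if w in sentence_words else 0 for w in words]


words = ["ai", "hello", "is", "what"]  # toy vocabulary
print(clean_up_sentence("What is AI?"))  # ['what', 'is', 'ai']
print(bow("What is AI?", words))         # [1, 0, 1, 1]
```

The important point is that user input goes through the same tokenizing and normalizing steps as the training patterns; otherwise the bag of words will not line up with the vocabulary the model was trained on.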

classify function

  1. using the bag of words, use the model to predict which intents are likely needed for a response
  2. with a defined threshold value, eliminate possibilities below a percentage likelihood
  3. sort the result in descending probability
  4. the result is an array of (intent name, probability) pairs

A sample result with the debug switch set to true may look like:

enter> what is AI
[('AIdefinition', 0.9875145)]
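The filtering and sorting steps of the classify function can be sketched as follows. To keep the sketch runnable without a trained model, the probabilities are passed in directly instead of coming from a model prediction. The threshold value of 0.25, the sample class names and the sample probabilities are all assumptions for illustration.

```python
ERROR_THRESHOLD = 0.25  # hypothetical cut-off; tune for your own data


def classify(probabilities, classes, threshold=ERROR_THRESHOLD):
    # keep only the intents whose probability is above the threshold
    results = [
        (classes[i], p) for i, p in enumerate(probabilities) if p > threshold
    ]
    # most likely intent first
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


classes = ["AIdefinition", "greeting", "goodbye"]
probabilities = [0.62, 0.30, 0.08]  # stand-in for the model's output row
print(classify(probabilities, classes))
# [('AIdefinition', 0.62), ('greeting', 0.3)]
```

If nothing clears the threshold the list is empty, so a caller that reads results[0][0] should check for that case first.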

response function

The response function gets the list of possible classifications. The core of the logic is two lines of code to find the document in the ai_intents collection matching the name of the classification. If a document is found, randomly select a response from the set of possible responses and return it to the user.

doc = db.ai_intents.find_one({'name': results[0][0]}) 
return print(random.choice(doc['responses']))

The additional logic in this function handles context about what the user asked, which is used to filter possible responses. In this example, each document has either a contextSet or contextFilter field. If the document retrieved from the database contains a contextSet value, that value should be set for the current user. The userID is added as a key to the context dictionary, with the contextSet value as its entry.

if 'contextSet' in doc and doc['contextSet']:
    if debug: print('contextSet=', doc['contextSet'])
    context[userID] = doc['contextSet']

Before querying for a document based on the classification found, the response function checks whether a userID exists in the context. If it does, the query also searches for a document whose contextFilter field matches the context value.

if userID in context:
    doc = db.ai_intents.find_one({'$and': [
        {'name': results[0][0]},
        {'contextFilter': {'$exists': True, '$eq': context[userID]}}
    ]})
    del context[userID]

After finding the document, the context is no longer needed and is removed from the dictionary.
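The context flow can be tried out without a database. The sketch below replaces the MongoDB collection with an in-memory list and a small lookup helper. The intent names, responses, the find_intent helper and the "123" user ID are all hypothetical, and the exact ordering of steps in the real response function may differ.

```python
import random

# In-memory stand-in for the ai_intents collection (hypothetical documents).
intents = [
    {"name": "AItopic", "responses": ["OK. Let's talk about AI."],
     "contextSet": "ai"},
    {"name": "definition", "responses": ["AI stands for artificial intelligence."],
     "contextFilter": "ai"},
    {"name": "definition", "responses": ["A cat is a small domesticated animal."],
     "contextFilter": "cats"},
]

context = {}  # maps a user ID to the context set by that user's last intent


def find_intent(name, context_filter=None):
    # Mimics find_one: first document matching the name and, when a context
    # is given, having a contextFilter field equal to it.
    for doc in intents:
        if doc["name"] != name:
            continue
        if context_filter is None or doc.get("contextFilter") == context_filter:
            return doc
    return None


def respond(intent_name, user_id="123"):
    if user_id in context:
        # use the stored context once, then drop it
        doc = find_intent(intent_name, context.pop(user_id))
    else:
        doc = find_intent(intent_name)
    if doc is None:
        return None
    if doc.get("contextSet"):
        context[user_id] = doc["contextSet"]
    return random.choice(doc["responses"])


print(respond("AItopic"))     # OK. Let's talk about AI.
print(respond("definition"))  # AI stands for artificial intelligence.
```

Because the context is removed after one use, it only influences the very next input from that user; a longer-lived context would need a different removal rule.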


Since this is a simple example for experimentation, the dialog with the chatbot will certainly not be intelligent. But the constructs and basic logic provide a basis for learning and can be expanded upon.

A sample dialog with this implementation may go like the following:

enter> hello
Hi there, how can I help?

enter> what do you know
I can talk about AI or cats. Which do you want to chat about?

enter> AI
OK. Let's talk about AI.

enter> what is AI
AI stands for artificial intelligence. It is intelligence displayed by machines, in contrast with the natural intelligence displayed by humans and other animals.

enter> when will AI take my job
AI is artificial intelligence that is evolving to become smart robot overlords who will dominate humans.

enter> I want to talk about something else
I can talk about AI or cats. Which do you want to chat about?

enter> cats
OK. Let's talk about cats.

enter> quit

Clearing cache in JPA 2.0

Managing a data cache effectively is important for application or service performance, although it is not necessarily easy to do. A simple and common design that uses a data cache, and that seems to cause problems, is the following:

  • a process that continually commits new data to a database
  • another process, executing in a separate virtual machine (typically a web service), marshaling data from the database

The web service needs to pull the most current data from the database each time there is a client request to access data. With different product implementations of caching, not all the JPA calls seem to work the same. There can be multiple levels of caching as well. The clear() and flush() methods of EntityManager don’t clear multiple levels of caching in all cases, even with caches turned off as set with persistence properties.

Every time there is a request to pull data from the service, the current set of data in the database needs to be represented, not what is in the cache, since the cache may not be in sync with the database. It seems like this should be simple, but it took some experimenting on my part to get this working as needed. There also doesn’t appear to be much information about handling this particular scenario. There are probably solutions posted somewhere, but I add one solution here to make it easier to find.

Before making any gets for data, use the following call:


This seems to work for all cache settings and forces all caches to be cleared. Subsequent calls to get data return the latest committed data in the database. This may not be the most efficient way, but it always gets the latest data. For systems requiring high performance, this won’t work very well. It would be better to refresh the cache periodically and have all clients just get the latest cached values. But there is still the issue of refreshing the entire cache.

Any suggestions on doing this better are appreciated. But for now, this works consistently across platforms and is reasonably quick for small to moderate amounts of data.