Build a Simple Chatbot with Tensorflow, Python and MongoDB

In order to learn about some of the latest neural network software libraries and tools, the following describes a small project to build a chatbot. Given the increasing popularity of chatbots and their growing usefulness, building one seemed like a reasonable endeavor. Nothing complicated, but enough to better understand how contemporary tools are used to do so.

This material assumes MongoDB is already installed and a Python 3.6 environment is set up and usable. Basic knowledge of NoSQL, machine learning, and general coding skills is also useful. The code and data used for this example are located on GitHub.

Requirements

Nothing complicated, just a simple experiment to play with the combination of Tensorflow, Python, and MongoDB. The requirements are:

  • A chatbot needs to demonstrate a simple conversation capability.
  • A limited set of coherent responses should be returned demonstrating a basic understanding of the user input.
  • Context should be defined in a conversation, reducing the number of possible responses to those that are more contextually relevant.

Design

Use MongoDB to store documents containing

  • a defined classification or name of the user input; this is the intent of the input/response interaction
  • a list of possible responses to send back to the user
  • a context value of the intent used to guide or filter which response lists makes sense to return
  • a set of patterns of potential user input; the patterns are used to build the model that predicts the probabilities of the intent classifications used to determine responses.

A utility will be implemented to build models from the database content. The model will be loaded by a simple chatbot framework. Execution of the framework allows a user to chat with the bot.

The chatbot framework loads a prebuilt predictive model and connects to MongoDB to retrieve documents which contain possible responses and context information. It also drives a simple command line interface to:

  • capture user input
  • process the input and predict an intent category using the loaded model
  • randomly pick a response from the intent document

In the database, each document contains a structure including:

  • the name of the intent
  • a list of sentence patterns used to build the predictive model
  • a list of potential responses associated with the intent
  • a value indicating the context used to filter the number of intents used for a response (contextSet – to define the context; contextFilter – used to filter out unrelated intents)

For example, the ‘greeting’ intent document in MongoDB is defined as:

{
    "_id" : ObjectId("5a160efe21b6d52b1bd58ce5"), 
    "name" : "greeting",
    "patterns" : [ "Hi", "How are you", "Is anyone there?", "Hello", "Good day" ],
    "responses" : [ "Hi, thanks for visiting", "Hi there, how can I help?", "Hello", "Hey" ], 
    "contextSet" : ""
}

Implementation

To build the model used to predict possible responses, the pattern sentences are used. Patterns are grouped into intents, meaning each group of sentences refers to a single conversational context. The MongoDB database is populated with a number of documents following the structure shown above.

This example uses a number of documents that let the bot talk about AI. To populate the database with content for the model, from a mongo prompt:

> use Intents
> db.ai_intents.insert({
    "name" : "greeting",
    "patterns" : [ "Hi", "How are you", "Is anyone there?", "Hello", "Good day" ],
    "responses" : [ "Hi, thanks for visiting", "Hi there, how can I help?", "Hello", "Hey" ], 
    "contextSet" : ""
})

The other documents are inserted in the same way. Various tools such as NoSQLBooster are useful when working with MongoDB databases.

There is a mongodump export of the documents used in this example in the GitHub repository.
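
Assuming the export follows mongodump's default directory layout, it can be loaded into a local instance with mongorestore (the dump path here is illustrative):

$ mongorestore --db Intents dump/Intents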

Building the Prediction Model

The first part of building the model is to read the data out of the database. The PyMongo library is used throughout this code.

from pymongo import MongoClient

# connect to db and read in intent collection
client = MongoClient('mongodb://localhost:27017/')
db = client.Intents
ai_intents = db.ai_intents

The ai_intents variable references the document collection. Next, parse information into arrays of all stemmed words, intent classifications, and documents with words for a pattern tagged with the classification (intent) name. A tokenizer is used to strip out punctuation. Each document from the ai_intents collection in the database is extracted into a cursor using ai_intents.find().

from nltk.tokenize import RegexpTokenizer
from nltk.stem.lancaster import LancasterStemmer

# stemmer normalizes words to their root form (Lancaster is one reasonable choice)
stemmer = LancasterStemmer()

words = []
classes = []
documents = []

# tokenizer will parse words and leave out punctuation
tokenizer = RegexpTokenizer(r"[\w']+")

# loop through each pattern for each intent
for intent in ai_intents.find():
    for pattern in intent['patterns']:
        tokens = tokenizer.tokenize(pattern) # tokenize pattern
        words.extend(tokens) # add tokens to list

        # add tokens to document for specified intent
        documents.append((tokens, intent['name']))

        # add intent name to classes list
        if intent['name'] not in classes:
            classes.append(intent['name'])

From this categorized information, a training set can be generated. The final data_set variable will contain, for each pattern, a bag of words and an array indicating which intent it belongs to. Words present in the pattern are flagged with a 1 in the bag array, and the output_row identifies which intent’s pattern is being evaluated. Note that the vocabulary is stemmed and de-duplicated first, so the lookups match the stemmed pattern words.

# stem and de-duplicate the vocabulary so lookups match stemmed pattern words
words = [stemmer.stem(w.lower()) for w in words]
words = sorted(set(words))

data_set = []
output_empty = [0] * len(classes)  # one zero per intent class

for document in documents:
    bag = []
    # stem the pattern words for each document element
    pattern_words = document[0]
    pattern_words = [stemmer.stem(word.lower()) for word in pattern_words]

    # create a bag of words array
    for w in words:
        bag.append(1 if w in pattern_words else 0)

    # output is a '0' for each intent and '1' for current intent
    output_row = list(output_empty)
    output_row[classes.index(document[1])] = 1

    data_set.append([bag, output_row])

The last part is to create the model. With Tensorflow and TFLearn, it is simple to create a basic deep neural network and evaluate the data set to create a predictive model from the sentence patterns defined in the intent documents. TFLearn uses NumPy arrays, so the data_set list needs to be converted to a NumPy array. Then the data_set is partitioned into the input data array and the corresponding outcome arrays for each input.

import numpy as np

data_set = np.array(data_set)

# split into input (bag of words) and output (intent) lists
train_x = list(data_set[:,0])
train_y = list(data_set[:,1])

Defining the neural network is done by setting its shape and the number of layers. Also defined is the algorithm used to fit the model; in this case, regression. The predictive model is produced by TFLearn and Tensorflow using a deep neural network trained on the defined training data. Then the model, words, classes, and training data are saved (using pickle) for use by the chatbot framework.

import pickle
import tflearn

# Build neural network
net = tflearn.input_data(shape=[None, len(train_x[0])])
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, 8)
net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
net = tflearn.regression(net)

# Define model and setup tensorboard
model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
# Start training (apply gradient descent algorithm)
model.fit(train_x, train_y, n_epoch=1000, batch_size=8, show_metric=True)
model.save('model.tflearn')

pickle.dump({'words': words, 'classes': classes, 'train_x': train_x, 'train_y': train_y},
            open('training_data', 'wb'))

Running the above code reads the intent documents, builds a predictive model and saves all the information to be loaded by the chatbot framework.

Building the Chatbot Framework

The flow of execution for the chatbot framework is:

  1. load training data generated during the model building
  2. build a neural net matching the size and shape of the one used to build the model
  3. load the predictive model into the network
  4. prompt the user for input to interact with the chatbot
  5. for each user input, classify which intent it belongs to and pick a random response for that intent


Code for the chatbot driver is simple. Since the amount of data being used in this example is small, it is loaded into memory. An infinite loop is started to prompt a user for input to start the dialog.

# connect to mongodb and set the Intents database for use
client = MongoClient('mongodb://localhost:27017/')
db = client.Intents

model = load_model()
prompt_user()
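
The body of prompt_user is not shown above; a minimal sketch of what it might look like, looping on user input until the user quits (the 'enter>' prompt and 'quit' command match the sample dialog later in this post, and the response function is described below):

def prompt_user():
    # loop forever, handing each line of input to the response logic
    while True:
        user_input = input('enter> ')
        if user_input == 'quit':
            break
        response(user_input)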

The previously created model is loaded into a simple neural network defined with the same dimensions used to create it. It is then ready to classify input from a user.
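
A sketch of what load_model might do, assuming the pickled training data and the tflearn network shape from the model-building step (the function body here is illustrative, not the exact implementation):

def load_model():
    global words, classes, train_x, train_y
    # restore the words, classes and training arrays saved during model building
    data = pickle.load(open('training_data', 'rb'))
    words = data['words']
    classes = data['classes']
    train_x = data['train_x']
    train_y = data['train_y']

    # rebuild a network with the same shape used to train the model
    net = tflearn.input_data(shape=[None, len(train_x[0])])
    net = tflearn.fully_connected(net, 8)
    net = tflearn.fully_connected(net, 8)
    net = tflearn.fully_connected(net, len(train_y[0]), activation='softmax')
    net = tflearn.regression(net)

    model = tflearn.DNN(net, tensorboard_dir='tflearn_logs')
    model.load('model.tflearn')
    return model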

The user input is analyzed to classify which intent it likely belongs to. The intent is then used to select a response belonging to the intent. The response is displayed back to the user. In order to perform the classification, the user input is:

clean_up_sentence function

  1. tokenized into an array of words
  2. each word in the array is stemmed to match stemming done in model building
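
A minimal sketch of clean_up_sentence, assuming the same tokenizer and stemmer used during model building:

def clean_up_sentence(sentence):
    # tokenize into words, leaving out punctuation
    sentence_words = tokenizer.tokenize(sentence)
    # stem each word to match the stemming done when building the model
    return [stemmer.stem(word.lower()) for word in sentence_words]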

bow function

  1. create an array the size of the word array loaded in from the model; it contains all the words used in the model
  2. from the cleaned up sentence, assign a 1 to each bag of words array element that matches a word from the model
  3. convert the array to numpy format
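
A sketch of bow along those lines (np is NumPy as imported during model building; treat this as a simplified version rather than the exact implementation):

def bow(sentence, words):
    # clean up the input and start with an all-zero array
    # the size of the model's vocabulary
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(words)
    # flag each vocabulary word that appears in the sentence
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                bag[i] = 1
    return np.array(bag)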

classify function

  1. using the bag of words, use the model to predict which intents are likely needed for a response
  2. with a defined threshold value, eliminate possibilities below a percentage likelihood
  3. sort the result in descending probability
  4. the result is an array of (intent name, probability) pairs
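
A sketch of classify following those steps; the threshold name and value here are assumptions:

ERROR_THRESHOLD = 0.25  # assumed cutoff; tune as needed

def classify(sentence):
    # run the bag of words through the model to get per-intent probabilities
    results = model.predict([bow(sentence, words)])[0]
    # drop unlikely intents, then sort by descending probability
    results = [(i, p) for i, p in enumerate(results) if p > ERROR_THRESHOLD]
    results.sort(key=lambda pair: pair[1], reverse=True)
    return [(classes[i], p) for i, p in results]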

A sample result with the debug switch set to true may look like:

enter> what is AI
[('AIdefinition', 0.9875145)]

response function

The response function gets the list of possible classifications. The core of the logic is two lines of code to find the document in the ai_intents collection matching the name of the classification. If a document is found, randomly select a response from the set of possible responses and return it to the user.

doc = db.ai_intents.find_one({'name': results[0][0]}) 
return print(random.choice(doc['responses']))

The additional logic in this function handles context about what the user asked to filter possible responses. In this example, each document has either a contextSet or contextFilter field. If the document retrieved from the database contains a contextSet value, that value should be set for the current user: a userID is added to the context dictionary with the entry’s value set to the contextSet value.

if 'contextSet' in doc and doc['contextSet']:
    if debug: print('contextSet=', doc['contextSet'])
    context[userID] = doc['contextSet']

Before querying for a document based on the classification found, the response function checks whether a userID exists in the context. If it does, the query also requires a document containing a contextFilter field whose value matches the stored context string.

if userID in context:
    doc = db.ai_intents.find_one({'$and': [
        {'name': results[0][0]},
        {'contextFilter': {'$exists': True, '$eq': context[userID]}}]})
    del context[userID]

After the document is found, the context is no longer needed and is removed from the dictionary.
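
Putting those pieces together, a simplified version of the response function might look like the following (the userID default and debug handling are trimmed; treat this as a sketch rather than the exact implementation):

import random

context = {}  # per-user conversation context

def response(sentence, userID='123'):
    results = classify(sentence)
    if not results:
        return

    if userID in context:
        # an active context narrows the query to intents whose
        # contextFilter matches the stored context value
        doc = db.ai_intents.find_one({'$and': [
            {'name': results[0][0]},
            {'contextFilter': {'$exists': True, '$eq': context[userID]}}]})
        del context[userID]  # the context has served its purpose
    else:
        doc = db.ai_intents.find_one({'name': results[0][0]})

    if doc:
        if 'contextSet' in doc and doc['contextSet']:
            context[userID] = doc['contextSet']  # set context for the next input
        return print(random.choice(doc['responses']))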

Usage

Since this is a simple example for experimentation, the dialog with the chatbot will certainly not be intelligent. But the constructs and basic logic provide a basis for learning and can be expanded upon.

A sample dialog with this implementation may go like the following:

enter> hello
Hi there, how can I help?

enter> what do you know
I can talk about AI or cats. Which do you want to chat about?

enter> AI
OK. Let's talk about AI.

enter> what is AI
AI stands for artificial intelligence. It is intelligence displayed by machines, in contrast with the natural intelligence displayed by humans and other animals.

enter> when will AI take my job
AI is artificial intelligence that is evolving to become smart robot overlords who will dominate humans.

enter> I want to talk about something else
I can talk about AI or cats. Which do you want to chat about?

enter> cats
OK. Let's talk about cats.

enter> quit

Installing Tensorflow in Anaconda on macOS

The Tensorflow website has good installation instructions for the macOS environment. The official installation instructions for macOS are provided at https://www.tensorflow.org/install/install_mac. Included are instructions for virtualenv, a native pip environment, a Docker container, the Anaconda command line, and installing from sources. Although straightforward, they don’t include installing in an Anaconda Navigator application environment.

Anaconda is a free, open-source, community-supported development environment for Python and R. Anaconda manages libraries and configurable environments. It’s also a good place to experiment with scientific and machine intelligence packages. The increasingly useful Tensorflow libraries can be used for experimentation within an Anaconda environment.

Anaconda Navigator is a desktop graphical user interface included in Anaconda. Packages, environments, and channels are easy to manage with this GUI. Anaconda can be installed by following the instructions at the Anaconda download site. After installation, it’s best to make sure the latest versions are installed. To quickly update using a command line interface:

$ conda update anaconda anaconda-navigator

Then, launch the Anaconda-Navigator application.

In the Navigator application, select the Environments menu item in the far left column. By default, there is one Root environment. Multiple environments with different configurations can be set up here over time. It’s typically best to upgrade existing packages to current versions, and the latest version of Python (3.6 at the time of this writing) should be used.

  1. Select the Environments menu item in the left column.
  2. Select the Environment to update (in this case Root).
  3. Select Upgradable from the drop-down menu.
  4. Select the version number in the Version column to define packages to upgrade. Make sure Python is the most recent version.
  5. Select Apply.


To install the Tensorflow packages, a new, clean environment can be created. It will contain the necessary base packages, and the latest versions of Python and Tensorflow will be installed into it.

  1. Select the Create button at the bottom of the Environments column.
  2. In the popup menu, type ‘Tensorflow’ in the Name text entry field.
  3. Select the Python checkbox.
  4. Select version 3.6 in the drop-down menu.
  5. Select Create.

[Image: tensorflow-environment]

Tensorflow packages can now be installed into the new environment.

  1. Select ‘Not Installed’ from the drop-down menu at the top of the right window pane.
  2. Type ‘tensorflow’ in the Search Packages text input field and hit Return.
  3. Select the checkbox in the left column next to the two tensorflow package names.
  4. Click Apply.

[Image: tensorflow-install]

To validate the installation, using the newly created Tensorflow environment:

  1. Make sure the Tensorflow environment is selected.
  2. Select the arrow next to the Tensorflow environment name.
  3. Select ‘Open with IPython’.
  4. A terminal window with the environment settings created will pop up.
  5. As recommended on the Tensorflow website, type the following into the terminal window:
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')
    sess = tf.Session()
    print(sess.run(hello))

Assuming there are no errors, the newly installed and configured environment is ready for developing with Tensorflow.

Wanted: New architectures for IoT and Augmented Reality

Software technology changes rapidly. Many new tools and techniques arrive in the software engineering realm to take on old and new problems. There are still big architecture and implementation holes yet to be addressed. For example, as a few billion more smartphones, tablets, and internet-connected sensing devices come online across the world, how are they all going to discover and utilize the available resources collaboratively?

One of the current problems with most existing architectures is that data gets routed through central servers in a data center somewhere. Typically, software systems are still built using client/server architectures. Even if an application uses multiple remote sources for data, it’s still really just a slight variation. Service and data lookups are done using a statically defined address rather than through discovery. Even remote sensing and home automation devices barely collaborate locally and require a local router to communicate with a remote server in a data center.

In the past month, I have been to both the Internet of Things World and Augmented World Expo (AWE). At both of these excellent conferences, there was at least some discussion about the need for a better infrastructure

  • to connect devices in a way to make them more useful through collaboration of resources and
  • to connect devices to provide capabilities to share experiences in real time.

But it was just talk about the need. No one yet is demonstrating any functional capabilities in this manner.

On a side note: I saw only one person, out of 3,000 or so, at the AWE conference wearing a device for augmentation. It was Steve Mann, who is considered the father of wearables. I dare say that most proponents of the technology are not ready to exhibit it, nor is the infrastructure in place to support its use effectively. There is great work progressing, though.

Peer-to-peer architectures used in file sharing, and the architecture Skype uses, start to provide directional guidance for what is to come in truly distributed architectures. Enhancing these architectures to include dynamic discovery of software and hardware resources, and orchestrating dynamic resource utilization, is still needed.

There are a few efforts in development beginning to address some of the internet-wide distributed computing platforms needed for data sharing, augmented reality, and machine learning. For those of you thinking of batch jobs or wiring up business services as distributed computing, this is not what I’m talking about. I am talking about a small-footprint software stack able to execute on many different hardware devices, with the ability for those devices to communicate directly with each other.

If you know about development efforts in this vein, I would like to hear about them.