If you’re a Silicon Valley fan like me, I’m sure you’ve seen the episode where Gilfoyle built a chatbot and had it chat with Dinesh all day, without Dinesh realising it until later that day.
Alright, let’s make you a chatbot. Before we jump into this, we’ll need:
- Data? Well, you gotta feed your AI, it doesn’t work for free
- Chatbot? We’ll need a conversational dialog engine
Data
To train the bot we’ll need some data, in this case your own texts.
The first idea that popped into my head was of course Facebook, because (1) that’s where I can find all the text I’ll need, and (2) we can export the data in an easily exploitable format, in this case JSON.
Make sure not to share your data or the trained model with anyone, as they might contain sensitive information. You could mitigate that with a de-identifier such as Google Cloud’s DLP.
Export messages from Facebook
To export your messages from Facebook, if you haven’t already, follow these steps:
- go to Settings
- click on Your Facebook Information
- navigate to Download Your Information
- select JSON in the Format select box
- click on Deselect All and click Messages under the Your Information section
- click on Create File
This could take a while. All you gotta do right now is wait for an email saying that your data is ready for download.
Preparing the data
You’ll get your data as a zip file that contains an inbox folder. This folder holds all the conversations you’ve had with your contacts, organized into one folder per conversation, each prefixed with the contact’s username.
Each conversation folder contains at least a message_1.json file, along with other folders such as gifs, which are of no interest to us here.
The message_1.json file has a very simple structure:
{
  "participants": [
    {
      "name": "Simhi"
    },
    {
      "name": "Amine Hakkou"
    }
  ],
  "messages": [
    {
      "sender_name": "Simhi",
      "timestamp_ms": 1499810593070,
      "content": "Afeen!",
      "type": "Generic"
    }
  ],
  "title": "Simhi",
  "is_still_participant": true,
  "thread_type": "Regular",
  "thread_path": "inbox/Simhi_blahblah"
}
In this article I’ll keep things simple: we won’t feed the bot all the data from all of the conversations, just one. If you feel like feeding it everything, you could tweak the code a little for that.
// Read and parse the message_1.json file of a single conversation folder.
function readConversation(path) {
  return JSON.parse(require('fs').readFileSync(
    path.concat('/message_1.json'),
    { encoding: 'utf8' },
  ))
}
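If you do want every conversation, the tweak could look roughly like the sketch below. It assumes the layout described above (one folder per conversation under inbox, each with a message_1.json). The name readAllConversations is made up for this example, and the path module is imported as nodePath so it doesn’t clash with the path string used elsewhere in this article.

```javascript
const fs = require('fs')
const nodePath = require('path')

// Hypothetical helper: parse the message_1.json of every conversation folder
// found directly under `inboxPath`. Folders without that file are skipped.
function readAllConversations(inboxPath) {
  return fs.readdirSync(inboxPath, { withFileTypes: true })
    .filter(entry => entry.isDirectory())
    .map(entry => nodePath.join(inboxPath, entry.name, 'message_1.json'))
    .filter(file => fs.existsSync(file))
    .map(file => JSON.parse(fs.readFileSync(file, { encoding: 'utf8' })))
}
```

Each returned object has the same shape as a single conversation, so its messages can go through the same cleaning steps one conversation at a time.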
We’ll also need to clean the messages, as some of them carry non-text payloads such as gifs or videos. Keeping only the messages that have a content field takes care of that.
function filterMessagesWithContent(messages = []) {
  return messages.filter(message => message.content)
}
If you skim over your messages you’ll notice that they’re sorted from newest to oldest. We’ll need to fix that.
function reverseMessagesOrder(messages = []) {
  // reverse() mutates the array in place, so reverse a copy instead
  return [...messages].reverse()
}
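Reversing relies on the export always being ordered newest first. Since every message in the sample carries a timestamp_ms field, an explicit sort is one possible alternative that doesn’t depend on that ordering. This is only a sketch; sortMessagesByTimestamp is a made-up name and the sample messages are invented.

```javascript
// Hypothetical alternative to reverseMessagesOrder: order messages oldest-first
// using the timestamp_ms field shown in the export sample.
function sortMessagesByTimestamp(messages = []) {
  // Copy first so the caller's array is left untouched.
  return [...messages].sort((a, b) => a.timestamp_ms - b.timestamp_ms)
}

// Invented sample data, deliberately out of order.
const unsorted = [
  { sender_name: 'Simhi', timestamp_ms: 3000, content: 'third' },
  { sender_name: 'Amine Hakkou', timestamp_ms: 1000, content: 'first' },
  { sender_name: 'Simhi', timestamp_ms: 2000, content: 'second' },
]
console.log(sortMessagesByTimestamp(unsorted).map(m => m.content))
// prints [ 'first', 'second', 'third' ]
```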
There’s also this little thing where you or the other party send multiple consecutive texts. For the sake of simplicity we can group them into one message.
function groupConsecutiveMessages(messages = []) {
  // nothing to group in an empty conversation
  if (messages.length === 0) return []
  const result = []
  let previousMessage = messages[0]
  for (let i = 1; i < messages.length; i++) {
    const message = messages[i]
    if (message.sender_name === previousMessage.sender_name) {
      // same sender: append the text to the message being built
      previousMessage = {
        ...previousMessage,
        content: previousMessage.content.concat(' ' + message.content),
      }
    } else {
      result.push(previousMessage)
      previousMessage = message
    }
  }
  // handle last element
  result.push(previousMessage)
  return result
}
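To see what the grouping does, here is a small worked example on an invented thread. It uses a reduce-based variant of the same idea so the snippet runs on its own; groupBySender is a made-up name, not the function from this article.

```javascript
// Sketch of the same grouping idea written with reduce.
function groupBySender(messages = []) {
  return messages.reduce((groups, message) => {
    const last = groups[groups.length - 1]
    if (last && last.sender_name === message.sender_name) {
      // Same sender as the previous group: replace it with a merged copy.
      groups[groups.length - 1] = { ...last, content: last.content + ' ' + message.content }
    } else {
      groups.push(message)
    }
    return groups
  }, [])
}

// Invented sample thread: two consecutive texts from one sender, then a reply.
const thread = [
  { sender_name: 'Simhi', content: 'Afeen!' },
  { sender_name: 'Simhi', content: 'How are you?' },
  { sender_name: 'Amine Hakkou', content: 'All good' },
]
console.log(groupBySender(thread).map(m => m.content))
// prints [ 'Afeen! How are you?', 'All good' ]
```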
We’ll also need to clean up the JSON. We’ll be losing information, sure, but we won’t need the rest anyway: only the text of each message matters.
function messagesContent(messages = []) {
  return messages.map(m => m.content)
}
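Since the texts may contain sensitive information, this is a point where a redaction pass could run. The sketch below is a crude regex-based stand-in and not a real de-identifier like the DLP service mentioned earlier: the patterns, the placeholders and the name redactMessages are all assumptions made for illustration, and they will miss plenty of cases.

```javascript
// Crude, illustrative patterns only: no substitute for a real
// de-identification service.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g
const PHONE_PATTERN = /\+?\d[\d\s().-]{7,}\d/g

// Hypothetical helper: mask e-mail addresses and phone-like numbers in each text.
function redactMessages(texts = []) {
  return texts.map(text =>
    text.replace(EMAIL_PATTERN, '[email]').replace(PHONE_PATTERN, '[phone]')
  )
}

console.log(redactMessages(['mail me at jane.doe@example.com', 'call +1 555 010 9999']))
// prints [ 'mail me at [email]', 'call [phone]' ]
```

It takes and returns a plain array of strings, so it could slot in right after messagesContent.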
And finally, write the result back to the file system.
function writeMessages(path, messages) {
  require('fs').writeFileSync(
    path,
    JSON.stringify(messages),
  )
}
Putting it all together would look something like this:
const path = 'path-to-your-data/inbox/Simhi_blahblah'
const conversation = readConversation(path)
const messages = (
  messagesContent(
    groupConsecutiveMessages(
      reverseMessagesOrder(
        filterMessagesWithContent(
          conversation.messages
        )))))
writeMessages(path.concat('/messages.json'), messages)
To run it, save the code above in a file (transform.js here) and execute it with node:
node transform.js
The Chatbot
Alright, now that we’ve got the data ready, we’ll need to train our AI. You can find a lot of conversational engines around, such as chatterbot and rasa.
We’ll go with chatterbot because it has a simpler, more straightforward API.
Prerequisites
You’ll need to have python3 installed on your machine.
Install
pip3 install chatterbot chatterbot-corpus
Code
The code below reads the JSON file we wrote with our transform script, trains the model, and answers the messages it receives from stdin.
There’s a lot you could improve on it. I kept it simple, so I used chatterbot’s ListTrainer with no additional configuration.
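import json
from chatterbot import ChatBot
from chatterbot.trainers import ListTrainer

# messages.json is the file written by the transform script into the
# conversation folder; copy it next to this script or adjust the path
with open('messages.json', 'r') as f:
    conversation = json.load(f)

chatbot = ChatBot('Amine')
trainer = ListTrainer(chatbot)
trainer.train(conversation)

# read a message, print the bot's reply, repeat (Ctrl+C to quit)
while True:
    message = input('>')
    response = chatbot.get_response(message)
    print('You:', message)
    print('Amine:', response)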
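ListTrainer reads the list as one conversation, treating each string as a possible response to the one before it. That is why the transform script put the messages in chronological order and grouped consecutive texts. To preview the pairs a given messages.json would produce, something like the sketch below could help. It is written in JavaScript to match the transform script, toStatementPairs is a made-up helper that is not part of chatterbot, and the sample texts are invented.

```javascript
// Hypothetical preview helper, not part of chatterbot: pair each message with
// the message that follows it, mirroring how a list-based trainer reads the list.
function toStatementPairs(texts = []) {
  return texts.slice(1).map((response, i) => ({ statement: texts[i], response }))
}

// Invented sample conversation.
toStatementPairs(['Afeen!', 'Hey', 'How was your day?'])
  .forEach(pair => console.log(`${pair.statement} -> ${pair.response}`))
// prints:
// Afeen! -> Hey
// Hey -> How was your day?
```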
Usage
Now all you gotta do is run the chatbot.
python3 chatbot.py