October 6, 2021

Making a Discord Chat Bot using Markov Chains (2/2)

Continuing on from the previous post, this post will describe the steps involved in creating a Discord bot to write randomly generated chat to a server. Before continuing with this post, make sure to set up a bot application and add it to your server.

There are four main steps needed to get up and running:

  • Get the corpus of chat text
  • Normalize the data
  • Build a Markov chain from the data
  • Connect to the server and listen for commands

Getting the chat text

Unlike the previous post, where we used an existing corpus of text, we now need to generate one ourselves. This text will come from the historical chat messages of a particular server channel. We can simply grab this data and write the raw messages out to a file. To do the heavy lifting of interacting with Discord, we will rely on discord.py. This library will also be used to build the bot functionality later on.

The library provides a set of client events that we can hook into. For this, we need to implement a handler for the on_ready event. When this event is triggered, our chat scraper will get the channel information via get_channel, and then get the message history by calling the channel.history method. Once we have the message history, we just iterate over each message and write the message content out to a file. Put into code, it looks like the following:

async def on_ready(self):
    print("Logged on as {}!".format(self.user))

    channel = self.get_channel(self.channel_id)
    if channel is None:
        print("Could not find channel {}".format(self.channel_id))
        return

    print("Found channel #{} for id {}. Reading messages...".format(channel.name, self.channel_id))
    with open(self.output_path, "w+", encoding="utf-8") as output_file:
        # Walk the full channel history and write one message per line.
        async for message in channel.history(limit=None, after=self.last_date):
            output = message.content + "\n"
            output_file.write(output)

That’s it for getting the chat text. After running the chat scraper against a channel, there should be an output file containing the raw message content. The next step is to clean the data up a bit and normalize it.

Normalizing the data

As described in the previous post, we want to add a normalization step in our data processing. This is done with the goal of improving the quality of sentences that the Markov chain will generate. By standardizing on capitalization, filtering out punctuation, ignoring non-printable text, and filtering out data that isn’t considered chat, the Markov chain is better able to model sentence structure. To perform the normalization, a new script is created which takes an input file containing raw chat data, and an output path to write the normalized data to. The normalization is kept pretty basic: we read in a line of text, strip out unwanted attributes (non-alphanumeric characters, extra whitespace, URLs), and write the normalized line to the output file.

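import re
import validators

def normalize_text(line):
    if not line:
        return ""

    # Collapse runs of whitespace, then drop URLs and lowercase the remaining words.
    trimmed_line = " ".join(line.split())
    split_line = ""
    for word in trimmed_line.split(" "):
        if validators.url(word):
            continue

        split_line += word.lower() + " "

    if not split_line or split_line.isspace():
        return ""

    # Remove everything except letters and spaces.
    pattern = re.compile(r"[^A-Za-z ]+", re.UNICODE)
    normalized_line = pattern.sub("", split_line)

    # Strip so that a line reduced to only spaces comes back empty.
    return normalized_line.strip()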

def main(args):
    print("Reading input file {}. Writing output to {}.".format(args.inputfile, args.outputfile))

    with open(args.outputfile, "w+", encoding="utf-8") as output_file:
        with open(args.inputfile, encoding="utf-8") as input_file:
            for input_line in input_file:
                output_line = normalize_text(input_line.rstrip())
                # Skip lines that are empty after normalization.
                if output_line:
                    output_file.write(output_line + "\n")

    print("Finished processing.")

After running the script, we have an output file with better data. It is by no means perfect: normalizing chat messages is a very challenging problem. Even after stripping out some “non-chat” portions of the message, there are still issues with the data: a message may contain typos, a message may not be a complete sentence, a message can be nonsensical (e.g. a user spamming random keys), and a whole host of other issues. These issues are acknowledged, but hand-waved away in this post, since the purpose is to create a fun and simple chat bot, and not something that tries to generate the most realistic sentences possible.
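
To get a feel for what this kind of normalization does to individual messages, here is a small standalone sketch. It is not the script from this post: the names looks_like_url and normalize_line are made up for illustration, and the URL check is a rough standard-library stand-in (urllib.parse) for validators.url, so it will not catch exactly the same set of URLs. It also collapses leftover whitespace at the end, which the script above leaves alone.

```python
import re
from urllib.parse import urlparse

# Anything that is not a lowercase letter or a space gets removed.
NON_LETTERS = re.compile(r"[^a-z ]+")

def looks_like_url(word):
    """Rough stand-in for validators.url: require an http(s) scheme and a host."""
    try:
        parsed = urlparse(word)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def normalize_line(line):
    """Drop URLs, lowercase, strip non-letters, and collapse whitespace."""
    words = [word.lower() for word in line.split() if not looks_like_url(word)]
    cleaned = NON_LETTERS.sub("", " ".join(words))
    return " ".join(cleaned.split())

if __name__ == "__main__":
    samples = [
        "Check THIS out: https://example.com/page   lol!!",
        "https://example.com",
        "12345 ???",
    ]
    for sample in samples:
        print(repr(sample), "->", repr(normalize_line(sample)))
```

The first sample comes out as "check this out lol", while the other two are reduced to empty strings and would be skipped when writing the output file.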

Putting everything together into a bot

At this point, we should have enough knowledge to put everything together. We have the background information for how Markov chains work and how to create them, we have a (kind of) normalized data set to work with, and we have the appropriate library available to interact with Discord. We now have to glue these things together: we will create a program that takes an input file containing chat data and the bot token. This program will read the chat data and build the Markov chain for it, re-using the code from the previous article. The bot will then connect to Discord and wait for on_message events. When a message consisting of the !talk command is received, the bot will generate a random sentence and send it to the channel. Putting this into code, you get the following:

import argparse
import collections
import discord
import random
import re

class MarkovBot(discord.Client):

    def __init__(self, args):
        # Note: discord.py 2.x also requires an intents= argument here.
        super().__init__()

        self.token = args.token

        with open(args.inputfile, encoding="utf-8") as input_file:
            text = input_file.read()

        self.markov_table = self.create_markov(text)
        self.word_list = list(self.markov_table.keys())

    def generate_sentence(self, markov_table, seed_word, num_words):
        if seed_word in markov_table:
            sentence = seed_word.capitalize()
            for i in range(0, num_words):
                # Stop early if the current word has no recorded successors.
                if not markov_table.get(seed_word):
                    break

                next_word = random.choice(markov_table[seed_word])
                seed_word = next_word

                sentence += " " + seed_word

            return sentence
        else:
            print("Word {} not found in table.".format(seed_word))

    def create_markov(self, normalized_text):
        words = normalized_text.split()
        markov_table = collections.defaultdict(list)
        # Record every adjacent word pair as an edge from the first word to the second.
        for current, following in zip(words, words[1:]):
            markov_table[current].append(following)

        return markov_table

    def run(self):
        super().run(self.token)

    async def on_ready(self):
        print("Logged on as {}!".format(self.user))

    async def on_message(self, message):
        # Ignore the bot's own messages.
        if message.author == self.user:
            return

        response = ""
        if message.content == "!talk":
            response = self.generate_sentence(self.markov_table, random.choice(self.word_list), random.randint(7, 25))

        if response:
            await message.channel.send(response)

def main(args):
    client = MarkovBot(args)
    client.run()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    optional = parser._action_groups.pop()

    required = parser.add_argument_group("required arguments")
    required.add_argument("-t", "--token", required=True)
    required.add_argument("-i", "--inputfile", required=True)

    # Re-add the optional group so it is listed after the required arguments in --help.
    parser._action_groups.append(optional)
    args = parser.parse_args()
    main(args)
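
Since the interesting part of the bot does not depend on Discord at all, it can be handy to try the command handling offline before connecting. The sketch below is one possible way to do that, not part of the bot itself: respond is a hypothetical helper that mirrors the logic of on_message and generate_sentence as a plain function, and the tiny corpus is made up for illustration.

```python
import collections
import random

def create_markov(normalized_text):
    """Map each word to the list of words that follow it in the text."""
    words = normalized_text.split()
    table = collections.defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def respond(content, table, rng=random):
    """Return a reply for a chat message, or None if it is not the !talk command."""
    if content != "!talk":
        return None

    # Pick a random seed word and walk the chain for 7 to 25 steps.
    word = rng.choice(list(table.keys()))
    sentence = [word.capitalize()]
    for _ in range(rng.randint(7, 25)):
        followers = table.get(word)
        if not followers:
            break  # the current word never had a successor in the corpus
        word = rng.choice(followers)
        sentence.append(word)
    return " ".join(sentence)

if __name__ == "__main__":
    table = create_markov("the cat sat on the mat and the dog sat on the rug")
    print(respond("hello", table))   # not a command, so None
    print(respond("!talk", table))   # a randomly generated sentence
```

Passing a seeded random.Random instance as rng makes the output repeatable, which is convenient when checking changes to the generation logic.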



There are three scripts that need to be executed in order to get the bot running. These scripts are developed throughout this series and are also available on GitHub.

python .\chat_scraper.py -c <Channel ID> -t <Bot Token> -o <Raw Data Filename>
python .\text_normalizer.py -i <Raw Data Filename> -o <Normalized Data Filename>
python .\markov_bot_simple.py -t <Bot Token> -i <Normalized Data Filename>

Once the bot is running, sending !talk in a channel it can read produces messages like the following:


Razer black plague for that whole thing the scene down with us

Betting on youtube. thats one hand into play occasionally.

Happening. im getting a fukkin animal not sure

Radio show. i completely forgot about it was brought portal ingredients. i dont like gme momentum has blue screen combo.

Making a Discord Chat Bot using Markov Chains (1/2)

Markov chains are a really interesting statistical tool that can be used to model phenomena in a wide range of fields. They can be found in the natural sciences, information theory, economics, games, and more. They are commonly used to model stochastic processes, or more informally, a set of random variables that change over time. Markov chains can be expressed in several different notations, though the most common, and the one used in this post, is a weighted directed graph. An example of a Markov chain is shown below:

This Markov chain has two states, denoted by vertices A and E. Each vertex has incoming and outgoing edges, with each edge having a weight value associated with it corresponding to a probability. For example, starting at state A, there is a 40% chance that the next state will be a transition to state E, and a 60% chance that the next state will stay at A. In order to be a Markov chain, the sum of all outgoing edge weights for every node must add up to 1.0 (100%); if we start at a vertex then we have to make a move somewhere in the next step. The other characteristic is that the future only depends on the immediate past: the probability of transitioning to any particular state is dependent solely on the current state, and not on the sequence of state transitions that happened earlier in time. This gives Markov chains the property of being memoryless.
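
As a rough illustration of these two properties, the sketch below simulates a two-state chain. The probabilities for state A (0.6 to stay, 0.4 to move to E) come from the description above; the probabilities for state E are not given in the text, so the values used here are an assumption picked purely for the example. The function names validate and walk are also made up.

```python
import random

# Each state maps to its outgoing edges and their weights.
TRANSITIONS = {
    "A": {"A": 0.6, "E": 0.4},  # from the text
    "E": {"A": 0.7, "E": 0.3},  # assumed values, for illustration only
}

def validate(transitions):
    """Check that the outgoing weights of every state sum to 1.0."""
    for state, edges in transitions.items():
        total = sum(edges.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError("Weights for state {} sum to {}".format(state, total))

def walk(transitions, start, steps, rng=random):
    """Random walk: the next state depends only on the current state."""
    state = start
    path = [state]
    for _ in range(steps):
        edges = transitions[state]
        state = rng.choices(list(edges.keys()), weights=list(edges.values()))[0]
        path.append(state)
    return path

if __name__ == "__main__":
    validate(TRANSITIONS)
    print(" -> ".join(walk(TRANSITIONS, "A", 10)))
```

Note that walk never looks at path when choosing the next state; only the current state is consulted, which is the memoryless property in practice.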

An example

Despite initially seeming complex, creating a basic Markov chain is pretty straightforward. To create the Markov chain, we need to take the input corpus of text and build a directed graph for it. Each vertex in this graph will correspond to a word, and each vertex will have an edge to every word that directly follows it in the text. Then, to generate a sentence, we can start with a seed word and perform a random walk from it, up to a specified length.

As an example, look at the following input text:

A paragraph is a self-contained unit of discourse in writing dealing with a particular point or idea. A paragraph consists of one or more sentences. Though not required by the syntax of any language, paragraphs are usually an expected part of formal writing, used to organize longer prose.

To quickly build a directed graph for this, we can utilize a dictionary. Each key in this dictionary will correspond to a vertex, and the list of values for this key will correspond to edges.

{
  'a': ['paragraph', 'self-contained', 'particular', 'paragraph'],
  'paragraph': ['is', 'consists'],
  'is': ['a'],
  'self-contained': ['unit'],
  'unit': ['of'],
  'of': ['discourse', 'one', 'any', 'formal'],
  'discourse': ['in'],
  'in': ['writing'],
  'writing': ['dealing', 'used'],
  'dealing': ['with'],
  'with': ['a'],
  'particular': ['point'],
  'point': ['or'],
  'or': ['idea', 'more'],
  ...
}

If we look at the dictionary output and reference the original text, we can better understand how it was built. Looking through the text, for every instance where the word a appears, we store its adjacent word. In this case, there were four places where the word a had an adjacent word:

A paragraph is …

a self-contained unit of discourse …

a particular point or idea …

A paragraph consists of one ….

The same logic follows for paragraph, is, and so on. One important thing to note is that the keys are not case sensitive. Occurrences of A and a in the text are treated as the same word, which is the behavior that we want. Aside from capitalization, there are also other features of the text that we would like to transform or filter out: punctuation marks, newlines, extraneous spaces, and non-printable characters should be removed from the text before the Markov chain is built. This normalization process will help make the output look more consistent with how real sentences are structured.

Another important feature to notice is that there are repetitions: you can see the word paragraph multiple times for the key a. This is done for simplicity of implementation: we are creating redundant edges from a vertex instead of building and updating a transition matrix. With the redundant approach, we can just randomly select any edge to transition along and not have to explicitly keep track of probabilities, because a word that appears more often in the list is proportionally more likely to be picked. This makes both building the model and generating sentences easier. However, there is the obvious downside of greater memory overhead, since every occurrence of a word pair is stored.
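
To make the link between repeated entries and probabilities concrete, the sketch below builds the successor lists for the first two sentences of the sample paragraph and then converts one list into explicit edge weights. The text is assumed to be already lowercased with punctuation removed, and edge_probabilities is a hypothetical helper that is not needed by the actual script; it only shows what the redundant lists implicitly encode.

```python
import collections

# First two sentences of the sample paragraph, assumed already normalized.
TEXT = ("a paragraph is a self-contained unit of discourse in writing dealing with "
        "a particular point or idea a paragraph consists of one or more sentences")

def build_successors(text):
    """Map each word to the list of words that follow it, duplicates included."""
    words = text.split()
    table = collections.defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def edge_probabilities(successors):
    """Turn a successor list into the transition probabilities it encodes."""
    counts = collections.Counter(successors)
    total = len(successors)
    return {word: count / total for word, count in counts.items()}

table = build_successors(TEXT)
print(table["a"])
# ['paragraph', 'self-contained', 'particular', 'paragraph']
print(edge_probabilities(table["a"]))
# {'paragraph': 0.5, 'self-contained': 0.25, 'particular': 0.25}
```

Picking uniformly at random from the list for a therefore selects paragraph half of the time, which is exactly what a transition matrix row for a would say.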

Having said that, the code for everything is shown below:

import argparse
import collections
import os.path
import random
import re
import sys

def generate_sentence(markov_table, seed_word, num_words):
    if seed_word in markov_table:
        sentence = seed_word.capitalize()
        for i in range(0, num_words):
            # Stop early if the current word has no recorded successors.
            if not markov_table.get(seed_word):
                break

            next_word = random.choice(markov_table[seed_word])
            seed_word = next_word

            sentence += " " + seed_word

        return sentence
    else:
        print("Word {} not found in table.".format(seed_word))

def create_markov(normalized_text):
    words = normalized_text.split()
    markov_table = collections.defaultdict(list)
    # Record every adjacent word pair as an edge from the first word to the second.
    for current, following in zip(words, words[1:]):
        markov_table[current].append(following)

    return markov_table

def normalize_text(raw_text):
    # Keep only letters, digits, hyphens, and spaces, then collapse whitespace.
    pattern = re.compile(r"[^a-zA-Z0-9\- ]")
    normalized_text = pattern.sub("", raw_text.replace("\n", " ")).lower()
    normalized_text = " ".join(normalized_text.split())

    return normalized_text

def main(args):
    if not os.path.exists(args.inputfile):
        print("File {} does not exist.".format(args.inputfile))
        sys.exit(1)

    with open(args.inputfile, "r", encoding="utf-8") as input_file:
        normalized_text = normalize_text(input_file.read())

    model = create_markov(normalized_text)
    generated_sentence = generate_sentence(model, normalize_text(args.seed), args.numwords)
    if generated_sentence:
        print(generated_sentence)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    optional = parser._action_groups.pop()

    required = parser.add_argument_group("required arguments")
    required.add_argument("-i", "--inputfile", required=True)
    required.add_argument("-s", "--seed", required=True)

    # type=int so that a value given on the command line is not left as a string.
    optional.add_argument("-n", "--numwords", nargs="?", type=int, default=30)

    # Re-add the optional group so it is listed after the required arguments in --help.
    parser._action_groups.append(optional)
    args = parser.parse_args()
    main(args)


This script takes an input file, a seed word, and an optional maximum number of words to generate. It will then open and read the input file, normalize the text, build the Markov chain, and generate a sentence. Running this script against a larger corpus of text, such as the first two chapters of Hackers: Heroes of the Computer Revolution, can produce some pretty funny output:

His group that had a rainbow-colored explosion of an officially sanctioned user would be bummed among the kluge room along with

A sort of the artistry with owl-like glasses and for your writing systems programs–the software so it just as the computer did the clubroom

This particular keypunch machines and the best source the execution of certain students like a person to give out and print it

Having learned a bit more about Markov chains, the next part will cover the steps needed to build a Discord bot that utilizes them to generate chat messages.

