Project Overview
If you are considering writing a bot yourself, or just enjoy reading about the adventures of other developers, this article serves as a list of pitfalls and how to avoid them, told as the quest I took on. It is a long one, as I faced many challenges, so go grab that coffee or popcorn and learn something along the way.
This adventure was taken in C++. Other languages and frameworks would have made the project significantly easier, but C++ was chosen for three reasons: code reuse from a Twitch bot I use for my programming livestream, it is my comfort language, and I want to use websockets in other C++ projects. The last reason was the primary driving force.
The objective of the project was to create a Discord bot to run on my in-house server box 24/7. The server box runs Linux, and at the start of the project I was running Ubuntu on my development machine as an experiment in gamedev on Linux. The project started with a search for websocket frameworks:
- https://github.com/zaphoyd/websocketpp
- https://github.com/mattgodbolt/seasocks
- https://github.com/uNetworking/uWebSockets
- And more: https://github.com/facundofarias/awesome-websockets
I did not want to use boost. While I like the concept, I have nightmares about it from other projects. In fact I very much dislike dependency management in C++, but it is still sometimes easier than writing a custom implementation. So I dug into seasocks, got it building and linking, and started using it before realizing… seasocks is specifically a websocket server and does not have client capabilities. Wasted effort.
The First Challenge, Quitting and a Minor Victory
The first 6 hours were spent jumping between various frameworks and writing a custom implementation. Then I quit the project. Yes, I quit the project after 6 hours, deeming it not worth the hassle and effort. After a break of a couple of days the project was reawakened with the intent to write the implementation myself and power on through. I get knocked down, but I get back up again.
Secure websockets were required to connect to Discord, although not required for my future websocket needs. The next several hours of the project were spent dealing with this. First I attempted to use OpenSSL, and what a fiasco that was. With much help from twitch chat, I jumped over to LibreSSL and in a few more hours I finally had a connection with Discord. The first baby step. The connection died after a few seconds (no heartbeats sent), but the upgrade from https:// to wss:// was successful.
I already have TCP and UDP socket implementations for my game development projects, and I wanted to use WebSockets through those socket implementations for future needs. The challenge was getting LibreSSL to use those sockets, though this one was rather easy: swapping tls_connect_socket() with tls_connect_cbs() allows callbacks to be used for reading and writing to a custom socket implementation. After a bit of cleanup from all the prototyping I was now feeling pretty solid after two victories.
Implementing WebSocket Protocol
This started out way worse than expected; rfc6455 was a bit scary at first. As stated, the https to wss upgrade had been implemented in some form to get TLS working with LibreSSL. But to send or receive data over a websocket, special frames are used which contain a header that can be anywhere between 2 and 14 bytes depending on the payload length and masking. This was a perfect place to use a C++ bitfield. Manual bitwise operations could also meet the needs, but bitfields are cleaner.
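struct FrameHeader
{
    // First byte. Bit-field order is implementation-defined; this layout assumes
    // the compiler allocates from the least significant bit (e.g. GCC on x86).
    std::uint8_t mOpCode : 4;    // frame type (text, binary, close, ping, pong)
    std::uint8_t mReserved : 3;  // RSV1-3, zero unless an extension is negotiated
    std::uint8_t mFinished : 1;  // FIN: final fragment of a message
    // Second byte.
    std::uint8_t mLength : 7;    // payload length, or 126/127 for extended lengths
    std::uint8_t mMask : 1;      // set when the payload is masked
};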
I admit the image in rfc6455#page-28 describing the frame did throw me for a loop, and my first instinct flipped the most and least significant bits of each byte. This was pretty easy to discover and fix. Each frame will have at least two bytes, and after reading those bytes in they can be cast into the header to extract information. The actual payload length is stored either within those 7 bits, or in an extra 2 or 8 bytes that follow the first two when the length (in FrameHeader) is 126 or 127 respectively. Pay attention to the endianness of those larger lengths: the network expects big-endian on the wire, most significant byte first.
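For anyone whose compiler lays out bitfields differently, here is a minimal sketch of the manual bitwise alternative. ParsedHeader and ParseFrameHeader are names made up for illustration, not code from the bot, and the sketch only decodes the header (it does not read the mask key or validate reserved bits).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoded view of a WebSocket frame header (RFC 6455, section 5.2).
struct ParsedHeader
{
    bool finished;                // FIN bit
    std::uint8_t opCode;          // low 4 bits of the first byte
    bool masked;                  // MASK bit
    std::uint64_t payloadLength;  // real payload length after decoding
    std::size_t headerSize;       // header bytes, including any mask key
};

// Returns false when `size` bytes are not enough to decode the whole header.
bool ParseFrameHeader(const std::uint8_t* data, std::size_t size, ParsedHeader& header)
{
    if (size < 2) { return false; }

    header.finished = (data[0] & 0x80) != 0;
    header.opCode = static_cast<std::uint8_t>(data[0] & 0x0F);
    header.masked = (data[1] & 0x80) != 0;

    // 0..125 is the length itself; 126 and 127 announce 2 or 8 extra bytes.
    const std::uint8_t shortLength = static_cast<std::uint8_t>(data[1] & 0x7F);
    std::size_t extendedBytes = 0;
    if (shortLength == 126) { extendedBytes = 2; }
    else if (shortLength == 127) { extendedBytes = 8; }

    header.headerSize = 2 + extendedBytes + (header.masked ? 4 : 0);
    if (size < header.headerSize) { return false; }

    // Extended lengths arrive big-endian: most significant byte first.
    header.payloadLength = shortLength;
    if (extendedBytes > 0)
    {
        header.payloadLength = 0;
        for (std::size_t i = 0; i < extendedBytes; ++i)
        {
            header.payloadLength = (header.payloadLength << 8) | data[2 + i];
        }
    }
    return true;
}
```

Building the length one byte at a time with shifts gives the same result on any host, so no byte swapping is needed on the reading side.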
An additional 4 bytes are added to describe the mask when the mask bit is on. Apparently this is to prevent packets getting cached by looking similar to http. Though this made little sense to me, I pressed on with help from viewers, playing with the XOR operator to mask the payload. A websocket client is expected to set the mask bit and XOR each byte of the payload with the bytes of the mask. The mask is randomly selected for each frame of data. Note that this doesn't add security, as anyone can see the mask and unmask the data; it just makes two frames that would otherwise be identical become different.
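The masking step itself is tiny. A minimal sketch, with ApplyMask being a made-up name and the choice of the random key left to the caller:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// XORs each payload byte with the mask key, cycling through its 4 bytes.
// Applying the same key a second time restores the original payload.
void ApplyMask(std::vector<std::uint8_t>& payload,
               const std::array<std::uint8_t, 4>& maskKey)
{
    for (std::size_t i = 0; i < payload.size(); ++i)
    {
        payload[i] ^= maskKey[i % 4];
    }
}
```

Because XOR undoes itself, the same function serves for both masking outgoing frames and unmasking a test frame.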
At this point data was being received from Discord in WebSocket frames and could be parsed as json. Upon connection Discord sends a HELLO message through the gateway, which tells the bot how often the HEARTBEAT message should be sent to keep the connection alive. Part of the heartbeat was to contain the last sequence number given by Discord, which was fairly straightforward, and at this point the connection could live on, but do nothing otherwise.
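As a rough sketch, the heartbeat payload can be built like this. The opcode value 1 and the "op" and "d" field names are taken from my reading of the Discord gateway documentation and should be checked against the current docs; MakeHeartbeatPayload is a made-up name.

```cpp
#include <string>

// Builds the JSON text for a gateway heartbeat.
// Assumption: opcode 1 means heartbeat and "d" carries the last sequence
// number, or null when no sequence number has been received yet.
std::string MakeHeartbeatPayload(bool hasSequence, long long lastSequence)
{
    const std::string sequence = hasSequence ? std::to_string(lastSequence) : "null";
    return "{\"op\":1,\"d\":" + sequence + "}";
}
```

The resulting string still has to be wrapped in a masked text frame before it goes over the socket.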
Becoming a Detective
After receiving a HELLO the bot is expected to send an IDENTIFY message, which contains the bot's token, and Discord returns a READY message on success. At that point the bot is connected… Or it should have been. After sending the IDENTIFY message my bot would immediately disconnect once a HEARTBEAT message was sent, or, if HEARTBEAT was never sent, would never receive the READY message before being disconnected by timeout.
There were many hours of digging into this issue. A debug tool for logging hexdumps was added to my debug framework as well. Not entirely sure how I lived this long without that tool, but it will definitely save me in the future. With the hexdumps I was able to start comparing what was getting sent to expectations. I also wrote a small ‘test’ of sorts that created a websocket frame and parsed it to ensure everything worked.
By far the hardest case to solve was that of the disappearing bug. At first it started with a few random failures and “that was weird”, but no obvious suspect was found. The investigation continued and multiple suspects were questioned. Undefined behavior was discovered when the bot was run several times, without recompiling, with different results: a simple test frame with the string “INDIE” was being unmasked correctly as “INDIE” on some runs and as “INDGD” on others. Digging deeper into the handling of the payload revealed a rookie mistake: holding a reference to data within a std::vector while calling push_back(), which can reallocate the storage and leave the reference dangling. The solutions were simple: either reserve the size required up front or do not hold the reference.
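Here is a minimal sketch of both fixes applied to a guess at the kind of code involved. The original code is not shown in this article, so AppendMaskedPayload and its layout (mask key followed by masked payload) are assumptions made purely for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: appends the 4-byte mask key, then the masked payload.
void AppendMaskedPayload(std::vector<std::uint8_t>& frame,
                         const std::array<std::uint8_t, 4>& maskKey,
                         const std::string& payload)
{
    // Fix 1: reserve everything up front so push_back cannot reallocate here.
    frame.reserve(frame.size() + maskKey.size() + payload.size());

    // Fix 2: remember an index into the vector, not a pointer or reference.
    const std::size_t maskOffset = frame.size();
    frame.insert(frame.end(), maskKey.begin(), maskKey.end());

    // The buggy pattern looks like this:
    //   const std::uint8_t* mask = &frame[maskOffset];
    //   frame.push_back(payload[i] ^ mask[i % 4]);  // mask may dangle after growth
    for (std::size_t i = 0; i < payload.size(); ++i)
    {
        const std::uint8_t key = frame[maskOffset + (i % 4)];
        frame.push_back(static_cast<std::uint8_t>(
            static_cast<std::uint8_t>(payload[i]) ^ key));
    }
}
```

Either fix alone is enough; using both keeps the function correct even if someone later removes the reserve call.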
When everything was looking good with the test code, Discord still failed to respond to IDENTIFY with a READY. A significant investigation revealed the IDENTIFY message was larger than 125 bytes, while heartbeats and tests were smaller. This led to the discovery that endianness had been ignored. I take the simplest approach first, and in previous experiences endianness was often mentioned but in practice things always seemed to work without messing around. Not this time.
The solution was quite easy. Just reverse the order of the bytes, relative to how they are stored in memory, when sending or receiving them. So if the uint16 length was 0x1234, stored in memory (little-endian) as 0x34, 0x12, then over the wire the bytes need to be sent (big-endian) as 0x12, 0x34. The same process applies for the uint64; there are simply more bytes to swap around. Finally the READY message arrived.
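One way to sidestep the swap entirely is to write the length out with shifts, which produces big-endian bytes regardless of how the host stores integers. A minimal sketch, with AppendPayloadLength being a made-up name:

```cpp
#include <cstdint>
#include <vector>

// Appends the second header byte (mask bit + 7-bit length) followed by the
// 2- or 8-byte extended length, most significant byte first.
void AppendPayloadLength(std::vector<std::uint8_t>& out, std::uint64_t length, bool masked)
{
    const std::uint8_t maskBit = masked ? 0x80 : 0x00;
    int extendedBytes = 0;

    if (length <= 125)
    {
        out.push_back(static_cast<std::uint8_t>(maskBit | length));
    }
    else if (length <= 0xFFFF)
    {
        out.push_back(static_cast<std::uint8_t>(maskBit | 126));
        extendedBytes = 2;
    }
    else
    {
        out.push_back(static_cast<std::uint8_t>(maskBit | 127));
        extendedBytes = 8;
    }

    // Shifting from the top byte down emits big-endian order on any host.
    for (int shift = (extendedBytes - 1) * 8; shift >= 0; shift -= 8)
    {
        out.push_back(static_cast<std::uint8_t>((length >> shift) & 0xFF));
    }
}
```

With a masked 0x1234-byte payload this emits 0xFE, 0x12, 0x34, matching the example above.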
Sending a Chat Message
After receiving the READY message other messages came in as well, MessageCreate being the interesting one to dig into. It was very easy to parse the json object to get the contents of the message and add a very simple if (message == "!time") { Respond("time is..."); }. Well, that was where simplicity ended. Implementing Respond() took a lot of digging into Discord documentation before figuring out that, apparently, sending a message requires the http api and cannot be done through the websocket connection.
I don’t know and cannot speculate why responding to a message is done through an entirely different connection when a perfectly good connection already exists, but I am sure there are reasons. Sending an http post was not too hard since I already had a wrapper around libcurl to do just this, with the exception that my wrapper didn’t send data, only post parameters, headers and url. It was quick work to find and implement a way to post data with CURLOPT_POSTFIELDS. However, I am cursed.
With postfields you need to give a pointer to the data and, in another option, the size of the data. Unless using CURLOPT_COPYPOSTFIELDS, the data needs to be managed on your end. This was all effortless. But it did not work. Discord sent back {"message": "Cannot send an empty message","code": 50006}. Debug output from curl showed the entire contents of my post data was sent successfully, so how was the message empty? I checked the json, and everything else, about 400 times. I even used curl from the command line with the --libcurl file.c switch to compare the generated code with mine.
After a lot of attempts I removed the null-terminator that I had naively copied into the data given to postfields. This was the problem. As a game developer I am not well versed in the internet or the http protocol, but sending a null-terminator byte is evidently extremely bad and Discord throws the contents away. I guess this is a defense against a Null Byte Poisoning attack, where the server sanitizes content up to the null byte but potentially processes unsanitized content after it.
Finally I smashed through the last wall victorious. The bot responded to the !time command with my local time. A significant amount of code cleanup and refactoring followed so the project could be maintained into the future and more commands added. There are many plans to enhance my discord server and live-streaming overlay.
Wrap-up
The takeaway is that programming is often about persistence: digging through concrete walls with a plastic spoon. I nearly quit this project at the start, but instead I got back up and powered through. By jumping over, crushing through and going around multiple walls, I managed to get my discord bot working, and that makes the project much more rewarding. For more of my projects check out my development stream on twitch.tv/timbeaudet.
Top comments (10)
I read the headline and came into the article expecting graphic details regarding a Terminator-level apocalyptic event and the birth of Skynet. Instead I get a story about a guy who had a hard time solving a computer bug. I am disappointed. :(
There is a fine balance between a catchy title and click-bait that I am still learning. I believe "nearly killed me" is some colorful language given the MULTIPLE challenges and pain faced, and I did not feel it was misrepresenting the contents.
Clearly I was wrong for you, and I am sorry.
lol, I was only joking. Don't take things so seriously. ;)
ok this is turning into a rant. you might want to skip over this.
ive done the same at github.com/samiam308/coalbotbeta/. i ended up using boost asio + beast, and nlohmanns json. damn thing takes way too long to compile for such a simple and unsafe bot.
there is a standalone version of asio that doesnt use boost, so you could port beast to that, but i wouldnt say wasting your time with any of it is a good idea. asios completion token system is a mess. when you make a call into an io operation, you are instantiating maybe dozens of templates, depending on how many other operations that io operation does. all that gets you is big compile times, big binary sizes, and big instruction cache thrashing. i like being able to change the way the result is returned back to me, but putting it that close to the actual operation is a horrible idea.
plus, absolutely nothing can be written in separate translation units, unless you want to spend hours writing some extra complex wrappers for each operation on top of the boiler plate needed for supporting completion tokens. and while wrapping a completion token is already kinda tricky, wait until you try passing streams around, the io functions take templated buffers so you cant pass them around unless you want to limit yourself to regular const_buffer or mutable_buffer, meaning you will likely have to write new wrappers for each operation and its constraints.
i have no idea how asio made it into c++20 with this problem. we need to separate the completion token code into something polymorphic (good opportunity to fix std::futures too), and make streams polymorphic.
theres no custom event loops either. having a standardized event loop is nice because then multiple libraries can use the same one. but, theres plenty of valid reasons to be using a different event loop. maybe its performance, maybe its because someone already wrote their code around another event loop before the standards came out. in the second case, if the original loop is customizable, maybe they can wrap the original loop around io_context before they can include that library. in the first case, they will have to use threads and maintain two event loops, and that library will not run as fast as they would like.
anyway, about beast http, its also low level. you have to open the connection yourself and make the request, then either close it or manage a list of open sockets. i was never able to get this right, but thats on me, not the developer of beast.
I used the sleepy-discord library on GitHub. For another websocket project I was using Poco, but it was such a pain to take care of the tiny details of Websockets. I now have the code setup so I can either use Websocketpp or CPPRESTSDK. Websocketpp requires at least standalone ASIO. CPPRESTSDK requires boost on Linux.
I tried getting ASIO stuff for at least a few minutes and failed before the "I quit" phase.
it didn't really nearly kill you, did it. or even slightly tickle you.
to be honest if you'd chosen an existing library for C++ that doesnt tie you to asio and a ton of dependencies, you could have got a massive leg up. take a look: dpp.dev/
hope this helps!
Well this was now a good long while ago, but I did go searching for an existing library and at the time all Web Socket libraries required the use of Boost - which I was not willing to use.
Boost is hot garbage, it gets all this praise but I think the issue is once you get it compiled then whatever you were working on is peanuts by comparison.
There are some nice things that go through it, but I (obviously) agree it is a pain and I avoid it when I can.