The Robust Realtime Server
This is going to be a long blog post, but I promise you will find some interesting engineering here, so stay till the end.
The realtime server handles live updates of webpages when the data changes
in the data storage system (database or cache). We had a realtime server
in place, but there was a big problem with scaling it.
####Problem with nowjs
I was told beforehand that my first and primary task, among many other things,
would be writing a realtime server. Vivek Prakash told me he had written a
realtime server implementation some time ago with nowjs. The problem with it
was that it didn’t scale well beyond ~200 simultaneous connections. In a
conversation on Google Groups, I came across this:
> In my experience, the underlying “socket.io” module is not able to scale well
> (more than 150 connections was a problem for me), so I had to retreat from
> using “nowjs” or more specifically, “socket.io” in one of my applications.
After further inspection, we also found a file descriptor leak: the nowjs
server reported ENFILE/EMFILE (too many open files). The nowjs project was
also abandoned in 2012, and the last commit in its GitHub repo is a year old.
So we needed a good alternative that could handle a large number of
simultaneous connections (or users). I didn’t have to do much research, as
Vivek had already looked into it. He found Tornado and Meteor.js to be good
alternatives. Going by order of preference and popularity, I chose Tornado,
also because its integration with the existing system looked simpler and more
efficient.
####The Use Case
Vivek explained to me how the different components of code submission work.
Here is a quick summary. When the user submits code, a POST request is sent to
the web server, which forwards the submission details to a message queue on
the RabbitMQ server (a message broker that connects the various application
components). The code-checker engine (the RabbitMQ consumer here) picks up the
submission details, evaluates the code, and publishes the result to another
message queue. It also notifies the web servers about the result so that the
appropriate database entry is made. The whole process is completely
asynchronous. An AMQP listener also takes the result off the message queue and
finally sends it to the client (browser) using the nowjs communication APIs.
The flowchart will give you a good idea of how the different components are
connected.
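As a rough, in-process illustration of the flow described above (not the actual HackerEarth code), the sketch below uses Python's standard `queue.Queue` as a stand-in for the two RabbitMQ queues. The function names `submit_code`, `run_checker`, and `run_listener`, the message fields, and the placeholder "evaluation" are all hypothetical.

```python
import queue

# Stand-ins for the two RabbitMQ queues (assumption: real code talks to a broker).
submission_queue = queue.Queue()
result_queue = queue.Queue()


def submit_code(submission_id, source):
    """Web server side: enqueue the submission details after the POST request."""
    submission_queue.put({"id": submission_id, "source": source})


def run_checker():
    """Code-checker side: consume one submission, evaluate it, publish the result."""
    job = submission_queue.get()
    # Placeholder evaluation; a real checker would compile and run the code.
    status = "accepted" if job["source"].strip() else "rejected"
    result_queue.put({"id": job["id"], "status": status})


def run_listener(send_to_client):
    """Listener side: take one result off the queue and push it to the browser."""
    result = result_queue.get()
    send_to_client(result)


delivered = []
submit_code(42, "print('hello')")
run_checker()
run_listener(delivered.append)
print(delivered)
```

Each stage only knows about the queue it reads from or writes to, which is what keeps the real pipeline asynchronous and lets one piece (here, the final listener) be swapped out without touching the others.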
Now my first job was to replace the nowjs module with Tornado.
####A basic implementation
Let’s code! I knew the Tornado server had to read submission results from the
message queue and send them to the submission page (or ‘pages’, if the user has
opened the same problem page in more than one tab). I used the pika module
inside the Tornado IO loop to connect to RabbitMQ and read messages from it.
On the client side I used HTML5 WebSocket to connect to the Tornado server.
This basic implementation was completed in two days.
#####Backend (code snippet)
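A minimal, library-independent sketch of the fan-out step, assuming each queue message is JSON with a `submission_id` field (an assumption, as are all class and method names here). `FakeConnection` stands in for one browser tab; in Tornado, `register` and `unregister` would typically be called from a `WebSocketHandler`'s `open()` and `on_close()` methods, and `dispatch` from the callback that receives messages from RabbitMQ.

```python
import json
from collections import defaultdict


class FakeConnection:
    """Stand-in for one browser tab's connection (hypothetical, for illustration)."""

    def __init__(self):
        self.sent = []

    def write_message(self, text):
        self.sent.append(text)


class ConnectionRegistry:
    """Tracks which open connections are watching which submission."""

    def __init__(self):
        self._by_submission = defaultdict(set)

    def register(self, submission_id, conn):
        self._by_submission[submission_id].add(conn)

    def unregister(self, submission_id, conn):
        conns = self._by_submission.get(submission_id)
        if conns is None:
            return
        conns.discard(conn)
        if not conns:
            # Drop empty entries so the registry does not grow without bound.
            del self._by_submission[submission_id]

    def dispatch(self, raw_message):
        """Forward one queue message to every tab watching that submission."""
        message = json.loads(raw_message)
        conns = self._by_submission.get(message["submission_id"], ())
        for conn in list(conns):
            conn.write_message(raw_message)
        return len(conns)


registry = ConnectionRegistry()
tab_a, tab_b, other = FakeConnection(), FakeConnection(), FakeConnection()
registry.register(7, tab_a)
registry.register(7, tab_b)
registry.register(8, other)

delivered = registry.dispatch(json.dumps({"submission_id": 7, "status": "accepted"}))
print(delivered, len(tab_a.sent), len(other.sent))
```

Keeping a set of connections per submission is what lets the same result reach every tab that has the problem page open, while tabs watching other submissions are left alone.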
####Testing locally
Everything worked as expected in modern browsers, but when I tested it on
IE 7, 8, and 9, my reaction was: “IE sucks, man!” Of course, those versions of
IE don’t support WebSocket; how did that not occur to me? So I was left with
only one option: write a fallback implementation using long polling (also
called Comet programming) on both the client and the server side. But wait,
the problem was not yet solved: plain cross-domain requests are blocked by the
browser’s same-origin policy, and the HackerEarth web server and the realtime
server (Tornado) are on different top-level domains. I had to use either CORS
long polling (supported only in major browsers, but more secure) or JSONP long
polling (supported in every browser, but insecure). I eventually used both.
Here is a code snippet:
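A minimal sketch of how the two kinds of long-poll reply might differ, written without any web framework. The function name `build_response`, the `ALLOWED_ORIGINS` set, and the example origin are assumptions for illustration, not the production code.

```python
import json
import re

# Assumption: the origins allowed to read responses are configured by hand.
ALLOWED_ORIGINS = {"https://www.example.com"}

# Only plain (optionally dotted) identifiers are accepted as JSONP callback names.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*", re.ASCII)


def build_response(payload, origin=None, callback=None):
    """Return (headers, body) for a long-poll reply.

    With a `callback` the reply is JSONP (a script that calls the callback);
    otherwise it is plain JSON with a CORS header for allowed origins.
    """
    body = json.dumps(payload)
    if callback is not None:
        if not _CALLBACK_RE.fullmatch(callback):
            raise ValueError("invalid JSONP callback name")
        headers = {"Content-Type": "application/javascript"}
        return headers, "%s(%s);" % (callback, body)

    headers = {"Content-Type": "application/json"}
    if origin in ALLOWED_ORIGINS:
        # Echo the specific origin instead of "*", so only known sites can read it.
        headers["Access-Control-Allow-Origin"] = origin
    return headers, body


print(build_response({"status": "accepted"}, origin="https://www.example.com"))
print(build_response({"status": "accepted"}, callback="handleResult"))
```

The JSONP branch shows why that route is the less secure one: the reply is executed as a script on the page, so at the very least the callback name has to be validated before it is echoed back.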
####What if the client gets disconnected?
On slow internet connections, especially with browsers using long polling,
messages sometimes get lost. So I had to create a buffer to store unsent
messages. When the client (browser) reconnects, the server looks in the buffer
for the latest message, sends it to the client, and then goes back to
listening for new messages from RabbitMQ. Here is a code snippet:
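A minimal sketch of such a buffer, assuming it only needs to keep the latest undelivered message per client. The class name `MessageBuffer`, the client IDs, and the expiry time (`ttl_seconds`) are assumptions added for illustration; the post does not say how long messages were kept.

```python
import time


class MessageBuffer:
    """Keeps the most recent undelivered message per client for a limited time."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = {}  # client_id -> (stored_at, message)

    def store(self, client_id, message):
        """Remember a message that could not be delivered right away."""
        self._pending[client_id] = (self._clock(), message)

    def pop_latest(self, client_id):
        """On reconnect: return the buffered message, or None if absent or expired."""
        entry = self._pending.pop(client_id, None)
        if entry is None:
            return None
        stored_at, message = entry
        if self._clock() - stored_at > self._ttl:
            return None
        return message


unsent = MessageBuffer(ttl_seconds=30)
unsent.store("tab-1", {"seq": 1, "status": "compiling"})
unsent.store("tab-1", {"seq": 2, "status": "accepted"})  # replaces the older one
print(unsent.pop_latest("tab-1"))  # the newest buffered message
print(unsent.pop_latest("tab-1"))  # nothing left after delivery
```

Popping the entry on delivery means a message is handed out at most once per reconnect, and the expiry keeps entries for clients that never come back from being served much later.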
I also made sure that older messages don’t replace newer ones in the browser
when messages are delivered out of order. I wrote a lot of fallback code to
prevent issues in older browsers, tested with a thousand simultaneous
connections, and fixed some bugs that were already present.
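The ordering guard mentioned above can be sketched as follows, assuming every message carries a monotonically increasing sequence number (`seq`). That field, the class name `LatestOnly`, and the sample statuses are hypothetical, and the real check would run in the browser rather than in Python.

```python
class LatestOnly:
    """Accepts a message only if it is newer than the last one applied."""

    def __init__(self):
        self._last_seq = {}  # submission_id -> highest sequence number applied

    def accept(self, message):
        sid, seq = message["submission_id"], message["seq"]
        if seq <= self._last_seq.get(sid, -1):
            return False  # stale or duplicate: keep what is already shown
        self._last_seq[sid] = seq
        return True


guard = LatestOnly()
arrivals = [
    {"submission_id": 7, "seq": 1, "status": "queued"},
    {"submission_id": 7, "seq": 3, "status": "accepted"},
    {"submission_id": 7, "seq": 2, "status": "running"},  # arrives late
]
shown = [m["status"] for m in arrivals if guard.accept(m)]
print(shown)  # → ['queued', 'accepted']
```

The late "running" message is dropped, so the page never regresses from the final result to an intermediate state.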
And that’s all I did in just two weeks! Everything was now working fine on my
local machine.
A few days ago, we tested it on the development server. After successful
testing and a few more bug fixes, it was pushed to production on 30th May, 2013 :)
Some bugs might still be there and we are fixing them, but I am confident it
will be more robust than ever before!
P.S. I am an undergraduate student at IIT Roorkee. You can find me @LalitKhattar or on HackerEarth.
Posted by Lalit Khattar, Summer Intern 2013 @HackerEarth