The HackerEarth Data Challenge
More than 40,000 programmers use HackerEarth. Every day, people from all over India and other countries submit code on HackerEarth, solve problems, and participate in online coding tests. Our CodeFactory server has processed over 500,000 requests so far, and different types of challenges run every month. The technology stack consists of multiple servers of different types (search server, realtime server, web server, log server, and so on) running at any given time. Over 100,000 lines of code are running to serve your requests, and we deploy a dozen times every day.
We have been able to do all of this while maintaining high uptime. To make that possible, we have written many monitoring services for our backend. A status page listing a few of these services is now publicly available at http://status.hackerearth.com/.
To make it even more interesting, we are making the data collected by the status monitoring services public. The data consists of over 800,000 JSON documents, stored in RethinkDB, a schema-less database. Are you curious what that data looks like? There might be a gold rush in there, and we invite you to find that gold: find something interesting in the data and show what you can do with it. There are umpteen stories to uncover; you just need to dig!
###Data Access
The data is available in JSON format in RethinkDB. The host, database, and table details are as follows:
- Endpoint: status-data-challenge.hackerearth.com
- Port: 80
- Database name: careerstack
- Tables:
  - hackerearth_status: for HackerEarth webserver
  - api_status: for HackerEarth API
  - realtime_status: for Realtime server
  - code_checker_status: for CodeChecker server
  - celery_status: for task queue
  - rabbitmq_status: for message queue
- Web UI: http://status-data-challenge.hackerearth.com:8080/
To get started, you need to install the RethinkDB client drivers on your machine.
The query language is simple and easy to pick up. Go through the RethinkDB query language (ReQL) documentation to get started with querying the database.
Below is sample Python code for reading the hackerearth_status table:
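A minimal sketch of such a read, assuming the `rethinkdb` Python driver at version 2.4 or later (older drivers used `import rethinkdb as r` instead). The host, port, database, and table names come from the list above; the function name `fetch_status_documents` and the `limit` parameter are hypothetical and only there for illustration.

```python
# Connection details are taken from the post; everything else is an assumption.
HOST = "status-data-challenge.hackerearth.com"
PORT = 80
DATABASE = "careerstack"


def fetch_status_documents(table="hackerearth_status", limit=5):
    """Return up to `limit` documents from one status table as a list of dicts."""
    # Imported lazily so this module loads even where the driver is not installed.
    # Assumes the `rethinkdb` package, version 2.4 or later.
    from rethinkdb import RethinkDB

    r = RethinkDB()
    conn = r.connect(host=HOST, port=PORT, db=DATABASE)
    try:
        # table() resolves against the connection's default database set above.
        cursor = r.table(table).limit(limit).run(conn)
        return list(cursor)
    finally:
        # Always release the connection, even if the query fails.
        conn.close()
```

Calling `fetch_status_documents()` and printing each item would show the raw documents; passing another table name, such as `"api_status"`, would read that table instead.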
It prints the following output in the console:
The data format varies a little for code_checker_status, celery_status, and rabbitmq_status: their message field carries more key-value data.
As you might have noticed, each document has the following primary key-value pairs:
- id: Unique identifier for the JSON document
- status: Status code returned from the service
- message: Message returned from the service ping-pong
- request_time: Number of seconds since epoch when the service was pinged
- response_received_time: Number of seconds since epoch when the service responded
You might have realized that response_received_time - request_time is the service latency.
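The latency calculation can be sketched in a few lines of plain Python. This assumes each document is a dictionary with the keys listed above; the helper names and the two sample documents are made up for illustration and are not real challenge data.

```python
from statistics import mean


def latency_seconds(document):
    """Latency of one ping: response_received_time minus request_time."""
    return document["response_received_time"] - document["request_time"]


def summarize_latency(documents):
    """Return (count, mean latency, max latency) for status documents."""
    latencies = [latency_seconds(d) for d in documents]
    if not latencies:
        # Nothing to summarize; avoid calling mean() on an empty list.
        return 0, None, None
    return len(latencies), mean(latencies), max(latencies)


# Hypothetical sample documents; field values are invented.
sample_documents = [
    {"id": "a", "status": 200, "message": "OK",
     "request_time": 1000.0, "response_received_time": 1000.5},
    {"id": "b", "status": 200, "message": "OK",
     "request_time": 1060.0, "response_received_time": 1061.5},
]

count, average, worst = summarize_latency(sample_documents)
print(count, average, worst)
```

The same `summarize_latency` function could be applied to documents read from any of the status tables, for example to compare the average latency of the web server with that of the API.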
To see the data stored in the code_checker_status table, copy and paste the following script into the Data Explorer in the Web UI and hit ‘Run’.
The Data Explorer and Web UI are completely exposed. This means anyone can delete the data too, but please do not do so. In any case, the data is restored to its original state every 10 minutes by a periodic asynchronous task.
Querying data in RethinkDB is easy and intuitive, and it can be done from multiple languages. For creating visualizations in the frontend, we recommend d3js, a very good JavaScript library for creating graphs and other visualizations. You can read the basic tutorials here.
###To Enter Data Challenge
To enter, host your code repository on GitHub and send a link to the repository, along with images of your graph(s), table(s), or any other data analysis, to vivek@hackerearth.com before midnight IST on October 13, 2013.
###Prizes
We will vote on our favorite visualization, and there will be a cash prize of $100 for the top entry. The winning entry will be featured on our blog. We will also send HackerEarth T-shirts to the next 5 entries. Winners will be announced in the week of October 21, 2013.
Good luck with the gold rush!
Posted by Vivek Prakash. Follow me @vivekprakash. Write to me at vivek@hackerearth.com.