Profiling Django views with Scooby profiler

20 Sep 2018

Back in 2016, I came up with the idea of creating a Python module that could profile functions with respect to SQL queries and tell exactly at which lines of a function the queries were happening. I called that package Goofy and built it using Python AST manipulation. You can read the post on it here later. It helped us profile views serving AJAX requests, but it had some limitations too: we couldn’t see the whole call-stack trace of the queries and couldn’t analyze them.

At an internal hackathon @HackerEarth in Nov 2017, I revisited this problem and tried to come up with a profiler that would show the stats on the front-end, stay lightweight, and work with AJAX requests. I named this package Scooby.

Idea

We had been using the package django-debug-toolbar, but it wasn’t enough: it has no support for profiling AJAX requests, and because of the size of our codebase it made serving web pages slower in development mode, since it injects the stats into the HTML and that rendering takes time. So we needed an alternative.

The idea behind the Scooby package was to build something similar to django-debug-toolbar, but instead of rendering the stats into HTML, dump the stats data to some backend store, e.g. Redis, where it would reside temporarily, and show the stats on the front-end using a Chrome extension.

Implementation

We had to create a Python/Django package for the backend and an npm package for the front-end which would build the Chrome extension. We decided to use ReactJS as the rendering framework for the Chrome extension.

On the backend, we just had to create and install a new middleware which collects the profiled stats from the different plugins (e.g. SQL, Memcache etc.) and puts them into Redis against a UUID key. We set that key as the value of a custom header (X-Scooby) in the HTTP response, so that the Chrome extension can fetch the stats for that request-response later using that key.

The Chrome extension hooks into the browser’s network calls, and whenever a response comes back for a request, it checks whether the response carries that custom header containing the UUID. If it does, it collects the data from the URL /scooby/get-data/<uuid>/ on the same domain.
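The view serving that URL does not need to be anything fancy. Here is a minimal sketch, assuming the same get_redis helper used by the middleware below; the view name and signature are illustrative, not Scooby’s exact code:

from django.http import Http404, HttpResponse

from .utils import get_redis


def get_data(request, unique_hex):
    # Look up the stats blob dumped by the middleware under this UUID.
    redis = get_redis()
    data = redis.get(unique_hex)
    if data is None:
        # Either the key is unknown or the stats have already expired.
        raise Http404
    return HttpResponse(data, content_type='application/json')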

Here is the code of the middleware (you can skip directly to the Further improvements section if you are not interested in the code):

import uuid

from .base import ScoobyData
from .utils import get_redis


class ScoobyMiddleware(object):
    def process_request(self, request):
        request.scooby_data = ScoobyData()
        request.scooby_data.on_process_request(request)

    def process_view(self, request, view, view_args, view_kwargs):
        request.scooby_data.on_process_view(
            request, view, view_args, view_kwargs)

    def process_response(self, request, response):
        request.scooby_data.on_process_response(request, response)
        unique_hex = uuid.uuid4().hex
        response['X-Scooby'] = unique_hex
        # Set data in redis for 10 minutes.
        redis = get_redis()
        redis.set(unique_hex, request.scooby_data.as_json(), 600)
        return response

And this is how the ScoobyData class is defined. For each stage of the middleware (process_request, process_view and process_response), it defers to all available plugins so that each of them can collect its respective data on its own.

import json

from .plugin_finder import get_plugins

class ScoobyData(object):
    def __init__(self):
        self.plugins = get_plugins() # Instances of different plugin classes.
        self.plugins_data = {}
        for plugin in self.plugins:
            self.plugins_data[plugin.name] = plugin.Data()

    def on_process_request(self, request):
        for plugin in self.plugins:
            plugin_data = self.plugins_data[plugin.name]
            plugin.on_process_request(plugin_data, request)

    def on_process_view(self, request, view, view_args, view_kwargs):
        for plugin in self.plugins:
            plugin_data = self.plugins_data[plugin.name]
            plugin.on_process_view(plugin_data, request,
                                   view, view_args, view_kwargs)

    def on_process_response(self, request, response):
        for plugin in self.plugins:
            plugin_data = self.plugins_data[plugin.name]
            plugin.on_process_response(plugin_data, request, response)

    def as_json(self):
        plugins_data_json = {}
        for plugin_name in self.plugins_data:
            plugin_data = self.plugins_data[plugin_name]
            plugins_data_json[plugin_name] = plugin_data.as_json_dict()
        return json.dumps({
            'plugins_data': plugins_data_json
        })
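The get_plugins helper imported above is not shown in this post; conceptually it just returns one instance of every registered plugin class. A minimal sketch, with assumed module paths and a hard-coded plugin list, might look like this:

# plugin_finder.py -- a hedged sketch; the real package may discover plugins
# differently (e.g. from settings). The module paths here are assumptions.
from .view_name_plugin import ViewNamePlugin
from .sql_plugin import SQLPlugin


def get_plugins():
    # One instance per plugin class; each carries its own Data class.
    return [ViewNamePlugin(), SQLPlugin()]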

Here is the code of a very simple plugin which collects which view was hit and what args and kwargs were passed to it:

class ViewNamePluginData(object):
    def __init__(self):
        self.view_name = None
        self.args = ()
        self.kwargs = {}

    def as_json_dict(self):
        return {
            'view_name': self.view_name,
            'args': self.args,
            'kwargs': self.kwargs,
        }

class ViewNamePlugin(object):
    Data = ViewNamePluginData

    def __init__(self):
        self.name = 'ViewName'

    def on_process_request(self, plugin_data, request):
        pass

    def on_process_view(self, plugin_data, request,
                        view, view_args, view_kwargs):
        plugin_data.view_name = '%s.%s' % (view.__module__, view.__name__)
        plugin_data.args = view_args
        plugin_data.kwargs = view_kwargs

    def on_process_response(self, plugin_data, request, response):
        pass

This is the generic way of adding new plugins and collecting/serving their data. We added the plugin for SQL queries following the same pattern.
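For illustration, here is a rough sketch of how an SQL plugin can follow the same pattern. This is not Scooby’s actual implementation; it assumes DEBUG=True so that Django records queries on connection.queries:

from django.db import connection


class SQLPluginData(object):
    def __init__(self):
        self.queries = []
        self.num_queries_before = 0

    def as_json_dict(self):
        return {
            'queries': self.queries,
            'total_time': sum(float(q['time']) for q in self.queries),
        }


class SQLPlugin(object):
    Data = SQLPluginData

    def __init__(self):
        self.name = 'SQL'

    def on_process_request(self, plugin_data, request):
        # Remember how many queries were already recorded on this connection.
        plugin_data.num_queries_before = len(connection.queries)

    def on_process_view(self, plugin_data, request,
                        view, view_args, view_kwargs):
        pass

    def on_process_response(self, plugin_data, request, response):
        # Everything recorded after process_request belongs to this request.
        plugin_data.queries = connection.queries[plugin_data.num_queries_before:]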

Further improvements

After this was built at the hackathon, there have been many additions to the package.

  1. We added plugins for queries happening in Memcache and Thriftpy along with SQL.
  2. Added a plugin for Python’s cProfile, so you can profile your views just by enabling it in the Chrome extension. You no longer need to put temporary profiling code around views (a rough sketch of how such a plugin fits the plugin interface follows this list).
  3. Added the option to enable profiling from the front-end instead of always collecting/dumping data on the backend. This way the profiler adds no overhead on the backend when you are not using the Chrome extension.
  4. Added a logger to the scooby package which works as an alternative to print/logger.debug for debugging purposes. With this logger you don’t have to look at the console; the logs show up in the Chrome extension in the browser. Here’s how you can use it:
import scooby
scooby.log("foo", "bar")
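As for the cProfile plugin mentioned in point 2, the same plugin interface is enough to start the profiler when the request comes in and dump the stats when the response goes out. Again, this is an illustrative sketch, not the package’s exact code:

import cProfile
import io
import pstats


class CProfilePluginData(object):
    def __init__(self):
        self.profiler = cProfile.Profile()
        self.stats_text = ''

    def as_json_dict(self):
        return {'stats': self.stats_text}


class CProfilePlugin(object):
    Data = CProfilePluginData

    def __init__(self):
        self.name = 'cProfile'

    def on_process_request(self, plugin_data, request):
        # Start profiling as soon as the request comes in.
        plugin_data.profiler.enable()

    def on_process_view(self, plugin_data, request,
                        view, view_args, view_kwargs):
        pass

    def on_process_response(self, plugin_data, request, response):
        # Stop profiling and capture the formatted stats as text.
        plugin_data.profiler.disable()
        stream = io.StringIO()
        stats = pstats.Stats(plugin_data.profiler, stream=stream)
        stats.sort_stats('cumulative').print_stats(50)
        plugin_data.stats_text = stream.getvalue()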

Screenshot

Here is a screenshot of how SQL queries look in the Chrome extension.

In the right panel under the SQL tab, you can see the stack traces of all the different queries. There is an option to group similar queries together. Say you are making the same type of SQL query in a for loop, with only a few parameters changing between queries; by grouping them you can see their count and the total time they took. This makes N+1 query problems easy to analyze.

Open sourcing it

Both the backend and front-end packages of the Scooby profiler are open-sourced.

If you are using Django as your backend framework, give this profiler a try. We would be glad to hear your feedback.

Posted by Shubham Jain. You can follow me on Github and Twitter.

