Monday, March 26, 2018

Why use the middleware approach for Django analytics?

Because proxy servers, ad blockers (like ABP, uBlock, etc.), and browser settings that disable JavaScript can all stop a tracking tag from ever running. Depending on how much value users place on their privacy, they can outright block tag-based analytics. The result is analytics data that is unreliable, or at least viewed with less confidence.

But why the middleware?

Django middleware is a good place because it runs on every request/response cycle.

The upsides are:

  1. It's easy to customise. Django middleware is just a plain Python class with some methods. 
  2. The Django middleware is well-documented. 
  3. It lets us set up whatever rules we like, such as excluding errors and tracking only HTTP 200 responses.
  4. We can set up tracking to be asynchronous. This way, our pages are not at the mercy of an external resource.
Writer's note: If your analytics needs are simple, you can install django-google-analytics and follow Usage #2 - Middleware + Celery.
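
To make upsides 1, 3 and 4 concrete, here is a minimal sketch of what such a middleware could look like. It is not the code from django-google-analytics: the names AnalyticsMiddleware, send_hit and tracker are hypothetical, and a background thread stands in for a Celery task so the example stays self-contained.

```python
import threading


def send_hit(path):
    """Hypothetical placeholder for reporting a page view.

    A real project would call its analytics backend here,
    or enqueue a Celery task instead.
    """


class AnalyticsMiddleware:
    """A plain Python class following Django's middleware protocol."""

    def __init__(self, get_response, tracker=send_hit):
        # Django passes get_response; tracker is an assumption added
        # here so the reporting function can be swapped out.
        self.get_response = get_response
        self.tracker = tracker

    def __call__(self, request):
        response = self.get_response(request)
        # Rule from the list above: only track HTTP 200 responses.
        if response.status_code == 200:
            # Fire and forget, so the page is not held up by the
            # external analytics service.
            threading.Thread(
                target=self.tracker,
                args=(request.path,),
                daemon=True,
            ).start()
        return response
```

To try it, the class would be added to the MIDDLEWARE list in settings.py by its dotted path. A thread per request is only a stand-in; under real traffic a task queue is the safer choice.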

There are a couple of downsides to this, though:
  1. We could hurt performance in a big way if the analytics middleware does too much work or, worse, is misconfigured.
  2. Not as bad as #1, but if we go down the asynchronous path, we end up with a Celery worker and a message broker like RabbitMQ on the tech stack. That's another thing we'll have to manage.
TL;DR: For analytics with Django, an HTML tag slows down pages and can be blocked; a middleware approach is better.