Getting to know your client using django middleware


By meledictas - Posted on 07 October 2008

It's important (and exciting) to know how clients visit your website, especially for a low-traffic site with a specific audience, like my blog.

You can get usage statistics by registering your site with a web statistics service. But if you want it to work exactly the way you want, you have to build it on your own!

I've built a Django application that collects user information and stores it in the database. It can also exclude specific paths from logging.

Another thing: I want to ban a certain set of IP addresses, mostly spammers that show up in my application's log.

The access log model

from django.db import models
from django.conf import settings
from django.utils.translation import ugettext_lazy as _

class BandedIP(models.Model):
    ip = models.IPAddressField()

    def __unicode__(self):
        return self.ip

class Path(models.Model):
    path = models.CharField(_('path'), max_length=250, null=True, blank=True)

    def __unicode__(self):
        return '%s' % self.path

class AccessLog(models.Model):
    user = models.CharField(_('user'), max_length=100, null=True, blank=True)
    session = models.TextField(_('session'))
    cookies = models.TextField(_('cookies'))
    timestamp = models.DateTimeField(_('timestamp'))
    path = models.CharField(_('path'), max_length=250, null=True, blank=True)
    method = models.CharField(_('method'), max_length=5, null=True, blank=True)
    host = models.CharField(_('host'), max_length=100, null=True, blank=True)
    remote_host = models.CharField(_('remote host'), max_length=500, null=True, blank=True)
    remote_addr = models.IPAddressField(_('remote'), null=True, blank=True)
    referer = models.CharField(_('referer'), max_length=250, null=True, blank=True)
    query = models.TextField(null=True, blank=True)
    agent = models.CharField(_('agent'), max_length=250, null=True, blank=True)
    type = models.CharField(_('type'), max_length=100, null=True, blank=True)
    encoding = models.CharField(_('encoding'), max_length=100, null=True, blank=True)
    language = models.CharField(_('language'), max_length=100, null=True, blank=True)
    length = models.IntegerField(_('length'), null=True, blank=True)
    time = models.IntegerField(_('time'))
    errors = models.TextField(_('errors'))

    def __unicode__(self):
        return '%s - %s' % (self.path, self.user)

if 'django.contrib.databrowse' in settings.INSTALLED_APPS:
    from django.contrib import databrowse
    databrowse.site.register(AccessLog)

The Path model holds the path patterns that I don't want to log. There is also a little code that automatically registers the model with databrowse, the built-in Django contrib application for browsing data. AccessLog stores the log entries (no surprise). BandedIP stores the IP addresses that I want to ban.

Every request that comes in to my website is processed by a set of middleware. So, what is middleware?

Middleware is code that you can hook into certain stages of request processing. Here are the points of execution where you can put your code:

process_request(self, request)

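This is called when Django receives the request from the web server, before it decides which view to run. Django has not resolved the URL yet, so this is a good place for early checks. If the method returns a response, Django skips the view and sends that response back instead.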

process_view(self, request, view_func, view_args, view_kwargs)

This is called after Django has built the request object and resolved the view for the requested URL. In other words, this code runs just before Django passes the request to the view.

process_response(self, request, response)

After a request has been processed by someone (request middleware or the view), and before the response is returned to the client, you can do something with the HTTP response. If you want to modify something in the response (HTTP headers, cookies, or even the response content), you can do it here.

process_exception(self, request, exception)

If something goes wrong and the view raises an exception, this code is executed.

This is one of my beloved features (yes, there are other beloved features, :P). It lets Django be used for almost any kind of web application!
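To make the order of these hooks concrete, here is a small sketch in plain Python. It is not Django's real request handler; the handle function and the Recorder class are made up for illustration, and it only mimics how MIDDLEWARE_CLASSES-style middleware is called: request hooks run from top to bottom, response hooks run from bottom to top, and a request hook that returns a response skips the view.

```python
class Recorder(object):
    """Hypothetical middleware that records which of its hooks ran."""

    def __init__(self, name, calls, block=False):
        self.name = name
        self.calls = calls
        self.block = block

    def process_request(self, request):
        self.calls.append('%s.request' % self.name)
        if self.block:
            # Returning a response here short-circuits the view.
            return 'blocked by %s' % self.name
        return None

    def process_response(self, request, response):
        self.calls.append('%s.response' % self.name)
        return response


def handle(request, middleware, view):
    """Simplified stand-in for the handler (an assumption, not Django code)."""
    response = None
    for mw in middleware:              # request hooks: in listed order
        response = mw.process_request(request)
        if response is not None:
            break
    if response is None:
        response = view(request)
    for mw in reversed(middleware):    # response hooks: in reverse order
        response = mw.process_response(request, response)
    return response
```

Calling handle({}, [Recorder('a', calls), Recorder('b', calls)], view) with an empty calls list fills it in the order a.request, b.request, b.response, a.response.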

Back to my application. Below is my middleware.

The access log middleware

import datetime

from django.conf import settings
from django.http import HttpResponseForbidden
from accesslog.models import AccessLog, BandedIP, Path

class BandedIPMiddleware(object):
    def process_request(self, request):
        # Reject the request early if the client address is on the ban list.
        remote_addr = request.META.get('REMOTE_ADDR', '')
        if remote_addr and BandedIP.objects.filter(ip=remote_addr).count():
            return HttpResponseForbidden("You are not allowed to use this website.")

class AccessLogMiddleware(object):

    def process_request(self, request):
        if is_log_path(request.path):
            accesslog = AccessLog()
            accesslog.timestamp = datetime.datetime.now()

            if request.user.is_authenticated():
                accesslog.user = request.user.username
            else:
                accesslog.user = 'anonymous'

            accesslog.session = get_session_text(request.session)
            accesslog.path = request.path
            accesslog.method = request.META.get('REQUEST_METHOD', '')
            accesslog.host = request.META.get('HTTP_HOST', '')
            accesslog.remote_host = request.META.get('REMOTE_HOST', '')
            accesslog.remote_addr = request.META.get('REMOTE_ADDR', '')
            accesslog.referer = request.META.get('HTTP_REFERER', '')
            accesslog.query = request.META.get('QUERY_STRING', '')
            accesslog.agent = request.META.get('HTTP_USER_AGENT', '')
            accesslog.type = request.META.get('CONTENT_TYPE', '')
            accesslog.encoding = request.META.get('HTTP_ACCEPT_CHARSET', '')
            accesslog.language = request.META.get('HTTP_ACCEPT_LANGUAGE', '')

            # CONTENT_LENGTH may be missing or empty; store 0 in that case.
            try:
                accesslog.length = int(request.META.get('CONTENT_LENGTH') or 0)
            except ValueError:
                accesslog.length = 0

            # Keep the entry on the request so the later hooks can finish it.
            request.accesslog = accesslog

    def process_exception(self, request, exception):
        if getattr(request, "accesslog", None):
            request.accesslog.errors = '%s' % exception
            request.accesslog.time = elapsed_microseconds(request.accesslog.timestamp)
            request.accesslog.save()

    def process_response(self, request, response):
        if getattr(request, "accesslog", None):
            request.accesslog.time = elapsed_microseconds(request.accesslog.timestamp)
            request.accesslog.save()
        return response


def elapsed_microseconds(start):
    # timedelta.microseconds is only the sub-second part, so add the seconds.
    delta = datetime.datetime.now() - start
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds


def is_log_path(path):
    if not path.endswith('/'):
        path = path+'/'
    except_paths = [p.path for p in Path.objects.all()]
    for p in except_paths:
        wildcard =  p.endswith('*')
        if wildcard:
            p=p.replace('*', '')

        if p == path:
            return False
        #wildcard?
        if wildcard:
            if path.startswith(p):
                return False
    return True

def get_session_text(session):
    text = '%s - ' % session.session_key
    for k,v in session.items():
        text += '[%s, %s]'%(k, v)
    return text

The BandedIPMiddleware class has a request middleware. If the request's IP address is in the banned list, it returns HTTP 403 Forbidden to that user.

Look at the AccessLogMiddleware class. There are three middleware methods here. The first is the request middleware, which gathers the data available at the start of the request. Once the request has been processed, the response middleware calculates the processing time used by the request and saves the model to the database.

If an exception is thrown, the exception middleware is called. It records the exception and saves the model.
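One detail about the processing time mentioned above: when you subtract two datetime values, the microseconds attribute of the resulting timedelta holds only the sub-second part, so a request that takes 2.0005 seconds reports 500 there. The sketch below shows one way to get the whole duration. The helper name total_microseconds is my own choice for illustration, not something from Django.

```python
import datetime


def total_microseconds(delta):
    """Return the whole duration of a timedelta in microseconds."""
    return (delta.days * 86400 + delta.seconds) * 1000000 + delta.microseconds


# Example timestamps, made up for illustration.
started = datetime.datetime(2008, 10, 7, 12, 0, 0)
finished = started + datetime.timedelta(seconds=2, microseconds=500)
delta = finished - started

sub_second_only = delta.microseconds        # 500
whole_duration = total_microseconds(delta)  # 2000500
```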

Every incoming request is checked to see whether it needs to be logged. The excluded paths are read from the Path model, and the function is_log_path holds the logic that decides whether a request should be logged. It supports a trailing wildcard: for example, /admin/* excludes everything under /admin/ from the log.
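The matching rule is easier to play with when it does not touch the database. Here is a minimal sketch of the same idea, assuming the excluded patterns are passed in as a plain list instead of being read from the Path model. The function name should_log is hypothetical.

```python
def should_log(path, excluded_patterns):
    """Return False when the path matches one of the excluded patterns."""
    # Normalize so that '/admin' and '/admin/' are treated the same.
    if not path.endswith('/'):
        path += '/'
    for pattern in excluded_patterns:
        if pattern.endswith('*'):
            # Trailing wildcard: exclude everything under the prefix.
            if path.startswith(pattern[:-1]):
                return False
        elif pattern == path:
            # No wildcard: only an exact match is excluded.
            return False
    return True
```

With the pattern list ['/admin/*', '/static/'], the path /admin/users is skipped, /static is skipped, and /blog/post-1 is logged.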

The settings

To use middleware, you have to add the middleware class to the MIDDLEWARE_CLASSES setting. Below is my setting:

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
    'django.middleware.http.SetRemoteAddrFromForwardedFor',

    'accesslog.middleware.AccessLogMiddleware',
)

I also configure a middleware to get the remote address. It extracts the client IP when you're sitting behind a reverse proxy that causes each request's REMOTE_ADDR to be set to 127.0.0.1.
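The idea behind that middleware can be sketched in a few lines. This is not Django's implementation, just an illustration that assumes the proxy puts the client address first in the X-Forwarded-For header. The function name client_ip is hypothetical, and meta stands in for request.META.

```python
def client_ip(meta):
    """Pick the client address from a request.META-like dict."""
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    if forwarded:
        # The header may hold a comma-separated chain of addresses;
        # this sketch assumes the first one is the original client.
        return forwarded.split(',')[0].strip()
    return meta.get('REMOTE_ADDR', '')
```

Keep in mind that a client can send this header itself, so it is only worth trusting when the request really comes through a proxy you control.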

Middleware processing

If you want to see the logs via the databrowse application, you have to include databrowse's URL as below:

from django.contrib import databrowse
urlpatterns = patterns('',
    (r'^browse/(.*)', databrowse.site.root),
)

If you want to allow only authenticated users, change your URL setting to:

from django.contrib import databrowse
from django.contrib.auth.decorators import login_required

urlpatterns = patterns('',
    (r'^browse/(.*)', login_required(databrowse.site.root)),
)

Some tips

  • To reduce database queries, use caching.
  • You should exclude the admin and static paths from logging.
  • Middleware is processed in the order it appears in the setting, so make sure you order it correctly.
  • Do not add your own IP address to the ban list (for testing). You will be banned and will have to go to the database to delete your IP. I have done this before. :D
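For the caching tip, here is a rough sketch of the idea in plain Python: keep the result of an expensive lookup (such as the ban list or the excluded paths) in memory and reload it only after a time limit. The CachedLookup class and its parameters are my own invention for illustration. In a real Django project the built-in cache framework would probably be the more natural choice.

```python
import time


class CachedLookup(object):
    """Hypothetical helper: remember a loader's result for ttl_seconds."""

    def __init__(self, loader, ttl_seconds=60, clock=time.time):
        self._loader = loader      # callable that does the real query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the timing can be tested
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        # Reload on first use or once the cached value is too old.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A middleware could then create one CachedLookup at import time, with a loader that reads the IP addresses from the database, and call get() on every request instead of querying each time. The trade-off is that a newly banned IP takes up to ttl_seconds to take effect.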

That is all of the logging in my blog. And now, YOU ARE BEING LOGGED!