2008-10-16

Exploiting Spammers, Search Engines, and Other Bots

Django is a great platform for building web projects in a clean and manageable way. But when you want to find its relationship with poverty for a blog post on Blog Action Day, it becomes a difficult puzzle. But still that's a great challenge!

Django has a large community all over the world and someone who is gold at statistics could write a cool analytical report about djangoers' relations with poverty in the world or the user activity on Blog Action Day site using the maps and numbers provided in the links.

The Django community or the social groups mentioned above are just small parts of the big network. Looking at the bigger image of the world, internet, and the future, you might realize that everybody is connected to somebody and something, all those connections get defined in social networks, and the web is like a living and evolving organism. We are kind of going towards a machine that we can use and it can use us (See the Matrix).

We can look at a newly created site based on Django as at a small node in the web. The site together with its visitors forms a network. Different APIs and inter-site communication extend the network even more. One part of this network is various bots, i.e. search engine indexers and comment spambots. Thank to the error-reporting mechanism in Django, recently I found out that you can find bugs in large-scale projects not only by manual browsing or test cases. Bugs might be detected by visits of spammers and search engines. Broken pages or unexpected usage are reported by email to the administrator of the site. Moreover, I started brainstorming how to exploit spammers instead of fighting against them.

And then I remembered the Poor Man's Cron module for Drupal. The module is an alternative for projects which have no cron jobs supported. It runs scheduled tasks in approximately regular intervals of time, triggered by page views. Executing scheduled tasks via the page views for people might be annoying if that takes much time. But I wouldn't mind to waste a few seconds of machines. On one hand this idea supports the poverty theme for the Blog Action Day, because this kind of module is dedicated to those who can't afford getting a server with cron jobs. On another hand, we can punish spammers using their time for our needs. When the tasks are quite time consuming and takes 5 or more minutes to execute, let's use the time of comment spammers. Spammers might be recognized by captchas, Akismet filters, and similar technologies. When more precise intervals between executions are necessary, let's exploit the feeds-subscriber calls and search indexers. Those can be recognized by request.META['HTTP_USER_AGENT'].

I shall point you to the experimental conceptual code of this idea soon.



Similar projects:
http://code.google.com/p/django-cron/
http://www.djangosnippets.org/snippets/1126/

How would you suggest to exploit spambots? What drawbacks do you see from the suggested approach?



P.S. To those who reside in Berlin! Today at 19:00 there will be Djangoers' meetup at newthinking store, Tucholskystraße 48, 10117 Berlin.

No comments:

Post a Comment