Our little spider trap caught another scraper that ignores robots.txt, and it apparently ignores nofollow directives as well.
IP and User Agent of Weblexbot
WeblexBot (http://www. weblex .org/bot.html)
Who is Weblexbot?
Who knows? Legit crawlers don't obscure their ownership in domain registrar data. This one's registration lists a bogus name and phone number.
Why is Weblexbot crawling my site?
The website claims:
Weblex.org is a search engine of a different color. We believe that the major engines have it all wrong, and we plan to do it better.
Our crawler, WeblexBot, respects robots.txt and crawls at a rate that will not disrupt your web site.
You can disallow the bot using the user-agent WeblexBot
However, our robots.txt clearly states that our trap is off limits, and the bot requested it anyway.
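As a rough illustration of what a compliant crawler should conclude from such a rule, here is a sketch using Python's standard urllib.robotparser. The /trap/ path and the example.com host are hypothetical stand-ins, since the real trap location is not given in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real trap path is not published in this post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""

def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# A crawler that respects robots.txt would skip the first URL.
trap_allowed = may_fetch("WeblexBot", "http://example.com/trap/page.html")
home_allowed = may_fetch("WeblexBot", "http://example.com/index.html")
print(trap_allowed, home_allowed)
```

Any request for a disallowed path from a client identifying itself as WeblexBot is therefore a sign that the crawler is not honoring the file.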
How can I block Weblexbot using ModSecurity 2.x?
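# The rule id is an arbitrary placeholder; ModSecurity 2.7 and later require one.
SecRule REQUEST_HEADERS:User-Agent "weblexbot" "id:1000001,phase:1,t:none,t:lowercase,deny,log,status:403"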
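The bot identifies itself as WeblexBot with a capital B, so letter case matters when matching the header. The sketch below uses Python's re module, which is not the same engine ModSecurity uses, purely to illustrate that point. The is_weblexbot helper is a hypothetical name introduced for this example.

```python
import re

# User-Agent string as reported by the bot (copied from this post).
USER_AGENT = "WeblexBot (http://www. weblex .org/bot.html)"

def is_weblexbot(user_agent: str) -> bool:
    """Case-insensitive check for the bot's name anywhere in the header."""
    return re.search(r"weblexbot", user_agent, re.IGNORECASE) is not None

# A case-sensitive pattern with a lowercase "b" misses the real header.
case_sensitive_hit = re.search(r"Weblexbot", USER_AGENT) is not None
case_insensitive_hit = is_weblexbot(USER_AGENT)
print(case_sensitive_hit, case_insensitive_hit)
```

Whatever blocking rule you deploy, it is worth testing it against the exact User-Agent string from your own logs.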