We have been getting a lot of hits lately from VeriSign. Its ips-agent robot does request robots.txt, but we are simply not going to let it crawl our domains.
Example Server Logs:
69.58.178.36 - - "GET /robots.txt HTTP/1.1" 406 261 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
69.58.178.39 - - "GET / HTTP/1.1" 406 251 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
69.58.178.29 - - "GET / HTTP/1.1" 406 251 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
A whois lookup on that IP range returns:
root@server[~]# whois 69.58.178.29
OrgName: VeriSign Infrastructure & Operations
OrgID: VIO-2
Address: 21345 Ridgetop Circle
City: Dulles
StateProv: VA
PostalCode: 20166
Country: US
We have their robot blocked via ModSecurity 2+ rules.
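The rule itself is not shown in this post. As a rough sketch only, a ModSecurity 2.x rule that matches the ips-agent token in the User-Agent header and returns the 406 seen in the logs above might look something like this. The rule id, the phase, and the msg text are assumptions chosen for illustration (the id action is only mandatory on ModSecurity 2.7 and later).

```apache
# Hypothetical sketch, not the exact rule running on this server.
# Deny any request whose User-Agent header matches the regex "ips-agent"
# and answer with 406, the status shown in the log excerpt above.
# The id is an arbitrary placeholder; use one that is free in your rule set.
SecRule REQUEST_HEADERS:User-Agent "ips-agent" \
    "id:100001,phase:1,deny,log,status:406,msg:'VeriSign ips-agent robot blocked'"
```

Running the check in phase 1 means the request is rejected as soon as the headers are read, before any request body is processed.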
Written by admin on July 4th, 2008 with comments disabled.
Lately, a large number of annoying website scrapers have been combing through our sites using the user agent below.
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Obviously, legitimate human visitors do not have the term "User-Agent:" inside their user agent string; a real browser sends it only as the header name. Months ago I added a ModSecurity rule to help identify and block these bandwidth wasters and copyright infringers.
Example of log details:
69.14.204.163 - - [04/Jul/2008:13:09:58 -0400] "GET / HTTP/1.1" 410 317 "-" "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Thumbs Down To "User-Agent" Scrapers
STEPS
1) Using ModSecurity 2.0+, I added this rule to modsec2.conf, located in /usr/local/apache/conf/
SecRule HTTP_User-Agent "User-Agent" "deny,log,status:410"
2) Restart Apache
/sbin/service httpd restart
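For readers on newer ModSecurity 2.x builds, the same check can also be written against the REQUEST_HEADERS collection, which is how the 2.x reference manual addresses request headers. This is a sketch under assumptions rather than the rule used above: the rule id is a placeholder, phase:1 is my own choice, and the leading ^ anchor narrows the match to user agents that begin with the stray header name.

```apache
# Hypothetical variant of the rule from step 1.
# Matches a User-Agent value that itself starts with "User-Agent:",
# as in the scraper log line above, and answers with 410 Gone.
# The id is an arbitrary placeholder (required on ModSecurity 2.7+).
SecRule REQUEST_HEADERS:User-Agent "^User-Agent:" \
    "id:100002,phase:1,deny,log,status:410,msg:'Malformed User-Agent header value'"
```

Dropping the ^ anchor gives the looser behavior of the original rule, which matches the string anywhere in the header value.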
Written by admin on July 4th, 2008 with comments disabled.
ModSecurity, a very helpful Apache module for preventing unwanted server intrusions, caught a client using Python-urllib trying to access one of my websites.
Python-urllib is the default user agent sent by Python's urllib library. It often turns up in email harvesters, and it is also used by Hanzo to archive websites.
216.246.65.100 - - "GET HTTP/1.1" 406 269 "-" "Python-urllib/2.4"
This means that our ModSecurity rules identified the user agent 'Python-urllib' and returned a 406 response, which means Not Acceptable.
216.246.65.100 resolves back to unknown.ord.servercentral.net.
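The rule that produced this 406 is not reproduced here. A minimal sketch of what such a rule could look like in ModSecurity 2.x follows; the rule id, phase, and msg text are illustrative assumptions, not values taken from our configuration.

```apache
# Hypothetical sketch: deny clients that announce themselves as Python-urllib
# and return 406 Not Acceptable, matching the log line above.
# The id is an arbitrary placeholder (required on ModSecurity 2.7+).
SecRule REQUEST_HEADERS:User-Agent "Python-urllib" \
    "id:100003,phase:1,deny,log,status:406,msg:'Python-urllib client blocked'"
```

Keep in mind that a rule like this blocks every script that leaves urllib's default user agent in place, including archivers such as Hanzo, and that a client can sidestep it simply by sending a different User-Agent.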
Written by admin on July 4th, 2008 with comments disabled.