Spiders
We are noticing that LinkWalker requests robots.txt and is then followed by BDFetch from the same IP range.
Example of the LinkWalker robot:
72.14.164.165 - - "GET /robots.txt HTTP/1.1" 406 1326 "www.seventwentyfour.com/" "LinkWalker/2.0"
Examples of the BDFetch robot:
72.14.164.177 - - "GET /robots.txt HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
72.14.164.193 - - "GET /index.html HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
72.14.164.173 - - "GET /robots.txt HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
72.14.164.191 - - "GET /index.html HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
72.14.164.177 - - "GET /robots.txt HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
72.14.164.188 - - "GET /index.html HTTP/1.1" 5816 "www.brandimensions.com" "BDFetch"
Solution to the bandwidth-gobbling bots in this IP range:
Add these networks to the deny list on the router.
72.14.163.0/24
72.14.164.0/24
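As a quick sanity check, the two ranges can be tested against the addresses seen in the logs. This is only a sketch using Python's standard ipaddress module; the helper name is_denied is made up for illustration, and the router itself does the real blocking.

```python
import ipaddress

# Ranges copied from the deny list above.
DENY_RANGES = [
    ipaddress.ip_network("72.14.163.0/24"),
    ipaddress.ip_network("72.14.164.0/24"),
]


def is_denied(ip: str) -> bool:
    """Return True if ip falls inside any of the denied ranges (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENY_RANGES)


# Client addresses copied from the LinkWalker and BDFetch log lines above.
logged = [
    "72.14.164.165", "72.14.164.177", "72.14.164.193",
    "72.14.164.173", "72.14.164.191", "72.14.164.188",
]

# Any address listed here would still get through the deny list.
uncovered = [ip for ip in logged if not is_denied(ip)]
print(uncovered)
```

An empty list means every logged address is covered by the deny list.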
Written by admin on September 28th, 2009 with comments disabled.
Read more articles on How To and Spiders.
151.138.13.244 - - [May/2009] "HEAD /foo.htm HTTP/1.0" 200 - "-" "SuperPagesBot/0.1"
151.138.13.244 - - [May/2009:13:59:11 -0400] "GET /foo.htm HTTP/1.0" 200 8967 "-" "SuperPagesBot/0.1"
151.138.13.244 - - [May/2009] "GET /foo.htm HTTP/1.0" 406 1314 "-" "Lynx/2.8.5rel.1 libwww-FM/2.14"
151.138.13.244 - - [May/2009] "GET / HTTP/1.0" 406 1314 "-" "Lynx/2.8.5rel.1 libwww-FM/2.14"
151.138.13.244 - - [May/2009] "HEAD /directory HTTP/1.0" 301 - "-" "SuperPagesBot/0.1"
151.138.13.244 - - [May/2009] "HEAD /directory/ HTTP/1.0" 200 - "-" "SuperPagesBot/0.1"
Whois for IP 151.138.13.244:
OrgName: Idearc Media Corp
OrgID: IMC-97
Address: 2200 W Airfield Drive
City: DFW Airport
StateProv: TX
PostalCode: 75261
Country: US
Written by admin on May 5th, 2009 with comments disabled.
Read more articles on Spiders.
We have been getting a lot of hits lately from VeriSign. Their ips-agent robot does request robots.txt, but we are simply not going to allow their domain browsing.
Example Server Logs:
69.58.178.36 - - "GET /robots.txt HTTP/1.1" 406 261 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
69.58.178.39 - - "GET / HTTP/1.1" 406 251 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
69.58.178.29 - - "GET / HTTP/1.1" 406 251 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7"
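To see at a glance which addresses a given robot is coming from, the log lines can be grouped by user agent. The sketch below assumes the abbreviated log format shown in these excerpts (no timestamp field), not the full combined format; the regular expression and the function name ips_by_token are illustrative only.

```python
import re
from collections import defaultdict

# Matches the abbreviated format used in the excerpts above:
# ip - - "request" status size "referer" "user agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

UA = ("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12; ips-agent) "
      "Gecko/20050922 Fedora/1.0.7-1.1.fc4 Firefox/1.0.7")

# The three log lines from the excerpt above.
LINES = [
    f'69.58.178.36 - - "GET /robots.txt HTTP/1.1" 406 261 "-" "{UA}"',
    f'69.58.178.39 - - "GET / HTTP/1.1" 406 251 "-" "{UA}"',
    f'69.58.178.29 - - "GET / HTTP/1.1" 406 251 "-" "{UA}"',
]


def ips_by_token(lines, token):
    """Map response status to the sorted client IPs whose user agent contains token."""
    found = defaultdict(set)
    for line in lines:
        match = LOG_RE.match(line)
        if match and token in match.group("agent"):
            found[int(match.group("status"))].add(match.group("ip"))
    return {status: sorted(ips) for status, ips in found.items()}


print(ips_by_token(LINES, "ips-agent"))
```

Lines that do not fit the pattern are skipped rather than raising an error, which seemed the safer choice for hand-trimmed log excerpts.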
A whois lookup on that IP range returns:
root@server[~]# whois 69.58.178.29
OrgName: VeriSign Infrastructure & Operations
OrgID: VIO-2
Address: 21345 Ridgetop Circle
City: Dulles
StateProv: VA
PostalCode: 20166
Country: US
We have their robot blocked via ModSecurity 2+ rules.
Written by admin on July 4th, 2008 with comments disabled.
Read more articles on ModSecurity and Spiders.
ModSecurity, a web application firewall module that is very helpful for preventing unwanted server intrusions, caught a client using Python-urllib trying to access one of my websites.
Python-urllib is the default user agent sent by Python's urllib library. Scripts that send it are often email harvesters. It is also used by Hanzo to archive websites.
216.246.65.100 - - "GET HTTP/1.1" 406 269 "-" "Python-urllib/2.4"
This means that our ModSecurity rules identified the user agent "Python-urllib" and returned a 406 response, which is the HTTP status for Not Acceptable.
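The decision being made is roughly the one sketched below. This is plain Python for illustration, not ModSecurity rule syntax, and the actual rules on the server are not shown here. The token list, the case-insensitive matching, and the function name status_for are all assumptions.

```python
# User-agent fragments to refuse. "python-urllib" and "ips-agent" are the
# agents named in these posts; the list itself is a hypothetical stand-in
# for whatever the real ModSecurity rules match on.
BLOCKED_AGENT_TOKENS = ("python-urllib", "ips-agent")


def status_for(user_agent: str) -> int:
    """Return 406 (Not Acceptable) for blocked agents, otherwise 200."""
    ua = user_agent.lower()  # assumption: matching ignores case
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return 406
    return 200


print(status_for("Python-urllib/2.4"))
print(status_for("Lynx/2.8.5rel.1 libwww-FM/2.14"))
```

Matching on a substring rather than the whole header means the version number after the slash does not matter, so a later Python-urllib release would be caught as well.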
216.246.65.100 resolves back to unknown.ord.servercentral.net.
Written by admin on July 4th, 2008 with comments disabled.
Read more articles on ModSecurity and Spiders.
This robot, "obot", coming from a RIPE-allocated address, was trying to spider the server and did not respect robots.txt. It is now banned.
194.153.113.8 - - [26/Jun/2008:16:19:12 -0400] "GET / HTTP/1.1" 404 8686 "-" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0; obot)"
The IP referenced belongs to cobion.com.
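For comparison, this is roughly what a well-behaved robot does before fetching a page. The sketch uses Python's standard urllib.robotparser; the robots.txt content is a made-up example, since the site's real file is not shown, and FriendlyCrawler is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out obot, allow everyone else.
ROBOTS_TXT = """\
User-agent: obot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question first and skips the URL when the
# answer is False.
print(parser.can_fetch("obot", "/"))
print(parser.can_fetch("FriendlyCrawler", "/"))
```

Nothing forces a robot to run this check, which is why a robot that ignores the file ends up banned at the server instead.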
Written by admin on June 29th, 2008 with comments disabled.
Read more articles on Spiders.