Just had an interesting bot get caught in our spiderweb. If you crawl a directory that robots.txt prohibits, your IP and user-agent are logged and your IP is banned.
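A trap like this can be sketched as follows. This is a minimal illustration, not the setup used here. It assumes a hypothetical `/trap/` directory that is listed under `Disallow` in robots.txt and linked somewhere humans won't click. `SpiderTrap`, `handle`, and `robots_txt` are illustrative names, and a real deployment would wire the ban into the web server or firewall.

```python
from dataclasses import dataclass, field


@dataclass
class SpiderTrap:
    # Hypothetical trap directory; it must also be disallowed in robots.txt.
    trap_prefixes: tuple[str, ...] = ("/trap/",)
    banned: set[str] = field(default_factory=set)
    # Each entry is (ip, user_agent, path) for a request that hit the trap.
    log: list[tuple[str, str, str]] = field(default_factory=list)

    def robots_txt(self) -> str:
        """Build a robots.txt that forbids every trap directory."""
        lines = ["User-agent: *"]
        lines += [f"Disallow: {p}" for p in self.trap_prefixes]
        return "\n".join(lines) + "\n"

    def handle(self, ip: str, user_agent: str, path: str) -> int:
        """Return an HTTP status code: 403 for banned or trapped clients, else 200."""
        if ip in self.banned:
            return 403
        if any(path.startswith(p) for p in self.trap_prefixes):
            # The crawler ignored robots.txt: log IP and user-agent, then ban the IP.
            self.log.append((ip, user_agent, path))
            self.banned.add(ip)
            return 403
        return 200


trap = SpiderTrap()
print(trap.handle("192.0.2.10", "ExampleBot/1.0", "/index.html"))  # 200
print(trap.handle("192.0.2.10", "ExampleBot/1.0", "/trap/page"))   # 403, now banned
print(trap.handle("192.0.2.10", "ExampleBot/1.0", "/index.html"))  # 403
```

A crawler that honors robots.txt never requests the disallowed path, so any hit on it strongly suggests the bot is ignoring the rules.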
This YPBot / Raven1.1.3 user agent is a new one to me. What is even more interesting is that it is using the Googlebot identifier in the user-agent line.
Offending IP Address: 18.104.22.168
rwhois = null.ev1.yellowpages.com
YPBot/Raven1.1.3 (compatible; Googlebot/2.1;+http://www.yellowpages.com/about/legal/crawl)
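Having "Googlebot" in the user-agent proves nothing, because any client can send that string. Google's published guidance for verifying Googlebot is a reverse DNS lookup on the IP, a check that the hostname is under `googlebot.com` or `google.com`, and a forward lookup confirming that the hostname resolves back to the same IP. The sketch below follows that approach. The domain list is an assumption and should be checked against Google's current documentation. The lookups are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable

# Assumed list of Google crawler domains; confirm against Google's current docs.
GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def claims_googlebot(user_agent: str) -> bool:
    """True if the user-agent string mentions Googlebot anywhere."""
    return "googlebot" in user_agent.lower()


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward_lookup: Callable[[str], list[str]] = lambda h: socket.gethostbyname_ex(h)[2],
) -> bool:
    """Reverse DNS, check the domain, then forward DNS back to the same IP."""
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False


def classify(ip: str, user_agent: str, **lookups) -> str:
    if not claims_googlebot(user_agent):
        return "not claiming Googlebot"
    if is_verified_googlebot(ip, **lookups):
        return "verified Googlebot"
    return "claims Googlebot, not verified"
```

For example, an IP whose reverse DNS points into `yellowpages.com` would be classified as "claims Googlebot, not verified", even though its user-agent contains `Googlebot/2.1`.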
Tell me more about the YPBOT
In an effort to provide quality local search experience on the Web, YELLOWPAGES.COM and/or YP.COM (“YPC”) deploys a robot to collect information from the web. Our mission is to help consumers find the products and services they need. Part of this mission is having accurate and up-to-date information about the businesses we connect to consumers. Our crawlers are simply verifying and augmenting data that can be found at yp.com.
If you have any question about our crawling process or would like to be removed from the crawl list, please send e-mail to firstname.lastname@example.org with the name, address, website address, and contact number for your business.