DreamHost Blocks Googlebot?


The owner of the Romanian site Zoso.ro allegedly received an email recently from his web host, DreamHost, informing him that Googlebot had been blocked from his high-traffic sites because the crawler was causing heavy load on the shared web server.

Here’s the alleged email sent by DreamHost:

This email is to inform you that a few of your sites were getting hammered by Google bot. This was causing a heavy load on the webserver, and in turn affecting other customers on your shared server. In order to maintain stability on the webserver, I was forced to block Google bot via the .htaccess file.

<Limit GET HEAD POST>
order allow,deny
deny from 66.249
allow from all
</Limit>

You also want to consider making your files be unsearchable by robots and crawlers, as that usually contributes to high number of hits. If they hit a dynamic file, like php, it can cause high memory usage and consequently high load…
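To get a feel for how broad the `deny from 66.249` line is, here is a rough Python sketch. It assumes, as I understand Apache's partial-address matching, that `66.249` stands for the first two bytes of an IPv4 address, meaning the whole 66.249.0.0/16 range. The function name and the sample addresses are my own, picked only for illustration; they are not from the email.

```python
import ipaddress

# Assumption: the partial address "66.249" in the .htaccess rule is treated
# as the first two bytes of an IPv4 address, i.e. the 66.249.0.0/16 range.
BLOCKED_NETWORK = ipaddress.ip_network("66.249.0.0/16")


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside the denied range."""
    return ipaddress.ip_address(client_ip) in BLOCKED_NETWORK


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to show both outcomes.
    for ip in ("66.249.66.1", "66.250.0.1", "192.0.2.10"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

If that reading is right, the rule turns away every visitor from that address range, not just requests that identify themselves as Googlebot, and it does nothing about crawlers coming from other ranges.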

I moved to DreamHost just over a month ago, and so far I haven’t encountered any hosting problems related to Googlebot, nor have I received any emails like the one above. That’s probably because my blog doesn’t get much traffic. Googlebot can use a lot of bandwidth on some sites, enough to push them over their bandwidth limit and get them taken down temporarily. I suspect part of this story is true: some of the sites involved probably did put a heavy load on the shared server through a combination of visitor traffic and Googlebot activity.

Still, it seems an unlikely move for an established web hosting company such as DreamHost. It may just be a hoax; I haven’t been able to confirm the story either way. If it is true, though, it would hurt DreamHost’s image and the confidence of current and future customers.

Hopefully someone from DreamHost reads this and can explain or confirm whether the story is true. I’ll update this post as soon as I can verify it.

Owner and editor of JaypeeOnline. Self-proclaimed geek. New media writer and consultant. WordPress advocate. Loves blogging, gadgets, video games and sports.

10 Comments

  1. JP Habaradas

    June 3, 2007 at 3:05 AM

    @Sarj – What type of error page are you talking about? Have you been able to fix this issue yet?

  2. sarj

    June 1, 2007 at 7:26 AM

    Yup. but there are times when my visitors land on an error page. Grr. But after a while, it returns to normal.

  3. JP Habaradas

    May 31, 2007 at 1:24 PM

    @Sarj – So you're also hosted on Dreamhost? So far, I haven't had any issues of that kind and hopefully I won't ever have to deal with it in the future. If Dreamhost didn't send you any warning or notice about Googlebot, you don't have to block it. Anyways, you're welcome! :)

  4. Sarj

    May 30, 2007 at 11:01 AM

    Oh, so that explains why there were times when my site experiences internal error which disables the site temporarily. At first I thought my site got banned (haha!) or there's a problem with the hosting. Turns out, it was Dreamhost. Joni, have I told you that?:P Anyway, maybe I'd just have to block googlebot from my site forever then. Thanks for the info! :)

  5. JP Habaradas

    May 28, 2007 at 1:15 AM

    @Nick – Thanks for the info you provided. So this story is confirmed then. Dreamhost does block Googlebot when they deem necessary. I dunno if other hosts like Media Temple do this too. I hope that Dreamhost and Google work something out or find a solution to this issue.

  6. Nick Tan

    May 28, 2007 at 12:02 AM

    Incredibly, Dreamhost responded to my email, and confirmed this. I think it's great that they responded, and that they responded quickly, and that they gave a clear answer, which is more than what you get at most web hosts. I'm not sure I buy what they are saying, though. Even if you have a flood of crawlers hammering your site, it won't sink your server unless your server is nearly maxed out to begin with… But hey, you get what you pay for.

    Nick

    Reply from Dreamhost:

    As of lately we have been working on our very heavy usage customers (people using over a quarter to in some cases 250% of what the full server should be processing). In excess of half of these cases the cause is google's crawler malfunctioning in how it interacts with the site resulting in heavily loaded or even crashing machines (I have had cases where googlebot has over 95% of the last 10,000 hits on a site that doesn't even have that many pages). Our terms of service (http://www.dreamhost.com/tos.html) specifically state that "if your processes are adversely affecting server performance disproportionately DreamHost Web Hosting reserves the right to negotiate additional charges with the Customer and/or the discontinuation of the offending processes" so that we can ensure that we keep machines working for everyone on them and not just one user. In a case where a faulty googlebot interaction is killing a machine we have two options:

    1. disable the site

    2. block the bot

    We feel that the best solution for our customers is to stop just the malfunctioning behavior and keep everyone's sites working as well as possible on the machine. The blocks are always removable for any customer, but we just ask them to slow down the crawl speed of the bot by going to the following address:

    https://www.google.com/webmasters/tools/siteoverv

    This will allow Googlebot to run on our servers without any problems, thus no block needed. We have been working with a Googlebot engineer on this issue as Google is aware of this issue themselves, but until then we're trying everything possible to keep all customers services up and running with other people bring down a whole server due to googlebot malfunctioning on a dynamic page that causes high loads.

    Let me know if you have any other questions.

    Thanks,

    Mike P.

  7. JP Habaradas

    May 27, 2007 at 12:08 AM

    @Nick – I still have to confirm this story. Let me know what you find out about it. Hehe, you're right. Even the Chinese government don't completely block Google. :D

  8. Nick Tan

    May 26, 2007 at 10:34 PM

    Yikes… We were planning to move to Dreamhost from Powweb as soon as we can afford their dedicated server. Will have to investigate this. On the Internet, Google is GOD and you do not block God.

  9. JP Habaradas

    May 22, 2007 at 4:12 PM

    @Joni – They "allegedly" sent that email. Still trying to find out if this story is true or not. I know, being a web host they should expect stuff like this. We both don't have to worry bout these things. Hehe :D

  10. Joni

    May 22, 2007 at 2:43 PM

    Huwat! Dreamhost sent that email? That is outrageous. The problem is with their server and not on their client's website! Weird.

    I have Dreamhost hosting too but I also know I won't encounter this kind of problem! My blogs don't get that much traffic too hahaha
