BurnTees
11-23-2009, 07:02 AM
Bluehost suspended my account because apparently I'm bogging down the servers. My site basically runs one script called CPShop, which pulls in products from my CafePress shop so that I can sell them on my own .com site instead of through CafePress.
From the guy who wrote the script:
I noticed the same thing on my development server a couple of days ago. The culprit, as it turned out, was a combination of MSN and Yahoo both attempting to spider every page in the shop at an insane rate -- around 40-60 unique pages per second, in a fairly small store. In as little as a minute, the server would come to a screeching halt.
As a temporary measure, I blocked them using robots.txt, which fixed the problem in seconds.
The correct long-term way to do it is to limit the crawl rates for each of the major bots:
Google: http://www.google.com/support/webmasters/bin/answer.py?answer=48620
Yahoo & Bing/MSN: http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html (scroll down to the part about crawl delay)
The process I went through was:
1. Disabled the web server (which Bluehost has done for you).
2. Edited robots.txt to block all robots from spidering, just as a temporary measure:
User-agent: *
Disallow: /index.cgi/
3. Re-enabled web server.
4. Watched site to make sure load didn't make it fall over again.
5. Logged into Google Webmaster Tools, and set the crawl rate to something reasonable. (And actually, Google was never the problem.)
6. Re-edited robots.txt to limit msnbot and Slurp:
User-agent: *
crawl-delay: 15
User-agent: *
Disallow: /index.cgi/bt/
Disallow: /index.cgi/kippygo/
Disallow: /index.cgi/peonproductions/
Disallow: /index.cgi/wackywade/
Disallow: /index.cgi/bizarretees/
#repeat for every one of your aliases
7. Once you've proven that your site isn't coming down, you can remove the second section, and just stick with the first:
User-agent: *
crawl-delay: 15
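For anyone who wants to sanity-check these rules before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module (assuming a recent Python 3). The helper name load_rules and the /about.html path are made up for illustration, and this only shows how Python's parser reads the file. Real crawlers such as msnbot or Slurp may interpret it differently, and crawl-delay is not honored by every bot.

```python
from urllib.robotparser import RobotFileParser

# Step 2 from the list above: temporarily keep every robot out of the shop.
TEMPORARY_BLOCK = """\
User-agent: *
Disallow: /index.cgi/
"""

# Step 7 from the list above: allow crawling again, but ask bots to slow down.
CRAWL_DELAY_ONLY = """\
User-agent: *
crawl-delay: 15
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (hypothetical helper, not part of CPShop)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    blocked = load_rules(TEMPORARY_BLOCK)
    # Shop pages are off limits while the temporary block is in place.
    print("shop page allowed:", blocked.can_fetch("msnbot", "/index.cgi/bt/"))
    # /about.html is a hypothetical page outside the shop script.
    print("other page allowed:", blocked.can_fetch("msnbot", "/about.html"))

    delayed = load_rules(CRAWL_DELAY_ONLY)
    # With only a crawl delay, pages are allowed again and the delay is reported.
    print("shop page allowed:", delayed.can_fetch("Slurp", "/index.cgi/bt/"))
    print("crawl delay (seconds):", delayed.crawl_delay("Slurp"))
```

If the first check reports the shop page as allowed, or the last one reports no crawl delay, the file probably has a typo worth fixing before it goes live.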
This all seems very helpful, but I have no idea what it means, and I don't know where to go to edit my robots.txt file. Does anyone have any advice here?
THANK YOU in advance for your help.