Thread: .htaccess ban list

  1. #21
    Join Date
    Feb 2006
    Location
    Somewhere where I don't know where I am
    Posts
    2,154

    Quote Originally Posted by chadderuski
    I take it that this method is used for bots that do NOT respect the robots.txt file? Is this really that common a problem?
    Yeah, this is mainly for the robots that don't follow the robots.txt file, although I do have some listed that DO follow it; I still have to delete those. I don't want ANY robots on my site, so with this method I can block, if needed, any robots and spammers that do not follow the set guidelines.
    Sign Up Now!
    Unlimited Storage, Unlimited Transfer, Host Unlimited domain names, 1 Free Domain Name
    BlueHost Features | BlueHost Help Desk | Become a BlueHost Affiliate | BlueHost CEO Blog
    (888) 401-4678 | Create a support ticket

  2. #22
    Join Date
    Feb 2006
    Posts
    6

    Quote Originally Posted by dvessel
    The way it was implemented as a Drupal module, I could check the logs to see why visitors were rejected, plus there was a handy link to whois with the visitor's IP. Whois has a spam blacklist listed, and they were always clear.

    Could be related to proxies I guess.. not sure but it was always because the visitor couldn't accept headers. It just didn't add up so I removed it.
    Interesting. Most of my rejection reasons are rather clear (mostly malformed headers), so I'm probably OK for now.

    I'm just not a huge fan of the captcha concept, as I don't have good eyes anymore and just about all the examples I encounter make them hurt.

    Thanks,
    --TSK

  3. #23
    Join Date
    Mar 2006
    Location
    joisey
    Posts
    442

    Quote Originally Posted by T_S_Kimball
    Interesting. Most of my rejection reasons are rather clear (mostly malformed headers), so I'm probably OK for now.

    I'm just not a huge fan of the captcha concept, as I don't have good eyes anymore and just about all the examples I encounter make them hurt.
    I've read reports in the Drupal community that BB would lock out admins under certain circumstances. It was fixed by bypassing BB for logged-in users. I'm guessing the underlying rules that trigger BB are still there. Maybe it's just the way it was done in Drupal that caused the problem, but I honestly don't need to fight spammers that aggressively right now. I might look into it again if and when I get a lot of traffic/spam.

    I know what you mean about image captchas. I changed mine to a simple math challenge captcha (x + y). I had tweaked the site for text-based browsers, so using an image made no sense.
    {0,o}
    |)__)
    -"-"-
    On permanent hiatus...

  4. #24
    Join Date
    Nov 2007
    Location
    Chicago, IL
    Posts
    2

    Thanks for the list! -- Just used it on www.joeychgo.com

  5. #25

    Can anyone tell me what this code is used for? :confused:
    KIMSON Handicraft Co., LTD
    www.handicraft-vn.com

  6. #26
    Join Date
    Nov 2006
    Location
    Sydney, Australia
    Posts
    4,944

    Quote Originally Posted by kimsonvu
    Can anyone tell me what this code is used for? :confused:
    It blocks access to your site for the selected user agents, all of which are used by known spambots.
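
    A stripped-down sketch of how the list is put together, using two user agent names taken from the list in this thread. The comments are explanatory only, and the short form of the final rule is one common way to write it, not the original poster's exact line:

    ```apache
    RewriteEngine On
    # Each RewriteCond tests the User-Agent header against a regex.
    # ^ anchors the match to the start of the header, NC ignores case,
    # and OR chains this condition to the next one.
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    # The last condition in the chain carries no OR flag.
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC]
    # If any condition matched, refuse the request with 403 Forbidden.
    RewriteRule ^ - [F]
    ```

    Because the patterns are anchored with ^, a bot is only caught when its user agent string begins with one of the listed names.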

  7. #27
    Join Date
    Mar 2008
    Location
    North Carolina
    Posts
    24

    Update on Preventing Unwanted Bots?

    I have a few bot questions:

    1) I've noticed a variety of bot activity in my server logs over time that I think can be attributed to the panscient.com crawler. Since the quoted code below (which was listed at the beginning of the thread) is several years old, I wanted confirmation on whether adding this code, along with a few other bot recommendations provided in subsequent threads, to the .htaccess file is still the recommended way to block unwanted bots.

    2) If this is still the recommended procedure, can someone perhaps provide an UPDATED LIST?

    3) Just to clarify, should I put this code into an .htaccess file in the public_html directory, not just in the sub-domain that I'm concerned about? (I currently have one other sub-domain that I'm not so concerned about, as far as bot activity.)

    4) Is there any concern over allowing Twiceler to crawl one's websites? I notice it crawls my site quite frequently. Should I be blocking it, too?

    Thanks much for your feedback.

    Best,
    Deb

    Quote Originally Posted by areidmtm
    I got tired of stupid bots crawling my site, so I created a .htaccess ban list. If you have any to add, please do so!

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^NPBot [NC,OR]
    RewriteCond %{REMOTE_ADDR} ^12\.148\.196\.(12[8-9]|1[3-9][0-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
    RewriteCond %{REMOTE_ADDR} ^12\.148\.209\.(19[2-9]|2[0-4][0-9]|25[0-5])$ [NC,OR]
    RewriteCond %{REMOTE_ADDR} ^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailMagnet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebPix [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/2.0\ \(compatible;\ NEWT\ ActiveX;\ Win32\) [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCollector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PICgrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^mister\ pix [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^x-Tractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebMiner [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSnake [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Webdup [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebBandit [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^teleport [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteCopy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MSIECrawler [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JoBo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Extreme\ Picture\ Finder [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
    RewriteCond %{REMOTE_ADDR} ^64\.140\.49\.6[6-9]$ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ClariaBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Diamond [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^[a-z]+$ [NC]

    RewriteRule ^.* - [F,L]

  8. #28
    Join Date
    Feb 2006
    Location
    Somewhere where I don't know where I am
    Posts
    2,154

    I'm using the bot trapper method. Any bot that does not follow the standard rules of the robots.txt file will be banned automatically.

    In my robots.txt file I specify what bots I allow to crawl my site.

    Code:
    User-agent: Googlebot
    Disallow: /bot-trap/
    
    User-agent: MSNBot
    Disallow: /bot-trap/
    
    User-agent: Slurp
    Disallow: /bot-trap/
    
    User-agent: *
    Disallow: /
    Then, at the very top of my page, I have a 1-pixel link:

    HTML Code:
    <a href="/bot-trap/index.php"><img src="pixel.gif" width="1" height="1" alt=" " border="0" /></a>
    If any bot follows that link, it will be blocked by the index.php file inside /bot-trap/. I will also receive an email when that happens.

    PHP Code:
    <?php
    // Bot trap: email an alert, then add the visitor's user agent to the
    // ban block between #START_SPAM_BOTS and #END_SPAM_BOTS in .htaccess.
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $agent = explode(' ', $ua);

    $to = 'support@domain.com';
    $subject = 'Bad Bot Alert';
    $message = "A bad bot hit " . $_SERVER['HTTP_HOST'] . " on " . date('F d, Y \a\t g:i a') . " \n";
    $message .= "IP: " . $_SERVER['REMOTE_ADDR'] . " \n";
    $message .= "Agent: " . $ua . " \n";
    $headers = "From: support@domain.com\r\n" .
               "Reply-To: support@domain.com\r\n" .
               'X-Mailer: PHP/' . phpversion();

    mail($to, $subject, $message, $headers);

    // Path of the .htaccess file to read and rewrite.
    $filename = $_SERVER['DOCUMENT_ROOT'] . '/.htaccess';
    $lines = preg_split('/\r\n|\r|\n/', rtrim(file_get_contents($filename)));
    $end = array_search('#END_SPAM_BOTS', $lines);

    // Drop double quotes so the token cannot break out of the quoted pattern.
    $token = str_replace('"', '', $agent[0]);

    // Skip empty tokens: an empty pattern would match every visitor.
    if ($end !== false && $token !== '') {
        $deny = 'Deny from env=bad_bot';
        $new = array('SetEnvIfNoCase User-Agent "' . $token . '" bad_bot');
        if ($end > 0 && $lines[$end - 1] === $deny) {
            $end--;            // insert above the existing Deny line
        } else {
            $new[] = $deny;    // first entry: add the Deny line too
        }
        array_splice($lines, $end, 0, $new);
        file_put_contents($filename, implode("\n", $lines) . "\n");
    }

    header('Location: http://' . $_SERVER['HTTP_HOST'] . '/');
    ?>
    This will add a line to the .htaccess file that will block that spam bot's user agent.

    In your .htaccess file there needs to be this section:
    Code:
    #START_SPAM_BOTS
    #END_SPAM_BOTS
    My .htaccess file looks like

    Code:
    #START_SPAM_BOTS
    SetEnvIfNoCase User-Agent "SurveyBot/2.3" bad_bot
    SetEnvIfNoCase User-Agent "ia_archiver" bad_bot
    SetEnvIfNoCase User-Agent "psycheclone" bad_bot
    SetEnvIfNoCase User-Agent "SBIder/0.8-dev" bad_bot
    SetEnvIfNoCase User-Agent "Mozilla/1.0" bad_bot
    SetEnvIfNoCase User-Agent "Mozilla/2.0" bad_bot
    SetEnvIfNoCase User-Agent "Mozilla/3.0" bad_bot
    SetEnvIfNoCase User-Agent "aipbot/1.0" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.4.1_04" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.5.0_06" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.5.0_02" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.5.0_10" bad_bot
    SetEnvIfNoCase User-Agent "Python-urllib/1.16" bad_bot
    SetEnvIfNoCase User-Agent "Snapbot/1.0" bad_bot
    SetEnvIfNoCase User-Agent "TMCrawler" bad_bot
    SetEnvIfNoCase User-Agent "findlinks/1.1.1-a5" bad_bot
    SetEnvIfNoCase User-Agent "topicblogs/0.9" bad_bot
    SetEnvIfNoCase User-Agent "Moreoverbot/3.00" bad_bot
    SetEnvIfNoCase User-Agent "Xenu" bad_bot
    SetEnvIfNoCase User-Agent "MJ12bot/v1.0.8" bad_bot
    SetEnvIfNoCase User-Agent "dragonfly(ebingbong#playstarmusic.com)" bad_bot
    SetEnvIfNoCase User-Agent "EmailSearch" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.6.0_04" bad_bot
    SetEnvIfNoCase User-Agent "CCBot/1.0" bad_bot
    SetEnvIfNoCase User-Agent "Microsoft" bad_bot
    SetEnvIfNoCase User-Agent "DomainCrawler/1.0" bad_bot
    SetEnvIfNoCase User-Agent "Mozilla/4.0" bad_bot
    SetEnvIfNoCase User-Agent "Java/1.6.0_11" bad_bot
    Deny from env=bad_bot
    #END_SPAM_BOTS
    I'm sure you could probably find a better way of doing this. I have used this for years and it works for me. Try searching Google for Bot Trapper.
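
    One possible tidy-up, offered as a sketch rather than a tested drop-in: SetEnvIfNoCase treats the quoted value as a regular expression and matches it anywhere in the header unless it is anchored. A single anchored pattern could therefore stand in for the separate Java entries in the block above. Which entries to merge is an assumption here, not something the list's author specified:

    ```apache
    #START_SPAM_BOTS
    # Matches any user agent that begins with "Java/", whatever the version.
    SetEnvIfNoCase User-Agent "^Java/" bad_bot
    # Dots are escaped so they match a literal "." only.
    SetEnvIfNoCase User-Agent "^SurveyBot/2\.3" bad_bot
    Deny from env=bad_bot
    #END_SPAM_BOTS
    ```

    Anchoring also narrows what each entry catches: an unanchored value such as "Mozilla/4.0" matches every user agent that contains that text anywhere in the string.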

  9. #29
    Join Date
    Jan 2008
    Location
    cardboard box
    Posts
    389

    What if a bot copies a browser's user agent?

    For example, I don't think IE7 users can browse your site. Or really any browsers from Mozilla 1-4.
    Have you tried turning it off and on again?

  10. #30
    Join Date
    Feb 2006
    Location
    Somewhere where I don't know where I am
    Posts
    2,154

    Quote Originally Posted by wysiwyg
    What if a bot copies a browser's user agent?

    For example, I don't think IE7 users can browse your site. Or really any browsers from Mozilla 1-4.
    Ah, you're right. I didn't notice the Mozilla/4.0. Good catch. Mozilla 1-3 are really, really old browsers, like IE 5 and below. With my site, I really don't need to target everyone since it's only for personal use.

    If a bot is faking a user agent, then the only thing you can do is ban the IP address.

    But like I said, I'm sure there are better ways. This just happens to work out well for my use.
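
    A minimal sketch of an IP ban in .htaccess, assuming the same Apache 2.2-style access directives as the "Deny from env=bad_bot" line earlier in the thread. The addresses are documentation placeholders, not real offenders:

    ```apache
    # Block a single address.
    Deny from 192.0.2.10
    # Block a whole range in CIDR notation.
    Deny from 198.51.100.0/24
    ```

    On Apache 2.4 without mod_access_compat, the usual equivalent is a "Require not ip" line for each address inside a <RequireAll> block that also contains "Require all granted".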