
Thread: banning using user agent in .htaccess

  1. #1
    Join Date
    Jun 2006
    Posts
    6


    Here is a real newbie question. I am banning robots in .htaccess by both IP address and user agent. For user agents I use a block of code like this:

    SetEnvIfNoCase User-Agent "^Mo College 1.9" bad_bot

    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>

    When I use the cPanel file manager to edit the .htaccess file and try to ban a longer user agent, the line splits into two lines and is then not recognized by the server. How can I ban a longer user agent like this one?


    SetEnvIfNoCase User-Agent "^Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1)" bad_bot

    Is there a line continuation character or something?
    I hope someone can help.

  2. #2
    Early Out (Former Moderator, Still Respected)
    Join Date
    Mar 2006
    Location
    Sector R
    Posts
    4,650

    It's just a text file, so I'd just copy .htaccess to my PC, use Notepad (or any other text editor) to make the desired changes, then FTP it back to the server.
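
    On the literal question: Apache's configuration syntax, which .htaccess shares, does treat a backslash as a continuation character when it is the very last character on a line (no trailing spaces). A minimal sketch, assuming the file manager leaves the backslash and the line break alone. Note that parentheses and dots are regular-expression metacharacters in SetEnvIf patterns, so they are escaped here to match the literal user agent string; the escaping is an addition, not part of the line as posted.

    ```apache
    # The trailing backslash joins this line with the next one.
    SetEnvIfNoCase User-Agent \
        "^Mozilla/4\.0 \(compatible ; MSIE 6\.0; Windows NT 5\.1\)" bad_bot
    ```

    If the editor still wraps the second line, matching a shorter, distinctive fragment of the user agent (without the ^ anchor) keeps the directive short.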

  3. #3
    Join Date
    Jun 2006
    Posts
    6

    Duh. I guess I should have tried that before posting. Thanks.

  4. #4

    In reply to jlo's post (#1):
    I see this post is really old, but this seems to be the closest Google has gotten me to an answer.

    My site is very country specific: it serves no purpose to anyone outside South Africa. Because it is an online store, it gets abused by non-South African traffic to the point that it has started negatively affecting my server. My only solution was to limit access to South African IP addresses. The problem is that I am now effectively blocking search engines too.

    What I was hoping for is some help adjusting the .htaccess file to allow the bots by user agent, instead of searching for and adding their IP addresses to the list.

    My .htaccess looks something like this:

    <Limit GET HEAD POST>
    order deny,allow
    # Country: SOUTH AFRICA
    # ISO Code: ZA
    # Total Networks: 882
    # Total Subnets: 15,870,464
    allow from 41.0.0.0/11
    allow from 41.48.0.0/13
    allow from 41.66.64.0/18
    allow from 41.72.128.0/19
    #
    deny from all
    </Limit>

    The list of IP addresses is much larger, but the rest is unnecessary for this question.
    The most important user agents are:

    User-agent: Teoma
    User-agent: ia_archiver
    User-agent: msnbot
    User-agent: Slurp
    User-agent: Googlebot

    Can I just include these in the .htaccess and have it work, or are there other lines I need to add?

    Thanks a lot for any assistance

  5. #5
    Join Date
    Nov 2006
    Location
    Sydney, Australia
    Posts
    4,533

    Why not find out what IP addresses the search engines use and add those?

    Both Internet Explorer and Firefox allow the user agent to be set to anything at all, so anyone using either of those could easily get around any user agent test.

  6. #6

    In reply to felgall's post (#5):
    I do know where to find the IP addresses, but Googlebot alone uses over 100 of them, and it is reported that they change. If possible, it would be better to use the user agent, even if it is not foolproof. If users really want to go to the trouble of changing their user agent to see my site, that's fine; it gains them effectively nothing.

    I just need the bots to be able to access it, hopefully without having to search every few weeks to make sure I have all the correct IP addresses.
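
    For reference, a minimal sketch of the user agent approach, assuming the Apache 2.2-style Order/Allow/Deny directives already used in this thread and that mod_setenvif is available. The User-agent: lines listed in post #4 are robots.txt syntax and would not be valid in .htaccess as written, so the crawler names are turned into SetEnvIfNoCase patterns here. The variable name good_bot is made up for illustration, and the IP list is shortened to the ranges shown in post #4. As noted in post #5, anyone can send these user agent strings, so this lets spoofed requests through as well.

    ```apache
    # Flag requests whose User-Agent contains a known crawler name (case-insensitive).
    SetEnvIfNoCase User-Agent "Googlebot" good_bot
    SetEnvIfNoCase User-Agent "msnbot" good_bot
    SetEnvIfNoCase User-Agent "Slurp" good_bot
    SetEnvIfNoCase User-Agent "Teoma" good_bot
    SetEnvIfNoCase User-Agent "ia_archiver" good_bot

    <Limit GET HEAD POST>
    Order Deny,Allow
    Deny from all
    # South African ranges (shortened list from post #4)
    Allow from 41.0.0.0/11
    Allow from 41.48.0.0/13
    Allow from 41.66.64.0/18
    Allow from 41.72.128.0/19
    # Crawlers flagged by the SetEnvIfNoCase lines above
    Allow from env=good_bot
    </Limit>
    ```

    With Order Deny,Allow, a request that matches Deny is still let in if it also matches any Allow line, so either a listed IP range or a flagged user agent is enough.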
