Blocking Baidu and Yandex

Discussion in 'Administrating Your Community' started by Michael, Apr 8, 2010.

  1. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    I have this in my .htaccess file:
    RewriteCond %{HTTP_USER_AGENT} ^baiduspider [NC]
    RewriteRule .* - [F]

    RewriteCond %{HTTP_USER_AGENT} ^yandex [NC]
    RewriteRule .* - [F]

    and also this in robots.txt:

    User-agent: Baiduspider
    Disallow: /
    User-agent: Yandex
    Disallow: /


    Yet they're still accessing the forum. Does anyone have any other ideas for putting a complete stop to both bots accessing a site? They seem to ignore robots.txt and get around the .htaccess rules.
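
    One possible reason the rewrite rules above miss: the `^` anchor only matches when the User-Agent header starts with the bot name, while current crawler versions send strings such as `Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)`, where the name sits mid-string. A minimal sketch, assuming mod_rewrite is enabled and that both crawlers identify themselves honestly, drops the anchor and merges the two conditions:

    ```apache
    RewriteEngine On
    # Match either bot token anywhere in the User-Agent header, case-insensitively
    RewriteCond %{HTTP_USER_AGENT} (baiduspider|yandex) [NC]
    # Answer every matching request with 403 Forbidden
    RewriteRule ^ - [F]
    ```

    This still cannot catch a crawler that sends a different or spoofed User-Agent, so it is a first thing to try rather than a guaranteed fix.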
  2. Angelic

    Azhria Lilu Barry & Brad Bodyswapping?

    Likes Received:
    1,054
    Software You Use:
    IPB, XenForo
    From what I'm reading, Baidu completely ignores the robots.txt file and is causing trouble for lots of people. The recommended action appears to be to edit the httpd.conf file to say:


    Code:
    SetEnvIfNoCase User-Agent "^Baidu" bad_bot
    <Directory />
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Directory>
    I can't confirm or deny how well this works; it's just the most popular suggestion I'm seeing across the net.
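
    The block above uses Apache 2.2 access-control syntax. On Apache 2.4 and later, `Order`/`Allow`/`Deny` are deprecated and only available through mod_access_compat. A hedged sketch of the same idea with the 2.4 `Require` directives follows; the `/var/www/html` path is a placeholder for the real document root, and the pattern is left unanchored on the assumption that the bot name may appear anywhere in the header:

    ```apache
    # Flag any request whose User-Agent contains "Baidu" (case-insensitive)
    SetEnvIfNoCase User-Agent "Baidu" bad_bot

    # Placeholder path: replace with the site's actual document root
    <Directory "/var/www/html">
        <RequireAll>
            # Allow everyone...
            Require all granted
            # ...except requests flagged above
            Require not env bad_bot
        </RequireAll>
    </Directory>
    ```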
  3. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    Thank you very much, I will try that. Hopefully it will work and remove the need for the rules in each site's .htaccess and robots.txt, as I don't want them on any of our sites :p
  4. Angelic

    Azhria Lilu Barry & Brad Bodyswapping?

    Likes Received:
    1,054
    Software You Use:
    IPB, XenForo
    It appears that neither spider takes a blind bit of notice of the robots.txt so anything in there will be completely ignored. I've read a few cases where it's spidering forums set as private areas as well.
  5. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    :p You would think they would respect them, wouldn't you? I always thought having such rules was like saying no to them, please bugger off.

    The main reason I want rid of them is that every time they come online they don't just bring one; there are at least 100 spiders crawling fast as hell, and our VPS can't handle that on top of the good bots, guests and members, not at the rate they crawl. I don't want to upgrade just to accommodate them when we get little to no traffic from them each month :D It just doesn't seem worth it lol

    I have added that ruleset using the includes editor in WHM, hope it works!
  6. Angelic

    Azhria Lilu Barry & Brad Bodyswapping?

    Likes Received:
    1,054
    Software You Use:
    IPB, XenForo
    Chinese Spider... I guess they make their own rules lol
  7. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    You'd think that you could complain about it or something, I am sure that we would all be really annoyed and complaining if Google was crawling our sites and we had no choice but to accept that they ignore robots.txt rules. :p
  8. MjrNuT Shaft Central-ish

    Likes Received:
    32
    LOL!! :rofl:
  9. Cowboy

    Shawn Gossman Well-Known Member

    Likes Received:
    60
    What is so bad about them? Do they suck bandwidth or something like that?
  10. Mooooody

    Barry Probably not Brad ;)

    Likes Received:
    439
    Software You Use:
    XenForo, SMF
    I know nothing about coding, but the following code has been added to a forum with Baiduspider problems and it stopped them coming.

    RewriteEngine On

    <Files *.*>
    order allow,deny
    allow from all
    deny from 220.181.
    </Files>
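
    For anyone on Apache 2.4 or later, a sketch of the same IP-prefix block using `Require` instead of `order`/`allow`/`deny`. The `220.181` prefix is simply the one quoted above, not a verified list of Baidu ranges, and in an .htaccess file this assumes the host permits `AllowOverride AuthConfig`:

    ```apache
    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except addresses starting with 220.181 (prefix taken from the post above)
        Require not ip 220.181
    </RequireAll>
    ```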
  11. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    Well no, it is the number of them coming online. It isn't just one bot, it is up to 100 at once, and every bot takes a slot that could go to a user online. Our forum doesn't gain much traffic from Baidu's search engine, roughly 20 visits per month, and I don't think that will ever increase much, so I would rather block them and make room for 100 extra guests or users online. They do use bandwidth, which I am not fussed about at all; it is just the number they're sending, it isn't just one or two. It isn't just Baidu either, the Yandex spider behaves the same way.

    Thanks Barry, I have added a whole IP list along with the one you posted to the .htaccess just in case they get on somehow.
  12. Mooooody

    Barry Probably not Brad ;)

    Likes Received:
    439
    Software You Use:
    XenForo, SMF
    Can you post back the results in a couple of days?

    That way we can see what is working.
  13. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    Certainly. I just hope they don't come back :rofl: If they get through all of these measures, god knows what will keep them out :X3:
  14. MjrNuT Shaft Central-ish

    Likes Received:
    32
    Well, I'm getting hit with Baidu today...but don't see/notice any issues thus far. About 36 right now....
  15. MikeDVB Web Host Extraordinaire!!

    Likes Received:
    13
    For a client I ended up blocking the Baidu IP ranges in the firewall because they were making thousands upon thousands of hits on every page every day, draining up to 200 GB/month from the client.

    The problem with blocking them in .htaccess is that they'll still be served a "Forbidden" page, which means hits on the server and CPU cycles. Although it's a static page and won't use much, it will *still* use up connections and resources, even on a small scale.

    In a shared environment, dropping Baidu in the system firewall simply won't be an option at most providers, but on a VPS or dedicated server it would be up to you.
  16. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    Would adding them to the firewall be more effective than having the rules posted earlier in Apache's config?
  17. Paul M vb.org Administrator

    Likes Received:
    59
    I don't block any spiders, never seen a need.
  18. MikeDVB Web Host Extraordinaire!!

    Likes Received:
    13
    Yes, in that they wouldn't even be able to open a two-way connection, much less request any files.
  19. Egghead

    Michael Well-Known Member

    Likes Received:
    81
    Do you know how to add Baidu within CSF at all? :p
  20. MikeDVB Web Host Extraordinaire!!

    Likes Received:
    13
    You would just add their IP ranges to the csf.deny file and then restart CSF.
