Protect your website from hacker attacks and malicious bots, and optimize resources – II
In the previous article we saw that, unfortunately, malicious bots exist whose purpose is to probe the network in search of weaknesses in our site. It matters little whether the site runs on WordPress (in itself very secure, although its weak point is plugins), Coppermine Gallery, Joomla, or PHP code we wrote ourselves.
It is useful, then, to disable directory indexing and to avoid having too many pages with “recognizable” names such as login.php, to make life harder for these damned bots!
However, last year, after several slowdown problems, I studied the matter further, trying to remove all the weak points by barring access to “malicious” bots and visitors that have NO interest in visiting the site, but only in consuming its resources and, possibly, managing to edit its content.
We will analyze the contents of two very important files, which should be placed in the root of our site, and which allow us to restrict access to some or all of the files from certain IP addresses or bots.
These files are called robots.txt and .htaccess.
What to put in the .htaccess file
- Block all traffic from China
I don't know about you but, in my case, “real” traffic from China is scarce, if not nonexistent, and consists largely of botnets that connect to my site and index it continuously, looking for any weakness. This characteristic, unfortunately, is shared by Ukraine and Russia.
To overcome this, we will use a “brutal” method: we will block access from the entire Asian giant.
Go to countryipblocks.net, tick “CHINA → .htaccess deny”, click “Create ACL”, and copy the result at the top of our .htaccess file.
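The generated ACL is simply a long series of deny directives. A shortened, illustrative sketch of what it looks like (the CIDR ranges below are examples only; always paste the freshly generated list, since country allocations change over time):

```apache
# Country ACL for China (illustrative excerpt - regenerate the real list)
Order Allow,Deny
Deny from 1.80.0.0/13
Deny from 27.184.0.0/13
Allow from all
```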
- Deter the Indonesian zone-h defacers
These characters have been probing the network for years, performing thousands of hits on your site while trying to upload a test file, “nyet.gif”, to your FTP space.
If they succeed, they have proof that your site is an ideal target, and they proceed with the real hack.
To prevent them from wasting even a single “drop” of your server's resources, we block access to the generic file nyet.gif a priori, by inserting in .htaccess:
```apache
# nyet.gif islamic script kiddies
<Files "nyet.gif">
Order Allow,Deny
Deny from all
</Files>
```
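As a side note, the Order/Deny syntax above belongs to Apache 2.2 (and to mod_access_compat on newer versions). If your host runs Apache 2.4 without the compatibility module, an equivalent block, sketched here as an alternative, uses the Require directive instead:

```apache
# Apache 2.4+ equivalent of the Order/Deny block for nyet.gif
<Files "nyet.gif">
Require all denied
</Files>
```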
- Always serve your robots.txt file and an error page, even when you block an IP or bot
If you do not explicitly allow access to the “Access denied” page (HTTP error 403), you risk creating a loop, with the banned bot trying to access it over and over, never “figuring out” why it cannot.
Likewise, always serve the robots.txt file, in case the banned bot finally decides to respect its directives.
```apache
# avoid redirect loop when blocking bots - always deliver the robots.txt, 403 and 410 pages
<FilesMatch "(^403\.shtml$|^410\.shtml$|^robots\.txt$)">
Order Allow,Deny
Allow from all
</FilesMatch>
```
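For this to work, Apache also has to know which pages to serve on those errors. Assuming your error pages are really named 403.shtml and 410.shtml, as the FilesMatch above suggests, the matching ErrorDocument directives would be:

```apache
# serve custom error pages for blocked requests and removed content
ErrorDocument 403 /403.shtml
ErrorDocument 410 /410.shtml
```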
- Block access to pages like login.php or wp-login.php
As we saw in the previous article, many bots shoot in the dark and try to access, say, a WordPress login page, perhaps hoping that the password is “admin” or “password”. 🙂
For this purpose, you should rename all your “login.php”-type pages, use the Rename wp-login.php plugin for WordPress, and insert in the .htaccess file:
```apache
# block access to malicious login and wp-login hits in the root and subfolders
<FilesMatch "(^wp-login\.php$|^login\.php$|^login\.aspx$)">
Order Allow,Deny
Deny from all
</FilesMatch>
```
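If you connect from a static IP address, a possible variant (sketched here with a placeholder address from the documentation range; replace it with your own) is to deny everyone except yourself, so that you can still reach the login page:

```apache
# allow the login page only from my own static IP (203.0.113.10 is a placeholder)
<FilesMatch "^wp-login\.php$">
Order Deny,Allow
Deny from all
Allow from 203.0.113.10
</FilesMatch>
```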
- Block access to malicious bots
There are many of them, they are annoying, they add no value and, unfortunately, they keep changing over the years: they are the malware bots that are well worth excluding, for a million good reasons. 🙂
In doing so, we also stop curl and wget, preventing anyone from making a 1:1 copy of your site.
```apache
RewriteEngine on
# inherit the rules from the root .htaccess and append them last; needed in the root too
RewriteOptions inherit

# block bad bots
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [OR]
RewriteCond %{HTTP_USER_AGENT} A(?:ccess|PPID|hrefsBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} C(?:opy|rawl|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} D(?:evSoft|o(?:main|wnload)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} filter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} genieo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LI(?:brary|nk|bww) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Proxy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} S(?:craper|istrix) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} W(?:get|in(?:32|http)) [NC]
RewriteRule .? - [F]
```
- Limit attempts to spam comments and guestbooks
A widespread hacking attempt consists of trying, at random, to insert spam into comments or guestbooks, leaving helpful reminder messages about blue pills, or about how to lose 50 kg in 10 days. 🙂
These attempts have in common the fact of appending, to the end of the guestbook URL, strings of the type +result:+chosen+nickname+”nomeacaso”;+sent
```apache
# limit hacking attempts of the type: +result:+chosen nickname "acqqicny06";+sent; PLM=0
RewriteRule ^(.*)result:(.*)$ - [F,NC]
RewriteRule ^(.*)\+\[PLM=0\]\[n\]\+get(.*)$ - [F]
RewriteRule ^(.*)\+\[PLM=0\]\+get(.*)$ - [F]
```
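Note that, in a per-directory .htaccess, the RewriteRule pattern matches only the URL path, not the query string. If the same spam markers arrive after a “?”, a condition on %{QUERY_STRING} is needed; a sketch along the same lines:

```apache
# also reject the same spam markers when they appear in the query string
RewriteCond %{QUERY_STRING} result: [NC,OR]
RewriteCond %{QUERY_STRING} \+\[PLM=0\] [NC]
RewriteRule .? - [F]
```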
What to put in the robots.txt file
- Prevent crawling of unnecessary or dangerous folders
It is pointless for googlebot, bingbot, or other crawlers to go “digging” in folders that contain no content useful to the visitor. Think of the folders containing plugins, various scripts…
In this case, we exclude some unnecessary WordPress folders (/blog), avoid creating duplicates in Coppermine Gallery (/gallery), and exclude some folders useless to the user (e.g. the folders containing the minify or lightbox scripts).
```
User-Agent: *
Allow: /
Disallow: /cgi-bin/
Disallow: /min/
Disallow: /lightbox/
Disallow: /gallery/addfav.php?*
Disallow: /gallery/login.php?*
Disallow: /gallery/thumbnails.php?album=*favpics*
Disallow: /gallery/thumbnails.php?album=*lastup*
Disallow: /gallery/thumbnails.php?album=*lastcom*
Disallow: /gallery/thumbnails.php?album=*topn*
Disallow: /gallery/thumbnails.php?album=*toprated*
Disallow: /gallery/thumbnails.php?album=*search*
Disallow: /gallery/thumbnails.php?album=*slideshow
Disallow: /blog/wp-admin/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-content/plugins/
Disallow: /blog/wp-content/cache/
Disallow: /blog/wp-content/themes/
Disallow: /blog/wp-content/languages/
Disallow: /blog/wp-content/wptouch-data/
Disallow: /blog/trackback/
Disallow: /blog/*/trackback/
Disallow: /blog/feed/
Disallow: /blog/*/feed/
Disallow: /blog/wp-login.php
Disallow: /blog/wp-signup.php
```
- Block the few “unnecessary” bots that do respect the robots.txt file
Many “malicious” bots do not respect the robots.txt file at all, which is why we excluded them just above via the .htaccess file, by identifying and blocking their user agents.
Luckily, some of them are more “polite”, and a simple rule in robots.txt is enough.
```
# Block SEO or pseudo-SEO bots that are of no use to me
User-Agent: AhrefsBot
Disallow: /

User-Agent: Ezooms
Disallow: /

User-Agent: Exabot
Disallow: /

User-Agent: MJ12bot
Disallow: /

User-Agent: CCBot
Disallow: /

User-Agent: meanpathbot
Disallow: /

User-Agent: SearchmetricsBot
Disallow: /

User-Agent: Baiduspider
User-Agent: Baiduspider-video
User-Agent: Baiduspider-image
Disallow: /

User-Agent: Sogou
Disallow: /
```
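For bots you want to keep but slow down, some crawlers (Bingbot and Yandex among them; Googlebot ignores it) honor the non-standard Crawl-delay directive, which requests a pause, in seconds, between successive fetches:

```
# ask bingbot to wait 10 seconds between requests (non-standard directive)
User-Agent: bingbot
Crawl-delay: 10
```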
Thanks to these simple rules, found here and there on StackExchange, WebmasterWorld, Google Groups, or drawn from personal experience, I have excluded the vast majority of useless and potentially harmful traffic.
That's all! Let me know if you have benefited from these rules, as I have. 🙂