How to allow, block, or slow down search engine crawlers using either a robots.txt file or an .htaccess file.
The following sections describe the formatting for allowing or disallowing crawlers to access specific folders on your website.
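In general, each robots.txt rule names a user agent and is followed by one or more Disallow (or, for crawlers that support it, Allow) lines. A minimal sketch, using hypothetical /private and /public folders, might look like this:
User-agent: *
Disallow: /private/
Allow: /public/
Note that Allow is not part of the original robots.txt standard, although major crawlers such as Google and Bing honor it.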
ATTENTION: Search engine crawlers do not scan the robots.txt file each time they crawl your site, so changes to your robots.txt file might not be read by the search engine for up to a week.
If you are performing development work on your site and would prefer that search engines such as Google or Bing not crawl it, you can block your site from search engines entirely.
To block all of your folders from search engine crawlers, configure a disallow rule, as shown below.
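For example, the following robots.txt entry tells every crawler to stay out of the entire site:
User-agent: *
Disallow: /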
If your site is experiencing a large amount of traffic, and it appears to be caused by multiple search engine crawlers simultaneously visiting your site, configure a search engine crawler delay.
ATTENTION: Adding a crawl delay to your robots.txt file is considered a non-standard entry, and some search engines do not abide by this rule. You will need to check with the specific search engine you want to delay for specific details.
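As an illustration, a crawl delay request for Bing's crawler might look like the following; the value is in seconds, and whether it is honored depends on the search engine:
User-agent: bingbot
Crawl-delay: 10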
The following table is a list of search engines and their corresponding bot names:
Search Engine | Search Bot Name
Google | googlebot
Bing | bingbot
Baidu | baiduspider
MSN Bot | msnbot
Yandex.ru | yandex
All Search Engines | *
For example, to block googlebot from viewing your /photos folder, add the following lines to your robots.txt file:
User-agent: googlebot
Disallow: /photos
Depending on the way your website is configured, your robots.txt file might not work properly with search engine crawlers. In that case, you can make the changes in your .htaccess file instead. To block a specific crawler, add rewrite rules like the following, replacing [crawler] with the bot name from the table above; matching requests receive a 403 Forbidden response:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} [crawler] [NC]
RewriteRule .* - [R=403,L]
For example, to block Yandex from crawling any pages of your site, the .htaccess file will look something like this:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule .* - [R=403,L]
If adding a crawl delay to the robots.txt file was unsuccessful and a crawler is still overloading your site, you can block that crawler entirely by adding the following to your .htaccess file, replacing [botname] with the bot name from the table above:
SetEnvIf User-Agent [botname] GoAway=1
Order allow,deny
Allow from all
Deny from env=GoAway
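For example, to block Baidu's crawler with this method, the entry might look like the following sketch; SetEnvIfNoCase is used here so the match is case-insensitive, and on Apache 2.4 the Order/Allow/Deny directives require the mod_access_compat module:
SetEnvIfNoCase User-Agent baiduspider GoAway=1
Order allow,deny
Allow from all
Deny from env=GoAway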
For 24-hour assistance any day of the year, contact our support team by email or through your Client Portal.
Our award-winning customer care team is here for you.