Overview
Web Robots, also known as Web Wanderers, Crawlers, or Spiders, are programs that traverse the web automatically. Search engines, such as Google or Yahoo, use them to index your site's web content. However, they can also be misused; for example, spammers use them to scan for email addresses. The following are a few methods you can use to prevent this.
STATEMENT OF SUPPORT:
Please keep in mind that troubleshooting the configuration and functionality of third-party applications is not covered by our statement of support. These resources are provided as a courtesy to assist you to the extent of our abilities. For more information on our statement of support, feel free to click here.
Instructions
WordPress
If you are using WordPress as your CMS, there is a feature built-in to allow you to discourage search engines from indexing your site.
- Log into your WordPress Admin Dashboard.
- Click on Settings >> Reading.
- Under Search Engine Visibility, check the box labeled "Discourage search engines from indexing this site."
As the label implies, this discourages search engines from indexing your site, but it is up to each search engine to honor that request.
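Under the hood, this setting causes WordPress to emit a robots meta tag in the head of every page, roughly like the following (the exact attribute values can vary between WordPress versions):

```html
<!-- Emitted by WordPress when Search Engine Visibility is enabled -->
<meta name="robots" content="noindex, nofollow" />
```

Well-behaved crawlers that see this tag will skip the page when building their index.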
Robots.txt
As an alternative to the WordPress solution above, you can also use a robots.txt file to prevent search engine crawlers from crawling your site.
- Use a File Manager or FTP client to navigate to your website's root directory.
- Edit the robots.txt file, or create a new one if one doesn't exist yet.
- Enter the following into robots.txt:
User-agent: *
Disallow: /
- Save your changes. And that's it!
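If you want to confirm the rules behave the way you expect before relying on them, Python's standard-library robots.txt parser can evaluate them locally. This is just a quick sketch; the user agent names and URL paths are illustrative:

```python
from urllib.robotparser import RobotFileParser

# The two-line rules file created above: block every crawler site-wide.
rules = """User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Under these rules, no user agent may fetch any path.
print(rp.can_fetch("Googlebot", "/index.html"))   # False
print(rp.can_fetch("AnyOtherBot", "/blog/post"))  # False
```

Keep in mind that, like the WordPress setting, robots.txt is advisory: well-behaved crawlers obey it, but nothing forces a misbehaving bot to.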
If you'd like some more advanced directives for robots.txt, feel free to check out the information below. Remember to remove the # sign for any directive you wish the robots to follow, but be sure not to uncomment the directive's description. For details on all the rules you can create, please visit: http://www.robotstxt.org/
# Example robots.txt from (mt) Media Temple
# Learn more at http://mediatemple.net
# (mt) Forums - http://forum.mediatemple.net/
# (mt) System Status - http://status.mediatemple.net
# (mt) Statement of Support - http://mediatemple.net/support/statement/
# How do I check that my robots.txt file is working as expected
# http://www.google.com/support/webmasters/bin/answer.py?answer=35237
# For a list of Robots please visit: http://www.robotstxt.org/db.html
# Instructions
# Remove the "#" to uncomment any line that you wish to use, but be sure not to uncomment the Description.
# Grant Robots Access
#######################################################################################
# This example allows all robots to visit all files because the wildcard "*" specifies all robots:
#User-agent: *
#Disallow:
#To allow a single robot while excluding all others, you would use the following:
#User-agent: Google
#Disallow:
#User-agent: *
#Disallow: /
# Deny Robots Access
#######################################################################################
# This example keeps all robots out:
#User-agent: *
#Disallow: /
# The next is an example that tells all crawlers not to enter into four directories of a website:
#User-agent: *
#Disallow: /cgi-bin/
#Disallow: /images/
#Disallow: /tmp/
#Disallow: /private/
# Example that tells a specific crawler not to enter one specific directory:
#User-agent: BadBot
#Disallow: /private/
# Example that tells all crawlers not to enter one specific file called foo.html
#User-agent: *
#Disallow: /foo.html
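The per-agent examples above, such as keeping a single crawler out of one directory while allowing everyone else, can be verified the same way with Python's standard-library parser. BadBot and the paths here are just the illustrative names from the comments:

```python
from urllib.robotparser import RobotFileParser

# Rules from the example above: BadBot is kept out of /private/,
# while every other crawler is allowed everywhere.
rules = """User-agent: BadBot
Disallow: /private/

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("BadBot", "/private/data.html"))   # False
print(rp.can_fetch("SomeBot", "/private/data.html"))  # True
```

The more specific User-agent entry wins for BadBot, while every other agent falls through to the catch-all `*` entry, whose empty Disallow permits everything.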
Password protect your pages/website
Search engines and crawlers cannot access password-protected pages, so this method is quite effective at preventing indexing. However, it requires visitors to enter a password in order to view your content. It can be implemented on individual pages or across your whole site. For detailed instructions, feel free to visit the article below:
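On Apache servers, one common way to password protect a directory is HTTP Basic Authentication via an .htaccess file. The sketch below is an illustration, not a drop-in config: the password-file path and username are placeholders you would replace with your own values.

```apache
# Hypothetical example: protect this directory with HTTP Basic Auth.
# First create the password file, e.g.:
#   htpasswd -c /path/to/.htpasswd yourusername
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

Store the .htpasswd file outside your web root so it can't be downloaded directly.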