This guide to using noindex, nofollow, and disallow will come in handy if your web pages need to be invisible to search engines, indexing robots, and web page crawlers.
There may be times when you need to make your web pages invisible to search engines, indexing robots, and web page crawlers. In these instances, you might consider adding “noindex,” “nofollow,” and/or “disallow” to your web pages’ metadata tags, link attributes, and robots.txt commands. Typical cases include sites used for development, testing, or staging; pages where you want to limit access (e.g., login portals or private photo galleries); and pages or links that are redundant, outdated, archived, or contain only trivial content.
This guide will help you understand how to use “noindex,” “nofollow,” and/or “disallow” as part of your website maintenance and management routine.
Syntax examples
Indexing web pages
The following examples highlight several options and combinations available for robots metadata tags, which can be added within the <head> tag of a web page.
This metadata tag tells all search engines to index this page and to follow its links, which allows the rest of your website to be crawled and indexed:
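<meta name="robots" content="index, follow">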
This metadata tag instructs search engines not to index this particular page, but to continue crawling through the rest of the web pages on your website:
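<meta name="robots" content="noindex, follow">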
This metadata tag tells search engines to index only this page and to stop crawling any further:
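<meta name="robots" content="index, nofollow">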
This metadata tag instructs search engines not to index this page and not to crawl any further:
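<meta name="robots" content="noindex, nofollow">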
Let’s suppose you want to block only “googlebot” from indexing your website; you would use this syntax:
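<meta name="googlebot" content="noindex">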
Linking
You can also use the “nofollow” attribute within specific active links on pages whose targets you may not want indexed. The syntax for a nofollow link looks like this anchor tag example, which points to a ColdFusion (.cfm) page; the file name and link text below are placeholders:
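<!-- “your-page.cfm” and the link text are placeholders -->
<a href="your-page.cfm" rel="nofollow">Your link text</a>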
Robots.txt disallow
You can also use a robots.txt file, placing it in your web root directory or another directory depending on your web server configuration. A typical robots.txt file contains just a few lines of directives, which issue commands to robots using what is known as the Robots Exclusion Protocol (also called the Robots Exclusion Standard). The syntax examples below illustrate several ways of implementing the feature.
This example commands all robots to keep out of your website.
User-agent: *
Disallow: /
This example commands all robots to keep out of specific directories.
User-agent: *
Disallow: /backup/
Disallow: /archive/
Disallow: /cgi-mail/
This example commands all robots to keep out of a specific file.
User-agent: *
Disallow: /any-directory/any-file.htm
You can list multiple, specific robots to keep out of specific or all areas of your website. Several examples are displayed below.
User-agent: badbot
Disallow: /private/
User-agent: anybot-news
Disallow: /
User-agent: googlebot
Disallow: /
Caveat
While these strategies will help you in your quest to manage access, using them does not automatically guarantee that your designated “noindex,” “nofollow,” and/or “disallow” tags or commands will be observed by all search engines, spiders, and crawlers. It might take time for these methods to take effect, especially if pages were previously allowed to be indexed or followed and are subsequently set to nofollow or noindex. You may still see those pages in search engine results until their index entries have been refreshed.