How To Create A Robots.txt File

This tutorial explains what a robots.txt file is, how to create one, and how to upload it to your server.

Always back up your WordPress site before making changes to important files, and make copies of any files that you are about to modify.

If you don’t want to edit WordPress files yourself, ask an experienced person to help you.

***

Important

WordPress generates a virtual robots.txt file if there is no physical robots.txt file found on your server. In this tutorial, you will learn how to create and upload a physical robots.txt file to your server to override the virtual file, how to edit your robots.txt file and how to test your robots.txt file instructions to search engines.

robots.txt File

A robots.txt file tells search engine spiders and cooperative ‘bots’ which areas of your website they can and cannot access. Bots, or ‘robots’, are programs used by search engines like Google and other software to gather information for their databases.

About robots.txt

The following entry is sourced from robotstxt.org:

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a website URL, say http://www.yourdomain.com/welcome.php. Before it does so, it first checks for http://www.yourdomain.com/robots.txt, and finds:

User-agent: *
Disallow: /

The “User-agent: *” part contains an asterisk.

As stated in Wikipedia,

In software, a wildcard character is a single character, such as an asterisk (*), used to represent a number of characters or an empty string. It is often used in file searches so the full name need not be typed.

(Source: Wikipedia)

This means that this section applies to all search robots.

The “Disallow: /” part tells the robot that it should not visit any pages on the site.
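As a rough illustration of how a cooperative robot might apply these two lines, here is a minimal Python sketch using the standard library's urllib.robotparser module. The robot name "ExampleBot" is made up for this example, and real search engines use their own parsers, so treat this as an approximation only.

```python
from urllib.robotparser import RobotFileParser

# The two-line file described above: all robots, whole site off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical robot name used only for illustration.
allowed = parser.can_fetch("ExampleBot", "http://www.yourdomain.com/welcome.php")
print("May fetch welcome.php:", allowed)
```

Because the asterisk matches every robot and "/" matches every path, a compliant robot would skip the page.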

There are two important things to consider when using /robots.txt:

  • Robots can ignore your /robots.txt instructions. This includes malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers.
  • The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don’t want robots to use.

So, don’t use /robots.txt to try and hide information.

If you need to protect content on your WordPress site, see the tutorial below:

Accessing Your Robots File

Your robots.txt file must be located in the root directory of your site, which is normally the same directory as your WordPress installation …

(Location of robots.txt file)

To view your site’s robots.txt file after it has been installed, simply type your site’s URL into your browser and add “robots.txt” to the end of the URL, e.g.:

http://www.yoursite.com/robots.txt
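If you want to work out the robots.txt address for any page on a site, the rule above (site address plus "robots.txt") can be sketched in Python as follows. The function name robots_txt_url is an illustrative choice, not part of WordPress or any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yoursite.com/blog/my-post/?p=1"))
```

Whatever page address is passed in, the result points to the robots.txt file at the top level of the same site.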

This will bring up your robots.txt file …

(robots.txt file)

How To Configure Your Robots File

Typically, with a new WordPress installation, if you configure your WordPress Reading Settings to allow search engines to index your site, your robots.txt file will look something like this …

(WordPress Reading Settings – Search Engine Visibility Allowed)

If you disable the ‘Search Engine Visibility’ option in the Reading settings, the instructions in your robots.txt file will be modified to discourage search engines from crawling your site …

(WordPress Reading Settings – Search Engine Visibility Disallowed)

A robots.txt file can be configured in many different ways, depending on what instructions you want to give to search engine spiders and other visiting bots (robots).

Here are just some examples …

(Examples of robots.txt file instructions. Source: Wikipedia)

How To Create A Robots.txt File – Step-By-Step Tutorial

Important Info

Before creating a robots.txt file, we recommend creating an XML sitemap for your site.

  • If you plan to use a WordPress SEO plugin like Yoast SEO (recommended), the plugin will automatically create an XML sitemap for your website and add it to your WordPress installation.
  • If you do not plan to use the Yoast SEO plugin, you can use a standalone WordPress XML Sitemap plugin like Google XML Sitemaps to generate an XML sitemap for your site.

Let’s show you how to manually create, configure and upload a robots.txt file to your server that performs a number of important functions …

(robots.txt file)

At the very top of the robots.txt file (i.e. the first line) you will add a link to your site’s XML sitemap. This sitemap enables search engines to find and index all of your site’s pages faster.

Sitemap: http://www.yoursite.org/sitemap.xml
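To check that a sitemap line like this one is readable by a parser, here is a small Python sketch. It assumes Python 3.8 or later (where RobotFileParser.site_maps() is available), and the extra two lines of the sample file are included only so the file is complete.

```python
from urllib.robotparser import RobotFileParser

# Sample file with the sitemap entry on the first line.
ROBOTS_TXT = """\
Sitemap: http://www.yoursite.org/sitemap.xml

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```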

Under the sitemap entry, a recommended entry is Google’s ‘Mediapartners’ agent. By allowing this agent access, you prevent white space or public service ads from appearing on your pages if you have added Google AdSense to your site. These can appear when Google has not yet had the opportunity to crawl a page that carries AdSense ads, or to determine what the page is about.

The entry looks like this:

User-agent: Mediapartners-Google
Disallow:

The rest of the text in your file can be organized however you like. In the sample robots.txt file provided below, you’ll see that the file text has been organized into segments or groupings for ease of readability and management.

In our sample robots.txt file, two groupings are preceded by “User-agent: *” followed by a list of directories or files with each list item preceded by “Disallow“. This represents the directories and files we don’t want search engines to access.

For example, if you have a folder on your site called “private” that you do not want search robots to crawl, then you would add the following line into your robots.txt file:

Disallow: /private/
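Here is a minimal Python sketch of how a compliant robot might read this line, using the standard library's urllib.robotparser. The robot name "ExampleBot" and the page names are hypothetical, and the "User-agent: *" line is added so the rule belongs to a grouping.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical pages inside and outside the "private" folder.
inside = parser.can_fetch("ExampleBot", "http://www.yoursite.com/private/report.html")
outside = parser.can_fetch("ExampleBot", "http://www.yoursite.com/blog/post.html")
print("private page allowed:", inside)
print("blog page allowed:", outside)
```

In this parser the rule is a simple prefix match, so everything whose path begins with /private/ is treated as off limits and the rest of the site stays open.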

Of particular note are the image files: since searches for some images (e.g. system images) can bring traffic that is of no benefit to your site, you may want to disallow search engines from accessing these.

What follows these sections in our example robots.txt file is a long list of user-agents that are prevented from accessing the site. These are selected not necessarily because they are ‘bad’ but because they would simply use up your server’s bandwidth and other resources if allowed to access and spider your site. The entries for these items look similar to the example below:

User-agent: ia_archiver
Disallow: /

Having “/” as a “Disallow” value prevents any access to any file or directory on your site.
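The following Python sketch shows how separate groupings for different user-agents might be evaluated by the standard library's urllib.robotparser. The file combines the entries discussed in this section, "Googlebot" stands in for any robot without a grouping of its own, and other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

page = "http://www.yoursite.com/index.html"
results = {
    agent: parser.can_fetch(agent, page)
    for agent in ("Mediapartners-Google", "ia_archiver", "Googlebot")
}
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A robot with its own grouping follows that grouping, while every other robot falls back to the "User-agent: *" rules.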

Adding A Robots.txt File To Your Site

Click the share button below to download sample robots.txt files that you can use for your own site:

Important Info

If you use the sample robots.txt file provided above, feel free to copy and use the file as is, but make sure to make the changes indicated in the file.

Modify the portions marked in red (i.e. replace yourdomain.com with your URL) and remove the instructions (including the brackets), then resave and reupload the file to your server (see the next section for details on adding a robots.txt file to your server).

Editing Your Robots.txt File

If using the sample robots.txt file provided, either download the zip file provided above or click on the link provided to view the robots.txt file in a new browser window.

Do one of the following:

Option #1

  1. Click “File” > “Save Page As …”
  2. Save the page to a location on your hard drive. The file will be saved as “robots.txt”
  3. Open up the file and change the first line of the file as per the above instructions.
  4. Resave and upload the robots.txt file via FTP to the root directory of your site.

Option #2

  1. Select all of the content in the browser window and copy it to your clipboard.
  2. Create a new plain text file (e.g. notepad).
  3. Paste the contents of your clipboard into the text file.
  4. Save the file as “robots.txt”
  5. Open up the file and change the first line of the file as per the above instructions.
  6. Resave and upload via FTP to the root directory of your site.
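If you would rather generate the file than copy and paste it, the following Python sketch builds a small robots.txt and saves it as plain text. The function name, the sitemap address, and the /private/ folder are illustrative assumptions, and the file is written to your system's temporary folder so nothing existing is overwritten. You would still upload the result via FTP yourself.

```python
import tempfile
from pathlib import Path


def build_robots_txt(sitemap_url, disallowed_paths):
    """Return robots.txt text: sitemap first, then one grouping for all robots."""
    lines = [f"Sitemap: {sitemap_url}", "", "User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with your own site details.
content = build_robots_txt("http://www.yoursite.com/sitemap.xml", ["/private/"])

output_path = Path(tempfile.gettempdir()) / "robots.txt"
output_path.write_text(content, encoding="utf-8")
print(f"Saved to {output_path}")
```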

After uploading the robots.txt file to your server, verify that it has been uploaded correctly (i.e. open a browser and go to http://www.yoursite.com/robots.txt).

Also, verify the following:

  • All the folders you want disallowed have been entered correctly
  • “Disallow” is correctly defined. For example, “Disallow: /” means no access at all, while “Disallow:” (with no value after it) means full access.
  • You have not left stray spaces or extra characters. Search engines are very particular and may not respond in the way you expect them to. For example, separate groupings with a single blank line rather than several. Also, the ‘#’ character starts a comment in a robots.txt file, so be sure to leave it out of your paths and instructions.
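The checks above can be tried out locally with a short Python sketch based on the standard library's urllib.robotparser. The helper name is_allowed, the robot name "ExampleBot", and the /docs#drafts path are made up for illustration. The last case shows what can happen when a ‘#’ slips into a path: this parser treats everything from the ‘#’ onward as a comment.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.yoursite.com/docs/index.html"

# "Disallow:" with no value means full access.
full_access = is_allowed("User-agent: *\nDisallow:\n", "ExampleBot", page)

# "Disallow: /" means no access at all.
no_access = is_allowed("User-agent: *\nDisallow: /\n", "ExampleBot", page)

# The "#drafts" part is read as a comment, so the rule becomes "Disallow: /docs".
with_hash = is_allowed("User-agent: *\nDisallow: /docs#drafts\n", "ExampleBot", page)

print(full_access, no_access, with_hash)
```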

Testing Your Robots.txt File

You can test your robots.txt file using a checking tool like the robots.txt Tester provided by Google Search Console (formerly Google Webmaster Tools).

Important

To use the robots.txt file checker tool you will need to have a Google Search Console account set up.

If you haven’t set up your webmaster accounts yet, see the tutorial below:

To test a site’s robots.txt file, do the following:

Log into your Google Search Console account …

(Google Search Console Login)

On the Google Search Console home page, click the site you want to check …

(Google Search Console Home Page)

In the Search Console menu, select Crawl > robots.txt Tester …

(Crawl > robots.txt Tester)

This brings you to the robots.txt Tester screen …

(robots.txt Tester screen)

If your site has been set up correctly and already indexed by Google, the tool will populate the fields on this screen with information about your site.

You can view the content of your robots.txt file and check for errors or warnings …

(Inspect your robots.txt file content)

You can also enter URLs into a URL test field, select the user-agents you want to test against from the ‘User-agents’ drop-down menu list (e.g. Googlebot, Googlebot mobile, Mediapartners-Google, etc.) …

(Test your URLs to see if any are blocked)

Click the Test button to analyze your site and view your results.

The tool will show you if the URL or directory is allowed (i.e. Google can crawl it) …

(Allowed – Google will crawl this URL or directory)

Or if the URL or directory is not allowed …

(Blocked – Google will not crawl this URL or directory)
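For a quick check on your own computer before uploading, a similar allowed/blocked report can be sketched in Python with the standard library's urllib.robotparser. This is not Google's tool and may not match Google's own parser in every case. The function name check_urls, the sample rules, and the URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def check_urls(robots_txt, user_agent, urls):
    """Return a dict mapping each URL to "Allowed" or "Blocked" for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        url: "Allowed" if parser.can_fetch(user_agent, url) else "Blocked"
        for url in urls
    }


ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /private/
"""

URLS = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/private/notes.html",
]

googlebot_report = check_urls(ROBOTS_TXT, "Googlebot", URLS)
adsense_report = check_urls(ROBOTS_TXT, "Mediapartners-Google", URLS)

for url, verdict in googlebot_report.items():
    print(f"Googlebot: {verdict}: {url}")
```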

Useful Information

Any changes you make in this tool will not be saved. To save any changes, you’ll need to copy the contents and paste them into your robots.txt file.

Useful Tip

Refer to the Google Search Console Help documentation for more details on how to use the tools and how to analyze your results.

Robots.txt – Additional Information

If you have a WordPress SEO Plugin like Yoast SEO installed and discourage search engines from indexing your site (see How To Configure WordPress Reading Settings), you will block access to robots and get an error message like the one shown below …

(Blocking access to robots can affect some SEO plugin settings)

This makes sense … why would you want to optimize your website for search engines if you are instructing WordPress to block search engines from visiting your site?

To learn more about configuring WordPress SEO Plugins, see the tutorial below:

To learn more about configuring your WordPress Reading Settings, see the tutorial below:

To learn how to effectively block your WordPress site from being accessed by search engines, see the tutorial below:

For technical information on the benefits and advantages of using a robots.txt file, visit this site:

Congratulations! Now you know how to add a robots.txt file to prevent search engines and compliant bots from indexing pages or sections of your site.

Author: Martin Aranovitch

Martin Aranovitch is the founder of WPCompendium.org and the author of The Small Business Digital Manager. WPCompendium.org provides hundreds of FREE tutorials that show you how to use WordPress to grow your business online with no coding skills required! Get our FREE "101+ WordPress Tips, Tricks & Hacks For Non-Techies" e-course with loads of useful WordPress tips!

Originally published as How To Create A Robots.txt File.