First, what is “Discourage search engines from indexing this site”?
“Discourage search engines from indexing this site” simply asks search engines not to list the website in their search results. Search engines will still crawl your website whenever they find a link or reference to it on another website. Preventing crawling and preventing indexing are therefore two different topics; here we discuss only the indexing of the website.
Why would someone want that? Below are some cases in which you would need this setting:
Unfinished / unlaunched website: Your website is still under construction and you are experimenting with pages, content and design by trial and error, so you want to avoid getting visitors from search engines at this point.
Restricted websites: You want the website to be restricted, password protected or invite only, and hence do not plan to have it listed in search results.
Duplicate or testing sites: You have created a duplicate of the original website to test a feature, a theme or updates, and hence do not want the duplicate or testing website listed in search results.
Robots.txt vs meta tags
Most Google results for “discourage search engines from indexing” discuss robots.txt and its usage. We are not going to cover that here, because robots.txt is not what this setting uses.
With WordPress 5.x, selecting “Discourage search engines from indexing this site” does not generate a robots.txt with the familiar Disallow rules.
Disallowing search engines from accessing a site in the robots.txt file can result in search engines listing the site without content: a crawler that is not allowed to fetch a page cannot read it, but it can still list the URL when other websites link to it. Hence, selecting “Discourage search engines from indexing this site” under WordPress => Settings => Reading now adds the meta tag shown in the next section to the head of the WordPress website.
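The difference between blocking crawling and discouraging indexing can be illustrated with Python's standard urllib.robotparser module. This is only a minimal sketch, not how any search engine is actually implemented. The two robots.txt bodies, the example.com URL, the "ExampleBot" user agent and the can_crawl helper are all hypothetical names chosen for illustration. A crawler that obeys a Disallow rule never downloads the page, so it never sees the page content or any meta tag in its head.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCKING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that leaves the whole site crawlable
# (an empty Disallow value means nothing is disallowed).
OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt, url, user_agent="ExampleBot"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/sample-page/"  # assumed example URL
    print(can_crawl(BLOCKING_ROBOTS_TXT, page))  # False: page is never fetched
    print(can_crawl(OPEN_ROBOTS_TXT, page))      # True: page can be fetched and read
```

Only in the second case can a crawler read the page and find an indexing request inside it, which is why a meta tag needs the site to stay crawlable.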
Meta tag added to webpages
The following meta tag is added to the webpages when the setting above is enabled.
<meta name='robots' content='noindex,nofollow' />
With this tag we ask search engines not to index the pages and their content, not to follow the links on the website, and not to list them in search results; it is up to each search engine to honor the request. A website also contains many resources that have no HTML head of their own, such as images, scripts and CSS files, and so cannot carry the meta tag themselves. This is why both nofollow and noindex are set in the head: nofollow asks crawlers not to follow the links that lead to those resources.
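To check that the setting has taken effect, one option is to look for the robots meta tag in a page's HTML. The following is a minimal sketch using Python's standard html.parser module. The RobotsMetaParser class, the robots_directives function and the sample page are illustrative assumptions, not part of WordPress, and a real check would first download the page source from your own site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attributes.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page source containing the tag discussed above.
SAMPLE_PAGE = """<html><head>
<meta name='robots' content='noindex,nofollow' />
<title>Sample</title>
</head><body><p>Hello</p></body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))  # ['nofollow', 'noindex']
```

If the returned set contains both noindex and nofollow, the page is carrying the request described above; an empty set means no robots meta tag was found in that HTML.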