Robots.txt file and disallowing irrelevant folders - On Premise – Configurations & Setup - Progress Community

Robots.txt file and disallowing irrelevant folders

  • I'm new to the Sitefinity deployment environment, and I want to use the robots.txt file to keep crawlers out of folders that don't need to be crawled.

    1.  Does anyone have best practices or information on which folders to exclude from crawling (script and code folders, etc.)?

    2.  Also, which folder should the file be placed in (presumably the root), and what should the file be named?

    Thanks,

    lk

  • Hi Liz,

    The file must be named robots.txt and it must be in the root of your site. You don't need to list any folders or files in it; search engines will figure that out themselves. If you have pages that are not worth finding through search engines (e.g. your error page or iframes), you can optionally list them. You can also add a line that points to your XML sitemap if you have one. A robots.txt file could then look like this, for example:

    Sitemap: https://www.example.com/sitemapxml.xml

    User-agent: *
    Disallow:
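
To sanity-check a file like the one above, one option is Python's standard `urllib.robotparser` module. This is only a minimal sketch and not part of the original answer: the `https://` scheme on the sitemap URL, the explicit empty `Disallow:` line, and the `/error-page` path are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text):
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The open example: nothing is excluded, and a sitemap is advertised.
OPEN = """\
Sitemap: https://www.example.com/sitemapxml.xml

User-agent: *
Disallow:
"""

# Variant that optionally lists one page (hypothetical path).
WITH_EXCLUSION = """\
User-agent: *
Disallow: /error-page
"""

open_rules = load_rules(OPEN)
strict_rules = load_rules(WITH_EXCLUSION)

# With the open file, even a code folder is allowed for crawlers.
print(open_rules.can_fetch("*", "https://www.example.com/bin/"))
# Sitemap lines are collected separately from the user-agent rules.
print(open_rules.site_maps())
# Only the listed page is excluded in the variant.
print(strict_rules.can_fetch("*", "https://www.example.com/error-page"))
print(strict_rules.can_fetch("*", "https://www.example.com/products"))
```

Parsing from a string keeps the check offline; `site_maps()` requires Python 3.8 or later.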

  • Hi Arno

    I understand robots.txt, but I'm not very familiar with Sitefinity on Windows; I typically use Apache. So my question stands: which folders are irrelevant? This does make a difference when it comes to SEO. In WordPress environments, for example, there are folders with code and scripts that we don't want search engines crawling, and I was curious whether Sitefinity is similar. I would think there are best practices for this.

    Thanks.

  • Hi Liz,

    There's really no need to add such folders to robots.txt, regardless of which CMS or operating system you're using. Search engines start crawling at your homepage, or at some other page, and simply follow links from there. They are not interested in anything but pages and assets such as images and videos. Excluding a folder like \bin or \Templates makes no difference.
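
The link-following behaviour described above can be pictured as a breadth-first walk over a site's link graph. The sketch below is a simplified model, not how any particular search engine works; the page paths and the two on-disk files are made up for illustration.

```python
from collections import deque

# Hypothetical site: each page maps to the links it contains.
LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget", "/"],
    "/products/widget": ["/images/widget.png"],
    "/about": ["/"],
    "/images/widget.png": [],
}

# Hypothetical files that exist on disk but that no page links to.
ON_DISK_ONLY = ["/bin/app.dll", "/Templates/base.master"]


def crawl(start, links):
    """Return every path reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reached = crawl("/", LINKS)
print(sorted(reached))
print([path for path in ON_DISK_ONLY if path in reached])
```

In this model the unlinked files are never reached, with or without a robots.txt entry for them.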

    The less you exclude, the better. A few years ago there was some concern that excluding CSS files was actually harmful, as search engines could think you had something to hide. I'm not sure whether that's still the case, but either way there is no need to block specific folders or scripts.

    Perhaps someone else can provide a list of folders you can exclude. I don't have one as I never did it, and never had an issue with search engines either ;-)

    Note that robots.txt is just a suggestion to search engines. It does not guarantee anything. If there are files that are not to be seen by the public, they need to be password protected.
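
The difference between a suggestion and real protection can be shown with a toy model. This is a rough sketch under stated assumptions: the `serve` function, the `/private/` and `/secure/` paths, and the credentials are all hypothetical and stand in for a real web server and its authentication.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask crawlers to stay out of /private/.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/"])

PROTECTED_PREFIX = "/secure/"        # hypothetical password-protected area
VALID_LOGIN = ("editor", "s3cret")   # hypothetical credentials


def serve(path, login=None):
    """Toy server: return an HTTP-style status code for a request."""
    if path.startswith(PROTECTED_PREFIX) and login != VALID_LOGIN:
        return 401  # access control is enforced by the server
    return 200      # robots.txt is never consulted on this side


def polite_fetch(path):
    """A well-behaved crawler checks robots.txt before requesting."""
    if not parser.can_fetch("*", path):
        return None  # skipped voluntarily, not blocked
    return serve(path)


print(polite_fetch("/private/report.html"))        # polite crawler skips it
print(serve("/private/report.html"))               # any other client still gets it
print(serve("/secure/report.html"))                # refused without a login
print(serve("/secure/report.html", VALID_LOGIN))   # served with a login
```

Only the server-side check changes what a client that ignores robots.txt can retrieve.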