Robots.txt Webflow | How to Configure It

How to configure a Robots.txt file on Webflow to optimize your SEO?

Webflow is a versatile tool for organic search engine optimization (SEO) and comes with built-in SEO settings. Knowing how to configure them properly gives you the best chance of ranking well for relevant Google queries.

The robots.txt file is one of those settings. It tells the bots that search engines use to analyze content which parts of your site they may visit. Configured correctly, it helps search engines focus on the pages you want to rank.

In this article, you'll find everything you need to know to configure your robots.txt file correctly on Webflow.

I. What is a robots.txt file?

A robots.txt file is a text file served at the root of your website. It tells the bots that crawl your pages for the various search engines which pages they are allowed to access.

This lets you keep some of your pages away from Google's crawler while they remain available to users. Note that a blocked page can still appear in search results if other sites link to it.

II. What is the purpose of adding a robots.txt file?

To rank as well as possible on the various search engines, it is essential to offer the pages that provide the most value to users. It can therefore be useful to hide pages that are under construction or have bugs.

On a site with a very large number of pages, the robots that "crawl" it only have a limited amount of time to spend there (often called the crawl budget). From an SEO perspective, it is therefore effective to steer them toward your most valuable pages.

As a reminder, "crawling" is the action performed by robots to determine the content of a web page. Thanks to this analysis, search engines can respond more accurately to user queries.

III. What content should be included in a robots.txt file?

A robots.txt file is divided into two parts: the robots being targeted, and the rules that apply to them.

1. Targeted robots

First, we will specify the targeted "agent." The targeted "agent" is the robot with which we wish to communicate. For example, if we want to hide pages from Google's robots, the agent would be:

User-agent: Googlebot

In most cases, it is preferable to address all the robots that crawl the web, such as those from Yahoo or Bing. To target all possible agents, use an asterisk (*).

The robots.txt format will therefore be:

User-agent: *
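To see how agent targeting plays out, here is a minimal sketch using Python's standard urllib.robotparser module. The rules and the "/drafts" path are hypothetical, and Python's parser only approximates how each search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for every other robot.
RULES = """\
User-agent: Googlebot
Disallow: /drafts

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot falls under its own group, so /drafts is off limits to it.
googlebot_allowed = parser.can_fetch("Googlebot", "/drafts")
# Bingbot has no dedicated group and falls back to the "*" group.
bingbot_allowed = parser.can_fetch("Bingbot", "/drafts")

print(googlebot_allowed, bingbot_allowed)
```

A robot follows the group that names it and only falls back to the "*" group when no group does.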

2. Content to allow or hide

Once robots are targeted, you need to specify the pages you want to hide.

To do this, three options are available to you:

a. Allow crawling of specific identified pages

The first method involves explicitly allowing certain pages to be visited by bots. Keep in mind that robots may crawl every page by default, so an "Allow" line on its own does not hide the rest of the site; it is mainly useful as an exception to a "Disallow" rule (see method c). To allow a page, write its slug after the keyword "Allow".

As a reminder, a page's slug is the part of the URL that corresponds to it. Let's take the example of a site called "monsite.fr".

The URL for the contact page will be "https://www.monsite.fr/contact/". The slug for the contact page is therefore "/contact".

To allow a "Contact" page, you would therefore write:

Allow: /contact

b. Disallow crawling of specific identified pages

The second method involves disallowing only certain pages. We recommend this method: everything you do not list stays crawlable, which helps prevent oversights that would hide pages you wanted to highlight.

Here, we follow a similar structure, writing the page's slug after the keyword "Disallow".

To use the same example, hiding a "Contact" page would be done by writing:

Disallow: /contact
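One detail worth checking is that a rule matches every URL that starts with the given slug, not just the exact page. The sketch below uses Python's urllib.robotparser to illustrate this; "ExampleBot" and the tested paths are made-up names for the demonstration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /contact
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(path: str) -> bool:
    """Return True if a generic robot may crawl the given path."""
    # "ExampleBot" is a hypothetical agent name; it falls under the "*" group.
    return parser.can_fetch("ExampleBot", path)


# Hypothetical paths: the rule is a prefix, so it also catches /contact-us.
results = {
    path: is_allowed(path)
    for path in ["/contact", "/contact/form", "/contact-us", "/about"]
}
print(results)
```

If you only want to block the exact page, Google documents a "$" suffix that anchors the end of the URL (for example "Disallow: /contact$"). Python's parser does not interpret it, so it is not tested here.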

c. Combining the two aforementioned methods

By combining the two previous methods, you can target pages precisely. For example, you can disallow access to a folder but allow a specific page within it.

Let's take the example of a "top 10 tips" article within a blog. If you want to disallow access to the entire blog but allow access to that specific article, you can write:

Disallow: /blog
Allow: /blog/top10astuces
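The example above works because, for Google and under the Robots Exclusion Protocol standard, the most specific (longest) matching rule wins. Here is a small sketch of that convention, assuming simple prefix rules without wildcards. The is_allowed function is our own illustration and not part of any library; we do not use Python's urllib.robotparser here because it applies rules in file order instead.

```python
def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply the 'longest matching rule wins' convention to prefix rules.

    rules is a list of (directive, prefix) pairs, e.g. ("Disallow", "/blog").
    Wildcards (* and $) are deliberately not handled in this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow takes precedence.
        if len(prefix) > best_length or (len(prefix) == best_length and is_allow):
            best_length = len(prefix)
            allowed = is_allow
    return allowed


RULES = [("Disallow", "/blog"), ("Allow", "/blog/top10astuces")]

print(is_allowed("/blog/top10astuces", RULES))   # True
print(is_allowed("/blog/other-article", RULES))  # False
print(is_allowed("/contact", RULES))             # True
```

The "/blog/other-article" and "/contact" paths are hypothetical and only serve to show a blocked page and an unaffected page.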

To discover all available options, explore Google's documentation on the robots.txt file!

IV. How to add a robots.txt file on Webflow?

To add a robots.txt file on Webflow, go to your site settings.

In the "SEO" tab, under the "Indexing" section, you will find the text area for editing your robots.txt file.

Once your file is written, click "Save changes" to save your modifications. Then click "Publish" to put your robots.txt file online.
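After publishing, you may want to confirm that the file is actually served. A robots.txt file always lives at the root of the domain, so a quick check could look like the sketch below. The function names are our own, "www.example.com" is a placeholder domain, and some servers may reject requests from scripts.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Build the robots.txt address for a site; the file sits at the root."""
    return urljoin(site_url, "/robots.txt")


def fetch_robots(site_url: str) -> str:
    """Download the published robots.txt (requires network access)."""
    with urlopen(robots_url(site_url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Any page of the site leads to the same robots.txt address.
print(robots_url("https://www.example.com/contact/"))

# To display the live file, replace the placeholder with your own domain:
# print(fetch_robots("https://www.example.com/"))
```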

V. Our Tips

1. Allow crawling of all your pages

For a site with fewer than 10,000 pages, we recommend allowing all pages for all robots. This practice helps prevent pages from being poorly indexed, which would severely harm your site from an SEO perspective.

To apply this recommendation, you can copy the following code:

User-agent: *
Disallow:
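Be careful when copying this: an empty "Disallow:" allows everything, while "Disallow: /" blocks the entire site. The sketch below checks both variants with Python's urllib.robotparser; "ExampleBot" and "/any-page" are made-up names used only for the check.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, path: str) -> bool:
    """Parse a robots.txt body and check one path for a generic robot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # "ExampleBot" is a hypothetical agent name covered by the "*" group.
    return parser.can_fetch("ExampleBot", path)


ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"

print(can_crawl(ALLOW_ALL, "/any-page"))  # True
print(can_crawl(BLOCK_ALL, "/any-page"))  # False
```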

2. Hide your statistics from competitors

Beyond the Google and Yahoo bots, many SEO analytics tools also run bots that "crawl" websites. Their goal is to estimate data comparable to what Google Analytics reports, which is normally private.

This data reveals a lot of information, such as traffic volume, its origin, and bounce rate. Competitors can use it to identify your SEO strategy and gain an advantage.

It is difficult to block all analytics bots because there are so many of them, and new ones are constantly being created.

Nevertheless, here is a non-exhaustive list of agents that lets you hide your data from the most frequently used crawl bots:

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: Rogerbot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: ScoutJet
Disallow: /

User-agent: sistrix
Disallow: /

User-agent: SearchmetricsBot
Disallow: /

User-agent: SEOkicks-Robot
Disallow: /

User-agent: Lipperhey Spider
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: NCBot
Disallow: /

User-agent: BacklinkCrawler
Disallow: /

User-agent: archive.org_bot
Disallow: /

User-agent: meanpathbot
Disallow: /

User-agent: PagesInventory
Disallow: /

User-agent: Aboundexbot
Disallow: /

User-agent: spbot
Disallow: /

User-agent: linkdexbot
Disallow: /

User-agent: Gigabot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: Nutch
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: Majestic-12
Disallow: /

User-agent: Majestic-SEO
Disallow: /

User-agent: DSearch
Disallow: /

User-agent: MJ12
Disallow: /

User-agent: BlekkoBot
Disallow: /

User-agent: NerdyBot
Disallow: /

User-agent: JamesBOT
Disallow: /

User-agent: TinEye
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: serpstat
Disallow: /

User-agent: spyfu
Disallow: /
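Since every entry in this list follows the same pattern, it can be generated rather than typed by hand. The sketch below builds the groups from a short excerpt of the agents above and then checks the result with Python's urllib.robotparser. The build_block_list function is our own helper, not a Webflow feature.

```python
from urllib.robotparser import RobotFileParser


def build_block_list(agents: list[str]) -> str:
    """Return robots.txt groups that block each listed agent site-wide."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Groups are separated by a blank line, as in the list above.
    return "\n\n".join(groups) + "\n"


# A short excerpt of the agents listed above.
AGENTS = ["AhrefsBot", "MJ12bot", "SemrushBot"]
robots_txt = build_block_list(AGENTS)
print(robots_txt)

# Sanity check: listed bots are blocked, unlisted ones stay allowed.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
ahrefs_allowed = parser.can_fetch("AhrefsBot", "/")
googlebot_allowed = parser.can_fetch("Googlebot", "/")
print(ahrefs_allowed, googlebot_allowed)
```

Robots that are not named in the file, such as Googlebot here, remain free to crawl. Keep in mind that robots.txt is only a request: well-behaved bots follow it, but nothing forces a bot to comply.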