What is the robots.txt file in SEO? How does it work?

A robots.txt file is a plain-text (ASCII-encoded) file created by web developers to instruct search engine robots on how to crawl pages on their website.

The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links, such as “follow” or “nofollow” tags.

In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website.

A basic example:

User-agent: *
Disallow: /admin
Disallow: /user/signup
Disallow: /privacy-policy

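To make the effect of these rules concrete, here is a minimal sketch using Python's standard-library urllib.robotparser module to check whether a crawler may fetch a given URL. The https://example.com domain and the /blog/post-1 path are hypothetical, used only for illustration.

from urllib.robotparser import RobotFileParser

# The same rules as the example above; example.com is a placeholder domain.
rules = """\
User-agent: *
Disallow: /admin
Disallow: /user/signup
Disallow: /privacy-policy
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler asks this question before requesting any URL.
print(parser.can_fetch("*", "https://example.com/admin"))        # False
print(parser.can_fetch("*", "https://example.com/blog/post-1"))  # True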

Specifying a crawl agent:

Here we disallow crawling for all robots, but make an exception for Twitterbot, Twitter's fetcher used to cache link previews.

For Twitterbot we further specify which directories may be crawled, in this case assets such as images and archives.

User-agent: *
Disallow: /

User-agent: Twitterbot
Allow: /images
Allow: /archives
Disallow: /

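As a quick check that the agent-specific group behaves as described, the same standard-library parser can be pointed at these rules; again, example.com is only a placeholder domain.

from urllib.robotparser import RobotFileParser

# The agent-specific rules from above; example.com is a placeholder domain.
rules = """\
User-agent: *
Disallow: /

User-agent: Twitterbot
Allow: /images
Allow: /archives
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Twitterbot matches its own group; every other crawler falls back to the * group.
print(parser.can_fetch("Twitterbot", "https://example.com/images/logo.png"))  # True
print(parser.can_fetch("Twitterbot", "https://example.com/admin"))            # False
print(parser.can_fetch("Googlebot", "https://example.com/images/logo.png"))   # False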

If you mistakenly include URLs that robots.txt disallows in the sitemap.xml you submit to Google Search Console or Bing Webmaster Tools, those URLs will not be crawled: the robots.txt rules take precedence, and the consoles will typically flag them as blocked. Note that a disallow rule only blocks crawling; it does not by itself guarantee that an already indexed page will be removed from the index.


