What is the robots.txt file in SEO? How does it work?

A robots.txt file is a plain-text (ASCII-encoded) file created by web developers to instruct search engine robots on how to crawl pages on their website.

The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives like meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links, such as “follow” or “nofollow” tags.

In practice, robots.txt files indicate whether certain user agents (web-crawling software) can or cannot crawl parts of a website.

A basic example:

User-agent: *
Disallow: /admin
Disallow: /user/signup
Disallow: /privacy-policy

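To make the effect of these rules concrete, here is a minimal sketch using Python's standard-library urllib.robotparser module to check whether a crawler may fetch a given URL. The https://example.com domain and the /blog/post-1 path are hypothetical, used only for illustration.

from urllib.robotparser import RobotFileParser

# The same rules as the example above; example.com is a placeholder domain.
rules = """\
User-agent: *
Disallow: /admin
Disallow: /user/signup
Disallow: /privacy-policy
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler asks this question before requesting any URL.
print(parser.can_fetch("*", "https://example.com/admin"))        # False
print(parser.can_fetch("*", "https://example.com/blog/post-1"))  # True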

Specifying a crawl agent:

Here we disallow crawling for all robots, but make an exception for Twitterbot, Twitter's fetcher used to cache link previews.

For Twitterbot we further specify which directories may be crawled, in this case assets such as images and archives.

User-agent: *
Disallow: /

User-agent: Twitterbot
Allow: /images
Allow: /archives
Disallow: /

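As a quick check that the agent-specific group behaves as described, the same standard-library parser can be pointed at these rules; again, example.com is only a placeholder domain.

from urllib.robotparser import RobotFileParser

# The agent-specific rules from above; example.com is a placeholder domain.
rules = """\
User-agent: *
Disallow: /

User-agent: Twitterbot
Allow: /images
Allow: /archives
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Twitterbot matches its own group; every other crawler falls back to the * group.
print(parser.can_fetch("Twitterbot", "https://example.com/images/logo.png"))  # True
print(parser.can_fetch("Twitterbot", "https://example.com/admin"))            # False
print(parser.can_fetch("Googlebot", "https://example.com/images/logo.png"))   # False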

If you mistakenly include URLs that robots.txt disallows in the sitemap.xml you submit to Google Search Console or Bing Webmaster Tools, those URLs will not be crawled: the robots.txt rules take precedence, and the consoles will typically flag them as blocked. Note that a disallow rule only blocks crawling; it does not by itself guarantee that an already indexed page will be removed from the index.


