Crawling and indexing are two different concepts, and we really need to understand the difference. We know how to look at robots.txt, but there is a common misconception about it. Most people think robots.txt is for "noindex" or "doindex". Please understand that robots.txt is not used for indexing purposes.
It is mainly used to tell Googlebot or any other search engine bot which parts of the website it may crawl. That can be images, text or any multimedia we use on a web page. Apart from this, there are also sensitive areas, especially in ecommerce businesses, where people enter their credit or debit card information for transactions. To keep Googlebot and other bots from crawling those parts of the site, we recommend that every website use a robots.txt file. Keep in mind that the file is publicly readable and only well-behaved bots follow it, so it is a crawling instruction and not a way to secure data.
How to check whether robots.txt is available for any website
The robots.txt file resides at the root of the domain.
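Because the file always sits at the root, its address can be worked out from any page URL on the site. Here is a minimal Python sketch of that idea, using only the standard library. The function names `robots_url` and `has_robots_txt` are made up for this example, and it assumes `https` when no scheme is given.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    # Assumption: a bare host such as "www.xyz.com" is reachable over https.
    if "://" not in page_url:
        page_url = "https://" + page_url
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(page_url: str, timeout: float = 10.0) -> bool:
    """Return True if the site answers the robots.txt URL with HTTP 200."""
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers HTTP errors such as 404, network failures and bad URLs.
        return False
```

Calling `has_robots_txt("www.xyz.com")` makes a real network request, so the result depends on the site you point it at.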
- So, we can check any domain's robots.txt file at www.xyz.com/robots.txt.
How to implement robots.txt on our website
- If you are a WordPress user, you can use various plugins to configure your robots.txt.
- If you are not a WordPress user, you can simply use the cPanel of your domain and add the robots.txt file in the same place as .htaccess, which is normally the root folder of the site.
Basic Structure of robots.txt
User-agent: *
Disallow: /wp-admin
Disallow: /track/
These rules ask all robots not to crawl your admin folder or anything under /track/. More Disallow lines can be added in the same way for other paths, such as the comment feed and comments.
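To see how a crawler would read these rules, you can feed them to Python's built-in `urllib.robotparser`. This is only a rough sketch: the bot name `ExampleBot` and the sample paths are made up for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the basic structure above.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /track/
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given (hypothetical) bot may crawl the path."""
    return parser.can_fetch(agent, path)


parser = build_parser(RULES)

# Hypothetical paths, chosen only to exercise the two Disallow rules.
for path in ("/wp-admin/options.php", "/track/click", "/blog/hello-world"):
    status = "allowed" if is_allowed(parser, path) else "blocked"
    print(path, "->", status)
```

Note that this only tells you whether a path may be crawled. It says nothing about whether the page is indexed.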
Summary:
The robots.txt file is mainly used to stop search engine bots from crawling particular parts of your website. If you want to deindex pages of your website that are already indexed, I would recommend requesting removal of those URLs from Google instead.