{"id":24,"date":"2018-04-13T11:11:56","date_gmt":"2018-04-13T11:11:56","guid":{"rendered":"http:\/\/www.webtrainingindia.com\/blog\/?p=24"},"modified":"2022-05-02T18:22:10","modified_gmt":"2022-05-02T12:52:10","slug":"24","status":"publish","type":"post","link":"https:\/\/www.webtrainingindia.com\/blog\/24\/","title":{"rendered":"Robots.txt"},"content":{"rendered":"<p>Robots.txt is a plain-text file that webmasters create to tell web robots how to crawl the pages of a website. Search engines use these robots to crawl websites, check whether sites follow the standards, index page content, and store data in their databases. The robots.txt file is part of the Robots Exclusion Protocol (REP), a group of web standards that instruct robots how to crawl the web. Search engines and other services give their robots various names, such as:<\/p>\n<ol>\n<li>GoogleBot by Google<\/li>\n<li>Baidu Spider by Baidu<\/li>\n<li>MSNBot\/BingBot by Bing<\/li>\n<li>YandexBot by Yandex<\/li>\n<li>Soso Spider by Soso<\/li>\n<li>ExaBot by 3ds<\/li>\n<li>Sogou Spider by Sogou<\/li>\n<li>Google Plus Share by Google<\/li>\n<li>Facebook External Hit by Facebook<\/li>\n<li>Google Feedfetcher by Google<\/li>\n<\/ol>\n<p>The crawl directives \u201callow\u201d or \u201cdisallow\u201d bots from reading all sections or particular sections of a website. They help webmasters turn away bad bots, which are generally spammers, email crawlers, and similar programs used to steal information and spam sites, although the file is advisory and only compliant bots obey it.<\/p>\n<p><strong>Basic format:<\/strong><\/p>\n<p>User-agent: [user-agent name]<br \/>\nDisallow: [URL string not to be crawled]<\/p>\n<p>Together, these two lines are considered a complete robots.txt file. 
A robots.txt file can contain multiple lines of user agents and their directives.<\/p>\n<p><strong>Blocking web crawlers from all the content of a website:<\/strong><\/p>\n<p>User-agent: *<br \/>\nDisallow: \/<\/p>\n<p><strong>Allowing web crawlers to access all the content:<\/strong><\/p>\n<p>User-agent: *<br \/>\nDisallow:<\/p>\n<p><strong>Blocking crawlers from particular content:<\/strong><\/p>\n<p>User-agent: *<br \/>\nDisallow: \/example-subfolder\/<\/p>\n<p><strong>Blocking a specific bot from crawling the website:<\/strong><\/p>\n<p>User-agent: Badbot<br \/>\nDisallow: \/<\/p>\n<p><strong>Blocking crawlers from visiting a specific page of a website:<\/strong><\/p>\n<p>User-agent: *<br \/>\nDisallow: \/example-subfolder\/blocked-page.html<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It is a txt file created for robots by the webmasters to direct web robots how to crawl through web pages of a website. These robots are used by search engines to crawl on websites, search engines use their robots to crawl through websites as to seek whether sites follow the standards and also to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":25,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-24","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-newsandupdates"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.5.1 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Robots.txt, Web Crawler, Standards to crawl a web page<\/title>\n<meta name=\"description\" content=\"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. 
These robots are called out by various names by their search engines.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.webtrainingindia.com\/blog\/24\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Robots.txt, Web Crawler, Standards to crawl a web page\" \/>\n<meta property=\"og:description\" content=\"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. These robots are called out by various names by their search engines.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.webtrainingindia.com\/blog\/24\/\" \/>\n<meta property=\"og:site_name\" content=\"Web Training India\" \/>\n<meta property=\"article:published_time\" content=\"2018-04-13T11:11:56+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-05-02T12:52:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.webtrainingindia.com\/blog\/wp-content\/uploads\/2018\/04\/robots.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/24\/\",\"url\":\"https:\/\/www.webtrainingindia.com\/blog\/24\/\",\"name\":\"Robots.txt, Web Crawler, Standards to crawl a web page\",\"isPartOf\":{\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/#website\"},\"datePublished\":\"2018-04-13T11:11:56+00:00\",\"dateModified\":\"2022-05-02T12:52:10+00:00\",\"author\":{\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/11bc8eab55d0448c1ddb1abeb075dded\"},\"description\":\"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. These robots are called out by various names by their search engines.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/24\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.webtrainingindia.com\/blog\/24\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/24\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.webtrainingindia.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Robots.txt\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/#website\",\"url\":\"https:\/\/www.webtrainingindia.com\/blog\/\",\"name\":\"Web Training India\",\"description\":\"Web Training India - A Learning Destination\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.webtrainingindia.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/11bc8eab55d0448c1ddb1abeb075dded\",\"name\":\"admin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/11788b794fbcf70c821a71d41ab3accb9e488988c2a2e9a4a286798fda96328c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/11788b794fbcf70c821a71d41ab3accb9e488988c2a2e9a4a286798fda96328c?s=96&d=mm&r=g\",\"caption\":\"admin\"},\"url\":\"https:\/\/www.webtrainingindia.com\/blog\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Robots.txt, Web Crawler, Standards to crawl a web page","description":"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. These robots are called out by various names by their search engines.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.webtrainingindia.com\/blog\/24\/","og_locale":"en_US","og_type":"article","og_title":"Robots.txt, Web Crawler, Standards to crawl a web page","og_description":"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. 
These robots are called out by various names by their search engines.","og_url":"https:\/\/www.webtrainingindia.com\/blog\/24\/","og_site_name":"Web Training India","article_published_time":"2018-04-13T11:11:56+00:00","article_modified_time":"2022-05-02T12:52:10+00:00","og_image":[{"width":800,"height":600,"url":"https:\/\/www.webtrainingindia.com\/blog\/wp-content\/uploads\/2018\/04\/robots.jpg","type":"image\/jpeg"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.webtrainingindia.com\/blog\/24\/","url":"https:\/\/www.webtrainingindia.com\/blog\/24\/","name":"Robots.txt, Web Crawler, Standards to crawl a web page","isPartOf":{"@id":"https:\/\/www.webtrainingindia.com\/blog\/#website"},"datePublished":"2018-04-13T11:11:56+00:00","dateModified":"2022-05-02T12:52:10+00:00","author":{"@id":"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/11bc8eab55d0448c1ddb1abeb075dded"},"description":"Robots.txt file is a part of Robot Exclusion Protocol (REP), a group of web standards which instruct robots how to crawl through the web. 
These robots are called out by various names by their search engines.","breadcrumb":{"@id":"https:\/\/www.webtrainingindia.com\/blog\/24\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.webtrainingindia.com\/blog\/24\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.webtrainingindia.com\/blog\/24\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.webtrainingindia.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Robots.txt"}]},{"@type":"WebSite","@id":"https:\/\/www.webtrainingindia.com\/blog\/#website","url":"https:\/\/www.webtrainingindia.com\/blog\/","name":"Web Training India","description":"Web Training India - A Learning Destination","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.webtrainingindia.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/11bc8eab55d0448c1ddb1abeb075dded","name":"admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.webtrainingindia.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/11788b794fbcf70c821a71d41ab3accb9e488988c2a2e9a4a286798fda96328c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/11788b794fbcf70c821a71d41ab3accb9e488988c2a2e9a4a286798fda96328c?s=96&d=mm&r=g","caption":"admin"},"url":"https:\/\/www.webtrainingindia.com\/blog\/author\/admin\/"}]}},"jetpack_featured_media_url":"https:\/\/www.webtrainingindia.com\/blog\/wp-content\/uploads\/2018\/04\/robots.jpg","_links":{"self":[{"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/posts\/24","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webtrainingindia.com\/blog\/
wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/comments?post=24"}],"version-history":[{"count":5,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/posts\/24\/revisions"}],"predecessor-version":[{"id":1372,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/posts\/24\/revisions\/1372"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/media\/25"}],"wp:attachment":[{"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/media?parent=24"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/categories?post=24"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webtrainingindia.com\/blog\/wp-json\/wp\/v2\/tags?post=24"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}