I put this post together for SEO beginners who are not yet comfortable with robots.txt. OK, let's get into the topic and learn how to block pages, images, directories and a whole website from being crawled by search engine bots.
A robots.txt file is used to restrict search engine bots that crawl your website from accessing your entire site or specific pages, images and directories.
Use the User-agent line to name the bot a rule applies to, and the Disallow line to specify which URL, folder, image path or directory to block.
User-agent: * (This applies to all bots)
User-agent: Googlebot (This applies to Google Search Engine Bot)
User-agent: Bingbot (This applies to Bing Search Engine Bot)
To block the entire website
User-agent: *
Disallow: /
To allow crawling of the whole website
User-agent: *
Disallow:
(OR)
User-agent: *
Allow: /
To block a specific folder
User-agent: *
Disallow: /foldername/
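Before uploading a rule like this, it can help to check it locally. Here is a minimal sketch using Python's standard urllib.robotparser module; the helper name can_crawl and the sample URLs are made up for illustration, and other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The folder rule from this post; the URLs below are illustrative only.
RULES = """\
User-agent: *
Disallow: /foldername/
"""

folder_ok = can_crawl(RULES, "Googlebot", "http://www.domainname.com/foldername/page.html")
other_ok = can_crawl(RULES, "Googlebot", "http://www.domainname.com/other/page.html")
print(folder_ok, other_ok)
```

The first check covers a URL inside the blocked folder, the second a URL outside it.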
To block a folder from Googlebot but allow crawling of a specific file in that folder
User-agent: Googlebot
Disallow: /foldername1/
Allow: /foldername1/specificfilename.html
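This works because Google documents that the most specific (longest) matching rule wins, with Allow winning a tie. The sketch below illustrates that precedence for plain path prefixes only; the function name and the rule list format are assumptions made for this example, not part of any library.

```python
def is_allowed_longest_match(rules, path):
    """Return True if path may be crawled under longest-match precedence.

    rules is a list of (directive, prefix) pairs,
    e.g. ("Disallow", "/foldername1/"). Wildcards are not handled here.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed

rules = [
    ("Disallow", "/foldername1/"),
    ("Allow", "/foldername1/specificfilename.html"),
]
print(is_allowed_longest_match(rules, "/foldername1/specificfilename.html"))
print(is_allowed_longest_match(rules, "/foldername1/other.html"))
```

Note that not every parser follows this precedence. Some apply rules in file order instead, so placing the Allow line before the Disallow line is the safer ordering.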
To block a directory
User-agent: *
Disallow: /directoryname/
To block a page
User-agent: *
Disallow: /page-name.html
To block a specific image from Google Images
User-agent: Googlebot-Image
Disallow: /images/imagename.jpg
To block all images from Google Images
User-agent: Googlebot-Image
Disallow: /
To block a specific image file type
User-agent: Googlebot-Image
Disallow: /*.jpg$
To block all URLs that include a “?” (Question Mark)
User-agent: *
Disallow: /*?
To block URLs that end with .html
User-agent: *
Disallow: /*.html$
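In these patterns, "*" stands for any run of characters and a trailing "$" marks the end of the URL. A rough way to see how the patterns behave is to translate them into regular expressions, as in the sketch below. The helper names are hypothetical, and Python's built-in urllib.robotparser does not interpret these wildcards, so a small hand-written matcher is used here.

```python
import re

def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern: str, path: str) -> bool:
    """True if the pattern matches the path from its first character."""
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/*.html$", "/page-name.html"))
print(matches("/*.html$", "/page-name.html?x=1"))
print(matches("/*?", "/search?q=seo"))
```

The second check shows why the "$" matters: without it, a .html URL carrying a query string would also be blocked.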
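To block a page from indexing using a meta tag, add one of the following to the page's <head> section
<meta name="robots" content="noindex">
(Or)
<meta name="googlebot" content="noindex">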
Adding a Sitemap to the Robots.txt File:
Use the syntax below to add a sitemap URL to the robots.txt file
Sitemap: http://www.domainname.com/sitemap.xml
Sitemap: http://www.domainname.com/image-sitemap.xml
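To confirm that the Sitemap lines are picked up, one option is to read them back with Python's standard urllib.robotparser module. This is only a sketch: the site_maps() method needs Python 3.8 or later, and the robots.txt text here is assembled from the examples in this post.

```python
from urllib.robotparser import RobotFileParser

# Sample file: allow everything and list two sitemaps.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.domainname.com/sitemap.xml
Sitemap: http://www.domainname.com/image-sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the listed URLs, or None when there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```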
SEOBOOK Robots.txt File Generator
MCANERIN Robots.txt Code Generation Tool