
Robots.txt Generator




About Robots.txt Generator

Every website has a robots.txt file — or should have one. It is a plain text file that lives at the root of your domain (for example, digitalsub.pro/robots.txt) and contains instructions for search engine crawlers: which parts of your site they are allowed to access, and which they should skip.

It works on a simple opt-out basis. By default, search engine bots will attempt to crawl everything they can find on your site. A robots.txt file lets you redirect that effort — away from pages that waste crawl resources and toward pages that matter for your SEO.

The Robots Exclusion Protocol has been a de facto web convention since 1994 and was formalised as RFC 9309 in 2022. Every major search engine respects it, including Google, Bing, and DuckDuckGo, and so do most AI crawlers. That said, it is not a security tool. The file is public (anyone can read it), and malicious bots ignore it entirely. It is a directive for cooperating crawlers, not a block for hostile ones.

The DigitalSub Pro Robots.txt Generator lets you create a correctly formatted, error-free robots.txt file in under a minute, without needing to know the syntax or worry about a typo breaking your entire crawl configuration.

How to Use the Generator

  1. Select the search engine bots you want to allow or block from the options provided
  2. Enter any specific directories or file paths you want to disallow (e.g. /wp-admin/, /private/)
  3. Add your XML sitemap URL so crawlers know where to find your indexed pages
  4. Click Generate — the tool produces a ready-to-use robots.txt file
  5. Copy the output and upload it as a plain text file to the root of your domain

Once uploaded, you can verify the file is live by visiting yourdomain.com/robots.txt in your browser. In Google Search Console, the robots.txt report under Settings shows whether Google could fetch and parse the file; it replaced the older Robots.txt Tester.
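The steps above can be sketched in a few lines of Python. This is an illustration of what any robots.txt generator has to do, not the DigitalSub Pro tool's actual code; the function name `build_robots_txt` and its parameters are hypothetical.

```python
def build_robots_txt(disallow_paths, sitemap_url=None, blocked_agents=()):
    """Return robots.txt text (hypothetical helper, for illustration only)."""
    lines = []
    # Bots that should be kept out entirely get their own group first.
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    # One group for every other crawler.
    lines.append("User-agent: *")
    if disallow_paths:
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # An empty Disallow value means "block nothing".
        lines.append("Disallow:")
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(["/wp-admin/", "/private/"],
                       sitemap_url="https://example.com/sitemap.xml"))
```

Keeping all general rules in a single `User-agent: *` group is a deliberate choice here: some simpler parsers only read the first group that matches a bot.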

What the Directives Actually Mean

A robots.txt file is built from a small set of instructions. Understanding what each one does prevents the most common (and costly) mistakes.

Directive | What it does | Example
User-agent: | Specifies which bot the following rules apply to. * means all bots. | User-agent: *
Disallow: | Tells a bot not to crawl the specified path. Does not prevent indexing. | Disallow: /wp-admin/
Allow: | Explicitly permits a path that would otherwise be blocked by a broader Disallow rule. | Allow: /wp-admin/admin-ajax.php
Sitemap: | Points crawlers to your XML sitemap, which helps them discover your important pages. | Sitemap: https://example.com/sitemap.xml
Crawl-delay: | Asks a bot to wait N seconds between requests to reduce server load. Not part of RFC 9309 and ignored by Googlebot, which sets its own crawl rate; some other crawlers, such as Bingbot, honour it. | Crawl-delay: 10
# Comment | A line starting with # is a comment: ignored by crawlers, useful for your own notes. | # Block admin area
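To see how a crawler might read these directives, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The file contents and the example.com URLs are made up for the demonstration, and `site_maps()` needs Python 3.8 or later. Real crawlers use their own parsers, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block admin area
User-agent: *
Disallow: /wp-admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow: the admin path is off limits, everything else is crawlable.
print(parser.can_fetch("*", "https://example.com/wp-admin/options.php"))  # False
print(parser.can_fetch("*", "https://example.com/blog/"))                 # True

# Crawl-delay and Sitemap are exposed as parsed values.
print(parser.crawl_delay("*"))  # 10
print(parser.site_maps())       # ['https://example.com/sitemap.xml']
```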

A Standard Robots.txt File — What One Actually Looks Like

Here is a typical, correctly formatted robots.txt for a WordPress site. This is the kind of output the generator produces:

Generated robots.txt — WordPress example

# Rules for all crawlers, kept in a single User-agent group because
# some parsers only apply the first group that matches a bot.
User-agent: *
# Keep the AJAX endpoint reachable, then block WordPress admin and login
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
# Block search result pages (duplicate content risk)
Disallow: /?s=
Disallow: /search/
# Block low-value utility pages
Disallow: /feed/
Disallow: /trackback/
Disallow: /xmlrpc.php

# Sitemap location
Sitemap: https://yoursite.com/sitemap.xml

Notice what is not blocked here: your CSS files, JavaScript files, images, theme assets, or any page you want to rank. Blocking those causes rendering problems that hurt your search visibility — a common mistake covered in detail below.
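The Allow line for admin-ajax.php in the example works because crawlers that follow RFC 9309 apply the most specific (longest) matching rule, with Allow winning a tie. The sketch below implements that rule for a single group, including the `*` wildcard and the `$` end anchor. The function names are hypothetical. Note that Python's built-in `urllib.robotparser` applies rules in file order instead, which is why this sketch uses its own matcher.

```python
import re


def _to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """rules: list of (kind, pattern) pairs, kind is 'allow' or 'disallow'."""
    best_len, best_allow = -1, True  # no matching rule means allowed
    for kind, pattern in rules:
        if not pattern:  # an empty value matches nothing
            continue
        if _to_regex(pattern).match(path):
            allow = kind == "allow"
            # Longest pattern wins; on equal length, Allow wins.
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


rules = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-login.php"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/wp-admin/options.php"))     # False
print(is_allowed(rules, "/blog/hello-world/"))        # True
```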

The One Distinction That Trips Almost Everyone Up

Disallow blocks crawling. It does not block indexing.

This is the single most misunderstood aspect of robots.txt, and the source of the most damaging mistakes. If you add a page to robots.txt with a Disallow directive, you stop Google from visiting and reading that page. But if other websites link to that page, Google can still discover its URL through those links and add it to its index — it just will not know what is on the page.

Google's own documentation and Search Advocate John Mueller have confirmed this: a Disallow directive alone is not a reliable way to remove a page from search results. If another site links to a disallowed URL, that URL can appear in results with a message like "No information is available for this page."

The correct tools for each job:

  • Use Disallow in robots.txt when you want to save crawl budget — to stop bots spending time on pages that have no SEO value, like admin dashboards, internal search results, or staging content
  • Use <meta name="robots" content="noindex"> when you want a page removed from search results — but crucially, you must allow crawling so Google can read and obey the noindex tag
  • Never use both on the same page at the same time: if you block a page in robots.txt, Google cannot access the noindex tag on it, so the noindex tag is invisible and useless
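The third point is easy to check mechanically. The sketch below assumes you already have a list of URLs that carry a noindex tag (how you collect that list is up to you) and flags the ones your robots.txt blocks. The function name and the example URLs are hypothetical, and the check uses Python's standard-library parser, which only does simple prefix matching.

```python
from urllib.robotparser import RobotFileParser


def find_noindex_conflicts(robots_txt, noindex_urls, agent="Googlebot"):
    """Return URLs that carry noindex but are blocked from crawling."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(agent, url)]


robots_txt = "User-agent: *\nDisallow: /private/\n"
noindex_urls = [
    "https://example.com/private/old-offer.html",  # blocked: the tag is never read
    "https://example.com/thank-you.html",          # crawlable: the tag can work
]
print(find_noindex_conflicts(robots_txt, noindex_urls))
# ['https://example.com/private/old-offer.html']
```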

What You Should and Should Not Block

Usually safe to Disallow | Never Disallow these
Admin and login pages (/wp-admin/, /admin/) | Your homepage or any page you want to rank
Internal search result pages (/?s=, /search/) | CSS files (/wp-content/themes/, /assets/)
Staging or development directories (/staging/, /dev/) | JavaScript files (Google needs these to render your pages correctly)
Duplicate parameter URLs (?sort=, ?filter=, ?ref=) | Image or media directories you want to appear in Google Images
Private member-only content | Pages where you need noindex to work (block and noindex cancel each other out)
Utility files (/xmlrpc.php, /wp-login.php, /feed/) | The entire site (Disallow: /), which blocks everything including your content

Blocking CSS or JavaScript prevents Google from rendering your pages the way visitors see them, which can hurt how your content is understood and ranked. Source: Google Search Central documentation.

Why the File Location Matters

Your robots.txt file must be placed at the root of your domain — not in a subdirectory, not at a subdomain (unless intended for that subdomain specifically). The file must be accessible at exactly:

https://yourdomain.com/robots.txt

If it is placed anywhere else, crawlers will not find it and will ignore it entirely, defaulting to unrestricted access. After uploading, visit that URL in your browser to confirm it is accessible. If you see the plain text of the file, it is correctly placed.
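Because the location is fixed, the robots.txt URL for any page can be derived from the page's own URL. A minimal sketch, with a hypothetical helper name, showing that each host (including each subdomain) has its own file:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that governs the given page."""
    parts = urlsplit(page_url)
    # Same scheme and host, fixed path, no query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?ref=home"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart"))
# https://shop.yourdomain.com/robots.txt
```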

One additional syntax note that trips up even experienced developers: the paths in robots.txt rules are case-sensitive. Disallow: /Admin/ and Disallow: /admin/ are two different rules, and each matches only URLs with that exact casing. This holds whatever operating system your server runs, so the path in each rule must match the casing of the URLs your site actually serves.
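A quick way to convince yourself of the casing behaviour is to run both variants through a parser. This sketch uses Python's standard-library `urllib.robotparser` with made-up example.com URLs:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /Admin/"])

# Only the URL whose casing matches the rule is blocked.
print(parser.can_fetch("*", "https://example.com/Admin/settings"))  # False
print(parser.can_fetch("*", "https://example.com/admin/settings"))  # True
```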

Crawl Budget — Why It Matters for Larger Sites

For most small blogs and brochure sites (under 500 pages), crawl budget is not a concern — Google will crawl your entire site regularly regardless. But for larger sites — eCommerce stores with thousands of product variants, news sites with deep archives, or SaaS platforms with user-generated content — it becomes genuinely important.

Crawl budget is roughly the number of pages Googlebot will crawl on your site over a given period. When that budget gets used up on low-value pages (filtered product listings, duplicate parameter URLs, printer-friendly versions, internal search results), fewer of your genuinely important pages get crawled and re-indexed promptly. This delays the ranking benefits of content updates and new page launches.

A well-configured robots.txt file that blocks those low-value URLs redirects crawl capacity toward the pages that actually drive traffic. For a large eCommerce site, this can mean the difference between product pages being re-indexed within hours of a price update and those updates taking days to appear in search results.
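To get a rough sense of how much crawl activity goes to parameter URLs, you could run a list of crawled URLs (for example, from your server logs) through something like the sketch below. The parameter names come from the sort, filter, and ref examples used on this page; the function name and sample URLs are hypothetical, and the result is only a rough indicator, not a measurement of Google's crawl budget.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters assumed to create duplicate, low-value URLs.
LOW_VALUE_PARAMS = {"sort", "filter", "ref"}


def low_value_share(urls):
    """Return (share, flagged_urls) for URLs carrying a low-value parameter."""
    flagged = []
    for url in urls:
        params = parse_qs(urlsplit(url).query, keep_blank_values=True)
        if LOW_VALUE_PARAMS.intersection(params):
            flagged.append(url)
    share = len(flagged) / len(urls) if urls else 0.0
    return share, flagged


crawled = [
    "https://shop.example.com/shoes",
    "https://shop.example.com/shoes?sort=price",
    "https://shop.example.com/shoes?filter=red&page=2",
    "https://shop.example.com/shoes?page=2",
]
share, flagged = low_value_share(crawled)
print(f"{share:.0%} of crawled URLs are parameter duplicates")
```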

An Honest Limitation Worth Knowing

Robots.txt is a request, not a lock. Well-behaved bots — Googlebot, Bingbot, and most legitimate crawlers — will respect your Disallow directives. Poorly-behaved bots, scrapers, and malicious crawlers will ignore them completely. If you need to actually block access to content for security reasons, robots.txt is the wrong tool. Use server-level access controls, password protection, or firewall rules instead.

Similarly, robots.txt does not protect your content from being seen or copied by humans who know the URL directly. It only limits automated crawler access — which is the right scope for an SEO tool, but worth being clear-eyed about before assuming it provides any security.

What to Do After Generating Your File

  1. Upload it to your domain root via FTP, your hosting file manager, or your CMS. In WordPress, Yoast SEO and Rank Math both allow you to edit robots.txt directly from the plugin settings without FTP access.
  2. Verify it is live by visiting yourdomain.com/robots.txt in your browser. You should see the plain text of the file.
  3. Check it in Google Search Console: go to Settings → robots.txt to open the robots.txt report, which shows whether Google fetched the file and flags any parsing problems. To check whether a specific URL is blocked, run it through the URL Inspection tool.
  4. Submit your sitemap — if you have not already, generate your XML sitemap using the XML Sitemap Generator and reference it in your robots.txt Sitemap: directive. This directly helps crawlers discover your content.
  5. Check your index coverage after any robots.txt change using the Google Index Checker to confirm that pages you want indexed are still being found and indexed correctly.

Tools to Use Alongside This One

  • XML Sitemap Generator
  • Google Index Checker
  • Meta Tags Analyzer
  • Keyword Position Checker
  • Broken Links Finder
  • Redirect Checker

Frequently Asked Questions

Does my website need a robots.txt file?

Technically, no — a missing robots.txt file does not harm your SEO. When Google finds no robots.txt at your domain root, it defaults to crawling everything. However, having a correctly configured file is still good practice for two reasons: it lets you add your sitemap location so crawlers can find it immediately, and it lets you block genuinely low-value URLs (admin pages, internal search results, duplicate parameter pages) from consuming crawl capacity. For small sites, the benefit is minimal. For larger sites, it becomes meaningfully important.

Can I block a specific page from Google using robots.txt?

You can stop Google from crawling it, but not reliably from indexing it. If another website links to that URL, Google can still discover and index the URL without ever visiting the page. The reliable way to remove a page from Google's index is to add <meta name="robots" content="noindex"> to the page's HTML head section — while keeping the page allowed in robots.txt so Google can crawl and read that noindex tag. Using Disallow and noindex on the same page at the same time makes the noindex invisible and ineffective.

I accidentally blocked my whole site. How do I fix it?

The culprit is almost always a line that reads Disallow: / under User-agent: *. That single rule tells every bot to skip your entire site. Fix it immediately by deleting the Disallow line, replacing it with Disallow: (an empty value, which means disallow nothing), or uploading a corrected file. After fixing, go to Google Search Console, use the URL Inspection tool on your homepage, and request indexing. Google typically re-crawls within a few hours to a few days after the block is removed. Check the Page indexing report (formerly Index Coverage) to confirm pages are returning to the index.
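A check like the following sketch can catch this mistake before a file goes live. It asks Python's standard-library parser whether the root path is crawlable; the function name is hypothetical, and the parser only approximates how real crawlers evaluate rules.

```python
from urllib.robotparser import RobotFileParser


def root_is_blocked(robots_txt, agent="*"):
    """Return True if the given robots.txt text blocks the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder; only the path "/" matters here.
    return not parser.can_fetch(agent, "https://example.com/")


print(root_is_blocked("User-agent: *\nDisallow: /\n"))           # True
print(root_is_blocked("User-agent: *\nDisallow:\n"))             # False
print(root_is_blocked("User-agent: *\nDisallow: /wp-admin/\n"))  # False
```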

Should I block Bing, DuckDuckGo, and other search engines?

In almost every case, no. Each additional search engine represents additional organic traffic potential. DuckDuckGo has a smaller but loyal user base, and Bing powers search results for Microsoft Edge, Yahoo, and several AI assistants. Blocking them reduces your potential traffic without any offsetting benefit for most sites. There are only two situations where blocking a specific bot makes sense: you are running a private or internal site that should not appear in any search engine, or you want to opt out of a specific AI training crawler (like GPTBot or CCBot) for content ownership reasons. Use specific User-agent: blocks for those cases rather than a blanket block.
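As an illustration of a specific User-agent: block, the sketch below defines one group for GPTBot and CCBot and a separate group for everyone else, then checks the result with Python's standard-library parser. The file contents and example.com URL are made up for the demonstration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The two named crawlers are blocked; search engine bots fall through
# to the general group and can still reach public pages.
for agent in ("GPTBot", "CCBot", "Googlebot", "Bingbot"):
    print(agent, parser.can_fetch(agent, "https://example.com/blog/"))
```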

Why does my robots.txt file not affect some bots?

Because robots.txt is a voluntary protocol. Responsible crawlers like Googlebot, Bingbot, and most legitimate scrapers respect it by convention. Malicious bots — scrapers harvesting email addresses, content theft bots, vulnerability scanners — ignore it completely. If you see unexpected bot traffic despite having Disallow rules, the bots are simply not reading the file. Blocking those requires server-level tools: IP blocking, firewall rules, rate limiting, or a security service like Cloudflare. Robots.txt is not the right layer for that kind of protection.

How often should I update my robots.txt file?

There is no set schedule. Update it when your site structure changes in a way that introduces new directories worth blocking, or when you launch content you want to restrict. Common triggers: adding a staging environment you do not want indexed, launching a new faceted navigation that generates thousands of duplicate parameter URLs, or restructuring your admin area. After any update, check the robots.txt report in Google Search Console to confirm the new file was fetched without errors, then watch the Crawl stats and Page indexing reports for the following week to confirm nothing important was accidentally blocked.

What is the difference between robots.txt and an XML sitemap?

They serve opposite functions. Robots.txt tells crawlers what to skip. An XML sitemap tells crawlers what to prioritise and crawl. They complement each other: your robots.txt blocks low-value paths and points to your sitemap, while your sitemap lists all the high-value URLs you want discovered and indexed. Using both together gives search engines a clear picture of your site's structure — here is what to crawl, and here is where to focus. Generate your sitemap using the XML Sitemap Generator and add its URL to the Sitemap directive in your robots.txt file.

Is the Robots.txt Generator free to use?

Yes — completely free, no account needed, no usage limits. Generate as many files as you need for any number of domains. All 47 tools on DigitalSub Pro work the same way.

Where to upload robots.txt on common platforms

WordPress

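Edit via Yoast SEO (Yoast SEO → Tools → File editor) or Rank Math (Rank Math → General Settings → Edit robots.txt); menu names can vary between plugin versions. Or upload manually to your site's web root (often /public_html/) via FTP or file manager.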

Shopify

Shopify auto-generates a robots.txt file. You can customise it by adding a robots.txt.liquid template to your theme's templates folder.

Wix

Go to Marketing & SEO → SEO Tools → Robots.txt. Wix provides a built-in editor for managing your directives.

Static / HTML sites

Save the generated content as a plain text file named exactly robots.txt (check that your editor has not added a second extension, such as robots.txt.txt). Upload it to your root directory via FTP or cPanel File Manager.