Robots.txt Generator
About Robots.txt Generator
Every website has a robots.txt file — or should have one. It is a plain text file that lives at the root of your domain (for example, digitalsub.pro/robots.txt) and contains instructions for search engine crawlers: which parts of your site they are allowed to access, and which they should skip.
It works on a simple opt-out basis. By default, search engine bots will attempt to crawl everything they can find on your site. A robots.txt file lets you redirect that effort — away from pages that waste crawl resources and toward pages that matter for your SEO.
The file has been a de facto web standard since 1994 and was formalised as RFC 9309 in 2022. Every major search engine respects it, including Google, Bing, DuckDuckGo, and most AI crawlers. That said, it is not a security tool. The file is public (anyone can read it), and malicious bots ignore it entirely. It is a directive for cooperating crawlers, not a block for hostile ones.
The DigitalSub Pro Robots.txt Generator lets you create a correctly formatted, error-free robots.txt file in under a minute, without needing to know the syntax or worry about a typo breaking your entire crawl configuration.
How to Use the Generator
- Select the search engine bots you want to allow or block from the options provided
- Enter any specific directories or file paths you want to disallow (e.g. /wp-admin/, /private/)
- Add your XML sitemap URL so crawlers know where to find the pages you want indexed
- Click Generate — the tool produces a ready-to-use robots.txt file
- Copy the output and upload it as a plain text file to the root of your domain
Once uploaded, you can verify the file is working by visiting yourdomain.com/robots.txt in your browser. Then check the robots.txt report in Google Search Console under Settings, which shows whether Google could fetch the file and flags any lines it could not parse.
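The steps above mostly come down to assembling a few lines of plain text. The sketch below illustrates that idea only and is not the DigitalSub Pro tool's actual code. `build_robots_txt` is a hypothetical helper that takes disallowed paths, optional Allow exceptions and a sitemap URL, and emits a single group of rules.

```python
def build_robots_txt(disallow, allow=(), sitemap_url=None, user_agent="*"):
    """Assemble a one-group robots.txt file (hypothetical helper, for illustration only)."""
    if sitemap_url and not sitemap_url.startswith(("http://", "https://")):
        # The Sitemap directive expects a full, absolute URL, not a relative path.
        raise ValueError("sitemap_url must be an absolute URL")
    lines = [f"User-agent: {user_agent}"]
    # Allow exceptions go first: RFC 9309 parsers pick the most specific match
    # regardless of order, but some simpler parsers stop at the first match.
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if not allow and not disallow:
        # An empty Disallow value means nothing is blocked.
        lines.append("Disallow:")
    if sitemap_url:
        # The Sitemap line is independent of user-agent groups.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    disallow=["/wp-admin/", "/private/"],
    allow=["/wp-admin/admin-ajax.php"],
    sitemap_url="https://example.com/sitemap.xml",
))
```

Putting Allow exceptions before the broader Disallow rules costs nothing with standards-compliant crawlers. It also keeps the file correct for parsers that read rules top-down.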
What the Directives Actually Mean
A robots.txt file is built from a small set of instructions. Understanding what each one does prevents the most common (and costly) mistakes.
| Directive | What it does | Example |
|---|---|---|
| User-agent: | Specifies which bot the following rules apply to. * means all bots. | User-agent: * |
| Disallow: | Tells a bot not to crawl the specified path. Does not prevent indexing. An empty value blocks nothing. | Disallow: /wp-admin/ |
| Allow: | Explicitly permits a path that would otherwise be blocked by a broader Disallow rule. | Allow: /wp-admin/admin-ajax.php |
| Sitemap: | Points crawlers to your XML sitemap using a full, absolute URL. Helps crawlers discover the pages you want indexed. | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay: | Asks a bot to wait N seconds between requests to reduce server load. Not supported by Googlebot, which sets its own crawl rate automatically; Bing and some other crawlers honour it. | Crawl-delay: 10 |
| # Comment | A line starting with # is a comment — ignored by crawlers, useful for your own notes. | # Block admin area |
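When an Allow rule and a Disallow rule both match a URL, RFC 9309 says the most specific (longest) matching path wins. If the two are equally long, the Allow rule should be used. The sketch below shows that precedence logic for a single group. `is_allowed` is a hypothetical helper, and it assumes plain prefix rules: the `*` and `$` wildcards are deliberately left out.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under one group's rules.

    `rules` is a list of (directive, value) pairs such as ("disallow", "/wp-admin/").
    Prefix matching only; wildcards are not handled in this sketch.
    """
    best_length = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, value in rules:
        if not value:
            continue  # an empty Allow/Disallow value matches nothing
        if path.startswith(value):
            is_allow = directive.lower() == "allow"
            # Longest match wins; on a tie, Allow beats Disallow.
            if len(value) > best_length or (len(value) == best_length and is_allow):
                best_length = len(value)
                allowed = is_allow
    return allowed


wordpress_rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed(wordpress_rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(wordpress_rules, "/wp-admin/options.php"))     # False
```

Precedence depends on specificity, not position. An RFC 9309-compliant crawler therefore gives the same answer whichever order these two lines appear in.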
A Standard Robots.txt File — What One Actually Looks Like
Here is a typical, correctly formatted robots.txt for a WordPress site. This is the kind of output the generator produces:
Generated robots.txt — WordPress example
```
# Rules for all crawlers
User-agent: *
# Block WordPress admin and login
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
# Block search result pages (duplicate content risk)
Disallow: /?s=
Disallow: /search/
# Block low-value utility pages
Disallow: /feed/
Disallow: /trackback/
Disallow: /xmlrpc.php

# Sitemap location
Sitemap: https://yoursite.com/sitemap.xml
```
Notice what is not blocked here: your CSS files, JavaScript files, images, theme assets, or any page you want to rank. Blocking those causes rendering problems that hurt your search visibility — a common mistake covered in detail below.
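Before uploading, you can sanity-check rules like these locally with Python's standard-library `urllib.robotparser`. Two assumptions shape the sketch below:

- All rules sit in one `User-agent: *` group with no blank lines inside it, because this parser treats a blank line as the end of a group.
- The Allow exception is listed before the broader Disallow, because this parser applies the first matching rule rather than the most specific one, as RFC 9309 and Google do.

The parser also ignores `*` and `$` wildcards, so do not rely on it to check wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# The WordPress rules from the example above, kept in a single group.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /search/
Disallow: /feed/
Disallow: /trackback/
Disallow: /xmlrpc.php

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in [
    "https://yoursite.com/wp-admin/options.php",
    "https://yoursite.com/wp-admin/admin-ajax.php",
    "https://yoursite.com/?s=shoes",
    "https://yoursite.com/blog/my-post/",
    "https://yoursite.com/wp-content/themes/mytheme/style.css",
]:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8} {url}")

print(parser.site_maps())  # Sitemap URLs found in the file (Python 3.8+)
```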
The One Distinction That Trips Almost Everyone Up
Disallow blocks crawling. It does not block indexing.
This is the single most misunderstood aspect of robots.txt, and the source of the most damaging mistakes. If you add a page to robots.txt with a Disallow directive, you stop Google from visiting and reading that page. But if other websites link to that page, Google can still discover its URL through those links and add it to its index — it just will not know what is on the page.
Google's own documentation and Search Advocate John Mueller have confirmed this: a Disallow directive alone is not a reliable way to remove a page from search results. If another site links to a disallowed URL, that URL can appear in results with a message like "No information is available for this page."
The correct tools for each job:
- Use Disallow in robots.txt when you want to save crawl budget — to stop bots spending time on pages that have no SEO value, like admin dashboards, internal search results, or staging content
- Use `<meta name="robots" content="noindex">` when you want a page removed from search results. Crucially, you must allow crawling so Google can read and obey the noindex tag
- Never combine both on the same page: if you block a page in robots.txt, Google cannot access its noindex tag, so the tag is invisible and useless
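The block-and-noindex conflict is easy to check mechanically. The sketch below uses only the standard library. It flags a page that is both disallowed in robots.txt and carries a robots `noindex` meta tag. The helper names (`RobotsMetaFinder`, `has_noindex`, `noindex_is_hidden`) are hypothetical, and the check only inspects `<meta name="robots">` tags in the page HTML.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


def noindex_is_hidden(robots_txt, url, html, user_agent="Googlebot"):
    """True if the page asks for noindex but robots.txt stops crawlers from reading it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return has_noindex(html) and not parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_is_hidden(robots, "https://example.com/private/page", page))  # True: noindex is never seen
print(noindex_is_hidden(robots, "https://example.com/public/page", page))   # False: crawlable, noindex works
```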
What You Should and Should Not Block
| Usually safe to Disallow | Never Disallow these |
|---|---|
| Admin and login pages (/wp-admin/, /admin/) | Your homepage or any page you want to rank |
| Internal search result pages (/?s=, /search/) | CSS files and theme directories (/wp-content/themes/, /assets/) |
| Staging or development directories (/staging/, /dev/) | JavaScript files (Google needs these to render your pages correctly) |
| Duplicate parameter URLs (?sort=, ?filter=, ?ref=) | Image or media directories you want to appear in Google Images |
| Private member-only content | Pages where you need noindex to work (blocking and noindex cancel each other out) |
| Utility files (/xmlrpc.php, /wp-login.php, /feed/) | The entire site (Disallow: /), which blocks crawling of everything, including your content |
Blocking CSS or JavaScript prevents Google from rendering your pages the way visitors see them. This can lead to pages being misunderstood and ranked lower. Source: Google Search Central documentation.
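Parameters such as `?sort=` can follow any path, so blocking them relies on wildcards:

- `*` matches any sequence of characters.
- A trailing `$` marks the end of the URL.

Both are defined in RFC 9309 and supported by Google. Below is a minimal sketch of how one rule might be translated into a regular expression. `rule_matches` is a hypothetical helper, and precedence between several matching rules is not handled here.

```python
import re


def rule_matches(rule_path, url_path):
    """Check one robots.txt path pattern against a URL path (including any query string).

    '*' matches any sequence of characters; a trailing '$' anchors the end.
    Without '$', a rule is a prefix match.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape everything literally, then rejoin the pieces with '.*' where each '*' was.
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.match(pattern + ("$" if anchored else ""), url_path) is not None


print(rule_matches("/*?sort=", "/shoes?sort=price"))             # True
print(rule_matches("/*?sort=", "/shoes?color=red&sort=price"))   # False: sort is not the first parameter
print(rule_matches("/*.pdf$", "/files/report.pdf"))              # True
print(rule_matches("/*.pdf$", "/files/report.pdf?download=1"))   # False
```

As the second case shows, `/*?sort=` misses URLs where `sort` appears later in the query string. Check any wildcard rule against real URLs from your site before deploying it.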
Why the File Location Matters
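Your robots.txt file must be placed at the root of your domain, not in a subdirectory. Each subdomain counts as a separate site and needs its own file at its own root (for example, blog.yourdomain.com/robots.txt covers only that subdomain). For your main domain, the file must be accessible at exactly: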
https://yourdomain.com/robots.txt
If it is placed anywhere else, crawlers will not find it and will ignore it entirely, defaulting to unrestricted access. After uploading, visit that URL in your browser to confirm it is accessible. If you see the plain text of the file, it is correctly placed.
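Crawlers look for the file only at the root of each host. As a result, every subdomain, protocol and port is governed by its own robots.txt. The short sketch below derives the robots.txt location for any page URL using only `urllib.parse`. The function name `robots_url_for` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL that governs `page_url` (same scheme, host and port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://yourdomain.com/blog/post?ref=home"))   # https://yourdomain.com/robots.txt
print(robots_url_for("https://shop.yourdomain.com/products/shoes"))  # https://shop.yourdomain.com/robots.txt
```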
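One additional syntax note that trips up even experienced developers: the paths in robots.txt rules are case-sensitive. Disallow: /Admin/ and Disallow: /admin/ are two completely different directives. Crawlers compare each rule against the URL path exactly as it is written, whatever operating system your server runs. The casing in your rules must therefore match the casing in your URLs. Directive names themselves (User-agent, Disallow, Allow) are not case-sensitive.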
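You can see this case sensitivity with Python's standard-library `urllib.robotparser`, which matches paths exactly as written. The example.com URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /admin/"])

print(parser.can_fetch("*", "https://example.com/admin/settings"))  # False: blocked
print(parser.can_fetch("*", "https://example.com/Admin/settings"))  # True: different path, not blocked
```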
Crawl Budget — Why It Matters for Larger Sites
For most small blogs and brochure sites (under 500 pages), crawl budget is not a concern — Google will crawl your entire site regularly regardless. But for larger sites — eCommerce stores with thousands of product variants, news sites with deep archives, or SaaS platforms with user-generated content — it becomes genuinely important.
Crawl budget is roughly the number of pages Googlebot will crawl on your site over a given period. When that budget gets used up on low-value pages (filtered product listings, duplicate parameter URLs, printer-friendly versions, internal search results), fewer of your genuinely important pages get crawled and re-indexed promptly. This delays the ranking benefits of content updates and new page launches.
A well-configured robots.txt file that blocks those low-value URLs redirects crawl capacity toward the pages that actually drive traffic. For a large eCommerce site, this can mean the difference between product pages being re-indexed within hours of a price update and those updates taking days to appear in search results.
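To estimate how much crawl activity a rule set would redirect, you can replay a sample of Googlebot-requested paths against candidate rules. Such a sample could, for example, be exported from your server access logs. The sketch below assumes you already have those paths in a list. The sample paths and the `blocked_share` helper are hypothetical. Because it relies on `urllib.robotparser`, keep the candidate rules to plain prefixes without wildcards.

```python
from urllib.robotparser import RobotFileParser


def blocked_share(crawled_paths, robots_txt, user_agent="Googlebot"):
    """Fraction of crawled URL paths that the candidate robots.txt would block."""
    if not crawled_paths:
        return 0.0
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = sum(1 for path in crawled_paths if not parser.can_fetch(user_agent, path))
    return blocked / len(crawled_paths)


# Hypothetical sample of paths Googlebot requested, e.g. pulled from access logs.
sample = [
    "/products/red-shoe",
    "/search/shoes",
    "/search/boots",
    "/print/products/red-shoe",
    "/products/blue-shoe",
]
candidate_rules = "User-agent: *\nDisallow: /search/\nDisallow: /print/\n"
print(f"{blocked_share(sample, candidate_rules):.0%} of sampled requests would be blocked")
# prints "60% of sampled requests would be blocked"
```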
An Honest Limitation Worth Knowing
Robots.txt is a request, not a lock. Well-behaved bots — Googlebot, Bingbot, and most legitimate crawlers — will respect your Disallow directives. Poorly-behaved bots, scrapers, and malicious crawlers will ignore them completely. If you need to actually block access to content for security reasons, robots.txt is the wrong tool. Use server-level access controls, password protection, or firewall rules instead.
Similarly, robots.txt does not protect your content from being seen or copied by humans who know the URL directly. It only limits automated crawler access — which is the right scope for an SEO tool, but worth being clear-eyed about before assuming it provides any security.
What to Do After Generating Your File
- Upload it to your domain root via FTP, your hosting file manager, or your CMS. In WordPress, Yoast SEO and Rank Math both allow you to edit robots.txt directly from the plugin settings without FTP access.
- Verify it is live by visiting yourdomain.com/robots.txt in your browser. You should see the plain text of the file.
- Check it in Google Search Console. The robots.txt report under Settings shows whether Google could fetch the file and flags any rules it could not parse. To check whether a specific URL is blocked, use the URL Inspection tool.
- Submit your sitemap: if you have not already, generate your XML sitemap using the XML Sitemap Generator and reference it in your robots.txt with a Sitemap: directive. This helps crawlers discover your content.
- Check your index coverage after any robots.txt change using the Google Index Checker to confirm that pages you want indexed are still being found and indexed correctly.
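Before uploading, a quick automated pass can catch some of the costly mistakes described earlier. The checks below are a hypothetical, deliberately simple lint: the `lint_robots_txt` name, the asset-path hints and the warning messages are all illustrative assumptions. It inspects only `User-agent: *` groups and works line by line, so it complements rather than replaces the checks in Google Search Console.

```python
ASSET_HINTS = ("/wp-content/", "/assets/", ".css", ".js")  # assumed typical asset locations


def lint_robots_txt(text):
    """Return warnings for a few common robots.txt mistakes (illustrative checks only)."""
    warnings = []
    sitemaps = []
    in_star_group = False
    previous_key = None
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A new group starts unless this line directly follows another user-agent line.
            if previous_key != "user-agent":
                in_star_group = False
            in_star_group = in_star_group or value == "*"
        elif key == "disallow" and in_star_group:
            if value == "/":
                warnings.append("Disallow: / blocks the entire site for all crawlers")
            elif any(hint in value for hint in ASSET_HINTS):
                warnings.append(f"Disallow: {value} may block CSS or JavaScript needed for rendering")
        elif key == "sitemap":
            sitemaps.append(value)
        previous_key = key
    if not sitemaps:
        warnings.append("No Sitemap: line found")
    warnings += [f"Sitemap URL is not absolute: {url}" for url in sitemaps
                 if not url.startswith(("http://", "https://"))]
    return warnings


print(lint_robots_txt("User-agent: *\nDisallow: /\nSitemap: /sitemap.xml\n"))
```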
Tools to Use Alongside This One
- XML Sitemap Generator
- Google Index Checker
- Meta Tags Analyzer
- Keyword Position Checker
- Broken Links Finder
- Redirect Checker
Frequently Asked Questions
Where to upload robots.txt on common platforms
WordPress
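In Yoast SEO, go to Yoast SEO → Tools → File editor. In Rank Math, go to Rank Math → General Settings → Edit robots.txt. Alternatively, upload the file manually to your site's root directory (often /public_html/) via FTP or your host's file manager.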
Shopify
Shopify auto-generates a robots.txt file. You can customise it by adding a robots.txt.liquid template to your theme's templates folder.
Wix
Go to Marketing & SEO → SEO Tools → Robots.txt. Wix provides a built-in editor for managing your directives.
Static / HTML sites
Save the generated content as a plain text file named exactly robots.txt (all lowercase, with no extra extension such as .txt.txt). Upload it to your root directory via FTP or cPanel File Manager.