
How to Get Your Website Cited by ChatGPT, Perplexity and Google AI

Aspect 20 min read
Why this matters right now

In June 2025, AI search platforms sent 1.13 billion referral visits to websites, a 357% year-over-year increase. ChatGPT alone accounts for 78% of that traffic. Visits that arrive through AI citations convert at 5× the rate of organic search visits. The businesses capturing those visits are not getting lucky. They are doing specific, learnable things to earn citations. This guide covers exactly what those things are, step by step.

Getting cited by an AI engine is fundamentally different from ranking in a traditional search result. Google can rank your page because of its backlink profile and domain authority even if the content is poorly structured. An AI citation, by contrast, requires that your content be extractable — structured so that a language model can pull a clear, accurate, attributable answer from it without ambiguity. You cannot buy your way into an AI citation with links alone. You have to earn it with the right content structure, technical access, and brand presence.

This guide is organised as a step-by-step implementation process, starting from the technical foundation — because if AI crawlers cannot reach your site, nothing else matters — and working through content structure, platform-specific optimisation, brand signal building, and measurement. Each step is self-contained. You can implement them in order or apply whichever step addresses the gap your site currently has.

  • 1.13B: AI referral visits to websites in June 2025 alone (Pixelmojo, 2026)
  • 5×: better conversion rate from AI citations vs organic search traffic (Pixelmojo / Previsible, 2025)
  • 38%: of AI Overview citations come from top-10 organic results, down from 76% in 2025 (Ahrefs, early 2026)
  • 11%: of domains are cited by both ChatGPT and Perplexity; each platform has distinct source preferences (Leapd, 680M citations)

That 38% figure deserves to be read twice. In mid-2025, 76% of AI Overview citations came from pages ranking in the traditional top 10. By early 2026 that had dropped to 38% — and some research puts it as low as 17%. Ranking first on Google no longer guarantees inclusion in AI answers. Content structure, E-E-A-T signals, and multi-channel brand presence now determine AI citation eligibility independently of your ranking position. This is the gap between traditional SEO and what you actually need to do to appear in AI-generated answers.


The Hidden Problem Blocking Most Sites From AI Citations

Before any content optimisation conversation, there is a technical issue that affects a significant number of sites silently and completely: AI crawlers being blocked. Not intentionally — most site owners have no idea it is happening. But a default CDN security setting, an overaggressive SEO plugin, or a robots.txt file that predates AI crawlers can be blocking every AI bot from ever reading your pages. A site can have perfect content structure, excellent backlinks, and strong E-E-A-T signals — and still have zero AI citations because GPTBot never got through the front door.

Check this first — it may already be costing you citations

Cloudflare's Bot Fight Mode and Super Bot Fight Mode, which are enabled by default on many plans, can block AI crawlers including PerplexityBot and OAI-SearchBot at the CDN layer, before your robots.txt is even read. The bot hits your site, gets a Cloudflare challenge or block, and your server never sees the request. One developer published that this was "silently blocking AI citations for months" without any indication in their analytics or server logs. If your site runs behind Cloudflare, check your Bot Fight Mode settings immediately.

The Critical Distinction: Training Bots vs Retrieval Bots

This is the most technically important piece of information in this entire guide, and it is almost never explained in general SEO content. AI crawlers come in two fundamentally different types — and treating them the same way in your robots.txt is a mistake that either blocks AI citations entirely or gives away your content unnecessarily.

| Bot User-Agent | Company | Type | What It Does | Recommended |
| --- | --- | --- | --- | --- |
| OAI-SearchBot | OpenAI | Search/Retrieval | Fetches pages in real time when a ChatGPT user asks a question; cites your page in the answer | Allow ✓ |
| ChatGPT-User | OpenAI | Search/Retrieval | ChatGPT's browsing retrieval bot, the one that actually cites pages in ChatGPT answers | Allow ✓ |
| GPTBot | OpenAI | Training Only | Collects pages to train OpenAI's models; your content enters the model's weights, with no citation or traffic | Optional block |
| Claude-SearchBot | Anthropic | Search/Retrieval | Real-time retrieval for Claude's web search feature; cites pages in Claude answers | Allow ✓ |
| Claude-User | Anthropic | Search/Retrieval | Claude's browsing agent; fetches pages when Claude users ask it to browse | Allow ✓ |
| ClaudeBot / anthropic-ai | Anthropic | Training Only | Trains Claude's language models; content is used for training, with no citation credit | Optional block |
| PerplexityBot | Perplexity | Search + Index | Powers Perplexity's citation engine; directly determines whether you appear in Perplexity answers | Allow ✓ |
| Perplexity-User | Perplexity | Search/Retrieval | Real-time page fetch when Perplexity users ask a question; the citation retrieval bot | Allow ✓ |
| Google-Extended | Google | Training + AI (control token) | A robots.txt token rather than a separate crawler; controls whether Google uses your content for Gemini (formerly Bard) training and grounding. Google documents that it does not affect inclusion in Search, so AI Overview eligibility depends on Googlebot access | Allow |
| CCBot | Common Crawl | Training Only | Feeds training data to dozens of open-source LLMs; no citation credit, no traffic | Block if preferred |

The strategic conclusion from this table: allow all search/retrieval bots and optionally block training-only bots. The retrieval bots are what send citations — and traffic — to your site. The training bots consume your content without providing attribution or referral traffic in return. Most content sites in 2026 allow retrieval bots and block training bots, giving them AI citation visibility without donating content to model training datasets.
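The allow/block split described above can be sketched as a small generator. This is a minimal illustration, not a definitive configuration: the bot names come from the table, while `RETRIEVAL_BOTS`, `TRAINING_BOTS`, and `build_robots_txt` are hypothetical names chosen for the example.

```python
# Bot names are taken from the table above; the list and function names
# are illustrative, not part of any library.
RETRIEVAL_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "anthropic-ai", "CCBot"]


def build_robots_txt(block_training: bool = True) -> str:
    """Return robots.txt text that allows retrieval bots and,
    optionally, disallows training-only bots."""
    groups = ["User-agent: *\nAllow: /"]
    for bot in RETRIEVAL_BOTS:
        groups.append(f"User-agent: {bot}\nAllow: /")
    if block_training:
        for bot in TRAINING_BOTS:
            groups.append(f"User-agent: {bot}\nDisallow: /")
    # Blank lines separate the per-bot groups.
    return "\n\n".join(groups) + "\n"


print(build_robots_txt())
```

Calling `build_robots_txt(block_training=False)` produces the same file without any Disallow groups, for sites that are comfortable contributing to training datasets.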


The Step-by-Step Implementation Guide

Step 1: Audit your robots.txt and fix AI crawler access
Foundation — everything else depends on this

Before writing a single word of new content, confirm that all retrieval bots can actually reach your site. This is the most common reason websites with good content have no AI citations. Check your current robots.txt file, then compare it against the recommended configuration below.

To check your current robots.txt, visit yourdomain.com/robots.txt in your browser. Look for any lines containing Disallow: / under AI bot user-agents. A disallow on any retrieval bot effectively removes you from that platform's citation pool entirely.
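The same check can be automated with Python's standard `urllib.robotparser`. The sketch below assumes you have already downloaded the robots.txt text; `blocked_retrieval_bots` and the sample file are hypothetical, and the parser's matching rules may differ in detail from how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Retrieval bot names as listed in this guide.
RETRIEVAL_BOTS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
]


def blocked_retrieval_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the retrieval bots that this robots.txt forbids from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in RETRIEVAL_BOTS if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt that blocks one training bot and one retrieval bot.
sample = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""
print(blocked_retrieval_bots(sample))  # prints ['PerplexityBot']
```

An empty list means no retrieval bot is blocked at the robots.txt level. It says nothing about CDN-level blocks, which never reach this file.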

robots.txt — Recommended AI Crawler Configuration

    # Standard search engine crawlers (unchanged)
    User-agent: *
    Allow: /

    # ── AI SEARCH / RETRIEVAL BOTS (allow these: they send citations + traffic) ──
    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: Claude-SearchBot
    Allow: /

    User-agent: Claude-User
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Perplexity-User
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: Applebot-Extended
    Allow: /

    # ── AI TRAINING BOTS (optional block: no traffic or citations either way) ──
    # OpenAI training (optional)
    User-agent: GPTBot
    Disallow: /

    # Anthropic training (optional)
    User-agent: ClaudeBot
    User-agent: anthropic-ai
    Disallow: /

    # Common Crawl (optional)
    User-agent: CCBot
    Disallow: /

Use the Robots.txt Generator to generate a clean robots.txt file, then manually add the AI bot entries above. After updating, verify your key pages are still indexed by Google using the Google Index Checker — a misconfigured robots.txt can accidentally block Googlebot too.

Verify it worked: Check your server access logs 2–3 days after updating. You should start seeing PerplexityBot, OAI-SearchBot, and ChatGPT-User appearing. Their absence after a week indicates a CDN-level block (likely Cloudflare) that overrides robots.txt directives.
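
The log check above can be scripted. This is a rough sketch that assumes a text access log with the user-agent string somewhere on each line; the sample lines are invented, and real user-agent strings are longer and vary by vendor.

```python
from collections import Counter

# Bot tokens to look for, as named in the verification note above.
AI_BOT_TOKENS = ["PerplexityBot", "OAI-SearchBot", "ChatGPT-User"]


def count_ai_bot_hits(log_lines):
    """Count access-log lines that mention each AI bot token (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
    return hits


# Hypothetical log lines for illustration only.
sample_log = [
    '203.0.113.5 - - [10/Jun/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "PerplexityBot/1.0"',
    '203.0.113.9 - - [10/Jun/2026:10:05:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.7 - - [10/Jun/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]
hits = count_ai_bot_hits(sample_log)
missing = [token for token in AI_BOT_TOKENS if hits[token] == 0]
print(dict(hits), "missing:", missing)
```

Bots that stay in the `missing` list after a week are the ones to investigate at the CDN layer. To run this against a real log, pass an open file object instead of `sample_log`.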

Step 2: Confirm every key page is indexed and accessible
An unindexed page cannot be cited by any AI engine

Every AI platform that does real-time retrieval (Google AI Overviews, Perplexity, ChatGPT Search) relies on an index of known web pages. For Google AI Overviews, 97% of citations come from the top 20 organic results, so a page that Google has not indexed will not appear in AI Overviews regardless of content quality. Perplexity and ChatGPT Search rely on Bing's index, so being indexed by Bing as well as Google broadens your retrieval eligibility.

  • Use the Google Index Checker on your 10 most important pages and confirm they are all indexed
  • Check every key page's meta tags with the Meta Tags Analyzer — a single accidental noindex directive removes a page from all AI citations immediately
  • Submit your sitemap.xml to both Google Search Console and Bing Webmaster Tools — Bing indexing expands your ChatGPT and Perplexity eligibility
  • Check page speed with the PageSpeed Insights Checker — AI retrieval bots have time budgets. Perplexity's crawler has been documented abandoning pages that load too slowly, meaning fast pages are systematically over-represented in Perplexity citations
Critical check: Use the Search Engine Spider Simulator to confirm your content is visible in raw HTML — not rendered by JavaScript after page load. AI retrieval crawlers generally do not execute JavaScript. Content that only exists in the rendered DOM is invisible to all AI citation systems.
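
Two of the checks above, the accidental noindex and the raw-HTML visibility test, can be approximated offline with the standard library. The sketch assumes you have saved the page's unrendered HTML (for example, from "view source"); `RawHtmlAudit` and `audit_raw_html` are hypothetical helpers, and the check only covers the robots meta tag, not HTTP headers.

```python
from html.parser import HTMLParser


class RawHtmlAudit(HTMLParser):
    """Collect the robots meta directive and visible text from unrendered HTML."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside scripts is not visible content, so it is ignored.
        if not self._skip_depth:
            self.text_parts.append(data)


def audit_raw_html(html: str, key_phrase: str) -> dict:
    """Report whether the page is noindexed and whether `key_phrase`
    appears in the text a non-JavaScript crawler would see."""
    parser = RawHtmlAudit()
    parser.feed(html)
    text = " ".join(" ".join(parser.text_parts).split())
    return {
        "noindex": parser.noindex,
        "phrase_in_raw_html": key_phrase.lower() in text.lower(),
    }


# Hypothetical page whose content only exists after JavaScript runs.
sample_html = """<html><head><meta name="robots" content="noindex, follow"></head>
<body><div id="app"></div><script>render("Keyword density guide")</script></body></html>"""
print(audit_raw_html(sample_html, "keyword density guide"))
```

For the sample page both problems show up: the noindex flag is set, and the key phrase is missing from the visible raw text because it only exists inside a script.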

Step 3: Restructure your content for AI extraction
The single most impactful content change you can make

AI citation engines do not read your entire article and summarise it. They scan your page for extractable "answer units" — self-contained passages that directly answer a question without requiring surrounding context. The most-cited passages in AI Overviews average 40 to 167 words, open with a direct answer to the question implied by the heading, and are comprehensible as standalone text.

The structural changes that consistently produce more AI citations:

1. Answer-first writing. Every H2 and H3 section should begin with a direct, complete answer to the question implied by that heading — in the first one to two sentences. The detail, examples, and nuance come after. AI systems extract the lead. If your lead sentence is a preamble ("In this section, we will explore…"), the AI moves on.

Not citation-ready

There are many factors that go into how Google ranks websites. In this section, we're going to look at some of the most important ones and why they matter for your overall SEO strategy…

Citation-ready ✓

Google ranks websites based on three core signal categories: relevance (does the content match the query), authority (do other trusted sites link to it), and user experience (does the page load fast and deliver what was promised). Of these, authority signals from backlinks remain the strongest independent predictor of ranking position.

2. Question-format headings. Phrase your H2 and H3 headings as the question a user would actually type or ask aloud. "How does Domain Authority affect rankings?" performs better than "Domain Authority and Rankings." This directly maps your content to the query format that triggers AI answer generation.

3. Named entity density. Mention your brand name, topic, and key named entities explicitly in the first 100 words of your page. AI systems use entity recognition to classify content — pages with clear, explicit entity mentions ("This guide covers keyword density checking for SEO") are more confidently categorised than pages that assume context.

4. Add statistics with sources to every major claim. A specific, sourced data point is the single highest-impact content change for AI citation rates (+40% visibility from the Princeton GEO research). Replace generalisations with numbers. Replace vague attribution with named sources and dates.

Check keyword usage: Use the Keyword Density Checker on your target pages to confirm your primary topic terms appear consistently throughout the content — not just in the intro and conclusion. AI retrieval systems use semantic frequency as a relevance signal for topic classification.

Step 4: Add schema markup — the machine-readable layer AI systems prefer
JSON-LD tells AI engines exactly what your content is and who wrote it

Schema markup is structured data embedded in your page that labels your content for machines. It does not change what human readers see — it tells AI crawlers, search engines, and language models what type of content they are reading, who authored it, when it was published, and what questions it answers. Three schema types have the highest direct impact on AI citation rates:

FAQPage Schema — the most impactful for AI citations

FAQPage schema creates explicitly labelled question-and-answer pairs in your page's structured data. This is the closest you can get to pre-packaging your content as citation-ready answers for AI systems — the question is labelled as a question, the answer is labelled as an answer, and both are machine-readable independently of your page's visual formatting.

FAQPage Schema — JSON-LD (add to <head> or end of <body>)

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is keyword density and why does it matter?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Keyword density is the percentage of times a target keyword appears relative to the total word count of a page. It matters because Google uses keyword frequency as one signal for determining what a page is about. A density of 1–2% is generally considered healthy, enough to signal relevance without appearing manipulative."
          }
        }
      ]
    }
    </script>
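
If you maintain many FAQ entries, generating the markup is less error-prone than hand-editing JSON. The following is a minimal sketch using only the standard `json` module; `build_faq_jsonld` and `to_script_tag` are hypothetical helper names, and the question text is shortened from the example above.

```python
import json


def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) tuples."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def to_script_tag(data) -> str:
    """Serialise the dict and wrap it in a JSON-LD script element."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so answer text can never close the script element early;
    # "<\/" is a valid JSON escape and decodes back to "</".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


faq = build_faq_jsonld([
    (
        "What is keyword density and why does it matter?",
        "Keyword density is the percentage of times a target keyword "
        "appears relative to the total word count of a page.",
    ),
])
print(to_script_tag(faq))
```

Because the output is produced by `json.dumps`, quotes and special characters inside answers are escaped automatically, which is the most common source of invalid hand-written JSON-LD.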

Article Schema — communicates authorship and freshness

Article schema tells AI systems who wrote the page, when it was published, and when it was last updated. Author credentials are an E-E-A-T signal that AI citation systems weigh — content from named authors with professional credentials is more likely to be cited than anonymous content. The dateModified field is particularly important: AI systems use it to assess content freshness, and pages with a recent modification date are prioritised over older, unupdated content.

Article Schema — JSON-LD

    {
      "@context": "https://schema.org",
      "@type": "Article",
      "headline": "Your Article Title Here",
      "datePublished": "2026-06-10",
      "dateModified": "2026-06-10",
      "author": {
        "@type": "Person",
        "name": "Your Name",
        "url": "https://yoursite.com/about"
      },
      "publisher": {
        "@type": "Organization",
        "name": "Your Site Name",
        "url": "https://yoursite.com"
      }
    }

Update dateModified at least quarterly.

Verify schema is in the HTML: After adding schema markup, use the Source Code Viewer to confirm the JSON-LD appears in the page's raw HTML. Schema injected by JavaScript after page load is not reliably read by AI retrieval crawlers that do not execute JavaScript.
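
The raw-HTML schema check can also be scripted. This sketch assumes you have the page's unrendered source as a string; `JsonLdExtractor` and `find_schema_types` are hypothetical names. It also surfaces invalid JSON, such as a stray `//` comment, which JSON does not permit.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def find_schema_types(raw_html: str):
    """Return (types, errors): the @type of each valid JSON-LD block,
    plus a parser message for each block that is not valid JSON."""
    extractor = JsonLdExtractor()
    extractor.feed(raw_html)
    types, errors = [], []
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError as exc:
            errors.append(str(exc))
            continue
        items = data if isinstance(data, list) else [data]
        types.extend(item.get("@type") for item in items if isinstance(item, dict))
    return types, errors


# Hypothetical page source: one valid block and one broken by a comment.
sample_source = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "x"}</script>
<script type="application/ld+json">{"@type": "Article", // stray comment
"headline": "y"}</script>
</head></html>"""
types, errors = find_schema_types(sample_source)
print("types:", types, "| invalid blocks:", len(errors))
```

An empty `types` list for a page where you added schema suggests the markup is being injected by JavaScript rather than served in the HTML.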

Step 5: Build the consensus signal — your most powerful long-term strategy
AI platforms scan for agreement across multiple independent sources before citing you

This is the insight from Profound and SEMrush research that most content guides miss entirely. AI recommendation systems do not just look at one source before citing a brand. They scan for multi-source consensus: does this brand or claim appear consistently across Reddit, YouTube, industry publications, review platforms, and its own website? If your positioning appears consistently across multiple independent sources, AI systems gain enough confidence in that positioning to cite you. A single well-optimised page on your own website is rarely enough. It is the cross-channel corroboration that triggers reliable citations.

  1. Your own website: well-structured, schema-marked content pages with direct answers. The foundation, but not sufficient alone.
  2. Reddit / forums: genuine participation in niche communities where your audience asks questions. Not spam, but real helpful answers that mention your brand when relevant.
  3. YouTube: Ahrefs' research on 75,000 brands found YouTube brand mentions are the strongest single correlating factor with AI Overview visibility. Even one or two tutorial videos matter.
  4. Review platforms: G2, Trustpilot, Capterra. Brands with review profiles have 3× higher AI citation chances. Third-party validation is a trust signal AI systems read.
  5. Industry publications: being mentioned or quoted in recognised publications in your niche builds parametric memory in AI training data. Digital PR is now an AI visibility strategy.
  6. LinkedIn + author profile: named author presence with professional credentials increases the E-E-A-T signal for all content attributed to that author across all AI citation systems.

The critical thing to understand about YouTube specifically: it is not just the videos that create the signal — it is the text. YouTube auto-generates transcripts for every video. Google's AI Overview system can read those transcripts and cites YouTube as a source. A five-minute tutorial video where you mention your brand name alongside your topic creates a brand mention in a YouTube transcript that Google's AI system reads and cross-references with your website. This is why Ahrefs found YouTube to be the strongest single factor — it creates a high-authority third-party corroboration that no amount of on-site content can replicate.

Step 6: Platform-specific optimisation — each AI engine has different citation logic
Only 11% of domains are cited by both ChatGPT and Perplexity — treat them separately
Google AI Overviews
Real-time retrieval from Google's index. 97% of citations from top-20 organic results. YouTube is the strongest external signal.
  • Traditional SEO ranking still matters — improve it via internal linking and backlinks
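  • Googlebot must be allowed in robots.txt: AI Overviews draw on the regular Search index, while Google-Extended only governs use of your content for Gemini training and grounding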
  • FAQPage and Article schema directly feed into AIO selection
  • Update content regularly — freshness is weighted heavily
  • Aim for Performance score 70+ on mobile via PageSpeed
ChatGPT / SearchGPT
Combines static training data with live Bing retrieval. Accounts for 78% of AI referral traffic. Favors high-authority sources; Wikipedia-adjacent content performs well.
  • Allow OAI-SearchBot and ChatGPT-User in robots.txt
  • Submit to Bing Webmaster Tools — ChatGPT Search uses Bing's index
  • Brand mentions across the web build parametric memory
  • Domain authority (Moz DA) is the strongest predictor of ChatGPT citations
  • Direct, authoritative writing style — not conversational fluff
Perplexity AI
Always does live retrieval. Most citation-transparent platform. Favors specialised blogs and press mentions.
  • Allow PerplexityBot and Perplexity-User — without this, zero citations possible
  • High-frequency crawling — Perplexity revisits popular pages often
  • Tables, structured lists, and numbered steps are heavily favoured
  • Source your claims explicitly — Perplexity cross-checks citations
  • Press mentions in specialised media are strong Perplexity signals
Google Gemini
Deeply integrated with Google's Knowledge Graph. Most traditional SEO overlap.
  • Strong E-E-A-T signals carry over directly from Google SEO
  • Organization schema with sameAs linking to Wikipedia, Wikidata, and social profiles helps Knowledge Graph inclusion
  • Author expertise signals matter more here than on other platforms
  • Local business schema helps for location-based queries

Step 7: Measure your AI citation performance — set up GA4 tracking
You cannot improve what you cannot see

AI referral traffic requires specific tracking setup in GA4 — it does not appear correctly in default channel groupings because AI platforms are categorised as "Referral" or miscategorised entirely. Setting up a dedicated AI Traffic channel lets you see exactly which AI platforms are sending visitors, which pages they land on, and how those visitors behave compared to organic search traffic.

Set up an AI Traffic custom channel in GA4:

  1. Go to GA4 Admin → Data Display → Channel Groups
  2. Create a new channel group or edit your existing one
  3. Add a new channel named "AI Search"
  4. Set the Source condition to match regex: chatgpt\.com|perplexity\.ai|claude\.ai|gemini\.google\.com|copilot\.microsoft\.com|you\.com
  5. Save and allow 24–48 hours for data to populate
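
Before saving the channel, it is worth testing the pattern from step 4 against a few source values. The sketch below uses Python's `re` module and assumes the GA4 condition applies the pattern to the whole source value (full-match behaviour); if you choose a partial-match condition instead, lookalike domains such as `notyou.com` would also match. `is_ai_source` is a hypothetical helper.

```python
import re

# The same alternation used in the GA4 channel condition above.
AI_SOURCE_PATTERN = re.compile(
    r"chatgpt\.com|perplexity\.ai|claude\.ai|gemini\.google\.com"
    r"|copilot\.microsoft\.com|you\.com"
)


def is_ai_source(source: str) -> bool:
    """True when the whole source value is one of the AI referrer domains."""
    return AI_SOURCE_PATTERN.fullmatch(source) is not None


for candidate in ["chatgpt.com", "perplexity.ai", "notyou.com", "google"]:
    print(candidate, is_ai_source(candidate))
```

If your reports show sources with a subdomain prefix, extend the pattern deliberately rather than switching to a partial match, so that unrelated domains ending in `you.com` are not swept into the channel.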

Manual citation tracking — the direct audit:

Once a week, run your 10–20 most important target queries through ChatGPT, Perplexity, and Google AI Overviews. Note: which sources are cited? Are you cited? Which competitors appear that you do not? Track this in a simple spreadsheet — column per platform, row per query, mark whether you appear. Over 4–8 weeks, this gives you a clear picture of where you have citation share and where you do not, so you can focus your GEO efforts on the specific queries and platforms where the gap is largest.
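
The spreadsheet described above can be kept as plain data and summarised with a few lines of Python. This is an illustrative sketch: the queries and results are invented, and `citation_share` and `to_csv` are hypothetical helpers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical audit rows: (query, platform, cited), recorded by hand each week.
observations = [
    ("how to check keyword density", "ChatGPT", True),
    ("how to check keyword density", "Perplexity", False),
    ("how to check keyword density", "Google AI Overviews", True),
    ("what is domain authority", "ChatGPT", False),
    ("what is domain authority", "Perplexity", False),
    ("what is domain authority", "Google AI Overviews", True),
]


def citation_share(rows):
    """Return {platform: fraction of audited queries where the site was cited}."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for _query, platform, was_cited in rows:
        total[platform] += 1
        cited[platform] += int(was_cited)
    return {platform: cited[platform] / total[platform] for platform in total}


def to_csv(rows) -> str:
    """Render the sheet with one row per query and one column per platform."""
    platforms = sorted({platform for _query, platform, _cited in rows})
    by_query = defaultdict(dict)
    for query, platform, was_cited in rows:
        by_query[query][platform] = "yes" if was_cited else "no"
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["query", *platforms])
    for query, cells in by_query.items():
        writer.writerow([query, *(cells.get(p, "") for p in platforms)])
    return buffer.getvalue()


print(to_csv(observations))
print(citation_share(observations))
```

The platform with the lowest share is where the gap is largest, which is the place to focus next.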

What to look for: If you appear in Google AI Overviews but not in Perplexity for the same query, the issue is likely that PerplexityBot is blocked or your domain lacks the press mentions Perplexity favours. If you appear in neither, start at Step 1 — the technical access issue is the most likely root cause.

Hidden Tactics Most Competitors Are Not Using Yet

These are the less-obvious strategies that consistently appear in practitioner case studies but are rarely covered in general GEO content. They represent genuine first-mover opportunities in 2026.

1. Use question-format H2 and H3 headings across every article

The majority of queries that trigger AI answer generation are question-format queries. When your subheadings are written as questions — "How does keyword density affect rankings?" rather than "Keyword Density and Rankings" — your content structurally aligns with the exact query format that triggers AI answer generation. This is a five-minute edit on any existing article that can measurably improve citation frequency within days.

2. Create a dedicated "About" page with Organization schema

A well-structured About page with Organization JSON-LD schema — including your brand's sameAs links to LinkedIn, Twitter/X, YouTube, and any relevant Wikipedia or Wikidata entries — helps Google's Knowledge Graph associate your brand with your topic area. This is one of the direct inputs into Gemini's brand recognition and is increasingly important for building parametric memory across all AI platforms. It takes less than an hour to implement correctly and has lasting compounding benefits.

Organization Schema with sameAs — add to your About page or homepage

    {
      "@context": "https://schema.org",
      "@type": "Organization",
      "name": "Your Brand Name",
      "url": "https://yoursite.com",
      "description": "Clear, direct one-sentence description of what your brand does",
      "sameAs": [
        "https://linkedin.com/company/yourbrand",
        "https://twitter.com/yourbrand",
        "https://youtube.com/@yourbrand"
      ]
    }

3. Publish a "statistics" or "data" page for your niche

AI systems are highly biased toward citing pages that are the primary source of data — rather than pages that summarise data from other sources. A page titled "SEO Statistics for 2026" or "[Your Niche] Data and Statistics" that aggregates real data with clear attribution becomes a citation magnet because AI systems looking for statistics to include in their answers will default to curated statistics pages over individual articles. This single content type generates disproportionate AI citations relative to the effort required to create it.

4. Write content at the exact length AI extractors prefer

Research on AI-cited passages shows a clear sweet spot: individual answer units of 40 to 167 words are cited most frequently. Very short answers (under 40 words) lack the context to be useful. Very long answers (over 300 words without a break) get truncated or passed over for a more compact source. Structure each H2 section with: a 50–80 word direct answer first, followed by a 100–200 word expansion with evidence, followed by a practical example or application. This three-part structure consistently produces higher citation rates than either very short or very long sections.
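
A quick way to audit existing sections against these lengths is to count words per passage. The sketch below is a simple whitespace-based word count; the default thresholds are assumptions taken from the figures quoted in this guide (under 40 words as too short, 167 words as the top of the cited range), and `classify_answer_unit` is a hypothetical helper.

```python
def classify_answer_unit(text: str, min_words: int = 40, max_words: int = 167) -> str:
    """Label a passage as 'too short', 'in range', or 'too long' by word count."""
    count = len(text.split())
    if count < min_words:
        return "too short"
    if count > max_words:
        return "too long"
    return "in range"


# Hypothetical passages of different lengths.
lead = "Google ranks websites based on three core signal categories. " * 6
print(len(lead.split()), classify_answer_unit(lead))
print(classify_answer_unit("It depends."))
```

Run it over the opening passage of each H2 section and rewrite the ones that fall outside the range first.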

5. Include a comparison table in every major guide

Tables are the highest-performing content format for Perplexity citations specifically — the platform's rendering shows tables prominently and its retrieval system consistently prefers structured tabular data over prose for comparison queries. Adding a comparison table (comparing options, platforms, approaches, or tools) to any guide that covers multiple choices increases Perplexity citation likelihood significantly. The table data should be original — not lifted from another source — to be maximally citation-worthy.

6. Keep your best pages broken-link free

A broken link on a page signals to AI retrieval systems that the content is unmaintained and potentially outdated. Perplexity in particular has been documented favouring freshly maintained pages over pages with dead outbound links. Run the Broken Links Finder on every page you want to earn AI citations from — a single broken citation link can undermine a page's citation eligibility disproportionately to its actual importance.


Priority Order — If You Only Have One Hour This Week

Highest impact per hour of work

If you have limited time, these actions produce the fastest results in order of impact per hour invested:

  • First 15 minutes: Check your robots.txt and add explicit Allow directives for all retrieval bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Google-Extended). If you use Cloudflare, check Bot Fight Mode settings.
  • Next 20 minutes: Rewrite the first two sentences of every H2 and H3 in your top three articles to be direct answer statements. This single change applies to every AI platform simultaneously.
  • Final 25 minutes: Add FAQPage schema to each of those three articles. Pick the 4–5 questions users actually ask about your topic (use the Keyword Suggestion Tool to find question-format queries), write direct 60–100 word answers, and implement the JSON-LD above.

The Complete AI Citation Checklist

  • Check robots.txt — explicitly allow OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User, Google-Extended, Perplexity-User
  • Check Cloudflare or CDN Bot Fight settings — confirm AI retrieval bots are not blocked at the network layer
  • Verify all key pages are indexed using the Google Index Checker
  • Check all key pages for accidental noindex tags using the Meta Tags Analyzer
  • Submit sitemap.xml to both Google Search Console and Bing Webmaster Tools
  • Check page speed — aim for 70+ mobile Performance score on all citation-target pages using PageSpeed Insights Checker
  • Verify content is in raw HTML using Spider Simulator — not JavaScript-rendered only
  • Rewrite H2/H3 sections to begin with direct 50–80 word answer statements
  • Convert headings to question format where applicable
  • Add at least one specific, sourced statistic to every major claim
  • Add FAQPage schema (JSON-LD) to top 5 content pages
  • Add Article schema with named author, datePublished, and dateModified
  • Add Organization schema with sameAs links to all brand social profiles
  • Fix all broken outbound links on citation-target pages using Broke