Everyone in marketing has noticed it by now. Some brands show up in ChatGPT answers constantly. Others — with better content, stronger sites, more domain authority — don't appear at all. Ask your SEO team why and you'll get a shrug and a theory. Ask ten different consultants and you'll get ten different answers, most of them speculative.

The honest version is that ChatGPT citation has felt like a black box: no documentation, no rank tracker, no audit trail. The usual playbook — build links, write comprehensive content, earn authority — doesn't obviously translate. People are guessing.

AirOps has now done something nobody else has done at this scale: it ran 16,851 queries through the ChatGPT UI (not the API), captured every fan-out sub-query, every retrieved URL, and every cited source, then analyzed what separated cited pages from ignored ones. The Fan-Out Effect report is the most granular public dataset on ChatGPT citation behavior that exists. What it shows is both clarifying and, depending on your situation, a little deflating.


What ChatGPT Actually Does Before It Answers

The first thing the report clarifies is the mechanism — which most marketing teams have not internalized.

When a user types a query into ChatGPT, ChatGPT doesn't answer from memory alone. It issues web searches first. These are called fan-out queries: reformulated versions of the original question that ChatGPT sends to Bing. It retrieves the top results, reads them, then decides what to cite in its response.

According to the AirOps data, 88.6% of queries generate exactly two fan-out sub-queries. A small fraction (8.8%) generate zero — these are simple product or entity lookups where ChatGPT draws from memory. The remaining 2.6% generate four or more, mostly on complex comparative questions.

The practical consequence: before your content can ever be cited, it has to appear in Bing search results for one of those fan-out queries. That's the gate. Everything else — content quality, schema, readability — comes after.
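
You can test the gate directly. Below is a minimal sketch, assuming you have some source of Bing results; the Bing Web Search API is shown purely as an illustration (endpoint and key are placeholders, and access to that API has been restricted), and the candidate fan-out phrasings are guesses, not observed sub-queries.

```python
import requests

# Placeholders: swap in whatever Bing-results source you actually have.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"
BING_KEY = "YOUR_KEY"

def bing_rank(query: str, domain: str, top_k: int = 10) -> int | None:
    """Return the 0-based position of `domain` in the top Bing results, or None."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": BING_KEY},
        params={"q": query, "count": top_k},
        timeout=10,
    )
    resp.raise_for_status()
    pages = resp.json().get("webPages", {}).get("value", [])
    for rank, page in enumerate(pages):
        if domain in page["url"]:
            return rank
    return None  # not retrieved at all: the gate is closed

# Hypothetical fan-out phrasings for a query you care about.
fan_out_guesses = [
    "best crm for small teams",
    "crm software small business comparison",
]
for q in fan_out_guesses:
    print(q, "->", bing_rank(q, "example.com"))
```

If your domain returns None for every plausible fan-out phrasing, nothing downstream of retrieval matters yet.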


The Uncomfortable Finding: It's Mostly a Bing Problem

The AirOps report is unambiguous on its main finding: retrieval rank is the dominant signal for citation, by a wide margin.

Pages at position 0 in Bing retrieval have a 58% citation rate. Pages at position 10 have a 14% citation rate. That's a 4x gap, and it holds across query types and content categories.

The more uncomfortable version: a mediocre page ranked at position 0 — with heading similarity below 0.60, meaning weak content match — gets cited 56% of the time. A strong content page at rank 6 or higher gets cited 26% of the time. Rank overrides content quality. Not somewhat — substantially and consistently.

This is the finding that makes the whole discipline feel slightly absurd. You can write the most carefully structured, precisely matched article in your category. If it doesn't rank on Bing page 1, ChatGPT will effectively never see it.

The domain authority angle makes this more pointed. According to the same study, pages that were never cited across three runs had a median domain authority of 56 and 3.2 million backlinks. Pages cited in all three runs had a median DA of 53 and 1.1 million backlinks. The relationship isn't just weak — it's slightly inverse. High-authority sites that perform well in Google don't automatically transfer that authority to ChatGPT citations.

Reddit illustrates this precisely. Domain authority of 92. Citation rate of 29.9%. Strong authority, indifferent citation performance. The system doesn't care about your link graph.

| Site | Domain Authority | Citation Rate | Note |
|---|---|---|---|
| Wikipedia | 95 | 59.2% | Median Bing rank 24 — wins on content density alone |
| Health Publishers | 90 | 46.4% | Best retrieval rank + high query match |
| Major News (Forbes, NYT) | 94 | 32.0% | Authority doesn't compensate for format mismatch |
| Reddit | 92 | 29.9% | Proves authority means nothing in isolation |
| YouTube | 100 | 2.4% | Minimal indexable text |


What You Can Actually Control

Here's where the finding becomes useful rather than just demoralizing.

Once you accept that Bing rank is the primary gate, the question shifts: what can you do to both rank on Bing and extract maximum citation rate from that ranking? The AirOps data gives clear answers on the content layer.

Heading match to the query

This is the strongest content signal after rank. The best-performing pages match their primary heading closely to the original query. The study found a monotonic increase from 30% citation rate at low similarity to 41% at high similarity. At top retrieval ranks (0–2), that gap widens to 19 percentage points. Write your H1 as the exact question the user asked. Not a creative paraphrase. The actual question.
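
The report doesn't publish its similarity metric, so any self-check is a proxy. Here is a rough sketch using word-set overlap; the study's 0.60 threshold won't map onto this proxy exactly, but the direction of the comparison holds.

```python
import re

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets: a crude stand-in for whatever
    similarity measure the report actually used."""
    ta = set(re.findall(r"[a-z0-9']+", a.lower()))
    tb = set(re.findall(r"[a-z0-9']+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

query = "what is the best crm for small teams"       # hypothetical user query
h1_exact = "What Is the Best CRM for Small Teams?"   # mirrors the query
h1_clever = "Finding Your Perfect Sales Sidekick"    # creative paraphrase

print(similarity(query, h1_exact))   # high overlap
print(similarity(query, h1_clever))  # near zero
```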

500–2,000 words

Pages in this range outperform pages over 5,000 words, and those very long pages even underperform pages under 500 words. Longer isn't better. Focused is better. This contradicts years of "comprehensive content" guidance from the SEO industry. (A quick audit covering both this range and the subheading count appears after the next guideline.)

4–10 subheadings

For articles, this range is optimal. More than ten dilutes the signal. Fewer than four underperforms. The data finds that 7–20 H2–H4 subheadings overall is the optimal range across content types, but for article formats specifically, the 4–10 range holds.
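
Both ranges are trivial to audit. A minimal sketch using BeautifulSoup; the thresholds come from the report, the function itself is ours.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def audit_structure(html: str) -> dict:
    """Check a page against the ranges the report associates with
    article formats: 500-2,000 words and 4-10 H2-H4 subheadings."""
    soup = BeautifulSoup(html, "html.parser")
    words = len(soup.get_text(" ", strip=True).split())
    subheads = len(soup.find_all(["h2", "h3", "h4"]))
    return {
        "words": words,
        "words_in_range": 500 <= words <= 2000,
        "subheadings": subheads,
        "subheadings_in_range": 4 <= subheads <= 10,
    }

# Usage: audit_structure(open("page.html").read())
```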

JSON-LD schema

This is the one purely technical lever with a measurable independent effect: +6.5 percentage points, controlling for word count, headings, domain authority, and query match. FAQPage and BreadcrumbList are the top-performing schema types in the data. MedicalWebPage also performed strongly in health verticals.
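
As a reference point, here is a minimal FAQPage payload in standard schema.org vocabulary, built from Python so it's easy to template; the question and answer text are placeholders.

```python
import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is the best CRM for small teams?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "For teams under ten people, lightweight CRMs ...",
            },
        }
    ],
}

# Embed in the page head as:
# <script type="application/ld+json"> ... </script>
print(json.dumps(faq_schema, indent=2))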

College-level writing

The report finds Flesch-Kincaid grade 16–17 optimal, with a citation rate of 35.9% at that level. Not simplified, not dumbed down. Precise and substantive.
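
Flesch-Kincaid grade is easy to compute from the published formula. A sketch with a deliberately crude vowel-group syllable counter; dedicated readability libraries estimate syllables more carefully.

```python
import re

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n = max(1, len(words))
    # Crude heuristic: count vowel groups per word, minimum one.
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * (n / sentences) + 11.8 * (syllables / n) - 15.59

# The report's sweet spot is grade 16-17; check where a draft lands.
print(round(fk_grade(open("draft.txt").read()), 1))
```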

Moderate subtopic coverage, not exhaustive

Pages covering 26–50% of fan-out subtopics outperform pages that cover 100%. Exhaustive breadth appears to dilute the primary heading signal. This is counterintuitive if you've been following the "comprehensive content" playbook — but the data is consistent across multiple similarity thresholds.
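
Estimating where a page falls in that 26–50% band requires a list of likely fan-out subtopics and a matching heuristic; both are assumptions here, since the report's method isn't published. A sketch reusing the token-overlap proxy from the heading section above:

```python
def coverage(fan_out_subtopics: list[str], page_headings: list[str],
             threshold: float = 0.5) -> float:
    """Fraction of fan-out subtopics that any page heading addresses."""
    def overlap(a: str, b: str) -> float:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    hit = sum(
        any(overlap(topic, h) >= threshold for h in page_headings)
        for topic in fan_out_subtopics
    )
    return hit / len(fan_out_subtopics) if fan_out_subtopics else 0.0

# Per the report, aim for the 0.26-0.50 band, not 1.0.
```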


What Doesn't Work

The report is equally clear about what has no effect, or a negative one.

Backlinks and domain authority are effectively irrelevant to citation rate. The data shows no positive correlation; if anything, the slightly inverse relationship suggests that high-authority sites have content formats that are systematically less citation-friendly — marketplaces with excessive heading counts, news sites, YouTube with minimal indexable text.

Long pillar pages don't perform. Pages over 5,000 words underperform pages under 500 words. If you've been investing in exhaustive definitive guides as your AI citation strategy, the data suggests that effort is misdirected.

Exhaustive subtopic coverage — matching every possible fan-out sub-query — adds noise without adding citation lift. The system rewards depth on the primary question, not breadth across all possible related questions. The Wikipedia exception (59.2% citation rate, median Bing rank 24) is a function of extreme content density — 4,383 average words, 31 lists per page, 6.6 tables per page — combined with a trust signal that is not replicable for most brands.


The Layer That Closes the Loop

There is one significant limitation to this framework: it tells you what to optimize, but not whether it's working.

Even if you execute all of this correctly — you rank on Bing, your H1 matches the query precisely, your schema is implemented, your word count is in range — you still don't know which specific prompts are triggering your content to appear, what ChatGPT says about you when it does appear, or which prompts your competitors are winning that you're losing.

The AirOps data shows that 58% of pages are never cited across three runs, and only 2.3% of page-query combinations are cited consistently in all three runs. Citation is volatile. The same page can be retrieved and ignored in one run, then cited in the next. Without monitoring, you're optimizing toward a target you can't observe.
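
Measuring that volatility only requires a log of repeated runs. A minimal sketch, assuming a hypothetical record format of one retrieved page per query per run:

```python
from collections import defaultdict

# Hypothetical log: one record per retrieved page per run.
runs = [
    {"run": 1, "query": "best crm", "url": "example.com/crm", "cited": True},
    {"run": 2, "query": "best crm", "url": "example.com/crm", "cited": False},
    {"run": 3, "query": "best crm", "url": "example.com/crm", "cited": True},
]

outcomes = defaultdict(list)
for r in runs:
    outcomes[(r["query"], r["url"])].append(r["cited"])

for (query, url), cited in outcomes.items():
    if all(cited):
        status = "consistent"   # the rare 2.3% bucket
    elif not any(cited):
        status = "never cited"  # the 58% bucket
    else:
        status = "volatile"
    print(query, url, status)
```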

This is the gap that matters. ChatGPT citation is more tractable than people assume — the signals are real, measurable, and mostly familiar to anyone who has done SEO work. But knowing your Bing rank and your schema status doesn't tell you what the AI says about your brand, which prompts you're winning, or where your narrative breaks down. Monitoring is the layer that closes that loop.

Source: AirOps, "The Fan-Out Effect: What Happens Between a Query and a Citation"