What is AEO, and what do AI engines actually cite?
The complete research behind the video and article: our original 270-answer citation study, plus an adversarially-verified review of the external literature — every claim tied to its source.
270 AI answers measured · 28 external sources read · 22 claims verified (of 124) · 3 claims refuted
00 · Executive summary
What we did, and what we found.
Two independent bodies of research sit behind this project. Part A is an original measurement study: we asked 3 AI engines the same 30 buyer questions, 3 times each, and recorded every source cited — 270 answers in total. Part B is an adversarially-verified review of the public literature: 28 sources, 124 candidate claims, narrowed to 22 that survived independent 3-vote fact-checking.
The engines cite almost entirely different sources — ChatGPT, Gemini, and AI Overviews overlap on only 9–14% of cited domains. AI visibility is fragmented per engine.
Cite rates diverge sharply — AI Overviews cites a source on 100% of answers, ChatGPT 81%, Gemini just 58%.
Reddit is the #1 cited domain on every engine, but the engines otherwise split: ChatGPT leans editorial/review sites; AI Overviews leans community + video.
The proven content levers (from the peer-reviewed GEO paper) are citing sources, adding quotations, and adding statistics. Keyword-stuffing measurably backfires.
Part A
The original citation study.
A first-party measurement of what three AI answer engines cite when asked real buying questions — designed to be reproducible and sliceable.
A.1 · Method
How the study was run
Engines (3): ChatGPT, Google Gemini, and Google AI Overviews — the three most-used AI answer surfaces. (Perplexity and Claude were scoped out: lower usage and, for Claude, 8.2% share of AI-chat visits.)
Prompts (30): real buyer questions spanning 6 verticals × 5 query types (see A.2). A deliberate blend of generic ("best protein powder") and brand-specific ("Purple vs Casper").
Runs (3 per prompt/engine): AI answers vary run-to-run, so each prompt was asked 3× per engine and results aggregated — 90 samples per engine, 270 total.
Measurement: for each answer we recorded every cited domain (the sources named in the answer, distinct from the broader retrieval pool). Location US, language EN, captured 16 Jun 2026.
Taxonomy: each domain was bucketed (forum/UGC, encyclopedia, video, review/comparison, social, news/media, brand/other). Heuristic — raw domains are retained so the classification can be revised.
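The sample counts in the method above follow directly from the design grid. A minimal sketch of that arithmetic, assuming one prompt per vertical/query-type pair; the query-type labels are the ones used in the A.3 tables, and the variable names are illustrative rather than taken from the study's own code:

```python
from itertools import product

# Design parameters as described in the method above.
ENGINES = ["ChatGPT", "Gemini", "AI Overviews"]
VERTICALS = ["B2B software", "Consumer tech", "Health/wellness",
             "Personal finance", "Home/lifestyle", "Travel/DTC"]
QUERY_TYPES = ["best", "vs", "how-to", "use-case", "alternatives"]
RUNS_PER_CELL = 3

# One prompt per (vertical, query type) pair.
prompts = list(product(VERTICALS, QUERY_TYPES))

# One sample per (engine, prompt, run).
samples = [
    (engine, vertical, query_type, run)
    for engine in ENGINES
    for (vertical, query_type) in prompts
    for run in range(1, RUNS_PER_CELL + 1)
]

samples_per_engine = len(samples) // len(ENGINES)
print(len(prompts), samples_per_engine, len(samples))  # 30 90 270
```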
Two outcomes we measure — "citation" vs. "mention"
A citation (linked source) — the engine attaches a clickable link to a webpage it drew from. The publisher earns the attribution and the referral click. We report this as the link rate; the per-domain counts in this report are citations in this sense.
A mention — the engine names a brand or product in the answer, recommending it to the buyer, whether or not a source is linked. The brand earns the visibility. We report this as the mention rate.
They come apart: an engine can say "use HubSpot" with no link at all. A publisher wants to be a cited source (the click); a brand wants to be mentioned (the recommendation). Because bare "cited" is ambiguous, we keep the two separate throughout.
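To make the two outcomes concrete, here is a small sketch of how a link rate and a mention rate could be computed from per-answer records. The `Answer` record and its field names are hypothetical (the report does not publish its schema), and the toy data is made up for illustration, not drawn from the study:

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    # Hypothetical per-answer record; field names are illustrative.
    engine: str
    cited_domains: list = field(default_factory=list)  # linked sources
    named_brands: list = field(default_factory=list)   # brands named in the text

def rates(answers):
    """Return (link_rate, mention_rate, named_without_link_rate) as fractions."""
    if not answers:
        raise ValueError("need at least one answer")
    n = len(answers)
    linked = sum(1 for a in answers if a.cited_domains)
    named = sum(1 for a in answers if a.named_brands)
    named_no_link = sum(1 for a in answers
                        if a.named_brands and not a.cited_domains)
    return linked / n, named / n, named_no_link / n

# Made-up toy data, not study results.
toy = [
    Answer("Gemini", ["forbes.com"], ["HubSpot"]),
    Answer("Gemini", [], ["HubSpot", "Zoho"]),  # named with no link at all
    Answer("Gemini", [], []),
    Answer("Gemini", ["reddit.com"], []),
]
print(rates(toy))  # (0.5, 0.5, 0.25)
```

Keeping the two counters separate is what lets the third number, answers that name a brand without linking anything, fall out of the same pass.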
Reproducibility
Run via the same engine endpoints the LLMRanks platform uses in production; 270/270 cells returned cleanly (0 errors). Raw per-answer data retained.
A.2 · Instrument
The 30 prompts
6 verticals × 5 query types · US / EN
| Vertical | best [X] | [A] vs [B] | how to | best [X] for… | worth it / alt |
|---|---|---|---|---|---|
| B2B software | best CRM for small business | HubSpot vs Salesforce | choose project-mgmt software | email platform for ecommerce | Mailchimp alternatives |
| Consumer tech | noise-cancelling headphones | iPhone 16 vs Galaxy S25 | pick a robot vacuum | laptop for video editing <$1500 | is the Dyson V15 worth it |
| Health/wellness | best protein powder | creatine vs pre-workout | start strength training at home | fitness tracker for runners | is AG1 worth the money |
| Personal finance | best budgeting app | Roth vs traditional IRA | start investing with $1000 | travel credit card for cashback | is YNAB worth it |
| Home/lifestyle | mattress for back pain | Purple vs Casper | choose a standing desk | robot mower for a large yard | Peloton alternatives |
| Travel/DTC | best carry-on luggage | Away vs Monos | find cheap flights | travel insurance for a family | is Allbirds worth it |
Full citation matrix — every prompt × engine
Each cell = the number of distinct domains that engine cited for that prompt (3 runs combined), with the single most-frequent one named. Gemini's blank cells and AI Overviews' spikes are both visible at a glance.
All 30 prompts · distinct domains cited per engine
| Vertical | Type | Prompt | ChatGPT | Gemini | AI Overviews |
|---|---|---|---|---|---|
| B2B Software | best | What's the best CRM for a small business? | 10 · techradar.com | 0 · no cite | 10 · reddit.com |
| B2B Software | vs | HubSpot vs Salesforce — which should I choose? | 6 · sasanova.com | 5 · resonatehq.com | 6 · reddit.com |
| B2B Software | howto | How do I choose project management software for a remote team? | 8 · softabase.com | 9 · slack.com | 3 · asana.com |
| B2B Software | usecase | What's the best email marketing platform for ecommerce? | 5 · ecomstacksolutions.com | 8 · emailtooltester.com | 6 · moosend.com |
| B2B Software | alt | What are the best alternatives to Mailchimp? | 13 · activecampaign.com | 8 · emailtooltester.com | 5 · zapier.com |
| Consumer Tech | best | What are the best noise-cancelling headphones? | 3 · rtings.com | 8 · techradar.com | 5 · youtube.com |
| Consumer Tech | vs | iPhone 16 vs Samsung Galaxy S25 — which is better? | 3 · techradar.com | 9 · phonebot.com.au | 4 · youtube.com |
| Consumer Tech | howto | How do I pick a robot vacuum? | 7 · bestrobovacuums.com | 0 · no cite | 5 · youtube.com |
| Consumer Tech | usecase | What's the best laptop for video editing under $1500? | 7 · techradar.com | 3 · gagadget.com | 1 · google.com |
| Consumer Tech | alt | Is the Dyson V15 worth it? | 7 · rtings.com | 7 · purewow.com | 4 · reddit.com |
| Health & Wellness | best | What's the best protein powder? | 5 · verywellhealth.com | 3 · forbes.com | 13 · reddit.com |
| Health & Wellness | vs | Creatine vs pre-workout for beginners? | 2 · preworkoutsups.com | 0 · no cite | 7 · reddit.com |
| Health & Wellness | howto | How do I start strength training at home? | 2 · healthline.com | 0 · no cite | 5 · health.ucdavis.edu |
| Health & Wellness | usecase | What's the best fitness tracker for runners? | 4 · techradar.com | 4 · runnersworld.com | 2 · google.com |
| Health & Wellness | alt | Is AG1 worth the money? | 8 · healthline.com | 0 · no cite | 14 · reddit.com |
| Personal Finance | best | What's the best budgeting app? | 5 · appstested.com | 8 · kiplinger.com | 7 · reddit.com |
| Personal Finance | vs | Roth IRA vs traditional IRA? | 2 · fidelity.com | 4 · farther.com | 26 · startengine.com |
| Personal Finance | howto | How do I start investing with $1000? | 8 · richmoneyflow.com | 7 · friendsthatinvest.com | 3 · youtube.com |
| Personal Finance | usecase | What's the best travel credit card for cashback? | 3 · financepedia.us | 4 · forbes.com | 5 · reddit.com |
| Personal Finance | alt | Is YNAB worth it? | 8 · senticmoney.com | 7 · ynab.com | 5 · reddit.com |
| Home & Lifestyle | best | What's the best mattress for back pain? | 3 · sleepfoundation.org | 5 · theguardian.com | 8 · reddit.com |
| Home & Lifestyle | vs | Purple vs Casper mattress? | 3 · casper.com | 5 · sleepopolis.com | 6 · reddit.com |
| Home & Lifestyle | howto | How do I choose a standing desk? | 9 · maplin.co.uk | 0 · no cite | 6 · youtube.com |
| Home & Lifestyle | usecase | What's the best robot lawn mower for a large yard? | 6 · therobowire.com | 14 · navimow.segway.com | 1 · google.com |
| Home & Lifestyle | alt | What are the best alternatives to a Peloton bike? | 1 · homefitnesslab.com | 6 · cnet.com | 2 · reddit.com |
| Travel & DTC | best | What's the best carry-on luggage? | 6 · forbes.com | 5 · forbes.com | 25 · nbcnews.com |
| Travel & DTC | vs | Away vs Monos luggage? | 3 · goodhousekeeping.com | 6 · rd.com | 4 · google.com |
| Travel & DTC | howto | How do I find cheap flights? | 9 · forbes.com | 4 · moneysavingexpert.com | 11 · reddit.com |
| Travel & DTC | usecase | What's the best travel insurance for a family trip? | 6 · haznos.org | 9 · explorewitherin.com | 5 · usnews.com |
| Travel & DTC | alt | Is Allbirds worth it? | 9 · trustpilot.com | 7 · neverendingvoyage.com | 3 · google.com |

Cited domains (3 runs combined), scale: 0 · 1–2 · 3–4 · 5–6 · 7+
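Each matrix cell combines three runs into a distinct-domain count plus the most frequent domain. A minimal sketch of that aggregation follows. It assumes "most frequent" counts every citation occurrence across the runs and that ties go to the domain seen first; the report does not state either rule, and `matrix_cell` is an illustrative name:

```python
from collections import Counter

def matrix_cell(runs):
    """runs: one list of cited domains per run for a single prompt/engine pair.

    Returns (number of distinct domains, most frequent domain or None).
    """
    counts = Counter(domain for run in runs for domain in run)
    if not counts:
        return 0, None  # shown as "0 / no cite" in the matrix
    top_domain, _ = counts.most_common(1)[0]
    return len(counts), top_domain

# Made-up runs for illustration, not study data.
runs = [
    ["reddit.com", "forbes.com"],
    ["reddit.com", "zapier.com"],
    ["reddit.com"],
]
print(matrix_cell(runs))          # (3, 'reddit.com')
print(matrix_cell([[], [], []]))  # (0, None)
```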
A.3 · Results
Per-engine profile
270 answers · 90 per engine · cited domains
| Engine | Cite rate | Distinct domains | Top-10 share | Editorial (review+news) | UGC+video |
|---|---|---|---|---|---|
| ChatGPT | 81% | 134 | 30% | 20% | 7% |
| Gemini | 58% | 123 | 27% | 15% | 9% |
| AI Overviews | 100% | 144 | 35% | 10% | 19% |
Cite rate (% of answers citing ≥1 source): AI Overviews 100% · ChatGPT 81% · Gemini (consumer app) 58% · Gemini (direct API) 73%
Cross-validation — a second, independent method
Because "Gemini cites least" is a headline finding, we re-ran it a different way: querying Gemini's developer API with forced google_search grounding (n=30, 0 errors). It cited on 73% of answers — higher than the consumer app (gemini.google.com surfaces fewer sources to users than the model actually grounds on), but still the lowest of the three engines, and even with a live search tool Gemini cited nothing on ~27% of buyer questions. The finding holds under both methods.
Mentions vs links — being named ≠ being linked
A linked citation and a brand mention are different outcomes — an engine can recommend a brand by name without linking any source. A companion run (270 cells) captured the full answer text and extracted the brands each engine names, alongside the links:
Companion run · 270 answers · 90 per engine
| Engine | Links a source | Names a brand | Recommends, no link | Avg brands / answer |
|---|---|---|---|---|
| AI Overviews | 99% | 97% | 0% | 7.1 |
| ChatGPT | 83% | 96% | 14% | 5.2 |
| Gemini | 53% | 82% | 29% | 5.7 |
ChatGPT and Gemini name brands more often than they link sources (AI Overviews is the exception: it links on 99% of answers and names a brand on 97%). Gemini's gap is the widest: it recommends a brand by name on 82% of answers but links a source on only ~53%, and 29% of its answers (roughly 3 in 10) name a brand with zero links. (Link rates here corroborate the headline study, AIO 100 / ChatGPT 81 / Gemini 58, within run-to-run variance.) This refines "Gemini cites least": it gives the fewest links but still recommends, which amounts to visibility without attribution. The implication for AEO: on Gemini, being named (in-model knowledge / grounding) is a different and arguably more valuable target than being linked.
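The size of the mention-vs-link gap can be read straight off the companion-run table. A short sketch of that subtraction, using only the percentages published in the table; the dictionary layout and key names are illustrative:

```python
# Percentages copied from the companion-run table above.
companion = {
    "AI Overviews": {"links": 99, "names": 97, "named_no_link": 0},
    "ChatGPT":      {"links": 83, "names": 96, "named_no_link": 14},
    "Gemini":       {"links": 53, "names": 82, "named_no_link": 29},
}

# Gap in percentage points between naming a brand and linking a source.
gaps = {engine: row["names"] - row["links"] for engine, row in companion.items()}
widest = max(gaps, key=gaps.get)

for engine, gap in gaps.items():
    print(f"{engine}: {gap:+d} pts")
print("widest gap:", widest)
# AI Overviews: -2 pts
# ChatGPT: +13 pts
# Gemini: +29 pts
# widest gap: Gemini
```

For ChatGPT and Gemini the gap lands within a point of the "Recommends, no link" column, which is what one would expect if nearly every linked answer also names a brand.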
Reddit ranks #1 on all three engines. Cross-engine totals: reddit.com (92) · youtube.com (46) · forbes.com (32) · google.com (29) · techradar.com (18).
Source type by query type
% of citations, all engines · "brand/other" = vendor & long-tail pages
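| Query type | brand/other | forum/UGC | video | review | news |
|---|---|---|---|---|---|
| best | 64 | 8 | 5 | 10 | 11 |
| vs | 78 | 7 | 4 | 6 | 4 |
| how-to | 64 | 11 | 11 | 4 | 5 |
| use-case | 80 | 9 | 1 | 7 | 4 |
| alternatives | 66 | 12 | 2 | 6 | 12 |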
How-to questions spike on video (people want to be shown); best/alternatives pull the most news/media (editorial listicles); vs/use-case lean hardest on specific brand pages.
Source type by vertical
% of citations, all engines
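| Vertical | brand/other | review | news | forum | video |
|---|---|---|---|---|---|
| consumer-tech | 54 | 25 | 4 | 8 | 9 |
| health-wellness | 64 | 1 | 15 | 9 | 6 |
| travel-dtc | 69 | 1 | 17 | 7 | 1 |
| b2b-software | 72 | 6 | 2 | 14 | 6 |
| personal-finance | 76 | 7 | 4 | 9 | 3 |
| home-lifestyle | 79 | 6 | 1 | 9 | 5 |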
Concentration & the long tail
Citations are not concentrated in a few big aggregators. The top-10 domains account for only 27–35% of each engine's citations; the remaining 65–73% is spread across the rest of the 123–144 distinct domains each engine cites. Across query types, 64–80% of citations go to specific vendor / niche "brand/other" pages.
A.4 · Limitations
What this study does and doesn't claim
Snapshot in time — captured 16 Jun 2026, US/EN. AI citation behavior shifts quickly; treat absolute shares as a point-in-time reading.
Commercial-query bias by design — these are buyer/transactional questions, so the picture differs from informational queries (e.g. Wikipedia, prominent in mixed-query studies, barely appears here).
Heuristic taxonomy — "brand/other" lumps official sites, product pages, and long-tail blogs; raw domains are retained for re-classification.
"Cited," not "retrieved" — we measured the sources named in the answer, not the wider pool an engine may have read.
Self-references aren't filtered — on AI Overviews, google.com (Shopping/Flights) and youtube.com are Google's own surfaces, not independent publishers; they lift AI Overviews' domain counts and aren't actionable AEO targets (flagged in A.5).
30 prompts × 3 runs — robust for direction and ranking, not a census; per-cell counts are small.
A.5 · The full ranking
Every domain and brand, ranked
The two tables aggregate the entire dataset — every linked source (citation) and every named brand (mention) across all 270 answers, three runs combined. The head is short and the tail is very long. The complete, unabridged lists ship as raw JSON alongside this report — citation-study-data.json and mention-study-data.json — so you can run your own analysis.
Most-cited domains — who AI engines link
Citations = cells linking the domain · 90 cells per engine · top 20 of 339
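| # | Domain | ChatGPT | Gemini | AI Ovw | Total |
|---|---|---|---|---|---|
| 1 | reddit.com | 14 | 17 | 61 | 92 |
| 2 | youtube.com | 1 | 4 | 41 | 46 |
| 3 | forbes.com | 11 | 9 | 12 | 32 |
| 4 | google.com | 0 | 1 | 28 | 29 |
| 5 | techradar.com | 12 | 3 | 3 | 18 |
| 6 | nerdwallet.com | 2 | 0 | 12 | 14 |
| 7 | healthline.com | 7 | 0 | 6 | 13 |
| 8 | goodhousekeeping.com | 5 | 5 | 0 | 10 |
| 9 | mattressnerd.com | 1 | 3 | 6 | 10 |
| 10 | zapier.com | 0 | 3 | 6 | 9 |
| 11 | rtings.com | 3 | 4 | 2 | 9 |
| 12 | cnet.com | 0 | 6 | 3 | 9 |
| 13 | instagram.com | 0 | 0 | 9 | 9 |
| 14 | pcmag.com | 0 | 4 | 3 | 7 |
| 15 | businessinsider.com | 0 | 4 | 3 | 7 |
| 16 | amazon.com | 0 | 0 | 6 | 6 |
| 17 | cleanmyspace.com | 0 | 3 | 3 | 6 |
| 18 | sleepfoundation.org | 3 | 0 | 3 | 6 |
| 19 | slack.com | 2 | 3 | 0 | 5 |
| 20 | popularmechanics.com | 3 | 2 | 0 | 5 |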
339 distinct domains, 987 total citations — but long-tailed: 44% were cited only once, and the top-10 draw barely 28% of all citations. Reddit (92) is cited about twice as often as #2 (youtube.com, 46); AI Overviews drives the overwhelming share of the Reddit, YouTube and Google volume.
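The top-10 share and the Reddit-to-YouTube ratio quoted above can be checked against the published table. A quick sketch of that arithmetic, using only the totals from the table and the 987 figure from the text; the variable names are illustrative:

```python
# Totals for the ten most-cited domains, copied from the table above.
top10_totals = [92, 46, 32, 29, 18, 14, 13, 10, 10, 9]
TOTAL_CITATIONS = 987  # stated in the text above

top10_sum = sum(top10_totals)
top10_share = top10_sum / TOTAL_CITATIONS
reddit_vs_youtube = top10_totals[0] / top10_totals[1]

print(top10_sum)             # 273
print(f"{top10_share:.1%}")  # 27.7%
print(reddit_vs_youtube)     # 2.0
```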
⚠ Google's own surfaces: google.com (29 — all but one from AI Overviews) and youtube.com (46 — 41 from AI Overviews) are Google-owned. The google.com hits are almost all product-shopping queries — Google's AI linking its own Shopping / Flights surfaces, i.e. citing itself. Counted as-is, but these aren't third-party AEO targets: you reach them via a Shopping/Merchant feed or a YouTube video, not by publishing citable content.
Most-named brands — who AI engines recommend
Mentions = cells naming the brand · 90 cells per engine · top 20 of 606
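| # | Brand | ChatGPT | Gemini | AI Ovw | Total |
|---|---|---|---|---|---|
| 1 | Apple | 12 | 17 | 8 | 37 |
| 2 | HubSpot | 7 | 6 | 9 | 22 |
| 3 | YNAB | 6 | 6 | 6 | 18 |
| 4 | Away | 6 | 6 | 6 | 18 |
| 5 | Mailchimp | 6 | 5 | 6 | 17 |
| 6 | Klaviyo | 6 | 6 | 5 | 17 |
| 7 | Shopify | 6 | 7 | 3 | 16 |
| 8 | Salesforce | 5 | 6 | 3 | 14 |
| 9 | Brevo | 4 | 4 | 6 | 14 |
| 10 | ActiveCampaign | 4 | 4 | 6 | 14 |
| 11 | Fidelity | 5 | 3 | 5 | 13 |
| 12 | Zoho | 4 | 3 | 5 | 12 |
| 13 | WooCommerce | 5 | 5 | 2 | 12 |
| 14 | Omnisend | 4 | 3 | 5 | 12 |
| 15 | MailerLite | 3 | 4 | 4 | 11 |
| 16 | Slack | 3 | 5 | 2 | 10 |
| 17 | Monarch Money | 3 | 4 | 3 | 10 |
| 18 | Monos | 3 | 4 | 3 | 10 |
| 19 | Pipedrive | 3 | 3 | 3 | 9 |
| 20 | Trello | 3 | 3 | 3 | 9 |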
606 distinct brands, 1,613 total mentions, with 54% named only once — recommendations spread even wider than source citations. Strikingly, the most-named brands split evenly across all three engines (YNAB and Away are 6/6/6; Mailchimp 6/5/6): so while the engines cite wildly different sources (9–14% overlap, see A.3), they broadly agree on which brands to recommend.
Reading the two tables together
They are different units: a citation's value accrues to a publisher (reddit.com, forbes.com); a mention's accrues to a product (HubSpot, YNAB). A brand can be widely named without its own site being cited — the mention-vs-link gap quantified in A.3.
Part B
The background literature research.
An adversarially-verified review of what's publicly known about AEO — so every external claim in the video and article is independently checked, not repeated on faith.
B.1 · Method
Multi-source sweep → 3-vote verification
The question was decomposed into 6 angles (definition/origin · the academic GEO paper · empirical citation data · citability levers · market growth · skeptical/contrarian). Sources were gathered per angle, key falsifiable claims extracted, then each ranked claim was put through independent 3-vote adversarial verification — voters instructed to refute; a claim needed a majority to survive.
28 sources fetched · 124 falsifiable claims extracted · 25 claims sent to verification · 22 confirmed · 3 refuted / killed · 9 merged into findings
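The majority rule in the verification step can be sketched as a one-line check. This assumes the vote tallies in this report are written as confirm–refute (so 3–0 is unanimous confirmation and 0–3 unanimous refutation); the function name is illustrative:

```python
def survives(votes_to_confirm: int, votes_to_refute: int) -> bool:
    """A claim survives when confirming votes form a strict majority."""
    return votes_to_confirm > votes_to_refute

# Vote tallies that appear in this report, written as (confirm, refute).
tallies = {"3-0": (3, 0), "2-1": (2, 1), "1-2": (1, 2), "0-3": (0, 3)}
for label, (confirm, refute) in tallies.items():
    print(label, "survives" if survives(confirm, refute) else "refuted")
# 3-0 survives
# 2-1 survives
# 1-2 refuted
# 0-3 refuted
```

With three voters a tie is impossible, so the strict comparison is enough.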
B.2 · Verified findings
What survived fact-checking
High confidence · vote 3–0
No one definitively coined the term "AEO." The most-repeated origin story credits Jason Barnard / Kalicube with pioneering it — but that's a self-reported claim, and even his own write-ups disagree on the year. The practice is real; the "who invented it" story isn't settled.
Why it matters: don't state "X coined AEO" as fact. And note that in practice people use "AEO" and "GEO" interchangeably — only GEO has a rigorous academic definition (next finding).
The GEO paper's headline result: the right content changes can boost how prominently a source appears inside an AI answer by up to ~40% in the best case. The three strongest levers were citing your sources (≈+28%), adding direct quotations (≈+41%), and adding original statistics (≈+34%).
Hedge (mandatory): "up to 40%" is best-case, relative, on a citation-prominence proxy metric, measured on a 2023–24 benchmark; the one live-engine validation (Perplexity) was ~22%. Present as direction-of-effect, not a guarantee.
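Why "relative" matters in that hedge: a relative lift is measured against the starting value, so a large percentage can correspond to a small absolute change. A minimal sketch with hypothetical prominence scores (the numbers below are made up for illustration and are not from the GEO paper):

```python
def relative_lift(baseline: float, treated: float) -> float:
    """Relative change of a metric versus its baseline, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (treated - baseline) / baseline

# Hypothetical prominence scores: share of an answer attributed to one source.
baseline_share = 0.10
treated_share = 0.14

lift = relative_lift(baseline_share, treated_share)
absolute_gain = treated_share - baseline_share
print(f"relative lift: {lift:.0%}, absolute gain: {absolute_gain * 100:.0f} points")
# relative lift: 40%, absolute gain: 4 points
```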
The largest disclosed external dataset: Semrush analyzed ~230,000 prompts and 100M+ AI citations across ChatGPT Search, Google AI Mode, and Perplexity over 13 weeks (Jul–Oct 2025). (Vendor-published; Gemini not in scope.)
Why it matters: credible scale, but a dated snapshot — and the gap it leaves on Gemini is exactly what our Part A study fills.
How much an engine's citations overlap with Google's own top-10 organic results varies enormously. Perplexity tracks Google closely — >91% of the domains it cites also rank in the top 10 (82% at the exact-URL level); Google's AI Overviews ~86% domain / ~67% URL; its newer AI Mode ~54% / ~35%. ChatGPT was the clear outlier: Semrush measured it with the weakest overlap of any engine — lowest on both domain and URL, correlating more with Bing than Google — and an independent Ahrefs study found just ~10% of the exact pages ChatGPT cites also rank in Google's top 10 (≈32% at the domain level).
Why it matters: "rank #1 on Google = cited by ChatGPT" is simply false — and two independent datasets agree. Our Part A data pushes it further: the engines also diverge sharply from each other. (Semrush reports domain/URL overlap across 5,000 keywords but gives no ChatGPT figure — only that it ranked weakest; Ahrefs separately measured ChatGPT on 3,311 head terms — ~10% page-level / ~32% domain-level overlap, Sept 2025. ChatGPT cites the right site far more often than the exact ranking page.)
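The difference between domain-level and URL-level overlap is easiest to see in code. The sketch below is one plausible way to compute both, not the method Semrush or Ahrefs used (neither definition is given here); `domain_of` and `overlap` are illustrative names, the hostname handling is deliberately simplified, and the URLs are made up:

```python
from urllib.parse import urlsplit

def domain_of(url: str) -> str:
    """Hostname without a leading 'www.' (a simplification, not a full
    registrable-domain parse)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def overlap(cited_urls, ranking_urls):
    """Share of cited URLs, and of cited domains, that also appear in the ranking."""
    cited, ranked = set(cited_urls), set(ranking_urls)
    if not cited:
        raise ValueError("need at least one cited URL")
    cited_domains = {domain_of(u) for u in cited}
    ranked_domains = {domain_of(u) for u in ranked}
    url_level = len(cited & ranked) / len(cited)
    domain_level = len(cited_domains & ranked_domains) / len(cited_domains)
    return url_level, domain_level

# Made-up URLs for illustration only.
cited = ["https://www.example.com/crm-guide", "https://reviews.example.org/top-crm"]
top10 = ["https://www.example.com/best-crm", "https://another.example.net/crm"]
print(overlap(cited, top10))  # (0.0, 0.5)
```

In the toy case the engine cites a site that ranks, but not the page that ranks, which is the pattern the ChatGPT figures above describe.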
The GEO paper formally coins "generative engines" and frames them as "rapidly replacing" traditional search (3–0); Barnard frames AEO as a discipline distinct from SEO — optimizing for machine understanding and credibility (2–1, attributed viewpoint).
B.3 · Refuted — shown for transparency
Claims that did not survive
Refuted · vote 0–3
GEO's "40% lift" was achieved on deployed commercial engines.
Reality: it was a benchmark / proxy result, not a live-engine measurement. Do not state it as a real-world commercial figure.
Refuted · vote 0–3
The GEO paper is a Princeton-only project.
Reality: omits IIT Delhi and independent co-authors. Attribute to Princeton + IIT Delhi.
Refuted · vote 1–2
Google AI Mode cites LinkedIn ~15% (top) and Wikipedia only ~2%.
Reality: failed verification — excluded from all deliverables.
B.4 · Market context
The "why now" numbers
Verified — gap-fill pass, 2026-06-16
Each figure was confirmed in a targeted verification pass (primary source + a corroborating second source). They are industry-reported by the named orgs and date-stamped — directional "why now" context, not peer-reviewed measurement.
Metric
Reported value
Source
ChatGPT scale
800M+ weekly active users (Oct 2025); 900M cited by early 2026
Time-sensitivity: every empirical citation statistic is a dated snapshot of a fast-moving field — cite with dates.
Source tiers: the GEO-paper facts are peer-reviewed (gold); the citation-behavior data (Semrush, Ahrefs, Profound) is vendor-published (credible, commercial); the AEO-origin material is self-published first-party (use as "X claims").
The GEO 40% must be hedged — best-case, relative, proxy-metric, benchmark-based.
AEO ≈ GEO in practice; any "AEO vs GEO" distinction is editorial framing, not established fact.
Open gap: schema / llms.txt impact appears overstated in early sources, but that specific claim is pending the verification pass before it ships.
Bibliography
Sources (28)
Tiered by quality. primary: peer-reviewed / institutional · secondary: reputable reporting · first-party: self-published (use as attributed claim) · weak: excluded from claims.
Jason Barnard, "The Trustpilot white-paper that started AEO" · first-party · jasonbarnard.com
Every external claim was cross-examined by independent verifiers; only the 22 that survived appear in the deliverables, and the 3 that failed are listed above in full. That's the standard for credibility — and it's why the article that comes out of this is itself built to be cited.