
How we made llmranks.io itself AI-citable

We pointed our own audit engine at llmranks.io — 30 canonical mismatches, copy hidden from Google's renderer, stale AI-facing pricing. Every fix, measured.

We sell citation-readiness. So before opening the platform to the public, we did the only honest thing available: we pointed our own audit engine at our own marketing site and published what it found.

This is that teardown — every number below was measured on llmranks.io in June 2026, before and after the fixes. If a company selling AI visibility had these problems, your site probably has a few of them too.

What our own audit caught

The crawl covered all 31 public pages, fetched the way an AI crawler fetches them: raw HTML, no JavaScript. The product's diagnose checks ran against the result, plus a set of marketing-surface checks we ended up building because of what we found.

The canonical-host contradiction. Every canonical tag, sitemap entry, and JSON-LD @id on the site declared llmranks.io as the official address — while the server 307-redirected that exact address to www.llmranks.io. Thirty pages telling search engines "the real me is over there," with "over there" bouncing visitors straight back. One domain-config change fixed all thirty at once.
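This kind of contradiction is easy to detect mechanically: read the page's canonical tag, follow the server's redirects, and compare the two hosts. Below is a minimal sketch using only Python's standard library. The function names and the user-agent value are illustrative and are not our audit engine's actual code.

```python
from typing import Optional
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so split it before matching.
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.href = attr.get("href")


def canonical_host(html: str) -> Optional[str]:
    """Host named by the page's canonical tag, or None if there is no tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    return urlsplit(finder.href).hostname if finder.href else None


def hosts_agree(html: str, final_url: str) -> bool:
    """True when the canonical tag names the host that actually served the page."""
    return canonical_host(html) == urlsplit(final_url).hostname


def check(url: str) -> bool:
    """Fetch url, following redirects (a 307 included), and compare hosts."""
    request = Request(url, headers={"User-Agent": "canonical-check"})  # illustrative UA
    with urlopen(request, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
        return hosts_agree(html, resp.geturl())  # geturl() is the post-redirect address
```

Run check against both the apex and www forms of your domain. In a setup like the one described above, both requests land on www while the canonical names the apex, so both calls return False.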

Metadata written without counting the template. Our layout appends · LLMRanks — eleven characters — to every title. The titles were written without budgeting for it: 17 ran past Google's ~60-character truncation point, while 9 others (including the pricing page, the single highest-intent line of text we own) were under 20 characters and said almost nothing. Meta descriptions were worse: 26 of 31 fell outside the 120–160 character window, topping out at 304 characters on a page where searchers would see barely half of it.
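The budget arithmetic is worth making explicit, because the suffix is easy to forget when you write a title in isolation. Below is a minimal sketch in Python. The thresholds mirror the numbers above. Applying the 20-character floor to the rendered title (suffix included) is our assumption for illustration.

```python
from typing import Optional

SUFFIX = " · LLMRanks"   # what our layout appends: 11 characters
TITLE_MAX = 60           # approximate point where Google truncates titles
TITLE_MIN = 20           # assumed floor; below this a title says almost nothing
DESC_MIN, DESC_MAX = 120, 160


def title_problem(page_title: str, suffix: str = SUFFIX) -> Optional[str]:
    """Check the title as rendered, suffix included; None means it fits the budget."""
    rendered = len(page_title + suffix)
    if rendered > TITLE_MAX:
        return f"too long: {rendered} chars rendered"
    if rendered < TITLE_MIN:
        return f"too short: {rendered} chars rendered"
    return None


def description_problem(description: str) -> Optional[str]:
    """None if the meta description falls inside the 120-160 character window."""
    n = len(description)
    if not DESC_MIN <= n <= DESC_MAX:
        return f"{n} chars, outside {DESC_MIN}-{DESC_MAX}"
    return None


def writable_budget(suffix: str = SUFFIX) -> int:
    """Characters actually left for the page-specific part of the title."""
    return TITLE_MAX - len(suffix)
```

With an 11-character suffix, the page-specific part of a title has 49 characters to work with, not 60.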

Structured data pointing at pages that don't exist. The breadcrumb markup on our comparison pages pointed to a parent page that 404'd — and three of those pages emitted breadcrumb self-URLs that 404'd too, because the markup was built from a marketing slug instead of the route path. We were feeding crawlers structured data whose URLs failed the most basic test: clicking them.
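To catch this, read the JSON-LD the way a crawler does and then request every URL it names. Below is a minimal sketch using Python's standard library. It assumes breadcrumbs are emitted as schema.org BreadcrumbList objects whose item field is either a URL string or an object with an @id. The helper names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Iterator, List
from urllib.error import HTTPError
from urllib.request import Request, urlopen


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.in_json_ld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_json_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_json_ld = False

    def handle_data(self, data):
        if self.in_json_ld:
            self.blocks[-1] += data  # data may arrive in several chunks


def walk(value) -> Iterator[dict]:
    """Yield every JSON object, so BreadcrumbLists nested in @graph are found too."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from walk(child)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child)


def breadcrumb_urls(html: str) -> List[str]:
    """Every URL named by a BreadcrumbList item in the page's JSON-LD."""
    collector = JsonLdCollector()
    collector.feed(html)
    urls: List[str] = []
    for block in collector.blocks:
        for node in walk(json.loads(block)):
            if node.get("@type") != "BreadcrumbList":
                continue
            for entry in node.get("itemListElement", []):
                item = entry.get("item")
                if isinstance(item, dict):  # item may be an object carrying @id
                    item = item.get("@id")
                if isinstance(item, str):
                    urls.append(item)
    return urls


def status_of(url: str) -> int:
    """HTTP status for url via a HEAD request (redirects are followed)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=30) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
```

Some servers answer HEAD differently from GET. If a URL reports an error here but loads in a browser, retry it with a GET before you treat it as broken.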

The file written for AI engines was lying about our prices. We publish an llms.txt — a plain-text summary designed specifically for LLMs to read. It was a static file, and it had quietly survived a pricing pivot: for about a month, the one document aimed directly at AI engines described a tier ladder we no longer sold. Nothing on the visible site was wrong. Only the machine-facing copy was.
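A static file has nothing tying it to the code that charges customers. The alternative is to render llms.txt from the same data that billing reads. Below is a minimal sketch in Python that follows the llms.txt convention of an H1 name plus a blockquote summary. The Plan type, the plan names, the prices, and the credit amounts are all hypothetical placeholders, not our actual tiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_usd: int
    credits: int


# Hypothetical stand-in for the constants billing reads. In this sketch the
# pricing page, the schema offers, and llms.txt would all import this one list.
PLANS: List[Plan] = [
    Plan("Starter", 29, 100),
    Plan("Growth", 99, 500),
]


def render_llms_txt(plans: List[Plan]) -> str:
    """Build llms.txt on request instead of shipping a hand-edited static file."""
    lines = [
        "# LLMRanks",
        "",
        "> AI visibility audits and citation-readiness checks.",
        "",
        "## Pricing",
    ]
    for plan in plans:
        lines.append(f"- {plan.name}: ${plan.monthly_usd}/month, {plan.credits} credits")
    return "\n".join(lines) + "\n"
```

In a web framework, render_llms_txt(PLANS) would back a route that serves /llms.txt as text/plain. A pricing change and an llms.txt change then become the same commit.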

Assorted self-inflicted wounds. No og:image anywhere, so every shared link rendered as a bare gray card. An organization-schema logo URL that 404'd. A Content-Security-Policy that silently blocked a chunk of our own analytics beacons. And a hydration error on the pricing page traced to a <tbody> nested inside a <tbody> — invalid HTML the browser rewrote, which React then refused to reconcile.

The part no crawler test catches

Here's the finding we think most marketing sites share without knowing it.

LLM crawlers generally don't execute JavaScript; they read raw HTML, and ours was complete. Fetching our pages as an AI crawler returned 1,800–3,800 words per page with every claim intact. For ChatGPT, Claude, and Perplexity, the site was always fully readable.

Google is different: it renders pages with JavaScript and takes what amounts to a visual snapshot. Our homepage sections animated in on scroll — opacity: 0 until they entered the viewport. In a render where nobody scrolls, that meant 52% of the homepage's content words were invisible at a standard viewport. Even at the very tall viewport Google's renderer uses, 22% of the copy never triggered its animation, because the page is 17,000 pixels tall.

The measurement is simple enough to reproduce: walk the DOM's text nodes, skip scripts and styles, and count any text whose ancestors compute to an opacity below 0.05 as hidden. We ran it before and after.
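The counting logic is small enough to show. In a real run, each element's opacity would come from a headless browser, for example via getComputedStyle after the page loads. To keep the sketch self-contained, the rendered tree is modeled by a hypothetical Node dataclass. The 0.05 threshold is the one described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SKIPPED_TAGS = {"script", "style"}
HIDDEN_BELOW = 0.05  # computed opacity under this counts as invisible


@dataclass
class Node:
    """Hypothetical snapshot of one rendered element."""
    tag: str
    opacity: float = 1.0   # the element's own computed opacity
    text: str = ""         # text nodes directly inside this element
    children: List["Node"] = field(default_factory=list)


def count_words(node: Node, ancestor_hidden: bool = False) -> Tuple[int, int]:
    """Return (hidden_words, total_words) for the subtree rooted at node."""
    if node.tag in SKIPPED_TAGS:
        return 0, 0
    # Text is hidden if this element or any ancestor is below the threshold.
    hidden_here = ancestor_hidden or node.opacity < HIDDEN_BELOW
    words = len(node.text.split())
    hidden, total = (words if hidden_here else 0), words
    for child in node.children:
        child_hidden, child_total = count_words(child, hidden_here)
        hidden += child_hidden
        total += child_total
    return hidden, total


def hidden_fraction(root: Node) -> float:
    hidden, total = count_words(root)
    return hidden / total if total else 0.0
```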

The fix did not mean deleting the animations. Fade-ins now start at 25% opacity instead of zero and trigger 400 pixels before an element enters view. To a human the motion looks the same; to Google's snapshot, nothing is hidden anymore. After deploying, the measurement reads 0% hidden. The only text still collapsed is inside the FAQ accordions, which Google explicitly treats as indexed content and which our FAQPage markup mirrors anyway.
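A toy model of a render that never scrolls shows why the opacity floor does more for the snapshot than the earlier trigger does. In this model, a section counts as hidden only if it never triggers and its starting opacity is below the 0.05 threshold. The section positions, the viewport heights, and the bottom-edge trigger margin (similar in spirit to an IntersectionObserver rootMargin) are illustrative assumptions, not measurements of our homepage.

```python
from typing import List, Tuple

HIDDEN_BELOW = 0.05  # same visibility threshold as the measurement


def hidden_share(
    sections: List[Tuple[int, int]],
    viewport_height: int,
    initial_opacity: float,
    trigger_margin: int,
) -> float:
    """Share of words still hidden in a render that never scrolls.

    sections holds (top offset in px, word count) pairs. A section animates
    to full opacity once its top lies within the viewport extended downward
    by trigger_margin; otherwise it stays at initial_opacity.
    """
    total = sum(words for _, words in sections)
    hidden = 0
    for top, words in sections:
        triggered = top < viewport_height + trigger_margin
        if not triggered and initial_opacity < HIDDEN_BELOW:
            hidden += words
    return hidden / total if total else 0.0


# Illustrative page: four 100-word sections at these vertical offsets (px).
SAMPLE = [(0, 100), (900, 100), (5000, 100), (16000, 100)]

if __name__ == "__main__":
    print(hidden_share(SAMPLE, 800, 0.0, 0))      # 0.75: only the first section shows
    print(hidden_share(SAMPLE, 800, 0.0, 400))    # 0.5: margin rescues the near-fold section
    print(hidden_share(SAMPLE, 12000, 0.0, 0))    # 0.25: a taller viewport still misses the bottom
    print(hidden_share(SAMPLE, 800, 0.25, 400))   # 0.0: the opacity floor clears everything
```

In this model, the trigger margin only rescues sections near the fold. The starting opacity is what determines whether anything stays invisible, whatever the viewport height.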

Before and after

Check | Before | After
Pages with canonical/serving-host mismatch | 30 | 0
Meta descriptions outside 120–160 chars | 26 of 31 | 0
Titles over ~60 or under 20 chars | 26 of 31 | 0
Broken breadcrumb target URLs | 8 | 0
Homepage copy hidden in Google's render | 52% | 0%
Pages with a social share image | 0 | all
llms.txt accuracy | ~1 month stale | renders from billing code

Making it stay fixed

Finding problems once is an audit. Keeping them fixed is architecture. Three changes:

  1. Facts now have one source. Prices, the engine lineup, CMS integrations, and credit costs derive from the same constants that drive billing. The pages, the FAQ, the schema offers, and llms.txt all render from those constants; llms.txt is no longer a file at all, but a route that cannot disagree with what the product charges.
  2. The audit runs on every deploy. A CI job re-crawls the live site after each production deployment and fails loudly on metadata-budget violations, canonical mismatches, broken breadcrumb targets, duplicate titles, and routes missing from the sitemap. New pages are covered the moment they exist.
  3. The boring discipline. Every competitor claim on our comparison pages now carries an as-of date and a source URL, because we re-verified each one against the competitor's live pricing page and found one of them mid-promotion — a claim that would have read as false the day we shipped it.
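Most of the post-deploy audit is plumbing, but two of its checks are easy to get subtly wrong: duplicate titles and routes missing from the sitemap. Below is a minimal sketch of both in Python. It assumes the crawl has already produced a mapping from each live route to its rendered title; the crawl itself is omitted. The function names are hypothetical. A nonzero exit code is what fails the CI job.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Set

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> Set[str]:
    """All <loc> values from a standard (non-index) sitemap."""
    root = ET.fromstring(xml_text)
    return {(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)}


def audit(titles: Dict[str, str], sitemap: Set[str]) -> List[str]:
    """titles maps each live route URL to its rendered <title>."""
    problems: List[str] = []
    routes_by_title: Dict[str, List[str]] = defaultdict(list)
    for url, title in titles.items():
        routes_by_title[title].append(url)
        if url not in sitemap:
            problems.append(f"missing from sitemap: {url}")
    for title, urls in routes_by_title.items():
        if len(urls) > 1:
            problems.append(f"duplicate title {title!r}: {', '.join(sorted(urls))}")
    return problems


def main(titles: Dict[str, str], sitemap_xml: str) -> int:
    problems = audit(titles, sitemap_urls(sitemap_xml))
    for problem in problems:
        print(f"FAIL {problem}", file=sys.stderr)
    return 1 if problems else 0  # nonzero exit status fails the CI job
```

In CI, the script would end with sys.exit(main(titles, sitemap_xml)), so that any reported problem stops the pipeline loudly instead of scrolling past in a log.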

Five checks you can run on your own site today

  1. Fetch a key page the way an LLM does. curl it with an AI crawler user-agent, strip the tags, and count the words. If the number is near zero, AI engines cannot cite you, full stop.
  2. Compare your canonical host to your serving host. Request both the apex and www versions of your domain and watch the redirects. Then look at which one your canonical tags name. They should agree.
  3. Count your titles with the template suffix included. Whatever your layout appends, it counts against the same ~60 characters.
  4. Fetch every URL in your structured data. Breadcrumb items, schema logos, sitemap entries. Anything that 404s is an anti-signal you wrote yourself.
  5. Measure what's visible without scrolling. If your sections animate from opacity: 0, check what fraction of your copy is hidden in a render that never scrolls — that render is closer to Google's view than your own browsing is.
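The first check fits in a few lines of standard-library Python. The script fetches raw HTML with a crawler-style user-agent and never runs JavaScript. It then drops script and style contents and counts whitespace-separated words. The user-agent value is a placeholder assumption; substitute the full string published by whichever crawler you care about.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Placeholder: use the full user-agent string your target AI crawler publishes.
AI_USER_AGENT = "GPTBot"


class WordCounter(HTMLParser):
    """Counts words in raw HTML the way a non-rendering crawler sees them."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words += len(data.split())


def word_count(html: str) -> int:
    counter = WordCounter()
    counter.feed(html)
    counter.close()
    return counter.words


def fetch_word_count(url: str, user_agent: str = AI_USER_AGENT) -> int:
    """Fetch url without running any JavaScript and count its words."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return word_count(resp.read().decode(charset, errors="replace"))
```

A client-rendered page whose body is an empty root div plus script tags scores zero here. That is exactly the near-zero result the check is looking for.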

Or skip the manual work: the free AI visibility check runs your domain through three engines in about thirty seconds, no account needed. It's a bounded slice of the same pipeline this teardown came from — which is, of course, the point.
