v0.2.0 benchmark · July 1, 2026 · 5 arms, 25 runs

llms.txt saves an agent real tokens. But the agent won't read it unless you ask.

Page-side arms (three link styles plus a no-link control): 0 / 20 reps used llms.txt
One line in the prompt: 5 / 5 reps used it
Tokens, with the nudge: −33% on the same page

Summary

Ship the discovery layer, then tell the agent to use it.

Publishing an llms.txt is increasingly common, though far from universal: in our survey of the CrUX top-100k, only about 1 in 4 sites (27% of the 50,074 we reached) ship one. The harder part is getting an AI agent to actually read the file. In an earlier a14y benchmark, a clean discovery layer (an llms.txt index plus markdown mirrors) cut an agent's token use roughly in half at the same answer quality. In a new benchmark, the agent captured that saving only when we told it to look. Not one of the in-page signals we tried (a <head> tag, a visible "For agents" footer link) got it to use llms.txt on its own. A single line in the prompt did.


The payoff is real

Let's start with the good news. In our earlier discovery-layer study, we ran the same evaluation task twice against the same site: once with the agent-discovery files served, once with them missing. Same content, same prompt, same model. The discovery layer cut the cost about in half.

Condition	Tokens (mean)	Tool calls	Answer quality (judge)
Discovery layer present	~123K	~10	tied
Discovery layer absent	~240K	~20	tied

Roughly half the tokens, about half the tool calls, no measurable drop in quality. When an agent can pull a tight markdown index instead of crawling and re-parsing a pile of HTML, it does less work to reach the same answer.
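To make "a tight markdown index" concrete, here is a minimal sketch of how an agent-side helper might pull the link list out of an llms.txt file. The sample file contents, the `/docs/start.md` path, and the `index_links` helper are hypothetical, written for illustration only; the sketch assumes the common llms.txt layout of markdown bullet links and is not the parser any particular agent uses.

```python
import re

# Hypothetical llms.txt contents, invented for this sketch.
SAMPLE_LLMS_TXT = """\
# Example Docs

> Hypothetical index used only for this sketch.

## Checks

- [llms-txt.exists](/scorecards/0.2.0/checks/llms-txt.exists.md): what the check verifies
- [Getting started](/docs/start.md)
"""

# A bullet line that starts with a markdown link: "- [title](url)".
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)")


def index_links(llms_txt: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs for every bullet link in the index."""
    links = []
    for line in llms_txt.splitlines():
        match = LINK_RE.match(line)
        if match:
            links.append((match.group(1), match.group(2)))
    return links


for title, url in index_links(SAMPLE_LLMS_TXT):
    print(f"{title} -> {url}")
```

The point of the sketch is the size of the work: one small text file yields the full list of markdown pages to read, with no HTML crawling or parsing.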

But the agent won't take the payoff on its own

Five arms, identical content and identical discovery files (all returning 200). The only thing that changed was how the page points at /llms.txt. Claude 2.1.141, isolated Docker, five reps each, adoption read from the web server's request log.

Arm	Signal	Fetched /llms.txt	Tokens (avg ± sd)	Tool calls (avg)	Page GETs	Judge (avg ± sd)	Pass rate
no-link (control)	none	0 / 5	266,591 ± 32,066	14.6	59	75.2 ± 6.8	80%
head link	`<link rel="llms-txt">`	0 / 5	257,743 ± 53,953	13.2	47	72.4 ± 8.3	80%
head link	`<link rel="alternate" type="text/markdown">`	0 / 5	254,641 ± 47,968	14.4	65	80.4 ± 5.5	100%
visible footer link	`"For agents" block`	0 / 5	251,741 ± 38,857	12.8	48	76.0 ± 3.3	100%
nudge	`prompt: "check /llms.txt first"`	5 / 5 (+4/5 sitemap.md)	177,735 ± 32,131	17.0	91	88.8 ± 1.6	100%

We tried three genuinely different ways to point the page at /llms.txt: a machine-readable <link rel="llms-txt"> in the head, a standards-style <link rel="alternate" type="text/markdown">, and a plain visible "For agents" footer link a human would click. The agent ignored all three. The four page arms cluster tightly, 252K to 267K tokens and 0 of 5 adoption, with the spread between link styles sitting well inside the noise (token sd of 32K to 54K, judge sd of 3 to 8). The agent received the bytes (it fetched the homepage, head tags and all) and acted on none of them.
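The "inside the noise" claim can be checked by hand from the table. The sketch below is a rough comparison, not a significance test: it compares the gap between the highest and lowest page-arm token means with the smallest within-arm standard deviation. The numbers are copied from the table; the variable names are ours.

```python
# (mean tokens, standard deviation) for the four page arms, from the table.
page_arms = {
    "no-link (control)": (266_591, 32_066),
    "head link, rel=llms-txt": (257_743, 53_953),
    "head link, rel=alternate": (254_641, 47_968),
    "visible footer link": (251_741, 38_857),
}

means = [mean_tokens for mean_tokens, _ in page_arms.values()]
sds = [sd for _, sd in page_arms.values()]

# Gap between the most and least expensive page arm.
spread_between_arms = max(means) - min(means)
# The tightest rep-to-rep variation seen inside any single arm.
smallest_within_arm_sd = min(sds)

print(f"spread between arm means: {spread_between_arms:,}")
print(f"smallest within-arm sd:   {smallest_within_arm_sd:,}")
print("inside the noise:", spread_between_arms < smallest_within_arm_sd)
```

The gap between arms (about 15K tokens) is less than half of even the tightest arm's standard deviation (about 32K), which is why the link styles cannot be ranked against each other from this data.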

It's not that we picked the wrong link format. Every link format failed. The lever is the instruction, not the markup. We changed one line of the prompt and left the page exactly as it was, with no link at all. Adoption went to five out of five (and four of five also pulled sitemap.md), and tokens dropped from 266,591 to 177,735. That is 88,856 fewer tokens, a 33% reduction on the same page, attributable purely to the agent using a layer that was already there in every arm. Sitting next to the earlier ~2x, the two figures bracket the opportunity: the prior study measured having the layer at all, this one measures actually using it, isolated to a single line of prompt.
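The arithmetic behind the two figures is short enough to spell out. This sketch uses only numbers from the two tables; the earlier study's means are the rounded ~240K and ~123K values, so its ratio is approximate.

```python
# This benchmark: same page, with and without the prompt nudge.
control_tokens = 266_591
nudge_tokens = 177_735

saved = control_tokens - nudge_tokens
reduction = saved / control_tokens

# Earlier study: discovery layer absent vs present (rounded means, in thousands).
absent_k, present_k = 240, 123
ratio = absent_k / present_k

print(f"tokens saved by the nudge: {saved:,}")
print(f"reduction on the same page: {reduction:.1%}")
print(f"earlier study, absent / present: {ratio:.2f}x")
```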

A few details from the run that sharpen the picture:

The cost is ingestion, not generation. Output tokens were tiny across every arm, roughly 5K to 9K per rep. Almost the entire bill is input plus cache, the cost of reading the site in. That is exactly the cost the discovery layer cuts, which is why "use the index" shows up as a token drop and not a quality drop.
More fetches, far less content. The nudge arm actually made more requests than the control (91 page GETs versus 59), yet used fewer tokens. It pulled the lean .md mirrors (for example /scorecards/0.2.0/checks/llms-txt.exists.md) instead of full HTML pages. More round trips, a fraction of the bytes each.
Quality went up, not down. The nudge arm posted the highest judge score (88.8) and the tightest spread (± 1.6) of any arm, at 100% pass. Using the structured index made the answers cheaper and more consistent.
Clean measurement. Zero requests hit the live a14y.dev in any arm. Adoption was read from the static server's log, filtered to the agent container's IP, so there is no contamination from background traffic. (a14y.dev's own copy documents the llms-txt.exists check, so this was a friendly case for content-based discovery; adoption was still 0 of 5 without the nudge.)
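The log-based adoption check described above can be sketched as follows. This is a minimal illustration, not the benchmark harness: the log lines, the IP addresses, and the Common Log Format layout are all assumptions made for the example, and `fetched_paths` and `adopted` are hypothetical helper names.

```python
import re

# Hypothetical access-log lines in Common Log Format (the real format is not
# given in this article). 172.18.0.5 stands in for the agent container's IP.
SAMPLE_LOG = """\
172.18.0.5 - - [27/Jun/2026:10:00:01 +0000] "GET / HTTP/1.1" 200 5120
172.18.0.5 - - [27/Jun/2026:10:00:02 +0000] "GET /llms.txt HTTP/1.1" 200 2048
172.18.0.9 - - [27/Jun/2026:10:00:03 +0000] "GET /llms.txt HTTP/1.1" 200 2048
172.18.0.5 - - [27/Jun/2026:10:00:04 +0000] "GET /sitemap.md HTTP/1.1" 200 900
"""

LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def fetched_paths(log_text: str, agent_ip: str) -> list[str]:
    """List the paths requested via GET by one client IP, in log order."""
    paths = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match["ip"] == agent_ip and match["method"] == "GET":
            paths.append(match["path"])
    return paths


def adopted(log_text: str, agent_ip: str, target: str = "/llms.txt") -> bool:
    """True if the given client fetched the target path at least once."""
    return target in fetched_paths(log_text, agent_ip)


print(fetched_paths(SAMPLE_LOG, "172.18.0.5"))
print(adopted(SAMPLE_LOG, "172.18.0.5"))
```

Filtering on the client IP before counting is what keeps background traffic (the second IP in the sample) from being mistaken for agent adoption.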

This is not just a Claude quirk

It would be easy to write this off as one model's behavior on one site. It is not. When we went looking for whether any major agent or crawler consumes llms.txt in the wild, the evidence pointed the same direction.

The cleanest distinction to hold onto is publish versus consume. The publish side is where the momentum is. Cursor serves an llms.txt for its own docs. Anthropic publishes one. Every site built on Mintlify gets an llms.txt (and an llms-full.txt) automatically, zero maintenance, which is how a whole category of docs sites gained one at once. That is a real and healthy trend. But publishing a file is not the same as an agent reading one, and on the consume side the record is close to empty.

Ahrefs analyzed 137,210 domains. Of the roughly 38,000 with a valid llms.txt, 97% received zero requests for that file in the month studied. Of the requests that did land, most were ordinary bots, not AI agents. No AI bot went looking for files that did not exist.
Google's John Mueller has said, more than once, that no AI system currently uses llms.txt, that you can see in your server logs the bots do not even check for it, and that it is "comparable to the keywords meta tag." When llms.txt files showed up on Google's own developer properties, he said plainly it was a CMS artifact, not an endorsement.
Google's official generative-AI search guidance now states you do not need to create llms.txt or any special machine-readable files to appear in AI search, and that Search ignores them: keeping one is fine, it just "won't harm (nor help)" rankings.
Independent server-log audits (Adobe AEM, OtterlyAI) put verified LLM fetches of llms.txt at roughly 0.1% to 1% of traffic to those files. Most of the rest is scanners and auditors.

And the tools people assume are quietly reading it? For GitHub Copilot, Windsurf, Cline, Aider, Continue, Devin, ChatGPT's browsing and Atlas modes, and Gemini's agent, we found no public statement either way (as of this research, mid-2026). Silence is not adoption. We also went in expecting to find that Claude Code would fetch llms.txt when linked from the page. The hope did not survive verification. Claude Code can show up in server logs fetching llms.txt, but only when a user or a prompt points it there, not on its own. We don't have any evidence that any major agent fetches llms.txt by default.

What to actually do with this

The wrong conclusion here would be "don't bother." That is not what the data says: the value of the discovery layer is real and significant, and it lands whenever you control the agent or its instructions. Three situations where that is true today.

Your own agents

If you build a docs assistant, a support bot, an MCP server, or an internal coding agent that reads your site, you write the prompt. Tell it to start from /llms.txt. This data says that a single instruction is worth roughly a third of the tokens on the spot, and up to half against a no-discovery baseline.
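If you own the prompt, the nudge can live in one small helper. The sketch below is an assumption-laden illustration: the nudge wording is modeled on the "check /llms.txt first" line from the benchmark table but is not the exact prompt we ran, and `with_discovery_nudge` is a hypothetical name, not part of any agent framework.

```python
# Hypothetical nudge wording, modeled on the benchmark's "check /llms.txt first".
NUDGE = (
    "Before browsing the site, check /llms.txt first "
    "and prefer the markdown pages it links to."
)


def with_discovery_nudge(task_prompt: str, nudge: str = NUDGE) -> str:
    """Prepend the nudge unless the prompt already mentions /llms.txt."""
    if "/llms.txt" in task_prompt:
        return task_prompt
    return f"{nudge}\n\n{task_prompt}"


print(with_discovery_nudge("Explain what the llms-txt.exists check verifies."))
```

Putting the instruction ahead of the task keeps it in front of the agent before its first fetch, and the guard avoids stacking the line twice when a prompt already carries it.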

Agentic tooling you can configure

A growing set of harnesses let the user or the project point the agent at a markdown index. The payoff shows up the moment the pointer exists. Notably, Chrome's experimental "agentic browsing" audit in Lighthouse already checks for llms.txt, a hint that the consume side may start catching up to the publish side.

Future-proofing at near-zero cost

Publishing the layer is cheap and mostly static. If and when agents start consuming it by default, you are already done. If they never do, you have lost very little, and your own tooling benefited the whole time.

What the data definitely does not support is treating llms.txt as an SEO move for third-party AI search. Today, the big crawlers mostly ignore it. Publish it because it makes the agents you point at your content dramatically cheaper and more accurate, not because ChatGPT or Gemini will discover it on their own.

Also, llms.txt is not a standard. It is a good idea that caught on, but a better convention may well replace it. What the data backs is narrower and more durable: a clean and lean discovery layer saves real tokens, and the way you capture that saving today is to tell the model to use it. So bet on the two things that hold no matter how the conventions shake out, the token savings and the instruction, rather than on llms.txt being the permanent answer.

Why keep this in the scorecard

At a14y we score sites on agent-readability, and llms.txt is part of that score. This research does not weaken that; it sharpens it. A high score means your site is ready to be read cheaply by an agent. Whether a given agent takes you up on it depends on how that agent was built or prompted, which is exactly the variable we just isolated. We would rather tell you both sides than sell you one.

If you want to see the gap for yourself, run a14y against your docs, then point your own agent at the llms.txt we find (or flag as missing) and watch the token count. That is the experiment that convinced us, and it takes about ten minutes. We would genuinely like to see your numbers, especially if you can get any agent to adopt llms.txt from a page-side link alone.

npx a14y https://example.com

Caveats

Where to read these numbers with care.

Small n, one setting. Five reps per arm (adoption is a low-frequency, near-binary event), a single retrieval prompt, a single model (Claude 2.1.141), and a single site. Treat the adoption pattern as directional, not the last word.
One site's content. Every arm serves the same a14y.dev content; only the linking changes. And because a14y.dev documents its own discovery checks, it is a friendly case for content-based discovery. The relative ordering should generalize; the absolute numbers are tied to this content.
A snapshot of a moving target. Agent behavior and vendor support change fast. The "no public statement" list and the "no agent fetches by default" finding are true as of mid-2026 and worth re-checking over time.

Method notes

The page-side and nudge figures are from the a14y benchmark llms-txt-linking-2026-06-27 (Claude 2.1.141, isolated Docker, n=5 per arm), with adoption read from the static server's request log filtered to the agent container's IP. The ~2x figure is from the earlier discovery-layer case study. The agent reached our test build via in-container HTTP, so it received the full page including the head tags and chose not to act on them. External claims are linked inline. Full data and the reproducible harness live in the a14y benchmark repo.