Changes vs v0.2.0
+7 new · 2 bumped · 0 removed
- + added discovery.in-page-link Agent files are linked in-page v1.0.0 by @unknown · #0
- + added discovery.no-duplicate-content No URLs share a canonical with another announced URL v1.0.0 by @unknown · #0
- + added html.ssr-content Initial HTML contains substantive text v1.0.0 by @unknown · #0
- + added http.no-interstitial Content is not gated behind a blocking interstitial v1.0.0 by @unknown · #0
- + added markdown.navigation-stripped Markdown mirror has navigation chrome stripped v1.0.0 by @timothyjordan · #34
- + added markdown.size-reduction Markdown mirror is meaningfully smaller than the HTML v1.0.0 by @timothyjordan · #34
- + added markdown.valid-markdown Markdown mirror is actually markdown v1.0.0 by @timothyjordan · #34
- ~ bumped html.json-ld.date-modified JSON-LD declares dateModified v1.0.0 → v1.1.0 by @jesserobbins · #13
- ~ bumped sitemap-xml.has-lastmod sitemap entries include <lastmod> v1.0.0 → v1.1.0 by @jesserobbins · #13
Scorecard v0.3.0-draft
Draft scorecard, subject to change before release. PRs that add or revise checks land here, and this manifest is frozen when the release is cut. See CONTRIBUTING.md.
Site checks
Evaluated once per audit against the site's origin. These cover discoverability surfaces: llms.txt, robots.txt, sitemaps, and agent skill files.
Discoverability · 16 checks
- llms-txt.exists llms.txt is published Pass if llms.txt or llms-full.txt is reachable at /, /.well-known/, or /docs/.
- llms-txt.content-type llms.txt served as text/plain Pass if the llms.txt response Content-Type starts with text/plain.
- llms-txt.non-empty llms.txt is not empty Pass if the llms.txt body has any non-whitespace content.
- llms-txt.md-extensions llms.txt links use .md or .mdx Pass if every link in llms.txt points at a .md or .mdx URL (the format agents can ingest cleanly).
- robots-txt.exists robots.txt is published Pass if /robots.txt returns a 2xx response.
- robots-txt.allows-ai-bots robots.txt allows AI bots Pass if robots.txt does not disallow GPTBot, ClaudeBot, CCBot, or Google-Extended from fetching the site root.
- robots-txt.allows-llms-txt robots.txt does not disallow llms.txt Pass if robots.txt rules allow all user-agents to fetch /llms.txt and /.well-known/llms.txt.
- sitemap-xml.exists sitemap.xml is published Pass if /sitemap.xml (or sitemap_index.xml / sitemap-index.xml) returns a 2xx response.
- sitemap-xml.valid sitemap.xml parses as urlset or sitemapindex Pass if the sitemap parses as XML and contains <urlset> or <sitemapindex>.
- sitemap-xml.has-lastmod sitemap entries include <lastmod> Pass if every <url> in sitemap.xml has a <lastmod> child whose value parses as a W3C Datetime (the format sitemaps.org requires).
- sitemap-md.exists sitemap.md is published Pass if /sitemap.md, /docs/sitemap.md, or /.well-known/sitemap.md returns a 2xx response.
- sitemap-md.has-structure sitemap.md has headings and links Pass if sitemap.md contains at least one heading and one link.
- agents-md.exists AGENTS.md (or equivalent) is published Pass if any of AGENTS.md, agents.md, .well-known/agents.md, docs/AGENTS.md, llms-full.txt, CLAUDE.md, .cursor/rules, or .cursorrules is reachable.
- agents-md.has-min-sections agent skill file documents at least 2 of install/config/usage Pass if the discovered skill file has heading-level sections matching at least 2 of: installation, configuration, usage/examples.
- discovery.no-duplicate-content No URLs share a canonical with another announced URL Pass if no two crawled URLs collapse to the same canonical. N/A in single-page mode (no cross-page view).
- discovery.in-page-link Agent files are linked in-page Pass if a top-level page (the root URL or a first-level path like /docs) links in-page (in-DOM <a href>) to an agent-discovery file (/llms.txt, /llms-full.txt, /sitemap.md, /AGENTS.md, or the page's .md mirror); warn if only a deeper page does; fail if no crawled page does. N/A in single-page mode.
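Two of the discoverability checks above reduce to small text-processing rules. The sketch below is a minimal Python approximation, assuming the robots.txt and sitemap bodies have already been fetched: `allows_ai_bots` leans on the standard library's `urllib.robotparser` for robots-txt.allows-ai-bots, and `sitemap_has_lastmod` pairs a hand-written W3C Datetime pattern with `xml.etree` for sitemap-xml.has-lastmod. The function names are hypothetical, not the scorecard's actual implementation, and the sitemap helper only handles a `<urlset>` in the sitemaps.org namespace.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from urllib.robotparser import RobotFileParser

# User-agents named by robots-txt.allows-ai-bots.
AI_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")


def allows_ai_bots(robots_txt: str, origin: str) -> bool:
    """Hypothetical helper: pass if no AI bot is disallowed from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = origin.rstrip("/") + "/"
    return all(parser.can_fetch(bot, root) for bot in AI_BOTS)


# W3C Datetime profile used by sitemaps.org: YYYY, YYYY-MM, YYYY-MM-DD, or a
# date plus hh:mm[:ss[.s]] followed by a mandatory timezone designator.
W3C_DATETIME = re.compile(
    r"^(?P<y>\d{4})(?:-(?P<m>\d{2})(?:-(?P<d>\d{2})"
    r"(?:T(?P<hh>\d{2}):(?P<mm>\d{2})(?::(?P<ss>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2}))?)?)?$"
)


def is_w3c_datetime(value: str) -> bool:
    match = W3C_DATETIME.match(value.strip())
    if match is None:
        return False
    try:
        # Missing month/day default to 1 so date() still validates the parts present.
        date(int(match["y"]), int(match["m"] or 1), int(match["d"] or 1))
    except ValueError:
        return False
    if match["hh"] is not None:
        if int(match["hh"]) > 23 or int(match["mm"]) > 59 or int(match["ss"] or 0) > 59:
            return False
    return True


SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_has_lastmod(xml_text: str) -> bool:
    """Hypothetical helper: pass if every <url> carries a valid <lastmod>."""
    root = ET.fromstring(xml_text)
    for url in root.findall(f"{SITEMAP_NS}url"):
        lastmod = url.find(f"{SITEMAP_NS}lastmod")
        if lastmod is None or not lastmod.text or not is_w3c_datetime(lastmod.text):
            return False
    return True
```

A single missing or malformed `<lastmod>` fails the whole check, matching the "every `<url>`" wording. A `<sitemapindex>` has no `<url>` children, so it passes this sketch vacuously; a real checker may want to treat that case differently.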
Page checks
Evaluated for every discovered page. These cover HTTP basics, HTML metadata, structured data, content structure, markdown mirrors, code blocks, and APIs.
HTTP · 5 checks
- http.status-200 Page returns HTTP 200 Pass if the final response status (after redirects) is exactly 200.
- http.redirect-chain Redirect chain is at most 1 hop Pass if the page was reached with 0 or 1 redirect hops.
- http.content-type-html Content-Type is text/html; charset=utf-8 Pass if the response Content-Type is text/html and declares a utf-8 charset. N/A on URLs with non-HTML extensions (.md, .json, .xml, etc.).
- http.no-noindex-noai x-robots-tag does not block agents Pass if the x-robots-tag response header does not contain noindex, noai, or noimageai.
- http.no-interstitial Content is not gated behind a blocking interstitial Pass if the initial HTML response does not contain a known consent-platform interstitial (OneTrust, Cookiebot, TrustArc, Sourcepoint, Quantcast) or a top-level open dialog. AI crawlers cannot click "Accept", so a blocking modal hides the content from them entirely.
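The two header-driven HTTP checks can be approximated as below. This is a minimal sketch assuming the final response URL and header values are already in hand; the function names are hypothetical, and `NON_HTML_EXTENSIONS` is an assumed, non-exhaustive list because the check only says ".md, .json, .xml, etc.". Splitting the x-robots-tag value into tokens (rather than substring matching) is also an assumption, chosen so that a bot-scoped value like `googlebot: noindex` still fails.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed extension list; the check description ends with "etc.".
NON_HTML_EXTENSIONS = (".md", ".json", ".xml", ".txt")


def content_type_html(url: str, content_type: Optional[str]) -> Optional[bool]:
    """http.content-type-html: True/False for pass/fail, None for N/A."""
    if urlsplit(url).path.lower().endswith(NON_HTML_EXTENSIONS):
        return None
    if not content_type:
        return False
    media_type, *params = (part.strip() for part in content_type.split(";"))
    charset = None
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    return media_type.lower() == "text/html" and charset == "utf-8"


BLOCKING_DIRECTIVES = {"noindex", "noai", "noimageai"}


def no_noindex_noai(x_robots_tag: Optional[str]) -> bool:
    """http.no-noindex-noai: pass unless a blocking directive appears."""
    if not x_robots_tag:
        return True
    # Split on commas, whitespace, and colons so "googlebot: noindex" yields "noindex".
    tokens = set(re.split(r"[,\s:]+", x_robots_tag.lower()))
    return tokens.isdisjoint(BLOCKING_DIRECTIVES)
```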
HTML metadata · 5 checks
- html.canonical-link Has <link rel="canonical"> Pass if the page declares a canonical URL via <link rel="canonical">. N/A on non-HTML responses.
- html.meta-description Has meta description (>= 50 chars) Pass if <meta name="description"> exists and its content is at least 50 characters.
- html.og-title Has og:title Pass if <meta property="og:title"> exists with non-empty content.
- html.og-description Has og:description Pass if <meta property="og:description"> exists with non-empty content.
- html.lang-attribute Root <html> has lang attribute Pass if the <html> element declares a lang attribute.
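The five HTML metadata checks all read attributes from a handful of elements, so one pass of the standard library's `html.parser.HTMLParser` can feed all of them. In this sketch, `MetadataCollector` and `check_metadata` are hypothetical names; treating an empty `lang` or canonical `href` as a failure, and measuring the description length after trimming whitespace, are assumptions rather than documented behavior. The N/A case for non-HTML responses is left to the caller.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class MetadataCollector(HTMLParser):
    """Collects the attributes the HTML metadata checks inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.canonical: Optional[str] = None
        self.meta: Dict[str, str] = {}  # keyed by name= or property=, first one wins

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "html":
            self.lang = a.get("lang")
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href", "").strip()
        elif tag == "meta":
            key = (a.get("name") or a.get("property") or "").lower()
            if key:
                self.meta.setdefault(key, a.get("content", "").strip())


def check_metadata(html: str) -> Dict[str, bool]:
    collector = MetadataCollector()
    collector.feed(html)
    collector.close()
    return {
        "html.canonical-link": bool(collector.canonical),
        "html.meta-description": len(collector.meta.get("description", "")) >= 50,
        "html.og-title": bool(collector.meta.get("og:title")),
        "html.og-description": bool(collector.meta.get("og:description")),
        "html.lang-attribute": bool(collector.lang and collector.lang.strip()),
    }
```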
Structured data · 3 checks
- html.json-ld Has parseable JSON-LD block Pass if the page has at least one <script type="application/ld+json"> block whose content parses as JSON.
- html.json-ld.date-modified JSON-LD declares dateModified Pass if any JSON-LD node on the page contains a dateModified value parseable as a schema.org Date or DateTime (ISO 8601: YYYY-MM-DD or a full date-time with a timezone designator). Calendar values must be valid dates, so 2024-02-30 fails.
- html.json-ld.breadcrumb JSON-LD declares a BreadcrumbList Pass if any JSON-LD node on the page has @type "BreadcrumbList".
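The three structured-data checks can be sketched with the standard library alone, assuming JSON-LD blocks are pulled out with `html.parser` and decoded with `json`. The helper names are hypothetical. The dateModified pattern accepts `YYYY-MM-DD` or a date-time with seconds and a `Z` or `+hh:mm`/`-hh:mm` offset, which is one reading of "full date-time with timezone designator", and `datetime.date` performs the real-calendar check.

```python
import json
import re
from datetime import date
from html.parser import HTMLParser
from typing import Any, Dict, Iterator, List, Optional


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._buffer: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def iter_nodes(value: Any) -> Iterator[Dict[str, Any]]:
    """Yields every JSON object, including ones nested in @graph or properties."""
    if isinstance(value, dict):
        yield value
        for child in value.values():
            yield from iter_nodes(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_nodes(child)


DATE_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
DATETIME_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})$"
)


def is_valid_date_modified(value: Any) -> bool:
    if not isinstance(value, str):
        return False
    match = DATE_RE.match(value) or DATETIME_RE.match(value)
    if match is None:
        return False
    parts = [int(group) for group in match.groups()]
    try:
        date(parts[0], parts[1], parts[2])  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    if len(parts) == 6:  # date-time: check the clock fields too
        hour, minute, second = parts[3:]
        return hour <= 23 and minute <= 59 and second <= 59
    return True


def check_structured_data(html: str) -> Dict[str, bool]:
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    documents = []
    for block in extractor.blocks:
        try:
            documents.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # an unparseable block does not count toward html.json-ld
    nodes = [node for doc in documents for node in iter_nodes(doc)]

    def types(node: Dict[str, Any]) -> List[Any]:
        declared = node.get("@type", [])
        return declared if isinstance(declared, list) else [declared]

    return {
        "html.json-ld": bool(documents),
        "html.json-ld.date-modified": any(is_valid_date_modified(n.get("dateModified")) for n in nodes),
        "html.json-ld.breadcrumb": any("BreadcrumbList" in types(n) for n in nodes),
    }
```

Walking every nested object mirrors the "any JSON-LD node" wording; a checker that looked only at top-level objects would miss nodes declared inside `@graph`.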
Content structure · 4 checks
- html.headings Has at least 3 section headings Pass if the page contains 3 or more <h1>/<h2>/<h3> headings.
- html.text-ratio Text-to-HTML ratio is above 15% Pass if visible body text takes up more than 15% of the raw HTML byte length.
- html.glossary-link Links to a glossary or terminology page Pass if the page contains an <a> whose text mentions glossary or terminology.
- html.ssr-content Initial HTML contains substantive text Pass if the initial HTML response (no JS executed) carries at least 50 words of visible text after stripping <script>, <style>, <noscript>, and <template>. Crawlers from Anthropic and Perplexity, and OpenAI's SearchBot, do not run JavaScript, so an SPA shell that hydrates client-side is invisible to them even when Googlebot can render it.
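The content-structure checks can share one text-extraction pass. This sketch, assuming the stdlib `html.parser`, drops the contents of `<script>`, `<style>`, `<noscript>`, and `<template>` as html.ssr-content describes, then reuses the same whitespace-collapsed text for html.text-ratio. Treating that text as the "visible body text" (so a `<title>` still counts) and counting words by whitespace splitting are simplifying assumptions; `ContentCollector` and `check_content` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements whose contents html.ssr-content strips before counting words.
SKIPPED_TAGS = {"script", "style", "noscript", "template"}
HEADING_TAGS = {"h1", "h2", "h3"}


class ContentCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.text_chunks: List[str] = []
        self.link_texts: List[str] = []
        self.headings = 0
        self._skip_depth = 0
        self._link_buffer: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif self._skip_depth:
            return  # ignore markup nested inside a stripped element
        elif tag in HEADING_TAGS:
            self.headings += 1
        elif tag == "a":
            self._link_buffer = []

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._link_buffer is not None:
            self.link_texts.append("".join(self._link_buffer))
            self._link_buffer = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        self.text_chunks.append(data)
        if self._link_buffer is not None:
            self._link_buffer.append(data)


def check_content(html: str) -> Dict[str, bool]:
    collector = ContentCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(" ".join(collector.text_chunks).split())  # collapse whitespace
    html_bytes = max(len(html.encode("utf-8")), 1)
    return {
        "html.headings": collector.headings >= 3,
        "html.text-ratio": len(text.encode("utf-8")) / html_bytes > 0.15,
        "html.glossary-link": any(
            re.search(r"glossary|terminology", t, re.IGNORECASE) for t in collector.link_texts
        ),
        "html.ssr-content": len(text.split()) >= 50,
    }
```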
Markdown mirror · 9 checks
- markdown.mirror-suffix Has .md or .mdx mirror Pass if the corresponding <page>.md or <page>.mdx URL returns a 2xx response.
- markdown.alternate-link HTML declares <link rel="alternate" type="text/markdown"> Pass if the HTML page advertises a markdown alternate via <link rel="alternate">.
- markdown.frontmatter Markdown mirror has required frontmatter Pass if the markdown mirror has YAML frontmatter declaring title, description, doc_version, and last_updated.
- markdown.canonical-header Markdown mirror sends canonical Link header Pass if the markdown mirror response includes a Link header with rel="canonical".
- markdown.content-negotiation Server returns markdown for Accept: text/markdown Pass if refetching the page URL with Accept: text/markdown returns a text/markdown response.
- markdown.sitemap-section Markdown mirror includes a Sitemap section Pass if the markdown mirror body contains a "## Sitemap" heading.
- markdown.navigation-stripped Markdown mirror has navigation chrome stripped Pass if the markdown mirror body contains no residual <nav>, <header>, <footer>, or <aside> tags.
- markdown.size-reduction Markdown mirror is meaningfully smaller than the HTML Pass if the markdown mirror body is at least 30% smaller than the HTML response for the same URL.
- markdown.valid-markdown Markdown mirror is actually markdown Pass if the markdown mirror body is markdown rather than HTML mis-served with a markdown content type. Fails when the body starts with an HTML prologue or when more than 30% of the body is HTML tag markup.
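The markdown-mirror checks that inspect the mirror body itself are sketched below, assuming the mirror and HTML bodies have already been fetched as text. The function names are hypothetical. The frontmatter check scans top-level `key:` lines instead of using a YAML parser, the size comparison is done on UTF-8 bytes, and the tag-markup share is measured in characters; all three are assumptions layered on the thresholds stated in the check descriptions (at least 30% smaller, more than 30% tag markup fails).

```python
import re

REQUIRED_FRONTMATTER = ("title", "description", "doc_version", "last_updated")
NAV_TAG_RE = re.compile(r"<(nav|header|footer|aside)\b", re.IGNORECASE)
HTML_PROLOGUE_RE = re.compile(r"^\s*(?:<!doctype\s+html|<html[\s>])", re.IGNORECASE)
# Requiring whitespace, "/", or ">" after the tag name keeps autolinks like <https://...> out.
HTML_TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9-]*(?:\s[^>]*)?/?>")


def has_required_frontmatter(md: str) -> bool:
    """markdown.frontmatter, via a top-level key scan instead of a full YAML parser."""
    lines = md.lstrip("\ufeff").splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    closing = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if closing is None:
        return False
    keys = {
        line.split(":", 1)[0].strip()
        for line in lines[1:closing]
        if ":" in line and not line.startswith((" ", "\t"))
    }
    return all(key in keys for key in REQUIRED_FRONTMATTER)


def navigation_stripped(md: str) -> bool:
    """markdown.navigation-stripped: no residual <nav>, <header>, <footer>, or <aside>."""
    return NAV_TAG_RE.search(md) is None


def meaningfully_smaller(md: str, html: str) -> bool:
    """markdown.size-reduction: markdown bytes at most 70% of the HTML bytes."""
    md_bytes, html_bytes = len(md.encode("utf-8")), len(html.encode("utf-8"))
    return md_bytes * 10 <= html_bytes * 7


def is_valid_markdown(md: str) -> bool:
    """markdown.valid-markdown: no HTML prologue and at most 30% tag markup."""
    if HTML_PROLOGUE_RE.match(md):
        return False
    tag_chars = sum(len(m.group(0)) for m in HTML_TAG_RE.finditer(md))
    return tag_chars * 10 <= len(md) * 3
```

The nav-tag search does not skip fenced code blocks, so a mirror whose prose documents a `<nav>` element in an example would fail this sketch even though its chrome was stripped.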