HBR Citations Carry C-Suite Weight in AI Search. Getting Published Is Harder.

Sanity, Contentful, Strapi, Storyblok, and Payload all promise structured content, but only some produce the entity graph ChatGPT and Perplexity actually cite. The choice is content-modeling discipline, not developer ergonomics.

By Owen McCarthy, Sales Engineering · May 26, 2026 · 16 min read

When the engineering team at a mid-market B2B SaaS company asked us in March 2026 why their citation rate on ChatGPT had dropped 34 percent year-over-year despite a 60 percent increase in published content volume, the answer was sitting in their CMS configuration. They had migrated from WordPress to a headless setup the previous summer, picked the platform their front-end developers liked, and modeled their content as a flat Article type with a free-text body field. Every reference to an author, a customer, a product, or a research study lived inside the body field as plain prose. The new front end rendered beautifully. The schema.org output was empty. The entity graph that the previous WordPress install had accidentally built through Yoast and a decade of plugin sprawl was gone.

The team had picked a tool. They had not picked a content model. And in the answer-engine era, the content model is the product.

Headless CMS adoption has accelerated through 2025 and into 2026, with the MACH Alliance reporting that composable architecture now accounts for more than 60 percent of new content infrastructure projects among enterprise marketers. The category leaders (Sanity, Contentful, Strapi, Storyblok, and Payload CMS) each take a different posture on content modeling, preview rendering, and multi-channel publishing. Those differences used to be a matter of developer taste. Now they determine how much of your content gets cited by ChatGPT, Claude, Perplexity, and Gemini, and how much of it reaches the LLM training pipelines that quietly sample the open web every few months.

This is not a feature comparison. There are plenty of those, and most miss the AEO dimension entirely. This is a working-operator's view of which headless CMS choices produce citation-worthy output, which produce schema-poor flat content that AI crawlers cannot parse, and how to retrofit the model if you have already picked the wrong stack.

Why Content Modeling Beats Platform Choice for AEO

The platform debate — Sanity versus Contentful versus the rest — gets disproportionate attention because it is the choice the buying team makes most visibly. The model the team builds inside the platform is the choice that actually determines AEO outcomes. A well-modeled Strapi install will outperform a badly modeled Contentful install on citation rate every time, because answer engines extract entities and relationships, not platforms.

The right mental model is to think of content as a graph. An Article is a node. Its Author is a separate node. Its Topic is a separate node. The Company it discusses is a separate node. Each relationship between nodes is a typed edge — wrote, mentions, isAbout, hasReviewed. The CMS's job is to let editors create and maintain these nodes and edges without writing code. The front end's job is to render them as schema.org JSON-LD when the page is served and as semantic HTML in the body. The AI crawler's job is to ingest both surfaces and reconstruct your graph inside the model's representation of the world.
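The graph framing above can be made concrete with a small sketch. The type names, ids, and sample records here are illustrative assumptions, not part of any CMS API; the point is only that nodes and typed edges are separate records.

```typescript
// Hypothetical node and edge shapes; names are illustrative, not a CMS API.
type EntityKind = "Article" | "Person" | "Topic" | "Company";
type EdgePredicate = "wrote" | "mentions" | "isAbout" | "hasReviewed";

interface GraphNode {
  id: string;
  kind: EntityKind;
  label: string;
}

interface GraphEdge {
  from: string; // id of the source node
  predicate: EdgePredicate;
  to: string; // id of the target node
}

const graphNodes: GraphNode[] = [
  { id: "article-1", kind: "Article", label: "Choosing a headless CMS" },
  { id: "person-1", kind: "Person", label: "Jane Smith" },
  { id: "topic-1", kind: "Topic", label: "Content modeling" },
  { id: "company-1", kind: "Company", label: "Acme Corp" },
];

const graphEdges: GraphEdge[] = [
  { from: "person-1", predicate: "wrote", to: "article-1" },
  { from: "article-1", predicate: "isAbout", to: "topic-1" },
  { from: "article-1", predicate: "mentions", to: "company-1" },
];

// Return the labels of every node reached from `sourceId` over one predicate.
function neighbors(sourceId: string, predicate: EdgePredicate): string[] {
  const byId = new Map(graphNodes.map((n) => [n.id, n] as const));
  return graphEdges
    .filter((e) => e.from === sourceId && e.predicate === predicate)
    .map((e) => byId.get(e.to)?.label)
    .filter((label): label is string => label !== undefined);
}
```

Because the predicate is stored on the edge, a rendering layer can ask for "everything this article mentions" separately from "who wrote it", which is the distinction a flat body field cannot express.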

The five platforms we are comparing all support some version of this pattern. They differ on how easy it is to express it, how much it costs to maintain at scale, and how cleanly the modeled relationships translate into machine-readable output. The hard part is rarely the technology. The hard part is the modeling discipline — choosing to make Author a typed reference rather than a free-text string on every article.

A useful exercise before picking a CMS: take twenty of your highest-traffic articles and identify every entity mentioned in each one: every person, company, product, location, dataset, regulation, methodology. Then count how many of those entities have a dedicated CMS record and how many live only inside body text. If you have fewer than one dedicated record for every five body-text entities (a ratio worse than 1:5), your AEO problem is not your platform. It is your model. We documented the broader move toward entity context in "Schema markup dying": the shift from markup to context makes content modeling the load-bearing layer, not the markup syntax.
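The audit arithmetic is simple enough to script. The sketch below assumes the 1:5 figure means modeled entities to body-only entities, and that someone has already tagged each audited article by hand; the interface and function names are hypothetical.

```typescript
// One audited article: entities with a CMS record vs. entities found only in body text.
interface ArticleAudit {
  slug: string;
  modeledEntities: string[];
  bodyOnlyEntities: string[];
}

// Ratio of modeled entities to body-only entities across the whole sample.
function modeledToBodyRatio(audits: ArticleAudit[]): number {
  const modeled = audits.reduce((sum, a) => sum + a.modeledEntities.length, 0);
  const bodyOnly = audits.reduce((sum, a) => sum + a.bodyOnlyEntities.length, 0);
  if (bodyOnly === 0) return Infinity; // nothing lives only in body text
  return modeled / bodyOnly;
}

// Assumed reading of the article's threshold: 1 modeled per 5 body-only.
const RATIO_THRESHOLD = 1 / 5;

// True when the sample falls below the threshold, i.e. the model needs work.
function modelNeedsWork(audits: ArticleAudit[]): boolean {
  return modeledToBodyRatio(audits) < RATIO_THRESHOLD;
}
```

Running this over the twenty-article sample gives a single number to track before and after a remodeling effort.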

The Five Platforms Compared on Citation-Worthy Output

We evaluated Sanity, Contentful, Strapi, Storyblok, and Payload CMS on five AEO-relevant dimensions: content model expressiveness, reference field semantics, schema.org output paths, draft and preview handling, and multi-channel publishing. The scoring reflects how the platforms behave in practice across roughly sixty B2B implementations our team has audited or built since 2023.

Platform	Content Model Expressiveness	Reference Field Strength	Native Schema.org Output	Draft and Preview AEO Hygiene	Multi-Channel Publishing
Sanity	High — Portable Text, custom block types	Strong — typed references, weak references, GROQ joins	None native; clean to add via front-end render	Strong — preview tokens, environment routing	Strong — GROQ API, webhooks, structured exports
Contentful	Medium-high — content types, fields, links	Strong — typed links, validation rules	Limited native; UI plugins available	Strong — environments, scheduled publishing	Strongest — mature webhooks, partner API ecosystem
Strapi	High — flexible, open-source self-hosted	Medium-strong — relations, custom components	None native; full control to add	Medium — depends on self-hosted config	Medium — REST and GraphQL, custom feeds
Storyblok	Medium — visual editor, block-based components	Medium — multi-option, single-option references	Built-in schema markup plugin	Medium-strong — preview URLs, releases	Strong — content delivery API, webhooks
Payload CMS	High — TypeScript-first, block fields, relationships	Strong — typed relationships, polymorphic refs	None native; clean to add via Next.js	Strong — token-based preview, draft system	Medium — REST, GraphQL, local API

The platforms are closer than category-level marketing makes them appear. None ship native schema.org output as a default for arbitrary content models, which means every team will write some rendering layer. The differences are at the edges: Sanity's Portable Text gives the cleanest separation between content and presentation, Contentful's environments make safe staged rollouts easier, Strapi's open-source posture gives full server-log visibility for AI crawler tracking, Storyblok's visual editor shortens the marketer-to-publish loop, and Payload's TypeScript ergonomics reduce maintenance cost for engineering teams that are already TypeScript-native.

Sanity: The Modeling Maximalist

Sanity's documentation treats the content model as a first-class design surface, and it shows in the output. The schema definitions are JavaScript or TypeScript files that an engineering team commits to version control, which means content model changes go through code review. Typed references are first-class: an Article schema can declare an author field as a reference to a Person document, and the editor UI enforces the type. The query language, GROQ, lets the front end dereference and join arbitrary relationships at render time, which makes producing schema.org JSON-LD with linked entities trivial.

The cost is rendering work. Sanity ships no website. Every team writes a front end, most commonly Next.js, and is responsible for translating the modeled relationships into JSON-LD, semantic HTML, llms.txt, and any other AEO surface. Teams that invest in the rendering work get among the cleanest citation footprints we have measured. Teams that skip it produce expensive-to-maintain content with no AEO advantage over a WordPress install.
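As a rough illustration of that rendering work, the sketch below pairs a GROQ projection that dereferences the author with a function that turns the result into JSON-LD. The document type, field names (publishedAt, jobTitle, sameAs), and result shape are assumptions about a hypothetical schema, not Sanity defaults.

```typescript
// GROQ projection: `->` follows the reference and reads fields from the Person document.
// All field names are assumed for illustration.
const articleQuery = `*[_type == "article" && slug.current == $slug][0]{
  title,
  "datePublished": publishedAt,
  author->{ name, jobTitle, sameAs }
}`;

// Shape the query above would return for one article under the assumed schema.
interface ArticleResult {
  title: string;
  datePublished: string;
  author: { name: string; jobTitle?: string; sameAs?: string[] };
}

// Translate the dereferenced result into schema.org JSON-LD.
function toArticleJsonLd(doc: ArticleResult): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: doc.title,
    datePublished: doc.datePublished,
    author: {
      "@type": "Person",
      name: doc.author.name,
      // Optional fields are emitted only when the Person document has them.
      ...(doc.author.jobTitle ? { jobTitle: doc.author.jobTitle } : {}),
      ...(doc.author.sameAs?.length ? { sameAs: doc.author.sameAs } : {}),
    },
  };
}
```

In a real front end the query string would be passed to the Sanity client's fetch call with a slug parameter, and the returned object serialized into a script tag of type application/ld+json.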

Contentful: The Enterprise Workhorse

Contentful's API documentation emphasizes the platform's hosted infrastructure and enterprise tooling. Content types, fields, and links are defined through a UI or API, and changes flow through environments — sandbox, staging, master — that mirror traditional software deployment. For enterprise organizations with multiple regional content teams and strict review workflows, this is the path of least operational resistance.

The reference field model is strong: typed links between entries are validated against content types, and the Delivery API returns linked entries alongside the requested entries in the response's includes block. The native schema.org story is weak: there are UI extensions and marketplace apps, but most teams write their own rendering layer for JSON-LD. The webhook tooling is the deepest in the category, which makes Contentful the strongest choice for multi-channel publishing into partner feeds, mobile apps, voice skills, and the other downstream surfaces that seed LLM training corpora.

Strapi: The Open-Source Self-Hosted Choice

Strapi's content management documentation covers content types, components, relations, and dynamic zones. Strapi is the leading open-source headless CMS by adoption, and its key AEO advantage is self-hosting. When a Strapi install runs on infrastructure the team controls, the team can read every AI crawler hit in the server logs, segment GPTBot from ClaudeBot from PerplexityBot, and build the kind of crawler-traffic dashboard that surfaces actual answer-engine ingestion patterns. Hosted platforms abstract this away.

The content modeling story is flexible but less opinionated than Sanity. Relations between content types are first-class, and the components system lets editors compose pages from reusable blocks. The administrator UI is functional rather than polished, and complex content models can become unwieldy without disciplined naming. Strapi is the strongest choice for teams that already run their own infrastructure, value the audit and crawler-log access, and have engineering capacity to maintain a self-hosted CMS.
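The crawler-log segmentation mentioned above needs very little code once the logs are accessible. This is a minimal sketch that assumes user-agent strings have already been extracted from the access log; the token list covers only the crawlers named in this article and should be checked against each vendor's current documentation.

```typescript
// User-agent substrings for the crawlers named in this article.
// Vendors add and rename bots, so treat this list as an assumption to verify.
const AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"] as const;
type AiCrawler = (typeof AI_CRAWLER_TOKENS)[number];

// Return the matching crawler name, or null for ordinary traffic.
function classifyUserAgent(userAgent: string): AiCrawler | null {
  const ua = userAgent.toLowerCase();
  for (const token of AI_CRAWLER_TOKENS) {
    if (ua.includes(token.toLowerCase())) return token;
  }
  return null;
}

// Count hits per crawler from a list of user-agent strings.
function countCrawlerHits(userAgents: string[]): Record<AiCrawler, number> {
  const counts: Record<AiCrawler, number> = { GPTBot: 0, ClaudeBot: 0, PerplexityBot: 0, CCBot: 0 };
  for (const ua of userAgents) {
    const crawler = classifyUserAgent(ua);
    if (crawler) counts[crawler] += 1;
  }
  return counts;
}
```

User-agent strings can be spoofed, so a production dashboard would usually also verify requests against the IP ranges each vendor publishes.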

Storyblok: The Visual Editor Optimizer

Storyblok's content management documentation leads with the visual editor, which lets marketers see live previews of in-progress content alongside the structured field UI. For organizations where the bottleneck is marketer throughput rather than engineering capacity, this matters. The block-based component system maps content to design components, which produces consistent rendering but can constrain modeling flexibility for editorial content that does not fit a templated layout.

Storyblok ships a schema markup plugin that handles common JSON-LD types out of the box — Article, Product, Organization, Person — which is the closest any platform in this comparison comes to native schema.org output. For organizations whose AEO surface is dominated by these standard types, the plugin meaningfully reduces implementation cost. Custom entity types still require front-end work, and the visual editor's component constraints can push teams toward shallow modeling.

Payload CMS: The TypeScript-Native Newcomer

Payload CMS is the youngest platform in this comparison, and the most opinionated about TypeScript. Schema definitions, hooks, access control, and admin UI customization are all expressed as TypeScript code. For engineering teams already standardized on Next.js and TypeScript, the integration cost is the lowest of any option. The relationship field types — including polymorphic relationships that let a single reference field point to multiple content types — are the most expressive in the category, which is useful for modeling cases where one Mentions edge might point to a Person, a Company, or a Product.
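The polymorphic Mentions case can be sketched in plain TypeScript as a discriminated union. This is not Payload's configuration API; the document interfaces and collection slugs are hypothetical, and the relationTo/value pairing is only modeled loosely on how a polymorphic reference carries both a target type and a target document.

```typescript
interface PersonDoc { id: string; fullName: string }
interface CompanyDoc { id: string; legalName: string }
interface ProductDoc { id: string; productName: string }

// One Mentions edge that can point at any of three document types.
// `relationTo` is the discriminant; the collection slugs are hypothetical.
type MentionRef =
  | { relationTo: "people"; value: PersonDoc }
  | { relationTo: "companies"; value: CompanyDoc }
  | { relationTo: "products"; value: ProductDoc };

// Map a polymorphic mention to a minimal schema.org node.
function mentionToSchemaNode(ref: MentionRef): { "@type": string; name: string } {
  switch (ref.relationTo) {
    case "people":
      return { "@type": "Person", name: ref.value.fullName };
    case "companies":
      return { "@type": "Organization", name: ref.value.legalName };
    case "products":
      return { "@type": "Product", name: ref.value.productName };
  }
}
```

The compiler narrows `ref.value` inside each case, so adding a fourth target type without handling it in the renderer becomes a type error rather than a silent gap in the markup.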

The trade-off is that Payload's hosted and self-hosted footprint is smaller than the established players, which means fewer pre-built integrations and a smaller plugin ecosystem. For teams willing to build, the modeling ceiling is high. For teams that want to buy, Contentful and Sanity are more mature.

Content Modeling for Entity Extraction: The Five-Layer Stack

Regardless of platform, the content model that produces citation-worthy output follows a consistent five-layer pattern. We have implemented this on Sanity, Contentful, Strapi, Storyblok, and Payload installs and the structure transfers cleanly across all five.

Layer 1 — Atomic entity documents. Create a dedicated document type for each entity class your content discusses: Person, Organization, Product, Location, Concept, Methodology, Research Study, Regulation. Each document carries the schema.org-relevant fields for its type: a Person document has givenName, familyName, jobTitle, worksFor reference, sameAs URL array, alumniOf reference, knowsAbout topic references. An Organization document has legalName, foundingDate, url, sameAs array, address, industry. These are the nodes in your entity graph.

Layer 2 — Topical concept taxonomy. Build a flat or shallowly hierarchical Concept document type for the topics your content covers. An article on the headless CMS AEO topic references Concept documents for "headless CMS," "content modeling," "schema.org," "answer engine optimization," and so on. Concepts are reused across hundreds of articles, and their definitions, related concepts, and sameAs links to Wikipedia, Wikidata, and category authority sources become a long-lived asset. AI models build their own concept graphs, and your concept layer is what your content claims when those graphs reach your domain.

Layer 3 — Editorial container types. Article, Guide, Case Study, Research Report, Whitepaper, FAQ, Glossary Entry, Comparison. Each has its own field set and its own schema.org type: Article maps to Article, Guide maps to HowTo or TechArticle, Case Study maps to Article (or to Review with an itemReviewed property when the piece evaluates a product or vendor), and Research Report maps to Article with a linked Dataset. Editorial containers reference entity documents and concept documents; they do not duplicate the data inside.

Layer 4 — Relationship edges with typed predicates. When an Article mentions a Person, the relationship lives in a typed reference field with a predicate — author, expertCited, quotedSubject, productManager. When an Article references a Research Study, the relationship is methodology, supportingEvidence, refutedClaim. The predicate is what lets your rendering layer produce schema.org markup with the correct property — author versus mentions versus citation. Without typed predicates, every mention collapses into about, and the model loses information about why the entity appeared.

Layer 5 — Publication metadata for crawler visibility. Every document carries published date, last modified date, canonical URL, language, region, content review status, and llms.txt eligibility. This is the layer that lets the front end produce correct llms.txt manifests, correct hreflang declarations, correct datePublished and dateModified in schema, and correct robots metadata for draft and archive states.
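Layer 4 is the layer teams most often find abstract, so here is a minimal sketch of how typed predicates might drive the rendered markup. The predicate names come from the examples above; the mapping table itself is an assumed editorial decision, and the interface names are hypothetical.

```typescript
// Editorial predicates from the content model (hypothetical field names).
type EditorialPredicate = "author" | "expertCited" | "quotedSubject" | "supportingEvidence";

// Which schema.org property each predicate renders as. This mapping is an
// assumption for illustration; each team decides its own.
const PREDICATE_TO_PROPERTY: Record<EditorialPredicate, "author" | "mentions" | "citation"> = {
  author: "author",
  expertCited: "mentions",
  quotedSubject: "mentions",
  supportingEvidence: "citation",
};

interface TypedReference {
  predicate: EditorialPredicate;
  schemaType: "Person" | "Organization" | "ScholarlyArticle";
  name: string;
  url: string; // canonical URL of the referenced entity document
}

// Group references under the schema.org property their predicate maps to.
function renderArticleJsonLd(headline: string, refs: TypedReference[]): Record<string, unknown> {
  const jsonLd: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline,
  };
  for (const ref of refs) {
    const property = PREDICATE_TO_PROPERTY[ref.predicate];
    const node = { "@type": ref.schemaType, name: ref.name, url: ref.url };
    const existing = (jsonLd[property] as object[] | undefined) ?? [];
    jsonLd[property] = [...existing, node];
  }
  return jsonLd;
}
```

With an untyped relatedItems field there is nothing to key the mapping on, which is why every reference would end up under a single generic property.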

The five layers are not a Sanity model or a Contentful model. They are a content model that gets implemented inside whichever platform you have chosen. Implementing the layers requires editorial discipline more than engineering. Two failure modes are the most common. The first is skipping Layer 1 entities and treating people, companies, and products as free-text fields. The second is skipping Layer 4 predicates and using untyped reference fields with generic names such as relatedItems.

The Draft, Preview, and Crawlability Trap

The most expensive AEO bug we see in headless CMS installs is preview pages leaking into AI crawler indexes. The pattern is consistent: a team sets up a preview environment for editorial review, configures it as a separate Vercel or Netlify deployment, forgets to gate it with authentication or robots metadata, and discovers months later that ChatGPT is citing an old draft from a forgotten preview URL.

Sanity, Contentful, Strapi, Storyblok, and Payload all support preview workflows. The implementation details matter. The hygiene checklist that prevents drafts from leaking:

Gate every preview environment behind a token, basic auth, or platform-level access control. Depending on project settings, Vercel preview deployments can be reachable by anyone who has the URL, so confirm that deployment protection is enabled rather than assuming it.
Set noindex robots metadata on all non-production environments, and keep preview URLs out of your llms.txt manifest (llms.txt lists content to include and has no disallow directive; crawler exclusions belong in robots.txt). We covered this in detail in the broader JSON-LD schema stack implementation guide.
Use environment-aware canonical URLs so that preview pages declare their production counterpart as canonical. This prevents preview content from creating duplicate-content signal even if it does get crawled.
Configure your CMS publish workflow to invalidate preview URLs when content goes live, so that the preview is no longer accessible at its preview URL after publication.
Audit preview deployments quarterly. Run a server log analysis filtered to preview hosts and look for GPTBot, ClaudeBot, PerplexityBot, and CCBot traffic. Any hits are bugs.
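Two items on the checklist, environment-aware robots and canonical metadata plus the quarterly log audit, can be sketched as small pure functions. The environment names, the example.com origin, and the log-entry shape are placeholders rather than any platform's real configuration.

```typescript
type DeployEnv = "production" | "preview" | "development";

interface PageHeadMeta {
  robots: string;
  canonical: string;
}

// Placeholder origin; substitute the real production host.
const PRODUCTION_ORIGIN = "https://www.example.com";

// Non-production pages get noindex, and every page points canonical at production.
function headMetaFor(env: DeployEnv, path: string): PageHeadMeta {
  const normalizedPath = path.startsWith("/") ? path : `/${path}`;
  return {
    robots: env === "production" ? "index, follow" : "noindex, nofollow",
    canonical: `${PRODUCTION_ORIGIN}${normalizedPath}`,
  };
}

// Flag log entries where a known AI crawler reached a preview host.
function previewLeaks(
  entries: { host: string; userAgent: string }[],
  previewHosts: string[],
): string[] {
  const bots = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"];
  return entries
    .filter((e) => previewHosts.includes(e.host))
    .filter((e) => bots.some((b) => e.userAgent.includes(b)))
    .map((e) => `${e.host} <- ${e.userAgent}`);
}
```

An empty array from `previewLeaks` is the passing state; per the checklist, any entry it returns should be treated as a bug.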

The mirror-image problem is published content that AI crawlers cannot reach. Single-page-application rendering, hash-routed URLs, JavaScript-rendered content without a server-side fallback, and pages behind authentication are all invisible to AI crawlers. The cleanest test is to disable JavaScript in a browser, load a representative sample of pages, and read what the crawler sees. If the page is empty or fragmentary, the citation rate will be empty or fragmentary.
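The disable-JavaScript test can be approximated in a script for bulk checks. The sketch below strips scripts, styles, and tags with regular expressions, which is a rough stand-in for a real HTML parser, and the 50-word threshold is an arbitrary value chosen for illustration.

```typescript
// Rough approximation of what a non-JavaScript crawler reads: drop scripts,
// styles, and tags, then collapse whitespace. Not a real HTML parser.
function visibleTextWithoutJs(html: string): string {
  return html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ")
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Arbitrary threshold for illustration; tune it to your page templates.
const MIN_WORDS = 50;

// True when the raw HTML already carries a meaningful amount of text.
function looksServerRendered(html: string): boolean {
  const text = visibleTextWithoutJs(html);
  const words = text === "" ? 0 : text.split(" ").length;
  return words >= MIN_WORDS;
}
```

Feeding this the raw response body of a representative sample of URLs (fetched without executing scripts) gives a quick list of pages that would look empty to a crawler that does not run JavaScript.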

Multi-Channel Publishing as LLM Training Corpus Seeding

The most underappreciated AEO advantage of a headless CMS is that content modeled once can publish to many surfaces, and each surface is an independent ingestion path for the crawlers that build LLM training corpora. Common Crawl samples the open web on a monthly cadence. The Internet Archive's Wayback Machine snapshots a subset. Anthropic, OpenAI, and Google each run their own crawl infrastructure with different sampling biases. Content available at a single URL has one chance per crawl. Content available at five interlinked surfaces has five chances per crawl.

The multi-channel surfaces that matter for AEO, from highest to lowest ingestion likelihood:

Main editorial site with full server-side rendering and llms.txt. The baseline. Without this, the rest of the surface stack does not matter.

JSON or RSS feed at a well-known path, such as /feed.xml, /rss.xml, or /api/articles.json. AI crawlers and aggregator crawlers both ingest these. Feeds also supply news aggregators, which produce inbound links that signal authority.

Documentation site with code samples and structured prose. Stripe and Twilio documentation are among the most-cited sources in developer LLM responses. A headless CMS that publishes docs from the same content model as marketing content gets the citation lift without doubling editorial cost. The pattern transfers to non-developer products through how-to and reference content.

Partner syndication API. A JSON or GraphQL endpoint that lets partners pull your content into their own surfaces. This was historically a B2B-only play; it is now a quiet AEO advantage because partner-domain syndication produces independent crawlable copies of your content.

Newsletter archive or blog cross-post on a third-party platform. Substack, beehiiv, Medium, LinkedIn newsletters. Each is independently crawled, and the cross-domain content reinforces authorship and topic association. The strategy connects to the broader content repurposing playbook: a single article modeled in your headless CMS can power six surfaces with minimal editorial overhead.

Voice or app surface. Lower direct citation impact, but increasingly relevant as multimodal models ingest voice and app metadata. The headless CMS makes this cheap once the content model exists.

The publishing pipeline that produces these surfaces from a headless CMS is rarely more than a small set of build scripts and webhook handlers. The bottleneck is editorial — having one content model worth publishing to all of them, rather than five inconsistent ones that each require their own editorial workflow.
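To show how small those build scripts can be, here is a sketch that renders one list of published articles into two of the surfaces above: an RSS 2.0 feed and an llms.txt manifest. The PublishedArticle shape is hypothetical, and the llms.txt layout (H1 title, blockquote summary, link list) follows the llmstxt.org proposal as we understand it.

```typescript
interface PublishedArticle {
  title: string;
  url: string;
  summary: string;
  datePublished: string; // ISO 8601
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// RSS 2.0 feed body, suitable for serving at /feed.xml.
function toRss(siteTitle: string, siteUrl: string, articles: PublishedArticle[]): string {
  const items = articles.map((a) =>
    [
      "    <item>",
      `      <title>${escapeXml(a.title)}</title>`,
      `      <link>${escapeXml(a.url)}</link>`,
      `      <description>${escapeXml(a.summary)}</description>`,
      `      <pubDate>${new Date(a.datePublished).toUTCString()}</pubDate>`,
      "    </item>",
    ].join("\n"),
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeXml(siteTitle)}</title>`,
    `    <link>${escapeXml(siteUrl)}</link>`,
    `    <description>${escapeXml(siteTitle)}</description>`,
    ...items,
    "  </channel>",
    "</rss>",
  ].join("\n");
}

// llms.txt manifest: H1 title, blockquote summary, then a link list.
function toLlmsTxt(siteTitle: string, siteSummary: string, articles: PublishedArticle[]): string {
  const links = articles.map((a) => `- [${a.title}](${a.url}): ${a.summary}`);
  return [`# ${siteTitle}`, "", `> ${siteSummary}`, "", "## Articles", "", ...links].join("\n");
}
```

Both functions read the same array, so a publish webhook only has to re-query the CMS once and rewrite each surface from the result.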

Retrofit Playbook: From Flat WordPress to Entity-Modeled Headless

If you have inherited a flat content model, whether on WordPress or on a poorly modeled headless install, the retrofit path is sequential rather than parallel. Attempting all layers at once is the most common reason these projects fail. The four-stage sequence that has worked across the implementations we have audited:

1. Inventory and entity extraction (weeks 1-3). Pull your top 200 articles by traffic. For each, list every Person, Organization, Product, Location, and Concept mentioned. Cluster the mentions: how many distinct Persons appear, how many distinct Organizations, how many distinct Concepts. The output is a ranked list of entities by mention count. This is your Layer 1 backfill priority list.

2. Entity documents and canonical URLs (weeks 4-8). Create the schema for your Person, Organization, Product, and Concept document types. Backfill the top 50 entities by mention count first, with full schema.org-aligned fields and sameAs URLs to Wikipedia, Wikidata, LinkedIn, Crunchbase, official sites. Give each entity a canonical URL on your site — /people/jane-smith, /companies/acme-corp, /concepts/headless-cms. These URLs become long-lived assets that LLMs cite as entity definitions.

3. Reference field backfill on top articles (weeks 9-16). Take your top 50 articles and rewrite them so that every Person, Organization, Product, and Concept reference becomes a typed reference to the Layer 1 entity document rather than free text. The rendering layer translates these references into linked schema.org JSON-LD with sameAs URLs. The articles themselves do not need to change visibly; the references resolve inline in the body and the schema markup populates automatically.

4. Predicate typing and multi-channel publishing (weeks 17-24). Introduce typed predicates on reference fields — author, expertCited, productMentioned, methodologyReference — and update the rendering layer to map predicates to the correct schema.org properties. Then activate multi-channel publishing: RSS feed, partner JSON API, llms.txt manifest, docs site cross-publish if applicable. By month six, the citation rate baseline should have shifted by 25 to 40 percent across the top queries the retrofit targeted, based on the implementations we have measured.
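The stage-one output, a ranked list of entities by mention count, can be produced with a short script once the mentions have been extracted. The sketch assumes extraction has already happened (by hand or with a tagging tool) and that lowercasing and trimming is enough to cluster name variants, which will not hold for every entity.

```typescript
interface EntityMention {
  articleSlug: string;
  entityType: "Person" | "Organization" | "Product" | "Location" | "Concept";
  entityName: string;
}

interface RankedEntity {
  entityType: EntityMention["entityType"];
  entityName: string;
  mentionCount: number;
}

// Cluster mentions by (type, normalized name) and rank by count, highest first.
function rankEntities(mentions: EntityMention[], limit = 50): RankedEntity[] {
  const byKey = new Map<string, RankedEntity>();
  for (const m of mentions) {
    const key = `${m.entityType}:${m.entityName.trim().toLowerCase()}`;
    const current = byKey.get(key);
    if (current) {
      current.mentionCount += 1;
    } else {
      // The first spelling seen becomes the display name for the cluster.
      byKey.set(key, { entityType: m.entityType, entityName: m.entityName.trim(), mentionCount: 1 });
    }
  }
  return Array.from(byKey.values())
    .sort((a, b) => b.mentionCount - a.mentionCount || a.entityName.localeCompare(b.entityName))
    .slice(0, limit);
}
```

The default limit of 50 matches the stage-two backfill target; the same list, untruncated, shows how long the tail of rarely mentioned entities is.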

The discipline is the rate-limiting step, not the engineering. Teams that complete stages one and two but skip three and four end up with clean entity records that nothing references. Teams that try to do everything at once produce inconsistent partial coverage and abandon the project around month four.

Vendor and Stack Choice: The Five-Question Decision

The headless CMS decision now reduces to five questions about your team and your content surface. The platform recommendation falls out of the answers.

Is your bottleneck engineering capacity or marketer throughput? If engineering capacity is constrained, or marketers need to publish without developer help, Storyblok's visual editor and Contentful's mature templates reduce engineering load. If engineering capacity is available and the priority is a precise content model, Sanity, Strapi, and Payload give engineering more control over the modeling and authoring UX.

Do you require self-hosting for compliance or crawler-log access? Strapi and Payload self-host cleanly. Sanity, Contentful, and Storyblok are hosted-only or hosted-primary.

How rich is your entity graph? If your content discusses tens of thousands of distinct entities — case studies across hundreds of customers, comparison content across hundreds of products — Sanity and Payload's modeling expressiveness pays off. If your entity universe is small and stable, any platform will work.

How important is multi-channel publishing? If you publish to docs, newsletter, partner feeds, and mobile, Contentful's webhook and partner-API ecosystem leads. If you publish primarily to a single web surface, the multi-channel advantage flattens.

How mature is your engineering team's TypeScript and Next.js stack? If TypeScript is the team's native language, Payload's TypeScript-first approach reduces maintenance cost. If the team is polyglot, Sanity, Contentful, Strapi, and Storyblok are all easier to adopt for non-TypeScript engineers.

There is no universal best answer. There is a best answer for a specific team, a specific content surface, and a specific AEO posture. The teams that get the decision right invest twenty hours upfront in clarifying the answers to these five questions before evaluating platforms. The teams that get it wrong evaluate platforms first and then discover their content model does not fit the platform they bought.

Takeaway: The headless CMS choice is no longer a developer-ergonomics question. It is the load-bearing layer of your AEO program because the content model produces — or fails to produce — the entity graph that ChatGPT, Claude, Perplexity, and Gemini cite. Sanity, Contentful, Strapi, Storyblok, and Payload all support citation-worthy modeling when wielded with discipline, and all support flat-content failure when wielded without it. The five-layer entity model — atomic entities, concepts, editorial containers, typed predicates, publication metadata — transfers across all five platforms. Pick the platform that fits your team's engineering and editorial capacity, then commit to the modeling discipline that produces relationships the crawlers can extract. The platform is the substrate. The model is the product.

Frequently Asked Questions

What is a headless CMS and why does it matter for AEO in 2026?

A headless CMS stores content as structured data and exposes it through APIs rather than rendering HTML directly. For AEO, that architecture matters because answer engines reward content that is modeled as discrete entities with explicit relationships rather than as flat HTML pages. A headless CMS with a strong content model lets a marketing team define an Author entity, a Company entity, a Product entity, and a Research Study entity, then connect them with reference fields that translate cleanly into schema.org Person, Organization, Product, and Dataset markup. Coupled with multi-channel publishing — web, app, voice, partner feeds — the same content becomes available to multiple crawler and LLM ingestion paths from a single source of truth. The downside is that headless adds rendering complexity, and a misconfigured front end can hide content from AI crawlers entirely.

Which headless CMS is best for AI search visibility and entity modeling?

No single platform wins on every axis. Sanity has the most expressive content model and the strongest reference-field semantics, which translate well into schema.org relationships, but it requires custom rendering work. Contentful has the deepest enterprise feature set and mature webhook tooling for downstream feeds, but its content model is more rigid. Strapi is the strongest open-source option with self-hosting control, which matters for teams that want full crawler-log visibility. Storyblok leads on visual editing for non-technical teams and ships built-in schema markup tooling. Payload CMS has the cleanest TypeScript-first developer experience and the most flexible block-level modeling. For AEO specifically, Sanity and Payload tend to produce the cleanest entity output, while Contentful and Storyblok offer the smoothest enterprise multi-channel publishing for LLM corpus seeding.

How do reference fields in a headless CMS map to schema.org relationships?

Reference fields are the bridge between content modeling and the schema.org entity graph. A reference field in Sanity, Contentful, Strapi, Storyblok, or Payload lets one document point to another: an Article references its Author, a Product references its Manufacturer, a Case Study references the Customer Organization. When the front end renders the document, those references translate directly into JSON-LD: Article.author becomes a Person node with sameAs links, Product.manufacturer becomes an Organization node, and Case Study fields populate itemReviewed and reviewBody on a Review node. The pattern lets editors maintain one Author record with credentials, sameAs URLs, and biographical detail, and have it propagate automatically to every article that references it. Without reference fields, schema.org markup must be hand-coded per page, which decays quickly as the content library grows.

Are draft and preview pages visible to AI crawlers and should they be?

Draft and preview pages should not be visible to AI crawlers in nearly every case, and most headless CMS platforms make this configurable through preview tokens, environment-based routing, and robots metadata. Sanity, Contentful, Strapi, Storyblok, and Payload all support preview workflows that route unpublished content to authenticated preview environments while published content flows to the public production domain. The risk is misconfiguration: if a preview environment is publicly accessible without authentication, AI crawlers will index it, and outdated or incorrect content can enter LLM training corpora and become a long-lived citation liability. The fix is straightforward: gate preview routes behind a token, set noindex on preview environments, and keep preview hosts out of llms.txt manifests. Teams that fail this step typically discover the problem months later when an old draft surfaces in a Perplexity citation.

How does multi-channel publishing from a headless CMS help with LLM training corpus inclusion?

Multi-channel publishing means the same content body, modeled once in the CMS, is rendered into multiple distribution endpoints: the main website, a developer documentation site, an RSS or JSON feed, a partner syndication API, a mobile app, a voice assistant skill, a static export to GitHub. Each endpoint becomes an independent ingestion path for LLM training crawlers. Common Crawl, the dataset that underlies most foundation model training, samples broadly across the open web, and content available at multiple crawlable surfaces is more likely to be sampled than content available at a single URL. A headless CMS with mature webhook and feed tooling (Contentful, Sanity, and Storyblok lead here) lets a team publish once and seed the content into every one of those ingestion paths. The effect compounds over multiple model training cycles.