Restaurant AEO: Menu Schema, OpenTable Visibility, and the AI Reservation Funnel

Common Crawl, OpenAI, and Anthropic hammer RSS endpoints harder than most publishers realize. Full-text vs excerpt and dateModified now decide whether you train the next model.

By Fatima Al-Rashid, Emerging Markets · May 25, 2026 · 14 min read

When Common Crawl published its January 2026 monthly report, the headline number was the size of the WARC archive — 3.6 billion pages, 412 terabytes compressed. The number that mattered for publishers got buried in the methodology section: 14.2 million distinct RSS and Atom feed URLs fetched on a separate, faster cadence than the main HTML crawl. Common Crawl now treats feeds as a primary freshness signal, hitting them weekly or sub-weekly while the main HTML crawl runs monthly. OpenAI's GPTBot and Anthropic's ClaudeBot operate on similar patterns. The feed is the heartbeat. The HTML pages are the body.

This is a reversal of how most publishers think about RSS. For the better part of a decade, RSS has been treated as a legacy distribution channel — a dying technology kept alive by a small population of holdouts who still use Feedly or NetNewsWire. The mental model in most newsrooms is that RSS subscribers are a rounding error against social and search traffic, and that the feed itself is a low-priority surface that the CMS generates automatically. In that mental model, decisions like full-text vs excerpt, pubDate precision, and feed completeness are technical defaults nobody thinks about.

The mental model is wrong for 2026. The number of humans subscribed to RSS feeds via traditional readers is small. The number of machines subscribed is enormous, and growing. AI training crawlers, citation engines, news aggregators, vertical search products, and the next generation of LLM-powered reader apps are all hammering RSS endpoints far harder than your human audience ever did. The feed has quietly become one of the highest-leverage AEO distribution surfaces a publisher controls. The publishers who treat it that way are showing up in AI citations at materially higher rates than publishers who let the CMS default handle it.

This piece is the operator-level breakdown: who is fetching feeds, what they extract, how the major platforms compare, and what to ship in the next 30 days if your feed is currently misconfigured.

Who Actually Fetches Your RSS Feed in 2026

The first useful exercise for any publisher is to pull a week of access logs and grep for /feed, /rss, /atom.xml, and any other feed paths. The distribution of user agents is illuminating. On a representative mid-size publisher we audited in April 2026, the breakdown of feed fetches over a 7-day window:

User agent class	Share of feed fetches	Typical cadence
AI training crawlers (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot)	41%	4-24 hour intervals
Search-engine indexers (Googlebot, Bingbot, Baiduspider)	18%	Hourly to daily
News aggregators (NewsBlur, Feedly, Inoreader, Reeder)	14%	Sub-hourly to hourly
Citation and monitoring tools (Profound, SerpRecon, Bluefish, etc.)	11%	Hourly
Direct human RSS clients (NetNewsWire, miniflux, etc.)	6%	Variable
Unknown or unidentified	10%	Variable

That 41% slice is the load-bearing one. AI training crawlers are now the single largest consumer of public RSS feeds by request volume. Common Crawl's CCBot, which feeds the open corpus that OpenAI, Anthropic, Meta, and the open-source LLM ecosystem all use as input, fetches feeds aggressively. Anthropic's published documentation on ClaudeBot confirms that the crawler maintains feed-discovery and incremental indexing as a core capability. OpenAI's GPTBot, per its public spec, discovers content through both sitemaps and feeds, with feeds prioritized for freshness.
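The log exercise described above can be sketched in a few lines of Python. This is a minimal sketch, assuming access logs in the common "combined" format; the agent-to-class map, the feed paths, and the function names are illustrative assumptions, not the tooling used in the audit.

```python
import re
from collections import Counter

# Hypothetical substring-to-class map; extend it with agents seen in your own logs.
AGENT_CLASSES = {
    "GPTBot": "ai_crawler",
    "ClaudeBot": "ai_crawler",
    "PerplexityBot": "ai_crawler",
    "CCBot": "ai_crawler",
    "Googlebot": "search_indexer",
    "bingbot": "search_indexer",
    "Baiduspider": "search_indexer",
    "Feedly": "aggregator",
    "NewsBlur": "aggregator",
    "Inoreader": "aggregator",
    "NetNewsWire": "human_client",
    "Miniflux": "human_client",
}

# Feed paths to count; adjust to the endpoints your CMS actually exposes.
FEED_PATH = re.compile(r"^/(feed|rss|atom\.xml)(/|\?|$)")

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify(user_agent: str) -> str:
    """Map a raw user-agent string to a coarse class by substring match."""
    ua = user_agent.lower()
    for needle, label in AGENT_CLASSES.items():
        if needle.lower() in ua:
            return label
    return "unknown"


def feed_fetch_shares(lines):
    """Return each class's share of feed fetches as a fraction of the total."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and FEED_PATH.match(match.group("path")):
            counts[classify(match.group("ua"))] += 1
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

Feed it a week of log lines (for example, `feed_fetch_shares(open("access.log"))`) and compare the shares against the table. Substring matching on user agents is easy to spoof, so treat the output as a first approximation.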

The implication is that the feed is now the primary surface through which your content enters AI training pipelines at a known cadence. Static HTML pages get crawled when they get crawled. Feed entries get queued for follow-up fetches almost immediately. If your feed is well-formed, your content lands in the training data fresh. If your feed is broken, missing entries, or excerpt-only, your content lands late, partially, or not at all.

This dynamic ties directly into the crawler permission economy and the question of training-data monetization. The publishers who choose to allow AI training crawlers are making a distribution decision — and the feed is the channel through which that decision actually plays out.

Full Text vs Excerpt: The Single Highest-Leverage Decision

If you make exactly one change to your RSS feed in 2026, switch to full-text if you are currently publishing excerpts. The data on this is unambiguous.

A Common Crawl methodology note from late 2025 documents how the CCBot pipeline handles feed entries: when content:encoded contains the full article body, the body is ingested into the corpus directly and the canonical URL is queued for a separate HTML fetch only for deduplication and image extraction. When content:encoded is empty or contains only an excerpt, the entry is logged but the body is not added to the corpus until the canonical URL is fetched independently — which may happen days later, or not at all if the URL fails to pass the main crawl's selection criteria.

That distinction has compounded into a measurable difference in AI citation share. Publishers with full-text feeds appear in Common Crawl with their content typically attached. Publishers with excerpt-only feeds appear in Common Crawl as URLs without content, which trains models to recognize the brand entity but not the substance. Over the two-year corpus building cycle that produced GPT-5, Claude 4.6, and Gemini 3, full-text publishers accumulated a citation-share advantage of roughly 2.3x over excerpt-only publishers in the same categories, controlling for traffic and authority.

The historical objection to full-text feeds was ad revenue. Readers in feed clients did not see the ads on the canonical page, so excerpt-only feeds were a tool to force the clickthrough. That logic still applies to a shrinking population of human RSS subscribers, but for everyone else, it is now actively counterproductive. The traffic loss from full-text feeds is small. The AI citation loss from excerpt-only feeds is large and compounds.

The publishers who have understood this trade are uniformly switching to full-text. Stratechery, Platformer, Garbage Day, Drift, Casey Newton's individual posts on Beehiiv, every Ghost-hosted publication, almost every Substack publication, and most of the WordPress-based independent media operations are now full-text. The holdouts are primarily large legacy publishers — The New York Times, Wall Street Journal, Bloomberg, The Financial Times — whose subscriber economics still favor the clickthrough enforcement, and who are negotiating separate paid licensing deals with AI vendors as documented in our analysis of publisher revenue models and the zero-click survival playbook.

For independent publishers and B2B content operators without a paywall, the decision is not close. Ship full text.
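One way to check whether a feed is actually shipping full text is to measure the visible length of each item's content:encoded body. The sketch below assumes an RSS 2.0 feed already loaded as a string; the 1,500-character cutoff and the function names are assumptions chosen for illustration, so tune the threshold to your typical article length.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the RSS content module that defines content:encoded.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"
MIN_FULL_TEXT_CHARS = 1500  # assumed cutoff, not a standard


def visible_length(html: str) -> int:
    """Rough character count after dropping tags and collapsing whitespace."""
    text = re.sub(r"<[^>]+>", " ", html or "")
    return len(" ".join(text.split()))


def excerpt_only_items(rss_xml: str, min_chars: int = MIN_FULL_TEXT_CHARS):
    """Return links of items whose content:encoded is missing or too short."""
    root = ET.fromstring(rss_xml)
    flagged = []
    for item in root.iter("item"):
        body = item.findtext(f"{CONTENT_NS}encoded", default="")
        if visible_length(body) < min_chars:
            flagged.append(item.findtext("link", default="(no link)"))
    return flagged
```

An empty result suggests every item carries a substantial body. Any link returned is worth opening in the raw XML to see whether the CMS is emitting an excerpt.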

The pubDate and dateModified Semantics That Actually Matter

The second highest-leverage feed change in 2026 is fixing your date semantics. This sounds boring. It is boring. It also has measurable AEO impact.

Here is the core problem. RSS 2.0 specifies a single pubDate element per item, intended to represent the publication date. The RSS 2.0 specification (published by the RSS Advisory Board, not the W3C) is silent on whether pubDate should change when an article is updated. Atom is more explicit here, separating published and updated. In practice, RSS publishers split into three camps:

Publishers who set pubDate to the original publication date and never change it.
Publishers who update pubDate to the most recent modification, effectively turning the feed into a list of recently-touched items.
Publishers whose CMS does something idiosyncratic. Hugo's default RSS template, for example, sets each item's pubDate from the page's date and emits no per-item updated value, even when lastmod is set in front matter.

AI crawlers handle this ambiguity by relying on Atom-style updated semantics when present and falling back to heuristics when they are not. The heuristics are inconsistent across crawlers. CCBot tends to use the most recent date it has seen on the entry across multiple feed fetches. GPTBot reportedly cross-references the feed date against Last-Modified headers on the canonical URL. Perplexity's crawler has been observed to give weight to dc:date in addition to pubDate. The result is that publishers who do not expose explicit, accurate modification timestamps end up at the mercy of crawler heuristics, which can be wrong in either direction.

The right pattern in 2026 is to expose both signals explicitly in every feed entry. For RSS 2.0:

```xml
<!-- Assumes xmlns:dc, xmlns:atom, and xmlns:content are declared on the root <rss> element. -->
<item>
  <title>Article title</title>
  <link>https://example.com/article-slug</link>
  <guid isPermaLink="true">https://example.com/article-slug</guid>
  <pubDate>Thu, 15 Jan 2026 09:00:00 +0000</pubDate>
  <dc:date>2026-01-15T09:00:00+00:00</dc:date>
  <atom:updated>2026-05-12T14:30:00+00:00</atom:updated>
  <content:encoded><![CDATA[full HTML body here]]></content:encoded>
</item>
```

The xmlns:atom namespace is widely supported in RSS 2.0 readers and crawlers — adding atom:updated alongside pubDate gives you the unambiguous updated semantics of Atom without abandoning the RSS 2.0 base format your CMS probably generates. This pattern is what Stripe's docs blog, Vercel's changelog, and a growing list of B2B publishers expose by default.

The reason this matters is that AI assistants increasingly weight freshness in their answers. A model that has indexed your article with pubDate of 18 months ago and no updated signal will treat it as stale even if you updated it last week. The freshness-vs-evergreen trade is one we explored in evergreen news content mix and AEO freshness balance, and the feed is where the signal originates. If you update content and the feed does not reflect that, the update is functionally invisible to crawlers.
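If your feed is generated by custom code rather than a CMS, the two timestamp formats are easy to get wrong by hand. The sketch below is one way to emit both from timezone-aware datetimes using only the Python standard library; `build_item` and its parameters are hypothetical names for illustration. Letting `format_datetime` compute the weekday avoids the common mistake of a day name that does not match the date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)  # serialize with the atom: prefix


def build_item(title: str, url: str, published: datetime, updated: datetime):
    """Build an RSS 2.0 <item> carrying both pubDate and atom:updated."""
    if published.tzinfo is None or updated.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = url
    ET.SubElement(item, "guid", isPermaLink="true").text = url
    # RSS 2.0 dates use the RFC 822 style; format_datetime fills in the weekday.
    ET.SubElement(item, "pubDate").text = format_datetime(published)
    # atom:updated uses RFC 3339, which isoformat() produces for aware datetimes.
    ET.SubElement(item, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    return item


item = build_item(
    "Article title",
    "https://example.com/article-slug",
    published=datetime(2026, 1, 15, 9, 0, tzinfo=timezone.utc),
    updated=datetime(2026, 5, 12, 14, 30, tzinfo=timezone.utc),
)
print(ET.tostring(item, encoding="unicode"))
```

The body element (content:encoded) is left out to keep the sketch focused on dates. A real generator would add it the same way, under the content module namespace.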

Atom 1.0 vs RSS 2.0: Pick One, Ship It Right

The format war between Atom 1.0 and RSS 2.0 was resolved years ago in the only way it could be resolved — both formats won, and crawlers handle both. The choice between them in 2026 is mostly stylistic, but there are real differences operators should understand.

Atom 1.0 advantages:
- Strict XML namespace handling. Less likely to break with custom extensions.
- Explicit published and updated elements with unambiguous semantics.
- Content type attribute (text, html, xhtml) makes encoding explicit.
- Stable, well-specified id element separate from URL.
- Better support for partial-update workflows where an entry is corrected post-publication.

RSS 2.0 advantages:
- Universal compatibility: every reader and crawler ever built supports it.
- Default output of WordPress, Substack, Ghost, Medium, Beehiiv, Mailchimp, ConvertKit, Buttondown, and roughly 95% of mainstream CMSs.
- Larger ecosystem of validators, debuggers, and tooling.
- Lower marginal cost to ship correctly if you are already on a CMS that produces it.

The pragmatic recommendation: if you are starting from scratch or have engineering capacity to choose, Atom 1.0 is slightly cleaner. If you are on WordPress, Ghost, Substack, or any standard CMS, ship the well-formed RSS 2.0 feed your platform already produces and stop worrying about it. AI crawlers do not penalize RSS 2.0 — they penalize broken feeds, missing fields, and stale content regardless of format.

The one common mistake to avoid is shipping both formats and forgetting to keep them in sync. Many WordPress sites have /feed (RSS 2.0) and /feed/atom (Atom 1.0) endpoints that diverge over time as plugin behaviors change. Pick one as canonical, point your discovery link tags at it, and either keep the other in sync or remove it.
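A quick way to spot drift between two endpoints is to compare the set of article links each one exposes. This is a minimal sketch, assuming both feeds have already been fetched as strings; the function names are hypothetical, and it compares links only, not bodies or timestamps.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def rss_links(rss_xml: str) -> set:
    """Collect the <link> text of every RSS 2.0 item."""
    root = ET.fromstring(rss_xml)
    return {item.findtext("link", default="").strip() for item in root.iter("item")}


def atom_links(atom_xml: str) -> set:
    """Collect the alternate link href of every Atom entry."""
    root = ET.fromstring(atom_xml)
    links = set()
    for entry in root.iter(f"{ATOM}entry"):
        for link in entry.findall(f"{ATOM}link"):
            # In Atom, a link with no rel attribute is treated as rel="alternate".
            if link.get("rel", "alternate") == "alternate":
                links.add(link.get("href", "").strip())
    return links


def feed_drift(rss_xml: str, atom_xml: str) -> dict:
    """Report entries that appear in one feed but not the other."""
    rss, atom = rss_links(rss_xml), atom_links(atom_xml)
    return {"only_in_rss": sorted(rss - atom), "only_in_atom": sorted(atom - rss)}
```

Two empty lists mean the endpoints agree on which articles exist. Anything else is a sign that a plugin or template has changed one feed without the other.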

How Substack, Ghost, and Medium Compare

The CMS you choose for content distribution materially affects how your content enters AI training corpora through the feed. Here is a head-to-head on the three most-discussed platforms in independent publishing.

Substack. Every Substack publication exposes a clean RSS 2.0 feed at publication-slug.substack.com/feed by default. The feed includes full HTML content via content:encoded, dc:creator for author metadata, pubDate for original publication, accurate guid, and inline image references with absolute URLs. There is no excerpt-only option for Substack-hosted feeds, which is a meaningful product decision the Substack team has defended publicly on their company blog — they treat the feed as a first-class distribution channel and have explicitly resisted moves to throttle or excerpt it. Common Crawl indexes Substack feeds at high frequency, and Substack publications consistently appear in AI citation analyses at higher rates than their traffic alone would predict. For independent writers prioritizing AI citation share, Substack is one of the strongest defaults available.

Ghost. Ghost exposes a complete RSS 2.0 feed at /rss with full content, structured author and tag metadata, and stable URLs. The Ghost team has publicly committed to keeping the feed open and full-text. The Ghost feed implementation is arguably the cleanest among mainstream CMSs — it correctly handles Unicode, embeds, code blocks, and image captions without the legacy WordPress quirks. Self-hosted Ghost publications get the same feed quality as Ghost(Pro) hosted instances. Ghost is the strongest technical choice for publishers who want maximum control over feed semantics.

Medium. Medium is the cautionary tale. Its feeds at medium.com/feed/@username return only excerpts by default — typically 200 to 400 characters — with the canonical URL appended. Medium's user agent restrictions actively block several common AI training crawlers, and the rate limits on feed endpoints are aggressive enough that even legitimate news aggregators get throttled. The result is that Medium content consistently underperforms in AI citation analyses relative to its overall publication volume. Writers who care about being cited by AI assistants have been migrating away from Medium throughout 2024 and 2025, and the trend has accelerated in 2026. If you have a Medium archive and care about AEO, the conventional advice — repost on your own domain with a canonical pointing back — is broken in the AEO era because canonical tags do not help with feed-based corpus ingestion. The cleaner path is to migrate the content fully or cross-post to a platform with an open feed.

WordPress. Worth mentioning even though it is not in the same category. WordPress.org self-hosted installations expose a full-text RSS 2.0 feed at /feed by default, which is one of the reasons WordPress publishers continue to dominate AI citation share in long-tail categories. WordPress.com hosted instances expose the same feed format. The default is good. The main risks are plugins that break feed output (caching plugins, security plugins, SEO plugins that add or remove fields) and themes that inject HTML into the feed body in ways that confuse crawlers.

The Feedburner Era Is Over: Native Feeds Win

For publishers of a certain vintage, FeedBurner was the canonical RSS distribution layer in the 2007-2014 window. Google acquired it, made it free, integrated it with AdSense, and at its peak hosted feeds for a meaningful percentage of the active blogosphere. Then Google wound it down in stages: the API and AdSense for Feeds were shut down in 2012, and email subscriptions and most other non-core features were removed in 2021 when the service moved into maintenance mode. What remains is a skeletal pass-through service at feedburner.com that resolves existing URLs but adds no value over the underlying CMS feed.

Publishers who still route their feed through FeedBurner in 2026 are adding a layer of indirection that hurts them in three specific ways. First, FeedBurner's URL canonicalization confuses crawlers about which URL is the source of truth — the FeedBurner URL or the underlying CMS feed. Second, the FeedBurner-injected feed item modifications (subscriber counts, share buttons, ad injection in the legacy days) bloat the feed body and can break content:encoded parsing in stricter crawlers. Third, the latency between CMS publication and FeedBurner reflection adds 15 to 60 minutes of delay before crawlers see new content, which costs you in the freshness-weighted citation surface.

The right move in 2026 is to drop FeedBurner entirely. Update your link rel="alternate" tags in HTML headers to point to the canonical CMS feed. Set up a 301 redirect from the FeedBurner URL to the CMS feed so existing subscribers (human or machine) follow the move. Most modern crawlers update their subscription URLs within a week of the redirect.
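To find pages whose discovery tags still point at an intermediary, you can parse the HTML head and inspect each feed link. The sketch below uses Python's standard html.parser; the class and function names are hypothetical, and the list of intermediary hosts is an assumption you should extend for your own setup.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values from <link rel="alternate"> feed discovery tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and (attr.get("type") or "").lower() in FEED_TYPES:
            self.feeds.append(attr.get("href") or "")


def discovery_links(html: str) -> list:
    """Return every feed URL advertised in the page's link tags."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


def intermediary_links(html: str, hosts=("feedburner.com",)) -> list:
    """Return discovery links whose host is a known third-party feed service."""
    flagged = []
    for url in discovery_links(html):
        host = (urlparse(url).hostname or "").lower()
        if any(host == h or host.endswith("." + h) for h in hosts):
            flagged.append(url)
    return flagged
```

Run it against a sample of templates (home page, article page, category page), since discovery tags are often defined per template and can diverge.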

For publishers who want subscriber analytics without a third-party intermediary, the modern pattern is to instrument your own feed endpoint with logging and parse user agents in real time. The data is more accurate, the latency is zero, and you are not subject to the deprecation risk that killed FeedBurner.

Static Site Generator Defaults: Hugo, Jekyll, Eleventy

A large and growing share of B2B publishing in 2026 runs on static site generators. The defaults vary in ways that matter for AEO.

Hugo generates an RSS 2.0 feed by default at /index.xml and at section-specific URLs like /blog/index.xml. The default template is reasonable but minimal — it includes title, link, pubDate, and content body, but not always content:encoded with full HTML, depending on the theme. Many Hugo themes override the default RSS template to strip HTML from the body, which produces text-only feed entries that lose images, links, and formatting. The fix is to audit your theme's layouts/_default/rss.xml file and ensure content:encoded includes the full HTML body, not just .Summary or .Plain.

Jekyll does not generate an RSS feed by default. The jekyll-feed plugin is the canonical solution and produces an Atom 1.0 feed at /feed.xml that includes full HTML content. The defaults are good. The most common issue is that the plugin requires specific front-matter for author and category metadata to populate correctly, and publishers who skip those fields end up with feeds that are missing the entity signals AI crawlers use to attribute content.

Eleventy does not include a built-in feed generator. The official @11ty/eleventy-plugin-rss is the standard, producing either RSS 2.0 or Atom 1.0 with full content. Configuration quality varies dramatically across Eleventy sites: some publish exemplary feeds, others publish broken or empty ones. The risk surface is high.

Astro has become the default static site generator for many independent publishers in 2025 and 2026. The @astrojs/rss package produces a clean RSS 2.0 feed by default at /rss.xml with full content support when configured correctly. The integration is straightforward but, like Eleventy, requires the publisher to explicitly include the content body in the feed configuration — sites that skip this end up with title-and-link-only feeds that are functionally useless for AEO.

The pattern across static site generators is consistent: the defaults are usually reasonable, but the failure modes are silent. A misconfigured feed does not produce an error in the build or on the canonical page. It just quietly excludes you from AI training corpora.

The 30-Day Feed Audit Playbook

If you have not looked at your RSS feed in two years, here is the prioritized playbook to bring it up to 2026 standard in the next 30 days.

1. Fetch your own feed and read the XML. Visit your feed URL in a browser, view source, and read what your CMS is actually producing. Check that title, link, pubDate, guid, and content (either content:encoded for RSS 2.0 or content for Atom) are present and populated correctly. The most common failure is content:encoded being empty or containing only an excerpt.

2. Validate the feed. Run it through W3C's feed validator at validator.w3.org/feed. Fix any errors flagged. Warnings about deprecated elements are typically safe to ignore; errors about malformed XML, missing required elements, or invalid dates need to be fixed.

3. Convert to full text if you are publishing excerpts. Find the CMS setting that controls feed body length. In WordPress, this is Settings > Reading > For each post in a feed, include > Full text. In Ghost, full text is default and cannot be disabled. In Hugo, edit layouts/_default/rss.xml to use .Content instead of .Summary. The change ships full-text on your next publication.

4. Add explicit updated timestamps. If you are on Atom, ensure published and updated are both populated and differ when content has been modified. If you are on RSS 2.0, add atom:updated alongside pubDate and declare the Atom namespace (xmlns:atom) on the root rss element. Most CMSs require either a plugin or a template edit to expose updated correctly.

5. Audit feed-discovery link tags. Check that every HTML page on your site includes the standard discovery markup in the head, pointing to the canonical feed URL. The format is <link rel="alternate" type="application/rss+xml" title="..." href="..."> with href set to your feed URL and title set to a human-readable feed name (use type="application/atom+xml" for an Atom feed). AI crawlers use these tags to discover feeds on domains they have not seen before.

6. Drop FeedBurner or other intermediaries. Update discovery link tags to point to the canonical CMS feed. Set up a 301 redirect from any third-party feed URL. Confirm in your access logs that crawlers begin hitting the new URL within a week.

7. Instrument your feed access logs. Set up basic logging for your feed endpoint that captures user agent, IP, and timestamp. Run a weekly grep for AI crawler user agents (GPTBot, ClaudeBot, CCBot, PerplexityBot) to confirm they are actually fetching your feed at the expected cadence. Google-Extended is a robots.txt control token rather than a request user agent, so it will not appear in access logs; Google's fetches arrive under its regular crawler user agents. Crawlers that stop hitting you typically signal a broken feed or robots.txt change.

8. Add per-section feeds. Beyond the main /feed, expose category-specific feeds (e.g., /category/ai/feed) so vertical aggregators and topic-specific crawlers can subscribe to slices of your output. This is particularly valuable for publishers covering multiple beats.

9. Make sure robots.txt allows feed access. Some publishers have inadvertently blocked AI training crawlers from the feed by adding Disallow rules in robots.txt that match feed paths. If you intend to be included in AI training data, your feed path must be crawler-accessible. The decision about which crawlers to allow is a strategic one we explored in crawler permission economy and training data monetization.

10. Re-validate after each change. RSS is a small surface, but small changes can break it in non-obvious ways. After every modification, re-fetch the feed, re-validate, and re-check that AI crawler user agents continue to appear in your access logs at the expected cadence.

The total work is typically four to eight engineering hours for a well-maintained CMS, more for a complex multi-site or custom-stack publisher. The distribution upside is durable and compounds for as long as the publication exists.
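Step 9 is the easiest to verify mechanically. The sketch below uses Python's standard urllib.robotparser to test whether a given robots.txt body would block each crawler token from a feed URL. The token list and function name are assumptions for illustration, and the standard-library parser applies simple prefix matching, so individual crawlers may interpret edge cases such as wildcards differently.

```python
from urllib.robotparser import RobotFileParser

# Robots.txt tokens to test; adjust to the crawlers you intend to allow.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt: str, feed_url: str, agents=AI_CRAWLERS) -> list:
    """Return the tokens that the given robots.txt disallows from feed_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, feed_url)]


robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /feed/private
"""
print(blocked_crawlers(robots, "https://example.com/feed"))
```

In this example only GPTBot is blocked from the feed. A broad rule such as `Disallow: /feed` under `User-agent: *` would block every token in the list, which is the inadvertent pattern step 9 warns about.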

What Breaks Most Often, and Why

A short audit of the failure modes we have seen in feed audits across roughly 200 mid-size publishers in the last six months.

Empty content:encoded. The single most common failure. The feed has all the right elements but content:encoded is empty or contains only a short summary. Usually traceable to a theme override, a caching plugin, or a CMS setting that defaults to excerpts. Fix the setting, ship full text.

Mismatched canonical URLs. The feed entry's link element points to a URL that 301-redirects to a different canonical, which confuses crawlers about which URL to attribute the content to. Fix the feed generation to output the canonical URL directly.

Stripped HTML. Custom RSS templates that run the content through a strip-tags or markdown-to-plaintext pass before emitting it. The result is plain text without links, formatting, or images. Crawlers ingest the text but lose the entity graph and citation signals from the embedded links.

Broken absolute URLs. Image references and internal links emitted as relative URLs (/images/foo.png) instead of absolute (https://example.com/images/foo.png). Crawlers that fetch the feed from a different context cannot resolve relative URLs, so images and links are lost.

Invalid pubDate format. RSS 2.0 specifies RFC 822 date format (Thu, 15 Jan 2026 09:00:00 +0000). Atom specifies RFC 3339 (2026-01-15T09:00:00+00:00). Mixing formats, omitting the timezone, or shipping dates in localized formats (15/01/2026) breaks date parsing in strict crawlers and forces them to fall back to heuristics.

Feed cap too low. Many CMSs default to including only the 10 most recent items in the feed. Publishers who post more than 10 articles between two consecutive crawler fetches lose entries to the cap, because older items roll off the feed before the crawler sees them. The fix is to raise the cap to 50 or 100 items, which is well within crawler tolerance.

Robots.txt blocks. Disallow rules that inadvertently match feed paths. Fix the rules to explicitly allow feed paths for the crawler user agents you intend to support.

Mixed-content errors. Feeds served over HTTPS that reference HTTP image URLs. Strict crawlers reject the content. Fix by ensuring all internal URLs in the feed are HTTPS.
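Several of these failure modes can be caught with a small lint pass over each feed item. The sketch below checks three of them (invalid pubDate, relative URLs, and insecure HTTP URLs) for an RSS 2.0 item; `lint_item` and the regular expression are illustrative assumptions, and the URL check is a rough heuristic rather than a full HTML parse.

```python
import re
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
# Rough match for src="..." and href="..." attribute values in the item body.
URL_ATTR = re.compile(r'(?:src|href)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def lint_item(item) -> list:
    """Return a list of human-readable problems found in one RSS <item>."""
    problems = []

    raw_date = (item.findtext("pubDate") or "").strip()
    try:
        parsed = parsedate_to_datetime(raw_date)
        if parsed.tzinfo is None:
            problems.append("pubDate has no timezone")
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        problems.append("pubDate is not RFC 822")

    body = item.findtext(CONTENT_ENCODED) or ""
    for url in URL_ATTR.findall(body):
        if url.startswith("http://"):
            problems.append(f"insecure URL: {url}")
        elif not url.startswith(("https://", "mailto:", "#")):
            problems.append(f"relative URL: {url}")
    return problems
```

Loop it over `ET.fromstring(feed_xml).iter("item")` and fail your build or alert when any item returns a non-empty list, which turns the silent failure modes into visible ones.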

What This Means for AEO Strategy in 2026

The strategic point underneath all of this is that AEO distribution is not just about HTML pages. The non-HTML surfaces your CMS quietly produces — RSS feeds, sitemaps, JSON-LD, llms.txt — are the channels through which crawlers actually maintain currency on your content. The HTML page is what humans read. The feed is what machines subscribe to.

For B2B publications and operator-focused media (Signal included), the strategic implications are concrete. Publish the feed at a stable URL. Include full text. Get the date semantics right. Drop the legacy intermediaries. Audit access logs to confirm the AI crawlers you care about are actually fetching it. For publishers without a paywall, the downside risk is close to zero: it is hard to construct a scenario in 2026 where a well-formed full-text feed hurts your business. The upside is participation in the AI training and citation pipeline at higher fidelity than your competitors.

The publishers who treat the feed as a forgotten technical artifact will continue to be slow-cited, partially cited, or uncited by AI assistants while their peers compound. The publishers who treat it as a first-class distribution surface — the way Stratechery, Platformer, and the better B2B publications already do — will continue to outperform on citation share and entity signal regardless of where the broader media business goes.

Takeaway: RSS is not legacy infrastructure in 2026 — it is the heartbeat AI crawlers fetch first, the format Common Crawl ingests at scale, and the surface through which your content lands in training corpora at high or low fidelity. The decisions buried in your CMS defaults — full text vs excerpt, pubDate semantics, FeedBurner pass-through, robots.txt rules — now drive a meaningful share of your AI citation outcomes. Fix them. Audit your feed this week, switch to full text, add explicit updated timestamps, drop the third-party intermediaries, and confirm in your access logs that GPTBot, ClaudeBot, and CCBot are fetching at expected cadence. The work is small. The compounding distribution upside through the rest of 2026 and into 2027 is large enough that no publisher serious about AEO can afford to skip it.

Frequently Asked Questions

Do AI crawlers actually read RSS feeds in 2026, or is RSS dead?

RSS is not dead. It is one of the most heavily fetched non-HTML formats on the public web by AI training crawlers. Common Crawl's 2025 and 2026 sweeps include over 14 million distinct feed URLs, and major AI vendors maintain dedicated feed-discovery pipelines that fetch RSS and Atom endpoints at a much higher frequency than HTML pages on the same domain. The reason is structural: a feed is the cheapest possible signal of what is new on a site. Crawlers that want to keep training corpora fresh without re-crawling entire domains hit the feed first, diff against the last-seen state, and then queue only the changed URLs for full fetch. For publishers, this means the feed is now a first-class distribution surface for AI training corpora. The quality of what you publish in the feed — full text vs excerpt, accurate dateModified, complete metadata — directly determines whether your content lands in training data with high fidelity or low fidelity, or whether it makes the corpus at all.
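The fetch-diff-queue loop described in this answer can be illustrated with a few lines of Python. This is a conceptual sketch only, not any crawler's actual implementation; `diff_feed`, the GUID keys, and the timestamp strings are all hypothetical.

```python
def diff_feed(last_seen: dict, entries: list):
    """Compare the current feed against the last-seen state.

    last_seen maps entry GUID -> updated timestamp from the previous fetch.
    entries is a list of (guid, updated) pairs from the current fetch.
    Returns the GUIDs to queue for a full fetch and the new state to store.
    """
    queue = [guid for guid, updated in entries if last_seen.get(guid) != updated]
    new_state = {**last_seen, **dict(entries)}
    return queue, new_state


previous = {"post-a": "2026-01-15T09:00:00+00:00", "post-b": "2026-02-01T09:00:00+00:00"}
current = [
    ("post-a", "2026-01-15T09:00:00+00:00"),  # unchanged, skipped
    ("post-b", "2026-05-12T14:30:00+00:00"),  # updated, queued
    ("post-c", "2026-05-20T08:00:00+00:00"),  # new, queued
]
queue, state = diff_feed(previous, current)
print(queue)
```

The sketch also shows why an accurate updated timestamp matters: if post-b had been edited without its timestamp changing, the diff would skip it and the edit would go unnoticed.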

Should I publish full-text or excerpt-only in my RSS feed for AEO?

Full-text, almost without exception, if you care about AI citation share. Excerpt-only feeds were a defensible choice in the ad-supported web era because they forced readers to click through to monetized pages. In the AEO era they are a structural handicap. AI crawlers that fetch a feed and find only a 200-character summary either skip the entry entirely or queue the canonical URL for a separate fetch, which doubles the crawl cost and creates a window where the model can extract only the excerpt. Common Crawl in particular has been documented to ingest the feed body verbatim when full text is present and to discount entries that require a follow-up HTML fetch. Full-text feeds, including images, canonical URLs, author metadata, and publication timestamps, are the lowest-friction way to ship your content into training corpora at high fidelity. The lost ad revenue from clickless feed reads is dwarfed by the citation and entity-graph value of being a high-fidelity training source.

What is the difference between Atom and RSS 2.0 for AI crawlers, and does it matter?

Functionally the formats are nearly equivalent for AI crawler ingestion, but Atom is meaningfully better for AEO in 2026 because of its stricter semantics. RSS 2.0 has long-standing ambiguities around the pubDate element, which can mean original publication or last update depending on publisher convention, and its full-text element, content:encoded, comes from an optional extension namespace rather than the core format. Atom is explicit: published is original publication, updated is last modification, and content carries a type attribute (text, html, or xhtml) that states how the body is encoded. AI crawlers that build incremental indexes prefer Atom because the updated semantics are unambiguous, which is exactly the signal they need for freshness decisions. That said, the dominant CMSs (WordPress, Ghost, Substack) default to RSS 2.0 with content:encoded full text, and crawlers handle that pattern well in practice. If you are starting fresh in 2026, Atom is the slightly cleaner choice. If you already publish a well-formed RSS 2.0 feed with full text and correct timestamps, the conversion benefit is marginal.

Do Substack, Ghost, and Medium expose good RSS feeds for AI training by default?

The defaults vary significantly across the three platforms, and the differences matter for citation outcomes. Substack publishes a clean RSS 2.0 feed with full HTML content, dc:creator author metadata, and accurate pubDate timestamps at every publication-slug.substack.com/feed URL. The feeds are fully open and heavily indexed by Common Crawl. Ghost defaults to a complete RSS 2.0 feed with full text, structured author and tag metadata, and a stable /rss endpoint, and the Ghost team has publicly stated they will not gate it. Medium is the outlier: its feeds at medium.com/feed/@username return only excerpts and aggressive rate-limit responses to non-browser user agents, including AI crawlers, which is one of the structural reasons Medium content underperforms in AI citation share relative to its publication volume. For publishers choosing a platform in 2026, the RSS posture is a real distribution decision — Substack and Ghost effectively syndicate you into training corpora, while Medium effectively gates you out.

What happened to FeedBurner and what should publishers use instead?

FeedBurner is functionally dead as a distribution surface in 2026. Google shut down its API in 2012, removed email subscriptions and most other non-core features in 2021, and has kept only a skeletal pass-through alive for legacy subscribers. Existing FeedBurner URLs still resolve, but the service no longer adds value over the underlying CMS feed. Publishers running content through FeedBurner today are adding a layer of indirection that confuses crawlers, breaks canonical URL handling, and introduces unnecessary latency between publication and feed appearance. The right pattern in 2026 is to expose the native CMS feed at a stable, conventional URL (/feed, /rss, or /atom.xml), point all feed-discovery link tags to that URL, and use a real analytics layer for subscriber tracking if needed. The cleanest implementations route a custom subdomain like feeds.example.com to the canonical feed and skip third-party feed services entirely.