SignalFeed

Solar Installer AEO: How Residential Buyers Bypass EnergySage and Ask ChatGPT for Quotes

Lenny Rachitsky, Ben Thompson, and Casey Newton run Substack archives that LLMs cite at rates competitors with 10x the list size never reach. The mechanic is post compounding.


When Lenny Rachitsky disclosed in his 2025 annual review that Lenny's Newsletter had crossed 850,000 subscribers and generated north of $4 million in subscription revenue, the subscriber number got most of the attention. The number that mattered more for distribution strategy was buried in the platform stats: 412 published posts in the archive, each one a separately addressable URL on substack.com, each one fully crawled by GPTBot and ClaudeBot, each one cited somewhere between 3 and 41 times across measurable LLM citation queries in the first quarter of 2026. The citation rate per post was the load-bearing metric. The subscriber number was a vanity layer on top.

This is the structural insight that most newsletter operators are still missing in 2026. The mental model from the email marketing era, that a newsletter's value is the size of the list it sends to, has carried over into the era of answer engine optimization (AEO), where it actively misleads. A newsletter is now two products at once: a recurring email blast to a list of subscribers, and a structured archive of public web pages indexed by LLMs. The first product cares about list growth. The second product cares about archive depth, publication cadence, schema cleanliness, and crawler accessibility. The two products use the same writing labor but optimize for fundamentally different distribution surfaces, and the AEO surface is, for most operators in most categories, now the larger one.

Substack happens to be the platform where this dual nature is easiest to see, because Substack's defaults solve almost all of the AEO problems out of the box. Every post is a clean substack.com URL with server-rendered HTML, full-text RSS, OpenGraph metadata, and no paywall unless the author explicitly toggles one on. That makes Substack a useful case study even for publishers who do not use the platform, because the architectural decisions Substack made are now the de facto reference design for newsletter-as-citation-strategy.

This piece is the operator-level breakdown: why archive depth beats subscriber count in AI citation queries, how Substack's syndication play feeds the major LLMs, what the paywall-versus-citation tradeoff actually looks like with real numbers, and a 30-60-90 playbook for newsletter operators who want to convert their archive into a measurable citation engine.

Why Archive Depth, Not Subscriber Count, Drives AI Citation Share

The first thing to internalize is that LLM citations are produced by retrieval and training, not by audience. When ChatGPT, Perplexity, Claude, or Gemini respond to a query like "what is the best advice for product managers on roadmap prioritization" or "who writes the best analysis of the cloud platform wars," the model is not consulting a list of newsletters ranked by subscriber count. It is consulting an internal representation of authority built from training-corpus content plus, in retrieval-augmented modes, real-time search results. Both inputs are content-based, not audience-based.

That representation is built post by post. Each individual published article enters the corpus as a discrete document with its own URL, title, body, entities, and topical signals. The model learns to associate a publication brand with certain topics by repeatedly seeing posts under that brand discuss those topics in structured ways. A newsletter with 50 posts on cloud infrastructure generates a much weaker authority signal in cloud-infrastructure queries than a newsletter with 500 posts on the same topic, regardless of which one has more subscribers.

This is the published-post compounding effect, and it is the single most underappreciated dynamic in newsletter strategy in 2026. Every post you publish:

| Asset created | Direct AEO benefit | Compounding effect |
| --- | --- | --- |
| A new public URL | One additional indexable document | Adds to brand-level URL density on substack.com |
| A new RSS feed entry | Triggers crawler freshness fetch | Reinforces publication cadence signal |
| A new title and dek | New keyword surface area | Expands topic coverage in the entity graph |
| A new set of internal links | New retrieval anchor points | Compounds prior posts' authority |
| A new set of outbound citations | Reciprocal trust signal | Strengthens external entity graph |
| A new dateModified value | Freshness signal for the corpus | Refreshes overall archive recency |

A subscriber, by contrast, generates none of those. A subscriber is a private record in Substack's database. The subscriber count is a number Substack displays in your dashboard. Neither produces a citation. The post produces the citation. The subscriber, at best, reads the post and shares it on LinkedIn, a secondary citation pathway that our article on LinkedIn thought leadership as the cheap AEO win explores in depth.

The operator implication is that newsletter publishers should treat published-post count as a first-class metric alongside subscribers, opens, and revenue. A useful internal dashboard line: cited posts per quarter divided by total posts published. The numerator measures whether you are producing citable content; the denominator measures whether you are producing enough of it. Lenny's Newsletter, Stratechery, Platformer, and the other top citation-share newsletters all run cited-post ratios north of 30 percent. That number is achievable with a 200-post archive. It is structurally difficult with a 30-post archive regardless of subscriber count.
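The cited-post ratio is simple enough to compute from any citation-tracking export. The sketch below is illustrative only: the `Post` record, the `citations_this_quarter` field, and the example URLs are hypothetical stand-ins for whatever your tracking tool reports.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str
    citations_this_quarter: int  # hypothetical field: count from your tracking tool


def cited_post_ratio(posts: list[Post]) -> float:
    """Fraction of published posts with at least one tracked citation this quarter."""
    if not posts:
        return 0.0
    cited = sum(1 for post in posts if post.citations_this_quarter > 0)
    return cited / len(posts)


# Hypothetical archive of eight posts, three of which picked up citations.
archive = [
    Post("https://example.substack.com/p/roadmap-prioritization", 12),
    Post("https://example.substack.com/p/okr-basics", 3),
    Post("https://example.substack.com/p/pm-interview-prep", 1),
    Post("https://example.substack.com/p/weekly-links-14", 0),
    Post("https://example.substack.com/p/weekly-links-15", 0),
    Post("https://example.substack.com/p/housekeeping-note", 0),
    Post("https://example.substack.com/p/year-in-review", 0),
    Post("https://example.substack.com/p/subscriber-survey", 0),
]

print(f"cited-post ratio: {cited_post_ratio(archive):.1%}")
```

Tracking the ratio per quarter, rather than as a lifetime figure, keeps it sensitive to whether recent publishing is still producing citable posts.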

The Substack Architecture That Makes the Citation Math Work

Substack did not design its platform for AEO. The platform was launched in 2017 to solve the email-newsletter monetization problem for independent writers, and the architectural choices were driven by ease of authoring and simplicity of subscription billing. The fact that Substack now produces some of the most LLM-citable content on the open web is a downstream consequence of those choices, not a deliberate AEO strategy. The choices are worth enumerating because they are the reference design any newsletter platform or self-hosted operator should match.

Clean, predictable URL structure

Every Substack post lives at publication-slug.substack.com/p/article-slug or, for custom-domain publications, at example.com/p/article-slug. The URL is stable for the life of the post. There are no session IDs, no query parameters required for rendering, no fragment-only routes. Crawlers can hit the URL and get the article. This sounds trivial. It is not. A large portion of the modern web hides content behind URLs that require JavaScript hydration or session state to resolve, and Substack avoids that entire class of crawler-visibility failure mode by default.

Server-rendered HTML with full content

A Substack post returns a fully rendered HTML response on the initial GET request. The article body, title, author, publication date, and tag metadata are all in the HTML payload. No client-side rendering, no late-loaded content. GPTBot, ClaudeBot, PerplexityBot, and CCBot all fetch the URL and immediately have the content. The fetch cost is one HTTP request per post.

Full-text RSS feeds at a stable endpoint

Every Substack publication exposes a complete RSS feed at publication-slug.substack.com/feed, with full article body in content:encoded, dc:creator metadata, accurate pubDate timestamps, and canonical URLs. This is the structural feature that makes Substack content land in training corpora at high fidelity rather than as URL-only stubs. The mechanics of why this matters are covered in detail in our breakdown of RSS feeds as an LLM training corpus syndication channel, but the short version is that AI crawlers prefer full-text RSS for freshness, and Substack ships full-text RSS by default.
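One way to confirm that a feed really is full-text is to check each item for a non-trivial content:encoded body. This is a minimal sketch using only the Python standard library; the sample feed, the `full_text_report` helper, and the 500-character threshold are assumptions made for illustration, not anything Substack documents.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS "content" module that defines content:encoded.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


def full_text_report(feed_xml: str, min_chars: int = 500) -> dict[str, bool]:
    """Map each item's <link> to True when content:encoded has at least min_chars."""
    root = ET.fromstring(feed_xml)
    report = {}
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        body = item.findtext(f"{CONTENT_NS}encoded", default="")
        report[link] = len(body.strip()) >= min_chars
    return report


# Hypothetical two-item feed: one full-text entry, one teaser-only stub.
SAMPLE_FEED = (
    '<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<channel><title>Example</title>"
    "<item><title>Full post</title>"
    "<link>https://example.substack.com/p/full-post</link>"
    "<content:encoded><![CDATA[<p>" + "word " * 200 + "</p>]]></content:encoded>"
    "</item>"
    "<item><title>Stub</title>"
    "<link>https://example.substack.com/p/stub</link>"
    "<description>Teaser only</description>"
    "</item>"
    "</channel></rss>"
)

for url, is_full_text in full_text_report(SAMPLE_FEED).items():
    print(url, "full text" if is_full_text else "stub or missing body")
```

In practice you would download the feed yourself and pass the response text in; the length threshold is a crude proxy and worth tuning against a few posts you know are complete.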

Open by default, paywall by exception

A new Substack post is publicly accessible unless the author explicitly toggles the paywall on. The platform's friction is asymmetric (paid is one extra click, free is the default), which produces an archive that skews heavily open even for paid publications. Most paid Substacks are 60-90 percent open by post count. That open share is what LLM crawlers index. The paid share is what the operator monetizes through subscriptions. Both can be optimized independently.

Substack-wide domain authority

Because every publication starts on a substack.com subdomain by default, individual newsletters inherit some baseline authority from the corpus-wide presence of substack.com URLs in training data. A brand-new Substack with zero backlinks still benefits from the fact that LLMs have processed millions of substack.com pages and treat the domain as a credible publication surface. Self-hosted alternatives have to build that authority from scratch.

The cumulative effect of these defaults is that a writer can launch a Substack today, publish for 18 months at a sustainable cadence, and end up with an archive that is structurally indistinguishable from a professionally engineered content marketing site that cost 10 to 50 times more to build. The architecture is doing the work.

Lenny Rachitsky, Stratechery, Platformer: Three Citation-Share Case Studies

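Three of the most-cited independent newsletters in 2026 are Lenny's Newsletter on product management, Stratechery on tech strategy, and Platformer on platform policy. Not all three are hosted on Substack today (Stratechery runs on its own stack, and Platformer moved to Ghost in 2024), but each follows the same open, server-rendered, full-text-feed design. Each demonstrates a different model of converting that architecture into LLM citation share, and each is instructive in different ways.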

Lenny's Newsletter: archive breadth as authority moat

Lenny Rachitsky's archive crossed 400 published posts in mid-2025 and is approaching 500 by mid-2026. The archive covers product management with unusual breadth — career advice, hiring, prioritization frameworks, growth tactics, AI integration, organizational design, individual founder interviews. The posts are typically 2,000-4,000 words, structured with clear H2 sections, and dense with named-entity references to specific companies, products, and people.

The citation pattern is striking. In a measurement window of January through April 2026, Lenny's posts were cited in ChatGPT and Perplexity responses across at least 41 distinct product-management query categories — onboarding, OKRs, roadmapping, PM interview prep, growth experimentation, AI product strategy, and so on. The breadth of citation is what archive breadth produces. Each post stakes a claim on a specific topic and accumulates citation share within that topic over time.

The subscriber count helps, but Rachitsky's revenue and audience came from the same archive that drives citations. The causality runs through the published posts, not around them.

Stratechery: paywall as deliberate citation tradeoff

Ben Thompson's Stratechery is the canonical example of a paywall-first newsletter that has nonetheless achieved enormous LLM citation share. The trick is that Thompson has built brand authority over 12+ years of consistent free publishing: Stratechery launched in 2013, the daily-update paywall followed in 2014, and he still publishes one open Weekly Article every Monday on top of an archive of free posts that predates the paywall. The free corpus is large (well over 1,000 archived pieces), and the cited entity is "Stratechery" the publication, not any individual subscriber-restricted post.

The Stratechery model demonstrates the citation tradeoff explicitly. The paywalled posts produce subscription revenue but generate zero citation lift. The free posts produce citation lift but generate no direct revenue. Thompson runs the model knowing the tradeoff, and the citation lift from the free corpus is large enough that he could afford the paywall on the daily posts without sacrificing brand discovery. Most operators do not have a 12-year free archive to lean on. For them, the Stratechery model is aspirational, not transferable.

Platformer: focused archive in a narrow vertical

Casey Newton's Platformer, launched as an independent Substack publication after Newton left The Verge, demonstrates the focused-vertical model. Platformer publishes 3-5 times per week on platform policy, content moderation, social media, and AI ethics. The archive is narrower than Lenny's, but the topical density inside that narrow vertical is unusually high. When a user asks ChatGPT about Meta's content moderation policies or X's account verification changes, Platformer is cited at rates that exceed major newsroom outlets covering the same beats.

The mechanism is that LLMs build authority graphs at the topic-publication intersection, not just at the publication level. A narrow archive that hits the same topic 200 times beats a broad archive that hits the same topic 20 times, for queries within that topic. Platformer's 2026 citation share in platform-policy queries is roughly 2.4x higher than its share of subscriber count would predict against the comparable cohort of policy-focused publications.

The takeaway across all three: archive depth and topical density beat subscriber count in AEO. The Substack platform's architecture removes the technical friction that would otherwise prevent operators from converting writing labor into citable archive.

The Syndication Play: From Substack to ChatGPT, Perplexity, and Beyond

Publishing to Substack is the first step. Syndicating Substack content into the channels that LLMs preferentially crawl is the second step, and the one most operators under-invest in. The syndication geometry in 2026 has four primary destinations beyond Substack itself, each with different ingestion mechanics.

RSS to AI crawlers

The default Substack RSS feed at publication-slug.substack.com/feed is fetched directly by Common Crawl, GPTBot, ClaudeBot, PerplexityBot, and the major news-aggregator citation tools. The publisher needs to do nothing to enable this beyond not blocking the crawlers in their robots.txt. Substack's platform-wide robots.txt allows the major AI crawlers by default. The result is that every new post you publish is queued for AI crawler ingestion within hours of publication.
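Whether a given crawler is actually allowed is easy to check rather than assume. The sketch below parses a robots.txt body with Python's standard urllib.robotparser and reports access per user agent. The sample rules are hypothetical and are not Substack's real robots.txt, and the `crawler_access` helper is a name invented for this example.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for the crawlers named in this article.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI crawler, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}


# Hypothetical robots.txt that blocks one crawler and leaves the rest open.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /api/
"""

access = crawler_access(SAMPLE_ROBOTS, "https://example.substack.com/p/some-post")
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running the same check against the robots.txt served on your own publication domain is the quickest way to catch an accidental block, for example after changing a platform-level AI-training setting.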

LinkedIn syndication

LinkedIn posts created from newsletter content rank highly in LLM citation for professional and B2B queries, and LinkedIn is one of the few social platforms where AI crawlers extract structured content from posts at scale. The pattern that works is to publish the full post on Substack, then publish a 600-1,200 word excerpt as a native LinkedIn article with a "read the full version on Substack" link at the end. The LinkedIn version becomes its own indexed document, the Substack version is the canonical source, and both surface in different citation queries.

Medium republication

Medium's platform-level citation share has declined relative to Substack, but Medium still provides a useful secondary indexed surface, particularly for backfilled older posts that did not get crawler attention on first publication. The canonical pattern is to use Medium's story import tool to backfill the Substack archive on a Medium publication, then confirm that each imported copy carries a rel=canonical pointing back to the Substack URL. The Medium copy will not outrank the Substack original, but it expands the entity graph and gives the article an additional retrieval path.

Personal site or domain consolidation

Several high-citation operators publish on their own domains (lennysnewsletter.com, platformer.news, garbageday.email) rather than on a substack.com subdomain. On Substack, the custom-domain feature does this without leaving the platform. Either way it is a meaningful AEO upgrade because the brand-mention-to-domain mapping in LLMs ties the operator's brand to their owned domain rather than to publication-slug.substack.com. It also future-proofs the archive against any change in Substack's platform strategy.

The syndication play is cheap labor relative to original writing. The marginal cost of cross-posting an existing Substack article to LinkedIn and Medium is 20-40 minutes per post. The marginal AEO lift, measured in additional citation surfaces and entity-graph reinforcement, is meaningful. Operators who treat syndication as a default workflow line item, not as an optional extra, accumulate citation share faster than operators who only publish to Substack.

The discipline required to make this syndication consistent is essentially a content-ops problem, and the patterns in our content ops AEO publishing pipeline for monthly cadence apply directly to newsletter operators running multi-platform syndication.

The Paywall vs Citation Tradeoff, with Real Numbers

The most contentious decision for a paid Substack operator in 2026 is how aggressively to paywall posts. The tradeoff is real and worth modeling explicitly. Let's work the math.

Assume a Substack with 50,000 free subscribers and 2,000 paid subscribers at $10 per month, generating $240,000 in annual subscription revenue. The operator publishes 2 posts per week, 100 posts per year. The current paywall mix is 50 percent open, 50 percent paid-only.

The 50 paid-only posts produce zero direct citation lift, because LLM crawlers cannot access them. The 50 open posts produce the full citation lift the publication can generate. Suppose each open post averages 8 LLM citations per year across measurable queries, for a total of 400 citations from the year's open posts.

Now consider three paywall scenarios:

| Scenario | Open posts/year | Paid-only posts/year | Annual citations | Likely impact vs. baseline |
| --- | --- | --- | --- | --- |
| 100% paywall | 0 | 100 | 0 | +5% to +10% revenue, -100% citations |
| 50/50 mix (current) | 50 | 50 | 400 | Baseline |
| 80% open, 20% paid | 80 | 20 | 640 | -5% to -10% revenue, +60% citations |
| 100% open | 100 | 0 | 800 | -15% to -25% revenue, +100% citations |

The pattern is that citation lift scales linearly with open-post count, while revenue impact is nonlinear and depends on what value the paid tier offers. If the paid tier offers nothing the open tier does not — meaning paywalled posts are just gated versions of the same content type — moving to 80 percent open typically loses very little revenue because most paid subscribers chose the paid tier for community, founder access, or signaling reasons rather than for content scarcity. If the paid tier offers genuinely differentiated content — proprietary research, member office hours, internal tools — moving to 80 percent open loses essentially no revenue because the differentiation is preserved.
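The citation side of the model is plain arithmetic, and it can be reproduced in a few lines. This sketch uses the worked example's own assumptions (100 posts per year, 8 citations per open post, the 50/50 mix as baseline); the function and constant names are illustrative. It deliberately leaves revenue out, since the revenue effect is nonlinear and depends on what the paid tier offers.

```python
CITATIONS_PER_OPEN_POST = 8  # assumption taken from the worked example
BASELINE_OPEN_POSTS = 50     # the current 50/50 mix


def annual_citations(open_posts: int, per_post: int = CITATIONS_PER_OPEN_POST) -> int:
    """Citations scale linearly with the number of crawlable (open) posts."""
    return open_posts * per_post


def citation_change_pct(open_posts: int) -> float:
    """Percent change in annual citations relative to the 50/50 baseline."""
    baseline = annual_citations(BASELINE_OPEN_POSTS)
    return (annual_citations(open_posts) - baseline) * 100 / baseline


# Open-post counts for the four scenarios, out of 100 posts per year.
SCENARIOS = {"100% paywall": 0, "50/50 mix": 50, "80% open": 80, "100% open": 100}

for label, open_posts in SCENARIOS.items():
    total = annual_citations(open_posts)
    change = citation_change_pct(open_posts)
    print(f"{label:>12}: {total:>3} citations ({change:+.0f}% vs. baseline)")
```

Swapping in your own posts-per-year and per-post citation average gives a first-order estimate of what each paywall mix costs in discovery.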

The operator failure mode is to paywall posts that are not genuinely differentiated, capturing modest short-term revenue while sacrificing large long-term citation lift. The operator success mode is to be aggressive about what justifies a paywall — typically less than 20 percent of posts — and keep the discovery layer wide open.

Lenny Rachitsky's archive is roughly 75-80 percent open, with paid layers concentrated in deep-dive series and community access. Casey Newton's Platformer runs closer to 60 percent open. Ben Thompson's Stratechery is the visible outlier at roughly 15-20 percent open by post count, but the absolute size of the open archive is so large that the citation engine still runs hot.

A 90-Day Newsletter AEO Playbook

For an operator who has a Substack (or comparable platform) and wants to convert the archive into a measurable citation engine, the following sequence works in 90 days. Each step is a discrete action with a deliverable.

1. Audit the current archive for crawler accessibility (Days 1-7). Pull a list of every published post URL. Verify each returns full HTML on a direct GET request. Confirm the RSS feed at publication-slug.substack.com/feed contains full content:encoded for the most recent 50 posts. Spot-check 10 random older posts for the same. Identify any paywalled posts that could be moved to open without revenue impact. The deliverable is a spreadsheet of every post with paywall status, indexability status, and a recommended action.

2. Set up citation tracking (Days 8-14). Use Profound, Otterly, Peec, or a comparable tool to track citation share for your publication name and your individual post URLs across ChatGPT, Perplexity, Claude, and Gemini. Baseline the current cited-post ratio. The deliverable is a dashboard you check weekly, with at minimum: total citations per week, cited-post percentage, and a ranked list of the top 20 cited posts in your archive. Without this baseline you cannot tell whether subsequent changes moved the needle.

3. Move 20-40 percent of paid posts to open (Days 15-21). Identify the posts in your paid archive that are not genuinely differentiated and convert them to open. Use Substack's bulk-edit features if available. Add a banner to converted posts noting the change and the value of the paid tier. The deliverable is a measurable increase in indexable post count, typically 20-60 additional posts.

4. Reformat the top 20 cited posts for clarity (Days 22-35). Take the top 20 posts from your citation tracking dashboard and add the AEO-friendly structural elements: clear H2 sections, a definition or summary box near the top, a numbered list or table, and 4-8 outbound citations to authoritative sources. The goal is to make these posts more quotable per chunk, which compounds existing citation share. The deliverable is 20 updated posts with cleaner structure.

5. Add a custom domain and verify canonical handling (Days 36-45). If you do not already have a custom domain, register one and configure Substack's custom-domain feature. Verify that the canonical URLs in the HTML and RSS feed point to your custom domain, not to publication-slug.substack.com. Update all OpenGraph and Twitter Card metadata accordingly. The deliverable is a custom-domain configuration that consolidates your brand on your owned domain.

6. Build the LinkedIn and Medium syndication workflow (Days 46-60). Establish a default workflow where every new Substack post is also published as a LinkedIn article (600-1,200 word excerpt with backlink) and, if relevant to your category, republished on Medium through its story import tool, with rel=canonical pointing to the Substack URL. The deliverable is a documented workflow your VA or you can execute in 20-40 minutes per post.

7. Commit to a steady publication cadence for 60 days (Days 61-120, continuing past the 90-day mark). The single most important step. Pick a cadence you can sustain (1 post per week minimum, 2 per week ideal) and hit it without exception for 60 consecutive days. Each post should be 1,500-2,500 words, single-topic, with at least one data point, one quotable summary, and one outbound citation. The deliverable is roughly 9-17 new published posts in the 60-day window, each indexed and contributing to the cumulative archive depth.
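The canonical check in step 5 is the easiest part of the playbook to automate. The sketch below extracts the rel=canonical link from a page's HTML with Python's standard html.parser and compares its host to the domain you expect. The sample markup, the example.com host, and the helper names are hypothetical; fetching the real page is left to you.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_host(html: str) -> Optional[str]:
    """Return the hostname of the page's canonical URL, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return None
    return urlparse(finder.canonical).hostname


def points_to(html: str, expected_host: str) -> bool:
    """True when the canonical URL lives on expected_host."""
    return canonical_host(html) == expected_host


# Hypothetical post markup served from a custom domain.
SAMPLE_HTML = """
<html><head>
<link rel="canonical" href="https://www.example.com/p/article-slug">
</head><body><p>Post body</p></body></html>
"""

print(canonical_host(SAMPLE_HTML))
print(points_to(SAMPLE_HTML, "www.example.com"))
```

Run it over a sample of post URLs after the domain switch; any post whose canonical still resolves to a substack.com host goes back on the audit spreadsheet.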

The compounding shows up in months 4-7. Citation share is a lagging indicator because LLM crawlers and training cycles operate on weekly-to-monthly cadences and Common Crawl ingestion runs on its own schedule. Operators who execute this playbook consistently typically see a 1.5x to 3x increase in measurable LLM citation share over a 6-month window. The lift is not subscriber-driven; it is archive-driven.

The Failure Modes That Kill Newsletter Citation Share

Several common operator behaviors actively suppress citation share without producing offsetting benefits. Worth naming them explicitly so you can avoid them.

Inconsistent cadence. A newsletter that publishes for three months, goes quiet for two months, comes back for a month, and disappears again, builds essentially no citation authority. LLMs and crawlers weight publication consistency heavily as an authority signal. A reliable weekly cadence beats a burst-and-pause pattern by a wide margin even at half the total post count.

Over-paywalling. Paywalling more than 50 percent of posts in a Substack publication creates a sparse open archive that struggles to build citation authority for any topic. The operators who do this typically see strong subscription revenue in the first 18 months and then plateau because the discovery layer cannot keep growing.

Title and URL inconsistency. Substack auto-generates URL slugs from post titles, but operators sometimes manually edit titles after publication, which breaks the URL-title alignment that LLMs use as a topical signal. Once a post is published, do not change its title or URL.

Removing or unpublishing old posts. A deleted Substack post becomes a 404, breaking any inbound links and any LLM citation that referenced it. If you decide an old post is embarrassing, the right move is usually to add an editorial note at the top and leave it published, not to remove it.

Missing meta descriptions. Substack auto-generates meta descriptions from the first paragraph if you do not set one explicitly. The auto-generated descriptions are often poor. Setting an explicit 140-160 character meta description per post is a 60-second action that meaningfully improves the post's surfaceability in retrieval-augmented LLM queries.

Ignoring schema markup. Substack does not expose schema markup customization directly, but you can include structured content elements (definition boxes, FAQ sections, comparison tables) that LLMs parse and treat as quotable chunks. Operators who include these elements in every post produce more citable archives than operators who write pure narrative.

Where the Newsletter-as-Citation-Strategy Goes Next

The dynamics described in this piece are still mid-cycle. Several developments in 2026 and 2027 will reshape the equation.

First, Substack and similar platforms are starting to negotiate direct AI training deals with major model vendors. The economics will eventually flow back to publishers in some form, which will change the incentive math around full-text RSS and open posts. Operators with deep archives at the time of those deals will benefit disproportionately.

Second, retrieval-augmented LLM systems are getting better at attributing citations at the post level rather than the publication level. This means individual post authority will start to matter more than publication brand authority, which will reward operators who run focused-vertical archives over operators who run broad lifestyle newsletters.

Third, the audio and video extensions of newsletter content — Substack's audio episodes, podcast integrations, video posts — are being indexed by AI transcription pipelines that produce searchable text from non-text content. This effectively multiplies the archive depth for operators who repurpose written posts into audio or video form. The mechanics overlap with the patterns covered in our breakdown of podcast audio transcript as an AEO discovery channel.

Fourth, the major search engines are starting to weight LLM citation share as a ranking signal in their own traditional search products. Newsletters that rank highly in LLM citation queries are likely to see secondary lifts in classical search traffic over the next 18 months, which improves the ROI math for newsletter operators who were previously skeptical of investing in AEO.

The thesis remains the same through all of these shifts. The newsletter is a dual product: an email blast and an archive. The email blast monetizes today's subscribers. The archive monetizes tomorrow's discovery. The operators who treat both as first-class workstreams, with the archive optimized for crawler accessibility and citation density, will compound advantages that subscriber-focused operators cannot match by buying more list growth.

Takeaway: Stop optimizing your newsletter for subscriber count alone. The citation share that drives AEO discovery in 2026 is produced by archive depth, publication consistency, and crawler-friendly defaults — none of which scale with list size. Substack happens to ship the right architecture out of the box, which is why Lenny Rachitsky, Casey Newton, and Ben Thompson can outrun publications with 10x their subscriber base in LLM citation queries. The 90-day playbook is structural, not promotional: audit your archive, open 80 percent of posts, fix the top 20 by quotability, add a custom domain, build the syndication workflow, and commit to a steady publication cadence for 60 consecutive days. The compounding shows up in months 4-7 and accelerates from there.

Frequently Asked Questions

Why do Substack newsletters get cited so often by ChatGPT and Perplexity in 2026?

Substack newsletters get cited at outsize rates because the platform's default architecture is unusually friendly to LLM crawlers. Every published post lives at a clean, predictable URL of the form publication-slug.substack.com/p/article-slug, returns server-rendered HTML with the full article body in the initial response, exposes a complete full-text RSS feed at publication-slug.substack.com/feed, and is openly accessible by default unless the author specifically gates a post behind the paywall. Common Crawl, GPTBot, ClaudeBot, and PerplexityBot all index these patterns aggressively. The result is that a Substack archive with 400 published posts produces roughly 400 indexed, structured, citable training-corpus documents. Subscriber count does not enter the citation calculation. Archive depth and publication consistency do, and Substack happens to make both effectively free relative to a self-hosted equivalent.

Does subscriber count matter at all for AEO, or only archive depth?

Subscriber count matters indirectly through engagement signals and word-of-mouth amplification, but it does not appear to be a direct ranking factor for LLM citation. The mechanics are straightforward: an LLM citation is determined by whether the model retrieved or trained on the underlying article, which depends on whether the article was crawled, parsed cleanly, and treated as authoritative in the relevant entity graph. None of those steps inspect subscriber numbers. A Substack with 12 subscribers and 250 well-written posts on a narrow topic will outperform a Substack with 200,000 subscribers and 30 surface-level posts on a broad topic in citation queries. The 200,000-person list creates social proof and human distribution that helps secondary signals (backlinks, mentions, Wikipedia references), but the primary citation lift comes from the archive. Publishers optimizing for AEO should treat subscriber growth and archive growth as separate workstreams with different ROI curves.

Should I put my best Substack posts behind a paywall or leave them open for AI citation?

For most independent operators the right default in 2026 is to leave 70-90 percent of posts open and gate only a clearly differentiated paid tier such as deep dives, member office hours, or proprietary research. The reason is that the open posts are doing the citation work that feeds your brand into LLM answers, which in turn drives newsletter signups, which in turn drives paid conversions. If you gate everything, you optimize for short-term subscription revenue but starve the discovery funnel that LLMs now occupy. Ben Thompson's Stratechery is the visible counterexample, but it works because Thompson has built brand authority through more than a decade of open weekly articles published alongside the paid daily update, and that weekly free article still does the citation lift. Most operators should follow Lenny Rachitsky's pattern: extensive open archive, deep paid layer underneath, free flagship pieces on flagship topics.

How does Substack compare to Ghost, Beehiiv, and self-hosted WordPress for AEO?

Substack, Ghost, and Beehiiv all produce LLM-friendly output by default, with minor structural differences. Substack has the largest brand-recognition footprint inside LLMs because the platform corpus is enormous and the model has seen substack.com URLs repeatedly across training cycles. Ghost produces marginally cleaner JSON-LD and gives publishers more control over schema, which helps in technical AEO categories. Beehiiv has the weakest LLM citation footprint of the three because it is younger and the corpus is sparser, but the architecture is sound and citation share is rising. Self-hosted WordPress is the most flexible but requires deliberate work on RSS, schema, sitemap, and rendering configuration to match the defaults Substack ships out of the box. For a publisher choosing in 2026 with AEO as the goal, the ranking is roughly Substack, Ghost, Beehiiv, then WordPress — and the gap closes for any publisher willing to invest in WordPress configuration.

What is the fastest way to build a Substack archive that gets cited by LLMs?

Publish at a steady, predictable cadence of one to two pieces per week, each 1,500-2,500 words, each focused on a single specific question or claim, and each with at least one quotable data point sourced to a primary reference. Use clear H2 structure, a definition or summary box near the top, and explicit named entities throughout: companies, people, products, dates. Do not paywall any post during the first 18 months unless you have a clear paid value layer to gate. Cross-post a subset to your personal LinkedIn and to Medium for syndication breadth. At that cadence the result is a roughly 50-100 post archive within a year that is structurally indistinguishable from a B2B content marketing operation that cost 10-50 times more to produce. The citation lift typically materializes between months 9 and 14 as Common Crawl picks up the archive in successive sweeps.