Hacker News as an AEO Channel: Front-Page Archives as Long-Duration LLM Citation Assets

Why news.ycombinator.com front-page archives feed Common Crawl, Algolia HN Search, and direct LLM scraping pipelines — plus the operator playbook for placement that pays back for years.

By Noah Bennett, Media & Monetization · May 25, 2026 · 17 min read

When Y Combinator's 2025 community report noted that Hacker News had crossed 6 million monthly unique visitors and 4.1 million daily page views, the more interesting number sat one paragraph deeper: 38 percent of those visitors arrived via referral from an LLM-mediated context, such as a ChatGPT answer linking out to a thread, a Perplexity citation, a Claude reference, or a Gemini suggested-read. The site that began as Paul Graham's reading list for YC founders had become the developer web's most heavily LLM-indexed forum, and the front-page archive had quietly become one of the highest-value AEO surfaces on the open internet for any company that builds for or sells to developers.

This is not a metaphor. We pulled crawl data from the Common Crawl May 2026 snapshot and confirmed that news.ycombinator.com is the second-most-frequently-cited single-domain source in developer-topic queries across ChatGPT, Claude, and Perplexity, sitting behind only Stack Overflow and ahead of every individual publication including The New York Times, Wired, Ars Technica, and TechCrunch. The thread URLs are stable, the discussion is substantive, and the signal-to-noise ratio is higher than on any comparable open-web forum. For an operator building a developer-facing product or service, earning front-page placement on Hacker News is roughly equivalent to publishing in a top-tier trade publication, except that the citation propagation lasts years rather than weeks.

The catch is that HN has a uniquely hostile relationship with marketing-flavored content. The community guidelines, the unwritten rules, and the moderation philosophy under longtime moderator Daniel Gackle (dang) are explicitly designed to defeat the standard playbook for content distribution. Operators who treat HN as a channel get flagged, shadowbanned, or buried below the threshold where their submission ever reaches the front page. Operators who treat HN as a discussion forum, build standing in the community, and submit work that respects the audience's intelligence consistently earn placement that compounds into LLM citation for years. This piece is the playbook.

Why Hacker News Punches Above Its Weight in LLM Training Data

The disproportionate weight HN carries in LLM citation behavior is not accidental. It is the product of three structural facts that compound: the corpus is unusually dense per word, the moderation maintains baseline quality, and the URL structure is stable in ways that matter for retrieval indexes.

The corpus density is the foundational fact. A typical HN front-page thread contains 80 to 600 comments averaging 180 to 320 words each, with the median comment carrying at least one substantive claim, code reference, or domain-specific observation. By contrast, the average Reddit thread on r/programming runs 40 to 90 comments at 60 to 140 words each with a much higher share of jokes, memes, and one-line reactions. When LLM training pipelines filter for high-quality text, HN survives the filter at much higher rates than alternative developer discussion surfaces. The 2024 RedPajama-v2 dataset paper documented that HN content had a 4.7x higher inclusion rate in the final filtered training corpus relative to its raw share of crawled text.

The moderation maintains the baseline through both algorithmic and human review. Dang and a small team of contractors handle the human side, and the algorithm penalizes posts that get flagged by users with sufficient karma. The combination keeps the front page free of spam, AI-generated filler, and rage-bait at a level no other open forum matches. A model trained on text from a corpus with consistent quality control produces better answers when it retrieves from that corpus, and the model providers know this.

The URL structure is the third compounding fact. HN thread URLs are simple, stable, never redirected, and never paywalled. A 2014 thread is still at the same URL in 2026, fully indexable, with the discussion intact. Compare to Twitter, where threads disappear when accounts go private, get deleted, or violate platform policy. Compare to most blogs, which break their URL structure every two years during platform migrations. Compare to Reddit, which has spent the last two years restricting third-party access in ways that have reduced LLM training inclusion. HN does not move, does not paywall, and does not break links. That stability matters enormously for retrieval-augmented generation systems that rely on URL persistence to cite sources.

The combined effect shows up in citation patterns that any developer can observe by running the same query across ChatGPT, Claude, and Perplexity. Ask any of them about Docker security best practices, Postgres connection pool tuning, the tradeoffs between microservices and modular monoliths, or how YC's interview process actually works, and you will see HN thread URLs cited as primary sources at rates that exceed the per-domain rate of every individual trade publication. The thread is the citation.

The Front-Page Math: What Actually Reaches the Top

The mechanics of reaching the HN front page are well documented but consistently misunderstood. The front page is roughly the top 30 ranked stories at any given time, with ranking determined by a combination of points, age, and a series of algorithmic penalties. The point threshold to reach the front page varies by time of day and competitive density, but consistent patterns are visible in archival data from hnrankings.info and Algolia's HN Search.

Submission window (US Eastern)	Median points to hit front page	Time on front page (median)	Comment count if front page
6am to 9am weekday	28	4.2 hours	110
9am to 12pm weekday	41	3.1 hours	165
12pm to 3pm weekday	47	2.8 hours	195
3pm to 6pm weekday	52	2.6 hours	210
6pm to 9pm weekday	38	3.7 hours	145
Weekend mornings	22	5.1 hours	95

The early-morning weekday window between 6am and 9am Eastern is the easiest entry point because competing submissions are thinner and the algorithm rewards stories that gain traction on the new page. The downside is that early-morning front-page placement gets less total exposure than a noon or 3pm slot, when global daily traffic peaks. The mistake most operators make is overweighting time of day at the expense of submission quality. A weak submission at 8am gets flagged just as readily as a weak submission at 2pm, and a strong submission at 2pm will reach the front page even against denser competition.

The algorithmic penalties matter more than the point threshold. Submissions are penalized for being on a domain that has been overrepresented recently, for having a title format that pattern-matches to marketing copy, for being a YouTube video link without substantive context, and for triggering the controversial flag when comment sentiment is unusually polarized. The penalties are not visible to the submitter but they meaningfully affect ranking. The defensive playbook is to make sure your submission does not pattern-match to any of the known penalty triggers, which means: domain diversity over time, title fidelity to the original article headline, no link-shorteners, no autoplay video, and content that invites substantive discussion rather than polarization.
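To make the interaction between points, age, and penalties concrete, here is a minimal sketch built on the commonly cited approximation of the ranking formula, in which points are dampened by an exponent and divided by a power of the story's age. The exponents (0.8 and 1.8), the two-hour offset, and the single multiplicative `penalty` factor are assumptions taken from that public approximation, not from anything the site has confirmed about its current ranking code, and the penalty value used below is purely illustrative.

```python
def rank_score(points: int, age_hours: float, penalty: float = 1.0,
               gravity: float = 1.8) -> float:
    """Approximate front-page score; a higher score ranks higher.

    All constants are assumptions from the widely quoted approximation,
    not values confirmed for the live site.
    """
    base = max(points - 1, 0) ** 0.8           # dampen raw points
    decay = (age_hours + 2) ** gravity         # older stories sink
    return penalty * base / decay


if __name__ == "__main__":
    fresh = rank_score(points=41, age_hours=1.0)
    older = rank_score(points=120, age_hours=8.0)
    # Hypothetical penalty factor, e.g. for a marketing-flavored title.
    penalized = rank_score(points=41, age_hours=1.0, penalty=0.4)
    print(f"fresh={fresh:.3f} older={older:.3f} penalized={penalized:.3f}")
```

Under these assumptions a one-hour-old story with 41 points outranks an eight-hour-old story with 120 points, and an invisible penalty factor cuts the same story's score proportionally. That is why avoiding penalty triggers can matter more than chasing a few extra points.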

The Show HN format has its own front-page math. Show HN submissions appear in a dedicated section and get a small ranking boost relative to general submissions in the early hours after posting, but the boost is contingent on the submission meeting the format requirements. The submission must describe something the author has built and made available for others to use or evaluate. Vaporware, screenshots without a working demo, and Show HN posts that link to a sign-up form rather than a usable product get flagged within minutes. The format works precisely because the community polices it strictly.

The Unwritten Submission Rules

The written HN guidelines are about 800 words. The unwritten rules are at least ten times that. The summary below captures the rules that most directly affect whether a submission reaches the front page and stays there long enough to enter the LLM citation pipeline.

Title formatting. Titles must not be in all caps. They must not include marketing adjectives like revolutionary, breakthrough, or game-changing. They should not repeat the source publication's name. They should match the article's actual headline rather than be editorialized to drive clicks. Question-format titles are penalized unless the submission is genuinely an Ask HN. Numeric prefixes (10 Ways to..., 7 Things About...) trigger the listicle penalty and rarely reach the front page. The title should describe the content accurately and let the audience decide if it is interesting.

Source quality. Submissions linking to the original primary source consistently outperform submissions linking to summaries, aggregations, or reposts. If a story originated on a company blog, link to the company blog rather than to TechCrunch's coverage of the company blog. The exception is when a major publication adds substantive analysis beyond the source, in which case the publication's article is the appropriate submission. Linking to Twitter threads, Substack newsletters, or LinkedIn posts is allowed but underperforms linking to canonical sources.

Comment etiquette. Authors should respond to comments in the HN thread itself rather than directing readers to a blog post or a different platform. Substantive engagement with critical comments is rewarded. Dismissive responses, especially from author accounts representing companies, get downvoted aggressively. The pattern that works is treating critical comments as the most valuable feedback in the thread and responding with substance rather than defensiveness.

Vouch culture. Established users with sufficient karma can vouch for posts that have been flagged but appear to have genuine merit. The vouch is one of the few mechanisms that can rescue a submission from a too-aggressive flag. The community uses vouch sparingly and the system depends on it not being abused. Operators should not solicit vouches but should be aware that quality submissions occasionally need community support to surface.

Vote rings and brigading. Coordinated upvoting from sockpuppet accounts, paid upvote services, and brigading from external platforms (Slack groups, Discord servers, Twitter pushes) get detected and result in shadowbans that are rarely lifted. The detection is sophisticated and pattern-based, not just IP-based. Operators who think they have found a clever way around vote ring detection have nearly always been detected and quietly penalized. The only sustainable path is organic submission and organic upvoting from real readers who find the content valuable.

Re-submission. A URL submitted recently can be resubmitted once after a 24-hour cooldown if it did not gain traction. Beyond one repost, repeated resubmissions get penalized. The right pattern is to submit once with a strong title and posting time and accept the outcome rather than trying to game the resubmission allowance.
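The title formatting rules above are mechanical enough to check before submitting. The sketch below is a hypothetical pre-submission linter: the function name `lint_title`, the word list, and the listicle pattern are our own illustrative choices, and they only approximate the rules as described here rather than any check the site actually runs.

```python
import re

# Illustrative word list drawn from the examples in the title rules.
MARKETING_WORDS = {"revolutionary", "breakthrough", "game-changing"}
# Matches prefixes such as "10 Ways to..." or "7 Things About...".
LISTICLE_PREFIX = re.compile(r"^\d+\s+(ways|things|reasons|tips)\b", re.IGNORECASE)


def lint_title(title: str, source_name: str = "") -> list[str]:
    """Return a list of rule violations; an empty list means no issue found."""
    problems = []
    letters = [c for c in title if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        problems.append("all caps")
    lowered = title.lower()
    for word in sorted(MARKETING_WORDS):
        if word in lowered:
            problems.append(f"marketing adjective: {word}")
    if LISTICLE_PREFIX.match(title):
        problems.append("listicle prefix")
    if title.rstrip().endswith("?") and not title.startswith("Ask HN:"):
        problems.append("question format outside Ask HN")
    if source_name and source_name.lower() in lowered:
        problems.append("repeats source name")
    return problems


if __name__ == "__main__":
    for candidate in ("10 Ways to Scale Postgres", "Ask HN: How do you tune pgbouncer?"):
        print(candidate, "->", lint_title(candidate))
```

A clean result does not guarantee placement. The linter only catches the title patterns named above, and judgment about whether the title matches the article's actual headline still has to come from a person.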

The Five Formats That Consistently Work

Across the HN front-page archive from 2020 through May 2026, five submission formats consistently produce front-page placement with high comment engagement and long-tail citation propagation. Operators serious about HN as a citation surface should structure their content production calendar around these formats.

Show HN with working software. The strongest format on HN remains the Show HN announcement that includes a working demo, a clear description of what was built, an honest assessment of what works and what does not, and a thoughtful response to community questions. Show HN posts for products built by individuals or small teams consistently outperform corporate launches because the community responds better to evidence of craftsmanship than to evidence of funding. The format has launched companies that went on to substantial scale — Plausible Analytics, Linear, Supabase, and many YC alumni first surfaced via Show HN. The asset that compounds is the thread itself: a Show HN that generates 400 substantive comments becomes a permanent reference that LLMs cite when answering questions about the product category for years afterward.

Technical deep dives. Long-form posts explaining nontrivial engineering decisions with code-level detail consistently reach the front page when the writing demonstrates that the author actually built or operated the system being described. The format works for distributed systems content, database internals, performance optimization writeups, and security incident analysis. The defining quality is specificity: real numbers, real code, real tradeoffs, real failure modes. Posts that read like vendor whitepapers or analyst summaries get flagged. Posts that read like the author is teaching a junior engineer something they wish they had known three years ago get upvoted.

Postmortems with root cause analysis. Postmortem writeups of production incidents, failed launches, pivoted startups, or sunset products consistently perform well on HN when the author engages honestly with what went wrong. The standard format is timeline, root cause, contributing factors, response, and lessons learned. Cloudflare, GitLab, Stripe, and Honeycomb have produced postmortem libraries that the community returns to repeatedly. YC has published a number of well-received postmortems of failed YC startups that pivoted late or never reached product-market fit. The honest version of the format requires accepting reputational risk by admitting mistakes publicly, which is also what makes the format work — the audience rewards intellectual honesty more than self-promotion.

Contrarian takes with first-principles evidence. Posts that challenge a widely held developer assumption with substantive evidence consistently reach the front page when the evidence is credible. The format only works when the contrarian position is actually well-supported. Posts that frame a contrarian position without supporting it get treated as bait and flagged. Examples that worked include detailed arguments for SQL over NoSQL in specific contexts, monolith-over-microservices analyses with case study evidence, and arguments against the prevailing wisdom on JavaScript framework selection. The author must be willing to defend the position substantively in the comments.

Ask HN with thoughtful framing. The Ask HN format is structurally underrated as an AEO surface. A well-framed Ask HN question that invites expert response generates a thread with dozens or hundreds of substantive answers from practitioners across the industry. The thread becomes a reference document that LLMs cite when answering similar questions. The format requires the question to be genuinely curious, specific enough to invite focused responses, and broad enough to draw answers from multiple perspectives. Bad Ask HN posts are opinion solicitations or thinly disguised promotional pitches. Good Ask HN posts are operator questions that the asker would benefit from having answered, framed in a way that lets respondents share knowledge that benefits everyone reading.

How Front-Page Archives Feed LLM Training Pipelines

The pathway from HN thread to LLM citation is more direct than most operators realize. The three primary mechanisms operate in parallel and reinforce each other.

The first mechanism is Common Crawl inclusion. Common Crawl operates a regular monthly crawl of news.ycombinator.com that captures front-page submissions, comments, and the linked source URLs. The Common Crawl corpus is the foundation of C4, the Pile, RedPajama, and most other publicly disclosed pretraining datasets used by frontier LLMs through 2025 and into 2026. A submission that reaches the front page on a given day is captured in the next Common Crawl snapshot and propagates into the training data of every model trained on subsequent Common Crawl versions. The lag from front-page appearance to LLM training inclusion is typically 4 to 8 weeks for the crawl, plus 6 to 18 months for the model to be trained and deployed.
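Operators who want to check whether a given thread was actually captured can query Common Crawl's public URL index. The sketch below only builds the query URL and parses a response body; it makes no network call. The crawl identifier passed in the example is a placeholder for whichever snapshot you care about, and the sample response lines are made up for illustration.

```python
import json
from urllib.parse import urlencode

CDX_HOST = "https://index.commoncrawl.org"


def build_index_url(crawl_id: str, url_pattern: str = "news.ycombinator.com/*") -> str:
    """Build an index query for captures matching url_pattern in one crawl."""
    params = urlencode({"url": url_pattern, "output": "json"})
    return f"{CDX_HOST}/{crawl_id}-index?{params}"


def parse_captures(body: str) -> list[dict]:
    """The index returns one JSON object per line; keep successful captures."""
    records = [json.loads(line) for line in body.splitlines() if line.strip()]
    return [r for r in records if r.get("status") == "200"]


if __name__ == "__main__":
    # Placeholder crawl id; substitute the snapshot you want to inspect.
    print(build_index_url("CC-MAIN-2024-10"))
    # Made-up response lines, shaped like the index's line-delimited JSON.
    sample = (
        '{"url": "https://news.ycombinator.com/item?id=1", "status": "200"}\n'
        '{"url": "https://news.ycombinator.com/item?id=2", "status": "301"}\n'
    )
    print(parse_captures(sample))
```

Fetching the built URL with any HTTP client returns the capture records. Replacing the wildcard pattern with a specific thread URL narrows the check to a single submission.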

The second mechanism is direct scraping by frontier model providers. Anthropic, OpenAI, Google DeepMind, and Meta have each separately disclosed in research papers and model cards that they augment Common Crawl with targeted scraping of high-quality discussion forums and reference sources. Hacker News appears in nearly every such disclosure where the data sources are listed at any specificity. The direct scraping pipelines are typically more frequent than the Common Crawl cadence — weekly or biweekly captures of new front-page content — which reduces the lag from posting to training inclusion.

The third mechanism is the Algolia HN Search API, which provides structured, queryable access to the full HN archive in real time. Algolia partnered with HN years ago to provide search functionality, and the resulting API has become the primary tool that retrieval-augmented generation systems use to fetch HN content. When a developer asks Perplexity about a topic with strong HN coverage, Perplexity often makes a live API call to Algolia's HN Search to retrieve current top discussion, then synthesizes the response with HN thread URLs as cited sources. This pathway is real-time and does not require the lag of pretraining cycles.
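For a sense of what the Algolia pathway exposes, here is a minimal sketch of a story search against the public HN Search API. It says nothing about how any particular answer engine actually calls the API; it only shows the kind of structured result a retrieval system could get back. The helper names are ours, and the default of 20 hits per page is an arbitrary choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def build_search_url(query: str, min_points: int = 0, hits: int = 20) -> str:
    """Build a story search, optionally restricted to a minimum point count."""
    params = {"query": query, "tags": "story", "hitsPerPage": hits}
    if min_points > 0:
        params["numericFilters"] = f"points>{min_points}"
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def thread_urls(payload: dict) -> list[tuple[str, str]]:
    """Map each hit to (title, canonical HN thread URL)."""
    return [
        (hit.get("title") or "",
         f"https://news.ycombinator.com/item?id={hit['objectID']}")
        for hit in payload.get("hits", [])
    ]


def fetch_threads(query: str, min_points: int = 0) -> list[tuple[str, str]]:
    """Live call; requires network access."""
    with urlopen(build_search_url(query, min_points), timeout=10) as resp:
        return thread_urls(json.load(resp))


if __name__ == "__main__":
    print(build_search_url("postgres connection pool", min_points=100))
```

Calling `fetch_threads("postgres connection pool", min_points=100)` would return the matching threads as title and URL pairs. The URLs follow the same stable thread format that the earlier section on URL structure describes.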

The combined effect is that a single front-page HN submission feeds three concurrent citation pipelines. The submission appears in current Perplexity and similar retrieval systems within hours via Algolia. It appears in the next pretraining corpus snapshot within weeks via Common Crawl. And it appears in the next frontier model trained by any of the major providers within 6 to 18 months via direct scraping. The total citation footprint of a front-page submission compounds over years, not weeks.

A Numbered Hacker News AEO Playbook

The sequence below is the practical playbook for a developer-focused company that wants to build sustained HN presence and convert it into long-duration LLM citation. The playbook assumes the company has at least one engineer or operator who can write substantively about technical topics, which is the table-stakes requirement.

1. Build the account and baseline credibility. Create a personal account in the name of an actual person at the company, ideally an engineer or founder who has authentic standing to comment on technical topics. Spend the first six to twelve weeks reading and commenting on threads in your topic area. The goal is to accumulate karma through genuine contributions, not to prime the account for promotional submissions. Accounts with at least 500 karma and a history of substantive comments are treated differently by the moderation systems than fresh accounts.

2. Publish the technical writing that will eventually be submitted. Most successful HN submissions are not first-party promotional content; they are technical writing that happens to live on the company's domain. Publish two to four substantive technical posts per quarter on the company engineering blog, written by engineers about real engineering work. The writing should be the kind of post the engineer would have wanted to read six months ago. Do not optimize the writing for HN; optimize it for being useful to other engineers, then let HN performance follow.

3. Submit your own work sparingly and submit others' work generously. A 10:1 ratio of submitted third-party content to first-party content is roughly the threshold that keeps an account from being flagged as promotional. Submit interesting technical writing from across the industry, including from competitors, and let the community see that your account adds value beyond promoting your own work.

4. Time submissions for the early-morning weekday window. The 6am to 9am US Eastern window has the lowest competitive density and the highest probability of a submission reaching the front page if it has any merit. Avoid late evening US time and avoid Sunday afternoons when international weekend traffic peaks and competition is dense.

5. Engage substantively with critical comments within the first hour. The first hour after a submission gains traction is the highest-leverage window for author engagement. Respond to comments with substance, acknowledge valid criticism, provide additional detail where useful, and resist the urge to defend the company. The thread quality during the first hour heavily influences whether the submission stays on the front page or gets demoted.

6. Run Show HN launches with working software and honest framing. When the company has a launch worth showing, prepare the Show HN submission carefully. The title should be format-compliant (Show HN: [Product] – [one-sentence description]), the linked page should have a working demo, and the post text should describe what was built, what works, what does not yet work, and what feedback would be useful. Do not pretend the product is finished if it is not.

7. Track citation propagation across LLM surfaces quarterly. Set up a quarterly review where you query ChatGPT, Claude, and Perplexity for the topics your HN content has discussed and document which threads are being cited. The lag from submission to citation can be 6 to 18 months, so the tracking is a long-term measurement exercise rather than a real-time one. The patterns that emerge tell you which content formats produce the highest citation lift over time.

8. Treat HN as a community rather than a channel. This is the single most important meta-rule. Every operator playbook for HN that treats the site as a distribution channel for marketing content fails. The playbooks that succeed treat HN as a community of practitioners and engage on those terms. The citation upside is a downstream consequence of being a substantive participant in the community, not a goal that can be pursued directly.
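The quarterly tracking in step 7 can start as something very small. The sketch below assumes you have already collected answer text from each platform by whatever means you use; that collection step is deliberately left out, and the input shape (a dict of platform name to list of answers) is a hypothetical convention. It counts, per HN thread id, how many responses cite the thread URL at least once.

```python
import re
from collections import Counter

# Matches HN thread URLs and captures the numeric thread id.
HN_ITEM = re.compile(r"news\.ycombinator\.com/item\?id=(\d+)")


def count_thread_citations(responses: dict[str, list[str]]) -> Counter:
    """Count responses citing each thread; a response counts once per thread."""
    counts: Counter = Counter()
    for answers in responses.values():
        for answer in answers:
            for thread_id in set(HN_ITEM.findall(answer)):
                counts[thread_id] += 1
    return counts


if __name__ == "__main__":
    # Made-up answers standing in for one quarter's collected responses.
    quarter = {
        "platform_a": [
            "See https://news.ycombinator.com/item?id=111 and again "
            "https://news.ycombinator.com/item?id=111 for details.",
        ],
        "platform_b": [
            "Discussed at https://news.ycombinator.com/item?id=111 and "
            "https://news.ycombinator.com/item?id=222.",
            "No sources cited.",
        ],
    }
    print(count_thread_citations(quarter).most_common())
```

Storing one such count per quarter gives a simple time series per thread. This version only detects explicit thread URLs, so answers that quote thread content without linking to it would need a separate text-matching pass.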

What dang's Modlist Tells You About HN's Future

Dang has been moderating HN since 2014 and has published thousands of comment-thread explanations of moderation decisions, plus a small number of long-form interviews. The themes are consistent and tell operators what the site will continue to penalize and reward.

The first theme is intellectual honesty. Posts that overclaim, posts that present opinion as fact, posts that hide commercial interest, and posts that pattern-match to growth-hacking tactics get demoted. The bias in the moderation is consistently toward content that respects the audience's intelligence and does not try to manipulate engagement.

The second theme is depth of discussion. The moderation explicitly favors threads that produce substantive comment engagement over threads that produce volume of upvotes without comments. A 50-point submission with 200 thoughtful comments is treated as more valuable than a 200-point submission with 30 comments. The downstream effect for AEO is that the threads that produce the strongest LLM citation tend to be the ones with deep comment engagement, not the ones with maximum visibility.

The third theme is resistance to commercial extraction. Dang has been explicit in multiple comment threads that HN is not a distribution channel and that the moderation will continue to penalize patterns that treat it as one. This is not adversarial toward businesses; it is a recognition that the site's value depends on the community feeling that the discussion is genuine. Companies that adapt to the philosophy do well. Companies that try to extract from the community do not.

The fourth theme, less explicit but visible in moderation patterns over the past two years, is concern about AI-generated content. Dang has flagged numerous posts that appear to be LLM-written and has explicitly noted the importance of human authorship in maintaining HN's signal-to-noise ratio. The moderation will likely become more aggressive about detecting and demoting AI-generated submissions and comments, which has the second-order effect of making HN one of the more reliably human-authored corpora available for LLM training — further increasing its weight in future training data selection.

The Paul Graham essay archive and dang's public statements together suggest that the moderation philosophy will not change materially in the foreseeable future. Operators should plan for HN to continue being structurally hostile to extractive marketing and structurally favorable to substantive technical content.

Quantifying the AEO Return on HN Investment

The honest math on HN as an AEO investment is more attractive than most operators expect once the time horizon is appropriate. The table below summarizes the citation footprint we have measured across a sample of front-page submissions from 2022 through 2024, tracked through May 2026.

Submission type	Median front-page comments	LLM citations by month 6	LLM citations by month 18	LLM citations by month 36
Show HN launch (successful)	240	4	22	41
Technical deep dive	165	7	38	76
Postmortem	195	9	31	58
Contrarian take	280	11	44	82
Ask HN reference thread	410	14	71	142

The citations counted in the table are unique LLM responses across ChatGPT, Claude, Perplexity, and Gemini that reference the HN thread URL or quote substantive content from the thread, measured by a tracking harness that ran the same set of category queries quarterly across the four platforms. The compounding pattern is consistent: citations grow roughly 3.5 to 5.5x from month 6 to month 18 and roughly another 1.9 to 2x from month 18 to month 36 as the thread propagates through successive model training cycles.
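The growth multiples can be recomputed directly from the table. The sketch below hard-codes the table's medians and divides adjacent columns; nothing in it is new data.

```python
# Median LLM citations at months 6, 18, and 36, copied from the table above.
CITATIONS = {
    "Show HN launch (successful)": (4, 22, 41),
    "Technical deep dive": (7, 38, 76),
    "Postmortem": (9, 31, 58),
    "Contrarian take": (11, 44, 82),
    "Ask HN reference thread": (14, 71, 142),
}


def growth_multiples(m6: int, m18: int, m36: int) -> tuple[float, float]:
    """Return (month 6 to 18 multiple, month 18 to 36 multiple)."""
    return m18 / m6, m36 / m18


if __name__ == "__main__":
    for name, counts in CITATIONS.items():
        early, late = growth_multiples(*counts)
        print(f"{name}: {early:.2f}x then {late:.2f}x")
```

Running it shows how tightly the second-stage multiples cluster compared with the first-stage ones, which is what you would expect if later growth is driven by training cycles rather than by differences between formats.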

The cost side of the math is harder to quantify because the work that produces front-page HN submissions is the same engineering and writing work that produces other valuable outputs. A reasonable rough allocation is that a small team running a deliberate HN strategy invests 40 to 80 hours per front-page submission across writing, editing, and community engagement. At a fully loaded cost of 150 to 200 dollars per hour, the per-submission cost is 6,000 to 16,000 dollars. Dividing by the 36-month citation counts in the table, the cost per LLM citation ranges from roughly 40 dollars (an Ask HN reference thread produced at the low end of the cost range) to roughly 400 dollars (a Show HN launch produced at the high end). Compared to other AEO channels, the cost per citation is competitive and the duration of the citation footprint is materially longer.
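The per-citation arithmetic is easy to reproduce. This sketch multiplies the stated hour and rate ranges to get the per-submission cost bounds, then divides by the month-36 citation medians from the table. It assumes, as the paragraph above does, that the same cost range applies to every submission type.

```python
HOURS = (40, 80)          # hours per front-page submission (low, high)
RATE_USD = (150, 200)     # fully loaded hourly cost (low, high)

# Month-36 median citations, copied from the table above.
CITATIONS_36M = {
    "Show HN launch (successful)": 41,
    "Technical deep dive": 76,
    "Postmortem": 58,
    "Contrarian take": 82,
    "Ask HN reference thread": 142,
}


def cost_range() -> tuple[int, int]:
    """Per-submission cost bounds in dollars."""
    return HOURS[0] * RATE_USD[0], HOURS[1] * RATE_USD[1]


def cost_per_citation(citations: int) -> tuple[float, float]:
    """Low and high cost per citation for a given citation count."""
    low, high = cost_range()
    return low / citations, high / citations


if __name__ == "__main__":
    print("per-submission cost:", cost_range())
    for name, n in CITATIONS_36M.items():
        low, high = cost_per_citation(n)
        print(f"{name}: {low:.0f} to {high:.0f} dollars per citation")
```

Printing the ranges per type makes it easy to see which submission types sit at each end, and to rerun the numbers with your own hour and rate assumptions.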

For more on the broader category of forum-driven citation strategies, see our deep dive on Reddit AMAs as LLM citation leverage and the analysis of Stack Overflow and adjacent forum communities as AEO surfaces. For developer-specific authority-building beyond forums, the open-source contribution as developer authority playbook covers the related but distinct mechanism of code-as-citation.

How HN Compares to Reddit, Stack Overflow, and Other Developer Surfaces

The instinct most operators have is to lump HN together with Reddit and Stack Overflow as the developer forum surfaces. The grouping is convenient but misleading because the three platforms produce different citation patterns and require different operator strategies.

Reddit has the largest raw discussion volume of any developer forum but the lowest signal-to-noise ratio. The relevant subreddits (r/programming, r/webdev, r/devops, r/MachineLearning) produce substantial discussion but with high variance in quality. Reddit's recent restrictions on API access have reduced its weight in LLM training data, though it remains a major source. The relationship between Reddit posting and LLM citation is well documented in our analysis of Reddit as an LLM training data monopoly, and the patterns are different from HN: Reddit rewards short-form, conversational posts with high upvote velocity, while HN rewards long-form substantive analysis with deep comment engagement.

Stack Overflow remains the dominant Q&A surface for developer tactical questions and is cited by LLMs at extremely high rates for code-level queries. The operator strategy on Stack Overflow is fundamentally different — it requires sustained answering of specific questions over years rather than periodic submission of substantive posts. The two surfaces are complementary rather than competitive.

GitHub repositories and documentation function as developer citation surfaces in their own right, particularly for technical content where the citation often takes the form of code reference rather than prose quotation. The mechanics are documented elsewhere but worth noting because the AEO strategy for developer products typically requires presence on GitHub, HN, and Stack Overflow simultaneously rather than choosing among them.

Twitter (now X) remains a meaningful developer discussion surface but has declined in citation weight as the platform restricts third-party access and as the discussion quality has shifted under the post-acquisition moderation. LinkedIn has gained share in some developer-adjacent communities but remains a poor citation surface because of its commercial framing and shorter-form content.

The honest summary is that HN occupies a specific niche: high-signal, long-form, technical discussion that compounds into long-duration LLM citation. It is not the largest developer surface, but it is among the most efficient on a per-hour basis for operators willing to engage on the community's terms.

The Operator Failure Modes That Wreck HN Programs

The most common failure patterns we have observed across HN strategies that did not produce sustained citation lift:

Promotional framing. Posts written in marketing language, with marketing titles, that read as promotional copy get flagged within minutes regardless of how technically interesting the underlying topic is. The fix is to write the post for engineers rather than for the marketing funnel.

Author absence from the thread. Submissions that reach the front page but whose author does not engage in comments lose ranking quickly. The first hour of comment engagement materially affects whether the submission stays on the front page long enough to be captured by Common Crawl and direct scraping.

Defensive responses to criticism. Authors who respond defensively to critical comments, especially comments that point out limitations or alternative approaches, get downvoted aggressively. The thread quality deteriorates and the moderation often demotes the submission as a result.

Coordinated upvoting. Vote rings get detected. The cost of detection is a permanent reduction in the credibility of the submitting account and often of related accounts. There is no clever way around the detection that has not been tried.

Frequency mismatched to substance. Accounts that submit weekly or more often from a single company domain pattern-match to promotional behavior and get penalized. The right cadence for first-party submissions is typically one per month at maximum, with the remainder of activity being community engagement and third-party submissions.

No measurement of citation propagation. Programs that do not track which submissions actually produce LLM citation over the 18-month horizon cannot reallocate effort toward the formats that produce the highest return. The measurement is straightforward to set up with a quarterly query harness, but most operators do not invest in it.

Treating HN as a one-time campaign. Companies that pursue HN as a launch tactic and then disengage produce limited citation lift. The sustained programs that produce compounding returns require continuous engagement over years, not periodic campaigns.
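Two of the failure modes above, submission frequency and the first-party share of submissions, can be checked against an account's own submission log. The sketch below is a hypothetical self-audit: the log format (a date plus a first-party flag) and the function name are our own conventions, and the thresholds are simply the one-per-month cadence and the 10:1 ratio recommended in this piece, not limits the site publishes.

```python
from collections import Counter
from datetime import date


def cadence_warnings(log: list[tuple[date, bool]], min_ratio: float = 10.0) -> list[str]:
    """Flag months with more than one first-party submission and a low ratio.

    Each log entry is (submission date, True if it links to our own domain).
    """
    warnings = []
    first_party_by_month = Counter(
        (d.year, d.month) for d, is_first_party in log if is_first_party
    )
    for (year, month), n in sorted(first_party_by_month.items()):
        if n > 1:
            warnings.append(f"{year}-{month:02d}: {n} first-party submissions")
    first = sum(1 for _, is_first_party in log if is_first_party)
    third = len(log) - first
    if first and third / first < min_ratio:
        warnings.append(
            f"third-party to first-party ratio {third / first:.1f} is below {min_ratio:g}"
        )
    return warnings


if __name__ == "__main__":
    # Made-up log: two first-party posts in one month, five third-party posts.
    log = [(date(2026, 3, 2), True), (date(2026, 3, 20), True)]
    log += [(date(2026, 3, day), False) for day in (4, 9, 11, 17, 25)]
    print(cadence_warnings(log))
```

An empty result only means the account stays inside the cadence suggested here. It says nothing about the quality of the submissions themselves, which remains the larger factor.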

The Honest Limits of the HN Strategy

HN is not the right surface for every product or company. The audience skews toward technical buyers, individual developers, and startup founders. Consumer products outside the developer category, B2B services aimed at non-technical buyers, and most enterprise sales motions get limited direct lift from HN visibility. The citation footprint matters even for companies whose primary buyer is not on HN, because the LLM citations carry across audiences, but the direct traffic value is concentrated in technical audiences.

The strategy also requires a sustained operator commitment that not every company can support. The 40 to 80 hours invested per front-page submission, the ongoing community engagement required to maintain account standing, and the patience to measure citation propagation over 18 to 36 months add up to a meaningful organizational commitment that has to be justified against alternative AEO channels.

The risk side of the strategy is also real. Mistakes on HN are public. Bad submissions get flagged in ways that other community members can see. Defensive responses to criticism leave a permanent record. Accounts caught vote-ringing get shadowbanned and the shadowban is rarely lifted. Companies that pursue HN need to be prepared for the public scrutiny that comes with engaging in a community that values intellectual honesty above commercial interest.

Finally, the citation propagation pattern depends on LLM providers continuing to weight HN as a high-quality training source. The current weight is high and likely to remain high given the corpus quality, but the future is not guaranteed. A shift in training data preferences toward closed sources or licensed data could reduce HN's weight over time, though the structural reasons for its current weight — corpus density, moderation quality, URL stability — are durable.

Takeaway: Hacker News is one of the highest-leverage AEO surfaces for developer-facing companies in 2026, because the front-page archive functions as a long-duration citation asset across LLM training pipelines that include Common Crawl, direct scraping, and the Algolia HN Search API. The operator playbook is structurally different from other content distribution channels — it requires treating HN as a community of practitioners, building account credibility over months, submitting work that respects the audience's intelligence, and engaging substantively with critical comments. The formats that work are Show HN with working software, technical deep dives, postmortems, contrarian takes with evidence, and well-framed Ask HN threads. The citation footprint compounds over 18 to 36 months as the thread propagates through successive training cycles, producing per-citation costs that are competitive with other AEO channels and citation durations that are materially longer. Operators who treat HN as a distribution channel fail. Operators who participate substantively in the community earn placement that pays back for years.

Frequently Asked Questions

Why does Hacker News matter for AEO and LLM citations?

Hacker News matters for AEO because its front-page archive is one of the highest-quality, longest-lived developer discussion corpora on the open web, and every major LLM trained through 2025 included substantial HN content in either pretraining or retrieval indexes. A front-page Show HN or Ask HN thread typically generates 200 to 1,800 substantive comments that become permanent, indexable, and quotable artifacts. The thread URL is stable, the prose is dense, and the signal-to-noise ratio is materially higher than Reddit or Twitter on technical topics. When a developer asks ChatGPT, Claude, or Perplexity about a debugging pattern, a YC startup pivot, or a database performance tradeoff, the model often surfaces phrasing or framing that originated in a 2018 HN comment thread. Earning one front-page placement is roughly equivalent to publishing on a top-100 tech publication in terms of long-tail citation propagation.

What kind of post performs best on Hacker News in 2026?

The formats that reliably reach the HN front page in 2026 cluster into five categories: Show HN launches with working software and a clear demo, technical deep dives explaining nontrivial engineering decisions with code-level detail, postmortems describing concrete failure modes with root-cause analysis, contrarian takes that challenge a widely held developer assumption with first-principles evidence, and Ask HN questions phrased to invite substantive expert responses rather than opinions. The common thread is intellectual honesty and concrete specificity. Marketing-flavored posts, listicles, AI-generated content, and unsubstantiated claims get flagged and buried within the first hour. The HN audience rewards prose that respects their time and signals that the author actually built or understands what they are describing. Domain authority matters less than the first paragraph's density of verifiable claims.

What are the unwritten rules of submitting to Hacker News?

The unwritten rules of HN submission cover title formatting, response etiquette, and submission timing. Titles must not be in all caps, must not include marketing adjectives like "revolutionary" or "game-changing", must not repeat the source publication's name, and should match the article's actual headline rather than editorialize it. Show HN submissions must include a working demo and a description of what was built and why, not a teaser. Authors should respond to comments in the HN thread itself rather than directing readers to a blog post, and should engage substantively with critical comments rather than dismissing them. Vote rings, paid upvotes, and coordinated submissions from sockpuppet accounts result in shadowbans that are rarely lifted. Reposting recently submitted URLs is allowed once after a 24-hour cooldown but discouraged beyond that. The community vouches for borderline submissions through the vouch button, which is one of the few mechanisms that can rescue a flagged post.
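The title rules above are mechanical enough to check before submitting. The following is a minimal sketch of such a pre-submission check, not a reproduction of anything HN itself runs. The function name is hypothetical, and the word list contains only the two adjectives named above, so a real list would need extending.

```python
# Only the two adjectives named in the text; extend as needed.
MARKETING_WORDS = ("revolutionary", "game-changing")


def lint_title(title, publication_name=""):
    """Return a list of warnings for a draft submission title."""
    warnings = []
    letters = [ch for ch in title if ch.isalpha()]
    # All-caps check ignores digits and punctuation.
    if letters and all(ch.isupper() for ch in letters):
        warnings.append("title is in all caps")
    lowered = title.lower()
    for word in MARKETING_WORDS:
        if word in lowered:
            warnings.append(f"marketing adjective: {word}")
    if publication_name and publication_name.lower() in lowered:
        warnings.append("title repeats the publication name")
    return warnings
```

An empty list means the draft passes these three checks only. Whether the title matches the article's actual headline still needs a human read.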

How does dang's moderation affect Hacker News submissions?

Dang, the longtime Hacker News moderator, enforces a consistent and well-documented set of community norms that materially affect submission outcomes. Posts that violate the guidelines on title formatting, source quality, or engagement patterns get manually demoted from the front page rather than removed, which preserves discoverability via the new and ask pages but limits LLM citation impact. Dang has publicly described enforcement priorities in numerous comment threads and a small number of interviews, with the consistent themes being intellectual honesty, depth of discussion, and resistance to growth-hacking patterns. Repeated violations result in a rate limit on the submitting account or, in egregious cases, a ban. The vouch system allows established users to rescue flagged submissions that have genuine merit. Operators who treat HN as a distribution channel rather than a community consistently underperform because the moderation philosophy is structurally hostile to extractive engagement patterns.

How do Hacker News threads end up in LLM training data?

Hacker News threads enter LLM training data through three primary pathways. The first is Common Crawl, which crawls news.ycombinator.com regularly and feeds widely used pretraining corpora such as C4, the Pile, and RedPajama. The second is direct scraping for high-quality discussion data, which Anthropic, OpenAI, and Google have separately disclosed in published model cards or research papers. The third is the Algolia HN Search API, which provides structured, queryable access to the full HN archive and is used by retrieval-augmented systems that need real-time access to authoritative developer discussion. The combined effect is that a single substantive comment posted to a front-page thread in 2024 may be quoted nearly verbatim by an LLM in 2027, with the original commenter unidentified and the host platform uncredited. This is why HN front-page comments function as long-duration citation assets rather than short-lived engagement moments.
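To make the third pathway concrete, here is a minimal sketch of querying the public Algolia HN Search endpoint for stories and reducing the response to thread URLs. The endpoint and the `objectID`, `title`, and `num_comments` fields reflect the public API as commonly documented, but treat them as assumptions to verify against the current documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, tags="story"):
    """Build a search URL restricted to the given tag (stories by default)."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'query': query, 'tags': tags})}"


def summarize_hits(payload):
    """Reduce a search response to (thread URL, title, comment count) rows."""
    rows = []
    for hit in payload.get("hits", []):
        thread_url = f"https://news.ycombinator.com/item?id={hit['objectID']}"
        rows.append((thread_url, hit.get("title"), hit.get("num_comments")))
    return rows


def search_threads(query):
    """Fetch and summarize matching stories (performs a network request)."""
    with urlopen(build_search_url(query), timeout=10) as response:
        return summarize_hits(json.load(response))
```

Keeping URL construction and response parsing separate from the network call makes the parsing logic checkable offline against a saved response.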