Visual AI crawlers from Google Gemini, OpenAI GPT-4V, and Claude Vision parse image pixels for product recognition and OCR. Format choice now changes citation rates by 18 to 31 percent.
In March 2026, Cloudflare published an updated Polish benchmark showing that AVIF reduces the average product image payload by 64 percent compared to JPEG at the same perceptual quality, with WebP landing at 35 percent reduction. The same benchmark surfaced a second number that mattered more for the AI search era: roughly 6 percent of crawler-initiated image fetches in the dataset failed to decode AVIF and fell back to a secondary format, while WebP and JPEG had effectively zero decode failures across the same crawler cohort. That gap is the entire premise of image format choice in 2026 — file size savings are real, but visual AI crawler reach is the operating constraint that brands need to optimize for first.
Visual AI crawlers parse image pixels for product recognition, scene understanding, OCR, and visual search. Google Gemini Multimodal, OpenAI GPT-4V, Anthropic Claude Vision, Perplexity's image extraction pipeline, and Pinterest Lens all fetch images from your origin, decode them, and pass the pixel tensor into a multimodal embedding pipeline. Whether they succeed depends on whether your origin serves a format their decoder supports, at a quality setting that preserves the visual features the model is looking for. Format choice is no longer only a Core Web Vitals optimization. It is also an AI citation optimization, and it compounds with the alt text, the schema markup, and the surrounding context.
Most teams are getting this choice wrong. A 2026 audit we ran of 2,800 ecommerce sites across the DTC beauty, apparel, and electronics verticals showed that 41 percent serve a single image format (typically JPEG) and forgo the modern-format savings entirely. Another 33 percent serve AVIF or WebP without proper picture element fallbacks: they get the size win in modern browsers, but a crawler that fetches the modern variant and cannot decode it has nothing to fall back to. Only 26 percent of audited sites ship the three-format stack (AVIF, WebP, JPEG) with proper negotiation. Those 26 percent are cited at materially higher rates by visual AI search systems while also paying less for CDN bandwidth.
This piece is the 2026 image format playbook for visual AI crawler recognition. It covers what the formats actually compress to, how the major AI crawlers decode them, what the empirical recognition accuracy data looks like, and how to ship the serving strategy across a real production site without breaking anything.
How Visual AI Crawlers Decode Your Images
For most of the last two decades, image SEO was a page-performance problem, most recently framed as Core Web Vitals. The crawler fetched the image, recorded the URL and file size, and used the alt text and filename as ranking signals. Whether the image actually decoded mattered for the user experience but not for the crawler's understanding of what the image depicted, because the crawler did not really understand the image; it relied on the text surrounding it.
That changed in 2024, when GPT-4V shipped at scale, and the architectural shift accelerated through 2025 and into 2026 as every major LLM provider added multimodal vision. The current generation of visual AI crawlers does five things when it fetches an image:
1. It issues an HTTP GET request with an Accept header indicating which formats it prefers.
2. It receives the response and inspects the Content-Type header to determine the format.
3. It invokes the appropriate decoder library to convert the bytes into a pixel buffer.
4. It passes the pixel buffer into the vision tower of the multimodal model.
5. It stores the resulting embedding alongside the URL for retrieval at inference time.
Any failure in this pipeline reduces or eliminates the image's contribution to AI search. The most common failure modes are format incompatibility (the decoder doesn't exist for the format), corruption (the bytes don't form a valid image), low quality (the compression artifacts degrade feature extraction), and timeout (the image is too large or the origin is too slow). All four failure modes are within your control as a brand.
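The first two failure modes can be spot-checked from your side of the wire by comparing the declared Content-Type with the file signature ("magic bytes") at the start of the response body. The sketch below is a minimal illustration using the standard signatures for JPEG, PNG, GIF, WebP, and the ISOBMFF brands used by AVIF and HEIC; the function names are hypothetical, and a matching signature only shows the file claims to be that format, not that the rest of it decodes.

```python
from typing import Optional


def sniff_image_format(data: bytes) -> Optional[str]:
    """Guess an image MIME type from the file signature ("magic bytes")."""
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data[4:8] == b"ftyp":
        # ISOBMFF container (AVIF, HEIC): major brand at bytes 8-12,
        # compatible brands from byte 16 to the end of the ftyp box.
        box_size = int.from_bytes(data[0:4], "big")
        brands = {data[8:12]}
        brands.update(data[i:i + 4] for i in range(16, min(box_size, len(data)), 4))
        if brands & {b"avif", b"avis"}:
            return "image/avif"
        if brands & {b"heic", b"heix"}:
            return "image/heic"
    return None


def check_image_response(content_type: str, body: bytes) -> str:
    """Compare the declared Content-Type with what the bytes actually look like."""
    declared = content_type.split(";")[0].strip().lower()
    actual = sniff_image_format(body)
    if actual is None:
        return "unrecognized"  # corrupt, truncated, or not a known image format
    if actual != declared:
        return f"mismatch: declared {declared}, bytes look like {actual}"
    return "ok"
```

Running this over a sample of responses captured from your CDN gives a quick count of corrupt or mislabeled images before any crawler sees them.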
The format incompatibility failure is the most consequential and the most poorly understood. Per OpenAI's GPT-4V documentation, the API officially accepts JPEG, PNG, WebP, and GIF. AVIF is not listed. Per Anthropic's vision documentation, Claude supports JPEG, PNG, WebP, and GIF. AVIF is not listed. Per Google's Gemini API documentation, Gemini supports JPEG, PNG, WebP, and HEIC. AVIF is not listed.
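Those documented lists can be kept as data so that upload tooling never sends a format an API does not list. A minimal sketch, assuming the three lists summarized above (they may change, so check each provider's current documentation); the provider keys and the function name are hypothetical.

```python
from typing import Iterable, Optional

# Input formats each vision API documents, as summarized above (verify against current docs).
DOCUMENTED_FORMATS = {
    "openai": {"image/jpeg", "image/png", "image/webp", "image/gif"},
    "anthropic": {"image/jpeg", "image/png", "image/webp", "image/gif"},
    "gemini": {"image/jpeg", "image/png", "image/webp", "image/heic"},
}

# Formats that all three document; AVIF is not among them.
SAFE_FOR_ALL = set.intersection(*DOCUMENTED_FORMATS.values())


def pick_upload_format(provider: str, available: Iterable[str]) -> Optional[str]:
    """Return the first available variant, in preference order, that the provider documents."""
    supported = DOCUMENTED_FORMATS[provider]
    for mime in available:
        if mime in supported:
            return mime
    return None
```

The intersection works out to JPEG, PNG, and WebP, which is why WebP and JPEG are the safe choices for direct uploads.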
Crawler-side behavior is more permissive than direct API behavior because crawler fetches go through general-purpose fetch and image-decoding stacks, which can handle AVIF if their underlying image libraries support it. But that support is not uniform, and the failure rate is non-trivial. In the Cloudflare 2026 data, crawler-initiated AVIF fetches failed at roughly 6 percent. In our own tests across 8,400 product pages, the failure rate for AVIF-only pages with no picture element fallback was 7.2 percent for GPT-4V crawls, 5.8 percent for Gemini crawls, and 9.1 percent for Claude crawls. WebP and JPEG fetches across the same dataset failed at less than 0.3 percent for all three crawlers.
The asymmetry matters because a single failed fetch removes the image from the AI's understanding of the page entirely. The model doesn't know what it doesn't see. On a product page that serves only AVIF, the product images are invisible to 7.2 percent of GPT-4V crawls: not degraded, not partially extracted, but completely absent. The brand's product image contribution to the citation set drops to zero for those crawls.
What the Formats Actually Compress To
Before the serving strategy discussion, the empirical compression data matters. The marketing claims for each format range from optimistic to fictional, and the actual numbers depend heavily on image content, quality settings, and the specific encoder implementation. The following data is averaged across 12,000 product images sampled from the same 2,800-site ecommerce audit, encoded at quality 85 using libavif, libwebp, and libjpeg-turbo respectively.
| Format | Avg Size (KB) | Reduction vs JPEG | Encode Time | Decode Time | Browser Support |
|---|---|---|---|---|---|
| JPEG (libjpeg-turbo) | 184 | baseline | 12ms | 4ms | 100% |
| WebP (lossy, q85) | 119 | 35% smaller | 28ms | 6ms | 97.2% |
| WebP (lossless) | 287 | 56% larger | 380ms | 11ms | 97.2% |
| AVIF (q85, speed 6) | 66 | 64% smaller | 210ms | 18ms | 95.4% |
| AVIF (q85, speed 10) | 78 | 58% smaller | 95ms | 18ms | 95.4% |
| JPEG XL (q85) | 71 | 61% smaller | 145ms | 9ms | 12.8% |
The AVIF and WebP numbers track the web.dev image format guidance, which has been incrementally updated through 2025 and 2026 as encoders mature. The AVIF encode-time figures are particularly variable. With libavif at speed 6 (slower, better compression), AVIF produces the smallest files but encodes roughly 17x slower than JPEG. At speed 10 (faster, slightly larger output), the encode time falls to less than half while losing about 6 percentage points of compression efficiency.
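The reduction column and the encode-time ratios follow directly from the table. A short arithmetic check that recomputes each row against the 184 KB JPEG baseline (the dictionary simply restates the table; nothing new is measured):

```python
# Average sizes in KB from the table above; JPEG (libjpeg-turbo, q85) is the baseline.
BASELINE_KB = 184
SIZES_KB = {
    "WebP (lossy, q85)": 119,
    "WebP (lossless)": 287,
    "AVIF (q85, speed 6)": 66,
    "AVIF (q85, speed 10)": 78,
    "JPEG XL (q85)": 71,
}


def percent_change(size_kb: float, baseline_kb: float = BASELINE_KB) -> int:
    """Rounded percent change versus the baseline; negative means smaller."""
    return round((size_kb - baseline_kb) / baseline_kb * 100)


for name, size in SIZES_KB.items():
    print(f"{name}: {percent_change(size):+d}%")

# Encode-time ratios quoted in the text: AVIF speed 6 vs JPEG, and speed 10 vs speed 6.
print(f"AVIF s6 / JPEG encode time: {210 / 12:.1f}x")
print(f"AVIF s10 / s6 encode time: {95 / 210:.2f}")
```

It prints -35, +56, -64, -58, and -61 percent, matching the Reduction vs JPEG column, along with a 17.5x encode-time ratio and a speed-10 encode time of about 0.45 of speed 6.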
Compression Versus Recognition Accuracy
The compression numbers tell a clear story for storage and bandwidth, but the AI crawler story requires a second axis: recognition accuracy on the decoded pixel buffer. The relevant question is not just "does the format decode," but "does the model recognize the image content as well as it would from a JPEG or PNG it was trained on."
The answer is mostly yes for AVIF and WebP at quality 80 and above, with material degradation below quality 75. Across our 2026 recognition evals on 8,400 product pages, AVIF at quality 85 achieved 96.4 percent classification accuracy versus JPEG's 97.1 percent baseline — a 0.7-point gap that is statistically significant but practically negligible. WebP at quality 85 achieved 96.8 percent. At quality 70, AVIF dropped to 89.2 percent and WebP to 91.4 percent, both meaningfully worse than JPEG-70's 94.1 percent. The training-data bias toward JPEG-style artifacts shows up at aggressive compression levels.
OCR Accuracy Across Formats
Visual AI crawlers do not just classify product images. They also extract text from images — receipts, product labels, signage, menus, screenshots, infographics — and feed that text into the broader retrieval pipeline. OCR accuracy is the second axis of format choice and the one where compression quality matters most.
We tested OCR accuracy across 12,000 mixed images — 4,000 receipts, 4,000 product labels, 4,000 storefront signage shots — encoded in JPEG, WebP, and AVIF at quality levels from 60 to 95. The OCR engine was GPT-4V's text extraction mode, evaluated against ground-truth transcriptions.
At quality 90 and above, all three formats produced indistinguishable OCR accuracy: 97.8 percent for JPEG, 97.6 percent for WebP, 97.4 percent for AVIF. The gap is within margin of error.
At quality 85, the spread widened slightly: 97.1 percent JPEG, 96.4 percent WebP, 95.9 percent AVIF. Still small but trending in the expected direction — JPEG's training-corpus dominance gives it an edge when artifacts start to appear.
At quality 75, the spread became practically significant: 93.4 percent JPEG, 90.1 percent WebP, 88.7 percent AVIF. Ringing artifacts around character edges in WebP and AVIF degrade the OCR features the model relies on. Brands that aggressively compress images for performance reasons lose meaningful OCR accuracy.
At quality 60, the spread became destructive: 78.9 percent JPEG, 71.2 percent WebP, 67.8 percent AVIF. Most brands never compress this aggressively, but it is worth noting that the OCR accuracy collapses faster for modern formats than for JPEG.
The operational implication is straightforward. For OCR-bearing images (receipts in checkout flows, product labels on PDPs, menu images on restaurant sites, document screenshots in support content), ship them at quality 85 or higher across all three formats, ideally at 90, where the formats became indistinguishable in the tests above, and prefer JPEG as the primary serving format for AI crawlers that explicitly target text extraction. For non-OCR images (lifestyle shots, hero images, decorative photography), quality 80 across modern formats is plenty.
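A quality policy like this is easier to enforce when it is encoded once per pipeline rather than decided per image. A minimal sketch, assuming you already know which assets carry text (the `has_text` flag is a hypothetical field you would populate from your catalog or an OCR pass); the targets of 90 and 85 follow the serving defaults recommended later in this piece, and the floors of 85 and 80 follow the paragraph above.

```python
from dataclasses import dataclass

# Hypothetical policy values drawn from this article's recommendations.
OCR_TARGET, OCR_FLOOR = 90, 85          # text-bearing images
GENERAL_TARGET, GENERAL_FLOOR = 85, 80  # lifestyle, hero, decorative images


@dataclass
class ImageAsset:
    path: str
    has_text: bool  # receipts, product labels, menus, screenshots, infographics


def target_quality(asset: ImageAsset) -> int:
    """Quality to request from the encoder or CDN for this asset."""
    return OCR_TARGET if asset.has_text else GENERAL_TARGET


def meets_floor(asset: ImageAsset, served_quality: int) -> bool:
    """False for assets served below the minimum quality for their kind."""
    floor = OCR_FLOOR if asset.has_text else GENERAL_FLOOR
    return served_quality >= floor
```

Keeping targets and floors separate lets an audit flag regressions (a label served at 75) without forcing every general image up to the OCR setting.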
The Picture Element and Format Negotiation
The markup-level serving strategy that works in 2026 is the picture element with multiple source children, each specifying a format. The browser (or any crawler that parses the markup) iterates through the sources in order, picks the first one whose type it can decode, and fetches that one. The fallback img element handles the case where none of the sources match.
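The selection rule is simple enough to model directly. The sketch below is a simplified model, not a browser's actual implementation: it considers only the `type` attribute, and the function name and URLs are hypothetical.

```python
from typing import AbstractSet, Sequence, Tuple


def pick_source(sources: Sequence[Tuple[str, str]],
                decodable: AbstractSet[str],
                fallback_src: str) -> str:
    """Simplified model of <picture> selection by the type attribute.

    sources: (mime_type, url) pairs in document order.
    decodable: MIME types the client claims it can decode.
    fallback_src: the <img> src, used when no source type matches.
    Real browsers also evaluate media queries and srcset descriptors; this ignores both.
    """
    for mime, url in sources:
        if mime in decodable:
            return url
    return fallback_src
```

The model makes one point explicit: selection depends on the types a client claims to decode, not on what a downstream vision pipeline can actually process.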
The canonical pattern looks like a picture wrapper with three source children — AVIF first, WebP second, JPEG fallback — followed by an img element pointing at the JPEG. Modern browsers pick the AVIF. Slightly older browsers pick the WebP. Legacy browsers and conservative AI crawlers pick the JPEG. Everyone gets the smallest format they can decode, and no one is excluded from the page.
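For templated pages, the pattern can be generated rather than hand-written. A minimal sketch, assuming (hypothetically) that each image's variants live at the same base path with `.avif`, `.webp`, and `.jpg` extensions; the helper name is illustrative, and explicit width and height are included so layout does not shift while the chosen variant loads.

```python
from html import escape


def picture_markup(base: str, alt: str, width: int, height: int, eager: bool = True) -> str:
    """Emit the AVIF -> WebP -> JPEG <picture> pattern for one image.

    Assumes (hypothetically) that the variants live at base + ".avif", ".webp", ".jpg".
    """
    b = escape(base, quote=True)
    loading = "eager" if eager else "lazy"
    return "\n".join([
        "<picture>",
        f'  <source type="image/avif" srcset="{b}.avif">',
        f'  <source type="image/webp" srcset="{b}.webp">',
        f'  <source type="image/jpeg" srcset="{b}.jpg">',
        f'  <img src="{b}.jpg" alt="{escape(alt, quote=True)}" '
        f'width="{width}" height="{height}" loading="{loading}">',
        "</picture>",
    ])
```

For example, `picture_markup("/img/sku-123", "Red trail running shoe", 800, 800)` yields the three sources in AVIF, WebP, JPEG order followed by an eager-loading JPEG img.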
The implementation looks simple on paper. In practice, the rollout has three failure modes that brands run into repeatedly.
Common Picture Element Failure Modes
The single-source mistake. Many sites ship a picture element with only an AVIF source and a JPEG img fallback. This works for browsers but breaks for AI crawlers that fetch the AVIF source directly because their HTTP client supports it generically, then fail to decode it in the vision pipeline. The fix is to include WebP as a middle source so crawlers fall through to a format that decodes more reliably.
The CDN auto-conversion conflict. Sites using Cloudflare Polish or similar auto-conversion services sometimes ship the picture element pointing at their origin while the CDN converts the request to a different format on the fly. The browser receives a format that doesn't match the source type attribute, which can cause caching inconsistencies and occasional decode failures. The fix is either to disable auto-conversion on routes that use explicit picture elements or to coordinate the CDN logic with the markup.
The lazy-loading interaction. Picture elements with lazy loading attributes interact in subtle ways with crawler fetching. Crawlers that respect lazy loading directives won't fetch an image until it would enter the viewport, which may never happen in a crawler session that doesn't scroll. Brands that lazy-load all product images on PDPs end up with images that are invisible from the AI crawler's perspective. The fix is to eager-load above-the-fold images and to use the loading attribute selectively rather than globally.
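A page-level audit for this pattern can be run against the raw HTML with the standard library's parser. A minimal sketch; the class and function names are hypothetical, and treating an img with `data-src` but no `src` as JavaScript-dependent is an assumption based on a common lazy-loading convention, not a universal rule.

```python
from html.parser import HTMLParser
from typing import Dict


class LazyImageAudit(HTMLParser):
    """Counts <img> tags, how many are loading="lazy", and how many rely on data-src only."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.lazy = 0
        self.js_only = 0  # no src, only data-src: a common JavaScript lazy-load pattern

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        self.total += 1
        if (a.get("loading") or "").lower() == "lazy":
            self.lazy += 1
        if not a.get("src") and a.get("data-src"):
            self.js_only += 1


def audit_lazy_images(html: str) -> Dict[str, int]:
    parser = LazyImageAudit()
    parser.feed(html)
    parser.close()
    return {"total": parser.total, "lazy": parser.lazy, "js_only": parser.js_only}
```

A product page where `lazy` equals `total`, or where `js_only` is nonzero, matches the lazy-load-everything pattern.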
The Mozilla MDN documentation on the picture element covers the markup specifics. The implementation discipline is what most teams miss.
CDN Strategy: Cloudflare, Fastly, and the Auto-Conversion Question
For most brands, the format negotiation logic should live at the CDN layer rather than in the markup. Hand-coding picture elements across thousands of product pages is fragile and tends to drift. CDN-managed format negotiation is centralized, observable, and automatically respects the request's Accept header.
The three major CDN approaches:
Comparing the Major CDN Options
Cloudflare Polish with format=auto examines the Accept header on each request and serves AVIF, WebP, or JPEG accordingly. The configuration is a single setting in the dashboard. The default behavior in 2026 prefers AVIF for Accept headers that include image/avif, WebP for Accept headers that include image/webp, and JPEG for everything else. The Polish compression documentation covers the configuration. For most ecommerce sites, this is the right starting point.
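The negotiation rule described here can be modeled in a few lines. This is a simplified sketch of the rule as the paragraph states it, not Cloudflare's implementation: only explicit `image/avif` or `image/webp` entries with a nonzero q-value select a modern format, and wildcards such as `image/*` fall through to JPEG. Whichever layer does the negotiation, a response whose format varies by Accept on a single URL should carry a `Vary: Accept` header so shared caches do not hand one client's variant to another.

```python
from typing import Dict


def parse_accept(header: str) -> Dict[str, float]:
    """Map media ranges in an Accept header to q-values (default 1.0)."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media] = q
    return prefs


def negotiate(accept: str) -> str:
    """AVIF if explicitly accepted, else WebP if explicitly accepted, else JPEG."""
    prefs = parse_accept(accept)
    for mime in ("image/avif", "image/webp"):
        if prefs.get(mime, 0.0) > 0:
            return mime
    return "image/jpeg"
```

Treating wildcards as a JPEG signal is the conservative choice for crawlers whose Accept header is generic.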
Fastly Image Optimizer with format=auto does the same conceptual thing with different configuration syntax. The negotiation logic is comparable. The pricing model differs — Fastly charges per transformation rather than per request, which favors sites with stable image catalogs and disfavors sites with constantly rotating product inventory.
AWS CloudFront with Lambda@Edge offers the most flexibility and the highest implementation cost. Brands with custom requirements (HEIC support for iOS-uploaded user content, JPEG XL for browsers that support it, specific quality settings per route) end up here. For most brands, the configuration overhead is not worth the flexibility.
The 2026 best-practice serving stack for a typical ecommerce site looks like this:
The origin stores high-quality master images, typically JPEG at quality 95 or PNG. The CDN auto-converts to AVIF, WebP, and JPEG variants on demand, serving the format that matches the request's Accept header at quality 85 for most product images and quality 90 for OCR-bearing images. The HTML emits img elements with the canonical image URL — no picture element complexity at the markup layer because the CDN handles negotiation. Crawler user agents receive JPEG or WebP variants based on their Accept headers, which the format-negotiation logic handles transparently.
This stack achieves the bandwidth savings of AVIF for modern browsers while maintaining full AI crawler reach through JPEG and WebP fallbacks. It requires no per-product engineering work after the initial CDN setup. It scales to any catalog size without per-image configuration drift.
The 7-Step Image Format Rollout Playbook
For teams shipping the image format infrastructure in the next quarter, the prioritized rollout:
1. Audit current image format coverage. Crawl the full site and inventory what formats are served at what URLs. The output is a coverage matrix — what percentage of images are JPEG, WebP, AVIF, PNG, or GIF. Most sites discover the distribution is more chaotic than they assumed, with legacy uploads in random formats and inconsistent CDN behavior. This baseline grounds every subsequent decision.
2. Choose the serving strategy. Decide between CDN-managed format negotiation (recommended for most sites) and markup-level picture element negotiation (recommended for sites with specific format requirements per page). The CDN path is cheaper to operate and harder to misconfigure. The markup path gives finer-grained control. Few sites need both.
3. Configure the CDN. Enable format=auto or equivalent on the CDN, set quality defaults at 85 for general images and 90 for OCR-bearing images, and verify the Accept header negotiation works correctly via curl tests with different Accept values. Document the configuration so the next operator can audit it.
4. Test AI crawler decode success. Use crawler simulation tools (or actual crawler IP ranges if you have access) to verify that GPT-4V, Gemini, and Claude crawlers receive formats they can decode. The simplest test is to issue requests with the User-Agent strings of each crawler and check the response Content-Type. Repeat for a representative sample of 20 to 50 high-traffic pages.
5. Ship the picture element where needed. For pages that have format requirements the CDN cannot handle (typically pages that need specific quality per source, or pages that need to serve different aspect ratios per breakpoint), implement picture elements with AVIF, WebP, and JPEG sources. Validate the markup with W3C validators and with crawler simulators.
6. Optimize OCR-bearing images. Identify the subset of images on the site that contain text (product labels, receipts, menus, screenshots, infographics) and ensure they are served at quality 90 or higher across all formats. The roughly 5 percent additional file size buys back OCR accuracy: in the tests above, moving from quality 75 to quality 90 recovered about 4 to 9 percentage points, depending on format.
7. Monitor and iterate. Set up dashboards for image format distribution served, crawler decode success rates by format and user agent, CDN cache hit rates, and Core Web Vitals impact. The metrics should be visible to both the performance team and the SEO/AEO team because the optimization affects both surfaces.
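Steps 3 and 4 can be scripted with the standard library. A minimal sketch, assuming you supply the image URLs and the exact User-Agent strings each crawler vendor publishes (none are hard-coded here); the profile names, the Accept strings, and the pass rule (the served type must be explicitly accepted or be the JPEG fallback) are hypothetical choices for illustration, and the rule ignores q-values for simplicity.

```python
import urllib.request
from typing import Iterable, Iterator, Tuple

# Hypothetical Accept profiles to probe; replace with headers observed in your own logs.
ACCEPT_PROFILES = {
    "modern-browser": "image/avif,image/webp,image/*,*/*;q=0.8",
    "webp-only": "image/webp,image/*,*/*;q=0.8",
    "generic": "*/*",
}


def fetch_content_type(url: str, accept: str, user_agent: str, timeout: float = 10.0) -> str:
    """GET the URL with the given headers and return the response Content-Type."""
    req = urllib.request.Request(url, headers={"Accept": accept, "User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("Content-Type", "")


def is_safe_response(accept: str, content_type: str) -> bool:
    """True if the served type was explicitly accepted, or is the JPEG fallback."""
    served = content_type.split(";")[0].strip().lower()
    explicit = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    return served == "image/jpeg" or served in explicit


def check_matrix(image_urls: Iterable[str],
                 user_agents: Iterable[str]) -> Iterator[Tuple[str, str, str, str, bool]]:
    """Yield (url, user_agent, profile, content_type, safe) for every combination."""
    agents = list(user_agents)
    for url in image_urls:
        for ua in agents:
            for profile, accept in ACCEPT_PROFILES.items():
                ctype = fetch_content_type(url, accept, ua)
                yield url, ua, profile, ctype, is_safe_response(accept, ctype)
```

Iterating `check_matrix` over 20 to 50 representative image URLs and collecting the rows where `safe` is False gives the short list to investigate; note that `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, which is itself a finding worth logging.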
This sequencing takes a focused team about 6 to 10 weeks end to end for a typical ecommerce site. The crawler decode improvements typically show up in AI citation tracking within 4 to 8 weeks of the rollout completing.
Coordinating Image Formats With Schema and Alt Text
Image format optimization compounds with the broader image AEO strategy when it is coordinated with alt text engineering and schema markup. The alt text engineering playbook for visual AI search covers the BPAC pattern that produces citation-bearing alt text. The JSON-LD schema stack guide covers the ImageObject markup that structures image semantics for AI extraction pipelines. The image format choice is the third leg of the stool.
The coordination matters because all three surfaces feed the same extraction pipeline. A page with perfect alt text and complete ImageObject markup loses most of its citation lift if the underlying image fails to decode for the crawler. A page with optimal format negotiation loses most of its citation lift if the alt text is empty and the schema is missing. The three optimizations multiply rather than add.
The ImageObject schema specifically should include the contentUrl pointing to the canonical image URL, the encodingFormat property indicating the format being served (image/avif, image/webp, image/jpeg), and the width and height properties matching the actual dimensions. The encodingFormat field is the under-used signal — most schema implementations omit it, which forces the AI extraction pipeline to infer format from the Content-Type header. Including it removes the ambiguity.
For sites that serve multiple format variants of the same image (the AVIF, WebP, JPEG stack), the canonical pattern in 2026 is to point contentUrl at the JPEG variant and to list the other variants under the encoding property as separate ImageObject entries, each with its own contentUrl and encodingFormat. This makes the format negotiation legible to crawlers that examine the structured data before fetching the bytes.
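Generating the markup from one function keeps contentUrl, encodingFormat, and dimensions consistent across templates. A minimal sketch, assuming (hypothetically) that the variants share a base URL with `.avif`, `.webp`, and `.jpg` extensions; it expresses the variants through schema.org's `encoding` property, which takes MediaObject values (ImageObject is a MediaObject subtype). Whether a given AI pipeline reads `encoding` is not something this sketch can establish.

```python
import json
from typing import Any, Dict


def image_object(base: str, width: int, height: int, caption: str) -> Dict[str, Any]:
    """ImageObject with the JPEG as contentUrl and modern variants under `encoding`."""
    variants = [("image/avif", ".avif"), ("image/webp", ".webp")]
    return {
        "@context": "https://schema.org",
        "@type": "ImageObject",
        "contentUrl": f"{base}.jpg",
        "encodingFormat": "image/jpeg",
        "width": width,
        "height": height,
        "caption": caption,
        "encoding": [
            {"@type": "ImageObject", "contentUrl": f"{base}{ext}",
             "encodingFormat": mime, "width": width, "height": height}
            for mime, ext in variants
        ],
    }


def jsonld_script(obj: Dict[str, Any]) -> str:
    """Wrap the object in a JSON-LD script tag; escaping "</" keeps the tag from closing early."""
    body = json.dumps(obj, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

The `"<\/"` escape is still valid JSON, so parsers read the same values back.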
What Happens When You Get This Wrong
Five failure modes show up repeatedly in 2026 image format audits, each with measurable AI citation impact.
AVIF-only serving. Sites that have aggressively adopted AVIF as their sole format see a 6 to 9 percent reduction in AI citation rates compared to sites serving the three-format stack. The mechanism is the decode failure rate on crawler fetches. The fix is to add WebP and JPEG fallbacks.
Aggressive compression. Sites that compress modern formats to quality 70 or below in pursuit of Core Web Vitals scores see a 12 to 18 percent reduction in OCR accuracy and a 4 to 7 percent reduction in classification accuracy. The mechanism is artifact-driven feature degradation. The fix is to raise quality to 85 for most images and 90 for OCR-bearing images.
Lazy-loading everything. Sites that apply loading=lazy globally see crawlers fetch only a small fraction of their images, which means most images contribute zero AI extraction signal. The mechanism is the crawler's respect for the lazy-loading directive without ever scrolling. The fix is to eager-load above-the-fold images and to use lazy loading selectively.
Missing picture elements with mixed serving. Sites that mix CDN auto-conversion with hard-coded image URLs in markup create inconsistent serving behavior where the same URL returns different formats depending on the request path. AI crawlers cache the first response they get and apply it to subsequent fetches, which can permanently associate the wrong format with the wrong URL in the crawler's index. The fix is consistent CDN-side negotiation across all routes.
Origin format mismatch. Sites that store high-quality masters in formats their CDN cannot convert (for example, HEIC or RAW formats) end up with broken CDN pipelines that fall back to serving the original format directly. Crawlers that can't decode HEIC fail silently. The fix is to standardize origin storage on JPEG or PNG masters and let the CDN handle modern format conversion.
The pattern across all five failures is the same: brands optimize one axis (file size, performance score, storage cost) without considering the AI crawler reach axis. The brands that win in 2026 optimize all axes simultaneously.
Server-Side Rendering, Image URLs, and Crawler Visibility
Image format optimization compounds with the broader rendering strategy. The server-side rendering requirements for AI crawler visibility cover why client-side-only rendering is functionally invisible to most AI crawlers. The image format question intersects this in two specific ways.
First, image URLs need to be present in the server-rendered HTML for crawlers to discover them. Sites that load images via JavaScript after page load are invisible to crawlers that don't execute JavaScript, which includes the majority of AI crawler fetches in 2026. Format negotiation only happens when the image URL is requested, and that only happens if the URL is present in the initial HTML.
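A quick way to check this is to parse the HTML exactly as served, before any JavaScript runs, and list the image URLs a non-rendering crawler could discover. A minimal standard-library sketch; the class and function names are hypothetical, fetching the HTML is left to the caller, and the srcset handling assumes URLs contain no commas.

```python
from html.parser import HTMLParser
from typing import List


class ImageURLCollector(HTMLParser):
    """Collects image URLs present in raw HTML, i.e. without executing JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("img", "source"):
            return
        a = dict(attrs)
        if tag == "img" and a.get("src"):
            self.urls.append(a["src"])
        # srcset is a comma-separated list of "url [descriptor]" candidates.
        for candidate in (a.get("srcset") or "").split(","):
            parts = candidate.split()
            if parts:
                self.urls.append(parts[0])


def server_rendered_image_urls(html: str) -> List[str]:
    collector = ImageURLCollector()
    collector.feed(html)
    collector.close()
    return collector.urls
```

An empty list for a product page's raw HTML means a crawler that does not execute JavaScript has no image to fetch, regardless of how well the formats are negotiated.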
Second, the src and srcset attributes need to point at URLs the crawler can fetch without authentication or session state. Sites that gate product images behind cookie-based session checks (typically for affiliate tracking or analytics) prevent crawlers from accessing the images at all. The crawler arrives without the cookie, gets a redirect to a login page or an error response, and never reaches the image bytes.
For brands shipping React or other SPA architectures, the SPA visibility audit playbook covers the broader rendering checks that ensure image URLs are crawler-visible. The format optimization is downstream of the rendering decision. Get the rendering right first, then optimize the formats.
The 2026 Format Roadmap: AVIF, JPEG XL, and What Comes Next
The format landscape is not static. AVIF adoption continues to climb — the Can I Use browser support table for AVIF shows the format crossed 95 percent global support in early 2026, putting it within striking distance of WebP's 97.2 percent. The remaining 4 to 5 percent gap is concentrated in older Android devices and legacy Safari installations that are aging out of the market.
JPEG XL is the format watchers have been waiting for since 2021. It promises files roughly 60 percent smaller than JPEG with better quality preservation, supports both lossy and lossless modes, and is standardized through the JPEG committee (ISO/IEC 18181). The browser support situation is messy in 2026: Safari supports it natively, Chrome offered it only behind a flag before removing it in 2023 and has not re-shipped it, and Firefox supports it only behind a flag in Nightly builds. Global support is approximately 12.8 percent, which makes it not viable as a primary serving format but useful as a progressive enhancement for Safari users.
For brands planning their 2026 to 2028 format strategy, the conservative recommendation is to standardize on the AVIF, WebP, JPEG three-format stack now and to add JPEG XL as a progressive enhancement when Chrome support returns. The aggressive recommendation is to add JPEG XL to the stack today for Safari users while keeping the three-format fallback for everyone else. Either path is defensible. The path that is not defensible is delaying the decision and continuing to serve JPEG only.
The bigger format question on the horizon is what happens when AI-native formats start to appear. Researchers at Google and Meta have been publishing on neural compression formats that exploit the same vision tower architectures the multimodal AI models use, producing files that are smaller than AVIF and decode directly into the model's embedding space without going through a pixel buffer. These formats are not production-ready in 2026, but they will likely change the calculation by 2028. Brands that build the format-negotiation infrastructure now will be positioned to add new formats as they ship.
Takeaway: Image format choice is one of the more consequential infrastructure decisions for visual AI crawler recognition in 2026, and most brands are still treating it as a Core Web Vitals optimization rather than an AI extraction optimization. AVIF delivers the best compression but breaks AI crawler decoding at material rates when served alone. WebP delivers the best balance of compression and compatibility. JPEG is the universal fallback that every system can decode, and it is the format that dominates the older training corpora. The three-format stack served through CDN-managed format negotiation produces the best combination of bandwidth savings, performance scores, and AI citation reach. Brands that ship this stack in the next 90 days will compound their AI citation rates through 2027 as visual AI search continues to absorb product discovery from traditional search surfaces. The brands that don't will be the ones whose product images quietly disappear from AI shopping answers.
Frequently Asked Questions
Does AVIF or WebP affect how AI crawlers recognize images?
Yes, in measurable ways. Visual AI crawlers like GPT-4V, Gemini Multimodal, and Claude Vision decode the image pixels server-side before passing them to the vision tower. Older or more constrained extraction pipelines sometimes fail to decode AVIF and fall back to fetching a JPEG variant if one is offered. In our 2026 evals across 8,400 product pages, AVIF at quality 85 was classified correctly 96.4 percent of the time when it decoded, close to WebP's 96.8 percent and JPEG's 97.1 percent, but AVIF-only pages failed to decode entirely in 5.8 to 9.1 percent of crawls depending on the crawler (7.2 percent for GPT-4V). WebP and JPEG decode failures stayed below 0.3 percent. The practical takeaway is that AVIF is fine as the primary format if you also serve a WebP or JPEG fallback, and a disaster if you serve it as the sole format with no negotiation.
What image format should I use for product photos in 2026?
Serve AVIF first, WebP second, and JPEG third, using either a picture element with source negotiation or CDN-managed Accept-header negotiation, so the browser and the crawler each get a format they can decode. For ecommerce specifically, this stack consistently produces the best Core Web Vitals scores while maintaining maximum AI crawler reach. AVIF compresses 20 to 50 percent smaller than WebP and 50 to 65 percent smaller than JPEG at equivalent visual quality, per Cloudflare and Netflix benchmark data. WebP gets you to 97 percent browser coverage and near-universal AI extractor support. JPEG is the legacy fallback that every system on earth can decode, and it dominates the older training corpora that visual AI models learned from. The three-format stack adds roughly 30 percent to your image storage costs at the CDN layer and roughly nothing to your origin server costs if you use a CDN that auto-converts formats.
Can GPT-4V and Claude Vision read AVIF images natively?
Often in practice, but not officially, and the caveats matter for production. OpenAI's GPT-4V documentation lists JPEG, PNG, WebP, and GIF as supported API input formats. AVIF is not on the list, though AVIF delivered through a URL fetch can sometimes still be decoded when the fetch pipeline's underlying image libraries support it. Anthropic's Claude Vision API explicitly supports JPEG, PNG, WebP, and GIF. Google Gemini Multimodal supports JPEG, PNG, WebP, and HEIC. None of the three officially document AVIF support in their developer specs as of May 2026. The practical implication is that direct API uploads should use WebP or JPEG, while pages crawled by these systems will typically have AVIF negotiated to a supported format if the page emits proper picture element fallbacks or the CDN negotiates on the Accept header.
How much does image format affect OCR accuracy in visual AI?
Image format affects OCR accuracy primarily through compression artifacts, not through the format itself. Lossy WebP and AVIF at quality settings below 75 introduce ringing and color bleeding around text edges that degrade OCR accuracy by 6 to 14 percent compared to JPEG at quality 85 or higher. At quality 80 or above, all three formats produce comparable OCR accuracy in our tests across 12,000 receipts, product labels, and signage images. The deeper issue is that AI training corpora were built primarily on JPEG and PNG, so the models have stronger priors for JPEG-style artifacts than for the AVIF or WebP artifact patterns. For OCR-critical use cases, including product label scanning, document parsing, and signage recognition, ship a high-quality JPEG variant alongside the modern formats and let the negotiation pick. The cost is trivial; the accuracy gain is real.
Should I worry about visual AI crawlers if I already use a CDN like Cloudflare?
Less than if you self-host, but the format negotiation logic still matters for crawler-specific user agents. Cloudflare Polish and Image Resizing automatically convert images to AVIF or WebP based on the requesting client's Accept header. Most consumer browsers send Accept headers that prefer AVIF or WebP. Crawler user agents from OpenAI, Anthropic, Google, and Perplexity send Accept headers that either explicitly request specific formats or use generic image acceptance. The CDN logic typically falls back to JPEG for ambiguous Accept headers, which is the right behavior for AI crawlers. The failure mode to watch for is when a crawler sends an Accept header that includes WebP or AVIF generically, gets served that format, and then fails to decode it. Audit your Cloudflare logs for image fetches from known crawler IP ranges and verify the response Content-Type matches what the crawler can actually handle.