
10 things that block your website from AI search (and how to fix each one)

Written by
Sam Buckingham

SEO Manager


Your site ranks on Google and your content is solid. But when someone asks ChatGPT or Perplexity to recommend a provider in your space, you don’t appear.

At SoBold, the gap between organic visibility and AI visibility is almost always technical. These are the ten structural issues our team finds most often when auditing enterprise sites for AI search readiness, and each one has a specific fix your development team can action this week.

1. JavaScript-rendered content

AI crawlers can’t render JavaScript. GPTBot, ClaudeBot, and PerplexityBot see raw HTML only, so any content loaded via client-side JS is invisible to them.

A significant proportion of modern websites have JS dependencies that affect AI crawler access. A 2025 study found JS pages take 9x longer to process than static HTML (313 hours vs 36 hours to equivalent depth). Google quietly removed JS accessibility warnings from its documentation in March 2026, but that only benefits Google’s own rendering pipeline. Every competing AI system still hits the same wall.

The quickest diagnostic: use the free “View Rendered Source” Chrome extension. It shows raw HTML on the left and the rendered DOM on the right, with green highlights for content that only exists after JavaScript fires. If your main body copy is green-highlighted, AI crawlers aren’t seeing it.

The fix: Server-side rendering (SSR) or static site generation. If you’re running WordPress with proper server-side rendering, your content is already in the “initial HTML” when crawlers arrive. For React, Next.js, or Vue-based sites, SSR configuration is the priority.
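The same raw-versus-rendered check can be scripted: fetch the page without a browser and look for a phrase you know should be in the body copy. A minimal sketch, using hypothetical sample pages rather than live URLs:

```python
import re

def copy_in_raw_html(raw_html: str, phrase: str) -> bool:
    """Return True if the phrase exists in the server response,
    i.e. is visible to crawlers that do not execute JavaScript."""
    # Strip tags so matching works regardless of inline markup.
    text = re.sub(r"<[^>]+>", " ", raw_html)
    return phrase.lower() in text.lower()

# Server-side rendered page: body copy arrives in the initial HTML.
ssr_page = "<html><body><h1>Enterprise WordPress migration</h1></body></html>"

# Client-side rendered page: only an empty root div arrives.
csr_page = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

print(copy_in_raw_html(ssr_page, "Enterprise WordPress migration"))  # True
print(copy_in_raw_html(csr_page, "Enterprise WordPress migration"))  # False
```

In practice you would pass the body of an HTTP response for your own URL and a phrase from your main copy; if it returns False while the phrase is visible in your browser, the content is JS-rendered.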

2. Robots.txt blocking AI crawlers

Some sites block GPTBot, ClaudeBot, and PerplexityBot deliberately. Many more block them accidentally because they copied a robots.txt template without checking what it disallows. Either way, you’ve opted out of the channel entirely.

Bing updated its webmaster guidelines in February 2026 to explicitly cover “Bing search experiences, Copilot, and grounding API results.” Bing’s Fabrice Canel has consistently emphasised that rankings depend on what users actually experience on the page. If your robots.txt prevents AI user agents from seeing your content, there’s nothing to rank.

We’ve seen this on sites where a previous agency added aggressive bot-blocking rules years ago and nobody revisited them. The file sits there quietly killing visibility.

The fix: Audit your robots.txt file. Search for GPTBot, ClaudeBot, PerplexityBot, and any blanket Disallow: / rules applied to unfamiliar user agents. Allow the crawlers you want indexing your content.

3. Content hidden behind interactive UI elements

Google doesn’t click tabs. It doesn’t expand accordions or trigger “load more” buttons. Content behind any JS interaction is invisible during Google’s “first crawl wave” and completely inaccessible to AI crawlers.

The patterns we encounter most: tabbed product descriptions where only the default tab gets indexed, expandable FAQ sections where answers load on click, paginated review carousels, and accordion layouts hiding service details until a user interaction fires.

The fix: Render all content in the DOM on page load. If you want progressive disclosure for the user experience, keep the content in the initial HTML and hide it visually (e.g. toggle display: none with CSS or minimal JS). The content must exist in source whether or not it’s visually expanded.
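One pattern that satisfies both requirements with no JavaScript at all is the native details/summary element: the browser renders it collapsed by default, but the full text sits in the initial HTML. A sketch with placeholder copy:

```html
<!-- The answer is in the source, so crawlers see it even though
     the browser renders the section collapsed by default. -->
<details>
  <summary>How long does an enterprise migration take?</summary>
  <p>Most projects run eight to twelve weeks, depending on content
     volume and integration complexity.</p>
</details>
```

The same principle applies to tabs and accordions built with CSS classes: the markup for every panel should be present on load, with only its visibility toggled.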

4. Missing or JS-rendered publish dates

Freshness is a direct selection signal for AI systems. Ahrefs found that 76.4% of ChatGPT’s most-cited pages were updated within 30 days, and AI-cited content is on average 25.7% fresher than standard search results.

If your publish and last-modified dates are rendered via JavaScript, or missing entirely, crawlers can’t assess how current your content is. A page updated last week looks identical to one from 2019 if neither date appears in the raw HTML. This trips up a surprising number of WordPress sites where theme developers render dates dynamically for design flexibility.

The fix: Render dates in clean HTML, visible in the page source. Add datePublished and dateModified properties in Article schema markup. Update both the visible date and the schema whenever content is meaningfully revised.
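In practice that means a JSON-LD block alongside the visible dates. The headline and dates below are placeholders; the property names are standard schema.org Article properties:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Enterprise WordPress migration guide",
  "datePublished": "2026-01-15",
  "dateModified": "2026-03-02"
}
</script>
```

Keeping dateModified accurate matters more than having it at all: a stale dateModified on a genuinely updated page throws away the freshness signal the update earned.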

5. Entity gaps in your content

Google evaluates content through entities (concepts with Knowledge Graph entries) rather than keyword density alone. A page targeting “enterprise WordPress migration” that doesn’t mention data migration, DNS propagation, content management system, or server architecture signals incompleteness to both Google’s NLP and to RAG retrieval systems.

Google’s NLP API is free to test at cloud.google.com/natural-language. Paste your page content, click Analyze, and it shows every entity Google detects along with salience scores and content category. Run the same analysis on the page ranking first for your target term. The difference between those two entity lists is your optimisation checklist.

The fix: Run entity gap analysis against the top 10 ranking pages. Enrich your content with the entities you’re missing. For a deeper look at how entity enrichment fits into broader AI content strategy, we’ve covered the methodology separately.
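The gap analysis itself reduces to a set difference once you have the two entity lists out of the NLP tool. A minimal sketch with illustrative entity lists (not real API output):

```python
# Entities detected on each page, as exported from a salience analysis.
our_page = {"enterprise wordpress migration", "content management system", "hosting"}
top_ranking_page = {"enterprise wordpress migration", "content management system",
                    "data migration", "dns propagation", "server architecture"}

# Entities the top-ranking page covers that ours does not:
# this is the optimisation checklist.
entity_gap = sorted(top_ranking_page - our_page)
print(entity_gap)  # ['data migration', 'dns propagation', 'server architecture']
```

Repeating this against each of the top 10 ranking pages and counting how often each missing entity recurs gives you a prioritised enrichment list rather than a flat one.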

6. Poor HTML semantic structure

RAG systems parse and chunk content based on HTML structure. Google Vertex AI’s default chunk size is 512 tokens (roughly 350 to 400 words), and its layout-aware chunking keeps all text within a chunk from the same structural element. It won’t grab half a table or split a list midway.

When pages are built with nested div elements instead of semantic HTML, the chunking engine produces imprecise embeddings. Each chunk becomes a blurred mix of topics instead of a clean, retrievable unit. We’ve tested this directly: restructuring a service page from generic divs to proper semantic HTML moved it from zero AI citations to appearing in Perplexity results within three weeks.

The fix: Use proper semantic HTML throughout. Heading hierarchy (h1 through h3), ordered and unordered lists, tables, blockquotes. Keep each section self-contained within roughly 350 to 400 words so it maps cleanly to a single retrieval chunk.
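Concretely, a self-contained service section looks like this (placeholder copy; the point is the element choice, not the content):

```html
<section>
  <h2>Data migration</h2>
  <p>A focused description of this service, kept under roughly 400 words
     so the whole section maps to a single retrieval chunk.</p>
  <ul>
    <li>Content audit and mapping</li>
    <li>Staged database migration</li>
    <li>Post-launch validation</li>
  </ul>
</section>
```

A layout-aware chunker can lift that section as one clean unit; the equivalent content spread across anonymous nested divs gives it no boundaries to work with.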

7. Lazy-loaded content and images

Images behind JS interaction events don’t get indexed. The alt text on those images never reaches AI crawlers. And critical content below the fold that only loads when the viewport intersects the element may never be seen by any crawler at all.

This goes beyond images. Product specifications, pricing tables, testimonial sections, and supporting copy that loads dynamically on scroll are all at risk. If the content isn’t in the server response, crawlers treat it as if it doesn’t exist. The distinction matters because some lazy-loading implementations look fine in-browser but deliver an empty container in the raw HTML response.

The fix: Load all critical content server-side. Use native lazy loading (loading="lazy") for below-fold images only, which browsers handle without requiring JS execution. Ensure every piece of important text content exists in the initial HTML response.
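The distinction in markup is small (file paths and alt text below are placeholders):

```html
<!-- Above-the-fold hero image: load eagerly, present in the initial HTML. -->
<img src="/img/hero.jpg" alt="Enterprise WordPress migration dashboard">

<!-- Below-the-fold image: native lazy loading, no JavaScript required,
     and the alt text still ships in the server response. -->
<img src="/img/case-study.jpg" alt="Case study results chart" loading="lazy">
```

The native attribute defers the image bytes, not the element: the img tag and its alt text remain in the raw HTML, which is exactly what JS-based lazy loaders typically fail to guarantee.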

8. No third-party brand mentions

This is the one blocker on this list that isn’t a dev task. AI systems want consensus across the web before recommending a brand. Ahrefs’ research into AI citation patterns found that third-party listicles, comparisons, and independent reviews consistently outperform first-party content in AI recommendations.

If ChatGPT, Gemini, or Perplexity can’t find your brand mentioned independently across multiple sources, you won’t get recommended regardless of how well your own site is structured. The difference between traditional SEO signals and LLM selection criteria makes this especially acute for B2B brands that rely heavily on their own content.

The fix: Earn placements on three to seven external domains. Industry blogs, partner content, comparison sites, directories like G2 or Clutch. Brief your comms or PR team on this one rather than filing a dev ticket.

9. Duplicate or thin content across pages

Google’s Information Gain patent describes how retrieval systems favour sources that add new information over near-duplicates. If multiple pages on your site cover similar ground with slightly different phrasing, they’re mathematically redundant in vector space. Re-ranking systems discard documents below a relevance threshold of roughly 0.75, and near-duplicate pages dilute each other’s chances of clearing that bar.

This shows up constantly on service pages where overlapping capabilities get split across multiple URLs with thin differentiators. Two pages about “custom integrations” and “API development” that say essentially the same thing in different wrappers are working against each other.

The fix: Consolidate overlapping pages. Make each page’s angle genuinely distinct in substance and approach. If two pages could plausibly answer the same user query, merge them into one stronger resource.
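You can get a rough sense of how redundant two pages are without an embedding model. The sketch below uses bag-of-words cosine similarity as a crude stand-in for the embedding similarity a retrieval system would compute, on two hypothetical page summaries:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two texts (0.0 to 1.0)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

page_a = "we build custom integrations connecting your platforms and apis"
page_b = "we build custom api integrations connecting your platforms"

print(round(cosine_similarity(page_a, page_b), 2))  # 0.82
```

Two pages scoring that close on full body copy are strong candidates for consolidation; real embeddings will flag semantic overlap even when the wording differs more than this.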

10. No AI traffic measurement in place

GA4 can segment referral traffic from ChatGPT, Perplexity, Claude, and Copilot right now. Without those segments configured, you’ve got no visibility into which pages AI systems are citing and which they’re ignoring.

This data turns the nine fixes above from a generic checklist into a prioritised action plan. Pages already receiving AI referral traffic tell you what’s working structurally. Pages with strong organic rankings but zero AI referrals tell you exactly where to look for the blockers listed here.

The fix: Set up GA4 referral segments for AI platforms. Monitor monthly alongside your organic reporting. The measurement doesn’t fix anything on its own, but it shows you where the structural fixes will have the most impact.
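The segment definitions boil down to matching referrer hostnames. A sketch of the classification logic, with a hostname list that is illustrative and needs verifying and extending against your own referral data:

```python
from urllib.parse import urlparse

# Referrer hostnames commonly associated with AI assistants
# (illustrative list -- maintain it as platforms change domains).
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
}

def classify_referrer(referrer_url: str) -> str:
    """Map a raw referrer URL to an AI platform name, or 'other'."""
    host = urlparse(referrer_url).netloc.lower().removeprefix("www.")
    return AI_REFERRERS.get(host, "other")

print(classify_referrer("https://chatgpt.com/"))              # ChatGPT
print(classify_referrer("https://www.perplexity.ai/search"))  # Perplexity
print(classify_referrer("https://www.google.com/"))           # other
```

In GA4 itself the same list becomes a regex condition on Session source; the script version is useful for classifying exported referral data or server logs in bulk.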

Where to start

Most enterprise sites have three or four of these blockers active simultaneously. The fastest path to AI visibility usually starts with the rendering layer (blockers 1, 3, and 7), moves to the structural layer (blockers 4, 5, and 6), then addresses the content and measurement gaps.

If you’re planning a site migration or platform move, that’s the ideal window to address all ten together rather than retrofitting them one at a time.
