- Discovery and strategy: the decisions before any code is written
- Information architecture: URL structure, taxonomy, internal linking
- Design and UX: templates that chunk well
- Development: rendering, semantic HTML, schema, performance
- Pre-launch: migration mapping, benchmarking, redirects, QA
- Launch and the first 90 days
- What not to spend build budget on
- Working with us on a new build
Optimising a website for both Google and AI search has to be considered at every stage of designing and building a new site, not bolted on after launch.
This guide follows how a build project at SoBold actually runs, with the search and LLM decisions flagged at each stage.
AI search visibility is mostly downstream of solid traditional SEO.
Discovery and strategy: the decisions before any code is written
Discovery is where the topical structure of the site is decided, and it sets the bar for every other phase. Good discovery produces two outputs that matter for search: an entity baseline for each major topic, and a content scope decision.
Entity baseline. For each major topic the site needs to own, run the proposed page content (or competitor pages if no draft content exists yet) through Google’s free Natural Language API, and compare the entities it surfaces against the entities surfaced from the top three ranking competitors. Gaps in that comparison are the entities Google expects to see for the topic.
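As a rough sketch of that comparison (file names are placeholders; this assumes the google-cloud-language package is installed and application credentials are configured):

```python
# Entity-baseline sketch using the Google Cloud Natural Language API.
# File names stand in for draft and competitor page text.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

def extract_entities(text: str) -> set[str]:
    """Return the lower-cased entity names the API surfaces for a block of text."""
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    return {e.name.lower() for e in client.analyze_entities(document=document).entities}

draft = extract_entities(open("draft-service-page.txt").read())

competitors = set()
for path in ("competitor-1.txt", "competitor-2.txt", "competitor-3.txt"):
    competitors |= extract_entities(open(path).read())

# Entities the top-ranking pages cover that the draft does not: gaps for the brief.
print(sorted(competitors - draft))
```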
Content scope decision. The deliverable here is a short list of pillar topics, each with a defined cluster of supporting pages.
Information architecture: URL structure, taxonomy, internal linking
Of all the phases in a build, information architecture is the hardest to change after launch. URL structure changes mean redirect chains, broken inbound links, and a measurable drop in ranking signals. Taxonomy changes ripple through every page that references a category. Internal linking changes are recoverable, but only if templates support it. Most agencies treat IA as a UX exercise with SEO consulted afterwards. We treat it as a Day-Zero SEO decision.
URL hierarchy. Shallow, predictable, cluster-aware. If a service has five supporting articles, the URL should signal that relationship: /services/{topic}/ with /news/{article-slug}/ linking back through the taxonomy, not flat /articles/{slug}/ paths that hide the topical relationship from both crawlers and readers.
Taxonomy as entity mapping. Categories and tags are how the site signals topical groupings to search engines and to RAG systems that chunk content for retrieval. Match category names to the entity baseline from discovery. Avoid taxonomy proliferation: ten well-populated categories outperform forty thin ones.
Internal linking modules at template level. The strongest internal linking comes from templates that surface relevant content automatically, e.g. a service page that pulls related articles from the same cluster or an article template that surfaces the parent service. Where templates can’t carry the full load, taxonomy-driven tagging or manual link insertion in body content fills the gap. Internal linking is one of the strongest signals a site sends Google about what each page is about, and a sophisticated dev team can integrate it into the architecture from day one.
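A lightweight way to keep this honest after the build is a periodic link audit. The sketch below is illustrative only (placeholder URLs, assuming requests and beautifulsoup4 are installed); it checks that every article in a cluster links back to its pillar page:

```python
# Internal-linking audit sketch: flag cluster articles missing a link to the pillar page.
import requests
from bs4 import BeautifulSoup

PILLAR_URL = "https://example.com/services/topic/"
CLUSTER_URLS = [
    "https://example.com/news/supporting-article-1/",
    "https://example.com/news/supporting-article-2/",
]

for url in CLUSTER_URLS:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    links = {a.get("href", "").rstrip("/") for a in soup.find_all("a")}
    if PILLAR_URL.rstrip("/") not in links:
        print(f"Missing pillar link: {url}")
```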
Cost of getting this wrong. Even with perfect 301 mapping, URL changes post-launch can erode some short-term ranking signal on each affected page. Multiply that across a site with hundreds of pages and the cost compounds quickly.
Design and UX: templates that chunk well
Design decides how content is presented, and presentation decides how content is retrieved. AI assistants and AI Overviews don’t read whole pages; they pull specific passages (called chunks) that a retrieval system has scored as most relevant to the user’s prompt. Google’s Vertex AI documentation puts the default chunk size at 512 tokens (roughly 350 to 400 words), and the layout parser segments those chunks along structural elements: paragraphs, lists, headings, and tables. The implication for design is direct. Templates that break content into clean, self-contained sections produce clean chunks the AI search layer can actually use. When content is buried in nested divs, tabs, or accordions, the parser’s section boundaries fall in the wrong places and chunks miss half the page.
Three template patterns that should ship by default:
- Flat content reveal. Tabs, accordions, and “load more” patterns are fine for UX, but only if the content inside them is present in the raw HTML and revealed via CSS rather than injected by JavaScript on click. AI crawlers do not click and do not render JavaScript (more on the rendering data in the development phase below). Google’s Rich Results Test, with the “View Tested Page” option, shows what crawlers actually see. If the content is missing there, it is missing for AI and Google search crawlers too.
- Native FAQ blocks and comparison tables. FAQ schema is one of the few schema types that still influences SERP appearance, and tables tend to earn meaningfully more AI citations than equivalent prose content. Building these as native components rather than plugin-rendered widgets keeps the markup clean and the chunking predictable.
- Heading hierarchy that matches reading order. H2s mark major topic shifts, H3s mark subsections within them. A page with twelve H2s and no H3s is hard for both readers and chunking systems to parse.
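The heading check in that last point is easy to automate. A minimal sketch, assuming requests and beautifulsoup4 and a placeholder URL:

```python
# Heading-hierarchy audit sketch: print the H2/H3 outline of a page and flag
# templates that rely on a long run of H2s with no subsections.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/sample-article/"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

outline = [(h.name, h.get_text(strip=True)) for h in soup.find_all(["h2", "h3"])]
for level, text in outline:
    print(("  " if level == "h3" else "") + f"{level.upper()}: {text}")

h2_count = sum(1 for level, _ in outline if level == "h2")
h3_count = len(outline) - h2_count
if h2_count >= 8 and h3_count == 0:
    print("Flag: many H2s and no H3s - consider grouping subsections.")
```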
For the detailed mechanics of how content length and structure influence AI citation rates, our content-strategy article on optimising around AI search covers the grounding budget data.
Accessibility belongs in this phase too. WCAG 2.2 AA compliance is the practical baseline for any new site launching in 2026, and the design constraints it imposes (contrast ratios, focus states, semantic landmarks) are the same constraints that produce well-structured, machine-readable pages.
Cost of getting this wrong. Template-level patterns ship across every page. One JavaScript-rendered tab in the master template could mean hundreds or thousands of pages with broken chunks.
Development: rendering, semantic HTML, schema, performance
The development phase is where the search and LLM consequences of earlier decisions get encoded. Four areas matter most.
Rendering. Ensure there’s server-side rendering or static site generation for core content. A late-2025 study of 23 AI crawlers, including GPTBot, ClaudeBot, PerplexityBot, and Bingbot’s AI variants, found that none of them render JavaScript. Only Googlebot performs full two-pass rendering, with the JavaScript pass running at an unconfirmed later point, anywhere from minutes to days or weeks after the initial HTML crawl. If the site relies on client-side JavaScript to render product copy, service descriptions, or article body text, that content is invisible to the AI search layer entirely and can be delayed or missed by Google. Cloudflare’s bot traffic reporting shows Googlebot accounting for a far larger share of crawl volume than all AI crawlers combined, and even that dominant crawler only sees JS-dependent content after its delayed second pass, so JS-only rendering under-serves traditional SEO as well as AI visibility.
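One practical way to catch JS-only content before it ships is to diff what a non-rendering crawler fetches against the rendered DOM. A rough sketch, assuming requests, beautifulsoup4 and Playwright are installed (with a Chromium browser downloaded), and with a placeholder URL and phrase list:

```python
# Compare the raw HTML (what non-rendering crawlers see) against the
# JavaScript-rendered DOM, and flag key phrases that only exist post-render.
import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

url = "https://example.com/services/topic/"
must_have = ["key service description phrase", "primary call to action"]

raw_text = BeautifulSoup(
    requests.get(url, timeout=10).text, "html.parser"
).get_text(" ", strip=True)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")
    rendered_text = BeautifulSoup(page.content(), "html.parser").get_text(" ", strip=True)
    browser.close()

for phrase in must_have:
    if phrase in rendered_text and phrase not in raw_text:
        print(f"JS-only content (invisible to AI crawlers): {phrase!r}")
```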
Semantic HTML. Use the elements the spec provides: article, section, nav, aside, figure, ordered and unordered lists, tables for tabular data. A page built from generic nested containers still parses, but retrieval systems extract less signal from it. Semantic elements give the parser explicit hints about what each block of content is: an <article> is the main body, a <nav> is navigation, a <figure> is a figure with its caption. That clarity carries through into the embeddings and chunks the AI search layer actually retrieves.
Schema baked into templates. Use Article, BreadcrumbList, Organisation, FAQPage where genuinely relevant. Schema that matches visible content is helpful; schema that contradicts it (or describes content that isn’t on the page) is at best ignored and at worst flagged as deceptive. We typically configure schema through Yoast SEO at the WordPress level for site-wide types and add custom schema for the templates that need it. Our bespoke WordPress development work treats schema as a build-time decision, not a plugin-and-pray retrofit.
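A quick way to confirm what schema a template actually emits is to pull the JSON-LD blocks out of the served HTML and list their types, then compare that against the visible content. A small sketch (placeholder URL, assuming requests and beautifulsoup4); it also handles Yoast’s single @graph wrapper:

```python
# JSON-LD extraction sketch: list the @type of each structured-data block a page serves.
import json

import requests
from bs4 import BeautifulSoup

url = "https://example.com/news/sample-article/"
soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
    except json.JSONDecodeError:
        print("Invalid JSON-LD block")
        continue
    # Yoast wraps its pieces in a single @graph array; other plugins vary.
    if isinstance(data, dict) and "@graph" in data:
        blocks = data["@graph"]
    elif isinstance(data, list):
        blocks = data
    else:
        blocks = [data]
    for block in blocks:
        print(block.get("@type"), "-", block.get("headline") or block.get("name"))
```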
Performance as a design-time constraint. Core Web Vitals (LCP under 2.5 seconds, INP under 200ms, CLS under 0.1) are achievable when image sizing, font loading, and JavaScript bundling are decided during design and development rather than fixed post-launch. For an article-by-article diagnostic of what blocks visibility once a site is live, the ten things that block your website from AI search covers the specifics.
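Field data for those thresholds can also be pulled programmatically. The sketch below queries the PageSpeed Insights v5 API; the metric keys follow the current loadingExperience response shape, so verify them against a live response for your own project (an API key raises the quota but is optional for light use):

```python
# Core Web Vitals field-data pull via the PageSpeed Insights v5 API.
import requests

PSI = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
resp = requests.get(PSI, params={"url": "https://example.com/", "strategy": "mobile"}, timeout=60)
field = resp.json().get("loadingExperience", {}).get("metrics", {})

for key in ("LARGEST_CONTENTFUL_PAINT_MS",
            "INTERACTION_TO_NEXT_PAINT",
            "CUMULATIVE_LAYOUT_SHIFT_SCORE"):
    metric = field.get(key, {})
    print(key, metric.get("percentile"), metric.get("category"))
```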
Cost of getting this wrong. Rendering decisions are platform decisions. Switching from client-side React to server-rendered output post-launch usually means rebuilding the front end.
Pre-launch: migration mapping, benchmarking, redirects, QA
Pre-launch is where the launch-week panic gets created or avoided. The single most expensive mistake we see is treating redirects as a launch-week task. A botched migration with missed or wrong redirects can wipe out years of accumulated organic visibility in days, and recovery typically takes months rather than weeks.
The pre-launch checklist that actually matters:
- Baseline GSC and GA4 four to six weeks before launch. Capture trophy keywords, top organic landing pages, current AI referral traffic by source (ChatGPT, Perplexity, Claude, Copilot). Without a baseline, post-launch diagnostics are guesswork.
- Redirect mapping from staging crawls, not just the old sitemap. Combine the existing sitemap with a fresh crawl of the live site. The sitemap shows what the site is supposed to be; the crawl shows everything that’s actually indexed, including paginated archives, tag pages, and parameter-based URLs the marketing team has forgotten exist. Map every URL in that inventory to its destination on the new site, then validate against a staging crawl to confirm the new URLs actually resolve (a validation sketch follows this checklist). Site migrations are also the natural moment to address structural SEO issues that built up on the old site, and our SEO migration support work covers the methodology.
- Indexation toggling. Staging environments must be noindexed; production environments must be indexable. The handover from one to the other on launch day fails more often than it should. Verify canonicals do not still point to staging URLs after the DNS flip.
- Schema validation, sitemap submission, robots.txt sanity check. Robots.txt blocking GPTBot, ClaudeBot, or PerplexityBot is a common copy-paste error from old templates. Audit it before launch.
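The redirect and robots.txt checks above lend themselves to a simple script run against staging and again at launch. A minimal sketch with a placeholder domain and redirect map, assuming requests (relative Location headers would need resolving against the old URL first):

```python
# Pre-launch QA sketch: single-hop 301 checks for the redirect map, plus a
# robots.txt check for the main AI crawlers.
from urllib import robotparser

import requests

redirect_map = {
    "https://www.example.com/old-services-page/": "https://www.example.com/services/topic/",
    "https://www.example.com/2019/legacy-article/": "https://www.example.com/news/legacy-article/",
}

for old, new in redirect_map.items():
    r = requests.get(old, allow_redirects=False, timeout=10)
    location = r.headers.get("Location", "").rstrip("/")
    if r.status_code != 301 or location != new.rstrip("/"):
        print(f"Check: {old} -> {r.status_code} {location or '(no Location header)'}")

rp = robotparser.RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()
for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"):
    if not rp.can_fetch(agent, "https://www.example.com/"):
        print(f"robots.txt blocks {agent}")
```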
Cost of getting this wrong. Missed redirects and lost trophy keywords are recoverable, but recovery typically takes months, not weeks. If key pages are missed, the initial hit to overall traffic can be catastrophic.
Launch and the first 90 days
Launch is not the end of the SEO and AI visibility work; it is the start of the highest-signal monitoring period of the project.
The first 90 days are when live traffic, real users, and real crawler behaviour reveal anything that staging couldn’t fully replicate. GA4 referral segments for ChatGPT, Perplexity, Claude, and Copilot need to be live from day one. Set them up in pre-launch and they populate from the first user. GSC needs daily eyes on coverage reports and Core Web Vitals warnings. Internal links from new content to pillar pages need to be added as content is published, not bolted on quarterly.
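In GA4 those segments are usually built with referral-source filters; the grouping logic is simple enough to sketch standalone. The hostname list below is illustrative and will need maintaining as the assistants change domains:

```python
# Standalone sketch of the referral grouping those GA4 segments perform:
# map AI assistant referrer hostnames to a named source.
from urllib.parse import urlparse

AI_SOURCES = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "copilot.microsoft.com": "Copilot",
}

def classify_referrer(referrer: str) -> str:
    host = urlparse(referrer).netloc.lower()
    return AI_SOURCES.get(host, "Other / organic")

print(classify_referrer("https://www.perplexity.ai/search?q=example"))  # Perplexity
print(classify_referrer("https://www.google.com/"))                     # Other / organic
```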
A sustainable freshness rhythm matters more than a launch-week burst. Pages get updated when the underlying information changes. dateModified accuracy in schema is part of the freshness signal AI systems read. For the editorial side of post-launch (how to actually run the content programme that keeps a site cited), our piece on building an AI workflow to create content describes the operational setup we use.
Cost of getting this wrong. Without measurement live at launch, the highest-signal first month of the project is flying blind. Diagnostic information becomes much harder to capture once a site has been live for several months.
What not to spend build budget on
Several tactics get pitched as essential for AI search visibility and are not. Naming them helps a build budget go further.
- llms.txt files. Both Google and Bing have publicly dismissed them. SE Ranking analysed 300,000 domains and found zero correlation between llms.txt presence and AI citation frequency. Yoast SEO does include a simple toggle to add an llms.txt to a site if you want one for peace of mind.
- Speculative schema stacking. Schema works when it accurately describes content the user can see. Article, BreadcrumbList, Organisation, FAQPage, and type-specific schema like Person for team profiles or LegalService for service pages all carry weight for entity disambiguation. Layering invented or deeply nested combinations in the hope of triggering AI citation does not.
- Markdown-for-bots duplicate pages. Generating a /page.md copy of every URL for AI crawlers does not improve citation. The crawlers want clean HTML, not separate markdown files.
- Self-promotional listicle networks at scale. This was a tolerable tactic in 2024 and, after recent Google updates, has become actively risky in 2026. A January 2026 analysis documented seven SaaS brands losing 29-49% of organic visibility after scaling self-promotional listicle networks, with the drops cascading into AI citation losses too.
If phases 1 through 6 are done well, none of these tactics changes the outcome. Spending on them in place of the foundational work is a poor trade.
Working with us on a new build
We build websites where the search and AI visibility work happens at the right phase, with the right decisions made before they become expensive to reverse.
If you are planning a new site or a major rebuild and want this baked in from discovery onwards, we would be happy to walk through how the phases play out for your project.
