A02 · Access & Crawlability

No Noindex Blocking

TL;DR

A noindex directive is preventing your page from appearing in search results. Remove the directive (from the HTML meta robots tag or the X-Robots-Tag header), then validate that the page is indexable. Run an Oversearch AI Page Optimizer scan to confirm the fix.

Why this matters

Access and crawlability are prerequisites. If crawlers can’t fetch or parse your content, rankings and citations become unreliable, and LLMs may fail to extract answers.

Where this shows up in Oversearch

In Oversearch, open AI Page Optimizer and run a scan for the affected page. Then open Benchmark Breakdown to see evidence, and use the View guide link to jump back here when needed.

Why is my page not indexed even though it’s live?

A noindex directive in your HTML or HTTP headers is telling search engines not to add the page to their index, even though the page loads fine in a browser.

This is one of the most common indexing blockers. A single <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header will prevent the page from appearing in search results, regardless of how good the content is.

  • Check for <meta name="robots" content="noindex"> in View Page Source.
  • Check for X-Robots-Tag: noindex in the HTTP response headers (curl -I <url>).
  • Check your CMS settings — many have a “Discourage search engines” checkbox.
  • Verify the canonical URL is not pointing to a noindexed page.
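
A quick way to run the first check from the command line is sketched below. It assumes curl and grep are available and uses https://example.com/page as a stand-in for your URL.

    # Fetch the final HTML (following redirects) and print any meta robots tag found.
    curl -sL https://example.com/page | grep -io '<meta[^>]*name="robots"[^>]*>'

    # A match containing "noindex" means the page is blocked at the HTML level.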

If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see whether noindex was detected.

How do I check if a page has a noindex tag?

View the page source and search for “noindex” — it can appear in a meta tag or in the HTTP response headers.

Noindex can be set in two places: an HTML meta tag in the <head> or an X-Robots-Tag HTTP header. Both are equally effective at blocking indexing, so you need to check both.

  • HTML: View Page Source → search for noindex.
  • HTTP header: Run curl -I <url> → look for X-Robots-Tag.
  • Google Search Console: URL Inspection → check “Indexing allowed?” status.
  • CMS: Check page-level SEO settings for indexing toggles.
  • Some plugins add noindex conditionally (e.g., on staging, on paginated pages).
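
The header side can be checked the same way. A minimal sketch, again assuming curl and grep and using https://example.com/page as a placeholder:

    # -s silences output, -I requests headers only, -L follows redirects to the final URL.
    curl -sIL https://example.com/page | grep -i '^x-robots-tag'

    # Example of a blocking value this could surface:
    #   X-Robots-Tag: noindex, nofollow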

If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see the exact directive detected.

What’s the difference between noindex and disallow in robots.txt?

Noindex prevents a page from appearing in search results. Disallow in robots.txt prevents crawlers from fetching the page at all — but does not guarantee de-indexing.

A common misconception is that robots.txt disallow removes pages from search results. It does not. If other pages link to a disallowed URL, search engines may still index it (with limited information). Only noindex reliably removes a page from the index.

  • Use noindex when you want the page crawlable but not indexed (e.g., thank-you pages, internal search results).
  • Use robots.txt disallow when you want to prevent crawl budget waste on low-value URLs.
  • Never use both together — if robots.txt blocks crawling, the crawler cannot see the noindex tag.
  • For complete de-indexing, use noindex and allow crawling.
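
The "never use both together" point is easy to check mechanically. A rough sketch, assuming https://example.com/thank-you is the page in question and ignoring user-agent groups and wildcard rules in robots.txt:

    # 1. Is the path disallowed? If so, crawlers may never fetch the page and never see its noindex.
    curl -s https://example.com/robots.txt | grep -i 'disallow: /thank-you'

    # 2. Does the page carry a noindex at all?
    curl -sL https://example.com/thank-you | grep -i 'noindex'

    # If both commands print a match, drop the Disallow rule so the noindex can actually be read.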

If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see which directives apply to your page.

Can an X-Robots-Tag header block indexing?

Yes. The X-Robots-Tag HTTP header has the same effect as a meta robots tag but is set at the server/CDN level.

This header is often added by reverse proxies, CDNs, or server configs and is invisible in the HTML source. It is a common source of “mystery” noindex issues because developers only check the HTML.

  • Run curl -I <url> and look for X-Robots-Tag in the response.
  • Check your server config (nginx, Apache, Vercel, Netlify) for robots header rules.
  • Check CDN or edge function configs that might inject headers.
  • The header supports the same directives as meta robots: noindex, nofollow, etc.
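
When the header shows up in responses but not in your HTML, search your server and deployment configs for the rule that injects it. The paths below are common defaults and will differ per setup:

    # nginx / Apache (typical config locations)
    grep -Rni 'x-robots-tag' /etc/nginx/ /etc/apache2/ 2>/dev/null

    # Repo-level deployment configs (Vercel, Netlify, custom middleware)
    grep -Rni 'x-robots-tag' vercel.json netlify.toml public/_headers 2>/dev/null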

If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see all detected robots directives, including headers.

Common root causes

  • noindex set in a template/CMS setting for a content type.
  • X-Robots-Tag: noindex set by the server/CDN/security layer.
  • Multiple directives conflicting (meta says index, header says noindex).
  • Noindex applied to canonical pages while duplicates remain indexable.

How to detect

  • In Oversearch AI Page Optimizer, open the scan for this URL and review the Benchmark Breakdown evidence.
  • Verify the signal outside Oversearch with at least one method: fetch the HTML with curl -L, check response headers, or use a crawler/URL inspection (see the sketch after this list).
  • Confirm you’re testing the exact canonical URL (final URL after redirects), not a variant.
  • Check both the HTML (<meta name="robots">) and HTTP headers (X-Robots-Tag) for indexing directives.
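
The sketch below (assuming curl and grep, with https://example.com/page as a placeholder) first resolves the final URL after redirects and then inspects that exact URL for both kinds of directive:

    # Resolve the final URL after any redirects.
    final=$(curl -sL -o /dev/null -w '%{url_effective}' https://example.com/page)
    echo "Final URL: $final"

    # Check both places a noindex can hide: response headers and the HTML head.
    curl -sIL "$final" | grep -i '^x-robots-tag'
    curl -sL  "$final" | grep -io '<meta[^>]*name="robots"[^>]*>'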

How to fix

Start by identifying where the noindex directive comes from (see: How do I check if a page has a noindex tag? and Can an X-Robots-Tag header block indexing?), then follow the steps below to remove it.

  1. Locate the directive source: HTML meta robots vs X-Robots-Tag header.
  2. Remove noindex from pages that should be indexed (or remove the directive entirely if not needed).
  3. Check templates/CMS settings so it doesn’t re-apply on publish.
  4. Ensure canonical pages are indexable and duplicates use canonical/redirects appropriately.
  5. Validate with URL inspection/crawler, then run an Oversearch AI Page Optimizer scan.

Implementation notes

  • CDN: search for header rules adding X-Robots-Tag.
  • WordPress: check per-post noindex toggles and the global ‘Discourage search engines’ setting (see the sketch after this list).
  • Framework apps: check server/middleware headers configuration.
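
For WordPress specifically, if WP-CLI is available, the global ‘Discourage search engines’ toggle can be inspected from the command line; blog_public is the underlying option, where 0 means indexing is discouraged:

    # Prints 1 when indexing is allowed, 0 when "Discourage search engines" is enabled.
    wp option get blog_public

    # Re-enable indexing if it was switched off.
    wp option update blog_public 1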

Verify the fix

  • Run an Oversearch AI Page Optimizer scan for the same URL and confirm the benchmark is now passing.
  • Confirm the page returns 200 OK and the primary content is present in the initial HTML.
  • Validate with an external tool (crawler, URL inspection, Lighthouse) to avoid false positives.
  • Confirm there is no noindex in HTML meta robots or X-Robots-Tag headers.

Prevention

  • Add automated checks for robots/noindex/canonical on deploy (see the sketch after this list).
  • Keep a single, documented preferred URL policy (host/protocol/trailing slash).
  • After releases, spot-check Oversearch AI Page Optimizer on critical templates.
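
One way to implement the first point is a small deploy-time script that fails the build when a critical URL is served with a noindex. A minimal sketch, assuming a urls.txt file that lists the pages to guard:

    #!/usr/bin/env bash
    # Fail the deploy if any critical URL carries a noindex directive.
    set -eu
    while read -r url; do
      if curl -sL "$url" | grep -qi '<meta[^>]*noindex' \
         || curl -sIL "$url" | grep -qi '^x-robots-tag:.*noindex'; then
        echo "FAIL: noindex detected on $url"
        exit 1
      fi
    done < urls.txt
    echo "OK: no noindex directives found on critical URLs"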

FAQ

Why is my CMS adding noindex automatically?

Many CMS platforms add noindex by default on staging environments, draft pages, or when a ‘Discourage search engines’ setting is enabled. Check your CMS SEO settings, environment config, and any SEO plugins. When in doubt, search your page source for ‘noindex’ and trace it to the template or plugin responsible.

How do I remove noindex safely on staging vs production?

Use environment-based configuration so noindex is only applied on non-production domains. Most frameworks and CMS platforms support environment variables for this. When in doubt, set noindex via an environment variable that defaults to true on staging and false on production.
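
As one illustration, a build step can emit a blanket noindex header only outside production. This sketch assumes a Netlify-style public/_headers file and a DEPLOY_ENV variable set by your CI; the names will differ in your stack:

    # Only non-production deploys get the site-wide noindex header.
    if [ "${DEPLOY_ENV:-}" != "production" ]; then
      printf '/*\n  X-Robots-Tag: noindex\n' >> public/_headers
    fi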

How long does it take for a page to get indexed after removing noindex?

Typically days to weeks, depending on crawl frequency. You can speed it up by requesting indexing in Google Search Console or submitting the URL via the Indexing API. When in doubt, request re-crawl in Search Console after removing the tag.

Can I noindex certain parameters but index the canonical page?

Yes. Set noindex on parameterized variants and ensure the canonical tag points to the clean URL. Search engines will index only the canonical version. When in doubt, canonicalize the parameter URLs to the clean version and let search engines handle deduplication.

Does noindex also remove the page from AI search results?

Yes. AI search tools that respect robots directives will not surface noindexed pages in their answers. If you want AI citations, ensure the page is indexable. When in doubt, remove noindex from any page you want cited by AI systems.

Can a noindex tag appear inside the body instead of the head?

Search engines only reliably read meta robots tags inside <head>. A noindex tag in the <body> may be ignored or inconsistently applied. When in doubt, always place meta robots in the <head> element.