Meta Robots Not Blocking
TL;DR
Your page is being prevented from appearing in search results because an indexing-blocking directive is present. Remove the noindex directive (in HTML meta robots or the X-Robots-Tag header), then validate that the page is indexable. Run an Oversearch AI Page Optimizer scan to confirm the fix.
Why this matters
Access and crawlability are prerequisites. If crawlers can’t fetch or parse your content, rankings and citations become unreliable, and LLMs may fail to extract answers.
Where this shows up in Oversearch
In Oversearch, open AI Page Optimizer and run a scan for the affected page. Then open Benchmark Breakdown to see evidence, and use the View guide link to jump back here when needed.
What’s the difference between meta robots and robots.txt?
Robots.txt controls whether crawlers can fetch a page. Meta robots (or X-Robots-Tag) controls whether a fetched page is indexed or its links are followed.
They operate at different stages of the crawl pipeline. Robots.txt is checked before fetching. Meta robots is processed after fetching the HTML. This distinction matters: if robots.txt blocks a page, the crawler never sees the meta robots tag.
- robots.txt: “Don’t fetch this URL” — prevents crawling.
- meta robots: “Don’t index this page” or “Don’t follow links” — prevents indexing.
- Never use both together: if robots.txt blocks crawling, the noindex tag is never seen.
- For de-indexing, use meta robots noindex and allow crawling.
If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see which directives were detected.
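The two stages can be seen in a short script. Below is a minimal Python sketch, assuming the third-party requests library and a placeholder example.com URL; it is illustrative, not an exhaustive check.

```python
from urllib.robotparser import RobotFileParser
import re
import requests  # third-party; pip install requests

url = "https://example.com/some-page"  # placeholder: the page you are checking

# Stage 1: robots.txt is consulted before any fetch happens.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

if not rp.can_fetch("Googlebot", url):
    print("robots.txt blocks crawling: a noindex on the page would never be seen.")
else:
    # Stage 2: meta robots only exists inside the fetched HTML.
    html = requests.get(url, timeout=10).text
    tag = re.search(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I)
    print("meta robots tag:", tag.group(0) if tag else "none found")
```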
Which meta robots directives block indexing?
The noindex directive prevents the page from appearing in search results. none is equivalent to noindex, nofollow.
There are several meta robots values, but only a few block indexing:
- noindex: Page will not be indexed.
- none: Equivalent to noindex, nofollow.
- nofollow: Links on the page are not followed (does not block indexing).
- noarchive: Prevents a cached version (does not block indexing).
- nosnippet: Prevents snippets (does not block indexing).
- Multiple directives can be combined: noindex, nofollow.
If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to see the exact directives on your page.
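As a rough illustration, here is a small Python sketch that classifies a robots content value; the parsing is deliberately simplified (comma-separated directives only).

```python
# Only these directives block indexing; "none" is shorthand for "noindex, nofollow".
INDEX_BLOCKING = {"noindex", "none"}

def blocks_indexing(robots_value: str) -> bool:
    directives = {d.strip().lower() for d in robots_value.split(",")}
    return bool(directives & INDEX_BLOCKING)

print(blocks_indexing("noindex, nofollow"))   # True
print(blocks_indexing("nofollow, noarchive")) # False: link/caching hints only
print(blocks_indexing("nosnippet"))           # False: hides snippets, still indexable
```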
Why is my page indexed but not showing in search results?
The page may be indexed but suppressed due to low quality signals, a manual action, or a nosnippet/max-snippet:0 directive that hides it from results.
Being indexed and appearing in search results are different. Google may index a page but rank it so low it never appears, or specific directives can prevent it from showing snippets.
- Check Google Search Console → URL Inspection for indexing status.
- Check for manual actions in Search Console → Security & Manual Actions.
- Check for nosnippet or max-snippet:0 in meta robots (see the sketch below).
- The page may simply rank too low for any query; improve content quality.
If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to check all detected robots directives.
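To test for the snippet-related directives mentioned above, here is a minimal Python sketch; it assumes you have already extracted the content value of the meta robots tag (or the X-Robots-Tag header).

```python
def suppresses_snippets(robots_value: str) -> bool:
    # nosnippet and max-snippet:0 allow indexing but hide the result's snippet.
    directives = [d.strip().lower() for d in robots_value.split(",")]
    return "nosnippet" in directives or "max-snippet:0" in directives

print(suppresses_snippets("index, follow, max-snippet:0"))  # True
print(suppresses_snippets("index, follow"))                 # False
```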
How do I set robots directives per page in my CMS?
Most CMS platforms have per-page SEO settings where you can set the robots meta tag value. Check your page editor’s SEO section or the SEO plugin settings.
- WordPress + Yoast/RankMath: Edit page → SEO tab → Advanced → Robots.
- Shopify: Requires editing the theme template or using an SEO app.
- Webflow: Page settings → SEO → “Disable page indexing” checkbox.
- Custom CMS: Add a meta robots field to your content model and render it in the template (see the sketch below).
- Always verify the output by checking the page source after saving.
If you use Oversearch, open AI Page Optimizer → Benchmark Breakdown to confirm the directives match your intent.
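For a custom CMS, the rendering step might look like the Python sketch below. The robots field name and the default value are assumptions; adapt them to your content model.

```python
def render_robots_tag(page: dict) -> str:
    value = (page.get("robots") or "").strip().lower()
    if not value or value == "index, follow":
        return ""  # defaults are indexable, so no tag is emitted at all
    return f'<meta name="robots" content="{value}">'

print(render_robots_tag({"title": "Landing page"}))
# -> "" (indexable by default)
print(render_robots_tag({"title": "Thank-you page", "robots": "noindex"}))
# -> <meta name="robots" content="noindex">
```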
Common root causes
- noindex set in a template/CMS setting for a content type.
- X-Robots-Tag: noindex set by the server/CDN/security layer.
- Multiple directives conflicting (meta says index, header says noindex).
- Noindex applied to canonical pages while duplicates remain indexable.
How to detect
- In Oversearch AI Page Optimizer, open the scan for this URL and review the Benchmark Breakdown evidence.
- Verify the signal outside Oversearch with at least one method: fetch the HTML with curl -L, check response headers, or use a crawler/URL inspection.
- Confirm you’re testing the exact canonical URL (final URL after redirects), not a variant.
- Check both the HTML (<meta name="robots">) and HTTP headers (X-Robots-Tag) for indexing directives, as sketched below.
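These checks can be scripted. Here is a minimal detection sketch in Python, assuming the third-party requests library and a placeholder URL; the regex is simplified and assumes the name attribute appears before content.

```python
import re
import requests  # third-party; pip install requests

url = "https://example.com/affected-page"  # placeholder
resp = requests.get(url, allow_redirects=True, timeout=10)  # follows redirects like curl -L

print("final URL:", resp.url)             # confirm you tested the canonical/final URL
print("status:", resp.status_code)
print("X-Robots-Tag:", resp.headers.get("X-Robots-Tag", "not set"))

meta_values = re.findall(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
    resp.text, re.I)
print("meta robots:", meta_values or "none found")
```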
How to fix
Understand the difference between robots.txt and meta robots (see: What’s the difference between meta robots and robots.txt?) and identify which directives are active (see: Which meta robots directives block indexing?). Then follow the steps below.
- Locate the directive source: HTML meta robots vs the X-Robots-Tag header.
- Remove noindex from pages that should be indexed (or remove the directive entirely if not needed).
- Check templates/CMS settings so it doesn’t re-apply on publish.
- Ensure canonical pages are indexable and duplicates use canonical/redirects appropriately.
- Validate with URL inspection/crawler, then run an Oversearch AI Page Optimizer scan.
Implementation notes
- CDN: search for header rules adding X-Robots-Tag.
- WordPress: check per-post noindex toggles and the global ‘Discourage search engines from indexing this site’ setting.
- Framework apps: check server/middleware header configuration (see the sketch below).
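As an illustration of the framework case, here is a hypothetical Flask sketch; the path prefixes are assumptions. The point is to scope the header narrowly rather than attach it to every response.

```python
from flask import Flask, request

app = Flask(__name__)
PRIVATE_PREFIXES = ("/internal/", "/drafts/")  # assumed paths that should stay unindexed

@app.after_request
def robots_header(response):
    # A blanket "X-Robots-Tag: noindex" added here for every response is a common
    # cause of this issue; only set it for paths that genuinely should not be indexed.
    if request.path.startswith(PRIVATE_PREFIXES):
        response.headers["X-Robots-Tag"] = "noindex"
    return response
```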
Verify the fix
- Run an Oversearch AI Page Optimizer scan for the same URL and confirm the benchmark is now passing.
- Confirm the page is 200 OK and the primary content is present in initial HTML.
- Validate with an external tool (crawler, URL inspection, Lighthouse) to avoid false positives.
- Confirm there is no noindex in HTML meta robots or X-Robots-Tag headers.
Prevention
- Add automated checks for robots/noindex/canonical on deploy (see the sketch after this list).
- Keep a single, documented preferred URL policy (host/protocol/trailing slash).
- After releases, spot-check Oversearch AI Page Optimizer on critical templates.
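A deploy-time check could look like the Python sketch below, assuming the third-party requests library and a placeholder list of critical URLs; it exits non-zero so a CI pipeline fails when a blocking directive slips through.

```python
import re
import sys
import requests  # third-party; pip install requests

CRITICAL_URLS = [
    "https://example.com/",
    "https://example.com/pricing",
]  # placeholders: your critical templates

def is_blocked(url: str) -> bool:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    header = resp.headers.get("X-Robots-Tag", "")
    metas = re.findall(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        resp.text, re.I)
    directives = {d.strip().lower() for v in [header, *metas] for d in v.split(",")}
    return bool(directives & {"noindex", "none"})

failed = [u for u in CRITICAL_URLS if is_blocked(u)]
if failed:
    print("Indexing blocked on:", failed)
    sys.exit(1)
print("All critical URLs are indexable.")
```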
FAQ
How do I check if robots.txt is blocking my page?
Open /robots.txt and check for Disallow rules matching your URL path. Use Google Search Console’s robots.txt Tester for exact matching. When in doubt, test the specific URL in the Tester tool.
Can I block some bots but allow Google and Bing?
Yes. Create separate User-agent sections in robots.txt with specific Allow/Disallow rules. Googlebot and Bingbot have their own user-agent strings. When in doubt, add explicit Allow rules for Googlebot and Bingbot.
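Here is a sketch of such rules, evaluated with Python's standard-library robots.txt parser; the rules and URLs are examples only.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    print(agent, "allowed:", rp.can_fetch(agent, "https://example.com/page"))
# Googlebot and Bingbot are allowed; anything else falls under the wildcard Disallow.
```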
Do AI crawlers follow robots.txt?
Most major AI crawlers (GPTBot, OAI-SearchBot, anthropic-ai, PerplexityBot) respect robots.txt. Add explicit rules for their user-agents if you want to allow or block them specifically. When in doubt, check your robots.txt for blanket Disallow rules that might block AI crawlers.
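The same standard-library parser can be pointed at your live robots.txt to test the AI crawler user-agents named above; the domain is a placeholder.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder domain
rp.read()

for agent in ("GPTBot", "OAI-SearchBot", "anthropic-ai", "PerplexityBot"):
    print(agent, "may crawl:", rp.can_fetch(agent, "https://example.com/"))
```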
Can ‘nofollow’ affect crawling and discovery?
Yes. Nofollow on links tells crawlers not to follow or pass ranking signals through those links. This can prevent discovery of linked pages. Use nofollow only on untrusted links. When in doubt, do not nofollow your own internal links.
How do I test robots.txt rules quickly?
Use Google Search Console’s robots.txt Tester or an online validator. Paste your rules and test specific URLs against them. When in doubt, test your most important pages first.
Can meta robots and robots.txt conflict?
Yes. If robots.txt blocks crawling, the crawler never sees the meta robots tag. Never use robots.txt Disallow and noindex together on the same URL — the noindex will be ignored. When in doubt, allow crawling and use meta robots to control indexing.