Free Tool

Robots.txt Generator for Shopify

Create a production-ready robots.txt file to control how search engines and AI bots crawl your Shopify store. Block sensitive paths, manage AI crawlers, and point to your sitemap — all in one click.

Generate Your robots.txt File

Configure your crawling rules below and generate a production-ready robots.txt file for your Shopify store.

  • Shopify admin panel
  • Checkout pages
  • Shopping cart page
  • Internal search results
  • Customer account pages
  • Crawl delay (seconds between requests)
  • GPTBot (ChatGPT training crawler)
  • Google-Extended (Gemini AI training crawler)
  • CCBot (used to train many LLMs)
  • anthropic-ai (Claude training crawler)
  • Bytespider (TikTok / ByteDance crawler)

What is a robots.txt File?

A robots.txt file is a plain text file placed at the root of your website (e.g. yourstore.com/robots.txt) that tells search engine crawlers and other bots which parts of your site they can and cannot access. It follows the Robots Exclusion Protocol — a standard every major search engine respects.

For Shopify stores, a well-configured robots.txt prevents crawlers from wasting their budget on admin panels, checkout flows, and duplicate filter pages — keeping them focused on the pages that actually drive traffic and sales.

The Standard Format

```txt
User-agent: *
Allow: /
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /orders/
Disallow: /account
Disallow: /*?q=*
Disallow: /*sort_by*

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Sitemap: https://yourstore.com/sitemap.xml
Host: https://yourstore.com
```

Each User-agent line targets a specific crawler (or all crawlers with *). Allow and Disallow directives control access to specific URL paths. The Sitemap directive tells crawlers where your XML sitemap lives.
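You can check how a compliant crawler interprets these directives with Python's standard-library robots.txt parser (the store URL below is a placeholder):

```python
from urllib import robotparser

# A minimal slice of the rules above. Wildcard patterns like /*?q=*
# are omitted because urllib's parser does not support them, and the
# Disallow lines come before Allow: / because this parser applies the
# first rule whose path matches the URL.
rules = """\
User-agent: *
Disallow: /admin
Disallow: /cart
Allow: /

User-agent: GPTBot
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Regular crawlers may fetch product pages but not /admin.
print(parser.can_fetch("*", "https://yourstore.com/products/widget"))  # True
print(parser.can_fetch("*", "https://yourstore.com/admin"))            # False

# GPTBot is blocked from the entire site.
print(parser.can_fetch("GPTBot", "https://yourstore.com/products/widget"))  # False
```

Note that major search engines use longest-match precedence rather than first-match, so in a real robots.txt the order of Allow and Disallow lines matters less than it does here.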

Why Your Shopify Store Needs an Optimized robots.txt

Shopify generates a default robots.txt, but it's generic. An optimized file tuned to your store prevents crawl budget waste, keeps sensitive pages out of search indexes, and gives you control over AI training crawlers.

🕷️

Preserve Crawl Budget

Google allocates a crawl budget to every site. Blocking low-value pages (cart, search, filters) means more budget for your product and collection pages.

🔒

Protect Sensitive Pages

Keep admin panels, checkout flows, and customer account pages out of search results where they don't belong.

🤖

Control AI Crawlers

Decide whether OpenAI (GPTBot), Google (Google-Extended), Anthropic, and other AI companies can use your content for training.

📊

Eliminate Duplicate Content

Shopify generates duplicate URLs for sorted and filtered collections. Blocking query parameters prevents thin content issues.

Shopify-Specific Paths You Should Block

  • /admin — Shopify admin panel. Never needs indexing.
  • /cart — Shopping cart page. Unique to each visitor.
  • /checkout — Checkout flow. Private and session-specific.
  • /account — Customer login and account pages.
  • /orders/ — Order confirmation pages with private data.
  • /*?q=* — Internal search result pages that create thin content.
  • /*sort_by* — Sorted collection variants that duplicate your main collection pages.
  • /collections/*+* — Shopify's combined tag filtering URLs.
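
As robots.txt directives, the list above becomes:

```txt
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /orders/
Disallow: /*?q=*
Disallow: /*sort_by*
Disallow: /collections/*+*
```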

Managing AI Crawlers in 2026

AI companies send crawlers to scrape website content for training large language models. While some merchants welcome this exposure, others prefer to keep their content, product descriptions, and brand copy out of AI training data. Your robots.txt gives you that choice.

Known AI Crawlers

  • GPTBot — OpenAI's crawler for ChatGPT training data. Blocking this prevents your content from being used in ChatGPT's training.
  • Google-Extended — Google's crawler specifically for Gemini AI training. Separate from Googlebot (search indexing).
  • anthropic-ai — Anthropic's crawler for Claude model training.
  • CCBot — Common Crawl's bot, whose datasets are used to train many open-source LLMs.
  • Bytespider — ByteDance's crawler, associated with TikTok and their AI products.
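
To opt out of all five crawlers at once, add a block like this to your robots.txt:

```txt
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
```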

Blocking AI crawlers does not affect your Google search rankings. Googlebot (for Search) and Google-Extended (for Gemini AI) are separate user agents.

How to Add robots.txt to Your Shopify Store

Option 1: Shopify's Built-in robots.txt

Since mid-2021, Shopify has let stores customize their robots.txt by creating a robots.txt.liquid template in the theme. Go to Online Store → Themes → Edit Code and add (or open) robots.txt.liquid in the Templates folder.

Option 2: Shopify Apps

Several SEO apps on the Shopify App Store let you edit your robots.txt through a visual interface without touching code. Apps like Smart SEO and BOOSTER SEO include robots.txt management as part of their feature set.

Step-by-Step

  1. Use the generator above to create your customized robots.txt
  2. Copy the output or download the file
  3. In Shopify admin, go to Online Store → Themes
  4. Click Edit Code on your active theme
  5. Find or create robots.txt.liquid in the Templates folder
  6. Paste your generated robots.txt content
  7. Verify at yourstore.com/robots.txt
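
For step 6, keep in mind that robots.txt.liquid is a Liquid template, not a plain text file. A common pattern is to keep Shopify's default rules and append your custom blocks after them. Here is a minimal sketch using Shopify's documented robots.default_groups object (verify against Shopify's current docs before shipping):

```liquid
{% for group in robots.default_groups %}
  {{- group.user_agent }}
  {%- for rule in group.rules %}
    {{ rule }}
  {%- endfor %}
  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Keeping the default loop means Shopify's own protections (admin, checkout, and so on) stay in place even as the platform updates them.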

Frequently Asked Questions

Does Shopify have a default robots.txt?
Yes. Shopify auto-generates a robots.txt that blocks basic paths like /admin and /checkout. However, it doesn't block AI crawlers or Shopify-specific duplicate URL patterns like sorted collections. Customizing it gives you much better control.
Will blocking pages hurt my SEO?
Only if you block pages that should be indexed (like product or collection pages). Blocking admin, cart, checkout, and duplicate filter pages actually helps SEO by preserving crawl budget for your important content.
Does robots.txt guarantee pages won't appear in Google?
No. robots.txt prevents crawling, not indexing. If other sites link to a blocked page, Google may still show the URL in results (without a snippet). For reliable removal, use a noindex meta tag instead, and leave the page crawlable so Google can see the tag.
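For reference, the tag goes in the page's head element:

```html
<meta name="robots" content="noindex">
```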
Should I block AI crawlers or let them access my store?
It depends on your goals. Blocking AI crawlers prevents your content from being used in AI training data. However, allowing them can increase your brand's visibility in AI-powered search results and chatbot recommendations. Many merchants block training crawlers but keep an llms.txt file for AI search visibility.
What is crawl-delay and should I use it?
Crawl-delay tells bots to wait a specified number of seconds between requests. It reduces server load but slows down indexing. Most Shopify stores don't need it since Shopify's infrastructure handles high traffic well. Only add it if you notice bots causing performance issues.
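If you do add one, the directive looks like this (note that Googlebot ignores Crawl-delay, while Bing and Yandex respect it):

```txt
User-agent: *
Crawl-delay: 10
```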
How often should I update my robots.txt?
Update it when you add new sections to your site, change your URL structure, or want to modify AI crawler access. For most Shopify stores, setting it up once and reviewing annually is sufficient.