robots.txt for AI crawlers — a practical guide
Allow or block ChatGPT, Claude, Perplexity, Gemini, and other AI bots from your site.
Updated 5/4/2026
What is robots.txt?
robots.txt is a plain-text file at the root of your site that tells crawlers which URLs they may and may not fetch. Rules are grouped by User-agent: each group names one or more crawlers, followed by Allow and Disallow lines that list URL paths.
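The sketch below shows the rule format in action. It feeds a small, made-up robots.txt to Python's standard-library parser, urllib.robotparser, and asks which fetches the file permits. The file contents and the example.com URLs are illustrative assumptions.

One caveat applies. This parser uses the first rule in a group that matches a path. RFC 9309 and Google instead use the most specific (longest) match, so the two can disagree when Allow and Disallow lines overlap. The file below avoids that case.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: GPTBot is kept out of /private/, every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts an iterable of lines

checks = [
    ("GPTBot", "https://example.com/private/notes"),   # matched by the GPTBot group
    ("GPTBot", "https://example.com/blog/"),           # GPTBot group, but no rule matches the path
    ("Bingbot", "https://example.com/private/notes"),  # no named group, so the * group applies
]
for agent, url in checks:
    print(agent, url, "allowed" if parser.can_fetch(agent, url) else "blocked")
```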
AI vs search crawlers
AI platforms ship their own crawlers, separate from Googlebot or Bingbot. The big ones today:
- OAI-SearchBot — ChatGPT search
- ChatGPT-User — ChatGPT browsing mode
- GPTBot — OpenAI training data
- PerplexityBot — Perplexity AI
- ClaudeBot — Anthropic Claude
- Google-Extended — Gemini training data
- Applebot-Extended — Apple AI / Siri
- Bytespider — ByteDance / TikTok
- Meta-ExternalAgent — Meta AI
If you want to appear in AI answers, you generally want these allowed, especially the search and browsing agents (OAI-SearchBot, ChatGPT-User, PerplexityBot).
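The Python sketch below builds a single group that allows every crawler in the list above. It then checks the result with the standard-library urllib.robotparser. The helper name allow_group and the example.com URL are hypothetical. The agent tokens come from the list. Several User-agent lines can share one group, so one Allow: / line covers all of them.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the list above.
AI_AGENTS = [
    "OAI-SearchBot", "ChatGPT-User", "GPTBot", "PerplexityBot", "ClaudeBot",
    "Google-Extended", "Applebot-Extended", "Bytespider", "Meta-ExternalAgent",
]


def allow_group(agents: list[str]) -> str:
    """Hypothetical helper: one robots.txt group letting every listed agent fetch any path."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


robots_txt = allow_group(AI_AGENTS)
print(robots_txt)

# Confirm with the standard-library parser that no listed agent is blocked.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
blocked = [a for a in AI_AGENTS if not parser.can_fetch(a, "https://example.com/pricing")]
print("blocked:", blocked)
```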
Common mistakes
- Blocking everything by default. A blanket "Disallow: /" for AI bots silently removes you from AI search.
- Forgetting Google-Extended. This token controls Gemini's training and is separate from Googlebot.
- Putting robots.txt in the wrong place. It must live at the root: https://yoursite.com/robots.txt.
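The Python sketch below reproduces the three mistakes above using urllib.parse and urllib.robotparser from the standard library. The helper names robots_url and check are hypothetical, and so are the page paths under yoursite.com. The results show how Python's parser matches groups. A crawler follows the group that names it and falls back to the * group only when no group does.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def check(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots_txt and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def robots_url(page_url: str) -> str:
    """Hypothetical helper: the one URL crawlers request, /robots.txt at the host root."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


page = "https://yoursite.com/blog/ai-crawlers?ref=nav"

# Mistake 1: a blanket "Disallow: /" shuts the named AI bot out of every page.
blanket = "User-agent: OAI-SearchBot\nDisallow: /\n"
print(check(blanket, "OAI-SearchBot", page))  # False

# Mistake 2: rules written for Googlebot do not apply to the Google-Extended token.
googlebot_only = "User-agent: Googlebot\nDisallow: /drafts/\n"
draft = "https://yoursite.com/drafts/post"
print(check(googlebot_only, "Googlebot", draft))        # False
print(check(googlebot_only, "Google-Extended", draft))  # True: no group names it, so no rule applies

# Mistake 3: only the root location counts, whatever page or subfolder you start from.
print(robots_url(page))                                    # https://yoursite.com/robots.txt
print(robots_url("https://yoursite.com/blog/robots.txt"))  # https://yoursite.com/robots.txt
```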
Use the free generator
Our robots.txt Generator gives you per-bot toggles for every major AI and search crawler, lets you append a sitemap URL and custom rules, and outputs a valid file.
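The rough Python sketch below shows what a file like this can look like. It covers per-bot toggles, an optional sitemap URL, and free-form custom rules appended to the output. It is a hypothetical illustration, not the generator's actual code. The name build_robots_txt and its parameters are invented. The output is sanity-checked with the standard-library urllib.robotparser.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def build_robots_txt(toggles: dict[str, bool], sitemap: Optional[str] = None,
                     custom_rules: str = "") -> str:
    """Hypothetical generator: one group per bot, Allow: / if toggled on, Disallow: / if off."""
    groups = []
    for agent, allowed in toggles.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    if custom_rules.strip():
        groups.append(custom_rules.strip())  # appended verbatim as its own group(s)
    text = "\n\n".join(groups) + "\n"  # blank lines separate groups
    if sitemap:
        text += f"\nSitemap: {sitemap}\n"  # Sitemap lines are not tied to any group
    return text


robots_txt = build_robots_txt(
    {"OAI-SearchBot": True, "PerplexityBot": True, "GPTBot": False},
    sitemap="https://yoursite.com/sitemap.xml",
    custom_rules="User-agent: *\nDisallow: /admin/",
)
print(robots_txt)

# Sanity-check the output with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("OAI-SearchBot", "https://yoursite.com/"))  # True
print(parser.can_fetch("GPTBot", "https://yoursite.com/"))         # False
print(parser.site_maps())  # ['https://yoursite.com/sitemap.xml']
```

Each toggled bot gets its own group, so a bot switched on here is not bound by the "Disallow: /admin/" line in the custom * group. If a path rule should apply to that bot, add it to the bot's own group as well.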