Complete AI Crawler User Agent Reference
The tables below list the user agents of major AI crawlers, along with their purpose, real-time behavior, and available controls, as of 2026.
OpenAI Crawler User Agents
OpenAI operates multiple crawlers for different purposes.
| User Agent | Purpose | Real-Time | robots.txt Control | IP Verification |
|---|---|---|---|---|
| GPTBot | Model training and content indexing | No | Yes | Yes |
| ChatGPT-User | ChatGPT browsing functionality | Yes | Yes | Yes |
| GPTBot-Trainer | Training data collection | No | Yes | Yes |
GPTBot User Agent String:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
ChatGPT-User User Agent String:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
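As a rough illustration of how the strings above might be recognized in server logs, the sketch below searches a User-Agent header for the two tokens shown and extracts the version number. The function name `find_openai_token` and the token list are assumptions made for this example, and real strings may carry different version numbers.

```python
import re

# Tokens copied from the user agent strings above (illustrative, not exhaustive).
OPENAI_TOKENS = ("GPTBot", "ChatGPT-User")


def find_openai_token(user_agent: str):
    """Return (token, version) if a listed token appears as `Token/x.y`, else None."""
    for token in OPENAI_TOKENS:
        # Require the token to be followed directly by "/<version>".
        pattern = rf"\b{re.escape(token)}/(\d+(?:\.\d+)*)"
        match = re.search(pattern, user_agent, re.IGNORECASE)
        if match:
            return token, match.group(1)
    return None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.0; +https://openai.com/gptbot)")
    print(find_openai_token(ua))
```

User agent strings are trivially spoofed, so a match like this is best treated as a hint and combined with IP verification.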
OpenAI Crawler Behavior:
- Crawl Frequency: 2-4 times per month for most sites
- Request Rate: 1-3 requests per second
- JavaScript Support: Limited (basic execution only)
- Content Preference: Text-heavy, structured content
- Respects robots.txt: Yes
- IP Range Verification: Available at https://openai.com/gptbot-ranges
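IP range verification generally means checking a request's source address against the published CIDR blocks. The following is a minimal sketch of that check using Python's standard `ipaddress` module. The ranges in `EXAMPLE_RANGES` are placeholders from the RFC 5737 documentation space, not real OpenAI ranges, and `ip_in_ranges` is a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges).
# These are NOT real crawler ranges; load the published list instead.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def ip_in_ranges(ip: str, cidrs) -> bool:
    """Return True if `ip` falls inside any of the CIDR blocks in `cidrs`."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses can never be verified.
        return False
    return any(address in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    print(ip_in_ranges("192.0.2.10", EXAMPLE_RANGES))
    print(ip_in_ranges("203.0.113.5", EXAMPLE_RANGES))
```

In practice the published list would be fetched periodically and cached, since operators may change their ranges.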
Controlling OpenAI Crawlers via robots.txt:
# Allow all OpenAI crawlers
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
# Block OpenAI crawlers (alternative; use instead of the Allow rules above, not alongside them)
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
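Before deploying rules like these, it can help to check how a parser interprets them. The sketch below feeds the blocking variant to Python's standard `urllib.robotparser` and asks whether a given agent may fetch a URL. The helper name `is_allowed` and the `example.com` URL are assumptions, and each crawler's own parser may differ from Python's in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The blocking variant from the rules above.
BLOCK_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
        print(agent, is_allowed(BLOCK_RULES, agent, "https://example.com/page"))
```

Agents with no matching group and no `*` group are treated as allowed, which is why blocking one crawler leaves the others unaffected.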
Anthropic Crawler User Agents
Anthropic operates crawlers for Claude's real-time web browsing as well as for content indexing and general crawling.
| User Agent | Purpose | Real-Time | robots.txt Control | IP Verification |
|---|---|---|---|---|
| Claude-Web | Claude web browsing | Yes | Yes | Yes |
| ClaudeBot | Content indexing | No | Yes | Yes |
| Anthropic-AI | General crawling | No | Yes | Yes |
Claude-Web User Agent String:
Mozilla/5.0 (compatible; Claude-Web/1.0; +https://anthropic.com/claude-web)
ClaudeBot User Agent String:
Mozilla/5.0 (compatible; ClaudeBot/1.0; +https://anthropic.com/bot)
Anthropic Crawler Behavior:
- Crawl Frequency: Real-time during user queries
- Request Rate: 1-2 requests per second
- JavaScript Support: Moderate
- Content Preference: Fresh, authoritative content
- Respects robots.txt: Yes
- IP Range Verification: Available at https://anthropic.com/crawler-ip-ranges
Controlling Anthropic Crawlers via robots.txt:
# Allow Claude browsing
User-agent: Claude-Web
Allow: /
User-agent: ClaudeBot
Allow: /
# Block Anthropic crawlers (alternative; use instead of the Allow rules above, not alongside them)
User-agent: Claude-Web
Disallow: /
User-agent: ClaudeBot
Disallow: /
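Because the allow and block variants are alternatives, one way to avoid publishing both by accident is to generate the file from a single policy table. The sketch below is one possible approach: `build_robots_txt` is a hypothetical helper that emits exactly one group per user agent token.

```python
def build_robots_txt(policy: dict) -> str:
    """Build robots.txt text from a mapping of token -> True (allow) / False (block)."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Separate groups with a blank line and end with a trailing newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Example policy: permit real-time browsing, block indexing.
    print(build_robots_txt({"Claude-Web": True, "ClaudeBot": False}))
```

Keeping the policy in one dictionary means each token can only have one rule, so conflicting groups for the same agent cannot occur.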
Google AI Crawlers
Google's AI crawlers are integrated with traditional search crawling.
| User Agent | Purpose | Real-Time | robots.txt Control | IP Verification |
|---|---|---|---|---|
| Google-Extended | Gemini (formerly Bard) training | No | Yes | Via Google |
| Googlebot | General crawling + AI | No | Yes | Via Google |
| GoogleOther | Experimental AI features | Varies | Yes | Via Google |
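Google-Extended robots.txt Token:
Google-Extended is a robots.txt control token rather than a separate crawler. It has no HTTP user agent string of its own; the requests themselves are made with Google's existing crawler user agents.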
Google AI Crawler Behavior:
- Crawl Frequency: Daily to weekly for most sites
- Request Rate: Varies by site authority
- JavaScript Support: Excellent
- Content Preference: Comprehensive, structured content
- Respects robots.txt: Yes
- IP Range Verification: Via reverse DNS lookup or Google's published crawler IP ranges
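For Google crawlers, one commonly documented verification method is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below assumes that approach. The suffix list in `GOOGLE_SUFFIXES` should be confirmed against Google's current documentation, and `verify_crawler_ip` is a hypothetical helper whose resolver arguments are injectable so it can be tested offline.

```python
import socket

# Hostname suffixes assumed for Google crawlers; verify against current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def verify_crawler_ip(ip, suffixes=GOOGLE_SUFFIXES,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check (IPv4 only with the default resolvers)."""
    try:
        # Step 1: reverse lookup, IP -> hostname.
        host = reverse(ip)[0].lower().rstrip(".")
        # Step 2: the hostname must belong to an expected domain.
        if not host.endswith(suffixes):
            return False
        # Step 3: forward lookup must include the original IP.
        return ip in forward(host)[2]
    except OSError:
        # Covers socket.herror and socket.gaierror (lookup failures).
        return False


if __name__ == "__main__":
    # Fake resolvers for an offline demonstration; the hostname is made up.
    fake_reverse = lambda ip: ("crawl-192-0-2-1.googlebot.com", [], [ip])
    fake_forward = lambda host: (host, [], ["192.0.2.1"])
    print(verify_crawler_ip("192.0.2.1", reverse=fake_reverse, forward=fake_forward))
```

The forward step matters because anyone who controls reverse DNS for their own IP space can make it claim any hostname.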
Controlling Google AI Crawlers via robots.txt:
# Allow Google AI training
User-agent: Google-Extended
Allow: /
# Block Google AI training (alternative to the rule above; traditional search is unaffected)
User-agent: Google-Extended
Disallow: /
# Allow all Google crawling
User-agent: Googlebot
Allow: /
Perplexity AI Crawlers
Perplexity operates aggressive real-time crawling for answer generation.
| User Agent | Purpose | Real-Time | robots.txt Control | IP Verification |
|---|---|---|---|---|
| PerplexityBot | Real-time search | Yes | Yes | Yes |
| Perplexity-Search | Search indexing | Yes | Yes | Yes |
PerplexityBot User Agent String:
Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/bot)
PerplexityBot Behavior:
- Crawl Frequency: Real-time during queries
- Request Rate: 2-5 requests per second
- JavaScript Support: Moderate to good
- Content Preference: Fresh, specific answers
- Respects robots.txt: Yes
- IP Range Verification: Available at https://perplexity.ai/crawler-info
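To check whether a crawler's observed traffic matches a request rate like the one listed above, log timestamps can be run through a sliding one-second window. The sketch below is one way to do that. `RateMonitor` and `exceeds_rate` are hypothetical names, and timestamps are assumed to be seconds as floats for a single user agent.

```python
from collections import deque


class RateMonitor:
    """Sliding-window request counter for a single crawler."""

    def __init__(self, window_seconds: float = 1.0):
        self.window = window_seconds
        self.timestamps = deque()

    def record(self, timestamp: float) -> int:
        """Record one request and return the count inside the current window."""
        self.timestamps.append(timestamp)
        cutoff = timestamp - self.window
        # Drop requests that are a full window old or older.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


def exceeds_rate(timestamps, max_per_second: int) -> bool:
    """Return True if any one-second window holds more than `max_per_second` requests."""
    monitor = RateMonitor(1.0)
    return any(monitor.record(t) > max_per_second for t in sorted(timestamps))


if __name__ == "__main__":
    burst = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    print(exceeds_rate(burst, max_per_second=5))
```

A site that sees sustained rates above the listed range could use a check like this to decide when to throttle or investigate spoofed traffic.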
Controlling Perplexity via robots.txt:
# Allow Perplexity crawling
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-Search
Allow: /
# Block Perplexity (alternative; use instead of the Allow rules above, not alongside them)
User-agent: PerplexityBot
Disallow: /
Microsoft/Bing AI Crawlers
Microsoft's AI crawlers power Copilot (formerly Bing Chat).
| User Agent | Purpose | Real-Time | robots.txt Control | IP Verification |
|---|---|---|---|---|
| Bingbot | Search + AI training | No | Yes | Via Bing |
| Copilot-Bot | Copilot-specific features | Partial | Yes | Via Bing |
Bingbot User Agent String:
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Microsoft AI Crawler Behavior:
- Crawl Frequency: Weekly to monthly
- Request Rate: Varies by site
- JavaScript Support: Excellent
- Content Preference: Diverse content types
- Respects robots.txt: Yes
- IP Range Verification: Via Bing Webmaster Tools
Other Major AI Crawlers
Additional AI platforms operate crawlers for various purposes.
| Platform | User Agent | Purpose | Real-Time | robots.txt Control |
|---|---|---|---|---|
| Common Crawl | CCBot | Open dataset creation | No | Yes |
| Apple | Applebot-Extended | Apple Intelligence training | No | Yes |
| Meta | Meta-ExternalAgent | AI model training | No | Yes |
| Amazon | Amazonbot | Alexa shopping features | Partial | Yes |
| You.com | YouBot | Search AI | Yes | Yes |
| Brave | BraveSearchBot | Leo AI answers | Yes | Yes |
Common Crawl CCBot:
CCBot/2.0 (https://commoncrawl.org/faq/)
Applebot-Extended:
Applebot-Extended is a robots.txt control token and does not crawl pages itself. Requests come from Applebot, so no separate user agent string is listed here.
Meta-ExternalAgent:
Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; +https://developers.facebook.com/doc/)
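The tokens in this reference can be combined into a simple lookup for log analysis. The sketch below maps a subset of the tokens from the tables above to their platforms and does a case-insensitive substring match. `AI_CRAWLER_TOKENS` and `classify` are names invented for this example, and the mapping is only as accurate as the tables it is copied from.

```python
# Subset of tokens from the tables above, mapped to their platforms.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "ChatGPT-User": "OpenAI",
    "Claude-Web": "Anthropic",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "CCBot": "Common Crawl",
    "Meta-ExternalAgent": "Meta",
    "Amazonbot": "Amazon",
    "YouBot": "You.com",
    "BraveSearchBot": "Brave",
}


def classify(user_agent: str):
    """Return the platform for the first matching token, or None if no token matches."""
    ua = user_agent.lower()
    # Check longer tokens first so the most specific name wins.
    for token in sorted(AI_CRAWLER_TOKENS, key=len, reverse=True):
        if token.lower() in ua:
            return AI_CRAWLER_TOKENS[token]
    return None


if __name__ == "__main__":
    print(classify("CCBot/2.0 (https://commoncrawl.org/faq/)"))
```

Substring matching is deliberately loose; pairing it with IP verification helps guard against spoofed user agents.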