The Great GPT Firewall
This is a curated list of websites that use their robots.txt file to restrict access by AI agents, AI crawlers and GPTs.
It will be updated monthly.

User agents & robots.txt
The robots.txt file lets website owners limit the access of crawlers, each identified by its user agent, to certain areas of their website by specifying rules and directives. The following rules target AI-related user agents:
# OpenAI’s web crawler: GPT3.5, GPT4, ChatGPT
# https://platform.openai.com/docs/bots
User-agent: GPTBot
# ChatGPT plugins
# https://platform.openai.com/docs/bots
User-agent: ChatGPT-User
# OpenAI Search bot
# https://platform.openai.com/docs/bots
User-agent: OAI-SearchBot
# Google's web crawler: Bard, VertexAI, Gemini
# https://blog.google/technology/ai/an-update-on-web-publisher-controls/
User-agent: Google-Extended
# Apple's web crawler, dedicated to GenAI projects
# https://support.apple.com/en-us/119829
User-agent: Applebot-Extended
# Claude
User-agent: anthropic-ai
# Claude Bot
User-agent: ClaudeBot
# Claude web
User-agent: Claude-Web
# Cohere
User-agent: Cohere-ai
# Perplexity
User-agent: PerplexityBot
# Common Crawl
# https://commoncrawl.org/ccbot
User-agent: CCBot
# Omgilibot: webz.io
# https://webz.io/blog/web-data/what-is-the-omgili-bot-and-why-is-it-crawling-your-website/
User-agent: Omgilibot
User-agent: Omgili
User-agent: Webzio-Extended
# Facebook: Llama
# https://developers.facebook.com/docs/sharing/bot/
User-agent: FacebookBot
# ByteDance: Doubao
User-agent: Bytespider
# Censorship area
Disallow: /
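As an illustration of how such rules can be checked, here is a minimal Python sketch built on the standard library's `urllib.robotparser`. The helper names (`blocked_agents`, `classify`) and the criterion "blocked as soon as one listed agent is disallowed at `/`" are assumptions made for this example; they are not necessarily what `scrape.py` does.

```python
from urllib.robotparser import RobotFileParser

# User agents taken from the robots.txt rules listed above.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "Google-Extended",
    "Applebot-Extended", "anthropic-ai", "ClaudeBot", "Claude-Web",
    "Cohere-ai", "PerplexityBot", "CCBot", "Omgilibot", "Omgili",
    "Webzio-Extended", "FacebookBot", "Bytespider",
]


def blocked_agents(robots_txt, agents=AI_USER_AGENTS, path="/"):
    """Return the agents that the given robots.txt content disallows at `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


def classify(robots_txt):
    """Hypothetical rule: a site is 'blocked' if any listed agent is disallowed."""
    return "blocked" if blocked_agents(robots_txt) else "passing"


if __name__ == "__main__":
    sample = (
        "User-agent: GPTBot\n"
        "User-agent: CCBot\n"
        "Disallow: /\n"
        "\n"
        "User-agent: *\n"
        "Allow: /\n"
    )
    print(blocked_agents(sample))  # ['GPTBot', 'CCBot']
    print(classify(sample))        # blocked
```

Note that robots.txt is advisory: this only reports what a site asks crawlers to do, not whether a crawler complies.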
Disclaimer
Please note that this blocklist is intended for informational purposes only. Despite the provocative project name, it's fine to disallow web crawling and protect content ownership.
2024-05 update
Category: Press
- Scanned: 66
- ✅ Passing: 38 %
- Blocked: 62 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| The Times | ?? | Blocked |
| BBC | ?? | Blocked |
| The Guardian | ?? | Blocked |
| The Economist | ?? | Blocked |
| Financial Times | ?? | Blocked |
| The Independent | ?? | ✅ |
| The Telegraph | ?? | Blocked |
| Daily Mail | ?? | Blocked |
| The Sun | ?? | Blocked |
| Daily Mirror | ?? | Blocked |
| Daily Express | ?? | Blocked |
| Washington Post | ?? | Blocked |
| USA Today | ?? | ✅ |
| Fox News | ?? | ✅ |
| ABC News | ?? | Blocked |
| NBC News | ?? | Blocked |
| CBS News | ?? | Blocked |
| Los Angeles Times | ?? | Blocked |
| Chicago Tribune | ?? | ✅ |
| New York Post | ?? | Blocked |
| New York Daily News | ?? | ✅ |
| The New Yorker | ?? | Blocked |
| Vice | ?? | ✅ |
| New York Times | ?? | Blocked |
| Wall Street Journal | ?? | Blocked |
| CNN | ?? | Blocked |
| El País | ?? | ✅ |
| Süddeutsche Zeitung | ?? | Blocked |
| Der Spiegel | ?? | Blocked |
| Corriere della Sera | ?? | Blocked |
| La Repubblica | ?? | Blocked |
| Le Monde | ?? | Blocked |
| Libération | ?? | Blocked |
| Le Figaro | ?? | Blocked |
| 20 Minutes | ?? | Blocked |
| Ouest France | ?? | Blocked |
| Le Parisien | ?? | Blocked |
| L'Equipe | ?? | Blocked |
| Le Point | ?? | Blocked |
| Marianne | ?? | Blocked |
| Le Nouvel Observateur | ?? | Blocked |
| L'Express | ?? | Blocked |
| France 24 | ?? | Blocked |
| BFMTV | ?? | Blocked |
| CNews | ?? | ✅ |
| Le Monde Diplomatique | ?? | ✅ |
| Mediapart | ?? | Blocked |
| Courrier International | ?? | Blocked |
| Brut | ?? | ✅ |
| IMDb | ? | ✅ |
| Allocine | ?? | ✅ |
| Fakt | ?? | ✅ |
| Super Express | ?? | ✅ |
| Gazeta Wyborcza | ?? | Blocked |
| Rzeczpospolita | ?? | ✅ |
| Dziennik Gazeta Prawna | ?? | ✅ |
| Polityka | ?? | ✅ |
| Newsweek Polska | ?? | ✅ |
| Gość Niedzielny | ?? | ✅ |
| Sieci | ?? | ✅ |
| Do Rzeczy | ?? | ✅ |
| Twój Styl | ?? | ✅ |
| Zwierciadło | ?? | ✅ |
| Wysokie Obcasy Extra | ?? | Blocked |
| Pani | ?? | ✅ |
| Elle | ?? | ✅ |
Category: Video on demand
- Scanned: 9
- ✅ Passing: 56 %
- Blocked: 44 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Prime Video | ? | ✅ |
| Netflix | ? | ✅ |
| Disney+ | ? | Blocked |
| Hulu | ?? | Blocked |
| HBO Max | ?? | ✅ |
| Canal+ | ?? | Blocked |
| FranceTV | ?? | ✅ |
| TF1 | ?? | Blocked |
| 6Play | ?? | ✅ |
Category: Music
- Scanned: 6
- ✅ Passing: 67 %
- Blocked: 33 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| SoundCloud | ? | Blocked |
| YouTube | ? | ✅ |
| Apple Music | ? | ✅ |
| Spotify | ? | Blocked |
| Deezer | ?? | ✅ |
| LastFM | ?? | ✅ |
Category: Podcast
- Scanned: 8
- ✅ Passing: 75 %
- Blocked: 25 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Google Podcasts | ? | ✅ |
| Apple Podcast | ? | ✅ |
| Spotify Podcaster | ? | Blocked |
| Buzzsprout | ? | ✅ |
| Podbean | ? | ✅ |
| Acast | ?? | ✅ |
| AudioMeans | ?? | ✅ |
| Radio France | ?? | Blocked |
Category: X
- Scanned: 6
- ✅ Passing: 67 %
- Blocked: 33 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| PornHub | ? | Blocked |
| YouPorn | ? | Blocked |
| Xnxx | ? | ✅ |
| Xvideos | ? | ✅ |
| Xhamster | ? | ✅ |
| OnlyFans | ? | ✅ |
Category: Religion
- Scanned: 5
- ✅ Passing: 100 %
- Blocked: 0 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Bible | ?? | ✅ |
| Bible Gateway | ?? | ✅ |
| Jehovah's Witnesses | ?? | ✅ |
| Vatican | ?? | ✅ |
| Islamweb | ? | ✅ |
Category: Social media
- Scanned: 13
- ✅ Passing: 31 %
- Blocked: 62 %
- ❓ Unknown: 8 %

| Name | Country | Status |
| --- | --- | --- |
| Facebook | ? | Blocked |
| Instagram | ? | Blocked |
| Reddit | ? | ✅ |
| Hacker News | ? | ❓ |
| Lobsters | ? | Blocked |
| Pinterest | ? | Blocked |
| TikTok | ? | ✅ |
| Twitter | ? | Blocked |
| LinkedIn | ? | ✅ |
| Quora | ? | Blocked |
| VK | ?? | ✅ |
| TripAdvisor | ? | Blocked |
| Yelp | ? | Blocked |
Category: Artist
- Scanned: 42
- ✅ Passing: 76 %
- Blocked: 19 %
- ❓ Unknown: 5 %

| Name | Country | Status |
| --- | --- | --- |
| Michael Jackson | ?? | ✅ |
| Madonna | ?? | ✅ |
| Taylor Swift | ?? | Blocked |
| Rihanna | ?? | ✅ |
| Bruno Mars | ?? | ✅ |
| Justin Bieber | ?? | Blocked |
| Beyoncé | ?? | ✅ |
| Katy Perry | ?? | Blocked |
| Lady Gaga | ?? | Blocked |
| Hardwell | ?? | ✅ |
| Dimitri Vegas & Like Mike | ?? | ✅ |
| Kanye West | ?? | ❓ |
| Black Eyed Peas | ?? | ✅ |
| Imagine Dragons | ?? | ✅ |
| Twenty One Pilots | ?? | ✅ |
| Maroon 5 | ?? | Blocked |
| Selena Gomez | ?? | Blocked |
| Usher | ?? | Blocked |
| Stromae | ?? | ✅ |
| Aya Nakamura | ?? | ❓ |
| Soprano | ?? | ✅ |
| Johnny Hallyday | ?? | ✅ |
| Grand Corps Malade | ?? | ✅ |
| Zaho | ?? | ✅ |
| Jean Louis Aubert | ?? | ✅ |
| Camelia Jordana | ?? | ✅ |
| Indochine | ?? | ✅ |
| Tryo | ?? | ✅ |
| David Guetta | ?? | ✅ |
| MC Solaar | ?? | ✅ |
| Zaz | ?? | ✅ |
| Christine and the Queens | ?? | ✅ |
| Boulevard des Airs | ?? | ✅ |
| Calogero | ?? | ✅ |
| Hoshi | ?? | ✅ |
| Avicii | ?? | ✅ |
| Adele | ?? | ✅ |
| Calvin Harris | ?? | ✅ |
| Ed Sheeran | ?? | ✅ |
| Arctic Monkeys | ?? | ✅ |
| Coldplay | ?? | ✅ |
| The Weeknd | ?? | Blocked |
Category: Gov
- Scanned: 3
- ✅ Passing: 100 %
- Blocked: 0 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| White House | ?? | ✅ |
| Elysée | ?? | ✅ |
| Europe | ?? | ✅ |
Category: Science
- Scanned: 28
- ✅ Passing: 82 %
- Blocked: 18 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Google Scholar | ? | ✅ |
| Sci-Hub | ? | ✅ |
| PubPeer | ? | ✅ |
| Scopus | ?? | Blocked |
| Elsevier | ?? | Blocked |
| ScienceDirect | ?? | Blocked |
| MDPI | ?? | ✅ |
| Springer | ?? | ✅ |
| Wiley | ?? | ✅ |
| American Chemical Society | ?? | ✅ |
| PubMed | ?? | ✅ |
| Academia | ?? | ✅ |
| Science | ?? | Blocked |
| arXiv | ?? | ✅ |
| American Physical Society | ?? | ✅ |
| Mendeley | ?? | ✅ |
| Nature | ?? | Blocked |
| Taylor & Francis | ?? | ✅ |
| Oxford University Press | ?? | ✅ |
| Cambridge University Press | ?? | ✅ |
| Royal Society of Chemistry | ?? | ✅ |
| ResearchGate | ?? | ✅ |
| BNF | ?? | ✅ |
| Cairn | ?? | ✅ |
| Persee | ?? | ✅ |
| Gallica | ?? | ✅ |
| HAL | ?? | ✅ |
| OpenEdition | ?? | ✅ |
Category: Dev
- Scanned: 3
- ✅ Passing: 67 %
- Blocked: 33 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| GitHub | ? | ✅ |
| GitLab | ? | ✅ |
| Stack Overflow | ? | Blocked |
Category: Other content
- Scanned: 19
- ✅ Passing: 74 %
- Blocked: 26 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Wikipedia | ? | ✅ |
| Medium | ? | Blocked |
| Substack | ? | ✅ |
| Common Crawl | ? | ✅ |
| Internet Archive | ? | ✅ |
| Wayback Machine | ? | ✅ |
| Notion | ? | ✅ |
| Weather | ?? | Blocked |
| AccuWeather | ?? | ✅ |
| Météo France | ?? | ✅ |
| Getty Images | ?? | ✅ |
| Shutterstock | ?? | Blocked |
| Adobe Stock | ?? | Blocked |
| Unsplash | ?? | Blocked |
| Pexels | ?? | ✅ |
| Pixabay | ?? | ✅ |
| Flickr | ?? | ✅ |
| 500px | ?? | ✅ |
| Giphy | ?? | ✅ |
Category: Other
- Scanned: 1
- ✅ Passing: 100 %
- Blocked: 0 %
- ❓ Unknown: 0 %

| Name | Country | Status |
| --- | --- | --- |
| Indeed | ?? | ✅ |
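
The Passing / Blocked / Unknown figures listed for each category are consistent with each status count divided by the number of scanned sites, rounded to the nearest percent. A minimal sketch of that arithmetic follows; the function name `summarize` and the status labels are hypothetical and not taken from `scrape.py`.

```python
from collections import Counter


def summarize(statuses):
    """Return the rounded percentage of each status among the scanned sites."""
    total = len(statuses)
    counts = Counter(statuses)
    labels = ("passing", "blocked", "unknown")
    if total == 0:
        # Nothing scanned: report zero everywhere instead of dividing by zero.
        return {label: 0 for label in labels}
    return {label: round(100 * counts[label] / total) for label in labels}


if __name__ == "__main__":
    # Social media category: 4 passing, 8 blocked, 1 unknown out of 13 scanned.
    social = ["passing"] * 4 + ["blocked"] * 8 + ["unknown"]
    print(summarize(social))  # {'passing': 31, 'blocked': 62, 'unknown': 8}
```

Because each share is rounded independently, the three percentages can add up to 101 %, as they do for the Social media category.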
WTF list
A.k.a: do they understand their business model?

| Name | Status |
| --- | --- |
| Getty Images | ✅ |
| Pexels | ✅ |
| 500px | ✅ |
Shame list
A.k.a: this is public interest.

| Name | Status |
| --- | --- |
| Medium | Blocked |
| Quora | Blocked |
| Elsevier | Blocked |
| Scopus | Blocked |
| Science | Blocked |
| ScienceDirect | Blocked |
| Nature | Blocked |
Contributing
Looking for contributions:
- Enrich website database
- Chinese websites
- New categories
Please open issues!
- Ping me on Twitter @samuelberthe (DMs, mentions, whatever :))
- Fork the project
- Fix open issues or request new features
Don't hesitate ;)
Build
python3 -m venv venv
source ./venv/bin/activate
pip3 install -r requirements.txt
python3 scrape.py
# then copy the latest output into the README
Show your support
Give a ⭐️ if this project helped you!
License
Copyright © 2024 Samuel Berthe.
This project is MIT licensed.