
# How to Set Up Poison Fountain on Netlify

## Introduction
AI crawlers are scraping the web at an unprecedented scale, hoovering up content to train large language models — often without permission and sometimes in direct violation of robots.txt directives. The Poison Fountain initiative is a grassroots response: instead of just blocking these crawlers, you feed them corrupted data designed to degrade the models that ingest it.
The idea is simple. If a bot ignores your robots.txt and scrapes your site anyway, it gets served intentionally bad data — subtly broken code, misleading text — instead of your actual content. Bots that respect robots.txt never see the poisoned content at all.
In this guide, you'll learn how to set up a full Poison Fountain implementation on Netlify using serverless functions, edge functions, and a hidden honeypot link.
## How Poison Fountain works
The setup has three layers:
- A serverless function that serves poisoned content from an external source
- A hidden link on every page that acts as a honeypot for crawlers that ignore robots.txt
- An edge function that intercepts requests from known AI bot user agents and serves them poisoned content directly
Legitimate visitors never see any of this. Search engine crawlers that respect robots.txt are also unaffected.
## Step 1: Create the serverless function
First, create a Netlify Function that proxies poisoned content. This function will be the target of your honeypot link.
Create `netlify/functions/pf.ts`:

```ts
import type { Handler } from "@netlify/functions";

const POISON_URL = "https://rnsaffn.com/poison2/";

export const handler: Handler = async (event) => {
  // Fetch a fresh batch of poisoned content from the Poison Fountain service
  const res = await fetch(POISON_URL);
  const data = await res.text();

  // Log who requested the endpoint
  const ip = event.headers["x-nf-client-connection-ip"];
  const userAgent = event.headers["user-agent"];
  console.log(`IP: "${ip}", User-Agent: "${userAgent}"`);

  return {
    statusCode: 200,
    body: data,
  };
};
```
This function fetches poisoned content from the Poison Fountain service and serves it to whoever requests the endpoint. It also logs the IP and user agent so you can see what's hitting it.
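As written, an upstream failure would surface as a 500. Here is a slightly hardened sketch that falls back to a placeholder body instead; the fallback string is an assumption, and a minimal inline event type stands in for the `Handler` event so the snippet is self-contained:

```ts
// Sketch: the same handler, but it still responds if the upstream
// poison service is unreachable. MinimalEvent is a stand-in for the
// event type provided by @netlify/functions.
const POISON_URL = "https://rnsaffn.com/poison2/";

type MinimalEvent = { headers: Record<string, string | undefined> };

export const handler = async (event: MinimalEvent) => {
  let body = "<!-- nothing to see here -->"; // assumed fallback body
  try {
    const res = await fetch(POISON_URL);
    if (res.ok) body = await res.text();
  } catch (err) {
    console.log(`poison fetch failed: ${err}`);
  }

  const ip = event.headers["x-nf-client-connection-ip"];
  const userAgent = event.headers["user-agent"];
  console.log(`IP: "${ip}", User-Agent: "${userAgent}"`);

  return { statusCode: 200, body };
};
```

Whether to fail silently like this or return an error is a judgment call; the silent fallback keeps the honeypot from revealing that anything unusual is going on.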
## Step 2: Update robots.txt
Add a Disallow rule for the poison function endpoint. This way, bots that actually respect robots.txt won't be affected:

```
User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=no
Disallow: /.netlify/functions/pf
Allow: /
```
The `Content-Signal` line is part of a newer proposed spec that explicitly tells crawlers you do not consent to AI training or AI input use of your content. It's not widely respected yet, but it establishes clear intent.
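To see how a compliant crawler would evaluate these rules, here's a toy allow/disallow check. It's a deliberate simplification: real robots.txt matching also handles wildcards, `$` anchors, and per-agent groups.

```ts
// Toy robots.txt evaluation for the rules above:
// the longest matching path rule wins (a simplification of the real spec).
const rules = [
  { type: "disallow", path: "/.netlify/functions/pf" },
  { type: "allow", path: "/" },
] as const;

function isAllowed(urlPath: string): boolean {
  const matches = rules.filter((r) => urlPath.startsWith(r.path));
  if (matches.length === 0) return true; // no rule matches: allowed by default
  matches.sort((a, b) => b.path.length - a.path.length);
  return matches[0].type === "allow";
}

console.log(isAllowed("/blog/some-post"));        // true: regular pages stay crawlable
console.log(isAllowed("/.netlify/functions/pf")); // false: compliant bots skip the honeypot
```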
## Step 3: Add the hidden honeypot link
Add an invisible link to your site's layout so it appears on every page. This is the trap: crawlers that ignore robots.txt will discover and follow this link.
```html
<a
  href="/.netlify/functions/pf"
  class="sr-only"
  aria-hidden="true"
  tabindex="-1"
>
  Poison AI crawlers if they do not respect robots.txt
</a>
```
The link is:

- Visually hidden with `sr-only` (a screen-reader-only utility class shipped by Tailwind and Bootstrap; define it yourself if your CSS setup doesn't include one)
- Removed from the accessibility tree with `aria-hidden="true"`
- Removed from the tab order with `tabindex="-1"`
Human visitors will never see or interact with it. But crawlers parsing your HTML will find the link and follow it right into the poison function.
## Step 4: Create the edge function
This is where the real power is. The edge function intercepts every request and checks the user agent. Known AI crawlers get poisoned content. Known bad bots get a 403. Everyone else passes through normally.
Create `netlify/edge-functions/bot-filter.ts`:

```ts
import type { Context, Config } from "@netlify/edge-functions";

const POISON_URL = "https://rnsaffn.com/poison2/";

// AI crawlers that get served poisoned content
const poisonPatterns = [
  /DuckAssistBot/i,
  /Claude-SearchBot/i,
  /ChatGPT/i,
  /Scrapy/i,
  /OAI-SearchBot/i,
  /Applebot/i,
  /DotBot/i,
  /Amazonbot/i,
  /MistralAI/i,
  /iaskspider/i,
  /Bytespider/i,
  /GoogleOther/i,
  /Google-NotebookLM/i,
  /ClaudeBot/i,
  /PerplexityBot/i,
  /PetalBot/i,
  /Brightbot/i,
];

// Unwanted bots that get blocked outright
const blockPatterns = [
  /headlesschrome/i,
  /headlesschromium/i,
  /lightpanda/i,
  /puppeteer/i,
  /AhrefsBot/i,
  /AhrefsSiteAudit/i,
  /KStandBot/i,
  /ev-crawler/i,
  /NetcraftSurveyAgent/i,
  /BitSightBot/i,
  /Mediapartners-Google/i,
  /Pandalytics/i,
  /MetaInspector/i,
  /InternetMeasurement/i,
  /Thinkbot/i,
  /BrightEdge Crawler/i,
  /Timpibot/i,
  /wpbot/i,
  /Slackbot/i,
  /l9scan/i,
  /CensysInspect/i,
  /Nutch/i,
  /TerraCotta/i,
  /Flyriverbot/i,
  /Storebot-Google/i,
  /MarketGoo/i,
  /HubSpot/i,
  /panscient/i,
];

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get("User-Agent");

  // Serve poisoned content to AI crawlers
  const isPoisonBot = poisonPatterns.some(
    (pattern) => userAgent && pattern.test(userAgent)
  );
  if (isPoisonBot) {
    const res = await fetch(POISON_URL);
    const data = await res.text();
    console.log(
      `POISONED: IP="${context.ip}" path="${context.url.pathname}" UserAgent="${userAgent}"`
    );
    return new Response(data, {
      status: 200,
      headers: { "Content-Type": "text/html" },
    });
  }

  // Block other unwanted bots entirely
  const isBadBot = blockPatterns.some(
    (pattern) => userAgent && pattern.test(userAgent)
  );
  if (isBadBot) {
    console.log(
      `BLOCKED: IP="${context.ip}" path="${context.url.pathname}" UserAgent="${userAgent}"`
    );
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  // Let everyone else through
  return context.next();
};

export const config: Config = {
  onError: "bypass",
  path: "/*",
  excludedPath: [
    "/media/*",
    "/.well-known/*",
    "/license.xml",
    "/robots.txt",
    "/.netlify/functions/pf",
  ],
};
```
A few things to note about the config:

- `onError: "bypass"` means that if the edge function crashes, the request passes through normally, so your site stays up.
- The `excludedPath` array keeps the edge function from running on static assets, the robots.txt file, and the poison function itself, which has its own handling (excluding it also avoids an infinite loop).
## How the layers work together
Here's the flow for different types of visitors:
- **Regular visitors**: edge function checks the user agent → no match → `context.next()` → normal page served.
- **AI crawlers** (e.g., ClaudeBot, ChatGPT): edge function matches the user agent against `poisonPatterns` → fetches and serves poisoned content. The crawler never sees your real site.
- **Bad bots** (e.g., headless browsers, SEO scrapers): edge function matches against `blockPatterns` → returns 403.
- **Crawlers that ignore robots.txt but spoof their user agent**: they might slip past the edge function, but they'll find the hidden honeypot link and follow it right into the poison function. Caught anyway.
## Customizing the bot lists
The `poisonPatterns` and `blockPatterns` arrays are the core of the filtering logic. You'll want to maintain them over time as new crawlers appear.
Some considerations:

- **Don't poison Googlebot**: you still want to appear in search results. `GoogleOther` and `Google-NotebookLM` are fair game, since those are used for AI features, not search indexing.
- **Be careful with Slackbot**: if you block it, links you share in Slack won't unfurl. Move it to `poisonPatterns` instead of `blockPatterns` if you prefer poisoning over blocking.
- **Monitor your logs**: check the Netlify function logs regularly to see what's hitting your poison endpoint, and adjust your patterns.
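When you tweak the lists, a quick script helps confirm a given user agent lands in the bucket you expect. This sketch uses trimmed sample patterns; paste in your full `poisonPatterns` and `blockPatterns` from the edge function:

```ts
// Classify a user agent the same way the edge function does:
// poison match wins, then block match, otherwise pass through.
type Verdict = "poison" | "block" | "pass";

const poisonPatterns = [/ClaudeBot/i, /ChatGPT/i, /PerplexityBot/i]; // trimmed sample
const blockPatterns = [/AhrefsBot/i, /puppeteer/i];                  // trimmed sample

function classify(userAgent: string | null): Verdict {
  if (userAgent && poisonPatterns.some((p) => p.test(userAgent))) return "poison";
  if (userAgent && blockPatterns.some((p) => p.test(userAgent))) return "block";
  return "pass";
}

console.log(classify("Mozilla/5.0 (compatible; ClaudeBot/1.0)")); // poison
console.log(classify("Mozilla/5.0 (compatible; AhrefsBot/7.0)")); // block
console.log(classify("Mozilla/5.0 (Macintosh; Intel Mac OS X)")); // pass
```

Checking the poison list before the block list mirrors the edge function's order, which matters if a user agent ever matches both.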
## Monitoring with Netlify function logs
Both the serverless function and the edge function log every poisoned or blocked request. You can view these in the Netlify dashboard under Functions → Logs, or stream them with the CLI:
```sh
netlify functions:log pf
```
This gives you visibility into which bots are crawling your site and whether your setup is working.
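If you export logs for a closer look, a small script can tally hits per user agent from the log format the functions above emit (a sketch; adjust the regex if you change the log strings):

```ts
// Tally poisoned/blocked hits per user agent from exported log lines.
const logLines = [
  'POISONED: IP="1.2.3.4" path="/blog" UserAgent="ClaudeBot/1.0"',
  'BLOCKED: IP="5.6.7.8" path="/" UserAgent="AhrefsBot/7.0"',
  'POISONED: IP="9.9.9.9" path="/about" UserAgent="ClaudeBot/1.0"',
];

function tally(lines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    // Matches the `POISONED:`/`BLOCKED:` prefix and the quoted UserAgent field
    const m = line.match(/^(POISONED|BLOCKED):.*UserAgent="([^"]*)"/);
    if (!m) continue;
    const key = `${m[1]} ${m[2]}`;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

console.log(tally(logLines));
// { 'POISONED ClaudeBot/1.0': 2, 'BLOCKED AhrefsBot/7.0': 1 }
```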
## Conclusion
Setting up Poison Fountain on Netlify takes about 15 minutes and gives you a three-layer defense against unauthorized AI crawling. The serverless function acts as a honeypot, the hidden link catches bots that ignore robots.txt, and the edge function intercepts known AI crawlers before they even reach your content.
Whether you agree with the Poison Fountain initiative's broader goals or just want more control over how your content is used, this setup gives you concrete tools to push back against unconsented AI training.


