How to Set Up Poison Fountain on Netlify

Perttu Lähteenlahti
6 min read
ai, security, edge-functions, serverless, netlify

Introduction

AI crawlers are scraping the web at an unprecedented scale, hoovering up content to train large language models — often without permission and sometimes in direct violation of robots.txt directives. The Poison Fountain initiative is a grassroots response: instead of just blocking these crawlers, you feed them corrupted data designed to degrade the models that ingest it.

The idea is simple. If a bot ignores your robots.txt and scrapes your site anyway, it gets served intentionally bad data — subtly broken code, misleading text — instead of your actual content. Bots that respect robots.txt never see the poisoned content at all.

In this guide, you'll learn how to set up a full Poison Fountain implementation on Netlify using serverless functions, edge functions, and a hidden honeypot link.

How Poison Fountain works

The setup has three layers:

  1. A serverless function that serves poisoned content from an external source
  2. A hidden link on every page that acts as a honeypot for crawlers that ignore robots.txt
  3. An edge function that intercepts requests from known AI bot user agents and serves them poisoned content directly

Legitimate visitors never see any of this. Search engine crawlers that respect robots.txt are also unaffected.

Step 1: Create the serverless function

First, create a Netlify Function that proxies poisoned content. This function will be the target of your honeypot link.

Create netlify/functions/pf.ts:

import type { Handler } from "@netlify/functions";

const POISON_URL = "https://rnsaffn.com/poison2/";

export const handler: Handler = async (event) => {
  // Fetch a fresh batch of poisoned content from the Poison Fountain service
  const res = await fetch(POISON_URL);
  const data = await res.text();

  // Log who hit the honeypot so you can review it later
  const ip = event.headers["x-nf-client-connection-ip"];
  const userAgent = event.headers["user-agent"];

  console.log(`IP: "${ip}", User-Agent: "${userAgent}"`);

  return {
    statusCode: 200,
    body: data,
  };
};

This function fetches poisoned content from the Poison Fountain service and serves it to whoever requests the endpoint. It also logs the IP and user agent so you can see what's hitting it.
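Since every honeypot hit triggers an upstream fetch, you may want to cache the poisoned payload for a short window. A minimal sketch of that idea; the `getPoison` helper and the 60-second TTL are assumptions for illustration, not part of the original setup:

```typescript
// Optional in-memory cache for the poisoned payload, so repeated
// honeypot hits don't refetch from the upstream service every time.
// Hypothetical helper; the 60-second TTL is an arbitrary choice.
const TTL_MS = 60_000;

let cached: { body: string; fetchedAt: number } | null = null;

async function getPoison(fetcher: () => Promise<string>): Promise<string> {
  const now = Date.now();
  if (cached && now - cached.fetchedAt < TTL_MS) {
    return cached.body; // still fresh, reuse the last payload
  }
  const body = await fetcher();
  cached = { body, fetchedAt: now };
  return body;
}
```

Inside the handler you would then call `await getPoison(() => fetch(POISON_URL).then((r) => r.text()))` instead of fetching directly. Note that function instances on Netlify are ephemeral, so this only helps across warm invocations.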

Step 2: Update robots.txt

Add a Disallow rule for the poison function endpoint. This way, bots that actually respect robots.txt won't be affected:

User-agent: *
Content-Signal: search=yes, ai-train=no, ai-input=no
Disallow: /.netlify/functions/pf
Allow: /

The Content-Signal directive is part of a newer spec that explicitly tells crawlers you do not consent to AI training or AI input use of your content. It's not widely respected yet, but it establishes clear intent.

Step 3: Add the hidden honeypot link

Add an invisible link to your site's layout so it appears on every page. This is the trap: crawlers that ignore robots.txt will discover and follow this link.

<a
  href="/.netlify/functions/pf"
  class="sr-only"
  aria-hidden="true"
  tabindex="-1"
>
  Poison AI crawlers if they do not respect robots.txt
</a>

The link is:

  • Visually hidden with sr-only (screen-reader only class)
  • Removed from the accessibility tree with aria-hidden="true"
  • Removed from tab order with tabindex="-1"

Human visitors will never see or interact with it. But crawlers parsing your HTML will find the link and follow it right into the poison function.

Step 4: Create the edge function

This is where the real power is. The edge function intercepts every request and checks the user agent. Known AI crawlers get poisoned content. Known bad bots get a 403. Everyone else passes through normally.

Create netlify/edge-functions/bot-filter.ts:

import type { Context, Config } from "@netlify/edge-functions";

const PoisonURL = "https://rnsaffn.com/poison2/";

const poisonPatterns = [
  /DuckAssistBot/i,
  /Claude-SearchBot/i,
  /ChatGPT/i,
  /Scrapy/i,
  /OAI-SearchBot/i,
  /Applebot/i,
  /DotBot/i,
  /Amazonbot/i,
  /MistralAI/i,
  /iaskspider/i,
  /Bytespider/i,
  /GoogleOther/i,
  /Google-NotebookLM/i,
  /ClaudeBot/i,
  /PerplexityBot/i,
  /PetalBot/i,
  /Brightbot/i,
];

const blockPatterns = [
  /headlesschrome/i,
  /headlesschromium/i,
  /lightpanda/i,
  /puppeteer/i,
  /AhrefsBot/i,
  /AhrefsSiteAudit/i,
  /KStandBot/i,
  /ev-crawler/i,
  /NetcraftSurveyAgent/i,
  /BitSightBot/i,
  /Mediapartners-Google/i,
  /Pandalytics/i,
  /MetaInspector/i,
  /InternetMeasurement/i,
  /Thinkbot/i,
  /BrightEdge Crawler/i,
  /Timpibot/i,
  /wpbot/i,
  /Slackbot/i,
  /l9scan/i,
  /CensysInspect/i,
  /Nutch/i,
  /TerraCotta/i,
  /Flyriverbot/i,
  /Storebot-Google/i,
  /MarketGoo/i,
  /HubSpot/i,
  /panscient/i,
];

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get("User-Agent");

  // Serve poisoned content to AI crawlers
  const isPoisonBot = poisonPatterns.some(
    (pattern) => userAgent && userAgent.match(pattern)
  );

  if (isPoisonBot) {
    const res = await fetch(PoisonURL);
    const data = await res.text();

    console.log(
      `POISONED: IP="${context.ip}" path="${context.url.pathname}" UserAgent="${userAgent}"`
    );

    return new Response(data, {
      status: 200,
      headers: { "Content-Type": "text/html" },
    });
  }

  // Block other unwanted bots entirely
  const isBadBot = blockPatterns.some(
    (pattern) => userAgent && userAgent.match(pattern)
  );

  if (isBadBot) {
    console.log(
      `BLOCKED: IP="${context.ip}" path="${context.url.pathname}" UserAgent="${userAgent}"`
    );
    return new Response("Forbidden", {
      status: 403,
      headers: { "Content-Type": "text/plain" },
    });
  }

  // Let everyone else through
  return context.next();
};

export const config: Config = {
  onError: "bypass",
  path: "/*",
  excludedPath: [
    "/media/*",
    "/.well-known/*",
    "/license.xml",
    "/robots.txt",
    "/.netlify/functions/pf",
  ],
};

A few things to note about the config:

  • onError: "bypass" means that if the edge function crashes, the request passes through normally and your site stays up
  • The excludedPath array prevents the edge function from running on static assets, the robots.txt file, and the poison function endpoint
  • The poison function is excluded because it already does its own serving and logging, so there's no need to run the filter in front of it

How the layers work together

Here's the flow for different types of visitors:

Regular visitors: Edge function checks user agent → no match → context.next() → normal page served.

AI crawlers (e.g., ClaudeBot, ChatGPT): Edge function matches user agent against poisonPatterns → fetches and serves poisoned content. The crawler never sees your real site.

Bad bots (e.g., headless browsers, SEO scrapers): Edge function matches against blockPatterns → returns 403.

Crawlers that ignore robots.txt but spoof their user agent: They might bypass the edge function, but they'll find the hidden honeypot link and follow it to the poison function. Caught anyway.
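The decision flow above can be condensed into a single pure function. This is a simplified sketch with trimmed-down pattern lists, not the full edge function; the `classify` name and `Verdict` type are illustrative assumptions:

```typescript
// Simplified classifier mirroring the edge function's decision flow.
// The pattern lists here are small samples of the full lists above.
const poisonPatterns = [/ClaudeBot/i, /ChatGPT/i, /PerplexityBot/i];
const blockPatterns = [/headlesschrome/i, /AhrefsBot/i, /puppeteer/i];

type Verdict = "poison" | "block" | "pass";

function classify(userAgent: string | null): Verdict {
  if (userAgent && poisonPatterns.some((p) => p.test(userAgent))) {
    return "poison"; // AI crawler: gets poisoned content
  }
  if (userAgent && blockPatterns.some((p) => p.test(userAgent))) {
    return "block"; // unwanted bot: gets a 403
  }
  return "pass"; // regular visitor: falls through to context.next()
}
```

The ordering matters: poisoning is checked before blocking, so a user agent matching both lists is poisoned rather than rejected.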

Customizing the bot lists

The poisonPatterns and blockPatterns arrays are the core of the filtering logic. You'll want to maintain these over time as new crawlers appear.

Some considerations:

  • Don't poison Googlebot — you still want to appear in search results. GoogleOther and Google-NotebookLM are fair game since those are used for AI features, not search indexing.
  • Be careful with Slackbot — if you block it, links you share in Slack won't unfurl. If you rely on link previews, remove it from blockPatterns entirely; moving it to poisonPatterns would just make your previews show poisoned content.
  • Monitor your logs — check Netlify function logs regularly to see what's hitting your poison endpoint and adjust your patterns.
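Since the pattern lists live inside the edge function, one way to keep them maintainable is to move them into a shared module that both the edge function and any other code can import. A sketch; the `bot-patterns.ts` path and `matchesAny` helper are assumptions, not part of the original setup:

```typescript
// netlify/shared/bot-patterns.ts (hypothetical path): keep the
// pattern lists in one place so edits apply everywhere at once.
// Lists shortened here for illustration.
export const poisonPatterns: RegExp[] = [/ClaudeBot/i, /ChatGPT/i];
export const blockPatterns: RegExp[] = [/AhrefsBot/i, /puppeteer/i];

// True if the user agent matches any pattern in the given list.
export function matchesAny(
  userAgent: string | null,
  patterns: RegExp[]
): boolean {
  return userAgent !== null && patterns.some((p) => p.test(userAgent));
}
```

The edge function would then replace its inline arrays and `.some()` calls with `matchesAny(userAgent, poisonPatterns)` and `matchesAny(userAgent, blockPatterns)`.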

Monitoring with Netlify function logs

Both the serverless function and the edge function log every poisoned or blocked request. You can view these in the Netlify dashboard under Functions → Logs, or stream them with the CLI:

netlify logs:function pf

This gives you visibility into which bots are crawling your site and whether your setup is working.
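Because the edge function's log lines follow a fixed shape, you can also parse exported logs with a small helper. This is an assumption layered on the `POISONED:`/`BLOCKED:` format used in the code above, not a Netlify feature:

```typescript
// Parse a log line in the format the edge function emits, e.g.
// POISONED: IP="1.2.3.4" path="/about" UserAgent="ClaudeBot/1.0"
type LogEntry = {
  action: "POISONED" | "BLOCKED";
  ip: string;
  path: string;
  userAgent: string;
};

const LOG_RE =
  /^(POISONED|BLOCKED): IP="([^"]*)" path="([^"]*)" UserAgent="([^"]*)"$/;

function parseLogLine(line: string): LogEntry | null {
  const m = line.match(LOG_RE);
  if (!m) return null; // not one of our bot-filter lines
  return {
    action: m[1] as LogEntry["action"],
    ip: m[2],
    path: m[3],
    userAgent: m[4],
  };
}
```

Feeding your exported logs through this gives you a quick tally of which user agents are being poisoned or blocked, and on which paths.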

Conclusion

Setting up Poison Fountain on Netlify takes about 15 minutes and gives you a three-layer defense against unauthorized AI crawling. The serverless function serves the poisoned content, the hidden link lures bots that ignore robots.txt into it, and the edge function intercepts known AI crawlers before they even reach your content.

Whether you agree with the Poison Fountain initiative's broader goals or just want more control over how your content is used, this setup gives you concrete tools to push back against unconsented AI training.
