Open Source Developers Battle AI Crawlers with Ingenuity and Determination
Many software developers have likened AI web-crawling bots to the cockroaches of the internet, and they are increasingly finding inventive, often humorous, ways to fight back. These bots can put severe strain on websites, and sites hosting free and open source software (FOSS) projects are disproportionately vulnerable to aggressive crawling.
The Challenge of AI Crawlers
Many developers have voiced concerns about the disruptive behavior of AI bots. According to Niccolò Venerandi, a developer of the Linux desktop environment Plasma and owner of the blog LibreNews, open source projects tend to be more exposed because they share more of their infrastructure publicly and have fewer resources than commercial products.
Ignoring the Rules
One major problem is that many AI bots disregard the Robots Exclusion Protocol (the robots.txt file), which tells crawlers which parts of a website they should not visit. Because the protocol is purely advisory, ignoring it can cause real damage, as developer Xe Iaso found when AmazonBot relentlessly hammered a Git server, causing DDoS-level outages.
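For reference, robots.txt is a plain text file served at a site's root. A minimal file asking crawlers to stay away might look like the sketch below (the paths and bot name are illustrative, not taken from any of the sites mentioned here):

```
# Served at https://example.com/robots.txt
User-agent: *        # applies to every crawler
Disallow: /git/      # ask bots not to crawl the repository browser

User-agent: GPTBot   # rules for one specific crawler
Disallow: /          # ask it to stay out entirely
```

Nothing enforces these rules; a crawler that ignores the file faces no technical barrier, which is exactly the gap tools like Anubis try to close.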
Iaso noted, “It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more.” Such bots bombard websites, fetching the same links over and over, which can lead to crashes and outages.
Innovative Solutions from Developers
In response to these challenges, Iaso developed a clever tool named Anubis. This proof-of-work reverse proxy acts as a checkpoint in front of a site: it challenges each incoming request, screening out bots while allowing genuine human visitors through.
What is Anubis?
- Anubis blocks unwanted bot traffic.
- It determines if a web request comes from a human or a bot.
- Successful human requests are greeted with a whimsical anime image of Anubis, inspired by Egyptian mythology.
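The proof-of-work idea behind this kind of check can be sketched in a few lines. The code below is not Anubis's actual implementation, just a minimal illustration of the technique with made-up function names: the server hands the visitor's browser a random seed, the browser must find a nonce whose hash meets a difficulty target (cheap for one page view, costly for a crawler hitting thousands of pages), and the server verifies the answer with a single hash.

```python
import hashlib
import itertools

def solve_challenge(seed: str, difficulty: int = 4) -> int:
    """Find a nonce such that sha256(seed + nonce) starts with `difficulty` hex zeros.

    This is the expensive step the client must perform.
    """
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce

def verify(seed: str, nonce: int, difficulty: int = 4) -> bool:
    """Check a submitted nonce — a single hash, so the server's cost is trivial."""
    digest = hashlib.sha256(f"{seed}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

nonce = solve_challenge("example-challenge", difficulty=3)
print(verify("example-challenge", nonce, difficulty=3))  # True
```

Raising the difficulty makes the search exponentially more expensive for the client while the server-side check stays constant, which is why the cost falls almost entirely on high-volume crawlers.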
This tool quickly gained traction within the FOSS community, amassing 2,000 stars on GitHub shortly after its release on March 19, along with numerous contributors and forks.
Widespread Impact of Aggressive Crawlers
The swift popularity of Anubis highlights the common struggle among developers. Venerandi shared several accounts from the FOSS community:
- Drew DeVault, founder of SourceHut, revealed that he spends a significant portion of his time dealing with aggressive AI crawlers, which have caused frequent outages.
- Jonathan Corbet, a well-known FOSS developer and operator of LWN, reported that his site suffers from DDoS-level traffic due to AI scraper bots.
- Kevin Fenzi, the sysadmin for the Fedora project, had to block all traffic from Brazil in response to the aggressive behaviors of these bots.
Creative Defenses Against Bots
Some developers have taken a more humorous approach to combating these bots, suggesting that restricted pages be filled with absurd content to deter crawlers. One Hacker News user, for example, recommended loading blocked pages with misleading articles.
In January, a pseudonymous developer known as Aaron released Nepenthes, a tool designed to ensnare crawlers in an endless maze of fake content. Named after a genus of carnivorous pitcher plants, it aims to waste crawlers' resources and feed them junk data.
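A content tarpit of this kind is straightforward to sketch. The code below is a hypothetical illustration, not Nepenthes itself: every URL path deterministically yields a page of nonsense whose links point only at further generated pages, so a crawler that follows them never runs out of URLs to fetch.

```python
import hashlib
import random

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing"]

def fake_page(path: str, n_links: int = 5) -> str:
    """Generate a deterministic nonsense page for `path` (illustrative only)."""
    # Seed the RNG from the path so the same URL always yields the same page,
    # making the maze look like a stable, real site to a crawler.
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    body = " ".join(rng.choice(WORDS) for _ in range(80))
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(32):08x}">read more</a> '
        for _ in range(n_links)
    )
    return f"<html><body><p>{body}</p><p>{links}</p></body></html>"
```

A real deployment would typically also serve these pages slowly, so the trap costs the crawler time as well as bandwidth.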
Furthermore, Cloudflare, a leading provider of web security services, introduced a similar tool called AI Labyrinth to slow down and mislead AI crawlers that do not adhere to “no crawl” directives, ensuring they waste their resources.
A Call for Action
As developers continue to grapple with the challenges posed by AI crawlers, many are calling for a more sustainable solution. Drew DeVault expressed a heartfelt plea for the community to reconsider the use and promotion of AI technologies that contribute to these issues. “Please stop legitimizing LLMs or AI image generators,” he implored.
While the likelihood of such a shift remains slim, the creative and humorous responses from developers offer hope and a sense of solidarity in facing these challenges.