Robots Beware 
Effective 1 Jan '96: "We attack back."
millions and billions of distinct URL's
This www server has been under all-too-frequent attack from "intelligent agents" (a.k.a. "robots", and more recently, "accelerators" and "link checkers") that mindlessly download every link encountered, ultimately trying to access the entire database through the listings links. In most cases, these processes are run by well-intentioned but thoughtless neophytes, ignorant of common-sense guidelines.
(Very few of these same robotrunners would ever dream of downloading entire databases via anonymous ftp, but for some reason they conceptualize www sites as somehow associated only with small and limited databases. This mentality must change --- large databases such as this one [which has millions of distinct URL's that lead to gigabytes of data] are likely to be exported ever more commonly via www.)
Following a proposed standard for robot exclusion, this site has maintained since early '94 a file /robots.txt that specifies those URL's that are off-limits to robots.
(And this "Robots Beware" page was originally posted March 1994.)
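As a rough illustration of how a well-behaved robot might consult /robots.txt before requesting anything, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules in `EXAMPLE_ROBOTS_TXT`, the `/listings/` path, the `example.org` host, and the `ExampleBot` agent name are all hypothetical placeholders, not the actual contents of this site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT this site's real /robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the parsed rules permit this agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)
```

A robot would call `may_fetch(parser, "ExampleBot", url)` for every candidate URL and skip any for which it returns `False`. Against a live server, `RobotFileParser.set_url()` followed by `read()` would fetch the real file instead of the inline text used here.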
We are not willing to play sitting duck to nonsensical methods of "indexing" information. (Presumably you would not be terribly thrilled either if every aspiring encyclopedia editor were to send a gang of blind 600 lb gorillas to your library, armed with a photocopy machine.) We also have no intention of inconveniencing in any way our many tens of thousands of real users, just because a small handful of misconfigured miscreants -- with neither interest in, nor understanding of, our actual content -- is incapable of abiding by well-posted guidelines.
This server is configured to monitor activity and deny access to sites that violate the above guidelines. Continued rapid-fire requests from any site after access has been denied (i.e., with an HTTP 403 Forbidden response) will be interpreted as a network attack; and we will respond accordingly --- without hesitation, and without further warning.
(Click here to initiate automated "seek-and-destroy" against your site.)
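From the client side, the access-denial policy above suggests that a robot should pace its requests and stop contacting a host as soon as it sees a 403. The following is one possible sketch of that behavior, not a description of how this server works. The `PoliteFetcher` class, the injected `fetch_status` callable, and the one-second default delay are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Set
from urllib.parse import urlsplit


class PoliteFetcher:
    """Hypothetical client wrapper: paces requests and never retries after a 403."""

    def __init__(
        self,
        fetch_status: Callable[[str], int],  # assumed: returns the HTTP status for a URL
        min_delay: float = 1.0,              # assumed pacing interval, in seconds
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch_status = fetch_status
        self._min_delay = min_delay
        self._sleep = sleep
        self.denied_hosts: Set[str] = set()

    def get(self, url: str) -> Optional[int]:
        """Return the HTTP status, or None if the host has already denied us."""
        host = urlsplit(url).netloc
        if host in self.denied_hosts:
            return None  # access was denied earlier: send nothing further
        self._sleep(self._min_delay)  # avoid rapid-fire requests
        status = self._fetch_status(url)
        if status == 403:
            self.denied_hosts.add(host)  # remember the denial for this host
        return status
```

Passing the status-fetching function in as a parameter keeps the sketch independent of any particular HTTP library. Once a host is recorded in `denied_hosts`, every later `get()` for that host returns `None` without touching the network.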
If some specific application requires relaxation of the above guidelines, contact www-admin@arXiv.org in advance of any attempted download. This system is not responsible for the consequences of automated downloads attempted in violation of the above guidelines.