Why Were New York Government Websites Hidden From an Internet Archive for 13 Years?
The nonprofit Internet Archive has an impossibly ambitious mission: to save a copy of every last piece of the public internet, forever, and to make the records freely available for anyone to use.
[Image caption: Someone in New York State government apparently didn't want the Wayback Machine archiving their goods. Credit: Screenshot]
The archive currently stands at 2 petabytes, which is more than all the text contained in the Library of Congress. The group treats the internet, the fleeting, here-today-gone-tomorrow internet, as something worth preserving.
Since its inception in 1996, the organization has become a critical resource for academics and researchers interested in the internet as a cultural repository. One part of the project, called the Wayback Machine, has been especially popular. It's like a time capsule for the Web, preserving copies of billions of pages, as they are, at a moment in time.
It couldn't be simpler: type in a URL, and the Wayback Machine will display snapshots of that URL on various dates. It's especially useful for looking back at deleted information, which has made the Wayback Machine an indispensable tool for journalists. Some politician decided to remove a particularly blockheaded press release from his site? The Wayback Machine sees all, and preserves every misstep.
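For readers who would rather script that lookup than type it in, here is a minimal Python sketch. It assumes the Wayback Machine's public address pattern (web.archive.org/web/, then a timestamp, then the page address) and its availability lookup at archive.org/wayback/available. The function names, the `example.com` address and the timestamp check are illustrative choices, not anything the Archive prescribes.

```python
from urllib.parse import urlencode

# Public address patterns used by the Wayback Machine (assumed stable).
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def _check_timestamp(timestamp: str) -> None:
    # Illustrative validation only: digits in YYYYMMDDhhmmss order,
    # optionally truncated (for example, just YYYYMMDD).
    ok = timestamp.isascii() and timestamp.isdigit() and 4 <= len(timestamp) <= 14
    if not ok:
        raise ValueError("timestamp must be 4 to 14 digits, e.g. 20010101")


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Build the address of the capture nearest to the given timestamp."""
    _check_timestamp(timestamp)
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: str) -> str:
    """Build a lookup address that reports the closest capture as JSON."""
    _check_timestamp(timestamp)
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


if __name__ == "__main__":
    # example.com is a placeholder; substitute any public page address.
    print(snapshot_url("example.com", "20010101"))
    print(availability_query("example.com", "20010101"))
```

Neither function contacts the network. They only assemble the addresses, which can then be opened in a browser or fetched with any HTTP client.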
So it's remarkable that for the past 13 years, some of the most important websites of New York State's government have been deliberately excluded from the archive, their records hidden from public view. Sixty-three "state.ny.us" addresses, to be exact, including the site of the New York State Assembly. Even more remarkably, the problem wasn't noticed until now.
The exclusion wasn't an oversight by the archive itself. The group's servers grab virtually every public webpage by default. According to the Archive's Chris Butler, someone within the New York State government requested, way back in 2001, that a broad swath of domains be eliminated from the archiving process.
"It was at a time when the Wayback Machine had only been public for a short time," Butler said, "for less than a year." Butler explained that the situation was particularly odd because government websites are among the group's highest priorities for preservation. "They're obviously really important records."
Who requested that the sites be removed? Butler isn't exactly sure. And why? He's not sure about that either.