Why Were New York Government Websites Hidden From an Internet Archive for 13 Years?

Someone in New York State government apparently didn't want the Wayback Machine archiving their goods.
The nonprofit Internet Archive has an impossibly ambitious mission: to save a copy of every last piece of the public internet, forever, and to make the records freely available for anyone to use.

The archive currently stands at 2 petabytes, which is more than all the text contained in the Library of Congress. The group treats the fleeting, here-today-gone-tomorrow internet as something worth preserving.

Since its inception in 1996, the organization has become a critical resource for academics and researchers interested in the internet as a cultural repository. One part of the project, called the Wayback Machine, has been especially popular. It's like a time capsule for the Web, preserving copies of billions of pages, as they are, at a moment in time.

It couldn't be simpler: type in a URL, and the Wayback Machine will display snapshots of that URL on various dates. It's especially useful for looking back at deleted information, which has made the Wayback Machine an indispensable tool for journalists. Some politician decided to remove a particularly blockheaded press release from his site? The Wayback Machine sees all, and preserves every misstep.

So it's remarkable that for the past 13 years, some of the most important websites of New York State's government have been deliberately excluded from the archive, their records hidden from public view. Sixty-three "state.ny.us" addresses, to be exact, including the site of the New York State Assembly. Even more remarkably, the problem wasn't noticed until now.

The exclusion wasn't an oversight by the archive itself. The group's servers grab virtually every public webpage by default. According to the Archive's Chris Butler, someone within the New York State government requested, way back in 2001, that a broad swath of domains be eliminated from the archiving process.

"It was at a time when the Wayback Machine had only been public for a short time," Butler said, "for less than a year." Butler explained that the situation was particularly odd because government websites are among the group's highest priority for preservation. "They're obviously really important records."

Who requested that the sites be removed? Butler isn't exactly sure. And why? He's not sure about that either.


18 comments
dignifiedprotection

New York State and most of its agencies routinely block access to webpages. I use Wayback all the time because it is my only way to access a webpage and file a complaint against an individual or agency, or to access a form that can help you fight a summons, etc. This has been going on for a long time, and trust me, it's your State Government that is blocking your access to justice. This article should have been a little more inquisitive. I have been fighting for justice and it is very hard!!! This is just one hurdle and Wayback helped.

WHarkavy

@archivesmatter why are Village Voice Web stories even from 6 years ago impossible to find or have broken links?

WHarkavy

@archivesmatter Partly self-interest. Most links to my VV blogs 2004-11 broken for years (even while I was a senior editor there).

j0ncampbell

@jefferson_bail and having read your link I see what you're saying, too many variables for a simple shorthand ref. Thx again.

WHarkavy

@archivesmatter Bulk redirects would've been easy, but org. fired techs, didn't replace 'em, doesn't care.
