Does it matter if libraries and archives aren't involved with open government data repositories?

18 April 2014

Accessing information about government no longer has to mean going to a building and requesting permission to sift through paper documents. It doesn’t even have to mean writing a letter, filling out a complex form, or trying to figure out who to contact about public records or how to access records in the first place.

Technology has enabled faster, more efficient and more user-friendly access to government information — to public information — and governments across the country are increasingly embracing this opportunity in their policies and practices. One way to do this is to adopt open data policies that build on precedent set by existing policy like public records laws.

-From the Sunlight Foundation’s post “Open data is the next iteration of public records.”

Government transparency is a good thing, so I am happy to see the proliferation of open government data repositories, like data.gov and its equivalents at the state level. One worry I have, however, is that libraries and archives do not seem to be involved in creating, managing, disseminating, or describing this information. Most of these projects come out of offices such as a state's IT agency, the budget department, or some other administrative body. I admit that I have not looked at every open government data repository out there, so libraries and archives may be involved in some of them. But this is the kind of data that libraries and archives have traditionally provided, and now it is being taken out of our hands and served elsewhere.

When I initially sat down to write this post, I was ready to be full of angst about the state of archives in our society, and about our roles being taken over by IT departments and computers. But open data repositories generally deal with data that is machine-readable, and usually machine-created. We archivists, on the other hand, have an explosion of digital data to deal with, the natural extension of the explosion of paper records in the middle of the 20th century. Emails, Word documents, Excel spreadsheets, websites, blogs, Twitter accounts, Facebook, and more still defy easy manipulation by computers and easy aggregation into data repositories. One could argue that by taking on the burden of providing these datasets, which are easily described in the aggregate, open data repositories free archivists to concentrate on records that cannot be described so easily or automatically. In addition, many of these datasets are provided in machine-readable formats, which obviously lend themselves to description by machines rather than archivists. One of the things archivists excel at is taking a large mass of data that is difficult or impossible to describe automatically and making it accessible to researchers. But there is always an existential pang in my soul when a record previously provided by archives moves elsewhere.

So what do you think? Does it matter? Am I being an open government hipster, saying that libraries and archives were into open data before it was cool? Or should we cheer the move of these types of records to a different sector?