Does it matter if libraries and archives aren’t involved with open government data repositories?

Accessing information about government no longer has to mean going to a building and requesting permission to sift through paper documents. It doesn’t even have to mean writing a letter, filling out a complex form, or trying to figure out who to contact about public records or how to access records in the first place.

Technology has enabled faster, more efficient and more user-friendly access to government information — to public information — and governments across the country are increasingly embracing this opportunity in their policies and practices. One way to do this is to adopt open data policies that build on precedent set by existing policy like public records laws.

-From the Sunlight Foundation’s post “Open data is the next iteration of public records.”

Government transparency is a good thing, and so I am happy to see the proliferation of open government data repositories, like data.gov and its equivalents at the state level. One worry I have, however, is that libraries and archives do not seem to be involved in creating, managing, disseminating, or describing this information. Most of these projects come out of a state’s IT agency, the budget department, or some other administrative actor. I will admit that I have not looked at all of the open government data repositories out there, so there may be libraries and archives involved in some of them. But this is the kind of data that has traditionally been provided by libraries and archives, and now it is being taken out of our hands and being served elsewhere.

When I initially sat down to write this post, I was ready to be full of angst about the state of archives in our society, and about our roles being taken over by IT departments and computers. But open data repositories generally deal with data that is machine-readable, and usually machine-created. We archivists, on the other hand, have an explosion of digital data to deal with, the natural extension of the explosion of paper records during the middle of the 20th century; emails, Word documents, Excel spreadsheets, websites, blogs, Twitter accounts, Facebook, and more still defy easy manipulation by computers and easy aggregation into data repositories. One could argue that by taking on the burden of providing datasets that are easily described in the aggregate, open data repositories allow archivists to concentrate on records that cannot be described so easily or automatically. In addition, many of these datasets are provided in machine-readable format, which lends itself to description by machines rather than archivists. One of the things at which archivists excel is taking a large mass of data that is difficult or impossible to describe automatically and providing researchers with access to it. But there is always an existential pang in my soul when a record previously provided by archives moves elsewhere.

So what do you think? Does it matter? Am I being an open government hipster, saying that libraries and archives were into open data before it was cool? Or should we cheer the move of these types of records to a different sector?

The Library of Virginia now has emails from Tim Kaine’s administration available online!

Since being hired by the Library of Virginia just over a year ago, a part of my job has been to process emails from the administration of Governor Tim Kaine, who was Governor of Virginia from 2006 to 2010. And last week, the first fruits of our labors were realized, with the release of over 66,000 emails from Tim Kaine and his executive office. I was only a very small part of this overall effort, but I am proud to be a part of an organization that makes this kind of commitment to making digital public records accessible. You can find the whole collection on our Virginia Memory site; have a look!

Why I use free software

As everyone by now has heard, Google has decided to pull the plug on Google Reader, its RSS feed aggregator. Google’s reason is that Reader has a declining user base and that it wants to concentrate on other projects. I should have known that this was in the works ever since they killed the social aspects of Reader and made its link harder to find. Unlike some other companies, Google is providing users with an easy way to extract their data and giving them three months before the service is shut down.

But let’s be clear: their motives are nothing if not in their own self-interest. It was one thing to allow Google Reader to continue to operate when, while it provided no benefit, there was no harm in allowing it to occupy a small section of Google’s massive server farms. But now the rumor is that Google will be launching its own competitor to Apple’s Newsstand, a place where people can pay for digital subscriptions to online content. Many of those content providers publish at least some of that content for free, often with RSS feeds; with Google axing Reader, it removes any perceived conflict that might prevent those content providers from joining up.

I tend to go back and forth on the scale of free software ideology vs. convenience, with this decision by Google propelling me firmly back towards the side of ideology. Projects like Firefox and Debian (I am a proud user of both), run by non-profits that are big enough to compete with industry-backed alternatives, are key to a healthy software ecosystem. Of course for-profit companies, like Canonical, Red Hat, and Google, are necessary and useful members of the free software community; however, having organizations dedicated to free software for its own sake helps keep the ecosystem vibrant and alive. While I may not always agree with the goals and methods of the Free Software Foundation, I admire their dedication and think that they are a key member of the community. The ability to have control over your data, without needing to rely exclusively on a third party for its continued existence, is more important than ever in the era of cloud computing. (It’s also the reason that I think libraries and archives should be even stronger supporters of free software than they already are, but that is a topic for a separate post.)

As of now, I’m trying out two different replacements for Google Reader: NewsBlur and tt-rss. Both are free software projects: NewsBlur is primarily a hosted service, though you can get the code from GitHub, and tt-rss is only available to those who can install it on a server. Both of these options also have Android apps, which is the other key requirement for me. NewsBlur is still too slammed for me to give it a proper tryout, so tt-rss (installed on my Raspberry Pi!) is my main feed reader right now. If you are looking for an alternative to Google Reader, try one of those if you can.

Science Friday talks digital preservation

Everyone’s favorite Friday NPR show, Science Friday, came across a topic that is near and dear to my heart: digital preservation! This past week they talked about a new method of long-term preservation, which entails encoding the data onto DNA. Obviously, the practical use of this technology is still decades away, but this could solve the problem of media obsolescence.

AOI, now powered by Raspberry Pi!

Among Other Items, my humble blog, has been undergoing a bit of a change behind the scenes. Formerly, this site was powered by an old desktop computer, probably released in 2002, which had been converted into a Debian server. Now, this site is powered by a Raspberry Pi, a $35 computer which is running a variant of Debian. I can tell that the site is a little slower, but it’s worth it for the ability to experiment with this new little machine and because of the power savings it will reap.

If you want to learn Linux, I highly recommend getting a Raspberry Pi. Lifehacker has a lot of different recommendations for projects that you can try, and I will probably have some too if you are interested!

Going to MARAC!

I hope to see anyone and everyone at MARAC next weekend! I’ll try to post some updates like I did last year, even if I am able to tweet this year.

Disappearing documents, Archives Team, Facebook, and more: Link roundup 10/4/11

This time, on the link roundup, there are disappearing documents, distributed digital archives projects, inaccurate quilts, government records, and more!

Link roundup, 9/30/2011

Hi all! I am going to start posting a roundup of interesting archives/library/information science related links. Hopefully this will be at least weekly. Today’s installment includes Henry Rollins, sexy archivists, QR codes, a concerto, and more!

The intersection of soccer and archives

As some of you may be aware, I am a bit of a soccer nut. I’m a fan of DC United and the Men’s and Women’s United States National Teams. Last summer, I created a new Twitter account for my soccer ramblings, just so that all my archivist friends wouldn’t have to listen if they didn’t want to. However, the worlds of soccer and archives have been intersecting recently.

The first intersection of soccer and archives is the fact that Vancouver Whitecaps FC, a team in Major League Soccer, has hired an archivist for the rest of the season. There was a post about it on a Canadian archives listserv, but I am unable to find a link to the actual job posting. I emailed the Whitecaps to see if they would send me the posting, but no word on that yet. I would be interested to see what kind of records they are looking to preserve; memorabilia and paperwork from the club itself, of course, but are they going after more? There is a world of digital content, blogs, tweets, Flickr pictures, and supporters’ club items that they could also try to get.

The second intersection is one that we are all used to: an interesting old document being put up for auction and sold for a ridiculous sum of money. In this case, the oldest soccer team in the world, Sheffield FC, has sold the oldest handwritten copy of the rules of soccer for the tidy sum of $1.42 million. That document, as well as others, was sold to raise money for the club, which currently plays on the seventh level of English soccer. This amount of money is a gamechanger for a team on this level, three rungs below fully professional leagues, and could allow the club to pay for players good enough to help it jump up into more prestigious, and more lucrative, leagues. However, there is always a twinge in my heart when a collection is broken up and sold into private hands. Hopefully someday these materials will be back in the public’s custody, or perhaps the new owner will allow them to be digitized and made available that way.

Things I might have tweeted, part 6: digitizing oral histories #marac

Jennifer Snyder, AAA
Uses Sound Directions from IU.

Pet peeve of mine: gold “archival” CDs. They are no more reliable than regular CDs, cost more, and give you a false sense of security. If you need CDs, for whatever reason, just use regular CDs.

Vendor continuity and relationships are important in successful large-scale digitization projects.

Kate Stewart, American Folklife Center

Created a database to figure out who has been interviewed with regard to the Civil Rights Movement. Trying to do new interviews with people who have never been interviewed before. The list should be published soon.

Oral history interviews are going more and more to video, although they will still do audio only if the interviewee insists.

Beth Millwood, UNC SOHP

Four keys for oral history programs: Ongoing consulting, training, returning work to the community, and giving the oral histories to the larger community.

An ideal oral history program: has a continued connection with those interviewed, a willingness to train at several levels, and access to resources for community groups.