SLA News Division Awards Announcement

On Monday, Eli Edwards, past chair of the SLA News Division, posted this note to NewsLib:

Dear Members of the News Division …

It is my honor and privilege to announce this year’s awardees of the Division’s Joseph F. Kwapil Memorial Award and the Agnes Henebry Roll of Honor Award.

The 2013 Agnes Henebry Roll of Honor Award will go to Catherine Kitchell and Derek Willis. The Joseph F. Kwapil Memorial Award will be awarded to Justin Scroggs.

Details for the awards ceremony in San Diego during the 2013 Annual Conference are still being finalized. If you plan to attend the annual conference and want to attend the ceremony, please email me and your name will be placed on a list of potential attendees.

Thanks to the Awards Committee this year for their efficient and invaluable work: Carolyn Edds, Peter Johnson, Toby Lyles, Leigh Montgomery, and Justin Scroggs (no, he didn’t vote for himself; the rest of the committee thrust the honor upon him).



Another Way to Look at the Congressional Record

I’m going to start this Tech Tips post by assuming that, unlike me, most people don’t love reading the Congressional Record. But newsrooms often need to know what’s in it, and what their elected representatives said and when.

The tried and true route for answering such questions is to search Thomas, the Library of Congress legislative site, or C-SPAN’s Congressional Chronicle. Now the folks at the Sunlight Foundation offer another way to see what members of the House and Senate say in the Record: Capitol Words.
Capitol Words is a search engine for the Record with some nice extras built in. You can track the popularity of particular words or phrases over time, for instance, or browse the most popular terms outright. The site allows you to narrow the focus to lawmakers from a particular state or party, too. For example, here’s what a comparison of the terms “bailout” and “big banks” looks like.
The site also allows you to browse by date, so you can see popular words and phrases month by month or even day by day. Individual terms have their own pages, letting you see the history and popularity of a word such as “preexisting”.
Sunlight gets the text of the Record straight from the Government Printing Office, so it’s the official version. Capitol Words just applies a bit of structure to it by attempting to definitively identify every speaker and index every word. See what you can learn about your state’s delegation by exploring it.
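That structure, identifying each speaker and indexing each word, boils down to an inverted index. Here is a minimal sketch of the idea in Python; the speeches, dates and names below are invented for illustration, not taken from the Record:

```python
from collections import defaultdict

# Hypothetical, simplified records: (date, speaker, text of remarks)
speeches = [
    ("2011-03-01", "Sen. Smith", "We must address the bailout now"),
    ("2011-03-02", "Rep. Jones", "Big banks drove the bailout"),
]

# Build an inverted index: word -> list of (date, speaker) occurrences
index = defaultdict(list)
for date, speaker, text in speeches:
    for word in set(text.lower().split()):
        index[word].append((date, speaker))

# Every use of "bailout" is now one lookup away
print(index["bailout"])
# → [('2011-03-01', 'Sen. Smith'), ('2011-03-02', 'Rep. Jones')]
```

The real site layers charting and state/party filters on top, but the lookup at its core works much like this.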
And by the way, if you haven’t checked out that C-SPAN site I referenced, it’s a great way to isolate video or find recordings of a particular speaker – not just in Congress but of any C-SPAN appearance.

New 2010 Census Data Project from IRE

We all know that the 2010 Census data has started to come out, giving newsrooms updated demographic information from the once-every-10-years count of the American population. The Census Bureau website has detailed information and a schedule of new releases. But what if you just want to quickly download population figures and do some simple comparisons?

A new project by Investigative Reporters and Editors and the Donald W. Reynolds Journalism Institute at the University of Missouri offers quick access to 2010 Census data in a variety of formats and ranges. The project, built by volunteers from the Chicago Tribune, New York Times, USA Today, CNN, Spokesman-Review (Spokane, Wash.) and the University of Nebraska-Lincoln, allows users to browse data from states, counties, places and even individual tracts, providing 2000 Census data for comparison.

For example, data from Alachua County, Fla., shows that the total population grew some 13.48 percent from 2000 to 2010 (the project helpfully includes both raw numbers and the percentage change). And each page can be downloaded as a CSV file that can be opened in Excel or another spreadsheet program for analysis, although you’d be best served to have a reference to Census data headers handy. Luckily, the IRE Census project provides one.
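That percentage change is easy to reproduce once you’ve downloaded a CSV. A quick sketch in Python using the Census counts for Alachua County (217,955 in 2000 and 247,336 in 2010); the column names here are invented, so check them against the project’s header reference before adapting this:

```python
import csv
import io

# A hypothetical two-column extract of the kind of CSV the project serves;
# real files have many more columns (see IRE's header reference).
data = """geography,pop_2000,pop_2010
Alachua County,217955,247336
"""

for row in csv.DictReader(io.StringIO(data)):
    p2000, p2010 = int(row["pop_2000"]), int(row["pop_2010"])
    change = (p2010 - p2000) / p2000 * 100
    print(f"{row['geography']}: {change:.2f}% change")
# → Alachua County: 13.48% change
```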

Developers who want to use Census data directly in Intranet or Internet applications can also get their fix via the project’s JSONP files, which make it easy to read the data programmatically. You can even download a shapefile of geographic data, and the project allows you to compare multiple geographies (such as a five-county area) if you want to.
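If you’re curious what reading those files involves: a JSONP response is just JSON wrapped in a JavaScript callback call, so consuming it outside a browser means stripping the wrapper. A sketch, with a made-up callback name and payload:

```python
import json
import re

# A JSONP response wraps JSON in a function call; this payload and
# callback name are invented for illustration.
jsonp = 'censusCallback({"county": "Alachua", "pop2010": 247336});'

# Strip the callback wrapper to recover plain JSON
body = re.match(r'^\s*\w+\((.*)\)\s*;?\s*$', jsonp, re.S).group(1)
data = json.loads(body)
print(data["pop2010"])  # → 247336
```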

The Census Bureau is still your source for reports and in-depth releases based on the 2010 Census data, but if you want to play with the data yourself and don’t want to download the entire set (which can be very large and hard to manage), IRE’s project is a great way to dip your toes in the Census pool.

–Derek Willis

A Tech Tips Roundup

It has been a long winter here in the mid-Atlantic, which means I had plenty of time indoors to check out some great new tools and utilities for libraries and librarians. There’s no theme or organization to these, but they represent some of the best things I’ve seen in the realm of creating and managing information. Let’s get to it!

CAR Conference
This year’s Computer Assisted Reporting conference by Investigative Reporters and Editors (IRE) was loaded with good ideas and tutorials. Chrys Wu has collected a lot of them on her blog, and many of them have applications in the library, such as:
  • NodeXL for Social Network Analysis by Peter Aldhous of New Scientist. Shows how an Excel plugin can be used to create a network analysis. Great for projects that involve keeping track of many people.
  • Google Refine tutorial and datasets by David Huynh of Google. If you ever have to clean up or standardize some information, Google Refine might make your life much easier. It runs on your desktop, too.
  • A Gentle Introduction to SQL using SQLite: slides, full tutorial and steps only by Troy Thibodeaux of the AP. If you want to get started learning databases, this is a great way to go. SQLite is most likely already on your computer.
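SQLite really is likely on your computer already, and Python, for one, ships with a driver. Here’s a first taste of the kind of grouping query a tutorial like that covers, using an invented table of lawmakers:

```python
import sqlite3

# An in-memory database: nothing to install or configure
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE members (name TEXT, state TEXT, party TEXT)")
con.executemany("INSERT INTO members VALUES (?, ?, ?)", [
    ("Smith", "FL", "R"),
    ("Jones", "FL", "D"),
    ("Lee", "CA", "D"),
])

# Count members by party -- the bread and butter of database research
for party, n in con.execute(
        "SELECT party, COUNT(*) FROM members GROUP BY party ORDER BY party"):
    print(party, n)
# → D 2
#   R 1
```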

Linkypedia

With the rise of Wikipedia, many researchers are interested in how often and where links to external sources appear in the volunteer-edited encyclopedia’s article references. Ed Summers, a developer at the Library of Congress who will be speaking at the Special Libraries Association conference in June, has a project that tracks references in Wikipedia.
Called linkypedia, the project can be set up and run on a local computer. It accepts one or more domain names or URLs and scans Wikipedia for links to them in article references. Here’s a screenshot of the results for a search on our congressional database at The Times:
Ed is working on making a public, hosted version available for others to use, but if you have administrative rights on your PC and are feeling adventurous, you can try to set it up yourself. I’ll be glad to walk anyone through the process.
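The core of what linkypedia does, scanning article text for external links that point at a given domain, can be sketched in a few lines. The article markup and domain below are invented for illustration; the real project works against actual Wikipedia data:

```python
import re

# An invented snippet of article reference markup
article = """
<ref>http://bioguide.congress.gov/scripts/biodisplay.pl?index=W000738</ref>
<ref>http://example.com/other-source</ref>
"""

domain = "bioguide.congress.gov"

# Find external URLs, then keep only those on the target domain
urls = re.findall(r'https?://[^\s<>"]+', article)
hits = [u for u in urls
        if re.match(r'https?://(?:[\w-]+\.)*' + re.escape(domain), u)]
print(len(hits), "link(s) to", domain)
# → 1 link(s) to bioguide.congress.gov
```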
New FOIA Resources

For those of us who deal with public records requests, a new site from the Department of Justice may come in handy. Launched in March as a portal for news and statistics about the processing of federal Freedom of Information Act requests, it shows the percentages of requests granted in full, granted in part and denied by each agency, along with other reports. Good reading for anyone who wants to become a better FOIA requester.
A few days earlier, the folks at the Investigative Reporting Workshop, based at American University, launched a new blog called Exemption 10 that covers federal FOIA issues. It is written primarily by Wendell Cochran, the Workshop’s senior editor.
A Basketball Database

This being March, basketball is a big topic. And while it’s not college-focused, the Los Angeles Times recently took the information it had collected about the NBA’s Los Angeles Lakers and compiled it into a searchable and browsable database that should inform and entertain fans. It also is a great internal resource for reporters, since they’ll have a single place to look up facts and refer to when trying to settle those all-important sports desk debates.

Tech Tip: DocumentCloud for Librarians

Documents – and the information they contain – are the lifeblood of news organizations. We read them, write about them, discuss them. And, every so often, we do the right thing and let our readers and listeners do the same. DocumentCloud is a project that could change the way that newsrooms deal with primary source materials such as government reports or court decisions. And news researchers should take full advantage of it.

First, a disclosure: DocumentCloud is run by a group that includes my current boss, Aron Pilhofer, and when I’m in New York I sit near the site’s developers. So I’ve got a bias. But I think that when you look at what other journalists have been able to do with it, and consider the internal newsroom uses as well, you’ll agree that it’s a valuable tool.
One of DocumentCloud’s great strengths is freeing information from file formats like the PDF, allowing it to be read and searched as you would almost any Web page. In this way, documents become a seamless part of the story, not a distracting trip away from it. The Memphis Commercial-Appeal recently used DocumentCloud to help present its project on civil rights era photographer Ernest Withers, who was also an FBI informant.
DocumentCloud made it easy not just to view the FBI reports, but also simple for reporters to draw readers’ attention to the important bits via its annotation feature (it’s the part in yellow here). That static PDF can now be a more interactive document, more Webby, if you will. The Arizona Republic used it to help explain SB1070 and the impact of a federal judge’s ruling on it, inviting two attorneys to add their expertise.
These are two examples of public projects, but DocumentCloud can also be used to store documents that newsrooms might not want to share externally (yet). It’s a great way to maintain a set of files that anyone in the newsroom can access and annotate, making it a good candidate for long-term project work. And when you’re ready to show that work to the world, you can make any or all of the files public.
So how does it work? If you have, say, a PDF, you can upload it to DocumentCloud’s servers, where it will be scanned and have the text extracted (electronic PDFs yield a better result, but DocumentCloud tries its best for images using Optical Character Recognition software). Then you can annotate the finished document. For more details, check out the FAQ.
DocumentCloud is free to use, so there’s not much stopping you from giving it a try. Contact the folks there to get an invite; they love working with news organizations.

Newsroom Wiki Session

This morning’s session on newsroom wikis featured demonstrations of several research wikis in use at papers from St. Louis, Columbus and Raleigh. MediaWiki seems to be a popular choice for newsroom wikis, in part because it’s very customizable and visually familiar. In most cases, newsroom wikis are being edited only by researchers; broader newsroom access doesn’t seem to be very widespread yet. Thanks to Jessica, Mike Meiners, Susan Ebbs and Jim Hunter for a great session.

An Intro to My Session

Newslibbers, please forgive the shameless plug, but I wanted to give an introduction to my session next week at SLA. Somehow, the title got a little muddled, so it reads as “News Researchers: News Research in the Newsroom,” which could only have been envisioned by the Department of Redundancy Department. At 7:30 a.m., perhaps a snappier title would be better.

What I’m hoping for is a conversation, which requires two things: an audience, and an idea of what I’m talking about. So here it is: for the past year I’ve been writing a series of essays about using technology to improve reporting and research in the newsroom. Next week, I’ll be showing some examples of the stuff I’ve written about and explaining how researchers can implement them in their newsrooms. Hope to see you there — your questions and comments are welcome.

– Derek Willis

Fixing Journalism…an article from Derek Willis on his TheScoop website

Fixing Journalism:
The Washington Post’s Derek Willis has challenged news librarians and researchers to be information technology evangelists in their newsrooms, after publishing an essay on what’s wrong with journalism on his TheScoop website. The link was posted to NewsLib a few days ago and has now been linked to by Romenesko.

SLA Conference 2004: Money in Politics

Money in Politics:

Derek Willis is presenting a seminar at the conference next week and needs to expand his presentation due to the illness of another presenter.

Derek wants to know what attendees might like to see included in this presentation. What do news librarians and researchers need to know about campaign contributions, candidate finances, and the like?

Please email Derek at: derek (at)