Ethan Zuckerman ’93 blogs about MediaCloud, a new project sponsored by the Berkman Center for Internet & Society that visualizes flows of news and information on the internet:

Media Cloud is a platform to help researchers find quantitative answers to questions like:

– What types of stories are covered more heavily in blogs than in newspapers?
– How does coverage of a topic like Iran differ between national newspapers, local newspapers and political blogs?
– How much overlap in coverage do two news sources have? If you’re reading the New York Times and the Boston Globe, how much topical difference do the sources have?
– How do news stories move between bloggers and mainstream journalists? How common or infrequent is it that bloggers “break” stories or introduce new analytic frames?

For six months [..] we’ve been collecting data from several hundred US political blogs, from the US’s largest newspapers, a selection of smaller newspapers and some international newspapers and news agencies like the BBC. We subscribe to these sources’ RSS feeds, retrieve the full HTML of every piece of content posted, use a set of algorithms to separate story text from formatting information, and feed the story text into Calais and other classification tools to associate “named entities” and topic tags with each story.
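
To make the first steps of that pipeline concrete, here is a minimal sketch in Python. It is not MediaCloud's code: the function names `item_links` and `story_text` are made up for illustration, the text extraction is a deliberately naive stand-in for the "set of algorithms" mentioned above, and the Calais classification step is left out because it depends on an external service.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def item_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document.

    Items without a <link> yield an empty string.
    """
    root = ET.fromstring(rss_xml)
    return [item.findtext("link", default="").strip() for item in root.iter("item")]


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def story_text(html):
    """Hypothetical, naive stand-in for separating story text from formatting."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

In a real crawler, each link returned by `item_links` would be fetched (for example with `urllib.request.urlopen`) and the resulting HTML passed to `story_text`. A production extractor would also need to drop navigation, ads and comments, which this sketch does not attempt.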

The result? We can report, with a pretty good degree of certainty, the main topics covered on Fox News in the past week. Or on any of a thousand other news sources. […] We’re also releasing tools that let you dive more deeply into the data – you can see what topics are most closely associated with a term like “Iran” in different media sources, or build maps that visualize what parts of the world different media sources are paying attention to.
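
One simple way to picture the "topics most closely associated with a term" idea is to count which tags co-occur with that term in each source's stories. The sketch below assumes that approach. The post does not say how MediaCloud actually computes association, and the function name `associated_topics` and the `(source, tags)` input shape are illustrative assumptions.

```python
from collections import Counter, defaultdict


def associated_topics(stories, term, top_n=3):
    """For each source, count the tags that co-occur with `term`.

    `stories` is an iterable of (source, tags) pairs, where `tags` is any
    collection of topic tags attached to one story (an assumed input shape).
    Returns {source: [(tag, count), ...]} with at most `top_n` tags per source.
    """
    counts = defaultdict(Counter)
    for source, tags in stories:
        tags = set(tags)
        if term in tags:
            # Count every other tag that appears alongside the term.
            counts[source].update(tags - {term})
    return {source: c.most_common(top_n) for source, c in counts.items()}


# Invented sample data, for illustration only.
sample = [
    ("newspaper", ["Iran", "nuclear", "UN"]),
    ("newspaper", ["Iran", "nuclear"]),
    ("blog", ["Iran", "election"]),
    ("blog", ["economy"]),
]
print(associated_topics(sample, "Iran"))
```

Raw co-occurrence counts favor tags that are common everywhere, so a real comparison across sources would probably normalize by how often each tag appears overall.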


We’re releasing all the code created for MediaCloud under the GPL later this month, and hope to make a dump of the data we’ve collected thus far accessible shortly afterwards. We’re reaching out to academics and researchers around the world to help them build experiments that lean on the MediaCloud data, and we’re planning on making it possible for other folks to build experiments via the API in the near future.

Two examples of MediaCloud’s analysis:

You can create your own version of the map above by going here.

Watch Ethan talk about MediaCloud here, or try it out for yourself.

