
Supercharge Your Zotero Library Using Paper Machines: Part I


Topic Modeling output for a Zotero collection using Paper Machines


Paper Machines, the add-on that integrates a range of text analysis tools into Zotero, has generated quite a buzz in the short time since its release. For those of us who store notes, citation information, PDFs, and article links in huge Zotero libraries, Paper Machines has the potential to be a game-changer in how we visualize our research.

Because Paper Machines is so new, it's being updated with added functionality every few days. I'll provide step-by-step documentation for how to use specific components of Paper Machines in Part II of this post. For now, I'll discuss whether Paper Machines might be a good fit for your research, the tools that it offers, and how it might help your work.

Paper Machines provides a broad range of text analysis tools, but it's not meant for everyone's research. You'll probably benefit most from Paper Machines if you:

  1. Already use Zotero to manage your sources. Paper Machines draws on a number of open-source tools available elsewhere on the web. If you want to visualize your data but aren't already comfortable using Zotero, you might want to look elsewhere.
  2. Have a relatively large or robust Zotero library. At the time of this posting, Paper Machines incorporates the full text of Web snapshots and OCR'd PDF files into its text analysis, as well as the title, place, date, and subcollection of a source. The option to include notes, tags, and links to live websites will be available shortly.
  3. Are collaborating on a Zotero library with a group. Paper Machines is very good at helping you figure out the contents of a collection. If you're working on a collection with multiple group members, it's a quick way to visualize what kinds of material your collaborators are adding.

What kinds of analysis tools does Paper Machines employ?

  • A word cloud with the option to filter out commonly used words.
  • Phrase nets, which allow you to visualize relationships between common words in your text (for example, "x and y" or "x is y").
  • A Geoparser, which uses location information to produce beautiful visualizations of the places mentioned in your texts.
  • DBpedia Annotation, which produces a visualization of what people, places, and things are mentioned in your texts.
  • MALLET-based topic modeling, which generates visualizations based on commonly occurring topics in your texts. The author offers some additional information about Paper Machines' use of topic modeling here.
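The word cloud's "filter out commonly used words" option is essentially a word-frequency count with a stopword list applied. The sketch below is not Paper Machines' own code; it is a minimal Python illustration of the idea, and the short `STOPWORDS` set and the `word_frequencies` helper are hypothetical (real tools ship much longer stopword lists).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "for", "on", "that", "with"}

def word_frequencies(texts, stopwords=STOPWORDS, top_n=10):
    """Count lowercase word tokens across all texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Treat runs of letters (and apostrophes) as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in stopwords:
                counts[word] += 1
    # The most frequent remaining words are the ones a word cloud draws largest.
    return counts.most_common(top_n)

docs = [
    "The history of labor and the labor movement.",
    "Women, work, and activism in the labor movement.",
]
print(word_frequencies(docs, top_n=2))  # [('labor', 3), ('movement', 2)]
```

Without the stopword filter, "the" would tie with "labor" at the top of this tiny example, which is why the filtering option matters for a readable cloud.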

What can Paper Machines help you do?

  • Assess the contents of a collection. Looking through the Paper Machines results is a helpful way to get to know the contents of a group library or to get reacquainted with a collection that you haven't used for a while.
  • Identify gaps in your material. Reviewing the MALLET output for a specific collection in my Zotero library (canonical works in US history), I noticed a surge in books about women's labor history during the 1980s (MALLET identified this topic using the terms women, labor, work, and activism). I also noticed a lack of items in my library about these topics since 2000.
  • Compare collections. Analyzing two collections with Paper Machines makes similarities and differences evident. Using topic modeling, for example, I could see what subjects came up most frequently in the two collections and if they coincided. The word cloud function is the easiest way to identify concurrent terms and subjects at a glance.
  • Find patterns in your collections. Using the "Phrase Net" function, I conducted an "x is y" analysis on one of my collections. I was surprised to see that "democracy is necessary" and "Cold War is necessary" were recurring phrases in a number of sources.
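To make the "x is y" analysis more concrete, here is a rough sketch of the counting a phrase net rests on: find every "x <connector> y" pair and tally how often each pair occurs. This is an assumption-laden simplification, not Paper Machines' implementation; the `phrase_net_pairs` helper is hypothetical, it matches only single words on each side (so "Cold War is necessary" becomes the pair war/necessary), and the sample sentences are made up.

```python
import re
from collections import Counter

def phrase_net_pairs(texts, connector="is"):
    """Count (x, y) word pairs appearing as 'x <connector> y' (case-insensitive)."""
    # \b keeps the match from starting mid-word; \s+ on both sides of the
    # connector keeps "is" from matching inside words like "island".
    pattern = re.compile(
        r"\b(\w+)\s+" + re.escape(connector) + r"\s+(\w+)\b", re.IGNORECASE
    )
    pairs = Counter()
    for text in texts:
        for x, y in pattern.findall(text):
            pairs[(x.lower(), y.lower())] += 1
    return pairs

docs = [
    "Democracy is necessary for peace.",
    "Some argued the Cold War is necessary.",
    "Democracy is necessary, they wrote.",
]
print(phrase_net_pairs(docs).most_common(1))  # [(('democracy', 'necessary'), 2)]
```

Passing `connector="and"` gives the "x and y" variant. A real phrase net would also drop pairs made of very common words (such as "it is a") before drawing the graph.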

The Geoparser links texts in a Zotero collection to the places that they mention.

Additional examples of Paper Machines visualizations are available on the developer's site. The add-on is available for Firefox and Zotero Standalone, and visualizations can also be saved as HTML files. While the occasional error or puzzling result is inevitable this early on, the creator of Paper Machines is constantly tweaking the interface in response to feedback from users.

Authored By: 

Sarita Alami is a Graduate Fellow at DiSC.

Paper Machines is an add-on that incorporates a broad range of text visualization tools into your Zotero library.

Mapping with OpenHeatMap and Geocommons



Even if data visualization isn't the primary goal of a project, adding an animated or interactive map can be an effective way to enrich a presentation, article, or lecture, and it doesn't have to take up huge swaths of time. As part of the Tweeting Occupy Wall Street project, I tested web-based mapping tools that would allow us to plot some of the 10 million tweets related to Occupy Wall Street.

Dozens of geographic data visualization tools, many of them open-source, are available on the web, but for this particular project I investigated tools that are 1) powerful enough to handle large data sets; 2) relatively easy to learn and share; and 3) free. Here's a rundown of the two tools that I found to be most effective in the Tweeting #OWS project: OpenHeatMap and Geocommons.

OpenHeatMap allows users to create static or animated heat maps (also called intensity or choropleth maps) based on data uploaded through a Google Doc or Excel spreadsheet. Heat maps plot values in a range of colors that indicate intensity, much like a meteorological radar map. OpenHeatMap was one of the most user-friendly tools I tested: all it requires to generate a map is 1) location information, in the form of latitude and longitude or state/country abbreviations; 2) a column of values (used to plot intensity); and 3) if you want an animated heat map, an optional column marked "time."
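As a rough illustration of preparing such a spreadsheet, the sketch below collapses individual geotagged records into one row per place per day and writes them as CSV, which can then be pasted into a Google Doc or opened in Excel. Only the "time" column name comes from the description above; the `latitude`, `longitude`, and `value` headers, the `heatmap_rows`/`to_csv` helpers, and the sample records are all assumptions for illustration, so check OpenHeatMap's own instructions for the exact header names it recognizes.

```python
import csv
import io
from collections import Counter

# Assumed header names; only "time" is taken from the post.
FIELDS = ["latitude", "longitude", "value", "time"]

def heatmap_rows(records, precision=2):
    """Aggregate (lat, lon, date) records into one counted row per place and date."""
    # Rounding coordinates groups nearby points into the same map location.
    counts = Counter(
        (round(lat, precision), round(lon, precision), date)
        for lat, lon, date in records
    )
    # Sort by date first so animated frames appear in chronological order.
    ordered = sorted(counts.items(), key=lambda item: (item[0][2], item[0][0], item[0][1]))
    return [
        {"latitude": lat, "longitude": lon, "value": n, "time": date}
        for (lat, lon, date), n in ordered
    ]

def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Made-up sample records: (latitude, longitude, date).
sample = [
    (40.7069, -74.0113, "2011-10-01"),
    (40.7071, -74.0109, "2011-10-01"),
    (33.7491, -84.3880, "2011-10-02"),
]
print(to_csv(heatmap_rows(sample)))
```

Dropping the "time" field (and the date from each record) would produce the input for a static rather than animated map.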

Customization options in OpenHeatMap allow creators to control features like the color and size of the data and map. After customizing, a user can simply use an autogenerated code to embed the map into a website. Alternatively, OpenHeatMap offers an option to host the map on your own site and fully customize it using its API. My attempt to do so was unsuccessful, however, and a number of sources suggest simply using the embed code to store and share a map.

More than a site for creating maps, Geocommons is a robust data analysis, management, and visualization platform. As its name suggests, Geocommons embraces the open-source model and strongly encourages users to make their maps and data public (20 megabytes of private data storage is also available with a free membership). This means that, along with uploading data, users can access hundreds of data sets, including census data, zip code and county maps, and much more.

It's possible to run analyses on data from within Geocommons, but I found it much faster to do the processing in Excel or Google Docs first and then upload a finished dataset containing the values I wanted to plot. Geocommons makes it easy to aggregate data into non-map geographic visualizations, like this chart I made of the top states with Twitter activity related to Occupy Wall Street.
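The pre-processing step described above (counting activity per state before uploading) amounts to a simple tally. Here is a minimal sketch, assuming each tweet has already been tagged with a US state abbreviation; the `top_states` helper and the sample list are hypothetical, and this does not reflect how Geocommons aggregates data internally.

```python
from collections import Counter

def top_states(tweet_states, n=5):
    """Count tweets per state abbreviation and return the n most active states."""
    # Normalize case and whitespace; skip missing or blank state values.
    counts = Counter(s.strip().upper() for s in tweet_states if s and s.strip())
    return counts.most_common(n)

# Made-up sample: one state value per tweet, with some messy entries.
states = ["NY", "ny", "CA", "GA", "NY", "CA", "", None]
print(top_states(states, n=2))  # [('NY', 3), ('CA', 2)]
```

The resulting (state, count) pairs are the kind of finished, two-column dataset that is quick to upload and chart, rather than asking the hosted tool to crunch every raw record.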

Geocommons serves up simple embed codes for its maps, as well as a customizable JavaScript API geared toward site developers. As with OpenHeatMap, embedding a map is either straightforward and noncustomizable or tricky and flexible. I'd suggest using the standard embed code unless you have some experience with JavaScript.

Authored By: 

Sarita Alami is a Graduate Fellow at DiSC.

Visualizing Words with Voyant



Authored By: 

Moya Bailey

When the graduate students arrived at DiSC for the fall semester, we were tasked with creating a visualization of the Emory Library Occupy Wall Street archive. We brainstormed with Jay Varner, our resident solutions analyst, about the best way to highlight what could be done with such a massive amount of data. We decided that the subset of geolocated tweets would provide an opportunity for some unique visualizations that would entice others to learn more about the archive and want to use it.
