Supercharge Your Zotero Library Using Paper Machines: Part II


Visualization of the first ten topics extracted from a Zotero collection.





In my last post I discussed how Paper Machines, the text analysis add-on for Zotero, can help you visualize your research. Some of Paper Machines' features are pretty self-explanatory, but others are less intuitive. Here I've tried to expand on some of the potentially complicated aspects of Paper Machines to supplement the documentation available on the developer's site.

Getting Started
Paper Machines is available for Zotero Standalone and Mozilla Firefox. To install the Paper Machines add-on in Firefox, download the XPI file, then load it by navigating to tools → add-ons → install add-ons → get add-on from file. In Zotero Standalone, navigate to tools → add-ons → gear icon → install add-on from file.

Once you've installed the add-on, you can adjust various default settings.

  

You can analyze the contents of your Zotero library by right-clicking on any collection and selecting “Extract Text for Paper Machines.” Once the text is extracted, you have the option of running various processes and viewing the corresponding visualizations.

Word Cloud
Paper Machines’ default word cloud is automatically displayed at the lower left corner of the Zotero pane. You can also compare sets of text using multiple word clouds, which can be divided either chronologically or by subcollection. This option requires that you select among multiple filter methods:

  • None produces a simple word cloud based on raw frequency.
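  • Tf*idf (term frequency times inverse document frequency) weighs how often a word appears in a document against how many documents contain it, filtering out words that are common across the whole corpus and therefore not distinctive.
  • Dunning’s log-likelihood measures how much more (or less) often a word occurs in one corpus of text than its frequency in another would lead you to expect, highlighting the words whose usage differs most between the two.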
  • Mann-Whitney U assesses how consistently a given term appears in one corpus versus another. Here’s a good post about the differences between Dunning’s log-likelihood and Mann-Whitney U.
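
To make the first two filters more concrete, here is a minimal Python sketch of the arithmetic behind tf*idf and Dunning's log-likelihood. It is not Paper Machines' own code: the function names `tf_idf` and `dunning_g2` are mine, the tf*idf weighting is the plain textbook form, and the log-likelihood uses the common two-term simplification from corpus linguistics rather than the full contingency table.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs is a list of token lists. A word's score is its relative
    frequency in the document times log(N / documents containing it).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def dunning_g2(count_a, total_a, count_b, total_b):
    """Log-likelihood (G2) for one word in corpus A versus corpus B.

    count_* is how often the word occurs, total_* is the corpus size
    in words. Two-term simplification (an assumption of this sketch).
    """
    combined_rate = (count_a + count_b) / (total_a + total_b)
    g2 = 0.0
    for observed, total in ((count_a, total_a), (count_b, total_b)):
        expected = total * combined_rate
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical toy data: two tiny "documents" and made-up word counts.
docs = [["the", "war", "peace"], ["the", "trade", "the"]]
print(tf_idf(docs)[0])                # "the" scores 0 because it is in every document
print(dunning_g2(30, 100, 5, 100))    # large value: usage differs between corpora
print(dunning_g2(10, 100, 10, 100))   # near zero: same rate in both corpora
```

A word that appears in every document gets a tf*idf score of zero, which is why it drops out of the filtered cloud. A word used at the same rate in both corpora gets a log-likelihood near zero, so only words with lopsided usage stand out.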

Topic Modeling
Using the MALLET toolkit, Paper Machines can determine what topics (derived from groups of words that appear together) arise most frequently in your text. Topics can be charted over time (in days), within specific subcollections, or by mutual information. You can also adjust the topic modeling settings, including:

  • Tf*idf (See above.)
  • Porter stemming modifies words by removing their suffixes. “Worked” and “working,” for example, would both be counted under the word “work.”
  • JSTOR for Data Research uses data from JSTOR to supplement the data in your Zotero library. You must have a JSTOR account to use this function.
  • Number of Iterations (under "Advanced Options"): Paper Machines defaults to 1000. The larger the number of iterations, the longer the sampling will take, but smaller numbers will produce lower-quality models.
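
To illustrate what stemming does to word counts, here is a deliberately simplified suffix stripper in Python. It is not the Porter algorithm, which applies a much longer series of rules, and it is not what Paper Machines runs internally. The function name `simple_stem` and the three suffixes it handles are assumptions chosen only to reproduce the "worked"/"working" example above.

```python
from collections import Counter


def simple_stem(word):
    """Strip one of a few common English suffixes.

    A toy stand-in for a real stemmer: it only knows "ing", "ed", and "s",
    and it leaves a word alone if stripping would leave fewer than 3 letters.
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["Worked", "working", "work", "works"]
counts = Counter(simple_stem(token.lower()) for token in tokens)
print(counts)  # Counter({'work': 4})
```

Without stemming, these four tokens would be counted as four different words. With it, they reinforce a single term, which can make topics easier to read.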


Right-click on a Zotero collection for the basic Paper Machines menu.


There are a number of other adjustable fields under "Advanced Options," but the default settings should work well for almost everyone. If you're interested in delving into the mechanics of topic modeling, I'd suggest starting with this post from The Programming Historian 2, as well as "A Whirlwind Tour of Automated Language Processing for the Humanities and Social Sciences," a book chapter by Douglas Oard.

Certain Paper Machines functions (for example, Periodical PDF Import and Classifier) are still in the experimental phase, so I'll explore them after they've been updated further. Be sure to select "automatically update" under the add-on preferences so you can benefit from the functionality that's continually being added to Paper Machines.

Authored By: 

Sarita Alami is a Graduate Fellow at DiSC.

How to use Paper Machines, the add-on that incorporates a range of text visualization tools into your Zotero library.
