To Quote or Not to Quote

How Repeated Citation Makes Shakespeare Legible (or Not)

data: JSTOR Labs | texts: Folger Library | visualizations: Derek Miller


Here you will find all of the dialogue from all of the plays by William Shakespeare.

But this version of Shakespeare's plays is different from those you usually find online. On this site the plays are, well ... kind of hard to read. That's because I've displayed them so that they reflect not simply the texts but rather how people cite Shakespeare's words in their own writing.

Using an API built by JSTOR Labs, I gathered the number of times every line from every play has been cited in JSTOR's journal collection. (JSTOR Labs explains in detail how they created their dataset. I restricted my search to 75% similarity and a match length of 15 characters.) With that data, I've calculated the average number of citations per line, normalized those results between 0 and 1, and then made the underlying text on this site more or less clear (I'll call this "fuzzifying"), as a function of the normalized citations per line calculation.
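For readers who want the arithmetic spelled out, here is a minimal Python sketch of the calculation described above. It is an illustration only, not the code behind this site. Several things in it are assumptions: the normalization is taken to be min-max scaling, the "fuzzifying" is modeled as a blur radius in pixels, and the function names (citations_per_line, normalize, blur_px), the max_blur parameter, and the sample numbers are all invented for the example.

```python
from typing import Dict


def citations_per_line(total_citations: int, line_count: int) -> float:
    """Average number of citations per line for one unit of text."""
    return total_citations / line_count if line_count else 0.0


def normalize(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values to the range 0..1 (min-max scaling, an assumption)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every item is cited equally often, so treat all as fully legible.
        return {name: 1.0 for name in values}
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


def blur_px(score: float, max_blur: float = 4.0) -> float:
    """Map a normalized score to a blur radius: 1.0 is sharp, 0.0 is blurriest.

    The linear mapping and the 4-pixel ceiling are hypothetical choices.
    """
    return round((1.0 - score) * max_blur, 2)


# Invented counts for three placeholder texts: (total citations, line count).
sample = {"Text A": (900, 300), "Text B": (400, 200), "Text C": (100, 100)}
rates = {name: citations_per_line(c, n) for name, (c, n) in sample.items()}
scores = normalize(rates)
blurs = {name: blur_px(s) for name, s in scores.items()}
```

In this sketch the most-cited text gets a blur of zero and the least-cited text gets the maximum blur, with everything else falling in between.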

To understand this process better, look at the menu on the left. I ran this operation first on every play in the Shakespeare canon. If you glance at the list of play titles, you can see the results of this process. Hamlet stands out clearly and firmly: it has the highest citations per line of all the plays and thus is fully legible. Other highly cited (and legible) plays include The Merchant of Venice and Macbeth. Meanwhile, Love's Labor's Lost and The Two Noble Kinsmen are barely legible. (The plays are listed here in one of the common compositional chronologies.)

Then I iterated this process at each level of the text. What does that mean? Well, if you click on a title, you'll see a submenu for that play. Each act is fuzzified to represent the normalized citations per line for that act relative to other acts in the play. And each scene within each act is fuzzified to represent the normalized citations per line for that scene relative to other scenes in that act. Finally, if you select a scene itself, you'll see the text for that scene, with each line fuzzified relative to the normalized citation count for lines in that scene. (The play texts are all courtesy of the Folger Library's Digital Texts, which JSTOR Labs used for their data query.)
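The level-by-level iteration described above can be sketched as a small recursive routine. This is a hedged illustration rather than the site's implementation: the Node class, the helper names, the min-max scaling, and the citation counts are all assumptions made for the example, and scenes are treated as leaves to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A unit of text: a play, act, or scene (hypothetical structure)."""
    name: str
    citations: int = 0  # set on leaves only
    lines: int = 0      # set on leaves only
    children: List["Node"] = field(default_factory=list)
    score: float = 1.0  # legibility relative to siblings, 0..1


def totals(node: Node) -> Tuple[int, int]:
    """Sum citations and line counts over everything beneath a node."""
    if not node.children:
        return node.citations, node.lines
    cites = lines = 0
    for child in node.children:
        c, n = totals(child)
        cites += c
        lines += n
    return cites, lines


def rate(node: Node) -> float:
    """Citations per line for a node, aggregated from its leaves."""
    cites, lines = totals(node)
    return cites / lines if lines else 0.0


def score_siblings(parent: Node) -> None:
    """Normalize each child against its siblings only, then recurse."""
    rates = [rate(child) for child in parent.children]
    if rates:
        lo, hi = min(rates), max(rates)
        for child, r in zip(parent.children, rates):
            child.score = 1.0 if hi == lo else (r - lo) / (hi - lo)
    for child in parent.children:
        score_siblings(child)


# Invented numbers: one play, two acts, three scenes.
play = Node("Play", children=[
    Node("Act 1", children=[
        Node("Scene 1", citations=8, lines=4),
        Node("Scene 2", citations=2, lines=4),
    ]),
    Node("Act 2", children=[
        Node("Scene 1", citations=0, lines=5),
    ]),
])
score_siblings(play)
```

Because each item is compared only with its siblings, a scene can come out fully legible inside an act that is itself blurry, as happens with the lone scene of Act 2 in this made-up example.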

I hope this project helps us to see (a) how much Shakespeare we don't cite (or, at least, don't cite very often); and (b) that our unequal citational practices are fractal. By this I mean that, no matter how you slice the text, we cite some small part of it far more frequently than the rest. This is true at the level of the scene, act, play, and corpus. Every time you see one or two sharply defined items and a passel of other blurry items, you're witnessing the effects of those uneven citational practices. And no matter what level of Shakespeare's work you look at, you will find that same inequality repeated.

So, go ahead, click around, and enjoy reading (or trying to read) the Shakespeare that citational practices bring us.