Corpus Delicti is a legal term for the physical evidence of a crime. It shows up a lot in the detective/crime genre when there’s a missing dead body. The title of my Digital Scholarship Summer Graduate Research project comes from the fact that I spent most of the summer searching for my own missing scholarly “body”—the corpus of texts of Victorian detective fiction, a genre we read surprisingly little of and know surprisingly little about, despite the continued popularity of Sherlock Holmes.
My investigation into the missing corpus led me to a wide variety of digital repositories, including some wanderings in the massive holdings and research tools of the HathiTrust. I ended up with a corpus of about 200 novels—not as many as I had hoped for, but still enough to make cleaning the texts a daunting task, requiring me to use all the shortcuts and tricks I could find.
To round out the summer, I split my corpus into over 8,000 chunks of text to run topic modeling. In topic modeling, the computer hunts through your corpus and turns out lists of words it believes are thematically related (e.g., captain, boat, ship, sea, men, board, deck, vessel). I used a tool called MALLET; this tutorial from the Programming Historian is incredibly helpful if you’re looking to get started.
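For anyone curious what the chunking step might look like, here is a minimal Python sketch. It assumes fixed-size chunks of 1,000 words, a number picked for illustration rather than the size actually used in this project; the function name `chunk_words` is likewise hypothetical.

```python
def chunk_words(text, chunk_size=1000):
    """Split text into consecutive chunks of at most chunk_size words.

    chunk_size=1000 is an assumed default, not the project's real setting.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()  # naive whitespace tokenization
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Toy example: a seven-word "novel" split into three-word chunks.
sample = "the captain stood on the ship deck"
for number, chunk in enumerate(chunk_words(sample, chunk_size=3), start=1):
    print(number, chunk)
```

Each chunk could then be saved as its own text file so that the topic modeling tool treats it as a separate document. Smaller chunks tend to give more focused topics, but the right size is a judgment call.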
After some finagling, Mallet produced a list of twenty “topics” or themes. But how to visualize them over time? I set up some Excel spreadsheets to let me average and visualize my data, and here are the results!
This graph shows the percentage of each topic in all of the texts published in a particular decade. The chart above is a bit overwhelming; pulling out some specific topics can make it easier to see trends:
House: door room house back window hand light night man open bed moment heard head stood eyes side time floor
Home/Family: father lady young years dear mother wife love son house girl daughter lie good day married home woman poor
I found a surprising inverse relationship between the topics I had labeled “House” and “Home/Family.” As you can see, the House words are mostly literally about physical interior space—doors, windows, beds, floors. The Home/Family category is populated by family members and domestic relationships—father, mother, wife, son, daughter, married. I believe this inverse relationship might express a shift from domestic fiction with a crime element (popular in the sensation novels of the 1860s) to a more purely “detective” genre, in which case the layout of a house might well be more important.
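The averaging was done in Excel, but the same idea can be sketched in Python: group the topic proportions by decade of publication, average them, and then check whether two topics move in opposite directions. Everything here is an assumption for illustration: the row layout, the column names `house` and `home_family`, and especially the numbers, which are invented and are not the project's data.

```python
from collections import defaultdict
from math import sqrt


def decade_of(year):
    """Map a publication year to its decade, e.g. 1887 -> 1880."""
    return (year // 10) * 10


def average_by_decade(rows, topic):
    """Average one topic's proportion over all rows in each decade."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        decade = decade_of(row["year"])
        sums[decade] += row[topic]
        counts[decade] += 1
    return {decade: sums[decade] / counts[decade] for decade in sorted(sums)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented proportions, one row per text chunk (NOT real results).
rows = [
    {"year": 1862, "house": 0.04, "home_family": 0.12},
    {"year": 1868, "house": 0.06, "home_family": 0.10},
    {"year": 1875, "house": 0.07, "home_family": 0.08},
    {"year": 1889, "house": 0.09, "home_family": 0.05},
    {"year": 1894, "house": 0.11, "home_family": 0.03},
]

house = average_by_decade(rows, "house")
home = average_by_decade(rows, "home_family")
decades = sorted(house)
r = pearson([house[d] for d in decades], [home[d] for d in decades])
print("House by decade:", house)
print("Home/Family by decade:", home)
print("Correlation:", round(r, 3))
```

A correlation close to -1 would be consistent with an inverse relationship, though with only a handful of decades it is a rough signal at best.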
I also found fascinating congruences between themes:
Alibi: man train street police London detective station left hotel time cab office asked found day men inspector case place
Murder: replied sir man asked murder bishop killed cried woman night dead found crime Dr. left father lord girl marry
The topics I labeled “Alibi” and “Murder” both increase more dramatically over time than any other topics. I see these two topics as interrelated—as murder becomes more prevalent in fiction, so do police detectives!
This is particularly interesting to me because there’s a bit of a debate in the field about whether authors became more or less interested in murder over the course of the century; I’ve seen arguments on both sides. While not entirely conclusive, this data does suggest that murder (and its investigation) increased in importance over the course of the nineteenth century.
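One way to put a number on which topics "increase more dramatically" is to fit a straight line through each topic's per-decade averages and compare the slopes. The sketch below does that with a hand-written least-squares slope. The decade averages are invented placeholders and the topic keys are hypothetical, so the ranking it prints illustrates the method only.

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (change in y per unit of x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


decades = [1860, 1870, 1880, 1890]

# Invented per-decade average proportions (NOT real results).
topic_by_decade = {
    "alibi": [0.02, 0.04, 0.07, 0.10],
    "murder": [0.03, 0.04, 0.06, 0.09],
    "home_family": [0.11, 0.08, 0.05, 0.03],
}

# Slope per topic: positive means the topic grows over time.
trends = {name: slope(decades, values) for name, values in topic_by_decade.items()}

# Rank topics from steepest increase to steepest decline.
ranked = sorted(trends, key=trends.get, reverse=True)
for name in ranked:
    print(f"{name}: {trends[name] * 10:+.4f} per decade")
```

A straight-line fit is a simplification; a topic that spikes in one decade and then falls away would need a closer look than a single slope can give.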