By Jaclyn Partyka
David Holmes claims that “[t]he major problem inhibiting stylometry’s acceptance within humanities scholarship is that, as yet, there is no consensus as to correct methodology or technique.” In other words, even though scholars have practiced various forms of stylometry for almost a century, its methods seem to have evolved individualistically, without agreement among researchers. This is somewhat surprising in the context of the digital humanities, since the processing power of modern computers allows scholars to perform “distant reading” analytics on an ever-growing archive of digitized texts. As I begin my own stylometric research, I have been trying out a number of different programs to gauge their usefulness. Today I’m going to review one called Signature.
Signature is a stylometric program designed by Peter Millican of Oxford University, previously of Leeds University. I wanted to begin my reviews of stylometric programs here because Signature centers on the most basic and straightforward stylometric techniques while also providing computer-generated visualizations. Signature interprets stylometric data using five frequency criteria: Word lengths, Sentence lengths, Paragraph lengths, Letters, and Punctuation. Frequency measures like these are typical of the earliest stylometric research, so they are a great place to start interpreting your own corpus.
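To make the five criteria concrete, here is a minimal Python sketch of how frequency tables like Signature’s might be computed from a plain-text file. It is not Signature’s code: the tokenization rules (words as runs of letters, sentences ending in a period, exclamation mark, or question mark, paragraphs separated by blank lines, only ASCII punctuation counted) and every function name here are my own assumptions for illustration.

```python
import re
import string
from collections import Counter

# Assumed tokenization rules (Signature's own rules may differ):
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")  # words: runs of letters, inner apostrophes allowed
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")     # sentence break: whitespace after . ! or ?
PARAGRAPH_BREAK_RE = re.compile(r"\n\s*\n")        # paragraph break: a blank line


def word_lengths(text):
    """Count words by length in letters (apostrophes not counted)."""
    return Counter(len(w) - w.count("'") for w in WORD_RE.findall(text))


def sentence_lengths(text):
    """Count sentences by length in words, skipping word-less fragments."""
    lengths = (len(WORD_RE.findall(s)) for s in SENTENCE_END_RE.split(text.strip()))
    return Counter(n for n in lengths if n > 0)


def paragraph_lengths(text):
    """Count paragraphs by length in words."""
    lengths = (len(WORD_RE.findall(p)) for p in PARAGRAPH_BREAK_RE.split(text))
    return Counter(n for n in lengths if n > 0)


def letter_frequencies(text):
    """Count each letter a-z, ignoring case."""
    return Counter(c for c in text.lower() if c in string.ascii_lowercase)


def punctuation_frequencies(text):
    """Count each ASCII punctuation mark (curly quotes are not included)."""
    return Counter(c for c in text if c in string.punctuation)


def profile(text):
    """Bundle the five frequency tables for one text."""
    return {
        "word_lengths": word_lengths(text),
        "sentence_lengths": sentence_lengths(text),
        "paragraph_lengths": paragraph_lengths(text),
        "letters": letter_frequencies(text),
        "punctuation": punctuation_frequencies(text),
    }


if __name__ == "__main__":
    sample = "The cat sat. It was a dog!\n\nNo."
    for name, table in profile(sample).items():
        print(name, dict(table))
```

Each table maps a value (a length, a letter, a mark) to how often it occurs, which is the kind of distribution Signature plots for each text or corpus.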
Signature is free to download and has a relatively simple user interface that lets you load your corpora as plain-text or HTML files. You can also combine individual files into a single corpus, which is very useful when working with multiple authors at once. When loading files, Signature also gives you the option to divide each one into halves, which is helpful for statistical comparison but leaves the researcher little control over where the division falls.
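Because Signature’s halving option doesn’t let you choose where a file is divided, here is a hypothetical alternative: a Python sketch that splits a text at the paragraph break closest to a chosen fraction of its word count, so the researcher sets the split point. The function name `split_at_paragraph`, the paragraph-based rule, and counting words by whitespace are my own assumptions, not anything Signature provides.

```python
import re


def split_at_paragraph(text, fraction=0.5):
    """Split text into two parts at the paragraph break whose cumulative
    word count is closest to `fraction` of the total. Hypothetical helper."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if len(paragraphs) < 2:
        raise ValueError("need at least two paragraphs to split")
    counts = [len(p.split()) for p in paragraphs]  # words counted by whitespace
    target = fraction * sum(counts)

    best_i, best_gap, running = 1, float("inf"), 0
    for i in range(1, len(paragraphs)):  # candidate break before paragraph i
        running += counts[i - 1]         # words in paragraphs[:i]
        gap = abs(running - target)
        if gap < best_gap:               # ties keep the earlier break
            best_i, best_gap = i, gap

    return "\n\n".join(paragraphs[:best_i]), "\n\n".join(paragraphs[best_i:])


if __name__ == "__main__":
    doc = "a b c\n\nd e\n\nf g h i j"
    first, second = split_at_paragraph(doc)
    print(repr(first), repr(second))
```

Keeping the break on a paragraph boundary means neither half starts mid-paragraph, which matters when paragraph length is itself one of the measures being compared.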
As I mentioned in my last blog post, my project is somewhat atypical for a stylometry project: rather than comparing multiple authors, I am looking for significant stylistic variation within a single author’s work. For this reason, any results I get from a program like Signature will likely show only slight variation across individual texts or corpora, whereas a comparison of two or more authors would likely show more significant differences.
The first test I did using Signature was to visualize Roth’s corpus* by sentence and paragraph length.
The Sentence Length visualization shows most of the novels maintaining a relatively even distribution of sentences of 3 to 10 words, with Exit Ghost and American Pastoral as the outliers. And thank goodness for outliers! I think the variation here actually reinforces the particular style of each novel: Exit Ghost is composed largely in dramatic dialogue, which would account for its shorter sentences, while American Pastoral includes a significant number of longer, stream-of-consciousness musings. Generally, the Paragraph Length graph supports this interpretation, with the dramatic structure of Exit Ghost producing shorter paragraphs. So far, so good.
But what I’m really interested in is how Roth’s more “autobiographical” novels compare to his typical fiction. To look at this, I chose The Facts: A Novelist’s Autobiography, because its structure bridges the gap between these genres. The novel opens with a letter from “Roth”** to Nathan Zuckerman, an author-character whom many readers view as a mask for Roth’s own understanding of authorship. In it, “Roth” asks Zuckerman whether he should bother publishing this foray into autobiography instead of his usual fiction. The novel then recounts, in fairly conventional autobiographical reportage, Roth’s childhood in Newark, his college years, and his disastrous first marriage, ending with the inspiration for and publication of Portnoy’s Complaint. This section is followed by Zuckerman’s reply, in which he essentially cautions “Roth” not to publish. This metafictional, dialogic frame paratextually incorporates fiction into Roth’s autobiography, so I wanted to know whether this unconventional approach would be reflected at the stylistic level.
To test this, I compared The Facts with my Roth corpus in Signature by sentence and paragraph length. For contrast, I also ran the same comparison on The Counterlife, the novel that immediately precedes The Facts chronologically.
While these are just preliminary results, the graphs clearly show that The Counterlife aligns much more closely with Roth’s typical work, while The Facts deviates significantly. The difference could have a number of causes, but one possibility is that Roth’s style changes slightly when he switches between autobiographical and fictional modes.
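The comparison above is visual. One way to put a number on “aligns more closely” would be to turn each sentence-length histogram into relative frequencies and measure the distance between them. The Python sketch below uses total variation distance, a measure I am swapping in for illustration rather than anything Signature reports; the function names and the overflow bin for very long sentences (`cap=40`) are assumptions. Its inputs are simply tables mapping sentence length in words to counts, and the counts in the demo are toy numbers, not measurements from Roth’s novels.

```python
from collections import Counter


def relative(counts, cap=40):
    """Turn a {length: count} table into relative frequencies.
    Lengths >= cap share one overflow bin (keyed by cap)."""
    binned = Counter()
    for length, n in counts.items():
        binned[min(length, cap)] += n
    total = sum(binned.values())
    if total == 0:
        raise ValueError("empty frequency table")
    return {k: v / total for k, v in binned.items()}


def total_variation(p, q):
    """Total variation distance: 0 = identical distributions, 1 = no overlap."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def distance_from_corpus(text_counts, corpus_counts, cap=40):
    """Distance between one text's sentence-length profile and a corpus's."""
    return total_variation(relative(text_counts, cap), relative(corpus_counts, cap))


if __name__ == "__main__":
    # Toy counts for illustration only, not data from any novel.
    corpus = Counter({5: 30, 8: 40, 12: 20, 25: 10})
    text = Counter({5: 10, 8: 10, 30: 20, 55: 5})
    print(round(distance_from_corpus(text, corpus), 3))
```

Measured against the rest of the corpus, a lower distance for one novel than for another would quantify the kind of pattern the graphs display, though a single number hides where in the distribution the difference lies.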
As these few comparisons show, stylometric probability tests provide only limited insight into the actual authorship of a text. Stylometry, like most digital humanities methods, cannot be used in a vacuum. These tools can help us visualize and confirm the significance of textual similarities and differences, but they cannot definitively establish authorship on their own.
Finally, Signature is a useful but very limited stylometric tool, hampered by a lack of resources and updates. Although its designers may have intended to develop further versions, very little has been updated since 2013. And while they provide a helpful PowerPoint file explaining how to use the program, there is no help file or other documentation built into the program itself. Some case studies using Signature are available on Millican’s website, but there is no active community of Signature users.
However, that is not to say Signature lacks benefits. It would probably be most useful in the classroom, where students could practice loading corpora and generating basic graphs, but it is too limited for serious stylometric research.
*If you are a Roth scholar, you will notice that this is not Roth’s complete oeuvre, but it does provide a basic overview for these early tests.
**To avoid conflating the author with the fictionalized version of his implied author, I refer to this character as “Roth.”