Written by Liz Rodrigues
Model of the Month Club is a regular DSC event for the fall semester. At each meeting (held monthly, as you might have guessed), I will present a digital humanities project that uses a mathematical model to represent and analyze texts. Truth in advertising: I’m not a statistician; I’m a humanist in search of enough statistical understanding to interpret the results as they are relevant in my domain of knowledge. The goal is to illuminate some of the most common approaches and, with as much detail as our access to and comprehension of the underlying algorithms allow, parse the assumptions that shape the resulting model.
For the first month’s meeting, we looked at the recently released HathiTrust genre dataset, published by Ted Underwood, Boris Capitanu, Peter Organisciak, Sayan Bhattacharyya, Loretta Auvil, Colleen Fallaw, and J. Stephen Downie, as an example of the more general modeling technique of building a classifier.
The slides from this meeting are below. The key source was Underwood et al.’s Interim Project Report.