{"id":8980,"date":"2024-05-13T12:56:10","date_gmt":"2024-05-13T16:56:10","guid":{"rendered":"https:\/\/sites.temple.edu\/tudsc\/?p=8980"},"modified":"2024-05-20T11:02:39","modified_gmt":"2024-05-20T15:02:39","slug":"building-and-topic-modeling-a-corpus-of-gothic-literature","status":"publish","type":"post","link":"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/13\/building-and-topic-modeling-a-corpus-of-gothic-literature\/","title":{"rendered":"Building and Topic Modeling a Corpus of Gothic Literature"},"content":{"rendered":"\n<p>By SaraGrace Stefan<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<p>How do we convey that which is beyond words? According to literary critic Teresa Goddu, gothic literature has historically served as the genre that allows authors to do just that: to \u201cspeak the unspeakable\u201d (Goddu, 10). As a PhD student studying the gothic in the English department at Temple, I wanted my culminating project for the Cultural Analytics Certificate to investigate what hundreds of gothic novels would say if given the ability to speak, shriek, or moan in concert together. I specifically was curious to see what thematic focuses or recurring topics would emerge if I analyzed a large corpus of gothic texts from different time periods.<\/p>\n\n\n\n<p>Although my dissertation does not directly employ digital methods, I find great value (and challenge!) in complementing my traditional scholarship with computational exploration. For my project, I decided to create a corpus of gothic literature and examine its thematic interests over time through topic modeling with Python. 
Needing to operate within copyright limitations, I decided to create my corpus exclusively from public domain texts available from <a href=\"https:\/\/www.gutenberg.org\/\">Project Gutenberg<\/a>, a library of free eBooks.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading has-text-align-left\">Building a Gothic Corpus: What Even is the Gothic?<\/h3>\n\n\n\n<p>Before I created my gothic corpus, I first had to define what I meant by \u201cgothic literature,\u201d a notoriously difficult task! As Jerrold E. Hogle explains in the Introduction to <em>The Cambridge Companion to Gothic Fiction<\/em>, the gothic is a \u201chighly unstable genre\u201d (Hogle, 1) that engenders much debate among its scholars, but there are certain identifiable characteristics that go beyond the mere presence of ghosts or castles.<\/p>\n\n\n\n<p>Scholar Chris Baldick describes the gothic as the combination of \u201ca fearful sense of inheritance in time with a claustrophobic sense of enclosure in space, these two dimensions reinforcing one another to produce an impression of sickening descent into disintegration\u201d (Baldick, xix). More concretely, gothic texts, be they British, Irish, American, etc., invoke the specter of the past and interrogate its hauntological effects on the present.<\/p>\n\n\n\n<p>For my corpus, I selected 238 texts from Project Gutenberg and divided them into quarters by publication date: the first quarter (47 texts) spans 1682 to 1816; the second (45 texts), 1817 to 1890; the third (45 texts), 1891 to 1903; and the final quarter (46 texts), 1904 to 1925. 
These divisions were made in an effort to make my data set smaller and easier to work with, but more historically conscientious corpus divisions may be part of my future research.<\/p>\n\n\n\n<p>For the time being, my corpus contains available gothic classics such as Ann Radcliffe\u2019s <em>The Mysteries of Udolpho<\/em> (1794) and Mary Shelley\u2019s <em>Frankenstein<\/em> (1818), as well as lesser-known texts like Elia Wilkinson Peattie\u2019s <em>The Shape of Fear<\/em> (1898) and M.R. James\u2019s <em>A Warning to the Curious, and Other Ghost Stories<\/em> (1925). Once I had created my <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1UISLZYeeXYyEnAC16FAJLQtudrYRBAuiP_GJWoExitU\/edit?usp=sharing\">corpus<\/a>, I was ready to begin transforming the texts into machine-readable data for my topic model.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"631\" height=\"1000\" data-id=\"8989\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/mysteriesofudolpho-3.jpg\" alt=\"\" class=\"wp-image-8989\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/mysteriesofudolpho-3.jpg 631w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/mysteriesofudolpho-3-189x300.jpg 189w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/mysteriesofudolpho-3-300x475.jpg 300w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"258\" height=\"392\" data-id=\"8988\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/A_Warning_to_the_Curious-2.jpg\" alt=\"\" class=\"wp-image-8988\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/A_Warning_to_the_Curious-2.jpg 258w, 
https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/A_Warning_to_the_Curious-2-197x300.jpg 197w\" sizes=\"auto, (max-width: 258px) 100vw, 258px\" \/><\/figure>\n<\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Topic Modeling the Gothic<\/h3>\n\n\n\n<p>There are a variety of online resources that will assist beginners with text analysis and topic modeling. For an introduction to text analysis in general, please look at my earlier blog post on the Cli-Fi and Banned Book projects, but for now, we will move straight to topic modeling.<\/p>\n\n\n\n<p>As described by Andrew Goldstone and Ted Underwood in their article, \u201c<a href=\"https:\/\/librarysearch.temple.edu\/articles\/cdi_proquest_journals_1625136568\">The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us<\/a>,\u201d topic modeling is a method that attempts \u201cto identify the thematic or rhetorical patterns that inform a collection of documents\u2026These patterns we refer to as topics\u201d (Goldstone and Underwood, 361). More concretely, topic modeling is a kind of statistical modeling that uses unsupervised machine learning to identify recurring clusters of words within a corpus of texts.&nbsp;<\/p>\n\n\n\n<p>For their study, Goldstone and Underwood extrapolated historical patterns from a corpus of 21,367 scholarly articles using the topic modeling toolkit <a href=\"https:\/\/mimno.github.io\/Mallet\/topics.html\">MALLET<\/a>. I chose instead to analyze my 238 texts with a free Python library called <a href=\"https:\/\/radimrehurek.com\/gensim\/?utm_source=thenewstack&amp;utm_medium=website&amp;utm_content=inline-mention&amp;utm_campaign=platform\">Gensim<\/a> and the LDAvis Python library (pyLDAvis). 
These packages worked together to examine the statistical recurrence of different topics within my corpus and then to display them through visualizations.&nbsp;<\/p>\n\n\n\n<p>Furthermore, my project is indebted to my fabulous, recently graduated colleague Dr. Megan Kane and the <a href=\"https:\/\/github.com\/SF-Nexus\/extracted-features-notebooks\/tree\/main\/notebooks\/Analyzing_Extracted_Features\">excellent code<\/a> she created for the Cli-Fi project, which can be accessed via GitHub. <a href=\"https:\/\/colab.research.google.com\/drive\/1fOty6jcc7TPZBWBWi1QI-ab2YAVNynIx?usp=sharing\">My rendition of Megan\u2019s code<\/a> allowed me to break the texts in my corpus down into small chunks before trying to identify 100 topics present within those chunks.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Speaking the Unspeakable<\/h3>\n\n\n\n<p>So what did my topic model reveal? What did my gothic corpus \u201csay\u201d? After a good deal of trial and error working with <a href=\"https:\/\/colab.research.google.com\/\">Google Colab<\/a> and different runtime options, I was able to generate topics for each quarter of my corpus and visualize those topics through word clouds and more technical intertopic distance maps.<\/p>\n\n\n\n<p>Although the results are not perfect, some focuses clearly recur in different sections of the corpus, such as Religion, depicted below in Topic 31 (1817-1890) and Topic 35 (1891-1903).<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-3 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"290\" data-id=\"8990\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1817to1890_topic31.png\" alt=\"\" class=\"wp-image-8990\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1817to1890_topic31.png 515w, 
https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1817to1890_topic31-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"290\" data-id=\"8991\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1891to1903_topic35.png\" alt=\"\" class=\"wp-image-8991\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1891to1903_topic35.png 515w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/GothicCorpus_1891to1903_topic35-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n<\/figure>\n\n\n\n<p>Of course, despite their consistent presence, thematic focuses such as Religion (or Family, Nature, and House, for example) do not seem unique to the gothic genre (though associating churches and bishops with witches and ghosts, as Topic 35 does, seems rather fitting). But I identified another topic that does seem to evoke my definition of the gothic, one I have begun thinking of as \u201cUnease\/Insecurity.\u201d We can see similar evocations of insecurity, vulnerability, and disintegration in representative topics from each quarter of the corpus: Topic 30 (1682-1816), Topic 27 (1817-1890), Topic 61 (1891-1903), and Topic 46 (1904-1925).&nbsp;&nbsp;<\/p>\n\n\n\n<div class=\"wp-block-group\"><div class=\"wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-4 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" 
decoding=\"async\" width=\"515\" height=\"290\" data-id=\"9068\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1817to1890_topic27.png\" alt=\"\" class=\"wp-image-9068\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1817to1890_topic27.png 515w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1817to1890_topic27-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"290\" data-id=\"9069\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus1682to1816_topic30.png\" alt=\"\" class=\"wp-image-9069\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus1682to1816_topic30.png 515w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus1682to1816_topic30-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n<\/figure>\n<\/div>\n<\/div>\n<\/div><\/div>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-5 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"290\" data-id=\"9072\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1905to1925_topic46-1.png\" alt=\"\" class=\"wp-image-9072\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1905to1925_topic46-1.png 515w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1905to1925_topic46-1-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"515\" height=\"290\" data-id=\"9073\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1891to1903_topic61-1.png\" alt=\"\" class=\"wp-image-9073\" 
srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1891to1903_topic61-1.png 515w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/05\/GothicCorpus_1891to1903_topic61-1-300x169.png 300w\" sizes=\"auto, (max-width: 515px) 100vw, 515px\" \/><\/figure>\n<\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusions<\/h3>\n\n\n\n<p>The presence of this recurring topic of gothic uneasiness, together with certain topics unique to each quarter of the corpus, such as the appearance of \u201cchemical,\u201d \u201cmagnetic,\u201d and \u201cevolution\u201d in the 1904-1925 corpus alone, suggests that topic modeling can be a helpful way to analyze genre changes or consistencies over time. <\/p>\n\n\n\n<p>One can view the topics from each corpus quarter in relation to each other via the Intertopic Distance Maps hosted on my GitHub page: <a href=\"https:\/\/saragracestefan.github.io\/GothicCorpusViz_1682to1816\/#topic=0&amp;lambda=1&amp;term=\">1682 to 1816<\/a>, <a href=\"https:\/\/saragracestefan.github.io\/GothicCorpusViz_1817to1890\/\" data-type=\"link\" data-id=\"https:\/\/saragracestefan.github.io\/GothicCorpusViz_1817to1890\/\">1817 to 1890<\/a>, <a href=\"https:\/\/saragracestefan.github.io\/GothicCorpusViz_1891to1903\/\">1891 to 1903<\/a>, and <a href=\"https:\/\/saragracestefan.github.io\/GothicCorpusViz_1904to1925\/\">1904 to 1925<\/a>. <\/p>\n\n\n\n<p>Although the topics and visualizations generated from my gothic corpus reveal that future projects would benefit from additional cleaning and consideration, the process of creating this gothic corpus, preparing it for topic modeling, and considering the results has given me greater insight into the world of late-17th- to early-20th-century gothic texts, as well as into how traditional and digital scholarly methods can work together to speak the unspeakable.&nbsp;&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">References<\/h3>\n\n\n\n<p>Baldick, Chris, ed. 
<em>The Oxford Book of Gothic Tales<\/em>, Oxford University Press, 1992, reissued 2009.&nbsp;<\/p>\n\n\n\n<p>Goddu, Teresa. <em>Gothic America: Narrative, History, and Nation<\/em>, Columbia University Press, 1997.&nbsp;<\/p>\n\n\n\n<p>Goldstone, Andrew and Ted Underwood. &#8220;The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.&#8221; <em>New Literary History<\/em>, vol. 45 no. 3, 2014, p. 359-384. <em>Project MUSE<\/em>, <a href=\"https:\/\/doi-org.libproxy.temple.edu\/10.1353\/nlh.2014.0025\">https:\/\/doi-org.libproxy.temple.edu\/10.1353\/nlh.2014.0025<\/a>.<\/p>\n\n\n\n<p>Hogle, Jerrold E. <em>The Cambridge Companion to Gothic Fiction<\/em>, Cambridge University Press, 2002.&nbsp;&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By SaraGrace Stefan<\/p>\n","protected":false},"author":34028,"featured_media":8982,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[433,405,2,288,86],"tags":[477,71,329,6,380],"class_list":["post-8980","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cultural-analytics-practicum","category-digital-humanities","category-grad-students","category-literary-studies","category-services","tag-gothic-literature","tag-python","tag-text-analysis","tag-top-news","tag-topic-modeling"],"_links":{"self":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8980","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/users\/34028"}],"replies":[{"embeddable":true,"href"
:"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/comments?post=8980"}],"version-history":[{"count":9,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8980\/revisions"}],"predecessor-version":[{"id":9075,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8980\/revisions\/9075"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media\/8982"}],"wp:attachment":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media?parent=8980"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/categories?post=8980"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/tags?post=8980"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}