

{"id":8749,"date":"2024-05-01T11:47:06","date_gmt":"2024-05-01T15:47:06","guid":{"rendered":"https:\/\/sites.temple.edu\/tudsc\/?p=8749"},"modified":"2024-12-10T10:52:17","modified_gmt":"2024-12-10T14:52:17","slug":"climbing-a-mountain-of-books-with-distant-reading-part-1-of-the-banned-books-project","status":"publish","type":"post","link":"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/climbing-a-mountain-of-books-with-distant-reading-part-1-of-the-banned-books-project\/","title":{"rendered":"Climbing a Mountain of Books with Distant Reading: Part 1 of the Banned Books Project"},"content":{"rendered":"\n<p>By SaraGrace Stefan<\/p>\n\n\n\n<!--more-->\n\n\n\n<h3 class=\"wp-block-heading\">Starting the Climb: An Introduction to Computational Text Analysis<\/h3>\n\n\n\n<p>My past two summers have been filled with the stuff of schoolchildren\u2019s nightmares and literary nerds\u2019 dreams: a mountain\u2019s worth of hundreds of books, all requiring digitization, curation, and analysis. <\/p>\n\n\n\n<p>Fortunately, as a PhD student in Temple\u2019s English department, being surrounded by and rapidly devouring books is an ideal way for me to spend the months of June-August. However, those of you who are familiar with the process of book-reading and the traditionally ceaseless marching of time may say, \u201cBut wait, SaraGrace, summer vacation is only so long! How could you possibly read hundreds of books?\u201d To that, I would say, \u201ccertainly not with that attitude!\u201d and then I would introduce you to distant reading and corpora creation.<\/p>\n\n\n\n<p><em>Then<\/em> I would tell you about TWO exciting projects here at the Scholars Studio, one all about Science Fiction and one about Banned Books. 
Of course, you would be curious to learn more about the current Banned Books project, so you&#8217;d be thrilled to kindly read my blog post, and follow it with recent graduate Sydney Grimm&#8217;s blog post &#8220;<a href=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/how-to-judge-a-banned-book-by-its-cover-part-2-of-the-banned-books-project\/\">How to Judge a (Banned) Book by Its Cover: Part 2 of the Banned Books Project<\/a>&#8221; about creating a banned book cover methodology, and read about our initial results and what happens when you judge a book by its cover in Abigail Corcelli&#8217;s post &#8220;<a href=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/pulling-back-the-curtain-on-censorship-part-3-of-the-banned-books-project\/\" data-type=\"link\" data-id=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/pulling-back-the-curtain-on-censorship-part-3-of-the-banned-books-project\/\">Pulling Back the Curtain on Censorship: Part 3 of The Banned Books Project<\/a>.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Gaining a New Perspective: \u201cDistant\u201d Reading and the Cli-Fi Project<\/h3>\n\n\n\n<p>But before I get ahead of myself even more, let me define some terms: \u201cDistant\u201d reading is the practice of applying computational methods to a large number of texts, called a corpus, to see what can be revealed by reading and analyzing not just 2-3 books, but 2-3 hundred or thousand. The practice is often associated with the <a href=\"https:\/\/shc.stanford.edu\/arcade\/interventions\/distant-reading-after-moretti\" data-type=\"link\" data-id=\"https:\/\/shc.stanford.edu\/arcade\/interventions\/distant-reading-after-moretti\">influential and problematic scholar Franco Moretti<\/a>, who coined the term \u2018distant reading\u2019, but such methods and their implications have been taken up by a range of academics for diverse purposes, such as Ted Underwood, Kenton Rambsy, and Catherine D&#8217;Ignazio and Lauren F. 
Klein.<\/p>\n\n\n\n<p>Ideally, scholars combine this \u201czoomed out\u201d perspective with critical research and analysis, so we should not think of distant reading as any sort of replacement for the traditional method of close reading one text at a time. Rather, it provides a different perspective from which we can consider literary and cultural trends.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Cli-Fi Project<\/h3>\n\n\n\n<p>I first gained distant reading experience working on the <a href=\"https:\/\/lcdssgeo.com\/omeka-s\/s\/scifi\/page\/digitizing-science-fiction\" data-type=\"link\" data-id=\"https:\/\/lcdssgeo.com\/omeka-s\/s\/scifi\/page\/digitizing-science-fiction\">Cli-Fi Project<\/a> over the summer of 2022 at the Temple University Loretta C. Duckworth Scholars Studio through funding from Temple&#8217;s Arts and Humanities Research Grant. Initially introduced by <a href=\"https:\/\/sites.temple.edu\/tudsc\/about\/current-staff\/h-alexander-wermer-colan\/\" data-type=\"link\" data-id=\"https:\/\/sites.temple.edu\/tudsc\/about\/current-staff\/h-alexander-wermer-colan\/\">Dr. 
Alex Wermer-Colan<\/a> in his blog posts <a href=\"https:\/\/sites.temple.edu\/tudsc\/2017\/12\/20\/building-new-wave-science-fiction-corpus\/\">&#8220;Building a New Wave Science Fiction Corpus,&#8221;<\/a> and <a href=\"https:\/\/sites.temple.edu\/tudsc\/2018\/04\/26\/modeling-the-new-wave-on-learning-to-use-machines-to-read-sci-fi-lit\/\">&#8220;Modeling the New Wave: On Learning to Use Machines to Read Sci-Fi Lit,&#8221;<\/a> the SF Digitization project consisted of digitizing and analyzing works of New Wave science fiction to explore how these literary works reflect the largely 20th-century concern that humanity\u2019s missteps will culminate in an environmental apocalypse.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"640\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/clifibooks.jpg\" alt=\"\" class=\"wp-image-8902\" style=\"width:314px;height:auto\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/clifibooks.jpg 360w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/clifibooks-169x300.jpg 169w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/clifibooks-300x533.jpg 300w\" sizes=\"auto, (max-width: 360px) 100vw, 360px\" \/><figcaption class=\"wp-element-caption\">Some of the book covers from the disbound New Wave Cli-Fi corpus.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>To analyze our New Wave corpus, I joined fellow PhD student Megan Kane and undergraduate student Asher Riley to pick up where previous students had left off in the digitization and curation process, with a focus on climate-themed science fiction from the Paskow Science Fiction Collection. 
Temple University librarians had already disbound, scanned, and stored the texts on a secure server, so it was our job to use optical character recognition (OCR) software called <a href=\"https:\/\/pdf.abbyy.com\/\">ABBYY FineReader<\/a> to \u201cclean\u201d the texts.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"652\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/10_finereader_16_advanced_document_conversion-1024x652-1.png\" alt=\"\" class=\"wp-image-8903\" style=\"width:656px;height:auto\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/10_finereader_16_advanced_document_conversion-1024x652-1.png 1024w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/10_finereader_16_advanced_document_conversion-1024x652-1-300x191.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/10_finereader_16_advanced_document_conversion-1024x652-1-768x489.png 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/10_finereader_16_advanced_document_conversion-1024x652-1-850x541.png 850w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Promotional screenshot of <a href=\"https:\/\/pdf.abbyy.com\/finereader-pdf\/\" data-type=\"link\" data-id=\"https:\/\/pdf.abbyy.com\/finereader-pdf\/\">ABBYY <\/a>FineReader screen.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>The process of improving the OCR output required removing any \u201cunnecessary\u201d information, such as page numbers, the author\u2019s name, etc., as well as correcting the scans to ensure that ABBYY had \u201cread\u201d the books correctly and <a href=\"https:\/\/github.com\/SF-Nexus\/OCR\/blob\/main\/abbyy.md\">produced accurate text files<\/a>.&nbsp;<\/p>\n\n\n\n<p>Once we cleaned and corrected the texts, we were able to transform them into data that we could <a href=\"https:\/\/sfnexus.io\/\">explore and share with scholars both 
at Temple University and beyond.<\/a><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Old Methods, New Texts: The Banned Books Project<\/h3>\n\n\n\n<p>This process of digitizing, correcting, and analyzing literary texts en masse prepared me for the second text analysis project developed out of the Scholars Studio, the <a href=\"https:\/\/representationlab.github.io\/\">Banned Books Project<\/a> in 2023-2024. Funded by a Mellon grant and led by Dr. Wermer-Colan and the English Department\u2019s <a href=\"https:\/\/liberalarts.temple.edu\/about\/faculty-staff\/laura-mcgrath\">Dr. Laura McGrath<\/a>, we formed a team of undergraduate and graduate students that worked to curate a Banned Books corpus based on <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1eU3rCvzLjBwnVpph_Svs8MFmnp9EH8RG_72UofANVeM\/edit?usp=sharing\">PEN America\u2019s research<\/a> on which texts have been and continue to be banned in schools and libraries across the United States. <\/p>\n\n\n\n<p>PEN America is a nonprofit organization that advocates for rights-related issues within the literary world. The banned books they identified in their <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1eU3rCvzLjBwnVpph_Svs8MFmnp9EH8RG_72UofANVeM\/edit?usp=sharing\">Index of School Book Bans from July 2022-December 2022<\/a> range from children\u2019s picture books that depict diverse cultural identities to Young Adult novels that grapple with topics related to sexual exploration, racism, domestic violence, or police brutality. 
Specific examples include Saadia Faruqi\u2019s <em>Give it a Try, Yasmin <\/em>(2021), part of a series that features a Pakistani American second-grader and her multi-generational family, as well as Benjamin Alire S\u00e1enz\u2019s <em>Aristotle and Dante Discover the Secrets of the Universe <\/em>(2012) and Angie Thomas\u2019s <em>The Hate U Give<\/em> (2017).<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-2 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"777\" height=\"1024\" data-id=\"8901\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/giveitatryyasmin-777x1024-1.jpg\" alt=\"\" class=\"wp-image-8901\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/giveitatryyasmin-777x1024-1.jpg 777w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/giveitatryyasmin-777x1024-1-228x300.jpg 228w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/giveitatryyasmin-777x1024-1-768x1012.jpg 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/giveitatryyasmin-777x1024-1-300x395.jpg 300w\" sizes=\"auto, (max-width: 777px) 100vw, 777px\" \/><figcaption class=\"wp-element-caption\"><strong>Saadia Faruqi&#8217;s <em>Give it a Try, Yasmin!<\/em> (2021)<\/strong><\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"674\" height=\"1000\" data-id=\"8900\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/thehateugive.jpg\" alt=\"\" class=\"wp-image-8900\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/thehateugive.jpg 674w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/thehateugive-202x300.jpg 202w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/thehateugive-300x445.jpg 300w\" sizes=\"auto, (max-width: 674px) 100vw, 674px\" \/><figcaption class=\"wp-element-caption\"><strong>Angie Thomas&#8217;s 
<em>The Hate U Give<\/em> (2017)<\/strong><\/figcaption><\/figure>\n<\/figure>\n\n\n\n<p>PEN America\u2019s research data (linked above) consisted of the titles of these books and relevant information regarding their bans, such as the location of the school district or the type of ban (being banned from a class vs. being banned from a library, for example). With this dataset, we began the process of selecting and purchasing books to build our corpus. We bought hundreds of physical books for the library\u2019s digitization team to break down into pages that could be scanned. We also purchased&nbsp;digital books through <a href=\"https:\/\/www.kobo.com\/us\/en\">Kobo<\/a> that could be transformed into machine-readable text files in accordance with a recent exemption to the <a href=\"https:\/\/www.copyright.gov\/dmca\/\">Digital Millennium Copyright Act<\/a> enabling scholars to transform eBooks for the purposes of conducting non-consumptive research on copyrighted literature.&nbsp;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"1024\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/BoxofBookSpines-768x1024-1.jpg\" alt=\"\" class=\"wp-image-8904\" style=\"width:506px;height:auto\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/BoxofBookSpines-768x1024-1.jpg 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/BoxofBookSpines-768x1024-1-225x300.jpg 225w, https:\/\/sites.temple.edu\/tudsc\/files\/2024\/04\/BoxofBookSpines-768x1024-1-300x400.jpg 300w\" sizes=\"auto, (max-width: 768px) 100vw, 768px\" \/><figcaption class=\"wp-element-caption\">Box of book spines from the disbound Banned Book corpus.<\/figcaption><\/figure>\n<\/div>\n\n\n<p>As we built our corpus, undergraduate researchers Kriti Baru, Abigail Corcelli, and Sydney Grimm became interested in the appearances of the books that were being repeatedly banned. 
Our research question homed in on what can be gleaned from these texts if we take on the role of someone attempting to judge the books by their covers alone. If we consider the appearances of the books being repeatedly banned, what might be revealed? <\/p>\n\n\n\n<p>As I explained in my introduction, you can hear more about the specific methodology for this part of our project in recent graduate Sydney Grimm&#8217;s blog post &#8220;<a href=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/how-to-judge-a-banned-book-by-its-cover-part-2-of-the-banned-books-project\/\">How to Judge a (Banned) Book by Its Cover: Part 2 of the Banned Books Project<\/a>&#8221; about creating a banned book cover methodology, and read about our initial results and what happens when you judge a book by its cover in Abigail Corcelli&#8217;s post &#8220;<a href=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/pulling-back-the-curtain-on-censorship-part-3-of-the-banned-books-project\/\" data-type=\"link\" data-id=\"https:\/\/sites.temple.edu\/tudsc\/2024\/05\/01\/pulling-back-the-curtain-on-censorship-part-3-of-the-banned-books-project\/\">Pulling Back the Curtain on Censorship: Part 3 of The Banned Books Project<\/a>.&#8221;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusions<\/h3>\n\n\n\n<p>To wrap things up, these distant reading projects at the Loretta C. Duckworth Scholars Studio have allowed us to interrogate two drastically different &#8220;genres&#8221; through digital methods: first the Cli-Fi corpus and now the Banned Book corpus. 
We have not only come to better understand literature and the way it both reflects and acts upon society, but have prepared a valuable dataset for scholars and students alike to interpret the presence of forces actively working to silence marginalized voices and limit representations of cultural, racial, or gender diversity.&nbsp;<\/p>\n\n\n\n<p>We are currently readying our data for future publication, so please contact the Scholars Studio if you would like to use our findings for your own research. It is our hope that future scholars can use our datasets to explore questions such as whether specific language correlates with books being banned, or how these banned books might compare with a more general corpus.<\/p>\n\n\n\n<p>By exploring these texts and interrogating the banned book phenomenon in the U.S., we can intelligently advocate for the value in and importance of telling these stories, and let those who may support the totalitarian banning of books know that we are &#8220;with the banned.&#8221;<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><img decoding=\"async\" src=\"https:\/\/i.etsystatic.com\/36851187\/r\/il\/e85bf6\/4737027227\/il_1588xN.4737027227_iq0g.jpg\" style=\"width:493px;height:auto\" \/><figcaption class=\"wp-element-caption\">&#8220;I&#8217;m with the Banned&#8221; sticker from <a href=\"https:\/\/www.etsy.com\/listing\/1277814002\/im-with-the-banned-sticker-banned-books\">StickerConnectionCO<\/a> on Etsy.<\/figcaption><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><strong>References<\/strong><\/h3>\n\n\n\n<p>D&#8217;Ignazio, C. and L. F. Klein (2020). <em>Data Feminism<\/em>. MIT Press. <a href=\"https:\/\/data-feminism.mitpress.mit.edu\/\">https:\/\/data-feminism.mitpress.mit.edu\/<\/a>&nbsp;<\/p>\n\n\n\n<p>Klein, L.F. (2018). \u201cDistant Reading After Moretti.\u201d <em>Stanford Humanities Center<\/em>. 
<a href=\"https:\/\/shc.stanford.edu\/arcade\/interventions\/distant-reading-after-moretti\">https:\/\/shc.stanford.edu\/arcade\/interventions\/distant-reading-after-moretti<\/a>.&nbsp;<\/p>\n\n\n\n<p>Moretti, F. (2000). Conjectures on World Literature. <em>New Left Review<\/em>, <em>1<\/em>, 54. <a href=\"http:\/\/libproxy.temple.edu\/login?url=https:\/\/www.proquest.com\/scholarly-journals\/conjectures-on-world-literature\/docview\/1301903612\/se-2\">http:\/\/libproxy.temple.edu\/login?url=https:\/\/www.proquest.com\/scholarly-journals\/conjectures-on-world-literature\/docview\/1301903612\/se-2<\/a>&nbsp;<\/p>\n\n\n\n<p>Rambsy, K. (2016). Text-Mining Short Fiction by Zora Neale Hurston and Richard Wright using Voyant Tools. <em>CLA Journal, 59<\/em>(3), 251\u2013258. <a href=\"http:\/\/www.jstor.org\/stable\/44325917\">http:\/\/www.jstor.org\/stable\/44325917<\/a>&nbsp;<\/p>\n\n\n\n<p>Underwood, T. (2019). <em>Distant horizons: Digital evidence and literary change<\/em>. University of Chicago Press. <a href=\"https:\/\/ebookcentral.proquest.com\/lib\/templeuniv-ebooks\/detail.action?docID=5524170#\">https:\/\/ebookcentral.proquest.com\/lib\/templeuniv-ebooks\/detail.action?docID=5524170#<\/a>&nbsp;<\/p>\n\n\n\n<p>Wermer-Colan, Alex. (2017). Building a New Wave Science Fiction Corpus. <em>Scholars Studio Blog<\/em>. <a href=\"https:\/\/sites.temple.edu\/tudsc\/2017\/12\/20\/building-new-wave-science-fiction-corpus\/\">https:\/\/sites.temple.edu\/tudsc\/2017\/12\/20\/building-new-wave-science-fiction-corpus\/<\/a>&nbsp;<\/p>\n\n\n\n<p>Wermer-Colan, Alex. (2018). Modeling the New Wave: On Learning to Use Machines to Read Sci-Fi Lit. <em>Scholars Studio Blog<\/em>. 
<a href=\"https:\/\/sites.temple.edu\/tudsc\/2018\/04\/26\/modeling-the-new-wave-on-learning-to-use-machines-to-read-sci-fi-lit\/\">https:\/\/sites.temple.edu\/tudsc\/2018\/04\/26\/modeling-the-new-wave-on-learning-to-use-machines-to-read-sci-fi-lit\/<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By SaraGrace Stefan<\/p>\n","protected":false},"author":34028,"featured_media":8897,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[286,405,2,288],"tags":[475,329,6],"class_list":["post-8749","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cultural-studies","category-digital-humanities","category-grad-students","category-literary-studies","tag-banned-books","tag-text-analysis","tag-top-news"],"_links":{"self":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8749","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/users\/34028"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/comments?post=8749"}],"version-history":[{"count":18,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8749\/revisions"}],"predecessor-version":[{"id":9334,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/8749\/revisions\/9334"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media\/8897"}],"wp:attachment":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media?parent=8749"}],"wp:term":[{"taxonomy":"category","e
mbeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/categories?post=8749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/tags?post=8749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}