Loretta C. Duckworth Scholars Studio

Photo of the members of Linkin Park

Textual Analysis of Linkin Park’s Discography with Voyant Tools

Posted on May 9, 2024 (updated May 10, 2024) by Crys Pikarski

By Crys Pikarski

Introduction

Music is an art form of many layers, each a distinct creative effort that is shaped into one final product. When you listen to a song, all these parts combine to give you the intended listening experience. As someone who has dabbled in poetry and creative writing, though, I have always found lyrics to be the aspect that fascinates me most and that I connect with most. With this interest in mind, I decided to treat song lyrics as a form of poetry and analyze them with computational textual analysis.

For this project, I wanted to take a musician’s entire discography’s worth of lyrics and analyze overarching themes at the song level as well as the album level, as a way to understand changes in the artist’s writing style and thematic priorities. The artist I selected is Linkin Park, for a variety of reasons:

  • the band is notorious for drastic musical changes from album to album, particularly in the latter half of their discography
  • their discography is finite and, at 7 studio albums, relatively small compared to other bands
  • I have my own partial assumptions from listening to their music for almost 2 decades now

Linkin Park’s 7 studio albums have had a wide range of critical and commercial reception, with the first 2 being critically acclaimed and labeled ‘must-listens’ by many music reviewers. Their musical style shifted greatly from album to album, leaving listeners and reviewers alike with great nostalgia for the nu-metal roots and less regard for the more pop-driven melodies of the later albums. After the sudden death of lead singer Chester Bennington in 2017, Metal Injection published a detailed retrospective article about these musical shifts, which is listed in the References. For a computational analysis of the musical aspects of Linkin Park’s discography, I recommend Karanveer Singh’s project on the topic (also listed in the References), as my project is exclusively focused on lyrical content.

My hypothesis coming into this project was that the lyrical themes in Linkin Park’s songs have remained rather consistent, which would make the change in sound, rather than the writing becoming more generic and pop-friendly, the core reason for the critical ‘fallout’. From my perspective as a casual listener prior to this project, these common themes were depression, angst, and emotional/mental health.

Building and Cleaning the Corpus

For this textual analysis, I curated the lyrical content of all 7 mainline albums of Linkin Park’s discography. To do this, I used the well-known lyric website Genius and, by hand, copied the lyrics of every song on the following albums, excluding bonus tracks and instrumentals:

  • Hybrid Theory (2000)
  • Meteora (2003)
  • Minutes to Midnight (2007)
  • A Thousand Suns (2010)
  • LIVING THINGS (2012)
  • The Hunting Party (2014)
  • One More Light (2017)

From these albums, I ended up with 81 text files, one per track. I then combined the tracks into 7 album files, for 88 ‘raw’ text files in total. I chose Voyant Tools for this project because of its easily digestible visuals, its flexibility in textual analysis methods, and my experience using it in prior coursework. However, Voyant’s option to exclude stop words was not working for the duration of this project, which left the analysis cluttered with words of little to no analytical value. To fix this, I ran the files through a customized text cleaning process written in Python in Google Colab. While doing this, I also realized that a customized stop word list was essential, because lyrics often contain vocal ‘noises’ (oooh and ahh, for example) that are not worth keeping in a textual analysis.
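
The cleaning step described above could look roughly like the following sketch. The stop word set and the vocal-noise pattern here are hypothetical stand-ins chosen for illustration; they are not the customized list from the project repository, and the function name is my own.

```python
import re

# Hypothetical, deliberately tiny stop word set (an assumption for this sketch).
GENERIC_STOP_WORDS = {"the", "a", "an", "and", "i", "you", "it", "to", "of", "in", "my", "me", "is"}

# Hypothetical pattern for sung 'noises' such as "oh", "oooh", "ahh", "uh", "mmm".
VOCAL_NOISE = re.compile(r"^(?:o+h+|a+h+|u+h+|m{2,})$")


def clean_lyrics(raw_text, stop_words=GENERIC_STOP_WORDS):
    """Lowercase the text, tokenize it, and drop stop words and vocal noises."""
    # Normalize curly apostrophes so contractions tokenize consistently.
    text = raw_text.lower().replace("\u2019", "'")
    # Keep runs of letters and apostrophes, then trim stray apostrophes.
    tokens = [t.strip("'") for t in re.findall(r"[a-z']+", text)]
    kept = [
        t for t in tokens
        if t and t not in stop_words and not VOCAL_NOISE.match(t)
    ]
    return " ".join(kept)


# Made-up example line, not an actual lyric.
print(clean_lyrics("Oooh, I fall away... ahh"))
```

Treating the noises as a pattern rather than a fixed word list means stretched spellings (oh, ooh, oooh) are caught without listing each one, at the cost of possibly matching a real word that happens to fit the pattern.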

For this textual analysis, I broke the entire lyrical discography of Linkin Park’s main 7 albums into 9 corpora:

  • every mainline song on all 7 albums as individual text files (1 corpus of 81 files)
  • each album’s songs combined into a single album text file (1 corpus of 7 files)
  • every song from each album in an album-specific corpus: Hybrid Theory tracks in one corpus, Meteora tracks in another, and so on (7 corpora)
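
As a rough illustration of how the three groupings above add up to 9 corpora, here is a minimal sketch that builds them from an in-memory mapping of album name to track texts. The function name, the corpus labels, and the placeholder track data are all assumptions for this example; the actual project worked with text files uploaded to Voyant.

```python
def build_corpora(albums):
    """albums maps album name -> {track title: lyrics text}.

    Returns a dict of corpus label -> {document name: text}.
    """
    corpora = {}

    # One corpus in which every track is its own document.
    corpora["all_tracks"] = {
        f"{album} - {track}": text
        for album, tracks in albums.items()
        for track, text in tracks.items()
    }

    # One corpus with a single combined document per album.
    corpora["combined_albums"] = {
        album: "\n\n".join(tracks.values())
        for album, tracks in albums.items()
    }

    # One corpus per album, holding only that album's tracks.
    for album, tracks in albums.items():
        corpora[f"album: {album}"] = dict(tracks)

    return corpora


# Placeholder data: real album names from the list above, dummy track texts.
album_names = [
    "Hybrid Theory", "Meteora", "Minutes to Midnight", "A Thousand Suns",
    "LIVING THINGS", "The Hunting Party", "One More Light",
]
albums = {
    name: {"Track 1": "placeholder text one", "Track 2": "placeholder text two"}
    for name in album_names
}
corpora = build_corpora(albums)
print(len(corpora))
```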

The customized stop word list, the Google Colab code I used to clean the ‘raw’ text files, and all 176 text files (88 raw and 88 clean) can be accessed on my GitHub repository, alongside the document containing all 9 Voyant Tools links for the current corpora.

Text Analysis

For the purposes of this short blog post about my project, I am going to focus on the combined albums corpus, as it is the easiest to digest visually and also effectively shows the album-based thematic developments. As a starting point, I used the Topics Tool within Voyant to see what textual commonalities are observable on a surface level. This tool uses the Latent Dirichlet Allocation (LDA) technique to create ‘topics’ out of the corpus, within user-set parameters for the number of topics and the number of terms per topic. Topic modeling generally works better on larger corpora of dozens of texts or more, but it is a type of text analysis I have previous academic experience with and, for this project in particular, it showed some interesting results.
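
To give a sense of what LDA does with a corpus, here is a minimal sketch of one standard way to fit it, a collapsed Gibbs sampler, in plain Python. This is not Voyant’s implementation, and the function names, the hyperparameters (alpha, beta), the iteration count, and the toy documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic_proportions,
    topic_word_counts), where the proportions in each row sum to 1.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per topic in each doc
    topic_word = [dict() for _ in range(num_topics)]   # word counts per topic
    topic_total = [0] * num_topics                     # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1

    proportions = [
        [(n + alpha) / (len(doc) + alpha * num_topics) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return proportions, topic_word


def top_terms(topic_word, n=5):
    """Return the n most frequent words currently assigned to each topic."""
    return [
        sorted((w for w in counts if counts[w] > 0), key=lambda w: -counts[w])[:n]
        for counts in topic_word
    ]


# Toy documents built from single words, not real lyrics.
toy_docs = [["gone", "lost", "fall"] * 3, ["time", "know", "say"] * 3]
toy_proportions, toy_topic_words = lda_gibbs(toy_docs, num_topics=2, iterations=50)
print(toy_proportions)
print(top_terms(toy_topic_words, n=3))
```

The per-document proportions are the kind of number a topic prevalence percentage reports: the share of a document’s words attributed to a given topic.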

While at first glance the topics may seem to be largely album-specific, Topic #2 (represented as the deeper green in these visuals) is almost equally prevalent across all 7 albums. This topic’s prevalence in each album starts at 12.4% (Hybrid Theory). The terms the topic model selected for this topic were “know say let got time wanna cause way gone right want fall lost away”. Of these terms, the words that feel most thematically relevant to understanding Linkin Park’s overarching lyrical style are:

  • gone
  • fall
  • lost
  • away

These four words all relate to the theme of loss, which ties back to the themes I highlighted earlier in Linkin Park’s music (depression, angst, and emotional/mental health).

From here, I isolated 3 of these 4 terms and checked how they persist across the albums using the Trends Tool in Voyant. I left out ‘away’ because it appears in multiple song titles and is far more frequent in the corpus than the other three, which makes the visualizations hard to decipher.

From this graph, it can be noted that while the loss-themed topic containing these words is heavily represented in all 7 albums, the specific words used to express the theme change from album to album. This fits my hypothesis that the themes do not change drastically from album to album, but it also raises the question of how word choice affects the portrayal of those themes.
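
A comparison like the one above can also be checked outside Voyant by computing each term’s relative frequency per album. The sketch below is one simple way to do that; the function name and the sample documents are hypothetical, and it assumes the texts have already been cleaned into space-separated lowercase tokens.

```python
from collections import Counter


def relative_frequencies(documents, terms):
    """documents maps a name to cleaned text, in the order to be plotted.

    Returns {term: [relative frequency in each document]}.
    """
    trends = {term: [] for term in terms}
    for text in documents.values():
        tokens = text.lower().split()
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on an empty document
        for term in terms:
            trends[term].append(counts[term] / total)
    return trends


# Made-up token strings standing in for two cleaned album files.
sample_documents = {
    "Album A": "gone gone fall time",
    "Album B": "lost know",
}
sample_trends = relative_frequencies(sample_documents, ["gone", "fall", "lost"])
print(sample_trends)
```

Dividing by the document length matters here because the albums differ in how many words they contain, so raw counts alone would favor the longer albums.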

Next Steps

While this is just the start of a project I plan on developing further, these initial results are promising as potential evidence that lyrical themes are similar throughout the majority of Linkin Park’s discography. The next steps I currently have in mind for expanding this project are:

  • looking through all 9 corpora and further tuning the stop word list to achieve effective results free of clutter
  • doing in-depth analysis of lyrical themes for each album corpus individually
  • incorporating the metadata I collected for the project into further analysis, most notably to see whether the lyrical content shifts greatly depending on who is providing vocals on the track

Progress on these next steps and more will be posted over time on my GitHub repository dedicated to this project.

References

Google Colab. (n.d.). Google. Retrieved May 9, 2024, from https://colab.research.google.com/

Linkin Park. (n.d.). Genius. Retrieved May 9, 2024, from https://genius.com/artists/Linkin-park

Sinclair, S., & Rockwell, G. (2016). Voyant Tools. Voyant Tools. https://voyant-tools.org

Singh, K. (2018). Analyzing the evolution of Linkin Park’s music over the years. https://kvsingh.github.io/lp-music.html

The impact and legacy of Linkin Park’s work. (2017, July 25). Metal Injection. https://metalinjection.net/editorials/the-impact-and-legacy-of-linkin-parks-work
