By Crys Pikarski
Introduction
Music is an art form of many layers, each involving a different aspect of creativity, all shaped into one final product. When you listen to a song, these parts combine to give you the listening experience you are meant to have. As someone who has dabbled in poetry and creative writing, though, I have always found lyrics to be the aspect that fascinates me most and that I connect with most. With this interest in mind, I decided to treat song lyrics as a form of poetry and perform textual analysis on them using computational means.
For this project, I wanted to take a musician’s entire discography’s worth of lyrics and analyze overarching themes at the song level as well as the album level, as a way to understand changes in the artist’s writing style and thematic priorities. The artist I selected is Linkin Park, whom I chose for a variety of reasons:
- the band is notorious for drastic musical changes from album to album, particularly in the latter half of their discography
- their discography is finite and, at 7 studio albums, relatively small compared to other bands
- I have my own partial assumptions from listening to their music for almost 2 decades
Linkin Park’s 7 studio albums have had a wide range of critical and commercial reception, with their first 2 albums being critically acclaimed and labeled as ‘must-listens’ by many music reviewers. Their musical style shifted greatly from album to album, leaving listeners and reviewers alike with great nostalgia for the band’s nu-metal roots and less interest in the more pop-driven melodies of their later albums. After the sudden death of lead singer Chester Bennington in 2017, Metal Injection published a detailed retrospective article about these musical shifts (see References). For a computational analysis of the musical aspects of Linkin Park’s discography, I recommend Karanveer Singh’s project on the topic (Singh, 2018), as my project is exclusively focused on lyrical content.
My hypothesis coming into this project was that many of the lyrical themes in Linkin Park’s songs have remained rather consistent. If so, the change in sound, rather than their writing style becoming more generic and pop-friendly, would be the core reason for the critical ‘fallout’. From my perspective as a casual listener prior to this project, these common themes were depression, angst, and emotional/mental health.
Building and Cleaning the Corpus
For this textual analysis, I curated the lyrical content of all 7 main-line albums of Linkin Park’s discography. To do this, I used the well-known lyric website Genius and copied, by hand, the lyrics of every song on the following albums, excluding bonus tracks and instrumentals:
- Hybrid Theory (2000)
- Meteora (2003)
- Minutes to Midnight (2007)
- A Thousand Suns (2010)
- LIVING THINGS (2012)
- The Hunting Party (2014)
- One More Light (2017)
From these albums, I came out with 81 text files for individual tracks. I then combined these tracks into 7 album files, leading to 88 total ‘raw’ text files. I chose Voyant Tools for this project due to its easily digestible visuals and flexibility in textual analysis methods, as well as my personal experience using it for prior coursework. However, Voyant’s option to exclude stop words did not work for the duration of this project, which left the textual analysis cluttered with words of little-to-no analytical value. To fix this issue, I ran the files through a customized text cleaning process written in Python in Google Colab. While doing this, I also realized that a customized stop word list was essential, because song lyrics often contain ‘noises’ (oooh and ahh, for example) that are not worth keeping in a textual analysis.
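A minimal sketch of what such a cleaning step could look like is below. It is not the actual Colab notebook from the repository: the stop word set, the sample line, and the `clean_lyrics` function name are placeholders for illustration, and the tokenizer assumes English lyrics written in plain ASCII letters.

```python
import re

# Hypothetical stop words: a few function words plus the vocal "noises"
# mentioned above. The real customized list lives in the project repository.
CUSTOM_STOP_WORDS = {"the", "a", "and", "i", "you", "to", "oooh", "ahh", "oh", "yeah"}

def clean_lyrics(raw_text, stop_words=CUSTOM_STOP_WORDS):
    """Lowercase the lyrics, keep only word tokens, and drop stop words."""
    # Normalize curly apostrophes so contractions tokenize consistently.
    text = raw_text.lower().replace("\u2019", "'")
    # A token is a run of letters, optionally with internal apostrophes.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text)
    return " ".join(token for token in tokens if token not in stop_words)

# Made-up sample line, not taken from any song.
sample = "Oooh, I keep on falling away\nAhh, and the night is gone"
print(clean_lyrics(sample))  # keep on falling away night is gone
```

Keeping the stop words in a plain set makes it easy to add new ‘noise’ words as they turn up in the Voyant output.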
For this textual analysis, I broke the entire lyrical discography of Linkin Park’s main 7 albums into 9 corpora:
- every mainline song on all 7 albums as individual text files (1 corpus)
- each album’s songs combined into an album text file, leading to 7 album-based text files (1 corpus)
- every song from each album in an album-specific corpus (Hybrid Theory tracks all in one corpus, Meteora tracks in a Meteora corpus, etc.), leading to 7 corpora
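The step of combining song files into album files can be sketched as follows. This assumes a hypothetical folder layout with one subfolder of song `.txt` files per album; the `build_album_files` name and the folder names in the usage comment are illustrative, not the layout of the actual repository.

```python
from pathlib import Path

def build_album_files(songs_root, albums_root):
    """Concatenate each album folder's song files into one album text file."""
    songs_root, albums_root = Path(songs_root), Path(albums_root)
    albums_root.mkdir(parents=True, exist_ok=True)
    written = []
    for album_dir in sorted(p for p in songs_root.iterdir() if p.is_dir()):
        # Sort by filename so tracks are joined in a stable order.
        song_files = sorted(album_dir.glob("*.txt"))
        combined = "\n\n".join(
            f.read_text(encoding="utf-8").strip() for f in song_files
        )
        out_path = albums_root / f"{album_dir.name}.txt"
        out_path.write_text(combined + "\n", encoding="utf-8")
        written.append(out_path)
    return written

# Example call (hypothetical folder names):
# build_album_files("raw/songs", "raw/albums")
```

With 7 album folders as input, this would write 7 album files alongside the individual song files.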
The customized stop word list, the Google Colab code I personally used to clean the ‘raw’ text files, and all 176 text files (88 raw and 88 clean) can be accessed on my GitHub repository alongside the document containing all 9 Voyant Tools links for the current corpora.
Text Analysis
For the purposes of this short blog post about my project, I am going to focus on the combined albums corpus, as it is the easiest to digest visually and also effectively shows the album-based thematic developments. As a starting point, I used the Topics Tool within Voyant to see what textual commonalities are observable on a surface level. This tool uses the Latent Dirichlet Allocation (LDA) technique to create ‘topics’ out of the corpus, within user-set parameters for the number of topics and the number of terms per topic. Topic modeling is generally better suited to larger corpora of dozens of texts or more, but it is a type of text analysis I have previous academic experience in and, for this project in particular, it showed some interesting results.
While at first glance it may seem the topic development is largely just broken up into album-specific topics, Topic #2 (represented as the deeper green in these visuals) is fairly evenly prevalent across all 7 albums. The range for this topic’s prevalence in each album starts at 12.4% (Hybrid Theory). The terms the topic modeling selected for this topic were “know say let got time wanna cause way gone right want fall lost away”. Out of these terms, I think the most interesting words that feel thematically relevant to understanding Linkin Park’s overarching lyrical styles are:
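To make the idea behind LDA more concrete, here is a toy collapsed Gibbs sampler written from scratch. It is only an illustration of the general technique, not Voyant’s implementation; the `lda_gibbs` function, the hyperparameters `alpha` and `beta`, the iteration count, and the tiny example documents are all assumptions made for this sketch.

```python
import random

def lda_gibbs(docs, num_topics, terms_per_topic=5, iterations=200,
              alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                      # tokens per topic
    assignments = []
    # Start by giving every token a random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Weight of each topic given every other token's assignment.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Smoothed share of each topic in each document (its "prevalence").
    proportions = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # Most frequently assigned terms for each topic.
    top_terms = [
        [vocab[i] for i in sorted(range(V), key=lambda i: -row[i])[:terms_per_topic]]
        for row in topic_word
    ]
    return proportions, top_terms

# Tiny made-up "albums" of already-cleaned tokens.
toy_docs = [
    "gone lost away fall gone lost".split(),
    "time know say time know way".split(),
    "lost away time know gone say".split(),
]
proportions, top_terms = lda_gibbs(toy_docs, num_topics=2)
print(top_terms)
print(proportions)
```

The two user-set parameters mentioned above correspond to `num_topics` and `terms_per_topic` here. Because the sampler is random, different seeds can produce different topics, which is one reason results on a small corpus should be read cautiously.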
- gone
- fall
- lost
- away
These four words seem to correlate with the theme of loss, which ties into the themes I noted earlier in Linkin Park’s music (depression, angst, and emotional/mental health).
From here, I isolated 3 of these 4 terms (‘away’ is in multiple song titles and is far more prevalent in the corpus than the other three, making visualizations hard to decipher) and checked how they persist across albums using the Trends Tool in Voyant.
From this graph, it can be noted that while this loss-themed topic is heavily represented in all 7 albums, the specific word choice used to express the theme changes from album to album. This fits rather well with my hypothesis that the themes do not change drastically from album to album, but it also brings into the equation how word choice affects the portrayal of those themes.
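The kind of per-album comparison behind such a trends graph can be approximated with a few lines of Python. This sketch assumes relative frequency means a term’s count divided by the total tokens in each album file; the `relative_frequencies` helper and the two tiny example texts are made up for illustration.

```python
from collections import Counter

def relative_frequencies(documents, terms):
    """Return {term: [relative frequency in each document]} for cleaned texts."""
    trends = {term: [] for term in terms}
    for text in documents:
        tokens = text.split()
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        for term in terms:
            trends[term].append(counts[term] / total)
    return trends

# Two made-up "album" texts of already-cleaned tokens.
albums = ["gone gone fall away", "lost lost lost fall"]
for term, values in relative_frequencies(albums, ["gone", "fall", "lost"]).items():
    print(term, values)
```

Dividing by the album’s total token count keeps longer albums from dominating the comparison simply because they contain more words.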
Next Steps
While this is just the start of a project I plan on developing further, these initial results are promising as potential evidence that lyrical themes are similar throughout the majority of Linkin Park’s discography. Currently, the next steps I have in mind for further expansion of this project are:
- looking through all 9 corpora and further tuning the stop words list to achieve effective results free of clutter
- doing in-depth analysis of lyrical themes for each album corpus individually
- incorporating the metadata I collected for the project into further analysis, most notably to see whether the lyrical content shifts greatly depending on who is providing vocals on the track
Updates on these next steps and more will be posted over time on my GitHub repository dedicated to this project.
References
Google Colab. (n.d.). Google. Retrieved May 9, 2024, from https://colab.research.google.com/
Linkin Park. (n.d.). Genius. Retrieved May 9, 2024, from https://genius.com/artists/Linkin-park
Sinclair, S., & Rockwell, G. (2016). Voyant Tools. Voyant Tools. https://voyant-tools.org
Singh, K. (2018). Analyzing the evolution of Linkin Park’s music over the years. https://kvsingh.github.io/lp-music.html
The impact and legacy of Linkin Park’s work. (2017, July 25). Metal Injection. https://metalinjection.net/editorials/the-impact-and-legacy-of-linkin-parks-work