Loretta C. Duckworth Scholars Studio

How to Scrape and Analyze YouTube Data: Prototyping a Digital Project on Immigration Discourse

Posted on December 12, 2018 (updated January 7, 2019) by Nicole Lemire Garlic

As the world’s most utilized platform for video sharing, YouTube houses a wealth of culturally-relevant data that researchers and academics are only beginning to explore. Over one billion users view and upload videos on the platform each month. Part of what makes YouTube an interesting platform to study is its multimodality and intertextuality—each page includes multiple forms of mediated communication that refer to one another. Algorithmically-chosen recommended videos, user comments, and advertising videos appear dynamically on the same screen as the video originally posted for sharing.

At the Digital Scholarship Center, a group of graduate students and a postdoc hailing from communication and media studies, political science, and English (Alex Wermer-Colan, Ania Korsunska, Caroline Tynan, Jeff Antsen, Luling Huang, and I) have joined forces for a collaborative YouTube scraping project. Recent mediated images of the migrant caravan storming the U.S.-Mexico border led us to investigate national immigration and border discourses that circulate on the platform. We are exploring the ways in which content creators and users discuss immigration, asylum seekers, border security, and children held in immigration detention awaiting their, or their parents’, hearings. We’ve looked at various genres of videos, including news, advertisements, and music videos, with titles such as “Border patrol official: ‘Zero Tolerance’ Defense” and “Together is Beautiful.”

When approaching a project like this one, the sheer amount and variety of data produced on (and by) the platform can be overwhelming. Looking at the screenshot above for reference, there is the video news clip discussing the border crossing, the metadata about the video, the subscriber information about the YouTube channel on which the video is playing, the recommended videos highlighted by YouTube’s algorithms, and the advertisement in the upper right-hand corner of the screen. And below the video is a treasure trove of user comments.

Having worked with this patchwork of interwoven data for a few months, and having completed my own independent research analyzing race-related dialogue in YouTube comments discussing the 2018 Black Panther film, I’ve gathered some thoughts and tips on how to mine and analyze YouTube for all it’s worth.

Step One: Defining your Interest

When designing a YouTube project, a good place to start is Burgess & Green’s YouTube book for an overview of the platform’s history, how it works, and its cultural significance. From there, decide how you’d like to focus your study.

Here are some potential research foci, modes of analysis, and the relevant YouTube data. This chart is drawn primarily from the communication and media studies field, but English, political science, and other social sciences would find interesting questions to ask about YouTube as well.

Research Interest | Modes of Analysis | YouTube Data

Content
Video content | Semiotic analysis; Quantitative content analysis; Qualitative textual analysis; Genre analysis; Discourse analysis | Videos (incl. likes/dislikes)
Comments content | Quantitative content analysis; Qualitative textual analysis; Emoji analysis; Topic modeling; Word vectorization; Network analysis | Comments (incl. likes/dislikes)
Video recommendations | Quantitative content analysis; Qualitative textual analysis; Network analysis | Recommended videos; Video networks
In-Video Advertisements | Quantitative content analysis; Qualitative textual analysis; Network analysis | Comments; Video networks
Content creators | Quantitative content analysis; Qualitative textual analysis; Network analysis | Video metadata; Subscription data

Discourse
How is an issue framed? | Framing analysis | Comments; Videos
Who is interacting with this content, and how?
  •  Comments | Conversation analysis; Computer-mediated discourse analysis (CMDA); Network analysis | Comment metadata; Usernames; User icons; Uploads; User subscription data; Geolocation data (with content creator’s grant of access)
  •  Response Videos | Semiotic analysis; Qualitative content analysis; Network analysis | Videos; Video metadata; Video network

Step Two: How to Scrape YouTube

Once you’ve decided on your project’s focus, and what type of data you need to collect, the next step is to scrape. One of the best open-source and user-friendly tools I’ve found is YouTube Data Tools, hosted by the University of Amsterdam’s Digital Methods Initiative. The scraper uses its own credentials to access version 3 of the YouTube Data API, saving you the step of registering for your own Google access token. With this YouTube scraper, you can pull user comments, metadata about a YouTube channel, and lists of videos found via keyword search. You can also create networks of users, videos, and recommended videos. To scrape other types of data, such as images, you would need a different tool.
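For readers curious about what a comment request to the YouTube Data API v3 looks like, here is a minimal Python sketch that only builds the request URL for the API's commentThreads endpoint. It is not how YouTube Data Tools is implemented, and it would require your own API key; the function name and the placeholder key are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 for top-level comment threads.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"


def comment_threads_url(video_id, api_key, page_token=None):
    """Build a request URL for one page of comment threads on a video.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    params = {
        "part": "snippet",          # ask for the comment text and metadata
        "videoId": video_id,        # the video whose comments we want
        "maxResults": 100,          # largest page size the API accepts
        "textFormat": "plainText",  # plain text rather than HTML
        "key": api_key,             # your own API key (placeholder below)
    }
    if page_token:
        # Later pages are requested with the token returned by the
        # previous response.
        params["pageToken"] = page_token
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(comment_threads_url("SNWic0kGH-E", "YOUR_API_KEY"))
```

Fetching the URL returns JSON that still has to be paged through and flattened into rows, which is the work the scraper does for you.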

In this brief tutorial, I will focus on scraping user comments with YouTube Data Tools. With a few clicks, the software will scrape comments, emojis intact. All you need is the video ID: the string of characters that follows “v=” in the address of that video’s YouTube page (e.g., SNWic0kGH-E).
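If you are collecting IDs for many videos, copying them by hand gets tedious. The following is a minimal Python sketch, using only the standard library, that pulls the ID out of a video address. The function name is hypothetical, and the sketch assumes only the two common address shapes (a standard watch page and a youtu.be short link).

```python
from urllib.parse import urlparse, parse_qs

# Hostnames treated as standard YouTube watch pages (an assumption;
# extend the set if your links use other hosts).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a YouTube address, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host in WATCH_HOSTS:
        # Watch pages carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=SNWic0kGH-E"))
```

Reading the “v” parameter rather than slicing the last characters of the address keeps the result correct when the link carries extra parameters such as a timestamp.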

The scraper outputs a neatly organized spreadsheet of the scraped comments alongside the exact time each comment was made, user information, and information about replies. The spreadsheet can be opened in Google Sheets. Using this data, a simple sort on the “replyCount” column brings threaded conversations to the top, making it easier to focus on dialogue. Alternatively, the comments alone can be concatenated into one large text file for topic modeling or other corpus analytics.
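The same two operations can also be scripted. Below is a minimal Python sketch that sorts comments by “replyCount” and joins the comment text into one corpus. The “replyCount” column comes from the scraper output described above; the “text” column name, the function names, and the three sample rows are assumptions made for illustration, so check the header row of your own file.

```python
import csv
import io

# Invented sample rows standing in for a real export.
SAMPLE_CSV = """id,replyCount,text
a1,2,First comment
a2,0,Second comment
a3,5,Third comment
"""


def reply_count(row):
    """Read the replyCount cell as an integer, treating blanks as zero."""
    return int(row.get("replyCount") or 0)


def threaded_comments(rows, min_replies=1):
    """Keep comments with at least min_replies replies, most-replied first."""
    kept = [row for row in rows if reply_count(row) >= min_replies]
    return sorted(kept, key=reply_count, reverse=True)


def build_corpus(rows, text_column="text"):
    """Join every non-empty comment into one newline-separated string."""
    return "\n".join(
        row[text_column].strip() for row in rows if row.get(text_column)
    )


# For a real export, replace io.StringIO(SAMPLE_CSV) with something like
# open("comments.csv", newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
threads = threaded_comments(rows)
corpus = build_corpus(rows)

print([row["id"] for row in threads])
print(corpus)
```

Writing the corpus string to a .txt file with UTF-8 encoding should keep emojis intact for later analysis.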

As we at the DSC continue working on our YouTube scraping project, we will look for what comments and other forms of YouTube data tell us about the changing nature of discourse around immigration and the border wall. Through our analyses, we will explore the way the YouTube platform shapes and limits the range of cultural discourse around politically-charged topics.

1 thought on “How to Scrape and Analyze YouTube Data: Prototyping a Digital Project on Immigration Discourse”

  1. Emilia Jazz says:
    December 10, 2019 at 11:45 am

    The use of web scraping here helps to extract only those information which is relevant. For instance, scraping information about competitor’s pricing strategies, client reviews, feedbacks of clients about various topics such as new product launches and so much more. All these can conduct quality actions to serve such clients more easily and quickly.
