Loretta C. Duckworth Scholars Studio

Scraping for Studying Online Political Discussion

Posted on September 20, 2016 (updated January 25, 2021) by Luling Huang

My first blog post (1) briefly introduces my research plan at the DSC and (2) describes the data I am gathering.

What does the project look like?
I am studying the relationship between conversational structure and belief or attitude change in political discussion. For example, in a presidential election, are Candidate A’s supporters more likely to reply when they are addressed by Candidate B’s supporters? Or when they are addressed by someone whose political leaning reflects their own? My goal is to capture the dynamics of belief or attitude change during naturally occurring social interaction. I think online political discussion forums are ideal for my research questions. These forums allow researchers to capture naturally occurring social interaction and to apply computational methods to analyze a large amount of textual data.

Methodologically, there are two main components. First, conversational structure is operationalized as a stream of “participation shifts” (Gibson, 2003; 2005). A participation shift tells us “who addresses whom” in any two adjacent turns in group conversation. A shift is denoted as [initial speaker][initial addressee][hyphen][following speaker][following addressee]. Gibson (2003, p. 1342) classified thirteen categories. Here are three examples from his inventory:

  • AB-BA denotes the situation when the initial addressee B replies to the initial speaker A in a following turn.
  • AB-XB denotes the situation when a third individual X addresses the initial addressee B.
  • A0-XA denotes the situation when the initial addressee is the group (denoted by “0”), and one of the group members X replies to A.
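The notation can be made concrete with a small labeling function. This is only a minimal sketch of the naming scheme described above, not Gibson's own procedure: the function name, the use of `None` for the group, and the letter "Y" for a new addressee are assumptions made for illustration, and the function does not check that a label belongs to Gibson's thirteen categories.

```python
GROUP = None  # hypothetical convention: None marks a turn addressed to the whole group


def label_shift(speaker1, addressee1, speaker2, addressee2):
    """Return a label such as 'AB-BA' for two adjacent turns."""

    def role(person, speaking):
        # Map a participant to a letter relative to the initial turn.
        if person is GROUP:
            return "0"
        if person == speaker1:
            return "A"
        if person == addressee1:
            return "B"
        # Anyone not involved in the initial turn: X if speaking, Y if addressed.
        return "X" if speaking else "Y"

    first = "A" + role(addressee1, speaking=False)
    second = role(speaker2, speaking=True) + role(addressee2, speaking=False)
    return f"{first}-{second}"


# The three examples from the list above, with made-up member names.
print(label_shift("ann", "bob", "bob", "ann"))  # AB-BA
print(label_shift("ann", "bob", "cat", "bob"))  # AB-XB
print(label_shift("ann", GROUP, "cat", "ann"))  # A0-XA
```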

We can do two things with participation shifts. (1) Group members can be differentiated based on how often they are involved in certain categories of shifts. We can then look at how social ties or members' beliefs (see below) may affect this differentiation. (2) A relational event model can be built by organizing participation shifts sequentially (Butts, 2008; Liang, 2014). We can then investigate whether a set of exogenous and endogenous variables (e.g., political lean, popularity of post authors, opinion congruity, and past posting behavior) affects the relational event streams.
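Point (1) amounts to a tally. A minimal sketch, assuming each shift has already been labeled and paired with the member who spoke in the following turn; the function name, member names, and sample events are made up for illustration, and counting only the following speaker is one of several ways to define "involvement".

```python
from collections import Counter, defaultdict


def tally_shifts(events):
    """Count, for each member, how often they produce each shift category.

    `events` is an iterable of (shift_label, following_speaker) pairs.
    """
    profile = defaultdict(Counter)
    for label, speaker in events:
        profile[speaker][label] += 1
    return profile


# Hypothetical labeled shifts: (category, member who took the following turn)
events = [
    ("AB-BA", "bob"),
    ("AB-XB", "cat"),
    ("A0-XA", "cat"),
    ("AB-BA", "bob"),
]

profile = tally_shifts(events)
for member, counts in sorted(profile.items()):
    print(member, dict(counts))
```

Members whose counts concentrate in different categories occupy different conversational roles, which is the differentiation that social ties or beliefs might then help explain.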

Second, my study's contribution to previous work is to add another “layer” to the participation shifts and the relational event model just described. Attitude or belief change may occur during interaction just as participation shifts do. Therefore, if social ties among individuals can be related to participation shifts (Gibson, 2005), why not attitudes or beliefs? If, for each turn, attitudes or beliefs can be measured from what individuals actually say, rather than from pre- or post-discussion surveys, we can build a stream of attitude or belief shifts. The challenge for me here is to use computational methods to measure psychological states from a large amount of textual data. I will be learning how to apply opinion mining to extract attitudes or beliefs from texts. If you're interested in my research topics or are an expert on opinion mining, I'd be happy to have a discussion and hear your advice.

What does the data look like?
My data comes from debatepolitics.com, a popular U.S. political discussion forum that uses moderators to promote a nonpartisan and civil environment. Anyone can register as a user with an email account. Like most forums, debatepolitics.com is asynchronous, which means that conversation does not require users to be on the website at the same time. I narrow my focus to the subforum on the 2016 presidential election.

I'm collecting information on all threads in the subforum, including thread title and thread author. For each thread, I collect information on all posts, including the post author, the author's ideological lean, gender, and total number of posts in the forum, and the post's date and time. For post content, I collect the entire message, as well as any quotes it contains and those quotes' original authors. This quote information allows me to construct participation shifts post by post.
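To make the shape of this data concrete, here is a minimal sketch of one post record and of how quotes could be turned into "who addresses whom". The field names and sample values are hypothetical, and the rule used here (a post addresses the first author it quotes, or the whole group if it quotes no one) is an illustrative assumption rather than a settled coding scheme.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Field names are hypothetical; they mirror the variables listed above.
    thread_title: str
    author: str
    ideological_lean: str
    gender: str
    total_posts: int
    posted_at: str  # date and time as scraped, left unparsed
    message: str
    quoted_authors: List[str] = field(default_factory=list)


def turns(posts):
    """Yield (speaker, addressee) for each post, in thread order.

    Assumption: a post addresses the first author it quotes;
    a post with no quotes addresses the group (None).
    """
    for post in posts:
        addressee = post.quoted_authors[0] if post.quoted_authors else None
        yield post.author, addressee


def adjacent_pairs(posts):
    """Pair each turn with the turn that follows it."""
    seq = list(turns(posts))
    return list(zip(seq, seq[1:]))


# A made-up three-post thread.
thread = [
    Post("Debate recap", "ann", "Liberal", "Female", 120, "09-20-2016 10:00", "Thoughts?"),
    Post("Debate recap", "bob", "Conservative", "Male", 85, "09-20-2016 10:05",
         "I disagree.", quoted_authors=["ann"]),
    Post("Debate recap", "ann", "Liberal", "Female", 121, "09-20-2016 10:09",
         "Why?", quoted_authors=["bob"]),
]

for first, second in adjacent_pairs(thread):
    print(first, "->", second)
```

Each pair of adjacent turns is the raw material for one participation shift.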

How am I getting the data?
Considering my limited programming skills and the fairly neat structure of the forum website, I used the free Chrome extension Web Scraper to get my data. The extension uses CSS selectors to locate the data we want to scrape. Users just click the parts of a page they want, and Web Scraper makes its best guess at the corresponding CSS selector.
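For readers new to CSS selectors, the sketch below shows what a simple class selector such as `.username` picks out of a page. It is not how Web Scraper works internally; it is a simplified stand-in written with Python's standard `html.parser` module, it handles only a single class name, and the markup and class names are invented for illustration rather than taken from debatepolitics.com.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given class,
    roughly what the CSS selector `.classname` would match.

    Simplification: assumes no unclosed void tags (such as <br>)
    inside a matching element.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0  # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.results[-1] += data


def select_by_class(html, class_name):
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Invented forum markup, for illustration only.
SAMPLE = """
<div class="post">
  <span class="username">ann</span>
  <div class="content">I disagree.</div>
</div>
<div class="post">
  <span class="username">bob</span>
  <div class="content">Why?</div>
</div>
"""

print(select_by_class(SAMPLE, "username"))
print(select_by_class(SAMPLE, "content"))
```

Clicking a username in Web Scraper is, in effect, a way of arriving at a selector like `.username` without having to read the page's HTML.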

In my next blog post, I will describe how I use the Chrome extension Web Scraper to collect the data and how I proceed to clean it.

 

 

References

Butts, C. T. (2008). A relational event framework for social action. Sociological Methodology, 38, 155-200. doi:10.1111/j.1467-9531.2008.00203.x

Gibson, D. R. (2003). Participation shifts: Order and differentiation in group conversation. Social Forces, 81, 1335-1380. doi:10.1353/sof.2003.0055

Gibson, D. R. (2005). Taking turns and talking ties: Networks and conversational interaction. American Journal of Sociology, 110, 1561-1597. doi:10.1086/428689

Liang, H. (2014). The organizational principles of online political discussion: A relational event stream model for analysis of web forum deliberation. Human Communication Research, 40, 483-507. doi:10.1111/hcre.12034
