Loretta C. Duckworth Scholars Studio



Measuring Impact of Built Environment on Health Part IV: Data Analysis

Posted on April 22, 2020 (updated January 25, 2021) by Huilin Zhu

Introduction 

My project studies the relationship between the built environment and health outcomes, specifically obesity prevalence in Pennsylvania. My previous post described how I used transfer learning to extract features of the built environment from Google satellite images. This post focuses on the data analysis, which investigates the impact of the built environment on obesity prevalence.

 

Statistical Analysis 

Using VGG as the pre-trained model, I extract 4,096 variables from the satellite images to represent the built environment in each census tract. These variables have no specific individual meaning, but together they can be regarded as indicators of the built environment, capturing properties such as color, gradient, edges, height, and length. Because the data contain so many features (p = 4,096), I use the elastic net algorithm in the data analysis stage. Elastic net is a regularized regression method that combines L1 and L2 penalties: it shrinks the coefficients of uninformative variables to zero while retaining groups of informative, correlated variables. It is especially useful for high-dimensional data, where the number of variables may run into the thousands or even millions.
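
For reference, the elastic net objective can be written as below. This uses the parameterization that scikit-learn documents for its ElasticNet estimator; the symbols (n observations, coefficient vector β, overall penalty strength α, mixing weight ρ) are notation introduced here for illustration, not values taken from the project.

```latex
\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \alpha\rho\,\lVert \beta \rVert_1
  + \frac{\alpha(1-\rho)}{2}\,\lVert \beta \rVert_2^2
```

With ρ = 1 this reduces to the lasso, and with ρ = 0 to ridge regression. The L1 term is what drives some coefficients exactly to zero, while the L2 term helps keep correlated variables together.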

My project investigates how people's body weight is affected by the built environment. Adult obesity prevalence is the dependent variable; the obesity data come from the 500 Cities project. The independent variables are the 4,096 built-environment features extracted by the CNN. Each census tract is one observation. I merged these features with the health variables to examine the association between the built environment and obesity. The following table shows the merged data.
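
A minimal sketch of the merge step, assuming the image features and the health data are both keyed by a census tract identifier. The column names (tract_id, feat_0, feat_1, obesity_pct) and every value below are made-up placeholders, not the project's data.

```python
import pandas as pd

# Hypothetical image features, one row per census tract (placeholder values).
features = pd.DataFrame({
    "tract_id": ["tract_A", "tract_B", "tract_C"],
    "feat_0": [0.12, 0.00, 0.57],
    "feat_1": [1.30, 0.44, 0.00],
})

# Hypothetical health data keyed by the same identifier (placeholder values).
health = pd.DataFrame({
    "tract_id": ["tract_A", "tract_B", "tract_D"],
    "obesity_pct": [31.2, 28.5, 35.0],
})

# An inner join keeps only tracts present in both tables;
# validate= raises an error if a tract ID is duplicated on either side.
merged = features.merge(health, on="tract_id", how="inner", validate="one_to_one")
print(merged)
```

An inner join silently drops tracts that are missing from either table, so it is worth comparing the row count of the merged table against both inputs.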

Using the scikit-learn package in Python, I run an elastic net regression and obtain a coefficient for each feature variable. Only 58 coefficients are retained by the model (nonzero), which means that 58 variables capture features related to the obesity percentage. The following image shows the coefficient value for each independent variable.
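
The sketch below shows what such a fit might look like with scikit-learn's ElasticNet on synthetic data. The data, the penalty settings (alpha, l1_ratio), and the sizes are assumptions chosen for illustration; they are not the settings used in this project.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)

# Synthetic stand-in: more features than observations, few of them informative.
n_samples, n_features = 200, 300
informative = [0, 1, 2, 3, 4]

X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[informative] = 3.0
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# alpha sets the overall penalty strength; l1_ratio mixes the L1 and L2 penalties.
model = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10_000)
model.fit(X, y)

# Features whose coefficients were not shrunk to exactly zero.
selected = np.flatnonzero(model.coef_)
print(f"{selected.size} of {n_features} coefficients are nonzero")
```

In practice the penalty settings are usually chosen by cross-validation (for example with scikit-learn's ElasticNetCV) rather than fixed by hand.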

The figure of elastic net coefficients

Predictions 

To evaluate how well the model predicts obesity prevalence across census tracts, I split the data into two sets: a random sample of fifty percent of the data for fitting, and the remaining fifty percent for model evaluation. The model's coefficient of determination, R², is 0.25. R² is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variables; a value of 1.0 indicates a perfect fit. Here, the R² value indicates that the built-environment variables explain 25 percent of the between-tract variance in obesity prevalence.
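
To make the evaluation concrete, here is a small standard-library sketch of a random 50/50 split and of the R² formula. The helper names split_half and r_squared are hypothetical, and the numbers are toy values, not project results.

```python
import random


def split_half(n_obs, seed=0):
    """Randomly split observation indices into two halves (fit, evaluate)."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    half = n_obs // 2
    return indices[:half], indices[half:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


fit_idx, eval_idx = split_half(26)

# Perfect predictions give R^2 = 1; always predicting the mean gives R^2 = 0.
perfect = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
baseline = r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
print(perfect, baseline)
```

On held-out data R² can also be negative, which means the model predicts worse than simply using the mean of the evaluation set.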

I generated a choropleth map to compare the predicted obesity prevalence with the actual obesity prevalence: 

The first image shows the cross-validated estimates of obesity prevalence based on built-environment features extracted from satellite images; the second shows the actual obesity prevalence. The two maps share some patterns, but the similarity is limited. This indicates that the model explains part of the variation in obesity prevalence, but its predictions are not yet accurate.

Currently, I include only the census tracts in Allentown. Allentown has 26 census tracts, so the sample contains only 26 observations. With far more features than observations, the model is prone to overfitting, which makes the results unreliable. To prevent overfitting, I need to limit the number of selected features to at most the number of census tracts. I will add more Pennsylvania cities to the sample and use the same method to test the relationship between the built environment and body weight. If I obtain a high R², as well as a close match between predicted and actual obesity prevalence, I can conclude that the built environment is correlated with obesity prevalence across neighborhoods.
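
The overfitting risk can be illustrated with a toy experiment: when there are more features than observations, an unpenalized least-squares fit can reproduce the training targets exactly even if the features are pure noise. The sketch below uses random data (26 observations, 100 features, both sizes chosen for illustration) and NumPy only; it is not the project's data or model.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n_tracts, n_features = 26, 100

# Features and targets are independent noise: there is nothing real to learn.
X_train = rng.normal(size=(n_tracts, n_features))
y_train = rng.normal(size=n_tracts)
X_test = rng.normal(size=(n_tracts, n_features))
y_test = rng.normal(size=n_tracts)

# Minimum-norm least-squares solution; with more features than rows
# it interpolates the training targets.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_r2 = r_squared(y_train, X_train @ weights)
test_r2 = r_squared(y_test, X_test @ weights)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The near-perfect training fit alongside a poor held-out fit is the signature of overfitting, which is why R² should be reported on data the model has not seen.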

 

Future work

In the future, I will extend the analysis with control variables: gender, race, median household income, and the percentage of households below the poverty line. I will also gather data on Google places of interest to investigate how the built environment affects body weight through food access.

My broader dissertation research explores how social factors affect health outcomes, particularly how gender, government policy, and urban space affect health and well-being. My first dissertation chapter looks at how maternity leave affects children's health in urban China, and my second chapter discusses the impact of maternity leave on mothers' labor outcomes after childbirth. This digital project will be the third chapter of my dissertation.

While working on this digital project, I encountered many technical problems, such as downloading Google tile images, implementing convolutional neural networks, and displaying results on a map. I've gained a lot of experience in Python coding and have become more familiar with image analysis and CNNs. All of this will help me in my future studies. I really appreciate the help I've received from the Scholars Studio.

 
