

{"id":7173,"date":"2020-04-22T12:35:30","date_gmt":"2020-04-22T16:35:30","guid":{"rendered":"https:\/\/sites.temple.edu\/tudsc\/?p=7173"},"modified":"2021-01-25T11:32:35","modified_gmt":"2021-01-25T15:32:35","slug":"measuring-impact-of-built-environment-on-health-part-iv-data-analysis","status":"publish","type":"post","link":"https:\/\/sites.temple.edu\/tudsc\/2020\/04\/22\/measuring-impact-of-built-environment-on-health-part-iv-data-analysis\/","title":{"rendered":"Measuring Impact of Built Environment on Health Part IV: Data Analysis"},"content":{"rendered":"<p>By Huilin Zhu<\/p>\n<p><!--more--><\/p>\n<h2><b>Introduction\u00a0<\/b><\/h2>\n<p><span style=\"font-weight: 400\">My project aims to study the relationship between built environments and health outcomes, specifically obesity prevalence in Pennsylvania. <\/span><a href=\"https:\/\/sites.temple.edu\/tudsc\/2020\/03\/11\/coding-with-keras-for-transfer-learning-measuring-impact-of-built-environments-on-health-part-iii\/\"><span style=\"font-weight: 400\">My previous blog post <\/span><\/a><span style=\"font-weight: 400\">described how I used transfer learning to extract features of the built environment from Google satellite images. In this post, I focus on the data analysis and investigate the impact of the built environment on obesity prevalence.\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><b>Statistical Analysis\u00a0<\/b><\/h2>\n<p><span style=\"font-weight: 400\">Using <a href=\"https:\/\/keras.io\/applications\/#vgg16\">VGG<\/a> as the pre-trained model, I extract 4,096 variables from the satellite images to represent the built environment in each census tract. These variables do not have a specific meaning individually, but they can be regarded as indicators of the built environment, capturing properties such as color, gradient, edge, height, and length. Since the data contains a large number of features (n=4,096), I use an elastic net algorithm in the data analysis stage. 
Elastic net is a regularized regression method that combines L1 (lasso) and L2 (ridge) penalties: it shrinks the coefficients of uninformative variables to zero while retaining groups of correlated, informative variables. It&#8217;s especially useful when the number of variables runs into the thousands or even millions.<\/span><\/p>\n<p><span style=\"font-weight: 400\">My project aims to investigate how people\u2019s body weight can be affected by the built environment. Adult obesity prevalence is the dependent variable. The obesity data comes from the <a href=\"https:\/\/web.archive.org\/web\/20201111234858\/https:\/\/www.cdc.gov\/500cities\/index.htm\">500 Cities project<\/a>.<\/span><span style=\"font-weight: 400\"> The independent variables are the 4,096 built environment variables extracted by the CNN. Each census tract is regarded as one observation. I combined these variables with the health variables to check the association between the built environment and obesity. The following table shows the merged data.\u00a0<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7174 size-large\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM-1024x325.png\" alt=\"\" width=\"640\" height=\"203\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM-1024x325.png 1024w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM-300x95.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM-768x243.png 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM-850x269.png 850w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.22.03-AM.png 1420w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p><span style=\"font-weight: 400\">Using the scikit-learn package in Python, I run elastic net regression and 
get the coefficient of each feature variable. Only 58 coefficients are significant, which means that 58 variables capture features related to the obesity percentage. The following image shows the coefficient value of each independent variable.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7175 \" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.55.09-AM.png\" alt=\"The figure of elastic net coefficients\" width=\"588\" height=\"368\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.55.09-AM.png 824w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.55.09-AM-300x188.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-5.55.09-AM-768x481.png 768w\" sizes=\"auto, (max-width: 588px) 100vw, 588px\" \/><\/p>\n<h2><b>Predictions\u00a0<\/b><\/h2>\n<p><span style=\"font-weight: 400\">To evaluate how well the model predicts obesity prevalence across all census tracts, I split the data into two sets: a random sample of fifty percent of the data for fitting, and the remaining fifty percent for model evaluation. The model\u2019s coefficient of determination, R<sup>2<\/sup>, is 0.25. R<sup>2<\/sup> is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variables. A value of 1.0 indicates a perfect fit. In this work, an R<sup>2<\/sup> of 0.25 indicates that the built environment variables can explain 25 percent of the variance in obesity prevalence between census tracts. 
<\/span><\/p>\n<p><span style=\"font-weight: 400\">I generated a choropleth map to compare the predicted obesity prevalence with the actual obesity prevalence:\u00a0 <\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-7176 aligncenter\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM-1024x651.png\" alt=\"\" width=\"640\" height=\"407\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM-1024x651.png 1024w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM-300x191.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM-768x488.png 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM-850x540.png 850w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.48.57-AM.png 1532w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-7177 size-large\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM-1024x698.png\" alt=\"\" width=\"640\" height=\"436\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM-1024x698.png 1024w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM-300x205.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM-768x524.png 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM-850x579.png 850w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/04\/Screen-Shot-2020-04-09-at-8.51.29-AM.png 1508w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p><span style=\"font-weight: 400\">The first image represents the cross-validated estimates of obesity prevalence based on features of the built 
environment extracted from satellite images; the second image represents the actual obesity prevalence. The two maps share some patterns, but the overall similarity is not very high. This indicates that the model explains part of the variation in obesity prevalence, but its predictions are not very accurate.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Currently, I only include the census tracts in Allentown. Allentown has 26 census tracts, so there are only 26 observations in this data sample. With 4,096 candidate features and so few observations, the model is prone to overfitting, which makes the results unreliable. To prevent overfitting, I need to limit the number of selected features to be less than or equal to the number of census tracts. I will add more cities in Pennsylvania to the sample and use the same method to test the relationship between the built environment and body weight. If I get a high value of R<sup>2<\/sup>, as well as a high similarity between the predicted and actual obesity prevalence, then I can conclude that the built environment is correlated with obesity prevalence across neighborhoods.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2><b>Future work<\/b><\/h2>\n<p><span style=\"font-weight: 400\">In the future, I will extend the analysis with control variables: gender, race, median household income, and percentage of households under the poverty line. I will also gather data on Google places of interest to investigate how the built environment affects body weight through a food access channel.<\/span><\/p>\n<p>My broader dissertation research explores how social factors affect health outcomes, particularly how gender, government policy, and urban space affect health and well-being. 
My first dissertation chapter looks at how maternity leave affects children\u2019s health in urban China, and my second chapter discusses the impact of maternity leave on mothers\u2019 labor outcomes after childbirth. <span style=\"font-weight: 400\">This digital project will be the third chapter of my dissertation. <\/span><\/p>\n<p><span style=\"font-weight: 400\">In the process of working on this digital project, I encountered many technical challenges, such as downloading Google tile images, implementing convolutional neural networks, and displaying results on a map. I&#8217;ve gained a lot of experience in Python coding and become more familiar with image analysis and CNNs. All of this will help me in my future studies. I really appreciate the help I&#8217;ve received from the Scholars Studio.<\/span><\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Huilin Zhu<\/p>\n","protected":false},"author":18180,"featured_media":7176,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[395,299,2],"tags":[375,92,402,6],"class_list":["post-7173","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-economics","category-geography","category-grad-students","tag-machine-learning","tag-mapping","tag-neural-networks","tag-top-news"],"_links":{"self":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/7173","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/users\/18180"}],"replies":[{"embeddable":true,"href":"https:\/\
/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/comments?post=7173"}],"version-history":[{"count":1,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/7173\/revisions"}],"predecessor-version":[{"id":9188,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/7173\/revisions\/9188"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media\/7176"}],"wp:attachment":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media?parent=7173"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/categories?post=7173"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/tags?post=7173"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}