{"id":7022,"date":"2020-03-11T15:40:57","date_gmt":"2020-03-11T19:40:57","guid":{"rendered":"https:\/\/sites.temple.edu\/tudsc\/?p=7022"},"modified":"2020-04-22T15:51:56","modified_gmt":"2020-04-22T19:51:56","slug":"coding-with-keras-for-transfer-learning-measuring-impact-of-built-environments-on-health-part-iii","status":"publish","type":"post","link":"https:\/\/sites.temple.edu\/tudsc\/2020\/03\/11\/coding-with-keras-for-transfer-learning-measuring-impact-of-built-environments-on-health-part-iii\/","title":{"rendered":"Coding with Keras for Transfer Learning: Measuring Impact of Built Environments on Health Part III"},"content":{"rendered":"<p>By Huilin Zhu<\/p>\n<p><!--more--><\/p>\n<p><span style=\"font-weight: 400\">My digital project\u2019s research question explores how the built environment affects people\u2019s health, especially in terms of weight, in the state of Pennsylvania.<\/span><span style=\"font-weight: 400\"> To measure the built environment\u2019s effects consistently, I previously downloaded satellite images from Google Static Maps. <a href=\"https:\/\/sites.temple.edu\/tudsc\/2019\/10\/30\/measuring-the-impact-of-built-environments-on-health-gathering-data-from-satellite-imagery-and-census-maps\/\">My first blog post<\/a> showed how to use Python and the Google Static Maps API to download all the tile images. I then grouped the images by census tract, imported them, and converted each image into a 3-dimensional array in Python.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The next step is to use a pre-trained Convolutional Neural Network (CNN) model to extract features of the built environment from the satellite maps. 
To run a deep convolutional network for object recognition, I&#8217;m using a model developed by Oxford&#8217;s renowned <\/span><a href=\"http:\/\/www.robots.ox.ac.uk\/~vgg\"><span style=\"font-weight: 400\">Visual Geometry Group<\/span><\/a><span style=\"font-weight: 400\"> (VGG) and trained on the ImageNet dataset. My previous blog post <a href=\"https:\/\/sites.temple.edu\/tudsc\/2019\/12\/16\/measuring-the-impact-of-built-environments-on-health-part-ii-use-of-transfer-learning\/\">\u201cMeasuring the Impact of Built Environments on Health Part II: Use of transfer learning\u201d<\/a> discussed why and how I use the VGG model as the pre-trained model to extract features of the built environment. In this post, I focus on how to use Python to apply the VGG16 model for transfer learning.<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Installation<\/span><\/h1>\n<p><span style=\"font-weight: 400\">I installed TensorFlow and Keras to run the VGG model. It\u2019s a good idea to upgrade pip before installing TensorFlow and Keras.<\/span><\/p>\n<pre><code>pip install --upgrade pip\r\npip install tensorflow\r\npip install keras\r\n<\/code><\/pre>\n<h1><span style=\"font-weight: 400\">Set up a model<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Keras provides both the 16-layer and 19-layer versions through the VGG16 and VGG19 classes. My study uses the VGG16 model. VGG16 is trained on <\/span><a href=\"http:\/\/www.image-net.org\/\"><span style=\"font-weight: 400\">ImageNet<\/span><\/a><span style=\"font-weight: 400\">, whose images are labeled with categories such as animals, geological formations, natural objects, and many others. The input images I am using are Google\u2019s satellite images, which show parks, highways, green streets, crosswalks, and housing. 
Because these satellite images differ from the ImageNet images the model was trained on, I use the output of the second fully connected layer instead of the final prediction layer.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The model can be created as follows:<\/span><\/p>\n<pre><code>from keras.applications.vgg16 import VGG16\r\n\r\n# load VGG16 with ImageNet weights, keeping the fully connected layers (include_top=True)\r\nmodel = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))\r\nmodel.summary()  # prints every layer, including 'fc1', 'fc2', and 'predictions'\r\n<\/code><\/pre>\n<p><span style=\"font-weight: 400\">My project uses the second fully connected layer of the VGG16 network to extract information about the built environment. If you want to know which specific objects appear in an image, use the final prediction layer instead. The following code shows how I use the second fully connected layer, \u2018fc2\u2019, to extract this information from each image.<\/span><\/p>\n<pre><code>from keras.models import Model\r\n\r\n# name of the layer whose output is used as the image features\r\nlayer_name = 'fc2'\r\n# build a model that maps an input image to the 4096-dimensional output of fc2\r\nintermediate_layer_model = Model(inputs=model.input, outputs=model.get_layer(layer_name).output)\r\n<\/code><\/pre>\n<h1><span style=\"font-weight: 400\">Extract the features of the maps<\/span><\/h1>\n<p><span style=\"font-weight: 400\">All images are loaded and converted to 3-dimensional arrays in Python. Each item in x_list is the name of a variable that holds all the NumPy arrays for one census tract. For example, x_4202000-42077001800 refers to the list containing all the arrays for the census tract whose Place_TractID is 4202000-42077001800. 
Calling eval(x_list[0]) displays the NumPy arrays for all images in the first census tract in Allentown.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-7025 size-medium\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-10-at-11.13.23-AM-208x300.png\" alt=\"\" width=\"208\" height=\"300\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-10-at-11.13.23-AM-208x300.png 208w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-10-at-11.13.23-AM-300x433.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-10-at-11.13.23-AM.png 538w\" sizes=\"auto, (max-width: 208px) 100vw, 208px\" \/><\/p>\n<p><span style=\"font-weight: 400\">The following code shows how to apply the pre-trained VGG model to extract features of the built environment in each census tract.<\/span><\/p>\n<pre><code>import numpy as np\r\nimport pandas as pd\r\n\r\nlayer_outputs = {}  # fc2 outputs for the images in each tract\r\ndata_sum = None\r\nfor i in x_list:\r\n    # i is the name of the variable holding one tract's image arrays\r\n    images = np.asarray(eval(i))\r\n    # get 4096 variables for each image\r\n    layer_outputs[i] = intermediate_layer_model.predict(images)\r\n    # get the mean value of the 4096 variables over all images in the census tract\r\n    k = pd.DataFrame(layer_outputs[i]).mean(axis=0)\r\n    k['Place_TractID'] = i\r\n    # add the tract to data_sum as a new column\r\n    if data_sum is None:\r\n        data_sum = k\r\n    else:\r\n        data_sum = pd.concat([data_sum, k], axis=1, ignore_index=True)\r\ndata_sum  # show the results\r\n<\/code><\/pre>\n<p><span style=\"font-weight: 400\">Finally, data_sum contains 4096 variables for each census tract. 
The following table shows part of the output of data_sum.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-large wp-image-7027 aligncenter\" src=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM-1024x409.png\" alt=\"\" width=\"640\" height=\"256\" srcset=\"https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM-1024x409.png 1024w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM-300x120.png 300w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM-768x307.png 768w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM-850x340.png 850w, https:\/\/sites.temple.edu\/tudsc\/files\/2020\/03\/Screen-Shot-2020-03-11-at-10.53.44-AM.png 1206w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p>The first column is the index, and each of the other 26 columns represents one census tract in Allentown. The last row of the table shows the Place_TractID for each census tract. Each column contains 4096 variables that represent features of the built environment in that census tract.<\/p>\n<p>For example, x_4202000-42077001800 is the Place_TractID entry for the first census tract. The value of the first variable for this tract is 0.39, and the value of variable 4096 is 0.32. These 4096 variables have no specific individual meaning, but together they capture characteristics of the built environment such as color, gradients, edges, height, and length.<\/p>\n<h1>Future Research<\/h1>\n<p>Right now, I have 4096 variables for each census tract in Allentown. 
Allentown has only 26 census tracts, which makes the sample very small. Next, I will download satellite images for the major cities in Pennsylvania and extract the variables for each of their census tracts. Then I will combine the 4096 variables with the percentage of overweight residents in each census tract and perform statistical analysis. I may also include other control variables, such as median household income, the percentage of male (or female) residents, the racial composition (White, Black, Asian, and other races), and the percentage of households below the poverty line.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Huilin Zhu<\/p>\n","protected":false},"author":18180,"featured_media":7034,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[303,395,299,2,406],"tags":[375,6],"class_list":["post-7022","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-science","category-economics","category-geography","category-grad-students","category-maps-gis","tag-machine-learning","tag-top-news"],"_links":{"self":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/7022","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/users\/18180"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/comments?post=7022"}],"version-history":[{"count":1,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/posts\/7022\/revisions"}],"predecessor-version":[{"id":9192,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/
wp\/v2\/posts\/7022\/revisions\/9192"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media\/7034"}],"wp:attachment":[{"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/media?parent=7022"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/categories?post=7022"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.temple.edu\/tudsc\/wp-json\/wp\/v2\/tags?post=7022"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}