Find the best-matched polygons for use in a time series using pgAdmin

Goal:
Find the best-matched polygons in a time series using a PostGIS spatial database
Background:
For a previous project, a comparison of housing data from the 1930s to current and historic US Census data was required. The geographic portion of the census data was obtained from the National Historical Geographic Information System (NHGIS), which produces downloadable shapefiles with a GISJOIN field allowing for convenient linking of the shapefile to tables of census data also available through NHGIS. The 1930s housing data was in the form of an image of a 1937 Home Owners’ Loan Corporation (HOLC) map of Philadelphia, available here. Very broadly, the HOLC graded neighborhoods based on race and class, and these grades were then used to determine the likelihood of housing loan assistance. There is debate over whether the HOLC process normalized race-based home lending practices and then led to further housing issues such as redlining (more info).

Process:
The HOLC map data was pulled into a GIS and a shapefile was created of the neighborhood grading (this process is often listed as ‘heads-up digitizing’ in job postings); here is a handy tutorial for ArcMap and another using QGIS. The shapefile and HOLC JPEG are available on GitHub.

The HOLC neighborhood grading schema does not coincide with any current (or historic) census arrangement, so the spatial querying capabilities of PostGIS and the PostgreSQL database engine (via pgAdmin) were used to match the newly created HOLC digital data with the NHGIS Decennial Census geographic data.
First, create a PostGIS spatial database and use pgShapeLoader to load the shapefiles (the HOLC and NHGIS spatial data).
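(pgShapeLoader’s command-line sibling, shp2pgsql, can do the same job; the sketch below is one way to script the loads from Python. The database name, table names, SRID, and file paths are all placeholders, not the project’s actual values.)

# Hypothetical sketch: load the shapefiles into PostGIS from a script instead
# of the pgShapeLoader GUI. Database name, table names, SRID, and paths are
# placeholders; adjust them to match your own data.
import subprocess

SHAPEFILES = {
    "holc1937": "holc_philadelphia_1937.shp",  # digitized HOLC polygons
    "us2010": "US_tract_2010.shp",             # NHGIS census geography
}

for table, shp in SHAPEFILES.items():
    # shp2pgsql emits SQL; -s sets the SRID, -I builds a spatial index.
    loader = subprocess.Popen(
        ["shp2pgsql", "-s", "4269", "-I", shp, "public." + table],
        stdout=subprocess.PIPE,
    )
    # Pipe the generated SQL straight into psql.
    subprocess.run(["psql", "-d", "spatial_db"], stdin=loader.stdout, check=True)
    loader.stdout.close()
    loader.wait()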

After much trial and error, the following code worked best:

create table overlap2010census as
with limited as (
    select distinct us2010.*
    from us2010, holc1937
    where ST_DWithin(us2010.geom, holc1937.geom, 0.25)
    order by us2010.gid
)
/* The CTE above limits the query of the 2010 census polygons (entire US)
to those near the HOLC polygons; the distance value can vary based on the
projection of the files. */

select distinct on (limited.gid)
    limited.gid as us2010a_gid,
    holc1937.gid as holc_gid,
    holc1937.name,
    holc1937.grade,
    ST_Area(ST_Transform(limited.geom, 4326)::geography) as area_check,
    ST_Area(ST_Intersection(
        ST_Transform(limited.geom, 4326)::geography,
        ST_Transform(holc1937.geom, 4326)::geography)) as intersect_area_2010_meters,
    (ST_Area(ST_Intersection(
        ST_Transform(limited.geom, 4326)::geography,
        ST_Transform(holc1937.geom, 4326)::geography))
        / ST_Area(ST_Transform(limited.geom, 4326)::geography)) * 100 as percent_overlap,

/* Calculates the area of the returned census and HOLC polygons and the
percent of overlap. */

    limited.*
from limited, holc1937
where limited.geom && ST_Expand(holc1937.geom, 1)
  and ST_Area(ST_Intersection(limited.geom, holc1937.geom)) > 0

/* Joins the tables where the geometries overlap. */

order by limited.gid,
    ST_Area(ST_Intersection(
        ST_Transform(limited.geom, 4326)::geography,
        ST_Transform(holc1937.geom, 4326)::geography)) desc;

/* Sorts by the largest amount of overlap; in conjunction with the DISTINCT ON
call, only the first instance of the census polygon that overlaps the target
polygon is returned. */

While this method requires the user to update and repeat the code for each comparison, the tables created can then be exported to your GIS, or QGIS can connect directly to your database if you prefer that option. Because the query also creates a unique identifier for each year and polygon of the Census data, a crosswalk/key table was built from this output, which allowed the Census data (race, income, housing status, etc.) for multiple years to be attached via the NHGIS GISJOIN field mentioned previously. Based on the multiyear Census data attached via the crosswalk table, comparisons of the HOLC geographic areas could then be made regardless of shifting Census borders.
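As a rough illustration of that crosswalk step, the sketch below attaches an NHGIS attribute table to the matched polygons via GISJOIN. The table and column names (nhgis_2010_tables, holc_census_2010, gisjoin) are hypothetical placeholders for whatever your own extracts are called:

# Hypothetical sketch of the crosswalk join; table and column names are
# placeholders for your own NHGIS extracts.
import psycopg2

SQL = """
create table holc_census_2010 as
select xwalk.holc_gid,
       xwalk.grade,
       census.*              -- NHGIS attribute table (race, income, etc.)
from overlap2010census as xwalk
join nhgis_2010_tables as census
  on xwalk.gisjoin = census.gisjoin;  -- the NHGIS GISJOIN linking field
"""

with psycopg2.connect(dbname="spatial_db") as conn:
    with conn.cursor() as cur:
        cur.execute(SQL)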

Land Classification and Land Use

After the completion of my Geo-referencing tasks (years 1995, 1975, and 1959), I was given the option of either more Geo-referencing (1965) or a slightly different route: creating a method to classify land types and land uses. If it wasn’t obvious from the title, I chose Geo-referencing 1965…

Land classification is the method of determining what a feature on imagery is based purely on pixel value (pixel value can be interpreted differently depending on the situation). This allows for a colorful rendition and separation of features, which results in an easy-to-read visualization of where different features are located. Results can vary and are heavily reliant on image quality: the lower the quality of the image or imagery, the more generalized and inaccurate the classifications.

Anyway, land classification can be simple, or it can be quite difficult. If you are using tools that already exist, or software built to classify imagery, you can begin land classification/land use work easily; it quickly becomes a matter of finding the right combination of numbers to get the classifications you want. This method is not too difficult, just tedious in regards to acquiring your result. However, if you approach it from scratch, it is significantly more engaging. To build it from the bottom up, you essentially have to dissect the process: analyze your imagery, extract pixel values, group the pixel values, combine all of them into a single file, and finally symbolize them based on the attribution or pixel values recorded earlier. It is much easier said than done.
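Purely as a toy illustration of that bottom-up idea (this is not the method I am using, and the break values are invented), grouping pixel values into classes might look something like this:

# Toy sketch of the bottom-up approach: bin raster pixel values into classes.
# The break values are invented; real imagery would need ranges tuned to its
# actual pixel-value distribution.
import numpy as np

# Stand-in for a single-band grayscale image (values 0-255).
image = np.random.randint(0, 256, size=(100, 100))

# Upper bounds for each class, e.g. water, vegetation, soil, built-up.
breaks = [50, 120, 200, 256]

# np.digitize assigns each pixel the index of the bin it falls into,
# producing a classified raster with values 0-3.
classified = np.digitize(image, breaks)

# Count pixels per class to sanity-check the grouping.
values, counts = np.unique(classified, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))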

I am currently approaching the task with existing tools; however, if I had a choice in the matter, I would have approached it via the bottom-up method and attempted to create it from scratch, as there is more learning in that and it is much more appealing to me. Regardless, I am creating info files, files that contain the numbers, ranges, and classifications I am using to determine good land classifications. In contrast to what I stated earlier, this is quite difficult for me, as the imagery is low quality and I am not a fan of continuously typing in ranges until I thread the needle.

The current tool I am using is the Reclassify tool, which is available through the ESRI suite and requires the Spatial Analyst extension. The tool takes a single input image, the ranges you would like to use to classify it, and an output file. After much testing, I am pretty sure there can be a maximum of only 24 classifications (which is probably more than enough). In addition, the tool can be batch run (as most ESRI tools can be), meaning it can be run on multiple images at once. This is a much-needed feature in many situations, as I presume most people are not going to classify one image and be done (or at least I am not going to be one and done).
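For what it is worth, the same tool can be scripted, which pairs naturally with batch runs; below is a minimal sketch, assuming hypothetical paths and break values:

# Minimal sketch of scripting the Reclassify tool over a folder of images.
# The workspace path and break values are placeholders.
import arcpy
from arcpy.sa import Reclassify, RemapRange

arcpy.CheckOutExtension("Spatial")       # Reclassify requires Spatial Analyst
arcpy.env.workspace = r"C:\data\imagery"

# Each entry is [start value, end value, new class value].
remap = RemapRange([[0, 50, 1],
                    [50, 120, 2],
                    [120, 200, 3],
                    [200, 255, 4]])

for raster in arcpy.ListRasters("*", "TIF"):
    out = Reclassify(raster, "VALUE", remap, "NODATA")
    out.save("reclass_" + raster)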

The image above was reclassified using the Reclassify tool. I am not sure how good of a classification it is, as I have not fully grasped the tool yet; every time I give it ranges, it spits out the same generic ranges that I did not input (which is a bit frustrating, but it comes with the territory). I am sure it is human error, though, and not the tool messing up. I am not sure what the final result is supposed to be, but I will be sure to fill you in once I achieve it (if I ever do…).

Creating a Network Analysis Dataset and Generating a Service Area Layer

Network analysis can be an extremely beneficial tool for any organization that relies on GIS to understand the area it serves. This is particularly true if the organization deals primarily with transportation or the distribution of goods. The network analysis package through ArcGIS allows organizations to gain critical insight into how people and goods move around within their designated service or study area. More specifically, the engineers I work with at McMahon utilize GIS network analysis to gain enhanced insight into the regions they serve as transportation engineers and planners. The baseline method of analysis is creating a service area layer, or series of layers, to understand how much distance can be traveled on a road network given a set of parameters.

A Basic Example of a Service Area Layer, With Three Interval Breaks

The first step in creating a service area layer is to create a network dataset. A network dataset is a special type of ESRI dataset that contains both spatial layers and network layers, as well as custom preferences and parameters to fine-tune the output. To create a network dataset, first download a road network layer that represents the area in which you want to create service areas. Oftentimes these layers are available from government agencies; for example, all Pennsylvania state and local roads are available on PASDA, provided by PennDOT. Once the streets/roads layer is downloaded, it is important to clip the layer to what you expect to use in the service area analysis. If you choose not to clip the layer, you may find creating and working with the network dataset slow and difficult to manage.
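If scripted, the clip itself is a one-liner; in the sketch below the paths are placeholders for your own statewide roads layer and study-area boundary:

# Hypothetical sketch: clip the statewide roads layer down to the study area
# before building the network dataset. All paths are placeholders.
import arcpy

arcpy.analysis.Clip(r"C:\gis\padot_roads.shp",    # statewide roads (PASDA)
                    r"C:\gis\study_area.shp",     # study-area boundary
                    r"C:\gis\roads_clipped.shp")  # clipped output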

You will also need to create a source point file that will represent the location or locations from which the service areas will be generated. You can create the actual points either now or after the network dataset is created, but the point feature class itself needs to exist beforehand. You will also need to create a file geodatabase and a feature dataset to store all of these layers and datasets in.
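These containers can also be set up from a script; below is a rough sketch, with hypothetical paths, names, and spatial reference:

# Rough sketch: create the file geodatabase, feature dataset, and the empty
# source-point feature class from a script. Names, paths, and the spatial
# reference are placeholders.
import arcpy

sr = arcpy.SpatialReference(2272)  # e.g. NAD83 StatePlane PA South (US feet)

arcpy.management.CreateFileGDB(r"C:\gis", "roads.gdb")
arcpy.management.CreateFeatureDataset(r"C:\gis\roads.gdb", "streets", sr)
arcpy.management.CreateFeatureclass(r"C:\gis\roads.gdb\streets",
                                    "origin_points", "POINT",
                                    spatial_reference=sr)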

Once the layer is downloaded, clipped, and present in your working directory with the source point file, you can create the network dataset. This can be done by right-clicking inside any feature dataset and clicking New > Network Dataset. A dialog window should appear with options for the new network dataset. Name the network dataset and click Next. You can leave most of the settings at their default values unless otherwise instructed, or if you know the network dataset will be used for purposes in addition to creating a service area. Finally, on the second-to-last page of options, check the box next to “build service area indexes”. This will speed up the creation of service areas in the future.

Creating a New Network Dataset Inside of a Feature Dataset

Click Finish, and you will be asked if you want to build the network dataset. Click Yes. Once the network dataset has been created, open ArcMap and bring the newly created network dataset into the data frame. When prompted to bring in any associated layers, say yes. Add the Network Analyst toolbar and click “New Service Area” under the Network Analyst dropdown.
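(The build step can also be run from a script if you ever need to rebuild the network dataset after edits; the dataset path below is a placeholder.)

# Hypothetical sketch: rebuild the network dataset from a script after edits.
import arcpy

arcpy.CheckOutExtension("Network")
arcpy.na.BuildNetwork(r"C:\gis\roads.gdb\streets\streets_ND")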

Creating a New Service Area with Network Analyst

Click the button directly to the right of the Network Analyst dropdown to open the Network Analyst Window. Here you can load any origin points (previously created) and polygon or line barriers, as well as set options for the service area that is going to be created. Drag the point layer that contains the origin points into the Facilities group. Clicking the Service Area Properties button (top right corner of the Network Analyst Window) will allow you to set important preferences for the service area that is about to be created. Some of the most useful settings include the accumulation tab, which allows you to track distance and time along the service area, and the analysis settings tab. The analysis settings tab allows you to set the direction of the service area, as well as any breaks, and to implement a temporal aspect to the service area.

Service Area Properties

Apply any preferences and settings that have changed, and close the Service Area Properties window. When everything is configured as desired, click the Solve button on the Network Analyst toolbar to run the analysis and create the service area. The analysis can be re-run any number of times, and with the service area index that was created earlier, each run should not take long. This way, you can tweak and fine-tune the service area quickly and easily until you get the desired result.
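The same workflow can be scripted with the arcpy network analyst module. The sketch below is one rough equivalent; the dataset paths, the impedance attribute name (“Length”), and the break values are all assumptions to replace with your own:

# Rough arcpy sketch of the service area workflow described above. Paths,
# the impedance attribute name, and the break values are assumptions.
import arcpy

arcpy.CheckOutExtension("Network")

network = r"C:\gis\roads.gdb\streets\streets_ND"
facilities = r"C:\gis\roads.gdb\streets\origin_points"

# Make the service area layer: travel outward from the facilities, with
# three breaks (e.g. 1, 2, and 3 miles along the road network).
sa_layer = arcpy.na.MakeServiceAreaLayer(
    network, "ServiceArea", "Length",
    "TRAVEL_FROM", "1 2 3").getOutput(0)

# Load the previously created origin points as facilities.
arcpy.na.AddLocations(sa_layer, "Facilities", facilities)

# Solve the analysis; with the service area index built, re-running after
# tweaking properties should be quick.
arcpy.na.Solve(sa_layer)

# Persist the result so it can be reused or shared.
arcpy.management.SaveToLayerFile(sa_layer, r"C:\gis\service_area.lyr")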

A Service Area Created Using Network Analyst

Understanding Projections: Geo-referencing

Recently I started working on the year 1959, which is currently the last set of images I have to work on.

1959 was a unique set compared to 1975 and 1995 for a few reasons. The first is the significant difference in development, which is obvious given how far back in time the images were taken. The second is probably common among Geo-referencing projects, but I had not encountered it until this year: the set was split into two types of images, Geo-referenced (projected) images and non-referenced (non-projected) images. The projected images are similar to the images I completed for 1975 and 1995; they were already projected to the desired coordinate system and allowed for immediate Geo-referencing. The non-referenced images are a little more difficult.

These images would gray out the toolbar and not allow for any Geo-referencing. At first I was a bit confused, but I quickly started brainstorming ideas. My first idea was to project them to the same coordinate system as the data frame. Unfortunately, due to the size of the images, projecting them was impossible on the machine I had (16 GB of RAM); the tools would fail almost instantly. I still thought this had to be the answer, so I continued to experiment with different ways to project the images. The next idea was to project to an intermediate coordinate system, then proceed to the more advanced system in hopes of reducing the burden, but that also fizzled out. It seems that no matter what system you are coming from (although I don’t fully understand geographic transformations), the software attempts to perform the full process. Running out of ideas, I tried one final solution: I thought that perhaps the size of the images was the problem and decided to slim them down by splitting each into eight sections. After creating a few scripts that would split every image, which went smoothly, the script reached the projection step and crashed nearly instantly (although it may have lasted slightly longer).
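For reference, the projection attempts looked roughly like the sketch below (the paths and the target coordinate system are placeholders); on the full-size images it failed regardless:

# Roughly what the projection attempts looked like; paths and the target
# coordinate system are placeholders.
import arcpy

in_raster = r"C:\imagery\1959\tile_01.tif"
out_raster = r"C:\imagery\1959_projected\tile_01.tif"

# e.g. NAD83 StatePlane Pennsylvania South (US feet), WKID 2272; substitute
# whatever system your data frame uses.
target_sr = arcpy.SpatialReference(2272)

arcpy.management.ProjectRaster(in_raster, out_raster, target_sr, "NEAREST")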


After none of my ideas worked, I decided to restart the computer…and that worked. Even though the images are not in the same coordinate system as the data frame, I can still add control points, fit to display, etc. This was a bit frustrating since it was such a simple solution, but the lesson here is that it is best to try the simplest solution before the more difficult ideas.

After talking with a few people, it seems the coordinate system should be irrelevant: as soon as you begin Geo-referencing, the image should convert to that of the data frame, and even that will not matter in the end. Once the images are completed, another individual will tether or mosaic them together and assign a coordinate system at that point, which will result in the current coordinate system being overwritten.

Geo-referencing: Learning to Save the Hard Way (Easy Way for you)

Introduction:

I began my internship at the Office of Innovation and Technology in Philadelphia tasked with Geo-referencing historic images to the most recent imagery Philadelphia has to offer. In order to ease me into the tougher, later dates, they had me begin with 1995, which is much more similar to the most recent imagery than the 1950s imagery is.

If you have Geo-referenced before, then you know it can be quite simple, depending on how similar the images are. But you would also know how tedious and monotonous Geo-referencing is. My undergraduate GIS classes delved into Geo-referencing a little, just to get students’ feet wet, but because I was not versed in the little nuances Geo-referencing requires, I shot myself in the foot and learned the hard way.

Tip of the Sprint:

In order to Geo-reference effectively, I suggest understanding what actually saves your control points and what does not. If you don’t know what they are, control points are the references you place all over your target image to tell the software that a specific location on the target image corresponds to a specific location on the reference image, so the two can be stitched together seamlessly.

Some images require little to no control points, while others (in my case) seem to take somewhere in the range of 20…and sometimes more. Now imagine not understanding how the software tracked your control points and going through an entire package of images (140 in my case), only to realize none of your work actually registered.

Let me be the first to say, you and I will never make that mistake again. To cut to the chase, as far as my current knowledge is concerned, there are two ways to save your control point placements. The first way is to ‘Rectify’, which creates a new image with the corrections you made; this option is found in the drop-down menu of the Geo-referencing toolbar in ArcMap. However, if you do not wish to create new images, option two may be a better fit: updating the current image through the ‘Update Georeferencing’ function, also within the Geo-referencing toolbar in ArcMap. I should have done a better job of understanding this crucial aspect before spending two days ‘completing’ a large task.


Josh


Understanding Transformations within Geo-referencing: 1st Order and Spline

If you have ever been tasked with a Geo-referencing assignment, you may have heard of transformations. If you have not, or have never worked with Geo-referencing, transformations are essentially different algorithms that determine how the image you’re Geo-referencing will shift, distort, bend, warp, or otherwise be altered. In regards to Geo-referencing in the ESRI suite (version 10.4), there are eight transformations available to you (there may be more elsewhere, but I have only been exposed to those offered by ESRI).

The transformations are as follows:

*In order to ‘unlock’ a transformation for use, you need to meet a control point threshold

Zero Polynomial – essentially no image movement, just shifting

Similarity Polynomial – image will move, but little to no distortion

1st Order Polynomial

2nd Order Polynomial*

3rd Order Polynomial*

Adjust – significant emphasis on control points

Projective Transformation

Spline*


I tend to only use two transformations: 1st Order Polynomial and Spline. I am not an expert on Geo-referencing, so I lean towards the ones I understand most easily. With that being said, those are the two I will discuss in further detail, along with how I have used them.

1st Order Polynomial:

This is a transformation that is available to the user the moment they start to Geo-reference an image. It requires no set number of control points and allows for a consistent, but diminishing, movement of the image you are working on. This consistency is what I like about it: if you can get an image to line up with your reference image within a few control points, this is the transformation for you.

However, the more control points you add, the less effective this transformation becomes. What I mean is that within the first six control points your image will shift greatly and quickly begin to line up with your reference, but after that threshold the shifts become minimal, and the number of control points you have to add to gain the same effect rapidly gets out of hand. This leads into the next transformation: Spline.

Spline:

Spline is a transformation that benefits heavily from the number of control points you have. It requires at least 10 control points to be in place before you can even begin using it. It requires so many because it heavily distorts the image, essentially moving a part of your image exactly to each control point location. This is also a great detriment of the transformation: if you wish to use it, you must place control points everywhere on your image, which can mean anywhere from 50 to 150+ control points, or else you are prone to having some parts of your image line up while others are completely off.

Spline is very picky and requires delicate positioning of control points. Because each control point distorts a given area, and you need a lot of them to maximize effectiveness, you have to be careful where you place them. If two or more control points are too close to each other, you will witness extreme warping in that location and will likely get the opposite effect of what you are looking for (unless you are an art major or artist, in which case you might find what you want).
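For completeness, the same transformation choices appear outside the interactive toolbar in arcpy’s Warp tool, where 1st Order Polynomial is POLYORDER1 and Spline is SPLINE. The sketch below is only illustrative; the paths and control point coordinates are invented:

# Illustrative sketch: the same transformations exposed through arcpy's Warp
# tool. Paths and control point coordinates are invented; in practice the
# links come from the points placed on the image.
import arcpy

source_pts = "'0 0';'1000 0';'1000 1000';'0 1000'"  # image-space links
target_pts = "'2693000 236000';'2694000 236000';'2694000 237000';'2693000 237000'"

# "POLYORDER1" is the 1st Order Polynomial; swap in "SPLINE" for the
# rubber-sheeting behavior described above (it needs 10+ links).
arcpy.management.Warp(r"C:\imagery\scan.tif", source_pts, target_pts,
                      r"C:\imagery\scan_warped.tif", "POLYORDER1", "BILINEAR")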


To summarize both transformations in a sentence or two: 1st Order is the transformation to use if you want satisfactory, but not perfect, Geo-referencing and you are on a time budget, while Spline is what you should use if you want a perfectly Geo-referenced image and have much more time available to you.

This should help transition into my next blog post: Understanding what you want and what you have time for within Geo-referencing.