Exercise: Nearest Neighbor Analysis

For the first part of this assignment, you are going to conduct proximity analysis using the BoundlessGeo NYC data.

  1. Using a KNN operator and a correlated subquery, create a query which lists the names of all subway stations and the nearest other subway station. CHALLENGE: Also display the distance between the station. Since a correlated subquery can only return a single column, you would have to construct a “compound” column, either using string concatenation (easiest) or the array constructor (harder).
  2. Using a KNN operator and a lateral join, create a query which calculates the distance from every subway station to the nearest five other subway stations. List the station names and the distances in the output.
  3. Create three queries which calculate the average nearest distance from NYC census blocks to subway stations for those blocks which are (a) 45% or greater African-American, (b) 45% or greater White, and (c) 45% or greater Asian. That is, your result should be one average nearest distance for each set of census blocks meeting the given demographic criteria.

For the second part of the assignment, you will work with the cities in the Natural Earth data on the Kropotkin server.

  1. Create a query which calculates distances from each city in the United States in the ne_110m_populated_places layer to its five nearest cities in any country. List the cities and distances in the output.