Assignment 2 – Spatial Data Catalog

Assignment

For this assignment, you will create a catalog of the spatial data available in the C:\gispy\data folder. This is the folder of data used for the textbook examples and exercises. You will construct a text file in Markdown, using Python’s file I/O functions, that stores the names of the workspaces, spatial layers, and some details of those layers.

In class we demonstrated the use of the os.walk function to recursively walk a folder tree, and the code we constructed is included below. ArcPy includes a similar iterator, arcpy.da.Walk. Whereas os.walk steps into each folder and lists each file, arcpy.da.Walk steps into each workspace and lists each spatial layer. A workspace could be a folder containing spatial data, but it could also be a geodatabase or other container. As you know, shapefiles are made up of multiple file system files. Whereas os.walk would list each shapefile part (DBF, SHX, PRJ, etc.), arcpy.da.Walk will list the complete shapefile once. DBF files will be listed if they are standalone tables, but will not be listed if they are part of a shapefile (that is, if the DBF has the same basename as an SHP file in the folder).
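The comparison above can be sketched as follows. This is only an outline, not the assignment solution: the function name list_spatial_layers and the loop variable names are made up for illustration, and the try/except around the import exists only so the file can be loaded on a machine without ArcPy.

```python
# arcpy ships with ArcGIS; the fallback only lets this file be imported
# on a machine where it is not installed.
try:
    import arcpy
except ImportError:
    arcpy = None


def list_spatial_layers(top_folder):
    """Print each workspace under top_folder and the layers it contains."""
    if arcpy is None:
        raise RuntimeError("arcpy is not available in this Python environment")
    # Each iteration yields one workspace, its child workspaces, and its layers.
    for workspace, child_workspaces, layers in arcpy.da.Walk(top_folder):
        print(workspace)
        for layer in layers:
            # A shapefile appears once here, not once per component file.
            print("    " + layer)
```

The loop has the same three-part shape as os.walk, so code written for one can usually be adapted to the other by changing what is done with each listed item.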

You will make liberal use of arcpy.Describe (see PfA 9.3) to extract information about each workspace and each spatial layer. Following the textbook, you can use the variable name desc when you instantiate the Describe object.

Markdown uses hashes (#) to indicate headings, with the number of hashes indicating the heading level. Use a first-level heading to print each workspace name and type, using the baseName and dataType attributes, respectively. Put the type in parentheses, and make sure to follow the heading with a blank line. Then print the path using the catalogPath attribute. The output Markdown should look like this:

# WorkspaceName (Folder, Workspace, etc.)

Path: C:\gispy\data\chxx\…

Use a second-level heading to print the name of each spatial layer (using the baseName attribute) in the workspace, followed by the data type in parentheses. Make sure to follow the heading with a blank line. Then print the path to the data using the catalogPath attribute. Your Markdown should look like this:

## LayerName (ShapeFile, TextFile, RasterDataset, etc.)

Path: C:\gispy\data\chxx\…
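The workspace and layer blocks differ only in heading level, so one way to build both is a small helper like the one sketched here. The name heading_block and the sample values are hypothetical; in your script the values would come from a Describe object.

```python
def heading_block(level, name, data_type, path):
    """Return a Markdown heading, a blank line, and a Path line."""
    hashes = "#" * level
    # Two newlines after the heading and after the path leave blank lines.
    return "{0} {1} ({2})\n\nPath: {3}\n\n".format(hashes, name, data_type, path)


# Illustrative values only, not read from real data.
print(heading_block(1, "ch05", "Folder", "C:/gispy/data/ch05"), end="")
```

Returning a string rather than printing inside the helper means the same text can be passed to print or to a file's write method.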

If the spatial layer is of type "RasterDataset", list the format on its own line:

Format: xxxxxx

If the spatial layer is a vector data type, list its geometry type (using the shapeType attribute):

Geometry Type: Point, Line, Polygon, etc.

If you attempt to read the shapeType attribute of a Describe object that is missing that attribute, Python will throw an error, so you only want to read this attribute for a vector layer. But there are several different vector data types. Rather than compare the value desc.dataType against a list of vector data types, it will be easiest to use the hasattr function to check whether the current Describe object has a shapeType attribute:

if hasattr(desc, "shapeType"):
    print(desc.shapeType)
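To try this logic without ArcPy, the sketch below uses SimpleNamespace objects as stand-ins for Describe objects. The helper name extra_lines is hypothetical, and it assumes the raster format is exposed through an attribute named format; check the Describe documentation for raster datasets before relying on that.

```python
from types import SimpleNamespace

# Stand-ins for Describe objects; real ones come from arcpy.Describe(path).
vector_desc = SimpleNamespace(dataType="ShapeFile", shapeType="Polygon")
raster_desc = SimpleNamespace(dataType="RasterDataset", format="TIFF")
text_desc = SimpleNamespace(dataType="TextFile")


def extra_lines(desc):
    """Return the optional Format and Geometry Type lines for one layer."""
    lines = []
    if desc.dataType == "RasterDataset":
        # Assumption: the raster format is available as desc.format.
        lines.append("Format: {0}".format(desc.format))
    if hasattr(desc, "shapeType"):
        # Only vector layers carry a shapeType attribute.
        lines.append("Geometry Type: {0}".format(desc.shapeType))
    return lines


for desc in (vector_desc, raster_desc, text_desc):
    print(desc.dataType, extra_lines(desc))
```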

If the layer has a fields attribute (again, use the hasattr function to check), output the name and data type of each field as a bullet list after a third-level heading “Fields”. (If the layer has no fields attribute, output “None” in bold.) The fields attribute returns a Python list of field objects. You will have to iterate over this list and access the name and type attributes of each field object. Output the name of each field in bold, followed by a colon. Your Markdown should look like this:

### Fields

* **field1**: Integer
* **field2**: text
etc...
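The field list can be sketched in the same stand-in style. The helper name fields_block and the sample field names are hypothetical, and this sketch keeps the “Fields” heading even when there are no fields, which is only one reading of the instructions above.

```python
from types import SimpleNamespace


def fields_block(desc):
    """Return the Markdown Fields section for a Describe-like object."""
    lines = ["### Fields", ""]
    if hasattr(desc, "fields"):
        for field in desc.fields:
            # Field name in bold, then a colon and the field's data type.
            lines.append("* **{0}**: {1}".format(field.name, field.type))
    else:
        lines.append("**None**")
    # Join the lines and finish with a blank line to close the block.
    return "\n".join(lines) + "\n\n"


# Stand-in for a Describe object with two made-up fields.
sample_desc = SimpleNamespace(fields=[
    SimpleNamespace(name="FID", type="OID"),
    SimpleNamespace(name="COVER", type="String"),
])
print(fields_block(sample_desc), end="")
```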

The spatial catalog should be created by a function that takes as parameters the path to the top-level directory to walk and the name of the output file. The target directory should default to the current directory. The output file name should default to “catalog.md” in the same directory as the one being catalogued.

After defining the function, the function should be called with something like:

my_catalog_function("c:/gispy/data", "my_spatial_data_catalog.md")
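A stub with the two default parameters might start out like the sketch below. It only works out where the output file would go and returns that path; the parameter names target_dir and output_name are assumptions, and the cataloguing itself is left out.

```python
import os


def my_catalog_function(target_dir=".", output_name="catalog.md"):
    """Stub: work out the output file path; the cataloguing comes later."""
    # Place the output file in the root of the directory being catalogued.
    output_path = os.path.join(target_dir, output_name)
    return output_path


# With no arguments, both defaults apply.
print(my_catalog_function())
print(my_catalog_function("c:/gispy/data", "my_spatial_data_catalog.md"))
```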

If you find that you are having trouble completing the assignment using a function, start by creating it as a script. Define variables for the target directory to catalog and the output file name at the top of the script (after all of your imports).

The script should create a file using the file open function. This was demonstrated in class, and the script that was demoed is provided below. Make sure to use file.close() at the end, or make use of a with block as demonstrated in PfA 19.1.1.

Your script does not need any printed output, as the catalog itself is being written to a file. But if you have trouble with the file I/O, generate all Markdown in print functions so that you can earn points for those tasks (see Grading below). Other than that, if you want to, you could use print to output status messages like “Currently cataloguing …”.

Notes:

  • Start your assignment in a fresh script, which should be empty except for our usual imports.
  • Follow course coding conventions, including using snake_case for variable and function names, putting all imports at the top, etc.
  • Comment your code. Briefly describe what a variable or object is, what a for-loop is iterating or intended to do, what each output statement accomplishes.
  • Do not use command-line arguments (sys.argv) to control the script.
  • Assume the script will be run on a Windows computer with a correctly configured ArcPy environment, and with a data folder at c:\gispy\data.
  • Keep in mind that unlike the print function, the file.write method does not end with a newline! Thus, you have to add "\n" yourself for every write operation that you want to end with a newline. If you want a write operation to be followed by a blank line, you have to add two newlines ("\n\n") at the end.
  • There are many ways to build a text string in Python. I strongly recommend that you use the str.format() method. It has been demonstrated repeatedly in class and in the textbook, so you should have the most practice with it.
  • The Describe object has different attributes depending on the type of spatial data. Checking the dataType will give you a clue as to what attributes to expect, or use the hasattr function as described above.
  • Remember to start small. Your first step should be a script that runs without errors even if it doesn’t do very much! Assigning variable names, or stubbing a function that doesn’t do anything is a good start. You can build up the script largely following the flow of the assignment. That is, first output names of workspaces. When that is working, then try to output spatial layer (file) names. When that is working, then try to output spatial layer characteristics. Make sure that each new addition generates working code before moving on to the next step.
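The note about newlines in file.write can be checked with a short experiment like this one. It writes to a temporary folder so that nothing in the data folder is touched; the heading and path values are illustrative only.

```python
import os
import tempfile

# A temporary folder stands in for the folder being catalogued.
scratch_dir = tempfile.mkdtemp()
catalog_path = os.path.join(scratch_dir, "catalog.md")

# The with block closes the file automatically, even if an error occurs.
with open(catalog_path, "w") as fout:
    # Two newlines: end the heading line and leave a blank line after it.
    fout.write("# {0} ({1})\n\n".format("ch05", "Folder"))
    # One newline: end the line with no blank line after it.
    fout.write("Path: {0}\n".format("C:/gispy/data/ch05"))

# Read the file back to see exactly what was written.
with open(catalog_path) as fin:
    contents = fin.read()
print(contents)
```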

Grading

This assignment is worth 10 points, awarded 1 point for each of the following:

  • Create a script that runs to completion with no errors.
  • Use course coding conventions throughout.
  • Call arcpy.da.Walk on the correct directory.
  • Generate correct Markdown for each workspace.
  • Generate correct Markdown for the name and type of each spatial layer.
  • Generate correct Markdown for additional attributes of each spatial layer.
  • Generate correct Markdown for all fields in each spatial layer that has fields.
  • Create a text file in the correct directory (root of the directory you are cataloguing).
  • Write correct Markdown to the text file created, with blocks in the correct order and blank lines separating each block.
  • Provide adequate comments throughout your code.

References

  • os.walk: PfA 12.4 and see below
  • arcpy.da.Walk: Ch 12 arcpyWalkBuffer.py sample script; PfA 15.3.2
  • The Describe object: PfA 9.3
  • Working with file objects: PfA 19.1 and see below
  • Markdown: https://guides.github.com/features/mastering-markdown/; you will only need the basic syntax for this assignment, not the GitHub Flavored Markdown extensions

Code Example

In class, we demoed using os.walk to recursively walk a folder tree, and using Python’s file input/output methods to write information about the folder contents to a file. The script that we ended up with is included here for your reference:

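import os

# Folder tree to walk
folder_name = "c:/gispy/data/ch05"

# Open the output file for writing
fout = open("c:/gispy/scratch/list_files.txt", "w")

for current_folder, folders, files in os.walk(folder_name):

    # Heading for the current folder, followed by a blank line
    fout.write(f"# {current_folder}\n\n")

    # Numbered list of the files in this folder
    for i, filename in enumerate(files):
        # print(os.path.join(current_folder, filename))
        fout.write(f"{i + 1}. {filename}\n")

    # Blank line so that the next heading starts a new block
    fout.write("\n")

fout.close()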

In class, we used the triple current_folder, folders, files to store the items returned by the os.walk iterator. Keep in mind that the textbook and most examples you will find online and in the Python documentation will usually use root, dirs, files. As with any for-loop, these variable names are arbitrary. You could just as easily use peter, paul, mary, but it would make your code somewhat hard to understand. Using widely accepted conventions is usually a good idea.