Assignment
For this assignment, you will create a catalog of the spatial data available in the C:\gispy\data folder. This is the folder of data used for the textbook examples and exercises. You will construct a text file in Markdown, using Python’s file I/O functions, that stores the names of the workspaces, the names of the spatial layers, and some details of those layers.
In class we demonstrated the use of the os.walk function to recursively walk a folder tree, and the code we constructed is included below. ArcPy includes a similar iterator, arcpy.da.Walk. Whereas os.walk steps into each folder and lists each file, arcpy.da.Walk steps into each workspace and lists each spatial layer. A workspace could be a folder containing spatial data, but it could also be a geodatabase or other container. As you know, shapefiles are made up of multiple file system files. Whereas os.walk would list each shapefile part (SHP, DBF, SHX, PRJ, etc.), arcpy.da.Walk will list the complete shapefile once. DBF files will be listed if they are standalone tables, but will not be listed if they are part of a shapefile (that is, if the DBF has the same basename as an SHP file in the folder).
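The os.walk side of this contrast can be seen without ArcPy. The following is a minimal sketch that builds a temporary folder of empty placeholder files; the file names are hypothetical and the files are not valid spatial data.

```python
import os
import tempfile

# Hypothetical file names standing in for one shapefile plus a standalone table.
part_names = ["roads.shp", "roads.dbf", "roads.shx", "roads.prj", "notes.dbf"]

with tempfile.TemporaryDirectory() as top:
    # Create empty placeholder files (not valid spatial data).
    for part_name in part_names:
        open(os.path.join(top, part_name), "w").close()

    # os.walk reports every file system file, so each shapefile part appears.
    listed = []
    for current_folder, folders, files in os.walk(top):
        listed.extend(sorted(files))

print(listed)
```

Here os.walk reports all five files. Given real data with these names, arcpy.da.Walk would be expected to report only roads.shp and notes.dbf, following the rule described above.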
You will make liberal use of arcpy.Describe (see PfA 9.3) to extract information about each workspace and each spatial layer. Following the textbook, you can use the variable name desc when you instantiate the Describe object.
Markdown uses hashes (#) to indicate headings, with the number of hashes indicating the heading level. Use a first-level heading to print each workspace name and type, using the baseName and dataType attributes, respectively. Put the type in parentheses, and make sure to follow the heading with a blank line. Then print the path using the catalogPath attribute. The output Markdown should look like this:
# WorkspaceName (Folder, Workspace, etc.)

Path: C:\gispy\data\chxx\…
Use a second-level heading to print the name of each spatial layer in the workspace (using the baseName attribute), followed by the data type in parentheses. Make sure to follow the heading with a blank line. Then print the path to the data using the catalogPath attribute. Your Markdown should look like this:
## LayerName (ShapeFile, TextFile, RasterDataset, etc.)

Path: C:\gispy\data\chxx\…
If the spatial layer is of type "RasterDataset", list the format on its own line:
Format: xxxxxx
If the spatial layer is a vector data type, list its geometry type (using the shapeType attribute):
Geometry Type: Point, Line, Polygon, etc.
If you attempt to read the shapeType attribute of a Describe object that is missing that attribute, Python will raise an error, so you only want to read this attribute for a vector layer. But there are several different vector data types. Rather than compare the value of desc.dataType against a list of vector data types, it will be easiest to use the hasattr function to check whether the current Describe object has a shapeType attribute:
if hasattr(desc, "shapeType"):
    print(desc.shapeType)
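The same check can be tried without ArcPy by using simple stand-in objects. This is only a sketch: the two SimpleNamespace objects and their values are hypothetical substitutes for real Describe objects, and geometry_line is an illustrative helper name, not part of ArcPy.

```python
from types import SimpleNamespace

# Stand-ins for Describe objects (hypothetical values, not real arcpy output).
vector_desc = SimpleNamespace(baseName="parks", dataType="ShapeFile", shapeType="Polygon")
raster_desc = SimpleNamespace(baseName="elev", dataType="RasterDataset", format="TIFF")


def geometry_line(desc):
    """Return a 'Geometry Type' line, or an empty string if desc has no shapeType."""
    if hasattr(desc, "shapeType"):
        return "Geometry Type: {}\n".format(desc.shapeType)
    return ""


print(repr(geometry_line(vector_desc)))
print(repr(geometry_line(raster_desc)))
```

Because hasattr guards the attribute access, the stand-in without a shapeType produces an empty string instead of an error.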
If the layer has a fields attribute (again, use the hasattr function to check), output the name and data type of each field as a bullet list after a third-level heading “Fields”. (If the layer has no fields attribute, output “None” in bold.) The fields attribute returns a Python list of field objects. You will have to iterate over this list and access the name and type attributes of each field object. Output the name of each field in bold, followed by a colon. Your Markdown should look like this:
### Fields
* **field1**: Integer
* **field2**: text
etc...
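The bullet list format can be sketched with stand-in objects before connecting it to real data. In this sketch, fields_markdown is a hypothetical helper name, the field names and types are made up, and printing the “Fields” heading before the bold “None” is one reading of the instructions, not the only possible one.

```python
from types import SimpleNamespace


def fields_markdown(desc):
    """Build the Markdown for the Fields block of one layer."""
    lines = ["### Fields\n\n"]
    if hasattr(desc, "fields"):
        # One bullet per field: bold name, a colon, then the field type.
        for field in desc.fields:
            lines.append("* **{}**: {}\n".format(field.name, field.type))
    else:
        # Assumption: the heading is still written when there are no fields.
        lines.append("**None**\n")
    lines.append("\n")
    return "".join(lines)


# Hypothetical stand-ins for Describe objects with and without fields.
with_fields = SimpleNamespace(fields=[
    SimpleNamespace(name="FID", type="OID"),
    SimpleNamespace(name="NAME", type="String"),
])
without_fields = SimpleNamespace(baseName="elev")

print(fields_markdown(with_fields))
print(fields_markdown(without_fields))
```

Building the block as a single string keeps the formatting separate from the file writing, so the same string can be passed to print or to a file's write method.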
The spatial catalog should be created by a function that takes two parameters: the path to the top-level directory to walk and the name of the output file. The target directory should default to the current directory. The output file name should default to “catalog.md”, and the file should be created in the directory being catalogued.
After defining the function, the function should be called with something like:
my_catalog_function("c:/gispy/data", "my_spatial_data_catalog.md")
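One way to stub such a function with default parameters is sketched here. The parameter names target_folder and output_name are illustrative choices, and returning the output path is only a convenience for checking the stub, not a requirement of the assignment.

```python
import os


def my_catalog_function(target_folder=".", output_name="catalog.md"):
    """Stub: work out where the catalog file will be written."""
    # The output file lives in the folder being catalogued.
    output_path = os.path.join(target_folder, output_name)
    # The walking and writing logic would be added here.
    return output_path


# Called with no arguments, both defaults apply.
print(my_catalog_function())
# Called as in the assignment, both defaults are overridden.
print(my_catalog_function("c:/gispy/data", "my_spatial_data_catalog.md"))
```

A stub like this runs without errors from the start, and the body can then be filled in one step at a time.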
If you find that you are having trouble completing the assignment using a function, start by creating it as a script. Define variables for the target directory to catalog and the output file name at the top of the script (after all of your imports).
The script should create a file using Python’s built-in open function. This was demonstrated in class, and the script that was demoed is provided below. Make sure to call the file object’s close method (file.close()) at the end, or make use of a with block as demonstrated in PfA 19.1.1.
Your script does not need any printed output, as the catalog itself is being written to a file. But if you have trouble with the file I/O, generate all Markdown in print functions so that you can earn points for those tasks (see Grading below). Other than that, if you want to, you could use print to output status messages like “Currently cataloguing …”.
Notes:
- Start your assignment in a fresh script, which should be empty except for our usual imports.
- Follow course coding conventions, including using snake_case for variable and function names, putting all imports at the top, etc.
- Comment your code. Briefly describe what a variable or object is, what a for-loop is iterating over or intended to do, and what each output statement accomplishes.
- Do not use command line arguments (sys.argv) to control the script.
- Assume the script will be run on a Windows computer with a correctly configured ArcPy environment, and with a data folder at c:\gispy\data.
- Keep in mind that unlike the print function, the file.write method does not append a newline! Thus, you have to add "\n" yourself for every write operation that you want to end with a newline. If you want a write operation to be followed by a blank line, you have to add two newlines ("\n\n") at the end.
- There are many ways to build a text string in Python. I strongly recommend that you use the str.format() method. It has been demonstrated repeatedly in class and in the textbook, so you should have the most practice with it.
- The Describe object has different attributes depending on the type of spatial data. Checking the dataType will give you a clue as to what attributes to expect, or use the hasattr function as described above.
- Remember to start small. Your first step should be a script that runs without errors even if it doesn’t do very much! Assigning variable names or stubbing a function that doesn’t do anything is a good start. You can build up the script largely following the flow of the assignment. That is, first output names of workspaces. When that is working, then try to output spatial layer (file) names. When that is working, then try to output spatial layer characteristics. Make sure that each new addition generates working code before moving on to the next step.
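The notes on file.write, newlines, and str.format can be tried together in a small sketch. It writes to a temporary folder so nothing on disk is left behind; the workspace name, type, and path used here are hypothetical sample values.

```python
import os
import tempfile

# Build each block with str.format, adding the trailing newlines explicitly.
heading = "# {} ({})\n\n".format("ch05", "Folder")
path_line = "Path: {}\n\n".format("C:/gispy/data/ch05")

with tempfile.TemporaryDirectory() as scratch_folder:
    output_path = os.path.join(scratch_folder, "catalog.md")

    # The with block closes the file automatically when it ends.
    with open(output_path, "w") as fout:
        fout.write(heading)
        fout.write(path_line)

    # Read the file back to confirm what was written.
    with open(output_path) as fin:
        written = fin.read()

print(written)
```

Each write call puts exactly the given string into the file, so the blank line after the heading exists only because the string ends in two newlines.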
Grading
This assignment is worth 10 points, awarded 1 point for each of the following:
- Create a script that runs to completion with no errors.
- Use course coding conventions throughout.
- Call arcpy.da.Walk on the correct directory.
- Generate correct Markdown for each workspace.
- Generate correct Markdown for the name and type of each spatial layer.
- Generate correct Markdown for additional attributes of each spatial layer.
- Generate correct Markdown for all fields in each spatial layer that has fields.
- Create a text file in the correct directory (root of the directory you are cataloguing).
- Write correct Markdown to the text file created, with blocks in the correct order and blank lines separating each block.
- Provide adequate comments throughout your code.
References
- os.walk: PfA 12.4 and see below
- arcpy.da.Walk: Ch 12 arcpyWalkBuffer.py sample script; PfA 15.3.2
- The Describe object: PfA 9.3
- Working with file objects: PfA 19.1 and see below
- Markdown: https://guides.github.com/features/mastering-markdown/; you will only need the basic syntax for this assignment, not the GitHub Flavored Markdown extensions
Code Example
In class, we demoed using os.walk to recursively walk a folder tree, and using Python’s file input/output methods to write information about the folder contents to a file. The script that we ended up with is included here for your reference:
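import os

# Folder to walk and the text file that will receive the listing
folder_name = "c:/gispy/data/ch05"
fout = open("c:/gispy/scratch/list_files.txt", "w")

# Visit the folder and every subfolder beneath it
for current_folder, folders, files in os.walk(folder_name):
    # Write the folder path as a first-level heading, then a blank line
    fout.write(f"# {current_folder}\n\n")
    # Write the files in this folder as a numbered list
    for i, filename in enumerate(files):
        # print(os.path.join(current_folder, filename))
        fout.write(f"{str(i + 1)}. {filename}\n")
    # Blank line after each folder's list
    fout.write("\n")

fout.close()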
In class, we used the triple current_folder, folders, files to store the items returned by the os.walk iterator. Keep in mind that the textbook and most examples you will find online and in the Python documentation usually use root, dirs, files. As with any for-loop, these variable names are arbitrary. You could just as easily use peter, paul, mary, but it would make your code hard to understand. Using widely accepted conventions is usually a good idea.