Data Download to R, Cleaning, and a Preliminary Map!

Hi All! I wanted to share the data that I downloaded to R, how I cleaned it, and a quick map that I made, based on the choropleth lab, to test everything. As a quick overview: I am using a dataset from the EPA that reports the amounts of specific chemicals in the air by county in the US. It also includes a significant amount of population data, which I felt made it a well-rounded set to use. As a note, the state map portion came from the ACS data that was imported earlier in class.

Download directly to R:


urlEQuality = "https://edg.epa.gov/data/Public/ORD/NHEERL/EQI/Eqidata_all_domains_2014March11.csv"
cEQualityColClasses = c("stfips"="character")
dfEQuality = read.csv(urlEQuality, colClasses=cEQualityColClasses)
row.names(dfEQuality) = dfEQuality$stfips
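The reason for setting colClasses is easier to see on a small example: read as a number, a FIPS code loses any leading zero. Here is a minimal sketch using a toy two-row CSV typed inline (it stands in for the EPA file and is not taken from it), plus a uniqueness check that is worth running before assigning row names:

```r
# Toy CSV standing in for the EPA file; typed by hand for illustration.
csvText = "stfips,county_name\n01001,Autauga\n36061,New York"

# Default behaviour: stfips is parsed as an integer, so "01001" becomes 1001.
dfNumeric = read.csv(text = csvText)

# With colClasses the code stays a character string and keeps its leading zero.
dfCharacter = read.csv(text = csvText, colClasses = c("stfips" = "character"))

# Row names must be unique, so check for duplicates before assigning them.
stopifnot(anyDuplicated(dfCharacter$stfips) == 0)
row.names(dfCharacter) = dfCharacter$stfips

str(dfNumeric$stfips)
str(dfCharacter$stfips)
```

Keeping the codes as text also makes it safer to match rows to map polygons by code later on.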

Cut down columns (there were well over 200 when it was originally downloaded with all kinds of chemicals and contaminants that I didn’t know well enough to use):


library(dplyr)  # provides select() and rename()
dfEQuality = select(dfEQuality, [...] pct_no_eng, med_hh_inc, pct_vac_units, pct_rent_occ, a_pb_ln, a_so2_mean_ln, a_pm10_mean_ln, a_pm25_mean, a_no2_mean_ln, a_o3_mean_ln, a_co_mean_ln, radon_zone, herbicides_ln, fungicides_ln, insecticides_ln)

Rename columns to something less annoying and faster to type:


dfEQuality = rename(dfEQuality, [...] pm25_ln=a_pm25_mean, no2_ln=a_no2_mean_ln, o3_ln=a_o3_mean_ln, co_ln=a_co_mean_ln)

At this point I started taking the data out of the ln form but decided to hold off for the time being so I didn’t end up with a series of extremely small numbers. If I were to do that at some point, the base code would use mutate() rather than select(), since select() only picks columns and cannot compute new ones:


mutate(dfEQuality, pb=exp(a_pb_ln), so2=exp(a_so2_mean_ln),...)
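As a rough illustration of what undoing the log would look like, here is a minimal sketch on a toy data frame. The column names and values are hypothetical, it assumes the _ln columns are plain natural logs, and it uses base R so that it runs without dplyr. Since exp() is the inverse of the natural log, the round trip recovers the starting values:

```r
# Hypothetical concentrations, then their natural logs as a stand-in for the stored form.
dfToy = data.frame(pb = c(0.002, 0.010, 0.050), so2 = c(1.5, 3.2, 7.9))
dfLog = data.frame(pb_ln = log(dfToy$pb), so2_ln = log(dfToy$so2))

# Back-transform every *_ln column with exp() and drop the suffix from the name.
dfBack = as.data.frame(lapply(dfLog, exp))
names(dfBack) = sub("_ln$", "", names(dfLog))

# all.equal() compares with a numeric tolerance rather than exact equality.
print(all.equal(dfBack, dfToy))
```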

Pull a few columns out as vectors to use for some preliminary maps and make sure that the data loaded correctly:


o3 = dfEQuality$o3_ln
unemp = dfEQuality$pct_unemp
co = dfEQuality$co_ln

Create choropleth maps (because this is from the choropleth lab I won’t get too in-depth with the steps):


library(rgdal)  # make_EPSG(), spTransform()
library(classInt)  # classIntervals(), findColours()
library(RColorBrewer)  # brewer.pal()

dfEpsg = make_EPSG()
prj4 = dfEpsg[which(dfEpsg$code == 2260),"prj4"]
spdfCounty = spTransform(spdfCounty, CRS(prj4))

intClasses = 6
ciFisher = classIntervals(o3, n=intClasses, style="fisher")
ciEqual = classIntervals(o3, n=intClasses, style="equal")
ciQuantile = classIntervals(o3, n=intClasses, style="quantile")
colRamp = brewer.pal(intClasses, "YlGnBu")

cbind(o3, findInterval(o3, ciFisher$brks), findColours(ciFisher, colRamp))  # spot-check value, class and colour

options(scipen=10)
plot(ciFisher, colRamp, main="Fisher-Jenks Classification")
plot(ciEqual, colRamp, main="Equal Interval Classification")
plot(ciQuantile, colRamp, main="Quantile Classification")

plot(spdfCounty, bg="white", col=findColours(ciFisher, colRamp))
title("Amount of Ozone Present During Air Quality Assessments")

strLegend = paste(
format(round(ciFisher$brks[-(intClasses + 1)], 2), nsmall=2), " - ",
format(round(ciFisher$brks[-1], 2), nsmall=2),
sep=""
)  # ozone is not a dollar amount, so no "$"; two decimals keep the classes distinct
legMain = legend(
"topleft", legend=strLegend,
title="Ozone (O3), 2014", bg="white", inset=0, cex=0.5,
fill=colRamp
)
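To make the classification step less of a black box, here is a rough base-R sketch of what equal-interval and quantile breaks look like and how findInterval() turns them into one colour per county. The values and the three-colour palette are made up for illustration; classIntervals() and brewer.pal() do the real work in the code above. Fisher-Jenks is left out because it needs an optimisation step.

```r
# Made-up log-ozone values for six hypothetical counties.
o3Toy = c(1.0, 1.2, 1.3, 2.0, 2.9, 4.0)
nClasses = 3

# Equal interval: split the data range into nClasses bins of equal width.
brksEqual = seq(min(o3Toy), max(o3Toy), length.out = nClasses + 1)

# Quantile: put roughly the same number of counties in each class.
brksQuantile = quantile(o3Toy, probs = seq(0, 1, length.out = nClasses + 1),
                        names = FALSE)

# Class index per county; all.inside = TRUE keeps the maximum in the top class.
classEqual = findInterval(o3Toy, brksEqual, all.inside = TRUE)
classQuantile = findInterval(o3Toy, brksQuantile, all.inside = TRUE)

# Hand-typed light-to-dark palette, one colour per class.
colToy = c("#edf8b1", "#7fcdbb", "#2c7fb8")
colPerCounty = colToy[classEqual]

# Legend labels: lower and upper bound of each class, two decimals, no currency symbol.
strLegendToy = paste(format(round(brksEqual[-(nClasses + 1)], 2), nsmall = 2),
                     " - ",
                     format(round(brksEqual[-1], 2), nsmall = 2), sep = "")

print(table(classEqual))
print(table(classQuantile))
print(strLegendToy)
```

On skewed data like this, the equal-interval classes end up with very different county counts while the quantile classes stay balanced, which is why it helps to plot the classifications side by side before picking one.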