Single Ancestry vs. GOP

hi all,

so i found a decent source for a wide range of county level data up to 2010 through census. gov found here

The one major issue is that all the files are in xls format. I’ve played around with a couple of packages such as (xlsx), (XLConnect) and (readxl), that all claim to be able to read xls and xlsx documents from the web. I haven’t had any luck with any of them.

Examples that should work:

urlancestry= ""
 tblancestry = read_excel(urlancestry, sheet = "ANC01B", col_names = "ANC040209D")

tblancestry=read.table("", sheet = "ANC01B", col_names = "ANC040209D",header = TRUE)
 startCol = 8,
 endCol = 2)

I still wanted to mess around with some of the data so I downloaded some files and converted them to csv to be read from my hard drive. Eventually, I hope to figure out the whole xls thing, but for now its time to play with some data.

I chose a table that had county level data from 2010 showing those who report and associate themselves with a single ancestry. I thought it might be interesting to compare that with the election data to see if there are any correlations to the election results and populations considering themselves from a single ancestry. A table was also needed that had total population counts so I found one of those too.  To create a dataframe with a factor with which to work I began my code like this:

tblancestry=read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01B.csv")
 tbltotalpop=read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01A.csv")

dfancestry= select(tblancestry, County = 1, GEOID= 2, Multi_Ancest= 8)
 tbltotalpop2= select(tbltotalpop, County = 1, Total_pop= 12)
 df_ancestry= left_join(dfancestry,tbltotalpop2, by="County" )
 df_ancestry$Percent = 100*(df_ancestry$Multi_Ancest/df_ancestry$Total_pop)

Next was to add the familiar election data as a table to join to the ancestry dataframe:

urlElection = ""
 tblElectionOriginal = read_csv(urlElection)

tblElection = transmute(tblElectionOriginal,
 dem_pct_2016 = 100 * dem_2016 / total_2016,
 gop_pct_2016 = 100 * gop_2016 / total_2016,
 oth_pct_2016 = 100 * oth_2016 / total_2016)
 tblElection2= select(tblElection,
 GEOID = 1,

Then I needed to join the datasets and begin tidying up the results:

dfElecAncest= left_join(df_ancestry, tblElection2, by= "GEOID")
 dfElecAncestry= na.omit(dfElecAncest)
 dfElecAncestry$SingleAncPct = 100-(dfElecAncestry$Percent)
 dfElecAncestry$MultiAncPct = (dfElecAncestry$Percent)
 dfElecAncestry= dfElecAncestry[-c(5)]

Then to add a spatial layer with all the US counties and combine if the the ancestry dataframe as well as assign it an appropriate CRS:

spdfNation = counties()
 spdfElecAncestry= geo_join(spdfNation, dfElecAncestry, by= "GEOID",how = "inner" )

dfEpsg = make_EPSG()
 prj4 = dfEpsg[which(dfEpsg$code == "3085"),"prj4"]
 spdfElecAncestry = spTransform(spdfElecAncestry, CRS(prj4))

After a couple of practice plots it was a shame to have to get rid of Hawaii. It was really screwing up the scale of my map:

dfContinental= filter(dfElecAncestry, GEOID != "15001",GEOID != "15003",GEOID != "15007",GEOID != "15009")

spdfContinental= geo_join(spdfNation, dfContinental, by= "GEOID",how = "inner" )

Next, try out a choropleth using the percentages of single ancestry reported:

plot(spdfContinental, bg="lightblue", col=findColours(ciEqual2, colRamp))
 title("Single Ancestry")
 strLegend = paste(
 format(round(ciEqual2$brks[-(intClasses + 1)]), big.mark=","),"%", " - ",
 format(round(ciEqual2$brks[-1]), big.mark=","),"%", sep="")
 legMain = legend(
 "bottomright", legend=strLegend,
 title="Percentage", bg="gray90", inset=0.02, cex=0.6,

Looks like there is some patterns, but let’s see how it compares to a choropleth of the GOP percentages:

ciEqual3 = classIntervals(spdfContinental$gop_pct_2016, n=intClasses, style="equal")

plot(spdfContinental, bg="lightblue", col=findColours(ciEqual3, colRamp))
 title("GOP Percentage")
 strLegend = paste(
 format(round(ciEqual3$brks[-(intClasses + 1)]), big.mark=","),"%", " - ",
 format(round(ciEqual3$brks[-1]), big.mark=","),"%", sep="")
 legMain = legend(
 "bottomright", legend=strLegend,
 title="Percentage", bg="gray90", inset=0.02, cex=0.6,

Hmmmm, there are some areas that compare, but they indeed look like two different maps. Lets try a scatter plot:

ggplot(dfContinental,aes(dem_pct_2016, gop_pct_2016, col=SingleAncPct))+
dfContinental= mutate(dfContinental, Vote = ifelse(gop_pct_2016 > dem_pct_2016, "GOP", "DEM"))
ggplot(dfContinental, aes(SingleAncPct, gop_pct_2016, col=Vote))+

Well, single ancestry seems to be fairly evenly distributed across the range of dem votes and gop votes. Perhaps, there is no real correlation here, so lets just have some fun before looking for a new dataset:

 phil=adjustcolor((findColours(ciEqual2, colRamp)),0.2)

plot(spdfContinental, bg="black", col=findColours(ciEqual3, colRamp))
 points(coords,cex=rad3, col=phil, pch= 19,)



Data Download to R, Cleaning, and a Preliminary Map!

Hi All! I wanted to share the data that I downloaded to R, how I cleaned it and a quick map that I made based on the choropleth lab to test everything. A quick overview though, I am using a dataset from the EPA that discusses the amounts of specific chemicals in the air by county in the US. In addition to this, there is also a significant amount of population data. I felt that this really made it a well rounded set to use. As a note, the state map portion came from the ACS data that was imported prior in class.

Download directly to R:

urlEQuality = "" cEQualityColClasses = c("stfips"="character") dfEQuality = read.csv(urlEQuality, colClasses=cEQualityColClasses) row.names(dfEQuality) = dfEQuality$stfips

Cut down columns (there were well over 200 when it was originally downloaded with all kinds of chemicals and contaminants that I didn’t know well enough to use):

dfEQuality pct_no_eng, med_hh_inc, pct_vac_units, pct_rent_occ, a_pb_ln, a_so2_mean_ln, a_pm10_mean_ln, a_pm25_mean, a_no2_mean_ln, a_o3_mean_ln, a_co_mean_ln, radon_zone, herbicides_ln, fungicides_ln, insecticides_ln)

Rename columns to something less annoying and faster to type:

dfEQuality pm25_ln=a_pm25_mean, no2_ln=a_no2_mean_ln, o3_ln=a_o3_mean_ln, co_ln=a_co_mean_ln)

At this point I was working on taking the data out of the ln form but decided to wait for the time being so I didn’t have a series of extremely small numbers. But if I were to do that at some point the base code would be:

select(dfEQuality, exp(a_pb_ln), exp(a_so2_ln),...)

Create classes that can be used to map some preliminary images to make sure that the data uploaded correctly:

o3 = dfEQuality$o3_ln unemp = dfEQuality$pct_unemp co = dfEQuality$co_ln

Create choropleth maps (because this is from the choropleth lab I won’t get to in-depth with the steps):

dfEpsg = make_EPSG() prj4 = dfEpsg[which(dfEpsg$code == 2260),"prj4"] spdfCounty = spTransform(spdfCounty, CRS(prj4)) intClasses = 6 ciFisher = classIntervals(o3, n=intClasses, style="fisher") ciEqual = classIntervals(o3, n=intClasses, style="equal") ciQuantile = classIntervals(o3, n=intClasses, style="quantile") colRamp = brewer.pal(intClasses, "YlGnBu") cbind(unemp, findInterval(o3, ciFisher$brks), findInterval(o3, round(ciFisher$brks, -3)), findColours(ciFisher, colRamp)) options(scipen=10) plot(ciFisher, colRamp, main="Fisher-Jenks Classification") plot(ciEqual, colRamp, main="Equal Interval Classification") plot(ciQuantile, colRamp, main="Quantile Classification") plot(spdfCounty, bg="white", col=findColours(ciFisher, colRamp)) title("Amount of Ozone Present During Air Quality Assessments") strLegend = paste( "$", format(round(ciFisher$brks[-(intClasses + 1)]), big.mark=","), " - ", "$", format(round(ciFisher$brks[-1]), big.mark=","), sep="" ) legMain = legend( "topleft", legend=strLegend, title="Ozone (O3), 2014", bg="white", inset=0, cex=0.5, fill=colRamp )





downloading csv from website with tidyverse

Apparently, I can’t upload an R file. Lee requested that I upload a file showing that I can download a csv directly from a website.

I found an environmental protection csv. This is the code I used to download and then do basic exploration through the data.


EP = read_csv("")