so i found a decent source for a wide range of county level data up to 2010 through census. gov found here https://www.census.gov/support/USACdataDownloads.html.
The one major issue is that all the files are in xls format. I’ve played around with a couple of packages such as (xlsx), (XLConnect) and (readxl), that all claim to be able to read xls and xlsx documents from the web. I haven’t had any luck with any of them.
Examples that should work:
urlancestry= "http://www2.census.gov/prod2/statcomp/usac/excel/ANC01.xls" tblancestry = read_excel(urlancestry, sheet = "ANC01B", col_names = "ANC040209D") tblancestry=read.table("http://www2.census.gov/prod2/statcomp/usac/excel/ANC01.xls", sheet = "ANC01B", col_names = "ANC040209D",header = TRUE) tblancestry=readWorksheetFromFile(urlancestry, sheet=1, startCol = 8, endCol = 2)
I still wanted to mess around with some of the data so I downloaded some files and converted them to csv to be read from my hard drive. Eventually, I hope to figure out the whole xls thing, but for now its time to play with some data.
I chose a table that had county level data from 2010 showing those who report and associate themselves with a single ancestry. I thought it might be interesting to compare that with the election data to see if there are any correlations to the election results and populations considering themselves from a single ancestry. A table was also needed that had total population counts so I found one of those too. To create a dataframe with a factor with which to work I began my code like this:
tblancestry=read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01B.csv") tbltotalpop=read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01A.csv") dfancestry= select(tblancestry, County = 1, GEOID= 2, Multi_Ancest= 8) tbltotalpop2= select(tbltotalpop, County = 1, Total_pop= 12) df_ancestry= left_join(dfancestry,tbltotalpop2, by="County" ) df_ancestry$Percent = 100*(df_ancestry$Multi_Ancest/df_ancestry$Total_pop)
Next was to add the familiar election data as a table to join to the ancestry dataframe:
urlElection = "https://github.com/tonmcg/County_Level_Election_Results_12-16/raw/master/US_County_Level_Presidential_Results_08-16.csv" tblElectionOriginal = read_csv(urlElection) tblElection = transmute(tblElectionOriginal, fips_code, dem_pct_2016 = 100 * dem_2016 / total_2016, gop_pct_2016 = 100 * gop_2016 / total_2016, oth_pct_2016 = 100 * oth_2016 / total_2016) tblElection2= select(tblElection, GEOID = 1, dem_pct_2016, gop_pct_2016, oth_pct_2016)
Then I needed to join the datasets and begin tidying up the results:
dfElecAncest= left_join(df_ancestry, tblElection2, by= "GEOID") dfElecAncestry= na.omit(dfElecAncest) dfElecAncestry$SingleAncPct = 100-(dfElecAncestry$Percent) dfElecAncestry$MultiAncPct = (dfElecAncestry$Percent) dfElecAncestry= dfElecAncestry[-c(5)] dfElecAncestry=dfElecAncestry[dfElecAncestry[,9]<=100,]
Then to add a spatial layer with all the US counties and combine if the the ancestry dataframe as well as assign it an appropriate CRS:
spdfNation = counties() spdfElecAncestry= geo_join(spdfNation, dfElecAncestry, by= "GEOID",how = "inner" ) dfEpsg = make_EPSG() prj4 = dfEpsg[which(dfEpsg$code == "3085"),"prj4"] spdfElecAncestry = spTransform(spdfElecAncestry, CRS(prj4))
After a couple of practice plots it was a shame to have to get rid of Hawaii. It was really screwing up the scale of my map:
dfContinental= filter(dfElecAncestry, GEOID != "15001",GEOID != "15003",GEOID != "15007",GEOID != "15009") spdfContinental= geo_join(spdfNation, dfContinental, by= "GEOID",how = "inner" )
Next, try out a choropleth using the percentages of single ancestry reported:
plot(spdfContinental, bg="lightblue", col=findColours(ciEqual2, colRamp)) title("Single Ancestry") strLegend = paste( format(round(ciEqual2$brks[-(intClasses + 1)]), big.mark=","),"%", " - ", format(round(ciEqual2$brks[-1]), big.mark=","),"%", sep="") legMain = legend( "bottomright", legend=strLegend, title="Percentage", bg="gray90", inset=0.02, cex=0.6, fill=colRamp)
Looks like there is some patterns, but let’s see how it compares to a choropleth of the GOP percentages:
ciEqual3 = classIntervals(spdfContinental$gop_pct_2016, n=intClasses, style="equal") plot(spdfContinental, bg="lightblue", col=findColours(ciEqual3, colRamp)) title("GOP Percentage") strLegend = paste( format(round(ciEqual3$brks[-(intClasses + 1)]), big.mark=","),"%", " - ", format(round(ciEqual3$brks[-1]), big.mark=","),"%", sep="") legMain = legend( "bottomright", legend=strLegend, title="Percentage", bg="gray90", inset=0.02, cex=0.6, fill=colRamp)
Hmmmm, there are some areas that compare, but they indeed look like two different maps. Lets try a scatter plot:
ggplot(dfContinental,aes(dem_pct_2016, gop_pct_2016, col=SingleAncPct))+ geom_point(pch=8) dfContinental= mutate(dfContinental, Vote = ifelse(gop_pct_2016 > dem_pct_2016, "GOP", "DEM")) ggplot(dfContinental, aes(SingleAncPct, gop_pct_2016, col=Vote))+ geom_jitter(pch=8)
Well, single ancestry seems to be fairly evenly distributed across the range of dem votes and gop votes. Perhaps, there is no real correlation here, so lets just have some fun before looking for a new dataset:
coords=coordinates(spdfContinental) rad3=sqrt(dfContinental$SingleAncPct/3.14)/1.5 phil=adjustcolor((findColours(ciEqual2, colRamp)),0.2) plot(spdfContinental, bg="black", col=findColours(ciEqual3, colRamp)) points(coords,cex=rad3, col=phil, pch= 19,)