Global warming national opinion data

There are a lot of great ways to use the R packages we are learning about from DataCamp.

But first, a couple of other options for importing from a URL if your data is zipped or isn't in csv format:

Zipped files: download to a temp file, then unzip the csv

# uses read_csv from the readr package (loaded with tidyverse)
temp = tempfile()
download.file("https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/annual_all_2011.zip", temp)
monitr = read_csv(unz(temp, "annual_all_2011.csv")) # unz() needs the name of the file inside the zip
unlink(temp)
remove(temp) # clean up

Excel from a URL

I kept running into issues using different packages to read in Excel files (readxl and xlsx): they would throw an error because they don't really like the URL path. Sometimes just dropping the 's' from 'https' will allow the download to go through. (The gdata command read.xls works, but you first have to download and activate Perl on your computer.)

Downloading the xls/xlsx to a temp file was the best solution I could find:

url = "https://www.epa.gov/sites/production/files/2016-12/2015_o3_naaqs_preliminary_transport_assessment_design_values_contributions.xlsx"
library(readxl)
library(httr) # when downloading from a URL, save to a temp file; fileext appends the extension so read_excel recognizes the file
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
noda = read_excel(tf, sheet = 2) # readxl is not loaded automatically by tidyverse
remove(tf); remove(url) # clean up

I found a really interesting dataset that breaks down the public's belief in global warming (it is mapped at http://climatecommunication.yale.edu/visualizations-data/ycom-us-2016/). The paper with the statistical breakdown of how the results were computed can be downloaded in full from the TU Library.

I combined the global warming data with the provided election data (party % voting), ran a simple linear regression, and found nothing surprising: the more Democratic Party-leaning a county went, the more likely it was to agree that global warming was occurring, that there was a scientific consensus, and that global warming could harm the US.
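For reference, the regression call would look something like this (a minimal sketch; tblAnalysisCC and the x65_happening column are taken from later in this post, and treating dem_pct_2016 as the lone predictor is my assumption):

# simple linear regression: % who agree global warming is happening vs. Dem vote share
# (tblAnalysisCC and the column names are assumed from elsewhere in this post)
fit = lm(x65_happening ~ dem_pct_2016, data = tblAnalysisCC)
summary(fit) # slope, R-squared, p-value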

Here is a scatterplot matrix of the data; the bottom two rows and rightmost two columns contain the dem% and gop% data. You can see the Dem data roughly follows the 'agree about global warming' data, and the GOP data does the opposite. The scatterplot matrix is a good way to take a quick first look at your data; it is created with the pairs() function, which is loaded in the standard packages that open when you start up R.
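A minimal sketch of the call, using only the column names that appear elsewhere in these posts (the real matrix presumably included more of the opinion variables):

# quick first look: scatterplot matrix of opinion and election columns
# (column list is illustrative, not the full set from the original plot)
pairs(tblAnalysisCC[, c("x65_happening", "dem_pct_2016", "gop_pct_2016")])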

[Scatterplot matrix]

Not to be deterred by the weight of evidence, I kept working with the data. Here is an example of the output from the ggplot2 tutorial using the Climate Change/Election data. (County_victor is a new column created with ifelse(), returning "Dem" or "GOP" depending on which vote share was larger.)

tblAnalysisCC$County_victor = 
ifelse(tblAnalysisCC$dem_pct_2016 > tblAnalysisCC$gop_pct_2016, "Dem", "GOP")

Here is a group of choropleths comparing the election data to the global warming opinion data (using the choropleth lab as a framework).

[Choropleth maps]

QTM EXAMPLE (tmap)

The tmap package demonstrated in the Geospatial DataCamp course has an amazing function called qtm(), which creates a choropleth in a single line. It is great for getting a rough idea of what your data will look like. If you like the results of the qtm() call, you can then expand the map by rebuilding it with tm_shape() + tm_*(), similar to the methodology of the ggplot2 package (a sketch of this follows the qtm() call below).

Assuming that you have a spatial dataframe, the choropleth below can be made with the following line of code (where shp = the spatial dataframe and fill = the variable to map):

qtm(shp = usmapjoin, fill = "x65_happening")
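And the expanded, layered version of the same map would look roughly like this (a sketch; the palette and legend title arguments are my assumptions, not from the original post):

# the same choropleth rebuilt with the layered tmap syntax
# (palette and legend title are illustrative assumptions)
library(tmap)
tm_shape(usmapjoin) +
  tm_polygons("x65_happening", palette = "YlGnBu",
              title = "Agree it's happening (%)") +
  tm_layout(frame = FALSE)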

By wrapping the qtm() call in save_tmap(), the package makes a web map, which is pretty cool.

save_tmap(
  qtm(shp = usmapjoin, fill = "x65_happening"),
  filename = "blog.html")

I can't embed the webmap but here is a link:
http://42arthurdent.github.io/blog.html


Postscript: while looking into the global warming opinion data I was reminded of this gem:

https://xkcd.com/1732/

But that then led me to the horrible revelation that this R package exists:

https://cran.r-project.org/web/packages/xkcd/index.html

Which is how this was created (it uses ggplot2, so it is kind of relevant):

[xkcd-styled chart]
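For anyone curious, the basic recipe looks something like this (a sketch; it assumes the xkcd font has already been installed and registered via the extrafont package, and the data here is made up):

# minimal xkcd-styled chart with ggplot2 + the xkcd package
# (assumes the xkcd font is installed; the data is invented for illustration)
library(ggplot2)
library(xkcd)
df = data.frame(x = 1:10, y = cumsum(rnorm(10)))
ggplot(df, aes(x, y)) +
  geom_line() +
  theme_xkcd() +
  xkcdaxis(range(df$x), range(df$y))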

Single Ancestry vs. GOP

Hi all,

So I found a decent source for a wide range of county-level data up to 2010 through census.gov, found here: https://www.census.gov/support/USACdataDownloads.html.

The one major issue is that all the files are in xls format. I've played around with a couple of packages, such as xlsx, XLConnect, and readxl, that all claim to be able to read xls and xlsx documents from the web. I haven't had any luck with any of them.

Examples that should have worked, but didn't:

urlancestry = "http://www2.census.gov/prod2/statcomp/usac/excel/ANC01.xls"
tblancestry = read_excel(urlancestry, sheet = "ANC01B", col_names = "ANC040209D")

tblancestry = read.table("http://www2.census.gov/prod2/statcomp/usac/excel/ANC01.xls",
                         sheet = "ANC01B", col_names = "ANC040209D", header = TRUE)

tblancestry = readWorksheetFromFile(urlancestry,
                                    sheet = 1,
                                    startCol = 8,
                                    endCol = 2)

I still wanted to mess around with some of the data, so I downloaded some files and converted them to csv to be read from my hard drive. Eventually I hope to figure out the whole xls thing, but for now it's time to play with some data.
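One workaround that should do the trick, borrowing the temp-file approach from the post above: download the xls to a temporary file first, so read_excel gets a local path instead of a URL. A sketch, untested against this particular file:

# download the xls to a temp file, then read it locally
# (sheet name taken from the failed examples above; untested assumption)
library(readxl)
library(httr)
GET("http://www2.census.gov/prod2/statcomp/usac/excel/ANC01.xls",
    write_disk(tf <- tempfile(fileext = ".xls")))
tblancestry = read_excel(tf, sheet = "ANC01B")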

I chose a table with county-level data from 2010 showing those who report a single ancestry. I thought it might be interesting to compare that with the election data, to see if there are any correlations between the election results and the populations considering themselves to be of a single ancestry. I also needed a table with total population counts, so I found one of those too. To create a dataframe to work with, I began my code like this:

tblancestry = read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01B.csv")
tbltotalpop = read_csv("C:/Users/davestrohmeier/Desktop/Geoviz/County Level Data/ANC01A.csv")

# select and rename columns by position, then join and compute the multi-ancestry percentage
dfancestry = select(tblancestry, County = 1, GEOID = 2, Multi_Ancest = 8)
tbltotalpop2 = select(tbltotalpop, County = 1, Total_pop = 12)
df_ancestry = left_join(dfancestry, tbltotalpop2, by = "County")
df_ancestry$Percent = 100 * (df_ancestry$Multi_Ancest / df_ancestry$Total_pop)

Next, I added the familiar election data as a table to join to the ancestry dataframe:

urlElection = "https://github.com/tonmcg/County_Level_Election_Results_12-16/raw/master/US_County_Level_Presidential_Results_08-16.csv"
tblElectionOriginal = read_csv(urlElection)

tblElection = transmute(tblElectionOriginal,
                        fips_code,
                        dem_pct_2016 = 100 * dem_2016 / total_2016,
                        gop_pct_2016 = 100 * gop_2016 / total_2016,
                        oth_pct_2016 = 100 * oth_2016 / total_2016)
tblElection2 = select(tblElection,
                      GEOID = 1,
                      dem_pct_2016,
                      gop_pct_2016,
                      oth_pct_2016)

Then I needed to join the datasets and begin tidying up the results:

dfElecAncest = left_join(df_ancestry, tblElection2, by = "GEOID")
dfElecAncestry = na.omit(dfElecAncest)
dfElecAncestry$SingleAncPct = 100 - dfElecAncestry$Percent
dfElecAncestry$MultiAncPct = dfElecAncestry$Percent
dfElecAncestry = dfElecAncestry[-c(5)]  # drop the original Percent column, now duplicated as MultiAncPct
dfElecAncestry = dfElecAncestry[dfElecAncestry[, 9] <= 100, ]  # keep rows where MultiAncPct is a valid percentage

Then I added a spatial layer with all the US counties, combined it with the ancestry dataframe, and assigned it an appropriate CRS:

spdfNation = counties()  # county boundaries from the tigris package
spdfElecAncestry = geo_join(spdfNation, dfElecAncestry, by = "GEOID", how = "inner")

dfEpsg = make_EPSG()
prj4 = dfEpsg[which(dfEpsg$code == "3085"), "prj4"]
spdfElecAncestry = spTransform(spdfElecAncestry, CRS(prj4))

After a couple of practice plots, it was a shame to have to get rid of Hawaii, but it was really screwing up the scale of my map:

dfContinental = filter(dfElecAncestry, GEOID != "15001", GEOID != "15003",
                       GEOID != "15007", GEOID != "15009")

spdfContinental = geo_join(spdfNation, dfContinental, by = "GEOID", how = "inner")

Next, try out a choropleth using the percentages of single ancestry reported:
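(The class intervals and color ramp used below are never defined in the post; judging from the ciEqual3 call further down, the setup was presumably something like this sketch, where the class count and palette are guesses:)

# assumed setup, mirroring the ciEqual3 call below
library(classInt)
library(RColorBrewer)
intClasses = 5  # number of classes is a guess
colRamp = brewer.pal(intClasses, "YlOrRd")  # palette is a guess
ciEqual2 = classIntervals(spdfContinental$SingleAncPct, n = intClasses, style = "equal")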

plot(spdfContinental, bg = "lightblue", col = findColours(ciEqual2, colRamp))
title("Single Ancestry")
strLegend = paste(
  format(round(ciEqual2$brks[-(intClasses + 1)]), big.mark = ","), "%", " - ",
  format(round(ciEqual2$brks[-1]), big.mark = ","), "%", sep = "")
legMain = legend(
  "bottomright", legend = strLegend,
  title = "Percentage", bg = "gray90", inset = 0.02, cex = 0.6,
  fill = colRamp)

Looks like there are some patterns, but let's see how it compares to a choropleth of the GOP percentages:

ciEqual3 = classIntervals(spdfContinental$gop_pct_2016, n = intClasses, style = "equal")

plot(spdfContinental, bg = "lightblue", col = findColours(ciEqual3, colRamp))
title("GOP Percentage")
strLegend = paste(
  format(round(ciEqual3$brks[-(intClasses + 1)]), big.mark = ","), "%", " - ",
  format(round(ciEqual3$brks[-1]), big.mark = ","), "%", sep = "")
legMain = legend(
  "bottomright", legend = strLegend,
  title = "Percentage", bg = "gray90", inset = 0.02, cex = 0.6,
  fill = colRamp)

Hmmmm, there are some areas that match up, but they do indeed look like two different maps. Let's try a scatterplot:

ggplot(dfContinental, aes(dem_pct_2016, gop_pct_2016, col = SingleAncPct)) +
  geom_point(pch = 8)

dfContinental = mutate(dfContinental, Vote = ifelse(gop_pct_2016 > dem_pct_2016, "GOP", "DEM"))
ggplot(dfContinental, aes(SingleAncPct, gop_pct_2016, col = Vote)) +
  geom_jitter(pch = 8)

Well, single ancestry seems to be fairly evenly distributed across the range of Dem and GOP votes. Perhaps there is no real correlation here, so let's just have some fun before looking for a new dataset:

# proportional symbols: circle radius scaled to the single-ancestry percentage
coords = coordinates(spdfContinental)
rad3 = sqrt(dfContinental$SingleAncPct / 3.14) / 1.5
phil = adjustcolor(findColours(ciEqual2, colRamp), 0.2)  # semi-transparent point colors

plot(spdfContinental, bg = "black", col = findColours(ciEqual3, colRamp))
points(coords, cex = rad3, col = phil, pch = 19)


Data Download to R, Cleaning, and a Preliminary Map!

Hi all! I wanted to share the data that I downloaded to R, how I cleaned it, and a quick map that I made based on the choropleth lab to test everything. A quick overview, though: I am using a dataset from the EPA that reports the amounts of specific chemicals in the air by county in the US. In addition, there is a significant amount of population data, which I felt made it a well-rounded set to use. As a note, the state map portion came from the ACS data that was imported earlier in class.

Download directly to R:


urlEQuality = "https://edg.epa.gov/data/Public/ORD/NHEERL/EQI/Eqidata_all_domains_2014March11.csv"
cEQualityColClasses = c("stfips" = "character")
dfEQuality = read.csv(urlEQuality, colClasses = cEQualityColClasses)
row.names(dfEQuality) = dfEQuality$stfips

Cut down the columns (there were well over 200 when it was originally downloaded, with all kinds of chemicals and contaminants that I didn't know well enough to use):


dfEQuality = select(dfEQuality, stfips, pct_unemp, pct_no_eng, med_hh_inc, pct_vac_units,
                    pct_rent_occ, a_pb_ln, a_so2_mean_ln, a_pm10_mean_ln, a_pm25_mean,
                    a_no2_mean_ln, a_o3_mean_ln, a_co_mean_ln, radon_zone, herbicides_ln,
                    fungicides_ln, insecticides_ln)

Rename columns to something less annoying and faster to type:


dfEQuality = rename(dfEQuality, pb_ln = a_pb_ln, so2_ln = a_so2_mean_ln,
                    pm10_ln = a_pm10_mean_ln, pm25_ln = a_pm25_mean,
                    no2_ln = a_no2_mean_ln, o3_ln = a_o3_mean_ln, co_ln = a_co_mean_ln)

At this point I was working on taking the data out of its ln form, but decided to wait for the time being so I didn't have a series of extremely small numbers. But if I were to do that at some point, the base code would be (using mutate(), since select() can't compute new columns):


dfEQuality = mutate(dfEQuality, pb = exp(pb_ln), so2 = exp(so2_ln), ...)

Create classes that can be used to map some preliminary images, to make sure that the data imported correctly:


o3 = dfEQuality$o3_ln
unemp = dfEQuality$pct_unemp
co = dfEQuality$co_ln

Create choropleth maps (because this is from the choropleth lab, I won't go too in-depth on the steps):


dfEpsg = make_EPSG()
prj4 = dfEpsg[which(dfEpsg$code == 2260), "prj4"]
spdfCounty = spTransform(spdfCounty, CRS(prj4))

intClasses = 6
ciFisher = classIntervals(o3, n = intClasses, style = "fisher")
ciEqual = classIntervals(o3, n = intClasses, style = "equal")
ciQuantile = classIntervals(o3, n = intClasses, style = "quantile")
colRamp = brewer.pal(intClasses, "YlGnBu")

cbind(unemp, findInterval(o3, ciFisher$brks),
      findInterval(o3, round(ciFisher$brks, -3)),
      findColours(ciFisher, colRamp))

options(scipen = 10)
plot(ciFisher, colRamp, main = "Fisher-Jenks Classification")
plot(ciEqual, colRamp, main = "Equal Interval Classification")
plot(ciQuantile, colRamp, main = "Quantile Classification")

plot(spdfCounty, bg = "white", col = findColours(ciFisher, colRamp))
title("Amount of Ozone Present During Air Quality Assessments")
strLegend = paste(
  format(round(ciFisher$brks[-(intClasses + 1)]), big.mark = ","), " - ",
  format(round(ciFisher$brks[-1]), big.mark = ","),
  sep = "")
legMain = legend(
  "topleft", legend = strLegend,
  title = "Ozone (O3), 2014", bg = "white", inset = 0, cex = 0.5,
  fill = colRamp)


Downloading csv from website with tidyverse

Apparently, I can’t upload an R file. Lee requested that I upload a file showing that I can download a csv directly from a website.

I found an environmental protection csv. This is the code I used to download it and then do some basic exploration of the data.

library(tidyverse)

EP = read_csv("https://edg.epa.gov/data/Public/ORD/NHEERL/EQI/Eqidata_all_domains_2014March11.csv")
str(EP)    # structure: column names and types
class(EP)  # confirms read_csv returned a tibble

DataCamp Access

DataCamp generously provides access to their premium courses to educators, and has done so for our use in Geovisualization. You will receive an email with a link to create a DataCamp account (or link your existing account to my course).

As I discovered this mid-semester, I still have to decide how we are going to use it, but I will definitely require at least the following two DataCamp courses to be completed:

The following courses also look relevant to Geovisualization. Whether or not they are assigned, you should consider looking at them:

There may be others, but I am trying to be judicious.