DeGAUSS is designed to be used at the command line by issuing DeGAUSS commands one at a time to read and write CSV files. Our Sample Workflow and other specific step-by-step guides specify each command, including the input and output filenames, but this approach can be inflexible, and working with ever longer CSV file names can be cumbersome.
degauss_run() is a method for using DeGAUSS on a data frame in R instead of on a CSV file on disk. Data is passed to DeGAUSS via system Docker calls and the resulting data is read back into R. Since DeGAUSS always works by adding additional columns to the original input data, degauss_run() can be used to flexibly create DeGAUSS data pipelines from within R. This also allows for cleaner integration of Make-like workflows for geomarker assessment at scale.
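To make the mechanism concrete, here is a rough sketch of the round trip described above: write the data frame to a temporary CSV file, call Docker on it, and read the result back. This is not the actual implementation of degauss_run(). The helper names degauss_docker_args() and run_degauss_sketch() are hypothetical, and the ghcr.io/degauss-org registry path and the /tmp mount point are assumptions based on the usual DeGAUSS command-line usage.

```r
# Hypothetical helper: build the arguments for a `docker run` call.
degauss_docker_args <- function(image, version, work_dir, csv_name, argument = NULL) {
  c(
    "run", "--rm",
    "-v", paste0(work_dir, ":/tmp"),                      # mount the working directory
    paste0("ghcr.io/degauss-org/", image, ":", version),  # assumed registry path
    csv_name,
    argument                                              # NULL simply drops out of c()
  )
}

# Hypothetical runner: data frame in, data frame with added columns out.
run_degauss_sketch <- function(.x, image, version, argument = NULL) {
  work_dir <- tempfile("degauss_")
  dir.create(work_dir)
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

  write.csv(.x, file.path(work_dir, "input.csv"), row.names = FALSE)
  status <- system2(
    "docker",
    degauss_docker_args(image, version, work_dir, "input.csv", argument)
  )
  if (status != 0) stop("docker exited with status ", status)

  # Assumes the container writes exactly one new CSV next to the input file.
  out_file <- setdiff(list.files(work_dir, pattern = "\\.csv$"), "input.csv")
  stopifnot(length(out_file) == 1)
  read.csv(file.path(work_dir, out_file))
}
```

Keeping the argument construction in its own function makes the Docker call easy to inspect without running a container, for example with degauss_docker_args("geocoder", "3.2.1", "/data", "input.csv").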
An example DeGAUSS pipeline in R might look like this:
addresses |>
  degauss_run("postal", "0.1.4") |>
  dplyr::select(id, address = parsed_address) |>
  degauss_run("geocoder", "3.2.1") |>
  dplyr::select(id, lat, lon) |>
  degauss_run("census_block_group", "0.6.0") |>
  degauss_run("greenspace", "0.3.0")
The next section is a detailed example with step-by-step code and output.
Addresses could be imported from a CSV file, database, or other source. For the sake of this example, we’ll use ten addresses randomly sampled from all dwellings in Hamilton County, Ohio:
addresses <-
  tibble::tibble(
    id = paste0("g_", 1:10),
    address = c(
      "518 Fortune Ave Cincinnati OH 45219",
      "3201 Stanhope Av Apt. 2 Cincinnati, OH 45211",
      "3917 Catherine Av Norwood OH 45212",
      "9960 Carolina Trace Road Harrison Township OH 45030",
      "332 East Sharon Rd Unit #15 Glendale OH 45246",
      "10101 Hamilton Cleves Road Crosby Township OH 45030",
      "6076 Lagrange Ln Green Township, OH 45239",
      "1325 Fuhrman Rd Reading, OH 45215",
      "8831 Wellerstation Drive Montgomery OH 45249",
      "2916 Willow Ridge Dr Colerain Township OH 45251"
    )
  )
Before geocoding, we will use the DeGAUSS postal image to create a “normalized and parsed” address column. We keep only our id column and the parsed addresses as address so that we can use it in the next step for geocoding.
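In code, this step might look like the following. The calls mirror the first two lines of the example pipeline above. The library(degauss) call and the result name parsed are assumptions made for illustration, and running the block requires a working Docker installation.

```r
library(degauss)  # assumed name of the package that provides degauss_run()

parsed <-
  addresses |>
  degauss_run("postal", "0.1.4") |>
  # keep the identifier and rename the parsed address for the geocoder
  dplyr::select(id, address = parsed_address)
```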
Geocode the addresses and keep only the id, latitude (lat), and longitude (lon) columns, discarding the addresses.
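A sketch of the geocoding step, using the image version from the example pipeline. Here parsed is an assumed name for a data frame holding the id and address columns produced by the postal step, and geocoded is likewise an illustrative name.

```r
library(degauss)  # assumed name of the package that provides degauss_run()

# `parsed` is assumed to hold the id and address columns from the postal step
geocoded <-
  parsed |>
  degauss_run("geocoder", "3.2.1") |>
  # drop the address and any other geocoder output; keep only the coordinates
  dplyr::select(id, lat, lon)
```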
Attach 2020 census tract and block group identifiers. Here, we specify argument = "2020" to ensure the DeGAUSS image uses the correct vintage of census geographies.
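A sketch of this call, assuming geocoded is a data frame with id, lat, and lon columns from the geocoding step. The result name with_census is illustrative.

```r
library(degauss)  # assumed name of the package that provides degauss_run()

# `geocoded` is assumed to hold id, lat, and lon from the geocoding step
with_census <-
  geocoded |>
  # `argument` is passed through to the image to select the census vintage
  degauss_run("census_block_group", "0.6.0", argument = "2020")
```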
Census tract identifiers can be used to link many different types of data. Here, we download the 2018 material deprivation index and join it to our data.
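The join itself could be sketched as below. The download location is not given here, so the sketch assumes the index has already been saved locally. The file name dep_index_2018.csv and the column names census_tract_fips and census_tract_id_2020 are all assumptions, as is with_census, the data frame returned by the census step. Check them against the actual files before use.

```r
# Hypothetical path: a local copy of the 2018 deprivation index saved as CSV
dep_index <- read.csv(
  "dep_index_2018.csv",
  colClasses = c(census_tract_fips = "character")  # keep identifiers as text
)

# `with_census` is assumed to be the output of the census block group step
with_dep_index <-
  with_census |>
  # coerce the key to character so both sides of the join have the same type
  dplyr::mutate(census_tract_id_2020 = as.character(census_tract_id_2020)) |>
  dplyr::left_join(
    dep_index,
    by = c("census_tract_id_2020" = "census_tract_fips")  # assumed column names
  )
```

A left join keeps every address even when its tract has no matching index value. The tract identifiers must come from the same census vintage that the index was built on, or some rows may fail to match.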
As an example of geomarker assessment that does not rely on census geography, we add the mean EVI (enhanced vegetation index) within three differently sized buffers (radii of 500, 1,500, and 2,500 m) around each location.
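A sketch of the greenspace call, using the image version from the example pipeline. The input name with_dep_index is an assumption standing in for the data frame built so far. It needs to contain the lat and lon columns from the geocoding step.

```r
library(degauss)  # assumed name of the package that provides degauss_run()

# `with_dep_index` is assumed to be the data frame built in the previous steps
with_greenspace <-
  with_dep_index |>
  degauss_run("greenspace", "0.3.0")
```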
Our geomarker assessment process is complete and everything is linked in one data frame, ready for storing, analyzing, and sharing.