Title: | A Collection of Functions to Assist Building DeGAUSS Containers |
---|---|
Description: | degauss helper tools are used to develop and run DeGAUSS containers. |
Authors: | Erika Rasnick [aut, cre], Cole Brokamp [aut] |
Maintainer: | Erika Rasnick <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.2.3 |
Built: | 2024-11-12 05:27:23 UTC |
Source: | https://github.com/degauss-org/dht |
check if address is a known Cincinnati institutional address
address_is_institutional(address)
address | character vector of address text |
Print the function to view the source and the complete list of addresses considered to be Cincinnati institutional addresses. Note that addresses in other cities might be erroneously categorized as institutional (e.g., "3333 Burnet Ave Syracuse NY 13206").
logical vector; TRUE when address contains some text indicating Cincinnati Children's Hospital, Ronald McDonald House, Jobs and Family Services for Hamilton and Butler Counties in Ohio, or Stetson Square
check if address text is not actually an address
address_is_nonaddress(address)
address | character vector of address text |
logical vector; TRUE when address text is "verify", "foreign", "foreign country", "unknown", or blank.
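The matching rule described above can be approximated in base R. This is a minimal sketch, not the package's implementation: the helper name `is_nonaddress_sketch` is hypothetical, and treating `NA` as blank and ignoring case and surrounding whitespace are assumptions.

```r
# Hypothetical helper (not part of dht): flag placeholder text that is not an address.
is_nonaddress_sketch <- function(address) {
  # Assumption: comparison ignores case and surrounding whitespace
  cleaned <- tolower(trimws(address))
  placeholders <- c("", "verify", "foreign", "foreign country", "unknown")
  # Assumption: missing values are treated the same as blank text
  is.na(address) | cleaned %in% placeholders
}

is_nonaddress_sketch(c("3333 Burnet Ave Cincinnati OH 45229", "Unknown", "  ", NA))
```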
check if address is a PO Box
address_is_po_box(address)
address | character vector of address text |
logical vector; TRUE when address text contains some variation of "PO Box"
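One way to detect "some variation of PO Box" is a case-insensitive regular expression. The sketch below is illustrative only: the helper name `is_po_box_sketch` and the exact pattern are assumptions, and the package may recognize more or fewer variations.

```r
# Hypothetical helper (not part of dht): detect common "PO Box" spellings.
is_po_box_sketch <- function(address) {
  # Matches "PO Box", "P.O. Box", "p o box", etc.; the pattern is an assumption
  grepl("\\bp\\.?\\s*o\\.?\\s*box\\b", address, ignore.case = TRUE, perl = TRUE)
}

is_po_box_sketch(c("PO Box 123 Cincinnati OH", "P.O. Box 9", "123 Main St"))
```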
check format of dates from DeGAUSS container input file
check_dates(date, allow_missing = FALSE)
date | vector of dates to be checked for formatting |
allow_missing | logical. defaults to FALSE, resulting in an error if any dates are missing. |
ISO formatted dates (i.e., "%Y-%m-%d" or YYYY-MM-DD) will stay the same.
U.S. standard slash formatted dates (common to Microsoft Excel; e.g., "%m/%d/%y" or MM/DD/YY, "%m/%d/%Y" or MM/DD/YYYY) will be reformatted to ISO format.
Any unrecognized input will cause an error and the user will be instructed to reformat their dates.
reformatted vector of dates, or an error if dates could not be reformatted
date <- c("1/1/21", "1/2/21", "1/3/21")
check_dates(date)
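The three accepted formats can be recognized with base R alone. The following is a rough sketch of that logic, not the code used by check_dates(): the helper name `reformat_dates_sketch` is hypothetical, and it does not implement `allow_missing` (any missing or unrecognized value is an error here).

```r
# Hypothetical helper (not part of dht): convert ISO and U.S. slash dates to Date.
reformat_dates_sketch <- function(date) {
  date <- as.character(date)
  out <- as.Date(rep(NA_character_, length(date)))

  # Classify each value by its shape before parsing
  iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", date)
  slash_short <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2}$", date)
  slash_long <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", date)

  out[iso] <- as.Date(date[iso], format = "%Y-%m-%d")
  out[slash_short] <- as.Date(date[slash_short], format = "%m/%d/%y")
  out[slash_long] <- as.Date(date[slash_long], format = "%m/%d/%Y")

  # Anything still missing was not recognized (or was not a real calendar date)
  if (anyNA(out)) {
    stop("dates must be formatted as YYYY-MM-DD, MM/DD/YY, or MM/DD/YYYY")
  }
  out
}

reformat_dates_sketch(c("1/1/21", "2021-01-02", "01/03/2021"))
```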
check that end_date occurs after start_date
check_end_after_start_date(start_date, end_date)
start_date | vector of start dates |
end_date | vector of end dates |
## Not run:
start_date <- check_dates(c("1/1/21", "1/2/21", "1/3/21"))
end_date <- check_dates(c("1/7/21", "1/8/21", "1/9/20"))
check_end_after_start_date(start_date, end_date)
## End(Not run)
check for specified columns and corresponding column types
check_for_column(d, column_name, column, column_type = NULL)
d | input dataframe |
column_name | character string defining name of column to be checked |
column | character vector to be checked (e.g., d$column_name) |
column_type | (optional) desired column type to be checked for (e.g., 'character') |
if column_name exists in d and is of the correct column_type, nothing is returned. if column_name does not exist in d, an error is thrown. if column is not of the correct column_type, a warning is shown.
## Not run:
d <- tibble::tribble(
  ~"id", ~"value",
  "123", 123,
  "456", 456
)
check_for_column(d, "id", d$id, "double")
check_for_column(d, "id2", d$id2, "double")
## End(Not run)
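The error-versus-warning behavior described above could be sketched as follows. This is a simplified illustration, not the package's code: the helper `check_column_sketch` is hypothetical, it drops the separate `column` argument, and comparing against `typeof()` is an assumption about how the type is checked.

```r
# Hypothetical helper (not part of dht): error on a missing column, warn on a type mismatch.
check_column_sketch <- function(d, column_name, column_type = NULL) {
  if (!column_name %in% names(d)) {
    stop("no column called '", column_name, "' found in the input data")
  }
  # Assumption: the type is compared against typeof() of the column
  if (!is.null(column_type) && typeof(d[[column_name]]) != column_type) {
    warning("column '", column_name, "' is not of type '", column_type, "'")
  }
  invisible(NULL)
}

d <- data.frame(id = c("123", "456"), value = c(123, 456), stringsAsFactors = FALSE)
check_column_sketch(d, "value", "double") # passes silently
```

Under this sketch, checking `"id"` against `"double"` would raise a warning and checking the nonexistent `"id2"` would raise an error.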
Checks for amount of system RAM and warns a potential DeGAUSS user if it might be too low
check_ram(minimum_ram = 4)
minimum_ram | minimum recommended GB of RAM (as numeric value) |
clean address text strings for geocoding
clean_address(address)
address | character vector of address text |
vector of character strings with non-alphanumerics (except dashes, which are left in for +4 ZIP issues) and excess white space removed.
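The cleaning described above can be approximated with two regular-expression passes. This is a sketch under assumptions, not the package's implementation: the helper name `clean_address_sketch` is hypothetical, and replacing removed characters with a space (rather than deleting them) is an assumption.

```r
# Hypothetical helper (not part of dht): strip punctuation but keep dashes for ZIP+4.
clean_address_sketch <- function(address) {
  # Assumption: non-alphanumerics (other than dashes) become spaces
  out <- gsub("[^[:alnum:]-]", " ", address)
  # Collapse runs of white space and trim the ends
  out <- gsub("[[:space:]]+", " ", out)
  trimws(out)
}

clean_address_sketch("3333  Burnet Ave., Apt #2, Cincinnati, OH 45229-3026")
```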
list the DeGAUSS images in the core library
core_lib_images(geocoder = TRUE)
geocoder | logical; include "geocoder" in core image list? |
names of DeGAUSS images in the core library as a character vector
core_lib_images()
core_lib_images(geocoder = FALSE)
Access the DeGAUSS color palette
degauss_colors(n)
n | which DeGAUSS color(s) (1-darkblue, 2-lightblue, 3-pink, 4-lightgrey, 5-purple, 6-teal) |
a named character string or vector of named character strings containing RGB colors in hexadecimal
degauss_colors(2)
degauss_colors(1:4)
plot(1:6, rep(1, 6), col = degauss_colors(1:6), pch = 19, cex = 10)
This function uses temporary CSV files and DeGAUSS commands as system calls to docker. Because of this approach, caching of geocoding results or reuse of intermediate downloaded data files is not possible, unless called from the same R session. See the examples for a workaround.
degauss_run(.x, image, version = "latest", argument = NA, quiet = FALSE)
.x | a data.frame or tibble to be input to a DeGAUSS container |
image | name of DeGAUSS image |
version | version of DeGAUSS image; will use latest version if not specified |
argument | optional argument |
quiet | suppress output from DeGAUSS container? |
.x with additional returned DeGAUSS columns
## create a memoised version of degauss_run so repetitive calls are cached
## this can be useful during development of DeGAUSS pipelines
## Not run:
fc <- memoise::cache_filesystem(fs::path(fs::path_wd(), "data-raw"))
degauss_run <- memoise::memoise(degauss_run, omit_args = c("quiet"), cache = fc)
## End(Not run)
expand dates between start_date and end_date
expand_dates(d, by)
d | data.frame or tibble with columns called 'start_date' and 'end_date' |
by | time interval to expand dates (e.g., 'day', 'week', etc) |
long data.frame or tibble with column called 'date' including all dates between start_date and end_date
## Not run:
d <- data.frame(
  start_date = check_dates(c("1/1/21", "1/2/21", "1/3/21")),
  end_date = check_dates(c("1/7/21", "1/8/21", "1/9/21"))
)
expand_dates(d, by = "day")
## End(Not run)
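To make the wide-to-long expansion concrete, here is a base R sketch of the same idea. It is not the package's implementation: the helper name `expand_dates_sketch` is hypothetical, and whether the real function keeps the `start_date` and `end_date` columns in its output is an assumption.

```r
# Hypothetical helper (not part of dht): one output row per date in each interval.
expand_dates_sketch <- function(d, by = "day") {
  rows <- lapply(seq_len(nrow(d)), function(i) {
    dates <- seq(d$start_date[i], d$end_date[i], by = by)
    # Repeat the original row once per generated date
    cbind(d[rep(i, length(dates)), , drop = FALSE], date = dates)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

d <- data.frame(
  start_date = as.Date(c("2021-01-01", "2021-01-02")),
  end_date = as.Date(c("2021-01-03", "2021-01-03"))
)
expanded <- expand_dates_sketch(d, by = "day")
expanded
```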
Error if docker cannot be found or if the docker daemon is not running in the background.
find_docker()
path to Docker executable found using Sys.which("docker")
get DeGAUSS metadata on all images in the core library
get_degauss_core_lib_env(...)
... | arguments passed to |
data.frame of DeGAUSS metadata
get_degauss_core_lib_env(geocoder = FALSE)
These functions look in a Dockerfile (locally or online) to extract environment variables corresponding to DeGAUSS image metadata.
get_degauss_env_dockerfile(
  dockerfile_path = fs::path_join(c(getwd(), "Dockerfile"))
)
get_degauss_env_online(name = "fortunes")
dockerfile_path | path to Dockerfile |
name | name of DeGAUSS container to download Dockerfile from |
Metadata on DeGAUSS images are defined using environment variables. Specifically, within a Dockerfile, these are defined as ENV instructions where the name of the environment variable begins with degauss_, for example "degauss_name" or "degauss_version". It is assumed that each ENV instruction is on its own line and defines only one environment variable.
named vector of DeGAUSS metadata
## Not run:
use_degauss_dockerfile(version = "0.1")
get_degauss_env_dockerfile()
get_degauss_env_dockerfile()["degauss_version"]
## End(Not run)
get_degauss_env_online("fortunes")
get_degauss_env_online("fortunes")["degauss_version"]
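The ENV convention described above (one `degauss_*` variable per ENV line) lends itself to simple line-based parsing. The sketch below illustrates that idea in base R; it is not the package's parser, the helper name `parse_degauss_env_sketch` is hypothetical, and the Dockerfile lines are made up for illustration.

```r
# Hypothetical helper (not part of dht): pull degauss_* ENV variables out of Dockerfile lines.
parse_degauss_env_sketch <- function(dockerfile_lines) {
  # Keep only ENV instructions whose variable name starts with degauss_
  env_lines <- grep("^[[:space:]]*ENV[[:space:]]+degauss_", dockerfile_lines, value = TRUE)
  env_lines <- sub("^[[:space:]]*ENV[[:space:]]+", "", env_lines)
  # Accept both `ENV name=value` and `ENV name value`
  env_names <- sub("[=[:space:]].*$", "", env_lines)
  env_values <- sub("^[^=[:space:]]+[=[:space:]]+", "", env_lines)
  # Drop surrounding double quotes, if any
  env_values <- gsub('^"|"$', "", env_values)
  stats::setNames(env_values, env_names)
}

# Illustrative Dockerfile content (an assumption, not taken from a real image)
example_lines <- c(
  "FROM rocker/r-ver:4.0.0",
  "ENV degauss_name=\"fortunes\"",
  "ENV degauss_version=\"0.1\"",
  "RUN echo hello"
)
degauss_env <- parse_degauss_env_sketch(example_lines)
degauss_env["degauss_version"]
```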
If not supplied as arguments, greeting-specific values (geomarker_name, version, description) are read in from the environment variables specified in the Dockerfile and made available when running the container; these include degauss_name, degauss_version, and degauss_description.
greeting(
  geomarker_name = Sys.getenv("degauss_name"),
  version = Sys.getenv("degauss_version"),
  description = Sys.getenv("degauss_description")
)
geomarker_name | name of the geomarker, must be the name used in the degauss.org url |
version | container version number as a character string |
description | brief description of the container; finishes the sentence "This container returns..." |
greeting message includes name, version, and brief description of container, as well as a link to more information about the specific geomarker
## Not run:
greeting("roads", "0.4", "returns proximity and length of nearby major roadways")
## End(Not run)
is docker available?
has_docker()
TRUE if find_docker() succeeds; FALSE otherwise
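A check of this kind can be sketched with `Sys.which()` and a lightweight docker call. This is an illustration rather than the package's code: the helper name `has_docker_sketch` is hypothetical, and using the exit status of `docker ps` to test whether the daemon is running is an assumption about how such a check might be done.

```r
# Hypothetical helper (not part of dht): TRUE only if docker is installed and responding.
has_docker_sketch <- function() {
  docker_path <- unname(Sys.which("docker"))
  # Sys.which() returns an empty string when the executable is not on the PATH
  if (!nzchar(docker_path)) {
    return(FALSE)
  }
  # Assumption: `docker ps` exits with a non-zero status if the daemon is unreachable
  status <- suppressWarnings(
    system2(docker_path, "ps", stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

has_docker_sketch()
```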
create a DeGAUSS command
make_degauss_command(
  input_file = "my_address_file_geocoded.csv",
  image,
  version = "latest",
  argument = NA,
  docker_cmd = find_docker()
)
input_file | name of input file |
image | name of DeGAUSS image |
version | version of DeGAUSS image |
argument | optional argument |
docker_cmd | path to docker executable |
DeGAUSS command as a character string
make_degauss_command(image = "geocoder", version = "3.2.0", docker_cmd = "docker")
make_degauss_command(image = "geocoder", version = "3.2.0", argument = "0.4", docker_cmd = "docker")
make_degauss_command(image = "geocoder", version = "3.2.0", docker_cmd = "/usr/local/bin/docker")
Note that renv will not pick up dependencies loaded using this function, and it is recommended to use something like withr::with_message_sink("/dev/null", library(dplyr)) instead.
qlibrary(...)
... | arguments passed to base::library() |
## Not run:
qlibrary(dplyr)
## End(Not run)
read in and format input file for DeGAUSS container
read_lat_lon_csv(
  filename,
  nest_df = FALSE,
  sf_out = FALSE,
  project_to_crs = NULL
)
filename | name of input file, probably opt$filename if inside container |
nest_df | logical. If TRUE, data is nested on lat/lon. Defaults to FALSE. |
sf_out | logical. If TRUE, data is converted to an sf object. Defaults to FALSE. |
project_to_crs | (optional) if sf_out=TRUE, the crs to which input data is projected. If unspecified and sf_out=TRUE, the crs defaults to 4326. |
a list with two elements. The first is the raw_data as it is read in from the input file. The second is a tibble nested on lat and lon to prevent duplication of geomarker computations. If sf_out=TRUE the second is an sf object.
## Not run:
d <- read_lat_lon_csv(filename = "test/my_address_file_geocoded.csv")
d <- read_lat_lon_csv(
  filename = "test/my_address_file_geocoded.csv",
  sf_out = TRUE,
  project_to_crs = 5072
)
## End(Not run)
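The reason for nesting on lat and lon is that a geomarker only needs to be computed once per unique location and can then be joined back to every input row. The base R sketch below illustrates that pattern; it does not use the package, the data are made up, and the `geomarker` values are placeholders standing in for a real computation.

```r
# Made-up input: rows "a" and "b" share the same coordinates
raw_data <- data.frame(
  id = c("a", "b", "c"),
  lat = c(39.1, 39.1, 39.2),
  lon = c(-84.5, -84.5, -84.4),
  stringsAsFactors = FALSE
)
raw_data$.row <- seq_len(nrow(raw_data))

# Compute once per unique location instead of once per input row
unique_locations <- unique(raw_data[c("lat", "lon")])
unique_locations$geomarker <- seq_len(nrow(unique_locations)) * 10 # placeholder values

# Join the per-location result back to every original row, in the original order
out <- merge(raw_data, unique_locations, by = c("lat", "lon"))
out <- out[order(out$.row), ]
out
```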
creates a docker-compose yaml file in current working directory
use_degauss_compose(...)
... | arguments passed to render_degauss_template (overwrite) |
Creates all the necessary files to create a DeGAUSS container. The container/geomarker name is assumed to be the basename of the working directory and the version of R and renv is taken from the calling environment. This function calls all of the individual dht::use_degauss_*() functions to create the following:
Dockerfile
Makefile
README.md
entrypoint.R
.dockerignore
test/my_address_file_geocoded.csv
LICENSE (GPL license)
.github/workflows/build-deploy-pr.yaml
.github/workflows/build-deploy-release.yaml
use_degauss_container(geomarker = getwd(), version = "0.1.0", ...)
use_degauss_dockerfile(geomarker = getwd(), version, ...)
use_degauss_makefile(geomarker = getwd(), ...)
use_degauss_readme(geomarker = getwd(), version = "0.1.0", ...)
use_degauss_githook_readme_rmd(geomarker = getwd(), ...)
use_degauss_entrypoint(geomarker = getwd(), version = "0.1.0", ...)
use_degauss_dockerignore(geomarker = getwd(), ...)
use_degauss_tests(geomarker = getwd(), ...)
use_degauss_license(geomarker = getwd(), ...)
use_degauss_github_actions(geomarker = getwd(), ...)
geomarker | path to folder where DeGAUSS container files are to be added; defaults to the current working directory |
version | string of version number used in freshly created README and entrypoint.R; defaults to "0.1.0" |
... | arguments passed to render_degauss_template (overwrite) |
write geomarker output to file
write_geomarker_file(
  d,
  raw_data = NULL,
  filename,
  geomarker_name = Sys.getenv("degauss_name"),
  version = Sys.getenv("degauss_version"),
  argument = NULL
)
d | input data nested on .row, with added geomarker column(s) |
raw_data | original unnested input data, defaults to NULL (for use when nest_df = FALSE in read_lat_lon_csv) |
filename | name of input file, probably opt$filename if inside container |
geomarker_name | name of the geomarker; defaults to degauss environment variable |
version | container version number as a character string; defaults to degauss environment variable |
argument | optional information to append after the image version number that was specified using a degauss argument; for example, a selected buffer radius, care site, or geocoding threshold |
output file is written to working directory
## Not run:
write_geomarker_file(
  d$d,
  d$raw_data,
  filename = "test/my_address_file_geocoded.csv",
  geomarker_name = "roads",
  version = "0.4"
)
## End(Not run)