The soilDB package provides tools for accessing and working with soil survey data from the USDA-NRCS. Two key functions, downloadSSURGO() and createSSURGO(), streamline the process of acquiring Soil Survey Geographic Database (SSURGO) data and preparing it as a local SQLite (GeoPackage) database, similar to the functionality offered by SSURGO Portal.
This vignette demonstrates how to use these functions to obtain and view data from Morton and Stanton counties, Kansas (areasymbol = c("KS129", "KS187")).
downloadSSURGO() will download the official SSURGO data from Web Soil Survey for the specified areasymbol values and return the paths to the ZIP archives.
library(soilDB)
gpkg_dir <- tempdir()
AREASYMBOLS <- c("KS129", "KS187")
ssurgo_zip <- downloadSSURGO(
areasymbol = AREASYMBOLS,
destdir = gpkg_dir
)
Here we specify just two soil survey areas of interest using the “area symbol.” Generally, this is a 2-letter state code and a 3-digit county FIPS code or other identifier.
There are more options for specifying the area of interest. Any SQL WHERE clause that targets the sacatalog (Soil Survey Area Catalog) table can be used. For example, to get all soil survey areas in Kansas, instead of areasymbol we could set WHERE = "areasymbol LIKE 'KS%'". Alternatively, WHERE can be any R spatial object, which is passed to SDA_spatialQuery() to determine the area symbols of interest.
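For instance, a minimal sketch of the WHERE clause approach (not run here, since it would download every Kansas survey area):

```r
# Sketch: download all soil survey areas in Kansas using a
# SQL WHERE clause against the sacatalog table (large download)
ks_zips <- downloadSSURGO(
  WHERE = "areasymbol LIKE 'KS%'",
  destdir = tempdir()
)
```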
We specify destdir as the destination directory for the area-specific ZIP files. If unspecified, exdir, the directory the ZIP files get extracted to, is the same as destdir. If you want to keep the ZIP files and extracted data, set destdir. If you only want the extracted data, set exdir. If you want both to be kept only temporarily in order to create a new database, leave destdir as the default (tempdir()).
The createSSURGO() function uses the sf and RSQLite packages internally to build a SQLite database. A suggested SQLite-based format to use is GeoPackage (".gpkg" file), as it is an open format with a well-defined standard.
# Create a local GeoPackage from the downloaded ZIP
gpkg_path <- file.path(gpkg_dir, "ssurgo.gpkg")
createSSURGO(
gpkg_path,
exdir = gpkg_dir
)
Here we pass exdir so createSSURGO() knows where to look for the data that downloadSSURGO() extracted from the ZIP files. If we supply the con argument instead of filename, we can connect to arbitrary DBIConnection objects, which could include those for other database connection types such as DuckDB or PostgreSQL.
The resulting .gpkg file is a spatially enabled SQLite database that can be used in GIS software or queried directly in R.
Once the GeoPackage is created, you can connect to it using DBI and explore its contents. The database follows the SSURGO schema, which includes tables like mapunit, component, and chorizon, and spatial layers such as the spatial map unit polygon layer, mupolygon.
library(DBI)
library(RSQLite)
# Connect to the GeoPackage
con <- dbConnect(SQLite(), gpkg_path)
# List available tables
dbListTables(con)
You can inspect the structure of a specific table, such as mapunit, which contains general information about each map unit.
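One quick way to do this, assuming the con connection created above, is to list the table's fields:

```r
# Column names of the mapunit table
dbListFields(con, "mapunit")
```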
You can write arbitrary queries to run against the SQLite connection to the local database. For example, here is a query to look at the first 5 rows of the mapunit table:
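A minimal sketch of that query using dbGetQuery():

```r
# First 5 rows of the mapunit table
dbGetQuery(con, "SELECT * FROM mapunit LIMIT 5")
```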
You can join tables to explore relationships between map units and their components:
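For example, a sketch of an inner join between mapunit and component on their shared mukey column (column names follow the SSURGO schema):

```r
# Map unit names with their component names and percentages
dbGetQuery(con, "
  SELECT mapunit.mukey, muname, compname, comppct_r
    FROM mapunit
    INNER JOIN component ON mapunit.mukey = component.mukey
   LIMIT 10
")
```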
To work with spatial data, we will use the sf package. Here we demonstrate how to read the mupolygon layer, which contains the polygon geometries of delineations of specific map unit concepts:
library(sf)
# Read spatial map units
spatial_mu <- st_read(gpkg_path, layer = "mupolygon")
spatial_mu
# Plot the map units
plot(st_geometry(spatial_mu))
This spatial layer can be used for mapping or spatial joins with other geospatial data sets. The mukey column is the unique map unit identifier. Any SSURGO data that can be queried or aggregated to be 1:1 with mukey can be used in thematic mapping.
You can visualize soil properties by joining tabular data with the mupolygon layer and plotting with base R graphics.
Here we look at the first few rows of mapunit and component, two critical tables in the SSURGO schema.
Dominant Component Percentage (comppct_r)
As a simple example of how map unit aggregation works, we will “manually” calculate a thematic variable using the component table and base R aggregate().
# Calculate dominant component per map unit
dominant_comp <- aggregate(
comppct_r ~ mukey,
data = component,
max
)
head(dominant_comp)
# Match dominant component value for each mukey with mukey of spatial data
spatial_mu$domcomppct_r <- dominant_comp$comppct_r[match(spatial_mu$mukey, dominant_comp$mukey)]
Here we see map units symbolized by the dominant component percentage. Numbers closer to 100% are more “pure” concepts.
# Visualize
plot(
spatial_mu["domcomppct_r"],
main = "Dominant Component Percentage (domcomppct_r)",
breaks = seq(0, 100, 10),
key.pos = 4,
border = NA,
pal = hcl.colors(10)
)
Note that in the northern part of the map the map units have high dominant component percentages, generally greater than 80%. As we look to the southern part of the map we see that the dominant component percentages are lower, indicating multiple major components per map unit.
Hydrologic Group (hydgrp) and Drainage Class (drainagecl)
soilDB provides many functions that can perform aggregations of map unit component data. We will use the function get_SDA_property() to calculate map unit dominant condition values for two component-level hydrologic properties: the Hydrologic Group and the Drainage Class.
In this case we want to get everything in our local database, so we use the WHERE clause "1=1", which is true for all map units in the database.
component_properties <- c("hydgrp", "drainagecl")
# Get most common hydgrp and drainagecl per mukey
hyd_tab <- get_SDA_property(
component_properties,
method = "dominant condition",
dsn = gpkg_path,
WHERE = "1=1"
)
head(hyd_tab)
soilDB also includes a function, NASISChoiceList(), that helps with converting categorical attributes to factors with standard labels. The function also handles cases where those labels have a defined natural order.
We pass a data.frame (or named list) of properties we want to convert to factor. The column or list element names are used to identify the domain of interest, which stores the different factor levels, their order, and other metadata.
We extract from, and replace into, our hyd_tab data.frame using [:
# Convert to factor
hyd_tab[component_properties] <- NASISChoiceList(hyd_tab[component_properties])
# Inspect result
str(hyd_tab)
Above we see that hydgrp and drainagecl are now factors, and that drainage class is an ordinal variable ranging from "excessively" drained at the driest to "subaqueous" at the wettest.
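A quick way to confirm the ordering, assuming the hyd_tab object from above:

```r
# Ordered drainage class levels, driest to wettest
levels(hyd_tab$drainagecl)
```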
Let’s merge this processed tabular data into our sf data.frame object using merge(). Both spatial_mu and hyd_tab have "mukey" as a column in common, along with "areasymbol", "musym", and "muname".
By default merge() will work on all columns that match exactly. If they differ, or you only want to join on one specific column, you can specify by, by.x, or by.y.
You may also need to specify sort = FALSE and incomparables = NA for certain applications where the original order matters or some keys contain NA.
Now that we have the spatial and tabular data joined, we will make some thematic maps.
First, we plot map unit dominant condition Hydrologic Group.
plot(
spatial_mu["hydgrp"],
main = "Hydrologic Group (hydgrp)",
key.pos = 4,
border = NA,
pal = rev(hcl.colors(7))
)
Next, we see map unit dominant condition Drainage Class.
plot(
spatial_mu["drainagecl"],
main = "Drainage Class (drainagecl)",
border = NA,
key.pos = 4,
pal = rev(hcl.colors(8))
)
The previous examples show how to visualize component-level properties using base R, soilDB, DBI, RSQLite, and sf. Aggregating the dominant component to map unit level values is one of the simplest, and most common, transformations of SSURGO data for creating maps. The dominant condition aggregation method, which combines components sharing the same class for the property of interest, is extremely useful for any categorical data element, such as the hydgrp and drainagecl shown in this demo.
You can adapt the aggregation logic depending on your needs (e.g., max, mean, or most frequent value). Many soilDB functions include default aggregations, and you can write custom queries of your own using SDA_query() or DBI dbGetQuery().
This section describes some existing options in the soilDB package for querying and aggregating data for thematic maps. In the future this section will be expanded.
All of the get_SDA_*() “SSURGO On Demand” functions can be applied to local copies of the SSURGO database by passing the dsn argument (either a path to a SQLite database or a DBIConnection object).
get_SDA_hydric()
get_SDA_interpretation()
get_SDA_muaggatt()
get_SDA_pmgroupname()
get_SDA_property()
get_SDA_coecoclass()
get_SDA_cosurfmorph()
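For example, a sketch using get_SDA_hydric() against the local GeoPackage, with the WHERE and dsn arguments used the same way as for get_SDA_property() above:

```r
# Hydric soil summary for all map units in the local database
hydric_tab <- get_SDA_hydric(WHERE = "1=1", dsn = gpkg_path)
head(hydric_tab)
```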
For users who prefer a Python-based solution, the SSURGOPortal R package wraps the official SSURGO Portal Python code for use in R.