The {domir}
package supports the determination of the
relative importance of the inputs (i.e., independent variables,
predictors, or features) in a user’s statistical or machine learning
model. The methodology used by {domir}
is called
Dominance Analysis which is based on a series of pairwise
comparisons between the model fit values ascribed to elements in the
model including comparing Shapley
values.
The intention of this package is to provide a flexible user interface
to dominance analysis—a relatively assumption-free methodology for
comparing the value of model inputs to prediction. The user interface is
structured such that {domir}
automates the decomposition of
the returned value and comparisons between model inputs and the user
provides the model inputs, the predictive model into which they are
entered, and returned value from the model to decompose.
To install the most recent version of {domir}
from CRAN
use:
install.packages("domir")
{domir}
is also used as the computational engine
underlying the dominance_analysis()
function for the {parameters} package
from the {easystats}
collection.
{domir}
DoesThe primary dominance analysis function domir
implements
the most computationally intensive and programming heavy parts of
dominance analysis for the user and has relatively few requirements on
the predictive modeling functions with which it can work.
The flexibility of domir
comes at the cost of more
complexity for the user in terms of setting up a function that accepts
the type of input domir
will provide (currently only a
‘formula’) and and expects to receive (currently only a numeric
scalar).
Below these ideas are outlined in greater detail in the context of a
few examples. The next section begins the discussion with a more
extensive comparison of domir
with packages that implement
similar methods.
The domir
function implements the same method as the
“lmg” type for the calc.relimpo
function in the
{relaimpo}
package. domir
can replicate the
results produced by both the above package but, as will be seen,
requires more user input.
To illustrate these points, consider the following example linear regression on which all three of the dominance analysis results to come are based:
lm(mpg ~ am + vs + cyl, data = mtcars)
Classic dominance analysis uses the variance explained \(R^2\) as fit statistic (i.e., as
implemented by lm
’s summary
method) and so
will this example.
{domir}
’s domir
Implementing a ‘classic’ dominance analysis on this linear regression
in domir
can be inputted as:
<-
lm_wrapper function(formula, data) {
lm(formula, data = data) |>
summary() |>
"r.squared"]]
_[[
}
domir(mpg ~ am + vs + cyl, lm_wrapper, data = mtcars)
## Overall Value: 0.7619773
##
## General Dominance Values:
## General Dominance Standardized Ranks
## am 0.1774892 0.2329324 3
## vs 0.2027032 0.2660226 2
## cyl 0.3817849 0.5010450 1
##
## Conditional Dominance Values:
## Subset Size: 1 Subset Size: 2 Subset Size: 3
## am 0.3597989 0.1389842 0.033684441
## vs 0.4409477 0.1641982 0.002963748
## cyl 0.7261800 0.3432799 0.075894823
##
## Complete Dominance Designations:
## Dmnated?am Dmnated?vs Dmnated?cyl
## Dmnates?am NA NA FALSE
## Dmnates?vs NA NA FALSE
## Dmnates?cyl TRUE TRUE NA
In domir
, the lm
model is not submitted
directly. Rather, it is wrapped into a function (i.e.,
lm_wrapper
) that, in this case, accepts two arguments;
formula or an R formula and data a data frame in which
the independent variables in the formula are present. The result of the
lm
submitted into the summary
function and the
result is then filtered to just the r.squared element
and returned.
What domir
does automate taking subsets of the
formula and submit them, repeatedly until all possible subsets
have been submitted, to lm_wrapper
(see this vignette
for a conceptual discussion of dominance analysis). In this way,
domir
is a Map
- or lapply
-like
function as it receives an object on which to operate (i.e., the
formula) and a function to which to apply to it.
domir
expects a numeric scalar to be returned from the
function.
Like lapply
, other arguments
(data = mtcars
) can also be passed to each call of the
function and must be explicitly built into the wrapper function.
What is important to note about domir
that differs from
other dominance analysis-oriented functions discussed below is that
domir
expects that the user will supply the analysis
pipeline linking the formula it passes to the numeric scalar
value that it expects. This ‘supply the pipeline’ approach makes
domir
far more flexible than other implementations but does
require the user to think more carefully about how to structure the
pipeline.
Note that the focus of domir
’s print
-ed
results focuses on the numerical results from “General Dominance Values”
and “Conditional Dominance Values” and, a logical matrix of “Complete
Dominance Designations”.
See also the (now superseded) domir::domin
function for
another approach to structuring the input pipeline for dominance
analysis.
{relaimpo}
’s
calc.relimp
with type = "lmg"
{relaimpo}
is not a dominance analysis software but does
produce general dominance value decomposition for linear regression
using the explained variance \(R^2\) in
the calc.relimp
function with the argument
type = "lmg"
.
::calc.relimp(mpg ~ am + vs + cyl, data = mtcars, type = "lmg") relaimpo
## Response variable: mpg
## Total response variance: 36.3241
## Analysis based on 32 observations
##
## 3 Regressors:
## am vs cyl
## Proportion of variance explained by model: 76.2%
## Metrics are not normalized (rela=FALSE).
##
## Relative importance metrics:
##
## lmg
## am 0.1774892
## vs 0.2027032
## cyl 0.3817849
##
## Average coefficients for different model sizes:
##
## 1X 2Xs 3Xs
## am 7.244939 4.316851 3.026480
## vs 7.940476 2.995142 1.294614
## cyl -2.875790 -2.795816 -2.137632
calc.relimp
has a similar to structure to that of
domir
but does not require a pipeline function. This is
because {relaimpo}
is specialized and works only with
lm
models and the variance explained \(R^2\) as a fit statistic.
calc.relimp
also allows for multiple methods of submitting
(i.e., correlation matrices, fitted lm
object, a
data.frame
) given that it always implements the same
model.
calc.relimp
’s printed results provide relative
importance metric values that match those obtained from
domir
(i.e., the general dominance values). In addition,
calc.relimp
reports the average lm
coefficients across numbers of independent variables/\(X\)s in a way similar to the conditional
dominance values reported by domir
—an additional and useful
result to show the impact of inclusion of different numbers of
independent variables on obtained coefficients/predicted values.
Again, note that {relaimpo}
is not dominance
analysis-oriented and does not report on dominance designations or
dominance values other than the general dominance values.
Further examples of domir
s functionality will be
populated on the {domir}
wiki.