Title: | Exploratory and Person/Item Misfit Diagnostics for Polytomous Data |
Version: | 1.1.1 |
Description: | Analysis of items and persons in data. To identify and remove person misfit in polytomous item-response data using either 'mokken' or a graded response model (GRM, via 'mirt'). Provides automatic thresholds, visual diagnostics (2D/3D), and export utilities. Methods build on Mokken scaling as in Mokken (1971, ISBN:9789027968821) and on the graded response model of Samejima (1969) <doi:10.1007/BF03372160>. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
URL: | https://github.com/hsnbulut/epmfd |
BugReports: | https://github.com/hsnbulut/epmfd/issues |
Imports: | dplyr, fs, ggplot2, mirt, mokken, PerFit, readr, rlang, tibble |
Suggests: | plotly, ggrepel, openxlsx, haven, patchwork, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Depends: | R (≥ 4.1.0) |
LazyData: | true |
NeedsCompilation: | no |
Packaged: | 2025-10-15 21:10:27 UTC; hsn |
Author: | Hasan Bulut |
Maintainer: | Hasan Bulut <hasan.bulut@omu.edu.tr> |
Repository: | CRAN |
Date/Publication: | 2025-10-20 19:40:08 UTC |
Remove misfitting persons from an epmfd_misfit object
Description
clean_epmfd()
removes individuals flagged as misfitting according to
a chosen decision rule and returns a cleaned dataset that can be passed
directly to scale_epmfd()
.
Usage
clean_epmfd(misfit, criterion = c("union", "intersection"), clean_item = FALSE)
Arguments
misfit |
An |
criterion |
Character string, either |
clean_item |
is a logical argument. If clean_item=TRUE, then the function can clean items. The defaul value is FALSE. |
Details
The function uses logical misfit indicators stored in misfit$table
,
including:
-
misfit_any
:TRUE
if at least one statistic flagged the person. Statistic-specific columns (e.g.,
Gnp
,U3p
,lpz
) indicating per-statistic misfit decisions.
The set of statistics actually considered is taken from misfit$stats
.
Under the "intersection"
rule, a person is removed only if all of those
statistics are TRUE
. Internally, rowSums(..., na.rm = TRUE)
is
used so that NA
values do not force removal (i.e., NA
behaves
as “not flagged” in the intersection count).
Only items listed in misfit$scaled$kept
are retained in the output.
Person identifiers from the original raw object are preserved for the kept rows.
Value
An epmfd_clean
list with:
-
raw
: Anepmfd_raw
object containing only the retained persons and items, directly usable inscale_epmfd()
. -
clean_data
: The cleaned raw data frame (persons × kept items). -
n_removed
: Number of persons removed. -
criterion
: The applied decision rule. -
misfit
: The originalepmfd_misfit
object (as provided).
Criterion
-
"union"
(default): A person is removed if at least one statistic (e.g., Gnp, U3p, lpz) flags them as misfitting. This is stricter. -
"intersection"
: A person is removed only if all statistics flag them as misfitting. This is more lenient.
See Also
Examples
library(epmfd)
data<-load_epmfd(sampledata)
scaling_data<-scale_epmfd(data)
misfit_result<-misfit_epmfd(scaling_data)
clean_data<-clean_epmfd(misfit_result)
head(clean_data$clean_data)
dim(data$data) # the dimension of raw data
dim(clean_data$clean_data) # the dimension of clean data
Export epmfd objects to disk
Description
export_epmfd()
writes commonly used tables from epmfd_*
objects to
CSV / Excel / SPSS files, and (optionally) saves the object itself as an RDS.
Usage
export_epmfd(
object,
dir = NULL,
prefix = NULL,
format = c("csv", "xlsx", "sav"),
save_rds = FALSE,
include_misfit = FALSE
)
Arguments
object |
One of: |
dir |
Target directory. If |
prefix |
File name prefix (without extension). If |
format |
Output format; one of
|
save_rds |
Logical; if |
include_misfit |
Logical; if |
Details
What is produced depends on the object class:
-
epmfd_clean
: cleaned person-by-item data (clean
); ifinclude_misfit = TRUE
and a misfit object is attached, alsomisfit
. -
epmfd_misfit
: ifinclude_misfit = TRUE
,misfit
. -
epmfd_scaled
: item status summary (scale
).
When format = "sav"
, logical columns are converted to labelled factors
(FALSE
/TRUE
) for SPSS compatibility. Writing .sav
does not support list
columns; the function aborts if such columns are present.
Value
If dir
is NULL
, a named list containing the tables that would be
written (e.g., clean
, misfit
, scale
). If dir
is non-NULL
, (invisibly)
a character vector of file paths that were written.
File naming (when dir
is provided)
Files are named <prefix>_<name>.<format>
under dir
. For example:
study1_clean.csv
, study1_misfit.xlsx
, or study1_scale.sav
.
See Also
saveRDS()
, readr, openxlsx, haven
Examples
# Minimal toy objects created inside the example ----
set.seed(1)
toy_clean <- data.frame(
I1 = sample(0:1, 6, TRUE),
I2 = sample(0:1, 6, TRUE)
)
toy_misfit <- data.frame(
person = 1:6, Gpn = runif(6), U3p = runif(6)
)
clean_obj <- structure(
list(clean_data = toy_clean,
misfit = list(table = toy_misfit)),
class = "epmfd_clean"
)
misfit_obj <- structure(
list(table = toy_misfit, method = "mokken"),
class = "epmfd_misfit"
)
scaled_obj <- structure(
list(kept = c("I1", "I2"), removed = character()),
class = "epmfd_scaled"
)
# 1) No writing: return list
lst <- export_epmfd(clean_obj, dir = NULL, include_misfit = TRUE)
str(lst)
# 2) Write to a temporary directory (CRAN policy)
tmpdir <- tempdir()
export_epmfd(clean_obj, dir = tmpdir, prefix = "study1", format = "csv",
save_rds = TRUE)
# Optional formats guarded by Suggests (run only if installed)
if (requireNamespace("haven", quietly = TRUE)) {
export_epmfd(misfit_obj, dir = tmpdir, format = "sav",
include_misfit = TRUE)
}
if (requireNamespace("openxlsx", quietly = TRUE)) {
export_epmfd(scaled_obj, dir = tmpdir, prefix = "scaleA",
format = "xlsx")
}
Load and validate raw data for the epmfd workflow
Description
load_epmfd()
prepares raw item-response data for subsequent
functions in the epmfd workflow. It validates input, ensures that all
item responses fall within the expected range of categories, converts
items to ordered factors, and attaches person IDs.
Usage
load_epmfd(data, id_col = NULL, likert_levels = NULL)
Arguments
data |
A data.frame or tibble with persons in rows and items in columns.
All item responses must be integers in |
id_col |
Optional |
likert_levels |
Optional |
Details
Each column of data
is validated to ensure responses are within
1:K
. Values outside this range cause an error. Missing values
are allowed and reported.
Value
An object of class epmfd_raw
, a list with elements:
-
data
: A data.frame of ordered-factor responses -
id
: Vector of person IDs -
K
: Maximum number of categories per item
See Also
Examples
# Example: 5 persons × 3 items, responses 1–4
df <- data.frame(
Pid = paste0("P", 1:5),
Item1 = c(1, 2, 3, 2, 1),
Item2 = c(2, 3, 4, 2, 1),
Item3 = c(3, 4, 1, 2, 2)
)
raw <- load_epmfd(df, id_col = "Pid", likert_levels = 4)
str(raw)
Compute person-fit statistics (polytomous data)
Description
misfit_epmfd()
computes selected person-fit statistics for polytomous
responses and returns an epmfd_misfit
object with scores, thresholds,
and logical flags per person.
Usage
misfit_epmfd(object, stats = c("auto", "lpz", "Gnp", "U3p"), cut.off = "auto")
Arguments
object |
An |
stats |
Character vector choosing which statistics to compute.
Allowed values:
|
cut.off |
Cut-off for |
Details
Auto vs manual decision for misfit_final
:
If
stats
contains"auto"
:for
"mokken"
:misfit_final = Gnp & U3p
for
"mirt"
:misfit_final = (lpz & Gnp & U3p)
if any such rows exist, otherwise fallback tolpz
only.
If
stats
is manual (no"auto"
):misfit_final
is the AND over the selected statistics (if only one selected, it is used directly).
Polytomous PerFit statistics assume a common design K (number of
categories) across items. This function uses object$raw$K
as the global
design K and maps item responses to 0..K-1 without compressing per-item
gaps (unused categories are allowed and do not trigger an error).
Value
An epmfd_misfit
list with:
-
scaled
: the inputepmfd_scaled
object -
method
: detected method ("mirt"
or"mokken"
) -
stats
: actually computed statistics (subset ofc("lpz","Gnp","U3p")
) -
thresholds
: named list of lists withvalue
andtail
-
scores
: named list of numeric score vectors per statistic -
table
: a tibble withid
, one logical column per statistic,misfit_any
(OR over selected stats), andmisfit_final
(see Details)
Examples
library(epmfd)
data<-load_epmfd(sampledata)
scaling_data<-scale_epmfd(data)
misfit_result<-misfit_epmfd(scaling_data)
misfit_result
plot_misfit(misfit_result,threeD=TRUE)
Plot methods for epmfd objects
Description
Quick visual summaries for three object classes:
-
epmfd_scaled: Item-level retention summary (Kept vs Removed) and a quality-statistic histogram (either discrimination
a
for mirt or scalabilityH_i
for mokken). -
epmfd_misfit: Bar plot of misfit counts per statistic and a global bar summarizing overall misfit ratio.
-
epmfd_clean: Bar plot comparing remaining vs removed persons.
Usage
## S3 method for class 'epmfd_scaled'
plot(x, ...)
## S3 method for class 'epmfd_misfit'
plot(x, ...)
## S3 method for class 'epmfd_clean'
plot(x, ...)
Arguments
x |
An |
... |
Additional aesthetics or layers forwarded to the underlying
ggplot2 geoms (e.g., |
Details
If the patchwork package is installed, paired plots are stacked
vertically and returned as a single patchwork object; otherwise a
list of two ggplot2
objects is returned.
Value
A single ggplot2
object, a patchwork
object (if
available), or a list of ggplot2
objects—depending on the
class and whether combined layout is possible.
Dependencies
These methods use ggplot2. For epmfd_scaled
objects fitted with
mirt, the method accesses model coefficients via mirt if that
package is installed (it is not required for mokken). Stacking
multiple plots uses patchwork when available.
See Also
plot_misfit
for 2D/3D scatter visualizations of
person-level misfit, and misfit_epmfd
/ clean_epmfd
for producing the inputs to these plots.
Examples
# Scaled object
p_scaled <- plot(scaled_obj) # item retention + quality histogram
# Misfit object
p_mf <- plot(misfit_obj) # per-statistic counts + overall ratio
# Cleaned object
p_cl <- plot(clean_obj) # remaining vs removed persons
# Add ggplot2 options through '...'
plot(misfit_obj, alpha = 0.8)
Plot person misfit in 2D/3D using stored thresholds
Description
plot_misfit()
visualizes person-level misfit statistics stored in an
epmfd_misfit
object. It supports:
-
2D: scatter plots for all pairwise combinations of the selected statistics (or a single 2D plot if exactly two are given). Points are coloured by the joint exceedance logic (see
any
). Axis titles are the statistic names; the main title shows the cut-offs in parentheses, and dashed lines mark the cut-offs per axis. -
3D: an interactive scatter using plotly if three statistics are supplied; optionally adds three semi-transparent planes at the x/y/z cut-offs when
planes = TRUE
.
Usage
plot_misfit(
object,
stats = NULL,
threeD = FALSE,
any = FALSE,
planes = TRUE,
label_ids = FALSE,
...
)
Arguments
object |
An |
stats |
Character vector of length 2 or 3 naming statistics found in
|
threeD |
Logical. If |
any |
Logical. Colouring rule:
|
planes |
Logical (3D only). If |
label_ids |
Logical. If |
... |
Additional aesthetics passed to
|
Details
Cut-off logic. For each selected statistic, a person is deemed to
exceed if its score is greater than the cut-off for upper-tailed statistics
or less than the cut-off for lower-tailed statistics. In 2D, dashed vertical
and horizontal lines indicate the cut-offs; the plot title shows
"Y (cutY) vs X (cutX)"
with formatted values. In 3D, axis titles
include the cut-off values in parentheses, and (optionally) three grey planes
make the cut-offs explicit.
Returned value. With two statistics, a single ggplot
is returned;
with three statistics and threeD = FALSE
, a named list of ggplot
s is
returned for all 2D pairs. With threeD = TRUE
and three statistics, a
plotly
object is returned.
Dependencies. This function uses ggplot2 for 2D plots and,
for 3D, plotly (required only when threeD = TRUE
). Optional labels in
2D use ggrepel when installed.
Value
A ggplot
object (2D), a named list of ggplot
s (all 2D pairs), or a
plotly
object (3D), depending on stats
and threeD
.
See Also
misfit_epmfd()
for computing statistics and thresholds;
clean_epmfd()
for removing misfitting persons.
Examples
# Suppose 'mf' is an epmfd_misfit with scores Gnp, U3p, lpz
# 2D: single plot
plot_misfit(mf, stats = c("Gnp","U3p"), any = TRUE)
# 2D: all pairwise plots
plot_misfit(mf, stats = c("Gnp","U3p","lpz"))
# 3D: with cut-off planes
plot_misfit(mf, stats = c("Gnp","U3p","lpz"), threeD = TRUE, planes = TRUE)
# 3D: points only (no planes)
plot_misfit(mf, stats = c("Gnp","U3p","lpz"), threeD = TRUE, planes = FALSE)
Print Method for epmfd_misfit Objects
Description
Prints summary information for an epmfd_misfit
object.
Usage
## S3 method for class 'epmfd_misfit'
print(x, ...)
Arguments
x |
An object of class |
... |
Further arguments passed to or from other methods. |
Value
The input object x
, returned (invisibly) after printing.
Example Polytomous Response Data
Description
A small toy dataset included in the epmfd package, containing polytomous item responses from simulated persons.
Usage
sampledata
Format
A data frame with 20 persons (rows) and 6 items (columns). Each item takes ordered values 1–5.
Examples
data(sampledata)
head(sampledata)
Scale polytomous item responses
Description
scale_epmfd()
fits either a parametric graded response model (GRM, via
mirt) or a nonparametric Mokken model (via mokken) to
polytomous item-response data and filters out weak items based on
user-specified thresholds.
Usage
scale_epmfd(
object,
method = c("auto", "mirt", "mokken"),
a_thr = 0.5,
H_thr = 0.3
)
Arguments
object |
An |
method |
Scaling method. One of:
|
a_thr |
Numeric. Threshold for item discrimination parameter |
H_thr |
Numeric. Threshold for item scalability coefficient |
Details
The function converts ordered factors to numeric before analysis.
For GRM (
mirt
), items are filtered by their discrimination parametera
. The overall model fit is attempted usingmirt::M2()
; if this fails (e.g., due to insufficient df), a warning is issued andmodel_fit = NULL
.For Mokken, item scalability coefficients
H_i
are computed and compared toH_thr
.
Value
An object of class epmfd_scaled
, a list containing:
-
raw
: the originalepmfd_raw
object -
method
: scaling method actually used ("mirt"
or"mokken"
) -
kept
: names of items retained -
removed
: names of items removed -
model
: fitted GRM model (for"mirt"
), elseNULL
-
ai
: item parameter estimates (for"mirt"
) -
a_thr
: discrimination threshold used (for"mirt"
) -
model_fit
: results ofmirt::M2()
(if available) -
Hi
: vector of item scalability coefficients (for"mokken"
) -
H_thr
: scalability threshold used (for"mokken"
) -
items
: the vector containing all items names.
See Also
load_epmfd()
, misfit_epmfd()
, plot.epmfd_scaled()
Examples
library(epmfd)
data<-load_epmfd(sampledata)
scale_epmfd(data)
Summary method for epmfd_clean
objects
Description
Summary method for epmfd_clean
objects
Usage
## S3 method for class 'epmfd_clean'
summary(object, ...)
Arguments
object |
An object of class |
... |
Further arguments (ignored). |
Value
Invisibly returns a named list with summary numbers.