dataquieR 2.5.1
- News
- fixed a bug found by latest R-developer-version caused by
parentheses in the wrong position in encapsulated function calls. this
did not cause any harm, but was nevertheless a bug.
- properly deprecated the argument
threshold_value from
acc_varcomp()
loess and margins plot slightly improved
- Amendment to 2.5.0 news
- deprecated (and accidentally removed already) the argument
threshold_value from acc_varcomp()
dataquieR 2.5.0
- New features
- improved support for categorical variables, including:
- time trends w/ and w/o grouping variable
- observer/device effects
- distribution plots
dq_report2() can store results on the disk instead of
the RAM with the new argument storr_factory. This can be
useful in reducing issues of memory consumption, but we suggest to use
fast SSDs or NVMes
- all indicator functions now create result objects with nice print
functions (visible in the Data Viewer instead of the Console window).
However, this also implies, that warnings, errors and messages are
returned as part of the result object and are printed with that object.
If you want to restore the original behavior, use the option
options(dataquieR.dontwrapresults = TRUE). With
options(dataquieR.testdebug = TRUE), you can switch off
this behavior.
dataquieR can provision your function arguments from
the metadata. In order to enable lapply and
Vectorize(SIMPLIFY = FALSE) with indicator functions, the
first argument is now always resp_vars for item level
functions. dataquieR tries to guess if a function that
features both resp_vars and study_data as its
first arguments was called w/o resp_vars but only with
study_data as its first unnamed argument. If that is the
case, it sets resp_vars to the default for
resp_vars (typically all variables). With
options(dataquieR.testdebug = TRUE), you can switch off
this behavior, if you need.
- an improved version of
dq_report_by, in which it is
possible to specify:
- how to split the data in parts (strata and/or variable groups)
- strata and/or variable groups to include/exclude
- how to filter observational units
- a selection of variables to analyze (
resp_vars)
- a variable/s containing ID information for merging data frames
(
id_vars)
- a new function
int_encoding_errors checking invalid
characters present in the text with respect to the expected character
encoding / code page, e.g., a code place in the latin1
table is used but the encoding is utf8 resulting in damaged
text output
- a new dashboard in the General menu, in
Item-level data quality dashboard, usable to customize data
summaries
- new selection buttons are now present in the report to select
visible columns in the displayed tables (it also applied to the export
buttons)
- support for a sheet
CODE_LIST_TABLE in the metadata,
where it is possible to state both value label tables and missing list
tables all in one table.
- support for a sheet
item_computation_level in the
metadata, where it is possible to state variables to be computed from
the provided study data.
- Breaking changes
- moved example data from the package to our website. If you are
already using
prep_get_data_frame("ship") or
prep_get_data_frame("study_data") in your code to access
example data, no change is needed. If you are still accessing example
data using system.file() (e.g. using
load(system.file("extdata", "study_data.RData", package = "dataquieR"))),
you need to switch to prep_get_data_frame(), i.e.:
load(system.file("extdata", "study_data.RData", package = "dataquieR"))
would become
study_data <- prep_get_data_frame("study_data")
- changes in the output names:
- renamed
SummaryData in ResultData
(functions: acc_shape_or_scale, acc_margins,
com_segment_missingness)
- removed column
GRADING from SummaryData
outputs. SummaryTable outputs still feature the column,
since these are meant to be a machine readable interface
con_contradictions_redcap used to return a result named
SummaryTable, while the documentation spoke about
SummaryData. Alas, it should have been
VariableGroupTable in both cases. If you relied on
SummaryTable in the results of
con_contradictions_redcap, you need to change your code to
use now the correct output name VariableGroupTable. Also,
the table has been slightly modified.
VariableGroupData as returned by
con_contradictions_redcap is a version optimized for human
readers.
- in
VariableGroupTable as returned by
con_contradictions_redcap the column category
has been renamed to CONTRADICTION_TYPE
- in
con_contradictions_redcap, if
summarize_categories is selected the result will now be in
a sub-list named Other
- in
prep_add_computed_variables, the column
resp_vars is now named VAR_NAMES, to be more
in line with other data frames.
- Reporting
- improved button to export Excel, pdf, and print (colors
supported)
- improved rendering time introducing thumbnails as first visible
result in the report. Clicking on the image, the thumbnail is replaced
by
plotly’s interactive figures
- implementation of
[.dataquieR_resultset2 and
[[.dataquieR_result and related functions have changed
slightly. You can now for a report
(r <- dq_report2(...)) call, e.g.,
r[, "com_item_missingness", "ReportSummaryTable"] to get a
balloon plot or r[, "com_item_missingness", "SummaryData"]
to get a table, for all variables that were assessed with
com_item_missingness() in the report r
- if you print a list of
dataquieR_result objects, these
will be combined, but due to restrictions in R, this only
works, if you call print() explicitly on this list, not
with “auto-printing” (see https://stackoverflow.com/a/53983005), for
example:
a <- lapply(c("v00001", "v00004", "v00005", "v00006"), acc_loess, meta_data_v2 = "meta_data_v2", study_data = "study_data")
print(a) works, but typing a alone does not.
You have to call print() or to put lapply() in
brackets: (lapply())
- (Indicator) Functions related
acc_distributions() was split in
acc_distributions() and
acc_distributions_ecdf()
(prep_acc_distributions_with_ecdf() creates the original
plot)
- there is a new function
acc_cat_distributions()
- all functions now feature:
- a
meta_data_v2 argument
- new argument
item_level, as synonyms for
meta_data, new argument segment_level, as
synonyms for meta_data_segment, new argument
dataframe_level, as synonyms for
meta_data_dataframe, new argument
cross-item_level, as synonyms for
meta_data_cross_item, new argument
item_computation_level, as synonyms for
meta_data_item_computation
- if you call functions without
label_col, the
label_col will now default to LABEL, except
you set the option options(dataquieR.testdebug = TRUE) or
options(dataquieR.dontwrapresults = TRUE)
- the argument
resp_vars in
prep_scalelevel_from_data_and_metadata() was never working
correctly and not used neither, so it has been deprecated. It is already
not functional and it never was
- the function
des_summary is still present, but you can
now get results for continuous or categorical variables only, using
des_summary_continuous and
des_summary_categoricalrespectively
con_contradictions_redcap plot colors vary depending on
CONTRADICTION_TYPES
acc_loess() uses lowess instead of
loess (both from the stats package)
- General
- test coverage increased, again
- fixed bug in
prep_check_for_dataquieR_updates(), so,
maybe, you need to manually install the latest beta release using
devtools::install_gitlab("libreumg/dataquieR", auth_token = NULL)
- figure sizes have been overworked in the default report
options(dataquieR.ELEMENT_MISSMATCH_CHECKTYPE = "subset_u")
is now the default assuming a one-fits-all-metadata-file (see
? dataquieR.ELEMENT_MISSMATCH_CHECKTYPE)
- fewer custom implementations of stuff available from
rlang or withr, most prominently a faster
prep_prepare_dataframes() and rlang compatible
condition (error) handling.
- small changes in the behavior of the
dataquieR_result
class, which is now applied also to results outside a pipeline.
- many small fixes to figures
- small fixes to menu titles
- bug fixes
dataquieR 2.1.0
- renamed metadata column
SEGMENT_ID_TABLE to
SEGMENT_ID_REF_TABLE in segment level metadata
- scale level metadata support and heuristics
- significantly improved data quality summaries
- consolidated some of the indicator functions (limits, work in
progress)
- many minor optimizing changes
- figure sizing (work in progress), also resize handles
- improved report files structure
- improved
dq_report_by files structure
- fixes, e.g., in rules, fixes in label shortening, computation speed
and cache pre-filling-control
- improved Excel export from the
HTML reports
- missing codes: column
CODE_INTERPRET changed to be in
line with the AAPOR definitions, so the following
translation: PP -> P; P -> I; OH -> UO
- fixed tests
- updated concept excerpt
- excluded nominal and ordinal variables from marginal means
analysis
- improved data type handling
dataquieR 2.0.1
Reporting
- New functions
prep_save_report and
prep_load_report
- Update and simplification of summary overview, empty columns/rows
omitted from the matrices. Also, better classification of errors
- Many small updates in the usability of the report
- Fixes in
HTML/JS output for Firefox
- Bug fixes of report outputs that were not looking as expected (in
contradiction checks and limit violations)
- Fixed mixed distribution plots called several times
- Enable auto-resizing of
plot.ly-plots
- Fixed rendering problems for the new, automatically size-reduced
plots causing the report rendering to fail if having
gginnards installed; removed dependency from
gginnards.
- Do not show superfluous axis labels (e.g., variables, if variable
names are on an axis because these usually overlap without improving the
output)
- Prevent a warning of
robustbase about
doScale
- Less noisy display of conditions (e.g., warnings, errors, messages)
with the results in
dq_report2 reports
summarytools are included in dq_report2
reports, if installed.
- New report rendering code polished, parallel execution of
HTML generation prepared
- New parallel mode for
dq_report2 using a queue improves
speed
- Full support for
VARIABLE_ROLES in
dq_report2 and suppressing helper variable outputs in
dq_report_by
- Do not show conditions (e.g., warnings, messages, errors) in reports
if they address the call of the function (e.g., “using default for
argument…”) by
dq_report2 and not directly by the user
- No unit-missingness in
dq_report2 because it is not so
useful in its current implementation
- More robust
dq_report_by for large reports (can write
and optionally render results to disk rather than returning them)
- Bug fix in
dq_report_by causing
DATA_PROCESS not to work
- Fixed some errors and
TODO’s in
dq_report_by and add dependent variables on the fly but
with VARIABLE_ROLE suppress:
- If no role is given, add “primary” by default for single reports as
well as for
dq_report_by
- Support meta_data_v2 in
dq_report_by
- FIXED: referred variables did not correctly resolve co_vars and
labels instead of variable names
- Several bug fixes:
- Addressed most parts of
https://gitlab.com/libreumg/dataquier/-/issues/242
- Addressed https://gitlab.com/libreumg/dataquier/-/issues/244 and
https://gitlab.com/libreumg/dataquier/-/issues/212
- Default for result-slot-filter was not set
(
filter_result_slots in dq_report2)
- Sometimes, long labels in the first columns of a
JS-table prevented controlling the table
- Fixed missed check for missing cross-item level metadata and earlier
check for valid item-level metadata
- Control crude segment missingness output, so that we see it only if
there is more than one segment on the item-level after the removal of
VARIABLE_ROLES filtered items
- Outliers should work with empty metadata in
UNIVARIATE_OUTLIER_CHECKTYPE and
MULTIVARIATE_OUTLIER_CHECKTYPE
- Fixed successive dates to ignore empty dates
- New functions in
REDCap syntax:
strictly_successive_dates and
successive_dates
- Bug fixes for
REDCap rules and NA handling
and DATA_PROCESS.
- Checked, that code is in line with
https://gitlab.com/libreumg/dataquier/-/issues/243#note_1419465360
- Default for contradictions with the new syntax is now that hard
limits and missing codes are not removed. The argument
use_value_labels is not supported anymore. You can specify
the behavior on the rules level in the new cross-item-level metadata
column DATA_PREPARATION
- Compute end digit preferences only if explicitly requested by a new
item-level metadata column
END_DIGIT_CHECK in
dq_report2, (DATA_ENTRY_TYPE is still
supported and auto-converted). If missing, END_DIGIT_CHECK
defaults to FALSE
- Bug fix: Contradiction rules failed in specific cases if
NA were in the data
- Bug fix: cross-item_level normalization crashed, causing rules to
fail, e.g.,
JUMP_LIST could be added to the item-level
metadata if missing, but causing this type of failing rules
- Bug fixes for
Windows and uncommon variable names
General
- Workbooks can now be loaded from the internet (using
prep_load_workbook_like_file and
meta_data_v2 = formal in dq_report2)
supporting http and https URLs (e.g.,
Excel or OpenOffice workbooks)
- Documentation updates
dataquieR 2.0.0
dq_report2 replaces dq_report. Please use
dq_report2 from now on.
- Full new reporting engine (needs
htmtools and supports
plotly)
- Better report layout and improved functionality
- Support for reading and referring to data in files/URLs
- Support for the integrity dimension in data quality report
- Included distribution and multivariate outlier (provide cross-item
level metadata for the latter) plots in data quality report
- Metadata scheme update (segment,
data.frame, and
cross-item levels). No required action by user, previous version still
supported
REDCap rules for contradictions (cross-item level
metadata), previous contradictions function still supported
- Support metadata describing segment data and study data tables
(segment and
data.frame-level metadata)
- New item-level metadata version (backwards compatible)
- Support for computation of qualified missingness based on labels
from the
AAPOR concept
acc_univariate_outlier and
acc_multivariate_outlier now allow selecting the methods
used to flag outliers
- Included distributional checks in the accuracy dimension for
location and proportion
- Rotation of plots can now be controlled
- Improved many figures
- Better control over warnings
- If
whoami is installed, reports now show a more
suitable user name
- Many minor improvements
- Updated citations
dataquieR 1.0.13
- fixed a left-over
~ from the ggplot2
updates causing acc_margins to fail for categorical
variables
dataquieR 1.0.12
- Addressed a problem with the markdown template underlying the
dq_report reports with wrong brackets
- Addressed deprecations from
ggplot2 3.4.0
- Added
ORCIDs for two authors
- Updated the
CITATION file
- Updated the
README.md file adding the funding
sources.
dataquieR 1.0.11
- Addressed a problem with some test platforms
- Added funding agencies in the manual
dataquieR 1.0.10
- Fixed
NEWS.md file
- Fixed documentation
dataquieR 1.0.9
- Fixed bug in
sigmagap and made missing guessing more
robust.
- Fixed checks on missing code detection failing for
logical.
- Fixed a damaged check for numeric threshold values in
acc_margins.
- Fixed wrongly named
GRADING columns.
- Improved parallel execution by automatic detection of cores.
- Tidy html dependency
dataquieR 1.0.8
- Removed formal arguments from
rbind.ReportSummaryTable
since these are not needed anyways and the inherited documentation for
those arguments rbind from base contains an
invalid URL triggering a NOTE.
dataquieR 1.0.7
- Fixed bugs in example metadata.
- Figures now have size hints as attributes.
- Added simple type conversion check indicator function of dimension
integrity,
int_datatype_matrix.
- Corrected some error classifications
prep_study2meta can now also convert factors to
dataquieR compatible
meta_data/study_data
- Slightly improved documentation.
- Bug fix in
com_item_missingness for textual response
variables.
- Added new output slot with heat-map like tables. Implemented some
generics for those.
dataquieR 1.0.6
- Robustness: Ensure
DT JS is always loaded when a
dq_report report is rendered
- Bug fix: More robust handling of DECIMALS variable attribute, if
this is delivered as a character.
- Bug Fix:
com_segment_missingness with
strata_vars / group_vars did not work
- Bug Fix: If
label_col was set to something else than
LABEL, strata_vars did not work for
com_unit_missingness
- More precise documentation.
- Fixed a bug in a utility function for the univariate outliers
indicator function, which caused many data points flagged as outliers by
the sigma- gap criterion.
- Made outlier function aware of too many non-outlier points causing
too complex graphics (e.g. pdf rendering crashes the PDF reader).
- Fixes and small improvements in
dq_report.
- Switched from
cowplot to patchwork in
acc_margins yielding figures that can be easier
manipulated. Please note, that this change could break existing output
manipulations, since the structure of the margins plots has changed
internally. However, output manipulations were hardly possible for
margins plots before, so it is unlikely, that there are pipelines
affected.
- More control about the output of the
acc_loess
function.
- More robust
prep_create_meta handling length-0
arguments by ignoring these variable attributes at all.
- Added a classification system for warnings and error messages to
distinguish errors based on mismatching variables for a function from
other error messages.
- JOSS
- Some tidy up and more tests.
dataquieR 1.0.5
- Fixed two bugs in
con_inadmissible_categorical (one
resp_var only and value-limits all the same for all
resp_vars)
- Changed LICENSE to BSD-2
- Slightly updated documentation
- Updated
README-File
dataquieR 1.0.4
- Fixed CITATION, a broken reference in Rd and a problem with the
vignette on
pandoc-less systems
- Improved an inaccurate argument description for multivariate
outliers
- Fixed a problem with error messages, if a
dataquieR
function was called by a generated function f that lives in
an environment directly inheriting from the empty environment, e.g.
environment(f) <- new.env(parent = emptyenv()).
- Marked some examples as
dontrun, because they sometimes
caused NOTEs on rhub.
dataquieR 1.0.3
- Addressed all comments by the CRAN reviewers, thank you.
dataquieR 1.0.2
- Bug Fix: If an empty data frame was delivered in the
SummaryTable entry of a result within a
dq_report output, the summary and also
print generic did not work on the report.
dataquieR 1.0.1
- Skipping some of the slower tests on CRAN now. On my local system, a
full
devtools::check(cran = TRUE, env_vars = c(NOT_CRAN = "false"))
takes 2:22 minutes now.
dataquieR 1.0.0
- Initial CRAN release candidate