| Title: | An Elegant Approach to Summarizing Clinical Data | 
| Version: | 0.1.0 | 
| Description: | Streamlines the analysis of clinical data by automatically selecting appropriate statistical descriptions and inference methods based on variable types. For method details see Motulsky H J (2016) https://www.graphpad.com/guides/prism/10/statistics/index.htm and d'Agostino R B (1971) <doi:10.1093/biomet/58.2.341>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| Imports: | car, cli, dplyr, fBasics, glue, qqplotr, rlang, stats, stringr, tibble, tidyplots, tidyr | 
| Suggests: | knitr, rmarkdown | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 4.1.0) | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-10 07:33:20 UTC; Lixiang | 
| Author: | Xiang Li [aut, cre] | 
| Maintainer: | Xiang Li <htqqdd@126.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-15 07:00:02 UTC | 
Add statistical test results to summary data
Description
Calculates and appends p-values with optional statistical details to a summary table based on variable types and group comparisons. Handles both continuous and categorical variables with appropriate statistical tests.
Usage
add_p(
  summary,
  digit = 3,
  asterisk = FALSE,
  add_method = FALSE,
  add_statistic_name = FALSE,
  add_statistic_value = FALSE
)
Arguments
| summary | A data frame that has been processed by add_summary(). | 
| digit | A numeric value giving the number of decimal places. Accepts: 
 | 
| asterisk | Logical indicating whether to show asterisk significance markers. | 
| add_method | Control parameter for display of statistical methods. Accepts: 
 | 
| add_statistic_name | Logical indicating whether to include test statistic names. | 
| add_statistic_value | Logical indicating whether to include test statistic values. | 
Value
A data frame merged with statistical test results, containing:
-  Variable names
-  Summary
-  Formatted p-values
-  Optional method names/codes
-  Optional statistic names/values
Examples
# `summary` is a data frame processed by `add_var()` and `add_summary()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
summary <- add_summary(data)
# Add statistical test results
result <- add_p(summary)
Add summary statistics to an add_var object
Description
This function generates summary statistics for variables from a data frame that has been processed by add_var(), with options to format outputs.
Usage
add_summary(
  data,
  add_overall = TRUE,
  continuous_format = NULL,
  norm_continuous_format = "{mean} ± {SD}",
  unnorm_continuous_format = "{median} ({Q1}, {Q3})",
  categorical_format = "{n} ({pct})",
  binary_show = "last",
  digit = 2
)
Arguments
| data | A data frame that has been processed by add_var(). | 
| add_overall | Logical indicating whether to include an "Overall" summary column.  | 
| continuous_format | Format string that overrides both the normal and non-normal continuous formats. Accepted placeholders are  | 
| norm_continuous_format | Format string for normally distributed continuous variables. Default is "{mean} ± {SD}". | 
| unnorm_continuous_format | Format string for non-normal continuous variables. Default is "{median} ({Q1}, {Q3})". | 
| categorical_format | Format string for categorical variables. Default is "{n} ({pct})". | 
| binary_show | Display option for binary variables: 
 | 
| digit | A numeric value giving the number of decimal places. | 
Value
A data frame containing summary statistics with the following columns:
-  variable: Variable name
-  Overall (n=X): Summary statistics for all data, if add_overall = TRUE
-  Group-specific columns named [group] (n=X), with summary statistics
Examples
# `data` is a data frame processed by `add_var()`:
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
# Add summary statistics
result <- add_summary(data, add_overall = TRUE)
result <- add_summary(data, continuous_format = "{mean}, ({SD})")
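The format strings above use placeholders in braces such as {mean}, {SD}, {median}, {Q1} and {Q3}. As a rough illustration of how such a template can be filled, here is a minimal base-R sketch. The helper `fill_format()` is hypothetical and not part of the package; the package's own substitution and rounding rules may differ.

```r
# Hypothetical helper (not part of the package): fill a format string
# with summary statistics computed from a numeric vector.
fill_format <- function(x, fmt, digit = 2) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  stats_list <- list(
    mean   = mean(x, na.rm = TRUE),
    SD     = stats::sd(x, na.rm = TRUE),
    median = stats::median(x, na.rm = TRUE),
    Q1     = q[1],
    Q3     = q[2]
  )
  out <- fmt
  for (nm in names(stats_list)) {
    # Round to a fixed number of decimals, then substitute the placeholder
    value <- formatC(stats_list[[nm]], digits = digit, format = "f")
    out <- gsub(paste0("{", nm, "}"), value, out, fixed = TRUE)
  }
  out
}

fill_format(iris$Sepal.Length, "{median} ({Q1}, {Q3})")
# "5.80 (5.10, 6.40)"
fill_format(iris$Sepal.Length, "{mean}, ({SD})")
# "5.84, (0.83)"
```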
Prepare variables for add_summary
Description
This function processes a dataset for statistical analysis by categorizing variables into continuous and categorical types. It automatically checks normality, equality of variances, and expected-frequency assumptions.
Usage
add_var(data, var = NULL, group = "group", norm = "auto", center = "median")
Arguments
| data | A data frame containing the variables to analyze, with variables in columns and observations in rows. | 
| var | A character vector of variable names to include. If  | 
| group | A character string specifying the grouping variable in data. | 
| norm | Control parameter for normality tests. Accepts: 
 | 
| center | A character string specifying the  | 
Value
A modified data frame with an attribute 'add_var' containing a list of categorized variables and their properties:
-  var: List of categorized variables:
   -  valid: All valid variable names after checks
   -  continuous: Sublist of continuous variables (further divided by normality/equal variance)
   -  categorical: Sublist of categorical variables (further divided by ordered/expected frequency)
-  group: Grouping variable name
-  overall_n: Total number of observations
-  group_n: Observation counts per group
-  group_nlevels: Number of groups
-  group_levels: Group level names
-  norm: Normality check method used
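To make the continuous/categorical split concrete, the sketch below labels columns by storage type using base R only. The helper `classify_vars()` and its rule (numeric columns are continuous, everything else is categorical) are assumptions for illustration; the package may apply additional checks.

```r
# Hypothetical helper (not part of the package): label each selected column
# as "continuous" or "categorical" from its storage type alone.
classify_vars <- function(data, var) {
  vapply(
    data[var],
    function(x) if (is.numeric(x)) "continuous" else "categorical",
    character(1)
  )
}

types <- classify_vars(iris, c("Sepal.Length", "Species"))
types
# Sepal.Length is labelled "continuous", Species "categorical"
```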
Examples
data <- add_var(iris, var = c("Sepal.Length", "Species"), group = "Species")
Test for Equality of Variances
Description
Performs Levene's test to assess equality of variances between groups.
Usage
equal_test(data, var, group, center = "median")
Arguments
| data | A data frame containing the variables to be tested. | 
| var | A character string specifying the numeric variable in data. | 
| group | A character string specifying the grouping variable in data. | 
| center | A character string specifying the  | 
Value
Logical value:
-  TRUE: Variances are considered equal (p-value greater than 0.05)
-  FALSE: Variances are unequal or an error occurred during testing
Methodology for Equality of Variances
Levene's test is the default method in SPSS. The original Levene's test uses center = mean; this function defaults to center = median, which gives a more robust test.
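For readers who want to see what a median-centered Levene test computes, here is a minimal base-R sketch: take each observation's absolute deviation from its group median, then run a one-way ANOVA on those deviations. The helper `levene_median()` is hypothetical and only illustrates the technique; it is not the package's implementation.

```r
# Hypothetical helper (not part of the package): median-centered Levene test.
levene_median <- function(y, g) {
  g <- factor(g)
  # Absolute deviation of each observation from its own group median
  dev <- abs(y - stats::ave(y, g, FUN = stats::median))
  # A one-way ANOVA on the deviations yields the Levene F test
  fit <- stats::anova(stats::lm(dev ~ g))
  fit[["Pr(>F)"]][1]
}

p <- levene_median(iris$Sepal.Length, iris$Species)
p > 0.05  # TRUE would mean the variances are treated as equal
```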
Examples
equal_test(iris, "Sepal.Length", "Species")
Format p-values with significance markers
Description
Formats p-values as strings with specified precision and optional significance asterisks.
Usage
format_p(p, digit = 3, asterisk = FALSE)
Arguments
| p | A numeric p-value between 0 and 1. | 
| digit | A numeric value giving the number of decimal places. Accepts: 
 | 
| asterisk | Logical indicating whether to return significance asterisks. | 
Value
A character string containing the formatted p-value or the significance asterisks.
Examples
format_p(0.00009, 4)
format_p(0.03, 3)
format_p(0.02, asterisk = TRUE)
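The sketch below shows one common convention for this kind of formatting, written in base R. Everything in it is an assumption for illustration: the helper name `sketch_format_p()`, the "<" notation for values below the displayed precision, and the asterisk cut-offs (0.05, 0.01, 0.001) are not taken from the package and may not match what format_p() returns.

```r
# Hypothetical helper illustrating one common convention; the thresholds and
# output strings are assumptions, not the documented behavior of format_p().
sketch_format_p <- function(p, digit = 3, asterisk = FALSE) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  if (asterisk) {
    if (p < 0.001) return("***")
    if (p < 0.01) return("**")
    if (p < 0.05) return("*")
    return("ns")
  }
  # Smallest value that can be displayed at the requested precision
  threshold <- 10^(-digit)
  if (p < threshold) {
    return(paste0("<", formatC(threshold, digits = digit, format = "f")))
  }
  formatC(p, digits = digit, format = "f")
}

sketch_format_p(0.00009, 4)             # "<0.0001"
sketch_format_p(0.03, 3)                # "0.030"
sketch_format_p(0.02, asterisk = TRUE)  # "*"
```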
Perform normality test on a variable
Description
Conducts normality tests for a specified variable, optionally by group. Supports automatic testing and interactive visualization.
Usage
normal_test(data = NULL, var = NULL, group = NULL, norm = "auto")
Arguments
| data | A data frame containing the variables to be tested. | 
| var | A character string specifying the numeric variable in data. | 
| group | A character string specifying the grouping variable in data. | 
| norm | Control parameter for test behavior. Accepts: 
 | 
Value
A logical value:
-  TRUE: data are normally distributed
-  FALSE: data are not normally distributed
Methodology for p-values
Automatically selects a test based on the sample size per group:
- n < 3: Too small; the data are assumed to be non-normal
- (3, 50]: Shapiro-Wilk test
- (50, 1000]: D'Agostino Chi2 test, used instead of the Kolmogorov-Smirnov test
- n > 1000: Shows p-values, draws QQ plots, and prompts for a decision
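The size-based selection listed above can be sketched in base R as follows. The helper `choose_normality_method()` is hypothetical and only returns a label for each size band. The listed ranges do not say what happens at exactly n = 3, so the sketch assumes Shapiro-Wilk there. Only the Shapiro-Wilk step is actually run, because it ships with base R.

```r
# Hypothetical helper (not part of the package): name the procedure that the
# size bands above would select for a group of n observations.
choose_normality_method <- function(n) {
  if (n < 3) {
    "none (assume non-normal)"
  } else if (n <= 50) {
    "Shapiro-Wilk"  # assumption: n = 3 is handled in this band
  } else if (n <= 1000) {
    "D'Agostino"
  } else {
    "manual review (p-values and QQ plots)"
  }
}

# Per-group sample sizes as a named integer vector
counts <- table(iris$Species)
group_n <- stats::setNames(as.integer(counts), names(counts))
methods <- vapply(group_n, choose_normality_method, character(1))
methods

# Shapiro-Wilk p-value per group, using base R
shapiro_p <- tapply(
  iris$Sepal.Length, iris$Species,
  function(x) stats::shapiro.test(x)$p.value
)
shapiro_p
```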
Examples
normal_test(iris, "Sepal.Length", "Species", norm = "auto")
normal_test(iris, "Sepal.Length", "Species", norm = TRUE)
Check Sample Size Adequacy for Chi-Squared Test
Description
This function determines if a contingency table meets the expected frequency assumptions for a valid chi-squared test. It categorizes the data into "not_small", "small", or "very_small" based on sample size and expected frequencies.
Usage
small_test(data, var, group)
Arguments
| data | A data frame containing the variables to be tested. | 
| var | A character string specifying the factor variable in data. | 
| group | A character string specifying the grouping variable in data. | 
Value
A character string with one of three values:
-  "not_small": Sample size of at least 40 and all expected frequencies of at least 5
-  "small": Sample size of at least 40, all expected frequencies of at least 1 and at least one below 5; applies only to 2 x 2 contingency tables
-  "very_small": All other conditions, including a sample size below 40 or any expected frequency below 1
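The three rules above can be sketched in base R by computing expected counts under independence (row total times column total, divided by the overall total). The helper `classify_sample_size()` is hypothetical and takes a contingency table directly rather than a data frame; it illustrates the rules as listed and is not the package's code.

```r
# Hypothetical helper (not part of the package): apply the three rules above
# to a two-way contingency table of counts.
classify_sample_size <- function(tab) {
  n <- sum(tab)
  # Expected counts under independence: row total * column total / n
  expected <- outer(rowSums(tab), colSums(tab)) / n
  is_2x2 <- all(dim(tab) == c(2, 2))
  if (n >= 40 && all(expected >= 5)) {
    "not_small"
  } else if (n >= 40 && all(expected >= 1) && is_2x2) {
    "small"
  } else {
    "very_small"
  }
}

tab_large  <- matrix(c(20, 15, 10, 25), nrow = 2)  # n = 70, all expected >= 5
tab_sparse <- matrix(c(30, 3, 10, 2), nrow = 2)    # n = 45, smallest expected about 1.33
tab_tiny   <- matrix(c(1, 1, 1, 1), nrow = 2)      # n = 4

classify_sample_size(tab_large)   # "not_small"
classify_sample_size(tab_sparse)  # "small"
classify_sample_size(tab_tiny)    # "very_small"
```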
Examples
df <- data.frame(
  category = factor(c("A", "B", "A", "B")),
  group    = factor(c("X", "X", "Y", "Y"))
)
small_test(data = df, var = "category", group = "group")