| Type: | Package | 
| Title: | Data Cleaning for Psychological Analyses | 
| Version: | 0.1.1 | 
| Description: | Useful for preparing and cleaning data. It includes functions to center data, reverse-code items, dummy-code and effect-code data, and more. | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Imports: | dplyr, tidyr, tibble, data.table, rlang (≥ 0.1.2) | 
| Suggests: | roxygen2, covr, misty, testthat (≥ 3.0.0) | 
| URL: | https://jasonmoy28.github.io/psycCleaning/ | 
| Config/testthat/edition: | 3 | 
| Depends: | R (≥ 2.10) | 
| LazyData: | true | 
| NeedsCompilation: | no | 
| Packaged: | 2023-11-04 20:21:58 UTC; Jasonmoy | 
| Author: | Jason Moy | 
| Maintainer: | Jason Moy <jason.moyhj@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-11-05 06:30:02 UTC | 
Pipe operator
Description
Pipe operator
Usage
lhs %>% rhs
Value
no return value
Center with respect to grand mean
Description
This function will compute grand-mean-centered scores.
Usage
center_grand_mean(data, cols, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with grand-mean-centered scores.
Examples
center_grand_mean(iris,where(is.numeric))
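As a rough illustration of what grand-mean centering means, the following base-R sketch subtracts each column's overall mean from its values. It is not the package's internal code; the data frame `scores` and the result name `centered` are made up for this example.

```r
# Base-R sketch of grand-mean centering (illustrative only, not psycCleaning internals).
scores <- data.frame(x = c(2, 4, 6), y = c(10, 20, 60))

# Subtract each column's overall (grand) mean from every value in that column.
centered <- as.data.frame(
  lapply(scores, function(col) col - mean(col, na.rm = TRUE))
)

centered$x  # -2 0 2
centered$y  # -20 -10 30
```

After centering, every column has a mean of zero while differences between rows are unchanged.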
Center with respect to group mean
Description
This function will compute group-mean-centered scores.
Usage
center_group_mean(data, cols, group, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| group | character. grouping variable | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with group-mean-centered scores.
Examples
center_group_mean(iris,where(is.numeric), group = Species)
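Group-mean centering can be sketched in base R with `ave()`, which returns each row's group mean. This is only an illustration of the idea under made-up data; the column name `score_group_centered` is hypothetical and is not the naming scheme the package uses.

```r
# Base-R sketch of group-mean centering (illustrative only).
df <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# ave() returns, for every row, the mean of 'score' within that row's group.
group_mean <- ave(df$score, df$group, FUN = mean)

# Each score becomes its deviation from its own group's mean.
df$score_group_centered <- df$score - group_mean

df$score_group_centered  # -1 1 -5 5
```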
Centering for multilevel analyses
Description
This function will group-mean center the scores at level 1 (L1) and create a mean score for each group at level 2 (L2).
Usage
center_mlm(data, cols, group, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| group | the grouping variable. Must be character. | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with L1 scores that are group-mean centered. 3. Columns with L2 aggregated means.
Examples
center_mlm(iris,dplyr::ends_with('Length'),group = 'Species')
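The split into an L1 (within-group) part and an L2 (between-group) part can be sketched in base R as follows. The data and the column names `score_l2_mean` and `score_l1_centered` are assumptions made for this example, not the names the package produces.

```r
# Base-R sketch of centering for multilevel models (illustrative only).
df <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# L2 predictor: the group's mean, repeated on every row of that group.
df$score_l2_mean <- ave(df$score, df$group, FUN = mean)

# L1 predictor: each score's deviation from its own group's mean.
df$score_l1_centered <- df$score - df$score_l2_mean

# The two parts add back up to the original score.
all.equal(df$score_l1_centered + df$score_l2_mean, df$score)  # TRUE
```

Keeping both columns lets a multilevel model estimate within-group and between-group effects separately.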
Composite column
Description
The function aggregates the selected columns row-wise and then divides the result by the total number of columns (i.e., it computes a row-wise mean).
Usage
composite_score(
  data,
  cols = dplyr::everything(),
  na.rm = FALSE,
  composite_col_name = "composited_column"
)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be composited. See 'dplyr::dplyr_tidy_select' for available options. | 
| na.rm | Ignore NA. The default is 'FALSE'. If set to 'FALSE', the composite score will be 'NA' if there is one or more 'NA' in any of the columns. | 
| composite_col_name | Name for the new composited column. Default is 'composited_column'. | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. A column with the composited scores.
Examples
test_df = data.frame(col1 = c(1,2,3,4),col2 = c(1,2,3,4), col3 = c(1,2,NA,4))
composite_df = composite_score(data = test_df)
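To see how missing values affect a row-wise composite, the sketch below uses base R's `rowMeans()` on the same `test_df`. It only illustrates the general `na.rm` convention; note that `rowMeans(..., na.rm = TRUE)` divides by the number of non-missing values, and whether `composite_score()` does the same is not confirmed here.

```r
# Base-R sketch of a row-wise composite (illustrative only).
test_df <- data.frame(
  col1 = c(1, 2, 3, 4),
  col2 = c(1, 2, 3, 4),
  col3 = c(1, 2, NA, 4)
)

# na.rm = FALSE: any NA in a row makes that row's composite NA.
strict <- unname(rowMeans(test_df, na.rm = FALSE))

# na.rm = TRUE: NAs are dropped and the mean uses the remaining values.
lenient <- unname(rowMeans(test_df, na.rm = TRUE))

strict   # 1 2 NA 4
lenient  # 1 2 3 4
```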
Dummy Coding
Description
Create dummy-coded columns, supporting tidyselect syntax to process multiple columns simultaneously.
Usage
dummy_coding(data, cols)
Arguments
| data | data.frame object | 
| cols | Columns that need to be dummy-coded. See 'dplyr::dplyr_tidy_select' for available options. | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns that are dummy-coded.
Examples
dummy_coding(iris,Species)
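For readers unfamiliar with dummy coding, the following base-R sketch builds 0/1 indicator columns with `model.matrix()`. It is an illustration of the coding scheme only: the factor `colour` is made up, and which level `dummy_coding()` treats as the reference category is not stated in this manual.

```r
# Base-R sketch of dummy (treatment) coding (illustrative only).
colour <- factor(c("red", "green", "blue", "green"))

# model.matrix() uses treatment contrasts by default: the first level
# ("blue", alphabetically) is the reference and gets no column of its own.
# Dropping column 1 removes the intercept.
dummies <- model.matrix(~ colour)[, -1, drop = FALSE]

colnames(dummies)          # "colourgreen" "colourred"
dummies[, "colourgreen"]   # 0 1 0 1
dummies[, "colourred"]     # 1 0 0 0
```

A factor with k levels yields k - 1 indicator columns; rows belonging to the reference level are 0 in all of them.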
Effect Coding
Description
Create effect-coded columns, supporting tidyselect syntax to process multiple columns simultaneously.
Usage
effect_coding(data, cols, factor = FALSE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be effect-coded. See 'dplyr::dplyr_tidy_select' for available options. | 
| factor | The default is 'FALSE'. If factor is set to 'TRUE', this function returns a tibble with effect-coded factors. If factor is set to 'FALSE', this function returns a tibble with effect-coded columns. | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns that are effect-coded.
Examples
effect_coding(iris,Species)
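Effect coding differs from dummy coding in that one level is coded -1 on every column, so coefficients are deviations from the grand mean. The sketch below uses base R's sum-to-zero contrasts (`contr.sum`) on a made-up factor; which level `effect_coding()` assigns the -1 codes to is an assumption not confirmed by this manual.

```r
# Base-R sketch of effect (sum-to-zero) coding (illustrative only).
colour <- factor(c("red", "green", "blue", "green"))

# contr.sum codes the last level ("red") as -1 on every column.
effects <- model.matrix(
  ~ colour,
  contrasts.arg = list(colour = "contr.sum")
)[, -1, drop = FALSE]

colnames(effects)      # "colour1" "colour2"
effects[, "colour1"]   # -1 0 1 0   (blue = 1, red = -1)
effects[, "colour2"]   # -1 1 0 1   (green = 1, red = -1)
```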
Listwise deletion
Description
Perform listwise deletion (an entire row is dropped if it has at least one 'NA' value in the selected columns).
Usage
listwise_deletion(data, cols = dplyr::everything())
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns to check for 'NA' values when applying listwise deletion. See 'dplyr::dplyr_tidy_select' for available options. | 
Value
An object of the same type as data, with rows removed if the row has at least one 'NA' value in the selected columns.
Examples
test_df = data.frame(col1 = c(1,2,3),col2 = c(1,NA,3),col3 = c(1,2,NA))
listwise_deletion(test_df,col1:col2) # you can see that the row with NA in col3 is not deleted
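The same column-restricted behaviour can be reproduced in base R with `complete.cases()`, which is a convenient way to check what listwise deletion should return. This is a sketch of the concept, not the package's implementation; `kept_subset` and `kept_all` are names chosen for this example.

```r
# Base-R sketch of listwise deletion restricted to some columns (illustrative only).
test_df <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(1, NA, 3),
  col3 = c(1, 2, NA)
)

# Only col1 and col2 are inspected, so the NA in col3 (row 3) is ignored.
kept_subset <- test_df[complete.cases(test_df[, c("col1", "col2")]), ]

# Inspecting every column drops rows 2 and 3.
kept_all <- test_df[complete.cases(test_df), ]

nrow(kept_subset)  # 2
nrow(kept_all)     # 1
```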
mlbook_data
Description
Classic dataset from Snijders, Tom A.B., and Bosker, Roel J. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, second edition.
Usage
mlbook_data
Format
A data frame with 3758 rows and 34 variables:
| schoolnr | School ID | 
| pupilNR_new | Student identifier (Level 1 units) | 
| langPOST | Student language score | 
| ses | Student socioeconomic score, grand-mean centered (in points, M = 0) | 
| IQ_verb | Student verbal IQ, grand-mean centered (in points, M = 0) | 
| sex | Student binary gender, 1 = female, 0 = not female | 
| Minority | Student minority status, 1 = minoritized, 0 = not minoritized | 
| denomina | School-level religious denomination, 5 categories | 
| female_dum | Dummy-coded sex | 
| female_eff | Effect-coded sex | 
| female_CMC | Group-mean-centered female_eff | 
| fempct_agg | Aggregated mean of female_dum for each school | 
| Zfempct_agg | Z-scored aggregated mean of female_dum for each school | 
| ses_CMC | Group-mean-centered SES | 
| Zses_CMC | Z-scored group-mean-centered SES | 
| ses_agg | Aggregated mean SES for each school | 
| Zses_agg | Z-scored aggregated mean SES for each school | 
Source
https://www.stats.ox.ac.uk/~snijders/mlbook.htm
Recode values of a data frame
Description
Recode values of a data frame
Usage
recode_item(data, cols, code_from = NULL, code_to = NULL, retain_code = NULL)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be recoded. See 'dplyr::dplyr_tidy_select' for available options. | 
| code_from | vector. the order must match with vector for 'code_to' | 
| code_to | vector. the order must match with vector for 'code_from' | 
| retain_code | vector. Specify the values to be retained | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data, except the recoded columns, will be preserved. 2. Recoded columns.
Examples
pre_recoded_df = tibble::tibble(x1 = 1:5, x2 = 5:1)
recoded_df = recode_item(pre_recoded_df, cols = dplyr::contains('x'),
                        code_from = 1:5,
                        code_to = 5:1)
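The pairing between 'code_from' and 'code_to' is positional: the i-th value of one maps to the i-th value of the other. A base-R sketch of that lookup using `match()` is shown below; it illustrates the mapping only and makes no claim about how `recode_item()` is implemented.

```r
# Base-R sketch of positional recoding (illustrative only).
code_from <- 1:5
code_to   <- 5:1   # reverse-codes a 5-point scale

x1 <- c(1, 2, 5, 3)

# match() finds each value's position in code_from;
# indexing code_to with that position gives the new code.
recoded <- code_to[match(x1, code_from)]

recoded  # 5 4 1 3
```

Values that do not appear in `code_from` come out as `NA` in this sketch, because `match()` returns `NA` for them.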
Count the number of missing values
Description
It counts the number of missing (i.e., 'NA') values in each column.
Usage
summarize_missing_values(
  data,
  cols = dplyr::everything(),
  group = NULL,
  verbose = TRUE,
  return_result = FALSE
)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be checked for missing values. See 'dplyr::dplyr_tidy_select' for available options. | 
| group | character. count missing values by group. | 
| verbose | default is 'TRUE'. Print the missing value data frame | 
| return_result | default is 'FALSE'. Set to 'TRUE' to return the data frame of missing-value counts | 
Value
An object of the same type as data that gives the number of NA values in each of the selected columns (returned only when 'return_result = TRUE').
Examples
df1 = data.frame(col1 = c(1,2,3),col2 = c(1,NA,3),col3 = c(1,2,NA))
summarize_missing_values(df1,everything())
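The counts reported for `df1` can be cross-checked in base R. This sketch simply sums the logical matrix returned by `is.na()`; `na_counts` is a name chosen for this example.

```r
# Base-R sketch of counting missing values per column (illustrative only).
df1 <- data.frame(
  col1 = c(1, 2, 3),
  col2 = c(1, NA, 3),
  col3 = c(1, 2, NA)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs in each column.
na_counts <- colSums(is.na(df1))

na_counts
# col1 col2 col3
#    0    1    1
```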
Tidy eval helpers
Description
-  sym() creates a symbol from a string and syms() creates a list of symbols from a character vector.
-  enquo() and enquos() delay the execution of one or several function arguments. enquo() returns a single quoted expression, which is like a blueprint for the delayed computation. enquos() returns a list of such quoted expressions.
-  expr() quotes a new expression locally. It is mostly useful to build new expressions around arguments captured with enquo() or enquos(): expr(mean(!!enquo(arg), na.rm = TRUE)).
-  as_name() transforms a quoted variable name into a string. Supplying something else than a quoted variable name is an error. That's unlike as_label(), which also returns a single string but supports any kind of R object as input, including quoted function calls and vectors. Its purpose is to summarise that object into a single label. That label is often suitable as a default name. If you don't know what a quoted expression contains (for instance, expressions captured with enquo() could be a variable name, a call to a function, or an unquoted constant), then use as_label(). If you know you have quoted a simple variable name, or would like to enforce this, use as_name().
To learn more about tidy eval and how to use these tools, visit https://www.tidyverse.org and the Metaprogramming section of Advanced R.
Value
no return value
Grand mean z-score
Description
This function will compute z-scores with respect to the grand mean.
Usage
z_scored_grand_mean(data, cols, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with scores that are z-scored.
Examples
z_scored_grand_mean(iris,where(is.numeric))
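A grand-mean z-score is the deviation from the overall mean divided by the standard deviation. The base-R sketch below assumes the sample standard deviation (denominator n - 1, as in `sd()` and `scale()`); whether the package uses the same denominator is not stated in this manual.

```r
# Base-R sketch of a grand-mean z-score (illustrative only).
x <- c(2, 4, 6)

# Center on the overall mean, then divide by the sample standard deviation.
z <- (x - mean(x)) / sd(x)

z  # -1 0 1

# scale() performs the same computation and returns a one-column matrix.
all.equal(as.numeric(scale(x)), z)  # TRUE
```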
Z-scored with respect to the group mean
Description
This function will compute group-mean-centered scores and then z-score the group-mean-centered scores with respect to the grand mean.
Usage
z_scored_group_mean(data, cols, group, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| group | the grouping variable. If you need to pass multiple group variables, try to use quos(). Passing multiple group variables is not tested. | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
Returns a data frame with the selected columns z-scored (replacing the existing columns).
Examples
z_scored_group_mean(iris, dplyr::ends_with("Petal.Width"), "Species")
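The two-step procedure described for this function (group-mean center first, then z-score across the whole sample) can be sketched in base R as follows. The data are made up, and the use of the sample standard deviation is an assumption.

```r
# Base-R sketch: group-mean center, then z-score over all rows (illustrative only).
df <- data.frame(
  group = c("a", "a", "b", "b"),
  score = c(1, 3, 10, 20)
)

# Step 1: deviation of each score from its own group's mean.
centered <- df$score - ave(df$score, df$group, FUN = mean)

# Step 2: z-score those deviations using the mean and SD of the whole sample.
z <- (centered - mean(centered)) / sd(centered)

centered  # -1 1 -5 5
```

Because group-mean-centered scores already average to zero overall, step 2 mainly rescales them to unit standard deviation.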
Z-scored for multilevel analyses
Description
This function will group-mean center the scores at level 1 (L1) and create an aggregated mean score for each group at level 2 (L2). After that, the group-mean-centered L1 scores and the mean L2 scores will be z-scored with respect to the grand mean. See 'center_mlm' for the version without z-scoring.
Usage
z_scored_mlm(data, cols, group, keep_original = TRUE)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Columns that need to be centered. See 'dplyr::dplyr_tidy_select' for available options. | 
| group | The grouping/cluster variable. | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with L1 scores that are group-mean centered and then grand-mean z-scored. 3. Columns with L2 aggregated means that are z-scored.
Examples
z_scored_mlm(iris,dplyr::ends_with('Length'),group = 'Species')
Z-scored for multilevel analyses
Description
This is a specialized function for mean-centering categorical variables. Use it instead of the generic 'center_mlm' in two cases: 1. When you need group-mean centering for non-dummy-coded variables at L1. Variables at L2 are always dummy-coded because they represent the percentage of subjects in that group. 2. Whenever you want to z-score the aggregated L2 means.
Usage
z_scored_mlm_categorical(
  data,
  cols,
  dummy_coded = NA,
  group,
  keep_original = TRUE
)
Arguments
| data | A data.frame or a data.frame extension (e.g. a tibble). | 
| cols | Dummy-coded or effect-coded columns for group-mean centering. Support 'dplyr::dplyr_tidy_select' options. | 
| dummy_coded | Dummy-coded variables (cannot be effect-coded) for L2 aggregated means. Support 'dplyr::dplyr_tidy_select' options. | 
| group | the grouping variable. Must be character | 
| keep_original | default is 'TRUE'. Set to 'FALSE' to remove original columns | 
Value
An object of the same type as data. The output has the following properties: 1. Columns from data will be preserved. 2. Columns with L1 scores that are group-mean centered. 3. Columns with L2 aggregated means (i.e., percentages) that are z-scored.
Examples
z_scored_mlm_categorical(mlbook_data,cols='female_eff',dummy_coded='female_dum','schoolnr')
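The reason the L2 variable must be dummy-coded is that the group mean of a 0/1 column is the proportion of 1s in that group. The base-R sketch below shows this on made-up data. The -1/1 effect coding, the column names, and z-scoring the proportions over rows (rather than over groups) are all assumptions of this example, not confirmed package behaviour.

```r
# Base-R sketch of L1/L2 handling for a binary variable (illustrative only).
df <- data.frame(
  school     = c("s1", "s1", "s1", "s2", "s2"),
  female_dum = c(1, 1, 0, 0, 0)                 # dummy code: 1 = female
)
df$female_eff <- ifelse(df$female_dum == 1, 1, -1)  # assumed effect code

# L1: group-mean center the effect-coded column within each school.
df$female_l1 <- df$female_eff - ave(df$female_eff, df$school, FUN = mean)

# L2: the school mean of the dummy code is the proportion of female students.
df$female_prop <- ave(df$female_dum, df$school, FUN = mean)

# Z-score the L2 proportions (here computed over rows).
df$female_prop_z <- (df$female_prop - mean(df$female_prop)) / sd(df$female_prop)

df$female_prop  # 2/3 for the three s1 rows, 0 for the two s2 rows
```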