| Type: | Package | 
| Title: | Turn Clean Data into Messy Data | 
| Version: | 0.1.1 | 
| Description: | Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc. | 
| License: | MIT + file LICENSE | 
| Depends: | R (≥ 2.10) | 
| Imports: | assertthat, purrr, stringr | 
| Suggests: | charlatan, testthat (≥ 2.0.0), tibble, covr | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| URL: | https://github.com/mdlincoln/salty | 
| BugReports: | https://github.com/mdlincoln/salty/issues | 
| NeedsCompilation: | no | 
| Packaged: | 2024-08-31 04:04:06 UTC; mlincoln | 
| Author: | Matthew Lincoln | 
| Maintainer: | Matthew Lincoln <matthew.d.lincoln@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2024-08-31 04:20:02 UTC | 
salty: Turn Clean Data into Messy Data
Description
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
Author(s)
Maintainer: Matthew Lincoln <matthew.d.lincoln@gmail.com> (ORCID)
See Also
Useful links:
-  https://github.com/mdlincoln/salty
-  Report bugs at https://github.com/mdlincoln/salty/issues
Access the original source vector for a given shaker function
Description
Access the original source vector for a given shaker function
Usage
inspect_shaker(f)
Arguments
| f | A shaker function | 
Value
A character vector
Examples
inspect_shaker(shaker$punctuation)
Sample a proportion of indices of a vector
Description
Sample a proportion of indices of a vector
Usage
p_indices(x, p)
Arguments
| x | A vector | 
| p | A numeric probability between 0 and 1 | 
Value
An integer vector of indices.
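No example accompanies this topic, so here is a minimal base R sketch of the idea. `sample_indices_sketch` is a hypothetical name, and the use of `ceiling()` to decide how many indices to draw is an assumption made for illustration; it is not taken from the package source.

```r
# Hypothetical helper, not salty's own code: draw about p * length(x)
# distinct positions from x and return them as sorted integer indices.
sample_indices_sketch <- function(x, p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  n_pick <- ceiling(length(x) * p)  # assumption: round up
  sort(sample.int(length(x), size = n_pick))
}

set.seed(1)
idx <- sample_indices_sketch(letters, p = 0.25)
length(idx)  # ceiling(26 * 0.25) = 7 indices
```

The returned indices can then be used to subset the vector, for example `letters[idx]`, so that only the sampled values are modified.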
Salt vectors with common data problems
Description
These are easy-to-use wrapper functions that call either salt_insert (for including new characters) or salt_replace (for salting that requires replacement of specific characters) with sane defaults.
Usage
salt_punctuation(x, p = 0.2, n = 1)
salt_letters(x, p = 0.2, n = 1)
salt_whitespace(x, p = 0.2, n = 1)
salt_digits(x, p = 0.2, n = 1)
salt_ocr(x, p = 0.2, rep_p = 0.1)
salt_capitalization(x, p = 0.1, rep_p = 0.1)
salt_decimal_commas(x, p = 0.1, rep_p = 0.1)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
| n | A positive integer. Number of new characters to insert into each selected value. | 
| rep_p | A number between 0 and 1. Probability that a given match should be replaced in one of the selected values. | 
Details
For more fine-grained control over how characters are added, see the documentation for salt_insert, salt_substitute, salt_replace, and salt_delete.
Functions
-  salt_punctuation(): Punctuation characters
-  salt_letters(): Upper- and lower-case letters
-  salt_whitespace(): Spaces
-  salt_digits(): 0-9
-  salt_ocr(): Replace some substrings with common OCR problems
-  salt_capitalization(): Flip capitalization of letters
-  salt_decimal_commas(): Flip decimals to commas and vice versa
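A short sketch of how the wrappers listed above might be called, using only the signatures shown under Usage. The input vectors are made up for illustration. Because salting is random, the exact output differs from run to run, so none is shown.

```r
library(salty)

set.seed(42)  # salting draws random samples; a seed makes a run repeatable

# Made-up example data
prices <- c("1,234.50", "99.95", "12.00", "7,000.10")
people <- c("Ada Lovelace", "Alan Turing", "Grace Hopper")

# Insert two punctuation characters into about half of the values
with_punct <- salt_punctuation(prices, p = 0.5, n = 2)

# Swap decimal points and commas in every value, for every match
with_commas <- salt_decimal_commas(prices, p = 1, rep_p = 1)

# Flip the case of roughly half of the letters in every value
with_caps <- salt_capitalization(people, p = 1, rep_p = 0.5)
```

In each case the result is a character vector of the same length as the input, so the salted column can replace the clean one in a data frame directly.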
Delete some characters from some values
Description
Delete some characters from some values
Usage
salt_delete(x, p = 0.2, n = 1)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
| n | A positive integer. Number of characters to delete from each selected value. | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_delete(x, p = 0.5, n = 5)
salt_empty(x, p = 0.5)
salt_na(x, p = 0.5)
Insert new characters into some values in a vector
Description
Inserts a selection of characters into a percentage of values in the supplied vector.
Usage
salt_insert(x, insertions, p = 0.2, n = 1)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| insertions | A shaker function, or a character vector. | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
| n | A positive integer. Number of times to add new values from insertions to each selected value. | 
Value
A character vector the same length as x
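This topic has no example of its own, so the following sketch shows the two kinds of insertions argument described above: a shaker function and a plain character vector. The `ids` vector and the characters in the custom set are made up for illustration.

```r
library(salty)

set.seed(7)

# Made-up example data
ids <- c("AB-1001", "AB-1002", "AB-1003", "AB-1004")

# Insertions drawn from a built-in shaker
with_digits <- salt_insert(ids, shaker$digits, p = 0.5, n = 1)

# Insertions drawn from a user-supplied character vector
with_custom <- salt_insert(ids, insertions = c("#", "_", "~"), p = 0.5, n = 2)
```

Values that are not selected are returned unchanged, so comparing the result with `ids` shows which entries were salted.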
Remove entire values from a vector
Description
Remove entire values from a vector
Usage
salt_na(x, p = 0.2)
salt_empty(x, p = 0.2)
Arguments
| x | A vector | 
| p | A number between 0 and 1. Proportion of values to edit. | 
Value
A vector the same length as x
Replace certain patterns in some values of a vector
Description
Replaces matched patterns in some values of x. Pair salt_replace with the named vectors in replacement_shaker, or supply your own named vector of replacements. The convenience functions salt_ocr and salt_capitalization are light wrappers around salt_replace.
Usage
salt_replace(x, replacements, p = 0.1, rep_p = 0.5)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| replacements | A replacement_shaker function, or a named character vector of patterns and replacements. | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
| rep_p | A number between 0 and 1. Probability that a given match should be replaced in one of the selected values. | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_replace(x, replacement_shaker$capitalization, p = 0.5, rep_p = 0.2)
salt_ocr(x, p = 1, rep_p = 0.5)
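The replacements argument also accepts a user-supplied named character vector. The sketch below assumes that names are the patterns to match and values are their replacements, as in stringr's named-vector convention; that direction is an assumption and worth checking against the package source. The `leet` vector and the `words` input are made up for illustration.

```r
library(salty)

set.seed(3)

# Made-up example data
words <- c("salt and pepper", "clean data", "messy data")

# Assumption: names are patterns, values are replacements
leet <- c("a" = "4", "e" = "3")

# Salt every value, replacing each match with probability 0.5
salted_words <- salt_replace(words, replacements = leet, p = 1, rep_p = 0.5)
```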
Substitute certain characters in a vector
Description
Substitute certain characters in a vector
Usage
salt_substitute(x, substitutions, p = 0.2, n = 1)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| substitutions | Values to be substituted in | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
| n | A positive integer. Number of characters to substitute in each selected value. | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
salt_substitute(x, shaker$digits, p = 0.5, n = 5)
Randomly swap out entire values in a vector
Description
Because swaps can be provided by either a character vector or a function
that returns a character vector, salt_swap can be fruitfully used in
conjunction with the charlatan package to intersperse real data with
simulated data.
Usage
salt_swap(x, swaps, p = 0.2)
Arguments
| x | A vector. This will always be coerced to character during salting. | 
| swaps | Values to be swapped out | 
| p | A number between 0 and 1. Proportion of values in x to salt. | 
Value
A character vector the same length as x
Examples
x <- c("Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
       "Nunc finibus tortor a elit eleifend interdum.",
       "Maecenas aliquam augue sit amet ultricies placerat.")
new_values <- c("foo", "bar", "baz")
salt_swap(x, swaps = new_values, p = 0.5)
salty: Turn Clean Data Into Messy Data
Description
Insert, delete, replace, and substitute bits of your data with messy values.
Details
Convenient wrappers such as salt_punctuation are provided for quick access
to this package's functionality with simple defaults. For more fine-grained
control, use one of the underlying salt_ functions:
-  salt_insert will insert new characters into some of the values of x. All the original characters of the original values will be maintained.
-  salt_substitute will substitute some characters in some of the values of x in place of some of the original characters.
-  salt_replace will replace some characters in some of the values of x. Unlike salt_substitute, salt_replace does conditional replacement dependent on the original values of x, such as changing capitalization or simulating OCR errors based on certain character combinations.
-  salt_delete will remove some characters in the values of x.
-  salt_na and salt_empty will replace some values of x with NA or with empty strings.
-  salt_swap replaces entire values of x with new strings.
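The functions listed above can be chained to layer several kinds of damage onto one vector. The following is a sketch only: the `clean` vector is made up, and the order of the steps is a choice made for illustration rather than something the package prescribes.

```r
library(salty)

set.seed(2024)

# Made-up example data
clean <- c("Jane Doe", "John Smith", "Mary Major", "Richard Roe")

# Each step takes the previous result and returns a vector of the same length
messy <- salt_insert(clean, shaker$punctuation, p = 0.5)
messy <- salt_delete(messy, p = 0.5, n = 2)
messy <- salt_na(messy, p = 0.25)
```

Here salt_na runs last so that the earlier character-level steps only ever receive non-missing strings.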
Get a set of values to use in salt_ functions
Description
shaker contains various character sets to be added to your data using salt_insert and salt_substitute. replacement_shaker is for salt_replace, and contains pairlists that replace matched patterns in your data.
Usage
shaker
replacement_shaker
available_shakers()
Format
An object of class list of length 6.
An object of class list of length 3.
Value
A sampling function that will be called by salt_insert, salt_substitute, or salt_replace.
Examples
salt_insert(letters, shaker$punctuation)
available_shakers()