---
title: "Getting Started with spell.replacer"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with spell.replacer}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(spell.replacer)
```

# Introduction

The `spell.replacer` package provides probabilistic spelling correction for character vectors in R. It uses the Jaro-Winkler string distance metric combined with word frequency data from the Corpus of Contemporary American English (COCA) to automatically correct misspelled words.

## Basic Usage

The main function is `spell_replace()`, which takes a character vector and returns it with corrected spellings:

```{r basic_example}
# Example text with misspellings
text <- c("This is a smple text with some mispelled words.",
          "We can corect them automaticaly.")

# Apply spell correction
corrected_text <- spell_replace(text)
print(corrected_text)
```

## How It Works

The package uses a two-step process:

1. **Identify misspelled words**: Uses the `hunspell` package to identify words not found in standard dictionaries
2. **Find corrections**: For each misspelled word, calculates Jaro-Winkler distance to words in the COCA frequency list and selects the best match

## Customizing Correction

You can adjust the correction behavior with several parameters:

```{r custom_example}
# More restrictive threshold (fewer corrections)
conservative <- spell_replace(text, threshold = 0.08)

# Ignore potential proper names
text_with_names <- "John went to Bostan yesterday."
corrected_names <- spell_replace(text_with_names, ignore_names = TRUE)
print(corrected_names)
```

## Single Word Correction

You can also correct individual words using the `correct()` function:

```{r single_word}
# Correct a single word
corrected_word <- correct("recieve", coca_list)
print(corrected_word)
```

## Working with Dataframes

One of the main benefits of `spell.replacer` is that it integrates seamlessly with tidyverse workflows. You can easily apply spell correction to entire columns of text data:

```{r dataframe_example, eval = FALSE}
library(dplyr)

# Example dataframe with text column
docs <- data.frame(
  id = 1:3,
  text = c("This docment has misspellings.",
           "Anothr exmple with erors.",
           "The finl text sampel.")
)

# Apply spell correction using tidy syntax
docs %>%
  mutate(text = spell_replace(text))
```

### Performance

The package processes approximately **1,000 words per second**, making it suitable for large-scale text processing tasks. For example:

- A 100,000 word corpus would take about 1.7 minutes
- A 1,000,000 word corpus would take about 16 minutes

This makes `spell.replacer` practical for preprocessing large text datasets before analysis.

## Word Frequency Data

The package includes the `coca_list` dataset with the 100,000 most frequent words from COCA:

```{r coca_data}
# Most frequent words
head(coca_list, 10)

# Check if a word is in the list
"hello" %in% coca_list

# Find the frequency rank of a word
which(coca_list == "hello")
```