This package can be used to create a highlighted source document based on the frequency of phrases found in one or more note sheets. The goal of this method is to indicate the portions of the source document that individuals felt were most worth copying into notes, based on phrase frequency. The inputs necessary for this procedure are a notes document and a source document. The output is HTML code for generating the highlighted text.
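To make the idea of phrase frequency concrete, the following is a minimal base R sketch. It is not the package's implementation: the `source_text` and `notes` values, the helper names `tokenize` and `ngrams`, the phrase length `n`, and the choice to average phrase counts per word are all assumptions made for illustration.

```r
# Hypothetical source text and notes (not data shipped with the package)
source_text <- "the witness saw the red car leave the scene"
notes <- c("witness saw the red car",
           "red car leave the scene",
           "saw the red car")

# Split lower-cased text into word tokens
tokenize <- function(x) strsplit(tolower(x), "[^a-z0-9']+")[[1]]

# Build every n-word phrase from a token vector
ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

n <- 2
source_tokens <- tokenize(source_text)
source_phrases <- ngrams(source_tokens, n)

# Count how often each source phrase appears across all notes
note_phrases <- unlist(lapply(notes, function(x) ngrams(tokenize(x), n)))
phrase_counts <- vapply(source_phrases,
                        function(p) sum(note_phrases == p),
                        numeric(1))

# Credit each word with the mean count of the phrases that contain it
word_freq <- vapply(seq_along(source_tokens), function(i) {
  idx <- intersect(seq_along(source_phrases), (i - n + 1):i)
  mean(phrase_counts[idx])
}, numeric(1))

data.frame(word = source_tokens, frequency = word_freq)
```

Words that sit inside phrases copied into many notes receive higher values, which is the quantity a highlighting step could then map to color intensity.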
This work was funded (or partially funded) by the Center for Statistics and Applications in Forensic Evidence (CSAFE) through Cooperative Agreements 70NANB15H176 and 70NANB20H019 between NIST and Iowa State University, which includes activities carried out at Carnegie Mellon University, Duke University, University of California Irvine, University of Virginia, West Virginia University, University of Pennsylvania, Swarthmore College and University of Nebraska, Lincoln.
You can install from CRAN with:
install.packages("highlightr")
You can install the development version of highlightr from GitHub with:
# install.packages("devtools")
devtools::install_github("rachelesrogers/highlightr")
# load library
library(highlightr)
# rename desired column of derivative documents to 'page_notes'
comment_example_rename <- dplyr::rename(comment_example, page_notes=Notes)
# tokenize derivative documents
toks_comment <- token_comments(comment_example_rename)
# rename desired column of source document to 'text'
transcript_example_rename <- dplyr::rename(transcript_example, text=Text)
# tokenize source document
toks_transcript <- token_transcript(transcript_example_rename)
# use fuzzy matching in collocation
collocation_object <- collocate_comments_fuzzy(toks_transcript, toks_comment)
#> Warning in join_func(a = a, b = b, by_a = by_a, by_b = by_b, block_by_a = block_by_a, : A pair of records at the threshold (0.7) have only a 95% chance of being compared.
#> Please consider changing `n_bands` and `band_width`.
# connect collocation frequencies to source document
merged_frequency <- transcript_frequency(transcript_example_rename, collocation_object)
# create `ggplot` object of the transcript
freq_plot <- collocation_plot(merged_frequency)
# add html tags to source document
page_highlight <- highlighted_text(freq_plot)
`page_highlight` will produce HTML output that can then be rendered into highlighted text. This can be done in R Markdown by specifying the object outside of a code chunk as `r page_highlight`, and knitting the document to HTML.
Alternatively, the `xml2` package can be used to save the output as an HTML file, as shown in the following code:
# load `xml2` library
library(xml2)
# save html output to desired location
xml2::write_html(xml2::read_html(page_highlight), "filename.html")
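The save step can also be tried without running the full pipeline. The sketch below assumes a short, hand-written HTML string (`html_string`, hypothetical) in place of `page_highlight`, writes it to a temporary file, and reads it back to confirm the highlighted span survived. The span markup here is illustrative only and may differ from what `highlighted_text()` actually emits.

```r
library(xml2)

# Hypothetical stand-in for `page_highlight`: a short HTML string
html_string <- paste0(
  "<p><span style=\"background-color: #ffff00;\">highlighted</span>",
  " and plain text</p>"
)

# Parse the string, then write the document to a temporary file
out_file <- tempfile(fileext = ".html")
doc <- read_html(html_string)
write_html(doc, out_file)

# Read the file back and extract the text of the highlighted span
saved <- read_html(out_file)
xml_text(xml_find_first(saved, "//span"))
```

Writing to `tempfile()` keeps the example from leaving files in the working directory; replace `out_file` with a real path to keep the result.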
The image below is generated from the resulting HTML output (as seen in `vignette("highlightr")`).