--- title: "Get Started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Get Started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, include = FALSE} library(datamuseum) ``` `datamuseum` is an R package designed to improve specimen management and improve access to legacy databases. This extends from first steps, such as removing samples with unclear or missing labels, to aesthetic improvements intended for publishable tables and figures. `datamuseum` is categorized into four main parts, and each will be described briefly here. Please refer to our [Articles](https://btorgovitsky00.github.io/datamuseum/articles/) for more in-depth workflows and breakdowns. # Geographic Data `datamuseum` includes a host of functions designed for manipulating and refining geographic data. Although the package as a whole is intended for biological samples, functions in the `latlong_*` group are universally applicable. Management with `datamuseum` is done in part with [`rnaturalearth`](https://CRAN.R-project.org/package=rnaturalearth), which provides a standard reference for geography and works alongside other functions of the package to ensure mappable data is consistent and ready for display with other packages like [`ggplot2`](https://CRAN.R-project.org/package=ggplot2). # Taxonomic Data At its core, `datamuseum` is intended for use with biological specimen data sets, with a focus on taxonomy. `datamuseum` accomplishes the task of robust taxonomic review in its taxon function group through a dual-reference system which checks against the Global Biodiversity Information Facility's rGBIF package, and the Integrated Taxonomic Information System via taxize. Users have the option to use GBIF, ITIS, or both for their taxonomic inquiries. `datamuseum` is only as good as its sources, and user input is occasionally essential. Although its ability to double-check provides safety against misspellings or inconsistent updates, some cases may still arise where manual changes are still necessary. # Example Data In lieu of its limitations, `datamuseum` includes practice data in the form of accessioned octopus specimen data sets refined to Japan. Superfamily Octopodoidea was selected and refined to Japan due to the long history of octopus fisheries and cuisine in the region, as well as recent regional taxonomic updates. More specifically, *Octopus vulgaris* (Cuvier, 1797) in East Asia was recently re-described as the also valid *Octopus sinensis* (d'Orbigny, 1834) by [Gleadall et al. (2016)](https://www.jstage.jst.go.jp/article/specdiv/21/1/21_31/_article). `datamuseum` would struggle to reflect a regional change like this since both species are still valid and accepted by GBIF and ITIS. As a result, we recommend adding a simple command like the one below to your `datamuseum` workflow for case-specific changes. This code, and its broader workflow with associated `taxon_*` and `latlong_*` functions, can also be found at [Octopodoidea in Japan](https://btorgovitsky00.github.io/datamuseum/articles/octopodoidea_japan.html). ```{r Sinensis -> Vulgaris, eval = FALSE} museum_clean <- museum_clean %>% mutate(SciName = case_when( SciName == "Octopus vulgaris" ~ "Octopus sinensis", TRUE ~ SciName )) ``` # Utilities Going beyond manipulating just geographic and taxonomic data, `datamuseum` also provides management options for accessioned data sets. This includes options for removing repeat data (deduplicating) when multiple sources are accessioned, expanding your data set when one specimen lot contains multiple individuals, and preparing your taxonomic data for presentation on a ggplot2 graph!