---
title: Datasets in `"sdam"` package
date: "August 2022"
author:
- name: Antonio Rivero Ostoic
  affiliation: Aarhus University
  email: jaro@cas.au.dk
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Datasets in `"sdam"` package}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, echo=FALSE, message=FALSE}
knitr::opts_chunk$set(echo=TRUE,error=TRUE)
knitr::opts_chunk$set(comment = "")
library("sdam")
```

```{r set-options, echo=FALSE, cache=FALSE}
options(width = 96)
```

## Preliminaries

Install and load one version of the `"sdam"` package.

```{r, echo=TRUE, eval=FALSE}
install.packages("sdam") # from CRAN
devtools::install_github("sdam-au/sdam") # development version
devtools::install_github("mplex/cedhar", subdir="pkg/sdam") # a legacy version R 3.6.x
```

```{r}
# load and check version
library(sdam)
packageVersion("sdam")
```
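
Since the three distributions differ in the datasets they ship, it can help to confirm what is installed before loading it. The following is a minimal sketch with base R only; the helper name `check_pkg()` is hypothetical and not part of `"sdam"`, and the base package `"stats"` stands in for `"sdam"` so that the example runs anywhere.

```{r, echo=TRUE, eval=FALSE}
# hypothetical helper: report whether a package is installed and its version
check_pkg <- function(pkg) {
  installed <- requireNamespace(pkg, quietly = TRUE)
  version <- if (installed) as.character(utils::packageVersion(pkg)) else NA_character_
  list(package = pkg, installed = installed, version = version)
}

# "stats" is a stand-in; replace it with "sdam"
check_pkg("stats")

# the legacy distribution is stated above to target R 3.6.x
getRversion() >= "3.6.0"
```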

## Built-in datasets

Package `"sdam"` comes with a suite of datasets and external data for executing the functions available in the package and for performing analyses. For a list of the built-in datasets in `"sdam"`, use the function `data()` from `"utils"`, or `utils::data()`, with the `'package'` argument. The CRAN distribution has four built-in datasets, while the development and legacy distributions add three more.

```{r, echo=TRUE, eval=FALSE}
# pop-up a new window
data(package="sdam")
```

```{r, echo=FALSE, eval=TRUE}
print(data(package="sdam"))
```

```{r}
# Data sets in package 'sdam':
#
# rp           Roman province names and acronyms as in EDH
# rpcp         Roman provinces chronological periods
# rpd          Roman provinces dates from EDH
# rpmcd        Caption maps and affiliation dates of Roman provinces
```

```{r}
# Additional built-in datasets in 'sdam':
#
# EDH          Epigraphic Database Heidelberg Dataset
# rpmp         Maps of ancient Roman provinces and Italian regions
# retn         Roman Empire transport network and Mediterranean sea
```
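
The listing returned by `data()` can also be handled as a regular object, for instance to test whether the additional datasets are present in the installed distribution. This is a sketch only: `dataset_index()` is a hypothetical helper, and the base package `"datasets"` is used so that the example runs without `"sdam"`.

```{r, echo=TRUE, eval=FALSE}
# hypothetical helper: names and titles of the datasets shipped with a package
dataset_index <- function(pkg) {
  res <- utils::data(package = pkg)$results
  data.frame(item = res[, "Item"], title = res[, "Title"],
             stringsAsFactors = FALSE)
}

# "datasets" is a stand-in that is always available
head(dataset_index("datasets"))

# with "sdam" installed, check for the additional datasets
# c("EDH", "rpmp", "retn") %in% dataset_index("sdam")$item
```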

A description of each dataset is available in the manual, which is accessible from the R console. For example, the help page of the `EDH` dataset in a non-CRAN distribution is opened as follows.

```{r, echo=TRUE, eval=FALSE}
# Epigraphic Database Heidelberg Dataset help
?EDH
```

### Ancient Mediterranean built-in datasets

The `EDH` dataset in `"sdam"` has information about Latin epigraphy from the Roman world during antiquity, retrieved from the Epigraphic Database Heidelberg API repository. A list of Roman provinces and regions in this dataset is available in dataset `"rp"`. Use function `data()` again to load this built-in dataset, and look at its internal structure with the `utils::str()` function.

* Dataset `"rp"` is a named list with Roman provinces and regions with acronyms according to the Epigraphic Database Heidelberg.

```{r, echo=TRUE, eval=TRUE}
# load dataset
data("rp")

# obtain object structure
str(rp)
```
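
A named list like this can be queried in both directions, from acronym to name and from name to acronym. The sketch below uses a toy stand-in, `rp_toy`, with an assumed layout (acronyms as names, full names as values) and only three illustrative entries; `acronym_of()` is a hypothetical helper, not a function in `"sdam"`.

```{r, echo=TRUE, eval=FALSE}
# toy stand-in with an assumed layout: acronyms as names, full names as values
rp_toy <- list(Ach = "Achaia", Aeg = "Aegyptus", Arm = "Armenia")

# hypothetical helper: acronym for a given province name, NA if not found
acronym_of <- function(name, provinces) {
  names(provinces)[match(name, unlist(provinces))]
}

# from name to acronym
acronym_of("Armenia", rp_toy)

# from acronym to name
rp_toy[["Arm"]]
```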

#### `edhw()` interface with `"rp"` dataset

* Function `edhw()` is a wrapper to extract and transform the records in the `EDH` dataset, and it invokes the `"rp"` dataset to retrieve the records from a specific Roman province or region in `EDH`.

```{r}
# Armenian records in 'EDH'
edhw(province="Arm")[1]
```

There are two `Warning` messages from `edhw()`. The first appears because there is no explicit input in `x`, and the function then assumes that the input data is the `EDH` dataset. The second just tells that the type of object returned is always a list when argument `province` is used alone.
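
When the warnings are of interest in themselves, they can be collected rather than discarded. The following base R sketch uses a hypothetical stand-in function, `noisy()`, that warns twice and returns a list; the wording of its messages is made up and does not reproduce the messages from `edhw()`.

```{r, echo=TRUE, eval=FALSE}
# hypothetical stand-in that warns twice and returns a list
noisy <- function() {
  warning("first made-up message")
  warning("second made-up message")
  list(record = 1)
}

# collect the warning messages while keeping the returned value
msgs <- character(0)
res <- withCallingHandlers(
  noisy(),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

msgs
res
```

With `"sdam"` loaded, the call to `noisy()` would be replaced by the `edhw()` call. Function `suppressWarnings()`, as used in hidden chunks of this vignette, simply discards the messages.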

#### `EDH` in data frames

All records in the `EDH` dataset have a list format, and it is possible to transform this information into a data frame with the wrapper function `edhw()`. For instance, the first record from `Arm` is selected by its record `'id'` number and displayed as a data frame with argument `'as'`.

```{r, echo=TRUE, eval=FALSE}
# record HD015521
edhw(id="15521", as="df")
```

However, it is easier to visualise on the screen only the variables related to people.

```{r, echo=TRUE, eval=FALSE}
# record HD015521 with explicit variables
edhw(id="15521", vars="people", as="df")
```

```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars="people", as="df"))
```

```{r, echo=TRUE, eval=FALSE}
# record HD015521 with more explicit variables
edhw(id="15521", vars=c("people", "province_label"), as="df")
```

```{r, echo=FALSE, eval=TRUE}
suppressWarnings(edhw(id="15521", vars=c("people", "province_label"), as="df"))
```
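
To give an idea of what a transformation from list to data frame involves, here is a minimal base R sketch. It is not how `edhw()` is implemented: the record `rec` is a toy with an assumed layout and placeholder values, and `people_to_df()` is a hypothetical helper that produces one row per person and fills absent attributes with `NA`.

```{r, echo=TRUE, eval=FALSE}
# toy record with an assumed layout; all values are placeholders
rec <- list(
  id = "HD000000",
  province_label = "Armenia",
  people = list(
    list(person_id = "1", gender = "male", name = "A"),
    list(person_id = "2", gender = "male")
  )
)

# hypothetical helper: one row per person, NA for attributes that are absent
people_to_df <- function(record) {
  fields <- unique(unlist(lapply(record$people, names)))
  rows <- lapply(record$people, function(p) {
    vals <- vapply(fields, function(f) {
      if (is.null(p[[f]])) NA_character_ else as.character(p[[f]])
    }, character(1))
    as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  })
  data.frame(id = record$id, do.call(rbind, rows), stringsAsFactors = FALSE)
}

people_to_df(rec)
```

The record identifier is repeated on every row, which is why a record with several persons spans several rows in the output.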

### Obtaining all `people` variables

Start by looking at the `people` variables in the `EDH` dataset for the Roman province of **Armenia**.

#### Armenia

```{r, echo=FALSE, eval=TRUE, out.width="25%", fig.align="center", fig.cap="Roman province of Armenia (ca 117 AD)."}
plot.map("Arm", cap=TRUE, name=FALSE)
```

Transforming the entire province from the `EDH` dataset requires first extracting a list with the province content. Function `edhw()` serves to obtain the available inscriptions per province from `EDH`, and then all data attributes from the `people` variable. The default outputs are a list for the first call of the function and a data frame for the second.

```{r, echo=TRUE, eval=FALSE}
# people in Armenia
edhw(province="Arm") |>
  edhw(vars="people")
```

```{r, echo=FALSE, eval=TRUE}
edhw(province="Arm") |>
  suppressWarnings() |>
  edhw(vars="people") |>
  suppressWarnings()
```

The people attribute variables in the inscriptions for `Armenia` are `age: years`, `cognomen`, `gender`, `name`, `nomen`, `person_id`, `praenomen`, and `status`, but no inscription has `tribus` or `origo`, which do occur in other provinces. For `Armenia`, two inscriptions have people variables and all people recorded are `male`. Record `HD015524` spans two rows because it has two persons, one of whom has an illegible `nomen`, `cognomen`, and `name`.
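
Observations like these can be checked directly on the resulting data frame. The sketch below works on a toy data frame, `people`, with an assumed layout and placeholder values, not on the actual output of `edhw()`.

```{r, echo=TRUE, eval=FALSE}
# toy data frame with an assumed layout; all values are placeholders
people <- data.frame(
  id = c("HD000001", "HD000002", "HD000002"),
  gender = c("male", "male", "male"),
  cognomen = c("A", "B", NA),
  stringsAsFactors = FALSE
)

# attribute variables that do not occur in this subset
setdiff(c("tribus", "origo"), names(people))

# number of persons per inscription and by gender
table(people$id)
table(people$gender)
```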

### Datasets for cartographical maps

The plotting of the Roman province in the previous section requires other datasets apart from `"rp"`. In `"sdam"`, three other datasets are invoked for plotting cartographical maps related to the Roman Empire and the Mediterranean basin, which are `"rpmp"`, `"rpmcd"`, and `"retn"`. Function `plot.map()` calls dataset `"rpmp"` for the shapes and colours in the plotting of the cartographical maps of different regions of the Roman Empire. The captions and province dates used by this function are in dataset `"rpmcd"`.

* Dataset `"retn"` bears the shapes of places and routes of an ancient transportation system in the Mediterranean region and political divisions of the Roman Empire. It also has its contours and parts of the European continent.

```{r}
# land contour around Mediterranean
plot.map(type="plain")
```

```{r, echo=TRUE, eval=FALSE}
# display settlements and shipping routes
plot.map(type="plain", settl=TRUE, shipr=TRUE)
```

Vignette [Cartographical maps and networks](../doc/Maps.html) has more about transportation networks in the ancient Mediterranean.

### Datasets with dates

There are also built-in datasets in `"sdam"` related to dates, which are either displayed in a cartographical map or used for other computations.

* Dataset `"rpd"` has dates for provinces from the `EDH` dataset. It serves for performing a restricted imputation on data subsets in `EDH` or in another dataset.

```{r, echo=TRUE, eval=TRUE}
# dates from EDH
data("rpd")

# three provinces in object structure
str(rpd[1:3])
```

From this set of three Roman provinces in `EDH`, the longest timespan is for `Aem`, and on average `Ach` has the oldest inscriptions, while `Aeg` has inscriptions with the newest dates.

* Dataset `"rpcp"` has chronological periods for regions with early and later Roman influence per province.

```{r, echo=TRUE, eval=TRUE}
# periods for Roman provinces
data("rpcp")

# object structure
str(rpcp)
```

The periods of early and later Roman influence in the 45 ancient provinces and regions are timespans with a *terminus ante quem* and a *terminus post quem*.

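
Comparisons of timespans such as the one above can be computed rather than read off the printed structure. The sketch below uses a toy list, `dates_toy`, with an assumed layout and made-up values for three placeholder provinces; it does not reproduce the structure or the content of `"rpd"`.

```{r, echo=TRUE, eval=FALSE}
# toy stand-in with an assumed layout and made-up values; not the 'rpd' structure
dates_toy <- list(
  P1 = c(min = -100, max = 300),
  P2 = c(min = 0, max = 200),
  P3 = c(min = 50, max = 600)
)

# timespan per province as the difference between latest and earliest date
span <- vapply(dates_toy, function(d) d[["max"]] - d[["min"]], numeric(1))
span

# province with the longest timespan
names(which.max(span))
```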
Vignette [Dates and missing dating data](../doc/Dates.html) has the visualisation of these and other dates.

### External data

Apart from the built-in datasets, the semicolon-separated file `StraussShipwrecks.csv` with the Shipwrecks dataset is attached as external data for performing analyses.

Reference and documentation in Strauss, J. (2013). *Shipwrecks Database*. Version 1.0. Accessed (07-12-2021) from oxrep.classics.ox.ac.uk/databases/shipwrecks_database/

Built from Parker, A.J. *Ancient Shipwrecks of the Mediterranean and the Roman Provinces* (Oxford: BAR International Series 580, 1992)

Details about the access to the database are in:

- [Shipwrecks network in the Mediterranean Basin (23-June-2022)](https://htmlpreview.github.io/?https://github.com/sdam-au/R_code/blob/master/HTML/Shipwrecks%20Network%20in%20the%20Mediterranean%20Basin.html)

- Vignettes [Dates and missing dating data](../doc/Dates.html) and [Cartographical maps and networks](../doc/Maps.html) also use the Shipwrecks dataset.
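
A semicolon-separated file such as the one attached can be read with `read.csv()` and an explicit separator. The sketch below writes and reads a toy file with placeholder columns and values, so it runs without `"sdam"`. The commented lines show how the attached file might be located; the `"extdata"` folder there is an assumption and should be adjusted to the actual location of the file.

```{r, echo=TRUE, eval=FALSE}
# toy semicolon-separated file with placeholder columns and values
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;depth", "wreck A;12.5", "wreck B;40"), tmp)

# read.csv() with sep=";" keeps the dot as decimal mark
wrecks <- utils::read.csv(tmp, sep = ";", stringsAsFactors = FALSE)
str(wrecks)

# with "sdam" installed; the "extdata" folder is an assumption
# f <- system.file("extdata", "StraussShipwrecks.csv", package = "sdam")
# wrecks <- utils::read.csv(f, sep = ";")
```

Note that `read.csv2()` also assumes a semicolon separator, but it expects a comma as decimal mark, which is why the explicit `sep` form is used here.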

### See also

#### Vignettes

* [Re-encoding `people` in the `EDH` dataset](../doc/Encoding.html)

* [Dates and missing dating data](../doc/Dates.html)

* [Cartographical maps and networks](../doc/Maps.html)

#### Manuals

* [sdam: Digital Tools for the SDAM Project at Aarhus University](../html/sdam-package.html)

* [`"sdam"` manual](https://github.com/mplex/cedhar/blob/master/typesetting/reports/sdam.pdf)

#### Project

* [Release candidate version](https://github.com/sdam-au/sdam)

* [Code snippets using `"sdam"`](https://github.com/sdam-au/R_code)

* [Social Dynamics and complexity in the Ancient Mediterranean project](https://sdam-au.github.io/sdam-au/)