---
title: "Data Types"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Data Types}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## What are Data Types (and Why Should You Care?)
A data type describes how information is stored digitally, that is, in which binary format each value is encoded.
This is relevant for BLOSC compression, because BLOSC is used to compress arrays of structured data,
and how this data is structured is described by its data type.

It is also relevant when using BLOSC in `R`, because `R` (by design) provides access
to a limited number of data types, most importantly: `raw()`, `logical()`, `integer()`, `numeric()` and `complex()`.
You will therefore often need to convert the data type of stored data to something that
can be handled in `R` (or vice versa). For your convenience, the functions `r_to_dtype()`
and `dtype_to_r()` handle such conversions. Note that these functions are not exhaustive;
they are meant to handle the most common conversions. Below you will find a table of typical
storage formats and how these are converted to `R` types.

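
As a quick illustration of these types, the chunk below uses base `R` only (it does not call any blosc function) to show the internal storage type behind each constructor, and how many bytes `writeBin()` uses per element. Note that `numeric()` is stored as `double`.

```{r}
# One example value for each of the atomic vector types listed above
vals <- list(raw = as.raw(1), logical = TRUE, integer = 1L,
             numeric = 1, complex = 1i)
# Internal storage type reported by R for each value
vapply(vals, typeof, character(1))
# Number of bytes used per element when the value is written in binary form
vapply(vals, function(x) length(writeBin(x, raw())), integer(1))
```
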
## Specification Version
The package at hand uses
[version 2](https://zarr-specs.readthedocs.io/en/latest/v2/v2.0.html) of the Zarr data type specifications, even though these have been superseded by
[version 3](https://zarr-specs.readthedocs.io/en/latest/v3/data-types/index.html).
Why is this?
Version 2 is used because it still includes the endianness in its data type codes and is
more compact. Combined with the endianness, a version 3 data type can easily be translated
into a version 2 data type code. Note that you have to do this conversion yourself; it is not
implemented by this package.

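
Since this translation is left to you, here is a minimal sketch of how it could look. The helper `v3_to_v2_dtype()` is hypothetical and not part of blosc. It assumes the Zarr version 3 core data type names (such as `int16` or `float64`) and an endianness supplied separately as `"little"` or `"big"` (for instance, taken from the configuration of the version 3 `bytes` codec). Extension types such as raw bits or date times are not covered.

```{r}
# Hypothetical helper (not part of blosc): translate a Zarr v3 core data
# type name and an endianness into a Zarr v2 dtype code
v3_to_v2_dtype <- function(data_type, endian = c("little", "big")) {
  endian <- match.arg(endian)
  # kind character and element size (in bytes) per v3 core data type
  lookup <- list(
    bool = c("b", "1"),
    int8 = c("i", "1"), int16 = c("i", "2"),
    int32 = c("i", "4"), int64 = c("i", "8"),
    uint8 = c("u", "1"), uint16 = c("u", "2"),
    uint32 = c("u", "4"), uint64 = c("u", "8"),
    float16 = c("f", "2"), float32 = c("f", "4"), float64 = c("f", "8"),
    complex64 = c("c", "8"), complex128 = c("c", "16")
  )
  spec <- lookup[[data_type]]
  if (is.null(spec)) stop("Unsupported data type: ", data_type)
  # single byte types have no byte order ("|"), others get "<" or ">"
  byte_order <- if (spec[2] == "1") "|" else if (endian == "little") "<" else ">"
  paste0(byte_order, spec[1], spec[2])
}
v3_to_v2_dtype("float64")
v3_to_v2_dtype("int16", endian = "big")
v3_to_v2_dtype("bool")
```
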
## Overview of Data Types
Data types are represented by a code. The first character reflects the byte order of
the data (see the Wikipedia article on [Endianness](https://en.wikipedia.org/wiki/Endianness)):
`<` for little endian, `>` for big endian and `|` where byte order does not apply (as for
single-byte types). The second character reflects the main type of the data (such as integer
or floating point). The remaining digits indicate the size (in bytes) of each element.
For data types `M` (date time) and `m` (delta time), the code also includes the unit of time
used to store the information, in square brackets (for example `<M8[ms]` for milliseconds).

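
To make the structure of these codes concrete, here is a minimal sketch in base `R` that splits a code into its parts. The helper `parse_dtype()` is hypothetical (it is not exported by blosc) and assumes the byte order characters `<`, `>` and `|` described above. The byte order it returns is then passed to base `R`'s `readBin()`, showing that the same two bytes decode to different integers depending on the endianness.

```{r}
# Hypothetical helper (not part of blosc): split a v2 dtype code into
# byte order, kind, element size in bytes and (optional) time unit
parse_dtype <- function(dtype) {
  parts <- regmatches(dtype, regexec("^([<>|])([A-Za-z])([0-9]+)(.*)$", dtype))[[1]]
  if (length(parts) == 0) stop("Not a valid dtype code: ", dtype)
  rest <- parts[5]
  list(
    byte_order = switch(parts[2], "<" = "little", ">" = "big", "|" = "none"),
    kind = parts[3],
    size = as.integer(parts[4]),
    # a time unit is written in square brackets, e.g. "<M8[ms]"
    unit = if (nzchar(rest)) sub("^\\[(.*)\\]$", "\\1", rest) else NA_character_
  )
}
str(parse_dtype("<M8[ms]"))

# The same two bytes, interpreted as "<i2" (little endian) and ">i2" (big endian)
bytes <- as.raw(c(0x01, 0x02))
for (code in c("<i2", ">i2")) {
  info <- parse_dtype(code)
  print(readBin(bytes, "integer", n = 1, size = info$size, endian = info$byte_order))
}
```
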
The table below gives an overview of common types, how they are converted from and to
`R` types, and some important notes to consider while converting data. For multi-byte
types the byte order character is omitted, as it can be either `<` or `>`.

+--------------------+--------------------------+-------------+----------------------------------------+
| __dtype code__     | __Alternative notation__ | __R type__  | __Notes__                              |
+====================+==========================+=============+========================================+
| `|b1`             | 8 bit boolean            | `logical()` | In `R`, logical values are actually    |
|                    |                          |             | stored as 32 bit integers.             |
+--------------------+--------------------------+-------------+----------------------------------------+
| `|i1`, `i2`, `i4`  | 8, 16 and 32 bit signed  | `integer()` |                                        |
|                    | integers                 |             |                                        |
+--------------------+--------------------------+-------------+----------------------------------------+
| `|u1`, `u2`        | 8 and 16 bit unsigned    | `integer()` |                                        |
|                    | integers                 |             |                                        |
+--------------------+--------------------------+-------------+----------------------------------------+
| `u4`, `u8`         | 32 and 64 bit unsigned   | `numeric()` | `u4` values can exceed the range of    |
|                    | integers                 |             | `integer()`, but fit exactly into      |
|                    |                          |             | `numeric()`. `u8` values above `2^53`  |
|                    |                          |             | cannot be represented exactly. Handle  |
|                    |                          |             | these types with caution.              |
+--------------------+--------------------------+-------------+----------------------------------------+
| `i8`               | 64 bit signed integers   | `numeric()` | Values can exceed the range of         |
|                    |                          |             | `integer()`, and values whose          |
|                    |                          |             | magnitude exceeds `2^53` cannot be     |
|                    |                          |             | represented exactly by `numeric()`.    |
|                    |                          |             | Handle with caution.                   |
+--------------------+--------------------------+-------------+----------------------------------------+
| `f2`, `f4`, `f8`   | 16, 32 and 64 bit        | `numeric()` |                                        |
|                    | floating point numbers   |             |                                        |
+--------------------+--------------------------+-------------+----------------------------------------+
| `c8`, `c16`        | 64 and 128 bit complex   | `complex()` |                                        |
|                    | numbers                  |             |                                        |
+--------------------+--------------------------+-------------+----------------------------------------+
| `M8[*]` where      | date time                | `POSIXct`   | Stored in the specified time unit as   |
| `*` = unit         |                          |             | a 64 bit integer, whereas `POSIXct`    |
|                    |                          |             | stores the object as a `double`.       |
|                    |                          |             | Use with caution.                      |
+--------------------+--------------------------+-------------+----------------------------------------+
| `m8[*]` where      | delta time               | `difftime`  | Stored in the specified time unit as   |
| `*` = unit         |                          |             | a 64 bit integer, whereas `difftime`   |
|                    |                          |             | stores the object as a `double`.       |
|                    |                          |             | Use with caution.                      |
+--------------------+--------------------------+-------------+----------------------------------------+

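
Several of the notes in the table follow directly from how base `R` stores numbers. The chunk below illustrates them with base `R` only; it does not call blosc. The date time part assumes the NumPy convention that `M8` values count time units since 1970-01-01 (the Unix epoch).

```{r}
# integer() is a 32 bit signed type, so the largest "u4" value (2^32 - 1) does not fit
.Machine$integer.max
4294967295 > .Machine$integer.max
# numeric() is a 64 bit double: above 2^53 it can no longer distinguish
# neighbouring integers, so large "i8"/"u8" values may silently lose precision
(2^53 + 1) == 2^53
# A logical element takes 4 bytes in R, compared to 1 byte for "|b1"
length(writeBin(TRUE, raw()))
# POSIXct is stored as seconds in a double; a value counted in milliseconds
# since the epoch (as with "<M8[ms]") has to be rescaled first
ms <- 1700000000000
as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
typeof(Sys.time())
```
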
Some examples of encoding `R` data to dtypes and decoding it back:
```{r encoding}
library(blosc)
r_to_dtype(c(TRUE, FALSE), "|b1")
r_to_dtype(1L:4L, "|u1")
r_to_dtype(c(1.4, 9.8e-6), "
dtype_to_r("
dtype_to_r("
dtype_to_r("|b1", na_value = NA_integer_)
## This can be fixed by specifying `na_value`
r_to_dtype(c(TRUE, NA, FALSE, TRUE), "|b1", na_value = -1) |>
dtype_to_r("|b1", na_value = -1)
## If the `na_value` is not specified for `dtype_to_r()`,
## it will be taken literally
r_to_dtype(c(1, NA, 4, 5), "
dtype_to_r("
dtype_to_r("