---
title: "Overview of supported structures"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Overview of supported structures}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(tibblify)
```

`tibblify()` aims to make converting lists of lists into tibbles easier and more robust. This is a typical task after receiving API responses in JSON format.
The following provides an overview of which kinds of R objects are supported and the JSON to which they correspond.

## Scalars

There are 4 basic types of scalars in JSON: boolean, integer, float, string. R has no true scalars, only vectors of length 1.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
true
1
1.5
"a"
```

:::

::: {}

```{r results='hide'}
TRUE
1
1.5
"a"
```

:::

::::

Other R vectors without a JSON equivalent are also supported as long as they:

* are a vector in the vctrs definition and
* have size one, i.e. `vctrs::vec_size(x)` is 1.

Examples are `Date` or `POSIXct`.

In general, a scalar can be parsed with `tib_scalar()`. There are some special functions for common types:

* `tib_lgl()`
* `tib_int()`
* `tib_dbl()`
* `tib_chr()`
* `tib_date()`
* `tib_chr_date()` to parse dates encoded as strings.

## Vectors

A homogeneous JSON array is an array of scalars where each scalar has the same type. In R, such arrays correspond to a `logical()`, `integer()`, `double()` or `character()` vector.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
[true, null, false]
[1, null, 3]
[1.5, null, 3.5]
["a", null, "c"]
```

:::

::: {}

```{r eval=FALSE}
c(TRUE, NA, FALSE)
c(1L, NA, 3L)
c(1.5, NA, 3.5)
c("a", NA, "c")
```

:::

::::

As with scalars, other vector types are also supported as long as they are a vector in the vctrs definition.
They can be parsed with `tib_vector()`. As with scalars, there are shortcuts for some common types, e.g. `tib_lgl_vec()`.

### Empty lists

Empty lists `list()` are a special case. They might appear when parsing an empty JSON array.

```{r}
x_json <- '[
  {"a": [1, 2]},
  {"a": []}
]'

x <- jsonlite::fromJSON(x_json, simplifyDataFrame = FALSE)
str(x)
```

By default they are not supported and produce an error.

```{r error=TRUE}
tibblify(x, tspec_df(tib_int_vec("a")))
```

Use `.vector_allows_empty_list = TRUE` in `tspec_*()` so that they are converted to an empty vector instead.

```{r}
tibblify(x, tspec_df(tib_int_vec("a"), .vector_allows_empty_list = TRUE))$a
```

### Homogeneous R lists of scalars

When using `jsonlite::fromJSON(simplifyVector = FALSE)` to parse JSON to an R object, one does not get R vectors but homogeneous lists of scalars.

```{r}
x_json <- '[
  {"a": [1, 2]},
  {"a": [1, 2, 3]}
]'

x <- jsonlite::fromJSON(x_json, simplifyVector = FALSE)
str(x)
```

By default they cannot be parsed with `tib_vector()`.

```{r error=TRUE}
tibblify(x, tspec_df(tib_int_vec("a")))
```

Use `.input_form = "scalar_list"` in `tib_vector()` to parse them:

```{r}
tibblify(x, tspec_df(tib_int_vec("a", .input_form = "scalar_list")))$a
```

## Homogeneous JSON objects of scalars

Sometimes vectors are encoded as objects in JSON.

```{r}
x_json <- '[
  {"a": {"x": 1, "y": 2}},
  {"a": {"a": 1, "b": 2, "c": 3}}
]'

x <- jsonlite::fromJSON(x_json, simplifyVector = FALSE)
str(x)
```

Use `.input_form = "object"` in `tib_vector()` to parse them. To store the names, use the `.names_to` and `.values_to` arguments.

```{r}
spec <- tspec_df(
  tib_int_vec(
    "a",
    .input_form = "object",
    .names_to = "name",
    .values_to = "value"
  )
)

tibblify(x, spec)$a[[1]]
tibblify(x, spec)$a[[2]]
```

## Varying

JSON also has lists where elements do not have a common type, but instead vary.
For example:

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
[1, "a", true]
```

:::

::: {}

```{r eval=FALSE}
list(1, "a", TRUE)
```

:::

::::

Such lists can be parsed with `tib_variant()`.

## Object

The R equivalent to a JSON object is a named list where the names fulfill the requirements of `vctrs::vec_as_names(repair = "check_unique")`.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
{
  "a": 1,
  "b": true
}
```

:::

::: {}

```{r results='hide'}
x <- list(
  a = 1,
  b = TRUE
)
```

:::

::::

They can be parsed with `tib_row()`. For example:

```{r}
x <- list(
  list(row = list(a = 1, b = TRUE)),
  list(row = list(a = 2, b = FALSE))
)

spec <- tspec_df(
  tib_row(
    "row",
    tib_int("a"),
    tib_lgl("b")
  )
)

tibblify(x, spec)
```

## Data Frames

JSON can also store lists of objects.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
[
  {"a": 1, "b": true},
  {"a": 2, "b": false}
]
```

:::

::: {}

```{r results='hide'}
x <- list(
  list(a = 1, b = TRUE),
  list(a = 2, b = FALSE)
)
```

:::

::::

They can be parsed with `tib_df()`.

### Object of objects

JSON can also store named lists of objects. In JSON they are represented as objects where each element is an object.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
{
  "object1": {"a": 1, "b": true},
  "object2": {"a": 2, "b": false}
}
```

:::

::: {}

```{r results='hide'}
x <- list(
  object1 = list(a = 1, b = TRUE),
  object2 = list(a = 2, b = FALSE)
)
```

:::

::::

They are also parsed with `tib_df()`, but you can parse the names into an extra column via the `.names_to` argument:

```{r}
x_json <- '[
  {
    "df": {
      "object1": {"a": 1, "b": true},
      "object2": {"a": 2, "b": false}
    }
  }
]'

x <- jsonlite::fromJSON(x_json, simplifyDataFrame = FALSE)

spec <- tspec_df(
  tib_df(
    "df",
    tib_int("a"),
    tib_lgl("b"),
    .names_to = "name"
  )
)

tibblify(x, spec)$df
```

### Column-major format

The column-major format, where each element of the object holds one whole column, is also supported.

:::: {style="display: grid; grid-template-columns: 1fr 1fr; grid-column-gap: 10px; width: 100%;"}

::: {}

```{json}
{
  "a": [1, 2],
  "b": [true, false]
}
```

:::

::: {}

```{r results='hide'}
x <- list(
  a = c(1, 2),
  b = c(TRUE, FALSE)
)
```

:::

::::

Parse this using `.input_form = "colmajor"` in `tspec_*()`.

```{r}
df_spec <- tspec_df(
  tib_int("a"),
  tib_lgl("b"),
  .input_form = "colmajor"
)

tibblify(x, df_spec)
```

This is roughly equivalent to `tibble::as_tibble(x)`.
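
To make the "roughly equivalent" remark concrete, the following sketch compares both results for the column-major list from this section. It only uses calls already shown in this vignette plus `tibble::as_tibble()`; the object names `out_tibblify` and `out_tibble` are made up for illustration. One difference to expect, assuming `tib_int()` casts the input as it does in the earlier examples, is that the spec should yield an integer column `a`, whereas `as_tibble()` keeps the double vector as it is.

```{r}
library(tibblify)

# Column-major input: one list element per column
x <- list(
  a = c(1, 2),
  b = c(TRUE, FALSE)
)

# Same spec as in the chunk above
df_spec <- tspec_df(
  tib_int("a"),
  tib_lgl("b"),
  .input_form = "colmajor"
)

out_tibblify <- tibblify(x, df_spec)  # columns are cast according to the spec
out_tibble <- tibble::as_tibble(x)    # columns are taken as they are

# Compare the column types of both results
str(out_tibblify$a)
str(out_tibble$a)
```

The spec therefore adds type checking and casting on top of the plain conversion, which matters when the input comes from an API whose types may vary.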