---
title: "Indexing Polygon Example"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Indexing Polygon Example}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = FALSE
)
```

# British National Grid Indexing Polygon Examples

A key component of the `osbng` package is the indexing functionality. The
`geom_to_bng` and `geom_to_bng_intersection` functions enable the indexing of
geometries, represented using
[`geos`](https://paleolimbot.github.io/geos/index.html) geometry objects, into
grid squares at a specified resolution. Both functions accept `geos` (or
optionally [`sf`](https://r-spatial.github.io/sf/index.html)) objects of the
following types: `Point`, `LineString`, `Polygon`, `MultiPoint`,
`MultiLineString`, `MultiPolygon`, and `GeometryCollection.` The geometry
coordinates must be encoded in the [British National Grid (OSGB36)
EPSG:27700](https://epsg.io/27700) coordinate reference system.

These functions facilitate grid-based spatial analysis, enabling applications
such as statistical aggregation, data visualisation, and data interoperability.
The two functions differ in their operation: `geom_to_bng` returns the British
National Grid (BNG) grid squares intersected by the input geometry, while
`geom_to_bng_intersection` returns the intersections (shared areas) between the
input geometry and the grid square geometries.

When deciding between the two functions, consider whether a decomposition of the
input geometry by BNG grid squares is required. The decomposition logic is
computationally more expensive but is useful when the intersection between the
input geometry and a grid square is needed. This approach supports spatial join
optimisations, such as point-in-polygon and polygon-to-polygon operations, using
the `is_core` value of the indexed geometry object. These optimisations are
particularly valuable for geospatial analysis of medium to large datasets in
distributed processing systems, where geometries may be colocated by their BNG
references.

## Indexing Functions Accepting Geometries

### geom_to_bng

This function returns a list of `BNGReference` objects representing the BNG grid
squares intersected by the input geometry. Note that `BNGReference` objects are
deduplicated in cases where multiple parts of a multi-part geometry intersect
the same grid square.

### geom_to_bng_intersection

This function returns a list of objects representing the decomposition of the
input geometry into BNG grid squares. Unlike `geom_to_bng`, no deduplication
occurs. If multiple parts of a multi-part geometry intersect the same grid
square, the intersection for each part is returned.

### geom_to_bng_intersection_explode

This convenience function applies `geom_to_bng_intersection()` to each geometry
in an `sf` data.frame, returning a flattened data.frame by exploding the list of
indexed geometries.

## Examples

The examples below demonstrate the application of the two indexing functions
using the London boundary from the administrative England Regions dataset
provided by the Office for National Statistics (ONS). Metadata for this dataset
is available from `?osbng::`. The indexing functions are applied to geometries
within an `sf` spatial data frame.

## Optional sf Dependency

While `osbng` is fully compatible with sf and can seamlessly work with data
frames in `R`, it does not require sf as a hard dependency. With the exception
of `geom_to_bng_intersection_explode`, the indexing functions can operate
directly on `geos` Geometry objects. This allows you to use these functions with
standard data structures (e.g., vectors, data frames) containing geos
geometries.

```{r setup}
library(osbng)
library(sf)
```

```{r loader}
# Read the Office for National Statistics (ONS) England Regions GeoPackage
# Create an sf data frame
# See examples/data/metadata.json for more information about the data source
gdf <- st_read(
  system.file("extdata", 
              "London_Regions_December_2024_Boundaries_EN_BFC.gpkg", 
              package = "osbng"), 
  quiet = TRUE)

# Filter the data frame columns
gdf_london <- gdf[, c("RGN24CD", "RGN24NM")]

# Return the data frame
gdf_london
```

Check/confirm the coordinate reference system used.

```{r crs}
# osbng indexing functions require geometry coordinates to be specified 
# in British National Grid (BNG) (OSGB36) cordinate reference system
# EPSG:27700
# https://epsg.io/27700
st_crs(gdf_london)
```

### geom_to_bng

Returns a list of `BNGReference` objects representing the BNG grid squares
intersected by the input geometry. The `BNGReference` provides functions to
access and manipulate the reference. This includes the following:

* `print(x, compact = TRUE)`: The BNG reference with whitespace removed.
* `bng_to_grid_geom()`: Returns a grid square as a `geos` or `sf` Polygon.

For more information on the BNG Reference object, see `?as_bng_reference`.

```{r geom_to_bng}
# Return the BNG grid squares intersected by the London Region
# Uses `geom_to_bng()`
# Uses a 5km grid square resolution
# Returns a list of `BNGRefernce` objects for each geometry
gdf_london$bng_ref_5km <- geom_to_bng(gdf_london, resolution = "5km")
```

The resulting data frame includes a list column containing all the BNG reference
objects that intersect the London region. To make it easier to map these grid
squares, we need to transform the data frame so that each reference is its own
row. The easiest way to do this is using `tidyr::unnest`, but this example uses
an alternative when that package is not available.

```{r expanding}
# Expand the bng_ref_5km column to separate rows for each BNG Reference object
df <- data.frame(id = rep(seq_len(nrow(gdf_london)), 
                          lengths(gdf_london$bng_ref_5km)), 
                 bng_ref_5km = as_bng_reference(unlist(gdf_london$bng_ref_5km)))

# Create a data frame of London and combine with BNG observations
df_london <- st_drop_geometry(gdf_london)

# Drop the original bng_ref_5km column
df_london <- df_london[, !names(df_london) %in% c("bng_ref_5km")]

df_london <- cbind(df_london[df$id, ], df)
```


```{r expanding_alt, eval=FALSE}
# Alternative approach requiring `tidyr` - NOT RUN

df_london <- gdf_london %>% 
  unnest(bng_ref_5km) %>%
  st_drop_geometry()
```

This new data frame contains a row for each of the BNG grid squares that
intersected the original region geometry. This column still contains
`BNGReference` objects, so we can extract information about the reference, such
as the geometry.

```{r get_geometry}
# Using the data frame with the set of BNG grid squares
# Get the geometry for each grid square
df_london$geometry <- bng_to_grid_geom(df_london$bng_reference, format = "sf")

# Convert this data frame to an `sf` object with the grid square geometry
gdf_london_exp <- st_sf(df_london)

# Return the first few rows
head(gdf_london_exp)
```

Now that we have the grid square geometries organised in a spatial data frame,
we can visualise the data and demonstrate the concept of assigning geometries to
BNG grid squares. Since this is an `sf` object, `ggplot2::geom_sf` or `tmap` can
be used. Here we demonstrate an alternative without dependencies.

```{r map_bng, fig.height=6.5, fig.width=6.5, warning=FALSE}
# Plot the original London Region
plot(st_geometry(gdf_london), 
     col = "#ffc61e", 
     border = "#fff",
     main = "BNG Grid Squares at 5km Resolution Intersected by London Region",
     cex.main = .8, 
     extent = st_bbox(gdf_london_exp))

# Add the indexed and exploded London regions
plot(st_geometry(gdf_london_exp), 
     col = NA, 
     border = "#333333", 
     add = TRUE)

# Add feature labels at grid square centroids
coords <- st_coordinates(st_centroid(gdf_london_exp))
text(coords[, 1],
     coords[, 2],
     gdf_london_exp$bng_reference, cex = 0.4)

```

### geom_to_bng_intersection_explode

Decomposes each geometry in the input `sf` data.frame bounded by their presence
in BNG grid squares at the specified resolution. Applies the
`geom_to_bng_intersection` function to the active geometry column of the input
data frame, which is expected to be set and in the OSGB36 / British National
Grid coordinate reference system (CRS) (EPSG:27700).

The resulting indexed geometries are exploded into individual rows, with each
row containing a new column for each part of the nested list: `bng_ref`,
`is_core`, and `geometry`.

The input geometry column is replaced with the `geometry` column. The input
geometry column can be retrieved if required by joining the resulting data frame
with the original data frame on the index (if not reset), or using a feature
identifier. Dropping the original geometry column reduces memory usage and
simplifies the resulting object.

All non-geometry columns from the input are retained in the resulting `sf` data
frame.

The columns added to the exploded data frame:

* `bng_ref`: The `BNGReference` object representing the grid square corresponding to the decomposition.
* `is_core`: A Boolean flag indicating whether the grid square geometry is entirely contained by the input geometry. This is relevant for Polygon geometries and helps distinguish between "core" (fully inside)and "edge" (partially overlapping) grid squares.
* `geometry`: The geometry representing the intersection between the input geometry and the grid square. This can be one of a number of geometry types depending on the overlap. When `is_core` is True, geom is the same as the grid square geometry.

```{r expanding_decompose}
# Create a new data frame of the London Region
# Filter the data frame columns
gdf_london <- gdf[, c("RGN24CD", "RGN24NM")]

# Decompose the London Region into a simplified representation
# bounded by its presence in each BNG grid square at a 5km resolution
# Uses the gdf_to_bng_geom_intersection_explode; sf required
gdf_london_exp <- geom_to_bng_intersection_explode(gdf_london, 
                                                   resolution = "5km")

head(gdf_london_exp)
```

This spatial data frame contains decomposed London Region's geometry decomposed
into BNG grid squares. Using this expanded set of records demonstrates the
`is_core` property when grid squares are fully contained by the original
geometry.

```{r viz_is_core, fig.height=6.5, fig.width=6.5, warning=FALSE}
# Plot the indexed and expanded London Region spatial data frame
plot(gdf_london_exp["is_core"],
     border = "#fff",
     col = c("#009ade", "#ff1f5b")[gdf_london_exp$is_core + 1],
     main = "Decomposition of the London Region into BNG Grid Squares at 5km Resolution",
     cex.main = .75)

legend(0.8, 0.3,
       title = "is_core",
       legend = c("True", "False"),
       fill = c("#ff1f5b", "#009ade"))
```

### Alternative: geom_to_bng_intersection

The example below demonstrates how the same data frame `gdf_london_exp` could be
derived using more verbose logic via the `geom_to_bng_intersection` function.

Starting again with the original London Region, the intersecting BNG grid
squares are identified and then the decomposed geometry is exploded into
separate rows.

```{r decompose_london_alt}
# Create a new data frame of the London Region
# Filter the data frame columns
gdf_london <- gdf[, c("RGN24CD", "RGN24NM")]

# Decompose the London Region intoa simplified representation bounded by its
# presence in each BNG grid square at a 5km resolution.
# Uses the geom_to_bng_intersection function
# Returns a list of nested lists
gdf_london$bng_ref_5km <- geom_to_bng_intersection(gdf_london, 
                                                   resolution = "5km")

# Drop original geometry column
gdf_london <- st_drop_geometry(gdf_london)
```

Given the list column of BNG reference objects, there are several ways to expand
and flatten this into multiple rows. One option is using `tidyr::unnest_wider`
followed by `tidyr::unnest`. The example code below shows an alternative
approach without the dependency on `tidyr`.

```{r expanding_decompose_alt, eval=FALSE}
# Alternative syntax - NOT RUN
# Expand the bng_ref_5km column to separate rows for each object.
gdf_london <- cbind("RGN24CD" = gdf_london[, c("RGN24CD")], 
                    data.frame(gdf_london$bng_ref_5km))

# Convert this data frame to an `sf` object with the grid square geometry
gdf_london_exp <- st_sf(gdf_london)

# Return the first few rows of the GeoDataFrame
head(gdf_london_exp)
```

```{r expanding_decompose_alt2, eval=FALSE}
# Alternative approach requiring `tidyr` - NOT RUN
gdf_london_exp <- gdf_london %>%
  unnest_wider(bng_ref_5km) %>%
  unnest(cols = c(BNGReference, is_core, geom)) %>%
  st_sf()
```