--- title: "Implementing a dist_structure subclass" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Implementing a dist_structure subclass} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(dist.structure) ``` `dist.structure` is a protocol package: the closed-form constructors (`exp_series`, `exp_kofn`, ...) and topology shortcuts (`series_dist`, `kofn_dist`, `bridge_dist`, ...) are reference implementations of an S3 contract that anyone can extend. Downstream packages do exactly that: `serieshaz` adds a `dfr_dist_series` class for arbitrary dynamic-failure-rate series systems; `kofn` builds inference machinery on the closed-form k-out-of-n classes; future packages will add their own. This vignette walks through the protocol from the implementor's side: what to provide, what you get for free, when to override defaults for performance, and how to validate your subclass. ## The contract: four steps Every `dist_structure` subclass provides: 1. **A class chain** that includes `"dist_structure"`, `"univariate_dist"`, and `"dist"`. Almost always also a sub-chain like `"coherent_dist"` if the system is coherent. 2. **`ncomponents(x)`**: number of components `m`. 3. **`component(x, j, ...)`**: the j-th component returned as an `algebraic.dist::dist` object with parameters baked in. The returned dist must be independently evaluable: `surv(component(x, j))(t)`, `sampler(component(x, j))(n)`, etc., must all work. 4. **At least one of**: - `phi(x, state)`: the structure function, returning 0 or 1 for any binary state vector of length `m`. - `min_paths(x)`: a list of integer vectors enumerating minimal functioning component subsets. `phi` and `min_paths` are dual: each has a default that derives from the other. If you provide *both*, they must be consistent (`phi(x, state) == 1` iff `state` covers some path in `min_paths(x)`). Inconsistent implementations produce silently inconsistent results, because `reliability` / `critical_states` / `is_coherent` route through `phi` while `min_cuts` / `system_signature` route through `min_paths`. Everything else has a default that derives from those four primitives. Defaults exist for `min_cuts`, `system_signature`, `critical_states`, `reliability`, `is_coherent`, `structural_importance`, `system_lifetime`, `system_censoring`, `dual`, `surv`, `cdf`, and `sampler`. ## A worked example: an alarm system Consider a fire alarm with `m` redundant smoke sensors and one master controller. The system **functions** if (i) at least `k` of the `m` sensors function AND (ii) the master controller functions. This is not exactly a k-out-of-n system (the master is special) and not exactly a series-of-(parallel-of-sensors-with-master) (the master gates everything). We can capture it directly as a custom `dist_structure`. ### Step 1: pick a representation Components are indexed `1..m` for the sensors and `m+1` for the master. ### Step 2: write the constructor ```{r} alarm_dist <- function(k, sensor_components, master_component) { m_sensors <- length(sensor_components) stopifnot(k >= 1L, k <= m_sensors, inherits(master_component, "dist"), all(vapply(sensor_components, function(d) inherits(d, "dist"), logical(1L)))) structure( list( k = as.integer(k), m_sensors = as.integer(m_sensors), m = as.integer(m_sensors + 1L), components = c(sensor_components, list(master_component)) ), class = c("alarm_dist", "dist_structure", "univariate_dist", "continuous_dist", "dist") ) } ``` The class chain declares membership in `dist_structure` and the `algebraic.dist` ancestors. We bundle the master into the components list so the `j`-th component is uniformly accessible. ### Step 3: implement the three required generics ```{r} ncomponents.alarm_dist <- function(x) x$m component.alarm_dist <- function(x, j, ...) { stopifnot(j >= 1L, j <= x$m) x$components[[j]] } phi.alarm_dist <- function(x, state) { stopifnot(length(state) == x$m) sensor_state <- state[seq_len(x$m_sensors)] master_state <- state[x$m] as.integer(sum(sensor_state) >= x$k && master_state == 1L) } ``` That's all that's strictly required by the protocol. ### Step 4: validate ```{r} sensors <- replicate(4, algebraic.dist::exponential(0.5), simplify = FALSE) master <- algebraic.dist::exponential(0.1) # master is most reliable alarm <- alarm_dist(k = 2L, sensors, master) validate_dist_structure(alarm) ``` If we had forgotten `phi.alarm_dist`, `validate_dist_structure()` would have stopped with a clear error pointing at the missing primitive instead of letting the failure surface much later in some default method's dispatch. ## What you get for free Every default below is now available on `alarm`: ```{r} # Number of components ncomponents(alarm) # Structure function evaluation phi(alarm, c(1, 1, 0, 0, 1)) # 2 sensors + master alive: alarms phi(alarm, c(1, 1, 1, 1, 0)) # all sensors but no master: silent # Minimal cut sets (derived from phi via the Berge transversal) min_cuts(alarm) # Critical states for component j critical_states(alarm, 5L) # the master: critical in every state where >= k sensors are alive # Reliability at uniform component reliability p reliability(alarm, 0.9) # Distribution algebra: system survival via component composition S <- algebraic.dist::surv(alarm) S(c(1, 5, 10)) # Sampling: m independent component lifetimes, combined via system_lifetime withr::with_seed(1, { x <- algebraic.dist::sampler(alarm)(1000) mean(x) }) ``` Notice that `min_cuts`, `critical_states`, `reliability`, `surv`, and `sampler` all "just work" without being implemented for `alarm_dist`. They composed from `ncomponents`, `component`, `phi`, and the component-level distribution algebra. ## When to override a default Defaults are designed for correctness across all topologies. They are not always the fastest path. The two most common reasons to override: **1. Closed-form distributional shortcuts.** The default `surv.dist_structure` enumerates over component states via the reliability polynomial, which is `O(2^m)`. If your subclass has an analytical formula for `S_sys(t)`, override `surv` directly: ```r surv.alarm_dist <- function(x, ...) { k <- x$k m_sens <- x$m_sensors sensor_surv <- lapply(seq_len(m_sens), function(j) algebraic.dist::surv(component(x, j))) master_surv <- algebraic.dist::surv(component(x, x$m)) function(t, ...) { vapply(t, function(ti) { # P(>= k sensors alive at ti) * P(master alive at ti). ps <- vapply(sensor_surv, function(S) S(ti), numeric(1L)) sensor_p <- sum(vapply(seq.int(k, m_sens), function(sz) { sum(vapply(utils::combn(m_sens, sz, simplify = FALSE), function(A) { prod(ps[A]) * prod(1 - ps[setdiff(seq_len(m_sens), A)]) }, numeric(1L))) }, numeric(1L))) sensor_p * master_surv(ti) }, numeric(1L)) } } ``` **2. Topology-specific shortcuts.** Closed-form `min_paths` saves the Berge transversal from rederiving them. For our alarm system, every minimal path is a `(k-of-m-sensors)` subset unioned with the master: ```r min_paths.alarm_dist <- function(x) { k <- x$k m_sens <- x$m_sensors master_idx <- x$m lapply(utils::combn(m_sens, k, simplify = FALSE), function(P) sort(c(as.integer(P), master_idx))) } ``` With `min_paths.alarm_dist` defined, `min_cuts(alarm)` becomes the default Berge transversal but starts from a small explicit list rather than re-deriving paths from `phi` (which it would do via the default chain through `binary_grid`). The general rule: **provide what you know in closed form; rely on defaults for the rest**. Do not pre-emptively override a default unless you have a reason; correctness across all dispatch chains is much easier when only the primitives are subclass-specific. ## Inheriting from `coherent_dist` If your subclass is a coherent system, you can inherit from `coherent_dist` and skip step 3 entirely: ```{r} parallel_of_master_and_sensors <- function(sensor_components, master_component) { components <- c(sensor_components, list(master_component)) m <- length(components) obj <- coherent_dist( min_paths = list(seq_len(m - 1L), m), # all sensors as one path; master as another components = components, m = m ) class(obj) <- c("master_or_sensors_dist", class(obj)) obj } ``` `coherent_dist` already implements `ncomponents`, `component`, and `min_paths` for you; you supply only the topology-specific data (min_paths and components). The class chain automatically includes `"coherent_dist"`, so all the usual dispatch (including `dual.coherent_dist`, which produces a real dual `coherent_dist` instead of a lazy wrapper) takes over. ## What the default `dual` does `dual(x)` produces the dual structure `phi_dual(state) = 1 - phi(x, 1 - state)`. When `x` inherits from `coherent_dist`, the `dual.coherent_dist` method swaps minimal paths and cuts to produce a fresh `coherent_dist`. When `x` is a generic `dist_structure` (your own subclass), the default returns a lazy wrapper of class `"dual_of_system"`. The lazy wrapper implements `ncomponents`, `component`, and `phi` by deferring to the original; everything else (signatures, importance, surv) flows through default methods exactly as for any other `dist_structure`. If you want `dual()` on your subclass to return something more specific, override: ```r dual.alarm_dist <- function(x) { # ... return a transformed alarm_dist ... } ``` ## Cross-references - The closed-form specializations in this package (`exp_series.R`, `wei_kofn.R`, etc.) are concrete examples of subclasses that override `surv`, `cdf`, `sampler`, `density`, and `hazard` for speed while inheriting the topology defaults. - `serieshaz::dfr_dist_series` is an out-of-package implementor that adds a fully general dynamic-failure-rate component family while participating in the protocol. - `kofn::kofn` is an inference layer built on top of `exp_kofn` / `wei_kofn`; it does not subclass `dist_structure` itself but uses the protocol heavily. For more detail on any specific generic, see the `?phi`, `?min_paths`, `?dual`, `?reliability`, and `?dist_structure` help pages.