library(remstats)

This vignette explains how to compute statistics for relational event history data using remstats(), for both tie-oriented and actor-oriented models. When using the package please cite:

Meijerink-Bosman, M., Back, M., Geukes, K., Leenders, R., & Mulder, J. (2023). Discovering trends of social interaction behavior over time: An introduction to relational event modeling. Behavior Research Methods. https://doi.org/10.3758/s13428-022-01821-8


Introduction

Relational event modeling approaches enable researchers to investigate both exogenous and endogenous factors influencing the evolution of a time-ordered sequence of relational events — commonly known as a relational event history. These models are categorized into tie-oriented models, where the probability of a dyad interacting next is modeled in a single step (e.g., see Butts, 2008), and actor-oriented models, which first model the probability of a sender initiating an interaction and subsequently the probability of the sender’s choice of receiver (e.g., see Stadtfeld & Block, 2017). The R package remstats is designed to compute a variety of statistics for both types of models.

The remstats package is part of the remverse suite of R packages developed at Tilburg University to support applied researchers in relational event modeling. For preparing the relational event history, remstats requires prior processing with remify::remify(). Model estimation can subsequently be executed using remstimate.


Quick workflow example

# Load data
data(history)
data(info)

# Set weights to 1 (no event weighting)
history$weight <- 1

# Define effects
effects <- ~ 1 + send("extraversion", info) + inertia()

# Prepare event history
reh <- remify::remify(edgelist = history, model = "tie")

# Compute statistics
stats <- remstats(reh = reh, tie_effects = effects)

# Estimate model parameters (requires remstimate)
# fit <- remstimate::remstimate(reh = reh, stats = stats, method = "MLE")

Data

Relational event history data describes a time-ordered series of interactions between actors in a network. A relational event minimally contains information on the time of the event and the actors involved.

We use the history dataset from the remstats package (see ?history): a small simulated relational event history with 115 events among 10 actors. Each event also records a setting (either "work" or "social") and an event weight.

head(history)
#>   time actor1 actor2 setting weight
#> 1  238    105    113    work      1
#> 2  317    105    109    work      1
#> 3  345    115    112    work      1
#> 4  627    101    115  social      1
#> 5  832    113    107  social      1
#> 6  842    105    109    work      1

Exogenous information on actors is stored in the info object (see ?info), containing age, sex, extraversion, and agreeableness scores measured at multiple time points for each of the 10 actors:

head(info)
#>   name  time age sex extraversion agreeableness
#> 1  101     0   0   0        -0.40         -0.14
#> 2  101  9432   0   0        -0.32         -0.26
#> 3  101 18864   0   0        -0.53         -0.45
#> 4  103     0   0   0        -0.13         -0.65
#> 5  103  9432   0   0        -0.43         -0.44
#> 6  103 18864   0   0        -0.13         -0.43

Tie-oriented model

Preparing the event history

We prepare the event history for the tie-oriented model with remify::remify(). When a weight column is present in the edgelist, it is used to weight events in the computation of statistics. Here we set weights to one to avoid this:

history$weight <- 1
reh <- remify::remify(edgelist = history, model = "tie")

Computing statistics

Statistics for the tie-oriented model are supplied to the tie_effects argument of remstats() as a formula object of the form ~ terms. An overview of available statistics is obtained with tie_effects() or ?tie_effects.

As a first example, we request the standardized inertia statistic:

effects <- ~ inertia(scaling = "std")
out <- remstats(tie_effects = effects, reh = reh)

The output is a 3-dimensional array. Rows correspond to unique time points, columns to potential events in the risk set, and slices to statistics:

dim(out)
#> [1] 114  90   2

The number of rows equals reh$M — the number of unique time points (115 here). When simultaneous events are present, the number of rows is less than the total number of events, since statistics are computed once per unique time point. There are 90 columns corresponding to the \(10 \times 9\) directed dyads in the full risk set. The two slices are: (1) a baseline statistic, automatically included when event timing is exact (ordinal = FALSE), and (2) the inertia statistic. The baseline can be suppressed by including -1 in the effects formula.

out
#> Relational Event Network Statistics
#> > Model: tie-oriented
#> > Computation method: per time point
#> > Dimensions: 114 time points x 90 dyads x 2 statistics
#> > Statistics:
#>   >> 1: baseline
#>   >> 2: inertia

The risk set composition is stored as an attribute of the output. Each row describes one dyad, with columns sender, receiver, type (if event types are present), and id (the column index in the statistics array):

head(attr(out, "riskset"))
#>   sender receiver id
#> 1    101      103  1
#> 2    101      104  2
#> 3    101      105  3
#> 4    101      107  4
#> 5    101      109  5
#> 6    101      111  6

Exogenous statistics require an attr_actors object. The attr_actors can be passed directly inside effect functions or globally via the attr_actors argument of remstats(). Statistics are separated by + in the formula, and interactions are specified with : or *:

effects <- ~ inertia(scaling = "std") + 
             send("extraversion", info) + 
             inertia(scaling = "std"):send("extraversion", info)
out <- remstats(tie_effects = effects, reh = reh)
out
#> Relational Event Network Statistics
#> > Model: tie-oriented
#> > Computation method: per time point
#> > Dimensions: 114 time points x 90 dyads x 4 statistics
#> > Statistics:
#>   >> 1: baseline
#>   >> 2: inertia
#>   >> 3: send_extraversion
#>   >> 4: inertia:send_extraversion

Memory

The memory argument controls which past events are included in the computation of endogenous statistics at each time point \(t\). Four options are available:

  • "full" (default): the entire event history before \(t\) is included.
  • "window": only events within a time window of length memory_value before \(t\) are included. For example, memory = "window", memory_value = 200 includes only events that occurred at most 200 time units ago.
  • "interval": only events within a specific time interval before \(t\) are included, defined by memory_value = c(lower, upper). For example, memory_value = c(100, 200) includes only events between 100 and 200 time units ago.
  • "decay": past events are weighted by an exponential decay function with half-life memory_value. More recent events receive higher weight.
# Full memory (default)
out_full <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "full"
)

# Window memory: only events within the last 500 time units
out_window <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "window",
    memory_value = 500
)

# Decay memory: exponential decay with half-life 200
out_decay <- remstats(
    tie_effects  = ~ inertia(),
    reh          = reh,
    memory       = "decay",
    memory_value = 200
)

Event types and consider_type

When an event type variable is present in the relational event history, the consider_type argument in each effect function controls how the statistic accounts for event types. We illustrate this using the setting column in history (values: "work", "social"), passed to remify() via the event_type argument:

reh_typed <- remify::remify(
    edgelist   = history,
    model      = "tie",
    event_type = "setting"
)

Three options are available for consider_type:

"ignore": event type is not taken into account. The statistic is computed over all past events regardless of type. This is the appropriate choice when event types are not a substantive concern, or when no types are present.

out_ignore <- remstats(
    tie_effects = ~ inertia(consider_type = "ignore"),
    reh         = reh_typed
)
dim(out_ignore)       # one inertia slice plus baseline
#> [1] 114  90   2
dimnames(out_ignore)[[3]]
#> [1] "baseline" "inertia"

"separate": the statistic is computed separately for each event type, resulting in one slice per type. For example, with two types "social" and "work", inertia yields: (1) the number of past "social" events for each dyad, and (2) the number of past "work" events for each dyad.

out_separate <- remstats(
    tie_effects = ~ inertia(consider_type = "separate"),
    reh         = reh_typed
)
dim(out_separate)       # two inertia slices (one per type) plus baseline
#> [1] 114  90   3
dimnames(out_separate)[[3]]
#> [1] "baseline"       "inertia.social" "inertia.work"

"interact": the statistic conditions on the type of the most recent event. The statistic at time \(t\) is computed only over past events of the same type as the event at \(t-1\), effectively allowing the effect of past event history to depend on the type of the triggering event.

out_interact <- remstats(
    tie_effects = ~ inertia(consider_type = "interact"),
    reh         = reh_typed
)
#> Warning in prepare_tomstats(effects = effects, reh = reh, attr_actors =
#> attr_actors, : "interact" requires extend_riskset_by_type = TRUE; coercing to
#> "separate".
dim(out_interact)
#> [1] 114  90   3
dimnames(out_interact)[[3]]
#> [1] "baseline"       "inertia.social" "inertia.work"

Tie-oriented model with case-control sampling

For large networks where computing statistics over the full risk set is computationally expensive, remstats supports case-control sampling (also known as dyad sampling). Instead of computing statistics for all dyads in the risk set at each time point, a random sample of dyads is drawn and used as the comparison set, substantially reducing computation time.

Case-control sampling is enabled by setting sampling = TRUE in remstats(). Two additional arguments control the sampling:

  • samp_num: the number of dyads to sample per event (must be ≤ the size of the active risk set).
  • seed: an optional integer random seed for reproducibility.
reh <- remify::remify(edgelist = history, model = "tie")

out_sampled <- remstats(
    tie_effects = ~ inertia() + send("extraversion", info),
    reh         = reh,
    sampling    = TRUE,
    samp_num    = 20,
    seed        = 42
)

The output is an object of class tomstats_sampled. Its structure differs from the standard output: rather than a full array over all dyads, it stores only the statistics for the sampled dyads plus the observed event at each time point. This object can be passed directly to remstimate for model estimation using the case-control likelihood.

Note that case-control sampling is only available for the tie-oriented model and is not supported for the actor-oriented model.


Actor-oriented model

Preparing the event history

We prepare the event history for the actor-oriented model. Note that actor-oriented modeling requires directed events:

reh <- remify::remify(edgelist = history, model = "actor")

Computing statistics

For the actor-oriented model, statistics must be specified separately for the two modeling steps:

  • Sender activity rate step: supplied to sender_effects — models which actor sends the next event.
  • Receiver choice step: supplied to receiver_effects — models which actor the sender chooses as receiver.

An overview of available statistics for each step is obtained with actor_effects() or ?actor_effects.

We start by requesting the outdegreeSender statistic for the sender step:

out <- remstats(
    sender_effects = ~ outdegreeSender(),
    reh            = reh
)

The output for the actor-oriented model is a list with two elements:

names(out)
#> [1] "sender_stats"   "receiver_stats"

Since no statistics were requested for the receiver step, receiver_stats is empty. The sender_stats object contains the baseline statistic (automatically included when ordinal = FALSE) and the requested outdegreeSender statistic:

out
#> Relational Event Network Statistics
#> > Model: actor-oriented
#> > Computation method: per time point
#> > Sender model:
#>   >> Dimensions: 114 time points x 10 actors x 2 statistics
#>   >> Statistics:
#>       >>> 1: baseline
#>       >>> 2: outdegreeSender

The dimensions of out$sender_stats are 115 × 10 × 2: rows are unique time points, columns are potential senders, and slices are statistics.

We extend the model with a statistic for the receiver choice step:

out <- remstats(
    sender_effects   = ~ outdegreeSender(),
    receiver_effects = ~ inertia(),
    reh              = reh
)

The statistics for the receiver choice step are accessed with out$receiver_stats. The baseline statistic is not defined for the receiver step, so its dimensions are 115 × 10 × 1: rows are time points, columns are potential receivers, and slices are statistics.

The computed values in the receiver choice step give the statistic for each potential receiver, given the current sender. For example:

# Label columns with receiver names and rows with senders
dimnames(out$receiver_stats)[[2]] <- reh$meta$dictionary$actors$actorName
dimnames(out$receiver_stats)[[1]] <- reh$edgelist$actor1[-1]
head(out$receiver_stats[,, "inertia"])
#>     101 103 104 105 107 109 111 112 113 115
#> 105   0   0   0   0   0   0   0   0   1   0
#> 115   0   0   0   0   0   0   0   0   0   0
#> 101   0   0   0   0   0   0   0   0   0   0
#> 113   0   0   0   0   0   0   0   0   0   0
#> 105   0   0   0   0   0   1   0   0   1   0
#> 113   0   0   0   0   1   0   0   0   0   0

At the first time point, inertia is zero for all receivers because no prior events have occurred. At the second time point, inertia is 1 for the receiver of the first event (given the same sender). At the third time point, the sender is a different actor with no prior sending history, so inertia is again zero for all receivers.