BiplotML 1.1.1

Resubmission to CRAN

This version restores the package to CRAN after it was archived on 2023-10-29 due to a dependency on the optimr package, which was removed from CRAN at the maintainer’s request. All calls to optimr() now use optimx::optimr(), which is the current home of that function.

Dependency changes

Bug fixes

Internal changes


BiplotML 1.1.0

New algorithm: projection-based logistic biplot with missing data

Version 1.1.0 introduced a major new fitting method for the logistic biplot model, described in:

Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2022). A coordinate descent MM algorithm for logistic biplot model with missing data. In process.

The problem addressed

All previous logistic biplot algorithms (alternating, external logistic, and the conjugate gradient / iterated SVD methods introduced in v1.0.0) share a structural limitation: each row i of the data matrix has its own coordinate vector a_i = (a_i1, ..., a_ik), which enters the natural parameters as theta_ij = mu_j + sum_s a_is * b_js. Consequently:

The new approach

The new method reformulates the logistic biplot using Pearson’s (1901) data projection idea, extended to the logistic case by Landgraf & Lee (2020). Instead of treating each row’s coordinates as independent free parameters, the row markers are expressed as a projection of the (centred) data matrix onto the low-rank subspace spanned by the columns of V:

A = (X - 1 * mu') * V
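
As a rough illustration of this projection, the base R sketch below builds row markers from simulated binary data. The values of mu and V are arbitrary stand-ins, not estimates returned by LogBip() or proj_LogBip(), and V is made orthonormal only for convenience. It also checks that passing an existing row through the same formula as if it were a new observation reproduces its marker.

```r
set.seed(42)
n <- 10; p <- 6; k <- 2

X  <- matrix(rbinom(n * p, 1, 0.5), n, p)     # simulated binary data (n x p)
mu <- rnorm(p)                                # stand-in intercepts (length p)
V  <- qr.Q(qr(matrix(rnorm(p * k), p, k)))    # stand-in orthonormal basis (p x k)

# A = (X - 1 * mu') * V : subtract mu from every row, then project onto V
A <- sweep(X, 2, mu) %*% V                    # row markers (n x k)

# The same formula applied to a single observation, here row 1 of X
x_new <- X[1, ]
a_new <- (x_new - mu) %*% V                   # 1 x k

# Free parameters: row coordinates are no longer estimated separately
n_par_free_rows  <- n * k + p * k + p         # classical formulation
n_par_projection <- p * k + p                 # projection formulation
```

In this sketch a_new equals A[1, ], and only the second parameter count is independent of n.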

This single change has three important consequences:

  1. The number of parameters no longer depends on n. Only the p x k matrix V (column markers) and the p-vector mu (intercepts) need to be estimated, regardless of how many rows the data matrix has.

  2. New individuals can be projected without refitting. Given estimated V and mu, the row markers of any new observation x_new are simply: a_new = (x_new - mu') * V. No optimisation is required.

  3. Missing data are handled natively. A weight matrix W (W_ij = 1 if x_ij is observed, 0 if missing) is incorporated into the loss function. Missing entries are imputed at each iteration using the current fitted values and a per-variable threshold that maximises the Balanced Accuracy (BACC), so that classification performance is optimised throughout fitting.
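
The threshold-based imputation in point 3 can be sketched for a single variable as follows. This is an illustration under stated assumptions, not the package's code: the helper names bacc() and best_threshold(), the threshold grid, and the simulated probabilities are all hypothetical, and the actual search used inside the package may differ.

```r
# Balanced accuracy = (sensitivity + specificity) / 2
bacc <- function(truth, pred) {
  sens <- mean(pred[truth == 1] == 1)
  spec <- mean(pred[truth == 0] == 0)
  (sens + spec) / 2
}

# Pick the cut-off on a grid that maximises balanced accuracy (hypothetical helper)
best_threshold <- function(truth, prob, grid = seq(0.05, 0.95, by = 0.05)) {
  scores <- vapply(grid, function(t) bacc(truth, as.integer(prob >= t)), numeric(1))
  grid[which.max(scores)]
}

set.seed(7)
n     <- 200
prob  <- runif(n)               # stand-in fitted probabilities for one variable
truth <- rbinom(n, 1, prob)     # simulated binary values
obs   <- rbinom(n, 1, 0.8) == 1 # TRUE where the entry is observed (W_ij = 1)

# The threshold is chosen on observed entries only, then applied to missing ones
thr <- best_threshold(truth[obs], prob[obs])
imputed <- truth
imputed[!obs] <- as.integer(prob[!obs] >= thr)
```

Observed entries are left untouched; only the cells flagged as missing receive a thresholded prediction.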

The algorithm

The objective function, the negative log-likelihood weighted by W, is non-convex. Rather than minimising it directly, the algorithm majorises it at each iteration by a quadratic surrogate (the MM step), following the approach of Babativa-Marquez & Vicente-Villardon (2021). The surrogate is then minimised using a block coordinate descent algorithm:

Because the surrogate upper-bounds the true loss and coincides with it at the current iterate, any step that reduces the surrogate cannot increase the loss. The negative log-likelihood is therefore guaranteed to be non-increasing across iterations.
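
The monotonicity argument can be checked numerically. The sketch below is not the package's PDLB implementation: it is a simplified MM scheme that assumes the standard quadratic bound on the logistic loss (second derivative at most 1/4) and minimises the resulting least-squares surrogate by column centring plus a truncated SVD instead of block coordinate descent. All names (softplus, wloss, Z) are illustrative.

```r
set.seed(1)
n <- 60; p <- 8; k <- 2
X <- matrix(rbinom(n * p, 1, 0.4), n, p)   # simulated binary data
W <- matrix(rbinom(n * p, 1, 0.8), n, p)   # 1 = observed, 0 = missing
X[W == 0] <- 0                             # placeholder, ignored because W = 0

# Numerically stable log(1 + exp(t))
softplus <- function(t) pmax(t, 0) + log1p(exp(-abs(t)))

# Weighted negative log-likelihood in terms of the natural parameters
wloss <- function(Theta) sum(W * (softplus(Theta) - X * Theta))

Theta <- matrix(0, n, p)                   # starting point
loss  <- wloss(Theta)

for (it in 1:50) {
  # Quadratic majoriser: loss <= const + (1/8) * sum((Theta_new - Z)^2),
  # with equality at the current Theta
  Z  <- Theta - 4 * W * (plogis(Theta) - X)
  # Minimise the surrogate over intercepts plus a rank-k term
  mu <- colMeans(Z)
  s  <- svd(sweep(Z, 2, mu), nu = k, nv = k)
  Theta <- sweep(s$u %*% (s$d[1:k] * t(s$v)), 2, mu, "+")
  loss  <- c(loss, wloss(Theta))
}

monotone <- all(diff(loss) <= 1e-8)        # TRUE if the loss never increases
```

Plotting loss against the iteration index gives a quick visual check of the same property.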

New and updated functions

LogBip() – updated

The main fitting function now accepts method = "PDLB" (Projection-based logistic biplot with block coordinate Descent) in addition to the existing "MM", "CG", and "BFGS" methods.

When method = "PDLB":

library(BiplotML)
data("Methylation")

# Complete data -- coordinate descent MM algorithm (fast, no missing values)
res_MM <- LogBip(x = Methylation, method = "MM", maxit = 1000)

# Matrix with missing data -- projection-based block coordinate descent
set.seed(12345)
n <- nrow(Methylation); p <- ncol(Methylation)
miss         <- matrix(rbinom(n * p, 1, 0.2), n, p)   # 1 flags roughly 20% of cells
miss         <- ifelse(miss == 1, NA, miss)           # flagged cells become NA, others stay 0
x_miss       <- Methylation + miss                    # adding NA blanks the flagged cells
res_PDLB     <- LogBip(x = x_miss, method = "PDLB", maxit = 1000)
imputed_data <- res_PDLB$impute_x   # completed matrix

proj_LogBip() – new

Low-level function that implements the projection-based block coordinate descent algorithm directly. It is called internally by LogBip(method = "PDLB") but is also exported for advanced users who need direct control over the algorithm.

out <- proj_LogBip(x = x_miss, k = 2, max_iters = 1000, epsilon = 1e-5)
# out$mu         -- estimated intercept vector (length p)
# out$A          -- row-marker matrix (n x k)
# out$B          -- column-marker matrix (p x k)
# out$x_est      -- imputed binary matrix
# out$iter       -- number of iterations
# out$loss_funct -- loss function values per iteration

cv_LogBip() – updated

Cross-validation now supports method = "PDLB", allowing selection of the optimal number of dimensions k for datasets with missing values.

cv_result <- cv_LogBip(data = x_miss, k = 0:5, method = "PDLB", maxit = 1000)

References

Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. https://doi.org/10.3390/math9162015

Landgraf, A. J., & Lee, Y. (2020). Dimensionality reduction for binary data through the projection of natural parameters. Journal of Multivariate Analysis, 180, 104668. https://doi.org/10.1016/j.jmva.2020.104668

Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11), 559-572.

Vicente-Villardon, J. L., & Galindo, M. P. (2006). Logistic biplots. In M. Greenacre & J. Blasius (Eds.), Multiple Correspondence Analysis and Related Methods (pp. 503-521). Chapman & Hall.


BiplotML 1.0.0

Initial CRAN release

First release of BiplotML, providing methods for fitting logistic biplot models to multivariate binary data.

Functions

Data

Reference

Babativa-Marquez, J. G., & Vicente-Villardon, J. L. (2021). Logistic biplot by conjugate gradient algorithms and iterated SVD. Mathematics, 9(16), 2015. https://doi.org/10.3390/math9162015