# Changelog ## To Do * Check for NA/NaN/Inf in input matrices. * Bring back support for non-numeric data (needs updates in `deadwood`). * Add support for non-square confusion matrices in `normalized_pivoted_accuracy` and `normalized_clustering_accuracy`. ## 1.3.0 (2026-02-23) * The package was heavily refactored; common MST-related functions and classes as well as functions from the `tools` and `plots` modules were moved to the new [`deadwood`](https://deadwood.gagolewski.com/) package, which is now required. * [BACKWARD INCOMPATIBILITY] Outlier detection based solely on whether a node is a leaf of a minimum spanning tree w.r.t. some mutual reachability distance turned out to be subpar in more detailed experiments, especially for smaller smoothing factors. Note that in the previous versions of the package, this feature was deemed merely experimental; Hence, `detect_noise` in `genie.default` and `skip_leaves`, `preprocess`, and `postprocess` elsewhere are no longer available. Instead, use the more universal `deadwood` package now. * [BACKWARD INCOMPATIBILITY] `quitefastmst` version >= 0.9.1 is now required; the introduced backward-incompatible changes have been addressed. In particular, the definition of mutual reachability distances has changed. Unlike in Campello et al.'s 2013 paper, now the core distance is the distance to the *M*-th nearest neighbour, not the *(M-1)*-th one (not including self). * [Python] [BACKWARD INCOMPATIBILITY] `internal` module was renamed `core`. * [BACKWARD INCOMPATIBILITY] Deprecated functions such as `mst_from_nn` have been removed. * [Python] [BACKWARD INCOMPATIBILITY] `compute_full_tree` is now always True. * [BUGFIX] #92: Passing a non-square confusion matrix to `normalized_pivoted_accuracy` and `normalized_clustering_accuracy` yields an error as such objects are yet to be supported. * [R] `gclust` and `genie` now return the computed MST via the `mst` object attribute. `genie` returns an object of the class `mstclust`. This makes it operable with `deadwood`. * [Python] [BUGFIX] Modifying `quitefastmst_params` via `set_state` now invalidates the cached MST. * [Python] [NEW FEATURE] `plots.plot_scatter` has new arguments: `asp`, `markers`, and `colours`. The module globals `mrk` and `col` were renamed accordingly. However, as mentioned above, `plots` was moved to `deadwood`. * [Python] [BACKWARD INCOMPATIBILITY] `compute_all_cuts` in `Genie` was renamed `coarser`. If `True`, `labels_` is still a vector representing the requested `n_clusters`. The coarser-grained labels are now stored in `labels_matrix_` whose `i`-th row represents an `(i+1)`-partition. ## 1.2.0 (2025-07-24) * [Python and R] Using the new implementation of Euclidean and mutual reachability minimum spanning trees (pretty fast in low dimensional spaces) from the [`quitefastmst`](https://quitefastmst.gagolewski.com/) package. * [BACKWARD INCOMPATIBILITY] [Python] Seeking approximate near-neighbours with `nmslib` is no longer supported directly; unfortunately, the package has not been updated for a while. * [BACKWARD INCOMPATIBILITY] `mlpack` is not used anymore. * [Python] `MSTClusterMixin`: A base class for Genie, GIc, and other MST-based clustering algorithms. [later moved to `deadwood`] * [BACKWARD INCOMPATIBILITY] [Python] `Genie` and `GIc`: `affinity` was renamed `metric`. ## 1.1.6 (2024-08-22) * [Python] The package now works with `numpy` 2.0. ## 1.1.5 (2023-10-18) * [BACKWARD INCOMPATIBILITY] Inequality measures are no longer referred to as inequity measures. * [BACKWARD INCOMPATIBILITY] Some external cluster validity measures were renamed: `adjusted_asymmetric_accuracy` → `normalized_clustering_accuracy`, `normalized_accuracy` → `normalized_pivoted_accuracy`. * [BACKWARD INCOMPATIBILITY] [Python] `compare_partitions2` has been removed, as `compare_partitions` and other partition similarity scores now support both pairs of label vectors `(x, y)` and confusion matrices `(x=C, y=None)`. * [Python and R] New parameter to `pair_sets_index`: `clipped`. * In `normalizing_permutation` and external cluster validity measures, the input matrices can now be of the type `double`. * [BUGFIX] [Python] #80: Fixed adjustment for `nmslib_n_neighbors` in small samples. * [BUGFIX] [Python] #82: `cluster_validity` submodule not imported. * [BUGFIX] Some external cluster validity measures now handle NaNs better and are slightly less prone to round-off errors. ## 1.1.4 (2023-03-31) * [Python] The GIc algorithm is no longer marked as experimental; its description is provided in . ## 1.1.3 (2023-01-17) * [R] `mst.default` now throws an error if any element in the input matrix is missing/infinite. * [Python] The call to `mlpack.emst` that stopped working with the new version of `mlpack` has been fixed. ## 1.1.2 (2022-09-17) * [Python and R] `adjusted_asymmetric_accuracy` now accepts confusion matrices with fewer columns than rows. Such "missing" columns are now treated as if they were filled with 0s. * [Python and R] `pair_sets_index`, and `normalized_accuracy` return the same results for non-symmetric confusion matrices and transposes thereof. ## 1.1.1 (2022-09-15) * [Python] #75: `nmslib` is now optional. * [BUILD TIME] The use of `ssize_t` was not portable. ## 1.1.0 (2022-09-05) * [Python and R] New function: `adjusted_asymmetric_accuracy`. * [Python and R] Implementations of the so-called internal cluster validity measures discussed in DOI: [10.1016/j.ins.2021.10.004](https://doi.org/10.1016/j.ins.2021.10.004); see our (GitHub-only) [CVI](https://github.com/gagolews/optim_cvi) package for R. In particular, the generalised Dunn indices are based on the code originally authored by Maciej Bartoszuk. Thanks. Functions added (`cluster_validity` module): `calinski_harabasz_index`, `dunnowa_index`, `generalised_dunn_index`, `negated_ball_hall_index`, `negated_davies_bouldin_index`, `negated_wcss_index`, `silhouette_index`, `silhouette_w_index`, `wcnn_index`. These cluster validity measures are discussed in more detail at . * [BACKWARD INCOMPATIBILITY] `normalized_confusion_matrix` now solves the maximal assignment problem instead of applying the somewhat primitive partial pivoting. * [Python and R] New function: `normalizing_permutation` * [R] New function: `normalized_confusion_matrix`. * [Python and R] New parameter to `pair_sets_index`: `simplified`. * [Python] New parameters to `plots.plot_scatter`: `axis`, `title`, `xlabel`, `ylabel`, `xlim`, `ylim`. ## 1.0.1 (2022-08-08) * A paper on the *genieclust* package is now available: M. Gagolewski, genieclust: Fast and robust hierarchical clustering, SoftwareX 15, 100722, 2021, DOI: [10.1016/j.softx.2021.100722](https://doi.org/10.1016/j.softx.2021.100722). * [Python] `plots.plot_scatter` now uses a more accessible default palette (from R 4.0.0). * [Python and R] New function: `devergottini_index`. ## 1.0.0 (2021-04-22) * [R] Using `mlpack` instead of `RcppMLPACK` (#72). This package is merely suggested, not dependent upon. ## 0.9.8 (2021-01-08) * [Python] Require Python >= 3.7 (implied by `numpy`). * [Python] Require `nmslib`. * [R] Use `RcppMLPACK` directly; remove dependency on `emstreeR`. * [R] Use `tinytest` for unit testing instead of `testthat`. ## 0.9.4 (2020-07-31) * [BUGFIX] [R] Fixed build errors on Solaris. ## 0.9.3 (2020-07-25) * [BUGFIX] [Python] Added code coverage CI. Fixed some minor inconsistencies. Automated the `bdist` build chain. * [R] Updated DESCRIPTION to meet the CRAN policies. ## 0.9.2 (2020-07-22) * [BUGFIX] [Python] Fix broken build script for OS X with no OpenMP. ## 0.9.1 (2020-07-18) * The package has been completely rewritten. The core functionality is now implemented in C++ (with OpenMP). * [R] R version is now available. * [EXPERIMENTAL] A preliminary version of clustering with respect to DBSCAN*-like mutual reachability distances is now supported. * The parallelised Jarník-Prim algorithm now supports on-the-fly distance computations. Euclidean minimum spanning tree can be determined with `mlpack`, which is much faster in low-dimensional spaces. * [EXPERIMENTAL] [Python] The GIc algorithm proposed by Anna Cena in her 2018 PhD thesis wad added. * [Python] Approximate version based on nearest neighbour graphs produced by `nmslib` was added. ## 0.1a2 (2018-05-23) * [Python] Initial PyPI release.