MOGAMUN

Elva María Novoa del Toro

Introduction

This document describes the use of MOGAMUN, the type and format of the input data, and the post-processing and visualization of the output data. MOGAMUN is a package to find active modules (i.e., highly connected subnetworks with an overall deregulation) in multiplex biological networks. For a detailed description of MOGAMUN, see the preprint https://www.biorxiv.org/content/10.1101/2020.05.25.114215v1. All the expression datasets and networks that we used to obtain the results reported in our preprint are available in the GitHub repository https://github.com/elvanov/MOGAMUN-data.

IMPORTANT. Please note that there was a bug in the non-domination sorting process. All the runs executed between October 15th, 2020 and February 12th, 2021 must be re-executed. We apologize for the inconvenience.

Workflow

The workflow of MOGAMUN is composed of 3 main steps: initialization of parameters, providing input data and running the algorithm.

Initialization of parameters

Here we set the values for the evolution parameters and other general parameters, such as the minimum and maximum sizes of the subnetworks, via mogamun_init. We strongly recommend running the algorithm with the default values. The only exception is MinSize and MaxSize, which you can adapt to get bigger or smaller subnetworks, keeping in mind that MOGAMUN tends to return subnetworks of sizes near or equal to MinSize. In total, there are 11 customizable parameters:

If MOGAMUN is to be run with the default values, execute EvolutionParameters <- mogamun_init(). Otherwise, specify the parameters to change, separated by commas. Please be aware that although we recommend letting the evolution run for 500 generations, this process can be long. For instance, using a multiplex network with the three layers that we provide in https://github.com/elvanov/MOGAMUN-data and the default values for all the parameters, the process took approximately 12 hours on a desktop computer with an Intel i7 processor at 3.60GHz and 32GB of RAM.

parameters <- mogamun_init(Generations = 1, PopSize = 10)

Providing input data

MOGAMUN uses two sources of information: one or more biological networks, and the statistical values resulting from a differential expression analysis or any other test that yields p-values or False Discovery Rates (FDR) associated with genes. The second step of the workflow is to provide the input data using the mogamun_load_data function, which has 5 parameters:

dePath <- system.file("extdata/DE/Sample_DE.csv", package = "MOGAMUN")
scoresPath <-
    system.file("extdata/DE/Sample_NodesScore.csv", package = "MOGAMUN")
layersPath <-
    paste0(system.file("extdata/LayersMultiplex", package = "MOGAMUN"), "/")

loadedData <-
    mogamun_load_data(
        EvolutionParameters = parameters,
        DifferentialExpressionPath = dePath,
        NodesScoresPath = scoresPath,
        NetworkLayersDir = layersPath,
        Layers = "23"
    )

Running the algorithm

Once we have defined all the parameters and provided the input data, we are ready to run MOGAMUN using the mogamun_run function, which has 4 parameters:

mogamun_run(LoadedData = loadedData, ResultsDir = '.')
## [1] "Run 1. Gen. 1 completed"
## [1] "FINISH TIME, RUN 1: 2024-05-01 00:45:05.866718"
## [[1]]
##           used (Mb) gc trigger  (Mb) max used  (Mb)
## Ncells 1488587 79.5    2538442 135.6  2538442 135.6
## Vcells 2611809 20.0    8388608  64.0  4513613  34.5

In the results directory (ResultsDir) you will find a subfolder whose name contains the date when you executed the experiment. Inside, there will be two files per run (MOGAMUN_Results_StatisticsPerGeneration_RunN.csv and MOGAMUN_Results__Run_N.txt). The file MOGAMUN_Results_StatisticsPerGeneration_RunN.csv contains the best values for the two objectives (average nodes score and density) per generation, which you can use to check the convergence, for instance. The file MOGAMUN_Results__Run_N.txt contains the complete final population of size PopSize (i.e., all the subnetworks from the last generation), one per row. The number of elements in every row is variable because the size of each subnetwork can vary between MinSize and MaxSize. If X_n is the number of elements in the n-th row, the nodes of the subnetwork are the first X_n-4 elements, and the last four elements correspond to the average nodes score, density, rank, and crowding distance, respectively. The best (non-dominated) subnetworks are those with rank = 1.
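The row layout described above can be illustrated with a small base R sketch that splits one row into its nodes and its four trailing values. The helper name parse_subnetwork_row, the comma separator, and the example row are assumptions made for illustration only; check the separator against your own results file before relying on it.

```r
# Hypothetical helper (not part of MOGAMUN): split one row of the
# final-population file into node names and the four trailing values.
# Assumption: fields are separated by commas.
parse_subnetwork_row <- function(row, sep = ",") {
  fields <- trimws(strsplit(row, sep, fixed = TRUE)[[1]])
  n <- length(fields)
  stopifnot(n > 4)  # at least one node plus the four trailing values
  tail_values <- as.numeric(fields[(n - 3):n])
  list(
    nodes = fields[seq_len(n - 4)],        # first X_n - 4 elements
    average_nodes_score = tail_values[1],
    density = tail_values[2],
    rank = tail_values[3],
    crowding_distance = tail_values[4]
  )
}

# Made-up example row with three nodes
example_row <- "GENE_A,GENE_B,GENE_C,0.82,0.35,1,Inf"
parsed <- parse_subnetwork_row(example_row)
parsed$nodes      # node names of the subnetwork
parsed$rank == 1  # TRUE for a non-dominated subnetwork
```

Under the same assumptions, a whole file could be processed with lapply(readLines(path), parse_subnetwork_row) and then filtered to keep the entries with rank equal to 1.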

Postprocessing of the results

In our preprint (https://www.biorxiv.org/content/10.1101/2020.05.25.114215v1), we ran MOGAMUN 30 times for each experiment. This increases the chances of finding the global maxima. Given that the result of every run is the set of subnetworks with rank = 1, if you execute MOGAMUN multiple times the final result is the union of the results of all the individual runs. However, such a set might contain subnetworks that are better than others (according to Pareto dominance), so to obtain the final result we calculate the accumulated Pareto front. To this end, we re-rank the set composed of the union of all the results, and keep only those subnetworks with rank = 1. In addition, we propose merging subnetworks that are very similar, in order to avoid having two different subnetworks that differ by only one node, for instance (see JaccardSimilarityThreshold).
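To make the idea of the accumulated Pareto front concrete, here is a minimal base R sketch, not MOGAMUN's own implementation. It assumes both objectives (average nodes score and density) are maximized, and it uses made-up values. The function names pareto_front and jaccard_similarity are hypothetical, and the Jaccard value is returned as a fraction between 0 and 1, which may differ from the scale expected by JaccardSimilarityThreshold.

```r
# Indices of the non-dominated entries when both objectives are maximized.
# An entry is dominated if another entry is at least as good on both
# objectives and strictly better on at least one of them.
pareto_front <- function(score, density) {
  stopifnot(length(score) == length(density))
  dominated <- vapply(seq_along(score), function(i) {
    any(score >= score[i] & density >= density[i] &
          (score > score[i] | density > density[i]))
  }, logical(1))
  which(!dominated)
}

# Jaccard similarity between two node sets: |A n B| / |A u B|
jaccard_similarity <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up rank-1 subnetworks pooled from two runs
pooled <- data.frame(
  run     = c(1, 1, 2, 2),
  score   = c(0.90, 0.60, 0.85, 0.70),
  density = c(0.20, 0.50, 0.15, 0.55)
)
front <- pareto_front(pooled$score, pooled$density)
pooled[front, ]  # entries that remain after re-ranking the union

# Two made-up subnetworks that differ by a single node
similarity <- jaccard_similarity(
  c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  c("GENE_A", "GENE_B", "GENE_C", "GENE_E")
)
```

In this toy example, the second entry of run 1 and the first entry of run 2 were non-dominated within their own runs but are dominated once the runs are pooled, which is why the union has to be re-ranked.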

Depending on the number of runs you execute, there are other plots that might be generated during the postprocessing, such as scatter plots (always) and boxplots (only if NumberOfRunsToExecute in mogamun_run > 1).

Finally, it is possible to visualize the subnetworks from the accumulated Pareto front in Cytoscape. Please note that the network you used to build the multiplex will be filtered to keep only the interactions among the genes that are included in the result. The nodes will be colored according to their logFC value, where green stands for downregulated, red means upregulated, and white means no deregulation. Nodes corresponding to significantly differentially expressed genes will have a black border, and the color of the edges will be different for each layer of the multiplex network. To check which layer an edge belongs to, click on the edge and check the value of TypeOfInteraction in the edge table.

The postprocessing is done with the function mogamun_postprocess, which has 4 parameters:

Please note that you can make use of mogamun_postprocess with any number of runs. If the experiment to be postprocessed contains a single run, the result will be the set of subnetworks in the first Pareto front. Otherwise, the result will be the accumulated Pareto front, as we already explained.

mogamun_postprocess(
    LoadedData = loadedData, 
    ExperimentDir = '.', 
    VisualizeInCytoscape = FALSE
)