Type: Package
Title: Monte Carlo Simulation for Structural Equation Modeling
Version: 2.0.0
Description: Provides tools to conduct Monte Carlo simulations under different conditions (e.g., varying sample size, data normality) for structural equation models (SEMs). Data can be simulated based on user-defined factor loadings and correlations, with optional non-normality added via Fleishman's power method (1978) <doi:10.1007/BF02293811>. Once generated, models can be estimated using 'lavaan'. This package facilitates testing model performance across multiple simulation scenarios. When data generation is completed (or when generated data sets are given) model tests can also be run. Please cite as "Orçan, F. (2021). MonteCarloSEM An R Package to Simulate Data for SEM. International Journal of Assessment Tools in Education, 8 (3), 704-713."
License: GPL-3
Encoding: UTF-8
RoxygenNote: 7.3.1
Imports: Matrix, stats, utils, lavaan
Copyright: Fatih Orçan, Kahramanmaraş Sütçü İmam University, Türkiye
NeedsCompilation: no
Packaged: 2026-01-28 09:26:45 UTC; USER
Author: Fatih Orçan [aut, cre]
Maintainer: Fatih Orçan <fatihorcan84@gmail.com>
Repository: CRAN
Date/Publication: 2026-01-28 13:40:07 UTC

Introduces Missing at Random (MAR) Values into Data Sets.

Description

This function introduces missing values under the Missing at Random (MAR) mechanism into previously generated data sets (e.g., those produced by sim.skewed() or sim.normal()). Under MAR, the probability of missingness is associated with other variables in the data set, but not with the variable itself. If the baseV argument is not provided, two variables (excluding the target variable itself) are selected at random, and their mean is used to determine missingness in the target variable. For example, assume a data set with 8 items where missing values are to be introduced for item 2. Two items are randomly selected from items 1, 3, 4, 5, 6, 7, and 8 (e.g., items 5 and 7). Their mean is calculated, sorted, and used as the basis for assigning missingness to item 2. Following the MAR rule, 90 percent of the missing values are drawn from the cases with the highest scores, and the remaining 10 percent are drawn randomly from the rest. For instance, with a sample size of 300 and 20 percent missingness (60 cases), the mean of the selected auxiliary variables is sorted in decreasing order. Missing values are then introduced in 54 cases (90 percent of 60) from the top portion, while 6 cases (10 percent of 60) are drawn randomly from the lower 240 observations. The missing values are represented by NA in the output files. New data sets containing missing values are saved as separate files, preserving the originals. Additionally, a file named "MAR_List.dat" is created, which contains the names of all data sets with MAR missingness.
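The 54/6 split in the worked example can be sketched in base R. This is only an illustration of the rule as described above, not the package's internal code: the toy data, the object names, and the reading that the 54 cases are sampled from the 60 rows with the highest auxiliary means are all assumptions.

```r
set.seed(1)                        # reproducible illustration only
n     <- 300                       # sample size from the worked example
perct <- 20                        # percentage of missingness
dat   <- as.data.frame(matrix(rnorm(n * 8), n, 8,
                              dimnames = list(NULL, paste0("x", 1:8))))

target   <- 2                      # item that receives the missing values
aux      <- c(5, 7)                # auxiliary items, as in the worked example
aux.mean <- rowMeans(dat[, aux])   # basis for the missingness

n.mis  <- round(n * perct / 100)   # 60 cases in total
n.top  <- round(0.9 * n.mis)       # 54 cases from the highest means
n.rest <- n.mis - n.top            # 6 cases from the remaining rows

ord       <- order(aux.mean, decreasing = TRUE)
top.rows  <- sample(ord[seq_len(n.mis)], n.top)    # assumed: drawn from the top 60
rest.rows <- sample(ord[-seq_len(n.mis)], n.rest)  # drawn from the lower 240

dat[c(top.rows, rest.rows), target] <- NA
colSums(is.na(dat))                # only the target item contains NA
```

Because the selection depends on items 5 and 7 rather than on item 2 itself, the missingness in item 2 is related to other observed variables, which is the defining property of MAR.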

Usage

MAR.data(
  misg = NULL,
  baseV = NULL,
  perct = 10,
  dataList = "Data_List.dat",
  f.loc
)

Arguments

misg

A numeric vector of 0s and 1s specifying which items will contain missing values. A value of 0 indicates the item will not include missingness, while 1 indicates missing values will be introduced. If omitted, all items are treated as eligible for missingness.

baseV

A list specifying the auxiliary variables on which MAR missingness will be based. Its structure must match that of misg. If not provided, two variables (excluding the target variable itself) are chosen at random.

perct

The percentage of missingness to be applied (default = 10 percent).

dataList

The file name containing the list of previously generated data sets (e.g., "Data_List.dat"), either created by this package or by external software.

f.loc

The directory path where both the original data sets and the "dataList" file are located.

Author(s)

Fatih Orçan

Examples


# Step 1: Generate data sets

fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
floc<-tempdir()
sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=floc)

# Step 2: Introduce MAR missing values

mis.items<-c(1,0,1,1,0,0,0,0)
bV<-list(c(0,0,0,0,0,0,1,1),NA,c(0,0,0,0,0,1,1,0),c(0,0,0,0,0,1,1,1), NA,NA,NA,NA)
dl<-"Data_List.dat"  # must be located in the folder given by f.loc
MAR.data(misg = mis.items, baseV=bV, perct = 20, dataList = dl, f.loc=floc )

Introduces Missing Completely at Random (MCAR) Values into Data Sets.

Description

This function introduces missing values under the Missing Completely at Random (MCAR) mechanism into previously generated data sets (e.g., those produced by sim.skewed() or sim.normal()). Missing values are inserted at random locations according to user specifications and are denoted as "NA" in the resulting files. The modified data sets are saved as new files, preserving the original data sets. In each data file, the first column contains the sample identifiers, while the subsequent columns show actual data with some entries replaced by NA. Additionally, a file named "MCAR_List.dat" is created, listing the names of all data sets to which missing values were introduced.
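As a minimal base-R sketch of the MCAR idea, the rows that become NA can be drawn with sample(), independently of any data values. The toy data, the object names, and the assumption that perct is applied separately to each selected item are illustrative and not taken from the package source.

```r
set.seed(2)                          # reproducible illustration only
n     <- 100
perct <- 20
dat <- data.frame(ID = seq_len(n),   # first column holds the identifiers
                  matrix(rnorm(n * 8), n, 8,
                         dimnames = list(NULL, paste0("x", 1:8))))

mis.items <- c(1, 1, 1, 0, 0, 0, 0, 0)   # same coding as the misg argument
n.mis     <- round(n * perct / 100)      # 20 cases per selected item (assumed)

for (j in which(mis.items == 1)) {
  rows <- sample(n, n.mis)               # positions chosen purely at random
  dat[rows, j + 1] <- NA                 # + 1 skips the ID column
}

colSums(is.na(dat))                      # NA counts per column
```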

Usage

MCAR.data(misg = NULL, perct = 10, dataList = "Data_List.dat", f.loc)

Arguments

misg

A numeric vector of 0s and 1s specifying which items will contain missing values. A value of 0 indicates the item will not include missingness, while 1 indicates missing values will be introduced. If omitted, all items are treated as eligible for missingness.

perct

The percentage of missingness to be applied (default = 10 percent).

dataList

The file name containing the list of previously generated data sets (e.g., "Data_List.dat"), either created by this package or by external software.

f.loc

The directory path where both the original data sets and the "dataList" file are located.

Author(s)

Fatih Orçan

Examples


# Step 1: Generate data sets
fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
floc<-tempdir()
sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=floc)

# Step 2: Introduce MCAR missing values

mis.items<-c(1,1,1,0,0,0,0,0)
dl<-"Data_List.dat"  # must be located in the folder given by f.loc
MCAR.data(misg = mis.items, perct = 20, dataList = dl, f.loc=floc)


Introduces Missing Not at Random (MNAR) Values into Data Sets

Description

This function introduces missing values under the Missing Not at Random (MNAR) mechanism into previously generated data sets (e.g., those produced by sim.skewed() or sim.normal()). Under the MNAR mechanism, the probability of missingness depends on the values of the variable itself. Specifically, the target variable is first sorted in decreasing order. Based on the specified percentage of missingness, 90 percent of the missing values are assigned randomly among the highest values, while the remaining 10 percent are assigned randomly among the rest of the sample. For example, with a sample size of 300 and a target of 20 percent missingness (60 cases), the variable is sorted in descending order. Missing values are then introduced in 54 cases (90 percent of 60) from the top of the distribution, while the remaining 6 cases (10 percent of 60) are randomly chosen from the lower 240 observations. The missing values are represented by NA in the output files. New data sets containing missing values are saved as separate files, preserving the originals. Additionally, a file named "MNAR_List.dat" is created, which contains the names of all data sets with MNAR missingness.
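The rule above can be written as a small helper that returns the rows to be set to NA for one variable. The function name mnar_rows() is hypothetical, and sampling the 54 cases from the 60 highest values is one reading of the description; the package's own implementation may differ.

```r
# Hypothetical helper: picks the rows of x that become missing under the
# 90/10 rule described above (not a function of the package).
mnar_rows <- function(x, perct) {
  n.mis <- round(length(x) * perct / 100)   # total number of missing cases
  n.top <- round(0.9 * n.mis)               # share taken from the top
  ord   <- order(x, decreasing = TRUE)      # sort by the variable itself
  c(sample(ord[seq_len(n.mis)], n.top),             # among the highest values
    sample(ord[-seq_len(n.mis)], n.mis - n.top))    # among the remaining cases
}

set.seed(3)                                 # reproducible illustration only
x     <- rnorm(300)
rows  <- mnar_rows(x, perct = 20)
x.mis <- replace(x, rows, NA)

sum(is.na(x.mis))                           # number of missing cases
c(complete = mean(x), observed = mean(x.mis, na.rm = TRUE))
```

Because high values are removed preferentially, the mean of the remaining observed values falls below the complete-data mean, which is the kind of bias MNAR conditions are meant to induce.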

Usage

MNAR.data(misg = NULL, perct = 10, dataList = "Data_List.dat", f.loc)

Arguments

misg

A numeric vector of 0s and 1s specifying which items will contain missing values. A value of 0 indicates the item will not include missingness, while 1 indicates missing values will be introduced. If omitted, all items are treated as eligible for missingness.

perct

The percentage of missingness to be applied (default = 10 percent).

dataList

The file name containing the list of previously generated data sets (e.g., "Data_List.dat"), either created by this package or by external software.

f.loc

The directory path where both the original data sets and the "dataList" file are located.

Author(s)

Fatih Orçan

Examples


# Step 1: Generate data sets

fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
floc<-tempdir()
sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=floc)

# Step 2: Introduce MNAR missing values

mis.items<-c(1,1,1,0,0,0,0,0)
dl<-"Data_List.dat"  # must be located in the folder given by f.loc
MNAR.data(misg = mis.items, perct = 20, dataList = dl, f.loc=floc)

Generates Categorical Data Sets from Continuous Data.

Description

This function transforms previously simulated continuous data sets into categorical variables based on user-specified threshold values. The function reads in existing data sets, applies the thresholding procedure to discretize the observed scores, and saves the resulting categorical data sets into the designated file location. Additionally, it produces an updated list of the newly created categorical data sets for future reference.
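The thresholding step can be illustrated with base R's cut(). This is a sketch only: the threshold vector is the one used in the example for this function, while the integer coding 1 to 5 and the treatment of values that fall exactly on a threshold are assumptions about how categorize() behaves.

```r
tres <- c(-Inf, -1.645, -.643, .643, 1.645, Inf)   # six cut points, five categories
x    <- c(-2.1, -1.0, 0.0, 0.7, 1.9)               # toy continuous scores

# labels = FALSE returns integer category codes instead of a factor
cat.x <- cut(x, breaks = tres, labels = FALSE)

data.frame(continuous = x, category = cat.x)
```

Each score is assigned to the interval between two adjacent thresholds, so k + 1 cut points (including -Inf and Inf) produce k ordered categories.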

Usage

categorize(f.loc, threshold, dataList = "Data_List.dat")

Arguments

f.loc

A character string indicating the file path where the generated categorical data sets will be saved.

threshold

A numeric vector specifying the threshold values used to discretize the continuous data into ordered categories.

dataList

A character string giving the name of the file that contains the list of previously generated data sets.

Author(s)

Fatih Orçan

Examples

tres<-c(-Inf, -1.645, -.643, .643, 1.645, Inf) # five categories
categorize(f.loc=tempdir(), threshold = tres)

Generates Model-Implied Covariance and Correlation Matrices for a Given SEM.

Description

This function generates and returns the model-implied covariance and correlation matrices for a specified structural equation model (SEM).
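For orientation, the implied matrices of the model used in the example for this function can be worked out by hand with standard SEM matrix algebra. The sketch below does not call cov.mtx(); the matrix names are illustrative, and the unit residual variances of the indicators (Theta) are an assumption, since the example syntax does not fix them.

```r
# Loadings: 9 indicators on 3 factors, all fixed at 0.7 as in the example syntax
Lambda <- matrix(0, 9, 3, dimnames = list(paste0("x", 1:9), paste0("F", 1:3)))
Lambda[1:3, 1] <- 0.7
Lambda[4:6, 2] <- 0.7
Lambda[7:9, 3] <- 0.7

# Structural regressions: F2 ~ 0.4*F1 and F3 ~ 0.6*F1
B <- matrix(0, 3, 3)
B[2, 1] <- 0.4
B[3, 1] <- 0.6

# Latent (residual) variances of 1 and a residual covariance of 0.5 for F2, F3
Psi <- diag(3)
Psi[2, 3] <- Psi[3, 2] <- 0.5

Theta <- diag(9)   # ASSUMPTION: unit residual variances for the indicators

IB.inv <- solve(diag(3) - B)
Phi    <- IB.inv %*% Psi %*% t(IB.inv)           # implied factor covariances
Sigma  <- Lambda %*% Phi %*% t(Lambda) + Theta   # implied indicator covariances
R      <- cov2cor(Sigma)                         # implied correlations

round(Phi, 2)
round(R[1:4, 1:4], 3)
```

Under these values the implied variances of F2 and F3 are 1.16 and 1.36 and their covariance is 0.74, because both factors also inherit variance from F1.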

Usage

cov.mtx(Model, nobs)

Arguments

Model

A lavaan model object specifying the measurement and structural components of the SEM.

nobs

An integer indicating the number of observed indicators (xs) in the model.

Value

Returns model implied covariance and correlation matrices.

Author(s)

Fatih Orçan

Examples

LavaanM <- '
# Measurement model (fixed factor loading)
F1 =~ 0.7*x1 + 0.7*x2 + 0.7*x3
F2 =~ 0.7*x4 + 0.7*x5 + 0.7*x6
F3 =~ 0.7*x7 + 0.7*x8 + 0.7*x9
# Structural regressions
F2 ~ 0.4*F1
F3 ~ 0.6*F1
# Fix latent variances
F1 ~~ 1*F1
# Residual variances
F2 ~~ 1*F2
F3 ~~ 1*F3
# Correlated residuals
F2 ~~ 0.5*F3
'

cov.mtx(Model=LavaanM, nobs=9)

Specifies the Factor Correlation Matrix for a Model

Description

This function generates a symmetric factor correlation matrix for a Structural Equation Model. The correlations must be provided as a vector of values between -1 and +1. The vector can be entered in either row-wise or column-wise order, but the correlations must be supplied in the correct sequence. If the model includes only a single factor, the function should be called as: "fcors.value(nf = 1, cors = c(1))"
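A base-R sketch shows why the input order does not matter: for a symmetric matrix, filling row-wise and column-wise gives the same result. The code uses matrix() directly and only mimics what fcors.value() is described as doing; the basic checks at the end are illustrative additions.

```r
nf   <- 3
cors <- c(1, .5, .6,
          .5, 1, .4,
          .6, .4, 1)

Phi.col <- matrix(cors, nrow = nf, ncol = nf)                 # column-wise fill
Phi.row <- matrix(cors, nrow = nf, ncol = nf, byrow = TRUE)   # row-wise fill

identical(Phi.col, Phi.row)   # both orders give the same symmetric matrix

# Basic sanity checks on a factor correlation matrix
isSymmetric(Phi.col)
all(diag(Phi.col) == 1)
all(abs(cors) <= 1)
```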

Usage

fcors.value(nf, cors)

Arguments

nf

An integer specifying the number of factors.

cors

A numeric vector of correlations among the factors. Values must be between -1 and +1.

Value

The function returns the factor correlation matrix.

Author(s)

Fatih Orçan

Examples

# This example represents a three-factor CFA model
#
fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))

Fits Structural Equation Models to Simulated Data Using lavaan.

Description

This function fits a pre-specified SEM to previously generated data sets (e.g., from sim.skewed() or sim.normal()) using the lavaan package. After model estimation, fit indices and parameter estimates with their standard errors are exported to a Comma-Separated Values (CSV) file named All_Results.csv. Each row in this file corresponds to the results of a single simulation. Most columns are self-explanatory; however, the second column (Notes) requires further clarification. This column indicates the convergence status of the model:

CONVERGE: The model converged without any issues.

NONCONVERGE: The model failed to converge; in this case, all values in the row are recorded as NA.

WARNING: The model converged but produced warnings (e.g., negative variance estimates). Depending on the warning type, some values may be recorded as NA.

To run the simulation, the previously generated data sets (created either with the package functions or with other software) must be stored in the same folder as the data set list file (Data_List.dat), that is, the folder given by f.loc.
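Once All_Results.csv exists, the Notes column can be used to summarize convergence and to filter rows before averaging fit indices. The sketch below writes a small mock file first so that it runs on its own; the folder name, the data file names, and every column other than Notes are hypothetical and will differ from the real output.

```r
# Mock results file (hypothetical columns apart from the second column, Notes)
out.dir <- file.path(tempdir(), "fit_demo")
dir.create(out.dir, showWarnings = FALSE)
mock <- data.frame(
  Data  = paste0("Data_", 1:5, ".dat"),
  Notes = c("CONVERGE", "CONVERGE", "WARNING", "NONCONVERGE", "CONVERGE"),
  CFI   = c(.98, .97, .91, NA, .99)
)
write.csv(mock, file.path(out.dir, "All_Results.csv"), row.names = FALSE)

# Read the results and tabulate the convergence status (second column)
res    <- read.csv(file.path(out.dir, "All_Results.csv"))
status <- table(res[[2]])
status

# Keep only cleanly converged replications before summarizing
clean <- res[res[[2]] == "CONVERGE", ]
mean(clean$CFI)
```

Whether rows flagged WARNING should be kept or dropped is a design decision for the simulation study; the filter above simply shows one option.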

Usage

fit.simulation(
  model,
  PEmethod = "ML",
  Ordered = FALSE,
  dataList = "Data_List.dat",
  f.loc,
  missing = NULL
)

Arguments

model

A lavaan model, written in lavaan model syntax as in the example below.

PEmethod

The parameter estimation method. The default is ML.

Ordered

Logical. If TRUE, variables are treated as ordered categorical; otherwise, they are treated as continuous.

dataList

The name of the file listing the data sets generated earlier, either with the package functions or with other software.

f.loc

File location. It indicates where the simulated data sets and "dataList" are located.

missing

A specification for handling missing data, as in the lavaan package (see lavOptions).

Author(s)

Fatih Orçan

Examples


# Step 1: Generate data
fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
sim.normal(nd=10, ss=100, fcors=fc, loading=fl, f.loc=tempdir())

# Step 2: Specify the model
lavaanM<-'
# CFA model
f1 =~ NA*x1 + x2 + x3
f2 =~ NA*x4 + x5 + x6
f3 =~ NA*x7 + x8
# Factor correlations
f1 ~~ f2
f1 ~~ f3
f2 ~~ f3
# Factor variances
f1 ~~ 1*f1
f2 ~~ 1*f2
f3 ~~ 1*f3
'
dl<-"Data_List.dat"  # must be available in the folder given by f.loc

# Step 3: Fit the model across simulated data

fit.simulation(model=lavaanM, PEmethod="MLR", Ordered=FALSE, dataList=dl, f.loc=tempdir())


Specifies Factor Loading Values for a Model.

Description

This function creates a factor loading matrix for a given Structural Equation Model (SEM). The loadings must be provided as a vector and are assigned to the matrix column by column, where each column corresponds to a latent factor and each row corresponds to an observed item. Each non-zero factor loading must be strictly greater than 0 and less than 1; as in the example below, a 0 marks an item that does not load on the factor. The resulting matrix has dimensions equal to the number of items by the number of factors.
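The column-by-column layout can be made visible with base R's matrix(), which fills by column by default. The sketch uses the same vector as the example for this function, regrouped by factor for readability; it does not call loading.value(), and the row and column labels are illustrative.

```r
nf <- 3
fl.loads <- c(.6, .6, .6, 0, 0, 0, 0, 0,    # factor 1: items 1 to 3
              0, 0, 0, .7, .7, .7, 0, 0,    # factor 2: items 4 to 6
              0, 0, 0, 0, 0, 0, .8, .8)     # factor 3: items 7 and 8

# matrix() fills column by column, so each block of 8 values becomes one factor
Lambda <- matrix(fl.loads, ncol = nf,
                 dimnames = list(paste0("item", 1:8), paste0("F", 1:nf)))
Lambda

dim(Lambda)              # items by factors
colSums(Lambda != 0)     # number of items per factor
```

The length of the vector must therefore equal the number of items times the number of factors (here 8 x 3 = 24).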

Usage

loading.value(nf, fl.loads)

Arguments

nf

An integer specifying the number of factors.

fl.loads

A numeric vector of factor loadings. Values should be provided in column-wise order, corresponding to the items loading on each factor.

Value

The function returns the factor loading matrix.

Author(s)

Fatih Orçan

Examples

# This example represents a three-factor CFA model
#  where the factors are indicated by 3, 3, and 2 items respectively.
#
loading.value(nf=3, fl.loads=c(.6,.6,.6,0,0,0,0,0,0,0,0,.7,.7,.7,0,0,0,0,0,0,0,0,.8,.8))

Simulates Categorical Data Sets Based on a Structural Equation Model (SEM).

Description

This function generates categorical data sets from a specified SEM. The simulated data are organized such that the first column represents case identifiers, while the subsequent columns contain the simulated item responses. For example, in a model with two factors and three items per factor, the column labels will follow the format: "ID, F1_x1, F1_x2, F1_x3, F2_x1, F2_x2, F2_x3". The number of rows equals the sample size of each data set. In addition to the generated data sets, two supplementary files are also saved: (1) "Model_Info.dat", containing the factor correlation and factor loading matrices; and (2) "Data_List.dat", listing the names of all generated data files.

Usage

sim.categoric(
  nd = 10,
  ss = 100,
  fcors,
  loading,
  f.loc,
  threshold,
  cont = "FALSE"
)

Arguments

nd

An integer, the number of data sets to be generated.

ss

An integer, the sample size per data set (must be greater than 10).

fcors

The factor correlation matrix, which must be symmetric. For one-factor models, this should be "matrix(1,1,1)".

loading

The factor loading matrix. Columns correspond to factors, and the non-zero entries in each column identify the items associated with that factor.

f.loc

File path indicating the directory where the generated data sets will be saved.

threshold

Threshold values used to categorize continuous simulated data.

cont

Logical: If TRUE, the original continuous data sets are also saved in addition to the categorical versions.

Author(s)

Fatih Orçan

Examples

fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))
tres<-c(-Inf, -1.645, -.643, .643, 1.645, Inf) # Five response categories

sim.categoric(nd=100,ss=100, fcors=fc,loading=fl, f.loc=tempdir(), threshold = tres)

Simulates Data Sets Based on a Structural Equation Model (SEM).

Description

This function generates data sets based on a specified SEM. The simulated data are organized such that the first column represents case identifiers, while the subsequent columns contain the simulated item responses. For example, in a model with two factors and three items per factor, the column labels will follow the format: "ID, F1_x1, F1_x2, F1_x3, F2_x1, F2_x2, F2_x3". The number of rows equals the sample size of each data set. In addition to the generated data sets, two supplementary files are also saved: (1) "Model_Info.dat", containing the factor correlation and factor loading matrices; and (2) "Data_List.dat", listing the names of all generated data files.
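One common way to generate normal data from a loading matrix and a factor correlation matrix is to build the implied covariance matrix and multiply standard normal draws by its Cholesky factor. The sketch below follows that approach for a single data set; it is not the code of sim.normal(), and the unit item variances (residual variance equal to 1 minus the squared loading) are an assumption.

```r
set.seed(4)                                  # reproducible illustration only
Phi    <- matrix(c(1, .5, .6, .5, 1, .4, .6, .4, 1), 3, 3)
Lambda <- matrix(c(.5, .5, .5, 0, 0, 0, 0, 0,
                   0, 0, 0, .6, .6, .6, 0, 0,
                   0, 0, 0, 0, 0, 0, .4, .4), ncol = 3)

Sigma <- Lambda %*% Phi %*% t(Lambda)        # common-factor part
diag(Sigma) <- 1                             # ASSUMPTION: standardized items

ss <- 100                                    # sample size
Z  <- matrix(rnorm(ss * nrow(Sigma)), ss)    # independent standard normals
X  <- Z %*% chol(Sigma)                      # rows now have covariance Sigma

# Column labels in the format described above: ID, F1_x1, F1_x2, ...
item.names <- unlist(lapply(1:3, function(f)
  paste0("F", f, "_x", seq_len(sum(Lambda[, f] != 0)))))
dat <- data.frame(ID = seq_len(ss), X)
names(dat)[-1] <- item.names

head(dat, 3)
```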

Usage

sim.normal(nd = 10, ss = 100, fcors, loading, f.loc)

Arguments

nd

An integer, the number of data sets to be generated.

ss

An integer, the sample size per data set (must be greater than 10).

fcors

The factor correlation matrix, which must be symmetric. For one-factor models, this should be "matrix(1,1,1)".

loading

The factor loading matrix. Columns correspond to factors, and the non-zero entries in each column identify the items associated with that factor.

f.loc

File path indicating the directory where the generated data sets will be saved.

Author(s)

Fatih Orçan

Examples


fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.6,.6,.6,0,0,0,0,0,0,0,0,.4,.4))

sim.normal(nd=10, ss=1000, fcors=fc, loading=fl, f.loc=tempdir())


Simulates Data Sets from a Structural Equation Model (SEM) with Normal or Non-Normal Distributions

Description

This function generates data sets based on a specified SEM. The simulated data are organized such that the first column represents case identifiers, while the subsequent columns contain the simulated item responses. For example, in a model with two factors and three items per factor, the column labels will follow the format: "ID, F1_x1, F1_x2, F1_x3, F2_x1, F2_x2, F2_x3". The number of rows equals the sample size of each data set. In addition to the generated data sets, two supplementary files are also saved: (1) "Model_Info.dat", containing the factor correlation matrix, the factor loading matrix, a vector indicating the non-normal items, and the coefficients B, C, and D from Fleishman's power method (where A = -C); and (2) "Data_List.dat", listing the names of all generated data files.
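Fleishman's power method transforms a standard normal variable Z into Y = A + B*Z + C*Z^2 + D*Z^3 with A = -C. The sketch below applies this polynomial with the coefficients from the example for this function and evaluates Fleishman's moment equations for the variance and skewness. It illustrates the transformation only; how sim.skewed() combines it with the factor model is not shown, and the object names are illustrative.

```r
fleis <- c(1.0174852, .190995, -.018577)   # B, C, D from the example
B <- fleis[1]; C <- fleis[2]; D <- fleis[3]
A <- -C                                    # constraint of the power method

set.seed(5)                                # reproducible illustration only
z <- rnorm(1e5)                            # standard normal scores
y <- A + B * z + C * z^2 + D * z^3         # Fleishman-transformed scores

# Theoretical moments implied by the coefficients (Fleishman's equations)
var.y  <- B^2 + 6 * B * D + 2 * C^2 + 15 * D^2
skew.y <- 2 * C * (B^2 + 24 * B * D + 105 * D^2 + 2)
c(variance = var.y, skewness = skew.y)

# Sample counterparts from the simulated scores
c(variance = var(y), skewness = mean((y - mean(y))^3) / sd(y)^3)
```

With these coefficients the moment equations give a variance and a skewness that are both approximately 1, in line with the comment in the example.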

Usage

sim.skewed(
  nd = 10,
  ss = 100,
  fcors,
  loading,
  nonnormal = NULL,
  Fleishman = NULL,
  f.loc
)

Arguments

nd

An integer, the number of data sets to be generated.

ss

An integer, the sample size per data set (must be greater than 10).

fcors

The factor correlation matrix, which must be symmetric. For one-factor models, this should be "matrix(1,1,1)".

loading

The factor loading matrix. Columns correspond to factors, and the non-zero entries in each column identify the items associated with that factor.

nonnormal

A numeric vector of 0s and 1s indicating whether each variable should be generated as normal (0) or non-normal (1). If not specified, all variables are generated as normal.

Fleishman

A numeric vector containing the coefficients B, C, and D from Fleishman's power method. Note that A = -C.

f.loc

File path indicating the directory where the generated data sets and auxiliary files will be saved.

Author(s)

Fatih Orçan

Examples


fc<-fcors.value(nf=3, cors=c(1,.5,.6,.5,1,.4,.6,.4,1))
fl<-loading.value(nf=3, fl.loads=c(.5,.5,.5,0,0,0,0,0,0,0,0,.3,.3,.3,0,0,0,0,0,0,0,0,.4,.4))
ifN<-c(1,1,1,0,0,0,0,0)
fleis<-c(1.0174852, .190995, -.018577) # The coefficients for skewness = 1, kurtosis = 1

sim.skewed(nd=10, ss=100, fcors=fc,loading=fl, nonnormal = ifN, Fleishman = fleis, f.loc=tempdir())