| Type: | Package | 
| Title: | Computes Copula using Ranks and Subsampling | 
| Version: | 0.9.9.3 | 
| Date: | 2023-04-06 | 
| Author: | Jerome Collet | 
| Maintainer: | Jerome Collet <jeromepcollet@gmail.com> | 
| Description: | Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases. | 
| License: | GPL (≥ 3) | 
| LazyLoad: | yes | 
| NeedsCompilation: | yes | 
| Packaged: | 2023-04-05 13:54:30 UTC; Jerome | 
| Repository: | CRAN | 
| Date/Publication: | 2023-04-05 17:50:05 UTC | 
Computes Copula using Ranks and Subsampling
Description
Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases.
Details
The DESCRIPTION file:
| Package: | subrank | 
| Type: | Package | 
| Title: | Computes Copula using Ranks and Subsampling | 
| Version: | 0.9.9.3 | 
| Date: | 2023-04-06 | 
| Author: | Jerome Collet | 
| Maintainer: | Jerome Collet <jeromepcollet@gmail.com> | 
| Description: | Estimation of copula using ranks and subsampling. The main feature of this method is that simulation studies show a low sensitivity to dimension, on realistic cases. | 
| License: | GPL (>= 3) | 
| LazyLoad: | yes | 
| NeedsCompilation: | yes | 
Index of help topics:
corc                    Function to estimate copula using ranks and
                        sub-sampling
corc0                   Function to estimate copula using ranks and
                        sub-sampling, minimal version.
desscop                 Discrete copula graph, a two-dimensional
                        projection
desscoptous             Discrete copula graph, ALL two-dimensional
                        projections
estimdep                Dependence estimation
predictdep              Probability forecasting
predonfly               Probability forecasting
simany                  Test statistic distribution under any
                        hypothesis
simnul                  Test statistic distribution under independence
                        hypothesis
subrank-package         Computes Copula using Ranks and Subsampling
Taking a sample, its dimension, and a sub-sample size, allows to estimate a discretized copula. This object has interesting features: convergence to copula, robustness with respect to dimension.
Author(s)
Jerome Collet
Maintainer: Jerome Collet <jeromepcollet@gmail.com>
Examples
lon <- 31
a <- 2.85
x <- rnorm(lon)
y = a*x^2+rnorm(lon)
tablo = as.data.frame(cbind(x,y))
c=corc(tablo,c(1,2),8)
desscop(c,1,2)
Function to estimate copula using ranks and sub-sampling
Description
Takes a sample, its dimension, a sub-sample size, and returns a discrete copula.
Usage
corc(dataframe, varnames, subsampsize, nbsafe=5,mixties=FALSE,nthreads=2)
Arguments
| dataframe | a data frame, containing the observations | 
| varnames | the name of the variables we want to estimate the dependence between | 
| subsampsize | the sub-sample size | 
| nbsafe | the ratio between the number of sub-samples and the cardinality of the discretized copula. | 
| mixties | if  | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
| cop | an array representing the discretized copula | 
| ties | the number of sub-samples with a tie | 
| nsubsampreal | the effective number of sub-samples drawn | 
| varnames | the name of the variables studied | 
| nnm | the number of observations without missing values | 
Author(s)
Jerome Collet
Examples
lon <- 30
a <- 2
x <- rnorm(lon)
y = a*x^2+rnorm(lon)
datatable = as.data.frame(cbind(x,y))
c=corc(datatable,c("x","y"),8)
c
sum(c$cop)
Function to estimate copula using ranks and sub-sampling, minimal version.
Description
Minimal version of function corc.
Usage
corc0(datavector,sampsize,dimension,subsampsize,nboot,u,mixties=FALSE,nthreads=2)
Arguments
| datavector | a vector, containing the observations | 
| sampsize | the sample size | 
| dimension | the sample dimension | 
| subsampsize | the sub-sample size | 
| nboot | the number of sub-samples (must be big) | 
| u | a random seed, integer | 
| mixties | if  | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
the number of hits for each vector of ranks, plus 2 last values of the vector : number of ties and number of sub-samples really used.
Author(s)
Jerome Collet
Examples
lon <- 30
a <- 2.85
x <- rnorm(lon)
y = a*x^2+rnorm(lon)
c=corc0(c(x,y),lon,2,8,1e5,75014)
c
c0=c(
1203, 1671, 1766, 959, 1586, 1715, 1803, 1205, 1260,1988, 2348, 1917, 3506, 2045, 1340,
1093, 2694, 2757,2233, 1085, 2322, 1793, 1569, 1263, 1709, 1747, 1512,1308, 1778, 1354,
1184, 1097, 2487, 2730, 2112, 1100,2435, 2033, 1572, 1093, 1369, 1722, 1462, 1015, 1228,
1419, 1776, 1852, 1009, 1097, 1179, 1323, 1595, 1316,1477, 2628, 889, 1178, 1981, 4000, 
35, 840, 2091, 4467,0, 27405)
set.seed(75013)
lon=30
dimension=3
sssize=4
c0==corc0(rnorm(lon*dimension),lon,dimension,sssize,1e5,75014)
Discrete copula graph, a two-dimensional projection
Description
Draws a discrete joint probability, for 2 variables, using bubbles
Usage
desscop(copest, xname, yname, normalize = FALSE, axes = TRUE)
Arguments
| copest | the estimated copula (the whole structure resulting from  | 
| xname | the name of the variable we want to put on the horizontal axis | 
| yname | the name of the variable we want to put on the vertical axis | 
| normalize | if TRUE, the smallest probability is rescaled to 0, and the largest to 1 | 
| axes | if TRUE, puts the name of the variables on the axes | 
Author(s)
Jerome Collet
Examples
lon <- 31
a <- 2.85
x <- rnorm(lon)
y = a*x^2+rnorm(lon)
tablo = as.data.frame(cbind(x,y))
c=corc(tablo,c("x","y"),8)
desscop(c,"x","y")
tablo = as.data.frame(cbind(x=rep(0,each=lon),y=rep(0,each=lon)))
c=corc(tablo,c("x","y"),8,mixties=TRUE)
desscop(c,"x","y")
Discrete copula graph, ALL two-dimensional projections
Description
Draws a discrete joint probability, for 2 variables, using bubbles
Usage
desscoptous(copest, normalize = FALSE)
Arguments
| copest | the estimated copula (the whole structure resulting from  | 
| normalize | if TRUE, the smallest probability is rescaled to 0, and the largest to 1 | 
Author(s)
Jerome Collet
Examples
lon <- 31
a <- 2.85
x <- rnorm(lon)
y = a*x^2+rnorm(lon)
z = rnorm(lon)
tablo = as.data.frame(cbind(x,y,z))
c=corc(tablo,c("x","y","z"),8)
desscoptous(c)
Dependence estimation
Description
From a set of observations, builds a description of the multivariate distribution
Usage
estimdep(dataframe,varnames,subsampsize,nbsafe=5,mixties=FALSE,nthreads=2)
Arguments
| dataframe | a data frame containing the observations | 
| varnames | the name of the variables we want to estimate the multivariate distribution | 
| subsampsize | the sub-sample size | 
| nbsafe | the ratio between the discretized copula size and the number of sub-samples | 
| mixties | if  | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
the description of the dependence, it is an object with the following parts:
| cop | the array representing the discretized copula | 
| margins | the matrix representing the margins, estimated using kernel density estimation | 
| varnames | the names of the variables | 
Author(s)
Jerome Collet
Examples
lon=3000
plon=3000
subsampsize=20
##############
x=(runif(lon)-1/2)*3
y=x^2+rnorm(lon)
z=rnorm(lon)
donori=as.data.frame(cbind(x,y,z))
depori=estimdep(donori,c("x","y","z"),subsampsize)
knownvalues=data.frame(z=rnorm(plon))
prev <- predictdep(knownvalues,depori)
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
knownvalues=data.frame(x=(runif(lon)-1/2)*3)
prev <- predictdep(knownvalues,depori)
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
knownvalues=data.frame(y=runif(plon,min=-2,max=4))
prev <- predictdep(knownvalues,depori)
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
Probability forecasting
Description
From a set of incomplete observations, and a description of the dependence, provides simulated values of the unknown coordinates. It is also possible to simulate unconditionally, with empty observations.
Usage
predictdep(knownvalues,dependence,smoothing=c("Uniform","Beta"),nthreads=2)
Arguments
| knownvalues | in case of conditional simulation, a matrix containing incomplete observations, the known coordinates being the same for all observations. If no variable name in  | 
| dependence | the description of the dependence we want to use to forecast, as built by function  | 
| smoothing | the smoothing method for input and output ranks. | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
the matrix of the completed observations
Author(s)
Jerome Collet
Examples
lon=100
plon=100
subsampsize=10
shift=0
noise=0
knowndims=1
x=rnorm(lon)
y=2*x+noise*rnorm(lon)
donori=as.data.frame(cbind(x,y))
depori=estimdep(donori,c("x","y"),subsampsize)
##
knownvalues=data.frame(x=rnorm(plon)+shift)
prev <- predictdep(knownvalues,depori)
##
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
##
knownvalues=data.frame(x=rnorm(plon)+shift)
prev <- predictdep(knownvalues,depori,smoothing="Beta")
##
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
# souci normal si |shift|>>1
knownvalues=data.frame(z=rnorm(plon)+shift)
prev <- predictdep(knownvalues,depori)
##
plot(prev$x,prev$y,xlim=c(-2,2),ylim=c(-2,5),pch=20,cex=0.5)
points(donori[,1:2],col='red',pch=20,cex=.5)
Probability forecasting
Description
From two sets of observations, first one of complete observations and second one of incomplete observations, provides simulated values of the unknown coordinates.
Usage
predonfly(completeobs,incompleteobs,varnames,subsampsize,nbpreds=1,mixties=FALSE,
          maxtirs=1e5,complete=TRUE,nthreads=2)
Arguments
| completeobs | the set of complete observations. | 
| incompleteobs | the set of incomplete observations. | 
| varnames | the modeled variables. | 
| subsampsize | the sub-sample size. | 
| nbpreds | the number of predictions for each incomplete observation. | 
| mixties | if  | 
| maxtirs | the maximum number of sub-samples, to stop the computation even if they did not provide  | 
| complete | If  | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
the matrix of the completed observations
Author(s)
Jerome Collet
Examples
lon=100
plon=30
subsampsize=10
x=rnorm(lon)
y=2*x+rnorm(lon)*0
donori=as.data.frame(cbind(x,y))
##
knownvalues=data.frame(x=rnorm(plon))
prev <- predonfly(donori,knownvalues,c("x","y"),subsampsize,100)
##
plot(prev$x,prev$y,pch=20,cex=0.5,
     ylim=range(c(prev$y,donori$y),na.rm=TRUE),xlim=range(c(prev$x,donori$x)))
points(donori[,1:2],col='red',pch=20,cex=.5)
lon=3000
mg=20
dimtot=4
rayon=6
genboules <- function(lon,a,d)
{
  ss <- function(vec)
  {return(sum(vec*vec))}
  surface=matrix(nrow=lon,ncol=d,data=rnorm(lon*d))
  rayons=sqrt(apply(surface,1,ss))
  surface=surface/rayons
  return(matrix(nrow=lon,ncol=d,data=rnorm(lon*d))+a*surface)
}
##############
donori=genboules(lon,rayon,dimtot)
donori=as.data.frame(donori)
dimconnues=3:dimtot
valconnues=matrix(nrow=1,ncol=length(dimconnues),data=0)
valconnues=as.data.frame(valconnues)
names(valconnues)=names(donori)[3:dimtot]
prev <- predonfly(donori,valconnues,names(donori),subsampsize,100)
boule2=genboules(plon,rayon,2)
plot(boule2[,1:2],xlab='X1',ylab='X2',pch=20,cex=.5)
plot(prev$V1,prev$V2,xlab='X1',ylab='X2',pch=20,cex=.5)
Test statistic distribution under any hypothesis
Description
Simulates the test statistic, under independence
Usage
simany(sampsize,dimension,subsampsizes,sampnum,nbsafe=5,nthreads=2, fun=NULL, ...)
Arguments
| sampsize | sample size | 
| dimension | sample dimension | 
| subsampsizes | vector of sub-sample sizes | 
| sampnum | number of samples | 
| nbsafe | the ratio between the number of sub-samples and the cardinality of the discretized copula. | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
| fun | the function describing the dependence. | 
| ... | optional arguments to  | 
Value
| lrs | the distances with independent case | 
| lrs2mean | the distances with theoretical value, given dependence  | 
| scarcities | the proportions of non-reached vector ranks | 
| DistTypes | a recall of the list of the distance types: "KL","L2","L1","APE" | 
Author(s)
Jerome Collet
Examples
depquad <- function(lon,dd,a)
{
  x <- rnorm(lon)
  y0 <- a*x^2
  y <- y0 + rnorm(lon)
  reste=rnorm((dd-2)*lon)
  return(c(x,y,reste))
}
sims0=simany(101,3,8,50,nbsafe=1)
seuils=apply(sims0$lrs,3,quantile,0.95)
seuils=matrix(ncol=4,nrow=50,seuils,byrow=TRUE)
sims1=simany(101,3,8,50,nbsafe=1,fun=depquad,a=0.5)
apply(sims1$lrs[,1,]>seuils,2,mean)
Test statistic distribution under independence hypothesis
Description
Simulates the test statistic, under independence
Usage
simnul(sampsize, dimension, subsampsizes, sampnum,KL=TRUE,nbsafe=5,nthreads=2)
Arguments
| sampsize | sample size | 
| dimension | sample dimension | 
| subsampsizes | vector of sub-sample sizes | 
| sampnum | number of samples | 
| KL | if TRUE, returns the Kullback-Leibler divergence with the independent case, if FALSE, the L2 distance. There is no re-normalization, contrary to what happens for  | 
| nbsafe | the ratio between the number of sub-samples and the cardinality of the discretized copula. | 
| nthreads | number of number of threads, assumed to be strictly positive. For "full throttle" computations, consider using parallel::detectCores() | 
Value
| lrs | the distances with independent case | 
| scarcities | the proportions of non-reached vector ranks | 
Author(s)
Jerome Collet
Examples
library(datasets)
# plot(swiss)
c=corc(swiss,1:3,8)
c
RV=sum(c$cop*log(c$cop),na.rm=TRUE)+3*log(8)
sims=simnul(47,3,8,100)
pvalue=mean(RV<sims$lrs)
pvalue
RV
summary(sims$lrs)