Adaptive VEM algorithm
Description
Main adaptive VEM algorithm, for the histogram method (with model selection) or for the kernel method.
Usage
mainVEM(
  data,
  n,
  Qmin,
  Qmax = Qmin,
  directed = TRUE,
  sparse = FALSE,
  method = c("hist", "kernel"),
  init.tau = NULL,
  cores = 1,
  d_part = 5,
  n_perturb = 10,
  perc_perturb = 0.2,
  n_random = 0,
  nb.iter = 50,
  fix.iter = 10,
  epsilon = 1e-06,
  filename = NULL
)
Arguments
data |
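Data format depends on the estimation method used. With method = "hist", pass list(Nijk = Nijk, Time = Time), where Nijk is the data matrix computed by statistics() and the process is observed on [0,Time]. With method = "kernel", pass the list of observed events including its Time component, as with generated_Q3_n20$data in the Examples. |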
n |
Total number of nodes |
Qmin |
Minimum number of groups |
Qmax |
Maximum number of groups |
directed |
Boolean for directed (TRUE) or undirected (FALSE) case |
sparse |
Boolean for sparse (TRUE) or not sparse (FALSE) case |
method |
Either "hist" (histogram method) or "kernel" (kernel method) |
init.tau |
List of initial values of tau |
cores |
Number of cores for parallel execution. If set to 1, execution is sequential. Beware: parallelization relies on fork (multicore) and does not work on Windows! |
d_part |
Maximal level for finest partition of time interval [0,T] used for k-means initializations.
|
n_perturb |
Number of different perturbations applied to the k-means result |
perc_perturb |
Percentage of labels that are to be perturbed (= randomly switched) |
n_random |
Number of completely random initial points. These count toward the total number of initializations for the VEM. |
nb.iter |
Number of iterations of the VEM algorithm |
fix.iter |
Maximum number of iterations of the fixed-point algorithm within the VE step |
epsilon |
Threshold for the stopping criterion of VEM and fixed point iterations |
filename |
Name of the file in which results are saved during the computation (at the increasing steps for Q). The file will contain a list of 'best' results. |
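Since the cores argument relies on forking, which is unavailable on Windows, one might guard the choice of its value before calling mainVEM. The following is a minimal base-R sketch; the helper name choose_cores is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): pick a safe value for `cores`.
choose_cores <- function(requested = 2L) {
  # Forking is not available on Windows, so fall back to sequential execution.
  if (.Platform$OS.type == "windows") return(1L)
  available <- parallel::detectCores()
  # detectCores() can return NA when the count cannot be determined.
  if (is.na(available)) return(1L)
  as.integer(max(1L, min(requested, available)))
}

cores <- choose_cores(requested = 4L)
```

The resulting value could then be passed as the cores argument.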
Details
The sparse version works only for the histogram approach.
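The histogram approach estimates intensities as piecewise-constant functions on a regular partition of [0,Time] into K = 2^d parts. As a rough illustration of that kind of binning, here is a generic base-R sketch on made-up event times for a single stream of events. It is not the package's estimator, and it does not reproduce the per-pair statistics computed by statistics().

```r
# Toy illustration only: bin event times on a regular partition of [0, Time].
set.seed(1)
Time <- 1
event_times <- sort(runif(200, min = 0, max = Time))  # made-up event times

d <- 3                                      # partition exponent
K <- 2^d                                    # number of parts, here 8
breaks <- seq(0, Time, length.out = K + 1)  # K + 1 endpoints of the K intervals

# Count events in each of the K intervals; the last interval is closed on the right.
bin <- findInterval(event_times, breaks, rightmost.closed = TRUE)
counts <- tabulate(bin, nbins = K)

# Generic histogram estimate of an intensity: count divided by interval length.
intensity_hat <- counts / (Time / K)
```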
Value
The function outputs a list of Qmax-Qmin+1 components. Each component is the solution obtained for a number of clusters Q, with Qmin <= Q <= Qmax, and is a list of 8 elements:
- tau: Matrix of size Q x n containing the estimated probabilities, with values between 0 and 1, that cluster q contains node i.
- rho: When method=hist only. Either 1 (non-sparse method) or a vector of length Q(Q+1)/2 (undirected case) or Q^2 (directed case) with estimated values for the sparsity parameters, one per group pair (q,l). See Section S6 in the supplementary material of Matias et al. (Biometrika, 2018) for more details.
- beta: When method=hist only. Vector of length Q(Q+1)/2 (undirected case) or Q^2 (directed case) with estimated values for the sparsity parameters, one per group pair (q,l). See Section S6 in the supplementary material of Matias et al. (Biometrika, 2018) for more details.
- logintensities.ql: When method=hist only. Matrix of size Q(Q+1)/2 x K (undirected case) or Q^2 x K (directed case). Each row contains estimated values of the log intensity function of one group pair (q,l) on a regular partition with K parts of the time interval [0,Time].
- best.d: When method=hist only. Vector of length Q(Q+1)/2 (undirected case) or Q^2 (directed case) with the estimated value of the exponent d of the best partition for estimating the intensity of each group pair (q,l). The best number of parts is K = 2^d.
- J: Estimated value of the ELBO.
- run: Which run of the algorithm gave the best solution. A run relies on a specific initialization of the algorithm. A negative value may be obtained in the decreasing phase (for Q) of the algorithm.
- converged: Boolean. If TRUE, the algorithm stopped at convergence. Otherwise it stopped at the maximal number of iterations.
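To make the shape of the returned value concrete, the sketch below builds a mock result (only tau, J and converged are filled in, with made-up numbers) and shows how the component for a given Q might be located and how hard cluster labels could be read off tau. Two assumptions are made here that the text above does not state explicitly: components are stored in increasing order of Q, and tau has one row per cluster and one column per node.

```r
# Mock output mimicking the documented structure (made-up values, not package output).
Qmin <- 2; Qmax <- 3; n <- 5
make_tau <- function(Q, n) {
  tau <- matrix(runif(Q * n), nrow = Q, ncol = n)
  sweep(tau, 2, colSums(tau), "/")   # normalize so each column sums to 1
}
set.seed(42)
sol <- lapply(Qmin:Qmax, function(Q) {
  list(tau = make_tau(Q, n), J = -100 - Q, converged = TRUE)
})

# Assumption: components are ordered by increasing Q, so Q sits at Q - Qmin + 1.
Q <- 3
sol_Q <- sol[[Q - Qmin + 1]]

# Hard clustering: assign each node i to the cluster q maximizing tau[q, i].
labels <- apply(sol_Q$tau, 2, which.max)
```

With real output, the same indexing would apply to the list returned by mainVEM, provided the ordering assumption holds.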
References
DAUDIN, J.-J., PICARD, F. & ROBIN, S. (2008). A mixture model for random graphs. Statist. Comput. 18, 173–183.
DEMPSTER, A. P., LAIRD, N. M. & RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39, 1–38.
JORDAN, M., GHAHRAMANI, Z., JAAKKOLA, T. & SAUL, L. (1999). An introduction to variational methods for graphical models. Mach. Learn. 37, 183–233.
MATIAS, C., REBAFKA, T. & VILLERS, F. (2018). A semiparametric extension of the stochastic block model for longitudinal networks. Biometrika 105, 665–680.
MATIAS, C. & ROBIN, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review. ESAIM Proc. & Surveys 47, 55–74.
Examples
# load data of a synthetic graph with 20 individuals and 3 clusters
n <- 20
Q <- 3

Time <- generated_Q3_n20$data$Time
data <- generated_Q3_n20$data
z <- generated_Q3_n20$z

step <- .001
x0 <- seq(0,Time,by=step)
intens <- generated_Q3_n20$intens

# VEM-algo kernel
sol.kernel <- mainVEM(data,n,Q,directed=FALSE,method='kernel', d_part=0,
    n_perturb=0)[[1]]
# compute smooth intensity estimators
sol.kernel.intensities <- kernelIntensities(data,sol.kernel$tau,Q,n,directed=FALSE)
# eliminate label switching
intensities.kernel <- sortIntensities(sol.kernel.intensities,z,sol.kernel$tau,
    directed=FALSE)

# VEM-algo hist
# compute data matrix with precision d_max=3 (ie nb of parts K=2^{d_max}=8).
K <- 2^3
Nijk <- statistics(data,n,K,directed=FALSE)
sol.hist <- mainVEM(list(Nijk=Nijk,Time=Time),n,Q,directed=FALSE, method='hist',
    d_part=0,n_perturb=0,n_random=0)[[1]]
log.intensities.hist <- sortIntensities(sol.hist$logintensities.ql,z,sol.hist$tau,
    directed=FALSE)

# plot estimators
par(mfrow=c(2,3))
ind.ql <- 0
for (q in 1:Q){
  for (l in q:Q){
    ind.ql <- ind.ql + 1
    true.val <- intens[[ind.ql]]$intens(x0)
    values <- c(intensities.kernel[ind.ql,],exp(log.intensities.hist[ind.ql,]),true.val)
    plot(x0,true.val,type='l',xlab=paste0("(q,l)=(",q,",",l,")"),ylab='',
        ylim=c(0,max(values)+.1))
    lines(seq(0,1,by=1/K),c(exp(log.intensities.hist[ind.ql,]),
        exp(log.intensities.hist[ind.ql,K])),type='s',col=2,lty=2)
    lines(seq(0,1,by=.001),intensities.kernel[ind.ql,],col=4,lty=3)
  }
}
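The plotting loop in the example enumerates the group pairs (q,l) with q <= l row by row and tracks their position with the counter ind.ql. A closed-form version of that index might look like the sketch below, under the assumption that rows of the undirected intensity matrices follow this same ordering. The helper pair_index is hypothetical and is not a package function.

```r
# Hypothetical helper: position of the pair (q, l), q <= l, when pairs are
# enumerated as (1,1), (1,2), ..., (1,Q), (2,2), ..., (Q,Q).
pair_index <- function(q, l, Q) {
  stopifnot(q >= 1, l >= q, l <= Q)
  # Pairs in rows 1..(q-1), plus the offset of l within row q.
  (q - 1) * Q - (q - 1) * (q - 2) / 2 + (l - q + 1)
}

# Check against the loop counter used in the example, for Q = 3.
Q <- 3
ind.ql <- 0
for (q in 1:Q) {
  for (l in q:Q) {
    ind.ql <- ind.ql + 1
    stopifnot(pair_index(q, l, Q) == ind.ql)
  }
}
```

For Q = 3 the loop visits Q(Q+1)/2 = 6 pairs, which matches the 2 x 3 plotting grid set by par(mfrow=c(2,3)).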