missSBM: An R Package for Handling Missing Values in the Stochastic Block Model

Pierre Barbillon, Julien Chiquet, Timothée Tabouy

Abstract

The stochastic block model is a popular probabilistic model for random graphs. It is commonly used to cluster network data by grouping nodes that share similar connectivity patterns into blocks. When fitting a stochastic block model to a partially observed network, it is important to account for the underlying process that generates the missing values; otherwise the inference may be biased. This paper presents missSBM, an R package that fits stochastic block models when the network is partially observed, that is, when the adjacency matrix contains not only 1s and 0s encoding the presence or absence of edges, but also NAs encoding missing information between pairs of nodes. The package implements a set of algorithms to fit the binary stochastic block model, possibly in the presence of external covariates, by performing variational inference suited to several observation processes. Our implementation automatically explores a range of block numbers and selects the most relevant model according to the integrated classification likelihood (ICL) criterion. The ICL criterion can also help determine which observation process best fits a given dataset. Finally, missSBM can be used to impute missing entries in the adjacency matrix. We illustrate the package on a network dataset consisting of interactions between political blogs sampled during the 2007 French presidential election.
