Inference tools for Markov Random Fields on lattices: The R package mrf2d

Markov random fields on two-dimensional lattices are behind many image analysis methodologies. mrf2d provides tools for statistical inference on a class of discrete stationary Markov random field models with pairwise interaction, which includes many of the popular models such as the Potts model and texture image models. The package introduces representations of dependence structures and parameters, visualization functions and efficient (C++ based) implementations of sampling algorithms, common estimation methods and other key features of the model, providing a useful framework to implement algorithms and working with the model in general. This paper presents a description and details of the package, as well as some reproducible examples of usage.


Introduction
A Markov random field (MRF) is a generalization of the well-known concept of a Markov chain where variables are indexed by vertices of a graph instead of a sequence and the notion of memory is substituted by the neighborhood (edges) of that graph. Markov random fields on lattices, or more generally, Gibbs distributions, have been studied in statistical mechanics as models for interacting particle systems. They range from the basic Ising model (or its generalization Potts model) with pairwise nearest-neighbor interaction to models with more complex interaction types, presenting long-range and/or higher-order interaction. For an introduction to the subject we refer to Liggett (2012) and references therein.
A finite 2-dimensional lattice is a direct representation of pixel positions on a digital image. Geman and Geman (1984) make an analogy between image models and statistical mechanics systems, introducing probability-based computational methods for image restoration under a specific type of noise. Higher-order dependence structures are also described, for example, In this paper, when we refer to a MRF, we consider the particular case where variables are indexed by points of a 2-dimensional lattice, not a general graph structure. The regular grid naturally creates a spatial structure and notions of distance and direction for the variables, allowing models to be specified based on this spatial structure (see Besag 1974, for examples).
Parametric inference based on maximum likelihood for such models is difficult, even for the simple models, because of the intractable constant that appears in the likelihood. Inference for the simplest non-trivial case of the Ising model was first studied by Pickard (1987) and continues to present challenges, see for example Bhattacharya and Mukherjee (2018). On the other hand, while there is a continuous development of methodologies used in MRFs in the theoretical field, implementing new algorithms is a challenge in practice, mostly due to the high-dimensionality of the problem and the complexity of the data structures required to represent the data in this type of problem. An overview of this topic, mainly from the Bayesian perspective, can be found in Winkler (2012).
Most methodologies developed are based on Monte Carlo Markov chain methods, thus simple tasks like evaluating pairs of pixels or sampling individual pixels need to be repeated millions or billions of times in iterative methods, depending on the image size, making an efficient implementation of such methods one of the main demands for researchers of the topic.
R (R Core Team 2021) is one of the most used programming languages among statistics researchers, what makes the existence of good packages important for any field of statistics. In a general context (considering the definition of MRFs with general undirected graphs), packages like graph (Gentleman, Whalen, Huber, and Falcon 2021) and network (Butts 2008) provides tools for representation and manipulation of graph structures which can be used for constructing and visualizing graph-based models. Different versions of graph-based MRFs appear in many packages. For example, the CRF package (Wu 2019) has inferential tools Markov random fields with pairwise and unary interactions and their hidden MRF version, MRFcov (Clark, Wells, and Lindberg 2018) allows inference for the interaction parameter of between nodes of a graph considering covariates and gamlss.spatial (De Bastiani, Rigby, Stasinopoulous, Cysneiros, and Uribe-Opazo 2018) allows fitting Gaussian Markov random fields in a spatial context, similar to INLA (Rue, Martino, and Chopin 2009) and mgcv (Wood 2017).
Outside of the R ecosystem, there are powerful software in C++ used for image analysis related to Markov random fields, such as the DGM library (Kosov 2013) and densecrf (Krähenbühl and Koltun 2011), which also has a Python wrapper (Beyer 2015), and can be used for a variety of tasks and use extremely efficient computational methods.
For discrete MRFs on lattice data, closer to what is proposed in mrf2d, there are some R packages available. The potts package (Geyer and Johnson 2020), implements simulation algorithms and parameter estimation via composite-likelihood for a Potts model with nearest-neighbor interactions only. PottsUtils (Feng 2018) also implements simulation and tools for computing normalization constants in one, two and three-dimensional Potts model. The package bayesImageS (Moores, Nicholls, Pettitt, and Mengersen 2020) provides Bayesian image segmentation algorithms considering Gaussian mixtures driven by hidden Potts models with slightly more complex interaction neighborhood. GiRaF (Stoehr, Pudlo, and Friel 2020) allows calculation on, and sampling from general homogeneous Potts model. The Pottslab (Storath and Weinmann 2014) package for MATLAB also provides image segmentation algorithms using the Potts model, including for multivariate-valued data.
Although the available packages for discrete-valued MRFs offer efficient implementations of their methods, they do not provide an interface that allows simple extensions to different cases, for example, different interaction types for different positions and sparse long-range interaction neighborhoods. Some of the algorithms used also rely on specific characteristics of the specific setups they consider and cannot be applied more generally.
The mrf2d package (Freguglia 2022) provides a complete framework for statistical inference on discrete-valued MRF models on 2-dimensional lattice data, where all the elements used by algorithms (such as conditional probabilities, pseudo-likelihood function, simulation, sufficient statistics and more) are available for the user, as well as many built-in model fitting functions.
The package uses the model described in Freguglia et al. (2020) as a reference. Many other models, such as the Potts model and auto-models, are particular cases of our model obtained by including restrictions to the parameters or using specific interacting neighborhoods. These neighborhoods can be freely specified within the package and 5 families of parameter restrictions are available to cover the particular cases.
mrf2d (Freguglia 2022) is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=mrf2d and can be installed and loaded with R> install.packages("mrf2d") R> library("mrf2d") The development version is in its GitHub repository and can be installed with R> devtools::install_github("Freguglia/mrf2d") This paper is organized as follows. Section 2 describes the model considered in mrf2d, Section 3 presents the main functionalities of the package and details of the implementation, which are illustrated by examples in Section 4. We finish with a discussion in Section 5. All the results of example code in this article were obtained using R version 4.0.2 and mrf2d version 1.0.

Model description
Let L ⊂ {i = (i 1 , i 2 ) ∈ N 2 } be a finite set of locations in a two-dimensional lattice region and Z = {Z i } i∈L a field of random variables indexed by those locations. The main purpose of mrf2d is to provide a general framework for Markov random field models which satisfy the following assumptions: (a) Finite support Each Z i can take values in Z = {0, . . . , C} for some finite C > 0.

(b) Pairwise interactions
The probability of a complete configuration P(Z = z) can be decomposed into a product of functions of the pairs (z i , z j ), i ̸ = j ∈ L.
(c) Homogeneous interactions For any two pixels i and j and any relative position r, the interaction between pixels i and i + r is the same as for pixels j and j + r, i.e., the interaction depends on the relative position of the pair of pixels, not on their position in the lattice.
These assumptions are satisfied by most commonly used models in image processing.
We use the representation in Freguglia et al. (2020) which expresses the probability distribution of the random field in the form of the exponential family and introduce additional constraints to parameter space and/or different dependence structures to include particular features of the model under study.

Homogeneous Markov random field with pairwise interactions
MRF models are characterized by their conditional independence property. Let N be a neighborhood system on L, then Z is a Markov random field with respect to N if Z i given its neighbors Z N i is conditionally independent from all other variables where Z −i denotes the set of variables {Z j , j ̸ = i}.
To start defining MRFs in an image processing context, a location of the lattice i ∈ L will be referred as a pixel i and an observed value of the variable z i ∈ Z as pixel value or color.
We denote by R ⊂ Z 2 a set of interacting relative positions such that, for no pair of elements r, r ′ ∈ R we have r ′ = −r (no position in R is a reflection of another). Based on R, we can construct a neighborhood system (interaction structure) N in such way that the set of neighbors of site i, N i can be represented by a graph with vertices L where there is an edge connecting i and j if, and only if, j = i ± r. For example, a nearest-neighbor structure corresponds to R = {(1, 0), (0, 1)}.
Given an interaction structure R, for any relative position r ∈ R the interactions associated to that relative position are characterized by a map θ r (·, ·), θ r : Z 2 → R. For a, b ∈ Z, the value θ r (a, b) is called a potential.
The model in mrf2d considers a neighborhood system N that connects pairs of pixel positions i, j such that i − j ∈ R. Under assumptions (a), (b) and (c), the Hammersley-Clifford theorem (Hammersley and Clifford 1971) implies that the probability function for Z belongs to the exponential family and can be described by a set of natural parameters θ = {θ r (a, b), r ∈ R, a, b, ∈ Z}, where Note that adding a constant a constant c r to the potentials associated with a relative position r ∈ R results in the same probability because the constant cancels when dividing by ζ θ . Thus, constraints for the potentials θ r (a, b) are necessary to obtain identifiability in the model. We consider θ r (0, 0) = 0 for all relative positions r, which ensures identifiability and also gives an interpretation for interactions in terms of the pair (0, 0): θ r (a, b) < 0 (resp. > 0) means that the pair (a, b) is less (resp. more) likely to appear in a pair with relative position r than (0, 0).

Potts model as a particular case
The Potts model (Potts 1952 Assumptions (a), (b) and (c) are satisfied, thus, we can rewrite Equation 4 in terms of an interaction structure R and potentials θ by noticing • The set i, j : ||i − j|| = 1 are vertical and horizontal pairs of neighbors, therefore, the interaction structure R is the set {(1, 0), (0, 1)}.
• The potential θ r (a, b) is equal to ϕ if a ̸ = b and 0 otherwise, regardless of r. The constraint θ r (0, 0) = 0 is satisfied in this definition. Therefore, we have the parameter restriction θ r (a, b) = ϕ1 (a̸ =b) for all r ∈ R.
This parameter restriction corresponds to the "onepar" family described in Section 3.2.

Important elements of the model
The main inference challenge for MRFs lies in the normalizing constant ζ θ appearing in Equation 2. It cannot be evaluated in practice as it requires summing over Z |L| possible field configurations and there is no analytical expression for it, except for trivial cases, leading to an intractable likelihood.
Being unable to evaluate the likelihood function hinders the use of most statistical methods. Inference under intractable likelihoods have been developed over the years. The main studies involve using conditional probability-based functions, like pseudo-likelihood (Jensen and Künsch 1994, for example) and Monte Carlo methods (Geyer and Thompson 1992;Møller, Pettitt, Reeves, and Berthelsen 2006, for example).
Although there is a wide variety of inferential methods available, most of them are built using the same pieces of the model. Thus, having access to each of these pieces is necessary to implement algorithms. We highlight important characteristics of the model available in mrf2d that are used by inference methods.

Conditional probabilities
A consequence of the Markov property (conditional independence) is a simple expression for conditional probabilities. H(z, θ) is a sum of terms that only depends on pairs of pixel values, which implies that all terms not involving position i cancel out when evaluating P(Z i | Z −i ). Define the part of the sum that involves the pixel in position i as The conditional probability of Z i = k given all other locations, P(Z i = k | Z N i ), is then given by the standard softmax of h i (k | z),

Pseudo-likelihood function
The pseudo-likelihood function (Besag 1974(Besag , 1975 is defined as the product of conditional probabilities of each variable given all other variables of a random field, Algorithm 1: Approximate sampling algorithm for MRFs using T steps of Gibbs sampler. Initialize z with a starting configuration z = z (0) ; Initialize the iteration counter t = 0; while t ≤ T do Sample {i (1) , i (2) , . . . , i (|L|) } a random permutation of the pixel positions L; for ℓ in 1, . . . , |L| do Update z i (ℓ) conditional to the rest of the field z −i (ℓ) with probabilities from Equation 6; end t = t + 1; Result: output the final configuration z. end In the special case of an independent field, it is equivalent to the likelihood function. Notice that the pseudo-likelihood function does not depend on the intractable normalizing constant and Equation 7 is numerically equivalent to a logistic regression problem where each pixel values corresponds to independent observations and the interacting pixel values are covariates with coefficients corresponding to the associated potentials.

Generating MRFs via Gibbs sampler
While exact sampling from dependent and high-dimensional processes is a challenging task overall, the conditional independence of MRFs simplifies the implementation of the Gibbs sampler algorithm (Geman and Geman 1984). In the Gibbs sampler algorithm, each pixel value is updated conditionally to the current state of its neighbors and a Gibbs sampler cycle consists of updating each pixel exactly one time.
To avoid introducing any kind of bias due to updates order, a random permutation of L is drawn to define the order in which pixels are updated at each cycle. After running a suitable number of cycles in Algorithm 1, the distribution of the resulting field sampled in the process is approximately the joint distribution of the MRF.
Sampling a field conditional to a subset of pixel values can be achieved with the same algorithm by skipping the updates for those pixels which are being conditioned on.
There exist faster mixing algorithms for particular cases such as Swendsen-Wang algorithm (Wang and Swendsen 1990), but they require specific conditions from the model and/or particular implementations to be efficient. Therefore, despite its slower mixing times in some scenarios, we keep the Gibbs sampler as the method of choice in this work due to its generalization ability as it only requires computing conditional distributions.

Sufficient statistics
An important computational consequence of the model assumptions is the fact that, in order to evaluate the probability (or likelihood) function for a particular observed field z, it is not necessary to determine the values of each pixel individually, but only the co-occurrence counts for each relative position r ∈ R.

The function H(z, θ) can be rewritten as
where n a,b,r (z) = i∈L 1 (z i =a,z (i+r) =b) is the count of occurrences of the pair (a, b) ∈ Z 2 in pairs of pixels with relative position r. Therefore, is a vector of sufficient statistics, where each component n a,b,r (z) is associated with a corresponding potential θ r (a, b). Gimel'farb (1996) calls this sufficient statistic the co-occurrence histogram.
Parameter constraints reduce the dimension of the sufficient statistic. Our identifiability constraint θ r (0, 0) = 0 implies that all n 0,0,r (z) are excluded from S R (z) and equality constraints require aggregating (sum) co-occurrence counts to match the parameter dimension. We shall keep the same notation for the constrained version of the sufficient statistics S N (z) and potentials θ.
The main advantages of the representation with sufficient statistics are the reduced memory usage in Monte Carlo methods and a convenient representation of H(z, θ) as an inner product that simplifies dealing with likelihood ratios as it is done in Geyer and Thompson (1992),

Gaussian mixtures driven by hidden MRFs
Another class of models present in the image processing field are hidden Markov random field models (HMRFs). The hidden version considers a latent (unobserved) process, denoted Z and an observed field, denoted Y, where Z is distributed as a MRF and the distribution of Y | Z is reasonably simple.
In this type of modeling, Z is often considered the "true" image and Y is a noisy image. Usually the goal of the analysis in this context is to recover the underlying field. Note that for the models considered in this work, where Z has finite support, the hidden field defines a segmentation of the image, making it a suitable approach for image segmentation.
In mrf2d, we provide built-in tools for the case where Y | Z is a finite Gaussian mixture with mixture components driven by the hidden field. Additional covariates can also be included as fixed effects for the mean, Given the latent field, observed values Y are assumed to be independent leading to the conditional density and the complete likelihood function Inference for this models involves estimating the parameters (µ k , σ k ) k=0,1,...,C and β associated with the Gaussian Mixture and predicting the labels of the latent field z simultaneously. Bayesian methods and the EM algorithm are the most common approaches. The parameters of the latent field distribution θ are fixed a priori and considered tuning hyper-parameters of the algorithm.

Model representation
The model described in Section 2 can be completely characterized by three components: the random field z, the interaction structure R and the potentials θ. Additionally, y and the mixture parameters (µ k , σ 2 k ) k=0,...,C are included in hidden MRFs. A consistent representation of each component is provided in the package so that inputs and outputs of built-in functions, as well as methods the user may implement, are compatible and usable in the analysis pipeline. Representations are described in Table 1.

Random fields z and y.
Realizations of a random fields z and y are represented by simple matrix objects with di- A three-dimensional array object with dimensions (C + 1) × (C + 1) × |R|. For a pair of values (a, b) and the s-th interacting relative position r s of R, the corresponding potential is mapped at theta[a+1, b+1, s]. coordinates. This matrix represents a rectangular set of pixels that contains L. The value in row i 1 and column i 2 represents the observed value of the random field in position (i 1 , i 2 ): an integer in {0, 1, . . . , C} for z or a real number for y.
We do not require L to be a complete rectangular region. Pixels which position does not belong to L are assigned the NA value.
Two functions are available for visualizing random fields: dplot() and cplot(). dplot() should be used for discrete-valued fields z while cplot() is used for continuous-valued matrices y. These functions provide an alternative to base R image() function, producing elegant images in the form of ggplot objects. The main advantage is that they allow the use of the ggplot2 package (Wickham 2016) to customize the image using the grammar of graphics.
Details and examples of customization of the images produced using ggplot2 can be found in Appendix A.

Interaction structures R
Interaction structures are represented by objects of the S4 class 'mrfi' implemented in mrf2d. These objects can be created with the mrfi() or rpositions() functions, which have arguments max_norm, norm_type and positions. In mrfi(), the interaction structure created will include all relative positions which satisfy ||(i 1 , i 2 )|| ≤ max_norm for the specified norm type. positions can be passed as a list containing length 2 integer vectors with relative positions to include when using rpositions(). The function automatically checks for repeated and opposite relative positions to ensure the structure is valid.
An algebra of mrfi objects is implemented for manipulating these objects. + is used to perform union of two mrfi objects or a mrfi object and a numeric vector with 2 integers can be used to add a single interacting position to an existing mrfi object. Similarly, theoperator can be used to perform set difference between two mrfi objects or to remove a single position if a vector with 2 integers is used in the right-hand-side.
Some examples for creating different R are detailed below.
Additionally, conversion of mrfi objects to list is implemented in the as.list() method. Subsetting methods are also available with the "[]" and "[[]]" operators. These methods are particularly important for model selection algorithms, as many distinct sparse interaction structures can be obtained by using different subsets of a large reference base structure.
A plot method is available for mrfi objects. The code chunk below exemplifies the usage of plotting functions and manipulation of mrfi objects. The resulting plots are presented in Figure 2. The black square represents the origin position (0, 0), positions included in the interaction structure R are represented by the dark-gray squares with black borders, while their opposite directions are the light-gray squares.

Potentials array θ
The collection of potentials, θ r (a, b), is represented by an array object with dimensions (C + 1) × (C + 1) × |R|. Rows and columns are used to map a and b, respectively, while slices are used to map relative positions r. A set of potentials {θ r (a, b), a, b ∈ Z, r ∈ R} is always related to an interaction structure R = {r 1 , r 2 , . . . , r |R| }. Since the i-th slice maps the i-th relative position of R, r i , this is the minimal representation required to store all parameters required by the model.
An important detail is that array indices in R start at 1, while we consider our set of possible values Z = {0, 1, . . . , C}, therefore we need to shift a and b by one position when accessing their value in the R array. Figure 3 illustrates how potentials can be represented as an array in R in the C = 2 case. Two elements are highlighted and the associated indices used to access them are shown as examples.

Parameter restriction families
Parameter restrictions play an important role in the inference process of our Markov random field models. mrf2d functions support 5 families of parameter restrictions for the array of potentials to be considered in inference algorithms. They are specified by the family argument of functions to ensure the resulting output array (theta) respects those constraints. A brief description of each interaction structure is given next. Table 2 presents the mathematical definitions, number of free parameters and an example of a slice of the array of potentials for the case with C = 2 in each family.
"absdif" For each r ∈ R, the potentials θ r (a, b) depend only on the absolute differences of their pixel values d = |b − a|. Note that "absdif" is equivalent to "oneeach" when C = 1.
"dif" Generalizes the "absdif" family allowing opposite signal differences to have different interactions.
Families "dif" and "absdif" should only be used when pixel values represent quantities and their differences are well-defined, for instance, in grayscale images with few levels, as in the example analysed in Gimel'farb (1996). If relabeling the values does not change the interpretation of the problem, then these restrictions are probably not suitable.
The function smr_array(theta, family) can be used to transform a parameter array into a vector of appropriate length containing only the free parameters corresponding to the provided array (theta) and the restriction family. The opposite operation is also available as the expand_array(theta_vec, family, mrfi, C) function. These transformations use a simpler and less memory consuming structure so they are particularly useful for optimization problems as most functions, for example R built-in optim function, require a vector of parameters. Also, they are convenient for storing multiple vectors, for example in Monte Carlo methods.

Random field sampler
Being able to sample observations of Markov random fields is a key component of many inference methods that aim to avoid the intractable normalizing constant. In mrf2d, a complete and efficient routine to sample fields using the Gibbs sampler algorithm described in Algorithm 1 is provided by the rmrf2d() function. Its arguments are: • init_Z: The initial field configuration, or a length-2 vector with the dimensions of the field to be sampled. If the dimensions are provided, the initial configuration is randomly sampled from independent discrete uniform distributions.
• mrfi: A mrfi object representing the interaction structure R.
• theta: An array of potentials.
• cycles: The number of Gibbs sampler cycles.
• sub_region: Optional argument used for non-rectangular images when init_Z is a vector holding the dimensions. A logical matrix with the same dimensions as specified in init_Z. Pixels with FALSE value are not included in the image.
• fixed_region: Optional. A matrix with logical values. Pixel positions with TRUE value are conditioned on their initial configuration (init_Z) value and are not updated.
We illustrate below the use of the sampling function on two fields: a 200×200 field (z_sample) without conditioning on any pixel (nothing specified in fixed_region) and a 100 × 100 field (z_border) conditioning on the boundary values being fixed as 0, sampled from a random initial configuration. The resulting images are presented in Figure 4. Non-rectangular fields can be sampled either by passing a non-rectangular field as the init_Z argument or by using the sub_region argument and specifying the dimensions of the sampled field.
Another important feature is conditioning on a subset of pixel values. There are many situations where keeping a subset of pixels fixed during the sampling process can be useful, for example, filling a region of missing pixel values via simulation, defining boundary conditions (our model corresponds to a free boundary condition, but other types such as fixed or periodic boundary can be sampled with proper manipulation of the initial configuration and conditioning region) or performing block-wise updates of the data using conditionally independent blocks (for parallelization of algorithms).
Since the Gibbs sampler algorithm updates each pixel value multiple times, performance is one of our main implementation concerns. To improve the performance and speed up computations considerably, the internals of the sampling function, as well as most other computationally intensive functions are written in C++ with the use of Rcpp (Eddelbuettel and François 2011) and RcppArmadillo (Eddelbuettel and Sanderson 2014) packages.

Statistical inference in mrf2d
Inference methods for MRF models are diverse and their suitability highly depend on the type of data being analyzed. The framework provided by mrf2d can be used to implement all sorts of algorithms that are built from a common stack of components: simulation, conditional probabilities and sufficient statistics. It also provides complete built-in routines for some estimation algorithms.

Miscellaneous rmrf2d
Generates samples of a MRF via Gibbs sampler. Used for Monte Carlo based methods. cp_mrf2d Computes the conditional probabilities for a pixel position given its neighbors. pl_mrf2d Computes pseudo-likelihood value for an observed field considering interaction structure mrfi and array of potentials theta. cohist Creates the co-occurrence histogram of an observed field given an interaction structure. Can be converted to a vector of sufficient statistics given a restriction family with the smr_stat function. smr_array and expand_array Conversions between array and vector representation of potentials given a parameter restriction family.

Built-in inference algorithms
fit_pl Estimates the parameter array given an observed field via pseudolikelihood optimization. Returns a mrfout object. fit_sa Estimates the parameter array given an observed field via stochastic approximation algorithm. Returns a mrfout object. fit_ghm Fits a Gaussian mixture driven by a given hidden MRF model using the EM algorithm from Zhang et al. (2001). polynomial_2d and fourier_2d can be used to create polynomial and 2-dimensional Fourier basis functions, respectively, to be used as a fixed effect. Returns a hmrfout object. Table 3: List of available functions used for inference in mrf2d with a brief description of each one. Table 3.4 presents a list of functions available in the package that can be used to construct inference algorithms, as well as built-in functions for parameter estimation for MRF and for the hidden MRF models defined in Section 2.3 that we describe next. The built-in inference functions return objects of class 'mrfout' (MRF data) or 'hmrfout' (hidden MRF models), which contains the information about the fitted model, as well as summary and plot methods associated for interpretation of the results.

Maximum pseudo-likelihood estimation
The pseudo-likelihood function in Equation 7 can be evaluated efficiently because it does not depend on the intractable normalizing constant. A common estimation procedure for intractable likelihood problems is optimizing the pseudo-likelihood with respect to the parameters. The pseudo-likelihood estimator is given bŷ The function fit_pl() from mrf2d implements an optimization of the pseudo-likelihood function using R built-in optim() function. It handles the conversions between array and vector representation of potentials automatically, respecting the restriction family selected and returns the estimated array of potentials and maximum value of the pseudo-likelihood in logarithmic scale. The arguments of fit_pl are: • Z: The observed random field z.
• mrfi: A mrfi representing an interaction structure R.
• family: A parameter restriction family.
• init: An array with the initial configuration used in the optimization. 0 can be used to start from the independent model.
• optim_args: A named list with additional arguments passed to the optim() function call.

Stochastic approximation algorithm
Given an observed field z (0) , the stochastic approximation algorithm (Robbins and Monro 1951) seeks to create a Markov chain of parameter vectors {θ (t) } t≥1 that converges to the maximum likelihood estimate of θ, which is the solution of the zero gradient condition E θ (S R (Z)) = S R (z (0) ), derived from Equation 9. The algorithm is defined by the recurrence where z (t) is a field sampled using θ (t) and γ (t) is a sequence of positive constants that satisfies ∞ t=1 γ (t) = ∞ and ∞ t=1 γ (t) 2 < ∞.
Stochastic approximation is implemented in mrf2d as the fit_sa() function. It samples z (t) via Gibbs sampler considering the previous field z (t−1) as the initial configuration. Periodically, the field samples are refreshed, starting from an independent discrete uniform distribution and running a greater number of Gibbs sampler cycles, this procedure prevents the algorithms from getting stuck in problematic field samples. Its arguments are: • Z: The observed field z (0) .
• family: The family of parameter restrictions considered when converting the potentials array to a vector.
• gamma_seq: A sequence of step size values to be used as γ (t) . These values are divided by the number of pixels |L| internally to be invariant with respect to the image size. The typical sequence recommended is seq(from = M, to = 0, length.out = B), with M ranging from 0.5 to 2 and large number of iterations (B).
• init: The initial array of parameters or the value 0 to start from the independent model.
• cycles: Number of Gibbs sampler ran between iterations.
• refresh_each: Restarts the sample z (t) from a random configuration each refresh_each iterations.
• refresh_cycles: When a refresh happens, how many Gibbs sampler cycles are ran in the current parameter configuration.
Among the elements of the returned mrfout object, theta contains the estimated potential array and metrics a data frame with the Euclidean distances between S R (z (0) ) and S R (z (t) ) for each iteration. This sequence of distances is used to monitor the convergence of the algorithm as a form of diagnostics analysis.

EM algorithm for HMRF models
Gaussian Mixtures driven by hidden MRFs can be fitted in mrf2d with an extension of the EM algorithm from Zhang et al. (2001) to include a fixed effect (see Freguglia et al. 2020, for details). The probabilities computed for the latent label of each pixel in the E-step are conditioned on the global maximum probability configuration of its neighbors, obtained via iterated conditional modes (ICM) algorithm at each iteration (Besag 1986).
The complete algorithm is available in the fit_ghm function. Its main arguments are • Y: The observed continuous-valued field y.
• mrfi: Interaction structure of the latent field R.
• theta: The array of potentials that defines the latent field distribution.
• fixed_fn: A list of functions of pixel positions f (i 1 , i 2 ) to be used as fixed effect. Constructors for 2-dimensional polynomials and Fourier basis are available in the functions polynomial_2d and fourier_2d, respectively.
• equal_vars: A logical value indicating if mixture components are forced to have equal variances.
• init_mus and init_sigmas: Optional initial values of (µ a , σ a ) a=0,...,C . If none is passed, an independent Gaussian mixture is fitted with initial values based on quantiles and the estimates of this fitted model are used as (often good) starting values in the main procedure.
• maxiter: Maximum number of iterations before stopping.
• max_dist: Defines a stopping condition for the EM algorithm. For consecutive iterations t and t+1, the absolute difference in each parameter, |µ | are computed for k = 0, 1, . . . , C. The algorithms stops if all differences are less than max_dist. configuration of the latent field computed via ICM algorithmẑ, fixed -a matrix with the estimated fixed effects x ⊤ iβ for each pixel, and predicted -a matrix with the predicted mean for each pixel x ⊤ iβ +μẑ i .

Description
To illustrate the usage of mrf2d for finite-valued images, we use the object field1 available in the package, which contains a binary field with anisotropic pattern as seen in Figure 5. It is a synthetic texture image of the same type as the binary texture data presented in Cross and Jain (1983). The data can be loaded and viewed using the code chunk below.
R> data("field1", package = "mrf2d") R> dplot(field1, legend = TRUE) Our goal is to fit a MRF model to this data and sample images from the fitted model to evaluate if the patterns achieved in the generated data are similar to the original data. This is the typical setup of a texture synthesis problem with finite-valued images.
This analysis involves three main stages: Specifying the model (interaction structure and parameter restrictions), estimating the parameters and evaluating the fitted model. All of the run times described in this section were obtained using an Intel Core i7-7500U 2.70GHz CPU processor.

Specifying R and parameter family
Model selection under intractability is a challenging problem because most algorithms require comparing (maximum) likelihood functions for different models, what cannot be done exactly and/or have a high computational cost for MRF models.
The main routes in the model specification stage are: using prior information of the data or problem to select what type of restrictions and interaction structure are best suited, using the most general model (e.g., no restrictions and a complex interaction structure) as in Freguglia et al. (2020) or using some estimation technique. This image presents a diagonal pattern what indicates a nearest-neighbor interaction structure may not be appropriate to capture all the dependence present in the field. We first choose what kind of parameter restriction family will be considered by checking that relabeling the values Z does not change the patterns in the image indicating symmetric potentials should be suited for this image, and there is a clear difference in the interactions when considering pixels in different directions, what indicates we need different types of interaction for different relative positions. These characteristics match the "oneeach" family that will be used in this example.
In order to estimate the set of interacting positions R, we use use a naive algorithm which consists of performing 300 steps of stochastic approximation considering a large set of candidate interacting positions (all positions with maximum norm less or equal 6, 84 positions total) and then select the positions with absolute value of the associated potential higher than a threshold value. This is a strategy similar to the heuristic search algorithm from Gimel'farb (1996). Stochastic approximation was preferred over maximum pseudo-likelihood, for example, because it is computationally more suited for high-dimensional situations and we are not requiring a very accurate estimation at this point, so we can use a reasonably low number of iterations.
The code below implements this naive interaction selection algorithm in a few lines using the tools available in mrf2d considering a threshold value of 0.10. A large set of interaction position (candidates) is defined and the stochastic approximation algorithm is executed based on this complete interaction structure, obtaining complete_sa, then a selection based on thresholding absolute values of interactions is performed (selected).
R> plot(complete_sa) R> thr_value <-0.1 R> theta_vec <-smr_array(complete_sa$theta, "oneeach") R> selected <-which(abs(theta_vec) > thr_value) R> R1 <-candidates[selected] R> R1  Estimating θ Considering the parameter restriction family "onepar" and the selected interaction structure R = {(1, 0), (0, 1), (4, 4)}, we have a model with 3 free parameters. A 3-dimensional optimization problem is simple enough to be solved using the built-in pseudo-likelihood optimization function. We also fit the model via stochastic approximation, now only considering the selected interaction structure, for comparison. The results are compared with the summary() method for the 'mrfout' class and presented below. The resulting estimates were roughly the same using the two functions. The fit_pl call ran in 2.371 seconds while the fit_sa call took 56.627 seconds to complete. The optimization process from maximum pseudo-likelihood estimation was substantially faster, mainly due to the low-dimensionality of the problem, than running a satisfactory number of stochastic approximation steps to achieve reasonable precision.

Evaluating the fitted model
To evaluate how well the estimated parameters fit the data, we generate a new sample from the fitted model. Figure 8 shows the original image and the image simulated from the fitted model for comparison. The patterns created are visually very similar. Therefore the fitted MRF model successfully describes the characteristics of the data and is capable of synthesizing new images with the same texture pattern.

Description
The data available in the object hfield1 in the package will be used to illustrate the use of mrf2d for Gaussian mixtures driven by hidden MRFs. It consists of an image with continuousvalued pixels ranging from 0.3 to 15.2. A pattern similar to the previous example can be observed with the addition of a continuous noise.
R> data("hfield1", package = "mrf2d") R> cplot(hfield1) We consider an image composed by a latent (hidden) MRF plus a random noise whose distribution for each pixel may depend on the pixel value in the latent field. The main goal for this type of data is to recover the segmentation of the underlying pixel labels, an image segmentation problem (Li, Wu, and Zhang 2009;Shah and Chauhan 2015, for example). This is a typical problem where Gaussian mixtures driven by hidden Markov random fields are well suited.

Fitting a hidden MRF with no fixed effect
The built-in function for fitting hidden MRFs (fit_ghm()), like most algorithms used for Gaussian mixtures driven by HMRFs, considers the distribution of the underlying field as a hyper-parameter specified a priori. In this example, since we observe an underlying pattern similar to one in Example 1, we will reuse the model estimated by penalized likelihood as the MRF distribution.
We fit a HMRF model to the data using the fit_ghm function. The mixture parameters estimates are shown below and the resulting segmentation is presented in Figure 10(b).

Adding a polynomial trend as fixed effect
A HMRF model without covariates has an intrinsic assumption that the mean values of pixel intensities, given their labels, are homogeneous along the image region. This is not the case for the data in Figure 9, as a vertical gradient effect can be observed.
In order to incorporate this spatial effect not captured in the model, we include spatial covariates, in the form of polynomial functions of pixel positions (i 1 , i 2 ) as a fixed effect. These covariates can be specified in fixed_fn argument of the function and a (centered) polynomial can be created with the polynomial_2d() function from mrf2d. In this example, we include all terms of a two-dimensional centered cubic polynomial, that is, where the centering position (c 1 , c 2 ) is the middle pixel position of the image.
The code chunk below illustrates how the resulting fields available in the function output can be visualized. The results are presented in Figure 10.
R> cplot(hfield1) R> dplot(hmrf_nofixed$Z_pred, legend = TRUE) R> cplot(hmrf_poly$fixed) R> dplot(hmrf_poly$Z_pred, legend = TRUE) This example highlights two features of mrf2d that are not available in other packages: The possibility to specify a distribution for the underlying field that is more flexible than a simple Potts model and the option to include covariates (in this example, the polynomial trend) that are estimated simultaneously to the mixture parameters, preventing undesired effects in the segmentation results.

Example 3: Neuroimaging segmentation with BOLD5000 data
Neuroimaging is one of the most frequent applications of HMRF models (Zhang et al. 2001;Shah and Chauhan 2015). We illustrate a brain magnetic resonance image segmentation using a sample of the BOLD5000 dataset (Chang, Pyles, Marcus, Gupta, Tarr, and Aminoff 2019) available in the bold5000 object in the package.
R> data("bold5000", package = "mrf2d") R> cplot(bold5000) Our main goal in this problem is to segment the brain image into large regions corresponding to different elements, like a background, bones, fat, grey matter, white matter, etc.
The most common approach for the segmentation using HMRFs is to consider a simple Potts model (nearest-neighbor interaction structure R = {(1, 0), (0, 1)} and the "onepar" parameter restriction family. The potential associated with different-valued pairs controls, as well as the number of components are considered fixed a priori and will not be discussed in this paper. For the purpose of illustration, we use 4 components (C = 3) and the value −1 for the potentials of different-valued pairs.
R> Rnn <-mrfi(1) R> theta_nn <-expand_array(-1, family = "onepar", C = 3, mrfi = Rnn) We add a constraint that all variance parameters of the mixture components must be equal by setting the equal_vars parameter to TRUE. This improves the results in this problem by preventing some of the mixture components to be estimated with too high variance, what may causes pixels with large and small values to be predicted in the same class with high probability. We also fit an independent Gaussian mixture (by multiplying all potentials by zero) for a comparison with the HMRF model. Segmentation results are presented in Figure 12    The resulting parameter estimates are not much different when comparing the independent mixture model and the HMRF and both ran in approximately 85 seconds, but the segmentation is cleaner when using the HMRF model, without sparse different-labeled pixels inside regions.

Discussion
mrf2d provides a consistent programming interface for statistical inference in a large class of discrete Markov random field models defined on 2-dimensional lattices. It has an efficient and simple to use implementation of the main stack of computations used by most of inference algorithms, as well as complete routines for some commonly used and more complex estimation methods. The objects used for representing each model component have been carefully designed and tuned over several iterations to achieve a balance between performance and usability in the stable version.
The model featured in the package generalizes Potts model from other available packages in different ways, such as allowing a flexible definition of interacting pixel positions and interaction types, with the drawback that it cannot take advantage of algorithms that require the setup of a Potts model to improve their efficiency.
The versatility from the non-parametric model behind mrf2d and the flexible representation proposed in the package allows us to create special 2-dimensional structures that are equivalent to 3-dimensional representations of data. This can make mrf2d also an interesting option for applications with 3-dimensional data where the assumptions of the model also hold. A vignette is available with a deeper explanation on how to reshape 3-dimensional problems for the mrf2d framework and it can be viewed by using the vignette() function.
R> vignette("three-dimensions-on-mrf2d", package = "mrf2d") We currently have over 160 unit tests supported by the testthat package (Wickham 2011) and more than 90% of the code covered in the tests. These tests were designed to verify mathematical correctness of functions, the behavior of functions with unexpected input and the consistency of error messages. The package is in constant development and new tests are added whenever new functionalities are implemented to ensure its reliability over time.
For these reasons, mrf2d is an important tool for making statistical inference in images using Markov random fields more accessible, allowing researchers to perform data analysis and implement new algorithms in R with a simple and consistent framework.

A.1. Random field visualization
The two plotting functions dplot() and cplot() return an object of the 'ggplot' class, what allows users to produce customized visualizations by changing scales, legend characteristics, titles, themes and much more by using the grammar of graphics from the ggplot2 package (Wickham 2016).
Random fields (represented by matrix objects) are transformed into a data.frame structure with columns x, y and value. x and y are the indices of the matrix object while value maps the pixel-value in that position (Z[x,y]).
The plots are constructed using a tile plane with rectangles (geom_tile() from ggplot2) with the value column map to the fill aesthetics. In dplot(), which is used for finite-valued fields, value is treated as a factor, while in cplot() it is a continuous numeric. This is the only difference between the functions and it should be kept in mind when defining custom color scales. R> library("ggplot2") R> base_plot <-dplot(field1, legend = TRUE) R> base_plot + ggtitle("This is a custom title") R> base_plot + theme_void() + theme(legend.position = "none") R> base_plot + scale_fill_manual(values = c("red", "blue")) R> base_plot + theme(legend.position = "bottom")

A.2. Interaction structure visualization
Similarly to the functions used to visualize random fields, a plot() method is available for mrfi objects. A data.frame with columns named rx and ry is created internally with the coordinates of each interacting relative position. These columns are mapped to the x and y axis, respectively and a geom_tile is used to produce the plot. The reverse positions are included automatically with a light-gray color, but this can be prevented by setting include_opposite = FALSE in the plot call. It also returns a ggplot object that can be customized. The code chunk next presents examples of how plots can be customized by adding custom colors, text labels to the interacting positions and a custom title.

B. Comparing methods between packages for a Potts model
One of the most important particular cases for the model presented in Section 2.1.1 is the Potts model, which is used in the theory supporting many available packages. In particular, the potts package (Geyer and Johnson 2020) provides functionalities similar to some of the ones implemented in mrf2d, mainly random sampling and computing the composite likelihood (comparable to the pseudo-likelihood function). The main difference between the packages is that mrf2d uses a Gibbs sampler scheme, with sequential updates of individual pixel while potts uses the Swendsen-Wang algorithm (Wang and Swendsen 1990), that updates blocks of pixels at each iteration. We conducted a simulation study in order to compare mrf2d and the potts package version 0.5-9 with respect to the Potts model simulation algorithms. Our goal was not to compare implementations as the algorithms are different, but to check whether the simulated fields are comparable between packages as a form of validation and understanding the behavior of our simulation method with respect to the number of Gibbs sampler cycles.
Considering the parametrization of the Potts model in Equation 4, 100 realizations of a threecolor (C = 2) Potts model with parameter ϕ = −1 (corresponds +1 in the parametrization used in the potts package) were simulated considering number of iterations equal to 5,15,25,35,45,55,65 and 75 in each package. We use the term iteration referring to the cycles parameter in mrf2d and nspace in potts. For each simulated field, we computed the sufficient statistics T (z) = ||i−j||=1 1 (z i ̸ =z j ) . The results are presented in Figure 15 and the code for the simulations is available in the replication script for this article.
We can consider the algorithms achieved equilibrium when executing more iterations do not change the distribution of the samples, which we summarize by the distribution of their sufficient statistics. In this case, the Gibbs sampler from mrf2d seems to achieve equilibrium within 35 to 45 cycles while roughly a value of 25 for the nspace parameter of potts seems to be enough to achieve equilibrium in the Swendsen-Wang algorithm. Finally, we verify that the distribution of the sufficient statistics after reaching an equilibrium state is approximately the same in both packages and we conclude that even though potts uses the more specialized and efficient Swendsen-Wang algorithm, mrf2d with a Gibbs sampler provides a valid alternative simulation tool for the model that can also be extended, even within the scope of the Potts model, to cases such as positive ϕ, not covered by the Swendsen-Wang algorithm.
We also compare estimation algorithms in both packages for the maximum pseudo-likelihood method, which is a particular case of composite likelihood, thus available in potts. We simulated 100 realizations of a 3 color Potts model in a 64 × 64 lattice again with the parameter ϕ = −1. For each realization, we computed maximum pseudo-likelihood estimates using mrf2d and potts to compare the results. Results are presented in Figure 16. In our comparison, estimates obtained with potts were consistently greater (smaller absolute value) than the ones obtained via mrf2d. The average estimate for mrf2d was −1.0027996 while for potts the average was −0.9930211, with sample deviations equal to 0.0213258 and 0.0216528, respectively. We conclude that maximum pseudo-likelihood estimation methods are roughly equivalent in both packages.