FuzzyStatProb : An R Package for the Estimation of Fuzzy Stationary Probabilities from a Sequence of Observations of an Unknown Markov Chain

Markov chains are well-established probabilistic models of a wide variety of real systems that evolve along time. Countless examples of applications of Markov chains that successfully capture the probabilistic nature of real problems include areas as diverse as biology, medicine, social science, and engineering. One interesting feature which charac-terizes certain kinds of Markov chains is their stationary distribution, which stands for the global fraction of time the system spends in each state. The computation of the stationary distribution requires precise knowledge of the transition probabilities. When the only information available is a sequence of observations drawn from the system, such probabilities have to be estimated. Here we review an existing method to estimate fuzzy transition probabilities from observations and, with them, obtain the fuzzy stationary distribution of the resulting fuzzy Markov chain. The method also works when the user directly provides fuzzy transition probabilities. We provide an implementation in the R environment that is the ﬁrst available to the community and serves as a proof of concept. We demonstrate the usefulness of our proposal with computational experiments on a toy problem, namely a time-homogeneous Markov chain that guides the randomized movement of an autonomous robot that patrols a small area.


Introduction
Assume we observe the state of a system at certain time points (i.e., time is discretized for our observations). An observation X t obtained at time t represents one state the system can be in. In fact, the collection of observations X 0 , X 1 , . . . , X n represents an indexed sequence of random variables, each of which can take values within the system state space S " t1, . . . , ru, which we will assume finite. Such indexed sequence tX t , t " 0, 1, 2, . . .u is known as a stochastic process (Medhi 2002) in discrete time and with discrete state space S.
The method can be summarized as follows: The fuzzy probabilities of the fuzzy transition matrix will be first decomposed in their α-cuts, which will be used to solve the problem in separate pieces, and finally the results will be aggregated to re-construct the fuzzy stationary probabilities we are searching for. The starting point of the problem is a finite sequence of observations of the system state at each step. This is the input required by the FuzzyStatProb package we have developed. The output is provided as a list of fuzzy numbers, one for each state of the input Markov chain, using the FuzzyNumbers package described in Gagolewski (2015) that provides plotting functionality and is currently under active development.
The paper is structured as follows. In Section 2, the mathematical background and the method implemented in the package are formally developed. Section 3 describes implementation details and the signature and options of the public function of the package. Section 4 shows use examples and provides insights on the effect of a greater number of observations over the fuzzy output of the method. Finally, Section 5 is devoted to conclusions and further work.

A method to compute fuzzy stationary probabilities
An homogeneous Markov chain is said to be irreducible if all states form a single group so that every state is accessible from every other state, be it in one or more than one step. A finite, irreducible Markov chain with transition matrix P has a stationary distribution π such that π " πP (Medhi 2002). The FuzzyStatProb package is aimed at computing a vector of fuzzy stationary probabilitiesπ " pπ 1 , . . . ,π r q departing from an uncertain (fuzzy) transition matrixP . The remainder of this section briefly summarizes the ideas on fuzzy Markov chains presented in Buckley (2005), as they lie at the very core of our package.

Fuzzy numbers
Let us consider an uncertain transition matrixP " pp ij q in which one or more elements are uncertain. We capture the uncertainty regarding some of the elements by substituting them by fuzzy numbers, which are a special case of fuzzy sets (Zadeh 1965;Negoita and Ralescu 1975). Roughly speaking, a fuzzy setÃ over a universe U is a set whose characteristic function is not binary, f : U Ñ t0, 1u as in a classic set, but takes values in the unit interval, µ : U Ñ r0, 1s, so that µÃpxq is the degree to which element x P U belongs to the fuzzy set A. A fuzzy number is a fuzzy set over R representing a number with uncertainty (Dubois and Prade 1978). For instance the uncertain quantity "about 2" may be represented as a fuzzy numberÃ whose membership function reaches its maximum value 1 at x " 2, but also assigns non-zero membership degrees to other values close to 2, since a value of, say, 1.8 is also ("belongs also to") "about 2", not with degree 1 but a bit less, say, 0.6. Then µÃp2q " 1 and µÃp1.8q " 0.6. We now explain this concept formally as it is the mathematical structure used in our package to cope with uncertainty in the transition and stationary probabilities.
Definition 1. The α-cut of a fuzzy setÃ for any α P r0, 1s is defined as the set of elements that belong toÃ with degree α or greater: Apαq " tx P U : µÃpxq ě αu.

Definition 2.
A fuzzy numberÃ is a fuzzy set on R such that (i)Ãpαq are nonempty convex sets @α P r0, 1s.
Since α-cuts of a fuzzy number are closed intervals of R, they can be written asÃpαq " rÃ L pαq,Ã R pαqs. The first two conditions are basically equivalent to the fact that the membership function of a fuzzy number must be continuous and monotone, which are two features of the output fuzzy numbers computed by our method as will be shown later.

Construction of fuzzy transition probabilities from observations
As stated in Section 1, our data consists of a finite sequence of observations drawn from the system at consecutive time points. Each observation is the state of the system at that time point, represented as an integer belonging to the state space of the chain. In such a sequence, an estimation of the transition probability p ij can be computed as the number of times N ij that state i was followed by j in our data, divided by the number of transitions N i observed from i to another state (including i itself). However, this would be just a point estimate and does not capture the uncertainty associated to it. Interval estimation would be the next step as it also takes into account the sample size and variance of the data, but it calculates an interval only for a significance level that is fixed beforehand.
Our aim is to model each uncertain transition probability as a fuzzy number that properly captures all the uncertainty of the data, regardless of external parameters such as the confidence level. For this purpose, the fuzzy number is obtained by the superposition of confidence intervals for the true transition probabilities at different confidence levels, one on top of another, as done in Buckley (2005). Although the significance level and the membership degree are numerically the same in this case, it should be noted that they are completely different concepts for which the letter α is commonly used. Such intervals are in fact simultaneous confidence intervals for multinomial proportions, since the problem of estimating the transition probabilities from a given state can be reduced to that of estimating the parameters (proportions) of a multinomial distribution. The method applied was introduced by Sison and Glaz (Sison and Glaz 1995;May and Johnson 2000) although many others had been proposed before.

Fuzzy Markov chains and restricted matrix multiplication
As we have said, fuzzy numbers are used in those entries of the transition matrix on which there is uncertainty. The rest of the elements could be crisp since a crisp number is a special case of fuzzy number. However, the uncertainty is on the probabilities but not on the fact that every row must add to 1. Now assume a Markov chain with r states and fuzzy transition matrixP , then for a given α P r0, 1s,P pαq " pp ij pαqq represents a matrix of intervals. Such matrix can also be thought of as the set of all rˆr matrices M such that their elements fall inside their corresponding interval, m ij Pp ij pαq, and every row adds to 1, ř r j"1 m ij " 1, i " 1, . . . , r. The domain of row i for a given α value is defined as the set of r-dimensional vectors that simultaneously fulfill those two constraints for row i, i.e., the set of probability distributions that row i could take in our uncertain transition matrix when we take its α-cuts for that α. If we call ∆ r " tpx 1 , . . . , x r q : x i ě 0 and ř r i"1 x i " 1u, then The domain of the matrix for a fixed α, Dompαq is defined as the Cartesian product of the domain of every row and represents the space of matrices we admit when the uncertainty level is α. Dompαq is closed and bounded (compact) because it is the Cartesian product of compact sets, so any continuous function applied to its elements will have a compact image. In particular, as explained in Buckley (2005, Chapter 6), the power to exponent n of the rˆr transition matrix (which is done as successive matrix multiplications) can be thought of as a collection of r 2 independent functions f pnq ij : R r 2 Ñ R, each of which calculates the value of one entry of the resulting matrix by operating with the entries of P : p pnq ij " f pnq ij pp 11 , . . . , p 1r , p 21 , . . . , p 2r , . . . , p r1 , . . . , p rr q. Such functions are continuous and thus, when applied on a compact set like Dompαq for a concrete α, the image of all of them is another compact set (a closed interval) of R. Such interval can be viewed as an α-cut of the fuzzy numberp pnq ij , i.e.,p pnq ij pαq " f pnq ij pDompαqq. By the representation theorem (Negoita and Ralescu 1975), it is possible to reconstruct the fuzzy numberp pnq ij as the union of its α-cuts which, in theory, can be computed using the restricted fuzzy multiplication described before with different values of α. For our problem, we will use the superior of the α values as the union operator.

FuzzyStatProb: Fuzzy Stationary Probability Estimation in Markov Chains
In particular, the stationary distribution π matches any of the (identical) rows of a matrix Π such that Π " lim nÑ8 P n , which means that theoretically, π can be computed using matrix multiplication. We are thus in the same case explained above which guarantees that the resulting images are compact sets over R, or more precisely the α-cuts of the fuzzy stationary probabilities we aim to calculate.

User-specified fuzzy transition probabilities
The package also supports user-specified fuzzy transition probabilities. Instead of departing from a sequence of observations of the state of the chain at consecutive time instants, the user provides a fuzzy transition matrix composed of fuzzy numbers that constitute the transition probabilities. As explained before, despite being fuzzy, these numbers must form a probability distribution for each row. According to previous works (Halliwell and Shen 2008), this can be formalized by imposing the constraint thatp i1 'p i2 ' . . . 'p in Ě 1 χ for each row i " 1, . . . , n.
Here,Ã ĚB Ø µÃpxq ě µBpxq @x P R, the symbol ' stands for the fuzzy sum, and 1 χ is the real number 1 represented as a fuzzy number (singleton). Note this is a necessary condition to ensure that Dompαq is not empty, as proved in Villacorta, Verdegay, and Pelta (2013) and, therefore, our code first checks this condition and stops if it is not fulfilled.
One of the main applications of fuzzy probabilities is related to linguistic probabilities (Halliwell and Shen 2008), as we explain next. In case no data of the chain are available, we may have to elicit the transition probabilities from a human expert or group of experts. However, it can be difficult for them to express their expertise using numeric probabilities which, in addition, must fulfill the requirement of having a sum of exactly 1. Therefore, natural language can help express probabilities in a more natural way. The transition probability becomes a linguistic variable whose possible values are linguistic terms (Zadeh 1975b), such as Very unlikely, Most likely, It may, Extremely likely and so on. Each linguistic term has an underlying fuzzy number that captures the vagueness inherent to natural language. The input information would consist of subjective statements like The robot goes very often to location 6 when it is in location 1, it almost never goes from there to location 2, when it is placed in location 5 it seems to go to state 8 almost surely, etc. It is not relevant for our implementation whether the fuzzy transition probabilities have an associated linguistic term or not, but each fuzzy number of the input transition matrix must be referred by a name when calling the function (see the next section).

Computation of fuzzy stationary probabilities
The approach outlined in Section 2.3 did not make direct use of the membership functions of the fuzzy transition probabilities, but worked only with their α-cuts to establish Dompαq. Thus it is not necessary to fully reconstruct such fuzzy numbers; it suffices to compute their α-cuts for the desired levels of α. Each of those levels provides us with an interval transition matrix that defines a domain, i.e., a space in which we admit our uncertain transition matrix lies. Now, in order to find the α-cuts of the fuzzy stationary probabilities for a certain α, we just have to find, for each state of the chain, a pair of matrices within Dompαq which respectively minimize and maximize the stationary probability of that state. This is repeated independently for every state. In other words, assuming a Markov chain with r different states, and focusing on a significance level α, we have to solve r independent minimization problems and r independent maximization problems in order to compute the r different α-cuts of the fuzzy stationary probabilities we are looking for. Note that the search space is the same for all the problems, Dompαq, but the objective function is not: For each state s j whose stationary probabilityπ j is being computed, it is the expression which yields the stationary probability as a function of the entries of the transition matrix. The function is first minimized to find the lower bound of the α-cutπ j pαq, and then maximized to find the upper bound. It seems clear that a general expression cannot be calculated for any r-state chain to apply conventional optimization techniques, since such a function (in which all the matrix coefficients p ij will be symbolic as well) would be too complicated for chains with more than 5 or 6 states. Furthermore, R symbolic capabilities are very poor and make it unsuitable. For both reasons, heuristic search (optimization) algorithms will be employed, as suggested in Buckley (2005).
The α-cuts can be written as follows. Let g j : R r 2 Ñ R be a function that represents the calculation of the jth component of the stationary distribution of an r-state Markov chain whose transition matrix is expressed as an r 2 -dimensional vector pp 11 , . . . , p rr q. As explained in the previous section, such a distribution can be obtained by means of matrix multiplication that converges to a matrix Π with identical rows that are indeed the stationary distribution. Using the same notation and assuming we are only interested in the first row of Π, we can set g j " lim nÑ8 f pnq 1j , i.e., g j is the function that computes the element Π 1j . Such a function only applies sums and products during the calculation process, so it is continuous and hence g j pDompαqq is a compact set on R, i.e., an α-cut. Theñ π j pαq " rπ j L pαq, π j R pαqs, j " 1, . . . , r π j L pαq " mintw j |w j " g j pp 11 , . . . , p rr q, pp 11 , . . . , p rr q P Dompαqu (4) π j R pαq " maxtw j |w j " g j pp 11 , . . . , p rr q, pp 11 , . . . , p rr q P Dompαqu (5) Equations 4 and 5 state that, in order to compute the lower and upper bounds of an α-cut of π j (the jth fuzzy component of the fuzzy stationary distribution vectorπ), we must find the minimum and the maximum of the jth component of all the crisp stationary distributions corresponding to feasible crisp matrices, understanding feasibility as belonging to the space Dompαq for that α. In practice, the condition w j " g j pp 11 , . . . , p rr q is implemented as w " wP . The final step is to calculate an analytical expression for the membership function of the fuzzy stationary probabilities. If the above α-cuts were exact, i.e., if we could guarantee that the solutions for the optimization problems are global optima, then the adequate way to proceed would be to interpolate the lower and upper bounds for every sampled α to separately obtain expressions for the left and the right sides of the membership function. The number of points depends on how much precision we need for the membership function. If the membership function has to be reconstructed very accurately, we should probably sample α values in [0, 1] in steps of, say, 0.01, which yields 100 points in each side. A larger step size of about 0.05 (20 points) is probably enough for most real problems and is less time consuming.
However, since we have no guarantee that the α-cuts are exact because they have been obtained by heuristic optimization procedures, regression may also be a good choice, and the loss of precision is globally quite small. The form of the regression function depends on what the points look like. In order to allow for the maximum flexibility, it is the user who can choose between spline interpolation or some kind of regression to be applied. More details are provided in Section 3. 8

FuzzyStatProb: Fuzzy Stationary Probability Estimation in Markov Chains
Numerical example. The first part of the example does not use data from observations but serves to illustrate the mathematical procedure. LetÃ " pa, b, cq, where a, b, c P R, a ď b ď c, be a triangular fuzzy number (TFN), which can be viewed as a particular case of fuzzy number with the following membership function: Now consider the fuzzy transition matrix depicted on the left, formed by TFNs which, in this particular case, are symmetric (although they do not have to). Note some of them carry more uncertainty than others. Focusing on a concrete value for α, for instance α " 0.5, this matrix yields the interval matrixP p0.5q " pp ij p0.5qq on the right, constituted by the 0.5-cuts of the fuzzy transition probabilities ofP .
Domp0.5q " tpp 11 , p 12 , p 21 , p 22 q P R 4 : pp 11 , p 12 q P Dom 1 p0.5q, pp 21 , p 22 q P Dom 2 p0.5qu Let g 1 , g 2 be the functions that, for the given α " 0.5, compute the elements Π 11 and Π 12 of the (crisp) stationary matrix Π corresponding to each of the 2ˆ2 feasible matrices tP " pp 11 , . . . , p 22 q P Domp0.5qu. Let us rename the crisp stationary distribution pΠ 11 , Π 12 q of a feasible matrix as w " pw 1 , w 2 q. We are interested in finding four feasible matrices P 1 , P 2 , P 3 , P 4 P Domp0.5q which, respectively, minimize and maximize w 1 , and minimize and maximize w 2 . The minimum and maximum w 1 act as the lower and upper bounds of the 0.5-cutπ 1 p0.5q, and the same applies to w 2 with respect to the 0.5-cutπ 2 p0.5q: π 1L p0.5q " mintw 1 |w 1 " g 1 pP q, P P Domp0.5qu " mintw 1 |w " wP, P P Domp0.5qu π 1R p0.5q " maxtw 1 |w 1 " g 1 pP q, P P Domp0.5qu " mintw 1 |w " wP, P P Domp0.5qu π 2L p0.5q " mintw 2 |w 2 " g 2 pP q, P P Domp0.5qu " mintw 2 |w " wP, P P Domp0.5qu π 2R p0.5q " maxtw 2 |w 2 " g 2 pP q, P P Domp0.5qu " mintw 2 |w " wP, P P Domp0.5qu This procedure should be repeated by sampling α in [0, 1] with a step size that depends on the precision desired to reconstruct the fuzzy stationary vector from its α-cuts. Now, suppose we do not depart from a fuzzy transition matrix (provided by a human expert, for instance) as in the example above, but from a collection of observations of the state of the chain at several consecutive time instants, e.g., t1, 3, 2, 3, 4, 4, 2, 3, . . . u. Then, in order to estimate the α-cuts of the fuzzy transition probabilities that are needed to establish the domains for each α, we resort to simultaneous confidence intervals at level α of multinomial proportions, using the number of times that the chain transitioned from one state to any other as the input. Once the confidence intervals for the transition probabilities have been computed to compose the interval matricesP pαq for each α, we can define the domains and go on with the rest of the process as it remains unchanged. In this case, matrixP is never constructed explicitly because we are only interested in the α-cuts of the fuzzy entries, as they constitute the constraints of the optimization problems.

Implementation in the FuzzyStatProb package
The signature of the only public function in package FuzzyStatProb is fuzzyStationaryProb(data, options, step = 0.05, ...) where: • data: This argument can be: (a) an array of either strings or natural numbers representing the observed states of the chain at consecutive time points. The function first coerces the elements to a factor. (b) A 2D square matrix of strings representing fuzzy transition probabilities directly given by the user. Each string should be contained in names(fuzzynumbers) and refers to the corresponding 'FuzzyNumber' object in the fuzzynumbers vector (see below). When the transition probability from state i to j is 0 (in the crisp sense), then entry pi, jq must be NA. The matrix should have colnames and rownames set.
• options: A tagged list containing the following parameters: verbose: Boolean, set to TRUE if progress information should be printed during the process. It is set to FALSE if this option is not specified.
states: An array of strings indicating the states for which the stationary distribution should be computed. The values should match those specified in the data argument. If this option is not specified, the fuzzy stationary probabilities are computed for every state of the chain.
acutsonly: Boolean, set to TRUE if no regression should be done after computing the α-cuts. This option is set to FALSE if not specified.
regression: A string with the type of the regression to be applied at the end of the algorithm for fitting the membership functions of the fuzzy stationary probabilities. Possible values are "linear", "quadratic", "cubic", "gaussian", "spline" and "piecewise" (piecewise linear). In all cases (including the gaussian), a different curve is fitted for each side of the fuzzy number. The "gaussian" option fits curves of the form µpxq " exp`´1 2ˇx´c sˇm˘. The "spline" option performs interpolation by a monotone cubic spline according to the Hyman method (see splinefun documentation) while "piecewise" computes a piecewise linear membership function by connecting consecutive points of the α-cuts with straight lines, using the builtin 'PiecewiseLinearFuzzyNumber' subclass of the FuzzyNumbers package. If this option is not specified, quadratic regression is carried out by default. If acutsonly is set to TRUE, this option is ignored.
ncores: Positive integer representing the maximum number of cores that can be used when running in parallel. If set to more than 1, then each processor takes care of all the computations involving one of the values of α that have to be sampled, via the parLapply function of the parallel package. Defaults to 1 (sequential) if not specified. If ncores is greater than the actual number of cores in the computer, all available cores are used.
fuzzynumbers: A tagged list with all the different 'FuzzyNumber' objects that appear in data when data is a matrix of labels; ignored otherwise. Every element of the list must have a name, referenced in at least one entry of data.
• step: Step size for sampling α when computing the α-cuts. The smallest α is always present and equals 0.001, and the rest of values are calculated as α " k¨step for k ě 1.
The greatest sampled value that is always present as well is α " 0.999. It is set to 0.05 when this option is not specified.
• ...: Further arguments to be passed to DEoptim.control to customize the algorithm that finds the lower and upper bounds of the α-cuts by solving a minimization and a maximization problem.
The value returned by the function is a tagged list that belongs to a new S3 class called 'FuzzyStatObj'. This class has only two specific methods, print and summary, and both print exactly the same: a brief summary of the processing that has been done, including the number of states, number of observations, regression type, step size and time elapsed, and information about the names of the two tags that should be queried to retrieve the results, namely $fuzzyStatProb and $acuts. The former is a tagged list of 'FuzzyNumber' objects (see the FuzzyNumbers package cited before) whose length equals length(options$states) and whose names match those in options$nstates. The latter is another tagged list with the same length and names, consisting of data frame objects that contain the α-cuts of every output stationary probability, represented as pairs pπ jL pαq, αq and pπ jR pαq, αq. The object at each position represents the fuzzy stationary probabilities and α-cuts of the element of the options$states list that is in the same position. Note that in case the user specifies within options$state a state that is not found among the elements of the data argument, then the corresponding position of $fuzzyStatProb and $acuts will be NA. When setting acutsonly = TRUE, the returned object does not have a $fuzzyStatProb tag but only $acuts. Section 4 contains a fragment of code and the R output showing how our function should be called.

Implementation issues
This section describes how each step of the mathematical process explained before has been implemented in R and the external packages used. A general scheme of the package is depicted in Figure 1, which has two possible starting points depending on whether the user provides a sequence of observations or a fuzzy transition matrix.
Confidence intervals. The first step is to build the fuzzy numbers of the fuzzy transition matrix, as the superposition of confidence intervals. The procedure samples several α from 0 to 1; the step size of the sampling is specified by the user in the step argument of the function. For each α and for each state s i of the chain, the method builds as many simultaneous confidence intervals as transitions to other states s j have been observed from s i . The procedure makes use of the MultinomialCI package created by the first author (Villacorta 2012) which implements the Sison and Glaz method (Sison and Glaz 1995)  , ,. ,.
If the user has provided directly the fuzzy transition probabilities, this step is omitted and the α-cuts are taken directly from the fuzzy numbers passed to the function.
Computation of α-cuts of stationary probabilities. The second (and main) step of the method is to solve the optimization problems of Equations 4 and 5, using some heuristic optimization technique. As explained in Section 2.5, the confidence intervals which are the α-cuts of the fuzzy entries of the transition matrix are used in this step as box-constraints for the optimization that searches for the minimum and maximum value of each stationary probability within Dompαq. The DEoptim package (Mullen, Ardia, Gil, Windover, and Cline 2011) was employed for this purpose. It is an implementation of the Differential Evolution (DE) algorithm (Storn and Price 1997;Price, Storn, and Lampinen 2005) that has exhibited excellent performance in a wide variety of hard continuous optimization problems. The package provides built-in capabilities to specify box-constraints for every variable but cannot handle equality constraints (in this case, the probabilities of every row of the matrix being evaluated must add to 1), so they have to be controlled directly in the objective function.

Computation of membership functions via regression.
The third and last step is to reconstruct the membership function of the fuzzy stationary probabilities whose α-cuts were computed in the preceding step. Focusing on regression, the user can choose among linear, quadratic, cubic and gaussian regression. When fitting a quadratic or cubic membership function, care should be put to avoid non-monotonous functions, not allowed in this context for fuzzy numbers. To be precise: • Since the membership function has to be continuous, it should intersect with the horizontal axis in at least one point between 0 and the central (core) point of the fuzzy number for the left-side function, and between the central point and 1 for the right-side function. If the function cuts the X axis in more than one point, only the greatest of those between 0 and the central point is considered, as well as the smallest of those between the central point and 1. 12

FuzzyStatProb: Fuzzy Stationary Probability Estimation in Markov Chains
• The function only needs to be monotonic in those intervals so local minima cannot exist within them.
• The left and right functions (before normalizing to [0, 1]) must satisfy f (core) = g(core) = 1, so the degrees of freedom decrease by one because it is possible to write one of the curve parameters as a function of the others.
It is clear that the problem can be viewed as a constrained minimization of the squared error function with respect to the cloud of points representing the α-cuts. First, the nls function from the stats package for non-linear least squares regression is called. The diminished degrees of freedom are expressed in the formula passed to nls, in which one of the parameters of the curve is expressed as a function of the others. This function yields a solution satisfying f (core) = 1, but the rest of the conditions are not guaranteed by nls so they have to be checked in the obtained solution. If they are fulfilled, the solution is accepted and returned.
If they are not fulfilled, and taking into account that regression accuracy at this stage is not as fundamental as the satisfaction of all the constraints, again DE is used to minimize the total quadratic regression error function for gaussian, quadratic and cubic regression. Constraints were implemented in the objective function of DE. For quadratic regression, the vertex must not fall between the horizontal cut-points and the core, and such cut-points cannot be negative nor greater than 1. For cubic regression, a turning point must not exist between the cut-points and the core. Linear and gaussian regression are themselves monotonic so the only constraint to be considered is the intersection with the horizontal axis. By definition, the gaussian function does not intersect the horizontal axis at any point but for practical purposes, it can be rounded to 0 when the membership degree is smaller than, say, 0.01.

Computation of membership functions via interpolation.
This option always yields good and fast results, although the expression of the solution is a bit more complicated. Interpolation with cubic spline functions can be applied to the α-cuts to obtain a membership functions with a soft shape, yet continuous. Monotonicity is guaranteed by the use of the "hyman" option of the splinefun function of the stats package; see the associated vignette for details. An alternative interpolation method supported is linear piecewise interpolation, provided by the FuzzyNumbers package in the 'PiecewiseLinearFuzzyNumber' subclass. The membership function is built by connecting a monotone 1 sequence of points px i , µpx i qq with straight lines.
Parallelism. The most computationally demanding task is the calculation of the α-cuts of fuzzy stationary probabilities. It requires running DE twice for each α value, and it is expected that any user samples at least ten α values, which yields 20 full executions of the optimization process for each state whose stationary probability has to be calculated. In our implementation, when α has been set, the α-cuts are computed for all the chain states indicated by the user, which means that lengthy computations have to be done before moving to another α value. For this reason, and since the computations with each α are independent of the rest, we have introduced parallelism at this level by using the built-in parLapply function of the parallel package over a vector containing all the α values to be sampled. Each physical core thus takes care of all the computations for a given α, i.e., runs a minimization he patroller the intruder i for all i). ar feasibility les (C \ i is C): i,j s are well ler can only straints (4)e patroller's no action ected utility -linearity is m admits a ling strategy ble, we pass se i of the s maximum. ing problem, r-when(q, s) s follows: enter-when(04,01), namely to enter cell 04 when the patroller is in 01. Cells 05, 08, and 13 are excluded from the route of the patroller. Indeed, visiting these cells would allow the intruder to perform an always successful action, for example enter-when(06,08) (see Fig. 1).

D. Augmenting Patroller's Sensing Capabilities
In this section we extend the sensing model of the patrolling robot by considering that it can sense the presence of the intruder beyond its current cell. We introduce a matrix V (n × n) where v i,j = 1 if cell j can be sensed by the patroller from cell i and v i,j = 0 otherwise. Matrix V embeds a model of the detecting sensor of the robot. A general sensing model of the patroller can be considered in our approach by substituting constraints (4)-(5) above with: In this case γ h,w i,j is the probability that the patroller reaches cell j in h steps, starting from cell i and not having sensed cell w. For example, let us consider that the patroller is able to sense its current cell and the free cells that are one cell away from it (r = 2). For instance, in Fig. 1, from cell 05, it can sense cells 04 and 08; from cell 06, it can sense cells 01 and 09; from cell 11, it can sense cells 07, 10, and 12; and so on. With this sensor model, the optimal patrolling strategy for the robot is reported in Fig. 3. Comparing with Fig. 2, cells 04 and 12 have been excluded from the patrol route. This makes sense, since the patroller, due to the augmented sensing capabilities, is able to patrol them from adjacent cells and a maximization for every state of the chain. With this approach, possibly all the cores available will be engaged in heavy computations since the number of sampled α's will most likely be greater than the number of processors. Such coarse grain parallelization is enough, although the DEoptim function provides a built-in option to run in parallel as well. Parallelism is disabled initially in our package (ncores = 1 by default) to prevent a sudden slowdown of the computer when getting all the available processors engaged in our computations, but can be enabled by setting ncores to a higher number, which is strongly recommended to keep computation times within affordable limits.

The package in use
Now we will show how the package can be used with a toy example. We will generate a realization of a 10-state Markov chain which controls the randomized movement of an autonomous robot that patrols an area, modeled as a two-dimensional grid (Figure 2). Each cell of the grid is a state of the chain, and the robot can only move to an adjacent cell at a turn (a discrete time instant). The movement is based on a probability distribution over the adjacent cells. Such distribution depends only on the current cell, and not on the trajectory followed by the robot in the past to reach that cell, so clearly it is a Markovian movement. It is time-homogeneous as well, since the probabilities used by the robot do not change along time. Such models are very common in the field of autonomous robotic patrolling (Agmon, Kraus, and Kaminka 2008;Amigoni et al. 2009). There are two main reasons that justify the randomness of this kind of strategies: • The area is large enough to prevent full coverage with a deterministic patrolling scheme, since some locations would remain uncovered in the sense that the time between two consecutive visits is larger than the time needed by the intruder to successfully penetrate that location.
• The robotic movement should be unpredictable when facing a full-knowledge opponent, i.e., one that has perfectly learned the robot's patrolling scheme. This situation arises when the opponent has been observing the robot's movement for a long time before choosing the cell to attack. A randomized movement leads to maximizing the robot payoff even in this case, since the only knowledge the intruder may acquire is the exact probability distribution employed by the robot, not the actual trajectory he will follow at each walk. Note that considering the strongest adversary is the common assumption in game theory as it is the worst case one may face.
Although the problem is usually solved with game-theoretic techniques 2 , it represents indeed a good example of a Markov chain with very large state space (actually, proportional to the extension of the area under consideration). The concept of stationary distribution becomes relevant since it gives an idea of the locations in which the robot spends less time or, in other words, the places that have less coverage and are thus the most promising to be attacked 3 . If the assumption of perfect adversarial knowledge is eliminated, i.e., if we consider a more realistic situation in which the adversary only has a sequence of observations of the robot's movement, then what the opponent does is actually an estimation of stationary probabilities based on the observations from an unknown Markov chain that guides the robot's movement.
A fuzzy estimation captures more information about the observations than a point estimate.

Departing from a sequence of observations
The Markov chain of Figure 2 depicts the game-theoretic solution (Amigoni et al. 2009) against a full knowledge opponent in a map with 13 cells, three of which are not reachable and thus discarded for our Markov chain. Assume the adversary observes a realization of 200 time instants of this chain. The R code that generates such sequence of observations and then computes the fuzzy stationary probabilities is shown below. robotStates <-c("01", "02", ..., "12") is a 10-component string vector with the names of the states, and transRobot is the transition matrix of Figure 2. Both variables are contained in the package. The states argument is not specified in the call to fuzzyStationaryProb, which means the fuzzy stationary probabilities of all the states should be computed.

R> m <
plot ( ) + } R> plot(1, type = "n", axes = FALSE, xlab = "", ylab = "") R> plot_colors <-c("blue", "red", "springgreen4", "black", "orange") R> legend(x = "top", inset = 0, + legend = c("Linear", "Quadratic", "Cubic", "gaussian", "Spline"), + col = plot_colors, lwd = 2, cex = 1, bty = "n", horiz = FALSE) R> summary(quadratic) . Fuzzy stationary probabilities of a Markov chain with 10 states . Probabilities have been computed for states: 01 . Number of input observations: 200 . Parameters: Step size: 0.1 Execution was done in parallel ( 4 cores used ) Regression curve for output membership functions: quadratic . To retrieve the results use $fuzzyStatProb and $acuts with this object . Computation of alpha-cuts took 194.04 seconds . Membership functions regression took 34.71 seconds Let us retrieve the 'FuzzyNumber' object corresponding to the stationary probability of state "01", and the α-cuts from which such number was built. The α-cuts are returned as a data.frame object in which the first column represents the probability and the second, the membership degree α. Note there are always two rows (rows 1 and 10, 2 and 11, etc) with the same y (membership) value, one for the lower bound and the other for the upper bound of that α-cut. This way, the limits of the α-cuts can be plotted as if they were 2D points. An important remark should be made here. There are no output α-cuts for α " 0.8 and α " 0.9. This is due to the fact that the intervals of the input interval matrices were very small, and thus the minimization and maximization problems for each of these values have all the same solution. It is because the multinomial confidence intervals are very small when the required confidence level is also very low. This happens for α " 0.8, 0.9 and 0.999. In practice, the corresponding output α-cuts are points rather than intervals, and actually it is the same point for those three values of α. By the representation theorem (Negoita and Ralescu 1975), the membership degree of all the elements within such very small output α-cuts is actually the greatest of them, i.e., 0.999, so the others can be discarded and the 0.999 is the only one retained. In our implementation, before carrying out the regression we always discard those α-cuts such that the distance from the lower or upper bounds of the α-cut to the fuzzy number vertex (which is the point estimate of the stationary probability) is less than 0.005.

R> quadratic$fuzzyStatProb[["01"]]
The graphical output of the above code is the plot in Figure 3. A number of details can be observed. The gaussian curve usually provides the best fit, and this is actually the reason for which it was included as an option in our package. However, it has the most complicated expression. Cubic fitting also fits quite well. Regarding the shape of the figures, it is clear that the fuzzy numbers are in general not symmetric around the point estimate as the vertex is sometimes very close to 0 or 1 so the points cannot spread much in that side.
Moreover, fuzzy stationary probabilities provide more information than a point estimate based on the transition proportions from the sequence of observations. A comparison between stationary probabilities of states 6 and 8 serves to illustrate how beneficial it is to have fuzzy estimations. According to point estimates (which are the central point of the output fuzzy numbers), the stationary probability of being at state 6 is smaller than that of state 8. However, the α-cuts of fuzzy stationary probability for state 6 are wider and the fuzzy number is more right-skewed than state 8, indicating there is more uncertainty on this value and suggesting that it could be actually greater. In fact, the true stationary probabilities are π 6 " 0.1946, π 8 " 0.1154, and therefore π 6 ą π 8 . Note, additionally, that the output of the function steadyStates from package markovchain consists of punctual estimates of stationary probabilities. The package is able to compute bootstrap confidence intervals for the transition probabilities but not for the stationary probabilities. Therefore fuzzy transition probabilities are more informative. As could be expected, the point estimates of stationary probabilities given by steadyStates match the 1-cuts of our fuzzy stationary probabilities.

Departing from user-specified fuzzy transition probabilities
Now we depart from a matrix of fuzzy numbers that constitute the fuzzy transition proba- bilities and are given directly by the user. In this case we assume the input are linguistic probabilities as each of them is represented by a (meaningful) linguistic term: EU ("extremely unlikely"), VLC ("very low chance"), SC ("small chance"), IM ("it may"), MC ("meaningful chance"), ML ("most likely"), EL ("extremely likely"). However, we could have chosen the names to be meaningless labels like "L 1 ", "L 2 ", . . . , "L l " where l stands for the number of different fuzzy numbers employed in the matrix. The linguistic transition matrix created for our problem (Figure 4(a)) is similar to that in Figure 2(a). We have used trapezoidal fuzzy numbers (TrFNs) in accordance to previous studies about the probability ranges that human people associate with each linguistic expression (Bonissone and Decker 1985) (Figure 4(b)). Every row of the fuzzy matrix fulfills the condition of being a well-formed probability distribution in the sense stated in Section 2.4.
The code to compute the fuzzy stationary probabilities from this matrix is the following. Variable linguisticTransitions is a matrix of labels (strings) contained in the package. Its entries should match the names of allnumbers, which is a list of 'FuzzyNumber's. R> names(allnumbers) <-c("EU", "VLC", "SC", "IM", "MC", "ML", "EL") R> rownames(linguisticTransitions) <-robotStates R> colnames(linguisticTransitions) <-robotStates R> linear <-fuzzyStationaryProb(linguisticTransitions, list(verbose = TRUE, + regression = "linear", ncores = 4, fuzzynumbers = allnumbers), + step = 0.1) The code that plots the results ( Figure 5) is similar to the previous section so we will not reproduce it again. Notice the perfectly trapezoidal shape of the output fuzzy sets defined by State 12 x  Linear Figure 5: Fuzzy stationary distribution of the Markov chain of Figure 2, computed from user-specified fuzzy transition probabilities.
its α-cuts, mirroring the shape of the input probabilities. In this case, package markovchain cannot deal with uncertain transition probabilities directly, in absence of data.

On the reduction of uncertainty with more observations
It is clear that the longer we observe a chain, the more certain we are about its behavior. The causes of this intuitive idea can be mathematically explained as follows. When the number of observations increases, the simultaneous confidence intervals for multinomial proportions are smaller, representing a less uncertain value or, equivalently, a more accurate estimation. Since such intervals are indeed the α-cuts of our fuzzy transition matrix at the input, this means that the membership functions of our fuzzy transition probabilities are narrower and sharper in shape (they tend to be more singleton-like). Equivalently, as the intervals act as bound constraints for our optimization problems to compute the minimum and maximum stationary probabilities that are possible for each of the chain states, the feasible region of those optimization problems gets smaller and therefore, the minimum and the maximum found are closer to each other. This leads to smaller output α-cuts, which yields narrower, sharper fuzzy stationary probabilities which are also more singleton-like, mirroring the same phenomenon we have described for the input fuzzy transition probabilities.
In order to check this empirically, the following experiment has been designed. We have drawn a sequence of 1000 observations of the same Markov chain. From them, we take only the first 100 observations and use them to estimate the stationary probability of one of the states. After that, we do the same with the first 300 observations, then 500 and then the whole sequence of 1000 observations. The aim is to compare the uncertainty of the output fuzzy stationary probabilities obtained with every sequence. We have focused only on state 1, because the resulting fuzzy number in Figure 3 is quite wide so there is space for improvement. We have used a step size of 0.1 and piecewise linear fitting for the output membership function. The R code (run just after the code shown above) for this experiment is shown below.

R>
As a concluding remark concerning the uncertainty represented by a fuzzy number, it should be pointed out that, in some cases, fuzzy numbers are asymmetric as they tend to indicate where the true value could be located. This can be observed in some of the plots of Figure 3, especially in states 2 and 4. In both cases, the α-cuts have very large upper bounds at the base, indicating that values greater than the center are more likely than those smaller than it, i.e., the possibility that values located in the right side are the true stationary probability is greater. This is the kind of information that crisp numbers cannot provide.

Conclusions and further work
We have addressed the problem of estimating stationary probabilities of an unknown Markov chain from a sequence of observations. We first decided to estimate the transition probabilities by using fuzzy numbers, and carry out all the computations with them, to obtain fuzzy stationary probabilities. Our work serves as a proof of concept of the method proposed in Buckley (2005) and, at the same time, provides the first freely available, ready-to-use implementation of a fuzzy Markov chain estimation procedure in a widely extended programming environment, namely the R environment for statistical computing and graphics.
The R package developed for this purpose, FuzzyStatProb, has a number of advantages. The main of them is the ease of use: It just requires to have a sequence of observations represented as integers. In addition, it provides great flexibility as it allows the user to specify the kind of regression to be used for the output membership functions, and it can run in parallel, thus exploiting the computational facilities of modern multi-core architectures of conventional computers without any additional software requirement for parallelism. The results rely on the FuzzyNumbers package that has been recently published, is under active development and will most likely be adopted by the fuzzy community working on the R implementation of other fuzzy tools and methods.
The implementation has demonstrated the usefulness of the method, and the advantages of having fuzzy estimations for stationary probabilities. Fuzzy numbers are able to better capture uncertainty on the output, and this is indicated by the asymmetric, wider α-cuts. Defuzzification can better approximate the true crisp stationary probabilities. Furthermore, the method paves the way to the computation of linguistic stationary probabilities, which are more intuitive than crisp values and may be in turn the only type of output suitable when the input information about the chain is given in a linguistic manner, which is inherently uncertain.
Further work in this line may include the implementation of other fuzzy Markov chain approaches, mainly those based on fuzzy relations for the transition matrix, and establishing a comparison in the computational effort, usefulness and meaningfulness of the results.

24
FuzzyStatProb: Fuzzy Stationary Probability Estimation in Markov Chains