BFDA: A Matlab Toolbox for Bayesian Functional Data Analysis

We provide a MATLAB toolbox, BFDA, that implements a Bayesian hierarchical model to smooth multiple functional data with the assumptions of the same underlying Gaussian process distribution, a Gaussian process prior for the mean function, and an Inverse-Wishart process prior for the covariance function. This model-based approach can borrow strength from all functional data to increase the smoothing accuracy, as well as estimate the mean-covariance functions simultaneously. An option of approximating the Bayesian inference process using cubic B-spline basis functions is integrated in BFDA, which allows for efficiently dealing with high-dimensional functional data. Examples of using BFDA in various scenarios and conducting follow-up functional regression are provided. The advantages of BFDA include: (1) Simultaneously smooths multiple functional data and estimates the mean-covariance functions in a nonparametric way; (2) flexibly deals with sparse and high-dimensional functional data with stationary and nonstationary covariance functions, and without the requirement of common observation grids; (3) provides accurately smoothed functional data for follow-up analysis.


Introduction
Since Ramsay and Dalzell (1991) first coined the term "functional data analysis" (FDA) for analyzing data that are realizations of a continuous function, many statistical methods and tools have been proposed for FDA. For examples, Graves et al. (2010) provided both R package fda  and MATLAB package fdaM (Ramsay 2014) for standard functional data analysis (Ramsay andSilverman 2002, 2005); Febrero-Bande and de la Fuente (2012) provided R package fda.usc for implementing nonparametric functional data analysis methods (Vieu and Ferraty 2006) with fda ; Yao et al. (2005a,b) developed the key technique of Functional Principal Component Analysis (FPCA) for analyzing sparse functional data, accompanied by the MATLAB package PACE (Yao et al. 2015); Crainiceanu and Goldsmith (2010) proposed insights about implementing the standard Bayesian FDA using WinBUGS (Sturtz et al. 2005); and Shi and Cheng (2014) derived R package GPFDA for applying the Bayesian nonparametric Gaussian process (GP) regression models (Shi and Choi 2011). However, the smoothing step that constructs functions from noisy discrete data has been neglected by most of the existing FDA methods and tools, where functional representations are integrated in the analyzing models. On the other hand, most of the existing smoothing methods process functional samples individually (e.g., cubic smoothing spline (CSS) and kernel smoothing (Green and Silverman 1993;Ramsay and Silverman 2005)), which is likely to induce bias when functional samples are of the same distribution.
Here, we provide a MATLAB toolbox BFDA for simultaneously smoothing multiple functional observations from the same distribution and estimating the underlying mean-covariance functions, using a nonparametric Bayesian Hierarchical Model (BHM) with Gaussian-Wishart processes (Yang et al. 2016). This model-based approach borrows strength through modeling the shared mean-covariance functions, thus producing more accurate smoothing results than the individually smoothing methods (Yang et al. 2016). Moreover, BFDA is flexible for analyzing sparse and dense functional data without the requirement of common observation grids, suitable for analyzing functional data with both stationary and nonstationary covariance functions, and efficient for high-dimensional functional data using a Bayesian framework with Approximations by Basis Functions (BABF) (Yang et al. 2015). In addition, BFDA provides options for implementing the standard Bayesian GP regression method, conducting Bayesian principal component analysis, and using the fdaM package for follow-up FDA.
In the following context, we first review the BHM and BABF methods in Section 2, and then provide examples using BFDA with simulated data in Section 3. In Section 4, we compare the functional linear regression results by fdaM using the smoothed data by BFDA and CSS. Last, we conclude with a discussion in Section 5. Details of input options and outputs, as well as example MATLAB scripts of generating the simulation results in this paper, are provided in the Appendix.
The IWP prior on Σ Z (·, ·) models the covariance function nonparametrically and therefore makes the BHM method suitable for analyzing functional data with unknown stationary and nonstationary covariance structures.
The hyper parameter σ 2 s provides the flexibility of estimating the scale of the covariance structure in the IWP prior from the data.
For the hyper-parameter setup, we take µ 0 (·) as the smoothed empirical mean estimate, c as 1 for the same data variation on the mean function, δ as 5 for noninformative prior on the variance function, and determine (a , b , a s , b s ) by matching the hyper-prior moments with the empirical estimates. In addition, A(·, ·) can be taken as the Matérn correlation kernel for analyzing functional data with stationary covariance (default in BFDA), where d denotes the distance between two grid points, ρ is the scale parameter, ν is the order of smoothness, and K ν (·) is the modified Bessel function of the second kind; while as an smoothed empirical covariance estimate for analyzing functional data with nonstationary covariance.
Although the BHM is constructed with infinite-dimensional Gaussian-Wishart processes, practical posterior inference will be conducted in a finite manner, e.g., on the observation grids {t i }, pooled grid t = ∪ n i=1 t i , or other evaluation grids. For accommodating uncommon observation grids, especially sparsely observed data, BHM evaluates data functions and mean-covariance functions on the pooled grid, while estimating the unobserved functional data by conditioning on the observations (similarly as PACE).
s ) by the Monte Carlo Markov Chain (MCMC) algorithm (essentially a Gibbs sampler (Geman and Geman 1984)) as follows (refer to Yang et al. (2016) for details): Step 0: Set initial values. Set hyper-parameters (c, µ 0 , ν, ρ, a , b , a s , b s ). Take (µ, σ 2 ) as the empirical estimates, {Z t i } as the raw data {X t i }, and Σ Z as an identity matrix.
Step 1: Conditioning on {X t i } and (µ Z , Σ Z ), update {Z i (t)} from the corresponding conditional multivariate normal (MN) distributions; Step 2: Conditioning on {X t i } and {Z t i }, update the noise variance σ 2 from the conditional Inverse-Gamma (IG) distribution; Step 3: Conditional on {Z i (t)} and Σ Z , update µ Z from the conditional MN distribution; Step 4: Conditioning on {Z i (t)} and µ Z , update Σ Z from the conditional Inverse-Wishart (IW) distribution; Step 5: Conditioning on Σ Z , Update σ 2 s from the conditional Gamma distribution. Specifically, the averages of posterior samples of {Z i (t), µ Z , Σ Z } are taken as estimates for functional signals and mean-covariance functions.
In addition, BFDA uses existing MATLAB package mcmcdiag (Särkkä and Aki 2014) to diagnosis the MCMC convergence by potential scale reduction factor (PSRF) (Gelman and Rubin 1992), and implements the method proposed by Yuan and Johnson (2012) with the pivotal discrepancy measures (PDM) of standardized residuals for the goodness-of-fit diagnosis of the assumed model.

BABF
Because BHM (Yang et al. 2016) has computational complexity O(np 3 m) with n samples, p pooled-grid points, and m MCMC iterations, the method encounters computational bottleneck for analyzing functional data with large pooled-grid dimension p. To resolve this computational bottleneck issue with high-dimensional data, BFDA enables the option of using the BABF method proposed by Yang et al. (2015), in which the posterior inference in BHM is conducted with approximations using basis functions. Here, we briefly outline the BABF method.
First select a working grid based on data density, τ = (τ 1 , τ 2 , · · · , τ L ) ⊂ T , L << p, then approximate {Z i (τ )} by a system of basis functions (e.g., cubic B-splines). Let , a linear transformation of Z i (τ ). Note that even if B(τ ) is singular or non-square, ζ i can still be written as a linear transformation of Z i (τ ), e.g., using the generalized inverse (James 1978) of B(τ ).
Step 2: Conditioning on {ζ i }, update µ ζ and Σ ζ respectively from the conditional MN and IW distributions.
Step 3: Step 4: Conditioning on {Z t i } and {X t i }, update σ 2 by the conditional IG distribution; Step 5: Conditioning on Σ Z (τ , τ ), update σ 2 s by the conditional Gamma distribution.
As a result, the posterior estimates of ({ζ i }, µ ζ , Σ ζ ) are given by the averages of the MCMC samples, which are then used to estimate Step 3.

Basis-function construction
BFDA uses the existing MATLAB package bspline (Hunyadi 2010) to construct B-spline basis functions, using the optimal knot sequence for interpolation at the working grid τ . The optimal knot sequence is generated by the MATLAB function optknt (Gaffney and Powell 1976;Micchelli et al. 1976;de Boor 1977). Yang et al. (2015) instructed selecting τ to represent data densities (L maybe selected by grid search with test data), such as taking the 100 L+1 , · · · , L×100 L+1 percentiles of the pooled grid, or the equally-spaced grid for evenly distributed data.

Examples with simulated data
In this Section, we provide examples of using BFDA to analyze simulated functional data with stationary and nonstationary covariance functions, common and uncommon (sparse) observation grids, as well as random observation grids. The simulation data used for the example results were generated with n=30, p=40, au=0, bu=π/2, s= √ 5, r=2, nu=3.5, rho=0.5, dense=0.6, and pgrid as the equally spaced grid over (0, π/2) with length 40.
Let p denote the length of pgrid. The output GausFD_cgrid is a data structure consisted with a cell of true data (Xtrue cell 1×n ), a cell of noisy data (Xraw cell 1×n ), a cell of observation grids (Tcell 1×n ), a true covariance matrix on pgrid (Cov true p×p ), and a true mean vector on pgrid (Mean true 1×p ).

Common grids
First, we need to setup the required parameter structure by function setOptions_bfda. For example, to analyze functional data with common observation grids and stationary covariance function by BHM, the structure param can be set as > param = setOptions_bfda('smethod', 'bhm', 'cgrid', 1, 'mat', 1, 'M', 10000, 'Burnin', 2000, 'w', 1, 'ws', 1); where each parameter is followed by its value, and unspecified parameters are taken as default values (Appendix A.1.). Specifically, smethod='bhm' denotes using the BHM method; cgrid=1 denotes the analyzed data are of common-grid; mat=1 denotes taking A(·, ·) in Equation 1 as the Matérn correlation function; M=10000 denotes the number of MCMC iterations; Burnin=2000 denotes the number of MCMC burn-ins; w=1 and ws=1 are used to tune the Gamma priors for σ 2 and σ 2 s . With both param and GausFD_cgrid, we can then call the main function BFDA() by > [out_cgrid, param] = BFDA(GausFD_cgrid.Xraw_cell, GausFD_cgrid.Tcell, param); for smoothing and estimating the common-grid functional data by BHM. The output structure out_cgrid contains smoothed estimates for the signals (out_cgrid.Z), mean function (out_cgrid.mu), covariance function (out_cgrid.Sigma), and other parameters in Equation 1, along with the corresponding 95% point-wise credible intervals (Appendix A.1.). The output argument param is the updated parameter structure.

Example results
In Figure 1(a, b), we show that the smoothed signals by BHM (blue solid) are close to the truth (red dashed), and the coverage probabilities of the 95% point-wise credible intervals (blue dotted) are > 0.95, for both scenarios with common and uncommon grids. In addition, the nonparametric mean estimates by BHM (blue solid curves in Figure 1(c, d)) are also smooth and close to the truth (red dashed), while the corresponding 95% point-wise credible intervals (blue dotted) have coverage probabilities > 0.9. In addition, we show that the Bayesian nonparametric covariance estimates in Figure 2(a, b) are clearly smoother than the sample covariance estimate by using the raw common-grid data in Figure 2(c), and close to the true stationary covariance in Figure 2(d). Importantly, although 40% information is lost for the uncommon-grid scenario, BHM still produces good smoothing and estimation results.

Common grids
To apply BHM on functional data with nonstationary covariance function and common grids, e.g., GausFD_cgrid_ns generated by > GausFD_cgrid_ns = sim_gfd(pgrid, n, s, r, nu, rho, dense, cgrid, 0); the main function BFDA() can be called by Here, A(·, ·) in (2.1) is set as the empirical smooth covariance estimate (mat=0) that is given by PACE (Yao et al. 2005a(Yao et al. , 2015 with pace=1 (default), or by the sample covariance estimate using CSS smoothed data with pace=0.

Uncommon grids
If the nonstationary functional data are collected on uncommon (sparse) grids, e.g., GausFD_ucgrid_ns generated by > GausFD_ucgrid_ns = sim_gfd(pgrid, n, s, r, nu, rho, dense, 0, 0); we only need to set cgrid=0, mat=0 in the parameter structure for the common-grid scenario and then call the main function BFDA() by

Example results
Similarly, as shown in Figures 3 and 4, the BHM estimates of signals and mean-covariance functions are close to the truth. Specifically, the 95% pointwise credible intervals of the BHM signal estimates have coverage probabilities > 0.95. Although BHM overestimated the covariance, BHM captured the major covariance structure and produced a smoothed estimate. The magnitude of the BHM estimate can be tuned by ws, where a smaller ws will relatively shrink the magnitude of BHM covariance estimate. We suggest users to tune this parameter according to the magnitude of sample covariance estimate.

Analyze functional data by BABF
BFDA also provides the convenience to simulate stationary and nonstationary functional data with random observation grids from the same GPs as in Section 3.1. For example, a structure of functional data GausFD_rgrid, with n independent observations and p random grids per observation (uniformly generated from [au, bu], can be generated by > GausFD_rgrid = sim_gfd_rgrid(n, p, au, bu, s, r, nu, rho, stat); where stat=1 specifies simulating from the stationary GP, while stat=0 specifies simulating from the nonstationary GP.

Stationary data
To analyze stationary functional data by BABF, simply call the main function BFDA() by: where the working grid τ will be set as the equally spaced quantiles of the pooled grid by default, with length m (when tau is not initialized in param_rgrid).

Example results
With random observation grids, the BABF method can efficiently analyze both stationary and nonstationary functional data, producing smooth estimates for signals and mean-covariance functions that are close to the truth (Figures 5, 6). Specifically, the 95% pointwise credible intervals of signal estimates have coverage probabilities > 0.95.

Functional linear regression
We expect follow-up FDA results will be improved by using the accurately smoothed functional data produced by BFDA. Specifically, we show examples of functional linear regression in this Section, considering the following two models, where • Y in Equation 4 denotes a n × 1 vector of scalar responses; Y (t) = (y 1 (t), · · · , y n (t)) in Equation 5 denotes a vector of functional responses; • X(t) denotes a n × q design matrix of q functional independent variables; • β(t) denotes a q × 1 vector of coefficient functions for independent variables; • β 0 and β 0 (t) denote the intercept terms; • and (t) denote the error terms.
Note that X(t) and β(t) can also denote nonfunctional covariates and coefficients, because nonfunctional variables can be thought as constant functions of t.
Because the functional regression function fRegress() in fdaM requires inputs of functional data with common grids, we interpolated the simulated true data, smoothed data by BABF with BFDA, and the raw data with noises on the equally spaced common grid (with length 40) over [0, π/2], by cubic smoothing spline (CSS, using the function csaps() with the suggested optimal smoothing parameter 1). As a result, the interpolated signals from the raw data are equivalent to the individually smoothed ones by CSS (one example curve is shown in Figure  7(a)).
With the smoothed data by BABF and CSS, we respectively fitted the functional linear models (Equations 4 and 5) using 20 randomly chosen signals, and then tested the prediction results using the remains. We replicated this fitting process for 100 times, and evaluated the performance by the average mean square errors (MSEs) of the fitted and predicted responses.

Results with scalar responses
For the case with scalar responses, although the fitted coefficient functions using both smoothed data by BABF and CSS are close to the truth (Figure 7(b, c)), with coverage probabilities > 0.95 for the 95% confidence intervals, the average MSEs of the fitted and predicted responses from 100 replications are smaller for using the BABF smoothed data than the ones using the CSS smoothed data (0.311 vs. 0.388 for fitted responses, 0.497 vs. 0.677 for predicted responses, as shown in Table 1). Figure 8 shows the results of an example replication of this fitting and predicting process with scalar responses.

Results with functional responses
For the case with functional responses, we can see that the fitted intercept term using the BABF smoothed data is closer to the truth (constant 0) with narrower 95% confidence interval than the one using the CSS smoothed data (Figure 9(a, c)). In addition, the coefficient function using BABF smoothed data has narrower 95% confidence interval but higher coverage probability (Figure 9(b, d) Figure 10 shows the results of an example replication of this fitting and predicting process with functional responses.    Figure 9: (a) Intercept function β 0 (t) using the BABF smoothed data; (b) Intercept function β 0 (t) using the CSS smoothed data; (c) coefficient function β(t) using the BABF smoothed data; (d) coefficient function β(t) using the CSS smoothed data; along with 95% confidence intervals and true coefficient functions in the cyan dotted lines.

Discussion
The MATLAB tool BFDA presented in this paper can simultaneously smooth multiple functional observations and estimate the mean-covariance functions, assuming the functional data are from the same GP. The smoothed data by BFDA are shown to be more accurate than the conventional individual smoothing methods, thus improving follow-up analysis results. The advantages of BFDA include: • Simultaneously smoothing multiple functional samples and estimating mean-covariance functions in a nonparametric way; • Flexibly handling functional data with stationary and nonstationary covariance functions, common or uncommon (sparse) observation grids; • Efficiently dealing with high-dimensional functional data by the BABF method.
BFDA is suitable for analyzing data that can be roughly assumed as from the same GP distribution. We recommend using the BHM method for low-dimensional functional data with common grids or sparse functional data, and using the BABF method for high-dimensional data with dense grids (including both common and random grids). In addition, we recommend using the Matérn function as the prior covariance structure for analyzing functional data with stationary covariance functions, while using the empirical covariance estimate (e.g., the estimate by PACE is recommended) for analyzing functional data with nonstationary covariance functions.
The follow-up functional data analysis can be conducted using the existing softwares (e.g., fdaM in MATLAB, fda in R). Examples are provided in BFDA about using fdaM with the smoothed data by BFDA, showing improved regression results than using the individually smoothed data. Details about the inputs and outputs of BFDA, and part of the example MATLAB scripts are provided in the Appendices. The BFDA tool and example scripts are freely available at https://github.com/yjingj/BFDA. We will continue integrating more options of basis functions, Bayesian functional data regression using GPs, and functional classification into BFDA.
smethod, specifying the method used for analyzing the functional data. Default value is 'babf' for BABF method with basis function approximation; other choices are 'bhm' for BHM method without basis function approximation, 'bgp' for standard Bayesian GP regression, 'bfpca' for Bayesian principal components analysis; -Burnin, the number of burn-ins for the MCMC algorithm. Default value is 2000; -M, the number of iterations for the MCMC algorithm. Default value is 10000; cgrid, set as 1 if the functional data are observed on a common-grid, otherwise set as 0 for uncommon or random grids. Default value is 1; -Sigma_est, estimated smooth covariance matrix from previous analysis. Default is empty and will be estimated by PACE or sample estimate from individually smoothed data; -mu_est, estimated smooth functional mean from previous analysis. Default is empty and will be set as the smoothed sample mean; mat, set as 1 to use the Mateŕn covariance function as prior structure for stationary functional data; set as 0 to use the empirical covariance estimate Sigma_est as the prior structure for nonstationary functional data. Default value is 1; nu, order of smoothness for the Mateŕn covariance function. Default is empty and will be estimated based on Sigma_est; delta, shape parameter δ of the IWP. Default is 5 for a non-informative prior; c, determining the prior covariance for functional mean. Default is 1; w, ws, determining the prior gamma distributions for σ 2 and σ 2 s . Defaults are w=1, ws=0.1. The parameter ws should be tuned for a proper magnitude of the posterior covariance estimate; pace, if Sigma_est and mu_est are empty, set pace=1 to obtain Sigma_est and mu_est by PACE, and set pace=0 to use the empirical estimates from the individually smoothed data by CSS. Default is 1; m, tau, working grid tau is only required for 'babf' method. Default is empty and will be set up as the (0 : 100 m−1 : 100) percentiles of the pooled observation grid with length m; -eval_grid, evaluation grid for all functional estimates, only required for 'babf' methods; -lamb_min, lamb_max, lamb_step, determining the smoothing parameter candidates for general cross validation of the CSS method. Defaults are lamb_min=0.9, lamb_max=0.99, lamb_step=0.01; a, b, hyper parameters for the gamma distributions in 'bgp', and 'bfpca'.
-resid_thin, determine the MCMC thinning steps of the residuals that are used to test the goodness-of-fit of the model. Default is 10.

A.2. Output variables
The main function BFDA() has two output arguments, one structure outputted by the specified method, and the other parameter structure as specified by setOptions_bfda() containing updated parameter values.
Output structure with smethod = 'bhm': • Z, Z_CL, Z_UL, smoothed functional data, lower and upper 95% credible intervals; • Sigma, Sigma_CL, Sigma_UL, functional covariance estimate, lower and upper 95% credible intervals; • Sigma_SE, the empirical covariance estimate by using the smoothed data Z; • mu, mu_CI, functional mean estimate, 95% credible intervals; • rn, rn_CI, estimate and 95% credible interval for the noise precision; • rs, rs_CI, estimate and 95% credible interval for σ 2 s ; • rho, nu, estimated parameter values for the Matérn function; • residuals, MCMC samples of the residuals that are used to test the goodness-of-fit; • pmin_vec, p-values for testing the goodness-of-fit for all functional samples. P-value > 0.25 suggests no evidence of model inadequacy; 0.05 < p-value < 0.25 suggests some evidence of model inadequacy; p-value < 0.05 suggests strong evidence of model inadequacy.