NeuralSens: Sensitivity Analysis of Neural Networks

Neural networks are important tools for data-intensive analysis and are commonly applied to model non-linear relationships between dependent and independent variables. However, neural networks are usually seen as "black boxes" that offer minimal information about how the input variables are used to predict the response in a fitted model. This article describes the \pkg{NeuralSens} package that can be used to perform sensitivity analysis of neural networks using the partial derivatives method. Functions in the package can be used to obtain the sensitivities of the output with respect to the input variables, evaluate variable importance based on sensitivity measures and characterize relationships between input and output variables. Methods to calculate sensitivities are provided for objects from common neural network packages in \proglang{R}, including \pkg{neuralnet}, \pkg{nnet}, \pkg{RSNNS}, \pkg{h2o}, \pkg{neural}, \pkg{forecast} and \pkg{caret}. The article presents an overview of the techniques for obtaining information from neural network models, a theoretical foundation of how the partial derivatives of the output with respect to the inputs of a multi-layer perceptron model are calculated, a description of the package structure and functions, and applied examples comparing \pkg{NeuralSens} functions with analogous functions from other available \proglang{R} packages.

Machine-learning algorithms are commonly used in data-intensive analysis (Hastie, Tibshirani, and Friedman (2001), Butler, Davies, Cartwright, Isayev, and Walsh (2018), Vu, Adalı, Ba, Buzsáki, Carlson, Heller, Liston, Rudin, Sohal, Widge, Mayberg, Sapiro, and Dzirasa (2018)), as they are able to detect patterns and relations in the data without being explicitly programmed. Artificial Neural Networks (ANN) are one of the most popular machine-learning algorithms due to their versatility. ANNs were designed to mimic the biological neural structures of animal brains (McCulloch and Pitts (1943)) by "learning" to perform tasks by considering examples and modifying their structure through iterative algorithms (Rojas (1996)). The form of ANN discussed in this paper is the feed-forward multilayer perceptron (MLP) (Rumelhart, Hinton, and Williams (1986)). MLPs are one of the most popular forms of ANNs and have been used in a wide variety of applications (Mosavi, Salimi, Faizollahzadeh Ardabili, Rabczuk, Shamshirband, and Varkonyi-Koczy (2019), Smalley (2017), Hornik, Stinchcombe, and White (1989)). This model consists of interconnected units, called nodes or perceptrons, that are arranged in layers. The first layer consists of inputs (or independent variables), the final layer is the output layer, and the layers in between are known as hidden layers (Özesmi and Özesmi (1999)). Assuming that there is a relationship between the outputs and the inputs, the goal of the MLP is to approximate a non-linear function representing the relationship between the output and input variables of a given dataset with minimal residual error (Hornik (1991), Cybenko (1989)).
Neural networks provide predictive advantages when compared to other models, such as the ability to implicitly detect complex non-linear relationships between dependent and independent variables. However, the complexity of neural networks makes it difficult to obtain information on how the model uses the input variables to predict the output. Finding methods for extracting information on how the input variables affect the response variable has been a recurrent topic of research in neural networks (Olden, Joy, and Death (2004), Zhang, Beck, Winkler, Huang, Sibanda, and Goyal (2018)). Some examples are:
1. Neural Interpretation Diagram (NID), as described by Özesmi and Özesmi (1999), for plotting the ANN structure. A NID is a modified version of the standard representation of neural networks which changes the color and thickness of the connections between neurons based on the sign and magnitude of their weights.
2. Garson's method for variable importance (Garson 1991). It consists of summing the products of the absolute values of the weights connecting the input variable to the response variable through the hidden layer. Afterwards, the result is scaled relative to all other input variables, so the relative importance of each input variable is given as a value from zero to one.
3. Olden's method for variable importance (Olden et al. 2004). This method is similar to Garson's, but it uses the actual values of the connection weights instead of their absolute values and it does not scale the result (a sketch of both computations appears after this list).
4. Input Perturbation (Scardi and Harding (1999), Gevrey, Dimopoulos, and Lek (2003)). It consists of adding an amount of white noise to each input variable while maintaining the other inputs at a constant value. The resulting change in a chosen error metric for each input perturbation represents the relative importance of each input variable.
5. Profile method for sensitivity analysis (Lek, Delacoste, Baran, Dimopoulos, Lauga, and Aulagnier (1996)). Similar to the Input Perturbation algorithm, it changes the value of one input variable while maintaining the other variables at a constant value.
These constant values are different quantiles of each variable, therefore a plot of model predictions across the range of values of the input is obtained. A modification of this algorithm is proposed in Beck (2018). To avoid unlikely combinations of input values, a clustering technique is applied to the training dataset and the center values of the clusters are used instead of the quantile values.
6. Partial derivatives method for sensitivity analysis (Dimopoulos, Bourret, and Lek (1995), Dimopoulos, Chronopoulos, Chronopoulou-Sereli, and Lek (1999), Muñoz and Czernichow (1998), White and Racine (2001)). It performs a sensitivity analysis by computing the partial derivatives of the ANN outputs with regard to the input neurons evaluated on the samples of the training dataset (or an analogous dataset).
7. Partial dependence plot (PDP) (Friedman (2001), Goldstein, Kapelner, Bleich, and Pitkin (2015)). PDPs help visualize the relationship between a subset of the input variables and the response while accounting for the average effect of the other inputs. For each input, the partial dependence of the response with regard to the selected input is calculated following two steps. Firstly, individual conditional expectation (ICE) curves are obtained, one for each sample of the training dataset. The ICE curve for sample k is built by obtaining the model response using the input values at sample k, except for the input variable of interest, whose value is replaced by other values it has taken in the training dataset. Finally, the PDP curve for the selected variable is calculated as the mean of the ICE curves obtained.
8. Local linearization. The complex neural network model is explained by approximating it locally with an interpretable model, such as a linear regression or a decision tree model.

9. Forward stepwise addition (Gevrey et al. (2003)). It consists of rebuilding the neural network by sequentially adding an input neuron and its corresponding weights. The change in a chosen error metric at each step represents the relative importance of the corresponding input.
10. Backward stepwise elimination (Gevrey et al. (2003)). It consists of rebuilding the neural network by sequentially removing an input neuron and its corresponding weights. The change in a chosen error metric at each step represents the relative importance of the corresponding input.
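As a minimal sketch of Garson's and Olden's measures for a single-hidden-layer MLP (the function and variable names are hypothetical; W is the input-to-hidden weight matrix with one row per input, v the hidden-to-output weight vector, and biases are ignored, as in both methods):

R> garson_imp <- function(W, v) {
+    contrib <- sweep(abs(W), 2, abs(v), "*")       # |w_ij| * |v_j|
+    shares <- sweep(contrib, 2, colSums(contrib), "/")
+    imp <- rowSums(shares)
+    imp / sum(imp)                                  # scaled to sum to one
+ }
R> olden_imp <- function(W, v) as.vector(W %*% v)   # signed, unscaled products
R> set.seed(1)
R> W <- matrix(rnorm(6), nrow = 3)                  # 3 inputs, 2 hidden neurons
R> v <- rnorm(2)
R> garson_imp(W, v)
R> olden_imp(W, v)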
These methods help with neural network diagnosis by retrieving useful information from the model. However, these methods have some disadvantages: NID can be difficult to interpret given the amount of connections in most networks, Garson's and Olden's algorithms only account for the weights of the input variable connections in the hidden layer, and Lek's profile method may present analyses of input scenarios not represented by the input training data or require other methods like clustering (using the center of the clusters instead of the range quantiles of the input variables) with its inherent disadvantages (Xu and Tian (2015)). Partial dependence plots have a similar disadvantage as they might provide misleading information if the value of the output variable depends not only on the variable of interest but also on compound effects of input variables. Local linearization is useful for interpreting the input variable importance in specific regions of the dataset, but it does not give a quantitative importance measure for the entire dataset. Forward stepwise addition and backward stepwise elimination perform a more exhaustive analysis, but are computationally expensive and may produce different results based on the order in which the inputs are added/removed and the initial training conditions of each model.
The partial derivatives method overcomes these disadvantages by analytically calculating the derivative of each output variable with regard to each input variable evaluated on each data sample of a given dataset. The contribution of each input is calculated in both magnitude and sign taking into account not only the connection weights and the activation functions, but also the values of each input. By using all the samples of the dataset, the effect of the input variables in the response is calculated for the real values of the data, avoiding information loss due to clustering. Analytically calculating the derivatives results in more robust diagnostic information, because it depends solely on how well the neural network predicts the output. As long as the neural network predicts the output variable with enough precision, the derivatives will be the same regardless of the training conditions and the structure of the network (Beck 2018).
As stated before, the main objective of the proposed methods is to extract information from a given neural network model. For example, unnecessary inputs may lead to a higher complexity of the neural structure and prevent finding the optimal model, thus affecting the performance of the neural network. Several researchers defend the ability of the partial derivatives method to determine whether an explanatory variable is irrelevant for predicting the response of the neural network (White and Racine (2001), Zurada, Malinowski, and Cloete (1994), Engelbrecht, Cloete, and Zurada (1995)). Pruning the neural network of these irrelevant inputs improves its capability to model the relationship between response and explanatory variables and, consequently, the quality of information that can be extracted from the model.
Using the partial derivatives method has some disadvantages that should be noted. The operations required to calculate partial derivatives are time-consuming when compared to other methods such as Garson's and Olden's. The computing time grows as the size of the neural network or the size of the database used to calculate the partial derivatives increases.
Additionally, the input variables should be normalized when using this method, as otherwise the value of the partial derivatives may depend on the scale of each variable and produce misleading results. However, its advantages with regard to other methods make sensitivity analysis a very useful technique for interpreting and improving neural network models.
This article describes the NeuralSens package (Pizarroso, Portela, and Muñoz (2019)) for R (R Core Team (2017)) which can be used to perform sensitivity analysis of MLP neural networks using partial derivatives. The main function of the package includes methods for MLP objects from the most popular neural network packages available in R. To the authors' knowledge, there is no other R package that calculates the partial derivatives of a neural network. The NeuralSens package is available at the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=NeuralSens, and the development version is maintained as a GitHub repository at https://github.com/JaiPizGon/NeuralSens. It should be mentioned that other algorithms to analyze neural networks are already implemented in R: NID, Garson's, Olden's and Lek's profile algorithms are implemented in NeuralNetTools (Beck (2018)), the partial dependence plots method is implemented in pdp (Greenwell (2017)) and local linearization is implemented in lime (Pedersen and Benesty (2019)).
The rest of this article is structured as follows. Section 2 describes the theory of the functions in the NeuralSens package, along with references to general introductions to neural networks. Section 3 presents the architecture details of the package. Section 4 shows applied examples for using the NeuralSens package, comparing the results with packages currently available in R. Finally, Section 5 concludes the article.

Theoretical foundation
The NeuralSens package has been designed to calculate the partial derivatives of the output with regard to the inputs of an MLP model in R. The remainder of this section explains the theory of multilayer perceptron models, how to calculate the partial derivatives of the output of this type of model with regard to its inputs, and some sensitivity measures proposed by the authors.

Multilayer perceptron
A fully-connected feed-forward MLP has one-way connections from the units of one layer to all neurons of the subsequent layer. Each time the output of one unit travels along one connection to another unit, it is multiplied by the weight of the connection. At each unit the inputs are summed and a constant, or bias, is added. Once all the input terms of each unit are summed, an activation function is applied to the result. Figure 1 shows the scheme of a neuron in an MLP model and represents graphically the operations in Equation (1).
For each neuron, the output $y_k^l$ of the $k^{th}$ neuron in the $l^{th}$ layer is calculated as:

$$y_k^l = \varphi_k^l\left(z_k^l\right) = \varphi_k^l\left(\sum_{j=1}^{n_{l-1}} w_{kj}^l \cdot y_j^{l-1} + b^l\right) \qquad (1)$$

where $z_k^l$ refers to the weighted sum of the neuron inputs, $n_{l-1}$ refers to the number of neurons in the $(l-1)^{th}$ layer, $w_{kj}^l$ refers to the weight of the connection between the $j^{th}$ neuron in the $(l-1)^{th}$ layer and the $k^{th}$ neuron in the $l^{th}$ layer, $\varphi_k^l$ refers to the activation function of the $k^{th}$ neuron in the $l^{th}$ layer, $b^l$ refers to the bias in the $l^{th}$ layer and $\cdot$ refers to the scalar product operation. For the input layer ($l = 1$), $y_j^{l-1} = x_j$, $w_{kj}^1 = 1$ and $b^1 = 0$.

Figure 2 can be treated as a general MLP model. An MLP can have $L$ layers, and each layer $l$ ($1 \le l \le L$) has $n_l$ ($n_l \ge 1$) neurons, where $n_1$ stands for the input layer and $n_L$ for the output layer. For each layer $l$, the input dimension is equal to the output dimension of layer $(l-1)$. For a neuron $i$ ($1 \le i \le n_l$) in layer $l$, its input vector, weight vector and output are $\hat{y}^{l-1} = \left(b^l, y_1^{l-1}, \cdots, y_{n_{l-1}}^{l-1}\right)$, $w_i^l = \left(w_{i0}^l, w_{i1}^l, \cdots, w_{i n_{l-1}}^l\right)$ and $y_i^l = \varphi_i^l\left(z_i^l\right) = \varphi_i^l\left(\hat{y}^{l-1} \cdot w_i^l\right)$ respectively, where $\varphi_i^l: \mathbb{R} \to \mathbb{R}$ refers to the neuron activation function and $\cdot$ refers to the matrix multiplication operator. For each layer $l$, its input vector is $\hat{y}^{l-1}$, its weight matrix is $W^l = \left(w_1^l, \cdots, w_{n_l}^l\right)$ and its output vector is $y^l = \left(y_1^l, \cdots, y_{n_l}^l\right) = \Phi^l\left(z^l\right) = \Phi^l\left(\hat{y}^{l-1} \cdot W^l\right)$, where $\Phi^l: \mathbb{R}^{n_l} \to \mathbb{R}^{n_l}$ is a vector-valued function defined as $\Phi^l(z) = \left(\varphi_1^l(z_1), \cdots, \varphi_{n_l}^l(z_{n_l})\right)$.

The weights of the neural structure determine how the information flows from the input layer to the output layer. Identifying the optimal weights that minimize the prediction error on a dataset is called training the neural network. Different algorithms exist to identify these weights, the most widely used being the backpropagation algorithm described in Rumelhart et al. (1986). Explaining these training algorithms is beyond the scope of this paper.
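As an illustration of Equation (1), the following sketch implements a forward pass for a small fully-connected MLP (the architecture and all weight values are hypothetical):

R> forward <- function(x, weights, biases, activations) {
+    y <- x
+    for (l in seq_along(weights)) {
+      z <- weights[[l]] %*% y + biases[[l]]  # weighted sum plus bias, Equation (1)
+      y <- activations[[l]](z)               # layer activation function
+    }
+    as.vector(y)
+ }
R> sigmoid <- function(z) 1 / (1 + exp(-z))
R> set.seed(1)
R> weights <- list(matrix(rnorm(6), nrow = 2), matrix(rnorm(2), nrow = 1))  # 3-2-1 MLP
R> biases <- list(rnorm(2), rnorm(1))
R> forward(c(0.5, -1, 2), weights, biases, list(sigmoid, identity))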

Partial derivatives
The sensitivity analysis performed by the NeuralSens package is based on the partial derivatives method. This method consists of calculating the derivatives of the outputs with regard to the inputs of the neural network. These partial derivatives are called sensitivities and are defined as:

$$s_{ik}\big|_{x_n} = \frac{\partial y_k^L}{\partial x_i}\bigg|_{x_n} \qquad (2)$$

where $x_n$ refers to the $n^{th}$ sample of the dataset used to perform the sensitivity analysis and $s_{ik}\big|_{x_n}$ refers to the sensitivity of the output of the $k^{th}$ neuron in the output layer with regard to the input of the $i^{th}$ neuron in the input layer, evaluated at $x_n$. These sensitivities are calculated by applying the chain rule to the partial derivatives of the inner layers (derivatives of Equation (1) for each neuron in the hidden layers). The partial derivatives of the inner layers are defined by the following equations:

• Derivative of $z_k^l$ (input of the $k^{th}$ neuron in the $l^{th}$ layer) with regard to $y_i^{l-1}$ (output of the $i^{th}$ neuron in the $(l-1)^{th}$ layer). This partial derivative corresponds to the weight of the connection between the $k^{th}$ neuron in the $l^{th}$ layer and the $i^{th}$ neuron in the $(l-1)^{th}$ layer:

$$\frac{\partial z_k^l}{\partial y_i^{l-1}} = w_{ki}^l \qquad (3)$$

• Derivative of $y_k^l$ (output of the $k^{th}$ neuron in the $l^{th}$ layer) with regard to $z_i^l$ (input of the $i^{th}$ neuron in the $l^{th}$ layer):

$$\frac{\partial y_k^l}{\partial z_i^l} = \frac{\partial \varphi_k^l}{\partial z_i^l}\left(z_i^l\right) \qquad (4)$$

where $\frac{\partial \varphi_k^l}{\partial z_i^l}$ refers to the partial derivative of the activation function of the $k^{th}$ neuron in the $l^{th}$ layer with regard to the input of the $i^{th}$ neuron in the $l^{th}$ layer, evaluated at $z_i^l$.
Equations (3) and (4) have been implemented in the package in matrix form to reduce computational time:

$$\frac{\partial z^l}{\partial y^{l-1}} = W^{*l} \qquad (5)$$

$$\frac{\partial y^l}{\partial z^l} = J_l^l \qquad (6)$$

where $W^{*l}$ is the reduced weight matrix of the $l^{th}$ layer (the weight matrix $W^l$ without the bias row) and $J_l^l$ is the Jacobian matrix of the outputs of the $l^{th}$ layer with respect to the inputs of the $l^{th}$ layer.
Following the chain rule, the Jacobian matrix of the outputs of the $l^{th}$ layer with regard to the inputs of the $(l-j)^{th}$ layer can be calculated as:

$$J_{l-j}^{l} = J_{l-j}^{l-1} \cdot W^{*l} \cdot J_{l}^{l} \qquad (7)$$

where $1 \le j \le (l-1)$ and $2 \le l \le L$, and $J_{l-j}^{l-1}$ is the Jacobian matrix of the outputs of the $(l-1)^{th}$ layer with regard to the inputs of the $(l-j)^{th}$ layer. Using this equation with $l = L$ and $j = L - 1$, the partial derivatives of the outputs with regard to the inputs of the MLP are obtained.
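The chain rule above translates directly into a product of matrices. The following sketch computes the sensitivities of the small network from the previous example and checks them numerically (the weights are hypothetical; for clarity it uses the column-vector convention, so the factors appear in reverse order with respect to Equation (7)):

R> sigmoid <- function(z) 1 / (1 + exp(-z))
R> dsigmoid <- function(z) sigmoid(z) * (1 - sigmoid(z))
R> set.seed(1)
R> W1 <- matrix(rnorm(6), nrow = 2); b1 <- rnorm(2)  # hidden layer: 3 inputs -> 2 neurons
R> W2 <- matrix(rnorm(2), nrow = 1); b2 <- rnorm(1)  # output layer: 2 -> 1, linear
R> x <- c(0.5, -1, 2)
R> z1 <- as.vector(W1 %*% x + b1)
R> J <- W2 %*% diag(dsigmoid(z1), nrow = 2) %*% W1   # 1 x 3 matrix of sensitivities s_ik
R> f <- function(x) as.vector(W2 %*% sigmoid(W1 %*% x + b1) + b2)
R> eps <- 1e-6
R> sapply(1:3, function(i) (f(x + eps * (1:3 == i)) - f(x)) / eps)  # matches J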

Sensitivity measures
Once the sensitivities have been obtained for each variable and observation, different measures can be calculated to analyze the results. The authors propose the following sensitivity measures to summarize the information obtained by evaluating the sensitivities at all the input samples $x_n$ of the provided dataset:

• Mean sensitivity of the output of the $k^{th}$ neuron in the output layer with regard to the $i^{th}$ input variable:

$$S_{ik}^{avg} = \frac{\sum_{n=1}^{N} s_{ik}\big|_{x_n}}{N} \qquad (8)$$

where $N$ is the number of samples in the dataset.
• Sensitivity standard deviation of the output of the $k^{th}$ neuron in the output layer with regard to the $i^{th}$ input variable:

$$S_{ik}^{sd} = \sigma\left(\left\{ s_{ik}\big|_{x_n} \right\}_{n=1}^{N}\right) \qquad (9)$$

where $N$ is the number of samples in the dataset and $\sigma$ refers to the standard deviation function.
• Mean squared sensitivity of the output of the $k^{th}$ neuron in the output layer with regard to the $i^{th}$ input variable (Yeh and Cheng (2010), Zurada et al. (1994)):

$$S_{ik}^{sq} = \sqrt{\frac{\sum_{n=1}^{N} \left( s_{ik}\big|_{x_n} \right)^2}{N}} \qquad (10)$$

where $N$ is the number of samples in the dataset.
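Given the matrix of raw sensitivities evaluated at each sample (one row per sample, one column per input), the three measures can be obtained as in the following sketch (the variable names are hypothetical):

R> set.seed(1)
R> raw_sens <- matrix(rnorm(1500 * 3), ncol = 3,
+    dimnames = list(NULL, c("X1", "X2", "X3")))
R> S_avg <- colMeans(raw_sens)          # Equation (8)
R> S_sd <- apply(raw_sens, 2, sd)       # Equation (9)
R> S_sq <- sqrt(colMeans(raw_sens^2))   # Equation (10)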
In case there is more than one output neuron, such as in a multi-class classification problem, these measures can be generalized to obtain sensitivity measures of the whole model as follows:

• Mean sensitivity with regard to the $i^{th}$ input variable (Equation (11)).
• Sensitivity standard deviation with regard to the $i^{th}$ input variable (Equation (12)).
• Mean squared sensitivity with regard to the $i^{th}$ input variable (Yeh and Cheng 2010) (Equation (13)).

Methods in NeuralSens to calculate the sensitivities of a neural network and the proposed sensitivity measures were written for several R packages that can be used to create MLP neural networks: class 'nn' from the neuralnet package (Fritsch, Guenther, and Wright (2019)), class 'nnet' from the nnet package (Venables and Ripley (2002)), class 'mlp' from RSNNS (Bergmeir and Benítez (2012)), classes 'H2ORegressionModel' and 'H2OMultinomialModel' from the h2o package (LeDell, Gill, Aiello, Fu, Candel, Click, Kraljevic, Nykodym, Aboyoun, Kurka, and Malohlava (2019)), class 'list' from the neural package (Nagy (2014)), class 'nnetar' from the forecast package (Hyndman and Khandakar (2008)) and class 'train' from the caret package (Kuhn et al. (2020)).

Package structure
The functionalities of the NeuralSens package are based on the new R class 'SensMLP' defined inside the package itself. NeuralSens includes four main functions based on this class to perform the sensitivity analysis of an MLP model described in the previous section:

• SensAnalysisMLP(): S3 method to perform the sensitivity analysis using partial derivatives of the outputs with regard to the inputs of the MLP model. This function returns a 'SensMLP' object with the results of the sensitivity analysis.
• SensitivityPlots(): graphically represent the sensitivity measures of a 'SensMLP' object.
• SensTimePlot(): graphically represent the evolution over time of the sensitivities of a 'SensMLP' object.
• SensFeaturePlot(): graphically represent the relation between the sensitivities of a 'SensMLP' object and the values of the input variables.
Each of these functions is detailed in the rest of this section. The outputs of the last three functions are plots created with ggplot2 package functions (Wickham 2016).

The R class 'SensMLP'
The NeuralSens package defines an S3 object called 'SensMLP' as a list with the following components:

• sens: 'list' of 'data.frames', one per neuron in the output layer, with the $S_{ik}^{avg}$, $S_{ik}^{sd}$ and $S_{ik}^{sq}$ sensitivity measures described in Section 2.3 (Equations (8), (9) and (10)). Each row of the 'data.frame' contains the sensitivity measures with regard to a specific input.
• raw_sens: 'list' of 'matrices', one per neuron in the output layer, with the sensitivities calculated following Equation (7) with $l = L$ and $j = L - 1$. Each column of each 'matrix' contains the sensitivities of the output with regard to a specific input, and each row contains the sensitivities with regard to all the inputs evaluated at the corresponding row of the trData component.
• mlp_struct: 'numeric' 'vector' indicating the number of neurons in each layer of the MLP model.
• trData: typically a 'data.frame' which contains the dataset used to calculate the sensitivities.
• coefnames: 'character' 'vector' with the names of the input variables of the MLP model.
• output_name: 'character' 'vector' with the names of the output variables of the MLP model.
Functions described in Sections 3.3 (SensitivityPlots()), 3.4 (SensTimePlot()) and 3.5 (SensFeaturePlot()) can be accessed through the plot method of the 'SensMLP' object. print() and summary() methods are also available for obtaining information on the sensitivities and sensitivity measures of the 'SensMLP' object. Examples of these methods are presented in the remaining sections.

MLP Sensitivity Analysis
The SensAnalysisMLP() function calculates the partial derivatives of an MLP model. This function consists of an S3 method (Chambers and Hastie 1992) that extracts the basic information of the model (weights, structure and activation functions) based on the model class attribute and passes this information to the default method. The default method calculates the sensitivities of the model as described in Section 2 and creates a 'SensMLP' object with the results of the sensitivity analysis. The SensAnalysisMLP() function performs all operations using matrix calculus to reduce the computational time.
In the current version of NeuralSens (version 1.0.0), the accepted activation functions are shown in Table 1. To calculate the sensitivities, the function assumes that all the neurons in a given layer have the same activation function.

Table 1: Name, function and derivative of each activation function accepted in the current version of NeuralSens, where z refers to the vector of input values of the neuron layer.
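As an illustration of the form such a function and its derivative take, a sketch for the sigmoid activation follows (not package code; the derivative returns the square Jacobian matrix required for user-supplied activation functions, as described later in the section on extending the package to other MLP models):

R> sigmoid <- function(z) 1 / (1 + exp(-z))
R> d_sigmoid <- function(z) diag(sigmoid(z) * (1 - sigmoid(z)), nrow = length(z))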
In order to show how SensAnalysisMLP() is used, a simulated dataset is used to train an MLP model of class 'mlp' (RSNNS). The dataset consists of a 'data.frame' with 1500 rows of observations and four columns: three input variables (X1, X2, X3) and one output variable (Y). The input variables are random observations of a normal distribution with zero mean and standard deviation equal to 1. The output Y is created following Equation (14), based on $X_1$ and $X_2$ (reconstructed here; it is consistent with the sensitivity measures reported below):

$$Y = \left(X_1\right)^2 - 0.5 \cdot X_2 + 0.1 \cdot \varepsilon \qquad (14)$$

where $\varepsilon$ is random noise generated using a normal distribution with zero mean and standard deviation equal to 1. $X_3$ is given to the model for training, and a properly fitted model should find no relation between $X_3$ and $Y$.
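A comparable dataset can be generated with the following sketch (the seed is arbitrary; the packaged simdata should be used to reproduce the numbers reported in this article):

R> set.seed(150)
R> simdata <- data.frame(X1 = rnorm(1500), X2 = rnorm(1500), X3 = rnorm(1500))
R> simdata$Y <- simdata$X1^2 - 0.5 * simdata$X2 + 0.1 * rnorm(1500)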
?NeuralSens::simdata can be executed to obtain more information about the data. The library is loaded by executing the following code:

R> library("NeuralSens")
To test the functionality of the SensAnalysisMLP() function, the mlp() function from the RSNNS package is used to train a neural network model on the simdata dataset, as sketched below.
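A sketch of such a call follows (the hyperparameters are illustrative and may differ from those used to produce the output below):

R> library("RSNNS")
R> set.seed(150)
R> mod1 <- mlp(simdata[, c("X1", "X2", "X3")], simdata[, "Y"], size = 15,
+    linOut = TRUE, maxit = 100)
R> sens <- SensAnalysisMLP(mod1, trData = simdata, output_name = "Y")
R> summary(sens)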
Sensitivity measures of each output:
$Y
           mean        std meanSensSQ
X1 -0.005406908 1.94524276 1.94476390
X2 -0.485564931 0.06734504 0.49021056
X3 -0.003200699 0.02971083 0.02987535

The summary() method prints the sensitivity measures of the output with regard to the inputs of the model. These measures are calculated using the sensitivities displayed by the print() method described below. The mean column ($S_{ik}^{avg}$) shows the mean effect of the input variable on the output. The std column ($S_{ik}^{sd}$) shows the variability of the input variable's effect on the output along the input space. These columns provide information on the relation between input and output variables:

• If both mean ($S_{ik}^{avg}$) and std ($S_{ik}^{sd}$) are near zero, the output is not related to the input, because for all the training data the sensitivity of the output with regard to that input is approximately zero.
• If mean ($S_{ik}^{avg}$) is different from zero and std ($S_{ik}^{sd}$) is near zero, the output has a linear relationship with the input, because for all the training data the sensitivity of the output with regard to that input is approximately constant.
• If std ($S_{ik}^{sd}$) is different from zero, regardless of the value of mean ($S_{ik}^{avg}$), the output has a non-linear relationship with the input, because the relation between the output and the input varies depending on the value of the input.
Setting an upper bound for std under which the relationship between output and input can be considered linear is a non-trivial endeavor. The authors are working on a statistic to test whether the functional relationship between an input and an output variable can be considered linear and, if successful, it will be included in a future version of the package.
In the example, the mean and std values show that:
• X1 has mean ≈ 0 and standard deviation ≈ 2. This means it has a non-constant, i.e., non-linear, effect on the response variable.
• X2 has mean ≈ −0.5 and standard deviation ≈ 0. This means it has a constant, i.e., linear, effect on the response variable.
• X3 has mean ≈ 0 and standard deviation ≈ 0. This means it has no effect on the response variable.
An input variable may be considered significant if its sensitivities $s_{ik}\big|_{x_n}$ are significantly different from zero, whether they are positive or negative. In other words, a variable is considered significant when changes in the input variable produce significant changes in the output variable of the model. White and Racine (2001) conclude that a statistic based on the squared sensitivities measures the changes in the output due to local changes in the input. Thus, $S_{ik}^{sq}$ can be defined as a measure of the importance of the input variables from a perturbation-analysis point of view, in the sense that small changes in an input with a large $S_{ik}^{sq}$ produce large changes in the output.
The 'SensMLP' class also has a print() method that shows the sensitivities of the output with regard to the inputs, evaluated at each of the rows of the trData component of the 'SensMLP' object. A second argument n may be used to specify how many rows to display (by default n = 5).
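For instance, the first two rows of sensitivities can be displayed as follows (output omitted):

R> print(sens, n = 2)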

Visualizing Neural Network Sensitivity Measures
Sensitivity measures of the output variables are useful for quantitative analysis. However, it can be difficult to compare sensitivity metrics when a large number of input variables are used. In order to visualize the information in the calculated sensitivities, the authors propose the following plots:

1. Label plot representing the relationship between $S_{ik}^{avg}$ (x-axis) and $S_{ik}^{sd}$ (y-axis).
2. Bar plot that shows $S_{ik}^{sq}$ for each input variable.
3. Density plot that shows the distribution of output sensitivities with regard to each input (Muñoz and Czernichow (1998)). In the example:
   • The narrow distribution of sensitivity values for X2 (corresponding to a constant sensitivity) indicates a linear relationship between this input and the output of the neural net.
   • The wide distribution of sensitivity values for X1 (corresponding to a variable sensitivity) indicates a non-linear relationship between this input and the output.
When the height of at least one of the distributions is greater than 10 times the height of the smallest distribution, an extra plot is created using the facet_zoom() function of the ggforce package (Pedersen and RStudio (2019)). These plots provide a better representation of the sensitivity distributions.
These plots can be obtained using the SensitivityPlots() function and a 'SensMLP' object calculated using SensAnalysisMLP(). To obtain the plots of Figure 3:

R> SensitivityPlots(sens)
Or they can be generated using the plot() method of the 'SensMLP' object:

R> plot(sens)
In this case, the first plot of Figure 3 shows that Y has a negative linear relationship with X2 (std ≈ 0 and mean < 0), no relationship with X3 (std ≈ 0 and mean ≈ 0) and a non-linear relationship with X1 (std different from 0). The second plot shows that X3 barely affects the response variable, with X1 and X2 being the inputs with the greatest effect on the output.

Visualizing Neural Network Sensitivity over time
A common application of neural networks is time series forecasting. Analyzing how sensitivities evolve over time can provide a better understanding of the effect of explanatory variables on the output variables.
SensTimePlot() returns a sequence plot of the raw sensitivities calculated by the SensAnalysisMLP() function. The x-axis corresponds to a numeric or POSIXct/POSIXlt variable containing the time information of each sample. The y-axis corresponds to the sensitivities of the output with regard to each input.
In order to show how this function can be used, the DAILY_DEMAND_TR dataset is used to create a model of class 'train' from the caret package (Kuhn et al. (2020)). This dataset is similar to the elecdaily dataset from the fpp2 R package (Hyndman (2018)). However, DAILY_DEMAND_TR contains more than five years of daily data (elecdaily only one year), which makes it more suitable for training a neural network. It is composed of the following variables: • DATE: date of the sample, one per day from July 2nd, 2007 to November 30th, 2012.
• TEMP: mean daily temperature in ºC in Madrid, Spain.
• WD: working day, a continuous parameter which represents the effect of the day of the week on the daily consumption of electricity, expressed as a percentage of the expected electricity demand of that day with regard to the demand of the reference day of the same week (Moral-Carcedo and Vicéns-Otero (2005)). In this case, Wednesday is the reference day ($WD_{Wed} \approx 1$).
• DEM: total daily electricity demand in GWh for Madrid, Spain.
The following code creates the plot in Figure 4:

R> library("ggplot2")
R> ggplot(DAILY_DEMAND_TR) + geom_point(aes(x = TEMP, y = DEM))

Figure 4 shows the relationship between the electricity demand and the temperature. A non-linear effect can be observed, where the demand increases for low temperatures (due to heating systems) and for high temperatures (due to air conditioners).

Figure 5 shows that the temperature variable has a seasonal effect on the response variable. In summer, the temperature is higher and cooling systems demand more electricity, so the demand is directly proportional to the temperature. In winter, the temperature is lower and heating systems demand more electricity, so the demand is inversely proportional to the temperature. The sensitivity of the output with regard to WD also has a seasonal effect, with higher variance in winter than in summer and greater sensitivity on weekends. Figure 5 can also be generated using the plot() method of a 'SensMLP' object, as sketched below.
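A sketch of the code that could produce Figure 5 follows (the model mod2 and its hyperparameters are illustrative, and the fdata, date.var and plotType argument names are taken from the package documentation rather than from this article):

R> set.seed(150)
R> mod2 <- caret::train(DEM ~ TEMP + WD, data = DAILY_DEMAND_TR[1:(365*2),],
+    method = "nnet", linout = TRUE, preProcess = c("center", "scale"),
+    trControl = caret::trainControl(),
+    tuneGrid = data.frame(size = 15, decay = 0.1))
R> sens2 <- SensAnalysisMLP(mod2, trData = DAILY_DEMAND_TR[1:(365*2),],
+    output_name = "DEM")
R> SensTimePlot(mod2, fdata = DAILY_DEMAND_TR[1:(365*2),],
+    date.var = DAILY_DEMAND_TR[1:(365*2), "DATE"])
R> plot(sens2, plotType = "time")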

Visualizing the Neural Network Sensitivity relation as a function of the input values
Sometimes it is useful to know how the values of the input variables affect the sensitivity of the response variables. The SensFeaturePlot() function produces a violin plot showing the probability density of the output sensitivities with regard to each input. It also plots a jitter strip chart for each input, where the width of the jitter is controlled by the density distribution of the data (Pedersen and RStudio (2019)). The color of the points is proportional to the value of each input variable, displaying whether the relation of the output with the input is relatively constant within a range of input values.
The following code produces the plot of Figure 6:

R> SensFeaturePlot(mod2, fdata = DAILY_DEMAND_TR[1:(365*2),])

It can also be generated using the plot() method of a 'SensMLP' object:

R> plot(sens2, plotType = "features")

In accordance with the information extracted from Figure 5, Figure 6 shows that the sensitivity of the output with regard to the temperature is negative when the temperature is low and positive when the temperature is high. It also shows that the sensitivity of the output with regard to WD is higher on weekends (lower values of WD).

Extending package functionalities to other MLP models
The current version of the NeuralSens package (version 1.0.0) includes methods of the SensAnalysisMLP() function for 'nn' (neuralnet), 'nnet' (nnet), 'H2ORegressionModel' and 'H2OMultinomialModel' (h2o), 'mlp' (RSNNS), 'list' (neural), 'nnetar' (forecast) and 'train' (caret) objects (the latter only if the object inherits its class attribute from one of the other available packages). Additionally, a 'numeric' method is available to perform sensitivity analysis of a new neural network model using only the weights of the model, its neural structure, and the activation functions of the layers and their derivatives. The first piece of information that must be extracted from the model is the weights of the connections between layers. These weights must be passed to the first argument of the SensAnalysisMLP() function as a 'numeric' 'vector', concatenating the weights of the layers in order from the first hidden layer ($l = 2$) to the output layer ($l = L$).
Within each layer, the bias weights should be added to the vector before the corresponding connection weights. If the model has no bias, the bias weights must be set to 0 ($b^l = 0$).
The second piece of information is the neural structure of the model. It must be passed to the mlpstr argument as a 'numeric' 'vector' equal in length to the number of layers in the network, where each number specifies the number of neurons in a layer, starting with the input layer and ending with the output layer.

The last pieces of information that must be provided are the activation functions of each layer and their derivatives. If the activation function of a layer is one of those provided by the package (shown in Table 1), it can be specified by its name. If it is not one of those provided by the package, it must be passed as a function, and the same applies to its derivative. The activation function $\Phi^l(z^l)$ of a layer $l$ and its derivative $\frac{\partial \Phi^l(z^l)}{\partial z^l}$ must meet the following conditions:

• $\Phi^l(z^l)$ must return a 'vector' with the same length as $z^l$. The activation function of each neuron may be different, as long as this condition is met.
• $\frac{\partial \Phi^l(z^l)}{\partial z^l}$ must return a square 'matrix' containing the derivatives of each component of $\Phi^l(z^l)$ with regard to each component of $z^l$.

Examples of how to use the SensAnalysisMLP() function with new packages can be found in Appendix A.
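As a minimal illustration (all numeric values are made up, and the actfunc and coefnames argument names follow the package documentation rather than this article, so the call should be treated as a sketch):

R> mlpstr <- c(2, 3, 1)             # 2 inputs, 3 hidden neurons, 1 output
R> wts <- c(0.1, 0.5, -0.3,         # hidden neuron 1: bias, then weights from X1, X2
+           -0.2, 0.4, 0.2,         # hidden neuron 2
+           0.0, -0.6, 0.1,         # hidden neuron 3
+           0.3, 1.2, -0.7, 0.5)    # output neuron: bias, then weights from hidden layer
R> set.seed(1)
R> fdata <- data.frame(X1 = rnorm(50), X2 = rnorm(50))
R> sens_num <- SensAnalysisMLP(wts, trData = fdata, mlpstr = mlpstr,
+    actfunc = c("linear", "sigmoid", "linear"),
+    coefnames = c("X1", "X2"), output_name = "Y")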

Effect of network structure and training conditions
An important advantage of sensitivity analysis based on partial derivatives is the robustness of the analysis results regardless of the model's neural structure. Other methods, such as Olden's, rely heavily on the neural structure and the initial starting weights. An analysis similar to the one performed on olden() in Beck (2018) has been performed on the SensAnalysisMLP() function. To observe the effect of the neural structure on the sensitivity metrics, these metrics have been calculated for models with 1, 10 and 20 neurons in the hidden layer. For each size, 50 models with different random initial weights are trained on the simdata dataset; a sketch of this experiment is shown below. If the neural structure and the initial starting weights had no effect on the sensitivity metrics, these metrics would be the same for all the models. Figure 7 shows the mean value of the sensitivity metrics over the 50 models for each neural structure, together with the minimum and maximum values of each metric, to display the effect of the neural structure and the initial weights. An important conclusion that can be derived from Figure 7 is that, with enough neurons in the hidden layer, i.e., if the model can predict the output with enough precision, the variance of the sensitivity metrics is negligible compared to the value of the metrics. Topics such as data pre-processing or network architecture should be considered before model development. Discussions about these ideas have already been held (Cannas, Fanni, See, and Sias (2006), Amasyali and El-Gohary (2018), Maier and Dandy (2000), Lek et al. (1996)) and are beyond the scope of this paper.
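A sketch of this experiment follows (the hyperparameters and the use of a plot argument to suppress plotting are assumptions):

R> sizes <- c(1, 10, 20)
R> results <- list()
R> for (size in sizes) {
+    for (rep in 1:50) {
+      set.seed(rep)                 # different random initial weights per repetition
+      mod <- RSNNS::mlp(simdata[, c("X1", "X2", "X3")], simdata[, "Y"],
+        size = size, linOut = TRUE, maxit = 100)
+      sens <- SensAnalysisMLP(mod, trData = simdata, output_name = "Y",
+        plot = FALSE)
+      results[[paste(size, rep, sep = "_")]] <- sens$sens[[1]]
+    }
+ }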

Multilayer perceptron for classification
In this example a multilayer perceptron is trained using the well-known iris dataset included in R. Figure 8 shows two scatterplots comparing the petal-related and sepal-related variables of the flowers in the dataset. It can be seen that setosa species have a smaller petal size than the other two species and a shorter sepal. It also shows that virginica and versicolor species have a similar sepal size, but the latter has a slightly smaller petal size.
The train() function from the caret package creates a new neural network model to predict the species of each flower based on petal and sepal dimensions.
R> set.seed(150)
R> mod3 <- caret::train(Species ~ ., data = iris, preProcess = c("center",
+    "scale"), method = "nnet", linout = TRUE, trControl = trainControl(),
+    tuneGrid = data.frame(size = 5, decay = 0.1), metric = "Accuracy")

The SensAnalysisMLP() function calculates the sensitivities of the model, providing information on the relationships between each output class and each input variable:

R> sens <- SensAnalysisMLP(mod3)

The garson() and olden() methods from the NeuralNetTools package (Beck (2018)) provide information on input importance. As they provide information related to the first output neuron, the comparison with SensAnalysisMLP() is done using the sensitivity measures of the first output class.

R> SensitivityPlots(sens, der = FALSE, output = "setosa")
R> garson(mod3)
R> olden(mod3)

Figure 9a shows the sensitivity metrics calculated by SensAnalysisMLP(), Figure 9b shows garson()'s importance metrics for the input variables and Figure 9c shows olden()'s importance metrics for the input variables. The mean values of the sensitivities in the top chart of Figure 9a are similar to olden()'s metrics in Figure 9c, and the $S_{ik}^{sq}$ values in the barplot of Figure 9a are similar to garson()'s values in Figure 9b. It must be noted that values from SensAnalysisMLP() are more robust against changes in the neural structure and initial weights, as stated in Section 3.7 and Beck (2018).
The lime (Pedersen and Benesty (2019)) package can also be used to obtain information on the neural network model. In this case, lime() and explain() functions train a decision tree model using the entire dataset to interpret how the neural network predicts the class of three different samples (one for each iris species) using all the features in the training dataset. plot_explanations() function shows graphically the information given by the decision tree.
R> library("lime") R> plot_explanations (explain(iris[c(1,51,101),], lime(iris, mod3), + n_labels = 3, n_features = 4)) Figure 10: Facetted heatmap-style plots generated by applying the plot_explanations() function to mod3 for three selected samples of iris dataset. Each plot represents the contribution (positive or negative) of each feature to the probability of a specific class of iris[,"Species"] for the specified data samples.. Figure 10 confirms the relationships between the inputs and the output variable. However, this method does not provide a quantitative measure for the importance of each input. Due to the lack of quantitative measures for input importance this method can not be directly compared to the other methods exposed in this section ( Figure 9).
Sometimes it may be more interesting to obtain global importance measures instead of measures for each output variable. NeuralSens allows us to obtain global measures using the CombineSens() function, which computes the sensitivity measures of the whole model following Equations (11), (12) and (13). These global measures indicate how much, on average, the output probabilities change when an input variable changes.
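For instance, the global measures of the iris model can be obtained with the following call (the exact signature should be checked in the package documentation):

R> CombineSens(sens)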

Multilayer perceptron for regression
The Boston dataset from the MASS package (Ripley, Venables, Bates, Hornik, Gebhardt, and Firth (2019)) is used to train an nnet (Venables and Ripley (2002)) model. This dataset contains information collected by the U.S. Census Service on housing in the suburbs of Boston (run ?MASS::Boston to obtain more information about the dataset).
The objective of the model is to predict the nitric oxides concentration (parts per 10 million), stored in the nox variable. The input variables of the model are zn (proportion of residential land zoned for lots over 25,000 sq.ft.), rad (index of accessibility to radial highways) and lstat (percentage of lower-status population). The scale() function standardizes the input variables. The SensFeaturePlot(), SensAnalysisMLP(), lekprofile() (NeuralNetTools; Beck (2018)) and partial() (pdp; Greenwell (2017)) functions analyze the relationships of the output with regard to the inputs.
R> data("Boston", package = "MASS") R> Boston <-as.data.frame(scale(Boston[,c("zn","rad","lstat","nox")])) R> set.seed(150) R> mod4 <-nnet::nnet(nox~., data = Boston, size = 15, decay = 0.1, + maxit = 150) R> lekprofile(mod4, group_vals = 6) R> lekprofile(mod4, group_vals = 6, group_show = TRUE) R> library("pdp") R> pdps <-list() R> for (i in 1: pred.var = names(Boston)[i], + train = Boston, ice = TRUE), train = Boston, + center = TRUE, alpha = 0.2, rug = TRUE) + + theme_bw() + ylab("nox") + } R> gridExtra::grid.arrange(grobs = pdps, nrow = 1) R> SensFeaturePlot(mod4, fdata = Boston, output_name = "nox") R> SensAnalysisMLP(mod4, trData = Boston, output_name = "nox") Figures 11a and 11b display the results of Lek's profile method from the NeuralNetTools package. To prevent the analysis of non-representative scenarios in the input dataset, a kmeans clustering with 6 clusters has been applied to the dataset. Each subplot in Figure 11a shows the evolution of the output variable when varying the input variable of interest across a range of values corresponding to the center of the k-means clusters. The other inputs remain constant in their values in the center of each k-means cluster. Figure 11b shows the value of the input variables at each of the cluster centers. Figure 11c shows the partial dependence plots (pdp) and Individual Conditional Expectaction (ICE) plots of mod4 output with regard to each input variable. An ICE curve is calculated by maintaining the input variable of interest x i at a value x ik , where x ik is the value of x i in the k row of the dataset, and varying all other inputs across their values in the dataset. For a given dataset with N samples, there would be N ICE curves for each input variable. The PDP curve can be calculated as the mean of these ICE curves. PDP shows the marginal effect a given input variable has on the response variable of the neural network model, averaging the effects of the rest of the input variables. Calculating all ICE curves shows how the output variable change in the entire input space of the dataset. This comes at a large computational cost, since the number of curves that must be calculated are directly proportional to the number of samples and number of input variables. The computational time can be reduced by calculating only the PDP or a reduced number of ICE curves, but in that case some important scenarios might be ignored.
The SensFeaturePlot() function performs an analysis similar to Lek's profile method and the PDPs by plotting the sensitivity of the output with regard to each input, colored proportionally to the value of the input variable. On the one hand, the lekprofile() function indicates that lstat and rad have a direct relationship with the output and zn has an inverse relationship with the output. Figure 11a also suggests that all the input variables have a non-linear relationship with the output. On the other hand, Figure 11d shows that the sensitivity of the output has an approximately linear relationship with zn and a non-linear relationship with the other two inputs. In this case, the lekprofile() function gives more information about the model, but it can be difficult to understand how the value of the input variables in each group affects the output variable.
SensAnalysisMLP() can be used to obtain more information on variable importance and relationships between variables. In this case, the rad variable affects the output the most, information which can be difficult to extract from the other functions.

Computational cost
As the size of neural network models increases to solve more complex tasks, performing a sensitivity analysis of a model can become a computationally intensive task. Sensitivity analysis using partial derivatives requires matrix calculations where the size of the matrices is directly proportional to the size of the hidden layers. As the number of neurons in the hidden layers increases, the time needed to perform these calculations grows rapidly.
A comparison of the time required to perform a sensitivity analysis using the different methods included in Section 4 has been carried out using the YearPredictionMSD dataset. This dataset consists of 90 input variables and 515345 samples. The authors propose to measure the computational time when varying the number of input variables, the number of samples and the size of the hidden layer of a single-hidden-layer MLP model. This analysis has been performed on a computer with the following specs: processor Intel(R) Core(TM) i7-8700 @3.20 GHz, 32 GB of RAM, R version 3.6.3 (2020-02-29), platform x86_64-w64-mingw32/x64 (64-bit), running under Windows 10 x64 (build 18362). The main conclusions are the following:

• The computational time of SensAnalysisMLP() is mainly affected by the number of neurons in the hidden layer and the number of samples, as the size of the matrices increases with the number of neurons and the number of matrix multiplications increases with the number of samples.

• lekprofile() is more affected by the number of input variables, because the number of curves to be created increases with the number of input variables. However, since it uses only a fixed number of scenarios, it does not depend on the number of samples. As the number of neurons in the hidden layer barely affects the computational time needed to predict the output variable in each scenario, this parameter does not affect the computational time of lekprofile().
• The slowest function is the partial() function from the pdp package when calculating all ICE curves. Calculating all ICE curves instead of only the PDP curves adds a noticeable amount of computational time. However, if the shape of the ICE curves is not constant across the samples of the dataset (as in Figure 11c), showing only the PDP curve gives misleading information, as the shape of the PDP curve does not resemble all the ICE curves. The computational time of partial() is directly proportional to the number of samples and the number of input variables, because the number of curves to be calculated increases with these parameters.
In addition to these conclusions, it must be mentioned that this analysis has been performed using a model with only one output variable. If there were several output variables, the other functions would have to be called once for each output to obtain information analogous to that of SensAnalysisMLP(). Therefore, the computational time of all the functions except SensAnalysisMLP() must be multiplied by the number of output variables of the model to obtain an approximate idea of the computational time they require.

Conclusions
The NeuralSens package provides functions to extract information from a fitted feed-forward MLP neural network in R. These functions can be used to obtain the partial derivatives of the neural network response variables with regard to the input variables and to generate plots based on these partial derivatives that provide different information on the network. Methods are available for the following CRAN packages: 'nn' (neuralnet), 'nnet' (nnet), 'mlp' (RSNNS), 'H2ORegressionModel' and 'H2OMultinomialModel' (h2o), 'list' (neural), 'nnetar' (forecast) and 'train' (caret) (only if the object inherits its class attribute from another package). An additional method for class 'numeric' is available to use with the basic information of the model (weights, structure and activation functions of the neurons).
The main objective of the package is to help the user to understand how the neural network uses the inputs to predict the output. This information may be useful for simplifying the neural network structure by eliminating the inputs which have no effect on the output. It could also provide a deeper understanding on the problem and the relationship among variables.
NeuralSens is another tool among several other methods for exploratory data analysis and model evaluation, and it can be used with other packages (Beck (2018), Greenwell (2017)) to obtain more information on a neural network model. Nevertheless, it must be noted that sensitivity analysis using partial derivatives provides information about variable relationships similar to that of PDP or ICE plots significantly faster. Moreover, it also provides variable importance measures like Garson's or Olden's methods, and these measures are independent of the neural structure and training conditions of the model as long as it predicts the output with enough precision.
Improving the information given by these methods will have value for exploratory data analysis and characterization of relationships among variables. Future versions of the package may include additional functionalities such as:
• Parallelizing the sensitivity calculations when workers registered to work in parallel are detected.
• Calculating the sensitivities of the output variables with regard to the output of hidden neurons, in order to obtain the importance of each hidden neuron and helping to select the optimal network structure.
• Calculating the sensitivities of other neural network models such as Probabilistic Radial Basis Function Network (PRBFN) or Recurrent Neural Network (RNN).
• Calculating the second order partial derivatives of an MLP model to analyze the effect of interactions between two input variables.
• Developing a statistic to determine whether the relationship between an output and an input is linear.