Hyperspectral Data Analysis in R: the hsdar Package

Hyperspectral remote sensing is a promising tool for a variety of applications including ecology, geology, analytical chemistry and medical research. This article presents the new \hsdar package for R statistical software, which performs a variety of analysis steps taken during a typical hyperspectral remote sensing approach. The package introduces a new class for efficiently storing large hyperspectral datasets such as hyperspectral cubes within R. The package includes several important hyperspectral analysis tools such as continuum removal, normalized ratio indices and integrates two widely used radiation transfer models. In addition, the package provides methods to directly use the functionality of the caret package for machine learning tasks. Two case studies demonstrate the package's range of functionality: First, plant leaf chlorophyll content is estimated and second, cancer in the human larynx is detected from hyperspectral data.


Introduction
Hyperspectral data refers to measurements of reflectance, transmission or absorption of electromagnetic radiation with a very high spectral resolution. Consider photographs taken with a normal digital camera to illustrate the concept of spectral resolution. The sensors in digital arXiv:1805.05090v1 [stat.OT] 14 May 2018 cameras have three bands that cover the blue, green and red portions of the visible electromagnetic radiation. Each band is sensitive to radiation in a wavelength range of approximately 100 nm. Hyperspectral sensors, in contrast, feature hundreds of such bands that are sensitive to a very narrow wavelength range along the electromagnetic spectrum (often down to 1 nm). Together, all bands continuously cover a certain portion of the electromagnetic spectrum. Additionally, most hyperspectral sensors feature bands within the infrared or ultraviolet ranges. For instance, the hyperspectral satellite sensor Hyperion provides data with 220 bands with a spectral resolution of approximately 11 nm (wavelength range) at each 10 nm (sampling interval) from 400 nm (visible) to 2500 nm (short-wavelength infrared, Pearlman et al. 2001).
Hyperspectral imaging, also referred to as imaging spectroscopy, is used in various disciplines, such as analytical chemistry (Blanco and Villarroya 2002), agricultural research (precision farming, Haboudane et al. 2002), ecology (Ustin et al. 2004), pedology (Gomez et al. 2008), geology (Bishop et al. 2011), and medical research (Calin et al. 2014;Regeling et al. 2015). The main advantages of hyperspectral imaging are its cost-effectiveness in spatial analysis, the nondestructive measurement of biophysical and biochemical properties of the investigated surface and the speed of analysis (up to real-time). Hyperspectral analysis is not restricted to spaceborn approaches. Many of the above-mentioned fields make use of portable spectrometers or hyperspectral cameras, which can be used in the field, in the laboratory or even in a surgical suite. The choice of the measuring device and its spectral specifications depends on the surface under investigation and the aim of the analysis. For instance, vegetation has a very prominent spectral feature called the red-edge. This refers to a sharp increase of reflectance values in the near infrared wavelengths. These wavelengths, in contrast, are less informative in geological analyzes, which usually require the short-and mid-infrared wavelengths.
Currently, most hyperspectral approaches use commercial software tools such as Erdas Imagine, ENVI or the hyperspectral toolbox in MATLAB. These tools are generally expensive and have limited functionalities for statistical analysis. Therefore, we developed a new package in the open source software R (R Core Team 2017). The Hyperspectral Data Analysis ("hsdar") package combines important hyperspectral analysis tools with the statistical power of R. This article is structured as follows: The first section summarizes the reasons why R is convenient for hyperspectral analysis. The next section outlines the main functionalities and the implementation of the hsdar package, and also compares it with other available software tools with a special focus on the other hyperspectral package "hyperSpec" in R. Finally, two examples demonstrate the effectiveness of combining hyperspectral techniques with the statistical power of R.

Why use R for hyperspectral imaging analysis
The methodology which is commonly applied in the analysis of hyperspectral datasets consists of three parts: (1) the preprocessing of spectra, (2) the extraction of the relevant information (i.e., spectral characteristics associated with biophysical properties of the target), and (3) a classification or regression analysis to predict biophysical properties in space and time. R is the most comprehensive software tool for performing statistical analyses during step (3). In this context, especially the machine learning algorithms such as support vector machines, Random forests and artificial neural networks are powerful tools for modelling different parameters across space and time (for applications see e.g., Schwieder et al. 2014;Hansen et al. 2002;Bacour et al. 2006). However, the functionality required for steps (1) and (2) has only been partly available in R, was distributed across multiple packages and was not directly applicable to hyperspectral data.
Thus, to take advantage of the statistical power of R for hyperspectral data analysis, a new package was developed that provides a framework for handling and analyzing hyperspectral data. A special focus was set on the analysis of large datasets taken under field conditions for e.g., vegetation remote sensing. The R-package hsdar implements commonly used processing routines for hyperspectral data and further combines or extends the existing functionality of R to include hyperspectral data into a broad range of statistical analyses.

Overview of the functionality of hsdar
This section gives a brief technical overview on the general functionality provided by hsdar. The description starts with a short introduction of the classes followed by a summary of the main functions.

Classes
To provide a framework to handle large hyperspectral datasets, the hsdar-package defines a new S4-class called "Speclib". This allows the user to store hyperspectral measurements and all information associated with those measurements in a single object (Figure 1). The hyperspectral measurements consist of reflectance values stored in the spectra slot and their spectral specifications. The spectra are stored either as a numeric matrix or a RasterBrickobject. The matrix is intended for smaller data sets such as point measurements, whereas the RasterBrick object may contain large hyperspectral (satellite) images. If the spectra are stored as a matrix, the rows delineate between different samples while the columns represent the different spectral bands. The spectral specification consists of two numeric vectors stored in the wavelength and the f ull-width-half-maximum (fwhm) slots. The wavelength gives the central position of each band and the fwhm value describes the difference between the wavelength values where the sensitivity of the sensor is half of its maximum in the respective band. Both values are specifications of the sensor used to acquire the data and must be in the same unit. It is preferred to use nm but automatic conversion from other typical units such as µm is supported. If the fwhm values are unknown, the difference between neighboring bands are used as an approximation. The associated data (termed SI as an abbreviation for supplementary information), which is included as a list, may contain any type of ancillary information like the measurement setup or the geographical position. Additionally, raster images are supported as part of the SI.
Speclibs can be created through several methods. For each method, the user must at least know the wavelength values of all bands that must be available as a numeric vector. The most important method to create an object of class Speclib is using the file path pointing to a hyperspectral raster image readable by rgdal or raster (Hijmans 2016;Bivand et al. 2016;Pebesma et al. 2015). The second option to create a Speclib is to read the reflectance values from a file (e.g., a comma-separated list) and store these in a matrix. This matrix, together with the wavelength information, can then be used to create a Speclib. In the following short example, the example dataset "spectral data" (which is already a Speclib) is divided into its basic components, which are then used to create a new Speclib: R> library("hsdar") R> data("spectral_data") R> reflectance <-spectra(spectral_data) R> class(reflectance) [1] "matrix" R> wv <-wavelength(spectral_data) R> class(wv) [1] "numeric" R> spec_lib <-speclib(reflectance, wv) R> class(spec_lib) [1] "Speclib" attr(,"package") [1] "hsdar" In this example, the spectra (reflectance) are stored as a matrix and the wavelength (wv) is stored as a numeric vector.
Aside from using local offline data, hsdar can search online hyperspectral databases and automatically download data. The following example searches for spectra from grass species in the USGS Digital splib04 Spectral Library and downloads the data. Note that missing data in the downloaded spectra are automatically masked out.
R> avl <-USGS_get_available_files() R> grass_spectra <-USGS_retrieve_files(avl = avl, + pattern = "grass-fescue") In the example above, the first command returns all available spectra. Users can specify a subset of spectra in a search string within the retrieve function (in this case "grass-fescue"), which is downloaded and converted to a Speclib. Note that the function supports approximate string matching so that entries similar to the search string are found.

Functionality
Along with the new Speclib class, hsdar includes several methods to summarize, plot, query and replace data in Speclib objects. Since many hyperspectral datasets are available as raster datasets (e.g., if acquired by satellite), hsdar provides a simple interface to the raster package that allows users to read and save data from and to all common raster formats via the rgdal interface (Hijmans 2016;Bivand et al. 2016;Pebesma et al. 2015). On commonly used hardware, hyperspectral raster datasets often exceed the capacity of the RAM. To overcome this issue, hsdar provides two processing options for such large datasets. The simpler, less computational effective option is to store the spectra as a RasterBrick object in a Speclib. In this case, the spectra are read into memory only upon request and most of the functions process the spectral data block-wise. In this context, the functions automatically detect if the data should be processed block-wise or if all the data should be read before executing the function. For block-wise computation, the resulting spectra are saved as a temporary raster file and the function returns a new Speclib object pointing to the temporary file. The disadvantage of this option is that if more than one function is applied, the spectra have to be saved and re-read multiple times. Thus, a second option is available, which follows the framework of the raster package but requires the user to be familiar with simple programming tasks in R. Like the raster package, hsdar provides writeStart, getValuesBlock, writeValues and writeStop methods for the Speclib class so that the user can easily process a large dataset by iteratively reading parts (chunks) of the images, passing it through multiple functions and writing the result to a new raster file. Only one reading and writing process is required in this case, which considerably expedites the analysis. A typical code block would look like the following. To execute it, note that wavelength needs to be defined and infile must point to an existing file readable by the raster package. The result will be a new file in the GeoTIFF-format defined by outfile featuring the same number of bands as the existing file (option 'nl'): R> ra <-speclib(infile, wavelength) R> tr <-blockSize(ra) R> res <-writeStart(ra, outfile, nl = nbands(ra), format = "GTiff") R> for (i in 1:tr$n) In the loop, function(s) provided by the hsdar package can be applied to the Speclib v1.
Examples of functions will be discussed in detail in the following sections. The result of the function(s) (termed v2 in this example) is then written to the initially defined file (res). Note that objects res and v1 are of class Speclib, while v2 may be a vector, matrix or a Speclib depending on the return value of the functions applied in between. Please read the help files and the corresponding vignette available in the raster package for further information.
The functionality provided by the hsdar package can be divided into preprocessing, analysis and modelling stages (Table 1). In the following, we briefly outline the most important features except those that are part of the analysis in the section of case studies.
Noise reduction is a critical preprocessing task in hyperspectral analysis because, as a consequence of their high spectral resolution, the sensors often suffer from low signal to noise ratios, thus, an important step of each hyperspectral analysis is filtering the spectra. In hsdar the function noiseFiltering applies one of four predefined filters (Savitzky-Golay-, Lowess-, mean-, Spline-filter) or any other filter function from the signal package (Ligges et al. 2013). Figure 2 shows the effect of filtering (red lines) spectra that were artificially affected by random noise (black lines). Additionally, hsdar provides functions to calculate variables derived from spectral features and allows the user to integrate (bin or spectrally resample) hyperspectral datasets to sensors featuring a lower spectral resolution. Spectral resampling can be performed using predefined spectral response functions of common satellite sensors or using Gaussian spectral response functions defined by the fwhm values of the sensor with the lower resolution. Alternatively, spectral response values may be stored in a Speclib and passed directly to the resampling function.
To analyze hyperspectral datasets, the computation of approximately 100 vegetation and soil indices is implemented in hsdar. The indices can be accessed via the functions vegindex and soilindex which encompass widely used indices such as the normalized difference vegetation index (NDVI, Tucker 1979) in addition to specialized indices such as the cellulose absorption index (CAI), which is a proxy for litter amounts and plant coverage (Nagler et al. 2003). Additionally, users can easily define their own index using a simple syntax. In (hyperspectral) remote sensing of vegetation, the sharp increase in the reflectance values between 680 and 750 nm (red edge) is the most important feature, as the shape of the red edge is determined by the amount of water and chlorophyll in the vegetation. Thus, the red edge is seen as a reliable indicator for plant health in addition to leaf area index, plant coverage, chlorophyll, water and nitrogen content (e.g., Filella and Peñuelas 1994). Different methods for extracting relevant information in the shape of the red edge are included in hsdar. These encompass common methods such as deriving the red edge inflection point using a Gaussian fit (Miller et al. 1990) or more recent advances such as the red edge position through linear extrapolation (Cho and Skidmore 2006). Finally, hsdar provides functionality to perform linear spectral unmixing (LSU, Sohn and McCoy 1997) e.g., for estimating the fractional vegetation cover.
hsdar implements two frequently used radiative transfer models to simulate the reflectance values of vegetation. The first one is the leaf reflectance model PROSPECT (vers. 5B and D, Jacquemoud and Baret 1990;Féret et al. 2017). The second one is the canopy reflectance model PROSAIL which enhances the functionality of PROSPECT and includes canopy directional reflectance simulation (Jacquemoud et al. 2009). In addition, the inverted PROSPECT model allows the user to estimate the content of various biochemical parameters in the leaves from hyperspectral data (Jacquemoud 1993).

Other hyperspectral imaging tools
Comparable functionality can be found in commercial software tools, i.e., MATLAB (The MathWorks, Inc., Natick, Massachusetts) and ENVI (Environment for visualizing images, Ex-elis Visual Information Solutions, Boulder, Colorado). A hyperspectral toolbox is available in MATLAB that provides feature extraction algorithms such as principal component analysis as well as supervised classification algorithms such as a Maximum Likelihood classifier (Arzuaga-Cruz et al. 2004). ENVI has functions for preprocessing hyperspectral images such as continuum removal and feature extraction algorithms such as the spectral angle mapper.
In the open source software R, hsdar completes its hyperspectral functionality together with another major hyperspectral package called hyperSpec (Beleites and Sergo 2016). The primary difference between the packages is that hsdar is intended for analyzing datasets collected under field conditions with satellites or spectrometers with a special focus on vegetation and ecosystem remote sensing (Dechant et al. 2017;Große-Stoltenberg et al. 2016;Lehnert et al. 2014;Meyer et al. 2017). In contrast, the hyperSpec package provides many useful functions for plotting with a special focus on hyperspectral data acquired under laboratory conditions as in chemistry or medical research (Beleites et al. 2011(Beleites et al. , 2013. Functions in hsdar allow it to interface with the hyperSpec package, i.e., to convert between Speclib objects and the hyperSpec class. Consequently, hsdar users also have access to various import and plotting functions provided by the latter package.

Case studies
In the following sections two study cases are presented to explore the functionality of hsdar. The first case study uses data from a field experiment conducted in central Germany where hyperspectral images were taken from grassland vegetation exposed to enhanced CO 2 air concentrations (Figure 3a). The example includes spectra preprocessing, followed by the extraction of absorption features, calibration and validation of a prediction model for chlorophyll content. In the second case study, emphasis is given to the calculation of normalized ratio indices and model parameterization to detect cancer cells in human larynx tissue using hyperspectral images (Figure 3b).

Remote sensing of vegetation: chlorophyll content
The first example demonstrates the applicability of hsdar for hyperspectral data analysis in vegetation studies. Specifically, the package is used to estimate chlorophyll content of plants from hyperspectral data. The dataset was acquired within the scope of a FACE (f ree air carbon dioxide enrichment) experiment conducted on a temperate grassland situated near Giessen, Germany (Kammann et al. 2005;Obermeier et al. 2017). On 15 plots (each 2 x 2 m), the chlorophyll content of the two most abundant grasses (Arrhenatherum elatius and Trisetum flavescens) was measured using a Konica Minolta SPAD-502Plus chlorophyll meter. The mean value of chlorophyll content of both species was calculated and weighted by their corresponding plant coverage. Hyperspectral data were acquired at the time of the chlorophyll measurements using a HandySpec ® field spectrometer, which simultaneously measures reflectance values from 305 nm to 1705 nm with a spectral resolution of 1 nm (Figure 3a). The field spectrometer has two sensors measuring from 305 to 1049 nm and 1050 to 1705 nm. On each plot, 24 spectra were collected under natural (solar) illumination and averaged. Each plot was visited three times, on 30.05.2014, 08.08.2014 and 13.05.2015. Thus, the dataset contains 45 observations.
The following paragraph describes the preprocessing steps that reduce measurement errors and artifacts in the spectral data. Then, the spectra are transformed to reduce the influence of the illumination at time of acquisition. Finally, the chlorophyll content is estimated with Random Forest using the transformed spectra as predictors (Breiman 2001). Here, we use the randomForest package by Liaw and Wiener (2002) in combination with the caret package created by Kuhn (2008).
In the first preprocessing step noise is removed from the spectra using a Savitzky-Golay filter (method "sgolay") with a length of 15 nm. The filter reduces the noise of the reflectance values by fitting a polynomial function and eliminates small differences between neighboring bands, which are most likely a result of measurement inaccuracy.
R> data("spectral_data") R> spectral_data <-noiseFiltering(spectral_data, method = "sgolay", The result is a Speclib object, which contains a filtered spectral signature in the original sampling resolution. In addition, the empirical function of Coste et al. (2010) is used to transform the chlorophyll SPAD values to µg cm −2 (C a,b ) to facilitate the interpretation of the chlorophyll content values: Note that the SPAD chlorophyll value is shipped with the example dataset and stored in the supplementary information (SI) of the object.

R> spectral_data <-spectral_data[, wavelength(spectral_data) >= 310 & + wavelength(spectral_data) <= 1000]
Since the absorption of chlorophyll is not restricted to the central wavelength, but also affects the neighboring bands, the reflectance values are considerably lowered in the blue and red parts which lead to "absorption features" in the spectral signature of the reflectance (shown as gray boxes in Figure 4a). The form and magnitude of these absorption features are correlated to the chlorophyll content of the measured vegetation . To enhance the form of the absorption features, the spectra can be transformed by constructing a continuum hull around each spectrum. In general, there are two methods for defining such a hull. In the first approach, the convex hull uses the global maximum of the reflectance values as an initial fix point. Then, additional fix points are found to create a convex hull (see red line in Figure 4a). The second approach is called segmented upper hull. Here, the slope of the line to the left and right of the maximum must be positive and negative, respectively (see blue line in Figure 4a). This does not necessarily mean the hull is convex, however. Geologic hyperspectral analyzes often use the convex hull because the distinct absorption features of minerals in the mid-infrared part of the spectrum are easily derived. In vegetation studies, the absorption features of chlorophyll are very close to one another and the reflectance maximum in the green part is considerably lower than in the near infrared. Consequently, only one absorption feature would be detectable. Therefore, a segmented upper hull (option 'sh') is used in this example to ensure that two small features are identified instead of one large feature. To enhance the chlorophyll absorption features, the reflectance values are afterward transformed into band depth values (option 'bd'): where R is the measured reflectance and CV is the reflectance value of the constructed continuum line at wavelength λ.
R> spec_bd <-transformSpeclib(spectral_data, method = "sh", out = "bd") The band depth values in relation to the wavelength of all 45 spectra are plotted in Figure  4b. The chlorophyll absorption features correspond to the first two peaks of the band depth values. The absorption features are now defined as the part of the spectrum between two fix points (band depth values of 0). Since the third absorption feature centered around 980 nm is related to plant water content and biomass rather than chlorophyll (Peñuelas et al. 1993), only the absorption features at 460 nm and 670 nm are selected for further analysis.  Table 2 for a subset of the resulting parameters of the example data set.

R> featureSpace <-feature_properties(featureSpace)
In the last part of this example, the chlorophyll contents of the measured samples are estimated using the parameters derived from the absorption feature and the band depth values within the features as predictors. Multivariate statistics and machine learning approaches are frequently used for this purpose, because prediction models based on multiple (and often correlated) variables usually out-perform the univariate approaches. To cope with multivariate and machine learning tasks, hsdar provides wrapper functions that enable the user to directly use the functionalities of the caret package. This is by far the most comprehensive multivariate package since it includes various approaches with the same syntax and functions.
To use the functions of caret, the response variable has to be defined, which must be stored in the SI attached to the Speclib object ("featureSpace").
R> featureSpace <-setResponse(featureSpace, "chlorophyll") The spectra are the default selection for predictors. However, additional predictor variables  from the attributes of the spectra can be included. In this example, all parameters extracted above are added.
R> featureSpace <-setPredictor(featureSpace, + names(SI(featureSpace))[4:ncol(SI(featureSpace))]) The final model for deriving chlorophyll content is trained by tuning the required parameter for the Random Forest model (Number of randomly selected predictor variables, mtry). 10fold cross validation is repeated 5 times for model tuning and estimating accuracy. The internal predictions of the final tuning setup are returned providing an independent data set for validation. The accuracy of the predictions performed by the model is evaluated with the root mean square error (RMSE) and the R 2 -value. For further information about strategies on model settings and cross validation see Kuhn and Johnson (2013) and Kuhn (2008).
R> ctrl <-trainControl(method = "repeatedcv", number = 10, repeats = 5, + savePredictions = "final") R> rfe_trained <-train(featureSpace, trControl = ctrl, method = "rf") The number of randomly selected predictor variables at each split of the trees is set to mtry = 453. Using the repeated cross validation, the chlorophyll contents estimated by the Random Forest model fit well if compared to the measured ones (RMSE = 2.49 mg, R 2 = 0.95, Figure  5). This shows that the proposed method incorporating hyperspectral data is a valid approach for chlorophyll estimation. The resulting model can be used to predict the chlorophyll content of plots where it has not been measured in the field (e.g., Lehnert et al. 2014).

Hyperspectral detection of cancer
The second example shows how hyperspectral imaging can be used in non-invasive detection of cancer of the human larynx (head and neck squamous cell carcinoma; hence referred to as "HNSCC"). This is demonstrated with a data subset acquired at the University of Bonn, Germany that includes hyperspectral images from 25 patients, 10 of which have a histopathological diagnosis of HNSCC. The images were acquired using an endoscope, which was coupled with a monochromatic CCD camera. A special Polychrome V light machine allowed researchers to change the wavelength of the impinging radiation so that several images taken under different illuminations could be combined into hyperspectral cubes (Figure 3b). The images were preprocessed and collocated using the methodology proposed by Regeling et al. (2015). The preprocessing is key because the different bands are acquired with short time lapse as a consequence of the varying light source. Medical experts' manual classification into cancerous and non-cancerous tissue was used as reference. The following code loads the data into R and plots them to explore the differences between cancerous and non-cancerous tissue ( Figure 6). R> data("cancer_spectra") R> plot(subset(cancer_spectra, infected == 1), ylim = c(0, 400), + col = "darkred") R> plot(subset(cancer_spectra, infected == 0), new = FALSE) Additionally, the response variable ("infected") is converted to a factor: R> SI(cancer_spectra)$infected <-as.factor(SI(cancer_spectra)$infected) In contrast to the first example, the spectra of the human larynx are expressed in counts and not reflectance values. Thus, the absolute values highly depend on the light source, the temperature of the sensor, and the illumination geometry. To cope with this limitation, normalized ratio indices are calculated instead of using the absolute count values. Mathematically, these are defined as: Here, R is the reflectance (or in this case the number of counts) at wavelength i or j. These indices are then calculated for all possible combinations of bands through the predefined function "nri".

R> nri_data <-nri(cancer_spectra, recursive = TRUE)
The NRI values can be directly used as predictors in univariate generalized linear models, for example. Note that a multitude of models must be derived depending on the number of bands in the hyperspectral dataset. Initially, it is worthwhile to resample the spectra to a coarser spectral resolution to reduce the number of models. Alternatively, some functions in hsdar directly support parallel processing using the foreach package. To execute a function on two cores in parallel, simply use the following code depending on the operating system. For Linux/Mac OS: R> library("doMC") R> n_cores <-2 R> registerDoMC(n_cores) For Windows: R> library("doMPI") R> n_cores <-2 R> cl <-startMPIcluster(count = n_cores) R> registerDoMPI(cl) Please note that the dataset in the current example is not large enough to benefit from parallel processing. Therefore, the previous code snippet can be skipped, and we continue by calculating the generalized linear models using the NRI values as predictors for infection: R> glm_models <-glm.nri(infected~nri_data, preddata = cancer_spectra, + family = binomial) It must be noted that the indices are highly correlated, which is a common drawback to using them in a multivariate analysis. In this example, however, each index is used as a predictor in a separate model to eliminate collinearity. The coefficients, p-values and test statistics of the generalized linear models can now be plotted in 2-d correlograms. In such diagrams, the x-axis and the y-axis represent the two spectral bands used to calculate the index. The color in the diagram symbolizes the coefficient of the model. Thus, the diagrams provide an initial look at band combinations that might be useful for distinguishing between cancerous and non-cancerous parts of the tissue.
R> plot(glm_models, coefficient = "z.value", legend = "outer") R> plot(glm_models, coefficient = "p.value", uppertriang = TRUE, The plot is shown in Figure 7. Almost every index calculated from wavelengths between 400 nm and 450 nm and any other band featured low p-values and, thus, had a significant effect on the distinction between cancerous and non-cancerous tissue (see white rectangle in Figure 7). Positive z-values were observed for NRI values calculated from longer wavelengths. Negative z-values were obtained for indices calculated from 450 nm to 550 nm for the first band and 400 nm to 480 nm for the second band. The index with the worst performance was calculated from bands 490 nm and 590 nm (see shaded black rectangle in Figure 7).  This approach, however, precludes multiple NRI values from being used as predictors because they are usually highly correlated, as previously mentioned. Thus, machine learning algorithms classify cancerous cells, as in the first example, because collinearity among predictor variables does not affect their predictive performance. Predictor and response variables have to be defined: As response variable, the column "infected" in the SI was used and the NRI values are used as predictors by default. The stage of the cancer is used as an additional predictor variable, because the spectral signal in the early stages of the cancer differs from that in later stages.
R> sel_feat <-rfe(nri_data, cutoff = 0.9) R> ctrl <-trainControl(method = "repeatedcv", number = 10, repeats = 5, + savePredictions = "final") R> rfe_trained_svm <-train(sel_feat, trControl = ctrl, + importance = TRUE, method = "svmRadial") R> rfe_trained_nnet <-train(sel_feat, trControl = ctrl, + importance = TRUE, method = "nnet") Table 3 shows the validation result of the final models for both methods. Support vector machine performed slightly better and yielded an overall accuracy of 93.33% as compared to 90% for the neural network classification. This shows that hyperspectral imaging and machine learning approaches may yield positive results for detecting cancer in human tissue. The data used in this case study have several drawbacks mainly due to the acquisition with a variable light source instead of a hyperspectral camera in combination with a constant light source. This causes the count values to be dependent on movements of the patient and the illumination geometry by the light source. However, the analysis based on normalized ratio indices yielded robust results clearly highlighting its large potential. Since hyperspectral imaging is a non-invasive measurement technology, the examination is relatively comfortable for the patient. However, it has to be noted that the detection of cancer with hyperspectral imaging may only facilitate the diagnose of a medical expert. At the moment, there is no possibility to automatically diagnose cancer in the human larynx without the knowledge of a trained medical expert (Regeling et al. 2016).

Conclusions
The two case studies provide an initial impression of what hyperspectral remote sensing can be used for and how a typical approach may look. Both examples show how the hsdar package can be used as a powerful tool within R for remote sensing and spatial applications. Based on the widely used raster package, hsdar introduces new functionalities for processing hyperspectral data and gives users control over the results of univariate and multivariate modeling approaches, including machine learning techniques. Although hsdar is dedicated to spectral data featuring many bands, it is applicable to any multispectral satellite data including Landsat 8 (8 bands in the visible and near infrared part of the electromagnetic radiation) or MODIS (19 bands) (Lehnert et al. 2015). For example, hsdar can perform linear spectral unmixing or calculate spectral indices such as the NDVI. hsdar differentiates itself from the other hyperspectral package available for R (hyperSpec, Beleites and Sergo 2016) by focusing on environmental instead of laboratory analysis. Data can easily be transferred between both packages since hsdar provides functions to convert to and from objects in hyperSpec. Both packages extend R by functions for all state of the art methods in hyperspectral imaging which have been available only in commercial software tools so far.