A Genetic Algorithm for Selection of Fixed-Size Subsets with Application to Design Problems

The R function kofnGA conducts a genetic algorithm search for the best subset of k items from a set of n alternatives, given an objective function that measures the quality of a subset. The function fills a gap in the presently available subset selection software, which typically searches over a range of subset sizes, restricts the types of objective functions considered, or does not include freely available code. The new function is demonstrated on two types of problem where a fixed-size subset search is desirable: design of environmental monitoring networks, and D-optimal design of experiments. Additionally, the performance is evaluated on a class of constructed test problems with a novel design that is interesting in its own right.


Introduction
This article introduces an R (R Core Team 2015) function for selecting a subset of fixed size k from a set of n alternatives, given an arbitrary objective function measuring the quality of any subset. The function implements a genetic algorithm (GA), and should be suitable for various subset selection problems where k is constant and n is large enough to make exhaustive search impractical. Two types of problems fitting this description are the design of environmental monitoring networks, as discussed by Le and Zidek (2006); and D-optimal experimental design.
The use of stochastic search methods (including GA) for subset selection is not new. Various algorithms for carrying out such a search have been described in the literature, and some freely available software packages exist. At present, however, there appears to be no publicly available code that both i) restricts the search to only subsets of size k, and ii) places no restrictions on the objective function. The function described here, called kofnGA, aims to fill this gap.
The remainder of this section introduces the environmental monitoring network design and the D-optimal design problems, and reviews some of the relevant work in the area. Section 2 describes the kofnGA function and its use. Section 3 demonstrates the function's performance on seven test problems. The first four of the test cases are design problems, while the last three are constructed subset selection problems with a novel design that allows control over the size and difficulty of the problem. Section 4 provides a brief summary of the results and considers how further improvements might be made.
Remark. Because GAs constitute a well-established optimization approach, this work contains only limited consideration of the new algorithm's sensitivity to its tuning parameters and its performance in comparison with other optimization algorithms. Rather, the test problems of Section 3 are used to demonstrate that the new function is a reasonable implementation of a GA that can be used with confidence to get good approximate solutions to subset selection problems. As with any optimization heuristic, application-specific tuning of the algorithm is important to achieve optimal performance (and good performance relative to competitors). Further comments on this point are given at the end of Section 2.2.

Environmental monitoring network design
Environmental monitoring networks consist of a set of monitoring stations placed at various locations in the region of interest. The need for efficient use of resources leads to the question of optimal placement of monitoring stations. Given a pre-existing network, the network design problem can be one of extension (adding new sites to existing ones), of reduction (removing sites), or of redesign (adding some sites and removing others). Each of these cases is discussed by Le and Zidek (2006), and all are expressed as a search for the set of sites that maximize the determinant of an appropriately specified (hyper-)covariance matrix.
Take the network extension problem as the default case. If k new monitoring stations are to be added to a network, and there are n potential sites to choose from, the goal is to select the best subset of k sites from the n candidates; that is, to choose one combination from the (n choose k) possibilities that is optimal in some sense. Because of the combinatorial nature of the problem, one would be satisfied with fast near-optimal solutions when n is large. Le and Zidek (2006, Chap. 11) define optimality in terms of the determinant of the covariance matrix of the selected sites. Label the candidate sites by the integers 1, . . . , n, and let the n × n matrix Σ be the covariance matrix for all n candidate sites. Let v be an index vector holding the indices of the k chosen sites. Define the k × k covariance matrix of the chosen sites to be Σ_v; it is formed by taking the intersection of the rows and columns of Σ that are indexed by v (such a matrix is called a principal submatrix of Σ, indexed by v). The goal of optimization is to choose v such that the determinant |Σ_v| is maximized. This criterion is based on the maximum entropy principle. Other factors, such as the cost of adding a station at each site, can also be included in the problem statement with suitable modifications to the objective function.
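As a concrete sketch (not code from Le and Zidek 2006; the function and argument names are illustrative), this criterion can be written as a minimization objective over index vectors:

```r
## Entropy-based design criterion as a kofnGA-style objective function.
## v is an index vector of k chosen sites; Sigma is the n x n covariance
## matrix of all candidates. The log-determinant is negated because the
## search minimizes, while the design goal is to maximize |Sigma_v|.
entropy_objective <- function(v, Sigma) {
  ## determinant(..., logarithm = TRUE) is numerically safer than log(det())
  -as.numeric(determinant(Sigma[v, v, drop = FALSE], logarithm = TRUE)$modulus)
}
```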
Recent work by Ruiz-Cárdenas, Ferreira, and Schmidt (2010) specifically addresses the need for fast approximate solutions to network design problems. They consider three different algorithms: a simulated annealing algorithm, a GA, and a GA incorporating a local search step to speed up convergence. Their GA bears much resemblance to the independently developed GA described here; however, their article does not include code for running the search.
For the R computing environment, two packages offer GA subset selection capability, but neither is well suited to the network design problem. The genalg package (Willighagen 2015) contains GA functions for general continuous and discrete optimization, but the package does not allow restriction to a single subset size. The package subselect (Orestes Cerdeira, Duarte Silva, Cadima, and Minhoto 2015) contains functions for size constrained subset selection using both GA and simulated annealing, but it is restricted to use only certain built-in objective functions suitable for variable selection.

D-optimal design
Another large body of work that involves selection of fixed-size subsets, and therefore has many relevant algorithms, is D-optimal experimental design. The goal of D-optimal design is to choose k of n possible experimental runs such that the resulting model matrix X maximizes |X'X|. The objective function is again the determinant of a matrix that is dependent on the chosen subset.
As with the network design problem, exhaustive search for D-optimal designs is impractical when the number of candidate runs is sufficiently large. An early approximate algorithm, based on repeated exchanges of rows, was given by Mitchell (1974a; 1974b). An exact method using branch-and-bound to improve search efficiency has also been used for problems of moderate size (Ko, Lee, and Queyranne 1995); the key to this method is an upper bound for the solution, found using the eigenvalue interlacing property of symmetric matrices.
Other papers have proposed GAs for finding D-optimal designs. Broudiscou, Leardi, and Phan-Tan-Luu (1996) specifically address the value of GA for this application, and use a GA implementation that searches designs of all sizes. They employ a normalized determinant criterion to prevent overly large designs. More recently, Hamada, Martz, Reese, and Wilson (2001), Heredia-Langner, Montgomery, Carlyle, and Borror (2004), and Mandal, Wu, and Johnson (2006) employ various forms of GA for choosing D-optimal designs in different settings. Importantly, none of these works provide readily available code implementing their algorithms.

Algorithm description
The new subset selection function is called kofnGA(k, n, OF, ...). Its required inputs are the subset size k, the number of candidates n, and the objective function OF. In keeping with common optimization practice, the function attempts to minimize OF. The function OF takes as its first argument a k-vector of indices representing a particular subset.

GA design
This section discusses the main design decisions implemented in kofnGA. A basic understanding of GAs is assumed; see, for example, Michalewicz and Fogel (2004), Talbi (2009), or Holland (1992) for more background on GA design and implementation.

Population
The population is a set of candidate solutions (each solution is an index vector of length k). The fitness of each solution is determined by its objective function value. Since we are minimizing, fitness is based on reverse OF ranking; e.g., for a population of size P, the solution with the ith lowest objective value gets fitness score P − i. Larger populations will tend to improve the diversity of the search, especially in the early stages, but with a corresponding computational cost. The default population size in kofnGA is 200.

Selection
At each generation enough offspring are created to completely replace the previous generation. Mating pairs of solutions are chosen by tournament selection. For each new offspring, one parent is chosen by first selecting t solutions at random to form a tournament. The parent is chosen by sampling from this group, with probabilities proportional to within-tournament fitness ranks. The other parent is chosen similarly using another tournament. This form of selection ensures that poorer solutions can occasionally become parents, and gives easy control over the selection pressure. Increasing t puts more emphasis on good solutions and speeds up convergence. By default, the tournament size is set to be one tenth of the population size.
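A minimal sketch of one tournament draw (illustrative only; the package's internal implementation may differ in detail):

```r
## Choose one parent from the population by tournament selection.
## obj holds objective values for the whole population (smaller is better);
## t is the tournament size.
pick_parent <- function(obj, t) {
  entrants <- sample(seq_along(obj), t)    # random tournament of t solutions
  fitness_rank <- rank(-obj[entrants])     # best objective gets highest rank
  entrants[sample.int(t, 1, prob = fitness_rank)]  # rank-proportional draw
}
```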

Crossover and mutation
Each new offspring is formed by first pooling the unique indices of the two parent vectors, and then sampling a set of k unique indices from this pool, uniformly at random. After this crossover step, each element of every new offspring has a fixed probability of undergoing mutation. Elements selected for mutation are assigned a new value at random from all of the indices not currently in the vector. Higher mutation rates will improve exploration of the search space, while reducing the convergence rate of the population. The default mutation probability is 0.01.
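The crossover and mutation steps just described can be sketched as follows (an illustration of the text above, not the package's internal code):

```r
## Create one offspring from parents p1 and p2 (index vectors of length k).
## n is the number of candidates; mutprob is the per-element mutation rate.
make_offspring <- function(p1, p2, n, mutprob) {
  pool <- unique(c(p1, p2))            # pooled unique parent indices
  child <- sample(pool, length(p1))    # crossover: k indices drawn uniformly
  hit <- runif(length(child)) < mutprob     # which elements mutate
  if (any(hit)) {                           # mutated elements get new values
    child[hit] <- sample(setdiff(seq_len(n), child), sum(hit))
  }
  child
}
```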

Elitism
Elitism refers to carrying forward the best solutions from the previous generation into the next generation's population. If the population size is P, let P_e be the number of elites to retain. After all mutated offspring are created, their fitness is evaluated. The new population is formed by combining the P_e most fit solutions from the old generation with the P − P_e most fit offspring. Increasing the proportion of elites retained in the population will accelerate convergence; by default, one tenth of the population is reserved for elites.

Stopping
No stopping rule has been included in the function. The user must supply a number of generations to run (or accept the default value of 500). Setting the number of generations involves a trade-off between run time (which is highly dependent on the complexity of the objective function) and the quality of the eventual solution. In practice a series of short trial runs is usually adequate to determine an appropriate number of generations.
The usual evolution of the population is as follows. An initial sequence of generations sees the best OF value decrease rapidly, with a simultaneous decrease in the diversity of the population. During this period the crossover step is producing a variety of solutions and better regions of the search space are being found. Eventually this period ceases, and the population is quite homogeneous. The best OF value will continue to decrease occasionally, but improvements on the solution become smaller and less frequent. During this latter stage the improvements in the solution are largely due to mutations.

Using the function
The function kofnGA is available in the R package kofnGA (Wolters 2015), which is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=kofnGA. The function's source and help files include detailed comments describing all optional arguments, outputs, and steps of the algorithm. The following trivial example demonstrates how the function is used. Consider the problem of choosing the k = 4 smallest numbers from a list of n = 100 values generated from a standard uniform distribution. The following code generates an instance of this problem.
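The code listing itself is missing from this version of the article; a plausible reconstruction (the object name some_numbers matches the usage described in the text, while the seed value is an arbitrary choice) is:

```r
library(kofnGA)
set.seed(1)                 # arbitrary seed, for reproducibility
some_numbers <- runif(100)  # n = 100 standard uniform values
```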
The following lines create the objective function for the search, call kofnGA with default settings, and display the results of the search. The package includes summary, print, and plot methods for inspecting the results. From the summary we can see that the globally minimal solution was found early on in the search, and that the final population is completely composed of this best solution (i.e., the search has converged). The default run of 500 generations is overly long for this problem.
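Those lines are also missing from this version; a reconstruction consistent with the argument names mentioned below (ObjFun, v, some_numbers) would be:

```r
## Objective function: the sum of the k chosen values (to be minimized).
ObjFun <- function(v, some_numbers) sum(some_numbers[v])
out <- kofnGA(k = 4, n = 100, OF = ObjFun, some_numbers = some_numbers)
summary(out)
```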
The kofnGA function imposes no particular structure on the objective function, other than that its first argument (here, v) be an index vector of length k that encodes a solution. In the above example, the extra argument some_numbers is passed through to ObjFun to allow it to compute the desired sum.
The output of the function is a list with member bestsol holding the indices of the best solution found and bestobj holding the corresponding objective function value. Additional members of this list record details of the search history, and are described in the function's comments and its help file. Optional arguments popsize, keepbest, ngen, tourneysize, and mutprob allow user control of, respectively, the population size, the number of elites, the number of generations, the size of the selection tournament, and the mutation probability. An additional argument initpop can also be used to specify the initial population (which is otherwise randomly generated). This input is useful for continuing a search that is deemed to have terminated too early.
The optional arguments' default values have been chosen to work well with problems of moderate size (for example, n < 300 and k < 20). When the objective function is reasonably efficient to compute, good solutions to problems of this scale can be obtained in seconds or minutes. If further adjustment of the values is desired, the following insights, based on experience with a variety of problems, may be helpful.
• Changing the value of popsize is the simplest way to modify the behavior of the search. Note that in addition to increasing the number of generations needed to obtain good solutions, increasing the population size will also increase the computation time per generation (because the objective function must be evaluated once for each solution in the population).
• Adjusting mutprob is the next most straightforward way to control the search. It is helpful to remember that if the mutation rate is ρ, the number of mutations in an offspring solution is binomially distributed with mean kρ. As most mutations will have a deleterious effect on fitness, generally one should choose mutprob to ensure that mutations occur only rarely (typically, the expected number of mutations per solution will be less than one, so that part of each new generation receives no mutations at all).
• The values of tourneysize and keepbest can also be changed to modify the genetic selection pressure from one generation to the next. The effects of these parameters are similar to those of popsize and mutprob. However, ordinarily it should not be necessary to change them from their default values. The exception to this is when the objective function is costly to compute. In this case it might not be feasible to increase the population size, and one could instead modify the tournament size or the number of elites to influence overall run time.
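As a quick check of the binomial claim in the mutprob discussion above (k = 20 and ρ = 0.01 are arbitrary illustrative values):

```r
k <- 20; rho <- 0.01
k * rho                          # expected mutations per offspring: 0.2
dbinom(0, size = k, prob = rho)  # P(offspring gets no mutations) ~ 0.818
```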
A characteristic of GA searches is rapid initial convergence to the neighborhood of local optima. This characteristic is observed over a wide range of parameter settings, for a wide range of problems (which is one of the reasons for the popularity of GAs). After this initial rapid descent, however, further improvements can be slow. A significant fraction of a problem's computational budget can be spent waiting for favorable mutations to bring down the objective function value.
This latter portion of the GA search is difficult to influence through the initial settings of control parameters like mutprob, tourneysize, and keepbest. To eliminate late-stage stagnation of the search, adaptive or iterative approaches can be considered. For example, one could periodically alternate the mutation probability, tournament size, or elitism to ensure that the search iterates through "exploration phases" and "improvement phases." In such a scheme, kofnGA can be called repeatedly, passing the search results of one phase on to the next using the initpop argument. The examples in the kofnGA package include a simple demonstration of this approach.
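Such a scheme might be sketched as follows (a sketch only: the parameter values are arbitrary choices, and the element name pop for the final population is assumed from the function's output list; ObjFun, n, and k are taken as already defined):

```r
## Alternate "exploration" (high mutprob) and "improvement" (low mutprob)
## phases, passing each phase's final population to the next via initpop.
out <- kofnGA(k = k, n = n, OF = ObjFun, ngen = 250)
for (cycle in 1:3) {
  out <- kofnGA(k = k, n = n, OF = ObjFun, ngen = 250,
                mutprob = 0.05, initpop = out$pop)   # exploration phase
  out <- kofnGA(k = k, n = n, OF = ObjFun, ngen = 250,
                mutprob = 0.005, initpop = out$pop)  # improvement phase
}
```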

Test problems
The proposed GA was tested on four realistic design problems and three artificial problems. Among the design problems, only the first is small enough that the true globally optimal solution is known; the size of the other three problems makes exhaustive search impractical. The three artificial problems are constructed in such a way that the problem has both a very large search space and a known best solution.
Except where noted in the text, all of these examples were run with the same GA parameters: population size 200, tournament size 50, elite group size 50, mutation probability 0.01. The duration of the search was chosen on a problem-specific basis, starting with 500 generations and increasing as necessary for the larger problems.

Environmental monitoring, small problems (solution known)
Chapter 14 in Le and Zidek (2006) contains an example related to ozone concentrations in the state of New York. Among other things, this example calculates a 100 × 100 hypercovariance matrix Λ_0 that is subsequently used for design of a network extension. This matrix was used to test kofnGA for different problem sizes. The negative of the logarithm of the determinant was used to convert the problem into a minimization.
Considering only a subset of the 100 candidate points, and keeping k small, it is possible to find the true optimal solution by exhaustive search, e.g., using the ldet.eval function supplied with the R package EnviroStat (Le, Zidek, White, and Cubranic 2015) accompanying the aforementioned book. This was done for two cases: choosing k = 4 out of n = 64 sites (635,376 combinations), and choosing k = 5 out of n = 49 sites (1,906,884 combinations). The GA repeatedly found the optimal solution for both of these cases. The best solutions are shown on a map in Figure 1.
For the larger of these two problems, the exhaustive search took approximately 30 seconds. The GA took about 40 seconds to run 500 generations, but the optimal solution was found very early (after tens of generations, or about four seconds).

Environmental monitoring, larger problem (solution unknown)
The ozone example can also be used to examine the performance of kofnGA for larger problems, by using the full covariance matrix and increasing k. Unfortunately the true optimal solution is not available in this case, but repeated runs of the GA can be used to see whether different solutions result. Figure 2 shows the progress of the GA solution over generations, for a single run. Note that the best solution very quickly decreases to be close to its final value. The algorithm was run with the same parameters as for the previous small examples, and again took approximately 40 seconds to complete 500 generations.

A constructed covariance matrix (solution unknown)
To get a hypothetical network design test problem of arbitrary size, it is desirable to be able to generate symmetric positive definite matrices at random. The generated matrix should not have any very small eigenvalues, otherwise numerical problems can result during the search. Ko et al. (1995) tackle this by generating diagonally dominant matrices; here, a method based on QR and eigenvalue decompositions is used.
An n × n covariance matrix Σ is generated as Σ = QLQ⊤, where Q is orthogonal and L is a diagonal matrix with diagonal elements equal to the desired eigenvalues of Σ. The random orthogonal matrix Q is generated through QR decomposition of an n × n matrix populated with standard normal random variables. For the present test problem, values of n = 500 and k = 50 were chosen, and the eigenvalues were set to be evenly spaced numbers from 1 to 10. This problem is substantially larger than the ozone examples, with approximately 10^69 subsets to choose from. A matrix was generated, and the GA search was repeated ten times with a randomly initialized population at each run. The number of generations was increased to 1000 for these runs, but other control parameters were unchanged compared to the ozone problems.
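The construction just described can be coded in a few lines (an illustrative sketch; the function and variable names are not from the package):

```r
## Random n x n covariance matrix with prescribed eigenvalues.
make_covariance <- function(n, eigenvalues) {
  Q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))  # random orthogonal matrix
  Q %*% diag(eigenvalues) %*% t(Q)           # Sigma = Q L Q'
}
Sigma <- make_covariance(500, seq(1, 10, length.out = 500))
```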
The progress of the best solution over generations is shown for all ten runs in Figure 3. The runs show an unexpected level of consistency, all decreasing to be near their final solution by about 400 generations. Nine of the ten runs returned the same solution, the best one found. The other solution differed from this best solution by only one element, and had the same objective function value to four significant digits. Each run took less than two minutes to complete.

A D-optimal design problem (solution unknown)
Goos and Jones (2011, Sec. 4.2) consider a D-optimal design approach to an experiment involving three quantitative factors (temperature, pressure, and speed) and one categorical factor (supplier). All four factors have three levels, leading to a total of 81 possible experimental runs. An experiment consisting of 24 runs is desired. Restricting attention to designs that do not include replicates, the objective is to choose one design from the (81 choose 24) ≈ 10^20 possibilities, such that the D-optimality criterion is maximized (henceforth we use the equivalent criterion of minimizing −log |X'X|).
The analysis in the original source used the software product JMP (SAS Institute Inc. 2012b) to find candidate D-optimal designs using a coordinate exchange algorithm (SAS Institute Inc. 2012a). The optimal criterion value quoted in the text is −47.728 on the negative log scale, although the design matrix given in the text does not actually achieve this value. To better assess the typical results from the coordinate exchange algorithm, the design generation step was repeated 20 times using the JMP software, with default settings (which allow the search to re-start ten times from different origins). None of these repetitions reached the quoted objective value. Rather, six of the designs had objective value −47.704, while the remaining 14 designs had objective value −47.653.
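For reference, the criterion minimized in this comparison can be sketched as follows (illustrative only; Xfull denotes a model matrix of all 81 candidate runs, which is not constructed here):

```r
## D-optimality objective: negative log-determinant of the information
## matrix X'X for the k rows of Xfull indexed by v.
d_objective <- function(v, Xfull) {
  X <- Xfull[v, , drop = FALSE]
  -as.numeric(determinant(crossprod(X), logarithm = TRUE)$modulus)
}
```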
Twenty runs of kofnGA were performed, at the default parameter settings, to compare with the JMP results. The progress of the best objective function value over 1000 generations, and the corresponding best objective function values found, are shown in Figure 4. As with the previous network design examples, the objective value of the best GA solution drops rapidly in the first few hundred generations, and subsequently levels off or decreases only slightly due to favorable mutations.
Of the twenty kofnGA runs, four were able to improve on the JMP results, achieving the quoted optimal value of −47.728, which we may surmise is the global minimum. Five others found solutions with objective values of −47.681, which is between the objective values of the two groups of JMP solutions. The remaining nine runs returned solutions of lower quality, having objective values from −46.93 to −47.61. That the GA results showed greater spread than the exchange algorithm results is perhaps not surprising, since the exchange algorithm had the benefit of multiple restarts. Nevertheless, all of the solutions found are practically useful, as every solution had a relative D-efficiency (a comparative measure of solution quality in the D-optimality sense; see Goos and Jones 2011, p. 86) greater than 95%.

Sparse subset problem, version 0 (solution known)
Whitley, Beveridge, Guerra-Salcedo, and Graves (1998) construct a subset selection problem they refer to as the "120-bit sparse subset problem." They describe the problem as a search for the best subset of any size, but here we will aim at finding the best subset of size k = 20 from n = 120 alternatives. Potential solutions are represented as strings of 120 digits, containing 100 zeros and 20 ones. Bits with value 1 represent selected elements. From this binary representation, we may recover the corresponding index vector representation by reporting the locations of the nonzero bits.
Let a 120-bit string be b = [b_1 | b_2 | · · · | b_20], where the b_i, i = 1, 2, . . . , 20, are six-bit substrings referred to as blocks. Define the sequence 000001 to be the target configuration for a block, and denote this configuration by b_t. The objective function is defined such that the optimal solution is b* = [b_t | b_t | · · · | b_t], that is, the string with all 20 blocks having the target configuration. This solution can be expressed as the index vector v* = [6, 12, 18, . . . , 120].
The objective function will be denoted by SS_0(b). It is described in detail here since the next two test problems are more general variants of it. The problem is described as a maximization, i.e., the sign will be changed when it is to be solved by kofnGA.
For ease of exposition, let bits taking value 1 be referred to as 1-bits. For a given solution b, let s_L be the number of 1-bits in the left half of the bit string, that is, in positions 1 through 60. Similarly take s_R to be the number of 1-bits in positions 61 through 120, the right half.
The value of SS_0(b) is determined by accumulating a block score for each of the 20 blocks, adding a bonus for each block matching the target configuration, and subtracting penalties based on the 1-bit counts s_L and s_R. To make this more concrete, consider the following candidate solution:

b = 000100 001010 000010 111111 000100 101000 000010 000000 000001 000000
    000010 000001 001000 000000 000000 010000 000000 000010 000000 000000.

The left half of b is shown on the first line, and the right half is shown on the second line. White space has been inserted to make the six-bit blocks visually more distinct. Two of the blocks, b_9 and b_12, have the target configuration. For this example, the total block score accrued by the 20 blocks is 62; the total target bonus is 18; and the left half of the string, having 15 1-bits, incurs a penalty of 30. Thus the final fitness value is SS_0(b) = 62 + 18 − 30 = 50. By comparison, the optimal solution, with 000001 in each block, has SS_0(b*) = 240. To see why b* is optimal, first define a block with value 000000 to be empty, and a block with two or more 1-bits to be a surplus block. Then note that, because there are a total of 20 1-bits and 20 blocks, there must be at least as many empty blocks as surplus blocks in any given solution. For any given solution, either of the following two operations will always improve the fitness value:
1. If there are any surplus blocks, move one of the 1-bits in such a block to the rightmost position of an empty block, producing a target block.
2. If there are any blocks with a single 1-bit not in the rightmost position, replace this block with a target block.
Repeatedly performing these operations until no more moves are possible will produce the solution b * , with maximal fitness.
The kofnGA search was replicated ten times using −SS_0(b) as the objective function. Figure 5 illustrates the solution progress over 500 generations. All of the runs found the true optimum, and did so in less than 100 generations. Each run took about 53 seconds for the full 500 generations.

Sparse subset problem, version 1 (solution known)
A simplified version of the sparse subset problem can be used to devise problems of various sizes, where n is a multiple of k. As with the original version, the solution is viewed as a binary string for the purpose of function evaluation. This time, however, the string consists of k blocks, each of size r (so n = rk). As before, the target configuration b_t is a block with only a single 1-bit, in the rightmost position; the optimal solution b* has all blocks in the target configuration.
In this version of the problem the left and right penalties are eliminated, and the objective function depends only on a block score and a target bonus. To write the objective function in a general way, let s_i be the number of 1-bits in block b_i, and let N_t be the number of blocks matching the target configuration. Then the objective function is

SS_1(b) = Σ_{i=1}^{k} φ_1(s_i) + N_t/k,    (1)

where φ_1(s_i) is the block score for b_i, computed as

φ_1(s_i) = 0 if s_i = 0;  φ_1(s_i) = r − s_i if s_i ≥ 1.    (2)

A block with no 1-bits in it (s_i = 0) contributes a score of zero, and any block with s_i > 0 contributes a score of r − s_i. This means that the maximum possible block score is r − 1, achieved by any block containing a single 1-bit. To create a unique solution, a bonus of 1/k is added for each block matching the target configuration. Thus the optimal solution, consisting of k target blocks, receives a fitness of SS_1(b*) = k(r − 1) + 1, while any solution with exactly one 1-bit per block receives a fitness of at least k(r − 1). As an index vector, the best solution is v* = [r, 2r, 3r, . . . , kr]. Once again, the negative of this objective function is used in the actual optimization.
To illustrate this construction, consider a problem with five three-bit blocks (k = 5 and r = 3). In this case we are searching for the best sets of size 5 taken from 15 options. The target configuration is 001. A solution can be encoded as a five-element vector or as a 15-bit string. Consider, for example, the solution v† = [2, 4, 5, 8, 12] and the optimal solution v* = [3, 6, 9, 12, 15], with corresponding binary representations b† = 010 110 010 001 000 and b* = 001 001 001 001 001; the fourth block of b† and every block of b* match the target configuration. The fitness values of these two solutions are SS_1(b†) = 7.2 and SS_1(b*) = 11. Equations 1 and 2 imply that non-target blocks containing (0, 1, 2, 3) 1-bits contribute scores of (0, 2, 1, 0) to the objective, while the specific block 001 contributes a score of 2.2. The optimal solution's function value is 11, and any other solution with a single 1-bit per block achieves a value of at least ten.
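This objective is straightforward to code from the description above; the following sketch reproduces the worked example (function and argument names are illustrative):

```r
## SS1 fitness for a solution given as an index vector v, with k blocks
## of size r (so n = r * k). Target blocks have their single 1-bit in the
## rightmost position of the block, i.e., at an index divisible by r.
SS1 <- function(v, k, r) {
  block <- ceiling(v / r)
  s <- tabulate(block, nbins = k)          # 1-bit count per block
  score <- ifelse(s == 0, 0, r - s)        # block scores
  Nt <- sum(v %% r == 0 & s[block] == 1)   # count of target blocks
  sum(score) + Nt / k                      # block score plus target bonus
}
SS1(c(2, 4, 5, 8, 12), k = 5, r = 3)   # 7.2, as in the example
SS1(c(3, 6, 9, 12, 15), k = 5, r = 3)  # 11, the optimal value
```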
The kofnGA function was tested on a much larger instance of this problem, with k = 100 and r = 10, that is, taking subsets of 100 from 1000. The best solution has (negative) objective function value −901, and any other solution with a single 1-bit in each block has a function value below −900. Incidentally, note that the GA bases its selection on fitness ranks, not on actual fitness scores, so the magnitude of differences between solutions does not matter.
The number of generations was expanded to 2500 for this trial, since the solution space is now extremely large (approximately 10^140 combinations). This caused each run's execution time to increase to about 11 minutes. Figure 6 shows the results. The left plot in the figure shows the best solutions for ten runs over all generations, and the right plot focuses on generations 1000 to 2500. By 1000 generations, all of the runs had settled on solutions having only a single 1-bit per block. For the subsequent generations, improvements were still being made, with small drops of magnitude 1/k whenever the number of optimally-configured blocks was increased.
By the end of the 2500th generation, none of the runs had found the true optimum. Note that once the population has settled down to the set of best sub-optimal solutions, the GA is performing much like a random exchange algorithm. The solutions can be expected to improve with longer runs until eventually the true best solution is found. This behavior was observed in other trials for somewhat smaller versions of this problem, say with k = 50 and r = 20.

Sparse subset problem, version 2 (solution known)
A more difficult variant of the previous problem can be obtained by modifying the block score formula and the target bonus. In this version of the problem solutions still consist of k blocks of size r, and the target block and optimal solution are unchanged. However, the objective function is

SS_2(b) = Σ_{i=1}^{k} ϕ_2(s_i) + N_t ((M + 1)/k − 1),    (3)

where

ϕ_2(s) = 0 if s = 0, and ϕ_2(s) = 2s − 1 if s ≥ 1,    (4)

and

M = (2r − 1)⌊k/r⌋ + ϕ_2(mod(k, r)),    (5)

with ⌊·⌋ and mod(·, ·) denoting the floor and modulo functions, respectively. As before, s_i denotes the number of 1-bits in block i and N_t the number of blocks matching the target configuration.
Consider the block score ϕ_2(·) first. While SS_1 gave the largest block score to blocks having a single 1-bit, SS_2 assigns greater scores to blocks having more ones. If the block size is r = 5, for example, blocks with (0, 1, 2, 3, 4, 5) 1-bits get scores of (0, 1, 3, 5, 7, 9), respectively. Furthermore, for any nonzero p and q such that p + q ≤ r, the inequality ϕ_2(p) + ϕ_2(q) < ϕ_2(p + q) holds, since (2p − 1) + (2q − 1) = 2(p + q) − 2 < 2(p + q) − 1. This implies that if a solution includes two nonempty blocks b_i and b_j with r or fewer 1-bits between them, it is always advantageous to move all of the 1-bits into a single block and leave the other block empty.
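This pooling property is easy to verify directly. A quick R check over all feasible pairs of nonzero block counts for r = 5:

```r
# Block score of Equation 4: 0 for an empty block, 2s - 1 otherwise.
phi2 <- function(s) ifelse(s == 0, 0, 2 * s - 1)

# All pairs p, q >= 1 with p + q <= r, for r = 5:
pq <- subset(expand.grid(p = 1:4, q = 1:4), p + q <= 5)
all(phi2(pq$p) + phi2(pq$q) < phi2(pq$p + pq$q))  # TRUE: merging always helps
```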
The consequence of this is that the largest possible total block score is achieved when the maximum possible number of blocks are full of 1-bits, with the remaining 1-bits all together in one further block. Without loss of generality we may think of this as the situation where the first k bits in the string take value 1, with the rest zeros. This results in the first ⌊k/r⌋ blocks being completely full, and the next block containing mod(k, r) 1-bits. The combined block score for this configuration is M, given in Equation 5.
The target bonus (M + 1)/k − 1 was chosen to force the same sparse solution b* to be optimal. Note that when every block has only a single 1-bit in its last position, Σ_{i=1}^{k} ϕ_2(s_i) = Σ_{i=1}^{k} ϕ_2(1) = k and N_t = k. Then, referring to Equation 3,

SS_2(b*) = k + k((M + 1)/k − 1) = M + 1.

So by construction the fitness of solution b* exceeds M by one, where M is the largest objective value achievable without any target bonuses.
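The construction can be checked numerically with a direct R transcription of Equations 3–5 (negated, since kofnGA minimizes):

```r
phi2 <- function(s) ifelse(s == 0, 0, 2 * s - 1)    # Equation 4

ss2_objective <- function(v, k, r) {
  b <- integer(k * r)
  b[v] <- 1L
  blocks <- matrix(b, nrow = r)                     # one column per r-bit block
  s <- colSums(blocks)                              # 1-bit count per block
  Nt <- sum(s == 1 & blocks[r, ] == 1)              # number of target blocks
  M <- (2 * r - 1) * floor(k / r) + phi2(k %% r)    # Equation 5
  -(sum(phi2(s)) + Nt * ((M + 1) / k - 1))          # Equation 3, negated
}

# For k = 100, r = 10 (so M = 190): the sparse optimum b* scores -(M + 1),
# while packing all the 1-bits into ten full blocks scores -M.
ss2_objective(seq(10, 1000, by = 10), k = 100, r = 10)  # -191
ss2_objective(1:100, k = 100, r = 10)                   # -190
```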
This version of the sparse subset problem is harder than either of the other two because the optimal solution is sparse, with one 1-bit per block, while most of the near-optimal solutions are dense, with many full blocks and many empty blocks. This makes it difficult for local modification operators like the GA's crossover and mutation to produce a jump from a near-optimal solution to the neighborhood of the true optimum (the concept of a neighborhood has not been formalized here, but one can, for example, use the Hamming distance to measure the similarity of solutions).
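For concreteness, such a distance is easy to compute on the binary representations. The helper below is illustrative only, not part of kofnGA:

```r
# Hamming distance between the bit-string representations of two subsets of 1:n.
hamming <- function(v1, v2, n) {
  b1 <- integer(n); b1[v1] <- 1L
  b2 <- integer(n); b2[v2] <- 1L
  sum(b1 != b2)
}

# The two k = 5, r = 3 solutions from the earlier example share only index 12,
# so their bit strings differ in 2 * (5 - 1) = 8 positions:
hamming(c(2, 4, 5, 8, 12), c(3, 6, 9, 12, 15), n = 15)  # 8
```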
The kofnGA function was run on this version of the sparse subset problem with the same problem size as the previous section, k = 100 and r = 10. Again, the number of generations was set to 2500, but this time the population size was increased from 200 to 300. This change further increased the run time (to about 17 minutes per trial). Ten repetitions were run as in the previous examples. Figure 7 shows the results. As in the previous examples, the best solutions dropped quickly to be close to the optimal value. In this case, however, the algorithm was not able to reach the sparse solutions. Rather, the best solutions had their 1-bits predominantly in clumps, just as expected from consideration of the objective function. The lower plot in Figure 7 shows all of the best solutions found, using a matrix of dark and white pixels to represent the 1000-element bit strings. Each row of this image represents one run's best solution, and the selected locations are plotted as dark pixels. The grouping of the selected locations is evident. The optimal objective function value for this case is −191. Any solution having all of its ones in blocks of ten has an objective value of −M = −190. Of the ten solutions returned, three had objective values of −189, six had −188, and one had −187.

Final remarks
The kofnGA function performed well in the test problems. For the environmental network design problems and the original sparse subset problem, either the optimal solution was found, or the GA detected the same solution repeatedly when started from different initial conditions.
For both the D-optimal design problem and version 1 of the sparse subset problem, near-optimal solutions were found easily, and it appears that the true optimum could be found given sufficient computing time. That said, the algorithm becomes less efficient once it gets near the optimum. At that point the population tends to be quite homogeneous, and improved solutions are found primarily through mutation. This suggests it would be advantageous to switch to other local search methods (like a greedy exchange algorithm) once the rate of progress of the GA becomes slow.
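As a sketch of what such a hybrid step might look like, the following illustrative R function (not part of kofnGA) performs first-improvement exchanges until no swap of a selected index for an unselected one lowers the objective:

```r
# Hedged sketch of a greedy exchange refinement: starting from a subset v of
# 1:n, repeatedly accept any swap of one selected index for one unselected
# index that lowers obj(), stopping at a local optimum.  Each full pass costs
# up to k * (n - k) objective evaluations, so this suits small neighborhoods
# or cheap objective functions.
exchange_refine <- function(v, n, obj) {
  repeat {
    improved <- FALSE
    for (i in seq_along(v)) {
      for (j in setdiff(seq_len(n), v)) {
        w <- v
        w[i] <- j
        if (obj(w) < obj(v)) {        # first-improvement acceptance
          v <- w
          improved <- TRUE
        }
      }
    }
    if (!improved) return(v)
  }
}

# Toy illustration: minimizing the sum of the selected indices from 1:10.
sort(exchange_refine(c(5, 7, 9), n = 10, obj = sum))  # 1 2 3
```

In practice one would seed this with the best subset returned by the GA and pass in the same objective function.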
The final problem (version 2 of the sparse subset problem) is very challenging, and we may conjecture that other combinatorial optimizers would also have difficulty with this problem.

Figure 7: Results of ten runs of the modified sparse subset problem (version 2, with n = 1000, k = 100). Top: best solution versus generation. Bottom: binary string representation of the ten best solutions found. Each row of the bottom figure has a dark pixel in the location of a selected index.
Furthermore, this problem might not be representative of the maximum-determinant search application. This is because covariance matrices tend to have some degree of "smoothness," in the sense that subsets that are nearly the same will not differ by too much in their determinant.
Solutions near the global optimum should also have near-optimal objective function values.
This statement about the smoothness of the problem is actually a conjecture at this point, but it might be possible to prove, or at least find, bounds on the maximum difference in determinant between principal submatrices that differ in only d columns. The tools used by Ko et al. (1995), i.e., the eigenvalue interlacing property, the submodularity of the objective function, and so on, should be applicable here. Establishing such results could provide some theoretical basis for the use of the GA or other algorithms involving local moves between nearby solutions.
The run time for executing kofnGA was not prohibitive for any of the example problems. Nevertheless, faster execution is always desirable, both to improve the user-friendliness of the function and to expand the range of problems it can handle. While at present the function is written wholly in R, in future releases the use of compiled code will be explored as a means to speed up the slowest parts of the algorithm. For all but the simplest cases, however, the objective function itself is by far the dominant factor in determining execution time. Therefore users in search of speed improvements are encouraged to optimize their objective functions as much as possible.