Repositório ISCTE-IUL

A novel approach is proposed for the NP-hard min-degree constrained minimum spanning tree (md-MST) problem. The NP-hardness of the md-MST demands that heuristic approximations are used to tackle its intractability, and thus an original genetic algorithm strategy is described, using an improvement of the Martins-Souza heuristic, which is also presented, to obtain feasible md-MST solutions. The genetic approach combines the latter improvement with three new approximations based on different chromosome representations for trees that employ diverse crossover operators. The genetic versions compare very favourably with the best known results, in terms of both run time and solution quality. In particular, new lower bounds are established for instances with higher dimensions.


Introduction
Let G = (V, E) be a connected weighted undirected graph, where V = {1, . . ., n} is the set of nodes and E = {e = {i, j} : i, j ∈ V} is the set of m edges. A positive cost c_ij is associated with each edge connecting nodes i and j. For graph models, a common optimisation task involves finding a connected acyclic subgraph that covers all the nodes of the graph: a spanning tree. In the following, T = (V, E_T) denotes a spanning tree for G, with E_T ⊆ E, and deg_T(i) is the degree of a node i ∈ V, i.e., the number of edges with node i as an end point. Throughout, only connected graphs are considered.
The general min-degree constrained minimum spanning tree (md-MST) problem is defined as follows: given a positive integer d ∈ N, find a spanning tree T for G with minimal total edge cost such that each tree node either has a degree of at least d or is a leaf node (a node with degree one). The solution tree is called feasible or admissible, and the same designation applies to each of its nodes. Examples of feasible and unfeasible md-MST trees are given in Figure 1 for the graph G1 defined in Appendix A. The md-MST problem was first described by Almeida et al. [1] and was proved to be NP-hard for n/2 > d ≥ 3 [1,2]. In order to overcome some of the computational difficulties encountered, Martins and Souza [3] designed new algorithmic approaches based on variable neighbourhood search (VNS) metaheuristics adapted to the md-MST, with an enhanced second-order repetitive technique (ESO) to guide the search during several phases of the VNS method. They also presented an adaptation of a greedy heuristic based on Kruskal's algorithm for determining minimal spanning trees.
Akgün and Tansel [4] considered a new set of degree-enforcing constraints and used the Miller-Tucker-Zemlin sub-tour elimination constraints as an alternative to single- or multi-commodity flow constraints for the tree-defining part of the md-MST formulations. Martinez and Cunha [5] proposed new formulations for the md-MST problem and presented a branch-and-cut algorithm based on the original directed formulation, obtaining several new optimality certificates and new best upper bounds for the md-MST. Murthy and Singh [6,7] published the only other known evolutionary approach by introducing Artificial Bee Colony and Ant Colony Optimisation heuristics, both tested on Euclidean and random instances devised for the Steiner tree problem.
The md-MST requires the computation of an MST whose nodes obey certain degree restrictions. The classical algorithms for constructing minimal spanning trees are Prim's algorithm and Kruskal's algorithm (KA). Prim's algorithm [8] starts with a two-node tree that contains a minimal edge and employs a greedy search to grow the tree, ensuring that it remains acyclic. Using a heap as the underlying data structure, this algorithm has a total time bound of O(m + n lg n), where m is the number of edges and n is the number of nodes. KA [9,10] also uses a greedy technique but works with forests. Starting from the forest of all the nodes, the algorithm iteratively chooses the cheapest edge joining two disjoint components until it obtains the complete tree. This algorithm has an asymptotic time bound of O(m log n) (assuming that the list of edges is already sorted by cost). The MST algorithm used in our approach is KA. The rationale behind this choice is mainly its good numerical performance on generic dense graphs [11], but also that it can be modified easily for our algorithms.
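The edge-scanning loop of KA can be sketched with a union-find structure. This is a minimal illustration, not the paper's implementation; all names are ours:

```python
def kruskal(n, edges):
    """edges: list of (cost, i, j) tuples with nodes 0..n-1; returns tree edge list."""
    parent = list(range(n))

    def find(x):                      # path-compressing find
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for cost, i, j in sorted(edges):  # greedily scan edges by increasing cost
        ri, rj = find(i), find(j)
        if ri != rj:                  # joining disjoint components keeps the forest acyclic
            parent[ri] = rj
            tree.append((i, j))
        if len(tree) == n - 1:        # spanning tree complete
            break
    return tree
```

The MSH and the genetic versions described below all hook their extra logic (need values, edge exclusions) into this basic loop.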
Due to the NP-hardness of the problem, an exact algorithm is not usable because of the inherent memory limitations. Thus, a genetic algorithm heuristic is presented, exploring new codings for the candidate spanning trees and new operators. The remainder of this paper is organised as follows. The next section begins by summarising the method of Martins and Souza [3] for obtaining feasible spanning trees for the md-MST, in order to explain the original computational improvement (MSHOI) of that heuristic for generating feasible md-MST trees.
In Section 3, a genetic-based approach to the md-MST problem is introduced, with formulations of three different GA versions. Section 4 reports the results of several computational experiments, and the relative efficiency of the different heuristics is discussed. Finally, Section 5 presents concluding remarks as well as suggestions for possible directions for future research.
2. Improvement of the existing heuristic approach to the md-MST

MSH
Martins and Souza [3] presented a heuristic algorithm (MSH), which uses a modification of Kruskal's algorithm to build an MST, thereby ensuring the feasibility of the spanning tree but without ensuring its optimality. It is based on evaluating, at each step of the KA, the need values, i.e., the number of edges that need to be "added" to unfeasible nodes: the nodes i where 1 < deg_T(i) < d. The total number of edges in a spanning tree is n − 1. If a given candidate tree in the forest F built by KA has k edges, then n − 1 − k edges are still needed to obtain the spanning tree. An edge can be included in a tree only if the total need value is less than 2(n − 1 − k) after edge inclusion; otherwise, it is not necessary to consider this edge ever again [3]. Thus, in every intermediate step, MSH computes the overall need value for the forest, total_need(F) = Σ_{T∈F} need(T), where need(T) sums the missing degree d − deg_T(i) over the unfeasible nodes i of T, i.e., those with 1 < deg_T(i) < d. After each KA iteration, the total need value of the forest and the individual tree need values are re-evaluated, until termination with a complete feasible tree. For each pair T_1, T_2 ∈ F to be joined, the need of the new tree depends on the number of nodes in T_1:

1. Size one tree (single node and zero edges): need(T_1) = 1. After the connecting edge has been added, the new tree T has this node as a leaf node, so it is admissible for T.
2. Size two tree (two nodes and one edge): need(T_1) = d − 1. For the new tree T, at least one of the two connecting nodes in T_1 or T_2 becomes an internal node, and thus contributes a need of d − 2.

3. Other cases (size three or more): need(T_1) is the sum of the missing degrees d − deg_{T_1}(i) over its unfeasible nodes. If this sum is null and T_1 is not the final (spanning) tree, need(T_1) = 1, because one edge is still necessary to connect it with another tree.
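The three size cases can be summarised in a small sketch. All names are illustrative, and the per-node need d − deg is our reading, consistent with the unfeasibility arithmetic used in the next subsection:

```python
def need(degrees, d, n_total):
    """Need value of one forest tree in MSH.
    degrees: dict mapping each node of the tree to its degree within the tree;
    d: minimum internal-node degree; n_total: number of nodes of the graph."""
    k = len(degrees)                     # number of nodes in this tree
    if k == 1:
        return 1                         # single node: one edge to connect it
    if k == 2:
        return d - 1                     # one endpoint will become internal
    # size >= 3: sum the missing degree of every unfeasible internal node
    s = sum(d - deg for deg in degrees.values() if 1 < deg < d)
    if s == 0 and k < n_total:
        return 1                         # feasible but not yet spanning: one edge still needed
    return s
```

For example, a path on four nodes with d = 3 has two internal nodes of degree 2, so its need is 2.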

MSHOI - Computational improvement of the MSH
As described previously, the MSH's requirement to compute all of the tree need values at each iteration fundamentally determines the complexity of the algorithm. However, the overall efficiency can be improved by modifying the algorithm so that each new need value is evaluated iteratively from the previous values. We refer to this improvement as the MSHOI heuristic.
It should be noted that at each step k of the MSH, a pair of trees in the forest F_k, T_1 and T_2, are joined to form a larger tree T. Therefore, the new total need of the forest can be evaluated using only the knowledge of T_1 and T_2 (which are removed from the forest) and the new tree T. Since the size of the other tree T_2 now matters besides that of T_1, there is a total of six possible cases (excluding symmetry) to be considered.
Case 1 + 1: Trees T_1 and T_2 both have size one. Tree T will have two nodes of degree 1 and one edge, and thus need(T) = d − 1. The new forest F_{k+1} has a total need of total_need(F_{k+1}) = total_need(F_k) − 2 + (d − 1) = total_need(F_k) + d − 3.

Case 1 + 2 and Case 2 + 1: Assume that T_1 has size one and that T_2 has size two, with need(T_2) = d − 1. Then need(T) = d − 2, since T will have one node of degree 2, and thus it is unfeasible.
The total need changes to total_need(F_{k+1}) = total_need(F_k) − 1 − (d − 1) + (d − 2) = total_need(F_k) − 2.

Case 2 + 2: Trees T_1 and T_2 both have size two. Then two of the nodes of T will have degree 2, and need(T) = 2(d − 2). Thus, total_need(F_{k+1}) = total_need(F_k) − 2(d − 1) + 2(d − 2) = total_need(F_k) − 2. The last two cases imply exactly the same change in the total need, so they can be aggregated. All of the previous cases yield a new tree T with unfeasible nodes.
The resulting tree may become feasible only when a tree with size three or more is combined with another. To handle this case, a new variable is introduced, inadm_{T_i}, representing the sum of the needs of the nodes of a tree T_i. After joining T_1 with T_2, if the need sum of T is zero but T is not a complete tree (it has fewer than |V| nodes), then the need of T will be 1; otherwise, it will be equal to new_inadm, where new_inadm is calculated as follows.
Case 1 + 3 and Case 3 + 1: Consider T_1 of size one joining a tree T_2 of size 3 or greater. Then need(T) depends only on T_2, and three different situations may occur depending on the node n_1 of T_2 used for the joining:

1. If deg_{T_2}(n_1) = 1, then n_1 becomes an unfeasible node by increasing its degree, and the new unfeasibility is new_inadm = inadm_{T_2} + d − 2;
2. If the degree of n_1 is less than d (but greater than one), the general unfeasibility is reduced by 1, becoming new_inadm = inadm_{T_2} − 1;
3. If the degree of n_1 is greater than or equal to d, the unfeasibility value does not change: new_inadm = inadm_{T_2}.

Case 2 + 3 and Case 3 + 2: Consider a tree T_1 of size two, with need = d − 1, joined with a (bigger) tree T_2. The connecting node of T_1 reaches degree 2 in T, so it contributes d − 2 to the new unfeasibility. Again, the node n_1 of T_2 to which T_1 is connected influences the feasibility of the new joint tree:

1. If n_1 had degree 1, it also contributes d − 2, so new_inadm = inadm_{T_2} + 2(d − 2);
2. If the degree of n_1 is less than d (but greater than one), the unfeasibility is reduced by 1 and increased by d − 2, i.e., new_inadm = inadm_{T_2} + d − 3;
3. If the degree of n_1 was greater than or equal to d, n_1 remains feasible, so the unfeasibility increases only by the d − 2 from the T_1 node, i.e., new_inadm = inadm_{T_2} + d − 2.
Case 3 + 3: Two trees of size greater than two are joined. In this case, the sum of needs is calculated separately for each tree. For T_1, depending on the node to which T_2 is connected, we have the same three situations described above for Case 1 + 3, with inadm_{T_2} replaced by inadm_{T_1}. The new unfeasibility for T_2, new_inadm_{T_2}, is calculated in a similar manner, and for the new tree T we have new_inadm = new_inadm_{T_1} + new_inadm_{T_2}.
Determining the new total need (after joining trees T_1 and T_2) employs the same iterative procedure in every case, thereby proving the following new result.

Theorem 1. For each iteration of the MSH algorithm in which two sub-trees, T_1 and T_2, are joined into a new tree T, the new total need equals the previous total need minus the needs associated with T_1 and T_2, plus the need of the new tree T: total_need(F_{k+1}) = total_need(F_k) − need(T_1) − need(T_2) + need(T).
This implementation amounts to a few decision instructions, which are independent of the size of the graph being processed. Consequently, the improvement does not require additional resources, and the general algorithmic complexity remains the same as that of the original MSH algorithm.
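As a tiny illustration of Theorem 1 (names are ours), the forest's total need is updated in constant time from the needs of the two joined trees:

```python
def updated_total_need(total_need, need_t1, need_t2, need_t):
    # Theorem 1: drop the needs of the two joined trees, add the new tree's need
    return total_need - need_t1 - need_t2 + need_t

# e.g. Case 1 + 1 with d = 4: need(T1) = need(T2) = 1 and need(T) = d - 1 = 3,
# so the total need changes by d - 3 = 1
```

The per-case formulas above then reduce to choosing the right value of need(T) before applying this one-line update.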
The genetic algorithm approach needs to ensure that feasible trees are generated for a successful evolution phase. Therefore, in the following, the improved MSHOI method is used for spanning tree construction.

GAs for the md-MST
The proposed approach relies on genetic algorithm (GA) metaheuristics, which are used to investigate a large number of different types of optimisation problems. This class of algorithms is inspired by the Darwinian process of evolution by natural selection [12,13]. A GA aims to mimic the evolutionary process of species by starting with an initial population of randomly generated candidate solutions, where each individual is represented by a chromosome (its genotype), and each step (or iteration) involves the evolution of the population of candidates guided by a fitness function. This type of heuristic approach is used in the combinatorial optimisation of NP-hard problems [14], where the fitness function is usually referred to as the cost function. The cost function is evaluated on the phenotypes corresponding to the individual chromosomes in the population; the set of all candidate solutions is also known as the search space.
The same basic evolutionary strategy is used for all of the genetic variants described in the following, i.e., the classical GA (Algorithm 1) using a predefined number of generations as the stopping criterion.
In this genetic approach, selection is mostly elitist. Thus, the selection mechanism retains the 50% of the elements with the greatest fitness from the previous evolutionary population and replaces the least fit 50% with the best children of the new offspring. The general reproductive plan of the evolutionary algorithm, i.e., the evolution strategy after crossover, is the (µ + µ)-ES [15].
Chromosome mutations are controlled by a random function where there is only a low probability of a mutation occurring.

Fitness function
To evaluate the fitness of a tree, the most obvious choice would be a linear combination of the cost of the tree T and a measurement of its unfeasibility as a penalty function. The latter, c_na(T), can be defined by adding the differences between the unfeasible nodes' degrees and the desired value d. For each candidate tree T, both costs would then be combined into a fitness function using a parameter α in a convex combination: fitness(T) = α c(T) + (1 − α) c_na(T), where c(T) is the total edge cost of T. The parameter α can be set at 0.9 and decreased gradually, thereby increasingly forcing infeasible solutions to be rejected. However, intensive computational tests have shown that the GA versions using this fitness function seldom find admissible trees for values of d above 8 or 9. In order to obtain an effective GA approach, we decided to use the MSHOI heuristic (Section 2.2) so that admissible candidate tree solutions (chromosomes) are always built. Therefore, the fitness function needs no penalty evaluation and is simply taken as the total edge cost of the tree, fitness(T) = Σ_{e∈E_T} c_e.
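The discarded penalised fitness can be sketched as follows. This is a hypothetical illustration with our own names, where `degrees` maps each node of T to its degree in T:

```python
def penalised_fitness(tree_cost, degrees, d, alpha=0.9):
    # c_na(T): sum of missing degrees over unfeasible internal nodes
    c_na = sum(d - deg for deg in degrees.values() if 1 < deg < d)
    # convex combination of total edge cost and unfeasibility penalty
    return alpha * tree_cost + (1 - alpha) * c_na
```

With MSHOI guaranteeing admissible trees, c_na(T) is always zero and the fitness collapses to the tree cost alone.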

Chromosome representations and operators
Based on a thorough investigation of previous studies, we found only two suitable chromosome representations for trees. The first uses Prüfer numbers, where, according to the constructive proof of Cayley's formula discovered by Prüfer [16], every such number represents a different spanning tree. Nevertheless, despite its general use, it has been argued [17] that this is a poor choice for the implementation of GAs because small changes in the chromosome might cause large differences in the corresponding spanning tree.
The second was suggested by Raidl and Julstrom [18], who used a completely different representation based on the vector of node weights introduced by Palmer and Kershenbaum [19]. In the following, we describe three different encodings of candidate spanning tree structures, two of which are original representations.

Version gen0 -Using node weights
The authors of [18] suggest the use of a vector based on the weights of the nodes, since it can influence the behaviour of Kruskal's algorithm. The vector is initialised randomly with weights w_i, ∀i ∈ V. When MSHOI is used to generate a feasible tree, the w_i and w_j values of the extreme nodes of each edge {i, j} are temporarily added to the current edge cost: c'_ij = c_ij + w_i + w_j. To improve efficiency, the edges must be kept in an ordered list, so the edges need to be re-sorted for each candidate solution. A bucket sort linear sorting algorithm is used due to the unusually large number of orderings required, and because the range of the edge weights (known a priori) is limited.
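Since the biased costs are integers in a known range, the per-chromosome re-sorting can be done with a bucket sort in linear time. A sketch with illustrative names:

```python
def bucket_sort_edges(edges, max_cost):
    """Sort (cost, i, j) edge tuples with integer costs in [0, max_cost]."""
    buckets = [[] for _ in range(max_cost + 1)]
    for e in edges:
        buckets[e[0]].append(e)                 # distribute edges by cost
    return [e for b in buckets for e in b]      # concatenate buckets in cost order
```

The sort is stable and runs in O(m + max_cost), which is why it pays off when many re-orderings of the same edge set are required.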
For each generation, the reproduction operator alternates between uniform crossover, where each weight is copied randomly either from the father or the mother, and blending with extrapolation. In the latter process, each weight w_child of the new chromosome (child) is obtained by a linear combination of the parents' weights, w_child = γ w_father + (1 − γ) w_mother, where the coefficient γ may fall outside [0, 1] (extrapolation). Next, each element of the chromosome can be mutated, according to a given mutation probability parameter, by the addition or subtraction of a random amount relative to the respective weight.
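The two gen0 reproduction modes and the weight mutation can be sketched as follows. The blending coefficient range and the mutation amount are our assumptions, since the text only states a linear combination of the parents' weights and a relative random change:

```python
import random

def uniform_crossover(father, mother):
    # each weight copied at random from one of the parents
    return [random.choice(pair) for pair in zip(father, mother)]

def blend_crossover(father, mother, low=-0.5, high=1.5):
    # linear combination of the parents; g outside [0, 1] extrapolates
    g = random.uniform(low, high)
    return [g * f + (1 - g) * m for f, m in zip(father, mother)]

def mutate(weights, p=0.03, rel=0.1):
    # with probability p, add or subtract a random amount relative to the weight
    return [w + random.uniform(-rel, rel) * w if random.random() < p else w
            for w in weights]
```

Alternating the two crossover modes between generations mixes exploitation (uniform copying) with exploration (extrapolated blends).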

Version gen1 -Using the leaf set
A different chromosome representation involves using a set of randomly selected leaves. In this case, an array of bits is used to represent which nodes belong to the leaf set. For this version, we need to modify KA in order to avoid choosing edges whose leaf endpoints are already joined in the tree. However, excluding these edges might prevent the construction of a spanning tree for non-complete graphs. To overcome this problem, the algorithm goes through the edges again, but without excluding any this time. This second phase is never used in a complete graph, because all of the leaf nodes can be connected to central nodes.
In general, it should be noted that this representation does not guarantee the depiction of all the possible trees. For instance, consider a graph G_2 with eight nodes where the first six are leaves (Figure 2), and suppose we need to find a tree with minimum internal vertex degree d = 3. KA is a greedy strategy, so it always chooses the edges with lower costs to connect leaves to internal nodes, but this will generate a non-admissible solution for the m3-MST problem: for the graph G_2, KA will build a tree where five of the leaves are connected to one internal node of the tree and the remaining internal node will be connected with only one leaf, thus having an unfeasible degree of 2. However, in our approach all of the solutions generated are admissible trees, because the MSHOI is used to build them.
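The two-phase scan described above can be sketched as follows. This is our reading of the modification: in the first pass, an edge is skipped if it would give a chromosome-designated leaf a second incident edge; all names are illustrative:

```python
def leafset_tree(n, sorted_edges, is_leaf):
    """Build a tree from (cost, i, j) edges pre-sorted by cost.
    is_leaf[v] is True when the chromosome designates node v as a leaf."""
    parent = list(range(n))
    deg = [0] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []

    def scan(respect_leaves):
        for _, i, j in sorted_edges:
            if respect_leaves and ((is_leaf[i] and deg[i] > 0) or
                                   (is_leaf[j] and deg[j] > 0)):
                continue                    # phase 1: keep designated leaves at degree 1
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                deg[i] += 1
                deg[j] += 1
                tree.append((i, j))

    scan(True)
    if len(tree) < n - 1:                   # phase 2: non-complete graph fallback
        scan(False)
    return tree
```

On a complete graph the fallback pass never runs, matching the observation above.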
The gen1 version uses uniform crossover, where each bit is copied at random either from the father or the mother. Before adding the child to the population, mutations are applied with a low probability by flipping bits in the chromosome.

Version gen2 -Using the Edge set
This version represents the chromosome by storing the set of edges in the tree as an array of bits.
When we generate a random set, the probability of hitting a spanning tree is low. In fact, for a complete graph with n nodes and m edges, there are 2^m = 2^((n² − n)/2) possible sets of edges, but only n^(n−2) of them are spanning trees.
For instance, in a complete graph with 25 nodes, only 25^23 out of the 2^300 possible sets represent a spanning tree (about one out of 1.43 × 10^58). Again, KA is employed to overcome this drawback: when a randomly generated set does not represent a spanning tree, a second phase occurs in which KA is repeated, this time considering all of the edges. Unlike the previous version, even for complete graphs this version generally needs the second phase to complete the tree.
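The counting argument can be checked directly:

```python
# Check of the counting argument for n = 25: 2**m edge subsets versus
# n**(n - 2) spanning trees (Cayley's formula).
n = 25
m = (n * n - n) // 2            # 300 edges in the complete graph K_25
subsets = 2 ** m
trees = n ** (n - 2)            # 25**23 spanning trees
ratio = subsets / trees         # roughly 1.43e58 subsets per spanning tree
```

The ratio makes it clear why random bit strings virtually never encode a spanning tree directly, and why the KA repair pass is essential for gen2.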
For the gen2 version, reproduction and mutation alternate between generations. Uniform crossover is used in each even generation, whereas in the odd generations every individual in the population except the best is subjected to mutation with a low probability. The mutations are implemented by flipping bits in the chromosome.

Computational tests
For comparison with the works of Martinez and Cunha [5] and Martins and Souza [3], the experiments were performed using exactly the same instances as the test-bed: in particular, the three classes of instances CRD, SYM, and ALM.

Parameters: study and evaluation
The behaviour of any GA is affected by various parameters associated with the genetic operators. Those with major effects on the performance comprise the population size (both the initial and the evolved population sizes), the number of generations, and the mutation rate. The first two parameters are crucial because their product is an important measure of the computational effort of a GA, and factors such as genetic diversity can be obtained using specifically devised strategies for the crossover and mutation operators. The optimal values are unknown for the md-MST, so we must rely entirely on empirical tests. After trial-and-error experiments, it was clear that the best mutation rate was 3% (although the difference was not significant for 1% or 2%), which agrees with previous studies.
We studied the number of generations and the population sizes, as well as their relationship. The major difficulty involved is that the algorithm is not deterministic, so the results obtained in each run depend on the pseudorandom generator employed. To determine whether changing a parameter is beneficial for the GA, a statistical criterion must be used to infer the actual significance of differences in performance. We considered two random variables, X_1 and X_2, representing the costs of the spanning trees obtained by executing the two versions of the algorithm being compared. Given a sample of each, with dimensions n_1 and n_2 both greater than 30, sample means X̄_1 and X̄_2, and corrected standard deviations Ŝ_1 and Ŝ_2, the criterion is |X̄_1 − X̄_2| − z √(Ŝ_1²/n_1 + Ŝ_2²/n_2), (2) where z is the critical value of the standard normal distribution. If (2) returns a negative value, the difference between the two means is not statistically significant at a significance level of 5% [24].
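The significance criterion can be read as a large-sample test on the difference of the mean tree costs. This is our reconstruction, using z = 1.645 as the one-sided 5% critical value:

```python
import math

def significance_criterion(mean1, mean2, s1, s2, n1, n2, z=1.645):
    # negative value: the observed difference in means is not significant at 5%
    return abs(mean1 - mean2) - z * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
```

For example, with sample standard deviations around 5 and samples of 40 runs each, a difference in mean tree cost of half a unit is not significant, while a difference of 20 clearly is.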
After several experiments based on evaluations using the significance criterion (2), we decided to generate an initial random population twice the size of the evolutionary population. It should be noted that the number of generations required to obtain the best average value did not vary significantly as the population size increased. Although increasing the population size continually improved the quality of the results, only slight changes occur once the population exceeds a thousand individuals (Figure 3), after which the improvement is no longer significant.
Unlike the size of the evolutionary population, the initial population dimension did not have any significant effects on the final values obtained.

Comparisons of algorithms' performance
To evaluate the true effectiveness of the various GA versions, their results were compared with the best previously published. Complete comparison tables can be found in Appendices B and C. The genetic algorithm results are compared with the results presented by the authors using the same benchmark graph instances (CRD, SYM, and ALM), namely Martinez and Cunha (BC) [5] and Martins and Souza (VNS) [3]. Although it does not share the same benchmark instance set, the GA is also compared with the results reported by Murthy and Singh [7], at present the only other known evolutionary approach.
Akgün and Tansel [4] also presented comparative results for their methods over some of the benchmark instances. However, since they only used some of the smaller instances and only improved the run times previously reported by Almeida et al. [1], their results are not suitable for this comparison. Moreover, this potential partial advantage is lost when compared with the fastest results included in [5]. Thus, we do not include Akgün and Tansel's results in our analysis.

Genetic versions versus other heuristics
On most occasions, some version of the GA obtained better (or at least equal) results than the best produced by the VNS heuristics of Martins and Souza [3].
The summary in Table 1 shows that the GA approach obtained better values in 66% of the tests, with worse results in only two cases. The BC heuristics of Martinez and Cunha [5] generally achieved lower values than the previous VNS in terms of the final bounds obtained, but the new GA versions are still competitive. In fact, comparing only with the BC results, although the percentage of better solutions declines to 30%, the GA approach obtained exactly the same values for 70% of the remaining instances and presented higher values on only 19 occasions.
The tables in Appendix B show that the best value found by the genetic versions is better than all of the best VNS values, with only two exceptions: CRD-2 and CRD-3 using d = 5 (Appendix B, Table 7). The GA strategy is always better than VNS on the hardest instances, the ALM class (Appendix B, Table 6). In terms of the gap values, (VNS − min GA)/VNS, the genetic algorithms achieved an average value of 8.17% over all ALM results. In fact, the differences between the final GA values and the VNS approaches were significant in terms of the new lower values obtained by the GA (Appendix B). Over all test set instances, the average gap of VNS versus the best GA value is 5.59%.
Comparing only the BC and GA results, for the hardest ALM class the average gap value for the better GA results is 4.52%, as detailed in Appendix B. Table 1 shows that 27 new lower bounds were established. It should be noted that the new GA lower values are particularly relevant for the hardest of the instances, the 500-node ALM instances (Table 2).
Overall, for the benchmark test set, the GA strategy presents itself as an effective heuristic approach for the md-MST problem, all the more so over the hardest instances of the set.
In relation to the work of Murthy and Singh [7], Table 3 is self-explanatory: for the hardest of the fixed Euclidean instances used by these authors, both gen1 and gen2 always perform better. The genetic version gen0 was not tested, since it was designed to work specifically with integer weights and these instances use real weights. The GA versions achieved average gaps of 11.5% and 10.5% better performance than the ABC and ACO heuristics, respectively (Appendix B, Table 5).

Run times for the GA versions
This section presents an analysis of the time performance of the GA. The reported times are average run times over 64 runs for each of the GA versions, evolving over 3000 generations with a population size of 3000.
The computer used has an Intel Q9550 processor (Core2 2.83 GHz quad core) and 4 GB of RAM, and is thus slightly slower than the systems used in the works used for comparison [3,5,7]. The RAM capacity was of no consequence because the instance dimensions were rather small, and only a small fraction of the overall capacity was ever needed. Therefore, the performance test results are directly comparable with those presented in previous studies.
The run-time performance of the GA versions is quite stable for any instance and any number of nodes (Appendix C) when compared with the run times presented by Martins and Souza [3] and Martinez and Cunha [5]. Figure 4 helps visualise the stability of the time performance of the genetic approach, and the same behaviour was observed in all of the CRD, SYM, and ALM classes. Note that gen0 was the slowest of the three versions. In general, for n ≤ 100, gen2 requires less than half the time needed by gen0, and gen1 is slightly faster than gen2, with only a few exceptions. Excluding gen0, the genetic versions are not affected by the different classes of instances, and there is no direct relationship between the run time and increasing d. In contrast, increasing the number of nodes has a direct effect on the run time of any of the genetic versions (Figure 4), while the relative time performance of the three GA versions is maintained. Complete graphs were always used, so m = O(n²), and the results presented are thus worst-case examples. This explains the higher times required by gen0, because it re-sorts the list of edge weights for each candidate solution (the bucket sort algorithm is linear in the number of edges sorted).

Run times: comparisons with other results
The run times of the GAs depend on the size of the population, the number of generations, and the number of tests for each instance. With these kept constant, we have shown that the run times are also a function of the number of nodes n. Martins and Souza [3] present run times that vary greatly, which is particularly obvious for the smaller instances. For example, for CRD with n ∈ {30, 50}, VNS requires times that range from only a few seconds to almost 12 minutes.

For the same class but with n ∈ {70, 100}, the times range from less than 5 minutes to almost 2 hours. For the ALM class, the reported times range from a few thousand to several thousand seconds. These run times do not follow clear patterns of variation relative to the instance dimension or parameters, so it is not possible to directly compare the time required by both approaches. For instance, in the case of ALM300-1 using d = 10, Martins and Souza obtained their best solution, with a value of 13899, in almost 4 hours. By contrast, the best result found by the GAs, with a smaller cost value of 13701, took about 23 minutes.
Martinez and Cunha report shorter run times for BC [5] on the simplest instances in the test data set, but the GA performed better for all the remaining instances, i.e., medium to large and denser graphs (Appendix C). Especially for the larger and harder instances of the ALM class, the GA always outperforms BC both in quality and in time, with BC reaching the maximum tolerated iteration time set by the authors. In short, the new GAs generally obtain competitive quality results in much shorter run times compared with both the VNS and the BC heuristics.
In relation to the evolutionary approach of Murthy and Singh, namely the ABC and ACO heuristics, the average times of the GA versions for the parameters used are always worse: the best averaged GA times are around 6.5% worse than the ABC, and around 3% worse than the ACO averaged times (Appendix C, Table 10). Nevertheless, the quality of the GA results is much better (Table 3).

Conclusions
In this study, we proposed a new algorithmic approach for the approximation of the NP-hard md-MST problem, presenting three novel genetic algorithm approaches. An improvement of an existing heuristic procedure for obtaining feasible md-MST trees (MSHOI) was also described. The results obtained with the new algorithms are quite promising, with computationally consistent run times as the instance dimension increases, unlike the previously published approaches used for direct benchmarking. For these benchmarks, the GA versions achieved lower cost values for over 30% of the instances, with competitive and consistent run times. In particular, for the higher instance dimensions, 27 new lower bounds were found, thereby demonstrating that the GA versions provide effective and time-efficient solutions to the md-MST problem. Furthermore, when compared with the other known evolutionary approach, using Artificial Bee Colony and Ant Colony Optimisation heuristics, the present GA approach, although more time consuming, delivers far superior quality.

However, some questions remain for future research. First, as usual, the GAs could be made more efficient by enforcing greater genetic diversity in the population throughout the evolutionary phase. This could be achieved by measuring the difference between each solution and the best obtained to date and using this difference to favour more diverse solutions. Naturally, it is difficult to design a difference function that is both useful and fast. Second, on several occasions, we found that an optimal parameter or strategy could not be selected because the performance depended on the problem instance tested. It might be interesting to implement an approach that runs diverse genetic strategies with the option of dynamically adjusting the genetic operators and parameters. Finally, we would like to perform exhaustive testing of this novel GA approach using a more comprehensive data set, ranging from smaller to harder, larger-dimensional instances, and including non-Euclidean ones, to facilitate a better evaluation of its empirical computational efficiency.


Algorithm 1: Genetic Algorithm

Generate the initial population at random using the MSHOI;
Sort the individuals based on their fitness value;
Choose the K individuals with the best fitness as the first evolutionary population;
repeat
  Select the parental mating pool for reproduction (MP);
  Crossover: use the MP to choose two parents for reproduction to obtain a child;
  Mutation: decide on the gene mutation for each child;
  Selection: select individuals to form the next evolutionary population;
until the termination criterion is satisfied.

The CRD, SYM, and ALM classes are classic benchmark instances for testing the performance of algorithms for the degree-constrained problem [e.g. 21, 22]. For the weights of the edges, the CRD class uses the Euclidean distance between n randomly generated points within a square; the instances used have 30, 50, 70, and 100 nodes. The SYM class is defined in a similar manner, except that the points are generated in a Euclidean space of higher dimension; in this study, we used 30, 50, and 70 nodes. The ALM class represents larger-dimension problems with 100, 200, 300, 400, and 500 nodes; these nodes are evenly distributed points in a grid measuring 480 × 640, and the weights of the edges are the truncated Euclidean distances between the points. Murthy and Singh [7] use different test sets, namely Euclidean instances for the Euclidean Steiner tree problem available from http://people.brunel.ac.uk/~mastjjb/jeb/info.html. These consist of randomly distributed points in a unit square, considered as nodes of a complete graph whose edge weights are the Euclidean distances among them. The minimum bound d on the node degree, used as a control parameter, depends on the size of the instances, with values ranging from 3 to 20. All the graphs were complete, and thus m = (n² − n)/2.

Figure 4 :
Comparison of the average run times for the GAs over the ALM class instances using d = 10 for an increasing number of nodes.

Table 1 :
Summary of the performance comparisons: GA approach versus Martins-Souza's final results (VNS) and Martinez-Cunha (BC).

Table 2 :
BC and VNS best values versus the minimum GA value: performance evaluation.

Table 3 :
GA versions versus the ACO and ABC heuristics [7] for the hardest Euclidean instances (best and average results for each instance).

Table 4 :
Run times (seconds) for the genetic versions on CRD and SYM instances (graphs with n = 50 nodes).

Table 6 :
BC and VNS versus GA: performance evaluation -ALM class.

Table 7 :
BC and VNS versus GA: performance evaluation -CRD and SYM classes.

Table 10 :
Average run times (seconds) of gen1 and gen2 for the 250-node Euclidean instances compared with the ABC and ACO heuristics [7].