3D Fast Convex-Hull-Based Evolutionary Multiobjective Optimization Algorithm

The receiver operating characteristic (ROC) and detection error tradeoff (DET) curves have been widely used in the machine learning community to analyze the performance of classifiers. The area (or volume) under the convex hull has been used as a scalar indicator for the performance of a set of classifiers in ROC and DET space. Recently, the 3D convex-hull-based evolutionary multiobjective optimization algorithm (3DCH-EMOA) has been proposed to maximize the volume of the convex hull for binary classification combined with parsimony and for three-way classification problems. However, 3DCH-EMOA consumes a large amount of computational resources due to redundant convex hull calculations and frequent execution of nondominated sorting. In this paper, we introduce incremental convex hull calculation and a fast replacement for non-dominated sorting. While achieving results of the same high quality, the computational effort of 3DCH-EMOA can be reduced by orders of magnitude. The average time complexity of 3DCH-EMOA is reduced from O(n^2 log n) to O(n log n) per generation, where n is the population size. Six test problems are used to evaluate the performance of the newly proposed method, and the algorithms are compared to several state-of-the-art algorithms, including NSGA-III and RVEA, which were not compared to 3DCH-EMOA before. Experimental results show that the new version of the algorithm (3DFCH-EMOA) can speed up 3DCH-EMOA by about 30 times for a typical population size of 300 without reducing the performance of the method. Besides, the proposed algorithm is applied to neural network pruning, and several UCI datasets are used to test its performance.


Introduction
Receiver operating characteristic (ROC) [1] and detection error tradeoff (DET) [2] curves are commonly used to evaluate the performance of binary classifiers in machine learning [3,4]. ROC describes the relationship between the true positive rate (TPR) and the false positive rate (FPR). A high TPR and a low FPR are preferable; however, these two objectives are in conflict with each other. DET curves show the tradeoff between the false positive rate (FPR) and the false negative rate (FNR). ROC convex hull (ROCCH) analysis, which covers the potentially optimal points for a given set of classifiers, has drawn much attention [5,6]. More recently, multiobjective optimization techniques became useful for maximizing the ROCCH [7,8,9,10,11]. The aim of ROCCH maximization is to find a set of classifiers that perform well in ROC space. ROCCH maximization is a special case of a multiobjective optimization problem [7], as the maximization of TPR and the minimization of FPR can be viewed as two conflicting objectives; the potentially optimal classifiers lie on the surface of the augmented DET convex hull (ADCH). In 3DCH-EMOA the volume above the DET surface (VAS) acts as a performance indicator of population quality at each generation of the algorithm. While dealing with 3D augmented DET convex hull maximization problems [9], 3DCH-EMOA obtains solutions with a uniform distribution and covers only those points of a Pareto front from which all other Pareto optimal points can be obtained by simple convex combination. No points are placed in concave parts, such as dents, as this would be a waste of computational resources. In practice, to find a classifier with better performance than the classifiers in concave parts, there is no need to linearly combine two classifiers on the ADCH, as we can select a classifier with good performance on the ADCH by using the iso-performance theory [1].
Experimental results in [9] show that 3DCH-EMOA outperforms NSGA-II [20], GDE3 (the third evolution step of Generalized Differential Evolution) [30], SPEA2 (Strength Pareto Evolutionary Algorithm 2) [31], MOEA/D [21] and SMS-EMOA [23] on the volume above surface (VAS) [32] performance indicator and on the Gini coefficient [33,9] of the size of gaps, which indicates how evenly the points are distributed.
3DCH-EMOA has also obtained high-quality results on application problems. More recently, it has been successfully applied to sparse neural network optimization [9], in which the performance of neural networks is evaluated in the augmented DET space and the sparsity is defined as the complexity objective to be optimized.
However, 3DCH-EMOA performs worse than several compared methods in terms of computational time, even when not considering the time required for function evaluations. 3DCH-EMOA consumes a large amount of computational resources due to redundant convex hull calculations. In particular, it builds a new convex hull many times, and at each iteration it ranks the individuals into different priority levels. Very recently, several algorithms have been developed for convex hull maximization [10,11]. However, results so far focused on the 2D case, and efficient algorithms for the maximization of higher-dimensional convex hulls have received little attention. In this paper, a fast version of 3DCH-EMOA, denoted as 3DFCH-EMOA, is proposed to speed up 3DCH-EMOA by adopting incremental convex hull computation and several new strategies. The average computational time complexity of 3DCH-EMOA per generation is improved from O(n^2 log n) to O(n log n), where n is the population size. For practical purposes, we only consider the three-dimensional case because it has many applications [9,28] and still allows the visualization of the convex hull.
In addition, this paper applies several modern algorithms for multiobjective optimization which were not applied to this problem domain previously. More recently, several studies focused on solving many-objective optimization problems, i.e., problems having four or more objectives [34]. Generally, most of these many-objective optimization algorithms perform better than multiobjective optimization algorithms when dealing with tri-objective optimization problems, as they take the distribution of solutions in high-dimensional objective space into consideration. In the experimental section, several state-of-the-art many-objective optimization algorithms are applied to solve multi-objective ADCH maximization problems, including the two-archive algorithm (Two Arch2) [35], which handles convergence and diversity separately, decomposition-based algorithms such as NSGA-III [36], the evolutionary algorithm based on both dominance and decomposition (MOEA/DD) [37], the reference vector guided evolutionary algorithm (RVEA) [38], an indicator-based evolutionary algorithm with reference point adaptation (AR-MOEA) [39], and a multi-objective particle swarm optimization algorithm based on decomposition (MPSO/D) [40].
The remainder of this paper is organized as follows. Related work is introduced in Section 2. The details of 3DFCH-EMOA are described in Section 3. Section 4 presents the performance evaluation results of the proposed algorithm and its comparison to the state-of-the-art and previously developed algorithms on six test functions and on neural network pruning problems. Section 5 provides conclusions and suggestions for future work.

Related Work
As discussed in [9], ADCH maximization can be described as a multiobjective optimization problem, defined by Eq. (1).
where x represents the parameters of the classifier to be optimized, and f1, f2 and f3 represent the FPR (false positive rate), FNR (false negative rate) and CCR (classifier complexity ratio) [9], as defined by Eq. (2).
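A sketch of the tri-objective formulation, with FPR and FNR written in their standard confusion-matrix form (CCR is the complexity ratio of [9]; this is a reconstruction based on the definitions above, not the paper's verbatim Eq. (1)):

```latex
\min_{x}\ \bigl(f_1(x),\, f_2(x),\, f_3(x)\bigr)
  \;=\; \bigl(\mathrm{FPR}(x),\ \mathrm{FNR}(x),\ \mathrm{CCR}(x)\bigr)

\mathrm{FPR} \;=\; \frac{FP}{FP + TN}, \qquad
\mathrm{FNR} \;=\; \frac{FN}{FN + TP}
```

Each objective is to be minimized and takes values in [0, 1], consistent with the co-domain stated below.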
All functions have a co-domain of [0, 1] ⊂ R. Usually, the points lying on the convex hull surface are non-dominated with respect to each other, but there can be non-dominated points belonging to the Pareto front that are not on the convex hull surface. This is a special characteristic of the ADCH maximization problem. The aim of 3DCH-EMOA is to find a set of non-dominated solutions that covers the 3D convex hull surface, since the potentially optimal classifiers lie on the convex hull surface.
The convex hull of a set of points is the smallest convex set that contains the points, and it is a fundamental construction in mathematics and (computational) geometry [41,42,43]. The 3D convex hull CH of a finite set A ⊂ R^3 is given by Eq. (3), where a_i ∈ A. The convex hull can be represented by a set of facets (F) and, for each facet, a set of adjacency ridges and vertices (V) [44]. Each ridge connects two adjacent facets; ridges are also called edges in 2D and 3D space. In this paper, we only consider the convex hull in 3D space, and the solutions of 3DCH-EMOA act as vertices on the convex hull surface. For a given convex hull surface, we can obtain its facets, edges and vertices. Several convex hull construction algorithms have been developed in the computational geometry community [41]. The randomized incremental algorithm adds a point to the convex hull of the previously processed points. Three steps are needed to add a new point to an existing convex hull. Firstly, the facets visible from the new point and the horizon ridges on the visible facets are found. Secondly, a cone of new facets from the point to its horizon ridges is constructed. Thirdly, the visible facets are deleted to form a new convex hull of the new point and the previously processed points.
The computational complexity of the randomized incremental algorithm is analyzed in [48]. It has been proven that random insertions take an expected time of O(log n) for 3D convex hulls. The incremental nature of this algorithm makes it attractive for use in our algorithm.
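The local-repair behavior of incremental insertion can be observed with an off-the-shelf Qhull binding; this sketch uses SciPy's `ConvexHull` in incremental mode (our choice of library for illustration, not the paper's implementation):

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
points = rng.random((20, 3))          # random points in the unit cube

# Build an initial 3D hull in incremental mode so vertices can be added later.
hull = ConvexHull(points, incremental=True)
v_before = hull.volume

# Adding an exterior point only repairs the facets visible from it,
# which is the three-step update described in the text.
hull.add_points(np.array([[2.0, 2.0, 2.0]]))
v_after = hull.volume                 # volume grows: the new point is outside
```

Since (2, 2, 2) lies outside the unit cube, the hull volume strictly increases after the insertion.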

The Quickhull algorithm was proposed in [44]. It has a time complexity of O(n log n) for 3D convex hulls. Empirical evidence shows that the Quickhull algorithm uses less memory than most of the randomized incremental algorithms and executes faster for inputs with non-extreme points. Although the Quickhull algorithm can compute the convex hull of a fixed set of points, it does not provide efficient mechanisms for dynamic updates.

The aim of 3DCH-EMOA is to find a set of solutions lying on the surface of a 3D convex hull, which is constructed from the population P ⊂ R^3 (the population is described in objective space) and reference point(s) R ⊂ R^3. To select the set of reference points for a new or real-world application, we should analyze the distribution of solutions in advance. Any effective solutions, together with the reference point(s), construct a convex hull. We define the set of frontal solutions (FS), containing the solutions that are located on the boundary of the convex hull, by Eq. (4).
Similarly, we define the set of non-frontal solutions (non-FS), which is complementary to the FS set and contains the solutions located in the interior of the convex hull, by Eq. (5).
non-FS(P) = P \ FS(P). The volume above the DET convex hull surface (VAS) is defined as the volume of the convex hull CH, and is denoted by Eq. (6).
VAS is used as an indicator in 3DCH-EMOA to guide the evolution of the population. 3DCH-EMOA is time consuming, because the Quickhull algorithm runs many times in each generation to rank the solutions. In this paper, we treat the evolution of 3DCH-EMOA as a process of randomized incremental 3D convex hull construction. Several strategies are adopted to speed up 3DCH-EMOA. Details of these strategies are introduced in the next section.

In this section, we describe the newly proposed fast version of 3DCH-EMOA, denoted as 3DFCH-EMOA. Several strategies are designed to accelerate the algorithm:
• Firstly, we propose the 3D incremental convex-hull-based (3DICH-based) sorting method, in which the solutions are ranked into at most two levels.
• Secondly, the age of the individuals in the non-FS set is considered, so that older individuals are deleted (forgotten) first.
• Thirdly, we propose a new method to calculate the contribution of each vertex to the volume of the convex hull by building a partial and usually small convex hull, rather than a convex hull composed of all points in the population, as is done in 3DCH-EMOA.
• Finally, the idea of the randomized incremental convex hull algorithm is adopted to take advantage of the prior convex hull data structure, which helps to reduce computational time by reusing the convex hull information rather than rebuilding the convex hull in each iteration, as is done in 3DCH-EMOA.

3DICH-based sorting
In 3DCH-EMOA the population is ranked into several levels with 3DCH-based (3D convex-hull-based) sorting combined with a without-redundancy strategy. Redundant solutions here have the same performance in objective space as solutions in the non-redundant set. With this sorting strategy, redundant solutions are ranked in the last priority level and have the smallest chance to survive into the next generation. Non-redundant, poorly performing solutions have a chance to survive, as redundant solutions with good performance are discarded to improve the diversity of the population. The procedure of ranking the solutions into several convex hull fronts is similar to the non-dominance classification of the population in NSGA-II. For example, in Fig. 1 the population is sorted into three convex hull fronts. Since only the solutions on the first level of the convex hull (i.e., the frontal solutions) contribute to the VAS of the whole population, it is not necessary to rank the solutions that are not on the first level of the convex hull, which is computationally expensive and does not contribute to the VAS. The solutions in the FS set obtained by 3DCH-EMOA lie on the surface of the convex hull. To obtain a good result, 3DCH-EMOA should find a good approximation of the true convex hull, which not only has a large VAS value, but also has a uniform distribution of vertices covering the whole convex hull. Motivated by this idea, we designed a procedure for 3DFCH-EMOA that constructs an incremental convex hull. In this procedure, we try to insert good solutions into the convex hull and remove bad solutions from it, while keeping the number of vertices on the convex hull equal to or less than the population size.
In this paper, we propose the 3D incremental convex-hull-based ranking (3DICH-based ranking) method. In 3DFCH-EMOA the population is classified into two sets: one is the FS set (FSset), which includes the solutions on the first level of the convex hull surface (denoted as CH in this paper); the other is the non-FS set (non-FSset), containing the remaining solutions, i.e., redundant solutions and solutions in the interior of the convex hull that do not contribute to the VAS and are therefore irrelevant to the final solution set. Solutions in the FS set are marked in red and solutions in the non-FS set are marked in green, as shown in Fig. 2. If the non-FS set is empty, the population is ranked into one level only. 3DICH-based sorting is described in Algorithm 1. In the algorithm, the population of solutions P and a set of reference points R are given. A convex hull CH is built from the points in P ∪ R. The solutions on the surface of CH are ranked in the first level, and the remaining solutions are ranked in the second level. Both ranked solution sets and the structure of CH are returned for further use. To rank a new solution in each generation, we judge whether the solution is inside or outside the convex hull built from the points in the first level and R. If a new solution is outside the convex hull, it is first added to CH and then ranked in the first level; otherwise, it is ranked in the second level. We prefer to obtain a solution on the convex hull surface, as it has a chance to be a potentially optimal classifier for the final decision. Generally, the time complexity of 3DICH-based sorting is O(log n), where n denotes the number of vertices of CH. When the non-FS set is empty, n is equal to the population size.
Algorithm 1 3DICH-based sorting (P, R)
Require: P ≠ ∅, R ≠ ∅; P is a solution set, R is the set of reference points.
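A minimal sketch of the two-level classification, assuming SciPy's Qhull binding (the paper's own data structures are richer and maintained incrementally):

```python
import numpy as np
from scipy.spatial import ConvexHull

def ich_sort(P, R):
    """Rank solutions into at most two levels: hull vertices (FS) vs the rest.

    P: (n, 3) array of solutions in objective space.
    R: (k, 3) array of reference points.
    Returns (fs_idx, non_fs_idx), indices into P.
    """
    pts = np.vstack([P, R])
    hull = ConvexHull(pts)                       # hull of P union R
    n = len(P)
    fs_set = {int(v) for v in hull.vertices if v < n}   # ignore reference points
    fs_idx = sorted(fs_set)                      # first level: frontal solutions
    non_fs_idx = [i for i in range(n) if i not in fs_set]  # second level
    return fs_idx, non_fs_idx
```

For example, with a population of cube corners plus an interior point and a dominating reference point, the interior point ends up in the second level.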

Age-based selection
Similarly to 3DCH-EMOA, 3DFCH-EMOA adopts a (µ + 1) strategy (i.e., a steady-state strategy), according to which a new solution is generated and added to the population and a solution with bad performance is deleted in each generation. Recently it has been shown that the selection of a subset of k (k > 1) points from n points in three-dimensional space to maximize the convex hull volume is an NP-complete problem [49]. This is why the (µ + 1) strategy is favored over a more general (µ + λ) (λ > 1) selection. This yields a monotonically increasing volume. To keep the population at a fixed size, a solution must be deleted in each generation. If the non-FS set is not empty, we delete the oldest individual in the set. The age-based selection mechanism for individuals to participate in genetic operations was introduced for steady-state strategies in [50,51]. In addition, it was successfully used by Hupkens et al. [24] in SMS-EMOA (replacing non-dominated sorting). The age of a newly generated individual is set to zero and is increased by one at each generation. We use the age of individuals in the selection scheme because it has a low computational complexity of O(1) and because more recently generated individuals are more likely to be close to the non-dominated frontier than older ones [52].
Young individuals are selected to survive in the next generation, and the oldest individual is the first element in the queue to be deleted at each generation. The age-based deletion strategy reduces the complexity of individual deletion in 3DCH-EMOA when the non-FS set is not empty. This process consumes comparably few resources and has a time complexity of O(1). An aging queue (AgingQueue) is defined to store non-FS solutions, in which the oldest individual is always at the head of the queue. Age-based selection is described in Algorithm 2.
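The queue discipline described above amounts to a plain FIFO; a minimal sketch in Python (class and method names are ours):

```python
from collections import deque

class AgingQueue:
    """FIFO of non-FS individuals: the oldest is always at the head
    and is deleted first, both operations in O(1)."""

    def __init__(self):
        self.q = deque()

    def push(self, individual):
        # A newly demoted individual (age zero) joins at the tail.
        self.q.append(individual)

    def pop_oldest(self):
        # The head of the queue is the oldest individual.
        return self.q.popleft()

    def __len__(self):
        return len(self.q)
```

Individuals demoted to the non-FS set are pushed at the tail, so deletion order follows age without ever sorting.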

Fast calculation of VAS contribution
If the non-FS set is empty, we delete the solution that has the least contribution to the VAS. To rank the solutions in the FS set, the VAS contribution of each solution must be calculated.

Algorithm 2 Age-based selection (AgingQueue, non-FSset)
1: if AgingQueue ≠ ∅ then
2: q ← the head element of AgingQueue.
3: Remove the first element in AgingQueue.
4: non-FSset ← non-FSset \ q.
5: end if
6: return AgingQueue, non-FSset

The theory of random incremental convex hulls [48] shows that when inserting or deleting one vertex of the convex hull, most of the vertices keep the same topological structure. Only the vertices sharing a facet with the changed (added/deleted) vertex change their connections with other vertices. As shown in Fig. 3, the deletion of vertex 1 in Fig. 3(b) leads to the convex hull in Fig. 3(a). Only the local structure changes when a vertex is inserted or deleted.
By comparing the two convex hulls in Fig. 3, we can conclude that upon insertion and deletion only the topological structure of the related vertices changes. The related vertices (RV) are defined as the points on the convex hull that share a facet with the vertex. The relation of related vertices is denoted by Eq. (7).
where i = 1, 2, . . . , N_F, and N_F is the number of facets of the convex hull CH. The algorithm to find the related vertices of a given vertex q is described in Algorithm 3. The time complexity of Algorithm 3 is O(n), where n is the number of vertices of the convex hull. To make the algorithm effective, we preserve the structure of the convex hull and the VAS contribution of all vertices in each generation. After an insertion and a deletion in each generation, we only update the contribution of the related vertices. To update n vertices of the convex hull, an average time complexity of O(log n) is required [41].
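Assuming the hull is stored as triangular facets (as SciPy's `ConvexHull.simplices` does), Algorithm 3 can be sketched as a scan over the facet list:

```python
import numpy as np
from scipy.spatial import ConvexHull

def related_vertices(hull, q):
    """Vertex indices sharing at least one facet with vertex index q.

    hull: a scipy.spatial.ConvexHull in 3D (facets are triangles).
    q:    index of a vertex of the hull.
    """
    rv = set()
    for facet in hull.simplices:          # each row: vertex indices of one facet
        if q in facet:
            rv.update(int(i) for i in facet)
    rv.discard(q)                         # q itself is not its own related vertex
    return rv
```

In a tetrahedron every vertex shares a facet with all three others, which gives a quick sanity check.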
The importance of individuals on the convex hull is evaluated by their contribution to the VAS, denoted as ∆VAS. In [32], the contribution of an individual p is obtained by subtracting the volume of a new convex hull constructed without the individual from the volume of the initial convex hull that includes p. The contribution of solution p is calculated by Eq. (8).

Algorithm 3 Finding related vertices (CH, q)
Require: CH is a convex hull, q is a vertex of CH, N F is the number of facets of CH, F is the set of facets of CH. Ensure: A set of related vertices RV is created.
if p ≠ q and p ∉ RV then

To update the VAS contribution of a vertex, a new convex hull is built without that vertex. As shown in Fig. 3, most vertices of the convex hull keep the same topological structure with or without vertex 1, except for the vertices labeled 2, 3, 4, 5, 6 and 7, which are called the related vertices of vertex 1. We can calculate the contribution of vertex 1 using only its related vertices and a reference vertex r. The reference vertex r acts as a vertex of the partial convex hull together with the related vertices. Generally, to select a reference point r we should analyze the distribution of solutions first. The fast way to compute the contribution of vertex p is described in Eq. (9).
where r is the reference vertex (r is the point (1, 1, 1) in the context of VAS). In the implementation of Eq. (9), the convex hull CH(RV(p) ∪ {r}) is built first, and then vertex p is added to obtain CH(RV(p) ∪ {p} ∪ {r}). A partial convex hull with the just-added vertex 1 and its related vertices is shown in Fig. 4(a); another partial convex hull without vertex 1 is shown in Fig. 4(b). The VAS contribution of vertex 1 can be obtained by calculating the VAS difference between the two partial convex hulls shown in Fig. 4. This approach reduces the computational complexity, especially when the population size is large.
Algorithm 4 Fast ∆VAS computation (CH, q, r)
Require: CH is a convex hull, q is a vertex of CH, r is a reference vertex.
Ensure: The VAS contribution of vertex q is computed.
The algorithm of fast ∆VAS is described in Algorithm 4. We define the average number of points on the partial convex hull as m.
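Eq. (9) can be sketched as the volume difference of two small hulls; `rv_points` stands for RV(p) and the helper name is ours, not the paper's:

```python
import numpy as np
from scipy.spatial import ConvexHull

def fast_delta_vas(rv_points, p, r):
    """Contribution of p as the volume difference of two partial hulls:
    vol(CH(RV(p) ∪ {p} ∪ {r})) - vol(CH(RV(p) ∪ {r})).

    rv_points: (m, 3) array of related vertices of p.
    p, r:      3D points (p the vertex in question, r the reference vertex).
    """
    with_p = ConvexHull(np.vstack([rv_points, p, r])).volume
    without_p = ConvexHull(np.vstack([rv_points, r])).volume
    return with_p - without_p
```

Both hulls contain only the handful of related vertices plus r, so the cost depends on the average size m of the partial hull, not on the population size.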

Incremental convex hull computation
We use CH to denote the convex hull of the population. Information about CH, such as its facets, its vertices and the contribution of each vertex to the volume of the whole convex hull, is preserved in the FSset. The (µ + 1) selection strategy is employed in this algorithm. According to this steady-state strategy, only one new offspring q is produced at each generation. When q is produced, it is judged whether it is inside or outside the convex hull CH. If q is not yet in CH, i.e., q is outside CH, it is added to CH as a new vertex and stored in the FSset. If q is inside CH, it is stored in the non-FSset.
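The inside/outside test for a new offspring q can be done with the hull's facet half-space equations; a sketch assuming SciPy's `equations` convention (outward normals, so interior points satisfy A·x + b ≤ 0 for every facet):

```python
import numpy as np
from scipy.spatial import ConvexHull

def is_inside(hull, q, eps=1e-12):
    """True if point q lies inside (or on) the convex hull.

    hull.equations rows are [normal | offset] with outward-pointing
    normals, so q is interior iff A @ q + b <= 0 for all facets.
    """
    A = hull.equations[:, :-1]
    b = hull.equations[:, -1]
    return bool(np.all(A @ q + b <= eps))
```

An interior offspring would then be appended to the non-FSset (and the aging queue), while an exterior one triggers the incremental insertion of Algorithm 5.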
Algorithm 5 Adding a point to CH (CH, FSset, non-FSset, q)
Require: CH is the convex hull, FSset ≠ ∅, q is the new solution that will be added to CH.
When q is added to the convex hull, some facets of the convex hull change; the contribution of the related vertices to the convex hull volume is affected and needs to be updated. Due to the changes of the convex hull structure caused by the introduction of q, the vertices no longer belonging to the convex hull CH are removed from the FSset and added to the end of the AgingQueue. The details of adding a new point q to the convex hull CH are described in Algorithm 5. In the algorithm, the computational time complexity of adding a vertex to CH is O(log n), where n is the population size. The time complexity of finding the related vertices is O(n). The average computational time complexity of updating the contribution of the related vertices is O((log n)^2). So the average computational time complexity of Algorithm 5 is O(log n).
Algorithm 6 Deleting a point from CH (CH, FSset, q)
Require: CH is the convex hull, FSset ≠ ∅, q is a solution that will be deleted.
Ensure: The contributions to CH and the FSset are updated.
…
CH.p.contribution ← Fast ∆VAS(CH, p, r)
8: end for
9: return CH, FSset

To keep the population size of the algorithm constant (of size n), an individual needs to be deleted in each iteration. The head element of the AgingQueue is deleted if the queue is not empty. If the AgingQueue is empty (all individuals are on the convex hull), the individual with the least contribution to the VAS is deleted. Then, the convex hull is rebuilt with the incremental convex hull algorithm and the contribution of each solution in CH is updated.
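Putting the pieces together, one (µ + 1) survival step can be sketched as follows. This naive version recomputes hulls from scratch instead of maintaining them incrementally, so it illustrates the selection logic only, not the paper's complexity; all names are ours:

```python
import numpy as np
from collections import deque
from scipy.spatial import ConvexHull

def survival_step(P, q, aging_queue):
    """One (mu+1) survival step on a 3-objective population P (list of points).

    The offspring q joins P; then either the oldest interior solution
    (head of aging_queue) or, if every solution is a hull vertex, the
    solution whose removal shrinks the hull volume least is deleted.
    """
    P = P + [q]
    hull = ConvexHull(np.asarray(P, dtype=float))
    on_hull = {int(v) for v in hull.vertices}
    interior = [i for i in range(len(P)) if i not in on_hull]
    # Refresh the aging queue with the current interior indices, oldest first.
    # (Toy version: indices refer to this list; a real implementation would
    # track individual identities rather than positions.)
    aging_queue = deque([i for i in aging_queue if i in interior]
                        + [i for i in interior if i not in aging_queue])
    if aging_queue:
        drop = aging_queue.popleft()      # delete the oldest interior solution
    else:
        # All points are vertices: delete the least-contributing vertex.
        full = hull.volume
        contrib = {i: full - ConvexHull(np.asarray(
                       [p for j, p in enumerate(P) if j != i])).volume
                   for i in on_hull}
        drop = min(contrib, key=contrib.get)
    return [p for i, p in enumerate(P) if i != drop], aging_queue
```

With a population of cube corners plus an interior point and an interior offspring, the oldest interior point is the one deleted.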
• Comparison of age-based selection to random selection of individuals in non-FSset for 3DFCH-EMOA.

Several metrics are chosen to evaluate the performance of the studied algorithms, including the volume above the convex hull surface (VAS), the Gini coefficient [9], the inverted generational distance (IGD) [53], pure diversity (PD) [54] and the execution time:
• The VAS metric can be used to evaluate the performance of the algorithms on the ZED and ZEJD test functions directly.
The smallest value of VAS is 0; the largest value is bounded from above by 5/6 for the ZED test problems and by 0.5 for the ZEJD test functions. Generally, the larger the value of VAS, the better the performance of the solution set of an algorithm.
• The Gini coefficient was used for measuring the distribution of solutions of evolutionary algorithms in [9]. Generally, the lower the value of the Gini coefficient, the more evenly distributed the solution set.
• The IGD metric is able to measure both diversity and convergence of solutions obtained by EMOAs, and a 290 smaller IGD value indicates a better performance.
• The PD is used as a new diversity metric in [54] to measure the population diversity of evolutionary algorithms. A high population diversity leads to a large value of PD.
• Execution time is used to measure the computational effort of all algorithms.
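For reference, a standard Gini coefficient computation on a set of non-negative values (one common closed form; the paper's exact gap-based variant follows [9,33], so treat this as an illustrative sketch):

```python
import numpy as np

def gini(values):
    """Gini coefficient of a 1-D array of non-negative values.

    0 means the values are perfectly even (e.g. equal gaps between
    neighboring solutions); values approaching 1 mean highly uneven.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    # G = (n + 1 - 2 * sum_i cum_i / cum_n) / n
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n
```

Applied to the gap sizes between neighboring solutions on the hull, equal gaps give a coefficient of 0.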

Parameter settings
All algorithms use a maximum of 30000 function evaluations, and the population size is set to 100 for all algorithms. The remaining parameters are set to the defaults suggested by PlatEMO for all algorithms.

The solutions obtained by each algorithm on ZED2 are shown in Fig. 6. The surface of frontal solutions of ZED2 is not continuous, and there is a concave area on the frontal solutions, which is designed to test whether the algorithms can avoid dent areas [32]. By analyzing the frontal solutions of the ZED2 function obtained by the algorithms, we can draw some conclusions: 1) the distribution of solutions obtained by each algorithm is similar to that on the ZED1 test problem; 2) MPSO/D performs worse than the others in terms of uniformity; 3) the solutions obtained by AR-MOEA gather in the concave area of the frontal solutions; 4) most of the algorithms, except for 3DCH-EMOA and 3DFCH-EMOA, find solutions in the concave area, which makes no sense for ADCH maximization problems; 5) 3DCH-EMOA and 3DFCH-EMOA avoid sampling solutions in the dent area, which is better because these regions contain only redundant solutions, since the goal of ADCH maximization is to find solutions that lie on the convex hull surface. As pointed out in [54], a solution set with good uniformity does not necessarily also have good diversity; 3DCH-EMOA and 3DFCH-EMOA can obtain solutions not only with good uniformity but also with good convergence.
The solutions obtained by each algorithm on ZEJD2 are shown in Fig. 9. The surface of frontal solutions of ZEJD2 is not continuous, and there is a concave area on the frontal solutions [9]. By analyzing the frontal solutions of the ZEJD2 function obtained by the algorithms, we can draw some conclusions: 1) Two Arch2 performs worse than the others in terms of convergence and uniformity; 2) 3DCH-EMOA and 3DFCH-EMOA avoid sampling solutions in the dent area.
The solutions obtained by each algorithm on ZEJD3 are shown in Fig. 10. By comparing the frontal solutions of the ZED and ZEJD test functions, we can conclude that 3DFCH-EMOA obtains results as good as those of 3DCH-EMOA. Besides, only 3DCH-EMOA and 3DFCH-EMOA omit the solutions in concave areas; such solutions, which are on the Pareto front but not on the convex hull surface, do not provide better classifier performance than those on the convex hull surface [9].
The statistical results of several metrics are listed in the following tables. In these tables the best results are marked in light grey and the second-best results are marked in dark grey.

The statistical results (means and standard deviations) of the VAS are shown in Table 1. VAS is the most important indicator in this study, as it measures the volume above the convex hull surface. The results are consistent with the observations made above by comparing the frontal solutions. This confirms that 3DFCH-EMOA has successfully inherited the good performance of 3DCH-EMOA.

The statistical results of the Gini coefficient are shown in Table 2. From the table we can see that 3DCH-EMOA and 3DFCH-EMOA outperform the other algorithms on all ZED test problems. RVEA performs better than the other algorithms, except for 3DCH-EMOA and 3DFCH-EMOA, on the ZED test functions. On the ZEJD test functions, MOEA/DD, 3DCH-EMOA and 3DFCH-EMOA perform better than the other algorithms.

The statistical results of IGD are shown in Table 3. From the table we can see that Two Arch2, 3DCH-EMOA and 3DFCH-EMOA outperform the other algorithms on most of the test problems. Two Arch2 is slightly better than 3DCH-EMOA and 3DFCH-EMOA on most of the test functions.

The statistical results of the PD diversity metric are shown in Table 4. Two Arch2 has the best diversity results; MPSO/D, 3DCH-EMOA and 3DFCH-EMOA perform better than the others on the PD metric, except for Two Arch2. NSGA-III performs the worst of all these methods on the PD metric.

The statistical results on the execution times are shown in Table 5. RVEA always has the lowest execution time, and MPSO/D performs better than the other algorithms except for SMPSO. 3DCH-EMOA costs the most time of all algorithms. 3DFCH-EMOA performs better than 3DCH-EMOA and AR-MOEA. 3DCH-EMOA uses more than 7 times as much computational time as 3DFCH-EMOA, which confirms that the new algorithm speeds up 3DCH-EMOA by more than 7 times for a population size of 100.
The computational complexities of several algorithms are listed in Table 6; as the complexity of some algorithms was not mentioned in the original papers, we mark them with "−" in the table. From the table, we can see that 3DCH-EMOA has the highest computational complexity and 3DFCH-EMOA has the lowest.

Table 6: Computational complexity of the compared algorithms (with a population of n individuals on three-objective optimization problems).

The computational complexity represents the rate at which the execution time increases as the population grows. A low computational complexity does not imply a low execution time: 3DFCH-EMOA costs more time than Two Arch2 in Table 5. The execution times of 3DCH-EMOA and 3DFCH-EMOA for different population sizes are discussed in the next part. A more comprehensive comparison between 3DFCH-EMOA and the other EMOAs is presented in Table 7, which gives the Wilcoxon rank-sum test [8] results. It is very clear that 3DFCH-EMOA performs very well against most of these EMOAs on VAS, Gini, IGD and PD.

Experimental results and discussions
The mean VAS results are listed in Table 8. By comparing the values of VAS, we can see that 3DFCH-EMOA obtains the same VAS values as 3DCH-EMOA. We can conclude that 3DFCH-EMOA inherits from 3DCH-EMOA the good performance of 3D ROCCH maximization. The mean execution times of 3DFCH-EMOA and 3DCH-EMOA on the ZED1 test function are listed in Table 9. An execution time analysis for several population sizes on the ZED1 function is shown in Fig. 11. By comparing the results, we can see that the execution time increases with the population size, and that the execution time of 3DCH-EMOA increases faster than that of 3DFCH-EMOA. In addition, we find that 3DFCH-EMOA speeds up 3DCH-EMOA by about 30 times for a population size of 300.

Comparison of age-based selection with random selection in 3DFCH-EMOA
In this subsection, we evaluate and analyze the age-based and random selection strategies for individuals in the non-FS set. We ran both strategies on the ZED1 test function for 30 independent runs and recorded the VAS value in every generation. The average VAS over generations across the 30 independent runs is shown in Fig. 12. We found that age-based selection has a slightly faster convergence rate than random selection.

Neural networks pruning
Deep neural networks have reached human-level performance on large-scale classification tasks; however, these deep networks typically contain a large number of parameters due to dense matrix multiplications [55]. Recently, sparse neural networks have attracted much attention [55], and evolutionary algorithms have proven to be good tools for neural network optimization [9,56]. Pruning not only reduces the computational complexity of a neural network but can also improve its generalization. A diagram of neural network pruning is shown in Fig. 13. We apply gate variables G^s = {g_1^s, g_2^s, ..., g_m^s} to sparsify the weight matrices by performing element-wise multiplication of G^s with W = {w_1, w_2, ..., w_m}, yielding sparse weight matrices W^s = {w_1^s, w_2^s, ..., w_m^s}, as denoted by Eq. (10):

w_i^s = w_i ⊙ g_i^s,  i = 1, 2, ..., m    (10)

This represents an m-layer dense neural network architecture, where w_i^s is the weight matrix of the i-th layer and the gate matrix g_i^s ∈ {0, 1}^{n_i} contains n_i elements for the i-th layer. The sparsity is defined as the complexity objective to be optimized, as denoted by Eq. (11), where 1{·} is the indicator function, so that 1{a true statement} = 1 and 1{a false statement} = 0. A classifier with lower CCR should be preferred, as classifiers with a lower CCR have a lower tendency to overfit [9]. In our study we also follow a tri-objective problem formulation for neural network pruning, as denoted by Eq. (12).
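The gating scheme of Eq. (10) and the sparsity objective of Eq. (11) can be sketched in a few lines. In this illustration, plain nested lists stand in for weight matrices, and the toy two-layer network sizes and values are assumptions for demonstration only:

```python
def apply_gates(weights, gates):
    """w_i^s = w_i * g_i^s, element-wise, for every layer i (Eq. (10))."""
    return [
        [[w * g for w, g in zip(w_row, g_row)]
         for w_row, g_row in zip(w_mat, g_mat)]
        for w_mat, g_mat in zip(weights, gates)
    ]

def sparsity(gates):
    """Fraction of closed gates, i.e. (1/N) * sum of 1{g = 0} (Eq. (11))."""
    flat = [g for g_mat in gates for g_row in g_mat for g in g_row]
    return sum(1 for g in flat if g == 0) / len(flat)

# Toy 2-layer "network": a 2x2 and a 2x1 weight matrix with binary gates.
weights = [[[0.5, 0.2], [0.3, 0.8]], [[1.0], [-0.4]]]
gates = [[[1, 0], [1, 1]], [[0], [1]]]
sparse_weights = apply_gates(weights, gates)
print(sparse_weights[0][0])  # gated connection zeroed out
print(sparsity(gates))       # 2 of the 6 gates are closed
```

Maximizing sparsity while keeping classification performance is exactly the conflict that motivates the tri-objective formulation of Eq. (12).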
Neural network pruning is a combinatorial optimization problem; in this part, several EMOAs are applied to seek sparse neural networks in the augmented DET space.

UCI dataset
In this section, a total of 14 two-class datasets from the UCI repository [57] are used to evaluate the performance of EMOAs for neural network pruning. Both balanced and unbalanced benchmark datasets are included; details are given in Table 10. For each dataset, 1/4 of the instances are randomly selected as the training set, 1/4 as the validation set, and the remaining instances as the test set. The training set is used for neural network pre-training, the validation set for neural network pruning, and the test set for performance evaluation. Seven reference EMOAs (NSGA-III, MOEA/D, RVEA, AR-MOEA, MPSO/D, 3DCH-EMOA, and 3DFCH-EMOA) were tested for neural network pruning. Experiments were performed with LightNet [58] and PlatEMO [53] in Matlab.

Parameter setting
All algorithms mentioned above are used to optimize a multilayer feedforward network with an input layer whose size equals the number of features of each dataset, two hidden layers with 10 and 6 neuron units, and an output layer with 2 neuron units. The sigmoid function is used as the activation function in the neural networks. The batch size is set to 5, and 100 epochs are performed in the pre-training stage for each dataset. Encoding: We employ a binary encoding scheme in which the chromosome is an array of 0s and 1s, where 0 means dropping and 1 means keeping the connection between two neuron units. The length of the chromosome is n_f × 10 + 10 × 6 + 6 × 2, where n_f is the number of features of the dataset. In the pruning stage, only the validation set is used to evaluate the performance of each chromosome.
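The encoding above maps one bit to each connection of the n_f-10-6-2 network. A minimal sketch of the chromosome length and of a flat-to-per-layer decoding (the decoding layout is an illustrative assumption; the layer sizes follow the text):

```python
def chromosome_length(n_features):
    """One bit per connection: n_f x 10, 10 x 6, and 6 x 2 weight matrices."""
    return n_features * 10 + 10 * 6 + 6 * 2

def decode(chromosome, n_features):
    """Split the flat bit string into per-layer connection masks."""
    shapes = [(n_features, 10), (10, 6), (6, 2)]
    masks, start = [], 0
    for rows, cols in shapes:
        layer = chromosome[start:start + rows * cols]
        # Reshape the flat slice into a rows x cols mask, row-major.
        masks.append([layer[r * cols:(r + 1) * cols] for r in range(rows)])
        start += rows * cols
    return masks
```

For example, a dataset with 13 features yields a chromosome of 13 × 10 + 60 + 12 = 202 bits, each mask entry acting as one gate variable of Eq. (10).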
Configuration: The seven algorithms use a maximum of 20,000 function evaluations as the stopping criterion. Binary single-point crossover and bitwise mutation are applied in the experiments, with a crossover probability p_c = 0.9 and a mutation probability p_m = 1/n_i, where n_i is the number of gate variables. The population size is set to 100 for all algorithms, and each algorithm is run 10 times independently. All experiments were run on an IBM X3650 server with Xeon E5-2600 2.9 GHz processors and 32 GB memory under Ubuntu 16.04 LTS.

Experimental results and discussion
To evaluate the performance of these algorithms, we compare the statistical results of time cost and classification accuracy. Table 11 shows the mean and standard deviation of the accuracy obtained by these EMOAs on the UCI datasets; the highest accuracy in the population is listed, and, for comparison, the average accuracy over all UCI datasets is given at the bottom of the table. From the table we can draw some conclusions: 1) EMOAs can obtain higher accuracy than neural networks without pruning on most UCI datasets; 2) 3DCH-EMOA and 3DFCH-EMOA outperform the other EMOAs on most UCI datasets; 3) 3DFCH-EMOA performs as well as 3DCH-EMOA on most of these UCI datasets. Table 12 shows the mean and standard deviation of the time cost for the UCI datasets, computed for the neural network pruning stage only. In the pruning stage, neural network validation is executed to evaluate each chromosome; as this validation is not time-consuming, the time cost mostly depends on the computational complexity of each algorithm. From the table we can see that the newly proposed algorithm costs less time than 3DCH-EMOA, since on the neural network pruning problems 3DFCH-EMOA requires far less computation than 3DCH-EMOA when dealing with solutions in the non-FS set, thanks to the age-based selection strategy.

Conclusions
In this paper, we proposed 3DFCH-EMOA, a fast version of 3DCH-EMOA, obtained by adopting an incremental convex hull algorithm and several other evolutionary strategies. To reduce the computational time complexity of an iteration, individuals are ranked into only two levels, the convex hull level and the non-convex hull level, and age is used as a selection criterion within the latter. In addition, a fast computation of the contribution of each vertex to the convex hull volume is proposed. In total, the average time complexity of 3DCH-EMOA in each generation is reduced from O(n^2 log n) to O(n log n). Six test functions and neural network pruning problems were used to test the performance of the proposed method. Experimental results show that 3DFCH-EMOA speeds up 3DCH-EMOA by a factor of about 30 for a population size of 300, without reducing the performance of the method. Moreover, the benchmark was extended by modern algorithms, such as NSGA-III and MPSO/D.
Alternatively, the computational time complexity of iteratively computing the convex hull, which is O(n^(⌊d/2⌋+2)) [59] for d dimensions, could be achieved by iteratively applying the gift-wrapping algorithm. However, here too, incremental algorithms might prove useful for obtaining a better average computational time complexity. In the future it would be interesting to derive fast algorithms for more than 3 dimensions.