Selecting services in the cloud: a decision support methodology focused on infrastructure-as-a-service context

Growing demand for reduced local hardware infrastructure is driving the adoption of Cloud Computing. In the Infrastructure-as-a-Service model, service providers offer virtualized computational resources in the form of virtual machine instances. The large variety of providers and instances makes decision-making difficult for users, especially because factors such as the location of the datacenter hosting the virtual machine directly influence instance prices. The same instance may be priced differently when hosted in different geographically distributed datacenters, so the datacenter location needs to be taken into account throughout the decision-making process. To address this problem, we propose the D-AHP, a methodology to aid decision-making based on Pareto Dominance and the Analytic Hierarchy Process (AHP). In the D-AHP, the dominance concept is applied to reduce the number of instances to be compared by selecting instances according to a set of objectives, while AHP ranks the selected ones according to a set of criteria and sub-criteria, among them the datacenter location. The results from case studies show that considering the datacenter location as a criterion can change which instance is most suitable for the user. This highlights the need to consider this factor when migrating applications to the Cloud. In addition, applying Pareto Dominance early to the full set of instances proved efficient, since it significantly reduces the number of instances to be compared and ranked by the AHP by excluding instances with fewer computational resources and higher cost from the decision-making process, especially for larger application workloads.


Introduction
Cloud Computing has emerged as one of the most significant advancements in the field of Information Technology (IT) because of its advantages over local hardware infrastructures in aspects such as agility, elasticity [1], flexibility, and cost. Because of these attractive features, the International Data Corporation (IDC) estimates that spending on public cloud services will reach US$ 370 billion in 2022 [2].
In Cloud Computing, providers offer computing services on a pay-per-use basis [3], providing significant savings in resources related to investment, management, and maintenance of local infrastructure. Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS), and Infrastructure-as-a-Service (IaaS) are among the services offered by providers [4].
Cloud Computing has been attracting interest from the scientific community. One gap that still persists is the lack of transparency on the part of providers regarding the pricing of Virtual Machine (VM) instances and the variety (and constraints) of the corresponding services, which hinders the use of these pricing models in the decision-making process [5]. Thus, the decision to migrate is still considered complex due to the immaturity and dynamics of this environment. Even so, in the business field, migration is a strategic decision that can improve performance, productivity, and growth, and increase competitiveness [6,7].
In the IaaS model, the prices of VM instances are based on a set of numerical variables (such as the CPU, memory, and storage amounts) and non-numeric variables (such as the VM operating system (OS) and the geographic datacenter location).
To gain a better understanding of the problems associated with the selection of services, researchers have followed different approaches. The studies of Li et al. [8], Kihal et al. [9], Menzel and Ranjan [10], Murthy et al. [11], Menzel et al. [12], Emeras et al. [13], López-Pires and Barán [14], Mitropoulou et al. [15], Al-Faifi et al. [16], Nagarajan and Thirunavukarasu [17], and Chauhan et al. [18] used CPU, memory, and storage resources simultaneously as decision variables. However, they did not present mechanisms to reduce the dimensionality of the problem; in addition, many of these studies do not adopt a multi-provider approach, which may influence the choice of services. Studies by Li et al. [8], Yao et al. [19], Malekimajd et al. [20], Souidi et al. [21], Menzel et al. [12], and Ziafat and Babamir [22] have considered the datacenter location as a decision variable, although only in relation to network performance. Mitropoulou et al. [15] analyzed it in relation to cost, presenting regression models with low significance.
Datacenter location is one of the factors that most affects the price of instances. In addition, the lack of transparency regarding the variables that cause price differences between instances in different datacenters, together with the wide variety of geographic locations around the world, makes the decision-making process difficult.
As a result, a user who wants to change datacenter for some reason without increasing the cost (for example, to improve latency) may be forced to select another instance with fewer computational resources in order to maintain a financial balance. In such cases, the ideal scenario would be to select an instance that does not have significant price variations in different datacenter locations.

Some important issues may arise while selecting a VM instance. When prioritizing cost, users tend to select lower-priced instances, which may compromise the performance of the application to be migrated, since the instance may not have enough computational resources to guarantee good performance of the application. When prioritizing computational resources, users tend to select instances with large capacity in terms of resources such as the CPU, memory, and storage. However, if the application does not need all the available capacity, the VM tends to become idle, making the migration process expensive. In both cases, to make the best decision and optimize the resources involved in the decision-making process, it is important to be aware of the requirements of the application to be migrated, so that the cost-benefit ratio can be optimized. Therefore, users need a methodology that facilitates the choice of a VM instance by reducing the number of available instances and that allows selection based on their preferences.
The lack of tool support to automate migration tasks is highlighted by Jamshidi et al. [23], whose work consists of a systematic analysis of studies on the migration of legacy applications from local infrastructures to the Cloud. In addition, the authors identify a need for both architectural adaptation and self-adaptive Cloud-enabled systems.
In this sense, how can users be assisted during the decision-making process when choosing a provider and a VM instance when they are migrating their applications to an IaaS in the Cloud? Are the computational resources of the VM sufficient for the execution of such applications, considering the lowest price? Furthermore, how can the number of instances be reduced so that users have a smaller set of VM instances to compare, thereby reducing the resources involved in the decision-making process?
To assist with this process, we present the D-AHP, a methodology that supports the choice of the VM instance best suited to the migration of applications, using the Pareto Dominance concept and the multicriteria approach of the Analytic Hierarchy Process (AHP). To this end, we define as decision criteria the price and the amount of computational resources that comprise the VM instance. In addition, as a distinguishing feature, we define different datacenter locations as sub-criteria because of the strong influence of this variable on the price criterion.
In summary, our contributions are as follows:
- To propose an easy-to-understand decision-making methodology that allows an analysis based on the importance level of the decision criteria for the selection of VM instances, assuming as a premise that prioritizing one criterion over the others tends to better accommodate users' needs;
- To analyze the influence of datacenter location on the prices of VM instances set by providers, demonstrating its importance in decision-making processes;
- To reduce the dimensionality of the VM instance selection problem by reducing the number of comparable instances, thus making the decision-making process more agile.
The remainder of this paper is organized as follows: Sect. 2 highlights related studies; in Sect. 3, the problem modeling is done; Sect. 4 presents details of the D-AHP methodology; Sect. 5 presents different case studies for the validation of the proposed methodology; Sect. 6 presents the conclusions.

Related work
A great number of researchers have focused on solving problems that involve IaaS selection to assist in the migration of applications to the Cloud by analyzing performance-related variables such as CPU, memory, storage, and cost, from the points of view of both the provider and the user [24]. To do so, they use different approaches, among them heuristics [25] and classification and ordering methods. In this section, the related studies are divided into two groups, both in the IaaS context. In the first group, we present studies that cover the selection of services and the modeling of prices; in the second, we present studies that use optimization techniques for the selection of services. Of the related works listed in Tables 1 and 2, we detail only those that address the datacenter location in their proposals.
Generally, studies that deal with IaaS selection processes in the Cloud take cost reduction as their main hypothesis and pay more attention to the policies adopted by providers to define the prices of VM instances. Thus, researchers seek to detect the variables that most influence prices, among which are the computational resources and the datacenter location.
In Table 1, we present a list of related studies and the variables most commonly used by researchers within the context of the selection of services and price modeling.
For Al-Roomi et al. [46] and Mazrekaj et al. [57], despite attempts to make the pricing policies of providers more transparent by searching for an exact formula, prices are defined considering several factors, including commercial ones, which makes this task more complex. Thus, for a problem composed of several variables that involves the selection of IaaS in the Cloud, optimization methods are a good alternative to assist in the service selection process.
Due to the wide applicability of optimization methods in problems of this nature, Alabool et al. [65] carried out a systematic review of studies that propose the use of Multicriteria Decision-Making (MCDM), in order to develop Cloud Service Evaluation Methods (CSEMs). The authors employed the Evaluation Theory to detect deficiencies in each proposal and created an information base so that researchers can better develop their CSEMs.
Hosseinzadeh et al. [66] presented an overview of a set of articles suggesting the use of MCDM methods to develop schemes for service selection in the Cloud. To evaluate each proposed method, the authors identified the optimization method, the Quality-of-Service (QoS) criteria, and the datasets and environments used.
In Table 2, we present a list of related studies and the optimization method used by each one. It can be noticed that, in multiobjective optimization, the most used method is the Genetic Algorithm, while in multicriteria optimization the most used method is the AHP and its variants.
In relation to the study that addresses the datacenter location in its proposals, Mitropoulou et al. [15] propose the construction of a price index based on a hedonic method of pricing. By using regression models, the authors analyzed a set of factors that affect the final price of VM instances, among them the datacenter location. The authors collected data from providers on a specific platform and analyzed in which regions they offer services, grouping them by continent. However, the models presented low significance.
Menzel and Ranjan [10] and Menzel et al. [12] presented CloudGenius, a framework based on the principles of AHP and the Genetic Algorithm to assist in the migration of Web applications. CloudGenius addresses the decision-making process based on three main goals: lower price, better latency, and better QoS. However, the authors evaluate the effects of datacenter location only on network performance, regardless of costs. Souidi et al. [21] assumed, as a hypothesis of the selection problem, that the datacenter location is chosen based on the location of the user, aiming at better network performance. A similar approach is adopted by Li et al. [8], who assumed that the shorter the distance between the datacenter and the user, the lower the network latency. Network latency in geographically distributed datacenters is also analyzed by Yao et al. [19] and Malekimajd et al. [20]. However, none of these studies verifies the influence of the datacenter location on the cost of Cloud services.
Ziafat and Babamir [22,43] used different multiobjective optimization algorithms for the selection of datacenters considering qualitative aspects such as the distance between datacenter and user, indices of reliability and availability, response time, and lower cost. However, when considering a large number of conflicting objectives, the selection of a datacenter that satisfies all the objectives, despite being considered optimal by the optimization algorithm, tends not to satisfy user needs effectively, since it is not possible to prioritize an objective in relation to another.
Regarding the datacenter location, Marks and Lozano [103] highlight some important aspects to be considered, namely: cost, since the same instance hosted in datacenters at different locations may have different final prices; data transmission, since delays may occur due to the distance between the datacenter and the user, compromising service quality; and confidentiality of information, as some countries have specific laws regarding the hosting of data within their borders.
In Table 3, we summarize a comparison between our proposed approach and other state-of-the-art approaches that analyze datacenter location.
Given this scenario, our proposal differs from those presented because, in addition to analyzing all the resources that compose the instances simultaneously, it analyzes the datacenter location in relation to cost. Moreover, compared with the proposals that analyze the influence of the datacenter location on cost, ours allows users to prioritize one objective over the others, so that they can define their preferences based on the requirements of their applications. To the best of our knowledge, no similar study exists in terms of adopted methodology, decision-making criteria, analysis of the datacenter location in VM prices, and combination of optimization techniques for the selection of IaaS in the Cloud.

Problem modeling
When decision-makers (DMs) plan, for strategic reasons (for example, cost reduction, market trends, or a large volume of data), to migrate applications from local infrastructures to an IaaS in the Cloud, they should first know whether the migration is possible. In addition, they need to decide between total and partial migration, which implies rewriting all of the application or parts of it.

Regardless of the type of migration, DMs need to select an IaaS provider from a set of providers $P = \{p_1, p_2, \ldots\}$ capable of offering better QoS at a low cost. These providers offer an extensive set of instances $I = \{i_1, i_2, \ldots, i_n\}$, each composed of computational resources $R_n$, such that:

$$R_n = (Q_{MEM}, Q_{CPU}, Q_{STO}) \qquad (1)$$

where $Q_{CPU}$, $Q_{MEM}$, and $Q_{STO}$ are the CPU, memory, and storage amounts of the nth instance, respectively.
When simulating the hiring of a VM on a provider's website, the options to be selected are the amount of computational resources, the operating system, and the datacenter location. Accordingly, we consider the total cost $C_T$ of an instance $i_n \in I$ to result from combining the cost of computational resources $C_R$, the cost of the datacenter location $C_L$, and the cost of the OS $C_{OS}$, such that:

$$C_T = C_R + C_L + C_{OS} \qquad (2)$$

The same instance hosted in different datacenter locations can have significantly different prices. For example, data obtained in July 2018 showed that Amazon's r3.2xlarge instance (8 vCPU, 61 GB of memory, and 160 GB of SSD storage) hosted in Brazil was priced at US$ 1.3990/hour, while the same instance hosted in the USA was priced at US$ 0.6650/hour. In comparison, the r3.4xlarge instance (16 vCPU, 122 GB of memory, and 320 GB of SSD storage) was priced at US$ 1.3300/hour, i.e., double the resources of the Brazilian r3.2xlarge for a slightly lower price, or roughly half the price per unit of resource [104].
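The price comparison above can be checked with a few lines of arithmetic. The Python sketch below uses only the hourly prices and vCPU counts quoted in the text; normalizing by the vCPU count is our own illustrative per-resource measure, not part of the D-AHP.

```python
# Hourly on-demand prices (US$/hour) and vCPU counts quoted in the text (July 2018).
R3_2XLARGE_VCPU = 8
R3_4XLARGE_VCPU = 16
PRICE_R3_2XLARGE_BRAZIL = 1.3990
PRICE_R3_2XLARGE_USA = 0.6650
PRICE_R3_4XLARGE = 1.3300


def price_per_vcpu(hourly_price: float, vcpus: int) -> float:
    """Hourly price divided by the number of vCPUs (illustrative per-resource measure)."""
    return hourly_price / vcpus


# How many times more the same instance costs in Brazil than in the USA.
location_premium = PRICE_R3_2XLARGE_BRAZIL / PRICE_R3_2XLARGE_USA

# Per-vCPU price of r3.4xlarge relative to r3.2xlarge hosted in Brazil.
per_vcpu_ratio = price_per_vcpu(PRICE_R3_4XLARGE, R3_4XLARGE_VCPU) / price_per_vcpu(
    PRICE_R3_2XLARGE_BRAZIL, R3_2XLARGE_VCPU
)

print(f"Brazil/USA price ratio (r3.2xlarge): {location_premium:.2f}")
print(f"Per-vCPU price, r3.4xlarge vs Brazilian r3.2xlarge: {per_vcpu_ratio:.2f}")
```

Running it prints a location premium of about 2.10 and a per-vCPU ratio of about 0.48, which is the sense in which the larger instance costs "roughly half" per unit of resource.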
This price difference cannot be explained by the change of datacenter location alone. Additional factors contribute to it, some of them related to the VM configuration, such as vCPU cores, I/O rates, and older hardware. However, this information is often unclear and difficult to access for users with little technical knowledge. Thus, users commonly look first at the amount of computational resources and the price of the VM, both of which are easy and quick to access.
Therefore, users generally select services that offer greater amounts of computational resources at the lowest price, according to the requirements of the applications in question, and different priorities may be assigned to the decision criteria. In this case, we assume that the best option is to choose the provider $p \in P$ and the instance $i_n \in I$ that offer the best cost-benefit relation, characterized by maximizing the instance resources $R_n$ and minimizing the final cost $C_T$, such that:

$$\max_{p \in P,\; i_n \in I} R_n \quad \text{and} \quad \min_{p \in P,\; i_n \in I} C_T$$

D-AHP methodology
In this section, we present the D-AHP, a decision-making methodology to support processes for migrating applications to computational infrastructures in Cloud environments.
We propose use of the D-AHP methodology to solve the following problem: Select a VM instance i n ∈ I associated with an IaaS provider p ∈ P , in such a way as to obtain the maximum amount of computational resources R n at the lowest price C T , based on CPU, memory, and storage amounts, and the prices of the instances in different datacenter locations L.
To do this, the D-AHP analyzes a set of variables arranged in two distinct sets: the set of numerical variables $V = \{v_1, v_2, \ldots, v_x\}$ and the set of non-numeric variables, which are detailed in Tables 4 and 5.
The D-AHP basically consists of four main steps included in three major phases, as shown in Fig. 1.
The first step is to pre-select trusted providers that have a good range of services, comply with Service Level Agreement (SLA) terms and are able to quickly adapt to the characteristic dynamics of the Cloud environment.

Fig. 1 Major phases of D-AHP
The second step is to apply Pareto Dominance [105] to a set of computational resources and its price in order to reduce the number of instances. As a result, only the non-dominated instances are selected for the next step.
In the third step, the non-dominated instances are analyzed in relation to their availability regarding datacenter locations and the computational requirements of the application to be migrated. The instances resulting from this new filtering give rise to the set of selectable instances, arranged at the last level of the hierarchical structure present in the final step of the D-AHP.
The fourth and final step of the D-AHP is to use the AHP method [106] to obtain a final classification of the selectable instances. In this process, they are compared to one another from the perspective of a set of decision criteria and sub-criteria which, as in the second step, are related to the amount of computational resources and price.
The D-AHP is adaptable to any OS or storage system. In addition, it is possible to use it by considering a set of free-choice datacenter locations, or even only among availability zones within the same region.
Moreover, the D-AHP considers that the application workload to be migrated from local infrastructures is constant. If we consider a dynamic application workload, the QoS values can be significantly changed, and problems of over-provisioning or under-provisioning of computational resources can be detected [107].
In the following subsections, we detail the steps in the D-AHP and specify the processes performed internally in each step. Also, we represent these steps and processes by a flowchart, as shown in Fig. 2.

Step 1: selection of IaaS providers
In this first step, we seek to select a subset of providers $P^+ \subseteq P$ so that the next steps of the D-AHP can be carried out.
In the D-AHP proposal, users define the set of IaaS providers using inclusion or exclusion criteria, according to their preferences [108]. In this study, providers are selected using two approaches: the first is the definition of a set of services offered to the DMs when migrating their applications to the Cloud; the second is the updated Gartner Magic Quadrant for IaaS providers [109]. The services adopted as criteria for choosing the set of selected providers $P^+$ to be analyzed (15 providers) were as follows: Any Location (by continent), Hourly Pay-As-You-Go, Auto-Scaling, No Limit Transfer, Support 24x7, Load Balancing, Firewall, Operating System, GPU Instances, and SSD Storage [110].
The annual Magic Quadrant developed by Gartner uses aspects such as Ability to Execute and Completeness of Vision to classify IaaS providers for a given period of market observation. Among the ranking groups, the Leader providers are technologically more advanced. They are points of reference within the industry and they dictate the rules within the segment by having a better view of the market and the ability to carry forward the results of their research.
Thus, we aim to verify, from the providers that are members of the Leaders' quadrant, which ones offer the complete set of defined services. Joint analysis of both factors will identify providers belonging to the set of selected providers P + and the set of unselected providers P − .

Step 2: applying Pareto Dominance
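According to Pareto Dominance, if $X^*$ is the set of feasible solutions to a minimization problem with objectives $f_1, \ldots, f_K$ and $x, y \in X^*$, then $x$ dominates $y$ ($x \prec y$) if and only if

$$f_l(x) \le f_l(y) \;\; \forall\, l \in \{1, \ldots, K\} \quad \text{and} \quad \exists\, l : f_l(x) < f_l(y)$$

Thus, for a solution to be non-dominated, no other solution in the search space may dominate it when all objectives are considered simultaneously.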
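The dominance test above translates directly into code. The following is a minimal, generic Python sketch for a minimization problem; the function names and the sample points are illustrative and not taken from the D-AHP itself.

```python
from typing import Sequence


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x Pareto-dominates y in a minimization problem: x is no worse than y
    in every objective and strictly better in at least one."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of objectives")
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better


def non_dominated(points: list) -> list:
    """Keep the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 5), (2, 2), (3, 3), (4, 1)]
print(non_dominated(points))  # [(1, 5), (2, 2), (4, 1)]; (3, 3) is dominated by (2, 2)
```

A point never dominates itself (it is not strictly better in any objective), so the comprehension can safely compare each point against the whole list.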
When applying Pareto Dominance in a set, we look for the best solutions belonging to it, with the best performance in relation to multiple objectives, which can be maximization or minimization.
In the D-AHP proposal, Pareto Dominance is applied to reduce the number of instances offered by the providers of set $P^+$. As a direct consequence, the number of pairwise comparisons, which are the basis of the AHP method applied in the last step of the D-AHP, is significantly reduced.
In the D-AHP, the dominance relationship is first applied intra-provider, that is, among instances of the same provider. In our analysis, we only used On-demand instances in the Public Cloud model. From Eq. 1, $R_n = (Q_{MEM}, Q_{CPU}, Q_{STO})$; therefore, maximizing $R_n$ amounts to maximizing its three components. Thus, the dominance relationship is applied considering four objectives, with no prioritization among them, according to Eqs. 5-8:

$$\max Q_{CPU}, \qquad \max Q_{MEM}, \qquad \max Q_{STO}, \qquad \min C_T$$
The dominance relationship is applied over the set of instances as follows. For example, according to data obtained in July 2018, the provider Azure offered instance H8 at a price of US$ 0.796/hour and instance L8 at US$ 0.624/hour [111]. As shown in Table 6, the pairwise comparison of H8 and L8 showed that L8 offered more memory and storage than H8 and the same number of vCPUs, at a lower price. Therefore, L8 dominates H8; i.e., L8 is non-dominated and H8 is dominated.
After this process is performed for all providers of set $P^+$ selected in Step 1, the non-dominated and dominated instances of each provider are stored in the sets $I^+$ and $I^-$, respectively. When two instances are indifferent (neither dominates the other), both are added to $I^+$. An instance included in $I^-$ is automatically eliminated from the next steps of the process, since it is already dominated by another instance when all objectives are considered.
A new dominance relationship is then applied inter-provider over the instances of set $I^+$, which is composed of the non-dominated instances from all providers of set $P^+$. Dominated instances are stored in set $I^-$, together with those dominated in the first pass, while the non-dominated ones are stored in the final set of non-dominated instances $I^*$, which then passes through a new filtering process in the third step of the D-AHP. Up to this step, the computational demand of the application has not been analyzed.
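The two-stage use of Pareto Dominance described above (intra-provider, then inter-provider) can be sketched as follows. This is a minimal illustration under our own assumptions: the `Instance` record, the provider names, and the catalog values are hypothetical, and the resource amounts are negated so that all four objectives become minimizations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    provider: str
    name: str
    cpu: int      # vCPUs
    mem: float    # GB
    sto: float    # GB
    price: float  # US$/hour


def objectives(i: Instance) -> tuple:
    # Resources are maximized, so they are negated to turn every objective into a minimization.
    return (-i.cpu, -i.mem, -i.sto, i.price)


def dominates(a: Instance, b: Instance) -> bool:
    fa, fb = objectives(a), objectives(b)
    return all(x <= y for x, y in zip(fa, fb)) and any(x < y for x, y in zip(fa, fb))


def pareto_split(instances: list) -> tuple:
    """Return (non-dominated, dominated); indifferent instances both stay non-dominated."""
    keep = [i for i in instances if not any(dominates(j, i) for j in instances)]
    drop = [i for i in instances if i not in keep]
    return keep, drop


def two_stage_dominance(instances: list) -> tuple:
    """Intra-provider pass builds I+ and I-, then an inter-provider pass over I+ builds I*."""
    i_plus, i_minus = [], []
    for provider in sorted({i.provider for i in instances}):
        keep, drop = pareto_split([i for i in instances if i.provider == provider])
        i_plus += keep
        i_minus += drop
    i_star, dropped = pareto_split(i_plus)
    return i_star, i_minus + dropped


catalog = [  # hypothetical instances, not real provider data
    Instance("ProvA", "a1", 2, 8, 50, 0.10),
    Instance("ProvA", "a2", 2, 4, 50, 0.12),
    Instance("ProvA", "a3", 4, 16, 100, 0.20),
    Instance("ProvB", "b1", 2, 8, 50, 0.09),
    Instance("ProvB", "b2", 4, 16, 80, 0.25),
]
i_star, i_minus = two_stage_dominance(catalog)
print([i.name for i in i_star])   # ['a3', 'b1']
print([i.name for i in i_minus])  # ['a2', 'a1', 'b2']
```

In this toy catalog, a2 is dropped within ProvA (a1 has more memory at a lower price), while a1 and b2 survive their own provider's pass and are only eliminated in the inter-provider pass.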

Step 3: instance filtering
In this step, two constraints are established, applicable to the set of non-dominated instances I * .
The first constraint refers to the availability locations of the instances, which is necessary because of the multi-provider approach of the D-AHP. We assume that instances can only have their prices compared if they are hosted in datacenters whose locations L are common among the providers of set P + . Thus, the selected and unselected locations are stored in sets L + and L − , respectively.

When this constraint is applied, instances that are not available in all common datacenter locations among providers of set P + are omitted from selection for the next step; when this constraint is not applied, DMs can be prevented from expanding their searches for more attractive prices in other regions, invalidating one of the objectives of the D-AHP, which is the search for the lowest price.
However, the adaptability of the D-AHP in relation to the number of datacenter locations should be noted. The D-AHP allows DMs to increase or decrease the number of datacenter locations when they want to analyze the prices of instances. In cases where the DM already has a defined provider, it is possible to apply the D-AHP only to this provider or even just to availability zones within the same region.
The second constraint refers to the demand for computational resources consumed by the workload of the application to be migrated. Instances whose computational resources are lower than the demand of the application are not selected because it is assumed that there will not be enough resources to execute the application if one of these instances is selected during the process.
Application of both constraints creates set I ‡ , which is composed of the selectable instances to be arranged at the last level of the hierarchical structure of the final step of the D-AHP, whose number of instances tends to be smaller than the number of instances of set I * . This provides a smaller number of pairwise comparisons, and, as a result, there is a lower possibility of inconsistencies occurring during the DMs' judgment.
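Both Step 3 constraints can be expressed as simple set and threshold checks. The Python sketch below is a hypothetical illustration: the `Offer` record, the per-location price dictionary, the demand dictionary, and all values are assumptions made for the example. Following the text, an instance is kept only if it is priced in every location shared by all providers ($L^+$) and if each of its resources covers the application's demand.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    provider: str
    name: str
    cpu: int
    mem: float  # GB
    sto: float  # GB
    prices: dict = field(default_factory=dict)  # location -> US$/hour


def common_locations(provider_locations: dict) -> set:
    """L+: the datacenter locations shared by every selected provider."""
    location_sets = [set(locs) for locs in provider_locations.values()]
    return set.intersection(*location_sets) if location_sets else set()


def filter_instances(i_star: list, provider_locations: dict, demand: dict) -> tuple:
    """Apply both Step 3 constraints to I* and return (L+, selectable instances)."""
    l_plus = common_locations(provider_locations)
    selectable = []
    for inst in i_star:
        # Constraint 1: the instance must be priced in every common location.
        available = l_plus <= set(inst.prices)
        # Constraint 2: each resource must cover the application's demand.
        enough = (
            inst.cpu >= demand["cpu"]
            and inst.mem >= demand["mem"]
            and inst.sto >= demand["sto"]
        )
        if available and enough:
            selectable.append(inst)
    return l_plus, selectable


locations = {"ProvA": {"USA", "Brazil", "Japan"}, "ProvB": {"USA", "Brazil", "Germany"}}
i_star = [  # hypothetical non-dominated instances
    Offer("ProvA", "x1", 4, 16, 100, {"USA": 0.20, "Brazil": 0.35}),
    Offer("ProvB", "x2", 8, 32, 200, {"USA": 0.40}),
    Offer("ProvB", "x3", 2, 4, 50, {"USA": 0.05, "Brazil": 0.09}),
]
l_plus, selectable = filter_instances(i_star, locations, {"cpu": 4, "mem": 8, "sto": 50})
print(sorted(l_plus), [i.name for i in selectable])
```

Here x2 is dropped by the first constraint (it has no price in Brazil) and x3 by the second (too few vCPUs and too little memory), leaving only x1.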

Step 4: elements of the AHP method
The AHP method structures a problem as a hierarchy of criteria with at least three levels: at the top, the main objective $O$ of the problem; in the middle, the set of decision criteria $C = \{C_j \mid j = 1, 2, \ldots, m\}$ against which the alternatives are evaluated; and at the bottom, the set of competing alternatives $A = \{A_i \mid i = 1, 2, \ldots, n\}$.
The basis of the AHP is the pairwise comparison of the elements at each hierarchical level. To this end, the Saaty scale is used, whose values range from 1 to 9 and represent the importance level between two criteria: 1 means equal importance; 3, 5, 7, and 9 mean moderate, strong, very strong, and extreme importance of one criterion over another, respectively; and 1/3, 1/5, 1/7, and 1/9 represent the reciprocal importance levels [112].
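The results of the pairwise comparisons are arranged in a comparison matrix $M$, whose elements $a_{jk}$ represent the dominance level of criterion $j$ over criterion $k$. From the elements of $M$, the weight vector $w = (w_1, \ldots, w_m)$ of the criteria can be obtained through the geometric mean method [106]:

$$w_j = \frac{\left(\prod_{k=1}^{m} a_{jk}\right)^{1/m}}{\sum_{l=1}^{m} \left(\prod_{k=1}^{m} a_{lk}\right)^{1/m}}$$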
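The construction of the comparison matrix and the geometric mean weights can be sketched in Python as follows (`math.prod` requires Python 3.8+). The function names and the single judgment used in the example (Computational Resources moderately more important than Price, i.e., 3 on the Saaty scale) are hypothetical; in the D-AHP the judgments come from the DM.

```python
import math


def comparison_matrix(m: int, judgments: dict) -> list:
    """Reciprocal comparison matrix M from Saaty-scale judgments {(j, k): a_jk}, j < k."""
    M = [[1.0] * m for _ in range(m)]
    for (j, k), a in judgments.items():
        M[j][k] = float(a)
        M[k][j] = 1.0 / a  # reciprocal judgment
    return M


def geometric_mean_weights(M: list) -> list:
    """Weight vector w: geometric mean of each row of M, normalized to sum to 1."""
    m = len(M)
    g = [math.prod(row) ** (1.0 / m) for row in M]
    total = sum(g)
    return [x / total for x in g]


# Hypothetical judgment: Computational Resources (0) moderately more important than Price (1).
M = comparison_matrix(2, {(0, 1): 3})
w = geometric_mean_weights(M)
print([round(x, 2) for x in w])  # [0.75, 0.25]
```

Only the upper triangle is supplied because a Saaty comparison matrix is reciprocal ($a_{kj} = 1/a_{jk}$), so the lower triangle is derived.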
To obtain the final classification of the set of alternatives, an aggregation process is carried out: the global value of $A_i$ with respect to the main objective is computed with the weighted sum model.
Let $V(A_i)$ be the global value of $A_i$ with respect to $O$, $w_j$ the preference level (weight) of the jth criterion with respect to $O$, and $D$ the decision matrix whose elements $x_{ij}$ represent the preference level of $A_i$ with respect to criterion $C_j$. Then $V(A_i)$ is calculated by Eq. 9:

$$V(A_i) = \sum_{j=1}^{m} w_j\, x_{ij}, \quad \text{i.e.,} \quad V = D\, w^{T} \qquad (9)$$

where $w^{T}$ is the transpose of $w$. The alternatives are ranked according to their respective global values. The best alternative $A_{best}$ is the one with the highest global value, according to Eq. 10:

$$A_{best} = \arg\max_{A_i \in A} V(A_i) \qquad (10)$$
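The weighted sum aggregation and the choice of the best alternative can be sketched as below. The decision matrix and weights are hypothetical, and the hierarchy is kept flat for brevity; with sub-criteria, the global weight of each leaf would be the product of its criterion weight and its sub-criterion weight.

```python
def global_values(D: list, w: list) -> list:
    """Weighted sum model: V(A_i) = sum_j w_j * x_ij for each row i of D."""
    return [sum(wj * xij for wj, xij in zip(w, row)) for row in D]


def rank_alternatives(names: list, D: list, w: list) -> list:
    """Return (name, V) pairs sorted from best (highest V) to worst."""
    V = global_values(D, w)
    return sorted(zip(names, V), key=lambda pair: pair[1], reverse=True)


# Hypothetical local priorities of three instances under two criteria
# (columns: Computational Resources, Price), with criteria weights w.
names = ["inst1", "inst2", "inst3"]
D = [[0.5, 0.2], [0.3, 0.3], [0.2, 0.5]]
w = [0.75, 0.25]
ranking = rank_alternatives(names, D, w)
best = ranking[0][0]  # A_best in Eq. 10
print(ranking)
```

With these weights, inst1 obtains the highest global value (0.425) because the DM's weights favor computational resources, where inst1 scores best.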
Possible inconsistencies arising from contradictory judgments can be detected with the Consistency Index (CI) and the Consistency Ratio (CR), according to Eqs. 11 and 12:

$$CI = \frac{\lambda_{max} - m}{m - 1} \qquad (11)$$

$$CR = \frac{CI}{RI} \qquad (12)$$

where $\lambda_{max}$ is the largest eigenvalue of the matrix $M$, $m$ is the number of criteria, and $RI$ is the Random Consistency Index, whose values are shown in Table 7. If CI, CR < 0.1, the judgments are considered consistent; otherwise, they must be redone.
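A consistency check can be sketched as follows. The `RI` values are Saaty's commonly cited Random Consistency Index figures for matrices of order 1 to 10 and are assumed here to match Table 7; $\lambda_{max}$ is approximated by averaging $(Mw)_j / w_j$, a standard shortcut rather than a full eigenvalue computation.

```python
# Saaty's commonly cited Random Consistency Index values, by matrix order m
# (assumed here to match Table 7).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def consistency(M: list, w: list) -> tuple:
    """Return (lambda_max, CI, CR) for comparison matrix M and its weight vector w."""
    m = len(M)
    if m < 3:
        # 1x1 and 2x2 reciprocal matrices are always perfectly consistent.
        return float(m), 0.0, 0.0
    Mw = [sum(M[j][k] * w[k] for k in range(m)) for j in range(m)]
    # Approximate the principal eigenvalue by averaging (M w)_j / w_j.
    lambda_max = sum(Mw[j] / w[j] for j in range(m)) / m
    ci = (lambda_max - m) / (m - 1)  # Eq. 11
    cr = ci / RI[m]                  # Eq. 12
    return lambda_max, ci, cr


consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
cyclic = [[1, 3, 1 / 3], [1 / 3, 1, 3], [3, 1 / 3, 1]]  # A > B, B > C, but C > A
print(consistency(consistent, [4 / 7, 2 / 7, 1 / 7])[2] < 0.1)  # True
print(consistency(cyclic, [1 / 3, 1 / 3, 1 / 3])[2] < 0.1)       # False
```

The cyclic matrix encodes contradictory judgments, so its CR (about 1.15) is far above 0.1 and, following the rule above, those judgments would have to be redone.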
The hierarchy proposed in the D-AHP methodology is represented in Fig. 3. It is composed of a main objective, two criteria, sub-criteria, and a set of n instances as decision alternatives whose number may vary depending on the result of the filtering in Steps 2 and 3.
Below, we describe each of the elements of the D-AHP hierarchy shown in Fig. 3:
- Objective: Select VM Instance. The aim is to select an instance with enough computational resources to execute the application to be migrated, at the lowest price, optimizing the cost-benefit ratio;
- Criterion 1: Computational Resources. This refers to the amounts of computational resources of VM instances, specifically CPU, memory, and storage;
- Criterion 2: Price. This refers to the price of instances in the countries where the providers have datacenters in common;
- Sub-criterion 1: CPU. This compares instances with respect to the amount of CPU;
- Sub-criterion 2: Memory. This compares instances with respect to the amount of memory;

At the end of this step, the selectable instances are ranked from best to worst according to their performance with respect to each criterion and sub-criterion of the hierarchy, considering the weight defined by the DM for each of them in the decision-making process.
In relation to the sub-criteria of the Price criterion, in countries where a given provider has more than one available datacenter, we chose the region within the same country with the minimum value of C T , which is not necessarily the same for all instances.

Application of the methodology
To apply the D-AHP, we defined new application profiles based on actual Cloud usage data from the Google Cluster Trace dataset [113]. Among other information, this dataset contains data on the computational resources consumed by applications, organized as jobs and tasks. To match the resource metrics of instances, we expressed the memory and storage resources in the dataset in GB and the CPU resources in cores.
The dataset contains actual data on the consumption of computational resources by applications running in Google's Cloud datacenters. However, because these data are considered sensitive, for reasons ranging from economic aspects to data security, they are obfuscated by rescaling before being made public [114].
According to Reiss et al. [115], because the factor by which the data were rescaled is unknown, researchers have proposed different ways of treating the data. Given this uncertainty, the data can be manipulated in different ways depending on the intended use, provided that a consistent standard is adopted.

Workload characterization
Zhang et al. [116], seeking workload models that accurately reproduce the performance characteristics of real workloads, found that capturing only the average resource usage of each task is sufficient to generate synthetic workloads that are highly accurate with respect to resource usage and task waiting time. The authors therefore assume that the total waiting time and resource usage can be realistically estimated for real or imaginary workloads. They reached these conclusions for two reasons: the low variability in task resource usage within the workload, and the behavior of the evaluation metrics (e.g., resource usage) under different workload conditions.
To generate a realistic workload compatible with the amounts of computational resources offered by the instances, we propose a rescaling value to be applied to the total resources consumed by applications in the dataset, so that these values become comparable to the resources offered by VM instances. To do so, we used the instance with the second-largest amount of computational resources in set $I^*$ as the maximum rescaling value, so that at least two instances remain in each decision-making process. Accordingly, $M_{CPU}$, $M_{MEM}$, and $M_{STO}$ denote the values used to rescale the CPU, memory, and storage amounts, respectively.
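The rescaling step can be sketched as follows, assuming that the rescaled workload is obtained by multiplying each normalized component by the corresponding factor (the case studies describe multiplying the summed job consumption). Since the text does not specify how instances in $I^*$ are ordered by amount of computational resources, this sketch assumes a lexicographic order on (CPU, memory, storage); the helper names rescaling_factors and rescale and the instance values are hypothetical.

```python
# Hypothetical set I*: name -> (CPU cores, memory GB, storage GB).
i_star = {
    "inst-a": (128, 1952.0, 3840.0),
    "inst-b": (64, 976.0, 1920.0),
    "inst-c": (16, 64.0, 200.0),
}


def rescaling_factors(instances):
    """Return (M_CPU, M_MEM, M_STO) taken from the second-largest instance.

    Ranking by lexicographic (CPU, memory, storage) order is an assumption.
    """
    if len(instances) < 2:
        raise ValueError("I* must contain at least two instances")
    ranked = sorted(instances.values(), reverse=True)
    return ranked[1]


def rescale(r_a, factors):
    """Multiply normalized (0-1) usage values by the rescaling factors."""
    return tuple(value * factor for value, factor in zip(r_a, factors))


factors = rescaling_factors(i_star)
print(factors, rescale((0.25, 0.5, 0.25), factors))
# (64, 976.0, 1920.0) (16.0, 488.0, 480.0)
```

Using the second-largest instance as the ceiling means any rescaled workload fits, at most, into that instance, so both it and the largest instance remain candidates.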
This gives us a set of jobs $T = \{T_1, T_2, \ldots\}$, each composed of a set of individual tasks $TK = \{tk_1, tk_2, \ldots\}$. $R_T = (DC_T, DM_T, DS_T)$ and $R_{tk} = (DC_{tk}, DM_{tk}, DS_{tk})$ are ordered triples whose components are the CPU, memory, and storage resources consumed by a job and by a task, respectively.
The values of the components of $R_{tk}$ can be obtained from the dataset. We assume that each component of $R_T$ is the sum of the corresponding resources consumed by the tasks that make up that job.
According to Reiss et al. [113], applications that need to perform different types of tasks with different resource requirements usually run as multiple jobs. Therefore, let $R_A = (DC_A, DM_A, DS_A)$ denote the amount of computational resources consumed by the workload of application Ā. The components of $R_A$ can be obtained from the dataset by summing the resource consumption of the jobs, which, in turn, is obtained by summing the consumption of the tasks that compose them. Thus, considering the task-job-application relation, we estimate the computational resource consumption of an application workload from the dataset using Eqs. 13-15.
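A minimal Python sketch of the task-job-application aggregation described above: each job's $R_T$ is the component-wise sum over its tasks, and $R_A$ is the component-wise sum over the jobs. It follows the prose rather than reproducing Eqs. 13-15, which may also fold in the rescaling step; the job IDs and task values are hypothetical, and reading the Google Cluster Trace files is omitted.

```python
from typing import Dict, List, Tuple

# (CPU, memory, storage) in the dataset's normalized units, before rescaling.
Resources = Tuple[float, float, float]


def job_resources(tasks: List[Resources]) -> Resources:
    """R_T: component-wise sum of the resources consumed by the job's tasks."""
    return (sum(t[0] for t in tasks), sum(t[1] for t in tasks), sum(t[2] for t in tasks))


def application_resources(jobs: Dict[str, List[Resources]]) -> Resources:
    """R_A: component-wise sum of R_T over all jobs of the application."""
    per_job = [job_resources(tasks) for tasks in jobs.values()]
    return (sum(r[0] for r in per_job), sum(r[1] for r in per_job), sum(r[2] for r in per_job))


# Hypothetical job IDs and task usage values.
jobs = {
    "job-1": [(0.125, 0.25, 0.0625), (0.125, 0.125, 0.0)],
    "job-2": [(0.25, 0.125, 0.0625)],
}
print(application_resources(jobs))  # (0.5, 0.5, 0.125)
```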

Defining application profiles
Based on the dataset values, we propose a set of usage levels defined over the interval before rescaling (between 0 and 1). For this, we assume that the usage level of each computational resource (CPU, memory, and storage) by an application Ā can be Low Usage, Medium Usage, or High Usage, as shown in Table 8.
Using the values of $R_A$, we can classify the usage levels of the workload of Ā. By combining the usage levels of the three computational resources, we can generate a workload profile for Ā, which can be categorized as shown in Table 9.
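The classification into usage levels and profiles can be sketched as a threshold lookup per resource followed by a table lookup. Because the boundaries of Table 8 and the full mapping of Table 9 are not reproduced here, the thresholds below are hypothetical placeholders, and the profile table contains only the two combinations reported in the case studies (Medium/High/Low classified as Medium, and Low/Low/Low classified as Very Low).

```python
def usage_level(value, low_max, medium_max):
    """Map a normalized usage value (0-1) to Low, Medium, or High."""
    if value <= low_max:
        return "Low"
    if value <= medium_max:
        return "Medium"
    return "High"


def workload_profile(r_a, thresholds, profile_table):
    """Classify (DC_A, DM_A, DS_A) per resource, then look up the profile."""
    levels = tuple(usage_level(v, lo, hi) for v, (lo, hi) in zip(r_a, thresholds))
    return levels, profile_table.get(levels, "unclassified")


# Hypothetical boundaries standing in for Table 8 (same for CPU, memory, storage).
thresholds = [(0.3, 0.6)] * 3
# Only the two combinations reported in the case studies; Table 9 has more.
profiles = {
    ("Medium", "High", "Low"): "Medium",
    ("Low", "Low", "Low"): "Very Low",
}
print(workload_profile((0.5, 0.8, 0.1), thresholds, profiles))
# (('Medium', 'High', 'Low'), 'Medium')
```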

Case studies
Next, we present three case studies that illustrate the proposed approach to estimating the computational resources consumed by applications from the dataset. Using these input data, we apply the D-AHP methodology to demonstrate its effectiveness in selecting VM instances for the migration of applications to the Cloud. In Case Study 1, we verified the effectiveness of the D-AHP over a reduced set of selectable instances, resulting from an application profile equal to or higher than Medium. In Case Study 2, we applied the D-AHP to a larger set of selectable instances, obtained from an application profile lower than Medium. In Case Study 3, we verified the influence of the datacenter location on the results obtained in Case Studies 1 and 2.

Case Study 1
In the first case study, we intend to migrate an application Ā comprising five jobs $T = \{T_1, T_2, T_3, T_4, T_5\}$, each composed of a different number of tasks.
In Table 10, jobs are identified by their JobID, and each is composed of a certain number of tasks, according to the dataset. The resource consumption of each job was obtained by summing the consumption of its tasks. Table 10 shows that, according to the values of $DC_A$, $DM_A$, and $DS_A$, the application has Medium Usage for CPU, High Usage for memory, and Low Usage for storage (see Table 8). The workload profile of Ā is therefore classified as Medium (see Table 9).
Using these input data, we can then apply the D-AHP. Application of Step 1: Step 1 consists of selecting a subset of IaaS providers $P^+$ from set P by combining the set of services offered with Gartner's Magic Quadrant, as described in Subsection 4.1. [111], and Google (G) [117]. In a complementary way, analyzing the set of established services shows that these providers are the only ones that offer the complete set of services. Thus, the set of selected providers is $P^+ = \{A, Z, G\}$.
We can thus define the sets of VM instances of the elements of $P^+$: $I_A = \{i_{a,1}, i_{a,2}, \ldots, i_{a,q}\}$, $I_Z = \{i_{z,1}, i_{z,2}, \ldots, i_{z,r}\}$, and $I_G = \{i_{g,1}, i_{g,2}, \ldots, i_{g,s}\}$ are the sets of all VM instances offered by providers A, Z, and G, respectively.
Regarding datacenter locations, provider A has datacenters spread across the set of locations $L_A = \{l_{a,1}, l_{a,2}, \ldots\}$, while providers Z and G have the sets of locations $L_Z = \{l_{z,1}, l_{z,2}, \ldots\}$ and $L_G = \{l_{g,1}, l_{g,2}, \ldots\}$, respectively.
Application of Step 2: In Step 2, Pareto Dominance is applied to the sets of instances $I_A$, $I_Z$, and $I_G$ of the selected providers. This intra-provider analysis compares the instances $i_{a,q} \in I_A$, $i_{z,r} \in I_Z$, and $i_{g,s} \in I_G$ pairwise, within their respective sets, with respect to the objectives defined in Eqs. 5-8.
The set of non-dominated instances $I^+$ resulting from this first application of dominance is composed of the instances $i^+_{a,q}$, $i^+_{z,r}$, and $i^+_{g,s}$, which belong to providers A, Z, and G, respectively. The number of elements of $I^+$ varies with the number of providers selected in Step 1.
After the first dominance relationship has been applied to each provider's instances, a second dominance relationship is applied to the instances of set $I^+$. As a result, we obtain the non-dominated instances selected for the next step of the D-AHP, which make up the final set of non-dominated instances $I^*$.
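The two-phase dominance filter of Step 2 can be sketched in Python as below. The sketch assumes that the objectives of Eqs. 5-8 maximize CPU, memory, and storage and minimize price, which is consistent with the description of dominated instances as having fewer resources and a higher cost; the provider keys, instance names, and values are hypothetical.

```python
from typing import Dict, Tuple

# (CPU cores, memory GB, storage GB, price); resources maximized, price minimized.
Instance = Tuple[float, float, float, float]


def dominates(a: Instance, b: Instance) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = a[0] >= b[0] and a[1] >= b[1] and a[2] >= b[2] and a[3] <= b[3]
    better = a[0] > b[0] or a[1] > b[1] or a[2] > b[2] or a[3] < b[3]
    return no_worse and better


def non_dominated(instances: Dict[str, Instance]) -> Dict[str, Instance]:
    """Keep the instances not dominated by any other instance in the set."""
    return {
        name: v
        for name, v in instances.items()
        if not any(dominates(o, v) for other, o in instances.items() if other != name)
    }


def two_phase_filter(by_provider: Dict[str, Dict[str, Instance]]) -> Dict[str, Instance]:
    # Phase 1 (intra-provider): non-dominated instances of each provider form I+.
    i_plus = {}
    for provider, instances in by_provider.items():
        for name, v in non_dominated(instances).items():
            i_plus[f"{provider}:{name}"] = v
    # Phase 2 (inter-provider): dominance over I+ yields I*.
    return non_dominated(i_plus)


# Hypothetical catalog.
catalog = {
    "A": {"a1": (4, 16, 100, 0.20), "a2": (4, 16, 50, 0.25), "a3": (8, 32, 100, 0.40)},
    "B": {"b1": (4, 16, 100, 0.30), "b2": (16, 64, 200, 0.90)},
}
print(sorted(two_phase_filter(catalog)))  # ['A:a1', 'A:a3', 'B:b2']
```

Here a2 is removed in the intra-provider phase (less storage, higher price than a1), while b1 survives its own provider's filter and is only removed in the second phase, where it is dominated by A's cheaper a1.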
Application of Step 3: In Step 3, the set of datacenter locations is restricted by the providers selected in Step 1, so that only instances hosted in locations common to all of them are selected.
The elements of sets $L_A$, $L_Z$, and $L_G$ are not common to all providers of set $P^+$; i.e., one provider may have a datacenter in a location where the others do not. However, with a country-by-country approach, common countries were identified through the intersection of sets $L_A$, $L_Z$, and $L_G$. Thus, when only considering the elements resulting from this operation, the set of … To estimate the workload of Ā, we multiply the sum of the resource consumption of the jobs obtained in Table 10 … Table 11 were selected, along with their corresponding resources.
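The country-level intersection in Step 3 can be sketched as a set intersection over the countries in which each provider has at least one region. The region-to-country maps below are hypothetical and much smaller than the eight common countries found in this study.

```python
# Hypothetical region -> country maps for three providers.
locations = {
    "A": {"a-east": "USA", "a-sp": "Brazil", "a-tokyo": "Japan", "a-frankfurt": "Germany"},
    "Z": {"z-east": "USA", "z-south": "Brazil", "z-japan": "Japan"},
    "G": {"g-east": "USA", "g-sa": "Brazil", "g-asia": "Japan", "g-india": "India"},
}


def common_countries(locations_by_provider):
    """Countries in which every provider has at least one region."""
    country_sets = [set(regions.values()) for regions in locations_by_provider.values()]
    return set.intersection(*country_sets) if country_sets else set()


print(sorted(common_countries(locations)))  # ['Brazil', 'Japan', 'USA']
```

Mapping regions to countries before intersecting is what allows providers whose individual region sets are disjoint to still share common countries.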

Application of Step 4: In Step 4 of the D-AHP, we seek to identify the instance with the best performance with respect to the set of criteria and sub-criteria defined in the hierarchy shown in Fig. 3. To do so, weights must be assigned to the decision criteria and sub-criteria.
Group decision-making was used to define the weights of the criteria. Multiple DMs contribute a variety of experience, knowledge, and perspectives, and a group can deal with the complexity of the problem better than a single DM. A questionnaire was presented to a group of professionals in the areas of Computing and Software Engineering, whose judgments were combined using Aggregating Individual Priorities (AIP). In total, 11 DMs achieved consistent judgments according to their Consistency Index (CI) and Consistency Ratio (CR) values and, therefore, had their preferences considered. The group of DMs is thus $\{DM_1, DM_2, \ldots, DM_{11}\}$.
Tables 19 and 20 in Appendix A present the judgments of the 11 DMs in relation to the criteria and sub-criteria of the Computational Resources criterion. These judgments were made according to the Saaty scale.
Note that the DMs' judgments were not unanimous, which may lead to different instances being selected at the end of the process. For datacenter locations, equal weights were defined without the influence of the DMs, to avoid prioritizing instances with large price discrepancies between locations or those priced more highly than others. Thus, each of the eight locations has a priority of 0.125 (12.5%).
As for the decision alternatives, the selectable instances of set $I^{\ddagger}$ are compared with one another using the actual amounts of resources they offer, treated as direct attributes for the sub-criteria of the Computational Resources criterion (a benefit criterion: the bigger, the better). For the sub-criteria of the Price criterion, instances are again compared using the actual prices in each country represented by a sub-criterion, but treated as indirect attributes (a cost criterion: the smaller, the better).
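If each pairwise comparison between instances is taken as the ratio of their actual values ($a_{ij} = v_i / v_j$ for a benefit attribute and $a_{ij} = v_j / v_i$ for a cost attribute), the comparison matrix is perfectly consistent, and its priority vector reduces to normalized values or normalized reciprocals. The sketch below assumes this construction, which may differ in detail from the authors'; the instance names and values are hypothetical, and all prices must be positive.

```python
def benefit_priorities(values):
    """Direct attribute (bigger is better): priority of i is v_i / sum(v)."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


def cost_priorities(values):
    """Indirect attribute (smaller is better): priority of i is (1/v_i) / sum(1/v)."""
    inverse = {k: 1.0 / v for k, v in values.items()}
    total = sum(inverse.values())
    return {k: v / total for k, v in inverse.items()}


# Hypothetical memory amounts (GB) and prices for three instances in one country.
memory = {"inst-1": 64, "inst-2": 128, "inst-3": 256}
price = {"inst-1": 1.0, "inst-2": 2.0, "inst-3": 4.0}
print({k: round(v, 4) for k, v in benefit_priorities(memory).items()})
# {'inst-1': 0.1429, 'inst-2': 0.2857, 'inst-3': 0.5714}
print({k: round(v, 4) for k, v in cost_priorities(price).items()})
# {'inst-1': 0.5714, 'inst-2': 0.2857, 'inst-3': 0.1429}
```

Under this construction, an instance with twice the resources gets twice the benefit priority, and an instance with twice the price gets half the cost priority.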
From the data shown in Table 11, the selectable instances are evaluated considering the DMs' preferences in relation to the decision criteria and sub-criteria, as shown in Tables 19 and 20 in Appendix A, in addition to the datacenter location weights (without DMs' preferences).
Before the criteria weights are applied, the valuation of each selectable instance depends directly on the amount of each resource it offers. For example, instance x1.32xlarge has the most CPU, memory, and storage and is therefore ranked the best. Instance i3.16xlarge, on the other hand, has little storage, which makes its value very low compared with the others. Because price is an indirect (cost) criterion, the highest-priced instances receive the lowest valuations.
Table 21 in Appendix A shows the instance values with respect to the decision criteria, which already account for the values obtained for their respective sub-criteria. At this stage, the values differ between DMs, reflecting their individual judgments for the Computational Resources criterion. For the Price criterion, because equal weights were defined for all locations, the values are the same for all DMs.
The final classification of the instances, considering the individual judgments of the DMs and after aggregation of their judgments by the AIP, is presented in Table 12. In this case, it is evident that differences in prioritization of criteria and sub-criteria by different DMs result in different classifications of the instances.
For DMs who prioritized the Price criterion over Computational Resources (as in the case of $DM_3$), the highest-priced instance was ranked last, with a value well below the others, given the maximum importance level (9 on the Saaty scale) that this DM attributed to price. Consequently, the lowest-priced instance obtained the highest score. For the other DMs, who did not prioritize either criterion, the instance scores remained close; the remaining differences may be explained by the different weights assigned to the sub-criteria of the Computational Resources criterion. Overall, Google's instance n1-highmem-96 obtained the highest global value among the set of selectable instances $I^{\ddagger}$. The data in Table 12 show that this instance was ranked best by only one DM ($DM_9$) and second by all the others. Furthermore, $DM_3$'s judgments significantly influenced the group decision because of the larger differences between the evaluations of the instances for this DM in particular.
Although some DMs ($DM_1$, $DM_2$, $DM_6$, $DM_7$, $DM_{10}$, and $DM_{11}$) made identical judgments at all levels of the D-AHP hierarchy, the aggregation process using the geometric mean method (recommended by Saaty [106]) treats each DM as a separate member of the group, for a total of 11 individual judgments.
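A minimal sketch of AIP with an unweighted geometric mean: each alternative's aggregated priority is the geometric mean of its priorities across DMs, renormalized so that the aggregated priorities sum to 1. The function aip_geometric and the three-DM example are hypothetical; two DMs with identical priority vectors are included to show that each still counts as a separate member of the group.

```python
import math


def aip_geometric(priorities_by_dm):
    """Aggregate per-DM priority vectors with an unweighted geometric mean.

    priorities_by_dm: list of {alternative: priority} dicts, one per DM.
    Returns aggregated priorities renormalized to sum to 1.
    """
    n = len(priorities_by_dm)
    alternatives = priorities_by_dm[0].keys()
    geo = {
        a: math.exp(sum(math.log(p[a]) for p in priorities_by_dm) / n)
        for a in alternatives
    }
    total = sum(geo.values())
    return {a: g / total for a, g in geo.items()}


# Hypothetical priorities; the first two DMs are identical but both count.
dms = [
    {"x": 0.5, "y": 0.3, "z": 0.2},
    {"x": 0.5, "y": 0.3, "z": 0.2},
    {"x": 0.2, "y": 0.3, "z": 0.5},
]
agg = aip_geometric(dms)
print(sorted(agg, key=agg.get, reverse=True))  # ['x', 'y', 'z']
```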

Case Study 2
In this second case study, we intend to migrate a new application Ā composed of three jobs $T = \{T_1, T_2, T_3\}$, whose resource consumption values are shown in Table 13.
According to Table 13, the application has a resource usage level rated Low Usage for CPU, memory, and storage (see Table 8). In this way, the workload profile of Ā is classified as Very Low (see Table 9).
By applying the rescaling values to $DC_A$, $DM_A$, and $DS_A$, we obtain the rescaled values $R'_A$. Since Steps 1 and 2 are applied exactly as in Case Study 1 (described in Subsection 5.3.1), that is, the set of providers $P^+ = \{A, Z, G\}$ and the set of non-dominated instances $I^*$ are the same, we can directly define the set of selectable instances $I^{\ddagger}$ from the $R'_A$ values, as stated in Table 14. As in Case Study 1, using the data shown in Table 14, the instances are evaluated considering the DMs' preferences (see Tables 19 and 20 in Appendix A). Again, equal weights were defined for all countries for all DMs.
Table 22 in Appendix A shows the instance values with respect to the decision criteria, which already account for the values obtained for their respective sub-criteria. Regarding the sub-criteria of the Computational Resources criterion, instance x1.32xlarge is once again classified as the best, since the amount of resources it offers is far greater than that of most other instances. However, due to its higher price, this instance has the worst ranking with respect to the Price criterion. As the data in Table 15 show, most instances have similar classifications across DMs due to similar judgments, except for $DM_3$, whose judgments prioritize the lowest price.
For DMs who assigned equal importance to all criteria and sub-criteria ($DM_1$, $DM_2$, $DM_6$, $DM_7$, $DM_{10}$, and $DM_{11}$), the final classification is determined by the ratios between the values. This can be verified using the results for instances x1.16xlarge and x1.32xlarge. Because x1.32xlarge has exactly twice as many computational resources as x1.16xlarge, x1.16xlarge would be the better option only if its price were less than half that of x1.32xlarge, which was not the case in two of the eight countries analyzed. Accordingly, just as x1.32xlarge has practically double the valuation of x1.16xlarge for the Computational Resources criterion, it scores about half as much for the Price criterion.
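The half-price argument can be checked with a small example. The sketch below assumes two criteria with equal weights, benefit priorities proportional to resources, and cost priorities proportional to reciprocal prices, in a single country with only two hypothetical instances ("small" and "large", standing in for x1.16xlarge and x1.32xlarge). With more instances or countries, the normalization constants change, so the threshold is exact only in this reduced setting.

```python
def equal_weight_scores(resources, prices):
    """0.5 x benefit priority (resources) + 0.5 x cost priority (reciprocal price)."""
    r_total = sum(resources.values())
    inverse = {k: 1.0 / p for k, p in prices.items()}
    inv_total = sum(inverse.values())
    return {k: 0.5 * resources[k] / r_total + 0.5 * inverse[k] / inv_total for k in resources}


# "large" has exactly twice the resources of "small"; prices are hypothetical.
resources = {"small": 1.0, "large": 2.0}
for large_price in (1.8, 2.2):
    scores = equal_weight_scores(resources, {"small": 1.0, "large": large_price})
    print(large_price, max(scores, key=scores.get))
# 1.8 large   (small costs more than half of large)
# 2.2 small   (small costs less than half of large)
```

At a price ratio of exactly 2, both instances receive the same score, which is the break-even point described above.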
Regarding the final classification after AIP, Amazon's instance x1.32xlarge obtained the highest global value among the set of selectable instances $I^{\ddagger}$. The values in Table 15 show that it was ranked best by 10 of the 11 DMs, the exception being $DM_3$, who ascribed a higher level of importance to the Price criterion. Because of this, instance D16 v3 was ranked second best.

Case Study 3
In this case study, we analyze the influence of the datacenter location on the selection of VM instances. For this, we use the same data as in the tables of Case Studies 1 and 2, except for the weights of the datacenter locations, which are modified to identify possible changes in the classification of instances when specific countries where providers have datacenters are prioritized.
To do this, different importance levels were defined for three of the eight countries analyzed: the USA, Brazil, and Japan. The USA was chosen because it has the lowest prices among the countries analyzed; Brazil, because it is the country where the DM group is located and, consequently, offers better latency [8,21]; and Japan, because its time-zone difference from Brazil allows applications to be executed outside peak hours, which usually occur during the day.
The importance levels of these three countries relative to the other countries and to one another, following the Saaty scale, are given in Table 16. From the data shown in Table 23 in Appendix A, compared with Case Study 1 (see Table 12), the valuations of instances n1-highmem-96 and i3.16xlarge changed the most, while those of the other instances changed less notably. However, no instance classifications changed for any individual DM. Table 17 shows the final classification of the instances after aggregation of the DMs' judgments. For comparison, two new columns were added to identify the changes in values and, consequently, in the ranking of the instances relative to the results in Table 12.
Table 17 shows that instance n1-highmem-96 is still classified as the best, with a larger margin over the others. However, prioritizing some datacenter locations over others led to some changes in the final classification of instances, such as the swap in ranking between instances i3.16xlarge and x1.32xlarge.
The data in Table 24 in Appendix A also show differences with respect to the results in Table 15, mainly for instances that have an intermediate amount of computational resources. For these instances, the judgments of the DMs who did not assign equal weights to all elements of the hierarchy in every pairwise comparison (i.e., $DM_3$, $DM_4$, $DM_5$, $DM_8$, and $DM_9$) had a greater influence. Table 18 presents the final classification of the instances after aggregation of the DMs' judgments, together with additional columns comparing the results with those in Table 15 (see Case Study 2). Based on both tables, we conclude that, for most instances, the classification is maintained even though the final valuations differ between the two simulations. However, once again, prioritizing some datacenter locations over others changed the final classification of some instances, either as a simple swap between two instances (as for D15 v2 and D32 v3) or as a more pronounced change (such as E32 v3, which was ranked three positions below its classification in Case Study 2).

Conclusions
The dynamic pace at which Cloud Computing has evolved in recent years, providing reliable and low-cost computational resources, is driving adoption of the IaaS model. However, many uncertainties still surround this distributed computing paradigm, making migration a very complex task. In a market with multiple providers, each offering a variety of VM instances, choosing the best provider/instance combination is difficult. To help with this problem, we propose the D-AHP, a methodology for selecting VM instances in the Cloud based on Pareto Dominance and the AHP multicriteria decision method. The D-AHP uses the amount of computational resources and the price of instances in different datacenter locations as decision criteria and sub-criteria. To validate the D-AHP, we defined a set of new application workload profiles based on the Google Cluster Trace dataset for the case studies presented here, treating these applications as candidates for migration to the Cloud. Using the D-AHP, we observed that applying Pareto Dominance between instances, together with the filtering steps, significantly reduces the dimensionality of the problem, because these steps eliminate instances that have fewer computational resources and a higher cost, as well as instances hosted outside the datacenter locations common to the selected providers, which considerably reduces the number of pairwise comparisons. The D-AHP proved to be efficient for two reasons: it significantly reduced the number of alternatives to be compared in its last phase, which matters because the AHP is less efficient when the hierarchy contains many alternatives; and it allows one objective to be prioritized over another, which is essential in this type of problem to better meet user needs. This is reflected in the different classifications of the same set of selectable instances that result from the individual preferences of the DMs responsible for the decision process.
In future research, we will address the manual collection of instance details by integrating the D-AHP with existing databases that provide the prices and computational resource amounts of VM instances from a wide range of providers, thereby making the D-AHP an automated tool. In addition, applying the D-AHP in this study showed that, for applications with lower computational demands, the number of selectable instances increases, which can make the last step of the D-AHP difficult to apply. To deal with such situations, we also intend to seek new ways of reducing the dimensionality of the problem, e.g., by adding new criteria to the D-AHP hierarchy so that the migration of applications with specific demands, such as web applications and integration solutions, can be considered.

Appendix A: additional tables referring to the case studies
See Tables 19, 20
