Predictive models for managing financial incentives oriented to companies: Application to Portugal 2020

Since Portugal joined the European Union (EU), it has been receiving incentives/funds to reduce disparities with other EU countries. Despite this goal, disparities between European regions still exist and the impact of such funds is questionable. What if it were possible to predict the success of such incentives when the funds are awarded to the beneficiaries? Using data from the database of the Agency for Competitiveness and Innovation (IAPMEI) for the National Strategic Reference Framework (QREN) and Portugal 2020 programs, two predictive models are developed to estimate the number of applications to be received and the schedule of expected payments to beneficiaries for a four-month period. The results allow a better prediction, on the one hand, of the resources to be allocated to the evaluation of the applications and, on the other hand, of the financial execution plan for the upcoming period.


INTRODUCTION
The need for financing has always been a reality. To meet this demand, financial incentive tenders were created so that companies can obtain financing without spending too much time and resources on a vast search for investment possibilities. From the investor's perspective, a bidding process for financial incentives supports a better choice of the company in which the investment will be made. The investor can evaluate the options more easily because, in a tender, they are assessed in comparison with each other, which reduces the investment risk.
One of the investment programs created for this purpose was the "Portugal 2020", which resulted from a partnership agreement between Portugal and the European Commission, covering the five European Structural and Investment Funds: European Regional Development Fund (ERDF), European Social Fund (ESF), Cohesion Fund (CF), European Agricultural Fund for Rural Development (EAFRD) and European Maritime Affairs and Fisheries Fund (EMFF). The program establishes economic, social, environmental, and territorial development policies, to promote growth and employment in Portugal, between 2014 and 2020.
The need to manage the portfolio of projects received and to forecast their success supports the development of predictive models to ensure that these needs are efficiently fulfilled. If Portugal does not execute the full amount allocated to it, it will have to return the unused funds to the EU, which also needs to be considered.
Forecasts make it possible to better plan the evaluation of applications and to better estimate the financial execution and the expected load of payment requests from the projects in the portfolio, making the whole process more efficient, with fewer resources and in less time.
The model for predicting demand in contests of Portugal 2020 will estimate the number of applications that will enter a contest for the chosen period, up to a maximum of one year, based on the historical patterns candidates evidenced in previous contests. Additionally, the model for estimating the amounts of payment requests (execution) will be based on the status of each project in execution, in view of the project portfolio held by the tender management company, and an execution plan to be verified in different time periods of no more than one year. Therefore, it will be possible to predict and prepare the expectations for the realization of the contest and the support to the companies in question.
The development of these models contributes to the literature on management of financial incentives and will serve as a groundwork to the development of similar models applicable to other calls and using different databases.
The remainder of the paper is organized as follows. The second section reviews the theoretical framework, where the topics considered relevant to the object of this study are identified and the main concepts addressed. In the third section the fundamentals for the execution of the work are discussed, which include data collection and treatment processes as well as the methods used. The fourth section presents the analyses and reports the results obtained and the difficulties experienced in the specification of the forecasting models. Finally, in the last section, the conclusions of this study and its limitations are presented, and possible steps for future research are suggested.

A. Evolution of the Portuguese economy
The recession experienced in recent years in Portugal does not represent the future of economic activity in Portugal [1], as shown in Fig. 1. However, the new crisis caused by COVID-19 is having an extremely negative impact on Gross Domestic Product (GDP) growth, namely in 2020 and probably in 2021, despite favorable prospects as the vaccination process covers the worldwide population by the end of 2021.

Figure 1 - Net contributions to real GDP in Portugal (series: Exports, Investment, Public consumption, Private consumption, GNP (%)). Source: Banco de Portugal [1].
As shown in Fig. 2, the recession felt in 2014 had a negative impact on the increase in the number of insolvencies, despite the creation of new companies, which maintained a positive growth [2].

Figure 2 - Series shown: Constitutions, Dissolutions, Insolvencies.
The new reality is extremely negative, since many companies have closed or face bankruptcy, especially those related to tourism. However, the Portuguese Government is supporting companies to minimize insolvencies (e.g. layoff program and contribution to rentals), while the EU approved a huge financial incentive to be applied not only in helping companies, but also in public investments.

B. Financial Needs of Companies in Portugal
Financing was already essential for companies to expand any part of their activity; because of COVID-19, it is now even more so. However, organizations usually do not have the capacity to do this with only the internal resources they generate. Thus, external financing sources are needed. The financing needs also vary across the growth phases of each company.
Despite the benefits of financing solutions such as self-financing, which reduces the dependence on external capital, companies mainly end up choosing bank credit as a source of financing [3]. However, there are risks associated with this type of financing, since some companies are not able to provide the guarantees required by the banks and end up not advancing with their investment projects [11] [4].

C. EU support programs in Portugal
One of the programs created by the Portuguese public entity for the development and support of micro, small and medium-sized companies (SMEs) was the Portugal 2020 program, which results from a partnership agreement between Portugal and the European Commission. This program brings together the activities of the five aforementioned European Structural and Investment Funds: i) CF - focused on reducing economic and social disparities and promoting sustainable development [5]; ii) ERDF - aimed at strengthening economic and social cohesion in the EU by bridging the imbalances between regions [6]; iii) EAFRD - whose goals are to foster the competitiveness of the agricultural sector, ensure the sustainable management of natural resources and climate action, and achieve balanced territorial development of rural economies and communities, including the creation and maintenance of employment [7]; iv) ESF - whose objectives are to promote employment and support labor mobility, promote social inclusion and combat poverty, invest in education, skills and lifelong learning, and improve the institutional capacity and efficiency of public administration [8]; and v) EMFF - whose objectives are to help fishermen in the transition to sustainable fishing, help coastal communities to diversify their economies, finance projects to create jobs and improve the quality of life along European coasts, and facilitate access to finance [9].
The Agency for Competitiveness and Innovation (IAPMEI) is responsible for supporting SMEs in the commercial, industrial, services and construction sectors, promoting competitiveness, business growth and supporting the internationalization of companies in these sectors in Portugal.
IAPMEI is present in a regional network of business support centers nationwide, in 12 cities, where, together with its website, it identifies products and services related to its responsibilities, such as financial incentive systems. These incentive and financing systems describe incentives and other financing solutions with direct or indirect intervention by IAPMEI, being available to companies for the development of their strategies.
There are currently three main incentive systems, corresponding to three domains of business development (Business Innovation and Entrepreneurship, Qualification and Internationalization of SMEs and Research and Technological Development), which encompass several subsystems, aimed at enhancing the development of national companies during the various phases of their life cycle and in their areas of competitiveness considered fundamental to operate in global markets [10].
The financial incentives currently available can take two forms: Non-Refundable Incentive (non-repayable financial support, subject to the objectives defined in the contract) and Refundable Incentive (interest-free loan, under repayment conditions defined in the contract).

D. Application and selection process
The Portugal 2020 Partnership Agreement is applied through IAPMEI as Operational Programs in addition to the European Territorial Cooperation Programs, in which Portugal will participate alongside other member states. These programs are managed by the various delegations of Managing Authorities for the operational programs (Norte, Centro, Lisbon, Alentejo, Algarve and Competitiveness and Internationalization Operational Program (POCI/Compete)) [11].
Applications to the programs are made through calls for tenders by opening notices which define the requirements and opening and closing calls for the financial support measures. For the application and selection process of the various programs, the registered candidates are defined through the competitions presented, and information about the programs that are running and the results of the programs already running can be found on the individual websites of each competition or on the website managed by IAPMEI [12]. The application is made at the 2020 counter until the deadline for the competition. After the end of the tender period, the evaluation of the project is carried out based on the terms of the notice of each tender. Then, the opinions are sent by IAPMEI to the managing authorities of the operational programs that then rank and select the candidates based on the score obtained during the evaluation process and until the budget limit determined by the competition is reached.
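The ranking and selection step described above can be illustrated with a minimal sketch. The record fields (`project_id`, `score`, `requested`) are hypothetical, and the stopping rule (selection ends at the first ranked application that no longer fits the remaining budget) is an assumption, since the text only states that candidates are selected by score until the budget limit is reached.

```python
from dataclasses import dataclass


@dataclass
class Application:
    project_id: str   # hypothetical identifier
    score: float      # score obtained in the evaluation process
    requested: float  # incentive requested, in euros


def select_applications(applications, budget):
    """Rank by score (descending) and select until the budget is reached.

    Assumption: selection stops at the first ranked application whose
    requested incentive exceeds the remaining budget.
    """
    ranked = sorted(applications, key=lambda a: a.score, reverse=True)
    selected = []
    remaining = budget
    for app in ranked:
        if app.requested > remaining:
            break
        selected.append(app)
        remaining -= app.requested
    return selected, remaining


if __name__ == "__main__":
    # Illustrative values only, not taken from the IAPMEI data.
    apps = [
        Application("P1", 3.2, 400_000),
        Application("P2", 4.5, 300_000),
        Application("P3", 3.9, 500_000),
        Application("P4", 4.1, 250_000),
    ]
    chosen, left = select_applications(apps, budget=1_000_000)
    print([a.project_id for a in chosen], left)
```

A variant that skips an application that does not fit and continues down the ranking would be equally easy to write; which rule applies depends on the notice of each tender.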

III. METHODOLOGY
The methodology adopted is the CRoss Industry Standard Process for Data Mining (CRISP-DM), which implies a close connection between the business and the data analyst; the project is successful when the model proves to be useful for the business [13].
Given the problem and the goals of the managing entity, namely the forecast of applications and the estimation of the financial execution (that is, the payments to be made to projects already approved and the expected load of payment requests for projects in the portfolio), data understanding and data preparation are the next two CRISP-DM phases.
According to the business manager, it is necessary to consider the information available from the previous and current national programs, respectively QREN (National Strategic Reference Framework) and Portugal 2020. Each program has its own system (Incentive System) for the Financial Incentive Program, and each system includes one or more measures (Incentive System Measures). Finally, each measure is put to competition through a notice, which in some cases has several phases.

A. Data
The data were provided by the managing entity of the financial program of Portugal 2020, IAPMEI, and reflect the raw information of the projects executed or in execution present or not in the portfolio that were approved in the contests associated with the Portugal 2020 programs and the QREN.
The data provided were already divided into two datasets. The dataset of applications (dataset A) consists of 37461 records with 87 variables, which needed to be processed to fit the objective of explaining the forecast of the number of applications per measure. The dataset of payment requests (dataset B) comprises 52916 payment records and 30 variables, which needed to be processed to fit the purpose of explaining the amount.
The variables chosen for analysis in dataset A, with the monitoring and approval of the business specialist, are presented in Table I. After the quality analysis (e.g., errors, missing values and outliers), at the suggestion of the business specialist, records whose application status in the life cycle was "Request for Assistance" were filtered out, resulting in a final total of 37188 records.
The variables analyzed and included in the dataset B, with the monitoring, supervision, and approval of the business are presented in Table II. For these variables, their metadata was then organized and analyzed again in terms of quality. During this evaluation, at the suggestion of the business specialist, payments whose payment decision direction was not "Favorable" were filtered from payment requests, resulting in a total of 45514 payment records with a "Favorable" decision. Some of these payments still did not have a payment date and were excluded, resulting in a final total of 33266 records of payments made. In the data preparation phase and to meet the mentioned objective, the variable "Four-month period" was created based on the variable "Payment date". It was then decided to proceed with the evaluation of the number of times that each call had opened to check if there would be feasibility for forecasting the number of applications per call.
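The preparation of dataset B described above (keep "Favorable" decisions, drop records without a payment date, derive the four-month period) can be sketched as follows. This is only an illustration: the dictionary keys `decision`, `payment_date` and `amount` are hypothetical names, and the alignment of the four-month periods with the calendar year (January-April, May-August, September-December) is an assumption.

```python
from collections import defaultdict
from datetime import date


def four_month_period(d):
    """Return (year, period), with period 1 = Jan-Apr, 2 = May-Aug, 3 = Sep-Dec.

    Assumption: periods are aligned with the calendar year.
    """
    return d.year, (d.month - 1) // 4 + 1


def paid_per_period(payments):
    """Sum the amounts of favorable, dated payments per four-month period."""
    totals = defaultdict(float)
    for p in payments:
        # Keep only payment requests with a "Favorable" decision ...
        if p["decision"] != "Favorable":
            continue
        # ... that already have a payment date.
        if p["payment_date"] is None:
            continue
        totals[four_month_period(p["payment_date"])] += p["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Illustrative records only, not taken from the IAPMEI data.
    sample = [
        {"decision": "Favorable", "payment_date": date(2019, 2, 1), "amount": 100.0},
        {"decision": "Favorable", "payment_date": None, "amount": 50.0},
        {"decision": "Unfavorable", "payment_date": date(2019, 3, 1), "amount": 70.0},
        {"decision": "Favorable", "payment_date": date(2019, 6, 1), "amount": 30.0},
    ]
    print(paid_per_period(sample))
```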

B. Modelling
After the descriptive statistics, the beginning of the analysis proceeds with the selection of data modeling techniques, Pearson coefficients, the multiple regression models, and autoregressive models with the goal of forecasting future data, applying the "Bottom-Up" methodology to prepare the dataset. After choosing a model that passes the validity and quality tests applied, this must be reviewed by the business specialist interested in the model to ensure that the model meets the expectations of the business for which it is developed.

IV. RESULTS
The first result is the forecast of the total number of applications for each measure of each program, taking into account the confidence margin, with a 95% accuracy rate; the forecast is made individually for each of the next several competitions, as requested by the business representative.
The second result is the forecast of the expected execution value for the project portfolio for a period of less than one year, with a target hit rate of 95%, taking into account the confidence margin; the forecast is made for several subsequent time periods divided into four-month terms, as requested by the business representative.

A. Forecast of the total number of applications
In order to predict the number of applications per four-month period, the records obtained from the applications were grouped by analyzing the values of the variables and their proximity to the final variable, following the principle of Occam's razor [14]. The analysis started with the simplest model, and other independent variables were then added to increase its explanatory power while keeping it as simple as possible.
A first model was created (Model 1). To understand the relationship between the number of applications and the number of candidates, the periods year, month, and four-month period were included, and the correlation coefficient between the sum of notices and the number of candidates was estimated for each temporal period.
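As an illustration of the correlation step, the Pearson coefficient and the usual t statistic for testing whether the population correlation is zero can be computed as in the sketch below. The function names and the sample values are hypothetical; the source does not state which software or test was used.

```python
from math import sqrt


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("requires n >= 3 and |r| < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


if __name__ == "__main__":
    # Illustrative values only: notices and candidates per four-month period.
    notices = [1, 2, 3, 4]
    candidates = [1, 3, 2, 4]
    r = pearson_r(notices, candidates)
    print(r, t_statistic(r, len(notices)))
```

The t statistic is then compared with the critical value of the Student t distribution with n - 2 degrees of freedom to decide whether the correlation also holds in the population.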
A weak to moderate linear relationship between the variables was estimated in the sample, and the variables are also correlated in the population. In Fig. 3 two outliers are visible (cases 17 and 14 in the dataset); they were eliminated from the estimations. A second model (Model 2) was then estimated. The variable time was added to the specification of the previous model, following the forecasts described in [15]. It is believed that there is a causal relationship with previous values, in the sense that a candidate does not apply again with the same project in the next opening period, knowing that it has already been approved. As such, a new autoregressive independent variable [AR(1)], with the immediately previous period, was added to the first model. The validation of the assumptions of the model is presented in the Annex.
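A specification of this kind can be sketched with ordinary least squares. The sketch below treats the AR(1) term as the dependent variable lagged by one period, which is an assumption (some packages instead model AR(1) as an autoregressive error term); the function names are hypothetical and the solver is a plain Gaussian elimination written for self-containment, not the estimation routine used in the study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients; an intercept column is added here."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def fit_with_ar1(notices, applications):
    """Regress applications_t on notices_t, the time index t and applications_{t-1}.

    Returns [intercept, b_notices, b_time, b_lag]. The first observation is
    lost because it has no previous period.
    """
    rows = [
        [notices[t], float(t), applications[t - 1]]
        for t in range(1, len(applications))
    ]
    return ols(rows, applications[1:])
```

Whether the time and lag coefficients are significant would still have to be checked with their standard errors, which this sketch does not compute.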

Eq. (2)
The variable time is not significant in explaining variations in the number of applications (per four-month period).
Finally, a third model (Model 3) was created. It is also possible that there is a seasonal relationship with the same period of the previous year (for example, the first four-month period of one year and the first four-month period of the following year). To test this possibility, another independent variable, [SAR(1)], was added to the model. The assumption of seasonality between the dependent variable and SAR(1) was not confirmed. The best model is still the first model, in Eq. (1).

B. Forecast the four-month execution of the projects
To specify a forecast model for the four-month execution of the projects in the portfolio, the records obtained from the applications were grouped by analyzing the values of the variables and the proximity to the final variable, following the same principle and methodology for the previous model.
The "Sum of approved incentives" variable was chosen as the independent variable since it was found to have the strongest linear correlation with the "Sum of incentives paid". This specification of the model was estimated with 34 observations.
Then, a first model (Model 4) was estimated. The validation of the assumptions of the simple regression model is presented in the Annex. The estimated model is given in Eq. (4), where the dependent variable is the Sum of incentives paid and the independent variable is the Sum of approved incentives. The independent variable is significant in explaining the variations of the dependent variable. When the approved incentive increases by one million euros, the incentive paid also increases, on average, by about 0.56 million euros.
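The interpretation of the slope can be made concrete with a minimal simple-regression sketch. The function names and the data are hypothetical; only the slope of about 0.56 comes from the text, and the intercept used in the illustration is an arbitrary assumption.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted sum of incentives paid for a given sum of approved incentives."""
    return intercept + slope * x


def r_squared(x, y, intercept, slope):
    """Share of the variance of y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - predict(intercept, slope, a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values in millions of euros, built with slope 0.56 and an
    # arbitrary intercept of 0.5; they are not the IAPMEI figures.
    approved = [10.0, 20.0, 30.0, 40.0]
    paid = [0.5 + 0.56 * v for v in approved]
    b0, b1 = fit_simple_regression(approved, paid)
    # Marginal effect of one additional million euros approved.
    print(predict(b0, b1, 21.0) - predict(b0, b1, 20.0))
```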
Then, a second linear regression model (Model 5) was estimated. If time is added in the specification of the second model, at least one of the assumptions of the model is not validated, meaning that it cannot be estimated. Therefore, Model 4 is the chosen one.
V. CONCLUSION
The objectives of this paper were to develop two models that predicted (1) the number of applications per contest and (2) the need for financial execution for a subsequent period not exceeding one year.
To specify the forecast models, the data received in each dataset were prepared, transformed, filtered, and adjusted with the inclusion of variables that would help to explain the model better. After these analyses and data preparation, it was verified that one of the objectives could not be implemented with the available data due to insufficient data volume, and for this reason the study was adjusted. The framework remained the same, while the objective was adjusted to the forecast of applications per four-month period.
In the end, two models were obtained, Model 2 for the first goal and Model 4 for the second goal, which can greatly facilitate the forecasts to be made in a real context. The "Bottom-up" strategy and the CRISP-DM methodology used for the development of the models proved to be adequate, and regression methods were chosen because they are robust in their execution, adapt to the type of business, and are easy to interpret.
This study adds value to the entities that manage the incentives, since the present analyses can be reviewed and applied to each individual project context, helping to explain some factors about the measures applied or to be applied, based on the forecasts generated.
For future work, the following is proposed: i) the elaboration of a model for forecasting which projects have a high probability of being approved; ii) the development of a model that, based on the requirements presented as necessary for the competition and considering the remaining candidates, can substantially support the decision to approve projects for financial support, presenting the reasons why the candidate should be admitted; iii) the development of a model that estimates the number of projects approved through applications; and iv) the development of a model that predicts, in time, the payment decision when receiving the payment request.

Therefore, there is no autocorrelation problem in the random errors.

Second model (Model 2) assumptions:
(1) Independence between random errors for different periods.
(2) Independence between independent variables: a Variance Inflation Factor (VIF) greater than 5 indicates the presence of a multicollinearity problem [16]. Thus, this assumption is validated.
(3) There is no problem of autocorrelation of the random errors, since the Durbin-Watson statistic is close to 2.
(1) Linearity of the relationship: it was already validated with the Pearson correlation.

B. Forecast the four-month execution of the projects
(2) Independence between random errors for different periods. This assumption is not validated since the two independent variables are correlated ( ). Therefore, the estimations of the coefficients in Model 4 for the second goal are good enough.
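The two diagnostics cited in this annex can be sketched as follows, under stated assumptions: the Durbin-Watson statistic is computed from the regression residuals, and the VIF is shown for the special case of two regressors, where it reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The function names and sample values are hypothetical; the threshold of 5 is the one cited from [16].

```python
from math import sqrt


def durbin_watson(residuals):
    """Durbin-Watson statistic; values close to 2 suggest no first-order autocorrelation."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals must not all be zero")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return numerator / denominator


def vif_two_regressors(x1, x2):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("x1 and x2 must have the same length (at least 2)")
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    if s11 == 0 or s22 == 0:
        raise ValueError("regressors must not be constant")
    r = s12 / sqrt(s11 * s22)
    if abs(r) >= 1:
        raise ValueError("regressors are perfectly collinear")
    return 1.0 / (1.0 - r * r)


def has_multicollinearity(vif, threshold=5.0):
    """Apply the rule of thumb cited in the text: VIF above 5 signals a problem."""
    return vif > threshold
```

With more than two regressors, each VIF would instead come from regressing one regressor on all the others; the two-regressor shortcut is used here only to keep the sketch short.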