A Probabilistic Linear Solver Based on a Multilevel Monte Carlo Method

We describe a new Monte Carlo method, based on a multilevel method, for computing the action of the resolvent matrix over a vector. The method is based on the numerical evaluation of the Laplace transform of the matrix exponential, which is computed efficiently using a multilevel Monte Carlo method. Essentially, it requires generating suitable random paths which evolve through the indices of the matrix according to the probability law of a continuous-time Markov chain governed by the associated Laplacian matrix. The convergence of the proposed multilevel method is analyzed, and several numerical examples are run to test the performance of the algorithm. These examples concern the computation of some metrics of interest in the analysis of complex networks, and the numerical solution of a boundary-value problem for an elliptic partial differential equation. In addition, the algorithm was conveniently parallelized, and its scalability analyzed and compared with the results of other existing Monte Carlo methods for solving linear algebra systems.


Introduction
Among the many numerical methods proposed in the literature for solving linear algebra problems, probabilistic methods have often been perceived within the linear algebra community more as a curiosity than as a serious alternative to the state-of-the-art deterministic methods. It is broadly accepted that they offer interesting features from a computational point of view, such as being easily parallelizable, fault-tolerant, and well suited to heterogeneous architectures (features of paramount importance in view of current high-performance computers). However, the truth is that they also exhibit some significant weaknesses, such as a very slow convergence to the solution. This makes the underlying algorithms computationally demanding, especially when high-accuracy solutions are required. In addition, they require continuous, close monitoring by the user in order to control the numerical errors, and it is therefore often necessary to repeat the simulations several times before accepting a solution within the accuracy prescribed by the user. By contrast, deterministic methods, such as the iterative methods, are basically governed by an automatic procedure, it often being only necessary to impose initially a certain tolerance that should be reached by the algorithm before stopping the execution safely. This tolerance should be enough (with the exception of some pathological problems) to guarantee the convergence of the solution to the prescribed accuracy. Some of the aforementioned issues affecting the probabilistic methods have been progressively improved over the years, although the advances are often relatively modest, especially when compared with the advances experienced by the deterministic counterparts.
Historically, the idea of using probabilistic methods based on Monte Carlo simulations for solving linear algebra problems goes back to the pioneering work of von Neumann and Ulam during the 1940s [1]. Although the method was initially proposed merely for computing the inverse of a matrix, it was later generalized for solving linear algebra problems in a series of seminal papers, see [2,3] e.g., and [4,5] for further references. Briefly, the underlying idea consists in generating a discrete Markov chain which evolves by random paths through the different indices of the matrix. Mathematically, the method can be seen as a Monte Carlo sampling of the Neumann series of the matrix. The convergence of the method was rigorously established in [6], and improved further more recently (see for instance [7] and [8], just to cite a few references). More specifically, in [8,9] an important step forward was made in the applicability of probabilistic methods to more realistic problems. The method, called the Monte Carlo synthetic acceleration method, is in fact a kind of hybrid scheme which combines the Richardson iterative method with a Monte Carlo method. Essentially, the role played by the Monte Carlo method consists in accelerating the convergence of the underlying iterative method, and the scheme has been shown to be competitive for a class of problems, comparing even with GMRES, perhaps one of the most widely used iterative methods.
Another related area of application of probabilistic methods where significant progress has recently been made is the field of matrix functions [10,11]. In fact, for the specific case of the action of a matrix exponential over a vector, a probabilistic method based on a multilevel Monte Carlo method [14] was proposed in [12,13], which, as an important feature, improves notably the typically slow convergence rate of any Monte Carlo method. The multilevel method has in fact become a widely used method for accelerating stochastic simulations in general (see the excellent review in [15] e.g., and [16] for a specific application to Markov chains), and in particular for the matrix exponential it has been shown in [13] that it can even be competitive against the classical deterministic methods based on Krylov subspaces. Specifically, for large-scale problems and extremely large numbers of processors, the multilevel method clearly outperforms the deterministic method for problems consisting of large matrices, not only in terms of computational time, but also in terms of memory requirements. Another remarkable feature of the multilevel method is precisely that it gives rise to automatic algorithms in the aforementioned sense. In fact, typically the only interaction of the user with the algorithm consists merely in setting up initially the prescribed accuracy of the solution, and subsequently the algorithm is capable of reaching the desired goal autonomously.
The aim of this paper is precisely to apply such a multilevel method to the problem of computing the action of a resolvent matrix over a vector. This is done in practice by exploiting the well-known connection between the resolvent matrix and the matrix exponential through the Laplace transform. The multilevel method derived in [13] for the case of the matrix exponential is conveniently adapted to this specific problem, and the convergence of the resulting method and the computational cost of the underlying algorithm are analyzed. Moreover, to test the performance of this new algorithm several numerical examples were run. These concern the computation of the so-called Katz centrality, which describes some important features of complex networks, and the numerical solution of a linear system coming from the discretization of a boundary-value problem for an elliptic partial differential equation. Finally, another noteworthy contribution of the paper is the parallelization of the resulting algorithm and the comparison of its scalability, when running both algorithms on a multicore architecture, with other available Monte Carlo methods. A key result from this comparison is the autonomous operation of our approach: unlike other available Monte Carlo methods [8], the proposed method does not require parameter tuning for optimal scaling.
The paper is organized as follows. The probabilistic representation of the vector solution is presented in Sect. 2, along with the description of the multilevel method for the resolvent of a matrix. Section 3 describes the implementation of the multilevel algorithm. The analysis of the numerical errors and the convergence of the method is presented in Sect. 4, while Sect. 5 is devoted to the analysis of the computational cost of the algorithm. In addition, in Sect. 6 the algorithm is tested by running a few numerical examples, and the parallel performance of the method is compared with the performance of a classical Monte Carlo method. Finally, the paper closes by summarizing its main contributions and discussing potential directions for future research.

Mathematical Description of the Probabilistic Method
In order to apply a multilevel method to any Monte Carlo method, a probabilistic representation of the solution is required first. To this purpose, we describe next the probabilistic method adopted for representing the action of the resolvent of a matrix over a vector. Let $A = \{a_{ij}\}_{i,j=1}^n$ be a given sparse $n \times n$ matrix, and let $v$ and $x$ be $n$-dimensional vectors. Using the Laplace transform, it holds that
$$ x = (s I - A)^{-1} v = \int_0^\infty e^{-s t}\, e^{t A}\, v \, dt, \qquad (1) $$
provided $\|A\|_2 < s$. The integral above can be discretized using a suitable Newton–Cotes quadrature, and after approximating the improper integral by a finite one we obtain
$$ \tilde{x} = \sum_{i=1}^{N} \omega_i\, e^{-s t_i}\, e^{t_i A}\, v, \qquad (2) $$
where $t_i = (i-1)\,\Delta t$, $i = 1, \dots, N$, $\Delta t = T/N$ is the corresponding discretization step, and $\omega_i$ are the weights of the chosen quadrature rule. Both the discretization of the integral and the truncation of the improper integral, obtained by replacing the infinite limit with a finite one, $T$, introduce sources of error which will be analyzed in Sect. 4.
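As a minimal deterministic sketch of Eq. (2) (Python, assuming NumPy and SciPy are available; the function name and the final sanity check are illustrative and not part of the method), the quadrature can be evaluated by reusing the one-step propagator $e^{\Delta t A}$, so that each quadrature node is obtained from the previous one; in the actual method the factor $e^{t_i A} v$ is instead estimated by the Monte Carlo procedure described below.

```python
import numpy as np
from scipy.linalg import expm

def resolvent_action_quadrature(A, v, s, T, N):
    """Approximate x = (s*I - A)^{-1} v via the truncated Laplace transform,
    x ~ sum_i w_i * exp(-s*t_i) * exp(t_i*A) v  (trapezoidal rule, Eq. (2))."""
    dt = T / N
    w = np.full(N, dt); w[0] = w[-1] = dt / 2           # trapezoidal weights
    E = expm(dt * A)                                     # one-step propagator
    y = v.copy()                                         # y = exp(t_i*A) v, reused step by step
    x = np.zeros_like(v, dtype=float)
    for i in range(N):
        x += w[i] * np.exp(-s * i * dt) * y
        y = E @ y                                        # advance to the next quadrature node
    return x

# small sanity check against a direct dense solve
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); A /= 2 * np.linalg.norm(A, 2)   # ensure ||A||_2 < s = 1
v = rng.standard_normal(6)
x_ref = np.linalg.solve(np.eye(6) - A, v)
x_num = resolvent_action_quadrature(A, v, s=1.0, T=40.0, N=4000)
print(np.max(np.abs(x_num - x_ref)))                     # small (quadrature + truncation error)
```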
Concerning the action of the matrix exponential $e^{tA}$ over the vector $v$, this can be computed probabilistically by resorting to the representation introduced in [12], which is conveniently generalized here to deal with more general classes of matrices. Essentially, the main idea consists in decomposing the matrix $A$ as $D - U$. Here $D$ is a diagonal matrix with entries $d_{ij} = 0$ for all $i \neq j$ and $d_{ii} = d_i = a_{ii} + L_{ii}$, $i = 1, \dots, n$, and $U = D - A$ is a matrix whose entries $u_{ij}$ are expressed in terms of a binary matrix $\sigma = \{\sigma_{ij}\}$, with entries taking the value 1 when $a_{ij} < 0$, and 0 otherwise. Here $L = (L_{ij})$ denotes a generalized Laplacian matrix, defined in the broad sense as a matrix with nonpositive off-diagonal entries $L_{ij} = -|a_{ij}|$ and zero row sums, that is, $L_{ii} = -\sum_{j \neq i} L_{ij}$. Note that this does not constitute any restriction on the class of matrices amenable to be represented probabilistically; quite the contrary, any arbitrary matrix can be straightforwardly decomposed in such a way. Such a decomposition allows one in practice to approximate the action of the matrix exponential over the vector using a suitable splitting method. In [12] the Strang splitting was used, in view of being of sufficiently high order and, even more importantly, of not introducing any additional computational cost when compared with the lower-order Lie splitting.
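For intuition, the deterministic counterpart of one splitting step can be sketched as follows (a minimal sketch assuming NumPy/SciPy; here $D$ is taken simply as the diagonal of $A$ for illustration, whereas the method uses $d_i = a_{ii} + L_{ii}$, and in the probabilistic method the factor involving $U$ is not formed explicitly but sampled through the Markov chain described below).

```python
import numpy as np
from scipy.linalg import expm

def strang_step(d, U, v, dt):
    """One Strang-splitting step approximating exp(dt*(D - U)) v, with D = diag(d);
    the diagonal half-steps are elementwise exponentials and hence essentially free."""
    half = np.exp(0.5 * dt * d)              # exp(dt*D/2) applied elementwise
    return half * (expm(-dt * U) @ (half * v))

# quick order check against the exact exponential (dense, for illustration only)
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
d = np.diag(A).copy()
U = np.diag(d) - A                           # so that A = D - U
v = rng.standard_normal(5)
for n in (10, 20, 40):
    dt = 1.0 / n
    y = v.copy()
    for _ in range(n):
        y = strang_step(d, U, y, dt)
    print(n, np.linalg.norm(y - expm(A) @ v))   # error shrinks roughly 4x per doubling
```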
As it happens for the probabilistic representation described in [17], this new representation can be used both for computing a single entry of the vector solution and for computing the full vector solution. In the following, for simplicity, we describe merely the probabilistic representation for computing a single entry of the vector solution, the derivation of the other case being straightforward (see [13], e.g.). The probabilistic representation for computing a single entry of the action of the resolvent matrix over the vector, Eq. (4), is written in terms of random functionals $\eta_k$ of a sequence of discrete random variables: for each starting point $i$, the variables $i_k$, $k = 1, \dots, N$, correspond to a sequence of $N$ discrete random variables with outcomes on $S$, and probabilities $p_{i_{k-1} i_k}(\Delta t)$, $k = 2, \dots, N$, and $p_{i\, i_1}(\Delta t)$ for $k = 1$, defined by $P(\Delta t)$ for each $k$. Here $\tilde{d}_i = d_i - s$; $\varphi_k$ corresponds to a two-point random variable taking values $-1$ and $1$ with a probability related to the matrix $\sigma$; $Q = -L$ is the infinitesimal generator of a continuous-time Markov chain on the set $S = \{1, 2, \dots, n\}$; and the transition probability matrix $P = (p_{ij})$ is the solution of the Kolmogorov backward equations, Eq. (5). Note that $\tilde{x}$ corresponds in fact to an approximation of the true solution $x$, the true solution being recovered in the limit $N \to \infty$ and $T \to \infty$. This representation can be interpreted intuitively as follows. Let us generate a random path which starts at the chosen entry $i$ of the vector $x$. The path evolves according to a continuous-time Markov chain governed by the generator $Q$, moving therefore randomly from $i$ to any possible state on $S$. We evaluate the $N$ functions $\eta_k$ using the random values obtained along the process, and accumulate the number obtained after multiplying all previous functions $\eta_k$, weighted by the corresponding quadrature weight $\omega_k$. Finally, the solution is obtained through a suitable expected value.

A Multilevel Method for Computing the Resolvent Matrix
As described in the literature (see the excellent survey in [15], and references therein), the idea behind the so-called geometric multilevel Monte Carlo method (MLMC for short) consists essentially in approximating the finest solution $P_L$, obtained at the level of discretization $L$, using a sequence of coarser approximations obtained at previous levels $l$, from $l_0$ to $L-1$. In practice this accelerates the Monte Carlo simulations, since for a fixed given accuracy the method generates more samples at the coarsest levels, which have a low computational cost, and fewer samples at the higher levels, which have a higher computational cost. This method can be conveniently adapted to the present problem, assuming in particular that the different levels of discretization correspond to the values of $\Delta t_l$, now with $\Delta t_l = T/N_l$ and $N_l = 2^l$. Moreover, concerning the minimum level of discretization $l_0$, as was already pointed out in [13], it should be chosen specifically different from zero, because the computational cost turns out to be almost independent of the level when simulating the coarsest levels. The multilevel method can be expressed in mathematical form through the following telescoping series,
$$ E[P_L] = E[P_{l_0}] + \sum_{l = l_0 + 1}^{L} E[P_l - P_{l-1}], \qquad (6) $$
where $P_l$ denotes the random estimate in Eq. (4) evaluated with $N_l$ quadrature points, that is, with time step $\Delta t_l$. Since this series has to be truncated for computational purposes, a truncation error proportional to $E[P_L - P_{L-1}]$ is introduced. For numerical purposes, when finite independent samples of sizes $M_l$, $l = l_0, \dots, L$, are used, Eq. (6) can be approximated by the following estimator, which is the empirical mean,
$$ \hat{P} = \sum_{l = l_0}^{L} \frac{1}{M_l} \sum_{m=1}^{M_l} \left( P_l^{(m)} - P_{l-1}^{(m)} \right), \qquad P_{l_0 - 1}^{(m)} := 0. \qquad (7) $$
Since an empirical mean is used, one has to generate $M_l$ independent samples $\{i_N^{(m)}\}$ of the set of variables $\{i_1, i_2, \dots, i_N\}$, and to evaluate the previously defined quantity $P_l$ in Eq. (6), denoted hereafter as $P_l^{(m)}$. The specific values taken by the set of random variables $\{i_N\}$ are determined by the transition probabilities $P$ of the continuous-time Markov chain in Eq. (5). The detailed procedure followed to generate the random variables is described in Sect. 3.
It is crucial to remark here that the samples used for computing the approximation at level $l$ should be reused for computing the level $l - 1$. This is because the correlation appearing between the two consecutive levels belonging to the same sample reduces the overall variance. In fact, as was pointed out in [14,18], the multilevel Monte Carlo method can be seen as a recursive application of the control variate technique [19], frequently used in applied statistics for variance reduction. For the specific problem of Monte Carlo path simulation in finance, the correlation appears when the samples used correspond to the same discretized Brownian path generated using different time stepsizes; for more general applications and examples, see [15] e.g. In our particular problem, the procedure followed to reuse the paths is described in Sect. 3. As a consequence, the multilevel method is capable of reducing the computational cost by conveniently choosing an optimal sample size $M_l$, and this is done by keeping the overall variance fixed within a prescribed accuracy $\varepsilon^2$. The detailed procedure to find the optimal sample size, consisting essentially in a minimization process, is explained in detail in [15]; here the main results are summarized only for the sake of completeness. The optimal sample size $M_l$ for an accuracy $\varepsilon$ turns out to be
$$ M_l = \left\lceil \frac{2}{\varepsilon^2} \sqrt{\frac{V_l}{C_l}} \sum_{k = l_0}^{L} \sqrt{V_k\, C_k} \right\rceil, \qquad (8) $$
where $C_l$, $m_l$ and $V_l$ are the computational cost, mean and variance for each level $l$, respectively. The overall computational cost and variance are calculated as
$$ C = \sum_{l = l_0}^{L} M_l\, C_l, \qquad V = \sum_{l = l_0}^{L} \frac{V_l}{M_l}. $$
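A minimal sketch of the allocation in Eq. (8) (Python; the function name and the numbers in the example are purely illustrative, and the per-level variances and costs are assumed to have been estimated beforehand):

```python
import math

def optimal_sample_sizes(V, C, eps):
    """Optimal per-level sample sizes M_l given estimated variances V[l] and
    costs C[l] of Y_l = P_l - P_{l-1}, for a target accuracy eps (Eq. (8))."""
    s = sum(math.sqrt(v * c) for v, c in zip(V, C))
    return [math.ceil(2.0 / eps**2 * math.sqrt(v / c) * s) for v, c in zip(V, C)]

# example: cheap, noisy coarse levels get many samples, fine levels only a few
print(optimal_sample_sizes(V=[1e-4, 2.5e-5, 6e-6], C=[1.0, 2.0, 4.0], eps=1e-3))
```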

The Multilevel Algorithm
Before explaining the detailed algorithm followed to implement in practice the multilevel method described above, let us describe first the algorithm used to generate the random paths, and second the procedure followed to reuse the paths generated for a higher level of discretization $l$ at the lower one $l - 1$. Essentially, the algorithm to generate the paths was first introduced in [13] for the specific problem of computing the action of a matrix exponential over a vector, and it is summarized here for the purpose of illustration. The random paths are generated in practice as is done for a typical continuous-time Markov chain. More specifically, if $p_{ij}(t)$ represents the transition probability governing the transition between the states $i$ and $j$, we can use the Kolmogorov backward equation in Eq. (5) in integral form,
$$ p_{ij}(t) = \delta_{ij}\, e^{-L_{ii} t} + \int_0^t L_{ii}\, e^{-L_{ii} \tau} \sum_{k \neq i} k_{ik}\, p_{kj}(t - \tau)\, d\tau, $$
where $k_{ij} = |L_{ij}|/L_{ii}$, to simulate a random path. In practice the random paths are given by the transitions among the different states on $S = \{1, 2, \dots, n\}$, those states being the values taken by the set of random variables $\{i_1, i_2, \dots, i_N\}$. Therefore, an algorithm to generate a path may work as follows. Generate a first random time $S_0$ obeying the exponential density function $p(S_0) = L_{ii}\, e^{-L_{ii} S_0}$. Then, depending on whether $S_0 < \Delta t$ or not, two different alternatives are taken. If $S_0 > \Delta t$, the algorithm is stopped, and no jump from the state $i$ to a different state is taken. If, on the contrary, $S_0 < \Delta t$, then the state $i$ jumps to a different state $j$, chosen randomly according to the probability function $k_{ij}$, and a second random number $S_1$, exponentially distributed and governed now by the density function $p(S_1) = L_{jj}\, e^{-L_{jj} S_1}$, is generated. If $S_1 < (\Delta t - S_0)$ the algorithm proceeds by repeating the same elementary rules, otherwise it is eventually stopped.
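The elementary rules above can be sketched as follows (Python; a minimal sketch assuming the generalized Laplacian $L$ is given as a dense NumPy array; the function name is illustrative):

```python
import numpy as np

def sample_path_step(i, dt, L, rng):
    """Advance the continuous-time Markov chain with generator Q = -L over one
    quadrature step of length dt, starting from state i; returns the end state."""
    t = 0.0
    while True:
        rate = L[i, i]                        # exit rate of the current state
        if rate <= 0.0:                       # isolated state: no jumps possible
            return i
        t += rng.exponential(1.0 / rate)      # exponentially distributed holding time
        if t >= dt:                           # no further jump within this step
            return i
        k = np.abs(L[i]) / rate               # jump probabilities k_ij = |L_ij| / L_ii
        k[i] = 0.0
        k /= k.sum()                          # guard against round-off
        i = rng.choice(len(k), p=k)           # jump to a randomly chosen new state
```

A full path at level $l$ applies this step $N_l$ times with $dt = \Delta t_l$; since the underlying continuous-time path is the same, reading it on the coarser grid of level $l - 1$ (merging pairs of consecutive steps) gives the reused sample, in the spirit of Fig. 1.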
To illustrate graphically how the paths generated for a higher level are reused for the lower one, Fig. 1 shows a sketch diagram for the case $l = 2$. It plots the four different scenarios that may occur when generating random paths starting at the same point $i$; from Eq. (4), the corresponding outcomes of the two random variables can then be readily evaluated. Note that the last scenario contributes zero to the term $E[P_2 - P_1]$ in Eq. (6).
The pseudocode implementing the described multilevel method is shown in Algorithm 1, which constitutes in practice the general setting for any implementation of the method, particularized for this specific problem by choosing a procedure to compute the mean $m_l$ and the second moment $m_{2,l}$ for any of the levels $l$. Note that both quantities are needed to compute the variance $V_l$, which is given by $V_l = m_{2,l} - m_l^2$. This procedure is described in Algorithm 2 (the procedure to compute a single entry $i$ of the vector solution, $\tilde{x}_i$).
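Since Algorithms 1 and 2 are not reproduced here, the overall control flow can be sketched schematically as follows (Python; this is not the paper's pseudocode: `estimate_level` is a hypothetical user-supplied routine returning the sample mean, variance and per-sample cost of $P_l - P_{l-1}$ from $M$ random paths, and the stopping rules are deliberately simplified):

```python
import numpy as np

def mlmc(estimate_level, l0, eps, M0=1000, Lmax=12):
    """Schematic MLMC driver: start at level l0, allocate samples with Eq. (8),
    and add finer levels until the bias estimate |E[P_L - P_{L-1}]| is small."""
    L = l0
    mean, var, cost, M = {}, {}, {}, {}
    mean[L], var[L], cost[L] = estimate_level(L, M0)
    M[L] = M0
    while True:
        # optimal sample sizes for the current set of levels (Eq. (8))
        s = sum(np.sqrt(var[l] * cost[l]) for l in mean)
        for l in list(mean):
            M_opt = int(np.ceil(2.0 / eps**2 * np.sqrt(var[l] / cost[l]) * s))
            if M_opt > M[l]:
                # here one would only draw the M_opt - M[l] extra samples;
                # for simplicity the level is simply re-estimated
                mean[l], var[l], cost[l] = estimate_level(l, M_opt)
                M[l] = M_opt
        # weak-error (bias) check on the finest level
        if L > l0 and abs(mean[L]) < eps / np.sqrt(2.0):
            break
        if L >= Lmax:
            break
        L += 1
        mean[L], var[L], cost[L] = estimate_level(L, M0)
        M[L] = M0
    return sum(mean.values())        # telescoping sum, Eq. (6)
```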

Numerical Errors
In this section the different sources of error of the Monte Carlo method for computing Eq. (4) are discussed. Note that the error $\varepsilon$ made in computing the vector solution at a single point $i$ can be decomposed into four contributions,
$$ \varepsilon \le \varepsilon_1 + \varepsilon_2 + \varepsilon_3 + \varepsilon_4, $$
where $\varepsilon_1$ accounts for the truncation of the improper integral, $\varepsilon_2$ for the quadrature rule, $\varepsilon_3$ for the splitting, and $\varepsilon_4$ for the Monte Carlo statistical error; here $\eta_k^{(m)}$ corresponds to the $m$th sample of the random variable $\eta_k$ defined in (4), and $\tilde{D} = D - s I$. Let us analyze then these four different errors separately. The first error $\varepsilon_1$ is due to the truncation of the improper integral in Eq. (1), where the unbounded domain of integration has been replaced by a finite one, conveniently replacing the infinite limit by a finite one, $T$. Such an error can be further evaluated, and is given by
$$ \varepsilon_1 = \left\| \int_T^\infty e^{-s t}\, e^{t A}\, v \, dt \right\|_2 . \qquad (17) $$
Note that the integral can be computed analytically, and yields
$$ \varepsilon_1 = \left\| e^{-B T} B^{-1} v \right\|_2, $$
where $B = s I - A$. Moreover, it turns out that $\varepsilon_1 \le \|e^{-BT}\|_2\, \|B^{-1}\|_2\, \|v\|_2$, and since we assume $\|A\|_2 < s$, it holds that $\|e^{-BT}\|_2 \le e^{-\lambda_{\min}(B)\, T}$ and $\|B^{-1}\|_2 \le 1/\lambda_{\min}(B)$. Here $\lambda_{\min}(B)$ denotes the smallest eigenvalue of $B$, which corresponds in practice to $s - \lambda_{\max}(A)$, $\lambda_{\max}(A)$ being the largest eigenvalue of $A$. Hence
$$ \varepsilon_1 \le \frac{e^{-(s - \lambda_{\max}(A))\, T}}{s - \lambda_{\max}(A)}\, \|v\|_2 . $$
The formula above can be used to find the minimum value, $T_c$, such that the error is less than the prescribed accuracy $\varepsilon$, which is given by
$$ T_c = \frac{1}{s - \lambda_{\max}(A)}\, \ln\!\left( \frac{\|v\|_2}{\varepsilon\, \big(s - \lambda_{\max}(A)\big)} \right). \qquad (21) $$
To use such a formula in practice, the largest eigenvalue of the matrix $A$ should be known.
In general, especially for large matrices, this could be a formidable task in itself, in general requiring the use of suitable available approximations. For the specific case of complex networks, there are indeed some useful approximations (see [20], e.g.), and they have been used in Fig. 2 to verify this estimation. In fact, Fig. 2 plots the absolute error made when truncating the improper integral as a function of the finite limit $T$. In this example the solution corresponds to the Katz centrality of a given node for two different complex networks of the same size. The theoretical solution, which is needed to compute the error, was obtained using Matlab, Eq. (17) being computed using a high-accuracy numerical method. For this specific example we have used two different approximations for the largest eigenvalue of the adjacency matrix $A$. One assumes the largest eigenvalue equal to the maximum degree of the network, $d_{\max}$ (Approx1), while the other one (much more accurate) assumes the value to be $\max(d_{avg}, \sqrt{d_{\max}})$ (Approx2), where $d_{avg}$ denotes the average degree of the network, $d_{avg} = \frac{1}{n}\sum_{i=1}^n d_i$. Note how the latter approximation fits in fact much better the theoretical error curve plotted in Fig. 2.
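A small sketch of how the bound is used in practice (Python; the function name and the toy degree sequence are illustrative, and the formula follows Eq. (21) as written above):

```python
import numpy as np

def truncation_limit(lam_max, s, eps, v_norm=1.0):
    """Minimum truncation time T_c such that the tail of the Laplace-transform
    integral is below eps (Eq. (21)); lam_max is an estimate of lambda_max(A)."""
    gap = s - lam_max                            # = lambda_min(B), must be positive
    return np.log(v_norm / (eps * gap)) / gap

# two cheap eigenvalue estimates for a graph with degree sequence `deg`
deg = np.array([3, 5, 2, 7, 4, 6])               # illustrative degrees
approx1 = deg.max()                              # d_max
approx2 = max(deg.mean(), np.sqrt(deg.max()))    # max(d_avg, sqrt(d_max))
s = deg.max() / 0.85
print(truncation_limit(approx1, s, eps=3e-5), truncation_limit(approx2, s, eps=3e-5))
```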
Concerning the second error $\varepsilon_2$, this appears when the definite integral is approximated numerically using a suitable quadrature. Hence, the order of the error is merely the order of the chosen quadrature rule, and therefore to minimize this error it would be advisable to use higher-order methods. However, as will be shown later, the third error $\varepsilon_3$ turns out to be of order $O(\Delta t^2)$, and it therefore becomes useless to implement any higher-order method for this approximation, since the error $\varepsilon_3$ becomes dominant. Consequently, the proposed algorithm uses a simple trapezoidal quadrature, which is well known to be of order $O(\Delta t^2)$; the corresponding quadrature weights are $\omega_i = \Delta t$, $i = 2, \dots, N-1$, and $\omega_1 = \omega_N = \Delta t/2$. The analysis of the remaining errors, $\varepsilon_3$ and $\varepsilon_4$, coincides exactly with the analysis done in [12], but for the sake of completeness it is summarized here. The third error $\varepsilon_3$ is due merely to the splitting procedure, as a result of decomposing the matrix $A$ as $D - U$, and turns out to be of order $O(\Delta t)$ or $O(\Delta t^2)$ [21], depending on whether the Lie or the Strang splitting is used. As mentioned already in Sect. 2, since the matrix $D$ is diagonal, its exponential can be computed almost without any computational cost, and it is therefore much more convenient to adopt the Strang splitting for this purpose. The fourth error, $\varepsilon_4$, is the pure Monte Carlo statistical error, known to be of order $O(M^{-1/2})$. In fact, it is well known that the arithmetic mean appearing in (11) provides the best unbiased estimator for the expected value in (4). In practice, one simulates the random variables on the computer by generating random numbers, and the error made in replacing the expected value with the mean over a finite-size sample is statistical in nature. More precisely, for a large value of $M$, $\varepsilon_4$ is approximately a Gaussian random variable with standard deviation proportional to $M^{-1/2}$, i.e.,
$$ \varepsilon_4 \approx \frac{\sigma}{\sqrt{M}}\, \nu, $$
where $\sigma$ denotes the square root of the variance and $\nu$ is a standard normal (i.e., $N(0,1)$) random variable. All this clearly shows that the proposed Monte Carlo method could in principle have a poor numerical performance, and also that the error is merely statistical, so it can only be bounded by some quantity with a certain degree of confidence. However, there already exist many statistical techniques, such as variance reduction and quasi-random numbers [22], that can be used in practice to improve greatly the order of the global error, and consequently the overall performance of the algorithm.

Computational Complexity of the Multilevel Algorithm
To properly estimate the computational complexity of the multilevel algorithm, it is first required to establish the convergence rate of both the mean $m_l = |E[P_l - P]|$ and the variance $V_l = V[P_l - P_{l-1}]$ as functions of the corresponding level $l$. In [13] this was done for the particular problem of computing the action of a matrix exponential over a vector. Note that the method proposed here requires computing the action of a matrix exponential over a vector, and therefore the results found there can be applied here straightforwardly. More specifically, in [13] it was proved that the mean $|E[P_l - P]|$ turns out to be of order $O(\Delta t_l^2)$. This is because the splitting method used for computing the matrix exponential was the Strang splitting. Besides computing the matrix exponential, recall that it is also required to approximate numerically the definite integral in Eq. (1). As mentioned above, this is done through a suitable trapezoidal quadrature. However, this procedure does not modify the order of convergence, as can be readily seen through the following Lemma.

Lemma 1 Consider $I = \sum_{i=1}^{N} \omega_i f_i$, the discretization of a given definite integral obtained by means of a trapezoidal quadrature, where $\omega_i$ are the corresponding quadrature weights and $f_i$ the values of a given function evaluated at equally spaced points within the integration domain. Assume that the values $f_i$, $i = 1, \dots, N$, are known with an error of order $O(\Delta t^3)$. Then it holds that the error of $I$ is of order $O(\Delta t^2)$.
Proof Since each $f_i$ is computed recursively from $f_{i-1}$, its error accumulates over the $i$ steps and is of order $O(i\,\Delta t^3)$; summing over all the terms, the total error is of order $O(N^2 \Delta t^3)$. After multiplying by the trapezoidal weights, which are proportional to $\Delta t$, and recalling that $N = T/\Delta t$, the order of the error of $I$ finally becomes $O(\Delta t^2)$.
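Spelling out the error accounting of the proof (under the accumulation assumption stated above, with $e_i$ denoting the error of $f_i$ and $C$ a generic constant):
$$ \left| \sum_{i=1}^{N} \omega_i\, e_i \right| \;\le\; C\, \Delta t \sum_{i=1}^{N} i\, \Delta t^3 \;=\; O\!\left(N^2 \Delta t^4\right) \;=\; O\!\left(T^2 \Delta t^2\right) \;=\; O\!\left(\Delta t^2\right), $$
since $\omega_i \le \Delta t$ and $N = T/\Delta t$.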
Concerning the convergence rate of the variance $V[P_l - P_{l-1}]$, the same result already proved for the mean holds. Since the convergence rate of the variance for computing the action of the matrix exponential over a vector was proved to be $O(\Delta t_l^2)$ in [13], the convergence rate for this specific problem turns out to be similarly of order $O(\Delta t_l^2)$. In Fig. 3, the mean $|E[P_l - P_{l-1}]|$ (a) and the variance $V[P_l - P_{l-1}]$ (b) are shown as functions of the level $l$. The matrices correspond to the adjacency matrices of small-world networks of two different sizes. Note that the obtained numerical convergence rate fully agrees with the theoretical estimation.
To determine theoretically the computational complexity of the multilevel algorithm, an estimation of the computational time of the Monte Carlo method is also needed. We have seen that the numerical method requires computing in practice the action of the matrix exponential over a vector, and note that this should be done several times, namely as many times as the number of discretization points of the numerical quadrature. Therefore, one might wrongly conclude that the overall computational time depends on such a number. However, it turns out that the matrix exponential required for any time step is computed using the matrix exponential obtained for all the previous time steps, as can be seen from Eq. (2). In practice this means that it is enough to compute the matrix exponential once (and only for the finite limit $T_c$), provided that all the values of the matrix exponential obtained for intermediate times are conveniently saved. Therefore, the computational time of the algorithm is merely the computational time required to compute the action of a matrix exponential over a vector evaluated exclusively at $T_c$. This time was already estimated in [12]. In Fig. 4 the CPU time spent by the Monte Carlo algorithm when computing the Katz centrality of two different networks, characterized by different values of $d_{avg}$, is plotted. Note that for $\Delta t_l$ sufficiently large (or equivalently $l$ sufficiently small) the computational time tends to a constant value, while for smaller values the computational time scales as $1/\Delta t_l$. Using the theoretical estimation in [12], the minimum value of the level, $l_0$, required to obtain a better performance of the MLMC algorithm can be estimated (Eq. (23)). Knowing the convergence rates of the mean, variance, and computational time, the theorem in [15] can be applied, and as a result it can be established that the computational complexity of the proposed MLMC algorithm for computing the resolvent of a matrix should be of order $O(\varepsilon^{-2})$. This contrasts favorably with the complexity of the Monte Carlo method, which can be concluded to be of order $O(\varepsilon^{-5/2})$. In fact, the cost of computing the action of the matrix exponential over a vector using the Monte Carlo method was estimated in [12] to be of order $O(\varepsilon^{-5/2})$. Recall that the algorithm for computing the resolvent matrix requires computing such a matrix exponential and, to avoid calculating it several times for each time step, we can use the same procedure already explained for the MLMC method. Consequently, we can conclude that the computational time of the Monte Carlo method for this problem remains of order $O(\varepsilon^{-5/2})$. In Fig. 5 the computational time spent to compute the Katz centrality of a single node of a small-world network of size $n = 10^6$ is plotted as a function of the prescribed accuracy $\varepsilon$ for both the MLMC method and the Monte Carlo method. Note the perfect agreement with the theoretical estimates, and the notably superior performance with respect to the Monte Carlo method for smaller values of the prescribed accuracy $\varepsilon$.
Numerical Examples and Computational-Related Issues

Numerical Examples
In this section the results corresponding to several numerical examples are shown. The chosen examples consist of both the numerical computation of the aforementioned Katz centrality metric on some synthetic complex networks, and the numerical solution of some elliptic boundary-value problems with Dirichlet boundary conditions. More specifically, these are the Poisson equation discretized by means of the finite difference method, and a reaction–diffusion equation in an arbitrary domain discretized by the finite element method. In all of them the solution is computed exclusively at a single node. Concerning the synthetic networks, these consist of small-world and scale-free networks that can be generated easily for an arbitrary size using the functions smallw and pref, respectively, freely available through the toolbox CONTEST [23] for Matlab.
To compare the performance of the algorithm with others available in the literature, we implemented a different available Monte Carlo method. This method was first proposed in [17], and in the following, to distinguish it from our Monte Carlo method, it will be termed the classical Monte Carlo method. Basically, this method depends on two free parameters to be fixed by the user according to the accuracy desired for the solution. These are, first, the so-called history length, which is basically related to the number of random jumps allowed to occur within the matrix before stopping the algorithm, and second, the number of random walks picked up from a finite sample. Both discretization parameters are involved in different sources of error. Essentially, they are the equivalent of the bias error and the statistical error appearing as sources of error in the MLMC method, but they are treated by the algorithm in a totally different way. In fact, while the classical Monte Carlo algorithm considers both separately, the MLMC algorithm works with both parameters in a unified way, with the goal of reaching the prescribed accuracy for the solution. The algorithm proposed in [17] is indeed adaptive, being capable of finding automatically the optimal solution within a prescribed accuracy. However, note that this is done considering only the statistical error, and it seems to fail to find the optimal history length satisfying such an accuracy. Concerning the statistical error, this is handled basically by progressively increasing the number of random walks simulated until a certain condition (related basically to the variance) is satisfied. On the other hand, the history length, which controls the bias error, should be fixed independently according to another stopping criterion of the algorithm. This truncation criterion to stop running the random walks was proposed in [9], and consists in imposing a certain relative weight, which acts basically as a cutoff threshold stopping the random walk automatically whenever the underlying random weight crosses such a threshold. However, it is not trivial to relate the value of this threshold to the statistical error and to the prescribed accuracy of the solution desired by the user. In practice, what happens is that the user may have to run the algorithm several times, changing such a threshold accordingly, until the desired solution within the prescribed accuracy is finally reached. By contrast, the multilevel algorithm is by construction a truly automatic algorithm, meaning that the only intervention of the user consists in setting initially the accuracy desired for the solution, leaving the algorithm to find by itself both the optimal sample size and history length. Moreover, this offers a further computational advantage from the parallelization point of view, as will be discussed below.
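The following is a schematic sketch of a forward-walk estimator with a relative-weight cutoff, in the spirit of the classical method just described (Python; this is not the exact algorithm of [17], and the transition probabilities, weight update and cutoff rule are illustrative assumptions):

```python
import numpy as np

def classical_mc_entry(H, b, i, n_walks, w_cut, rng):
    """Estimate x_i of x = sum_k H^k b (Neumann series) with forward random walks.
    Each walk accumulates W_k * b_{i_k} and is truncated once the relative weight
    |W_k| drops below the cutoff w_cut, which plays the role of the history length."""
    n = H.shape[0]
    total = 0.0
    for _ in range(n_walks):
        state, W, acc = i, 1.0, b[i]
        while abs(W) > w_cut:
            row = np.abs(H[state])
            norm = row.sum()
            if norm == 0.0:                    # absorbing state: walk ends
                break
            p = row / norm
            nxt = rng.choice(n, p=p)
            W *= H[state, nxt] / p[nxt]        # importance-sampling weight update
            acc += W * b[nxt]
            state = nxt
        total += acc
    return total / n_walks
```

Lowering w_cut reduces the bias but lengthens the walks, and it has to be chosen by the user, which is precisely the kind of manual tuning the MLMC algorithm avoids.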
All Monte Carlo codes were implemented in Fortran 90, and the simulations were run on a multicore architecture consisting of a computer equipped with an AMD Ryzen 1800X octa-core at 3.6 GHz with 32 GB of RAM.
Due to the random nature of any Monte Carlo algorithm, it is worth observing that the measured computational time can vary from simulation to simulation. To mitigate such variability, the CPU times shown in the tables below correspond to an average CPU time obtained by repeating the simulations with 10 different initial random seeds of the pseudorandom generator.

Example A: Complex Networks
As mentioned above, in the case of complex networks the evaluation of the resolvent matrix over a vector is related to the so-called Katz centrality of the network. In fact, the Katz centrality of a node $i$ of a network is defined mathematically [24,25] as
$$ \mathbf{c} = (I - \alpha A)^{-1}\, \mathbf{1}, $$
where $A$ is the adjacency matrix of the network, $\mathbf{1}$ a vector of ones, and $\alpha$, with $0 < \alpha < 1/\lambda_{\max}(A)$, a suitably chosen attenuation factor. Intuitively, the Katz centrality is a metric of the network that measures the relative degree of influence of a node within the network, conveniently weighting the importance of the connection between the node $i$ and distant neighbors by the attenuation factor $\alpha$.
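The relation to the resolvent computed by the method follows from $(I - \alpha A)^{-1}\mathbf{1} = \frac{1}{\alpha}(s I - A)^{-1}\mathbf{1}$ with $s = 1/\alpha$, and can be sketched as follows (Python; the small random graph and the choice $\alpha = 0.85/\lambda_{\max}$ are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T                     # small undirected illustrative graph
A[0, 1] = A[1, 0] = 1.0                            # make sure the graph is non-empty
lam_max = np.max(np.linalg.eigvalsh(A))
alpha = 0.85 / lam_max                             # attenuation factor, 0 < alpha < 1/lam_max
ones = np.ones(n)

katz_direct = np.linalg.solve(np.eye(n) - alpha * A, ones)
s = 1.0 / alpha                                    # same scaling as s = d_max / 0.85 in Fig. 2
katz_resolvent = np.linalg.solve(s * np.eye(n) - A, ones) / alpha
print(np.allclose(katz_direct, katz_resolvent))    # True
```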
Tables 1 and 2 show the CPU time spent to compute the Katz centrality of a small-world network and a scale-free network using the Monte Carlo method, the MLMC method, and the classical Monte Carlo method in [17]. This has been done for different network sizes, the accuracy being kept fixed to $\varepsilon = 3 \times 10^{-5}$ for all simulations. Note that the CPU time is almost independent of the network size for any of the Monte Carlo methods. This is due to the fact that the Monte Carlo method is essentially based on local computations, unlike the classical deterministic methods, which typically require, as part of the algorithm, multiplying a matrix by a vector or, even worse, a matrix by a matrix. This gives rise in practice to an implicit dependence on the size of the problem, as can be seen in [12].
Concerning the results, it is worth observing the notable performance of both the MLMC method and the classical Monte Carlo method compared with the Monte Carlo method, as well as the similar performance of the MLMC method and the classical method.

Example B: Partial Differential Equations
Example 1 This example concerns the numerical solution of a Dirichlet boundary-value problem consisting in the 2D Poisson equation, $-\nabla^2 u(x, y) = f(x, y)$, solved on a square domain $\Omega$ with zero Dirichlet boundary condition, $u(x, y)|_{\partial \Omega} = 0$. The domain is conveniently discretized using a computational grid with discretization parameters $\Delta x = \Delta y = 2/n_x$, and the operator by a finite difference scheme using the standard 5-point stencil [26]. Therefore, the discretized problem to be solved is $A\, u = f$, $A$ being the well-known block tridiagonal matrix corresponding to the discrete Laplace operator with zero Dirichlet boundary conditions, and $f$ the source term evaluated at the grid nodes.
The Monte Carlo method can be readily applied to solve such a linear problem, but, first, to guarantee the convergence of the method it is necessary to use a suitable preconditioner, as was pointed out in [17], and, second, the solution of the problem has to be rewritten as
$$ u = (I - H)^{-1} P^{-1} f, \qquad H = I - P^{-1} A . \qquad (26) $$
Here $P$ denotes the preconditioner; specifically, a left Jacobi preconditioner has been used for this problem. Comparing Eq. (1) with Eq. (26), note that Eq. (26) corresponds indeed to the action of the resolvent of the matrix $H$ (setting $s = 1$) over the vector $P^{-1} f$, and therefore both the Monte Carlo method and the MLMC method can indeed be applied for solving this problem. Table 3 shows the computational time spent by the three Monte Carlo algorithms for different grid sizes. As in the previous example, the same conclusions can be drawn. In fact, both the MLMC method and the classical Monte Carlo method exhibit a similar performance, notably superior to the performance of the Monte Carlo method, and furthermore this holds for any size of the problem.
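A minimal sketch of this reformulation (Python with SciPy; the grid size is illustrative and the dense solve is used only to verify the identity on a small problem):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def poisson_2d(nx):
    """Standard 5-point finite-difference Laplacian on an nx-by-nx interior grid
    with zero Dirichlet boundary conditions (unscaled)."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(nx, nx))
    I = sp.identity(nx)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()

nx = 20
A = poisson_2d(nx)
f = np.ones(nx * nx)

# Jacobi-preconditioned fixed-point form: u = (I - H)^{-1} P^{-1} f, H = I - P^{-1} A
P_inv = sp.diags(1.0 / A.diagonal())
H = sp.identity(nx * nx) - P_inv @ A
u_resolvent = np.linalg.solve(np.eye(nx * nx) - H.toarray(), P_inv @ f)
u_direct = spsolve(A, f)
print(np.max(np.abs(u_resolvent - u_direct)))      # agreement up to round-off
```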
Finally, it is worth observing in Table 3 that the computational time spent by all three methods scales almost linearly with the matrix size. In fact, when the grid size is scaled by a factor of 4, or equivalently the matrix size by a factor of 16 (since $n = n_x^2$), the computational time increases approximately by a factor of 16. This can be explained as follows. The eigenvalues of the discrete 2D Laplace operator with zero Dirichlet boundary conditions are known to be $\lambda_{j,k} = 4\sin^2\!\big(\frac{j\pi}{2(n_x+1)}\big) + 4\sin^2\!\big(\frac{k\pi}{2(n_x+1)}\big)$, $j, k = 1, \dots, n_x$, and therefore, for the Jacobi-preconditioned matrix, $\lambda_{\max}(H) = 1 - 2\sin^2\!\big(\frac{\pi}{2 n_x + 2}\big)$. From the definition of $T_c$ in Eq. (21), asymptotically for large $n_x$ it holds that $T_c \sim n_x^2 \ln(n_x)$, and therefore from Eq. (23) it follows that $l_c \sim \log_2(n_x^2 \ln n_x)$. Note that the computational time of the MLMC method depends on the value of the minimum level $l_c$, being $T_{MLMC} > T_{CPU}(l_c)$. Using the theoretical estimation in [12], asymptotically for sufficiently small $\Delta t_l$ it is known that $T_{CPU}(l) \sim 2^l$; it then holds that $T_{MLMC} \gtrsim 2^{l_c} \sim n_x^2 \ln n_x$, which grows almost linearly with the matrix size $n = n_x^2$.

Example 2
This second example consists in the numerical solution of the underlying linear algebra problem corresponding to the discretization of a 2D reaction–diffusion equation on a domain $\Omega \subset \mathbb{R}^2$, where $\nabla = (\partial_x, \partial_y)$ and $\beta(x, y)$ corresponds to the diffusion coefficient, characterized by a $2 \times 2$ positive-definite matrix $\beta = \{\beta_{ij}\}_{i,j=1}^2$. In particular, for this example this matrix has been chosen to be diagonal, with entries $\beta_{11} = 1 + \frac{y^2}{\alpha^2}\big(\alpha^2 - \frac{x^2}{\alpha^2}\big)$ and $\beta_{22} = 1 + \frac{x^4}{\alpha^4}\big(\alpha^2 - \frac{y^2}{\alpha^2}\big)$. The domain consists of an arbitrary geometry, plotted in Fig. 6, and the Dirichlet boundary data were chosen to be 0 on both the inner and the outer circle, with radii $0.25\alpha$ and $\alpha$, respectively. The size of the domain can be conveniently increased by simply rescaling both circles using a single scale parameter $\alpha$. To generate the computational mesh and obtain the corresponding FEM stiffness matrix and right-hand-side vector, the scientific software COMSOL [27] was used, setting specifically linear elements as the discretization basis. Different values of the maximum element size $h_{max}$ used when meshing the geometry were chosen to test the algorithms for different matrix sizes. Similarly to the previous example, to ensure the convergence of the Monte Carlo method it is required to use a suitable preconditioner; also in this example a left Jacobi preconditioner has been used, as described in Eq. (26). However, note that for this more involved problem the maximum eigenvalue of the matrix is not theoretically known, and it is therefore necessary to resort to suitable approximations in order to apply Eq. (21) and thus obtain the minimum value, $T_c$. For this purpose the Gershgorin circle theorem has been used to find a reasonable bound for the maximum eigenvalue of the matrix.
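A small sketch of such a bound (Python with SciPy; the random sparse test matrix is illustrative only):

```python
import numpy as np
import scipy.sparse as sp

def gershgorin_upper_bound(H):
    """Upper bound on the largest eigenvalue of H from the Gershgorin circle
    theorem: max_i ( h_ii + sum_{j != i} |h_ij| )."""
    H = sp.csr_matrix(H)
    diag = H.diagonal()
    radii = np.abs(H).sum(axis=1).A1 - np.abs(diag)   # off-diagonal absolute row sums
    return float(np.max(diag + radii))

# usage: the bound feeds the truncation-limit estimate of Eq. (21)
H = sp.random(200, 200, density=0.05, random_state=3, format="csr")
H = (H + H.T) / 2                                     # symmetric test matrix
print(gershgorin_upper_bound(H), np.max(np.linalg.eigvalsh(H.toarray())))
```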
The computational time spent for computing the solution at a single point inside the domain is shown in Table 4, and the same conclusions hold as in the previous example.

Computational-Related Issues
Parallel Performance
To analyze the scalability of the Monte Carlo methods, the algorithms were conveniently parallelized using OpenMP. This allows one to fully exploit the multi-core architecture of the available server. In Table 5 the computational time required to compute the Katz centrality of a small-world network of size $n = 10^7$ is shown as a function of the number of cores. Note that both the Monte Carlo method and the MLMC method exhibit a remarkable scalability, almost close to the ideal one. This is not surprising since, as was theoretically anticipated before, any Monte Carlo method requires computing independent simulations, which can be trivially split into as many independent tasks as there are available cores in the server, the intercommunication overhead of the parallelized algorithm being therefore almost negligible.
For the classical Monte Carlo method, as was explained above, the algorithm is adaptive and requires fixing a given parameter before running in parallel, namely the number of random paths $N_0$ by which the sample is increased at each iteration of the algorithm, and it turns out that this parameter is fundamental for improving the scalability of the algorithm. In fact, this parameter affects the total number of iterations of the algorithm and, moreover, has a strong impact on the intercommunication overhead, and consequently on the scalability and parallel performance. As can be seen in Table 5, for a fixed number of cores, increasing $N_0$ always has a positive impact on the scalability, improving accordingly the speed-up of the algorithm, because in practice it reduces the total number of iterations needed to achieve convergence. However, note that this may increase unnecessarily the overall CPU time, since the total number of random paths simulated could exceed the number of random paths actually needed to attain the prescribed accuracy. This would never happen with the MLMC method, because the algorithm exploits the mathematical relation in Eq. (8); therefore the MLMC algorithm is capable of computing precisely the number of random paths needed to attain the prescribed accuracy, provided the variance and computational cost for each level are known. Since in practice these values have to be computed numerically as well, further iterations may be needed to improve their accuracy; however, it is important to remark that typically only a few iterations are required after all. This indeed explains the remarkable scalability of the algorithm observed in Table 5.

Simultaneous computing of both Katz centrality and total subgraph communicability of a complex network
Another important feature of the MLMC method compared with the classical Monte Carlo method is the fact that, without any additional computational cost, it allows other useful metrics, such as the total subgraph communicability of a node [24], to be computed, and this can be done simultaneously. This metric is defined as
$$ TC_i(\beta) = \big( e^{\beta A}\, \mathbf{1} \big)_i = \sum_{j=1}^{n} \big( e^{\beta A} \big)_{ij}, $$
and measures the importance of the node $i$, now weighting the walks of length $k$ by a penalty factor of value $\beta^k / k!$. Concerning $\beta$, it is typically interpreted as an effective "temperature" of the network (see [28], e.g.). Note from Eq. (2) that both the Monte Carlo method and the MLMC method compute automatically, as part of the algorithm, ancillary quantities of the form $(e^{t_j A} v)_i$ at the intermediate quadrature times. These quantities are trivially related to the total subgraph communicability of a given node for a discrete set of effective temperatures, namely $TC_i(\beta_j) = (e^{\beta_j A}\, \mathbf{1})_i$ with $\beta_j = j\, \Delta t$, recalling that here $v = \mathbf{1}$. Although in practice they are not saved individually, because they are not needed to compute the Katz centrality, only their weighted sum being effectively computed, a simple modification of the algorithm would allow saving such quantities, and furthermore this might be done with almost no additional computational cost. In the case of the MLMC method, a crucial point here is to ensure that the numerical error made when computing these ancillary quantities does not exceed the prescribed accuracy. In fact, this cannot be guaranteed by the method, because the proposed MLMC method was developed to compute exclusively the resolvent matrix within a prescribed accuracy, and not any of its ancillary quantities. Nevertheless, in all the examples analyzed so far the error made never exceeded the prescribed accuracy of the algorithm, as shown in Fig. 7 for the particular case of a small-world network. The numerical error was computed here using as reference the solution obtained with a higher-accuracy Krylov-based method [29].
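Extending the quadrature sketch given after Eq. (2), a single sweep can return both quantities (Python; a deterministic illustration of the idea, with an illustrative function name — in the actual method the propagation is carried out by the Monte Carlo estimator):

```python
import numpy as np
from scipy.linalg import expm

def katz_and_communicability(A, s, T, N):
    """Single quadrature sweep returning both the resolvent action (s*I - A)^{-1} 1
    (Katz-type centrality) and the communicability vectors exp(beta_j*A) 1 at the
    intermediate temperatures beta_j = j*dt, at essentially no extra cost."""
    n = A.shape[0]
    dt = T / N
    w = np.full(N, dt); w[0] = w[-1] = dt / 2        # trapezoidal weights
    E = expm(dt * A)
    y = np.ones(n)                                    # exp(beta_j*A) 1, propagated stepwise
    katz = np.zeros(n)
    communicability = []
    for j in range(N):
        katz += w[j] * np.exp(-s * j * dt) * y
        communicability.append(y.copy())
        y = E @ y
    return katz, communicability
```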

Conclusion
The goal of this paper was to propose a new probabilistic method, based on the multilevel Monte Carlo method, for computing the action of the resolvent matrix over a vector. More specifically, the idea was to compute such an action by resorting to the numerical evaluation of the Laplace transform of the matrix exponential, since it is known that the action of a matrix exponential over a vector can be computed efficiently using a multilevel Monte Carlo method. In fact, a method was introduced recently in the literature for such a purpose, and essentially it requires generating suitable random paths which evolve through the indices of the matrix according to the probability law of a continuous-time Markov chain governed by the associated generalized Laplacian matrix. The convergence of the proposed multilevel method has been conveniently analyzed in this paper, and several numerical examples were run to test the performance of the algorithm. Since there are in the literature other well-established Monte Carlo methods for solving linear algebra systems, we compared in this paper the results obtained using both methods. Needless to say, both being based on the Monte Carlo method, they share similar advantages from a computational point of view, such as the comparative ease of implementation in parallel, fault tolerance, and, in general, being well suited for heterogeneous architectures. However, we show here that the multilevel method stands out especially in a feature that is apparently lacking in the classical method, which is the autonomous operation of the multilevel algorithm, in contrast with the classical Monte Carlo algorithm. In fact, for the multilevel method it is enough, in general, to prescribe the desired accuracy of the solution, and the algorithm proceeds automatically in order to meet the requirements established for the solution. By contrast, the classical method typically requires continued surveillance of the underlying errors by the user, it often being necessary to repeat simulations in order to satisfy the requirements demanded for the solution in terms of accuracy.
This feature of the multilevel method is of paramount importance, and drawing attention to it was namely one of the main goals of this paper. In fact, the intention was not to show here that the multilevel method is more efficient than the classical Monte Carlo method, which is clearly not the case in view of the results obtained, but rather to draw attention to this inherent feature of the multilevel method. Besides facilitating the interaction of the user with the algorithm, it has a positive impact on the scalability of the algorithm when conveniently parallelized, as is also shown in this paper. To this purpose, both methods have been parallelized and some examples run on a multicore architecture. The results show in general a clearly better scalability of the multilevel method compared with the classical Monte Carlo method.
Finally, a further advantage of the proposed method is discussed, which lies in its potential capability to compute simultaneously two different metrics of a complex network for about the same computational cost: the Katz centrality and the total subgraph communicability of a network. This is discussed to close the paper, the analysis of the associated errors being left as possible future work.

Fig. 1
Fig. 1 Sketch diagram showing the four possible sampled paths obtained for level l = 2, and for a matrix of size N = 10. The solid line corresponds to a random path obtained for a level number l, and the dotted line to l − 1

Fig. 2
Fig. 2 Absolute numerical error in log-scale made when truncating the improper integral in Eq. (1) using a finite limit, T. The solution corresponds to the Katz centrality of two complex networks: a a small-world network, and b a scale-free network, both of size 1000 nodes. The dashed (Approx 1) and dot-dashed (Approx 2) lines correspond to the solution obtained using the approximation of the largest eigenvalue as d_max and max(d_avg, √d_max), respectively. The value of the variable s was chosen to be d_max/0.85

Fig. 3 a
Fig. 3 a Mean and b variance of P_l − P_{l−1} in log_2 scale versus the level number l, obtained numerically. The adjacency matrices correspond to small-world networks of different sizes n. The brown line denotes an ancillary function of slope −2 (Color figure online)

Fig. 4
Fig. 4 Computational time in log_2 scale versus the level l for adjacency matrices corresponding to two different complex networks of size n = 10^6. The brown line corresponds to an ancillary function of slope 1 (Color figure online)

Fig. 5
Fig. 5 Computational time as a function of the prescribed accuracy ε, both in log_10 scale. The brown solid line corresponds to an ancillary function of slope −5/2, while the dotted line to a function of slope −2. The results correspond to the Katz centrality for a single node of a small-world network of size n = 10^6 (Color figure online)

Fig. 6
Fig. 6 Computational mesh describing the domain used for solving the 2D reaction-diffusion equation

Fig. 7
Fig. 7 Total subgraph communicability of a single node (n/2) for different values of the effective temperature t. The network is a small-world network of size n = 10^7. The solid line corresponds to the solution obtained using a Krylov-based method, while the dotted line corresponds to the solution obtained simultaneously when computing the Katz centrality using a modified MLMC algorithm capable of saving the ancillary quantities. The inset shows the relative error of the computed solution

Table 1
CPU time spent for computing the Katz centrality of a small-world network using the Monte Carlo method (MC), the multilevel Monte Carlo method (MLMC), and the classical Monte Carlo method (MC_c)

Table 2
CPU time spent for computing the Katz centrality of a scale-free network using the Monte Carlo method (MC), the multilevel Monte Carlo method (MLMC), and the classical Monte Carlo method (MC_c)

Table 3
CPU time spent for computing the solution of the 2D Poisson equation at a single point using the Monte Carlo method (MC), the multilevel Monte Carlo method (MLMC), and the classical Monte Carlo method (MC_c)

Table 4
CPU time spent for computing the solution of a 2D reaction-diffusion equation at a single point using the Monte Carlo method (MC), the multilevel Monte Carlo method (MLMC), and the classical Monte Carlo method (MC_c)

Table 5
Elapsed time spent for computing the Katz centrality of a small-world network as a function of the number of cores. The size of the matrix is n = 10^7, and the accuracy ε was kept fixed to 3 × 10^{-5}; the simulations were run on an octa-core server