A new perspective on the quaternionic numerical range of normal matrices

A new geometric proof of a known result characterizing the quaternionic numerical range of normal matrices is proposed. Our proof can be interpreted in probabilistic terms.


Introduction and preliminaries
Let $\mathbb{F} \in \{\mathbb{H}, \mathbb{C}\}$, where $\mathbb{H}$ and $\mathbb{C}$ stand, respectively, for the skew field of quaternions and the field of complex numbers. Let $S_{\mathbb{F}^n} = \{x \in \mathbb{F}^n : \|x\| = 1\}$. For a square matrix $A$ of size $n > 1$ over $\mathbb{F}$, $A \in M_n(\mathbb{F})$, the set $W_{\mathbb{F}}(A) = \{x^*Ax : x \in S_{\mathbb{F}^n}\}$ is called the numerical range of $A$ in $\mathbb{F}$. Recall that every quaternionic normal matrix is unitarily equivalent to a complex diagonal matrix $A = \operatorname{diag}(d_k)_{k=1}^{n}$, where $d_k = h_k + s_k i$ ($s_k \geq 0$) are the eigenvalues of $A$ (see [1, p. 178]). Hence, we can write $A = H + Si$, where $H$ and $S$ are real diagonal matrices. Since the numerical range is invariant under unitary equivalence (see [2, Theorem 3.5.4]), we will consider normal matrices in diagonal form. We will also assume, without loss of generality, that $h_1 = \min\{h_k : k = 1, \ldots, n\}$ and, until Theorem 2.4, that $s_k > 0$. So et al. ([1, Main Theorem, p. 192]) proved that the upper bild is the convex hull of the eigenvalues and certain real numbers, constructed from pairs of non-real eigenvalues, named cone vertices. The main idea of their proof was to define an optimization problem with side conditions and use Lagrange multipliers. In 1995, two independent proofs were presented by Au-Yeung [3] and Zhang [4]. In this article, we propose a new geometric proof of the same result, inspired by probability theory.
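We start by characterizing the elements in the quaternionic numerical range. It is an easy exercise to show that any element in the quaternionic numerical range can be written as
\[ z^*Az = x^*Ax + y^*A^*y + 2i\,(x^*Sy)\,j, \]
using the decomposition $z = x + yj \in \mathbb{H}^n$, with $x, y \in \mathbb{C}^n$. Taking into account that $A$ can be written as $A = H + Si$, with $H = \frac{A + A^*}{2}$ and $S = \frac{A - A^*}{2i}$, it follows that $B(A) = W_{\mathbb{H}}(A) \cap \mathbb{C}$, the bild of $A$, is given by (see [5])
\[ B(A) = \{x^*Ax + y^*A^*y : x^*x + y^*y = 1,\ x^*Sy = 0\}. \tag{1} \]
For future reference denote the above conditions by:
(I) $x^*x + y^*y = 1$;
(II) $x^*Sy = 0$.
It is useful to work instead with the upper bild $B^+(A) = B(A) \cap \mathbb{C}^+$ since this allows us to use convexity (see [6]). From Equation (1), each element $\omega \in B^+(A)$ is a convex combination of elements $\omega_x \in W_{\mathbb{C}}(A) \subseteq \mathbb{C}^+$ (because $s_k > 0$) and $\omega_y \in W_{\mathbb{C}}(A^*) \subseteq \mathbb{C}^-$, that is, $\omega = \alpha\omega_x + (1 - \alpha)\omega_y$, with $\alpha \in (0, 1]$. Since there is a real $\omega_r = \beta\omega_x + (1 - \beta)\omega_y$, $0 < \beta < \alpha$, that lies in the same segment $[\omega_x, \omega_y]$, we conclude that $\omega$ is a convex combination of $\omega_x$ and $\omega_r$. Therefore,
\[ B^+(A) = \operatorname{conv}\{W_{\mathbb{C}}(A), \underline{v}, \overline{v}\}, \]
where $\underline{v}$ and $\overline{v}$ denote the minimum and the maximum real elements of the bild. In other words, $B^+(A) = \operatorname{conv}\{d_1, \ldots, d_n, \underline{v}, \overline{v}\}$ and, in order to determine the shape of $B^+(A)$, we just need to obtain the minimum and maximum reals in the numerical range (we will focus on $\underline{v}$). For this purpose, it is important to characterize the real elements in the bild. It follows from (1) and the decomposition $A = H + Si$ that a further condition on $(x, y)$ must be satisfied:
(III) $x^*Sx - y^*Sy = 0$.
Together with (I) and (II), the three conditions are necessary and sufficient for an element of the form (1) to belong to the real part of the numerical range. Some of these real elements are given by
\[ c_{i,j} = \alpha_{i,j}\, d_i + (1 - \alpha_{i,j})\, d_j^*, \tag{2} \]
where $\alpha_{i,j} \in (0, 1)$ satisfies
\[ \alpha_{i,j}\, s_i - (1 - \alpha_{i,j})\, s_j = 0. \tag{3} \]
We now define the relevant values of $c_{i,j}$ for future use,
\[ \underline{c} = \min\{c_{i,j} : i \neq j\}, \qquad \overline{c} = \max\{c_{i,j} : i \neq j\}. \]
If we take $x = \sqrt{\alpha_{i,j}}\, e_i$ and $y = \sqrt{1 - \alpha_{i,j}}\, e_j$ ($e_k$ denotes the $k$th canonical unit vector), which satisfy (I)-(III) when $i \neq j$, we see that $c_{i,j} \in W_{\mathbb{H}}(A)$ whenever $i \neq j$. Moreover, since $h_1$ is minimal, $\underline{c} = c_{1,k}$ for some $k \neq 1$. Above, we proved that $c_{1,j} \in W_{\mathbb{H}}(A)$ for any $j \neq 1$. Since any other $\omega$ must be a convex combination of elements of $D = \{d_1, \ldots, d_n\}$ and $D^* = \{d_1^*, \ldots, d_n^*\}$, the only possible real value smaller than all $c_{1,j}$ would be $c_{1,1}$.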
However, as we will prove, c 1,1 does not belong to the numerical range, as it is obtained from a pair (x, y) which does not satisfy conditions (I)-(III).
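The claims above can be checked numerically. The following Python sketch is not part of the original argument. It assumes that conditions (I)-(III) read $x^*x + y^*y = 1$, $x^*Sy = 0$ and $x^*Sx - y^*Sy = 0$, and that $\alpha_{i,j} = s_j/(s_i + s_j)$, which is the solution of $\alpha s_i - (1 - \alpha)s_j = 0$. The function names, the sample eigenvalues and the 0-based indexing are illustrative choices only.

```python
import math

def alpha(s, i, j):
    """Weight a solving a*s[i] - (1 - a)*s[j] = 0 (assumes s[i] + s[j] > 0)."""
    return s[j] / (s[i] + s[j])

def check_pair(h, s, i, j):
    """Evaluate (I)-(III) and the bild element x*Ax + y*A*y for
    x = sqrt(alpha) e_i and y = sqrt(1 - alpha) e_j (0-based indices)."""
    n = len(h)
    a = alpha(s, i, j)
    x = [0j] * n
    y = [0j] * n
    x[i] = complex(math.sqrt(a))
    y[j] = complex(math.sqrt(1.0 - a))
    d = [complex(hk, sk) for hk, sk in zip(h, s)]
    # (I): squared norm of (x, y), expected to equal 1
    norm = sum(abs(v) ** 2 for v in x) + sum(abs(v) ** 2 for v in y)
    # (II): x*Sy, expected to vanish
    x_s_y = sum(sk * xk.conjugate() * yk for sk, xk, yk in zip(s, x, y))
    # (III): x*Sx - y*Sy, expected to vanish
    balance = sum(sk * (abs(xk) ** 2 - abs(yk) ** 2)
                  for sk, xk, yk in zip(s, x, y))
    # x*Ax + y*A*y for the diagonal matrix A = diag(d)
    value = sum(dk * abs(xk) ** 2 + dk.conjugate() * abs(yk) ** 2
                for dk, xk, yk in zip(d, x, y))
    return norm, x_s_y, balance, value

# Hypothetical eigenvalues d_k = h_k + s_k i with s_k > 0 and h_1 minimal.
h = [1.0, 2.0, 4.0]
s = [1.0, 3.0, 2.0]

off_diag = check_pair(h, s, 0, 1)  # pair (d_1, d_2*): all three conditions hold
diag = check_pair(h, s, 0, 0)      # pair (d_1, d_1*): condition (II) fails
print(off_diag)
print(diag)
```

For the off-diagonal pair the value is real and equals $c_{1,2}$. For the diagonal pair the value equals $h_1$, but $x^*Sy = s_1/2 \neq 0$, so this choice of $(x, y)$ does not produce an element of the bild.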
In order to motivate our approach we now introduce some concepts in a heuristic way. A convex combination can be seen as the expected value of a probability distribution. For instance, if a random variable takes the value $a_k$ with probability $p_k$, its expected value is the convex combination $\sum_k p_k a_k$. Our proof relies in part on this observation. In particular, we will interpret an element of the bild as the expected value
\[ \omega = \sum_{i=1}^{n} |x_i|^2 d_i + \sum_{j=1}^{n} |y_j|^2 d_j^* \tag{5} \]
of the probability distribution $\gamma = (x_{||}, y_{||})$ over $D \cup D^*$, where $(x, y) \in S_{\mathbb{C}^{2n}}$, with $x_{||} = (|x_1|^2, \ldots, |x_n|^2)$ and $y_{||} = (|y_1|^2, \ldots, |y_n|^2)$. However, we will look at this probability distribution differently. Namely, we will use the probability distribution that arises from the process of first choosing randomly a pair $(d_i, d_j^*)$, using a probability $\theta \in \Delta(D \times D^*)$, and then choosing randomly one element from that pair, $d_i$ with probability $\alpha_{i,j}$ and $d_j^*$ with probability $1 - \alpha_{i,j}$. Here, $\Delta(S)$ denotes the set of probability distributions over $S$. This process creates a new probability distribution $\alpha(\theta)$ over $D \cup D^*$, i.e. $\alpha(\theta) \in \Delta(D \cup D^*)$. The choice of $\theta$ should be coherent with the initial probability $\gamma$, in the sense that $\alpha(\theta) = \gamma$.
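Using the law of total probability, the probability that $\alpha(\theta)$ assigns to the element $d_i$ is
\[ \alpha(\theta)(d_i) = \sum_{j=1}^{n} \theta_{i,j}\, \alpha_{i,j}. \]
Analogously, for the element $d_j^*$, we have
\[ \alpha(\theta)(d_j^*) = \sum_{i=1}^{n} \theta_{i,j}\, (1 - \alpha_{i,j}). \]
In this way, we define the function $\alpha : \Delta(D \times D^*) \to \Delta(D \cup D^*)$, $\theta \mapsto \alpha(\theta)$, where $\alpha(\theta)(d)$ is given by the formulas above for all $d \in D \cup D^*$. Accordingly, we define $\mathcal{D}(\gamma)$ to be the set of probability distributions $\theta \in \Delta(D \times D^*)$ coherent with $\gamma \in \Delta(D \cup D^*)$, that is,
\[ \mathcal{D}(\gamma) = \{\theta \in \Delta(D \times D^*) : \alpha(\theta) = \gamma\}. \]
We will show that the set of coherent probability distributions is non-empty (see Lemma 2.2). Moreover, we will find that, if $(x, y)$ satisfies (I)-(III), there is a coherent $\theta$ such that $\theta(d_1, d_1^*) = \theta_{1,1} = 0$, meaning that this distribution gives probability zero to the pair $(d_1, d_1^*)$ (see Lemma 2.3). Using (I)-(III) on (1), a real element in the bild can be written as $h = x^*Hx + y^*Hy$. Rewriting $h$ as a convex combination of the $c_{i,j}$'s, $h = \sum_{i,j} \theta_{i,j}\, c_{i,j}$, we find that $h \geq \underline{c}$, since $\theta_{1,1} = 0$. This is the content of the main result of the paper (see Theorem 2.4). We conclude by observing that the ideal case where $\underline{v} = h_1$ is not attainable and the minimum is, morally speaking, the second best case.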
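The two-stage random process can be made concrete with a short Python sketch. This is only an illustration under stated assumptions: the selection weights are taken to be $\alpha_{i,j} = s_j/(s_i + s_j)$, the marginals of $\alpha(\theta)$ are computed as $\sum_j \theta_{i,j}\alpha_{i,j}$ for $d_i$ and $\sum_i \theta_{i,j}(1 - \alpha_{i,j})$ for $d_j^*$, and the eigenvalues and the matrix `theta` are hypothetical sample data with `theta[0][0] = 0`.

```python
def alpha(s, i, j):
    """Probability of picking d_i from the pair (d_i, d_j*); assumed s_j / (s_i + s_j)."""
    return s[j] / (s[i] + s[j])

def pushforward(theta, s):
    """Law of total probability: distribution alpha(theta) over D and D*."""
    n = len(s)
    p = [sum(theta[i][j] * alpha(s, i, j) for j in range(n)) for i in range(n)]
    q = [sum(theta[i][j] * (1.0 - alpha(s, i, j)) for i in range(n)) for j in range(n)]
    return p, q  # p[i] is the mass on d_i, q[j] is the mass on d_j*

def cone_value(h, s, i, j):
    """Real number c_{i,j} = alpha_{i,j} h_i + (1 - alpha_{i,j}) h_j."""
    a = alpha(s, i, j)
    return a * h[i] + (1.0 - a) * h[j]

# Hypothetical data: d_k = h_k + s_k i, and a distribution theta over pairs
# (d_i, d_j*) that gives probability zero to the pair (d_1, d_1*).
h = [1.0, 2.0, 4.0]
s = [1.0, 3.0, 2.0]
theta = [[0.0, 0.3, 0.1],
         [0.2, 0.1, 0.1],
         [0.1, 0.1, 0.0]]
n = len(h)

p, q = pushforward(theta, s)
d = [complex(hk, sk) for hk, sk in zip(h, s)]

# Expected value under alpha(theta) over the union of D and D*.
expected = sum(p[i] * d[i] for i in range(n)) + sum(q[j] * d[j].conjugate() for j in range(n))
# The same number written as a convex combination of the c_{i,j}.
mixture = sum(theta[i][j] * cone_value(h, s, i, j) for i in range(n) for j in range(n))
# Smallest c_{i,j} over pairs with i != j.
c_low = min(cone_value(h, s, i, j) for i in range(n) for j in range(n) if i != j)

print(p, q)
print(expected, mixture, c_low)
```

In this example the expected value is real, agrees with $\sum_{i,j}\theta_{i,j}c_{i,j}$, and is bounded below by the smallest off-diagonal $c_{i,j}$, which mirrors the inequality $h \geq \underline{c}$ obtained when $\theta_{1,1} = 0$.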

Numerical range of normal matrices
We begin with a characterization of the upper bild in terms of the complex numerical range and two real values. On the other hand, there is $\beta \in (0, \alpha)$ such that $\omega_r = \beta\omega_x + (1 - \beta)\omega_y$ is real (see [7, Corollary 3.3]). By convexity and compactness of the upper bild, the set $B(A) \cap \mathbb{R}$ is a closed interval, and so $\omega \in \operatorname{conv}\{W_{\mathbb{C}}(A), \underline{v}, \overline{v}\}$. We then have that $B^+(A) \subseteq \operatorname{conv}\{W_{\mathbb{C}}(A), \underline{v}, \overline{v}\}$. The converse inclusion follows trivially from the convexity of the upper bild.

satisfies (I) and (II). It is easy to see that
To characterize the reals $\underline{v}$ and $\overline{v}$, we need some technical lemmas. We start by noticing that the set $\mathcal{D}(\gamma)$ of probability distributions over $D \times D^*$ coherent with $\gamma = (x_{||}, y_{||})$ (see (5)) is, under mild conditions, non-empty. This result is a special case of a more general lemma that we prove in the appendix.
Lemma 2.2: For any $(x, y)$ satisfying (III), $\mathcal{D}(x_{||}, y_{||}) \neq \emptyset$.

Proof:
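We start by noting that
\[ s_1|x_1|^2 > \sum_{i=2}^{n} s_i|y_i|^2 \quad \text{and} \quad s_1|y_1|^2 > \sum_{i=2}^{n} s_i|x_i|^2 \tag{6} \]
implies $x^*Sy \neq 0$.
In fact, if (6) holds, by Hölder's inequality and the triangle inequality we have
\[ \Big|\sum_{i=2}^{n} s_i\, y_i x_i^*\Big| \leq \sum_{i=2}^{n} s_i |x_i||y_i| \leq \Big(\sum_{i=2}^{n} s_i|x_i|^2\Big)^{1/2} \Big(\sum_{i=2}^{n} s_i|y_i|^2\Big)^{1/2} < s_1|x_1||y_1|. \]
Thus, $s_1 x_1^* y_1 \neq -\sum_{i=2}^{n} s_i\, y_i x_i^*$, and $x^*Sy \neq 0$. Therefore, if (II) holds, then
\[ s_1|x_1|^2 \leq \sum_{i=2}^{n} s_i|y_i|^2 \quad \text{or} \quad s_1|y_1|^2 \leq \sum_{i=2}^{n} s_i|x_i|^2. \tag{7} \]
From (III), which can be written as $\sum_{i=1}^{n} s_i(|x_i|^2 - |y_i|^2) = 0$, the two conditions in (7) are equivalent.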
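We will now consider the case where $S \geq 0$. Let $A = A_1 \oplus A_2$, where $A_j = H_j + S_j i$ ($j = 1, 2$) with $S_1 = 0$ and $S_2 > 0$. Without loss of generality, we may assume that $A_1$ is of size $k$, so that $s_i = 0$ for $1 \leq i \leq k$ and $s_i > 0$ for $k + 1 \leq i \leq n$. Then $\underline{v} = \min\{\underline{v}_1, \underline{v}_2\}$, where $\underline{v}_j$ denotes the minimum real element of the bild of $A_j$. We know that $\underline{v}_1$ is the smallest entry on the diagonal of $A_1$. From the previous case, since $S_2 > 0$, we have $\underline{v}_2 = \min\{c_{i,j} : i \neq j,\ i, j \in \{k + 1, \ldots, n\}\}$. Taking into account (2) and (3), $c_{i,j} = h_i \geq \underline{v}_1$ when $1 \leq i \leq k$ and $k + 1 \leq j \leq n$, and the conclusion that $\underline{v} = \min\{c_{i,j} : i \neq j\} = \underline{c}$ follows. The proof for the maximum goes along the same lines.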

Disclosure statement
No potential conflict of interest was reported by the author(s).