Repositório ISCTE-IUL

Non-orthogonal multiple access (NOMA) concatenated with multiple-input multiple-output (MIMO) or with massive MIMO has been under scrutiny for both broadband and machine-type communications (MTC), even though it has not been adopted in the latest 5G standard (3GPP Release 16), being left for beyond 5G. This paper dwells on the problems causing such cautiousness, and surveys different NOMA proposals for the downlink in cell-centered systems. Because acquiring channel state information at the transmitter (CSIT) may be hard, open-loop operation is an option. However, when user clustering is possible, due to some common statistical CSI, closed-loop operation should be exploited. The paper numerically compares these two operating modes. The users are clustered in beams and then successive interference cancellation (SIC) separates the power-domain NOMA (PD-NOMA) signals at the terminals. In the precoded closed-loop system, the Karhunen-Loève channel decomposition is used, assuming that users within a cluster share the same slowly changing spatial correlation matrix. For a comparable number of antennas the two options perform similarly; however, while in the open-loop downlink the number of antennas at the BS is limited in practice, this restriction is waived in the precoded systems, with massive MIMO allowing for a larger number of clusters.


I. INTRODUCTION
Non-orthogonal multiple access (NOMA) has been much studied for next-generation wireless systems, including 5G, in which dense networks are envisaged, owing to its ability to further enhance the overall spectral efficiency [1,2,3]. In contrast to orthogonal multiple access (OMA), in NOMA all users are superimposed in the time, frequency, or code domains and then separated by means of successive interference cancellation (SIC) or parallel interference cancellation (PIC), which allows all points in the capacity region of the multiple access channel (MAC) to be achieved [4]. In power-domain NOMA (PD-NOMA) the signals are separated at the receivers by taking into consideration their different power levels. Other types of NOMA exist, and a range of categories is described in [5]: scrambling-based NOMA, spreading-based NOMA, coding-based NOMA, and interleaving-based NOMA. In addition to those, NOMA based on the partitioning of lattices [6] has also been shown to perform well even when the channel gains of the users in the same cluster are similar [7,8,9,10], whereas PD-NOMA requires dissimilar channel gains for good performance. Both multiple-input multiple-output (MIMO) schemes compared in this paper use PD-NOMA, overwhelmingly the most studied type of NOMA. Clustered MIMO-NOMA has also been studied for millimeter waves ranging from 28 GHz up to 73 GHz [11], exhibiting a higher sum-rate than OMA, as initially suggested by [3]. In [12] multiple analog beams are formed to create more NOMA groups and therefore increase performance for any angular distribution of the users' positions.
NOMA was included in long-term evolution (LTE) Release 14, under the name multi-user superposition transmission (MUST) [13], multiplexing two users. Despite the information-theoretic foundations of NOMA [14], its practical application has been postponed by 3GPP for the 5G standard, after having been analyzed during the 3GPP technical tasks. The authors in [15] show some reasons underpinning that decision. They show that NOMA only outperforms multi-user MIMO (MU-MIMO) when the system loading (defined with respect to the length of the quasi-orthogonal sequences used in a spreading-based or coding-based NOMA system) gets larger. At lower overloading factors, however, the use of several slots (or resources in general) by a spreading-based or coding-based NOMA makes it less spectrally efficient than MU-MIMO. NOMA only outperforms MU-MIMO at high signal-to-noise ratio (SNR), due to the inherent interference limitation of MU-MIMO, but even so with almost negligible gain (around 1 dB).
In the case of PD-NOMA, the number of users supported in the power domain is always very low, typically two or three users, due to SIC error propagation. The issue of fair power allocation is open to discussion over the definition of fairness in the context of NOMA (see [16] and references therein). In [17] a power allocation policy based on stochastic geometry was proposed to take into account the distribution of the users' locations. While most papers analyze NOMA in a single-cell scenario, real multi-cell scenarios raise the problem of how to associate users to a cell while maximizing the system's sum-rate. This problem is dealt with in [18], applying matching-theoretic algorithms. The multi-cell scenario can be enhanced by considering different types of service requirements in different cells, and a power allocation algorithm that takes into consideration different types of data traffic is proposed in [19]. The benefits of PD-NOMA over OMA highly depend on the differences between the channel gains to each user, and a well-designed system should have the option of switching to OMA at times. The authors in [20] recently bridged this gap by proposing a utility cost that takes into consideration the costs and gains associated with each MAC mode, such that the system can choose between them. When device-to-device (D2D) communication exists in a cellular communication environment, NOMA can be used to multiplex the communication from one transmitting terminal to two receiving devices [21]. While the number of users that can be multiplexed in PD-NOMA is extremely limited (2 or 3 users only), the concatenation of PD-NOMA with orthogonal frequency division multiplexing (OFDM) largely increases the number of multiplexed users in the overall system. The problem then becomes one of user grouping and power allocation, for which simple greedy algorithms can perform quite well [22].
When MIMO-NOMA is considered, the natural approach is to spatially cluster users which, from a MU-MIMO precoding point of view, behave as one virtual-user [23,24]. Subsequently, the messages to each user in a cluster are separated by SIC. This requires channel state information at the transmitter (CSIT), which can be challenging to obtain, and [25] shows how to refine the quality of the CSIT. In the scenario of machine-type communications (MTC) [26], where terminals are particularly simple and energy-constrained, CSIT is even harder to attain. Moreover, user grouping is also hard given the combinatorial nature of the problem. An optimal solution to the user-selection problem to form NOMA groups (which are then differentiated in some orthogonal domain) is given in [27], but only for a single-antenna BS and single-antenna users, and only for groups with two users. Obtaining CSIT with a massive MIMO base station (BS) is even more challenging, due to the sheer number of channels; a technique to mitigate intra-cluster pilot contamination has been proposed in [28].
This paper looks at the two most important MIMO-NOMA downlink setups using clusters of users: the first system operating in open-loop (analyzed in Section II) and the second in closed-loop using precoding at the BS (analyzed in Section III). The two schemes were proposed by Ding et al. in [29] and [30], respectively. The former system has also been analyzed in [31] in terms of its information-theoretic achievable rates. The open-loop (unprecoded) system requires the number of antennas at the terminals to be equal to or greater than the number of antennas at the BS in order to take advantage of the null space that the extra dimensions permit, which constitutes a strong limitation on the number of clusters. Both schemes are assessed and compared in this paper not from an information-theoretic point of view, as is typical in the NOMA literature [31,32], but rather via numerical simulation.

A. System Model
Consider a downlink multi-user open-loop MIMO transmission with $M$ antennas at the BS and $N \geq M$ antennas at each user, similar to the one in [29,31], where users are grouped in $M$ clusters of $K$ users, multiplexed with PD-NOMA (see Fig. 1). The BS transmits $x = Ps$, where $P$ is the $M \times M$ precoding matrix, which in the open-loop system corresponds to an identity matrix, given that there is no precoding at the BS, and therefore no CSIT is needed at the BS. The transmitted vector $s \in \mathbb{C}^{M \times 1}$ is constructed as:

$$ s = \begin{bmatrix} \alpha_{1,1} s_{1,1} + \cdots + \alpha_{1,K} s_{1,K} \\ \vdots \\ \alpha_{M,1} s_{M,1} + \cdots + \alpha_{M,K} s_{M,K} \end{bmatrix} \qquad (1) $$

where $s_{m,k} \in \mathbb{C}$ is the BPSK or QAM symbol to be transmitted to the $k$-th user in the $m$-th cluster and the coefficient $\alpha_{m,k}^2 \in [0, 1]$ defines the power allocation for the $k$-th user in the $m$-th cluster. This system can be seen as a multi-user MIMO (MU-MIMO) system (also known as the broadcast channel [6]), where each cluster plays the role of an aggregated virtual-user, and later the information to each user within each cluster is distilled from the NOMA symbol detected by the cluster. The set of power coefficients is selected such that $\sum_{k=1}^{K} \alpha_{m,k}^2 = 1$ [29]. In the worst case, a user within a cluster will have to decode $K - 1$ signals from other users with higher power allocation coefficients than its own. The signal received at the $k$-th user in the first cluster is:

$$ y_{1,k} = H_{1,k} P s + n_{1,k} \qquad (2) $$

where $H_{1,k} \in \mathbb{C}^{N \times M}$ is the Rayleigh flat-fading matrix from the BS to the $k$-th user in the first cluster and $n_{1,k}$ is the unit-power additive white Gaussian noise vector for the $k$-th user in the first cluster. The noise is taken from an independent circularly symmetric complex Gaussian distribution, i.e., $n_{1,k} \sim \mathcal{CN}(0, \sigma_n^2) \in \mathbb{C}^{N \times 1}$. The channel matrix for the first user in the first cluster is denoted as $H_{1,1} \in \mathbb{C}^{N \times M}$. Linear detection at each terminal is made by multiplying the incoming signal (2) by the detection vector, leading to: where $v_{1,k}^H$ denotes the Hermitian transpose of $v_{1,k}$, and $w_l$ is an indicator vector. This relation can be expanded, knowing that at the first cluster one is interested only in the sum α 
where $s_m \in \mathbb{C}$ is the contribution of cluster $m$ to the $s$ vector. The aim is to eliminate the inter-cluster interference for any $i \neq m$. The matrix $\tilde{H}_{i,k} \in \mathbb{C}^{N \times (M-1)}$ is constructed by removing the $m$-th column of the matrix $H_{m,k}$. The problem can now be rewritten as: where the detection vector must belong to a space that is orthogonal to $\tilde{H}_{i,k}$. Let us expand the matrix $\tilde{H}_{m,k}$ into its SVD for the case $M = N$: where $U_{i,k}$ is a unitary matrix: and $\lambda$ has the diagonal form: Note that (9) has a zero row at the bottom (even if $M = N$) because after removing a column from $H_{m,k}$ to create $\tilde{H}_{i,k}$, the matrix becomes tall and thus rank-deficient. In general, there will be $(N - M) + 1$ rows of zeros in the matrix of singular values. One can see that the column highlighted in (8) (which is a matrix in the general case), $\tilde{U}_{i,k} \in \mathbb{C}^{N \times (N-M+1)}$, does not contribute to $\tilde{H}_{i,k}$ since it is multiplied by the row of zeros (or a zero fat matrix in general), thus spanning a space orthogonal to $\tilde{H}_{i,k}$. Next, one projects the $h_{m,ik}$ column onto the orthogonal space using the projection matrix which is equally applied by all the users in the $m$-th cluster, eliminating the inter-cluster interference because (10) fulfils the requirement established in (5). Consequently, $N \geq M$ antennas are needed at each user, otherwise the $\tilde{H}_{i,k}$ matrix becomes fat rather than tall and there is no orthogonal space spanned by the columns of $U_{i,k}$ in (8). Without loss of generality, focusing on the first cluster, the channel gains of the different users in the first cluster should be sorted in decreasing order, so that user 1 has the largest gain. Note that this ordering happens within each cluster, and all clusters are statistically identical. Zero-forcing (ZF) detection is then applied at each terminal in the cluster: leading to a sum of the intended NOMA signal for that cluster perturbed by a noise term.
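As a minimal sketch of the null-space idea described above, consider the $M = 2$ case, in which removing one column of the channel matrix leaves a single interfering column $h$, so the projector onto its orthogonal complement has the closed form $I - h h^H / \|h\|^2$. The Python snippet below is an illustration under that assumption only, not the simulation code behind the figures; the helper names (`inner`, `project_out`, `rayleigh_column`) and the random seed are hypothetical.

```python
import random


def inner(a, b):
    """Hermitian inner product a^H b of two complex vectors."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def project_out(h_target, h_interf):
    """Project h_target onto the orthogonal complement of h_interf.

    This applies (I - h h^H / ||h||^2) to h_target, with h = h_interf.
    """
    scale = inner(h_interf, h_target) / inner(h_interf, h_interf)
    return [t - scale * i for t, i in zip(h_target, h_interf)]


def rayleigh_column(n, rng):
    """One column of a Rayleigh flat-fading channel with CN(0, 1) entries."""
    std = 0.5 ** 0.5
    return [complex(rng.gauss(0, std), rng.gauss(0, std)) for _ in range(n)]


rng = random.Random(0)          # fixed seed, illustrative only
N = 3                           # receive antennas at the user
h_own = rayleigh_column(N, rng)    # column carrying the user's own cluster
h_other = rayleigh_column(N, rng)  # column carrying the other cluster

v = project_out(h_own, h_other)    # candidate detection vector
leak = abs(inner(v, h_other))      # residual inter-cluster interference
gain = abs(inner(v, h_own))        # gain seen by the intended NOMA symbol
```

In this sketch `leak` is zero up to rounding, while `gain` equals the squared norm of `v`, so the intended cluster signal survives the projection. For $M > 2$ the same role would be played by the SVD-based projector discussed in the text.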
For SIC detection to be possible with BPSK, the following constraint is imposed: for users $1 \leq k \leq K$ in the $m$-th cluster, even though it disregards fairness. One follows the rule $\alpha_{m,k-1}^2 = 0.5 \times \alpha_{m,k}^2$. The rule follows the geometric progression of ratio $1/2$ deprived of its first term (the one with $k = 0$): the value of $\sum_{k=1}^{N} (1/2)^k$ tends to 1 as $N$ tends to infinity, and the restriction (12) is naturally fulfilled. A similar strategy was proposed in the context of visible light communications (VLC) using decaying factors 0.3 and 0.4 instead of 0.5 [33].
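The halving rule can be illustrated with a short Python sketch. It assumes that the coefficients are normalized so that they sum exactly to one for a finite number of users, in line with the unit-sum condition on the power coefficients; the function name is hypothetical.

```python
def halving_power_coefficients(num_users):
    """Power coefficients with alpha^2_{k-1} = 0.5 * alpha^2_k, summing to 1.

    Index 0 holds the smallest coefficient (user 1); the last index holds
    the largest one (user K).
    """
    weights = [2 ** k for k in range(num_users)]  # 1, 2, 4, ...
    total = sum(weights)                          # equals 2**K - 1
    return [w / total for w in weights]


coeffs = halving_power_coefficients(5)
smallest_five = coeffs[0]                         # 1/31
smallest_six = halving_power_coefficients(6)[0]   # 1/63

# With this rule each coefficient exceeds the sum of all smaller ones.
dominant = all(coeffs[k] > sum(coeffs[:k]) for k in range(1, len(coeffs)))
```

Under this normalization, the weakest share halves roughly with every added user (1/31 for five users, 1/63 for six), which gives a feel for why only a few users can be stacked in the power domain.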

B. Performance
A two-user PD-NOMA case with null-space based MIMO is assessed with BPSK and different QAM modulation schemes in Figures 2, 3 and 4, with M = 2, N = 3, and K = 2 in all cases. Subsequently, a five-user case with BPSK is also assessed.
The well-known two regimes of PD-NOMA emerge, depending on the SNR. Consider user 1 to be the one with the lowest power allocation (i.e., the one with the larger channel gain). At low SNR, user 1 can incorrectly detect the signal with the larger power coefficient and propagate the error, incorrectly decoding its own signal. At high SNR, user 2 still has to cope with the extra degradation imposed by the interference from the signal to user 1, yielding a poorer performance. In Figures 2 and 4, user 2 clearly outperforms user 1 in the low SNR regime. As expected, when using higher-order modulation schemes at user 2, that user's performance is degraded. Interestingly, when users 1 and 2 respectively apply BPSK and 16-QAM, the two regimes do not appear in Fig. 3 because at low SNR the errors arising at the initial detection stage in user 1 are not significant enough to corrupt the BPSK detection of user 1.
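The error-propagation mechanism at user 1 can be made concrete with a toy two-user BPSK example in Python. This is a sketch under simplifying assumptions (real-valued scalar channel with unit gain, a 1/4 and 3/4 power split chosen for illustration, and a hand-picked noise sample); all function names are hypothetical.

```python
import math


def bpsk_decide(x):
    """Hard BPSK decision on a real-valued sample."""
    return 1 if x >= 0 else -1


def sic_detect_user1(y, a1, a2):
    """User 1: detect the high-power symbol, cancel it, then detect its own."""
    s2_hat = bpsk_decide(y)
    s1_hat = bpsk_decide(y - a2 * s2_hat)
    return s1_hat, s2_hat


a1, a2 = math.sqrt(0.25), math.sqrt(0.75)   # amplitudes of users 1 and 2

# Without noise, SIC recovers both symbols for every symbol pair.
noise_free_ok = [
    sic_detect_user1(a1 * s1 + a2 * s2, a1, a2) == (s1, s2)
    for s1 in (-1, 1)
    for s2 in (-1, 1)
]

# One noisy sample: s1 = -1, s2 = +1, noise = -0.5.
y = a1 * (-1) + a2 * (+1) - 0.5
s1_hat, s2_hat = sic_detect_user1(y, a1, a2)   # first stage fails

# Genie-aided cancellation with the true s2 would still decode s1 correctly.
genie_s1 = bpsk_decide(y - a2 * (+1))
```

Here the noise sample flips the first-stage decision on the high-power symbol, and the wrong cancellation then flips user 1's own symbol as well, even though the same sample would have been decoded correctly had the first stage succeeded.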
The robustness of the system is chiefly defined by the relations between the power coefficients. In Figures 2 and 3, $\alpha_1 = 1/4$ and $\alpha_2 = 3/4$ were used to compare with the results in Fig. 1 in [29]. In Fig. 4, one has $\alpha_1 = 1/17$ and $\alpha_2 = 16/17$. Comparing Fig. 2 with Fig. 1 in [29], one observes that the SER is bounded by the outage probability.
For the five-user case, with $M = 2$ and $N = 2$, the users are ordered such that user 1 has the best channel and user 5 the worst. In Fig. 5, one can see that users with higher power allocation coefficients have a better (lower) SER at low SNR and a worse performance at high SNR, exhibiting the same dual regime. The $\alpha_{m,k}$ were defined by the normalized set {1, 2, 4, 8, 16}. With six users and the same power allocation rule, $\alpha_{m,1}$ becomes too small, and user 1 gets a SER > 0.5 for SNR = 10 dB, showcasing the limitations of PD-NOMA.

A. System Model
Consider a scenario similar to the previous one, but now with a massive-MIMO BS with $M$ antennas transmitting to users equipped with $N$ antennas. The users are also grouped into $L$ clusters, each with $K$ users. All users have different channel matrices; however, the users in the $l$-th cluster all share the same spatial correlation matrix $R_l$. In such cases one can apply the Karhunen-Loève channel decomposition [34,35], according to which the $k$-th user in the $l$-th cluster can have its channel matrix written as:

$$ H_{l,k} = G_{l,k} \Lambda_l^{1/2} U_l $$

where $G_{l,k} \in \mathbb{C}^{N \times M}$ denotes a fast fading complex Gaussian matrix, $\Lambda_l \in \mathbb{C}^{M \times M}$ is a diagonal matrix that contains the eigenvalues of $R_l$ and $U_l \in \mathbb{C}^{M \times M}$ is a matrix that contains the eigenvectors of $R_l$, meaning that $R_l = U_l^H \Lambda_l U_l$, since a correlation matrix is always symmetric. However, $R_l$ only has $r_l$ non-zero eigenvalues, with $r_l$ being the rank of $R_l$. Therefore, $\Lambda_l$ is of the form: and thus can be reduced to an $r_l \times r_l$ matrix, turning $G_{l,k}$ into an $N \times r_l$ matrix and $U_l$ into an $r_l \times M$ matrix. Obtaining CSIT for the fast fading matrix $G_{l,k}$ may often be difficult. Because $R_l$ is a slowly-changing channel correlation matrix, its estimate is easier to obtain at the BS. The BS sends a precoded $M \times 1$ NOMA vector with superimposed symbols where $s_{l,k}$ is the symbol for the $k$-th user in the $l$-th cluster, and $\alpha_{l,k}$ is the power coefficient for the $k$-th user in the $l$-th cluster.
The number of effective BS antennas for each cluster is $\bar{M}_l = M - r_l (L - 1)$, and $P_l$ is the $M \times \bar{M}_l$ precoding matrix of the $l$-th cluster.
T is the $\bar{M}_l \times 1$ precoding vector that has a 1 in the $l$-th position. The $k$-th user in the $l$-th cluster therefore receives where $n_{l,k}$ is the noise at the $k$-th user in the $l$-th cluster.
Looking at (17), $P_l$ needs to satisfy the following constraint to eliminate inter-cluster interference: Since H is always a fat matrix (and thus it always has some non-zero nullspace), then Using a $P_l$ given by (19), the inter-cluster interference is removed and (17) becomes Without loss of generality, looking at user $k = 1$ in the first cluster ($l = 1$) of a system with $K = 2$ users, (20) leads to: The information to all users is carried by the $\bar{M}_l \times 1$ vector: which imposes a limit of $\bar{M}_l$ to the number of clusters. This vector is then multiplied by the matrix $G_{1,1} \Lambda_1^{1/2} U_1 P_1$, whose dimensions are $N \times \bar{M}_1$, and whose elements will be denoted as $c_{n,m}$. Disregarding noise, this can be written as: which again, as in the previous open-loop model, is the intended NOMA mixture for the cluster, added to a noise term.

B. Performance
A system with a massive array of $M = 50$ antennas at the BS and terminals with $N = 3$ antennas is considered, which is the same number of antennas considered at the terminals in the open-loop setup. Comparing Figures 6 and 7 with Figures 2 and 3, also with two PD-NOMA users and the same modulations, one can see that the performances are very similar. This is because the channel models of (11) and (16) are in fact equivalent in terms of end-to-end SNR per user. To understand why this happens, one needs to revisit equations (11) and (24) and note that $G_{1,1} \Lambda_1^{1/2} U_1 = H_{1,1}$ (Karhunen-Loève decomposition) and that $\|v_{m,k}\| = \|P_1\| = 1$. Hence, both equations are in fact equivalent in terms of the ratio between the signal power and the noise power in each of these ZF schemes, when averaging over several channel realizations.

IV. COMPARISON AND CONCLUSIONS
While both systems are equivalent in terms of performance, the open-loop system cannot support a massive array at the BS because it is limited by the number of receive antennas that the terminals can fit, while in the closed-loop model an increasing number $M$ of antennas at the BS can lead to an arbitrarily large number of clusters. However, one should notice the trade-off that higher-rank correlation matrices impose, forcing a reduction in the number of clusters or in the number of effective transmit antennas per cluster. It is worth mentioning that a system designer should not only optimize the power coefficients but also consider different modulations for the users. The correlation matrix can have a rank as large as $r_l = N$, so considering for example $N = 8$ antennas at the receivers and $M = 128$ at the BS, it is possible to support $L = 15$ clusters, with $\bar{M}_l = 128 - 8 \times (15 - 1) = 16$ effective transmit antennas per cluster. In this example, the closed-loop system can almost double the number of NOMA clusters possible in open-loop, which would be $L = N = 8$. Notably, with single-antenna terminals ($N = 1$), keeping $M = 128$ and the same $\bar{M}_l = 16$ antennas per cluster, one could support $L = 113$ clusters.
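The antenna budget in this example can be checked with a few lines of Python. The sketch assumes that every cluster has the same correlation rank, and the function names are hypothetical.

```python
def effective_antennas(num_bs_antennas, rank, num_clusters):
    """Effective transmit antennas per cluster: M - r * (L - 1)."""
    return num_bs_antennas - rank * (num_clusters - 1)


def max_clusters(num_bs_antennas, rank, min_effective):
    """Largest L that still leaves min_effective antennas per cluster."""
    return (num_bs_antennas - min_effective) // rank + 1


per_cluster = effective_antennas(128, 8, 15)      # 128 - 8 * 14 = 16
clusters_rank8 = max_clusters(128, 8, 16)         # 15 clusters
clusters_rank1 = max_clusters(128, 1, 16)         # 113 clusters
```

The two calls to `max_clusters` reproduce the figures quoted above: 15 clusters for rank-8 correlation matrices and 113 clusters for single-antenna terminals, both with 16 effective antennas per cluster.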

Figure 2: Open-loop with two users, both using BPSK.