ISCTE-IUL

This paper proposes an efficient light field image coding (LFC) solution based on High Efficiency Video Coding (HEVC) and a novel Bi-prediction Self-Similarity (Bi-SS) estimation and compensation approach that exploits the inherent non-local spatial correlation of this type of content. In this approach, two predictor blocks are jointly estimated from the same search window by using a locally optimal rate-constrained algorithm. Moreover, a theoretical analysis of the proposed Bi-SS prediction is presented, which shows that other non-local spatial prediction schemes proposed in the literature are suboptimal in terms of Rate-Distortion (RD) performance and, for this reason, can be considered as restricted cases of the jointly estimated Bi-SS solution proposed here. These theoretical insights are shown to be consistent with the presented experimental results, which demonstrate that the proposed LFC scheme outperforms the benchmark solutions, with significant gains with respect to HEVC (up to 61.1% bit savings) and other state-of-the-art LFC solutions in the literature (up to 16.9% bit savings).


Introduction
Light Field (LF) imaging based on a single-tier camera equipped with a Microlens Array (MLA), also known as holoscopic, plenoptic, and integral imaging, derives from the fundamentals of light field/radiance sampling [1], where not only the spatial information about the Three Dimensional (3D) scene is represented, but also the angular viewing direction, i.e., the "whole observable" scene.
Recently, LF imaging has become a prospective imaging approach for providing richer content capture, visualization, and manipulation, being applicable in many different areas of research, e.g., 3D television [2,3], biometric recognition [4], and medical imaging [5]. Among the advantages of employing an LF imaging system is the enabling of new degrees of freedom in terms of content production and manipulation, thus supporting functionalities not straightforwardly available in conventional imaging systems, namely, post-production refocusing, changing depth-of-field, and changing viewing perspective. However, deploying LF image and video applications with their appealing functionalities will require the use of efficient coding schemes to deal with the large amount of data involved in such types of systems. In this context, novel initiatives on LF image and video coding standardization are also emerging. Notably, the Joint Photographic Experts Group (JPEG) committee has recently started the JPEG Pleno standardization initiative [6], which addresses representation and coding of emerging new imaging modalities. In addition, the Moving Picture Experts Group (MPEG) has recently started a new work item on coded representations for immersive media (MPEG-I) [7].

Related Work
Previous Light Field Coding (LFC) schemes available in the literature can be categorized into three main approaches: i) based on transform coding [8,9], ii) based on view extraction [10-17], and iii) based on non-local spatial prediction [18-22]. Generally, all coding schemes try to take advantage of the particular planar intensity distribution of the LF image. Notably, as a result of the optical system used, the raw LF image corresponds to a 2D array of Micro-Images (MIs), where both light intensity and direction information are recorded.

LFC Based on Transform Coding
Most of the early proposed LFC schemes adopted the transform-based approach by using a Discrete Cosine Transform (DCT) or a Discrete Wavelet Transform (DWT). In [8], a 3D DCT was applied to a stack of MIs to exploit the existing correlation between adjacent MIs, as well as the redundancy within each MI. In [9], the LF content was separated into various viewpoint images by extracting one pixel with the same position from each MI, and a 3D DWT was then applied to a stack of them. Afterwards, the lower frequency coefficients were transformed using a Two Dimensional (2D) DWT followed by arithmetic encoding, while the remaining high frequency coefficients were simply quantized and arithmetic encoded. Recently, it has been concluded in the literature that the HEVC Main Still Picture profile [23] presents significant compression performance improvements in comparison to previous transform-based still image coding technologies [24,25], such as the JPEG (DCT-based) and JPEG 2000 (DWT-based) standards. Moreover, similar conclusions have also been reached for LF image coding in [26,27], where HEVC presented significantly better performance than JPEG and JPEG 2000.

LFC Based on View Extraction
Alternatively, other schemes proposed to extract a set of views from the LF data for coding. In [10-15], MIs or Viewpoint Images (VIs) were extracted from the LF content in order to represent the LF data as a set of views and to use inter-view prediction for achieving compression. In [10,11], these views were then encoded as multiview content using Multiview Video Coding (MVC) [28]. Differently, in [12-15], the views were encoded as a Pseudo Video Sequence (PVS) using a 2D video coding standard, such as H.264/AVC [28] in [12], or High Efficiency Video Coding (HEVC) [23] in [13-15]. Although conceptually different (in terms of coding architecture), both multiview- and PVS-based coding approaches have the same basic purpose of proposing an efficient prediction configuration for better exploiting the correlations between the views. For this, different scanning patterns for ordering the views, as well as different prediction structures, have been proposed. In [29], it was shown that the PVS-based coding solution outperformed a transform-based solution (similar to the LF coding solution proposed in [9]) with significant gains, notably at lower bit rates. In addition, an alternative to the multiview representation based on these low resolution MIs/VIs was proposed in [16,17] using super-resolved rendered views. In this case, the scalable coding architecture proposed in [16] was used, which supports backward compatibility with legacy 2D and 3D multiview displays in the lower layers, while the highest layer supports the entire LF content. In [17], the associated disparity information was also encoded and transmitted in the lower layers along with the set of views.

LFC Based on Non-Local Spatial Prediction
Schemes based on the non-local spatial predictive approach rely on non-local prediction techniques that exploit the existing redundancy between MIs in a (spatial) neighborhood to encode the entire raw LF image, and are usually (but not necessarily) integrated into a standard 2D image codec. The idea of exploiting non-local spatial redundancy was first proposed for 2D image and video compression in order to further enhance the performance of H.264/AVC intra prediction [30]. Notably, the intra macroblock compensation technique was proposed in [30] to extend the usage of motion compensated prediction to intra-coded frames.
In the context of LF content coding, previous work of the authors [18,19] showed that further improvements are still possible for LF images with respect to the state-of-the-art for 2D image coding using the HEVC Main Still Picture profile [24,25,31] by using the concept of Self-Similarity (SS) compensated prediction. Similar to the intra macroblock compensation [30], the SS estimation process uses block-based matching over the previously coded and reconstructed area of the current picture (referred to as SS reference [18]) to find the 'best' predictor for the current block. As a result, the chosen block becomes the candidate predictor and the relative position between the two blocks is signaled by an SS vector. In [19], a novel vector prediction scheme was also proposed to take advantage of the particular characteristics of the SS prediction data and thus increase coding efficiency. Subsequently, in [20], a scheme to extend the SS prediction concept by using HEVC inter B frame bi-prediction was proposed for LF image coding. However, in this case, to guarantee that the two prediction signals came from two different MIs, the search area was separated into two non-overlapping parts [20] to perform the prediction estimation as in conventional HEVC bi-prediction. Although not targeting LF image coding, another prediction scheme similar to the SS compensated prediction, known as Intra Block Copy (IntraBC) [32], has been recently proposed in the literature in the context of Screen Content Coding (SCC) [32]. In this case [32], the prediction estimation is performed considering only integer pixel accuracy and the search window is expanded to the entire CB row or column, or to the entire previously coded area of the picture by using a hash-based search [32]. Furthermore, instead of using a block-based matching approach, an alternative prediction scheme based on locally linear embedding was proposed in [21], where a set of nearest neighbor patches were estimated from the same search area and linearly combined to predict the current block. More recently, in [22], a prediction scheme based on Gaussian Process Regression (GPR) was also proposed for LF image coding. In this case, two separate search areas are adopted for finding a set of nearest neighbor patches and the prediction is modeled as a non-linear (Gaussian) process for estimating the predictor block.

Motivations and Contributions
Motivated by the authors' results in [18,19,21], this paper proposes an improved LF image coding solution based on HEVC and a novel Bi-predicted Self-Similarity (Bi-SS) estimation approach using the generic concept of superimposed prediction [33], which allows bi-prediction using samples from the same search area. Therefore, instead of dividing the search area into two non-overlapping parts to derive each predictor block from different MIs (as in [20]), these predictor blocks can be located in the same MI and in overlapped pixel positions. Moreover, instead of simply combining the two (independent) best uni-predicted candidate predictor blocks for bi-prediction (as in [20]), the locally optimal rate-constrained algorithm [34] is used for jointly estimating these two predictor blocks.
In addition to this, a theoretical analysis of the proposed Bi-SS prediction is also presented, which shows that other non-local spatial prediction schemes, such as the IntraBC [32], the preceding uni-predicted SS solution in [19], and the bi-prediction proposed in [20], are suboptimal in terms of Rate-Distortion (RD) performance and, for this reason, can be considered as restricted cases of the jointly estimated Bi-SS solution proposed here. Furthermore, studies about the influence of the MI cross-correlation and of the weighting factors used for bi-prediction on the RD efficiency of the Bi-SS prediction are also presented to experimentally validate the theoretical assumptions used for LF image coding.
Experimental results show that the proposed LFC solution using the jointly estimated Bi-SS prediction (from now on referred to as the LFC Bi-SS solution) is able to outperform, with significant coding gains, various state-of-the-art LFC solutions based on different non-local spatial predictions.

Paper Outline
The remainder of the paper is organized as follows: Section 2 describes the proposed LFC Bi-SS solution architecture; Section 3 proposes the jointly estimated Bi-SS prediction and presents the theoretical and experimental analyses of its prediction efficiency improvement for LF image coding; Section 4 presents the test conditions and experimental results; and, finally, Section 5 concludes the paper.

LFC Bi-SS Solution Architecture
The proposed LFC Bi-SS solution is not tuned for any particular optical acquisition setup since it does not require any explicit knowledge about it (e.g., microlens size, focal length, and distance of the microlenses to the image sensor). Notice that, although these parameters may be provided by camera makers, many of them are highly dependent on the manufacturing process, being different even from camera to camera of the same model (e.g., each microlens may vary slightly in shape, size, and relative layout position). This means that the LF content from each specific LF camera is also different, and compression tools that use this kind of information need to be robust to these variations. For this reason, using compression tools that are less dependent on a very precise calibration pre-process may be advantageous for supporting LF visualization without increasing the processing complexity.
In this sense, this LFC Bi-SS solution may be mainly advantageous for applications in which the LF content is consumed by the end user in a format similar to the captured format, for example, when the captured LF content is visualized in an LF display that also makes use of an MLA in its optical system, or is consumed by using a proprietary LF rendering algorithm that makes use of the same (raw) 2D format. In this scenario, the captured LF image can be encoded with the proposed LFC Bi-SS solution and may then be transmitted to a receiver over heterogeneous networks. Alternatively, this LFC solution may also be advantageous for improving the storage compression efficiency. The proposed solution is also advantageous in terms of the encoder/decoder computational complexity and necessary memory, which are no larger than those of HEVC inter B frame coding. Detailed information about HEVC computational complexity can be found in [19].
Fig. 1 presents the architecture of the proposed LFC Bi-SS solution, which is based on HEVC and comprises both additional and modified modules to efficiently handle LF content. Basically, the proposed codec introduces an additional type of prediction, the Bi-SS, and the encoder chooses the best among Bi-SS and HEVC intra prediction based on a conventional Rate-Distortion Optimization (RDO) process.
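The RDO-based choice between the Bi-SS and the HEVC intra prediction can be illustrated with a minimal sketch. It assumes that each mode has already been evaluated and summarized by a distortion value and a bit count; the function name `choose_mode`, the mode labels, and the numeric values are hypothetical and only serve the illustration.

```python
def choose_mode(candidates, lam):
    """Pick the mode with the lowest Lagrangian cost J = D + lam * R.

    candidates: dict mapping a mode name to (distortion, rate_in_bits).
    Both the dictionary layout and the mode names are illustrative assumptions.
    """
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical figures: Bi-SS predicts better but spends more bits on vectors.
modes = {"intra": (1000, 40), "bi_ss": (400, 90)}
print(choose_mode(modes, lam=5.0))   # low lambda favours the lower distortion
print(choose_mode(modes, lam=20.0))  # high lambda favours the lower rate
```

The sketch only shows the decision rule; in an actual encoder, the distortion and rate of each mode come from encoding the block with that mode.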
More specifically, enhancing the HEVC coding architecture with Bi-SS compensated prediction requires adaptations at the following stages of the coding process (as explained in the following): i) Bi-SS estimation; ii) prediction modes and block partitioning; iii) Bi-SS compensation; iv) Bi-SS vectors prediction; and v) reference picture management.

Bi-SS Estimation
The Bi-SS estimation (depicted in Fig. 1) is used to exploit the cross-correlation existing in an MI neighborhood (see Fig. 2a) by estimating the prediction block with the highest similarity (according to appropriate criteria) to the current block in the previously coded and reconstructed area of the current picture itself (the SS reference, as seen in Fig. 2b). Hence, the relative position between the current and the 'best' candidate block is signaled by an SS vector, $\mathbf{v}_0$ (see Fig. 2b). Similarly to the conventional HEVC inter P frame prediction, the best SS vector, $\mathbf{v}_0^{*}$, for the SS prediction can be found by minimizing the Lagrangian cost function in (1) [35],

$$\mathbf{v}_0^{*} = \arg\min_{\mathbf{v}_0 \,:\, \mathbf{x}-\mathbf{v}_0 \in W} \left\{ \left\| I(\mathbf{x}) - \tilde{I}(\mathbf{x}-\mathbf{v}_0) \right\|_1 + \lambda R(\mathbf{v}_0) \right\} \qquad (1)$$

where $I(\mathbf{x})$ is a matrix variable representing the current block at position $\mathbf{x} = (x, y)$ in the LF image; $\tilde{I}(\mathbf{x}-\mathbf{v}_0)$ represents a candidate block in the SS reference, $\tilde{I}$, with $\mathbf{x}-\mathbf{v}_0 \in W$ (see Fig. 2b); $R(\mathbf{v}_0)$ corresponds to an estimated number of bits for encoding the SS vector $\mathbf{v}_0$ (i.e., the estimated number of bits necessary to encode the motion vector difference between $\mathbf{v}_0$ and its predictor, selected as shown in Section 2.4); and $\lambda$ is the Lagrangian multiplier. In addition, to keep the complexity low, the $\ell_1$-norm (or Sum of Absolute Differences (SAD)), $\|\cdot\|_1$, is used and a limited causal search window $W$ is adopted. However, it is worth noting that the search area shall be larger than the MI size to be able to exploit the inherent MI cross-correlations. Finally, the SS predictor block, $\hat{I}(\mathbf{x})$, is derived as $\tilde{I}(\mathbf{x}-\mathbf{v}_0^{*})$. As done in HEVC reference software version 14.0 [36], when SAD is used as the distortion measure, $\lambda$ is given by $\sqrt{\lambda_{\mathrm{mode}}}$, where $\lambda_{\mathrm{mode}}$ is the Lagrangian multiplier computed for prediction mode selection in intra-coded frames.
Notice that the SS estimation process in (1) only considers a single compensated signal for prediction of the current block, as previously proposed by the authors in [19], and for this reason will be hereinafter referred to as Uni-predicted Self-Similarity (Uni-SS) estimation. Moreover, further improvements to this estimation process will be proposed in Section 3 for allowing jointly estimated Bi-SS prediction.
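The Uni-SS estimation described above can be sketched as an exhaustive, integer-pixel block matching over a causal window. This is only an illustration under stated assumptions: the rate term `vector_bits` is a hypothetical Exp-Golomb-like proxy (not the HEVC entropy coder), the causality test is simplified to "fully above or fully to the left" of the current block, and sub-pixel interpolation is omitted.

```python
import numpy as np

def vector_bits(v, pred=(0, 0)):
    """Hypothetical rate proxy R(v): Exp-Golomb-like length of the vector difference."""
    return sum(2 * (2 * abs(a - b) + 1).bit_length() - 1 for a, b in zip(v, pred))

def uni_ss_search(ss_ref, cur, pos, lam, win=16):
    """Return (best_vector, best_cost) minimising SAD + lam * R(v) in a causal window.

    ss_ref : 2D array, the picture with its reconstructed (causal) area filled in
    cur    : 2D array, current block located at pos = (row, col)
    A vector v = (d_row, d_col) addresses the candidate block at pos - v.
    """
    (r0, c0), (bh, bw) = pos, cur.shape
    cur = cur.astype(np.int64)
    best_v, best_cost = None, float("inf")
    for dr in range(0, win + 1):
        for dc in range(-win, win + 1):
            r, c = r0 - dr, c0 - dc
            if r < 0 or c < 0 or c + bw > ss_ref.shape[1]:
                continue  # candidate falls outside the picture
            if not (r + bh <= r0 or c + bw <= c0):
                continue  # simplified causality test (assumption of this sketch)
            cand = ss_ref[r:r + bh, c:c + bw].astype(np.int64)
            cost = int(np.abs(cur - cand).sum()) + lam * vector_bits((dr, dc))
            if cost < best_cost:
                best_v, best_cost = (dr, dc), cost
    return best_v, best_cost
```

On content with a regular MI structure, the window size `win` has to exceed the MI pitch for the search to reach the neighboring MIs, which mirrors the remark above on the search area being larger than the MI size.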

Prediction Modes and Block Partitioning
The Bi-SS prediction is evaluated for all Coding Block (CB) sizes (i.e., from 64×64 down to 8×8) in the conventional RDO process to choose the best prediction mode. For this, the proposed Bi-SS method defines the following two additional prediction modes:
• Bi-SS mode - In this case, Bi-SS estimation (see Fig. 1) is used to find a prediction for encoding the current block. As in HEVC inter coding, the eight partition patterns, i.e., M×M, M×(M/2), (M/2)×M, (M/2)×(M/2), M×(M/4), M×(3M/4), (M/4)×M, and (3M/4)×M [37], are allowed to define a flexible way to partition the CB for the Bi-SS estimation process in Fig. 2b.
• SS-skip mode - The SS-skip is employed only for the M×M partition pattern, and, in this case, the SS vector is directly derived from the Bi-SS vectors prediction presented in Section 2.4, and no further information is transmitted.

[Fig. 1: Architecture of the proposed LFC Bi-SS solution. Block labels recovered from the figure: General Coder Control, LF Coder, Deblocking & SAO Filters, Coefficient Data, Output Bitstream.]
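The eight partition patterns allowed for the Bi-SS mode can be enumerated for a given CB size as in the sketch below. The function name `partition_patterns` and the width×height reading of each pattern label are assumptions made for illustration; the normative definition of the patterns is the one in [37].

```python
def partition_patterns(m):
    """Return {pattern_label: [(width, height), ...]} for an M x M coding block.

    Labels follow the list given in the text; interpreting them as
    width x height is an assumption of this sketch.
    """
    half, quarter = m // 2, m // 4
    return {
        "MxM": [(m, m)],
        "Mx(M/2)": [(m, half)] * 2,
        "(M/2)xM": [(half, m)] * 2,
        "(M/2)x(M/2)": [(half, half)] * 4,
        "Mx(M/4)": [(m, quarter), (m, m - quarter)],
        "Mx(3M/4)": [(m, m - quarter), (m, quarter)],
        "(M/4)xM": [(quarter, m), (m - quarter, m)],
        "(3M/4)xM": [(m - quarter, m), (quarter, m)],
    }


for label, parts in partition_patterns(32).items():
    print(label, parts)
```

Each pattern tiles the CB exactly, so the partition areas always sum to M×M; the SS-skip mode would only use the first entry.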

Bi-SS Compensation
In this process (shown in Fig. 1), the inverse quantized and inverse transformed causal residual block is added to the prediction block to obtain the reconstructed block. The SS reference is then updated (for each CB) by including this reconstructed block, as it will be available at the decoder side.
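A minimal sketch of this compensation step is given below, assuming that the prediction block and the decoded residual are already available as arrays. The function name `reconstruct_and_update`, the clipping to the sample range, and the 8-bit default are assumptions of the sketch.

```python
import numpy as np

def reconstruct_and_update(ss_ref, pred, residual, pos, bit_depth=8):
    """Add the decoded residual to the prediction and store the result in the SS reference.

    ss_ref is modified in place so that the reconstructed block becomes
    available as a candidate for the blocks coded afterwards.
    """
    r, c = pos
    rec = np.clip(pred.astype(np.int64) + residual, 0, (1 << bit_depth) - 1)
    ss_ref[r:r + rec.shape[0], c:c + rec.shape[1]] = rec.astype(ss_ref.dtype)
    return rec
```

Because the same update is performed at the decoder, encoder and decoder search and compensate over identical SS references.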

Bi-SS Vectors Prediction
In conventional video coding solutions, since neighboring motion vectors are likely to be correlated, they are usually predictively coded based on motion vectors of neighboring blocks. In HEVC [23], this predictive coding of motion vectors was improved, relative to previous video coding standards, by introducing a tool called Advanced Motion Vector Prediction (AMVP). Furthermore, an HEVC technique called merge mode is used to derive all motion data of a block (i.e., motion vectors and indices of the used reference pictures) from the neighboring blocks, replacing the direct and skip modes of the previous H.264/AVC standard. In these methods, a vector candidate (or merge candidate) list is built by selecting vectors from CBs in the spatial and temporal (co-located) neighborhood [37]. From these spatio-temporal candidates, the encoder selects the best predictor vector in an RDO sense, and transmits only the index of the chosen candidate in this list.
In addition to this, a set of new SS candidate vectors proposed in [19], referred to as MI-based vector prediction (MIVP) candidate vectors, is also included in the AMVP and merge candidate lists to further improve the RD performance. As explained in [19], up to three MI-based candidate vectors (i.e., left, above, and above-left) are computed to force the candidate vectors to be distributed according to the structure of MIs. A detailed description of the MIVP candidate vectors selection is given in [19].
Regarding the proposed Bi-SS mode, if bi-prediction is used, two predictor vectors are derived from the AMVP method (one for each estimated SS vector) and the differences between the two estimated SS vectors and the corresponding predictor vectors are transmitted along with the indices of the chosen candidates in the list. In this case, the AMVP candidate list is constructed with the following candidates (for intra-coded frames):
• Spatial AMVP vector candidates - Up to two spatial vector candidates are derived from a set of five spatial neighboring CBs that were previously coded with Bi-SS mode. The positions of these neighboring CBs are defined in the HEVC standard [23].
• MI-based vector candidate - When fewer than two spatial vector candidates are available, one MI-based vector candidate is derived from the set of left, above, and above-left candidates defined in MIVP.
• Zero vector candidates - Afterwards, zero vectors are added to fill, when necessary, the AMVP candidate list with up to two final candidates (as in HEVC [23]).
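The construction order of this AMVP candidate list can be sketched as follows. The sketch is a simplification under stated assumptions: neighbors are passed as plain vectors (or `None` when unavailable or not coded with Bi-SS mode), duplicate pruning is done by simple equality, and the HEVC rule of taking one candidate from the left group and one from the above group is not reproduced. The function name `build_amvp_list` is hypothetical.

```python
def build_amvp_list(spatial, mivp, size=2):
    """Build a two-entry AMVP list: spatial, then one MI-based, then zero vectors.

    spatial : vectors of the five neighbouring CBs in checking order (None if unusable)
    mivp    : MI-based candidates (left, above, above-left), None if unavailable
    """
    cands = []
    for v in spatial:
        if v is not None and v not in cands and len(cands) < size:
            cands.append(v)
    if len(cands) < size:
        for v in mivp:
            if v is not None and v not in cands:
                cands.append(v)
                break  # at most one MI-based candidate is added
    while len(cands) < size:
        cands.append((0, 0))  # zero vectors fill the remaining entries
    return cands


print(build_amvp_list([None, (8, 0), None, (8, 0), None], [(0, 8), None, None]))
```

Only the index of the chosen entry and the vector difference need to be signaled, so a list that follows the MI structure tends to yield shorter vector differences.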
In the case of the SS-skip mode, bi-predicted merge candidates may be derived from the following merge candidate list:
• Spatial merge vector candidates - Up to four spatial candidates from the set of five [23] neighboring CBs that were coded with Bi-SS mode are included.
• MI-based vector candidates - The maximum size of the merge candidate list is signaled in the slice header syntax (being equal to five, as defined by default in the HEVC standard [23]). After selecting the spatial candidates, up to three MIVP merge candidates are included in the merge candidate list until the maximum number of candidates is reached.
• Additional merge candidates - Furthermore, if the merge candidate list is still not fully populated, bi-predicted candidates can also be derived by combining two existing candidates from different reference picture lists. When the list is still not full, zero motion candidates are included to complete the list.
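The filling order of the merge candidate list can be sketched in the same spirit. Here each candidate is modeled as a pair `(v0, v1)` of list-0 and list-1 vectors, with `None` for a missing vector; this representation, the function name `build_merge_list`, and the simple pairing rule for combined candidates are assumptions of the sketch rather than the normative HEVC derivation.

```python
def build_merge_list(spatial, mivp, max_cands=5):
    """Fill the merge list: spatial (up to 4), MIVP (up to 3), combined, then zero candidates."""
    cands = []
    for c in spatial:
        if c is not None and c not in cands and len(cands) < min(4, max_cands):
            cands.append(c)
    for c in mivp[:3]:
        if c is not None and c not in cands and len(cands) < max_cands:
            cands.append(c)
    base = list(cands)
    for a in base:  # combine the list-0 vector of one candidate with the list-1 vector of another
        for b in base:
            combo = (a[0], b[1])
            if (a is not b and None not in combo and combo not in cands
                    and len(cands) < max_cands):
                cands.append(combo)
    while len(cands) < max_cands:
        cands.append(((0, 0), (0, 0)))  # zero candidates complete the list
    return cands


spatial = [((8, 0), None), None, ((0, 8), (8, 8)), None, None]
mivp = [((16, 0), None), None, None]
for cand in build_merge_list(spatial, mivp):
    print(cand)
```

In SS-skip mode only the index into this list is transmitted, so the decoder must rebuild exactly the same list from already decoded data.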
An analysis of the influence of this MIVP scheme in the RD performance achieved by the Bi-SS prediction is presented in Section 3.5.

Reference Picture Management
To allow the Bi-SS estimation and Bi-SS compensation in intra-coded frames of HEVC, the reference list construction and signaling need to be altered so as to include the SS reference. This process is similar to the temporal list construction in HEVC inter-coded frames, and is managed by the general coder control block in Fig. 1 by using the concept of Reference Picture Set (RPS), which is signaled for each slice [37]. For this, the SS reference is made available at the decoded picture buffer (see Fig. 1) and marked to be used as a reference.

Proposed Jointly Estimated Bi-SS Compensated Prediction
To further improve the performance of the proposed Uni-SS LFC solution [19], a novel jointly estimated Bi-SS estimation and compensation scheme, which is based on the generic concept of superimposed prediction [33], is here proposed to replace the aforementioned Uni-SS estimation and compensation processes.
To motivate the adoption of this Bi-SS prediction in the LFC solution presented in Section 2, a theoretical analysis is first presented, which shows that other non-local spatial prediction schemes, such as the IntraBC [32], the preceding Uni-SS prediction presented in Section 2.1, and the solution for bi-prediction proposed in [20], can be considered as restricted cases of the Bi-SS solution proposed here. Then, the Bi-SS candidate predictor estimation is proposed in Section 3.2 and the theoretical assumptions for its improved RD efficiency are experimentally analyzed in Sections 3.3 and 3.4, respectively, in terms of the MI cross-correlation in LF images and in terms of the weighting factor used in the bi-prediction. Finally, the influence of the MIVP on the RD performance achieved for Bi-SS prediction is also analyzed in Section 3.5.

Theoretical Bi-SS Performance Analysis
The RD performance improvement due to the adoption of the jointly estimated Bi-SS prediction (presented in more detail in the following section) is based on three main hypotheses, which will be analyzed in this section:
1) With a large enough search window, $W$ (see Fig. 2b), it is possible to find two predictor blocks that properly represent the current block, $I(\mathbf{x})$, i.e., with low residual signal.
2) By combining two good predictor blocks, it is possible to further minimize the residual signal of the SS compensated prediction, compared to only using the uni-predicted SS candidate (as in the Uni-SS prediction [19] and in the IntraBC scheme [32]).
3) Jointly estimating the predictor blocks leads to better RD performance than deriving them independently (as in reference software for HEVC inter B frame coding [36] and in the bi-prediction solution proposed in [20]).
For this analysis, the performance of the proposed Bi-SS prediction is here modeled by the uncertainty [35] (or inaccuracy [33,38]) in the SS compensated prediction signal.
Regarding the first abovementioned hypothesis, it is valid due to the following facts:
• Given the small baseline between adjacent microlenses in the acquisition process, a significant cross-correlation exists between neighboring MIs, as shown by the autocorrelation function in Fig. 2a. It can be seen (Fig. 2a) that the autocorrelation function presents a regular structure of spikes and that the constant distance between these regular spikes corresponds to the MI spacing in the array [19]. Since these highly-correlated samples are distributed along the MIs, it is likely that similarly good predictor blocks will also be distributed accordingly.
• It was shown in [19] that, when using the SS compensated prediction scheme for exploiting the inherent MI cross-correlation, the distribution of the chosen SS vectors is also related to the size and arrangement of the MIs in the MLA. This can be illustrated by the heat map in Fig. 2c, where brighter areas correspond to more frequent SS vector amplitudes. Hence, since these most frequent best uni-predicted SS vectors are distributed in all directions according to the MI arrangement, it is possible to consider that the second-best SS vector, which can also represent the current block properly, is likely to be found according to this distribution in a different direction.
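The regular spike structure of the autocorrelation function mentioned in the first fact can be reproduced on synthetic data. The sketch below is only an illustration: it assumes a toy "LF image" built by tiling one random micro-image and adding noise (not real LF content), and it uses the circular autocorrelation obtained from the power spectrum. The function names `circular_autocorr` and `mi_pitch` are hypothetical.

```python
import numpy as np

def circular_autocorr(img):
    """Normalised circular autocorrelation computed from the power spectrum."""
    x = img - img.mean()
    power = np.abs(np.fft.fft2(x)) ** 2
    ac = np.real(np.fft.ifft2(power))
    return ac / ac[0, 0]

def mi_pitch(ac, max_lag):
    """Horizontal lag in 1..max_lag with the strongest autocorrelation."""
    return 1 + int(np.argmax(ac[0, 1:max_lag + 1]))


rng = np.random.default_rng(0)
micro_image = rng.random((8, 8))                      # toy MI of 8x8 samples (assumption)
lf = np.tile(micro_image, (8, 8)) + 0.05 * rng.standard_normal((64, 64))
print(mi_pitch(circular_autocorr(lf), max_lag=12))    # lag of the first spike
```

In this toy setting the first spike falls at the tiling period, which is the behavior that makes a search window larger than one MI worthwhile.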
Regarding the second and third hypotheses, the residual signal for the Bi-SS compensated prediction is given by (2), where $\mathbf{x}-\mathbf{v}_i \in W$, and $h_i$ is the weight for each of these predictor blocks. For instance, for the bi-predicted SS candidate predictor proposed in Section 3.2, $h_i = 1/2$, $\forall i \in \{0,1\}$. However, as discussed in [35], the residual signal given by (2) can actually be generalized as in (3) for $N$ predictor blocks.
The general case represented by (3) can also incorporate other types of candidate predictors reflecting the very flexible set of inter coding tools of HEVC [23]. In this case, $\mathbf{h} = (h_0, h_1, \ldots, h_{N-1})$ corresponds to a weight vector that is able to, for example, incorporate [35]: i) the filtering used to generate the quarter-pixel interpolated signal in the SS estimation; and ii) the deblocking filter that can be applied in the SS reference. Each $\tilde{I}_i(\mathbf{x}) = \tilde{I}(\mathbf{x}-\mathbf{v}_i)$ term can be interpreted as one of the multiple compensated signals available for prediction of the current block. Hence, the uncertainty in a given Bi-SS compensated prediction can be modeled, as in [35], by an a posteriori probability density function, $h_i(\mathbf{x})$, conditioned on the encoded data. Therefore, since the expected value (the second term on the right-hand side of (3)) is the estimator that minimizes the mean-square error in the prediction of a random variable [35], it is possible to say that the residual signal in (3) can be minimized and, consequently, the accuracy of the prediction can be improved by using a larger set of multiple compensated signals and an optimized weight vector [35].
In addition, another possibility is to analyze the performance of the Bi-SS compensated prediction by modeling the inaccuracy of each used displacement vector $\mathbf{v}_i$, as in [33,38]. For this, Fig. 2b shows that, although the pixel correlation in the raw LF image is not as smooth as in conventional 2D images, each MI itself has some degree of inter-pixel redundancy as in common 2D images. Thus, it is possible to consider that samples inside each MI follow the same correlation model as samples in a 2D image (i.e., an isotropic exponentially decaying autocorrelation function). This assumption is reasonable at least for blocks $I(\mathbf{x})$ smaller than the MI resolution, and has also been adopted in [20,39] for LF images. With this assumption, the accuracy of the SS compensation can be measured by the displacement error variance [38], and the same signal model used in [33,38] can also be considered for the SS compensated prediction signal. In this case [33,38], $\mathbf{h}$ denotes a row vector of impulse responses $(h_0, h_1, \ldots, h_{N-1})$ of a 2D prediction filter [38], and the residual signal is given by (4), where the second term on the right-hand side of the equation denotes a 2D convolution of the prediction filter $\mathbf{h}$ with a column vector of $N$ multiple compensated signals $\tilde{\mathbf{I}} = (\tilde{I}_0, \tilde{I}_1, \ldots, \tilde{I}_{N-1})^{T}$. In this model, both $I$ and each component of $\tilde{\mathbf{I}}$ are assumed to be wide sense stationary random processes with an additive Gaussian noise signal. Hence, these noisy signals may comprise all signal components of the SS compensated prediction that cannot be described by the translational displacement model [38].
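For reference, the residual signals referred to as (2), (3), and (4) can be written out from the definitions given in the text. The following is a sketch of their likely form and not a transcription: the residual symbol $e$ and the symbol $N$ for the number of predictor blocks are assumptions, and $*$ denotes the 2D convolution described for (4).

```latex
\begin{align}
e(\mathbf{x}) &= I(\mathbf{x}) - \sum_{i=0}^{1} h_i\,\tilde{I}(\mathbf{x}-\mathbf{v}_i),
  & \mathbf{x}-\mathbf{v}_i &\in W \tag{2} \\
e(\mathbf{x}) &= I(\mathbf{x}) - \sum_{i=0}^{N-1} h_i\,\tilde{I}_i(\mathbf{x}),
  & \tilde{I}_i(\mathbf{x}) &= \tilde{I}(\mathbf{x}-\mathbf{v}_i) \tag{3} \\
e &= I - \mathbf{h} * \tilde{\mathbf{I}},
  & \tilde{\mathbf{I}} &= (\tilde{I}_0, \tilde{I}_1, \ldots, \tilde{I}_{N-1})^{T} \tag{4}
\end{align}
```

With $N = 2$ and $h_0 = h_1 = 1/2$, the form in (3) reduces to (2), and with $N = 1$ and $h_0 = 1$ it reduces to the single compensated signal used in the Uni-SS estimation of (1).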
Based on the abovementioned approximations, the conclusions from [33,38] also hold for validating the second and third hypotheses. Notably, with high rate assumptions:
• Concerning the second hypothesis, the optimal filter $\mathbf{h}$ (i.e., the one that minimizes the mean square error) can be interpreted as a low pass filter that removes high frequency components from $\tilde{\mathbf{I}}$ that are too noisy or that change too rapidly [38]. From the theoretical analysis in [38], it was concluded that increasing the number of equally good predictor blocks always led to bitrate savings compared to a more limited set of predictor blocks, even if the simple average filter is used instead of an optimal filter $\mathbf{h}$ in (4). Therefore, this suggests that increasing from one predictor block (in the previously proposed Uni-SS prediction [19]) to two predictor blocks (in the Bi-SS prediction proposed here) reduces the residual signal in the SS compensated prediction.
• Concerning the third hypothesis, an extended analysis was performed in [33] for the case where the multiple compensated samples of $\tilde{\mathbf{I}}$ are jointly estimated. In this case, the displacement errors of all components of $\tilde{\mathbf{I}}$ are assumed to be correlated, instead of being independent as assumed in [38]. Moreover, this analysis considered the simple average filtering case, as for the proposed Bi-SS. Therefore, it was shown that a combination of two jointly estimated predictor blocks is more efficient than two independent predictor blocks [33]. Furthermore, it was concluded that, for jointly estimated predictors, the major portion of the gain is already achievable with only two predictor blocks [33]. This suggests that further RD gains can be achieved by jointly estimating the two predictor blocks for bi-prediction compared to simply combining the two best uni-predicted candidates (as in the HEVC reference software inter B frame coding [23] and in the bi-predicted solution proposed in [20]).
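The effect behind the second hypothesis can be checked numerically on a toy model. The sketch below assumes two "equally good" predictors obtained by adding independent Gaussian noise to the current block, which corresponds to the independent-error setting rather than to the jointly estimated case of [33]; the function name `residual_energy` and all numeric values are illustrative.

```python
import numpy as np

def residual_energy(cur, predictors, weights):
    """Mean squared residual of a weighted superposition of predictor blocks."""
    pred = sum(w * p for w, p in zip(weights, predictors))
    return float(np.mean((cur - pred) ** 2))


rng = np.random.default_rng(0)
cur = rng.random((32, 32)) * 255
p0 = cur + 2.0 * rng.standard_normal(cur.shape)   # predictor with independent error
p1 = cur + 2.0 * rng.standard_normal(cur.shape)   # second, equally good predictor
print(residual_energy(cur, [p0], [1.0]))          # uni-prediction
print(residual_energy(cur, [p0, p1], [0.5, 0.5])) # simple average of both
```

Under this assumption the average of the two predictors roughly halves the residual energy, since the independent errors partially cancel; the joint estimation discussed for the third hypothesis goes further by choosing the second predictor so that its error complements the first.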
Therefore, without loss of generality, the IntraBC scheme in [32] and the previously proposed LFC solutions in [19,20] can be seen as restricted cases of the Bi-SS solution being proposed here. In these cases [19,20,32], restrictions are imposed on the number of predictor blocks, on the number of allowed partition patterns and sizes, and on the bi-prediction estimation process, in which each predictor block is independently estimated in a different area of the SS reference.
As will be seen in Section 4, the theoretical insights from this section for the proposed Bi-SS solution are supported by the experimental results for LF images. Moreover, to extend the conclusions from this analysis, Section 3.4 also analyzes the RD performance for LF image coding when using different sets of weighting coefficients for Bi-SS prediction.

Bi-SS Candidate Predictors Estimation
Motivated by the theoretical analysis presented above, the proposed Bi-SS prediction is here presented, which is based on the generic concept of superimposed prediction [33].
More specifically, there is only a single reference picture available in the Bi-SS compensated prediction, i.e., the SS reference [18], and only two possible candidate predictors (instead of the three candidates of HEVC [23] that are used in [20]) are derived to predict the current block, namely: i) the Uni-SS candidate, and ii) the Bi-SS candidate.
The Uni-SS candidate predictor corresponds to the previously proposed SS prediction [19] (see Section 2.1), in which the predictor block is found by minimizing the Lagrangian cost function in (1).
The proposed Bi-SS candidate predictor differs from the HEVC reference software inter B frame bi-prediction, as well as from the bi-predicted solution in [20], for two main reasons:
1) The two predictor blocks in the Bi-SS solution are derived from the same reference picture (the SS reference) and are estimated in the same search window (see Fig. 2b). Consequently, they can be located in the same MI (unlike in [20]) and in overlapping pixel positions (as illustrated by the dashed blue lines in Fig. 2b).
2) To further improve the prediction efficiency, these two predictor blocks are jointly estimated in the complete search window (as explained below), instead of combining the two best uni-predicted candidates (as in [20]).
For jointly estimating the two predictor blocks, the locally optimal rate-constrained algorithm proposed in [34] (see Fig. 3) is used. This algorithm avoids searching through all possible combinations of two candidate predictor blocks, displaced by the SS vectors $v_0$ and $v_1$, inside the search window. For this, in each algorithm iteration $i$, an optimal SS candidate vector $v_k^{(i+1)}$ (with index $k \in \{0,1\}$) is found by minimizing the Lagrangian cost function conditioned on the optimal SS candidate vector found in the previous iteration, $v_j^{(i)}$ (with $j \in \{0,1\}$). Here, the index $k$ defines which of the two vectors ($v_0$ or $v_1$) is optimized in a particular iteration $i$, while the index $j$ defines the vector that is kept fixed. Therefore, the algorithm finds an optimized vector $v_1$ conditioned on a known vector $v_0$ in even iterations, and vice versa in odd iterations. For instance, in the first iteration, $i = 0$, $j = 0$, $k = 1$, and the optimal $v_1^{(1)}$ is found by fixing $v_0^{(0)}$ to the best uni-predicted SS candidate vector. Similarly, in the second iteration, $i = 1$, $j = 1$, $k = 0$, and the optimal $v_0^{(2)}$ is found by fixing $v_1^{(1)}$ (which was found in the previous iteration). Similarly to (1), the Lagrangian cost function shown in Fig. 3 is used to find the optimal SS vector in each iteration, where the Lagrangian multiplier $\lambda$ is obtained through the square-root relation used in [36], and $R(v_k^{(i+1)}) + R(v_j^{(i)})$ corresponds to the estimated number of bits for encoding the SS vectors $v_0$ and $v_1$ given in each iteration, i.e., the estimated number of bits necessary to encode the motion vector difference between the SS vectors and their predictor vectors and for signaling the vector predictor using AMVP.
The maximum number of iterations defines a tradeoff between complexity and RD performance and can be adjusted according to the system constraints. In this work, it is set to 2 and, consequently, the corresponding complexity is similar to that of HEVC inter B-frame with one active reference in each reference picture list.
Finally, the best prediction between the Uni-SS and Bi-SS candidates is chosen in terms of conventional RDO [35] by comparing the associated Lagrangian costs of the Uni-SS and Bi-SS candidates, respectively found in (1) and Fig. 3.
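The alternating estimation described above can be illustrated with a minimal sketch. It is not the codec implementation: it assumes integer-pel vectors, a SAD distortion, simple averaging of the two predictors, and a hypothetical rate model (`vector_bits`) standing in for the AMVP-based bit estimate. All function names are illustrative, and the caller is assumed to supply only candidate vectors that keep the predictor inside the already-coded area.

```python
def sad(block, pred):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))

def fetch(ref, x, y, size):
    """Extract a size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in ref[y:y + size]]

def average(p0, p1):
    """Rounded average of two predictor blocks."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]

def vector_bits(v):
    """Hypothetical rate model: longer vectors cost more bits."""
    return 2 + 2 * (abs(v[0]).bit_length() + abs(v[1]).bit_length())

def uni_search(block, ref, pos, candidates, lam):
    """Best single predictor: minimize SAD + lam * R over all candidates."""
    size = len(block)
    best = None
    for v in candidates:
        pred = fetch(ref, pos[0] + v[0], pos[1] + v[1], size)
        cost = sad(block, pred) + lam * vector_bits(v)
        if best is None or cost < best[0]:
            best = (cost, v)
    return best

def bi_search(block, ref, pos, candidates, lam, max_iter=2):
    """Alternately optimize one vector while the other is kept fixed."""
    size = len(block)
    _, v_uni = uni_search(block, ref, pos, candidates, lam)
    vecs = [v_uni, v_uni]          # v0 starts as the best uni-predicted vector
    best_cost = None
    for i in range(max_iter):
        fixed = i % 2              # v0 is fixed in even iterations
        opt = 1 - fixed            # the other vector is optimized
        vf = vecs[fixed]
        p_fixed = fetch(ref, pos[0] + vf[0], pos[1] + vf[1], size)
        best = None
        for v in candidates:
            p = fetch(ref, pos[0] + v[0], pos[1] + v[1], size)
            cost = sad(block, average(p_fixed, p)) + lam * (vector_bits(v) + vector_bits(vf))
            if best is None or cost < best[0]:
                best = (cost, v)
        best_cost, vecs[opt] = best
    return best_cost, vecs[0], vecs[1]

# Toy data (hypothetical): the current 2x2 block is the average of two patches.
ref = [[10, 10, 30, 30, 16, 16, 0, 0] for _ in range(2)]
block = [[20, 20], [20, 20]]
cands = [(-6, 0), (-4, 0), (-2, 0)]
uni = uni_search(block, ref, (6, 0), cands, 1)
bi = bi_search(block, ref, (6, 0), cands, 1)
chosen = "Bi-SS" if bi[0] < uni[0] else "Uni-SS"
```

In this toy case the second iteration replaces the initial uni-predicted vector, which is the behavior that distinguishes joint estimation from simply combining the two best uni-predicted candidates.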

Bi-SS Prediction Analysis for Different MI Cross-Correlation
This section aims at analyzing how close the theoretical conclusions drawn in [33,38] and the approximations considered in Section 3.1 are to the experimental results when using the proposed Bi-SS prediction for LF image coding. More specifically, this analysis focuses on the influence of the MI cross-correlation, inherently present in LF images, on the performance of the Bi-SS prediction proposed in Section 3.2. For this, Tables 1 and 2 summarize some statistics of relevant results when encoding different LF test images (see Section 4), such as: percentages of prediction mode usage, SS bi-prediction usage, and coding block size (CBS) usage. These statistics are shown for higher bitrates in Table 1 (corresponding to quantization parameter value 22) and for lower bitrates in Table 2 (corresponding to quantization parameter value 42). In addition, some RD results comparing Uni-SS and Bi-SS prediction are presented using the Bjøntegaard delta (BD) [42] metrics, i.e., in terms of the luma PSNR of the LF image and the corresponding bitrate (BR) in terms of bits per pixel (bpp) for four different Quantization Parameter (QP) values. In this case, the sets of QP values {22, 27, 32, 37} and {27, 32, 37, 42} were considered for analyzing the performance, respectively, for high and low bitrates.
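For readers unfamiliar with the BD bitrate metric, the following sketch shows one common way to compute it from four RD points per codec. It assumes exactly four (rate, PSNR) points per curve, so the cubic fit of log-rate versus PSNR reduces to interpolation. The function names and the sample numbers are illustrative and are not taken from the tables of this paper.

```python
import math

def _cubic_through(xs, ys):
    """Lagrange interpolating polynomial through the given points, as a callable."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

def _simpson(f, a, b):
    """Simpson's rule on one panel; exact for polynomials up to degree 3."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate difference (%) of the test codec at equal PSNR."""
    f_ref = _cubic_through(psnr_ref, [math.log10(r) for r in rate_ref])
    f_test = _cubic_through(psnr_test, [math.log10(r) for r in rate_test])
    lo = max(min(psnr_ref), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    avg_diff = (_simpson(f_test, lo, hi) - _simpson(f_ref, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

# Hypothetical RD points: the test codec needs half the rate at every PSNR.
psnr = [30.0, 33.0, 36.0, 39.0]
ref_bpp = [0.1, 0.2, 0.4, 0.8]
test_bpp = [0.05, 0.1, 0.2, 0.4]
saving = bd_rate(ref_bpp, psnr, test_bpp, psnr)
```

A negative value indicates bit savings of the test codec with respect to the reference at the same quality.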
To better analyze these results, two different situations where the MI cross-correlation is differently distributed in a neighborhood are considered (corresponding to the highlighted values in Tables 1 and 2). For the first situation, a second frame of the Plane and Toy sequence (frame 23) is used to exemplify the case where the MI cross-correlation varies for the same camera parameters due to the different distance of the main object relative to the camera [40]. In this case, Plane and Toy (frame 23) presents a more rapid decrease in the MI cross-correlation in a neighborhood when compared to Plane and Toy (frame 123).
For the second situation, two raw LF images, Jeff and ISO_Chart_12, with considerably larger and smaller MI resolutions, respectively, are used to illustrate the case in which the MI cross-correlation varies due to a change in the camera parameters. In this case, the aperture of the microlens (which usually corresponds to the size of the MI) limits its field of view [41], and consequently, the smaller the MI is, the less correlated it will be with MIs in its neighborhood (i.e., less overlapped areas of the scene will be captured by neighbor MIs). For completeness, Tables 1 and 2 also show statistics for all LF test images.
The results in Table 1 illustrate the influence of the MI cross-correlation on the usage and performance of the proposed Bi-SS approach for higher bitrates. Comparing the results for Plane and Toy (frame 123) and Plane and Toy (frame 23), it can be seen that the percentage of SS bi-prediction is larger in the case where the MI cross-correlation decreases rapidly (frame 23). Moreover, the bi-prediction is also able to achieve larger bit savings in this case (frame 23) when compared to the LFC Uni-SS solution, where only uni-prediction is used. Similarly, the usage percentage of bi-prediction also tends to be considerably larger for ISO_Chart_12, where the MI cross-correlation is smaller due to changes in the camera parameters (compared to Jeff).
In addition, Fig. 4 illustrates the Bi-SS vector distribution for the LF images Plane and Toy (frame 23) and ISO_Chart_12. It can be seen that the Bi-SS vectors are also distributed according to the MI structure of each tested LF image (as was also concluded for the uni-predicted SS vectors [19]). This is evident for ISO_Chart_12 (in Fig. 4b), where the MIs in the raw LF image are distributed according to a hexagonal grid and, consequently, the SS vector distribution (Fig. 4b) follows the same structure. This fact supports the assumption made in Section 3.1 that more than one good predictor is likely to be found in all directions, distributed according to the MI size and arrangement in the array. Moreover, analyzing the distribution of these SS vectors along the raw LF image (rightmost images in Fig. 4), it can be seen that, although the amplitude of the SS vectors respects the MI structure, the direction tends to be dictated by the object boundaries and textures.
It is also worthwhile to notice (Table 1) that, for all tested LF images considered in Section 4, most of the coding blocks tend to be partitioned down to 8×8 blocks, being smaller than the MI resolution.These results support the assumptions used in the theoretical analysis from Section 3.1, which considered that samples inside each MI follow the same correlation model as samples in a 2D image.
To complete this analysis, Table 2 illustrates the statistical results also for lower bitrates, so as to analyze the performance of the Bi-SS prediction when the SS reference degrades.Comparing the results in Tables 1 and 2 for all test images, it can be observed that using higher QP values results in increasing percentages of usage of the SS uni-prediction, as well as SS-skip modes.This is due to the fact that the Lagrangian multiplier, in (1) and Fig. 3, increases with increasing QP values [35] and, for this reason, the possible quality improvements of using the bi-prediction do not justify the higher number of bits needed for transmitting the SS vectors when minimizing the Lagrangian cost.
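The effect of the Lagrangian multiplier on the choice between uni- and bi-prediction can be made concrete with a small numeric sketch. It assumes the widely used exponential relation between the multiplier and QP, $\lambda = 0.85 \cdot 2^{(QP-12)/3}$, which is an assumption for illustration and not necessarily the exact form used in (1) and Fig. 3. The distortion and rate values are hypothetical.

```python
def lagrange_multiplier(qp, scale=0.85):
    """Assumed exponential relation between QP and the Lagrangian multiplier."""
    return scale * 2.0 ** ((qp - 12) / 3.0)

def best_mode(qp, modes):
    """Pick the mode with the lowest cost J = D + lambda * R."""
    lam = lagrange_multiplier(qp)
    return min(modes, key=lambda name: modes[name][0] + lam * modes[name][1])

# Hypothetical (distortion, bits) pairs: bi-prediction lowers the distortion
# but needs more bits because two SS vectors are transmitted.
modes = {"uni": (1000.0, 20), "bi": (700.0, 32)}
high_rate_choice = best_mode(22, modes)
low_rate_choice = best_mode(42, modes)
```

With these numbers, the extra 12 bits of the second vector are worth spending at QP 22 but not at QP 42, which mirrors the shift toward uni-prediction and skip modes observed in Table 2.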

Bi-SS Prediction Analysis for Different Weighting Coefficients
This section aims at analyzing the RD performance of the proposed LFC Bi-SS solution when different sets of weighting coefficients, $h_0$ and $h_1$ (see Fig. 3), are used for Bi-SS prediction.
For this, the HEVC weighted prediction signaling [37] is used. Basically, the usage of explicit weighted prediction in HEVC is activated by a flag in the Picture Parameter Set (PPS), and different integer weighting factors and offset values can be assigned for prediction in each slice [37]. The resulting predictor block for weighted bi-prediction is then derived as specified in [37], where a log weight denominator (rounding factor) [37] is used to normalize the integer weighting factors and the subsample interpolation filtering process [43]. Notice that determining suitable weighting factors and offsets is out of the scope of the HEVC standard, and they are directly derived from the bitstream at the decoder side.
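As an illustration of how integer weights, offsets, and the log weight denominator interact, the sketch below follows the general form of explicit weighted bi-prediction in the HEVC specification. It is simplified by assuming that both predictors are already at the sample bit depth, so the extra precision shift applied to intermediate samples in the standard is omitted. The function and parameter names are illustrative.

```python
def weighted_bi_pred(p0, p1, w0, w1, o0=0, o1=0, log2_wd=6, bit_depth=8):
    """Weighted combination of two predictor sample lists with integer arithmetic.

    w0, w1   : integer weights (equal weighting corresponds to 1 << log2_wd each)
    o0, o1   : integer offsets
    log2_wd  : log weight denominator used for normalization and rounding
    """
    max_val = (1 << bit_depth) - 1
    rounding = (o0 + o1 + 1) << log2_wd
    out = []
    for a, b in zip(p0, p1):
        value = (a * w0 + b * w1 + rounding) >> (log2_wd + 1)
        out.append(min(max(value, 0), max_val))   # clip to the valid sample range
    return out

# Equal weights reproduce the rounded average used by default bi-prediction.
avg = weighted_bi_pred([100, 50], [103, 60], 64, 64)
# Asymmetric weights (3/4 and 1/4) favor the first predictor.
asym = weighted_bi_pred([100, 50], [103, 60], 96, 32)
```

With equal weights the expression reduces to `(a + b + 1) >> 1`, i.e., the simple average filtering case considered in the theoretical analysis.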
Due to space limits, Fig. 5 illustrates the results for only two LF images (i.e., Seagull and Vespa), but the conclusions are consistent for all LF test images considered in Section 4. From these results, it can be seen that the LFC Bi-SS solution using the average weighting coefficients always presents the best RD performance. Moreover, the less asymmetric the weighting coefficients are, the better the RD efficiency of the LFC Bi-SS solution is shown to be. In addition, in all cases, the weighted Bi-SS outperforms the LFC Uni-SS solution. For this reason, the average weighting coefficients are adopted for the proposed LFC Bi-SS solution.
Notice that it was not within the scope of this work to develop an optimized set of weighting coefficients for each CB (instead of fixing it for each slice). In fact, the theoretical analysis in [38] suggested that it is still possible to achieve further RD performance improvements by adaptively estimating these weighting coefficients; this will be considered in future work, as well as the experimental validation, for LF image coding, of the theoretical conclusion in [33] that the major portion of the gain is already achievable with only two predictor blocks.

MIVP Efficiency Analysis for Bi-SS Prediction
To separately analyze the RD efficiency of using the MIVP vector prediction for Bi-SS prediction, Table 3 shows some preliminary results for comparing the performance of Bi-SS prediction with MIVP (referred to as Bi-SS w/ MIVP) and without MIVP (referred to as Bi-SS w/o MIVP).These results are then compared to the achieved RD improvements presented for the previously proposed LFC Uni-SS solution in [19] (i.e., Uni-SS solution with and without MIVP).
In these tests, the same test conditions adopted in [19] are here used.Notably, HEVC reference software version 14.0 [36] is used as the benchmark, as well as the base software for implementing the proposed codec.RD performance is evaluated here for six different LF images through the BD [42] metrics, i.e., in terms of the luma PSNR of the LF image and the corresponding bitrate (BR) in terms of bits per pixel (bpp) for four QP values (27, 32, 37, and 42).These results of the MIVP influence on RD performance for Uni-SS prediction can also be found in [19].
From these results, it is possible to see that the gains of including the MIVP candidate vectors for Bi-SS prediction are slightly lower than for Uni-SS prediction.However, the MIVP is still relevant for the LFC Bi-SS solution, leading to further bit savings of up to 5.5% (for LF test image Seagull).

Performance Evaluation
This section assesses the performance of the complete LFC Bi-SS codec.For this, the test conditions are firstly introduced and, then, the obtained results are presented and discussed.

Test Conditions
The test conditions to evaluate the performance of the proposed LFC Bi-SS solution can be summarized as follows.

Test Images
Twelve LF test images with different camera setups and scene characteristics are used (see Fig. 6 and Table 4) so as to achieve representative RD results.These are: Plane and Toy (frame number 123 of the sequence with same name) and Demichelis Spark [44] (first frame of a video sequence with same name); Laura, Fredo, Seagull, and Zhengyun [45]; and Flowers, Vespa, Ankylosaurus_&_Diplodocus_1, Fountain_&_Vincent_2, Color_Chart_1, and ISO_Chart_12 [46].The (raw) LF test images were converted to the Y'CbCr 4:2:0 color format before being encoded.

Coding Conditions
The experimental results presented in this section considered the following test conditions:
1) Codec Software Implementation - The reference software of HEVC version 14.0 [36] was used as the benchmark, as well as the base software for implementing the proposed codec.
2) Search Range - A search range value of 128 was adopted for all tested LF images (i.e., w=128 in Fig. 2b).
3) Search Strategy - The full search algorithm with the HEVC quarter-pixel accuracy was used, since fast search algorithms proposed for 2D video coding and for SCC have been shown to cause a significant drop in RD performance for LF image coding [20].
4) Coding Configuration - The results are presented using the Main Still Picture profile [23] and five QP values are considered, i.e., 22, 27, 32, 37, and 42.
5) RD Evaluation - The RD performance was evaluated in terms of three different objective quality metrics: i) Overall PSNR; ii) Rendering-dependent PSNR (PSNR5×5Views); and iii) Rendering-dependent SSIM (SSIM5×5Views). The overall PSNR is calculated by taking the luma PSNR of the raw LF image. Differently, the rendering-dependent metrics (PSNR5×5Views and SSIM5×5Views) are measured in terms of the average luma PSNR and SSIM calculated for a set of views rendered from the reconstructed LF content, similarly to the metrics proposed in [6]. To have a representative number of rendered views, a set of 5×5 views was rendered from equally distributed directional positions. For rendering the views from LF images captured using a focused LF camera setup (Table 4), the algorithm proposed in [47] and referred to as Basic Rendering algorithm was used. In this case, the plane of focus was chosen to represent the case where the main object of the scene is in focus. For LF images captured using the traditional LF camera setup (Table 4), 5×5 VIs were extracted. The rate is given in bpp, calculated as the number of bits needed for encoding the LF image divided by its corresponding number of pixels given in Table 4.
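The rate and PSNR measures listed above can be sketched as follows. This is a minimal illustration that assumes 8-bit luma samples stored as flat lists; the function names are hypothetical, and the rendering of the 5×5 views (and the SSIM computation) is outside its scope.

```python
import math

def bits_per_pixel(num_bits, width, height):
    """Rate in bpp: coded bits divided by the number of pixels of the LF image."""
    return num_bits / (width * height)

def luma_psnr(original, reconstructed, bit_depth=8):
    """Luma PSNR in dB between two equally sized sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)

def rendering_dependent_psnr(original_views, decoded_views):
    """Average luma PSNR over a set of rendered views (e.g., 5x5 = 25 views)."""
    scores = [luma_psnr(o, d) for o, d in zip(original_views, decoded_views)]
    return sum(scores) / len(scores)

# Hypothetical values for illustration only.
rate = bits_per_pixel(1_000_000, 1000, 1000)
psnr = luma_psnr([10, 20, 30, 40], [11, 19, 31, 39])
avg_psnr = rendering_dependent_psnr([[10, 20], [30, 40]], [[11, 19], [31, 39]])
```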

Benchmark Solutions
In order to assess the RD performance for different local and non-local spatial prediction schemes, five HEVC-based coding solutions are compared against the proposed LFC Bi-SS. To guarantee a fair comparison between all of them, the same test conditions presented in Section 4.1.2 are also adopted. These five benchmark solutions are:
1) HEVC - In this case, the original LF image is encoded with HEVC [36], using the Main Still Picture profile [23].
2) HEVC SCC -In this case, the original LF image is encoded using the HEVC SCC reference software version 1.0 [48], where IntraBC prediction [32] is used.As previously discussed in this paper, this solution is a restricted case of the proposed LFC Bi-SS since a reduced set of coding options is used, such as: i) only uni-prediction estimation is allowed considering only integer-pixel precision; ii) partition patterns are limited (i.e., M×M, M×(M/2), (M/2)×M, and (M/2)×(M/2) ); iii) CB sizes are also limited (CBs larger than 16×16 are skipped based on a threshold on the RD cost); and iv) only one dimensional (1D) vectors for 16×16 CBs are allowed.However, it is worth mentioning that some improvements for the IntraBC prediction have been continuously included.For instance, in the HEVC SCC reference software 1.0, the search window was expanded over the entire CB row or column (for 16×16 CBs), and over some positions in the entire picture by using a hash-based search (for 8×8 CBs).
3) LFC Uni-SS -In this case, the original LF image is encoded with the authors' previous solution proposed in [19], where only the uni-predicted candidate is available for the SS estimation and compensation.This solution also uses the MIVP candidate vectors for improving the coding performance (as in the proposed LFC Bi-SS).A search area with w=128 (in Fig. 2b) is also adopted in this case.
4) LFC Restricted-SS - In this case, the original LF image is encoded with the authors' implementation of the solution proposed in [20], where bi-prediction is also allowed by simply using the HEVC inter B-frame prediction. For this, and as explained in [20], the SS reference search area is separated into two different parts, which are assumed to be two different reference pictures [20]. Therefore, as in HEVC inter B-frame prediction, three candidate predictors can be derived: the two best (uni-predicted) candidates from each of the two reference pictures, and a linear combination of them for bi-prediction. As discussed in Section 3, this solution can be seen as a restricted case of the Bi-SS prediction proposed here. It is also worth noting that the solution presented in [20] (as well as the authors' implementation of this solution) does not include the MIVP candidate vectors that are used in both the LFC Uni-SS and LFC Bi-SS solutions. A search area with w=128 (in Fig. 2b) is also adopted in this case.
5) LFC GPR - In this case, an HEVC-based LFC solution using the implicit GPR-based prediction method proposed in [22] is considered. Different from the SS prediction, the predictor block is here given as a linear combination of six nearest neighbor patches, which are implicitly found in two different search windows: i) the horizontal search window (defined as the causal area in the same CB row); and ii) the vertical search window (defined as the causal area in the same CB column). Afterwards, the set of weighting coefficients for combining these six patches is implicitly determined by solving a GPR, and the same process to find the nearest neighbor patches and weighting coefficients is replicated at the decoder side to derive the prediction. For this comparison, the results presented in [22] for coding using this LFC GPR solution are compared to the results for the proposed LFC Bi-SS solution considering the same test conditions (described in [22]). Notably, the results are shown using the BD metrics in terms of the overall PSNR with respect to HEVC reference software [36] with "Intra, main" configuration and considering four different QP values: 22, 27, 32, and 37 (as in [22]). It is important to notice that other benchmark solutions have already been compared against the authors' previous solution LFC Uni-SS in [19] (e.g., a multiview-based method, similar to that proposed in [12,13]). For a more complete performance comparison, please also refer to [19].

Overall RD Performance
Figs. 7 and 8 illustrate the RD performance of the proposed LFC Bi-SS using three different objective quality metrics, i.e., the overall PSNR, and the rendering-dependent metrics PSNR5×5Views and SSIM5×5Views. Additionally, Tables 5 and 6 present the LFC Bi-SS RD performance against four benchmark solutions (i.e., HEVC, HEVC SCC, LFC Uni-SS, and LFC Restricted-SS) in terms of BD metrics, respectively, for higher and lower bpp values. It is worth noting that the comparison between the proposed LFC Bi-SS and the LFC GPR solutions is performed separately in Section 4.4 since, in this case, the results presented in [22] for the LFC GPR solution are directly compared to the LFC Bi-SS RD results obtained with the same set of LF images and the same test coding conditions adopted in [22]. As can be observed in Figs. 7 and 8, independently of the adopted objective quality metric, the results follow the same trend and the proposed LFC Bi-SS solution outperforms the other benchmark solutions in all cases. Moreover, as shown in Tables 5 and 6, significant gains can be achieved by the proposed LFC Bi-SS solution, mainly for lower bpp values (Table 6), achieving on average 2.7 dB (or 51.5 % of bit savings) against HEVC, 1.3 dB (or 32.0 % of bit savings) against HEVC SCC, 0.5 dB (or 14.4 % of bit savings) against LFC Uni-SS, and 0.3 dB (or 9.4 % of bit savings) against LFC Restricted-SS. Additionally, as hypothesized in the theoretical analysis of Section 3.1, increasing the number of predictor blocks from the LFC Uni-SS solution to the LFC Bi-SS solution led to bit savings of up to 22.7 % (Table 5). Furthermore, comparing to the LFC Restricted-SS solution, it can be seen that by jointly estimating both candidate predictors, instead of deriving them separately from different areas, further bit savings of up to 16.9 % (Table 6) can be achieved.
Moreover, comparing the achieved performance gains due to MIVP (Table 3) with the results against the LFC Uni-SS in Table 5, it can be seen that most of the LFC Bi-SS performance gains come from the usage of an improved bi-prediction scheme.However, it should be noticed that the MIVP is mainly advantageous for improving the performance against the LFC Restricted-SS (see Table 5), even when only Uni-SS prediction is allowed.For instance, for the LF image Plane and Toy, the LFC Uni-SS solution (using the MIVP) presents a slightly better performance than the LFC Restricted-SS solution (where the MIVP is not used).
Regarding the performance for different LF image characteristics, it was observed that the coding efficiency of the SS compensated prediction is not as good for close-up images.This can be seen by analyzing the results for LF image Flowers (close-up image) against Fountain_&_Vincent_2 (see Fig. 6g and j, respectively), but it was also observed for other LF images of the dataset in [46].In these close-up images, most of the coding blocks (about 75 % of them) are coded using HEVC intra prediction.

Computational Complexity
The significantly better performance of the LFC Bi-SS solution comes at the price of additional computational load compared to both LFC Uni-SS and LFC Restricted-SS.
To illustrate this fact, Tables 7 and 8 present, respectively, the encoding and decoding time of the LFC Bi-SS solution, and also the ratio between the LFC Bi-SS time and the time of each of the HEVC, HEVC SCC, LFC Uni-SS, and LFC Restricted-SS benchmark solutions. For this, encoding/decoding times were obtained using a machine with an Intel Xeon E5-2620 v2 processor clocked at 2.10 GHz and using the gcc 4.8.3 compiler.
Regarding the encoding complexity (Table 7), it can be seen that coding solutions that make use of a block-based matching algorithm (i.e., HEVC SCC, LFC Uni-SS, LFC Restricted-SS, and LFC Bi-SS) are generally much slower than the conventional HEVC intra coding (which is used as the HEVC benchmark solution with Still Picture Profile in this paper). For instance, LFC Bi-SS is 84.1 times slower than HEVC intra coding on average. The brute force RDO considering a larger set of available prediction modes is also a contributing factor for this increased encoding complexity in the LFC Bi-SS solution. Compared to HEVC SCC, the proposed LFC Bi-SS encoding is 15.5 times slower (on average), mainly due to the brute force search that is performed over a larger set of CB partition patterns and sizes and in quarter-pixel positions, instead of the integer-pixel search that is performed in HEVC SCC using the faster hash-based search algorithm [32]. It is also important to notice that the encoding time using HEVC SCC may vary considerably for different LF images, mainly due to the different runtime complexity of the hash-based search algorithm. This fact can be observed in Table 7 by comparing the significantly different encoding time ratios between Laura and Flowers. Additionally, it can be seen that the proposed LFC Bi-SS solution is 2.0 times slower than LFC Uni-SS and 1.7 times slower than LFC Restricted-SS (on average). In this case, the increase in encoding complexity is mainly due to the locally optimal rate-constrained algorithm that is used for finding the two candidate predictors (Uni-SS and Bi-SS). For instance, for a search window with a given number of (quarter-)pixel positions, the LFC Uni-SS solution requires one SAD computation per position, whereas the LFC Bi-SS solution requires that number multiplied by the maximum number of iterations of the locally optimal rate-constrained algorithm plus one (i.e., three times as many in this work, where the maximum number of iterations is 2).
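The SAD count discussed above can be written compactly. Here $N$ denotes the number of (quarter-)pixel positions in the search window and $I$ the maximum number of iterations; both symbols are notation assumed here for illustration.

```latex
\begin{align}
C_{\text{Uni-SS}} &= N, \\
C_{\text{Bi-SS}}  &= (I + 1)\,N = 3N \quad \text{for } I = 2 .
\end{align}
```

The first $N$ computations in the Bi-SS case correspond to the initial uni-predicted search, and each of the $I$ iterations adds another pass over the $N$ positions. The measured average encoding-time ratio of 2.0 with respect to LFC Uni-SS is below this factor of three, presumably because SAD computation is only one part of the total encoding time.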
Nevertheless, note that a fast search approach can still be adopted for the proposed LFC Bi-SS solution, for instance, by taking advantage of the regular SS vector distribution depicted in Fig. 4. In this case, instead of performing the full search algorithm, the number of positions that are visited for the block-based matching algorithm can be significantly reduced by considering only the most probable positions shown in Fig. 4.
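One possible form of such a reduced candidate set is sketched below. It is only an illustration of the idea, not a method evaluated in this paper: it assumes a square MI grid (a hexagonal arrangement such as that of ISO_Chart_12 would need a different lattice), integer-pel vectors, and a simplified causality test. The function name and the `radius` parameter are hypothetical.

```python
def mi_grid_candidates(mi_size, search_range, radius=0):
    """SS vector candidates located near integer multiples of the MI size.

    Only causal displacements are kept: rows above the current block, or
    positions to the left on the same row. `radius` adds a small neighborhood
    around each grid position to tolerate deviations from the exact grid.
    """
    candidates = set()
    max_step = search_range // mi_size
    for ky in range(-max_step, max_step + 1):
        for kx in range(-max_step, max_step + 1):
            for oy in range(-radius, radius + 1):
                for ox in range(-radius, radius + 1):
                    dx = kx * mi_size + ox
                    dy = ky * mi_size + oy
                    inside = abs(dx) <= search_range and abs(dy) <= search_range
                    causal = dy < 0 or (dy == 0 and dx < 0)
                    if inside and causal:
                        candidates.add((dx, dy))
    return sorted(candidates)

# Example with a hypothetical MI size of 32 and the search range of 128.
grid_only = mi_grid_candidates(32, 128)
```

Each candidate could then be refined to quarter-pixel accuracy, so the number of visited positions depends on the MI size and the chosen neighborhood rather than on the full search window.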
Regarding the decoding complexity (Table 8), it can be observed that the proposed LFC Bi-SS decoding complexity does not vary significantly with respect to the LFC Uni-SS and LFC Restricted-SS solutions. However, the proposed LFC Bi-SS is 12.9 and 15.3 times slower than HEVC and HEVC SCC, respectively, mainly due to the quarter-pixel interpolation filter that is used for the SS compensation and due to the larger set of coding possibilities (e.g., prediction modes, CB partition patterns), which require more time for parsing at the decoder.

Comparison between Bi-SS and GPR-based LF image coding
To complete the LFC Bi-SS RD performance analysis, Table 9 compares the results presented in [22] for the LFC GPR benchmark solution with the LFC Bi-SS solution RD results when using the same set of LF images as well as the same test coding conditions adopted in [22].Namely, the results are shown using the BD metrics in terms of the overall PSNR with respect to HEVC reference software [36] with "Intra, main" configuration and considering four different QP values: 22, 27, 32, and 37 (as in [22]).
From these results, it can be seen that the proposed LFC Bi-SS solution is able to significantly improve the overall RD performance compared to the LFC GPR, leading to average bit savings of 43.9 % with respect to HEVC, compared to the 36.1 % of bit savings achieved by the LFC GPR solution (also with respect to HEVC).
Additionally, Table 9 also compares the encoding and decoding time ratios (with respect to HEVC) observed for both LFC Bi-SS and LFC GPR solutions. It can be seen that the LFC GPR solution is able to significantly reduce the encoding complexity by reducing the training data as proposed in [22]. However, the implicit prediction used in the LFC GPR solution is still a bottleneck in the decoding time, since the same process for finding the nearest neighbor patches and determining the weighting coefficients used in the encoder needs to be repeated at the decoder side. For this reason, the proposed LFC Bi-SS is shown to be significantly faster than the LFC GPR at the decoder side.

Visual Quality Inspection
In addition to the presented BD results, a portion of a central view from each of three different LF test images was used for a visual inspection in Fig. 9. For rendering the views, the algorithm proposed in [47] and referred to as Basic Rendering algorithm was used. For the image in Fig. 9c, the Light Field Toolbox version 0.4 [49] was firstly used to transform from the hexagonal to a square grid of MIs. For all compared solutions, the quantization parameter (QP) [23] of the encoder was adjusted to lead to the same bpp value for all images in Fig. 9. Notice that, since there is still no consensus in the scientific community regarding subjective evaluation methodologies for LF content, these results are shown here as an illustrative qualitative analysis of the proposed coding solution.

Table 8. LFC Bi-SS decoding complexity regarding the benchmark solutions with QP value set to 32 (for each image in Fig. 6)

Table 9. RD performance and complexity comparison between LFC Bi-SS and GPR-based solution proposed in [22] considering QP values 22, 27, 32, and 37 (according to [22]). Columns: LF Image; LFC Bi-SS vs. HEVC; LFC GPR vs. HEVC (from [22])
From a visual inspection of the views rendered from the coded raw LF images in Fig. 9, it is possible to conclude that the proposed LFC Bi-SS presents considerably better visual quality than HEVC and improvements are also noticeable when compared to the LFC Restricted-SS solution (e.g., for Demichelis Spark and Zhengyun1 LF images).Furthermore, it can be seen that the coding artifacts are less evident but still noticeable in the case of Fig. 9c.

Conclusions
This paper proposed an LF image coding solution based on self-similarity compensated bi-prediction (Bi-SS) where two predictor blocks can be jointly estimated from the same search window. As discussed in this paper, the proposed HEVC-based LFC coding architecture was shown to be advantageous in terms of the simplicity of the coding format, which is less dependent on a very precise LF camera calibration process, while keeping the encoder/decoder complexity and memory load comparable to HEVC inter coding. In addition, the proposed LFC Bi-SS led to significantly superior performance when compared to HEVC Main Still Picture Profile, presenting gains of up to 4.3 dB (or 61.1 % of bit savings). Furthermore, jointly estimating the two candidate blocks for Bi-SS prediction led to further RD improvements when compared to the case in which only one candidate block is estimated (with up to 44.1 % of bit savings with respect to the LFC Uni-SS solution), as well as compared to the case in which the two candidate blocks are independently estimated (with up to 16.9 % of bit savings with respect to the LFC

Fig. 1 Coding architecture of the proposed LFC Bi-SS solution based on HEVC (the novel and modified blocks are highlighted in blue). The HEVC blocks for motion estimation and compensation are here omitted since temporal prediction is not used in the proposed light field image coding solution.
prediction: (a) inherent MI cross-correlation in a light field image neighborhood; (b) Bi-SS estimation process (example of a second candidate block and SS vector for bi-prediction is shown in dashed blue line); and (c) Heat map showing the SS vectors distribution when coding an LF image

Fig. 3 Algorithm for jointly estimating the two predictor blocks, displaced by the SS vectors $v_0$ and $v_1$, for the proposed Bi-SS candidate predictor. One index defines which of the two vectors ($v_0$ or $v_1$) is optimized in a particular iteration, while the other index defines the vector that is kept fixed.

Fig. 4 Bi-predicted SS vector distribution (from left to right): heat map of SS vector distribution for first candidate predictor, heat map of SS vector distribution for second candidate predictor, distribution of first (in blue) and second (in red) SS vectors along the encoded raw image.Results are illustrated for: (a) Plane and Toy (frame 23); and (b) ISO_Chart_12.

Fig. 5 RD performance of the proposed LFC Bi-SS solution (compared to the LFC Uni-SS solution) with different sets of weighting coefficients (QP values 22, 27, 32, 37, and 42). The results are shown for two LF test images: (a) Seagull, and (b) Vespa

Fig. 9 Comparison of a portion from the central view rendered from (from left to right): original image; compressed image using HEVC; compressed image using LFC Restricted-SS; and compressed image using the proposed LFC Bi-SS solution. The results are shown for the bpp values: (a) 0.05 bpp for Demichelis Spark; (b) 0.06 bpp for Zhengyun1; and (c) 0.09 bpp for Fountain_&_Vincent_2

Table 2. Influence of MI cross-correlation in mode selection statistics and RD performance for low bitrate

Table 1. Influence of MI cross-correlation in mode selection statistics and RD performance for high bitrate

Table 4. Description of LF test images in Fig. 6

Table 6. LFC Bi-SS overall RD performance with respect to the benchmark solutions for each image in Fig. 6 (QP values 27, 32, 37, and 42)

Table 5. LFC Bi-SS overall RD performance with respect to the benchmark solutions for each image in Fig. 6 (QP values 22, 27, 32, and 37)

Table 7. LFC Bi-SS encoding complexity regarding the benchmark solutions with QP value set to 32 (for each image in Fig. 6)