An FFT-based acquisition scheme for DS-CDMA systems

This paper introduces an efficient acquisition/correlation technique for DS-CDMA systems using a frequency-domain approach employing TCH-based training blocks (Tomlinson, Cercas and Hughes). The classical time-domain active acquisition technique is compared with the proposed passive matched-filter type frequency-domain technique. Moreover, using the fact that an N-point discrete Fourier transform (DFT) can be partitioned into M smaller DFTs, we present a procedure for simultaneous decoding/despreading and synchronization that switches between 16 bit-length and 256 bit-length cyclic codes, thus providing code rate variability.


I. INTRODUCTION
In today's spread spectrum (SS) communication systems synchronization remains a key implementation factor. Synchronization is mainly a problem of maximizing correlations. The correlation search space can be addressed in a parallel, serial or hybrid manner. The parallel approach is fast but costly and complex to implement. The serial method is slow but simple to implement, and it is therefore the most used technique. The hybrid method [1] exploits the trade-off between speed and complexity.
In this paper we propose an FFT-based method to acquire synchronism in a DS-CDMA system. As an alternative we can use MC-CDMA, where the spreading is performed in the frequency domain and the time synchronization requirements are much lower [10]. However, in MC-CDMA the transmitted signal has strong envelope fluctuations, leading to power amplification difficulties [11,12]. Our method uses cyclic spreading codes selected from a new family of non-linear block cyclic codes, named TCH after Tomlinson, Cercas and Hughes, that were presented in [4], together with a receiver structure that exploits the relation between correlation in the time domain and multiplication in the frequency domain. The proposed scheme permits simultaneous decoding/de-spreading while acquiring synchronism. Several code rates can be used with the same receiver structure. The possibility of changing the code rate or data rate is important because mobile communication channels have impulse responses that are time variant, i.e. channels with fading. This paper is organized as follows: in section II we present the system characterization. In section III we present the TCH synchronization procedure. In section IV we discuss simulation results and in section V we draw some conclusions.

A. Typical operations
In digital radio communications, spread spectrum (SS) techniques, in particular code division multiple access (CDMA), are becoming increasingly important. Each user is identified by a unique spreading sequence that modulates the transmitted/received data. In such a communication system the transmitter takes an input sequence that is spread by a pseudo-noise (PN) sequence, and the resulting signal is subsequently modulated onto a sinusoidal carrier. The receiver performs the corresponding operations, which typically involve downconversion, chip matched filtering with the complex conjugate of the transmit filter, and sampling at the chip intervals. Subsequently the signal is de-spread. This is accomplished by correlating the received samples with a synchronized replica of the PN code.
In practice, a DS-CDMA receiver must replicate the transmitted PN sequence and shift the phase of the sequence replica until it correlates with the received SS signal. When the phase of the receiver sequence replica matches the phase of the incoming PN sequence, the correlation is maximum. When there is a code phase offset between the two signals, the correlation level is low. It is important to understand that a receiver must also detect the incoming carrier signal by replicating the carrier frequency plus Doppler. Thus the DS-CDMA signal acquisition and tracking process is a two-dimensional (PN sequence phase and carrier frequency) signal replication process.
The time/frequency uncertainty region, composed of unitary search cells (Fig. 1), is defined by system and receiver characteristics. To represent the time-frequency uncertainty range a two-dimensional state matrix is used. The matrix represents the quantization of the uncertainty range along the PN code-phase axis and along the frequency-offset axis (due to oscillator drifts and Doppler effects). The region to search is given by m code phase hypotheses and n carrier frequency offset hypotheses. Therefore there are m by n cells to be tested by the acquisition part of the receiver.

B. System characterization
We consider a general direct sequence spread spectrum (DS-SS) modulation system described as follows. The data waveform is given by $d(t)=d_n$, $nT_s \le t < (n+1)T_s$, where $\{d_n\}$ is the binary data symbol sequence with values in $\{-1,1\}$ and $n$ is an integer. The spreading sequence is $c(t)=c_k$, $kT_c \le t < (k+1)T_c$, where $\{c_k\}$ is the code chip sequence with values in $\{-1,1\}$ and $k$ is an integer. The spreading factor $L$ is given by $L=T_s/T_c$, with $T_s$ the symbol time and $T_c$ the chip time interval. The chip shaping is $p(t)$, and the spread signal (for a single user) is the product of the data waveform and the spreading sequence, shaped by $p(t)$. Root raised cosine filters split between the transmitting and receiving sections are used for $p(t)$. The multi-user case is implemented with code-division multiple access (CDMA), where each user is assigned a different spreading code. In this work we use TCH codes.

C. TCH codes
Non-linear TCH (Tomlinson, Cercas, Hughes) cyclic codes of length $L=2^m$, $m$ being a positive integer, are codes that can be defined by one or more generator polynomials. Each generator polynomial represents an $L$-bit cyclic codeword, all its $L-1$ circularly shifted codewords, and can also represent all the $L$ respective negated codewords. Thus, when using $p$ generator polynomials, the number of information bits, $K_{TCH}$, of a TCH code defined in this manner is given by:
$$K_{TCH} = m + \log_2(p) + 1$$
Due to the cyclic nature of these codes and their length being a power of two, a maximum-likelihood decoder can be efficiently implemented using the Fast Fourier Transform (FFT), as presented in figure 2. The decoder only needs to perform an FFT of the received word, multiply it by the FFT of each generator polynomial (which can be previously calculated and stored in the spectra lookup table) and apply the inverse FFT (IFFT) to each result. To make the decoder more efficient, the generator polynomials can be grouped into pairs forming complex sequences (one of the polynomials is the real part and the other is the imaginary part), and the FFTs of these sequences are stored in the spectra lookup table. This halves the number of stored transformed sequences, the number of complex multiplications required and the number of IFFTs. So it is only necessary to perform one FFT of order $L$, $p/2$ complex vector multiplications and $p/2$ IFFTs of order $L$ to obtain the correlation of the received word with all of the $2^{m+\log_2(p)+1}$ possible codewords. The results of these correlations are fed to the peak and sign detector, which finds the polynomial, the number of shifts and the sign corresponding to the highest correlation value. The result is then used by the last decoder block, which outputs the corresponding most probable transmitted data word.
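The decoder steps described above (transform the received word, multiply by stored spectra of paired generator polynomials, inverse transform, then detect peak and sign) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two 16-chip sequences in `GENERATORS` are arbitrary placeholders rather than real TCH polynomials, the function names are hypothetical, and a naive DFT stands in for the FFT so that no external library is needed.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for an FFT so the sketch needs no libraries."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
        out.append(acc / n if inverse else acc)
    return out

def cyclic_shift(word, shift):
    """Codeword obtained by delaying `word` cyclically by `shift` chips."""
    n = len(word)
    return [word[(i - shift) % n] for i in range(n)]

def decode(received, generators):
    """Return (generator index, cyclic shift, sign) of the strongest correlation.

    Generators are processed in pairs (an even number is assumed): one is the
    real part and the other the imaginary part of a single complex sequence.
    """
    spectrum = dft(received)
    best, best_mag = (0, 0, 1), -1.0
    for g in range(0, len(generators), 2):
        packed = [complex(a, b) for a, b in zip(generators[g], generators[g + 1])]
        packed_spec = dft(packed)  # in a receiver this would be a lookup-table entry
        prod = [x * z.conjugate() for x, z in zip(spectrum, packed_spec)]
        corr = dft(prod, inverse=True)
        for shift, value in enumerate(corr):
            # Real part: correlation with generator g.
            # Minus imaginary part: correlation with generator g + 1.
            for index, c in ((g, value.real), (g + 1, -value.imag)):
                if abs(c) > best_mag:
                    best_mag = abs(c)
                    best = (index, shift, 1 if c >= 0 else -1)
    return best

# Placeholder +/-1 sequences of length 16 (NOT actual TCH generator polynomials).
GENERATORS = [
    [1, 1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1],
    [1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, 1, 1, -1, 1, -1],
]

# Negated version of the second generator, delayed by 5 chips.
received = [-c for c in cyclic_shift(GENERATORS[1], 5)]
result = decode(received, GENERATORS)
print(result)
```

Packing two real generators into one complex sequence works because the received word is real: the real and imaginary parts of the inverse transform then separate the two correlations, which is what halves the number of multiplications and inverse transforms.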

D. Synchronization evaluation
Let $\tau$ be the time delay between the received sequence and the locally generated reference sequence. If $T_c$ is the chip duration and $L$ the sequence length, then $\tau \in [0, LT_c]$ and $\hat{\tau}$ is its estimate. Synchronism is acquired if the following condition is satisfied:
$$|\tau - \hat{\tau}| \le \delta T_c$$
Usually $\delta=1/2$, i.e. the acceptable timing error is half a chip. Though $\tau$ is a continuous variable, practical implementation requires the discretization of the range of estimate values. In this paper the evaluation of the receiver architecture is based on the first-attempt acquisition success probability, $P_D$ (or alternatively on the acquisition error probability, $1-P_D$), in a finite time, instead of the traditional mean acquisition time. $P_D$ is the probability of correctly estimating the code phase of the incoming sequence the first time it is tested.

III. TCH SYNCHRONIZATION PROCEDURE
In direct sequence code-division multiple access (DS/CDMA) wireless mobile systems, the first step of processing at the receiver in the mobile station is to synchronize the locally generated pseudo-noise (PN) scrambling sequence to within the tracking range of the received waveform. This process is generally known as initial code acquisition, or scrambling code acquisition. The conventional serial search scheme is normally simple in hardware, but the acquisition time is very long for long-duration PN sequences because its mean acquisition time is directly proportional to the period of the PN sequence employed, since several correlations must be performed [3] (one for each code phase tested). An alternative is to trade the time needed to perform correlations in the time domain for a slightly more complex algorithm (but one taking less processing time) that does the same operation in the frequency domain.

A. Classical vs proposed synchronization
There are several well documented acquisition techniques in the literature [3], [8]. The classical sliding window correlation method (Fig. 3) is based on the comparison between the received PN sequence and a locally generated replica. If the correlation level does not exceed a certain threshold, the receiver increments the offset between the two sequences by half a chip and repeats the correlation procedure. When the given threshold is exceeded, the signal acquisition is completed.
The complexity reduction accomplished by this method is offset by a considerable increase in the acquisition period. It can easily be seen that in a worst case scenario the receiver will need to perform the correlation procedure 2L-1 times until it reaches signal acquisition (L being the sequence period). Ideally, the minimum acquisition time of any sequence is proportional to the sequence period. One possible implementation strategy is to use 2L parallel correlation circuits (considering half chip offsets). Although its performance is optimal, it is not feasible or practical when dealing with long period sequences (large values of the sequence period).
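A minimal sketch of the sliding correlation search may help to see why its cost grows with the sequence period. It makes simplifying assumptions that are not in the text: the offset is stepped by a whole chip instead of half a chip, the channel is noiseless, and `CODE` is an arbitrary placeholder sequence. The function name `serial_search` is hypothetical.

```python
def serial_search(received, replica, threshold):
    """Step the replica one chip at a time until the correlation exceeds the threshold.

    Returns (offset, number of correlations performed), or (None, count) on failure.
    """
    n = len(replica)
    for tests, offset in enumerate(range(n), start=1):
        corr = sum(received[i] * replica[(i - offset) % n] for i in range(n))
        if corr > threshold:
            return offset, tests
    return None, n

# Placeholder +/-1 sequence of length 16 (not an actual spreading code).
CODE = [1, 1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1]

# The incoming sequence is the code delayed cyclically by 11 chips.
incoming = [CODE[(i - 11) % 16] for i in range(16)]
offset, correlations = serial_search(incoming, CODE, threshold=15)
print(offset, correlations)
```

Each loop iteration costs one full correlation over the sequence period, so an offset near the end of the search range requires close to one correlation per tested code phase before the threshold is exceeded.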
Recent technological breakthroughs have made possible the development of high-performance, low-cost digital signal processing (DSP). The use of DSP technology offers the possibility of producing an optimal acquisition circuit as an alternative to classical techniques. The frequency domain technique is described in [4], [6], [7], where the equivalence between time domain techniques and the frequency domain technique is exploited. For two real time domain sequences $r$ and $s$ of $p$ samples the (circular) correlation is given by
$$\sum_{i=0}^{p-1} r(i)\,s(i+l) = \frac{1}{p}\sum_{m=0}^{p-1} R^{*}(m)\,S(m)\,e^{j2\pi ml/p}$$
where the index $i+l$ is taken modulo $p$ and $R$ and $S$ are the discrete Fourier transforms of $r$ and $s$, given respectively by (5) and (6):
$$R(m)=\sum_{i=0}^{p-1} r(i)\,e^{-j2\pi im/p} \qquad (5)$$
$$S(m)=\sum_{i=0}^{p-1} s(i)\,e^{-j2\pi im/p} \qquad (6)$$
The number of operations needed to calculate a single correlation point using the classical technique (with an offset of half a chip) is $2p$ algebraic multiplications and $2p-1$ algebraic additions. Knowing that the mean offset value between the incoming signal and the local replica is $p/2$, which corresponds to $p$ correlation points at half chip resolution, the average equivalent number of basic operations for the classical method is about $2p \cdot p = 2p^2$ algebraic multiplications. Given that a complex multiplication equals four algebraic multiplications and two algebraic additions, and that the additions can be ignored in terms of calculus requirements, a calculus reduction ratio (CRR) can be defined as the quotient of the basic operations of the two techniques considered. This gain factor is 4 for 16 chip sequences and over 20 for 256 chip sequences.
The CRR should not be used to infer the relation between acquisition times, which depend on the electronic implementation of both techniques. The undeniable advantage of the FFT based technique rests in the possibility of acquiring a received SS signal with only a single sequence period sample. This equals the improbable best case scenario of time domain techniques. The expected (mean) acquisition time for time domain techniques equals $p$ sequence periods (for half chip offset resolution).
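The time/frequency equivalence on which the frequency domain technique rests can be verified in a few lines. The derivation below is a sketch under stated assumptions: the sequences are real-valued, all indices are taken modulo p (circular correlation), the DFT uses the usual negative-exponent convention, and the names c(l) and C(m) are introduced here only for illustration.

```latex
% Assumptions: r and s are real with p samples; indices are taken modulo p.
\begin{align*}
c(l) &= \sum_{i=0}^{p-1} r(i)\, s(i+l) \\
C(m) &= \sum_{l=0}^{p-1} c(l)\, e^{-j2\pi ml/p}
      = \sum_{i=0}^{p-1} r(i) \sum_{l=0}^{p-1} s(i+l)\, e^{-j2\pi ml/p} \\
     &= \sum_{i=0}^{p-1} r(i)\, e^{+j2\pi mi/p} \sum_{n=0}^{p-1} s(n)\, e^{-j2\pi mn/p}
        \qquad (n = i + l) \\
     &= R^{*}(m)\, S(m) \\
c(l) &= \frac{1}{p} \sum_{m=0}^{p-1} R^{*}(m)\, S(m)\, e^{+j2\pi ml/p}
\end{align*}
```

All p correlation points c(0), ..., c(p-1) therefore come out of a single inverse transform of the product of the two spectra, which is why one sequence period of samples is enough to test every code phase.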

A. DFT Partition
In the frequency based decoder we use an N-point DFT for each N-chip length code. It turns out (see appendix) that an N-point DFT can be partitioned into M smaller L-point DFTs, where $N = L \cdot M$. Consider the case where, in the process of calculating a 256-point DFT, we also get 16 DFTs of 16 points. This is an elegant result: the same algorithm is used to process one 256-chip code or sixteen 16-chip codes. For synchronization purposes we use either a 256-chip code or a 16-chip code, and for data communication we can also switch between codes of length 256 and 16. In fact, we could also use codes of 32, 64 and 128 chips, but those have inferior correlation properties. In order to study the applicability of this partition we developed a set of Matlab functions and simulated the synchronization operation.
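The partition of an N-point DFT into M smaller L-point DFTs can be sketched as below. Since the appendix equations are not reproduced here, the index mapping is an assumption: samples are stored row first as x[l, m] = x[M*l + m] and outputs are indexed as k = p + L*q, which is the standard decimation-in-time arrangement. The function names are hypothetical and a naive DFT replaces the FFT to keep the sketch self-contained.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT used as a reference and as the small-DFT building block."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def partitioned_dft(x, L, M):
    """Compute an N-point DFT (N = L*M) from M L-point DFTs and L M-point DFTs.

    Returns (X, F): X is the full N-point DFT and F[m] is the L-point DFT of
    column m, i.e. of the interleaved sub-sequence x[m], x[M+m], x[2M+m], ...
    """
    N = L * M
    assert len(x) == N
    # Step 1: one L-point DFT per column m of the row-first array x[l, m].
    F = [dft([x[M * l + m] for l in range(L)]) for m in range(M)]
    # Step 2: multiply by the twiddle factors exp(-j*2*pi*m*p/N).
    G = [[F[m][p] * cmath.exp(-2j * cmath.pi * m * p / N) for p in range(L)]
         for m in range(M)]
    # Step 3: one M-point DFT per index p; the output index is k = p + L*q.
    X = [0j] * N
    for p in range(L):
        row = dft([G[m][p] for m in range(M)])
        for q in range(M):
            X[p + L * q] = row[q]
    return X, F

# A 32-sample +/-1 test sequence split into two 16-point DFTs.
samples = [1 if (7 * i + i // 3) % 5 < 2 else -1 for i in range(32)]
spectrum, column_dfts = partitioned_dft(samples, L=16, M=2)
reference = dft(samples)
max_error = max(abs(a - b) for a, b in zip(spectrum, reference))
print(max_error < 1e-9)
```

The intermediate array `F` already holds the short DFTs of the interleaved sub-sequences, so a receiver that stops after step 1 can process M short codes, while completing steps 2 and 3 yields the transform of one long code.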

B. Communication and Synchronization Frame Format
Let's assume that a receiver is turned on. Although there is a frame format predefined for the data communication, the receiver doesn't know where a frame starts or ends. Therefore, one possible solution is to force the transmitter to periodically broadcast a certain fixed code that the receiver searches for and tries to lock onto. If this code is identified, the rest of the data can be understood correctly. In Fig. 5 we assume a radio frame of $T_F$ seconds with 16 slots in each frame. For a SS-CDMA system $T_F$ is usually around 10 ms. We also assume that each slot is used to transmit 1024 chips. The first slot in each frame is used to transmit a synchronizing code. This is different from the usual approach of repeatedly transmitting a synchronizing code with a period of one slot.
In the context of synchronization we can anticipate several possible time instants for the receiver initial search: a) the receiver starts to load a sequence in the slot reserved for synchronization, so that at least one full sync code will be loaded and processed; b) the receiver starts to load at the end of the sync slot, such that it will not catch one full sync code but only a portion of it; c) the receiver starts to load a sequence in the middle of the frame and therefore must wait until the next sync slot is transmitted. To investigate these possibilities an input sequence was chosen with the format depicted in Fig. 6.

C. Synchronization Matlab Script
The decoding/despreading operation, which is also used for synchronization acquisition, was simulated in Matlab [5]. A typical running script is presented below in algorithmic form:
Generate TX sequence with sync slot
Reshape sequence (Interleaver) as discussed in Fig. A1
For each noise_step
    For iter=1 to max_iterations
        Add Gaussian Noise to reshaped sequence
        For each code_delay_step
            Estimate delay with FFT processing and compute probabilities
        End
    End
End
We start by defining an input sequence m, which is then cut (stripping bits) at the head or tail to simulate the three possible initial time instants of Fig. 6 (see appendix and Fig. 7). Afterwards, sequence m is reordered due to the DFT partition. This is illustrated in Fig. 5 for a smaller length sequence. Then the three-step procedure for de-spreading is applied, and we analyze the correlation peaks to check for the existence of sync codes.
Fig. 7. Example of a 512 bit data sequence including the sync codes. This example corresponds to case (a) of Fig. 6. By changing the offset other time instants can be simulated.
To start the synchronization procedure the receiver loads a sequence of 512 bits into a FIFO (first-in first-out) memory. This sequence is partitioned into 2 sub-sequences of 256 bits each. Each sub-sequence has to be correlated with the known code words used for synchronization purposes. If a match is found, the position of the correlation peak reports an offset indicating where the code starts. If a match is not found, a new group of 512 bits is loaded and the correlation process starts over.
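The buffer search just described can be sketched as follows. To keep the example fast it uses a 16-chip placeholder code and a 32-chip buffer instead of 256 and 512, and a naive DFT instead of an FFT; `CODE`, `find_sync` and the threshold value are illustrative assumptions, not values from the simulations.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; a stand-in for an FFT so the sketch needs no libraries."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[i] * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i in range(n))
        out.append(acc / n if inverse else acc)
    return out

def circular_correlation(segment, code):
    """corr[l] = sum_n segment[n] * code[(n - l) mod N], computed in the frequency domain."""
    seg_spec = dft(segment)
    code_spec = dft(code)  # would be precomputed and stored in a real receiver
    prod = [a * b.conjugate() for a, b in zip(seg_spec, code_spec)]
    return [c.real for c in dft(prod, inverse=True)]

def find_sync(buffer, code, threshold):
    """Split the buffer into code-length blocks and look for a correlation peak.

    Returns (block index, chip position where the code starts), or None so that
    the caller can load a new buffer and start over.
    """
    n = len(code)
    for block in range(len(buffer) // n):
        segment = buffer[block * n:(block + 1) * n]
        corr = circular_correlation(segment, code)
        peak = max(range(n), key=lambda l: corr[l])
        if corr[peak] >= threshold:
            return block, peak
    return None

# Placeholder +/-1 sync code of length 16 (not an actual TCH code).
CODE = [1, 1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1]

# The receiver starts loading 5 chips into a repeated sync code.
buffer = (CODE * 3)[5:5 + 32]
match = find_sync(buffer, CODE, threshold=15)
print(match)
```

Because the sync code is repeated, every code-length block inside the sync slot is a cyclic shift of the code, so the circular correlation still produces a full peak and its position gives the offset of the code start inside the block.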

D. Performance results
For all simulations we used a Monte Carlo analysis with 1000 iterations. For each point we estimate the chip phase delay for misalignments between the received and local sequences of -8, 0 and 8 chips and average the probability of incorrect estimation over all iterations. The channel noise was specified by values of signal to noise ratio per chip (SNR/chip) and then converted to the metric Eb/No, ranging from -12 dB to -12 dB as appropriate. Two cases were considered. The first case considers one single user. The probability (considering only the first attempt) of correctly estimating the code phase delay is illustrated in figure 8. Both ideal and non-ideal sampling were simulated. Ideal sampling corresponds to sampling the chip waveform when it reaches the peak. Non-ideal sampling corresponds to sampling the chip waveform one sample after the ideal instant (four samples per chip were considered). The second case considers 2 and 4 simultaneous (asynchronous) users, and results are presented in figure 9.
For both cases we used root raised cosine filters, half in the transmit section and half in the receiver section, with a roll-off factor of 0.22. The TCH code used for synchronization has a period length of 256 chips. In figure 10 a comparison between sync codes of 16 and 256 chips is depicted.
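The Monte Carlo evaluation of the first-attempt error probability can be outlined as below. This is only a sketch of the counting procedure under strong simplifications: a single user, ideal chip-rate sampling, no pulse shaping, a 16-chip placeholder code, and a direct time-domain correlation in place of the FFT processing. The noise level is given as a standard deviation per chip (`noise_std`); the mapping to Eb/No is not reproduced, and all names are hypothetical.

```python
import random

def estimate_delay(received, code):
    """Return the cyclic delay whose correlation with the local code is largest."""
    n = len(code)
    corr = [sum(received[i] * code[(i - l) % n] for i in range(n)) for l in range(n)]
    return max(range(n), key=lambda l: corr[l])

def error_probability(code, noise_std, delays=(-8, 0, 8), iterations=1000, seed=1):
    """Fraction of first-attempt delay estimates that are wrong.

    Delays are applied cyclically, so a delay of -8 chips on a 16-chip code
    is the same cyclic shift as +8 chips.
    """
    rng = random.Random(seed)
    n = len(code)
    errors = 0
    for _ in range(iterations):
        for delay in delays:
            shifted = [code[(i - delay) % n] for i in range(n)]
            noisy = [c + rng.gauss(0.0, noise_std) for c in shifted]
            if estimate_delay(noisy, code) != delay % n:
                errors += 1
    return errors / (iterations * len(delays))

# Placeholder +/-1 sequence of length 16 (not an actual TCH code).
CODE = [1, 1, 1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, -1, 1]

low_noise = error_probability(CODE, noise_std=0.01, iterations=50)
high_noise = error_probability(CODE, noise_std=4.0, iterations=50)
print(low_noise, high_noise)
```

Fixing the seed makes a run repeatable; sweeping `noise_std` and plotting the returned fraction gives a curve of the same kind as the error probability figures, though not the same numbers.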

V. CONCLUSIONS
Digital communications over a wireless link are impaired by several factors, namely the different fading rates of mobile radio channels. In order to minimize the bit error rate the receiver should adapt to these changes. In this paper we presented a method where data can be decoded and synchronism acquired using basic frequency domain processing for different code rates. The scheme is appropriate for a DSP implementation in the context of software radio architectures.

APPENDIX
From this result we can see that an N-point DFT can be computed by taking M smaller L-point DFTs and then combining these into a larger DFT using L smaller M-point DFTs. From the computation point of view this result can be implemented in a three-step procedure: For each of the columns m=0,...,M-1, compute an L-point DFT and store the result in array F(p,m). Note that this corresponds to the inner sum of (A3.c). Moreover, note that the DFTs are computed for sequences stored in the columns of x[l,m], but due to the partition of (A1.a) these sequences were stored in a special order, i.e., not on a column basis but on a row first basis (see Fig. A1). Note that this sequence re-ordering is in fact a type of interleaver.

Fig. 2 - Basic Maximum Likelihood Decoder for a nonlinear cyclic code with codeword length $2^m$

Fig. 4. Frequency domain correlation technique.
The technique is equivalent to saying that the circular convolution of N-length sequences r[n] and s[n] can be computed through the multiplication of their corresponding DFTs, R[k] and S[k]. The cross-correlation between y[n] and x[n] can be calculated by the convolution of y[n] with x[-n]. The correlation coefficients can be obtained by applying the inverse discrete Fourier transform. The technique also has the advantage of being able to test simultaneously two independent local sequences ($c_j$ and $c_{j+1}$ in figure 4). The operation count is set by the calculus of the FFT and IFFT with p samples, considering that the resulting FFTs of the local replica signals can be held in memory blocks.

Fig. 5. Example of a 32 bit sequence partitioned into two 16 bit subsequences

Fig. 6. Initial time instant possibilities for synchronization search

Fig. 8. Probability of making an incorrect estimation of the code delay for a misalignment between the local and received code sequence for a single user. In this case each point represents the average for delays between -8 and 8 chips. Degradation from non-ideal sampling is illustrated.

Fig. 9. Probability of making an incorrect estimation of the code delay for a misalignment between the local and received code sequence considering 2 and 4 simultaneous (asynchronous) users.
the periodicity property of the twiddle factor