

# A Low-Power 64-point FFT/IFFT Architecture for Wireless Broadband Communication

K. Maharatna, E. Grass and U. Jagdhold

Department of Systems Design  
IHP-GMBH

Technology Park 25, D-15236, Frankfurt (Oder), Germany

email: [maharatna@ihp-ffo.de](mailto:maharatna@ihp-ffo.de)  
FAX +49-335-5625 671

## Abstract

A low-power 64-point FFT/IFFT architecture is developed for OFDM-based wireless broadband communication systems. The proposed architecture satisfies the specifications of IEEE 802.11a and ETSI Bran. It requires only about 25% of the multiplications and 86% of the additions/subtractions of the conventional Cooley-Tukey approach, which leads to savings in both power and area. The architecture can perform both the FFT and the IFFT without changing its internal coefficients, which makes it well suited for practical applications.

Key words: FFT/IFFT, OFDM, Low power.

## 1. Introduction

FFT/IFFT is an integral component of the physical layer (PHY) of wireless broadband communication systems based on Orthogonal Frequency Division Multiplexing (OFDM). The specifications of IEEE 802.11a [1] and ETSI Bran [2], which essentially form the basis of such systems, show that the FFT/IFFT is one of the most computation-intensive components of the PHY layer. According to these specifications, the transceiver has to perform a 64-point IFFT (in the transmit direction) or FFT (in the receive direction) within 3.2 $\mu$s. Meeting this tight timing constraint requires a highly specialized architecture. The conventional Cooley-Tukey algorithm [3] can be used for this purpose, but meeting the specification then calls for either a highly parallel structure or a very high operating frequency, both of which lead to high area and power consumption. It is therefore necessary to develop a simple but efficient design methodology that keeps area and power consumption as low as possible while still satisfying the timing constraint.

In this paper, we propose a power-efficient 64-point FFT/IFFT architecture that satisfies the specifications of IEEE 802.11a and ETSI Bran. Internally, the architecture uses only an 8-point FFT to compute the 64-point FFT/IFFT. A performance analysis shows its advantages over the conventional butterfly approach. The rest of the paper is structured as follows: Section 2 discusses the mathematical formulation, Section 3 describes the architecture, Section 4 evaluates the performance of the proposed structure, and Section 5 draws conclusions.

## 2. The mathematical formulation

The FFT $A(r)$ of a complex data sequence $B(k)$ of length $N$, where $r, k \in \{0, 1, \dots, N-1\}$, is given by

$$A(r) = \sum_{k=0}^{N-1} B(k)\, W_N^{rk} \tag{1}$$

where  $W_N = e^{-2\pi j/N}$ . One may formulate the radix-8 representation of the FFT in the following manner:

Let $N = 8T$, $r = s + Tt$ and $k = l + 8m$, where $s, m \in \{0, 1, \dots, T-1\}$ and $l, t \in \{0, 1, \dots, 7\}$. Substituting these values into equation (1) and simplifying, one obtains

$$A(s + Tt) = \sum_{l=0}^{7} W_8^{lt} \left[ W_{8T}^{sl} \sum_{m=0}^{T-1} B(l + 8m)\, W_T^{sm} \right] \tag{2}$$
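The simplification behind equation (2) is a factorization of the twiddle factor. With $N = 8T$, $r = s + Tt$ and $k = l + 8m$, and using $W_{8T}^{8} = W_T$, $W_{8T}^{T} = W_8$ and $W_{8T}^{8T} = 1$:

$$W_{8T}^{(s+Tt)(l+8m)} = W_{8T}^{sl}\, W_{8T}^{8sm}\, W_{8T}^{Ttl}\, W_{8T}^{8Ttm} = W_{8T}^{sl}\, W_{T}^{sm}\, W_{8}^{lt}$$

The sum over $k$ in equation (1) then splits into an outer sum over $l$ and an inner sum over $m$, which gives equation (2).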

For  $N = 64$ ,  $T = 8$ . Thus, the 64-point FFT can be expressed as

$$A(s + 8t) = \sum_{l=0}^{7} \left[ W_{64}^{sl} \sum_{m=0}^{7} B(l + 8m)\, W_8^{sm} \right] W_8^{lt} \tag{3}$$

Equation (3) shows that the 64-point FFT can be computed by first taking the 8-point FFT of each data slot $B(l), B(l+8), \dots, B(l+56)$ for $l = 0, \dots, 7$, then multiplying the eight outputs of each such FFT by the interdimensional constants $W_{64}^{sl}$, $s = 0, \dots, 7$, and finally taking 8-point FFTs of the resulting data once again.
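The decomposition can be checked numerically with a short Python sketch. This is a software model only, not the paper's hardware datapath: the function names `dft` and `fft_8t` are our own, and a plain direct DFT stands in for the 8-point FFT module. It evaluates the inner $T$-point DFTs, applies the interdimensional constants $W_N^{sl}$, runs the outer 8-point DFTs, and compares the result with a direct evaluation of equation (1).

```python
import cmath


def twiddle(n: int, e: int) -> complex:
    """W_n^e with W_n = exp(-2*pi*j/n), as defined after equation (1)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def dft(x: list[complex]) -> list[complex]:
    """Direct evaluation of equation (1); used as the small-DFT kernel and as a reference."""
    n = len(x)
    return [sum(x[k] * twiddle(n, r * k) for k in range(n)) for r in range(n)]


def fft_8t(b: list[complex]) -> list[complex]:
    """Evaluate equation (2) for N = 8T (equation (3) when T = 8)."""
    n = len(b)
    if n % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    t_len = n // 8  # T
    # Inner sums: for each l, a T-point DFT of the slot B(l), B(l+8), ..., B(l+8(T-1)).
    inner = [dft(b[l::8]) for l in range(8)]  # inner[l][s]
    # Multiply by the interdimensional constants W_N^{s*l}.
    scaled = [[inner[l][s] * twiddle(n, s * l) for s in range(t_len)] for l in range(8)]
    # Outer sums: for each s, an 8-point DFT over l yields A(s + T*t), t = 0..7.
    a = [0j] * n
    for s in range(t_len):
        column = dft([scaled[l][s] for l in range(8)])
        for t in range(8):
            a[s + t_len * t] = column[t]
    return a


x = [complex(k % 5, (3 * k) % 7) for k in range(64)]
err = max(abs(p - q) for p, q in zip(fft_8t(x), dft(x)))
print(err < 1e-9)  # True
```

Because the function follows equation (2) rather than hard-coding $T = 8$, the same sketch also works for other multiples of 8, which is a convenient sanity check of the index ranges.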

The IFFT can be performed by first swapping the real and imaginary parts of the incoming data, then performing the forward FFT on them, and finally swapping the real and imaginary parts of the output once again. This method allows the IFFT to be performed without changing any internal coefficients and thus results in a more efficient hardware implementation.
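The identity behind this trick is $\mathrm{swap}(z) = j\,\overline{z}$, so swapping, forward-transforming and swapping again computes $\overline{\mathrm{DFT}(\overline{x})}$, which is the inverse DFT without the $1/N$ factor. The Python sketch below checks this numerically. The function names are ours, and the $1/N$ scaling, which the text does not discuss, is applied separately here as an assumption.

```python
import cmath


def dft(x: list[complex]) -> list[complex]:
    """Forward transform, equation (1): A(r) = sum_k B(k) W_N^{rk}."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * r * k / n) for k in range(n))
            for r in range(n)]


def swap_re_im(x: list[complex]) -> list[complex]:
    """Exchange the real and imaginary parts of every sample."""
    return [complex(v.imag, v.real) for v in x]


def ifft_via_swap(x: list[complex]) -> list[complex]:
    """Inverse transform built only from the forward one (no coefficient change).
    Returns sum_r x(r) W_N^{-rk}, i.e. N times the normalized inverse DFT."""
    return swap_re_im(dft(swap_re_im(x)))


samples = [complex(k % 4, 1 - k % 3) for k in range(8)]
spectrum = dft(samples)
recovered = [v / len(samples) for v in ifft_via_swap(spectrum)]
print(max(abs(p - q) for p, q in zip(recovered, samples)) < 1e-9)  # True
```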

## 3. Architectural description

The basic architecture of the proposed 64-point FFT/IFFT module is shown in Figure 1. It uses one input buffer, one 8-point FFT module, an internal buffer and four real multipliers. According to the specifications of IEEE 802.11a and ETSI Bran, the FFT block receives serial data for a duration of 3.2 $\mu$s every 4 $\mu$s. The input data slots are stored in the input buffer every 4 $\mu$s, and the 8-point FFT module fetches the data from the buffer as soon as the computation of the 64-point FFT for a particular data slot is completed.

After the first 8-point FFT stage on the input data sequence, the resulting data undergo the interdimensional constant multiplication. We observed that, in the present approach, computing the 64-point FFT requires only 49 complex multiplications, which can be performed using only 9 unique interdimensional constants. The multiplied data are stored in an internal register 'cb' (shown in Figure 1), from where they are routed back to the 8-point FFT module in the order given by equation (3) to generate the final result. The final results are stored in 'cb' once again, from where the output is generated serially.
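Both counts can be explored with a short Python sketch. The figure of 49 follows directly from equation (3): of the 64 constants $W_{64}^{sl}$, the 15 with $s = 0$ or $l = 0$ equal 1. For the 9 unique constants, we assume (the paper does not say how they are obtained) a reduction by the eighth-wave symmetry of $W_{64}$: since $W_{64}^{e+16} = -j\,W_{64}^{e}$ and $W_{64}^{16-r} = -j\,\overline{W_{64}^{r}}$, every constant equals one of $W_{64}^{0}, \dots, W_{64}^{8}$ up to factors of $-j$ and conjugation, which amount to swapping and negating real and imaginary parts. The helper names are hypothetical.

```python
import cmath


def w64(e: int) -> complex:
    """W_64^e = exp(-2*pi*j*e/64)."""
    return cmath.exp(-2j * cmath.pi * e / 64)


# Exponents s*l of the constants W_64^{s*l} in equation (3), s, l = 0..7,
# skipping the 15 trivial cases s = 0 or l = 0 (W_64^0 = 1).
exponents = [s * l for s in range(8) for l in range(8) if s * l != 0]


def representative(e: int) -> int:
    """Fold e into 0..8 using W^(e+16) = -j*W^e and W^(16-r) = -j*conj(W^r)."""
    r = e % 16
    return min(r, 16 - r)


def rebuild(e: int) -> complex:
    """Recover W_64^e from W_64^representative(e) using only -j factors and conjugation."""
    q, r = divmod(e, 16)
    base = w64(r) if r <= 8 else -1j * w64(16 - r).conjugate()
    return (-1j) ** q * base


print(len(exponents))                                   # 49
print(sorted({representative(e) for e in exponents}))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(all(abs(rebuild(e) - w64(e)) < 1e-12 for e in exponents))  # True
```

Under this assumption the representative 0 comes only from $W_{64}^{16} = -j$ (at $s = l = 4$), which reduces to a swap and a negation rather than a true multiplication.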

The input mechanism, the internal computation and the data output mechanism are pipelined. The parallelism and pipelining introduced in this architecture are favorable from the power consumption point of view.

To perform both the FFT and the IFFT with the same architecture, we introduce a signal 'mode'. A logic LOW on 'mode' sets the processor to forward FFT mode, while a logic HIGH enables IFFT operation.

Two additional signals, 'data\_valid' and 'data\_next', indicate valid input data and valid output data, respectively. These signals are important for integrating the processor into a complete wireless broadband communication system, since they signal valid-data conditions to the preceding and following modules of the system. Thus, this FFT/IFFT processor can be used as a stand-alone processor or integrated with other components to form a complete system.

## 4. Performance of the architecture

From the algorithmic point of view, the proposed architecture requires fewer arithmetic operations than the conventional Cooley-Tukey algorithm, as shown in Table 1.

| Algorithm        | Complex multiplications | Additions/subtractions |
|------------------|-------------------------|------------------------|
| Cooley-Tukey [3] | 192                    | 1152                   |
| Proposed         | 49                     | 994                    |

Table 1. Comparison of the number of arithmetic operations with the Cooley-Tukey algorithm

The above comparison shows that the proposed architecture requires about 25% of the multiplications (49 versus 192) and about 86% of the additions/subtractions (994 versus 1152) of the conventional approach. This results in a significant reduction in power dissipation and enables high-speed operation.
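The Cooley-Tukey figures in Table 1 can be reproduced with one plausible accounting, which is an assumption on our part since the paper does not state its counting convention: a radix-2 transform with $(N/2)\log_2 N$ butterflies, one complex multiplication per butterfly, and additions counted as real operations. The Python sketch below applies this accounting and computes the ratios quoted in the text; the function name is hypothetical.

```python
def radix2_counts(n: int) -> tuple[int, int]:
    """Counts for a radix-2 Cooley-Tukey FFT of length n (a power of two),
    treating every butterfly twiddle as a complex multiplication."""
    stages = n.bit_length() - 1           # log2(n)
    butterflies = (n // 2) * stages
    complex_mults = butterflies           # one twiddle product per butterfly
    complex_adds = 2 * butterflies        # one addition and one subtraction per butterfly
    # A complex addition costs 2 real additions; a complex multiplication in the
    # 4-multiplier form costs 2 real additions/subtractions.
    real_adds = 2 * complex_adds + 2 * complex_mults
    return complex_mults, real_adds


ct_mults, ct_adds = radix2_counts(64)
proposed_mults, proposed_adds = 49, 994   # Table 1, proposed architecture
print(ct_mults, ct_adds)                                     # 192 1152
print(round(100 * proposed_mults / ct_mults, 1),
      round(100 * proposed_adds / ct_adds, 1))               # 25.5 86.3
```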

The architecture is first coded in VHDL and then simulated using Mentor Graphics' ModelSim simulator. As an example, the simulation result of the FFT for a pure cosine input is shown in Figure 2(a). The result of the IFFT on the resulting data is shown in Figure 2(b), which demonstrates the functional correctness of the architecture. The architecture is synthesized for $0.25\,\mu\text{m}$ CMOS technology at a 20 MHz clock frequency using the Synopsys Design Analyzer tool. The synthesized circuit is simulated with the same simulator, which again confirms the correctness of the structure. The synthesis result shows that the area of the complete FFT core is $2.4117\ \text{mm}^2$, equivalent to 81.666K inverters in that technology. At the operating frequency of 20 MHz, the peak power consumption of the whole structure is 66.625 mW. Figure 3 shows the power consumption of the processor at different stages of operation.

At a 20 MHz clock frequency, the core architecture can compute a 64-point FFT/IFFT in 0.9 $\mu$s. With the serial input and serial output circuitry included, however, the throughput of the architecture (here taken as the time to serially output all 64 transformed data) is 3.15 $\mu$s. Since this is within the 3.2 $\mu$s required by the specifications, the proposed architecture is well suited for OFDM-based wireless broadband communication systems.
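A quick consistency check of these timing figures, assuming (this is not stated in the paper) that the serial interface transfers one complex sample per 20 MHz clock cycle. The 64 samples of one 3.2 $\mu$s data slot then arrive at exactly the clock rate, and the quoted times correspond to whole numbers of clock cycles:

$$\frac{3.2\ \mu\text{s}}{64\ \text{samples}} = 50\ \text{ns per sample} = \frac{1}{20\ \text{MHz}}$$

$$0.9\ \mu\text{s} \times 20\ \text{MHz} = 18\ \text{cycles}, \qquad 3.15\ \mu\text{s} \times 20\ \text{MHz} = 63\ \text{cycles}$$

Both values fit within the 3.2 $\mu$s transform window and the 4 $\mu$s period at which new data slots arrive.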

## 5. Conclusion

An FFT/IFFT processor that satisfies the specifications of IEEE 802.11a and ETSI Bran has been described in this article. The architecture offers advantages in terms of area, timing and power dissipation. It can be used as a stand-alone processor or integrated with other components to construct a complete wireless broadband communication system.

## References

- [1] IEEE P802.11a/D7.0, "Draft supplement to standard [for] information technology-telecommunications and information exchange between systems - local and metropolitan area networks-specific requirements – Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High speed physical layer in the 5 GHz band".
- [2] DTS/BRAN-0023003 V0.k (1999 – 10), "Broadband Radio Access Networks (BRAN); HIPERLAN type 2 technical specification; Physical (PHY) layer".
- [3] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series", *Math. Computation*, vol. 19, pp. 297–301, 1965.



Figure 1. The basic architecture of the FFT/IFFT processor.



Figure 2 (a).1. Input for the FFT



Figure 2 (a).2. Output of the FFT



Figure 2 (b).1. Input for the IFFT



Figure 2 (b).2. Output of the IFFT



Figure 3. Power consumption of the processor over different clock cycles