# Low Voltage/Low Power CORDIC Based DHT Chip Implemented Using Transmission Gate Logic on Sea of Gates

K. Maharatna, A.S. Dhar and S. Banerjee

Department of E & ECE, Indian Institute of Technology, Kharagpur - 721 302

### Abstract

This paper describes a low voltage/low power design of a one-dimensional discrete Hartley transform (DHT) chip using transmission gate logic in a sea of gates semicustom environment. The use of transmission gate logic is shown to be advantageous for extracting optimum low power/low voltage performance in a semicustom design environment.

# 1. Introduction

With the rapid advancement of VLSI technology, systems with high integration density and high clock frequency are emerging. As the microminiaturization of systems continues, the difficulty of providing adequate cooling may either add significant cost to a system or limit the functionality it can provide. As a result, low voltage/low power system design has become a headline topic.

Digital technology finds immense application in signal processing, and future digital signal processing (DSP) systems will need to be of the low power type to improve system performance. The discrete Hartley transform (DHT), which involves only real arithmetic, is already popular in DSP as an alternative to the discrete Fourier transform (DFT) [1]. The DHT is particularly important in biomedical signal processing, especially in medical imaging. To mention two of its numerous medical applications, the two-dimensional fast Hartley transform is employed for processing ultrasound images and for image reconstruction in computerized tomography [2]. A low power/low voltage design technique for implementing the DHT is therefore desirable.
This paper describes a low power/low voltage design of a parallel processing scheme [3] for fast computation of the one-dimensional DHT of prime transform length, using a set of circular CORDIC (COordinate Rotation DIgital Computer) processors as the basic processing elements. The paper is organized as follows. Section 2 discusses the modified CORDIC algorithm, Section 3 gives a brief introduction to the DHT algorithm using the CORDIC processor, Section 4 covers the background of low power design methodology and the design implementation, Section 5 presents the power performance of the design, and Section 6 draws conclusions.

# 2. Brief introduction to modified CORDIC algorithm

The CORDIC algorithm is an iterative process for computing generalized vector rotations. A detailed discussion of the conventional CORDIC algorithm is given in [4, 5]. When computing transforms, the rotation angles are predefined and are normally small because the transform length is long. In such cases the conventional CORDIC algorithm requires a large number of iterations, so the computation speed suffers. Moreover, the conventional CORDIC requires a scaling operation and thus demands more hardware. To avoid these problems the modified CORDIC method [3] is chosen here.

In the modified CORDIC method, instead of rotating the vector to and fro, the desired angle is reached by rotating the vector in the same direction through a small angle every time. The incremental angle $\theta_k$ is chosen such that it can be expressed as $2^{-k}$ and satisfies the condition $\sin\theta_k \approx \theta_k$. Thus, for a rotation through some angle $\xi$, the rotation matrix can be decomposed into

$$\begin{bmatrix} \cos\xi & -\sin\xi \\ \sin\xi & \cos\xi \end{bmatrix} = \prod_{i=1}^{n} \begin{bmatrix} \cos\alpha_i & -\sin\alpha_i \\ \sin\alpha_i & \cos\alpha_i \end{bmatrix}$$ (1)

where $\xi = \alpha_1 + \alpha_2 + \dots + \alpha_n$. In this case the third term in the sine series is neglected.
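The unidirectional micro-rotation idea can be illustrated with a short floating-point sketch in Python. It is not the chip's datapath: the function names, the greedy angle decomposition, the default bit width of 16, the restriction of the angle to the range 0 to $2^{-3}$, and the cosine approximation $\cos\theta_k \approx 1 - 2^{-(2k+1)}$ are all assumptions introduced for this example. Only $\sin\theta_k \approx \theta_k = 2^{-k}$ is taken from the text.

```python
import math


def decompose_angle(xi, b=16, k_min=4):
    """Greedily write xi as a sum of steps 2**-k, k in k_min..b-1.

    Each step is used at most once, so xi must lie in [0, 2**-(k_min-1)).
    Returns the chosen exponents and the leftover angle (assumed scheme).
    """
    if not 0.0 <= xi < 2.0 ** -(k_min - 1):
        raise ValueError("angle outside the range covered by this sketch")
    ks = []
    residual = xi
    for k in range(k_min, b):
        step = 2.0 ** -k
        if residual >= step:
            ks.append(k)
            residual -= step
    return ks, residual


def micro_rotate(x, y, k):
    """One micro-rotation through theta_k = 2**-k, always in the same direction.

    Uses sin(theta_k) ~ 2**-k (from the text) and
    cos(theta_k) ~ 1 - 2**-(2k+1) (an assumption of this sketch).
    In hardware both factors reduce to shifts and adds.
    """
    s = 2.0 ** -k
    c = 1.0 - 2.0 ** -(2 * k + 1)
    return c * x - s * y, s * x + c * y


def rotate(x, y, xi, b=16):
    """Rotate (x, y) through xi by chaining micro-rotations."""
    ks, _ = decompose_angle(xi, b)
    for k in ks:
        x, y = micro_rotate(x, y, k)
    return x, y


if __name__ == "__main__":
    x, y = rotate(1.0, 0.0, 0.1)
    print(x - math.cos(0.1), y - math.sin(0.1))  # small approximation errors
```

Because every step rotates in the same direction and the approximate cosine keeps the vector length very close to 1, the sketch needs no separate scaling stage, which is the property the modified method relies on.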
This modification of the CORDIC algorithm is valid when $k \in \{4, 5, \dots, b-1\}$, where $b$ is the bit width of the machine. Fig. 1 shows the architectural organization of a modified CORDIC block along with its transfer matrix and symbol [3]. The rotation operation is performed by simple shift and add operations. This CORDIC processor is used as the basic processing element to realize the 1-D DHT, as described in the following section.

Fig. 1 Architecture of one dimensional modified CORDIC block

# 3. Introduction to 1-D DHT

The DHT involves only real arithmetic, and the transform kernel is the same for the forward and inverse DHT. This reduces the hardware considerably compared with a DFT architecture. Different algorithms have been proposed to realize the DHT in the form of an array architecture. Here we consider the permutation cycle based approach [3].

Let a given set of $N$ data samples be denoted by $f(\tau)$, where $\tau \in \{0, 1, 2, \dots, N-1\}$. Then the DHT of the sample set is defined as

$$H(\gamma) = \sum_{\tau=0}^{N-1} f(\tau)\,\mathrm{cas}(2\pi\gamma\tau/N)$$ (2)

where $\gamma = 0, 1, 2, \dots, N-1$, $\mathrm{cas}(\zeta) = \cos(\zeta) + \sin(\zeta)$ and $N$ is the transform length. The function 'cas' is periodic, so $\gamma\tau$ is effectively computed modulo $N$. If the transform length $N$ is prime then, for each fixed $\gamma$, $\gamma\tau$ modulo $N$ is a permutation of the sequence $1, 2, \dots, N-1$ for $\gamma, \tau \in \{1, 2, \dots, N-1\}$. Thus in equation (2), the terms under the summation contain the 'cas' values of all the $N$ multiples of $\theta$ [viz. $0, \theta, 2\theta, \dots, (N-1)\theta$], where $\theta = 2\pi/N$. The unique permutation sequence can therefore be used to define the interconnection between the processors.
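The definition in equation (2) and the permutation property for prime $N$ can be checked numerically with a minimal Python sketch. It evaluates the sum directly rather than modelling the CORDIC array, and the function names `cas`, `dht` and `permutation_row` are hypothetical names chosen for this example.

```python
import math


def cas(z):
    """cas(z) = cos(z) + sin(z), the Hartley kernel."""
    return math.cos(z) + math.sin(z)


def dht(f):
    """Direct O(N^2) evaluation of equation (2) for a real sequence f."""
    n = len(f)
    return [
        sum(f[t] * cas(2.0 * math.pi * g * t / n) for t in range(n))
        for g in range(n)
    ]


def permutation_row(g, n):
    """Values of g*t mod n for t = 1..n-1 (a permutation when n is prime)."""
    return [(g * t) % n for t in range(1, n)]


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0]
    transformed = dht(samples)
    # The same kernel inverts the transform, up to a factor 1/N.
    recovered = [v / len(samples) for v in dht(transformed)]
    print(recovered)
    for g in range(1, 5):
        print(g, permutation_row(g, 5))
```

For $N = 5$ and $\gamma = 2$ the row is `[2, 4, 1, 3]`: every nonzero residue appears exactly once, which is why each output needs the same set of 'cas' values in a different order.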
Considering $r = \gamma\tau$ modulo $N$, equation (2) can be written as

$$H(\gamma) = f(0) + H_1(\gamma)$$ (3)

where

$$H_1(\gamma) = \sum_{\tau=1}^{N-1} f(\tau)\,\mathrm{cas}\,r\theta, \quad \theta = 2\pi/N$$ (4)

From equation (3), one can write

$$H'(\gamma) = \sum_{\tau=1}^{N-1} f(\tau)\,\mathrm{cas}\,r\theta, \quad \gamma, \tau \in \{1, 2, \dots, N-1\}$$ (5)

where $H'(\gamma) = H(\gamma) - f(0)$.

A linear array of CORDIC rotational units can be used to compute $U_{N-1} = \sum_{k} f(k)\,\mathrm{cas}\,(N-k)\theta$ [3], where $U_{N-1}$ is the output of the array and $f(k)$ is the $k$th input signal. Using the principle described above, the architecture for a 5 point DHT using CORDIC rotational units is shown in Fig. 2 [3]. This architecture may be extended to longer data sequences of prime length; in that case, however, the interconnection must be defined accordingly.

Fig. 2 Architecture of 5 point DHT chip

# 4. Design implementation

Extensive research over the past few years has proposed different techniques for low power design, such as power reduction through circuit/logic design, proper algorithm selection, system integration and proper technology selection [6]. It has been found that the largest power savings can be achieved at the algorithmic and architectural levels. From the technology selection point of view, full custom design is best suited to low power circuit design, but it is not cost and time effective. Cost and time effectiveness can be achieved with a semicustom design approach. The problem with this approach is that the area cannot be optimized, so the area capacitance may become high, which degrades circuit performance. A methodology is therefore required to extract optimum low power performance when designing in a semicustom environment. For low power circuit applications, pass transistor logic is a suitable candidate [6] owing to its inherently low capacitance.
It has been shown that, for the sea of gates semicustom design approach, transmission gate logic (TGL) is a strong candidate [7, 8] where low power circuit design is concerned. Sea of gates technology is a special gate array type of semicustom design technology without predefined routing channels. The NMOS and PMOS transistors of fixed dimensions, placed symmetrically about the power rails, enable a clean and compact implementation of circuits [9].

The DHT architecture is designed using transmission gate logic on sea of gates technology. The software used for the design is 'OCEAN', developed at Delft University of Technology (Netherlands), with the C3DM (Philips) 1.6 μm double layer CMOS technology. The minimum size transistor dimensions are 1.6 μm x 23.2 μm (NMOS) and 1.6 μm x 29.6 μm (PMOS). The transistor pitch is 8 μm and the metal layer width is 2.4 μm. The poly gate resistance is 700 Ω and 950 Ω, and the threshold voltage is 0.7 V and 1.1 V, for NMOS and PMOS respectively.

The basic circuits are designed hierarchically using the minimum size transistors. In this approach, basic cells for a transmission gate and an inverter are constructed first. By translation of these two basic cells, the CORDIC rotational unit is built up hierarchically. The CORDIC unit is then used as a basic macro to construct the DHT chip. Placement and routing of the basic modules are done automatically, with manual intervention where necessary. Individual cells are isolated by connecting the PMOS and NMOS polysilicon gates to the power rails. For some critical routing portions the polysilicon gates of unused transistors are used. Buffers are placed at appropriate places to compensate for loading effects.
The arithmetic units are constructed using the conditional sum addition technique [10]. The circuit diagram and layout of one 8 bit shifter module (the basic module of the shifter block) [7] are shown in Fig. 3 as examples.

Fig. 3(a) Circuit diagram of 8 bit shifter module

Fig. 3(b) Layout of one 8 bit shifter module

# 5. Performance analysis

The performance of the designed chip is analysed with the Switch Level Timing simulator (SLS) provided with the OCEAN package. This simulator can model pass transistor logic and can include routing and parasitic capacitances when extracting the netlist from the layout. The design is analysed with regard to delay, dynamic power dissipation, power-delay product (PDP) and energy-delay product (EDP).

Good performance in the low power regime can be obtained by operating the circuit at reduced supply voltage [6]. The problem with reducing the supply voltage is that, for single channel transistor logic, the delay increases as the supply voltage approaches the threshold voltage of the transistor. It has been shown in [7] that TGL solves this problem: the delay remains constant until the supply voltage becomes approximately equal to the threshold voltage of one of the transistors.

The dynamic dissipation of the circuit extracted from the layout is calculated using the conventional formula [6]

$$P = \sum_i p_i\, C_{L_i}\, V_{DD}^{2}\, f$$

where $P$ is the dynamic dissipation, $p_i$ is the activity factor of node $i$, $C_{L_i}$ is its load capacitance, $V_{DD}$ is the supply voltage and $f$ is the operating frequency. The activity factor may be calculated by considering initially uncorrelated inputs and then evaluating the switching probability of each nodal capacitance throughout the circuit [6].
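The power formula and the two derived figures of merit can be written out as a small Python sketch. The function names are hypothetical, and the capacitance and frequency values in the usage lines are placeholders chosen for illustration, not values extracted from the layout. The EDP is taken here as the PDP multiplied by the delay, which is an assumption about the definition used.

```python
def dynamic_power(activities, capacitances, vdd, freq):
    """P = sum_i p_i * C_i * Vdd^2 * f, with capacitances in farads."""
    if len(activities) != len(capacitances):
        raise ValueError("one activity factor is needed per node capacitance")
    switched = sum(p * c for p, c in zip(activities, capacitances))
    return switched * vdd ** 2 * freq


def pdp(power, delay):
    """Power-delay product in joules."""
    return power * delay


def edp(power, delay):
    """Energy-delay product in joule-seconds (PDP times delay, assumed)."""
    return pdp(power, delay) * delay


if __name__ == "__main__":
    # Placeholder node capacitances (hypothetical, not from the chip).
    caps = [1e-12, 2e-12]
    worst_case = [1.0, 1.0]  # every activity factor set to 1
    p_high = dynamic_power(worst_case, caps, vdd=5.0, freq=1e6)
    p_low = dynamic_power(worst_case, caps, vdd=1.5, freq=1e6)
    print(p_low / p_high)
```

With fixed activity factors, capacitances and frequency, the supply voltage enters only through $V_{DD}^{2}$, so lowering the supply from 5 V to 1.5 V scales the estimate by $(1.5/5)^2 = 0.09$.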
However, since the simulator used here has no provision for including the glitching effect (which has a considerable impact on the power estimate of a circuit), the activity factor is taken as 1. Calculation with this value yields a somewhat larger power dissipation than in reality, but it may be considered a good approximation that compensates for glitching while also estimating the worst case limit of power dissipation. As transistors with high threshold voltages are used here, the subthreshold power dissipation term is negligible. Since the TGL switches do not have direct access to the power rails, the static dissipation terms are also excluded from the power estimation equation.

The circuit extracted from the layout of the 5 point DHT (shown in Fig. 2) exhibits a simulated delay of 158 ns at 5 V supply. To gain the quadratic advantage in power dissipation, the supply voltage is lowered. It is found that lowering the supply voltage does not affect the delay down to 1.5 V. This is attributed to the fact that the delay of the transmission gate cell is approximately independent of the input level [6]. The simulated power dissipation of the design at 5 V and 1.5 V is 10 mW and 0.9612 mW respectively. The PDP at 5 V supply is 1.58 nJ and the corresponding value at 1.5 V is 0.151 nJ. The EDP at 5 V and 1.5 V is $2.4964 \times 10^{-16}$ J s and $2.3858 \times 10^{-17}$ J s respectively, that is, the PDP multiplied by the 158 ns delay. However, as the supply voltage is lowered, the peak driving current and the noise margin are both affected, so proper precautions should be taken when the circuit is intended to operate at a reduced supply level. From the architectural viewpoint the circuit has the advantage of parallelism, which is beneficial for low power/low voltage application [6].

# 6. Conclusions

The design presented in this paper is based on TGL, implemented in a sea of gates semicustom environment.
The dynamic power dissipation, PDP and EDP show that the circuit can be used for low power/low voltage DSP applications. The use of transmission gate logic enables the circuit to operate at a reduced supply level of 1.5 V, so that a quadratic power advantage is achieved, as is evident from the power estimation formula. The TGL implementation turns out to be a suitable solution for extracting optimal low power performance from the sea of gates type semicustom design approach. The use of high resolution technology may improve the circuit performance considerably.

# References

[1] R. N. Bracewell, The Fourier transform and its applications. New York: McGraw-Hill (1965).

[2] C. H. Paik, G. L. Cote, J. S. Daponte and M. D. Fox, Fast Hartley transforms for spectral analysis of ultrasound Doppler signals. *IEEE Trans. Biomed. Engng* 35(10), 885-888 (1988).

[3] A. S. Dhar and S. Banerjee, An array architecture for fast computation of discrete Hartley transform. *IEEE Trans. Circuits Syst.* 38(9), 1095-1098 (1991).

[4] J. E. Volder, The CORDIC trigonometric computing technique. *IRE Trans. Electron. Comput.* EC-8(3), 330-334 (1959).

[5] J. S. Walther, A unified algorithm for elementary functions. In *Proc. AFIPS Conf.* 379-385 (1971).

[6] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design, Circuits and Systems. Kluwer Academic Publishers (1995).

[7] K. Maharatna, A. S. Dhar and S. Banerjee, A 52. MHz 1.5V 16 bit high performance shifter circuit. *Proc. 5th International Conf. on VLSI and CAD*, 478-480 (1997).

[8] K. Maharatna, A. S. Dhar and S. Banerjee, Low power/low voltage performance study of transmission gate logic circuits implemented on sea of gates. (Communicated).

[9] P. Groeneveld and P. Stravers, OCEAN: The sea of gates design system user's manual.

[10] I. S. Abu Khater, R. H. Yan, A. Bellaouar and M. I. Elmasry, A 1 V low power high performance 32 bit conditional sum adder. *IEEE Symposium on Low Power Electronics* (1994).