India International Centre, New Delhi 15th May 2016, www.conferenceworld.in

(ICSTM-16)

ISBN: 978-81-932074-8-2

# AREA-DELAY EFFICIENT FIR FILTER ARCHITECTURE FOR FIXED AND RECONFIGURABLE APPLICATIONS

O. Ponthilakji<sup>1</sup>, Mr. Arumugam<sup>2</sup>

<sup>1</sup>PG scholar, <sup>2</sup>Assistant Professor

Department of ECE, Sasurie Academy of Engineering, Coimbatore (India)

#### **ABSTRACT**

Transpose form finite-impulse response (FIR) filters are inherently pipelined and support the multiple constant multiplication (MCM) technique, which results in a significant saving of computation. However, unlike the direct-form configuration, the transpose form configuration does not directly support block processing. In this paper, we explore the possibility of realizing a block FIR filter in transpose form configuration for area-delay efficient realization of large-order FIR filters for both fixed and reconfigurable applications. Based on a detailed computational analysis of the transpose form configuration of the FIR filter, we have derived a flow graph for the transpose form block FIR filter with optimized register complexity. A generalized block formulation is presented for the transpose form FIR filter. We have derived a general multiplier-based architecture for the proposed transpose form block filter for reconfigurable applications. A low-complexity design using the MCM scheme is also presented for the block implementation of fixed FIR filters. The proposed structure involves significantly less area-delay product (ADP) and less energy per sample (EPS) than the existing block implementation of the direct-form structure for medium or large filter lengths, while for short-length filters, the block implementation of the direct-form FIR structure has less ADP and less EPS than the proposed structure. Application-specific integrated circuit synthesis results show that the proposed structure for block size 4 and filter length 64 involves 42% less ADP and 40% less EPS than the best available FIR filter structure proposed for reconfigurable applications. For the same filter length and the same block size, the proposed structure involves 13% less ADP and 12.8% less EPS than the existing direct-form block FIR structure.

Index Terms—Block processing, finite-impulse response (FIR) filter, reconfigurable architecture, VLSI.

#### I. INTRODUCTION

The finite-impulse response (FIR) digital filter is widely used in several digital signal processing applications, such as speech processing, loudspeaker equalization, echo cancellation, adaptive noise cancellation, and various communication applications, including software-defined radio (SDR). Many of these applications require FIR filters of large order to meet stringent frequency specifications. Very often these filters need to support a high sampling rate for high-speed digital communication. The number of multiplications and additions required for each filter output, however, increases linearly with the filter order. Since there is no redundant computation available in the FIR filter algorithm, real-time implementation of a large-order FIR filter in a resource-constrained environment is a challenging task.

Filter coefficients very often remain constant and are known a priori in signal processing applications. This feature has been utilized to reduce the complexity of realizing the multiplications. Several designs have been suggested by various researchers for efficient realization of FIR filters (having fixed coefficients) using distributed arithmetic (DA) and multiple constant multiplication (MCM) methods. DA-based designs use lookup tables (LUTs) to store precomputed results to reduce the computational complexity. The MCM method reduces the number of additions required for the realization of multiplications by common subexpression sharing when a given input is multiplied with a set of constants. The MCM scheme is more effective when a common operand is multiplied with a larger number of constants. Therefore, the MCM scheme is suitable for the implementation of large-order FIR filters with fixed coefficients. However, MCM blocks can be formed only in the transpose form configuration of FIR filters.

The block-processing method is popularly used to derive high-throughput hardware structures. It not only provides a throughput-scalable design but also improves the area-delay efficiency. The derivation of a block-based FIR structure is straightforward when the direct-form configuration is used, whereas the transpose form configuration does not directly support block processing. However, to take the computational advantage of the MCM, the FIR filter is required to be realized in the transpose form configuration.

There are some applications, such as the SDR channelizer, where FIR filters need to be implemented in reconfigurable hardware to support multistandard wireless communication.
Several designs have been suggested for efficient realization of reconfigurable FIR (RFIR) filters using general multipliers and constant multiplication schemes. Chen and Chiueh [8] have proposed a canonic signed digit (CSD)-based RFIR filter, where the nonzero CSD values are modified to reduce the precision of filter coefficients without significant impact on filter behaviour. However, the reconfiguration overhead is significantly large, and the design does not provide an area-delay efficient structure. These architectures are more appropriate for lower order filters and not suitable for channel filters due to their large area complexity. The constant shift method (CSM) and the programmable shift method (PSM) have also been proposed for RFIR filters [9].

#### II. EXISTING WORK

The existing structure for the block FIR filter, based on the recurrence relation, is shown in Fig. 6 for the block size L = 4. It consists of one coefficient selection unit (CSU), one register unit (RU), M inner product units (IPUs), and one pipeline adder unit (PAU). The CSU stores the coefficients of all the filters to be used for the reconfigurable application. It is implemented using N ROM LUTs, such that the filter coefficients of any particular channel filter are obtained in one clock cycle, where N is the filter length. The RU receives $\mathbf{x}_k$ during the kth cycle and produces L rows of $\mathbf{S}_k^0$ in parallel. The L rows of $\mathbf{S}_k^0$ are transmitted to the M IPUs of the structure. The M IPUs also receive M short-weight vectors from the CSU such that, during the kth cycle, the (m + 1)th IPU receives the weight vector $\mathbf{c}_{M-m-1}$ from the CSU and the L rows of $\mathbf{S}_k^0$ from the RU. Each IPU performs the matrix-vector product of $\mathbf{S}_k^0$ with the short-weight vector $\mathbf{c}_m$ and computes a block of L partial filter outputs ($\mathbf{r}_k^m$). Therefore, each IPU performs L inner-product computations of the L rows of $\mathbf{S}_k^0$ with a common weight vector $\mathbf{c}_m$. The structure of the (m + 1)th IPU is shown in Fig. 7(b). It consists of L L-point inner-product cells (IPCs). The (l + 1)th IPC receives the (l + 1)th row of $\mathbf{S}_k^0$ and the coefficient vector $\mathbf{c}_m$, and computes a partial result of the inner product $r^m(kL - l)$, for $0 \le l \le L - 1$. The internal structure of the (l + 1)th IPC for L = 4 is shown in Fig. 8(a). All the M IPUs work in parallel and produce M blocks of results ($\mathbf{r}_k^m$). These partial inner products are added in the PAU to obtain a block of L filter outputs. In each cycle, the structure receives a block of L inputs and produces a block of L filter outputs, where the duration of each cycle is $T = T_M + T_A + T_{FA}\log_2 L$, $T_M$ is one multiplier delay, $T_A$ is one adder delay, and $T_{FA}$ is one full-adder delay.
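The L-point inner-product cell described above can be mimicked in software. The following is a minimal sketch, not the hardware of the cited figures: the function name `inner_product_cell` and the pairwise (balanced) adder-tree arrangement are assumptions made for illustration. It shows that one cell needs L multipliers and (L − 1) adders, and that a balanced tree has log2 L adder levels, which is consistent with the log2 L factor in the cycle period. How those levels map onto the adder and full-adder delays is not modeled here.

```python
def inner_product_cell(row, c):
    """Software model of one L-point inner-product cell (IPC).

    Returns (value, adder_levels, adder_count). The pairwise adder tree
    is an assumed arrangement used only for illustration.
    """
    if len(row) != len(c):
        raise ValueError("row and coefficient vector must have equal length")
    # L multipliers operate in parallel on the row and the weight vector.
    terms = [a * b for a, b in zip(row, c)]
    levels = 0
    adders = 0
    # Reduce the products pairwise until a single sum remains.
    while len(terms) > 1:
        paired = [terms[i] + terms[i + 1] for i in range(0, len(terms) - 1, 2)]
        adders += len(paired)
        if len(terms) % 2:          # odd element passes through unchanged
            paired.append(terms[-1])
        terms = paired
        levels += 1
    return terms[0], levels, adders


if __name__ == "__main__":
    # Hypothetical row of the input matrix and short weight vector for L = 4.
    print(inner_product_cell([1, 2, 3, 4], [5, 6, 7, 8]))
```

For L = 4 the sketch uses three adders arranged in two levels, matching the (L − 1) adder count per cell.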

#### 2.2 Drawbacks

- Supports block processing only
- High delay
- High area

#### III. PROPOSED METHOD

In several applications the coefficients of FIR filters remain fixed, whereas other applications, such as the SDR channelizer, require separate FIR filters of different specifications to extract one of the desired narrowband channels from the wideband RF front end. These FIR filters need to be implemented in an RFIR structure to support multistandard wireless communication. In this section, we present a block FIR filter structure for such reconfigurable applications. We also discuss the implementation of the block FIR filter for fixed coefficients using the MCM scheme.

#### 3.1 Proposed Structure

The proposed structure for the block FIR filter is shown in Fig. 3.1 for the block size L = 4. It consists of one coefficient selection unit (CSU), one register unit (RU), M inner product units (IPUs), and one pipeline adder unit (PAU). The CSU stores the coefficients of all the filters to be used for the reconfigurable application. It is implemented using N ROM LUTs, such that the filter coefficients of any particular channel filter are obtained in one clock cycle, where N is the filter length. The RU receives $\mathbf{x}_k$ during the kth cycle and produces L rows of $\mathbf{S}_k^0$ in parallel. The L rows of $\mathbf{S}_k^0$ are transmitted to the M IPUs of the proposed structure. The M IPUs also receive M short-weight vectors from the CSU such that, during the kth cycle, the (m + 1)th IPU receives the weight vector $\mathbf{c}_{M-m-1}$ from the CSU and the L rows of $\mathbf{S}_k^0$ from the RU. Each IPU performs the matrix-vector product of $\mathbf{S}_k^0$ with the short-weight vector $\mathbf{c}_m$ and computes a block of L partial filter outputs ($\mathbf{r}_k^m$). Therefore, each IPU performs L inner-product computations of the L rows of $\mathbf{S}_k^0$ with a common weight vector $\mathbf{c}_m$. In each cycle, the proposed structure receives a block of L inputs and produces a block of L filter outputs, where the duration of each cycle is $T = T_M + T_A + T_{FA}\log_2 L$, $T_M$ is one multiplier delay, $T_A$ is one adder delay, and $T_{FA}$ is one full-adder delay.
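The data flow through the RU, IPUs, and PAU can be checked with a short behavioural model. The sketch below is an illustration under stated assumptions rather than the authors' design: the input matrix is taken to have entries x(n − i − j), where n is the newest sample of the block and i, j run from 0 to L − 1; the short weight vectors are taken as c_m = [h(mL), ..., h(mL + L − 1)]; and M = N/L. The function names `direct_fir` and `block_fir_transpose` are hypothetical, and samples before time zero are assumed to be zero.

```python
def direct_fir(x, h):
    """Reference filter: y(n) = sum_i h(i) x(n - i), with x(n) = 0 for n < 0."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def block_fir_transpose(x, h, L):
    """Behavioural model of a transpose-form block FIR filter (assumed layout)."""
    N = len(h)
    if N % L or len(x) % L:
        raise ValueError("filter length and input length must be multiples of L")
    M = N // L
    c = [h[m * L:(m + 1) * L] for m in range(M)]   # short weight vectors c_m
    xp = [0] * L + list(x)                         # zero history before n = 0
    acc = [[0] * L for _ in range(M + 1)]          # PAU registers; acc[M] stays 0
    y = []
    for k in range(len(x) // L):
        newest = k * L + L - 1                     # newest sample index in block k
        # RU: L x L input matrix, symmetric because entries depend on i + j only.
        S = [[xp[newest - i - j + L] for j in range(L)] for i in range(L)]
        # IPUs: every IPU multiplies the same matrix by its own weight vector.
        r = [[sum(S[i][j] * c[m][j] for j in range(L)) for i in range(L)]
             for m in range(M)]
        # PAU output: r_0 plus the partial sum delayed by one block.
        out = [r[0][i] + acc[1][i] for i in range(L)]
        # Register update: acc[m] <- r_m + old acc[m + 1] (one block delay each).
        acc = ([[0] * L]
               + [[r[m][i] + acc[m + 1][i] for i in range(L)] for m in range(1, M)]
               + [[0] * L])
        y.extend(reversed(out))                    # out[i] is y(newest - i)
    return y


if __name__ == "__main__":
    x = list(range(1, 17))                         # hypothetical input samples
    h = [1, 2, 3, 4, 5, 6, 7, 8]                   # hypothetical coefficients
    print(block_fir_transpose(x, h, 4) == direct_fir(x, h))
```

The register update in the loop is the software counterpart of the nested block delays in the recurrence: each partial result r_m is delayed by m block cycles before it reaches the output.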








Fig. 3.1 Proposed structure: (a) structure of the RU; (b) structure of the (m + 1)th IPU

#### 3.2 MCM-Based Implementation of Fixed-Coefficient FIR Filter

In this section, we discuss the derivation of MCM units for the transpose form block FIR filter and the design of the proposed structure for fixed filters. For fixed-coefficient implementation, the CSU is no longer required, since the structure is to be tailored for only one given filter. Similarly, the IPUs are not required. Instead, the multiplications are mapped to MCM units for a low-complexity realization. In the following, we show that the proposed formulation for MCM-based implementation of the block FIR filter makes use of the symmetry in the input matrix $\mathbf{S}_k^0$ to perform horizontal and vertical common subexpression elimination [17] and to minimize the number of shift-add operations in the MCM blocks.




The recurrence relation can be expressed as

$$Y(z) = z^{-1}\left(\cdots\left(z^{-1}\left(z^{-1}\mathbf{r}_{M-1} + \mathbf{r}_{M-2}\right) + \mathbf{r}_{M-3}\right) + \cdots + \mathbf{r}_1\right) + \mathbf{r}_0$$

where $\mathbf{R} = \mathbf{S}_k^0 \cdot \mathbf{C}$, so that $\mathbf{r}_m = \mathbf{S}_k^0 \cdot \mathbf{c}_m$ for $0 \le m \le M - 1$.
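To make the idea of shift-add sharing in an MCM block concrete, the following sketch multiplies one input by three constants while reusing a common subexpression. The constants 5, 21, and 45 and the function names are hypothetical, chosen only for illustration; they are not coefficients from this paper. The unshared adder count assumes a plain binary shift-add multiplier (one adder per nonzero bit beyond the first), not a CSD encoding.

```python
def mcm_shared(x):
    """Multiply x by the constants 5, 21 and 45 using three adders in total."""
    x5 = (x << 2) + x        # 5x  = 4x + x
    x21 = (x << 4) + x5      # 21x = 16x + 5x, reuses 5x
    x45 = (x5 << 3) + x5     # 45x = 8 * 5x + 5x, reuses 5x
    return x5, x21, x45


def adders_without_sharing(constants):
    """Adders needed if each constant is built separately from its binary form."""
    return sum(bin(c).count("1") - 1 for c in constants)


if __name__ == "__main__":
    print(mcm_shared(3))
    print(adders_without_sharing([5, 21, 45]))
```

In this example the shared form needs three adders against six for the separate binary shift-add multipliers. The saving comes from one operand being multiplied by several constants, which is why the technique fits the transpose form.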

#### 3.3 Hardware and Time Complexities

The proposed structure for reconfigurable applications consists of one CSU, one RU, M IPUs, and one PAU. The CSU consists of N ROM units of P words each, where P is the number of FIR filters to be implemented by the proposed reconfigurable structure. We have excluded the complexity of the CSU from the performance comparison, since it is common to all the RFIR structures. Each IPU comprises L IP cells, where each IP cell involves L multipliers and (L − 1) adders. The RU involves (L − 1) registers of B-bit width. The PAU involves L(M − 1) adders and the same number of registers, where each register has a width of $(B + B')$, with $B$ and $B'$ being the bit widths of the input samples and the filter coefficients, respectively. Therefore, taking M = N/L, the proposed structure involves $LN$ multipliers, $L(N - 1)$ adders, and $[B(N - 1) + B'(N - L)]$ flip-flops (FFs), and processes L samples in every cycle, where the cycle period is $T = T_M + T_A + T_{FA}\log_2 L$.
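The totals above can be cross-checked by adding up the per-unit counts. The sketch below is a minimal tally under two assumptions: M = N/L, and each of the (M − 1) PAU adder stages is counted as L adders and L registers, one per output in the block. That is the reading under which the per-unit counts sum to the stated totals. The function name `block_fir_complexity` and the bit widths used in the example are hypothetical, and the CSU is left out, as in the comparison.

```python
def block_fir_complexity(N, L, B, B_coef):
    """Tally multipliers, adders and flip-flops per unit (CSU excluded).

    N: filter length, L: block size, B: input width, B_coef: coefficient width.
    """
    if N % L:
        raise ValueError("N must be a multiple of L")
    M = N // L                                 # number of IPUs (assumed M = N/L)
    ipu_multipliers = M * L * L                # M IPUs x L cells x L multipliers
    ipu_adders = M * L * (L - 1)               # M IPUs x L cells x (L - 1) adders
    pau_adders = L * (M - 1)                   # (M - 1) stages of L adders each
    ru_flip_flops = B * (L - 1)                # (L - 1) registers of B bits
    pau_flip_flops = (B + B_coef) * L * (M - 1)
    return {
        "multipliers": ipu_multipliers,
        "adders": ipu_adders + pau_adders,
        "flip_flops": ru_flip_flops + pau_flip_flops,
    }


if __name__ == "__main__":
    # Filter length 64 and block size 4; the 8-bit widths are hypothetical.
    print(block_fir_complexity(64, 4, 8, 8))
```

For any N that is a multiple of L, the tallies reduce to the closed forms LN, L(N − 1), and B(N − 1) + B′(N − L).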




Fig. 3.2 Proposed MCM structure

#### 3.4 Advantages

- Supports block processing and higher filter orders N
- Lower area requirement
- Lower delay

#### IV. SIMULATION RESULTS






#### 4.1 Area Comparison of Existing RCA



#### V. CONCLUSION AND FUTURE WORK

In this paper, we have explored the possibility of realizing block FIR filters in transpose form configuration for area-delay efficient realization of both fixed and reconfigurable applications. A generalized block formulation is presented for the transpose form block FIR filter, and based on that we have derived a transpose form block filter for reconfigurable applications. We have presented a scheme to identify the MCM blocks for horizontal and vertical subexpression elimination in the proposed block FIR filter for fixed coefficients to reduce the computational complexity. Performance comparison shows that the proposed structure involves significantly less ADP and less EPS than the existing block direct-form structure for medium or large filter lengths, while for short-length filters, the existing block direct-form structure has less ADP and less EPS than the proposed structure. Application-specific integrated circuit synthesis results show that the proposed structure for block size 4 and filter length 64 involves 42% less ADP and 40% less EPS than the best available FIR filter structure of [10] for reconfigurable applications. For the same filter length and the same block size, the proposed structure involves 13% less ADP and 12.8% less EPS than the existing direct-form block FIR structure of [15].

#### VI. ACKNOWLEDGMENT

The authors would like to acknowledge the anonymous reviewers for their detailed comments and suggestions which helped to improve the quality of the paper.

#### **REFERENCES**

- [1] J. G. Proakis and D. G. Manolakis, *Digital Signal Processing: Principles, Algorithms and Applications*. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
- [2] T. Hentschel and G. Fettweis, "Software radio receivers," in *CDMA Techniques for Third Generation Mobile Systems*. Dordrecht, The Netherlands: Kluwer, 1999, pp. 257–283.
- [3] E. Mirchandani, R. L. Zinser, Jr., and J. B. Evans, "A new adaptive noise cancellation scheme in the presence of crosstalk [speech signals]," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 39, no. 10, pp. 681–694, Oct. 1995.
- [4] D. Xu and J. Chiu, "Design of a high-order FIR digital filtering and variable gain ranging seismic data acquisition system," in *Proc. IEEE Southeastcon*, Apr. 1993, pp. 1–6.
- [5] J. Mitola, *Software Radio Architecture: Object-Oriented Approaches to Wireless Systems Engineering*. New York, NY, USA: Wiley, 2000.
- [6] A. P. Vinod and E. M. Lai, "Low power and high-speed implementation of FIR filters for software defined radio receivers," *IEEE Trans. Wireless Commun.*, vol. 7, no. 5, pp. 1669–1675, Jul. 2006.
- [7] J. Park, W. Jeong, H. Mahmoodi-Meimand, Y. Wang, H. Choo, and K. Roy, "Computation sharing programmable FIR filter for low-power and high-performance applications," *IEEE J. Solid State Circuits*, vol. 39, no. 2, pp. 348–357, Feb. 2004.
- [8] K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 8, pp. 617–621, Aug. 2006.
- [9] R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing FIR filters with low complexity," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 29, no. 2, pp. 275–288, Feb. 2010.
- [10] S. Y. Park and P. K. Meher, "Efficient FPGA and ASIC realizations of a DA-based reconfigurable FIR digital filter," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 7, pp. 511–515, Jul. 2014.


- [11] P. K. Meher, "Hardware-efficient systolization of DA-based calculation of finite digital convolution," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 8, pp. 707–711, Aug. 2006.
- [12] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," *IEEE Trans. Signal Process.*, vol. 56, no. 7, pp. 3009–3017, Jul. 2008.
- [13] P. K. Meher, "New approach to look-up-table design and memory-based realization of FIR digital filter," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 57, no. 3, pp. 592–603, Mar. 2010.
- [14] K. K. Parhi, *VLSI Digital Signal Processing Systems: Design and Implementation*. New York, NY, USA: Wiley, 1999.