# Comparative Analysis of Power Yield Improvement under Process Variation of Sub-Threshold Flip-Flops

Hassan Mostafa, Mohab Anis, and Mohamed Elmasry

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada N2L 3G1 {hmostafa@uwaterloo.ca, manis@vlsi.uwaterloo.ca, elmasry@uwaterloo.ca}

Abstract—In low power synchronous systems, sub-threshold flip-flops are used to reduce the total power dissipation. However, in scaled technologies, process variations create a large variability in the flip-flop power, which degrades the power yield, especially for sub-threshold operation. This paper presents an analysis of power yield improvement of four commonly used flip-flops under process variations. These flip-flops are designed using STMicroelectronics 65-nm CMOS technology. The analyzed flip-flops are compared in terms of the delay, energy, and energy-delay product (EDP) overheads required to achieve this power yield improvement. The analysis shows that the sense-amplifier based flip-flop (SA-FF) has the lowest overheads, while the modified clocked CMOS master-slave flip-flop (M-C<sup>2</sup>MOS-MSFF) exhibits the largest overheads and, accordingly, is not recommended for sub-threshold operation.

## I. INTRODUCTION

In modern digital synchronous systems, the total power dissipation is dominated by the flip-flop power dissipation. Supply voltage scaling is one of the most promising power reduction techniques for flip-flop circuits [1,2]. When the supply voltage, $V_{DD}$, is decreased below the transistor threshold voltage, $V_t$, the transistor operates in the sub-threshold region [1]. Sub-threshold flip-flops are considered the most energy-efficient solution for low power applications in which performance is of secondary importance [2,3].

As CMOS technologies continue to scale towards the nanometer regime, the device parameters, such as threshold voltage, channel length, and mobility, will have large statistical process variations [4-7]. In addition, these process variations are increasing dramatically for sub-threshold circuits. Consequently, these process variations lead to large power consumption variability. Therefore, the deterministic design methodology should be replaced by the statistical design methodology to deal with this power variability [8].

Deterministic gate sizing tools size the sub-threshold circuits to optimize the energy-delay product (EDP). However, due to random process variations, a large number of circuits might not meet the allowed power budget. As an intuitive example, consider a flip-flop that is designed for optimum EDP and exhibits a specific target power dissipation. Under random process variations, the power dissipation, which is dominated by the sub-threshold leakage power and therefore depends exponentially on $V_t$, is modeled by a log-normal distribution with the probability density function (pdf) shown in Figure 1. Here, 42% of the flip-flops do not meet the desired target power constraint. Therefore, the flip-flops must be designed by using statistical sizing tools to improve the power yield. Moreover, statistical sizing for power yield improvement allows a more efficient and fair comparison of sub-threshold flip-flops, since the power yield is the main concern in low power applications. This paper provides a comparative analysis of power yield improvement under process variations for four different sub-threshold flip-flop circuits, focusing on the power variability and the timing and energy overheads required for power yield improvement. These flip-flops represent different trade-off choices between performance and power dissipation.
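The share of flip-flops that exceed the target power can be illustrated with a short calculation. The following is a minimal sketch, assuming the target power equals the mean of the log-normal distribution; the function names and the spread `example_sigma_ln` are hypothetical and are not values reported in the paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fraction_above_mean(sigma_ln: float) -> float:
    """Fraction of a log-normal population whose value exceeds its own mean.

    If ln(P) ~ N(mu_ln, sigma_ln^2), the mean of P is exp(mu_ln + sigma_ln^2 / 2),
    so Prob(P > mean) = 1 - Phi(sigma_ln / 2), independent of mu_ln.
    """
    return 1.0 - normal_cdf(sigma_ln / 2.0)


# Hypothetical spread of ln(power), chosen for illustration only.
example_sigma_ln = 0.4
print(f"fraction above target: {fraction_above_mean(example_sigma_ln):.3f}")
```

With this illustrative spread, roughly 42% of the population lies above the mean. A fraction of this order is therefore plausible for a design that is centered on its target power without any statistical margin.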

The paper is organized as follows: Section II introduces the selected flip-flops. Sections III and IV describe the simulation setup and the simulation results, respectively. Finally, some conclusions are drawn in Section V.



Fig. 1. The power pdf due to process variations under deterministic gate sizing algorithms.

## II. FLIP-FLOPS SELECTION

Four different flip-flops are selected to represent the various trade-off choices between performance and power dissipation. Figures 2 and 3 depict the transmission-gate master-slave flip-flop (TG-MSFF) and the modified clocked CMOS master-slave flip-flop (M-C<sup>2</sup>MOS-MSFF), respectively. Both are implemented by cascading two complementary latches. This master-slave implementation results in a robust flip-flop with good hold time behavior. Moreover, they are used in standard libraries [9,10], which makes it important to include them in this comparison. Figure 4 shows one of the fastest flip-flops, a semi-dynamic flip-flop (SD-FF) [11]. This flip-flop can be considered a pulsed latch, since it samples the input data to the flip-flop output during a very short transparency period around the clock sampling edge. Accordingly, the input data can arrive after the clock edge, which results in a negative setup time. This flip-flop circuit is modified from that in [11] by adding two additional inverters into the delayed clock signal, as shown in the dotted rectangle in Figure 4. This modification allows a sufficiently long transparency period for sub-threshold flip-flop sampling. Figure 5 shows a sense-amplifier based flip-flop (SA-FF) with a NAND SR-latch [1]. This flip-flop can be viewed as a compromise between the robustness of master-slave flip-flops and the high performance of pulsed latches.

978-1-4244-5309-2/10/\$26.00 ©2010 IEEE



Fig. 2. The Transmission Gate based Master-Slave Flip-Flop (TG-MSFF)



Fig. 3. The Modified Clocked CMOS Master-Slave Flip-Flop (M-C<sup>2</sup>MOS-MSFF)



Fig. 4. The Pulsed Semi-Dynamic Flip-Flop (SD-FF) after the addition of two inverters into the delayed clock signal (dotted rectangle). This modification allows a sufficiently long transparency period for sub-threshold SD-FF sampling.

## III. SIMULATION PROCEDURE AND SETUP

#### A. Optimum EDP Design

All flip-flops are optimized for minimum EDP by using a STMicroelectronics 65-nm CMOS technology transistor model, a typical process corner, a clock frequency of 1 MHz, and pseudorandom input data with a 50% data activity [9]. The measured EDP is obtained by multiplying the square of the data-to-output delay by the total power consumption, which includes both the internal power dissipation and the local clock/data power dissipation [9]. The optimum setup time for each flip-flop is determined to achieve minimum EDP. The optimization process is conducted by using the CFSQP (C Version Feasible Sequential Quadratic Programming) optimization technique, implemented in Spectre-RF.

Fig. 5. The Sense-Amplifier based Flip-Flop (SA-FF)
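The EDP metric of Subsection III.A reduces to a one-line formula. The sketch below only restates that definition (total power multiplied by the square of the data-to-output delay); the function name and the example operating point are hypothetical and do not come from the simulations.

```python
def energy_delay_product(total_power_w: float, d_to_q_delay_s: float) -> float:
    """EDP as defined in the text: total power times the squared data-to-output delay."""
    return total_power_w * d_to_q_delay_s ** 2


# Hypothetical operating point, for illustration only (2 nW, 100 ns).
example_edp = energy_delay_product(total_power_w=2e-9, d_to_q_delay_s=100e-9)
print(f"EDP = {example_edp:.3e} J*s")
```

Because the delay enters quadratically, a sizing change that doubles the delay must cut the power by more than a factor of four to lower the EDP.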

#### B. Impact of Process Variations on Flip-Flop Power

Monte Carlo analysis, including the mismatch between transistors, is performed on the flip-flops, using the optimal EDP point as a starting point. An industrial hardware-calibrated statistical STMicroelectronics 65-nm CMOS transistor model is used in this Monte Carlo analysis. In this model, the transistor parameters, such as the threshold voltage and the channel length, are modeled by a normal distribution within the $\pm 3\sigma$ design space. The Monte Carlo analysis uses 4000 points. The delay, power, energy, and EDP variability are then obtained.
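The sampling scheme described above can be mimicked outside the circuit simulator. The following is a minimal sketch, not the Spectre-RF flow used in the paper: it draws threshold voltages from a normal distribution truncated to $\pm 3\sigma$ and maps them to a normalized leakage-dominated power through a generic exponential dependence on $V_t$. All numeric constants and names are hypothetical.

```python
import math
import random
import statistics

# Hypothetical device numbers, chosen only to make the sketch runnable.
VT_NOMINAL_V = 0.30        # nominal threshold voltage
VT_SIGMA_V = 0.03          # standard deviation of the threshold voltage
SLOPE_FACTOR = 1.5         # sub-threshold slope factor
THERMAL_VOLTAGE_V = 0.026  # approximately kT/q at room temperature


def sample_truncated_normal(rng, mean, sigma, n_sigma=3.0):
    """Draw from a normal distribution, rejecting samples outside mean +/- n_sigma * sigma."""
    while True:
        x = rng.gauss(mean, sigma)
        if abs(x - mean) <= n_sigma * sigma:
            return x


def monte_carlo_power(num_points=4000, seed=0):
    """Return (vt_samples, normalized_power_samples) for num_points Monte Carlo points."""
    rng = random.Random(seed)
    vts, powers = [], []
    for _ in range(num_points):
        vt = sample_truncated_normal(rng, VT_NOMINAL_V, VT_SIGMA_V)
        vts.append(vt)
        # Leakage-dominated power, normalized to 1.0 at the nominal threshold voltage.
        powers.append(math.exp(-(vt - VT_NOMINAL_V) / (SLOPE_FACTOR * THERMAL_VOLTAGE_V)))
    return vts, powers


vt_samples, power_samples = monte_carlo_power()
print("mean normalized power:", statistics.mean(power_samples))
print("std of normalized power:", statistics.stdev(power_samples))
```

Even though the threshold voltage is symmetric around its nominal value, the exponential mapping skews the power distribution toward large values, which is why a log-normal model is a natural fit.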

#### C. Functional Yield Improvement using Setup Time Margin

The optimum setup time determined in Subsection III.A is obtained by using a single process corner to minimize the EDP. This results in a poor functional yield, since the setup time constraint is violated for some of the simulated Monte Carlo points and, correspondingly, these flip-flop samples do not function. A setup time margin is added to achieve a functional yield greater than 99.9% [4,10]. This setup time margin is determined by sweeping the setup time and calculating the functional yield and the EDP at each value. The setup time that achieves a functional yield greater than 99.9% with the minimum EDP is selected.
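The selection rule at the end of this subsection can be sketched as a filter followed by a minimum search. The data structure, the function name, and the sweep values below are hypothetical; in the paper, the yield and EDP at each setup time come from Monte Carlo simulation.

```python
from typing import NamedTuple, Optional, Sequence


class SweepPoint(NamedTuple):
    setup_time_ns: float
    functional_yield: float  # fraction of Monte Carlo samples that function
    edp: float               # EDP measured at this setup time (arbitrary units)


def select_setup_time(points: Sequence[SweepPoint],
                      min_yield: float = 0.999) -> Optional[SweepPoint]:
    """Among sweep points whose functional yield exceeds min_yield, pick the minimum EDP."""
    feasible = [p for p in points if p.functional_yield > min_yield]
    if not feasible:
        return None  # no swept setup time meets the yield requirement
    return min(feasible, key=lambda p: p.edp)


# Hypothetical sweep results, for illustration only.
sweep = [
    SweepPoint(setup_time_ns=10.0, functional_yield=0.950, edp=1.00),
    SweepPoint(setup_time_ns=20.0, functional_yield=0.9995, edp=1.20),
    SweepPoint(setup_time_ns=30.0, functional_yield=1.000, edp=1.50),
]
print(select_setup_time(sweep))
```

In this made-up sweep, the 10 ns point has the lowest EDP but fails the yield requirement, so the 20 ns point is chosen.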

#### D. Power Yield Improvement using Gate Sizing

The power variability is obtained from the Monte Carlo simulations by adopting the modified setup time. A simplified gate sizing algorithm is employed. It is similar to that in [12], but the Lagrangian Relaxation (LR) optimization technique is replaced by the CFSQP optimization technique, implemented in Spectre-RF. The algorithm starts with a given power constraint (P<sub>o</sub>) and power yield constraint (Y<sub>o</sub>), where P<sub>o</sub> is the optimal power obtained at minimum EDP. Then, the gate sizing values obtained for the minimum EDP are used as the initial gate sizing values. Monte Carlo statistical analysis is then applied to obtain the power variability, and the standard deviation ($\sigma$) of the obtained power distribution is calculated. Using the mean, P<sub>o</sub>, and standard deviation, $\sigma$, of the log-normal power distribution, the mean and standard deviation of the normal distribution of the natural logarithm of the power, $\mu_{ln}$ and $\sigma_{ln}$, respectively, are given by [13]:

$$\mu_{ln} = \ln\left(\frac{P_o}{\sqrt{1 + \frac{\sigma^2}{P_o^2}}}\right) \quad \text{and} \quad \sigma_{ln} = \sqrt{\ln\left(1 + \frac{\sigma^2}{P_o^2}\right)} \tag{1}$$

Following that, the geometric mean and standard deviation of the log-normal distribution, $\mu_g$ and $\sigma_g$, are calculated as follows [13]:

$$\mu_g = \exp(\mu_{ln}) \quad \text{and} \quad \sigma_g = \exp(\sigma_{ln}) \tag{2}$$

In order to ensure that the power dissipation log-normal distribution integral from 0 to the desired power constraint  $P_o$  equals the desired power yield  $Y_o$ , the power distribution pdf has to be shifted from  $P_o$  to  $P_o'$  by using statistical gate sizing where  $P_o'$  is given by [13]:

$$P_o^{'} = \frac{\mu_g}{(\sigma_g)^n} \tag{3}$$

where n depends on the target power yield value ($Y_o$) and can be obtained from the normal distribution tables. For example, in this paper, a power yield of 99.87% ($Y_o = 99.87\%$) is required, which means that n must equal 3.0 according to the normal distribution tables. Following the calculation of $P_o'$, an optimization problem is solved by employing CFSQP to determine the new gate sizing that matches the power $P_o'$ and minimizes the delay and energy overheads. These steps are repeated until the power yield constraint is achieved. It should be emphasized that the power pdf changes after each iteration because the variations in the threshold voltage are a strong function of the transistor width [12]. Figure 6 illustrates how this gate sizing algorithm improves the power yield by shifting the power pdf to a lower mean power. Finally, the delay, energy, and EDP overheads associated with the power yield improvement sizing scenario are calculated.
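Equations (1) to (3) can be evaluated directly once the mean and standard deviation of the power are known. The sketch below covers only this target-power calculation for a single iteration; it does not attempt the CFSQP sizing step or the Monte Carlo re-simulation. The function names and example numbers are hypothetical. In the log domain, Equation (3) is equivalent to $\ln P_o' = \mu_{ln} - n\,\sigma_{ln}$.

```python
import math
from statistics import NormalDist


def lognormal_parameters(mean_power: float, std_power: float):
    """Equation (1): mean and standard deviation of ln(power)."""
    ratio = 1.0 + (std_power / mean_power) ** 2
    mu_ln = math.log(mean_power / math.sqrt(ratio))
    sigma_ln = math.sqrt(math.log(ratio))
    return mu_ln, sigma_ln


def sigma_multiplier(target_yield: float) -> float:
    """Value of n such that the standard normal CDF at n equals the target yield."""
    return NormalDist().inv_cdf(target_yield)


def shifted_power_target(mean_power: float, std_power: float, n: float) -> float:
    """Equations (2) and (3): new mean-power target P_o' = mu_g / sigma_g ** n."""
    mu_ln, sigma_ln = lognormal_parameters(mean_power, std_power)
    mu_g = math.exp(mu_ln)        # geometric mean
    sigma_g = math.exp(sigma_ln)  # geometric standard deviation
    return mu_g / sigma_g ** n


# Hypothetical normalized numbers, for illustration only.
n = sigma_multiplier(0.9987)
print("n =", round(n, 2))
print("P_o' =", shifted_power_target(mean_power=1.0, std_power=0.4, n=3.0))
```

The table lookup for a 99.87% yield gives n close to 3, which matches the value used in the text. In the full flow, the value of $P_o'$ would be handed to the sizing optimizer, and the calculation repeated with the new power statistics.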

## IV. SIMULATION RESULTS AND DISCUSSION

Figure 7 shows the optimal values of the power, delay, energy, and EDP for each flip-flop at the minimum EDP point obtained in Subsection III.A for different values of the supply voltage, $V_{DD}$. It is evident from Figure 7 that the TG-MSFF exhibits the lowest power, delay, energy, and EDP among all flip-flops. The SD-FF has the largest power, energy, and EDP. The M-C<sup>2</sup>MOS-MSFF has the largest delay.

Figure 8 shows the simulation results after applying the power yield improvement technique for different values of the supply voltage, $V_{DD}$. Figure 8.a shows the new power dissipation mean, $P_o'$, and Figures 8.b, 8.c, and 8.d show the delay, energy, and EDP values calculated after adopting the power yield improvement. It is clear from Figure 8 that the TG-MSFF still shows the lowest values of power, delay, energy, and EDP even after adopting the power yield improvement technique. The M-C<sup>2</sup>MOS-MSFF exhibits the largest delay, energy, and EDP, which means that this flip-flop requires large overheads to achieve the target power yield. Therefore, the M-C<sup>2</sup>MOS-MSFF is not recommended for sub-threshold operation.

Fig. 6. The power yield improvement under process variations employing gate sizing. The dotted pdf represents the power pdf of the power yield improved sub-threshold flip-flops (power yield = 99.9%) while the solid pdf represents the power pdf of the minimum EDP sub-threshold flip-flops (power yield = 58%)

Fig. 7. Minimum EDP simulation results for the four selected flip-flops (a) Optimal power ($P_o$), (b) Optimal delay, (c) Optimal energy, and (d) Optimal EDP.

Figure 9 shows the delay, energy, and EDP overheads for all flip-flops when $V_{DD} = 0.15V$. According to this figure, the power yield improved SA-FF exhibits the lowest delay, energy, and EDP overheads among all flip-flops, followed by the TG-MSFF. However, the absolute values are lower in the TG-MSFF, as shown in Figure 8. For example, the delay overhead of the SA-FF (when $V_{DD} = 0.15V$) is 1.8X while that of the TG-MSFF is 2.9X; however, the absolute delay of the SA-FF is 438 nsec while that of the TG-MSFF is 184 nsec. The M-C<sup>2</sup>MOS-MSFF exhibits the largest overheads in all parameters. The delay, energy, and EDP overheads of the M-C<sup>2</sup>MOS-MSFF are higher than those of the SA-FF by factors of 8X, 270X, and 47X, respectively, when $V_{DD} = 0.15V$.
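The difference between relative overhead and absolute delay can be made concrete with the numbers quoted above. The sketch assumes that each quoted absolute delay is the value after power yield improvement and that each overhead factor is the ratio of the improved delay to the nominal one. Under those assumptions, the implied nominal delays are approximate, since the quoted factors are rounded. The function name is hypothetical.

```python
def implied_nominal(improved_value: float, overhead_factor: float) -> float:
    """Nominal value implied by an improved value and its normalized overhead factor."""
    return improved_value / overhead_factor


# Delay figures quoted in the text for V_DD = 0.15 V.
sa_ff_nominal_ns = implied_nominal(improved_value=438.0, overhead_factor=1.8)
tg_msff_nominal_ns = implied_nominal(improved_value=184.0, overhead_factor=2.9)

print(f"SA-FF   implied nominal delay: {sa_ff_nominal_ns:.1f} ns")
print(f"TG-MSFF implied nominal delay: {tg_msff_nominal_ns:.1f} ns")
```

Under these assumptions, the TG-MSFF starts from a much smaller nominal delay, so it remains faster in absolute terms even though its relative overhead is larger.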

Figure 10 shows the delay versus power space for the power yield improved flip-flops when $V_{DD} = 0.2V$. It is evident that the flip-flop samples achieve a power yield larger than 99.9%.
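The power yield read from such a scatter plot is the fraction of Monte Carlo samples whose power does not exceed the constraint. A minimal sketch of that count follows; the function name and the sample values are hypothetical.

```python
from typing import Sequence


def power_yield(power_samples: Sequence[float], power_constraint: float) -> float:
    """Fraction of samples whose power is at or below the power constraint."""
    if not power_samples:
        raise ValueError("power_samples must not be empty")
    within_budget = sum(1 for p in power_samples if p <= power_constraint)
    return within_budget / len(power_samples)


# Hypothetical normalized power samples, for illustration only.
example_samples = [0.6, 0.8, 0.9, 1.1]
print(power_yield(example_samples, power_constraint=1.0))
```

With 4000 Monte Carlo points, a yield above 99.9% corresponds to at most three samples exceeding the constraint.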



Fig. 8. Power yield improved simulation results for the four selected flip-flops (a) Power mean ($P_o'$), (b) Delay mean, (c) Energy mean, and (d) EDP mean.



Fig. 9. The power yield improvement associated normalized overheads for  $V_{DD}$  = 0.15V. These overheads are normalized to their nominal values.

## V. CONCLUSION

A comparative analysis of power yield improvement for four commonly used flip-flop topologies is presented. The SA-FF exhibits the lowest overheads in delay, energy, and EDP, whereas the M-C<sup>2</sup>MOS-MSFF has the largest overheads. The M-C<sup>2</sup>MOS-MSFF delay, energy, and EDP overheads are higher than those of the SA-FF by factors of 8X, 270X, and 47X, respectively, when $V_{DD} = 0.15V$. These results favor the SA-FF for sub-threshold operation. In addition, the results show that the M-C<sup>2</sup>MOS-MSFF is not recommended for use in the sub-threshold region.



Fig. 10. The delay-power scatter plot for (a) TG-MSFF, (b) M-C<sup>2</sup>MOS-MSFF, (c) SD-FF, and (d) SA-FF

## REFERENCES

- [1] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "Digital Integrated Circuits: A Design Perspective," second edition, Prentice Hall, 2002.
- [2] H. P. Alstad and S. Aunet, "Three Subthreshold Flip-Flop Cells Characterized in 90 nm and 65 nm CMOS Technology," *Proceedings of the IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS'08)*, pp. 1-4, 2008.
- [3] A. Wang and A. Chandrakasan, "A 180 mV Sub-threshold FFT Processor Circuits," *IEEE Journal of Solid-State Circuits (JSSC)*, vol. 40, pp. 310-319, 2005.
- [4] H. Mostafa, M. Anis, and M. Elmasry, "Comparative Analysis of Timing Yield Improvement under Process Variations of Flip-Flops Circuits," *Proceedings of the IEEE International Symposium on Very Large Scale Integration (ISVLSI'09)*, pp. 133-138, 2009.
- [5] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter Variations and Impact on Circuits and Microarchitecture," *Proceedings of the 40<sup>th</sup> Conference on Design Automation (DAC '03)*, pp. 338-342, 2003.
- [6] S. Borkar, T. Karnik, and V. De, "Design and Reliability Challenges in Nanometer Technologies," *Proceedings of the 41<sup>st</sup> Conference on Design Automation (DAC '04)*, p. 75, 2004.
- [7] K. Bowman, S. Duvall, and J. Meindl, "Impact of Die-to-die and Within-die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 183-190, 2002.
- [8] F. N. Najm, "On the Need for Statistical Timing Analysis," *Proceedings of the 42<sup>nd</sup> Conference on Design Automation (DAC '05)*, pp. 764-765, 2005.
- [9] V. Stojanovic and V. G. Oklobdzija, "Comparative Analysis of Master-Slave Latches and Flip-Flops for High Performance and Low Power Systems," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 4, pp. 536-548, 1999.
- [10] M. Hansson and A. Alvandpour, "Comparative Analysis of Process Variation Impact on Flip-Flop Power-Performance," *Proceedings of the 2007 IEEE International Symposium on Circuits and Systems (ISCAS 2007)*, pp. 3744-3747, 2007.
- [11] F. Klass, "Semi-Dynamic and Dynamic Flip-Flops with Embedded Logic for High Performance Processors," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 5, pp. 712-716, 1999.
- [12] S. H. Choi, B. C. Paul, and K. Roy, "Novel Sizing Algorithm for Yield Improvement Under Process Variation in Nanometer Technology," *Proceedings of the 41<sup>st</sup> Conference on Design Automation (DAC '04)*, pp. 454-459, 2004.
- [13] E. L. Crow and K. Shimizu, "Lognormal Distribution: Theory and Applications," *Statistics, Textbooks, and Monographs*, vol. 88, 1988.