# Statistical Design Framework of Submicron Flip-Flop Circuits Considering Process Variations

Sayed Alireza Sadrossadat, Hassan Mostafa, Student Member, IEEE, and Mohab Anis, Senior Member, IEEE

Abstract—In this paper, a framework for the statistical design of flip-flop circuits is proposed to achieve a high yield while meeting the performance, leakage power, switching power, and layout area design specifications. The proposed design solution provides the nominal design parameters, i.e., the widths and lengths of the flip-flop transistors, that provide maximum immunity to process variations in the transistor dimensions and threshold voltage. The proposed framework shows that, for a given set of flip-flop design specifications, a certain yield can be achieved. To further increase this yield, the framework shows which design specifications should be relaxed. The transmission gate-based master-slave flip-flop is selected as the design case study in this paper; however, the proposed framework is applicable to any other flip-flop circuit in the nanometer regime.

*Index Terms*—Design framework, flip-flops, nanometer regime, process variations, yield maximization.

## I. INTRODUCTION

In modern digital synchronous systems, the demand for higher performance has pushed clock frequencies up to multi-GHz in microprocessors and other advanced very large-scale integrated (VLSI) applications. These increased clock frequencies lead to very deep pipelining, which means that hundreds of thousands of flip-flops are required to control the data flow under strict timing constraints. A violation of the timing constraints at a flip-flop may result in latching incorrect data, causing the overall system to malfunction [1], [2].

Moreover, the continued scaling of complementary metal-oxide-semiconductor (CMOS) technology toward the nanometer regime causes transistor parameters, such as threshold voltage, channel length, mobility, and oxide thickness, to exhibit large statistical process variations [3]–[11]. These process variations result in delay uncertainty. Thus, deterministic design methodologies should be replaced by statistical design methodologies [1], [8]–[13]. Process variations can be classified as die-to-die (D2D) variations or within-die (WID) variations. In D2D variations, all devices on the same die are assumed to have the same parameters, whereas devices on the same die are assumed to behave differently under WID variations [4]. Although D2D variations were originally considered the major source of process variations, WID variations have become the main design challenge as technology scales [1], [6]–[11].

Manuscript received January 27, 2010; revised July 30, 2010; accepted September 1, 2010. Date of publication September 27, 2010; date of current version February 4, 2011.

S. A. Sadrossadat is with the Department of Electronics, Carleton University, Ottawa, ON K1S 5B6, Canada (e-mail: sasadros@uwaterloo.ca).

H. Mostafa is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: hmostafa@uwaterloo.ca).

M. Anis is with the Department of Electronics Engineering, American University in Cairo, New Cairo 11835, Egypt (e-mail: manis@vlsi.uwaterloo.ca).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSM.2010.2080693

Deterministic gate-sizing tools size circuits to optimize the power-delay product. However, due to process variations, a large number of the fabricated circuits might not meet the target delay. Therefore, flip-flops have to be designed using statistical sizing tools to improve the timing yield [1], [3], [14]–[17].

Recently, researchers have attempted to adopt statistical design methodologies for timing yield improvement [1], [14], [15], [18], [19]. However, some of these works have the following drawbacks.

- 1) They use Monte-Carlo analysis tools, which are time consuming and not scalable with technology (i.e., the Monte-Carlo analysis must be repeated for each new CMOS technology generation).
- 2) Only one design constraint, such as the circuit delay, is considered. However, many other design constraints, such as switching power, leakage power, and layout area, should also be included. It is more practical to optimize the overall parametric yield than the yield of a single design constraint.
- 3) Most of these works provide only the minimum power overhead required to achieve the timing yield improvement, ignoring whether this overhead is within the allowed power budget.

In the Monte-Carlo method, the whole design is simulated thousands of times, which is very time consuming. In contrast, the proposed methodology uses only one sequential quadratic programming (SQP) optimizer run. For example, 5000 Monte-Carlo runs of the flip-flop case study take 2.5 h, while the SQP optimizer run takes 3 min. Accordingly, the proposed SQP methodology is faster than the Monte-Carlo method by a factor of 50 (150 min versus 3 min).

Although this paper considers both D2D and WID variations, the focus is mainly on WID variations. From a circuit perspective, WID variations are much more complex and difficult to model than D2D variations. D2D variations can be easily modeled using corner-based models, whereas WID variations require accounting for each device parameter separately using statistical design methodologies. In fact, the variability in flip-flop design metrics, such as setup time, hold time, delay, layout area, and power, is caused by WID variations, which, in turn, depend on the flip-flop sizing. Therefore, to meet the specifications for all the flip-flop design metric constraints, the widths and lengths of the flip-flop transistors must be chosen optimally [20], [21]. The impact of variability on the design metrics should be considered up front during the design phase to maximize the overall parametric yield under all the design metric constraints [21].

According to [21], the current industrial practice is to first develop a database, through simulations, that characterizes the design metrics for various transistor sizes. This database is used to carefully select the sizes of the flip-flop transistors. Monte-Carlo simulations are then performed on the selected design to verify whether the variations in the design metrics, such as the setup time, meet the design constraints. The selected design is updated and the Monte-Carlo simulations are repeated iteratively until all the design metrics meet the design constraints.

In this paper, a statistical design framework to determine the optimal sizes of the flip-flop transistors is proposed. The proposed framework is systematic; it is time efficient compared to time-consuming Monte-Carlo analysis tools; it is scalable with CMOS technology, so it can be used for future technology prediction and optimization; and it maximizes the overall parametric yield considering all the flip-flop design metric constraints. In addition, the proposed formulation has the flexibility to tune the design according to the design specifications, as demonstrated in Section IV.

The rest of this paper is organized as follows. Section II provides background on the timing characteristics of flip-flop circuits and the main sources of process variations. Section III formulates the statistical design problem and describes the yield optimization methodology. Simulation results and discussions are given in Section IV. Finally, conclusions are drawn in Section V.

## II. BACKGROUND

### A. Timing Characteristics of Flip-Flop Circuits

A clock signal is used in clocked registers to control the timing of the data latching process. Clocked registers can be classified into latches and flip-flops. Latches are level-sensitive registers, because the input data is latched while the clock signal maintains a specific voltage level. Flip-flops are edge-triggered registers, since the input data is latched by a transition edge of the clock signal. A flip-flop samples the input data correctly if the following constraints are satisfied.

- 1) Setup time ($T_{setup}$) is the minimum time for which the input data must be stable before the arrival of the sampling clock edge.
- 2) Hold time ($T_{hold}$) is the minimum time for which the input data must remain stable after the sampling clock edge.

The timing relationships among the input data, clock signal, and output data of a flip-flop are described by the following timing characteristics [1], [2].

- 1) Clock-to-output delay  $(T_{Clk-Q})$  represents the delay from the sampling clock edge (Clk) to the time the latched data is valid at the output (Q).
- 2) Data-to-output delay ($T_{D-Q}$) represents the delay from a transition of the input data (D) to the time the latched data is valid at the output (Q). This delay is the sum of the setup time ($T_{setup}$) and the clock-to-output delay ($T_{Clk-Q}$).
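As an illustration only, the two sampling constraints and the relation $T_{D-Q} = T_{setup} + T_{Clk-Q}$ can be expressed as a small Python check. The function names, the single-edge timing model, and the picosecond values in the usage lines are hypothetical assumptions, not part of the paper's framework.

```python
def data_to_output_delay(t_setup: float, t_clk_q: float) -> float:
    """Data-to-output delay as the sum of setup time and clock-to-output delay."""
    return t_setup + t_clk_q


def samples_correctly(t_data_stable: float, t_clk_edge: float,
                      t_data_change: float, t_setup: float,
                      t_hold: float) -> bool:
    """Return True if D is stable for at least t_setup before the sampling
    edge and for at least t_hold after it (hypothetical single-edge model).

    t_data_stable -- time at which the new data becomes stable at D
    t_clk_edge    -- time of the sampling clock edge
    t_data_change -- time at which D changes again
    """
    setup_ok = (t_clk_edge - t_data_stable) >= t_setup
    hold_ok = (t_data_change - t_clk_edge) >= t_hold
    return setup_ok and hold_ok


# Hypothetical values in ps.
print(data_to_output_delay(30.0, 45.0))                # 75.0
print(samples_correctly(0.0, 40.0, 50.0, 30.0, 5.0))   # True
print(samples_correctly(15.0, 40.0, 50.0, 30.0, 5.0))  # False: setup violated
```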

### B. Process Variations

Process variations affect device parameters, resulting in fluctuations in the flip-flop design metrics. The primary sources of process variations that affect the device parameters are as follows.

1) *Random dopant fluctuations (RDF):* It has been demonstrated that the threshold voltage variation, due to RDF, is normally distributed, and its standard deviation is modeled [23], [24] as follows:

$$\sigma_{V_{t}, RDF} = \frac{A_o}{\sqrt{WL}} \tag{1}$$

where  $A_o$  is a technology-dependent parameter, and W and L are the channel width and length of the transistor, respectively. It is clear from (1) that  $\sigma_{V_t, RDF}$  is inversely proportional to the square root of the transistor active area. Therefore, these variations can be mitigated by sizing the transistors up at the expense of more power consumption and layout area overhead.
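A minimal numerical sketch of (1), assuming a hypothetical value of $A_o$ (2.5 mV·µm) and dimensions in micrometers, illustrates the square-root scaling: quadrupling the active area $WL$ halves $\sigma_{V_t, RDF}$.

```python
import math


def sigma_vt_rdf(a_o: float, w: float, l: float) -> float:
    """Eq. (1): standard deviation of V_t due to RDF, A_o / sqrt(W * L).

    Units follow the inputs, e.g., a_o in mV*um and w, l in um give mV.
    """
    if w <= 0 or l <= 0:
        raise ValueError("W and L must be positive")
    return a_o / math.sqrt(w * l)


A_O = 2.5  # hypothetical technology parameter, mV*um
base = sigma_vt_rdf(A_O, 0.09, 0.045)
upsized = sigma_vt_rdf(A_O, 0.36, 0.045)  # 4x width -> 4x area
print(f"{base:.2f} mV -> {upsized:.2f} mV, ratio {upsized / base:.2f}")
```

The same ratio holds for any $A_o$, which is why upsizing is an effective but costly mitigation: halving the spread requires four times the area.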

2) *Channel length variations:* For sub-90 nm nodes, optical lithography uses light sources with wavelengths much larger than the minimum feature sizes of the technology [23]. Therefore, controlling the critical dimension (CD) at these technology nodes is very difficult. The variation in CD (i.e., the channel length of the transistor) directly impacts the transistor threshold voltage, $V_t$. In short-channel devices, $V_t$ has an exponential dependence on the channel length $L$ due to charge sharing and drain-induced barrier lowering (DIBL) effects [23], [24], expressed as follows:

$$V_t \approx V_{to} - (\zeta + \eta V_{DS}) \exp(-L/L_{to}) \tag{2}$$

where  $V_{to}$  is the long channel threshold voltage,  $\zeta$  is the charge sharing coefficient,  $L_{to}$  is the characteristic length, and  $\eta$  is the DIBL coefficient. As a result, a slight variation in L introduces a large variation in  $V_t$ due to the exponential dependence described in (2).
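To see how strongly (2) amplifies channel length variations, the sketch below evaluates $V_t(L)$ and its derivative, and propagates a small $\sigma_L$ to $\sigma_{V_t}$ with a first-order (linearized) approximation, $\sigma_{V_t} \approx |\partial V_t / \partial L| \, \sigma_L$. That propagation step and all parameter values are illustrative assumptions, not extracted device data.

```python
import math


def vt_short_channel(l, v_to, zeta, eta, v_ds, l_to):
    """Eq. (2): V_t ~ V_to - (zeta + eta * V_DS) * exp(-L / L_to)."""
    return v_to - (zeta + eta * v_ds) * math.exp(-l / l_to)


def dvt_dl(l, zeta, eta, v_ds, l_to):
    """Analytical derivative of Eq. (2) with respect to L."""
    return (zeta + eta * v_ds) / l_to * math.exp(-l / l_to)


def sigma_vt_from_l(l, sigma_l, zeta, eta, v_ds, l_to):
    """First-order propagation of a small channel-length spread to V_t."""
    return abs(dvt_dl(l, zeta, eta, v_ds, l_to)) * sigma_l


# Hypothetical parameters: lengths in um, voltages in V.
params = dict(zeta=0.8, eta=0.3, v_ds=1.0, l_to=0.02)
for l in (0.045, 0.060, 0.090):
    vt = vt_short_channel(l, v_to=0.45, **params)
    s = sigma_vt_from_l(l, sigma_l=0.002, **params)
    print(f"L = {l:.3f} um: Vt = {vt:.3f} V, sigma_Vt ~ {1e3 * s:.1f} mV")
```

The printed spread shrinks rapidly as $L$ grows, because the sensitivity itself decays as $\exp(-L/L_{to})$.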

RDF and channel length variations are considered the dominant sources of device variations [23], although there are many other sources, such as line edge roughness (LER) and oxide charge variations [24]. Since these two sources make the dominant contribution to WID variations [23], only RDF and channel length variations are considered in the following analysis.

## III. PROBLEM FORMULATION

In this section, the proposed statistical design framework is demonstrated on the transmission gate-based master-slave flip-flop (TG-MSFF) shown in Fig. 1 as a case study circuit. The TG-MSFF is a combination of two level-sensitive latches. The advantages of this flip-flop are its simplicity and good hold time behavior [25]. For these reasons, it is used in the PowerPC 603 low-power processor and in digital libraries, which makes it a suitable case study for the proposed framework.

Fig. 1. TG-MSFF.

In this problem, the TG-MSFF design metrics, namely the layout area (Area), setup time ($T_{setup}$), delay ($T_{D-Q}$), dynamic (switching) power ($P_{dynamic}$), and leakage (standby) power ($P_{leakage}$), are considered the main design constraints for the proposed methodology. The following subsection provides the analytical formulas used to calculate these five design constraints in the proposed overall parametric yield optimization methodology. The TG-MSFF exhibits good hold time behavior, so its hold time constraint is always satisfied; the hold time constraint is therefore not included in this TG-MSFF case study.

In general, there are two main categories of flip-flops. The first category, master-slave flip-flops, has good hold time behavior (i.e., most flip-flops in this category allow a zero hold time), so the hold time constraint is always satisfied. In contrast, the second category, pulsed flip-flops, has good setup time behavior (i.e., most flip-flops in this category allow a negative setup time), so the setup time constraint is always satisfied; however, pulsed flip-flops have strict hold time constraints. Thus, only the setup time constraint is considered for master-slave flip-flops, and only the hold time constraint is taken into account for pulsed flip-flops.

### A. Design Metrics Constraints Analytical Formulas

In this paper, the simplified delay models used in the proposed methodology are compared with SPICE (simulation program with integrated circuit emphasis) transient simulations using full device models for different supply voltages, transistor sizes, and load capacitances. This comparison shows good agreement between the simplified delay models and the SPICE transient simulations, with a maximum error of 18.7% and an average error of 11.4%. This error can be further reduced by using more accurate delay models. In addition, the propagation delay is measured between the 50% points of the input and output transitions, and the transistors are assumed to operate in the velocity saturation region for the 45 nm technology [26], [27]. Moreover, a double-bounded probability density function (DB-PDF) is assumed for each design variable, and all the design variables are assumed to be independent with symmetrical distributions.

1) *Setup time* ($T_{setup}$): The setup time is given by [22] as follows:

$$T_{setup} = T_{I1} + T_{I3} + T_{T1} + \max(T_{I2}, T_{I4}) \tag{3}$$

where $T_{I1}$, $T_{I2}$, $T_{I3}$, and $T_{I4}$ represent the propagation delays of inverters I1, I2, I3, and I4, respectively, and $T_{T1}$ is the propagation delay of transmission gate TX1. The maximum operator is used because, under process variations, the effective sizes, and hence the delays, of inverters I2 and I4 may not be identical.

2) *Total delay*  $(T_{D-Q})$ : The total delay is given by [22] as follows:

$$T_{D-Q} = T_{setup} + T_{I6} + T_{T3} \tag{4}$$

where $T_{I6}$ and $T_{T3}$ represent the propagation delays of inverter I6 and transmission gate TX3, respectively. Thus, the delays of the inverters and transmission gates must be calculated. In this analysis, 45 nm CMOS technology transistor models are adopted; accordingly, the transistors are assumed to operate in the velocity saturation region.
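Equations (3) and (4) combine the component delays directly. The sketch below restates them in Python; the component delay values are hypothetical placeholders that, in the framework, would come from (5) and (6).

```python
def setup_time(t_i1, t_i2, t_i3, t_i4, t_t1):
    """Eq. (3): T_setup = T_I1 + T_I3 + T_T1 + max(T_I2, T_I4)."""
    return t_i1 + t_i3 + t_t1 + max(t_i2, t_i4)


def total_delay(t_setup, t_i6, t_t3):
    """Eq. (4): T_D-Q = T_setup + T_I6 + T_T3."""
    return t_setup + t_i6 + t_t3


# Hypothetical component delays in ps; I2 is slower than I4 here.
t_s = setup_time(t_i1=10.0, t_i2=12.0, t_i3=11.0, t_i4=9.0, t_t1=8.0)
print(t_s, total_delay(t_s, t_i6=7.0, t_t3=6.0))  # 41.0 54.0
```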

According to [22], the propagation delays of the inverters and the transmission gates are given by the following:

$$T_{pass(j)} = 0.69 \times R_{eq-pass(j)} \times C_{o-pass(j)} \tag{5}$$

and

$$T_{inv(i)} = 0.69 \times R_{eq-inv(i)} \times C_{o-inv(i)} \tag{6}$$

for $1 \le i \le 6$ and $1 \le j \le 4$,

where  $R_{eq-pass(j)}$  and  $R_{eq-inv(i)}$  are the equivalent resistances of the transmission gate Tj  $(1 \le j \le 4)$  and the inverter Ii  $(1 \le i \le 6)$ , respectively, and  $C_{o-pass(j)}$  and  $C_{o-inv(i)}$  are the output capacitances of the transmission gate Tj  $(1 \le j \le 4)$  and the inverter Ii  $(1 \le i \le 6)$ , respectively [22].

The output capacitance of any transmission gate or any inverter is calculated by summing up all the capacitances connected to its output node. These capacitances are calculated by adopting the equations in [22] as follows:

$$C_{o-inv(i)} = C_{int} + C_{ox} \left( L_{ni} W_{ni} + L_{pi} W_{pi} \right) \tag{7}$$

$$\begin{aligned} C_{o-pass(j)} = {} & C_{jd} L_s \left( W_{nj} + W_{pj} \right) \\ & + C_{jsw} \left[ \left( 2L_s + W_{nj} \right) + \left( 2L_s + W_{pj} \right) \right] \\ & + \frac{1}{2} C_{ox} \left( L_{nj} W_{nj} + L_{pj} W_{pj} \right) \end{aligned} \tag{8}$$

for $1 \le i \le 6$ and $1 \le j \le 4$,

where $C_{int}$ is the inverter intrinsic diffusion capacitance, $C_{ox}$ is the gate oxide capacitance per unit area, $C_{jd}$ is the junction capacitance per unit area, $C_{jsw}$ is the sidewall capacitance per unit length, $L_s$ is the junction length, and $L_{ni}$, $L_{pi}$, $W_{ni}$, and $W_{pi}$ are the gate length of the n-channel metal-oxide-semiconductor (NMOS) transistor, the gate length of the p-channel metal-oxide-semiconductor (PMOS) transistor, the gate width of the NMOS, and the gate width of the PMOS of inverter Ii ($1 \le i \le 6$), respectively. Also, $L_{nj}$, $L_{pj}$, $W_{nj}$, and $W_{pj}$ are the gate length of the NMOS, the gate length of the PMOS, the gate width of the NMOS, and the gate width of the PMOS of transmission gate TXj ($1 \le j \le 4$), respectively.
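The capacitance expressions (7) and (8) translate directly into code. In this sketch, the argument names mirror the symbols in the text, and the caller is assumed to supply consistent units (for example, fF/µm² for $C_{ox}$ and $C_{jd}$, fF/µm for $C_{jsw}$, fF for $C_{int}$, and µm for dimensions, giving fF); the values in the usage line are hypothetical.

```python
def c_out_inv(c_int, c_ox, l_n, w_n, l_p, w_p):
    """Eq. (7): intrinsic diffusion capacitance plus the loading gate capacitance."""
    return c_int + c_ox * (l_n * w_n + l_p * w_p)


def c_out_pass(c_jd, c_jsw, c_ox, l_s, l_n, w_n, l_p, w_p):
    """Eq. (8): junction area + sidewall + half-gate terms of a transmission gate."""
    junction = c_jd * l_s * (w_n + w_p)
    sidewall = c_jsw * ((2 * l_s + w_n) + (2 * l_s + w_p))
    gate = 0.5 * c_ox * (l_n * w_n + l_p * w_p)
    return junction + sidewall + gate


# Hypothetical values: fF/um^2, fF/um, fF, and um.
print(c_out_inv(c_int=0.5, c_ox=10.0, l_n=0.045, w_n=0.09, l_p=0.045, w_p=0.18))
print(c_out_pass(c_jd=1.0, c_jsw=0.1, c_ox=10.0, l_s=0.1,
                 l_n=0.045, w_n=0.09, l_p=0.045, w_p=0.09))
```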

The equivalent resistance of inverter Ii is given by [22] as follows:

$$\begin{aligned} R_{eq-inv(i)} &= \frac{1}{2} \left[ R_{eqn(i)} + R_{eqp(i)} \right] \\ R_{eq(n/p)(i)} &= \frac{0.75}{I_{ds(n/p)(i)}} V_{DD} \left( 1 - \frac{7}{9} \lambda_{(n/p)} V_{DD} \right) \\ I_{ds(n/p)(i)} &= K_{n/p} \frac{W_{ni/pi}}{L_{ni/pi}} V_{DSAT(n/p)} \left[ V_{DD} - V_{to(n/p)} - \frac{V_{DSAT(n/p)}}{2} \right] \end{aligned} \tag{9}$$

where $I_{ds(n/p)(i)}$ is the transistor drain-to-source current, $R_{eq(n/p)(i)}$ is the transistor equivalent resistance, $V_{DD}$ is the supply voltage, $\lambda_{(n/p)}$ is the channel-length modulation coefficient, $V_{DSAT(n/p)}$ is the velocity saturation voltage, $V_{to(n/p)}$ is the threshold voltage, and $K_{n/p}$ is a technology-dependent parameter of the transistor. The subscripts *n* and *p* refer to the inverter NMOS and PMOS transistors, respectively, and the subscript *i* refers to inverter I*i* ($1 \le i \le 6$). $R_{eq-inv(i)}$ is the total equivalent resistance of inverter I*i*.
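A sketch of (9) and (6) for one inverter follows, assuming the caller supplies consistent SI values and gives PMOS quantities as magnitudes; $K_{n/p}$, $V_{DSAT}$, $\lambda$, and the numbers in the usage lines are hypothetical, not 45 nm model parameters. Because $I_{ds}$ scales with $W/L$, doubling the width halves $R_{eq}$.

```python
def saturation_current(k, w, l, v_dsat, v_dd, v_to):
    """I_ds in Eq. (9): K * (W/L) * V_DSAT * (V_DD - V_to - V_DSAT / 2)."""
    return k * (w / l) * v_dsat * (v_dd - v_to - v_dsat / 2.0)


def equivalent_resistance(i_ds, v_dd, lam):
    """R_eq in Eq. (9): 0.75 * V_DD / I_ds * (1 - 7/9 * lambda * V_DD)."""
    return 0.75 * v_dd / i_ds * (1.0 - 7.0 / 9.0 * lam * v_dd)


def inverter_resistance(r_n, r_p):
    """R_eq-inv in Eq. (9): average of the NMOS and PMOS resistances."""
    return 0.5 * (r_n + r_p)


def inverter_delay(r_eq_inv, c_out):
    """Eq. (6): T_inv = 0.69 * R_eq-inv * C_o-inv."""
    return 0.69 * r_eq_inv * c_out


# Hypothetical SI values; PMOS quantities are magnitudes (assumption).
V_DD = 1.0
i_n = saturation_current(k=3e-4, w=90e-9, l=45e-9, v_dsat=0.3, v_dd=V_DD, v_to=0.40)
i_p = saturation_current(k=1.5e-4, w=180e-9, l=45e-9, v_dsat=0.4, v_dd=V_DD, v_to=0.40)
r_inv = inverter_resistance(equivalent_resistance(i_n, V_DD, 0.1),
                            equivalent_resistance(i_p, V_DD, 0.1))
print(f"R_eq-inv = {r_inv:.0f} ohm, T_inv = {1e12 * inverter_delay(r_inv, 1e-15):.2f} ps")
```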

Similarly, the total resistance of the transmission gate is given by [22] as follows:

$$\begin{aligned} R_{(n/p)(j)} &= \frac{1}{K_{n/p} \frac{W_{nj/pj}}{L_{nj/pj}} \left[ V_{DD} - V_{to(n/p)} \right]} \\ R_{eq-pass(j)} &= \frac{R_{n(j)} \times R_{p(j)}}{R_{n(j)} + R_{p(j)}} \end{aligned} \tag{10}$$

where  $R_{(n/p)(j)}$  is the resistance of the PMOS or NMOS transistors of transmission gate TX *j*, and  $R_{eq-pass(j)}$  is the total resistance of transmission gate TX *j*.
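A similar sketch covers (10) and (5), under the same assumptions (consistent SI units, hypothetical parameter values): each device of the transmission gate is modeled as a linear resistance, and the two devices act in parallel.

```python
def tg_device_resistance(k, w, l, v_dd, v_to):
    """Eq. (10): R = 1 / (K * (W/L) * (V_DD - V_to)) for one TG device."""
    return 1.0 / (k * (w / l) * (v_dd - v_to))


def tg_resistance(r_n, r_p):
    """Eq. (10): parallel combination of the NMOS and PMOS resistances."""
    return r_n * r_p / (r_n + r_p)


def tg_delay(r_eq_pass, c_out):
    """Eq. (5): T_pass = 0.69 * R_eq-pass * C_o-pass."""
    return 0.69 * r_eq_pass * c_out


# Hypothetical SI values.
r_n = tg_device_resistance(k=3e-4, w=90e-9, l=45e-9, v_dd=1.0, v_to=0.40)
r_p = tg_device_resistance(k=1.5e-4, w=90e-9, l=45e-9, v_dd=1.0, v_to=0.40)
print(f"R_eq-pass = {tg_resistance(r_n, r_p):.0f} ohm")
```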

By using (3)–(10), the inverter delays, $T_{Ii}$, and the transmission gate delays, $T_{Tj}$, are obtained. Correspondingly, the setup time, $T_{setup}$, and the total delay of the flip-flop, $T_{D-Q}$, are calculated.

3) *Dynamic (switching) power* ($P_{dynamic}$): The dynamic power of the flip-flop circuit is given as follows:

$$P_{dynamic} = V_{DD}^2 \times f \times \sum_{i=1}^{n} \alpha_i C_i \tag{11}$$

where  $V_{DD}$  is the supply voltage, f is the operating frequency, n is the number of nodes in the flip-flop circuit (n equals 11 in the flip-flop case study),  $\alpha_i$  and  $C_i$  are the activity factor and parasitic capacitance at node i, respectively.
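Equation (11) is a weighted sum over the circuit nodes. The sketch below assumes the per-node activity factors and capacitances are already known; the 11-node, 0.5-activity, 1 fF, 1 GHz example is hypothetical.

```python
def dynamic_power(v_dd, f, activities, capacitances):
    """Eq. (11): P_dynamic = V_DD^2 * f * sum_i(alpha_i * C_i)."""
    if len(activities) != len(capacitances):
        raise ValueError("need one activity factor per node capacitance")
    return v_dd ** 2 * f * sum(a * c for a, c in zip(activities, capacitances))


# Hypothetical: 11 nodes, each with alpha = 0.5 and C = 1 fF, at 1 V and 1 GHz.
p = dynamic_power(1.0, 1e9, [0.5] * 11, [1e-15] * 11)
print(f"{p * 1e6:.2f} uW")  # 5.50 uW
```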

4) *Leakage power* ($P_{leakage}$): The leakage power of the flip-flop circuit is given as follows:

$$P_{leakage} = V_{DD} \times \sum I_{leakage} \tag{12}$$

where $I_{leakage}$ is the leakage current of each transistor operating in the subthreshold region, and the sum is taken over all the transistors of the flip-flop. The leakage current of a transistor in the OFF state is given by [26] and [28] as follows:

$$I_{leakage} = \left[ \mu_o C_{ox} \frac{W_i}{L_i} V_T^2 e^{1.8} \right] \times e^{\frac{V_{GS} - V_{to}}{s \times V_T}} \tag{13}$$

where $V_{GS}$ is the transistor gate-to-source voltage, $s$ is the subthreshold coefficient, $V_T$ is the thermal voltage (approximately 26 mV at room temperature), $\mu_o$ is the transistor effective mobility, and $W_i$ and $L_i$ are the width and length of transistor $i$.
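The sketch below evaluates (13) for OFF transistors ($V_{GS} = 0$) and sums the currents as in (12). The parameter values are hypothetical. It also shows a direct consequence of the exponential term: lowering $V_{to}$ by $s V_T \ln 10$ increases the leakage tenfold.

```python
import math


def off_leakage_current(mu_o, c_ox, w, l, v_therm, v_gs, v_to, s):
    """Eq. (13): [mu_o*C_ox*(W/L)*V_T^2*e^1.8] * exp((V_GS - V_to) / (s*V_T))."""
    prefactor = mu_o * c_ox * (w / l) * v_therm ** 2 * math.exp(1.8)
    return prefactor * math.exp((v_gs - v_to) / (s * v_therm))


def leakage_power(v_dd, currents):
    """Eq. (12): P_leakage = V_DD * sum of the transistor leakage currents."""
    return v_dd * sum(currents)


# Hypothetical SI values for one OFF device (V_GS = 0); v_therm is V_T.
common = dict(mu_o=0.03, c_ox=0.015, w=90e-9, l=45e-9, v_therm=0.026, v_gs=0.0, s=1.5)
i_hi_vt = off_leakage_current(v_to=0.40, **common)
i_lo_vt = off_leakage_current(v_to=0.40 - 1.5 * 0.026 * math.log(10), **common)
print(f"ratio = {i_lo_vt / i_hi_vt:.1f}")  # ratio = 10.0
print(f"P = {leakage_power(1.0, [i_hi_vt] * 20):.3e} W")
```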

5) *Layout area* (*Area*): Several layout implementations of the flip-flop exist [29], [30]; the layout implementation from [29] is used in this paper. The *x* and *y* dimensions of the flip-flop layout are calculated as functions of the layout rules as follows:

$$\begin{aligned} Area &= x_{dim} \times y_{dim} \\ x_{dim} &= 2 \times \max(PC, NC) + 18 \times GC + 9 \times CC + \max\left( \Sigma L_p, \Sigma L_n \right) \\ y_{dim} &= 2 \times CW + MM + PN + \max(W_n) + \max(W_p) \end{aligned} \tag{14}$$

where *PC* is the p-diffusion to contact spacing, *NC* is the n-diffusion to contact spacing, *GC* is the gate to contact spacing, *CC* is the contact to contact spacing, $\Sigma L_p$ is the sum of all the PMOS transistor lengths, $\Sigma L_n$ is the sum of all the NMOS transistor lengths, *CW* is the contact width, *MM* is the metal to metal spacing, *PN* is the p-diffusion to n-diffusion spacing, and $\max(W_n)$ and $\max(W_p)$ are the largest NMOS and PMOS transistor widths, respectively.
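Equation (14) can be evaluated from the layout rules and the transistor dimensions. In this sketch, the rule values and dimensions are hypothetical unitless numbers, chosen only to make the arithmetic easy to follow.

```python
def flip_flop_area(pc, nc, gc, cc, cw, mm, pn,
                   lengths_p, lengths_n, widths_p, widths_n):
    """Eq. (14): Area = x_dim * y_dim from layout rules and device sizes."""
    x_dim = (2 * max(pc, nc) + 18 * gc + 9 * cc
             + max(sum(lengths_p), sum(lengths_n)))
    y_dim = 2 * cw + mm + pn + max(widths_n) + max(widths_p)
    return x_dim * y_dim


# x_dim = 2*2 + 18 + 9 + 2 = 33 and y_dim = 2 + 1 + 1 + 4 + 3 = 11.
print(flip_flop_area(pc=1, nc=2, gc=1, cc=1, cw=1, mm=1, pn=1,
                     lengths_p=[1, 1], lengths_n=[1],
                     widths_p=[2, 3], widths_n=[1, 4]))  # 363
```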

### B. Classification of the Proposed Methodology Parameters

The flip-flop design is constrained by the specifications for its design metrics, such as $T_{setup}$, $T_{D-Q}$, $P_{dynamic}$, $P_{leakage}$, and *Area*. Each of these metric constraints should be met over a range of environmental parameters (supply voltage, temperature), design parameters (transistor width and length), and statistical parameters (process variation parameters such as $V_t$).

- 1) Environmental parameters: These are often critical and can be accounted for by evaluating the design metrics at their respective worst-case operating conditions. For example, leakage is worst at high temperature (subthreshold leakage being the main leakage current component) and high supply voltage. The number of performance corners, and the voltage and temperature of each corner, are determined by the intended set of applications; for example, the operating temperature of portable applications is lower than that of high-performance applications. In this paper, the performance corner is low supply voltage and high temperature, while the low-leakage power corner is high supply voltage and high temperature.
- 2) Design parameters: These are the widths and lengths of the flip-flop transistors. In this paper, the D2D variations in the widths and lengths of the transistors are considered. Since the gate length significantly impacts $V_t$, as shown in (2), D2D threshold voltage variations are also implicitly accounted for [31].
- 3) Statistical parameters: $V_t$ is the most significant statistical parameter. Because of the small layout area of flip-flop circuits and the close proximity of their transistors, the effect of WID variations in the channel length and width is negligible [20]. Therefore, in this paper, random WID $V_t$ variations due to RDF are considered the main source of WID variations. The $V_t$ variations of all the flip-flop transistors are modeled as independent (hence uncorrelated) Gaussian random variables [20]. According to (1), the random WID $V_t$ variations due to RDF are inversely proportional to the square root of the transistor area; therefore, transistor sizing has a significant impact on these variations.

### C. Statistical Yield Maximization

1) Problem Characterization: The TG-MSFF case study has five design metrics. The upper bounds of these design metrics are denoted by $Area_{Max}$, $\mu_{T_s}$, $\mu_{T_d}$, $\mu_{P_{dyn}}$, and $\mu_{P_{leak}}$ for Area, $T_{setup}$, $T_{D-Q}$, $P_{dynamic}$, and $P_{leakage}$, respectively. The mean values of these design metrics, excluding Area, are calculated using SPICE transient simulations, and their tolerance is assumed to be $\pm 3\sigma$ [21]. As a result, the constraints are formulated as follows:

- 1) $Area \le Area_{Max}$
- 2) $T_{setup} + 3 \times \sigma_{T_{setup}} \le \mu_{T_s}$
- 3) $T_{D-Q} + 3 \times \sigma_{T_{D-Q}} \le \mu_{T_d}$
- 4) $P_{dynamic} + 3 \times \sigma_{P_{dynamic}} \le \mu_{P_{dyn}}$
- 5) $P_{leakage} + 3 \times \sigma_{P_{leakage}} \le \mu_{P_{leak}}$

These constraints are shown in Fig. 2. Different values of $\mu_{T_s}$, $\mu_{T_d}$, $\mu_{P_{dyn}}$, and $\mu_{P_{leak}}$ are used to design the TG-MSFF at the general-purpose high-performance corner and at the general-purpose low-leakage power corner. The value of $Area_{Max}$ is set to 1.46 $\mu$m$^2$, following the flip-flop scaling trends in [32].
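The five constraints amount to a simple check once the mean and standard deviation of each metric are available. The sketch below assumes those statistics are given (in the framework they come from the analytical models); the `Metric` class, the metric names, and all numbers except the 1.46 µm² area bound are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    mean: float
    sigma: float
    upper_bound: float

    def satisfied(self) -> bool:
        """Constraints 2-5: mean + 3*sigma must not exceed the upper bound."""
        return self.mean + 3.0 * self.sigma <= self.upper_bound


def violated_constraints(area, area_max, metrics):
    """Return the names of all violated constraints (constraint 1 is Area)."""
    failures = [] if area <= area_max else ["Area"]
    failures += [m.name for m in metrics if not m.satisfied()]
    return failures


AREA_MAX = 1.46  # um^2, the value used in the text
metrics = [Metric("T_setup", mean=40.0, sigma=2.0, upper_bound=50.0),
           Metric("P_leakage", mean=10.0, sigma=2.0, upper_bound=15.0)]
print(violated_constraints(1.20, AREA_MAX, metrics))  # ['P_leakage']
```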

The delay and dynamic power distributions are modeled as Gaussian distributions, similar to [20] and [21]. The leakage power exhibits a log-normal (rather than Gaussian) distribution with respect to $V_t$ variations. However, by the central limit theorem [33], the sum of the leakage of a sufficiently large number of flip-flop cells can be modeled as a normal distribution [20]. According to [34], 16 flip-flop cells are sufficient to validate this approximation.

According to (1), the threshold voltage variations, and hence the design constraints, are functions of the transistor sizes. The design variables, i.e., the transistor widths and lengths $\{W(l), L(l) \ (\forall 1 \le l \le 20)\}$, which also determine the threshold voltages, define a 40-D design space. Within this design space, the design constraints define a feasible region: the region of widths and lengths (and the resulting threshold voltages) that satisfies all the design constraints. The nominal design should be selected somewhere in this feasible region to satisfy the design constraints, and the variations in the transistor dimensions and threshold voltages should also be taken into account. This can be done as follows.

If the spread (tolerance percent, or $3\sigma$ value, of the normally distributed transistor widths and lengths) of the design variables is known, the nominal design can be specified within an imaginary box, called the tolerance box. This is shown in Fig. 2 for a problem with two design variables and a simplified feasible region defined by simplified constraints in two dimensions. The tolerance box dimensions are specified by the tolerance percent of the design variables, and the center of the tolerance box is the nominal design (the design variables are assumed to have symmetrical distributions). The smaller dots within the tolerance box represent the design variable values that satisfy the constraints. The overlap of the tolerance box with the feasible region determines the yield. The tolerance box (and the nominal design with it) should therefore be moved to maximize the overlap between the tolerance box and the feasible region, which maximizes the yield. Because calculating the exact overlap is very difficult, the yield box, the inner box in Fig. 2, is defined as an estimate: it captures the maximum rectangular overlap that can be obtained between the feasible region and the tolerance box and is used directly for yield calculation [35]. In this case study (TG-MSFF sizing with 40 design variables), the 40-D volume of the inner box, called the yield box, defines the yield.

Fig. 2. Normalized simplified yield maximization method.

2) Polyhedral Approximation of the Constraint Region: Performance constraints define a feasible region as follows:

$$F = \{x \in \mathbb{R}^n \mid h_i(x) \ge 0, \quad i = 1, 2, \ldots, m\} \tag{15}$$

Here, $x$ represents a sample of the random variable $X$ with an arbitrary joint PDF. The real-valued functions $h_i(x) : \mathbb{R}^n \to \mathbb{R}$ are measures of the system performance. The analytic form of $h_i(x)$ is usually not known, and only numerical evaluations of the functions and their derivatives can be obtained. Most existing techniques assume that the feasible region $F$ is convex. This is a limitation; however, the proposed methodology is also applicable to nonconvex problems if the calculations are repeated from several different starting points. Because the analytic form of $h_i(x)$ may be unavailable, the method finds a polyhedral approximation of the feasible region by taking a first-order approximation of each $h_i(x)$ [36], [37].

Fig. 3. Polyhedral approximation of the original constraint.

Accordingly, the partial derivatives are calculated as follows:

$$\begin{aligned} 1. \quad & \frac{\partial (Area)}{\partial (W(l))}, \ \frac{\partial (Area)}{\partial (L(l))} \quad \forall (1 \le l \le 20) \\ 2. \quad & \frac{\partial (T_{setup})}{\partial (W(l))}, \ \frac{\partial (T_{setup})}{\partial (L(l))} \quad \forall (1 \le l \le 20) \\ 3. \quad & \frac{\partial (T_{D-Q})}{\partial (W(l))}, \ \frac{\partial (T_{D-Q})}{\partial (L(l))} \quad \forall (1 \le l \le 20) \\ 4. \quad & \frac{\partial (P_{dynamic})}{\partial (W(l))}, \ \frac{\partial (P_{dynamic})}{\partial (L(l))} \quad \forall (1 \le l \le 20) \\ 5. \quad & \frac{\partial (P_{leakage})}{\partial (W(l))}, \ \frac{\partial (P_{leakage})}{\partial (L(l))} \quad \forall (1 \le l \le 20) \end{aligned}$$

and  $h_i(x)$  is approximated [36], [37] as follows:

$$h_{li}(x) \approx h_i(x^*) + g_i(x^*)^T (x - x^*) \tag{16}$$

where $g_i(x^*)$ is the gradient vector of $h_i$, and $h_i$ is the $i$th original constraint. The approximated constraints $h_{li}$, called polytopes in Fig. 3, together form a polyhedron that is the new feasible region. The linearization point $x^*$ lies on the surface $h_i(x) = 0$ and has the minimal distance from $\mu_0$, the center of the initial tolerance box shown in Fig. 2. To find the best match for the approximated linear constraints, the following optimization problem is solved [36], [37]:

$$\min_{x} \; \beta = \left[ (x - \mu_0)^T (x - \mu_0) \right]^{\frac{1}{2}} \quad \text{subject to} \quad h_i(x) = 0 \tag{17}$$

Solving this optimization problem ensures minimal error in the approximation of the constraint region. The linearized constraints (polytopes in Fig. 3) then replace the nonlinear original constraints in the yield maximization problem.
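The linearization step can be sketched as follows. To keep the example dependency-free, it finds the closest point $x^*$ of (17) with the Hasofer-Lind/Rackwitz-Fiessler (HL-RF) iteration, a swapped-in technique rather than the SQP solver used in this paper; it computes gradients by central finite differences because $h_i$ is assumed to be available only numerically, and then forms the linear constraint (16). It assumes a single smooth constraint with a nonzero gradient near $x^*$; convergence is not guaranteed for strongly nonlinear $h_i$.

```python
import math


def numerical_gradient(h, x, step=1e-6):
    """Central-difference gradient of a scalar function h at point x."""
    grad = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += step
        xm[k] -= step
        grad.append((h(xp) - h(xm)) / (2.0 * step))
    return grad


def closest_point_on_surface(h, mu0, iters=50, tol=1e-9):
    """HL-RF iteration for min ||x - mu0|| subject to h(x) = 0 (Eq. (17))."""
    x = list(mu0)
    for _ in range(iters):
        g = numerical_gradient(h, x)
        gg = sum(gk * gk for gk in g)
        if gg == 0.0:
            raise ValueError("zero gradient: cannot project onto h(x) = 0")
        u = [xk - mk for xk, mk in zip(x, mu0)]
        # Closest point to mu0 on the plane h(x_k) + g.(x - x_k) = 0.
        scale = (sum(gk * uk for gk, uk in zip(g, u)) - h(x)) / gg
        x_new = [mk + scale * gk for mk, gk in zip(mu0, g)]
        if math.dist(x_new, x) < tol:
            return x_new
        x = x_new
    return x


def linearize(h, x_star):
    """Return (g, c) so that h_l(x) = g.x + c, i.e., Eq. (16) around x_star."""
    g = numerical_gradient(h, x_star)
    c = h(x_star) - sum(gk * xk for gk, xk in zip(g, x_star))
    return g, c


# Hypothetical 2-D constraint: feasible inside the unit circle, h(x) >= 0.
h = lambda v: 1.0 - v[0] ** 2 - v[1] ** 2
x_star = closest_point_on_surface(h, [0.2, 0.0])
g, c = linearize(h, x_star)
print([round(v, 6) for v in x_star], [round(v, 6) for v in g], round(c, 6))
```

For this circle, $x^*$ should come out close to $(1, 0)$, and the linearized constraint is approximately $2 - 2x_1 \ge 0$, the tangent half-plane.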

3) *Modeling Arbitrary Distributions:* Traditional designs assume symmetrical distributions, which simplifies the solution process. In this case, the maximum-volume box in the feasible region corresponds to the maximum yield that can be attained, as shown in Fig. 2. If the PDF is nonsymmetrical, however, the maximum-volume box does not correspond to the maximum yield, and calculating the yield involves evaluating a multidimensional probability integral by quadrature or Monte-Carlo-based methods, which is computationally expensive [38]. Here, Kumaraswamy's distribution [39] is used to approximate a DB-PDF for physically bounded variables as follows:

$$f(z) = a b z^{a-1} (1 - z^{a})^{b-1}, \qquad z = \frac{x - x^{l}}{x^{u} - x^{l}}, \quad x^{l} \le x \le x^{u} \tag{18}$$

where $x^{l}$ and $x^{u}$ are the lower and upper bounds, respectively, of the probabilistic design variable $x$. The DB-PDF can take several shapes for different values of $a$ and $b$. The integral of the DB-PDF, i.e., the cumulative distribution function (CDF), can be calculated according to [39] as follows:

$$F(z) = 1 - (1 - z^{a})^{b}. \tag{19}$$

4) Yield Maximization: Given a convex and bounded polytope $P$, yield maximization can be performed in the component space, which means that no extra function evaluations are needed once $P$ has been constructed. Uniform distributions lead to worst-case design, which can be handled by searching for the maximum-volume $n$-dimensional box inside the feasible region. According to (18) and (19), the DB-PDF can take several shapes for different values of $a$ and $b$, and it can approximate uniform, triangular, tail, or almost any single-modal distribution. The problem is therefore to search for the maximum-yield $n$-dimensional box [36], [37] as follows:

$$R(x^l, x^u) = \{x \in \mathbb{R}^n \mid x^l \le x \le x^u\}. \tag{20}$$

For the polytope P, the requirement $R \subseteq P$ is equivalent to

$$A^{+} x^{u} - A^{-} x^{l} \le C. \tag{21}$$

$A_i$, the i-th row of A, is the transpose of the gradient vector $g_i$ obtained by linearizing the performance constraint $h_i$ at a given x. $A^+$ and $A^-$ hold the positive and negative parts of the entries of A (so that $A = A^+ - A^-$); hence, $A_i^+ x^u - A_i^- x^l$ is the largest value that the linearized constraint $h_i$ can take over the box R. C contains the constant terms of the linearization [right-hand side (RHS) values].
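To illustrate why (21) characterizes $R \subseteq P$, the sketch below splits each row of A into its positive and negative parts and compares the result with a brute-force check of all $2^n$ box corners. It assumes $P = \{x : Ax \le C\}$; the matrix and box values are made-up illustration data, not linearizations from the flip-flop.

```python
from itertools import product


def split_signs(A):
    """Return (A_plus, A_minus) with A = A_plus - A_minus and both entrywise >= 0."""
    A_plus = [[max(v, 0.0) for v in row] for row in A]
    A_minus = [[max(-v, 0.0) for v in row] for row in A]
    return A_plus, A_minus


def box_inside_polytope(A, C, x_lo, x_hi):
    """Check A^+ x^u - A^- x^l <= C row by row, as in (21)."""
    A_plus, A_minus = split_signs(A)
    for i, c in enumerate(C):
        # Largest value of row i of A x over the box [x_lo, x_hi].
        worst = sum(p * u for p, u in zip(A_plus[i], x_hi)) - sum(
            m * l for m, l in zip(A_minus[i], x_lo))
        if worst > c:
            return False
    return True


def box_inside_by_corners(A, C, x_lo, x_hi):
    """Brute force: a box lies in a convex polytope iff all of its corners do."""
    for corner in product(*zip(x_lo, x_hi)):
        for row, c in zip(A, C):
            if sum(a * x for a, x in zip(row, corner)) > c:
                return False
    return True


A = [[1.0, -2.0], [-1.0, 0.5]]  # hypothetical linearized constraints
C = [1.0, 0.5]
print(box_inside_polytope(A, C, [0.0, 0.0], [1.0, 0.5]))    # True
print(box_inside_by_corners(A, C, [0.0, 0.0], [1.0, 0.5]))  # True
```

The sign-split test costs one pass over A, whereas the corner test grows as $2^n$, which is why the linear form (21) is the practical choice for the 40-dimensional problem.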

For a given nominal design ($\mu_0$) and tolerance ($t$) for each design variable, a tolerance box, i.e., a multidimensional hyperrectangle, can be found. Each of the 40 dimensions of the tolerance box is calculated as follows:

$$\begin{aligned}
\left[\mu_{0,L(l)} - \tfrac{t_{L(l)}}{2},\ \mu_{0,L(l)} + \tfrac{t_{L(l)}}{2}\right] &= \left[L^{lb}(l),\ L^{ub}(l)\right] \\
\left[\mu_{0,W(l)} - \tfrac{t_{W(l)}}{2},\ \mu_{0,W(l)} + \tfrac{t_{W(l)}}{2}\right] &= \left[W^{lb}(l),\ W^{ub}(l)\right], \qquad \forall\, 1 \le l \le 20
\end{aligned}$$

in which

$$\begin{aligned}
L^{ub}(l) - L^{lb}(l) &= t_{L(l)} = 6\,\sigma_{L(l)} \\
W^{ub}(l) - W^{lb}(l) &= t_{W(l)} = 6\,\sigma_{W(l)}, \qquad \forall\, 1 \le l \le 20.
\end{aligned}$$
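The tolerance-box construction above can be sketched as follows, assuming the nominal values $\mu_0$ and the standard deviations $\sigma$ are already known for every width and length. The helper name and the numbers in the usage line are hypothetical, not the paper's sizing results.

```python
def tolerance_box(mu0, sigma, k=6.0):
    """Return the (lower, upper) corners of a box of width t = k * sigma
    centred on the nominal design mu0, one entry per design variable."""
    lower, upper = [], []
    for m, s in zip(mu0, sigma):
        t = k * s                 # t = 6 sigma, as in the width relations above
        lower.append(m - t / 2)   # L^lb(l) or W^lb(l)
        upper.append(m + t / 2)   # L^ub(l) or W^ub(l)
    return lower, upper


# Hypothetical nominal length and width (nm) with their standard deviations.
lo, hi = tolerance_box([45.0, 96.0], [1.8, 3.0])
print(lo, hi)  # each width hi - lo equals 6 * sigma
```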

If $x^l$ and $x^u$ define the bottom-left and top-right corners of the yield box (the smaller box in Fig. 2), the yield is calculated as in [36] and [37]:

$$\begin{aligned}
Yield(x^{r}, x^{l}, x^{u}) &= \prod_{j=1}^{n} \Pr\{x_{j}^{l} \le x_{j} \le x_{j}^{u}\} \\
&= \prod_{j=1}^{n} \left[F\left(\frac{x_{j}^{u} - x_{j}^{r}}{t_{j}}\right) - F\left(\frac{x_{j}^{l} - x_{j}^{r}}{t_{j}}\right)\right]
\end{aligned} \tag{22}$$

where x denotes W(l) and L(l) ($\forall\, 1 \le l \le 20$), $x^r$ is the bottom-left corner of the tolerance box, and $F(\cdot)$ is the CDF of Kumaraswamy's distribution given in (19). Given a tolerance box, the objective is to move this box so that the yield is maximized. The resulting optimization problem [36], [37] is as follows:
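A minimal sketch of (22), assuming the design variables are statistically independent (which is what turns the yield into a product) and that each one has its own Kumaraswamy shape parameters. The helper names and the example values are illustrative only.

```python
def kumaraswamy_cdf(z, a, b):
    """Closed-form CDF (19), clamped to 0 or 1 outside the unit interval."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 1.0 - (1.0 - z ** a) ** b


def box_yield(x_r, x_lo, x_hi, t, a, b):
    """Yield (22): product over independent variables of
    F((x_hi - x_r) / t) - F((x_lo - x_r) / t), with per-variable shapes a[j], b[j]."""
    y = 1.0
    for r, lo, hi, tj, aj, bj in zip(x_r, x_lo, x_hi, t, a, b):
        y *= kumaraswamy_cdf((hi - r) / tj, aj, bj) - kumaraswamy_cdf((lo - r) / tj, aj, bj)
    return y


# Two uniform variables (a = b = 1); the yield box covers the lower half of
# each tolerance interval, so the yield is 0.5 * 0.5.
print(box_yield([0.0, 0.0], [0.0, 0.0], [0.5, 0.5], [1.0, 1.0], [1, 1], [1, 1]))  # 0.25
```

When the yield box coincides with the tolerance box, every factor equals 1 and the yield is 100%.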

$$\begin{aligned}
\max_{x^{r},\, x^{l},\, x^{u}} \quad & Yield(x^{r}, x^{l}, x^{u}) \\
\text{subject to} \quad & A^{+}x^{u} - A^{-}x^{l} \le C, \\
& x^{r} \ge x^{\min}, \\
& x^{l} \ge x^{r}, \\
& x^{u} \le x^{r} + t, \\
& x^{r} + t \le x^{\max}
\end{aligned} \tag{23}$$

where $x^{\min}$ and $x^{\max}$ are the minimum and maximum allowable values of W(l) and L(l) ($\forall\, 1 \le l \le 20$), respectively, $x^r$ is the bottom-left corner of the tolerance box, and $A^+$, $A^-$, and C are as defined for (21).
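The constraint set of (23) can be checked for a candidate point with a few lines of Python. This sketch assumes that the yield box must lie inside the tolerance box, i.e., $x^r \le x^l \le x^u \le x^r + t$, and that the linearized constraints have the form (21). The function name and the toy constraint $x_1 + x_2 \le 3$ are hypothetical.

```python
def is_feasible(x_r, x_lo, x_hi, t, A, C, x_min, x_max, tol=1e-12):
    """Return True if (x^r, x^l, x^u) satisfies the constraints of (23),
    assuming x^r <= x^l <= x^u <= x^r + t for the yield box."""
    # Linearized performance constraints evaluated on the yield box, as in (21).
    for row, c in zip(A, C):
        worst = sum(max(a, 0.0) * u - max(-a, 0.0) * l
                    for a, l, u in zip(row, x_lo, x_hi))
        if worst > c + tol:
            return False
    for r, l, u, tj, lo_lim, hi_lim in zip(x_r, x_lo, x_hi, t, x_min, x_max):
        if r < lo_lim - tol:              # tolerance box above the minimum size
            return False
        if r + tj > hi_lim + tol:         # tolerance box below the maximum size
            return False
        if l < r - tol or u > r + tj + tol or u < l - tol:
            return False                  # yield box inside the tolerance box
    return True


A = [[1.0, 1.0]]  # hypothetical linearized constraint x1 + x2 <= 3
C = [3.0]
print(is_feasible([0.0, 0.0], [0.0, 0.0], [1.5, 1.5], [2.0, 2.0], A, C,
                  [0.0, 0.0], [5.0, 5.0]))  # True
print(is_feasible([0.0, 0.0], [0.0, 0.0], [2.0, 2.0], [2.0, 2.0], A, C,
                  [0.0, 0.0], [5.0, 5.0]))  # False, since 2 + 2 > 3
```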

Fig. 4 portrays the design procedure in detail, with all the inputs, the objective function (yield), the design constraints, the design variables, and the steps to compute the design constraints.

Using a sequential quadratic programming (SQP)-based optimizer [40] to solve the constrained optimization problem in 40 dimensions, the maximum yield and the corresponding optimum nominal design values (transistor sizes) are calculated. The predictive 45 nm technology models [41], [42] have been used in this paper, but this method can easily be applied to any future technology and to any circuit realization.
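The paper solves (23) with an SQP-based optimizer [40]. The sketch below deliberately swaps that for a brute-force scan on a one-variable toy problem, which is only practical in very low dimension, to show the effect the formulation targets: with a skewed DB-PDF, the maximum-yield placement of the tolerance box is not the centred one. The feasible band, tolerance width, and shape parameters are all hypothetical.

```python
def kumaraswamy_cdf(z, a, b):
    """Closed-form CDF (19), clamped outside [0, 1]."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 1.0 - (1.0 - z ** a) ** b


def yield_1d(x_r, t, c_lo, c_hi, a, b):
    """Yield of the largest yield box inside both the tolerance box
    [x_r, x_r + t] and a hypothetical feasible band [c_lo, c_hi]."""
    lo = max(x_r, c_lo)
    hi = min(x_r + t, c_hi)
    if hi <= lo:
        return 0.0
    return kumaraswamy_cdf((hi - x_r) / t, a, b) - kumaraswamy_cdf((lo - x_r) / t, a, b)


def grid_search(t, c_lo, c_hi, a, b, x_min, x_max, steps=2000):
    """Brute-force stand-in for SQP: scan x^r over [x_min, x_max - t]."""
    best_x, best_y = x_min, -1.0
    for k in range(steps + 1):
        x_r = x_min + (x_max - t - x_min) * k / steps
        y = yield_1d(x_r, t, c_lo, c_hi, a, b)
        if y > best_y:
            best_x, best_y = x_r, y
    return best_x, best_y


# Skewed DB-PDF (a = 2, b = 5) whose tolerance width (t = 1.5) exceeds the
# feasible band [0, 1], so a 100% yield is impossible.
best_x, best_y = grid_search(1.5, 0.0, 1.0, 2, 5, -1.0, 2.0)
centred = yield_1d(0.5 - 1.5 / 2, 1.5, 0.0, 1.0, 2, 5)
print(round(best_x, 3), round(best_y, 3), round(centred, 3))
```

A gradient-based SQP solver scales to the 40 design variables of the TG-MSFF, whereas a grid grows exponentially with dimension; the scan is used here only because it is transparent.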

5) *Convergence and Complexity of the Method:* The method consists of the polyhedral approximation and yield maximization subproblems. Convergence of the algorithm for solving $h_i(x) = 0$ depends on the continuity and convexity of the functions. In engineering applications, these functions are usually well behaved in the vicinity of the nominal point, but their behavior may differ elsewhere. In such cases, $h_i(x) = 0$ may still be solvable, but the solution may not be unique. The problem size increases linearly with the number of functions and variables, compared with the quadratic increase in [43]. The proposed formulation can be used for any general PDF, as long as the yield integration is not too expensive. For independent random variables, the evaluation of the yield reduces to the multiplication of n 1-D integrals, which is inexpensive because the DB-PDF has a closed-form CDF. Calculating these 1-D integrals requires far less computation than the Monte-Carlo simulation used for yield estimation.

Fig. 4. Design procedure.
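To make the cost comparison concrete, the sketch below evaluates the same 40-dimensional yield once as a product of closed-form 1-D integrals and once by Monte-Carlo sampling (inverse-CDF draws from the Kumaraswamy distribution). The normalized yield-box limits and shape parameters are hypothetical; only the 40-variable count matches the TG-MSFF case study.

```python
import random


def kumaraswamy_cdf(z, a, b):
    """Closed-form CDF (19), clamped outside [0, 1]."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 1.0 - (1.0 - z ** a) ** b


def kumaraswamy_sample(rng, a, b):
    """Draw z in [0, 1) by inverting (19): z = (1 - (1 - u)^(1/b))^(1/a)."""
    u = rng.random()
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def closed_form_yield(z_lo, z_hi, a, b):
    """Product of n 1-D integrals, each a difference of two CDF values."""
    y = 1.0
    for lo, hi in zip(z_lo, z_hi):
        y *= kumaraswamy_cdf(hi, a, b) - kumaraswamy_cdf(lo, a, b)
    return y


def monte_carlo_yield(z_lo, z_hi, a, b, samples, seed=0):
    """Fraction of random designs whose every coordinate lands in the yield box."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        if all(lo <= kumaraswamy_sample(rng, a, b) <= hi for lo, hi in zip(z_lo, z_hi)):
            hits += 1
    return hits / samples


# 40 normalized variables; limits and shape parameters are hypothetical.
n = 40
z_lo, z_hi = [0.02] * n, [0.98] * n
exact = closed_form_yield(z_lo, z_hi, 2, 5)            # 40 CDF differences
estimate = monte_carlo_yield(z_lo, z_hi, 2, 5, 10000)  # up to 400,000 draws
print(round(exact, 4), round(estimate, 4))
```

The closed form needs 2n CDF evaluations, while the Monte-Carlo estimate needs many thousands of samples and still carries statistical error.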

### **IV. EXPERIMENTAL RESULTS**

In the following experimental results, a predictive 45 nm CMOS technology model is adopted [41], [42]. The layout area upper bound, $Area_{Max}$, is set to $1.46\,\mu m^2$ [29]. The value of $\sigma_{V_t}$ is calculated using (1) for the predictive 45 nm technology model to be 56 mV for a minimum-size transistor. The supply voltage, $V_{DD}$, is 1.0 V. According to the International Technology Roadmap for Semiconductors (ITRS) [32], the gate dimension variations are assumed to have a $3\sigma$ value of $\pm 12\%$ of the physical length. The TG-MSFF is designed at two different corners. The first is the general-purpose high performance corner, in which the upper bound of the delay, $T_{D-Q}$, takes on different values while the upper bounds of the other design constraints are fixed to their conventional values; the setup time constraint is excluded in this case because the setup time is a component of the total delay. The second is the general-purpose low leakage power corner.

Fig. 5. Varying $T_{D-Q}$ constraint. (a) Nominal $T_{D-Q}$ (ps). (b) Nominal $P_{dynamic}$ ($\mu$W). (c) Nominal $P_{leakage}$ ($\mu$W). (d) Nominal area ($\mu m^2$).
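The ITRS assumption translates directly into the tolerance widths used for the tolerance box: a $3\sigma$ spread of $\pm 12\%$ gives $\sigma = 4\%$ of the physical length and $t = 6\sigma = 24\%$. The snippet below works this out for an illustrative 45 nm gate length, which is an assumption for the example and not a sizing result from the paper.

```python
def length_sigma_and_tolerance(length_nm, three_sigma_fraction=0.12):
    """Convert a 3-sigma spread given as a fraction of the physical length
    into sigma and the 6-sigma tolerance width t used for the tolerance box."""
    sigma = three_sigma_fraction * length_nm / 3.0   # sigma = 4% of L
    return sigma, 6.0 * sigma                        # t = 24% of L


# Illustrative 45 nm physical gate length.
sigma, t = length_sigma_and_tolerance(45.0)
print(round(sigma, 3), round(t, 3))  # 1.8 10.8
```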

Moreover, if Y% is the yield achieved for a given set of RHS values of the constraints, increasing one of the RHS values (e.g., $\mu_{T_d}$ in the third constraint) gives a higher yield. In other words, relaxing some of the constraints improves the yield.
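A one-variable sketch of this effect, assuming a fixed tolerance box and a single linearized constraint $x \le c$ whose RHS value c is relaxed step by step. All numbers are hypothetical; the yield is simply the DB-PDF mass that satisfies the constraint.

```python
def kumaraswamy_cdf(z, a, b):
    """Closed-form CDF (19), clamped outside [0, 1]."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return 1.0 - (1.0 - z ** a) ** b


def yield_under_bound(c, x_r, t, a, b):
    """Yield of a fixed tolerance box [x_r, x_r + t] under one linearized
    constraint x <= c: the probability mass of the DB-PDF below c."""
    return kumaraswamy_cdf((c - x_r) / t, a, b)


# Relax the hypothetical RHS value c for a fixed design (a = 2, b = 5, t = 1.2).
for c in (0.6, 0.8, 1.0, 1.2):
    print(c, round(yield_under_bound(c, 0.0, 1.2, 2, 5), 4))
```

Since a CDF is non-decreasing, a larger RHS value can never lower the yield; once c reaches the top of the tolerance box, the yield is 100%.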

## A. General-Purpose High Performance Corner Design

The TG-MSFF case study is designed for different performance ($T_{D-Q}$) targets with a maximum dynamic power of 50 $\mu$W and a maximum leakage power of 5 $\mu$W (the conventional leakage and switching power values reported in [44]–[46]). The $T_{D-Q}$ constraint is reduced from 100 ps to 50 ps. Table I shows the simulation results, and the associated trends are analyzed in this section to give more insight into how the proposed yield optimization framework works. For each case, the optimization returns 40 nominal (optimal) transistor widths and lengths; these values are not shown in Table I due to space limitations, but the corresponding design metrics are tabulated.

The following observations can be drawn from Table I.

- 1) Fig. 5(a)–(d) shows the flip-flop delay ($T_{D-Q}$), dynamic power ($P_{dynamic}$), leakage power ($P_{leakage}$), and layout area (*Area*) at the obtained nominal designs, respectively. According to Fig. 5(b) and (c), the performance-power tradeoff is evident: any gain in performance is accompanied by an increase in power.
- 2) The initial target of $T_{D-Q} = 100$ ps is loose and is achieved with a small layout area ($1.31\,\mu\text{m}^2$). This target allows low dynamic and leakage power values of 33.4 $\mu$W and 3.6 $\mu$W, respectively, so the dynamic and leakage power constraints of 50 $\mu$W and 5 $\mu$W are not violated.
- 3) Fig. 5(a) shows that as the delay constraint is reduced, the gap between the delay constraint and the nominal delay shrinks. For example, when the $T_{D-Q}$ constraint is 100 ps, the nominal $T_{D-Q}$ is 71.4 ps, a gap of 28.6 ps; this gap drops to 0.3 ps when the $T_{D-Q}$ constraint is 50 ps.
- 4) The nominal layout area increases as the $T_{D-Q}$ constraint is reduced. This increase is small: the layout area grows by 9.2% as the $T_{D-Q}$ constraint is reduced from 100 ps to 50 ps.
- 5) Finally, as the $T_{D-Q}$ constraint becomes stricter, the yield is reduced (from 100% to 99.3% at 50 ps in Table I).
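The gap and area figures quoted in observations 3) and 4) can be recomputed from Table I. The snippet below does only that arithmetic on the tabulated values; the dictionary layout is an illustrative choice.

```python
# Table I: T_D-Q constraint (ps) -> (nominal T_D-Q in ps, nominal area in um^2)
table_1 = {100: (71.4, 1.31), 90: (68.7, 1.34), 80: (60.2, 1.37),
           70: (57.4, 1.38), 60: (53.2, 1.41), 50: (49.7, 1.43)}

# Slack between each delay constraint and the achieved nominal delay.
gaps = {c: round(c - d, 1) for c, (d, _) in table_1.items()}

# Relative area growth when the constraint tightens from 100 ps to 50 ps.
area_increase = (table_1[50][1] - table_1[100][1]) / table_1[100][1] * 100

print(gaps)                     # {100: 28.6, 90: 21.3, 80: 19.8, 70: 12.6, 60: 6.8, 50: 0.3}
print(round(area_increase, 1))  # 9.2
```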

## B. General-Purpose Low Leakage Power Corner Design

A general-purpose low leakage power TG-MSFF is also designed, and the results are summarized in Table II. With this design strategy, the achieved yield is 98.2%. Relaxing the leakage power constraint to $1.35\,\mu$W results in a 100% yield. Therefore, the proposed statistical design framework provides the flip-flop designer with different design choices to achieve the target yield.

Fig. 6 shows the variation of the Monte-Carlo yield in the vicinity of the obtained low leakage power design solution.

TABLE I GENERAL-PURPOSE HIGH PERFORMANCE CORNER DESIGN OPTIMIZATION FOR VARYING $T_{D-Q}$

| $T_{D-Q}$ constraint (ps) | Nominal $T_{D-Q}$ (ps) | Nominal $P_{dynamic}$ ($\mu$W) | Nominal $P_{leakage}$ ($\mu$W) | Nominal Area ($\mu m^2$) | Overall Yield (%) |
|---------------------------|------------------------|----------------------------------|----------------------------------|--------------------------|-------------------|
| 100                       | 71.4                   | 33.4                             | 3.6                              | 1.31                     | 100               |
| 90                        | 68.7                   | 37.3                             | 3.7                              | 1.34                     | 100               |
| 80                        | 60.2                   | 41.4                             | 4                                | 1.37                     | 100               |
| 70                        | 57.4                   | 42.8                             | 4.3                              | 1.38                     | 100               |
| 60                        | 53.2                   | 45.7                             | 4.5                              | 1.41                     | 100               |
| 50                        | 49.7                   | 47.1                             | 4.8                              | 1.43                     | 99.3              |

TABLE II GENERAL-PURPOSE LOW LEAKAGE POWER CORNER DESIGN OPTIMIZATION



Fig. 6. Yield obtained by Monte-Carlo simulations for the low leakage power TG-MSFF design. (a)  $W_{n_{I3}}$  is varied from 90 nm to 102 nm (around the optimal value of 96 nm). (b)  $W_{n_{T3}}$  is varied from 201 nm to 213 nm (around the optimal value of 207 nm). The Monte-Carlo yield is calculated when only W, L, and  $V_t$  variations are included and when all process variations are taken into account.

In each panel of Fig. 6, one design parameter is varied while all the other design parameters are kept constant. Specifically, the NMOS transistor width of inverter I3, $W_{n_{I3}}$, is varied from 90 nm to 102 nm (the optimal value from the proposed design solution is 96 nm), and the NMOS transistor width of transmission gate TX3, $W_{n_{T3}}$, is varied from 201 nm to 213 nm (optimal value 207 nm). The Monte-Carlo yield is calculated both when only W, L, and $V_t$ variations are considered and when all process variations are taken into account. When only W, L, and $V_t$ variations are considered, the Monte-Carlo yield degrades as the design point moves away from the obtained optimum. The same analysis was performed for all the TG-MSFF transistor widths and lengths (40 parameters), but it is not shown here due to space limitations.
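The one-parameter-at-a-time sweep behind Fig. 6 can be mimicked on a toy problem. The results in Fig. 6 come from circuit-level Monte-Carlo simulation; in this sketch a linearized constraint set $Ax \le C$ stands in for that pass/fail check, which is an assumption, and every number (band, tolerance width, shape parameters) is hypothetical.

```python
import random


def kumaraswamy_sample(rng, a, b):
    """Inverse-CDF draw from the Kumaraswamy DB-PDF on [0, 1)."""
    u = rng.random()
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def mc_yield(nominal, t, A, C, a, b, samples=20000, seed=1):
    """Fraction of sampled designs satisfying every row of A x <= C, with each
    variable drawn from a DB-PDF on [mu - t/2, mu + t/2]."""
    rng = random.Random(seed)  # same seed -> common random numbers across the sweep
    passed = 0
    for _ in range(samples):
        x = [m - tj / 2 + tj * kumaraswamy_sample(rng, a, b) for m, tj in zip(nominal, t)]
        if all(sum(r * xi for r, xi in zip(row, x)) <= c for row, c in zip(A, C)):
            passed += 1
    return passed / samples


def sweep(nominal, index, offsets, t, A, C, a, b):
    """Vary one nominal parameter while all the others stay fixed."""
    results = []
    for d in offsets:
        trial = list(nominal)
        trial[index] += d
        results.append((trial[index], mc_yield(trial, t, A, C, a, b)))
    return results


# Toy one-variable case: feasible band 0 <= x <= 1 written as A x <= C,
# uniform DB-PDF (a = b = 1) with tolerance width 1.2.
A, C = [[1.0], [-1.0]], [1.0, 0.0]
for value, y in sweep([0.5], 0, [-0.5, 0.0, 0.5], [1.2], A, C, 1, 1):
    print(round(value, 2), round(y, 3))
```

As in Fig. 6, the estimated yield peaks at the nominal point and falls off on both sides of it.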

When all process variations are considered, the Monte-Carlo yield likewise degrades as the design point moves away from the obtained optimum, except for the design parameters $W_{n_{I2}}$, $W_{n_{I3}}$, and $W_{n_{T2}}$. For example, the value of $W_{n_{I3}}$ at which the maximum yield occurs shifts from 96 nm to 98 nm when all process variations are included. This deviation affects only three of the 40 parameters, and its magnitude is about 2%. Therefore, the proposed framework remains valid when all process variations are taken into account.

#### V. CONCLUSION

In this paper, a statistical framework for designing submicrometer flip-flop circuits was proposed. The framework accounts for the process variations in the transistor dimensions and the threshold voltage fluctuations due to RDF. The widths and lengths of the transistors are chosen to satisfy the design constraints on the flip-flop delay, setup time, switching power, leakage power, and layout area. The proposed framework is flexible, time efficient, and scalable with technology, requires little mathematical infrastructure, and uses models and tools readily available in the industry. The TG-MSFF is selected as a case study and designed in the high performance moderate power corner and the low leakage power moderate performance corner. Finally, the proposed framework can be extended to include the impact of other process variation sources such as oxide variations and LER.

#### REFERENCES

- [1] H. Mostafa, M. Anis, and M. Elmasry, "Comparative analysis of timing yield improvement under process variations of flip-flops circuits," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, May 2009, pp. 133–138.
- [2] S. Zanella, A. Nardi, A. Neviani, M. Quarantelli, S. Saxena, and C. Guardiani, "Analysis of the impact of process variations on clock skew," *IEEE Trans. Semiconduct. Manuf.*, vol. 13, no. 4, pp. 401–407, Nov. 2000.
- [3] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in *Proc. 40th DAC*, 2003, pp. 338–342.
- [4] S. Borkar, T. Karnik, and V. De, "Design and reliability challenges in nanometer technologies," in *Proc. 41st DAC*, 2004, p. 75.
- [5] K. Bowman, S. Duvall, and J. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," *IEEE J. Solid-State Circuits*, vol. 37, no. 2, pp. 183–190, Feb. 2002.
- [6] H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, "Challenge: Variability characterization and modeling for 65-nm to 90-nm processes," in *Proc. IEEE CICC*, Sep. 2005, pp. 593–599.
- [7] A. Keshavarzi, G. Schrom, S. Tang, S. Ma, K. Bowman, S. Tyagi, K. Zhang, T. Linton, N. Hakim, S. Duvall, J. Brews, and V. De, "Measurements and modeling of intrinsic fluctuations in MOSFET threshold voltage," in *Proc. ISLPED*, 2005, pp. 26–29.
- [8] F. N. Najm, "On the need for statistical timing analysis," in *Proc. 42nd DAC*, 2005, pp. 764–765.
- [9] C. Cho, D. D. Kim, J. Kim, J. Plouchart, D. Lim, S. Cho, and R. Trzcinski, "Decomposition and analysis of process variability using constrained principal component analysis," *IEEE Trans. Semiconduct. Manuf.*, vol. 21, no. 1, pp. 55–62, Feb. 2008.
- [10] D. S. Boning, K. Balakrishnan, H. Cai, N. Drego, A. Farahanchi, K. M. Gettings, D. Lim, A. Somani, H. Taylor, D. Truque, and X. Xie, "Variation," *IEEE Trans. Semiconduct. Manuf.*, vol. 21, no. 1, pp. 63–71, Feb. 2008.
- [11] V. Wang, K. Agarwal, S. R. Nassif, K. J. Nowka, and D. Markovic, "A simplified design model for random process variability," *IEEE Trans. Semiconduct. Manuf.*, vol. 22, no. 1, pp. 12–21, Feb. 2009.
- [12] T. Pfingsten, D. J. L. Herrmann, and C. E. Rasmussen, "Model-based design analysis and yield optimization," *IEEE Trans. Semiconduct. Manuf.*, vol. 19, no. 4, pp. 475–486, Nov. 2006.
- [13] S. Sinha, Q. Su, L. Wen, F. Lee, C. Chiang, Y. Cheng, J. Lin, and Y. Harn, "A new flexible algorithm for random yield improvement," *IEEE Trans. Semiconduct. Manuf.*, vol. 21, no. 1, pp. 14–21, Feb. 2008.
- [14] S. H. Choi, B. C. Paul, and K. Roy, "Novel sizing algorithm for yield improvement under process variation in nanometer technology," in *Proc. 41st DAC*, 2004, pp. 454–459.
- [15] A. Agarwal, K. Chopra, and D. Blaauw, "Statistical timing based optimization using gate sizing," in *Proc. Conf. DATE*, 2005, pp. 400–405.
- [16] M. Hansson and A. Alvandpour, "Comparative analysis of process variation impact on flip-flop power-performance," in *Proc. ISCAS*, 2007, pp. 3744–3747.
- [17] T. S. Barnett, J. P. Bickford, and A. J. Weger, "Product yield prediction system and critical area database," *IEEE Trans. Semiconduct. Manuf.*, vol. 21, no. 3, pp. 337–341, Aug. 2008.
- [18] A. Ripp, M. Bühler, J. Koehl, J. Bickford, J. Hibbeler, U. Schlichtmann, R. Sommer, and M. Pronath, "DATE 2006 special session: DFM/DFY design for manufacturability and yield: Influence of process variations in digital, analog and mixed-signal circuit design," in *Proc. DATE Conf.*, 2006, pp. 387–392.
- [19] T. McConaghy and P. Drennan. (2008). Variation-aware custom IC design: Improving PVT and Monte Carlo analysis for design performance and parametric yield. *Solido Design Automation Application Note* [Online]. Available: http://www.solidodesign.com/files
- [20] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, "Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS," *IEEE Trans. Comput.-Aided Des.*, vol. 24, no. 12, pp. 1859–1879, Dec. 2005.

- [21] V. Gupta and M. Anis, "Variability-aware design of static random access memory bit-cell," Master's thesis, Dept. Electric. Comput. Eng., Univ. Waterloo, Waterloo, ON, Canada, 2008.
- [22] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits: A Design Perspective*, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2002.
- [23] M. H. Abu-Rahma and M. Anis, "A statistical design-oriented delay variation model accounting for within-die variations," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 27, no. 11, pp. 1983–1995, Nov. 2008.
- [24] Y. Taur and T. H. Ning, *Fundamentals of Modern VLSI Devices*. New York: Cambridge Univ. Press, 1998.
- [25] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high performance and low power systems," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- [26] W. Liu, *MOSFET Models for SPICE Simulation Including BSIM3v3 and BSIM4*. New York: Wiley, 2001.
- [27] H. Mostafa, M. Anis, and M. Elmasry, "A design-oriented soft error rate variation model accounting for both die-to-die and within-die variations in submicrometer CMOS SRAM cells," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 6, pp. 1298–1311, Jun. 2010.
- [28] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proc. IEEE*, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [29] A. Venkatraman, R. Garg, and S. P. Khatri, "A robust, fast pulsed flip-flop design," in *Proc. GLSVLSI*, 2008, pp. 119–122.
- [30] N. A. Doshi, S. B. Dhobale, and S. R. Kakade, "LFSR counter implementation in CMOS VLSI," in *Proc. World Academy Sci., Eng. Technol.*, 2008, pp. 169–173.
- [31] A. Srivastava and D. Sylvester, "Statistical optimization of leakage power considering process variations using dual-Vth and sizing," in *Proc. DAC*, 2004, pp. 773–778.
- [32] International Technology Roadmap for Semiconductors. (2009). [Online]. Available: http://public.itrs.net
- [33] A. Papoulis, *Probability, Random Variables and Stochastic Processes*, 3rd ed. New York: McGraw-Hill, 1991.
- [34] V. Gupta and M. Anis, "Statistical design of the 6T SRAM bit cell," *IEEE Trans. Circuits Syst. I*, vol. 57, no. 1, pp. 93–104, Jan. 2010.
- [35] J. Jaffari and M. Anis, "Variability aware device optimization under ion and leakage current constraints," in *Proc. IEEE Int. Symp. Low Power Electron. Des.*, Oct. 2006, pp. 119–122.
- [36] A. Seifi, K. Ponnambalam, and J. Vlach, "Maximization of manufacturing yield of systems with arbitrary distributions of component values," in *Proc. Annu. Oper. Res.*, 2000, pp. 373–383.
- [37] K. Ponnambalam, A. Seifi, and J. Vlach, "Probabilistic design of systems with general distributions of parameters," *Int. J. Circuit Theory Applicat.*, vol. 29, no. 6, pp. 527–536, 2001.
- [38] S. Director and P. Feldmann, "Optimization of parametric yield: A tutorial," in *Proc. IEEE CICC*, vol. 3, no. 1. May 1992, pp. 1–8.
- [39] P. Kumaraswamy, "A generalized probability density function for double-bounded random processes," *J. Hydrol.*, vol. 46, nos. 1–2, pp. 79–88, 1980.
- [40] T. F. Coleman and Y. Zhang, *Optimization Toolbox for Use with MATLAB*. Natick, MA: Math Works, 2005.
- [41] Berkeley Predictive Technology Model. (2008). Predictive Technology Models for 45 nm [Online]. Available: http://ptm.asu.edu
- [42] B. Voss and M. Glesner, "A low power sinusoidal clock," in *Proc. IEEE ISCAS*, vol. 4. May 2001, pp. 108–111.
- [43] J. Wojciechowski, J. Vlach, and L. Opalski, "Design for nonsymmetrical statistical distributions," *IEEE Trans. Circuits Syst. I*, vol. 44, no. 1, pp. 29–37, Jan. 1997.
- [44] M. Olivieri, G. Scotti, and A. Trifiletti, "A novel yield technique for digital CMOS circuits design by means of process parameters run-time estimation and body bias active control," *IEEE Trans. Very Large-Scale Integr.*, vol. 13, no. 5, pp. 630–638, May 2005.
- [45] J. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, 2002, pp. 310–311.
- [46] K. Yelamarthi and C. H. Chen, "Process variation-aware timing optimization for dynamic and mixed-static-dynamic CMOS logic," *IEEE Trans. Semiconduct. Manuf.*, vol. 22, no. 1, pp. 31–39, Feb. 2009.



Sayed Alireza Sadrossadat received the B.S. degree in computer engineering from the College of Engineering, University of Tehran, Tehran, Iran, in 2007, and the Master's degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2010. He is currently pursuing the Ph.D. degree in the Department of Electronics, Carleton University, Ottawa, ON.

His current research interests include probabilistic design, yield maximization, computer arithmetic, and neural network technology for circuits and systems design.



Hassan Mostafa (S'01) received the B.S. and M.S. (with honors) degrees in electronics from Cairo University, Cairo, Egypt, in 2001 and 2005, respectively. Currently, he is pursuing the Ph.D. degree in electronics from the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada.

He was involved in a project with IMEC, Leuven, Belgium, in 2000. This project included modeling and fabricating the ion-sensitive field-effect transistor. He has authored and co-authored over 20 papers in international journals and conferences. His current research interests include analog circuit design, mixed analog circuit design, low-power circuit, variation-tolerant design, soft error-tolerant design, time-based analog-to-digital converter, and statistical design methodologies.



**Mohab Anis** (S'98–M'03–SM'09) received the B.S. (with honors) degree in electronics and communication engineering from Cairo University, Cairo, Egypt, in 1997, and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1999 and 2003, respectively.

He is currently an Associate Professor and the Co-Director of the VLSI Research Group, Department of Electrical and Computer Engineering, University of Waterloo. He is the Co-Founder of the Spry Design Automation, Waterloo. He has authored and co-authored over 90 papers in international journals and conferences, and is the author of two books: *Multi-Threshold CMOS Digital Circuits-Managing Leakage Power* (Norwell, MA: Kluwer, 2003) and *Low-Power Design of Nanometer FPGAs: Architecture and EDA* (San Mateo, CA: Morgan Kaufmann, 2009). His current research interests include integrated circuit design and design automation for very large-scale integrated systems in the deep submicrometer regime.

Dr. Anis is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II. He is an Associate Editor of the *Journal of Circuits, Systems and Computers*, the ASP Journal of Low Power Electronics, and VLSI Design. He is a member of the program committees for several IEEE conferences. He received the Douglas R. Colton Medal for Research Excellence in recognition of excellence in research leading to new understanding and novel developments in microsystems in Canada in 2004, and the International Low-Power Design Contest Award in 2002.