# A Novel Low Area Overhead Direct Adaptive Body Bias (D-ABB) Circuit for Die-to-Die and Within-Die Variations Compensation

Hassan Mostafa, Student Member, IEEE, Mohab Anis, Senior Member, IEEE, and Mohamed Elmasry, Fellow, IEEE

Abstract: A direct adaptive body bias (D-ABB) circuit is proposed in this paper. The D-ABB is used to compensate for die-to-die (D2D) and within-die (WID) parameter variations and, accordingly, improves the circuit yield with respect to the speed, the dynamic power, and the leakage power. The D-ABB circuit consists of threshold voltage estimation circuits and direct control of the body bias performed by on-chip direct controller circuits. Circuit level simulation results are presented for a circuit block case study, extracted from a real microprocessor critical path and based on an industrial hardware-calibrated 65-nm CMOS technology transistor model. These results show that the proposed D-ABB reduces the standard deviations of the frequency, the dynamic power, and the leakage power by factors of $5.5\times$, $6.4\times$, and $4.5\times$, respectively, when both D2D and WID variations are considered. In addition, in the presented case study, initial total yields of 16.8% and 13% are improved to 100% and 91.4%, respectively. The proposed D-ABB circuit exhibits lower area overhead compared to the other ABB circuits reported in the literature.

Index Terms: Adaptive body bias (ABB), die-to-die (D2D), microprocessors, process variations compensation, within-die (WID), yield improvement.

# I. INTRODUCTION

With continual CMOS technology scaling, power density has become a significant concern in microprocessor design due to the increasing chip density and clock frequencies [1], [2]. Power constraints of a microprocessor, which are dictated by the overall system thermal design, impact the system cost and the maximum operating frequency, especially in low power mobile processors.
Thus, the goal of a microprocessor designer is not only to achieve the maximum operating frequency, but also to satisfy the power constraints. From a circuit perspective, the main power dissipation components are dynamic power and leakage power [3]. Dynamic power is due to the CMOS transistors switching activity, and the leakage power is due to the parasitic currents of the CMOS transistors when operating in the standby mode. According to [2], the leakage power becomes comparable to the dynamic power in current CMOS technologies, and is expected to surpass the dynamic power in future CMOS technologies [4].

Manuscript received February 01, 2010; revised May 02, 2010; accepted July 04, 2010. Date of publication August 23, 2010; date of current version August 10, 2011. H. Mostafa and M. Elmasry are with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L3G1, Canada (e-mail: hmostafa@uwaterloo.ca; elmasry@uwaterloo.ca). M. Anis is with the Department of Electronics Engineering, American University in Cairo, Cairo 11511, Egypt (e-mail: manis@vlsi.uwaterloo.ca). Digital Object Identifier 10.1109/TVLSI.2010.2060503

Moreover, as CMOS technologies continue to scale towards the nanometer regime, the device parameters, such as threshold voltage, channel length, oxide thickness, and mobility, exhibit large statistical process variations [5]–[7]. These process variations are expected to worsen in future technologies, due to difficulties with printing nanometer scale geometries in standard lithography. Therefore, these variations are considered the primary design challenge as CMOS technology scales [2], [5], [6].

Process variations are classified as die-to-die (D2D) variations and within-die (WID) variations. In D2D variations, all the devices on the same die are assumed to have the same parameter values. However, the devices on the same die are assumed to behave differently, in WID variations [5].
Although D2D variations are originally considered the main source of process variations, WID variations have become the major design challenge as technology scales [6]. In addition, process variations result in a spread of the microprocessor operating frequencies and the associated leakage power. Therefore, some of the fabricated microprocessor chips are discarded because they are either too slow (the frequency constraint is not met) or highly leaky (the leakage power constraint is not met). There is a tradeoff between the microprocessor speed and its leakage power consumption, which means that slow circuits are less leaky, and highly leaky circuits are fast. Accordingly, process variations result in a parametric yield loss and an increased overall system cost. Adaptive body bias (ABB) allows the tuning of the transistor threshold voltage $V_t$ by controlling the transistor body-to-source voltage $V_{BS}$ . A forward body bias (FBB) (i.e., $V_{BS} > 0$ ) reduces $V_t$ , increasing the device speed at the expense of increased leakage power. Alternatively, a reverse body bias (RBB) (i.e., $V_{BS} < 0$ ) increases $V_t$ , reducing the leakage power but slowing the device. Therefore, the impact of process variations is mitigated by speeding up slow and less leaky devices or slowing down devices that are fast and highly leaky [8], [9]. Practically, the implementation of the ABB is desirable to bias each device in a design independently, to mitigate D2D and WID variations. However, supplying so many separate voltages inside a die results in a large area overhead. On the other hand, using the same body bias for all devices on the same die limits their capability to compensate for WID variations. Thus, the granularity level of the ABB scheme is a tradeoff between the target yield and the associated area overhead. Recently, researchers have attempted to use ABB to maximize the system clock frequency or minimize the leakage power. 
In [10], RBB is used to reduce the leakage power during different operating conditions. FBB is used in [11] for a 1 GHz communication router in 150-nm CMOS technology to maximize the clock frequency. The objective of the research work in [12]–[14] is to design a body bias generator circuit to compensate for D2D variations. Several optimization algorithms are presented in [15]–[17] aiming at finding the optimal body bias voltages to minimize the leakage power. In [8], the optimal granularity level of the ABB scheme is discussed mathematically to achieve near-optimal performance and power characteristics. In [1], ABB is used to compensate for D2D variations by maximizing the die frequency subject to a power constraint. Finally, ABB is used in [3] by estimating the process parameters and using a digital controller to control the body bias. In all the aforementioned research, the ABB circuit area overhead limits its capability to mitigate the WID variations by using the ABB circuit for each circuit block. For example, most of the previously published ABB circuits have a large area overhead because they consist of an analog-to-digital converter (ADC) and/or a digital-to-analog converter (DAC) in conjunction with a digital controller to achieve the required body bias control (i.e., these ABB circuits convert the estimated threshold voltages to digital by using the ADC; then, the digital controller finds the optimal body bias voltages which are converted back to analog by using the DAC). Therefore, the ABB scheme area overhead should be reduced to allow WID variations compensation, which is performed by the direct implementation of the body bias generation circuits in the proposed D-ABB circuit (i.e., no ADC or DAC circuits are required). In this paper, a novel direct ABB (D-ABB) circuit is proposed. It is based on $V_t$ estimation circuits and direct adaptive control of the body bias, achieved by an on-chip direct controller circuit. 
This direct controller circuit generates the appropriate body bias voltage based on the $V_t$ fluctuations by directly implementing the relationship between $V_t$ and $V_{BS}$. For example, if the value of $V_t$ increases due to process variations, the voltage $V_{BS}$ should be increased by adopting FBB to reduce $V_t$ and, therefore, compensate for the process variation impact. Similarly, if the value of $V_t$ decreases due to process variations, the voltage $V_{BS}$ should be decreased by adopting RBB to increase $V_t$ and, therefore, compensate for the process variation impact.

Therefore, the goal of the proposed D-ABB is to reduce the process variations impact by considering D2D and WID variations. This, in turn, improves the parametric yield for the clock frequency, dynamic power, and leakage power. This goal is achieved by using a direct controller circuit which exhibits low area overhead compared to other ABB circuits [1], [3]. It should be mentioned that the goal of the proposed D-ABB is different from that in [1], [3], and [18]. For example, the goal of the work in [1] and [18] is to minimize the leakage power for a given target frequency at high temperature, which is the worst case condition for both leakage power and frequency. Also, in [3], the goal is to maximize the circuit parametric yield for a given set of constraints for the dynamic power and the leakage power. This is achieved by allowing more RBB and relaxing the frequency constraint.

The rest of this paper is organized as follows. In Section II, the proposed D-ABB circuit is analyzed. Simulation results are given in Section III. In Section IV, the proposed D-ABB is compared with the previous ABB circuits. Finally, some conclusions are drawn in Section V.

# II. PROPOSED D-ABB CIRCUIT

In the proposed D-ABB circuit, the effect of process variations on $V_t$ is compensated by estimating the actual values of $V_t$, which are impacted by process variations, by using estimation circuits placed close to the critical path. Then, the direct controller generates the appropriate body bias voltage, $V_{BS}$, to mitigate the process variations impact. The direct controller is a direct implementation of the relationship between $V_t$ and $V_{BS}$. In [8] and [19], the relationship between $V_t$ and $V_{BS}$ for an nMOS transistor is given by

$$V_t = V_{to} + \Delta V_t|_{BB}$$

$$\Delta V_t|_{BB} = \gamma \left(\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F}\right)$$ (1)

where $V_{to}$ is the nMOS transistor threshold voltage at zero body bias (i.e., when $V_{BS}=0$), $\Delta V_t|_{BB}$ is the body bias effect on $V_t$, $\gamma$ is the body effect coefficient, and $\phi_F$ is the Fermi potential with respect to the mid-gap in the substrate [19]. If $V_{to}$ is increased by $\Delta V_t|_{PV}$ due to process variations, the body bias voltage $V_{BS}$ compensates for this change by producing a threshold voltage change $\Delta V_t|_{BB}$ that cancels out the process variations change $\Delta V_t|_{PV}$ (i.e., $\Delta V_t|_{BB}=-\Delta V_t|_{PV}$). Substituting this condition into (1) and solving for $V_{BS}$, the value of $V_{BS}$ that compensates for the process variations change is given by

$$V_{BS} = \frac{2\sqrt{2\phi_F}}{\gamma} \times \Delta V_t|_{PV} - \frac{1}{\gamma^2} (\Delta V_t|_{PV})^2$$ (2)

where $\Delta V_t|_{PV}$ is the difference between the estimated threshold voltage $V_{te}$, which is impacted by the process variations, and the nominal threshold voltage $V_{to}$. Similarly, for pMOS transistors, the same relationship in (2) is used by replacing $V_{BS}$ by $V_{SB}$. Typically, the sources of the nMOS transistors are connected to the ground (zero voltage), and the sources of the pMOS transistors are connected to the supply voltage $V_{DD}$.
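As a numerical check of (1) and (2), the following minimal sketch computes the compensating $V_{BS}$ for a given process-induced shift and then substitutes it back into (1) to confirm that the two shifts cancel. The function names are illustrative assumptions, the parameter values are the nMOS entries of Table I, and the three test shifts are arbitrary example inputs rather than results from the paper.

```python
import math

def body_bias_shift(v_bs, gamma, phi_f):
    """Eq. (1): threshold shift (V) produced by a body-to-source voltage v_bs (V)."""
    return gamma * (math.sqrt(2.0 * phi_f - v_bs) - math.sqrt(2.0 * phi_f))

def compensating_vbs(delta_vt_pv, gamma, phi_f):
    """Eq. (2): body-to-source voltage that cancels a process-induced shift."""
    root = math.sqrt(2.0 * phi_f)
    # Eq. (2) comes from squaring sqrt(2*phi_f - V_BS) = sqrt(2*phi_f) - delta/gamma,
    # so the right-hand side must be non-negative for the result to be meaningful.
    if delta_vt_pv > gamma * root:
        raise ValueError("shift too large to be cancelled through eq. (2)")
    return (2.0 * root / gamma) * delta_vt_pv - (delta_vt_pv / gamma) ** 2

# nMOS parameters as listed in Table I (T = 120 C).
GAMMA_N, PHI_FN = 0.296, 0.467

# Example shifts in volts (hypothetical inputs chosen for illustration).
for delta in (-0.03, 0.0, 0.03):
    v_bs = compensating_vbs(delta, GAMMA_N, PHI_FN)
    residual = delta + body_bias_shift(v_bs, GAMMA_N, PHI_FN)
    print(f"dVt_PV = {delta:+.3f} V -> V_BS = {v_bs:+.4f} V, residual = {residual:.1e} V")
```

A positive shift yields a positive $V_{BS}$ (FBB) and a negative shift yields a negative $V_{BS}$ (RBB), in line with the discussion above.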
Therefore, the body bias voltages of the nMOS transistors $V_{Bn}$ and the pMOS transistors $V_{Bp}$, which result in process variations compensation, are given by

$$V_{Bn} = \frac{2\sqrt{2\phi_{F_n}}}{\gamma_n} [V_{\text{tne}} - V_{\text{tno}}] - \frac{1}{\gamma_n^2} [V_{\text{tne}} - V_{\text{tno}}]^2$$ (3)

$$V_{Bp} = V_{DD} - \frac{2\sqrt{2\phi_{F_p}}}{\gamma_p} [|V_{\text{tpe}}| - |V_{\text{tpo}}|] + \frac{1}{\gamma_p^2} [|V_{\text{tpe}}| - |V_{\text{tpo}}|]^2.$$ (4)

The proposed D-ABB circuit is depicted in Fig. 1(a) and (b) for the bias voltages, $V_{Bn}$ and $V_{Bp}$, respectively. A set of sensing circuits estimates the actual values of the threshold voltages, which are impacted by the process variations. The sensing circuit for the nMOS transistor, shown in Fig. 1(a), outputs an estimate for the nMOS threshold voltage, denoted by $V_{\rm tne}$. Meanwhile, the sensing circuit for the pMOS transistor, shown in Fig. 1(b), outputs an estimate for the pMOS threshold voltage, denoted by $V_{\rm REF} - |V_{\rm tpe}|$, where $V_{\rm REF}$ is a dc reference voltage. The estimated variables (i.e., $V_{\rm tne}$ and $V_{\rm REF} - |V_{\rm tpe}|$) are applied to a set of amplifiers and squaring circuits to produce the required bias voltages, which are capable of reducing the impact of the process variations.

Fig. 1. D-ABB circuit for (a) nMOS transistors body bias control $V_{Bn}$ and (b) pMOS transistors body bias control $V_{Bp}$.

In Fig. 1(a), the voltage source $V_{\rm tno}$ is a dc bias voltage representing the nMOS transistor nominal threshold voltage value at zero body bias. The dc supply voltages of the amplifiers are set to $V_{B+}$ and $V_{B-}$ to limit the body bias voltage $V_{Bn}$. The selection of $V_{B+}$ and $V_{B-}$ values depends on several physical limits, which are explained in Section III-B. According to Fig.
1(a) and recalling (3), the gains $K_{1n}$, $K_{2n}$, and $K_{3n}$ are given by

$$K_{1n} \times K_{3n} = \frac{2\sqrt{2\phi_{F_n}}}{\gamma_n}, \qquad K_{2n} \times K_{3n} = \frac{1}{\gamma_n^2}.$$ (5)

Accordingly, the amplifier gains $K_{1n}$ and $K_{3n}$ and the squaring circuit gain $K_{2n}$ can be selected arbitrarily, provided that (5) is satisfied.

Similarly, the voltage $(V_{\rm REF} - |V_{\rm tpo}|)$, shown in Fig. 1(b), is a dc bias voltage representing the difference between the reference voltage $V_{\rm REF}$ and the pMOS transistor nominal threshold voltage value at zero body bias. The dc supply voltages of the amplifiers are set to $(V_{DD} + V_{B+})$ and $(V_{DD} + V_{B-})$ to limit the body bias voltage $V_{Bp}$ and to implement (4). According to Fig. 1(b) and recalling (4), the gains $K_{1p}$, $K_{2p}$, and $K_{3p}$ are given by

$$K_{1p} \times K_{3p} = \frac{2\sqrt{2\phi_{F_p}}}{\gamma_p}, \qquad K_{2p} \times K_{3p} = -\frac{1}{\gamma_p^2}.$$ (6)

The implementations of the sensing circuits, the amplifiers, and the squaring circuits are given in the following discussions.

## A. Sensing Circuits

Sensing circuits are used to estimate the actual values of the threshold voltages of the nMOS and pMOS transistors, which are impacted by process variations. Figs. 2 and 3 illustrate the sensing circuit implementations for the nMOS and pMOS transistors, respectively [3].

Fig. 2. nMOS transistor $V_{tn}$ sensing circuit [3].

1) nMOS Threshold Voltage Sensing Circuit: In the nMOS threshold voltage sensing circuit, displayed in Fig. 2, the pMOS transistor is sized with minimum area and acts as a current source. The nMOS transistor is a diode connected transistor, and $V_{\rm REF}$ is a reference voltage.
By using the $\alpha$-power law model, introduced in [20], and equating the dc currents of the nMOS and pMOS transistors, the output voltage of this circuit, $V_{\rm outn}$, is expressed as

$$V_{\text{outn}} = V_{tn} + r_n \times [V_{\text{REF}} - |V_{tp}|], \qquad r_n = \left(\frac{k'_p \frac{W}{L}\big|_p}{k'_n \frac{W}{L}\big|_n}\right)^{1/\alpha}$$ (7)

where $V_{tn}$ and $|V_{tp}|$ are the threshold voltages, $k'_n$ and $k'_p$ are the technological parameters, and $(W/L)|_n$ and $(W/L)|_p$ are the sizes of the nMOS and the pMOS transistors, respectively. By sizing this circuit such that $(W/L)|_n \gg (W/L)|_p$, the ratio $r_n$ becomes small and (7) is rewritten as

$$V_{\text{outn}} \approx V_{tn}.$$ (8)

Therefore, the output voltage of the nMOS threshold voltage sensing circuit, shown in Fig. 2, represents the actual nMOS transistor threshold voltage, which is impacted by process variations, and is denoted by $V_{\rm tne}$.

2) pMOS Threshold Voltage Sensing Circuit: The pMOS threshold voltage sensing circuit is depicted in Fig. 3. The nMOS transistor is sized with minimum area and acts as a current source, whereas the pMOS transistor is a diode connected transistor. Similarly, by sizing this circuit such that $(W/L)|_p\gg (W/L)|_n$, the output voltage of this circuit, $V_{\rm outp}$, is given by

$$V_{\text{outp}} \approx V_{\text{REF}} - |V_{tp}|.$$ (9)

This output voltage is denoted by $V_{\rm REF} - |V_{\rm tpe}|$ and represents the actual pMOS transistor threshold voltage, which is impacted by process variations.

Fig. 3. pMOS transistor $|V_{tp}|$ sensing circuit [3].

Fig. 4. Output of the nMOS threshold voltage sensing circuit shown in Fig. 2.

Fig. 5. Output of the pMOS threshold voltage sensing circuit shown in Fig. 3.

Fig. 4 portrays $V_{\rm tne}$, the output voltage of the nMOS sensing circuit, versus $V_{\rm tno}$ and Fig. 5 displays the output voltage of the pMOS sensing circuit ($V_{\rm REF}-|V_{\rm tpe}|$) versus ($V_{\rm REF}-|V_{\rm tpo}|$).
These figures are obtained from SPICE simulations by sweeping the threshold voltage parameters of the industrial 65-nm CMOS technology transistor model and using $V_{\rm REF}=0.5$ V, $r_n=r_p\approx 0.075$. The good agreement between the estimated threshold voltage values and their actual values shows that the threshold voltage sensing circuits are effective when used in nanometer technologies. The maximum error between the estimated threshold voltage values and their corresponding actual values is 4.5%, and the average error is 2.7%.

## B. Amplifier Circuit

In the proposed D-ABB circuit in Fig. 1, several amplifiers with various gains and a large output voltage swing $(V_{B+} - V_{B-})$ are required. Therefore, the two-stage configuration amplifier circuit, shown in Fig. 6, is utilized. The advantage of this configuration is that it isolates the gain and the output voltage swing requirements. The first stage is configured in a differential pair topology to provide the high gain requirements. Typically, the second stage is configured as a common source stage to allow maximum output voltage swings [21].

Fig. 6. Proposed two-stage amplifier circuit.

Long channel transistor operation is assumed by making all the amplifier transistor lengths equal to 130 nm, and therefore, all transistors are assumed to be in the pinch-off saturation region. The transistor pairs (M1 and M2), (M3 and M4), (M6 and M7), and (M8 and M9) are assumed to be matched. According to [22], the mismatch between the threshold voltages of these transistors is inversely proportional to the square root of the channel area (WL). Thus, by designing all the amplifier and squaring circuit transistors with widths larger than 195 nm (the minimum width for an STMicroelectronics 65-nm transistor is 120 nm) and lengths of 130 nm (the minimum length for an STMicroelectronics 65-nm transistor is 60 nm), this mismatch effect is minimized.
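To give a feel for the sizing argument above, the following minimal sketch evaluates how much the threshold voltage mismatch shrinks when moving from a minimum-size device (120 nm by 60 nm) to the smallest device allowed in the amplifier and squaring circuits (195 nm by 130 nm). It assumes only the inverse square root dependence on $WL$ cited from [22]; the function name is an illustrative assumption, and the absolute mismatch coefficient of the technology is not needed for the ratio.

```python
import math

def mismatch_sigma_ratio(w_nm, l_nm, w_ref_nm, l_ref_nm):
    """Return sigma(W, L) / sigma(W_ref, L_ref), assuming sigma ~ 1 / sqrt(W * L)."""
    return math.sqrt((w_ref_nm * l_ref_nm) / (w_nm * l_nm))

# Smallest device used in the D-ABB blocks versus the technology minimum.
ratio = mismatch_sigma_ratio(195.0, 130.0, 120.0, 60.0)
print(f"mismatch sigma relative to a minimum-size device: {ratio:.2f}")
```

Under this assumption, the mismatch standard deviation is a little over half that of a minimum-size device, and wider transistors reduce it further.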
Correspondingly, the amplifier gain K is written as

$$K = \frac{g_{m1}}{g_{d1} + g_{d3}} \times \frac{g_{m8}}{g_{d6} + g_{d8}}$$ (10)

where the first term represents the differential pair gain, the second term represents the second stage gain, $g_m$ is the transistor transconductance, and $g_d$ is the transistor drain-to-source output conductance. The values of $g_m$ and $g_d$ are designed to achieve the required gain, which is achieved by the first stage, and the output voltage swing, which is achieved by the second stage, in each amplifier. It should be noted that the amplifier shown in Fig. 6 is a non-inverting amplifier. However, this amplifier is configured as an inverting amplifier by changing the input terminals (i.e., $V_{\rm in+}$ and $V_{\rm in-}$ become the inputs to transistors M2 and M1, respectively).

## C. Squaring Circuit

One of the essential building blocks in the D-ABB circuit, shown in Fig. 1, is the squaring circuit. Several squaring circuits are reported in the literature [23]–[25]. Fig. 7 depicts the squaring circuit used in the D-ABB circuit. The proposed squaring circuit consists of a differential voltage generator circuit and a basic common source differential pair squaring circuit. The differential voltage generator circuit is utilized to adjust the squaring circuit output voltage dc-offset and the squaring circuit gain.

Fig. 7. Proposed squaring circuit which consists of the differential voltage generator and the basic squaring circuit.

Assuming long channel transistor operation, all transistors are operating in the pinch-off saturation region, and the transistor pairs (Md1 and Md2), (Md6, Md7, Md10, and Md11), (Md3 and Md8), and (Md4 and Md12) are matched. The small signal current flowing through Md1 is $g_{m1}V_{\rm in}/2$, which is equal to the small signal current flowing through Md8, which is $g_{m6}V_{o1}/2$, due to the current mirror action between these transistors. Therefore, $V_{o1} = (g_{m1}/g_{m6})V_{in}$.
Similarly, due to the current mirror action between transistors Md4 and Md12, the voltage $V_{o2}$ is $-(g_{m1}/g_{m10})V_{in}$. Since transistors Md6, Md7, Md10, and Md11 are matched, the two output voltages $V_{o1}$ and $V_{o2}$ are given by

$$V_{o1} = -V_{o2} = \left(\frac{g_{m1}}{g_{m6}}\right) V_{\text{in}}.$$ (11)

These two output voltages $V_{o1}$ and $V_{o2}$ have an equal common mode voltage $V_{\text{REF}_{SQ}}$. When these two output voltages are applied to the basic squaring circuit, the resultant output voltage, $V_{\text{out}_{SQ}}$, is given by [24]

$$V_{\text{out}_{SQ}} = \frac{(V_{\text{REF}_{SQ}} - |V_{tp}|)^2 - (V_{B-} + |V_{tp}|)^2}{2(V_{\text{REF}_{SQ}} - V_{B-} - 2|V_{tp}|)} + \frac{\left(\frac{g_{m1}}{g_{m6}}\right)^2 \times V_{\text{in}}^2}{2(V_{\text{REF}_{SQ}} - V_{B-} - 2|V_{tp}|)}$$ (12)

where the transistor pairs (Ms1 and Ms2) and (Ms4 and Ms5) are matched. It is evident that the squaring circuit output voltage dc-offset can be adjusted through $V_{\mathrm{REF}_{SQ}}$, whereas the squaring circuit gain can be adjusted through the transconductance ratio $(g_{m1}/g_{m6})$ and $V_{\mathrm{REF}_{SQ}}$. Fig. 8 displays the simulation results for the squaring circuit in Fig. 7, where $V_{\mathrm{in}}$ is varied from -0.15 to 0.15 V and the squaring circuit gain is 10.0.

Fig. 8. Simulated squaring circuit output when $V_{\rm in}$ is varied from -0.15 to 0.15 V and the gain is 10.0.

## D. Effect of Process and Temperature Variations on the Proposed D-ABB Circuit

A 5000-point Monte Carlo analysis, including the mismatch between transistors, is performed. The statistical models of an industrial hardware-calibrated 65-nm CMOS technology transistor are used to investigate the effect of process variations on the proposed sensing, amplifier, and squaring circuits. In [26] and [27], it has been demonstrated that the utilization of statistical transistor models is capable of accounting for both D2D and WID variations. A very good fitting with the measured data is reported in [26] and [27], not only for the mean and standard deviation values, but also for the correlation between nMOS and pMOS transistors data. These statistical models are available in the design kits provided by STMicroelectronics. The process variations (D2D and WID variations) are included in the transistor design kit and declared by STMicroelectronics to be silicon verified. In this design kit, several process parameters are treated as variants, such as the threshold voltage, mobility, drain-to-source resistance, drain-induced-barrier-lowering (DIBL) coefficient, all junction capacitances, and doping concentration. For example, the threshold voltage $V_t$ is varied within the $\pm 3\sigma$ design space with standard deviation to mean ratio $(\sigma/\mu)_{Vt} \approx 12\%$. Also, in this design kit, the WID variations (mismatch effect) are modeled as inversely proportional to the transistor area (WL) [22]. These statistical models are used in all the following Monte Carlo simulations.

Simulation results reveal that the maximum ratio between the standard deviation of the sensing, amplifier, and squaring circuits parameters (i.e., gain, output voltage swing, and dc offset) and their mean values is less than 1.3%, 0.6%, and 0.9%, respectively. Therefore, the newly developed D-ABB circuit is insensitive to process variations. In addition, the same sensing, amplifier, and squaring circuits are found to be insensitive to the temperature variations over the $-30\,^{\circ}\text{C}$ to $120\,^{\circ}\text{C}$ range. The maximum change in the sensing, amplifier, and squaring circuits parameters, relative to their nominal values, is less than 0.3%, 0.8%, and 0.7%, respectively, over the specified temperature range. According to these simulations, the proposed D-ABB circuit is insensitive to process and temperature variations.

# III. SIMULATION RESULTS AND DISCUSSIONS

## A. Test Circuit Description

The newly developed D-ABB circuit is applied to a circuit block, extracted from a real microprocessor critical path, to verify its effectiveness in process variations compensation. This circuit block consists of 15 CMOS gates including CMOS inverter gates, NAND gates, NOR gates, and transmission gates, similar to the test circuits used in [1] and [3]. Fig. 9 portrays the test circuit, which consists of 30 critical paths, a global D-ABB circuit, and 30 local D-ABB circuits. The global D-ABB provides the same bias voltages to all the die critical paths. Therefore, its effectiveness in reducing WID variations is limited. The distributed local D-ABB circuits supply different bias voltages to each critical path, achieving better results in reducing WID variations, at the expense of higher area overhead than that of the global D-ABB circuit.

Fig. 9. Test circuit used in the simulation setup.

This circuit block is selected to model the effect of the proposed D-ABB on the yield improvement of a real microprocessor design [3]. The figures of merit considered in this experiment are the oscillation frequency ($F_{\rm clk}$), the dynamic power ($P_{\rm dyn}$) of the circuit block when configured as a ring oscillator, and the leakage power ($P_{\rm leak}$) of the circuit block when operating in static conditions [3].

The circuit block and the D-ABB circuits are implemented by using an industrial hardware-calibrated 65-nm CMOS technology. This transistor model has statistical models accounting for D2D and WID variations, provided by the foundry design kit, and declared by the foundry as silicon verified, as explained in Section II-D. Table I shows the nMOS and pMOS transistor parameters, extracted from the transistor model. The supply voltage, $V_{\rm DD}$, equals 1.0 V and circuit level simulations are conducted.

The effectiveness of the proposed D-ABB circuit is demonstrated by showing its ability to reduce the D2D and WID variations.
The impact of WID variations depends on the number of critical paths per die. In [1] and [7], it is proven through statistical simulations that as the number of critical paths per die increases, the WID frequency variations cause the frequency mean and standard deviation to decrease. These results are confirmed through test chip measurements in [1] and [7]. Moreover, it is reported in [1] and [7] that when the number of critical paths per die exceeds 14, there is no significant change in the frequency distribution. Therefore, the test circuit used in this paper has 30 critical paths per die, which is sufficiently accurate for obtaining frequency distributions of real microprocessors which contain hundreds of critical paths.

TABLE I
65-nm Technology Information at $T=120\,^{\circ}\mathrm{C}$

| Parameter | nMOS | pMOS |
|----------------------|-------|--------|
| $V_{to}$ (V) | 0.352 | -0.204 |
| $\phi_F$ (V) | 0.467 | 0.439 |
| $\gamma$ (V$^{1/2}$) | 0.296 | 0.174 |

## B. D-ABB Circuit Design

Since the threshold voltage impacts $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$, the proposed D-ABB circuit consists of threshold voltage sensing circuits (see Figs. 2 and 3) that estimate the values of $V_{tn}$ and $|V_{tp}|$. Then, these estimations are applied to the body bias direct controller to generate the body bias voltages $V_{Bn}$ and $V_{Bp}$. Accordingly, RBB is applied to reduce power dissipation and frequency, while FBB is applied to increase frequency and power dissipation. The junction leakage current and the breakdown considerations determine the RBB voltage bound, while the FBB is limited by the subthreshold leakage current and the forward biasing of the drain-bulk junction. According to [28] and [29], the upper limit of the FBB voltage for latch-up free operation, in 65-nm CMOS technology with $V_{\rm DD}$ ranging from 0.9 to 1.2 V, is 0.6 V. Also, SPICE simulations are conducted by sweeping the FBB voltage for nMOS and pMOS transistors.
Simulation results show that the upper limits of the FBB voltage to prevent latch-up triggering for nMOS and pMOS transistors are 0.62 and 0.59 V, respectively. Therefore, the maximum FBB voltage used in the D-ABB is set to 0.5 V to ensure latch-up free operation in case of fluctuations of the FBB voltage around 0.5 V. Accordingly, the FBB and the RBB maximum voltages (i.e., $V_{B+}$ and $V_{B-}$) are set to $\pm 0.5$ V [1] (i.e., the body bias voltage changes around its normal value by $\pm 0.5$ V).

The circuit is designed in the Cadence IC environment to calculate $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ by using circuit level simulations. By using the technology information in Table I, the D-ABB circuit is designed with $V_{Bn}$ generation circuit parameters $K_{1n}$, $K_{2n}$, and $K_{3n}$ equal to 5.7, 10.0, and 1.14, respectively, and with $V_{Bp}$ generation circuit parameters $K_{1p}$, $K_{2p}$, and $K_{3p}$ equal to $-10.8$, 33.0, and $-1.0$, respectively. All the above parameters are for $T=120\,^{\circ}\mathrm{C}$. It should be mentioned that the technology parameter $\phi_F$ is linearly proportional to the absolute temperature $T$ (in K); accordingly, the D-ABB design is performed at the worst case temperature $T=120\,^{\circ}\mathrm{C}$. The performance of the D-ABB is examined for temperature values lower than $T=120\,^{\circ}\mathrm{C}$ in Section III-E.

## C. Simulation Setup

First, the global D-ABB circuit is enabled and all the local D-ABB circuits are disabled. The global D-ABB sensing circuit is placed close to any critical path (critical path number 30 is selected in this test circuit). Based on the threshold voltage variations of this critical path, the global D-ABB provides the body bias voltages to all the die critical paths. Since the body bias voltages are determined based on the threshold voltage calculations of a single critical path, this global D-ABB circuit does not reduce the WID variations effectively.
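The gain values quoted in Section III-B can be cross-checked against the design equations (5) and (6) using the Table I parameters. The following minimal sketch does this; the dictionary layout and function names are illustrative assumptions, while the numbers are the ones stated in Table I and in Section III-B.

```python
import math

# Table I parameters at T = 120 C: (phi_F in volts, gamma as listed).
TABLE_I = {"nMOS": (0.467, 0.296), "pMOS": (0.439, 0.174)}

# Gains quoted in Section III-B: (K1, K2, K3).
GAINS = {"nMOS": (5.7, 10.0, 1.14), "pMOS": (-10.8, 33.0, -1.0)}

def target_products(phi_f, gamma, is_pmos):
    """Right-hand sides of (5) for nMOS or (6) for pMOS."""
    linear = 2.0 * math.sqrt(2.0 * phi_f) / gamma
    square = 1.0 / gamma ** 2
    # Eq. (6) carries a minus sign on the squared-term product.
    return linear, (-square if is_pmos else square)

def relative_errors(device):
    """Relative deviation of K1*K3 and K2*K3 from their targets."""
    phi_f, gamma = TABLE_I[device]
    k1, k2, k3 = GAINS[device]
    lin, sq = target_products(phi_f, gamma, device == "pMOS")
    return abs(k1 * k3 - lin) / abs(lin), abs(k2 * k3 - sq) / abs(sq)

for device in TABLE_I:
    e_lin, e_sq = relative_errors(device)
    print(f"{device}: K1*K3 off by {e_lin:.2%}, K2*K3 off by {e_sq:.2%}")
```

With these inputs, all four gain products agree with the right-hand sides of (5) and (6) to within about 1%, which is consistent with the gains having been rounded from the Table I values.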
Following that, the local D-ABB circuits are enabled and the global D-ABB is disabled. Each local D-ABB sensing circuit is placed close to its corresponding critical path, as shown in Fig. 9, and supplies the appropriate body bias voltages to this critical path. Therefore, the use of the local D-ABB is very efficient in accounting for WID variations, as explained in Section III-D. The granularity level of the global D-ABB circuit is the whole die, while the granularity level of the local D-ABB circuits is the critical path. Therefore, the granularity level of the local D-ABB is smaller at the expense of more area overhead.

The Monte Carlo analysis generates 5000 different dies. In each Monte Carlo statistical run (which corresponds to a particular die), the die frequency is calculated as the minimum frequency of the die critical paths. Since the real microprocessor die contains hundreds of critical paths, the die power (i.e., the dynamic power and the leakage power) is calculated as the average power per critical path. This is performed by summing the critical path powers and dividing by the number of critical paths per die.

## D. Global D-ABB Versus Local D-ABB

1) Global D-ABB: In this case, the global D-ABB circuit is enabled and all the local D-ABB circuits are disabled. A 5000-point Monte Carlo analysis, with the same transistor statistical models explained in Section II-D, is conducted. Fig. 10 reports $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ histograms for the NBB control case [see Fig. 10(a)–(c), respectively] and for the global D-ABB control case [see Fig. 10(d)–(f), respectively], when only D2D variations are considered and WID variations are ignored. The D2D-only case is included to compare the proposed D-ABB with the ABB circuit in [3]. Fig. 11 depicts $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ histograms for the NBB control case [see Fig. 11(a)–(c), respectively] and for the global D-ABB control case [see Fig.
11(d)–(f), respectively], when both D2D and WID variations are taken into account. The following observations are extracted for the global D-ABB control case.

- The means of $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ (i.e., $\mu_{F_{\rm clk}}$, $\mu_{P_{\rm dyn}}$, and $\mu_{P_{\rm leak}}$) change only slightly between the NBB case and the global D-ABB case (i.e., the means change by a factor of less than $1.06\times$ for all design parameters). Therefore, the global D-ABB circuit does not significantly affect the means of the design parameters in either case (i.e., when only D2D variations are considered and when both D2D and WID variations are taken into account).
- The global D-ABB circuit reduces the standard deviations of $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ (i.e., $\sigma_{F_{\rm clk}}$, $\sigma_{P_{\rm dyn}}$, and $\sigma_{P_{\rm leak}}$) by factors of $5.5\times$, $6.4\times$, and $4.5\times$, respectively, when WID variations are ignored, and by factors of $4.0\times$, $3.7\times$, and $1.9\times$, respectively, when WID variations are considered.
- Comparing Figs. 10 and 11, the global D-ABB circuit compensates for D2D variations better than for WID variations. This is because only one D-ABB circuit is used for all the die critical paths. Therefore, the utilization of a local D-ABB circuit for each critical path is essential to minimize the effects of the WID variations.

Fig. 10. Monte Carlo histograms of $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$, with (a)–(c) NBB control, (d)–(f) global D-ABB control, (g)–(i) local D-ABB control. Only D2D variations are considered at a temperature $T=120\,^{\circ}{\rm C}$.

2) Local D-ABB: In this case, the global D-ABB circuit is disabled and all the local D-ABB circuits are enabled. Fig. 10 reports $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ histograms for the local D-ABB control case [see Fig. 10(g)–(i), respectively], when only D2D variations are considered and WID variations are ignored. Fig.
11 depicts $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ histograms for the local D-ABB control case [see Fig. 11(g)–(i), respectively], when both D2D and WID variations are taken into account. The following observations are extracted for the local D-ABB control case.

- Similar to the global D-ABB, the local D-ABB circuits do not affect the means of $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ in either case (i.e., when only D2D variations are considered and when both D2D and WID variations are taken into account).
- The local D-ABB circuits achieve slightly more process variations reduction than the global D-ABB circuit when WID variations are ignored. This is expected since, when WID variations are ignored, the global D-ABB is sufficient and the local D-ABB is not needed.
- When WID variations are taken into account, the local D-ABB circuits achieve significantly more process variations reduction than the global D-ABB circuit. For example, the $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ standard deviations are reduced by additional factors of $1.3\times$, $1.4\times$, and $3.2\times$, respectively, relative to the case in which the global D-ABB circuit is utilized.

Fig. 11. Monte Carlo histograms of $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$, with (a)–(c) NBB control, (d)–(f) global D-ABB control, (g)–(i) local D-ABB control. Both D2D and WID variations are considered at a temperature $T=120\,^{\circ}{\rm C}$.

#### E. Temperature Variations

The D-ABB design is performed at a temperature $T=120\,^{\circ}\mathrm{C}$, which is the worst case condition for both the operating frequency and the leakage power. When the operating temperature decreases, $V_t$ increases [19], resulting in leakage power reduction. The reduction in the leakage power is large because the leakage power exhibits an exponential relationship with the temperature and $V_t$ [19]. This $V_t$ increase is sensed by the D-ABB circuit and the corresponding body bias voltages are generated.
The D-ABB circuit aims to compensate for this temperature effect by reducing $V_t$; therefore, FBB is applied to the die critical paths.

# IV. COMPARISON WITH PREVIOUS ABB CIRCUITS

A direct comparison with previous ABB circuits is not viable because they use different technologies and pursue different goals in process variations compensation. For example, the work in [1] targets reducing the relative variations of the clock frequency, $\sigma/\mu|_{F_{\rm clk}}$, while meeting a certain leakage power constraint by using 150-nm CMOS technology. In addition, the main objective of the work in [3] is to maximize the overall yield by relaxing the clock frequency constraint and applying more RBB than FBB (i.e., using an RBB maximum voltage of 0.5 V while the FBB voltage is 0.25 V) by using 130-nm CMOS technology. Thus, the following comparison with the previous ABB circuits in [1] and [3] covers two aspects: the performance of the D-ABB in reducing process variations and the associated area overhead.

#### A. Process Variations Compensation

1) Comparison With the ABB in [1]: The results in [1] are obtained from measurements of 62 fabricated dies, each containing 21 critical paths. Therefore, the simulation results of the D-ABB test circuit with WID variations considered are compared to the measurement results in [1], because WID variations are inevitably present in a fabricated test chip. According to Section III-D, the global D-ABB circuit and the local D-ABB circuits result in a reduction of the relative standard deviation of the clock frequency $(\sigma/\mu|_{F_{\text{clk}}})$ by factors of $4.1\times$ and $5.4\times$, respectively, for 65-nm CMOS technology. In [1], it is reported that the $\sigma/\mu|_{F_{\text{clk}}}$ is reduced by factors of $4.1\times$ and $5.6\times$, respectively, for 150-nm CMOS technology.
Thus, the D-ABB circuit exhibits approximately the same process variations reduction as the ABB circuit in [1], taking into account that the 65-nm CMOS technology, used in this paper, introduces more process variations than the 150-nm CMOS technology, adopted in [1].

2) Comparison With the ABB in [3]: In [3], circuit level simulation results are reported for 130-nm CMOS technology, when only a global ABB circuit is adopted. The D2D-only case and the combined D2D and WID case are considered at a temperature of $T=120\,^{\circ}\text{C}$. Thus, in the following comparison, only the global D-ABB is considered.

TABLE II. Individual yields for $F_{\rm clk}$, $P_{\rm dyn}$, $P_{\rm leak}$, and the total yield, for the NBB, the global D-ABB, and the ABB introduced in [3], considering only D2D variations and both D2D and WID variations, at a temperature $T=120\,^{\circ}{\rm C}$.

| | Global D-ABB | | | | ABB introduced in [3] | | | |
|---|---|---|---|---|---|---|---|---|
| | Only D2D | | Both D2D and WID | | Only D2D | | Both D2D and WID | |
| | NBB | Global D-ABB | NBB | Global D-ABB | NBB | ABB [3] | NBB | ABB [3] |
| $F_{\rm clk}$ yield | 51.2% | 100% | 50.2% | 94.4% | 100% | 100% | 100% | 100% |
| $P_{\rm dyn}$ yield | 65.2% | 100% | 66.4% | 99.2% | 48.4% | 100% | 47.4% | 100% |
| $P_{\rm leak}$ yield | 93.2% | 100% | 77.8% | 95.3% | 16.8% | 100% | 13.8% | 86.8% |
| Total yield | 16.8% | 100% | 13% | 91.4% | 16.8% | 100% | 13% | 86.8% |

- Only D2D variations: In [3], it is reported that adopting the introduced ABB scheme at a temperature of $T=120\,^{\circ}\mathrm{C}$, and considering only D2D variations, results in increasing the overall yield from 16.8% to 100%.
In this yield maximization, the individual yields of the design parameters $F_{\mathrm{clk}}$, $P_{\mathrm{dyn}}$, and $P_{\mathrm{leak}}$ change from 100% to 100%, 48.4% to 100%, and 16.8% to 100%, respectively. It is evident that the work in [3] relaxes the $F_{\mathrm{clk}}$ constraint. In this work, the $F_{\mathrm{clk}}$, $P_{\mathrm{dyn}}$, and $P_{\mathrm{leak}}$ constraints are set to 1.35 GHz, 75 $\mu$W, and 20 $\mu$W, respectively, to achieve an overall yield of 16.8%. Then the global D-ABB circuit is adopted; the resulting individual yields are all 100%, and accordingly, the overall yield is also 100%. These results are tabulated in Table II.
- Both D2D and WID variations: In [3], it is reported that applying the introduced ABB scheme at a temperature of $T=120\,^{\circ}\mathrm{C}$, and considering both D2D and WID variations, results in increasing the overall yield from 13% to 86.8%. In this yield maximization, the individual yields of $F_{\mathrm{clk}}$, $P_{\mathrm{dyn}}$, and $P_{\mathrm{leak}}$ change from 100% to 100%, 47.4% to 100%, and 13.8% to 86.8%, respectively. Similar to the D2D-only case above, the $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ constraints are set to achieve an overall yield of 13% when both D2D and WID variations are considered. These constraint values are the same as those used in the D2D-only case. Then the global D-ABB circuit is adopted, and the resulting $F_{\rm clk}$, $P_{\rm dyn}$, and $P_{\rm leak}$ individual yields increase from 50.2% to 94.4%, 66.4% to 99%, and 77.8% to 96%, respectively. Accordingly, the overall yield increases from 13% to 91.4%. These results are tabulated in Table II.
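The yield bookkeeping used in this comparison can be illustrated with a minimal Python sketch. It follows the simulation setup (die frequency taken as the minimum over the critical paths, die powers taken as the average per critical path) and uses the constraint values quoted above. Several points are assumptions of the sketch rather than statements from the paper: the frequency constraint is treated as a lower bound and the power constraints as upper bounds, the total yield is taken as the fraction of dies that meet all three constraints at once, and the function names, the number of paths per die, and the Gaussian placeholder data in `synthetic_dies` are hypothetical stand-ins for the circuit level Monte Carlo results.

```python
import random

# Constraint values quoted in the text. Their direction (minimum frequency,
# maximum powers) is an assumption of this sketch.
F_MIN_GHZ = 1.35
P_DYN_MAX_UW = 75.0
P_LEAK_MAX_UW = 20.0


def die_metrics(paths):
    """Reduce per-path (f_ghz, p_dyn_uw, p_leak_uw) tuples to die-level values.

    Die frequency is the minimum over the critical paths; die powers are the
    averages per critical path, as described in the simulation setup.
    """
    n = len(paths)
    f_die = min(p[0] for p in paths)
    p_dyn = sum(p[1] for p in paths) / n
    p_leak = sum(p[2] for p in paths) / n
    return f_die, p_dyn, p_leak


def yields(dies):
    """Return the individual and total yields as fractions of the dies."""
    counts = {"f_clk": 0, "p_dyn": 0, "p_leak": 0, "total": 0}
    for paths in dies:
        f_die, p_dyn, p_leak = die_metrics(paths)
        ok_f = f_die >= F_MIN_GHZ
        ok_dyn = p_dyn <= P_DYN_MAX_UW
        ok_leak = p_leak <= P_LEAK_MAX_UW
        counts["f_clk"] += ok_f
        counts["p_dyn"] += ok_dyn
        counts["p_leak"] += ok_leak
        # Assumed definition: a die passes only if it meets all constraints.
        counts["total"] += ok_f and ok_dyn and ok_leak
    return {name: count / len(dies) for name, count in counts.items()}


def synthetic_dies(n_dies, n_paths, seed=0):
    """Hypothetical Gaussian placeholder data, NOT the paper's statistical models."""
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        d2d = rng.gauss(0.0, 1.0)  # shift shared by every path on the die
        paths = []
        for _ in range(n_paths):
            shift = d2d + rng.gauss(0.0, 0.3)  # add a per-path WID component
            paths.append((1.4 + 0.1 * shift, 72.0 + 5.0 * shift, 15.0 + 6.0 * shift))
        dies.append(paths)
    return dies


if __name__ == "__main__":
    print(yields(synthetic_dies(n_dies=5000, n_paths=10)))
```

Because a die must pass every constraint, the total yield computed this way can never exceed the smallest individual yield, which is consistent with the pattern of the entries in Table II.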
It is evident from the above comparison that the global D-ABB circuit is capable of achieving a total yield similar to that in [3], when only D2D variations are considered, and higher than that in [3], when both D2D and WID variations are taken into account. It should be noted that the 65-nm CMOS technology, used in this paper, introduces more process variations than the 130-nm CMOS technology, adopted in [3].

#### B. Associated Area Overhead

The newly developed D-ABB circuit comprises two sensing circuits, four amplifiers, and two squaring circuits. The layout area of the sensing circuit is calculated to be $20 \, \mu \text{m}^2$. The amplifier circuit area is estimated by scaling the amplifier layout area in [1] (excluding the resistors) and equals $0.012 \, \text{mm}^2$. The squaring circuit area is approximated by scaling the area of a similar squaring circuit architecture in [23] and equals $0.018 \, \text{mm}^2$. Therefore, the approximate area of the D-ABB circuit is $2 \times 20 \, \mu\text{m}^2 + 4 \times 0.012 \, \text{mm}^2 + 2 \times 0.018 \, \text{mm}^2 \approx 0.084 \, \text{mm}^2$.

In the ABB circuit in [1], a critical path mimic is used and the desired clock frequency is applied externally. The output of the critical path mimic is compared to the externally applied clock frequency by using a phase detector (PD). The output of the PD is used to enable a 5-bit digital counter whose value represents the desired body bias to apply. Finally, the 5-bit digital output from the counter is converted to an analog body bias voltage by using a DAC followed by a bias amplifier. Therefore, the ABB in [1] consists of a critical path mimic, a PD, two 5-bit counters, two 5-bit DAC circuits, and two bias amplifier circuits. The approximate area of the ABB in [1] is $0.213 \, \text{mm}^2$. This area is estimated by scaling the layout area given in [1] from 150-nm technology to 65-nm technology (i.e., dividing by $(150/65)^2$).

The ABB circuit reported in [3] utilizes a set of threshold voltage sensing circuits to estimate the actual threshold voltage values.
The output of these sensing circuits is converted to a digital word by using an ADC. A control unit is used to select the optimum body bias code stored in a programmable read only memory (PROM) unit, based on the ADC output word. The output of the PROM is then converted to an analog body bias voltage by using a DAC followed by a bias amplifier. The work in [3] does not provide any information about the associated area overhead. However, a recent 6-bit ADC implemented in 65-nm technology occupies an area of $0.11 \, \text{mm}^2$ [30]. The ABB in [3] needs two ADC circuits for the nMOS and the pMOS body bias voltages. Accordingly, the area of the ABB in [3] exceeds $0.22 \, \text{mm}^2$. The total area is roughly estimated to be $0.344 \, \text{mm}^2$, excluding the PROM and the control unit area.

From the above discussion, the area of the D-ABB is less than that of the ABB in [1] and in [3] by factors of $2.5\times$ and $4.1\times$, respectively. Therefore, the D-ABB exhibits low area overhead compared to [1] and [3]. This low area overhead allows the use of the D-ABB at a smaller granularity level (i.e., critical path level or cluster of gates level) with lower area overhead than that of the ABB circuits in [1] and [3]. In addition, the resolution of the DAC and/or the ADC, used in the ABB circuits in [1] and [3], limits their capability in process variations compensation. The D-ABB does not suffer from this resolution limit because no ADC or DAC is required in the D-ABB circuit.

#### C. Design Considerations

Practically, there are several design considerations that should be addressed when the proposed D-ABB is to be fabricated. These design considerations are as follows.

1) Mixed Analog-Digital Design Considerations: Separate power supply and ground planes are routed for the analog components because analog components are very sensitive to disturbances in the supply voltage.
Thus, a low noise analog power supply network is a stringent requirement for the proper operation of these analog components. Noise due to variations in the power supply and ground (i.e., the noise resulting from the digital components switching) is coupled into the analog portion of the chip and is amplified along with the desired signal. This affects the functionality of the analog components [31]. Several techniques that help prevent the digital switching noise from affecting the analog components are discussed in [31]. This separation of the analog and digital supplies and grounds is also required for the ABB circuits introduced in [1] and [3]. Therefore, the D-ABB area overhead required for this separation is the same as that in [1] and [3].

2) Substrate Noise Isolation: The substrate body bias terminals of the analog components are connected to the analog power supply rails, whereas the substrate body bias terminals of the digital components are connected to the output of the analog components (the D-ABB outputs $V_{BN}$ and $V_{BP}$). In the triple well technology, provided by STMicroelectronics, the analog components are placed in wells separate from those of the digital components, and substrate guard rings are used for substrate isolation. However, this isolation is not essential in the D-ABB because the analog components substrate is not connected to the digital components substrate (i.e., the analog components substrates are connected to the analog power supply rails whereas the digital components substrates are connected to the output of the analog components). The coupling between the analog part and the digital part occurs due to the connection of the analog outputs $V_{BN}$ and $V_{BP}$ to the digital components substrates. According to the simulation results, this coupling noise causes the analog outputs to fluctuate around their values by at most 6%.
These fluctuations do not affect the circuit operation and are included in the circuit level simulations conducted in this paper. Also, this substrate noise isolation area overhead is the same for the D-ABB and for the ABB circuits in [1] and [3].

3) Area Overhead and Granularity Level Tradeoff: The global D-ABB is very efficient for the D2D variations and the systematic WID variations compensation. For random WID variations such as random dopant fluctuations (RDF) and line edge roughness (LER), there is a tradeoff between the D-ABB granularity level and the associated area overhead (i.e., the lower the granularity level is, the higher the associated area overhead). This tradeoff exists in the proposed D-ABB and the ABB circuits introduced in [1] and [3]. However, the lower area overhead of the D-ABB allows lowering the granularity level while the total area overhead is similar to that in [1] and [3].

4) Generation of the DC Supply Voltages for the D-ABB Circuit: In the circuit level simulations conducted in this paper, the dc supply voltages, $V_{B+}$, $V_{B-}$, $V_{DD} + V_{B+}$, and $V_{DD} + V_{B-}$, are generated externally from an off-chip power supply. However, in real microprocessor design, these dc supply voltages are generated by using an on-chip dc-dc converter [32]. This dc-dc converter increases the required area overhead of the proposed D-ABB. However, the same area overhead is required for the ABB circuits in [1] and [3] because a dc-dc converter is required in these circuits as well. Generally speaking, any ABB circuit must have analog driving bias amplifiers at its output to provide the body bias voltages. These analog driving amplifiers need dc supply voltages, which are generated by using the on-chip dc-dc converter.

# V. CONCLUSION

The proposed D-ABB circuit has been shown to reduce the impact of the process variations on the microprocessor frequency, dynamic power, and leakage power.
The D-ABB circuit consists of threshold voltage sensing circuits and a direct controller that generates the required body bias voltages to compensate for process variations. Circuit level simulation results show that when only D2D variations are considered, the proposed global D-ABB reduces the frequency, dynamic power, and leakage power variations by factors of $5.5\times$, $6.4\times$, and $4.5\times$, respectively, and improves the initial total yield from 16.8% to 100%. When both D2D and WID variations are taken into account, the proposed global D-ABB reduces the frequency, dynamic power, and leakage power variations by factors of $4.0\times$, $3.7\times$, and $1.9\times$, respectively, and improves the initial total yield from 13% to 91.4%. When local D-ABB circuits are used, the frequency, dynamic power, and leakage power variations are reduced by factors of $5.5\times$, $6.4\times$, and $4.5\times$, respectively, when both D2D and WID variations are considered. These results show that the use of local D-ABB is essential in WID variations compensation. The main advantage of the proposed D-ABB is its simple circuit implementation and low area overhead compared to the previous state-of-the-art ABB techniques. Typically, the area overhead of the proposed D-ABB is less than that in [1] and [3] by factors of $2.5\times$ and $4.1\times$, respectively. Therefore, it can be used at a smaller granularity level. In addition, it effectively minimizes the process variations impact and improves the total yield.

#### REFERENCES

- [1] J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, "Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1396–1402, Nov. 2002.
- [2] ITRS, "The international technology roadmap for semiconductors," 2010. [Online].
Available: http://public.itrs.net
- [3] M. Olivieri, G. Scotti, and A. Trifiletti, "A novel yield optimization technique for digital CMOS circuits design by means of process parameters run-time estimation and body bias active control," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 13, no. 5, pp. 630–638, May 2005.
- [4] C. H. Kim, K. Roy, S. Hsu, R. Krishnamurthy, and S. Borkar, "A process variation compensating technique with an on-die leakage current sensor for nanometer scale dynamic circuits," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 6, pp. 646–649, Jun. 2006.
- [5] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in *Proc. 40th Conf. Des. Autom. (DAC)*, 2003, pp. 338–342.
- [6] H. Masuda, S. Ohkawa, A. Kurokawa, and M. Aoki, "Challenge: Variability characterization and modeling for 65-nm to 90-nm processes," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2005, pp. 593–599.
- [7] K. Bowman, S. Duvall, and J. Meindl, "Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration," *IEEE J. Solid-State Circuits*, vol. 37, no. 2, pp. 183–190, Feb. 2002.
- [8] S. H. Kulkarni, D. M. Sylvester, and D. Blaauw, "Design-time optimization of post-silicon tuned circuits using adaptive body bias," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 27, no. 3, pp. 481–494, Mar. 2008.
- [9] J. Gregg and T. W. Chen, "Post silicon power/performance optimization in the presence of process variations using individual well-adaptive body biasing," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 3, pp. 366–376, Mar. 2007.
- [10] A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, and V. De, "Effectiveness of reverse body bias for leakage control in scaled dual Vt CMOS ICs," in *Proc. Int. Symp. Low Power Electron. Des.*, 2001, pp.
207–212.
- [11] S. Narendra, M. Haycock, V. Govindarajulu, V. Erraguntla, H. Wilson, S. Vangal, A. Pangal, E. Seligman, R. Nair, A. Keshavarzi, B. Bloechel, G. Dermer, R. Mooney, N. Borkar, S. Borkar, and V. De, "1.1 V 1 GHz communications routers with on-chip body bias in 150 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, 2002, pp. 270–274.
- [12] B. Choi and Y. Shin, "Lookup table-based adaptive body biasing of multiple macros," in *Proc. Int. Symp. Quality Electron. Des.*, 2007, pp. 533–538.
- [13] X. He, S. Al-Kadry, and A. Abdollahi, "Adaptive leakage control on body biasing for reducing power consumption in CMOS VLSI circuits," in *Proc. Int. Symp. Quality Electron. Des.*, 2009, pp. 465–470.
- [14] K. Kang, S. P. Park, K. Kim, and K. Roy, "On-chip variability sensor using phase-locked loop for detecting and correcting parametric timing failures," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 2, pp. 270–280, Feb. 2010.
- [15] T. Chen and S. Naffziger, "Comparison of adaptive body bias (ABB) and adaptive supply voltage (ASV) for improving delay and leakage under the presence of process variation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 10, pp. 888–899, Oct. 2003.
- [16] M. Mani, A. K. Singh, and M. Orshansky, "Joint design-time and post-silicon minimization of parametric yield loss using adjustable robust optimization," in *Proc. Int. Conf. Comput.-Aided Des.*, 2006, pp. 19–26.
- [17] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, "Body bias voltage computation for process and temperature compensation," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 16, no. 3, pp. 249–262, Mar. 2008.
- [18] J. Tschanz, S. Narendra, R. Nair, and V. De, "Effectiveness of adaptive supply voltage and body bias for reducing impact of parameter variations in low power and high performance microprocessors," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2002, pp. 310–311.
- [19] W.
Liu, *MOSFET Models for SPICE Simulation Including BSIM3v3 and BSIM4*. New York: Wiley, 2001.
- [20] T. Sakurai and A. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE J. Solid-State Circuits*, vol. 25, no. 2, pp. 584–594, Apr. 1990.
- [21] B. Razavi, *Design of Analog CMOS Integrated Circuits*. New York: McGraw-Hill, 2000.
- [22] M. Pelgrom, A. Duinmaijer, and A. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
- [23] R. Hidayat, K. Dejhan, P. Moungnoul, and Y. Miyanaga, "OTA-Based high frequency CMOS multiplier and squaring circuit," in *Proc. Int. Symp. Intell. Signal Process. Commun. Syst.*, 2008, pp. 1–4.
- [24] B. Boonchu and W. Surakampontorn, "A new NMOS four-quadrant analog multiplier," in *Proc. IEEE Int. Symp. Circuits Syst.*, 2005, pp. 1004–1007.
- [25] H. Mostafa and A. M. Soliman, "Novel low-power accurate wideband CMOS negative-second-generation-current-conveyor realizations based on floating-current-source building blocks," in *Proc. IEEE Toronto Int. Conf.-Science Technol. for Humanity*, 2009, pp. 720–725.
- [26] Q. Zhang, J. J. Liou, J. McMacken, K. Stiles, J. Thomson, and P. Layman, "An efficient and practical MOS statistical model for digital applications," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, 2000, pp. 433–436.
- [27] T. S. Gotarredona and B. L. Barranco, "A new 5-parameter MOS transistors mismatch model," in *Proc. IEEE Int. Conf. Electron., Circuits, Syst. (ICECS)*, 1999, pp. 315–318.
- [28] S. Lakshminarayanan, J. Joung, G. Narasimhan, R. Kapre, M. Slanina, J. Tung, M. Whately, C.-L. Hou, W.-J. Liao, S.-C. Lin, P.-G. Ma, C.-W. Fan, M.-C. Hsieh, F.-C. Liu, K.-L. Yeh, W.-C. Tseng, and S. W. Lu, "Standby power reduction and SRAM cell optimization for 65 nm technology," in *Proc. IEEE Int. Symp. Quality Electron. Des. (ISQED)*, 2009, pp. 471–475.
- [29] A. Hokazono, S. Balasubramanian, K. Ishimaru, H.
Ishiuchi, T.-J. K. Liu, and C. Hu, "MOSFET design for forward body biasing scheme," *IEEE Electron Device Lett.*, vol. 27, no. 5, pp. 387–389, May 2006.
- [30] J. Yang, T. L. Naing, and B. Brodersen, "A 1-GS/s 6-bit 6.7-mW ADC in 65-nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2009, pp. 287–290.
- [31] Actel, Mountain View, CA, "Designing clean analog PLL power supply in a mixed-signal environment," Actel Appl. Note AC204, 2004.
- [32] J.-W. Yang, P.-T. Huang, and W. Hwang, "On-chip DC-DC converter with frequency detector for dynamic voltage scaling technology," in *Proc. IEEE Asia Pacific Conf. Circuits Syst. (APCCAS)*, 2006, pp. 667–670.

Hassan Mostafa (S'01) received the B.Sc. and M.Sc. degrees (with honors) in electronics from Cairo University, Cairo, Egypt, in 2001 and 2005, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering from the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. He was working on a project with Imec, Leuven, Belgium, in 2000. This project includes modeling and fabricating the ISFET transistor. He has authored/coauthored over 20 papers in international journals and conferences. His research interests include analog circuits design, mixed analog circuit design, low-power circuits, variation-tolerant design, soft error tolerant design, and statistical design methodologies.

Mohab Anis (S'98–M'03) received the B.Sc. degree (with honors) in electronics and communication engineering from Cairo University, Cairo, Egypt, in 1997 and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1999 and 2003, respectively. He is currently an Associate Professor and the Codirector of the VLSI Research Group, Department of Electrical and Computer Engineering, University of Waterloo.
He has authored/coauthored over 90 papers in international journals and conferences and is the author of the following two books: Multi-Threshold CMOS Digital Circuits-Managing Leakage Power (Kluwer, 2003) and Low-Power Design of Nanometer FPGAs: Architecture and EDA (Morgan Kaufmann, 2009). His research interests include integrated circuit design and design automation for VLSI systems in the deep submicrometer regime. He is the Cofounder of Spry Design Automation. Prof. Anis is an Associate Editor of the *Journal of Circuits, Systems, and Computers, ASP Journal of Low Power Electronics*, and *VLSI Design.* He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: BRIEF PAPERS. He is also a member of the program committee for several IEEE conferences. He was a recipient of the 2004 Douglas R. Colton Medal for Research Excellence in recognition of excellence in research leading to new understanding and novel developments in Microsystems in Canada and the 2002 International Low-Power Design Contest. Mohamed Elmasry (S'69–M'73–SM'79–F'88) was born in Cairo, Egypt, on December 24, 1943. He received the B.Sc. degree from Cairo University, Cairo, Egypt, in 1965, and the M.A.Sc. and Ph.D. degrees from the University of Ottawa, Ottawa, ON, Canada, in 1970 and 1974, respectively, all in electrical engineering. He has worked in the area of digital integrated circuits and system design for the last 35 years. From 1965 to 1968, he was with Cairo University, and from 1972 to 1974, he was with Bell-Northern Research, Ottawa, ON, Canada. Since 1974, he has been with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada, where, from 1986 to 1991, he held the NSERC/BNR Research Chair in VLSI design, and where he is currently a Professor and founding Director of the VLSI Research Group. He has served as a Consultant to research laboratories in Canada, Japan, and the United States. 
He has authored or coauthored over 400 papers and 14 books on integrated circuit design and design automation. He is the holder of several patents. He is the founding President of Pico Electronics Inc., Waterloo, ON, Canada. Dr. Elmasry has served in many professional organizations in different positions and received many Canadian and international awards. He is a Founding Member of the Canadian Conference on VLSI, the Canadian Microelectronics Corporation (CMC), the International Conference on Microelectronics (ICM), MICRONET, and Canadian Institute for Teaching Overseas (CITO). He is a Fellow of the Royal Society of Canada and a Fellow of the Canadian Academy of Engineers.