Abstract—In this paper, a low-energy, minimum-area CMOS standard cell library suitable for IoT applications is proposed. Energy consumption is reduced by operating the library in the Near-Threshold Voltage (NTV) region and by laying out the cells at the minimum area possible for the technology process used. A body biasing technique is proposed to boost pMOS performance. The operating voltage and transistor sizing are selected to achieve the minimum energy consumption while operating in the frequency range of 1 MHz to 20 MHz, which is suitable for IoT applications. The proposed library was designed and characterized in the UMC 130 nm CMOS technology process and modeled for use in synthesis tools. To demonstrate the benefit for IoT applications, the library was benchmarked by implementing three cryptographic algorithms: ASCON, AEGIS-128, and AEZ. Synthesis results show that the three cores can operate at 18 MHz, 14 MHz, and 16 MHz, respectively, while consuming 0.466 pJ, 3.006 pJ, and 5.064 pJ.

Index Terms—CMOS digital integrated circuits, design methodology, near-threshold CMOS circuits, ultra-low power design, ultra-low energy, IoT, ASCON, AEGIS, AEZ

I. INTRODUCTION

The Internet of Things (IoT) is a novel paradigm that is rapidly gaining ground in modern wireless communication. It is enabling a wide range of applications that touch every aspect of our lives, such as wearable devices, connected cars, and smart homes. The spread of IoT depends on the reliability of the devices that access it. Device reliability is based on: (1) computation power, (2) energy efficiency, and (3) security against possible threats.

The computation power of IoT devices has been widely explored in the literature [1]–[5], with a major interest in energy reduction techniques. These techniques have been proposed at all design levels, namely the circuit, logic, RTL, algorithm, and system levels. For IoT devices, most of the energy saving is achieved by circuit-level solutions. One of the fastest-growing topics in this area is the feasibility of voltage scaling to reduce energy consumption. Circuit design in the subthreshold region has gained increasing interest as a way to meet low-power requirements by operating circuits at the Minimum Energy Point (MEP) [1]. The drawback of the MEP paradigm is a significant loss in performance [2]. To recover some of this loss while maintaining the power gains, Near-Threshold Voltage (NTV) operation was introduced [2]. It was shown that in NTV operation, energy savings can be on the order of 10X with only a 10X degradation in performance, a much better energy/performance trade-off than subthreshold operation [3].

In addition to computation power and energy concerns, the possible threats arising from widespread adoption of IoT technology must be addressed. Cisco predicted 50 billion IoT-connected devices by 2020 [6]. Integrating such a tremendous number of devices into the IoT brings a new concern: system security. Cryptography has therefore become one of the main approaches to securing user data against attacks. Significant research effort has gone into algorithm-level cryptographic solutions and into assessing hardware implementations of these algorithms. In [7], ASIC implementations of three commonly used cryptographic algorithms are evaluated for their suitability for IoT applications. Similarly, architectural solutions are proposed in [8] and used to implement a co-processor suitable for Narrow-Band (NB) IoT devices. In the same direction, in 2012 the European Network of Excellence in Cryptology (ECRYPT) called for a new competition for authenticated ciphers: CAESAR (Competition for Authenticated Encryption: Security, Applicability, and Robustness) [9]. Reviews of hardware implementations of CAESAR algorithms are presented in [10], [11]. The winning algorithms were announced in 2019 and cover three main areas of application: lightweight, high performance, and defense in depth. In this paper, three algorithms from this competition are explored for their potential benefit in IoT devices.

In this paper, a low-energy, minimum-area standard cell library is proposed in the UMC 130 nm process. The library operates in the NTV region at a 350 mV supply. This supply was selected to achieve the minimum Power Delay Product (PDP) for the process node used. The minimum area is achieved by minimizing the cell height and by using Euler’s path in the layout design of each cell. A calculation method is proposed to obtain the minimum cell height as a function of technology parameters. The Inverse Narrow Width Effect (INWE) was examined and exploited for library power, performance, and area (PPA) gains. The proposed library shows better energy and area figures than the other libraries surveyed. To show the gains from NTV operation, the library is benchmarked against a commercial library operating in the Super-Threshold Voltage region by implementing three of the CAESAR finalists: ASCON [12], AEGIS-128 [13], and AEZ [14].

The rest of the paper is organized as follows. Section II describes the library architecture design, discusses the proposed solutions for improving library PPA, and presents a comparison with a similar library. Section III describes the flow used to design the set of cells that make up the library. Section IV reports the benchmark results of the proposed library using cryptography IPs. Lastly, the paper is concluded in Section V.

II. LIBRARY ARCHITECTURE DESIGN

In digital design using the standard-cell approach, all modules must have the same height [15], [16]. Standard cells must be placed in alignment with pre-specified standard-cell rows in the placement region. Because of the popularity of this approach, most placement algorithms assume a standard-cell design style. The minimum cell dimension, in other words the minimum resolution of cell placement in the 2D area, is defined as the Unit Tile. Every cell in a standard-cell library must have a height equal to, or a multiple of, the unit tile height and a width that is a multiple of the unit tile width.

A. Minimum Cell Height Design

Here, we propose a method to define the minimum cell height for a given process node. All shapes drawn inside a cell must pass the Design Rule Checks (DRC). Because these shapes belong to either Front-End Of Line (FEOL) layers or Back-End Of Line (BEOL) layers, the rules of both layer groups must be considered.

The process parameters needed for FEOL checks are: minimum diffusion enclosure inside NPLUS (EN<sub>n</sub>), minimum diffusion enclosure inside PPLUS (EN<sub>p</sub>), minimum diffusion width inside NPLUS (W<sub>n</sub>), and minimum diffusion width inside PPLUS (W<sub>p</sub>). From Fig. 1, the minimum cell height (H<sub>min,FEOL</sub>) is calculated as:

\[ H_{\text{min,FEOL}} = 2EN_n + 2EN_p + W_n + W_p \tag{1} \]

Then, to calculate H<sub>min</sub> from the BEOL rules, we construct the layout given in Fig. 2. The limitation here comes from the first metal layer (metal 1): (1) any metal 1 shape must keep a spacing of S<sub>m1</sub> from other metal 1 shapes, (2) any contact (CONT) connecting metal 1 to a poly or diffusion shape must be enclosed by a metal 1 shape (CE<sub>m1</sub>), and (3) power/ground (PG) rails need to be drawn on metal 1 to supply the transistors. The minimum PG rail width is the minimum metal 1 width (W<sub>m1</sub>). So:

\[ H_{\text{min,BEOL}} = 4S_{m1} + 3CE_{m1} + W_{m1} \tag{2} \]

To avoid routing problems, either internally when connecting transistors or externally when the cell is connected to other cells, the cell height must be an integer multiple of the pitch of the first horizontal metal layer (metal 2 in the proposed library), P<sub>m2</sub>:

\[ H_{\text{min}} = nP_{m2} \tag{3} \]

From Equations (1), (2), and (3), and because n must be an integer, we can define:

\[ n = \left\lceil \frac{\max(H_{\text{min,FEOL}}, H_{\text{min,BEOL}})}{P_{m2}} \right\rceil \tag{4} \]

where n is the minimum number of cell architecture tracks for a given technology. This equation is technology-independent and can be applied to planar CMOS process nodes.
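The track calculation can be sketched in a few lines of Python. This is only an illustration: the function name and every rule value in the usage note are hypothetical placeholders, not UMC 130 nm design rules. The FEOL height uses both the NPLUS and PPLUS enclosures, matching the enclosure terms of the sizing limit in Equation (5), and the result is rounded up to a whole track because the height must be an integer multiple of the metal 2 pitch.

```python
def min_tracks(en_n, en_p, w_n, w_p, s_m1, ce_m1, w_m1, p_m2):
    """Return (n, cell_height) from FEOL and BEOL rule values.

    All arguments are lengths in integer nanometres. The names mirror the
    symbols in the text; the function itself is an illustrative sketch.
    """
    h_feol = 2 * en_n + 2 * en_p + w_n + w_p   # FEOL-limited height
    h_beol = 4 * s_m1 + 3 * ce_m1 + w_m1       # BEOL-limited height
    h_required = max(h_feol, h_beol)
    n = -(-h_required // p_m2)                 # integer ceiling division
    return n, n * p_m2


# Hypothetical rule values (NOT taken from any real design manual).
tracks, height = min_tracks(en_n=150, en_p=150, w_n=160, w_p=400,
                            s_m1=160, ce_m1=240, w_m1=160, p_m2=400)
print(tracks, height)
```

With these placeholder values the BEOL rules dominate, which shows why both layer groups have to be evaluated before the track count is fixed.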

B. Cell Layout Design

Implementation of the 5-track (5T) architecture is enabled by two choices: using Euler’s path theory for layout design and using three metal layers for intra-cell routing.

Euler’s path theory [22]–[24] is used in [25] to provide the minimum-area transistor placement inside a given cell area. The resulting transistor placement can be routed in multiple ways. The routing selected for each cell in our library considers: (1) pin accessibility in placement and routing tools, (2) minimum pin capacitance, and (3) compliance with the technology DRC rules. Three metal layers are used for intra-cell routing. Metal 1 and metal 2 are used extensively, while metal 3 is used only rarely, when all metal 1 and metal 2 resources are exhausted.
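As an illustration of the Euler-path idea, the sketch below searches for gate orderings that form an unbroken trail through both the pull-down and pull-up networks, so that neighbouring transistors can share diffusion. It is a brute-force search suitable only for small cells and is not the algorithm of [25]; the AOI21 netlist, node names, and function names are assumptions made for the example.

```python
from itertools import permutations


def is_trail(edges, order):
    """True if the transistors in `order` can be walked node to node.

    `edges` maps a gate name to the (source, drain) nodes of its transistor.
    """
    for node in edges[order[0]]:        # try both ends of the first device
        for gate in order:
            a, b = edges[gate]
            if node == a:
                node = b
            elif node == b:
                node = a
            else:
                break                   # a diffusion break would be needed
        else:
            return True                 # walked every device without a break
    return False


def common_gate_orders(pull_down, pull_up):
    """Gate orders that are trails in both networks (same poly ordering)."""
    gates = sorted(pull_down)
    return [order for order in permutations(gates)
            if is_trail(pull_down, order) and is_trail(pull_up, order)]


# Hypothetical AOI21 cell, Y = NOT(A*B + C).
AOI21_PDN = {"A": ("Y", "n1"), "B": ("n1", "GND"), "C": ("Y", "GND")}
AOI21_PUN = {"A": ("VDD", "p1"), "B": ("VDD", "p1"), "C": ("p1", "Y")}
orders = common_gate_orders(AOI21_PDN, AOI21_PUN)
print(orders)
```

Any ordering returned here lets both diffusion strips be drawn without a gap; orderings that are rejected would cost an extra diffusion break and therefore extra cell width.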

C. Transistor Sizing

Transistor sizing is one of the main factors deciding the library power and speed performance. The used 5T architecture sets a limit on the maximum transistor sizing:

\[ W_n + W_p \leq 5P_{m2} - 2EN_p - 2EN_n \tag{5} \]

For the process used, this limit is calculated to be 1.04 µm. This value allows transistor widths at, and well above, the minimum pMOS and nMOS widths.
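A sizing candidate can be checked against this limit with a short helper. The sketch below is illustrative: the function names are hypothetical, the 1040 nm budget is the 1.04 µm figure quoted above, the 160 nm nMOS width is the minimum width used later for INV_X1, and the 400 nm pMOS width is an assumed example value.

```python
def width_budget_nm(tracks, p_m2, en_n, en_p):
    """Right-hand side of Equation (5), all lengths in integer nanometres."""
    return tracks * p_m2 - 2 * en_p - 2 * en_n


def sizing_fits(w_n, w_p, budget_nm):
    """True if the stacked nMOS and pMOS widths fit inside the cell."""
    return w_n + w_p <= budget_nm


BUDGET_NM = 1040  # 1.04 um limit quoted in the text for the 5T cell

# W_n = 160 nm (minimum width); W_p = 400 nm is an assumed example value.
print(sizing_fits(160, 400, BUDGET_NM))
```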

The literature was surveyed for transistor sizing techniques in the NTV region that improve performance with minimum impact on power. The Inverse Narrow Width Effect (INWE) was introduced in [26] as the reduction in threshold voltage caused by a reduction in transistor width, which allows higher drive currents. INWE can therefore be exploited to gain performance while operating in the NTV region. The work in [18] used the minimum pMOS and nMOS finger widths to maximize INWE and obtain the minimum threshold voltage, and demonstrated INWE in 90 nm, 65 nm, and 45 nm process nodes. Fig. 3 and Fig. 4 show the INWE impact in the 130 nm process node. For nMOS, the results align with those previously reported. For pMOS, INWE is not significant: as shown in Fig. 4, only a slight reduction in threshold voltage is observed near the minimum width as the supply voltage is reduced. Sizing pMOS at minimum width is therefore less important than for nMOS, which was also shown for 180 nm in [27]. Fig. 4 also shows that $W_p$ needs to be $>350$ nm to avoid the peak of the threshold voltage curve, but increasing it too far will not improve performance because the fanout load increases.

D. pMOS Body Biasing Technique

To balance the nMOS and pMOS currents and to reduce the difference between rise and fall times without increasing the pMOS size, a body biasing technique is used. Conventional body biasing in digital circuits connects the n-well to the supply voltage and the p-substrate to ground. This cancels the body effect, so the threshold voltage shows no dependence on the body voltage. This technique is used in the superthreshold region and can also be used in the NTV region [28]. In [29], low-voltage swapped body biasing (LVSB) was used in 180 nm at 0.5 V, where the n-well and p-substrate voltages are swapped compared to conventional biasing. The technique used in [28] is a compromise between the conventional technique and LVSB: the body terminals of all nMOS and pMOS devices are connected together without supply or ground connections. The technique proposed in this paper is to connect only the pMOS body to ground. In this way, forward body biasing provides a significant performance gain for the pull-up network, while the pull-down network keeps the same power consumption.
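The direction of the effect can be illustrated with the textbook first-order body-effect expression, |V_T| = |V_T0| + γ(√(2φ_F + V_R) − √(2φ_F)), where V_R is the reverse bias across the source-body junction and a negative V_R denotes forward bias. This is a rough sketch only: the model is a long-channel approximation, and the device parameters below are hypothetical, not extracted from the 130 nm process.

```python
import math


def vth_magnitude(vt0, gamma, two_phi_f, v_reverse):
    """First-order body-effect model for the threshold voltage magnitude.

    v_reverse > 0 is reverse source-body bias, v_reverse < 0 is forward bias.
    Valid only while two_phi_f + v_reverse stays positive.
    """
    return vt0 + gamma * (math.sqrt(two_phi_f + v_reverse)
                          - math.sqrt(two_phi_f))


# Hypothetical pMOS parameters (illustrative values only).
VT0, GAMMA, TWO_PHI_F = 0.40, 0.40, 0.80
VDD = 0.35  # supply used by the proposed library

conventional = vth_magnitude(VT0, GAMMA, TWO_PHI_F, 0.0)   # n-well at VDD
proposed = vth_magnitude(VT0, GAMMA, TWO_PHI_F, -VDD)      # body at ground
print(conventional, proposed)
```

With the body tied to ground and the source at the supply, the source-body junction is forward biased by the full supply voltage, so the model predicts a lower threshold magnitude and hence a stronger pull-up network.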

E. Testing and Supply Selection

An inverter, INV_X1, was created with $W_n$ set to 160 nm, the minimum width available, and with the pMOS body terminal connected to ground. A Fanout-of-4 (FO4) test bench for the inverter cell was created to simulate the impact of transistor sizing and body biasing on delay, power, and PDP. Fig. 5 and Fig. 6 show that, compared to conventional biasing, the proposed body biasing provides better performance with a slight increase in power in the subthreshold and near-threshold regions. This no longer holds as the supply voltage increases, where power grows faster. The supply voltage is selected to be 350 mV, where the performance gain comes at the cost of a power increase, so the PDP matches that of conventional body biasing. This voltage lies at the center of the flat range of the PDP curve, which leaves margin for PVT variation while the PDP stays essentially unchanged.
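The supply selection rule described above (take the center of the flat part of the PDP curve) can be sketched as follows. The sweep data, the 5% flatness tolerance, and the function name are assumptions for illustration; they are not the simulation results behind Fig. 5 and Fig. 6.

```python
def pick_supply(sweep, tolerance=0.05):
    """Return (center_mv, flat_points) for a supply sweep.

    `sweep` is a list of (vdd_mv, delay, power) tuples. A point counts as
    'flat' if its PDP is within `tolerance` of the minimum PDP. The flat
    region is assumed to be contiguous.
    """
    pdp = [(vdd, delay * power) for vdd, delay, power in sweep]
    best = min(value for _, value in pdp)
    flat = [vdd for vdd, value in pdp if value <= best * (1 + tolerance)]
    return (min(flat) + max(flat)) / 2, flat


# Hypothetical FO4 sweep: (supply in mV, delay, power) in arbitrary units.
SWEEP = [
    (250, 130.0, 0.010),
    (300, 51.0, 0.020),
    (350, 20.0, 0.050),
    (400, 10.3, 0.100),
    (450, 5.0, 0.250),
]
center, flat = pick_supply(SWEEP)
print(center, flat)
```

Choosing the center rather than the exact minimum means a supply shift in either direction stays inside the flat region, which is the PVT margin argument made above.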

F. Literature Comparison

This section compares the proposed library with other work. Table I provides a comparison with a 6T NTV library designed in 130 nm and operating at 400 mV. The comparison is made at the nominal corner: TT process, nominal supply, and a room temperature of 25°C. Although the proposed library lags in delay by about 3X, it achieves a better PDP by about 2000X.

III. CELL DESIGN FLOW

The cells included in the proposed library cover the basic functions needed to build any design, namely combinational cells (e.g., INV, AND, OR, NAND, NOR, XOR, and XNOR) and sequential cells (e.g., D flip-flop and D latch). The flow used is shown in Fig. 7. For each cell, a static CMOS implementation is selected, which is needed to provide rail-to-rail swing at the low supply used. Then, a test bench is created to check circuit functionality and to measure its timing and power parameters. Once the cell passes, the design is transformed into its layout implementation. A qualified layout needs to pass the essential physical verification checks, namely DRC and LVS. Then, design functionality and specifications are rechecked to catch any degradation introduced by the layout. A SPICE model describing the cell is then generated. This model is passed to the cell characterization and modeling flow, which creates the library model needed by synthesis tools. The generated model includes all cell delay information in addition to cell power figures, and it is an essential input to the logic synthesis process. The proposed library model was generated using the Synopsys SiliconSmart tool. The tool takes the cell SPICE model as input and, based on user configuration, generates the cell model in Liberty (.lib) format.

IV. LIBRARY BENCHMARKING

In this section, comparison results are provided to show the power and performance gains of the proposed library. Table II shows the ratio of frequency lost by moving from the superthreshold to the near-threshold region. In Table III, power and frequency are explored across process, voltage, and temperature (PVT) variations. A 10% supply variation is considered. The worst performance corner is found to be SS0p315vm40c, with about a 16X reduction from the nominal corner. The worst power consumption corner is found to be FF0p385v125c, with about a 10X increase compared to the nominal corner.
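The corner names in Table III appear to encode process, supply, and temperature (for example, TT0p35v25c for typical process, 0.35 V, 25°C, with "m" marking a negative temperature). The sketch below decodes such names and derives the ±10% supply points; the naming pattern is inferred from the table, and the helper names are hypothetical.

```python
import re

# Inferred pattern: <process><volts>p<fraction>v[m]<temperature>c
CORNER_RE = re.compile(r"^(TT|SS|FF)(\d+)p(\d+)v(m?)(\d+)c$")


def parse_corner(name):
    """Split a corner name into (process, supply in V, temperature in C)."""
    match = CORNER_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized corner name: {name!r}")
    process, whole, frac, minus, temp = match.groups()
    supply_v = float(f"{whole}.{frac}")
    temp_c = -int(temp) if minus else int(temp)
    return process, supply_v, temp_c


def supply_corners(nominal_v, variation=0.10):
    """Low and high supply voltages for a symmetric fractional variation."""
    return (round(nominal_v * (1 - variation), 3),
            round(nominal_v * (1 + variation), 3))


print(parse_corner("FF0p385v125c"))
print(supply_corners(0.35))
```

The ±10% points of the 350 mV nominal supply are 315 mV and 385 mV, which match the supply values embedded in the slow and fast corner names.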

Finally, Table IV reports the implementation results of the cryptographic cores. In the final portfolio of the CAESAR competition [30], ASCON was selected as the primary choice for lightweight authenticated encryption, and AEGIS-128 was selected as the primary choice for high-performance authenticated encryption.

AEZ is another algorithm that was a finalist in the same competition. The library was assessed by comparing the synthesis results of the three designs using the proposed near-threshold library against those obtained with a commercial library operating in the super-threshold region at the same technology node. The RTL implementations used are the ones used for benchmarking the CAESAR candidates [31].
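The energy figures can be related to power and clock frequency with a one-line conversion. The sketch below assumes that the reported energy is energy per clock cycle, that is, total power divided by frequency, and that the power values in Table IV are in microwatts; both are assumptions, although together they reproduce the ASCON figure of about 0.466 pJ quoted in the abstract.

```python
def energy_per_cycle_pj(total_power_uw, freq_mhz):
    """Energy per clock cycle in pJ (uW divided by MHz gives pJ)."""
    return total_power_uw / freq_mhz


# ASCON entry of Table IV: total power 8.3804 at 18 MHz (units assumed uW).
ascon_pj = energy_per_cycle_pj(8.3804, 18)
print(round(ascon_pj, 6))
```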

### TABLE II
**Maximum Frequency Comparison with Foundry Commercial Library**

<table>
<thead>
<tr>
<th>Tested Core</th>
<th>Max Frequency, MHz</th>
<th>Ratio of Frequency Loss</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASCON</td>
<td>18</td>
<td>330</td>
</tr>
<tr>
<td>AEGIS-128</td>
<td>14</td>
<td>100</td>
</tr>
<tr>
<td>AEZ</td>
<td>16</td>
<td>125</td>
</tr>
</tbody>
</table>

### TABLE III
**Library Power and Frequency Across PVT Corners**

<table>
<thead>
<tr>
<th>Corner</th>
<th>Internal (nW)</th>
<th>Switching (nW)</th>
<th>Leakage (nW)</th>
<th>Total (nW)</th>
<th>Max Freq (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT0p35v25c</td>
<td>62.547</td>
<td>2.1999</td>
<td>6.1509</td>
<td>70.896</td>
<td>16</td>
</tr>
<tr>
<td>SS0p315vm40c</td>
<td>2.07</td>
<td>0.079</td>
<td>61.5</td>
<td>2.22</td>
<td>1</td>
</tr>
<tr>
<td>SS0p315v125c</td>
<td>4.13</td>
<td>0.165</td>
<td>19.0</td>
<td>23.7</td>
<td>5</td>
</tr>
<tr>
<td>FF0p385vm40c</td>
<td>295</td>
<td>9.86</td>
<td>2.29</td>
<td>307.0</td>
<td>52</td>
</tr>
<tr>
<td>FF0p385v125c</td>
<td>383.1</td>
<td>12.0</td>
<td>808</td>
<td>1203</td>
<td>67</td>
</tr>
</tbody>
</table>

V. CONCLUSION

The proposed near-threshold standard cell library shows significant energy savings when used in essential IoT applications. This energy saving comes at the cost of frequency reduction. The paper has provided solutions for finding the optimal Power-Performance-Area operating point. A technology-independent methodology is proposed for minimum standard cell layout architecture design. INWE was exploited together with a proposed body biasing technique that boosts performance in the NTV region. Three recent cryptographic cores from the CAESAR competition were used to benchmark the proposed library. The quality of improvement can be measured as the ratio of energy improvement to frequency reduction. ASCON achieves a ratio of 1.7, AEGIS-128 achieves a ratio of 2.5, and AEZ achieves a ratio of 4.1. Still, the achieved frequencies are sufficient for IoT applications.

VI. ACKNOWLEDGMENT

This work was partially funded by ONE Lab at Zewail City of Science and Technology and at Cairo University, NTRA, ITIDA, ASRT, and NSERC.

REFERENCES


### TABLE IV
**Power and Energy Comparison at Typical Corner Between Proposed Library and Foundry Library**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>ASCON @ 18 MHz</th>
<th>AEGIS-128 @ 14 MHz</th>
<th>AEZ @ 16 MHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Switching Power (µW)</td>
<td>2.1788</td>
<td>14.74</td>
<td>2.1999</td>
</tr>
<tr>
<td>Leakage Power (µW)</td>
<td>0.7979</td>
<td>7.3859</td>
<td>0.44421</td>
</tr>
<tr>
<td>Total Power (µW)</td>
<td>8.3804</td>
<td>24.09</td>
<td>70.1998</td>
</tr>
<tr>
<td>Internal Energy (pJ)</td>
<td>0.465578</td>
<td>5.0640</td>
<td>53.64375</td>
</tr>
<tr>
<td>Ratio of Energy Gain</td>
<td>32.65594</td>
<td>17.34519</td>
<td>162.8643</td>
</tr>
</tbody>
</table>


