Thesis submitted for the degree of Bachelor of Engineering

## High Speed Serial Data Link Transceiver

## Team members

Abdelrahman M. Sawaby
Abdelrahman Mohamed Elshorbge
Omar Tarek Abdelhalim
Mahmoud Ahmed Farghly
Mahmoud Sherif Taha

## Under the Supervision of

Prof. Mohamed Refky Amin,
Assistant Professor, Electronics and Electrical Communications
Engineering Department, Faculty of Engineering.

Prof. Hassan Mostafa,
Assistant Professor of Nanoelectronics, Bioelectronics, and Optoelectronics (Founder of the ONE Lab [Opto-Nano-Electronics]).

## Sponsored by:

## \{liCpedia



Aug. 2020


## Contents

Nomenclature

1. Introduction
1.1. Motivation
1.2. System overview
1.3. Problem Statement
1.4. Thesis Outline
2. Survey
2.1. Transmitter Survey
2.1.1. Finite Impulse Response
2.2. Receiver Analog Front End Survey
2.2.1. Bandgap Reference
2.2.2. Low Drop-out Voltage Regulator
2.2.3. Continuous Time Linear Equalizer
2.2.4. Variable Gain Amplifier
2.3. Termination Calibration Circuit
2.3.1. SR Latch
2.3.2. Comparator
2.4. Decision Feedback Equalizer
2.4.1. Functionality
2.4.2. DFE Architectures
2.4.3. DFE Blocks
3. Verilog-A Transceiver Model
3.1. Transmitter Blocks
3.1.1. Serializer and De-serializer
3.1.2. Finite Impulse Response and Driver
3.2. Receiver Blocks
3.2.1. Variable Gain Amplifier
3.2.2. Continuous Time Linear Equalizer
3.2.3. Decision Feedback Equalizer
3.3. Verilog-A Model Integration Results
4. The Design of the Receiver
4.1. BGR
4.1.1. Design Methodology
4.1.2. Simulation Results
4.1.3. Results Summary
4.2. LDO
4.2.1. Design Methodology
4.2.2. Simulation Results
4.2.3. Monte Carlo Simulation
4.2.4. Performance Summary
4.3. CTLE
4.3.1. Design Methodology and Procedure
4.3.2. Analysis of the Main Architecture of CTLE
4.3.3. Design Procedure and Parameters
4.3.4. Implementation of Variable $R_{S}$ and $C_{S}$ for Equalization Adapting
4.3.5. Full Schematic of CTLE with Offset Cancellation Circuit
4.3.6. Simulation Results
4.4. VGA
4.4.1. Proposed topology
4.4.2. Offset Cancellation
4.4.3. Common Mode
4.4.4. Fixed-Gain Amplifier and Buffer
4.4.5. Simulation \& Results
4.5. Termination Calibration Circuit
4.5.1. Latched Comparator
4.5.2. SR latch
4.5.3. Resistors with PMOS Slices
4.5.4. Current Mirror Circuit
4.5.5. Binary Search Counter
4.5.6. Simulation Results
4.6. Decision Feedback Equalizer
4.6.1. Slicer
4.6.2. Flip Flop
4.6.3. Gm Cell
4.6.4. Taps
4.6.5. Integration Results of The DFE Blocks
4.7. Clock and Data Recovery
4.7.1. Voltage Control Oscillators
4.7.2. Bang-Bang Phase Detector
4.7.3. Charge Pump
4.7.4. Simulation Results
5. System Integration Results
Acknowledgments
A. Verilog-A Codes
A.1. MUX Code
A.2. FIR and Driver
A.2.1. Flip-Flop Code
A.2.2. Driver
A.2.3. Matlab Code for FIR Coefficients
A.3. VGA Code
A.4. CTLE Code
A.5. DFE Codes
A.5.1. GM-Cell Code
A.5.2. Slicers Code
A.5.3. Taps Code
A.5.4. FLIP FLOP Code
Bibliography

## List of Figures

1.1. SerDes Transceiver High-Level Block Diagram.
2.1. Channel Response and Cursors Definition.
2.2. FIR Block Diagram.
2.3. The Conventional BGR.
2.4. The Schematic of The Modified Circuit.
2.5. The Schematic of The BGR.
2.6. The Schematic of The BGR.
2.7. The Schematic of The Proposed Design.
2.8. (a) Conventional LDO regulator and (b) Capless-LDO regulator.
2.9. Capless LDO Regulator Model for Stability Analysis.
2.10. Capless-LDO regulator.
2.11. One-Stage EA LDO Bode Plot.
2.12. Two-Stage EA LDO Block.
2.13. Two-Stage EA LDO Bode Plot.
2.14. Input-to-output ripple paths in Capless-LDO.
2.15. PSRR of CL-LDO.
2.16. LDO Model for Noise.
2.17. LDO major noise contributors.
2.18. Feedback Resistance Network.
2.19. Capless LDO with damping factor.
2.20. Capless LDO with transimpedance.
2.21. Capless LDO with adaptive biasing.
2.22. CL LDO Voltage subtractor.
2.23. Block-Level Representation of The Feed-Forward Ripple Cancellation LDO.
2.24. Equivalent Frequency Response
2.25. CTLE with adaptive degeneration.
2.26. $R_{s}$ and $C_{s}$ Adaptation.
2.27. CTLE with adaptive gain filters.
2.28. Schematic of VGA With Analog Control Signal.
2.29. Schematic of VGA With Digital Control Signal.
2.30. I/O termination resistor calibration circuit.
2.31. On-Chip Resistance With Slices.
2.32. SAFF.
2.33. Improved SR latch.
2.34. Two-stage OTA.
2.35. Strong Arm Latch.
2.36. Double tail latch comparator
2.37. Direct Full Rate DFE.
2.38. Unrolled Full Rate DFE.
2.39. Direct Half Full rate DFE.
2.40. Multiplexed Half rate DFE.
2.41. Loop Unrolled Half Rate DFE.
3.1. Schematic of The Test Bench.
3.2. The simulation result of the Serializer Verilog-A Model, the input is represented with the red line and the output is the red line.
3.3. Channel Response of a 30 inch FR4.
3.4. Circuit Model of The FIR.
3.5. CML Driver Architecture.
3.6. FIR Test-Bench.
3.7. Pulse Response of The FIR Block.
3.8. CTLE With Inductive Peaking.
3.9. CTLE Test-Bench.
3.10. Frequency Response of The CTLE.
3.11. Schematic of DFE.
3.12. Symbol of The DFE.
3.13. Waveform of Bit "One".
3.14. Waveform of Input Data "101011".
3.15. Waveform of Output Data and Clk.
3.16. Tx/Rx Verilog-A model Block Diagram.
3.17. Eye Diagram before The Channel.
3.18. Eye Diagram after The Channel.
3.19. Eye Diagram after The CTLE
3.20. Eye Diagram after The VGA.
4.1. Schematic of the BGR
4.2. Flow Chart of The Design Steps for The BGR.
4.3. $V_{\text{ref}}$ Versus Temperature.
4.4. $I_{\text{ref}}$ Versus Temperature.
4.5. $V_{\text{ref}}$ Versus Variations in Supply Voltage.
4.6. $V_{\text{ref}}$ Versus Temperature Across Corners.
4.7. PSRR Versus Frequency.
4.8. PSRR Versus Temperature at 1 kHz
4.9. Transient Response of The BGR.
4.10. Monte Carlo Simulation of The BGR at Nominal Corner.
4.11. Monte Carlo Simulation of The BGR at Nominal Corner.
4.12. Monte Carlo Simulation of The BGR at FF and SS Corner
4.13. Bleeding Circuit.
4.14. Schematic of The LDO.
4.15. Quiescent Current of The LDO
4.16. Power Efficiency of The LDO.
4.17. Temperature Sweep of Quiescent Current of The LDO.
4.18. Range of Quiescent Current of The LDO.
4.19. Loop Gain and Phase of The LDO at Max Load Current
4.20. Loop Gain and Phase of The LDO at Min Load Current
4.21. Temperature Sweep of PSRR of The LDO at Max Load Current
4.22. Range of PSRR of The LDO at Max load current.
4.23. Temperature Sweep of PSRR of The LDO at Min Load Current
4.24. Range of PSRR of The LDO at Min load current.
4.25. Range of Output noise of The LDO
4.26. Transient load current of AFE LDO.
4.27. Nominal Load Transient Response of AFE LDO
4.28. Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=5 \mathrm{pF}$
4.29. Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=100 \mathrm{pF}$
4.30. Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=200 \mathrm{pF}$
4.31. Transient load current of Digital LDO
4.32. Nominal Load Transient Response of Digital LDO.
4.33. Worst Performance of Load Transient Response of Dig LDO at Load Capacitance $=1 \mathrm{pF}$.
4.34. Worst Performance of Load Transient Response of Dig LDO at Load Capacitance $=150 \mathrm{pF}$
4.35. Output Voltage of The LDO VS. Load Current.
4.36. Temperature Sweep of Output Voltage of The LDO.
4.37. Temperature Sweep of Load Regulation of The LDO
4.38. Transient Input Supply Voltage of AFE LDO
4.39. Nominal Line Transient Response of AFE LDO
4.40. Worst Performance of Line Transient Response of AFE LDO at Max Load Capacitance.
4.41. Transient Input Supply Voltage of Digital LDO
4.42. Nominal Line Transient Response of Digital LDO.
4.43. Worst Performance of Line Transient Response of Digital LDO at Max Load Capacitance.
4.44. Output Voltage of The LDO VS. Supply Voltage.
4.45. Temperature Sweep of Output Voltage of The LDO
4.46. Temperature Sweep of Line Regulation of The LDO
4.47. Monte Carlo for Overshoot of AFE
4.48. Monte Carlo for Undershoot of AFE
4.49. Monte Carlo of Steady State Error of AFE LDO.
4.50. Monte Carlo of Quiescent Current of AFE LDO
4.51. Schematic of The Conventional CTLE.
4.52. CTLE With Shunt Inductor.
4.53. Loading Network of The CTLE
4.54. Slices of The $R_{S}$ Network.
4.55. Slices of The $C_{S}$ Network.
4.56. Slices of $R_{D}$ Network
4.57. Current Steering DAC
4.58. Full CTLE Schematic with Offset Cancellation Circuit.
4.59. CTLE Frequency Response with $R_{S}$.
4.60. CTLE Frequency Response with $C_{S}$.
4.61. Simulation Results of The CTLE.
4.62. Monte Carlo Simulation of The CTLE.
4.63. The Schematic of The Proposed VGA.
4.64. Schematic of Offset Cancellation Circuit.
4.65. Schematics of Fixed-Gain Amplifier and Buffer.
4.66. VGA Gain With Changing $R_{S}$.
4.67. VGA Gain With Changing VDD $\pm 5 \%$
4.68. VGA Gain With Changing $I_{\text{ref}} \pm 15 \%$.
4.69. VGA Gain With Process Corners.
4.70. VGA Gain With Varying Temperature From $-40^{\circ} \mathrm{C}$ to $100^{\circ} \mathrm{C}$.
4.71. Termination Calibration Circuit
4.72. Double Tail Latched Comparator.
4.73. Transient Behavior of The Proposed Comparator.
4.74. SR latch circuit.
4.75. Resistors with Slices Circuit.
4.76. Current Mirror circuit.
4.77. Comparator Transient Response.
4.78. Delay versus Common mode Voltage ($V_{\text{diff}}=5$ mV).
4.79. Delay versus Differential Voltage ($V_{CM}=600$ mV and 1 V).
4.80. Avg. Power Consumption versus Common mode Voltage ($V_{\text{diff}}=5$ mV).
4.81. Avg. Power Consumption versus Differential Voltage ($V_{CM}=600$ mV and 1 V)
4.82. Delay versus Common mode Voltage ($V_{\text{diff}}=5$ mV) with Process Corners.
4.83. Avg. Power Consumption versus Common Mode Voltage ($V_{\text{diff}}=5$ mV).
4.84. Monte Carlo Offset Histogram.
4.85. Calibrated Resistance Response (TT, Temp. $=27^{\circ} \mathrm{C}$).
4.86. Calibrated Resistance Response at Slow Corner (SS, Temp. $=125^{\circ} \mathrm{C}$).
4.87. Calibrated Resistance Response at Fast Corner (FF, Temp. $=-40^{\circ} \mathrm{C}$).
4.88. Full Schematic of The DFE Block.
4.89. SR Latch Implementing an AND-OR Structure.
4.90. A Conventional Dynamic Latched Comparator.
4.91. The timing diagram shows the states and charge induced on the gate of M1, M2
4.92. Clocked NMOS Capacitors.
4.93. Comparator Transient Response.
4.94. Delay Versus Common Mode Voltage ($V_{\text{diff}}=100$ mV).
4.95. Average Power Consumption Versus Common Mode Voltage ($V_{\text{diff}}=100$ mV).
4.96. Delay Versus Differential Voltage ($V_{CM}=700$ mV).
4.97. Average Power Consumption Versus Differential Voltage ($V_{CM}=700$ mV).
4.98. Delay Versus Load Capacitance.
4.99. The Delay and Average Power Versus Differential Voltage with Process Corners
4.100. Conventional CML Latch.
4.101. Novel CML Latch.
4.102. Modified Novel CML Latch.
4.103. CML Circuit Main Architecture
4.104. Main Structure of SAFF.
4.105. Conventional SAFF
4.106. Modified SR Latch
4.107. The Minimum Propagation Delay of the FF.
4.108. The Propagation Delay and The Setup Time of the FF
4.109. The Process Corners of the FF
4.110. Temperature Variations for the FF
4.111. $t_{CQ}$ with Loading Capacitance for the FF
4.112. Schematic of The Proposed Gm-Cell.
4.113. Equivalent Transconductance (gm) of The GM-Cell Versus Frequency.
4.114. Schematic of a Single DFE Tap.
4.115. Timing Diagram for Even Data.
4.116. Timing Diagram for Odd Data
4.117. Block Diagram of The CDR.
4.118. Schematic of The Delay Unit.
4.119. The Schematic of The Controlling Circuit of The VCO.
4.120. The Transient Response of The VCO.
4.121. Supply Current of The VCO Versus Time.
4.122. Frequency of The VCO Versus $V_{\text{ctrl}}$.
4.123. Phase Noise Versus Frequency Offset.
4.124. Bang-bang Phase Detector Circuit
4.125. Dual Edge Triggered Flip Flop circuit.
4.126. XOR gate circuit.
4.127. Charge Pump circuit.
4.128. DETFF transient response
4.129. BBPD Outputs While Early Clock
4.130. BBPD Outputs While Late Clock.
4.131. BBPD Outputs While Locking
4.132. BBPD Outputs Versus Clock Delay with Data.
4.133. $V_{\text{ctrl}}$ Transient Response (5 GHz Clock).
4.134. $V_{\text{ctrl}}$ Transient Response (10 Gbps).
4.135. Data and Clock Transient Response (5 GHz Clock).
4.136. Data and Clock Transient Response (10 Gbps).
4.137. Data and Clock Eye Diagram (5 GHz Clock)
4.138. Data and Clock Eye Diagram (10 Gbps).
5.1. LDO Output Current and Output Voltage across Corners.
5.2. BGR Current across Corners.
5.3. The Frequency Response of The CTLE and The VGA across Corners.
5.4. Eye Diagram after Channel with FIR Equalization.
5.5. The Frequency Response of The CTLE and The VGA across Corners.
5.6. Eye Diagram for the Data on the Even Summing Node.
5.7. Eye Diagram for the Data on the Odd Summing Node.
5.8. Timing Diagram for The Data.

## List of Tables

2.1. Comparison of The FIR.
2.2. Comparison Between Different Topologies.
2.3. Comparison Between LDO and HDO.
2.4. Comparison Between Different Topologies.
2.5. Comparison of The CTLE.
2.6. Comparison Between The Different Topologies of The VGA.
2.7. Comparison between Different Topologies at $V_{CM}=0.6$ V, $\Delta V_{in}=1$ mV.
3.1. VGA Model Gain VS. Input Code.
3.2. Design Parameters of The CTLE.
4.1. Comparison Between Different Topologies.
4.2. Stability Performance Summary of LDO at Max Current
4.3. Load Transient Response Nominal Performance Summary of AFE LDO
4.4. Worst Load Transient Response Performance Summary of AFE LDO.
4.5. Load Transient Response Nominal Performance Summary of Digital LDO
4.6. Worst Load Transient Response Performance Summary of Digital LDO
4.7. Line Transient Response Nominal Performance Summary of AFE LDO
4.8. Worst Line Transient Response Performance Summary of AFE LDO.
4.9. Line Transient Response Nominal Performance Summary of Digital LDO.
4.10. Worst Line Transient Response Performance Summary of Digital LDO.
4.11. Performance Summary of AFE LDO.
4.12. Performance Summary of Digital LDO.
4.13. Comparison Between this Work and Different Topologies.
4.14. Channel Loss Equalized with Digital Control Signals.
4.15. The Required Specs of The VGA Block.
4.16. VGA Gain and Corresponding Control Signals.
4.17. Performance Summary and Comparison With This Work.
4.18. Comparison between Different Topologies.
4.19. Calibrated Resistances Values across Corners.
4.20. Comparison Between The Different Topologies of The FF.

## Nomenclature

AC Alternating Current.
AFE Analog Front End.
AMS Analog/Mixed-Signal.
BBPD Bang-Bang Phase Detector.
BER Bit Error Rate.
BGR Bandgap Reference.
CMFB Common Mode Feedback.
CML Current Mode Logic.
CP Charge Pump.
CTAT Complementary-To-Absolute-Temperature.
CTLE Continuous Time Linear Equalizer.
De-MUX Demultiplexer.
DETFF Dual Edge Triggered Flip Flop.
DFE Decision Feedback Equalizer.
EA Error Amplifier.
FFRC Feed-Forward Ripple Cancellation.
FIR Finite Impulse Response.
HLFF Hybrid-Latch Flip-Flop.
ICs Integrated Circuits.
ISI Intersymbol Interference.
LDO Low-Dropout Regulator.
LMS Least Mean Square.
MS Master-Slave.
MUX Multiplexer.
NIC Negative Impedance Converter.
OP-AMP Operational Amplifier.
OTA Operational Transconductance Amplifier.
PCB Printed Circuit Board.
PD Phase Detector.
PFD Phase and Frequency Detector.
PG Pulse Generator.
PSR Power Supply Rejection.
PSS Periodic Steady State.
PTAT Proportional-To-Absolute-Temperature.
PVT Process, Voltage, and Temperature.
RF Radio Frequency.
RLS Recursive Least Square.
Rx Receiver.
SAFF Sense Amplifier Based Flip Flop.
SDFF Semi-Dynamic Flip-Flop.
SerDes Serializer and Deserializer.
SL Slave Latch.
SoC System-on-Chip.
Tx Transmitter.
UGF Unity Gain Frequency.
UI Unit Interval.
VCO Voltage-Controlled Oscillator.
VGA Variable Gain Amplifier.

## 1. Introduction

### 1.1. Motivation

With the continuous increase of on-chip computation capacity and the exponential growth of data-intensive applications, high-speed data transmission through serial links has become the backbone of modern communication systems. To satisfy the massive data-exchange requirements, the data rates of such serial links have risen to tens of Gb/s [1].

Modern high-speed electronic systems are built from Integrated Circuits (ICs) operating at ever-increasing data rates, yet input/output performance remains the bottleneck that limits overall system performance. Serial data transfer is the most efficient way to communicate large amounts of data quickly between chips on printed circuit boards, through copper cables, and through short-, medium-, and long-reach optical fiber.

A Serializer/Deserializer (SerDes) is a circuit that converts parallel data into serial data and vice versa. It is widely used in everyday applications such as Gigabit Ethernet systems, wireless network routers, storage applications, and fiber-optic communication systems.

Serial transmission has several advantages over parallel, multichannel transmission. First, the data-skew problem is eliminated. Because the parallel data is converted to serial data within the chip, only one data stream travels through the channel, so timing mismatch between channels no longer exists, and data timing is easier to control inside the chip than across lossy channels.

Second, crosstalk between parallel lines is reduced, because a serial link has only one data lane.

Third, converting parallel data into serial data reduces the number of pads, which significantly reduces chip area and, as a result, cost.

Because the channel response is dispersive and frequency dependent, each bit's pulse spreads into neighboring bit periods, which causes Intersymbol Interference (ISI). ISI can be removed by adaptive equalizers, at the cost of more complex serial-link hardware.
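The superposition behind ISI can be illustrated with a short Python sketch: the sample taken for the current bit equals the main cursor times that bit plus the pre- and post-cursors times the neighboring bits, so the worst-case eye opening is the main cursor minus the sum of the other cursor magnitudes. The cursor values below are hypothetical, chosen only for illustration, and are not taken from the 30 inch FR4 channel used in this work.

```python
from itertools import product

# Hypothetical UI-spaced pulse-response samples (volts), chosen only for
# illustration: one pre-cursor, the main cursor, and two post-cursors.
PRE_CURSORS = [0.05]
MAIN_CURSOR = 0.40
POST_CURSORS = [0.15, 0.06]
CURSORS = PRE_CURSORS + [MAIN_CURSOR] + POST_CURSORS


def received_sample(bits):
    """Sample taken at the main-cursor instant of the current bit.

    `bits` holds NRZ symbols (+1/-1) ordered like CURSORS:
    (b[n+1], b[n], b[n-1], b[n-2]); the pre-cursor multiplies the next
    bit and the post-cursors multiply the previous bits.
    """
    return sum(c * b for c, b in zip(CURSORS, bits))


def worst_case_half_eye():
    """Smallest |sample| over every bit pattern (main cursor minus total ISI)."""
    return min(abs(received_sample(p)) for p in product((-1, 1), repeat=len(CURSORS)))


total_isi = sum(abs(c) for c in PRE_CURSORS + POST_CURSORS)
print(f"total ISI = {total_isi:.2f} V")
print(f"worst-case half-eye opening = {worst_case_half_eye():.2f} V")
```

If the summed ISI ever exceeded the main cursor, the worst-case opening would collapse to zero, which is the closed-eye situation that the equalizers described in this work are meant to prevent.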

### 1.2. System overview

The block diagram of the whole system is shown in Figure 1.1.


Figure 1.1.: SerDes Transceiver High-Level Block Diagram.

A high-speed serial link consists mainly of a SerDes: the serializer in the Transmitter (Tx) and the deserializer in the Receiver (Rx). The main goals of a SerDes are a high bit rate, low power consumption, and the elimination of data skew. As shown in Figure 1.1, the transmitter consists of a Multiplexer (MUX) that converts parallel data into serial data, followed by a pre-driver and a driver that allow the output signal to be transmitted properly. The Finite Impulse Response (FIR) filter performs feed-forward equalization on the transmitted data to pre-compensate the ISI introduced by the channel.

The receiver consists of a Continuous Time Linear Equalizer (CTLE), which boosts the high-frequency components of the signal to compensate the high-frequency attenuation of the channel; a Variable Gain Amplifier (VGA), which provides constant gain across the supported frequency range; a Decision Feedback Equalizer (DFE), which performs feedback equalization on the received data to cancel the ISI resulting from the whole system; and finally a Demultiplexer (De-MUX), which converts the serial data back into parallel data.

A termination is used in both the transmitter and the receiver to achieve impedance matching and avoid multiple reflections. To supply power to the system, a Bandgap Reference (BGR) circuit and Low-Dropout Regulators (LDOs) are used.

In this work, all the SerDes components mentioned above are modeled in Verilog-A, simulated, and integrated with each other, and the complete Verilog-A SerDes system is simulated at a data rate of $10 \mathrm{~Gb} / \mathrm{s}$ over a 30 inch FR4 channel. In addition, a complete transistor-level design of an adaptive receiver is carried out in 65 nm CMOS technology and integrated with the Verilog-A Tx modules, and the whole system is simulated at $10 \mathrm{~Gb} / \mathrm{s}$ over the same 30 inch FR4 channel.

### 1.3. Problem Statement

This work is a high-speed SerDes link transceiver in 65 nm CMOS with a data rate of $10 \mathrm{~Gb} / \mathrm{s}$ (USB 3.2 bit rate) for multiple serial protocols, a Bit Error Rate (BER) below $10^{-12}$, and low power consumption. It is a chip-to-chip transceiver over an FR4 trace channel. The full transceiver is modeled in Verilog-A to test its performance with the channel and to measure the BER and power consumption. A full receiver is designed down to the transistor level, stage by stage. The equalization type and blocks are chosen based on the channel response, and the equalizers adapt to different channels to cancel the ISI and open the eye diagram of the received signal enough to achieve the required BER.

### 1.4. Thesis Outline

The thesis is organized as follows:

- Chapter 2 presents a survey of the transceiver blocks.
- Chapter 3 presents the Verilog-A model of the transceiver system.
- Chapter 4 presents the design of the receiver blocks.
- Chapter 5 presents the simulation results of the proposed design.


## 2. Survey

In this chapter, the $\mathrm{Tx} / \mathrm{Rx}$ blocks are surveyed in detail. Each section briefly describes the block's functionality, compares the different topologies of that block, and presents the topology used in this design together with the reason for choosing it. The first section is the Tx survey, which covers the FIR block. The second section is the Rx Analog Front End (AFE) survey, which covers the BGR, LDO, CTLE, and VGA. The third section surveys the termination calibration circuit, and the last section covers the DFE.

### 2.1. Transmitter Survey

### 2.1.1. Finite Impulse Response

### 2.1.1.1. Introduction

The main challenge in wired communication systems is that the channel behaves as a linear low-pass filter (LPF) with severe attenuation at high frequencies. This response distorts each data symbol in the time domain and causes ISI. As data rates increase, SerDes systems need equalization so that the received data can be recognized reliably enough to be sampled.

One equalization technique is to add an intended distortion to the signal before transmission that cancels the distortion added by the channel, and the FIR filter is based on this concept: it adds pre-cursor and post-cursor components to the main cursor of each transmitted data symbol. To do so, the received symbol response is divided into cursors of one UI width (pre-cursors, main cursor, and post-cursors), as shown in Figure 2.1. The goal is then to obtain a clean symbol consisting of the main cursor only, with the pre- and post-cursors cancelled.
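The cursor definition of Figure 2.1 can be sketched in Python: the pulse response is sampled once per UI, aligned to its peak, which becomes the main cursor; samples before the peak are pre-cursors and samples after it are post-cursors. The single-pole RC channel, its time constant, and the oversampling factor below are assumptions for illustration only, not a model of the actual FR4 channel.

```python
import math

UI = 100e-12          # unit interval at 10 Gb/s (s)
SAMPLES_PER_UI = 32   # oversampling of the simulated waveform (assumption)
TAU = 80e-12          # hypothetical single-pole channel time constant (s)


def step_response(t):
    """Unit-step response of a first-order RC low-pass channel."""
    return 1.0 - math.exp(-t / TAU) if t > 0 else 0.0


def pulse_response(t):
    """Response to a 1 V pulse lasting one UI."""
    return step_response(t) - step_response(t - UI)


def extract_cursors(n_pre=1, n_post=3, span_ui=20):
    dt = UI / SAMPLES_PER_UI
    wave = [pulse_response(k * dt) for k in range(span_ui * SAMPLES_PER_UI)]
    peak = max(range(len(wave)), key=wave.__getitem__)  # main-cursor sample
    # Sample once per UI around the peak; out-of-range samples count as 0.
    idx = [peak + k * SAMPLES_PER_UI for k in range(-n_pre, n_post + 1)]
    samples = [wave[i] if 0 <= i < len(wave) else 0.0 for i in idx]
    return samples[:n_pre], samples[n_pre], samples[n_pre + 1:]


pre, main, post = extract_cursors()
print("pre-cursors: ", [round(c, 3) for c in pre])
print("main cursor: ", round(main, 3))
print("post-cursors:", [round(c, 3) for c in post])
```

With this causal single-pole model the pre-cursor comes out as zero and the post-cursors decay geometrically; a real backplane or FR4 channel, whose pulse response rises more gradually, typically also shows non-zero pre-cursors.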
FIR equalizers based on this technique can be built with different topologies:

- Direct FIR: As shown in Figure 2.2, it is composed of parallel output drivers, one per tap, separated by delay units of one UI. The gain of each driver sets the corresponding tap coefficient. This topology has low power consumption and low complexity; an example is given in [2].


Figure 2.1.: Channel Response and Cursors Definition.

- Segmented DAC: It is based on lookup tables stored in a RAM map table that hold a channel-specific equalization; the resulting control codes switch parallel Tx segments according to the table contents. This topology offers very flexible equalization but has high power consumption and complexity; an example is given in [3].


Figure 2.2.: FIR Block Diagram.
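A direct FIR computes $y[n]=\sum_{k} c_{k} x[n-k]$, where each coefficient $c_{k}$ is the gain of one parallel driver. The Python sketch below picks three taps (one pre-cursor, main, and one post-cursor tap) by zero-forcing, i.e. by solving a small linear system so that the equalized pulse response has a unit main cursor and zero first pre- and post-cursor, and then scales the taps so that $\sum_{k}\left|c_{k}\right|=1$, assuming the driver has a fixed peak swing. Zero-forcing is one common choice and is not claimed to be the method used in this work; the channel cursors are hypothetical.

```python
# Hypothetical UI-spaced channel cursors: offset -> amplitude (V/V).
CHANNEL = {-1: 0.05, 0: 0.40, 1: 0.15, 2: 0.06}
TAP_OFFSETS = [-1, 0, 1]  # pre-cursor, main, and post-cursor taps


def solve(a, b):
    """Solve a small dense linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def zero_forcing_taps(channel, tap_offsets):
    # Equalized cursor at offset m is sum_j c_j * h[m - j]; force it to
    # 1 at m = 0 and to 0 at the other offsets covered by the taps.
    a = [[channel.get(m - j, 0.0) for j in tap_offsets] for m in tap_offsets]
    b = [1.0 if m == 0 else 0.0 for m in tap_offsets]
    return solve(a, b)


def equalize(channel, taps, tap_offsets):
    """Convolve the channel cursors with the FIR taps."""
    lo = min(channel) + min(tap_offsets)
    hi = max(channel) + max(tap_offsets)
    return {m: sum(c * channel.get(m - j, 0.0) for c, j in zip(taps, tap_offsets))
            for m in range(lo, hi + 1)}


taps = zero_forcing_taps(CHANNEL, TAP_OFFSETS)
scale = sum(abs(c) for c in taps)   # fixed driver swing: sum(|c|) = 1
taps = [c / scale for c in taps]
eq = equalize(CHANNEL, taps, TAP_OFFSETS)
print("taps:", [round(c, 3) for c in taps])
print("equalized cursors:", {m: round(v, 3) for m, v in eq.items()})
```

The pre- and post-cursor taps come out negative (de-emphasis), and the first pre- and post-cursors of the equalized response are forced to zero, while the cursors outside the 3-tap span are only reduced, not removed; this residual is the kind of ISI left for the receiver equalizers.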

### 2.1.1.2. Comparison

Table 2.1 gives a brief comparison between the main topologies.

|  | Direct FIR | Segmented DAC |
| :---: | :---: | :---: |
| Power Consumption | Low | High |
| Complexity | Low | High |
| Flexibility and adaptability | Lower | Very Flexible |
| Output capacitance | Higher | Small |

Table 2.1.: Comparison of The FIR.

### 2.2. Receiver Analog Front End Survey

### 2.2.1. Bandgap Reference

### 2.2.1.1. Functionality and Specs Required

Bandgap voltage references are essential blocks in analog/mixed-signal (AMS) and digital systems. The function of the BGR circuit is to generate a voltage that is stable over Process, Voltage, and Temperature (PVT) variations. The conventional BGR has an output voltage of approximately 1.2 V.

The specifications of the BGR in this project are an output voltage of 0.6 V, a maximum supply current of $10 \mu \mathrm{~A}$, a temperature coefficient of $50 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ over the range from $0{ }^{\circ} \mathrm{C}$ to $80^{\circ} \mathrm{C}$, and a supply voltage of 1.8 V. To obtain the 0.6 V reference voltage, a sub-1-V BGR is used instead of the conventional BGR.
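The temperature-coefficient requirement can be translated into an allowed reference-voltage window using the common box-method definition $TC=\left(V_{\max }-V_{\min }\right) /\left(V_{\text {nom }} \cdot\left(T_{\max }-T_{\min }\right)\right)$. The Python sketch below computes that window for the 0.6 V, $0{ }^{\circ} \mathrm{C}$ to $80^{\circ} \mathrm{C}$ specification and evaluates the TC of a sampled $V_{\text {ref }}(T)$ curve. The parabolic curve is a hypothetical example of residual curvature, not a simulation result from this work, and using the target voltage as $V_{\text {nom }}$ is an assumption.

```python
V_NOM = 0.6               # target reference voltage (V)
T_MIN, T_MAX = 0.0, 80.0  # specified temperature range (degC)
TC_SPEC_PPM = 50.0        # specified temperature coefficient (ppm/degC)


def box_tc_ppm(temps_c, vref, v_nom=V_NOM):
    """Box-method temperature coefficient in ppm/degC."""
    span_t = max(temps_c) - min(temps_c)
    return (max(vref) - min(vref)) / (v_nom * span_t) * 1e6


# Largest peak-to-peak drift of V_ref allowed by the spec.
max_drift = TC_SPEC_PPM * 1e-6 * V_NOM * (T_MAX - T_MIN)
print(f"allowed V_ref window: {max_drift * 1e3:.2f} mV")

# Hypothetical curvature of a first-order-compensated reference: a parabola
# peaking mid-range and sagging 1.5 mV at both ends of the range.
temps = [T_MIN + k for k in range(int(T_MAX - T_MIN) + 1)]
t_mid = (T_MIN + T_MAX) / 2
vref = [V_NOM - 1.5e-3 * ((t - t_mid) / (t_mid - T_MIN)) ** 2 for t in temps]
tc = box_tc_ppm(temps, vref)
print(f"TC = {tc:.1f} ppm/degC ({'meets' if tc <= TC_SPEC_PPM else 'fails'} the spec)")
```

Under these assumptions the whole $V_{\text {ref }}$ excursion over temperature must stay within 2.4 mV, which is why the curvature left after first-order PTAT/CTAT cancellation matters.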

### 2.2.1.2. Different Topologies

There are many techniques for designing a BGR. The conventional design [4] is shown in Figure 2.3. The idea of the BGR is to combine two voltages: one Proportional-To-Absolute-Temperature (PTAT) and the other Complementary-To-Absolute-Temperature (CTAT). In Figure 2.3, the Operational Amplifier (OP-AMP) forces nodes X and Y to the same voltage, so the voltage drop across $R_{1}$ equals the difference between $V_{b e 2}$ and $V_{b e 1}$, which is the PTAT voltage. The CTAT voltage is $V_{b e 3}$. Adding these two voltages with appropriate weights generates an almost temperature-independent voltage.


Figure 2.3.: The Conventional BGR.

The issue with the conventional design is that the reference voltage cannot be lower than about 0.7 V, because the $V_{b e 3}$ term appears directly in the output voltage.

To overcome this issue, a simple modification can be applied to the circuit [4]. The schematic of this modification is shown in Figure 2.4. In this design, the $V_{\text {ref }}$ is given by Equation 2.1.

$$
\begin{equation*}
V_{r e f}=\frac{R_{3}}{R_{2}} \cdot\left[V_{e b 2}+\frac{R_{2}}{R_{1}}\left(V_{T} \cdot \ln (N)+\frac{R_{2}}{R_{2 A 2}} \cdot V_{o s}\right)\right] \tag{2.1}
\end{equation*}
$$



Figure 2.4.: The Schematic of The Modified Circuit.

Another topology [5] is shown in Figure 2.5. Its $V_{\text {ref }}$ is given by Equation 2.2, in which $V_{b e}$ is the CTAT voltage and $V_{T}$ is the PTAT voltage.

$$
\begin{equation*}
V_{\text {ref }}=\frac{R_{4}}{\left(R_{2}+2 R_{3}\right)}\left[V_{b e}+\frac{\left(R_{2}+2 R_{3}\right)}{R_{1}} V_{T} \ln (n)\right] \tag{2.2}
\end{equation*}
$$



Figure 2.5.: The Schematic of The BGR.

Another design, proposed in [6], is shown in Figure 2.6. Its drawback is that the output voltage is 1.2 V, so a simple modification based on [7] is applied to this design to obtain the required reference voltage.


Figure 2.6.: The Schematic of The BGR.

The proposed design is shown in Figure 2.7. In this design, transistor $Q_{3}$ is removed, and $R_{1}$ and $R_{4}$ are added to introduce the $V_{b e}$ term into the $V_{r e f}$ equation. The $V_{r e f}$ equation of the proposed design is given in Equation 2.3.

$$
\begin{equation*}
V_{ref}=\frac{R_{3}}{R_{4}}\left(\frac{\Delta V_{B E}}{R_{2}} \cdot R_{4}+\left|V_{B E}\right|\right) \tag{2.3}
\end{equation*}
$$



Figure 2.7.: The Schematic of The Proposed Design.
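Equation 2.3 can be used to size the resistor ratios. Assuming $\Delta V_{B E}=V_{T} \ln (n)$, with $n$ the emitter-area ratio of the two bipolar transistors as in Equation 2.2, the PTAT term is $\left(R_{3} / R_{2}\right) V_{T} \ln (n)$ and the CTAT term is $\left(R_{3} / R_{4}\right)\left|V_{B E}\right|$. Zero first-order temperature coefficient requires $R_{4} / R_{2}=-\left(d\left|V_{B E}\right| / d T\right) /\left((k / q) \ln (n)\right)$, after which $R_{3} / R_{4}$ scales the roughly 1.2 V bandgap sum down to the 0.6 V target. The Python sketch below does this calculation; $n$, $\left|V_{B E}\right|$, and its slope are typical textbook values assumed for illustration, not parameters of the 65 nm process.

```python
import math

K_OVER_Q = 8.617e-5     # Boltzmann constant / electron charge (V/K)
T0 = 300.0              # design temperature (K)
N = 8                   # emitter-area ratio (assumption)
V_BE = 0.65             # |V_BE| at T0 (V), assumed
DVBE_DT = -1.8e-3       # d|V_BE|/dT (V/K), assumed typical value
V_REF_TARGET = 0.6      # required reference voltage (V)


def design_ratios():
    dvbe_ptat = K_OVER_Q * T0 * math.log(N)        # Delta V_BE = V_T ln(n)
    # Zero first-order TC: (R3/R2)(k/q)ln(n) + (R3/R4) d|V_BE|/dT = 0
    r4_over_r2 = -DVBE_DT / (K_OVER_Q * math.log(N))
    bandgap_sum = r4_over_r2 * dvbe_ptat + V_BE     # bracket of Eq. 2.3 (~1.2 V)
    r3_over_r4 = V_REF_TARGET / bandgap_sum
    return r4_over_r2, r3_over_r4, bandgap_sum


def vref(temp_k, r4_over_r2, r3_over_r4):
    """Eq. 2.3 with a linear |V_BE|(T) model around T0."""
    v_be = V_BE + DVBE_DT * (temp_k - T0)
    dvbe = K_OVER_Q * temp_k * math.log(N)
    return r3_over_r4 * (dvbe * r4_over_r2 + v_be)


r42, r34, vbg = design_ratios()
print(f"R4/R2 = {r42:.2f}, R3/R4 = {r34:.3f}, bandgap sum = {vbg:.3f} V")
for t_c in (0, 27, 80):
    print(f"T = {t_c:3d} C: Vref = {vref(t_c + 273.15, r42, r34) * 1e3:.2f} mV")
```

Because $\left|V_{B E}\right|$ is modeled as exactly linear in temperature, the computed $V_{\text {ref }}$ is flat by construction; a real $V_{B E}$ has curvature, so the simulated reference would still show a small residual parabola.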

### 2.2.1.3. Comparison

Table 2.2 briefly compares the topologies discussed above. Based on this comparison, the most suitable design is [6], with some modifications.

|  | Ka Nang Leung [4] | Mehesh [5] | H. Omran [6] |
| :--- | :---: | :---: | :---: |
| Technology | $0.6 \mu \mathrm{~m}$ |  | 90 nm |
| Min. Supply Voltage | 0.98 V | 1 V |  |
| Supply Current | $18 \mu \mathrm{~A}$ | $25 \mu \mathrm{~A}$ | $10 \mu \mathrm{~A}$ |
| $V_{\text {ref }}$ | 603 mV | 540 mV | 1.2 V |
| TC | $15 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ (0 to $100^{\circ} \mathrm{C}$) | $109 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ ($-40$ to $125^{\circ} \mathrm{C}$) | $13 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ ($-40$ to $125^{\circ} \mathrm{C}$) |

Table 2.2.: Comparison Between Different Topologies.

### 2.2.2. Low Drop-out Voltage Regulator

LDO voltage regulators are linear regulators and essential building blocks in power-management systems. Power-management systems for microprocessors and portable devices often use multiple LDO regulators to provide a regulated supply voltage with minimal ripple to supply-noise-sensitive blocks [8].

To meet customer needs, industry is pushing toward complete System-on-Chip (SoC) solutions that include power management. The conventional LDO voltage regulator requires a relatively large output capacitor in the microfarad range, which cannot be realized in current IC fabrication technologies. Each LDO regulator therefore needs an external pin for a board-mounted output capacitor, which leads to poor pin/pad utilization. The capacitorless LDO replaces the large external output capacitor with an internal capacitor of 1 nF [9].

Current LDO research therefore focuses on removing this external capacitor and moving the dominant pole from the output node to an internal node while maintaining stability, good transient response and high power-supply rejection, as shown in Figure 2.8. Removing the capacitor also reduces the pin count and the overall board cost, making the design suitable for SoC applications.


Figure 2.8.: (a) Conventional LDO regulator and (b) Capless-LDO regulator.

### 2.2.2.1. Specs Required

Two LDOs are needed: one for the AFE blocks and one for the digital blocks. The LDO specifications in this project are a reference voltage of 0.6 V, an output voltage of 1.2 V with a steady-state error of $1 \%$, an overshoot of $10 \%$, a maximum load current of 1 mA, a supply voltage of 1.8 V, and a reference current of $5 \mu \mathrm{~A}$. The AFE and digital LDOs differ only in the minimum load current that each must support.
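Two loop requirements follow directly from these numbers. Assuming the ideal static error of the regulation loop is $1/(1+T_0)$, with $T_0$ the DC loop gain (offsets and reference tolerance ignored), the 1% steady-state error bounds $T_0$, and the 0.6 V reference with a 1.2 V output fixes the feedback factor $\beta$. A minimal sketch:

```python
import math

V_REF = 0.6     # V, from the specification
V_OUT = 1.2     # V, from the specification
MAX_ERR = 0.01  # 1 % steady-state error

beta = V_REF / V_OUT              # feedback factor of the resistive divider
t0_min = 1.0 / MAX_ERR - 1.0      # from error = 1 / (1 + T0) <= MAX_ERR
print(f"beta = {beta:.2f}")
print(f"T0 >= {t0_min:.0f} ({20 * math.log10(t0_min):.1f} dB)")
```

Since $T_0$ already includes $\beta = 0.5$, the error amplifier and pass device together would need a gain of about 198 (roughly 46 dB) under this idealization.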

### 2.2.2.2. Block diagram

It consists of a pass element, an error amplifier and a resistive feedback network. The feedback network is a resistive voltage divider that delivers a scaled version of the output voltage, equal to the reference voltage when the output is at its nominal value. The error amplifier continuously compares the reference voltage with the voltage fed back from the divider; the difference is amplified, and the error-amplifier output drives the pass element to keep the output voltage at the desired value.

1. Error Amplifier: The error amplifier should be kept as simple as possible so that it does not draw excessive current; the fewer current branches it has, the lower the overall quiescent current. Minimizing the quiescent current, however, trades off against error-amplifier performance (bandwidth, slew rate, etc.). The DC open-loop gain should be high under all load conditions to ensure output accuracy, and the bandwidth should be large enough to react quickly to changes in load and input voltage. The output swing of the amplifier also matters: at low load currents the pass device must be nearly turned off, which drives the error-amplifier output close to one of the supply rails, depending on the pass-device type [10].
2. Pass Element: The pass element conducts the large load current from the input to the load and is driven by the error amplifier in a feedback loop. Possible pass elements include a PMOS transistor (LDO), an NMOS transistor (HDO) and BJTs (n-type or p-type). The topology is chosen based on dropout voltage, maximum current and minimum gate voltage; Table 2.3 compares these parameters for the MOS options.

| Parameter | PMOS transistor (LDO) | NMOS transistor (HDO) |
| :---: | :---: | :---: |
| Min Input Voltage $V_{i n, \text { min }}$ | $V_{G S}+V_{D S(\text { sat })}$ | $V_{o}+V_{d o}$ |
| Voltage Dropout $V_{d o}$ | $V_{D S(\text { sat })}$ (Lower) | $V_{G S}+V_{D S(s a t)}$ (Higher) |
| Max Output Current $I_{o, \text { max }}$ | Higher | Lower |
| Output Impedance $R_{o u t}$ | $r_{d s}$ (Higher) | $\frac{1}{g_{m}}$ (Lower) |

Table 2.3.: Comparison Between LDO and HDO.
3. Feedback Resistance Network: The resistive feedback network sets the output voltage to the reference voltage scaled by $(R_{F1}+R_{F2})/R_{F2}$, i.e., $V_{out}=V_{ref}\left(1+R_{F1}/R_{F2}\right)$. The current flowing through the divider adds to the quiescent current of the LDO, so for low consumption the resistances must be in the MΩ range, which keeps the divider current in the µA range.
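The sketch below sizes the divider and checks the headroom conditions of Table 2.3 at the 1.8 V supply. The 1 µA divider current and the values $V_{GS} = 0.5$ V and $V_{DS(sat)} = 0.2$ V are assumptions for illustration, not values from this design.

```python
V_IN, V_OUT, V_REF = 1.8, 1.2, 0.6   # V, from the specification
I_DIV = 1e-6                         # A, assumed divider current budget
V_GS, V_DSAT = 0.5, 0.2              # V, assumed pass-device bias values

# Divider: V_out = V_ref * (R_F1 + R_F2) / R_F2, carrying I_DIV at V_out.
r_total = V_OUT / I_DIV
r_f2 = r_total * V_REF / V_OUT
r_f1 = r_total - r_f2

# Table 2.3 headroom checks.
pmos_vin_min = max(V_OUT + V_DSAT, V_GS + V_DSAT)  # dropout = V_DS(sat)
nmos_vin_min = V_OUT + V_GS + V_DSAT               # dropout = V_GS + V_DS(sat)

print(f"R_F1 = {r_f1 / 1e3:.0f} kOhm, R_F2 = {r_f2 / 1e3:.0f} kOhm")
print(f"PMOS needs V_in >= {pmos_vin_min:.2f} V, NMOS needs V_in >= {nmos_vin_min:.2f} V")
```

With these assumed values the NMOS (HDO) option would need about 1.9 V, above the 1.8 V supply, while the PMOS option needs about 1.4 V, which illustrates why a PMOS pass device suits this supply.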

### 2.2.2.3. LDO Parameters

1. Stability: A capless-LDO regulator model for stability analysis, used as the reference in this section, is shown in Figure 2.9. Signals $\mathrm{V}_{\text {in }}$, $\mathrm{V}_{\text {out }}$ and $\mathrm{V}_{\text {ref }}$ represent the input, output and reference voltages. The Error Amplifier (EA) transfer function is represented by $\mathrm{A}_{\mathrm{EA}}(S)$ and is expressed in Equation 2.4.

$$
A_{EA}(S)= \begin{cases}\dfrac{A_{EA,0}}{1+\frac{S}{W_{p1}}} & \text{for a one-stage EA} \\ \dfrac{A_{EA,0}}{\left(1+\frac{S}{W_{p1}}\right)\left(1+\frac{S}{W_{p2}}\right)} & \text{for a two-stage EA}\end{cases} \tag{2.4}
$$



Figure 2.9.: Capless LDO Regulator Model for Stability Analysis.
Stability for a one-stage EA:

$$
\begin{equation*}
\text { loop gain }=\frac{G_{m, E A} R_{0, E A} G_{m, M P}\left(R_{L} \parallel r_{0} \parallel\left(R_{F 2}+R_{F 1}\right)\right)\left(\frac{R_{F 2}}{R_{F 2}+R_{F 1}}\right)}{\left(1+\frac{S}{W_{p 1}}\right)\left(1+\frac{S}{W_{p 0}}\right)} \tag{2.5}
\end{equation*}
$$

where $W_{p0}$ is the output pole of the system, set by $C_{L}$ and the parallel combination of the output resistance of the pass transistor $\left(\frac{1}{g_{d s}}\right)$, the load resistance $\left(\frac{1}{g_{L}}\right)$ and the feedback resistors $\left(\frac{1}{g_{\beta}}\right)$, and $W_{p1}$ is the output pole of the error amplifier, set by the output capacitance and output resistance $\left(\frac{1}{g_{0, E A}}\right)$ of the error amplifier.

$$
\begin{equation*}
W_{p 0}=\frac{g_{o u t}}{C_{L}} \tag{2.6}
\end{equation*}
$$

$$
\begin{align*}
& g_{out}=\frac{1}{R_{out}}=g_{ds}+g_{L}+g_{\beta} \propto I_{L} \tag{2.7}\\
& g_{\beta}=\frac{1}{R_{F1}+R_{F2}}, \quad g_{ds}=\lambda I_{L}, \quad g_{L} \propto I_{L} \tag{2.8}\\
& W_{p1}=\frac{g_{0,EA}}{C_{1}+C_{2}\left(1+A_{p}\right)} \propto \sqrt{I_{L}} \tag{2.9}\\
& g_{m,MP}=\sqrt{K_{p} I_{L}} \tag{2.10}\\
& UGF \cong \frac{\beta g_{m,EA}}{C_{2}} \tag{2.11}
\end{align*}
$$



Figure 2.10.: Capless-LDO regulator.

The dominant pole in Equation 2.9 is proportional to $\sqrt{I_{L}}$, while the non-dominant pole in Equation 2.6 is proportional to $I_{L}$. The worst-case stability therefore typically occurs at the minimum $I_{L}$, so it is important to achieve a good phase margin at this point by placing the Unity Gain Frequency (UGF) of Equation 2.11 below the non-dominant pole frequency.
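The effect of these scalings on phase margin can be checked with a two-pole loop $T(S) = T_0 / ((1 + S/W_{p1})(1 + S/W_{p0}))$. The sketch assumes, following Equations 2.6 to 2.11, that $W_{p0} \propto I_L$, $W_{p1} \propto \sqrt{I_L}$ and $T_0 \propto 1/\sqrt{I_L}$ (so the gain-bandwidth product stays fixed, as Equation 2.11 suggests); the reference values at 1 mA are hypothetical. The crossover frequency is found by bisection on a logarithmic frequency axis.

```python
import math


def phase_margin_deg(t0, wp1, wp0):
    """Phase margin of T(s) = t0 / ((1 + s/wp1)(1 + s/wp0)), with t0 > 1."""
    def mag(w):
        return t0 / math.hypot(1, w / wp1) / math.hypot(1, w / wp0)

    lo, hi = 1e-3, 1e15            # rad/s window; |T| falls monotonically
    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisect on a log scale
        if mag(mid) > 1:
            lo = mid
        else:
            hi = mid
    wu = math.sqrt(lo * hi)        # unity-gain crossover
    return 180 - math.degrees(math.atan(wu / wp1) + math.atan(wu / wp0))


# Hypothetical loop at I_L = 1 mA, scaled with load current per Eqs. 2.6-2.11.
IL_REF, T0_REF = 1e-3, 100.0
WP1_REF, WP0_REF = 2 * math.pi * 1e6, 2 * math.pi * 500e6


def loop_at(il):
    k = il / IL_REF
    return T0_REF / math.sqrt(k), WP1_REF * math.sqrt(k), WP0_REF * k


for il in (10e-6, 100e-6, 1e-3):
    print(f"I_L = {il * 1e6:7.1f} uA  PM = {phase_margin_deg(*loop_at(il)):.1f} deg")
```

Under these assumptions the phase margin drops sharply toward light load, consistent with the worst case occurring at minimum $I_L$.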


Figure 2.11.: One-Stage EA LDO Bode Plot.

Stability for a two-stage EA:

$$
\begin{equation*}
\frac{V_{f b 2}}{V_{f b 1}}=-\frac{\beta A_{E A, 0} A_{p}}{\left(1+\frac{S}{W_{p 1}}\right)\left(\frac{S^{2}}{W_{0}^{2}}+\frac{S}{W_{0} Q}+1\right)} \tag{2.12}
\end{equation*}
$$

$$
\begin{equation*}
Q \propto \frac{1}{\sqrt{g_{m,p}}} \propto \frac{1}{\sqrt[4]{I_{L}}} \tag{2.13}
\end{equation*}
$$

The two non-dominant poles must lie above $U G F \cong \frac{\beta g m_{1}}{C_{m}}$ to ensure stability.


Figure 2.12.: Two-Stage EA LDO Block.

At light loads, these two non-dominant poles become complex and can generate peaking because of their high Q. If the peaking is large enough to cross the 0 dB line, the system becomes unstable; to avoid peaking, $\mathrm{Q} \leq 0.707$ is required. Stability can be enhanced by distributing the power consumption so that most of it is spent on the gain stages with non-dominant poles (especially the stage driving the pass transistor).
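The 0.707 limit comes from the peak of the quadratic factor $1/(S^2/W_0^2 + S/(W_0 Q) + 1)$, which equals $Q/\sqrt{1 - 1/(4Q^2)}$ for $Q > 1/\sqrt{2}$ and is absent otherwise. The sketch combines this with the $Q \propto I_L^{-1/4}$ scaling of Equation 2.13; the starting value $Q = 0.5$ at 1 mA is hypothetical.

```python
import math


def peaking_db(q):
    """Peak of |1/(s^2/w0^2 + s/(w0*Q) + 1)| in dB; 0 dB when Q <= 1/sqrt(2)."""
    if q <= 1 / math.sqrt(2):
        return 0.0
    return 20 * math.log10(q / math.sqrt(1 - 1 / (4 * q * q)))


# Eq. 2.13: Q scales as I_L**(-1/4); hypothetical Q = 0.5 at I_L = 1 mA.
for il in (1e-3, 100e-6, 10e-6, 1e-6):
    q = 0.5 * (1e-3 / il) ** 0.25
    print(f"I_L = {il * 1e6:7.1f} uA  Q = {q:.2f}  peaking = {peaking_db(q):.1f} dB")
```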


Figure 2.13.: Two-Stage EA LDO Bode Plot.
2. Load Transient: The load transient quantifies the peak output-voltage excursion and the settling time when the load current is stepped. An LDO regulator with good load-transient response must achieve minimal overshoot/undershoot and fast settling.
The load-transient response can be improved by increasing the bias current of the stage driving the pass-transistor gate, which improves the slew rate at the cost of higher power consumption.
3. Load Regulation: Load regulation also quantifies the output-voltage variation caused by a change in load current, but it is measured once the output voltage has reached steady state.

$$
\begin{equation*}
\text { Load Regulation }=\left.\frac{\Delta V_{\text {out }}}{\Delta I_{L}}\right|_{t \rightarrow \infty}=R_{\text {out }, \text { cl }} \cong \frac{R_{\text {out }}}{1+\text { loop gain @ dc}} \tag{2.14}
\end{equation*}
$$

Increasing the error-amplifier DC gain reduces the closed-loop output resistance $R_{out,cl}$ and therefore improves load regulation. A high EA DC gain at $\mathrm{I}_{\max }$ is particularly necessary, because the pass-transistor gain $g_{m,MP} R_{out}$ is lowest at maximum load current (Equations 2.7 and 2.10).
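As a worked example of Equation 2.14, the closed-loop output resistance in ohms equals the load regulation in mV/mA. The open-loop output resistance and DC loop gains below are hypothetical.

```python
def load_regulation_mv_per_ma(r_out_ol, t0):
    """Eq. 2.14: R_out,cl = R_out / (1 + T0); ohms are numerically mV/mA."""
    return r_out_ol / (1 + t0)


# Hypothetical 1 kOhm open-loop output resistance at two DC loop gains.
for t0 in (99, 999):
    reg = load_regulation_mv_per_ma(1e3, t0)
    print(f"T0 = {t0:4d}: load regulation = {reg:.2f} mV/mA")
```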
4. Power Supply Rejection (PSRR): PSR measures how strongly ripple on the input supply couples to the output voltage. The finite PSR of an LDO results from several paths between the input and the output, shown in Figure 2.14. The ripple coming through path four (the voltage reference) is minimized by using a high-PSR voltage reference.


Figure 2.14.: Input-to-output ripple paths in Capless-LDO.
Across the frequency range, a different parameter dominates the PSRR in each region, as shown in Figure 2.15:

- Region 1 (10 to 100 Hz): Thermal coupling into $\mathrm{V}_{\text {ref }}$.
- Region 2 (100 Hz to 100 kHz): Open-loop gain of the error amplifier.
- Region 3 (above 100 kHz): Parasitic capacitance and output capacitor.


Figure 2.15.: PSRR of CL-LDO.
5. Noise: Noise in LDO regulators refers to the thermal and flicker noise in transistors and resistors. It can be specified as output voltage noise spectral density $(V / \sqrt{H z})$ or as integrated output noise voltage Vrms which is essentially the output spectral noise density integrated over a bandwidth.

To reduce the noise coming from the voltage reference:

- Add a low-pass filter at the output of the voltage reference, at the expense of increased area.
- Add a capacitor $C_f$ in parallel with the feedback resistor $R_1$, as shown in Figure 2.18. This brings the feedback gain attenuation $\left(\mathrm{A}_{\mathrm{FB}}\right)$ to 1 at high frequency and also improves the transient response.

$$
\begin{align*}
& \frac{V_{\text {out }}}{V_{\text {ref,noise }}}=\frac{g_{m n} R_{\text {out }} A_{E A}}{1+g_{m p} R_{\text {out }} A_{E A} A_{F B}} \approx \frac{1}{A_{F B}}  \tag{2.15}\\
& A_{F B}=\frac{R_{2}}{R_{2}+R_{1} \| \frac{1}{S C_{f}}}=\frac{R_{2}\left(1+S C_{f} R_{1}\right)}{R_{2}+R_{1}+S C_{f} R_{2} R_{1}} \tag{2.16}
\end{align*}
$$

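At high frequency the feedback gain approaches unity, $\left.A_{F B}\right|_{S \rightarrow \infty}=1$, so the reference-noise gain $1/A_{FB}$ drops from $(R_{1}+R_{2})/R_{2}$ at DC to 1.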
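Equation 2.16 can be evaluated over frequency to see the effect of $C_f$. The sketch uses hypothetical values ($R_1 = R_2 = 600$ kΩ, matching a 2x divider, and $C_f = 10$ pF); the magnitude starts at $R_2/(R_1+R_2) = 0.5$ and approaches 1 above the zero near $1/(2\pi R_1 C_f)$ and the pole near $(R_1+R_2)/(2\pi C_f R_1 R_2)$.

```python
import math


def a_fb(f_hz, r1, r2, cf):
    """|A_FB(j*2*pi*f)| from Eq. 2.16, with C_f across R1."""
    s = 2j * math.pi * f_hz
    return abs(r2 * (1 + s * cf * r1) / (r2 + r1 + s * cf * r2 * r1))


R1 = R2 = 600e3   # hypothetical divider for V_out = 2 * V_ref
CF = 10e-12       # hypothetical feed-forward capacitor

for f in (1.0, 1e3, 1e5, 1e7):
    gain = a_fb(f, R1, R2, CF)
    print(f"f = {f:10.0f} Hz  A_FB = {gain:.3f}  V_ref noise gain = {1 / gain:.3f}")
```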


Figure 2.16.: LDO Model for Noise.


Figure 2.17.: LDO major noise contributors.


Figure 2.18.: Feedback Resistance Network.

To reduce flicker noise, the differential-pair transistors must be large, because flicker noise is inversely proportional to the transistor area ($WL$), as shown in Equation 2.17.

$$
\begin{equation*}
\text { Flicker Noise } V_{n}^{2}(f)=\frac{K}{C_{o x} W L f}\left(V^{2} / H z\right) \tag{2.17}
\end{equation*}
$$

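Reducing the noise of the feedback resistors requires smaller resistances, because thermal noise is proportional to resistance, as shown in Equation 2.18 (here $K$ denotes Boltzmann's constant and $T$ the absolute temperature). Smaller resistances, in turn, increase the LDO $I_{q}$.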

$$
\begin{equation*}
\text { Thermal Noise } V_{n}^{2}(f)=4 K T R\left(V^{2} / H z\right) \tag{2.18}
\end{equation*}
$$
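The resistor-noise versus current trade-off can be quantified with Equation 2.18. The sketch assumes T = 300 K and that the whole divider sits across the 1.2 V output; it reports the noise density of the total divider resistance, which simplifies how each resistor's noise actually reaches the output.

```python
import math

K_B = 1.380649e-23  # J/K, Boltzmann constant
T = 300.0           # K, assumed temperature
V_OUT = 1.2         # V, the divider sits across the LDO output


def resistor_noise_nv(r_ohm, temp=T):
    """Thermal-noise density sqrt(4kTR) of Eq. 2.18, in nV/sqrt(Hz)."""
    return math.sqrt(4 * K_B * temp * r_ohm) * 1e9


for r_total in (120e3, 1.2e6, 12e6):
    i_div = V_OUT / r_total
    print(f"R = {r_total / 1e3:8.0f} kOhm  noise = {resistor_noise_nv(r_total):6.1f} nV/rtHz  "
          f"I_div = {i_div * 1e6:5.2f} uA")
```

A tenfold smaller resistance lowers the noise density by about 3.2x but raises the divider current tenfold.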

6. Power Characteristics [11]: The power efficiency of the LDO is

$$
\eta=\frac{V_{\text {out }} I_{\text {Load }}}{V_{\text {out }} I_{\text {Load }}+\left(V_{\text {in }}-V_{\text {out }}\right) I_{\text {Load }}+V_{\text {in }} I_{q}}=\frac{V_{\text {out }} I_{\text {Load }}}{V_{\text {in }}\left(I_{\text {Load }}+I_{q}\right)}=\frac{V_{\text {out }}}{V_{\text {in }}} \eta_{I}
$$

where $\eta_{I}=I_{\text {Load }} /\left(I_{\text {Load }}+I_{q}\right)$ is the current efficiency.

Power efficiency is improved by reducing $I_{q}$, which matters most at minimum load current, where $I_{q}$ is a large fraction of the total current.
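For the specifications of this project, the efficiency expression can be evaluated directly. The 50 µA quiescent current is an assumed value for illustration.

```python
def ldo_efficiency(v_in, v_out, i_load, i_q):
    """eta = V_out * I_load / (V_in * (I_load + I_q))."""
    return v_out * i_load / (v_in * (i_load + i_q))


V_IN, V_OUT = 1.8, 1.2   # V, from the specification
I_Q = 50e-6              # A, assumed quiescent current

for i_load in (10e-6, 100e-6, 1e-3):
    eta = ldo_efficiency(V_IN, V_OUT, i_load, I_Q)
    print(f"I_load = {i_load * 1e6:6.0f} uA  eta = {100 * eta:.1f} %")
```

The ceiling is $V_{out}/V_{in} = 66.7\%$; at light load the quiescent current dominates and the efficiency collapses.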

### 2.2.2.4. Advanced Compensation Topologies [8]

Capless LDO with damping factor [12]: The schematic of this topology is shown in Figure 2.19. It uses Miller pole-splitting compensation to achieve a small on-chip capacitance compared with the conventional LDO regulator, and a damping-factor circuit stabilizes the regulator over various capacitive load conditions.
The loop transfer function of this topology can be expressed as in Equation 2.20.

$$
\begin{equation*}
\text { loop gain }=\frac{G_{m, E A 1} R_{0, E A 1} G_{m, E A 2} R_{0, E A 2} G_{m, M P} R_{\text {out }}\left(\frac{R_{F 2}}{R_{F 2}+R_{F 1}}\right)}{\left(1+\frac{S}{W_{P 1}}\right)\left(\frac{S^{2}}{W_{0}^{2}}+\frac{S}{W_{0} Q}+1\right)} \tag{2.20}
\end{equation*}
$$

where $G_{m, E A 1} R_{0, E A 1}$ is the gain of the first error-amplifier stage and $G_{m, E A 2} R_{0, E A 2}$ is the gain of the second stage.

$$
\begin{equation*}
\omega_{P 1}=\frac{1}{C_{m} R_{0, E A 1} G_{m, E A 2} R_{0, E A 2} G_{m, M P} R_{o u t}} \tag{2.21}
\end{equation*}
$$



Figure 2.19.: Capless LDO with damping factor.

$$
\begin{align*}
& \omega_{0}=\sqrt{\frac{G_{m, E A 2} G_{m, M P}}{C_{L}\left(C_{m, M P}+C_{G D, M P}\right)}}  \tag{2.22}\\
& Q=\frac{G_{m, E A 2} G_{m, M P}}{\omega_{0}\left(G_{m, M P}-G_{m, E A 2}\right) C_{G D, M P}} \tag{2.23}
\end{align*}
$$

Load Transient Topologies: There are two approaches to improving the load transient:

- Pass transistor gate voltage slew-rate enhancement with multiple active loops.
- Output-impedance reduction.

Capless LDO with transimpedance [13]: Figure 2.20 shows the schematic of this topology. A current amplifier amplifies the current at the pass-transistor gate, and a current-sense transistor $M_{S}$ creates an additional fast loop. Load variations are detected by $M_{S}$, which generates a scaled copy of $I_{L}$. During transitions from low to high load current, the corresponding increase in the sense current improves the slew rate at the gate of the pass transistor.


Figure 2.20.: Capless LDO with transimpedance.

Capless LDO with adaptive biasing [14]: This capless LDO is shown in Figure 2.21. It uses an auxiliary loop to adjust the bias current of the EA's first stage: the EA is biased with a small fixed current $I_{B A N D}$ plus an adaptive bias current $I_{A B}$ proportional to $I_{L}$. The auxiliary loop is formed by the current-sensing transistor $M_{S}$ and a simple current mirror. The adaptive bias current $I_{A B}$ increases the loop bandwidth and, as a result, improves the load-transient performance. The mirror ratio between the current-sense transistor $M_{S}$ and the pass transistor is 1:N so that the sensed current, which adds to $I_{q}$, stays very small compared with the load current.


Figure 2.21.: Capless LDO with adaptively biased.

### 2.2.2.5. PSR Enhancement Topologies [8]

Capless LDO voltage subtractor: The main idea is to provide a high impedance from the gate of $\mathrm{M}_{\mathrm{P}}$ to ground and a low impedance from the gate of $M_{p}$ to $V_{I N}$. The gate then follows the signal at the source of $M_{p}$, which improves the Power Supply Rejection (PSR) at low frequencies. $R_{B 1}, R_{B 2}$ and $M_{p s}$ form the low impedance from the gate of $M_{p}$ to $V_{I N}$, and $M_{N 1}$ and $M_{N 2}$ form the high impedance from the gate to ground.


Figure 2.22.: CL LDO Voltage subtractor.

High PSR LDO regulator based on a Feed-Forward Ripple Cancellation (FFRC):
The PSR at low frequencies can be enhanced by increasing the feedback gain of the LDO. However, at high frequencies the PSR is mainly due to paths 1 and 2 in Figure 2.23 and is limited by the dominant pole of the feedback loop. To achieve high PSR at both DC and high frequencies, the ripple coupled through paths 1 and 2 should be removed.

To keep input ripple from appearing at the output, the transfer gain from input to output must be zero. In the ideal case (neglecting $R_{d s, M P}$), this is achieved by a feed-forward path that replicates the input ripple at the gate of the pass transistor. The gate-overdrive voltage then becomes independent of the input ripple, and no ripple appears across the load. In practice (with $R_{d s, M P}$), part of the ripple leaks through the finite output resistance of MP and must also be removed. This is done by increasing the ripple amplitude applied at the gate of MP by a factor of $\left(g_{m, M P}+g_{d s, M P}\right) / g_{m, M P}$, which cancels the ripple that leaks through $R_{d s, M P}$.

The feed-forward path is implemented using a feed-forward amplifier and a summing amplifier. The summing amplifier is used to merge the feedback regulating loop with feed-forward path at the gate of the transistor [15].
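The $(g_{m,MP}+g_{ds,MP})/g_{m,MP}$ factor follows from the small-signal current of MP. Treating the output as ripple-free (the goal of the cancellation) and writing $i = g_m v_{sg} + g_{ds} v_{sd}$ with $v_s = v_{in}$ and $v_g = k\,v_{in}$ gives $i = v_{in}[g_m(1-k) + g_{ds}]$, which vanishes for $k = (g_m + g_{ds})/g_m$. The sketch checks this with hypothetical $g_m$ and $g_{ds}$ values.

```python
def ffrc_gate_gain(gm, gds):
    """Gate-ripple gain k that nulls the pass-device small-signal current."""
    return (gm + gds) / gm


def pass_current_ripple(v_in_ripple, k, gm, gds, v_out_ripple=0.0):
    """PMOS small-signal current gm*v_sg + gds*v_sd, source tied to V_IN."""
    v_gate = k * v_in_ripple
    return gm * (v_in_ripple - v_gate) + gds * (v_in_ripple - v_out_ripple)


GM, GDS = 10e-3, 0.5e-3   # S, hypothetical pass-transistor parameters
k = ffrc_gate_gain(GM, GDS)
print(f"k = {k:.3f}")
print("residual current, k = 1 :", pass_current_ripple(1e-3, 1.0, GM, GDS))
print("residual current, k opt:", pass_current_ripple(1e-3, k, GM, GDS))
```

With $k = 1$ (pure replication) the residual current is $g_{ds} v_{in}$, which is exactly the leakage through $R_{ds,MP}$ described above.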


Figure 2.23.: Block-Level Representation of The Feed-Forward Ripple Cancellation LDO.

### 2.2.2.6. Comparison Between The Topologies

The common design specifications help to reveal the advantages and disadvantages of the three Capless-LDO topologies based on their compensation scheme and Error Amplifier topology [8]. The comparison is shown in Table 2.4.

The damping-factor topology has high PSR at heavy load, low output noise and low $I_{Q}$. The voltage-subtractor topology has the best line transient and high PSR. The transimpedance topology has the best load transient.

| Topology | CL LDO with damping factor | CL LDO with transimpedance | CL LDO voltage <br> subtractor |
| :---: | :---: | :---: | :---: |
| Technology | $0.6 \mu \mathrm{~m}$ | $0.35 \mu \mathrm{~m}$ | $0.35 \mu \mathrm{~m}$ |
| $V_{\text {in }}(V)$ | 3 | 3 | 3 |
| $V_{\text {out }}(V)$ | 2.8 | 2.8 | 2.8 |
| $V_{\text {ref }}(V)$ | 1.4 | 1.4 | 1.4 |
| Quiescent Current ( $\mu A$ ) $\left(@ I_{L}=100 \mu A, / 50 \mathrm{~mA}\right)$ | 63/60 | 46/170 | 80/100 |
| Total on chip compensation capacitance ( pF ) | 8 | 2.7 | 2.8 |
| Load transient $\Delta V_{\text {out }}(V)^{1}$ | 1.02/0.65 | 0.962/0.289 | 1.207/0.345 |
| Load transient settling $(\mu s)^{1}$ | 1.2/3.1 | 1.04/3.56 | 1.73/1.56 |
| EA DC gain <br> $(\mathrm{dB})\left(@ I_{L}=100 \mu A, / 50 \mathrm{~mA}\right)$ | 79/80 | 80/46 | 71/63 |
| PSR (dB) @ $I_{L}=50 \mathrm{~mA}$ (1 kHz / 10 kHz / 100 kHz) | -52/-50/-27 | -46/-26/-7 | -48/-47/-26 |
| PSR (dB) @ $I_{L}=100 \mu A$ (1 kHz / 10 kHz / 100 kHz) | -54/-52/-38 | -50/-31/-11 | -82/-62/-39 |
| Line Transient (mV) ${ }^{2}$ | 144/271 | 419/496 | 76/93 |
| Output Noise Spectral <br> Density @ 100 kHz ($n V / \sqrt{H z}$) | 90 | 130 | 190 |

1 Worst performance for a load step from $100 \mu \mathrm{~A}$ to $50 \mathrm{~mA} / 50 \mathrm{~mA}$ to $100 \mu \mathrm{~A}$ with rise/fall times of 100 ns
2 For an input voltage step from 3 V to $3.6 \mathrm{~V} / 3.6 \mathrm{~V}$ to 3 V with rise/fall times of 600 ns and load current $=100 \mu \mathrm{~A}$.
Table 2.4.: Comparison Between Different Topologies.

### 2.2.3. Continuous Time Linear Equalizer

### 2.2.3.1. Introduction

SerDes data rates are increasing rapidly, as in all communication systems. A higher data rate requires more bandwidth, and every wired channel attenuates more as frequency increases, i.e., the channel attenuation is frequency dependent. This mainly causes ISI between data symbols and therefore a high BER, so an equalization process is needed to keep the BER low as the data rate and operating frequency increase.

One of the main circuits used for equalization is the CTLE. Its concept is to cancel the channel response: because the channel attenuates high frequencies more, the CTLE boosts the gain at high frequency relative to DC so that the overall response is as flat as possible, as shown in Figure 2.24. The frequency response of the CTLE is therefore a continuous-time high-pass filter that amplifies the high-frequency signal components around the Nyquist frequency of the transmitted signal, opposing the low-pass response of the channel. In this way, the CTLE can efficiently suppress pre-cursor and post-cursor ISI and potentially suppress the high-frequency noise of subsequent stages, which reduces ISI and improves the BER.


Figure 2.24.: Equivalent Frequency Response.

The adaptation mechanism also becomes crucial in a multi-drop bus environment, where the equalizer must adapt to various channel lengths and their time-varying properties. Adaptation relies on the CTLE output and measures the equalization quality with various techniques: algorithms derived from digital signal processing such as Least Mean Square (LMS), used in [16, 17], and Recursive Least Square (RLS); power-spectrum balancing, as in $[18,19]$; and eye-size adaptation.
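To make the LMS idea concrete, the sketch below adapts a hypothetical discrete-time, two-tap FIR equalizer against a made-up channel with post-cursor ISI. A real CTLE is continuous-time and is adapted through its $R_S$/$C_S$ or path gains rather than FIR taps, so this only illustrates the LMS update $w \leftarrow w + \mu e x$, not the method of [16, 17].

```python
import random


def channel(symbols, h0=0.6, h1=0.4):
    """Hypothetical low-pass channel: y[n] = h0*a[n] + h1*a[n-1]."""
    out, prev = [], 0.0
    for a in symbols:
        out.append(h0 * a + h1 * prev)
        prev = a
    return out


def fir(w, buf):
    return sum(wi * bi for wi, bi in zip(w, buf))


def lms_train(rx, ref, mu=0.01, taps=2):
    """Standard LMS: w <- w + mu * e * x, with e = desired - output."""
    w = [1.0] + [0.0] * (taps - 1)   # start as a pass-through
    buf = [0.0] * taps               # most recent sample first
    for y, d in zip(rx, ref):
        buf = [y] + buf[:-1]
        e = d - fir(w, buf)
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
    return w


def mse(rx, ref, w):
    buf, total = [0.0] * len(w), 0.0
    for y, d in zip(rx, ref):
        buf = [y] + buf[:-1]
        total += (d - fir(w, buf)) ** 2
    return total / len(ref)


rng = random.Random(1)
train = [rng.choice((-1.0, 1.0)) for _ in range(5000)]
evaluation = [rng.choice((-1.0, 1.0)) for _ in range(2000)]
w = lms_train(channel(train), train)
print("taps:", [round(x, 3) for x in w])
print("MSE before:", round(mse(channel(evaluation), evaluation, [1.0, 0.0]), 3),
      "after:", round(mse(channel(evaluation), evaluation, w), 3))
```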

### 2.2.3.2. Existing Topology for CTLE

There are two main topologies for CTLE.

- Differential Amplifier with Adaptive Degeneration: This is a differential amplifier with source degeneration through a variable capacitance and resistance, as shown in Figure 2.25, which gives direct control over the DC gain and the location of the peak, as shown in Figure 2.26 and described by Equation 2.24, Equation 2.25, Equation 2.26 and Equation 2.27. This topology is simple to adapt and does not require large power consumption; examples can be found in [20, 21, 22].

$$
\begin{align*}
& \text { DC Gain }=\frac{g_{m} R_{D}}{1+g_{m} R_{s} / 2}  \tag{2.24}\\
& \text { Ideal Peaking }=\frac{\text { Ideal Peaking Gain }}{\text { DC Gain }}=1+\frac{g_{m} R_{S}}{2}  \tag{2.25}\\
& \omega_{z}=\frac{1}{R_{s} C_{s}}  \tag{2.26}\\
& \omega_{p 1}=\frac{1+g_{m} R_{s} / 2}{R_{s} C_{s}} \tag{2.27}
\end{align*}
$$
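Equations 2.24 to 2.27 can be evaluated together. The sketch uses hypothetical device values and models the CTLE as $A_{DC}(1 + S/\omega_z)/(1 + S/\omega_{p1})$, ignoring the output pole, so the ratio of high-frequency gain to DC gain equals the ideal peaking of Equation 2.25.

```python
import math


def ctle_params(gm, rd, rs, cs):
    """Eqs. 2.24-2.27 for the source-degenerated CTLE."""
    dc_gain = gm * rd / (1 + gm * rs / 2)
    peaking = 1 + gm * rs / 2
    wz = 1 / (rs * cs)
    wp1 = (1 + gm * rs / 2) / (rs * cs)
    return dc_gain, peaking, wz, wp1


def ctle_mag(f_hz, gm, rd, rs, cs):
    """|A_DC * (1 + s/wz) / (1 + s/wp1)|; the output pole is ignored."""
    dc, _, wz, wp1 = ctle_params(gm, rd, rs, cs)
    s = 2j * math.pi * f_hz
    return abs(dc * (1 + s / wz) / (1 + s / wp1))


# Hypothetical values: gm = 10 mS, R_D = 500 ohm, R_S = 400 ohm, C_S = 200 fF.
dc, pk, wz, wp1 = ctle_params(10e-3, 500.0, 400.0, 200e-15)
print(f"DC gain = {dc:.3f}, peaking = {20 * math.log10(pk):.1f} dB")
print(f"f_z = {wz / (2 * math.pi) / 1e9:.2f} GHz, f_p1 = {wp1 / (2 * math.pi) / 1e9:.2f} GHz")
```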

Figure 2.25.: CTLE with adaptive degeneration.

At high data rates, this topology needs bandwidth extension, and several techniques exist. Negative capacitance at the load node reduces the loading effect and increases the bandwidth, as in [23]. Shunt inductors add another zero that helps extend the bandwidth, as in [24]. Sometimes more than one technique is combined for a large bandwidth boost, as in $[25,26]$. Moreover, because this topology is based on a differential amplifier, it suffers from the same mismatch-related problems, such as offset and a shifted common-mode level. Additional circuits address these issues, such as offset cancellation and common-mode feedback, as in [23, 25, 27].


Figure 2.26.: $R_{s}$ and $C_{s}$ Adaptation.

- Dual Path Filters CTLE: As shown in Figure 2.27, it is composed of two filters: a high-pass filter with its peak at the operating frequency of the system, and an all-pass filter with a bandwidth equal to the operating frequency. The adaptation is controlled by varying the gain of each filter individually, as in $[28,29]$. The main advantage of this topology is that the peak gain can be boosted without affecting the DC gain.

Figure 2.27.: CTLE with adaptive gain filters.
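The claim that the peak gain can be boosted without affecting the DC gain can be illustrated with a simple model. Assumed here: the all-pass path is a gain $g_{ap}$ with a single pole at the operating frequency $f_{op}$, and the high-pass path is a band-pass section $g_{hp}(S/\omega_0)/(1+S/\omega_0)^2$ peaking at $f_{op}$. Neither model is taken from [28, 29].

```python
import math


def dual_path_mag(f_hz, g_ap, g_hp, f_op):
    """|H| of a hypothetical dual-path CTLE model (see lead-in)."""
    s = 2j * math.pi * f_hz
    w0 = 2 * math.pi * f_op
    broadband = g_ap / (1 + s / w0)                # all-pass path, BW = f_op
    peaking = g_hp * (s / w0) / (1 + s / w0) ** 2  # high-pass path, peak at f_op
    return abs(broadband + peaking)


F_OP = 5e9  # hypothetical operating frequency
for g_hp in (0.0, 1.0, 2.0):
    print(f"g_hp = {g_hp}: DC = {dual_path_mag(1.0, 1.0, g_hp, F_OP):.3f}  "
          f"@ f_op = {dual_path_mag(F_OP, 1.0, g_hp, F_OP):.3f}")
```

Raising $g_{hp}$ increases the gain at $f_{op}$ while the DC gain stays at $g_{ap}$, because the high-pass path contributes nothing at DC.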

A comparison between the topologies is shown in Table 2.5.

|  | $[23]$ | $[24]$ | $[25]$ | $[27]$ | $[28]$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Technology | 65 nm | 55 nm | 40 nm | 28 nm | 90 nm |
| Data Rate | $5 \mathrm{~Gb} / \mathrm{s}$ | $12.5 \mathrm{~Gb} / \mathrm{s}$ | $5-20 \mathrm{~Gb} / \mathrm{s}$ | $12 \mathrm{~Gb} / \mathrm{s}$ | $40 \mathrm{~Gb} / \mathrm{s}$ |
| Arch. | CTLE + <br> Offset <br> Canceller | CTLE | CTLE + <br> CMFB | CTLE + <br> Offset <br> Canceller | CTLE |
| Max Loss <br> (dB) | 20 | 19 | 18.5 | 17 | 10 |
| Power Consumption (mW) | 33 | 14.4 | $11.5-25.5$ | 21 | 58 |
| Supply <br> Voltage (V) | 1.2 | 1.2 | 1.1 | 1 | 1.3 |

Table 2.5.: Comparison of The CTLE.

### 2.2.4. Variable Gain Amplifier

The VGA is an indispensable building block for maximizing the dynamic range of modern wireless and wireline communication systems, as well as medical equipment, hearing aids, disk drives, and so on. In an AGC loop, the VGA is used to control the transmitted signal power or to adjust the received signal amplitude. There are two possible approaches to building a VGA: a discrete-gain-step VGA with a digital control signal $[30,31,32]$, or a continuously variable VGA controlled by an analog signal [33, 34, 35]. In general, digitally controlled VGAs realize the gain variation with arrays of resistors [36], whereas analog VGAs use a variable transconductance or a variable resistance to control the gain. The main advantage of analog-controlled VGAs is their continuous gain range (designed to be dB-linear in most cases).

For SerDes systems, the digital-control approach is preferred as the most straightforward method, especially when the control unit of the whole system is a digital block. In addition, SerDes systems do not need the fine gain resolution required in Radio Frequency (RF) systems. Figure 2.28 shows examples of VGAs with an analog control signal. The topology in Figure 2.28b has the additional advantage of high linearity. To achieve a dB-linear characteristic, NMOS load transistors M3 and M4 are biased in the sub-threshold region, while the current-starved diode-connected NMOS load transistors M5 and M6 are biased in the saturation region. By inspection, the gain is given by Equation 2.28.

$$
\begin{equation*}
A_{v}=\frac{g_{m 1,2}}{g_{m 3,4}+g_{m 5,6}} \tag{2.28}
\end{equation*}
$$
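Equation 2.28 gives a dB-linear characteristic where the sub-threshold loads dominate, because their $g_m$ grows exponentially with the control voltage. The sketch assumes $I_D = I_0 e^{V_C/(nV_T)}$ for M3/M4 (so $g_m = I_D/(nV_T)$) with hypothetical $I_0$, $n$, input $g_m$, and a fixed $g_m$ for the saturated M5/M6; in that region the gain slope approaches $-20\log_{10}(e)/(nV_T)$ dB/V.

```python
import math

V_T = 0.0259   # V, thermal voltage near 300 K
N_SUB = 1.5    # assumed sub-threshold slope factor


def gm_subthreshold(vc, i0=1e-9):
    """Assumed model: I_D = i0 * exp(vc / (n*V_T)), hence g_m = I_D / (n*V_T)."""
    return i0 * math.exp(vc / (N_SUB * V_T)) / (N_SUB * V_T)


def vga_gain_db(vc, gm_in=2e-3, gm_sat=20e-6):
    """Eq. 2.28: A_v = g_m1,2 / (g_m3,4 + g_m5,6), in dB (hypothetical g_m values)."""
    return 20 * math.log10(gm_in / (gm_subthreshold(vc) + gm_sat))


for vc in (0.2, 0.3, 0.4, 0.5):
    print(f"V_C = {vc:.1f} V  gain = {vga_gain_db(vc):6.1f} dB")
```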



Figure 2.28.: Schematic of VGA With Analog Control Signal.

Figure 2.29 shows examples of VGAs with a digital control signal. In both circuits, the input signal is split into full-amplitude (AP \& AN) and half-amplitude (AP2 \& AN2) paths by a resistive divider. The gain is adjusted by setting the degeneration resistance of each amplifier to one of several values using thermometer-coded switched-R networks, which produces VGA gain steps over the targeted gain range. For large input levels, the full-amplitude path amplifier is biased off, while for small signals both amplifiers are biased on. To allow gain adjustment without corrupting data, the bias of the full-amplitude path is switched on and off with a slow time constant.
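The gain steps of such a switched-R network can be estimated with the degenerated-pair gain of Equation 2.24. The ladder arrangement (seven series unit segments, with thermometer code $k$ shorting $k$ of them) and all device values are hypothetical; the half-amplitude path is modelled simply as a 0.5 input scale, i.e., about 6 dB less gain.

```python
import math


def degenerated_gain(gm, rd, rs):
    """DC gain of a source-degenerated pair, same form as Eq. 2.24."""
    return gm * rd / (1 + gm * rs / 2)


def thermometer_gains_db(code_width=7, r_unit=50.0, gm=20e-3, rd=300.0, input_scale=1.0):
    """Hypothetical switched-R ladder: code k shorts k of code_width series segments."""
    gains = []
    for k in range(code_width + 1):
        rs = (code_width - k) * r_unit
        gains.append(20 * math.log10(input_scale * degenerated_gain(gm, rd, rs)))
    return gains


full = thermometer_gains_db()                 # full-amplitude path (AP/AN)
half = thermometer_gains_db(input_scale=0.5)  # half-amplitude path (AP2/AN2)
print("full path (dB):", [round(g, 2) for g in full])
print("half path (dB):", [round(g, 2) for g in half])
```

Note that equal resistor steps give unequal dB steps; a real design would size the segments for the targeted step profile.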

Table 2.6 summarizes the performance of recently published VGAs.

(a) First Topology.

(b) Second Topology.

Figure 2.29.: Schematic of VGA With Digital Control Signal.

|  | $[37]$ | $[38]$ | $[39]$ | $[40]$ | $[41]$ |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Technology (nm) | 65 | 180 | 90 | 180 | 65 |
| Gain Range (dB) | 22 | 34 | 60 | -16.6 to 6.5 | 3 to 31 |
| Bandwidth $(\mathrm{GHz})$ | $2-2.2$ | 1.1 | 2.2 | 5.6 | 0.02 to 0.98 |
| Power $(\mathrm{mW})$ | 3.48 | 0.7 | 2.5 | 7.9 | 48 |
| Gain Control Mode | Analog | Analog | Analog | Digital | Digital |

Table 2.6.: Comparison Between The Different Topologies of The VGA.

### 2.3. Termination Calibration Circuit

In high-speed systems and interfaces, signal delays along long Printed Circuit Board (PCB) lines are significant, and their effects cannot be ignored. When interconnect delays are comparable to signal transition times, the interconnects must be treated as transmission lines. The I/Os and their transmission lines are among the main factors limiting the operating frequency of high-speed devices. Signal termination is required to reduce voltage reflections, since excessive transmission-line reflections can cause random false triggering of logic [42].
As a result of the mentioned phenomena, the system may fail to function under some operating conditions such as high temperatures or over-voltages.

$$
\begin{equation*}
V(Z)=V_{0}^{+} e^{-j \beta Z}+V_{0}^{-} e^{j \beta Z} \tag{2.29}
\end{equation*}
$$

$$
\begin{equation*}
I(Z)=\frac{V_{0}^{+}}{Z_{0}} e^{-j \beta Z}-\frac{V_{0}^{-}}{Z_{0}} e^{j \beta Z} \tag{2.30}
\end{equation*}
$$

Solving the telegrapher's equations of a transmission line (RLGC model) for a lossless line with phase constant $\beta$ gives the voltage and current in Equation 2.29 and Equation 2.30, respectively, where $V_{0}^{+}$ and $V_{0}^{-}$ are the amplitudes of the incident and reflected waves. The termination impedance $Z_{\text {term }}$, which loads a transmission line of characteristic impedance $Z_{0}$, is given by Equation 2.31.

$$
\begin{align*}
& Z_{\text {term }}=\frac{V(0)}{I(0)}=\frac{V_{0}^{+}+V_{0}^{-}}{V_{0}^{+}-V_{0}^{-}} Z_{0}  \tag{2.31}\\
& V_{0}^{-}=\frac{Z_{\text {term }}-Z_{0}}{Z_{\text {term }}+Z_{0}} V_{0}^{+}=\rho V_{0}^{+} \tag{2.32}
\end{align*}
$$

In general, the amplitude of the wave reflected at the end of a transmission line is determined by the reflection coefficient, $\rho$.

$$
\begin{equation*}
\rho=\frac{Z_{\text {term }}-Z_{0}}{Z_{\text {term }}+Z_{0}} \tag{2.33}
\end{equation*}
$$
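A quick numerical check of Equation 2.33 for a 50 Ω line shows how a termination error turns into a reflected wave, which is what motivates calibrating the on-chip termination. The incident amplitude and the termination values are hypothetical.

```python
def reflection_coefficient(z_term, z0=50.0):
    """Eq. 2.33: rho = (Z_term - Z0) / (Z_term + Z0)."""
    return (z_term - z0) / (z_term + z0)


V_INC = 0.4  # V, hypothetical incident step amplitude
for z in (0.0, 40.0, 50.0, 60.0):
    rho = reflection_coefficient(z)
    print(f"Z_term = {z:5.1f} ohm  rho = {rho:+.3f}  V_reflected = {rho * V_INC * 1e3:+.1f} mV")
```

A 20% termination error in either direction reflects roughly 10% of the incident amplitude.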

As Equation 2.33 shows, there are no reflections ($\rho=0$) only when the termination resistance matches $Z_{0}$. If the source impedance does not equal $Z_{0}$, reflections also occur at the near end of the line; each end of the line has its own value of $\rho$.

The slices block, shown in Figure 2.31, contains parallel-connected resistors and control transistors that operate in either the cut-off or the triode region. The resistors are switched by the binary calibration code coming from the logic block. The slices are binary weighted: the first slice contains 32 unit resistors of value X connected in parallel and therefore has the smallest resistance, while the sixth slice contains a single unit resistor X and has the highest resistance. The MSB of the code drives the 32-unit slice of the replica ($\mathrm{R}=32 \mathrm{X}$) and the LSB drives the single-unit slice ($\mathrm{R}=\mathrm{X}$). To avoid excessively large values of $R_{\text {cal }}$, an always-on slice of 64 parallel units ($\mathrm{R}=64 \mathrm{X}$), which has the smallest resistance, is also included.


Figure 2.30.: I/O termination resistor calibration circuit.


Figure 2.31.: On-Chip Resistance With Slices.
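From the slice description, the replica conductance is $(64 + \text{code})/X$, with the MSB of the 6-bit code weighting 32 units, so $R_{cal} = X/(64 + \text{code})$ ranges from $X/64$ down to $X/127$. How the logic block searches for the code is not specified here; the sketch assumes a successive-approximation search driven by an ideal comparator that reports whether $R_{cal}$ is still at or above a target resistance. The unit value $X = 5$ kΩ and the 50 Ω target are hypothetical.

```python
def r_cal(code, x_unit):
    """Always-on 64-unit slice plus binary-weighted slices selected by code (0..63)."""
    return x_unit / (64 + code)


def sar_calibrate(target, x_unit, bits=6):
    """Hypothetical SAR search: largest code whose R_cal is still >= target."""
    code = 0
    for b in reversed(range(bits)):   # MSB (32 units) first
        trial = code | (1 << b)
        # Ideal comparator decision: keep the bit while R_cal >= target.
        if r_cal(trial, x_unit) >= target:
            code = trial
    return code


X_UNIT = 5000.0  # ohms, hypothetical unit resistor
code = sar_calibrate(50.0, X_UNIT)
print(f"code = {code} ({code:06b}), R_cal = {r_cal(code, X_UNIT):.2f} ohm")
```

With these values the search lands on code 36, for which $R_{cal} = 5000/100 = 50$ Ω; in general it returns the largest code whose resistance is not below the target, so the residual error is within one LSB step.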

### 2.3.1. SR Latch

Recently reported flip-flops achieve a small delay between the latest point of data arrival and the output transition. Typical representatives are the Sense-Amplifier-based Flip-Flop (SAFF), the Hybrid-Latch Flip-Flop (HLFF) and the Semi-Dynamic Flip-Flop (SDFF). HLFF and SDFF outperform reported SAFF designs, because the latter are limited by the implementation of their output latch.

### 2.3.1.1. SAFF

The SR latch of the SAFF, shown in Figure 2.32, operates as follows: input $\bar{S}$ is a set input and $\bar{R}$ is a reset input.


Figure 2.32.: SAFF.

A low level at both $\bar{S}$ and $\bar{R}$ simultaneously is not permitted, and the SA stage guarantees that it never occurs. A low level at $\bar{S}$ sets the $Q$ output high, which in turn forces $\bar{Q}$ low. Conversely, a low level at $\bar{R}$ sets $\bar{Q}$ high, which in turn forces $Q$ low. Therefore, one output signal is always delayed with respect to the other: the rising edge occurs first, after one gate delay, and the falling edge follows after two gate delays. Additionally, the delay of the true output $Q$ depends on the load on the complementary output $\bar{Q}$, and vice versa. This limits the performance of the SAFF [43].

### 2.3.1.2. Improved SR latch

As shown in Figure 2.33, the SR latch is modified to overcome the asymmetry of the SR latch in the SAFF.

$$
\begin{equation*}
Q^{+}=S+\bar{R} \cdot Q \tag{2.34}
\end{equation*}
$$

$$
\begin{equation*}
\overline{Q^{+}}=R+\bar{S} \cdot \bar{Q} \tag{2.35}
\end{equation*}
$$

In Equation 2.34 and Equation 2.35, $Q$ represents the present state and $Q^{+}$ the next state of the SL, i.e., the state after the clock transition.


Figure 2.33.: Improved SR latch.
The SL modification starts from logic expressions for the new output values $Q^{+}$ and $\overline{Q^{+}}$, obtained by writing independent logic equations for the $Q$ and $\bar{Q}$ outputs of the cross-coupled NAND SR latch. Each equation is implemented as a separate AND-OR structure, so both outputs switch with the same delay.
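Equations 2.34 and 2.35 can be read as two independent next-state functions. The sketch evaluates them as boolean expressions to show that each output depends only on the inputs and its own previous value, not on the other output; S and R here are the active-high versions of the $\bar{S}$ and $\bar{R}$ inputs. It models logic only, not timing, and the S = R = 1 case is omitted because the SA stage never produces it.

```python
def improved_sr_latch(s, r, q, q_bar):
    """Next state from Eqs. 2.34 and 2.35 (active-high S and R)."""
    q_next = s or ((not r) and q)          # Q+  = S + R'.Q
    q_bar_next = r or ((not s) and q_bar)  # Q'+ = R + S'.Q'
    return bool(q_next), bool(q_bar_next)


for s, r, q, q_bar, label in [(True, False, False, True, "set"),
                              (False, True, True, False, "reset"),
                              (False, False, True, False, "hold")]:
    print(label, improved_sr_latch(s, r, q, q_bar))
```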

### 2.3.2. Comparator

### 2.3.2.1. Two Stage OTA Comparator

As shown in Figure 2.34, the comparator is based on a two-stage 5T OTA with high gain, i.e., a steep transfer slope between input and output. If $\mathrm{V}_{\mathrm{in}+}>\mathrm{V}_{\mathrm{in}-}$, the output goes to $V_{D D}$; conversely, if $\mathrm{V}_{\text {in- }}>\mathrm{V}_{\mathrm{in}+}$, the output goes to ground. This comparator offers high gain and low offset, but it needs a continuous bias current and therefore has high power consumption.


Figure 2.34.: Two-stage OTA.
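
To illustrate why a high gain is needed, the Python sketch below models the OTA comparator as a linear amplifier. The output is centered at $V_{DD}/2$ and clipped at the rails. The gain and the 1.2 V supply are illustrative assumptions; the 1 mV input step matches the $\Delta V_{in}$ used in Table 2.7.

```python
import math

def ota_comparator(v_inp, v_inn, gain, vdd=1.2):
    """Behavioral open-loop comparator: linear around VDD/2, clipped to [0, VDD]."""
    v_out = vdd / 2 + gain * (v_inp - v_inn)
    return min(max(v_out, 0.0), vdd)

def min_gain_db(delta_vin, vdd=1.2):
    """Smallest gain (dB) that moves the output from VDD/2 to a rail for delta_vin."""
    return 20 * math.log10((vdd / 2) / delta_vin)

print(f"required gain for 1 mV: {min_gain_db(1e-3):.1f} dB")
print(ota_comparator(0.6005, 0.6, gain=1e4))   # large gain: output reaches the rail
print(ota_comparator(0.6005, 0.6, gain=100))   # low gain: output stays near mid-supply
```

With these assumptions, resolving 1 mV to a full rail needs roughly 55.6 dB of open-loop gain. This is why the continuously biased, high-gain OTA is power hungry.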

### 2.3.2.2. Strong Arm Comparator

The latched comparator has become popular for several reasons: it consumes zero static power, it directly produces rail-to-rail outputs, and its input-referred offset arises primarily from a single differential pair [44]. The latch of Figure 2.35 consists of a clocked differential pair, M1-M2, two cross-coupled pairs, M3-M4 and M5-M6, and four pre-charge switches, S1-S4. The circuit provides rail-to-rail outputs at outp and outn in response to the polarity of $V_{i n 1}-V_{i n 2}$.

The circuit has four phases of operation. In the first phase, CK is low, M1 and M2 are off, and nodes P, Q, X, and Y are pre-charged to $V_{D D}$. In the second phase, CK goes high, S1-S4 turn off, and M1 and M2 turn on, drawing a differential current proportional to $V_{i n 1}-V_{i n 2}$. With M3-M6 initially off, this current discharges $C_{P}$ and $C_{Q}$. This allows $\left|\mathrm{V}_{\mathrm{P}}-\mathrm{V}_{\mathrm{Q}}\right|$ to grow and possibly exceed $\left|V_{i n 1}-V_{i n 2}\right|$. This phase therefore provides voltage gain and is called the amplification mode.

$$
\begin{equation*}
\left|V_{P}-V_{Q}\right|=\frac{g m_{1,2}\left|V_{i n 1}-V_{i n 2}\right|}{C_{P, Q}} t \tag{2.36}
\end{equation*}
$$

As $V_{P}$ and $V_{Q}$ fall to $V_{D D}-V_{T H N}$, the cross-coupled NMOS transistors turn on (third phase), allowing part of the drain currents of M1 and M2 to flow from X and Y. The amplification mode therefore lasts for approximately $\left(\frac{C_{P, Q}}{I_{C M}}\right) V_{T H N}$ seconds, where $I_{C M}$ is the common mode current drawn from each capacitance.


Figure 2.35.: Strong Arm Latch.

The voltage gain of this mode is:

$$
\begin{equation*}
A_{v}=\frac{g m_{1,2} V_{T H N}}{I_{C M}} \tag{2.37}
\end{equation*}
$$

The output voltages $V_{X}$ and $V_{Y}$ continue to fall until they reach $V_{D D}-\left|V_{T H P}\right|$. At this point, M5 and M6 turn on and the circuit enters the fourth phase. The positive feedback around these transistors eventually brings one output back to $V_{D D}$ while allowing the other to fall to zero.
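
Equations 2.36 and 2.37 can be combined into a short numerical estimate. The Python sketch below computes the length of the amplification phase, its gain, and the resulting $|V_P-V_Q|$ before regeneration starts. The device values in the example (1 mS, 20 fF, 50 µA, 0.4 V) are hypothetical, not taken from a design in this thesis.

```python
def strongarm_amplification(gm, c_pq, i_cm, v_thn, delta_vin):
    """Amplification phase of the StrongARM latch (Equations 2.36 and 2.37).

    Returns (t_amp in s, A_v, |V_P - V_Q| at the end of the phase in V).
    """
    t_amp = c_pq * v_thn / i_cm              # time for V_P, V_Q to fall by V_THN
    a_v = gm * v_thn / i_cm                  # Equation 2.37
    dv_pq = gm * delta_vin / c_pq * t_amp    # Equation 2.36 evaluated at t = t_amp
    return t_amp, a_v, dv_pq


# Hypothetical values: gm = 1 mS, C_P,Q = 20 fF, I_CM = 50 uA, V_THN = 0.4 V, 1 mV input.
t_amp, a_v, dv = strongarm_amplification(1e-3, 20e-15, 50e-6, 0.4, 1e-3)
print(f"t_amp = {t_amp * 1e12:.0f} ps, A_v = {a_v:.1f}, |VP - VQ| = {dv * 1e3:.1f} mV")
```

With these numbers, the amplification phase lasts about 160 ps with $A_v = 8$, so a 1 mV input becomes about 8 mV before the cross-coupled pairs take over. Because $A_v$ depends on $g_m/I_{CM}$, biasing M1-M2 at a high $g_m/I_D$ raises this pre-gain.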

### 2.3.2.3. Double tail Latch Comparator

Figure 2.36 shows the schematic of the double-tail latch-type voltage SA. This circuit has two tails: one for the input stage and one for the latching stage. Because it has less stacking, it can operate at lower supply voltages. A large Mtail2 provides a large latching-stage current that is independent of the input common-mode voltage. A small Mtail1 keeps the input-stage current low, which allows operation at lower supply voltages with lower offset.

During the reset phase (CLK low), M3 and M4 charge the $f_{n}$ and $f_{p}$ nodes to $V_{D D}$. Hence, MR1 and MR2 turn on and discharge the output nodes to GND.

During the evaluation phase (CLK high), the tail transistors Mtail1 and Mtail2 turn on. The common-mode voltage of $f_{n}$ and $f_{p}$ then decreases as their currents discharge the capacitances $\mathrm{C}_{\text {Lfn }}$ and $\mathrm{C}_{\mathrm{Lfp}}$. If INP > INN, $f_{n}$ discharges faster than $f_{p}$, so MR2 turns off before MR1. Outn therefore remains connected to ground while Outp is released. The inverters then regenerate the voltage difference until Outp reaches $V_{D D}$ and Outn falls to zero. The opposite occurs for INP $<$ INN.

Mtail1 and Mtail2 also provide additional shielding between the input and output which in turn reduces kickback noise.


Figure 2.36.: Double tail latch comparator.
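
The evaluation phase described above is essentially a race between the $f_n$ and $f_p$ nodes. The Python sketch below is a first-order model of that race. It treats both nodes as capacitors discharged linearly by the two halves of the input-stage current, and assumes each MR transistor turns off when its gate node falls below a fixed threshold. The transconductance, tail current, capacitance, and threshold values are hypothetical, and the function name is illustrative.

```python
def double_tail_decision(v_inp, v_inn, gm=0.5e-3, i_tail1=40e-6, c_f=10e-15,
                         vdd=1.2, v_th=0.4):
    """First-order race between the fn/fp nodes of the double-tail comparator.

    fn and fp start at VDD (reset phase) and are discharged linearly by the
    input-pair currents. MR2 (gated by fn) releases Outp, MR1 (gated by fp)
    releases Outn. Returns (released output, time lead in seconds).
    """
    dv = v_inp - v_inn
    i_fn = i_tail1 / 2 + gm * dv / 2      # larger when INP > INN
    i_fp = i_tail1 / 2 - gm * dv / 2
    if i_fn <= 0 or i_fp <= 0:
        raise ValueError("linear model only valid for small differential inputs")
    t_fn = c_f * (vdd - v_th) / i_fn      # time for fn to turn MR2 off
    t_fp = c_f * (vdd - v_th) / i_fp      # time for fp to turn MR1 off
    if t_fn < t_fp:
        return "Outp", t_fp - t_fn        # Outp released first, regenerates to VDD
    # A tie (dv == 0) lands here with zero lead; a real latch would be metastable.
    return "Outn", t_fn - t_fp


print(double_tail_decision(0.601, 0.600))
print(double_tail_decision(0.600, 0.601))
```

The time lead grows with the input difference and with a smaller Mtail1 current. This is consistent with the idea of keeping the input stage weakly biased and leaving the fast regeneration to Mtail2.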

### 2.3.2.4. Comparison

|  | B. Goll [45] | S. Rahmani [46] | KM Lei [47] |
| :---: | :---: | :---: | :---: |
| Technology | 65 nm | 180 nm | 65 nm |
| Supply <br> Voltage | 1.2 V | 1.2 V | 1.2 V |
| Frequency | 500 MHz | 500 MHz | 1 GHz |
| Avg. Power <br> Consumption | $329 \mu \mathrm{W}$ | $273 \mu \mathrm{W}$ | $153 \mu \mathrm{W}$ |
| Delay | 550 ps | 273 ps | 117 ps |
| Offset |  | 2.07 mV | 7.8 mV |

Table 2.7.: Comparison between Different Topologies at $V_{C M}=0.6 \mathrm{~V}$, $\Delta V_{i n}=1 \mathrm{~mV}$.

### 2.4. Decision Feedback Equalizer

### 2.4.1. Functionality

The DFE is a nonlinear equalizer in the receiver. It consists of a slicer, a summing node, and feedback taps whose coefficients can be tuned. Closing the critical feedback timing path is one of the main design challenges.

Its main advantage is that it compensates high-frequency loss without amplifying noise and crosstalk, because the slicer removes the noise before the decisions are fed back. It can also correct ISI caused by reflections.

Its disadvantages are that it cancels only post-cursor ISI and that it suffers from high power consumption, especially in digital implementations.

### 2.4.2. DFE Architectures

### 2.4.2.1. Direct Full rate DFE

- The block diagram is shown in Figure 2.37.
- Direct implementation.
- Low complexity (one summing node).
- The flip-flops and the DFE loop are difficult to design for high-frequency operation at low power.
- Critical timing path condition [48]:

$$
\begin{equation*}
t_{c q, F F}+t_{\text {setup }, F F}+t_{F B}<U I \tag{2.38}
\end{equation*}
$$



Figure 2.37.: Direct Full Rate DFE.

### 2.4.2.2. Unrolled Full rate DFE

- The block diagram is shown in Figure 2.38.
- High complexity (two summing nodes and a MUX).
- Critical timing path condition:

This architecture relaxes the critical timing path by replacing the feedback delay with the MUX delay. However, designing the flip-flops and the CDR for high-frequency operation at low power remains difficult.

$$
\begin{equation*}
t_{c q, F F}+t_{\text {setup }, F F}+t_{s q, M U X}<U I \tag{2.39}
\end{equation*}
$$



Figure 2.38.: Unrolled Full Rate DFE.

### 2.4.2.3. Direct Half rate DFE

- The block diagram is shown in Figure 2.39.
- High complexity (two summing nodes).
- Large area.
- Critical timing path condition:

The timing here is as stringent as in the direct full-rate DFE. The principal advantage of this architecture is the simpler design of the CDR circuit and, in particular, the clock buffer, which only has to run at half the data rate.

$$
\begin{equation*}
t_{c q, F F}+t_{s e t u p, F F}+t_{F B}<U I \tag{2.40}
\end{equation*}
$$



Figure 2.39.: Direct Half Rate DFE.

### 2.4.2.4. Multiplexed Half rate DFE

- The block diagram is shown in Figure 2.40.
- Low complexity (one summing node).
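- Critical timing path condition. The timing is worse than that of the direct half-rate DFE because the MUX propagation delay is added to the loop:

$$
\begin{equation*}
t_{c q, F F}+t_{\text {setup }, F F}+t_{p, M U X}+t_{F B}<U I \tag{2.41}
\end{equation*}
$$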


Figure 2.40.: Multiplexed Half rate DFE.

### 2.4.2.5. Loop Unrolled Half rate DFE

- The block diagram is shown in Figure 2.41.
- High complexity (more than two summing nodes).
- Reduced loading at the summing nodes.
- Critical timing path condition:


Figure 2.41.: Loop Unrolled Half Rate DFE.

The timing constraint is more relaxed:

$$
\begin{equation*}
t_{c q, F F}+t_{\text {setup }, F F}+t_{s q, M U X}<U I \tag{2.42}
\end{equation*}
$$
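
The five timing conditions (Equations 2.38 to 2.42) differ only in which delays must fit inside one UI. The Python sketch below encodes them as lookup entries and reports the slack. The delay values in the example are hypothetical placeholders for a 10 Gb/s link (UI = 100 ps), not simulated numbers.

```python
# Delay terms that must fit in one UI for each architecture (Equations 2.38 to 2.42).
TIMING_TERMS = {
    "direct full rate": ("t_cq", "t_setup", "t_fb"),
    "unrolled full rate": ("t_cq", "t_setup", "t_sq_mux"),
    "direct half rate": ("t_cq", "t_setup", "t_fb"),
    "multiplexed half rate": ("t_cq", "t_setup", "t_p_mux", "t_fb"),
    "loop unrolled half rate": ("t_cq", "t_setup", "t_sq_mux"),
}

def timing_slack(arch, delays, ui):
    """UI minus the critical-path delay; a positive slack meets the constraint."""
    return ui - sum(delays[name] for name in TIMING_TERMS[arch])


# Hypothetical delays in ps for a 10 Gb/s link (UI = 100 ps).
delays = {"t_cq": 30.0, "t_setup": 20.0, "t_fb": 40.0, "t_sq_mux": 25.0, "t_p_mux": 15.0}
for arch in TIMING_TERMS:
    print(f"{arch:24s} slack = {timing_slack(arch, delays, 100.0):6.1f} ps")
```

With these placeholder numbers, the multiplexed half-rate DFE is the only one that misses timing (slack of -5 ps). The unrolled variants gain margin because the MUX select-to-output delay replaces the summer feedback delay.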

### 2.4.3. DFE Blocks

1. Gm cell: It is used to amplify the input voltage signal.
2. Slicer (comparator): It makes the decision at the clock edge from the CDR and resolves the differential input from the summing node to a binary 0 or 1. The signal swing at the summing node affects the slicer design.
3. Taps: The tap coefficients are chosen from the channel impulse response to remove the post-cursor ISI.
4. Summing node: The settling time of the summer is critical for DFE operation.

The summer can be implemented as a resistive-load summer or an integrating summer.

In a resistive-load summer, settling is dominated by the RC time constant formed by the load resistance and the wiring and parasitic capacitance of the input stage, feedback taps, and slicer. This time constant can be decreased by reducing the load resistance. However, to meet the amplifier gain and voltage-swing requirements, the current must then increase to compensate, resulting in a power penalty.
An integrating summer eliminates the RC settling time and reduces the bias current and static power, but it is used only in half-rate DFEs.
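
The power penalty of the resistive-load summer can be made concrete with a single-pole settling estimate. The Python sketch below assumes the summing node behaves as one RC pole. The settling budget, node capacitance, swing, and 1 % settling error are hypothetical example values.

```python
import math

def resistive_summer_design(t_settle, c_node, v_swing, eps=0.01):
    """Size a resistive-load DFE summer for a given settling budget.

    A single-pole (RC) node settles to within eps of its final value after
    ln(1/eps) time constants, so tau = R_L * C must not exceed t_settle / ln(1/eps).
    The bias current then follows from the required swing, I = V_swing / R_L.
    Returns (R_L in ohms, I_bias in amperes).
    """
    tau_max = t_settle / math.log(1.0 / eps)
    r_load = tau_max / c_node
    i_bias = v_swing / r_load
    return r_load, i_bias


# Hypothetical example: 40 ps settling budget, 50 fF node, 200 mV swing, 1 % error.
r_l, i_b = resistive_summer_design(40e-12, 50e-15, 0.2)
print(f"R_L = {r_l:.0f} ohm, I = {i_b * 1e3:.2f} mA")
```

Halving the settling budget halves $R_L$ and therefore doubles the current needed for the same swing. This is the trade-off that motivates integrating summers in half-rate designs.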

## 3. Verilog-A Transceiver Model

In this chapter, the Verilog-A transceiver model is explained in detail. The chapter is divided into three sections. The first section presents the Verilog-A Tx models, namely the serializer and the FIR. Each subsection describes the functionality of the block and how it was implemented in Verilog-A. It also covers the non-idealities of the block, how they were taken into account in the model, and the simulation results of the module. The second section presents the Verilog-A Rx blocks. The last section presents the integration of the whole transceiver and its simulation results.

### 3.1. Transmitter Blocks

### 3.1.1. Serializer and De-serializer

### 3.1.1.1. Functionality

The serializer converts parallel data to serial data, and the de-serializer performs the reverse conversion. The input data of the transceiver comes from a microcontroller, and having the microcontroller output its data directly at $10 \mathrm{~Gb} / \mathrm{s}$ would not be efficient. To avoid this problem, the microcontroller splits the data across several pins, each running at a lower rate. The inverse operation is performed at the receiver side.
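
The parallel-to-serial conversion can be sketched as a simple interleaving of lanes. The Python example below assumes eight lanes at 1.25 Gb/s, the configuration used in the integrated model of Section 3.3. The bit ordering (lane 0 first within each word) and the function names are hypothetical choices for illustration.

```python
def serialize(lanes):
    """Interleave N equal-length parallel lanes into one serial bit stream.

    Assumed ordering (hypothetical): within each parallel word, lane 0 is sent first.
    """
    return [bit for word in zip(*lanes) for bit in word]

def deserialize(stream, n_lanes):
    """Inverse of serialize(): bit k of every n_lanes-bit word returns to lane k."""
    return [stream[k::n_lanes] for k in range(n_lanes)]


LANE_RATE = 1.25e9                 # b/s per lane
N_LANES = 8
SERIAL_RATE = N_LANES * LANE_RATE  # 10 Gb/s on the serial link

lanes = [[(k + t) % 2 for t in range(4)] for k in range(N_LANES)]
serial = serialize(lanes)
print(f"{SERIAL_RATE / 1e9:.0f} Gb/s serial stream:", serial[:16])
```

Because the de-serializer uses the same word ordering, `deserialize(serialize(lanes), 8)` recovers the original lanes.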

### 3.1.1.2. Verilog-A Modeling

The serializer and the de-serializer are based on transmission-gate logic [49]. In the Verilog-A model of the serializer, the on-state of each transistor pair is replaced by a 50 Ohm resistance and the off-state by an open circuit. An if-statement selects which branch is on based on the control signals. The node capacitances are also taken into account in the model. The Verilog-A code of this block is given in section A.1.
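
The switch-level idea behind this model can be illustrated outside Verilog-A. The Python sketch below models the ON branch as the 50 Ohm resistance and the OFF branches as open circuits, driving a single output node capacitance. The node is integrated with forward Euler. The 100 fF capacitance, the time step, and the function name are assumptions for illustration.

```python
def tg_mux_response(inputs, sel, t_end, r_on=50.0, c_node=100e-15, dt=0.1e-12):
    """Simulate the output node of a transmission-gate mux (switch-level model).

    inputs: list of callables v_k(t) giving each input voltage.
    sel: callable returning the index of the ON branch at time t (None = all off).
    The ON branch is a resistor r_on; OFF branches are open circuits, so with no
    branch selected the node capacitance simply holds its voltage.
    Returns a list of (t, v_out) samples from forward-Euler integration.
    """
    v_out, t, samples = 0.0, 0.0, []
    while t < t_end:
        k = sel(t)
        if k is not None:
            # Current through the ON switch charges or discharges the node capacitance.
            v_out += dt * (inputs[k](t) - v_out) / (r_on * c_node)
        samples.append((t, v_out))
        t += dt
    return samples


# Constant 1 V on input 0, branch 0 always selected: the node settles with tau = 5 ps.
trace = tg_mux_response([lambda t: 1.0], lambda t: 0, t_end=50e-12)
print(f"v_out after 50 ps = {trace[-1][1]:.4f} V")
```

The finite rise time set by $R_{on}C_{node}$ is the capacitance effect visible at the output of the serializer test bench.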

### 3.1.1.3. Test-bench and Results

To test the functionality of the model, the following test bench is used. The schematic of the simulation is shown in Figure 3.1.


Figure 3.1.: Schematic of The Test Bench.

The result of the simulation is shown in Figure 3.2. A square wave is applied to the first input, and the control signals are set to activate the first branch during the first 0.5 ns. The control signals then change to activate the second input. This is done by grounding the second and third selection lines and changing the value of the first selection line. As shown in Figure 3.2, the output shows the effect of the node capacitance.


Figure 3.2.: The simulation result of the Serializer Verilog-A model, showing the input and the output waveforms.

### 3.1.2. Finite Impulse Response and Driver

### 3.1.2.1. Functionality and Modeling

The Finite Impulse Response model is composed of three equalization taps (one main cursor and two post-cursors). The number of taps and their coefficient values are determined from the channel response of a specific channel type and length. The channel used for this model is a 30-inch FR4 PCB trace.
As described in the survey in subsection 2.1.1, there are many methods for calculating the tap coefficients. The one used in this model is the zero-forcing method, which is described in this section. The main purpose of the FIR filter is to cancel the ISI added by the channel, which is defined by the channel impulse response shown in Figure 3.3.


Figure 3.3.: Channel Response of a 30 inch FR4.

In this model, there are three tap coefficients $(W_{1}, W_{2}, W_{3})$ and three cursors $(h_{1}, h_{2}, h_{3})$ describing the channel impulse response. By definition of the zero-forcing algorithm, the goal is an equivalent impulse response equal to one at the main cursor and zero at all other cursors. This target is expressed by the vector $Z_{\text {des }}$, which has the value one at index n and zeros elsewhere, as in Equation 3.1. The index n is given by Equation 3.2. The channel impulse response coefficients are then placed in the matrix Hch, whose size is given by Equation 3.3, where $k$ is the length of the channel pulse model, $i$ is the number of taps, and $l$ is the number of input symbols. Finally, the tap coefficients are calculated from Equation 3.4 using the MatLab code included in the appendix at subsection A.2.3. The code of the flip-flop and the driver is given in section A.2.

$$
\begin{equation*}
\mathrm{Hch}_{5 \times 3}=\left[\begin{array}{ccc}
h_{1} & 0 & 0 \\
h_{2} & h_{1} & 0 \\
h_{3} & h_{2} & h_{1} \\
0 & h_{3} & h_{2} \\
0 & 0 & h_{3}
\end{array}\right] \quad Z_{\text{des},\, 5 \times 1}=\left[\begin{array}{l}
1 \\
0 \\
0 \\
0 \\
0
\end{array}\right] \tag{3.1}
\end{equation*}
$$

$$
\begin{equation*}
n=\text { number of channel pre cursor samples }+ \text { number of FIR taps }-1 \tag{3.2}
\end{equation*}
$$

$$
\begin{equation*}
\text{number of rows}=k+l+i-2, \quad \text{number of columns}=i+l-1 \tag{3.3}
\end{equation*}
$$

$$
\begin{equation*}
W_{Z,\, 3 \times 1}=\left(\mathrm{Hch}_{5 \times 3}^{T}\, \mathrm{Hch}_{5 \times 3}\right)^{-1} \mathrm{Hch}_{5 \times 3}^{T}\, Z_{\text{des},\, 5 \times 1} \tag{3.4}
\end{equation*}
$$
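
Equation 3.4 is the least-squares (normal-equation) solution of $\mathrm{Hch}\,W \approx Z_{\text{des}}$. The Python sketch below builds the convolution matrix of Equation 3.1 and solves the 3×3 normal equations with plain Gaussian elimination, so it has no dependencies. It places the one in the first row of $Z_{\text{des}}$, as in Equation 3.1. The channel cursors `h_example` are hypothetical, not the measured 30-inch FR4 response, and the function names are illustrative (the thesis uses its own MatLab code).

```python
def conv_matrix(h, n_taps):
    """Build the (len(h) + n_taps - 1) x n_taps convolution matrix Hch of Eq. 3.1."""
    rows = len(h) + n_taps - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n_taps)]
            for r in range(rows)]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def zero_forcing_taps(h, n_taps, n_main=0):
    """Zero-forcing FIR taps from Eq. 3.4: W = (Hch^T Hch)^-1 Hch^T Zdes."""
    hch = conv_matrix(h, n_taps)
    z_des = [1.0 if r == n_main else 0.0 for r in range(len(hch))]
    hth = [[sum(row[i] * row[j] for row in hch) for j in range(n_taps)]
           for i in range(n_taps)]
    htz = [sum(row[i] * z for row, z in zip(hch, z_des)) for i in range(n_taps)]
    return solve(hth, htz)


h_example = [0.6, 0.3, 0.1]      # hypothetical main cursor and two post-cursors
w = zero_forcing_taps(h_example, 3)
print([round(x, 4) for x in w])
```

For a real channel, `h_example` would be replaced by the sampled pulse response of Figure 3.3. The residual $\mathrm{Hch}\,W - Z_{\text{des}}$ is orthogonal to every column of Hch, which is the defining property of the least-squares solution in Equation 3.4.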

The complete FIR filter model, shown in Figure 3.4, is built from two main blocks.


Figure 3.4.: Circuit Model of The FIR.

- Driver gm cell: The driver model is based on the Current Mode Logic (CML) driver shown in Figure 3.5. It takes two-level digital input data (0 and 1) and steers a current into the termination resistance. The current value is set by the desired output swing through Equation 3.5 and multiplied by a factor given by the tap coefficient. The currents of all taps are then combined at the summing node and converted into a voltage across the termination load. In Verilog-A, this is implemented directly with a current contribution statement, as shown in the appendix.
- Delay Unit (Flip-Flop): The delay unit must provide a delay equal to one Unit Interval (UI) of the data rate, so it can be modeled as a digital D flip-flop. The Verilog-A code is based on a cross event, which updates the output with the input data only when the clock crosses the threshold $V_{t h}=0.5 \mathrm{~V}$. The dir variable selects whether the rising or the falling clock edge is active. The code is included in the appendix.

$$
\begin{equation*}
V_{\text {Swing diff }}=I R_{L} \tag{3.5}
\end{equation*}
$$



Figure 3.5.: CML Driver Architecture.
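
The way the driver forms its output can be summarized as a signed, weighted sum of delayed bits. The Python sketch below assumes each tap steers a current $I w_k$ whose sign follows its delayed bit, with all currents summed into $R_L$. By Equation 3.5, each output sample is then $V_{\text{Swing diff}} \sum_k w_k d[n-k]$. The tap weights and the 0.4 V swing are hypothetical, and bits before the start of the sequence are assumed to be "Zero".

```python
def fir_output(bits, taps, v_swing):
    """Differential output samples of a current-mode FIR driver.

    Each tap contributes +w_k or -w_k depending on its delayed bit (1 -> +, 0 -> -);
    the summed tap currents flow into R_L, so each sample is
    V_swing * sum_k w_k * d[n - k] (Eq. 3.5: V_swing = I * R_L).
    """
    symbols = [1 if b else -1 for b in bits]
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, w in enumerate(taps):
            d = symbols[n - k] if n - k >= 0 else -1   # bits before the start assumed "Zero"
            acc += w * d                               # signed tap contribution
        out.append(v_swing * acc)
    return out


TAPS = [1.0, -0.25, -0.1]    # hypothetical main-cursor and post-cursor weights
pulse_response = fir_output([0, 0, 1, 0, 0, 0], TAPS, v_swing=0.4)
print([round(v, 3) for v in pulse_response])
```

The printed sequence shows each tap being added or subtracted as the isolated "One" moves through the delay line.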

### 3.1.2.2. Results and Test Bench

Each tap coefficient is added or subtracted depending on whether the corresponding data bit is "One" or "Zero". The test bench of the FIR module is shown in Figure 3.6. The figure shows the impulse response of the FIR filter and how the tap values are added or subtracted according to the data value. The output of the FIR for a single pulse is shown in Figure 3.7.


Figure 3.6.: FIR Test-Bench.


Figure 3.7.: Pulse Response of The FIR Block.

### 3.2. Receiver Blocks

### 3.2.1. Variable Gain Amplifier

### 3.2.1.1. Functionality

The VGA helps the DFE resolve the signal by amplifying it to compensate for the channel attenuation and the total attenuation along the signal path. In most cases, it is better to place the VGA after the CTLE block to avoid saturation.

### 3.2.1.2. Verilog-A Modeling

The VGA has been designed to provide a gain from 7 dB up to 14 dB, selected according to the channel response and the total attenuation of the signal. The Verilog-A code simply multiplies the input voltage by a factor whose value is determined by the three control signals $b_{0}, b_{1}$, and $b_{2}$. A series of if-statements compares the control signals with a threshold and selects the gain accordingly. The code of the VGA is given in section A.3.

### 3.2.1.3. Test Bench and Results

To test the functionality of the model, the following test bench is used. The input of the VGA is either the output of the CTLE block, when the whole system is integrated, or a sine-wave source when the block is tested alone. In the whole system, the output of the VGA is connected to the input of the DFE block. Therefore, a load capacitance $C_{L}$ is placed at the output of the VGA to emulate the capacitance seen from the DFE. The three control signals, $b_{0}, b_{1}$, and $b_{2}$, determine the gain of the VGA.
Table 3.1 shows the values of the control signals $b_{0}, b_{1}$, and $b_{2}$ and the corresponding VGA gain.

| $b_{0} b_{1} b_{2}$ | Gain (dB) | Gain Referred <br> to 000 Gain <br> $(\mathrm{dB})$ |
| :---: | :---: | :---: |
| 000 | 7 | 0 |
| 001 | 8 | +1 |
| 010 | 9 | +2 |
| 011 | 10 | +3 |
| 100 | 11 | +4 |
| 101 | 12 | +5 |
| 110 | 13 | +6 |
| 111 | 14 | +7 |

Table 3.1.: VGA Model Gain VS. Input Code.
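
Table 3.1 implies a simple mapping: the code $b_0 b_1 b_2$ is read as a binary number with $b_0$ as the MSB, and each step adds 1 dB to the 7 dB base gain. The Python sketch below reproduces that mapping. Unlike the Verilog-A model, it takes the control bits directly instead of comparing control voltages with a threshold, and the function names are illustrative.

```python
def vga_gain_db(b0, b1, b2):
    """VGA gain in dB from Table 3.1: 7 dB plus 1 dB per code step (b0 = MSB)."""
    code = 4 * b0 + 2 * b1 + b2
    return 7 + code

def vga_output(v_in, b0, b1, b2):
    """Scale the input voltage by the linear gain that corresponds to the code."""
    return v_in * 10 ** (vga_gain_db(b0, b1, b2) / 20)


for code in range(8):
    b0, b1, b2 = (code >> 2) & 1, (code >> 1) & 1, code & 1
    print(f"{b0}{b1}{b2}: {vga_gain_db(b0, b1, b2)} dB, x{vga_output(1.0, b0, b1, b2):.3f}")
```

In a voltage-domain model, this linear factor is what the series of if-statements ultimately selects.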

### 3.2.2. Continuous Time Linear Equalizer

### 3.2.2.1. Functionality

The proposed Continuous Time Linear Equalizer topology is a differential amplifier with adaptive source degeneration. The degeneration uses a variable resistance and a variable capacitance, which keeps adaptation simple. The target data rate is $10 \mathrm{~Gb} / \mathrm{s}$, so the required bandwidth is 5 GHz (the Nyquist frequency). To achieve this bandwidth, bandwidth-extension techniques are used, such as the Negative Impedance Converter (NIC) that adds a negative capacitance, as described in chapter two. Another technique proposed in this work is shunt peaking with an inductor, as shown in Figure 3.8, because it maintains the 5 GHz bandwidth without consuming additional power. Shunt peaking keeps the desired bandwidth and increases the peak gain. The adjustable degeneration resistance controls the DC gain without changing the peak gain, and the adjustable capacitor controls the slope with which the response rises.

### 3.2.2.2. Modeling

A Verilog-A model is created and tested under some approximations. First, the inductor and resistors are treated as ideal elements, i.e., the inductor has a very large quality factor. Second, the transfer function does not include the self-loading due to the drain capacitance. This is an acceptable approximation that gives results close to the real design. The Verilog-A block models the frequency response of the CTLE using the transfer function in Equation 3.6. The CTLE model has nine digital signals that control the adaptation and adjust the equalization response. Four signals control $R_{s}$ ($D_{20}, D_{21}, D_{22}$, and $D_{23}$) to adapt the DC gain. Three signals control $C_{s}$ ($D_{10}, D_{11}, D_{12}$) to adapt the peak. Two signals control $R_{D}$ ($D_{30}, D_{31}$) to adapt the common mode. The code of the CTLE is attached in the appendix at section A.4.


Figure 3.8.: CTLE With Inductive Peaking.

$$
\begin{equation*}
\frac{V_{\text {out }}}{V_{\text {in }}}=\frac{g_{m 1} R_{D}}{1+g_{m 1} R_{s} / 2} \cdot \frac{\left(1+\frac{S}{W_{z 1}}\right)\left(1+\frac{S}{W_{z 2}}\right)}{\left(1+\frac{S}{W_{p 1}}\right)\left(1+\frac{2 \zeta S}{W_{n}}+\frac{S^{2}}{W_{n}^{2}}\right)} \tag{3.6}
\end{equation*}
$$

### 3.2.2.3. Test Bench and Results

The CTLE model block is tested with an Alternating Current (AC) simulation to obtain the frequency response at specific digital control settings, as shown in Figure 3.9. The frequency response of the CTLE, shown in Figure 3.10, confirms that the 5 GHz bandwidth is achievable with this topology. The design parameter values are listed in Table 3.2.


Figure 3.9.: CTLE Test-Bench.


Figure 3.10.: Frequency Response of The CTLE.

| Parameter | Value |
| :---: | :---: |
| $R_{D}$ | 300 Ohm |
| $R_{s}$ | 1000 Ohm |
| $g_{m}$ | $2 \mathrm{~mA} / \mathrm{V}$ |
| $L_{p}$ | 15 nH |
| $C_{s}$ | 60 fF |

Table 3.2.: Design Parameters of The CTLE.
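
Equation 3.6 can be evaluated directly with complex arithmetic. The Python sketch below implements it with $S = j2\pi f$ and uses the Table 3.2 values to compute the DC gain. The pole and zero frequencies are not listed in this section, so they appear only as placeholders. They cancel out at $f = 0$ and would have to be replaced by the design's actual values before sweeping frequency. The `zeta = 0.7` value is likewise a placeholder.

```python
import math

def ctle_response(f, gm, r_d, r_s, wz1, wz2, wp1, wn, zeta):
    """Complex voltage gain of Eq. 3.6 at frequency f in Hz (S = j*2*pi*f)."""
    s = 2j * math.pi * f
    dc_gain = gm * r_d / (1 + gm * r_s / 2)
    num = (1 + s / wz1) * (1 + s / wz2)
    den = (1 + s / wp1) * (1 + 2 * zeta * s / wn + (s / wn) ** 2)
    return dc_gain * num / den


# Table 3.2 values; the pole/zero frequencies below are placeholders only
# (they are not listed in this section and drop out at f = 0).
W_PLACEHOLDER = 2 * math.pi * 1e9
h0 = ctle_response(0.0, gm=2e-3, r_d=300.0, r_s=1000.0,
                   wz1=W_PLACEHOLDER, wz2=W_PLACEHOLDER,
                   wp1=W_PLACEHOLDER, wn=W_PLACEHOLDER, zeta=0.7)
print(f"DC gain = {abs(h0):.3f} V/V = {20 * math.log10(abs(h0)):.1f} dB")
```

With $g_m = 2$ mA/V, $R_D = 300$ Ohm, and $R_s = 1000$ Ohm, the DC gain is $0.6/2 = 0.3$ V/V (about -10.5 dB). This low DC gain is what the VGA later compensates.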

### 3.2.3. Decision Feedback Equalizer

### 3.2.3.1. Functionality

The Decision Feedback Equalizer cancels the post-cursor ISI and makes the data decisions. Its input is the equalized signal from the CTLE (after the VGA), and its output is the binary bit stream after the decisions.

### 3.2.3.2. Verilog-A Modeling

The DFE uses a three-tap direct half-rate architecture with two slicers. One slicer samples the even data at the positive clock edge, and the other samples the odd data at the negative clock edge. The clock frequency is 5 GHz.

Gm cell: It is used to amplify the input voltage signal and its output current goes through the load of the summing node. The code is attached in the appendix at subsection A.5.1.

Slicer: It compares the differential voltage of summing node with threshold at clock edge (positive or negative) to determine the output binary data "Zero" or "One". The code of the slicer is at subsection A.5.2.

Taps: Each tap scales its binary data input by its coefficient, and its output current flows through the load of the summing node. Each tap value equals the corresponding post-cursor value normalized to the main cursor value. The code of the taps is at subsection A.5.3.

A capacitor is added at the summing node to model the loading of the taps and the slicer. This capacitor limits the bandwidth of the Gm cell and also sets the settling time of the summing node.

### 3.2.3.3. Test-Bench and Results

To test the functionality of the model, the following test bench is used. The schematic of the simulation is shown in Figure 3.11 and Figure 3.12.


Figure 3.11.: Schematic of DFE.


Figure 3.12.: Symbol of The DFE.

The bit "One" is represented by a main cursor equal to one followed by three post-cursors ($0.8, 0.5, 0.3$), as in Figure 3.13. The bit "Zero" is represented by the same values with a negative sign. This emulates an input signal with post-cursor ISI.


Figure 3.13.: Waveform of Bit "One".

The tap coefficients equal the post-cursor values normalized to the main cursor value, so in this case they are ($0.8, 0.5, 0.3$).
Example: the input data pattern "101011" is shown in Figure 3.14.


Figure 3.14.: Waveform of Input Data "101011".

At the DFE output, the even-data slicer makes its decision at the positive clock edge and the odd-data slicer at the negative clock edge. The output of each slicer therefore changes every 200 ps, one period of the 5 GHz clock. The simulation results are shown in Figure 3.15; they show that the output "101011" is identical to the input.


Figure 3.15.: Waveform of Output Data and Clk.
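
The test case above can be reproduced at the sample level. The Python sketch below superposes the pulse $(1, 0.8, 0.5, 0.3)$ for each ±1 symbol, then runs a direct three-tap DFE. The DFE subtracts the taps multiplied by the previous decisions and slices at zero. This is an idealized, noise-free model with hypothetical function names. It ignores the Gm cell, the summing-node settling, and clocking, and it assumes no signal before the first bit. The half-rate split only changes which slicer makes each decision, shown by the even/odd slices.

```python
def received_samples(bits, pulse):
    """Sampled channel output: superpose +/-pulse for each bit (1 -> +1, 0 -> -1)."""
    d = [1 if b else -1 for b in bits]
    return [sum(pulse[k] * d[n - k] for k in range(len(pulse)) if n - k >= 0)
            for n in range(len(d))]

def dfe(samples, taps):
    """Direct DFE: subtract taps * previous decisions, then slice at zero."""
    decisions = []
    for n, y in enumerate(samples):
        feedback = sum(w * decisions[n - 1 - j]
                       for j, w in enumerate(taps) if n - 1 - j >= 0)
        decisions.append(1 if y - feedback > 0 else -1)
    return [1 if d > 0 else 0 for d in decisions]


pulse = [1.0, 0.8, 0.5, 0.3]   # main cursor and three post-cursors (Figure 3.13)
taps = [0.8, 0.5, 0.3]         # post-cursors normalized to the main cursor
bits = [1, 0, 1, 0, 1, 1]      # input pattern "101011"
out = dfe(received_samples(bits, pulse), taps)
even, odd = out[0::2], out[1::2]   # decisions of the even and odd slicers
print(out, even, odd)
```

The pattern "101011" happens to be sliced correctly even without feedback, but the worst case is not. For "0001", the last raw sample is $1 - 0.8 - 0.5 - 0.3 = -0.6$, so a plain slicer outputs 0. After subtracting the feedback of $-1.6$, the DFE sees $+1.0$ and outputs 1.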

### 3.3. Verilog-A Model Integration Results

In this part, the integration of the whole system is discussed. As shown in Figure 3.16, the first part generates the random bits using eight random bit generators, each running at 1.25 Gb/s with a different seed. An 8-to-1 MUX then converts the data from parallel to serial. The serial data passes through the driver and then the channel, which in this simulation is a 30-inch FR4 trace. The Rx part, also shown in Figure 3.16, consists of the CTLE, followed by the VGA and the DFE. The last block is the De-MUX.


Figure 3.16.: Tx/Rx Verilog-A model Block Diagram.

To test the whole system, eight random bit generators with a bit rate of 1.25 Gb/s generate the data stream, and a transient analysis is performed. The performance of the system is assessed by plotting the eye diagram of the data at different points in the system.
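
An eye diagram is obtained by cutting the waveform into UI-long segments and overlaying them. The Python sketch below shows that folding step and a simple eye-height measure at one sampling phase. The measure is the lowest sample of the transmitted "One" bits minus the highest sample of the "Zero" bits, so a negative value means a closed eye. It assumes a uniformly sampled, already-aligned waveform and known transmitted bits. The function names and the synthetic test waveforms are hypothetical.

```python
def fold_eye(waveform, samples_per_ui):
    """Cut a uniformly sampled waveform into one-UI traces (one trace per bit)."""
    n_ui = len(waveform) // samples_per_ui
    return [waveform[k * samples_per_ui:(k + 1) * samples_per_ui] for k in range(n_ui)]

def eye_height(traces, bits, phase):
    """Vertical eye opening at a sampling phase (index within the UI).

    Uses the known transmitted bits: lowest 'One' sample minus highest 'Zero' sample.
    """
    highs = [tr[phase] for tr, b in zip(traces, bits) if b]
    lows = [tr[phase] for tr, b in zip(traces, bits) if not b]
    return min(highs) - max(lows)


# Ideal NRZ example: +/-1 V held for 4 samples per UI, no ISI.
bits = [1, 0, 1, 1, 0]
wave = [1.0 if b else -1.0 for b in bits for _ in range(4)]
print(eye_height(fold_eye(wave, 4), bits, phase=2))
```

Adding ISI shrinks the opening. For example, a 0.3 first post-cursor reduces the 2 V ideal opening to 1.4 V, which is the kind of degradation visible between the eye diagrams before and after the channel.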

The first result of the transient analysis is the eye diagram before the channel. The eye diagram is shown in Figure 3.17.


Figure 3.17.: Eye Diagram before The Channel.

The second result of the transient analysis is the eye diagram after the channel. The eye diagram is shown in Figure 3.18.


Figure 3.18.: Eye Diagram after The Channel.

After the channel, the signal goes through the CTLE block. The eye diagram after the CTLE is shown in Figure 3.19.


Figure 3.19.: Eye Diagram after The CTLE.

The eye diagram after the CTLE shows that the DC gain is low. The VGA therefore increases the gain and further opens the eye. The eye diagram after the VGA is shown in Figure 3.20.


Figure 3.20.: Eye Diagram after The VGA.

## 4. The Design of the Receiver

In this chapter, the design of the receiver blocks is discussed in detail, including the steps used to obtain the design parameters. The first section reviews the AFE design, including the BGR, the LDO, the CTLE, the VGA, and the termination matching. The second section reviews the digital part of the receiver, including the DFE and the CDR.

### 4.1. BGR

As mentioned in subsection 2.2.1, the BGR is an essential block in any analog circuit. Several designs were discussed in chapter two; this part presents the design methodology of the proposed design. This BGR is based on the design in [6], with some modifications to meet the new specifications. The schematic of the modified BGR core, together with the start-up circuit, is shown in Figure 4.1.


Figure 4.1.: Schematic of the BGR.

### 4.1.1. Design Methodology

The goal of this section is to determine the design parameters of the BGR. The first step is to analyze the circuit and derive the design equations. The design parameters are then determined from these equations and the required specifications.

### 4.1.1.1. Analysis of The BGR Core

The circuit shown in Figure 4.1 contains two BJTs. $Q_{1}$ is composed of N parallel instances; therefore, $V_{b e 1} \neq V_{b e 2}$. The NMOS transistors MN1 and MN2 carry the same drain current because of the PMOS current mirror, and they also have the same gate voltage. This equates the voltages at nodes $V_{1}$ and $V_{2}$ and hence imposes a voltage drop equal to $\Delta V_{b e}$ across $R_{1}$, as given in Equation 4.1.

$$
\begin{equation*}
\Delta V_{b e}=V_{T} \ln (N) \tag{4.1}
\end{equation*}
$$

where $V_{T}$ is the thermal voltage. Then $I_{11}$ can be calculated as in Equation 4.2, and $I_{12}$ can be calculated as in Equation 4.3.

$$
\begin{align*}
& I_{11}=\frac{\Delta V_{b e}}{R_{1}}  \tag{4.2}\\
& I_{12}=\frac{\left|V_{b e 2}\right|}{R_{2}} \tag{4.3}
\end{align*}
$$

From Equation 4.1, it can be easily shown that $\Delta V_{b e}$ is PTAT. The next step is to calculate $I_{1}$ as shown in Equation 4.4.

$$
\begin{equation*}
I_{1}=I_{11}+I_{12}=\frac{\Delta V_{b e}}{R_{1}}+\frac{\left|V_{b e 2}\right|}{R_{2}} \tag{4.4}
\end{equation*}
$$

The PMOS current mirror serves to equate the $I_{1}, I_{2}$ and $I_{3}$. Then the $V_{\text {ref }}$ can be calculated as shown in Equation 4.5.

$$
\begin{equation*}
V_{r e f}=I_{3} R_{3}=R_{3}\left(\frac{\Delta V_{b e}}{R_{1}}+\frac{\left|V_{b e 2}\right|}{R_{2}}\right)=\frac{R_{3}}{R_{2}}\left(\Delta V_{b e} \frac{R_{2}}{R_{1}}+\left|V_{b e 2}\right|\right) \tag{4.5}
\end{equation*}
$$

In Equation 4.5, $\Delta V_{b e}$ serves as the PTAT voltage and $\left|V_{b e 2}\right|$ serves as the CTAT voltage. The goal is to choose $R_{1}, R_{2}$, and $R_{3}$ so that the first-order variations of the PTAT and CTAT voltages cancel each other, which minimizes the variation of $V_{\text {ref }}$. The ratio $R_{3} / R_{2}$ sets the value of $V_{\text {ref }}$ without affecting the balance between the PTAT and CTAT voltages. The ratio $R_{2} / R_{1}$ is used to minimize the temperature variation of $V_{r e f}$.
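
The two roles of the resistor ratios can be shown with a first-order calculation. The Python sketch below assumes $|V_{be2}|$ falls linearly with temperature, with slope `dvbe_dt`. It sets $R_2/R_1$ so that the PTAT slope $(k/q)\ln(N)\,R_2/R_1$ cancels that slope, and then sets $R_3/R_2$ to reach the target $V_{ref}$. The CTAT slope of -1.8 mV/K, N = 8, and $|V_{be2}| = 0.65$ V at 300 K are typical illustrative assumptions, not the design values of this thesis.

```python
import math

K_OVER_Q = 8.617e-5   # Boltzmann constant over electron charge, V/K

def bgr_ratios(n, dvbe_dt, vbe0, t0, v_ref_target):
    """First-order resistor ratios (R2/R1, R3/R2) for the BGR of Eq. 4.5.

    n: number of parallel devices in Q1; dvbe_dt: slope of |Vbe2| in V/K (< 0);
    vbe0: |Vbe2| at temperature t0 in K.
    """
    r2_r1 = -dvbe_dt / (K_OVER_Q * math.log(n))            # PTAT slope cancels CTAT slope
    bracket = K_OVER_Q * t0 * math.log(n) * r2_r1 + vbe0   # term in parentheses, Eq. 4.5
    return r2_r1, v_ref_target / bracket                   # R3/R2 scales to the target

def v_ref(temp, n, dvbe_dt, vbe0, t0, r2_r1, r3_r2):
    """Eq. 4.5 with Delta Vbe = V_T ln(N) and a linear |Vbe2|(T) model."""
    dvbe = K_OVER_Q * temp * math.log(n)
    vbe2 = vbe0 + dvbe_dt * (temp - t0)
    return r3_r2 * (dvbe * r2_r1 + vbe2)


params = dict(n=8, dvbe_dt=-1.8e-3, vbe0=0.65, t0=300.0)
r2_r1, r3_r2 = bgr_ratios(v_ref_target=0.6, **params)
print(f"R2/R1 = {r2_r1:.2f}, R3/R2 = {r3_r2:.3f}")
for t in (233.0, 300.0, 373.0):
    print(t, round(v_ref(t, r2_r1=r2_r1, r3_r2=r3_r2, **params), 6))
```

With these assumptions, $R_2/R_1 \approx 10$ and $R_3/R_2 \approx 0.5$, and $V_{ref}$ stays at 0.6 V across temperature in this first-order model. The residual curvature seen in simulation comes from the nonlinear temperature dependence of $V_{be}$, which this sketch ignores.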
The non-idealities of the design are not considered in the above analysis. To take them into account, the first step is to identify their sources.
The first concern is the differential pair MN1-MN2 that is used to equate $V_{1}$ and $V_{2}$. The analysis assumed that the two transistors have the same $V_{g s}$. For a more accurate analysis, channel-length modulation must be considered, since the $V_{d s}$ of the two transistors is not the same. The actual $\Delta V_{b e}$ is therefore given by Equation 4.6.

$$
\begin{equation*}
\Delta V_{b e, a c t u a l}=V_{T} \ln (N)+V_{o s} \tag{4.6}
\end{equation*}
$$

where $V_{o s}$ is the offset voltage of the MN1-MN2 pair.
The effect of channel-length modulation can be minimized by increasing the channel length of MN1 and MN2, which reduces $V_{o s}$.
The second source of non-ideality is the PMOS current mirror, whose current ratio becomes less accurate because of channel-length modulation. This can be modeled by an additional factor in the current ratio, i.e., $I_{3}=G I_{1}$. This changes the $V_{r e f}$ equation to Equation 4.7.

$$
\begin{equation*}
V_{r e f}=G \frac{R_{3}}{R_{2}}\left(\Delta V_{b e} \frac{R_{2}}{R_{1}}+\left|V_{b e 2}\right|\right) \tag{4.7}
\end{equation*}
$$

This non-ideality can also be minimized by increasing the channel length of the PMOS transistors.
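
Differentiating Equation 4.7 with Equation 4.6 substituted shows why the pair offset matters: $\partial V_{ref}/\partial V_{os} = G R_3/R_1$, so the offset is amplified by the resistor ratio. The mirror error $G$, in turn, scales $V_{ref}$ directly. The Python sketch below evaluates both effects at a hypothetical operating point ($R_2/R_1 = 10$, $R_3/R_2 = 0.5$), chosen only for illustration.

```python
import math

def v_ref_nonideal(v_t, n, v_os, vbe2, r2_r1, r3_r2, g=1.0):
    """V_ref from Eq. 4.7, using the offset-corrected Delta Vbe of Eq. 4.6."""
    dvbe = v_t * math.log(n) + v_os            # Eq. 4.6
    return g * r3_r2 * (dvbe * r2_r1 + vbe2)   # Eq. 4.7


# Hypothetical operating point: V_T = 25.85 mV, N = 8, |Vbe2| = 0.65 V,
# R2/R1 = 10 and R3/R2 = 0.5, so dV_ref/dV_os = G * R3/R1 = 5.
ideal = v_ref_nonideal(25.85e-3, 8, 0.0, 0.65, 10.0, 0.5)
with_offset = v_ref_nonideal(25.85e-3, 8, 1e-3, 0.65, 10.0, 0.5)
mirror_err = v_ref_nonideal(25.85e-3, 8, 0.0, 0.65, 10.0, 0.5, g=1.01)
print(f"V_ref shift for 1 mV offset: {(with_offset - ideal) * 1e3:.2f} mV")
print(f"V_ref shift for 1 % mirror error: {(mirror_err - ideal) * 1e3:.2f} mV")
```

At this point, a 1 mV pair offset moves $V_{ref}$ by about 5 mV. This supports the choice of long channels for MN1, MN2, and the PMOS mirror.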

### 4.1.1.2. Analysis of The Start-Up Circuit

Any BGR circuit has two stable states: the zero-current state and the desired state. The purpose of the start-up circuit is to ensure that the BGR core does not operate in the zero-current state. In this design, when the BGR is in the zero-current state, $V_{\text {ref }}=0$, so MSU1 is off and $I_{4}=0$, which makes $V_{x}=V_{D D}$. The second part is a two-stage inverter that works as a buffer. When $V_{X}=V_{D D}$, $V_{Z}=V_{D D}$ as well, so MSU6 acts as a closed switch and $V_{y}=0$. This forces the start-up signal to 0, which forces the BGR core to start up.
The second mode of operation of the start-up circuit is when the BGR is on. In this mode, $V_{r e f}=0.6$ V, so MSU1 operates in saturation and $V_{x}=V_{D D}-I_{4} R_{4}$. $R_{4}$ is chosen so that $I_{4} R_{4}$ is approximately equal to $V_{D D}$. This makes $V_{x}$ approximately zero and also $V_{Z}=0$, so MSU6 acts as an open switch.

### 4.1.1.3. Design Steps

In this section, the design steps are explained; their flow chart is shown in Figure 4.2. The first step is to choose the channel length, taking into consideration the non-idealities mentioned in subsubsection 4.1.1.1.


Figure 4.2.: Flow Chart of The Design Steps for The BGR.

The second step is to obtain the transistor widths from the $G_{m} / I_{d}$ methodology. The branch currents follow from the supply-current requirement, which is $>10 \mu A$, divided among the three branches: $I_{1}=2.5 \mu A$, $I_{2}=2.5 \mu A$, and $I_{3}=5 \mu A$. The next step is to choose the resistor values and iterate until the design satisfies the required specifications.

### 4.1.2. Simulation Results

To fully test the BGR, three types of analysis were performed: DC, AC, and transient.

### 4.1.2.1. DC Analysis

The purpose of this analysis is to ensure that all transistors operate in their appropriate regions. The first simulation shows the variation of $V_{\text {ref }}$ and $I_{r e f}$ as the temperature changes from -40 to $100{ }^{\circ} \mathrm{C}$. The $V_{r e f}$ curve is shown in Figure 4.3, and the $I_{\text {ref }}$ curve in Figure 4.4. These curves give $V_{\mathrm{ref},\min}=0.60011$ V at $T=100{ }^{\circ}\mathrm{C}$, $V_{\mathrm{ref},\max}=0.6014$ V at $T=10{ }^{\circ}\mathrm{C}$, and $V_{\mathrm{ref},\mathrm{nom}}=0.6013$ V at $T=27{ }^{\circ}\mathrm{C}$. The temperature coefficient is then calculated as in Equation 4.8.

$$
\begin{equation*}
T C\,(\mathrm{ppm}/{ }^{\circ}\mathrm{C})=\frac{10^{6}}{V_{\mathrm{ref},\mathrm{nom}}} \cdot \frac{V_{\mathrm{ref},\max}-V_{\mathrm{ref},\min}}{T_{\max }-T_{\min }}=15.9 \tag{4.8}
\end{equation*}
$$



Figure 4.3.: $V_{\text {ref }}$ Versus Temperature.

Figure 4.4.: $I_{\text {ref }}$ Versus Temperature.

Another variation that the BGR should account for is the variation of the supply voltage. To show its effect on $V_{r e f}$, a DC analysis sweeping the supply voltage from 1 V to 2.5 V is performed. The results of the simulation are shown in Figure 4.5. When the supply voltage is below 1.5 V, the BGR circuit does not operate in its intended region. As the supply voltage increases, the output of the BGR settles around 0.6 V. The figure also shows that $V_{\text {ref }}$ is noticeably affected by the supply voltage, which can be attributed to the relatively low PSRR shown later in this section.


Figure 4.5.: $V_{\text {ref }}$ Versus Variations in Supply Voltage.

Process variations were examined through corner simulations using four model libraries: fast-fast (FF), fast-slow (FS), slow-slow (SS), and slow-fast (SF). The variation of $V_{\text {ref }}$ with temperature across these corners is shown in Figure 4.6. The figure shows that the BGR is largely insensitive to the corners; however, in the FF and SS corners $V_{\text {ref }}$ shifts away from the required value of 0.6 V. This shift is caused by the change in the resistor values, in particular $R_{3}$, and can be corrected by using a digitally controlled resistor.
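The digitally controlled resistor mentioned above can be illustrated with a small trim-code search. This is a hypothetical sketch: it assumes $R_{3}$ is realized as a switched array whose signed code $k$ scales it to $R_{3}(1 + k \cdot \text{step})$, and that $V_{\text {ref }}$ scales proportionally with $R_{3}$. The 4-bit width, the 0.5 % step, and the 612 mV corner value are illustrative, not taken from this design.

```python
def select_trim_code(v_measured, v_target=0.6, step=0.005, n_bits=4):
    """Return the signed trim code k (and resulting V_ref) closest to v_target.

    Hypothetical model: the trimmed resistor is R3 * (1 + k * step) and
    V_ref is assumed to scale proportionally with R3.
    """
    k_min, k_max = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    best_k = min(range(k_min, k_max + 1),
                 key=lambda k: abs(v_measured * (1 + k * step) - v_target))
    return best_k, v_measured * (1 + best_k * step)


# Illustrative corner result of 612 mV (not a simulated value from this design)
code, v_trimmed = select_trim_code(0.612)
print(f"trim code {code}, V_ref -> {v_trimmed * 1e3:.2f} mV")
```

An exhaustive search is adequate here because a 4-bit code has only 16 settings.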


Figure 4.6.: $V_{\text {ref }}$ Versus Temperature Across Corners.

### 4.1.2.2. AC Analysis

In this part, AC analysis is used to evaluate the PSRR of the BGR. The BGR should have a good PSRR to suppress supply variations and make $V_{\text {ref }}$ as insensitive to them as possible. The PSRR is shown in Figure 4.7. As the figure shows, the PSRR is relatively low at low frequencies and improves as the frequency increases, owing to the capacitor added at the output of the BGR.


Figure 4.7.: PSRR Versus Frequency.

The variation of the PSRR with temperature was also simulated at 1 kHz while sweeping the temperature from -40 to $100{ }^{\circ} \mathrm{C}$. The results are shown in Figure 4.8: the PSRR increases with temperature.


Figure 4.8.: PSRR Versus Temperature @ 1 kHz.

### 4.1.2.3. Transient Analysis

In this part, transient analysis is used to verify the start-up circuit. As discussed in subsubsection 4.1.1.2, the start-up circuit ensures that the BGR settles at its intended operating point rather than in the degenerate zero-current state. The transient response of the BGR is shown in Figure 4.9, from which the settling time of the BGR is approximately $60 \mu \mathrm{s}$.


Figure 4.9.: Transient Response of The BGR.

### 4.1.2.4. Monte Carlo Simulation

In this part, Monte Carlo simulation is used to test the circuit against process and mismatch variations. A transient analysis is used so that the effect of the start-up circuit is also captured. The first simulation is run at the nominal corner at $27{ }^{\circ} \mathrm{C}$ with process and mismatch variations enabled. The result is shown in Figure 4.10: the mean reference voltage is 600 mV and the standard deviation is 10.9 mV. The transient waveforms in the figure also show that the start-up circuit works across process and mismatch variations.
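Assuming the Monte Carlo distribution is approximately Gaussian, the mean and standard deviation can be turned into a $\pm 3\sigma$ window, which covers about 99.7 % of samples. The short sketch below does this for the nominal-corner result ($\mu = 600$ mV, $\sigma = 10.9$ mV); the helper name is illustrative.

```python
def sigma_window(mean_mv, std_mv, k=3.0):
    """Return the +/- k-sigma window (mV) and its half-width relative to the mean (%)."""
    lo, hi = mean_mv - k * std_mv, mean_mv + k * std_mv
    return lo, hi, 100.0 * k * std_mv / mean_mv


lo, hi, pct = sigma_window(600.0, 10.9)
print(f"3-sigma window: {lo:.1f} mV to {hi:.1f} mV (+/-{pct:.2f} %)")
```

The same helper applies to the other Monte Carlo results in this section by substituting their mean and standard deviation.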


Figure 4.10.: Monte Carlo Simulation of The BGR at Nominal Corner.

The second simulation is performed at the nominal corner at two temperatures, -40 and $100{ }^{\circ} \mathrm{C}$, using 1000 runs. The results are shown in Figure 4.11. At $\mathrm{T}=-40{ }^{\circ} \mathrm{C}$, the mean reference voltage is 599.9 mV and the standard deviation is 13.98 mV; at $\mathrm{T}=100{ }^{\circ} \mathrm{C}$, the mean is 600 mV and the standard deviation is 10.9 mV.


Figure 4.11.: Monte Carlo Simulation of The BGR at Nominal Corner at $\mathrm{T}=-40$ and $100{ }^{\circ} \mathrm{C}$.

A further simulation is performed at the FF corner at $\mathrm{T}=-40{ }^{\circ} \mathrm{C}$ and at the SS corner at $\mathrm{T}=100{ }^{\circ} \mathrm{C}$, using 500 runs. The results are shown in Figure 4.12. At the FF corner and $\mathrm{T}=-40{ }^{\circ} \mathrm{C}$, the mean reference voltage is 600.4 mV and the standard deviation is 13.18 mV; at the SS corner and $\mathrm{T}=100{ }^{\circ} \mathrm{C}$, the mean is 600.1 mV and the standard deviation is 11.07 mV.

(a) $V_{\text {ref }}$ Variations at $\mathrm{T}=-40 \mathrm{C}$ at FF Corner. (b) $V_{\text {ref }}$ Variations at $\mathrm{T}=100 \mathrm{C}$ at SS Corner.

Figure 4.12.: Monte Carlo Simulation of The BGR at FF and SS Corner.

### 4.1.3. Results Summary

In this section, a brief comparison between the different topologies mentioned in subsection 2.2.1 and this work is presented. Table 4.1 shows the comparison.

|  | K.N. Leung [4] | Mehesh [5] | H. Omran [6] | This Work |
| :---: | :---: | :---: | :---: | :---: |
| Technology | $0.6 \mu \mathrm{~m}$ |  | 90 nm | 65 nm |
| Min. Supply <br> Voltage | 0.98 V | 1 V |  | 1.6 V |
| Supply <br> Current | $18 \mu \mathrm{~A}$ | $25 \mu \mathrm{~A}$ | $10 \mu \mathrm{~A}$ | $10 \mu \mathrm{~A}$ |
| $V_{\text {ref }}$ | 603 mV | 540 mV | 1.2 V | 0.6 V |
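| TC | $15 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ <br> (0 to $100{ }^{\circ} \mathrm{C}$) | $109 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ <br> (-40 to $125{ }^{\circ} \mathrm{C}$) | $13 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ <br> (-40 to $125{ }^{\circ} \mathrm{C}$) | $15.9 \mathrm{ppm} /{ }^{\circ} \mathrm{C}$ <br> (-40 to $100{ }^{\circ} \mathrm{C}$) |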

Table 4.1.: Comparison Between Different Topologies.

### 4.2. LDO

As mentioned in subsection 2.2.2, the LDO is a key building block in power-management systems. Subsection 2.2.2 discussed several design parameters and topologies; this part presents the design methodology of the proposed LDO. The design is based on the capacitor-less (CL) LDO with damping-factor control (Miller compensation) topology, with some modifications.

### 4.2.1. Design Methodology

### 4.2.1.1. Design Steps

The design proceeds in the following order:

1. Feedback resistor network.
2. Pass element.
3. Error amplifier.

Each block is designed according to the system specifications and the parameters that link it to the other blocks.

### 4.2.1.2. Feedback Resistance Network

The targeted DC output voltage sets the ratio of the output feedback resistors $R_{F 1}$ and $R_{F 2}$ with respect to $V_{\text {ref }}$, as shown in Equation 4.9.

$$
\begin{equation*}
V_{\text {out }}=V_{\text {ref }}\left(1+\frac{R_{F 1}}{R_{F 2}}\right) \tag{4.9}
\end{equation*}
$$

With $V_{\text {out }}=1.2 \mathrm{~V}$ and $V_{\text {ref }}=0.6 \mathrm{~V}$, Equation 4.9 gives $R_{F 1}=R_{F 2}$.
The current flowing through the feedback resistors contributes to the quiescent current of the LDO. To keep this current low, $R_{F 1}$ and $R_{F 2}$ are both chosen as $1 \mathrm{M} \Omega$, so the current through the feedback network, $I_{q}$, is $0.6 \mu \mathrm{~A}$ from Equation 4.10.

$$
\begin{equation*}
I_{q}=\frac{V_{\text {out }}}{R_{F 1}+R_{F 2}} \tag{4.10}
\end{equation*}
$$
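Equations 4.9 and 4.10 can be checked numerically with the chosen values ($V_{\text {ref }}=0.6$ V, $R_{F 1}=R_{F 2}=1 \mathrm{M} \Omega$). The function below is an illustrative sketch, not part of the original design flow.

```python
def feedback_network(v_ref, r_f1, r_f2):
    """Return the LDO output voltage (Eq. 4.9) and the divider current (Eq. 4.10)."""
    v_out = v_ref * (1 + r_f1 / r_f2)
    i_div = v_out / (r_f1 + r_f2)
    return v_out, i_div


v_out, i_div = feedback_network(v_ref=0.6, r_f1=1e6, r_f2=1e6)
print(f"V_out = {v_out:.2f} V, I = {i_div * 1e6:.2f} uA")
```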

A capacitor is added in series with the feedback resistor $R_{F 1}$, as described in subsubsection 2.2.2.3.

### 4.2.1.3. Pass Element

The pass element is a PMOS transistor whose size is determined by the maximum load current of the LDO, $1 \mathrm{~mA}$, and by the dropout voltage $V_{o d}$ given in Equation 4.11.

$$
\begin{equation*}
V_{o d}=V_{\text {in }}-V_{\text {out }}=1.8-1.2=0.6 \mathrm{~V} \tag{4.11}
\end{equation*}
$$

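For the pass transistor to remain in saturation at the maximum current, its saturation voltage $V_{D S(s a t)}=\sqrt{\frac{2 I_{D}}{K_{p}\left(\frac{W}{L}\right)}}$ must not exceed the available headroom of $0.6 \mathrm{~V}$, which gives the minimum aspect ratio in Equation 4.12.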

$$
\begin{equation*}
\frac{W}{L}=\frac{2 I_{D M A X}}{K_{P}\left(V_{D S(s a t)}\right)^{2}} \tag{4.12}
\end{equation*}
$$
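A first-cut sizing of the pass device from Equations 4.11 and 4.12 is sketched below. It uses the square-law model, which is only a rough approximation in 65 nm, and a hypothetical process constant $K_{P}=100 \mu \mathrm{A} / \mathrm{V}^{2}$; in practice $K_{P}$ would come from the process models or from $g_{m} / I_{D}$ lookup tables, and some margin below the full 0.6 V headroom would normally be kept.

```python
def pass_device_aspect_ratio(i_d_max, v_dssat, k_p):
    """Minimum W/L so the PMOS carries i_d_max at the given V_DS(sat) (Eq. 4.12).

    Square-law model; k_p = mu_p * C_ox in A/V^2.
    """
    return 2 * i_d_max / (k_p * v_dssat ** 2)


v_od = 1.8 - 1.2            # dropout headroom from Eq. 4.11 (V)
K_P_ASSUMED = 100e-6        # hypothetical process constant (A/V^2)
wl = pass_device_aspect_ratio(i_d_max=1e-3, v_dssat=v_od, k_p=K_P_ASSUMED)
print(f"V_od = {v_od:.1f} V, W/L >= {wl:.1f}")
```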

### 4.2.1.4. Error Amplifier

The error amplifier in the LDO is crucial for both line and load regulation. It must have a high bandwidth to respond quickly to fast changes in the load conditions and input voltage, and it must be able to drive the pass transistor. To keep the steady-state error of the output voltage small, the gain of the error amplifier must be larger than 55 dB. The error amplifier is therefore implemented as a two-stage operational transconductance amplifier (OTA) with Miller compensation.
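
The 55 dB requirement can be related to the static error with a simple first-order estimate. Assuming, as a simplification, that the error amplifier supplies the whole loop gain $A$, the relative steady-state error is about $1/(1+\beta A)$, where $\beta = R_{F 2}/(R_{F 1}+R_{F 2}) = 0.5$ for the feedback network of subsubsection 4.2.1.2. Under this assumption, 55 dB gives roughly 0.35 %, inside the 1 % steady-state error limit listed in Table 4.11.

```python
def static_error_percent(gain_db, beta):
    """Relative steady-state error 1 / (1 + beta * A) in percent.

    Simplification: the error amplifier is assumed to supply the whole loop gain.
    """
    a = 10 ** (gain_db / 20)      # dB -> V/V
    return 100.0 / (1 + beta * a)


beta = 1e6 / (1e6 + 1e6)          # R_F2 / (R_F1 + R_F2) with R_F1 = R_F2 = 1 MOhm
for g in (40, 55, 70):
    print(f"{g} dB -> {static_error_percent(g, beta):.3f} %")
```
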
The input common-mode range of the error amplifier is 0.3 V to 0.9 V, so a PMOS differential pair is used. The total current consumption of the error amplifier is $3 \mu A$ ($0.5 \mu A$ in the first stage and $2.5 \mu A$ in the second stage), so the PMOS differential pair, its NMOS load transistors, and the second-stage driver are biased in subthreshold to obtain high $g_{m}$ and output resistance at such small currents. The capacitive load of the error amplifier is the gate capacitance of the pass transistor.
The error amplifier is designed using the systematic $g_{m} / I_{D}$ design procedure of [50].
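
The benefit of subthreshold biasing can be quantified with the weak-inversion relation $g_{m} = I_{D}/(n V_{T})$. The sketch below assumes a slope factor $n = 1.3$ (a typical value, not extracted from this process) and that the $0.5 \mu A$ first-stage current splits equally between the two input transistors. It gives $g_{m}/I_{D} \approx 30\ \mathrm{V}^{-1}$, several times the value reached in strong inversion at typical overdrives (for example $2/V_{ov} = 10\ \mathrm{V}^{-1}$ at $V_{ov} = 0.2$ V).

```python
BOLTZMANN = 1.380649e-23    # J/K
Q_E = 1.602176634e-19       # C


def weak_inversion_gm(i_d, n=1.3, temp_k=300.0):
    """Transconductance in weak inversion: gm = I_D / (n * V_T).

    n is the subthreshold slope factor (1.3 is an assumed typical value).
    """
    v_t = BOLTZMANN * temp_k / Q_E
    return i_d / (n * v_t)


i_branch = 0.5e-6 / 2       # assumed: half of the 0.5 uA first-stage current per device
gm = weak_inversion_gm(i_branch)
print(f"gm = {gm * 1e6:.2f} uS, gm/Id = {gm / i_branch:.1f} 1/V")
```
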
The error amplifier includes the following modifications:

- A resistor is added in series with the compensation capacitor to create a zero that increases the bandwidth of the error amplifier and improves the phase margin of the LDO.
- A cascode current mirror is used to reduce the variation of the bias current with process and temperature; the resulting smaller variation in the error-amplifier gain improves the PSRR of the LDO across these variations.
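
The effect of the series (nulling) resistor can be sketched with the standard expression for the zero of a Miller-compensated two-stage amplifier, $s_{z} = 1/\left(C_{c}\left(1/g_{m2} - R_{z}\right)\right)$: without $R_{z}$ the zero lies in the right half-plane and degrades the phase margin, while $R_{z} > 1/g_{m2}$ moves it into the left half-plane, where it adds phase lead. $C_{c} = 1$ pF matches the total on-chip compensation capacitance in Table 4.13; $g_{m2} = 50 \mu S$ and $R_{z} = 40\ \mathrm{k}\Omega$ are hypothetical values.

```python
import math


def miller_zero(gm2, c_c, r_z):
    """Zero of a Miller-compensated two-stage amplifier with nulling resistor r_z.

    s_z = 1 / (C_c * (1/gm2 - R_z)); positive -> right-half-plane zero,
    negative -> left-half-plane zero.
    """
    return 1.0 / (c_c * (1.0 / gm2 - r_z))


GM2 = 50e-6     # hypothetical second-stage transconductance (S)
C_C = 1e-12     # compensation capacitor (F)
for r_z in (0.0, 40e3):
    s_z = miller_zero(GM2, C_C, r_z)
    side = "RHP" if s_z > 0 else "LHP"
    print(f"R_z = {r_z / 1e3:.0f} kOhm: {side} zero at "
          f"{abs(s_z) / (2 * math.pi) / 1e6:.2f} MHz")
```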


### 4.2.1.5. Bleeding circuit

The bleeding circuit consists of a resistor and an NMOS transistor that acts as a switch controlled by a signal on its gate, as shown in Figure 4.13. It draws the minimum load current that the LDO requires to remain stable.
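A rough sizing of the bleeder resistor follows from Ohm's law, assuming the bleeder alone must draw the minimum load currents of the AFE and Digital LDOs ($200 \mu A$ and $50 \mu A$, subsubsection 4.2.2.1) from the 1.2 V output. The switch on-resistance `r_on` is a hypothetical parameter that defaults to zero.

```python
def bleeder_resistance(v_out, i_min, r_on=0.0):
    """Series resistance so the bleeder draws i_min from v_out.

    r_on is the on-resistance of the NMOS switch (hypothetical; 0 if neglected).
    """
    return v_out / i_min - r_on


for name, i_min in (("AFE", 200e-6), ("Digital", 50e-6)):
    print(f"{name} LDO: R_bleed = {bleeder_resistance(1.2, i_min) / 1e3:.1f} kOhm")
```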


Figure 4.13.: Bleeding Circuit.
The schematic of the whole LDO block is shown in Figure 4.14.


Figure 4.14.: Schematic of The LDO.

### 4.2.2. Simulation Results

The simulation results were divided into eight types of analysis:

- Steady-State Response: Covers the DC response of the LDO.
- Stability: Shows the open-loop frequency response of the LDO.
- Power Supply Rejection Ratio: Presents the input voltage attenuation throughout the frequency spectrum.
- Output Noise: Presents the output voltage noise spectral density throughout the frequency spectrum.
- Load Transient Response: Illustrates the LDO response to sudden full range load current variations.
- Load Regulation: Quantifies the variation of the output voltage throughout the entire load current range.
- Line Transient Response: Depicts the LDO response to sudden full range input supply variations.
- Line Regulation: Quantifies the variation of the output throughout the entire input supply range.

Each analysis comprises a set of simulations that characterize the performance of the LDO. The simulations consist of:

- Nominal Performance: Verifies the behavior of the LDO under nominal conditions (ambient temperature of $27{ }^{\circ} \mathrm{C}$).
- Temperature Sweep: Presents the behavior of the LDO over the temperature range (from -20 to $100{ }^{\circ} \mathrm{C}$).
- Temperature Sweep with Process Corners: Presents the behavior of the LDO over the same temperature range across process corners to identify the best- and worst-case results.


### 4.2.2.1. Steady-State Response

Figure 4.15 shows the quiescent current $I_{Q}$ of the AFE LDO and Digital LDO throughout the entire load current range. The range of AFE LDO load current is from $200 \mu \mathrm{~A}$ to 1 mA . The range of Digital LDO load current is from $50 \mu \mathrm{~A}$ to 1 mA . The quiescent current reaches its maximum value at full-load condition.

Figure 4.16 shows the power efficiency of the LDO, defined in Equation 4.13. The efficiency is bounded by the maximum value given in Equation 4.14, which is approached when $I_{q} \ll I_{\text {load }}$.

$$
\begin{align*}
& \text { Power efficiency }(\eta)=\frac{V_{\text {out }} I_{\text {load }}}{V_{\text {in }}\left(I_{\text {load }}+I_{q}\right)}  \tag{4.13}\\
& \text { Maximum power efficiency }=1-\frac{V_{\text {od }}}{V_{\text {in }}}=66.67 \% \tag{4.14}
\end{align*}
$$
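Equations 4.13 and 4.14 can be evaluated with the typical quiescent current of the AFE LDO from Table 4.11 ($3.8 \mu A$). The sketch shows that, because $I_{q}$ is orders of magnitude below the load current, the efficiency stays close to the 66.67 % ceiling set by the 0.6 V dropout.

```python
def ldo_efficiency(v_in, v_out, i_load, i_q):
    """Power efficiency of an LDO, Equation 4.13."""
    return v_out * i_load / (v_in * (i_load + i_q))


eta_max = 1 - (1.8 - 1.2) / 1.8                    # Equation 4.14 (I_q -> 0)
eta_full = ldo_efficiency(1.8, 1.2, 1e-3, 3.8e-6)   # full load
eta_light = ldo_efficiency(1.8, 1.2, 200e-6, 3.8e-6)  # AFE minimum load
print(f"max {eta_max:.2%}, full load {eta_full:.2%}, 200 uA {eta_light:.2%}")
```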



Figure 4.15.: Quiescent Current of The LDO.


Figure 4.16.: Power Efficiency of The LDO.


Figure 4.17.: Temperature Sweep of Quiescent Current of The LDO.

Figure 4.17 shows the quiescent current $I_{Q}$ of the AFE LDO and Digital LDO over the entire load current range across the temperature sweep.

Figure 4.18 shows the maximum and minimum quiescent current of the AFE LDO and Digital LDO over the load current range, across the temperature sweep and process corners.


Figure 4.18.: Range of Quiescent Current of The LDO.

### 4.2.2.2. Loop Gain

Figure 4.19 shows the maximum and minimum loop gain and phase of the AFE LDO and Digital LDO at the maximum load current, across the temperature sweep and process corners. Table 4.2 summarizes the resulting stability performance.

|  | AFE LDO |  |  | Digital LDO |  |  |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|  | Min | Typical | Max | Min | Typical | Max |
| PM (Degree) | 25 | 30 | 100 | 25 | 45 | 100 |
| UGF (MHz) | 10 | 50 | 50 | 10 | 100 | 100 |

Table 4.2.: Stability Performance Summary of LDO at Max Current.

Figure 4.20 shows the maximum and minimum loop gain and phase of the AFE LDO and Digital LDO at the minimum load current, across the temperature sweep and process corners.


Figure 4.19.: Loop Gain and Phase of The LDO at Max Load Current.


Figure 4.20.: Loop Gain and Phase of The LDO at Min Load Current.

### 4.2.2.3. Power Supply Rejection Ratio

Figure 4.21 shows the PSRR of the AFE LDO and Digital LDO at the maximum load current across the temperature sweep.


Figure 4.21.: Temperature Sweep of PSRR of The LDO at Max Load Current.

Figure 4.22 shows the maximum and minimum PSRR of the AFE LDO and Digital LDO at the maximum load current, across the temperature sweep and process corners.


Figure 4.22.: Range of PSRR of The LDO at Max Load Current.

Figure 4.23 shows the PSRR of the AFE LDO and Digital LDO at the minimum load current across the temperature sweep.

Figure 4.24 shows the maximum and minimum PSRR of the AFE LDO and Digital LDO at the minimum load current, across the temperature sweep and process corners.


Figure 4.23.: Temperature Sweep of PSRR of The LDO at Min Load Current.


Figure 4.24.: Range of PSRR of The LDO at Min Load Current.
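To put the PSRR values in perspective, a PSR expressed in dB converts to a linear attenuation factor $10^{\mathrm{PSR}/20}$. The sketch below applies this to the typical max-load values of the AFE LDO in Table 4.11 (-51.4 dB at 10 kHz and -44 dB at 100 kHz) for a hypothetical 100 mV supply ripple.

```python
def ripple_at_output(ripple_in, psr_db):
    """Output ripple for a given input ripple and PSR in dB (negative = attenuation)."""
    return ripple_in * 10 ** (psr_db / 20)


# Typical AFE LDO PSR at max load current (Table 4.11); 100 mV ripple is hypothetical
for f_label, psr in (("10 kHz", -51.4), ("100 kHz", -44.0)):
    v = ripple_at_output(100e-3, psr)
    print(f"{f_label}: 100 mV ripple -> {v * 1e3:.2f} mV")
```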

### 4.2.2.4. Output Noise

Figure 4.25 shows the maximum and minimum output noise of the AFE LDO and Digital LDO at the maximum load current, across the temperature sweep and process corners.

Figure 4.25.: Range of Output Noise of The LDO.

### 4.2.2.5. Load Transient Response

AFE LDO: The AFE LDO must be able to respond to sudden load current changes between $200 \mu A$ and $1 m A$, such as the one depicted in Figure 4.26. For this design, the rise and fall times of the load current step are both 100 ns.

Figure 4.26.: Transient load current of AFE LDO.

Figure 4.27 shows how the output voltage is affected by these sudden load current changes for different load capacitances.

The supported load capacitance ranges from 5 pF to 200 pF. The output voltage oscillates when the load capacitance is less than 5 pF, and the settling time increases further when the load capacitance exceeds 200 pF.

Across process, temperature, and supply-voltage variations, the following can be observed:

- Increasing the temperature increases the overshoot and settling time.
- Decreasing the supply voltage increases the overshoot and settling time.
- The maximum overshoot occurs in the SS corner.
- The maximum settling time occurs in the FF corner.


Figure 4.27.: Nominal Load Transient Response of AFE LDO.

|  | Load capacitance $=5 \mathrm{pF}$ | Load capacitance $=200 \mathrm{pF}$ |
| :---: | :---: | :---: |
| Overshoot | $32 \mathrm{mV}(2.7 \%)$ | $41 \mathrm{mV}(3.4 \%)$ |
| Undershoot | $41 \mathrm{mV}(3.4 \%)$ | $54 \mathrm{mV}(4.5 \%)$ |
| Settling Time | 110 ns | 480 ns |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.3.: Load Transient Response Nominal Performance Summary of AFE LDO.
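The percentages and settling times in the load-transient tables can be reproduced from a simulated waveform as sketched below. Overshoot is expressed relative to the 1.2 V nominal output, and the settling time is taken as the time after the step at which the output enters, and then stays within, a $\pm 1 \%$ band around its final value. The sample waveform is hypothetical; only the 32 mV overshoot is taken from Table 4.3.

```python
def overshoot_percent(delta_v, v_nominal=1.2):
    """Overshoot or undershoot as a percentage of the nominal output voltage."""
    return 100.0 * delta_v / v_nominal


def settling_time(t, v, v_final, t_step, tol=0.01):
    """Time after t_step until v stays within +/- tol * v_final.

    Returns the first sample time after the last excursion outside the band.
    """
    band = tol * abs(v_final)
    last_out = None
    for i, vi in enumerate(v):
        if t[i] >= t_step and abs(vi - v_final) > band:
            last_out = i
    if last_out is None:
        return 0.0
    if last_out + 1 >= len(t):
        raise ValueError("waveform never settles inside the band")
    return t[last_out + 1] - t_step


# Hypothetical sampled response to a load step at t = 0
t = [0e-9, 20e-9, 40e-9, 60e-9, 80e-9, 100e-9]
v = [1.200, 1.232, 1.190, 1.205, 1.201, 1.200]
print(f"overshoot {overshoot_percent(0.032):.1f} %")
print(f"settling time {settling_time(t, v, 1.2, 0.0) * 1e9:.0f} ns")
```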

Figure 4.28, Figure 4.29 and Figure 4.30 show how the output voltage is affected by sudden load current changes at the worst-case corners for overshoot and settling time, for different load capacitances.

|  | Load capacitance $=5 \mathrm{pF}$ | Load capacitance $=100 \mathrm{pF}$ | Load capacitance $=200 \mathrm{pF}$ |
| :---: | :---: | :---: | :---: |
| Overshoot | $83 \mathrm{mV}(7 \%)$ | $116 \mathrm{mV}(9.7 \%)$ | $105 \mathrm{mV}(8.75 \%)$ |
| Undershoot | $80 \mathrm{mV}(6.7 \%)$ | $105 \mathrm{mV}(8.75 \%)$ | $104 \mathrm{mV}(8.67 \%)$ |
| Settling Time ${ }^{1}$ | 700 ns | $3.15 \mu \mathrm{~s}$ | $2.65 \mu \mathrm{~s}$ |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.4.: Worst Load Transient Response Performance Summary of AFE LDO.


Figure 4.28.: Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=5 \mathrm{pF}$.


Figure 4.29.: Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=100 \mathrm{pF}$.


Figure 4.30.: Worst Performance of Load Transient Response of AFE LDO at Load Capacitance $=200 \mathrm{pF}$.

Digital LDO: The Digital LDO must be able to respond to sudden load current changes between $50 \mu A$ and $1 m A$, such as the one depicted in Figure 4.31. For this design, the rise and fall times of the load current step are both 200 ns.


Figure 4.31.: Transient load current of Digital LDO.

Figure 4.32 shows how the output voltage is affected by these sudden load current changes for different load capacitances.

The supported load capacitance ranges from 1 pF to 150 pF. The output voltage oscillates when the load capacitance is less than 5 pF, and the settling time increases further when the load capacitance exceeds 150 pF.

Figure 4.32.: Nominal Load Transient Response of Digital LDO.

|  | Load capacitance $=5 \mathrm{pF}$ | Load capacitance $=150 \mathrm{pF}$ |
| :---: | :---: | :---: |
| Overshoot | $44 \mathrm{mV}(3.7 \%)$ | $34 \mathrm{mV}(2.8 \%)$ |
| Undershoot | $47 \mathrm{mV}(3.9 \%)$ | $45 \mathrm{mV}(3.75 \%)$ |
| Settling Time $^{1}$ | 205 ns | $1 \mu \mathrm{~s}$ |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.5.: Load Transient Response Nominal Performance Summary of Digital LDO.

Across process, temperature, and supply-voltage variations, the following can be observed:

- Increasing the temperature increases the overshoot and settling time.
- Decreasing the supply voltage increases the overshoot and settling time.
- The maximum overshoot occurs in the SS corner.
- The maximum settling time occurs in the FF corner.

Figure 4.33 and Figure 4.34 show how the output voltage is affected by sudden load current changes at the worst-case corners for overshoot and settling time, for different load capacitances.

|  | Load capacitance $=1 \mathrm{pF}$ | Load capacitance $=150 \mathrm{pF}$ |
| :---: | :---: | :---: |
| Overshoot | $77 \mathrm{mV}(6.4 \%)$ | $77 \mathrm{mV}(6.4 \%)$ |
| Undershoot | $88 \mathrm{mV}(7.3 \%)$ | $87 \mathrm{mV}(7.25 \%)$ |
| Settling Time $^{1}$ | 400 ns | $1.4 \mu \mathrm{~s}$ |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.6.: Worst Load Transient Response Performance Summary of Digital LDO.


Figure 4.33.: Worst Performance of Load Transient Response of Digital LDO at Load Capacitance $=1 \mathrm{pF}$.


Figure 4.34.: Worst Performance of Load Transient Response of Digital LDO at Load Capacitance $=150 \mathrm{pF}$.

### 4.2.2.6. Load Regulation

Figure 4.35 shows the output voltage variation of the AFE LDO and Digital LDO throughout the entire load current range.

The load regulation coefficient (LOR) is expressed in Equation 4.15.

$$
\begin{equation*}
L O R=\frac{\Delta V_{\text {out }}}{\Delta I_{\text {Load }}}=\frac{V_{\text {out,Max }}-V_{\text {out,Min }}}{I_{\text {Load,Max }}-I_{\text {Load,Min }}} \tag{4.15}
\end{equation*}
$$



Figure 4.35.: Output Voltage of The LDO VS. Load Current.

The LOR is $1.025 \mathrm{mV} / \mathrm{mA}$ for the AFE LDO and $1.0316 \mathrm{mV} / \mathrm{mA}$ for the Digital LDO.

Figure 4.36 shows the output voltage variation of the AFE LDO and Digital LDO over the entire load current range across the temperature sweep.
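The reported coefficients can be translated back into an absolute output change over each LDO's load range ($200 \mu A$ to 1 mA for the AFE LDO, $50 \mu A$ to 1 mA for the Digital LDO). Because 1 V/A equals 1 mV/mA, Equation 4.15 evaluated in SI units gives the coefficient directly in mV/mA. A minimal sketch, with illustrative names:

```python
def load_regulation(v_out_max, v_out_min, i_load_max, i_load_min):
    """Load regulation coefficient of Equation 4.15 in V/A (numerically equal to mV/mA)."""
    return (v_out_max - v_out_min) / (i_load_max - i_load_min)


# Example: a hypothetical 0.8 mV change over 0.2 mA to 1 mA gives 1.0 mV/mA
print(f"{load_regulation(1.2008, 1.2, 1e-3, 0.2e-3):.3f} mV/mA")

# Output change implied by the reported LOR over each LDO's load range (mA)
for name, lor_mv_per_ma, i_min_ma in (("AFE", 1.025, 0.2), ("Digital", 1.0316, 0.05)):
    drift_mv = lor_mv_per_ma * (1.0 - i_min_ma)
    print(f"{name} LDO: {drift_mv:.3f} mV over {i_min_ma}-1 mA "
          f"({100 * drift_mv / 1200:.3f} % of 1.2 V)")
```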


Figure 4.36.: Temperature Sweep of Output Voltage of The LDO.

Figure 4.37 shows the load regulation coefficient of the AFE LDO and Digital LDO across the temperature sweep.

Figure 4.37.: Temperature Sweep of Load Regulation of The LDO.

### 4.2.2.7. Line Transient Response

AFE LDO: The AFE LDO must be able to respond to sudden full-range input supply voltage changes (1.6 V to 2 V), such as the one depicted in Figure 4.38. The rise and fall times of the input supply step are both 100 ns. The analyzed scenario is the full-load condition $\left(I_{\text {Load }}=1 \mathrm{~mA}\right)$, since it is the most critical one.


Figure 4.38.: Transient Input Supply Voltage of AFE LDO.

Figure 4.39 shows how the output voltage is affected by these sudden input supply voltage changes for different load capacitances. As can be observed, the output recovers from the undershoot slightly faster than from the overshoot.


Figure 4.39.: Nominal Line Transient Response of AFE LDO.

|  | Load capacitance $=5 \mathrm{pF}$ | Load capacitance $=200 \mathrm{pF}$ |
| :---: | :---: | :---: |
| Overshoot | $39 \mathrm{mV}(3.25 \%)$ | $53 \mathrm{mV}(4.4 \%)$ |
| Undershoot | $50 \mathrm{mV}(4.2 \%)$ | $57 \mathrm{mV}(4.75 \%)$ |
| Settling Time $^{1}$ | 110 ns | 340 ns |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.7.: Line Transient Response Nominal Performance Summary of AFE LDO.

Across process and temperature variations, the following can be observed:

- Increasing the temperature increases the overshoot and settling time.
- The maximum overshoot occurs in the SS corner.
- The maximum settling time occurs in the FF corner.

Figure 4.40 shows how the output voltage is affected by sudden supply voltage changes at the worst-case corners for overshoot and settling time, with the maximum load capacitance.

| Corner | SS | FF |
| :---: | :---: | :---: |
| Overshoot | $84 \mathrm{mV}(7 \%)$ | $68 \mathrm{mV}(5.7 \%)$ |
| Undershoot | $84 \mathrm{mV}(7 \%)$ | $73 \mathrm{mV}(6.1 \%)$ |
| Settling Time $^{1}$ | 320 ns | $1.3 \mu \mathrm{~s}$ |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.8.: Worst Line Transient Response Performance Summary of AFE LDO.


Figure 4.40.: Worst Performance of Line Transient Response of AFE LDO at Max Load Capacitance.

Digital LDO: The Digital LDO must be able to respond to sudden full-range input supply voltage changes (1.6 V to 2 V), such as the one depicted in Figure 4.41. The rise and fall times of the input supply step are both 200 ns. The analyzed scenario is the full-load condition $\left(I_{\text {Load }}=1 \mathrm{~mA}\right)$, since it is the most critical one.


Figure 4.41.: Transient Input Supply Voltage of Digital LDO.

Figure 4.42 shows how the output voltage is affected by these sudden input supply voltage changes for different load capacitances. As can be observed, the output recovers from the undershoot slightly faster than from the overshoot.

Figure 4.42.: Nominal Line Transient Response of Digital LDO.

|  | Load capacitance $=1 \mathrm{pF}$ | Load capacitance $=150 \mathrm{pF}$ |
| :---: | :---: | :---: |
| Overshoot | $25 \mathrm{mV}(2.1 \%)$ | $38 \mathrm{mV}(3.2 \%)$ |
| Undershoot | $27 \mathrm{mV}(2.25 \%)$ | $30 \mathrm{mV}(2.5 \%)$ |
| Settling Time $^{1}$ | 210 ns | 360 ns |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.9.: Line Transient Response Nominal Performance Summary of Digital LDO.

Across process and temperature variations, the following can be observed:

- Increasing the temperature increases the overshoot and settling time.
- The maximum overshoot occurs in the SS corner.
- The maximum settling time occurs in the FF corner.

Figure 4.43 shows how the output voltage is affected by sudden supply voltage changes at the worst-case corners for overshoot and settling time, with the maximum load capacitance.

| Corner | SS | FF |
| :---: | :---: | :---: |
| Overshoot | $64 \mathrm{mV}(5.3 \%)$ | $44 \mathrm{mV}(3.7 \%)$ |
| Undershoot | $37 \mathrm{mV}(3.1 \%)$ | $32 \mathrm{mV}(2.7 \%)$ |
| Settling Time $^{1}$ | 180 ns | 520 ns |

1. The settling time was calculated considering an error of less than $1 \%$.

Table 4.10.: Worst Line Transient Response Performance Summary of Digital LDO.

Figure 4.43.: Worst Performance of Line Transient Response of Digital LDO at Max Load Capacitance.

### 4.2.2.8. Line Regulation

Figure 4.44 shows the output voltage variation of the AFE LDO and Digital LDO over the input voltage range.

The line regulation coefficient (LiR) is expressed as:

$$
\begin{equation*}
L i R=\frac{\Delta V_{\text {out }}}{\Delta V_{\text {in }}}=\frac{V_{\text {out,Max }}-V_{\text {out,Min }}}{V_{\text {in,Max }}-V_{\text {in,Min }}} \tag{4.16}
\end{equation*}
$$

The LiR is $3.085 \mathrm{mV} / \mathrm{V}$ for the AFE LDO and $2.5 \mathrm{mV} / \mathrm{V}$ for the Digital LDO.
Figure 4.45 shows the output voltage variation of the AFE LDO and Digital LDO throughout the input voltage range with Temperature sweep.
Figure 4.46 shows the line regulation coefficient of the AFE LDO and Digital LDO with Temperature sweep.
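
Since Equation 4.16 is the DC supply-to-output gain, the LiR can also be expressed in dB and roughly compared with the low-frequency PSR values in Tables 4.11 and 4.12. The comparison is only approximate, because the LiR is a large-signal endpoint measurement while the PSR is a small-signal quantity. A minimal sketch, with illustrative names:

```python
import math


def line_regulation(v_out_max, v_out_min, v_in_max, v_in_min):
    """Line regulation coefficient of Equation 4.16, in V/V."""
    return (v_out_max - v_out_min) / (v_in_max - v_in_min)


def lir_to_db(lir_v_per_v):
    """Express a DC line regulation coefficient as an equivalent DC PSR in dB."""
    return 20 * math.log10(lir_v_per_v)


for name, lir_mv_per_v in (("AFE", 3.085), ("Digital", 2.5)):
    print(f"{name} LDO: {lir_mv_per_v} mV/V -> {lir_to_db(lir_mv_per_v * 1e-3):.1f} dB")
```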


Figure 4.44.: Output Voltage of The LDO VS. Supply Voltage.


Figure 4.45.: Temperature Sweep of Output Voltage of The LDO.


Figure 4.46.: Temperature Sweep of Line Regulation of The LDO.

### 4.2.3. Monte Carlo Simulation

Figure 4.47 and Figure 4.48 show the Monte Carlo results for the overshoot and undershoot of the AFE LDO with a load capacitance of 200 pF, a supply voltage of 1.6 V, a temperature of $100{ }^{\circ} \mathrm{C}$, and sudden load current changes between $200 \mu \mathrm{~A}$ and 1 mA with rise and fall times of 100 ns.


Figure 4.47.: Monte Carlo for Overshoot of AFE.


Figure 4.48.: Monte Carlo for Undershoot of AFE.

Figure 4.49 shows the steady-state error of the AFE LDO at the maximum load current, a supply voltage of 1.6 V, and a temperature of $100{ }^{\circ} \mathrm{C}$.

Figure 4.50 shows the quiescent current of the AFE LDO at the maximum load current, the FF corner, a supply voltage of 2 V, and a temperature of $100{ }^{\circ} \mathrm{C}$.


Figure 4.49.: Monte Carlo of Steady State Error of AFE LDO.

### 4.2.4. Performance Summary

### 4.2.4.1. Analog Front End LDO Summary

| Spec. | Value |  |  | Unit |
| :---: | :---: | :---: | :---: | :---: |
|  | Min. | Typ. | Max. |  |
| Supply Voltage | 1.6 | 1.8 | 2 | $V$ |
| Output Current | 0.2 |  | 1 | $m A$ |
| Overshoot of $V_{\text {out }}$ | $3 \%$ | $4.5 \%$ | $10 \%$ |  |
| Steady-State Error of $V_{\text {out }}$ |  |  | $1 \%$ |  |
| Quiescent current | 3.45 | 3.8 | 4.1 | $\mu A$ |
| PSR @ Min Current $(10 \mathrm{KHz} / 100 \mathrm{KHz})$ | $-49.5 /-44.3$ | $-54.5 /-46.4$ | $-56.6 /-47.1$ | $d B$ |
| PSR @ Max Current $(10 \mathrm{KHz} / 100 \mathrm{KHz})$ | $-36 /-30$ | $-51.4 /-44$ | $-54.3 /-49$ | $d B$ |
| EA DC Gain $(1 \mathrm{~mA} / 200 \mu \mathrm{~A})$ | $39.5 / 71$ | $65 / 79$ | $73 / 82$ | $d B$ |
| Temperature | -20 |  | 100 | ${ }^{\circ} \mathrm{C}$ |

Table 4.11.: Performance Summary of AFE LDO.


Figure 4.50.: Monte Carlo of Quiescent Current of AFE LDO.

### 4.2.4.2. Digital LDO Summary

| Spec. | Value |  |  | Unit |
| :---: | :---: | :---: | :---: | :---: |
|  | Min | Typ. | Max. |  |
| Supply Voltage | 1.62 | 1.8 | 1.98 | $V$ |
| Output Current | 0.05 |  | 1 | $m A$ |
| Overshoot of $V_{\text {out }}$ | $3 \%$ | $4 \%$ | $8 \%$ |  |
| Steady-State Error of $V_{\text {out }}$ |  |  | $1 \%$ |  |
| Quiescent current | 3.2 | 3.5 | 3.8 | $\mu A$ |
| PSR @ Min Current $(10 \mathrm{KHz} / 100 \mathrm{KHz})$ | $-55 /-44.5$ | $-56.5 /-44.8$ | $-57.5 /-48$ | $d B$ |
| PSR @ Max Current $(10 \mathrm{KHz} / 100 \mathrm{KHz})$ | $-40.5 /-31$ | $-53 /-42.7$ | $-55.7 /-47$ | $d B$ |
| EA DC Gain $(1 \mathrm{~mA} / 50 \mu \mathrm{~A})$ | $44 / 79.5$ | $65.5 / 83$ | $73 / 85$ | $d B$ |
| Temperature | -20 |  | 100 | ${ }^{\circ} \mathrm{C}$ |

Table 4.12.: Performance Summary of Digital LDO.

### 4.2.4.3. Comparison

| Topologies | This work | CL LDO with damping | CL LDO with transimpedance | CL LDO Voltage |
| :---: | :---: | :---: | :---: | :---: |
| Technology | 65 nm | $0.6 \mu \mathrm{~m}$ | $0.35 \mu \mathrm{~m}$ | $0.6 \mu \mathrm{~m}$ |
| Supply voltage (V) | 1.8 | 3 | 3 | 3 |
| Vout (V) | 1.2 | 2.8 | 2.8 | 2.8 |
| Reference Voltage (V) | 0.6 | 1.4 | 1.4 | 1.4 |
| Max Output Current (mA) | 1 | 50 | 50 | 50 |
| Min Output Current <br> $(\mu \mathrm{A})$ | 200 | 100 | 100 | 100 |
| Quiescent current ( $\mu \mathrm{A}$ ) <br> (@ min/max output current) | 3.75/3.8 | 63/60 | 46/170 | 80/100 |
| Total on chip compensation capacitance ( pF ) | 1 | 8 | 2.7 | 2.8 |
| Load transient $\Delta V_{\text {out }}(V)^{1}$ | 0.105/0.116 | 1.02/0.65 | 0.962/0.289 | 1.207/0.345 |
| Load transient <br> Settling $(\mu s)^{1}$ | 0.6/3.15 | 1.2/3.1 | 1.04/3.56 | 1.73/1.56 |
| EA DC gain (dB) <br> (@ min/max output current) | 70/65 | 80/79 | 80/46 | 71/63 |
| PSR @ 50 mA (dB) <br> (@ 1 kHz/10 kHz/100 kHz) | -51.4/-51.4/-44 | -52/-50/-27 | -46/-26/-7 | -48/-47/-26 |
| PSR @ $100 \mu \mathrm{~A}$ (dB) <br> (@ 1 kHz/10 kHz/100 kHz) | -54.5/-54.5/-46.4 | -54/-52/-38 | -50/-31/-11 | -82/-62/-39 |
| Line transient ( $m V)^{2}$ | 84/84 | 144/271 | 419/496 | 76/93 |
| Output noise spectral <br> density @ 100 kHz $(n V / \sqrt{H z})$ | 200 | 90 | 130 | 190 |

1. Worst performance for a load step from min to max current/max to min current with rise/fall times of 100 ns .
2. In this work, for an input voltage step from 1.6 V to $2 \mathrm{~V} / 2 \mathrm{~V}$ to 1.6 V with rise/fall times of 100 ns and a load current of $1 \mathrm{~mA}$. For the other topologies, for an input voltage step from 3 V to $3.6 \mathrm{~V} / 3.6 \mathrm{~V}$ to 3 V with rise/fall times of 600 ns and a load current of $100 \mu \mathrm{~A}$.

Table 4.13.: Comparison Between this Work and Different Topologies.

### 4.3. CTLE

The CTLE, described in subsection 2.2.3, is one of the equalization blocks of the system and processes the signal received from the channel. The CTLE topologies and techniques are discussed in subsection 2.2.3; the proposed topology is the same as the one used in the Verilog-A model in subsection 3.2.2, and it is described here in more detail together with its design procedure and results.

### 4.3.1. Design Methodology and Procedure

This section analyzes the complete CTLE circuit and describes the techniques used to make its response adaptive, so that it can match the responses of several channels within a specified equalization range. It also introduces several circuit modifications that improve performance and outlines the design procedure used to build the circuit.

### 4.3.2. Analysis of the Main Architecture of CTLE

At a high level, the circuit shown in Figure 4.51 is the conventional core of the CTLE: a differential pair with source degeneration through a variable resistor and capacitor, which makes the circuit response adaptable. The CTLE must behave as a high-pass filter that boosts the gain of the high-frequency components. As Equation 4.17 shows, the circuit has two poles and one zero, whose values are given in Equation 4.18.

$$
\begin{align*}
& \frac{V_{\text {out }}}{V_{\text {in }}}=\frac{g_{m} R_{D}}{1+g_{m} R_{s} / 2} * \frac{\left(1+\frac{S}{W_{Z}}\right)}{\left(1+\frac{S}{W_{P 1}}\right)\left(1+\frac{S}{W_{P 2}}\right)}  \tag{4.17}\\
& W_{Z}=\frac{1}{R_{s} C_{s}}, \quad W_{p 1}=\frac{1+\frac{g_{m} R_{s}}{2}}{R_{s} C_{s}}, \quad W_{p 2}=\frac{1}{R_{D} C_{L}} \tag{4.18}
\end{align*}
$$
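Equations 4.17 and 4.18 can be evaluated numerically to see how the degeneration sets the DC gain and the high-frequency boost. The component values below are hypothetical and chosen only for illustration. The ratio $W_{p 1}/W_{Z} = 1 + g_{m} R_{s}/2$ is both the ideal peaking and the factor by which the DC gain is reduced relative to $g_{m} R_{D}$.

```python
import math


def ctle_response(f_hz, gm, r_s, c_s, r_d, c_l):
    """Evaluate Equation 4.17 at frequency f_hz (s = j * 2 * pi * f)."""
    s = 2j * math.pi * f_hz
    w_z = 1 / (r_s * c_s)
    w_p1 = (1 + gm * r_s / 2) / (r_s * c_s)
    w_p2 = 1 / (r_d * c_l)
    a_dc = gm * r_d / (1 + gm * r_s / 2)
    return a_dc * (1 + s / w_z) / ((1 + s / w_p1) * (1 + s / w_p2))


# Hypothetical component values, for illustration only
params = dict(gm=20e-3, r_s=400.0, c_s=200e-15, r_d=300.0, c_l=50e-15)
dc = abs(ctle_response(0.0, **params))
nyq = abs(ctle_response(5e9, **params))       # 5 GHz Nyquist frequency at 10 Gbit/s
print(f"DC gain {dc:.2f} V/V, gain at 5 GHz {nyq:.2f} V/V, "
      f"boost {20 * math.log10(nyq / dc):.1f} dB")
```

Increasing `r_s` in this sketch raises the ideal peaking but lowers the DC gain, which mirrors the trade-off discussed next.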

Equation 4.18 shows that $R_{S}$ and $C_{S}$ directly control the DC gain and the peaking frequency, as described in the survey in subsection 2.2.3. It also shows that the bandwidth is limited: $W_{p 1}$ and $W_{Z}$ depend on each other, since their ratio is fixed at $1+g_{m} R_{s} / 2$, which creates a trade-off between high peaking gain and bandwidth. In addition, large values of $C_{L}$ and $R_{D}$ lower $W_{p 2}$ and therefore limit the bandwidth.
As the maximum data rate of the system is $10 \mathrm{Gbit} / \mathrm{s}$, the CTLE bandwidth must reach the Nyquist frequency of 5 GHz. To obtain this bandwidth, a simple bandwidth-extension technique is used, implemented with a shunt inductor as shown in Figure 4.52.

Figure 4.51.: Schematic of The Conventional CTLE.

The frequency response is given by Equation 4.19. $W_{Z 1}$ and $W_{P 1}$ are the same as $W_{Z}$ and $W_{P 1}$ in Equation 4.18, while the remaining two poles are coupled and are described by the damping ratio $\zeta$ and the natural frequency $W_{n}$ in Equation 4.20. The shunt inductor adds a zero and a pole, giving more control over this pole pair, as shown in Equation 4.21.

$$
\begin{align*}
& \frac{V_{\text {out }}}{V_{\text {in }}}=\frac{g_{m} R_{D}}{1+g_{m} R_{s} / 2} * \frac{\left(1+\frac{S}{W_{Z 1}}\right)\left(1+\frac{S}{W_{Z 2}}\right)}{\left(1+\frac{S}{W_{P 1}}\right)\left(1+\frac{2 \zeta S}{W_{n}}+\frac{S^{2}}{W_{n}^{2}}\right)}  \tag{4.19}\\
& \zeta=\frac{R_{D}}{2} \sqrt{\frac{C_{L}}{L_{P}}}, \quad W_{n}=\frac{1}{\sqrt{C_{L} L_{P}}}  \tag{4.20}\\
& W_{Z 2}=\frac{R_{D}}{L_{P}}, \quad W_{P 2,3}=-\frac{R_{D}}{2 L_{P}} \pm \sqrt{\left(\frac{R_{D}}{2 L_{P}}\right)^{2}-\frac{1}{L_{P} C_{L}}} \tag{4.21}
\end{align*}
$$
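The pole pair of the shunt-peaked load can be checked numerically. Modeling the load as $R_{D}$ in series with $L_{P}$, with that branch in parallel with $C_{L}$, its impedance is $(R_{D}+s L_{P}) /(1+s R_{D} C_{L}+s^{2} L_{P} C_{L})$; matching the denominator to $1+2 \zeta s / W_{n}+s^{2} / W_{n}^{2}$ gives $W_{n}=1 / \sqrt{L_{P} C_{L}}$ and $\zeta=(R_{D} / 2) \sqrt{C_{L} / L_{P}}$. The sketch below, with hypothetical component values, evaluates these quantities and confirms that the computed poles are roots of that denominator.

```python
import cmath
import math

def shunt_peaking(r_d, l_p, c_l):
    """Load zero, natural frequency, damping ratio and poles (rad/s).

    Load model: R_D in series with L_P, the branch in parallel with C_L,
    Z(s) = (R_D + s*L_P) / (1 + s*R_D*C_L + s^2*L_P*C_L).
    """
    w_z2 = r_d / l_p
    w_n = 1.0 / math.sqrt(l_p * c_l)
    zeta = (r_d / 2.0) * math.sqrt(c_l / l_p)
    a = r_d / (2.0 * l_p)
    root = cmath.sqrt(a * a - 1.0 / (l_p * c_l))  # imaginary when zeta < 1
    poles = (-a + root, -a - root)
    return w_z2, w_n, zeta, poles

# Hypothetical values: R_D = 300 ohm, L_P = 10 nH, C_L = 100 fF
r_d, l_p, c_l = 300.0, 10e-9, 100e-15
w_z2, w_n, zeta, poles = shunt_peaking(r_d, l_p, c_l)
print(f"fn = {w_n / (2 * math.pi) / 1e9:.2f} GHz, zeta = {zeta:.3f}")
for p in poles:
    # Each pole must make the load denominator vanish
    residual = 1 + p * r_d * c_l + p * p * l_p * c_l
    print(f"pole = {p:.3e} rad/s, |denominator| = {abs(residual):.1e}")
```

With these values $\zeta \approx 0.47$, so the poles are complex and the response peaks; increasing $L_{P}$ or reducing $R_{D}$ lowers $\zeta$ and increases the peaking.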

From another point of view, the loading network shown in Figure 4.53 is a parallel resonant circuit whose resonance frequency depends on $L_{P}$, $C_{L}$ and the quality factor $Q$ of the inductor (through its series resistance $R_{l}$), as shown in Equation 4.22. For $Q>10$, the resonance frequency $W_{\text {res }}$ can be approximated by Equation 4.23, which is the same as $W_{n}$ in Equation 4.20. At resonance, the load network is equivalent to a pure resistance given by Equation 4.24. Increasing $R_{D}$ therefore raises both the peak gain and the DC gain; however, since $W_{Z 2}$ is proportional to $R_{D}$ (Equation 4.21), a larger $R_{D}$ also pushes this zero to a higher frequency, which reduces the peaking. Consistently, Equation 4.20 shows that the peaking increases with $L_{P}$ and decreases with $R_{D}$.

Figure 4.52.: CTLE With Shunt Inductor.

$$
\begin{align*}
& W_{\text {res }}=\frac{1}{\sqrt{C_{L} L_{P}}} \sqrt{1-\frac{R_{l}^{2} C_{L}}{L_{P}}}  \tag{4.22}\\
& W_{\text {res }}=\frac{1}{\sqrt{C_{L} L_{P}}}  \tag{4.23}\\
& R_{\text {Deq }}=\frac{\left(R_{l}+R_{D}\right)^{2}+X_{L P}^{2}}{\left(R_{l}+R_{D}\right)} \tag{4.24}
\end{align*}
$$
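The accuracy of the approximation from Equation 4.22 to Equation 4.23 follows from $R_{l}^{2} C_{L} / L_{P}=1 / Q^{2}$, taking $R_{l}$ as the series resistance of the inductor and $Q=W_{n} L_{P} / R_{l}$ evaluated at $W_{n}=1 / \sqrt{C_{L} L_{P}}$. The sketch below, with hypothetical component values, computes both resonance expressions for $Q=10$ and the parallel-equivalent resistance of Equation 4.24.

```python
import math

def tank_resonance(l_p, c_l, r_l):
    """Exact (Eq. 4.22) and ideal (Eq. 4.23) resonance in rad/s, plus inductor Q."""
    w_ideal = 1.0 / math.sqrt(c_l * l_p)
    w_exact = w_ideal * math.sqrt(1.0 - r_l**2 * c_l / l_p)
    q = w_ideal * l_p / r_l
    return w_exact, w_ideal, q

def parallel_resistance(r_series, x_l):
    """Eq. 4.24: series R + jX converted to its equivalent parallel resistance."""
    return (r_series**2 + x_l**2) / r_series

# Hypothetical values: L_P = 10 nH, C_L = 100 fF, R_l chosen so that Q = 10
l_p, c_l = 10e-9, 100e-15
r_l = (1.0 / math.sqrt(l_p * c_l)) * l_p / 10.0
w_exact, w_ideal, q = tank_resonance(l_p, c_l, r_l)
print(f"Q = {q:.1f}, relative error of Eq. 4.23 = {1 - w_exact / w_ideal:.2%}")

r_d = 300.0  # hypothetical load resistance in ohms
print(f"R_Deq at resonance = {parallel_resistance(r_l + r_d, w_ideal * l_p):.1f} ohm")
```

At $Q=10$ the ideal expression overestimates the resonance frequency by only about $0.5 \%$, which justifies using Equation 4.23 in the design procedure.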

### 4.3.3. Design Procedure and Parameters

1. The first step is to set the bandwidth so that the peak lies at the Nyquist frequency of the data rate for a given load $C_{L}$, by choosing $L_{P}$ from Equation 4.23.


Figure 4.53.: Loading Network of The CTLE.
2. Then, assuming a current $I_{s s}$ suitable for the target power consumption, $g_{m} / I_{d}$ charts are used to define the $W / L$ that gives the $g_{m}$ calculated from Equation 4.25 for the required linear range of the input differential swing. This in turn defines the available range of $V_{i n C M}$, as shown in Equation 4.26 and Equation 4.27.

$$
\begin{align*}
& V_{i d, s w i n g}=\sqrt{2} V_{o d}=\sqrt{2} \frac{I_{s s}}{g_{m}}  \tag{4.25}\\
& V_{i n C M}<V_{D D}-\frac{I_{S S} R_{D}}{2}+V_{t h, d i f f, p a i r}  \tag{4.26}\\
& V_{i n C M}>V_{t h, d i f f, p a i r}+V_{o v, d i f f, p a i r}+V_{o v, t a i l} \tag{4.27}
\end{align*}
$$

3. Calculating $R_{D}$ based on the required output differential swing from Equation 4.28.

$$
\begin{equation*}
V_{o u t, d i f f, s w i n g}=2 I_{s s} R_{D} \tag{4.28}
\end{equation*}
$$

4. Setting a suitable value for $R_{s}$ that gives the required DC gain, as in Equation 4.29.

$$
\begin{equation*}
\text { DC Gain }=\frac{g_{m} R_{D}}{1+g_{m} R_{s} / 2} \tag{4.29}
\end{equation*}
$$

5. Calculating $C_{s}$ so that the zero $W_{Z 2}$ cancels the pole $W_{P 1}$.
6. Checking the peak gain and its boost over the DC gain; if it does not meet the required range, it can be adjusted by redefining $R_{D}$ or changing the assumed current and repeating the steps.
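The design procedure above can be scripted. The following Python sketch walks through steps 1 to 5 for a set of hypothetical targets (the load capacitance, tail current, swings and DC gain are placeholders, not the thesis values). $g_{m}$ and $R_{D}$ follow directly from Equation 4.25 and Equation 4.28, $R_{s}$ is Equation 4.29 solved for $R_{s}$, and $C_{s}$ comes from setting $W_{P 1}=(1+g_{m} R_{s} / 2) /(R_{s} C_{s})$ equal to $W_{Z 2}=R_{D} / L_{P}$. Step 6 needs a full frequency-response simulation and is not included.

```python
import math

def design_ctle(f_nyq, c_l, i_ss, v_id_swing, v_out_swing, dc_gain):
    """Steps 1-5 of the CTLE design procedure (all inputs are design targets)."""
    # Step 1: place the resonance at the Nyquist frequency (Eq. 4.23)
    l_p = 1.0 / ((2.0 * math.pi * f_nyq) ** 2 * c_l)
    # Step 2: gm needed for the input linear range (Eq. 4.25)
    gm = math.sqrt(2.0) * i_ss / v_id_swing
    # Step 3: load resistance from the output differential swing (Eq. 4.28)
    r_d = v_out_swing / (2.0 * i_ss)
    # Step 4: degeneration resistance from the DC gain (Eq. 4.29 solved for R_s)
    r_s = 2.0 * (gm * r_d / dc_gain - 1.0) / gm
    if r_s <= 0:
        raise ValueError("requested DC gain exceeds gm*R_D; revisit I_ss or R_D")
    # Step 5: choose C_s so that W_P1 equals W_Z2 = R_D / L_P
    c_s = (1.0 + gm * r_s / 2.0) * l_p / (r_s * r_d)
    return {"L_P": l_p, "gm": gm, "R_D": r_d, "R_s": r_s, "C_s": c_s}

# Hypothetical targets: 5 GHz Nyquist, 50 fF load, 500 uA tail current,
# 200 mV input swing, 400 mV output swing, 0 dB DC gain
d = design_ctle(5e9, 50e-15, 500e-6, 0.2, 0.4, 1.0)
for name, value in d.items():
    print(f"{name} = {value:.3e}")
```

The check on $R_{s}$ reflects step 6: if the requested DC gain is larger than $g_{m} R_{D}$, no positive degeneration can satisfy Equation 4.29, and the current or $R_{D}$ must be revisited.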

### 4.3.4. Implementation of Variable $R_{S}$ and $C_{S}$ for Equalization Adapting

- Variable $R_{s}$ : As described before, $R_{s}$ directly controls the DC gain and therefore sets how much the high-frequency components are boosted over DC to equalize the frequency response. Since $R_{s}$ is controlled by digital signals, it is best implemented as multiple parallel resistor slices with MOSFET switches that turn individual slices on or off to obtain a defined value of $R_{s}$. As shown in Figure 4.54, the $R_{s}$ network consists of four slices with different rp-poly resistance values. Each slice has an NMOS switch operating in the triode region, which gives a small and linear on-resistance $R_{o n}$, as shown in Equation 4.30; since the gate voltage is a constant digital level, $R_{o n}$ has a fixed value at the DC operating point. The slices have different resistance values to provide a wide adaptation range of the DC gain.

$$
\begin{equation*}
R_{o n}=\frac{1}{\mu_{n} C_{o x} \frac{W}{L}\left(V_{g s}-V_{t h}\right)} \tag{4.30}
\end{equation*}
$$



Figure 4.54.: Slices of The $R_{S}$ Network.

- Variable $C_{S}$ : As described before, $C_{S}$ controls the peak and the slope of the gain boost. Similar to $R_{S}$, it is implemented with three parallel capacitor slices with NMOS switches plus one permanent capacitor, as shown in Figure 4.55. The NMOS switches also operate in the triode region with a large $W / L$ to obtain a very small $R_{o n}$, as shown in Equation 4.30. The switches add parasitic capacitance to the slices, forming a capacitive network that depends on transistor sizing, so the calculations must include it and the capacitance of every slice must be measured individually.

Figure 4.55.: Slices of The $C_{S}$ Network.

- Common Mode Feedback (CMFB): In amplifiers, the input and output common-mode levels must be well defined so that the block, and the next block it drives, operate at the correct DC operating point. Fabrication mismatch introduces errors in the mirrored current and in the load, whether passive or active, and these errors directly affect the output common-mode level, as shown in Equation 4.31.

$$
\begin{equation*}
V_{o u t, C M}=V_{D D}-\frac{I_{S S} R_{D}}{2} \tag{4.31}
\end{equation*}
$$

To solve this issue, one of the two parameters $R_{D}$ or $I_{S S}$ must be adjusted to obtain the correct common-mode level, using a feedback circuit that adapts it regardless of the post-fabrication mismatch. A common technique senses the output common-mode level, compares it with the desired level, and feeds the error back to control the tail current or the bias of the active load, as in [51]. The technique used in this work is simpler: it adapts $R_{D}$, implemented as parallel resistor slices with PMOS switches, which directly sets the common-mode level as shown in Figure 4.56. The PMOS switches are controlled by digital signals and operate in the triode region.


Figure 4.56.: Slices of $R_{D}$ Network.

- Offset Cancellation: Fabrication mismatch not only shifts the common-mode level but also causes another problem. In a differential amplifier, a mismatch between the two halves shifts the common-mode level of each half by a different amount, which produces an offset in the output differential signal. This is a serious issue because the data are sampled by comparing the differential signal against a threshold, and the offset corrupts this decision. Correcting it requires an additional circuit that adjusts the common-mode level of each half separately. The circuit used in this work is a current steering DAC: it sinks a current, set by a digital feedback circuit, from the load of one half, which adjusts the common-mode level of that half and is enough to cancel the offset voltage. As shown in Figure 4.57, the current steering DAC consists of one differential pair that selects which half to correct and parallel current-tail slices controlled by digital signals.


Figure 4.57.: Current Steering DAC.
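The switched-slice idea behind the $R_{s}$ network (and, with PMOS switches, the $R_{D}$ network) can be illustrated with a short behavioral model. The sketch below assumes that each enabled slice contributes its resistor in series with a switch whose on-resistance follows Equation 4.30, and that disabled slices are open circuits; the slice values, $\mu_{n} C_{o x}$, $W / L$ and threshold voltage are hypothetical, not the values of Figure 4.54.

```python
from itertools import product

def r_on_triode(mu_cox, w_over_l, v_gs, v_th):
    """Eq. 4.30: deep-triode on-resistance of a MOS switch."""
    return 1.0 / (mu_cox * w_over_l * (v_gs - v_th))

def slice_network(slice_r, enabled, r_on):
    """Equivalent resistance of parallel slices; each enabled slice is R + R_on."""
    g = sum(1.0 / (r + r_on) for r, on in zip(slice_r, enabled) if on)
    return float("inf") if g == 0.0 else 1.0 / g

# Hypothetical switch and slice parameters
r_on = r_on_triode(mu_cox=300e-6, w_over_l=100, v_gs=1.2, v_th=0.4)
slices = [200.0, 400.0, 800.0, 1600.0]

for code in product((1, 0), repeat=len(slices)):
    if any(code):  # all-off is an open circuit in this model
        bits = "".join(map(str, code))
        print(f"{bits}: R_s = {slice_network(slices, code, r_on):7.1f} ohm")
```

Because $R_{o n}$ adds in series with every enabled slice, the switches must be sized with a large $W / L$ so that $R_{o n}$ stays small compared with the slice resistance; otherwise the code-to-resistance map is compressed.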

### 4.3.5. Full Schematic of CTLE with Offset Cancellation Circuit

The full schematic of the circuit, including the current steering DAC, is shown in Figure 4.58. All adaptation signals are controlled by a digital control circuit that adapts the CTLE response based on the output signal and its equalization, as described in subsection 2.2.3. The inductor layout used in this work is the symmetric spiral inductor, as it gives a higher inductance and a higher quality factor.

### 4.3.6. Simulation Results

The CTLE is built in a $65-\mathrm{nm}$ technology with a 1.2 V supply. The CTLE frequency response with the adaptation of $R_{S}$ and $C_{S}$ is shown in Figure 4.59 and Figure 4.60, and the range of channel loss equalized by the CTLE is given in Table 4.14. The common-mode calibration through $R_{D}$ changes the frequency response as shown in Figure 4.61a, but this change can be compensated by re-adjusting the adaptation.


Figure 4.58.: Full CTLE Schematic with Offset Cancellation Circuit.

The CTLE response across process corners and temperature is shown in Figure 4.61b and Figure 4.61c. The current consumption is $500 \mu \mathrm{~A}$, and the average power consumption varies slightly with the input swing, as shown in Figure 4.61d.


Figure 4.59.: CTLE Frequency Response with $R_{S}$.


Figure 4.60.: CTLE Frequency Response with $C_{S}$.

| Digital Control Signal D2[0] D2[1] D2[2] D2[3] | DC Gain (dB) | Peak Gain (dB) | Channel Loss Equalized (dB) |
| :---: | :---: | :---: | :---: |
| 1110 | 0.65 | 1.5 | 2.15 |
| 0111 | 1 | 1.5 | 2.5 |
| 1001 | 1.75 | 1.7 | 3.25 |
| 0101 | 2.25 | 1.9 | 4 |
| 0011 | 3.5 | 1.9 | 5.4 |
| 0100 | 4.5 | 2 | 6.5 |
| 0010 | 5.5 | 2 | 7.5 |
| 0001 |  |  |  |

Table 4.14.: Channel Loss Equalized with Digital Control Signals.


(a) CTLE Frequency Response with $R_{D}$.

(b) CTLE Response Across Process Corners.

(c) CTLE Response Across Temperature.

(d) Average Power Consumption versus Input Swing.

Figure 4.61.: Simulation Results of The CTLE.

The common-mode rejection ratio and offset current are measured from Monte Carlo simulations as shown in Figure 4.62.

(a) CMRR for The CTLE.

(b) Offset Current in The CTLE.

Figure 4.62.: Monte Carlo Simulation of The CTLE.

### 4.4. VGA

### 4.4.1. Proposed topology

The VGA topology is chosen based on the required specifications listed in Table 4.15. These specifications target low power consumption with reasonable performance, since the gain needed from the VGA is relatively small and low gain (attenuation) does not need to be supported. A single-stage common-source amplifier with degeneration resistance was chosen, as shown in Figure 4.63. This topology is digitally controlled and does not support continuous gain variation. It has the same structure as the digitally controlled topologies mentioned in the survey chapter, but with only a single stage, because the gain needed from the VGA in this system is relatively small. Three additional parts of this circuit are described in the following sections.


Figure 4.63.: The Schematic of The Proposed VGA.

The proposed topology has a single dominant pole at the output node, so its 3-dB bandwidth is $\frac{1}{R_{D} C_{1}}$ (in rad/s), where $C_{1}$ is the capacitance at the output node, and its overall gain is $\frac{g_{m} R_{D}}{1+g_{m} R_{S}}$.
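The two single-pole expressions above are easy to tabulate. The Python sketch below evaluates the gain $g_{m} R_{D} /(1+g_{m} R_{S})$ for a few degeneration values and the 3-dB bandwidth converted to hertz, $f_{-3 d B}=1 /(2 \pi R_{D} C_{1})$; the device values are hypothetical placeholders, not taken from the VGA design.

```python
import math

def vga_gain_db(gm, r_d, r_s):
    """Low-frequency gain of the degenerated common-source stage, in dB."""
    return 20.0 * math.log10(gm * r_d / (1.0 + gm * r_s))

def vga_bandwidth_hz(r_d, c_1):
    """Single dominant pole at the output node: f_3dB = 1 / (2*pi*R_D*C_1)."""
    return 1.0 / (2.0 * math.pi * r_d * c_1)

# Hypothetical values: gm = 4 mS, R_D = 800 ohm, C_1 = 30 fF
gm, r_d, c_1 = 4e-3, 800.0, 30e-15
print(f"f_3dB = {vga_bandwidth_hz(r_d, c_1) / 1e9:.2f} GHz")
for r_s in (0.0, 50.0, 100.0, 200.0):  # hypothetical digitally selected R_S values
    print(f"R_S = {r_s:5.0f} ohm -> gain = {vga_gain_db(gm, r_d, r_s):.2f} dB")
```

In this first-order model the bandwidth depends only on $R_{D}$ and $C_{1}$, so stepping $R_{S}$ changes the gain without moving the dominant pole.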

### 4.4.2. Offset Cancellation Circuit

The main goal of the offset cancellation circuit is to keep the currents in the two differential branches of the VGA equal. The currents become unequal because of fabrication mismatch between the transistors. The offset cancellation circuit, shown in Figure 4.64, is implemented with a current steering DAC [52] consisting of 6 slices, i.e. a 6-bit DAC. Ideally, the current is steered equally into both branches, and the offset cancellation circuit has no effect. When the two branches are mismatched, the DAC current can be redirected to cancel the mismatch and equalize the currents in both branches. The designed circuit can provide up to $4 \mu A$ to cancel current mismatch between the two branches.

| Spec | Value |
| :---: | :---: |
| Technology $(\mathrm{nm})$ | 65 |
| Bandwidth $(\mathrm{GHz})$ | 5 min |
| Current Consumption $(\mu A)$ | 200 max |
| Gain Range $(\mathrm{dB})$ | from 4 up to 9 |

Table 4.15.: The Required Specs of The VGA Block.


Figure 4.64.: Schematic of Offset Cancellation Circuit.
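As a rough illustration of the resolution of this correction, the sketch below assumes (this is not stated in the text) that the 6 slices are binary weighted, so the full-scale current of $4 \mu A$ is split into $2^{6}-1=63$ LSB steps, and that the steered current develops an offset correction $\Delta I \cdot R_{D}$ across one load. The function names and the value of $R_{D}$ are hypothetical.

```python
def idac_current(code, n_bits=6, full_scale=4e-6):
    """Output current of a binary-weighted current-steering DAC (assumed weighting)."""
    if not 0 <= code < 2 ** n_bits:
        raise ValueError("code out of range")
    lsb = full_scale / (2 ** n_bits - 1)
    return code * lsb

def offset_correction(code, r_d, **dac_kwargs):
    """Voltage shift produced in one half when the DAC current flows through R_D."""
    return idac_current(code, **dac_kwargs) * r_d

r_d = 1e3  # hypothetical load resistance in ohms
print(f"LSB = {idac_current(1) * 1e9:.1f} nA")
print(f"full-scale correction = {offset_correction(63, r_d) * 1e3:.2f} mV")
```

Under these assumptions, the correction range scales with $R_{D}$, while the step size is set by the number of DAC bits.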

### 4.4.3. Common Mode Adaptation Circuit

The main goal of the common mode adaptation circuit is to adapt the resistor $R_{D}$ so that it is equal in both differential branches of the VGA, since a mismatch between the resistors may occur due to the fabrication process. The circuit is implemented simply by slices in parallel with the default $R_{D}$, controlled by PMOS switches. Adding slices in parallel with the default $R_{D}$ changes the overall resistance, which keeps the common-mode voltage at the desired value and equal in both differential branches of the VGA.

### 4.4.4. Fixed-Gain Amplifier and Buffer

For standalone VGA measurements only, a buffer is added so that the VGA can drive a low-impedance load. The schematics of the fixed-gain amplifier and the buffer are shown in Figure 4.65. The amplifier provides a fixed gain of 10.5 dB to compensate for the buffer loss, so that the resultant lowest gain is raised from $-10.5$ dB to 0 dB. A differential buffer is designed and added as the last stage; in simulation, it shows a loss of 10.8 dB at all frequencies.


Figure 4.65.: Schematics of Fixed-Gain Amplifier and Buffer.

### 4.4.5. Simulation \& Results

The topology above, with all additional parts except the buffer and the fixed-gain amplifier, was designed and simulated to verify its functionality. Figure 4.66 shows the VGA gain as the degeneration resistance $R_{S}$ is varied through the control signals, and Table 4.16 lists the gain for each control word. As explained before, the supported gain range is relatively small, and the design is power-optimized: the total power consumption is $180 \mu W$ (current consumption of $150 \mu A$). This compares favorably with [37], which uses 11 cascaded VGA stages, each consuming $240 \mu W$ and providing about $2 d B$ of gain. The design in this work is fully adaptive and tunable through the degeneration resistance, the load resistance adaptation and the IDAC circuit. The gain is also robust against $\pm 5 \%$ supply voltage and $\pm 15 \%$ reference current variations, as shown in Figure 4.67 and Figure 4.68. Figure 4.69 and Figure 4.70 show the VGA gain across process corners and across temperature from $-40^{\circ} \mathrm{C}$ to $100^{\circ} \mathrm{C}$. All simulations were run at the maximum gain (i.e. minimum degeneration resistance). Table 4.17 compares this work with recently published VGA designs.


Figure 4.66.: VGA Gain With Changing $R_{S}$.

| $b_{0} b_{1} b_{2} b_{3} b_{4} b_{5}$ | Gain $(d B)$ |
| :---: | :---: |
| 000001 | 3.7 |
| 000010 | 5.2 |
| 000100 | 6.1 |
| 001000 | 7.3 |
| 010000 | 8 |
| 100000 | 9 |

Table 4.16.: VGA Gain and Corresponding Control Signals.

|  | $[37]$ | $[38]$ | $[39]$ | $[40]$ | $[41]$ | This Work |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Technology $(\mathrm{nm})$ | 65 | 180 | 90 | 180 | 65 | 65 |
| Gain Range $(\mathrm{dB})$ | 22 | 34 | 60 | -16.6 to 6.5 | 3 to 31 | 3 to 9 |
| Bandwidth $(\mathrm{GHz})$ | $2-2.2$ | 1.1 | 2.2 | 5.6 | 0.02 to 0.98 | $5-7$ |
| Power $(\mathrm{mW})$ | 3.48 | 0.7 | 2.5 | 7.9 | 48 | 0.18 |
| Gain Control Mode | Analog | Analog | Analog | Digital | Digital | Digital |

Table 4.17.: Performance Summary and Comparison With This Work.


Figure 4.67.: VGA Gain With Changing VDD $\pm 5 \%$.


Figure 4.68.: VGA Gain With Changing $I_{r e f} \pm 15 \%$.


Figure 4.69.: VGA Gain With Process Corners.


Figure 4.70.: VGA Gain With Varying Temperature From $-40^{\circ} \mathrm{C}$ to $100^{\circ} \mathrm{C}$.

### 4.5. Termination Calibration Circuit

As mentioned in section 2.3, this circuit calibrates the on-chip termination slices against an off-chip $50 \Omega$ reference resistance, so that the channel termination is well matched and reflections on the channel are minimized. As shown in Figure 4.71, the proposed design includes:

- Latched Comparator followed by SR latch.
- An off chip reference resistance ( $50 \Omega$ ) with high precision.
- A slices block with five PMOS-switched slices of on-chip resistors and one always-on resistor.
- Binary Search block which is a Verilog-A module designed to calibrate the slices.
- Current mirror to push the same current to the reference resistance and the slices.


Figure 4.71.: Termination Calibration Circuit.

### 4.5.1. Latched Comparator

The proposed topology for comparator is the double tail latched comparator as shown in Figure 4.72.


Figure 4.72.: Double Tail Latched Comparator.

While the clock is low, the output nodes (Outp and Outn) are discharged to ground. When the clock goes high, the currents flowing in M7 and M8 charge the output nodes at different rates depending on the input voltages INP and INN. When one of the output voltages reaches the NMOS threshold voltage $V_{t h n}$, a positive feedback operation starts and the two outputs eventually settle to VDD and ground. M1, M2, Mtail1, MR1 and MR2 are responsible for amplification, while M5-M8 and Mtail2 are responsible for regeneration.

The delay time is measured as the time difference between the $50 \%$ levels of the clock and the output, as shown in Figure 4.73. It can be modeled as the sum of $t_{0}$ and $t_{\text {Latch }}$, given by Equation 4.32 and Equation 4.33, where $t_{0}$ is the delay of the first stage (preamplifier), set by the capacitive discharge of nodes (fn, fp) until the first NMOS transistor (MR1-MR2) turns on, and $t_{\text {Latch }}$ is the time taken by the cross-coupled inverters to regenerate the output to its settled values.

$$
\begin{equation*}
t_{0}=\frac{C_{f n, f p} V_{t h n}}{I_{1}}=\frac{2 C_{f n, f p} V_{t h n}}{I_{M t a i l 1}} \tag{4.32}
\end{equation*}
$$

Where $C_{f n, f p}$ is the total parasitic capacitance at nodes ( $\mathrm{fn}, \mathrm{fp}$ ), $V_{t h n}$ is the threshold voltage of MR1-MR2, $I_{1}$ is the current in each input branch, and $I_{\text {Mtail1 }}$ is the common mode current in Mtail1.

$$
\begin{equation*}
t_{\text {latch }}=\frac{C_{L}}{g m_{\text {eff }}} \ln \left(\frac{\Delta V_{\text {out }}}{\Delta V_{0}}\right)=\frac{C_{L}}{g m_{\text {eff }}} \ln \left(\frac{V_{D D} / 2}{\Delta V_{0}}\right) \tag{4.33}
\end{equation*}
$$

Where $C_{L}$ is the load capacitance at output nodes (Outp, Outn), $g m_{e f f}$ is the effective transconductance of the cross-coupled inverters, $\Delta V_{\text {out }}$ is the differential voltage at time $=t_{\text {latch }}$ which will be $V_{D D} / 2$ and $\Delta V_{0}$ is the output differential voltage at time $=t_{0}$.

The total delay time is given by Equation 4.34.

$$
\begin{equation*}
t_{\text {delay }}=t_{0}+t_{\text {latch }}=\frac{2 C_{f n, f p} V_{\text {thn }}}{I_{M t a i l 1}}+\frac{C_{L}}{g m_{e f f}} \ln \left(\frac{V_{D D} / 2}{\Delta V_{0}}\right) \tag{4.34}
\end{equation*}
$$
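Equation 4.34 can be evaluated directly to see which term dominates. In the sketch below all capacitances, currents and $g m_{e f f}$ are hypothetical values chosen only to show the trend: $t_{0}$ scales inversely with $I_{M t a i l 1}$, while $t_{\text {latch }}$ grows only logarithmically as the initial imbalance $\Delta V_{0}$ shrinks.

```python
import math

def comparator_delay(c_fnfp, v_thn, i_tail1, c_l, gm_eff, vdd, dv0):
    """Return (t0, t_latch, t_delay) in seconds from Eq. 4.32-4.34."""
    t0 = 2.0 * c_fnfp * v_thn / i_tail1
    t_latch = (c_l / gm_eff) * math.log((vdd / 2.0) / dv0)
    return t0, t_latch, t0 + t_latch

# Hypothetical values for illustration only
params = dict(c_fnfp=10e-15, v_thn=0.4, i_tail1=100e-6,
              c_l=5e-15, gm_eff=1e-3, vdd=1.2)
for dv0 in (50e-3, 10e-3, 1e-3):
    t0, t_latch, total = comparator_delay(dv0=dv0, **params)
    print(f"dV0 = {dv0 * 1e3:4.0f} mV: t0 = {t0 * 1e12:.1f} ps, "
          f"t_latch = {t_latch * 1e12:.1f} ps, total = {total * 1e12:.1f} ps")
```

Each tenfold reduction of $\Delta V_{0}$ adds $(C_{L} / g m_{e f f}) \ln 10$ to $t_{\text {latch }}$, which is why a small differential input gives a longer delay.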



Figure 4.73.: Transient Behavior of The Proposed Comparator.

The first stage of the comparator is designed for a wide common-mode range (400 mV to 1.1 V) using the $G m / I d$ design methodology. The input pair should be large enough for low offset; the switches Mtail1, M3 and M4 are designed to operate in the linear region with a small L for high-speed switching, and Mtail1 should be kept small to improve the input resolution. The operating region of the input differential pair changes with the clock: in the pre-charge phase (clock low) the input transistors are off, and when the clock goes high they operate in saturation, which ensures proper operation over the specified common-mode range. The second stage is designed for minimum latching delay by sizing the two cross-coupled inverters responsible for regeneration for a small delay, and Mtail2 should be large enough for fast latching.

### 4.5.2. SR latch

The comparator is followed by an SR latch built from two cross-coupled NAND gates, shown in Figure 4.74, which operates as follows: $\bar{S}$ is the set input and $\bar{R}$ is the reset input. A low level at both $\bar{S}$ and $\bar{R}$ is not permitted, and this is guaranteed by the SA stage. A low level at $\bar{S}$ sets the Q output high, which in turn forces $\bar{Q}$ low. Conversely, a low level at $\bar{R}$ sets $\bar{Q}$ high, which in turn forces Q low. Each NAND gate is a static CMOS gate with the PMOS width twice the NMOS width, for equal $t_{p l h}$ and $t_{p h l}$ and minimum propagation delay.


Figure 4.74.: SR latch circuit.

### 4.5.3. Resistors with PMOS Slices

Each slice contains parallel-connected resistors and a control transistor that operates either in cut-off or in the triode region. Since the slices are controlled by binary calibration signals (from the binary search block), each slice has half the resistance of the next one. The proposed circuit contains six resistor blocks, five switched slices and one always-on block: the first switched slice has the smallest resistance and contains 16 unit resistors X connected in parallel, while the fifth contains only one unit resistor X and has the highest resistance.

The MSB of the binary code controls the lowest-resistance slice (16 unit resistors in parallel), and the LSB controls the highest-resistance slice (a single unit resistor, $\mathrm{R}=\mathrm{X}$). To avoid an excessively large $R_{c a l}$, there is also an always-on block with the smallest resistance (32 unit resistors in parallel, $\mathrm{R}=\mathrm{X} / 32$), as shown in Figure 4.75.


Figure 4.75.: Resistors with Slices Circuit.

On-chip resistors vary by about $\pm 25 \%$ across PVT conditions, so in the proposed design the unit resistor X is an rpoly resistor of $2 \mathrm{k} \Omega$. At the initial binary search code $\left(V_{\text {cal }}=10000\right)$ the total resistance is about $40 \Omega$; when all slices are on $\left(V_{c a l}=11111\right)$ it is about $31 \Omega$, and when all slices are off $\left(V_{\text {cal }}=00000\right)$ it is about $62 \Omega$.
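Neglecting the switch on-resistance, the total resistance for a 5-bit code is the unit resistor divided by the number of enabled unit resistors, $R_{c a l}=X /(32+\text {code})$, because the MSB slice enables 16 unit resistors, the next 8, and so on down to 1, in addition to the always-on block of 32. The sketch below evaluates this ideal model; with $X=2 \mathrm{k} \Omega$ it gives $62.5 \Omega$, $41.7 \Omega$ and $31.7 \Omega$ for codes 00000, 10000 and 11111, close to the approximate values quoted above.

```python
def r_cal(code, unit_r=2000.0, always_on=32):
    """Ideal termination resistance for a 5-bit slice code (switch R_on neglected).

    Bit 4 (MSB) enables 16 unit resistors and bit 0 (LSB) enables 1, so the
    number of enabled unit resistors equals the code value plus the always-on block.
    """
    if not 0 <= code <= 0b11111:
        raise ValueError("code must fit in 5 bits")
    return unit_r / (always_on + code)

for code in (0b00000, 0b10000, 0b11111):
    print(f"{code:05b}: {r_cal(code):.1f} ohm")
```

The small differences from the quoted values are expected, since the ideal model ignores the on-resistance of the PMOS switches.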

### 4.5.4. Current Mirror Circuit

This circuit forces the same current through the reference resistor ($50 \Omega$) and the slices block. The proposed design, shown in Figure 4.76, is a cascode current mirror, chosen for low mismatch between $I_{\text {cal }}$ and $I_{\text {ref }}$, which guarantees a low error in the calibrated resistance; the current is set to $\mathrm{I}=300 \mu \mathrm{~A}$.


Figure 4.76.: Current Mirror circuit.

### 4.5.5. Binary Search Counter

This block is modeled in Verilog-A. It generates the digital control signals for the slices by sampling the latched comparator outputs $(Q, \bar{Q})$ on the positive edge of a clock that is delayed from the comparator clock by the propagation delay $t_{C Q}$.

In general, the binary search algorithm works as follows: if the decision is count up, the current bit is kept and the next bit is set; if the decision is count down, the current bit is reset and the next bit is set. The output code starts from (10000); if Q is high, it counts up and moves to (11000), otherwise it counts down and moves to (01000), and so on until the output settles after five clock cycles for the five bits. Binary search is therefore faster than linear search, since the number of clock cycles equals the number of bits, whereas a linear search over a 5-bit code can take up to $2^{5}-1=31$ cycles.
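The search described above is a successive-approximation procedure. The Python sketch below is a behavioral stand-in for the Verilog-A block, not the thesis code: it uses the ideal slice model $R_{c a l}=X /(32+\text {code})$ with the switch on-resistance neglected, and it assumes (hypothetically) that the comparator reports Q high when the trial code still leaves $R_{c a l}$ at or above the $50 \Omega$ reference, in which case the bit is kept (count up); otherwise the bit is reset (count down). Under these assumptions the first decision resets the MSB and moves to 01000, matching the count-down branch described above.

```python
def r_cal(code, unit_r=2000.0, always_on=32):
    """Ideal slice resistance: unit resistor over the number of enabled units."""
    return unit_r / (always_on + code)

def binary_search_cal(r_ref=50.0, unit_r=2000.0, n_bits=5):
    """Successive-approximation search for the termination code.

    Returns the final code and the trial code applied on each clock cycle.
    """
    code = 0
    trials = []
    for bit in reversed(range(n_bits)):  # MSB first, one bit per clock
        trial = code | (1 << bit)         # set the next bit
        trials.append(trial)
        # Hypothetical comparator polarity: Q high -> count up (keep the bit)
        q_high = r_cal(trial, unit_r) >= r_ref
        if q_high:
            code = trial
        # Q low -> count down: the bit stays reset and the next bit is tried
    return code, trials

code, trials = binary_search_cal()
print("trials:", [f"{t:05b}" for t in trials])
print(f"final code {code:05b} -> R_cal = {r_cal(code):.2f} ohm")
```

Because the model resistance decreases monotonically with the code, the search returns the largest code whose resistance is still at or above the reference, in exactly `n_bits` clock cycles; with $X=2 \mathrm{k} \Omega$ the ideal model settles on code 01000.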

### 4.5.6. Simulation Results

### 4.5.6.1. Latched Comparator Results

Figure 4.77 shows the transient response of the latched comparator output for an input common-mode voltage of 0.6 V and a differential input of 5 mV at a 1 GHz clock.

The SR latch outputs $(Q, \bar{Q})$ switch while the clock is high, after the propagation delay $\left(t_{C Q}\right)$, and are reset to VDD while the clock is low.


Figure 4.77.: Comparator Transient Response.

The propagation delay $\left(t_{C Q}\right)$ depends on the input. For a larger differential input, the differential current between the two branches is larger, which shortens the discharge time of the first-stage capacitance and therefore reduces the total delay.

Figure 4.78 and Figure 4.79 show the delay versus the common-mode voltage and the differential voltage, respectively.


Figure 4.78.: Delay versus Common mode Voltage $\left(V_{d i f f}=5 \mathrm{mV}\right)$.


Figure 4.79.: Delay versus Differential Voltage ( $\left.V_{C M}=600 m V \& 1 V\right)$.

Figure 4.80 and Figure 4.81 show how the average power consumption changes when sweeping the common-mode voltage and the differential voltage, respectively.

Figure 4.82 and Figure 4.83 show the effect of process corners on the delay and the average power consumption while sweeping the common-mode voltage.

Figure 4.84 shows the mismatch offset obtained from a 200-sample Monte Carlo simulation. The offset is taken as the standard deviation of this distribution, which is 20.42 mV. For each sample, the offset is measured as the input differential voltage at which the output switches: one input is held at VDD/2 while the other is driven by a very slow ramp for high resolution.

Table 4.18 compares the proposed design with other comparator designs discussed in the survey chapter.


Figure 4.80.: Avg. Power Consumption versus Common mode Voltage ( $V_{d i f f}=$ 5 mV ).


Figure 4.81.: Avg. Power Consumption versus Differential Voltage ( $V_{C M}=$ $600 \mathrm{mV} \& 1 \mathrm{~V}$ ).


Figure 4.82.: Delay versus Common mode Voltage ( $V_{\text {diff }}=5 \mathrm{mV}$ ) with Process Corners.


Figure 4.83.: Avg. Power Consumption versus Common Mode Voltage ( $V_{\text {diff }}=$ 5 mV ) with Process Corners.


Figure 4.84.: Monte Carlo Offset Histogram.

|  | This Work | KM Lei [47] | B. Goll [45] |
| :---: | :---: | :---: | :---: |
| Technology | 65 nm | 65 nm | 65 nm |
| Supply Voltage | 1.2 V | 1.2 V | 1.2 V |
| Frequency | 1 GHz | 1 GHz | 500 MHz |
| Avg. Power Consumption | $52 \mu W$ | $153 \mu W$ | $329 \mu W$ |
| Delay | 230 ps | 177 ps | 550 ps |
| Offset | 20.4 mV | 7.8 mV |  |

Table 4.18.: Comparison between Different Topologies.

### 4.5.6.2. Termination Calibration Results

Figure 4.85 shows the total calibrated resistance, which changes every clock period while the binary search counter runs, converging to an equivalent termination of $50 \Omega$ with a small error. There are some notches in the calibrated resistance because of switch non-idealities: when a binary signal switches from high to low, the corresponding PMOS switch does not enter the linear region instantly, so its $R_{o n}$ is higher during this transition. The calibrated resistance at typical conditions and $27^{\circ} \mathrm{C}$ is $49.65 \Omega$, a $0.7 \%$ error.


Figure 4.85.: Calibrated Resistance Response (TT, Temp. $=27^{\circ} \mathrm{C}$ ).

The calibration results at the slow and fast corners are shown in Figure 4.86 and Figure 4.87, respectively. The resistance settles at $49.3 \Omega$ in the slow corner and at $49.09 \Omega$ in the fast corner. The settled termination values for all process corners are listed in Table 4.19; the maximum error relative to the $50 \Omega$ termination is below $2 \%$.


Figure 4.86.: Calibrated Resistance Response at Slow Corner (SS, Temp. $=125$ ${ }^{\circ} \mathrm{C}$ ).


Figure 4.87.: Calibrated Resistance Response at Fast Corner (FF, Temp. $=-40$ ${ }^{\circ} \mathrm{C}$ ).

| Corners | Calibrated Resistances |
| :---: | :---: |
| TT, Temp. $=27^{\circ} \mathrm{C}$ | $49.65 \Omega$ |
| SS, Temp. $=27^{\circ} \mathrm{C}$ | $49.9 \Omega$ |
| FF, Temp. $=27^{\circ} \mathrm{C}$ | $49.35 \Omega$ |
| FS, Temp. $=27^{\circ} \mathrm{C}$ | $49.72 \Omega$ |
| SF, Temp. $=27^{\circ} \mathrm{C}$ | $49.28 \Omega$ |
| SS, Temp. $=125^{\circ} \mathrm{C}$ | $49.3 \Omega$ |
| FF, Temp. $=-40^{\circ} \mathrm{C}$ | $49.09 \Omega$ |

Table 4.19.: Calibrated Resistances Values across Corners.
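The error figures quoted above follow directly from Table 4.19. The short Python check below computes the relative error of each calibrated value with respect to $50 \Omega$; it reproduces the $0.7 \%$ typical error and shows that the largest error, at the FF, $-40^{\circ} \mathrm{C}$ corner, is about $1.8 \%$.

```python
# Calibrated resistances from Table 4.19 (ohms)
table_4_19 = {
    "TT, 27 C": 49.65,
    "SS, 27 C": 49.9,
    "FF, 27 C": 49.35,
    "FS, 27 C": 49.72,
    "SF, 27 C": 49.28,
    "SS, 125 C": 49.3,
    "FF, -40 C": 49.09,
}

def percent_error(r, target=50.0):
    """Relative deviation from the target termination, in percent."""
    return abs(r - target) / target * 100.0

for corner, r in table_4_19.items():
    print(f"{corner:10s} {r:6.2f} ohm  error = {percent_error(r):.2f} %")

worst = max(table_4_19, key=lambda c: percent_error(table_4_19[c]))
print(f"worst corner: {worst}, error = {percent_error(table_4_19[worst]):.2f} %")
```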

### 4.6. Decision Feedback Equalizer

As mentioned in section 2.4, several DFE architectures were discussed along with their advantages and disadvantages. The proposed architecture is a 3-tap direct half-rate DFE, whose principal advantage is the simpler design of the DFE blocks and of the CDR circuit, in particular the clock buffer. The design, shown in Figure 4.88, targets a data rate of $10 \mathrm{~Gb} / \mathrm{s}$. Three taps are used to provide the flexibility to equalize the transmitted data over different channels.

The main challenge of the half-rate DFE is the timing constraint on the feedback loops. The critical path is the second tap's loop, whose delay is constrained as shown in Equation 4.35, where UI is the unit interval (100 ps at $10 \mathrm{~Gb} / \mathrm{s}$).

$$
\begin{equation*}
t_{C Q-\text { slicer }}+t_{\text {setup-FF }}+t_{C Q-F F}+t_{\text {Settle-sum-node }}<U I \tag{4.35}
\end{equation*}
$$
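Equation 4.35 is a simple timing budget. At $10 \mathrm{~Gb} / \mathrm{s}$ the unit interval is 100 ps, and the sketch below checks the remaining slack for a set of hypothetical block delays; these numbers are placeholders, not simulated values from this design.

```python
def dfe_loop_slack(data_rate, t_cq_slicer, t_setup_ff, t_cq_ff, t_settle):
    """Slack of the critical feedback loop against one UI (Eq. 4.35), in seconds."""
    ui = 1.0 / data_rate
    loop = t_cq_slicer + t_setup_ff + t_cq_ff + t_settle
    return ui, loop, ui - loop

# Hypothetical block delays in seconds
ui, loop, slack = dfe_loop_slack(10e9, t_cq_slicer=35e-12, t_setup_ff=15e-12,
                                 t_cq_ff=25e-12, t_settle=20e-12)
status = "met" if slack > 0 else "violated"
print(f"UI = {ui * 1e12:.0f} ps, loop = {loop * 1e12:.0f} ps, "
      f"slack = {slack * 1e12:.0f} ps -> constraint {status}")
```

Any increase in the slicer clock-to-Q delay eats directly into this slack, which is why the slicer and its SR latch are optimized for speed in the following subsections.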

In this part, the DFE blocks (slicer, D flip-flop (DFF), Gm cell, taps and assorted buffers) are designed at the transistor level and their simulation results are presented.


Figure 4.88.: Full Schematic of The DFE Block.

### 4.6.1. Slicer

The proposed slicer topology is a double tail latched comparator followed by an SR latch. The comparator is designed for a wide common-mode range ( $400 \mathrm{mV}-1.1 \mathrm{~V}$ ) and a clock frequency of 5 GHz. The analysis and design of this circuit are detailed in subsection 4.5.1.

### 4.6.1.1. SR Latch

Here the SR latch is implemented as an AND-OR structure, shown in Figure 4.89, instead of two cross-coupled NAND gates, because this implementation has less delay than the cross-coupled NAND latch [43].

### 4.6.1.2. Kickback noise in a dynamic latched comparator

One of the non-idealities of the dynamic latched comparator is kickback noise [47]. Upsizing the input MOS pair of the comparator generates more kickback noise, since the kickback is proportional to the drain-gate parasitic capacitance of the input pair. This kickback noise is dynamic because, during the whole comparison process, the input MOS pair operates in several regions (cut-off, saturation and triode), which significantly affects the accuracy of the decision.

Figure 4.89.: SR Latch Implemented with an AND-OR Structure.

Assume that both $V_{p}$ and $V_{n}$ are connected to $V_{C M}$. When CLK is low, both $D i^{+}$ and $D i^{-}$ are pulled up to VDD, which grounds $V_{\text {out }, p}$ and $V_{\text {out }, n}$; during this interval the comparator is in reset mode. Node A is floating, so $V_{G S}$ is uncertain. The input MOS pair (M1 and M2) is already turned off, since no current can flow from $D i^{+}$ and $D i^{-}$ to node A. The charge induced at the gates of M1 and M2 is given by Equation 4.36.

$$
\begin{equation*}
Q_{G, o f f}=V_{C M} C_{G B}+V_{G S} C_{G S O}+\left(V_{C M}-V_{D D}\right) C_{G D O} \tag{4.36}
\end{equation*}
$$

Where $C_{G B}$ is the equivalent gate-substrate capacitance, and $C_{G S O}$ and $C_{G D O}$ are the gate-source and gate-drain overlap capacitances. $V_{G D}$ is below 0 V in this region. The approximate values of these capacitances can be obtained from a DC simulation.

When the rising edge of CLK arrives, M4 and M5 turn off while M3 turns on. Node A is grounded first, driving M1 and M2 into the saturation region ( $V_{G S}>V_{t}$ and $V_{G D}<V_{t}$, where $V_{t}$ is the threshold voltage). At this point, the charge induced throughout the channel is denoted by $Q_{c h, s a t}$ and given by Equation 4.37.

$$
\begin{equation*}
Q_{c h, s a t}(t)=-\frac{2}{3} W L C_{o x}\left(V_{C M}-V_{A}(t)-V_{t}\right) \tag{4.37}
\end{equation*}
$$

The minus sign in Equation 4.37 indicates that the channel charge consists of electrons (NMOS). In this operating region the channel shields the gate from the substrate, so any charge change caused by a variation of $V_{G}$ is supplied from the source and drain of the transistor. The charge induced on the gate is therefore given by Equation 4.38.

$$
\begin{equation*}
Q_{G, s a t}(t)=-Q_{c h, s a t}(t)+\left(V_{C M}-V_{A}(t)\right) C_{G S O}+\left(V_{C M}-D i_{+}(t)\right) C_{G D O} \tag{4.38}
\end{equation*}
$$

Once M1 and M2 are on, nodes $D i_{+}$ and $D i_{-}$ start to discharge through them at a rate that depends on the input voltage: the node ($D i_{+}/D i_{-}$) on the side with the larger input voltage ($V_{p}/V_{n}$) discharges faster. This difference in discharge rates triggers the positive feedback of the two cross-coupled inverters formed by M7, M8, M10, and M11, which resolves the decision. The result is held at the output until CLK goes low again, after which the comparator proceeds to the next comparison. As $D i_{+}$ and $D i_{-}$ fall low enough during this interval, M1 and M2 are driven into the triode region, and the charge induced in the channel is given by Equation 4.39.

$$
\begin{equation*}
Q_{c h, \text { triode }}(t)=-W L C_{o x}\left(V_{C M}-V_{A}(t)-V_{t}\right) \tag{4.39}
\end{equation*}
$$

The total charge induced at the gate is given by Equation 4.40.

$$
\begin{equation*}
Q_{G, \text { triode }}(t)=-Q_{c h, \text { triode }}(t)+\left(V_{C M}-V_{A}(t)\right) C_{G S O}+\left(V_{C M}-D i_{+}(t)\right) C_{G D O} \tag{4.40}
\end{equation*}
$$

Since $D i_{+}$, $D i_{-}$, and $V_{A}$ all vary over time, these voltage variations couple back to the gate and produce kickback noise. In this interval, $V_{GS}$ and $V_{GD}$ are both greater than zero, which changes the charge induced on the gate. Equations 4.36, 4.38, and 4.40 show that the charge induced at the gates of the input MOS pair changes throughout the comparison and latch operations.
The topology proposed for the slicer is the double-tail latched comparator shown in Figure 4.90. Figure 4.91 shows the charge induced on the gate at different instants and for the different operating regions of M1 and M2.
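
The charge expressions in Equations 4.36 to 4.40 can be evaluated directly. The Python sketch below does so for one input transistor; the device dimensions, oxide capacitance, overlap capacitances, and node voltages are hypothetical placeholders, not values from this design. For the same $V_{A}$ and drain voltage, it shows the gate charge growing by $\frac{1}{3}WLC_{ox}(V_{CM}-V_{A}-V_{t})$ when the device moves from saturation into triode.

```python
# Sketch of Eqs. 4.36-4.40: gate charge of one input transistor (M1 or M2)
# in each operating region. All numeric values are hypothetical.

def q_gate_off(v_cm, v_gs, v_dd, c_gb, c_gso, c_gdo):
    """Eq. 4.36: gate charge during reset (CLK low, device off)."""
    return v_cm * c_gb + v_gs * c_gso + (v_cm - v_dd) * c_gdo


def q_channel(w, l, c_ox, v_cm, v_a, v_t, region):
    """Eqs. 4.37 and 4.39: channel charge (negative: electrons in an NMOS)."""
    factor = {"sat": 2.0 / 3.0, "triode": 1.0}[region]
    return -factor * w * l * c_ox * (v_cm - v_a - v_t)


def q_gate_on(w, l, c_ox, v_cm, v_a, v_di, v_t, c_gso, c_gdo, region):
    """Eqs. 4.38 and 4.40: gate charge once the channel has formed."""
    q_ch = q_channel(w, l, c_ox, v_cm, v_a, v_t, region)
    return -q_ch + (v_cm - v_a) * c_gso + (v_cm - v_di) * c_gdo


# Hypothetical device: W = 2 um, L = 60 nm, Cox = 15 fF/um^2, 0.5 fF overlaps.
dev = dict(w=2e-6, l=60e-9, c_ox=0.015, v_cm=0.7, v_t=0.4,
           c_gso=0.5e-15, c_gdo=0.5e-15)
q_off = q_gate_off(v_cm=0.7, v_gs=0.3, v_dd=1.2, c_gb=0.2e-15,
                   c_gso=0.5e-15, c_gdo=0.5e-15)
q_sat = q_gate_on(v_a=0.0, v_di=1.2, region="sat", **dev)     # just after precharge
q_tri = q_gate_on(v_a=0.0, v_di=0.2, region="triode", **dev)  # Di has fallen
for name, q in (("off", q_off), ("sat", q_sat), ("triode", q_tri)):
    print(f"Q_G,{name} = {q * 1e15:.3f} fC")
```

The changes between these three values are the steps that the clocked compensation capacitors described next are meant to cancel.
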


Figure 4.90.: A Conventional Dynamic Latched Comparator.


Figure 4.91.: The timing diagram shows the states and charge induced on the gate of M1, M2.

One way to reduce the kickback noise is to add four NMOS capacitors, M16-M19, at the input of the comparator. As shown above, the charge induced at the gates of M1 and M2 is a function of time and depends on their operating region; it can be modeled as a two-step function with different amplitudes in different time intervals. To cancel this charge, M16-M19 are chosen as NMOS devices: when CLK goes high they attract electrons to form a channel, compensating for the electrons repelled from the gates of M1 and M2.

M16 and M18 compensate for the charge lost when the input transistors switch from the off region to the saturation region, whereas M17 and M19 compensate for the charge lost when they switch from saturation to triode. Two delay blocks, each made of two cascaded inverters, align the switching times of the two sets of MOS capacitors with these transitions so that the induced charges are accurately neutralized.

The added NMOS capacitors (M16-M19), shown in Figure 4.92, increase the input capacitance of the comparator ($C_{IN}$). This does not affect the comparison result, however, because the decision is determined by the polarity of the voltage difference between the input nodes, which is independent of $C_{IN}$.


Figure 4.92.: Clocked NMOS Capacitors.
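
The text does not give a sizing equation for M16-M19. The Python sketch below is a first-order estimate made purely for illustration. Its assumption is that each clocked capacitor, once its channel forms, holds a charge of about $W_{c}L_{c}C_{ox}(V_{CM}-V_{t})$. That charge is then matched to the channel-charge step of the input device from Equations 4.37 and 4.39, which is $\frac{2}{3}$ of $WLC_{ox}V_{ov}$ for off to saturation and $\frac{1}{3}$ for saturation to triode. The function name and the example device are hypothetical.

```python
# First-order sizing sketch (assumption, not the thesis' method): match the
# channel charge of a clocked NMOS capacitor to the channel-charge step of
# the input transistor it compensates. Cox cancels out of the balance.

def comp_cap_area(w_in, l_in, v_cm, v_a, v_t, step):
    """Return W*L of a compensating NMOS capacitor (hypothetical helper).

    step = "off_to_sat"    -> M16/M18, input channel charge 0 -> 2/3 W L Cox V_ov
    step = "sat_to_triode" -> M17/M19, extra 1/3 W L Cox V_ov
    Assumed capacitor channel charge: W_c * L_c * Cox * (V_CM - V_t).
    """
    fraction = {"off_to_sat": 2.0 / 3.0, "sat_to_triode": 1.0 / 3.0}[step]
    v_ov = v_cm - v_a - v_t
    return fraction * w_in * l_in * v_ov / (v_cm - v_t)


w1, l1 = 2e-6, 60e-9                 # hypothetical input transistor M1/M2
a16 = comp_cap_area(w1, l1, 0.7, 0.0, 0.4, "off_to_sat")
a17 = comp_cap_area(w1, l1, 0.7, 0.0, 0.4, "sat_to_triode")
print(f"M16/M18: W = {a16 / l1 * 1e6:.2f} um at L = 60 nm")
print(f"M17/M19: W = {a17 / l1 * 1e6:.2f} um at L = 60 nm")
```

In practice the sizes would be refined in simulation, since overlap capacitances and the exact waveform of $V_{A}$ are ignored here.
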

### 4.6.1.3. Simulation Results

Figure 4.93 shows the output transient response of the latched comparator under the following conditions:

- The input common-mode voltage is 0.7 V and the differential input voltage is 100 mV.
- The CLK frequency is 5 GHz with 10 ps rise and fall times.
- The load capacitance is 5 fF.

The SR-latch outputs ($Q$, $\bar{Q}$) switch while the clock is high, after the propagation delay $t_{CQ}$, and return to around VDD/2 (0.6 to 0.8 V) while the clock is low.


Figure 4.93.: Comparator Transient Response.

The propagation delay $t_{CQ}$ depends on the input common-mode voltage, the differential input voltage, and the load capacitance.

Figure 4.94 shows the delay as the input common-mode voltage is swept from 0.4 V to 1.1 V with a differential input voltage of 0.1 V.


Figure 4.94.: Delay Versus Common Mode Voltage ($V_{diff}=100\ \mathrm{mV}$).

Figure 4.95 shows the average power consumption for the same sweep of the input common-mode voltage (0.4 V to 1.1 V) with a differential input voltage of 0.1 V.


Figure 4.95.: Average Power Consumption Versus Common Mode Voltage $\left(V_{\text {diff }}=100 \mathrm{mV}\right)$.

The slicer's delay and average power consumption are minimal when the input common-mode voltage is between 0.6 V and 0.8 V.

Figure 4.96 shows the delay as the differential input voltage is swept from 10 mV to 0.7 V with an input common-mode voltage of 0.7 V.


Figure 4.96.: Delay Versus Differential Voltage ($V_{CM}=700\ \mathrm{mV}$).

Figure 4.97 shows the average power consumption for the same sweep of the differential input voltage (10 mV to 0.7 V) with an input common-mode voltage of 0.7 V. Both the delay and the average power consumption increase as the differential input decreases.


Figure 4.97.: Average Power Consumption Versus Differential Voltage $\left(V_{C M}=700 \mathrm{mV}\right)$.

Figure 4.98 shows the delay as the load capacitance is swept from 0.5 fF to 10 fF, with an input common-mode voltage of 0.7 V and a differential input voltage of 0.1 V.


Figure 4.98.: Delay Versus Load Capacitance.

For load capacitances above 10 fF, the slicer output becomes distorted.

Figure 4.99 shows the delay and average power consumption across process corners as the differential input voltage is swept from 10 mV to 0.7 V with an input common-mode voltage of 0.7 V.


Figure 4.99.: The Delay and Average Power Versus Differential Voltage with Process Corners.

### 4.6.2. Flip Flop

The flip-flop is a very common circuit in AMS circuits and digital systems, usually used as a storage element or as a delay unit. Its most important metrics are its timing parameters, such as setup time and propagation delay, its power consumption, and the type of signal that triggers it. Flip-flop topologies are classified into main families, and the topologies within each family share the same main properties. In the DFE, the equalization taps are delayed by multiples of the data-rate UI, so flip-flops are used to delay the tap values added at the summing node. This work requires a very small flip-flop propagation delay to meet the critical-path condition described before. Therefore, two flip-flop topologies were fully designed and their performance compared; both are described in detail in this section.

### 4.6.2.1. CML Latch Based Flip Flop

## Circuit Operation

The conventional CML latch consists of two stages, sampling and holding, as shown in Figure 4.100. The first stage is a differential pair that tracks the input and passes it, inverted, to the output node. The second stage is a cross-coupled pair that holds and stores the data on the output node. The tail current is steered between the two stages by complementary clock signals on transistors M2 and M3. When the clock is high, the tail current flows through the differential pair, tracking mode is active, and the data propagate to the output. When the clock goes "Low", the current switches to the cross-coupled pair and holding mode is activated, regenerating the data through positive feedback. Because the CML latch is a differential circuit, it has high immunity to common-mode noise.


Figure 4.100.: Conventional CML Latch.

The conventional CML latch has some issues that limit the achievable data rate. The main limitation is that a single tail current feeds both the tracking and the holding stages, so the bias conditions of the two are tightly coupled, which restricts the allowable transistor sizing. M0 and M1 need a minimum small-signal gain to track correctly, but enlarging them increases the parasitic capacitance at the output, which limits the speed and increases the latch delay. To reach the required gain without oversizing them, the tail current must be sufficiently large, giving wider linearity and a larger transconductance $g_{m}$. The cross-coupled pair, on the other hand, does not need a large current for higher speed.

Another speed limitation is that, during tracking mode, all the current flows through the input differential pair, so when the latch switches to holding mode the cross-coupled transistors must first charge the capacitance of the latch circuit, which increases the latch delay.

To overcome these limitations, a newer topology, the Novel CML latch [53], was introduced. As shown in Figure 4.101, the cross-coupled transistors M11 and M12 always draw current from a separate tail current source, M9, that operates in all modes. This gives more freedom in sizing the cross-coupled transistors, and their capacitance is always charged.


Figure 4.101.: Novel CML Latch.

The Novel design also solves another problem: because transistor M2 is switched, current spikes appear at its drain. Dummy transistors are added to address this: M8 switches on when the clock is "High" and draws a portion of the current during tracking mode, generating opposite current spikes that cancel the original ones and smooth the current. Transistor M3 reduces the voltage drop at the drain of M5 by drawing current in all modes.

The Novel latch has an issue when switching from tracking mode to latching mode. In tracking mode, the current through $R_{L}$ is the sum of the currents drawn by the tracking and latching circuits, whereas in latching mode only the latching current flows through $R_{L}$. This reduces the output swing, specifically by raising the logic-low output level. To solve this issue, a modification is proposed in this work. As shown in Figure 4.102, an additional transistor, M10, is added in parallel with the tail current source M9. M10 is controlled by the complementary clock signal, so during latching mode it switches on and draws additional current that replaces the current of the tracking circuit, keeping the swing at the same value.

Another advantage of this modification is that it allows smaller cross-coupled transistors, reducing the delay without losing the small-signal gain required to regenerate and hold the signal. This is achieved by increasing the tail current only in latching mode, which limits the increase in power consumption.


Figure 4.102.: Modified Novel CML Latch.

## Analysis of the proposed Circuit

Since this latch is a current-mode circuit, it consists of three main parts, as shown in Figure 4.103: the resistive load $R_{L}$, the pull-down network, and the constant current source. The pull-down network decides into which half the current is steered according to the complementary input and clock signals; it comprises the differential pair and the cross-coupled transistors with their switches. The constant current source is implemented by a current mirror that mirrors the reference current supplied by the BGR circuit. The main characteristics and properties of the circuit are discussed below:


Figure 4.103.: CML Circuit Main Architecture.

- Output Swing: The output swing follows from current steering. The complementary inputs of the pull-down network turn one branch off and the other on, so that the whole current $I_{ss}$ flows through the load $R_{L}$ of the on branch. The branch with no current sits at $V_{DD}$, while the other branch drops $I_{ss}R_{L}$ across its load, giving the low output level in Equation 4.41. The differential output swing is therefore given by Equation 4.42.

$$
\begin{equation*}
V_{\text {out }, L}=V_{D D}-I_{s s} R_{L} \tag{4.41}
\end{equation*}
$$

$$
\begin{equation*}
V_{d i f f, s w i n g}=I_{s s} R_{L} \tag{4.42}
\end{equation*}
$$

The output swing has an allowable range that keeps the transistors of the pull-down network and the current source in the correct operating region. To keep M0, M1, and M5 in saturation, the low output level from Equation 4.41 must satisfy the condition in Equation 4.43.

$$
\begin{equation*}
V_{d i f f, s w i n g} \leq V_{D D}-V_{d s, s a t, 0,1}-V_{d s, s a t, 5}-V_{d s, 2} \tag{4.43}
\end{equation*}
$$

where $V_{ds,2}$ is the maximum voltage drop across M2, which operates in the triode region. Since the flip-flop consists of two cascaded latches, the swing also has a lower limit: it must be large enough to switch the second stage properly, so that the low output level turns off the corresponding transistor of the second stage. This limit is given by Equation 4.44.

$$
\begin{equation*}
V_{d i f f, s w i n g} \geq V_{D D}-V_{d s, s a t, 0,1}-V_{d s, s a t, 5}-V_{d s, 2} \tag{4.44}
\end{equation*}
$$

- Circuit Delay: The latch delay is measured from the clock edge until the output switches to the complementary data value. The circuit is approximated as a first-order system with a single pole at the output node; the other poles are neglected because their effect is non-dominant. The delay $\tau_{pd}$ is then given by Equation 4.45, where $R_{eq}$ and $C_{eq}$ are defined in Equation 4.46 and Equation 4.47.

$$
\begin{align*}
& \tau_{p d}=0.69 R_{e q} C_{e q} \tag{4.45}\\
& R_{e q} \approx R_{D}  \tag{4.46}\\
& C_{e q}=C_{g d 0}+C_{d b 0}+C_{L} \tag{4.47}
\end{align*}
$$
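
Equations 4.41 to 4.47 can be checked numerically. The Python sketch below computes the swing, the Equation 4.43 upper bound, and the first-order delay. $R_{D}$ is used for the load resistor, as in Equation 4.46 and the design procedure, where it is calculated from Equation 4.42; $R_{L}$ in Equations 4.41 and 4.42 denotes the same resistor. All currents, saturation voltages, and capacitances are hypothetical example values.

```python
# Sketch of Eqs. 4.41-4.47 for the modified CML latch (hypothetical values).

def cml_low_level(v_dd, i_ss, r_d):
    """Eq. 4.41: output level of the branch that carries I_ss."""
    return v_dd - i_ss * r_d


def cml_swing(i_ss, r_d):
    """Eq. 4.42: differential swing I_ss * R_D."""
    return i_ss * r_d


def swing_upper_bound(v_dd, v_ds_sat_in, v_ds_sat_tail, v_ds_switch):
    """Right-hand side of Eq. 4.43 (keeps M0/M1 and M5 saturated)."""
    return v_dd - v_ds_sat_in - v_ds_sat_tail - v_ds_switch


def cml_delay(r_d, c_gd0, c_db0, c_l):
    """Eqs. 4.45-4.47: tau_pd = 0.69 * R_eq * C_eq with R_eq ~ R_D."""
    return 0.69 * r_d * (c_gd0 + c_db0 + c_l)


i_ss, r_d, v_dd = 1e-3, 400.0, 1.2
swing = cml_swing(i_ss, r_d)
bound = swing_upper_bound(v_dd, 0.2, 0.2, 0.1)
print(f"swing = {swing:.2f} V, low level = {cml_low_level(v_dd, i_ss, r_d):.2f} V, "
      f"Eq. 4.43 satisfied: {swing <= bound}")
print(f"tau_pd = {cml_delay(r_d, 2e-15, 3e-15, 10e-15) * 1e12:.2f} ps")
```
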

## Design Procedure and Parameters

1. Assume a suitable value for $I_{ss}$ and, from the output swing required by the system, calculate $R_{D}$ from Equation 4.42.
2. Size the tail current transistor M5 using Equation 4.48, then size M10 to draw the same current when the clock goes "Low".

$$
\begin{equation*}
\frac{I_{s s}}{I_{r e f}}=\frac{1+\lambda V_{d s, 5}}{1+\lambda V_{d s, 4}} \tag{4.48}
\end{equation*}
$$

3. Size the differential pair M0,1 to satisfy the swing conditions in Equation 4.43 and Equation 4.44, then size the switch transistors M2,7 to operate in the triode region, and finally size the dummy transistors to draw a suitable current relative to the main current $I_{ss}$ and define the required value of $V_{\text{ref}}$.
4. Design the cross-coupled transistors and their tail current to achieve the minimum gain needed to regenerate and hold the data, as in Equation 4.49, taking into account the parasitic capacitance they add to the output node; this calls for small cross-coupled transistors.

$$
\begin{equation*}
\text { gain } \approx g_{m 11,12} R_{D} \geq 1 \tag{4.49}
\end{equation*}
$$

5. Estimate the propagation delay from Equation 4.45. If it does not meet the delay required by the system, revisit the value of $I_{ss}$ and repeat the steps.
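
The loop formed by steps 1, 2, and 5 can be sketched in Python. The sketch makes a simplification for illustration: $C_{eq}$ is held fixed, whereas in the real circuit it grows as devices are resized in steps 3 and 4. Equation 4.48 contains no $W/L$ ratio, so the helper below treats it as a matched mirror. The function names, search range, and targets are hypothetical.

```python
# Sketch of the iterative design loop (steps 1, 2 and 5). Simplification:
# C_eq is treated as fixed although it grows with device sizing.

def mirror_ratio(lam, v_ds_tail, v_ds_ref):
    """Eq. 4.48: I_ss / I_ref for matched mirror devices (lambda in 1/V)."""
    return (1 + lam * v_ds_tail) / (1 + lam * v_ds_ref)


def size_cml_latch(swing, c_eq, t_target, i_start=0.5e-3, i_step=0.1e-3, n_max=50):
    """Raise I_ss step by step until the Eq. 4.45 delay estimate meets t_target."""
    for k in range(n_max):
        i_ss = i_start + k * i_step
        r_d = swing / i_ss              # step 1: Eq. 4.42
        t_pd = 0.69 * r_d * c_eq        # step 5: Eq. 4.45
        if t_pd <= t_target:
            return i_ss, r_d, t_pd
    raise ValueError("no I_ss in the search range meets the delay target")


i_ss, r_d, t_pd = size_cml_latch(swing=0.4, c_eq=15e-15, t_target=5e-12)
print(f"I_ss = {i_ss * 1e3:.2f} mA, R_D = {r_d:.0f} ohm, tau_pd = {t_pd * 1e12:.2f} ps")
print(f"I_ss/I_ref = {mirror_ratio(0.1, 0.3, 0.5):.4f}")
```

Raising $I_{ss}$ at a fixed swing lowers $R_{D}$ and hence the delay, at the cost of power, which is the trade-off that step 5 revisits.
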

### 4.6.2.2. Sense Amplifier Based Flip Flop (SAFF)

## Circuit Operation

The sense-amplifier-based flip-flop consists of two main blocks, as shown in Figure 4.104: the sense amplifier, which acts as a pulse generator (PG), and a slave latch (SL). It resembles the master-slave (MS) combination of two latches. An MS pair has the problem that, if a sufficient margin between the two clocks is not assured, both latches can be transparent at the same time and pass the data straight through to the output. The SAFF does not suffer from this problem because of the mechanism described in this section.

The first stage, the pulse generator, is a function of the clock and data signals: at the clock edge, it generates a pulse that sets or resets the slave latch depending on the data. The sense amplifier is sensitive only to the clock transition, not to the clock level.

The conventional SAFF consists of a sense amplifier and an SR latch as shown in Figure 4.105.


Figure 4.104.: Main Structure of SAFF.


Figure 4.105.: Conventional SAFF.

The SA stage produces a negative pulse on one of the SR-latch inputs, $\bar{S}$ or $\bar{R}$, depending on the input data. The complementary differential inputs are sensed at the gates of MN1 and MN2; at the rising clock edge (from "Low" to "High"), one of the output nodes starts discharging through one of the NMOS input transistors and the tail transistor MN3, depending on the input data. The output node that goes low turns on one of the PMOS transistors, MP1 or MP2, which charges the complementary output node to "High". When the clock goes "Low", the tail transistor MN3 turns off and the PMOS transistors MP3 and MP4 precharge both output nodes, so the input data no longer affect the output.

The conventional SR latch consists of two cross-coupled NAND gates that capture the data until the next clock edge sets or resets it. This SR latch is a bottleneck because its delay is approximately the same as that of the sense amplifier. The modified SR latch proposed in [43] (Nikolic's latch) has a smaller delay than the NAND latch, but its low-to-high and high-to-low output transitions pass through different numbers of gates (two gate delays for the low-to-high transition), so the rise and fall times are not equal. Moreover, its delay is still quite large for this work. Therefore, another SR latch, described in [54], is adopted in this work.

As shown in Figure 4.106, the modified SR latch [43] is controlled by the complementary clock signals, the differential input data, and the sense-amplifier outputs and their inverted versions. The signals $\bar{S}$ and $\bar{R}$ are precharged to "High" while the clock is "Low". As described before, when the clock is "High", one of the nodes $\bar{S}$ and $\bar{R}$ discharges to "Low" depending on the input data. Note that when the output $Q$ discharges, $\bar{S}$ remains at logic "High" and $Q$ discharges through MN1, MN2, and MN4 as soon as the clock goes from low to high; similarly, $\bar{Q}$ discharges through MN7, MN6, and MN8. Discharging therefore takes one gate delay.

The opposite process occurs when $Q$ charges to "High": $\bar{R}$ remains at logic "High" and $R$ is at logic "Low". There is one inverter delay between $R$ and $\bar{R}$, but it does not add to the flip-flop delay, because this transition occurs while the clock is "Low" and settles well before the rising clock edge. Transistors MP3 and MP7 are controlled by $\overline{\text{Clock}}$, so the charging path exists only when the clock is at logic "High". Transistors MP2 and MP6 ensure that glitches on $R$, $S$, or their complements do not cause glitches at the output. At the low-to-high clock transition, depending on the data, $Q$ charges through MP1, MP2, and MP3, or $\bar{Q}$ charges through MP5, MP6, and MP7.

When the clock is "Low", the PMOS transistors (MP1, MP2, MP3, MP5, MP6, and MP7) and the NMOS transistors (MN1, MN2, MN4, MN7, MN6, and MN8) are off, and the cross-coupled inverters formed by (MN3, MP4) and (MN5, MP8) hold the data on $Q$ and $\bar{Q}$ for as long as the clock is at logic "Low".
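
The edge-triggered behavior described above can be summarized in a logic-level Python model. It is a behavioral sketch only: analog timing, glitches, and the individual transistors are not modeled. The class and method names are hypothetical. $\bar{S}$ and $\bar{R}$ are precharged high while the clock is low, the sense amplifier resolves the data once at the rising edge, and the SR latch holds $Q$ otherwise.

```python
class SAFFModel:
    """Behavioral sketch of a sense-amplifier-based flip-flop (hypothetical)."""

    def __init__(self):
        self.prev_clk = 0
        self.s_bar = 1   # SA output driving the latch set input (active low)
        self.r_bar = 1   # SA output driving the latch reset input (active low)
        self.q = 0

    def step(self, clk, d):
        if clk == 0:
            # Precharge phase: MP3/MP4 pull both SA outputs high; latch holds Q.
            self.s_bar, self.r_bar = 1, 1
        elif self.prev_clk == 0:
            # Rising edge: the sense amplifier resolves D exactly once.
            if d:
                self.s_bar = 0
            else:
                self.r_bar = 0
        # While CLK stays high, the SA outputs are latched, so D is ignored.
        if self.s_bar == 0:
            self.q = 1
        elif self.r_bar == 0:
            self.q = 0
        self.prev_clk = clk
        return self.q


ff = SAFFModel()
clk = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 0, 0, 1, 1]
print([ff.step(c, x) for c, x in zip(clk, d)])
```

Data changes while the clock is high (steps 2 and 6 in this trace) do not reach $Q$, which is the transition-only sensitivity that distinguishes the SAFF from a transparent latch.
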

## Circuit Analysis

The SAFF handles digital data and clock signals with full swing ($V_{DD}$ and 0), so its operation relies on opening a path from each output node to $V_{DD}$ or to ground. The goal is a low-resistance path that charges and discharges the output quickly, but there is a trade-off: larger devices increase the current, but they also increase the parasitic capacitance, which increases the delay.

Figure 4.106.: Modified SR Latch.

- Sense-Amplifier: When the clock is "Low", transistors MP3 and MP4 are on and charge the output nodes to $V_{DD}$; they start in saturation and enter the triode region as the output voltage rises. When the clock goes "High", the tail transistor MN3 and one of the input transistors, MN1 or MN2, selected by the complementary input, turn on and discharge one of the output nodes; the conducting input transistor goes from cutoff to saturation, sinking the current to ground, and then enters the triode region. The complementary output nodes switch the transistors MN5, MN6, MP1, and MP2 on and off, charging one output to $V_{DD}$ and discharging the other to zero.
- Modified SR-Latch: The modified SR latch consists of paths to $V_{DD}$ and ground that charge and discharge the output nodes, plus cross-coupled inverters that hold the output data. The PMOS transistors MP1, MP2, MP3, MP5, MP6, and MP7 and the NMOS transistors MN1, MN2, MN4, MN6, MN7, and MN8 should therefore be large enough for a small charging and discharging resistance, but not so large that they overload the output nodes with capacitance. The inverters formed by MN3, MP4, MN5, and MP8 need enough gain to hold the data at the output nodes, but they add significantly to the output loading and the total delay of the SR latch. To keep the SR-latch delay small, the inverters are given minimum sizes. Because of their low gain, however, any leakage current from the output nodes causes notches in the output voltages; this is solved by buffering the flip-flop, as explained in the DFE integration.
- Delay: The rise and fall times depend on the charging and discharging paths and are calculated from Equation 4.50. In the sense amplifier, the output nodes are at $V_{DD}$ while the clock is "Low", so MN5 and MN6 are on and the parasitic capacitances at their sources are charged. When the clock goes "High", one output node is already "High", as it should be, and does not need charging, while the other discharges to ground. The fall time therefore dominates; it is given by Equation 4.51.

$$
\begin{align*}
& \tau_{f, r}=0.69 R_{e q} C_{e q}  \tag{4.50}\\
& \tau_{f, S A}=0.69 R_{X} C_{X}  \tag{4.51}\\
& R_{X}=R_{M N 3}+R_{M N 1,2}+R_{M N 5,6} \tag{4.52}
\end{align*}
$$

$$
\begin{equation*}
C_{X}=C_{d b, M P 1,2}+C_{d b, M P 3,4}+C_{g s, M P 1,2}+C_{g d, M P 1,2}+C_{d b, M N 5,6}+C_{g d, M N 5,6}+C_{g s, M N 5,6}+C_{S R} \tag{4.53}
\end{equation*}
$$

The same analysis applies to the SR latch, except that both the rise and the fall time matter. To make them equal, the pull-up network is sized about twice as large as the pull-down network, since a PMOS has a higher on-resistance than an NMOS of the same size. The propagation delay of the SR latch is given by Equation 4.54.

$$
\begin{gather*}
\tau_{f, S R}=0.69 R_{Y} C_{Y} \tag{4.54}\\
R_{Y}=R_{M N 1,6}+R_{M N 2,7}+R_{M N 4,8} \tag{4.55}\\
C_{Y}=C_{d b, M P 3,7}+C_{d b, M P 1,6}+C_{g s, M P 4,8}+C_{g d, M P 4,8}+C_{d b, M N 3,5}+C_{g d, M N 3,5}+C_{g s, M N 3,5}+C_{L} \tag{4.56}
\end{gather*}
$$
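
Equations 4.50 to 4.56 all have the same form: 0.69 times the series resistance of the switching path times the total capacitance at the switching node. The Python sketch below evaluates the sense-amplifier fall time (Equation 4.51) and the SR-latch delay (Equation 4.54). Every resistance and capacitance value is a hypothetical placeholder, not an extracted value, and the sketch does not model how the two delays combine into the flip-flop $t_{CQ}$.

```python
# Sketch of Eqs. 4.50-4.56: 0.69 * (sum of path resistances) * (sum of node
# capacitances). All component values are hypothetical placeholders.

def rc_delay(path_resistances, node_capacitances):
    return 0.69 * sum(path_resistances) * sum(node_capacitances)


fF = 1e-15
# Eq. 4.52: R_MN3 + R_MN1,2 + R_MN5,6 (ohms)
r_x = [300.0, 400.0, 500.0]
# Eq. 4.53: C_db,MP1,2, C_db,MP3,4, C_gs,MP1,2, C_gd,MP1,2,
#           C_db,MN5,6, C_gd,MN5,6, C_gs,MN5,6, C_SR
c_x = [v * fF for v in (0.8, 0.6, 1.0, 0.4, 0.5, 0.3, 0.9, 2.0)]
tau_sa = rc_delay(r_x, c_x)                # Eq. 4.51

# Eq. 4.55: R_MN1,6 + R_MN2,7 + R_MN4,8 (ohms)
r_y = [250.0, 250.0, 300.0]
# Eq. 4.56: C_db,MP3,7, C_db,MP1,6, C_gs,MP4,8, C_gd,MP4,8,
#           C_db,MN3,5, C_gd,MN3,5, C_gs,MN3,5, C_L
c_y = [v * fF for v in (0.7, 0.7, 0.5, 0.3, 0.4, 0.2, 0.4, 3.0)]
tau_sr = rc_delay(r_y, c_y)                # Eq. 4.54

print(f"tau_f,SA = {tau_sa * 1e12:.2f} ps, tau_f,SR = {tau_sr * 1e12:.2f} ps")
```

Keeping the terms as separate list entries makes the sizing trade-off visible: widening a path device lowers its entry in `r_x` or `r_y` but raises the corresponding drain and gate entries in `c_x` or `c_y`.
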

## Design Procedure

1. Design the input pair MN1 and MN2 and the tail transistor MN3 of the sense-amplifier to work with the input common-mode voltage.
2. Size the cross-coupled inverters (MN5, MP1, MN6, and MP2) and the precharge transistors (MP3 and MP4) of the sense amplifier to obtain the required delay using Equation 4.51.
3. Give the cross-coupled inverters of the SR latch (MN3, MP4, MN5, and MP8) minimum sizes to decrease the delay.
4. Size the PMOS and NMOS transistors of the charging and discharging paths in the SR latch for equal rise and fall times and minimum delay using Equation 4.54.
5. Tune the PMOS and NMOS transistors of the charging and discharging paths in the SR latch to balance the trade-off between a lower equivalent resistance and a lower output capacitance, including the load $C_{SR}$ they present to the sense amplifier.

## Simulation Results

This section compares the results of the two flip-flop topologies and shows that the SAFF performs better than the CML flip-flop, so it is preferred in this work. The flip-flop's input is the slicer output, i.e. digital data with a full swing from zero to $V_{DD}$ and a common-mode level of 0.6 V. With a setup time large enough to reach the minimum propagation delay, the CML-FF achieved $t_{CQ}=17$ ps and the SAFF achieved $t_{CQ}=17$ ps; the outputs are shown in Figure 4.107.

The flip-flop's propagation delay depends on the setup time of the input data: as the setup time $t_{DC}$ decreases, the propagation delay $t_{CQ}$ increases, until the flip-flop fails to capture the data below a certain $t_{DC}$. In the DFE feedback loop, both $t_{CQ}$ and $t_{DC}$ contribute to the loop delay, so there is an optimal point that achieves a small $t_{CQ}$ without increasing $t_{DC}$ excessively. Figure 4.108 shows the propagation delay, the setup time, and the total delay they add to the loop.

The process corners of the two flip-flops are shown in Figure 4.109. The output across temperature variations is shown in Figure 4.110: $t_{CQ}$ increases at very low temperatures because the on-resistance of the transistors increases heavily.
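
The setup-time trade-off can be illustrated with a simple model. The Python sketch below assumes a hypothetical curve $t_{CQ}=t_{CQ,min}+k/(t_{DC}-t_{fail})$, which is not fitted to Figure 4.108. Only the 17 ps floor is borrowed from the minimum $t_{CQ}$ reported above; $k$ and $t_{fail}$ are invented for illustration. The sketch sweeps $t_{DC}$ and picks the point that minimizes the loop contribution $t_{DC}+t_{CQ}$.

```python
import math


def t_cq_model(t_dc, t_cq_min=17e-12, k=20e-24, t_fail=2e-12):
    """Hypothetical t_CQ(t_DC) curve: t_CQ rises as t_DC approaches t_fail."""
    if t_dc <= t_fail:
        return math.inf                  # the flip-flop no longer captures data
    return t_cq_min + k / (t_dc - t_fail)


def best_setup(t_dc_grid):
    """Return (t_DC, t_DC + t_CQ) minimizing the delay added to the DFE loop."""
    best = None
    for t_dc in t_dc_grid:
        total = t_dc + t_cq_model(t_dc)
        if best is None or total < best[1]:
            best = (t_dc, total)
    return best


grid = [i * 0.01e-12 for i in range(1, 3001)]   # 0.01 ps to 30 ps
t_dc_opt, total = best_setup(grid)
print(f"optimal t_DC = {t_dc_opt * 1e12:.2f} ps, t_DC + t_CQ = {total * 1e12:.2f} ps")
```

For this particular model the optimum is analytically at $t_{DC}=t_{fail}+\sqrt{k}$; with measured data, the same sweep would simply be run over the simulated points of Figure 4.108.
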

(a) CML-FF.



(b) SAFF.

Figure 4.107.: The Minimum Propagation Delay of the FF.


Figure 4.108.: The Propagation Delay and The Setup Time of the FF.

(b) SAFF.

Figure 4.109.: The Process Corners of the FF.


Figure 4.110.: Temperature Variations for the FF.

The $t_{CQ}$ of both flip-flops increases linearly with the load capacitance, as shown in Figure 4.111.

The main specifications of both topologies are compared with other works in Table 4.20.

| Parameter | CML-FF <br> In This <br> Work | SAFF In <br> This Work | $[43]$ | $[54]$ | $[55]$ | $[56]$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Rise Delay <br> (ps) | 17 | 18 | 133.83 | 96.17 | 157.82 | 18.2 |
| Fall Delay <br> (ps) | 19 | 20 | 166.71 | 95.09 | 254.48 | 19 |
| Average <br> Power <br> ($\mu \mathrm{W}$) | 1900 | 396 | 265.1 | 408.9 | 236.9 | 185 |
| Power <br> Delay <br> Product <br> (fJ) | 32 | 4.5 | 39.84 | 39.1 | 48.84 | 3.38 |

Table 4.20.: Comparison Between The Different Topologies of The FF.
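
The power-delay product listed for [43], [54], and [55] in Table 4.20 is consistent with the average power multiplied by the mean of the rise and fall delays. This is an observation about the table, not a definition stated in the text. The short Python check below reproduces those three entries from the other rows (1 µW × 1 ps = $10^{-3}$ fJ).

```python
def pdp_fj(p_avg_uw, rise_ps, fall_ps):
    """Power-delay product in fJ using the mean of the rise and fall delays."""
    return p_avg_uw * (rise_ps + fall_ps) / 2 * 1e-3


table = {"[43]": (265.1, 133.83, 166.71),
         "[54]": (408.9, 96.17, 95.09),
         "[55]": (236.9, 157.82, 254.48)}
for ref, (p, t_rise, t_fall) in table.items():
    print(f"{ref}: PDP = {pdp_fj(p, t_rise, t_fall):.2f} fJ")
```
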


Figure 4.111.: $t_{C Q}$ with Loading Capacitance for the FF.

### 4.6.3. Gm Cell

The Gm cell used in the DFE is a simple design of the kind commonly used for this purpose: a differential pair with degeneration resistance, as shown in Figure 4.112. Its most important specifications are a minimum bandwidth of 5 GHz and a transconductance chosen to give a suitable voltage range at the summing node (about 1.15 mS in this work). Figure 4.113 shows the equivalent transconductance of the cell (in mS) versus frequency (in Hz).
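
Degeneration lowers the effective transconductance of the pair in exchange for linearity. The Python sketch below assumes a single degeneration resistor $R_{deg}$ connected between the two sources, so each half sees $R_{deg}/2$ and $G_{m}=g_{m}/(1+g_{m}R_{deg}/2)$. The topology detail and the intrinsic $g_{m}$ of 5 mS are assumptions, not values from Figure 4.112. It solves for the resistor that brings the cell to the 1.15 mS quoted above.

```python
# Sketch assuming one degeneration resistor between the two sources of the
# differential pair (each half sees R_deg / 2). gm = 5 mS is hypothetical.

def degenerated_gm(gm, r_deg):
    """Effective differential transconductance with source degeneration."""
    return gm / (1 + gm * r_deg / 2)


def r_deg_for(gm, gm_target):
    """Degeneration resistor that lowers the intrinsic gm to gm_target."""
    if not 0 < gm_target < gm:
        raise ValueError("gm_target must be positive and below gm")
    return 2 * (gm / gm_target - 1) / gm


gm = 5e-3                                  # hypothetical intrinsic gm (S)
r = r_deg_for(gm, 1.15e-3)                 # target Gm from the text (~1.15 mS)
print(f"R_deg = {r:.0f} ohm -> Gm = {degenerated_gm(gm, r) * 1e3:.3f} mS")
```
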


Figure 4.112.: Schematic of The Proposed Gm-Cell.


Figure 4.113.: Equivalent Transconductance (gm) of The GM-Cell Versus Frequency.

### 4.6.4. Taps

Summing stages are implemented using the sampled current-integrating summers presented earlier, and a soft-decision architecture with direct feedback of all DFE taps is employed. The DFE tap coefficients are set by 5-bit current DACs controlled through a serial interface. The previously sampled bits fed back from the DFE slicer control the direction of each tap's current, i.e. whether it is steered into the positive or the negative branch. Each tap's current value is set either by fine tuning until the eye diagram at the summing node is best, or by obtaining the impulse response at the summing node and choosing the tap current that cancels the corresponding ISI. In this work, three DFE current taps are designed, each set as described above. Figure 4.114 shows the design of a single tap.
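
The impulse-response method of setting the taps can be illustrated with an idealized, symbol-rate Python model. Each tap weight equals one post-cursor of the pulse response, and its sign follows the past decision, as the tap current is steered into one branch or the other. The model is a sketch under stated simplifications. It uses hard decisions, ignores the half-rate interleaving, the 5-bit DAC quantization, and the integrating summer, and uses a hypothetical pulse response.

```python
import random


def channel(symbols, h):
    """Sampled channel output y[n] = sum_k h[k] * a[n-k] for symbols a = +/-1."""
    return [sum(hk * symbols[n - k] for k, hk in enumerate(h) if n - k >= 0)
            for n in range(len(symbols))]


def dfe(samples, taps):
    """Subtract post-cursor ISI predicted from past decisions, then slice at 0."""
    decisions, equalized = [], []
    for n, y in enumerate(samples):
        # Tap k is applied with the sign of the decision made k UIs earlier,
        # like a tap current steered into the positive or negative branch.
        feedback = sum(w * decisions[n - k]
                       for k, w in enumerate(taps, start=1) if n - k >= 0)
        z = y - feedback
        equalized.append(z)
        decisions.append(1 if z >= 0 else -1)
    return decisions, equalized


random.seed(1)
h = [1.0, 0.6, 0.35, 0.2]            # hypothetical main cursor + 3 post-cursors
symbols = [random.choice((-1, 1)) for _ in range(200)]
rx = channel(symbols, h)
decisions, equalized = dfe(rx, taps=h[1:])   # 3 taps set equal to the post-cursors
raw_errors = sum((y >= 0) != (a > 0) for y, a in zip(rx, symbols))
dfe_errors = sum(dd != a for dd, a in zip(decisions, symbols))
print(f"errors without DFE: {raw_errors}, with 3-tap DFE: {dfe_errors}")
```

When the taps match the post-cursors and past decisions are correct, every equalized sample reduces to $h_{0}a_{n}$, which is the open eye that tap tuning aims for at the summing node.
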


Figure 4.114.: Schematic of a Single DFE Tap.

### 4.6.5. Integration Results of The DFE Blocks

As described before, the main challenge of the half-rate DFE is the timing constraint on its feedback loops. In this work, the critical path is the loop of the second tap, whose delay must satisfy Equation 4.57.

$$
\begin{equation*}
t_{C Q, \text {Slicer}}+t_{\text {Setup}, F F}+t_{C Q, F F}+t_{\text {settle,sum node}}<U I \tag{4.57}
\end{equation*}
$$
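
Equation 4.57 is a timing budget, so it can be checked by subtracting the loop delays from the UI. In the Python sketch below, the 100 ps UI (a 10 Gb/s data rate) is an assumption, not a value given here, and the slicer, setup, and settling delays are hypothetical. Only $t_{CQ,FF}=17$ ps comes from the flip-flop results above.

```python
# Sketch of the Eq. 4.57 timing check for the critical (second-tap) loop.

def loop_slack(ui, t_cq_slicer, t_setup_ff, t_cq_ff, t_settle_sum):
    """Return UI minus the loop delay of Eq. 4.57 (positive = condition met)."""
    return ui - (t_cq_slicer + t_setup_ff + t_cq_ff + t_settle_sum)


ps = 1e-12
slack = loop_slack(ui=100 * ps,          # assumed UI (10 Gb/s)
                   t_cq_slicer=30 * ps,  # hypothetical
                   t_setup_ff=8 * ps,    # hypothetical
                   t_cq_ff=17 * ps,      # minimum FF delay; buffer delay would add here
                   t_settle_sum=25 * ps) # hypothetical
print(f"slack = {slack / ps:.1f} ps, Eq. 4.57 met: {slack > 0}")
```
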

The DFE is tested with ideal input pulses to check the validity of the feedback loops. The flip-flops have their own clock, delayed from the slicer's clock to provide the required setup time. The flip-flop's output is buffered with cascaded inverters, and the buffer delay is added to $t_{CQ,FF}$. The slicer's output is also buffered, because the differential pairs of the taps are very sensitive to the outputs of the flip-flop and the slicer, which must therefore be clean digital signals without notches. The average current consumption of the whole DFE, excluding the tap currents, is $I_{avg}=3\ \mathrm{mA}$. The taps are calibrated, and each can drive a current $I_{tap}$ from $20\ \mu A$ up to $200\ \mu A$. The timing diagrams of the critical path for the even and odd data are shown in Figure 4.115 and Figure 4.116. The timing condition is met, and the other taps take effect before the sampling edge.


Figure 4.115.: Timing Diagram for Even Data.


Figure 4.116.: Timing Diagram for Odd Data.

### 4.7. Clock and Data Recovery

The function of the CDR is to generate the clock used in the receiver. The CDR extracts the clock from the data before the slicer in the DFE. The block diagram of the CDR is shown in Figure 4.117. The CDR consists of three main blocks: the Phase and Frequency Detector (PFD), the Charge Pump (CP), and the Voltage-Controlled Oscillator (VCO).


Figure 4.117.: Block Diagram of The CDR.

### 4.7.1. Voltage Control Oscillators

This section discusses the design of the VCO. The VCO used in this design is a current-starved ring oscillator [57], which connects $N$ inverters in a ring. For the circuit to oscillate, $N$ must be odd: the ring then has no stable DC operating point, and the loop can satisfy the Barkhausen criterion for oscillation. The design consists of three main parts: the delay units (inverters), the control circuit, and the buffer circuit.

### 4.7.1.1. Delay Unit

The core of the ring VCO is the delay unit, i.e. the inverter, whose schematic is shown in Figure 4.118. The oscillation frequency is set by the delay of the unit, as shown in Equation 4.58, and that delay depends on the input and output capacitance of each unit. To control the oscillation frequency, an additional NMOS (MN2) and PMOS (MP2) transistor limit the current that flows through the unit. When the bias voltages $V_{\text{Bias},N}$ and $V_{\text{Bias},P}$ are adjusted to increase this current, the delay decreases and the oscillation frequency increases.

$$
\begin{equation*}
f_{o s c}=\frac{1}{2 N t_{d}} \tag{4.58}
\end{equation*}
$$
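
Equation 4.58 links the ring length and the stage delay to the oscillation frequency. The Python sketch below evaluates it in both directions and rejects even or too-short rings. The 5 GHz target and the ring lengths are example values only; the number of stages used in this design is not assumed.

```python
def _check_ring(n_stages):
    # A single-ended inverter ring needs an odd number of stages (N >= 3)
    # so that it has no stable DC operating point.
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("N must be odd and at least 3")


def ring_frequency(n_stages, t_d):
    """Eq. 4.58: f_osc = 1 / (2 N t_d)."""
    _check_ring(n_stages)
    return 1.0 / (2 * n_stages * t_d)


def stage_delay_for(n_stages, f_target):
    """Per-stage delay needed to reach f_target (Eq. 4.58 inverted)."""
    _check_ring(n_stages)
    return 1.0 / (2 * n_stages * f_target)


for n in (3, 5, 7):
    print(f"N = {n}: t_d = {stage_delay_for(n, 5e9) * 1e12:.1f} ps for 5 GHz")
```

Longer rings need proportionally faster stages for the same frequency, which is why the stage delay, and hence the node capacitance discussed next, matters.
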



Figure 4.118.: Schematic of The Delay Unit.

As discussed before, the oscillation frequency depends on the node capacitance at the input and output of the delay unit; the total capacitance is the sum of the input and output capacitances, and the input capacitance is approximated by Equation 4.59. This equation indicates that a high frequency calls for small widths and lengths of the CMOS transistors. With very small widths, however, the node capacitance changes greatly once the layout parasitics are included. The transistor widths are therefore made larger, and more current is supplied to compensate.

$$
\begin{equation*}
C_{\text {input }} \approx C_{o x}\left(W_{p} L_{p}+W_{n} L_{n}\right) \tag{4.59}
\end{equation*}
$$
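To make the width/current trade-off concrete, the following sketch combines Equation 4.59 with a first-order delay model, $t_d \approx C V_{DD}/I$, which is an assumption introduced here rather than a model taken from this design. Substituting it into Equation 4.58 gives $I \approx 2 N C V_{DD} f_{osc}$. The oxide capacitance, transistor sizes, and stage count are placeholder values, not the design values.

```python
"""First-order sizing estimate for a current-starved ring VCO (illustrative)."""


def gate_capacitance(cox: float, wp: float, lp: float, wn: float, ln: float) -> float:
    """Eq. 4.59: C_input ~= Cox * (Wp*Lp + Wn*Ln), SI units (F/m^2, m)."""
    return cox * (wp * lp + wn * ln)


def required_current(n_stages: int, c_node: float, vdd: float, f_osc: float) -> float:
    """Assumed delay model t_d ~= C*VDD/I combined with Eq. 4.58.

    f_osc = 1/(2*N*t_d) = I/(2*N*C*VDD)  =>  I = 2*N*C*VDD*f_osc
    """
    return 2 * n_stages * c_node * vdd * f_osc


if __name__ == "__main__":
    COX = 8.5e-3      # F/m^2, placeholder oxide capacitance (not from the thesis)
    L = 0.18e-6       # m, placeholder channel length
    for scale in (1, 2):  # nominal widths, then doubled widths
        c = gate_capacitance(COX, 4e-6 * scale, L, 2e-6 * scale, L)
        i = required_current(3, c, 1.8, 5e9)
        print(f"widths x{scale}: C = {c * 1e15:.2f} fF, I = {i * 1e3:.2f} mA")
```

Because the estimated current is proportional to $C$, doubling the widths doubles the required current, which is the cost of choosing wide transistors to reduce the sensitivity to layout parasitics.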

### 4.7.1.2. Controlling Circuit

The controlling circuit controls the current flowing through the delay units by setting $V_{\text {Bias,N,P }}$. Its schematic is shown in Figure 4.119. It provides two types of tuning: $V_{c t r l}$ is used for fine tuning, and $V_{b 1}$ to $V_{b 8}$ are used for coarse tuning. For coarse tuning, a thermometer code is used instead of a binary code. Although the thermometer code needs more control lines than a binary code, it is preferred because it is monotonic: each step up switches on exactly one additional branch, so the frequency always increases with the code, whereas a binary code can switch several weighted branches at once (for example, from 011 to 100), where mismatch can make the step non-monotonic.
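The monotonicity argument can be illustrated with a short sketch. The mapping below, in which coarse level $k$ turns on $V_{b 1}$ to $V_{b k}$, is an assumed ordering of the eight coarse-tuning lines; the point is that consecutive thermometer levels differ in exactly one line, whereas a binary code can toggle every bit at once.

```python
from typing import List


def thermometer_code(level: int, n_lines: int = 8) -> List[int]:
    """States of V_b1..V_b{n_lines} for a coarse level 0..n_lines.

    The first `level` lines are on and the rest are off, so moving up one
    level switches on exactly one more branch.
    """
    if not 0 <= level <= n_lines:
        raise ValueError("level out of range")
    return [1] * level + [0] * (n_lines - level)


def binary_code(level: int, n_bits: int = 3) -> List[int]:
    """Binary-weighted code, LSB first, for comparison."""
    return [(level >> k) & 1 for k in range(n_bits)]


if __name__ == "__main__":
    for lvl in range(9):
        print(lvl, thermometer_code(lvl))
    # Going from 3 to 4 in binary toggles all three bits (011 -> 100).
    print(binary_code(3), binary_code(4))
```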


Figure 4.119.: The Schematic of The Controlling Circuit of The VCO.
$V_{c t r l}$ is connected to MP1. When $V_{\text {ctrl }}$ increases, the magnitude of $V_{G S}$ of MP1 decreases, so the current decreases and the frequency decreases. The size of MP1 sets the VCO gain through its transconductance: a wider transistor has a larger transconductance and hence a larger VCO gain, while a narrower transistor has a smaller transconductance and a smaller gain. MP1 is therefore sized so that the fine-tuning ranges of adjacent coarse settings overlap, which ensures that the VCO can cover all frequencies in its range.

### 4.7.1.3. Simulation Results of The VCO

In this part, the simulation results of the VCO are presented. Two analyses are used: a transient analysis and a Periodic Steady State (PSS) analysis. The transient analysis shows the transient response of the VCO and the shape of the generated clock, while the PSS analysis shows the VCO gain and the effect of $V_{c t r l}$ on the oscillation frequency.
The first analysis is the transient analysis; the transient response of the VCO is shown in Figure 4.120. The figure shows the I and Q signals, each loaded by a 100 pF capacitance.


Figure 4.120.: The Transient Response of The VCO.
Another result of the transient analysis is the supply current of the VCO. Figure 4.121 shows the supply current of the whole VCO, including the buffers. The buffers increase the maximum load capacitance that the VCO can drive. The current consumption is relatively high, mainly because of the buffers, and because the wide delay-unit transistors need a large current to reach the required frequency.


Figure 4.121.: Supply Current of The VCO Versus Time.

The other analysis is the PSS analysis, which gives the frequency versus $V_{c t r l}$ and the phase noise of the VCO. The first result, shown in Figure 4.122, shows the effect of $V_{c t r l}$ on the frequency (fine tuning) for each coarse-tuning setting. The figure shows the fine and coarse tuning and the overlap between adjacent bands, which ensures that all frequencies in the range can be covered.
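The overlap requirement can be expressed as a simple gap check over the coarse bands. In the sketch below, each band is the (minimum, maximum) frequency swept by $V_{c t r l}$ for one coarse code; the function name is illustrative, and the band values used to exercise it are hypothetical, not read from Figure 4.122.

```python
from typing import List, Tuple

Band = Tuple[float, float]  # (f_min, f_max) swept by V_ctrl for one coarse code


def coverage_gaps(bands: List[Band]) -> List[Band]:
    """Return the frequency gaps left between coarse-tuning bands.

    An empty result means every frequency between the lowest f_min and the
    highest f_max is reachable by some coarse code plus fine tuning.
    """
    if not bands:
        return []
    ordered = sorted(bands)
    gaps = []
    reach = ordered[0][1]          # highest frequency covered so far
    for f_lo, f_hi in ordered[1:]:
        if f_lo > reach:           # this band starts above everything covered so far
            gaps.append((reach, f_lo))
        reach = max(reach, f_hi)
    return gaps


if __name__ == "__main__":
    # Hypothetical bands in GHz
    print(coverage_gaps([(3.8, 4.6), (4.4, 5.2), (5.0, 5.8)]))
    print(coverage_gaps([(3.8, 4.4), (4.6, 5.2)]))
```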


Figure 4.122.: Frequency of The VCO Versus $V_{c t r l}$.

Another result of this analysis is the phase noise versus frequency offset, shown in Figure 4.123. The phase noise at a 1 MHz offset is -89 dBc/Hz, which is acceptable for a ring-oscillator topology.


Figure 4.123.: Phase Noise Versus Frequency Offset.

### 4.7.2. Bang-Bang Phase Detector

Phase detectors (PDs) are widely used in clock and data recovery circuits to detect the phase difference between their inputs. The output pulse width of a linear PD is proportional to the phase error; it produces less jitter, but its speed is limited in multi-$\mathrm{Gb} / \mathrm{s}$ applications. The Bang-Bang Phase Detector (BBPD), in contrast, is a nonlinear PD that only indicates whether the clock is early or late. It operates at a higher speed, at the cost of worse jitter performance due to its bang-bang nature. The proposed BBPD is shown in Figure 4.124.


Figure 4.124.: Bang-bang Phase Detector Circuit.
It is composed of four Dual Edge Triggered Flip-Flops (DETFFs) and two XOR gates. Its operation is as follows. Each DETFF is triggered on both the positive and negative edges of its clock, so signals A and B store the data sampled by CLKI and CLKQ, respectively. The phase difference between A and B indicates whether the clock is early or late: A either leads B by $T_{C L K I} / 4$ or lags B by $T_{C L K I} / 4$.

The BBPD compares the present sampled data (A and B) with the previously sampled data (C and D) using the XOR gates to decide whether the clock is early or late.
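The early/late decision made by the XOR gates can be sketched behaviorally. The version below is the generic Alexander-type decision computed from three consecutive samples (previous data, edge sample, current data); mapping these samples onto the signals A, B, C, and D is an assumption, since the exact XOR wiring is only shown in Figure 4.124.

```python
from typing import Tuple


def bbpd_decision(prev_data: int, edge_sample: int, curr_data: int) -> Tuple[int, int]:
    """Return (early, late) flags from three 0/1 samples around one bit boundary.

    prev_data   -- data sampled at the previous data-sampling instant
    edge_sample -- sample taken near the expected data transition
    curr_data   -- data sampled at the current data-sampling instant
    Without a data transition both flags are 0 (no phase information).
    """
    early = edge_sample ^ curr_data  # edge sample still holds the old bit: clock early
    late = prev_data ^ edge_sample   # edge sample already holds the new bit: clock late
    return early, late


if __name__ == "__main__":
    print(bbpd_decision(0, 0, 1))  # transition sampled before it happened
    print(bbpd_decision(0, 1, 1))  # transition sampled after it happened
```

Averaging these flags over many bits gives the kind of early/late imbalance reported in the simulation results below, and at lock the two flags alternate.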

The design of the DETFF is shown in Figure 4.125. It is composed of six transmission gates (TGs) and four inverters. In the first stage, the data is latched by the TGs in the upper path when the clock is high and in the lower path when the clock is low. In the second stage, which consists of two inverters and one TG, the data is held until the clock switches from high to low in the upper path or from low to high in the lower path. In the final stage, the data is passed to the output. The upper path therefore operates as a positive-edge-triggered flip-flop and the lower path as a negative-edge-triggered flip-flop.
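Behaviorally, the DETFF updates its output on both clock edges. The sketch below models this on sampled 0/1 waveforms; the sample-based representation and the function name are illustrative assumptions, and the propagation delay $t_{p d}$ is ignored.

```python
from typing import List


def detff(clock: List[int], data: List[int], q0: int = 0) -> List[int]:
    """Dual-edge-triggered flip-flop on sampled waveforms.

    Q takes the value of D at every rising AND falling clock edge and holds
    it otherwise, i.e. the positive-edge and negative-edge paths merged.
    """
    if not clock:
        return []
    q = [q0]
    for k in range(1, len(clock)):
        if clock[k] != clock[k - 1]:   # any clock edge, rising or falling
            q.append(data[k])
        else:                          # no edge: hold the previous value
            q.append(q[-1])
    return q


if __name__ == "__main__":
    clk = [0, 1, 1, 0, 0, 1]
    d = [1, 0, 1, 1, 0, 0]
    print(detff(clk, d))
```

Because the output is updated twice per clock period, a 5 GHz clock is enough to sample 10 Gb/s data.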


Figure 4.125.: Dual Edge Triggered Flip Flop circuit.

The design of the XOR gate is shown in Figure 4.126. It is a symmetric XOR gate: the pull-down network (PDN) implements the XNOR function and the pull-up network (PUN) implements the XOR function, which makes the delays $t_{P L H}$ and $t_{P H L}$ approximately equal, as required for proper operation of the Early and Late signals.


Figure 4.126.: XOR gate circuit.

### 4.7.3. Charge Pump

The charge pump sources current into, or sinks current from, its load according to the UP and DOWN signals. A current-steering charge pump is used, as shown in Figure 4.127.


Figure 4.127.: Charge Pump circuit.
When UP is high and DW is low, the current is steered to the PMOS current mirror (M7, M8) and sourced into the loop filter, so $V_{c t r l}$ increases. When UP is low and DW is high, the current is drawn from the loop filter and $V_{\text {ctrl }}$ decreases, which provides the negative feedback needed for CDR operation. The current mirror uses long-channel transistors to reduce the mismatch between currents. The reference current $\left(I_{C P}\right)$ of $5 \mu \mathrm{~A}$ is mirrored up to $100 \mu \mathrm{~A}$, and the maximum mismatch between the mirrored currents is about $1 \mu \mathrm{~A}$.
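The charge-pump numbers above can be related with simple arithmetic. The sketch below computes the mirror ratio, treats the $1 \mu \mathrm{~A}$ error as a mismatch between the UP and DOWN currents (an interpretation), and estimates the $V_{c t r l}$ step for one current pulse into a purely capacitive loop filter. The 1 pF capacitance and 50 ps pulse width are hypothetical values, since the loop-filter values are not given here, and the function names are illustrative.

```python
def mirror_ratio(i_ref: float, i_out: float) -> float:
    """Current-mirror multiplication factor I_out / I_ref."""
    return i_out / i_ref


def relative_mismatch(i_up: float, i_dn: float) -> float:
    """Relative mismatch between the UP and DOWN currents."""
    return abs(i_up - i_dn) / max(i_up, i_dn)


def vctrl_step(i_cp: float, t_on: float, c_filter: float) -> float:
    """Change of V_ctrl when i_cp flows for t_on into a capacitor: dV = I*t/C."""
    return i_cp * t_on / c_filter


if __name__ == "__main__":
    print(mirror_ratio(5e-6, 100e-6))            # about 20
    print(relative_mismatch(100e-6, 99e-6))      # about 0.01 (1 %)
    c_filter = 1e-12                             # hypothetical loop-filter capacitance
    print(vctrl_step(100e-6, 50e-12, c_filter))  # about 5 mV
```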

### 4.7.4. Simulation Results

### 4.7.4.1. BBPD

The transient response of the DETFF is shown in Figure 4.128. The output is delayed from the input by $T_{C L K} / 4$ plus the propagation delay of the circuit $\left(t_{p d}\right)$, so the total delay is 70 ps: 50 ps for $T_{C L K} / 4$ and 20 ps for $t_{p d}$.
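The delay budget above is self-consistent: a 50 ps quarter period implies $T_{C L K}=200$ ps, that is, the 5 GHz clock used by the CDR. A minimal check, with an illustrative function name:

```python
def detff_delay(f_clk: float, t_pd: float) -> float:
    """Clock-to-output delay as described in the text: T_CLK/4 + t_pd (seconds)."""
    t_clk = 1.0 / f_clk
    return t_clk / 4 + t_pd


if __name__ == "__main__":
    d = detff_delay(5e9, 20e-12)
    print(f"total delay = {d * 1e12:.0f} ps")  # total delay = 70 ps
```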


Figure 4.128.: DETFF transient response.
As shown in Figure 4.129, when the clock is early with respect to the data by 30 ps, the average of the early signal exceeds that of the late signal by 820 mV. In Figure 4.130, the clock is late, and the average of the late signal exceeds that of the early signal by 920 mV. At lock, the average difference is very small, about 16 mV, as shown in Figure 4.131, where the early and late signals alternate.


Figure 4.129.: BBPD Outputs While Early Clock.


Figure 4.130.: BBPD Outputs While Late Clock.


Figure 4.131.: BBPD Outputs While Locking.

Figure 4.132 shows the effect of the clock-to-data delay on the BBPD outputs. For delays below 43 ps, the early output is high and the late output is low; between 43 ps and 59 ps, the clock is locked to the data; above 59 ps, the late output is high and the early output is low.


Figure 4.132.: BBPD Outputs Versus Clock Delay with Data.

### 4.7.4.2. CDR Simulation

The CDR is simulated with two input patterns, a 5 GHz clock pattern and a 10 Gbps random bit stream, to evaluate the control-voltage transient response, the settling time, and the clock jitter.

Figure 4.133 shows the transient response of the control voltage for the 5 GHz clock input. It starts from 0.7 V and decreases to about 588.8 mV, which corresponds to a VCO frequency of 5 GHz, with a ripple of about 35 mV; locking takes about 40 ns. With the 10 Gbps random bit stream, locking takes longer, about 120 ns, as shown in Figure 4.134. This is expected, since a random stream contains fewer data transitions than the clock pattern, and the BBPD only produces a phase decision when a transition occurs.
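The locking behavior, a steady approach followed by a small bounded ripple, is characteristic of a bang-bang loop, which only corrects by a fixed step in the direction of the phase error. The following is a deliberately simplified first-order model, not the loop designed here: the phase step per decision, the initial offset, and the absence of loop-filter dynamics and transition density are all assumptions, and the function name is illustrative.

```python
from typing import List


def simulate_bang_bang(initial_error: float, step: float, n_updates: int) -> List[float]:
    """First-order bang-bang phase tracking (phase in ps).

    A positive error means the clock edge is later than the ideal sampling
    point. At every update the detector reports only the sign of the error
    and the loop shifts the clock phase by a fixed step toward zero.
    Returns the phase error after each update.
    """
    error = initial_error
    history = []
    for _ in range(n_updates):
        if error > 0:       # "late" decision: move the clock edge earlier
            error -= step
        elif error < 0:     # "early" decision: move the clock edge later
            error += step
        history.append(error)
    return history


if __name__ == "__main__":
    hist = simulate_bang_bang(initial_error=40.0, step=1.5, n_updates=60)
    first_in_band = next(k for k, e in enumerate(hist) if abs(e) <= 1.5)
    print("within one step of zero after", first_in_band + 1, "updates")
    print("steady-state dither:", hist[-4:])
```

Once locked, the error dithers within one step of zero, which mirrors the alternating early/late outputs at lock and the residual ripple on $V_{c t r l}$. With fewer transitions per unit time, fewer updates occur, so the same phase offset takes longer to remove.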


Figure 4.133.: $V_{\text {ctrl }}$ Transient Response ( 5 GHz Clock).


Figure 4.134.: $V_{\text {ctrl }}$ Transient Response (10 Gbps).

Figures 4.135 and 4.136 show the transient response of the data and the clock at lock for the 5 GHz clock input and the 10 Gbps random bit stream, respectively.


Figure 4.135.: Data and Clock Transient Response (5 GHz Clock).
The eye diagrams in Figures 4.137 and 4.138 show the clock jitter at 50 ps and 150 ps, which correspond to the positive and negative clock edges, respectively, for the two input patterns described above. The maximum clock jitter is about +10 ps and -5 ps relative to the desired clock edge.


Figure 4.136.: Data and Clock Transient Response (10 Gbps).


Figure 4.137.: Data and Clock Eye Diagram (5 GHz Clock).


Figure 4.138.: Data and Clock Eye Diagram (10 Gbps).

## 5. System Integration Results

In this work, the main goal is to build a system compatible with the USB 3.2 Gen 2 standard. The equalization chain consists of a 3-tap FIR filter, a CTLE followed by a VGA, and a 1-tap DFE. The transmitter is modeled in Verilog-A with a 1 pF load capacitance. The channel is a 30-inch FR4 PCB trace. The first step tests the analog front end (AFE), which consists of the BGC, the LDO, the CTLE, and the VGA. The BGC and the LDO provide a constant and stable $V_{D D}$ across PVT, independent of the load current. The AFE is simulated at the critical temperature and supply corners ($V_{D D}=1.8 \mathrm{~V} \pm 10 \%$ and $T=-20$ to $100{ }^{\circ} \mathrm{C}$).

When the CTLE and the VGA start up, they produce a step in the load current. The maximum overshoot and error of this current meet the requirements across corners, as shown in Figure 5.1a. The overshoot of the LDO output voltage varies across corners but stays below $10 \%$, with an error smaller than $1 \%$, which meets the requirements, as shown in Figure 5.1b. The settling time is about 100 ns.
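The overshoot, error, and settling-time checks applied to Figure 5.1 can be automated on exported waveform samples. The sketch below is a generic post-processing helper written for illustration; the 1 % settling band, the function name, and the synthetic waveform are assumptions, not the measurement setup used in this work.

```python
import math
from typing import List, Tuple


def transient_metrics(t: List[float], v: List[float], v_nominal: float,
                      band: float = 0.01) -> Tuple[float, float, float]:
    """Return (max deviation %, final error %, settling time) of a regulated output.

    Deviations are measured relative to v_nominal. The settling time is the
    time of the last sample lying outside +/- band * v_nominal (t[0] if none),
    on the same time axis as t.
    """
    dev = [abs(x - v_nominal) / v_nominal for x in v]
    settle = t[0]
    for tk, d in zip(t, dev):
        if d > band:
            settle = tk
    return 100.0 * max(dev), 100.0 * dev[-1], settle


if __name__ == "__main__":
    # Synthetic load-step response (hypothetical): 5 % dip recovering with a 20 ns time constant
    t = [k * 1e-9 for k in range(201)]
    v = [1.8 * (1 - 0.05 * math.exp(-tk / 20e-9)) for tk in t]
    print(transient_metrics(t, v, v_nominal=1.8))
```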


Figure 5.1.: LDO Output Current and Output Voltage across Corners.

Figure 5.2 shows the variations in the reference current produced by the bandgap circuit. These variations directly affect the mirrored currents and, consequently, the responses of the CTLE and the VGA. Therefore, the acceptable variation of the frequency response sets the maximum allowed error in the reference current.


Figure 5.2.: BGR Current across Corners.

The frequency responses of the CTLE and the VGA are shown in Figure 5.3. Each block shows significant variation across corners, but this is acceptable because it can be compensated by the adaptation of the equalization. Hence, the error in the BGC reference current can be tolerated.


Figure 5.3.: The Frequency Response of The CTLE and The VGA across Corners.

The second step tests the equalization quality of the whole system with a random input bit stream. The main indicator of equalization quality is the eye diagram and the size of its opening. The eye diagram after the channel, with FIR equalization only, is shown in Figure 5.4.
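The eye opening used here as the quality indicator can be measured from a simulated waveform. The sketch below computes the vertical eye opening at one sampling phase; it assumes uniformly sampled NRZ data with an integer number of samples per unit interval (UI), aligned so that UI $k$ carries transmitted bit $k$ (channel delay already removed). The function name and the sample values in the tests are hypothetical.

```python
from typing import List


def eye_height(samples: List[float], bits: List[int], samples_per_ui: int,
               phase: int) -> float:
    """Vertical eye opening of an NRZ waveform at one sampling phase.

    samples[k * samples_per_ui + phase] is taken as the sample of bit k.
    The opening is the lowest sample of a transmitted 1 minus the highest
    sample of a transmitted 0; a value <= 0 means the eye is closed.
    """
    column = samples[phase::samples_per_ui][:len(bits)]
    ones = [s for s, b in zip(column, bits) if b == 1]
    zeros = [s for s, b in zip(column, bits) if b == 0]
    if not ones or not zeros:
        raise ValueError("both bit values must appear in the pattern")
    return min(ones) - max(zeros)


if __name__ == "__main__":
    wave = [0.0, 0.3, 0.3, 0.1, 0.0, -0.3, -0.25, -0.1, 0.0, 0.28, 0.3, 0.1]
    for ph in range(4):
        print(ph, eye_height(wave, [1, 0, 1], samples_per_ui=4, phase=ph))
```

Sweeping the phase across the UI and keeping the largest opening gives the optimum sampling point.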


Figure 5.4.: Eye Diagram after Channel with FIR Equalization.
The eye diagrams after the CTLE and after the VGA are shown in Figure 5.5. The VGA only adds gain, which increases the signal swing and hence the height of the eye opening.

Figures 5.6 and 5.7 show the eye diagrams at the even and odd summing nodes of the DFE. The eye opening at each summing node repeats every 200 ps, because the even and odd nodes are sampled by a 5 GHz clock. The closed eye in between does not matter, since no sampling takes place at that time.


Figure 5.5.: Eye Diagram after the CTLE and after the VGA.


Figure 5.6.: Eye Diagram for the Data on the Even Summing Node.


Figure 5.7.: Eye Diagram for the Data on the Odd Summing Node.

The timing diagram of the data across the system with an ideal clock is shown in Figure 5.8. The data is sampled correctly, producing two output streams (even and odd) at 5 Gbit/s each, which are then combined into the final serial output at $10 \mathrm{Gbit} / \mathrm{s}$.

(a) Data at Input, TX and after the Channel.

(b) Data at CTLE, VGA, and Summing Nodes.

(c) Data at Even and Odd Slicers and the Buffered output.

Figure 5.8.: Timing Diagram for The Data.
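The even/odd split and the final recombination described for Figure 5.8 can be sketched in the bit domain. In this sketch the even stream takes bits 0, 2, 4, and so on, and the odd stream the remaining bits; which clock edge corresponds to "even" is an assumption, and the function names are illustrative.

```python
from typing import List, Tuple


def demux_half_rate(bits: List[int]) -> Tuple[List[int], List[int]]:
    """Split a full-rate stream into even- and odd-indexed half-rate streams."""
    return bits[0::2], bits[1::2]


def mux_half_rate(even: List[int], odd: List[int]) -> List[int]:
    """Re-interleave even/odd half-rate streams into one full-rate stream."""
    out: List[int] = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    if len(even) > len(odd):   # odd-length input: one trailing even bit
        out.append(even[-1])
    return out


if __name__ == "__main__":
    serial = [1, 0, 1, 1, 0, 0, 1]
    even, odd = demux_half_rate(serial)
    print(even, odd)
    print(mux_half_rate(even, odd) == serial)
```

Each half-rate stream carries every other bit, so two 5 Gbit/s streams together reproduce the 10 Gbit/s input.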

## Acknowledgments

All praises and thanks are due to Allah (Alhamdulillah) for His infinite blessings on us and for helping us finish this work.

We wish to express our sincere appreciation to our supervisor,

Dr. Hassan Mostafa,

for his assistance, motivation, and guidance throughout our work.
We also wish to express our sincere appreciation to the company that sponsored this work, IC-Pedia, especially

Eng. Yehia Hamdy, Eng. Salma Elsawy and Eng. Mohamed Fouad,

for their help. Without their support and guidance, this project could not have reached its goal. We would also like to thank the ONE Lab for their help and support throughout this work.

We wish to express our sincere appreciation to Dr. Sameh Assem Ibrahime, for his help and guidance.

We wish to acknowledge the support and great love of our families. They kept us going, and this work would not have been possible without their input.

## A. Verilog-A Codes

## A.1. MUX Code

```
// VerilogA for mux, mux_8_1, veriloga
`include "constants.vams"
`include "disciplines.vams"
// Defining the module inputs and outputs
module mux_8_1(sig1, sig2, sig3, sig4, sig5, sig6, sig7, sig8, sel1, sel2, sel3, gnd,
               sigout);
input sig1, sig2, sig3, sig4, sig5, sig6, sig7, sig8, sel1, sel2, sel3, gnd;
output sigout;
electrical sig1, sig2, sig3, sig4, sig5, sig6, sig7, sig8, sel1, sel2, sel3, gnd, sigout;
// On-resistance of the pass transistor
parameter real Ron = 50;
// Threshold voltage of the selection lines
parameter real Vth = 0.5;
// Output load capacitance
parameter real c = 100e-15;
analog begin
    if (V(sel3) < Vth) begin                // sel3 = 0
        if (V(sel2) < Vth) begin            // sel2 = 0
            if (V(sel1) < Vth) begin        // sel1 = 0: select sig1
                V(sig1, sigout) <+ I(sig1, sigout)*Ron;
            end
            else begin                      // sel1 = 1: select sig2
                V(sig2, sigout) <+ I(sig2, sigout)*Ron;
            end
        end
        else begin                          // sel2 = 1
            if (V(sel1) < Vth) begin        // sel1 = 0: select sig3
                V(sig3, sigout) <+ I(sig3, sigout)*Ron;
            end
            else begin                      // sel1 = 1: select sig4
                V(sig4, sigout) <+ I(sig4, sigout)*Ron;
            end
        end
    end
    else begin                              // sel3 = 1
        if (V(sel2) < Vth) begin            // sel2 = 0
            if (V(sel1) < Vth) begin        // sel1 = 0: select sig5
                V(sig5, sigout) <+ I(sig5, sigout)*Ron;
            end
            else begin                      // sel1 = 1: select sig6
                V(sig6, sigout) <+ I(sig6, sigout)*Ron;
            end
        end
        else begin                          // sel2 = 1
            if (V(sel1) < Vth) begin        // sel1 = 0: select sig7
                V(sig7, sigout) <+ I(sig7, sigout)*Ron;
            end
            else begin                      // sel1 = 1: select sig8
                V(sig8, sigout) <+ I(sig8, sigout)*Ron;
            end
        end
    end
    // Load capacitor between gnd and sigout: V = (1/c) * integral of I
    V(gnd, sigout) <+ idt(I(gnd, sigout))/c;
end
endmodule
```


## A.2. FIR and Driver

## A.2.1. Flip-Flop Code

```
// Flip-flop code
`include "disciplines.vams"
module flip_flop(qp, clk, dp);
output qp; voltage qp;   // Q output
input clk; voltage clk;  // clock input (edge triggered)
input dp; voltage dp;    // D input
parameter real td = 0 from [0:inf);  // delay from clock to Q
parameter real tt = 0 from [0:inf);  // transition time of output signals
// Output levels are DC + vp_max and DC + vp_min
parameter real vh_clock = 1, vl_clock = 0, vp_max = 0.5, vp_min = -0.5, DC = 1;
parameter real vth_clock = (vh_clock + vl_clock)/2;
parameter real vth_d = (vp_max + vp_min)/2;
parameter integer dir = +1 from [-1:+1] exclude 0;  // triggering edge: +1 rising, -1 falling
real state1;
analog begin
    @(cross(V(clk) - vth_clock, dir)) begin  // triggered by the clock edge
        state1 = (V(dp) > (vth_d + DC));
    end
    V(qp) <+ transition(state1 ? (vp_max + DC) : (vp_min + DC), td, tt);
end
endmodule
```

## A.2.2. Driver

```
`include "disciplines.vams"
module ffe_drivers(outp, outn, GND, in);
inout outp, outn, GND;
input in;
electrical outp, outn, in, GND;
parameter real tap_coff = 1;  // tap coefficient
parameter real cout = 0;
parameter real gain = 1;
parameter real vth = 0.5;
parameter real DC = 0.2;      // DC of the output differential signal
analog begin
    // Steering current weighted by the tap value. The ratio is written as
    // 3.0/5.0 because the integer division 3/5 evaluates to 0 in Verilog-A.
    I(outp, GND) <+ (gain*(3.0/5.0 - V(in)*3.0/5.0)*tap_coff + DC)/50;
    I(outn, GND) <+ (gain*(3.0/5.0 + (V(in) - 1)*3.0/5.0)*tap_coff + DC)/50;
end
endmodule
```

## A.2.3. Matlab Code for FIR Coefficients

```
clear all
fid1 = fopen('input.txt','r');
txt = textscan(fid1,'%f','delimiter','');
fclose(fid1);
t = txt{1}(1);      % number of taps
c = txt{1}(2);      % number of cursors (channel pulse-response samples)
pret = txt{1}(3);   % number of pre-cursor taps
prec = txt{1}(4);   % number of pre-cursors
la = c + t - 1;     % length of the equalized response
y = txt{1}(5:4+c).';    % channel pulse response (row vector)
Hch = zeros(la,t);  % channel convolution matrix
for i = 1:t
    Hch(i:i+c-1,i) = y.';
end
% Desired response: a single 1 at the main-cursor position
Zdes = [zeros(1,pret+prec) 1 zeros(1,la-pret-prec-1)]';
% Least-squares FFE coefficients, W = (Hch'*Hch)^(-1)*Hch'*Zdes
W = (Hch'*Hch) \ (Hch'*Zdes);
% Normalize so that the sum of the absolute tap weights equals 1
Wn = W / sum(abs(W));
fid = fopen('FFE coff.txt','w');
fprintf(fid,'%f\r\n',Wn);   % one coefficient per line
fclose('all');
```
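The MATLAB script computes the least-squares (zero-forcing) FFE taps. The Python sketch below re-expresses the same computation for illustration only; it is not code from this work. The function names are hypothetical, the channel pulse response is passed directly instead of being read from input.txt, and the normal equations are solved with a small Gaussian elimination so that no external library is needed.

```python
from typing import List


def channel_matrix(pulse: List[float], n_taps: int) -> List[List[float]]:
    """Convolution matrix Hch with len(pulse) + n_taps - 1 rows and n_taps columns."""
    rows = len(pulse) + n_taps - 1
    h = [[0.0] * n_taps for _ in range(rows)]
    for i in range(n_taps):
        for j, p in enumerate(pulse):
            h[j + i][i] = p
    return h


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ffe_coefficients(pulse: List[float], n_taps: int, cursor: int) -> List[float]:
    """Least-squares taps W = (H'H)^-1 H' Zdes, normalized so that sum(|W|) = 1.

    cursor is the 0-based position of the 1 in Zdes (pret + prec in the script).
    """
    h = channel_matrix(pulse, n_taps)
    rows = len(h)
    z = [1.0 if k == cursor else 0.0 for k in range(rows)]
    hth = [[sum(h[k][i] * h[k][j] for k in range(rows)) for j in range(n_taps)]
           for i in range(n_taps)]
    htz = [sum(h[k][i] * z[k] for k in range(rows)) for i in range(n_taps)]
    w = solve(hth, htz)
    total = sum(abs(x) for x in w)
    return [x / total for x in w]


if __name__ == "__main__":
    # Hypothetical pulse response: main cursor 1.0 and one post-cursor of 0.5
    print(ffe_coefficients([1.0, 0.5], n_taps=3, cursor=0))
```

For a channel with a positive post-cursor, the taps alternate in sign, so the first post-tap subtracts the trailing ISI, as expected from a de-emphasis FIR.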


## A.3. VGA Code

```
`include "disciplines.vams"
module VGA(in, out, gnd, b0, b1, b2);
input in, gnd, b0, b1, b2;
output out;
electrical in, out, gnd, b0, b1, b2;
parameter real vth = 0.25;
parameter real v_one = 0.5;
parameter real v_zero = -0.5;
// Gain selected by the control word b2 b1 b0. Gains use pow(): in Verilog-A,
// "^" is bitwise XOR and 5/20 is an integer division that evaluates to 0.
analog begin
    if (V(b0, gnd) < vth && V(b1, gnd) < vth && V(b2, gnd) < vth) begin        // 000: 5 dB
        V(out, gnd) <+ pow(10.0, 5.0/20.0)*V(in, gnd);
    end
    else if (V(b0, gnd) > vth && V(b1, gnd) < vth && V(b2, gnd) < vth) begin   // 001: 6 dB
        V(out, gnd) <+ pow(10.0, 6.0/20.0)*V(in, gnd);
    end
    else if (V(b0, gnd) < vth && V(b1, gnd) > vth && V(b2, gnd) < vth) begin   // 010: 7 dB
        V(out, gnd) <+ pow(10.0, 7.0/20.0)*V(in, gnd);
    end
    else if (V(b0, gnd) > vth && V(b1, gnd) > vth && V(b2, gnd) < vth) begin   // 011: 2.51 (about 8 dB)
        V(out, gnd) <+ 2.51*V(in, gnd);
    end
    else if (V(b0, gnd) < vth && V(b1, gnd) < vth && V(b2, gnd) > vth) begin   // 100: 8 dB
        V(out, gnd) <+ pow(10.0, 8.0/20.0)*V(in, gnd);
    end
    else if (V(b0, gnd) > vth && V(b1, gnd) < vth && V(b2, gnd) > vth) begin   // 101: 9 dB
        V(out, gnd) <+ pow(10.0, 9.0/20.0)*V(in, gnd);
    end
    else if (V(b0, gnd) < vth && V(b1, gnd) > vth && V(b2, gnd) > vth) begin   // 110: 10 dB
        V(out, gnd) <+ pow(10.0, 10.0/20.0)*V(in, gnd);
    end
    else                                                                       // 111: 11 dB
        V(out, gnd) <+ pow(10.0, 11.0/20.0)*V(in, gnd);
end
endmodule
```

## A.4. CTLE Code

```
// CTLE code
`include "disciplines.vams"
module ctle_vmod(inp, inn, outp, outn, D10, D11, D12, D20, D21, D22, D23, D30, D31);
inout inp, inn, outp, outn;
input D10, D11, D12, D20, D21, D22, D23, D30, D31;  // control signals
electrical inp, inn, outp, outn, D10, D11, D12, D20, D21, D22, D23, D30, D31;
// gm, Lp, Rs0-Rs3, Rd0, Rd1 and Cs1-Cs4 are used below but are not declared in
// this listing, and rt is never assigned; they must be defined before use.
real wz1, wz2, wp1, wp2_real, wp3_real, wp2_imag, wp3_imag, k, rt, b, Rs, Rd, Cs;  // CTLE parameters
real nn[0:3];  // zeros as (real, imaginary) pairs
real pp[0:5];  // poles as (real, imaginary) pairs
analog begin
    Rs = 1/(V(D20)/Rs0 + V(D21)/Rs1 + V(D22)/Rs2 + V(D23)/Rs3);  // Rs value controlled by the digital signals
    Rd = 1/(V(D30)/Rd1 + V(D31)/Rd1 + 1/Rd0);                   // Rd value controlled by the digital signals
    Cs = Cs1 + V(D10)*Cs2 + V(D11)*Cs3 + V(D12)*Cs4;            // Cs value controlled by the digital signals
    k = (gm*Rd)/(1 + gm*Rs/2);     // DC gain
    b = Rd/(2*Lp);
    wz1 = 1/(Rs*Cs);
    wz2 = Rd/Lp;
    wp1 = (1 + gm*Rs/2)/(Rs*Cs);   // 1st pole
    wp2_real = b;                  // 2nd pole, real part
    wp3_real = b;                  // 3rd pole, real part
    wp2_imag = -sqrt(-rt);         // 2nd pole, imaginary part
    wp3_imag = sqrt(-rt);          // 3rd pole, imaginary part
    nn[0] = -wz1;
    nn[1] = 0;
    nn[2] = -wz2;
    nn[3] = 0;
    pp[0] = -wp1;
    pp[1] = 0;
    pp[2] = -wp2_real;
    pp[3] = -wp2_imag;
    pp[4] = -wp3_real;
    pp[5] = -wp3_imag;
    V(outp, outn) <+ k*laplace_zp(V(inp, inn), nn, pp);  // output defined by the CTLE transfer function
end
endmodule
```

## A.5. DFE Codes

## A.5.1. GM-Cell Code

```
// VerilogA for GP_DFE, gmcell, veriloga
`include "constants.vams"
`include "disciplines.vams"
module gmcell(vinp, vinn, ioutp, ioutn);
input vinp, vinn;
output ioutp, ioutn;
electrical vinp, vinn, ioutp, ioutn;
parameter real gm = 0.001;
analog begin
    I(ioutp) <+ (-1)*gm*V(vinp);
    I(ioutn) <+ (-1)*gm*V(vinn);
end
endmodule
```

## A.5.2. Slicers Code

```
// VerilogA for GP_DFE, slicer, veriloga
`include "constants.vams"
`include "disciplines.vams"
module slicer(inp, inn, clk, outp, outn);
input inp, inn, clk;
output outp, outn;
electrical inp, inn, clk, outp, outn;
parameter real vth = 0.5;                   // threshold of clock
parameter real datavth = 0;                 // threshold of data
parameter real vhigh = 0.5;                 // level of high for single terminal
parameter real vlow = -0.5;                 // level of low for single terminal
parameter real td = 0 from [0:inf);         // delay of slicer
parameter real tr = 1e-12 from [0:inf);     // rise time of output signal
parameter real tf = 1e-12 from [0:inf);     // fall time of output signal
parameter integer edgeclk = 1 from [-1:1];  // +1: decision at positive clock edge
                                            // -1: decision at negative clock edge (0 triggers on both)
real sample;
real yp, yn;
analog begin
    @(initial_step) begin
        sample = 0;
    end
    @(cross(V(clk) - vth, edgeclk)) begin
        sample = V(inp) - V(inn);
        if (sample > datavth) begin  // making the decision
            yp = vhigh;
            yn = vlow;
        end
        else begin
            yp = vlow;
            yn = vhigh;
        end
    end
    V(outp) <+ transition(yp, td, tr, tf);
    V(outn) <+ transition(yn, td, tr, tf);
end
endmodule
```

## A.5.3. Taps Code

```
// VerilogA for GP, tap, veriloga
`include "constants.vams"
`include "disciplines.vams"
module Tap(vinp, vinn, ioutp, ioutn);
input vinp, vinn;
output ioutp, ioutn;
electrical vinp, vinn, ioutp, ioutn;
parameter real tap_coff = 0.5;    // tap coefficient value
parameter real main_cursor = 1;   // main cursor value
parameter real factor = 0.02;
analog begin
    I(ioutp) <+ (factor*tap_coff/main_cursor)*V(vinp);
    I(ioutn) <+ (factor*tap_coff/main_cursor)*V(vinn);
end
endmodule
```


## A.5.4. FLIP FLOP Code

```
// VerilogA for GP_DFE, flipflop, veriloga
`include "constants.vams"
`include "disciplines.vams"
module flipflop(dp, dn, clk, qp, qn);
input dp, dn, clk;
output qp, qn;
electrical dp, dn, clk, qp, qn;
parameter real vth = 0.5;                   // threshold of clock
parameter real vhigh = 0.5;                 // level of high for single terminal
parameter real vlow = -0.5;                 // level of low for single terminal
parameter real td = 0 from [0:inf);         // delay of flip-flop
parameter real tr = 1e-12 from [0:inf);     // rise time of output signal
parameter real tf = 1e-12 from [0:inf);     // fall time of output signal
parameter integer edgeclk = 1 from [-1:1];  // +1: positive-edge flip-flop
                                            // -1: negative-edge flip-flop (0 triggers on both)
real yp, yn;
real sample;
analog begin
    @(cross(V(clk) - vth, edgeclk)) begin
        sample = V(dp);
        if (sample == vhigh) begin
            yp = vhigh;
            yn = vlow;
        end
        else if (sample == vlow) begin
            yp = vlow;
            yn = vhigh;
        end
        else begin  // input not at a valid logic level
            yp = 0;
            yn = 0;
        end
    end
    V(qp) <+ transition(yp, td, tr, tf);
    V(qn) <+ transition(yn, td, tr, tf);
end
endmodule
```


## Bibliography

[1] D. R. Stauffer, J. T. Mechler, M. A. Sorna, K. Dramstad, C. R. Ogilvie, A. Mohammad, and J. D. Rockrohr, High speed serdes devices and applications. Springer Science \& Business Media, 2008.
[2] M. Meghelli, S. Rylov, J. Bulzacchelli, W. Rhee, A. Rylyakov, H. Ainspan, B. Parker, M. Beakes, A. Chung, T. Beukema, et al., "A 10gb/s 5-tap-dfe/4-tap-ffe transceiver in 90 nm cmos," in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, pp. 213-222, IEEE, 2006.
[3] B. Casper, J. Jaussi, F. O’Mahony, M. Mansuri, K. Canagasaby, J. Kennedy, E. Yeung, and R. Mooney, "A 20gb/s forwarded clock transceiver in 90 nm cmos b.," in 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers, pp. 263-272, IEEE, 2006.
[4] K. N. Leung and P. K. Mok, "A sub-1-v 15-ppm/°C cmos bandgap voltage reference without requiring low threshold voltage device," IEEE Journal of Solid-State Circuits, vol. 37, no. 4, pp. 526-530, 2002.
[5] M. K. Adimulam and K. K. Movva, "A low power cmos current mode bandgap reference circuit with low temperature coefficient of output voltage," in 2012 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics, pp. 144-149, IEEE, 2012.
[6] H. Omran, M. H. Amer, and A. M. Mansour, "Systematic design of bandgap voltage reference using precomputed lookup tables," IEEE Access, vol. 7, pp. 100131-100142, 2019.
[7] H. Banba, H. Shiga, A. Umezawa, T. Miyaba, T. Tanzawa, S. Atsumi, and K. Sakui, "A cmos bandgap reference circuit with sub-1-v operation," IEEE Journal of Solid-State Circuits, vol. 34, no. 5, pp. 670-674, 1999.
[8] J. Torres, M. El-Nozahi, A. Amer, S. Gopalraju, R. Abdullah, K. Entesari, and E. Sanchez-Sinencio, "Low drop-out voltage regulators: Capacitor-less architecture comparison," IEEE Circuits and Systems Magazine, vol. 14, no. 2, pp. 6-26, 2014.
[9] S. Shirahatti, A. Nandi, et al., "A capacitor-less low drop out voltage regulator," in 2010 INTERNATIONAL CONFERENCE ON COMMUNICATION CONTROL AND COMPUTING TECHNOLOGIES, pp. 177-182, IEEE, 2010.
[10] M. Čermák, "Design of low-dropout voltage regulator," 2016.
[11] G. Rincon-Mora, Analog IC design with low-dropout regulators (LDOs). McGraw-Hill, Inc, 2009.
[12] K. N. Leung and P. K. Mok, "A capacitor-free cmos low-dropout regulator with damping-factor-control frequency compensation," IEEE Journal of Solid-State Circuits, vol. 38, no. 10, pp. 1691-1702, 2003.
[13] J.-J. Chen, F.-C. Yang, C.-M. Kung, B.-P. Lai, and Y.-S. Hwang, "A capacitor-free fast-transient-response ldo with dual-loop controlled paths," in 2007 IEEE Asian Solid-State Circuits Conference, pp. 364-367, IEEE, 2007.
[14] Y.-H. Lam, W.-H. Ki, and C.-Y. Tsui, "Adaptively-biased capacitor-less cmos low dropout regulator with direct current feedback," in Asia and South Pacific Conference on Design Automation, 2006., pp. 2-pp, IEEE, 2006.
[15] M. El-Nozahi, A. Amer, J. Torres, K. Entesari, and E. Sánchez-Sinencio, "High psr low drop-out regulator with feed-forward ripple cancellation technique," IEEE Journal of Solid-State Circuits, vol. 45, no. 3, pp. 565-577, 2010.
[16] H. Higashi, S. Masaki, M. Kibune, S. Matsubara, T. Chiba, Y. Doi, H. Yamaguchi, H. Takauchi, H. Ishida, K. Gotoh, et al., "A 5-6.4-gb/s 12-channel transceiver with pre-emphasis and equalization," IEEE Journal of solid-state circuits, vol. 40, no. 4, pp. 978-985, 2005.
[17] A. Momtaz, D. Chung, N. Kocaman, J. Cao, M. Caresosa, B. Zhang, and I. Fujimori, "A fully integrated 10-gb/s receiver with adaptive optical dispersion equalizer in 0.13-µm cmos," IEEE journal of solid-state circuits, vol. 42, no. 4, pp. 872-880, 2007.
[18] C. Cai, Y. Zhou, and J. Zhao, "5-20 gbit/s adaptive ctle with spectrum balancing method," Electronics Letters, vol. 54, no. 5, pp. 274-276, 2018.
[19] Y.-H. Kim, Y.-J. Kim, T. Lee, and L.-S. Kim, "A 21-gbit/s 1.63-pj/bit adaptive ctle and one-tap dfe with single loop spectrum balancing method," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 2, pp. 789-793, 2015.
[20] Y. Choi and Y.-B. Kim, "A 10-gb/s receiver with a continuous-time linear equalizer and 1-tap decision-feedback equalizer," in 2015 IEEE 58th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1-4, IEEE, 2015.
[21] D. Liu, L. He, Y.-K. Chou, and F. Lin, "A low-power 10-gb/s receiver with merged ctle and dfe summer," in 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT), pp. 1579-1581, IEEE, 2016.
[22] S. Agarwal and V. S. R. Pasupureddi, "A 5-gb/s adaptive ctle with eye-monitoring for multi-drop bus applications," in 2014 IEEE 57th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 410-413, IEEE, 2014.
[23] B.-J. Lim and C. Yoo, "Continuous-time linear equalizer with automatic boosting gain adaptation and input offset cancellation," International Journal of Circuit Theory and Applications, vol. 46, no. 11, pp. 2151-2159, 2018.
[24] G. Chen, M. Gong, D. Fu, and J. Zhang, "A high efficient ctle for 12.5 gbps receiver of jesd204b standard," IEICE Electronics Express, vol. 15, no. 15, pp. 20180617-20180617, 2018.
[25] Y.-F. Lin, C.-C. Huang, J.-Y. M. Lee, C.-T. Chang, and S.-I. Liu, "A 5-20 gb/s power scalable adaptive linear equalizer using edge counting," in 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), pp. 273-276, IEEE, 2014.
[26] Q. Pan, Y. Wang, Y. Lu, and C. P. Yue, "An 18-gb/s fully integrated optical receiver with adaptive cascaded equalizer," IEEE Journal of Selected Topics in Quantum Electronics, vol. 22, no. 6, pp. 361-369, 2016.
[27] B. Lim and C. Yoo, "A 12-gb/s continuous-time linear equalizer with offset canceller," Journal of Semiconductor Technology and Science, vol. 19, no. 2, pp. 220-225, 2019.
[28] C.-F. Liao and S.-I. Liu, "A 40 gb/s cmos serial-link receiver with adaptive equalization and clock/data recovery," IEEE journal of solid-state circuits, vol. 43, no. 11, pp. 2492-2502, 2008.
[29] G. Zhang, P. Chaudhari, and M. M. Green, "A BiCMOS 10Gb/s adaptive cable equalizer," in Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS'03), vol. 1, pp. I-I, IEEE, 2003.
[30] S. Otaka, H. Tanimoto, S. Watanabe, and T. Maeda, "A 1.9-GHz Si-bipolar variable attenuator for PHS transmitter," IEEE Journal of Solid-State Circuits, vol. 32, no. 9, pp. 1424-1429, 1997.
[31] P. Orsatti, F. Piazza, and Q. Huang, "A 71-MHz CMOS IF-baseband strip for GSM," IEEE Journal of Solid-State Circuits, vol. 35, no. 1, pp. 104-108, 2000.
[32] H. O. Elwan and M. Ismail, "Digitally programmable decibel-linear CMOS VGA for low-power mixed-signal applications," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 388-398, 2000.
[33] H. D. Lee, C.-H. Kim, and S. Hong, "A SiGe BiCMOS transmitter module for IMT2000 applications," IEEE Microwave and Wireless Components Letters, vol. 14, no. 8, pp. 371-373, 2004.
[34] J. Kwon, K. Kim, W. Song, and G.-H. Cho, "Wideband high dynamic range CMOS variable gain amplifier for low voltage and low power wireless applications," Electronics Letters, vol. 39, no. 10, pp. 759-760, 2003.
[35] T. Yamaji, N. Kanou, and T. Itakura, "A temperature-stable CMOS variable-gain amplifier with 80-dB linearly controlled gain range," IEEE Journal of Solid-State Circuits, vol. 37, no. 5, pp. 553-558, 2002.
[36] J. Hauptmann, F. Dielacher, R. Steiner, C. C. Enz, and F. Krummenacher, "A low-noise amplifier with automatic gain control and anticlipping control in CMOS technology," IEEE Journal of Solid-State Circuits, vol. 27, no. 7, pp. 974-981, 1992.
[37] H. Liu, C. C. Boon, X. He, X. Zhu, X. Yi, L. Kong, and M. C. Heimlich, "A wideband analog-controlled variable-gain amplifier with dB-linear characteristic for high-frequency applications," IEEE Transactions on Microwave Theory and Techniques, vol. 64, no. 2, pp. 533-540, 2016.
[38] M. Dongi and M. Jalali, "A wideband CMOS VGA with dB-linear gain based on active feedback and negative capacitance," in 2017 Iranian Conference on Electrical Engineering (ICEE), pp. 506-510, IEEE, 2017.
[39] Y. Wang, B. Afshar, L. Ye, V. C. Gaudet, and A. M. Niknejad, "Design of a low power, inductorless wideband variable-gain amplifier for high-speed receiver systems," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 4, pp. 696-707, 2011.
[40] T. B. Kumar, K. Ma, and K. S. Yeo, "A 7.9-mW 5.6-GHz digitally controlled variable gain amplifier with linearization," IEEE Transactions on Microwave Theory and Techniques, vol. 60, no. 11, pp. 3482-3490, 2012.
[41] Y. Wang, C. Hull, G. Murata, and S. Ravid, "A linear-in-dB analog baseband circuit for low power 60 GHz receiver in standard 65 nm CMOS," in 2013 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), pp. 225-228, IEEE, 2013.
[42] V. Melikyan, A. Balabanyan, A. Hayrapetyan, and N. Melikyan, "Receiver/transmitter input/output termination resistance calibration method," in 2013 IEEE XXXIII International Scientific Conference Electronics and Nanotechnology (ELNANO), pp. 126-130, IEEE, 2013.
[43] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K.-S. Chiu, and M. M.-T. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," IEEE Journal of Solid-State Circuits, vol. 35, no. 6, pp. 876-884, 2000.
[44] B. Razavi, "The StrongARM latch [a circuit for all seasons]," IEEE Solid-State Circuits Magazine, vol. 7, no. 2, pp. 12-17, 2015.
[45] B. Goll and H. Zimmermann, "A comparator with reduced delay time in 65-nm CMOS for supply voltages down to 0.65 V," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 56, no. 11, pp. 810-814, 2009.
[46] S. Rahmani and M. Ghaznavi-Ghoushchi, "Design and analysis of a high speed double-tail comparator with isomorphic latch-preamplifier pairs and tail bootstrapping," Analog Integrated Circuits and Signal Processing, vol. 93, no. 3, pp. 507-521, 2017.
[47] K.-M. Lei, P.-I. Mak, and R. P. Martins, "Systematic analysis and cancellation of kickback noise in a dynamic latched comparator," Analog Integrated Circuits and Signal Processing, vol. 77, no. 2, pp. 277-284, 2013.
[48] S. Ibrahim and B. Razavi, "Low-power CMOS equalizer design for 20-Gb/s systems," IEEE Journal of Solid-State Circuits, vol. 46, no. 6, pp. 1321-1336, 2011.
[49] M. Mishra and S. Akashe, "High performance, low power 200 Gb/s 4:1 MUX with TGL in 45 nm technology," Applied Nanoscience, vol. 4, no. 3, pp. 271-277, 2014.
[50] M. N. Sabry, H. Omran, and M. Dessouky, "Systematic design and optimization of operational transconductance amplifier using gm/ID design methodology," Microelectronics Journal, vol. 75, pp. 87-96, 2018.
[51] H. Kimura, P. M. Aziz, T. Jing, A. Sinha, S. P. Kotagiri, R. Narayan, H. Gao, P. Jing, G. Hom, A. Liang, et al., "A 28 Gb/s 560 mW multi-standard SerDes with single-stage analog front-end and 14-tap decision feedback equalizer in 28 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 49, no. 12, pp. 3091-3103, 2014.
[52] B. Razavi, "The current-steering DAC [a circuit for all seasons]," IEEE Solid-State Circuits Magazine, vol. 10, no. 1, pp. 11-15, 2018.
[53] R. Mohanavelu and P. Heydari, "A novel ultra high-speed flip-flop-based frequency divider," in 2004 IEEE International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. IV-169, IEEE, 2004.
[54] D. Anoop, N. K. YB, and M. Vasantha, "High performance sense amplifier based flip flop for driver applications," in 2017 IEEE International Symposium on Nanoelectronic and Information Systems (iNIS), pp. 129-132, IEEE, 2017.
[55] M. Matsui, H. Hara, Y. Uetani, L.-S. Kim, T. Nagamatsu, Y. Watanabe, A. Chiba, K. Matsuda, and T. Sakurai, "A 200 MHz 13 mm² 2-D DCT macrocell using sense-amplifying pipeline flip-flop scheme," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1482-1490, 1994.
[56] M. Maiti, A. Paul, S. K. Saw, and A. Majumder, "A dynamic current mode D-flipflop for high speed application," in 2019 3rd International Conference on Electronics, Materials Engineering & Nano-Technology (IEMENTech), pp. 1-3, IEEE, 2019.
[57] S. Suman, K. Sharma, and P. Ghosh, "Analysis and design of current starved ring VCO," in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 3222-3227, IEEE, 2016.

