CUFE
EECE
Main Stream System

Spring 2021
Senior Level Graduation Project

ELC480

## Graduation Project

## Wideband sub-6GHz transmitter based on RF Mixing-DAC for 5G communications Final Report

Submitted by:

Islam Mohamed Shaher
Abdelrahman Ayman Mahmoud
Mohamed Mahmoud Gaber

Supervised by:
Dr. Hassan Mostafa

## ACKNOWLEDGMENTS

Listed below are the names of the people who provided us with significant help in developing our graduation project in addition to our sponsor Si-Vision LLC.

To all we extend our sincere thanks.
********

Dr. Hassan Mostafa

Eng. Ahmed Hamed
Eng. Hassan Aly
Eng. Hesham Ahmed
Eng. Ahmed Yassin


#### Abstract

\section*{Wideband sub-6GHz transmitter based on RF Mixing-DAC for 5G communications using 65 nm CMOS technology}


With the rapid and continuous progress in communications, along with the ongoing development of IoT sensors and devices, there is a growing need for an efficient transmitter that suits low-energy applications and meets the requirements of 5G communications. A transmitter based on an RF Mixing-DAC has been proposed as a suitable solution for these requirements. It is intended for mobile phones, IoT devices, and smart cities, meeting their need for high-performance communication.

Low power is necessary for mobile and IoT devices because consumers expect to use them for extended periods without recharging or replacing the power source.

The performance of such transmitters is measured by power efficiency, cost, and support for multi-band/multi-carrier communication standards such as Wi-Fi, 3G, and 4G. These multi-carrier standards require both high linearity and large bandwidth. For example, the combination of LTE, GSM and WCDMA needs at least SFDR > 80 dBc and IMD < -80 dBc over a 300 MHz bandwidth at 3.5 GHz. Current-steering DACs have large bandwidth, but their linearity degrades at high frequency. As a result, a highly linear mixer is needed at the DAC's output. An RF-DAC topology integrates both functions in one stage, which leads to different trade-offs and enhances linearity. The goal of this work is to implement the wideband RF front end of a transmitter built around a 16-bit RF-DAC that achieves IMD < -60 dBc and SFDR > 60 dB. This work complements another work that implemented the RF-DAC core, the PMU, and the system-level simulations and specifications. In this work, the PA, the reconfigurable dividers, and the I/Q dividers are implemented.

The reconfigurable frequency dividers generate the fast and slow clocks within the chip. The fast clock is used as the sampling clock of the RF-DAC and to drive the logic of the sigma-delta block. The slow clock is used for the fractional sample rate conversion block. Different divider and logic topologies are investigated for optimum operation. The dividers have been designed twice: once for maximum frequency range, and once for optimum power consumption over the required transmitter range, which is a subset of the maximum range.

We designed a linear class-AB power amplifier with a differential topology; a balun converts the differential output to a single-ended one to drive the single-ended antenna. We achieved $P_{\text{out}} = 9$ dBm, gain $= 17$ dB, an output third-order intercept point (OIP3) of 26 dBm, and $\mathrm{PAE}_{\max} = 12.66\%$ at $P_{\text{in,max}} = -8$ dBm, using $V_{DD} = 1.5$ V.

In cooperation with Si-Vision LLC, a state-of-the-art design of the integrated CMOS RF Mixing-DAC transceiver is carried out.

## Table Of Contents

Acknowledgments
Abstract
Table Of Contents
List Of Figures
List Of Tables
Chapter 1 Introduction
1.1 Introduction to our project
1.2 Overview about the old status and introducing the trending architecture for our transmitter
1.3 Project schematic
1.4 The goal of our Thesis
1.5 Thesis Outline
Chapter 2 Reconfigurable frequency dividers
2.1 Choice of the divider logic
2.1.1 CML
2.1.2 TSPC
2.2 Choice of the divider topology
2.2.1 Miller divider
2.2.2 Ripple counter
2.2.3 Pulse swallow divider
2.2.4 Multi Modulus Divider MMD
2.3 Implementation
2.3.1 Critical path
2.3.2 Modular approach
2.3.3 Comparison of the implementations of the modular MMD
2.3.4 Extension of the lower division range
2.3.5 Extension of the upper division range
2.4 Results
2.4.1 Transient Simulation
2.4.2 Frequency range
2.4.3 Power dissipation
2.4.4 Jitter
2.4.5 Corners
2.4.6 Summary of the achieved specifications
Chapter 3 : I/Q dividers
3.1 Introduction:
3.1.1 Analog Dividers:
3.1.2 Digital Dividers:
3.2 Methodology:
3.3 Specification:
3.4 Self-biased LO Buffer:
3.5 I/Q Dividers literature survey (Divide-by-2):
3.5.1 Razavi Divider:
3.5.2 Wang Divider:
3.5.3 I/Q Divider:
3.6 I/Q dividers comparison
3.7 Simulations:
3.7.1 Self-biased LO Buffer:
3.7.2 I/Q Divider:
3.8 Corner simulation
3.9 Summary of the results:
Chapter 4 : Power Amplifier
4.1 Introduction
4.2 General considerations
4.2.1 Effect of High Currents
4.2.2 Efficiency
4.2.3 Linearity
4.2.4 Single-Ended and Differential PAs
4.3 Classification of Power Amplifiers
4.3.1 Classical PAs
4.3.2 Switch Mode PAs
4.3.3 Summary of PAs Classes
4.4 Specifications and Design Methodology
4.4.1 The main stage with the balun
4.4.2 The biasing stage
4.5 Other topics
4.5.1 Conjugate Match vs. Loadline Match
4.5.2 Transformer (Balun) Analysis
Simulation Results
4.5.3 Typical Simulation Results
4.5.4 Performance Summary
Chapter 5 : Conclusion
Future work
References

## List Of Figures

Figure 1-1 Conventional Analog Transmitter
Figure 1-2 "Digital RF" Reconfigurable Transmitter
Figure 1-3: The project schematic
Figure 2-1 CML latch implementation
Figure 2-2 CML flipflop
Figure 2-3 Stacking in CML logic
Figure 2-4 TSPC and ETSPC
Figure 2-5 TSPC with logic gate included
Figure 2-6 (a) Miller divider [8]
Figure 2-7 Ripple counter implementation
Figure 2-8 Pulse Swallow divider Architecture
Figure 2-9 Swallow Counter implementation
Figure 2-10 Divide by 8/9 circuit
Figure 2-11 2/3 divider implementation
Figure 2-12 State diagram of 2/3 divider
Figure 2-13 Implementations of the 16..31 divider [4]
Figure 2-14 Critical path [5]
Figure 2-15 Switching sequence showing the critical path
Figure 2-16 The critical path after modification
Figure 2-17 The switching sequence after modification
Figure 2-18 Second critical path in the old connection
Figure 2-19 Schematic of the 2/3 divider
Figure 2-20 MMD modular approach [11]
Figure 2-21 Divide-by-2/3 modular cell [9]
Figure 2-22 DFF only implementation of the modular MMD
Figure 2-23 Timing diagram of the FF only implementation
Figure 2-24 Critical path of the FF only implementation
Figure 2-25 Critical path of the FF only implementation
Figure 2-26 Critical path of the latch implementation
Figure 2-27 Temporary division error
Figure 2-28 Timing diagram after seamless switching
Figure 2-29 Proposed 2/3 divider for extended MMD range
Figure 2-30 Transient simulations of the 4-Stage MMD
Figure 2-31 Output frequency ratio vs Fin when input prescaler is 31
Figure 2-32 Average power vs Input frequency
Figure 2-33 Jitter PSD vs offset frequency
Figure 2-34 Division ratio at different corners
Figure 2-35 Rise time at different corners
Figure 2-36 Fall time at different corners
Figure 3-1 Self-biased LO Buffer
Figure 3-2 Razavi divider
Figure 3-3 Razavi divider in case that the input clock is high
Figure 3-4 Razavi divider in case that the input clock is low
Figure 3-5 Wang Divider
Figure 3-6 Wang divider in case that the input clock is high
Figure 3-7 Wang divider in case that the input clock is low
Figure 3-8 The output signals of Wang Divider
Figure 3-9 The output of the FF (IP) if width of the cross-coupled latch is greater than the width of the input data transistors
Figure 3-10 Latch of the LO divider
Figure 3-11 I/Q divider [4]
Figure 3-12 Schematics of self-biased LO Buffer with transistor sizing
Figure 3-13 The self-biased Buffer output vs time
Figure 3-14 Schematics of chain of inverters that comes after self-biased LO buffer
Figure 3-15 The output of first inverter vs time
Figure 3-16 CLKN and CLKP vs time
Figure 3-17 Schematics of I/Q latch with each transistor sizing
Figure 3-18 Schematics of I/Q Divider
Figure 3-19 CLKN and output of the divider vs time
Figure 3-20 Schematics of chain of inverters for the divider outputs with transistor sizing
Figure 3-21 Schematic of modelling of tree network with transistor sizing
Figure 3-22 The final schematics of I/Q divider
Figure 3-23 The final output of I/Q divider
Figure 3-24 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 11 GHz
Figure 3-25 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $125^{\circ}\mathrm{C}$ and frequency 11 GHz
Figure 3-26 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 11 GHz
Figure 3-27 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $125^{\circ}\mathrm{C}$ and frequency 11 GHz
Figure 3-28 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $125^{\circ}\mathrm{C}$ and frequency 8 GHz
Figure 3-29 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 8 GHz
Figure 3-30 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 8 GHz
Figure 3-31 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $125^{\circ}\mathrm{C}$ and frequency 8 GHz
Figure 4-1: Polar RF-Mixing-DAC transmitter
Figure 4-2: Tapering in a TX chain
Figure 4-3: PA characterization by (a) two-tone test, (b) compression
Figure 4-4: Upconverter/PA interface with (a) single-ended or (b) balun connection
Figure 4-5: (a) Feedback in single-ended PA, (b) less problematic situation in a differential PA
Figure 4-6: Use of a balun between the PA and antenna
Figure 4-7: The conduction angle of the classical PAs
Figure 4-8: Class A stage
Figure 4-9: Class B stage
Figure 4-10: Output network currents during (a) positive and (b) negative output half cycles
Figure 4-11: Current and voltage waveforms in a class B stage
Figure 4-12: Class B circuit with resonant secondary network
Figure 4-13: Class B circuit for efficiency calculation
Figure 4-14: Class C stage and its waveforms
Figure 4-15: Efficiency vs. Theta
Figure 4-16: Pout vs. Theta
Figure 4-17: Class E stage
Figure 4-18: (a) Class E stage, (b) condition to ensure minimal overlap between drain current and voltage, (c) condition to ensure low sensitivity to timing errors
Figure 4-19: Class E matching network viewed as a damped network
Figure 4-20: Waves of class E
Figure 4-21: Class E, off mode
Figure 4-22: Design equations of class E
Figure 4-23: Class E, Drain shape
Figure 4-24: Class F stage
Figure 4-25: Eff. of class F
Figure 4-26: Wave shapes of class F
Figure 4-27: PA Classes comparison
Figure 4-28: Circuitry of a current source with source resistance and load
Figure 4-29: Obtaining optimum load impedance in loadline match
Figure 4-30: Compression characteristics for conjugate (S22) match (solid curve) and power match (dashed curve). The 1-dB compression points (B, B') and maximum linear power points (A, A') show improvements under power match conditions
Figure 4-31: Transformer output network with non-idealities
Figure 4-32: Our Linear PA Schematic
Figure 4-33: vout and vin for 4.01 GHz
Figure 4-34: vout and vin for 3.96 GHz
Figure 4-35: vout and vin at 4.06 GHz
Figure 4-36: vout and vin at 1 GHz, without changing the matching network
Figure 4-37: vout and vin at 1 GHz, with changing the matching network
Figure 4-38: vout and vin at 6 GHz, without changing the matching network
Figure 4-39: vout and vin at 6 GHz, with changing the matching network
Figure 4-40: IIP3 for 4.01 GHz
Figure 4-41: IIP3 for 3.96 GHz
Figure 4-42: IIP3 for 4.06 GHz
Figure 4-43: IIP3 for 1 GHz, without changing
Figure 4-44: IIP3 for 1 GHz, with changing
Figure 4-45: IIP3 for 6 GHz, without changing
Figure 4-46: IIP3 for 6 GHz, with changing
Figure 4-47: Compression point for F = 4.01 GHz
Figure 4-48: Compression point for F = 3.96 GHz
Figure 4-49: Compression point for F = 4.06 GHz
Figure 4-50: Compression point for F = 1 GHz, without changing
Figure 4-51: Compression point for F = 1 GHz, with changing
Figure 4-52: Compression point for F = 6 GHz, without changing
Figure 4-53: Compression point for F = 6 GHz, with changing
Figure 4-54: Efficiency of F = 4.01 GHz
Figure 4-55: Efficiency of F = 3.96 GHz
Figure 4-56: Efficiency of F = 4.06 GHz
Figure 4-57: Efficiency of F = 1 GHz, without changing
Figure 4-58: Efficiency of F = 6 GHz, without changing
Figure 4-59: Efficiency of F = 1 GHz, with changing
Figure 4-60: Efficiency of F = 6 GHz, with changing
Figure 4-61: The harmonics of F = 4.01 GHz
Figure 4-62: The harmonics of F = 3.96 GHz
Figure 4-63: The harmonics of F = 1 GHz, without changing
Figure 4-64: The harmonics of F = 4.06 GHz
Figure 4-65: The harmonics of F = 1 GHz, with changing
Figure 4-66: The harmonics of F = 6 GHz, without changing
Figure 4-67: The harmonics of F = 6 GHz, with changing

## List Of Tables

Table 1-1: Thesis Outline
Table 2-1 Required specifications on the dividers
Table 2-2 Transistors aspect ratios
Table 2-3 Summary of the required and achieved specifications
Table 2-4 Comparison of the divider with other dividers from the literature
Table 3-1 A table that illustrates the properties of Wang divider, Razavi divider and this work divider
Table 3-2 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 11 GHz
Table 3-3 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $125^{\circ}\mathrm{C}$ and frequency 11 GHz
Table 3-4 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 11 GHz
Table 3-5 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $125^{\circ}\mathrm{C}$ and frequency 11 GHz
Table 3-6 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $125^{\circ}\mathrm{C}$ and frequency 8 GHz
Table 3-7 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.08 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 8 GHz
Table 3-8 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $-40^{\circ}\mathrm{C}$ and frequency 8 GHz
Table 3-9 Phase-shift, rise-time, fall-time and duty-cycle vs corners for a divider supply of 1.32 V, temperature $125^{\circ}\mathrm{C}$ and frequency 8 GHz
Table 3-10 A table that illustrates the specifications achieved and required
Table 4-1: Specifications of the Required PA
Table 4-2: Summary of Typical Results

## CHAPTER 1 Introduction

### 1.1 Introduction to our project

There has never been a technology in history that has achieved such rapid market penetration as mobile cellular subscriptions and related cellular communications. The significant reduction in production costs is one of the primary elements driving this continuous progress in the mobile devices and cellular communications fields.

The semiconductor industry has seen a doubling of the number of transistors per chip area every 18 months since the creation of the integrated circuit by Jack Kilby and Robert Noyce in 1958. Gordon Moore had predicted this exponential rise in 1965 [1]. His forecast still holds today, and it is known as "Moore's law."

Moore's law not only results in reduced production costs, but it also results in current mobile phones having huge signal and data processing capabilities. In recent years, cellphones have outperformed most desktop computers in terms of computational capability.

Smartphones must enable high-data-rate wireless connections to provide the best possible user experience. Furthermore, to access a wide range of services, these devices must support a variety of communication standards, such as UMTS, wireless LAN, Bluetooth, GPS, and so on.

Because of the vast number of supported communication standards, some of which allow operation in several frequency bands, an equal number of specialized, high-performance RF front ends is required.

New OFDM (Orthogonal Frequency Division Multiplexing)-based technologies such as 5G, in particular, demand high bandwidth, linearity, and dynamic range. Based on these performance criteria, we design an RF front end built on a reconfigurable RF-DAC transmitter that is optimized for low energy and suited to future 5G communications.

### 1.2 Overview about the old status and introducing the trending architecture for our transmitter

Analog and RF parts, unfortunately, do not shrink as much as digital circuits when moving from one CMOS technology node to the next. With a linear reduction in CMOS feature size, the area of digital circuits shrinks quadratically. The area of the analogue blocks, on the other hand, shrinks linearly at most [2]. The reason is that transistor lengths in analogue circuits cannot be scaled indefinitely, because of short-channel effects, when a specified circuit performance is required. Furthermore, capacitors, inductors, and wires, which barely shrink from one technology node to the next, account for a significant share of the analogue block's footprint. As a result, analogue blocks in nanoscale CMOS occupy a significant part of the total area of a mixed-signal circuit. Worse, putting many parallel radios on a single chip requires even more area and raises production costs.
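The scaling argument above can be made concrete with a small calculation. The sketch below is illustrative only: the 130 nm starting node and the 1 mm² block areas are hypothetical numbers chosen for the example, and the linear exponent for the analogue block is the best case stated in the text.

```python
def scaled_area(area_mm2, old_node_nm, new_node_nm, exponent):
    """Scale a block area by (new_node / old_node) ** exponent."""
    return area_mm2 * (new_node_nm / old_node_nm) ** exponent

# Hypothetical starting point: 1 mm^2 of digital and 1 mm^2 of analogue area at 130 nm.
digital_65 = scaled_area(1.0, 130, 65, exponent=2)  # quadratic shrink
analog_65 = scaled_area(1.0, 130, 65, exponent=1)   # linear shrink (best case)

# Share of the total area taken by the analogue block after scaling.
analog_share = analog_65 / (analog_65 + digital_65)

print(f"digital: {digital_65} mm^2, analogue: {analog_65} mm^2")
print(f"analogue share of total area: {analog_share:.1%}")
```

Under these assumptions the analogue block grows from half of the total area to two thirds of it after a single node shrink, which is the trend the paragraph describes.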

Figure 1-1 shows a conventional block diagram of the analogue and RF components required for one transmit path. Aside from the complexity of the required integrated analogue circuits, separate off-chip components such as SAW filters, PAs, and antenna switches are required for each transmit and receive path. Off-chip components add considerably to the bill of materials (BOM) of a cellular phone or IoT device. Reconfigurable multi-standard, multi-band radios are needed to avoid several concurrent radios on the same device. This need gave rise to the term "Software Defined Radio (SDR)". As shown in Figure 1-2 (a), the most basic SDR transmitter consists only of digital signal processing and a high-speed, high-resolution, high-output-power digital-to-analogue converter (DAC) that drives the antenna directly. Such a system would be versatile and reconfigurable to accommodate various communication standards. To meet the Nyquist criterion, however, the ADC and DAC would have to operate at a multiple of the RF carrier frequency. In addition, the converter must have a large dynamic range and, as a result, a very high resolution. Converters with sufficient performance for such a system are difficult to implement with today's technology. Furthermore, the extremely high power consumption of such a converter raises the question of whether this ideal SDR system is appropriate for mobile devices or for the low-energy needs of IoT smart devices and nodes.
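To give a feel for the Nyquist constraint on a direct-RF converter, the following sketch computes the minimum sample rate for the highest operating frequency and bandwidth listed in Table 1-1 (6 GHz and 150 MHz) and compares it with the 1.95 GSps maximum sampling rate specified there for the RF-DAC. The function name is illustrative, and the comparison is a rough estimate that ignores oversampling margins.

```python
def min_direct_rf_sample_rate(f_carrier_hz, bandwidth_hz):
    """Nyquist bound for a converter that synthesises the RF signal directly."""
    f_max = f_carrier_hz + bandwidth_hz / 2  # highest frequency in the signal
    return 2 * f_max

F_CARRIER_MAX_HZ = 6e9      # upper operating frequency (Table 1-1)
BANDWIDTH_HZ = 150e6        # maximum bandwidth (Table 1-1)
RF_DAC_MAX_RATE_HZ = 1.95e9 # maximum RF-DAC sampling rate (Table 1-1)

f_s_min = min_direct_rf_sample_rate(F_CARRIER_MAX_HZ, BANDWIDTH_HZ)
ratio = f_s_min / RF_DAC_MAX_RATE_HZ

print(f"direct-RF DAC needs at least {f_s_min / 1e9:.2f} GS/s")
print(f"that is about {ratio:.1f}x the RF-DAC sampling rate")
```

The direct-RF converter would need to run several times faster than the mixing-based converter, which is the motivation for keeping a mixer in the architecture.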

Some reconfigurable transmitters based on the so-called RF-DAC architecture have been developed in recent years. This type of transmitter appears to be a strong alternative to traditional transmitter designs. RF-DAC-based transmitters enable broadband OFDM transmission and are reconfigurable for new mobile communication standards such as 5G. The RF-DAC is a single component that combines a D/A converter and a mixer. Because this architecture still includes a mixer, the digital blocks do not have to run at a multiple of the carrier frequency. Figure 1-2 (b) shows a simplified block diagram of an RF-DAC transmitter. Digital, programmable blocks can replace the analogue baseband circuitry. As a result, RF-DAC-based transmitters are seen as the first step toward Software Defined Radio (SDR).


Figure 1-1 Conventional Analog Transmitter


Figure 1-2 "Digital RF" Reconfigurable Transmitter

The RF-DAC architecture is not only highly flexible, but also well suited to sub-65nm CMOS nodes. Nanoscale CMOS nodes enable digital circuits that are both fast and power-efficient. However, implementing high-performance analogue circuits is difficult because of the low supply voltages required by nanoscale CMOS and the short-channel effects of the transistors. The transistors in the RF-DAC, on the other hand, act as switches and have no strict linearity requirements. Digital signal processing can replace the high-performance analogue baseband building blocks that legacy analogue transmitter topologies require, such as variable gain amplifiers (VGAs). In addition to making the design compatible with nanoscale CMOS nodes, replacing analogue building blocks with digital ones greatly simplifies porting the transmitter design from one CMOS process node to the next.

### 1.3 Project schematic

The following is a simplified block diagram of the proposed RF-DAC transmitter. The baseband processor produces a 16-bit output for each of the I and Q branches. The RF-DAC is segmented into unary-coded most significant bits and binary-coded least significant bits to optimize area, linearity, and layout complexity. A common local oscillator feeds the transmitter at twice the carrier frequency, and a differential IQ divider delivers the carrier to the I and Q branches of the DAC. A fast and a slow clock divider are also used. The fast clock sets the sampling frequency of the RF-DAC and is connected to the latches of the data switches for the binary and unary bits, while the slow clock is required for baseband sampling and clock domain conversion. The outputs of the two RF-DAC branches are summed before the power amplifier stage, which feeds the output antenna.
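The segmentation into unary MSBs and binary LSBs can be sketched as follows. The 6/10 split between unary and binary bits is a hypothetical choice made only for this example (the actual split is not stated in this section), and the function names are illustrative.

```python
def split_code(code, total_bits=16, unary_bits=6):
    """Split a DAC code into thermometer-coded MSBs and binary-weighted LSBs.

    unary_bits=6 is an assumed value for illustration, not the design's split.
    """
    if not 0 <= code < (1 << total_bits):
        raise ValueError("code out of range")
    binary_bits = total_bits - unary_bits
    msb = code >> binary_bits                  # value carried by the unary segment
    lsb = code & ((1 << binary_bits) - 1)      # value carried by the binary segment
    # Thermometer code: the first `msb` unit cells are on, the rest are off.
    thermometer = [1 if i < msb else 0 for i in range((1 << unary_bits) - 1)]
    # Binary code, least significant bit first.
    binary = [(lsb >> i) & 1 for i in range(binary_bits)]
    return thermometer, binary


def reconstruct(thermometer, binary):
    """Sum the cell weights to recover the original code."""
    unary_weight = 1 << len(binary)  # each unary cell is worth 2**binary_bits LSBs
    return sum(thermometer) * unary_weight + sum(b << i for i, b in enumerate(binary))


thermo, binary = split_code(40000)
print(sum(thermo), "unary cells on; binary LSBs =", binary)
print("reconstructed code:", reconstruct(thermo, binary))
```

Each unary cell carries the same weight, so a code change switches only a few equal cells at the MSB boundaries, whereas the binary segment keeps the cell count small for the low-order bits.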


Figure 1-3: The project schematic

The top-level specifications of the RF-DAC system, based on system simulations and available standards, are listed in Table 1-1, while the specifications of the RF-DAC cell implementation, the digital blocks, and the PA are listed in Tables 1-2, 1-3, and 1-4.

Table 1-1 Top-level specifications of the RF-DAC

| Specification | Required |
| :---: | :---: |
| Technology | 65 nm CMOS |
| RF-Mixing-DAC Resolution | 16 bits |
| Frequency of Measurements | 3.5 GHz |
| Sampling Rate for Measurements | 1.75 GSps |
| Maximum Sampling Rate | 1.95 GSps |
| IMD | > 60 dB |
| $\mathrm{SFDR}_{RB}$ | > 60 dB |
| $P_{\text{out}}$ | 7 dBm |
| Operating Frequencies | 1 to 6 GHz |
| Maximum Bandwidth | 150 MHz |
| RB | 300 MHz |
| ACLR-5G | > 45 dBc |
| Back-off | 15 dB |

Table 1-2 Mixing-DAC cells specifications

| Specification | Required |
| :---: | :---: |
| Technology | 65 nm CMOS |
| RF-Mixing-DAC Resolution | 16 bits |
| Frequency of Measurements | 3.5 GHz |
| Sampling Rate for Measurements | 1.75 GSps |
| Maximum Sampling Rate | 1.95 GSps |
| $\mathrm{IMD}_{3\text{-Fullscale}}$ | > 65 dB |
| $\mathrm{SFDR}_{RB\text{-FullScale}}$ | > 65 dB |
| $P_{\text{out}}$ | -3.02 dBm |
| $\mathrm{DNL}_{\text{Max}}$ | 0.8 LSB |
| $\mathrm{INL}_{\text{Max}}$ | 2 LSB |
| Peak to Peak Voltage | 0.5 V |
| Operating Frequencies | 1 to 6 GHz |
| Maximum Bandwidth | 150 MHz |
| NSD | < -164 dBm/Hz |
| RB | 300 MHz |
| ACLR-5G | > 45 dBc |

Table 1-3 Digital blocks specifications

| Specification | Required |
| :---: | :---: |
| Technology | 65 nm CMOS |
| Rise Time | $<57 \mathrm{ps}$ |
| Fall Time | $<57 \mathrm{ps}$ |

Table 1-4 PA specifications

| Specification | Required |
| :--- | :---: |
| Technology | 65 nm CMOS |
| Class | AB |
| $\mathrm{IIP}_{3}$ | 3 dBm |
| Gain | 17 dB |
| NF | < 7 dB |
| Frequency of Measurements | 3.5 GHz |
| Operating Frequencies | 1 to 6 GHz |
| Maximum Bandwidth | 150 MHz |
| Back-Off | 15 dB |
| Output Power | 9 dBm |
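The decibel quantities in Tables 1-2 and 1-4 can be cross-checked with a few lines of arithmetic. The sketch below uses only values from those tables; the relation OIP3 = IIP3 + gain is the usual small-signal approximation, and the helper names are illustrative.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


def mw_to_dbm(p_mw):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(p_mw)


GAIN_DB = 17.0          # PA gain (Table 1-4)
P_OUT_DBM = 9.0         # PA output power (Table 1-4)
IIP3_DBM = 3.0          # PA input intercept point (Table 1-4)
DAC_P_OUT_DBM = -3.02   # Mixing-DAC cell output power (Table 1-2)

p_in_dbm = P_OUT_DBM - GAIN_DB       # input power needed to reach full output power
oip3_dbm = IIP3_DBM + GAIN_DB        # small-signal estimate of the output intercept point
p_out_mw = dbm_to_mw(P_OUT_DBM)
dac_p_out_mw = dbm_to_mw(DAC_P_OUT_DBM)

print(f"required PA input power: {p_in_dbm} dBm")
print(f"OIP3 implied by the specification: {oip3_dbm} dBm")
print(f"PA output power: {p_out_mw:.2f} mW, DAC output power: {dac_p_out_mw:.2f} mW")
```

The required PA input power of -8 dBm agrees with the maximum input power quoted in the abstract, and the DAC output power of -3.02 dBm corresponds to roughly 0.5 mW.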

### 1.4 The goal of our Thesis

Our goal is to implement some of the blocks of an RF-DAC-based transmitter that is suitable for low-energy devices (e.g., IoT devices) and for 5G communications. This transmitter should be able to transmit at high data rates with high spectral purity and wide bandwidth in order to handle new communication trends. The transmitter we have implemented has a moderate output power, which allows a higher level of integration and limits the number of expensive off-chip components, reducing the production cost as far as possible.

We begin with a circuit-level implementation of the different blocks, run the appropriate simulations, and cover most of the corner cases to improve the reliability of the design.

### 1.5 Thesis Outline

Table 1-5 gives the structure of this work.

Table 1-5: Thesis Outline

| Chapter | Purpose |
| :--- | :--- |
| Chapter 2 | Implementation of the reconfigurable fast and slow divider |
| Chapter 3 | Implementation of the differential IQ divider for the Mixing transistors |
| Chapter 4 | Power Amplifier overview and Class-AB PA design. |

## CHAPTER 2 RECONFIGURABLE FREQUENCY DIVIDERS

The first type of divider required for the RF-DAC transmitter is the programmable divider, which allows a variable input carrier frequency. Two clocks must be derived from the carrier frequency: a fast clock and a slow clock. The fast clock sets the sampling frequency of the DAC and is connected to the latches of the data switches for the binary and unary bits. It is also used for oversampling in the sigma-delta block used for noise shaping. The slow clock is required for baseband sampling and clock domain conversion. The fast clock divider is implemented as a 2/3 divider, and the slow clock divider range is from 16 to 31. The specifications required by the system designer are shown in Table 2-1.

Table 2-1 Required specifications on the dividers

|  | Requirement |
| :---: | :---: |
| Frequency range | 1 to 6 GHz |
| Division ratios | 2 to 3 for fast clock; 16 to 31 for slow clock |
| Rise/fall time | $<5\%$ of the output clock |
| Max average power for slow divider |  |
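As a quick check of the division ratios in Table 2-1, the sketch below computes the fast and slow clock frequencies for a given carrier. It assumes both clocks are obtained by dividing the carrier directly, as the text states; the function and constant names are illustrative.

```python
FAST_RATIOS = (2, 3)          # fast clock divider ratios (Table 2-1)
SLOW_RATIOS = range(16, 32)   # slow clock divider ratios 16..31 (Table 2-1)


def clock_plan(f_carrier_hz, fast_ratio, slow_ratio):
    """Return (fast_clock_hz, slow_clock_hz) for the chosen division ratios."""
    if fast_ratio not in FAST_RATIOS:
        raise ValueError("fast ratio must be 2 or 3")
    if slow_ratio not in SLOW_RATIOS:
        raise ValueError("slow ratio must be between 16 and 31")
    return f_carrier_hz / fast_ratio, f_carrier_hz / slow_ratio


fast_hz, slow_hz = clock_plan(3.5e9, fast_ratio=2, slow_ratio=16)
print(f"fast clock: {fast_hz / 1e9} GHz, slow clock: {slow_hz / 1e6} MHz")
```

With a 3.5 GHz carrier and a fast ratio of 2, the fast clock is 1.75 GHz, which matches the 1.75 GSps sampling rate for measurements in Table 1-1.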

### 2.1 Choice of the divider logic

### 2.1.1 CML

Static CMOS registers are considered the ideal choice for low power and reliable operation, but they cannot be used at such high frequencies. The alternatives are CML and TSPC logic. Current-mode logic (CML) is similar in operation to a differential-pair amplifier: it operates with moderate swings and can drive another differential stage. Since the divider output feeds the digital blocks and drives the switches of the DAC, a CML-to-CMOS converter stage would be required. Figure 2-1 shows an implementation of a CML latch. CML latches have two modes of operation: the transparent mode, where the output follows the input, and the opaque mode, where the output holds its previous value using positive feedback as memory. Figure 2-2 summarizes the operation of two CML latches connected in negative feedback to form an IQ divider in the two modes of operation. Each mode is activated by a switch transistor connected to the latch clock, and the clock signals are swapped between the two latches to form the positive and the negative latch. CML can operate at a much higher frequency than CMOS and TSPC because the resistance that sets the time constant is constant. It is also free from input-dependent supply and ground glitches because it draws a constant current. Its disadvantage is static power, which yields a higher average power than the TSPC implementation.


Figure 2-1 CML latch implementation


Figure 2-2 CML flipflop
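The two-latch divider described above can be illustrated with a purely behavioural model. The sketch treats each latch as an ideal level-sensitive element (no analogue CML effects), steps the input clock one half-period at a time, and uses illustrative signal names `qa` and `qb` for the two latch outputs.

```python
def simulate_latch_divider(n_half_cycles):
    """Behavioural divide-by-2 built from two latches in negative feedback.

    Latch A is transparent when clk = 1 and is fed by the inverted output of
    latch B; latch B is transparent when clk = 0 and is fed by latch A.
    Returns a list of (clk, qa, qb) tuples, one per input half-period.
    """
    qa = qb = 0
    trace = []
    for k in range(n_half_cycles):
        clk = k % 2
        if clk == 1:
            qa = 1 - qb   # latch A transparent, latch B holds
        else:
            qb = qa       # latch B transparent, latch A holds
        trace.append((clk, qa, qb))
    return trace


trace = simulate_latch_divider(16)
qa = [step[1] for step in trace]
qb = [step[2] for step in trace]
print("qa:", qa)
print("qb:", qb)
```

Both outputs repeat every four input half-periods, that is, at half the input frequency, and `qb` lags `qa` by one input half-period, which is a quarter of the output period. This quarter-period offset is what makes the structure usable as an I/Q divider.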

Another complication arises when implementing gates that contain stacked transistors. To guarantee that all transistors operate in the saturation region, where a relatively small input swing is enough for full current switching, the common-mode difference between the stacked inputs should be larger than one overdrive voltage. To keep the same swing in every stage, this requires a level shift of the common mode, as shown in Figure 2-3.


Figure 2-3 Stacking in CML logic

### 2.1.2 TSPC

TSPC is considered a dynamic implementation because it has no feedback, so the minimum refresh rate limits the minimum usable frequency. As shown in Figure 2-4, there are two implementations of TSPC: the conventional version and the extended version (ETSPC). When clk = 1 in the negative-edge-triggered TSPC flip-flop in the figure, the second stage is precharged to ground and the third stage has only the PUN available, so the output keeps its old value. The first stage has both pull networks available to capture the input. When clk = 0, the second stage can be discharged to ground if the input was 0, as in dynamic gates, and the output then changes, while the first stage has only the PUN available so it does not affect the output. The extended version is intended for high frequencies but is less reliable than TSPC, because one of the two pull networks is removed to minimize the delay and correct logic operation then depends on the relative sizing of the PMOS and NMOS devices. The NMOS in the 2nd stage should be wider than the PMOS to force the node to zero in the positive phase of the clock. The same holds for the 3rd stage to obtain a correct low output. Extra margin is also required to account for skewed corners such as FS (in the first stage) or SF (in the 2nd and 3rd stages), which worsens the intrinsic delay of every stage. The extended version also has a higher power consumption due to the short-circuit current. TSPC is used for the circuit implementation because it can work at 5 GHz in the 65 nm technology with a good maximum-frequency margin while avoiding the power dissipation of a CML implementation.


Figure 2-4 TSPC and ETSPC

The other advantage of the TSPC implementation is the ability to integrate static logic gates in the first stage, as shown in figure 5, which improves density and speed. The NAND gate integrated within the flip-flop is required at the input of the 2 flip-flops of the 2/3 prescaler.


Figure 2-5 TSPC with logic gate included

### 2.2 Choice of the divider topology

### 2.2.1 Miller divider

The Miller divider is used at frequencies higher than those reachable by digital dividers, whose discrete-time operation relies on settling time. This divider divides by 2 and consists of a mixer and a low-pass filter (LPF), as shown in figure 6(a). If the output is at $f_{\text{in}}/2$, the output of the mixer has components at $3 f_{\text{in}}/2$ and $f_{\text{in}}/2$. The role of the LPF is to attenuate the component at $3 f_{\text{in}}/2$ and allow the component at $f_{\text{in}}/2$ to circulate in the loop. This topology requires that the loop gain for the $3 f_{\text{in}}/2$ component be much less than that for the $f_{\text{in}}/2$ component.
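
The two mixer components follow from the product-to-sum identity. As a sketch, assuming an ideal multiplying mixer and sinusoidal signals with amplitudes $A$ and $B$ (symbols introduced here only for illustration), mixing the input with a fed-back output at half the input frequency gives:

$$
A \cos \left(2 \pi f_{\text{in}} t\right) \cdot B \cos \left(2 \pi \frac{f_{\text{in}}}{2} t\right)=\frac{A B}{2}\left[\cos \left(2 \pi \frac{f_{\text{in}}}{2} t\right)+\cos \left(2 \pi \frac{3 f_{\text{in}}}{2} t\right)\right]
$$

The difference term reproduces the half-frequency output, which is why the loop can sustain it once the sum term is filtered out.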


Figure 2-6(a) Miller divider [8]

Miller dividers are usually implemented with a double-balanced mixer and inductive loads, as shown in figure 6(b), to eliminate the headroom-gain tradeoff and the speed-gain tradeoff [8].


Figure 2-6(b) Miller divider implementation [8]

### 2.2.2 Ripple counter

A ripple counter can be used for frequency division. For example, a 5-stage ripple counter can divide by 16 to 31, and a lower minimum ratio can be obtained if a multiplexer is used. It consists of a chain of T flip-flops, or D flip-flops in negative feedback, forming divide-by-2 stages. These stages count down if they are positive-edge-triggered. The implementation is shown in figure 7. When the counter reaches the required division ratio, a reset signal is produced to restart the count from the beginning. The problem with this divider is the large combinational delay in producing the reset signal, which must keep up with the high input frequency. Hence, other digital architectures are investigated.
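
The count-and-reset behaviour described above can be sketched behaviourally. This is a minimal model only: it assumes an up-counting chain for simplicity (the positive-edge-triggered stages in the text count down) and an ideal, zero-delay reset. The name `ripple_divider` and its parameters are illustrative and not part of the design.

```python
def ripple_divider(ratio, n_stages, n_input_edges):
    """Return the input-edge indices at which the reset pulse fires.

    Assumptions (illustrative only): the chain counts up, each stage
    toggles the next one when it falls from 1 to 0, and the reset is ideal.
    """
    if not 1 <= ratio < 2 ** n_stages:
        raise ValueError("ratio must fit in the counter")
    bits = [0] * n_stages              # bits[0] is the fastest stage
    reset_edges = []
    for edge in range(1, n_input_edges + 1):
        # Ripple the toggle through the chain of divide-by-2 stages.
        for i in range(n_stages):
            bits[i] ^= 1
            if bits[i] == 1:           # no falling edge, so the ripple stops
                break
        value = sum(bit << k for k, bit in enumerate(bits))
        if value == ratio:             # combinational detection of the ratio
            bits = [0] * n_stages      # reset: count from the beginning
            reset_edges.append(edge)
    return reset_edges


# A 5-stage chain dividing by 17: one reset pulse every 17 input edges.
print(ripple_divider(17, 5, 60))
```

In a real circuit the detection and reset would take a finite time, which is the delay problem noted above; the model deliberately ignores it.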


Figure 2-7 Ripple counter implementation

### 2.2.3 Pulse swallow divider

Another implementation of the feedback divider that also provides a unity step between division ratios is the pulse swallow divider. It has three blocks, as shown in figure 8:

1. "Dual-modulus prescaler": it divides by $N+1$ or $N$ according to the modulus input signal.
2. "Swallow counter": it divides by a factor of $S$, which is variable and depends on its digital input. It controls the modulus of the dual-modulus prescaler and has a reset input signal.
3. "Program counter": it divides by a constant value $P$. When it reaches the final count, it produces a reset signal to the swallow counter.


Figure 2-8 Pulse Swallow divider Architecture

### 2.2.3.1 Theory of operation

In the beginning, assume the modulus signal is 1. The dual-modulus prescaler then divides by $N+1$, and after every $N+1$ input pulses it increments the swallow counter by one. When the swallow counter reaches its full state, it changes the modulus signal. In the second mode, the prescaler divides by $N$ until the program counter reaches its final state, which then resets the swallow counter. This requires that the program counter count $P$ be larger than the swallow counter count $S$.

The number of cycles as seen by the input clock is $(N+1) S+N(P-S)=N P+S$.

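
The cycle count above can be checked by stepping through one full period of the program counter. The following is a minimal behavioural sketch, assuming ideal counters with no delay; the function name `pulse_swallow_cycles` is hypothetical.

```python
def pulse_swallow_cycles(n, p, s):
    """Count input cycles in one output period of a pulse swallow divider.

    n: lower modulus of the dual-modulus prescaler (it divides by n or n + 1)
    p: program counter value, s: swallow counter value (requires s < p)
    """
    if not 0 <= s < p:
        raise ValueError("the program counter must be larger than the swallow counter")
    input_cycles = 0
    swallowed = 0
    for _ in range(p):                 # the program counter counts p prescaler pulses
        if swallowed < s:              # first mode: prescaler divides by n + 1
            input_cycles += n + 1
            swallowed += 1
        else:                          # second mode: prescaler divides by n
            input_cycles += n
    return input_cycles


# With n = 1 and p = 16, sweeping s from 0 to 15 covers ratios 16 to 31.
print([pulse_swallow_cycles(1, 16, s) for s in range(16)])
```
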
The swallow counter can be implemented as a cascade of divide-by-2 stages. The reset signal is generated by a NAND gate when the count reaches the value of the digital input, and it is stored in an RS latch, as shown in figure 9. The $R$ input of the latch is driven by the reset signal coming from the program counter. The output signal of the latch also feeds the cascaded divider to freeze it.


Figure 2-9 Swallow Counter implementation

The dual-modulus prescaler is built from synchronous flip-flops, as discussed for the $2 / 3$ divider. For a higher modulus, it can also contain asynchronous divide-by-2 stages, as shown in figure 10 for the $8 / 9$ divider.


Figure 2-10 Divide by 8/9 circuit

Generally, pulse swallow dividers are used for high division ratios. For our circuit, the division ratio range extends from 16 to 31, which requires $N \cdot P=16$ while $S$ changes from 0 to 15. Since $P$ must be larger than $S$, the only possible implementation is $N=1$ and $P=16$. This would require at least 10 flip-flops, while the design using multi-modulus dividers requires only 8. Furthermore, the critical path in this design would be in the swallow counter, which has two AND gates, one of them with 4 inputs. This performs poorly compared to the MMD, especially because the dual-modulus unit has a small modulus, so its output clock frequency is high and the critical path may fail.

### 2.2.4 Multi Modulus Divider MMD

The operation of the MMD is based on the $2 / 3$ prescaler, which is composed of 2 FFs and NAND gates, as shown in figure 11. When the prescaler signal "Pi" is 0, the first flip-flop does not affect the output of the second one, which is inverted every clock cycle, yielding a divide-by-2 output as shown in figure 12. When the prescaler signal is 1, 2 additional states are inserted and division by 3 is achieved. The MODin signal is used when cascading units, and it is held high in the most significant stage.


Figure 2-11 2/3 divider implementation


Figure 2-12 state diagram of $2 / 3$ divider

The MMD consists of a cascade of $2 / 3$ dividers where the MODin signal to each module is a combination of the current state and the input signals determining the required ratio [4].

The division range of MMD can be expressed as [5]

$$
N=2^{0} P_{0}+2^{1} P_{1}+\cdots+2^{n-1} P_{n-1}+2^{n}
$$

Since the required division ratio is 16 to 31, the required number of stages is 4. The schematic of the MMD used for division is shown in figure 13, where the flip-flop feeding the output frequency represents the most significant bit of the state. Since the flip-flops are positive-edge-triggered, this cascade counts down from 1111 to 0000 in divide-by-16 mode, when all prescaler bits are zero. As mentioned before, when the "out" signal of any block is 1, its next value will be zero; when its value is zero, the first flip-flop determines its next state, which is 0 in the case of divide-by-3 or 1 in the case of divide-by-2. In this topology, division by 3 in each block occurs when all the more significant states are ones, due to the AND gates that determine the MODin signal. When a stage divides by 3, it inserts $2^{n}$ additional cycles, where $n$ is the number of less significant stages. Hence, b0, b1, b2, and b3 add 1, 2, 4, and 8 cycles respectively.
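
The division-range expression and the stage count can be checked numerically. This is a minimal sketch of the formula only, not of the circuit; `mmd_ratio` and `stages_for_range` are illustrative names.

```python
from itertools import product


def mmd_ratio(prescaler_bits):
    """Division ratio N = 2^n + sum(2^k * P_k) for an n-stage MMD.

    prescaler_bits[k] is P_k, with P_0 belonging to the first (fastest) stage.
    """
    n = len(prescaler_bits)
    added_cycles = sum(bit << k for k, bit in enumerate(prescaler_bits))
    return 2 ** n + added_cycles


def stages_for_range(n_min, n_max):
    """Number of stages n whose range 2^n .. 2^(n+1) - 1 covers n_min..n_max."""
    n = n_min.bit_length() - 1         # largest n with 2^n <= n_min
    if n_max > 2 ** (n + 1) - 1:
        raise ValueError("range is too wide for a single plain MMD")
    return n


print(stages_for_range(16, 31))
print(sorted(mmd_ratio(bits) for bits in product((0, 1), repeat=4)))
```

Setting only b0 gives 17, only b1 gives 18, only b2 gives 20, and only b3 gives 24, which matches the 1, 2, 4, and 8 added cycles stated above.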


Figure 2-13 implementations of the $16 . .31$ divider [4]

### 2.3 Implementation

### 2.3.1 Critical path

The critical path in this circuit occurs during division by an odd number ("b0=1"). When the state goes from 0000 to 1111, all states flip, and afterwards all AND gates flip their outputs. This critical path is shown in figure 14 [5]. If the sum of all those delays exceeds a complete period, the AND gate of the least significant bit will produce zero instead of one at the rising edge, and hence the least significant stage will not introduce the additional pulse at count 14 ("1110"), as shown in figure 15.


Figure 2-14 Critical path [5]


Figure 2-15 Switching sequence showing the critical path

The solution to this problem involves changing the critical path without changing the implementation. In the previous circuit, the signal from the AND gates is generated when the state is 1111 and must reach the first flip-flops before the next state, 1110. Hence, the maximum delay through all the flip-flops and AND gates in this case is only one clock period. The problem, as shown before, is that the state before 1111 is 0000, so this transition flips all states and the whole chain of AND gates.

In the proposed circuit, the output clock is taken from the output of the $2^{\text {nd }}$ flip-flop instead of its inversion, while the AND gates are driven by the inversion. In this case, q1 takes the signal at q2 $=0$ at the state 0000, which comes after 0001, and this eliminates the previous problem because only the least significant state changes in the gate chain at this cycle. Figure 16 shows the new critical path. In general, the signal is generated at the input of the flip-flops at states $7,3,1$, and 0, and is registered at state 15 for all of them. The new switching sequence is shown in figure 17.


Figure 2-16 The critical path after modification


Figure 2-17 The switching sequence after modification

In the case of negative edge-triggered flip-flops, this result can be obtained by taking the clock states and the gates driving from the inversion of the last flipflop.

The second critical path in the old connection was at the transition from 8 to 7, as shown in figure 18, where a glitched state of 15 appears before the transition of the most significant bit and may get registered, as happened in the figure. This is eliminated in the new connection because the inverted states driving the AND gates count upwards instead of downwards.


Figure 2-18 Second critical path in the old connection

Since the MMD has less hardware overhead and optimum speed, it is used for the implementation of the wide-range divider. Figure 19 shows the schematic of the final design of the $2 / 3$ divider unit. The sizing of the transistors is based on the driving capability and the number of load transistors. Because feedback makes the delay expression complex, delay optimization is done using fminunc in Matlab, with a constraint on the load and on the sum of the widths of the transistors clocked by the IQ divider. However, due to the large load capacitance on the fast divider feeding the data latches of the RFDAC, an inverter chain is required to feed the tree; enlarging the divider transistors instead to keep the same speed would increase the load on the IQ divider and the VCO preceding it. In the slow 16-31 divider, the later stages operate at a lower frequency, and the modulus signals from the $2^{\text {nd }}, 3^{\text {rd }}$, and $4^{\text {th }}$ stages must settle within a maximum of 2, 4, and 8 input cycles respectively. These stages are scaled down to decrease the power, while taking into consideration the load capacitance on the last stage, for correct operation over the frequency range of 1 to 6 GHz at all temperatures and process corners. The aspect ratios of the designed $2 / 3$ divider are given in table 2. The aspect ratio increases in stages that have stacked transistors or drive a larger number of transistors. The smallest transistor is about 3 times the minimum width in the process to account for the interconnect capacitance, estimated at 3-5 fF for short wires inside the flip-flop.
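
The settling budgets quoted above translate directly into time at a given input frequency. The sketch below only does that arithmetic, assuming the budget for stage $k$ is $2^{k-1}$ input cycles as stated for stages 2 to 4; the function name `modulus_settling_budget` is illustrative.

```python
def modulus_settling_budget(stage, f_in_hz):
    """Return (input cycles, seconds) allowed for the modulus signal of a stage.

    Assumes the budget doubles per stage: 2, 4, and 8 input cycles for the
    2nd, 3rd, and 4th stages, as stated in the text.
    """
    if stage < 2:
        raise ValueError("the budget is only stated for stages 2 and above")
    cycles = 2 ** (stage - 1)
    return cycles, cycles / f_in_hz


# Tightest case: the upper frequency bound of 6 GHz.
for stage in (2, 3, 4):
    cycles, seconds = modulus_settling_budget(stage, 6e9)
    print(stage, cycles, round(seconds * 1e12, 1), "ps")
```

The relaxed budgets of the later stages are what allow them to be scaled down for power.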


Figure 2-19 Schematic of the $2 / 3$ divider

Table 2-2 Transistors aspect ratios

| Transistor | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| W/L | 10.6 | 13.5 | 10.4 | 15 | 13.5 | 10.4 | 8 | 15.5 |


| Transistor | M9 | M10 | M11 | M12 | M13 | M14 | M15 | M16 |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| W/L | 24.7 | 8 | 15.5 | 35 | 21 | 30 | 32 | 32 |


| Transistor | M17 | M18 |
| :--- | :--- | :--- |
| W/L | 16 | 25 |

### 2.3.2 Modular approach

Another modification to the multi-modulus divider found in the literature is the adoption of a modular cell, as shown in figure 20, which redefines the MODout signal. As shown in figure 21, the MODout signal is taken from the latch output of the first flip-flop instead of its input, and the AND gate is integrated before the latch in every stage. In this case, the ANDing with the prescaler bit is done between the two latches. This also eliminates the previous critical-path problem because the signal propagates gradually between the stages at different states.


Figure 2-20 MMD modular approach [11]

Figure 2-21 Divide-by-2/3 modular cell [9]

The problem with the TSPC flip-flop is that it does not consist of two separate latches. Also, the first stage of the flip-flop can only hold a high value when CLK is high, so the modular cell would not operate correctly. This problem can be avoided by connecting MODout directly to the output of the flip-flop, as shown in figure 22, and no timing problems will occur.


Figure 2-22 DFF only implementation of the modular MMD

### 2.3.3 Comparison of the implementations of the modular MMD

To compare the two implementations, we consider the least significant state bit, which operates at the maximum frequency. Without loss of generality, the analysis is applied to a two-bit divider; the critical path is the same for any n-bit divider. As shown in the timing diagram in figure 23, in the FF-only implementation the two state bits flip together during one input clock cycle, and then the MODout signal changes and must be registered at the next clock edge after passing through an AND gate. In the latch implementation, the MODout signal changes in one input cycle, then passes through the gate and is registered in the following input cycle, as shown in figure 25. The two critical paths are drawn in figures 24 and 26. Due to this restriction, the final design is based on the classical MMD, which yields the optimal critical path.


Figure 2-23 Timing diagram of the FF only implementation


Figure 2-24 critical path of the FF only implementation


Figure 2-25 Timing diagram of the latch implementation


Figure 2-26 Critical path of the latch implementation

### 2.3.4 Extension of the lower division range

The division range of the previous divider is 16 to 31, as required for the RFDAC. Extending the division range of the divider has also been investigated. The modification found in the literature extends the lower boundary to 8. This can be done by using an OR gate on the MODout of the most significant stage to make it always 1 when the lower range starting at 8 is used. In this case, the output is taken from the second most significant stage using a multiplexer, so the circuit behaves exactly like a 3-stage MMD. However, in some applications such as sigma-delta modulators, the prescaler bits change just before the rising edge of the clock after the last count, which is 0000. When no extension of the lower boundary is used, the new division ratio will be correct. When the extension is used and the ratio changes from a value less than 16 to a value of 16 or more, the output switches to the most significant stage, whose state is unknown. This could introduce an intermediate wrong division ratio, as illustrated in figure 27 for a change of the division ratio from 9 to 16.


Figure 2-27 Temporary division error

### 2.3.4.1 Seamless switching

The solution to this problem is to ensure that, at the end of the division cycle, the most significant stage has the same value as the other stages, which is low [9]. This is achieved by adding an asynchronous reset to the output of the $2^{\text {nd }}$ flip-flop of the most significant stage, connected to the inversion of the most significant prescaler bit. The reset is added only in the pull-down network to avoid additional delay overhead, at the cost of a small short-circuit power that occurs only in the cycle when the division ratio changes. As shown in figure 28, seamless switching is achieved when this reset is added.


Figure 2-28 Timing diagram after seamless switching

### 2.3.5 Extension of the upper division range

The modification added in this work is the extension of the upper division range. Since each of the four stages can divide by 2 or 3, the maximum division ratio can reach $3^{4}=81$. The full range is not used because the cycles added by each stage depend on the other stages, and the logic needed to exploit many of these cycles is complex. The independently added cycles are those in which only one stage divides by three while the others divide by two. A simple analysis finds that the number of independent cycles is 32. The conventional design uses only 15 of them. The modification uses the other 17 cycles independently, which makes the division range 16 to 31 and 33 to 48. As shown in the modified $2 / 3$ divider in figure 29, the added signals are called MOD2in and P4. The MOD2in signal guarantees that the more significant stages are not dividing by 3, in order to detect those 17 independent cycles. This signal moves gradually between the stages through the first flip-flop to avoid long critical paths. The MOD1in signal, used for the less significant prescaler bits, is implemented directly with combinational logic as in the classical MMD, since it can have a short critical path, as explained before, by choosing the appropriate signal for clocking the stages according to whether they use rising-edge or falling-edge flip-flops.

The modified equation for divider would be

$$
\mathrm{N}=\mathrm{P}_{0}+2 \mathrm{P}_{1}+4 \mathrm{P}_{2}+8 \mathrm{P}_{3}+17 \mathrm{P}_{4}+16
$$

For a general $n$-stage divider, a simple analysis finds that the number of these added cycles is $1+(n-2) \cdot 2^{n-1}$.
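
The counts quoted in this section can be cross-checked numerically. The sketch below assumes closed forms that are not stated in the text but reproduce its numbers for four stages: $n \cdot 2^{n-1}$ independent cycles in total (32) and $2^{n}-1$ used by the conventional design (15). All function names are illustrative.

```python
from itertools import product


def independent_cycles(n):
    """Assumed closed form: each of n stages can add 2^(n-1) cycles on its own."""
    return n * 2 ** (n - 1)


def conventional_cycles(n):
    """Cycles used by the conventional MMD: 1 + 2 + ... + 2^(n-1)."""
    return 2 ** n - 1


def extra_cycles(n):
    """Independent cycles left over for the extended range."""
    return independent_cycles(n) - conventional_cycles(n)


def extended_ratio(p0, p1, p2, p3, p4):
    """Modified divider equation N = P0 + 2 P1 + 4 P2 + 8 P3 + 17 P4 + 16."""
    return p0 + 2 * p1 + 4 * p2 + 8 * p3 + 17 * p4 + 16


print(independent_cycles(4), conventional_cycles(4), extra_cycles(4))
upper = sorted(extended_ratio(*bits, 1) for bits in product((0, 1), repeat=4))
print(upper[0], upper[-1])
```

Under these assumptions the leftover count equals $1+(n-2) \cdot 2^{n-1}$ for every $n$, and setting P4 shifts the 16 to 31 range up to 33 to 48, which leaves 32 unreachable.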


Figure 2-29 Proposed 2/3 divider for extended MMD range

To find the critical path in the new design, all the possible states should be analyzed. When CKout goes high, the more significant states change at this edge and MOD1in changes. The upper flip-flop path goes through 1 NAND gate and is independent of the other path. The lower flip-flop path goes through 2 or 1 NAND gates according to whether MOD2in is 1 or 0. Notice also that the output of the multiplexer would start changing since all the signals would be the same until the next positive edge of CKout.

When CKout goes from high to low, the worst path to the upper flip-flop is through the 2 NAND gates, which is the case when MOD2in changes from 0 to 1. The path to the lower flip-flop is through the two NAND gates, since the input to the upper flip-flop should not change, as the inputs to the NAND gate before it change from 01 to 10 with nearly the same slew rate. If glitches must be avoided, the TSPC version with CK in the pull-down network, shown previously in figure 4, can be used; it blocks any change until the NAND gate before it reaches zero. When CKout remains 0 while the stage is dividing by 3, the path to the upper flip-flop is through 2 NAND gates, while the lower flip-flop transition is not important as it does not affect the next CKout. In conclusion, the critical path is either 2 NAND gates, as in the conventional design, or a multiplexer with an AND gate producing MOD1in.

### 2.4 Results

### 2.4.1 Transient Simulation

Figure 30 shows the transient simulation of the 4-stage MMD when the input prescaler is 31 and the input frequency is 5 GHz.


Figure 2-30 Transient simulations of the 4-Stage MMD

### 2.4.2 Frequency range

The divider is implemented without extension of the division ranges to ensure optimum operation at the ratios required for the RFDAC. Figure 31 shows the ratio between the input and output frequencies against the input frequency when the divisor is 31. This ratio yields the worst delay for all paths in the feedback. The circuit operates correctly between 285 MHz and 18.1 GHz.


Figure 2-31 Output frequency ratio vs Fin when input prescaler is 31

### 2.4.3 Power dissipation

The circuit dissipates 1.56 mW at an input frequency of 5 GHz. The frequency dependence is linear, as expected, as shown in figure 32.
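
The reported operating point can be turned into an energy per input cycle, and a linear model gives a rough estimate at other frequencies. This is only a sketch: it assumes purely dynamic power that scales in proportion to frequency with no static offset, which is an idealization of the trend in figure 32. The function names are illustrative.

```python
P_REF_W = 1.56e-3      # reported power at the reference frequency
F_REF_HZ = 5e9         # reported input frequency


def energy_per_input_cycle(power_w=P_REF_W, f_hz=F_REF_HZ):
    """Average energy drawn from the supply per input clock cycle, in joules."""
    return power_w / f_hz


def scaled_power(f_hz, p_ref_w=P_REF_W, f_ref_hz=F_REF_HZ):
    """Estimated power at f_hz, assuming power is proportional to frequency."""
    return p_ref_w * f_hz / f_ref_hz


print(energy_per_input_cycle() * 1e12, "pJ per input cycle")
print(scaled_power(1e9) * 1e3, "mW estimated at 1 GHz")
```
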


Figure 2-32 Average power vs Input frequency

### 2.4.4 Jitter

In asynchronous frequency dividers, assuming an ideal signal source, each stage adds noise to the signal. This noise shifts the point at which the signal crosses a given threshold. One measure of this jitter, computed from a large number of samples, is the cycle-to-cycle jitter given by the following equation.

$$
\sigma_{\tau}^{c-c}=\lim _{N \rightarrow \infty} \sqrt{\frac{1}{N} \sum_{n=1}^{N}\left(T_{n+1}-T_{n}\right)^{2}}
$$

The jitter variance can be expressed in terms of the variance of the noise voltage and the slope (SL) of the output voltage as

$$
\sigma_{t_{0}}^{2}=\frac{\sigma_{V}^{2}}{(\mathrm{SL})^{2}}
$$
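
Both expressions are straightforward to evaluate on sampled data. The following minimal sketch replaces the limit with a finite number of measured periods; the sample values and the function names are hypothetical and are not simulation results from this work.

```python
import math


def cycle_to_cycle_jitter(periods):
    """RMS of the difference between consecutive periods (finite-N estimate)."""
    diffs = [later - earlier for earlier, later in zip(periods, periods[1:])]
    if not diffs:
        raise ValueError("at least two periods are required")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def jitter_from_noise(sigma_v, slope):
    """RMS timing jitter from the rms noise voltage and the slope at the threshold."""
    return sigma_v / slope


# Hypothetical period samples in seconds, for illustration only.
sample_periods = [200.0e-12, 200.1e-12, 199.9e-12, 200.0e-12]
print(cycle_to_cycle_jitter(sample_periods))

# Hypothetical 1 mV rms noise on an edge with a slope of 10 V/ns.
print(jitter_from_noise(1e-3, 1e10))
```
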

The output signal is sampled at equal periods from a reference transition. The frequency-domain "PSD" of this signal represents the fluctuation of the output from the ideal, where the DC component is the mean of the signal, which is the threshold. The instantaneous jitter is obtained by dividing this signal by the slope, producing the jitter PSD shown in figure 33. The rms value of the jitter is obtained by integrating this PSD and is found to be 161.4 fs. In future work, a synchronizer flip-flop at the output, clocked by the input frequency, can be used to minimize the jitter, provided that the delay between the input and the output is less than an input cycle.


Figure 2-33 Jitter PSD vs offset frequency

### 2.4.5 Corners

Simulation of the division ratio and the rise/fall times of the circuit across different corners is done to ensure reliable operation. The simulated corners are temperatures of -40 and 125 °C, process corners tt, ff, ss, fs, and sf, and supply variations of $-10 \%$ and $+10 \%$. The simulation results are shown in figures 34, 35, and 36. The nominal rise time and fall time are 26.3 ps and 22.4 ps respectively, and they do not exceed 38 ps and 31 ps at the worst corners, which is less than the required specification of $5 \%$ of the output period.


Figure 2-34 Division ratio at different corners


Figure 2-35 Rise time at different corners


Figure 2-36 Fall time at different corners

### 2.4.6 Summary of the achieved specifications

Table 3 summarizes the required specifications and those achieved by the implemented design. Table 4 shows a comparison between the 4-stage divider and similar dividers from the literature.

Table 2-3 Summary of the required and achieved specifications

|  | Requirement | Achieved |
| :---: | :---: | :---: |
| Frequency range | 1 to 6 GHz | 0.2 to 17 GHz <br> 0.4 to 14 GHz across corners |
| Division ratios | 2 to 3 for fast clock <br> 16 to 31 for slow clock | Achieved |
| Rise/fall time | $<5 \%$ | Fast clock: 6 ps to 10 ps across corners (2 to $2.8 \%$) <br> Slow clock: 17 ps to 40 ps across corners ($<1 \%$) |

Table 2-4 Comparison of the divider with other dividers from the literature

| Ref. | Max. Freq. (GHz) | Power (mW) | Divider ratios | Technology (nm) | Logic |
| :--- | :--- | :--- | :--- | :--- | :--- |
| [5] | 19 | 39.8 | $16 \ldots 31$ | 65 | CML |
| [6] | 12 | 28.1 | $256,260,264,268$ | 180 | CML+TSPC |
| [7] | 5.8 | 2.2 | $32,33,47,48$ | 180 | TSPC |
| This work | 18.1 | 1.56 | $16 \ldots 31$ | 65 | TSPC |

## CHAPTER 3 : I/Q DIVIDERS

### 3.1 Introduction:

An LO divider is used in many applications, such as PLLs. It is a circuit that takes a signal of frequency $f_{\text {in }}$ as an input and outputs a signal of frequency $f_{\text {out }}=\frac{f_{\text {in }}}{n}$.

LO dividers can be digital or analog; both types are discussed briefly below.

### 3.1.1 Analog Dividers:

They are not common and are used only at very high frequencies.

Examples:

## Regenerative frequency dividers:

In this architecture, the input is mixed with the feedback signal, which produces the sum and difference frequencies. An LPF then removes the high-frequency component, an amplifier restores the signal amplitude, and the output (the feedback signal) is mixed with the input again.

### 3.1.2 Digital Dividers:

They are more common in modern ICs and can work up to tens of GHz.

Examples:

### 3.1.2.1 Binary Counter:

It is a circuit that generates a binary sequence of pulses. It consists of a series of flip-flops and is used for division by an integer power of 2.

### 3.1.2.2 Johnson Counter:

It's a type of shift register network such that the last complemented output is connected to the first register input.

The output is derived from one or more of the register outputs.
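
The feedback described above can be illustrated with a short behavioural model. This is a sketch assuming ideal registers that all start at 0; `johnson_sequence` and `johnson_period` are illustrative names.

```python
def johnson_sequence(n_stages, n_clocks):
    """Return the register states of a Johnson counter over n_clocks clocks.

    The complement of the last register output is fed to the first register.
    """
    state = [0] * n_stages
    states = [tuple(state)]
    for _ in range(n_clocks):
        state = [1 - state[-1]] + state[:-1]   # shift with inverted feedback
        states.append(tuple(state))
    return states


def johnson_period(n_stages):
    """Number of clocks before the initial state repeats."""
    states = johnson_sequence(n_stages, 2 * n_stages)
    return states.index(states[0], 1)


print(johnson_sequence(2, 4))
print(johnson_period(2), johnson_period(3))
```

With two registers the state repeats every four clocks, and each register output is a square wave that is high for two clocks and low for two.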

### 3.2 Methodology:

In this work, a sine wave with a peak of 250 mV and a frequency of 10 GHz is received from the antenna, so it must be amplified and converted to a square wave that swings from 0 to 1.2 V. This is done using a self-biased LO buffer, which converts the sine wave into $C L K$ and $\overline{C L K}$.

$C L K$ and $\overline{C L K}$ are then used as the inputs of the 2:1 divider, which divides the input signal frequency by 2. This yields four signals spaced $90^{\circ}$ apart in phase at a frequency of 5 GHz (a period of 200 ps).

The 2:1 divider consists of two cross-coupled latches connected in a negative feedback loop (a period of time is needed for the signal to build up).
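
The generation of the four phases can be sketched with an idealized two-latch loop. The model below is behavioural only and makes assumptions that are not taken from the circuit: the first latch is transparent when the clock is high, the second when it is low, and delays are zero. The function names are illustrative.

```python
def divide_by_two_phases(n_half_cycles):
    """Simulate two latches in a negative feedback loop, one step per half cycle.

    Returns a list of (clk, q1, q2). The four phases are q1, q2, and their
    complements.
    """
    q1 = q2 = 0
    trace = []
    for k in range(n_half_cycles):
        clk = 1 - (k % 2)              # the clock starts high
        if clk:
            q1 = 1 - q2                # latch 1 is transparent: inverted feedback
        else:
            q2 = q1                    # latch 2 is transparent: follows latch 1
        trace.append((clk, q1, q2))
    return trace


def quadrature_timing(f_in_hz):
    """Return (output period, time step between adjacent phases) in seconds."""
    t_out = 2.0 / f_in_hz              # divide-by-2 doubles the period
    return t_out, t_out / 4.0          # 90 degrees is a quarter of the period


for clk, q1, q2 in divide_by_two_phases(8):
    print(clk, q1, 1 - q1, q2, 1 - q2)
t_out, t_step = quadrature_timing(10e9)
print(t_out * 1e12, "ps period,", t_step * 1e12, "ps per 90 degrees")
```

In this model q2 repeats q1 one half input cycle later, which is a quarter of the output period and therefore the $90^{\circ}$ spacing required between the I and Q signals.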

### 3.3 Specification:

The divider needs to be very fast to operate at high frequency (10 GHz) while consuming low power.

The supply voltage is 1.2 V, the minimum signal swing is 500 mV peak-to-peak, and the phase shift between the signals must be $90^{\circ}$.

The rise time and fall time must be less than $5 \%$ of the total period in the typical case; at the corners they must be less than $10 \%$ of the total period. The duty cycle must be around $50 \%$. The load used for the tree network is 15 fF and an inverter with an 8 um PMOS and a 6 um NMOS. CLKN and CLKP must have no skew between them.

### 3.4 Self-biased LO Buffer:

This circuit is used to convert the input sine wave into a rail-to-rail signal suitable for CMOS logic. The capacitor removes DC offsets. R must be chosen such that the output common mode is at half the supply voltage to achieve a $50 \%$ duty cycle. The self-biased buffer amplifies the signal so that it swings close to the supply voltage and to zero, and a chain of inverters is used to convert it to a square wave.

Note: The values used are not the same sizing values used in our circuit.


Figure 3-1 Self-biased LO Buffer[4]

### 3.5 I/Q Dividers literature survey(Divide-by-2):

Here we will explain different divider topologies:

### 3.5.1 Razavi Divider:

The Razavi divider consists of two latches that act as master and slave, with the complemented output fed back to the input, forming a Johnson counter as shown in Figure 3-2.


Figure 3-2 Razavi divider

To understand the operation of the Razavi divider, two cases are explained:

- When the input clock is high


## Left latch:

The PMOS transistors are switched on, and since one of the input NMOS devices is switched on, one of the outputs is pulled down to ground. However, since a static current flows, the low logic level is degraded.

For example: as shown in Figure 3-3, the input clock is high, so the PMOS transistors are switched on. Since $\phi_{2}$ is high and $\phi_{4}$ is low, the output voltage $\phi_{3}$ rises to the supply voltage, while $\phi_{1}$ maintains its state, although it is degraded due to the static current.

## Right latch:

The PMOS transistors are switched off, so both output voltages are low, either by maintaining the previous state or by being discharged through one of the outside devices.

For example: Figure 3-3 shows that when the input clock is high, $\phi_{2}$ is discharged because $\phi_{3}$ is high, while $\phi_{4}$ maintains its previous state.


Figure 3-3 Razavi divider in case that the input clock is high

- When the input clock is low

## Left latch:

The PMOS transistors are switched off, so each output voltage is either maintained low from the previous state or discharged through one of the outside NMOS devices.

For example: Figure 3-4 shows that when the input clock is low, the PMOS transistors are switched off, so $\phi_{3}$ is discharged through the outside NMOS driven by $\phi_{4}$, while $\phi_{1}$ maintains its value from the previous state.

## Right latch:

The PMOS transistors are switched on, and since one of the input NMOS devices is switched on, one of the outputs is pulled down to ground, although its low logic level is degraded because a static current flows, and the other output rises to the supply voltage.

For example: as shown in Figure 3-4, the input clock is low, so the PMOS transistors are switched on. Since $\phi_{3}$ is high and $\phi_{1}$ is low, the output voltage $\phi_{4}$ rises to the supply voltage, while $\phi_{2}$ maintains its state, although it is degraded due to the static current.


Figure 3-4 Razavi divider in case that the input clock is low

### 3.5.2 Wang Divider:

The Wang divider also consists of two latches that act as master and slave, with the complemented output fed back to the input, forming a Johnson counter.


Figure 3-5 Wang Divider

The Wang divider is considered a modification of the Razavi divider. In the Razavi divider, the outside NMOS devices still operate when the PMOS devices are switched off, so an NMOS device is added in each latch at the source of the input NMOS devices to disable them when the PMOS devices are switched off.

To understand the operation of the Wang divider, as was done for the Razavi divider, two cases are explained:

- When the input clock is high


## Left latch

The PMOS transistors and the clock NMOS devices are switched on, so the sources of the outside NMOS devices are short-circuited to ground. One of the output voltages is therefore charged and the other is discharged, according to the voltages at the gates of the outside input NMOS devices.

For example: as shown in Figure 3-6, the PMOS transistors and the NMOS clock transistors are switched on, so $\phi_{3}$ is charged since it is connected to VDD, while $\phi_{1}$ is discharged since the voltage at the gate of the input NMOS device, $\phi_{2}$, is high.

## Right latch

Since the input clock is high, the PMOS transistors and the NMOS clock transistors are switched off, so the sources of the outside input NMOS devices are open-circuited and the output signals maintain their values due to the cross-coupled pair.

For example: as shown in Figure 3-6, since the input clock is high, the PMOS transistors and the NMOS clock transistors are switched off, so the output signals maintain their state.


Figure 3-6 Wang divider in case that the input clock is high

- When the input clock is low

## Left latch

Since the input clock is low, the PMOS transistors and the NMOS clock transistors are switched off, so the sources of the outside input NMOS devices are open-circuited and the output signals maintain their state due to the cross-coupled pair.

For example: as shown in Figure 3-7, since the input clock is low, the PMOS transistors and the NMOS clock transistors are switched off, so the output signals maintain their state.

## Right latch

The PMOS transistors and the NMOS clock transistors are switched on, so the sources of the outside NMOS devices are short-circuited to ground. One of the output voltages is therefore charged and the other is discharged, according to the voltages at the gates of the outside input NMOS devices.

For example: as shown in Figure 3-7, the PMOS transistors and the NMOS clock transistors are switched on, so $\phi_{4}$ is charged since it is connected to VDD, while $\phi_{2}$ is discharged since the voltage at the gate of the input NMOS device, $\phi_{3}$, is high.


Figure 3-7 Wang divider in case that the input clock is low

In the first case, when the clock input is high, the right latch maintains its state while the left latch changes its output. In the second case, the left latch maintains its state while the right latch changes its output. This produces a phase shift of $90^{\circ}$ between the signals, and the period becomes double that of the clock, so the frequency is divided by 2, as shown in Figure 3-8.


Figure 3-8 The output signals of Wang Divider

Next, the I/Q divider used in this work and its operation are described.

### 3.5.3 I/Q Divider:

This latch consists of two cascaded parts connected together as shown in fig. 1

### 3.5.3.1 Left part:

The left part consists of two inverters driven by $D$ and $\bar{D}$, whose drains and sources are connected to two clock transistors driven by $C L K$ and $\overline{C L K}$.

When CLK is high, the clock PMOS (gate connected to $\overline{C L K}$) and the clock NMOS (gate connected to $C L K$) are switched on, so the circuit becomes an inverter that inverts the input data.

When CLK is low, the clock PMOS (connected to $\overline{C L K}$) and the clock NMOS (connected to $C L K$) are both switched off, so the cross-coupled latch stores the values of $\bar{Q}$ and $Q$, as explained in the next section.

### 3.5.3.2 Right Part (Cross-Coupled latch):

The cross-coupled latch stores the values of $\bar{Q}$ and $Q$: each output drives the gates of the NMOS and PMOS of the opposite branch (through the cross-coupling), forcing that branch to hold the complementary value.

## For example:

Assume that $\bar{Q}=1$ and $Q=0$. $\bar{Q}$ drives the gates of an NMOS and a PMOS in the cross-coupled latch, switching the PMOS off and the NMOS on, so $Q$ is kept low since it is connected to GND. Likewise, $Q$ drives the gates of the other NMOS and PMOS, switching the NMOS off and the PMOS on, so $\bar{Q}$ is kept high since it is supplied by VDD.

The cross-coupled latch can also overwrite $\bar{Q}$ and $Q$ instead of merely storing them. This happens when the left part of the latch is weaker than the right part (the cross-coupled latch). The width of the transistors connected to $\bar{D}$ and $D$ must therefore be larger than the width of the transistors of the cross-coupled latch; otherwise the output is overwritten by the cross-coupled latch, as shown in fig 3-9.


Figure 3-9 The output of the $\mathrm{FF}(\mathrm{IP})$ if width of the cross-coupled latch is greater than the width of the input data transistors


Figure 3-10 Latch of the LO divider[4]

Note: The numbers shown in fig 3-10 are not the ones used in our design.


Figure 3-11 I/Q divider[4]

Connecting the two latches together as shown in fig 3-11 forms a flip-flop. When $\overline{C L K}$ is high, the transistors connected to $\overline{C L K}$ and $C L K$ in the left latch are both switched on and the left latch acts as an inverter, while in the right latch both transistors connected to $\overline{C L K}$ and CLK are switched off, so the values of LO_QP and LO_QN are only stored.

In the second half of the cycle, where $\overline{C L K}$ is low, the left latch is disabled and stores the values of LO_IN and LO_IP, while the right latch acts as an inverter since its two transistors connected to $\overline{C L K}$ and CLK are switched on. As a result, LO_QP, LO_QN, LO_IN and LO_IP are separated by phase shifts of $90^{\circ}$. Because each of these outputs is held (not changed) for half of every clock cycle, its period is doubled, so the frequency is divided by 2 as required.
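The divide-by-2 and quadrature behaviour described above can be checked with a small behavioural model. The sketch below is only an illustration, not the transistor-level circuit: it assumes two ideal level-sensitive latches, models the negative feedback as a single logical inversion, and advances in half clock cycles. The names `simulate_divider`, `lo_i` and `lo_q` are hypothetical.

```python
def simulate_divider(n_half_cycles=16):
    """Behavioural model of a two-latch divide-by-2 (assumed ideal latches)."""
    lo_i, lo_q = 0, 0  # latch outputs, arbitrary initial state
    trace = []
    for k in range(n_half_cycles):
        clk = 1 if k % 2 == 0 else 0  # one entry per half clock cycle
        if clk == 1:
            # Left latch is transparent: it samples the inverted feedback.
            lo_i = 1 - lo_q
        else:
            # Right latch is transparent: it samples the left latch output.
            lo_q = lo_i
        trace.append((clk, lo_i, lo_q))
    return trace


trace = simulate_divider()
i_wave = [sample[1] for sample in trace]
q_wave = [sample[2] for sample in trace]
print("LO_I:", i_wave)
print("LO_Q:", q_wave)
```

In this model each output repeats every four half-cycles (two clock periods), and `q_wave` is `i_wave` delayed by one half-cycle, i.e. a quarter of the output period, which corresponds to $90^{\circ}$.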

### 3.6 I/Q dividers comparison

Here we compare the Wang divider, the Razavi divider and this work's divider.

Table 3-1 Properties of the Wang divider, the Razavi divider and this work's divider

|  | Wang divider | Razavi divider | This work's divider |
| :--- | :--- | :--- | :--- |
| Static current | Yes | Yes | No |
| Area | Considerable | Small | Big |
| Speed | Very fast | Faster | Fast |
| Duty cycle | $50 \%$ | $25 \%$ | $50 \%$ |

We choose this work's I/Q divider since it has no static current, is fast enough for our application and provides a $50 \%$ duty cycle.

### 3.7 Simulations:

### 3.7.1 Self-biased LO Buffer:

When producing $C L K$ and $\overline{C L K}$, we need them to be exact inversions of each other, so the skew between them must be removed.

To remove the skew, we start from two sine waves with a peak of 250 mV, one being the inversion of the other. Converting each of them into a square wave gives two square waves with a phase shift of exactly $180^{\circ}$, so the skew is removed.

We start with the simulation of the self-biased LO buffer, which converts a sine wave into a square wave as explained before.

Schematics are shown in figure 3-12.


Figure 3-12 Schematics of self-biased LO Buffer with transistor sizing

The self-biased buffer inverts and amplifies the input signal.
The capacitor removes the DC offset.
Note that R must be chosen such that the output common mode is half the supply, which ensures a duty cycle of $50 \%$.

We chose $R=100 \mathrm{k} \Omega$ as shown in figure 3-12.


Figure 3-13 The self-biased Buffer output vs time

To convert it to a square wave, we add a chain of inverters as shown in fig 3-14. Two inverters were needed to obtain a square wave with a low logic level of 0 V and a high logic level of 1.2 V.


Figure 3-14 Schematics of chain of inverters that comes after self-biased LO buffer

The output of the first inverter is shown in fig 3-15.


Figure 3-15 The output of first inverter vs time

As shown in fig 3-15, the chain of inverters turns the amplified sine wave of fig 3-13 (suitable for CMOS) into a square wave.

However, another inverter was needed to account for corners and mismatches.


Figure 3-16 CLKN and CLKP vs time

As shown in fig 3-16, the common-mode output is around 600 mV, which allows a $50 \%$ duty cycle. This is demonstrated in fig. 19 (duty cycle $=\frac{818-768}{100} \times 100 \%=50 \%$). The duty cycle is important because the divider is synchronized to this clock: if the input clock duty cycle is not $50 \%$, the divider output will also fail to achieve a $50 \%$ duty cycle.

### 3.7.2 I/Q Divider:

The schematic of the I/Q latch is shown in fig 3-17.


Figure 3-17 Schematics of I/Q latch with each transistor sizing

As explained before, the transistors of the cross-coupled latch must be narrower than the transistors whose gates are connected to $D$ and $\bar{D}$, to ensure that the cross-coupled latch does not overwrite the output of the latch.

As shown in fig 3-17, we estimated the parasitic capacitances due to routing in our design to build an approximate model of the layout in our schematics.

We choose the width of the clock transistors to be larger than that of the input transistors: the current in the input transistors is half that of the clock transistors, so the input transistors are half as wide. In addition, very wide transistors add parasitic capacitance that can degrade the performance.

To form a flip-flop, two I/Q latches are connected in a negative feedback loop, as shown in fig 3-18.


Figure 3-18 Schematics of I/Q Divider

The transient response of the divider outputs is plotted in fig 3-19.


Figure 3-19 CLKN and output of the divider vs time

As shown in fig 3-19, the left-latch output LO_IP responds, with the left latch acting as an inverter, when CLKN is high and maintains its state when CLKN is low, while the right-latch output DN responds, with the right latch acting as an inverter, when CLKN is low.

From fig 3-19, the duty cycle is $\frac{682.6-585.3}{200} \times 100 \%=48.6 \%$, the rise time ($10 \%$ to $90 \%$) is $\frac{995.5-973.9}{200} \times 100 \%=10.8 \%$ of the total period, and the fall time ($10 \%$ to $90 \%$) is $\frac{(1.292-1.272) \times 10^{3}}{200} \times 100 \%=10 \%$ of the total period.

The duty cycle, rise time and fall time therefore need to be improved, so we add a chain of inverters as shown in fig 3-20.


Figure 3-20 Schematics of the chain of inverters for the divider outputs with transistor sizing

As shown in fig 3-20, this chain of inverters improves the rise time, fall time and duty cycle of the divider output. We estimated parasitic capacitors to model the routing in the layout.

Since a tree network must be designed to give equal delay to all RFDAC cells, we model the tree as a 15 fF capacitor and an inverter with a PMOS of width 8 µm and an NMOS of width 6 µm, plus a 4 fF capacitor for routing effects, as shown in fig 3-21.


Figure 3-21 Schematic of modelling of tree network with transistor sizing

We then add the tree network to the I/Q divider after the chain of inverters, as shown in fig 3-22.


Figure 3-22 The final schematics of I/Q divider


Figure 3-23 The final output of I/Q divider

Using the chain of inverters decreases the rise time and fall time and improves the duty cycle. This is demonstrated in Figure 3-23: duty cycle $=\frac{833.2-733.8}{200} \times 100 \%=49.7 \%$, rise time $=\frac{589-580.9}{200} \times 100 \%=4 \%<5 \%$ and fall time $=\frac{737.2-730.6}{200} \times 100 \%=3.3 \%<5 \%$, which meets the required specifications.
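The percentages above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the time markers read from the plot are in picoseconds and that the output period is 200 ps (a 10 GHz input clock divided by 2); the helper name `percent_of_period` is hypothetical.

```python
PERIOD_PS = 200.0  # assumed output period: 10 GHz input divided by 2


def percent_of_period(t_start_ps, t_end_ps, period_ps=PERIOD_PS):
    """Express a time interval as a percentage of the output period."""
    return (t_end_ps - t_start_ps) / period_ps * 100.0


# Marker values taken from the text for the final divider output.
duty_cycle = percent_of_period(733.8, 833.2)
rise_time = percent_of_period(580.9, 589.0)
fall_time = percent_of_period(730.6, 737.2)

print(f"duty cycle = {duty_cycle:.1f} %")
print(f"rise time  = {rise_time:.2f} % of period")
print(f"fall time  = {fall_time:.2f} % of period")
```

The same helper applied to the markers before the inverter chain gives the larger rise and fall percentages quoted earlier, which makes the improvement easy to compare.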

### 3.8 Corner simulation

We run corner simulations with variations in temperature, supply and input frequency (nominally 10 GHz).

The temperature is varied between $-40^{\circ} \mathrm{C}$ and $125^{\circ} \mathrm{C}$, the supply between 1.08 V and 1.32 V, and the frequency between 8 GHz and 11 GHz.

These variations are applied at the FF, FS, SF and SS corners, and the results are plotted against the four corners as follows:

- I/Q divider with a 1.08 V supply, operating at 11 GHz at a temperature of $-40^{\circ} \mathrm{C}$


Figure 3-24 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 11 GHz

Table 3-2 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 11 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 3.949 | 4.14 | 4.051 | 4.449 |
| Fall time (% of total period) | 3.166 | 3.143 | 3.236 | 3.263 |

- I/Q divider with a 1.08 V supply, operating at 11 GHz at a temperature of $125^{\circ} \mathrm{C}$


Figure 3-25 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 11 GHz

Table 3-3 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 11 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 5.548 | 5.609 | 5.674 | 6.02 |
| Fall time (% of total period) | 4.796 | 4.609 | 4.949 | 4.976 |

- I/Q divider with a 1.32 V supply, operating at 11 GHz at a temperature of $-40^{\circ} \mathrm{C}$


Figure 3-26 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 11 GHz

Table 3-4 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 11 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 4 | 3.656 | 3.645 | 3.804 |
| Fall time (% of total period) | 3.059 | 2.937 | 3 | 2.957 |

- I/Q divider with a 1.32 V supply, operating at 11 GHz at a temperature of $125^{\circ} \mathrm{C}$


Figure 3-27 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 11 GHz

Table 3-5 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 11 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 5.278 | 5.103 | 5.189 | 5.281 |
| Fall time (% of total period) | 4.754 | 4.325 | 4.598 | 4.443 |

- I/Q divider with a 1.08 V supply, operating at 8 GHz at a temperature of $125^{\circ} \mathrm{C}$


Figure 3-28 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 8 GHz

Table 3-6 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 8 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 4 | 4.0788 | 4.126 | 4.379 |
| Fall time (% of total period) | 3.488 | 3.35 | 3.598 | 3.619 |

- I/Q divider with a 1.08 V supply, operating at 8 GHz at a temperature of $-40^{\circ} \mathrm{C}$


Figure 3-29 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 8 GHz

Table 3-7 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.08 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 8 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 2.83 | 3.011 | 2.949 | 3.157 |
| Fall time (% of total period) | 2.303 | 2.286 | 2.357 | 2.363 |

- I/Q divider with a 1.32 V supply, operating at 8 GHz at a temperature of $-40^{\circ} \mathrm{C}$


Figure 3-30 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 8 GHz

Table 3-8 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $-40^{\circ} \mathrm{C}$ and a frequency of 8 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 2.6 | 2.653 | 2.649 | 2.768 |
| Fall time (% of total period) | 2.225 | 2.138 | 2.176 | 2.15 |

- I/Q divider with a 1.32 V supply, operating at 8 GHz at a temperature of $125^{\circ} \mathrm{C}$


Figure 3-31 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 8 GHz

Table 3-9 Phase shift, rise time, fall time and duty cycle vs. corners for the divider at a supply of 1.32 V, a temperature of $125^{\circ} \mathrm{C}$ and a frequency of 8 GHz

|  | FF | FS | SF | SS |
| :--- | :--- | :--- | :--- | :--- |
| Phase shift | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ | $\approx 90 \mathrm{deg}$ |
| Duty cycle | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ | $\approx 50 \%$ |
| Rise time (% of total period) | 3.841 | 3.669 | 3.768 | 3.844 |
| Fall time (% of total period) | 3.458 | 3.145 | 3.344 | 3.234 |

From the corner results above, we deduce that the worst case is the SS corner at a temperature of $125^{\circ} \mathrm{C}$ and a supply voltage of 1.08 V, since its rise time and fall time are higher than in all other cases.
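The worst-case search can also be done programmatically. The sketch below copies the rise-time and fall-time values (in % of total period) from Tables 3-2 to 3-9 into a dictionary and looks for the maximum; the data layout and the names `RESULTS` and `worst_case` are illustrative assumptions, not part of the original simulation flow.

```python
CORNERS = ("FF", "FS", "SF", "SS")

# (supply in V, temperature in degC, input frequency in GHz)
#   -> (rise times, fall times), each ordered as CORNERS, in % of total period
RESULTS = {
    (1.08, -40, 11): ((3.949, 4.14, 4.051, 4.449), (3.166, 3.143, 3.236, 3.263)),
    (1.08, 125, 11): ((5.548, 5.609, 5.674, 6.02), (4.796, 4.609, 4.949, 4.976)),
    (1.32, -40, 11): ((4.0, 3.656, 3.645, 3.804), (3.059, 2.937, 3.0, 2.957)),
    (1.32, 125, 11): ((5.278, 5.103, 5.189, 5.281), (4.754, 4.325, 4.598, 4.443)),
    (1.08, 125, 8): ((4.0, 4.0788, 4.126, 4.379), (3.488, 3.35, 3.598, 3.619)),
    (1.08, -40, 8): ((2.83, 3.011, 2.949, 3.157), (2.303, 2.286, 2.357, 2.363)),
    (1.32, -40, 8): ((2.6, 2.653, 2.649, 2.768), (2.225, 2.138, 2.176, 2.15)),
    (1.32, 125, 8): ((3.841, 3.669, 3.768, 3.844), (3.458, 3.145, 3.344, 3.234)),
}


def worst_case(metric_index):
    """Return (value, condition, corner) of the largest entry.

    metric_index 0 selects rise time, 1 selects fall time.
    """
    worst = None
    for condition, metrics in RESULTS.items():
        for corner, value in zip(CORNERS, metrics[metric_index]):
            if worst is None or value > worst[0]:
                worst = (value, condition, corner)
    return worst


worst_rise = worst_case(0)
worst_fall = worst_case(1)
print("worst rise time:", worst_rise)
print("worst fall time:", worst_fall)
```

Both searches point to the same condition, the SS corner at 1.08 V and $125^{\circ} \mathrm{C}$ with the 11 GHz input.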

### 3.9 Summary of the results:

After completing the design and simulating it in the ideal case and across corners, we summarize the results in the next table. The worst-corner column uses the results of the SS corner at a temperature of $125^{\circ} \mathrm{C}$ and a supply voltage of 1.08 V.

Table 3-10 A Table that illustrates the specifications achieved and required

| Specification | Required | Achieved | Achieved in the worst corner |
| :--- | :--- | :--- | :--- |
| Technology |  |  |  |
| Frequency produced | 5 GHz | 5 GHz | 5 GHz |
| Phase shift | 90 deg | 90 deg | 90 deg |
| Duty cycle | $50 \%$ | $50 \%$ | $50 \%$ |
| Rise time | <5\% in ideal and <10\% in worst corner | $4 \%$ | $6 \%$ |
| Fall time | <5\% in ideal and <10\% in worst corner | $3.3 \%$ | $4.976 \%$ |

## CHAPTER 4 : POWER AMPLIFIER

### 4.1 Introduction

Power amplifiers (PAs) are the most power-hungry building block of RF transceivers and pose difficult design challenges. In the past ten years, the design of PAs has evolved considerably, drawing upon relatively complex transmitter architectures to improve the trade-off between linearity and efficiency [8]. Several general considerations apply when designing PAs. The general Polar RF-Mixing-DAC transmitter is shown in Fig. 4-1.


Figure 4-1: Simplified block diagram of the proposed RF-DAC transmitter, based on an I/Q vector modulator.

### 4.2 General considerations

Many factors affect the design procedure, such as the effect of high currents, efficiency, linearity, and the choice between single-ended and differential PAs. Most of this discussion is taken from [8].

### 4.2.1 Effect of High Currents

One of the challenges in the design of power amplifiers and packages is the large currents passing through the output device and the matching network. If the output transistor is chosen to carry a large current, the input capacitance will be very large, making the preceding stage difficult to design. We may solve this problem by interposing a number of tapered stages between the upconversion mixer(s) and the output stage, as shown in Fig. 4-2.


Figure 4-2: Tapering in a TX chain.

### 4.2.2 Efficiency

Since PAs are the most power-hungry block in RF transceivers, their efficiency is critical. The efficiency of a PA is described by two metrics. The first is the drain efficiency, defined as [eq. 4.1]:

$$
\begin{equation*}
\eta=\frac{P_{L}}{P_{s u p p}} \tag{4.1}
\end{equation*}
$$

where $P_{L}$ denotes the average power delivered to the load and $P_{\text {supp }}$ the average power drawn from the supply voltage. In some cases, the output stage may have a relatively low power gain, e.g., 3 dB, requiring a high input power. A quantity embodying this effect is the "power-added efficiency" (PAE), defined as [eq. 4.2]:

$$
\begin{equation*}
\text { PAE }=\frac{P_{L}-P_{\text {in }}}{P_{\text {supp }}} \tag{4.2}
\end{equation*}
$$

With the power gain $G=P_{L} / P_{\text {in }}$, this gives $\mathrm{PAE}=\eta \times\left(1-\frac{1}{G}\right)$; if $G$ is large, the PAE approaches the drain efficiency.
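The effect of a low power gain on the PAE can be illustrated numerically. This is a minimal sketch based on the relation $\mathrm{PAE}=\eta(1-1/G)$; the function name `pae` and the example drain efficiency of 50% are assumptions chosen only for illustration.

```python
def pae(drain_efficiency, gain_db):
    """Power-added efficiency from drain efficiency and power gain in dB."""
    gain = 10 ** (gain_db / 10)  # convert dB to a linear power ratio
    return drain_efficiency * (1 - 1 / gain)


# Example: a stage with 50 % drain efficiency (assumed value).
low_gain_pae = pae(0.5, 3)    # 3 dB gain, as in the text's example
high_gain_pae = pae(0.5, 30)  # 30 dB gain

print(f"PAE at  3 dB gain: {low_gain_pae:.3f}")
print(f"PAE at 30 dB gain: {high_gain_pae:.4f}")
```

With only 3 dB of gain, roughly half of the output power is supplied by the input, so the PAE drops to about half the drain efficiency; at 30 dB the two are nearly equal.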

### 4.2.3 Linearity

The linearity of PAs becomes critical for some modulation schemes. Nonlinearity leads to two effects: (1) high adjacent-channel power as a result of spectral regrowth and (2) amplitude compression. PA characterization begins with two generic tests of nonlinearity based on unmodulated tones: intermodulation and compression. For intermodulation, two sufficiently large tones are applied, with amplitudes chosen such that each main component at the output is 6 dB below the full power level, so that the tones produce the maximum desired output voltage swing when they add in phase [Fig. 4-3(a)]. For compression, a single tone is applied and its amplitude is gradually increased to determine the output 1 dB compression point [Fig. 4-3(b)].


Figure 4-3: PA characterization by (a) two-tone test, (b) compression.
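As a short check of the 6 dB figure in the two-tone test: two equal tones that add in phase must each have half the full-scale amplitude, and halving an amplitude corresponds to about 6 dB. Here $A$ is only a label introduced for the maximum desired output voltage swing.

$$
\frac{A}{2}+\frac{A}{2}=A, \qquad 20 \log _{10}\left(\frac{1}{2}\right) \approx-6.02 \mathrm{~dB}
$$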

### 4.2.4 Single-Ended and Differential PAs

Because single-ended RF circuits are easier to test, most standalone PAs are constructed as a cascade of single-ended stages [Fig. 4-4(a)]. Single-ended PAs, however, have two disadvantages. First, they waste half of the transmitter voltage gain because they sense only one output of the upconverter. This can be remedied by connecting the upconverter and the PA through a balun [Fig. 4-4(b)], although this is not always the best solution because the balun introduces losses.


Figure 4-4: Upconverter/PA interface with (a) single-ended or, (b) balun connection.

The second disadvantage is that they draw large transient currents from the supply to ground. If the supply wire inductance LB1 is comparable to LD, it alters the behaviour of the output network, as shown in Fig. 4-5(a). LB1 also allows some of the output signal to travel back to the preceding stages through the Vdd line, causing ripple in the frequency response. Similarly, the ground wire inductance LB2 degenerates the output stage and introduces feedback.


Figure 4-5: (a) feedback in single ended PA, (b) less problematic situation in a differential PA.

By contrast, the differential realization eases the two issues. This topology draws a much smaller transient current from Vdd and ground lines exhibiting less sensitivity to LB1 and LB2. A balun must now be inserted between the PA and antenna [Fig. 4-5(b)].

While using a differential PA alleviates both the voltage gain and package parasitic concerns, in most situations the PA must still drive a single-ended antenna. As a result, a balun must be placed between the PA and the antenna. See Fig. 4-6.


Figure 4-6: Use of a balun between the PA and antenna.

### 4.3 Classification of Power Amplifiers

PAs are classified into classical and switch-mode PAs.

### 4.3.1 Classical PAs

In classical PAs, both the input and the output waveforms are considered sinusoidal. The classes differ in their conduction angle, defined as the fraction of the signal period during which the transistor remains on, multiplied by $360^{\circ}$ (see Fig. 4-7). The resulting classes are A, AB, B and C.

| PA Class | Conduction Angle ( $\boldsymbol{\theta}$ ) |
| :---: | :---: |
| $\mathbf{A}$ | $\theta=2 \pi$ |
| $\mathbf{A B}$ | $\pi<\theta<2 \pi$ |
| $\mathbf{B}$ | $\theta=\pi$ |
| $\mathbf{C}$ | $0<\theta<\pi$ |

Figure 4-7 : The conduction angle of the classical PAs.

### 4.3.1.1 Class A PAs

Class A is defined as a circuit in which transistors remain on and operate linearly across the full input and output range [Fig. 4-8]. Transistor bias current is chosen higher than peak signal current to ensure that the device doesn't turn off at any point. If linearity is required, then class A operation is necessary.


Figure 4-8: Class A stage.

For maximum efficiency, Vx should swing from nearly zero up to 2Vdd. The power delivered to the load is then approximately:

$$
\begin{equation*}
P_{\text {load }}=\frac{v_{d d}^{2}}{2 R_{\text {in }}} \tag{4.3}
\end{equation*}
$$

Vdd/Rin is the constant current carried by the inductive load, so the supply delivers $V_{d d}^{2} / R_{\text {in }}$ and the highest efficiency that class A can achieve is $50 \%$. M1 dissipates the other half of the power.

### 4.3.1.2 Class B PAs

The traditional class B PA employs two parallel stages each of which conducts for only $180^{\circ}$, thereby achieving a higher efficiency than the class A counterpart, see Fig. 4-9.


Figure 4-9: Class B stage.

How does T1 combine the half-cycle current waveforms generated by M1 and M2?

Using superposition, we draw the output network in the two half cycles as shown in Fig. 4-10(a). When M1 is on, ID1 flows from node X, producing a current in the secondary that flows into RL and generates a positive $\mathrm{V}_{\text {out }}$. Conversely, when M2 is on and draws current from node Y, the secondary current flows out of RL and generates a negative $\mathrm{V}_{\text {out }}$ as shown in Fig. 4-10(b).


Figure 4-10: Output network currents during (a) positive and (b) negative output half cycles.

If the parasitic capacitances are small and the primary and secondary inductances are large, the swing above VDD is approximately half that below VDD, an undesirable situation resulting in low efficiency (see Fig. 4-11). For this reason, the secondary (or primary) of the transformer is tuned by a parallel capacitance (see Fig. 4-12).


Figure 4-11: Current and voltage waveforms in a class B stage.


Figure 4-12: Class B circuit with resonant secondary network.

Calculation of the class B PA efficiency (see Fig. 4-13):

A half-cycle sinusoidal current, $I_{D 1}=I_{p} \sin \omega_{0} t$, produces an output voltage given by:

$$
\begin{equation*}
V_{\text {out }}(t)=\frac{m}{n} I_{p} R_{L} \sin \omega_{0} t \tag{4.4}
\end{equation*}
$$

and delivers an average power of:

$$
\begin{equation*}
P_{o u t}=\left(\frac{m}{n}\right)^{2} \frac{R_{L} I_{p}^{2}}{2} \tag{4.5}
\end{equation*}
$$

The average power provided by VDD is equal to:

$$
\begin{equation*}
P_{s u p p}=2 \frac{I_{p}}{\pi} V_{D D} \tag{4.6}
\end{equation*}
$$

The drain efficiency of the class B stage is therefore:

$$
\begin{equation*}
\eta=\frac{\pi}{4 V_{D D}}\left(\frac{m}{n}\right)^{2} I_{p} R_{L} \tag{4.7}
\end{equation*}
$$



Figure 4-13: Class B circuit for efficiency calculation.

The primary of the transformer, therefore, senses a voltage waveform given by:

$$
\begin{equation*}
V_{X Y}=2 V_{p} \sin \omega_{0} t \tag{4.8}
\end{equation*}
$$

which, upon experiencing a ratio of $n /(2 m)$, yields the output voltage:

$$
\begin{align*}
V_{o u t}(t) & =\left(\frac{n}{2 m}\right) 2 V_{p} \sin \omega_{0} t  \tag{4.9}\\
& =\frac{m}{n} I_{p} R_{L} \sin \omega_{0} t
\end{align*}
$$

Then,

$$
\begin{equation*}
V_{p}=\frac{m^{2}}{n^{2}} I_{p} R_{L} \tag{4.10}
\end{equation*}
$$

If $\mathrm{V}_{\mathrm{p}}=\mathrm{VDD}$ then,

$$
\begin{aligned}
\eta & =\frac{\pi}{4} \\
& \approx 79 \%
\end{aligned}
$$
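The $\pi / 4$ result can be cross-checked numerically from the power expressions above. The sketch below is only an illustration: the component values (`v_dd`, `r_l`, `m`, `n`) are arbitrary assumptions, and the peak current is chosen so that $V_{p}=V_{D D}$.

```python
import math


def class_b_efficiency(v_dd, i_p, r_l, m, n):
    """Drain efficiency of the class B stage from its power expressions."""
    p_out = (m / n) ** 2 * r_l * i_p ** 2 / 2  # average power delivered to R_L
    p_supp = 2 * i_p / math.pi * v_dd          # average power drawn from VDD
    return p_out / p_supp


# Arbitrary example values (assumptions, not taken from the design).
v_dd, r_l, m, n = 1.2, 50.0, 1, 2

# Choose I_p so that V_p = (m/n)^2 * I_p * R_L equals VDD (full swing).
i_p = v_dd / ((m / n) ** 2 * r_l)

eta = class_b_efficiency(v_dd, i_p, r_l, m, n)
print(f"class B efficiency at full swing: {eta:.4f}")
```

The result does not depend on the particular values chosen, since $I_{p}$ is always scaled to give full swing.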

For power levels above roughly 100 mW , an off-chip balun may be used if efficiency is critical.

### 4.3.1.3 Class AB PAs

The term "class AB" is sometimes used to refer to a single-ended PA (e.g., a CS stage) whose conduction angle falls between $180^{\circ}$ and $360^{\circ}$, i.e., in which the output transistor turns off for less than half of a period. From another perspective, a class AB PA is less linear than a class A stage and more linear than a class B stage.

### 4.3.1.4 Class C PAs

In class C stages, the conduction angle is further reduced. To avoid large harmonic levels at the antenna, the matching network must provide some filtering. As $\theta$ decreases, the transistor is on for a smaller fraction of the period, thus dissipating less power. For the same reason, however, the transistor delivers less power to the load, see Fig. 4-14.


Figure 4-14: Class C stage and its waveforms.

Its efficiency is given by:

$$
\begin{equation*}
\eta=\frac{1}{4} \frac{\theta-\sin \theta}{\sin (\theta / 2)-(\theta / 2) \cos (\theta / 2)} \tag{4.11}
\end{equation*}
$$



Figure 4-15: Efficiency vs. Theta

The output power varies as:

$$
\begin{equation*}
P_{\text {out }} \propto \frac{\theta-\sin \theta}{1-\cos (\theta / 2)} \tag{4.12}
\end{equation*}
$$



Figure 4-16: Pout vs. Theta

The efficiency approaches $100 \%$ as $\theta$ approaches zero.
However, Pout also falls to zero as $\theta$ approaches zero.
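The trade-off between efficiency and output power can be evaluated directly from eq. 4.11 and eq. 4.12. The following sketch simply codes the two expressions; the function names are hypothetical, and the output power is relative (unnormalized), since eq. 4.12 is only a proportionality.

```python
import math


def class_c_efficiency(theta):
    """Drain efficiency versus conduction angle theta (radians), eq. 4.11."""
    half = theta / 2
    return 0.25 * (theta - math.sin(theta)) / (
        math.sin(half) - half * math.cos(half)
    )


def class_c_relative_power(theta):
    """Relative output power versus conduction angle, eq. 4.12."""
    return (theta - math.sin(theta)) / (1 - math.cos(theta / 2))


for label, theta in (("class A", 2 * math.pi),
                     ("class B", math.pi),
                     ("class C", math.pi / 2)):
    print(f"{label}: eta = {class_c_efficiency(theta):.3f}, "
          f"relative Pout = {class_c_relative_power(theta):.3f}")
```

Evaluating eq. 4.11 at $\theta=2 \pi$ and $\theta=\pi$ recovers the class A and class B limits of $50 \%$ and $\pi / 4$, which is a useful consistency check on the formula.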

### 4.3.2 Switch Mode PAs

Classes A, B, and C are built on the premise that the output current and voltage waveforms are sinusoidal. When this premise is abandoned, higher harmonics can be used to enhance performance. The topologies that follow rely on specialised passive output networks to shape the waveforms while limiting the time during which a transistor carries a large current and sustains a large voltage.

### 4.3.2.1 Class E PAs

Class $E$ is probably the most well-known, and certainly the most widely touted, switching mode for RF applications. Class E stages rely on specific passive output networks to shape the waveforms, minimizing the time during which the output transistor carries a large current and sustains a large voltage, and thereby improving $\eta$. The large parasitics of on-chip inductors typically dictate that matching networks be realized externally.


Figure 4-17: Class E stage

Three conditions are required for Vx (see Fig. 4-18 and 4-19):

1) As the switch turns off, Vx remains low long enough for the current to drop to zero, i.e., Vx and ID1 have nonoverlapping waveforms. This condition, guaranteed by C1, resolves the issue of the finite fall time at the gate of M1.
2) $V x$ reaches zero just before the switch turns on. This condition ensures that the VDS and ID of the switching device do not overlap in the vicinity of the turn-on point, thus minimizing the power loss.
3) $d V_{x} / d t$ is also near zero when the switch turns on. This condition lowers the sensitivity of the efficiency to violations of the second condition.


Figure 4-18: (a) Class E stage, (b) condition to ensure minimal overlap between drain current and voltage, (c) condition to ensure low sensitivity to timing errors.

The time response depends on the Q of the network and appears as shown above for underdamped, overdamped and critically-damped conditions.


Figure 4-19: Class E matching network viewed as a damped network.

The class E waveforms are shown in Fig. 4-20 [14]:


Figure 4-21: Class E, off mode


Figure 4-20: Waves of class E.

In the conventional class-E PA, the RF-choke (RFC) is assumed to have a sufficiently high reactance and the output current through the load resistor RL is essentially a sinusoid at fundamental frequency.

Under these conditions, the analytical design equations can be derived and are given by [15] (see Fig. 4-22 and 4-23):


Figure 4-23: Class E, Drain shape

$$
\begin{aligned}
R_{L} & =0.5768 \cdot \frac{V_{D D}^{2}}{P_{\text {out }}} \\
C_{1} & =0.1836 \cdot \frac{1}{\omega R_{L}} \\
L_{x} & =1.1525 \cdot \frac{R_{L}}{\omega}
\end{aligned}
$$

Figure 4-22: Design equations of class E
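As an illustration of how these design equations are used, the sketch below computes the class E component values for one set of example numbers. It assumes the standard form of the first equation, $R_{L}=0.5768 \cdot V_{D D}^{2} / P_{\text {out }}$, and the supply, output power and frequency are arbitrary assumptions rather than the values of this design; the function name `class_e_components` is hypothetical.

```python
import math


def class_e_components(v_dd, p_out, freq_hz):
    """Idealized class E design values (RF choke assumed ideal).

    Returns the load resistance (ohm), shunt capacitance C1 (F)
    and excess series inductance Lx (H).
    """
    omega = 2 * math.pi * freq_hz
    r_l = 0.5768 * v_dd ** 2 / p_out
    c_1 = 0.1836 / (omega * r_l)
    l_x = 1.1525 * r_l / omega
    return r_l, c_1, l_x


# Example numbers only (assumptions): 1.2 V supply, 100 mW, 3.5 GHz.
r_l, c_1, l_x = class_e_components(v_dd=1.2, p_out=0.1, freq_hz=3.5e9)
print(f"R_L = {r_l:.2f} ohm")
print(f"C_1 = {c_1 * 1e12:.2f} pF")
print(f"L_x = {l_x * 1e9:.3f} nH")
```

For a low supply voltage the required load resistance is only a few ohms, which is why a matching network between the stage and a 50 ohm antenna is generally needed.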

### 4.3.2.2 Class F PAs

Class F relies on the idea of harmonic termination. If, in a generic switching stage, the load presents a high impedance at the second or third harmonic, the waveform across the switch exhibits sharper edges than a sinusoid, which reduces the power loss. In Fig. 4-24, L1 and C1 resonate at twice or three times the input frequency, so Vx approaches a rectangular waveform. If the drain current is assumed to be a half-wave rectified sinusoid, the peak efficiency of class F is equal to $88 \%$:


Figure 4-24: Class F stage

The matching network is designed such that its input impedance is low at the fundamental and high at the second harmonic. There are two groups of Class F RF power amplifiers:

1- Odd harmonic Class F power amplifiers.
2- Even harmonic Class F power amplifiers (Inverse Class F).

The efficiency increases as more harmonics are included in the resonant network, see Fig. 4-25:

| Resonant Network Order | Max. Efficiency | Max. Output Power |
| :---: | :---: | :---: |
| $\mathbf{1}$ | $78.50 \%$ | $0.7860 \mathrm{P}_{\text {out }}$ |
| $\mathbf{3}$ | $90.69 \%$ | $0.9075 \mathrm{P}_{\text {out }}$ |
| $\mathbf{5}$ | $94.77 \%$ | $0.9484 \mathrm{P}_{\text {out }}$ |

Figure 4-25: Eff. of class F

Its waveforms are shown in Fig. 4-26 [16]:


Figure 4-26: Wave shapes of class F.

### 4.3.3 Summary of PAs Classes

| Class | A | AB | B | C | D | E | F |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Transistor <br> Mode | Current <br> source | Current <br> source | Current <br> source | Current <br> source | Switch | Switch | Switch |
| Conduction <br> Angle | $2 \pi$ | $\pi \sim 2 \pi$ | $\pi$ | $0 \sim \pi$ | $\pi$ | $\pi$ | $\pi$ |
| Output <br> Power | Medium | Medium | Medium | Low | High | High | High |
| Theoretical <br> Efficiency | $50 \%$ | $50 \% \sim$ <br> $78.5 \%$ | $78.5 \%$ | $78.5 \% \sim$ <br> $100 \%$ | $100 \%$ | $100 \%$ | $100 \%$ |
| Typical <br> Efficiency | $35 \%$ | $35 \% \sim$ <br> $60 \%$ | $60 \%$ | $70 \%$ | $75 \%$ | $80 \%$ | $75 \%$ |
| Power Gain | High | Medium | Medium | Low | Low | Low | Low |
| Linearity | Very <br> high | High | High | Low | Low | Low | Low |
| Peak Drain <br> Voltage | $2 \mathrm{~V}_{\mathrm{DD}}$ | $2 \mathrm{~V}_{\mathrm{DD}}$ | $2 \mathrm{~V}_{\mathrm{DD}}$ | $2 \mathrm{~V}_{\mathrm{DD}}$ | $2 \mathrm{~V}_{\mathrm{DD}}$ | $3.6 \mathrm{~V}_{\mathrm{DD}}$ | $2 \mathrm{~V}_{\mathrm{DD}}$ |

Figure 4-27: PA Classes comparison
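The theoretical efficiencies of the current-source classes (A, AB, B, C) in the table follow from the conduction angle. A minimal sketch, assuming the standard ideal reduced-conduction-angle model (ideal transistor, full voltage swing, harmonics shorted), reproduces the 50 % and 78.5 % entries. The 120° class C angle is an arbitrary example value.

```python
import math


def ideal_drain_efficiency(theta):
    """Ideal drain efficiency for conduction angle theta in radians (0 < theta <= 2*pi)."""
    half = theta / 2
    numerator = theta - math.sin(theta)
    denominator = 4 * (math.sin(half) - half * math.cos(half))
    return numerator / denominator


# Conduction angles in degrees; the class C value is only an example
for label, degrees in [("A", 360), ("AB", 230), ("B", 180), ("C", 120)]:
    eta = ideal_drain_efficiency(math.radians(degrees))
    print(f"Class {label} ({degrees} deg): {100 * eta:.1f} %")
```

The 230° entry corresponds to the class AB conduction angle chosen later in this chapter. Its ideal efficiency lies between the class A and class B limits, as the table indicates.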

### 4.4 Specifications and Design Methodology

Table 4-1 lists the specifications that the system designer required the PA to achieve. Other metrics, such as efficiency and THD, were also evaluated.

Table 4-1: Specifications of the Required PA.

| PARAMETER | SPECIFICATIONS |
| :---: | :---: |
| Output Power | 9 dBm |
| Gain | 17 dB |
| Output Intercept Point 3 (OIP3) | $>18 \mathrm{dBm}$ |
| $Z_{\text {in }}$ (Differential) | 50 Ohm |
| $Z_{\text {out }}$ | 50 Ohm |
| Frequency Range | 1 GHz to 6 GHz |
| Bandwidth | 100 MHz |
| Fundamental Frequency | 4.01 GHz |
| Linearity | Maximum input power $=-8 \mathrm{dBm}$ |
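The output power, gain and maximum input power in the table are linked by a simple power budget. The short sketch below checks this, treating the gain as 17 dB and converting the result to milliwatts. The helper name `dbm_to_mw` is introduced here only for illustration.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


p_in_max_dbm = -8.0   # maximum input power from the specification
gain_db = 17.0        # specified gain, taken as a ratio in dB

p_out_dbm = p_in_max_dbm + gain_db
print(f"P_out = {p_out_dbm:.1f} dBm = {dbm_to_mw(p_out_dbm):.2f} mW")
```

With these values the maximum input power and the gain together give the specified 9 dBm output power.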

Cadence ADE with the TSMC 65 nm process technology is used to build and simulate the schematic design. The circuit has two sections:

1- The main stage of the PA, which uses a differential topology with a balun (transformer) at the output. The balun consists of 4 inductors with ideal mutual inductance (coupling factor $k=1$).

2- The biasing stage, which consists of an NMOS transistor and an ideal current source.

### 4.4.1 The main stage with the balun

The main stage was initially implemented as class B, see Fig. 4-12, as illustrated before. Because the linearity specification is demanding, it was changed to class AB to obtain the required gain. The balun is implemented by an ideal transformer. To avoid loading the preceding stage and making its design difficult, the capacitor was constrained not to exceed 530 fF. The output inductor was therefore chosen to be 1.5 nH, which is a reasonable value, so at the design frequency (4.01 GHz) the value of the capacitor is:

$$
C=\frac{1}{2 \omega^{2} L_{\text {out }}}=525 \mathrm{fF}
$$

The factor of 2 appears because two inductors, rather than one, are used at the output, since coupling two inductors to a single inductor through mutual inductance caused a problem. Also, to account for non-ideality, MIM capacitors (mimcaps) were used.

The drain inductors were chosen to achieve the maximum efficiency, so $\mathrm{L}=2.37 \mathrm{nH}$ for $\mathrm{R}=79\ \Omega$ (the single-ended, not differential, input impedance), which is $\mathrm{R}_{\text {opt }}$.

The transistor width was set to the maximum (6 µm) to meet the required linearity and gain. A multiplier of 2 is used, so the equivalent width is 12 µm. The supply is $\mathrm{V}_{\mathrm{dd}}=1.5 \mathrm{~V}$, which meets the requirements without stressing the transistor.

To cover the range from 1 GHz to 6 GHz, the matching network is changed. At 1 GHz the capacitor is changed to 8.44 pF, and at 6 GHz it is changed to 235 fF, without changing the value of the inductor $(1.5 \mathrm{nH})$.
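The capacitor values quoted for 1 GHz, 4.01 GHz and 6 GHz can be cross-checked with the resonance expression $C = 1/(2\omega^2 L_{out})$ and the fixed 1.5 nH inductor. The sketch below does only this arithmetic. The function name is introduced for illustration and is not part of the design flow.

```python
import math

L_OUT = 1.5e-9  # output inductor value from the text, in henries


def tuning_capacitor(freq_hz, l_out=L_OUT):
    """Capacitance that resonates with two output inductors at freq_hz."""
    omega = 2 * math.pi * freq_hz
    return 1 / (2 * omega**2 * l_out)


for freq_hz in (1e9, 4.01e9, 6e9):
    c_farads = tuning_capacitor(freq_hz)
    print(f"f = {freq_hz / 1e9:.2f} GHz -> C = {c_farads * 1e15:.1f} fF")
```

The capacitance scales with $1/f^2$, so covering 1 GHz to 6 GHz with a fixed inductor needs a capacitor range of about 36:1. This is why a switchable (programmable) matching network is considered.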

### 4.4.2 The biasing stage

A diode-connected transistor with an ideal current source at its drain sets the DC voltage at the gate $\left(\mathrm{V}_{\mathrm{gs}}\right)$. The minimum width, $\mathrm{W}_{\min }=600 \mathrm{~nm}$, was chosen so that the bias branch draws minimum current and does not load the gate of the main transistor. For a conduction angle of $180^{\circ}$ (class B), $\mathrm{I}_{\mathrm{dc}}$ is $90\ \mu \mathrm{A}$ and $\mathrm{V}_{\mathrm{gs}}=\mathrm{V}_{\mathrm{th}}=325 \mathrm{mV}$. To obtain the required gain, a conduction angle $(\theta)$ of $230^{\circ}$ (class AB) was chosen, for which $\mathrm{I}_{\mathrm{dc}}$ is $180\ \mu \mathrm{A}$ and $\mathrm{V}_{\mathrm{gs}}=390 \mathrm{mV}$.

The biasing resistance was chosen to be $20 \mathrm{k} \Omega$ so that there is no attenuation between the biasing transistor and the gate. The coupling capacitor was chosen to be 600, so that the signal is not attenuated and the preceding stages are not loaded.

### 4.5 Other topics

This section discusses conjugate match versus loadline match, and the balun.

### 4.5.1 Conjugate Match vs. Loadline Match

In small-signal analysis, the complex conjugate matching (or power matching) technique has been used extensively to maximize the output power. However, the maximum power efficiency of this technique is ideally $50 \%$, since the conjugate match requires the load impedance and the source impedance to have the same real part, so half of the available power is wasted. In PA designs, where power is very precious, such large losses are usually unaffordable, not to mention the shortened battery life and the heat dissipation challenges. PAs deal with large signals, and the best-known matching method in their design is therefore the loadline match. To distinguish the two matching techniques, conjugate match and loadline match, consider an ideal current source ( $\mathrm{I}_{\mathrm{s}}$ ) in parallel with its source resistance $\left(\mathrm{R}_{\mathrm{S}}\right)$, Fig. 4-28.


Figure 4-28: Circuitry of a current source with source resistance and load.

Conjugate match requires $R_{S}=R_{L}$, so the power delivered to the load and the voltage across it can be expressed as:

$$
\begin{equation*}
P_{L}=\frac{1}{4} I_{S}^{2} R_{L}, \quad V_{\text {out }}=\frac{1}{2} I_{S} R_{L} \tag{4.12}
\end{equation*}
$$

The catch is that the above equations take no account of the physical limits on the maximum output voltage or current of the source. When the ideal source is replaced by a real-world device, in our case a FET working as a current source, the picture changes completely. For example, consider a current generator (FET) whose maximum current is 100 mA and, for maximum power transfer, a load equal to its output impedance, say 1 kOhm. A simple calculation shows that the voltage across the generator terminals would be 50 V, which is far beyond the breakdown voltage of a typical modern FET.

To prevent the device from breaking down, the FET would have to be operated below its full capacity, so a smaller output power, resulting from the low device output current, is delivered to the load. This is not a desirable situation. In the opposite case, the FET is "current limited" and must keep the voltage at its output terminals well below its breakdown voltage in order to accommodate the maximum current allowed by the device. Fig. 4-29 shows how the optimum load impedance can be obtained by considering both the voltage and the current limitations of the device.


Figure 4-29: Obtaining optimum load impedance in loadline match.

The optimum load, shown as curve (b) in Fig. 4-29, accommodates the maximum permissible current and voltage swings at the device output, so that the two limits are reached "at the same time". Its expression is:

$$
\begin{equation*}
R=R_{S} \| R_{L}=\frac{V_{\max }}{I_{\max }} \tag{4.13}
\end{equation*}
$$
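To make the difference between the two matching techniques concrete, the sketch below reuses the 100 mA and 1 kOhm example from the text. The 3 V voltage limit is a hypothetical value assumed only for illustration, and the function names are not taken from any tool.

```python
def conjugate_match(i_s, r_s):
    """Load, output voltage and load power when R_L = R_S (conjugate match)."""
    r_l = r_s
    v_out = 0.5 * i_s * r_l
    p_l = 0.25 * i_s**2 * r_l
    return r_l, v_out, p_l


def loadline_match(v_max, i_max, r_s):
    """Total resistance R_S || R_L = V_max / I_max and the R_L that gives it."""
    r_total = v_max / i_max
    if r_total >= r_s:
        raise ValueError("V_max / I_max must be smaller than R_S")
    r_l = 1.0 / (1.0 / r_total - 1.0 / r_s)
    return r_total, r_l


I_MAX = 0.1     # 100 mA, from the example in the text
R_S = 1000.0    # 1 kOhm, from the example in the text
V_MAX = 3.0     # hypothetical voltage limit, assumed for illustration

_, v_conj, p_conj = conjugate_match(I_MAX, R_S)
r_total, r_load = loadline_match(V_MAX, I_MAX, R_S)
print(f"Conjugate match: V_out = {v_conj:.1f} V, P_L = {p_conj:.2f} W")
print(f"Loadline match: R_S || R_L = {r_total:.1f} ohm, R_L = {r_load:.2f} ohm")
```

The conjugate match asks for 50 V across the device. The loadline match instead picks a much smaller load, so that the current and voltage limits are reached together.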

The two scenarios discussed here, conjugate and loadline match, are both widely used in modern transceiver designs. Yet when applying the theory, one should always keep in mind that the basic conjugate match theorem only applies to unrestricted cases, where the currents and voltages at the generator terminals are not bounded by physical constraints. The loadline match is a real-world compromise, which is necessary to extract the maximum power from RF transistors while keeping the RF voltage swing within the specified limits and/or the available DC supply. Fig. 4-30 shows the Pin vs. Pout plot of a Class A PA when conjugate match (solid line) and loadline match (dashed line) are applied to the load design. The conjugate match yields a 1-dB compression power significantly lower than its loadline match counterpart [17].


Figure 4-30: Compression characteristics for conjugate (S22) match (solid curve) and power match (dashed curve). The 1-dB compression points ( $B, B^{\prime}$ ) and maximum linear power points ( $A, A^{\prime}$ ) show improvements under power match conditions.

### 4.5.2 Transformer (Balun) Analysis

Transformers offer an alternative to the LC tank network for impedance transformation in PAs. The behavior of ideal transformers has been well studied. Fig. 4-31 shows a typical transformer-based output network with non-idealities, where R1 and R2 represent the series resistances of the primary and secondary inductors (L1, L2), $M$ is the transformer's mutual inductance, and $m$ is the transformer's turn ratio between the primary and secondary coils. The expressions for R1, R2, $M$ and $m$ are given in Eq. 4.14 to Eq. 4.16 [5].


Figure 4-31: Transformer output network with non-idealities.

$$
\begin{gather*}
R_{1}=\frac{\omega L_{1}}{Q_{1}}, \quad R_{2}=\frac{\omega L_{2}}{Q_{2}}  \tag{4.14}\\
M=k \sqrt{L_{1} \cdot L_{2}}  \tag{4.15}\\
m=k \cdot \sqrt{\frac{L_{2}}{L_{1}}} \approx\left|\frac{I_{1}}{I_{2}}\right| \approx\left|\frac{V_{2}}{V_{1}}\right| \tag{4.16}
\end{gather*}
$$

The coupling factor $k$ $(0 \leq k \leq 1)$ in Eq. 4.15 represents the degree of coupling between the primary and secondary coils. $k=1$ means perfect coupling, while $k=0$ describes a system in which the primary and secondary coils are independent of each other.

The relation between the resistances, $m$ and the inductances is given by Eq. 4.17. Since $R=V / I$,

$$
\begin{equation*}
R_{1}=\frac{L_{1}}{L_{2}} R_{2}=\frac{1}{m^{2}} R_{2} \tag{4.17}
\end{equation*}
$$
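A small numerical sketch of Eq. 4.14 to Eq. 4.17 is given below. The inductances, quality factors and frequency are hypothetical example values, not the values of the balun used in this design. With perfect coupling ($k=1$) and equal quality factors, the ratio of the two resistances follows the $1/m^2$ relation.

```python
import math


def transformer_params(l1, l2, k, q1, q2, freq_hz):
    """Return (R1, R2, M, m) for a two-winding transformer model."""
    omega = 2 * math.pi * freq_hz
    r1 = omega * l1 / q1                # primary series resistance, Eq. 4.14
    r2 = omega * l2 / q2                # secondary series resistance, Eq. 4.14
    mutual = k * math.sqrt(l1 * l2)     # mutual inductance, Eq. 4.15
    turn_ratio = k * math.sqrt(l2 / l1) # turn ratio, Eq. 4.16
    return r1, r2, mutual, turn_ratio


# Hypothetical example values chosen for illustration only
r1, r2, mutual, turn_ratio = transformer_params(
    l1=1e-9, l2=4e-9, k=1.0, q1=10.0, q2=10.0, freq_hz=4.01e9
)
print(f"R1 = {r1:.3f} ohm, R2 = {r2:.3f} ohm")
print(f"M = {mutual * 1e9:.2f} nH, m = {turn_ratio:.2f}")
print(f"R2 / m^2 = {r2 / turn_ratio**2:.3f} ohm")
```

For $k<1$ the turn ratio $m$ drops below $\sqrt{L_2/L_1}$, so the simple $1/m^2$ scaling of Eq. 4.17 holds only for ideal coupling.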

## Simulation Results

We will simulate the circuit to draw the:

1. Output vs. Input.
2. OIP3 and Output Power.
3. Compression Point.
4. Efficiency.
5. Harmonics.

Transient analysis is used for item 1, PAC analysis for item 2, and PSS analysis for items 3, 4 and 5.

The PA schematic is shown in Fig. 4-32:


Figure 4-32: Our Linear PA Schematic

### 4.5.3 Typical Simulation Results

The PA is simulated at the fundamental frequency, 4.01 GHz, and at the edges of the band, 3.96 GHz and 4.06 GHz. To cover the full range, the PA is also simulated at 1 GHz and 6 GHz, for which the matching network has to be redesigned.

### 4.5.3.1 Output vs. input

For $\mathrm{F}=4.01 \mathrm{Ghz}$ :



Figure 4-33: vout and vin for 4.01Ghz

For $\mathrm{F}=3.96 \mathrm{Ghz}$ :



Figure 4-34: vout and vin for 3.96Ghz

For $\mathrm{F}=4.06 \mathrm{GHZ}$ :


Figure 4-35: vout and vin at 4.06Ghz

For $\mathrm{F}=1 \mathrm{G}$, without changing the matching network:


Figure 4-36: vout and vin at 1 Ghz , without changing the matching network

For $\mathrm{F}=1 \mathrm{G}$, with changing the matching network:


Figure 4-37: vout and vin at 1 Ghz, with changing the matching network

For $\mathrm{F}=6 \mathrm{G}$, without changing the matching network:


Figure 4-38: vout and vin at 6Ghz, without changing the matching network

For $\mathrm{F}=6 \mathrm{G}$, with changing the matching network:


Figure 4-39: vout and vin at 6Ghz, with changing the matching network

In summary, Figs. 4-33 to 4-39 show that within the band the results meet the requirements. At the edges of the range (1 GHz and 6 GHz), the requirements are not met unless the matching network is changed, so a programmable PA will be used.

### 4.5.3.2 OIP3:

For $\mathrm{F}=4.01 \mathrm{GHZ}$ :


Figure 4-40: IIP3 for 4.01 Ghz

For $\mathrm{F}=3.96 \mathrm{GHZ}$ :


Figure 4-41: IIP3 for 3.96 Ghz

For $\mathrm{F}=4.06 \mathrm{GHZ}$ :


Figure 4-42: IIP3 for 4.06 Ghz

For $\mathrm{F}=1 \mathrm{GHZ}$, without changing:


Figure 4-43: IIP3 for 1 Ghz, without changing

For $\mathrm{F}=1 \mathrm{GHZ}$, with changing:


Figure 4-44: IIP3 for 1 Ghz, with changing

For $\mathrm{F}=6 \mathrm{GHZ}$, without changing:


Figure 4-45: IIP3 for 6 Ghz, without changing

For $\mathrm{F}=6 \mathrm{GHZ}$, with changing:


Figure 4-46: IIP3 for 6 Ghz, with changing

In summary, Figs. 4-40 to 4-46 show that within the band the results meet the requirements, which are OIP3 $>18 \mathrm{dBm}$ and Pout $>9 \mathrm{dBm}$. At 1 GHz and 6 GHz, the requirements are not met unless the matching network is changed, so a programmable PA will be used.
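The OIP3 figure can be related to the input intercept point and to the expected third-order intermodulation level with the usual small-signal two-tone relations. The sketch below assumes a gain of 17 dB, an OIP3 of 26 dBm (the achieved value reported in the summary table) and an output power of 9 dBm per tone. Treating 9 dBm as the per-tone power is an assumption made only for this example.

```python
def iip3_from_oip3(oip3_dbm, gain_db):
    """Input-referred intercept point from the output-referred one."""
    return oip3_dbm - gain_db


def im3_dbc(p_out_dbm, oip3_dbm):
    """Third-order intermodulation level relative to the carrier, in dBc."""
    return 2 * (p_out_dbm - oip3_dbm)


gain_db = 17.0        # specified gain
oip3_dbm = 26.0       # achieved OIP3 reported in the summary table
p_out_dbm = 9.0       # assumed output power per tone (illustrative)

print(f"IIP3 = {iip3_from_oip3(oip3_dbm, gain_db):.1f} dBm")
print(f"IM3 at {p_out_dbm:.0f} dBm per tone = {im3_dbc(p_out_dbm, oip3_dbm):.1f} dBc")
```

These relations extrapolate the small-signal 3:1 slope, so they become optimistic as the PA approaches compression.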

### 4.5.3.3 Compression point

For $\mathrm{F}=4.01 \mathrm{GHZ}$ :


Figure 4-47: Compression point For F=4.01 Ghz

For $\mathrm{F}=3.96 \mathrm{GHZ}$ :



Figure 4-48: Compression point For $\mathrm{F}=3.96 \mathrm{Ghz}$

For $\mathrm{F}=4.06 \mathrm{GHZ}$ :


Figure 4-49: Compression point For $\mathrm{F}=4.06 \mathrm{Ghz}$

For $\mathrm{F}=1 \mathrm{GHZ}$, without changing:


Figure 4-50: Compression point For $\mathrm{F}=1 \mathrm{GHZ}$, without changing

For $\mathrm{F}=1 \mathrm{GHZ}$, with changing:


Figure 4-51: Compression point For F=1GHZ, with changing

For $\mathrm{F}=6 \mathrm{GHZ}$, without changing:



Figure 4-52: Compression point For $\mathrm{F}=6 \mathrm{GHZ}$, without changing

For $\mathrm{F}=6 \mathrm{GHZ}$, with changing:



Figure 4-53: Compression point For $\mathrm{F}=6 \mathrm{GHZ}$, with changing

In summary, Figs. 4-47 to 4-53 show that within the band the results meet the requirement $\mathrm{P}_{1 \mathrm{~dB}}>-8 \mathrm{dBm}$ (the maximum Pin). At 1 GHz and 6 GHz this requirement is met even without changing the matching network, so a programmable PA is not needed for this specification.

### 4.5.3.4 The Efficiency

For $\mathrm{F}=4.01 \mathrm{GHZ}$ :


Figure 4-54: Efficiency of $\mathrm{F}=4.01 \mathrm{Ghz}$

For $\mathrm{F}=3.96 \mathrm{GHZ}$ :


Figure 4-55: Efficiency of $\mathrm{F}=3.96 \mathrm{Ghz}$

For $\mathrm{F}=4.06 \mathrm{GHZ}$ :


Figure 4-56: Efficiency of $\mathrm{F}=4.06 \mathrm{Ghz}$

For 1GHZ, without changing:


Figure 4-57: Efficiency of $\mathrm{F}=1 \mathrm{Ghz}$, without changing

For 1GHZ, with changing:


Figure 4-59: Efficiency of $\mathrm{F}=1 \mathrm{Ghz}$, with changing

For 6GHZ, without changing:


Figure 4-58: Efficiency of $\mathrm{F}=6 \mathrm{Ghz}$, without changing

For 6GHZ, with changing:



Figure 4-60: Efficiency of $\mathrm{F}=6 \mathrm{Ghz}$, with changing

In summary, Figs. 4-54 to 4-60 show that within the band the PA reaches its maximum efficiency, $12.6 \%$, at the maximum input power of $-8 \mathrm{dBm}$. At 1 GHz and 6 GHz the maximum efficiency is not reached unless the matching network is changed, so a programmable PA will be used.

### 4.5.3.5 The harmonics

For $\mathrm{F}=4.01 \mathrm{GHZ}$ :


Figure 4-61: The harmonics of $\mathrm{F}=4.01 \mathrm{Ghz}$

For $\mathrm{F}=3.96 \mathrm{GHZ}$ :


Figure 4-62: The harmonics of $\mathrm{F}=3.96 \mathrm{Ghz}$

For $\mathrm{F}=4.06 \mathrm{GHZ}$ :


Figure 4-64: The harmonics of $\mathrm{F}=4.06 \mathrm{Ghz}$

For $\mathrm{F}=1 \mathrm{GHZ}$, without changing:


Figure 4-63: The harmonics of $\mathrm{F}=1 \mathrm{Ghz}$, without changing

For $\mathrm{F}=1 \mathrm{GHZ}$, with changing:


Figure 4-65: The harmonics of $\mathrm{F}=1 \mathrm{Ghz}$, with changing

For $\mathrm{F}=6 \mathrm{GHZ}$, without changing:


Figure 4-66: The harmonics of $\mathrm{F}=6 \mathrm{Ghz}$, without changing

For $\mathrm{F}=6 \mathrm{GHZ}$, with changing:


Figure 4-67: The harmonics of $\mathrm{F}=6 \mathrm{Ghz}$, with changing

In summary, Figs. 4-61 to 4-67 show that within the band the PA achieves good harmonic rejection: 43 dB for odd harmonics and 50 dB for even harmonics, the latter owing to the differential topology. At 1 GHz and 6 GHz the same rejection is not achieved unless the matching network is changed, so a programmable PA will be used.

### 4.5.4 Performance Summary

### 4.5.4.1 Typical Summary

Table 4-2: Summary of Typical Results.

| PARAMETER | SPECIFICATIONS | ACHIEVEMENTS |
| :---: | :---: | :---: |
| Output Power | 9 dBm | $>9 \mathrm{dBm}$ |
| Gain | 17 dB | 17 dB |
| Output Intercept Point 3 (OIP3) | $>18 \mathrm{dBm}$ | 26 dBm |
| Frequency Range | 1 GHz to 6 GHz | Using Programmable PA |
| Bandwidth | 100 MHz | $>100 \mathrm{MHz}$ |
| Linearity | Maximum input power $=-8 \mathrm{dBm}$ | $>-2 \mathrm{dBm}$ |

## CHAPTER 5 : CONCLUSION

Our survey found that, for operation between 1 GHz and 6 GHz, TSPC logic can be used to implement the multi-modulus divider (MMD) for low-power operation with good rise and fall times; the 4-stage MMD implementation dissipates 600 µW. Furthermore, this work introduced an extension of the MMD division range that adds the range from 33 to 48 without overhead on the critical path. It was also shown that the I/Q divider is fast enough for a 10 GHz signal with adequate rise and fall times, and that the phase shift between the output signals is 90 degrees, which helps to minimize the BER at the receiver. The effect of mismatch still needs to be studied, because it can add skew between the two clocks, which would disturb the divider operation and change the phase shift between the signals, increasing the BER. Mismatch (Monte Carlo) analysis was not covered in this work. In addition, the classes of power amplifiers and related design topics were discussed and analyzed. A class AB power amplifier was designed and optimized to achieve high linearity and high gain for 5G communications. The proposed PA design was implemented and simulated using the Cadence Virtuoso design suite, and was optimized to meet the given specifications under typical conditions.

## FUTURE WORK

1. Implementation of DAC tree and calibration blocks.
2. Corners simulation for the Power Amplifier.
3. Monte Carlo simulation.
4. Programmable PA.
5. Layout simulation.

## REFERENCES

[1] G. E. Moore. "Cramming more components onto integrated circuits". In: Electronics 38.8 (Apr. 1965), pp. 114-117.
[2] A. K. Kruth. "The Impact of Technology Scaling on Integrated Analogue CMOS RF Front-Ends for Wireless Applications". Ph.D. thesis. RWTH Aachen University, 2008. url: http://darwin.bth.rwth-aachen.de/opus3/volltexte/2008/2563/.
[3] A. Elkholy, S. Saxena, R. K. Nandwana, A. Elshazly and P. K. Hanumolu, "A 2.0-5.5 GHz Wide Bandwidth Ring-Based Digital Fractional-N PLL With Extended Range Multi-Modulus Divider," in IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1771-1784, Aug. 2016, doi: 10.1109/JSSC.2016.2557807.
[4] Niklas Zimmermann, Design and Implementation of a Broadband RF-DAC Transmitter for Wireless Communications
[5] Qun Jane Gu and Zhuo Gao ," A CMOS High Speed Multi-Modulus Divider With Retiming for Jitter Suppression,"
[6] J.-H. Tsai and H.-D. Shih, "A 7.5GHz-12GHz divide-by-256/260/264/268 frequency divider for frequency synthesizers," 2012 Int. Conf. on Microwave and Millimeter wave technology, 2012.
[7] V.K. Krishna, M.A. Do, C.C. Boon and K.S. Yeo, "A low-power single-phase clock multiband divider," IEEE Trans. VLSI Systems, vol. 20, pp. 376-380, Feb. 2012.
[8] B. Razavi. RF microelectronics (Vol. 2). New Jersey: Prentice Hall, 2011.
[9] Ahmed Elkholy, Saurabh Saxena, Romesh Kumar, and Amr Elshazly "A 2.0-5.5 GHz Wide Bandwidth Ring-Based Digital Fractional-N PLL With Extended Range Multi-Modulus Divider,"
[10] Mark Ray, William Souder, Marcus Ratcliff, Foster Dai and J. David Irwin, "A 13GHz Low Power Multi-Modulus Divider Implemented in $0.13 \mu \mathrm{~m}$ SiGe Technology,"
[11] Eissa, Mohamed \& El-Shennawy, Mohammed. (2010). A Technique for Robust Division Ratio Switching in Multi Modulus Dividers with Modulus Extension. Proceedings of the International Conference on Microelectronics, ICM. 10.1109/ICM.2010.5696212.
[12] HongMo Wang, "A 1.8 V 3 mW 16.8 GHz frequency divider in $0.25 \mu \mathrm{~m}$ CMOS," 2000 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.00CH37056), 2000, pp. 196-197, doi: 10.1109/ISSCC.2000.839746.
[13] B. Razavi, K. F. Lee and R. H. Yan, "Design of high-speed, low-power frequency dividers and phase-locked loops in deep submicron CMOS," in IEEE Journal of Solid-State Circuits, vol. 30, no. 2, pp. 101-109, Feb. 1995, doi: 10.1109/4.341736.
[14] S. Cripps, RF Power Amplifiers for Wireless Communications, Norwood, MA: Artech House, 1999.
[15] Lim, Alfred \& Tan, Aaron \& Kong, Zhi \& MA, K.. (2019). A Design Methodology and Analysis for Transformer-Based Class-E Power Amplifier. Electronics. 8. 494. 10.3390/electronics8050494.
[16] Marian K. Kazimierczuk, RF Power Amplifiers, New York: Wiley, 2015.
[17] FANG, Qiang, CMOS RF Power Amplifier Design for Wireless Communications, UC Riverside Electronic Theses and Dissertations, 2012.

