UNIVERSITY OF SOUTHAMPTON

FACULTY OF PHYSICAL SCIENCES AND ENGINEERING

School of Optoelectronics Research Centre

**Clock Generation for Silicon Photonics based Optical Communication Systems** 

by

## **Fanfan Meng**

Thesis for the degree of Doctor of Philosophy

Feb, 2018

#### UNIVERSITY OF SOUTHAMPTON

## **ABSTRACT**

#### FACULTY OF PHYSICAL SCIENCES AND ENGINEERING

### ELECTRONIC AND ELECTRICAL ENGINEERING

Thesis for the degree of Doctor of Philosophy

## **Clock Generation for Silicon Photonics based Optical Communication Systems**

### By Fanfan Meng

As communications data traffic continues to increase, electronic interconnects over short reaches are struggling to keep up with bandwidth and power consumption requirements. One technology trend is to migrate from copper to optical interconnects, where silicon photonics (SiP) technology has emerged as an excellent technical solution to meet the performance and cost requirements of these short-reach applications.

The clock generation system is a critical module that no communication system can overlook. However, the reported clock generation solutions used in SiP transceivers inherit limitations from traditional electronic interconnects, where the clock signals are limited by the frequency tuning range, system settling time and the number of clock phases. The motivation for this PhD project is to build a novel clock generation system that can be fully integrated with future SiP transceivers; the innovation has been realized in several aspects of the work.

Firstly, a novel high-speed ring-based voltage-controlled oscillator (VCO) using inductor peaking is proposed. The proposed VCO topology was validated with four design examples fabricated in different CMOS process nodes (130nm and 65nm), and the measured results show close agreement with theoretical analysis. The figure of merit (FOM) of 203 is currently the best combination of frequency and tuning range.

Secondly, a dedicated phase-locked loop (PLL) structure combined with the inductor peaking VCO was created, focusing on the requirements of frequency controllability and system settling time for SiP communication systems. A programmable frequency range of more than 25GHz has been achieved using a 40nm process, while the measured phase locking time is always less than half a microsecond.

Finally, with the mainstream CMOS process for analogue circuit design migrating towards 28nm High-k/Metal Gate (HKMG), design methodologies for the proposed VCO have been developed to adapt to this evolution. Two specific design cases have been implemented to fully utilize the advantages of the new CMOS process and mitigate the side-effects of the 28nm HKMG process.

# **Contents**

| Section | Title |
|---------|-------|
| | Contents |
| | List of Figures |
| | List of Tables |
| | Academic Thesis: Declaration Of Authorship |
| | Acknowledgements |
| | Definitions and Abbreviations |
| | Publications |
| Chapter 1 | Introduction |
| 1.1 | Background of Silicon Photonics |
| 1.2 | Project Motivations |
| 1.2.1 | Technology Trend of Electro-Optics Interconnect |
| 1.2.2 | Targeting Demands for Clock Generation System |
| 1.3 | Design Challenges |
| 1.4 | Thesis Outline |
| 1.5 | Summary |
| Chapter 2 | Literature Review |
| 2.1 | Introduction |
| 2.2 | Traditional Broadband Serializer-Deserializers |
| 2.2.1 | Electrical Transceiver Links |
| 2.2.2 | Optical Transceiver Link |
| 2.3 | Voltage Controlled Oscillator (VCO) |
| 2.3.1 | Conventional VCO Structure |
| 2.3.2 | Broadband VCO |
| 2.3.3 | Low Noise VCO |
| 2.4 | Conventional Phase Locked Loop (PLL) |
| 2.4.1 | Type I PLL |
| 2.4.2 | Type II PLL |
| 2.5 | Alternative PLL Topologies |
| 2.5.1 | Dual Tuning PLL |
| 2.5.2 | Multiple Loop PLLs |
| 2.5.3 | Multiplying Delay Locked PLL |
| 2.5.4 | PLL-based CDR |
| 2.6 | Summary |
| Chapter 3 | Ultra-wide Tuning Range CMOS Ring-based VCO with Inductor Peaking |
| 3.1 | Introduction |
| 3.2 | Analysis on Bandwidth Enhancement of VCO |
| 3.2.1 | Analysis on Transistor Sizing to Frequency |
| 3.2.2 | Analysis on Noise Contribution |
| 3.2.3 | Analysis on Voltage Tuning Range |
| 3.3 | Common Source RO-VCO with Inductor Peaking |
| 3.4 | Inductive Peaking VCO Design Application |
| 3.4.1 | The Research Purposes of Implementation |
| 3.4.2 | Chip Level Design |
| 3.4.3 | Testing and Experimental Results |
| 3.5 | Summary |
| Chapter 4 | Theoretical Analysis of Dual-Loop Triple-Controlled PLL |
| 4.1 | Introduction |
| 4.2 | Considerations of Inductor Peaking VCO in PLL |
| 4.3 | Investigation of Classic PLL with Inductor Peaking VCO |
| 4.3.1 | Charge-pump PLL with Inductor Peaking VCO |
| 4.3.2 | Dual-tuning PLL with Inductive Peaking VCO |
| 4.4 | Proposed Dual-loop Triple-controlled PLL |
| 4.4.1 | Trade-off between Loop Bandwidth and Settling Time |
| 4.4.2 | Bandwidth Enhancement |
| 4.5 | Performance Comparison |
| 4.6 | Summary |
| Chapter 5 | Practical Implementation of Dual-Loop Triple-Controlled PLL |
| 5.1 | Introduction |
| 5.2 | Top level System Design |
| 5.3 | Architecture of DLTC-PLL |
| 5.3.1 | Inductor Peaking VCO with Voltage Tuning Range Enhancement |
| 5.3.2 | Adaptive DC Shifting Unit |
| 5.3.3 | Charge Pump with Voltage Protection |
| 5.3.4 | Fast Acquisition PFD |
| 5.3.5 | Integer-N Frequency Pre-scaler |
| 5.3.6 | Reference Clock Scheme |
| 5.4 | Output Buffer Stages |
| 5.4.1 | Output Buffer 1 |
| 5.4.2 | Output Buffer 2 |
| 5.5 | Full Chip Layout |
| 5.6 | Simulation and Testing |
| 5.6.1 | Simulation Environment |
| 5.6.2 | Simulation Results |
| 5.6.3 | Testing Measurements |
| 5.7 | Summary |
| Chapter 6 | Design Methodology and Enhancement for Ultra-small Process |
| 6.1 | Introduction |
| 6.2 | Design Case I: Advanced Layout Techniques for High-speed Analogue Circuits |
| 6.2.1 | Circuits Details |
| 6.2.2 | Experimental Results |
| 6.3 | Design Case II: Inductive Peaking VCO with Cascode Noise Reduction |
| 6.3.1 | Inductive Peaking VCO with Cascode Noise Reduction |
| 6.3.2 | Design Example |
| 6.3.3 | Simulation Results |
| 6.4 | Summary |
| Chapter 7 | Conclusions and Future |
| 7.1 | Conclusions |
| 7.2 | Future Work |
| Appendix A | Basic Concepts |
| A.1 | General Theory of Oscillation |
| A.2 | RC and LC based Oscillator |
| A.2.1 | RC based Oscillator |
| A.2.2 | LC based Oscillator |
| A.3 | Phase Noise and Reference Spur |
| Appendix B | Necessary Building Blocks |
| B.1 | Phase (Frequency) Detector (PD/PFD) |
| B.2 | Charge Pump |
| B.3 | Loop Filter |
| B.4 | Frequency Division |
| Appendix C | Additional Results of Simulations and Measurements |
| Appendix D | Design Methodology for High-speed Analogue Circuit |
| D.1 | Design flow of High-speed Analogue Circuit |
| D.2 | Design Example |
| Appendix E | Inductor Modelling |
| E.1 | Design Flow of fully customized inductor |
| E.2 | Design Example |
| | References |

# **List of Figures**

| Fig. 1- 1 The Past and Predicted Growth of the Total Internet Traffic [1]                                                                              | 1    |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|------|
| Fig. 1- 2 Data Rate Limits for Different Data Communication Distances [5]                                                                              | 2    |
| Fig. 1- 3 Realization of Processor-memory Optical Link [10]                                                                                            | 3    |
| Fig. 1-4 (a) Conceptual View of Silicon Photonics IC integrated with Electrical IC Hybrid Integration. (b) The Principle of Segmented Optical DAC [17] | -    |
| Fig. 1- 5 Technology Process Scaling (a) Power Supply Voltage (b) Threshology of Single Transistor                                                     |      |
| Fig. 2 - 1Typical High-speed Electrical Link System                                                                                                    | .13  |
| Fig. 2 - 2 Typical Optical Transceiver System                                                                                                          | . 15 |
| Fig. 2 - 3 Topology of VCO (a) LC Tank (b) Ring Based                                                                                                  | .17  |
| Fig. 2 - 4 Common Used RO-VCO (a) Current Starved (b) Common Source                                                                                    | .18  |
| Fig. 2 - 5 Block Diagram of 3-stages Ring Oscillator with LC Tank                                                                                      | .20  |
| Fig. 2 - 6 RO-VCO with Multiple Pass Loop [59]                                                                                                         | .21  |
| Fig. 2 - 7 (a) Rail-to-rail Voltage Tuning Delay Cell (b) Bias-level-shift Circuit                                                                     | .22  |
| Fig. 2 - 8 Linear Tuning Delay Cell                                                                                                                    | .23  |
| Fig. 2 - 9 (a) Full Differential (FD) Delay Cell (b) Pseudo-differential Delay Cell                                                                    | .24  |
| Fig. 2 - 10 (a) Model of Injection in a Ring Oscillator (b) Pseudo Differential De Cell with Injection Locking                                         |      |
| Fig. 2 - 11 Diagram of Scaling Kvco Delay Cell [64]                                                                                                    | .27  |
| Fig. 2 - 12 Linear Model of PLL                                                                                                                        | .28  |
| Fig. 2 - 13 Operation of XOR Phase Detector                                                                                                            | . 29 |
| Fig. 2 - 14 Conventional Type I PLL                                                                                                                    | .29  |
| Fig. 2 - 15 Schematic Diagram of Phase Frequency Detector (PFD)                                                                                        | .31  |
| Fig. 2 - 16 Block Diagram of Conventional Type II PLL                                                                                                  | .31  |

| Fig. 2 - 17 Topology of Dual Control PLL                                                                                                          |
|---------------------------------------------------------------------------------------------------------------------------------------------------|
| Fig. 2 - 18 Conventional Dual-tuning PLL with Analogue Filter                                                                                     |
| Fig. 2 - 19 Conventional Injection Locking PLL                                                                                                    |
| Fig. 2 - 20 Several Topologies of Multiple-loop PLL                                                                                               |
| Fig. 2 - 21 Multiple Loop PLL with Cascaded Injection Locking                                                                                     |
| Fig. 2 - 22 Multiple Loop PLL with Two Identical VCOs                                                                                             |
| Fig. 2 - 23 Block Diagram of Typical DLL                                                                                                          |
| Fig. 2 - 24 DLL with Multiple Loop                                                                                                                |
| Fig. 2 - 25 Multiplying Delay-locked Loop (MDLL)                                                                                                  |
| Fig. 2 - 26 Traditional CDR Topology41                                                                                                            |
| Fig. 2 - 27 (a) Hogge PD (b) Alexander PD (Bang-Bang PD)                                                                                          |
| Fig. 2 - 28 Conventional PLL-based CDR                                                                                                            |
| Fig. 2 - 29 Full Rate Referenceless CDR                                                                                                           |
| Fig. 3 - 1 Typical Three Stage RO-VCO                                                                                                             |
| Fig. 3 - 2 Transistor Sizing to Frequency Expansion                                                                                               |
| Fig. 3 - 3 Noise Contribution Comparison (a) Transistor Sizing to Noise Contribution (b) Noise Contribution Scaling with Size Ratio               |
| Fig. 3 - 4 (a) Control Voltage to Frequency Range (b) Gain-Frequency Response with <i>Nsize</i> =3                                                |
| Fig. 3 - 5 RO-VCO with Peaking Inductor                                                                                                           |
| Fig. 3 - 6 (a) Control Voltage to Frequency Response (b) Gain-Frequency Response with <i>Nsize</i> =3                                             |
| Fig. 3 - 7 Proposed Structure of Design Example (a) Top Circuit Topology (b) Proposed VCO Structure with Inductor Peaking (c) 50Ω Output Buffer55 |
| Fig. 3 - 8 Layout View of the Inductors used with (a) VCO-3 and (b) VCO-457                                                                       |

| Fig. 3 - 9 Microscope View of the Proposed Four RO-VCO (a) VCO-1 (b) VCO-2 (c)                                      |
|---------------------------------------------------------------------------------------------------------------------|
| VCO-3 (d) VCO-457                                                                                                   |
| Fig. 3 - 10 Testing Bench for Four Design Examples                                                                  |
| Fig. 3 - 11. The Spectrum of Highest Oscillation Frequency of (a) VCO-1 (b) VCO-2 (c) VCO-3 (d) VCO-4               |
| Fig. 3 - 12 Measured Frequency Tuning Characteristics                                                               |
| Fig. 3 - 13 Measured Phase Noise Results of Each Design Examples61                                                  |
| Fig. 4 - 1 (a) Frequency Tuning Characteristics of Inductor Peaking VCO (b)  Nonlinear Behaviour of the Gain of VCO |
| Fig. 4 - 2 Block Diagram of Charge Pump PLL (CP-PLL)67                                                              |
| Fig. 4 - 3 Bode Plot of Open Loop Transfer Function                                                                 |
| Fig. 4 - 4 Frequency Output of CP-PLL                                                                               |
| Fig. 4 - 5 Period Jitter Performance of CP-PLL70                                                                    |
| Fig. 4 - 6 Influences of Nonlinear Behaviours (a) Damping Factor (b) Natural Frequency                              |
| Fig. 4 - 7 Tendency under Nonlinear Behaviours (a) Loop Bandwidth (b) Phase Margin                                  |
| Fig. 4 - 8 Block Diagram of Dual Tuning PLL (DT-PLL)72                                                              |
| Fig. 4 - 9 Applied Structure of Dual Tuning VCO                                                                     |
| Fig. 4 - 10 Frequency Tuning Characteristics of (a) Coarse Tuning Voltage (b) Fine Tuning Voltage                   |
| Fig. 4 - 11 Frequency Output of DT-PLL                                                                              |
| Fig. 4 - 12 Period Jitter Performance of DT-PLL                                                                     |
| Fig. 4 - 13 Nonlinear Behaviours of (a) Gain on $V_{coarse}$ (b) Damping Factor (c) Loop Bandwidth (d) Phase Margin |
| Fig. 4 - 14 Setting Time in DT-PLL.                                                                                 |

| Fig. 4 - 15 Trade-off between Loop Bandwidth and Settling Time                                                                                                                                                                 |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Fig. 4 - 16 Instability Issue in DT-PLL                                                                                                                                                                                        |
| Fig. 4 - 17 Proposed Structure of Loop Filter                                                                                                                                                                                  |
| Fig. 4 - 18 Equivalent Structure of Proposed Loop Filter                                                                                                                                                                       |
| Fig. 4 - 19 Settling Time of DT-PLL with Proposed Loop Filter                                                                                                                                                                  |
| Fig. 4 - 20 Loop Bandwidth with Different Charge Pump Current                                                                                                                                                                  |
| Fig. 4 - 21 Block Diagram of Proposed Dual-loop Triple-Control PLL (DLTC-PLL)                                                                                                                                                  |
| Fig. 4 - 22 Settling Response of Proposed DLTC-PLL                                                                                                                                                                             |
| Fig. 5 - 1 Top Level Structure of DLTC-PLL                                                                                                                                                                                     |
| Fig. 5 - 2 Implemented Structure of DLTC-PLL                                                                                                                                                                                   |
| Fig. 5 - 3 Source (Sink) Current Scaling with Drain-source Voltage of MOSFET90                                                                                                                                                 |
| Fig. 5 - 4 Implemented Inductor Peaking VCO with Voltage Regulation91                                                                                                                                                          |
| Fig. 5 - 5 Implementation of Adaptive DC Shifting Unit95                                                                                                                                                                       |
| Fig. 5 - 6 Implementation of Gate Switch Charge Pump with Voltage Protection 97                                                                                                                                                |
| Fig. 5 - 7 Implementation of Fast Acquisition PFD Error! Bookmark not defined.                                                                                                                                                 |
| Fig. 5 - 8 Implementation of Integer-N Frequency Pre-scaler                                                                                                                                                                    |
| Fig. 5 - 9 Implementation of Reference Clock Scheme                                                                                                                                                                            |
| Fig. 5 - 10 Implemented Structure of Output Buffer 1                                                                                                                                                                           |
| Fig. 5 - 11 Implemented Structure of Output Buffer 2                                                                                                                                                                           |
| Fig. 5 - 12 Full Chip Layout of DLTC-PLL in 40nm Process                                                                                                                                                                       |
| Fig. 5 - 13 Simulation Test Bench for 40nm DLTC-PLL 108                                                                                                                                                                        |
| Fig. 5 - 14 Post Layout Results (a) Transient Respond of 4.4GHz Lock-up Time (b) Transient Respond of 35.2GHz Lock-up Time (c) Period Performance of 35.2GHz Output Frequency (d) Phase Noise Performance of 35.2GHz Frequency |

| Fig. 5 - 15 Performance with Different Protocols Frequency (a) Jitter (b) Phase Noise                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                 |
|---|
| Fig. 5 - 16 Concept View of PCB Connection |
| Fig. 5 - 17 PCBs Layout |
| Fig. 5 - 18 Measurement Environment with Photography of 40nm Silicon Die |
| Fig. 5 - 19 Settling Response between 3.6GHz and 28.8GHz |
| Fig. 5 - 20 Phase Noise Performance (a) High-frequency Band (b) Middle-frequency Band (c) Low-frequency Band |
| Fig. 6 - 1 Comparison in Different Process Node (a) Transition Frequency $f_T$ (b) Noise Floor |
| Fig. 6 - 2 Implemented Inverter-based Oscillator Example Structure with 50Ω Impedance Output Buffer |
| Fig. 6 - 3 Single Transistor with Minimum Finger Width |
| Fig. 6 - 4 Optimal Transistor Layout with a Finger Width of 440nm |
| Fig. 6 - 5 (a) Microscope View of Three Oscillator Examples (b) Layout View of Delay Cell of Three Oscillator Examples |
| Fig. 6 - 6 Averaged Measurement Results of Oscillation Frequency of Three Oscillator Examples |
| Fig. 6 - 7 Averaged Measurement Results of Power to Frequency Efficiency of Three Oscillator Examples |
| Fig. 6 - 8 Measured Frequency Spectrum and Phase Noise Results at 1MHz Offset Frequency of Three Oscillator Examples |
| Fig. 6 - 9 Averaged Phase Noise Performance of Three Oscillator Examples |
| Fig. 6 - 10 Comparison of Width versus Flicker Noise in Triode and Saturation Region |
| Fig. 6 - 11 Noise Floor Scaling with VDS |

| Fig. 6 - 12 Delay Cell of Inductive Peaking VCO (a) Conventional (b) Proposed with Cascode Noise Reduction |
|------------------------------------------------------------------------------------------------------------|
| Fig. 6 - 13 Proposed Three Stages Inductive Peaking VCO with Cascode Noise Reduction |
| Fig. 6 - 14 Phase Noise Comparison between Conventional Inductive Peaking VCO and Proposed Cascode Noise Reduction VCO during All Frequency Tuning Range |
| Fig. 6 - 15 Design Example (a) Top Structure of Differential Quadrature Inductive Peaking VCO with Cascode Noise Reduction and its Output Buffer (b) Differential Delay Cell of Proposed VCO with Symmetric Inductor (c) Layout View of Fully Customized Symmetrical Inductor |
| Fig. 6 - 16 Layout View of Proposed VCO |
| Fig. 6 - 17 Simulation Testbench |
| Fig. 6 - 18 Post-layout Simulation Results (a) Frequency Tuning Range (b) Phase Noise |
| Fig. 7 - 1 Illustrations of Integration (a) Wire Bonding (b) Flip-chip Bonding |
| Fig. 7 - 2 Potential Circuit Design based on DLTC-PLL |
| Fig. 7 - 3 Topology of Spectral Slice Synthesizer |
| Fig. 7 - 4 (a) Optical Comb Generator (b) Narrowband Clock Generation System for Optical Comb |
| Fig. A - 1 Oscillatory Changing in Feedback System with Time |
| Fig. A - 2 Common Source Amplifier in Loop with Different Stages: (a) single stage common source amplifier (b) two-stages (c) three stages |
| Fig. A - 3 Magnitude and Phase of the Impedance of an LC Tank |
| Fig. A - 4 Conversion of a Tank to Three Parallel Components: (a) Practical LC Tank Configuration. (b) Equivalent Parallel LC Tank Configuration |
| Fig. A - 5 (a) Single Tuned Stage with LC Load. (b) Two Tuned Stages in a Feedback |

| Fig. A - 6 Noise Interference on Zero Crossing                                                                       |
|----------------------------------------------------------------------------------------------------------------------|
| Fig. A - 7 Power Spectrum Density of Oscillator (a) Ideal Oscillator (b) Practical Oscillator                        |
| Fig. A - 8 Reference Spur on VCO's Tuning Node                                                                       |
| Fig. A - 9 Phase Noise and Reference Spur in Power Spectrum                                                          |
| Fig. A - 10 Phase Noise Comparison between VCO and PLL                                                               |
| Fig. B - 1 (a) Operation of XOR PD (b) The Characteristics of XOR PD |
| Fig. B - 2 Common Linear PFD Structure |
| Fig. B - 3 Characteristics of PFD (a) Ideal (b) Practical |
| Fig. B - 4 Non-ideal Behavior of Missing Clock Edge                                                                  |
| Fig. B - 5 Non-ideal Behavior of Dead Zone                                                                           |
| Fig. B - 6 Basic Block Diagram of Pre-charge PFD                                                                     |
| Fig. B - 7 Structure of Fast Acquisition Latch-based PFD                                                             |
| Fig. B - 8 Missing Clock Edge Elimination                                                                            |
| Fig. B - 9 Basic Concept of Charge Pump                                                                              |
| Fig. B - 10 Practical Implementation of Charge Pump (a) without mismatch cancellation (b) with mismatch cancellation |
| Fig. B - 11 (a) Regulated Cascade Charge Pump (b) Op-amp Regulated Charge Pump |
| Fig. B - 12 Gate Switching Charge Pump |
| Fig. B - 13 (a) Passive Loop Filter (b) Active Loop Filter |
| Fig. B - 14 Dual-path Loop Filter                                                                                    |
| Fig. B - 15 Behavior of Sample-reset Charge Pump                                                                     |
| Fig. B - 16 Alternative Register Structure for Divide-by-2 Frequency Divider (a) True Single Phase Clocked (TSPC) (b) Razavi Divider (c) Wang's Topology (d) Current Mode Latch (CML) |

| Fig. B - 17 Architecture of High Dividing Order Programmable Pre-scaler with 2/3 Dual-modulus Divider |
|-------------------------------------------------------------------------------------------------------|
| Fig. B - 18 Structure Diagram of Dual Modulus Cell |
| Fig. B - 19 Divide-by-2/3 Dual Modulus by Multiplexing Logic |

# **List of Tables**

| Table 1 - 1 Widely Applied Communication Protocols and Corresponding Data Rate |
|--------------------------------------------------------------------------------|
| Table 2 - 1 Examples of Transceiver with Corresponding Clock in Present Research Works |
| Table 3 - 1 Key Parameters of Four Design Examples |
| Table 3 - 2 Performance Summary and Comparison |
| Table 4 - 1 Parameter's Configuration Values of CP-PLL |
| Table 4 - 2 Parameter's Configuration Values of DT-PLL |
| Table 4 - 3 Performance Comparison between CP-PLL, DT-PLL and Proposed DLTC-PLL |
| Table 5 - 1 Detailed Parameters of Inductor Peaking VCO in DLTC-PLL |
| Table 5 - 2 Detailed Parameters in Adaptive DC Shifting Unit |
| Table 5 - 3 Look-up Table of Potential Output Frequencies |
| Table 5 - 4 Chip Details |
| Table 5 - 5 Parameters of Testing for High, Middle and Low Frequency Band |
| Table 5 - 6 Performance Comparison of DLTC-PLL with Recent Works |
| Table 6 - 1 Key Parameters of Three Oscillator Examples |
| Table 6 - 2 Cell Area and Parasitic Parameters of Three Oscillator Examples |
| Table 6 - 3 Averaged FOM of Three Oscillator Design Examples |
| Table 6 - 4 Key Parameters of Design Example |
| Table 6 - 5 Performance Summary and Comparisons |
| Table C - 1 Measurement Results from 6 Different Wafers |

# **Academic Thesis: Declaration Of Authorship**

I, Fanfan Meng, declare that this thesis and the work presented in it are my own and has been generated by me as the result of my own original research.

Clock Generation for Silicon Photonics based Optical Communication Systems.

I confirm that:

1. This work was done wholly or mainly while in candidature for a research degree at this University;
2. Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;
3. Where I have consulted the published work of others, this is always clearly attributed;
4. Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work;
5. I have acknowledged all main sources of help;
6. Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself;
7. Parts of this work have been published as listed in Chapter 3 and Chapter 6.

Signed: Fanfan Meng

Date: 02/03/2018

# Acknowledgements

First and foremost, I would like to take this opportunity to thank all my supervisors. I thank Professor Graham Reed for giving me the chance to participate in silicon photonics projects and for supporting the practical experiments; without this, I would not have built up my skills and experience, and my ideas would not have turned into solid achievements. I thank Professor Peter Wilson for giving me the opportunity to start my PhD journey and for all his advice throughout my PhD study; without his help, this thesis would not have been possible. I thank Dr. Ke Li for enlightening me in this area; with his knowledge and patient guidance, I was able to build my understanding of various aspects at each step of the project.

I would also like to thank all my colleagues in the silicon photonics group, especially Dr. David Thomson and Dr. Cosimo Lacava, for their valuable advice and support during the experimental testing. Special thanks also go to my colleague, Shenghao Liu, for all his encouragement and help from the first day of my PhD.

Finally, I want to acknowledge my parents, who gave me financial support and strength during my PhD studies. I will always owe them for their support, which can be neither counted nor repaid. In addition, further thanks go to my girlfriend for her encouragement and company in my most frustrating moments.

This thesis records the journey and accomplishments of my PhD life over the past four years. I would like to dedicate this thesis to my parents and my friends, in appreciation of all the company and strength they have given me.

## **Definitions and Abbreviations**

AAC Automatic amplitude control

ADC Analogue-to-digital converter

AWG Arbitrary waveform generator

BPG Bit pattern generator

CDR Clock and data recovery

CML Current mode latch

CMOS Complementary metal-oxide-semiconductor

CP Charge pump

DAC Digital-to-analogue converter

DLL Delay-locked loop

DLTC Dual-loop Triple control

DNW Deep N-well

DRAM Dynamic random-access memory

EAM Electro-absorption modulator

ECL External cavity laser

FD Frequency detector

FOM Figure of merit

FTR Frequency tuning range

HKMG High-k/Metal Gate

LC Inductor-capacitor tank

LF Loop filter

LVT Low threshold voltage

MOSFET Metal-oxide-semiconductor Field-effect transistor

MZM Mach-Zehnder modulator

N-PAM N-level pulse amplitude modulation

OTA Operational transconductance amplifier

PFD Phase frequency detector

PLL Phase locked loop

PN Phase noise

PSD Power spectral density

PSRR Power supply rejection ratio

PVT Process Voltage Temperature

QAM Quadrature amplitude modulation

RC Resistor-capacitor

RO Ring-based oscillator

SDH Synchronous digital hierarchy

Ser-Des Serializer-deserializer

SNR Signal-to-noise ratio

SiP Silicon Photonics

SoCs Systems-on-chip

SOI Silicon on insulator

SONET Synchronous optical networking

SSB Single-side band

TIA Transimpedance amplifier

TSPC True-single Phase Clock

VCDL Voltage controlled delay line

VCO Voltage-controlled oscillator

VDD Power supply voltage

VCSEL Vertical-cavity surface-emitting laser

WDM Wavelength division multiplexing

## **Publications**

During the research work of this PhD project, a number of papers were published, as listed below:

- F. Meng, K. Li, D. J. Thomson, P. Wilson, and G. T. Reed, "Advanced Layout Techniques for High-speed Analogue Circuits in a 28nm HKMG CMOS Process", *Electronics Letters*, 2018. DOI:10.1049/EL.2017.4453
- F. Meng, K. Li, D. J. Thomson, L. Cosimo, P. Wilson, and G. T. Reed, "Ultra-wideband Inductive Peaking VCO with Cascode Noise Reduction", In: 2018 IEEE International Symposium on Circuits and Systems, ISCAS, 2018.
- K. Li, F. Meng, D. J. Thomson, P. Wilson, and G. T. Reed, "Analysis and Implementation of an Ultra-Wide Tuning Range CMOS Ring-VCO with Inductor Peaking", *IEEE Microwave and Wireless Components Letters*, vol. 27, no. 1, pp. 49-51, Jan 2017.
- X. K. Ruan, K. Li, D. J. Thomson, L. Cosimo, F. Meng, I. Demirtzioglou, P. Petropoulos, Y. X. Zhu, G. T. Reed, and F. Zhang, "Experimental Comparison of Direct Detection Nyquist SSB Transmission based on Silicon Dual-drive and IQ Mach-Zehnder Modulators with Electrical Packaging", *Opt. Express*, vol. 25, no. 16, pp. 19332-19342, 2017.

# **Chapter 1 Introduction**

## 1.1 Background of Silicon Photonics

Demand for broadband access has been growing rapidly in almost every corner of the world, and broadband now forms an indispensable part of the daily life of billions of people. The development of high-throughput devices is driven by the increasing number of end users and by emerging data-intensive applications, such as network data centres, high-performance computing and 4K/8K high-definition television (HDTV), whose requirements pose significant challenges to communications network bandwidth.



Fig. 1-1 The Past and Predicted Growth of the Total Internet Traffic [1]

According to Cisco's statistics and predictions in 2017 [1], shown in fig. 1-1, global internet traffic is growing explosively: monthly data traffic is expected to exceed 278 Exabytes in 2021, a 53% growth relative to 2016.

To keep pace with Moore's Law, microprocessor architectures are transitioning from multi-core to many-core structures. According to the estimation in [2, 3], the I/O interfaces among future intra-chip architectures will require bandwidth in the range of 200Gb/s to 1Tb/s and begin the tera-scale computing era. However, under current interconnect scenarios, the traditional metallic approach for supporting data transfer has gradually reached the physical limits in both bandwidth and transmission distance. In addition, large power consumption [2] and increased system complexity [4] make the electrical-based communication network no longer able to provide a reliable solution for the increased demands on high bandwidth applications. By contrast, optical access is superior to metallic-based techniques in terms of both capacity and transmission distance. As a result, the optical-based communication networks have become a preferable candidate to break through the barriers of high-speed data transmission.



Fig. 1- 2 Data Rate Limits for Different Data Communication Distances [5]

The first demonstration of using optical fibres as a medium for light propagation dates back to the mid-1950s [6]. With the improvement of optical fibres, optical communication has been widely deployed and used in enterprise distances (over 1km) and long-haul applications. Corresponding protocols are standardized, such as synchronous optical networking (SONET) and synchronous digital hierarchy (SDH).

Although optical-based data transfer has been successfully applied to long-haul communication, its use in short-reach applications such as rack-to-rack or board-to-board communications is still limited by the high cost of optical technology. Silicon photonics has therefore emerged as a new technology to fill the gap. As shown in fig. 1-2, Intel's vision for the deployment of silicon photonics technology is typical: it mainly targets short-reach communications over distances of less than 100m, especially future intra-chip communication [5], although longer-reach applications are increasingly emerging.

Silicon photonics is an emerging field for optical communications and optical interconnects in microelectronics, and its key driving force is the use of mature complementary metal-oxide-semiconductor (CMOS) technology to achieve high-volume production at low cost [7, 8]. Combined with highly sophisticated CMOS processing techniques, silicon photonics provides an electronic-photonic platform that constructs optical waveguides on silicon surfaces to confine and direct light as the transmission carrier. Besides the high integration density and high-volume manufacturing of silicon semiconductors, the technology also offers high energy efficiency and lower packaging cost, which provides a potentially inexpensive solution for high-performance applications [9].



Fig. 1- 3 Realization of Processor-memory Optical Link [10]

For example, an attractive system application for silicon photonic interconnects is the processor-memory interface, especially for dynamic random-access memory (DRAM) [11]. There are two concerns for next-generation DRAM. One is the higher per-pin bandwidth (a 6.4Gb/s data rate is required by the DDR5 standard [12]), while the other is memory capacity. Implementing silicon photonics interconnects can significantly increase the data rate at each pin without degrading the total memory capacity. The feasibility of a future processor-memory optical link is demonstrated in [10], and the topology is shown in fig. 1-3. The processor, memory, interface circuits and photonic devices are implemented together on a single CMOS chip in a 40nm SOI technology process. Read and write functionality is realized and verified over the processor-memory optical link at a data rate of 2.5Gb/s between processor and memory. As claimed by the author, the total aggregate bandwidth can be increased by more than ten times with single wavelength operation [10].

## 1.2 Project Motivations

### 1.2.1 Technology Trend of Electro-Optics Interconnect

To effectively merge silicon photonics technology with existing electronic applications, primary research is focusing on the development of electro-optic interfacing circuits. A particularly important area is silicon photonics transceiver systems. In the past few decades, different types of modulators, such as electro-absorption modulators (EAM), Mach-Zehnder modulators (MZM) and ring resonator modulators, together with their corresponding high-speed drivers, have been integrated either monolithically or by hybrid integration [13-16]. Similarly, the photo-detector and transimpedance amplifier (TIA) can be integrated on the receive side.



Fig. 1-4 (a) Conceptual View of Silicon Photonics IC integrated with Electrical IC by Hybrid Integration. (b) The Principle of Segmented Optical DAC [17]

Fig. 1-4(a) illustrates the hybrid integration of a silicon photonics IC with an electronic IC. As shown in recent works, modern silicon MZMs are capable of handling data rates of more than 50Gb/s [18, 19]. Moreover, the data rate of an entire single-channel integrated silicon photonics transceiver can range from 10Gb/s to 56Gb/s [13, 16, 20, 21]. In addition, high-order modulation techniques, such as N-level pulse amplitude modulation (PAM-N) and quadrature amplitude modulation (QAM), have been widely applied to optical links to further exploit channel capacity [22]. The practical implementation of these higher-order modulation techniques requires the adoption of high-speed data converters, such as analogue-to-digital and digital-to-analogue converters (ADC/DAC). Furthermore, the optical data converter has been shown to be another promising candidate for silicon photonics applications. Fig. 1-4(b) depicts the principle of an optical DAC in a segmented transmitter [17]. A segmented MZM (N bits) is paired with its corresponding drivers to convert the binary electrical data signal (M bits) into a multi-level optical signal, which is then transmitted through the waveguide. In recent research, a 3-bit segmented MZM optical DAC presented in [23] demonstrated a 10Gb/s data rate with PAM8 modulation.
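
The principle of the segmented optical DAC can be sketched numerically. The following is a minimal illustration only, assuming binary-weighted segments and an idealised linear modulator response (a real MZM transfer function is not linear); the function names are hypothetical and are not taken from [17] or [23].

```python
def bits_per_symbol(levels: int) -> int:
    """Number of bits carried by one PAM-N symbol (N must be a power of two)."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two >= 2")
    return levels.bit_length() - 1


def segmented_dac_level(bits: list[int]) -> float:
    """Normalised output level of an idealised binary-weighted segmented DAC.

    bits[0] is the most significant bit. Each driven segment is ASSUMED to
    contribute a weight proportional to its binary significance, and the
    modulator response is ASSUMED to be linear.
    """
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code / (2 ** len(bits) - 1)


def symbol_rate_gbaud(data_rate_gbps: float, levels: int) -> float:
    """Symbol rate needed to carry a given bit rate with PAM-N."""
    return data_rate_gbps / bits_per_symbol(levels)


if __name__ == "__main__":
    # A 3-bit word selects one of eight equally spaced optical levels (PAM8).
    for code in range(8):
        word = [(code >> k) & 1 for k in (2, 1, 0)]
        print(word, round(segmented_dac_level(word), 3))
    # Symbol rate for a 10Gb/s PAM8 link, as in the example quoted from [23].
    print(round(symbol_rate_gbaud(10.0, 8), 3), "GBaud")
```

Under these assumptions, a 10Gb/s PAM8 link carries three bits per symbol, so the symbol clock runs at roughly one third of the bit rate.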

Although many application areas, such as telecommunication networks, high-performance computing systems and storage interfaces, have all shown promising development using silicon photonics technology, current electrical IC design for these electro-optic circuits mostly focuses on the driver and TIA. However, to guarantee functionality and performance, a critical module that no communication system can overlook is the clock generation system. In recently reported optical transceivers and optical DACs, the clock is either provided by an external frequency source, such as an arbitrary waveform generator (AWG) [17, 24, 25] or a bit pattern generator (BPG) [23], or inherited from traditional electronic interconnects [26, 27], in which case the performance is limited by the frequency tuning range, system settling time and the number of clock phases. These limitations motivate the investigation of a novel on-chip clock generation system for comprehensive silicon photonics based optical communication, which is set as the major target of this PhD project.

### 1.2.2 Targeting Demands for Clock Generation System

The aim of this research project is to design a versatile clock generation system for silicon photonics based optical communication systems. The demands on this clock generation system include the following:

First of all, since the optical channel can carry much higher bandwidth than traditional electrical channels, the clock serving a silicon photonics transceiver system should provide the highest possible bandwidth in order to maximize the advantages of optical communications. In addition, as the consumer electronics market continues to grow, increasing flexibility and compatibility between different electronic equipment are required. Hence, multimedia Systems-on-chip (SoCs) standards have drawn much attention to current multiple wireline transceiver systems. Table 1-1 shows several widely applied communication protocols and their corresponding data rates.

Table 1-1 Widely Applied Communication Protocols and Corresponding Data Rate

| Protocol          | Network Application | Link Rate (Gbps)                |
|-------------------|---------------------|---------------------------------|
| Serial ATA (SATA) | Storage             | 16.0, 6.0, 3.0, 1.5             |
| PCI Express       | Peripheral          | 15.7, 7.87, 4.0, 2.0            |
| HDMI/TMDS         | Peripheral          | 18.0 (in HDMI 2.0), 10.2        |
| HDMI 2.1          | Peripheral          | 48.0                            |
| USB               | Peripheral          | 10.0, 5.0                       |
| XFP/XFI           | Ethernet, Storage   | 10.5188, 10.3125                |
| HyperTransport    | Memory, Computing   | 51.2, 41.6, 22.4, 12.8 (32bits) |
| SuperMHL          | Peripheral          | 36 (×6 A/V lanes)               |

However, integrating multiple protocols with various data rates onto the same die as video/audio signal processing/decoding and microprocessors inevitably leads to high port density and increased overall cost. Fortunately, this deficiency of the electrical approach can be overcome through optical system deployment. For optical interfaces, constraints on port density can be effectively resolved by the large bandwidth of optical fibre: data in several channels can be carried through a single optical fibre using wavelength division multiplexing (WDM). However, a significant prerequisite for supporting all the specifications listed in table 1-1 is a versatile clock generation system that covers a broadband tuning range up to tens of gigahertz.
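
To make the tuning-range requirement concrete, the sketch below checks which of a few link rates from table 1-1 a single oscillator could serve. It is an illustration under stated assumptions, not a design from this thesis: full-rate clocking is assumed (clock frequency in GHz equal to line rate in Gb/s), the 14GHz to 28GHz oscillator band and the power-of-two divider chain are hypothetical, and the function names are invented for the example.

```python
def fractional_tuning_range(f_min_ghz: float, f_max_ghz: float) -> float:
    """Tuning range expressed as a percentage of the centre frequency."""
    centre = (f_min_ghz + f_max_ghz) / 2
    return 100 * (f_max_ghz - f_min_ghz) / centre


def reachable(target_ghz, vco_lo_ghz, vco_hi_ghz, dividers=(1, 2, 4, 8, 16)):
    """Return the first divider ratio that maps the VCO band onto the target.

    Returns None when no divider in the (hypothetical) chain covers the target.
    """
    for d in dividers:
        if vco_lo_ghz / d <= target_ghz <= vco_hi_ghz / d:
            return d
    return None


if __name__ == "__main__":
    # A few link rates taken from table 1-1, treated as full-rate clock targets.
    targets = {"SATA": 1.5, "USB": 5.0, "XFI": 10.3125, "HDMI 2.0": 18.0}
    vco_lo, vco_hi = 14.0, 28.0  # hypothetical one-octave oscillator band
    print(f"tuning range: {fractional_tuning_range(vco_lo, vco_hi):.1f}%")
    for name, rate in targets.items():
        print(name, rate, "GHz -> divide by", reachable(rate, vco_lo, vco_hi))
```

In this sketch, a one-octave oscillator band followed by power-of-two dividers reaches every lower target continuously, which is one reason a wide oscillator tuning range matters for multi-protocol support.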

On the other hand, since the data rate in the communication network is increased by the use of optical interconnects, the system latency issue gradually manifests itself as another bottleneck in a silicon photonics computing system and limits the overall performance. The latency issue potentially comes from many different factors, such as network architecture (e.g. distance or topology), protocols (e.g. data frame length) and device latency, to name just a few. Specifically speaking, regarding an optical transceiver system, one source of device latency is the settling time of the clock generation circuit.

In addition to this, in high-speed data conversion, the time interleaved architecture is commonly used to increase the overall sampling speed of the system, and the architecture can also benefit from the silicon photonics communication system. To avoid time skew mismatches, the staggered phase clock should be triggered by the same source. Therefore, the premise of achieving this functionality is to generate multiple phase clocks from a single clock generation system.
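
The relationship between the number of interleaved channels, the clock phases and the aggregate sampling rate can be sketched as follows. This is a generic illustration of time interleaving with evenly spaced phases derived from one clock; the function names and the example numbers are assumptions made for the sketch.

```python
def interleaved_clock_phases(n_channels: int, f_clk_ghz: float):
    """Phase (degrees) and time offset (ps) of each sub-clock.

    All sub-clocks are assumed to be derived from one source running at
    f_clk_ghz and to be evenly spaced over one clock period.
    """
    period_ps = 1000.0 / f_clk_ghz
    return [
        (k * 360.0 / n_channels, k * period_ps / n_channels)
        for k in range(n_channels)
    ]


def aggregate_sample_rate(n_channels: int, f_clk_ghz: float) -> float:
    """Overall sampling rate (GS/s) of n channels each clocked at f_clk_ghz."""
    return n_channels * f_clk_ghz


if __name__ == "__main__":
    # Four channels driven by quadrature phases of a single 10GHz clock.
    for phase_deg, offset_ps in interleaved_clock_phases(4, 10.0):
        print(f"{phase_deg:6.1f} deg  {offset_ps:5.1f} ps")
    print(aggregate_sample_rate(4, 10.0), "GS/s")
```

With four channels, the required phases are the quadrature phases 0, 90, 180 and 270 degrees, which is why quadrature outputs from a single clock generator are a natural fit for this architecture.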

In summary, a clock generation system for use in silicon photonics based optical communication systems should include the following features:

- A high-speed clock signal above 28GHz to maximize the data rate in optical transmission.
- A wide tuning range for compatibility with multiple communication protocols.
- Nanosecond-level settling time for low-latency start-up.
- Quadrature phase clock signals.

## 1.3 Design Challenges

Although considerable advantages have been obtained by CMOS process scaling, significant challenges still exist in designing high-speed CMOS circuits [4, 28]. Several design challenges in this project are highlighted below:

- Insufficient gain of CMOS transistors in advanced deep sub-micron technology can be a major issue in effective circuit design [4, 26]. With process scaling, the dominant improvement is in the transition frequency, which effectively benefits digital circuits. However, the transistor's gain-bandwidth product is reduced, generation to generation, due to parasitic wire capacitance and resistance [4], which poses serious challenges for analogue designers in expanding system bandwidth. Although passive/active inductor techniques have been applied to alleviate this issue, their impact is still limited by either overwhelming area cost [29] or a low Q factor [30, 31].



Fig. 1-5 Technology Process Scaling (a) Power Supply Voltage (b) Threshold Voltage of Single Transistor

- Another benefit of process scaling that becomes a challenge for high-speed CMOS circuit design is the reduction of the power supply voltage. As can be seen from fig. 1-5(a), the supply voltage is decreased with process scaling for low-power purposes. As the supply voltage is lowered, the voltage headroom available to each transistor is reduced accordingly. On the other hand, the threshold voltage of the transistor increases as the process scales down, as illustrated in fig. 1-5(b), which compares the threshold voltage of a single transistor at the same drain current across different processes (the results are based on simulation in TSMC 65nm, 40nm and 28nm technology nodes). Consequently, for high-speed purposes, a larger gate voltage is required in current-mode circuits in order to bias the transistors effectively into the saturation region. With the combination of a decreased power supply and an increased threshold voltage, the valid operating range of the transistor is greatly reduced. Furthermore, a cascoded structure is commonly used in analogue circuit design to provide a large output resistance. However, with a nominal power supply voltage, the more transistors are stacked, the less voltage headroom is available.
- Device mismatch and variation are also important issues that need to be considered during circuit design. According to the investigation in [28], the standard deviation of drain current across different process corners is approximately 7.5%~9%, which is a considerable figure for high-performance analogue design, and the current mismatch of the same transistors under proportional size expansion ($\times$1, $\times$3, $\times$12) varies significantly. The results show that larger transistors have less current deviation; however, the consequence is reduced transistor speed and increased chip area. Therefore, designing in enough tolerance to process variation is always a trade-off with operation speed, chip area and power budget.

- In addition to the issues mentioned above, as transistor dimensions scale down, the intrinsic noise floor of the transistor gets progressively worse in more advanced process nodes. Furthermore, from the perspective of circuit design, the jitter on the transmitted data is primarily determined by the noise performance of the phase locked loop (PLL), which is the core circuit of the clock generation system. Therefore, designing a robust low-noise PLL is essential. However, at GHz output frequencies, suppressing jitter becomes more of a challenge. The noise shaping of the PLL involves a trade-off in the loop bandwidth. For instance, with a larger loop bandwidth, the in-band phase noise can be decreased, but more low-frequency noise passes from input to output. With a wideband specification, the jitter becomes even worse. For those reasons, a novel architecture is required to meet the requirements of the clock generation system for silicon photonics applications.
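
The loop-bandwidth trade-off in the last point can be illustrated with a standard first-order textbook approximation, which is an assumption made here and not the loop model used in this thesis: noise entering at the PLL input is low-pass filtered to the output, while noise from the VCO is high-pass filtered, with both corners at the loop bandwidth. The function names and example numbers are hypothetical, and the divider gain is normalised to one.

```python
import math


def input_noise_gain_db(offset_hz: float, loop_bw_hz: float) -> float:
    """Gain (dB) from input-referred noise to the output.

    First-order low-pass approximation: |1 / (1 + j*f/f_c)|.
    """
    x = offset_hz / loop_bw_hz
    return -10 * math.log10(1 + x * x)


def vco_noise_gain_db(offset_hz: float, loop_bw_hz: float) -> float:
    """Gain (dB) from VCO noise to the output.

    First-order high-pass approximation: |(j*f/f_c) / (1 + j*f/f_c)|.
    """
    x = offset_hz / loop_bw_hz
    return 10 * math.log10(x * x / (1 + x * x))


if __name__ == "__main__":
    offset = 1e6  # 1 MHz offset from the carrier (example value)
    for bw in (1e5, 1e6, 1e7):
        print(
            f"loop BW {bw:8.0f} Hz: "
            f"input noise {input_noise_gain_db(offset, bw):7.2f} dB, "
            f"VCO noise {vco_noise_gain_db(offset, bw):7.2f} dB"
        )
```

In this simplified model, widening the loop bandwidth suppresses more of the VCO noise at a given offset while letting more of the input noise through, so neither extreme minimises the total output jitter.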

## 1.4 Thesis Outline

The structure of the thesis is described as follows. Initially, the current trend of demands for data intensive applications is briefly introduced. The fundamental background knowledge about silicon photonics and its promising applications are reviewed. This leads to the conclusion that a clock generation system is an essential part of future integrated silicon photonics circuits, and therefore, this target is the core purpose of the work reported in this thesis.

In chapter 2, a comprehensive literature review of modern high-speed electrical and optical transceiver architectures is presented, along with the state of the art performance published in recent years. Moreover, the fundamental principles of the core modules of this work, including the voltage controlled oscillator (VCO) and the phase locked loop (PLL) that generate clock signals for these transceivers, are reviewed in detail from the perspective of performance and topologies.

In chapter 3, a new type of ring-based VCO is presented to meet the ultra-wide tuning range and high bandwidth demands of a clock generation system. The structure combines the advantages of two different types of VCO (inductor-capacitor VCO and ring-based VCO) by means of the inductive peaking technique. Theoretical analyses for both conventional and proposed VCOs are presented. Four design examples based on different process technology nodes were fabricated, which demonstrate the functionality of the idea.

In chapter 4, the proposed VCO structure is embedded into a PLL system to form a functional clock generation system. Considering the stability and controllability of the whole system, the novel VCO is incorporated into different types of currently used PLL structure for performance investigation and comparison. However, many issues, such as stability, jitter and settling time, are identified. Therefore, a novel PLL topology is proposed which, according to simulation analysis, can effectively resolve these problems. Finally, a comparison of different types of PLL with the proposed structures is made.

Chapter 5 focusses on the practical implementation of the dual-loop triple-controlled PLL (DLTC-PLL). Each building block within the PLL is described in detail. For a practical design, the structure of several blocks (phase/frequency detector and charge pump) is modified to improve the reliability of the PLL. Moreover, by implementing an integer-N frequency pre-scaler and clock scheme, the overall frequency range is over 25GHz. The test measurements and post-layout simulations are presented at the end of this chapter.

In chapter 6, two specific design cases are presented in order to explore a design methodology for high-speed analogue applications in sub-micron CMOS technology processes. The first design case investigates and optimizes the effect of transistor finger width on the performance of high-speed analogue circuits, with the aim of providing an elegant way to simplify the choice of finger width in practical design. In the second case, the optimized finger width is applied to an inductor peaking VCO with noise reduction for a more sophisticated investigation.

Finally, the thesis is concluded in chapter 7, and several ideas and further research directions for clock generation systems in silicon photonics based applications are discussed in the Future Work section.

## 1.5 Summary

In this chapter, brief background on the development of silicon photonics has been introduced, in association with the demand for data intensive applications and the current state of silicon photonics based applications. More importantly, the clock generation system plays a critical role in achieving the required functionality and performance of future silicon photonics components. According to the current trends and requirements of high-speed applications, the clock generation system should provide several capabilities, including high oscillation frequency, wide tuning bandwidth, fast locking and multiple phase signalling, to maximize the advantages of silicon photonics based systems.

# **Chapter 2 Literature Review**

## 2.1 Introduction

This chapter contains the literature review of modern broadband clock generation blocks. The review starts by introducing the basic principle of a transceiver system including both electrical and optical link elements. The state of the art performance of current transceivers is summarized, together with widely applied modulation techniques. The clock generation blocks are reviewed starting with the voltage controlled oscillator (VCO) and continuing to phase locked loops (PLL). In the case of the VCO, two types of conventional VCO are described and several novel structures are discussed in terms of achievable frequency, bandwidth and noise performance. In addition, the fundamental principles of two different types of PLL are studied with their corresponding features. Alternative PLL topologies that have been widely implemented in high-speed applications are reviewed, focussing on their advantages and drawbacks. Finally, this chapter ends with a comparison of the speed performance among the range of current electrical and optical transceivers.

## 2.2 Traditional Broadband Serializer-Deserializers

### 2.2.1 Electrical Transceiver Links

Over the past few decades, numerous data intensive applications have been widely developed that require high speed data links. High-speed electrical links are the most traditional approach and are commonly used in short distance point-to-point communication such as internet routers, network storage and processor-memory interfaces. To achieve high data rates, these systems apply specialized I/O circuitry with carefully designed impedance channels.

Fig. 2-1 shows the topology of a typical high-speed electrical link system.



Fig. 2 - 1 Typical High-speed Electrical Link System

The electrical transceiver link contains three main parts, which are the transmitter, the receiver and the channel itself. Due to the limited number of high-speed I/O interfaces on many chip packages, the parallel input data channels are serialized by a high-bandwidth transmitter for transmission through an electrical channel. At the receiver, the incoming data is sampled and de-serialized into binary values for processing by other devices. In a conventional electrical transceiver link, the electrical channel commonly uses a differential low-swing format to obtain better common mode noise rejection during data communication.

To ensure accuracy in transmitting the data and receiving it at the far end of the channel, a high-frequency clock signal is generated by a frequency synthesis phase locked loop (PLL) at the transmitter for synchronization, and the incoming data stream is recovered by a clock and data recovery (CDR) circuit in the receiver. To ensure sufficient timing margin at high data rates, high-precision low noise clocks are necessary.

From the perspective of cost, electrical signaling over copper is still the preferred option for data intensive applications, such as backplane communication. According to the Ethernet Standard Task Force, the next generation IEEE 400-Gbps standard (802.3bs) [32] aims to expand the bandwidth of the existing 100Gbps standard [33] to be up to 4 times faster. To accommodate this growing demand, 16 electrical links of modern commercially available 25Gbps interfaces must work together on the backplane, which significantly increases the system complexity [4] and the power consumption. To reduce the complexity and power consumption, many efforts have been made to increase the data rate of communication interfaces and so reduce the number of required lanes. In [4, 34], a fully integrated 40Gb/s transceiver link is presented. Moreover, with the help of modern semiconductor technology processes, 60Gb/s (and potentially beyond) transmitters have been proposed as well [35-37]. However, there are still significant challenges in designing 50Gb/s+ transceivers, especially at the receiver: not only must the bandwidth of the receiver accommodate a 50Gb/s data rate, but the clock alignment requirements on both the transmitter and receiver also become stringent and require dynamic adjustment. In addition, to address the issue of data rate scaling, more advanced modulation schemes are used to improve the aggregate data rate in a given transceiver bandwidth. The most common approach is N-level pulse amplitude modulation (N-PAM). A dual-mode 10-PAM serial link transceiver was presented in [38] to achieve 10Gb/s data transmission. Similarly, a 56Gb/s PAM4 SerDes transceiver is proposed in [39]. Other ultra-high speed transceivers with PAM modulation schemes can be found in [36, 40]. However, an unavoidable issue with pulse amplitude modulation is that both an ADC and a DAC are required for PAM transceiver systems with more than 4 levels, which means that the multi-GHz ADC and DAC add further to the system complexity. Furthermore, the signal-to-noise ratio (SNR) degrades with N (the number of levels in the modulation scheme) [38], which makes the PAM technique less advantageous in more advanced technology process nodes as the supply voltage decreases with the process node size. Moreover, the losses inherent in the electrical channel are significant in broadband transceiver systems, so an additional pre-driver and equalizer are required to compensate for the channel loss at high frequencies [34].
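The lane count and PAM arithmetic discussed above can be sketched numerically. The following Python snippet is a minimal illustration only: the function names are hypothetical, and the equal eye spacing model (each of the N-1 eyes receiving an equal share of a fixed signal swing) is a simplifying assumption rather than a result taken from the cited works.

```python
import math


def lanes_required(aggregate_gbps: float, lane_gbps: float) -> int:
    """Number of parallel lanes needed to carry an aggregate data rate."""
    return math.ceil(aggregate_gbps / lane_gbps)


def pam_bits_per_symbol(levels: int) -> float:
    """Bits carried by one symbol of an N-level PAM signal."""
    return math.log2(levels)


def pam_symbol_rate_gbaud(data_rate_gbps: float, levels: int) -> float:
    """Symbol rate needed to carry a given data rate with N-level PAM."""
    return data_rate_gbps / pam_bits_per_symbol(levels)


def pam_eye_fraction(levels: int) -> float:
    """Share of the full swing per eye, assuming N-1 equally spaced eyes."""
    return 1.0 / (levels - 1)


# 400Gbps aggregate rate carried over 25Gbps lanes, as in the text.
print(lanes_required(400, 25))  # -> 16
# A 56Gb/s PAM4 link: symbol rate in GBaud and swing fraction per eye.
print(pam_symbol_rate_gbaud(56, 4), pam_eye_fraction(4))
```

Under these assumptions, a PAM4 link halves the symbol rate for a given data rate, but each eye has only one third of the available swing, which is consistent with the SNR penalty noted above.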

### 2.2.2 Optical Transceiver Link

The primary goal of optical communication systems has traditionally been the transmission of large volumes of data over long distances, the so called "long haul" communications system. However, since short reach transmission is approaching the limit of the electrical channel, optical transmission has gradually moved toward short reach applications. Moreover, the loss in an optical channel over short distances varies by only fractions of a dB, which gives the optical link approach the potential for Terahertz transmission without the requirement of channel equalization. However, to achieve this potential performance advantage, an optical link requires additional circuitry to interface the electrical elements with the optical source. Fig. 2-2 depicts a typical optical transceiver system.



Fig. 2 - 2 Typical Optical Transceiver System

A modulator is used to convert the electrical data into optical form by encoding the light with the data, which is then transmitted to the far end of the channel via an optical fibre. At the receiver, the light is detected by a photodetector which converts the optical data back to an electrical signal. To provide enough modulation depth, a driver is applied to deliver a large current to the modulator. A transimpedance amplifier (TIA) is used to amplify the output of the photodetector with low noise and sufficient bandwidth. As the output swing of a TIA may not be large enough to provide logic levels, several stages of limiting amplifier follow the TIA. Because of non-idealities and jitter in practical implementations, the high-speed serialized data must be synchronized prior to the modulator driver. Therefore, a re-timing circuit clocked by a PLL is used. However, since the jitter of the transmitted data is determined primarily by that of the PLL, designing a robust, low noise PLL is necessary, and a similar situation applies to the CDR in the receiver. The transmitter and receiver are usually designed on the same substrate for full duplex communication. However, the oscillators in the transmit PLL and receive CDR operate at slightly different frequencies due to mismatch and phase shifting, which may cause frequency pulling and generate substantial jitter. Therefore, designing a low noise clock circuit with substrate noise rejection becomes essential.

In terms of the performance of a modern optical transceiver system, many efforts have been made to improve the bandwidth. However, there are two main aspects that limit the speed of an optical link, a key one of which is the bandwidth of the electro-optical modulator. Recently, optical transceivers have been developed with different types of modulator. Fully integrated 20Gbps and 10Gbps 4 lane optoelectronic transceivers with a Mach-Zehnder Modulator (MZM) were presented in [41] and [42] respectively. In [21, 43], a 25Gbps silicon photonic transceiver using a photonic micro-ring resonator modulator is reported. Moreover, a transceiver with a vertical-cavity surface-emitting laser (VCSEL) has also reached 25Gbps [27]. In addition, a single channel 40Gbps optical transceiver is presented in [20]. However, the stacked-FET circuit topology used in this design is only feasible in an SOI CMOS process. Although an equalizer is still applied in some optical transceiver designs to enlarge the bandwidth of optical devices, it does not require circuits as complex and large as those in a typical purely electrical link [27].

Another concern related to the performance of the optical transceiver is the level of integration. In order to reduce the parasitic capacitance and minimize the impedance discontinuity of bonding wire, flip chip bonding has become an attractive option for integrating optical devices with electrical circuits, an approach which can be found in recent papers [25, 27, 43, 44]. However, in some cases, flip chip bonding still presents a great challenge in the bonding procedure. For example, in [43], the electrical chip needs to be flipped onto different materials, which requires different sizes of micro bump. The ultimate solution for integration is to fabricate all the photonic devices with electronics on the same substrate [45]. However, this process is highly dependent on the development of the fabrication foundry and is potentially very expensive, as the substrate must support excellent CMOS performance while the optical devices take up significant space on the chip, significantly increasing cost. Currently, the monolithic optical transceiver system is compromised in either the electronic or photonic device performance [43]. In addition to this, a segmented MZM transmitter monolithically integrated with a CMOS driver is presented in [24], where the transceiver reached 12.5Gbps. By cutting the modulator into several small pieces instead of a single long element, the lumped capacitance for each driver segment is reduced. Moreover, power dissipation is greatly decreased as the termination resistor is no longer required. However, excellent phase control is necessary between the driver segments.

## 2.3 Voltage Controlled Oscillator (VCO)

### 2.3.1 Conventional VCO Structure

The voltage controlled oscillator (VCO) has a primary role in clock generation circuits. It is not only the core building block of the phase locked loop (PLL), but is also the source of the oscillation. VCOs can be categorized into two fundamental types based on their characteristics and functionality: the inductor-capacitor tank VCO (LC-VCO) and the ring-based VCO (RO-VCO). Fig. 2-3 shows the topology of the LC-VCO and RO-VCO.



Fig. 2 - 3 Topology of VCO (a) LC Tank (b) Ring Based

The LC-VCO usually consists of a cross coupled latch with an embedded LC tank which sets the resonant frequency. By varying the capacitance of the tank, the resonant peak is moved to achieve frequency tuning. At the resonant frequency, the combination of capacitor and inductor provides a large amplitude, which enables the LC-VCO to achieve better phase noise performance at the oscillation frequency. However, because of this feature, the resonant frequency can only be generated within a small range due to the limitations of implementing the physical capacitor and inductor. Therefore, one significant drawback of the LC-VCO is the relatively small tuning range. In previous research [28], because of the narrow tuning range, an additional calibration loop had to be added to improve the immunity to PVT variation and keep a PLL with the LC-VCO working within the required frequency band. Moreover, since the oscillation frequency of the LC-VCO is determined by the values of the inductor and capacitor rather than by the circuit topology, and given its low noise, the LC-VCO is found to be more attractive in high-speed applications such as wireless transceiver systems [46, 47] and high speed electrical/optical links [4, 20, 21, 34, 37].
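The limited tuning range can be illustrated with the ideal tank resonance $f = 1/(2\pi\sqrt{LC})$. The sketch below assumes an ideal, lossless tank; the inductance and capacitance values are hypothetical and are not taken from any of the cited designs.

```python
import math


def lc_resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def lc_tuning_range(inductance_h: float, c_min_f: float, c_max_f: float):
    """Return (f_min, f_max, fractional range about the centre frequency)."""
    f_max = lc_resonant_frequency_hz(inductance_h, c_min_f)
    f_min = lc_resonant_frequency_hz(inductance_h, c_max_f)
    f_centre = 0.5 * (f_max + f_min)
    return f_min, f_max, (f_max - f_min) / f_centre


# Hypothetical tank: 0.5nH inductor, total capacitance tunable 100fF to 150fF.
L_TANK = 0.5e-9
C_MIN = 100e-15
C_MAX = 150e-15

f_min, f_max, fractional_range = lc_tuning_range(L_TANK, C_MIN, C_MAX)
print(f"f_min = {f_min / 1e9:.2f} GHz, f_max = {f_max / 1e9:.2f} GHz")
print(f"fractional tuning range = {100 * fractional_range:.1f} %")
```

Because the frequency depends on the square root of the capacitance, a capacitance ratio of 1.5 gives a frequency ratio of only about 1.22 in this idealized model.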

The RO-VCO consists of a number of delay stages connected end-to-end to form a feedback loop. Unlike the LC-VCO, the oscillation frequency of the RO-VCO is determined by the effective resistance and capacitance of each delay cell. The capacitance is the combination of gate capacitance (input), load capacitance (output) and parasitic capacitance at each delay stage, which is highly dependent on the technology process node. The most common approach to varying the oscillation frequency in the RO-VCO is to tune the effective resistance. Fig. 2-4 shows two commonly used structures of RO-VCO.



Fig. 2 - 4 Commonly Used RO-VCO (a) Current Starved (b) Common Source

Fig. 2-4 (a) shows a current starved VCO. By controlling the bias voltage of the current source, the source and sink currents at each delay stage are varied, which at the same time varies the voltage applied to the embedded inverter. The equivalent resistance of the inverter is thereby changed to achieve frequency tuning. Another classic structure of an RO-VCO is the common source VCO shown in fig. 2-4 (b). Each delay cell is a common source amplifier with a load transistor. The control voltage directly biases the load transistor, setting its effective resistance and so tuning the oscillator frequency.

In a modern CMOS process, a MOSFET can be operated over a wide dynamic range. Therefore, one significant advantage of the RO-VCO is its large frequency tuning range. Moreover, because it does not require an on-chip inductor, the RO-VCO occupies a small chip area, which makes it favorable for clock synthesis in digital circuit design [48]. However, the large tuning range also makes the RO-VCO vulnerable to even slight fluctuations in operation. Due to the resulting poor phase noise performance, in recent works PLLs with an RO-VCO are typically limited to a frequency range of several GHz [49-54] and are rarely used in high-speed applications. Therefore, careful design is required to apply an RO-VCO to high speed communications.

In order to maintain oscillation, both the LC-VCO and the RO-VCO need to satisfy Barkhausen's criteria, the principle of oscillation described in appendix A1. It can be seen that LC-based and RO-based circuits have different strengths and weaknesses. Generally, they are regarded as opposite solutions and applied in different situations.

### 2.3.2 Broadband VCO

Although LC based VCOs can only operate in a narrow frequency range, many efforts have been made to extend the tuning range. In [55], a switched capacitor array has been applied to the LC-VCO for tuning range expansion without degrading the noise performance. The Q factor of the inductor is maintained while the low Q factor of the switched capacitance decreases the achievable VCO frequency. Meanwhile, a switched inductor technique [56] was also explored to extend the frequency range in LC-based VCOs. However, the switched resistance degrades the inductor's Q factor and leads to reduced noise performance. In addition, an N-VCO structure incorporated into a PLL is presented in [57]. By overlapping the tuning ranges of the individual LC-VCOs, the overall tuning range is extended from 7.3GHz to 16.6GHz. However, it involves a complex multiplexer system to switch between the different tuning ranges, and comes at the cost of significantly increased power consumption.



Fig. 2 - 5 Block Diagram of 3-stages Ring Oscillator with LC Tank

In contrast, the ring-based VCO has an intrinsic advantage in tuning bandwidth as discussed above. However, the achievable frequency is limited in practice by the effective resistance and the input/output capacitance at each delay cell. To allow the ring-based VCO to be used in high-speed applications, additional techniques have been applied to extend its bandwidth even further. In [58], a method of combining an LC tank with a 3 stage ring-based VCO is proposed. Fig. 2-5 illustrates the topology. R and C represent the effective resistance and total capacitance respectively, while Gm is the transconductance of each delay cell. The LC tank is only inserted at the first delay stage and is controlled by a switch resistor. When the LC tank is switched out of the circuit, the structure acts like a standard RO-VCO, in which the achievable frequency is limited by the product of R and C. Therefore, the transfer function and phase response can be given as (2.1) and (2.2) respectively.

$$H(jw_{OFF}) = \left(\frac{-G_m R}{1 + jw_{OFF}RC}\right)^3 \tag{2.1}$$

$$3 \tan^{-1}(w_{OFF}RC) = \pi \tag{2.2}$$

Therefore the oscillation frequency at the OFF state is given as (2.3)

$$w_{OFF} = \frac{\sqrt{3}}{RC} \tag{2.3}$$
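As a numerical check of (2.1) to (2.3), the sketch below evaluates the loop transfer function at $w_{OFF}$. The component values are hypothetical, and the minimum gain condition $G_m R = 2$ is obtained here by setting the magnitude of (2.1) to unity at $w_{OFF}$; it is not stated explicitly in the text above.

```python
import math


def ring_loop_gain(omega: float, gm: float, r: float, c: float, stages: int = 3) -> complex:
    """Loop gain of identical inverting RC delay cells, following (2.1)."""
    return (-gm * r / (1.0 + 1j * omega * r * c)) ** stages


# Hypothetical delay cell values, chosen only for illustration.
R = 200.0    # effective resistance in ohms
C = 50e-15   # total capacitance per stage in farads

w_off = math.sqrt(3.0) / (R * C)   # oscillation frequency from (2.3)
GM_MIN = 2.0 / R                   # gain giving unity loop magnitude at w_off

loop_gain = ring_loop_gain(w_off, GM_MIN, R, C)
phase_sum = 3.0 * math.atan(w_off * R * C)  # left-hand side of (2.2)

print(f"f_OFF = {w_off / (2.0 * math.pi) / 1e9:.2f} GHz")
print(f"loop gain at w_OFF = {loop_gain:.6f}")
```

With these assumptions the loop gain evaluates to unity with zero net phase, so the three RC phase lags of 60 degrees each, together with the three inversions, close the loop as (2.2) requires.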

When the LC tank is switched in, the transfer function in (2.1) can be rewritten as (2.4)

$$H(jw_{ON}) = \frac{-G_m R}{1 + jR\left(w_{ON}C + w_{ON}C_p - \frac{1}{w_{ON}L_p}\right)} \left(\frac{-G_m R}{1 + jw_{ON}RC}\right)^2 \tag{2.4}$$

And the phase response in the ON state is given as (2.5)

$$\tan^{-1}\left[R\left(w_{ON}C + w_{ON}C_p - \frac{1}{w_{ON}L_p}\right)\right] + 2\tan^{-1}(w_{ON}RC) = \pi \tag{2.5}$$

It can be seen that the oscillation frequency at the ON state is no longer limited by the combination of R and C. The resonant frequency generated by the LC tank is determined by the value of Cp and Lp, which can be illustrated by (2.6)

$$w_{tank} = \frac{1}{\sqrt{C_p L_p}} \tag{2.6}$$

By choosing suitable values of Cp and Lp,  $w_{tank}$  can be set to a higher value than  $w_{OFF}$ . In addition, the second term in (2.5) is always less than  $\pi$ . Therefore, once  $w_{tank}$  is set higher than  $w_{ON}$  and  $w_{OFF}$  ( $w_{tank} > w_{ON} > w_{OFF}$ ), the oscillation frequency can be raised by the LC tank. However, one potential issue is that it requires a digital control signal, which makes testing more challenging since the non-ideal switching characteristics may induce more digital noise.
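The ordering $w_{tank} > w_{ON} > w_{OFF}$ can be checked numerically by solving the phase condition (2.5) for $w_{ON}$. The sketch below uses a simple bisection, relying on the fact that the left-hand side of (2.5) increases monotonically with frequency. All component values are hypothetical and chosen only so that $w_{tank}$ lies above $w_{OFF}$.

```python
import math


def phase_on(omega: float, r: float, c: float, cp: float, lp: float) -> float:
    """Left-hand side of the ON-state phase condition (2.5)."""
    tank_stage = math.atan(r * (omega * c + omega * cp - 1.0 / (omega * lp)))
    return tank_stage + 2.0 * math.atan(omega * r * c)


def solve_w_on(r: float, c: float, cp: float, lp: float,
               lo: float = 1e6, hi: float = 1e14, iterations: int = 200) -> float:
    """Bisection (on a logarithmic scale) for phase_on(omega) = pi."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if phase_on(mid, r, c, cp, lp) < math.pi:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical values for illustration only.
R = 200.0     # effective resistance in ohms
C = 50e-15    # stage capacitance in farads
CP = 20e-15   # tank capacitance in farads
LP = 0.2e-9   # tank inductance in henries

w_off = math.sqrt(3.0) / (R * C)    # from (2.3)
w_tank = 1.0 / math.sqrt(CP * LP)   # from (2.6)
w_on = solve_w_on(R, C, CP, LP)

for name, w in (("w_OFF", w_off), ("w_ON", w_on), ("w_tank", w_tank)):
    print(f"{name} = {w / (2.0 * math.pi) / 1e9:.2f} GHz")
```

For these values the solved ON-state frequency falls between the OFF-state frequency and the tank resonance, which matches the inequality stated above.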

Another approach to extend the achievable frequency in the RO-VCO is through a multiple-pass loop architecture which is illustrated in fig. 2-6.



Fig. 2 - 6 RO-VCO with Multiple Pass Loop [59]

In [59, 60], a Park-Kim delay cell is used with a multiple-pass loop architecture. In addition to the primary loop, a secondary loop is added to decrease the delay time in each cell and so maximize the oscillation frequency. By feeding the input of preceding stages to the secondary input at each delay stage, the secondary path switches faster than the primary path. Therefore, this structure can considerably increase the speed of the circuit. A modified delay cell based on this architecture is also reported in [61]. An active inductor is inserted before the secondary input to further increase the frequency. An oscillation frequency of up to 10GHz has been obtained based on the simulation results.

In addition to the frequency tuning range, the tuning linearity is another aspect of the RO-VCO which should not be ignored. Benefitting from the wide dynamic range of modern CMOS processes, the RO-VCO is capable of a large tuning range. However, due to its non-linear characteristics, the tuning linearity of the RO-VCO is typically poor, which results in potential system instability and complicates the design of the PLL. Several works have been reported that improve the tuning linearity. In [62] for example, a rail-to-rail voltage tuning delay cell is proposed, the structure of which is shown in fig. 2-7 (a).



Fig. 2 - 7 (a) Rail-to-rail Voltage Tuning Delay Cell (b) Bias-level-shift Circuit

A PMOS transistor stage (*Mpn*) is inserted in parallel with the voltage controlled load. *Mpn* is biased by the level shifter shown in fig. 2-7 (b). The level shifter works as two source followers in series, so *Vbiasn* is two threshold voltages lower than *Vbias*. Therefore, when *Vbias* becomes so high that the gate-source voltage of the load falls below one threshold voltage, the PMOS load is switched off while *Mpn*, biased by *Vbiasn*, remains ON and continues to act as a load for the delay cell. Thereby, the overall voltage tuning range is extended.

Another linear tuning delay cell has been reported in [54, 63-65], the structure of which is shown in fig. 2-8. The delay cell consists of an NMOS transconductance pair, loaded with a PMOS cross-coupled pair and a diode-connected load. The transconductance of the NMOS pair is regulated by varying the control voltage to tune the output oscillation frequency. The DC operating point can be expressed as (2.7) (when two delay cells are connected to form a ring oscillator).

$$V_{CT} = V_{gsp} + V_{gsn} = \sqrt{\frac{2I}{k_p \frac{W_p}{L_p}}} + V_{thp} + \sqrt{\frac{2I}{k_n \frac{W_n}{L_n}}} + V_{thn} \tag{2.7}$$

Where  $k_p = \mu_p C_{ox}$  and  $k_n = \mu_n C_{ox}$ . As (2.7) indicates, the square root of the saturation current *I* changes in proportion to  $V_{CT} - V_{thn} - V_{thp}$ . Thereby, (2.8) can be determined.

$$f_{osc} = \frac{g_{mn}}{2\pi C} \propto \sqrt{I} \propto V_{CT} - V_{thn} - V_{thp} \tag{2.8}$$



Fig. 2 - 8 Linear Tuning Delay Cell

The oscillation frequency can therefore be tuned linearly by the control voltage  $V_{CT}$ . Moreover, a modified version of the above delay cell with an improved fine control path for more accurate frequency control is presented in [63]. In addition to this, a voltage regulator is applied to the linear tuning delay cell [54, 63] to further stabilize  $V_{CT}$  and reduce the fluctuations.
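The linear relationship between $f_{osc}$ and $V_{CT}$ can be illustrated with a short numerical sketch. It assumes the ideal square-law MOSFET model, $I = \frac{1}{2} k \frac{W}{L} (V_{gs} - V_{th})^2$, with threshold voltages taken as magnitudes; the device parameters and load capacitance are hypothetical, and `beta` denotes the product $k \cdot W/L$.

```python
import math


def root_bias_current(v_ct: float, beta_p: float, beta_n: float,
                      v_thp: float, v_thn: float) -> float:
    """sqrt(I) solved from V_CT = Vgsp + Vgsn, with Vgs = sqrt(2I/beta) + Vth."""
    overdrive = v_ct - v_thp - v_thn
    if overdrive <= 0.0:
        return 0.0  # both devices are off below the threshold sum
    return overdrive / (math.sqrt(2.0 / beta_p) + math.sqrt(2.0 / beta_n))


def oscillation_frequency_hz(v_ct: float, beta_p: float, beta_n: float,
                             v_thp: float, v_thn: float, c_load: float) -> float:
    """f_osc = g_mn / (2*pi*C), with g_mn = sqrt(2*beta_n*I) as used in (2.8)."""
    g_mn = math.sqrt(2.0 * beta_n) * root_bias_current(v_ct, beta_p, beta_n, v_thp, v_thn)
    return g_mn / (2.0 * math.pi * c_load)


# Hypothetical device parameters, for illustration only.
BETA_P = 2e-3    # k_p * W_p / L_p in A/V^2
BETA_N = 4e-3    # k_n * W_n / L_n in A/V^2
V_THP = 0.35     # PMOS threshold magnitude in volts
V_THN = 0.35     # NMOS threshold in volts
C_LOAD = 20e-15  # load capacitance in farads

frequencies = [
    oscillation_frequency_hz(v, BETA_P, BETA_N, V_THP, V_THN, C_LOAD)
    for v in (0.9, 1.0, 1.1)
]
for v, f in zip((0.9, 1.0, 1.1), frequencies):
    print(f"V_CT = {v:.1f} V -> f_osc = {f / 1e9:.2f} GHz")
```

Equal steps in $V_{CT}$ produce equal steps in frequency in this idealized model; a real device would deviate from this because of short-channel effects that the square law ignores.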

### 2.3.3 Low Noise VCO

One significant drawback of the RO-VCO is its poor phase noise performance, which limits its use in high-speed applications. The main reason for this is the wide tuning range, which results in an excessive voltage to frequency gain ( $K_{vco}$ ). A large  $K_{vco}$  not only makes the VCO vulnerable to any slight fluctuation or noise on the control signal, but also complicates the design of the PLL, as a larger  $K_{vco}$  requires a larger loop capacitor. Therefore, many efforts have been made to improve the noise performance of the RO-VCO whilst maintaining its wide tuning characteristic.

The noise can be generated from many sources. The most common noise source is through the power supply voltage. In a practical implementation, the differential architecture has been widely used in both digital and analogue IC design, as it has better common-mode rejection of supply voltage and substrate noise [66]. When it comes to the differential RO-VCO, the structure of the delay cell can be categorized as fully differential (FD) and pseudo-differential (PD), which are shown in fig. 2-9 (a) and fig. 2-9 (b) respectively.



Fig. 2 - 9 (a) Fully Differential (FD) Delay Cell (b) Pseudo-differential (PD) Delay Cell

The fully differential delay cell is based on the common source differential pair. The tail current source can provide good common-mode noise rejection performance, and reduce the influence of harmonic distortion [66]. The pseudo-differential delay cell consists of two independent inverters followed by an inverter based latch. Its input voltage range can reach rail-to-rail, and it has larger common-mode gain [66, 67]. Therefore, the pseudo differential delay cell has been widely applied in low power design. However, the large voltage swing requires enough current to charge and discharge the gate capacitance, which limits the speed of implementation.

Another widely used technique to suppress the phase noise in the RO-VCO is injection locking [68, 69]. The principle of injection locking is to couple in a secondary frequency which is slightly different from the primary frequency. When the coupling is strong enough, the secondary frequency pulls the primary frequency until the two oscillate at an identical frequency. A ring oscillator with injection locking can be modelled as in fig. 2-10 (a), and a pseudo differential delay cell with injection locking is shown in fig. 2-10 (b).



Fig. 2 - 10 (a) Model of Injection in a Ring Oscillator (b) Pseudo Differential Delay Cell with Injection Locking

Because of the slightly different initial phase, the output frequency  $w_{out}$  occurs with a phase shift  $\theta$  due to the injection frequency  $w_{inj}$ .

$$\varphi + \phi = \theta \tag{2.9}$$

Where  $\varphi$  is the phase of the injection frequency and  $\phi$  is the induced phase shift of the ring oscillator. According to [70], the relationship between the phase shift at the output  $\theta$  and the injection intensity (S) is given as (2.10)

$$\tan(\theta - \phi) = S \sin \theta, \quad S = \frac{I_{inj}}{I_{osc}} \tag{2.10}$$

Since the injection frequency  $w_{inj}$  differs slightly from the desired frequency, assume  $w_{inj}$  is at an offset  $\Delta w$  with respect to the carrier frequency  $w_o$ :

$$w_{inj} = w_o + \Delta w \tag{2.11}$$

Where  $\Delta w$  is defined as (2.12).

$$\Delta w = \frac{w_o}{2Q} S \sin \theta, \quad Q = \frac{w_o}{2} \frac{d\phi}{dw} \tag{2.12}$$

When  $I_{inj} \ll I_{osc}$ , the lock range is given as (2.13)

$$w_L = \frac{w_o}{2Q}S \tag{2.13}$$

According to (2.11) and (2.13), the injection frequency range that allows the oscillator to be locked can be estimated as (2.14)

$$w_o - w_L \le w_{inj} \le w_o + w_L \tag{2.14}$$

The noise shaping can be calculated using Leeson's equation as presented in [69], in which the noise transfer function at an offset frequency  $\Delta w$  is expressed as (2.15)

$$\left| \frac{Y}{N} \left( j(w_o + \Delta w) \right) \right|^2 = \frac{1}{4Q^2} \left( \frac{w_o}{\Delta w} \right)^2 \tag{2.15}$$

Where Y indicates the output signal while N is the noise signal that is input to the ring oscillator. Therefore, when the injection frequency of the ring oscillator is set within the lock range, the noise function can be rewritten as (2.16)

$$\left|\frac{Y}{N}(jw_{inj})\right|^2 = \frac{1}{S^2} \tag{2.16}$$

As (2.16) indicates, the phase noise power of the oscillator is inversely proportional to the square of the injection intensity (*S*). Therefore, the phase noise in the RO-VCO can be improved with strong injection [69, 70].
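One way to see how (2.16) follows from (2.15) is to evaluate the noise function at an offset equal to the lock range, taken here as $w_o S / (2Q)$, the largest offset that (2.12) allows. The Python sketch below does this numerically. The oscillator frequency, quality factor and injection intensity are hypothetical values, and evaluating at the lock range edge is an interpretation of the text rather than a step quoted from [69].

```python
import math


def lock_range(w_o: float, q: float, s: float) -> float:
    """Lock range for weak injection: w_o * S / (2Q)."""
    return w_o * s / (2.0 * q)


def noise_power_gain(w_o: float, q: float, delta_w: float) -> float:
    """Noise function |Y/N|^2 at offset delta_w, following (2.15)."""
    return (1.0 / (4.0 * q ** 2)) * (w_o / delta_w) ** 2


# Hypothetical oscillator parameters, for illustration only.
W_O = 2.0 * math.pi * 10e9  # 10GHz carrier in rad/s
Q = 1.5                     # effective quality factor of the ring
S = 0.1                     # injection intensity I_inj / I_osc

w_l = lock_range(W_O, Q, S)
gain_at_lock_edge = noise_power_gain(W_O, Q, w_l)

print(f"lock range = +/- {w_l / (2.0 * math.pi) / 1e6:.1f} MHz")
print(f"|Y/N|^2 at lock edge = {gain_at_lock_edge:.3f}, 1/S^2 = {1.0 / S ** 2:.3f}")
```

The two printed noise values agree, and doubling the injection intensity in this model reduces the noise power by a factor of four.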

As addressed previously, the large voltage to frequency gain  $K_{vco}$  makes the RO-VCO vulnerable to noise, which results in inherently poor phase noise performance. In [58], it is demonstrated that the in-band noise power of the PLL is proportional to the changes of  $K_{vco}$ . Moreover, a large  $K_{vco}$  in the PLL amplifies the influence of process, voltage and temperature (PVT) variations [64]. Therefore, much research has focussed on reducing the sensitivity by scaling the gain of the VCO. In [71] and [72], a dual-tuning method is introduced. However, the frequency tuning range is limited to only several hundred MHz. In addition to this, a digital switching method accompanying dual frequency tuning is presented in [64]. The proposed scalable  $K_{vco}$  with linear tuning delay cell is shown in fig. 2-11.

A latch bank is embedded within an NMOS transconductance pair, loaded with a PMOS diode pair. The output impedance is varied linearly through the PMOS diode, and is inversely proportional to the changes of  $V_{coarse}$ . The scalable  $K_{vco}$  is achieved by digitally selecting latches in the latch bank to connect with the delay cell. The output frequency band is regulated by  $V_{coarse}$ , and in each frequency band a different  $K_{vco}$  can be selected via the digital control bits. However, the scale resolution of  $K_{vco}$  is different in different frequency bands even with the same digital control signals. Furthermore, increasing the resolution of the scalable  $K_{vco}$  requires more digital bits and a larger number of latches, which will certainly degrade the achievable frequency of the VCO. Moreover, an RO-VCO with digital control needs a digital conversion circuit in the PLL, which increases the complexity of the PLL design.



Fig. 2 - 11 Diagram of Scaling Kvco Delay Cell [64]

# 2.4 Conventional Phase Locked Loop (PLL)

If the VCO is the 'heart' of the clock generation system, then the PLL is the 'body'. To increase stability and controllability, the VCO is required to operate within a PLL. A typical PLL topology is illustrated in fig. 2-12.



Fig. 2 - 12 Linear Model of PLL

A single-loop PLL basically consists of four building blocks: a phase detector (PD) or phase frequency detector (PFD), a loop filter (LF), a voltage-controlled oscillator (VCO) and a frequency divider (1/N). In each clock cycle, the PD/PFD compares the phase of the reference frequency ($f_{ref}$) with that of the divided frequency ($f_{div}$). The resulting phase error is filtered by the LF to generate the control voltage of the VCO, which adjusts its output frequency ($f_{osc}$) accordingly. Meanwhile, $f_{osc}$ is forwarded to the frequency divider with a programmable dividing ratio (N), which creates $f_{div}$ as the feedback signal to the PD/PFD and thus closes the loop.

# **2.4.1** Type I PLL

The conventional PLL can be categorized into two typical structures, the type I and type II.



Fig. 2 - 13 Operation of XOR Phase Detector

The type I PLL is also called the XOR PLL, as the phase detector used in the loop is an exclusive OR gate. Fig. 2-13 indicates the function of the XOR phase detector. As can be observed from this diagram, the width of the output pulse is proportional to the phase difference between the two input signals. The maximum phase difference ($\Delta \emptyset$) therefore occurs when the two inputs are shifted by one half cycle ($\pi$), from which (2.17) can be obtained.

$$V_{PDout} = VDD \cdot \frac{\Delta \emptyset}{\pi} = K_{PD} \cdot \Delta \emptyset, \quad K_{PD} = \frac{VDD}{\pi} \tag{2.17}$$

where $K_{PD}$ is defined as the gain of the phase detector. When the output pulse train has a 50% duty cycle, the PLL is said to be in lock, where $V_{PDout} = VDD/2$.
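The linear relationship in (2.17) can be checked numerically. The following is a minimal sketch, assuming ideal square waves with a 50% duty cycle; the function name, the supply value of 1.2 V and the sample count are illustrative assumptions and do not come from the referenced designs.

```python
import math

def xor_pd_average(delta_phi, vdd=1.0, samples=3600):
    """Average XOR output for two ideal 50% duty-cycle square waves.

    delta_phi is the phase offset in radians (0..pi). vdd and samples are
    illustrative assumptions, not values taken from the text.
    """
    high = 0
    for k in range(samples):
        theta = 2 * math.pi * (k + 0.5) / samples            # mid-point sample of one period
        a = (theta % (2 * math.pi)) < math.pi                # first input: high for half a cycle
        b = ((theta - delta_phi) % (2 * math.pi)) < math.pi  # second input: delayed by delta_phi
        high += (a != b)                                     # XOR is high when the inputs differ
    return vdd * high / samples

if __name__ == "__main__":
    vdd = 1.2                      # assumed supply voltage
    k_pd = vdd / math.pi           # phase detector gain from (2.17)
    for dphi in (math.pi / 4, math.pi / 2, math.pi):
        print(f"dphi={dphi:.3f} rad  simulated={xor_pd_average(dphi, vdd):.3f} V  "
              f"K_PD*dphi={k_pd * dphi:.3f} V")
```

At a phase offset of $\pi/2$ the averaged output sits at VDD/2, which corresponds to the locked condition described above.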

Fig. 2-14 is the block diagram of a conventional type I PLL.



Fig. 2 - 14 Conventional Type I PLL

As can be seen, the loop filter of the type I PLL is simply an RC low pass filter, so the transfer function of the loop filter is given as (2.18).

$$K_F = \frac{1}{sRC + 1} \tag{2.18}$$

In addition, the VCO operates as an ideal integrator which provides the transfer function in the form  $K_{vco}/s$ . Therefore, the closed loop transfer function of the type I circuit is given as (2.19)

$$H(s)|_{closed} = \frac{\emptyset_{out}}{\emptyset_{data}} = \frac{K_{PD}K_FK_{vco}}{s + 1/N \cdot K_{PD}K_FK_{vco}} \tag{2.19}$$

Substituting (2.18) into (2.19) yields

$$H(s)|_{closed} = \frac{N K_{PD}K_{vco}}{s^2NRC + sN + K_{PD}K_{vco}} \tag{2.20}$$

From equation (2.20), it can be concluded that the stability of this second order system depends on the values of R, C and $K_{vco}$. From standard control theory, the natural frequency $w_n$ and damping factor $\zeta$ are given as (2.21) and (2.22) respectively

$$w_n = \sqrt{\frac{K_{PD}K_{vco}}{NRC}} \tag{2.21}$$

$$\zeta = \frac{1}{2RCw_n} = \frac{1}{2} \cdot \sqrt{\frac{N}{K_{PD}K_{vco}RC}} \tag{2.22}$$

Increasing the values of *R* and *C* decreases the damping factor, thereby making the whole system less stable. However, increasing *R* and *C* also reduces the loop bandwidth of the PLL, which improves the filtering of the VCO control voltage. The design of the PLL is therefore a trade-off between these parameters.
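The trade-off expressed by (2.21) and (2.22) can be illustrated with a short calculation. This is a minimal sketch; the function name and all component values below are hypothetical and chosen only to show the trend.

```python
import math

def type1_pll_params(k_pd, k_vco, n, r, c):
    """Natural frequency (rad/s) and damping factor of a type I PLL.

    Implements (2.21) and (2.22). k_pd is in V/rad and k_vco in rad/s/V.
    """
    w_n = math.sqrt(k_pd * k_vco / (n * r * c))
    zeta = 1.0 / (2.0 * r * c * w_n)
    return w_n, zeta

if __name__ == "__main__":
    # Hypothetical example values, not taken from any referenced design.
    k_pd = 1.2 / math.pi            # XOR detector gain with VDD = 1.2 V
    k_vco = 2 * math.pi * 1e9       # 1 GHz/V expressed in rad/s/V
    n = 32
    for r, c in ((10e3, 10e-12), (40e3, 10e-12)):
        w_n, zeta = type1_pll_params(k_pd, k_vco, n, r, c)
        print(f"R={r:.0f} ohm  C={c * 1e12:.0f} pF  w_n={w_n:.3e} rad/s  zeta={zeta:.3f}")
```

With these formulas, multiplying the RC product by four halves both $w_n$ and $\zeta$, which is the stability penalty described above.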

# 2.4.2 Type II PLL

Compared with type I PLLs, one of the differences of the type II PLL is that the XOR gate is replaced by a phase/frequency detector (PFD) to extract the phase shift between the divided frequency and the reference.



Fig. 2 - 15 Schematic Diagram of Phase Frequency Detector (PFD)

The structure of the traditional PFD is shown in fig. 2-15. By comparing the leading edges of the two inputs, the PFD generates an UP or DOWN pulse to indicate which input signal is faster. When both pulses remain low, the loop is locked.
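The edge-comparison behaviour can be sketched as a three-state machine. The model below is a simplified behavioural assumption (ideal, delay-free reset and no dead zone) rather than the circuit of fig. 2-15, and the function name is hypothetical.

```python
def pfd_net_up_time(ref_edges, div_edges):
    """Net UP time minus DOWN time of an idealised tri-state PFD.

    ref_edges and div_edges are lists of rising-edge times. The state is
    +1 while UP is asserted, -1 while DOWN is asserted and 0 when both
    outputs are low. A positive result means the reference leads.
    """
    events = sorted([(t, +1) for t in ref_edges] + [(t, -1) for t in div_edges])
    state, last_t, net = 0, None, 0.0
    for t, step in events:
        if last_t is not None:
            net += state * (t - last_t)            # accumulate time spent in UP or DOWN
        state = max(-1, min(1, state + step))      # reference edge moves up, divider edge moves down
        last_t = t
    return net

if __name__ == "__main__":
    ref = [0.0, 10.0, 20.0]
    late_div = [2.0, 12.0, 22.0]       # divided clock lags the reference by 2 time units
    print(pfd_net_up_time(ref, late_div))   # positive: UP pulses dominate
    print(pfd_net_up_time(late_div, ref))   # negative: DOWN pulses dominate
    print(pfd_net_up_time(ref, ref))        # zero: both outputs stay low (locked)
```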



Fig. 2 - 16 Block Diagram of Conventional Type II PLL

Fig. 2-16 shows the conventional structure of a type II PLL. A charge pump (CP) is placed between the PFD and the loop filter to convert the frequency/phase difference from a pulse signal into a current signal. Because of this characteristic, the type II PLL is also called a charge pump PLL. Moreover, as the maximum phase difference $\Delta\emptyset$ between the divided frequency and the reference is $2\pi$, the charging and discharging current provided by the CP is scaled by the phase difference, which gives the overall output current of the CP as (2.23)


$$I_{PFD\_I} = \frac{I_{pump} - (-I_{pump})}{4\pi} \cdot \Delta \emptyset = K_{PFD\_I} \cdot \Delta \emptyset, \quad K_{PFD\_I} = \frac{I_{pump}}{2\pi} \tag{2.23}$$

where $K_{PFD\_I}$ is the combined gain of the PFD and CP. Compared with the type I PLL, the type II PLL has a larger acquisition range.

Another difference between the type I and the type II PLLs is the loop filter. The type II PLL uses a second order low pass filter, the transfer function of which is given by (2.24)

$$K_F = \frac{sRC_1 + 1}{s^2RC_1C_2 + s(C_1 + C_2)} \tag{2.24}$$

In general, $C_2$ is set by experience to about one tenth (or even less) of $C_1$ for stability. Neglecting $C_2$ (i.e. $C_2 \ll C_1$) simplifies equation (2.24) to (2.25)

$$K_F = \frac{1 + sRC_1}{sC_1} \tag{2.25}$$

Therefore, the closed loop transfer function of type II PLL can be obtained as (2.26).

$$H(s)|_{closed} = \frac{\emptyset_{out}}{\emptyset_{data}} = \frac{\frac{K_{PFD\_I}K_{vco}}{C_1}(1+sRC_1)}{s^2 + s(\frac{RK_{PFD\_I}K_{vco}}{N}) + \frac{K_{PFD\_I}K_{vco}}{NC_1}} \tag{2.26}$$

Consequently, the natural frequency $w_n$ and damping factor $\zeta$ are given as (2.27) and (2.28) respectively.

$$w_n = \sqrt{\frac{K_{PFD\_I}K_{vco}}{NC_1}} = \sqrt{\frac{I_{pump}K_{vco}}{2\pi NC_1}} \tag{2.27}$$

$$\zeta = \frac{w_n}{2} R C_1 = \frac{R}{2} \sqrt{\frac{I_{pump} K_{vco} C_1}{2\pi N}} \tag{2.28}$$
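Equations (2.27) and (2.28) can be evaluated directly to see how the charge pump current and the filter components move the loop dynamics. The sketch below uses hypothetical component values, and the function name is an assumption introduced for illustration.

```python
import math

def type2_pll_params(i_pump, k_vco, n, r, c1):
    """Natural frequency (rad/s) and damping factor of a charge pump PLL.

    Implements (2.27) and (2.28) with C2 neglected. i_pump is in A and
    k_vco in rad/s/V.
    """
    k_pfd = i_pump / (2.0 * math.pi)               # PFD/CP gain from (2.23)
    w_n = math.sqrt(k_pfd * k_vco / (n * c1))
    zeta = 0.5 * w_n * r * c1
    return w_n, zeta

if __name__ == "__main__":
    # Hypothetical example values, not taken from any referenced design.
    k_vco = 2 * math.pi * 1e9
    for i_pump, r in ((100e-6, 5e3), (400e-6, 5e3), (100e-6, 10e3)):
        w_n, zeta = type2_pll_params(i_pump, k_vco, n=32, r=r, c1=50e-12)
        print(f"I_pump={i_pump * 1e6:.0f} uA  R={r:.0f} ohm  "
              f"w_n={w_n:.3e} rad/s  zeta={zeta:.3f}")
```

In this model, quadrupling $I_{pump}$ doubles both $w_n$ and $\zeta$, while doubling R doubles $\zeta$ and leaves $w_n$ unchanged.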

Compared with the damping factor provided by a type I PLL, the advantage of the type II PLL is that increasing R and C will not degrade the system stability. Furthermore, a large loop bandwidth will improve the immunity to ripple fluctuation on the VCO's control voltage. However, in a practical implementation, the on-chip capacitor cannot be made arbitrarily large. Not only does a large on-chip capacitor occupy too much chip area, which makes the signal wiring and integration more complex, but it may also cause metal density violations of the design rules, especially in more advanced CMOS process nodes. Although active loop filters have been employed to reduce the large area cost of an on-chip capacitor, the active noise induced by the opamp limits the usage of this approach [73, 74]. Digital loop filters, in turn, suffer from quantization noise due to limited frequency resolution [75-77]. In that case, $I_{pump}$ can be used to trade off the system stability against the loop bandwidth. For this reason, adaptive bandwidth has been applied to many state-of-the-art type II PLL designs, such as the work presented in [78-81]. In [79] for example, the charge pump current is proportionally scaled with the VCO's regulated control current $I_{bias}$, while the R of the loop filter is varied inversely with $I_{bias}$. Moreover, other PLLs that incorporate adaptive biasing are proposed in [80, 81]. Using this approach, the charge pump current is linked to the reference frequency and the small signal resistance of a diode-connected PMOS. Compared with the type I PLL, the type II PLL is more attractive to the designer for use in a clock generation system.

# 2.5 Alternative PLL Topologies

Based on the two conventional PLL topologies, other PLL architectures have been developed to accommodate different types and characteristics of VCO and to improve overall system performance. For example, for a wide tuning range RO-VCO, a dual tuning structure is widely applied to reduce the sensitivity of the PLL to noise generated by the PLL's building blocks [82]. Moreover, to improve the in-band noise performance, the injection locking PLL also offers a dual control structure which combines a conventional charge pump PLL with an additional loop for providing injection pulses [83, 84]. In this section, different types of PLL topologies are introduced.

### 2.5.1 Dual Tuning PLL

In most state-of-the-art PLLs, the dual tuning structure has been widely applied, especially when used with an RO-VCO. The key principle of dual control is to create two control paths within the PLL, each path taking charge of a separate control function of the VCO. One topology is shown in fig. 2-17.



Fig. 2 - 17 Topology of Dual Control PLL

The most traditional usage incorporates dual tuning of the VCO, either controlled by a regulated power supply [85] or by an additional fine control signal with a *Gm*-C integrator or analogue filter in the path [82, 86, 87]. In [87], a dual-control PLL structure with an analogue *Gm*-C integrator on the coarse tuning path is discussed. The fine control path consists of a type II PLL, the structure of which is shown in fig. 2-18.



Fig. 2 - 18 Conventional Dual-tuning PLL with Analogue Filter

To provide enough phase margin for the operational transconductance amplifier (OTA) that is used in the filter, the load capacitor has to be large enough to make the bandwidth of the coarse tuning path extremely narrow, roughly 10 to 100Hz as reported in [88], or alternatively the $K_{vco}$ on the fine control path has to be made small. However, either of these methods will lead to stability problems. Therefore, a digitally stabilized method is applied to the dual tuning PLL [49, 87]. The analogue integrator in the coarse tuning path is replaced by a digital integrator and a digital-to-analogue converter (DAC). Although this removes the challenge of designing the analogue filter and saves considerable chip area, quantization noise is unavoidable, as in all digitalized PLLs. Moreover, the power consumption and the system complexity are greatly increased by the additional digital circuits.

In addition to the traditional dual-tuning PLL architecture, injection locking techniques are also widely applied with dual control PLLs [53, 83, 88-91]. As introduced in section 2.3.3 (fig.2-10(b)), a pulse transistor is placed within a pseudo-differential delay cell, which provides the dual control functionality to the VCO. The conventional injection locking PLL is shown in fig. 2-19.



Fig. 2 - 19 Conventional Injection Locking PLL

The injection pulse is generated from another path by a pulse generator. According to [88], the relation between injection frequency and output frequency of the VCO can be written as (2.29)

$$f_{inj} = \frac{f_{out}}{N} \tag{2.29}$$

where N is the injection ratio between injection frequency and output frequency. Therefore, the clock edge of the output frequency is calibrated periodically every N/2 cycles to correct the static frequency shift of the VCO. Within the injection locking range, the phase noise of the PLL is constrained to $\mathcal{L}_{inj} + 20 \log_{10} N$, where $\mathcal{L}_{inj}$ is the initial noise floor of the injection signal. However, although the injection locking PLL effectively suppresses the phase noise when the loop bandwidth is set within the injection locking range [88], real-time frequency drift can hardly be prevented because of PVT variation in practice [52, 53]. In addition, the structure shown in fig. 2-19 may also suffer from timing issues, as the phase correction for the VCO is controlled by two independent paths [83, 90, 91]. In [90], a pulse positioning circuit is inserted into the injection path as timing control. The pulse-position modulation is achieved by a voltage-controlled delay chain. By varying the control voltage, the arrival position of the injection pulse at the VCO is dynamically regulated to compensate for the timing delay of the PLL path. Other work [84, 92] re-times the injection signal by applying sub-sampling PLLs or by using an adjustable timing phase/frequency detector (PFD).
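The relation (2.29) and the in-band noise bound $\mathcal{L}_{inj} + 20 \log_{10} N$ can be turned into a quick calculation. This is a minimal sketch; the function names and the example numbers are hypothetical and are not taken from [88].

```python
import math

def injection_frequency(f_out_hz, n):
    """Injection frequency for a given output frequency and injection ratio, per (2.29)."""
    return f_out_hz / n

def injection_locked_noise_floor(l_inj_dbc_hz, n):
    """In-band phase noise bound L_inj + 20*log10(N) in dBc/Hz."""
    return l_inj_dbc_hz + 20.0 * math.log10(n)

if __name__ == "__main__":
    f_out = 10e9          # assumed 10 GHz VCO output
    l_inj = -150.0        # assumed noise floor of the injection signal in dBc/Hz
    for n in (2, 4, 10):
        print(f"N={n:2d}  f_inj={injection_frequency(f_out, n) / 1e9:.2f} GHz  "
              f"bound={injection_locked_noise_floor(l_inj, n):.1f} dBc/Hz")
```

Each doubling of the injection ratio raises the bound by about 6 dB, so a high output frequency from a low-frequency reference comes at the cost of in-band noise.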

Apart from the dual tuning RO-VCO based PLL, the dual control structure is also applied to some LC-VCO based PLLs [82, 93, 94]. The dual tuning topology helps to provide a moderate gain $K_{vco}$ for an RO-VCO. However, for LC-VCOs, which have relatively low gain, the serious disadvantage is that additional digital circuitry is required to select ring VCO tuning curves before the PLL starts to operate [82]. Therefore, in [93], a dual control LC-VCO PLL is presented with a digital tuning path that swiftly selects the suitable tuning range through a bank of digitally switched capacitors, while the analogue tuning path sets the output frequency accurately.

# 2.5.2 Multiple Loop PLLs

The multiple loop topology is another approach that is widely applied in PLL design. Compared with dual-tuning PLLs, multiple loop PLLs implement more paths within the system, which to some extent increases the overall system complexity. The advantage is that the multiple loop structure brings more functionality to the PLL, which allows the designer to better trade off phase noise against settling time [95]. Several topologies of multiple-loop PLLs are displayed in fig. 2-20.

In fig.2-20(a), two PLLs are connected in a cascaded structure, in which the reference frequency for PLL2 is provided by the output of PLL1. In fig.2-20(b), a single-band mixer is used to merge the output frequencies of PLL1 and PLL2, where PLL1 creates a controllable frequency and PLL2 provides a fixed carrier frequency. This structure is usually found in wireless transceiver systems [96]. However, the drawback of using a mixer is that an undesired disturbance is created at the output of the PLL. Moreover, due to the limited dynamic range of the mixer, the PLL can only operate in a narrow frequency band. In structures (c) and (d), the mixer is placed inside the primary loop, so the spur power can be attenuated by the low-pass filter and the frequency divider. The structures in fig.2-20(c) and fig.2-20(d) can be found in [96] and [97, 98] respectively.



Fig. 2 - 20 Several Topologies of Multiple-loop PLL

In [88], the multiple loop structure of fig.2-20(a) is implemented using two cascaded injection locking PLLs. As seen in fig.2-21, PLL2 is sub-injection locked to the output of PLL1, which is itself injection locked to the reference frequency. In addition, two pulse generators independently create injection signals at the rising edge of the input to each PLL. Using this structure, the output frequency is realigned every N cycles, where N is the injection ratio in PLL2. However, it requires two delay units to compensate for the timing mismatch between the independent paths, and the added delay has to be large enough to tolerate PVT variation. Furthermore, the injection pulse can only be applied on the rising edge in order to avoid the jitter induced by duty cycle distortion.

In addition to the examples discussed so far, some works have also tried to avoid using mixers in the multiple loop PLL structures of fig.2-20(c) and fig.2-20(d) by employing sub-sampling techniques [60, 99, 100]. Instead of isolating the paths of the PFD and the CP, they isolate the dividing path. However, the drawback of the sub-sampling PLL is that the locking range is quite narrow [87].



Fig. 2 - 21 Multiple Loop PLL with Cascaded Injection Locking

Fig.2-22 shows a multiple loop PLL using two identical VCOs that share the same control voltage [52, 53, 101-103]. The main VCO is an injection-locked circuit and is responsible for generating the desired output frequency. The replica VCO is embedded within a PLL to continually monitor the instantaneous frequency drift and dynamically adjust the control voltage. However, this frequency calibration costs the same amount of power in the replica VCO as in the main VCO. Moreover, an unavoidable drawback is the mismatch between the two VCOs, which limits the precision of the frequency calibration [83].



Fig. 2 - 22 Multiple Loop PLL with Two Identical VCOs

# 2.5.3 Multiplying Delay Locked PLL

An alternative PLL architecture to generate clock signals with better suppression of phase noise is to use the delay-locked loop (DLL). Fig.2-23 shows the block diagram of a typical DLL. Compared with the PLL structure, the DLL replaces the VCO by a voltage controlled delay line (VCDL).



Fig. 2 - 23 Block Diagram of Typical DLL

The phase of the output clock is varied by the delay. Although the DLL can significantly improve the random jitter of the clock, additional circuitry is still required to guarantee correct operation [52]. References [103-105] integrate a DLL structure within a multiple loop as presented in fig. 2-24.



Fig. 2 - 24 DLL with Multiple Loop

As can be seen from fig. 2-24, the replica VCO in fig.2-22 is replaced with a voltage controlled delay line (VCDL). The VCDL and the main injection locking VCO share the same control voltage. As the delay cells used in the VCDL and the VCO are identical, the unit delay of each delay cell is the same once the DLL is locked. Instead of calibrating the frequency drift of the VCO, the DLL defines the output frequency through the unit delay time and the number of delay cells used in the VCO. Therefore, the output frequency becomes independent of PVT variation. However, the mismatch between the VCDL and the VCO still exists, which requires an additional calibration step.
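How the unit delay fixes the output frequency can be sketched with a short calculation. The model below rests on two assumptions that are not stated in the text: the VCDL locks with a total delay of exactly one reference period, and the ring VCO oscillates with a period of twice its total cell delay. The function name and the cell counts are hypothetical.

```python
def dll_defined_frequency(f_ref_hz, vcdl_cells, vco_cells):
    """Unit delay and VCO frequency set by a locked DLL sharing the control voltage.

    Assumptions (illustrative only):
      * the VCDL total delay equals one reference period when locked;
      * the ring VCO period equals 2 * vco_cells * unit delay.
    """
    t_d = 1.0 / (f_ref_hz * vcdl_cells)        # unit delay imposed by the locked VCDL
    f_out = 1.0 / (2.0 * vco_cells * t_d)      # ring oscillation frequency for that unit delay
    return t_d, f_out

if __name__ == "__main__":
    t_d, f_out = dll_defined_frequency(f_ref_hz=1e9, vcdl_cells=8, vco_cells=4)
    print(f"unit delay = {t_d * 1e12:.1f} ps, output frequency = {f_out / 1e9:.2f} GHz")
```

Under these assumptions the output frequency reduces to $f_{ref} \cdot M / (2n)$ for M VCDL cells and n VCO cells, so it depends only on the reference and the cell counts, not on the absolute cell delay.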



Fig. 2 - 25 Multiplying Delay-locked Loop (MDLL)

Fig. 2-25 shows a multiplying DLL (MDLL) architecture, which is presented in [106-109]. The input to the VCDL is selected by a multiplexer controlled by additional logic circuitry. By changing the input to the VCDL, the output clock edge of the DLL alternates periodically between the clock edge of the VCDL and the clean edge of the reference frequency. In previous work [106, 109], the MDLL has been shown to provide better jitter performance. However, at high frequencies, substituting the edges of a high-speed VCO consumes relatively large power because of the digital logic and frequency divider. Consequently, the MDLL is limited in frequency bandwidth.

# 2.5.4 PLL-based CDR

At the receiver of either an electrical link or an optical link, the received data is both asynchronous and noisy. For subsequent processing, the timing information must be extracted from the data stream for synchronous operation. Moreover, the data needs to be retimed in order to remove the jitter accumulated during transmission. Therefore, clock and data recovery (CDR) plays a significant role in the receiver and is necessary for clock extraction and data retiming. Conventional CDRs are commonly based on the structure of the PLL, which also includes a phase detector (PD), charge pump, loop filter and VCO. In addition, the recovered data is synchronized by a D-type flip-flop (DFF) which is clocked by the extracted clock signal. Fig. 2-26 depicts the traditional CDR topology.



Fig. 2 - 26 Traditional CDR Topology

In contrast to the typical PLL, the input to the CDR is the received and amplified random data, which requires the PD to have a wide locking range to tolerate multiple data rates. There are two basic types of PD that have been commonly applied to CDRs which are the Hogge PD and the Alexander PD (also called the bang-bang PD).



Fig. 2 - 27 (a) Hogge PD (b) Alexander PD (Bang-Bang PD)

As shown in fig.2-27(a), the Hogge PD is composed of two DFFs, clocked by the rising and falling edges of the clock respectively, and two XOR gates. Path Y produces proportional pulses related to the phase difference between the input data and the clock, while path X produces reference pulses with a width of half a clock cycle ($T_{ck}/2$). To sample the data correctly, the clock edge should be placed at the middle of the data, which means the proportional pulse must have the same width as the reference pulse under locked conditions. However, due to the propagation delay of the DFF, additional delay circuitry is required to compensate the timing either in the proportional path [110] or the reference path. In the Alexander PD (bang-bang PD), which is illustrated in fig.2-27(b), the random data is sampled three times by three consecutive clock edges so that a transition of the data can be detected. By passing the samples through XOR gates, it can be determined whether the clock leads or lags the data. In recent work [111-114], the bang-bang PD has been widely used in high-speed applications. However, one of the drawbacks of the bang-bang PD is that a linearized model of a CDR using it is not possible because its characteristic is discrete, which means that the edge of the sampling clock can only approximately approach the middle of the data, which is the best sampling point [115]. Therefore, a slight phase error always exists even after the PD has locked.
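The early/late decision of the bang-bang PD can be written as a small truth-table function. This is a behavioural sketch under the usual textbook assumption that the first and third samples are taken near the data centre and the second near the data edge; the function and variable names are hypothetical and the mapping to fig.2-27(b) is not verified against the schematic.

```python
def alexander_pd(s1, s2, s3):
    """Bang-bang decision from three consecutive samples (each 0 or 1).

    s1: previous data sample, s2: edge sample, s3: current data sample.
    Returns "late" if the clock lags the data, "early" if it leads,
    and "none" if no unambiguous transition is observed.
    """
    x = s1 ^ s2            # transition seen between the first and second sample
    y = s2 ^ s3            # transition seen between the second and third sample
    if x and not y:
        return "late"      # the edge sample already shows the new bit
    if y and not x:
        return "early"     # the edge sample still shows the old bit
    return "none"          # no transition, or an inconsistent pattern such as 0-1-0

if __name__ == "__main__":
    for samples in ((0, 1, 1), (0, 0, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 0)):
        print(samples, alexander_pd(*samples))
```

The output carries only the sign of the phase error, never its magnitude, which is the discrete characteristic referred to above.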

A functional CDR architecture must include frequency and phase acquisition to guarantee locking despite PVT variation of the VCO frequency. Moreover, the recovered data should be retimed inside the PD to avoid systematic skew. The conventional PLL-based CDR that is presented in [113, 114, 116] is depicted in fig.2-28.



Fig. 2 - 28 Conventional PLL-based CDR

Due to the limited capture range of the PD, and to avoid false locking to harmonic oscillation frequencies, a PLL is employed to initialize the VCO and lock it to the correct operating frequency [110, 116, 117]. First, the CDR enables frequency acquisition to approach the desired frequency range. Once the frequency error drops to a sufficiently small value (within the capture range of the PD), a lock detector is activated and disables the frequency acquisition loop. The phase tracking loop then takes over to lock to the data. However, such structures face a disturbance when switching from frequency to phase acquisition, which may lead to a large jump in phase that falls outside the capture range of the PD. In addition, an accurate and adjustable reference clock is required for frequency acquisition, which complicates the implementation of the CDR [110]. Therefore, a reference-less CDR structure has been realized in recent papers [110, 118]. The PFD used in the PLL is replaced by a frequency detector (FD) which provides a DC level (after integration by a loop filter) and drives the VCO towards the input data rate. With this structure, the external reference clock is removed. It should be noted that in the structure shown in fig.2-29, the FD and PD paths share a single control line to drive the VCO. Therefore, the fluctuation of the VCO control voltage needs to be very small to avoid excessive frequency drift and failure to lock, which is especially true for the RO-VCO. In [118-120], a digital FD path is implemented to extend the frequency acquisition range. In addition, a dual-tuning structure is also employed to separate the coarse tuning and fine tuning paths and so minimize the impact of ripple on the control line [121]. Other dual control CDR architectures can be found in [122, 123].



Fig. 2 - 29 Full Rate Referenceless CDR

In addition to those papers, the injection locking technique is also incorporated to provide dual control CDRs. In [123, 124], the edge information of the input data is extracted by an XOR delay unit to generate pulses as the injection frequency. Two identical VCOs are cascaded to purify the clock while the injection pulse is input to the front VCO. Moreover, the control voltage for the two VCOs is generated by a reference PLL with a duplicated VCO. However, the mismatch problem still exists between the cascaded VCO and the duplicated VCO. Furthermore, the other drawbacks of this structure are system complexity and high power dissipation [122].

Another aspect must also be considered when designing a CDR. In contrast to the PLL in the transmitter, the CDR at the receiver deals with random data. A higher data rate demands a higher VCO frequency and a faster phase detector. Therefore, many groups have focused on half-rate or quarter-rate CDR design [112, 118]. In [113], a quarter-rate PLL-based CDR is presented which can deal with a 40Gbps data rate. Similar performance can also be found in [112], which integrates a quarter-rate CDR within an optical receiver to accommodate input data rates from 38Gbps to 43Gbps.

# 2.6 Summary

In this chapter, a review of modern high-speed electrical and optical transceiver architectures is presented, along with the state-of-the-art performance that has been published in recent papers. Moreover, the core modules (VCO and PLL) for generating a clock signal for these transceivers are also reviewed in detail from the perspective of performance (frequency bandwidth and noise) and topologies (conventional and alternative).

As the high-speed electrical link gradually reaches the limitation of the channel as the data rate scales, more challenges are posed to the designer in extending the circuit bandwidth and providing a high-precision timing circuit. Optical transceivers provide a potential solution to the I/O bandwidth issue and have been applied in short reach communication links. Regarding the clock generation systems in recent publications on high-speed electrical/optical transceivers, two conclusions can be made:


- 1) Traditionally, almost all the clock generation systems (PLL and CDR) in these transceiver links are based on LC tank VCOs. This is because the LC-VCO has superior phase noise performance and relatively high oscillation frequency. However, an inevitable consequence is that the bandwidth is limited by the narrow tuning range of the clock.
- 2) For high-speed transceiver systems (both electrical and optical), the clock usually runs at some sub-rate of the transceiver speed to alleviate the challenge of designing a clock tree and corresponding buffers, which consume significant additional power. Table 2-1 lists several transceiver examples that have been presented in recent papers. It can be seen that the clock speed is usually configured at half or quarter rate with respect to the data rate of the transceiver.

According to these recent papers, the data rate of the transceiver ranges from 20Gbps up to 56Gbps. However, none of the clock generation systems that have been presented is capable of dealing with such a wide range. Therefore, considering multiple protocol compatibility in next generation silicon photonics transceiver systems, an extremely wide band clock generation unit with a frequency range of over 30GHz is necessary.
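The sub-rate relationship between data rate and clock speed can be made explicit with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the data rates are the ones quoted in the text and in Table 2-1.

```python
def clock_frequency_ghz(data_rate_gbps, sub_rate):
    """Clock frequency in GHz for a data rate in Gbps.

    sub_rate is 1 for full-rate, 2 for half-rate and 4 for quarter-rate clocking.
    """
    return data_rate_gbps / sub_rate

if __name__ == "__main__":
    rates = {"full": 1, "half": 2, "quarter": 4}
    for data_rate in (20, 28, 40, 56):                 # data rates quoted in the text
        row = ", ".join(f"{name}: {clock_frequency_ghz(data_rate, div):g} GHz"
                        for name, div in rates.items())
        print(f"{data_rate} Gbps -> {row}")
```

For example, a 56Gbps link clocked at half rate needs 28GHz, while a 28Gbps link clocked at quarter rate needs only 7GHz, which illustrates how widely the required clock frequency can vary across protocols and clocking schemes.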

Table 2 - 1 Examples of Transceiver with Corresponding Clock in Present Research Works

| Reference | Transceiver Architecture | Data Rate | Clock Type | Clock Speed | Clock Jitter (rms) | Area |
|-----------|--------------------------|-----------|------------|-------------|--------------------|------|
| [4] | Optical | 40Gbps | LC-PLL | 20GHz | 170fs | 0.81mm<sup>2</sup> |
| [34] | Electrical | 38.8-42Gbps | LC-PLL | 10GHz | 319fs | 0.63mm<sup>2</sup> |
| [39] | Electrical | 56Gbps | LC-PLL | 28GHz | 508fs | 1.14mm<sup>2</sup> |
| [41] | Optical | 20-22Gbps | LC-PLL | 10GHz | - | 44.8mm<sup>2</sup> |
| [125] | Electrical | 28Gbps | Two LC-PLLs | PLL1: 21-28.5GHz; PLL2: 14.5-21GHz | 250fs | 0.83mm<sup>2</sup> |
| [126] | Optical | 32Gbps | RO-PLL | 4-11GHz | 642fs | 0.003mm<sup>2</sup> |
| [26] | Optical | 28Gbps | - | 7GHz | 350fs | 3.6mm<sup>2</sup> |
| [127] | Electrical | 40Gbps | RO-PLL | 10GHz | 3610fs | 0.086mm<sup>2</sup> |
| [128] | Electrical | 40Gbps | LC-PLL | 10GHz | 230fs | 0.6mm<sup>2</sup> |

# Chapter 3 Ultra-wide Tuning Range CMOS Ring-based VCO with Inductor Peaking

# 3.1 Introduction

Two types of voltage controlled oscillator (VCO) are used in various analogue and digital applications. Traditionally, the clock generation systems [Phase-locked Loop (PLL) and Clock Data Recovery (CDR)] in high-speed transceivers are dominated by the LC (inductor-capacitor) tank VCO, which benefits from inherently excellent phase noise performance and a relatively high oscillation frequency. However, one significant drawback of the LC-VCO is its relatively small tuning range. In contrast, the ring-based VCO (RO-VCO) is considered a complementary solution owing to its wide tuning range.

In this chapter, we take an innovative approach to combine the advantages of the LC-VCO and the RO-VCO and propose a new VCO structure incorporating a peaking inductor to achieve both a high oscillation frequency and a wide frequency tuning range. This chapter starts with an explicit analysis of the conventional RO-VCO, from which a new RO-VCO circuit topology is proposed. To validate the functionality of the proposed design, four design examples are fabricated in two different CMOS process nodes, IBM 130nm and TSMC 65nm.

# 3.2 Analysis on Bandwidth Enhancement of VCO

It is well known that any ring-based oscillator must satisfy the two Barkhausen criteria for oscillation (introduced in Appendix A.1), namely, the phase shift around the total feedback loop must be a multiple of 360° at the oscillation frequency and the magnitude of the loop gain at that frequency must be unity.



Fig. 3 - 1 Typical Three Stage RO-VCO

Therefore, for a typical three-stage RO-VCO as shown in fig.3-1, each delay cell can be treated as a common source amplifier whose transfer function is given as (3.1)

$$H(s) = \frac{V_{out}}{V_{in}} = \frac{-g_{m1}R_{eff}}{sCR_{eff}+1} \tag{3.1}$$

The capacitance C denotes all the capacitance associated with the $V_{out}$ node, and a variable resistor $R_{eff}$ is used to represent the effective resistance of the transistor $M_2$, whose gate voltage is controlled by the external signal $V_{ct}$. From (3.1), it can be seen that the 3-dB bandwidth of this amplifier is $1/(2\pi R_{eff}C)$, and frequency tuning is realized by changing the effective resistance $R_{eff}$ of transistor $M_2$. In addition to the variable resistance $R_{eff}$, the other factor that limits the maximum achievable frequency of the VCO is the capacitance C, which includes the gate capacitance and the parasitic capacitance associated with wiring.
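Applying the Barkhausen criteria to a ring of identical single-pole stages described by (3.1) gives a small-signal estimate of the oscillation frequency and of the minimum stage gain. The sketch below assumes an odd number of identical inverting stages and a purely linear model, so it only approximates a real large-signal ring oscillator; the function name and the component values are hypothetical.

```python
import math

def ring_oscillation_condition(r_eff, c, n_stages=3):
    """Small-signal oscillation frequency and minimum stage gain of a ring.

    Each inverting stage must contribute 180/n degrees of extra phase lag from
    its pole so that the total loop phase is a multiple of 360 degrees.
    Returns (f_osc in Hz, minimum g_m1 * R_eff, stage 3-dB bandwidth in Hz).
    """
    phase_per_stage = math.pi / n_stages
    omega_osc = math.tan(phase_per_stage) / (r_eff * c)   # arctan(w * R * C) = pi / n
    f_osc = omega_osc / (2.0 * math.pi)
    min_gain = 1.0 / math.cos(phase_per_stage)            # equals sqrt(1 + (w * R * C)^2)
    f_3db = 1.0 / (2.0 * math.pi * r_eff * c)
    return f_osc, min_gain, f_3db

if __name__ == "__main__":
    # Hypothetical values: R_eff = 1 kohm, C = 50 fF.
    f_osc, min_gain, f_3db = ring_oscillation_condition(1e3, 50e-15)
    print(f"f_osc = {f_osc / 1e9:.2f} GHz, minimum stage gain = {min_gain:.2f}, "
          f"stage bandwidth = {f_3db / 1e9:.2f} GHz")
```

For three stages, this model places the oscillation frequency at $\sqrt{3}$ times the 3-dB bandwidth of a single stage and requires a low-frequency stage gain $g_{m1}R_{eff}$ of at least 2, which shows why reducing $R_{eff}$ or C directly raises the achievable frequency.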

# 3.2.1 Analysis on Transistor Sizing to Frequency

Two parameters in (3.1), C and $R_{eff}$, can be used to raise the achievable frequency. However, the capacitance C depends heavily on the technology process, which leaves little room for improving the bandwidth. $R_{eff}$, on the other hand, is determined by the size of transistor $M_2$. Equation (3.2) gives the relationship between the transistor dimensions and the corresponding resistance.

$$r_{ds} = \frac{L}{\mu_{eff}C_{ox}W(V_{gs} - V_{th})} \tag{3.2}$$

With a constant control voltage $V_{ct}$, $R_{eff}$ can be decreased by increasing the width of $M_2$ while its length is fixed at the minimum value. To demonstrate the effect of this approach, fig.3-2 plots the oscillation frequency against transistor width. The simulation results are based on the 130nm process node.
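To make the sizing argument concrete, the following sketch evaluates (3.2) and the 3-dB bandwidth implied by (3.1) for a few widths of $M_2$. All numerical values (the mobility-oxide product, the overdrive voltage and the node capacitance) are illustrative assumptions rather than parameters of the 130nm process, and C is held fixed, so the extra capacitance of a wider device is deliberately ignored.

```python
import math

def r_ds(width, length, mu_cox, v_ov):
    """Channel resistance from (3.2); v_ov stands for (Vgs - Vth)."""
    return length / (mu_cox * width * v_ov)

def f_3db(r_eff, c_node):
    """3-dB bandwidth of the single-pole delay cell described by (3.1)."""
    return 1.0 / (2.0 * math.pi * r_eff * c_node)

# Illustrative (assumed) values, not taken from a real process design kit.
MU_COX = 100e-6   # A/V^2, hypothetical mobility x oxide capacitance
V_OV = 0.5        # V, hypothetical overdrive set by V_ct
LENGTH = 0.12e-6  # m, minimum channel length
C_NODE = 50e-15   # F, hypothetical total capacitance at V_out

for width_um in (24, 48, 72):
    r = r_ds(width_um * 1e-6, LENGTH, MU_COX, V_OV)
    print(f"W = {width_um} um: r_ds = {r:.1f} ohm, "
          f"f_3dB = {f_3db(r, C_NODE) / 1e9:.1f} GHz")
```

Under these assumptions, doubling the width halves $r_{ds}$ and doubles the bandwidth; the simulated curves in fig.3-2 rise more slowly because the real node capacitance also grows with width.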



Fig. 3 - 2 Transistor Sizing to Frequency Expansion

Two approaches are investigated. The first increases the width of the PMOS ($M_2$) while the width of the NMOS ($M_1$) is fixed. The second increases the widths of both the PMOS ($M_2$) and the NMOS ($M_1$) at the same time. The frequency increases with transistor width in both approaches. In addition, increasing the width of $M_2$ while keeping $M_1$ fixed raises the frequency faster than changing both $M_1$ and $M_2$. This is because the capacitance of a transistor also increases with its width. For $M_1$, the increase in the capacitance C in (3.1) degrades the frequency, whereas the increased capacitance of $M_2$ has less influence on the frequency. Therefore, from the perspective of frequency enhancement, sizing only the width of $M_2$ is more efficient, and this is the method applied in the following design to achieve a high frequency bandwidth.

# 3.2.2 Analysis on Noise Contribution

As the oscillation frequency is increased, another aspect that cannot be ignored is the noise contribution of each transistor. The in-band noise of the VCO is dominated by flicker noise. The most widely used empirical model for flicker noise is given by (3.3)[129]

$$\overline{V_f^2} = \frac{K_f}{C_{ox}WL} \frac{1}{f} \tag{3.3}$$

where $C_{ox}$ is the oxide capacitance per unit area and $K_f$ is the flicker noise coefficient. It should be noted that the power density of the flicker noise is a function of transistor size, which means that the transistor sizing methodology also affects the transistor's noise contribution. According to (3.3), the flicker noise can be decreased by increasing the size of a transistor.





Fig. 3 - 3 Noise Contribution Comparison (a) Transistor Sizing to Noise Contribution (b) Noise Contribution Scaling with Size Ratio

Fig.3-3 (a) shows the noise contribution as the widths of $M_1$ and $M_2$ are scaled individually. It can be seen that $M_2$ is noisier than $M_1$ at the same transistor size, which has also been addressed in [130]. In addition, to investigate the overall noise contribution, the width of $M_1$ is held constant while only the width of $M_2$ is scaled, with the size ratio ($N_{size}$) between $M_2$ and $M_1$ defined as ($W_p/W_n$). Fig.3-3(b) illustrates how the overall noise contribution scales with $N_{size}$. It can be concluded that the overall noise level is reduced by increasing $N_{size}$.

#### 3.2.3 Analysis on Voltage Tuning Range

According to the above discussion, increasing the width of $M_2$ with a constant width of $M_1$ is an effective way of raising the achievable frequency of a common-source RO-VCO while maintaining a relatively low noise level. However, a critical limitation is the voltage tuning range.

Fig.3-4 (a) shows the tuning curves of oscillation frequency against control voltage for different values of $N_{size}$. As seen in fig.3-4(a), the voltage tuning range shrinks as the size ratio $N_{size}$ increases. This is because the oscillation amplitude falls in the higher frequency range. According to the Barkhausen criteria, the oscillator must meet the requirements for both loop gain and phase shift in order to sustain oscillation. To support this explanation, the gain-frequency response of a single delay stage is investigated for the example of $N_{size}$=3.



Fig. 3 - 4 (a) Control Voltage to Frequency Range (b) Gain-Frequency Response with *Nsize*=3

Fig.3-4 (b) shows the frequency response. Two points should be noted. Firstly, the gain of a single delay stage differs with the control voltage $V_{ct}$: as $V_{ct}$ decreases, the gain of each delay stage drops. Secondly, as indicated in [28], the total phase shift around the feedback loop must be a multiple of $360^{\circ}$ and the magnitude of the loop gain at the oscillation frequency must be unity. For example, in a three-stage oscillator, each delay stage must contribute at least $60^{\circ}$ of phase shift to guarantee an overall phase shift of $180^{\circ}$ and so meet the phase requirement for oscillation. In fig.3-4 (b), the gain at $60^{\circ}$ phase shift is monitored. The results show that the gain is below 0dB (unity gain) when $V_{ct}$ is 0V and 0.3V, with values of -4.02dB and -7.59dB respectively. Comparing the gain-frequency results of fig.3-4(b) with the $N_{size}$=3 curve in fig.3-4(a) shows that the RO-VCO fails to oscillate when $V_{ct}$ is below 0.5V.

Therefore, it is critical to focus on improving the amplitude of a RO-VCO in the high frequency range.
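The start-up condition discussed above can be checked on the single-pole model (3.1). Each stage lags by $\arctan(2\pi f R_{eff} C)$ on top of its inversion, so the $60^{\circ}$ point lies at $f = \sqrt{3}/(2\pi R_{eff}C)$, where the stage gain has fallen to $g_{m1}R_{eff}/2$. The sketch below evaluates this condition; the values of $g_{m1}$, $R_{eff}$ and C are hypothetical and chosen only for illustration.

```python
import math

def stage_gain_at_60deg(gm, r_eff, c_node):
    """Return (frequency, |H|) where the single-pole stage (3.1) lags by 60 degrees."""
    f_60 = math.tan(math.radians(60.0)) / (2.0 * math.pi * r_eff * c_node)
    w_rc = 2.0 * math.pi * f_60 * r_eff * c_node
    gain = gm * r_eff / math.sqrt(1.0 + w_rc ** 2)
    return f_60, gain

# Hypothetical small-signal values for illustration only.
GM = 10e-3       # S, transconductance of M1
C_NODE = 50e-15  # F, total capacitance at V_out

for r_eff in (300.0, 150.0):
    f_60, gain = stage_gain_at_60deg(GM, r_eff, C_NODE)
    verdict = "can start" if gain >= 1.0 else "cannot start"
    print(f"R_eff = {r_eff:.0f} ohm: f_60 = {f_60 / 1e9:.1f} GHz, "
          f"gain = {20.0 * math.log10(gain):.2f} dB -> {verdict}")
```

Lowering $R_{eff}$ moves the $60^{\circ}$ point to a higher frequency but also lowers the gain available there, which is the mechanism behind the lost tuning range in this simplified model.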

## 3.3 Common Source RO-VCO with Inductor Peaking

In order to enhance the achievable bandwidth in high frequency range, an optimally valued inductor is inserted to provide a resonant circuit with the capacitances C. This technique, known as inductor peaking, is a fundamental technique that is commonly adopted for high-speed amplifier design [28].



Fig. 3 - 5 RO-VCO with Peaking Inductor

As shown in fig.3-5, a typical RO-VCO can be modified by inserting an additional inductor L within each delay cell, with the resulting transfer function given by (3.4)

$$H(s) = \frac{V_{out}}{V_{in}} = \frac{-g_{m1}(R_{eff} + sL)}{s^2 CL + s C R_{eff} + 1} \tag{3.4}$$

Because the impedance of an inductor increases with frequency, a resonant peak is induced in the high frequency range and compensates the amplitude loss of the RO-VCO. Applying the same method as in the previous section, the voltage tuning curve and gain-frequency response of the inductive peaking VCO are presented in fig.3-6 (a) and fig.3-6 (b) respectively.
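The effect of the peaking inductor on the start-up condition can be illustrated numerically. The sketch below evaluates (3.4) with and without the inductor, locates the frequency at which the stage lags by $60^{\circ}$ beyond its inversion, and compares the gain at that point. The component values are hypothetical and serve only as an illustration; the bisection search assumes that the lag rises monotonically with frequency, which holds for these values.

```python
import cmath
import math

def stage_h(f, gm, r_eff, c_node, l_peak=0.0):
    """Complex stage response from (3.4); l_peak = 0 reduces it to (3.1)."""
    s = 2j * math.pi * f
    return -gm * (r_eff + s * l_peak) / (s * s * c_node * l_peak + s * c_node * r_eff + 1.0)

def lag_deg(f, gm, r_eff, c_node, l_peak):
    """Phase lag in degrees on top of the 180-degree inversion of the stage."""
    return -math.degrees(cmath.phase(-stage_h(f, gm, r_eff, c_node, l_peak)))

def f_at_60deg(gm, r_eff, c_node, l_peak, f_lo=1e6, f_hi=1e12):
    """Bisect on a log frequency axis for the point where the lag reaches 60 degrees."""
    for _ in range(200):
        f_mid = math.sqrt(f_lo * f_hi)
        if lag_deg(f_mid, gm, r_eff, c_node, l_peak) < 60.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return f_lo

# Hypothetical values for illustration only.
GM, R_EFF, C_NODE, L_PEAK = 10e-3, 150.0, 50e-15, 600e-12

f_plain = f_at_60deg(GM, R_EFF, C_NODE, 0.0)
f_peak = f_at_60deg(GM, R_EFF, C_NODE, L_PEAK)
g_plain = abs(stage_h(f_plain, GM, R_EFF, C_NODE, 0.0))
g_peak = abs(stage_h(f_peak, GM, R_EFF, C_NODE, L_PEAK))

print(f"no inductor : f_60 = {f_plain / 1e9:.1f} GHz, gain = {g_plain:.2f}")
print(f"with peaking: f_60 = {f_peak / 1e9:.1f} GHz, gain = {g_peak:.2f}")
```

With these assumed values the unpeaked stage has a gain below unity at its $60^{\circ}$ point, whereas the peaked stage stays above unity, so the ring can sustain oscillation.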





Fig. 3 - 6 (a) Control Voltage to Frequency Response (b) Gain-Frequency Response with Nsize=3

As can be seen in fig.3-6 (a), the voltage-to-frequency tuning range is extended for every value of $N_{size}$. However, for $N_{size}$=4, only part of the tuning range is recovered and the highest frequencies are still lost. This is because an excessive ratio $N_{size}$ between $M_2$ and $M_1$ reduces the bandwidth of each delay stage through the increased capacitance C, while the resonant peak provides a high impedance only in a narrow frequency band and so cannot fully compensate the insufficient amplitude, which leads to incomplete tuning. Therefore, in a practical inductive peaking VCO design, $N_{size}$ must be chosen as a compromise between achievable frequency and tuning range. In fig.3-6 (b), a resonant peak appears in the frequency response. Thanks to this peak, the loop gain at $60^{\circ}$ phase shift is above unity for all values of $V_{ct}$.

On the other hand, in terms of transfer function, re-organizing (3.4) gives (3.5)

$$H(s) = \frac{V_{out}}{V_{in}} = -g_{m1}R_{eff} \frac{s + 2\zeta \omega_n}{s^2 + 2\zeta \omega_n s + \omega_n^2} \cdot \frac{\omega_n}{2\zeta} \tag{3.5}$$

where the natural frequency $\omega_n$ and damping factor $\zeta$ are given by (3.6) and (3.7) respectively

$$\omega_n = \frac{1}{\sqrt{LC}} \tag{3.6}$$

$$\zeta = \frac{R_{eff}}{2} \cdot \sqrt{\frac{C}{L}} \tag{3.7}$$

Equations (3.6) and (3.7) reveal two facts that can guide the design of inductor-peaked RO-VCOs. Firstly, for a fixed damping factor $\zeta$, the inductance L is proportional to the capacitance C. Thus, for an advanced CMOS process node with smaller gate capacitance, the required peaking inductance can be reduced proportionally. This results in a smaller silicon area and a more compact design. Secondly, for a given design, the smallest (worst) damping factor $\zeta$ occurs at the lowest value of $R_{eff}$, which is also the point at which the VCO operates at its highest frequency. Therefore, decreasing the frequency (the frequency-tuning mechanism) has little negative effect on the loop stability. Furthermore, for practical on-chip monolithic inductors, there is usually a trade-off between the quality factor Q of the inductor and its physical size.
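The first observation can be made explicit by solving (3.7) for L with $\zeta$ and $R_{eff}$ held fixed, and substituting the result into (3.6):

$$L = \frac{R_{eff}^2}{4\zeta^2}\, C, \qquad \omega_n = \frac{1}{\sqrt{LC}} = \frac{2\zeta}{R_{eff}\, C}$$

Under this simplified view, halving the node capacitance halves the required peaking inductance and doubles the natural frequency, which is the scaling trend expected when moving to a more advanced process node.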

We show later in this chapter that the size of the inductor may influence the frequency range, whereas the Q may dominate the phase noise performance of proposed inductive peaking RO-VCO.

## 3.4 Inductive Peaking VCO Design Application

#### 3.4.1 The Research Purposes of Implementation

To demonstrate the ultra-wide tuning range and validate the theoretical analysis of the proposed inductor peaking VCO, four design examples were implemented and fabricated in two different technology processes. Two research questions are investigated using these four examples:

- 1) Whether the inductor peaking VCO is better suited to advanced CMOS process nodes.
- 2) The trade-off between the quality factor Q and the size of the inductor, and its effect on frequency range and phase noise.

#### 3.4.2 Chip Level Design

A full chip consists of two blocks that include the proposed inductive peaking RO-VCO and a  $50\Omega$  output buffer. The top level circuit is illustrated in fig.3-7 (a).



Fig. 3 - 7 Proposed Structure of Design Example (a) Top Circuit Topology (b) Proposed VCO Structure with Inductor Peaking (c) 50Ω Output Buffer

All four design examples are implemented with the same VCO circuit topology, shown in fig.3-7 (b). The inductive peaking VCO contains three single-ended delay stages with $N_{size}$ set to approximately 3 as a trade-off between tuning range and transistor noise contribution. For a fair comparison, the widths of the transistors (M<sub>1</sub> and M<sub>2</sub>) in each design pair are identical, while the lengths in all design examples are set to the minimum value. Low-threshold transistors are used in order to obtain a large transition frequency ($f_T$) and fast switching for high-speed operation. The key parameters of the examples are listed in table 3-1.

Additionally, a self-biasing $50\Omega$ output buffer (fig.3-7(c)) is integrated with the core VCO for test purposes. An on-chip DC blocking capacitor is placed between the VCO and the buffer to reset the DC operating point of the output buffer, and dummy loads are inserted so that each delay stage in the VCO sees a similar load, which avoids unbalanced oscillation in practical measurements.

Table 3 - 1 Key Parameters of Four Design Examples

|                           | IBM 130nm 8RF CMOS |           | TSMC 65nm LP CMOS |         |
|---------------------------|--------------------|-----------|-------------------|---------|
|                           | VCO-1              | VCO-2     | VCO-3             | VCO-4   |
| VDD (V)                   | 1.5                | 1.5       | 1.2               | 1.2     |
| M <sub>1</sub> (W/L) (μm) | 22.5/0.12          | 22.5/0.12 | 20/0.06           | 20/0.06 |
| M <sub>2</sub> (W/L) (μm) | 72/0.12            | 72/0.12   | 64/0.06           | 64/0.06 |
| Inductance (pH)           | ≈590               | ≈420      | ≈350              | ≈350    |
| Peak Q                    | 18.3               | 7.2       | 21.5              | 12.5    |

The key difference among the four design examples is that the inductors in VCO-1/3 use only the top ultra-thick metal layer, whereas the inductors in VCO-2/4 use a stacked structure on the thinner inner layers, which results in a lower quality factor Q and a smaller silicon area. The design methodology for inductor modelling is described in Appendix E. As illustrated in fig.3-8, the stacked inductor occupies less than one-third of the silicon area of the inductor used in VCO-3, at the cost of a lower Q factor. Moreover, it can be seen from table 3-1 that, owing to the smaller gate capacitance, the inductance required in the 65nm process (VCO-3/4) is less than that required in the 130nm process (VCO-1/2).



Fig. 3 - 8 Layout View of the Inductors used with (a) VCO-3 and (b) VCO-4



Fig. 3 - 9 Microscope View of the Proposed Four RO-VCO (a) VCO-1 (b) VCO-2 (c) VCO-3 (d) VCO-4

Micrographs of the four design examples are shown in fig.3-9. It can be seen that the VCOs realized in the 65nm process node occupy much less silicon area than the corresponding designs in the 130nm process node. This is consistent with the theoretical prediction that the value of the peaking inductor should be smaller in the more advanced process node. In addition, the area of the VCOs with a stacked inductor (VCO-2/4) is much smaller than that of the VCOs with a thick metal layer inductor (VCO-1/3), which results in shorter signal tracks and less layout complexity.

#### 3.4.3 Testing and Experimental Results



Fig. 3 - 10 Testing Bench for Four Design Examples

As shown in fig.3-10, the same test bench is used for all four design examples. The bare silicon die is mounted on an FR4 printed circuit board (PCB). Bonding wires are used for the DC connections between the silicon chip and the PCB. The output oscillation is picked up directly by an RF probe and fed into a Keysight E4446A spectrum analyser. To power the chip, two power signals and a control voltage are each passed through an on-board low pass filter on the PCB, whose bandwidth is set to several kHz. The power signals are provided by a Keithley-6487 power source. The power supply is 1.5V for the design examples in the 130nm process node and 1.2V for the designs in the 65nm process node. The control voltage is swept from 0 to 1.2V in the 130nm designs and from 0 to 1.0V in the 65nm designs. In order to minimize the impact of integration, multiple bonding wires are attached to each DC pad.



Fig. 3 - 11. The Spectrum of Highest Oscillation Frequency of (a) VCO-1 (b) VCO-2 (c) VCO-3 (d) VCO-4

The measured power spectra at the highest oscillation frequency of each of the four design examples are shown in fig.3-11. The highest frequency obtained in the 130nm process is 12.4 GHz, while the highest frequency in the 65nm process is 25.07 GHz. It can clearly be seen that, with the same circuit topology, the achievable frequency is dramatically improved in the more advanced process node. This is due to the smaller gate capacitance and smaller layout area. Moreover, since identical transistor sizes are used in each pair of design examples, the two designs in a pair would be expected to have similar frequency-voltage tuning characteristics. However, as highlighted in fig.3-11 (c) and fig.3-11 (d), there is a considerable difference between the highest oscillation frequencies of VCO-3 and VCO-4. The reason is that the large inductors used in VCO-3 lengthen the signal wiring between delay cells, which introduces considerable additional parasitic capacitance, whereas the stacked inductors used in VCO-4 reduce this effect. This implies that using a stacked inductor not only reduces the overall silicon area, but can also benefit the tuning range of the proposed VCO through an intrinsically lower parasitic capacitance.



Fig. 3 - 12 Measured Frequency Tuning Characteristics

In summary, the measured voltage-frequency tuning characteristics of each design example are shown in fig.3-12, with the power consumption highlighted at the extremes of the tuning range. Using the measured results, the linearized gain of the VCO ($K_{VCO}$) can be obtained from (3.8)

$$K_{VCO} = \frac{f_{osc\_max} - f_{osc\_min}}{V_{max} - V_{min}} \tag{3.8}$$

Taking VCO-4 as an example, the tuning range covers more than 3 octaves and the linearized $K_{VCO}$ is approximately 24.8MHz/mV. In theory, such a large $K_{VCO}$ readily translates fluctuations on the control voltage into large frequency variations, and these fluctuations can reach hundreds of millivolts. Another approach is therefore required to minimize the impact of the large $K_{VCO}$, which is introduced in a later chapter.
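The VCO-4 figure quoted above can be reproduced from (3.8) using the measured end points reported in this chapter (0.25 GHz to 25.07 GHz over a control range of 0 to 1.0V). The short sketch below is only an arithmetic check; the 100 mV disturbance in the last line is a hypothetical value used to illustrate the sensitivity.

```python
import math

def linearized_kvco(f_max_hz, f_min_hz, v_max, v_min):
    """Linearized VCO gain from (3.8), in Hz per volt."""
    return (f_max_hz - f_min_hz) / (v_max - v_min)

# VCO-4 end points: 0.25 GHz to 25.07 GHz over a 0 V to 1.0 V control range.
kvco = linearized_kvco(25.07e9, 0.25e9, 1.0, 0.0)
kvco_mhz_per_mv = kvco / 1e6 / 1e3            # Hz/V -> MHz/mV
octaves = math.log2(25.07e9 / 0.25e9)         # tuning span in octaves

print(f"K_VCO = {kvco_mhz_per_mv:.1f} MHz/mV")
print(f"tuning span = {octaves:.1f} octaves")

# Frequency shift for a hypothetical 100 mV disturbance on the control node.
print(f"shift for 100 mV = {kvco * 0.1 / 1e9:.2f} GHz")
```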

Fig.3-13 plots the measured phase noise of each design example at its highest oscillation frequency. In addition, the phase noise of VCO-4 at its middle and lowest oscillation frequencies is provided. Several low pass filters are placed on the PCB to clean the power source, although some environmental interference can still be observed at several hundred kHz.



Fig. 3 - 13 Measured Phase Noise Results of Each Design Examples

Two conclusions can be drawn from these measurements.

- 1) When the oscillation frequency is tuned, the phase noise performance gets slightly worse in the middle range, because the inductor has a smaller quality factor Q in this lower frequency range.
- 2) Both VCO-1/3 (larger inductor) show better phase noise performance than their counterparts VCO-2/4 (smaller inductor), because the inductors used in VCO-1/3 have a higher quality factor Q than those used in VCO-2/4.

In order to evaluate the performance of all four design examples and compare the results with recently published works, a standard approach is to use a Figure of Merit (*FOM*), which is defined by (3.9)[64]:

$$FOM = -PN(f_{off}) + 20 \log \frac{f_{osc}}{f_{off}} - 10 \log \frac{P_{DC}}{1\,\text{mW}} + 20 \log \frac{FTR}{10} \tag{3.9}$$

where PN represents the phase noise at the offset frequency ( $f_{off}$ ),  $P_{DC}$  is the consumed power and FTR is the frequency tuning ratio which is given as (3.10)

$$FTR = \frac{f_{osc\_max} - f_{osc\_min}}{(f_{osc\_max} + f_{osc\_min})/2} \tag{3.10}$$

As can be seen from (3.9), the FOM evaluates performance in terms of the frequency tuning range, the power consumption, and the phase noise relative to the oscillation frequency and offset frequency. Among these parameters, the phase noise (*PN*) has the dominant effect on the overall figure. For instance, every 1 dBc/Hz improvement in phase noise is equivalent to decreasing the power consumption by about 8mW and increasing the FTR by about 24%. However, the trade-off is that the phase noise degrades as the oscillation frequency increases. Therefore, to obtain a better *FOM* and make the most of the inductor peaking VCO, these parameters must be traded off carefully. It should also be noted that one parameter is not considered by the standard *FOM* calculation in (3.9), namely the layout area. From that perspective, the four design examples allow the area cost of the inductor to be compared against the overall performance.

Table 3-2 summarizes the performance and compares it on the basis of the *FOM* in (3.9). As it shows, all four VCO design examples achieve a considerably higher *FOM* than the results reported in recent publications. Although the two compact examples (VCO-2/4) show a lower *FOM* than their paired designs, one parameter that cannot be ignored and is not captured by the *FOM* in (3.9) is the area cost. The smallest design (VCO-4) occupies only 0.0085mm<sup>2</sup> of silicon area, and this advantage will be more pronounced in ultra-small process nodes (<40nm).

Table 3 - 2 Performance Summary and Comparison

| Reference | Process | Tuning Range (Hz) | FTR    | $P_{DC}$ (mW) | PN (dBc/Hz) @ $f_{off}$/$f_{osc}$ (Hz) | FOM   |
|-----------|---------|-------------------|--------|---------------|----------------------------------------|-------|
| [64]      | 65nm    | 2G-8G             | 120%   | 6.4           | -101@1M/4.2G                           | 187   |
| [131]     | 130nm   | 0.5G-9.5G         | 180%   | 9             | -85@1M/5G                              | 172.3 |
| [54]      | 130nm   | 1G-9.4G           | 161%   | 3.7           | -112@10M/6G                            | 186   |
| [132]     | 130nm   | 1.8G-10.2G        | 139.4% | 5             | -88.4@1M/5.56G                         | 179.3 |
| [133]     | 180nm   | 10G-13.5G         | 25.6%  | 2.4           | -104.5@1M/12.7G                        | 188.2 |
| VCO-1     | 130nm   | 0.25G-12.4G       | 192%   | 44.7          | -112.7@1M/12.4G                        | 203.7 |
| VCO-2     | 130nm   | 0.36G-11.9G       | 189%   | 37.5          | -103.3@1M/11.89G                       | 195.3 |
| VCO-3     | 65nm    | 0.12G-23.7G       | 198%   | 28.1          | -97.32@1M/23.7G                        | 196.3 |
| VCO-4     | 65nm    | 0.25G-25G         | 196%   | 29.9          | -95.6@1M/25.07G                        | 194.7 |
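As a cross-check of how (3.9) and (3.10) are applied, the sketch below recomputes the FOM for the VCO-1 and VCO-4 rows of Table 3-2. It assumes that FTR enters (3.9) as a percentage and that the upper end of the VCO-4 tuning range is the measured 25.07 GHz; the function and variable names are illustrative.

```python
import math

def ftr_percent(f_max, f_min):
    """Frequency tuning ratio (3.10), expressed in percent."""
    return 100.0 * (f_max - f_min) / ((f_max + f_min) / 2.0)

def fom(pn_dbc_hz, f_osc, f_off, p_dc_mw, ftr_pct):
    """Figure of merit (3.9); FTR is assumed to be entered in percent."""
    return (-pn_dbc_hz
            + 20.0 * math.log10(f_osc / f_off)
            - 10.0 * math.log10(p_dc_mw / 1.0)
            + 20.0 * math.log10(ftr_pct / 10.0))

# (f_min, f_max, f_osc, PN at 1 MHz offset in dBc/Hz, P_DC in mW) from Table 3-2.
designs = {
    "VCO-1": (0.25e9, 12.4e9, 12.4e9, -112.7, 44.7),
    "VCO-4": (0.25e9, 25.07e9, 25.07e9, -95.6, 29.9),
}

results = {}
for name, (f_min, f_max, f_osc, pn, p_dc) in designs.items():
    ftr = ftr_percent(f_max, f_min)
    results[name] = fom(pn, f_osc, 1e6, p_dc, ftr)
    print(f"{name}: FTR = {ftr:.0f}%, FOM = {results[name]:.1f}")
```

Under these assumptions the calculation agrees with the tabulated FOM values of 203.7 for VCO-1 and 194.7 for VCO-4.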

# 3.5 Summary

In this chapter, a new type of ring-based voltage-controlled oscillator (RO-VCO) incorporating the inductor peaking technique has been presented in order to break through the bandwidth limitation of the typical RO-VCO. The principle is that the inserted inductor and the associated capacitance form a resonant peak in the high frequency range that expands the bandwidth. In addition, four design examples have demonstrated the feasibility of this topology and validated the proposed design in two different technology processes (IBM 130nm and TSMC 65nm). Three conclusions can be drawn from these design examples.

- 1) Experimental results show that the proposed circuit topology can take advantage of modern deep sub-micron processes and may be used for clock generation and similar applications in future advanced systems. The results indicate a better figure of merit than previously reported work and, to the authors' knowledge, the best combination of frequency and tuning range.

- 2) The stacked inductor has a significant advantage over the thick metal inductor in reducing design area, especially in more advanced processes. The stacked inductor not only reduces the inductance but also occupies a smaller area, which can significantly shorten the signal wiring and decrease the parasitic capacitance, further improving the frequency bandwidth.
- 3) The quality factor Q of the inductor does influence the performance of the VCO. However, there is a trade-off between the Q of the inductor and its physical size. Therefore, in a practical design, the designer must decide which aspect of performance to prioritize and consider this trade-off carefully.

For future work, it is suggested to use a symmetrical inductor with a differential delay cell and to embed the VCO in a PLL design.

# **Chapter 4 Theoretical Analysis of Dual-Loop Triple-Controlled PLL**

#### 4.1 Introduction

In the previous chapter, a new ring-based VCO incorporated with inductor peaking technique was presented, to provide an ultra-wide bandwidth of oscillation required for silicon photonics optical transceivers. Considering the requirements for stability and controllability of the whole transceiver, the VCO is required to be used in a phase locked loop (PLL).

In this chapter, the proposed inductive peaking VCO is embedded into a PLL system for generating high quality clock signals. Firstly, an analysis of different types of conventional PLLs with inductor peaking VCO is conducted to investigate the system compatibility and controllability. Secondly, a novel topology of PLL is proposed to solve the issues that are exposed by the use of conventional PLL structures. Finally, the chapter concludes with a comparison of different types of PLL structure and proposed structure to evaluate their performance.

# 4.2 Considerations of Inductor Peaking VCO in PLL

According to the findings of the previous chapter, the inductor peaking technique allows a ring-based voltage-controlled oscillator to provide not only a very high absolute frequency but also an ultra-wide frequency tuning range, typically from several gigahertz to tens of gigahertz. The structure of the inductor peaking VCO demonstrated in chapter 3 (fig.3-7(b)) is replicated in a 40nm CMOS process. The frequency tuning characteristic is displayed in fig.4-1.



Fig. 4 - 1 (a) Frequency Tuning Characteristics of Inductor Peaking VCO (b) Nonlinear Behaviour of the Gain of VCO

The highest frequency reaches 36.5GHz and the lowest is 1.42GHz as the control voltage is scaled from 0 to 0.9V. Once the control voltage ($V_{ct}$) exceeds 0.9V, the frequency is almost constant. This is because the $V_{gs}$ of the load transistor in the VCO is then below its threshold voltage, so that the device enters the subthreshold region. If the frequency tuning characteristic is assumed to be approximately linear over this range, the gain of the VCO ($K_{VCO}$) can be obtained using (4.1).

$$K_{VCO} = \frac{f_{max} - f_{min}}{V_{max} - V_{min}} = \frac{36.5 - 1.42}{0.9 - 0} = 38.98\,\text{GHz/V} \approx 40\,\text{MHz/mV} \tag{4.1}$$

With an interference fluctuation of 1mV amplitude on $V_{ct}$, a 40MHz frequency variation occurs at the output. As fig.4-1(a) shows, $K_{VCO}$ also varies with the control voltage once the nonlinearity is taken into account. To approximate this variation, the slope is assumed to be linear within every 50mV step of $V_{ct}$. The calculated gains are shown in fig.4-1(b). As the control voltage is scaled, $K_{VCO}$ varies over a very large range, from about 38MHz/mV above the linear-model value to about 36MHz/mV below it. It is therefore necessary to investigate these issues within a phase locked loop.
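The piecewise-linear estimate described above amounts to a finite difference over successive 50mV samples of the tuning curve. The sketch below shows the procedure on a synthetic tuning curve: the raised-cosine shape is purely an assumption made for illustration and is not the simulated curve of fig.4-1(a); only the end points (36.5GHz at 0V, 1.42GHz at 0.9V) are taken from the text.

```python
import math

F_MAX, F_MIN, V_SPAN = 36.5e9, 1.42e9, 0.9   # end points quoted in the text

def synthetic_tuning_curve(v_ct):
    """Hypothetical smooth tuning curve joining the two quoted end points."""
    return F_MIN + (F_MAX - F_MIN) * 0.5 * (1.0 + math.cos(math.pi * v_ct / V_SPAN))

def local_kvco(v_samples, f_samples):
    """Magnitude of the piecewise-linear gain between successive samples, in Hz/V."""
    gains = []
    for i in range(len(v_samples) - 1):
        dv = v_samples[i + 1] - v_samples[i]
        gains.append(abs(f_samples[i + 1] - f_samples[i]) / dv)
    return gains

v_samples = [i * 0.05 for i in range(19)]             # 0 V to 0.9 V in 50 mV steps
f_samples = [synthetic_tuning_curve(v) for v in v_samples]

linear_kvco = (F_MAX - F_MIN) / V_SPAN                # linear model of (4.1)
gains = local_kvco(v_samples, f_samples)

print(f"linear model : {linear_kvco / 1e9:.2f} GHz/V")
print(f"local minimum: {min(gains) / 1e9:.2f} GHz/V")
print(f"local maximum: {max(gains) / 1e9:.2f} GHz/V")
```

Even for this smooth synthetic curve the local gain spreads well above and below the linear-model value, which is the behaviour that fig.4-1(b) reports for the simulated VCO.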

# 4.3 Investigation of Classic PLL with Inductor Peaking VCO

#### 4.3.1 Charge-pump PLL with Inductor Peaking VCO

The first example examines the inductor peaking VCO within a charge-pump phase locked loop (CP-PLL), a commonly used PLL structure. Fig.4-2 shows the overall block diagram of the CP-PLL. The PFD, charge pump and frequency divider use the classic structures introduced in fig. B-7, fig. B-12 and fig. B-16 (d) respectively.



Fig. 4 - 2 Block Diagram of Charge Pump PLL (CP-PLL)

The reference frequency defined for the CP-PLL is 200MHz, so with a dividing ratio of 96 the expected output frequency of the PLL is 19.2GHz. The loop bandwidth should ideally be less than 20MHz (one tenth of the reference frequency) to maintain a stable feedback system. The closed loop transfer function of the CP-PLL is given in equation (2.26), from which the open loop transfer function can be obtained as (4.2)

$$H(s)|_{open} = \frac{K_{PFD\_I} \cdot K_F \cdot K_{VCO}}{N \cdot s} \tag{4.2}$$

Moreover, from the perspective of response and stability in a control system, the system damping factor is set to roughly 1. Hence, the values of R, $C_1$ and $C_2$ can be calculated from equation (2.27) and equation (2.28) in chapter 2. Fig.4-3 displays the Bode plot of the open loop transfer function.



Fig. 4 - 3 Bode Plot of Open Loop Transfer Function

In this case, the loop bandwidth is set to approximately 20MHz, with a phase margin of 64 degrees. All parameter values are summarized in table 4-1.

Table 4 - 1 Parameter Configuration Values of CP-PLL

| Parameter                                    | Value      |
|----------------------------------------------|------------|
| Reference Frequency ($f_{ref}$)              | 200MHz     |
| Charge Pump Current ($I_{pump}$)             | 60 μA      |
| Gain of VCO ($K_{VCO}$)                      | 40MHz/mV   |
| Dividing Ratio (N)                           | 96         |
| Resistance in Loop Filter (*R*)              | 5.2kΩ      |
| Capacitance in Loop Filter ($C_1$, $C_2$)    | 7pF, 0.4pF |
| Output Frequency ($f_{out}$)                 | 19.2GHz    |
| Damping Factor (ζ)                           | 1.07       |
| Natural Frequency ($w_n$)                    | 9.39MHz    |

The gain of VCO shown in the above table is assumed to be the value of the linear model. The output frequency of CP-PLL is obtained in fig.4-4.



Fig. 4 - 4 Frequency Output of CP-PLL

It can be seen that the CP-PLL locks within 280ns. However, the output frequency shows unexpected fluctuations between 18.43GHz and 19.96GHz, about 1.5GHz peak to peak. These fluctuations are caused by the reference spur that passes through the PFD and by the inherent in-band noise of the VCO. They repeat with a period of 5ns, corresponding to the 200MHz reference source. As shown in Appendix A.3, a reference spur manifests itself on the control voltage node as a ripple signal and, as already discussed, even a slight voltage ripple generates a large frequency variation at the output because of the large $K_{VCO}$ of the inductor peaking VCO. Fig.4-5 displays the period jitter performance of the CP-PLL under these conditions. As the histogram illustrates, the standard deviation about the centre (zero jitter) is about $\pm 636fs$, which is equivalent to approximately 2.5% variation in every clock period. It should be noted that the power supply is completely clean in the above simulation. In practice, there is more noise on the power supply, so the jitter performance will degrade further.
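The figures quoted above can be related to one another with a few lines of arithmetic. The sketch below converts the $\pm 636fs$ spread into a fraction of the 19.2GHz clock period and, under the linear model of 40MHz/mV, estimates the control-node ripple that would produce the observed frequency excursion. The second step is only an upper-bound illustration, since it attributes the whole excursion to ripple and ignores the in-band VCO noise.

```python
F_OUT = 19.2e9          # Hz, PLL output frequency
F_REF = 200e6           # Hz, reference frequency
SIGMA_JITTER = 636e-15  # s, one-sided spread read from the jitter histogram
KVCO_HZ_PER_V = 40e9    # Hz/V, linear-model gain (40 MHz/mV)

period = 1.0 / F_OUT
spread_pct = 100.0 * (2.0 * SIGMA_JITTER) / period   # +/- spread as % of one period
spur_period = 1.0 / F_REF                            # repetition period of the ripple

f_pp = 19.96e9 - 18.43e9                             # observed peak-to-peak excursion
ripple_mv = f_pp / KVCO_HZ_PER_V * 1e3               # equivalent ripple, linear model

print(f"clock period        = {period * 1e12:.2f} ps")
print(f"jitter spread       = {spread_pct:.2f} % of a period")
print(f"ripple repetition   = {spur_period * 1e9:.1f} ns")
print(f"equivalent ripple   = {ripple_mv:.1f} mV (upper bound, linear model)")
```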



Fig. 4 - 5 Period Jitter Performance of CP-PLL

Another aspect that should not be ignored is the nonlinearity of the large $K_{VCO}$. With the main design parameters defined (including $I_{pump}$, R, $C_1$, $C_2$ and the dividing ratio N), the characteristic response of the CP-PLL should be constant. However, to switch to a different output frequency, the control voltage of the VCO must change accordingly. Because of the nonlinear behaviour of $K_{VCO}$, this results in a significant variation in the PLL's performance.

Given Eq.(4.3) and Eq.(4.4), the most direct influence is on the damping factor and natural frequency of the PLL. Fig.4-6 shows how the damping factor ($\zeta$) and natural frequency ($w_n$) vary as $V_{ct}$ is scaled.

$$w_n = \sqrt{\frac{I_{pump}K_{vco}}{2\pi NC_1}} \tag{4.3}$$
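As a numerical check, the sketch below evaluates (4.3) with the values of table 4-1. Two assumptions are made and should be read as such: $K_{VCO}$ is expressed in rad/s/V inside (4.3), using the 38.98GHz/V value of (4.1), and the damping factor follows the textbook second-order charge-pump PLL relation $\zeta = (RC_1/2)\,w_n$. The tabulated natural frequency is interpreted as $w_n/2\pi$ in Hz.

```python
import math

I_PUMP = 60e-6           # A, charge pump current
KVCO_HZ_PER_V = 38.98e9  # Hz/V, linear-model gain from (4.1)
N_DIV = 96               # dividing ratio
R_FILTER = 5.2e3         # ohm, loop filter resistance
C1 = 7e-12               # F, loop filter capacitance
F_REF = 200e6            # Hz, reference frequency

def natural_frequency(i_pump, kvco_rad, n_div, c1):
    """Natural frequency from (4.3) in rad/s; kvco_rad is in rad/s/V."""
    return math.sqrt(i_pump * kvco_rad / (2.0 * math.pi * n_div * c1))

def damping_factor(r, c1, w_n):
    """Assumed textbook relation zeta = (R*C1/2)*w_n for a second-order CP-PLL."""
    return 0.5 * r * c1 * w_n

w_n = natural_frequency(I_PUMP, 2.0 * math.pi * KVCO_HZ_PER_V, N_DIV, C1)
zeta = damping_factor(R_FILTER, C1, w_n)

print(f"f_out   = {N_DIV * F_REF / 1e9:.1f} GHz")
print(f"w_n/2pi = {w_n / (2.0 * math.pi) / 1e6:.2f} MHz")
print(f"zeta    = {zeta:.2f}")

# Both quantities scale with sqrt(K_VCO); the scale factors below are hypothetical.
for scale in (0.25, 1.0, 4.0):
    w = natural_frequency(I_PUMP, 2.0 * math.pi * KVCO_HZ_PER_V * scale, N_DIV, C1)
    print(f"K_VCO x {scale}: zeta = {damping_factor(R_FILTER, C1, w):.2f}")
```

Under these assumptions the calculation agrees with the 9.39MHz and 1.07 entries of table 4-1, and it makes explicit that a fourfold change in $K_{VCO}$ doubles or halves both the natural frequency and the damping factor.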



Fig. 4 - 6 Influences of Nonlinear Behaviours (a) Damping Factor (b) Natural Frequency

Because of the large $K_{VCO}$, this nonlinearity causes the damping factor to vary from 0.34 to 1.52 across the output frequencies, which greatly affects the response characteristic of the system. In addition, according to [28, 95], the relationship between the loop bandwidth ($w_c$), damping factor ($\zeta$) and natural frequency ($w_n$) is given by (4.5)



Fig. 4 - 7 Tendency under Nonlinear Behaviours (a) Loop Bandwidth (b) Phase Margin

Fig.4-7 demonstrates the effect of the large $K_{VCO}$ on the PLL's loop bandwidth ( $w_c$ ) and the corresponding phase margin under nonlinear tuning. As seen in fig.4-7(a), the pre-set bandwidth is 20MHz. However, due to the severe nonlinearity of the large $K_{VCO}$, the loop bandwidth of the CP-PLL varies significantly (3.25MHz to 35.2MHz) over the full control voltage range. In addition, the phase margin curve in fig.4-7(b) shows that for control voltages between 0.2V and 0.4V and above 0.65V, the phase margin degrades below 60 degrees.
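
As a rough cross-check of the damping-factor spread quoted for the CP-PLL, the sketch below combines Eq.(4.3) with the standard second-order charge-pump PLL relation $\zeta = R C_1 w_n / 2$. That relation is an assumption here, since Eq.(4.4) is not reproduced in this section. Under it, both $\zeta$ and $w_n$ scale with $\sqrt{K_{VCO}}$, so the ratio of the damping-factor extremes implies a ratio of VCO gains across the tuning range. The function name is illustrative only.

```python
def implied_gain_ratio(zeta_min: float, zeta_max: float) -> float:
    """Ratio K_VCO(max) / K_VCO(min) implied by zeta being proportional to sqrt(K_VCO).

    Assumes I_pump, R, C1 and N are fixed, so only K_VCO changes.
    """
    return (zeta_max / zeta_min) ** 2


# Damping-factor extremes quoted in the text for the CP-PLL
ratio = implied_gain_ratio(0.34, 1.52)
print(f"Implied K_VCO spread across the tuning range: about {ratio:.0f}x")
```

A spread of this size in the VCO gain is what drives the loop bandwidth and phase margin away from their pre-set values.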

#### 4.3.2 Dual-tuning PLL with Inductive Peaking VCO

To overcome the issues explored in the previous section, the dual-tuning PLL structure introduced in section 2.5.1 is applied, as it effectively reduces the sensitivity of the loop to the large $K_{VCO}$.



Fig. 4 - 8 Block Diagram of Dual Tuning PLL (DT-PLL)

Fig.4-8 shows the block diagram of the dual-tuning PLL with inductor peaking VCO (DT-PLL). Note that the PFD, charge pump and frequency divider use the same circuits as the charge-pump PLL in section 4.3.1.

The difference is that a Gm-C integrator is added to the original voltage control node to form a coarse tuning path to the VCO, and the original voltage control node becomes the fine tuning path. The integrator can be realized with an operational transconductance amplifier (OTA) driving a capacitive load. The control voltage is therefore split into two parts: a coarse tuning voltage ( $V_{coarse}$ ) and a fine tuning voltage ( $V_{fine}$ ). Because of the narrow bandwidth of the Gm-C integrator (set to about 200kHz in this case), $V_{coarse}$ follows the changes of $V_{fine}$ only slowly. As a consequence, most of the interference noise and the ripple leaked from the charge pump are filtered out of $V_{coarse}$. As $V_{coarse}$ approaches the expected frequency band, $V_{fine}$ starts to reflect the phase difference detected by the PFD and becomes the dominant control voltage of the VCO. Therefore, the dominant gain of the DT-PLL for settling is the gain of the fine control path.

To match the VCO to this characteristic of the DT-PLL, two control voltages are applied to the inductor peaking VCO to realize coarse tuning ( $V_{coarse}$ ) and fine tuning ( $V_{fine}$ ). The relative sensitivity of coarse and fine tuning is set by the transistor size ratio between $M_1$ and $M_2$. In this case, $M_2$ is made four times larger than $M_1$ to give a large difference in sensitivity between coarse tuning and fine tuning. The structure of the dual-tuning VCO is shown in fig.4-9.



Fig. 4 - 9 Applied Structure of Dual Tuning VCO

With the structure in the above figure, the frequency tuning characteristics of both the coarse control and the fine control can be obtained; they are illustrated in fig.4-10.



Fig. 4 - 10 Frequency Tuning Characteristics of (a) Coarse Tuning Voltage (b) Fine Tuning Voltage

Fig.4-10(a) shows the response of the oscillation frequency to the coarse tuning voltage; the overall gain is still very large and suffers from nonlinearity. In the fine tuning response of fig.4-10(b) (the curves are obtained with the coarse tuning voltage set to 0mV, 200mV, 400mV, 600mV and 800mV respectively), the frequency variation along each tuning curve is small, which means that the gain of the fine tuning voltage is dramatically reduced compared with that of coarse tuning. In theory, therefore, the immunity of the DT-PLL to fluctuations should be improved compared with the CP-PLL. To test this prediction, the same investigation is carried out on the DT-PLL. The same reference frequency (200MHz) and dividing ratio (1/96) are applied, which gives an output frequency of 19.2GHz. In addition, $K_{vco\_fine}$ can be determined once $V_{coarse}$ is known. As indicated in fig.4-10(a), $V_{coarse}$ is 390mV for an output frequency of 19.2GHz, and the corresponding fine tuning characteristic is plotted as the blue dashed curve in fig.4-10(b), for which $K_{vco\_fine}$ is about 6.4MHz/mV. The configuration parameters are listed in table.4-2.

Table. 4 - 2 Parameter Configuration Values of DT-PLL

| Reference Frequency $(f_{ref})$         | 200MHz     |
|-----------------------------------------|------------|
| Charge Pump Current $(I_{pump})$        | 120 µA     |
| Gain of VCO $(K_{VCO})$                 | 6.4MHz/mV  |
| Dividing Ratio (N)                      | 96         |
| Resistance in Loop Filter (R)           | 10kΩ       |
| Capacitance in Loop Filter $(C_1, C_2)$ | 6pF, 0.3pF |
| Output Frequency $(f_{out})$            | 19.2GHz    |
| Damping Factor (ζ)                      | 1.09       |
| Natural Frequency $(w_n)$               | 5.8MHz     |
| Loop Bandwidth $(w_c)$                  | ≈15MHz     |
| Phase Margin                            | 65 deg     |
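
The entries of Table 4-2 can be cross-checked with a short calculation. The sketch below evaluates Eq.(4.3) with $K_{VCO}$ converted to rad/s/V, and then applies two relations that are assumptions here because Eq.(4.4) and Eq.(4.5) are not reproduced in this section: the standard damping factor $\zeta = R C_1 w_n / 2$ and the closed-loop $-3$dB bandwidth of a second-order type-II loop. All function names are illustrative.

```python
import math

# Values taken from Table 4-2
I_PUMP = 120e-6                          # charge pump current, A
K_VCO = 2 * math.pi * 6.4e9              # 6.4 MHz/mV, converted to rad/s/V (assumed unit convention)
N = 96                                   # dividing ratio
R = 10e3                                 # loop filter resistance, ohm
C1 = 6e-12                               # loop filter capacitance, F


def natural_frequency(i_pump: float, k_vco: float, n: int, c1: float) -> float:
    """Eq.(4.3): natural frequency in rad/s."""
    return math.sqrt(i_pump * k_vco / (2 * math.pi * n * c1))


def damping_factor(r: float, c1: float, w_n: float) -> float:
    """Assumed standard second-order form: zeta = R * C1 * w_n / 2."""
    return 0.5 * r * c1 * w_n


def closed_loop_bandwidth(w_n: float, zeta: float) -> float:
    """Assumed -3 dB bandwidth of H(s) = (2*zeta*w_n*s + w_n^2) / (s^2 + 2*zeta*w_n*s + w_n^2)."""
    a = 1 + 2 * zeta ** 2
    return w_n * math.sqrt(a + math.sqrt(a * a + 1))


w_n = natural_frequency(I_PUMP, K_VCO, N, C1)
zeta = damping_factor(R, C1, w_n)
w_c = closed_loop_bandwidth(w_n, zeta)

f_n_mhz = w_n / (2 * math.pi) / 1e6
f_c_mhz = w_c / (2 * math.pi) / 1e6
print(f"natural frequency = {f_n_mhz:.2f} MHz")
print(f"damping factor    = {zeta:.2f}")
print(f"loop bandwidth    = {f_c_mhz:.1f} MHz")
```

Under these assumptions the calculation lands close to the tabulated damping factor, natural frequency and loop bandwidth, which suggests that the table quotes $w_n$ and $w_c$ in Hz rather than rad/s.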

To guarantee the stability of the PLL, the parameters $(R, C_1 \text{ and } C_2)$ are adjusted accordingly. Fig.4-11 shows the output frequency of the DT-PLL, which settles and locks to the expected frequency band. In detail, the output frequency varies from 19.108GHz to 19.28GHz, a swing of 172MHz, which is much smaller than that of the CP-PLL example. This result confirms the previous prediction and shows that the quality of the output frequency can be improved by reducing the gain of the VCO.
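
To make the link between VCO gain and output fluctuation concrete, the following sketch applies the first-order relation $\Delta f = K_{VCO} \cdot \Delta V$ in reverse. It assumes that the whole 172MHz swing is produced by ripple on $V_{fine}$ and that $K_{vco\_fine}$ stays constant at 6.4MHz/mV over that ripple; both are simplifications, and the resulting ripple amplitude is an estimate rather than a simulated value.

```python
def implied_ripple_mv(freq_swing_mhz: float, k_vco_mhz_per_mv: float) -> float:
    """Peak-to-peak control-voltage ripple implied by delta_f = K_VCO * delta_V."""
    return freq_swing_mhz / k_vco_mhz_per_mv


# 172 MHz swing and K_vco_fine = 6.4 MHz/mV, both quoted in the text
ripple_mv = implied_ripple_mv(172.0, 6.4)
print(f"Implied ripple on V_fine: about {ripple_mv:.1f} mV peak to peak")
```

The same relation shows why a lower tuning gain helps: for a given ripple, the frequency swing scales linearly with $K_{VCO}$.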



Fig. 4 - 11 Frequency Output of DT-PLL



Fig. 4 - 12 Period Jitter Performance of DT-PLL

Fig.4-12 shows the period jitter of the DT-PLL. The standard deviation about the centre (zero jitter) is reduced to 99.2fs, so the jitter variation in each clock cycle at 19.2GHz decreases to 0.38%.
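
The jitter percentages quoted for the two PLLs can be reproduced from the standard deviations and the 19.2GHz clock period. The sketch below assumes that the percentage refers to the full $\pm\sigma$ window (that is, $2\sigma$) relative to one period; this interpretation is inferred from the quoted numbers and is not stated explicitly in the text.

```python
def jitter_fraction(sigma_fs: float, f_out_hz: float) -> float:
    """Fraction of one clock period occupied by a +/- sigma jitter window."""
    period_fs = 1e15 / f_out_hz          # clock period in femtoseconds
    return 2.0 * sigma_fs / period_fs


F_OUT = 19.2e9                           # output frequency, Hz
cp_pll = jitter_fraction(636.0, F_OUT)   # CP-PLL, sigma about 636 fs
dt_pll = jitter_fraction(99.2, F_OUT)    # DT-PLL, sigma about 99.2 fs

print(f"CP-PLL: {100 * cp_pll:.2f}% of the period")
print(f"DT-PLL: {100 * dt_pll:.2f}% of the period")
```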

The other analysis in the CP-PLL example concerned the nonlinearity of the large $K_{VCO}$, and this is also investigated for the DT-PLL. Slightly nonlinear tuning still exists in the DT-PLL. However, because of the split tuning mechanism, the dominant gain for settling is determined by $V_{fine}$ rather than $V_{coarse}$, and since $K_{vco\_fine}$ is very small, the effect of the nonlinearity is suppressed. The variation of $K_{vco\_fine}$ with $V_{coarse}$ is illustrated in fig.4-13(a), where it is compared with the CP-PLL.



Fig. 4 - 13 Nonlinear Behaviours of (a) Gain on  $V_{coarse}$  (b) Damping Factor (c) Loop Bandwidth (d) Phase Margin

In fig.4-13(a), it can be seen that $K_{vco\_fine}$ in the DT-PLL is greatly reduced compared with the VCO gain in the CP-PLL. In addition, it is relatively linear over the full voltage tuning range, with a maximum gain difference of less than 4.72MHz/mV. Based on this outcome, it can be predicted that the other parameters, such as damping factor, loop bandwidth and phase margin, are also improved to some extent. Fig.4-13(b) shows how the damping factor varies with $V_{coarse}$. The damping factor of the DT-PLL stays close to critical damping ( $\zeta = 1$ ); the worst-case value is 0.61, which occurs when $K_{vco\_fine}$ is at its lowest. Compared with the CP-PLL, the DT-PLL shows more stable performance as the control voltage varies. Fig.4-13(c) shows the loop bandwidth over the full voltage tuning range of the DT-PLL. The pre-set value is about 15MHz and, thanks to the small $K_{vco\_fine}$, the loop bandwidth remains clustered around 15MHz. Lastly, for the phase margin in fig.4-13(d), the DT-PLL achieves sufficient phase margin (over 60 degrees) for $V_{coarse}$ above 0.1V. These results indicate that the inductor peaking VCO is better suited to a dual-tuning PLL structure, which gives better reliability and stability.



Fig. 4 - 14 Settling Time in DT-PLL

However, although the dual-tuning PLL structure greatly reduces the influence of the large VCO gain and suppresses the effect of its nonlinear behaviour, it suffers from a long settling time, as illustrated in fig.4-14. The settling time of the DT-PLL depends on the response of the coarse tuning path; once the coarse tuning has settled, the fine tuning locks swiftly to its final value. As seen in fig.4-14, $V_{coarse}$ requires about 3µs to settle while $V_{fine}$ needs only about 600ns, which gives an overall DT-PLL settling time of 3.6µs. This long settling time results from the Gm-C integrator, which gives the coarse tuning path an extremely low bandwidth. To improve the settling time of the DT-PLL, a new type of PLL structure is proposed in the next section.

# 4.4 Proposed Dual-loop Triple-controlled PLL

#### 4.4.1 Trade-off between Loop Bandwidth and Settling Time

Before presenting the proposed PLL structure, the relationship between loop bandwidth and settling time needs to be investigated. This can be explored easily using the CP-PLL with different loop bandwidths. Fig.4-15 shows the trade-off between loop bandwidth and settling time.



Fig. 4 - 15 Trade-off between Loop Bandwidth and Settling Time

The above figure shows the settling of the CP-PLL with the loop bandwidth set to 35MHz, 20MHz, 15MHz and 10MHz respectively. The settling time is measured as the time at which the control voltage enters the range of $\pm 5\%$ of its final value. The results show that a larger loop bandwidth requires less time for the PLL to lock to the desired output frequency. This is because a larger loop bandwidth corresponds to smaller resistance (R) and capacitance ( $C_1$, $C_2$ ) in the loop filter. Charging and discharging of the loop filter are then faster than with a smaller loop bandwidth, so the detected phase difference is converted to the VCO's control voltage more quickly. The important conclusion from this investigation is that an efficient way to improve the settling time of a PLL is to increase the loop bandwidth.

#### 4.4.2 Bandwidth Enhancement

As discussed in the previous section, the settling time of the DT-PLL depends on the bandwidth of the coarse tuning path, which is equivalent to the bandwidth of the integrator. However, increasing this bandwidth to shorten the settling time causes a stability issue in the conventional DT-PLL [87]. This was tested by increasing the unity-gain bandwidth of the integrator from its initial value of 200kHz to 6MHz. The result is shown in fig.4-16.



Fig. 4 - 16 Instability Issue in DT-PLL

As fig.4-16 shows, instead of locking into a stable state, both $V_{coarse}$ and $V_{fine}$ end in a sustained oscillation. This is because increasing the unity-gain bandwidth of the integrator also increases the sensitivity of the coarse tuning path to the phase error detected by the PFD. As a result, $V_{coarse}$ still responds to changes in the phase error while $V_{fine}$ is settling to the desired frequency. Since $K_{vco\_fine}$ is much smaller than $K_{vco\_coarse}$, the fine tuning path cannot adjust and lock the output frequency before the coarse tuning path reacts, so $V_{coarse}$ and $V_{fine}$ end up in a "chasing state".



Fig. 4 - 17 Proposed Structure of Loop Filter

To overcome the above issue, the integrator on the coarse tuning path is replaced by the structure shown in fig.4-17. Instead of an active integrator, a passive RC low-pass filter is used to convert the charge pump current into $V_{coarse}$. The input of the RC network for $V_{coarse}$ is connected to the node between $R_1$ and $C_1$. One benefit of this approach is that the DC operating point of $V_{coarse}$ follows that of $V_{fine}$, which means that $V_{coarse}$ and $V_{fine}$ eventually settle at the same voltage level. Therefore, the "chasing" issue is avoided. Moreover, the bandwidth of the coarse tuning path can be adjusted through $R_2$ and $C_3$: to increase the bandwidth, the values of $R_2$ and $C_3$ should be reduced.



Fig. 4 - 18 Equivalent Structure of Proposed Loop Filter

Theoretical Analysis of Dual-loop Triple-controlled PLL

The equivalent circuit of fig.4-17 is shown in fig.4-18. The impedance at node $X_A$ is obtained as

$$X_A = \frac{sR_2C_3 + 1}{s^2R_2C_1C_3 + s(C_1 + C_3)} \tag{4.6}$$

Looking into the node $V_{fine}$, the transfer function of the loop filter is given by (4.7)

$$K_{F\_fine}(s) = \frac{V_{fine}(s)}{I_{cp}(s)} = \frac{R_1 + X_A}{sC_2(R_1 + X_A) + 1} \tag{4.7}$$

Substituting (4.6) into (4.7), the transfer function can be rewritten as

$$K_{F\_fine}(s) = \frac{s^2 R_1 R_2 C_1 C_3 + s(R_1 C_1 + R_1 C_3 + R_2 C_3) + 1}{s^3 R_1 R_2 C_1 C_2 C_3 + s^2 (R_1 C_1 C_2 + R_1 C_2 C_3 + R_2 C_2 C_3 + R_2 C_1 C_3) + s(C_1 + C_2 + C_3)} \tag{4.8}$$
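
The algebra leading to (4.8) can be checked numerically by evaluating the nested form of (4.6) and (4.7) and the expanded form of (4.8) at several frequencies. The sketch below does this with plain complex arithmetic. The component values are the ones quoted elsewhere in this chapter ($R$, $C_1$, $C_2$ from Table 4-2, taking $R_1 = R$, and $R_2 = 14$k$\Omega$, $C_3 = 36$pF); the function names and the chosen test frequencies are illustrative.

```python
import math

# Component values quoted in this chapter (R1 taken as R from Table 4-2)
R1, C1, C2 = 10e3, 6e-12, 0.3e-12
R2, C3 = 14e3, 36e-12


def x_a(s: complex) -> complex:
    """Eq.(4.6): impedance seen at node X_A."""
    return (s * R2 * C3 + 1) / (s ** 2 * R2 * C1 * C3 + s * (C1 + C3))


def k_fine_nested(s: complex) -> complex:
    """Eq.(4.7): C2 in parallel with (R1 + X_A)."""
    z = R1 + x_a(s)
    return z / (s * C2 * z + 1)


def k_fine_expanded(s: complex) -> complex:
    """Eq.(4.8): the same transfer function written as a ratio of polynomials."""
    num = s ** 2 * R1 * R2 * C1 * C3 + s * (R1 * C1 + R1 * C3 + R2 * C3) + 1
    den = (s ** 3 * R1 * R2 * C1 * C2 * C3
           + s ** 2 * (R1 * C1 * C2 + R1 * C2 * C3 + R2 * C2 * C3 + R2 * C1 * C3)
           + s * (C1 + C2 + C3))
    return num / den


# Compare both forms from 100 kHz to 1 GHz
max_rel_err = 0.0
for f_hz in (1e5, 1e6, 1e7, 1e8, 1e9):
    s = 2j * math.pi * f_hz
    nested, expanded = k_fine_nested(s), k_fine_expanded(s)
    max_rel_err = max(max_rel_err, abs(nested - expanded) / abs(nested))

print(f"largest relative difference: {max_rel_err:.2e}")
```

A difference at the level of floating-point rounding indicates that the two forms are algebraically identical.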

In general, $C_2$ is set to about one tenth (or even less) of $C_1$. Neglecting $C_2$ simplifies (4.8) to

$$K_{F\_fine}(s) = \frac{s^2 R_1 R_2 C_1 C_3 + s(R_1 C_1 + R_1 C_3 + R_2 C_3) + 1}{s^2 R_2 C_1 C_3 + s(C_1 + C_3)} \tag{4.9}$$

The transfer function of the loop filter seen at $V_{coarse}$ is given by (4.10)

$$K_{F\_coarse}(s) = \frac{1}{s^3 R_1 R_2 C_1 C_2 C_3 + s^2 (R_1 C_1 C_2 + R_2 C_2 C_3) + s C_2} \tag{4.10}$$

Since $K_{vco\_fine}$ is the dominant gain for the PLL to settle and lock to the final frequency, the open loop transfer function with the proposed loop filter is given by (4.11)

$$H(s)|_{open} = \frac{K_{PFD\_I} \cdot K_{F\_fine} \cdot K_{VCO\_fine}}{N \cdot s} \tag{4.11}$$

Substituting Eq.(4.9) into Eq.(4.11), the above equation can be rewritten as (4.12)

$$H(s)|_{open} = \frac{K_{PFD\_I}K_{VCO\_fine}[s^2R_1R_2C_1C_3 + s(R_1C_1 + R_1C_3 + R_2C_3) + 1]}{s^3NR_2C_1C_3 + s^2N(C_1 + C_3)} \tag{4.12}$$

where $K_{PFD\_I}$ is the transfer function of the PFD and charge pump. To compare the proposed loop filter structure with the traditional Gm-C integrator of the previous experiment, the values of $R_1$, $C_1$ and $C_2$ are kept the same, while $R_2$ and $C_3$ are set to 14k$\Omega$ and 36pF respectively to give about 7MHz bandwidth on the coarse tuning path. Fig.4-19 shows the corresponding settling response of both $V_{coarse}$ and $V_{fine}$.



Fig. 4 - 19 Settling Time of DT-PLL with Proposed Loop Filter

Comparing these results with fig.4-16, it is clear that, with a similar bandwidth on the coarse tuning path, the unstable oscillation is eliminated from both $V_{coarse}$ and $V_{fine}$. Moreover, the settling time is shortened to 620ns by the proposed loop filter structure. Furthermore, as predicted in the previous analysis, the DC operating points of $V_{coarse}$ and $V_{fine}$ merge once the PLL has locked to the desired frequency.

The bandwidth of the coarse tuning path should be kept lower than the loop bandwidth so that $K_{vco\_fine}$ remains the dominant gain for settling. As a result, the values of $R_2$ and $C_3$ cannot be reduced without limit.

Another approach to further increase the bandwidth of the coarse tuning path is to increase the charge pump current. Fig.4-20 shows the frequency response of the open loop transfer function in (4.12) for different charge pump currents.



Fig. 4 - 20 Loop Bandwidth with Different Charge Pump Current

The loop bandwidth increases with the charge pump current, but with a larger charge pump current, more reference interference passes through the PFD and produces ripple on the VCO's control voltage, which leads to larger frequency fluctuations at the PLL's output. Therefore, rather than increasing the current within a single path of the PLL, the proposed structure uses two PFDs to deliver twice the current into the loop filter, while the current of each path is kept at its original value.
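
The dependence of the loop bandwidth on charge pump current can be illustrated by locating the unity-gain frequency of the open loop transfer function (4.12) for two current values. The sketch below is an illustration under stated assumptions, not a reproduction of fig.4-20: it takes $K_{PFD\_I} = I_{cp}/2\pi$ (the usual PFD and charge-pump gain, assumed here), $K_{VCO\_fine} = 6.4$MHz/mV converted to rad/s/V, and the component values quoted in this chapter with $R_1 = R$. Function names and the search range are hypothetical.

```python
import math

N = 96                                   # dividing ratio
K_VCO_FINE = 2 * math.pi * 6.4e9         # 6.4 MHz/mV in rad/s/V
R1, C1 = 10e3, 6e-12                     # Table 4-2 values (R1 taken as R)
R2, C3 = 14e3, 36e-12                    # coarse-path values quoted in the text


def open_loop_gain(freq_hz: float, i_cp: float) -> float:
    """Magnitude of Eq.(4.12) at the given frequency."""
    s = 2j * math.pi * freq_hz
    k_pfd = i_cp / (2 * math.pi)         # assumed PFD/charge-pump gain, A/rad
    num = k_pfd * K_VCO_FINE * (
        s ** 2 * R1 * R2 * C1 * C3 + s * (R1 * C1 + R1 * C3 + R2 * C3) + 1)
    den = s ** 3 * N * R2 * C1 * C3 + s ** 2 * N * (C1 + C3)
    return abs(num / den)


def unity_gain_frequency(i_cp: float, lo: float = 1e4, hi: float = 1e10) -> float:
    """Bisection on a log scale; the gain falls monotonically for these values."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if open_loop_gain(mid, i_cp) > 1.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


f_single = unity_gain_frequency(120e-6)  # original charge pump current
f_double = unity_gain_frequency(240e-6)  # doubled current
print(f"120 uA: unity gain near {f_single / 1e6:.1f} MHz")
print(f"240 uA: unity gain near {f_double / 1e6:.1f} MHz")
print(f"ratio: {f_double / f_single:.2f}")
```

With these values, doubling the current roughly doubles the unity-gain frequency, because the crossover lies above the filter's corner frequencies where the gain falls at about 20dB per decade.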



Fig. 4 - 21 Block Diagram of Proposed Dual-loop Triple-Control PLL (DLTC-PLL)

Fig.4-21 shows the block diagram of the proposed dual-loop triple-controlled PLL<sup>1</sup> (DLTC-PLL). Two of the proposed loop filters shown in fig.4-17 are combined and share the same coarse tuning path. In this way, the currents of the individual fine tuning paths are merged, which provides twice the current to the coarse tuning path. A further advantage of the dual loop path is that the reference frequency and feedback frequency can be applied in differential form. Within a single clock period, both the rising edge and the falling edge of the reference clock are therefore used to detect the phase error between the reference and feedback frequencies for frequency acquisition. Meanwhile, three control voltages act on the VCO at the same time. Apart from $V_{coarse}$, one fine control voltage ( $V_{fine1}$ ) passes the phase difference of the rising edge to the VCO while the other ( $V_{fine2}$ ) carries the information of the falling edge. This feature accelerates the settling process of the proposed PLL.



Fig. 4 - 22 Settling Response of Proposed DLTC-PLL

Fig.4-22 shows the settling response of the proposed DLTC-PLL. Once $V_{coarse}$ has settled, it remains stable midway between $V_{fine1}$ and $V_{fine2}$. The frequency spurs that appear on $V_{fine1}$ and $V_{fine2}$ are separated by half a clock period, which forms a "pseudo differential" pattern. This pattern can restrain or cancel the frequency fluctuation that occurs on each loop. In addition, since the information of both edges is used, the reference frequency can be regarded as doubled. Therefore, the loop bandwidth of the fine tuning path can be set to twice that of the conventional DT-PLL, which helps to obtain better phase noise performance.

<sup>1</sup> Note that the PFD, charge pump and frequency divider used in the DLTC-PLL have the same structure as in the charge-pump PLL (section 4.3.1) and the dual-tuning PLL (section 4.3.2).

## 4.5 Performance Comparison

Based on the above analysis, the jitter and settling time performance of the typical CP-PLL, the conventional DT-PLL and the proposed DLTC-PLL are compared in table.4-3.

Table. 4 - 3 Performance Comparison between CP-PLL, DT-PLL and Proposed DLTC-PLL

| Type                                 | CP-PLL        | DT-PLL      | DLTC-PLL      |
|--------------------------------------|---------------|-------------|---------------|
| Architecture                         | Single Tuning | Dual Tuning | Triple Tuning |
| Reference Frequency                  | 200MHz        | 200MHz      | 200MHz        |
| Output Frequency                     | 19.2GHz       | 19.2GHz     | 19.2GHz       |
| Settling Time                        | ≈265ns        | ≈3.6µs      | ≈560ns        |
| Frequency Fluctuation (peak-to-peak) | 1.5GHz        | 172MHz      | 178.2MHz      |
| Jitter (single-band)                 | 635.6fs       | 99.2fs      | 95.7fs        |

To guarantee a fair comparison of the PLL topologies, all three structures are simulated under the same test conditions, including the same power supply voltage (1.1V), input reference frequency (200MHz), output load (1nF DC block capacitance and $50 \Omega$ resistance), simulation temperature and process corner. Moreover, the basic building blocks (PFD, charge pump, frequency divider) are implemented with the same structure as described in section 4.3 and section 4.4, in order to focus on the comparison between the three PLL topologies. As the comparison reveals, a dual-tuning topology effectively suppresses the period jitter and minimizes the frequency fluctuation induced by the large $K_{vco}$ of the inductor peaking VCO. With the same reference frequency and output frequency, both the DT-PLL and the DLTC-PLL greatly suppress the frequency fluctuation and reduce the jitter to 99.2fs and 95.7fs respectively, compared with 635.6fs for the CP-PLL. In addition, the proposed DLTC-PLL locks quickly: it requires only 560ns to lock to the desired frequency and remain stable, which is about 3µs shorter than the settling time of the conventional DT-PLL.
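
The relative improvements implied by Table 4-3 can be tabulated with a few lines of code. The sketch below uses only the values listed in the table (the settling times are the approximate figures given there); the dictionary layout and variable names are illustrative.

```python
# Values copied from Table 4-3 (settling times are approximate)
results = {
    "CP-PLL":   {"settling_ns": 265.0,  "swing_mhz": 1500.0, "jitter_fs": 635.6},
    "DT-PLL":   {"settling_ns": 3600.0, "swing_mhz": 172.0,  "jitter_fs": 99.2},
    "DLTC-PLL": {"settling_ns": 560.0,  "swing_mhz": 178.2,  "jitter_fs": 95.7},
}

# Jitter reduction factor of each topology relative to the CP-PLL
jitter_gain = {
    name: results["CP-PLL"]["jitter_fs"] / row["jitter_fs"]
    for name, row in results.items()
}

# Settling time saved by the DLTC-PLL relative to the conventional DT-PLL
saved_ns = results["DT-PLL"]["settling_ns"] - results["DLTC-PLL"]["settling_ns"]
speedup = results["DT-PLL"]["settling_ns"] / results["DLTC-PLL"]["settling_ns"]

for name, gain in jitter_gain.items():
    print(f"{name}: jitter reduced {gain:.2f}x relative to CP-PLL")
print(f"DLTC-PLL settles {saved_ns:.0f} ns sooner than DT-PLL ({speedup:.1f}x faster)")
```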

## 4.6 Summary

In this chapter, the inductor peaking VCO presented in the previous chapter is embedded in different types of PLL structure. Because of the large gain of the VCO, the conventional charge-pump PLL and dual-tuning PLL expose unavoidable issues. To solve these problems, a novel dual-loop triple-controlled PLL topology is proposed in a 40nm technology process. Two fine control loops are combined with a coarse tuning path to precisely control and adjust the output frequency. Coarse tuning biases the inductor peaking VCO into the desired frequency range, after which the two fine control signals alternately tune the output frequency to its exact value. As the gain of the VCO on the fine tuning path is small, the reference spur and the inherent phase noise of the VCO have less influence on the PLL's output. In addition, the system settling time is greatly reduced by using both the rising and falling edges of the reference frequency for frequency acquisition. Two conclusions can be drawn:

- 1) According to the analysis, the large gain of the VCO has a significant influence on a conventional PLL topology. By combining the inductor peaking VCO with different PLL structures, it is found that the proposed DLTC-PLL not only suppresses the frequency variation caused by the reference spur and the inherent phase noise of the VCO, but also achieves fast locking on a nanosecond timescale, which makes the inductor peaking VCO more practical for multiple protocol clock generation systems.
- 2) The settling time of a PLL involves a trade-off between loop bandwidth and overall noise performance. With a larger loop bandwidth, the VCO responds faster to the detected phase error. However, this also lets more spur leak through the PFD and causes frequency variation at the output. Therefore, the loop bandwidth should be chosen carefully.

## **Chapter 5 Practical Implementation of Dual-Loop Triple-Controlled PLL**

## 5.1 Introduction

In the previous chapter, a prototype structure of dual-loop triple-controlled PLL (DLTC-PLL) that incorporates a VCO with inductor peaking was proposed. From the theoretical analysis of this circuit, the DLTC-PLL provides a solution for ultra-wide frequency generation with a short settling time, to fulfil the demands of multiple protocol communications in a high-speed wireline transceiver. To validate the feasibility of the proposed approach, a practical design example is demonstrated and fabricated in a 40nm CMOS technology process.

In this chapter, the design implementation of a DLTC-PLL is presented starting with the top level structure of the system. The detailed design of a DLTC-PLL is systematically described including the key functional blocks and output buffers. Finally, the post-layout simulations are compared to the measurement results.

## 5.2 Top level System Design

Fig. 5-1 shows the top level structure of the DLTC-PLL design example that was realized using a 40nm CMOS process. The entire design is divided into two main parts; one of which is the main functional core of the DLTC-PLL and the other part consists of the output buffers. Considering the functionality of multi-frequency generation, three reference sources were applied to the PLL in order to provide reliable controllability. The reference frequencies include two off-chip crystal oscillators of 400MHz ( $Ref_1$ ) and 300MHz ( $Ref_2$ ) respectively and a free-run reference ( $Ref_{free}$ ) which is activated in testing mode. In addition, to maximize the potential advantage of the ultra-wide bandwidth of an inductor peaking VCO, a digital control interface was added to the DLTC-PLL to allow switching of the division ratio of the frequency pre-scaler (CON[2:0]) and channel selection of the reference clock (S[2:0]). Each digital control signal contains 3-bits. Two additional reference voltages ( $V_{ref\_high}$  and  $V_{ref\_low}$ ) were applied to enhance the voltage tuning range of the VCO.



Fig. 5 - 1 Top Level Structure of DLTC-PLL

For the output buffers, two different types were implemented within the design example to serve two different purposes. Output buffer 1 extracts the output frequency from the DLTC-PLL and interfaces with the test equipment, matching a $50\Omega$ impedance and improving signal quality. Output buffer 2 monitors the voltage changes of the coarse tuning path, thereby allowing the settling response of the PLL to be observed.

The whole design example is powered by two supply voltages (1.1V and 2.2V).

## 5.3 Architecture of DLTC-PLL

The implemented structure of the DLTC-PLL is shown in fig. 5-2.



Fig. 5 - 2 Implemented Structure of DLTC-PLL

The inductor peaking VCO with three voltage control signals (fine tuning $V_{fine1}$, $V_{fine2}$ and coarse tuning $V_{coarse}$ ) was used as the oscillation source to provide a wide frequency option. By implementing a frequency pre-scaler with 3-bit digital switching signals (CON[2:0]), 8 different division ratios can be selected. A 3-bit digital control signal (S[2:0]) combined with 2 off-chip crystal oscillators ( $Ref_1$ and $Ref_2$ ) and a free-run reference ( $Ref_{free}$ ) were applied as a reference clock scheme. This provides 8 different reference clock modes including a testing mode. All signals within the DLTC-PLL are in differential form, which is convenient for detecting both the rising and falling edges between the feedback frequency and the reference clock by using two fast acquisition PFDs. The pulse signals (UP/DOWN) that carry the detected phase information between the feedback and reference are converted into current form by a gate switch charge pump. After that, the integrated current is passed into the loop filter. To make each loop operate in a stable condition, the signal that carries the control voltage information is fed back to the charge pump simultaneously to dynamically adjust the charge pump current, thereby varying the loop bandwidth.

## 5.3.1 Inductor Peaking VCO with Voltage Tuning Range Enhancement

The tuning efficiency must be considered during the design of the proposed DLTC-PLL when using an inductor peaking VCO. For the implementation of the VCO itself, the control voltage is normally provided by an off-chip power source in a test or a virtual DC voltage during simulations, however, it becomes a more complex situation when embedding the VCO into a PLL as the control voltage is no longer provided from a separate source but from the charge pump. Therefore, the achievable voltage range should be carefully designed to maximize the advantage of the wide tuning range feature of the VCO.



Fig. 5 - 3 Source (Sink) Current Scaling with Drain-source Voltage of MOSFET

Several current mirrors are used to provide the charge pump current. However, it is well known that the mirrored current depends strongly on the drain-source voltage $(V_{ds})$ applied to the transistors. To maintain a constant, stable current, the voltage headroom of $V_{ds}$ should be high enough to allow the transistor to operate deep in the saturation region. The saturation current is given by (5.1).

$$I_{saturation} = \beta \frac{W}{L} (V_{gs} - V_{th})^2 (1 + \lambda V_{ds}) \tag{5.1}$$

As this equation reveals, the current varies with $V_{ds}$ when the gate voltage ( $V_{gs}$ ) of the transistor is constant. For example, as explained in chapter 3, the oscillation frequency of the inductor peaking VCO is inversely proportional to the control voltage. Therefore, to increase the oscillation frequency, the control voltage is decreased by sinking current from the loop filter to ground, which changes $V_{ds}$ in the charge pump. For this reason, the dynamic range of the current is investigated by sweeping $V_{ds}$, as illustrated in fig. 5-3. Both the source and the sink current were analyzed against the required constant 30µA. When $V_{ds}$ decreases towards 0V, the sink current starts to drop, and the source current behaves similarly when $V_{ds}$ approaches the power supply. The range over which a constant 30µA current is maintained is small.
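
The $V_{ds}$ dependence in (5.1) can be illustrated with a short sketch. Because $\beta$, $W/L$ and the overdrive voltage are common to every operating point, only the factor $(1 + \lambda V_{ds})$ matters for the relative current. The channel-length modulation coefficient `lam` and the nominal operating point `v_ds_nominal` below are hypothetical values chosen for illustration and are not taken from the 40nm design. The model covers the saturation region only, so it does not capture the collapse of the current as the transistor leaves saturation near the rails.

```python
def mirror_current_ua(v_ds: float,
                      i_nominal_ua: float = 30.0,
                      v_ds_nominal: float = 0.55,
                      lam: float = 0.2) -> float:
    """Mirror output current from Eq.(5.1), normalized to i_nominal_ua at v_ds_nominal.

    lam (1/V) and v_ds_nominal (V) are hypothetical illustration values.
    Valid only while the transistor stays in saturation.
    """
    return i_nominal_ua * (1 + lam * v_ds) / (1 + lam * v_ds_nominal)


for v_ds in (0.3, 0.55, 0.8):
    print(f"V_ds = {v_ds:.2f} V -> I = {mirror_current_ua(v_ds):.2f} uA")
```

Even within saturation, the current drifts by a few percent over this voltage range for the assumed `lam`, which is why the usable control-voltage window for a constant charge pump current is limited.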

Therefore, taking the dynamic range of charge pump into consideration, a solution of powering the VCO using two voltage regulators is proposed. Fig. 5-4 is the structure of the implemented inductor peaking VCO.



Fig. 5 - 4 Implemented Inductor Peaking VCO with Voltage Regulation

The power rail for the VCO is regulated by these two voltage regulators, so that, provided enough voltage is available to power the VCO, the rail can be moved up and down dynamically depending on the charge pump. As shown in fig.5-4, the high voltage node ( $V_{high}$ ) of the VCO is regulated by an N-type voltage regulator while the low voltage node ( $V_{low}$ ) is provided by a P-type voltage regulator. The inputs of the regulators are connected to two off-chip reference voltages ( $V_{ref\_high}$ and $V_{ref\_low}$ respectively). In the 40nm process node, the nominal power supply is 1.1V, so the voltage between $V_{high}$ and $V_{low}$ is also set to 1.1V. Another advantage of the voltage regulators is their high PSRR (power supply rejection ratio), which improves the immunity of the VCO to power supply variations.

The delay cell of inductor peaking VCO is shown in fig. 5-4. The original VCO with inductor peaking that was presented in chapter 3 is modified to a differential structure for better noise performance. Two additional transistors for fine control are paralleled with the coarse control transistor. Moreover, an inverter-based latch is inserted in the middle to ensure fully differential oscillation. Considering the conclusions in chapter 3 that a stacked inductor has a significant advantage in area, especially for more advanced process nodes, the same methodology is applied here. The inductors used in the 40nm design example are all full custom designs with the same layout strategy. The detailed parameters are listed in table. 5-1.

Table 5 - 1 Detailed Parameters of Inductor Peaking VCO in DLTC-PLL

| Parameters         | Values          |
|--------------------|-----------------|
| $W/L_{1,5}$        | 25.6 µm/0.04 µm |
| $W/L_{2,8}$        | 36 µm/0.04 µm   |
| $W/L_{3,7}$        | 4 µm/0.04 µm    |
| $W/L_{4,6}$        | 4 µm/0.04 µm    |
| $L_{1,2}/Q$ factor | ≈126pH/13       |

For frequency bandwidth enhancement, the transistor size of $M_{2,8}$ is set slightly larger than that of $M_{1,5}$, using the principle described and analyzed in chapter 3. In addition, to clearly separate the coarse tuning gain from the fine tuning gain, the transistor size of $M_{3,4,6,7}$ is one ninth of $M_{2,8}$. The post-layout simulation results of coarse tuning and fine tuning are shown in fig.5-5(a) and fig.5-5(b). The overall frequency tuning range of the coarse control is from 2.53GHz to 36.8GHz, which results in a linearized coarse tuning gain ( $K_{vco\_coarse}$ ) of 42.83GHz/V. The fine tuning curves are obtained by setting different values of $V_{coarse}$. Two things should be noted in fig.5-5(b). First, the slope of each fine tuning curve over the full voltage range is different, which is caused by the difference in $K_{vco\_coarse}$ at different $V_{coarse}$. Second, the gap between adjacent fine tuning curves is larger in the middle, where $V_{coarse}$ ranges from 0.2V to 0.5V, which is caused by the nonlinearity of the coarse tuning characteristic. Therefore, to simplify the fine tuning characteristic, the fine tuning gain ( $K_{vco\_fine}$ ) is taken as its average over the different $V_{coarse}$ values, which is 4.67MHz/mV.
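
The linearized coarse gain can be reproduced from the quoted frequency range. The sketch below assumes a coarse control span of 0.8V, which is inferred from the quoted gain and is not stated explicitly in the text; the function name is illustrative. It also compares the coarse and fine gains with the one-ninth transistor sizing.

```python
def linearized_gain_ghz_per_v(f_min_ghz: float, f_max_ghz: float, v_span: float) -> float:
    """Average tuning gain over the control-voltage span."""
    return (f_max_ghz - f_min_ghz) / v_span


V_SPAN = 0.8                 # assumed coarse control span in volts (inferred, not stated)
k_coarse = linearized_gain_ghz_per_v(2.53, 36.8, V_SPAN)
k_fine = 4.67                # average fine gain: 4.67 MHz/mV equals 4.67 GHz/V
gain_ratio = k_coarse / k_fine

print(f"K_vco_coarse = {k_coarse:.2f} GHz/V")
print(f"coarse/fine gain ratio = {gain_ratio:.1f}")
```

The resulting gain ratio of roughly nine is in line with the one-ninth size ratio between the fine and coarse control transistors.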



Fig. 5 - 5 Tuning Characteristic of Proposed Inductor Peaking VCO (a) Coarse Tuning to Oscillation Frequency (b) Fine Tuning to Oscillation Frequency

In addition, the implemented inductor peaking VCO uses a differential topology because it suppresses common-mode noise better and cancels disturbances that couple equally into the two differential branches. To quantify the improvement, the PSRR of the proposed inductor peaking VCO is simulated and compared with that of a single-ended VCO. Fig. 5-6 plots the ratio of the amplitude variation on the power supply voltage to the resulting phase variation of the output signal. The larger the PSRR, the better the immunity to power supply noise. As shown in fig. 5-6, the differential VCO has approximately 5dB improvement over the single-ended VCO.



Fig. 5 - 6 PSRR Comparison between Differential VCO and Single-ended VCO

## 5.3.2 Adaptive DC Shifting Unit

Another block required in the practical implementation of the DLTC-PLL is an adaptive DC shifting unit. This block acts as an internal interface that connects the inductor peaking VCO to the following blocks, either the frequency pre-scaler or output buffer 1. A well-defined DC operating point is a prerequisite for this function. The challenge is that the DC level of the VCO output signal varies with the oscillation frequency. For example, to increase the VCO's frequency, $V_{ct}$ is lowered to decrease the effective resistance of the load transistor, which causes the DC level of the output signal to go up. Therefore, an adaptive adjusting mechanism is required before the signal is processed by the next stage.

Fig. 5-7 illustrates the structure of the adaptive DC shifting unit. The whole block contains a current adjusting stage and three identical source follower stages. The inputs of the three source followers come from the delay cells of the VCO. Two of the source followers drive the next stages: one feeds the frequency pre-scaler and forms the feedback path, while the other feeds output buffer 1. To avoid unbalanced oscillation, the third, replicated source follower is used as a dummy stage to keep the load of each delay cell in the VCO the same. All of the source followers are differential, with an embedded cross-coupled latch.



Fig. 5 - 7 Implementation of Adaptive DC Shifting Unit

As the DC level of the VCO output signal varies inversely with $V_{coarse}$, a parallel transistor $M_1$ is added whose gate voltage is controlled by $V_{coarse}$, so that the unit adapts to changes in $V_{coarse}$. For example, when $V_{coarse}$ increases (and the DC level of the VCO output decreases), the DC current in the adjusting stage decreases because the parallel resistance is larger, and the mirrored current in each source follower stage is therefore smaller. This raises the DC level in each source follower stage, compensating for the decreased DC level of the VCO output.



Fig. 5 - 8 Output Signals of VCO and DC Shifting Unit

Fig. 5-8 shows the outputs of the VCO and the DC shifting unit separately. The voltage range of the VCO output is lifted up because of the two voltage regulators, and the DC operating point is around 1V in this case. To adapt to the changed voltage range of the VCO and provide an adequate DC point for the following stages (frequency divider and output buffer), the DC shifting unit dynamically maps the oscillation signal to a DC point of about 0.6V through the source follower stage. Although the voltage swing at the output of the DC shifting unit is slightly smaller than that of the original VCO output, a peak-to-peak swing of about 600mV with a 0.6V DC operating point is adequate as input for the frequency divider and output buffer. Another advantage is that the DC shifting unit brings the voltage range back within the 1.1V nominal power supply, which makes it more compatible with the other modules.

The key parameters of the adaptive DC shifting unit are listed in table 5-2.

Table 5 - 2 Detailed Parameters in Adaptive DC Shifting Unit

| Parameters         | Values       |
|--------------------|--------------|
| W/L <sub>1</sub>   | 12 μm/2 μm   |
| W/L <sub>2</sub>   | 1 μm/4 μm    |
| W/L <sub>3</sub>   | 4 μm/0.04 μm |
| W/L <sub>4,5</sub> | 4 μm/0.04 μm |
| W/L <sub>6,7</sub> | 8 μm/0.04 μm |

#### **5.3.3** Charge Pump with Voltage Protection

The implemented charge pump is developed from a prototype gate switch charge pump, whose structure is shown in fig. 5-9.

Compared with the conventional charge pump shown in Appendix B.2, the switching gates ($M_{1,2,3,4}$) are placed on the paths to the gates of $M_5$ and $M_8$ rather than on the output branch. With this approach, the gate switch charge pump effectively relieves the voltage headroom issue on the output branch.

Given the demand for a wide tuning characteristic, the VCO is powered by two voltage regulators, which requires corresponding modifications in the charge pump:



Fig. 5 - 9 Implementation of Gate Switch Charge Pump with Voltage Protection

- 1) Since the voltage tuning range of the VCO is lifted up by the two voltage regulators, the power supply of the charge pump is increased to 2.2V to provide the required range. Correspondingly, for circuit reliability, the general low threshold (LVT) MOSFETs are replaced by high I/O transistors. However, the high I/O transistor is much slower than the LVT MOSFET. Therefore, to maintain fast switching, the transistors ($M_{1,2,3,4}$) used for the current switches remain LVT devices.
- 2) Due to the increased power supply, the achievable dynamic range at the output of the charge pump is expanded. Considering the breakdown issue in the 40nm process, a protection mechanism is implemented. As seen in fig. 5-9, two additional transistors, $M_{VP}$ and $M_{VN}$, are inserted and act as source followers. In this way, their outputs sit one threshold voltage higher ($V_{thp}$) or lower ($V_{thn}$) than their gate voltages. Therefore, the tuning range at the output of the charge pump extends from $V_{thp}$ to $V_{ref\_high} - V_{thn}$. Additionally, to reduce the unwanted body effect, the transistor type used for $M_{VP}$ and $M_{VN}$ is the high I/O deep-n-well (DNW) MOSFET. In the 40nm technology process, the threshold voltage of the high I/O MOSFET is about 450mV. The reference voltages can therefore be decided: $V_{ref\_low}$ is set to 500mV and $V_{ref\_high}$ to 1.6V, leaving 1.1V of voltage room given the breakdown issue. To verify the protection mechanism, fig. 5-10 illustrates the simulation results obtained by saturating the charge pump current in one direction. For example, if $f_{feedback}$ is faster than $f_{ref}$, the source current charges the loop filter to increase the control voltage, which eventually saturates. Conversely, if $f_{feedback}$ is slower than $f_{ref}$, the sink current discharges the loop filter to decrease the control voltage until it saturates. Fig. 5-10 displays the two saturated cases separately. As expected, with the source current saturated the output is held at its highest level of 1.4V, while with the sink current saturated it is held at 0.4V, which keeps the dynamic range at the output node of the charge pump within 1V to avoid the risk of breakdown.

- 3) Finally, the loop bandwidth must be considered. Once the output frequency of the DLTC-PLL is changed, either by the frequency pre-scaler or by the reference clock, the loop bandwidth should change correspondingly to compensate. For this reason, an adaptive bandwidth function is applied to the charge pump, controlled by $M_{adp}$. The gate voltage of $M_{adp}$ is fed back from the loop filter and varies proportionally with the fine control voltage. By passing the changes of control voltage back to the charge pump, the current is dynamically adjusted according to the state of the VCO's control node, thereby varying the loop bandwidth.



Fig. 5 - 10 Illustration of Protection Mechanism of Charge Pump

Since the power supply of the charge pump has increased, the gate voltages on $M_3$ and $M_4$ need to be increased as well, so that the source current switches can be fully turned on and off. However, the switches ($M_1$, $M_2$, $M_3$, $M_4$) are 1.1V LVT MOSFETs, which means that their gate voltage swing cannot be too large because of the breakdown issue. Therefore, the gate signals of $M_3$ and $M_4$ are shifted up to the range of 1.1V to 2.2V.

### 5.3.4 Fast Acquisition PFD

Fig. 5-11 shows the implemented fast acquisition PFD, which is combined with a pulse level shifting function to meet the demands of the charge pump.



Fig. 5 - 11 Implementation of Fast Acquisition PFD

The pulse width of the UP ($UP_P$ and $UP_M$) and DOWN ($DN_P$ and $DN_M$) signals depends on the numbers of delay stages, M and N. To provide enough pulse width to fully switch the charge pump current on and off, M and N are set to 3 and 7 respectively, giving a pulse width of 52ps. In addition, an output delay line consisting of a transmission gate in parallel with an inverter produces both the UP and DOWN signals in differential form. The propagation delay of an inverter is roughly 5ps. Therefore, to minimize the delay mismatch in the delay line, the transistors in the transmission gate ($M_N$ and $M_P$) are sized at 4 µm and 10 µm respectively so that the two paths have similar delay.

The most significant modification to the PFD is the pulse level shifting function. According to the charge pump design requirement described in the last section, the *UP* signal should be shifted up to the range of 1.1V to 2.2V. Two level shifters are added at the end of the *UP* signal path to achieve this. As shown in fig. 5-11, the voltage swing of both the *UP* and *DOWN* pulses is 1.1V, while the *UP* pulse operates in the range from 1.1V to 2.2V and the *DOWN* pulse remains between 0 and 1.1V.



Fig. 5 - 12 Illustration of Level Shifting between UP and DOWN Pulses

This functionality is simulated and demonstrated in fig. 5-12. As expected, the *UP* and *DOWN* pulses occupy two different voltage ranges. Moreover, as the PLL approaches the locked state, the *UP* and *DOWN* pulse widths converge to 52ps, so that the charge and discharge currents are equal and a stable oscillation is maintained.

## 5.3.5 Integer-N Frequency Pre-scaler

One of the big advantages of a DLTC-PLL is the multiple frequency output option across a wide tuning range. To achieve this functionality, the conventional frequency divider is replaced by a programmable integer-N frequency pre-scaler, which is placed at the feedback path of the DLTC-PLL. Fig. 5-13 shows the implemented integer-N frequency pre-scaler.



Fig. 5 - 13 Implementation of Integer-N Frequency Pre-scaler

The integer-N frequency pre-scaler is built in several stages to suit the different frequencies encountered along the dividing chain, since different types of latch suit different frequency bands. Considering power efficiency and system complexity, three stages of CML latch-based divide-by-2 are implemented as front-end stages in order to cope with the high-frequency input from the VCO. In addition, to enhance the achievable bandwidth of the CML latch, inductor peaking is applied to both the master and slave latches. After the first three high-speed stages, the frequency is considerably reduced and a CML-based divider is no longer appropriate. Therefore, to interface the CML stages with the following dividing stages, a CML-to-CMOS converter is inserted to restore the signal to a rail-to-rail voltage swing.

For the next stage, a programmable 3-stage dual-modulus divider is implemented to digitally switch the dividing ratio. The first two stages within the dual-modulus divider use TSPC latches to maintain relatively high-speed operation at low power consumption, while the final stage is implemented with a classic logic gate latch. The dividing ratio of the dual-modulus divider is switched by a 3-bit digital signal (CON[2:0]), which is provided off-chip. The dividing ratio of the whole 3-stage dual-modulus divider is given by (5.2).

$$div_{ratio} = 8 + CON_0 \cdot 2^0 + CON_1 \cdot 2^1 + CON_2 \cdot 2^2 \tag{5.2}$$

For a 3-stage dual-modulus divider, the basic dividing ratio is 8, and each control bit carries a different binary weight. The overall dividing ratio of the dual-modulus divider therefore ranges from 8 to 15.

However, one concern with the dual-modulus divider is that the duty cycle of the output clock may not be 50%. This is caused by the clock cycle swallowing used to achieve an odd dividing ratio. Since both the rising and falling edges of the divided frequency are compared with the reference frequency, a simple way to avoid this duty cycle issue is to add another divide-by-2 divider after the dual-modulus stage. This final stage is designed as a pure logic gate divider, as the frequency it handles is relatively low.

Using this structure, the final division ratio that has been obtained by integer-N frequency pre-scaler is in the range from 128 to 240 with a step size of 16 for single bit switching.
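The division chain described in this section can be summarized in a few lines of Python. This is only a behavioural sketch of the arithmetic (three CML divide-by-2 stages, the dual-modulus divider of equation (5.2), and the final divide-by-2); the function and constant names are illustrative and do not come from the design itself.

```python
CML_DIV2_STAGES = 3   # front-end CML divide-by-2 stages
FINAL_DIVIDER = 2     # divide-by-2 after the dual-modulus stage (duty cycle fix)


def dual_modulus_ratio(con):
    """Equation (5.2); con is CON[2:0] interpreted as an integer 0..7."""
    if not 0 <= con <= 7:
        raise ValueError("CON[2:0] must be a 3-bit value")
    con0 = con & 1
    con1 = (con >> 1) & 1
    con2 = (con >> 2) & 1
    return 8 + con0 * 2**0 + con1 * 2**1 + con2 * 2**2


def prescaler_ratio(con):
    """Total division ratio of the integer-N frequency pre-scaler."""
    return (2 ** CML_DIV2_STAGES) * dual_modulus_ratio(con) * FINAL_DIVIDER


ratios = [prescaler_ratio(con) for con in range(8)]
print(ratios)
```

Each increment of CON[2:0] adds 1 to the dual-modulus ratio, which the surrounding fixed stages multiply by 16, giving the step size of 16 stated above.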

#### 5.3.6 Reference Clock Scheme

As for the reference clock interface, a multiple clock scheme is applied to DLTC-PLL in order to provide more flexibility for achieving different output frequencies.



Fig. 5 - 14 Implementation of Reference Clock Scheme

As shown in fig. 5-14, the two off-chip crystal oscillators (Ref1=400MHz and Ref2=300MHz) are divided by several divider stages, which generate 7 different sub-frequencies. In addition, another off-chip free-running oscillation source is used for the testing mode. The 8 resulting frequencies are selected by an 8-to-1 multiplexer controlled by a 3-bit digital signal (S[2:0]).

Table 5 - 3 Look-up Table of Potential Output Frequencies

| S[2:0]<br>CON[2:0] | 001<br>(25M) | 010<br>(37.5M) | 011<br>(50M) | 100<br>(75M) | 101<br>(100M) | 110<br>(150M) | 111<br>(200M) |
|--------------------|--------------|----------------|--------------|--------------|---------------|---------------|---------------|
| 000 (Div =128)     | 3.2GHz       | 4.8GHz         | 6.4GHz       | 9.6GHz       | 12.8GHz       | 19.2GHz       | 25.6GHz       |
| 001 (Div =144)     | 3.6GHz       | 5.4GHz         | 7.2GHz       | 10.8GHz      | 14.4GHz       | 21.6GHz       | 28.8GHz       |
| 010 (Div =160)     | 4GHz         | 6GHz           | 8GHz         | 12GHz        | 16GHz         | 24GHz         | 32GHz         |
| 011 (Div =176)     | 4.4GHz       | 6.6GHz         | 8.8GHz       | 13.2GHz      | 17.6GHz       | 26.4GHz       | 35.2GHz       |
| 100 (Div =192)     | 4.8GHz       | 7.2GHz         | 9.6GHz       | 14.4GHz      | 19.2GHz       | 28.8GHz       | 38.4GHz       |
| 101 (Div =208)     | 5.2GHz       | 7.8GHz         | 10.4GHz      | 15.6GHz      | 20.8GHz       | 31.2GHz       | 41.6GHz       |
| 110 (Div =224)     | 5.6GHz       | 8.4GHz         | 11.2GHz      | 16.8GHz      | 22.4GHz       | 33.6GHz       | 44.8GHz       |
| 111 (Div =240)     | 6GHz         | 9GHz           | 12GHz        | 18GHz        | 24GHz         | 36GHz         | 48GHz         |

Combined with the dividing options provided by the frequency pre-scaler, the potential output frequencies of the DLTC-PLL are listed in table 5-3. Using this clock generation scheme, the potential frequencies range from 3.2GHz to 48GHz, which covers the full achievable frequency range of the VCO applied in the DLTC-PLL<sup>2</sup>.
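The entries of table 5-3 follow from multiplying the selected reference frequency by the pre-scaler division ratio. The sketch below regenerates the table under that assumption; the S[2:0] to frequency mapping is copied from the table header, the ratio is written as 16 × (8 + CON) to match the range of 128 to 240 in steps of 16, and all identifiers are illustrative.

```python
# Reference frequency in MHz for each S[2:0] code (from the header of table 5-3).
# Code 000 is reserved for the testing mode and is therefore omitted.
REF_MHZ = {
    0b001: 25.0, 0b010: 37.5, 0b011: 50.0, 0b100: 75.0,
    0b101: 100.0, 0b110: 150.0, 0b111: 200.0,
}


def division_ratio(con):
    """Pre-scaler ratio for CON[2:0] = con: 128 to 240 in steps of 16."""
    return 16 * (8 + con)


def output_ghz(s, con):
    """Locked output frequency in GHz for a given S[2:0] and CON[2:0]."""
    return REF_MHZ[s] * division_ratio(con) / 1000.0


table = {(s, con): output_ghz(s, con) for s in REF_MHZ for con in range(8)}

for con in range(8):
    row = "  ".join(f"{table[(s, con)]:5.1f}" for s in sorted(REF_MHZ))
    print(f"CON={con:03b} (Div={division_ratio(con)}): {row}")
```

Several output frequencies, such as 4.8GHz or 9.6GHz, can be reached with more than one combination of reference and division ratio.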

Considering signal quality, all three reference sources are interfaced with a Schmitt trigger before selection. Moreover, a single-to-differential converter is placed at the end, not just to improve the signal quality but also to provide the differential signal for the usage of PFD.

## 5.4 Output Buffer Stages

Two different types of output buffer have been implemented in the DLTC-PLL design for different purposes. Output buffer 1 is used to extract the output frequency from the DLTC-PLL and interface with the testing environment, while output buffer 2 is for observing the changes of the coarse control voltage and measuring the settling time of the DLTC-PLL.

<sup>2</sup> Note: the S[2:0] combination 000 is reserved for the testing mode.

## 5.4.1 Output Buffer 1

Since the function of output buffer 1 is to transfer broadband frequency signals, five stages of differential common source amplifier are developed as shown in fig. 5-15.



Fig. 5 - 15 Implemented Structure of Output Buffer 1

The first four stages extend the bandwidth and amplify the signal, while the last stage provides 50 Ohm impedance matching. To provide enough current at the output stage for a large slew rate, the transistors in the final stage are 4 times the size of those in the first stage. In addition, as output buffer 1 is structured with proportional fan-out, series inductors are placed between the stages to further improve the bandwidth.

#### 5.4.2 Output Buffer 2

Output buffer 2 has a different specification. As shown in fig. 5-16, it is a rail-to-rail folded cascode operational amplifier configured as a negative feedback voltage follower.

According to the previous implementation in section 5.3.3, the potential range of  $V_{coarse}$  is from 0.5V to 1.6V. Therefore, the design priority for output buffer 2 is to have a wide input voltage swing rather than a large bandwidth as changes on  $V_{coarse}$  are relatively slow.



Fig. 5 - 16 Implemented Structure of Output Buffer 2

Output buffer 2 is powered by a 2.2V supply voltage for a wide voltage swing, so high I/O MOSFETs are used. In addition, to allow enough current and achieve $50\,\Omega$ impedance matching, the PMOS and NMOS in the output stage are sized at $320\,\mu m$ and $128\,\mu m$ respectively.

## 5.5 Full Chip Layout

The layout view of the DLTC-PLL design example in 40nm technology process is displayed in fig. 5-17.



Fig. 5 - 17 Full Chip Layout of DLTC-PLL in 40nm Process

The total chip area is about 0.677mm<sup>2</sup> (1148 µm ×590 µm) with 29 signal pads implemented which includes both DC and RF signals. The details of the chip are summarized in table 5-4.

Table 5 - 4 Chip Details

| Chip Size        | ≈1148 µm ×590 µm |
|------------------|------------------|
| Core of DLTC-PLL | ≈435 µm ×380 µm  |
| Core of VCO      | ≈310 µm×119 µm   |
| DC&RF Pads       | 67 μm×67 μm      |
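The areas quoted in this section follow directly from the dimensions in table 5-4. The short sketch below recomputes them; the helper name `area_mm2` is illustrative.

```python
def area_mm2(width_um, height_um):
    """Rectangular area in mm^2 from dimensions in micrometres."""
    return width_um * height_um / 1e6


chip = area_mm2(1148, 590)       # whole chip including pads
pll_core = area_mm2(435, 380)    # core of the DLTC-PLL
vco_core = area_mm2(310, 119)    # core of the VCO

print(f"chip: {chip:.3f} mm^2")
print(f"DLTC-PLL core: {pll_core:.3f} mm^2")
print(f"VCO core: {vco_core:.3f} mm^2")
print(f"PLL core share of chip area: {pll_core / chip:.1%}")
```

Since all dimensions in the table are approximate, these figures are best read as estimates.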

As the most important building block, the VCO has been laid out carefully. First, to minimize the influence of the layout of peripheral modules on the performance of the VCO, the loop filter, VCO and adaptive DC unit are laid out as a single structure. The layout is shown in fig. 5-18.



Fig. 5 - 18 The Layout of Core VCO

Moreover, each delay cell within the inductor peaking VCO is identical, and the interconnecting metal wires should have the same length to avoid unnecessary mismatch due to load variations. For this reason, the whole structure uses a centro-symmetric layout topology. In addition, the dual fine tuning paths are also symmetrical and placed on the sides of the inductor peaking VCO. The large capacitor on the coarse tuning path is split into two pieces to compact the layout, which shortens the signal wires and reduces parasitic effects. The overall area of the core VCO is about 0.04mm<sup>2</sup> (310 µm × 119 µm).

## 5.6 Simulation and Testing

To validate the functionality of the proposed DLTC-PLL, the design example is fabricated in TSMC 40nm CMOS technology process.

#### **5.6.1** Simulation Environment

Before presenting the results, the simulation environment is introduced including the test bench illustrated in fig. 5-19.

To model the environment used in practical tests, the 40nm design example is packaged as a symbol, with each port connected to an inductor that models the effect of the bonding wires. Based on previous tape-out experience in the 130nm and 65nm nodes, the empirical inductance of the bonding wires is about 200pH to 500pH, depending on the length and number of bonding wires used. In addition, the $50\,\Omega$ termination is modelled as a series RC network with $50\,\Omega$ resistance and 100nF capacitance. A crystal oscillator will be used as the reference clock for the DLTC-PLL in testing, but the specifications of crystal oscillators, such as DC voltage level, output swing and signal shape, vary from product to product. To account for these tolerances and provide a robust reference clock, the DC voltage level is reset by a DC block and a DC biasing circuit before the clock is applied to the DLTC-PLL module. As shown in fig. 5-19, the pull-up and pull-down resistors in the biasing circuit are both $100\,\Omega$, which biases the DC level of the reference clock at half of the power supply. The same structure will also be used on the PCB for the DLTC-PLL to ensure an identical configuration in simulation and testing. The reference clocks are modelled as sinusoidal voltage sources providing clock signals of 400MHz, 300MHz and a variable frequency respectively. Another factor to be considered for simulation and testing is the digital control signals. To simplify the test bench, all of the control signals (CON[2:0] and S[2:0]) are provided by DC voltage sources.
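Two of the test bench values above can be checked with elementary formulas. The sketch below computes the bias point of the resistive divider and the corner frequency of the series RC termination. The 1.1V supply used in the example is an assumption (the text only states "half of the power supply"), and the function names are illustrative.

```python
import math


def divider_voltage(vdd, r_up, r_down):
    """DC bias set by a pull-up/pull-down resistor pair."""
    return vdd * r_down / (r_up + r_down)


def rc_corner_hz(r_ohm, c_farad):
    """Frequency above which a series RC network looks mainly resistive."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


VDD = 1.1  # assumed supply for the bias network; not stated in the text

bias_v = divider_voltage(VDD, r_up=100.0, r_down=100.0)
corner_hz = rc_corner_hz(50.0, 100e-9)

print(f"reference clock bias: {bias_v:.2f} V")
print(f"termination corner frequency: {corner_hz / 1e3:.1f} kHz")
```

With these values the corner frequency falls in the tens of kilohertz, so for signals in the MHz and GHz range the series capacitor has negligible impedance and the termination behaves as a plain resistive load.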



Fig. 5 - 19 Simulation Test Bench for 40nm DLTC-PLL

#### 5.6.2 Simulation Results

"Real-world" white noise was included in the simulation as transient noise across the range of 1kHz to 10GHz, applied by the simulator in order to investigate the noise performance of the DLTC-PLL design example precisely. The following results are based on post-layout simulation (including all the predicted parasitics).

The target of the post-layout simulation is to pre-evaluate the following aspects:

- 1) The functionality of DLTC-PLL design example.
- 2) The performance of settling time, jitter and phase noise.
- 3) The performance at different widely used communication protocols.

To examine the performance of the circuit efficiently, the following simulation scheme is used. First, the DLTC-PLL is locked to 4.4GHz to investigate the low-frequency performance. The output frequency is then switched to 35.2GHz by changing the digital control signals, for high-frequency evaluation. The settling time between these frequencies can thereby be monitored on the transient waveform of the coarse control signal. According to the look-up table in section 5.3.6, the digital code for a 4.4GHz output frequency is 001 for S[2:0] and 011 for CON[2:0]. For the high-frequency output of 35.2GHz, S[2:0] is set to 111 while CON[2:0] is kept at 011, which can be easily programmed in the simulator. The transient simulation results are presented in fig. 5-20.
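The control codes for a target frequency can be found by searching the look-up relation in reverse. The following sketch does this for the two frequencies used in the simulation scheme; it assumes that the output frequency is the selected reference multiplied by 16 × (8 + CON), with the reference values taken from table 5-3, and `find_codes` is a hypothetical helper name.

```python
# Reference frequency in MHz for each S[2:0] code (table 5-3); 000 is test mode.
REF_MHZ = {
    0b001: 25.0, 0b010: 37.5, 0b011: 50.0, 0b100: 75.0,
    0b101: 100.0, 0b110: 150.0, 0b111: 200.0,
}


def find_codes(target_ghz, tol_ghz=1e-6):
    """Return all (S[2:0], CON[2:0]) bit strings that yield target_ghz."""
    matches = []
    for s, ref_mhz in REF_MHZ.items():
        for con in range(8):
            f_ghz = ref_mhz * 16 * (8 + con) / 1000.0
            if abs(f_ghz - target_ghz) < tol_ghz:
                matches.append((format(s, "03b"), format(con, "03b")))
    return matches


print(find_codes(4.4))    # low-frequency lock point
print(find_codes(35.2))   # high-frequency lock point
```

Both targets share the same CON[2:0] code, so only the reference selection has to change when switching between them.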



Fig. 5 - 20 Post Layout Results (a) Transient Response of 4.4GHz Lock-up Time (b) Transient Response of 35.2GHz Lock-up Time (c) Period Performance of 35.2GHz Output Frequency (d) Phase Noise Performance of 35.2GHz Frequency

Fig. 5-20(a) indicates that the DLTC-PLL successfully locks to 4.4GHz within a settling time of 646ns. In fig. 5-20(b), the locked frequency changes from 4.4GHz to 35.2GHz as expected after the control signals are switched. The total settling time over this frequency span of more than 30GHz is only 446ns. The noise performance (periodic jitter and phase noise) is also investigated under this simulation plan. Fig. 5-20(c) presents the periodic jitter of the 35.2GHz output, for which the standard deviation is $\pm 54.8$fs, while fig. 5-20(d) shows that the phase noise at the same output frequency is -109dBc/Hz at 1MHz offset.

As discussed in chapter 1, one of the motivations in this project is to create a clock generation system for multiple communication protocols. Therefore, to investigate the performance of the DLTC-PLL with a specific frequency, the frequencies in several widely used protocols were simulated. The results of the corresponding periodic jitter and phase noise performance are shown in fig.5-21.



Fig. 5 - 21 Performance with Different Protocols Frequency (a) Jitter (b) Phase Noise

The best jitter performance obtained is 37.1fs, for the SuperMHL (×6 lanes) protocol operating at 36GHz. The best phase noise result is -113dBc/Hz, for USB running at 10GHz.

#### **5.6.3** Testing Measurements

Practical chip testing consisted of two steps, the first being a custom PCB design. Given the system complexity and the many DC voltages required, the control signals (S[2:0] and CON[2:0]) are set by switches to facilitate testing and save cost. The output of the 40nm silicon chip is connected directly to an RF probe to reduce the influence of bonding wires. Therefore, to avoid damaging the probe while switching the control signals, the PCB is split into two boards: the silicon chip is connected to the main board by wire bonding, while all the control signals and corresponding switches are on the secondary board. The two boards are connected by jumper wires, as illustrated in fig. 5-22. The corresponding PCB is shown in fig. 5-23.



Fig. 5 - 22 Concept View of PCB Connection



Fig. 5 - 23 PCBs Layout



Fig. 5 - 24 Measurement Environment with Photograph of 40nm Silicon Die

To investigate the settling time effectively, a small oscillation circuit with a frequency of 100kHz was designed and placed on the secondary board, together with the corresponding control switches. When measuring the settling time, the reference selection code (S[2:0]) is switched over to the output of the 100kHz oscillation circuit, which is transferred to the main board. Each bit in S[2:0] can be controlled separately. Once the PCB is configured in this mode, the output frequency of the 40nm chip changes automatically between two different frequencies with a $10\,\mu s$ time interval. According to the simulation results, this time interval is long enough for the PLL to settle.

Fig. 5-24 shows the measurement environment and a photograph of the 40nm silicon die integrated on the test PCB. To avoid environmental interference, a battery-based power supply system was used to provide a clean power source. In addition, a DC adaptor circuit was created to provide the DC voltages required to test the PLL.

#### A. Measurement of the Settling Time

To measure the settling time of the DLTC-PLL, the coarse control voltage is monitored with a digital oscilloscope. The dividing ratio is set to 144, for which the digital code (CON[2:0]) is 001. Moreover, by configuring the switches that control the external oscillation circuit, the reference frequency is switched between 25MHz (S[2:0]=001) and 200MHz (S[2:0]=111), which allows the output frequency of the PLL to be switched periodically between 3.6GHz and 28.8GHz.



Fig. 5 - 25 Settling Response between 3.6GHz and 28.8GHz

Fig.5-25 shows the measured settling response of the DLTC-PLL scaled from 28.8GHz down to 3.6GHz. The settling time that has been achieved is 426.05ns over a 25GHz frequency range.

#### B. Measurement of Noise Performance

To measure the functionality of the DLTC-PLL and the corresponding noise performance in different frequency ranges, the testing covered three frequency levels: the high-frequency band, the middle band and the low-frequency band. The testing configuration is listed in table 5-5. To investigate the highest achievable frequency of the DLTC-PLL, an external frequency generator is used to provide a continuously adjustable reference clock by switching S[2:0] into free run mode.

Table 5 - 5 Parameters of Testing for High, Middle and Low Frequency Band

|                                | High-band              | Middle-band     | Low-band       |
|--------------------------------|------------------------|-----------------|----------------|
| Reference Clock<br>(S[2:0])    | Free Run Mode<br>(000) | 100MHz<br>(100) | 50MHz<br>(011) |
| Dividing Ratio<br>(CON[2:0])   | 128<br>(000)           | 128<br>(000)    | 192<br>(100)   |
| Output Frequency               | 29.4GHz                | 12.8GHz         | 9.6GHz         |

The corresponding phase noise performance is obtained and depicted in fig 5-26.







Fig. 5 - 26 Phase Noise Performance (a) High-frequency Band (b) Middle-frequency Band (c) Low-frequency Band

Based on the theoretical analysis in chapter 4, one significant advantage of the DLTC-PLL is that both the rising and falling edges of the reference clock are used to detect the phase difference. With this topology, the converted currents that carry both phase differences are merged and jointly control the inductor peaking VCO. The loop bandwidth of the PLL can thereby be set to twice that of single-edge detection. This not only shortens the locking time but also suppresses the inherent in-band noise of the VCO.

For example, fig. 5-26(a) shows the highest achievable frequency of the 40nm design, 29.4GHz, for which the reference clock is about 229.6MHz with a division ratio of 128. With single-edge detection, the loop bandwidth would have to be set below 23MHz to maintain loop stability (one tenth of the reference frequency, according to 'Gardner's Limit' [73]). With dual-edge detection, however, the loop bandwidth can be raised to 45MHz. A similar outcome can be observed in the mid-band (fig. 5-26(b)) and low-band (fig. 5-26(c)) results. For the mid-band frequency, where the reference frequency is 100MHz, the loop bandwidth is set to about 22MHz. Similarly, for the low-band frequency, where the input reference is 50MHz, the loop bandwidth is set to about 9MHz.
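The bandwidth bounds discussed above can be reproduced with a short calculation. The sketch below applies the one-tenth rule of thumb to each reference frequency and doubles it for dual-edge detection, as described in the text. The helper names are illustrative, and the results are rule-of-thumb bounds rather than the exact bandwidths chosen in the design.

```python
def single_edge_limit_mhz(f_ref_mhz):
    """Rule-of-thumb stability bound: one tenth of the reference frequency."""
    return f_ref_mhz / 10.0


def dual_edge_limit_mhz(f_ref_mhz):
    """Bound when both reference edges are used (twice the single-edge bound)."""
    return 2.0 * single_edge_limit_mhz(f_ref_mhz)


# Reference frequency per band in MHz; the high band is derived from
# the 29.4GHz output and the division ratio of 128.
bands = {"high": 29400.0 / 128, "mid": 100.0, "low": 50.0}

limits = {
    name: (single_edge_limit_mhz(f_ref), dual_edge_limit_mhz(f_ref))
    for name, f_ref in bands.items()
}

for name, (single, dual) in limits.items():
    print(f"{name}: f_ref={bands[name]:.1f} MHz, "
          f"single-edge <= {single:.1f} MHz, dual-edge <= {dual:.1f} MHz")
```

The loop bandwidths quoted in the text (about 45MHz, 22MHz and 9MHz) can be compared against these approximate bounds.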

Table 5 - 6 Performance Comparison of DLTC-PLL with Recent Works

| References                               | [87]                | [48]                  | [85]                | This Work                       |
|------------------------------------------|---------------------|-----------------------|---------------------|---------------------------------|
| Technology                               | 65nm                | 45nm                  | 65nm                | 40nm                            |
| Architecture                             | Dual-tuning<br>Ring | Dual-loop<br>Ring     | Dual-tuning<br>Ring | Dual-tuning Triple Control Ring |
| Reference<br>Frequency                   | 36MHz               | 100MHz                | 500MHz              | 100MHz                          |
| Output<br>Frequency                      | 3.1GHz              | 2.5GHz                | 1.5GHz              | 12.8GHz                         |
| Tuning Range                             | 1.4~3.2GHz          | 1-8.5GHz              | 0.5-2.5GHz          | 3.6 ~ 29.4GHz                   |
| (FTR)                                    | (78.3%)             | (152.9%)              | (133.3%)            | (157%)                          |
| Lock-up Time                             | 85 μs               | 5 µs                  | Few ms              | 426.05ns                        |
| Phase Noise @ 1MHz offset                | -98dBc/Hz           | -106dBc/Hz            | N/A                 | -95dBc/Hz                       |
| Jitter <sub>rms</sub> (Integrated Range) | 2.23ps<br>(N/A)     | 1ps<br>(1MHz~1.25GHz) | 1.93ps<br>(N/A)     | 3.49ps<br>(1kHz~1GHz)           |
| Supply<br>Voltage                        | 1.2V                | 2.5V                  | 1.8V                | 1.1V/2.2V                       |
| Power                                    | 25.8mW              | 70mW                  | 4mW                 | 92mW                            |
| Active Area                              | 0.32mm<sup>2</sup>  | 0.277mm<sup>2</sup>   | 0.1mm<sup>2</sup>   | 0.165mm<sup>2</sup>             |

The overall performance of the proposed DLTC-PLL is summarized in table 5-6, together with a comparison with other recent work. The proposed DLTC-PLL is compared with several recent demonstrations that are also based on a dual-tuning ring-based PLL structure. As can be seen, the proposed DLTC-PLL achieves the widest tuning range, giving an FTR of 157% (the equation for FTR is introduced in (3.10)). The programmable tuning range extends from 3.6GHz to 29.4GHz, which is over 25GHz wide. Moreover, the fast locking function gives the proposed DLTC-PLL a locking time of about 426ns, which is much shorter than that of the other dual-tuning PLLs listed in the table. In addition, moderate phase noise and periodical jitter performance has been achieved. Operating at a 12.8GHz output frequency, the phase noise at 1MHz offset is -95dBc/Hz, and the periodical jitter integrated from 1kHz to 1GHz is less than 3.5ps. The power consumption is higher than that of the other demonstrations, which is due to the multiple power supply voltages used. However, the wide tuning range allows the proposed PLL to be used in a silicon photonics system, and from the perspective of the whole system this increase in power is tolerable. In any case, power consumption must be traded off against achievable frequency and bandwidth.

However, some differences in performance can be found when the measurement results are compared with the post-layout simulation results. Firstly, in terms of the highest achievable frequency, an output frequency of up to 35.2GHz was obtained in post-layout simulation (fig.5-20), while the measurement only reaches 29.4GHz. Several factors may cause this degradation in frequency. One factor is the inductor modelling and parasitic extraction. For example, if the modelled inductance (S-parameter) is larger than the inductance in the fabricated silicon, the achievable frequency of the VCO is decreased, which directly lowers the output frequency range of the PLL. Moreover, the output load modelled in the simulation testbench is a $50~\Omega$ resistance in series with a 100nF capacitance (based on empirical values), which underestimates the load in practical measurement. In addition, the measured frequency is also affected by the testing environment. To test the output frequency, a high-speed probe is attached directly to the output port of the silicon die to reduce the load capacitance. However, a high impedance cable is used to connect the probe to the spectrum analyser, which may introduce further loss at high frequency.



Fig. 5 - 27 Bonding Wire Simulation (a) 1pH Inductance (b) 200pH Inductance (c) 1nH Inductance (d) 2nH Inductance

Another difference in performance can be found in the periodical jitter and phase noise. According to the post-layout simulation in fig.5-20, the jitter and phase noise performance is better than that of the measurement results. Two aspects should be considered to explain this difference. The primary reason is the effect of the bonding wire. The effective inductance of the bonding wire varies with its length. The resulting inductance, combined with the parasitic capacitance, generates additional oscillation on the output and disturbs the desired frequency. To verify this assumption, different lengths of bonding wire were simulated by setting different effective inductances. The results are shown in fig.5-27. As can be seen, the fluctuation on the output frequency increases significantly as the effective inductance of the bonding wire increases. Therefore, to reduce the influence of the bonding wire, a different integration technique is required, and this will be discussed in chapter 7 as a future improvement. The second aspect is the simulation time. In practical measurement, it is feasible to run the test for milliseconds or even several seconds, so the jitter variation at relatively low offset frequencies is included. In post-layout simulation, however, it is hardly possible to run such a long transient simulation because of the severe time cost. For example, a 2 µs transient simulation may require about 2 days to complete. Therefore, the integrated jitter predicted by simulation is lower than that obtained in measurement. These reasons account for the performance mismatches.

## 5.7 Summary

This chapter has described the practical implementation of the novel dual-loop triple-controlled PLL that was proposed in the previous chapter of this thesis. The functionality and feasibility of the DLTC-PLL are demonstrated by fabricating it in the TSMC 40nm technology process. To maximize the advantage of the inductor peaking VCO with ultra-wide frequency tuning, two specific design features have been implemented.

- 1) The power supply for the VCO is replaced by two voltage regulators, which allows the frequency tuning to scale over the full control voltage range. In addition, corresponding modifications have been made to the charge pump and adaptive DC shifter to make them more compatible with the VCO.
- 2) A reference clock scheme and an associated integer-N frequency prescaler are implemented separately. The wide range of output frequencies is therefore programmable and controlled by 6 digital bits (3 bits for reference selection and 3 bits for the dividing ratio), which fully covers the VCO's frequency tuning range.

Additionally, a rail-to-rail output buffer was implemented and connected to the coarse control voltage to investigate the settling time and to efficiently monitor the locking response of the DLTC-PLL.

Based on the post-layout simulation and practical measurements, the functionality of the DLTC-PLL was validated and the concept presented in chapter 4 has been verified. The dual PFD detection successfully improves the settling time and allows the loop bandwidth to be set twice as large as with a single PFD, which suppresses the in-band noise further. According to the measurement results, a settling time in the nanosecond range has been achieved. Moreover, the highest achievable frequency obtained was 29.4GHz, with a frequency tuning range of over 25GHz. Compared with equivalent dual-loop ring-based PLLs, the proposed PLL makes a significant improvement in settling time and frequency bandwidth and provides an elegant solution for multiple clock generation in silicon photonics based communication systems.

# Chapter 6 Design Methodology and Enhancement for Ultra-small Process

## 6.1. Introduction

The demand for data-intensive applications, such as high-resolution base stations, cloud-based computing and network base stations, has been growing rapidly. This is a driving force behind the development of modern CMOS technology processes, structures and materials. As conventional poly-Si gates have reached their limits, the High-k/Metal Gate (HKMG) technique was introduced to allow continued gate dielectric scaling together with reductions in transistor size and increases in performance [134]. By decreasing the gate leakage and improving electrostatic control, HKMG is particularly effective at increasing the transistor's transition frequency ( $f_T$ ). An inherent benefit of the improved  $f_T$  is that it eases the challenges faced by designers of high-speed analogue circuits. Fig. 6-1(a) illustrates the development of  $f_T$  as geometries have reduced in size with modern process nodes.

The rapid development of this technology process has posed new challenges, particularly increased noise. A higher noise floor can be observed in the HKMG 28nm process than in other process nodes (such as 40nm and 65nm); a comparison based on the simulation results for 28nm, 40nm and 65nm is shown in fig.6-1(b).



Fig. 6 - 1 Comparison in Different Process Node (a) Transition Frequency  $f_T$  (b) Noise Floor

In addition, the layout design methodology is also a significant factor in improving performance, especially in high-speed analogue applications. Therefore, two research subjects are investigated in this chapter using two practical design cases based on the HKMG 28nm process.

- 1) A methodology for simplifying the choice of a transistor's finger width at the layout stage, as this is the primary design technique to be applied.
- 2) An inductor peaking VCO, used as a test circuit and modified to compensate for the degraded phase noise performance.

## 6.2. Design Case I: Advanced Layout Techniques for High-speed Analogue Circuits

The benefit of reducing the width of a standalone single transistor for analogue performance in the HKMG process has been presented in [134]. However, the situation becomes more complex when the individual transistors are placed in the context of an analogue circuit. One of the key issues in ultra-small process nodes is that, unlike the standard design process for the basic transistor dimensions, the selection of the individual finger width ( $W_f$ ) of the transistor and the number of fingers is not usually analysed in detail, other than typically to carry out a common centroid design to optimize device matching. Unfortunately, for modern small geometry processes, this specific aspect of the design has emerged as a significant challenge in practical circuit design scenarios, as the individual layout of each transistor becomes much more of a factor in overall performance. The aim of this section is to investigate the effects of different finger widths in transistors and the resulting relationship to analogue circuit performance in the 28nm HKMG CMOS process.

#### **6.2.1.** Circuits Details

Fig. 6-2 shows an inverter-based high speed oscillator structure which was selected as a suitable practical test-bench circuit as it has clear and specific performance metrics. The circuit was used to evaluate optimal finger width for best performance in high-speed applications. An oscillator topology with three delay stages was implemented to satisfy Barkhausen's criteria for oscillation. Two stages of self-biasing common source amplifiers follow the oscillator as an output buffer to provide  $50\Omega$  impedance matching for testing purposes. The output buffer structure is identical to that given in fig. 3-7 (c) as shown in chapter 3. A DC blocking capacitor was placed between the oscillator and buffer to reset the operation point, and therefore dummy loads were inserted to ensure that each delay stage in the oscillator had a similar load, thereby reducing the influence of unbalanced oscillation in practical measurements.



Fig. 6 - 2 Implemented Inverter-based Oscillator Example Structure with  $50\Omega$  Impedance Output Buffer

As the intrinsic carrier mobility of the N-type MOSFET and the P-type MOSFET differs in this process, the width of  $M_2$  is twice that of  $M_1$  to ensure that the drain voltage is approximately half of the power supply voltage (VDD).

In the basic layout implementation, at least one contact via is required to be placed at both the drain and source terminals, as illustrated in fig.6-3. The width of a single via in the process (to satisfy the minimum metal enclosure constraints in the design rule) is 210nm and therefore the minimum finger width of the transistor is required to be 220nm with a small margin to account for tolerances as defined in the layout design rules. To evaluate the effect of finger width on performance, three different finger width oscillator examples were selected and fabricated in the 28nm HKMG CMOS technology node, with the finger widths defined in each benchmark circuit as 220nm (1X Via), 440nm (2X Via) and 880nm (4X Via) with a minimum transistor length of 30nm. To ensure that the finger width of the transistor is the most significant factor in the design, all of the NMOS and PMOS elements used in the different oscillator benchmark circuits were given the same transistor dimensions.



Fig. 6 - 3 Single Transistor with Minimum Finger Width

The key parameters of each design are shown in table 6-1.

Table 6 - 1 Key Parameters of Three Oscillator Examples

| Design Examples           | Oscillator 1         | Oscillator 2         | Oscillator 3         |
|---------------------------|----------------------|----------------------|----------------------|
| Finger Width $(W_f)$ (nm) | 220                  | 440                  | 880                  |
| M1 (W/L) (µm/nm)          | 220*32<br>7.04/30    | 440*16<br>7.04/30    | 880*8<br>7.04/30     |
| M2 (W/L) (µm/nm)          | 220*64<br>14.08/30   | 440*32<br>14.08/30   | 880*16<br>14.08/30   |

To minimize the impact of variations in layout design and circuit position, the three oscillator examples with different finger widths were implemented with the same basic layout structure, as presented in [135]. Fig. 6-4 shows the optimal transistor layout with a finger width of 440nm. The gate of the transistor is contacted at both ends with metal 1 $(M_1)$, merged and then built up to a thick metal layer  $(M_7)$. The source of the transistor is connected to both sides of the bulk with three thin metal layers  $(M_2, M_3 \text{ and } M_4)$  to allow sufficient current flow. Moreover, to minimize the interconnect resistance, the drain terminal of the transistor is connected through a thick metal layer  $(M_7)$.



Fig. 6 - 4 Optimal Transistor Layout with a Finger Width of 440nm

Fig. 6-5(a) shows micrographs of the three oscillator design examples. The power supply for each oscillator example was provided separately. The fabricated silicon die was then mounted on a standard printed circuit board (PCB) with the DC pads bonded directly to the PCB via wire bonds. The high-frequency output signal was fed into a spectrum analyser through an RF probe.

Fig. 6-5(b) shows the detailed layout of a single delay cell in each of the different finger width oscillators. As the individual finger width in an oscillator is reduced, the number of fingers required increases, which leads to an increase in the area of each delay cell. This increase in area leads to a greater parasitic capacitance as the number of fingers increases.



Fig. 6 - 5 (a) Microscope View of Three Oscillator Examples (b) Layout View of Delay Cell of Three Oscillator Examples

Table 6 - 2 Cell Area and Parasitic Parameters of Three Oscillator Examples

| Design Examples   | Oscillator 1 | Oscillator 2 | Oscillator 3 |
|-------------------|--------------|--------------|--------------|
| Wf (nm)           | 220          | 440          | 880          |
| Cell Area ( µ m²) | 41.69        | 27.65        | 23           |
| Cgate_par (fF)    | 0.011        | 0.0082       | 0.0073       |
| Cdrain_par (fF)   | 0.013        | 0.0109       | 0.0105       |

One inference that can be drawn is that oscillator 1 (Wf = 220nm) suffers more parasitic influence than oscillators 2 and 3. Table 6-2 lists the cell area and corresponding parasitic parameters of the single delay cells of the oscillators with different finger widths. The parasitic capacitances at the gate and drain nodes are extracted using Calibre PEX.

The parasitic capacitances at the gate ( $C_{gate\_par}$ ) and drain ( $C_{drain\_par}$ ) are the total extracted parasitic values (C+CC). As shown in table 6-2, the cell areas of oscillator 2 and oscillator 3 are 33.7% and 45% smaller, respectively, than that of oscillator 1 (Wf = 220nm). Consequently, the extracted parasitic capacitances at both the gate and drain nodes of oscillator 1 are higher than those of oscillators 2 and 3. The gate capacitance is especially high, since longer low-layer metal tracks are used to connect to both sides of the gate. Relative to oscillator 1, the gate capacitance of oscillators 2 and 3 is lower by 25.5% and 33.6% respectively. This result is consistent with the earlier assumption that transistors with a small finger width require a large layout area, which leads to more parasitic capacitance.
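The percentages quoted above can be reproduced from table 6-2 if each difference is taken relative to oscillator 1. The following is a minimal check of that reading; the helper name is hypothetical and the values are copied from the table.

```python
# Values copied from table 6-2 (cell area in um^2, gate parasitic in fF).
cell_area = {1: 41.69, 2: 27.65, 3: 23.0}
c_gate = {1: 0.011, 2: 0.0082, 3: 0.0073}


def reduction_vs_osc1(values, osc):
    """Percentage by which oscillator `osc` is smaller than oscillator 1."""
    return 100.0 * (values[1] - values[osc]) / values[1]


area_2 = reduction_vs_osc1(cell_area, 2)
area_3 = reduction_vs_osc1(cell_area, 3)
gate_2 = reduction_vs_osc1(c_gate, 2)
gate_3 = reduction_vs_osc1(c_gate, 3)

print(f"cell area: osc 2 {area_2:.1f}% smaller, osc 3 {area_3:.1f}% smaller")
print(f"gate cap:  osc 2 {gate_2:.1f}% smaller, osc 3 {gate_3:.1f}% smaller")
```

With oscillator 1 as the reference, the results round to the 33.7%, roughly 45%, 25.5% and 33.6% figures used in the text.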

#### **6.2.2.** Experimental Results

To investigate the optimal finger width for high-speed analogue applications, three performance aspects were taken into consideration: the oscillation frequency, power efficiency and phase noise (PN). The measurements were conducted at room temperature, and the performance of the oscillator examples was monitored while varying the supply voltage from 0.4V to 1V. The three oscillator examples were measured independently. The results presented in this section were measured from 6 different silicon samples, three from wafer 1 and three from wafer 2.

Fig. 6-6 shows the oscillation frequency of the three oscillator examples against the power supply voltage. The data presented in this figure are the average of the results from 6 different samples. It can clearly be seen that oscillator 3 provides the highest frequency, while oscillator 1 has a relatively low frequency output. As noted in section 6.2.1, the reduction in individual finger width results in an increased layout area, which brings about more parasitic capacitance. As a consequence, the oscillation frequency is reduced. In the case of oscillators 2 and 3, twice as many fingers were used in oscillator 2 as in oscillator 3, which caused the layout area to be approximately 17% larger; however, the oscillation frequencies of the two oscillators remain very close ( $\approx$  6% difference). When comparing oscillators 1 and 2, where the number of fingers was also doubled, there was a larger (33.7%) increase in the layout area and a consequent 10.6% drop in oscillation frequency. Therefore, a larger finger width appears to provide better high-frequency performance while saving cost through reduced chip area.



Fig. 6 - 6 Averaged Measurement Results of Oscillation Frequency of Three Oscillator Examples

Power efficiency is another important aspect for designers of high-speed analogue applications. In this experiment, since the overall transistor dimensions of the three oscillator examples are the same, the current consumed by each oscillator should in theory also be the same. However, once the parasitic effects introduced by the layout are considered, both the oscillation frequency and the power consumption can deviate considerably from the theoretically predicted values. Therefore, instead of comparing just the power consumption at different power supply voltages, the power consumption is normalized to the oscillation frequency to give the energy consumed within each oscillation cycle, using (6.1).

$$E_{cycle} = \frac{P_{DC}}{f_{osc}} \tag{6.1}$$

where  $P_{DC}$  is the power consumption and  $f_{osc}$  is the corresponding oscillation frequency.
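As a rough illustration of (6.1), the sketch below evaluates the energy per cycle at a single operating point, using the averaged power and frequency figures reported later in table 6-3. The function name is hypothetical, and because fig. 6-7 covers the whole supply range, these single-point values need not match the percentages quoted for that figure.

```python
def energy_per_cycle_pj(power_mw, f_osc_ghz):
    """E_cycle = P_DC / f_osc, returned in picojoules."""
    return (power_mw * 1e-3) / (f_osc_ghz * 1e9) * 1e12


# (power in mW, frequency in GHz), copied from table 6-3.
operating_points = {
    "Oscillator 1": (2.9, 12.3),
    "Oscillator 2": (2.92, 13.6),
    "Oscillator 3": (2.88, 14.42),
}

energy = {name: energy_per_cycle_pj(p, f)
          for name, (p, f) in operating_points.items()}

for name, e in energy.items():
    print(f"{name}: {e:.3f} pJ per cycle")
```

At this operating point the ordering agrees with the trend described for fig. 6-7: the widest-finger oscillator consumes the least energy per cycle.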



Fig. 6 - 7 Averaged Measurement Results of Power to Frequency Efficiency of Three Oscillator Examples

Fig. 6-7 compares the power-to-frequency ratio of the three oscillator examples. Oscillator 3 consumes the least energy within each oscillation cycle over the whole power supply range. The energy consumed by oscillator 2 is very close to that of oscillator 3. For oscillator 1, a considerable difference is observed: the energy consumed within each oscillation cycle is about 8.5% and 14.92% higher than for oscillators 2 and 3 respectively. Therefore, in terms of the power-to-frequency ratio, a larger finger width provides better power efficiency.

Another important performance metric of an oscillator is the phase noise (PN). To investigate the effect of finger width on the oscillator's noise performance, the PN is measured at the highest oscillation frequency of each oscillator. Fig. 6-8 shows the measured phase noise of the three oscillators at 10MHz offset frequency together with the corresponding frequency spectra. Although oscillator 3 has the highest frequency, its RF noise is worse than that of the other two oscillators. Equation (6.2) [136] gives the spectral density of the thermal noise in a single transistor, where  $R_G$  is the total distributed gate resistance and  $N_{finger}$  is the number of transistor fingers. The smaller the transistor finger width, the more fingers are required; the distributed gate resistance is therefore reduced, which contributes less noise.

$$\overline{V_{n,out}^2} = 4kT \frac{R_G}{N_{finger}} (g_m r_o)^2 \tag{6.2}$$



Fig. 6 - 8 Measured Frequency Spectrum and Phase Noise Results at 10MHz Offset Frequency of Three Oscillator Examples

The averaged phase noise results of the 6 samples are summarised and plotted in fig. 6-9. As each oscillator example has advantages in different aspects, a figure of merit (FOM) is used to weigh up the different finger widths; its averaged value is calculated using (6.3) [137].

$$FOM_{avg} = \frac{1}{6}\sum_{i=1}^{6} \left( -PN_i \left( f_{offset} \right) + 20 \log \frac{f_{osc\_i}}{f_{offset}} - 10 \log \frac{power\_i}{1mW} \right) \tag{6.3}$$



Fig. 6 - 9 Averaged Phase Noise Performance of Three Oscillator Examples

The PN results summarized in table 6-3 are the averages over the 6 test samples.

Table 6 - 3 Averaged FOM of Three Oscillator Design Examples

| Design Examples   | Oscillator 1 | Oscillator 2 | Oscillator 3 |
|-------------------|--------------|--------------|--------------|
| Frequency (GHz)   | 12.3         | 13.6         | 14.42        |
| Area (mm²)        | 0.0007       | 0.00042      | 0.0003       |
| Power (mW)        | 2.9          | 2.92         | 2.88         |
| PN dBc/Hz @ 10MHz | -93.47       | -92.75       | -87.38       |
| Averaged FOM      | 140.7        | 142.0        | 138.4        |

Although oscillator 2 does not achieve the best performance in any individual criterion (oscillation frequency, power efficiency or phase noise), its overall performance (FOM = 142) is better than that of oscillators 1 and 3. Another trade-off that should be taken into consideration is area. A larger finger width readily provides a higher frequency in a smaller area; however, the choice should be made in conjunction with the overall performance. Advanced analogue circuit design requires a careful trade-off among bandwidth, power efficiency, noise and layout area. The design methodology presented here is applied to the next design case.

## 6.3. Design Case II: Inductive Peaking VCO with Cascode Noise Reduction

Previously, in chapter 3, a ring-based VCO with an inductive peaking technique was proposed. It was designed to cover a frequency bandwidth of up to 25GHz in a 65nm technology process. However, when this structure is applied to a more advanced process node (28nm), it suffers from the severe adverse effects of transistor noise, especially in high-frequency applications. The noise performance of the VCO is fundamental to the in-band noise performance of a PLL. The phase noise induced by flicker noise is given by (6.4) [28].

$$\mathcal{L}(f) = \frac{K_f}{NC_{ox}V_{eff}^2} \left( \frac{1}{W_{n-eff}L_{n-eff}} + \frac{1}{W_{p-eff}L_{p-eff}} \right) \frac{f_{osc}^2}{f^3} \tag{6.4}$$

This equation indicates that the phase noise increases with the oscillation frequency  $(f_{osc})$ . Therefore, the aim of this section is to provide an alternative approach (by biasing the transistor into the deep triode region) for reducing noise contribution in a 28nm CMOS process for high-speed analogue design.
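To make the frequency dependence in (6.4) explicit, the expression can be written in decibels. The grouping of the bias- and geometry-dependent terms into a single constant $A$ is introduced here purely for illustration.

$$10\log_{10}\mathcal{L}(f) = 10\log_{10}A + 20\log_{10}f_{osc} - 30\log_{10}f, \qquad A = \frac{K_f}{NC_{ox}V_{eff}^2} \left( \frac{1}{W_{n-eff}L_{n-eff}} + \frac{1}{W_{p-eff}L_{p-eff}} \right)$$

With $A$ held constant, doubling $f_{osc}$ at a fixed offset raises the flicker-induced phase noise by $20\log_{10}2 \approx 6$dB, while each decade increase in the offset frequency $f$ lowers it by 30dB.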

#### 6.3.1. Inductive Peaking VCO with Cascode Noise Reduction

#### A. Noise Analysis of Single MOSFET in Different Operation Region

Before analysing the proposed noise reduction VCO, it is necessary to evaluate a single MOSFET's noise behaviour in different operation regions. The noise behaviour of common CMOS devices is dominated primarily by two noise sources which are thermal noise and flicker noise. The drain current power spectral density (PSD) of a MOSFET's thermal noise is given in (6.5)[138].

$$\overline{i_d^2} = 4kT\gamma g_m \tag{6.5}$$

where $k$ is Boltzmann's constant, $T$ is the absolute temperature and  $g_m$  is the conductance in the given operation region.  $\gamma$  is a complex function of the basic transistor parameters and bias conditions. In the saturation region,  $\gamma = 2/3$, while  $2/3 < \gamma < 1$  in triode [138]. Thus, (6.5) can be rewritten for the specific region of operation as (6.6) for saturation and (6.7) for triode.


$$\overline{i_d^2} = \frac{8}{3}kTg_m \tag{6.6}$$

$$\overline{i_d^2} = 4kTg_{do} \tag{6.7}$$

where  $g_{do}$  is the channel conductance at zero drain-source voltage ( $V_{DS}$ ). The channel of the MOSFET can be treated as a homogeneous resistor when  $V_{DS}$  is 0, which results in a small  $g_{do}$ . Therefore, compared to the saturation region, the MOSFET suffers less thermal noise effects in the triode region when the other parameters are the same.

Several theoretical and physical models are combined to explain the flicker noise in a MOSFET. According to the theoretical analysis in [129, 139], it is accepted that flicker noise is mainly caused by fluctuations in mobility and carrier density, which are modelled by the Hooge empirical relation and the McWhorter number fluctuation theory respectively. However, owing to their complexity, these flicker noise models are impractical for design space exploration [129]. In analogue circuit design, the most popular flicker noise equation used for hand calculation is the empirical model shown in (6.8) [129].

$$\overline{v_f^2} = \frac{K_f}{C_{ox}^2 WL} \frac{1}{f} \tag{6.8}$$

where  $C_{ox}$  is the oxide capacitance per unit area,  $K_f$  is the flicker noise coefficient. According to (6.8), it is straightforward to find that the common factor for decreasing the flicker noise is to increase the MOSFET's dimensions (WL).



Fig. 6 - 10 Comparison of Width versus Flicker Noise in Triode and Saturation Region

Fig. 6-10 compares the flicker noise against MOSFET width for the same drain-source current ( $I_{ds}$ ) in both the saturation and triode regions. All of the following simulation results are based on the HKMG 28nm technology process, with the transistor length set to the minimum size (L=30nm). To bias the MOSFET into the saturation region,  $V_{GS}$  (gate-source voltage) and  $V_{DS}$  (drain-source voltage) are both set to the power supply voltage (VDD), whereas  $V_{DS}$  is set to one tenth of VDD to bias the MOSFET into the deep triode region. The MOSFET in the triode region must be wider than that in the saturation region to provide enough  $g_m$  to obtain the same  $I_{ds}$, which results in an improvement in flicker noise of more than three orders of magnitude.

The flicker noise can also be presented as a noise-current source connected between drain and source instead of a noise-voltage source in series with the gate. In this case, the PSD of the noise-current source is given by (6.9)

$$\overline{i_f^2} = g_m^2 \overline{v_f^2} \tag{6.9}$$

Substituting (6.8) into (6.9) gives (6.10)

$$\overline{i_f^2} = \frac{K_F I_{ds}}{C_{ox} L^2} \frac{1}{f} \tag{6.10}$$

where  $I_{ds}$  increases with  $V_{DS}$. Therefore, the drain current noise increases as  $V_{DS}$  increases.
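The intermediate step between (6.8) and (6.10) can be sketched as follows, assuming the long-channel square-law expression for the transconductance in saturation, $g_m^2 = 2\mu C_{ox}\frac{W}{L}I_{ds}$, where $\mu$ is the carrier mobility (a symbol not used elsewhere in this section):

$$\overline{i_f^2} = g_m^2\,\overline{v_f^2} = 2\mu C_{ox}\frac{W}{L}I_{ds} \cdot \frac{K_f}{C_{ox}^2 WL}\frac{1}{f} = \frac{2\mu K_f\, I_{ds}}{C_{ox}L^2}\frac{1}{f}$$

Under this assumption the width $W$ cancels, and the coefficient $K_F$ in (6.10) corresponds to $2\mu K_f$. For a 28nm device the square-law model is only a first-order approximation, so this should be read as a hand-calculation guide rather than an exact result.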



Fig. 6 - 11 Noise Floor Scaling with VDS

Fig. 6-11 shows the noise floor as a function of  $V_{DS}$. As  $V_{DS}$  is scaled down from 900mV to 100mV, the MOSFET is gradually biased into the triode region, and the overall noise floor is reduced by over 63dB in the low frequency range (below the corner frequency) and by 45dB in the higher frequency range (above the corner frequency). The noise floor in fig. 6-11 combines both thermal noise and flicker noise and covers the frequency range from 100Hz to 10GHz. The results indicate that, for a single MOS transistor, the overall noise contribution in the triode region is much lower than that in the saturation region, which suggests that the approach can be applied in VCO design to improve noise performance.

#### B. Inductive Peaking VCO with Cascode Topology

Fig. 6-12(a) shows a common source delay stage with inductor peaking. Because the VCO stages are connected end to end, the DC voltage at node B is the same as that at node A, which makes the  $V_{GS}$  and  $V_{DS}$  of  $M_1$  equal. Therefore,  $M_1$  always operates in the saturation region across the whole frequency tuning range.



Fig. 6 - 12 Delay Cell of Inductive Peaking VCO (a) Conventional (b) Proposed with Cascode Noise Reduction

The proposed common source delay stage with cascode noise reduction is presented in fig. 6-12(b). A transistor  $M_2$, biased by  $V_{bias}$, is placed between  $M_1$  and the load.  $V_A$  is still the same as  $V_B$, but the drain-source voltage of  $M_1$  ( $V_C$ ) is now equal to  $V_B - V_{DS,M2}$, which makes  $M_1$  operate in the triode region. The equivalent resistance of  $M_3$  increases as  $V_{CT}$  rises, which makes  $V_B$  decrease and drives  $M_1$  further into the deep triode region to obtain better noise performance. To demonstrate this concept, the DC operating points at nodes *A*, *B* and *C* are illustrated in fig. 6-13.
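The change of operating region can be stated as a first-order condition. The relations below assume the long-channel boundary between triode and saturation, a grounded source for $M_1$, and $V_{GS1}$ above the threshold voltage $V_{th1}$ (a symbol introduced here for illustration).

$$\text{Conventional cell: } V_{DS1} = V_{GS1} \;\Rightarrow\; V_{DS1} > V_{GS1} - V_{th1} \quad (\text{saturation})$$

$$\text{Cascode cell: } V_{GS1} = V_A = V_B,\quad V_{DS1} = V_C = V_B - V_{DS,M2} \;\Rightarrow\; V_{DS1} < V_{GS1} - V_{th1} \iff V_{DS,M2} > V_{th1} \quad (\text{triode})$$

In this simplified view, $M_1$ enters the triode region once the voltage dropped across $M_2$ exceeds the threshold voltage of $M_1$.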



Fig. 6 - 13 Investigation of DC Voltage Level at node A, node B and node C (a) DC Voltage Level scaling with Control Voltage (b) Operation Region scaling with Control Voltage

As can be seen from fig. 6-13(a), the DC voltage levels at node A and node B are the same, while the DC voltage at node C is lowered by the inserted transistor  $M_2$, as expected from the theoretical analysis. Therefore,  $M_1$  no longer operates in the saturation region because of the decreased DC voltage level at node C. The operation region of  $M_1$  is illustrated in fig. 6-13(b). According to the region definition in the spectreRF simulator, region 1 represents the triode region while region 3 represents the sub-threshold region. It can be seen that  $M_1$  operates in the triode region, rather than the saturation region, for most control voltages. The noise contribution of  $M_1$  is thereby suppressed by biasing the transistor into the triode region. However, it should also be noted that, as the control voltage continues to rise towards the power supply, the  $V_{GS}$  of  $M_1$  gradually falls below its threshold voltage, which pushes  $M_1$  into the sub-threshold region. Since the VCO sustains oscillation through leakage current in the sub-threshold region, the noise floor of the VCO is kept at a low level owing to the small value of the leakage current. However, because the leakage current is too small to support high-frequency oscillation, sub-threshold operation only occurs in a relatively low-frequency range, where the control voltage approaches the power supply.



Fig. 6 - 14 Proposed Three Stages Inductive Peaking VCO with Cascode Noise Reduction

Fig. 6-14 is a demonstration of a three-stage inductive peaking VCO with cascode noise reduction. Neglecting the body effect of  $M_2$ , the transfer function can be obtained (6.11)

$$H(s) = \frac{V_{out}}{V_{in}} = -g_{m1}R_{out}; \quad R_{out} = R_L \parallel R_{cas} \tag{6.11}$$

where $g_{m1}$ is the transconductance of $M_1$, $R_L$ is the combined load impedance and $R_{cas}$ is the resistance of the cascode structure. $R_{cas}$ and $R_L$ are given by (6.12) [136] and (6.13) respectively.

$$R_{cas} = g_{m2}r_{DS1}r_{DS2} + r_{DS1} + r_{DS2} \approx g_{m2}r_{DS1}r_{DS2} \tag{6.12}$$

where $r_{DS1}$ and $r_{DS2}$ are the drain-source resistances of $M_1$ and $M_2$, respectively.

$$R_L = \frac{R_{eff} + sL}{s^2 LC + sR_{eff}C + 1} \tag{6.13}$$

Here $C$ denotes the total capacitance at the $V_{out}$ node and $R_{eff}$ is the variable resistance of the load. Substituting (6.12) and (6.13) into (6.11) gives $R_{out}$ as (6.14).

$$R_{out} = \frac{sLR_{cas} + R_{eff}R_{cas}}{s^2LCR_{cas} + s(CR_{eff}R_{cas} + L) + R_{eff} + R_{cas}} \tag{6.14}$$

By reorganizing (6.14), the natural frequency  $(w_n)$  and damping factor  $(\zeta)$  for a 2<sup>nd</sup> order system are obtained as defined in (6.15).

$$w_n = \sqrt{\frac{1}{LC} \left( 1 + \frac{R_{eff}}{R_{cas}} \right)}; \quad \zeta = \frac{R_{eff}}{2} \sqrt{\frac{C}{L} \frac{R_{cas}}{R_{cas} + R_{eff}}} \tag{6.15}$$
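As a numerical cross-check of (6.15), the following minimal sketch reads the natural frequency and damping factor directly from the denominator coefficients of (6.14) and compares them with the closed-form expressions. The inductance is taken from table 6-4; the values of `C_OUT`, `R_EFF` and `R_CAS` are hypothetical and chosen only for illustration. Under these assumptions, the closed-form $\zeta$ of (6.15) appears to correspond to neglecting the $L$ term in the $s$ coefficient of (6.14), which is reasonable when $L \ll C R_{eff} R_{cas}$.

```python
import math

def second_order_params(l, c, r_eff, r_cas):
    """Return (w_n, zeta_exact, zeta_closed_form) for the denominator of (6.14)."""
    # Denominator of (6.14) written as a2*s^2 + a1*s + a0
    a2 = l * c * r_cas
    a1 = c * r_eff * r_cas + l
    a0 = r_eff + r_cas
    w_n = math.sqrt(a0 / a2)                        # natural frequency, rad/s
    zeta_exact = a1 / (2.0 * math.sqrt(a0 * a2))    # from the standard 2nd-order form
    # Closed-form damping factor as written in (6.15)
    zeta_closed_form = (r_eff / 2.0) * math.sqrt((c / l) * r_cas / (r_cas + r_eff))
    return w_n, zeta_exact, zeta_closed_form

L_IND = 283.7e-12   # inductance from table 6-4, H
C_OUT = 50e-15      # hypothetical capacitance at the output node, F
R_EFF = 100.0       # hypothetical variable load resistance, ohm
R_CAS = 5e3         # hypothetical cascode resistance, ohm

w_n, zeta_exact, zeta_approx = second_order_params(L_IND, C_OUT, R_EFF, R_CAS)

if __name__ == "__main__":
    print(f"f_n = {w_n / (2.0 * math.pi) / 1e9:.2f} GHz")
    print(f"zeta (exact) = {zeta_exact:.4f}, zeta (6.15) = {zeta_approx:.4f}")
```

With the hypothetical values above the two damping factors differ by roughly one percent, which suggests the simplification is harmless whenever the cascode resistance is large.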

A comparative simulation was conducted between the conventional inductor peaking VCO presented in [137] and the proposed VCO to evaluate the noise reduction performance. All the transistors in both structures were set to the same dimensions. The phase noise at 1MHz frequency offset is presented in fig. 6-15. The VCO with cascode noise reduction has improved phase noise over the full range of the control voltage, and the largest difference (13dB) occurs when the VCO operates at its highest frequency ( $V_{CT}$=0).



Fig. 6 - 15 Phase Noise Comparison between Conventional Inductive Peaking VCO and Proposed Cascode Noise Reduction VCO during All Frequency Tuning Range

#### **6.3.2.** Design Example

The design example shown in fig. 6-16(a) is a fully differential inductor peaking VCO with cascode noise reduction, realized in the HKMG 28nm CMOS process node. Owing to the benefits outlined in the previous section, a finger width of 440nm is applied to the key transistors in order to obtain balanced overall performance. By implementing a two-stage ring structure, this example provides a quadrature output, which is useful for applications that require multiple clock phases. A differential $50\Omega$ output buffer follows the VCO to provide better impedance matching. Fig. 6-16(b) shows the delay cell used within the proposed VCO, with the key parameters listed in table 6-4. The transistors in each delay cell are of the low threshold voltage (LVT) type to allow fast switching and a higher transition frequency.



Fig. 6 - 16 Design Example (a) Top Structure of Differential Quadrature Inductive Peaking VCO with Cascode Noise Reduction and its Output Buffer (b) Differential Delay Cell of Proposed VCO with Symmetric Inductor (c) Layout View of Fully Customized Symmetrical Inductor

Table 6 - 4 Key Parameters of Design Example

| Parameter                                            | Value     |
|------------------------------------------------------|-----------|
| VDD (V)                                              | 1         |
| Finger Width $(W_f)$ (nm)                            | 440       |
| M1 (W/L) (μm)                                        | 17.6/0.03 |
| M2 (W/L) (μm)                                        | 26.4/0.03 |
| M3 (W/L) (μm)                                        | 54.6/0.03 |
| Inductance (pH)                                      | ≈283.7    |
| Peak Q@GHz                                           | 13.3@36   |

To improve immunity to environmental noise, a symmetrical differential inductor is used instead of single-ended inductors. For practical reasons, two single-ended inductors need to be placed a certain distance apart to avoid mutual interference, which leads to a larger overall area and potentially increases parasitic effects on the signal traces. In contrast, a differential inductor saves considerable space. The layout view of the fully customized symmetrical inductor is displayed in fig. 6-16(c); the overall area of the inductor is 2263 µm<sup>2</sup>.



Fig. 6 - 17 Layout View of Proposed VCO

The layout view of the proposed VCO is shown in fig. 6-17. The VCO core occupies only 0.013mm<sup>2</sup>, while most of the remaining chip area is taken up by the output buffer.

#### 6.3.3. Simulation Results

To model the practical testing environment more accurately, a simulation test bench was produced as shown in fig. 6-18. The power for the core VCO and the output buffer is provided separately by two DC sources, so the power consumption of each block can be monitored directly by measuring the current in each source. Additionally, the dimension of each DC pad is 60 µm × 70 µm, which allows only one bonding wire to be attached per pad. Therefore, multiple DC pads are used in the layout design to connect the power signal. This is also reflected in the simulation test bench by using multiple parallel inductors, which reduce the overall effect of the bonding wires. As the inductance of a bonding wire depends on its length, the simulation assumes that each DC bonding wire has the same effective inductance of 200pH.
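The benefit of using several DC pads can be estimated with a short sketch. It assumes identical, magnetically uncoupled bonding wires (mutual inductance is ignored, which is an idealisation), and the wire counts are hypothetical because the number of pads per supply is not stated here. The reactance is evaluated at the highest simulated oscillation frequency purely as an illustrative choice.

```python
import math

L_WIRE = 200e-12   # effective inductance per DC bonding wire (from the text), H
F_EVAL = 39.03e9   # highest simulated oscillation frequency, Hz

def parallel_inductance(l_each, count):
    """Equivalent inductance of `count` identical, uncoupled inductors in parallel."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return l_each / count

def reactance(inductance, frequency):
    """Magnitude of the inductive reactance 2*pi*f*L in ohms."""
    return 2.0 * math.pi * frequency * inductance

if __name__ == "__main__":
    for count in (1, 2, 4):   # hypothetical numbers of parallel supply wires
        l_eq = parallel_inductance(L_WIRE, count)
        x_eq = reactance(l_eq, F_EVAL)
        print(f"{count} wire(s): L = {l_eq * 1e12:.0f} pH, |X| = {x_eq:.1f} ohm")
```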



Fig. 6 - 18 Simulation Testbench

The frequency tuning response was obtained using post-layout simulations and includes all parasitic effects (R, C, and L); this is shown in fig. 6-19(a).



Fig. 6 - 19 Post-layout Simulation Results (a) Frequency Tuning Range (b) Phase Noise

The maximum frequency reached was 39.03GHz with a power consumption of 24.55mW (excluding the output driver). The overall frequency tuning range covers more than four octaves, which meets the data rate requirements of modern high-speed transceiver systems. The phase noise at the highest frequency is plotted in fig. 6-19(b); values of -106dBc/Hz at 10MHz offset and -86dBc/Hz at 1MHz offset were obtained. In addition, the phase noise at different control voltages was investigated and is displayed in fig. 6-20. Although the phase noise is slightly worse in the middle of the frequency band, which is caused by a smaller Q factor, the overall performance across the full control voltage range is better than -80dBc/Hz at 1MHz offset and -102dBc/Hz at 10MHz offset.



Fig. 6 - 20 Phase Noise Performance Scaling with Different Control Voltages

To evaluate the performance of the proposed VCO, the figures of merit (*FOMs*) are presented in table 6-5. This work achieves a *FOM* of 189.2, the highest among the recent works compared.

Table 6 - 5 Performance Summary and Comparisons

| References                                       | [64]            | [48]              | [140]             | This Work       |
|--------------------------------------------------|-----------------|-------------------|-------------------|-----------------|
| Process                                          | 65nm            | 45nm              | 28nm              | 28nm            |
| Tuning Range (GHz)                               | 2-8             | 1-8.5             | 0.6-3             | 1.7-39          |
| FTR                                              | 120%            | 157%              | 133%              | 184%            |
| P<sub>DC</sub> (mW)                              | 6.4             | 8                 | 7.3               | 24.55           |
| PN (dBc/Hz) @ $f_{off}$ (Hz) / $f_{osc}$ (GHz)   | -101 @ 1M / 4.2 | -114.9 @ 10M / 2.5 | -126.5 @ 1M / 1.5 | -106 @ 10M / 39 |
| FOM                                              | 187             | 177.7             | 183.4             | 189.2           |

The *FOM* and *FTR* are calculated using equations (3.9) and (3.10) given in chapter 3.
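Equations (3.9) and (3.10) are not reproduced in this chapter, so the sketch below assumes the commonly used tuning-range-inclusive figure of merit and a tuning range expressed relative to the centre frequency. These assumed forms are not confirmed as the exact definitions of chapter 3, but they reproduce the table 6-5 entries for this work and for [64] to within rounding. The function names are illustrative.

```python
import math

def ftr_percent(f_min, f_max):
    """Tuning range relative to the centre frequency in percent (assumed form of (3.10))."""
    return 200.0 * (f_max - f_min) / (f_max + f_min)

def fom_with_tuning(pn_dbc_hz, f_osc, f_off, p_dc_mw, ftr_pct):
    """Tuning-range-inclusive FOM in dB (assumed form of (3.9))."""
    return (-pn_dbc_hz
            + 20.0 * math.log10(f_osc / f_off)      # normalise the offset to the carrier
            + 20.0 * math.log10(ftr_pct / 10.0)     # reward a wide tuning range
            - 10.0 * math.log10(p_dc_mw))           # penalise power, referenced to 1 mW

# Inputs taken from table 6-5
fom_this_work = fom_with_tuning(-106.0, 39e9, 10e6, 24.55, 184.0)
fom_ref64 = fom_with_tuning(-101.0, 4.2e9, 1e6, 6.4, 120.0)

if __name__ == "__main__":
    print(f"FTR of a 2-8 GHz range: {ftr_percent(2.0, 8.0):.0f}%")
    print(f"FOM, this work: {fom_this_work:.1f}")
    print(f"FOM, [64]:      {fom_ref64:.1f}")
```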

# 6.4. Summary

In this chapter, two specific design cases are presented to explore a design methodology for high-speed analogue applications in the 28nm HKMG CMOS technology process.

In the first design case, the effect of optimizing transistor finger width for high-speed performance in analogue circuits is studied. It reveals that, although the overall dimensions of the transistors are the same, the performance of high-speed circuits can vary when different finger widths are applied. The selection of finger width for high-speed design is a very significant step during the layout stage and presents a trade-off between bandwidth, power efficiency, noise and layout area, particularly for modern deep sub-micron technology nodes such as 28nm. The theory, demonstrated by the three oscillator design examples, provides an elegant solution that simplifies finger width optimisation.

In the second design case, the optimized finger width from the first design case is applied to a more sophisticated circuit for high-speed design. The inductor peaking VCO structure shown in chapter 3 is used as a test circuit. Furthermore, to compensate for the process-induced degradation of the noise floor in the 28nm HKMG process, a cascode structure is applied within the inductor peaking VCO to operate the switching transistor in the deep triode region, where the transistor has better noise performance than in the usual approach of operating transistors in the saturation region. The analysis indicates that the proposed cascode structure can achieve better phase noise while the ultra-wide tuning range of the inductor peaking VCO is maintained.

One significant conclusion is that the development of the technology process has indeed improved the performance of circuits with the same topology in terms of key design parameters including frequency, power consumption and area. To maximize those benefits in more advanced process nodes, especially in high-speed analogue design, a more careful design approach is required and has been presented in this chapter.

# **Chapter 7 Conclusions and Future**

#### 7.1 Conclusions

With the increasing demands of data-intensive applications and the approach of the era of tera-scale computing, silicon photonics has attracted worldwide attention and developed rapidly over the past few decades. At the same time, the emergence of this new technology drives improvements in specific functionalities and performance metrics. The clock generation system, as a critical part of future silicon photonics based applications, has become the core subject of this PhD research work.

The whole thesis is developed around providing a broadband clock generation solution for use in short-reach silicon photonics communication systems. In order to fulfil the requirements of high speed and multiple protocols, the research has various targets including high oscillation frequency, wide tuning range, fast locking and multiple phase clock outputs. During the research work, a novel RO-VCO integrated with an inductor peaking technique was proposed in order to extend the frequency tuning range and bandwidth of such systems. The principle of the idea has been verified through four design examples in two different technology processes. Moreover, the proposed VCO is embedded in a new PLL topology to achieve the targets for clock generation. Finally, design methodologies for high-speed analogue applications have been developed in order to adapt to the evolution of the CMOS process. Two specific design cases are implemented in the 28nm HKMG process to fully utilize the advantages of the new CMOS process and mitigate the corresponding side-effects.

In addition to the structures proposed in this thesis for generating a clock signal, a more important conclusion concerns the design methodology for high-speed analogue applications, especially when the frequency reaches the millimetre-wave range. Firstly, the use of inductors becomes a necessary technique for extending the system bandwidth and has been widely applied in many high-speed applications. The speed of a circuit is limited by parasitic capacitance, which becomes the bandwidth bottleneck at high frequencies. By using inductors, the bandwidth degradation caused by this capacitance is compensated. Moreover, the type of inductor is another trade-off. For example, the active inductor has the smallest area but suffers from transistor noise and limited bandwidth; the thick metal inductor has the highest Q factor but occupies a larger chip area; the stacked layer inductor sacrifices part of the Q factor to reduce the area cost. These aspects should be carefully considered by the designer when using an inductor to pursue high-speed performance.

Another aspect that benefits high-speed analogue design is the development of the technology process. For example, the increasing $f_T$ of the transistor allows faster switching, which is the primary reason why circuits can operate at higher frequencies in more advanced processes. In addition, as the transistor size scales down, the gate capacitance decreases, which allows a smaller inductance to be used. Applying inductor techniques therefore becomes more efficient in terms of bandwidth extension and area in sub-micron process nodes. However, challenges also arise with the development of the technology. The supply voltage decreases as the process node scales. For high-speed designs, a lower supply voltage severely limits the voltage headroom available for each transistor, and the situation becomes worse with more complex circuit structures. Therefore, simpler and more elegant architectures are required to maximize the advantages of more advanced technology processes.

Lastly, the layout design methodology plays a significant role in guaranteeing the performance of high-speed implementations. As the achievable frequency increases, the influence of parasitic capacitance becomes more severe because the equivalent impedance between the metal layer and the substrate is smaller. This significantly degrades the speed of the circuit. Therefore, the top metal layer should be used for routing high-frequency signals in order to reduce the coupling capacitance to the substrate. Furthermore, in a high-speed application, the signal wiring is no longer simply a metal track for transmitting data or connecting ports; it must be treated as a transmission line to avoid signal reflection and crosstalk.

The contributions of this thesis are listed as follows in more detail.

- 1) The proposed inductive peaking VCO presents an idea to combine the advantages of two different types of classic VCO to acquire both a wide tuning range and a high achievable frequency. From the perspective of performance, no other research work has yet presented a VCO structure (ring-based or LC-based) that is capable of covering over 25GHz of frequency tuning range with moderate phase noise performance. To the author's knowledge, the experimental results demonstrate the best figure of merit in the field and the best combination of frequency and tuning range. Moreover, the novel VCO structure proposed in the thesis makes ultra-wide frequency tuning possible and provides more options for multiple clock outputs from a PLL.
- 2) More importantly, practical implementation demonstrates that stacked inductors show significant advantages in producing a compact design area and decreasing parasitic capacitance compared with thick metal inductors. This advantage becomes more obvious in more advanced processes, which opens up the idea of applying inductors to other types of circuit for high-speed performance.
- 3) Theoretical analysis and discussions have been conducted on embedding the proposed ultra-wide tuning range VCO with various popular PLL structures. Unfortunately, none of those structures has the capability to deal with the high sensitivity of the wide band VCO and provide stable controllability. By contrast, the proposed DLTC-PLL in this thesis not only successfully resolves the above issues but also fulfils the demands of a broadband clock generation system in terms of high oscillation frequency, wide tuning range and nanosecond level locking time. Compared with other recent work, the proposed PLL structure provides an elegant solution for multiple protocol compatibility in future silicon photonics communication systems.
- 4) Another important contribution of this PLL structure is that it provides an alternative solution for generating broadband clock signals and exploits the use of a ring-based VCO in high speed applications.

Regarding the important role of layout in practical circuit implementation, the thesis presents a methodology for designing high-speed analogue circuits from a new perspective. Although different references in the literature have investigated the same concept, they all take the point of view of a single transistor, while this thesis supplements the concept and brings the discussion into the more complex situation of a real, high-speed analogue circuit. By investigating the relationship between different finger widths and the primary performance of analogue circuits, the thesis provides guidance on choosing the finger width at the layout stage and gives a direction for further exploration from the finger width point of view.

### 7.2 Future Work

Several potential research areas arising from this PhD work can be pursued in the future. Briefly, further research covers the following areas:

The first aspect is the improvement of the integration and measurement approach. The integration technique used in this thesis is based on wire bonding for all the practical design cases. However, the parasitic inductance of the bonding wire greatly impacts the performance of the circuit designs, especially when the operation frequency reaches the millimetre wave regime.



Fig. 7-1 Illustrations of Integration (a) Wire Bonding (b) Flip-chip Bonding

Fig. 7-1(a) illustrates the wire bonding integration of a PCB with a silicon die. The electrical silicon die is usually 300 μm to 350 μm thick. For a short bonding wire of approximately 800 μm to 1mm in length, the equivalent parasitic inductance is roughly 1nH per millimetre of bonding wire according to [141]. Moreover, different positions of the PCB with respect to the silicon die lead to different bonding wire lengths. Therefore, another integration technique needs to be applied in high-speed analogue design. An attractive solution is flip-chip bonding, as shown in fig. 7-1(b). The bonding wire is replaced by micro-balls with a diameter of about 55 μm to 60 μm. The distance that the signal travels between the silicon die and the PCB is thereby greatly reduced and, consequently, so are the parasitic effects. However, the challenge of this future work is that, for the specific design case of a PLL in a 40nm CMOS process, the pad pitch is 100 μm while the minimum track gap available from PCB fabrication is 75 μm, which does not leave enough space to flip the 40nm chip onto the PCB directly. Therefore, a new integration approach, together with a corresponding PCB design, is required to achieve compatibility between those standards.



Fig. 7- 2 Potential Circuit Design based on DLTC-PLL

Secondly, further circuit designs can be built upon the proposed DLTC-PLL topology. The idea of detecting both the rising and falling edges of the reference clock has been realized, which could double the loop bandwidth. This not only benefits the phase locking performance but also further suppresses the VCO's in-band noise. However, only a single reference is applied to the proposed PLL. If the design can be operated correctly with interleaved reference clock pulses, the performance and usefulness of the proposed structure can be enhanced. Fig. 7-2 illustrates a possible topology. A quadrature reference clock is combined with a quadrature-phase-output VCO, and the desired frequency can be calibrated every quarter period of the reference clock to further improve the jitter performance of the PLL. However, the challenge is the potential timing mismatch between the two dividing paths. Furthermore, a new PLL structure may be required to correlate the detected phase differences and merge them to control the VCO precisely.

Finally, a clock generation system for other silicon photonics based applications is interesting future work. In this thesis, the clock generation system is mainly focused on broadband use in silicon photonics applications. However, in a pilot research project within an emerging field, unforeseen technical solutions always emerge. One recent research trend is to utilise silicon photonics technology within coherent optical links, which poses the exciting challenge of providing a narrowband clock generation system. One example is a spectral slice synthesizer. Briefly, a super-broadband signal is divided into several spectral slices, which are then parallelized and mapped to the optical domain by the DACs and electro-optical modulators in order to generate a large-bandwidth single-carrier signal [142]. The topology is illustrated in fig. 7-3.



Fig. 7-3 Topology of Spectral Slice Synthesizer

The need is underlined when cost is taken into consideration. The current commercial approach with large-volume discrete devices is relatively expensive and not suitable for spectral slice synthesis, which points to silicon photonics as the solution: the equivalent of the discrete complex photonic system can be realized on a small silicon chip. However, the very first step for a spectral slice synthesizer is to generate multiple optical carriers. A common method used in wavelength division multiplexing (WDM) technology is to apply multiple external cavity lasers (ECL) to provide multiple wavelengths [143]. However, this approach is not only expensive, but the number of discrete lasers also increases the challenge of system integration. For those reasons, a dedicated optical comb generation system is necessary. The functionality of the optical comb generator can be realized with a fixed frequency generator combined with a modulator driver.





Fig. 7-4 (a) Optical Comb Generator (b) Narrowband Clock Generation System for Optical Comb

As seen in fig. 7-4(a), the electrical spectrum is modulated onto the optical carrier generated by a single external laser source. In addition, to avoid jitter distortion on the modulated optical signal, this clock generation system is required to have very low noise. A possible solution is to apply an LC-based PLL followed by a narrowband power amplifier, the topology of which is illustrated in fig. 7-4(b). Moreover, to ensure that the narrowband modulator driver targets the resonant frequency of the LC-PLL, some correlated regulation has to be applied to dynamically control the spectra of both the PLL and the driver into the same frequency range. However, when considering the different driving loads of the LC-PLL and the power amplifier, the potential frequency mismatch issue cannot be ignored, which may require another frequency calibration loop. Further work is therefore required to determine how best to implement such a system.

# **Appendix A. Basic Concepts**

This appendix introduces the well-known Barkhausen theory of oscillation together with the basic concepts of two oscillator structures (the ring-based oscillator and the inductor-capacitor oscillator) that are commonly used in modern voltage-controlled oscillators and phase-locked loops. The concepts of phase noise and reference spurs are presented at the end.

## A.1 General Theory of Oscillation

In an electronic system, an oscillator is the source of a periodic output. Such a circuit has no input yet sustains its output indefinitely. Oscillation is usually an undesirable behaviour in a feedback system. However, when a periodic output is the goal, a "badly designed" feedback amplifier is exactly what is needed.

Fig. A-1 shows the evolution of oscillation in a feedback system over time.



Fig. A - 1 Oscillatory Changing in Feedback System with Time

As illustrated, if the amplifier itself experiences a phase shift large enough to turn the negative feedback into positive feedback, oscillation will occur. More specifically, if the signal experiences a 180° phase shift before returning to the subtractor, the feedback signal adds to the input rather than opposing it, producing a larger difference that continues to grow as it circulates around the loop.

For oscillation to occur, two conditions must be satisfied:

$$|H(jw_0)| \ge 1 \tag{A.1}$$

$$\angle H(jw_0) = 180^{\circ} \tag{A.2}$$

These two conditions are called the "Barkhausen criteria". Eq. (A.1) indicates that, for oscillation, the loop gain must be unity or greater; the resulting geometric series accumulated over many cycles is shown in (A.3).

$$V_x = V_0 + |H(jw_0)|V_0 + |H(jw_0)|^2 V_0 + |H(jw_0)|^3 V_0 + \cdots \tag{A.3}$$

Eq. (A.2) states that the frequency-dependent phase shift should be $180^{\circ}$ (Fig. A-1(a)), or equivalently that the total phase shift around the loop should be $360^{\circ}$ (Fig. A-1(b)), so that the feedback becomes positive. When the Barkhausen criteria are satisfied, the system oscillates at $w_0$.

## A.2 RC and LC based Oscillator

Oscillators can be broadly categorized into two main branches: RC based and LC based.

#### A.2.1 RC based Oscillator

In an RC based oscillator, transistors are used to perform voltage or current amplification. In order to satisfy the Barkhausen criteria, the circuit architecture needs to be properly configured.

Fig. A-2(a) is a single-stage common source amplifier. The open-loop circuit contains only one pole, which can provide a maximum frequency-dependent phase shift of 90°. The common source stage itself has a low-frequency phase shift of 180°, so the overall phase shift is at most 270°. According to the criterion, the frequency-dependent phase shift in a negative feedback loop must reach 180° for oscillation. Therefore, the structure in fig. A-2(a) is unable to sustain oscillation.



Fig. A - 2 Common Source Amplifier in Loop with Different Stages: (a) single stage common source amplifier (b) two-stages (c) three stages

In fig. A-2(b), another amplifier stage is cascaded, so the achievable frequency-dependent phase shift can reach 180° because another pole is added. However, the signal inversion through each common source stage makes the circuit exhibit positive feedback near zero frequency, so the circuit latches up rather than oscillates. Therefore, it also fails to oscillate.

Finally, fig. A-2(c) is a three-stage common source amplifier connected in a closed loop. Three poles can provide a maximum frequency-dependent phase shift of $270^{\circ}$, while the three inverting stages ensure a negative feedback configuration. The transfer function can be written as (A.4).

$$H(s) = -\frac{A_0^3}{\left(1 + \frac{s}{w_0}\right)^3} \tag{A.4}$$

According to the criterion, oscillation occurs only if the frequency-dependent phase shift equals $180^{\circ}$. Since the three stages are identical, each pole contributes $60^{\circ}$ of phase shift.

$$\tan^{-1}\frac{w_{osc}}{w_0} = 60^{\circ} \tag{A.5}$$

Therefore

$$w_{osc} = \sqrt{3}w_0 \tag{A.6}$$

From (A.1) of the Barkhausen criteria, the minimum loop gain is given by:

$$\frac{A_0^3}{\left[\sqrt{1+\left(\frac{w_{osc}}{w_0}\right)^2}\right]^3} \ge 1 \tag{A.7}$$

Substituting (A.6) gives $A_0^3/8 \ge 1$, so the gain ($A_0$) of each stage must be at least 2 for a three-stage RC oscillator.
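The steps in (A.5) to (A.7) can be checked numerically. The sketch below generalises them to N identical single-pole stages, which is an assumption that goes beyond the three-stage case treated here; `W0` is a hypothetical pole frequency and the function names are illustrative.

```python
import math

def ring_oscillation_point(w0, stages=3):
    """Return (w_osc, minimum per-stage gain) for identical single-pole stages."""
    if stages < 3:
        raise ValueError("at least three stages are needed to reach 180 degrees")
    phase_per_stage = math.pi / stages               # each pole supplies 180/N degrees
    w_osc = w0 * math.tan(phase_per_stage)           # (A.5)-(A.6) for N = 3
    a0_min = math.sqrt(1.0 + (w_osc / w0) ** 2)      # |H| >= 1 per stage, as in (A.7)
    return w_osc, a0_min

def loop_gain_magnitude(a0, w, w0, stages=3):
    """Magnitude of the loop gain of (A.4) at angular frequency w."""
    return a0 ** stages / (1.0 + (w / w0) ** 2) ** (stages / 2.0)

W0 = 2.0 * math.pi * 1e9   # hypothetical pole frequency, rad/s
w_osc, a0_min = ring_oscillation_point(W0)

if __name__ == "__main__":
    print(f"w_osc / w0 = {w_osc / W0:.4f}")
    print(f"minimum gain per stage = {a0_min:.4f}")
    print(f"|H| at w_osc with A0 = 2: {loop_gain_magnitude(2.0, w_osc, W0):.4f}")
```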

#### A.2.2 LC based Oscillator

The inductor-capacitor (LC) based oscillator is another widely used fundamental topology. Before introducing the LC oscillator, it is necessary to review the basic properties of the LC circuit. If an inductor ($L_p$) is placed in parallel with a capacitor ($C_p$), resonance occurs at the frequency $w_{LC}$ given by (A.8).

$$w_{LC} = \frac{1}{\sqrt{L_p C_p}} \tag{A.8}$$

At the resonant frequency, the impedance of the inductor equals that of the capacitor in magnitude but with opposite phase, thereby ideally producing an infinite impedance. The magnitude and phase of the impedance of an LC tank are shown in Fig. A-3.



Fig. A - 3 Magnitude and Phase of the Impedance of an LC Tank

However, in practice, due to the metal resistance of the inductor, the inductor in the LC tank is modelled as an inductor in series with a resistor (Fig. A-4(a)). Over a narrow frequency band, this can usually be converted into an equivalent parallel circuit, as shown in fig. A-4(b).



Fig. A - 4 Conversion of a Tank to Three Parallel Components: (a) Practical LC Tank Configuration. (b) Equivalent Parallel LC Tank Configuration

where $R_p$ is the equivalent parallel resistance of the tank. At the resonant frequency, the tank can be regarded as a simple resistor with a frequency-dependent phase shift of $0^{\circ}$.
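The series-to-parallel conversion of Fig. A-4 and the resonance condition (A.8) can be illustrated with a short sketch. The conversion used here is the standard single-frequency equivalence, $R_p = R_s(1+Q^2)$ and $L_p = L_s(1+1/Q^2)$ with $Q = wL_s/R_s$; the text does not state which form is used, so this is an assumption. The inductance and Q are borrowed from table 6-4 purely as an illustration, and `C_P` is a hypothetical capacitance.

```python
import math

def resonant_frequency_hz(l_p, c_p):
    """Resonant frequency of a parallel LC tank, from (A.8), in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_p * c_p))

def series_to_parallel(l_s, r_s, w):
    """Convert a series L-R branch to its parallel equivalent at angular frequency w."""
    q = w * l_s / r_s
    l_p = l_s * (1.0 + 1.0 / q ** 2)
    r_p = r_s * (1.0 + q ** 2)
    return l_p, r_p

L_S = 283.7e-12               # inductance from table 6-4, H
Q_PEAK = 13.3                 # peak Q from table 6-4
W_Q = 2.0 * math.pi * 36e9    # frequency at which Q is quoted, rad/s
C_P = 70e-15                  # hypothetical tank capacitance, F

r_s = W_Q * L_S / Q_PEAK      # series resistance implied by the quoted Q
l_p, r_p = series_to_parallel(L_S, r_s, W_Q)
f_res = resonant_frequency_hz(l_p, C_P)

if __name__ == "__main__":
    print(f"R_s = {r_s:.2f} ohm, R_p = {r_p:.0f} ohm, L_p = {l_p * 1e12:.1f} pH")
    print(f"f_LC = {f_res / 1e9:.2f} GHz")
```

For a high-Q inductor the parallel inductance is almost unchanged while the parallel resistance is roughly $Q^2$ times the series resistance.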



Fig. A - 5 (a) Single Tuned Stage with LC Load. (b) Two Tuned Stages in a Feedback Loop

Fig. A-5(a) is a single-stage feedback circuit with an LC tank load. At the resonant frequency, the voltage gain is $-g_m R_p$, and the frequency-dependent phase shift of the tank never reaches $180^{\circ}$. Therefore, the circuit will never oscillate.

In fig. A-5(b), another stage is added to increase the phase shift. At resonance, the total phase shift around the loop is equal to $360^{\circ}$. Therefore, the circuit will oscillate if the gain satisfies (A.9).

$$\left(g_m R_p\right)^2 \ge 1\tag{A.9}$$

# A.3 Phase Noise and Reference Spur

Ideally, an oscillator produces a perfectly periodic output, which means that the zero crossings occur at exact integer multiples of half the period. In reality, however, noise from inside or outside the oscillator randomly disturbs the zero crossings. This phenomenon is illustrated in Fig. A-6.



Fig. A - 6 Noise Interference on Zero Crossing

As can be seen, with the intrusion of noise, the oscillator frequency varies with time. To analyse the influence of noise on the oscillator, the noiseless sinusoidal waveform is assumed to be (A.10).

$$V(t) = V_0 \cos w_0 t \tag{A.10}$$

where $V_0$ is the amplitude and $w_0$ is the centre frequency. Taking noise into consideration, (A.10) can be rewritten as (A.11).

$$V(t) = (V_0 + v(t))\cos(w_0 t + \phi(t)) \tag{A.11}$$

where $v(t)$ and $\phi(t)$ represent the amplitude and phase fluctuations, respectively. As the amplitude variation can be greatly reduced by an attenuator or an automatic amplitude control (AAC) circuit [144], the focus is mainly on the effect of the phase fluctuation.

Two main types of phase variation are considered: periodic variation ($\phi_{ref}(t)$) and random variation ($\phi_n(t)$), as shown in (A.12).

$$\phi(t) = \phi_{ref}(t) + \phi_n(t) \tag{A.12}$$

The random variation is analysed first. It is normally called phase noise in the frequency domain and jitter in the time domain. The output spectra of an ideal oscillator and a noisy oscillator are displayed in fig. A-7.



Fig. A - 7 Power Spectrum Density of Oscillator (a) Ideal Oscillator (b) Practical Oscillator

As Fig. A-7(a) shows, the power spectral density (PSD) of an ideal oscillator is a single impulse at $w_0$. In Fig. A-7(b), however, once the frequency experiences random variation, the PSD spreads away from $w_0$: the impulse widens and decays with increasing distance from $w_0$. Mathematically, the noisy signal of fig. A-7(b) can be expressed in the time domain as (A.13).

$$\begin{split} V(t)_n &= V_0 \cos[w_0 t + \phi_n(t)] \\ &= V_0 [\cos w_0 t \cos \phi_n(t) - \sin w_0 t \sin \phi_n(t)] \end{split} \tag{A.13}$$

For a small phase fluctuation, $\phi_n(t) \ll 1\,rad$. Therefore, $\cos \phi_n(t) \approx 1$ while $\sin \phi_n(t) \approx \phi_n(t)$. With these approximations, (A.13) is rewritten as (A.14).

$$V(t)_n \approx V_0[\cos w_0 t - \phi_n(t)\sin w_0 t] \tag{A.14}$$

Therefore, the spectrum of  $\phi_n(t)$  is translated to the centre frequency  $w_0$ . Because the phase noise falls off away from the centre frequency, it is quantified at a specified frequency offset  $\Delta w$  from  $w_0$ . The single-sideband (SSB) phase noise is defined as (A.15).

$$\mathcal{L}(\Delta w) = 10 \log_{10} \frac{P_{noise}(w_0 + \Delta w, 1\,\mathrm{Hz})}{P_{carrier}} \tag{A.15}$$

That is, the phase noise is the ratio of the noise power in a 1 Hz bandwidth at an offset of  $\Delta w$  from the carrier to the carrier power. The unit of phase noise is dBc/Hz.
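As a small worked illustration of (A.15), the sketch below converts a noise power measured in a 1 Hz bandwidth and a carrier power into dBc/Hz. The function name and the example powers are hypothetical values chosen only for illustration; they are not measurements from this work.

```python
import math


def ssb_phase_noise_dbc_hz(noise_power_1hz_w, carrier_power_w):
    """Evaluate (A.15): noise power in a 1 Hz bandwidth relative to the carrier, in dBc/Hz."""
    if noise_power_1hz_w <= 0 or carrier_power_w <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(noise_power_1hz_w / carrier_power_w)


# Hypothetical example: 1 mW carrier and 1 pW of noise in 1 Hz at the chosen offset.
example = ssb_phase_noise_dbc_hz(1e-12, 1e-3)
print(f"L = {example:.1f} dBc/Hz")
```

With these assumed powers the ratio is $10^{-9}$, which corresponds to about -90 dBc/Hz.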

The periodic noise ( $\phi_{ref}(t)$ ) must be analysed in the context of a phase locked loop (PLL). Many sources may cause periodic noise, but one of the most common is the reference frequency of the PLL, so the resulting tone is also called a reference spur. In a PLL system, the phase-frequency detector (PFD) and charge pump (CP) are clocked at the reference frequency. The main contributions to the reference spur therefore include the propagation delay in the PFD and CP, charge injection and current mismatch in the CP, and leakage current on the VCO's voltage tuning node [65, 145], as shown in Fig. A-8.



Fig. A - 8 Reference Spur on VCO's Tuning Node

The periodic fluctuation on the control voltage has a period of one over the reference frequency  $(f_{ref})$ . The reference spur is modelled mathematically in a similar way to the phase noise.  $\phi_{ref}(t)$  is expressed as (A.16).

$$\phi_{ref}(t) = \Delta \phi \sin w_{ref} t \tag{A.16}$$

where  $w_{ref}$  is the reference frequency and  $\Delta \phi$  is the peak phase deviation. According to (A.11), the output including the spur can be expressed as (A.17). Assuming a small modulation ( $\Delta \phi \ll \pi / 2$ ), (A.18) is obtained after the approximation.

$$\begin{split} V(t)_{ref} &= V_0 \cos \left[ w_0 t + \phi_{ref}(t) \right] \\ &= V_0 \cos \left[ w_0 t + \Delta \phi \sin w_{ref} t \right] \\ &= V_0 \left[ \cos w_0 t \cos \left( \Delta \phi \sin w_{ref} t \right) - \sin w_0 t \sin \left( \Delta \phi \sin w_{ref} t \right) \right] \end{split} \tag{A.17}$$

$$V(t)_{ref} \approx V_0 \left[ \cos w_0 t + \frac{\Delta \phi}{2} \cos \left( w_0 + w_{ref} \right) t - \frac{\Delta \phi}{2} \cos \left( w_0 - w_{ref} \right) t \right] \tag{A.18}$$

Therefore, the periodic noise causes spurious tones at an offset of  $\pm w_{ref}$  from the centre frequency  $w_0$ . Fig. A-9 illustrates the phase noise and reference spurs in the spectrum.
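The small-modulation result can be checked numerically. Since each sideband in (A.18) has a magnitude of  $V_0 \Delta\phi / 2$ , the spur level relative to the carrier is expected to be about  $20\log_{10}(\Delta\phi/2)$  dBc. The sketch below is only an illustration under assumed values: the record length, the number of carrier and reference cycles and the peak phase deviation are hypothetical, and a single-bin DFT is used in place of a spectrum analyser.

```python
import math


def tone_amplitude(samples, cycles):
    """Amplitude of the component that completes `cycles` whole periods in the record."""
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * cycles * k / n) for k, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * cycles * k / n) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def predicted_spur_dbc(delta_phi):
    """Small-modulation estimate from (A.18): each sideband is delta_phi / 2 of the carrier."""
    return 20.0 * math.log10(delta_phi / 2.0)


# Hypothetical values, chosen so that every tone falls on an exact DFT bin.
N_SAMPLES = 4096
CARRIER_CYCLES = 400   # stands in for w_0
REF_CYCLES = 25        # stands in for w_ref
DELTA_PHI = 0.02       # peak phase deviation in rad, much smaller than pi/2

wave = [
    math.cos(2 * math.pi * CARRIER_CYCLES * k / N_SAMPLES
             + DELTA_PHI * math.sin(2 * math.pi * REF_CYCLES * k / N_SAMPLES))
    for k in range(N_SAMPLES)
]

carrier = tone_amplitude(wave, CARRIER_CYCLES)
upper = tone_amplitude(wave, CARRIER_CYCLES + REF_CYCLES)
lower = tone_amplitude(wave, CARRIER_CYCLES - REF_CYCLES)
measured_spur_dbc = 20.0 * math.log10(upper / carrier)

print(f"predicted {predicted_spur_dbc(DELTA_PHI):.2f} dBc, measured {measured_spur_dbc:.2f} dBc")
```

For the assumed deviation of 0.02 rad, both the estimate and the numerical result are close to -40 dBc, and the upper and lower sidebands have equal magnitude.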



Fig. A - 9 Phase Noise and Reference Spur in Power Spectrum

Fig. A-10 compares the phase noise of a VCO and a PLL. As can be seen, the phase noise of the VCO falls as the offset frequency increases. For the PLL, the phase noise within the loop bandwidth is usually as flat as that of its reference input, while the out-of-band phase noise follows that of the VCO. It should also be noted that, in the VCO phase noise curve, the roll-off slope is -30 dB/decade when the offset frequency is below the corner frequency and -20 dB/decade beyond that point. This difference is caused by different types of noise source [28].



Fig. A - 10 Phase Noise Comparison between VCO and PLL

More specifically, the impulse sensitivity function (ISF) model demonstrated in previous work [146], in which the noise source is treated as an impulse, indicates that the -30 dB/decade roll-off is caused by flicker noise while the -20 dB/decade roll-off comes from thermal noise [28]. Moreover, as illustrated in Fig. A-10, the lower the offset frequency, the closer it is to the centre frequency. Therefore, flicker noise should be considered the dominant contributor to the close-in phase noise. To quantify the phase noise induced by flicker noise, (A.19) is obtained in [28].

$$\mathcal{L}(f) = \frac{K_f}{NC_{ox}V_{eff}^2} \left( \frac{1}{W_{n-eff}L_{n-eff}} + \frac{1}{W_{p-eff}L_{p-eff}} \right) \frac{f_o^2}{f^3} \tag{A.19}$$

The above equation is derived for a ring oscillator based on push-pull inverters.  $K_f$  is the flicker noise coefficient, N is the number of stages of the ring oscillator,  $f_o$  is the oscillation frequency,  $f$  is the offset frequency and  $V_{eff}$  is the effective voltage, which in this case is  $V_{eff} = VDD/2 - V_t$ . Therefore, the phase noise can be improved by increasing the transistor dimensions, inserting more delay stages and increasing the difference between the power supply and the threshold voltage.
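The design trends implied by (A.19) can be illustrated with a short numerical sketch. All device values below ( $K_f$ ,  $C_{ox}$ ,  $V_{eff}$  and the gate areas) are hypothetical placeholders rather than parameters of the fabricated designs, so only the relative changes are meaningful, not the absolute level.

```python
import math


def flicker_phase_noise(f_offset, f_osc, n_stages, k_f, c_ox, v_eff, wl_n, wl_p):
    """Evaluate (A.19) as a linear ratio; wl_n and wl_p are the effective gate areas W*L."""
    area_term = 1.0 / wl_n + 1.0 / wl_p
    return k_f / (n_stages * c_ox * v_eff ** 2) * area_term * f_osc ** 2 / f_offset ** 3


def to_db(ratio):
    return 10.0 * math.log10(ratio)


# Hypothetical placeholder values (not taken from the thesis).
params = dict(f_offset=1e6, f_osc=10e9, n_stages=4, k_f=1e-24, c_ox=1e-2,
              v_eff=0.3, wl_n=1.2e-12, wl_p=1.2e-12)

base_db = to_db(flicker_phase_noise(**params))
more_stages_db = to_db(flicker_phase_noise(**{**params, "n_stages": 8}))
larger_devices_db = to_db(flicker_phase_noise(**{**params, "wl_n": 2.4e-12, "wl_p": 2.4e-12}))
decade_further_db = to_db(flicker_phase_noise(**{**params, "f_offset": 1e7}))

print(f"doubling N:        {more_stages_db - base_db:+.2f} dB")
print(f"doubling W*L:      {larger_devices_db - base_db:+.2f} dB")
print(f"10x offset freq.:  {decade_further_db - base_db:+.2f} dB")
```

In this model, doubling either the number of stages or the gate areas lowers the phase noise by about 3 dB, and moving one decade further from the carrier lowers it by 30 dB, consistent with the flicker-noise region of Fig. A-10.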

# **Appendix. B Necessary Building Blocks**

In this section, the necessary building blocks used in the PLL are introduced, including the phase frequency detector, charge pump, loop filter and frequency divider.

#### **B.1** Phase (Frequency) Detector (PD/PFD)

The phase detector is the first module in a PLL system. Based on its functionality, it can be categorized into two types: one detects only the phase difference between the reference frequency and the feedback frequency (PD), while the other can also detect frequency errors (PFD).

The simplest PD can be implemented with an exclusive-OR (XOR) gate. Fig. B-1 shows the operation of an XOR PD and its characteristic curve.



Fig. B - 1 (a) Operation of XOR PD (b) The Characteristics of XOR PD

As observed from Fig. B-1(a), the width of the output pulse is proportional to the phase difference between the two input signals. The maximum output therefore occurs when the two inputs are half a cycle apart. As illustrated in Fig. B-1(b), the detectable range of the XOR PD is from  $-\pi$  to  $+\pi$ . When an XOR PLL is locked, either the rising edge or the falling edge of the feedback signal is located at the middle of the input signal (data). As a result, the XOR PD is more likely to be used in clock and data recovery applications. The remainder of this section mainly focuses on the PFD-based PLL.
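The proportionality between the averaged XOR output and the phase difference can be reproduced with a minimal behavioural sketch. It assumes ideal square waves with 50% duty cycle and a normalised logic swing of 0 to 1; the function name and the number of samples are arbitrary choices for illustration.

```python
import math


def xor_pd_average(phase_diff_rad, samples=3600):
    """Average XOR output of two ideal 50% duty-cycle square waves offset by phase_diff_rad."""
    high = 0
    for k in range(samples):
        # Sample at mid-points so that no sample lands exactly on a signal edge.
        theta = 2.0 * math.pi * (k + 0.5) / samples
        a = math.sin(theta) >= 0.0
        b = math.sin(theta - phase_diff_rad) >= 0.0
        if a != b:
            high += 1
    return high / samples


for label, phase in (("0", 0.0), ("pi/4", math.pi / 4), ("pi/2", math.pi / 2), ("pi", math.pi)):
    print(f"phase difference {label:>4}: average output {xor_pd_average(phase):.3f}")
```

Under these assumptions the average output rises linearly from 0 at zero phase difference to 1 at half a cycle, which is the behaviour described for Fig. B-1.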

A PFD block is usually built with memory elements such as flip-flops. Fig. B-2 shows the common linear PFD structure with resettable D-type flip-flops (DFFs).



Fig. B - 2 Common Linear PFD Structure



Fig. B - 3 Characteristics of PFD (a) Ideal (b) Practical

By comparing the leading edges of the two input signals, the PFD generates UP and DOWN pulses to indicate which input signal is faster. When both pulses remain low, the loop is locked. The width of a pulse depends on how far one edge leads the other. For example, when the rising edge of " $clock_{ref}$ " arrives before that of " $clock_{div}$ ", the UP pulse is asserted until the edge of " $clock_{div}$ " arrives. At the moment the DOWN pulse is generated, it triggers the NAND gate to reset the DFFs and pull UP and DOWN low. Ideally, therefore, the maximum detectable range of the PFD reaches  $\pm 2\pi$ , as shown in Fig. B-3(a). However, due to the delay on the reset path, the practical characteristic of the PFD is closer to Fig. B-3(b): the detectable phase range shrinks by  $\Delta$ , and a phase difference beyond  $2\pi - \Delta$  leads to the opposite output. Fig. B-4 illustrates the issue of a missing clock edge due to this non-ideal behaviour.



Fig. B - 4 Non-ideal Behavior of Missing Clock Edge

A missing clock edge swaps the apparent order of the input signals and drives the control voltage of the VCO in the opposite direction. As a consequence, the PLL takes more time to settle.
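The missing-edge mechanism can be reproduced with a simple event-driven model of the PFD. This is a behavioural sketch under stated assumptions, not the circuit of Fig. B-2: the two flip-flops are idealised, the reset path is modelled as a single fixed delay, any input edge that arrives while the reset is active is ignored, and the edge times are hypothetical values in arbitrary units.

```python
def pfd_pulses(ref_edges, div_edges, reset_delay=0.0):
    """Return (name, start, stop) pulses of an idealised PFD with a fixed reset delay."""
    events = sorted([(t, "ref") for t in ref_edges] + [(t, "div") for t in div_edges])
    up_start = None
    down_start = None
    blocked_until = float("-inf")
    pulses = []
    for t, source in events:
        if t < blocked_until:
            continue  # edge arrives while the flip-flops are being reset: it is missed
        if source == "ref" and up_start is None:
            up_start = t
        elif source == "div" and down_start is None:
            down_start = t
        if up_start is not None and down_start is not None:
            stop = t + reset_delay  # both outputs stay high until the reset completes
            pulses.append(("UP", up_start, stop))
            pulses.append(("DOWN", down_start, stop))
            up_start = None
            down_start = None
            blocked_until = stop
    return pulses


def net_up_time(pulses):
    """Total UP width minus total DOWN width (positive means the VCO is sped up)."""
    total = 0.0
    for name, start, stop in pulses:
        width = stop - start
        total += width if name == "UP" else -width
    return total


# Reference leads by a small phase error; zero reset delay (ideal case).
ideal = pfd_pulses([0, 10, 20], [2, 12, 22])
# Reference leads by almost a full period; the reset delay hides the next reference edge.
late = pfd_pulses([0, 10, 20], [9.5, 19.5, 29.5], reset_delay=1.0)

print(net_up_time(ideal))     # positive: UP dominates in every cycle
print(net_up_time(late[2:]))  # negative: the second comparison has the wrong polarity
```

In the second case the reference edge at t = 10 falls inside the reset interval and is lost, so the following comparison starts from the divider edge and produces a wider DOWN pulse even though the reference is still leading.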

In addition, this non-ideal behaviour leads to another issue, usually called the "dead zone" [147], which is illustrated in Fig. B-5. As the phase difference between the two input signals becomes smaller, the gain of the PFD decreases and it fails to generate valid pulses, which means the loop cannot lock any tighter once the phase error is below a certain range [136].



Fig. B - 5 Non-ideal Behavior of Dead Zone

The PFD in Fig. B-2 can be implemented using RS latches, as applied in [95]. An alternative structure widely reported in the literature is the dynamic PFD [148], in which the RS-latch-based flip-flop is replaced by a simplified TSPC flip-flop, since the D input is tied to "1" [95]. In [149], a pre-charge type PFD is reported, which is developed from the structure of the dynamic PFD. Fig. B-6 shows the basic block of the pre-charge PFD.



Fig. B - 6 Basic Block Diagram of Pre-charge PFD

The NAND gate in the reset path is replaced by two NMOS transistors combined with the TSPC latches. Since the output pulses drive fewer transistor gates on the reset path, the propagation delay is reduced and the detectable range of the PFD is expanded. Additionally, a simplified version of the pre-charge PFD is reported in [150].

Given the common PFD issues mentioned above, many techniques have been proposed based on different structures. The simplest method, reported in [28], is to insert several inverter stages before the reset node to alleviate the dead zone issue. A similar approach is used in [151], which predicts the reset signal and delays the rising edge to avoid missing edges. Based on the dynamic PFD structure, a design without feedback is proposed in [152], which eliminates the dead zone and extends the input range. Fig. B-7 shows a fast acquisition latch-based PFD structure, similar to the simplified pre-charge PFD, which can eliminate the dead zone and effectively alleviate the issue of missing edges [147].



Fig. B - 7 Structure of Fast Acquisition Latch-based PFD

An odd number of inverter stages is inserted before M6, which ensures that M5 and M6 operate in opposite regions. This allows the input edge to be stored for a certain time. Fig. B-8 illustrates the behaviour of the latch-based PFD.



Fig. B - 8 Missing Clock Edge Elimination

As explained for Fig. B-4, the missing edge is caused by the propagation delay on the reset path. In the latch-based PFD, however, the edge information is retained, so an input edge that arrives during the reset period is no longer missed. Therefore, to expand the detectable range of the PFD towards  $2\pi$ , the number of inverter delay stages should be increased. It should also be noted that the time for which a clock edge is held should be slightly shorter than the reset delay; otherwise the PFD will fail to lock when there is no phase error between the input signals [147].

#### **B.2** Charge Pump

The basic block that follows the PFD is the charge pump. Fig. B-9 shows the basic concept of a charge pump.



Fig. B - 9 Basic Concept of Charge Pump

The phase difference is converted to UP and DOWN pulses by the PFD and then forwarded to the charge pump, which sources or sinks current on the loop filter. MP and MN act as current switches. Once the UP switch (MP) turns on, current is injected into the Vct node to charge the loop filter, and vice versa for DOWN and MN. In practice, however, the charge pump is implemented more like the structure in Fig. B-10(a).

The source and sink currents are provided by current mirrors driven by diode-connected MOSFETs. Moreover, a left branch is added so that, when both the UP and DOWN pulses stay low, the source current flows into the sink current through the left branch. When either the UP or the DOWN pulse goes high, the corresponding switch turns on and current is injected into or drawn from the loop filter by one of the current sources. However, a major issue of this structure is current mismatch. When the left branch is switched on, the voltage at the drains of the PMOS and NMOS (node V<sub>X</sub>) is not precisely the same as the voltage on node Vct, which inevitably leads to different currents flowing in the left and right branches. In addition, due to channel length modulation, a slight voltage variation across a current mirror causes a large current variation.
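To give a feel for the quantities involved, the following sketch estimates the change in control voltage produced by one pair of UP and DOWN pulses. It is a deliberately simplified model: the loop filter is reduced to a single capacitor, the pulses are ideal rectangles, and the current, pulse width and capacitance values are hypothetical.

```python
def control_voltage_step(i_up, i_down, t_up, t_down, c_filter):
    """Voltage change on an ideal capacitor: net injected charge divided by capacitance."""
    net_charge = i_up * t_up - i_down * t_down
    return net_charge / c_filter


C_FILTER = 10e-12  # hypothetical 10 pF filter capacitor

# Case 1: ideal matched 100 uA pump, UP asserted for 2.5 ns, DOWN not asserted.
step_tracking = control_voltage_step(100e-6, 100e-6, 2.5e-9, 0.0, C_FILTER)

# Case 2: loop in lock, both switches on for 200 ps, source current 5 % larger than sink.
step_mismatch = control_voltage_step(105e-6, 100e-6, 200e-12, 200e-12, C_FILTER)

print(f"tracking step:       {step_tracking * 1e3:.3f} mV")
print(f"mismatch-only step:  {step_mismatch * 1e3:.3f} mV per reference cycle")
```

Even with zero phase error, the assumed 5 % mismatch leaves a small residual charge in every reference cycle. This periodic disturbance on Vct is one of the contributions to the reference spur discussed in Appendix A.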



Fig. B - 10 Practical Implementation of Charge Pump (a) without mismatch cancellation (b) with mismatch cancellation

Therefore, a commonly used charge pump structure is reported in [28], as shown in Fig. B-10(b). An operational amplifier (op-amp) is added between the left and right branches and acts as a voltage follower. The large gain of the op-amp pulls node  $V_X$  close to node Vct to avoid current mismatch. Moreover, the cascode current mirror provides a larger output impedance, thereby alleviating the effect of channel length modulation. One drawback of this structure is that, in more advanced technology processes, the reduced supply voltage restricts the voltage headroom across each MOSFET, which could lead to significant performance degradation of the charge pump.

Another technique used to raise the output impedance is the regulated cascode structure [141, 153]. Fig. B-11(a) shows the regulated cascode charge pump.



Fig. B - 11 (a) Regulated Cascode Charge Pump (b) Op-amp Regulated Charge Pump

One of the advantages of this technique is that the output impedance is increased without costing additional voltage headroom [141]. Thanks to the large gain of the op-amps, the drain voltages of the switch MOSFETs are kept close to V<sub>b1</sub> and V<sub>b2</sub> by adjusting the gate voltages of M3 and M4. Therefore, the current flowing through the switch transistors is relatively constant. It should also be noted that the regulated cascode charge pump requires wider UP and DOWN pulses because of the finite response time of the op-amp: the op-amp output needs time to settle to its proper value when M1 and M2 switch on. Fig. B-11(b) illustrates another approach using an op-amp [154]. Similar to the circuit in Fig. B-11(a), the op-amp brings Vx close to Vct by regulating the gate voltages of M5 and M3. Hence,  $I_{D3} \approx I_{D5}$ . Also, since Vx is at the same voltage as Vct,  $I_{D4} \approx I_{D6}$ , which brings the source current close to the sink current [155]. Moreover, with this approach the charge pump does not suffer from the finite response of the op-amp, because the op-amp output still regulates M5 even when M1 or M2 is turned off [141]. However, an op-amp with a rail-to-rail input range is necessary for this structure, as Vct should be able to swing close to the rails to provide a wider frequency tuning range.

In [156], a gate switching charge pump is proposed, which effectively relaxes the headroom restriction and also avoids the current mismatch. The structure is shown in Fig. B-12.



Fig. B - 12 Gate Switching Charge Pump

The function of the op-amp is similar to that in the previous designs: it regulates the gate voltage to bring Vx close to Vct. The same current flows through each branch, which results in equal source and sink currents and thereby eliminates the current mismatch. However, unlike the circuit in Fig. B-11(b), the UP and DOWN switches in Fig. B-12 are placed on the path to the gates of M1 and M2. Therefore, there are only two transistors in series between power and ground, which leaves enough voltage headroom for the output stage. Note that this structure still requires an op-amp with a wide input range to cover the VCO control voltage.

Another issue common to the charge pump structures mentioned above is that, due to the non-ideal switching behaviour of the PMOS and NMOS, a certain amount of unwanted charge pump current leaks into the control voltage node and creates ripple, which passes to the VCO and causes additional frequency variation [28]. This ripple involves a trade-off with the PLL's loop bandwidth, which will be explained in a later section.

#### **B.3** Loop Filter

The loop filter can be regarded as the brain of the PLL. Incorrectly chosen loop filter values may cause the PLL to take too long to lock or even fail to lock. The basic loop filter architectures can be categorized into two types, passive and active, as shown in Fig. B-13.



Fig. B - 13 (a) Passive Loop Filter (b) Active Loop Filter

Fig. B-13(a) is a second order passive loop filter, commonly used in charge pump PLLs. A capacitor  $C_2$  is connected directly from the voltage control line to ground to provide a low impedance path that filters out the leaked charge pump ripple. However, too large a value of  $C_2$  reduces the bandwidth and degrades the loop stability. Normally, a reasonable value is  $C_2 \le 0.2C_1$  [141]. Fig. B-13(b) is an active loop filter, as implemented in [157, 158]. Compared with the passive loop filter, one advantage of the active filter is that a higher supply voltage can be applied to the operational transconductance amplifier (OTA) to provide a larger control voltage range for the VCO. Moreover, it can easily be configured with a differential output for a differential VCO. Its drawbacks are also obvious, however, such as higher power consumption and additional noise from the OTA [28].
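The role of the ratio between  $C_2$  and  $C_1$  can be seen from the impedance of the passive filter. The sketch below assumes the usual second order topology, in which a resistor R in series with  $C_1$  is placed in parallel with  $C_2$ ; this topology and the component values are assumptions made for illustration and are not read from Fig. B-13(a).

```python
import math


def passive_filter_impedance(freq_hz, r, c1, c2):
    """Impedance of (R in series with C1) in parallel with C2 at the given frequency."""
    s = 1j * 2.0 * math.pi * freq_hz
    z_series = r + 1.0 / (s * c1)
    z_shunt = 1.0 / (s * c2)
    return z_series * z_shunt / (z_series + z_shunt)


def zero_and_pole_hz(r, c1, c2):
    """Stabilising zero and ripple-filtering pole of the assumed topology, in Hz."""
    zero = 1.0 / (2.0 * math.pi * r * c1)
    pole = (c1 + c2) / (2.0 * math.pi * r * c1 * c2)
    return zero, pole


# Hypothetical component values with C2 = 0.2 * C1.
R, C1 = 10e3, 10e-12
C2 = 0.2 * C1

zero_hz, pole_hz = zero_and_pole_hz(R, C1, C2)
print(f"zero at {zero_hz / 1e6:.2f} MHz, pole at {pole_hz / 1e6:.2f} MHz, "
      f"ratio {pole_hz / zero_hz:.1f}")
```

For this topology the pole-to-zero ratio equals  $(C_1 + C_2)/C_2$ , so  $C_2 = 0.2C_1$  places the pole a factor of six above the zero. A larger  $C_2$  moves the pole closer to the zero, which is one way to see why an oversized  $C_2$  degrades stability.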

A major issue for the loop filter is capacitor integration, especially for narrow-band PLLs, which require a very low loop bandwidth [95]. The dual-path loop filter is another architecture, widely used to cope with the problem of integrating large capacitors on chip. In Fig. B-14, the dual-path loop filter is separated into two parts, an integration path and a proportional path. The voltages on the two paths are summed by a voltage adder to form the final control voltage, which is forwarded to the VCO. Its transfer function can be represented by (B.1).

$$V_{ct} = V_Z + V_P = \frac{I_{cp}[1 + sR(BC_Z + C_p)]}{sC_Z(1 + sRC_Z)} \tag{B.1}$$

The integration capacitor  $C_Z$  is effectively scaled by scaling the dual-path charge pump currents by a factor B. Other techniques that combine an active loop filter with the dual-path loop filter can be found in [95, 157, 158]. However, the disadvantages of this structure cannot be ignored, such as the delay mismatch between the two charge pumps, additional active noise and power consumption, and voltage decay through the passive components [95].



Fig. B - 14 Dual-path Loop Filter

A sample-reset loop filter is proposed in [159], which is effective at minimizing the ripple on the VCO's control signal and hence reducing the reference spur. First, the phase difference is sampled as in a traditional charge pump PLL; then an averaged current, proportional to the sampled charge pump current, is injected into the voltage control line to provide a constant value over each period. At the beginning of each period, the previous voltage on the sampling capacitor is erased by a reset switch connected to ground. The behaviour of the sample-reset charge pump is illustrated in Fig. B-15. With this approach, the ripple on the VCO control node can be effectively reduced, but not eliminated, because it still suffers from the non-ideal switching behaviour. Based on this characteristic, the sample-reset loop filter is more suitable for situations in which reference spurs are a major concern [95].



Fig. B - 15 Behavior of Sample-reset Charge Pump

#### **B.4** Frequency Division

A frequency divider is normally placed in the feedback path of a PLL to derive a divided frequency  $(f_{div})$  from the output  $(f_{out})$ , which is compared with the reference clock  $(f_{ref})$  to obtain the phase difference.  $f_{out}$  is N times  $f_{div}$ , where N is the dividing ratio. Fig. B-16 shows several frequency dividers (N=2) with different register structures.






Fig. B - 16 Alternative Register Structure for Divide-by-2 Frequency Divider (a) True Single Phase Clocked (TSPC) (b) Razavi Divider (c) Wang's Topology (d) Current Mode Latch (CML)

The True Single Phase Clocked (TSPC) register (Fig. B-16(a)) is popular for high-speed digital applications, not only because it provides reasonably fast speed, but also because of its compact size and single clock input. Its disadvantages are that the speed is limited, because the signal has to pass through three gates per clock cycle and is slowed down by the stacked PMOS, and that it needs a full-scale voltage swing. Fig. B-16(b) is another register topology, reported in [160], which is faster than the TSPC approach: without stacked PMOS and with only two gates per clock cycle, this structure can operate at higher frequencies. However, due to its current steering operation, it suffers from higher static power consumption. It also requires a differential input clock with full voltage swing [161]. With a similar topology, Wang proposed another fast signal transition register in [162] (Fig. B-16(c)). The main difference from the structure in [160] is that the currents steered from the PMOS devices are merged and flow through a clock-driven NMOS, so that the output voltage of one latch is held while the other latch steers current. Therefore, in addition to all the advantages of the structure in Fig. B-16(b), the output signal now has a 50% duty cycle. The fastest register topology is the current mode latch (CML) [161], shown in Fig. B-16(d). The CML latch avoids passing the clock signal through PMOS devices, which allows signal transitions at higher frequencies. To further increase the speed of CML based dividers, the inductor peaking technique can be used, which places an inductor in series with the load [163]. Other techniques to improve the steering current in CML can be found in [164].

A multi-modulus divider architecture is used to form an integer-N PLL. Fig. B-17 shows a high order pre-scaler architecture built from 2/3 dual-modulus dividers.



Fig. B - 17 Architecture of High Dividing Order Programmable Pre-scaler with 2/3 Dual-modulus Divider

The  $mod_n$  signals are used to enable the divide-by-3 function. Each cell re-clocks the  $mod_n$  signal from the next stage and generates a new mod signal  $(mod_{n-1}, mod_{n-2}, \dots, mod_1)$  for the previous stage. The overall dividing ratio can be calculated by (B.2).

$$div_{ratio} = 2^{N} + CON_{0}2^{0} + CON_{1}2^{1} + \dots + CON_{N-1}2^{N-1} \tag{B.2}$$
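As a worked example of (B.2), the sketch below computes the overall dividing ratio from the control bits of a chain of N divide-by-2/3 cells. The function name and the bit ordering (index i holds  $CON_i$ ) are choices made for this illustration.

```python
def division_ratio(con_bits):
    """Evaluate (B.2): con_bits[i] is CON_i (0 or 1) and N is the number of cells."""
    if any(bit not in (0, 1) for bit in con_bits):
        raise ValueError("control bits must be 0 or 1")
    n_cells = len(con_bits)
    return 2 ** n_cells + sum(bit * 2 ** i for i, bit in enumerate(con_bits))


# A chain of three cells covers every integer ratio from 8 to 15.
for bits in ([0, 0, 0], [1, 0, 1], [1, 1, 1]):
    print(bits, "->", division_ratio(bits))
```

With N cells the programmable range therefore runs from  $2^N$  (all control bits low) to  $2^{N+1}-1$  (all control bits high) in unit steps.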

Fig. B-18 shows the structure of a single divide-by-2/3 cell using gating logic. Once the  $mod_{in}$  signal becomes active, the 2/3 cell operates in its division cycle, and when the binary control CON is asserted, the output clock is divided by 3 [165, 166].



Fig. B - 18 Structure Diagram of Dual Modulus Cell



Fig. B - 19 Divide-by-2/3 Dual Modulus by Multiplexing Logic

An alternative divide-by-2/3 dual-modulus cell can be achieved with multiplexing logic, as illustrated in Fig. B-19. In [167], a divide-by-4/5/6/7 multi-modulus divider was implemented with multiplexing logic, using a 4-to-1 MUX and a 4-phase divide-by-2 circuit. In [161], a 64-modulus divider was shown, with ratios ranging from 32 to 63.5 in half steps of the clock input.

# **Appendix. C Additional Results of Simulations and Measurements**

Table C - 1 Measurement Results from 6 Different Wafers

|                    | Wafer 1      |              |              |
|--------------------|--------------|--------------|--------------|
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 12.2         | 12.93        | 13.32        |
| Power (mW)         | 2.785        | 2.76         | 2.67         |
| PN dBc/Hz @ 1MHz   | -62.67       | -78.1        | -68.92       |
| PN dBc/Hz @ 10MHz  | -91.48       | -93.31       | -89.67       |
| PN dBc/Hz @ 100MHz | -120.24      | -120.05      | -119.52      |
|                    | Wafer 2      |              |              |
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 12.18        | 13.67        | 14.27        |
| Power (mW)         | 2.88         | 2.86         | 2.82         |
| PN dBc/Hz @ 1MHz   | -68.3        | -72.26       | -59.23       |
| PN dBc/Hz @ 10MHz  | -93.77       | -91.84       | -94.73       |
| PN dBc/Hz @ 100MHz | -121.85      | -119.14      | -110.45      |
|                    | Wafer 3      |              |              |
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 12.26        | 13.89        | 14.39        |
| Power (mW)         | 2.94         | 2.96         | 2.98         |
| PN dBc/Hz @ 1MHz   | -73.73       | -61.76       | -65.23       |
| PN dBc/Hz @ 10MHz  | -92.32       | -92.21       | -86.92       |
| PN dBc/Hz @ 100MHz | -118.74      | -117.18      | -119.9       |
|                    | Wafer 4      |              |              |
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 12.297       | 14.1         | 14.57        |
| Power (mW)         | 2.88         | 2.94         | 2.88         |
| PN dBc/Hz @ 1MHz   | -69.8        | -79.3        | -68.04       |
| PN dBc/Hz @ 10MHz  | -96.03       | -94.07       | -87          |
| PN dBc/Hz @ 100MHz | -121.43      | -121.39      | -119.89      |
|                    | Wafer 5      |              |              |
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 13.05        | 14.1         | 15.31        |
| Power (mW)         | 3.04         | 3.07         | 3.02         |
| PN dBc/Hz @ 1MHz   | -67.42       | -76.05       | -57.59       |
| PN dBc/Hz @ 10MHz  | -94.25       | -91.66       | -87.62       |
| PN dBc/Hz @ 100MHz | -120.53      | -118.72      | -112.47      |
|                    | Wafer 6      |              |              |
|                    | Oscillator 1 | Oscillator 2 | Oscillator 3 |
| Frequency (GHz)    | 12.29        | 12.92        | 14.67        |
| Power (mW)         | 2.87         | 2.89         | 2.88         |
| PN dBc/Hz @ 1MHz   | -63.18       | -67.61       | -63.51       |
| PN dBc/Hz @ 10MHz  | -93.02       | -93.4        | -88.87       |
| PN dBc/Hz @ 100MHz | -116.4       | -121.31      | -117.99      |

# **Appendix. D Design Methodology for High-speed Analogue Circuit**

In this section, the design methodology for general analogue circuit design is introduced, starting with a design flow chart. A design example of an inductive peaking VCO is then presented. Moreover, to take the effects of process variation into consideration, PVT and Monte-Carlo simulations are also presented.

### **D.1** Design Flow of High-speed Analogue Circuit

A well-defined design methodology is necessary for analogue circuit design in order to deliver a high-performance design. The standard analogue design flow is shown in Fig. D-1.

The design flow contains two stages, and each stage is essential to guarantee the performance of the final output. The design starts with the schematic of the circuit. At this step, not only the connections between components but also the connections of each component itself need to be considered. For example, apart from the gate, source and drain of a transistor, its bulk must also be connected to a defined voltage, such as ground or the power supply. Another aspect to consider at this step is the transistor type. The technology process provides various transistors, such as low threshold voltage (LVT), high voltage and deep N-well (DNW) transistors, to name just a few. For example, in a cascode structure, the cascode transistor is usually placed in a DNW so that its bulk can be biased separately to avoid the body effect.



Fig. D - 1 Design Flow for Analogue Circuit

After the schematic design, a pre-simulation is necessary to evaluate the performance. Both the pre-simulation and the post-layout simulation are run with the SpectreRF simulator on the Virtuoso platform. If the performance meets the specification, the process moves on to the layout design. Otherwise, the schematic parameters are adjusted and the pre-simulation is repeated. At the layout design stage, the schematic is mapped into a physical layout using the Virtuoso layout editor. The quality of the layout is the key to making the final outcome meet the desired requirements and yielding a high performance analogue chip. Before the performance of the layout is evaluated, a very important step is parasitic extraction, which determines the parasitic capacitance, resistance and inductance of the physical interconnects so that delay and interference can be calculated more realistically. Parasitic extraction is performed with Calibre, which is accessed from within the Cadence platform. To investigate their influence accurately, different kinds of parasitic effects are extracted, including the parasitic resistance and capacitance between metal and substrate, the coupling capacitance between parallel metal lines, and the parasitic inductance of long metal interconnects. The more accurately the parasitic effects are extracted, the more runtime Calibre requires. After the extraction has finished, Calibre generates a parasitic netlist, which contains the delay information for each net in the circuit. Based on this netlist, the post-layout simulation is conducted with the SpectreRF simulator to verify the performance. Unlike the pre-simulation, the post-layout simulation usually takes more time because the extracted netlist has to be read. Therefore, the biggest issue of post-layout simulation is its time cost. In some cases, the post-layout simulation may last for several hours or even days, as for the proposed PLL structure presented in Chapter 5.

For the last step, if the performance requirements are met, the layout is sent to a foundry for fabrication. Otherwise, specific modifications are required at either the schematic design step or the layout step for optimization.

### **D.2** Design Example

To demonstrate the analogue design methodology, a design example is presented in this section. The design case is the inductor peaking VCO whose structure is proposed in Section 3.4.2. The implementation is based on the TSMC 65nm technology process. Fig. D-2 shows the structure of the inductor peaking VCO and a 50  $\Omega$  output buffer.



Fig. D - 2 Structure of Design Example

The parameters are listed in Table D-1 and the layout of the VCO is shown in Fig. D-3.



Fig. D - 3 Layout View of Design Example

Table D - 1 Key Parameters of VCO and Output Buffer

| Process                | TSMC 65nm CMOS |
|------------------------|----------------|
| VDD (V)                | 1.2            |
| $M_1$ (W/L) ( $\mu$ m) | 20/0.06        |
| $M_2$ (W/L) ( $\mu$ m) | 64/0.06        |
| Inductance (pH)        | ≈350    |
| Peak Q                 | 21.5    |
| Cblock (pF)            | 800     |
| Rbias(k Ω )            | 25      |

To evaluate the performance of the VCO at different stages, the pre-simulation and post-layout simulation results are presented and compared in Fig. D-4.



Fig. D - 4 Results Comparison between Pre-simulation and Post-layout Simulation

Fig. D-4(a) shows the frequency tuning range. The main difference occurs in the high frequency range, where the highest achievable frequency is 27.3GHz in pre-simulation but 24.9GHz in post-layout simulation. This difference is caused by the parasitic effects considered in the post-layout simulation: the parasitic capacitance and resistance increase the load on the high-speed signal interconnects and thereby lower the achievable frequency. As for the phase noise performance in Fig. D-4(b), a difference of about 5dB can be observed at 1MHz offset frequency. The phase noise result is obtained using periodic steady-state (*PSS*) analysis together with the *Pnoise* function. The phase noise is defined by (D.1).

$$\mathcal{L}(f) = \frac{K_f}{NC_{ox}V_{eff}^2} \left( \frac{1}{W_{n-eff}L_{n-eff}} + \frac{1}{W_{p-eff}L_{p-eff}} \right) \frac{f_{osc}^2}{f^3} \tag{D.1}$$

According to (D.1), the phase noise at a given offset frequency is proportional to the square of the oscillation frequency. Therefore, the post-layout simulation shows better phase noise because the frequency achieved in the post-layout simulation is lower than that in pre-simulation due to the parasitic effects.

In addition, process variation must be considered when evaluating the final performance of the design. To verify the reliability of the design under different operating conditions, PVT (process, voltage, temperature) simulation results are shown in fig. D-5.







Fig. D - 5 PVT Simulation (a) Process (b) Voltage (c) Temperature

Fig. D-5(a) shows the process variation at three process corners (ff, tt, ss), where tt represents the typical corner while ff and ss represent the fastest and slowest corners respectively. At the highest oscillation frequency, the difference between ff and ss is 2.2GHz. However, as the frequency decreases, the difference between the corners increases, especially in the middle frequency band, where the process corner causes a frequency variation of about 11.7GHz. For the voltage variation shown in fig. D-5(b), the power supply is varied from 1.08V to 1.32V. Again, a visible difference in frequency is found in the middle range, where it reaches 12.9GHz: when the supply voltage is low, the output frequency drops rapidly as the control voltage increases. Lastly, for the temperature variation in fig. D-5(c), the temperature ranges from -40°C to 127°C, with the nominal room temperature assumed to be 27°C. At the highest output frequency the variation is 0.56GHz, and a slight variation of 2.4GHz can be observed in the low frequency range. The PVT simulation results show that variations in process corner, supply voltage and temperature all affect the performance of the VCO to some extent. Therefore, to minimize the influence of PVT variation, the VCO is recommended to be used in a phase locked loop, and an adaptive regulation mechanism is necessary to dynamically adjust the output frequency.
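The corner set used above can be written down explicitly. The sketch below is illustrative only: it assumes the supply limits are ±10% of the 1.2V nominal supply (which reproduces the 1.08V and 1.32V values in the text) and that each of the three sweeps varies one factor while the other two stay at their nominal values. The helper name and the full cross-product list are hypothetical additions, not part of the reported simulation setup.

```python
from itertools import product

VDD_NOM_V = 1.2
PROCESS = ["ff", "tt", "ss"]
SUPPLY_V = [round(VDD_NOM_V * k, 2) for k in (0.9, 1.0, 1.1)]  # 1.08, 1.2, 1.32
TEMPERATURE_C = [-40, 27, 127]

NOMINAL = ("tt", VDD_NOM_V, 27)
AXES = (PROCESS, SUPPLY_V, TEMPERATURE_C)


def one_at_a_time(nominal, axes):
    """Corners reached by varying a single factor around the nominal point."""
    corners = {nominal}
    for index, values in enumerate(axes):
        for value in values:
            corner = list(nominal)
            corner[index] = value
            corners.add(tuple(corner))
    return sorted(corners)


single_factor_corners = one_at_a_time(NOMINAL, AXES)
full_matrix = list(product(*AXES))  # exhaustive alternative (assumption)

print(len(single_factor_corners), "single-factor corners")
print(len(full_matrix), "corners in the full P x V x T matrix")
```

The single-factor sweeps keep the number of runs small, whereas the full matrix also captures combined worst cases at the cost of more simulation time.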

Monte Carlo simulation is another common approach, used to investigate the statistical spread caused by process variation and device mismatch. To run the Monte Carlo simulation, the model library needs to be switched to MC mode, which is a dedicated mode for Monte Carlo analysis. Setting a larger number of points increases the accuracy of the simulation, but it also increases the simulation time. In this design, the number of points is set to 100 to obtain moderate accuracy with a relatively short simulation time. Fig. D-6 illustrates the Monte Carlo simulation results under industry-standard testing conditions.
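The trade-off between the number of points and the accuracy can be estimated with a short calculation. This is a minimal sketch, assuming the simulated frequencies are approximately normally distributed, in which case the relative uncertainty of the estimated standard deviation is roughly $1/\sqrt{2(N-1)}$. The function name and the list of point counts are hypothetical.

```python
import math


def relative_error_of_sigma(n_points: int) -> float:
    """Approximate relative standard error of a sample standard deviation.

    Assumes normally distributed samples (an assumption, not a simulator output).
    """
    return 1.0 / math.sqrt(2.0 * (n_points - 1))


for n_points in (50, 100, 400, 1000):
    error_percent = 100.0 * relative_error_of_sigma(n_points)
    print(f"N = {n_points:4d}: sigma known to about +/-{error_percent:.1f}%")
```

Because the uncertainty falls only with the square root of the number of points, halving it requires roughly four times as many runs, which is why a moderate point count is a reasonable compromise.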







Fig. D - 6 Monte Carlo Simulation under Different Testing Conditions (a) 1.2V Power Supply with 27°C Temperature (b) 1.08V Power Supply with 127°C Temperature (c) 1.32V Power Supply with -40°C Temperature

Fig. D-6(a) shows the Monte Carlo simulation result under the nominal testing condition (1.2V power supply at 27°C room temperature). The mean frequency is around 24.89GHz with a standard deviation of about 319MHz. The other two testing conditions are a 1.08V power supply at 127°C (fig. D-6(b)) and a 1.32V power supply at -40°C (fig. D-6(c)). The resulting frequencies are concentrated around 23.79GHz and 24.56GHz, with standard deviations of 275.8MHz and 382.9MHz respectively. Although the mean frequency in both cases is slightly lower than under the nominal condition, the frequency spread is only mildly affected.
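To compare the three conditions on a common scale, the spread can be expressed relative to the mean. The sketch below uses only the mean and deviation values quoted above and treats the quoted deviation as one standard deviation, which is an assumption. The dictionary keys and variable names are hypothetical.

```python
# (mean frequency in Hz, deviation in Hz) for each condition quoted in the text
CONDITIONS = {
    "1.20 V, 27 C": (24.89e9, 319.0e6),
    "1.08 V, 127 C": (23.79e9, 275.8e6),
    "1.32 V, -40 C": (24.56e9, 382.9e6),
}

# Relative spread (deviation divided by mean) per condition
relative_spread = {
    name: sigma_hz / mean_hz for name, (mean_hz, sigma_hz) in CONDITIONS.items()
}

for name, spread in relative_spread.items():
    print(f"{name}: sigma/mean = {100.0 * spread:.2f}%")
```

With these figures the relative spread stays between roughly 1% and 2% of the mean in all three conditions.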

# **Appendix. E Inductor Modelling**

Inductor modelling falls into two categories: library-based inductors and fully customized inductors. A library-based inductor usually uses thick metal to achieve a higher quality factor by lowering the metal resistance. The drawback is that it requires a larger layout area, which increases the parasitic capacitance and makes signal routing in high-speed designs more challenging. A fully customized inductor, on the other hand, offers a compromise between Q factor and layout area. In this section, the design flow for a customized inductor is introduced with a practical design example.

### E.1. Design Flow of Fully Customized Inductor

Fig. E-1 shows the design flow for a fully customized inductor.

The first step in designing a customized inductor is to draw the inductor layout in the Virtuoso layout editor. The key parameters that determine the performance of the inductor include the metal width, metal spacing, number of turns, inner radius and guard-ring distance. For instance, a larger inner radius increases the inductance. However, the overall dimension also increases, which leads to a larger chip area and more parasitic effects. Once the layout is complete, the next step is to stream it out in GDSII format, a step that uses the layer map of the technology process.



Fig. E - 1 Design Flow of Fully Customized Inductor

Before importing the inductor's GDS file into Advanced Design System (ADS) for S-parameter simulation, it is important to build the substrate layer map in ADS so that the layers used in the GDS file are mapped correctly. The generation of the technology files required for the substrate layer map is described in [168], and the approach to building the substrate layers is described in [169]. After the GDS file is imported into ADS, the S-parameter simulation can be run by setting up the ADS-EM environment. ADS provides various mathematical modelling templates to evaluate the S-parameter performance. For a single-ended spiral inductor, the inductance and Q factor can be plotted using the 2-port spiral model. Lastly, to simulate the VCO with the customized inductor, the S-parameter data of the inductor is converted to an SNP format file, where N represents the number of ports. For example, a single-ended inductor has 2 ports, so its S-parameter data is exported in S2P format. The SNP file is then imported into the SpectreRF simulator in the Virtuoso platform by creating a new inductor schematic and symbol. Evaluating the performance of the VCO then shows whether the current inductance and Q factor meet the performance requirement. If the inductance needs to be changed, the common approach is to modify the inductor parameters at the layout drawing stage and repeat the flow. Therefore, careful consideration is required to trade off the inductance, Q factor and inductor area when applying the fully customized inductor technique.

### E.2. Design Example

To demonstrate the design flow of a fully customized inductor, a design example is carried out in a 65nm technology process. Fig. E-2(a) shows the layout of a two-port single-ended spiral inductor in the Virtuoso layout editor. The top two metal layers (metal 9 and metal 8) are selected in consideration of the Q factor and the current density of each metal layer. Metal 9 is thicker than metal 8 and therefore has lower resistance. Consequently, to carry sufficient current, the width of metal 8 is made slightly larger than that of metal 9. In addition, to accommodate the 2.5-turn structure within the inductor area, the metal 8 and metal 9 windings are designed to overlap.



Fig. E - 2 (a) Layout of the Two-port Single-ended Spiral Inductor in the Virtuoso Layout Editor (b) 3D View of the Inductor in ADS

The overlap of the two metal layers can be seen more clearly in fig. E-2(b), which is the three-dimensional view of the inductor constructed in ADS. The two metal layers are connected through multiple vias. The overall area of the fully customized inductor is $20\,\mu\text{m} \times 20\,\mu\text{m}$. Compared with the library-based counterpart ($73\,\mu\text{m} \times 75\,\mu\text{m}$) with the same parameter values, the area is reduced by approximately 90%.
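The quoted area saving follows directly from the two footprints. The short calculation below uses only the dimensions given above; the variable names are hypothetical.

```python
# Footprints quoted in the text, in micrometres
CUSTOM_SIDE_UM = (20.0, 20.0)
LIBRARY_SIDE_UM = (73.0, 75.0)

custom_area_um2 = CUSTOM_SIDE_UM[0] * CUSTOM_SIDE_UM[1]
library_area_um2 = LIBRARY_SIDE_UM[0] * LIBRARY_SIDE_UM[1]

# Fraction of the library-based footprint that is saved
area_reduction = 1.0 - custom_area_um2 / library_area_um2

print(f"Custom inductor area:  {custom_area_um2:.0f} um^2")
print(f"Library inductor area: {library_area_um2:.0f} um^2")
print(f"Area reduction:        {100.0 * area_reduction:.1f}%")
```

The computed reduction is slightly above 90%, consistent with the approximate figure stated in the text.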

After the GDS data of the inductor is exported and imported into the ADS simulation environment, the S-parameter analysis can be conducted by setting up ADS-EM. Fig. E-3 illustrates the substrate layer map for the 65nm technology process. The conductor layers of the inductor (metal 8 and metal 9) are highlighted along with the thickness of each layer as defined by the process.



Fig. E - 3 Substrate Layer Map based on 65nm Technology Process

From the S-parameter analysis, the effective inductance and quality factor can be obtained using the 2-port spiral model; the results over a 50GHz frequency range are illustrated in fig. E-4(a) and fig. E-4(b) respectively. The inductance varies with the operating frequency, with a maximum difference of about 7pH. When the VCO operates at different oscillation frequencies, the customized inductor therefore presents different inductance values, which leads to the non-linear influence shown in section 3.3. For wide frequency tuning applications, the non-linear characteristic caused by the inductor should therefore be considered, and a solution is required to compensate for its influence.
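As an illustration of how effective inductance and Q factor can be derived from 2-port S-parameters, the sketch below converts S to Y and applies the common one-port definitions $L_{eff} = \mathrm{Im}(1/Y_{11})/(2\pi f)$ and $Q = \mathrm{Im}(1/Y_{11})/\mathrm{Re}(1/Y_{11})$. These definitions are an assumption here; the ADS 2-port spiral template may use a different formulation. The test data is a synthetic series R-L element with a hypothetical 2Ω resistance, not the simulated inductor, and all function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two 2x2 matrices stored as (m11, m12, m21, m22)."""
    a11, a12, a21, a22 = a
    b11, b12, b21, b22 = b
    return (a11 * b11 + a12 * b21, a11 * b12 + a12 * b22,
            a21 * b11 + a22 * b21, a21 * b12 + a22 * b22)


def mat_inv(m):
    """Invert a 2x2 matrix stored as (m11, m12, m21, m22)."""
    m11, m12, m21, m22 = m
    det = m11 * m22 - m12 * m21
    return (m22 / det, -m12 / det, -m21 / det, m11 / det)


def s_to_y(s, z0=50.0):
    """Y = (1/z0) (I - S)(I + S)^-1 for a 2-port with real reference impedance z0."""
    s11, s12, s21, s22 = s
    i_minus_s = (1 - s11, -s12, -s21, 1 - s22)
    i_plus_s = (1 + s11, s12, s21, 1 + s22)
    return tuple(v / z0 for v in mat_mul(i_minus_s, mat_inv(i_plus_s)))


def y_to_s(y, z0=50.0):
    """S = (I - z0 Y)(I + z0 Y)^-1, used here only to build synthetic test data."""
    y11, y12, y21, y22 = y
    i_minus = (1 - z0 * y11, -z0 * y12, -z0 * y21, 1 - z0 * y22)
    i_plus = (1 + z0 * y11, z0 * y12, z0 * y21, 1 + z0 * y22)
    return mat_mul(i_minus, mat_inv(i_plus))


def inductance_and_q(y11, freq_hz):
    """Effective inductance (H) and Q from the input impedance 1/Y11."""
    z_in = 1.0 / y11  # impedance seen at port 1 with port 2 shorted
    return z_in.imag / (2.0 * math.pi * freq_hz), z_in.imag / z_in.real


# Synthetic series R-L element between the two ports (hypothetical values)
FREQ_HZ = 20e9
L_TRUE_H = 350e-12
R_SERIES_OHM = 2.0
z_series = complex(R_SERIES_OHM, 2.0 * math.pi * FREQ_HZ * L_TRUE_H)
y_series = (1 / z_series, -1 / z_series, -1 / z_series, 1 / z_series)

s_params = y_to_s(y_series)
l_eff_h, q_factor = inductance_and_q(s_to_y(s_params)[0], FREQ_HZ)
print(f"L_eff = {l_eff_h * 1e12:.1f} pH, Q = {q_factor:.1f}")
```

Applying `inductance_and_q` at every frequency point of an exported S2P data set would give curves of the kind plotted in fig. E-4, under the stated definitions.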



Fig. E - 4 S-parameter Results over 50GHz Frequency Range (a) Effective Inductance (b) Quality Factor

To better understand the trade-off between inductance, quality factor and inductor area, inductors of different sizes are investigated together with their corresponding effective inductance and Q factor. Assuming an operating frequency of 20GHz, the resulting inductance and Q factor are illustrated in fig. E-5(a) and fig. E-5(b).



Fig. E - 5 (a) Effective Inductance scaling with Inductor Size (b) Q Factor scaling with Inductor Size

As can be seen, the Q factor improves with a larger inductor area, which benefits from the larger inner radius. However, to further increase the oscillation frequency of the VCO, the inductance must be decreased so that the resonant peak moves toward a higher frequency range. There are two approaches to reducing the inductance. One is to reduce the number of turns; however, this approach is limited by the physical layout. The other is to decrease the size of the inductor; however, as shown in fig. E-5(b), the Q factor degrades with a smaller inductor area, which affects the phase noise performance of the VCO. Therefore, careful consideration is necessary when designing the VCO with a fully customized inductor.
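The relation between inductance and the position of the resonant peak can be illustrated with the ideal LC resonance $f_0 = 1/(2\pi\sqrt{LC})$. This is a simplified sketch: the 100fF load capacitance is a hypothetical value chosen only for illustration, and the actual peaking network of the VCO is not modelled.

```python
import math


def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Ideal LC resonance f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


C_LOAD_F = 100e-15  # hypothetical load capacitance, not a value from the design

for inductance_ph in (350, 300, 250, 200):
    f0_hz = resonant_frequency_hz(inductance_ph * 1e-12, C_LOAD_F)
    print(f"L = {inductance_ph} pH -> f0 = {f0_hz / 1e9:.1f} GHz")
```

Because $f_0$ scales with $1/\sqrt{L}$, halving the inductance raises the resonance by a factor of only about 1.41, so large frequency gains require substantial inductance reductions and hence the Q factor penalty discussed above.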

### References

- [1] Cisco. "The Zettabyte Era: Trends and Analysis," https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/vni-hyperconnectivity-wp.html#\_Toc484556817.
- [2] I. A. Young, E. Mohammed, J. T. S. Liao, A. M. Kern, S. Palermo, B. A. Block, M. R. Reshotko, and P. L. D. Chang, "Optical I/O Technology for Tera-Scale Computing," *IEEE Journal of Solid-State Circuits*, vol. 45, no. 1, pp. 235-248, 2010.
- [3] F. O'Mahony, G. Balamurugan, J. E. Jaussi, J. Kennedy, M. Mansuri, S. Shekhar, and B. Casper, "The future of electrical I/O for microprocessors," *IEEE. International Symposium on VLSI Design, Automation and Test*, pp. 31-34, 2009.
- [4] R. Navid, E. H. Chen, M. Hossain, B. Leibowitz, J. Ren, C.-h. A. Chou, B. Daly, M. Aleksic, B. Su, S. Li, M. Shirasgaonkar, F. Heaton, J. Zerbe, and J. Eble, "A 40 Gb/s Serial Link Transceiver in 28 nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 814-827, 2015.
- [5] M. Zuffada, "The industrialization of the Silicon Photonics: Technology road map and applications," *IEEE Proceedings of the European Solid-State Device Research Conference (ESSDERC)*, pp. 7-13, 2012.
- [6] B. Razavi, *Design of Integrated Circuits for Optical Communications*, John Wiley & Sons, 2012.
- [7] D. J. Lockwood, *Silicon photonics II: Components and Integration*, Berlin Heidelberg: Springer-Verlag, 2011.
- [8] D. Thomson, A. Zilkie, J. E. Bowers, T. Komljenovic, G. T. Reed, L. Vivien, D. Marris-Morini, E. Cassan, L. Virot, J.-M. Fédéli, J.-M. Hartmann, J. H. Schmid, D.-X. Xu, F. Boeuf, P. O'Brien, G. Z. Mashanovich, and M. Nedeljkovic, "Roadmap on silicon photonics," *Journal of Optics*, vol. 18, no. 7, pp. 073003, 2016.
- [9] G. T. Reed, and A. P. Knights, *Silicon Photonics : An Introduction*, England: John Wiley & Sons, Ltd, 2004.
- [10] C. Sun, M. T. Wade, Y. Lee, J. S. Orcutt, L. Alloatti, M. S. Georgas, A. S. Waterman, J. M. Shainline, R. R. Avizienis, S. Lin, B. R. Moss, R. Kumar, F. Pavanello, A. H. Atabaki, H. M. Cook, A. J. Ou, J. C. Leu, Y. H. Chen, K. Asanovic, R. J. Ram, M. A. Popovic, and V. M. Stojanovic, "Single-chip microprocessor that communicates directly using light," *Nature*, vol. 528, no. 7583, pp. 534-8, Dec 24, 2015.
- [11] H. Byun, J. Bok, K. Cho, K. Cho, H. Choi, J. Choi, S. Choi, S. Han, S. Hong, S. Hyun, T. J. Jeong, H.-C. Ji, I.-S. Joe, B. Kim, D. Kim, J. Kim, J.-K. Kim, K. Kim, S.-G. Kim, D. Kong, B. Kuh, H. Kwon, B. Lee, H. Lee, K. Lee, S. Lee, K. Na, J. Nam, A. Nejadmalayeri, Y. Park, S. Parmar, J. Pyo, D. Shin, J. Shin, Y.-h. Shin, S.-D. Suh, H. Yoon, Y. Park, J. Choi, K.-H. Ha, and G. Jeong, "Bulk-Si photonics technology for DRAM interface [Invited]," *Photonics Research*, vol. 2, no. 3, pp. A25, 2014.
- [12] G. S. Jeong, W. Bae, and D. K. Jeong, "Review of CMOS Integrated Circuit Technologies for High-Speed Photo-Detection," *Sensors (Basel)*, vol. 17, no. 9, Aug 25, 2017.
- [13] M. Rakowski, M. Pantouvaki, P. De Heyn, P. Verheyen, M. Ingels, H. Chen, J. De Coster, G. Lepage, B. Snyder, K. De Meyer, M. Steyaert, N. Pavarelli, J. S. Lee, P. O'Brien, P. Absil, and J. Van Campenhout, "A 4x20Gb/s WDM ring-based hybrid CMOS silicon photonics transceiver," *IEEE International Solid State Circuits Conference (ISSCC)*, pp. 1-3, 2015.
- [14] C. Sun, M. Georgas, J. Orcutt, B. Moss, Y.-H. Chen, J. Shainline, M. Wade, K. Mehta, K. Nammari, E. Timurdogan, D. Miller, O. Tehar-Zahav, Z. Sternberg, J. Leu, J. Chong, R. Bafrali, G. Sandhu, M. Watts, R. Meade, M. Popovic, R. Ram, and V. Stojanovic, "A Monolithically-Integrated Chip-to-Chip Optical Link in Bulk CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 828-844, 2015.
- [15] H. Morita, K. Uchino, E. Otani, H. Ohtorii, T. Ogura, K. Oniki, S. Oka, S. Yanagawa, and H. Suzuki, "A 12×5 two-dimensional optical I/O array for 600Gb/s chip-to-chip interconnect in 65nm CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 140-141, 2014.
- [16] Y. Chen, M. Kibune, A. Toda, A. Hayakawa, T. Akiyama, S. Sekiguchi, H. Ebe, N. Imaizumi, T. Akahoshi, S. Akiyama, S. Tanaka, T. Simoyama, K. Morito, T. Yamamoto, T. Mori, Y. Koyanagi, and H. Tamura, "A 25Gb/s hybrid integrated silicon photonic transceiver in 28nm CMOS and SOI," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 1-3, 2015.
- [17] M. Vanhoecke, A. Aimone, N. Argyris, S. Dris, R. Vaernewyck, K. Verheyen, M. Gruner, G. Fiol, D. Apostolopoulos, H. Avramopoulos, G. Torfs, X. Yin, and J. Bauwelinck, "Segmented Optical Transmitter Comprising a CMOS Driver Array and an InP IQ-MZM for Advanced Modulation Formats," *Journal of Lightwave Technology*, vol. 35, no. 4, pp. 862-867, 2017.
- [18] D. J. Thomson, F. Y. Gardes, J.-M. Fedeli, S. Zlatanovic, Y. Hu, B. P. P. Kuo, E. Myslivets, N. Alic, S. Radic, G. Z. Mashanovich, and G. T. Reed, "50-Gb/s Silicon Optical Modulator," *IEEE Photonics Technology Letters*, vol. 24, no. 4, pp. 234-236, 2012.
- [19] E. Temporiti, G. Minoia, M. Repossi, D. Baldi, A. Ghilioni, and F. Svelto, "A 56Gb/s 300mW silicon-photonics transmitter in 3D-integrated PIC25G and 55nm BiCMOS technologies," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 404-405, 2016.
- [20] J. Kim, and J. F. Buckwalter, "A 40-Gb/s optical transceiver front-end in 45nm SOI CMOS technology," pp. 1-4, 2010.
- [21] J. F. Buckwalter, X. Zheng, G. Li, K. Raj, and A. V. Krishnamoorthy, "A Monolithic 25-Gb/s Transceiver With Photonic Ring Modulators and Ge Detectors in a 130-nm CMOS SOI Process," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 6, pp. 1309-1322, 2012.
- [22] P. Winzer, "Beyond 100G Ethernet," *IEEE Communications Magazine*, vol. 48, no. 7, pp. 26-30, 2010.
- [23] T. Kato, "InP modulators with linear accelerator like segmented electrode structure," *Optical Fiber Communications Conference and Exhibition (OFC)*, 2014, pp. Tu3H.1, 2014.
- [24] A. Mahendra, D. M. Gill, C. Xiong, J. S. Orcutt, B. G. Lee, T. N. Huynh, and J. E. Proesel, "Monolithically integrated CMOS nanophotonic segmented Mach Zehnder transmitter," in Lasers and Electro-Optics (CLEO), 2017.
- [25] Z. Yong, S. Shopov, J. C. Mikkelsen, R. Mallard, J. C. C. Mak, and S. P. Voinigescu, "A 44Gbps high extinction ratio silicon Mach-Zehnder modulator with a 3D-integrated 28nm FD-SOI CMOS driver," in Optical Fiber Communications Conference and Exhibition (OFC), 2017.
- [26] M. Harwood, S. Nielsen, A. Szczepanek, R. Allred, S. Batty, M. Case, S. Forey, K. Gopalakrishnan, L. Kan, B. Killips, P. Mishra, R. Pande, H. Rategh, A. Ren, J. Sanders, A. Schoy, R. Ward, M. Wetterhorn, and N. Yeung, "A 225mW 28Gb/s SerDes in 40nm CMOS with 13dB of analog equalization for 100GBASE-LR4 and optical transport lane 4.4 applications," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 326-327, 2012.
- [27] T. Takemoto, H. Yamashita, T. Kamimura, F. Yuki, N. Masuda, H. Toyoda, N. Chujo, K. Kogo, Y. Lee, S. Tsuji, and S. Nishimura, "A 25-Gb/s 2.2-W optical transceiver using an analog FE tolerant to power supply noise and redundant data format conversion in 65-nm CMOS," pp. 106-107, 2012.
- [28] K. Li, "Design and Analysis of High Performance of Low Noise Oscillators and Phase Locked Loops," PhD Dissertation, University of Southampton, 2008.
- [29] H. Jeong, J. Park, S. Ann, and N. Kim, "Integrated High Speed Current-Mode Frequency Divider with Inductive Peaking Structure," *European Modelling Symposium (EMS)*, pp. 479-483, 2014.
- [30] J. Sheng-Lyang, L. Chi-Wen, L. Cheng-Chen, and J. Miin-Horng, "An Active-Inductor Injection Locked Frequency Divider With Variable Division Ratio," *IEEE Microwave and Wireless Components Letters*, vol. 19, no. 1, pp. 39-41, 2009.
- [31] R. Bhattacharya, A. Basu, and S. K. Koul, "A Highly Linear CMOS Active Inductor and Its Application in Filters and Power Dividers," *IEEE Microwave and Wireless Components Letters*, vol. 25, no. 11, pp. 715-717, 2015.
- [32] "200 Gb/s and 400 Gb/s Ethernet Task Force," http://www.ieee802.org/3/bs/index.html.
- [33] "50 Gb/s, 100 Gb/s, and 200 Gb/s Ethernet Task Force," http://www.ieee802.org/3/cd/index.html.
- [34] M.-S. Chen, Y.-N. Shih, C.-L. Lin, H.-W. Hung, and J. Lee, "A Fully-Integrated 40-Gb/s Transceiver in 65-nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 47, no. 3, pp. 627-640, 2012.
- [35] L. Yue, and E. Alon, "A 66Gb/s 46mW 3-tap decision-feedback equalizer in 65nm CMOS," pp. 30-31, 2013.
- [36] P.-C. Chiang, H.-W. Hung, H.-Y. Chu, G.-S. Chen, and J. Lee, "60Gb/s NRZ and PAM4 transmitters for 400GbE in 65nm CMOS," pp. 42-43, 2014.

- [37] M.-S. Chen, and C.-K. K. Yang, "A 50-64 Gb/s Serializing Transmitter With a 4-Tap, LC-Ladder-Filter-Based FFE in 65 nm CMOS Technology," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 8, pp. 1903-1916, 2015.
- [38] B. Song, K. Kim, J. Lee, and J. Burm, "A 0.18-um CMOS 10-Gb/s Dual-Mode 10-PAM Serial Link Transceiver," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 2, pp. 457-468, 2013.
- [39] J. Lee, P.-C. Chiang, P.-J. Peng, L.-Y. Chen, and C.-C. Weng, "Design of 56 Gb/s NRZ and PAM4 SerDes Transceivers in CMOS Technologies," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 9, pp. 2061-2073, 2015.
- [40] F. Lv, X. Zheng, S. Yuan, Z. Wang, Y. He, C. Zhang, Z. Wang, F. Lv, and J. Wang, "A 40–80 Gb/s PAM4 wireline transmitter in 65nm CMOS technology," pp. 539-542, 2017.
- [41] B. Analui, D. Guckenberger, D. Kucharski, and A. Narasimha, "A Fully Integrated 20-Gb/s Optoelectronic Transceiver Implemented in a Standard 0.13-um CMOS SOI Technology," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 12, pp. 2945-2955, 2006.
- [42] A. Narasimha, B. Analui, Y. Liang, T. J. Sleboda, S. Abdalla, E. Balmater, S. Gloeckner, D. Guckenberger, M. Harrison, R. G. M. P. Koumans, D. Kucharski, A. Mekis, S. Mirsaidi, D. Song, and T. Pinguet, "A Fully Integrated 4x 10-Gb/s DWDM Optoelectronic Transceiver Implemented in a Standard 0.13 um CMOS SOI Technology," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 12, pp. 2736-2744, 2007.
- [43] Y. Chen, M. Kibune, A. Toda, A. Hayakawa, T. Akiyama, S. Sekiguchi, H. Ebe, N. Imaizumi, T. Akahoshi, S. Akiyama, S. Tanaka, T. Simoyama, K. Morito, T. Yamamoto, T. Mori, Y. Koyanagi, and H. Tamura, "A 25Gb/s hybrid integrated silicon photonic transceiver in 28nm CMOS and SOI," pp. 1-3, 2015.
- [44] T. Yagisawa, T. Shiraishi, M. Sugawara, and K. Tanaka, "Novel trace design for high data-rate multi-channel optical transceiver assembled using flip-chip bonding," pp. 1048-1053, 2014.
- [45] C. Sun, M. T. Wade, Y. Lee, J. S. Orcutt, L. Alloatti, M. S. Georgas, A. S. Waterman, J. M. Shainline, R. R. Avizienis, S. Lin, B. R. Moss, R. Kumar, F. Pavanello, A. H. Atabaki, H. M. Cook, A. J. Ou, J. C. Leu, Y. H. Chen, K. Asanovic, R. J. Ram, M. A. Popovic, and V. M. Stojanovic, "Single-chip microprocessor that communicates directly using light," *Nature*, vol. 528, 2015.
- [46] Z. Jin, Y. Xiaobao, H. Siyang, S. Ying, W. Ziqiang, W. Jia, and C. Baoyong, "A 1.5–1.9GHz phase-locked loop (PLL) frequency synthesizer with AFC and Σ-Δ modulator for Sub-GHz wireless transceiver," *IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)* pp. 1-3, 2014.
- [47] Y. Pan, Y. Huang, and Z. Hong, "A 3~5-GHz low-phase-noise fractional-N frequency synthesizer with AFC for GSM/PCS/DCS/WCDMA tranceivers," *IEEE International Symposium on Radio-Frequency Integration Technology (RFIT)*, pp. 53-56, 2011.
- [48] D. M. Fischette, A. L. S. Loke, M. M. Oshima, B. A. Doyle, R. Bakalski, R. J. DeSantis, A. Thiruvengadam, C. L. Wang, G. R. Talbot, and E. S. Fang, "A 45nm SOI-CMOS dual-PLL processor clock system for multi-protocol I/O," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 246-247, 2010.

- [49] A. Elkholy, S. Saxena, R. K. Nandwana, A. Elshazly, and P. K. Hanumolu, "A 2.0–5.5 GHz Wide Bandwidth Ring-Based Digital Fractional-N PLL With Extended Range Multi-Modulus Divider," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 8, pp. 1771-1784, 2016.
- [50] S. Min, T. Copani, S. Kiaei, and B. Bakkaloglu, "A 90-nm CMOS 5-GHz Ring-Oscillator PLL With Delay-Discriminator-Based Active Phase-Noise Cancellation," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 5, pp. 1151-1160, 2013.
- [51] K. Sogo, A. Toya, and T. Kikkawa, "A ring-VCO-based sub-sampling PLL CMOS circuit with -119 dBc/Hz phase noise and 0.73 ps jitter," *Proceedings of the ESSCIRC (ESSCIRC)*, pp. 253-256, 2012.
- [52] D. Wei, A. Musa, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A 0.022mm2 970μW dual-loop injection-locked PLL with −243dB FOM using synthesizable all-digital PVT calibration circuits," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 248-249, 2013.
- [53] A. Musa, W. Deng, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A Compact, Low-Power and Low-Jitter Dual-Loop Injection Locked PLL Using All-Digital PVT Calibration," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 1, pp. 50-60, 2014.
- [54] L. Changzhi, and L. Jenshan, "A 1–9 GHz Linear-Wide-Tuning-Range Quadrature Ring Oscillator in 130 nm CMOS for Non-Contact Vital Sign Radar Application," *IEEE Microwave and Wireless Components Letters*, vol. 20, no. 1, pp. 34-36, 2010.
- [55] L. Xian, L. Wenyuan, and W. Zhigong, "A wide tuning range LC-VCO using switched capacitor array technique," *International Symposium on Signals Systems and Electronics (ISSSE)*, 2010.
- [56] B. Sadhu, S. Kalia, and R. Harjani, "A 3-band switched-inductor LC VCO and differential current re-use doubler achieving 0.7-to-11.6 GHz tuning range," *IEEE Radio Frequency Integrated Circuits Symposium (RFIC)*, pp. 191-194, 2015.
- [57] D. Yikui Jen, and F. Zhong, "A self-calibrating multi-VCO PLL scheme with leakage and capacitive modulation mitigations," *IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 1400-1403, 2013.
- [58] L. Xu, K. Stadius, and J. Ryynanen, "A wide-band digitally controlled ring oscillator," *International Symposium on Circuits and Systems (ISCAS)*, pp. 1983-1986, 2010.
- [59] Y. A. Eken, and J. P. Uyemura, "A 5.9-GHz Voltage-Controlled Ring Oscillator in 0.18um CMOS," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 1, pp. 230-233, 2004.
- [60] J. Liang, Z. Zhou, J. Han, and D. G. Elliott, "A 6.0–13.5 GHz Alias-Locked Loop Frequency Synthesizer in 130 nm CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 1, pp. 108-115, 2013.
- [61] S. Kamran, and N. Ghaderi, "A novel high speed CMOS pseudo-differential ring VCO with wide tuning control voltage range," *Iranian Conference on Electrical Engineering (ICEE)*, pp. 201-204, 2017.
- [62] S. y. Lee, S. Amakawa, N. Ishihara, and K. Masu, "Low-phase-noise wide-frequency-range ring-VCO-based scalable PLL with subharmonic injection locking in 0.18um CMOS," *MTT-S International Microwave Symposium Digest (MTT)*, pp. 1178-1181, 2010.

- [63] L. Lu, C. Li, and J. Lin, "A regulated 3.1–10.6 GHz linear dual-tuning differential ring oscillator for UWB applications," *International Symposium on Circuits and Systems (ISCAS)*, pp. 225-228, 2011.
- [64] S. Yoo, J. J. Kim, and J. Choi, "A 2–8 GHz Wideband Dually Frequency-Tuned Ring-VCO With a Scalable KVCO," *IEEE Microwave and Wireless Components Letters*, vol. 23, no. 11, pp. 602-604, 2013.
- [65] W. S. T. Yan, and H. C. Luong, "A 900-MHz CMOS low-phase-noise voltage-controlled ring oscillator," *IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing*, vol. 48, no. 2, pp. 216-221, 2001.
- [66] J. Jalil, M. B. I. Reaz, and M. A. M. Ali, "CMOS Differential Ring Oscillators: Review of the Performance of CMOS ROs in Communication Systems," *IEEE Microwave Magazine*, vol. 14, no. 5, pp. 97-109, 2013.
- [67] S. Meng-Hung, L. Po-Hsiang, and H. Po-Chiun, "A 1-V CMOS Pseudo-Differential Amplifier With Multiple Common-Mode Stabilization and Frequency Compensation Loops," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 55, no. 5, pp. 409-413, 2008.
- [68] A. E. Siegman, *Lasers*, Mill Valley, CA: University Science Books, 1986.
- [69] B. Mesgarzadeh, and A. Alvandpour, "A Study of Injection Locking in Ring Oscillators," *IEEE International Symposium on Circuits and Systems, ISCAS*. pp. 5465-5468, 2005.
- [70] B. Razavi, "A study of injection locking and pulling in oscillators," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1415-1424, 2004.
- [71] L. Hai Qi, G. Wang Ling, L. Siek, L. Wei Meng, and Z. Yue Ping, "A Low-Noise Multi-GHz CMOS Multiloop Ring Oscillator With Coarse and Fine Frequency Tuning," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 17, no. 4, pp. 571-577, 2009.
- [72] Z. Shu, K. L. Lee, and B. H. Leung, "A 2.4-GHz Ring-Oscillator-Based CMOS Frequency Synthesizer With a Fractional Divider Dual-PLL Architecture," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 3, pp. 452-462, 2004.
- [73] Z. Huang, H. C. Luong, B. Chi, Z. Wang, and H. Jia, "A 70.5-to-85.5GHz 65nm phase-locked loop with passive scaling of loop filter," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 1-3, 2015.
- [74] B. Catli, A. Nazemi, T. Ali, S. Fallahi, Y. Liu, J. Kim, M. Abdul-Latif, M. R. Ahmadi, H. Maarefi, A. Momtaz, and N. Kocaman, "A Sub-200 fs RMS jitter capacitor multiplier loop filter-based PLL in 28 nm CMOS for high-speed serial communication applications," *IEEE Custom Integrated Circuits Conference (CICC)*, pp. 1-4, 2013.
- [75] W. Wanghua, B. Xuefei, R. B. Staszewski, and J. R. Long, "A 56.4-to-63.4GHz spurious-free all-digital fractional-N PLL in 65nm CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 352-353, 2013.
- [76] C. Ching-Che, and L. Chen-Yi, "An all-digital phase-locked loop for high-speed clock generation," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 2, pp. 347-351, 2003.
- [77] N. Pavlovic, and J. Bergervoet, "A 5.3GHz digital-to-time-converter-based fractional-N all-digital PLL," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 54-56, 2011.
- [78] Z. Cao, Y. Li, and S. Yan, "A 0.4 ps-RMS-Jitter 1–3 GHz Ring-Oscillator PLL Using Phase-Noise Preamplification," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 9, pp. 2079-2089, 2008.

- [79] S. Sidiropoulos, L. Dean, K. Jaeha, W. Guyeon, and M. Horowitz, "Adaptive bandwidth DLLs and PLLs using regulated supply CMOS buffers," *Digest of Technical Papers Symposium on VLSI Circuits*, pp. 124-127, 2000.
- [80] J. G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 11, pp. 1723-1732, 1996.
- [81] J. G. Maneatis, K. Jaeha, I. McClatchie, J. Maxey, and M. Shankaradas, "Self-biased, high-bandwidth, low-jitter 1-to-4096 multiplier clock-generator PLL," *IEEE International Digest of Technical Papers. ISSCC* vol. 1, pp. 424-504, 2003.
- [82] R. Nonis, N. Da Dalt, P. Palestri, and L. Selmi, "Modeling, design and characterization of a new low-jitter analog dual tuning LC-VCO PLL architecture," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 6, pp. 1303-1309, 2005.
- [83] S. Choi, S. Yoo, Y. Lim, and J. Choi, "A PVT-Robust and Low-Jitter Ring-VCO-Based Injection-Locked Clock Multiplier With a Continuous Frequency-Tracking Loop Using a Replica-Delay Cell and a Dual-Edge Phase Detector," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 8, pp. 1878-1889, 2016.
- [84] C.-F. Liang, and K.-J. Hsiao, "An injection-locked ring PLL with self-aligned injection window," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 90-92, 2011.
- [85] A. Arakali, S. Gondi, and P. Kumar Hanumolu, "Low-Power Supply-Regulation Techniques for Ring Oscillators in Phase-Locked Loops Using a Split-Tuned Architecture," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 8, pp. 2169-2181, 2009.
- [86] L. Bizjak, N. Da Dalt, P. Thurner, R. Nonis, P. Palestri, and L. Selmi, "Comprehensive Behavioral Modeling of Conventional and Dual-Tuning PLLs," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 6, pp. 1628-1638, 2008.
- [87] A. Sai, Y. Kobayashi, S. Saigusa, O. Watanabe, and T. Itakura, "A digitally stabilized type-III PLL using ring VCO with 1.01psrms integrated jitter in 65nm CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 248-250, 2012.
- [88] J. Lee, and H. Wang, "Study of Subharmonically Injection-Locked PLLs," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 5, pp. 1539-1553, 2009.
- [89] X. Zheng, F. Lv, F. Zhao, S. Yue, C. Zhang, Z. Wang, F. Li, H. Jiang, and Z. Wang, "A 10 GHz 56 fsrms-integrated-jitter and -247 dB FOM ring-VCO based injection-locked clock multiplier with a continuous frequency-tracking loop in 65 nm CMOS," *IEEE Custom Integrated Circuits Conference (CICC)*, pp. 1-4, 2017.
- [90] J.-C. Chien, P. Upadhyaya, H. Jung, S. Chen, W. Fang, A. M. Niknejad, J. Savoj, and K. Chang, "A pulse-position-modulation phase-noise-reduction technique for a 2-to-16GHz injection-locked ring oscillator in 20nm CMOS," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 52-53, 2014.
- [91] Y.-C. Huang, and S.-I. Liu, "A 2.4GHz sub-harmonically injection-locked PLL with self-calibrated injection timing," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 338-340, 2012.
- [92] I. T. Lee, C. Yen-Jen, L. Shen-Iuan, J. Chewn-Pu, H. Fu-Lung, and H. Hsieh-Hung, "A divider-less sub-harmonically injection-locked PLL with self-adjusted injection timing," *IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC)*, pp. 414-415, 2013.
- [93] B. Sadhu, M. A. Ferriss, A. S. Natarajan, S. Yaldiz, J.-O. Plouchart, A. V. Rylyakov, A. Valdes-Garcia, B. D. Parker, A. Babakhani, S. Reynolds, X. Li, L. Pileggi, R. Harjani, J. Tierno, and D. Friedman, "A linearized, low-phase-noise VCO-based 25-GHz PLL with autonomic biasing," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 5, pp. 1138-1150, 2013.
- [94] M. Ferriss, A. Rylyakov, H. Ainspan, J. Tierno, and D. Friedman, "A 28GHz hybrid PLL in 32nm SOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, 2014.
- [95] S. Keliu, and S. Edgar, *CMOS PLL Synthesizers: Analysis and Design*, United States of America: Springer Science, 2005.
- [96] S. Aniruddhan, S. Shekhar, and D. J. Allstot, "A CMOS 1.6 GHz Dual-Loop PLL With Fourth-Harmonic Mixing," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 58, no. 5, pp. 860-867, 2011.
- [97] X. Gai, S. Chartier, A. Trasser, and H. Schumacher, "A 35 GHz dual-loop PLL with low phase noise and fast lock for millimeter wave applications," *IEEE MTT-S International Microwave Symposium Digest (MTT)*, pp. 1-4, 2011.
- [98] X. Gai, A. Trasser, and H. Schumacher, "A fully integrated low phase noise, fast locking, 31 to 34.9 GHz dual-loop PLL," *41st European Microwave Conference (EuMC)*, 2011.
- [99] K. Sogo, A. Toya, and T. Kikkawa, "A ring-VCO-based sub-sampling PLL CMOS circuit with -119 dBc/Hz phase noise and 0.73 ps jitter," *Proceedings of the ESSCIRC (ESSCIRC)*, pp. 253-256, 2012.
- [100] X. Gao, E. A. M. Klumperink, M. Bohsali, and B. Nauta, "A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by $N^{2}$," *IEEE Journal of Solid-State Circuits*, vol. 44, no. 12, pp. 3253-3263, 2009.
- [101] W. Deng, D. Yang, T. Ueno, T. Siriburanon, S. Kondo, K. Okada, and A. Matsuzawa, "A Fully Synthesizable All-Digital PLL With Interpolative Phase Coupled Oscillator, Current-Output DAC, and Fine-Resolution Digital Varactor Using Gated Edge Injection Technique," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 1, pp. 68-80, 2015.
- [102] Y. Lee, M. Kim, T. Seong, and J. Choi, "A Low Phase Noise Injection-Locked Programmable Reference Clock Multiplier With a Two-Phase PVT-Calibrator for sigma-delta PLLs," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 3, pp. 635-644, 2015.
- [103] W. Deng, A. Musa, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A dual-loop injection-locked PLL with all-digital background calibration system for on-chip clock generation," *19th Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 21-22, 2014.
- [104] M. Kim, S. Choi, and J. Choi, "A 450-fs jitter PVT-robust fractional-resolution injection-locked clock multiplier using a DLL-based calibrator with replica-delay-cells," *IEEE Symposium on VLSI Circuits (VLSI Circuits)*, pp. C142-C143, 2015.
- [105] M. Kim, S. Choi, T. Seong, and J. Choi, "A Low-Jitter and Fractional-Resolution Injection-Locked Clock Multiplier Using a DLL-Based Real-Time PVT Calibrator With Replica-Delay Cells," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 2, pp. 401-411, 2016.

- [106] R. Farjad-Rad, W. Dally, N. Hiok-Tiaq, R. Senthinathan, M. J. E. Lee, R. Rathi, and J. Poulton, "A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 12, pp. 1804-1812, 2002.
- [107] A. Elshazly, R. Inti, B. Young, and P. K. Hanumolu, "Clock Multiplication Techniques Using Digital Multiplying Delay-Locked Loops," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 6, pp. 1416-1428, 2013.
- [108] S. Levantino, G. Marucci, G. Marzin, A. Fenaroli, C. Samori, and A. L. Lacaita, "A 1.7 GHz Fractional-N Frequency Synthesizer Based on a Multiplying Delay-Locked Loop," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 11, pp. 2678-2691, 2015.
- [109] B. M. Helal, M. Z. Straayer, G.-Y. Wei, and M. H. Perrott, "A Highly Digital MDLL-Based Clock Multiplier That Leverages a Self-Scrambling Time-to-Digital Converter to Achieve Subpicosecond Jitter Performance," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 855-863, 2008.
- [110] S. Huang, J. Cao, and M. M. Green, "An 8.2 Gb/s-to-10.3 Gb/s Full-Rate Linear Referenceless CDR Without Frequency Detector in 0.18 um CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 9, pp. 2048-2060, 2015.
- [111] L. Jri, K. S. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 9, pp. 1571-1580, 2004.
- [112] Y. Chen, Z. Wang, X. Fan, H. Wang, and W. Li, "A 38 Gb/s to 43 Gb/s Monolithic Optical Receiver in 65 nm CMOS Technology," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 12, pp. 3173-3181, 2013.
- [113] J. Lee, and B. Razavi, "A 40 Gb/s clock and data recovery circuit in 0.18 μm CMOS technology," *IEEE International Digest of Technical Papers Solid-State Circuits Conference (ISSCC)*, vol. 1, pp. 242-491, 2003.
- [114] J. W. Jung, and B. Razavi, "A 25-Gb/s 5-mW CMOS CDR/Deserializer," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 3, pp. 684-697, 2013.
- [115] "10 Gb/s Bang-Bang Clock and Data Recovery (CDR) for optical transmission systems," https://www.adv-radio-sci.net/3/293/2005/ars-3-293-2005.pdf.
- [116] J. Cao, M. Green, A. Momtaz, K. Vakilian, D. Chung, J. Keh-Chee, M. Caresosa, X. Wang, T. Wee-Guan, C. Yijun, L. Fujimori, and A. Hairapetian, "OC-192 transmitter and receiver in standard 0.18-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 12, pp. 1768-1780, 2002.
- [117] A. Momtaz, J. Cao, M. Caresosa, A. Hairapitian, D. Chung, K. Vakitian, M. Green, B. Tan, J. Keh-Chee, I. Fujimori, G. Gutierrez, and C. Yijun, "Fully-integrated SONET OC48 transceiver in standard CMOS," *IEEE International Digest of Technical Papers. Solid-State Circuits Conference (ISSCC)*, pp. 76-77, 2001.
- [118] G. Shu, S. Saxena, W.-S. Choi, M. Talegaonkar, R. Inti, A. Elshazly, B. Young, and P. K. Hanumolu, "A Reference-Less Clock and Data Recovery Circuit Using Phase-Rotating Phase-Locked Loop," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 4, pp. 1036-1047, 2014.
- [119] R. Inti, W. Yin, A. Elshazly, N. Sasidhar, and P. K. Hanumolu, "A 0.5-to-2.5 Gb/s Reference-Less Half-Rate Digital CDR With Unlimited Frequency Acquisition Range and Improved Input Duty-Cycle Error Tolerance," *IEEE Journal of Solid-State Circuits*, vol. 46, no. 12, pp. 3150-3162, 2011.

- [120] G. Shu, W.-S. Choi, S. Saxena, M. Talegaonkar, T. Anand, and A. Elkholy, "A 4-to-10.5 Gb/s Continuous-Rate Digital Clock and Data Recovery With Automatic Frequency Acquisition," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 2, pp. 428-439, 2016.
- [121] S. B. Anand, and B. Razavi, "A CMOS clock recovery circuit for 2.5-Gb/s NRZ data," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 3, pp. 432-439, 2001.
- [122] N. Kalantari, and J. F. Buckwalter, "A Multichannel Serial Link Receiver With Dual-Loop Clock-and-Data Recovery and Channel Equalization," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 11, pp. 2920-2931, 2013.
- [123] J. Lee, and M. Liu, "A 20-Gb/s Burst-Mode Clock and Data Recovery Circuit Using Injection-Locking Technique," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 3, pp. 619-630, 2008.
- [124] J. Lee, and M. Liu, "A 20Gb/s Burst-Mode CDR Circuit Using Injection-Locking Technique," *IEEE International Digest of Technical Papers. Solid-State Circuits Conference (ISSCC)*, pp. 46-586, 2007.
- [125] H. Kimura, P. M. Aziz, T. Jing, A. Sinha, S. P. Kotagiri, R. Narayan, H. Gao, P. Jing, G. Hom, A. Liang, E. Zhang, A. Kadkol, R. Kothari, G. Chan, Y. Sun, B. Ge, J. Zeng, K. Ling, M. C. Wang, A. Malipatil, L. Li, C. Abel, and F. Zhong, "A 28 Gb/s 560 mW Multi-Standard SerDes With Single-Stage Analog Front-End and 14-Tap Decision Feedback Equalizer in 28 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 49, no. 12, pp. 3091-3103, 2014.
- [126] M. Raj, S. Saeedi, and A. Emami, "A 4-to-11GHz injection-locked quarter-rate clocking for an adaptive 153fJ/b optical receiver in 28nm FDSOI CMOS," *IEEE International Solid-State Circuits Conference (ISSCC)*, pp. 1-3, 2015.
- [127] F. Lv, J. Wang, H. Wang, Z. Wang, Y. He, Y. Liu, C. Zhang, Z. Wang, and H. Jiang, "A 10 GHz ring-VCO based injection-locked clock multiplier for 40 Gb/s SerDes application in 65 nm CMOS technology," *International Conference on Electron Devices and Solid-State Circuits (EDSSC)*, pp. 1-2, 2017.
- [128] X. Zheng, C. Zhang, F. Lv, F. Zhao, S. Yuan, S. Yue, Z. Wang, F. Li, Z. Wang, and H. Jiang, "A 40-Gb/s Quarter-Rate SerDes Transmitter and Receiver Chipset in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 52, no. 11, pp. 2963-2978, 2017.
- [129] A. Arnaud, and A. Hoffmann, "A complete compact model for flicker noise in MOS transistors," *IEEE 6th Latin American Symposium on Circuits & Systems (LASCAS)*, pp. 1-4, 2015.
- [130] CMOS8RF (CMRF8SF) Design Manual, IBM Corporation.
- [131] M. Parvizi, A. Khodabakhsh, and A. Nabavi, "Low-power high-tuning range CMOS ring oscillator VCOs," *IEEE International Conference on Semiconductor Electronics*, pp. 40-44, 2008.
- [132] B. Fahs, W. Y. Ali-Ahmad, and P. Gamand, "A Two-Stage Ring Oscillator in 0.13um CMOS for UWB Impulse Radio," *IEEE Transactions on Microwave Theory and Techniques*, vol. 57, no. 5, pp. 1074-1082, 2009.
- [133] T.-P. Wang, and Y.-M. Yan, "A Low-Voltage Low-Power Wide-Tuning-Range Hybrid Class-AB/Class-B VCO With Robust Start-Up and High-Performance FOM<sub>T</sub>," *IEEE Transactions on Microwave Theory and Techniques*, vol. 62, no. 3, pp. 521-531, 2014.

- [134] S. Sivanaresh M, P. Duhan, and N. R. Mohapatra, "Role of Device Dimensions and Layout on the Analog Performance of Gate-First HKMG nMOS Transistors," *IEEE Transactions on Electron Devices*, vol. 62, no. 11, pp. 3792-3798, 2015.
- [135] D. Zhao, and P. Reynaert, "A 60-GHz Dual-Mode Class AB Power Amplifier in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 10, pp. 2323-2337, 2013.
- [136] B. Razavi, Design of Analog CMOS Integrated Circuits: McGraw-Hill Companies, 2001.
- [137] K. Li, F. Meng, D. J. Thomson, P. Wilson, and G. T. Reed, "Analysis and Implementation of an Ultra-Wide Tuning Range CMOS Ring-VCO With Inductor Peaking," *IEEE Microwave and Wireless Components Letters*, vol. 27, no. 1, pp. 49-51, 2017.
- [138] W. Bing, J. R. Hellums, and C. G. Sodini, "MOSFET thermal noise modeling for analog integrated circuits," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 7, pp. 833-835, 1994.
- [139] A. Arnaud, and C. Galup-Montoro, "Consistent Noise Models for Analysis and Design of CMOS Circuits," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 51, no. 10, pp. 1909-1915, 2004.
- [140] Z. Boyi, D. Li, and J. Jing, "A filter enhanced capacitively phase-coupled low noise 0.6-to-3 GHz Ring VCO," *IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)*, pp. 1531-1533, 2016.
- [141] B. Razavi, RF Microelectronics: Prentice Hall, 2012.
- [142] H. Mardoyan, R. Rios-Müller, M. A. Mestre, P. Jennevé, L. Schmalen, A. Ghazisaeidi, P. Tran, S. Bigo, and J. Renaudier, "Transmission of Single-Carrier Nyquist-Shaped 1-Tb/s Line-Rate Signal over 3,000 km," *Optical Fiber Communications Conference and Exhibition (OFC)*, pp. W3G.2, 2015.
- [143] P. Marin-Palomo, J. N. Kemal, M. Karpov, A. Kordts, J. Pfeifle, M. H. P. Pfeiffer, P. Trocha, S. Wolf, V. Brasch, M. H. Anderson, R. Rosenberger, K. Vijayan, W. Freude, T. J. Kippenberg, and C. Koos, "Microresonator-based solitons for massively parallel coherent optical communications," *Nature*, vol. 546, no. 7657, pp. 274-279, Jun 7, 2017.
- [144] W. Deng, A. Musa, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A dual-loop injection-locked PLL with all-digital background calibration system for on-chip clock generation," *19th Asia and South Pacific Design Automation Conference (ASP-DAC)*, pp. 21-22, 2014.
- [145] N. Kamal, S. Al-Sarawi, N. H. E. Weste, and D. Abbott, "A Phase-Locked Loop reference spur modelling using Simulink," *Intl Conf on Electronic Devices, Systems and Applications (ICEDSA)*, pp. 279-283, 2010.
- [146] A. Hajimiri, and T. H. Lee, "A general theory of phase noise in electrical oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179-194, 1998.
- [147] M. Mansuri, D. Liu, and C. K. K. Yang, "Fast frequency acquisition phase-frequency detectors for Gsamples/s phase-locked loops," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 10, pp. 1331-1334, 2002.
- [148] K. Sungjoon, L. Kyeongho, M. Yongsam, J. Deog-Kyoon, C. Yunho, and L. Hyung Kyu, "A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 5, pp. 691-700, 1997.

- [149] H. Notani, H. Kondoh, and Y. Matsuda, "A 622-MHz CMOS phase-locked loop with precharge-type phase frequency detector," *Digest of Technical Papers, Symposium on VLSI Circuits*, pp. 129-130, 1994.
- [150] H. O. Johansson, "A simple precharged CMOS phase frequency detector," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 295-299, 1998.
- [151] K. Park, and I.-C. Park, "Fast frequency acquisition phase frequency detectors with prediction-based edge blocking," *IEEE International Symposium on Circuits and Systems (ISCAS)*, pp. 1891-1894, 2009.
- [152] Y. He, X. Cui, C. L. Lee, and D. Xue, "An improved fast acquisition PFD with zero blind zone for the PLL application," *IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC)*, pp. 1-2, 2014.
- [153] B. J. Hosticka, "Improvement of the gain of MOS amplifiers," *IEEE Journal of Solid-State Circuits*, vol. 14, no. 6, pp. 1111-1114, 1979.
- [154] J.-S. Lee, M.-S. Keel, S.-I. Lim, and S. Kim, "Charge pump with perfect current matching characteristics in phase-locked loops," *Electronics Letters*, vol. 36, no. 23, pp. 1907, 2000.
- [155] S. Bou-Sleiman, and M. Ismail, "Dynamic Self-Regulated Charge Pump With Improved Immunity to PVT Variations," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 22, no. 8, pp. 1716-1726, 2014.
- [156] M. Terrovitis, M. Mack, K. Singh, and M. Zargari, "A 3.2 to 4 GHz, 0.25 μm CMOS frequency synthesizer for IEEE 802.11a/b/g WLAN," *IEEE International Solid-State Circuits Conference, Digest of Technical Papers. ISSCC*, pp. 98-515, 2004.
- [157] K. Yido, H. Hyungki, C. Yongsik, L. Jeongwoo, P. Joonbae, L. Kyeongho, J. Deog-Kyoon, and K. Wonchan, "A fully integrated CMOS frequency synthesizer with charge-averaging charge pump and dual-path loop filter for PCS- and cellular-CDMA wireless systems," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 5, pp. 536-542, 2002.
- [158] L. Chi-Wa, and H. C. Luong, "A 1.5-V 900-MHz monolithic CMOS fast-switching frequency synthesizer for wireless applications," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 4, pp. 459-470, 2002.
- [159] A. Maxim, B. Scott, E. M. Schneider, M. L. Hagge, S. Chacko, and D. Stiurca, "A low-jitter 125-1250-MHz process-independent and ripple-poleless 0.18μm CMOS PLL based on a sample-reset loop filter," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 11, pp. 1673-1683, 2001.
- [160] B. Razavi, K. F. Lee, and R. H. Yan, "Design of high-speed, low-power frequency dividers and phase-locked loops in deep submicron CMOS," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 2, pp. 101-109, 1995.
- [161] M. H. Perrott, "Techniques for High Data Rate Modulation and Low Power Operation of Fractional-N Frequency Synthesizers," PhD Dissertation, Massachusetts Institute of Technology, 1997.
- [162] W. HongMo, "A 1.8 V 3 mW 16.8 GHz frequency divider in 0.25 µm CMOS," *IEEE International Solid-State Circuits Conference, Digest of Technical Papers. ISSCC*, pp. 196-197, 2000.
- [163] T. Chalvatzis, K. H. K. Yau, R. A. Aroca, P. Schvan, M.-T. Yang, and S. P. Voinigescu, "Low-Voltage Topologies for 40-Gb/s Circuits in Nanoscale CMOS," *IEEE Journal of Solid-State Circuits*, vol. 42, no. 7, pp. 1564-1573, 2007.

- [164] Z. Xuelin, W. Yuan, J. Song, Z. Ganggang, and Z. Xing, "A novel CML latch for ultra high speed applications," *IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC)*, pp. 1-2, 2014.
- [165] B. Razavi, "A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-μm CMOS Technology," 2009.
- [166] P. Jiang, M. Cheng, S. Lu, and J. Lin, "A high speed programmable frequency divider," *3rd Asia-Pacific Conference on Antennas and Propagation (APCAP)*, pp. 1130-1133, 2014.
- [167] L. Wenguan, C. Honglin, and S. Lei, "A 4.5GHz 256 ~ 511 Multi-modulus Frequency Divider based on Phase Switching Technique for Frequency Synthesizers," *IEEE International Conference of Electron Devices and Solid-State Circuits (EDSSC)*, pp. 1-4, 2010.
- [168] D. M. C. S. GmbH, "m/matl The EM Technology File Editor for RFIC," 2013.
- [169] Cadence, "Quantus QRC Techgen Reference Manual," 2015.