# Southampton

## University of Southampton Research Repository ePrints Soton

Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g.

AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

## **University of Southampton**

## On-Chip Time Measurement Architectures and Implementation

by Matthew Collins

A thesis submitted for the degree of

## **Doctor of Philosophy**

in the

Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science

May 2009

To Daphna and Kai...

#### UNIVERSITY OF SOUTHAMPTON

#### **ABSTRACT**

#### School of Electronics and Computer Science

#### Faculty of Engineering, Science and Mathematics

#### **On-Chip Time Measurement Architectures and Implementation**

#### By Matthew Collins

In recent years, system-on-chip (SoC) devices have become increasingly popular in many applications, such as automotive, signal processing, portable electronic devices and communication products. This has led to more functionality being integrated onto a single piece of silicon. As process technology scales down to smaller geometries, not only has the design become more complicated, but the verification of such devices has also become so complex that stringent timing requirements have been placed on them. With continuing integration and speed scaling into the low gigahertz range, the effectiveness of traditional production testing has become limited. The increasing cost of automatic test equipment (ATE) and the growing electrical distance between the tester and the embedded core under test (CUT) have made the verification of such devices challenging.

To alleviate this test cost problem, this research investigates the design and methods associated with high resolution on-chip time measurement systems and proposes a low cost, high resolution, programmable time measurement architecture for characterizing on-chip timing. This new architecture is based on the time-to-digital conversion (TDC) method and uses the dual-slope technique to perform the timing measurement. The proposed architecture can perform a number of different types of time measurement, such as rise and fall time, pulse width and propagation delay measurements, without the need for additional circuitry or circuit duplication that would add to the overall cost of the time measurement architecture. Each of the critical building blocks is analysed, and the final implementation of a prototype chip in a 0.12 µm CMOS process is described.

As the on-chip clock speeds of high performance VLSI devices increase into the tens of gigahertz range, time measurement architectures with timing resolutions of tens of femtoseconds will be required. Current high resolution time measurement architectures based on vernier and flash techniques use latches and flip-flops in the main timing measurement path and can suffer from the inherent metastability phenomenon. To address this problem, current research solutions are analysed in this thesis, and an on-chip time measurement architecture is proposed that is also based on the time-to-digital conversion method but uses the homodyne technique. The architecture is described, and simulations using transistors from a 0.12 µm CMOS process are presented; these suggest that timing resolutions in the tens of femtoseconds range are attainable.

## Acknowledgements

I would like to express my sincere gratitude to my supervisor, Prof. Bashir Al-Hashimi, without whom this project would not have been possible, for his continuous support and guidance throughout the project. I am also extremely grateful for the DTA/EPSRC funding I have received during the past three years and for the research facilities provided by the School of Electronics and Computer Science at the University of Southampton.

I would like to thank Dr. Neil Ross, Dr. Peter Wilson, Dr. Koushik Maharatna and Prof. Gordon Russell for their useful and constructive comments throughout the review stages. I would also like to thank Dr. Paul Rosinger for his friendly help and encouragement throughout this project.

## Contents

| 1. Introduction                                | 11 |
|------------------------------------------------|----|
| 1.1 The importance of test                     | 13 |
| 1.2 System-on-chip (SoC) design flow           | 14 |
| 1.3 Cost of testing                            | 17 |
| 1.4 On-chip time measurement testing           | 20 |
| 1.5 Thesis Contributions and Organisation      | 21 |
| 2. Literature review                           | 23 |
| 2.1 Time measurement techniques                | 23 |
| 2.1.1 Homodyne mixing                          | 24 |
| 2.1.3 Time domain analysis                     | 26 |
| 2.1.3.1 Single Counter                         | 27 |
| 2.1.3.2 Interpolation                          | 27 |
| 2.1.3.3 Dual Slope                             | 28 |
| 2.1.3.4 Pulse stretching                       | 30 |
| 2.1.3.5 Charge Pump                            | 31 |
| 2.1.3.6 Vernier Oscillator                     | 31 |
| 2.1.3.7 Flash conversion                       | 32 |
| 2.3 Concluding remarks                         | 36 |
| 3. Programmable Time Measurement Architecture  | 38 |
| 3.1 Proposed time measurement architecture     | 39 |
| 3.2 Programmable Interface block (PIB)         | 41 |
| 3.3 High speed comparator design               | 47 |
| 3.3.1 Comparator bias circuitry                | 54 |
| 3.3.2 Comparator simulations                   | 56 |
| 3.4 Time-to-voltage conversion                 | 58 |
| 3.5 Current steering time-to-voltage converter | 61 |
| 3.6 Digital processing block                   | 62 |

| 3.7 PTMA simulation results                                             | 64  |
|-------------------------------------------------------------------------|-----|
| 3.8 Concluding remarks                                                  | 71  |
| 4 CMOS implementation of PTMA                                           | 73  |
| 4.1 Backend Design Flow                                                 | 73  |
| 4.2 Prototype chip                                                      | 75  |
| 4.3 Comparator layout techniques                                        | 78  |
| 4.4 Time-to-voltage converter layout techniques                         | 79  |
| 4.5 High speed clock generation                                         |     |
| 4.6 Programmable input block                                            |     |
| 4.7 Digital processing block                                            |     |
| 4.8 Full chip layout                                                    |     |
| 4.9 Experimental results                                                |     |
| 4.10 Concluding remarks                                                 |     |
| 5 Homodyne Time-to-Digital Conversion (HTDC)                            |     |
| 5.1 Metastability problem with TDC based on Vernier/Flash architectures | 93  |
| 5.2 Time amplifiers                                                     | 94  |
| 5.3 Proposed architecture                                               | 96  |
| 5.4 Mixer design                                                        | 97  |
| 5.4.1 Single balanced                                                   |     |
| 5.4.2 Double balanced (Gilbert multiplier)                              |     |
| 5.4.3 Dual gate mixer [104]                                             | 100 |
| 5.4.4 Dual gate mixer simulations                                       | 102 |
| 5.5 Low pass filter design                                              | 104 |
| 5.6 Analogue-to-Digital Conversion                                      | 108 |
| 5.6.1 Successive Approximation ADC                                      | 108 |
| 5.6.2 Flash ADC                                                         | 109 |
| 5.6.3 Dual-Slope ADC                                                    | 111 |
| 5.7 Delta Sigma ( $\Delta\Sigma$ ) ADC                                  | 112 |
| 5.7.1 Non-overlap clocking scheme                                       | 115 |
| 5.7.2 Amplifier design                                                  | 116 |
| 5.7.3 Comparator                                                        | 120 |
| 5.7.4 Integrator simulations                                            |     |
| 5.7.5 Modulator Simulations                                             | 122 |
| 5.7.6 Decimation Filter [114]                                           | 123 |


## List of Figures

| Figure 1-1: SoB versus SoC [2]                                     | 11 |
|--------------------------------------------------------------------|----|
| Figure 1-2: Semiconductor technology node prediction [1]           | 12 |
| Figure 1-3: Rise time measurements                                 | 14 |
| Figure 1-4: Generic IC design flow [7]                             | 15 |
| Figure 1-5: Types of IP Blocks [2]                                 | 16 |
| Figure 1-6: Modern SoC digital/mixed-signal production tester [9]  | 17 |
| Figure 1-7: Electrical distance from the tester and the DUT        |    |
| Figure 1-8: Rise and fall time measurements                        |    |
| Figure 1-9: Pulse width measurement                                |    |
| Figure 1-10: Propagation delay measurement                         | 21 |
| Figure 1-11: Time measurement tester integrated with embedded core | 21 |
| Figure 2-1: Classification of on-chip time measurement techniques  | 24 |
| Figure 2-2: Homodyne Mixing [10]                                   | 24 |
| Figure 2-3: Subsampling in the time domain [32]                    |    |
| Figure 2-4: Single Counter [10]                                    |    |
| Figure 2-5: Interpolation based time domain analysis [63]          |    |
| Figure 2-6: Capacitor Voltage                                      |    |
| Figure 2-7: Basic Dual-Slope ADC [43]                              |    |
| Figure 2-8: Pulse stretching                                       | 30 |
| Figure 2-9: Capacitor voltage                                      | 30 |
| Figure 2-10: Charge Pump Technique [57]                            |    |
| Figure 2-11: Vernier Oscillator [65]                               |    |
| Figure 2-12: Single delay chain flash converter                    |    |
| Figure 2-13: Flash converter with a vernier delay line             |    |
| Figure 2-14: Flash converter without vernier delay line            |    |
| Figure 2-15: MUTEX Circuit [70]                                    |    |
| Figure 2-16: time measurement circuit with MUTEX elements [70]     |    |
| Figure 3-1: Proposed programmable timing measurement architecture  | 39 |
| Figure 3-2: Proposed time measurement architecture waveforms       |    |

| Figure 3-3: Programmable Interface Block (PIB)                                 | 42 |
|--------------------------------------------------------------------------------|----|
| Figure 3-4: Switch controller block schematic                                  | 44 |
| Figure 3-5: Switch controller block simulations                                | 45 |
| Figure 3-6: Comparator control logic circuitry                                 | 45 |
| Figure 3-7: Comparator control simulations                                     | 46 |
| Figure 3-8: Adjacent switching during rise time measurement                    | 47 |
| Figure 3-10: Block diagram of a high speed comparator [91]                     | 48 |
| Figure 3-11: Decision circuit                                                  | 48 |
| Figure 3-12: Window comparator [57]                                            | 49 |
| Figure 3-13: Switched-capacitor input sampling network [85]                    | 49 |
| Figure 3-14: Non-overlapped clock generator [11]                               | 50 |
| Figure 3-15: PMOS and NMOS differential amplifiers                             | 50 |
| Figure 3-16: Rail-to-Rail Comparator                                           | 51 |
| Figure 3-17: Simulation of comparator with input difference of 10ps            | 53 |
| Figure 3-18: Bias circuit with start-up circuitry [11]                         | 55 |
| Figure 3-19: Output current versus supply voltage                              | 55 |
| Figure 3-20: Typical process corner (TT, 1.2V, 27 degC)                        | 57 |
| Figure 3-21: Worst process corner (FF, 1.08V, -40 degC)                        | 57 |
| Figure 3-22: Best process corner (SS, 1.32V, 125 degC)                         | 58 |
| Figure 3-23: TVC operation                                                     | 58 |
| Figure 3-24: Current TVC implementations for embedded memory characterization  | 59 |
| Figure 3-25: VTC of configurations (b) and (c)                                 | 60 |
| Figure 3-26: Current steering time-to-voltage converter (TVC)                  | 62 |
| Figure 3-27: Simulations of TVC                                                | 62 |
| Figure 3-28: Simplified view of processing block                               | 63 |
| Figure 3-29: 800ps propagation delay time measurement                          | 64 |
| Figure 3-30: Propagation delay measurement                                     | 66 |
| Figure 3-31: Rise time versus output count from the PTMA                       | 67 |
| Figure 3-32: Pulse width measurement                                           | 68 |
| Figure 3-33: Fall time versus output count from the PTMA                       | 69 |
| Figure 3-34: The effect of rise and fall time on inputs when switching from one measurement to another | 70 |
| Figure 3-35: Conversion time                                                   | 71 |
| Figure 4-1: Back end design flow                                               | 74 |

| Figure 4-2: Top level hierarchy of prototype chip                                            | 75  |
|----------------------------------------------------------------------------------------------|-----|
| Figure 4-3: On- chip reference generator                                                     | 76  |
| Figure 4-4: Time measurement core                                                            | 77  |
| Figure 4-5: Rail-to-rail comparator                                                          | 78  |
| Figure 4-6: Comparator Layout                                                                | 79  |
| Figure 4-7: Time-to-voltage converter (TVC)                                                  | 80  |
| Figure 4-8: Capacitor array                                                                  | 80  |
| Figure 4-9: Time-to-voltage converter (TVC) layout                                           |     |
| Figure 4-10: On-chip clock generation                                                        |     |
| Figure 4-11: Clock generator circuit                                                         |     |
| Figure 4-12: Generation of the 2 GHz and 2.5 GHz clocks                                      |     |
| Figure 4-13: Layout of the clock generator module                                            |     |
| Figure 4-14: Layout of the programmable input block                                          |     |
| Figure 4-15: Layout of the digital processing block                                          |     |
| Figure 4-16: Full chip layout                                                                |     |
| Figure 4-17: Top level schematic with input and output tri-state buffers                     |     |
| Figure 4-18: Modules of the PTMA                                                             |     |
| Figure 4-19: Optical picture of fabricated chip                                              |     |
| Figure 4-20: Experimental test setup                                                         |     |
| Figure 4-21: Level translator                                                                |     |
| Figure 5-1: Flip-Flop Setup and Hold Violations                                              | 93  |
| Figure 5-2: Low resolution time measurement architecture (LRTMA) with time amplifier         | 94  |
| Figure 5-3: MUTEX [99]                                                                       | 94  |
| Figure 5-4: Time amplifier [98]                                                              | 95  |
| Figure 5-5: Proposed Time Measurement Architecture                                           | 96  |
| Figure 5-6: Single balanced mixer [101]                                                      |     |
| Figure 5-7: Double balanced Gilbert multiplier [104]                                         |     |
| Figure 5-8: Dual-Gate Mixer [110]                                                            | 101 |
| Figure 5-9: Dual-gate cascode mixer                                                          | 102 |
| Figure 5-10: Dual Gate Mixer Simulations                                                     | 103 |
| Figure 5-11: Conversion Gain vs Input Frequency                                              | 103 |
| Figure 5-12: Signal flow graph representation of high Q SC LPF                               | 105 |
| Figure 5-13: A 2<sup>nd</sup> order low pass switched-capacitor filter with switch sharing   | 105 |

| Figure 5-14: On-resistance of transmission gate                                 | 106 |
|---------------------------------------------------------------------------------|-----|
| Figure 5-15: LPF Frequency Response                                             | 106 |
| Figure 5-16: SC LPF input and output waveforms                                  | 107 |
| Figure 5-17: Total output noise of mixer and LPF                                | 107 |
| Figure 5-18: Successive Approximation ADC [8]                                   | 108 |
| Figure 5-19: 3-bit Flash ADC                                                    | 110 |
| Figure 5-20: Dual Slope ADC                                                     | 111 |
| Figure 5-21: N-bit dual slope ADC [8]                                           | 112 |
| Figure 5-22: Block diagram of the $\Delta\Sigma$ ADC                            | 113 |
| Figure 5-23: Block diagram of the 1st Order $\Delta\Sigma$ Modulator            | 113 |
| Figure 5-24: 1st Order $\Delta\Sigma$ Modulator using Simulink®                 | 114 |
| Figure 5-25: Simulink Simulations                                               | 114 |
| Figure 5-26: 1<sup>st</sup> order switched-capacitor $\Delta\Sigma$ modulator   | 115 |
| Figure 5-27: Non-overlapping clock generator                                    | 115 |
| Figure 5-28: Non-overlapped clock timing                                        | 116 |
| Figure 5-29: Non-overlap clock simulation                                       | 116 |
| Figure 5-30: Folded cascode operational amplifier.                              | 118 |
| Figure 5-31: Amplifier gain across process corners                              | 119 |
| Figure 5-32: Amplifier Phase response across process corners                    | 119 |
| Figure 5-33: Amplifier step response                                            | 120 |
| Figure 5-34: High performance comparator                                        | 121 |
| Figure 5-35: Clock, Input and output waveforms of the integrator                | 121 |
| Figure 5-36: Zoomed in version                                                  | 122 |
| Figure 5-37: Delta modulated output of the $\Delta\Sigma$ modulator             | 122 |
| Figure 5-38: Frequency spectrum of the $\Delta\Sigma$ modulator                 | 123 |
| Figure 5-39: Decimation Filter                                                  | 126 |
| Figure 5-40: Filter output response                                             | 126 |
| Figure 5-41: Proposed time measurement architecture                             | 127 |
| Figure 5-42: Simulated relationship between timing resolution and the output of the LPF | 128 |
| Figure 5-43: Input and output waveforms of the proposed architecture            | 128 |
| Figure 5-44: propagation delay versus output count                              | 129 |
| Figure 5-45: Time measurement architecture current consumption                  | 130 |
| Figure A-1: IEEE Standard 1500 wrapper architecture [120]                       | 137 |

## **List of Tables**

| Table 2-1: Recent work on on-chip time measurement architectures |     |
|------------------------------------------------------------------|-----|
| Table 3-1: Modes of operation                                    |     |
| Table 3-2: Switch Controller Truth Table                         |     |
| Table 3-3: Comparator transistor sizes                           | 53  |
| Table 3-4: Comparator propagation delay across process corners   |     |
| Table 3-5: Propagation delay simulation results                  | 66  |
| Table 3-6: Rise time simulation results                          | 67  |
| Table 3-7: Pulse width simulation results                        |     |
| Table 3-8: Fall time simulation results                          | 69  |
| Table 4-1: Experimental results                                  |     |
| Table 5-1: Amplifier settling time across PVT                    |     |
| Table 5-2: Propagation delay results                             | 129 |
| Table 5-3: Propagation delay tolerance results                   | 129 |

## **List of Abbreviations**

| ADC   | Analogue-to-Digital Converter                    |
|-------|--------------------------------------------------|
| ATE   | Automatic Test Equipment                         |
| BIST  | Built-In Self-Test                               |
| CDF   | Cumulative Distribution Function                 |
| CM    | Common Mode                                      |
| CMOS  | Complementary Metal Oxide Semiconductor          |
| CRC   | Cyclic Redundancy Check                          |
| CUT   | Core Under Test                                  |
| DFT   | Design for Test                                  |
| DGFET | Dual-Gate Field Effect Transistor                |
| DLL   | Delay Locked Loop                                |
| DNL   | Differential Non-Linearity                       |
| DUT   | Device Under Test                                |
| FET   | Field Effect Transistor                          |
| FIR   | Finite Impulse Response                          |
| FPGA  | Field Programmable Gate Array                    |
| HDL   | Hardware Description Language                    |
| HTDC  | Homodyne Time-to-Digital Conversion              |
| IC    | Integrated Circuit                               |
| IEEE  | Institute of Electrical and Electronics Engineers |
| INL   | Integral Non-Linearity                           |
| I/O   | Input and/or Output                              |
| ITRS  | International Technology Roadmap for Semiconductors |
| LPF   | Low Pass Filter                                  |
| MSB   | Most Significant Bit                             |
| MUTEX | Mutually Exclusive Circuit                       |

| OSR  | Over-Sampling Ratio                        |
|------|--------------------------------------------|
| PCB  | Printed Circuit Board                      |
| PIB  | Programmable Input Block                   |
| PSD  | Power Spectral Density                     |
| PTMA | Programmable Time Measurement Architecture |
| PTMB | Programmable Time Measurement Block        |
| RF   | Radio Frequency                            |
| RMS  | Root Mean Square                           |
| SAR  | Successive Approximation Register          |
| SC   | Switched Capacitor                         |
| SoB  | System on Board                            |
| SoC  | System on Chip                             |
| TAP  | Test Access Port                           |
| TDC  | Time-to-Digital Converter                  |
| TDI  | Test Data Input                            |
| TDO  | Test Data Output                           |
| TMA  | Time Measurement Architectures             |
| TVC  | Time-to-Voltage Converter                  |
| VCDL | Voltage Controlled Delay Line              |
| VDL  | Vernier Delay Line                         |
| VLSI | Very Large Scale Integration               |
| WBR  | Wrapper Boundary Register                  |
| WBY  | Wrapper Bypass Register                    |
| WIR  | Wrapper Instruction Register               |
| WPI  | Wrapper Parallel Input                     |
| WPO  | Wrapper Parallel Output                    |
| WSO  | Wrapper Serial Output                      |
| WSI  | Wrapper Serial Input                       |

## **Chapter 1**

## Introduction

Today, the electronics industry is driven by a number of factors, such as quick time to market, higher levels of integration, lower power dissipation and cost [1]. In the past, industry built large systems on multiple printed circuit boards (PCBs), which provided a quick turnaround and therefore a faster time to market [2]. However, as the industry was driven by demands for higher levels of integration, smaller devices, lower power dissipation and lower overall cost [1], multi-chip system-on-board (SoB) solutions had to migrate to single system-on-chip (SoC) solutions. These devices incorporate digital logic, memory, analogue/mixed-signal and radio frequency (RF) embedded modules on the same piece of silicon. This is summarised in Figure 1-1.



Figure 1-1: SoB versus SoC [2]

The popularity of SoC solutions has led to a continuous increase in on-chip circuit complexity and transistor density. More and more functionality is being integrated onto a single piece of silicon, including digital, analogue and mixed-signal components. As process technology scales towards smaller geometries, the design of these SoC devices has become more complicated. Figure 1-2 shows the predicted semiconductor process technology nodes. As these devices become more complicated, their verification also becomes significantly more complex, which has led to strict timing requirements being placed on them.



Figure 1-2: Semiconductor technology node prediction [1]

This dissertation presents an embedded test core for on-chip time measurement, which contributes towards reducing the cost of manufacturing test by providing a low cost system-on-chip test solution. This chapter is organised as follows. Section 1.1 describes the importance of integrated circuit (IC) testing. Section 1.2 discusses the differences between traditional system-on-board (SoB) and current system-on-chip (SoC) devices, including the design flows and the implications for testing these devices. Section 1.3 highlights the cost of testing and the problems associated with current automatic test equipment. This section also describes current Built-In Self-Test (BIST) solutions that help reduce the overall cost of test. Section 1.4 introduces the concept of on-chip time measurement testing. Finally, section 1.5 outlines the main contributions of this work and presents the organisation of this thesis.

### **1.1 The importance of test**

Integrated circuit (IC) testing is a very important part of any system, as it serves as a go/no-go test at the end of a manufacturing production cycle and is also used for defect analysis during product diagnosis [3, 4]. The types of test that can be performed are categorized into two groups: structural tests and functional tests. Structural tests are used to verify the topology of the manufactured chip. This type of testing typically relies on the static stuck-at fault model, which assumes that the physical defect being searched for manifests itself as a "net" or "gate" connection that acts as if it is always at logic level '1' or always at logic level '0'. Tests are developed by applying values to the inputs that drive the suspected defective node to the opposite of its stuck value, for example forcing a logic '1' on a stuck-at '0' node, and then applying values at the inputs that allow the resulting value to propagate to an observable point. If the value at the observed point differs from the value expected for a good circuit, then a fault has been detected. Structural tests are measured by fault coverage, where the number of faults detected is compared to the total number of possible faults. A delay fault model can be applied similarly to assess timing failures, and a current-based fault model can be used to assess power consumption [5].
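The stuck-at procedure and the fault coverage metric can be illustrated with a small fault simulation. The following is a minimal sketch only: the two-gate circuit `y = (a AND b) OR c`, the net names and the helper functions `simulate` and `fault_coverage` are hypothetical and chosen purely for illustration; they are not taken from this thesis.

```python
# Hypothetical circuit for illustration: n1 = a AND b, y = n1 OR c.
NETS = ["a", "b", "c", "n1", "y"]


def simulate(inputs, fault=None):
    """Return output y; `fault` is (net, stuck_value) or None for a good circuit."""
    values = {}

    def drive(net, value):
        # A stuck-at fault overrides whatever value is driven onto the net.
        if fault is not None and fault[0] == net:
            value = fault[1]
        values[net] = value

    for name in ("a", "b", "c"):
        drive(name, inputs[name])
    drive("n1", values["a"] & values["b"])
    drive("y", values["n1"] | values["c"])
    return values["y"]


def fault_coverage(test_vectors):
    """Fraction of single stuck-at faults detected by the given test vectors."""
    faults = [(net, stuck) for net in NETS for stuck in (0, 1)]
    detected = set()
    for vector in test_vectors:
        good = simulate(vector)
        for fault in faults:
            # A fault is detected when the observed output differs from
            # the response of the fault-free circuit.
            if simulate(vector, fault) != good:
                detected.add(fault)
    return len(detected) / len(faults)


one_vector = [{"a": 1, "b": 1, "c": 0}]
four_vectors = one_vector + [
    {"a": 0, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 0},
    {"a": 0, "b": 0, "c": 1},
]
print(fault_coverage(one_vector), fault_coverage(four_vectors))
```

In this toy example a single vector detects only the faults that it both excites and propagates to the output, while the four-vector set detects every single stuck-at fault on the five nets.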

Functional testing validates the correct operation of a system with respect to its functional specification. It is used to verify that a model or a logic block behaves as intended. For example, if the core to be tested is an adder, then a test stimulus is written to check that the core actually performs an addition. Functional tests are derived from a functional model; they are used to check for physical faults in the manufactured system and can also be used as design verification tests to check that the implementation is free of design errors. Functional testing is measured by whether the logic produces the correct (known, expected) response to the (known) applied stimulus [6].
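The adder example can be sketched as a functional test that applies a known stimulus and compares the response with the expected one. This is an illustrative sketch under stated assumptions: the 4-bit ripple-carry model `ripple_adder` stands in for the core under test, and `functional_test` is a hypothetical exhaustive test bench, neither of which appears in this thesis.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_adder(x, y, width=4):
    """Hypothetical core under test: returns (sum bits as an int, final carry)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry


def functional_test(core, width=4):
    """Apply every input pair and collect those with an unexpected response."""
    failures = []
    for x in range(2 ** width):
        for y in range(2 ** width):
            total, carry = core(x, y, width)
            # Expected response comes from the functional specification: x + y.
            if total + (carry << width) != x + y:
                failures.append((x, y))
    return failures


print(functional_test(ripple_adder))
```

An empty failure list means every applied stimulus produced the expected response; exhaustive stimulus is only practical here because the assumed core is very small.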

Another important set of tests is referred to throughout this thesis as characterization tests. Characterization tests are a subset of functional testing that verify how well the manufactured device meets its specification. Examples include analogue tests, such as gain, noise and bandwidth, and time measurement tests, such as a rise time measurement, where the rise time of a signal is defined as the time elapsed as the signal passes from 10% to 90% of its maximum value, as illustrated in Figure 1-3.



Figure 1-3: Rise time measurements
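The 10% to 90% definition can be applied directly to a sampled waveform. The sketch below is illustrative only: it assumes a monotonically rising signal that starts at 0 V, uses linear interpolation between samples, and the function names `crossing_time` and `rise_time` are hypothetical.

```python
def crossing_time(times, volts, level):
    """First time the rising waveform crosses `level`, by linear interpolation."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 < level <= v1:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts):
    """Time taken to pass from 10% to 90% of the maximum value."""
    v_max = max(volts)
    t_low = crossing_time(times, volts, 0.1 * v_max)
    t_high = crossing_time(times, volts, 0.9 * v_max)
    return t_high - t_low


# Assumed example: a linear 0 V to 1 V edge lasting 100 ps, then a flat top.
times_ps = [0.0, 100.0, 200.0]
volts = [0.0, 1.0, 1.0]
print(rise_time(times_ps, volts))
```

For this idealised linear edge the result is approximately 80 ps, that is, 80% of the full 0% to 100% transition time.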

Tests can be performed either at the core level or at the system level. Core level test deals with the testing of the core design itself. The system integrator, who integrates the cores into the overall system, usually regards the cores as black boxes, and the core designer has to provide the model and the test setup for them. System level test, in contrast, addresses the test of the entire system. Compared to conventional PCB test, system level test is far more complex. The system level tests consist of tests for the individual cores, tests for user-defined logic (UDL) and tests for the interconnect logic and wiring.

## 1.2 System-on-chip (SoC) design flow

A traditional system-on-board (SoB) comprises discrete components, such as resistors, capacitors and transistors, as well as integrated circuits (ICs) incorporated on single or multiple printed circuit boards (PCBs). In the traditional SoB design, each integrated circuit is designed, developed, manufactured and tested by the provider, and the function of the system integrator is only to integrate the ICs into the system and check for interconnectivity faults. System-on-chip (SoC) devices, in contrast, are composed of embedded cores, which makes it easier to import existing technologies and so shorten the time-to-market through design reuse. The embedded cores can perform a wide range of functions, for example Digital Signal Processor (DSP), Reduced Instruction Set Computer (RISC) processors, Dynamic Random Access Memory (DRAM), analogue circuits and Complementary Metal Oxide Semiconductor (CMOS) logic. Figure 1-4 shows a generic IC design flow for a typical system on chip.



Figure 1-4: Generic IC design flow [7]

There are four levels of abstraction: behavioural, register transfer level (RTL), gate level and physical level. The behavioural level describes the specifications using generic methods where no structural constraints are considered. The register transfer level is where the generic functions are replaced by structural blocks such as registers and arithmetic logic units (ALUs). The gate level is where the structural components are mapped to a netlist of components that are interconnected to other building blocks and gates. Finally, the physical level is where the modules are represented by a layout of the low level physical components such as transistors, resistors and interconnection wires. This is then streamed out into a file format, such as the Graphic Data System II (GDSII) stream format, which describes planar geometric shapes, text labels and other information needed for the manufacture of the chip. Embedded cores are often products of technology, software and intellectual property (IP) and are subject to patents and copyrights. An embedded block represents an IP that the integrator licenses from the embedded core provider. Therefore, the embedded core user is not always able to apply changes to the core and is forced to use it as a black box, where the user only knows the functionality and the input/output interface of the core. In addition, while ICs are delivered in manufactured and tested form, embedded cores are delivered at a range of hardware description levels: soft cores, firm cores and hard cores. Soft cores consist of a synthesizable, technology-independent representation that can be re-targeted to different technologies. These embedded cores are delivered in a hardware description language (HDL), such as VHDL or Verilog, and are very flexible for the user. Firm cores are usually delivered as a gate level netlist ready to be placed and routed into the overall system. They are accompanied by simulation models and give some flexibility to the designer. Hard cores are delivered as a technology-dependent layout, such as in GDSII format. They are process dependent, include timing information and are optimised for area and performance. The user has little or no flexibility and uses the core as a black box.



Figure 1-5: Types of IP Blocks [2] (axis label: predictability, performance, cost, effort by vendor)

Due to the popularity and high complexity of system-on-a-chip (SoC) devices, two main test-related problems can be identified: the test infrastructure for the SoC device and the overall cost of test, which, if current trends continue, will have a negative effect on the total production cost of the device [1]. The latter has the greater influence on the importance of testing future SoC devices, as illustrated in the following section.

## 1.3 Cost of testing

The International Technology Roadmap for Semiconductors (ITRS) [8] predicts that on-chip clock frequencies will be driven into the multi-gigahertz range. This has introduced limitations in the effectiveness of traditional production testing. Post-silicon verification of the timing parameters of high performance devices, such as rise time, fall time, propagation delay and pulse width, has traditionally been carried out, and is still carried out, using large and expensive digital and/or mixed-signal production testers, as shown in Figure 1-6. As clock frequencies continue to increase, it is becoming impractical to use these production testers for post-silicon validation.



Figure 1-6: Modern SoC digital/mixed-signal production tester [9]

There are four main reasons for this and they are Cost, Bandwidth, Resolution and Observability [10].

*Cost:* The cost of such production testers extends into the millions of dollars, depending on configuration [8], and for most small companies this is not an option. Renting the equipment on a per-hour basis can also amount to a significant proportion of the cost, depending on the test program, and if thousands of devices are to be tested the cost can be very large. Despite advances in automatic test equipment (ATE) technology, testing very large scale integration (VLSI) circuits has become steadily more difficult and costly. The cost of the tester  $(C_{tester})$  can be divided into the cost of the test equipment itself  $(C_{ate})$  and the cost of handling the test equipment  $(C_{handling})$:

$$C_{tester} \approx C_{ate} + C_{handling} \tag{1.1}$$

**Bandwidth and Resolution**<sup>1</sup>: Often the technology used to fabricate the pin electronics of the tester is the same technology used for the devices that are tested. This has a large effect on the resolution capability of the tester, as the transition frequency,  $f_T$ , of a transistor for a particular technology determines how fast the transistor can be switched [11]. The bandwidth limitation of the tester also arises from the growing electrical distance between the tester and the device under test (DUT). Figure 1-7 gives a graphical representation of the electrical distance between the tester and the DUT.



Figure 1-7: Electrical distance from the tester and the DUT.

The expression for the equivalent electrical distance *L* is as follows:

$$\phi = \beta L \tag{1.2}$$

therefore

$$L = \frac{\phi}{\beta} = \frac{\phi}{\omega} v_p = t_p c \tag{1.3}$$

where *L* is the equivalent electrical distance,  $\phi$  is the insertion phase or the phase shift through the propagation medium or network,  $\beta$  is the phase constant of the propagation medium,  $v_p$  is the phase velocity,  $t_p$  is the phase delay, *c* is the velocity of light in a vacuum and  $\omega$  is the angular frequency. As the electrical distance between the pin electronics of the tester and the output pins of the device grows, the signals to be tested are attenuated and an additional phase delay is introduced into the high speed signal. This additional phase delay appears as an error in the measurement of the timing performance parameters [10].

<sup>1</sup> Throughout this thesis, the word resolution is used. In order to clarify, the notation of high resolution is taken as the smallest timing resolution that is achievable.

*Observability:* In addition, modern integrated circuits (ICs) are frequently dense and have highly integrated levels of functionality. Routing out embedded nodes from deeply buried cores to the pins of the device at the chip boundary for observation is often impractical or impossible. The resistive and capacitive parasitic effects will not only increase the electrical distance, but will also attenuate and skew the timing results [12].

The difficulties in integrated circuit (IC) testing have arisen due to the shortage of I/O points. VLSI SoC devices have a limited number of input and output pins, and there are usually multiple cores integrated onto the same silicon [4]. Therefore, the test points for a particular core may be deeply buried and inaccessible from external ATE. Signal distortions and noise disturbances in the interface connections from the ATE to the device under test (DUT) may introduce timing errors in the measurement. In addition, there may be difficulty in synchronising the timing of the test object with the tester timing. There is also the cost of the large volume of test data to be processed, although research into data compression techniques is being carried out [7] to minimise this cost. The external ATE may also have limited performance compared to the DUT, which limits the capability of the tester to perform timing measurements on today's high performance VLSI devices. Therefore, the cost of running the tests, as well as of the ATE equipment itself, is high. It is predicted by the ITRS [1] that costs will rise further towards \$20 million [1, 13]. If the cost of test is not lowered, testing will have a negative impact on the cost of design, leading to an increase in the overall production cost [1, 14-16]. These problems with the quality and cost of external ATE will continue to get worse for high speed, high density VLSI devices, rendering external ATE expensive, inaccurate and unacceptable [13].

This has led to research into on-chip test solutions such as built-in self test (BIST), where the tester or part of the tester is situated on the same silicon as the device or embedded core under test. A number of BIST solutions have been implemented, such as SCAN [8] and memory BIST [17]; however, these are outside the scope of this thesis.

### **1.4 On-chip time measurement testing**

To address the cost of IC testing, a number of researchers have proposed integrating the test circuitry required for time measurement testing onto the same silicon as the device under test [18]. This approach includes a single tester circuit on-chip or multiple tester circuits on-chip in an SoC environment. The embedded tester can be delivered as a soft, firm or hard core, as mentioned in section 1.2. On-chip time measurement testing is a form of built-in self test (BIST) specifically used for characterisation of embedded cores. Traditional BIST mechanisms, such as logic BIST [5] and memory BIST [19, 20], are structural tests. Time measurement testing is a form of functional testing and focuses on specific kinds of test. Time measurement tests can be classified into four main types of measurement: rise time, fall time, pulse width and propagation delay. Rise time,  $t_{rise}$ , is the time it takes a signal to change from 10% to 90% of the logic "1" level (Figure 1-8). Fall time,  $t_{fall}$ , is the time it takes a signal to change from 90% to 10% of the logic "1" level (Figure 1-8). Pulse width,  $t_{pw}$ , is defined as the time interval between the rising edge and the falling edge of the pulse, measured where the amplitude of the pulse is 50% of the peak value (Figure 1-9). Propagation delay,  $t_{prop}$ , is defined as the time interval for a signal to travel through a logic gate or series of gates to the output destination (Figure 1-10).



Figure 1-8: Rise and fall time measurements



Figure 1-9: Pulse width measurement



Figure 1-10: Propagation delay measurement
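The four measurement definitions above can be illustrated on a sampled waveform. The following is a minimal sketch rather than part of the proposed architecture: the trapezoidal 1 V pulse, its edge times, the 10 ps sample spacing and the function names are all assumptions chosen for illustration, and propagation delay is taken here between the 50% points of the input and output edges.

```python
def crossing_time(times, values, level, rising=True):
    """Linearly interpolated time of the first crossing of `level`."""
    for i in range(1, len(values)):
        prev, curr = values[i - 1], values[i]
        crossed_up = rising and prev < level <= curr
        crossed_down = (not rising) and prev > level >= curr
        if crossed_up or crossed_down:
            frac = (level - prev) / (curr - prev)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def trapezoid(t, delay=0.0):
    """Hypothetical 1 V pulse: rises from 1000 to 2000 ps, falls from 5000 to 7000 ps."""
    t = t - delay
    if t <= 1000.0 or t >= 7000.0:
        return 0.0
    if t < 2000.0:
        return (t - 1000.0) / 1000.0
    if t <= 5000.0:
        return 1.0
    return (7000.0 - t) / 2000.0


def timing_parameters(times, values, v_high=1.0):
    """Return (rise time, fall time, pulse width) using 10%, 90% and 50% levels."""
    rise = (crossing_time(times, values, 0.9 * v_high)
            - crossing_time(times, values, 0.1 * v_high))
    fall = (crossing_time(times, values, 0.1 * v_high, rising=False)
            - crossing_time(times, values, 0.9 * v_high, rising=False))
    width = (crossing_time(times, values, 0.5 * v_high, rising=False)
             - crossing_time(times, values, 0.5 * v_high))
    return rise, fall, width


times = [10.0 * k for k in range(801)]                # 0 to 8000 ps in 10 ps steps
v_in = [trapezoid(t) for t in times]                  # signal at the input of a path
v_out = [trapezoid(t, delay=250.0) for t in times]    # same signal 250 ps later

t_rise, t_fall, t_pw = timing_parameters(times, v_in)
t_prop = crossing_time(times, v_out, 0.5) - crossing_time(times, v_in, 0.5)
print(t_rise, t_fall, t_pw, t_prop)                   # all values in picoseconds
```

Note that rise and fall time need two voltage thresholds on one signal, whereas pulse width and propagation delay need a single threshold on one or two signals; this difference is what makes a single fixed measurement circuit hard to reuse across all four types.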

The aim of this research is to carry out on-chip time measurement testing for a system on chip that is delivered as a hard embedded core. In practice, a single timing measurement test core is incorporated on the same silicon as the embedded core to be tested, as shown in Figure 1-11. The time measurement tests can be applied in memory test as well as clock generation applications, where time measurements such as memory access time, clock jitter and clock skew are critical.



Figure 1-11: Time measurement tester integrated with embedded core

### **1.5 Thesis Contributions and Organisation**

The body of this thesis comprises six chapters, of which Chapters 3, 4 and 5 represent the main contributions of this research. Each chapter is largely self-contained and as such incorporates relevant background material and a literature review. The driving motivations and subject matter of these chapters are summarised as follows.

Chapter 2 presents the current time measurement techniques. Time measurement architectures (TMAs) based on the time-to-digital conversion (TDC) technique have been the focus of much work in on-chip time measurement testing. Although researchers have developed numerous time measurement architectures [10], each can perform only a limited number of types of time measurement unless circuitry is duplicated or added.

Chapter 3 proposes a new programmable time measurement architecture that can be programmed to perform four types of measurement: rise time, fall time, pulse width and propagation delay.

To analyse the practical performance of the time measurement architecture, a prototype chip has been fabricated. Chapter 4 describes how the proposed programmable time measurement architecture is implemented in a CMOS process. Post-silicon results from the fabricated test chip are presented, together with a detailed description of the test setup.

The International Technology Roadmap for Semiconductors (ITRS) is predicting that by 2010 clock frequencies of high performance VLSI devices will increase into the tens of GHz. To perform timing performance measurements of such devices, timing measurement architectures with capabilities of tens of femtoseconds will be required [1]. Whilst there are various architectures capable of achieving timing resolutions in the region of picoseconds [21], no single timing measurement architecture has been reported that is capable of achieving femtosecond resolution which is needed to verify the timing performance of future VLSI devices.

In Chapter 5, a new time measurement architecture for femtosecond resolution time measurement is proposed and simulation results are presented.

Finally, Chapter 6 summarises the presented work and concludes this thesis. The contributions outlined in Chapters 3, 4 and 5 resulted in original work published in [22-24] and are itemised in Appendix E.

## **Chapter 2**

## Literature review

Chapter 1 outlined limitations in traditional timing performance test methods. These problems have led researchers to investigate new test methodologies for on-chip time measurement testing. This chapter provides a review of the key advances in the field and of more recently reported work that aims to integrate a high resolution time measurement architecture (TMA) onto the same silicon as the circuit under test. The device that is described in this thesis is referred to as an embedded TMA. Section 2.1.1 gives an overview of the previous work carried out using the homodyne mixing technique. Section 2.1.2 describes the reported work carried out on the signal amplitude sampling technique and section 2.1.3 describes the reported work carried out using the time domain analysis technique. Section 2.2 briefly describes the previous work carried out on jitter measurement. Finally, section 2.3 provides concluding remarks for this chapter.

## 2.1 Time measurement techniques

Numerous time measurement techniques have been proposed. To facilitate the literature review and to identify future work, the time measurement architectures are classified into three groups: homodyne mixing, signal amplitude sampling and time domain analysis [10], as shown in Figure 2-1.



Figure 2-1: Classification of on-chip time measurement techniques

Each of these techniques and their suitability for on-chip time measurement testing is described and discussed in the following sections.

### 2.1.1 Homodyne mixing

Homodyne mixing is a technique that converts a phase difference into a DC voltage by modulating the input with a reference phase signal and passing the result through a low pass filter coupled to the output. The resulting DC voltage is a function of the phase difference [25]. Figure 2-2 shows the basic operation of the homodyne mixing method.



Figure 2-2: Homodyne Mixing [10]

The modulation is achieved by the use of an analogue multiplier, followed by a low pass filter. If two sinusoidal signals have the same amplitude, *A*, and frequency,  $\omega$ , but different phases,  $\phi_1$  and  $\phi_2$ , respectively, then the output of the mixer is given by

$$A^{2}\cos(\omega t + \phi_{1})\cos(\omega t + \phi_{2}) \tag{2.1}$$

$$=\frac{A^{2}}{2}(\cos(\phi_{1}-\phi_{2})+\cos(2\omega t+\phi_{1}+\phi_{2})) \tag{2.2}$$

The  $2\omega$  term is removed by the low pass filter, leaving only the DC term which is proportional to the cosine of the phase difference multiplied by a constant.

$$\frac{A^2}{2}(\cos(\phi_1 - \phi_2)) \tag{2.3}$$
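Equations 2.1 to 2.3 can be checked numerically. The sketch below is illustrative only: it assumes an ideal multiplier and models the low pass filter as an average over a whole number of periods, which removes the  $2\omega$  term exactly; the function name, the 1 GHz frequency and the 50 ps offset are hypothetical choices.

```python
import math


def mixer_dc_output(amplitude, freq_hz, phi1, phi2,
                    periods=10, samples_per_period=1000):
    """Multiply two equal-frequency cosines and average over whole periods.

    The average stands in for an ideal low pass filter (an assumption).
    """
    omega = 2.0 * math.pi * freq_hz
    n = periods * samples_per_period
    dt = 1.0 / (freq_hz * samples_per_period)
    total = 0.0
    for k in range(n):
        t = k * dt
        total += (amplitude * math.cos(omega * t + phi1)
                  * amplitude * math.cos(omega * t + phi2))
    return total / n


# Hypothetical example: two 1 GHz signals offset in time by 50 ps.
freq = 1e9
delta_t = 50e-12
phase_difference = 2.0 * math.pi * freq * delta_t
measured = mixer_dc_output(1.0, freq, 0.0, -phase_difference)
predicted = 0.5 * math.cos(phase_difference)     # Equation 2.3 with A = 1
print(measured, predicted)
```

Because the output depends on the cosine of the phase difference, the conversion is non-linear and, for a single mixer, cannot distinguish a lead from a lag of the same magnitude.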

This technique was used to measure the jitter and skew in a high-frequency clock distribution network [25], where the maximum sensitivity of the measurement system was measured to be 60 fs/mV. In this application, the DC voltage was measured using an off-chip voltmeter.

The main drawback of this technique is that the temporal resolution is directly proportional to the frequency of the input signals. As a result, a very high-resolution DC voltmeter is needed to measure the phase differences in the order of picoseconds for relatively low-frequency signals. Another disadvantage is the need for a periodic input signal and so it is not easy to perform rise and fall time measurements.
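The dependence on input frequency can be made explicit with a short worked derivation. This is a sketch under the assumption that the phase difference arises purely from a time offset  $\Delta t$  between the two signals, so that  $\phi_1 - \phi_2 = \omega \Delta t$ ; the symbols  $V_{DC}$ ,  $\Delta t$ ,  $\delta V$  and  $\delta t$  are introduced here for illustration only. Starting from Equation 2.3,

$$V_{DC} = \frac{A^2}{2}\cos(\omega \Delta t), \qquad \left|\frac{dV_{DC}}{d\Delta t}\right| = \frac{A^2}{2}\,\omega\,\left|\sin(\omega \Delta t)\right| \le \frac{A^2 \omega}{2}$$

For a voltmeter that can resolve a voltage step  $\delta V$ , the smallest resolvable time step is therefore at best

$$\delta t \approx \frac{2\,\delta V}{A^2 \omega}$$

which grows as  $\omega$  falls. Halving the signal frequency thus requires a voltmeter with twice the voltage resolution to maintain the same temporal resolution.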

## 2.1.2 Signal amplitude sampling

Signal amplitude sampling is a technique that involves sampling a signal well above the Nyquist rate  $(2f_{max})$  using a high-speed, high-resolution ADC to obtain phase information. This technique is used in digital sampling oscilloscopes and has been used in several applications for on-chip signal capture circuits. After the signal is captured, an off-chip digital signal processor (DSP) is used to manipulate the digitized output for time measurements [26-31]. Figure 2-3 shows an example of an input waveform that is triggered by the positive edge of the signal clock and is repeated once every *T* seconds. The sampling clock samples the data on its leading edge and a new data time point is captured each time the waveform is repeated. The output then comprises a "spread-out" version of the waveform, stretched by a factor of  $T/\Delta t$ .



Figure 2-3: Subsampling in the time domain [32]
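The sub-sampling principle of Figure 2-3 can be sketched in a few lines. This is an illustration under stated assumptions rather than a model of any cited circuit: the sampling clock period is taken to be  $T + \Delta t$ , times are integer picoseconds, and the sine waveform and function names are hypothetical. Under this assumption the stretch factor is  $(T + \Delta t)/\Delta t$ , which approaches the  $T/\Delta t$  quoted above when  $\Delta t \ll T$ .

```python
import math


def sine(t_ps, period_ps=1000):
    """Hypothetical periodic test waveform with a 1000 ps period."""
    return math.sin(2.0 * math.pi * t_ps / period_ps)


def subsample(waveform, period_ps, delta_t_ps, n_samples):
    """Take one sample per repetition, slipping by delta_t_ps each time.

    Returns tuples of (real sampling instant, equivalent time within the
    waveform, sampled value).
    """
    samples = []
    for k in range(n_samples):
        t_real = k * (period_ps + delta_t_ps)    # when the sample is taken
        t_equiv = k * delta_t_ps                 # which point of the waveform it shows
        samples.append((t_real, t_equiv, waveform(t_real % period_ps)))
    return samples


def stretch_factor(period_ps, delta_t_ps):
    """Ratio of real time between samples to equivalent time between samples."""
    return (period_ps + delta_t_ps) / delta_t_ps


samples = subsample(sine, 1000, 10, 100)
print(samples[:3], stretch_factor(1000, 10))
```

With a 1000 ps waveform and a 10 ps slip, one hundred repetitions of the waveform are needed to reconstruct a single period, which illustrates why the input must be periodic.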

One of the limitations of the sub-sampling technique is that two tightly controlled clocks are required. This is usually achieved by employing a delay-locked loop (DLL) to create a voltage controlled delay line (VCDL), or by a vernier approach that uses two DLLs [32]. The time resolution of this technique is limited not only by a gate delay, or by the difference of two gate delays if the vernier approach is used, but also by the jitter of the DLLs and the error in the matching of the buffer stages of the VCDL. This technique is impractical, as implementations tend to consume a large area [18]. Another limitation of this technique is that the input signal needs to be either periodic or rendered periodic from a clock edge; in the latter case, additional circuitry is needed.

### 2.1.3 Time domain analysis

Time domain analysis is the most commonly used method for time measurement applications, as it is more suitable for on-chip implementation and is attractive for low-voltage process technologies. A number of different techniques have been proposed [21, 33-61] and the following sections give a brief overview of these techniques.

### 2.1.3.1 Single Counter

The single counter technique [10] is often used for frequency measurements but is unable to provide high resolution for measuring short intervals. This is because the short intervals to be measured are less than a single gate delay. The resolution of this circuit is constrained by the speed of the reference clock and can be no finer than a single clock period. Figure 2-4 shows a typical implementation of the single counter technique. The operation is as follows: the counter counts the number of clock cycles of a high frequency clock signal. This count represents the time interval  $\Delta T$  between the rising edge of the start pulse and that of the stop pulse. The AND gate ensures that the counter is enabled only when the start and stop signals are logically different.



Figure 2-4: Single Counter [10]
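The quantisation behaviour of the single counter can be sketched as follows. This is a simplified behavioural model, not the circuit of Figure 2-4: it assumes ideal clock edges at integer multiples of the clock period, counts an edge when it falls at or after the start edge and before the stop edge, and uses integer picoseconds; the function name and the 2 GHz (500 ps) clock are hypothetical.

```python
def single_counter(start_ps, stop_ps, clock_period_ps):
    """Count clock edges that occur while start is high and stop is still low.

    Returns (count, measured interval in ps).
    """
    # Index of the first clock edge at or after each event (ceiling division).
    first_edge = -(-start_ps // clock_period_ps)
    end_edge = -(-stop_ps // clock_period_ps)
    count = end_edge - first_edge
    return count, count * clock_period_ps


# A 1740 ps interval is reported as 1500 ps with a 500 ps clock.
print(single_counter(130, 1870, 500))
# The same 50 ps interval can read as 0 ps or 500 ps depending on its phase.
print(single_counter(130, 180, 500))
print(single_counter(480, 530, 500))
```

The last two calls show why the technique is unsuitable for short intervals: the error is up to one full clock period regardless of the interval being measured.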

## 2.1.3.2 Interpolation

The implementation of the Interpolation technique is shown in Figure 2-5 [62, 63]. A complementary input is generated and applied to the inputs and used to steer current,  $I_0$ , from Q1 to Q2 and back to Q1. The resulting voltage across the capacitor will be proportional to the charging time and is given by

$$V_c = \frac{I_0}{C}t \tag{2.4}$$

where t is the time interval being measured. Switch S is opened and closed to precharge the capacitor to the supply voltage. Figure 2-6 shows the voltage across the capacitor, C.



Figure 2-5: Interpolation based time domain analysis [63]



Figure 2-6: Capacitor Voltage

The disadvantage of this circuit is that the parasitic capacitances cause a transient decay on all the time measurements and hence non-linearity. This parasitic capacitance comprises the emitter capacitances of the transistors Q1 and Q2, the collector capacitance of the current source, I, and the associated interconnect between the components. In addition, this technique requires a very high resolution ADC for high resolution time measurement applications.
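The ADC requirement follows directly from the linear ramp of Equation 2.4, and can be estimated with a short calculation. The sketch below assumes an ideal ramp with no parasitic effects; the component values (100 µA, 1 pF), the 10 ns measurement range and the function names are assumptions for illustration and are not taken from the cited designs.

```python
import math


def ramp_voltage(i0_amps, c_farads, t_seconds):
    """Magnitude of the capacitor voltage change after time t (Equation 2.4)."""
    return i0_amps * t_seconds / c_farads


def adc_bits_required(full_scale_time_s, time_resolution_s):
    """Bits needed so that one ADC step corresponds to one time step."""
    levels = full_scale_time_s / time_resolution_s
    return math.ceil(math.log2(levels))


i0 = 100e-6          # assumed steering current, 100 uA
c = 1e-12            # assumed integrating capacitor, 1 pF
full_scale = 10e-9   # assumed measurement range, 10 ns
resolution = 1e-12   # target time resolution, 1 ps

lsb_volts = ramp_voltage(i0, c, resolution)
bits = adc_bits_required(full_scale, resolution)
print(lsb_volts, ramp_voltage(i0, c, full_scale), bits)
```

Under these assumptions a 1 ps step changes the capacitor voltage by only about 0.1 mV, and covering a 10 ns range at that resolution needs a 14-bit converter.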

### 2.1.3.3 Dual Slope

This technique is based on a dual slope method similar to the one used in dual-slope analogue-to-digital converters (ADC) as shown in Figure 2-7 [43].



Figure 2-7: Basic Dual-Slope ADC [43]

The dual-slope integrator consists of a binary-controlled ratio resistor array, a capacitor, an opamp and a digital counter. Two binary words control the ratio-resistor array to perform the dual-slope charging and discharging. The dual-slope conversion is based on the equation

$$\Delta V = \frac{\Delta Q}{C} = I_{ch} \frac{\Delta t_1}{C} = I_{dis} \frac{\Delta t_2}{C} \tag{2.5}$$

If the currents  $I_{ch}$  and  $I_{dis}$  are constant then the relationship between the original time and the converted time is linear, such that

$$\Delta t_2 = \left(\frac{I_{ch}}{I_{dis}}\right) \Delta t_1 \tag{2.6}$$

The advantage of this technique is that it provides good linearity. The chip area is relatively small and the technique is compatible with any CMOS process. The technique is relatively simple and low power. The disadvantage of this technique is that it is noise sensitive, although this can be overcome by good layout practices, such as guard rings and shielding. Offset voltages and the settling time of the opamps, as well as errors in the resistor array, limit the resolution of this measurement technique.

## 2.1.3.4 Pulse stretching



Figure 2-8: Pulse stretching

A single ramp is often insufficient for measuring short time intervals. A dual slope ramp, generated by charging a capacitor between the start and stop events and then discharging it slowly, can overcome this deficiency. Figure 2-8 shows such a circuit [64]. By making the discharge rate slower than the charging rate, a time interval amplifier is in effect created, and a simple counter can then be used to measure the stretched interval. Figure 2-9 shows the voltage across the capacitor, C.



Figure 2-9: Capacitor voltage
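The time amplification effect can be sketched by combining the current ratio of Equation 2.6 with a simple counter. This is a behavioural illustration only: it assumes ideal constant currents, counts only the discharge phase, and uses hypothetical values (100 µA charge current, 1 µA discharge current, 500 ps counter clock) and function names.

```python
def stretched_interval(dt1_ps, i_charge, i_discharge):
    """Discharge time for a capacitor charged for dt1_ps (Equation 2.6)."""
    return dt1_ps * i_charge / i_discharge


def measure_with_stretch(dt1_ps, i_charge, i_discharge, clock_period_ps):
    """Stretch the interval, count whole clock periods, refer back to the input.

    Returns (count, estimated input interval in ps, effective resolution in ps).
    """
    stretch = i_charge / i_discharge
    count = int(stretched_interval(dt1_ps, i_charge, i_discharge) // clock_period_ps)
    estimate_ps = count * clock_period_ps / stretch
    effective_resolution_ps = clock_period_ps / stretch
    return count, estimate_ps, effective_resolution_ps


# A 37 ps interval stretched 100 times becomes 3700 ps, which a 500 ps
# counter clock can quantise.
print(measure_with_stretch(37, 100e-6, 1e-6, 500))
```

The effective resolution improves by the current ratio, at the cost of a conversion time that is longer by the same factor.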

### 2.1.3.5 Charge Pump

Figure 2-10 shows the implementation of the charge pump technique [57].



Figure 2-10: Charge Pump Technique [57]

The circuit consists of a charge pump, a comparator, a 6-bit digital counter and a capacitor. The proposed technique also includes an acquisition circuit, which consists of two additional comparators. The circuit operates in an analogue, continuous mode without using a sampling clock. It compares the signal to be measured with a reference signal by charging and discharging a capacitor, Cp. The disadvantages of this technique are that it uses three comparators and that the circuit is prone to switching errors, such as charge sharing and clock feed-through. Although the techniques to minimise these effects are well established, they cannot eliminate them completely. The effect on the output voltage, VCp, can lead to non-linearity in the output measurement.

### 2.1.3.6 Vernier Oscillator

Figure 2-11 shows an implementation of the vernier oscillator technique [65]. Two ring oscillators that are set by buffers  $\tau_s$  and  $\tau_f$  quantize the time interval to be measured. The start and stop pulses enable the oscillators and the time interval is measured by the phase detector and counter.


Figure 2-11: Vernier Oscillator [65]

The main disadvantage of this technique is that the accumulation of period jitter in the free running oscillators limits the achievable accuracy. The relative jitter between the oscillators prevents the accuracy required for multi-gigahertz applications from being achieved. In addition, the circuit suffers from a long test time.
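Both the resolution and the long test time of the vernier oscillator can be seen in a simple behavioural model. The sketch below assumes the common arrangement in which the start pulse enables a slower oscillator and the stop pulse enables a faster one, with the count taken when the faster edge first catches the slower edge; it ignores jitter, uses integer picoseconds, and the periods and function name are hypothetical.

```python
def vernier_oscillator(delta_t_ps, t_slow_ps, t_fast_ps, max_cycles=100000):
    """Count cycles until the fast oscillator edge catches the slow one.

    Returns (cycle count, estimated interval in ps, conversion time in ps).
    """
    if t_fast_ps >= t_slow_ps:
        raise ValueError("the stop oscillator must be faster than the start oscillator")
    for n in range(max_cycles + 1):
        slow_edge = n * t_slow_ps                  # started by the start pulse at t = 0
        fast_edge = delta_t_ps + n * t_fast_ps     # started by the stop pulse
        if fast_edge <= slow_edge:
            return n, n * (t_slow_ps - t_fast_ps), slow_edge
    raise RuntimeError("edges did not coincide within max_cycles")


# Oscillator periods of 1010 ps and 1000 ps give a 10 ps resolution.
print(vernier_oscillator(237, 1010, 1000))
```

In this example a 237 ps interval takes 24 oscillator cycles, over 24 ns, to convert, and any period jitter accumulates over all of those cycles.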

#### 2.1.3.7 Flash conversion

The flash conversion technique is well suited for use in on-chip timing measurement systems because it can be operated at high speed, offers a low test time and is relatively easy to integrate. However, clock jitter is often of the same order of magnitude as the temporal resolution of the TDC itself. Figure 2-12 shows a single delay chain flash converter. Each buffer produces a delay equal to  $\tau$  [66]. To ensure that the delay is accurately known, the delay chain is controlled by a delay-locked loop (DLL) [41]. The operation of the single delay flash converter is as follows. Each flip-flop compares the arrival time of the rising edge of the delayed start signal with the rising edge of the stop signal. A thermometer-encoded output indicates the value of the time difference,  $\Delta T$ , assuming the flip-flops are given sufficient time to resolve. The main disadvantage of the single delay flash converter is that the temporal resolution can be no smaller than a single gate delay.




Figure 2-12: Single delay chain flash converter
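The thermometer-coded output of the single delay chain can be illustrated with a behavioural model. This sketch assumes ideal flip-flops with no setup time, hold time or metastability, takes tap *i* to be the start edge delayed by  $i\tau$ , and uses integer picoseconds; the function name, the 50 ps buffer delay and the eight stages are hypothetical.

```python
def flash_tdc(delta_t_ps, tau_ps, n_stages):
    """Single delay chain flash converter (ideal flip-flops assumed).

    Flip-flop i outputs 1 if the start edge has reached tap i by the time the
    stop edge arrives. Returns (thermometer code, estimated interval in ps).
    """
    code = [1 if i * tau_ps <= delta_t_ps else 0
            for i in range(1, n_stages + 1)]
    return code, sum(code) * tau_ps


# A 180 ps interval with 50 ps buffers sets the first three flip-flops.
print(flash_tdc(180, 50, 8))
# An interval shorter than one buffer delay cannot be resolved.
print(flash_tdc(30, 50, 8))
```

The whole conversion completes in a single stop edge, which is the source of the low test time, but the estimate is always a whole number of buffer delays.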

To achieve a higher resolution that is smaller than a gate delay, the flash converter can be constructed with a vernier delay line as shown in Figure 2-13 [41].



Figure 2-13: Flash converter with a vernier delay line

The timing resolution is now dependent on the difference between two buffer delays rather than on the minimum gate delay used in the single delay flash converter. The delays  $\tau_1$  and  $\tau_2$  can be controlled by the sizing of the buffers or by using a voltage controlled delay cell [67]. The disadvantage of this technique is that the problems of vernier delay lines are now inherited. The timing resolution is limited by error factors such as mismatch in the delay elements and switching noise, and also by the physical length of the delay line. Usually, the delay line must be quite long to meet the required resolution, and this increases the size of the circuit as well as the power dissipation. For higher resolution, the delay buffers can be removed completely so that only the temporal offsets of the flip-flops themselves are used for time quantization. Figure 2-14 shows such a circuit.



Figure 2-14: Flash converter without vernier delay line
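The trade-off between resolution and line length in the vernier delay line of Figure 2-13 can be sketched as follows. This is a behavioural illustration under assumptions: the start signal is taken to travel through the slower buffers ( $\tau_1$ ) and the stop signal through the faster buffers ( $\tau_2 < \tau_1$ ), the flip-flops are ideal, and the delay values and function names are hypothetical.

```python
def vernier_delay_line(delta_t_ps, tau1_ps, tau2_ps, n_stages):
    """Flash converter with a vernier delay line (ideal flip-flops assumed).

    At stage i the start edge arrives at i*tau1 and the stop edge at
    delta_t + i*tau2; the flip-flop outputs 1 while start still leads stop.
    Returns (thermometer code, estimated interval in ps).
    """
    if tau2_ps >= tau1_ps:
        raise ValueError("tau1 must exceed tau2 for the stop edge to catch up")
    code = [1 if i * tau1_ps <= delta_t_ps + i * tau2_ps else 0
            for i in range(1, n_stages + 1)]
    return code, sum(code) * (tau1_ps - tau2_ps)


def stages_for_range(range_ps, tau1_ps, tau2_ps):
    """Number of stages needed to cover a given measurement range."""
    return -(-range_ps // (tau1_ps - tau2_ps))    # ceiling division


# Buffers of 55 ps and 50 ps give a 5 ps resolution.
print(vernier_delay_line(23, 55, 50, 16))
# Covering 1000 ps at that resolution needs 200 stages.
print(stages_for_range(1000, 55, 50))
```

The resolution is set by the delay difference rather than by either delay, but the number of stages, and hence area and power, grows in inverse proportion to that difference.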

This type of flash converter is known as a "sampling offset" converter [12]. The main disadvantage of this technique is that the flip-flops are not ideal and suffer from component mismatch and switching noise. The other main problem is that as the phase difference becomes smaller and smaller, the data and clock edges become very close to each other [68]. This gives rise to a metastability problem, where the setup and hold times of the flip-flop are violated [69]. In order to reduce the metastability time of the flip-flop, Abas *et al.* used a mutually exclusive (MUTEX) circuit to replace the traditional flip-flop/latch circuit within the vernier delay line. Figure 2-15 shows the circuit of the MUTEX.



Figure 2-15: MUTEX Circuit [70]

Although a metastability condition also exists in the MUTEX circuit, this is suppressed until the condition is resolved [70]. The block diagram of the time measurement architecture is shown in Figure 2-16.



Figure 2-16: time measurement circuit with MUTEX elements [70]

The resolution of this time measurement circuit incorporating the MUTEX circuit is 5 ps. This is similar to the vernier and flash based architectures.

| Year | Reference | Method                     | Resolution |
|------|-----------|----------------------------|------------|
| 2000 | [41]      | Vernier delay line         | 5 ps       |
| 2004 | [70]      | Vernier delay line         | 5 ps       |
| 2004 | [12]      | Flash                      | 5 ps       |
| 2003 | [71]      | Time-to-voltage conversion | 14 ps      |
| 2005 | [72]      | Vernier oscillator         | 18.5 ps    |
| 1999 | [52]      | Interpolator               | 30 ps      |
| 2002 | [36]      | Vernier oscillator         | 67 ps      |
| 2001 | [44]      | Time-to-voltage conversion | 350 ps     |

Table 2-1 summarises the development of on-chip time measurement architectures reported to date, from the highest resolution to the lowest.

Table 2-1: Recent work on on-chip time measurement architectures

It can be seen from Table 2-1 that the highest resolution achieved using an on-chip time measurement architecture is 5 ps. The time measurement techniques used to obtain this resolution are the vernier and flash methods, which use latches or flip-flops.

### 2.3 Concluding remarks

The literature review has identified various time measurement techniques with different resolutions. The work on time-to-digital conversion (TDC) forms the foundation for the research described in chapters 3, 4 and 5. The literature review also identifies a number of worthy problems that require further investigation. These include multiple measurements, programmability and sub-picosecond timing resolution, which together motivate this research.

This chapter has presented previous work on different types of time measurement architectures based on three types of time measurement techniques. However, the types of time measurement that these architectures can perform are limited. In order to obtain a different type of time measurement, additional circuitry is required and, in some cases, the time measurement architecture must be duplicated. Multiple time measurement architectures are therefore required on-chip to perform the required time measurements for a single CUT, and this may not be acceptable for some applications. The time measurement architecture proposed in Chapter 3 addresses this problem by performing multiple types of time measurement with a single architecture, thereby eliminating the need to duplicate or add circuitry in order to obtain different types of time measurements. Many of the architectures proposed in the literature have based their results on simulation alone. Therefore, practical validation is needed to gain maturity.

# Chapter 3

# Programmable Time Measurement Architecture

As illustrated in chapter 2, numerous circuits and techniques for on-chip time measurement testing have been proposed in the literature. The number of different types of time measurement that each can perform is limited, and additional circuitry is needed to perform other types of measurement. For example, in order to perform rise and fall time measurements, additional circuitry incorporating a voltage discriminator for 90% and 10% of the supply voltage is needed [57]. Within this thesis, these time measurement architectures are therefore classed as fixed, as they are specific to certain types of measurement, such as propagation delay and pulse width. In addition, the time measurement architecture has to be duplicated in order to perform several types of measurement. For example, multiple instances of the time measurement architecture need to be integrated if both a rise time measurement and a propagation delay measurement are required [10]. Multiple time measurement architectures will therefore be required on the chip, and this may not be suitable for some applications. In this chapter, a novel programmable time measurement architecture is proposed that can perform a number of different types of time measurement without the need for duplication or additional circuitry. The main attributes considered when designing the architecture were overall area, ease of integration and performance parameters such as resolution and dynamic range.

The outline of this chapter is as follows. Section 3.1 presents the proposed programmable time measurement architecture. In section 3.2 the programmable input stage is described. Section 3.3 describes the design of a high speed comparator that was employed in the overall design of the time measurement architecture. The time measurement architecture uses the time-to-voltage conversion (TVC) technique as the main measurement technique. In section 3.4 this method is described and different implementations are analysed; based upon these, a current steering time-to-voltage converter is described in section 3.5. Section 3.6 describes the digital processing block that generates the N-bit digital output code of the time measurement architecture, and simulations of the overall architecture are presented in section 3.7. Finally, section 3.8 concludes this chapter.

#### 3.1 Proposed time measurement architecture

The proposed programmable time measurement architecture (PTMA) is shown in Figure 3-1. The PTMA is a self-contained embedded core, which can be integrated on-chip adjacent to the core-under-test (CUT). It is composed of three main blocks: a programmable interface block (PIB), a time-to-voltage converter (TVC) and a digital processing block that generates an N-bit digital output code. The PTMA is based on the time-to-digital converter (TDC) method, where the time difference between two input signals is represented by a digital output code. The architecture is capable of performing four different types of time measurement: rise time, fall time, pulse width and propagation delay.



Figure 3-1: Proposed programmable timing measurement architecture

In order to configure the time measurement circuit to perform the different types of measurement, two pins, *mode0* and *mode1*, are programmed. This can be achieved by directly applying the correct voltages to the *mode0* and *mode1* inputs or by writing the correct values into a register using a microcontroller or other computing device. In practice, the test will be automated and the bits will be stored in memory on the ATE. Two bits are used to select the four programming modes. Binary coding is used instead of other coding schemes, such as thermometer coding, as it uses fewer programming bits and therefore saves memory that could be used for other test procedures. The modes of operation are shown in Table 3-1.

| Mode1 | Mode0 | Measurement       |
|-------|-------|-------------------|
| 0     | 0     | Rise Time         |
| 0     | 1     | Fall Time         |
| 1     | 0     | Pulse Width       |
| 1     | 1     | Propagation Delay |

 Table 3-1: Modes of operation
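The binary mode coding of Table 3-1 can be illustrated with a minimal Python sketch. The names `MODES`, `decode_mode` and `mode_register_value` are hypothetical and are not part of the design; the packing of the two bits into a register value (with *mode1* as the most significant bit) is likewise an assumption made for illustration.

```python
# Illustrative only: mapping of the (mode1, mode0) pins to the measurement
# types listed in Table 3-1. All names here are hypothetical.
MODES = {
    (0, 0): "Rise Time",
    (0, 1): "Fall Time",
    (1, 0): "Pulse Width",
    (1, 1): "Propagation Delay",
}


def decode_mode(mode1: int, mode0: int) -> str:
    """Return the measurement type selected by the two mode bits."""
    return MODES[(mode1, mode0)]


def mode_register_value(mode1: int, mode0: int) -> int:
    """Pack the two mode bits into one integer (assumed: mode1 is the MSB)."""
    return (mode1 << 1) | mode0


for bits in sorted(MODES):
    print(bits, "->", decode_mode(*bits))
```

Two bits are enough to address all four measurement types, whereas a thermometer code for four modes would need three bits.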

When the *start* signal is asserted high, the time measurement architecture converts the time difference, in this case a propagation delay between *Vin1* and *Vin2*, into an N-bit digital output code, which is stored in an output register for post analysis. Figure 3-2 shows the ideal waveforms of the proposed time measurement.



Figure 3-2: Proposed time measurement architecture waveforms

The output of the PIB is an inverted pulse that allows the capacitor in the TVC to charge and discharge. Then a comparator output signal is used to enable and disable the counter and latch the data into a register. In the following sections, each of the main building blocks is described.

The programmable interface block (PIB) consists of seven dynamic switches, a rail-to-rail comparator and some control logic. The reference voltages, VrefH, VrefM and VrefL, are generated on-chip by a voltage reference or supplied by an external voltage reference. As the current design is targeted at a 1.2V 0.12µm process, the reference voltages are currently 1.08V, 0.6V and 120mV respectively, representing 90%, 50% and 10% of the supply voltage. To perform a rise time measurement, switches sw<1> and sw<4>, as shown in Figure 3-3, are closed so that the comparator compares the rising input signal with VrefL. As soon as the output of the comparator goes high, switches sw<3> and sw<6> turn on and sw<1> and sw<4> turn off. The comparator now compares the input voltage with VrefH. When the input voltage crosses VrefH, the output of the comparator goes low. The output of the comparator passes through a switch that is opened by the high-to-low transition of the comparator output, forming a single pulse whose width represents the duration of the rising edge of the input voltage. This pulse is then converted into a voltage by the time-to-voltage converter (TVC), which is described in section 3.5. Finally, a digital processing block generates an N-bit digital output code.

A fall time measurement is achieved in a similar way, but the references *VrefH* and *VrefL* are reversed. For pulse width measurements, switches sw<0> and sw<4> are closed first and then switches sw<3> and sw<5> are closed to compare the Vin1 input with VrefM. For propagation delay measurements, switches sw<0> and sw<4> are used first and then sw<2> and sw<5>, so that Vin1 and Vin2 can each be compared with VrefM.



Figure 3-3: Programmable Interface Block (PIB)
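The threshold-crossing definitions that the PIB implements can be sketched behaviourally in Python. This is an idealised model under stated assumptions, not the switched single-comparator circuit itself: the waveform is assumed to be available as sampled lists, crossings are found by linear interpolation, and all function names are hypothetical. Only the 1.2V supply and the 90%, 50% and 10% reference levels are taken from the text.

```python
VDD = 1.2  # supply voltage quoted in the text
VREF_L, VREF_M, VREF_H = 0.1 * VDD, 0.5 * VDD, 0.9 * VDD


def crossing_time(times, volts, level, rising=True):
    """Linearly interpolated time at which volts first crosses level."""
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def rise_time(times, volts):
    """10% to 90% rise time: VrefL crossing first, then VrefH."""
    return crossing_time(times, volts, VREF_H) - crossing_time(times, volts, VREF_L)


def fall_time(times, volts):
    """90% to 10% fall time: the references are used in the reverse order."""
    t_high = crossing_time(times, volts, VREF_H, rising=False)
    t_low = crossing_time(times, volts, VREF_L, rising=False)
    return t_low - t_high


def propagation_delay(times, vin1, vin2):
    """Delay between the 50% (VrefM) crossings of two rising signals."""
    return crossing_time(times, vin2, VREF_M) - crossing_time(times, vin1, VREF_M)


# Example: an ideal 0 V to 1.2 V ramp lasting 100 ps.
print(rise_time([0.0, 100e-12], [0.0, VDD]))
```

In the PIB the same comparator is reused for both thresholds by switching its reference input, which this sketch abstracts into two independent threshold searches.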

As shown in Figure 3-3, the switch controller controls the opening and closing of the switches at the input to the comparator. The input from the comparator and the measurement mode that the architecture is set to determine which switches are opened and closed. The operation of the switch controller is given by the truth table in Table 3-2.

| Inputs  |        |         | Outputs |       |       |       |       |       |       |
|---------|--------|---------|-------|-------|-------|-------|-------|-------|-------|
| comp_in | mode_1 | mode_0  | sw<6> | sw<5> | sw<4> | sw<3> | sw<2> | sw<1> | sw<0> |
| 0       | 0      | 0       | 0     | 0     | 0     | 1     | 0     | 0     | 1     |
| 0       | 0      | 1       | 0     | 0     | 0     | 0     | 1     | 1     | 0     |
| 0       | 1      | 0       | 1     | 0     | 0     | 0     | 0     | 0     | 1     |
| 0       | 1      | 1       | 1     | 0     | 0     | 0     | 0     | 0     | 1     |
| 1       | 0      | 0       | 0     | 0     | 0     | 0     | 1     | 1     | 0     |
| 1       | 0      | 1       | 0     | 0     | 0     | 1     | 0     | 0     | 1     |
| 1       | 1      | 0       | 1     | 0     | 0     | 0     | 0     | 0     | 1     |
| 1       | 1      | 1       | 0     | 1     | 1     | 0     | 0     | 0     | 0     |

 Table 3-2: Switch Controller Truth Table

There are a number of different logic minimisation techniques [73] available to minimise the logic for the control block. Classical Karnaugh maps (K-maps) can take a long time to generate a solution; they are not well suited to more than six input variables and are only practical for up to four. The Quine-McCluskey method was the first alternative tabular method well suited to implementation in a computer program to speed up the process. The procedure starts with the truth table and generates the set of prime implicants of the logic functions. A systematic procedure is then followed to find the smallest set of prime implicants with which the output functions can be realised. This method was found to be inefficient in terms of processing time and memory: adding a variable to a function increases both, as the length of the truth table grows exponentially with the number of variables. As a result, the Quine-McCluskey method is only practical for functions with a limited number of input variables and output functions. The Espresso technique uses an algorithm that manipulates "cubes" that represent product terms instead of expanding the logic function into "minterms". Although the minimisation result is not guaranteed to be the global minimum, in practice it is a very close approximation and the solution is always free from redundancy. Compared to the previous methods, the Espresso method is more efficient in terms of memory usage and computation time. There are also no restrictions on the number of variables and output functions. This allows for efficient implementation, and Espresso has been incorporated as a standard logic minimisation step in logic synthesis tools. Therefore, this logic minimisation method was chosen for the synthesis of the switch controller block.
The logic for the switch controller is shown in the schematic in Figure 3-4.



Figure 3-4: Switch controller block schematic

Figure 3-5 shows a mixed-signal simulation of the switch controller using the SpectreVerilog simulator from Cadence Design Systems [74]. The waveforms show that, for a given input mode and comparator output, a set of logic values is placed onto the switch control bus that opens and closes the input switches of the TMA. For example, if a propagation delay type measurement is required, *mode0* and *mode1* are set to a logic level high. If the comparator input to the switch control block is at a logic level low, then the value on the switch control bus is set to a hexadecimal value of 22, which opens switches sw<3> and sw<1>, as shown in Figure 3-5. The Verilog description of the switch controller can be found in Appendix B.




Figure 3-5: Switch controller block simulations

The comparator control logic circuitry (see Figure 3-6) is incorporated within the TMA. The objective of this circuitry is to prevent the comparator output from changing state on each pulse of the input clock signal. The architecture of the control circuitry is designed so that only the first edge or pulse is measured, whether a rise time, fall time, pulse width or propagation delay type measurement is performed. In order to ensure that the output of the comparator is disconnected from the output of the PIB, a switch connected to the output of the comparator is used in conjunction with the control logic shown in Figure 3-6.



Figure 3-6: Comparator control logic circuitry

In order to demonstrate this operation, Figure 3-7 shows the input and output waveforms of the PIB. Firstly, the power up signal is asserted to enable the circuitry (see Figure 3-3). In order to start the measurement, the start signal is asserted to a logic level high. In this case, a propagation delay type measurement is performed, in which the two input signals *Vin\_1* and *Vin\_2* are applied to the input of the PIB as shown in Figure 3-3 and simulated in Figure 3-7.



Figure 3-7: Comparator control simulations

It can be seen that the comparator continues to change state after the portion of the input signal required by the measurement. This is because the comparator clock is still active. However, these transitions do not propagate to the output of the PIB, because the output of the comparator has been disabled. Therefore, the output of the PIB is a single inverted pulse that can now be supplied to the TVC for conversion.

Figure 3-8 and Figure 3-9 show the effect of adjacent switches being switched in opposite directions. This only ever happens in two cases. Firstly, during a rise time measurement adjacent switches sw<2> and sw<3> are open and closed respectively. After the first comparison, the output of the comparator causes the switches to switch in opposite directions. This also happens during a fall time measurement; switches sw<0> and sw<1> are initially open and closed respectively and, after the first comparison, the output of the comparator causes the switches to switch in opposite directions. As can be seen from Figure 3-8 and Figure 3-9, the rise and fall times of the switch control signals supplied by the control block to turn the input switches on and off are fast enough that, when two adjacent switches switch, the two switches conducting at the same time has little or no effect on the comparator inputs (*vinn* and *vinp*).



Figure 3-8: Adjacent switching during rise time measurement.



Figure 3-9: Adjacent switching during fall time measurement.

#### 3.3 High speed comparator design

The comparator is an important block, as it compares two input signals and outputs a binary signal. The speed and resolution of CMOS comparators are limited by the inherent MOSFET characteristics of low transconductance and relatively large device mismatches. There have therefore been a number of recent architectures for comparator design [75-82], and this section investigates a technique for high-speed, high-resolution comparator design for time-to-digital conversion applications. A block diagram of a high performance comparator is shown in Figure 3-10.



Figure 3-10: Block diagram of a high speed comparator [91]

The comparator consists of three stages: the input preamplifier, a positive feedback or decision stage, and an output buffer stage. The preamp stage amplifies the input signal to improve the comparator sensitivity, thereby reducing the minimum input signal with which the comparator can make a decision. The preamp stage also isolates the input of the comparator from switching noise or kickback noise coming from the positive feedback stage, which can affect the overall performance of the comparator [83]. The positive feedback or decision stage is used to amplify the small differential signal from the output of the preamplifier stage to the signal level needed to drive digital circuitry. Positive feedback is used to regenerate the analogue signal into a full scale digital signal. Finally, the output buffer amplifies this information and outputs the digital signal. The decision circuit is the heart of the comparator and should be able to discriminate mV level signals. The circuit that is used in the comparator is shown in Figure 3-11. The circuit uses positive feedback to increase the gain of the decision element.



Figure 3-11: Decision circuit

For the design of the comparator a rail-to-rail common mode input range is needed. Previous research has used two additional comparators to form a window comparator [57, 84] as shown in Figure 3-12.



Figure 3-12: Window comparator [57]

However, this uses additional resources. By using a comparator with a rail-to-rail input stage it is possible to eliminate the need for the two additional comparators [57]. A common solution to achieve rail-to-rail operation from a comparator is to decouple the inputs using capacitors or to use a switched-capacitor based sampling network [85], as shown in Figure 3-13, where Cs1-Cs4 sample the reference voltages (Vr1 and Vr2) and the input voltages (Vi1 and Vi2).



Figure 3-13: Switched-capacitor input sampling network [85]

During $\phi_1$, *Vi1* and *Vi2* are connected to *Cs2* and *Cs3*, and *Vr1* and *Vr2* are connected to *Cs4* and *Cs1*, respectively. This solution is costly in terms of area and noise, owing to the *kT/C* noise, where *k* is Boltzmann's constant, *T* is the temperature in kelvin and *C* is the value of the capacitor. It can be seen that the capacitors need to be large in order to reduce the noise. In order to minimise offsets at the input of the comparator, capacitors *Cs1-Cs4* need to be well matched. In addition, a non-overlapping clock generator, such as the one shown in Figure 3-14 [11], is required to generate the non-overlapped clocks, $\phi_1$ and $\phi_2$. Thus, extra circuitry is required.



Figure 3-14: Non-overlapped clock generator [11]

Another solution is to design a comparator with two complementary differential pairs. Figure 3-15 shows two amplifiers, one with a PMOS input and one with an NMOS input.



Figure 3-15: PMOS and NMOS differential amplifiers.

The common mode input voltage range of the amplifier with the PMOS input is given by:

$$V_{SS} + V_{GS3} + |V_{TH1}| < V_{CM} < V_{DD} - V_{SD5} - V_{GS1}$$
(3.1)

where the  $V_{GS3}$  is the gate source voltage of the transistor M3,  $V_{TH1}$  is the threshold voltage of transistor M1,  $V_{SD5}$  is the source drain voltage of transistor M5 and  $V_{GS1}$  is the gate source voltage of transistor M1. The common mode input voltage of the amplifier with the NMOS input is given by:

$$V_{SS} + V_{DS10} + V_{GS6} < V_{CM} < V_{DD} - |V_{GS8}| + V_{TH6}$$
(3.2)

where  $V_{DS10}$  is the drain source voltage of transistor *M10*,  $V_{GS6}$  is the gate source voltage of transistor *M6*, and  $V_{GS8}$  is the gate source voltage of transistor *M8* and  $V_{TH6}$  is the threshold voltage of transistor *M6*. When the n-channel and p-channel input pairs are placed in parallel, the common mode input range becomes:

$$V_{SS} + V_{DS5} + V_{GS1} < V_{CM} < V_{DD} - |V_{GS8}| + V_{TH6}$$
(3.3)

A schematic of a rail-to-rail clocked comparator is shown in Figure 3-16 [11]. The input of the comparator consists of a rail-to-rail input stage that contains two complementary differential pairs in parallel. When one of the common-mode (CM) inputs is close to VDD, differential pair M1,M2 is active. When the common-mode input is close to VSS, differential pair M3,M4 is active. Vpbias and Vnbias are the bias voltages for the tail currents of the two differential pairs which are supplied by a bias circuit; this is not shown for simplicity. The operation of the comparator is as follows. When Vlatch is low, the comparator is in a reset state and transistors M11,M12 couple the drains of transistors M9,M10 to VDD.



Figure 3-16: Rail-to-Rail Comparator

Consequently, transistors *M13,M14* are off and there is no supply current flowing. When *Vlatch* is high, transistors *M13* and *M12* are open. The cross coupled regenerative inverters amplify the voltage difference and one of the output nodes is at *VDD*, the other is at *VSS*. The regeneration time  $T_{reg}$  of the comparator can be approximated as follows.

$$T_{reg} \approx \frac{C_{gs7,8}}{gm_{7,8}}$$
 (3.4)

where $gm_{7,8}$ is the transconductance of transistors *M7* and *M8*, and $C_{gs7,8}$ is their gate source capacitance. The transconductance of transistors *M7* and *M8* is given by:

$$gm_{7,8} = KP_n \frac{W}{L}(vgs - vth)$$
(3.5)

where *W* and *L* are the width and length of transistors *M7* and *M8*, *vgs* is the gate source voltage, *vth* is the threshold voltage and $KP_n$ is the transconductance parameter for an n-channel transistor, given by

$$KP_n = \mu_n C_{ox} = \mu_n \frac{\varepsilon_{ox}}{t_{ox}}$$
(3.6)

where $\mu_n$ is the electron mobility in the n-channel, $C_{ox}$ is the gate oxide capacitance per unit area, $\varepsilon_{ox}$ is the permittivity of silicon oxide and $t_{ox}$ is the thickness of the oxide. The output of the comparator changes on the rising or falling edge of a clock signal. The SR latch is used so that the outputs of the comparator change on the rising edge of the clock signal. The comparator has been designed through intensive optimisation of the transistor dimensions (see Table 3-3), instead of employing the offset cancellation techniques often associated with high speed comparator design. Offset cancellation is a popular technique in analogue MOS circuits. However, if the time slot for comparison differs from the time slot for offset storage, the offset voltage stored during the offset interval is not expected to be the same as the offset present during the comparison. In a noisy environment this will cause noise in the converted signal. With this technique, an offset caused by clock feedthrough is also inevitable [11, 86].

| Device | Width (µm) | Length (µm) |
|--------|------------|-------------|
| M1     | 1.2        | 0.2         |
| M2     | 1.2        | 0.2         |
| M3     | 3.4        | 0.2         |
| M4     | 3.4        | 0.2         |
| M5     | 1.2        | 0.4         |
| M6     | 3.4        | 0.4         |
| M7     | 2.8        | 0.2         |
| M8     | 2.8        | 0.2         |
| M9     | 3.4        | 0.2         |
| M10    | 3.4        | 0.2         |
| M11    | 3.4        | 0.2         |
| M12    | 3.4        | 0.2         |
| M13    | 2.8        | 0.2         |
| M14    | 2.8        | 0.2         |

**Table 3-3: Comparator transistor sizes** 

The following Cadence plot (Figure 3-17) shows a simulation of the comparator when two clock edges are applied to the input of the comparator with a delay of 10ps between them. As can be seen, the comparator shows good performance and is able to distinguish differences as small as 10ps.



Figure 3-17: Simulation of comparator with input difference of 10ps

#### **3.3.1** Comparator bias circuitry

In order to generate the bias voltages, *Vnbias* and *Vpbias*, for the comparator shown in Figure 3-16, a bias circuit is required. The bias circuitry used is shown in Figure 3-18 [11]. The gate source voltage of *M1* is given as

$$v_{gs1} = v_{gs2} + I_{D2} \cdot R \tag{3.7}$$

If the channel length modulation effect is neglected, equation 3.7 can be written as follows:

$$\sqrt{\frac{2I_{D1}}{\mu_n C_{ox}(W/L)_1}} + vth = \sqrt{\frac{2I_{D2}}{\mu_n C_{ox}(W/L)_2}} + vth + I_{D2} \cdot R$$
(3.9)

where $\mu_n$ is the mobility of the NMOS transistors and $C_{ox}$ is the oxide capacitance. Neglecting the body effect, equation 3.9 can be rewritten as follows

$$\sqrt{\frac{2I_{D2}}{\mu_n C_{ox} (W/L)_2}} \left(1 - \frac{1}{\sqrt{K}}\right) = I_{D2} \cdot R$$
(3.10)

Therefore,

$$I_{D2} = \frac{2}{\mu_n C_{ox} (W/L)_2} \cdot \frac{1}{R^2} \left( 1 - \frac{1}{\sqrt{K}} \right)^2$$
(3.11)

In this design K is equal to 4. The final values are shown in Figure 3-18, where M is the multiplying factor which represents the number of devices in parallel.
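As a short worked step, and taking equation 3.11 exactly as printed, substituting $K = 4$ gives $\left(1 - 1/\sqrt{4}\right)^2 = 1/4$, so the expression simplifies as shown below. No numerical values for $\mu_n C_{ox}$, $(W/L)_2$ or $R$ are assumed here.

$$I_{D2}\Big|_{K=4} = \frac{2}{\mu_n C_{ox} (W/L)_2} \cdot \frac{1}{R^2} \left( 1 - \frac{1}{\sqrt{4}} \right)^2 = \frac{1}{2\,\mu_n C_{ox} (W/L)_2\, R^2}$$

To first order the bias current therefore depends on the resistor value and the device parameters rather than on the supply voltage.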



Figure 3-18: Bias circuit with start-up circuitry [11]

Transistors M6-M7 form a differential amplifier which compares the drain voltage of M1 (Vnbias) with the drain voltage of M2 and regulates them to be equal. This results in an effective increase in M2's output resistance through feedback. The operation is as follows: if the drain voltage of M2 is above Vnbias, the amplifier output increases. This drives the gate of M4 upwards, lowering the current it supplies and causing Vreg to drop back down. At the same time the gate of M3 is also driven upwards, causing it to source less current. This causes a drop in Vnbias, which is the same as the drain voltage of M2 due to symmetry. Figure 3-19 shows the reference current with change in VDD when the amplifier has a gain of 10dB. To make the reference stable, 250fF MOS capacitors formed by M9 and M10 are added to the circuit.



Figure 3-19: Output current versus supply voltage

This bias circuit does not require external biasing and is therefore self-biased. As with any self-biased circuit there are two possible operating points: one where no current flows, in which case the circuit will remain in this stable state forever, and the other where the circuit is in the desired operating state. In order to ensure that the first state does not persist, a start-up circuit is included, as shown in Figure 3-18. The operation of the start-up circuitry is as follows. Transistors M11-M13 form the start-up circuitry. At start-up, when there is no current flowing, the gates of M1 and M2 are at ground, while the gates of M3 and M4 are at VDD. In this state, the gate of M11 is at ground and M11 is turned off. The gate of M12 is somewhere between VDD and VDD-Vthp. Transistor M13 behaves like an NMOS switch which turns on and leaks current into the gates of M1,M2 from the gates of M3,M4. This causes the current to jump to the desired operating state, after which M13 turns off.

#### **3.3.2** Comparator simulations

The comparator was designed using a $0.12\mu m$ CMOS process with a supply voltage of 1.2V. The propagation delay time of the comparator was defined as the time from 50% of the strobe input pulse to the output crossing the 0.6V (50%) threshold. Table 3-4 shows the simulation results of the comparator's propagation delay across process corners.

| Process | Voltage [V] | Temp [degC] | Propagation Delay [ps]        |
|---------|-------------|-------------|-------------------------------|
| Best    | 1.32        | -40         | 160.47                        |
| Typical | 1.20        | 25          | 175.65                        |
| Worst   | 1.08        | 120         | 191.11                        |

Table 3-4: Comparator propagation delay across process corners

A common mode input voltage reference of 0.6V and a 2.5 GHz clock are supplied to the comparator. A propagation delay, measured from 50% of the clock signal to 50% of the output low-to-high transition, of approximately 200ps is achieved with a capacitive load, $C_{load}$, of 50fF connected to the output of the comparator; this load was used in all simulations. The simulated power dissipation was found to be 999µW with a supply voltage of 1.2V and a current consumption of 832.5µA. The following simulations show the comparator operation over process, voltage and temperature (PVT) corners. Figure 3-20 shows the typical process corner, Figure 3-21 shows the fast process corner and Figure 3-22 shows the slow process corner. It can be seen that the comparator performs as expected over the PVT corners.
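As a consistency check on the quoted figures, the power dissipation is the product of the supply voltage and the supply current. The symbols $P$ and $I_{DD}$ are introduced here only for this check and are not used elsewhere in the chapter.

$$P = V_{DD} \cdot I_{DD} = 1.2\,V \times 832.5\,\mu A = 999\,\mu W$$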





Figure 3-21: Worst process corner (FF, 1.08V, -40 degC)



Figure 3-22: Best process corner (SS, 1.32V, 125 degC)

#### 3.4 Time-to-voltage conversion

The time-to-voltage converter (TVC) is based on the dual-slope technique of charging up a capacitor *C* during a time interval $T_1$ using a constant current source, $I_{ch}$, and discharging it over a longer time interval $T_2$ using another constant current source, $I_{dis}$. This technique is widely used in time measurement architectures [51, 57, 87]. Figure 3-23 depicts the operation of the TVC.



Figure 3-23: TVC operation

During the charging phase of the TVC, the voltage at the integration capacitor C, is given by:

$$\Delta V_{ch} = \frac{I_{ch}}{C} * \Delta T_1 \tag{3.12}$$

During the discharge phase the capacitor voltage is given by:

$$\Delta V_{dis} = \frac{I_{dis}}{C} * \Delta T_2 \tag{3.13}$$

where

$$\Delta T_2 = Count * T_{clk} \tag{3.14}$$

Equating  $\Delta V_{ch} = \Delta V_{dis}$  gives :

$$\frac{I_{ch}}{C} * \Delta T_1 = \frac{I_{dis}}{C} * Count * T_{clk}$$
(3.15)

Therefore

$$\Delta T_1 = \frac{I_{dis}}{I_{ch}} * Count * T_{clk}$$
(3.16)

As can be seen in equation (3.16), the measured input interval, $\Delta T_1$, is a function of the ratio of the discharging current, $I_{dis}$, to the charging current, $I_{ch}$. Therefore the absolute accuracy of the currents $I_{ch}$ and $I_{dis}$ has little effect on the accuracy of the measurement, and temperature effects can be minimised by established layout techniques for current mirrors. A number of circuits have been proposed to implement the TVC [51, 87, 88] and Figure 3-24 shows a simplified view of the existing implementations.
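The dual-slope relationship of equations (3.12) to (3.16) can be illustrated with a small behavioural model in Python. This is an idealised sketch under stated assumptions: the currents are perfectly constant, the counter is assumed to increment once per clock period until the capacitor returns to its starting voltage, and the function names and the capacitor values used in the example are hypothetical.

```python
def dual_slope_count(delta_t1, i_ch, i_dis, c, t_clk):
    """Clock periods needed to discharge the capacitor back to its start level."""
    v_peak = i_ch * delta_t1 / c      # charge phase, equation (3.12)
    dv_per_clk = i_dis * t_clk / c    # voltage removed per clock period, from (3.13)
    v, count = v_peak, 0
    # Assumption: the counter increments every clock period until the
    # comparator sees the capacitor back at its starting voltage.
    # The small tolerance guards against floating-point residue.
    while v > 1e-9 * v_peak:
        v -= dv_per_clk
        count += 1
    return count


def measured_interval(count, i_ch, i_dis, t_clk):
    """Equation (3.16): recover the input interval from the counter value."""
    return (i_dis / i_ch) * count * t_clk


# Illustrative values only: 40 uA charge, 8 uA discharge, 500 ps clock period.
for c in (1e-12, 2e-12):  # hypothetical capacitor values
    n = dual_slope_count(0.95e-9, 40e-6, 8e-6, c, 500e-12)
    print(c, n, measured_interval(n, 40e-6, 8e-6, 500e-12))
```

Running the loop with two different capacitor values gives the same count, which reflects the cancellation of *C* in equation (3.15); the recovered interval differs from the applied one only by the quantisation of the counter.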



Figure 3-24: Current TVC implementations for embedded memory characterization

Configuration 3-24(a) is based on an integrator, where the capacitor $C_{int}$ is charged and discharged by $SW_{dis}$. Configurations 3-24(b) and 3-24(c) are very similar; when switch *S1* is closed, they both use a constant current source, *I1*, to charge up the capacitor, *C*. The resulting voltage ramp is directly proportional to time, and the voltage across the capacitor, $V_c$, is given by equation (3.17).

$$V_c(t) = \frac{I1}{C} * t \tag{3.17}$$

When switch *S2* is closed and switch *S1* is opened, the capacitor discharges. However, in Figure 3-24(c) a constant current *I2* is used to slowly discharge the capacitor. Each of the configurations in Figure 3-24 has non-linearities which limit its dynamic range. Configuration 3-24(a) has non-linearity caused by the settling time of the opamp. Both configurations 3-24(b) and 3-24(c) have non-linearity in their voltage transfer curves (VTC) as shown in Figure 3-25.



Figure 3-25: VTC of configurations (b) and (c).

This non-linearity arises because when switch S1 is open, the voltage at node (x) is at VDD, but when S1 closes, this voltage rapidly falls to zero. This requires charge redistribution in the transistors of the current source, which leads to a transient current in excess of the steady-state current. In order to avoid this, the current from the current source may be diverted through a second switch while S1 is open. This can be achieved by the use of a current steering time-to-voltage converter, which is described in section 3.5.

#### **3.5 Current steering time-to-voltage converter**

Figure 3-26 shows a circuit diagram of a current steering time-to-voltage converter. This type of circuit is often used in phase locked loop (PLL) circuits [89], but here it is used in the proposed current steering TVC to achieve better linearity and dynamic range than previous TVC architectures. The operation of the circuit is as follows. Transistors M7, M8 and M11-M16 form a current reference similar to the one used for biasing the comparator circuit, where the drain current, $I_{D8}$, flowing in transistor M8 is given by:

$$I_{D8} = \frac{v_{gs16} - v_{gs8}}{R} \tag{3.18}$$

where $v_{gs16}$ and $v_{gs8}$ are the gate-to-source voltages of transistors *M16* and *M8* respectively and *R* is the resistor connected from the source of transistor *M8* to *vss*. The current mirror formed by *M11-M14* tries to force the current flowing in *M16* to be equal to $I_{D8}$, but this can only happen if $v_{gs16}$ is greater than $v_{gs8}$. To ensure this, the width of *M8* is made *K* times as large as the width of *M16*; in this case *K* is equal to 4. The currents are then mirrored to devices *M9,M10* and *M5,M6*, which are used to charge and discharge the capacitor, *C*, depending on the voltages of *Vin* and *VinB*.



Figure 3-26: Current steering time-to-voltage converter (TVC)

Simulations of the current steering TVC and the commonly used configuration 3-24(b) are shown in Figure 3-27. As can be seen from the simulations, the current steering TVC has better linearity and dynamic range compared to that of configuration 3-24(b).



Figure 3-27: Simulations of TVC

## **3.6 Digital processing block**

The final block of the PTMA is the digital processing block. In order to convert the output voltage from the TVC (Figure 3-1) to an N-bit digital output, a comparator, a counter and a register are used. The counter is enabled and disabled by the output of the comparator. This comparator is exactly the same type of comparator as that used for the programmable interface block described in section 3.2, and it enables the counter to start when the capacitor begins discharging. When the capacitor has stopped discharging, the comparator outputs a signal to stop the counter and latch the final count value in the register. Figure 3-28 shows a simplified view of the processing block.




Figure 3-28: Simplified view of processing block

The measured result can be calculated using the following equation.

$$\Delta T_1 = \frac{I_{dis}}{I_{ch}} * Count * T_{clk}$$
(3.19)

where *Count* is equal to the digital output value of the counter,  $I_{ch}$  and  $I_{dis}$  are the charging and discharging currents of the TVC, as described in section 3.4.  $T_{clk}$  is equal to the period of the clock frequency supplied to the counter; in this case a clock frequency of 2 GHz is generated on chip using a ring oscillator. The resolution of the TMA,  $T_{res}$ , is given by the ratio of the discharging current,  $I_{dis}$ , and the charging current,  $I_{ch}$ , of the TVC and multiplied by the period of the counter clock frequency,  $T_{clk}$ , as shown in equation 3.20.

$$T_{res} = \frac{I_{dis}}{I_{ch}} * T_{clk}$$
(3.20)

Using a clock frequency of 2 GHz and charging and discharging currents of $40\mu$A and $8\mu$A respectively, the resolution can be calculated to be approximately 100ps. In this design, 2 GHz is the maximum clock frequency that is achievable for correct operation using a 0.12µm CMOS technology. Based on this, it is likely that with technology scaling to smaller geometries the resolution can be improved.
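The arithmetic behind this figure follows directly from equation 3.20, with the clock period taken as the reciprocal of the 2 GHz clock frequency:

$$T_{clk} = \frac{1}{2\,GHz} = 500\,ps, \qquad T_{res} = \frac{8\,\mu A}{40\,\mu A} * 500\,ps = 100\,ps$$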

#### **3.7 PTMA simulation results**

To verify the proposed programmable time measurement architecture (PTMA), the design was implemented and simulated using 0.12 $\mu$ m CMOS process models. The charging and discharging currents of the TVC are designed to be approximately I<sub>ch</sub> = 60 $\mu$ A and I<sub>dis</sub> = 5 $\mu$ A, respectively. This gives a longer slope during the discharging phase than during the charging phase. Figure 3-29 shows the simulation results of the proposed time measurement block.



Figure 3-29: 800ps propagation delay time measurement

The input signals applied to the PTMA, *Signal1* and *Signal2*, have a propagation delay time of 800ps. The capacitor voltage at the output of the TVC shows good linearity, brought about by incorporating the current steering TVC. The improved linearity in turn improves the dynamic range compared to previous configurations. The digital output code D4-D0 generated by the digital processing block was  $10100_2 = 20_{10}$ . The measured value can be calculated using the following equation.

$$\Delta T_1 = \frac{I_{dis}}{I_{ch}} * Count * T_{clk}$$
(3.21)

$$=\frac{5}{60}*20*500*10^{-12}=833\,ps$$
(3.22)

This gives a measured value of approximately 833ps. The percentage error can be calculated using the following equation.

$$Error\% = \frac{measured \_value - actual \_value}{actual \_value} *100\%$$
(3.23)

$$=\frac{833\,ps - 800\,ps}{800\,ps} * 100\% = 4.16\% \tag{3.24}$$

This gives an error of approximately 4 percent. This error is caused by a number of factors. Firstly, the charging capacitor used in the TVC is non-linear. Secondly, the comparator of the PIB stage has a propagation delay of approximately 175ps, as shown in Table 3-4, which adds to this error. Therefore, the temporal resolution,  $T_{res}$ , is given by the following equation:

$$T_{res} = Comp_{pd} + \frac{I_{dis}}{I_{ch}} * T_{clk}$$
(3.25)

where  $Comp_{pd}$  is the propagation delay of the comparator,  $I_{ch}$  and  $I_{dis}$  are the charging and discharging currents of the TVC and  $T_{clk}$  is the period of the clock supplied to the counter of the digital processing block. Therefore, the overall resolution of the PTMA is approximately 300ps.
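The worked calculation in equations 3.21 to 3.24 can be reproduced with a short script. This is only a sketch using the simulation values stated above (a count of 20, currents of 60µA and 5µA, and a 500ps clock period); the function names are illustrative assumptions.

```python
def measured_interval(count, i_ch, i_dis, t_clk):
    """Equation 3.21: delta_T = (I_dis / I_ch) * Count * T_clk, in seconds."""
    return (i_dis / i_ch) * count * t_clk


def percent_error(measured, actual):
    """Equation 3.23: signed error of the measurement as a percentage."""
    return (measured - actual) / actual * 100.0


I_CH = 60e-6     # charging current in amperes
I_DIS = 5e-6     # discharging current in amperes
T_CLK = 500e-12  # period of the 2 GHz counter clock in seconds

measured = measured_interval(count=20, i_ch=I_CH, i_dis=I_DIS, t_clk=T_CLK)
error = percent_error(measured, actual=800e-12)

print(f"measured = {measured * 1e12:.0f} ps")
print(f"error    = {error:.2f} %")
```

The error is signed, so a positive value indicates that the PTMA reports a longer interval than the one applied.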

A number of simulations have been made in order to show the accuracy and linearity of the proposed programmable time measurement architecture in each of its modes. Table 3-5 shows the results of the PTMA configured for propagation delay type measurement.

| Actual [ps] | Count | Simulated [ps] |
|-------------|-------|----------------|
| 400         | 9     | 361            |
| 800         | 20    | 833            |
| 900         | 21    | 861            |
| 1000        | 25    | 927            |
| 1200        | 30    | 1235           |
| 1300        | 33    | 1380           |
| 1500        | 39    | 1610           |
| 1700        | 42    | 1750           |
| 1900        | 46    | 1902           |
| 2000        | 51    | 2130           |
| 2400        | 60    | 2500           |
| 2800        | 65    | 2710           |
| 2900        | 68    | 2830           |
| 3000        | 71    | 2960           |

Table 3-5: Propagation delay simulation results

Figure 3-30 shows a plot of the simulation results for the actual and measured delays for a propagation time delay measurement.



Figure 3-30: Propagation delay measurement

The following table shows the simulation results of the PTMA when programmed for a rise time measurement.

| Actual [ps] | Count | Simulated [ps] |
|-------------|-------|----------------|
| 400         | 9     | 375            |
| 800         | 20    | 833            |
| 900         | 21    | 875            |
| 1000        | 25    | 1042           |
| 1200        | 30    | 1250           |
| 1300        | 32    | 1333           |
| 1500        | 38    | 1583           |
| 1700        | 39    | 1625           |
| 1900        | 45    | 1875           |
| 2000        | 49    | 2040           |
| 2400        | 56    | 2300           |
| 2800        | 64    | 2670           |
| 2900        | 72    | 3000           |
| 3000        | 78    | 3080           |

Table 3-6: Rise time simulation results

The simulated rise time measurement is shown in Figure 3-31.



Figure 3-31: Rise time versus output count from the PTMA

Similarly, Table 3-7 shows the results of the PTMA configured for pulse width type measurement.

| Actual [ps] | Count | Simulated [ps] |
|-------------|-------|----------------|
| 400         | 9     | 361            |
| 800         | 20    | 833            |
| 900         | 21    | 861            |
| 1000        | 25    | 927            |
| 1200        | 30    | 1235           |
| 1300        | 31    | 1277           |
| 1500        | 39    | 1610           |
| 1700        | 39    | 1610           |
| 1900        | 46    | 1902           |
| 2000        | 50    | 1902           |
| 2400        | 60    | 2500           |
| 2800        | 69    | 2920           |
| 2900        | 70    | 2830           |
| 3000        | 80    | 2880           |

Table 3-7: Pulse width simulation results

Figure 3-32 shows a plot of the simulation results for the actual and measured delays for a pulse width measurement.



Figure 3-32: Pulse width measurement

The following table shows the simulation results of the PTMA when programmed for a fall time measurement.

| Actual [ps] | Count | Simulated [ps] |
|-------------|-------|----------------|
| 400         | 9     | 375            |
| 800         | 20    | 833            |
| 900         | 21    | 875            |
| 1000        | 25    | 1042           |
| 1200        | 30    | 1250           |
| 1300        | 32    | 1333           |
| 1500        | 38    | 1583           |
| 1700        | 39    | 1625           |
| 1900        | 45    | 1875           |
| 2000        | 52    | 1917           |
| 2400        | 59    | 2330           |
| 2800        | 68    | 2920           |
| 2900        | 72    | 3000           |
| 3000        | 78    | 3080           |

Table 3-8: Fall time simulation results

The simulated fall time measurement is shown in Figure 3-33.



Figure 3-33: Fall time versus output count from the PTMA

In Figures 3-30 to 3-33, which plot the actual time to be measured against the digital output count, there exists some non-linearity. This non-linearity, which has produced errors in the conversion, is primarily caused by the propagation delay of the comparator. Therefore, the speed of the comparator is a very important specification in this time measurement architecture, as it determines the starting and stopping of the counter, as well as when the switches switch after making the first measurement.
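One way to put a number on the non-linearity discussed above is to fit a straight line to the actual delay against the output count and compare its slope with the ideal step of $(I_{dis}/I_{ch}) \cdot T_{clk}$. The sketch below does this for the data of Table 3-5. The least-squares fit is an analysis choice made here for illustration and is not a method used in the thesis.

```python
# (actual delay in ps, output count) pairs copied from Table 3-5.
TABLE_3_5 = [
    (400, 9), (800, 20), (900, 21), (1000, 25), (1200, 30),
    (1300, 33), (1500, 39), (1700, 42), (1900, 46), (2000, 51),
    (2400, 60), (2800, 65), (2900, 68), (3000, 71),
]


def fit_line(points):
    """Least-squares fit of actual_time = slope * count + offset."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    s_cc = sum((c - mean_c) ** 2 for _, c in points)
    s_ct = sum((c - mean_c) * (t - mean_t) for t, c in points)
    slope = s_ct / s_cc
    return slope, mean_t - slope * mean_c


slope, offset = fit_line(TABLE_3_5)

# Ideal time per count from equation 3.19 with 5 uA / 60 uA and a 500 ps clock.
ideal_slope = (5 / 60) * 500

# Largest deviation of any table entry from the fitted straight line.
worst = max(abs(t - (slope * c + offset)) for t, c in TABLE_3_5)

print(f"fitted slope    : {slope:.2f} ps/count")
print(f"ideal slope     : {ideal_slope:.2f} ps/count")
print(f"fitted offset   : {offset:.1f} ps")
print(f"worst deviation : {worst:.1f} ps")
```

A non-zero offset in such a fit would be consistent with a fixed delay, such as the comparator propagation delay, being added to every measurement, while the worst deviation gives a rough indication of the remaining non-linearity.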

Figure 3-34 shows the effect of the rise and fall times on the input transmission gates when switching from one measurement to another. The first measurement to be made is a rise time measurement, for which both *Mode0* and *Mode1* are set to zero. On a rising edge of the *start* signal, the comparator compares the two input signals, *Vin\_1* and *Vin\_2*. As can be seen, at this point the capacitor voltage of the TVC starts to rise and then fall as expected. Once the measurement has finished, both mode inputs, *Mode0* and *Mode1*, are asserted high to perform a propagation delay measurement. The circuit is disabled and enabled again, and on the assertion of the next *start* signal the measurement is performed. Figure 3-34 also shows the data output of the TMA.



Figure 3-34: The effect of rise and fall time on inputs when switching from one measurement to another

Figure 3-35 shows a plot of the simulated conversion times for a rise time measurement. As can be seen, the maximum conversion time is 56ns, which is for a 9ns rise time measurement. It should also be noted that for a TMA using the TVC technique, the conversion time increases as the measured input time increases.



Figure 3-35: Conversion time

### **3.8 Concluding remarks**

This chapter has presented a new programmable time measurement architecture and associated circuits. In previous approaches that are capable of performing different types of time measurements, additional circuitry had to be added or duplicated. The proposed circuit is capable of obtaining four types of measurement (rise time, fall time, pulse width and propagation delay) without the need for additional or duplicated circuitry. This has been achieved through careful analysis and selection of circuits for the various building blocks of the time-to-digital converter (TDC), such as the use of a novel programmable input stage which consists of a set of input switches, a single comparator that has a rail-to-rail input stage and some control logic. Although the propagation delay of the comparator adds to the overall resolution of the programmable time measurement architecture, simulation results have shown that the time measurement architecture is capable of performing measurements with a time resolution of less than 300ps. Furthermore, by employing a programmable approach for the time measurement architecture, greater savings in terms of area overhead are achieved, as there is no need to add or duplicate circuitry in order to perform different types of time measurements. In addition, it has been shown that by using a current steering time-to-voltage converter (TVC), the dynamic range has been improved due to better linearity within the TVC, as opposed to previous time measurement architectures that use this method.

In the next chapter, the physical and practical validation of the proposed programmable time measurement architecture is described.

### **Chapter 4**

### **CMOS implementation of PTMA**

In order to validate the programmable time measurement architecture (PTMA) proposed in Chapter 3, a prototype chip was fabricated using a  $0.12\mu m$  CMOS process [90]. This chapter gives an in-depth explanation of the implementation of the prototype chip, from schematic to final layout, including experimental results.

The outline of this chapter is as follows. Section 4.1 introduces the procedure of implementation of the prototype chip, describing the backend design flow. Section 4.2 describes the top level layout of the prototype chip. In section 4.3, the layout of the comparator is discussed and analysed, as it is one of the important design modules within the overall time measurement architecture. In section 4.4, the layout techniques used for the time-to-voltage converter are described and explained. Sections 4.5 and 4.6 explain the layout for the high speed clock generation and the programmable input block, respectively. Sections 4.7 and 4.8 show the layout for the digital processing block and the full chip layout. In section 4.9, experimental results are presented and finally, in section 4.10, concluding remarks are given.

### **4.1 Backend Design Flow**

The following design flow, shown in Figure 4-1, was used for the implementation of the prototype chip, from the schematic to the generation of the final device.



Figure 4-1: Back end design flow

The design flow starts with either a schematic or a register transfer level (RTL) description written in a hardware description language (HDL) such as Verilog or VHDL. This describes the detailed functions of the device's modules. For the design of the PTMA, the traditional analogue design flow using schematic entry is applied, because current analogue hardware description synthesis tools are not yet fully developed. The next stage in the procedure is to generate the layout of the components. This can be done either manually, by placing every device and joining up the wires, or automatically, by using place and route tools. Automatic place and route tools are mainly used for placement and routing of digital modules rather than analogue modules, as the layout of digital modules is less sensitive to process and noise disturbances. Since this design is a mixed-signal architecture and a large proportion of the design is analogue, the layout of the design is very sensitive to noise and process variations and is therefore done manually.

Once the layout has been generated, design rule checks (DRC) specified by the manufacturer have to be carried out to ensure correct spacing of the devices for manufacturing. After the layout has met the specified checks, the components are extracted from the layout view so that they can be verified against the schematics in order to check that all layouts are as intended. This procedure is known as layout versus schematic (LVS).

The next stage of the design flow is to perform simulation on the layout in order to verify that the design meets the overall specification. This involves extracting the parasitic components that exist within the generated layout. If the design does not meet the specification, then the layout has to be modified or re-generated and the process repeated. Once the specification has been met, a Graphic Design System (GDSII) format file is generated and sent to the manufacturer for fabrication.

### 4.2 Prototype chip

Figure 4-2 shows a schematic of the top level hierarchy of the prototype chip for the programmable time measurement architecture.



Figure 4-2: Top level hierarchy of prototype chip

There are three reference voltages, 1.08V, 0.6V, and 120mV that are needed for the PTMA. They are either generated on-chip using the on-chip voltage reference block (*res\_ref*) or they can be generated off chip and supplied to dedicated pins, *VrefL*, *VrefM* and *VrefH*. In this case, the on-chip voltage reference uses a resistive divider in order to generate the on-chip voltage references. Figure 4-3 shows the block diagram of the on-chip generator. External capacitors of 1 nF are used to decouple any noise on the input of *VrefL*, *VrefM* and *VrefH* pins.



Figure 4-3: On- chip reference generator

In order to select whether the on-chip reference or an external reference is used to generate the voltage references, the chip can be programmed using a dedicated programming bit. The switches that are programmed are transmission gate switches and have an on resistance of approximately 50 ohms. The schematic of the time measurement core is shown in Figure 4-4.





**Figure 4-4: Time measurement core** 

The time measurement core consists of the programmable input block, a time-to-voltage converter, a processing block and a clock generator that generates the high speed clock signals used for the two comparator circuits and the counters within the processing block. In order to calibrate the programmable time measurement architecture, an on-chip pulse generator that generates a 1 ns pulse can be used, or calibration can be performed using an external signal generator.

In implementing the layout for the time measurement architecture, the following guidelines were followed to reduce the mismatch and noise that affect operation and performance, cause non-linearity and therefore reduce resolution. Most of the noise problems encountered in integrated circuits are caused by capacitive coupling from one circuit node to another. The most noise-sensitive signals are inputs to high gain amplifiers and high precision comparators, inputs to analogue-to-digital converters (ADCs), outputs of voltage references and the analogue ground of high precision circuitry. Metal layers that carry noisy signals should not run on top of sensitive signals or vice versa. If a crossing of metal layers is unavoidable, the area of crossing is minimised and an intermediate layer is used as an electrostatic shield [91]. Furthermore, noisy signals should not run adjacent to sensitive signals; if this is unavoidable, another signal, such as a shield or ground line, should be run between them. Wherever possible, noisy circuits should be placed as far away as possible from sensitive circuits. Guard rings are used extensively to protect sensitive circuitry from noise caused by neighbouring circuits.

### 4.3 Comparator layout techniques

The comparator is a critical module in the overall design of the programmable time measurement architecture. The comparator not only has to be high speed but also must have high precision. These two prerequisites are critical in achieving a high resolution time measurement architecture. Figure 4-5 shows the schematic of the comparator, as described in Chapter 3, section 3.3.



Figure 4-5: Rail-to-rail comparator

A common centroid layout is used to minimise the mismatch between components, which would otherwise affect the input offset of the comparator and degrade its speed and precision. The mismatch, due to oxide gradients and other process variations, is minimised in this design by laying out transistors M1 to M4 in a common centroid configuration [91].

Another means of achieving a high speed comparator is to reduce the total capacitance on the output of the latch stage. This is carried out by minimising the capacitance on the node at the drains of transistors *M9* and *M10*.

The total size of the complete comparator layout including the biasing stage is 35µm by 14µm and is illustrated in Figure 4-6.



**Figure 4-6: Comparator Layout** 

### 4.4 Time-to-voltage converter layout techniques

The time-to-voltage converter (TVC) is the heart of the time measurement architecture. It performs the pulse amplification by charging up a capacitor and discharging it at a slower rate, as described in Chapter 2. Because this block is analogue, it is susceptible to noise and process variations, and its layout is therefore critical. Figure 4-7 shows the schematic of the TVC. The current sources M12-M18 that supply the charging and discharging currents need to be matched. Therefore, they are laid out in a common centroid configuration. The resistor, R, sets the bias current and needs to be placed as close as possible to the module. The parasitic resistance associated with the routing to the source of transistor M5 is minimised so that the gate-source voltage of M5 is not reduced, which would otherwise reduce the bias current used to generate the charging and discharging currents of the TVC.



Figure 4-7: Time-to-voltage converter (TVC)

When a capacitor is fabricated, undercutting of the mask can lead to mismatch. One solution is to construct an array of smaller unit capacitors [91]. A further problem with larger single capacitors is that the oxide grows non-uniformly, which results in inaccurate capacitor values. Therefore, the capacitor is laid out in a common-centroid scheme, so that the first order oxide errors average out to be the same for each capacitor [91]. Figure 4-8 illustrates this.



Figure 4-8: Capacitor array

The full layout of the TVC is shown below in Figure 4-9. Decoupling capacitors of 80fF are used to reduce the noise on critical bias nodes. The bias resistor is laid out in a "snake" format with dummy resistors on either side. This configuration is used in order to reduce the mismatch between neighbouring resistors arising from the photolithography and etching processes during manufacturing [91]. When interfacing this module within the overall time measurement architecture, the TVC is placed as close as possible to the next stage, in this case a comparator. This keeps the routing to the next stage as short as possible, minimising the additional routing parasitics that would add to the overall capacitance on the output node of the TVC and affect the overall timing measurement.



Figure 4-9: Time-to-voltage converter (TVC) layout

The final size of the TVC block is measured to be  $35.5\mu$ m by  $42\mu$ m.

### 4.5 High speed clock generation

In order to generate the high speed clock signals of 2.5 GHz and 2 GHz for the high performance comparator and counter circuits, the clocks had to be generated on-chip. The growing electrical distance between an external clock generator and the device attenuates the high speed clock signal and introduces a phase delay. In addition, the parasitic components of the pads in the pad ring connecting to the core circuitry on-chip make it impossible to supply a clock at such a high frequency. The clock generator therefore takes the form of a ring oscillator [92]. The ring oscillator is created using an odd number of amplifying inverters in a feedback loop, as shown in Figure 4-10.



**Figure 4-10: On-chip clock generation** 

The operation of the ring oscillator uses negative feedback and as a result, the circuit becomes unstable and oscillations occur. The frequency of the oscillator is calculated using the following equation

$$f_{osc} = \frac{1}{T} = \frac{1}{2n\tau_{inv}} \tag{4.1}$$

where *n* equals the number of inverters in the feedback loop and  $\tau_{inv}$  is equal to the delay of the inverter. A simulation of the ring oscillator running at 4 GHz with 7 inverters in the feedback loop gives an inverter delay of approximately 17.85ps. Figure 4-11 shows the schematic of the clock generator circuit.
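Equation 4.1 can be evaluated in both directions, as a quick check of the figures quoted above. The following is a minimal sketch; the function names are illustrative assumptions.

```python
def ring_oscillator_frequency(n_inverters, tau_inv):
    """Equation 4.1: f_osc = 1 / (2 * n * tau_inv), in hertz."""
    if n_inverters % 2 == 0:
        # A ring with an even number of inverting stages latches instead of oscillating.
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 1.0 / (2 * n_inverters * tau_inv)


def inverter_delay(n_inverters, f_osc):
    """Equation 4.1 rearranged for the per-inverter delay, in seconds."""
    return 1.0 / (2 * n_inverters * f_osc)


# Values from the text: 7 inverters oscillating at 4 GHz.
tau = inverter_delay(n_inverters=7, f_osc=4e9)
print(f"inverter delay = {tau * 1e12:.2f} ps")

# Going the other way with the quoted delay of 17.85 ps.
f_osc = ring_oscillator_frequency(n_inverters=7, tau_inv=17.85e-12)
print(f"f_osc = {f_osc / 1e9:.2f} GHz")
```

The factor of two in the denominator reflects that a transition has to travel around the loop twice to complete one full period of the output.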



Figure 4-11: Clock generator circuit

The generation of the 2 GHz and 2.5 GHz clocks is shown in Figure 4-12.



Figure 4-12: Generation of the 2 GHz and 2.5 GHz clocks

The ring oscillator is a very noisy circuit that injects noise into the substrate [93]. This is a problem for noise-sensitive circuits, especially at high frequency. Therefore, it is important to keep this type of circuit away from sensitive modules. In addition, guard rings and shielding [91] are used to protect such circuits. The layout for the clock generator circuit is shown in Figure 4-13. The total area of the clock generation module is 68.9µm by 28.7µm.



Figure 4-13: Layout of the clock generator module

### 4.6 Programmable input block

It is important to keep the parasitics at the input to the comparator small in order to reduce any inaccuracy in the time measurement, and to keep the noisy switches away from sensitive nodes of the comparator. The use of guard rings and shielding is applied to the layout. The layout for the programmable input block is shown in Figure 4-14. As can be seen the area of the programmable input block is  $61.5\mu$ m by  $39.2\mu$ m.



Figure 4-14: Layout of the programmable input block

### 4.7 Digital processing block

The digital processing block is a less critical module in terms of layout. As described in chapter 3, section 3.6, this block consists of a high speed counter and an 8-bit register. The layout for the digital processing block is shown in Figure 4-15. The layout has been done manually because the number of components is limited, so it is faster to lay out the design by hand than to use place and route tools, which are targeted at larger, more complex designs. As can be seen in Figure 4-15, the total area is 142µm by 30µm.



Figure 4-15: Layout of the digital processing block

### 4.8 Full chip layout

The full chip layout is shown in Figure 4-16. The size of the chip measures  $1480.52 \mu m$  by  $1505.94 \mu m$ .



Figure 4-16: Full chip layout

All of the digital input and output pins are buffered via tri-state buffers. This is so that the circuit is isolated from the other on-chip modules on the multi-project silicon die during power down. Each circuit is powered by its own power supply, which isolates it from the other circuits so that interference from adjacent circuitry cannot obstruct the operation and performance of the operating circuit. Figure 4-17 shows the top level schematic of the programmable time measurement architecture, which incorporates input and output (I/O) tri-state buffers on the digital I/O pins. When the circuit is used, a programming bit is set to enable the tri-state buffers so that the architecture is ready for operation.



Figure 4-17: Top level schematic with input and output tri-state buffers

Traditionally, each individual circuit has its own power-down mode so that it can be powered down individually, but this approach ensures that if the circuitry is not operating, then the circuit is powered down by default. The individual layouts for the modules of the PTMA can be seen in the overall layout in Figure 4-18.



Figure 4-18: Modules of the PTMA

Figure 4-19 shows an optical picture of the chip that was fabricated. As can be seen much of the circuitry is hidden under a layer that was placed over the circuitry for protection.



Figure 4-19: Optical picture of fabricated chip

The PTMA is situated in the top left hand quarter of the die. It is possible to see the top layer of the capacitors of the capacitor array of the time-to-voltage converter.

### **4.9 Experimental results**

In order to facilitate the programming of the fabricated test chip and to demonstrate the operational modes (rise time, fall time, pulse width and propagation delay), a test setup for the fabricated test chip was built. An overview of the test setup is shown in Figure 4-20. It consists of the Microchip MPLAB® ICD 2 development design kit, which includes the PICDEM 2 PLUS demo board [94].



Figure 4-20: Experimental test setup

There are two chips on the breadboard: the Maxim 3002 [95] and the Maxim DS1020 [96]. The Maxim 3002 is a level translator that converts voltages between 5V and the 1.2V required by the prototype chip. All control signals and input sources between the PIC controller and the test chip are interfaced via the translator, as shown in Figure 4-21.



Figure 4-21: Level translator

The Maxim DS1020 device is an 8-bit programmable delay line which is used to set up the time intervals needed to test the PTMA. The delay value of the programmable delay line is set by the microcontroller through the 8-bit parallel port. The program in the microcontroller is designed to step through a range of 8-bit values in order to vary the delay. Therefore, the pulse width and propagation delay type measurements can be easily generated and adjusted. For the rise and fall time type measurements, different rise and fall times are generated by adding capacitance to the output of one stage of the delay line. This was achieved by increasing the fan-out of a single delay stage of the delay line. Using the experimental setup described in section 4.2, an input rise time of 3.3ns is applied to the input of the PTMA using the microchip. The source code for the programming of the PIC can be seen in Appendix C. Before making any measurement, the time measurement architecture is configured in calibration mode, which applies a 1ns pulse to the input of the time measurement architecture using the on-chip pulse generator. An output binary value of 00011000<sub>2</sub> is obtained, which corresponds to a decimal value of  $24_{10}$ . In order to verify that the measurement is correct, the data output value is inserted into equation 4.2, as described in chapter 3, section 3.7.

$$\Delta T_1 = \frac{I_{dis}}{I_{ch}} * Count * T_{clk}$$
(4.2)

where, the charging,  $I_{ch}$ , and discharge,  $I_{dis}$ , currents are 60µA and 5µA respectively. The clock frequency supplied by the on-chip oscillator is 2 GHz, which translates to a clock period,  $T_{clk}$ , of 500 ps.

$$\Delta T_1 = \frac{5}{60} * 24 * 500 * 10^{-12} = 1ns \tag{4.3}$$

This gives a result of 1ns, which corresponds well with the on-chip 1ns input pulse generator. All the programming modes are then tested using the setup described above. Table 4.1 shows the various programmable measurements with their respective digital output codes. As can be seen, the proposed time measurement architecture is capable of performing the four programmable measurement types (rise time, fall time, pulse width and propagation delay). The results are also similar to the simulated results in chapter 3. The errors produced in the conversion are caused by a number of sources. The propagation delay of the comparator is the main cause, as this delay determines the starting and stopping of the counter; improving the design of the comparator would reduce this error. Another source of error is the non-linearity of the Metal-Insulator-Metal (MIM) capacitor that is charged and discharged. This non-linearity arises when the capacitor density is increased by decreasing the dielectric thickness or by using a dielectric with a higher permittivity [97]. Both of these methods degrade the linearity of the capacitor and, as the capacitor is an important element in the time measurement architecture, this has an effect on the overall measurement.

| Mode 1 | Mode 0 | Measurement Type  | Input time | Digital Output Code | Output Value | Magnitude of Error |
|--------|--------|-------------------|------------|---------------------|--------------|--------------------|
| 0      | 0      | Rise time         | 2.1ns      | 43             | 1.792ns | -308ps    |
| 0      | 0      | Rise time         | 2.2ns      | 46             | 1.917ns | -283ps    |
| 0      | 0      | Rise time         | 3.3ns      | 73             | 3.042ns | -258ps    |
| 0      | 0      | Rise time         | 3.5ns      | 77             | 3.208ns | -292ps    |
| 0      | 1      | Fall time         | 2.1ns      | 44             | 1.833ns | -267ps    |
| 0      | 1      | Fall time         | 2.2ns      | 46             | 1.917ns | -283ps    |
| 0      | 1      | Fall time         | 3.3ns      | 73             | 3.042ns | -258ps    |
| 0      | 1      | Fall time         | 3.5ns      | 77             | 3.208ns | -292ps    |
| 1      | 0      | Pulse width       | 2.2ns      | 46             | 1.917ns | -283ps    |
| 1      | 0      | Pulse width       | 3.3ns      | 73             | 3.042ns | -258ps    |
| 1      | 0      | Pulse width       | 3.4ns      | 75             | 3.125ns | -275ps    |
| 1      | 0      | Pulse width       | 3.5ns      | 76             | 3.167ns | -333ps    |
| 1      | 1      | Propagation delay | 2.2ns      | 46             | 1.917ns | -283ps    |
| 1      | 1      | Propagation delay | 3.2ns      | 69             | 2.875ns | -325ps    |
| 1      | 1      | Propagation delay | 3.4ns      | 74             | 3.083ns | -317ps    |
| 1      | 1      | Propagation delay | 3.5ns      | 77             | 3.208ns | -292ps    |

 Table 4.1: Experimental results
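The output values and errors in Table 4.1 follow directly from equation 4.2. The sketch below reproduces one row per measurement mode from its digital output code, assuming the 60µA and 5µA currents and the 500ps clock period stated above. The function and variable names are illustrative, and only a subset of the table rows is included.

```python
def code_to_ns(code, i_ch=60e-6, i_dis=5e-6, t_clk_ns=0.5):
    """Equation 4.2: convert a digital output code to a time in nanoseconds."""
    return (i_dis / i_ch) * code * t_clk_ns


# (measurement type, input time in ns, digital output code), one row per mode
# taken from Table 4.1.
ROWS = [
    ("Rise time", 2.1, 43),
    ("Fall time", 2.1, 44),
    ("Pulse width", 3.4, 75),
    ("Propagation delay", 3.2, 69),
]

for kind, input_ns, code in ROWS:
    output_ns = code_to_ns(code)
    error_ps = (output_ns - input_ns) * 1000.0  # signed error in picoseconds
    print(f"{kind:<18} in={input_ns:.1f}ns code={code:>2} "
          f"out={output_ns:.3f}ns err={error_ps:.0f}ps")
```

The same function applied to the calibration code of 24 returns the 1ns value obtained in equation 4.3.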

In addition, the time measurement architecture requires stable reference voltages. If the reference voltages are not stable, this will affect the overall measurement value. A solution is to use an on-chip bandgap voltage reference and to generate the internal references using a voltage regulator [11]. This would supply stable and accurate voltage references for the measurement architecture and therefore reduce the error.

#### **4.10 Concluding remarks**

This chapter has presented the implementation and test of the programmable time measurement architecture proposed in chapter 3. It has been shown through experimental results that a programmable time measurement architecture can perform four types of time measurement without the use of additional circuitry or circuit duplication. This has been made possible through the design of high performance circuits and careful mixed-signal layout techniques, such as common centroid layout, separation of analogue and digital circuits, and the use of analogue shielding and guard rings. The programmable time measurement architecture has also been verified in a noisy mixed-signal environment where different circuits operate on the same silicon.

The fabricated chip has two limitations. The first is that it can only make time measurements sequentially rather than in parallel; making time measurements in parallel would reduce the overall test time. The second is that the resolution of the time measurement architecture is limited by the propagation delay of the comparator, which is simulated to be 175ps using the typical process corner, as shown in Chapter 3, section 3.3.2. In order to improve the resolution of time measurement architectures, a new design is presented in chapter 5.

### **Chapter 5**

## Homodyne Time-to-Digital Conversion (HTDC)

The ITRS'05 [1] predicts that on-chip clock speeds will increase into the tens of gigahertz range, which will require time measurement architectures with timing resolutions of tens of femtoseconds. Currently, as discussed in Chapter 1, section 1.3, the timing characteristics of VLSI devices are measured using automatic test equipment (ATE). Such testers are capable of achieving accurate timing measurements; however, they are expensive. Furthermore, the increased integration and performance of VLSI devices due to technology scaling has produced limitations in traditional timing performance test methods, for example limited bandwidth and the additional timing skew brought about by the increase in electrical distance between the tester and the device under test (DUT).

For femtosecond resolution, metastability becomes a problem for time measurement architectures based on flip-flops and latches, such as the vernier [41] and flash based time-to-digital conversion (TDC) architectures [10]. Section 5.1 analyses this problem and section 5.2 describes current research that has been carried out to improve the resolution of such time measurement architectures using time amplifiers [70, 98] that precede the lower resolution TDC. However, adding an extra circuit to the time measurement architecture adds to the overall area budget, which can be unacceptable for some applications. To address these issues, section 5.3 proposes a new fully integrated on-chip time measurement architecture that is based on time-to-digital conversion using the homodyne technique [25]. It is not based on latches and flip-flops, and it aims to achieve a timing resolution of tens of femtoseconds.

### **5.1 Metastability problem with TDC based on Vernier/Flash architectures**

In order to achieve the femtosecond timing resolution required for future high performance VLSI devices, as predicted by the ITRS [1], time measurement architectures based on the vernier delay line (VDL) and the flash based architecture have a limiting factor. This limiting factor is the metastability problem associated with the flip-flops and latches, which are the prime elements used for the time measurement technique itself. Metastability arises when the setup or hold time of the flip-flop is violated and therefore its normal operation is disturbed. Consequently, the output of the flip-flop may stay low and then go high, or vice versa. Figure 5-1 shows the setup and hold timing violations of a flip-flop or latch.



Figure 5-1: Flip-Flop Setup and Hold Violations

This inherent metastability is a limiting factor for the femtosecond timing resolution that is necessary to make time measurements on high performance VLSI devices.

Further research has been done to improve the resolution of time measurement architectures by incorporating time amplifiers [70, 98, 99] that precede the TDC. A review of current time amplifiers used for time measurement testing is given in the next section.

### **5.2 Time amplifiers**

In order to achieve higher resolution time measurement capabilities, researchers have explored the idea of amplifying the input to the time measurement architecture using a time amplifier [70, 98, 99], as shown in Figure 5-2. The objective is to amplify the input time interval into a timing range that the time measurement architecture is capable of processing.



Figure 5-2: Low resolution time measurement architecture (LRTMA) with time amplifier

Figure 5-3 shows such a time amplifier, which uses a mutual exclusion (MUTEX) circuit [99], also known as an arbitration circuit or arbiter [11].



Figure 5-3: MUTEX [99]

The operation of the arbiter is as follows. The cross-coupled NAND gates of the SR latch switch the output transistors only when the difference between the outputs of the bistable latch reaches a certain value [100]. The gain of the time amplifier can be increased or decreased by the sizing of the output transistors. Another implementation of a time amplifier is shown in Figure 5-4 [98].



Figure 5-4: Time amplifier [98]

The circuit consists of two cross-coupled differential pairs with passive RC loads. The operation of the circuit is as follows. Two rising edges  $\phi_1$  and  $\phi_2$  are applied to the gates of *M1* and *M3* respectively. The bias current of the amplifier is steered around the differential pairs and into the passive loads. The voltages at the drains of transistors *M1* and *M2* become equal at a certain time, and the voltages at the drains of *M3* and *M4* become equal a short time later. This effectively produces a time interval that is proportional to the input time difference. These circuits output an analogue voltage and are often used to condition the input signal. A relatively low resolution time-to-digital converter (TDC) is then used for the actual measurement.

Although time amplifiers increase the resolution of time measurement architectures, this is achieved at the cost of additional silicon area, which may be unacceptable for some applications. Therefore, further research is needed to investigate and develop an architecture that meets the requirement for tens of femtoseconds of timing resolution with reduced circuit overhead, and that is low power and compatible with modern CMOS technologies. In the next section, a time measurement architecture that meets these requirements is proposed and each of the critical sub-blocks is described in detail.

### **5.3 Proposed architecture**

The proposed time measurement architecture is shown in Figure 5-5.



Figure 5-5: Proposed Time Measurement Architecture

The architecture is composed of three components: an analogue mixer, a filter and an analogue-to-digital converter (ADC). The proposed time measurement architecture converts a small phase difference into a DC voltage using the homodyne technique [25], and a high resolution ADC then converts this DC voltage (*Vdc*) into a digital binary output code that represents the phase difference at the input. The DC voltage is generated by modulating an input clock signal (*Input*) with a reference clock signal (*Reference*) and applying filtering to produce a DC voltage (*Vdc*). This DC voltage is a function of the phase difference between the two input clock signals. The modulation is achieved by the use of an analogue mixer. For example, if two sinusoidal signals have the same amplitude, *A*, and frequency,  $\omega$, but different phases,  $\phi_1$  and  $\phi_2$, respectively, then the output of the mixer (*Vm*) is given by

$$V_m(t) = A^2 \cos(\omega t + \phi_1) \cos(\omega t + \phi_2)$$
(5.1)

and

$$V_{m}(t) = \frac{A^{2}}{2} \left( \cos(\phi_{1} - \phi_{2}) + \cos(2\omega t + \phi_{1} + \phi_{2}) \right)$$
(5.2)

If a low pass filter is used to remove the  $2\omega$  term, this leaves a DC term which is proportional to the cosine of the phase difference multiplied by half of the squared amplitude.

$$V_{m}(t) = \frac{A^{2}}{2} \left( \cos(\phi_{1} - \phi_{2}) \right)$$
(5.3)
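
The relation between equations (5.1) and (5.3) can be checked numerically. The following is a minimal sketch, assuming ideal sinusoids, an ideal multiplier, and an average over whole periods standing in for the low pass filter; the 60 mV amplitude, the 1 MHz clock and the helper names are illustrative assumptions, not values taken from the fabricated design.

```python
import math


def skew_to_phase(delta_t_s, freq_hz):
    """Convert a time skew between two clocks into a phase difference (rad)."""
    return 2.0 * math.pi * freq_hz * delta_t_s


def homodyne_dc(amplitude, phi1, phi2, n_periods=50, samples_per_period=64):
    """Multiply two equal-frequency sinusoids and average over whole periods.

    Averaging over an integer number of periods acts as an ideal low pass
    filter here: it removes the 2*omega term of equation (5.2).
    """
    n = n_periods * samples_per_period
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / samples_per_period  # omega * t
        v_in = amplitude * math.cos(theta + phi1)
        v_ref = amplitude * math.cos(theta + phi2)
        total += v_in * v_ref                           # equation (5.1)
    return total / n                                    # DC term, equation (5.3)


A = 0.06       # assumed amplitude in volts
F_CLK = 1.0e6  # assumed clock frequency in hertz

for skew in (0.0, 50e-9, 125e-9, 250e-9):
    dphi = skew_to_phase(skew, F_CLK)
    simulated = homodyne_dc(A, dphi, 0.0)
    predicted = 0.5 * A**2 * math.cos(dphi)
    print(f"skew={skew:.3e}s  simulated={simulated:.6e}V  predicted={predicted:.6e}V")
```

In this idealised model the simulated DC value agrees with $\frac{A^2}{2}\cos(\phi_1 - \phi_2)$ to within floating-point error, so the ADC that follows only needs to resolve the DC level in order to recover the phase difference.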

When implementing the proposed time measurement architecture, it is necessary to keep the complexity, and hence the area overhead, as small as possible while still achieving the required high timing resolution. This is a requisite if such time measurement architectures are to be included on the same silicon as the DUT. An additional requirement is that the time measurement architecture must operate from a low supply voltage, which has the benefit of low power consumption and compatibility with modern CMOS technologies. With these requisites in mind, the next sections describe each of the three main components: mixer, filter and ADC. For each component, candidate implementations are analysed and their advantages and disadvantages are described.

### 5.4 Mixer design

The CMOS mixer circuit is an important non-linear analogue signal processing function that can be found in a wide variety of applications, such as adaptive filtering, modulation, frequency translation, automatic gain control and neural networks. The purpose of the mixer is to convert a signal from one frequency band to another. In addition, if un-modulated signals with the same frequency are applied to the two inputs, the circuit behaves as a phase detector and produces an output with a DC component that is a function of the phase difference between the two input signals. CMOS mixer circuits can be implemented in a number of different ways, for example single balanced [101], double balanced [102] and dual-gate [103]. Each of these configurations is described in the following sections, and their advantages and disadvantages are analysed to assess their suitability for the mixer circuit within the proposed homodyne time measurement architecture.

### 5.4.1 Single balanced

The single balanced mixer is the simplest approach that can be implemented and is shown in Figure 5-6.



Figure 5-6: Single balanced mixer [101]

The advantage of the single balanced mixer is that it exhibits less input referred noise for a given power dissipation than its double balanced counterpart, described in section 5.4.2. The disadvantage of the single balanced mixer is the local oscillator (LO) to intermediate frequency (IF) feed-through, which can be a limiting factor. Transistors M2 and M3 operate as a differential pair and therefore amplify the LO signal. If the IF is not low enough compared to the LO frequency, the low pass filter (LPF) following the multiplier may not adequately filter out the LO feed-through without attenuating the IF signal, and the amplifier may therefore be desensitized. This type of architecture is also more susceptible to noise in the LO signal than the double balanced mixer [101], which is described in the next section.

### **5.4.2 Double balanced (Gilbert multiplier)**

A configuration commonly used in RF mixers and analogue multipliers is the Gilbert Cell [102]. A circuit of the Gilbert multiplier cell is shown in Figure 5-7 [104]. As can be seen, it consists of seven devices and the operation of the circuit is as follows. It is assumed that all transistors are biased in the saturation region and obey square-law equations, and that the devices are sized and matched so that the transconductance parameters satisfy K1=K2=K3=K4=Ka and K5=K6=Kb.



Figure 5-7: Double balanced Gilbert multiplier [104]

Defining the output currents

$$Io1 = -(I3 + I1) \tag{5.4}$$

and

$$Io2 = -(I2 + I4) \tag{5.5}$$

It can also be shown that the differential current Iod = Io2 - Io1 is given by

$$Iod = \sqrt{2Ka}Vx \left[ \sqrt{I5}\sqrt{1 - \frac{KaVx^2}{2I5}} - \sqrt{I6}\sqrt{1 - \frac{KaVx^2}{2I6}} \right]$$
(5.6)

If

$$\frac{KaVx^2}{2I5} << 1$$

and

$$\frac{KaVx^2}{2I6} << 1$$

then

$$Iod = \sqrt{2Ka} \left( \sqrt{I5} - \sqrt{I6} \right) Vx \tag{5.7}$$

As I5 and I6 are dependent on the input voltage Vy as shown by

$$Vy = \frac{1}{\sqrt{Kb}} \left( \sqrt{I5} - \sqrt{I6} \right) \tag{5.8}$$

Substituting equation (5-8) into equation (5-7) gives

$$Iod = \sqrt{2KaKb}VyVx \tag{5.9}$$

This is the typical characteristic of the analogue multiplier. The advantage of the double balanced mixer is that, although it is implemented using more transistors, it generates less even order distortion than the single balanced mixer described in section 5.4.1. However, the disadvantage is that it is difficult to operate at low voltages due to the stacking of the transistors [105]. In addition, the generation of complementary signals can add to errors in the final measurement.
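
The small-signal step from equation (5.6) to equation (5.9) can be checked numerically. The sketch below evaluates the full square-law expression and the ideal multiplier characteristic for one operating point; the values chosen for Ka, Kb, I5, I6 and Vx are hypothetical and only serve to satisfy the condition KaVx²/(2I) << 1.

```python
import math


def iod_exact(ka, vx, i5, i6):
    """Differential output current from the square-law expression (5.6)."""
    a = math.sqrt(i5) * math.sqrt(1.0 - ka * vx**2 / (2.0 * i5))
    b = math.sqrt(i6) * math.sqrt(1.0 - ka * vx**2 / (2.0 * i6))
    return math.sqrt(2.0 * ka) * vx * (a - b)


def vy_from_currents(kb, i5, i6):
    """Input voltage Vy corresponding to I5 and I6, equation (5.8)."""
    return (math.sqrt(i5) - math.sqrt(i6)) / math.sqrt(kb)


def iod_ideal(ka, kb, vx, vy):
    """Ideal multiplier characteristic, equation (5.9)."""
    return math.sqrt(2.0 * ka * kb) * vx * vy


# Hypothetical operating point (not taken from the thesis).
KA = KB = 200e-6        # transconductance parameters in A/V^2
I5, I6 = 12e-6, 8e-6    # branch currents in amperes
VX = 0.01               # 10 mV input, so KA*VX**2/(2*I5) is about 8e-4

vy = vy_from_currents(KB, I5, I6)
exact = iod_exact(KA, VX, I5, I6)
ideal = iod_ideal(KA, KB, VX, vy)
print(f"Vy={vy:.4f}V  exact={exact:.4e}A  ideal={ideal:.4e}A  "
      f"relative error={(exact - ideal) / ideal:.2e}")
```

For this operating point the two expressions differ by well under one percent; the difference grows as Vx increases and the small-signal condition is no longer met.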

### 5.4.3 Dual gate mixer [106]

The dual-gate mixer is a well used technique [103, 106-110]. The dual-gate structure has the advantage of isolated signal and local oscillator ports, which allows separate matching and provides inherent local oscillator (LO) to radio frequency (RF) input isolation. The simplicity of the single-ended dual gate mixer results in current savings and is often the choice in low power front-end designs [111]. The optimum bias point and the LO power required for a dual-gate mixer are chosen in order to maximise the conversion transconductance,  $g_c$ . This is defined as the ratio of the output intermediate frequency (IF) current to the input RF voltage, given by equation 5-10.

$$g_c = \frac{I_{IF}}{V_{RF}} \tag{5.10}$$

The maximum conversion transconductance,  $g_c$, occurs when the LO is fed into the upper gate and the RF signal is fed into the lower gate. The applied LO modulates the floating node voltage, which is the drain of the lower transistor. The conversion gain of a dual-gate mixer arises from the change in drain-source voltage of the lower transistor, which modulates the lower transistor's transconductance  $g_m$. A smaller frequency conversion path is also present in the dual-gate mixer due to the modulated output conductance  $g_{ds}$  of the lower transistor.

The dual gate device can be implemented by using two cascade-connected single-gate transistor devices with equal widths [103]. This eases fabrication of each transistor at the circuit level and also gives freedom to adopt different device parameters such as gate width. For optimum performance, the dual gate mixer (see Figure 5-8) is biased in such a way that the lower transistor (M2) operates as a transconductor while the upper transistor (M1) acts as a switch. For mixing purposes, the LO and RF signals are applied to the gate inputs of M1 and M2, respectively.



Figure 5-8: Dual-Gate Mixer [110]

The advantage of this structure is that the LO and RF signals are inherently isolated, and it can be used to develop compact mixers with conversion gain [109]. Although the potential for conversion gain is attractive, the downside is that such mixers tend to have lower linearity than a passive design.

As a result of the advantages of the dual-gate mixer architecture, the design of the mixer within the time measurement architecture is based on the dual-gate mixer. Figure 5-9 shows the implementation of an analogue dual-gate cascode mixer for the proposed time measurement architecture, and its operation is as follows. Transistors M3 and M4 form a constant current source. Two inputs, *Vin* and *Vref*, are applied to the gates of transistors M1 and M2 respectively, and the resulting modulated output of the mixer is taken from the drains of transistors M2 and M4. This cascode type mixer has the advantage of being low power, occupying a small area and being capable of operating at low supply voltages. The transistor sizes were chosen to achieve the required bias current of 10µA and appropriate drive capability for the low pass filter of the time measurement architecture.



Figure 5-9: Dual-gate cascode mixer

### **5.4.4 Dual gate mixer simulations**

In order to show the operation of the dual-gate mixer circuit, Figure 5-10 shows the input and output waveforms simulated using the Cadence Analogue Design Environment (ADE). The top plot shows an input signal with a frequency of 1 MHz and an amplitude of 60mV. The second plot shows an input signal with a frequency of 4 MHz and an amplitude of 60mV. The third plot shows the output of the dual-gate mixer, which is the product of the two input signals.



Figure 5-10: Dual Gate Mixer Simulations

The conversion gain versus input frequency for the dual-gate mixer is shown in Figure 5-11.



Figure 5-11: Conversion Gain vs Input Frequency

### 5.5 Low pass filter design

The purpose of the filter is to remove the high frequency product of the input and reference signals applied to the time measurement architecture, leaving a DC component that is a function of the phase difference. To realise the Low Pass Filter (LPF) of the time measurement architecture, a  $2^{nd}$  order switched-capacitor (SC) bi-quad filter [112] with a cut-off frequency of 120kHz was used. A switched-capacitor architecture was chosen as opposed to a passive, continuous time, gm-C or switched-current (SI) architecture. Its components are easier to implement on silicon than those of passive and continuous time implementations. Unlike gm-C filters, there is no need for a tuning circuit, and compared to switched-current architectures it has lower sensitivity to temperature changes and better linearity.

The design of the low pass filter is as follows. The general bi-quad transfer function, H(s), on which the low pass filter is based is shown in equation 5.11.

$$H(s) = \frac{V_{out}(s)}{V_{in}(s)} = -\frac{k_2 s^2 + k_1 s + k_0}{s^2 + \left(\frac{\omega_0}{Q}\right)s + {\omega_0}^2}$$
(5.11)

where  $\omega_0$  and Q are the pole frequency and pole Q, respectively, and  $k_0$,  $k_1$  and  $k_2$  are arbitrary coefficients that place the bi-quad's zeros. Multiplying through by the denominators and dividing by s<sup>2</sup>, equation 5.11 can be written as

$$V_{out}(s) = \left[-k_2 - \frac{k_1}{s}\right] V_{in}(s) - \frac{\omega_0}{Qs} V_{out}(s) - \frac{k_0}{s^2} V_{in}(s) - \frac{\omega_0^2}{s^2} V_{out}(s)$$
(5.12)

Equation 5.12 can be expressed as two integrator based equations as follows

$$V_{out}(s) = -\frac{1}{s} \left[ k_2 s V_{in}(s) - \omega_0 V_{c1}(s) \right]$$
(5.13)
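
The companion equation for the first integrator output does not appear above. Assuming that $V_{c1}(s)$ denotes the output of the first integrator, it can be recovered by substituting equation 5.13 into equation 5.12 and solving for $V_{c1}(s)$. The following is a reconstruction under that assumption rather than a quotation of the original derivation:

$$V_{c1}(s) = -\frac{1}{s} \left[ \left( \frac{k_0}{\omega_0} + \frac{k_1}{\omega_0} s \right) V_{in}(s) + \omega_0 V_{out}(s) + \frac{s}{Q} V_{out}(s) \right]$$

Substituting this expression back into equation 5.13 reproduces every term of equation 5.12.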

Figure 5-12 shows the signal flow graph (SFG) describing these integrator based equations.



Figure 5-12: Signal flow graph representation of high Q SC LPF

Figure 5-13 shows the switched-capacitor implementation, which also incorporates switch sharing to minimise the number of switches needed for the implementation [113].



Figure 5-13: A 2<sup>nd</sup> order low pass switched-capacitor filter with switch sharing.

The circuit of Figure 5-13 was implemented in the Cadence design environment using the amplifier described in section 5.7.2. The switches are based on transmission gates and have a low on-resistance of 1kΩ, as shown in Figure 5-14, so that the circuit settles within half a period of the sampling clock.



Figure 5-14: On-resistance of transmission gate

The filter was simulated using SPECTRE models based on a 0.12µm CMOS process. Figure 5-15 shows the LPF frequency response. As can be seen, the -3dB point is at 120 kHz, as required by the design.



Figure 5-15: LPF Frequency Response
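
The position of the -3dB point can be cross-checked against the continuous-time prototype of equation 5.11. The sketch below evaluates the magnitude response for a pure low pass case (k2 = k1 = 0, k0 = ω0²) with a pole frequency of 120 kHz. The pole Q is not stated in the text, so a Butterworth value of 1/√2 is assumed here; the function name is also illustrative, and the switched-capacitor sampling effects are not modelled.

```python
import math


def biquad_lowpass_gain_db(f_hz, f0_hz, q):
    """Magnitude in dB of equation (5.11) with k2 = k1 = 0 and k0 = w0**2.

    Only the magnitude is evaluated, so the sign of H(s) does not matter.
    """
    w0 = 2.0 * math.pi * f0_hz
    s = complex(0.0, 2.0 * math.pi * f_hz)
    h = w0**2 / (s**2 + (w0 / q) * s + w0**2)
    return 20.0 * math.log10(abs(h))


F0 = 120e3                  # cut-off frequency quoted in the text
Q = 1.0 / math.sqrt(2.0)    # assumed Butterworth response

for f in (12e3, 120e3, 1.2e6):
    print(f"{f / 1e3:8.1f} kHz : {biquad_lowpass_gain_db(f, F0, Q):7.2f} dB")
```

With the assumed Q the gain at the pole frequency equals 20·log10(Q), which is approximately -3dB, and the response rolls off at 40dB per decade above it.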

Figure 5-16 shows the transient input and output waveforms of the LPF.



Figure 5-16: SC LPF input and output waveforms

Figure 5-17 shows the total output noise density of the mixer with the LPF. As can be seen, at 1 kHz the noise density is 360nV/√Hz and at 10 MHz the noise density falls to 53nV/√Hz.



Figure 5-17: Total output noise of mixer and LPF

### **5.6 Analogue-to-Digital Conversion**

Analogue-to-digital converters are not new and a vast amount of literature has been written about them [92, 114]. The purpose of the ADC is to convert the DC voltage at the output of the filter into a digital output code. The following sections give a brief overview of the most common types of Nyquist-rate ADC architectures.

### 5.6.1 Successive Approximation ADC

The successive approximation ADC [92] converts an analogue voltage to a digital output code. A block diagram of the successive approximation ADC is shown in Figure 5-18. The operation of the successive approximation register is as follows. At a sample time, the ADC sets the most significant bit (MSB) in the successive approximation register (SAR) to a logical "1". All the remaining bits are set to logical "0". This digital guess is converted back to an analogue value and is compared with the input voltage. If the input is at a higher voltage than the feedback analogue representation of the guessed value, (i.e.  $Vin > V_D$ ), the MSB is left set to a logical "1". On the other hand, if the input is at a lower potential voltage than the feedback analogue value, (i.e.  $Vin < V_D$ ), the MSB is reset to a logical "0".



Figure 5-18: Successive Approximation ADC [8]

This process is repeated for the second most significant bit. The MSB is left unchanged from the first approximation, the second MSB is set to a logical "1" and the rest of the bits are left in reset. This digital code is an improved guess; it is converted into an analogue value again and compared with the input voltage using a comparator. If the analogue input voltage is at a higher potential than the feedback value, the second MSB is left set at a logical "1"; otherwise it is reset to a logical "0". This process continues for each of the remaining lower order bits until all *N* bits of the converter have been examined. The value left in the SAR represents the input voltage and can be output either serially or in parallel.
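
The bit-by-bit search described above can be sketched in a few lines. This is a behavioural model only, assuming an ideal DAC and comparator; the function name, the 1.2 V reference and the 8-bit word length are illustrative choices.

```python
def sar_convert(v_in, v_ref, n_bits):
    """Successive approximation search; returns the N-bit output code."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):          # MSB first
        trial = code | (1 << bit)                  # set the bit under test
        v_dac = v_ref * trial / (1 << n_bits)      # ideal feedback voltage V_D
        if v_in > v_dac:                           # comparator decision
            code = trial                           # keep the bit at "1"
        # otherwise the bit stays reset to "0"
    return code


code = sar_convert(0.7, 1.2, 8)
print(f"code = {code} ({code:08b})")
```

Exactly N comparisons are made per conversion, one per loop iteration, which is the property the following paragraph relies on.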

The advantage of the successive approximation ADC is that, for N bits, only N comparisons need to be made, which results in high speed and low power dissipation. The disadvantage is that many internal operations must occur for every sample: in an N-bit converter, N approximations and comparisons must be performed for each sample of the input voltage. Therefore, an N-bit successive approximation ADC running at a sampling frequency of  $f_{s}$  samples per second must run its internal circuitry at a rate of  $Nf_s$  operations per second. The achievable sampling rate is therefore limited by the internal operating rate and the output digital code size, N.

### 5.6.2 Flash ADC

The Flash ADC generates all the output bits in one instance; the drawback is its complexity. The flash ADC distributes the sampling process across the entire circuit and, as a result, more circuitry is required. A 3-bit flash ADC is illustrated in Figure 5-19. For an N-bit flash ADC, the circuit requires  $2^N$  resistors,  $2^N - 1$  comparators and digital encoding logic.



Figure 5-19: 3-bit Flash ADC

The advantage of the flash ADC is that the output is determined in one step and it is therefore capable of very fast operation. The flash ADC's internal circuitry operates at the sampling frequency, so it is normally used in fast sampling applications. In order to increase the output code word size, a flash ADC needs only to add more circuitry, unlike the successive approximation ADC, which is required to operate at a faster internal rate.

The disadvantage of the flash ADC is the increase in circuitry, as the number of comparators and resistors doubles for each additional bit of output. Furthermore, the complexity of the thermometer-to-binary encoding logic increases. In that way, the flash ADC effectively trades circuit size for speed. The word length is determined by the size of the core; consequently the largest flash ADCs are typically 8-10 bits.
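
The single-step conversion and the growth in circuitry can be illustrated with a behavioural model. The sketch below assumes an ideal resistor ladder with equally spaced taps and one comparator per tap, which gives 2^N - 1 thresholds in this model; the function name and the example voltages are illustrative.

```python
def flash_convert(v_in, v_ref, n_bits):
    """Ideal flash conversion; returns (binary code, thermometer code)."""
    levels = 1 << n_bits
    # Resistor ladder tap voltages: one threshold per comparator.
    thresholds = [v_ref * k / levels for k in range(1, levels)]
    # All comparators decide at the same time, giving a thermometer code.
    thermometer = [1 if v_in > th else 0 for th in thresholds]
    # Thermometer-to-binary encoding: count the comparators that tripped.
    return sum(thermometer), thermometer


code, thermometer = flash_convert(0.7, 1.2, 3)
print(f"comparators = {len(thermometer)}, thermometer = {thermometer}, code = {code}")
```

Each additional bit doubles the length of the threshold list, which mirrors the doubling of comparators and resistors noted above.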

### 5.6.3 Dual-Slope ADC

The dual-slope ADC is much simpler but much slower than the successive approximation ADC. Instead of using a binary search like the successive approximation ADC, it uses a step search. It uses an integrator to ramp upwards for a fixed amount of time,  $T_{integration}$, starting from the time it crosses a fixed threshold voltage.



Figure 5-20: Dual Slope ADC

The slope of integration is directly proportional to the analogue input voltage. Therefore, the larger the input voltage, the higher the integration voltage will be at the end of the fixed time period. Then the integrator is ramped downwards at a fixed slope until it reaches the threshold voltage again. The time it takes to discharge is directly proportional to the integrator's peak voltage, which in turn is proportional to the ADC input voltage. The time period  $T_{count}$  is measured by a digital counter, whose output represents the ADC conversion result, as illustrated in Figure 5-21.



Figure 5-21: N-bit dual slope ADC [8]

The conversion time of a dual-slope ADC is typically 100ms or more [8], so using this technique is expensive for production testing. By its nature, however, the dual-slope ADC has excellent differential non-linearity (DNL) characteristics, as each code width is dependent on a smoothly ramping analogue integrator rather than a binary weighted sum of components such as capacitors or resistors. The dual-slope ADC is nevertheless susceptible to integral non-linearity (INL) errors, which are dominated primarily by the linearity of the comparator and the linearity of the integrator's ramp.
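
The ramp-up and ramp-down phases of the dual-slope conversion can be sketched as follows. The model assumes an ideal integrator with unity gain, a threshold of zero volts and a fixed discharge slope set by a reference voltage; the 10-bit counter, the 1 MHz counter clock and the function name are hypothetical.

```python
def dual_slope_convert(v_in, v_ref, n_bits, f_clk_hz):
    """Ideal dual-slope conversion; returns (code, conversion time in seconds)."""
    full_count = 1 << n_bits
    t_integration = full_count / f_clk_hz   # fixed ramp-up time
    # Ramp up: the slope is proportional to v_in (integrator gain taken as 1/s).
    v_peak = v_in * t_integration
    # Ramp down at a fixed slope set by v_ref until the threshold is reached.
    t_count = v_peak / v_ref
    # The digital counter measures t_count in clock periods.
    code = min(int(t_count * f_clk_hz), full_count - 1)
    return code, t_integration + t_count


code, t_conv = dual_slope_convert(0.7, 1.2, 10, 1.0e6)
print(f"code = {code}, conversion time = {t_conv * 1e3:.3f} ms")
```

Because the counter must run through up to 2^N clock periods in each phase, the conversion time grows exponentially with the word length, which is why this architecture is slow compared with the successive approximation and flash ADCs.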

The key to achieving femtosecond resolution using the proposed architecture is the implementation of a high resolution ADC. For this reason, the Delta-Sigma ( $\Delta\Sigma$ ) ADC was selected as opposed to Nyquist-rate ADCs because of its high resolution capability and its area requirements, and because the accuracy of the converter does not depend on precise component matching, precise sample-and-hold circuitry or trimming, unlike Nyquist-rate converters such as successive approximation and dual slope ADCs [115].

## 5.7 Delta Sigma ( $\Delta\Sigma$ ) ADC

Delta Sigma ( $\Delta\Sigma$ ) ADCs are typically used in high resolution, low frequency applications. The advantages of the  $\Delta\Sigma$  ADC are as follows. Firstly, there are no precise requirements on the analogue building blocks, as the linearity of the ADC is not dependent on component matching. It can also take advantage of low cost, low power digital filtering. It relaxes the transition band requirements for analogue anti-aliasing filters. It reduces the baseband quantization noise power and, most importantly, trades speed for resolution. The  $\Delta\Sigma$  ADC depicted in Figure 5-22 consists of a  $\Delta\Sigma$  modulator and a decimation filter.



Figure 5-22: Block diagram of the  $\Delta\Sigma$  ADC

The purpose of the  $\Delta\Sigma$  modulator is to convert the analogue input voltage into a 1-bit pulse stream. The loop filter/integrator can be either switched capacitor or continuous time. Switched capacitor filters are easier to implement on silicon than continuous time filters, and their frequency characteristics scale with the clock rate [112]. The purpose of the digital filter is to remove the out of band quantization noise and to provide anti-aliasing to allow re-sampling at a lower sampling rate. There are numerous  $\Delta\Sigma$  ADC architectures and the choice usually involves trade-offs between resolution, circuit complexity and stability. Through extensive simulations, it was found that a 1<sup>st</sup>-order  $\Delta\Sigma$  ADC with an over-sampling ratio (OSR) of 32 is sufficient to achieve femtosecond resolution while avoiding the stability and complexity issues often associated with higher order converters. A block diagram of the 1<sup>st</sup>-order  $\Delta\Sigma$  modulator is shown in Figure 5-23. It consists of an integrator and a single bit quantizer.



Figure 5-23: Block diagram of the 1st Order  $\Delta\Sigma$  Modulator

The oversampling ratio (OSR) of the modulator is given by the following equation

$$OSR = \frac{f_s}{2f_B} \tag{5.14}$$

where  $f_s$  is the sampling frequency and  $f_B$  is the input signal bandwidth. For this application the OSR is set to 32, in order to provide the noise shaping that is required to achieve the high resolution time measurement.



Figure 5-24: 1st Order ΔΣ Modulator using Simulink®

The model in Figure 5-24 was implemented in MATLAB and a Simulink® simulation result is shown in Figure 5-25.



**Figure 5-25: Simulink Simulations** 
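
The behaviour of a 1<sup>st</sup>-order loop can also be sketched outside Simulink. The following is a generic behavioural model, assuming an ideal discrete-time integrator, a single bit quantizer with a threshold at zero and a normalised reference of ±1; it is not a transcription of the Simulink model in Figure 5-24, and the function names and input value are illustrative.

```python
def delta_sigma_first_order(samples, v_ref=1.0):
    """1st-order modulator: integrator followed by a single bit quantizer."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback            # difference (delta), then sum (sigma)
        bit = 1 if integrator >= 0.0 else 0   # single bit quantizer
        feedback = v_ref if bit else -v_ref   # 1-bit DAC in the feedback path
        bits.append(bit)
    return bits


def decode_mean(bits, v_ref=1.0):
    """Crude decimation: average the bitstream and map it back to a voltage."""
    ones = sum(bits)
    return v_ref * (2.0 * ones / len(bits) - 1.0)


bits = delta_sigma_first_order([0.3] * 4096)
print(f"first bits: {bits[:16]}")
print(f"decoded DC value: {decode_mean(bits):.4f}")
```

The density of ones in the bitstream tracks the DC input, so averaging more samples recovers the input with finer resolution; this is the speed-for-resolution trade described in section 5.7.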

The  $\Delta\Sigma$  modulator is based on a switched-capacitor implementation and is shown in Figure 5-26.



Figure 5-26:  $1^{st}$  order switched-capacitor  $\Delta\Sigma$  modulator

## 5.7.1 Non-overlap clocking scheme

The  $\Delta\Sigma$  modulator operates on a two phase clocking scheme, where both clock phases and delayed versions of the clock phases are generated to avoid signal dependent charge injection. The two phase clock generator is shown in Figure 5-27 and the clocking scheme is shown in Figure 5-28.



Figure 5-27: Non-overlapping clock generator



Figure 5-28: Non-overlapped clock timing

A simulation of the non-overlap clock generator using the SPECTRE simulator is shown in Figure 5-29.



Figure 5-29: Non-overlap clock simulation

## 5.7.2 Amplifier design

The amplifier employed in the integrator (see Figure 5-26) is a critical element within the  $\Delta\Sigma$  modulator. Any integrator leakage resulting from the finite DC gain of the amplifier will reduce the modulator's attenuation of the quantization noise at low frequencies. Although the modulator can tolerate non-ideal components, the required DC gain of the amplifier is chosen slightly higher than the oversampling ratio. Figure 5-30 shows the amplifier employed in the modulator. It is based on the folded cascode operational amplifier [92] and has a class AB output stage so that the output can swing close to the supply voltages. It also uses a summing circuit that consists of two current mirrors, M5,M6 and M9,M10, with cascodes M7,M8 and M11,M12 respectively, and a floating current source M13,M15. The current generated by the floating current source M13,M15 flows through M11 and M7. At the source of M11, the bias current of the NMOS input pair is added, and the current is mirrored by M9,M10. At the source of M12, the bias current of the NMOS input pair is subtracted again. The current through M12,M8 and the class AB control transistors M14,M16 is constant and equal to the current set by the floating current source. By using this type of configuration, the biasing of the output stage is not affected by the common mode input voltage. The transistor sizes in the amplifier have been designed and optimised to achieve a DC gain of 60dB, a unity gain bandwidth of 110 MHz and a phase margin of 70 degrees. The unity gain bandwidth,  $\omega_1$, is given by:-

$$\omega_1 = \frac{gm_1}{C_c} \tag{5.15}$$

where  $gm_1$  is the transconductance of the input stage and  $C_c$  is the Miller capacitance. The second pole frequency is given by:-

$$\omega_{2} = \frac{gm_{2}}{C_{gs} + C_{L} + \frac{C_{gs}}{C_{c}}C_{L}}$$
(5.16)

where  $gm_2$  is the transconductance of the output stage,  $C_{gs}$  is the gate-source capacitance of the output stage including all parasitics connected to the gates, and  $C_L$  is the load capacitance including all parasitic capacitance connected to the output. The phase margin is given by:-

$$\varphi_m = \arctan\left(\frac{\omega_2}{\omega_1}\right) = \arctan\left(\frac{gm_2}{gm_1} \frac{C_c}{C_{gs} + C_L + \frac{C_{gs}}{C_c}C_L}\right)$$
(5.17)

In order to achieve a phase margin of 70 degrees, the second pole should be at least three times the unity gain frequency. Figure 5-30 shows the complete design of the amplifier with the optimised transistor values.
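
The factor of three follows directly from the arctangent relation in equation 5.17. A minimal check, assuming the two-pole model used above (the function names are illustrative):

```python
import math


def phase_margin_deg(pole_ratio):
    """Phase margin in degrees for a given ratio omega_2 / omega_1."""
    return math.degrees(math.atan(pole_ratio))


def required_pole_ratio(phase_margin_target_deg):
    """Ratio omega_2 / omega_1 needed to reach a target phase margin."""
    return math.tan(math.radians(phase_margin_target_deg))


print(f"ratio needed for 70 degrees: {required_pole_ratio(70.0):.3f}")
print(f"phase margin at a ratio of 3: {phase_margin_deg(3.0):.2f} degrees")
```

A ratio of about 2.75 gives exactly 70 degrees, so placing the second pole at three times the unity gain frequency leaves a small margin, at roughly 71.6 degrees.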



Figure 5-30: Folded cascode operational amplifier.

Figures 5-31 and 5-32 show the gain and phase response of the amplifier across process, voltage and temperature. As can be seen the DC gain varies from 43dB to 64dB over process, voltage and temperature (PVT) corners.



Figure 5-31: Amplifier gain across process corners



Figure 5-32: Amplifier Phase response across process corners

Figure 5-33 shows the step response of the amplifier across process, voltage and temperature (PVT) corners.



Figure 5-33: Amplifier step response

Table 5-1 shows the settling time of the amplifier across PVT corners. As can be seen, the settling time varies from 16ns to 23ns.

| Process corner   | Setting time |
|------------------|--------------|
| tt, 27degC, 1.2V | 18ns         |
| ff, 125degC,1.2V | 16ns         |
| ff, 80degC, 1.2V | 17.8ns       |
| ss, -40degC,1.2V | 23ns         |

Table 5.1: Amplifier settling time across PVT

### 5.7.3 Comparator

The comparator used for the single bit quantizer within the $\Delta\Sigma$ modulator (Figure 5-24) is shown in Figure 5-34. The comparator is the same as the one used in the programmable time measurement architecture (PTMA) described in Chapter 3, section 3.3. This particular comparator was chosen for its high performance.



Figure 5-34: High performance comparator

## **5.7.4 Integrator simulations**

The input and output waveforms of the integrator of the $\Delta\Sigma$ modulator are shown in Figure 5-35 and Figure 5-36. The sampling clock period is 10ns, the input signal period is 160ns, and the output load capacitance *CLoad* equals 50fF. The simulations were performed at the typical process corner, with a temperature of 27degC and a supply voltage of 1.2V.



Figure 5-35: Clock, Input and output waveforms of the integrator



Figure 5-36: Zoomed in version

## **5.7.5 Modulator Simulations**

The delta modulated output signal of the $\Delta\Sigma$ modulator is shown in Figure 5-37.



Figure 5-37: Delta modulated output of the  $\Delta\Sigma$  modulator





Figure 5-38: Frequency spectrum of the  $\Delta\Sigma$  modulator

### 5.7.6 Decimation Filter [116]

The output of the 1<sup>st</sup> order modulator contains very little quantization noise at low frequencies and the power spectral density (PSD) of the output noise grows rapidly with increasing frequency. Hence, the signal band limit, $f_b$, must be much smaller than half the sampling frequency, $f_s/2$, and if the modulator is to be used as an ADC, the out of band noise must be removed by a digital low pass filter (LPF). Afterwards, the LPF's output signal may be decimated, thereby reducing the sampling rate to the Nyquist sampling frequency $2f_b$. The requirements of the LPF are that the gain response should be flat and large over the signal band from zero to $f_b$, and very small between $f_b$ and $f_s/2$. Often, it is also desirable to have a flat group delay response in the signal band.

There are two classes of digital filter, distinguished by their impulse response [117]: Finite Impulse Response (FIR) and Infinite Impulse Response (IIR). FIR filters have a unit sample response of finite length and therefore a system function which is a polynomial in $z^{-1}$. All the poles of the system function are at the origin and therefore do not affect the shape of the response. IIR filters have a unit sample response that is of infinite duration. FIR filters are less sensitive to rounding errors in the coefficients and computation. They are incapable of becoming unstable, whereas IIR filters can produce oscillations as a result of non-linearity caused by overloading or quantization errors [116, 117].

The advantages of using an FIR topology as opposed to an IIR filter are as follows [116, 117]. FIR filters can easily be designed for "linear phase"; a linear-phase filter delays the input signal but does not distort its phase. They are relatively easy to implement: on most DSP microprocessors, the FIR calculation can be done by looping a single instruction. They are suited to multi-rate applications, either "decimation" (reducing the sampling rate), "interpolation" (increasing the sampling rate), or both. Whether decimating or interpolating, the use of FIR filters allows some of the calculations to be omitted, thus providing an important computational efficiency. In contrast, if IIR filters are used, each output must be individually calculated, even if that output is to be discarded, because it is fed back into the filter. FIR filters also have desirable numeric properties. In practice, all DSP filters must be implemented using "finite-precision" arithmetic, that is, a limited number of bits. The use of finite-precision arithmetic in IIR filters can cause significant problems due to the use of feedback, but FIR filters have no feedback and can usually be implemented using fewer bits. Finally, the FIR filter can be implemented using fractional arithmetic. Unlike IIR filters, it is always possible to implement an FIR filter using coefficients with magnitude of less than 1, and the overall gain of the FIR filter can be adjusted at its output, if desired. This is an important consideration when using fixed-point DSPs, as it makes the implementation easier.
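The computational saving available to FIR filters in decimation can be illustrated with a short sketch. It compares filtering every sample and then discarding outputs against computing only the retained outputs; the function names, the 4-tap averaging coefficients and the test sequence are illustrative assumptions and are not taken from the design in this chapter.

```python
def fir_filter(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], taking x[n] = 0 for n < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_decimate(x, h, m):
    """Evaluate only every m-th FIR output; the discarded outputs are never computed."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(0, len(x), m)
    ]


# Illustrative single-bit style input (+1/-1) and a 4-tap averaging filter.
x = [1, -1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1]
h = [0.25, 0.25, 0.25, 0.25]

filtered_then_downsampled = fir_filter(x, h)[::4]
decimated_directly = fir_decimate(x, h, 4)
print(filtered_then_downsampled == decimated_directly)
```

Because an FIR output depends only on input samples, the skipped outputs have no influence on the retained ones. An IIR filter could not skip them in this way, since each output feeds back into later outputs.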

The disadvantages of using an Infinite Impulse Response (IIR) filter are as follows. IIR filters are more susceptible to the problems of finite-length arithmetic, such as noise generated by calculations and limit cycles. This is a direct consequence of feedback: when the output is not computed perfectly and is fed back, the imperfection can compound. It is more difficult to implement an IIR filter using fixed-point arithmetic. IIR filters also do not offer the computational advantages of FIR filters for multi-rate (decimation and interpolation) applications.

The requirement for a flat group delay response can be satisfied by using a linear phase finite impulse response (FIR) LPF. Since the output signal of the modulator is a single bit stream, it may be practical to use a single-stage high order linear-phase FIR filter, since no actual multiplications are required between the samples of the signal and the weights of the taps. In general, however, it is usually more efficient and economical to carry out the filtering and decimation in stages. The most commonly used stage for the filter is a sinc filter. A sinc filter is an FIR filter with N-1 delays and N equal-valued tap weights, which computes a running average of the input data stream. The output of the sinc filter can be represented by:

$$y(n) = \frac{1}{N} \sum_{i=0}^{N-1} x(n-i) \tag{5.18}$$

Its impulse response is given by

$$h_1(n) = \begin{cases} 1/N & \text{, if } (0 \le n \le N - 1) \\ 0 & \text{, otherwise} \end{cases} \tag{5.19}$$

The z-domain transfer function is given by:

$$H_1(z) = \frac{1}{N} \frac{1 - z^{-N}}{1 - z^{-1}} \tag{5.20}$$

And thus its frequency response is

$$H_1(e^{j2\pi f}) = \frac{\operatorname{sinc}(Nf)}{\operatorname{sinc}(f)} \tag{5.21}$$

where sinc(*f*) is defined as $(\sin(\pi f))/(\pi f)$, hence the name of this filter.
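As a consistency check between Equations 5.19 and 5.21, the sketch below compares the magnitude of the sinc ratio with the magnitude obtained by summing the impulse response directly. Only magnitudes are compared, since Equation 5.21 does not carry the linear phase term of the running average. The function names are illustrative, and *f* is assumed to be normalised to the sampling frequency.

```python
import cmath
import math


def sinc(f: float) -> float:
    """sinc(f) = sin(pi f) / (pi f), with sinc(0) = 1."""
    return 1.0 if f == 0 else math.sin(math.pi * f) / (math.pi * f)


def sinc_filter_gain(f: float, n: int) -> float:
    """Magnitude of Equation 5.21 at normalised frequency f (cycles per sample)."""
    return abs(sinc(n * f) / sinc(f))


def direct_gain(f: float, n: int) -> float:
    """Magnitude response computed directly from the impulse response of Equation 5.19."""
    return abs(sum(cmath.exp(-2j * math.pi * f * k) for k in range(n)) / n)


N = 32  # number of taps, chosen here only for illustration
for f in (0.0, 0.01, 0.05, 0.2, 0.4):
    print(f"f = {f:.2f}: sinc ratio = {sinc_filter_gain(f, N):.6f}, direct = {direct_gain(f, N):.6f}")
```

The response has nulls at multiples of 1/*N* of the sampling frequency, which is where the decimated spectrum would otherwise alias noise back to DC.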

An important advantage of the sinc filter is that for single-bit modulators it can be realized very economically, using a counter that is incremented for each +1 from the modulator. Once in every *N* clock cycles, the output of the counter is clocked into a register, and the counter is reset. Thus the output y(n) is a down sampled count of +1s produced by the modulator over the last *N* clock cycles.

The decimation filter of the $\Delta\Sigma$ ADC that follows the $\Delta\Sigma$ modulator is shown in Figure 5-39. The filter performs both digital filtering and down-sampling of the single bit input data stream from the modulator. The architecture of the decimation filter consists of a counter, a clock divider and a register, making it suitable for time measurement architectures as it is small and easy to integrate. The divide-by ratio of the clock divider is set equal to the oversampling ratio (OSR) of the $\Delta\Sigma$ modulator; in this case the OSR is set to 32.
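The counter, clock divider and register arrangement can be sketched behaviourally as follows. This is a minimal software model under the assumption that the modulator output is represented as a stream of 0s and 1s; the function name `counter_decimator` and the example bit pattern are illustrative and do not represent the circuit implementation.

```python
def counter_decimator(bits, osr=32):
    """Behavioural model of a counter-based sinc decimator.

    bits: iterable of 0/1 modulator outputs, one per clock cycle.
    osr:  divide-by ratio of the clock divider (oversampling ratio).
    Returns the register contents captured once every `osr` cycles.
    """
    outputs = []
    count = 0
    for cycle, bit in enumerate(bits, start=1):
        count += bit               # counter increments for each 1 from the modulator
        if cycle % osr == 0:       # clock divider fires once every `osr` cycles
            outputs.append(count)  # register captures the count
            count = 0              # counter is reset for the next window
    return outputs


# Illustrative bit stream in which three out of every four bits are 1.
example_bits = [1, 0, 1, 1] * 16
print(counter_decimator(example_bits, osr=32))
```

Each captured value is the number of 1s in the preceding window, which is the running sum of Equation 5.18 without the 1/*N* scaling, sampled once per window.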



**Figure 5-39: Decimation Filter** 

A MATLAB® implementation of the 16 tap low pass digital FIR filter is shown in Figure 5-40.



**Figure 5-40: Filter output response** 

### **5.7.7 Simulation results**

The performance of the proposed timing measurement architecture, which is based on the time-to-digital conversion method using the homodyne technique and is shown in Figure 5-41, was verified by simulation.



Figure 5-41: Proposed time measurement architecture

The circuit level designs, considered in section 5.3, were implemented using a 0.12µm CMOS process. Using the SPECTRE® simulator together with foundry transistor models, the relationship between timing resolution and the output of the LPF was simulated. Jitter was applied to the reference clock signal using a divide-by-2 module. Therefore, the reference frequency was set initially to twice the frequency of the input clock signal. The divide-by-2 module models a jitter of 1% of the clock frequency on the output signal. The Verilog code for the module is shown in Appendix D. Figure 5-42 shows the output of the LPF when the phase delay between the two input clock signals was varied from 0 to 500fs. As can be seen, there is a linear relationship between the output voltage and the input phase difference, as expected.



Figure 5-42: Simulated relationship between timing resolution and the output of the LPF

To further demonstrate the correct operation of the proposed time measurement architecture, Figure 5-43 shows a simulation of two clock inputs (*Clock\_in* and *Clock\_ref*) with a phase difference ( $\Delta \Phi$ ) of 40fs (top plots). The mixer and filter outputs are shown in the third and fourth plots respectively, and the output from the $\Delta\Sigma$ modulator is shown in the fifth plot.



Figure 5-43: Input and output waveforms of the proposed architecture.

Table 5-2 shows the simulation results of the time measurement architecture with various input time differences representing a propagation delay type measurement.

| Measurement type  | Input difference [fs] | Count |
|-------------------|-----------------------|-------|
| Propagation delay | 42                    | 4     |
| Propagation delay | 85                    | 7     |
| Propagation delay | 110                   | 9     |
| Propagation delay | 200                   | 15    |
| Propagation delay | 300                   | 21    |
| Propagation delay | 400                   | 27    |
| Propagation delay | 500                   | 33    |
| Propagation delay | 600                   | 39    |
| Propagation delay | 700                   | 45    |
| Propagation delay | 800                   | 51    |
| Propagation delay | 1000                  | 65    |

**Table 5-2: Propagation delay results**

Figure 5-44 shows a plot of the propagation delay time measurement against the digital output count.



Figure 5-44: propagation delay versus output count
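The trend in Table 5-2 can be summarised with an ordinary least-squares straight line. The sketch below is a post-processing illustration only: the data are copied from Table 5-2, while the function `linear_fit` and the choice of a straight-line model are assumptions made here rather than part of the simulation flow.

```python
# Data copied from Table 5-2.
delays_fs = [42, 85, 110, 200, 300, 400, 500, 600, 700, 800, 1000]
counts = [4, 7, 9, 15, 21, 27, 33, 39, 45, 51, 65]


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = linear_fit(delays_fs, counts)
residuals = [y - (slope * x + intercept) for x, y in zip(delays_fs, counts)]

print(f"slope = {slope:.4f} counts/fs, intercept = {intercept:.2f} counts")
print(f"largest deviation from the line = {max(abs(r) for r in residuals):.2f} counts")
```

For these values the fitted slope is roughly 0.06 counts per femtosecond, and every tabulated count lies within one count of the fitted line.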

Table 5-3 shows the tolerance on the count for a propagation delay time measurement of 42fs. As can be seen, the count changes by only one count either side of its typical value over process, voltage and temperature, as expected. This is because the accuracy of the $\Delta\Sigma$ ADC does not depend on component matching, precise sample and hold circuitry or trimming, and it has only a small amount of analogue circuitry that is affected by process variations. Therefore, most of the variation is caused by the earlier stages, such as the mixer and the low pass filter circuits that create the DC voltage prior to the $\Delta\Sigma$ ADC.

| Process corner   | Propagation delay [fs] | Count |
|------------------|------------------------|-------|
| tt, 27degC, 1.2V | 42                     | 4     |
| ff, 125degC,1.2V | 42                     | 5     |
| ff, 80degC, 1.2V | 42                     | 5     |
| ss, -40degC,1.2V | 42                     | 3     |

Table 5-3: Propagation delay tolerance results

Figure 5-45 shows the current consumption of the homodyne time measurement architecture. The top plots show the current consumption of the digital supplies and the bottom plot shows the current consumption of the analogue supplies.



Figure 5-45: Time measurement architecture current consumption

Careful layout techniques, such as common centroid layout of critical transistors (e.g. the input transistors of the opamp and comparator circuits) and separation of the analogue and digital layouts, were used to reduce offsets and switching noise that would otherwise affect the overall performance of the time measurement architecture. Figure 5-46 shows the layout of the time measurement core, which occupies a silicon area of 160µm by 155µm, making it attractive for high resolution on-chip time measurement.



Figure 5-46: Layout of time measurement core

## 5.8 Concluding remarks

This chapter has presented a new high resolution time measurement architecture. Unlike previous approaches to time-to-digital conversion, the technique has been shown to be effective in achieving a resolution of tens of femtoseconds, which is needed for high performance VLSI devices operating with clock frequencies of tens of gigahertz. This has been made possible through appropriate selection of mixer, filter and data conversion techniques. Simulations using SPECTRE models based on the $0.12\mu m$ CMOS process show that measurements are possible with a resolution of 42fs, which is the highest reported to date.

# Chapter 6

# **Conclusions and Further Research Directions**

### **6.1 Conclusions**

This thesis has identified the existing problems of using current external automatic test equipment (ATE) for on-chip timing measurements. These problems have been brought about by technology scaling, with clock frequencies increasing to tens of GHz over the last decade. Current ATE tends to use the same technology and hence lags behind the leading edge developments. Finding a solution that is on-chip is becoming crucial in reducing the electrical distance between the tester and the embedded core under test. Although recent research has produced some solutions using time amplifiers, vernier delay lines and other time-to-digital converter methods, in this work a different approach was pursued, focusing on the issue of multiple time measurements using a single high resolution embedded test core that not only reduces the electrical distance but also reduces the number of test resources required on-chip. In this thesis, two on-chip time measurement architectures have been proposed and an analysis of the contributions presented is given as follows.

In Chapter 3, a programmable time measurement architecture has been proposed to address multiple time measurements using a single test core. The proposed architecture is based on the time-to-digital conversion (TDC) method using the dual-slope technique and does not require additional or duplicated circuitry, which is often the case for other architectures. A key feature of the proposed architecture is its programmability. The architecture is capable of making four different types of measurement: rise time, fall time, pulse width and propagation delay. Simulations based on a 1.2V CMOS process show that it is possible to obtain these multiple measurements with high resolution.

Chapter 4 presents a detailed implementation and verification of a prototype chip of the programmable time measurement architecture fabricated using a 1.2V CMOS process. The high performance of the time measurement architecture is achieved through careful use of mixed-signal layout techniques to minimise noise and device mismatch. These included common centroid layout and shielding of carefully selected components.

The advantage of this programmability and flexibility is in the ability to perform different types of on-chip time measurements and the potential to reduce the overall time cost of chip testing. Another advantage of this architecture is that it can be easily automated. Automation of on-chip time measurement testing is possible using the proposed PTMA using either an on-chip embedded microcontroller or an off-chip programmable device such as a field programmable gate array (FPGA).

Realising the potential of the fabricated programmable time measurement architecture, Chapter 5 has proposed a novel time measurement architecture that is capable of timing measurements with a resolution of tens of femtoseconds. This was achieved using the TDC method with the homodyne technique. This technique takes a different approach to the problem of sub-picosecond resolution by means of an analogue/RF solution. It has been shown that by using a TDC incorporating a frequency domain method, higher resolutions are achievable than with time domain methods.

With regard to the practicality of these two time measurement architectures, the number of time measurement blocks required on a chip will be determined by how large the chip is and how much is to be tested. With small designs, only one or two time measurement blocks may be required and the connections can easily be multiplexed. However, in larger SoC devices, where the CUTs are placed far apart from one side of the chip to the other, more time measurement blocks may be required, simply to minimise the parasitics between the CUT and the time measurement block.

Although setup and hold times on register files are important time measurements in digital circuitry, rise and fall time specifications are often important in analogue and mixed-signal circuits. Therefore, the Programmable Time Measurement Architecture (PTMA) proposed in Chapter 3 is more suited to analogue and mixed-signal SoC devices for signal conditioning and sensor applications, while the homodyne TDC is more suited to digital applications where setup and hold times on register files are more important.

If only a few time measurement blocks are included within a design and their inputs must be multiplexed, the length of the signal paths to the measurement block needs to be controlled. All the tracks that connect the CUT to the measurement block should be as wide and as short as possible, and minimum width metal tracks should not be used. This is again in order to minimise the parasitics between the CUT and the time measurement block.

Furthermore, both of these time measurement architectures are analogue in nature. Therefore, as device dimensions are scaled down they are susceptible to process variations, although the designs of both the proposed time measurement blocks have used known design and layout techniques to minimise the effect of these variations.

In conclusion, this thesis has presented two detailed designs and implementations for on-chip time measurement testing. The two architectures proposed are a significant contribution to the development of on-chip time measurement architectures and increase their potential for testing of current and future CMOS system-on-chip devices.

### **6.2 Further research directions**

Based on the work in this thesis, the following relevant future research directions have been identified and are briefly outlined as follows:

#### Higher resolution single shot time measurement architectures:

It has been shown in Chapter 5 that a high resolution time measurement architecture with a timing resolution of tens of femtoseconds is achievable, but as on-chip clock frequencies continue to increase, higher resolutions may also be required.

### Parallel time measurement architectures:

One of the limitations of current time measurement architectures is that they can only perform time measurements sequentially. Performing time measurements in parallel could reduce the overall test time even further. Currently, time measurement architectures can be duplicated and placed on a single piece of silicon at a required location near to the core under test (CUT). However, placing numerous architectures on the same test node increases the parasitics on that node and thereby reduces accuracy. Therefore, research into time measurement architectures that can perform time measurements in parallel would be of great value.

### Periodic to non-periodic signal generation:

Currently, only periodic signals can be applied to the time measurement architecture based on the homodyne time-to-digital conversion (HTDC) technique described in Chapter 5. Therefore, research into techniques for rendering a periodic signal to a non-periodic signal will allow this architecture to make other types of time measurement, such as the rise and fall time measurements that are needed for on-chip time measurement tests.

### On-chip calibration for the homodyne time to digital conversion (HTDC) technique:

A feature of the HTDC method is that it has a small layout footprint (160µm by 155µm), but an on-chip calibration circuit is still required. Depending on its size, the calibration circuit will increase the size of the overall time measurement architecture, and this increase could be at least half the size of the time measurement core. Therefore, research is required into on-chip calibration schemes and their incorporation into the homodyne time-to-digital conversion (HTDC) method without reducing the overall performance.

### Incorporation in the IEEE 1500 standard:

The IEEE 1500 standard is a core level solution and was developed to facilitate test integration and test reuse. Its purpose is to provide a uniform interface between the embedded cores and the test access circuitry. A brief overview of the IEEE 1500 standard is given in Appendix A. Additional research is required into how and where time measurement architectures, such as the PTMA and the HTDC proposed in Chapter 3 and Chapter 5 respectively, are integrated into this standard.

## **Appendix A**

## **IEEE Standard 1500**

To facilitate SoC testing, a new standard has recently been approved and is known as the IEEE 1500 standard [13, 118, 119]. The purpose of this standard is to provide a uniform interface between the embedded cores and the test access circuitry, similar to the Joint Test Action Group (JTAG) mechanism associated with board level system testing [8]. The IEEE 1500 standard is a core level solution and was developed to facilitate test integration and test reuse. The IEEE standard 1500 defines a collection of wrapper interface ports that fall into two categories. The wrapper serial ports, wrapper serial input (WSI) and wrapper serial output (WSO), are used for serial access, and the wrapper parallel ports, wrapper parallel input (WPI) and wrapper parallel output (WPO), are used for parallel access. Figure A-1 shows the IEEE standard 1500 wrapper architecture.



Figure A-1: IEEE Standard 1500 wrapper architecture [120]

The IEEE standard 1500 wrapper architecture consists of an instruction register known as the wrapper instruction register (WIR). The wrapper instruction register (WIR) configures the 1500 wrapper into different modes of operation determined by an instruction shifted into the WIR register. A wrapper boundary register (WBR) provides access to the core terminals where data can be shifted, captured, updated and transferred. A wrapper bypass register (WBY) is provided as a bypass in serial mode. It is intended for use when several IEEE 1500 wrappers are chained together, thus providing a minimum length scan path through the wrapper.

# **Appendix B**

# **Switch Control Block**

The Verilog RTL code for the switch control block within the programmable input block (PIB) of the programmable time measurement architecture (PTMA) is shown below. This code was used in the mixed-signal simulation of the PIB module.

```
// Verilog HDL for "matt_scratch", "CTL_BLOCK5_s" "functional"
// Switch control block: decodes the mode bits and the comparator
// output into the 7-bit switch control word sw[6:0].
module CTL_BLOCK5_s( dvdd, dvss, mode1, mode0, cmp_in, sw, start);
input dvdd, dvss, mode1, mode0, cmp_in, start;
output [6:0] sw;
reg [6:0] sw_reg;

// Clear the switch control word at the start of simulation
initial
begin
    sw_reg = 7'b0000000;
end

// Re-evaluate whenever a mode bit or the comparator output changes
always @(mode1 or mode0 or cmp_in)
begin
    case ( {mode1, mode0, cmp_in} )
    3'b000: // Rise Time
    begin
        if (cmp_in == 0)
            sw_reg = 7'b0001001;
        else
            sw_reg = 7'b0000110;
    end
    3'b101: // Fall Time
    begin
        if (cmp_in == 0)
            sw_reg = 7'b0000110;
        else
            sw_reg = 7'b0001001;
    end
    3'b110: // Pulse Width
    begin
        sw_reg = 7'b1000001;
    end
    3'b111: // Propagation
    begin
        sw_reg = 7'b1000001;
        if (cmp_in == 1)
            sw_reg = 7'b0110000;
    end
    default: sw_reg = 7'b0000000;
    endcase
end

// Drive the output port from the internal register
assign sw[6:0] = sw_reg[6:0];

endmodule
```
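For checking the RTL against expected switch patterns outside the simulator, a software reference model can be useful. The sketch below mirrors the case statement of the listing above exactly as printed, including its `default` branch, assuming the four case items are `3'b000`, `3'b101`, `3'b110` and `3'b111`. The function name `switch_control` is illustrative, and the model is not part of the original verification flow.

```python
def switch_control(mode1: int, mode0: int, cmp_in: int) -> int:
    """Reference model of the combinational decode in CTL_BLOCK5_s.

    Returns the 7-bit switch control word for the given mode bits and
    comparator output, following the printed case statement item by item.
    """
    selector = (mode1 << 2) | (mode0 << 1) | cmp_in
    if selector == 0b000:      # Rise Time
        return 0b0001001 if cmp_in == 0 else 0b0000110
    if selector == 0b101:      # Fall Time
        return 0b0000110 if cmp_in == 0 else 0b0001001
    if selector == 0b110:      # Pulse Width
        return 0b1000001
    if selector == 0b111:      # Propagation
        return 0b0110000 if cmp_in == 1 else 0b1000001
    return 0b0000000           # default branch of the case statement


# Print the full truth table as {mode1, mode0, cmp_in} -> sw[6:0].
for selector in range(8):
    m1, m0, c = (selector >> 2) & 1, (selector >> 1) & 1, selector & 1
    print(f"{selector:03b} -> {switch_control(m1, m0, c):07b}")
```

Because `cmp_in` is part of the case selector, each listed item corresponds to a single value of `cmp_in`, and the truth table makes clear which selector values fall through to the default word.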

# **Appendix C**

# Test chip Microchip PIC 18F2030 test program

In order to validate the fabricated test chip for the programmable time measurement architecture (PTMA), described in Chapters 3 and 4, the Microchip PIC 18F2030 device is used for programming the four different modes of the test chip. The following shows the program used for the PIC microcontroller. The program consists of a number of subroutines and a main program. There are four subroutines, one for each of the four programmable modes of the PTMA. Each subroutine sets up the signals for the operation of the device within the required mode. The main program sets up the input and output ports of the device, configures the test chip for a particular mode and then calls the subroutine for the required test mode of the test chip. Currently, the program is set for a rise time measurement and therefore the main program calls the subroutine *MODE1*. In order to configure the device for a different programming mode, for example a propagation delay measurement, the main program should be changed to call subroutine *MODE4*. For example:

CALL MODE4

title "PIC18F2030 MC01TDC PROGRAM"

; : Matthew Collins

; Date : 22/03/06 REV: 1

; For PIC18FXXX

; Function

; -----

; Program MC01TDC Module

list p=18f452

; Include file, change directory if needed

#include <p18f452.inc>

; Start at the reset vector

Reset\_Vector code 0x000

; Start application beyond vector area

code 0x002a

goto init

;\*\*\*\*\*\* THE SUBROUTINES START HERE \*\*\*\*\*\*\*

init ; the Program starts here

;\*\*\*\*\*\* MEMORY EQUATES \*\*\*\*\*\*\*

; You define names and allocation as required by the program

goto start

#### MODE1 ; Rise Time Measurement

| Instruction | Operands ; comment                          |
|-------------|---------------------------------------------|
| bcf         | PORTB,RB3 ; clear PORTB bit 3 (MODE0 low)   |
| bcf         | PORTB,RB4 ; clear PORTB bit 4 (MODE1 low)   |
| bsf         | PORTB,RB5 ; set PORTB bit 5 (START high)    |
| ; Vin_1     |                                             |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |

#### MODE2 ; Fall Time Measurement

| Instruction | Operands ; comment                          |
|-------------|---------------------------------------------|
| bcf         | PORTB,RB3 ; clear PORTB bit 3 (MODE0 low)   |
| bsf         | PORTB,RB4 ; set PORTB bit 4 (MODE1 high)    |
| bsf         | PORTB,RB5 ; set PORTB bit 5 (START high)    |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |

#### MODE3 ; Pulse Width Measurement

| Instruction | Operands ; comment                          |
|-------------|---------------------------------------------|
| bsf         | PORTB,RB3 ; set PORTB bit 3 (MODE0 high)    |
| bcf         | PORTB,RB4 ; clear PORTB bit 4 (MODE1 low)   |
| bsf         | PORTB,RB5 ; set PORTB bit 5 (START high)    |
| ; Vin_1     |                                             |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bsf         | PORTB,RB0 ; drive RB0 high                  |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB0 ; drive RB0 low                   |
| bcf         | PORTB,RB5 ; clear PORTB bit 5 (START low)   |
| MODE4 ; Propagation I                                                      | Delay Measurement                                                                                                                                                                                                                                                                                                                     |
| bsf                                                                        | PORTB RB3 :set PORTB BIT3 High (MODE0)                                                                                                                                                                                                                                                                                                |
| 001                                                                        | I OITID, ICD , BOTI OITID DITS INGH (INODDO)                                                                                                                                                                                                                                                                                          |
| bsf                                                                        | PORTB,RB4 ;set PORTB BIT4 High (MODE1)                                                                                                                                                                                                                                                                                                |
| bsf<br>bsf                                                                 | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)                                                                                                                                                                                                                                                      |
| bsf<br>bsf<br>; Vin_1                                                      | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)                                                                                                                                                                                                                                                      |
| bsf<br>; Vin_1<br>bcf                                                      | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)<br>PORTB,RB0 ;set PORTB BIT6 High                                                                                                                                                                                                                    |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf                                        | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High                                                                                                                                                                                  |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf<br>bcf                                 | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High                                                                                                                                                |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf<br>bcf<br>bsf                          | <ul> <li>PORTB,RB4 ;set PORTB BIT4 High (MODE1)</li> <li>PORTB,RB5 ;set PORTB BIT5 High (START)</li> <li>PORTB,RB0 ;set PORTB BIT6 High</li> </ul>                                                            |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf<br>bsf<br>; Vin 2                      | PORTB,RB4 ;set PORTB BIT4 High (MODE1)<br>PORTB,RB5 ;set PORTB BIT5 High (START)<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High<br>PORTB,RB0 ;set PORTB BIT6 High                                                                                                              |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf<br>bcf<br>bsf<br>; Vin_2<br>bcf        | <ul> <li>PORTB,RB4 ;set PORTB BIT4 High (MODE1)</li> <li>PORTB,RB5 ;set PORTB BIT5 High (START)</li> <li>PORTB,RB0 ;set PORTB BIT6 High</li> <li>PORTB,RB1 ;set PORTB BIT6 Low</li> </ul>                     |
| bsf<br>bsf<br>; Vin_1<br>bcf<br>bcf<br>bcf<br>bsf<br>; Vin_2<br>bcf<br>bsf | <ul> <li>PORTB,RB4 ;set PORTB BIT4 High (MODE1)</li> <li>PORTB,RB5 ;set PORTB BIT5 High (START)</li> <li>PORTB,RB0 ;set PORTB BIT6 High<br/>PORTB,RB0 ;set PORTB BIT6 High<br/>PORTB,RB0 ;set PORTB BIT6 High<br/>PORTB,RB0 ;set PORTB BIT6 High</li> <li>PORTB,RB1 ;set PORTB BIT6 Low<br/>PORTB,RB1 ;set PORTB BIT7 High</li> </ul> |

#### ;\*\*\*\*\*\* MAIN PROGRAM \*\*\*\*\*\*\*

| Label   | Instruction | Operands ; comment                       |
|---------|-------------|------------------------------------------|
| start   |             |                                          |
|         | clrf        | PORTB ; Clear PORTB                      |
|         | clrf        | TRISB ; PORTB all outputs                |
|         | clrf        | PORTA ; Clear PORTA                      |
|         | clrf        | TRISA ; PORTA all outputs                |
| PWRUP   | bcf         | PORTB,RB0 ;set PORTB BIT0 High (PWRUP)   |
| BGAP_EN | bcf         | PORTB,RB1 ;set PORTB BIT1 High (BGAP_EN) |
| N0_MEM  | bsf         | PORTB,RB2 ;set PORTB BIT2 High (No_MEM)  |
|         | CALL        | MODE1                                    |

end ; and ends here - this must appear on the last line of the program so the assembler knows where to stop
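As a reading aid, the effect of the bit-set (`bsf`) and bit-clear (`bcf`) instructions in the main program can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: the helper functions `bsf` and `bcf` are hypothetical stand-ins for the PIC instructions, the register is modelled as a plain 8-bit integer, and instruction timing, TRIS direction bits and the `CALL MODE1` subroutine are not modelled.

```python
# Bit positions of the PORTB pins used by the main program.
RB0, RB1, RB2 = 0, 1, 2


def bsf(reg: int, bit: int) -> int:
    """Model of 'bsf': set one bit of an 8-bit register."""
    return (reg | (1 << bit)) & 0xFF


def bcf(reg: int, bit: int) -> int:
    """Model of 'bcf': clear one bit of an 8-bit register."""
    return reg & ~(1 << bit) & 0xFF


portb = 0                 # clrf PORTB
portb = bcf(portb, RB0)   # PWRUP   line: bcf PORTB,RB0
portb = bcf(portb, RB1)   # BGAP_EN line: bcf PORTB,RB1
portb = bsf(portb, RB2)   # N0_MEM  line: bsf PORTB,RB2

# PORTB as an 8-bit pattern just before CALL MODE1.
print(f"PORTB = {portb:08b}")
```

Under these assumptions only bit 2 of PORTB is set when `MODE1` is called, because `bcf` clears a bit and `bsf` sets it.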

# **Appendix D**

### **Divide-by-2 with Jitter**

The following Verilog<sup>®</sup>-A code describes a divide-by-2 frequency divider based on the frequency divider from www.designers-guide.org. The divide-by-2 module was used to model jitter on the input clocks to the mixer and LPF in the HTDC; the results of the simulation can be seen in section 5.7.7.

```
// Verilog HDL for "matt_scratch", "divideby2" "functional"
`include "disciplines.vams"

module dividerby2 (out, in);
output out;
electrical out;
input in;
electrical in;
parameter real vh = 1.2; // output voltage high
parameter real vl = 0.0; // output voltage low
parameter real vth = (vh + vl)/2; // threshold voltage
parameter real tt = 1p from (0:inf); // transition time
parameter real tdel = 1p from (0:inf); // delay from input to output
parameter real jitter = 5f from [0:tdel/5); // edge-to-edge jitter
integer count;
integer n;
integer seed;
real delta_t;

analog begin
   @(initial_step) seed = -311;
   // on every rising edge of the input clock
   @(cross(V(in) - vth, +1)) begin
      count = count + 1;
      if (count >= 2)
         count = 0;
      n = (2*count >= 2); // output level alternates between 1 and 0
      delta_t = jitter*$rdist_normal(seed, 0, 1); // random delay offset for this edge
   end
   V(out) <+ transition(n ? vh : vl, tdel + delta_t, tt);
end

endmodule
```
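The behaviour of the divider can be checked outside the circuit simulator with a short Python sketch. It is an illustration under stated assumptions rather than part of the thesis flow: the function name `divide_by_2` is hypothetical, `random.gauss` stands in for `$rdist_normal` (so the random sequence differs from the simulator's), and the finite transition time `tt` is ignored so that only edge times are modelled.

```python
import random


def divide_by_2(rising_edges, tdel=1e-12, jitter=5e-15, seed=311):
    """Return (time, level) output events for a jittered divide-by-2.

    rising_edges: times (s) at which the input crosses the threshold upwards.
    tdel:         nominal input-to-output delay (s).
    jitter:       standard deviation of the per-edge delay offset (s).
    """
    rng = random.Random(seed)
    count = 0
    events = []
    for t in rising_edges:
        count += 1
        if count >= 2:
            count = 0
        level = 1 if 2 * count >= 2 else 0        # same test as n in the module
        delta_t = jitter * rng.gauss(0.0, 1.0)    # Gaussian delay offset
        events.append((t + tdel + delta_t, level))
    return events


period = 1e-9                                     # 1 GHz input clock (assumed)
edges = [k * period for k in range(8)]
events = divide_by_2(edges)

levels = [level for _, level in events]
rising = [t for t, level in events if level == 1]
out_periods = [b - a for a, b in zip(rising, rising[1:])]

print("output levels:", levels)
print("output periods (s):", out_periods)
```

Each output period is close to twice the input period, and the small deviations between successive periods are the modelled jitter.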

# **Appendix E**

### **Publications**

During the course of this research, a number of papers have been published [22-24]; these are listed below.

- M. Collins, B. M. Al-Hashimi, and N. Ross, "A Programmable Time Measurement Architecture for Embedded Memory Characterization", *Proceedings of the 10<sup>th</sup> IEEE European Test Symposium (ETS'05)*, pp. 128-133, Tallinn, Estonia, May 2005
- M. Collins, B. M. Al-Hashimi and P. R. Wilson, "On-chip timing measurement architecture with femtosecond resolution", *IEE Electronics Letters*, vol. 42, issue 9, pp. 582-583, 27th April 2006
- M. Collins and B. M. Al-Hashimi, "On-Chip Time Measurement Architecture with Femtosecond Timing Resolution", *Proceedings of the 11th IEEE European Test Symposium (ETS'06)*, pp. 103-108, Southampton, UK, May 2006

## References

- [1] "International Technology Roadmap for Semiconductors," http://www.itrs.net/Common/2005ITRS/Home2005.htm, 2005
- [2] R. Saleh, S. Wilton, S. Mirabbasi, A. Hu, M. Greenstreet, G. Lemieux, P. P. Pande, C. Grecu, and A. Ivanov, "System-on-Chip: Reuse and Integration," *Proceedings of the IEEE*, vol. 94, no. 6, pp. 1050-1069, 2006.
- [3] Y. Zorian, "Testing the monster chip," *IEEE Spectrum*, vol. 36, no. 7, pp. 54-60, 1999.
- [4] Y. Zorian, "Testing semiconductor chips: trends and solutions," *Proceedings of the XII Symp. on Integrated Circuits and Systems Design*, pp. 226-233, 1999.
- [5] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective: Second Edition: Addison Wesley, 1993, ISBN 0-201-53376-6.
- [6] M. Abramovici, M. A. Breuer, and A. D. Friedman, *Digital System Testing and Testable Design*: IEEE Press, 1990, ISBN: 0-7803-1093-4.
- [7] P. T. Gonciari, "Low-cost test for core-based system-on-chip," PhD thesis, University of Southampton, Southampton, UK, December 2002.
- [8] M. Burns and G. W. Roberts, *An Introduction to Mixed-Signal IC Test and Measurement*: Oxford University Press, 2001, ISBN 0-19-514016-8.
- [9] "Verigy V93000 SOC tester," http://www.verigy.com/portal/page/portal/Products%20Applications/V93000%20SOC%20Series/Products
- [10] P. M. Levine and G. W. Roberts, "High-resolution flash time-to-digital conversion and calibration for system-on-chip," *IEE Proceedings on Computers and Digital Techniques*, vol. 152, no. 3, pp. 415-426, 2005.
- [11] R. J. Baker, *CMOS: Circuit Design, Layout, and Simulation, 2nd Edition:* Wiley-IEEE Press, 2005, ISBN 0-471-70055-X.
- [12] P. M. Levine and G. W. Roberts, "A High-Resolution Flash Time-to-Digital Converter and Calibration Scheme," *IEEE International Test Conference (ITC)*, pp. 1148-1157, 2004.

- [13] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing embedded-core-based system chips," *IEEE Computer*, vol. 32, no. 6, pp. 52-60, 1999.
- [14] J. Rajski, "DFT for High-Quality Low Cost Manufacturing Test," *Proceedings of the IEEE Asian Test Symposium (ATS'01)*, pp. 3-8, 2001.
- [15] B. Bottoms, "The third millennium's test dilemma," *IEEE Design and Test of Computers*, vol. 15, pp. 7-11, 1998.
- [16] A. Khoche and J. Rivoir, "I/O Bandwidth Bottleneck for Test: Is it Real?," *Proceedings of the IEEE Test Resource Partitioning Workshop*, pp. 2.3-1-2.3-6, 2000.
- [17] X. Du, S. M. Reddy, D. E. Ross, W.-T. Cheng, and J. Rayhawk, "Memory BIST Using ESP," Proceedings of the 22nd IEEE VLSI Test Symposium (VTS 2004), pp. 243-248, 2004.
- [18] S. Tabatabaei and A. Ivanov, "Embedded timing analysis: A SoC infrastructure," *IEEE Design and Test of Computers*, vol. 19, no. 3, pp. 22-34, 2002.
- [19] C.-W. Wang, J.-R. Huang, Y.-F. Lin, K.-L. Cheng, C.-T. Huang, and C.-W. Wu, "Test Scheduling of BISTed Memory Cores for SOC," Proceedings of the 11th Asian Test Symposium (ATS'02), pp. 1-6, 2002.
- [20] Y. Zorian, "System-Chip Test Strategies," 35th Design Automation Conference (DAC98), 1998.
- [21] R. Rankinen, K. Maatta, and J. Kostamovaara, "Time-to-digital conversion with 10 ps single shot resolution," 6th Mediterranean Electrotechnical Conference, vol. 1, pp. 319-322, 1991.
- [22] M. Collins, B. M. Al-Hashimi, and N. Ross, "A Programmable Time Measurement Architecture for Embedded Memory Characterization," *Proceedings of the 10<sup>th</sup> IEEE European Test Symposium (ETS'05)*, pp. 128-133, 2005.
- [23] M. Collins, B. M. Al-Hashimi, and P. R. Wilson, "On-chip timing measurement architecture with femtosecond resolution," *IEE Electronics Letters*, vol. 42, no. 9, pp. 582-583, 2006.
- [24] M. Collins and B. M. Al-Hashimi, "On-Chip Time Measurement Architecture with Femtosecond Timing Resolution," *Proceedings of the 11th IEEE European Test Symposium (ETS'06)*, pp. 103-108, 2006.
- [25] F. O'Mahony, C. P. Yue, M. A. Horowitz, and S. S. Wong, "A 10-GHz global clock distribution using coupled standing-wave oscillators," *IEEE Journal of Solid-State Circuits*, vol. 38, no. 11, pp. 1813-1820, Nov 2003.
- [26] R. Ho, B. Amrutur, K. Mai, B. Wilburn, T. Mori, and M. Horowitz, "Applications of on-chip samplers for test and measurement of integrated circuits," *IEEE Symp. on VLSI Circuits*, pp. 138-139, 1998.

- [27] K. Soumyanath, S. Borkar, C. Zhou, and B. A. Bloechel, "Accurate on-chip interconnect evaluation: a time-domain technique," *IEEE Journal of Solid-State Circuits*, vol. 11, no. 3, pp. 336-344, 1999.
- [28] P. Larsson and C. Svensson, "Measuring high-bandwidth signals in CMOS circuits," *IEE Electronics Letters*, vol. 29, no. 20, pp. 1761-1762, 1993.
- [29] K. Lofstrom, "Early capture for boundary scan timing measurement," *IEEE International Test Conference* (ITC), pp. 417-422, Oct 1996.
- [30] A. Hajjar and G. W. Roberts, "A high speed and area efficient on-chip analog waveform extractor," *IEEE International Test Conference* (ITC), pp. 688-697, 1998.
- [31] M. M. Hafed and G. W. Roberts, "A 5-Channel, Variable Resolution, 10-GHz Sampling Rate Coherent Tester/Oscilloscope IC and Associated Test Vehicles," *IEEE Proceedings of the Custom Integrated Circuits Conference (CICC)*, pp. 621-624, 2003.
- [32] Y. Zheng and K. L. Shepard, "On-Chip Oscilloscopes for Noninvasive Time-Domain Measurement of Waveforms in Digital Integrated Circuits," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 3, pp. 336-344, June 2003.
- [33] C. Ljuslin, J. Christiansen, A. Marchioro, and O. Klingsheim, "An Integrated 16-channel Time to Digital Converter," *IEEE Trans. on Nuclear Science*, vol. 41, no. 4, pp. 1104-1108, Aug 1994.
- [34] N. Abaskharoun, M. Hafed, and G. W. Roberts, "Strategies for On-Chip Sub-Nanosecond Signal Capture and Timing Measurements," IEEE Inter. Symp. on Circuits and Systems (ISCAS'01), vol. 4, pp. 174-177, 2001.
- [35] N. Abaskharoun and G. W. Roberts, "Circuits for On-Chip Sub-Nanosecond Signal Capture and Characterization," IEEE Custom Integrated Circuits Conference (CICC), pp. 251-254, 2001.
- [36] A. H. Chan and G. W. Roberts, "A Deep Sub-Micron Timing Measurement Circuit Using a Single-Stage Vernier Delay Line," IEEE Custom Integrated Circuits Conference (CICC), pp. 77-80, 2002.
- [37] A. H. Chan and G. W. Roberts, "A Synthesizable, Fast and High Resolution Timing Measurement Device Using a Component-Invariant Vernier Delay Line," International Test Conference (ITC), pp. 858-867, 2001.
- [38] P. Chen, S.-I. Liu, and J. Wu, "A Low Power High Accuracy CMOS Time-to-Digital Converter," IEEE Inter. Symp. on Circuits and Systems, pp. 281-284, June 9-12, 1997.
- [39] P. Chen, S.-I. Liu, and J. Wu, "A CMOS Pulse-Shrinking Delay Element for Time Interval Measurement," *IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing*, vol. 47, no. 9, pp. 954-958, Sept 2000.

- [40] S. Cherubal and A. Chatterjee, "A High-Resolution Jitter Measurement Technique Using ADC Sampling," International Test Conference (ITC), pp. 838-847, 2001.
- [41] P. Dudek, S. Szczepanski, and J. V. Hatfield, "A High-Resolution CMOS Time-to-Digital Converter Utilizing a Vernier Delay Line," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 2, pp. 240-247, February 2000.
- [42] M. S. Gorbics, J. Kelly, K. M. Roberts, and R. L. Sumner, "A High Resolution Multihit Time to Digital Converter Integrated Circuit," *IEEE Trans. on Nuclear Science*, vol. 44, no. 3, pp. 379-384, June 1997.
- [43] M.-J. Hsiao, J.-R. Huang, S.-S. Yang, and T.-Y. Chang, "A Low-Cost CMOS Time Interval Measurement Core," IEEE Inter. Symp. on Circuit and Systems (ISCAS'01), vol. 4, pp. 190-193, 2001.
- [44] J.-L. Huang and K.-T. Cheng, "An On-Chip Short-Time Interval Measurement Technique for Testing High-Speed Communication Links," 19th IEEE Proceedings on VLSI Test Symposium (VTS), pp. 380-385, May 2001.
- [45] C.-S. Hwang, P. Chen, and H.-W. Tsao, "A High-Resolution and Fast-Conversion Time-to-Digital Converter," IEEE Inter. Symp. on Circuits and Systems (ISCAS '03), vol. 1, pp. I-37 - I-40, 2003.
- [46] H. C. Lin, K. Taylor, A. Chong, E. Chan, M. Soma, H. Haggag, J. Huard, and J. Braatz, "CMOS Built-In Test Architecture for High-Speed Jitter Measurement," International Test Conference (ITC), pp. 67-76, 2003.
- [47] K. Maatta and J. Kostamovaara, "A High-Precision Time-to-Digital Converter for Pulsed Time-of-Flight Laser Radar Applications," *IEEE Trans. on Instrumentation and Measurement*, vol. 47, no. 2, pp. 521-536, April 1998.
- [48] G. C. Moyer, M. Clements, W. Liu, T. Schaffer, and R. K. Cavin, "The Delay Vernier Pattern Generation Technique," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 4, pp. 551-562, April 1997.
- [49] T.-I. Otsuji, "A Picosecond-Accuracy, 700-MHz Range, Si-Bipolar Time Interval Counter LSI," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 9, pp. 941-947, Sept 1993.
- [50] R. Pelka, J. Kalisz, and R. Szplet, "Nonlinearity Correction of the Integrated Time-to-Digital Converter with Direct Coding," *IEEE Trans. on Instrumentation and Measurement*, vol. 46, no. 2, pp. 449-453, April 1997.
- [51] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "Time Interval Measurements using Time-to-Voltage Conversion with Built-In Dual-Slope A/D Conversion," IEEE Inter. Symp. on Circuits and Systems (ISCAS '91), vol. 5, pp. 2573-2576, 1991.
- [52] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "A BiCMOS Time-to-Digital Converter with 30 ps Resolution," IEEE Inter. Symp. on Circuit and Systems (ISCAS'99), vol. 1, pp. 278-281, 1999.

- [53] J. M. Rochelle and M. L. Simpson, "Current-Mode Time-to-Amplitude Converter for Precision Sub-Nanosecond Measurement," IEEE Nuclear Science Symposium and Medical Imaging Conference, vol. 1, pp. 468-470, 1992.
- [54] M. L. Simpson, C. L. Britton, A. L. Winterberg, and G. R. Young, "An Integrated CMOS Time Interval Measurement System with Subnanosecond Resolution for the WA-98 Calorimeter," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 2, pp. 198-205, Feb 1997.
- [55] A. E. Stevens, R. P. V. Berg, J. V. d. Spiegel, and H. H. Williams, "A Sub-Nanosecond Time-to-Voltage Converter and Analog Memory," IEEE Inter. Symp. on Circuits and Systems (ISCAS), pp. 268-271, 1989.
- [56] S. Tabatabaei and A. Ivanov, "An Embedded Core for Sub-Picosecond Timing Measurements," International Test Conference (ITC), pp. 129-137, 2002.
- [57] T. Xia and J.-C. Lo, "On-Chip Short-Time Interval Measurement for High-Speed Signal Timing Characteristics," Proceedings of the 12th Asian Test Symposium (ATS'03), pp. 326-331, 2003.
- [58] Y. Yamaguchi, N. Koyanagi, and K. Katano, "A High Resolution Time Measurement System," 8th IEEE Instrumentation and Measurement Technology Conference (IMTC'91), pp. 618-621, 1991.
- [59] J. Kalisz, R. Szplet, R. Pelka, and A. Poniecki, "Single-Chip Interpolation Time Counter with 200-ps Resolution and 43-s Range," *IEEE Trans. on Instrumentation and Measurement*, vol. 46, no. 4, pp. 851-856, Aug 1997.
- [60] T. Xia and J.-C. Lo, "Time-to-Voltage Converter for On-Chip Jitter Measurement," *IEEE Trans. on Instrumentation and Measurement*, vol. 52, no. 6, pp. 1738-1748, December 2003.
- [61] T. Xia and J.-C. Lo, "On-Chip Jitter Measurement for Phase Locked Loops," Proceedings of the 17th IEEE Inter. Symp. on Defect and Fault Tolerance in VLSI Systems (DFT'02), pp. 339-407, 2002.
- [62] A. E. Stevens, R. P. V. Berg, J. V. d. Spiegel, and H. H. Williams, "A time-to-voltage converter and analog memory for colliding beam detectors," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 6, pp. 1748-1752, 1989.
- [63] E. Raisanen-Ruotsalainen, T. Rahkonen, and J. Kostamovaara, "A time digitizer with interpolation based on time-to-voltage conversion," 40th IEEE Midwest Symp. on Circuits and Systems, pp. 197-200, 1997.
- [64] R. L. Sumner, "Apparatus and method for measuring time intervals with very high resolution," US Patent 6137749, 2000.
- [65] A. H. Chan and G. W. Roberts, "A jitter characterization system using a component-invariant Vernier delay line," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 12, no. 1, pp. 79-95, 2004.
- [66] C. T. Gray, W. Liu, W. A. M. V. Noije, J. T. A. Hughes, and R. K. Cavin, "A sampling technique and its CMOS implementation with 1 Gb/s bandwidth and 25 ps resolution," *IEEE Journal of Solid-State Circuits*, vol. 29, no. 3, pp. 340-349, 1994.
- [67] K. Jin-ku, "A CMOS High-Speed Data Recovery Circuit using the Matched Delay Sampling Technique," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 10, pp. 1588-1596, 1997.
- [68] V. Stojanovic and V. G. Oklobdzija, "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 4, pp. 536-548, 1999.
- [69] U. Ko and P. T. Balsara, "High-Performance Energy-Efficient D-Flip-Flop Circuit," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 8, no. 1, pp. 94-98, 2000.
- [70] M. A. Abas, G. Russell, and D. J. Kinniment, "Design of sub-10-picoseconds on-chip time measurement circuit," *Proceedings of the Design Automation and Test in Europe Conference and Exhibition (DATE'04)*, pp. 804-809, 2004.
- [71] T. Xia and J.-C. Lo, "Time-to-Voltage Converter for On-Chip Jitter Measurement," *IEEE Trans. on Instrumentation and Measurement*, vol. 52, no. 6, pp. 1738-1748, 2003.
- [72] T. Xia, H. Zheng, J. Li, and A. Ginawi, "Self-refereed on-chip jitter measurement circuit using Vernier oscillators," IEEE Computer Society Annual Symp. on VLSI, pp. 218-223, 2005.
- [73] G. D. Micheli, *Synthesis and Optimization of Digital Circuits*: McGraw-Hill, Inc, 1994, ISBN 0-07-016333-2.
- [74] "Cadence Design Systems," www.cadence.com
- [75] B. Razavi and B. A. Wooley, "Design Techniques for High-Speed, High-Resolution Comparators," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 12, pp. 1916-1926, 1992.
- [76] P. E. Allen and D. R. Holberg, CMOS Analog Circuit Design: Oxford University Press, 1987, ISBN 0-19-510720-9.
- [77] S. Limotyrakis, S. D. Kulchycki, D. K. Su, and B. A. Wooley, "A 150-MS/s 8-b 71-mW CMOS Time-Interleaved ADC," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 5, pp. 1057-1067, 2005.
- [78] B. K. Swann, B. J. Blalock, L. G. Clonts, D. M. Binkley, J. M. Rochelle, E. Breeding, and K. M. Baldwin, "A 100-ps Time-Resolution CMOS Time-to-Digital Converter for Positron Emission Tomography Imaging Applications," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 11, pp. 1839-1852, 2004.
- [79] J. Lee, P. Roux, U.-V. Koc, T. Link, Y. Baeyens, and Y.-K. Chan, "A 5-b 10-Gsample/s A/D Converter for 10-Gb/s Optical Receivers," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 10, pp. 1671-1679, 2004.

- [80] K.-L. J. Wong, H. Hatamkhani, M. Mansuri, and C.-K. K. Yang, "A 27-mW 3.6-Gb/s I/O Transceiver," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 4, pp. 602-612, 2004.
- [81] B. J. McCarroll, C. G. Sodini, and H.-S. Lee, "A High-Speed CMOS Comparator for Use in an ADC," *IEEE Journal of Solid-State Circuits*, vol. 23, no. 1, pp. 159-165, 1988.
- [82] J.-T. Wu and B. A. Wooley, "A 100-MHz Pipelined CMOS Comparator," *IEEE Journal of Solid-State Circuits*, vol. 23, no. 6, pp. 1379-1385, 1988.
- [83] P. M. Figueiredo and J. C. Vital, "Low Kickback Noise Techniques for CMOS Latched Comparators," IEEE Inter. Symp. on Circuits and Systems (ISCAS '04), pp. I-537-I-540, 2004.
- [84] J. E. Franca, "Analogue-Digital Window Comparator with Highly Flexible Programmability," *IEE Electronics Letters*, vol. 27, no. 22, pp. 2063-2064, 27th Aug 1991.
- [85] T. Shih, L. Der, S. H. Lewis, and P. J. Hurst, "A Fully Differential Comparator Using a Switched-Capacitor Differencing Circuit with Common-Mode Rejection," *IEEE Journal of Solid-State Circuits*, vol. 32, no. 2, pp. 250-253, 1997.
- [86] A. Yukawa, "A CMOS 8-Bit High-Speed A/D Converter IC," *IEEE Journal of Solid-State Circuits*, vol. SC-20, no. 3, pp. 775-779, 1985.
- [87] S.-R. Lee, M.-J. Hsiao, and T.-Y. Chang, "An Access Timing Measurement Unit of Embedded Memory," Proceedings of the 11th Asian Test Symposium (ATS'02), pp. 104-109, 2002.
- [88] M.-j. Hsiao, J.-R. Huang, and T.-Y. Chang, "A Built-In Parametric Timing Measurement Unit," *IEEE Design and Test of Computers*, vol. 21, no. 4, pp. 322-330, 2004.
- [89] W. Rhee, "Design of High-Performance CMOS Charge Pumps in Phase-Locked Loops," *IEEE Inter. Symp. on Circuits and Systems* (ISCAS'99), vol. 2, pp. 545-548, 1999.
- [90] "Circuit Multi-Projets (CMP)," http://cmp.imag.fr/index.html, 2003
- [91] A. Hastings, *The Art of Analog Layout*: Prentice Hall, 2001, ISBN 0-13-087061-7.
- [92] D. Johns and K. Martin, *Analog Integrated Circuit Design*: John Wiley and Sons, Inc, 1997, ISBN 0-471-14448-7.
- [93] A. Hajimiri and T. H. Lee, "A General Theory of Phase Noise in Electrical Oscillators," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 2, pp. 179-194, 1998.
- [94] "Microchip," www.microchip.com

- [95] "Maxim 3002 +1.2V to +5.5V 8-Channel Level Translators," http://www.maxim-ic.com/quick\_view2.cfm/qv\_pk/3672
- [96] "Maxim DS1020 Programmable 8-bit Silicon Delay Line," http://www.maxim-ic.com/quick\_view2.cfm/qv\_pk/2606
- [97] S. Becu, S. Cremer, and J. L. Autran, "Capacitance non-linearity study in Al<sub>2</sub>O<sub>3</sub> MIM capacitors using an ionic polarization model," *Microelectronic Engineering*, vol. 83, no. 11-12, pp. 2422-2426, 2006.
- [98] M. Oulmane and G. W. Roberts, "A CMOS time amplifier for femto-second resolution timing measurement," IEEE Inter. Symp. on Circuits and Systems, pp. 509-512, 2004.
- [99] A. M. Abas, A. Bystrov, D. J. Kinniment, O. V. Maevsky, G. Russell, and A. V. Yakovlev, "Time difference amplifier," *IEE Electronics Letters*, vol. 38, no. 23, pp. 1437-1438, 2002.
- [100] D. J. Kinniment, A. Bystrov, and A. V. Yakovlev, "Synchronization circuit performance," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 2, pp. 202-209, 2002.
- [101] B. Razavi, *RF Microelectronics*: Prentice Hall, 1998, ISBN 0-13-88751-5.
- [102] H.-J. Song and C.-K. Kim, "An MOS four-quadrant analog multiplier using simple two-input squaring circuits with source followers," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 3, pp. 841-848, 1990.
- [103] P. J. Sullivan, B. A. Xavier, and W. H. Ku, "Doubly balanced dual gate CMOS mixer," *IEEE Journal of Solid-State Circuits*, vol. 34, no. 6, pp. 878-881, 1999.
- [104] S.-C. Qin and R. L. Geiger, "A +/- 5-V CMOS Analog Multiplier," *IEEE Journal of Solid-State Circuits*, vol. SC-22, no. 6, pp. 1143-1146, 1987.
- [105] H. Wang, "A 1-V Multigigahertz RF Mixer Core in 0.5-μm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 33, no. 12, pp. 2265-2267, 1998.
- [106] M. Madihian, H. Fujii, H. Yoshida, H. Suzuki, and T. Yamazaki, "A 1-10 GHz 0.18um-CMOS Chipset for Multi-Mode Wireless Applications," IEEE MTT-S International Microwave Symposium Digest, pp. 1865-1868, 2001.
- [107] H. Kleinman, "Application of dual-gate MOS field-effect transistors in practical radio receivers," *IEEE Trans. Broadcast Television Receivers*, vol. BTR-13, pp. 72-81, 1967.
- [108] S. Weaver, "TV design considerations using high-gain dual-gate MOSFET's," *IEEE Trans. Broadcast Television Receivers*, vol. BTR-19, pp. 87-98, 1973.
- [109] C. Tsironis, R. Meierer, and R. Stahlmann, "Dual-gate MESFET mixers," *IEEE Trans. Microwave Theory Tech.*, vol. MTT-32, pp. 248-255, 1984.

- [110] C. F. Au-Yeung and K. K. M. Cheng, "CMOS Mixer Linearization by Low-Frequency Signal Injection Method," IEEE MTT-S International Microwave Symposium Digest, pp. 95-98, 2003.
- [111] V. Nair, S. Tehrani, R. L. Vatkus, and D. G. Scheitlin, "Low power HFET down converter MMIC's for wireless communication applications," *IEEE Trans. Microwave Theory Tech.*, vol. 43, pp. 3043-3047, 1995.
- [112] R. Gregorian, K. W. Martin, and G. C. Temes, "Switched-Capacitor Circuit Design," *Proceedings of the IEEE*, vol. 71, no. 8, pp. 941-966, 1983.
- [113] P. V. A. Mohan, V. Ramachandran, and M. H. S. Swamy, *Switched Capacitor Filters Theory, Analysis and Design*: Prentice Hall, 1995, ISBN: 0-13-879818-4.
- [114] J. W. Bruce-II, "Meeting the analog world challenge. Nyquist-rate analog-to-digital converter architectures," *IEEE Potentials*, vol. 17, no. 5, pp. 36-39, Dec 1998 - Jan 1999.
- [115] R. Schreier and G. C. Temes, *Understanding Delta-Sigma Data Converters*: John Wiley & Sons, Inc, 2004, ISBN 0-471-46585-2.
- [116] E. C. Ifeachor and B. W. Jervis, *Digital Signal Processing: A Practical Approach*: Addison-Wesley, 1993, ISBN 0-201-54413-X.
- [117] J. G. Proakis and D. G. Manolakis, *Digital Signal Processing: Principles, Algorithms and Applications: Third Edition*: Prentice Hall, 1992, ISBN 0-13-394289-9.
- [118] "IEEE Standard Testability Method for Embedded Core-based Integrated Circuits," *IEEE std. 1500-2005*, 2005.
- [119] F. DaSilva, Y. Zorian, L. Whetsel, K. Arabi, and R. Kapur, "Overview of the IEEE P1500 Standard," *International Test Conference (ITC'03)*, pp. 988-997, 2003.
- [120] E. J. Marinissen, R. Kapur, M. Lousberg, T. McLaurin, M. Ricchetti, and Y. Zorian, "On IEEE P1500's Standard for Embedded Core Test," *Journal of Electronic Testing: Theory and Applications*, vol. 18, pp. 365-383, 2002.