Low-Cost On-Chip Clock Jitter Measurement Scheme

Martin Omaña, Daniele Rossi, Daniele Giaffreda, Cecilia Metra, Fellow, IEEE, T. M. Mak, Asifur Rahman, and Simon Tam, Senior Member, IEEE

Abstract—In this paper, we present a low-cost, on-chip clock jitter digital measurement scheme for high performance microprocessors. It enables in situ jitter measurement during the test or debug phase. It provides very high measurement resolution and accuracy, despite the possible presence of power supply noise (representing a major source of clock jitter), at low area and power costs. The achieved resolution is scalable with technology node and can in principle be increased as much as desired, at low additional costs in terms of area overhead and power consumption. We show that, for the case of high performance microprocessors employing ring oscillators (ROs) to measure process parameter variations (PPVs), our jitter measurement scheme can be implemented by reusing part of such ROs, thus allowing to measure clock jitter with a very limited cost increase compared with PPV measurement only, and with no impact on parameter variation measurement resolution.

Index Terms—Clock jitter, high performance microprocessor, jitter measurement.

I. INTRODUCTION

CLOCK is one of the most critical signals in any synchronous system, and it has to be distributed throughout the chip using a complex network [1]. With the scaling of technology and the increase in clock frequency, it is becoming increasingly difficult to guarantee the correctness of clock signals, due to the increasing likelihood of manufacturing defects, clock jitter, duty-cycle distortion, process parameter variations (PPVs) and power supply noise (PSN) [2]–[4].

Jitter affecting the clock signal produces uncertainties in its period and rising/falling edges, thus forcing designers either to increase the time margins, or to face the possibility of operating malfunctions. For high performance microprocessors, the adoption of minimum time margins is desirable, so on-chip jitter measurement should be performed during the test or debug phase to validate the design and manufacturing assumptions for the clock. PSN modulating the delay of the clock signal is currently recognized as one of the main causes of clock jitter [4]. It is expected to increase with technology scaling, due to the increasing complexity and integration density, resulting in high switching activities [4]–[6].

Together with clock jitter, PPV occurring during fabrication are also increasingly likely and significant with technology scaling. They may induce either performance degradation or operating malfunctions [7]. Therefore, PPV also mandate on-chip measurement during the test and debug phase to validate design and process, possibly drive speed-binning, and eventually dictate design process improvements.

Moreover, if PPV affect the buffers of the clock distribution network, they can cause clock skew [8]–[10]. Deskew buffers can be employed to compensate for PPV produced effect [8], but their application is typically still limited to some portions of the whole clock distribution network only (e.g., global distribution), due to cost limitations.

Several measurement schemes have been proposed for clock jitter [4]–[7], [11]–[15] and PPV [7], [16], [17]. The use of ring oscillators (ROs) for PPV measurement is widely assessed and adopted. Instead, schemes for clock jitter measurement are not as well established yet, mainly because of limits in their measurement resolution and accuracy [15].

In [14] and [15], jitter measurement schemes based on Vernier delay lines (VDL) have been proposed. They employ an additional delay-locked loop to calibrate the delay of the elements within the VDL against process, temperature, and voltage variations. Although these techniques provide a high measurement resolution, they imply a considerable area overhead.

In [13], a circuit based on a NOT chain delay line has been proposed. It features resolution equal to a NOT delay, and requires a considerable area overhead.

Finally, a circuit consisting of latches and NOT chains has been presented in [4]. It features a measurement resolution equal to an inverter delay, which can be calibrated to compensate the effects of PSN and PPV on the provided measurement. However, the required area overhead and power consumption are not negligible.

Based on the limitations of the approaches proposed so far to achieve high jitter measurement resolution and accuracy at limited costs, in this paper, we present a new on-chip digital measurement scheme, whose basic structure has been introduced in [18]. It measures clock jitter with high and scalable resolution at limited costs, and with high accuracy despite the presence of PSN.

The proposed approach is based on a scheme similar to [4], with the following main differences: 1) the implementation of the sampling elements [transfer gates (TGs) rather than latches]; 2) the usage of multiple out-of-phase delay lines in our scheme to increase resolution; and 3) the proposal of a sampling strategy to avoid the impact of PSN on jitter measurement.

Compared with [4], our scheme allows a 40% reduction in both area overhead and power consumption. Instead, compared with the approaches in [14] and [15], our scheme requires a considerably lower area overhead, while featuring the same measurement resolution.

Then, as introduced in [19], we show that, for high performance microprocessors, the area required by our measurement scheme can be further reduced by reusing and properly modifying part of the ROs often employed for PPV measurement. Our scheme can be set in either the PPV measurement mode, or the clock jitter measurement mode, by acting on an external control signal. The effectiveness of our approach has been verified using electrical level simulations, performed considering a PSN up to 50% of the nominal power supply voltage.

The rest of this paper is organized as follows. In Section II, we present some basics on clock jitter. In Section III, we introduce our proposed jitter measurement scheme, while in Section IV, we report some of the results of the electrical level simulations that we have performed to verify its correct behavior. We show two possible implementations of our jitter measurement scheme, one of which reuses the ROs that are usually adopted in high performance microprocessors for PPV measurement. In Section V, we evaluate the costs of our scheme and we compare them with those of alternative approaches recently proposed. Finally, we give some conclusive remarks in Section VI.

II. JITTER AFFECTING CLOCK SIGNALS

Jitter is the deviation of a signal timing event from its ideal position [20], causing displacements of clock transition times. These displacements are categorized as either deterministic, random, or both. We refer to the following jitter definitions [21]: 1) timing jitter, which is the time difference between the actual and ideal signal transition; 2) period jitter, which is the time variation of the signal period from its average value; and 3) cycle-to-cycle jitter, which is the time variation of the period of a signal between two consecutive periods. It has been shown that these jitter definitions are mathematically related to each other [21]; therefore, in the remainder of this paper, we will consider the period jitter only.
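The three definitions above can be illustrated with a minimal sketch that derives them from a list of rising-edge instants. The function name `jitter_metrics`, the choice of the first edge as the ideal reference, and the example edge times are assumptions made for illustration only; they are not taken from [21].

```python
from statistics import mean


def jitter_metrics(edges, ideal_period):
    """Return (timing, period, cycle_to_cycle) jitter lists for rising-edge times.

    edges: measured rising-edge instants (any consistent unit, e.g. ps).
    ideal_period: nominal clock period in the same unit (assumed known).
    """
    # Timing jitter: actual edge minus ideal edge, with edge 0 taken as reference.
    timing = [t - (edges[0] + k * ideal_period) for k, t in enumerate(edges)]
    # Measured periods between consecutive rising edges.
    periods = [b - a for a, b in zip(edges, edges[1:])]
    # Period jitter: deviation of each period from the average period.
    avg = mean(periods)
    period = [p - avg for p in periods]
    # Cycle-to-cycle jitter: difference between two consecutive periods.
    c2c = [b - a for a, b in zip(periods, periods[1:])]
    return timing, period, c2c


# Hypothetical edge times in ps: the third edge arrives 7 ps late.
timing, period, c2c = jitter_metrics([0, 336, 679, 1008], ideal_period=336)
```

In this hypothetical trace a single late edge shows up once in the timing jitter, twice (with opposite signs) in the period jitter, and with doubled amplitude in the cycle-to-cycle jitter, which is one way to see that the three definitions are related.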

Let us first consider the jitter-free clock signal (denoted by CK) with 50% duty-cycle. It can be described by

\[
CK(t) = \begin{cases} 
1, & 0 \leq t < \frac{T_{CK}}{2} \\
0, & \frac{T_{CK}}{2} \leq t < T_{CK}.
\end{cases} \tag{1}
\]
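A minimal sketch of this jitter-free clock as a function is given below. The periodic extension through the modulo operation is an assumption added here (the expression above only covers one period), and the name `ck` is illustrative.

```python
def ck(t, t_ck):
    """Jitter-free clock with 50% duty cycle, extended periodically (assumption)."""
    phase = t % t_ck  # Python's % yields a value in [0, t_ck) for positive t_ck
    return 1 if phase < t_ck / 2 else 0
```

With a hypothetical period of 336 ps, `ck(0, 336)` is in the high phase and `ck(168, 336)` is the first instant of the low phase.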

III. PROPOSED JITTER MEASUREMENT SCHEME

We measure the duration of clock high and/or low phase(s) over time, and compare the obtained results with those expected for the case of jitter-free clock. For the sake of brevity, we here present the scheme for the clock high phase measurement only, which can be easily extended to measure both the clock phases.

A. Scheme With Resolution Equal to a NOT Delay

The basic block structure of our proposed scheme is shown in Fig. 1(a). The NOT chain implements a delay line delaying the input CK, whose jitter has to be measured, by a given amount of time. The outputs of the NOT gates are sampled by the measurement sample (MS) block, when the control block (CB) gives valid measure (VM) = 1. The output stage (OS) produces the measurement encoded by a thermometer code. By making \(R_s = 1\), CB resets the measurement after a time long enough to allow the system to read it.

Denoting the delay of each NOT in the chain by \(\tau\), the total chain delay is \(N\tau\). The integer \(N\) is such that the total delay covers the whole period \(T_{CK}\) of the CK under jitter-free conditions. Therefore, it is: \(N = \lceil T_{CK}/\tau \rceil\). Considering the clock signal CK(t) in (1), and denoting its complemented signal by \(CK'(t)\), the signals \(p_i\ (i = 1 \ldots N)\) can be represented as

\[
p_i(t) = \begin{cases} 
CK(t-(i+1)\tau), & i \text{ odd} \\
CK'(t-(i+1)\tau), & i \text{ even}.
\end{cases} \tag{2}
\]

The logic values of signals \( p_i \) over time are shown in Fig. 1(b). Each row represents the snapshot at one specific time instant. The CK switches at time \( t_i \) and its falling (rising) edge propagates through the chain. The position within the chain of the CK falling (rising) edge is identified by two successive zeroes, or two successive ones, whose location moves progressively to the right. The duration of the CK high phase is given by the number of NOTs within the chain that the CK rising edge has to pass through, before the CK falling edge arrives at the chain input.
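The snapshot behavior described above can be reproduced with a small behavioral sketch. It assumes ideal NOTs with identical delay, a periodically extended ideal clock, and taps that alternate between the true and the complemented clock (odd taps true, even taps complemented); the function names and the numeric values are hypothetical.

```python
def ck(t, t_ck):
    """Ideal 50% duty-cycle clock, extended periodically (assumption)."""
    return 1 if (t % t_ck) < t_ck / 2 else 0


def tap(i, t, t_ck, tau):
    """Value on tap p_i at time t: clock delayed by (i + 1) * tau, inverted on even taps."""
    v = ck(t - (i + 1) * tau, t_ck)
    return v if i % 2 == 1 else 1 - v


def snapshot(t, t_ck, tau, n_taps):
    """One row of the chain state: values of p_1 .. p_n at time t."""
    return [tap(i, t, t_ck, tau) for i in range(1, n_taps + 1)]


def edge_positions(row):
    """1-based tap indices k where p_k equals p_(k+1), i.e. where a clock edge sits."""
    return [k + 1 for k in range(len(row) - 1) if row[k] == row[k + 1]]


# Hypothetical numbers: 336 ps period, 12 ps NOT delay, snapshot 60 ps after a rising edge.
row = snapshot(60, 336, 12, 8)
```

In this example the alternating pattern is broken by two successive zeroes on taps 4 and 5, which marks how far the rising edge has travelled along the chain.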

To account for the effects of PSN, we have considered the realistic model in [22] and [23]. It also includes the presence of decoupling capacitors, usually employed within the power distribution network (PDN) to reduce the current return paths, thus reducing PSN.

It is worth noting that the PDN characteristic impedance seen from different locations inside the PDN topology may exhibit significant differences [24]. Depending also on the operating frequency, the package inductance or the decoupling capacitance might prevail. In particular, when decoupling capacitors are employed (either at on-chip level, or at both on-chip and on-board level), the supply voltage waveform presents a first triangular peak that is considerably higher than the secondary peaks [25]. Therefore, for evaluation purposes, we have realistically modeled the PSN as a train of narrow triangular pulses [22], whose width depends on the number of gates switching within a single clock period. Moreover, since the pulsewidth is always very small, the PSN can be modeled as an impulse train with a uniformly distributed random shift in \([0, t_r]\), where \( t_r \) is the rise-time of the clock signal [22]. In our scheme, to reduce the effects of PSN on jitter measurement, MS samples the values present on signals \( p_i \ (i = 1 \ldots N) \) when the CK falling edge arrives at the input of the second NOT of the chain \( (p_1) \), rather than at the input of the first NOT. This allows the PSN to vanish before sampling. The sampling instant is identified by the condition \( p_s = p_1 = 1 \).
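The impulse-train PSN model mentioned above can be sketched as follows. This is only an illustration under stated assumptions: one impulse per clock period, each anchored at the nominal rising edge and shifted by a random amount drawn uniformly from \([0, t_r]\). The function name, the seed handling, and the numeric values are hypothetical and are not taken from [22].

```python
import random


def psn_impulse_times(n_cycles, t_ck, t_r, seed=0):
    """Instants of the supply-noise impulses, one per clock cycle (assumption).

    Each impulse is placed at the nominal rising edge k * t_ck plus a shift
    drawn uniformly from [0, t_r], where t_r is the clock rise time.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    return [k * t_ck + rng.uniform(0.0, t_r) for k in range(n_cycles)]


# Hypothetical values in ps: 3-GHz-like period and a 20 ps rise time.
times = psn_impulse_times(n_cycles=5, t_ck=333.3, t_r=20.0)
```

Because every impulse falls within one rise time of a clock edge, delaying the sampling instant by one NOT delay, as the scheme does, gives the disturbance time to decay before the taps are captured.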

PSN may also influence the delay of some NOT gates while the clock edge travels through the delay line. Using electrical level simulations, we have verified that only the NOT propagating the CK edge when the PSN occurs is impacted, while the sampling circuitry is not. The variation in the delay of the NOT propagating the CK edge determines the impact of PSN on measurement accuracy. We have determined that such a NOT delay variation impacts the jitter measurement accuracy of our scheme by only 6.1%. Therefore, we have considered a constant delay \( \tau \) of the NOTs in the mathematical model of our scheme behavior.

The useful bits representing the jitter measurement start from the output of the second NOT of the chain, denoted by \( p_1 \). The output \( p_s \), together with its associated signal \( oR_i \), is used by the control block CB to determine the sampling instant. Our scheme samples the outputs of the NOT chain at a time instant denoted by \( t_s \), after the CK rising edge. It is

\[
t_s = D_{CK-H} + \tau = T_{CK}/2 \pm J + \tau. \tag{3}
\]

At time \( t_s \), CB asserts VM, and the values on signals \( p_s \) and \( p_i \ (i = 1 \ldots N) \) are sampled by MS and provided as outputs on \( \text{out}_s \) and \( \text{out}_i \ (i = 1 \ldots N) \), respectively. We determine the logic values sampled on each \( \text{out}_i \) by making \( t = t_s \) in (2).

We obtain

\[
\text{out}_i = p_i(t_s) = \begin{cases} CK \left( \frac{T_{CK}}{2} \pm J - i \tau \right), & \text{if } i \text{ odd} \\ CK' \left( \frac{T_{CK}}{2} \pm J - i \tau \right), & \text{if } i \text{ even}. \end{cases} \tag{4}
\]

After sampling, the OS block gives at its outputs

\[
o_{Ri} = CK' \left( \frac{T_{CK}}{2} \pm J - i \tau \right), \quad (i = 1 \ldots N). \tag{5}
\]

The word on \( o_{Ri} \ (i = 1 \ldots N) \) is encoded by the thermometer code, as shown in Fig. 2. It consists of a number of 0s equal to \( i_0 \), followed by \( (N - i_0) \) 1s, so that

\[
o_{Ri} = \begin{cases} 0, & \text{if } 1 \leq i \leq i_0 \\ 1, & \text{if } i_0 < i \leq N. \end{cases} \tag{6}
\]

According to (5), \( o_{Ri} = 0 \ (\forall i \leq i_0) \) if the argument of \( CK' \) is greater than or equal to 0. Thus, we can simply obtain the value of \( i_0 \) by equating the argument of \( CK' \) in (5) to 0, that is: \( T_{CK}/2 \pm J - i \tau = 0 \), for \( i = i_0 \). Therefore, it is

\[
i_0 = \frac{1}{\tau} \left( \frac{T_{CK}}{2} \pm J \right). \tag{7}
\]
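The link between the high-phase duration and the thermometer word can be sketched in a few lines. The sketch applies the condition stated above (an output is 0 whenever the argument of \( CK' \) is non-negative) and assumes that the number of zeros actually observed is the integer part of (7). Integer picosecond values and the function names are illustrative assumptions.

```python
def thermometer_word(high_phase, tau, n_taps):
    """Word on o_R1 .. o_RN for a clock high phase of duration high_phase.

    high_phase stands for T_CK/2 +/- J; tap i outputs 0 while
    high_phase - i * tau >= 0, and 1 otherwise.
    """
    return [0 if high_phase - i * tau >= 0 else 1 for i in range(1, n_taps + 1)]


def i0(high_phase, tau):
    """Integer part of (7): index of the last output equal to 0 (assumption)."""
    return high_phase // tau


# Hypothetical values in ps: 168 ps nominal high phase, 12 ps NOT delay, 28 taps.
nominal = thermometer_word(168, 12, 28)
widened = thermometer_word(168 + 13, 12, 28)
```

With these numbers the jitter-free word holds 14 zeros, and a 13 ps widening of the high phase adds exactly one more zero, since 13 ps exceeds one NOT delay but not two.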

The resolution (Res) of our scheme is given by the minimum variation in the CK high phase duration resulting in one more 0 (1) at the outputs \( o_{Ri} \). The Res value can be determined as the difference between the arguments of \( o_{Ri} \) and \( o_{R(i+1)} \), when it is \( o_{Ri} = 0 \) and \( o_{R(i+1)} = 1 \). Therefore

\[
\text{Res} = \left( \frac{T_{CK}}{2} \pm J - i \tau \right) - \left( \frac{T_{CK}}{2} \pm J - (i + 1) \tau \right) = \tau. \tag{8}
\]

The thermometer encoding produced by our scheme makes it easy to derive the clock jitter measurement. The encoded word \( o_{Ri} \ (i = 1 \ldots N) \) can be compared with that expected in the case of jitter-free CK through \( N \) parallel XORs. The comparison results in an N-bit vector with a number of 1s equal to the difference between the number of 0s in the produced encoded word and in the expected one. The number of 1s can be counted, and the jitter measurement can be obtained by multiplying it by the scheme resolution. After a time long enough to allow the system to read the performed measurement, the scheme can be reset by asserting an \( R_s \) pulse, thus making it ready for the following measurement. We assume signal \( R_s \) is activated every other CK cycle. We use a periodic signal, generated reset (GR), with half the CK frequency to generate the \( R_s \) pulse upon its rising edge. The timing of these signals is shown in Fig. 3.
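The XOR-based comparison described above can be sketched as follows. The helper names and the example words are hypothetical; the sketch returns only the magnitude of the jitter, since the text does not specify how the sign (widening versus narrowing) is reported.

```python
def thermometer(zeros, n_bits):
    """Thermometer word with the given number of leading 0s."""
    return [0] * zeros + [1] * (n_bits - zeros)


def measure_jitter(word, expected, resolution):
    """Jitter magnitude: count the 1s of the bitwise XOR and scale by the resolution."""
    diff = [a ^ b for a, b in zip(word, expected)]  # N parallel XORs
    return sum(diff) * resolution


# Hypothetical case: 28-bit words, 12 ps resolution, one extra zero in the measured word.
expected = thermometer(14, 28)
measured = thermometer(15, 28)
jitter_ps = measure_jitter(measured, expected, resolution=12)
```

Because both words are thermometer coded, the XOR vector has 1s only in the positions where the two words disagree, so counting them is equivalent to subtracting the two zero counts.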

Such signals feed the block OS, which performs the same function as in (5). This way, the jitter measurement on $o_{Rm}$ ($m = 1 \ldots 2N$) is encoded by a thermometer code. It is

$$o_{Rm} = CK'\left(\frac{T_{CK}}{2} \pm J - \frac{m\tau}{2}\right), \quad o_{Rm} = \begin{cases} 0, & 1 \leq m \leq m_0 \\ 1, & m_0 < m \leq 2N \end{cases} \tag{11}$$

where $m_0$ is the order of the last $o_{Rm} = 0$. According to (11) and Fig. 1(b), it is $o_{Rm} = 0$, if the argument of $CK'$ is greater than or equal to 0. The value of $m_0$ can be obtained by equating the argument of $CK'$ in (11) to 0. It is

$$m_0 = \frac{2}{\tau} \left(\frac{T_{CK}}{2} \pm J\right). \tag{12}$$

The Res of our scheme in Fig. 4(a) can be expressed as the difference between the arguments of $o_{Rm}$ and $o_{R(m+1)}$, when it is $o_{Rm} = 0$ and $o_{R(m+1)} = 1$. From (11), it derives that

$$\text{Res} = \left(\frac{T_{CK}}{2} \pm J - \frac{m\tau}{2}\right) - \left(\frac{T_{CK}}{2} \pm J - \frac{(m+1)\tau}{2}\right) = \frac{\tau}{2}. \tag{13}$$

Therefore, the resolution of our measurement scheme can be scaled by properly adding a NOT chain to the scheme in Fig. 1(a), and by properly sizing the NOTs of the added chain.

C. Scheme With Resolution Higher Than Half NOT Delay

Let us consider the general case of $n$ chains. Chain 1 still consists of NOTs each with a delay equal to $\tau$. As for the remaining $n-1$ NOT chains, the first NOT of the $j$th NOT chain ($j = 2 \ldots n$) has a delay $d_{j1} = (1+(j-1)/n)\tau$, while all other NOTs have a delay equal to $\tau$.

By considering as output the alternated succession of the $n$ NOT chain outputs (i.e., $p_{21}, p_{31}, \ldots, p_{n1}, p_{12}, p_{22}, p_{32}, \ldots, p_{n2}, \ldots$), any two consecutive outputs will have a phase difference equal to $\tau/n$. The expressions of signals $p_{ji}$ ($i = 1 \ldots N; j = 1 \ldots n$), as a function of time, are

$$p_{ji}(t) = \begin{cases} CK\left(t - (i + 1)\tau - \frac{(j-1)\tau}{n}\right), & i \text{ odd} \\ CK'\left(t - (i + 1)\tau - \frac{(j-1)\tau}{n}\right), & i \text{ even} \end{cases} \quad (i = 1 \ldots N;\ j = 1 \ldots n) \tag{14}$$
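The claim that interleaved outputs are spaced by $\tau/n$ can be checked with a short sketch. It assumes that tap $i$ of chain $j$ lags the clock by $(i+1)\tau + (j-1)\tau/n$, which follows from the first-NOT delay $d_{j1}$ given above together with the single-chain tap delay $(i+1)\tau$; the function name and the numeric values are hypothetical.

```python
from fractions import Fraction


def tap_delays(n_chains, n_taps, tau):
    """Delays of the interleaved taps, ordered level by level across the chains.

    Tap i of chain j is assumed to lag the clock by (i + 1) * tau plus the
    extra (j - 1) * tau / n contributed by the slower first NOT of chain j.
    """
    delays = []
    for i in range(1, n_taps + 1):
        for j in range(1, n_chains + 1):
            delays.append((i + 1) * tau + Fraction((j - 1) * tau, n_chains))
    return delays


# Hypothetical case: three chains, 12 ps NOT delay, four levels per chain.
delays = tap_delays(n_chains=3, n_taps=4, tau=12)
steps = {b - a for a, b in zip(delays, delays[1:])}
```

Exact fractions are used so that the spacing can be compared without floating-point rounding; in this example every step between consecutive taps equals 4 ps, that is, $\tau/n$.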

Extending the function of the OS block (11) to the case of $n$ NOT chains, we obtain:

$$m_0 = \frac{n}{\tau} \left(\frac{T_{CK}}{2} \pm J\right). \tag{15}$$

The resolution of our scheme with $n$ NOT chains is given by the difference between the arguments of $o_{Rm}$ and $o_{R(m+1)}$, when $o_{Rm} = 0$ and $o_{R(m+1)} = 1$. From (15), it is

$$\text{Res} = \left(\frac{T_{CK}}{2} \pm J - \frac{m\tau}{n}\right) - \left(\frac{T_{CK}}{2} \pm J - \frac{(m+1)\tau}{n}\right) = \frac{\tau}{n}. \tag{16}$$

The achievement of an increasingly higher resolution by augmenting the number of NOT chains is limited by the difficulty in controlling the NOT delays, due to PPV. To solve this issue, the NOT chains can be implemented using balanced delay lines [26], or inverters whose delay can be calibrated after fabrication [27].

IV. IMPLEMENTATION AND VERIFICATION

Two possible implementations of our scheme to measure the CK high phase are here described. We consider a standard 65-nm CMOS technology [28], \( V_{DD} = 1.1 \) V, 3-GHz clock frequency and two NOT chains [Fig. 4(a)]. Particularly, first, we introduce a possible implementation, in which we have designed all the blocks required by our scheme (and introduced in Section III). Then, we present a possible implementation reusing ROs usually present in high performance microprocessors for PPV measurement. Finally, we show some results of the HSPICE electrical level simulations that we have performed to verify the behavior of our scheme.

A. Implementation Not Reusing ROs

Let us consider the two NOT chains [Fig. 5(a)]. In chain 1, all NOTs exhibit a delay \( \tau \cong 12 \) ps; in chain 2, the first NOT has a delay equal to \((1 + \frac{1}{2}) \tau \cong 18 \) ps, while all other NOTs have a delay \( \tau \). Since each chain needs to cover \( T_{CK} \approx 333 \) ps, the number of NOTs within each chain is \( N = \left\lceil \frac{T_{CK}}{\tau} \right\rceil = 28 \).
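The chain length quoted above can be reproduced with integer arithmetic. The sketch below works in femtoseconds to avoid rounding issues; the function name and the unit choice are illustrative assumptions.

```python
def taps_needed(t_ck_fs, tau_fs):
    """Smallest number of NOTs whose total delay covers one clock period."""
    return -(-t_ck_fs // tau_fs)  # ceiling division on integers


# 3-GHz clock: period of about 333 ps, expressed in femtoseconds.
T_CK_FS = 10**15 // (3 * 10**9)
N = taps_needed(T_CK_FS, tau_fs=12_000)
```

Since 333 ps divided by 12 ps is slightly below 28, rounding up is what guarantees that the chain spans the whole period.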

As for MS and OS, their possible implementation is shown in Fig. 5(b). The inputs of MS (\( p_s \) and \( p_{ji} \)) are connected to its outputs (out\(_{s}\) and out\(_{m}\), respectively) through TGs driven by VM and VM’. This way, when VM = 0, all TGs conduct and connect the outputs of the NOT chains to signals out\(_{s}\) and out\(_{m}\). Instead, when VM flips to 1, all TGs are turned off, so that the outputs of the NOT chains are sampled. Signals out\(_{s}\) and out\(_{m}\) remain in a high impedance state, keeping latched the logic values till reset.

Fig. 6. (a) Implementation of the considered NOTs with programmable delay. (b) Propagation delay variation in case of PPVs as a function of the program signals (A, B, and C).

As for OS, it buffers the out\(_{m}\) signals and encodes them by a thermometer code on signals \( o_{Rm} \). The sampled data must be maintained for one clock cycle only. Therefore, dynamic latches have been considered, rather than more costly static latches, to reduce implementation costs. When \( R_s \) is asserted, VM flips to 0, making all TGs conductive again. As a result, all signals \( o_{Rm} \) become equal to 1, removing the previous measurement results.
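The sample-and-hold behavior of MS described above can be captured by a small behavioral model. It is a logic-level sketch only: the class and method names are hypothetical, the transfer gates are treated as ideal switches, and the value taken by the output stage after reset is not modeled.

```python
class MeasurementSample:
    """Logic-level model of the TG-based sampler: transparent at VM = 0, holding at VM = 1."""

    def __init__(self, n_taps):
        self.vm = 0
        self.out = [1] * n_taps

    def drive(self, taps):
        """Present new NOT-chain values; they pass through only while VM = 0."""
        if self.vm == 0:
            self.out = list(taps)

    def assert_vm(self):
        """CB raises VM: all TGs turn off and the current values stay latched."""
        self.vm = 1

    def reset(self):
        """R_s pulse: VM returns to 0 and the TGs conduct again."""
        self.vm = 0


ms = MeasurementSample(4)
ms.drive([0, 0, 1, 1])
ms.assert_vm()
ms.drive([1, 1, 1, 1])  # ignored, since the sampler is holding
held = list(ms.out)
ms.reset()
ms.drive([0, 1, 1, 1])
```

The model makes the role of the two control signals explicit: VM freezes the word for readout, and the reset returns the sampler to its transparent state for the following measurement.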

The outputs of the NOTs at the same level \( i (i = 1 \ldots N) \) within the \( j \)th chain are presented in Section IV. Using Monte Carlo simulations, we have evaluated the variations in the NOT delay due to PPV. The achieved results are shown in Fig. 6(b). We can observe that the variations of the NOT delay are well within a range of approximately \( \pm 20\% \) of its nominal value of 12 ps. Fig. 6(b) also shows that, by properly setting to 1 the program signals (i.e., A, B, and C) of the considered NOT, the delay of the NOT can be adjusted to compensate the variations due to PPV.

PPV may also imbalance the low-to-high and high-to-low transitions of the NOTs. However, we have verified this negligibly impacts the duty-cycle of the clock (by less than 1%), whose jitter is being measured. The delay of the NOTs is also sensitive to voltage and temperature variations. Such a sensitivity may be reduced by employing one of the techniques that have been proposed in the literature for the on-chip compensation of voltage and temperature variations (e.g., that in [29]).

As for CB, Fig. 7(a) and (b) shows an implementation of the circuits generating VM, \( R_s \) and their complemented signals. Signals VM and VM’ should be fine-tuned to avoid any systematic error. Their delays can be equalized by considering the scheme in Fig. 7(a). Signal VM (VM’) should flip to 1 (0) when \( p_s = p_{12} = 1 \), and it should be kept at this value till reset. This can be obtained by exploiting signals out\(_{s}\) and out\(_{m}\) generated at the output of MS, which remain latched at the high logic value till reset. The reset signal \( R_s \) (and \( R_s’ \)) is activated every other CK cycle, and may be implemented by the circuit in Fig. 7(b). The signal GR is the periodic signal with half the CK frequency introduced in Section III.

B. Implementation Reusing ROs

We refer to the PPV measurement strategy in [16]: it consists of many functional unit blocks (FUBs), each composed of \( q \) ROs with \( K \) (usually equal to 99) NOTs. The FUB internal structure is shown in Fig. 8 [12].

Fig. 8. Internal structure of the FUBs in [12].

Fig. 9. Implementation of our jitter measurement scheme with two reused ROs of the scheme, for the case of Res = τ/2.

Fig. 10. Simulation results for nominal values of electrical parameters, considering a PSN of 50% of V_{DD}.

C. Verification

We show some of the results of the HSPICE simulations that we have performed to verify the behavior of our jitter measurement scheme, considering both the implementations in Sections IV-A and IV-B. The PSN has been modeled as described in Section III-A, with a peak value of 50% of \( V_{DD} \). We also account for the setup and hold times of the sampling circuits, whose values are: \( t_{\text{setup}} \cong 2.2 \text{ ps} \), \( t_{\text{hold}} \cong 10 \text{ fs} \).

Our scheme not reusing ROs produces an output \( o_{Rm} \) as in (11). Thus, when no jitter affects CK (i.e., \( J = 0 \)), outputs \( o_{Rm} \) are encoded by a thermometer code with a number of zeros equal to \( m_0 = T_{CK}/\tau = (T_{CK}/2)/(\tau/2) \cong 168 \text{ ps}/6 \text{ ps} = 28 \). Fig. 10 shows the simulation results considering the case with no jitter affecting the first measured CK high phase (CK HP 1), and a jitter of 7 ps widening the second measured CK high phase (CK HP 2). As expected, when no jitter occurs (CK HP 1), while VM = 1 (Valid meas 1) our scheme outputs a word encoded by the thermometer code with 28 zeros (i.e., \( o_{Rm} = 0 \) for \( 1 \leq m \leq 28 \), \( o_{Rm} = 1 \) for \( 29 \leq m \leq 58 \)). Since we measure jitter as the difference in the number of zeros between the produced output and the one expected for the jitter-free case (equal to 28), multiplied by the resolution of our scheme (equal to 6 ps), we correctly obtain a jitter measurement equal to 0 ps.

Instead, when for instance a jitter of \( J = 7 \text{ ps} \) affects CK (CK HP 2), while VM = 1 (Valid meas 2) our scheme outputs a word encoded by the thermometer code with 29 zeros (i.e., \( o_{Rm} = 0 \) for \( 1 \leq m \leq 29 \), \( o_{Rm} = 1 \) for \( 30 \leq m \leq 58 \)), thus resulting in a jitter measurement equal to \( J = 6 \text{ ps} \), with a 1-ps measurement error. Therefore, our scheme is able to measure jitter with the expected resolution of 6 ps, even in the presence of PSN.
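As a worked check of these numbers, the following lines apply (12) with the values quoted above (\( T_{CK}/2 \cong 168 \) ps, \( \tau = 12 \) ps, two NOT chains). They assume that the number of zeros actually observed is the integer part of \( m_0 \), which the text implies but does not state as a formula.

\[
m_0\big|_{J = 7\ \text{ps}} = \left\lfloor \frac{2}{12\ \text{ps}} \left( 168\ \text{ps} + 7\ \text{ps} \right) \right\rfloor = \lfloor 29.17 \rfloor = 29, \qquad J_{\text{meas}} = (29 - 28) \cdot 6\ \text{ps} = 6\ \text{ps}, \qquad J - J_{\text{meas}} = 1\ \text{ps}.
\]

Under this reading, the 1-ps error is simply the part of the 7 ps jitter that falls below one resolution step.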

As for the implementation of our scheme reusing ROs, we have verified that:

1) the PPV measurement accuracy of the original FUB in [16] is not degraded;
2) the clock jitter measurement accuracy is the same as for the implementation of our scheme not reusing ROs.

As for point 1), we have compared the oscillation period of the ROs of the original FUB (\( T_{RO,\text{orig}} \)) with that of the ROs reused by our scheme for jitter measurements ($T_{\text{RO,our}}$), as a function of parameters $V_{\text{th}}$ and $T_{\text{ox}}$. Fig. 11(a) and (b) shows $T_{\text{RO,orig}}$ and $T_{\text{RO,our}}$, as a function of $V_{\text{th}}$ and $T_{\text{ox}}$ variations ($\Delta V_{\text{th}}$ and $\Delta T_{\text{ox}}$) up to $\pm 30\%$ of their nominal values. As we can see, the relative difference between $T_{\text{RO,our}}$ and $T_{\text{RO,orig}}$ is always negligible, with 4% maximum increase with $V_{\text{th}}$ variation, and 3% with $T_{\text{ox}}$ variation. Similar results have been achieved for other PPVs. Therefore, the reuse of the ROs of the FUB in [16] to allow also clock jitter measurement does not impact the PPV measurement accuracy.

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

Fig. 11. Oscillation periods of the ROs in [12] (TRO_orig), and of the ROs reused by our jitter measurement scheme (TRO_our), as a function of (a) threshold voltage $V_{\text{th}}$ and (b) oxide thickness $T_{\text{ox}}$.

Fig. 12. Simulation results for nominal values of electrical parameters, considering a PSN of 50% of $V_{\text{DD}}$.

As for point 2), since all NOTs of the two reused and modified ROs present a delay $\tau$ of 12 ps, our scheme should provide a clock jitter measurement resolution of $\text{Res} = \tau/2 \cong 6$ ps. Fig. 12 shows the simulation results obtained for nominal values of electrical parameters and with a PSN of 50% of $V_{\text{DD}}$. The two cases of no clock jitter (CK HP 1), and clock with a jitter of 7 ps widening the second measured CK high phase (CK HP 2) are depicted. As expected, with no jitter, while VM = 1 (Valid meas 1) our scheme provides on $o_{Rm}$ the same encoded word with 28 zeros as for the implementation without reusing ROs. Analogously, the same results have been obtained considering a jitter of 7 ps affecting the CK high phase (CK HP 2), with a word encoded by the thermometer code containing 29 zeros. Therefore, our jitter measurement scheme implemented by reusing the ROs of the FUB is able to measure clock jitter with the same resolution as the scheme in Section IV-A.

V. COSTS AND COMPARISON

We have evaluated the costs of our proposed scheme, implemented with and without reusing ROs, in terms of additional area and power consumption. We have compared it with the schemes in [4], [14], and [15]. Since neither implementation details, nor costs are reported in [13], it has not been considered for comparison.

As for [4] and [14], they feature the same resolution as our approach implemented with one NOT chain, which for the considered 65-nm CMOS technology is equal to 12 ps.

For comparison purposes, the latches and logic gates of the scheme in [4] have been implemented as the standard latch in [31], and minimum sized symmetric logic gates. The area of our scheme and [4] has been roughly estimated as the gate area of all transistors, while their power consumption has been evaluated by HSPICE simulations. As for [14], we considered the costs reported by the authors, which refer to a true implementation on a test chip with a 65-nm CMOS technology.

The obtained results are shown in Table I. As can be observed, when our scheme does not reuse the ROs of the FUBs, it allows a 40% reduction in both additional area and power over [4]. Compared with [14], our approach allows 99.7% additional area reduction. Instead, our scheme requires a power higher than [14] by 11% only, for an operating frequency 30 times higher (3 GHz versus 100 MHz).

On the other hand, when our approach is implemented by reusing the ROs of the FUBs, it allows 74% and 30% reduction in area and power consumption, respectively, over the approach in [4]. Compared with [14], in this case, our approach allows a 99.9% reduction in area. Instead, as for power, it is 28% higher, for an operating frequency 30 times higher. The area reported for the scheme in [14] refers to a true implementation on a test chip, while for our scheme is a rough estimation of the gate area of all transistors.

<table>
<caption>TABLE I. Area and power costs of the compared schemes, and relative reductions ($\Delta(\%) = 100\,([4, 14] - \text{OURS})/[4, 14]$)</caption>
<thead>
<tr>
<th>Scheme</th>
<th>Additional Area ($\mu$m$^2$)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Scheme in [4]</td>
<td>14.1</td>
</tr>
<tr>
<td>Scheme in [14]</td>
<td>2700</td>
</tr>
<tr>
<td>Our scheme not reusing ROs</td>
<td>8.3</td>
</tr>
<tr>
<td>Our scheme reusing ROs</td>
<td>3.7</td>
</tr>
</tbody>
</table>

From Table I, it can be noticed that by reusing the ROs of the FUBs to implement our jitter measurement scheme, we obtain a considerable reduction of additional area over our scheme not reusing the ROs (approximately 55%). However, the reuse of ROs requires a limited increase in power consumption (approximately 15%), compared with the case with no ROs reuse. This is because our scheme with not reused ROs also has a considerable area overhead, while by reusing four ROs of the FUB, it allows a 97% area reduction.
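The area reductions quoted in this section can be recomputed from the area column of Table I with the relative-reduction formula of its caption. In the sketch below, the dictionary keys and the function name are illustrative, and the 2700 µm² entry is taken to be the scheme in [14], an assumption consistent with the 99.7% figure given in the text.

```python
def reduction_pct(reference, ours):
    """Relative reduction in percent: 100 * (reference - ours) / reference."""
    return 100.0 * (reference - ours) / reference


# Additional area in square micrometers, as listed in Table I.
AREA = {"[4]": 14.1, "[14]": 2700.0, "ours_no_reuse": 8.3, "ours_reuse": 3.7}

vs_4_no_reuse = reduction_pct(AREA["[4]"], AREA["ours_no_reuse"])
vs_14_no_reuse = reduction_pct(AREA["[14]"], AREA["ours_no_reuse"])
vs_4_reuse = reduction_pct(AREA["[4]"], AREA["ours_reuse"])
vs_14_reuse = reduction_pct(AREA["[14]"], AREA["ours_reuse"])
reuse_vs_no_reuse = reduction_pct(AREA["ours_no_reuse"], AREA["ours_reuse"])
```

The recomputed values agree with the rounded percentages in the text: roughly 41% and 74% against [4], 99.7% and 99.9% against [14], and about 55% for reuse against no reuse.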

Similar to [14], the area reported for the scheme in [15] in Table II refers to a true implementation on a test chip.

VI. CONCLUSION

We have proposed an on-chip clock jitter measurement scheme for high performance microprocessors. The scheme enables in situ jitter measurement during the test or debug phase. It achieves a very high and scalable measurement resolution and accuracy, despite the presence of PSN. We have shown that, when our scheme is implemented to feature the same resolution as the previous approach in [4], it allows a 40% reduction in both area and power consumption. Instead, compared with the approaches in [14] and [15], our scheme requires a considerably lower area overhead, while featuring the same measurement resolution.

We have also shown that, for the case of microprocessors employing ROs to measure PPVs, our jitter measurement scheme can be implemented by reusing part of the ROs, thus allowing a 55% reduction of additional area over our scheme not reusing the ROs.

REFERENCES

Martin Omaña received the Degree in electronic engineering from the University of Buenos Aires, Buenos Aires, Argentina, and the Ph.D. degree in electronic engineering and computer science from the University of Bologna, Bologna, Italy, in 2000 and 2005, respectively. He was awarded a MADESS grant and joined the University of Bologna, in 2002, where he is currently a Post-Doctoral Fellow. His current research interests include fault modeling, on-line test, robust design, fault tolerance, and photovoltaic systems.

Daniele Rossi received the Degree in electronic engineering and the Ph.D. degree in electronic engineering and computer science from the University of Bologna, Bologna, Italy, in 2001 and 2005, respectively. He is currently a Post-Doctoral Fellow with the University of Bologna. His current research interests include fault modeling and fault tolerance, coding techniques for fault tolerance and low-power signal integrity for communication infrastructures, and robust design for soft error resiliency.

Daniele Giaffreda received the M.S. degree in electronic engineering from the University of Bologna, Bologna, Italy, in 2009, where he is currently pursuing the Ph.D. degree with the Advanced Research Center on Electronic Systems for Information and Communication Technologies E. De Castro. His current research interests include fault modeling and electrical simulations with particular emphasis on the silicon photovoltaic solar cell.

Cecilia Metra (F’14) is a Professor of Electronics with the University of Bologna, Bologna, Italy. Her current research interests include fault modeling, on-line test, robust design, fault tolerance, energy harvesting, and photovoltaic systems. Prof. Metra is a Vice-President for Technical and Conference Activities of the IEEE Computer Society (CS) for 2014, and a member of the Board of Governors of the IEEE CS for 2013–2015. Since 2013, she has been Editor-in-Chief of the IEEE CS online publication Computing. She is a Golden Core Member of the IEEE CS.

T. M. Mak received the M.S. degree from Hong Kong Polytechnic University, Hong Kong. He was doing Test Research and Development at Intel Corporation, Santa Clara, CA, USA, when this paper was written, and is responsible for 2.5/3-D test and DFT strategy at GLOBALFOUNDRIES, Santa Clara. He has more than 30 years of experience in microprocessor test, product development, design automation, research mentoring, and DFT.

Asifur Rahman has been serving with Intel Corporation, Santa Clara, CA, USA, as a Platform Debug Architect for Silicon Debug Technology and Research for the past six years. Prior to that, he was actively designing circuits and performing simulation tasks for state-of-the-art processors for 12 additional years. Since 1997, he has been actively involved in the forefront of microprocessor design at Intel and served as the Lead Designer for Floating Point Divider. He holds specific expertise in dense low-leakage circuit design with extremely high-speed operations. In 2002, he invented the first optical logic recognition technology using 1064-nm emission light directly from silicon. In 2005, he invented the industry’s first software integration engine that can connect multiple OS-based CAD applications and data models with physical probing hardware. He is an Adjunct Faculty Instructor with Portland State University, Portland, OR, USA, and actively conducts university level research with interns from various U.S. and Japan institutions. He has authored eight technical journal and conference papers, and holds three U.S. patents in the related fields of silicon debug and software integration and one international patent on solar technology.

Simon Tam (SM’07) received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University of California at Berkeley, Berkeley, CA, USA. He was a Senior Principal Engineer with the Microprocessor Development Group, Intel Corporation, Santa Clara, CA, USA, in 2011, engaged with the design of server microprocessors with special emphasis on high-frequency clocking architecture and circuits. Before joining the Microprocessor Development Group, he was with the Intel Neural Network Group. He designed electrically programmable neural network chips using analog VLSI techniques and EEPROM technology. He was also with the Intel California Technology Development Division engaged with the development of flash memory and EEPROM. He holds 29 U.S. patents, has authored and co-authored 47 technical publications, and has authored one book chapter, all in the areas of microprocessor designs, nonvolatile memory, and neural network circuit technologies. Dr. Tam was a member of the Technical Program Committee of the 2007–2010 Symposium on VLSI Circuits and the 2009 Custom Integrated Circuit Conference.
