S & M 1130

# Time-to-Digital Converter-Based Maximum Delay Sensor for On-Line Timing Error Detection in Logic Block of Very Large Scale Integration Circuits

Kentaroh Katoh\* and Kazuteru Namba<sup>1</sup>

National Institute of Technology, Tsuruoka College, 104 Inooka Aza-Sawada, Tsuruoka, Yamagata 997-8511, Japan

<sup>1</sup>Graduate School of Advanced Integrated Science, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba-shi, Chiba 263-8522, Japan

(Received February 16, 2015; accepted July 2, 2015)

Key words: VLSI, timing error detection, maximum delay sensor, TDC, on-line delay measurement

In this paper, we present a time-to-digital converter (TDC)-based maximum delay sensor (MDS) for on-line timing error detection in the logic block of very large scale integration (VLSI) circuits. The MDS captured the maximum propagation delay of the target end point for on-line timing error detection. Because the MDS was TDC-based, the resolution was high. In addition, the periodic on-line maximum delay capturing for on-line timing error detection using an MDS did not interrupt normal operation. Because the MDS was a small digital circuit, it could be easily inserted into the logic blocks of high-speed and low-power processors and systems-on-chip (SOCs). With LTSPICE simulation using 45 nm metal gate/high-K/strained-Si of the predictive technology model, the behavior of the proposed sensor was confirmed. The results showed that the area overhead was 34.9% on average.

## 1. Introduction

Smarter infrastructures, cloud computing, and large data analyses require very large scale integration (VLSI) circuits with a fast clock frequency and low power. (1,2) High-speed and low-power VLSI circuits require further scaling of complementary metal oxide semiconductor (CMOS) technology. However, in scaled process technology, problems due to aging such as negative bias temperature instability (NBTI) become serious. (3) Aging causes timing-related circuit failure during normal operation.
Because the effect of aging strongly depends on the degradation of paths inside a circuit, monitoring delay degradation obtained from periodic on-line delay measurements is useful for predicting failure due to aging.<sup>(4)</sup> Additionally, periodic on-line delay measurement significantly improves performance by enabling operation close to the best margin, or significantly lowers power through adaptive voltage scaling. (5) Furthermore, on-line delay measurements can record the history of delays along internal paths. Therefore, they are also useful for chip diagnostics.

<sup>\*</sup>Corresponding author: e-mail: k-katoh@tsuruoka-nct.ac.jp

In this paper, we present a time-to-digital converter (TDC)-based maximum delay sensor (MDS) for on-line timing error detection in the logic block of VLSI circuits. An MDS captures the maximum propagation delay, which is an important parameter for timing error detection. Because an MDS is TDC-based, the resolution is high. In addition, the periodic on-line maximum delay capturing for on-line timing error detection using an MDS does not interrupt normal operation. Because an MDS is a small digital circuit, it can be easily inserted into the logic blocks of high-speed and low-power processors and systems-on-chip (SOCs).

The rest of the paper is organized as follows. In § 2, we explain the target materials, the basics of TDCs, and the details of MDSs. In § 3, we describe the results of evaluations. After the discussion in § 4, we present our conclusions in § 5.

## 2. Materials and Methods

The target block of an MDS is a logic block. Therefore, the target materials are those used for CMOS technologies or fin-shaped field-effect transistor (FinFET) CMOS technologies. (6,7) In this section, we explain the details of MDSs. An MDS is based on an on-chip delay measurement circuit. In § 2.1, we give a brief explanation of the on-chip delay measurement circuit.
In § 2.2, we explain the timing error detection with the collected maximum delay sequence for an MDS.

### 2.1 Basics of TDC

The MDS was based on the monotonic TDC, which is a popular on-chip delay measurement circuit. Figure 1(a) shows an example of the monotonic four-stage TDC architecture. The TDC was composed of four positive edge-triggered D-type flip-flops, an upper delay line, and a lower clock line. The delay line was composed of three buffers with uniform delays. Each stage of the TDC was composed of a flip-flop and a buffer. Two input transition signals were launched from the *start* and *stop* inputs. The TDC measured the time interval between a positive transition from *start* and a positive transition from *stop*. The time resolution was equal to the delay of a buffer. The thermometer code $Q_0Q_1Q_2Q_3$ indicated the time interval. Figure 1(b) shows the timing chart of the basic TDC when the time interval between a transition signal from *start* and a transition signal from *stop* was 1. In this timing chart, $\tau_0 = \tau_1 = \tau_2 = 1$. Table 1 shows the relationship between the time interval $\Delta t$ and $Q_0Q_1Q_2Q_3$ when all buffer delays were 1. The relationship between $\Delta t$ and $Q_0Q_1Q_2Q_3$ was linear when $\Delta t$ was smaller than 3 and all the buffers had a uniform delay. The range of measurement of this four-stage TDC was 3. In general, the range of measurement of an N-stage TDC is N-1.

Fig. 1. Basic four-stage TDC. (a) Basic four-stage TDC architecture. (b) Timing chart.

Table 1
Relationship between $\Delta t$ and thermometer code.

| Time interval | $Q_0$ | $Q_1$ | $Q_2$ | $Q_3$ |
|----------------------|-------|-------|-------|-------|
| $0 \le \Delta t < 1$ | 1 | 0 | 0 | 0 |
| $1 \le \Delta t < 2$ | 1 | 1 | 0 | 0 |
| $2 \le \Delta t < 3$ | 1 | 1 | 1 | 0 |
| $3 \le \Delta t$ | 1 | 1 | 1 | 1 |

### 2.2 Concept of timing error detection with analysis of maximum delay

We explain the idea of failure prediction using on-line maximum delay measurements. To predict failure, the maximum delay was obtained with the proposed delay sensor. As shown on the left side of Fig. 2, the input of the MDS was connected to an end point of paths (line *ep* in this example). The MDS captured the maximum delay value during normal operation. As a circuit suffers from aging, the maximum delay value increases. The circuit gives a warning when the value of the captured maximum delay is more than a predefined threshold value. The right side of Fig. 2 shows an example. The horizontal axis is the sampling time. The vertical axis is the maximum delay value. The threshold value is 2 in this example. When the sampling time is 3, the circuit gives a warning.

### 2.3 *MDS*

#### 2.3.1 *Basics*

The basics of on-line maximum delay measurement using the MDS are illustrated in Fig. 3. In this example, the four sensitizable paths $p_0$ to $p_3$ were included in the input logic cone whose root was connected to *ep*. The MDS was composed of the embedded monotonic TDC and the extra digital elements. The embedded monotonic TDC had two inputs: *start* and *stop*. The TDC measured the time interval between two transitions launched on the two inputs.<sup>(8)</sup>

Fig. 2 (left). Basics of failure prediction using MDS. Fig. 3 (right). Basics of on-line maximum delay measurement using MDS.

The measured time interval was $\Delta t = t_{\text{stop}} - t_{\text{start}}$, where $t_{\text{start}}$ is the arrival time of the transition on *start* and $t_{\text{stop}}$ is the arrival time of the transition on *stop*.
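The mapping from $\Delta t$ to the thermometer code in Table 1 can be illustrated with a small behavioral sketch. It assumes an idealized TDC (no flip-flop setup or hold effects, no metastability), and the function name `thermometer_code` is hypothetical, introduced here only for illustration.

```python
def thermometer_code(delta_t, buffer_delays):
    """Thermometer code Q0..Q(N-1) of an idealized monotonic TDC.

    delta_t       : t_stop - t_start, in the same unit as the buffer delays
    buffer_delays : delays of the N-1 buffers in the delay line (tau_0, tau_1, ...)

    Stage k captures 1 when the start transition, delayed by the first k
    buffers, has already arrived when the stop transition clocks the flip-flop.
    """
    if delta_t < 0:
        raise ValueError("this sketch only covers stop arriving after start")
    # Arrival time of the start transition at the data input of each stage.
    arrivals = [0]
    for tau in buffer_delays:
        arrivals.append(arrivals[-1] + tau)
    return [1 if arrival <= delta_t else 0 for arrival in arrivals]


# Four-stage TDC with unit buffer delays, as assumed in Table 1.
print(thermometer_code(1.5, [1, 1, 1]))  # [1, 1, 0, 0]
```

With unit delays, the sketch reproduces the four rows of Table 1; non-uniform delays simply shift the boundaries between codes.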
The MDS had two inputs, *ep* and *clk*. The input *ep* was connected to the target end point. The input *clk* was connected to the clock through the delay element DL. During on-line maximum delay measurements, *ep* was connected to *stop* and *clk* was connected to *start*. The MDS had two modes, an initializing mode and a delay measurement mode. During the initializing mode, *ep* was connected to *start* and *clk* was connected to *stop*.

During the delay measurement mode, when a path $p_0$ was sensitized during normal operation, a transition synchronized to a positive transition of the clock signal was launched to the start point of $p_0$. The positive transition of the clock signal propagated to the *start* input of the TDC through the delay element DL and the *clk* input of the MDS. The delay element DL was inserted to reduce the absolute value of the measured time interval; because the area cost of a monotonic TDC is proportional to that value, DL reduced the extra area. The transition launched to the start point of $p_0$ arrived on the *stop* input of the TDC through the end point of $p_0$ and the *ep* input of the MDS. The MDS measured the time interval $\Delta t = t_{\rm P_0} - t_{\rm clk}$, where $t_{\rm P_0}$ is the arrival time on *stop* of the transition launched to the start point of $p_0$, and $t_{\rm clk}$ is the arrival time on *start* of the positive transition of the clock. If the measured delay $\Delta t$ was larger than $\Delta t_{\rm max}$, the maximum delay value kept in the MDS, the kept value was updated to $\Delta t$.

We assumed that the finely tunable delay of DL was realized using a delay-locked loop (DLL). A DLL could adjust the delay value with a fine constant resolution. (9,10) The measurement range of a path delay $t_{\rm p}$ using the MDS is given by eq. (1).

$$t_{\rm DL} \le t_{\rm p} \le t_{\rm DL} + \sum_{i=0}^{N-1} \tau_i, \tag{1}$$

where $t_{\rm DL}$ is the propagation delay of DL.
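As a worked illustration of eq. (1), the following sketch computes the measurable window for a given DL delay and set of buffer delays. The numerical values (a 780 ps DL delay and four 10 ps buffer terms) are hypothetical, chosen only to show the arithmetic; the function names are likewise illustrative.

```python
def measurement_range(t_dl, buffer_delays):
    """Lower and upper bounds of eq. (1) for the path delay t_p.

    t_dl          : propagation delay of the delay element DL
    buffer_delays : the tau_i terms that enter the sum in eq. (1)
    """
    lower = t_dl
    upper = t_dl + sum(buffer_delays)
    return lower, upper


def is_measurable(t_p, t_dl, buffer_delays):
    """True when t_p lies inside the window given by eq. (1)."""
    lower, upper = measurement_range(t_dl, buffer_delays)
    return lower <= t_p <= upper


# Hypothetical values in picoseconds (assumptions, not taken from the paper).
T_DL_PS = 780
TAUS_PS = [10, 10, 10, 10]

print(measurement_range(T_DL_PS, TAUS_PS))   # (780, 820)
print(is_measurable(800, T_DL_PS, TAUS_PS))  # True
print(is_measurable(830, T_DL_PS, TAUS_PS))  # False
```

The example shows why DL matters: shifting the window with $t_{\rm DL}$ lets a short delay line cover a long path delay, instead of spending buffers on the whole 800 ps.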
Furthermore, $N$ and $\tau_i$ must at least satisfy eq. (2) for a path delay $t_{\rm p}$ to be measurable.

$$\text{mod}(t_{\rm p}, t_{\rm res}) < \sum_{i=0}^{N-1} \tau_i, \tag{2}$$

where $t_{\rm res}$ is the resolution of the DLL. We assumed that standard cells were used for the buffers of the MDS. They were vulnerable to process variation. On the other hand, the DLL was more robust to process variation than the standard cells. Therefore, $t_{\rm DL}$ was approximated to be a multiple of $t_{\rm res}$. By adjusting $t_{\rm DL}$ and $N$, we could measure an arbitrary path delay in an arbitrary range. Under the condition of eq. (2), we could choose the minimum $N$ by adjusting $t_{\rm DL}$, considering the required resolution of the measurement and the variation of the path delay due to aging. An extra area was required for the DLL. However, because the DLL was located outside of the target logic block, it did not directly degrade the performance of the target digital circuit.

The buffer delays $\tau_i$ varied with process variation. The variation was not negligible because we assumed the use of standard cells for the buffers. To compensate for the variation, an on-line calibration, such as the ones proposed in Refs. 12 and 13, was used to obtain accurate absolute delay values.

#### 2.3.2 Gate-level architecture

Figure 4 depicts the gate-level architecture of the four-stage MDS. The MDS had five inputs, *ep*, *clk*, *mode*, *ntrn*, and *rst*. The sensor output the thermometer code. A four-stage TDC has a 4-bit output, $Q_0Q_1Q_2Q_3$.

Fig. 4. Four-stage MDS.

The buffer delays of the 1st, 2nd, and 3rd stages were $\tau_0$, $\tau_1$, and $\tau_2$, respectively. Each output of a flip-flop was fed back to the clock input through a 2-input OR gate. The other input of each OR gate was connected to *stop*. The input *start* of the TDC was connected to the 2-to-1 multiplexer $S_0$. The input *stop* of the TDC was connected to the 2-to-1 multiplexer $S_1$.
Both $S_0$ and $S_1$ were controlled by the *mode* input. The input *ep* was connected to a 2-input XOR gate. The other input of the XOR gate was connected to *ntrn*, which controlled the polarity of the transition from *ep*. To measure delays of positive paths, *ntrn* was set to 0. To measure delays of negative paths, *ntrn* was set to 1. When on-line delay measurement was carried out, the value of *mode* was 1. When the MDS was initialized before measurement, the value of *mode* was 0. When *mode* was 0, the output of the XOR gate was connected to *start* and *clk* was connected to *stop*; under that condition, the output of the sensor was initialized. When *mode* was 1, *clk* was connected to *start* and the output of the XOR gate was connected to *stop*. In this example, *ep* was connected to the end point of a path $p_T$ through the redundant line $p_R$. The start point of $p_T$ was $FF_S$; the end point of $p_T$ was $FF_S$.

### 2.4 On-line maximum delay measurement with MDS

Figure 5 shows how the MDS captures the maximum delay value during normal operation. In this example, the number of stages of the embedded monotonic TDC inside the MDS was four. The delay of all the buffers inside the TDC was 1. First, both *mode* and *rst* were set to 0. The outputs of the flip-flops of the TDC were reset to 0, and the clock signal was provided to *clk*. Then, the outputs of the MDS were initialized to 0.

Fig. 5. Getting maximum delays using MDS.

Figure 5 depicts the two cases, (a) and (b). In both cases, the thermometer code of the MDS was $Q_0Q_1Q_2Q_3 = 1100$ when t = n - 1. According to Table 1, this means that the maximum time interval arriving before t = n was at least 1 and less than 2. Note that the clock inputs of $FF_0$ and $FF_1$ were fixed to 1 because the output values of these flip-flops were 1.

In case (a), the time interval 0.5 was applied to the inputs when t = n. Then, the values 1, 0, 0, 0 arrived on the inputs of the flip-flops $FF_0$, $FF_1$, $FF_2$, and $FF_3$, respectively.
The flip-flops $FF_0$ and $FF_1$ kept their previous values because the clock inputs of the flip-flops were fixed at a static 1. The flip-flops $FF_2$ and $FF_3$ captured the input values. The captured values were the same as the previous ones. Consequently, the MDS kept the previous thermometer code.

In case (b), the time interval 2 was applied to the inputs when t = n. Then, the values 1, 1, 1, 0 arrived on the inputs of the flip-flops $FF_0$, $FF_1$, $FF_2$, and $FF_3$, respectively. The flip-flops $FF_0$ and $FF_1$ kept the previous values because the clock inputs of the flip-flops were fixed at a static 1. The flip-flops $FF_2$ and $FF_3$ captured the input values. The value of $FF_2$ was updated to 1, while the value of $FF_3$ remained 0. Accordingly, the thermometer code of the MDS was updated to $Q_0Q_1Q_2Q_3 = 1110$, which is the thermometer code for $2 \le \Delta t < 3$.

In this way, the current thermometer code of the sensor was kept when the arriving time interval was shorter than the current maximum delay. The current thermometer code was updated when the arriving time interval was longer than the current maximum delay.

## 3. Results

In this section, we present the evaluation of the MDS. First, we confirmed the circuit behavior using LTSPICE. (14) In all the simulations, the VDD supply voltage was 1.0 V and the temperature was 27 °C. We implemented the proposed four-stage delay sensor using SPICE models of the standard cell libraries made with 45 nm metal gate/high-K/strained-Si of the predictive technology model. (15)

First, we obtained the input-output characteristic of the four-stage MDS. The delay of DL was fixed to 30 ps. The time interval was swept from 4 to 40 ps, and the thermometer code was recorded every 2 ps. Figure 6 plots the result. The horizontal axis shows the time interval. The vertical axis shows the thermometer code. According to the result, the measurement resolution is on the order of 10 ps.
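The hold-and-update behavior described in § 2.4 can be mimicked with a short behavioral model. This is a sketch under idealized assumptions (unit buffer delays, no setup or hold effects), not the simulated circuit; the class name `MaxDelaySensorModel` and its methods are hypothetical.

```python
class MaxDelaySensorModel:
    """Idealized behavioral model of the maximum-hold capture of an MDS."""

    def __init__(self, buffer_delays):
        self.buffer_delays = list(buffer_delays)
        # One flip-flop per stage; N-1 buffers give N stages.
        self.code = [0] * (len(self.buffer_delays) + 1)

    def reset(self):
        """Initializing mode: clear all flip-flops."""
        self.code = [0] * len(self.code)

    def capture(self, delta_t):
        """Apply one start/stop interval and return the held thermometer code."""
        arrival = 0  # arrival time of the start transition at stage k
        for k in range(len(self.code)):
            # A flip-flop holding 1 has its clock blocked by the OR feedback,
            # so only stages that still hold 0 capture a new value.
            if self.code[k] == 0:
                self.code[k] = 1 if arrival <= delta_t else 0
            if k < len(self.buffer_delays):
                arrival += self.buffer_delays[k]
        return list(self.code)


sensor = MaxDelaySensorModel([1, 1, 1])
print(sensor.capture(1.5))  # [1, 1, 0, 0]
print(sensor.capture(0.5))  # [1, 1, 0, 0]  shorter interval: code is kept
print(sensor.capture(2))    # [1, 1, 1, 0]  longer interval: code is updated
```

The second and third calls correspond to cases (a) and (b) of Fig. 5: a shorter interval leaves the code at 1100, and a longer one raises it to 1110.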
Figure 7 shows an LTSPICE simulation result. This figure illustrates the waveforms of *rst*, *mode*, *clk*, *ep*, $Q_3$, $Q_2$, $Q_1$, and $Q_0$ from top to bottom. In the first clock of *clk*, the sensor was initialized. In the succeeding three clocks, the time interval between *clk* and *ep* was measured three times. The time intervals of the 2nd, 3rd, and 4th clocks were 20, 10, and 40 ps, respectively. As shown in Fig. 6, the measurements when $\Delta t = 20$, 10, and 40 ps were 1100, 1000, and 1111, respectively. In the 2nd clock, when t = 15 ns, the measurement result 1100 was captured in the flip-flops. In the 3rd clock, the launched time interval was shorter than the previous measurement, and thus the previous measurement was preserved in the flip-flops. In the 4th clock, the time interval was longer than the preserved one. Accordingly, the measurement result 1111 was captured.

Fig. 6. (Color online) Relationship between time interval $\Delta t$ and thermometer code of four-stage MDS ($t_{\rm DL} = 30 \, \rm ps$).

Fig. 7. Waveform of LTSPICE simulation.

The area overhead of the sensor was evaluated. The MDS was described using the Verilog hardware description language (HDL). Each Verilog description was synthesized using Synopsys Design Compiler. Rohm 0.18 $\mu$m process technology was used for this evaluation. The area and area overhead of each circuit were estimated from the result. Table 2 shows the result. $N$ is the number of stages; $A_{\rm TDC}$ and $A_{\rm MDS}$ are the areas of the standard TDC and MDS, respectively; $O$ is the area overhead defined by the equation $O = (A_{\rm MDS}/A_{\rm TDC}-1) \times 100.0$ (%).

Table 2
Area overhead.

| $N$ | $A_{\rm TDC}$ ($\mu$m$^2$) | $A_{\rm MDS}$ ($\mu$m$^2$) | $O$ (%) |
|------|--------|--------|------|
| 4 | 309.7 | 451.6 | 45.8 |
| 8 | 632.2 | 838.7 | 32.7 |
| 16 | 1277.3 | 1612.8 | 26.3 |
| Ave. | - | - | 34.9 |

The area overhead was 45.8% when $N = 4$, since $(451.6/309.7 - 1) \times 100 \approx 45.8$. When the number of stages of the TDC was small, the area of the extra logic for the MDS had a relatively large effect on the area overhead. As the number of stages of the TDC increased, the area overhead decreased. When $N = 16$, the area overhead was 26.3%. The average of the area overhead was 34.9%.

The four-stage MDS was implemented on the Virtex-5 FPGA ML501 evaluation platform. (16) Figure 8 shows the experimental circuit. The clock frequency was 100 MHz. The upper part was the reconfigurable delay line controlled by the 6-bit control signals $S_5S_4S_3S_2S_1S_0$. The delay became minimum when $S_5S_4S_3S_2S_1S_0 = 000000$. The delay became maximum when $S_5S_4S_3S_2S_1S_0 = 111111$. The maximum delay was measured using the four-stage MDS at the bottom left. Note that *rst* was the negative reset line. The signal *ntrn* was fixed to 0 in this experiment. The extra circuit at the bottom right was used to measure the buffer delays of the four-stage MDS. When the control line *cal* was 1, a 100 MHz clock signal was applied to *start* and *stop*. The clock phase of *stop* was swept up by 39 ps with the digital clock manager (DCM).<sup>(9)</sup> The MDS measured the applied time intervals one by one. From the results, we could obtain the delays $\tau_0$, $\tau_1$, and $\tau_2$, which are shown in Table 3. When *cal* was 0, the MDS worked as a delay sensor. We varied the delay of the reconfigurable delay line by setting $S_5S_4S_3S_2S_1S_0 = 101101$ (Step 1), $S_5S_4S_3S_2S_1S_0 = 101100$ (Step 2), and $S_5S_4S_3S_2S_1S_0 = 101110$ (Step 3) sequentially. We observed the thermometer code in each step using the Xilinx on-chip logic analyzer ChipScope Pro 14.7. The observation results confirmed that the MDS worked correctly.

Fig. 8. Prototype of four-stage MDS on Virtex-5 FPGA ML501 evaluation platform.

Table 3
Measured delay of $\tau_i$ (ns).

| $\tau_0$ | $\tau_1$ | $\tau_2$ |
|---------|---------|---------|
| 0.55 | 0.77 | 0.07 |

## 4. Discussion

The proposed MDSs are placed on the end points of the critical paths to monitor whether the delays of these end points suffer from aging. Let the clock frequency of a circuit be 1 GHz. When the delay of the critical path is 80% of the clock period, the nominal maximum delay is 800 ps. Let the increase of the path delay due to aging be 10% of the nominal path delay. Then the variation of the path delay of the critical path is 80 ps. The resolution of the MDS was on the order of 10 ps. A 10 ps resolution is sufficient to monitor the variation of the maximum delay for timing error detection. Because an MDS is an on-chip circuit, it also suffers from aging and process variation. The error due to aging and process variation can be compensated using on-chip delay calibration. (12,13)

## 5. Conclusions

In this paper, we presented an MDS for on-line timing error detection in the logic block of VLSI circuits. The MDS captured the maximum propagation delay of the target end point for on-line timing error detection. Because the MDS was TDC-based, the resolution was high. In addition, the MDS did not interrupt normal operation. Because the MDS was a small digital circuit, it could be easily inserted into the logic blocks of high-speed and low-power processors and SOCs. With LTSPICE simulation using 45 nm metal gate/high-K/strained-Si of the predictive technology model, the behavior of the proposed sensor was confirmed. The results showed that the area overhead was 34.9% on average.

## Acknowledgements

The authors would like to acknowledge Emeritus Prof. Hideo Ito, Prof. Haruo Kobayashi, and Prof. Naoto Miyamoto.
This work was supported by the VLSI Design and Education Center (VDEC), The University of Tokyo, in collaboration with Rohm Corporation, Toppan Printing Corporation, and Synopsys, Inc.

## References

- 1 IBM Corp.: Homepage of IBM Corp., http://www.ibm.com/ibm/green/?lnk=msoST-eaen. (accessed January 2015).
- 2 IBM Corp.: Homepage of IBM Corp., http://www-03.ibm.com/press/us/en/pressrelease/44357.wss (accessed January 2015).
- 3 I. D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner and T. Mudge: Proc. IEEE/ACM International Symposium on Microarchitecture (IEEE, New York, 2003) pp. 7–18.
- 4 J. Wong and P. Y. Cheung: IEEE Trans. VLSI Syst. **21** (2013) 2307.
- 5 R. M. Agarwal, V. Balakrishnan, A. Bhuyan, K. Kim, B. C. Paul, W. Wang, B. Yang, Y. Cao and S. Mitra: Proc. IEEE International Test Conference (IEEE, New York, 2008) pp. 1–10.
- 6 S. M. Sze: Semiconductor Devices: Physics and Technology (Wiley, Cambridge, 2008).
- 7 J. G. Fossum and V. P. Trivedi: Fundamentals of Ultra-Thin-Body MOSFETs and FinFETs (Cambridge University Press, Cambridge, 2013).
- 8 R. Datta, A. Sebastine, A. Raghunathan and J. A. Abraham: Proc. Great Lakes Symposium on VLSI (IEEE, New York, 2004) pp. 145–148.
- 9 Xilinx (2012) Virtex-5 User Guide, www.xilinx.com (accessed June 2015).
- 10 M.-J. E. Lee, W. J. Dally, T. Greer, H.-T. Ng, R. Farjad-Rad, J. Poulton and R. Senthinathan: IEEE J. Solid-State Circuits **38** (2003) 614.
- 11 Xilinx (2007) Virtex-5 CMT Characterization Report, www.xilinx.com (accessed June 2015).
- 12 K. Katoh and K. Namba: Proc. International Symposium on Quality Electronics Design (IEEE, New York, 2004) pp. 430–434.
- 13 R. Rashidzadeh, M. Ahmadi and W. C. Miller: IEEE Trans. Instrum. Meas. **59** (2010) 463.
- 14 Linear Technology Corporation, http://www.linear.com/ (accessed June 2015).
- 15 Predictive Technology Model, http://ptm.asu.edu/ (accessed June 2015).
- 16 Xilinx (2009) ML501 Evaluation Platform User Guide, www.xilinx.com (accessed June 2015).