Analog Circuit Yield Optimization via Freeze-Thaw Bayesian Optimization Technique

Xiaodong Wang, Changhao Yan, Member, IEEE, Yuzhe Ma, Member, IEEE, Bei Yu, Member, IEEE, Fan Yang, Member, IEEE, Dian Zhou, Senior Member, IEEE, and Xuan Zeng, Senior Member, IEEE

Abstract—While the VLSI community cares about designs with high yields under process variations, expensive computational costs make conventional yield optimization methods for analog circuits inefficient for industrial applications. In this article, an efficient yield optimization method via the freeze-thaw Bayesian optimization technique is proposed for analog circuits. The yield analysis is integrated into the exploration process of the Bayesian optimization. With a specified Gaussian process regression method, the flexible freeze-thaw Bayesian optimization technique is utilized to automatically guide the search in the design space and control the accuracy of yield analysis in the process space. A performance optimization problem is formulated and solved to mine prior knowledge, and a further speedup is achieved. Experimental results show that the proposed method can gain a 2.47$\times$–5.73$\times$ speedup compared with the state-of-the-art methods, without loss of accuracy.

Index Terms—Design for space exploration, freeze-thaw Bayesian optimization, transistor sizing, yield modeling, yield optimization.

I. INTRODUCTION

A S SEMICONDUCTOR fabrication technology scales to the nanometer level, process variations have strong impacts on the performances, yields of analog circuits. In order to deal with the increasing challenges in reliable circuit designs, the IC community pays more and more attention to yield optimization recently [1], [2].

In general, yield optimization flow features an iterative loop as designers first adjusting design parameters, like sizes of transistors, then executing the time-consuming yield analysis. Since yield analysis (PVT or Monte Carlo simulations) needs thousands of simulations to guarantee the accuracy, the time cost of the entire yield optimization is extremely high. Considering the narrowing time-to-market, reducing the overall simulation time of yield optimization for analog circuits is the most urgent requirement.

To resolve this issue, a large number of methodologies and algorithms have been proposed, mainly including the following three categories. Corner-based methods [3]–[7] optimize the “worst case” performance of given circuits at several process corners. Although this treatment avoids costly yield estimations, it is coarse, inaccurate, and often leads to overdesign. In addition, it is difficult to search the worst case in a high-dimensional process space.

Monte-Carlo (MC)-based methods are the most straightforward and widely used methods due to their high accuracy and generality. Liu et al. [8] and Guerra-Gomez et al. [9] applied the optimal computation budget allocation (OCBA) techniques for MC speedup, and evolutionary algorithms for optimization. Wang et al. [10] employed the kernel density estimation method for yield modeling and proposed a multistart-point expectation-maximization-like algorithm to solve the problem. Wang et al. [11] proposed an adaptive yield analysis method and implemented Bayesian optimization with the weighted expected improvement (wEI) criterion to obtain the optimal design. Zhang et al. [12] employed the Gaussian process (GP) regression with the neural network and max-value entropy search (ES) methods for optimization based on [11]. Although Bayesian optimization has shown certain advantages in previous research, the computational costs are still a major deterrent to mainstream adoption for yield optimization. For example, the state-of-the-art methods [11], [12] need 6000–20 000 simulations, which is time prohibitive for analog circuits.

Response-surface-based methods [13]–[16] try to build a surrogate model of circuit performance to replace the expensive simulator, thus reducing the cost of yield optimization. However, these methods are limited for requiring lots of samples to maintain modeling accuracy and being nearly impossible to build the surrogate model in a high-dimensional space.

As we only care about the optimal design, the simulation resources allocated to selected designs should be dynamically adjusted. In fact, the idea of applying coarse yield estimations for low-yield designs has been successfully introduced in [11] for saving simulations, as shown in the low-yield region in Fig. 1(a), where the x-axis is the number of samples, the y-axis is the yield, and different lines are different designs. Those designs whose yields are much lower than design 1 are discarded with cheap estimations, e.g., design 4 and design 5. However, for designs with yields close to design 1, e.g.,
design 2 and design 3, [11] cannot distinguish them effectively, and has to calculate their high-accuracy yields immediately, leading to huge simulation cost. Intuitively, if we collect designs with potentially high yields and gradually improve their analysis accuracy rather than execute high-accuracy yield analysis at once, simulation cost may be reduced because yield analysis at most designs can be early stopped when a higher-yield region is found, even if they were ever considered candidates for optimal design in the early stage of the optimization procedure.

Recently, curve learning-related modeling methods have aroused widespread interest in the machine learning community [17], [18]. Inspired by [17], a freeze–thaw Bayesian optimization technique is introduced in this article for yield optimization. The basic idea of freeze–thaw is to integrate yield analysis into the exploration process of Bayesian optimization, automatically guide the search in the design space and allocate computational resources of yield estimations. By modeling the process of yield analysis, we can temporarily freeze the analysis at one design, if its predicted final yield does not seem to be optimal. We can also thaw the analysis if it becomes the optimal candidate again. Even we can start analysis at a new design.

Concretely, improving the yield accuracy of a design is considered as an iterative process throughout the entire optimization, i.e., samples are gradually added to improve the analysis accuracy. The freeze–thaw Bayesian optimization puts and freezes the high-yield designs in a candidate basket for their yields are currently blurry, predicts the next design with optimal yield and selects the design from basket, which maximally reduces the uncertainty of the optimal (i.e., information gain), to execute yield analysis. This technique gradually improves the accuracy of yield analysis in the high-yield region, and most designs will be allocated only a small number of simulations as shown in Fig. 1(b).

Empirically, a good design should have a high yield and good circuit performances, i.e., satisfying performance specifications, which hints us to intuitively find the design with optimal yield near those designs with good performances. Compared with the expensive yield optimization, the performance optimization at the typical–typical (TT) corner is much cheaper. Therefore, we innovatively embed the relatively cheap performance optimization into the yield optimization flow. The performance optimization at the beginning of yield optimization is able to provide a good start for yield optimization, resulting in an efficient warm start, which shows considerable advantage on efficiently exploring the large optimization space.

In this article, a general and efficient yield optimization method for analog circuits is proposed. Some key contributions are concluded as follows.

1) A freeze–thaw Bayesian optimization technique is first applied for the yield optimization of analog circuits, which automatically guides the search in the design space and gradually improves the analysis accuracy in the process space. With the high sampling efficiency of Bayesian optimization and flexibility of the freeze–thaw technique, the overall costs of yield optimization are significantly reduced.

2) A nominal performance optimization problem is formulated and solved in the yield optimization framework. This treatment helps to mine prior knowledge for yield optimization, and a further speedup is achieved.

3) Experimental results show that the proposed method achieves a 2.47×–5.73× speedup over the state-of-the-art methods, without loss of accuracy.

The remainder of this article is organized as follows. In Section II, we present the problem formulation and traditional methods for yield estimation, modeling, and optimization. The complete yield optimization approach via a freeze–thaw technique is introduced in Section III. The implementation details are discussed in Section IV. Experimental results are given to validate the proposed method in Section V. Finally, Section VI concludes this article.

II. BACKGROUND

A. Problem Formulation

In design space $D \subseteq \mathbb{R}^{d_x}$, a design point $x = [x_1, x_2, \ldots, x_{d_x}]^\top \in D$ means a $d_x$-dimensional vector. Design parameters $x_i, i = 1, \ldots, d_x$ are restricted in reasonable ranges $[l_i, u_i]$, respectively, representing values of bias voltages and currents, widths and lengths of transistors, etc.

In process space $V \subseteq \mathbb{R}^{d_s}$, a process point $s = [s_1, s_2, \ldots, s_{d_s}]^\top \in V$ denotes a $d_s$-dimensional vector. Process parameters $s_i, i = 1, \ldots, d_s$ are random variables modeling the variations of manufacturing process, such as threshold voltages, etc., following normal distributions provided by foundries [19].

Generally, in most process design kits (PDKs), process parameters are independent of the design parameters and mutually independent [10]. The probability density function (PDF) of $s$ is

$$p(s) = \prod_{i=1}^{d_s} \left[ \frac{1}{\sqrt{2\pi}} \exp \left( -\frac{s_i^2}{2} \right) \right].$$

Circuit performances $y = [y_1, y_2, \ldots, y_k]^\top$ can be regarded as a function of design and process parameters: $y_i = f^3_{\text{sim}}(x, s), x \in D, s \in V$, where $f^3_{\text{sim}}$ means that the $i$th performance $y_i$ can be obtained from a transistor-level simulation $f_{\text{sim}}$. Usually, circuits are supposed to meet corresponding specifications $c = [c_1, c_2, \ldots, c_k]^\top$ set by designers. A circuit is success if all specifications are satisfied, i.e., $y_i \geq c_i, i = 1, \ldots, k$ without loss of generality. Otherwise, it is a fail.

Given design parameters $x$, the yield $Y(x)$ can be expressed as

$$Y(x) = \int_V I(x, s)p(s)ds$$
where indicator function $I(x, s) = \text{AND}(y_i \geq c_i), i = 1, \ldots, k$, and $\text{AND}(·)$ is logical function \text{AND}.

The goal of the yield optimization problem is to find a design point $x^*$ with maximal yield $Y$ as

$$x^* = \arg \max_{x \in D} Y(x). \quad (3)$$

**B. Monte Carlo Analysis**

MC methods are the most widely used numerical methods for probability density analysis. For a given design $x$, with $N$ random samples $s_i, i = 1, \ldots, N$ drawn from $p(s)$, MC estimates the yield $Y(x)$ as

$$\hat{Y}_{MC}(x) = \frac{1}{N} \sum_{i=1}^{N} I(x, s_i). \quad (4)$$

The variance of the estimator $\hat{Y}_{MC}(x)$ is determined by the sampling size $N$ as [20]

$$\sigma^2_{\hat{Y}} \approx \frac{\hat{Y}_{MC}(1 - \hat{Y}_{MC})}{N} \cdot k_r \quad (5)$$

where the confidence level $k_r$ is a constant. For example, $k_r = 1.645$ corresponds to a 90% confidence level.

By effectively utilizing the relationship between confidence interval $\sigma^2_{\hat{Y}}$ and sample number $N$, the number of MC samples required for one yield estimation can be adjusted by given target yield and specific accuracy.

**C. Gaussian Process Regression Method**

The GP regression model [21] is a powerful and effective nonparametric probabilistic method, which has been widely studied in many engineering fields, due to its ability to provide both posterior mean and corresponding uncertainty. Given a finite collection of $N$ points, $X = \{x_1, x_2, \ldots, x_n\}$, $x_i \in D, i = 1, \ldots, n$, and the unknown true yield $Y = \{Y(x_i), i = 1, \ldots, n\}$, GP is defined as a probability distribution over $Y$ such that $Y = \{Y(x_i), i = 1, \ldots, n\}$ jointly have a multivariate Gaussian distribution

$$Y \sim N(\mu, K) \quad (6)$$

where $\mu$ is an $n \times 1$ mean vector, and $K$ is an $n \times n$ covariance matrix.

GP is fully specified by its prior mean function $m(x)$ and its covariance function $k(x_i, x_j)$. In (6), the mean vector $\mu$ is determined by $m(x)$, i.e., $\mu_i = m(x_i), i = 1, \ldots, n$, and $k(x_i, x_j)$ determines the covariance matrix $K$, i.e., $K_{ij} = k(x_i, x_j), i, j = 1, \ldots, n$. Usually, $m(x)$ is set to a constant for convenience. As for the covariance function, there are usually multiple options, e.g., the squared exponential kernel and Matérn kernels. In this article, Matérn-5/2 kernel is selected as

$$k(x_i, x_j) = \sigma_f^2 \left(1 + \sqrt{5}r^\nu + \frac{5}{3}r^2 \right) \exp\left(-\sqrt{5}r^\nu\right) \quad i, j = 1 \ldots n \quad (7)$$

where $r^2 = (x_i - x_j)^\top A^{-1}(x_i - x_j)$, $\sigma_f$ denotes the output variance, and $A = \text{diag}(l_1^2, \ldots, l_n^2), l_i, i = 1, \ldots, n$ represents the $i$th characteristic length scale.

Considering the noise $\epsilon \sim N(0, \sigma^2)$ of statistical yield analysis, noise-corrupted estimated yield $\hat{Y} = \{\hat{Y}(x_i), i = 1, \ldots, n\}$ has the form

$$\hat{Y} \sim N(Y, \sigma^2 I) \quad (8)$$

where $I$ is the identity matrix. Then, the covariance function for the elements of matrix $K'$ becomes

$$k'(x_i, x_j) = k(x_i, x_j) + \sigma^2 \delta_{ij} \quad (9)$$

where $\delta_{ij} = 1$ if $i = j$, otherwise $\delta_{ij} = 0$.

The hyperparameters $\sigma_f, l_i, i = 1, \ldots, n$ and $\sigma_x$ in the GP model can be determined by the maximum likelihood estimation (MLE). Given a new design $x_n$, its corresponding yield $\hat{Y}(x_n)$ and observed yields $Y$ follow the joint Gaussian distribution:

$$\begin{bmatrix} Y(x_1) \\ \vdots \\ Y(x_n) \end{bmatrix} \sim N\left( \begin{bmatrix} \mu \end{bmatrix}, \begin{bmatrix} K' \\ K'(x_n, X) \end{bmatrix} \right) \quad (10)$$

Then, the predictive distribution of $\hat{Y}(x_n)$ conditioned on observations $(X, Y)$ can be derived with the following posterior mean and variance:

$$\begin{align*}
\mu(\hat{Y}(x_n)) &= k'(x_n, X) K'^{-1} Y \\
\sigma^2(\hat{Y}(x_n)) &= k'(x_n, x_n) - k'(x_n, X) K'^{-1} k'(x_n, X) \quad (11)
\end{align*}$$

**D. Bayesian Optimization**

Since yield analysis tools are extremely costly and the estimation results are always noise corrupted, Bayesian optimization [22]–[24] has attracted widespread attention recently in the yield optimization problem [11], [12], [25]. In general, the Bayesian optimization framework has two key components. The first component is a probabilistic model, which reflects our beliefs about the unknown objective function, e.g., the GP model in Section II-C. The second component is an acquisition function that utilizes the posterior distribution to guide the search. By maximizing the acquisition function, Bayesian optimization aims to balance the tradeoff between exploitation and exploration, i.e., the next query point is located where the model prediction is high (exploitation) and/or the model uncertainty is large (exploration). Specifically, there are several kinds of acquisition functions, e.g., traditional Thompson sampling (TS) [26], lower confidence bound (LCB) [27], knowledge gradient [28], and the newly proposed acquisition function, named probability of further improvement (PFI) which aims to deal with stringent specifications for analog IC sizing problems [29].

A typical acquisition function is the expected improvement (EI) [30], which has a closed form. Given the current maximum yield $r$, the improvement of $Y(x)$ predicted by GP model can be calculated with $I(Y(x)) = \max(0, Y(x) - r)$. Since $Y(x)$ follows a Gaussian distribution, EI is expressed as:

$$EI(x) = \mathbb{E}[I(Y(x))] = \sigma(x) (\lambda \Phi(\lambda) + \phi(\lambda)) \quad (13)$$

where $\lambda = (r - \mu(x))/\sigma(x)$, $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation of $Y(x)$ in (11) and (12). $\Phi(·)$ and $\phi(·)$ represent the CDF and PDF of the standard normal distribution, respectively.

For the yield optimization problem, nominal performances can be set as constraints [11], [25], i.e., all specifications
should be satisfied at the TT corner. This treatment can help to reduce unfeasible design space for speedup. The constrained yield optimization problem is written as

$$\mathbf{x}^* = \arg \max_{\mathbf{x} \in D} Y(\mathbf{x})$$

s.t. $y_i(\mathbf{x}, \text{TT}) \geq c_i, \ i = 1, \ldots, k$ (14)

where $y_i(\mathbf{x}, \text{TT})$ is the $i$th nominal performance of design $\mathbf{x}$, and $\mathbf{x}^*$ is the global optimal design.

It should be noted that other design costs (e.g., area and power, etc.), besides the performances, can be further integrated into this formulation. We can either add them in the objective function with weight coefficients to optimize them simultaneously [31], or set them as additional constraints to meet specific requirements [32].

To deal with the constraints, the $\text{wEI}$ [33] is proposed to multiply the EI function with the probability of feasibility (PF), i.e., the probability of all constraints being satisfied, written as

$$\text{wEI}(\mathbf{x}) = \text{PF}(\mathbf{x}) \cdot \text{EI}(\mathbf{x})$$ (15)

where $\text{PF}(\mathbf{x}) = \prod_{i=1}^{n} P(y_i(\mathbf{x}, \text{TT}) \geq c_i)$.

Different from $\text{wEI}$, which focuses on the change of yield value, another acquisition function adopted in this article, i.e., ES [34], considers more about the location of the optimal design. Formally, it can be expressed as

$$\text{ES}(\mathbf{x}) = \int (H(P_x) - H(P_x^Y)) P(Y|D_o) dY$$ (16)

where the observed data $D_o = \{\mathbf{x}, Y\}$. $\mathbf{x}^*$ represents the unknown design with the optimal yield. $H(\cdot)$ denotes the differential entropy, and $D_o'$ is the updated data set of $D_o$ with one more data point $(\mathbf{x}_{n+1}, \tilde{Y}(\mathbf{x}_{n+1}))$ observed. $P_x$ and $P_x^Y$ are the estimated distributions over $\mathbf{x}^*$ given observed data $D_o$ and $D_o'$, respectively. [17].

Fig. 4(a) illustrates the ordinary Bayesian optimization flow. At each iteration, the probabilistic model is built with current training data. Then, the acquisition function can be computed with the predictive posterior distribution, and it is optimized to generate the next query design, e.g., $\mathbf{x}^*_{\text{wEI}}$. After estimating yield at $\mathbf{x}^*_{\text{wEI}}$, the data set is updated with this new sample. These steps are taken iteratively until the termination condition is reached.

III. PROPOSED APPROACH

This section will present the proposed yield optimization method for analog circuits. First, a freeze–thaw GP regression method for iterative yield analysis is discussed. Then, the framework of freeze–thaw Bayesian optimization technique is described. Next, a nominal performance optimization problem is formulated and solved to mine prior knowledge. Finally, a complete flow summarizes the proposed method.

A. Freeze–Thaw Gaussian Process Regression Model for Iterative Yield Analysis

In general, the yield analysis needs to be invoked repeatedly during the yield optimization process, thus reducing analysis costs is an important way to improve the algorithm efficiency. In previous yield optimization methods [8], [9], [11], [12], one design will only be selected once, the obtained yield value will not change after estimation. However, the proposed approach allows the same design to be repeatedly selected for estimation during the optimization. Specifically, when a design is selected for the first time, we will execute a rough analysis at it. If it is picked again later, we will add more samples to improve its analysis accuracy. Intuitively, the design with the highest yield will be repeatedly selected until the termination condition is reached.

In other words, the yield analysis is integrated into the exploration process of Bayesian optimization. Yield analysis at any given design is treated as an iterative process rather than executed at once. From this perspective, the overall optimization process is regarded as a double loop, where the outer loop explores the design space and selects candidate designs, the inner loop adds samples for yield estimations. Since this analysis method is MC-based, the estimated yield will gradually converge to its true value with samples added.

To model this iterative analysis process, a freeze–thaw GP regression method is designed. With the ability to predict the final yield with partially completed analysis, the computational costs of yield estimations can be significantly reduced.

Formally, given designs $\{\mathbf{x}_i\}_{i=1}^{m}$, let $g_i^t, j = 1, \ldots, t_i$ denote estimated yield value with $j$ batches of simulations sampled at $\mathbf{x}_i$. $T_i \in \mathbb{N}^+$ indicates the total number of batches sampled at $\mathbf{x}_i$. $g_i = [g_i^1, g_i^2, \ldots, g_i^{t_i}]$ is a $t_i$-dimensional vector representing the analysis curve, and $t_i = [1, 2, \ldots, T_i]$ is the corresponding time steps of $g_i$. In order to build a surrogate model for every analysis curve, a specified kernel [17] is utilized. For $t_i^a$, $t_i^b$ in $t_i$, the kernel function $k(t_i^a, t_i^b)$ is given by

$$k(t_i^a, t_i^b) = \frac{\beta^\alpha}{(t_i^a + t_i^b + \beta)^\alpha}$$ (17)

where $\alpha$ and $\beta$ are two hyperparameters. A feature of this kernel function is that its value will tend to be constant when $t_i^a$ or $t_i^b$ is large, so it is suitable for modeling a curve which gradually converges.

In this article, the specified kernel is used as the covariance function over time steps for an iterative yield analysis curve. Concretely, each curve is drawn from a separate GP, modeled by

$$\mathcal{N}(g_i; Y_1, K_{t_i})$$ (18)

where $Y_1$ is the asymptotic value that the $i$th curve converges to, i.e., final yield at $\mathbf{x}_i$, and $1$ is a column vector of 1’s. The specified curve kernel is selected for the elements of $K_{t_i}$, $i = 1, \ldots, n$.

Considering the correlation of yields in the design space, the final yield $Y_i$ is regarded as a latent function that specifies the asymptotic value of each analysis curve, jointly modeled by

$$\mathcal{N}(Y; \mathbf{m}, K_{\mathbf{x}\mathbf{x}})$$ (19)

where $\mathbf{Y} = \{Y_1, Y_2, \ldots, Y_n\}$. The prior mean $\mathbf{m} = [m_1, m_2, \ldots, m_n]$ remains constant, and $m_i$ is the mean of $g_i, i = 1, \ldots, n$. Matérn-5/2 kernel is selected for the elements of $K_{\mathbf{x}\mathbf{x}}$.

Subsequently, the distribution over all analysis curves $\{g_i\}_{i=1}^{n}$ can be written as

$$P(\{g_i\}_{i=1}^{n} | \{\mathbf{x}_i\}_{i=1}^{n}) = \int \prod_{i=1}^{n} \mathcal{N}(g_i; Y_1, K_{t_i}) \cdot \mathcal{N}(Y; \mathbf{m}, K_{\mathbf{x}\mathbf{x}})/Y.$$ (20)
Fig. 2 illustrates the basic idea of this GP model where \( x \)-axis is the number of iterations, \( y \)-axis is the yield. Each line represents an yield analysis curve \( g_i \) at a given design \( x_i \). It can be seen that the analysis procedure is executed as an iterative process throughout the entire optimization rather than at once. Each curve is modeled by \( \mathcal{N}(Y, K_{tt}) \), \( i = 1, \ldots, n \) and the final yields \( Y_i, i = 1, \ldots, n \) are jointly modeled by \( \mathcal{N}(m, K_{xx}) \). In other words, the final yields \( Y_i, i = 1, \ldots, n \) are first drawn over design space according to a GP prior. Conditioned on \( Y_i \), each analysis curve is modeled independently using another GP prior.

As yield estimations are always noise corrupted, the statistical noise \( \epsilon_i \sim \mathcal{N}(0, \sigma^2) \) is counted. Then, the covariance function for the elements of matrix \( K_{tt}^n \) becomes

\[
\begin{align*}
k(t_i^a, t_i^b) &= k(t_i^a, t_i^b) + \sigma^2 \delta_{ab} \\
n &= 1 \quad \text{if } a = b; \quad \text{otherwise}, \quad n = 0
\end{align*}
\]

where \( \delta_{ab} = 1 \) if \( a = b \); otherwise, \( \delta_{ab} = 0 \). To obtain the hyperparameters, the log likelihood is derived from (20) as

\[
\log P(\hat{\mathbf{g}}|\mathbf{x}) = \frac{1}{2}(\hat{\mathbf{g}} - \mathbf{O}m)^\top K_{tt}^{-1}(\hat{\mathbf{g}} - \mathbf{O}m) + \frac{1}{2}Y^\top(K_{xx}^{-1} + Q)^{-1}Y - \frac{1}{2}\log |K_{xx}| + \log |K_{tt}| + \text{const}
\]

where \( \hat{\mathbf{g}} = [\hat{g}_1, \hat{g}_2, \ldots, \hat{g}_n]^\top \), represents the vector composed of all yield analysis curves together. \( \mathbf{O} = \text{blockdiag}(\mathbf{I}_1, \mathbf{I}_2, \ldots, \mathbf{I}_n) \) is a block-diagonal matrix. Its block \( \mathbf{I}_i \) is a vector of ones with length \( T_i, i = 1, \ldots, n \). \( K_{tt} = \text{blockdiag}(K_{t1t1}, K_{t2t2}, \ldots, K_{ntnt}) \) is a block-diagonal matrix with blocks \( K_{tt}^i \) and \( \mathbf{y} \) is an \( n \)-dimensional vector with elements \( \gamma_i = (1)^\top K_{tt}^{-1}(\hat{\mathbf{g}}_i - m_i) \), and \( Q = \text{diag}(q_1, \ldots, q_n) \) with elements \( q_i = (1)^\top K_{tt}^{-1}1_i, i = 1, \ldots, n \).

All hyperparameters, including \( \alpha, \beta, \sigma_{1}, \text{etc.} \), can be learned by MLE.

Using the Bayesian inference, we can derive the required quantities for optimization from (22). Formally, given all observed designs \( \{x_i\}_{i=1}^n \) and noise-corrupted analysis curves \( \{\hat{g}_i\}_{i=1}^n \), the posterior distribution of final yields \( \hat{Y} \) at old designs \( \{x_i\}_{i=1}^n \) is expressed as

\[
p(\hat{\mathbf{Y}}|\hat{\mathbf{g}}, \{x_i\}_{i=1}^n) = \mathcal{N}(-\hat{\mathbf{Y}}; \mu, C)
\]

\[
\mu = \mathbf{m} + C\mathbf{Y}
\]

\[
C = K_{xx} - K_{xx}(K_{xx} + Q^{-1})^{-1}K_{xx}.
\]

Then, for a new design \( x_n \), the posterior mean \( \mu(\hat{\mathbf{Y}}(x_n)) \) and uncertainty measure \( \sigma^2(\hat{\mathbf{Y}}(x_n)) \) of its final yield \( \hat{Y}(x_n) \) are written as

\[
\mu(\hat{\mathbf{Y}}(x_n)) = K_{xx}^{-1}K_{x1}(\hat{\mathbf{g}} - \mathbf{O}\mathbf{m})
\]

\[
\sigma^2(\hat{\mathbf{Y}}(x_n)) = K_{xx}^{-1} + K_{xx}(\mathbf{O}^\top \mathbf{K}_{tt}^{-1}\mathbf{O})^{-1}K_{xx}.
\]

Fig. 3 shows the yield regression comparison in design space between the proposed method and existing methods [11], [12] under the same training data. The \( x \)-axis is the width of transistor M19 in a comparator circuit (Fig. 9) and the \( y \)-axis is the yield. As the yield data are often roughly estimated, yellow dots (estimated yield) may not accurately fall on the black-dotted line (true yield). Existing methods show poor fitting results since they take the noisy current yields as golden solutions, i.e., they try to fit the yellow dots, e.g., point A. However, the proposed method first predicts the final yields of current designs, e.g., point A', and then utilizes them for further yield prediction of new designs. Equations (11) and (24) show this difference intuitively, where (24) uses the final yields \( \mu \) for better prediction, rather than the current yields \( Y \) in (11).

**B. Framework of Freeze–Thaw Bayesian Optimization Technique**

To reduce the simulation cost of the yield optimization problem, a freeze–thaw Bayesian optimization method is proposed as follows. Fig. 4(b) shows the optimization flow. First, a basket \( B \) is constructed to collect candidates with potentially high yields during the optimization. Specifically, the criterion for selecting candidates is to choose the top \( N_B \) designs with the current highest lower bound of the estimated yield, formally written as

\[
B = \{ \arg \max_{x \in X} (\hat{Y}_{MC}(x) - \sigma_{\mathcal{F}}(x)), x \in X \}
\]

where \( \hat{Y}_{MC}(x) \) and \( \sigma_{\mathcal{F}}(x) \) are calculated with (4) and (5). Since the yield analysis may be rough at the early and mid term, maintaining a certain number of candidates that have already been estimated to some degree can avoid missing the optimal design. The basket size \( N_B \) is chosen to be 10 in this article.
A strategy to obtain a new design is shown in Fig. 4(b). It has been reported in [35] that different acquisition functions to guide the search in design space, shown in the proposed method effectively leverages two acquisition functions that may lead to conflicting results. Therefore, this treatment could make the results of design selection more comprehensive.

It has been shown in [36]–[38] that MSP strategy is very effective for global optimization. The BFGS method [39] is chosen for local search under the MSP framework.

Then the freeze–thaw GP model described in Section III-A is built. Different from previous methods, which use only one acquisition function to determine the next query point, either [11] with wEI or [12] with ES, as shown in Fig. 4(a), the proposed method effectively leverages two acquisition functions to guide the search in design space, shown in Fig. 4(b). It has been reported in [35] that different acquisition functions may lead to conflicting results. Therefore, this treatment could make the results of design selection more comprehensive.

Concretely, after model training, the wEI maximization process (in (27)) is solved by a multiple starting point (MSP) strategy to obtain a new design \( x^*_{wEI} \) in each iteration as

\[
x^*_{wEI} = \arg \max_{x \in D} \text{wEI}(x).
\]

It has been shown in [36]–[38] that MSP strategy is very effective for global optimization. The BFGS method [39] is chosen for local search under the MSP framework.

Next, we compare the ES values of \( x^*_{wEI} \) and candidate designs in the basket. ES is usually approximated with MC methods [17], [40]. The implementation details for ES in freeze–thaw method will be further discussed in Section IV-A. The design with the highest ES value is selected as the next query point, denoted as \( x^*_{ES} \). This step can be formulated as

\[
x^*_{ES} = \arg \max_{x \in \mathcal{B} \cup \mathcal{J}^*_{wEI}} \text{ES}(x).
\]

Then the freeze–thaw GP model described in Section III-A is built. Different from previous methods, which use only one acquisition function to determine the next query point, either [11] with wEI or [12] with ES, as shown in Fig. 4(a), the proposed method effectively leverages two acquisition functions to guide the search in design space, shown in Fig. 4(b). It has been reported in [35] that different acquisition functions may lead to conflicting results. Therefore, this treatment could make the results of design selection more comprehensive.

Concretely, after model training, the wEI maximization process (in (27)) is solved by a multiple starting point (MSP) strategy to obtain a new design \( x^*_{wEI} \) in each iteration as

\[
x^*_{wEI} = \arg \max_{x \in D} \text{wEI}(x).
\]

It has been shown in [36]–[38] that MSP strategy is very effective for global optimization. The BFGS method [39] is chosen for local search under the MSP framework.

Next, we compare the ES values of \( x^*_{wEI} \) and candidate designs in the basket. ES is usually approximated with MC methods [17], [40]. The implementation details for ES in freeze–thaw method will be further discussed in Section IV-A. The design with the highest ES value is selected as the next query point, denoted as \( x^*_{ES} \). This step can be formulated as

\[
x^*_{ES} = \arg \max_{x \in \mathcal{B} \cup \mathcal{J}^*_{wEI}} \text{ES}(x).
\]

Subsequently, one batch of simulations will be sampled at \( x^*_{ES} \) to estimate its yield. Specifically, if \( x^*_{ES} = x^*_{wEI} \), i.e., the new design \( x^*_{wEI} \) is the most promising point, the current estimation process in the basket will be paused (freeze), and yield analysis at a brand-new design will be executed with one batch of simulations. Otherwise, a previous partially completed estimation at an old design in the basket will continue (thaw) with one more batch of samples. By this two-round selection, the sampling efficiency in terms of yield estimations can be further improved.

After that, the training data set is updated with the newly sampled observation, and the basket is rebuilt with the top \( N_B \) designs with the current highest lower bound of the estimated yield of all training data, using (26). More competitive designs will be added into the basket, while some old candidate designs may be removed, even including the optimal designs ever found in the early and mid term. The optimization procedure continues until a user-defined maximal number of iterations is reached.

Essentially, the basket is a tradeoff between two acquisition functions, i.e., wEI and ES to determine the next query design. Concretely, if the basket size, i.e., the number of candidates maintained, is very small, e.g., zero, the algorithm tends to select \( x^*_{wEI} \) for yield analysis. This means it will always execute analysis on new designs. On the other hand, if the basket size becomes infinite, obtaining \( x^*_{ES} \) is equivalent to searching for optimal ES value in the whole design space. \( x^*_{wEI} \) is almost impossible to be selected as \( x^*_{wEI} \), then this method will always execute analysis on old designs. Thus, maintaining a moderate number of candidates is important for the freeze–thaw technique, so that the algorithm could switch analysis between new designs and old designs according to the acquisition functions, wEI and ES.

Fig. 5 shows the analysis curves with yields higher than 90% in one optimization execution of a comparator circuit.
Specifically, suppose there is a design with true yield $Y_s$ and an optimal design with true yield $\tau$. In order to confirm which one is better in [11] and [12], according to (5), the number of samples required for MC analysis can be calculated with

$$N \approx \frac{Y_s(1 - Y_s)}{(\tau - Y_s)^2} \cdot k^2.$$  \hfill (29)

Take $Y_s = 94\%$ for example, it will be allocated over 1000 simulations when $\tau$ is 95%. However, when $\tau$ increases to 98%, less than 100 points are sufficient. The proposed method uses a basket to collect candidates with potentially high yields, instead of comparing their yields immediately, allowing the designs found in the early stage to compete with those found in the later stage for resources, automatically cutting down the simulation costs.

For the case in Section V-A, many designs can reach a yield value of 97%, but only four designs with yields greater than 97% are allocated more than 400 simulations as shown in Fig. 5.

C. Knowledge Transfer From Nominal Design

In the freeze–thaw Bayesian optimization, the basket is built with random initialization. In order to further reduce the simulation cost, we need to pick some prior solutions for the basket. Actually, the cost of circuit performance optimization is much lower than that of yield optimization. For example, only a few hundred simulations are needed in a performance optimization execution [32], [41], [42]. However, even a single yield analysis may cost thousands of simulations. Naturally, we consider using the results of performance optimization to guide the yield optimization.

Fig. 6 shows the yield and nominal performance of a comparator circuit, i.e., the first case in the experimental section, where the D1-axis represents the width of transistor M17. The performance metric is the offset voltage $V_{\text{off}}$ with 200-MHz sampling frequency at 30 °C, and the specification is $V_{\text{off}} \leq 30\text{mV}$. We can see that a design with the best performance does not usually have the optimal yield. On the contrary, those designs which roughly meet and are close to the specifications, i.e., near the failure boundary, may have better yields.

Based on this observation, we formulate a nominal performance optimization problem as

$$\min_{x \in \mathcal{X}} f(x) = \sum_{i=1}^{k} \omega_i \cdot |(y_i(x, \text{TT}) - \epsilon \cdot c_i)/c_i|$$

s.t. $y_i(x, \text{TT}) \geq c_i, \ i = 1, \ldots, k$ \hfill (30)

where $\omega_i$ is the weight, representing the importance of the $i$th performance metric. $\epsilon$ is a constant coefficient, determining the desired distance between the nominal performance value $y_i(x, \text{TT})$ and specification $c_i$. We set $\omega_i = 1, i = 1, \ldots, k$ and $\epsilon = (3/2)$ empirically in experiments.

In this article, the nominal performance optimization problem is solved by the approach proposed in [32], i.e., WEIBO, with a maximum iteration number of $N_{\text{pre}}$, which is a state-of-the-art performance optimization method for analog circuits. To make the obtained prior solutions more widely distributed, this solver is invoked three times randomly and independently, resulting in $3 \times N_{\text{pre}}$ simulation costs. Then, all the obtained designs meeting the constraints will be divided into $N$ clusters by the $k$-means method. Every design with the smallest $f(x)$ in corresponding cluster is selected as the prior solution. A total number of $N$ designs are added to the data set and basket for yield optimization.

Fig. 7 shows the proposed strategy to leverage the knowledge from nominal performance optimization to reduce the cost of yield optimization. In the left part, D1-axis and D2-axis are the widths of transistor M17 and M19, respectively, in a comparator circuit (see Fig. 9). The $z$-axis is the optimization target $f(x)$ in (30), and its origin corresponds to the position where nominal performances $y_i(x, \text{TT})$ are equal to $\epsilon \cdot c_i, i = 1, \ldots, n$. The points sampled in the three optimizations on TT-corner are marked with diamonds, triangles, and squares, and the selected prior solutions are marked with red circles. In the right part, the $x$-axis is the number of simulations and the $y$-axis is the yield. Several prior solutions get high yields, leading to an efficient warm start. By picking prior solutions for the basket at the initial stage, prior knowledge is transferred to yield optimization. Concretely, the prior knowledge helps to reduce the number of simulations by approximately 20% on average. Moreover, this strategy is particularly useful in the problems exploring large optimization space. For example, in Section V-E, existing methods [11], [12] may fail to obtain the optimal design, but the proposed method succeeds by leveraging the prior knowledge.

D. Summary

The proposed yield optimization flow for analog circuits is depicted in Fig. 8. It consists of two parts. One is the nominal performance optimization part used to mine prior knowledge, as described in Section III-C. It is worth mentioning that our proposed approach is orthogonal to the solver for nominal performance optimization. The other is the yield optimization part via the freeze–thaw technique. Throughout the optimization process, yield analysis at one design is
executed iteratively. The analysis accuracy can be gradually improved by the freeze–thaw strategy, as described in Section III-B.

IV. IMPLEMENTATION DETAILS

A. Entropy Search for Freeze–Thaw Technique

ES considers the uncertainty reduction over the location of the optimal design when new observations, \((x_{\text{fant}}, y_{\text{fant}})\) are added, and iteratively evaluates designs which will most improve the information gain. Since \((x_{\text{fant}}, y_{\text{fant}})\) is not really evaluated, ES is often calculated with fantasized yields predicted by the GP model [40]. Concretely, by adding the fantasized observations \((x_{\text{fant}}, y_{\text{fant}})\) to the training set, ES selects the design that causes the maximum uncertainty reduction.

Due to the iterative yield analysis, yield values are gradually updated during the optimization. Whether a brand-new design is estimated for the first time, i.e., \((x_{n+1}, \hat{g}^{n+1}_{X})\), or an old design is taken one step further, i.e., \((x_{i}, \hat{g}^{T+1}_{X})\), new information about the optimal location will be provided. To compute the information gain, we have to calculate the fantasized yields of new designs, i.e., \(x_{\text{fant}}^{\text{EI}}\) and old designs, i.e., \(x \in B\) with one more batch of simulations added.

Formally, based on (20), the yield value of an old design in the basket if one more batch of simulations are added can be derived, e.g., the \((T+1)\)th point on the \(i\)th curve, \(T_{i}' = [T + 1]\). The posterior mean and variance of \(\hat{g}_{X}^{T+1}\) are given by

\[
\begin{aligned}
\mu(\hat{g}_{X}^{T+1}) &= k_{t_{i}'t_{i}}^{T}K_{t_{i}'t_{i}}^{-1}\bar{g}_{t_{i}}, \\
\sigma^{2}(\hat{g}_{X}^{T+1}) &= k_{t_{i}'t_{i}}^{T} - k_{t_{i}'t_{i}}^{T}K_{t_{i}'t_{i}}^{-1}k_{t_{i}'t_{i}'} + C_{ii}\Omega^{2}
\end{aligned}
\]

(31)

\[
\begin{aligned}
H(P_{x^{*}}) &= -P_{x^{*}} \cdot \log(P_{x^{*}}).
\end{aligned}
\]

(34)

In this article, \(N_{d}\) and \(N_{f}\) are set to 50 and 500, respectively. Finally, ES for the freeze–thaw technique is described in Algorithm 2. In each iteration, after obtaining \(x_{\text{fant}}^{\text{EI}}\), \(H(P_{x^{*}})\) is first computed with currently observed data set \(D_{o}\). Then, we calculate the fantasized yields of \(x_{i} \in B, i = 1, \ldots, N_{B}\) and \(x_{\text{fant}}^{\text{EI}}\) with (31) and (32). These fantasized observations are added to the training set, respectively. Denote \(D'_{o}\) as the updated data set with fantasized observations added. The posterior distribution of the freeze–thaw GP model will change accordingly, leading to different \(H(P_{x^{*}})\) and ES value. The design which maximizes the expected information gain over the optimal design is selected as \(x_{\text{ES}}^{*}\).

B. Computational Complexity

The computational cost for the training of conventional GP models is \(\mathcal{O}(N^{3})\) due to the inversion of the covariance matrix,
Algorithm 2 ES for Freeze–Thaw Technique

Require: Candidate designs, including old designs in the basket $B = \{x_1, x_2, \ldots, x_{NB}\}$ and a new design $x_n^\text{new}$.
1: Initialize $ES = [0, 0, \ldots, 0]$ to represent information gain;
2: Compute $H(P_{xx})$ with current data using Monte Carlo estimation;
3: for $i = 1, 2, \ldots, NB + 1$ do
4: \hspace{1em} if $x_i \in B$ then
5: \hspace{2em} Calculate a fantasized yield value $y_i^{T+1}$ with (31);
6: \hspace{1em} else
7: \hspace{2em} Calculate a fantasized yield value $y_i^T$ with (32);
8: \hspace{1em} end if
9: \hspace{1em} Add the fantasized observation to the data set, compute $H(P_{xx})$ using Monte Carlo estimation;
10: $ES(i) \leftarrow H(P_{xx}^\text{new}) - H(P_{xx})$;
11: end for
12: Select $x^\text{ES}_{n+1}$ that maximizes $ES$ as the next query point.

where $N$ is the total number of training data, i.e., designs explored in [11] and the same as the model used in [21]. As for the freeze–thaw GP model applied in this article, since each design corresponds to an analysis curve, there are actually $NT^2$ batches of samples, leading to a training data size of $N^2T^2$, where $N$ is the number of designs, and $T$ represents the average number of batches per design. Outwardly, the computational complexity of the freeze–thaw model is prohibitively expensive as $O(N^2T^3)$. However, a careful derivation shows that its complexity is affordable indeed.

To train the GP model by MLE, we need to calculate the likelihood function. According to (22), there are four items that need the inversion of the covariance matrix, including $K_{uu}^{-1}$, $K_{xx}^{-1}$, $\gamma$, and $Q$. As $K_{uu}$ is a block-diagonal matrix, we just need to compute the inversion of its blocks, leading to a complexity of $O(NT^3)$. Then, we save the Cholesky decomposition results of $K_{uu}$ to calculate $\gamma$ and $Q$, corresponding to a complexity of $O(NT^3)$. In total, the computational complexity of this model is $O(N^3 + NTT^2 + NT^3)$, where $O(N^3)$ comes from the computation of $K_{xx}^{-1}$.

Intuitively, the reason for this complexity is the conditional independence assumption described in Section III-A, i.e., each analysis curve is drawn from an independent GP prior conditioned on the final yields drawn from a global GP. Analysis curves are modeled independently by $N(Y_i, \text{I}_i, K_{tt}, \gamma_i, \mathbf{Q}_i)$, $i = 1, \ldots, n$ and their asymptotic values are jointly modeled by $N(\mathbf{m}, \mathbf{K}_{xx})$.

In the experiments, simulations are conducted in batch fashion and each batch contains 30 runs for the iterative yield analysis procedure. The maximum number of simulations allocated to one design is 1200, so the maximum length $T_{\text{max}} = 40$. Due to the freeze–thaw technique, most analysis curves have only short lengths. The average length is $\overline{T} = 4$ in our experiments, which is much smaller than $N$, as the average value of $N$ equals 141. If $\overline{T} \ll N$ is given, the computational complexity becomes $O(N^3)$, which is comparable with the conventional GP models.

V. EXPERIMENTAL RESULTS

In this section, the efficiency and efficacy of the proposed yield optimization approach will be demonstrated with four analog circuits: 1) comparator; 2) low noise amplifier; 3) three-stage amplifier; and 4) charge pump. We compare our method with two state-of-the-art methods [11], [12]. To average out the random fluctuations, all cases are executed ten times. Then, the yield of the obtained design $\mathbf{x}^*$ is estimated with 50,000 simulations in the process space to ensure the accuracy of yield analysis. Method FTBO represents the proposed method via freeze–thaw Bayesian optimization without prior knowledge. FTBO+ is the FTBO with prior knowledge transferred from nominal design. The total simulation budget of precomputation, i.e., nominal performance optimization, for FTBO+ is set to 300 ($N_{\text{pre}} = 100$) in Sections V-A–V-D. Considering the large design space, the budget in Section V-E is increased to 1800 ($N_{\text{pre}} = 600$). This precomputation cost has already been counted to the sampling number of FTBO+. All experiments are conducted on a Linux workstation with two Intel Xeon X5650 CPUs and 128-GB memory.

A. Comparator

The comparator is implemented in a 180-nm CMOS process with 1.8-V power supply, shown in Fig. 9. There are 12 design variables for this circuit, representing transistor widths. Both intradie and interdie process variations are considered. Three design specifications are listed as

$$\begin{align*}
V_{\text{off}} & \leq 30 \text{ mV} \\
V_{\text{sen}} & \leq 2 \text{ mV} \\
\text{speed} & \geq 1 \text{ GHz}
\end{align*}$$

where $V_{\text{off}}$ denotes the offset voltage with 200-MHz sampling frequency at 30 °C. $V_{\text{sen}}$ indicates the ability of comparator to distinguish input signals, and speed means the maximum operating frequency.

The proposed approach is compared with [11] and [12] under the same design space. Fig. 10 shows the obtained prior solutions, where $D_1$-axis, $D_2$-axis, and $D_3$-axis are the widths of transistor M17, M19, and M5, respectively, (see Fig. 9). We define the golden solutions as the designs with yields higher than 99%. We can see that some prior solutions are close to the golden solutions, resulting in an efficient warm start.

The quality and speed comparisons of the experimental results are presented in Table I. We take method [11] as the speed benchmark because it is faster than method [12]. All methods can obtain the golden yield. FTBO can gain a 4.86× speedup over [11], and FTBO+ further increases the speedup to 5.73× with the help of prior knowledge. Such experimental results verified the huge advantage of the proposed methods over the existing methods.

More specifically, in terms of the difference in the use of acquisition functions, the proposed method FTBO incorporating ES with wEI gained a 4.86× speedup over [11] which...
uses wEI only, and \(5.82\times\) speedup over [12] which uses ES only, demonstrating the effectiveness of the combination of wEI and ES in our method.

In each optimization iteration, our method needs \(249\) s for building the freeze–thaw GP model, \(47\) s for wEI optimization, and \(19\) s for ES calculation. So our method introduces \(6.42\% [19\ s/(249\ s + 47\ s)]\) extra time overhead in optimization as shown in Table II. Therefore, the impact of extra ES calculation on the whole optimization is negligible.

In this case, the number of iterations of yield optimization is \(327\), so the total optimization time is \(103\ 005\) s, i.e., \((249\ s + 47\ s + 19\ s) \times 327\). However, the number of invoked SPICE simulations is \(37408\), and each SPICE simulation needs \(15\) s, so the total SPICE simulation time is \(561\ 120\) s \((15\ s \times 37\ 408)\). Thus, the time of calculating wEI and ES in yield optimization is \(18.36\%\) of the SPICE simulation time, which accounts for a relatively small percentage, though the optimization algorithm is currently implemented with the slow Python language. Certainly, SPICE simulation time is highly related to the circuit type, circuit scale and simulation type, etc. Therefore, this ratio of optimization time over SPICE simulation time varies in a wide range.

Fig. 11 further reveals the advantages of the proposed method with the yield bins, where there are four yield bins on the \(x\)-axis, and the \(y\)-axis is the number of simulations located in the bins. Clearly, [11] can effectively control the analysis accuracy of low-yield designs \((\text{yield} \leq 95\%)\), i.e., applying coarse yield estimations at these designs with a small number of simulations. However, for high-yield designs, [11] wastes too many simulation resources. FTBO and FTBO+ reduce the simulation number in both low- and high-yield regions, especially in the high-yield region. Concretely, FTBO and FTBO+ reduce the simulation number in the low-yield region by \(65\%\) and \(68\%\), respectively. In the high-yield region, FTBO and FTBO+ further cut down the simulation number by \(83\%\) and \(87\%\). This result shows that FTBO and FTBO+ are more efficient in simulation resource allocation. Since the simulations allocated to high-yield designs account for more than \(80\%\) of the total, the reasonable allocation by freeze–thaw technique in the high-yield region contributes the most to the \(4.86\times\) and \(5.73\times\) speedup.

### B. Low Noise Amplifier

The second case is a low noise amplifier implemented in a 180-nm CMOS process with 3.3-V power supply. Fig. 12 shows the schematic of this radio frequency circuit. 13 design variables are considered in this case, including sizes of transistors, values of capacitors, resistors and inductors. Both intradie and interdie process variations are taken into account. Three specifications, including Gain, noise figure NF, and third-order
Fig. 13. Comparison of simulation resource allocation between [12] and the two proposed methods in the low noise amplifier case.

Table III lists the experimental quality and speed comparisons of optimizations. Method [12] is regarded as the benchmark this time for it is faster than method [11]. Again, all four methods can obtain the golden yield. FTBO and FTBO+ can gain 1.79× and 2.47× speedup over [12], respectively. The comparison of simulation resource allocation between [12] and the two proposed methods is shown in Fig. 13. A similar result can be found that both methods significantly cut down the simulation costs of all yield bins, particularly in the high-yield bin.

### C. Three-Stage Amplifier

The third case is a three-stage amplifier implemented in a 0.35-μm CMOS process [43], shown in Fig. 14. There exist 24 design variables corresponding to sizes of transistors, values of capacitors, resistors and biasings. Both intradie and interdie process variations are considered. Four specifications including gain margin GM, gain-bandwidth GBW, phase margin PM and quiescent current $I_q$ at 27 °C are listed as

$$\begin{align*}
\text{Gain} & \geq 20 \text{ dB} \\
\text{NF} & \leq 2.3 \text{ dB} \\
\text{IIP3} & \geq -10 \text{ dBm}.
\end{align*}$$

Table III lists the experimental quality and speed comparisons of optimizations. Method [12] is regarded as the benchmark this time for it is faster than method [11]. Again, all four methods can obtain the golden yield. FTBO and FTBO+ can gain $1.79 \times$ and $2.47 \times$ speedup over [12], respectively. The comparison of simulation resource allocation between [12] and the two proposed methods is shown in Fig. 13. A similar result can be found that both methods significantly cut down the simulation costs of all yield bins, particularly in the high-yield bin.

Table IV presents the experimental quality and speed comparisons of all four methods. Results of [11] and [12] are taken from [11] and [12], respectively. Method [11] is regarded as the benchmark for it is faster than method [12]. Again, FTBO and FTBO+ can achieve $2.92 \times$ and $3.22 \times$ speedup over [11]. Fig. 15 shows the comparison of simulation resource allocation between [11] and the two proposed methods.

In order to validate the effectiveness of the proposed method on more advanced technology, this three-stage-amplifier is ported to a 65-nm process with the same design space for yield optimization. The specifications become

$$\begin{align*}
\text{GM} & \geq 20 \text{ dB} \\
\text{GBW} & \geq 0.9 \text{ MHz} \\
\text{PM} & \geq 40^\circ \\
\text{I}_q & \leq 80 \mu\text{A}.
\end{align*}$$

Experimental quality and speed comparisons are shown in Table V. Method [12] is regarded as the benchmark this time.
Fig. 16. Comparison of simulation resource allocation between [12] and the two proposed methods in the three-stage amplifier (65 nm) case.

**TABLE V**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Yield</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Worst</td>
<td>99.08%</td>
<td>99.49%</td>
<td>99.41%</td>
<td>98.89%</td>
</tr>
<tr>
<td>Mean</td>
<td>99.62%</td>
<td>99.87%</td>
<td>99.83%</td>
<td><strong>99.94%</strong></td>
</tr>
<tr>
<td>Std. Dev</td>
<td>0.35%</td>
<td>0.16%</td>
<td>0.18%</td>
<td>0.03%</td>
</tr>
<tr>
<td><strong>Number of Simulations</strong></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Best</td>
<td>2058</td>
<td>1579</td>
<td>1728</td>
<td>1755</td>
</tr>
<tr>
<td>Worst</td>
<td>21190</td>
<td>22224</td>
<td>7183</td>
<td>3131</td>
</tr>
<tr>
<td>Mean</td>
<td>10228</td>
<td>6736</td>
<td>3006</td>
<td><strong>1952</strong></td>
</tr>
<tr>
<td><strong>Speedup</strong></td>
<td>0.66</td>
<td>1.00</td>
<td>2.24</td>
<td>3.45</td>
</tr>
</tbody>
</table>

Fig. 17. Schematic of the charge pump circuit.

for it is faster than method [11]. Similarly, FTBO and FTBO+ achieve 2.24× and 3.45× speedup over [12]. The comparison of simulation resource allocation between [12] and the two proposed methods is shown in Fig. 16.

**D. Charge Pump**

As shown in Fig. 17, the fourth case is a charge pump implemented in a 40-nm CMOS process. 36 design variables are considered in this case. Both intradie and interdie process variations are taken into account. Five specifications, including $\text{diff}_1$, $\text{diff}_2$, $\text{diff}_3$, $\text{diff}_4$, and $\text{deviation}$ are listed as:

\[
\begin{align*}
\text{diff}_1 &\leq 25 \mu A \\
\text{diff}_2 &\leq 25 \mu A \\
\text{diff}_3 &\leq 10 \mu A \\
\text{diff}_4 &\leq 10 \mu A \\
\text{deviation} &\leq 6 \mu A \\
\end{align*}
\]

where

\[
\begin{align*}
\text{diff}_1 &= I_{M1,\text{max}} - I_{M1,\text{avg}} \\
\text{diff}_2 &= I_{M1,\text{avg}} - I_{M1,\text{min}} \\
\text{diff}_3 &= I_{M2,\text{max}} - I_{M2,\text{avg}} \\
\text{diff}_4 &= I_{M2,\text{avg}} - I_{M2,\text{min}} \\
\text{deviation} &= |I_{M1,\text{avg}} - 40 \mu A| + |I_{M2,\text{avg}} - 40 \mu A|.
\end{align*}
\]

Table VI lists the experimental quality and speed comparisons of optimizations. Method [12] is regarded as the benchmark for it is faster than method [11]. All four methods can obtain the golden yield. Without loss of accuracy, FTBO and FTBO+ can gain 2.64× and 3.02× speedup over [12], respectively. As these results are similar to other experiments in Section V, this significant improvement validates the effectiveness of the proposed optimization approach on circuits in advance technology. The comparison of simulation resource allocation between [12] and the two proposed methods is shown in Fig. 18. Again, both methods significantly reduce the simulation costs of all yield bins, especially in the high-yield bin.

**E. Three-Stage Amplifier With Large Optimization Space**

Usually, in practical design, designers do not know exactly about the locations of the high-yield region. Hence, efficiently exploring a very large design space will be meaningful for designers. In the case of the three-stage amplifier in 0.35-μm process, we roughly expand 10× the optimization ranges for each of 24 design variables, and correspondingly the design space volume is expanded by more than $10^{24}$ times. The maximum number of iterations for all four methods is set to 1000.
The results in Table VII show that both [12] and FTBO fail to find the optimal design within the ten restarts. Method [11] has one successful result out of ten restarts, and the number of simulations is 17004. The FTBO+ can find the golden-yield designs at all ten restarts with an average of 4140 simulations is 17004. The FTBO has one successful result out of ten restarts, and the number of failure results demonstrated that the proposed method can gain a 2.47×10⁻⁵ speedup without loss of accuracy.

VI. CONCLUSION

In this article, a novel and efficient yield optimization approach was proposed for analog circuits. The yield analysis was integrated into the exploration process of Bayesian optimization. The freeze–thaw Bayesian optimization technique was utilized to automatically guide the search in the design space and gradually improve the accuracy in the process space. To further accelerate the yield optimization convergence, a novel performance optimization problem was formulated and solved to mine prior knowledge. Compared with the state-of-the-art methods, the experimental results demonstrated that the proposed method can gain a 2.47×10⁻⁵ speedup without loss of accuracy.

REFERENCES

Xiaodong Wang received the B.S. degree from the Department of Physics, Fudan University, Shanghai, China, in 2017, where he is currently pursuing the Ph.D. degree with the State Key Laboratory of Application Specific Integrated Circuits and System, Microelectronics Department. His current research interests include yield-related modeling and optimization.

Changhao Yan (Member, IEEE) received the B.E. and M.E. degrees from the Huazhong University of Science and Technology, Wuhan, China, in 1996 and 2002, respectively, and the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2006. He is currently a Professor with the School of Microelectronics, Fudan University, Shanghai, China. His current research interests include analog circuit automation, yield analysis, parasitic parameter extraction, and design for manufacturability.

Yuzhe Ma (Member, IEEE) received the B.E. degree from the Department of Microelectronics, Sun Yat-sen University, Guangzhou, China, in 2016, and the Ph.D. degree from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, in 2020. He is currently an Assistant Professor with Microelectronics Trust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou. His research interests include agile VLSI design methodologies, machine learning-aided VLSI design, and hardware-friendly machine learning.

Bei Yu (Member, IEEE) received the Ph.D. degree from the University of Texas at Austin, Austin, TX, USA, in 2014. He is currently an Associate Professor with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. Dr. Yu received the eight Best Paper Awards from ICCAD 2021 and 2013, ASPDAC 2021 and 2012, ICTAI 2019, Integration the VLSI Journal in 2018, ISPD 2017, SPIE Advanced Lithography Conference 2016, and six ICCAD/ISPD Contest Awards. He is an Editor of IEEE Technical Committee on Cyber-Physical Systems Newsletter. He has served as the TPC Chair for ACM/IEEE Workshop on Machine Learning for CAD and in many journal editorial boards and conference committees.

Fan Yang (Member, IEEE) received the B.S. degree from Xi’an Jiaotong University, Xi’an, China, in 2003, and the Ph.D. degree from Fudan University, Shanghai, China, in 2006. He is currently a Professor with the Microelectronics Department, Fudan University. His research interests include model order reduction, circuit simulation, high-level synthesis, yield analysis, and design for manufacturability.

Dian Zhou (Senior Member, IEEE) received the B.S. and Ph.D. degrees in electrical engineering from Fudan University, Shanghai, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 1990. He joined the University of North Carolina at Charlotte, Charlotte, NC, USA, as an Assistant Professor in 1990, where he became an Associate Professor in 1995. He joined the University of Texas at Dallas, Richardson, TX, USA, as a Full Professor in 1999. His research interests include high-speed VLSI systems, CAD tools, mixed-signal ICs, and algorithms.

Xuan Zeng (Senior Member, IEEE) received the B.S. degree in physics and the M.S. degree in electrical engineering from Fudan University, Shanghai, China, in 1982 and 1985, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 1990. He is currently a Full Professor with the Microelectronics Department, Fudan University, where she served as the Director of the State Key Laboratory of Application Specific Integrated Circuits (ASIC) and Systems from 2008 to 2012. She was a Visiting Professor with the Department of Electrical Engineering, Texas A&M University, College Station, TX, USA, and the Microelectronics Department, Technische Universiteit Delft, Delft, The Netherlands, in 2002 and 2003, respectively. Her current research interests include analog circuit modeling and synthesis, design for manufacturability, high-speed interconnect analysis and optimization, and circuit simulation.

Prof. Zeng received the Changjiang Distinguished Professor with the Ministry of Education Department of China in 2014, the Chinese National Science Funds for Distinguished Young Scientists in 2011, the First-Class of Natural Science Prize of Shanghai in 2012, the 10th For Women in Science Award in China in 2013, and the Shanghai Municipal Natural Science Peony Award in 2014. She received the Best Paper Award from the 8th IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference 2017. She is an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: EXPRESS BRIEFS, IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, and ACM Transactions on Design Automation of Electronic Systems.