Aging-Aware SoC Design
Principal Investigator: Qiang Xu
Graduate Student: Lin Huang
Today's systems-on-a-chip (SoCs) typically contain one or more embedded processors. While the relentless scaling of CMOS technology has given such processor-based systems enhanced functionality and improved performance with every new generation, the associated ever-increasing on-chip power and temperature densities have made failure mechanisms such as electromigration and time-dependent dielectric breakdown (TDDB) serious threats to the lifetime reliability of embedded processors. Industrial studies have shown that the failure rates of electronic systems within their warranty period can be very high (as high as 16% for the Xbox), and the main cause of such high failure rates was traced to overheating of the embedded processors.
Most prior work assumed an exponential lifetime distribution for failure mechanisms. In other words, the failure rate of the circuit is assumed to depend only on its instantaneous operating conditions (e.g., temperature and voltage), independent of its usage history. This assumption is clearly inaccurate: a typical wear-out failure mechanism has a failure rate that increases as the circuit ages.
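To illustrate the difference, the sketch below compares the hazard (instantaneous failure rate) of an exponential lifetime with that of a Weibull lifetime, a distribution commonly used for wear-out mechanisms. The Weibull model and the parameter values are illustrative assumptions, not data from this project. With shape parameter beta > 1 the Weibull hazard grows with age; with beta = 1 it reduces to the constant exponential case.

```python
import math


def exponential_hazard(t: float, rate: float) -> float:
    """Hazard of an exponential lifetime: constant, independent of age t."""
    return rate


def weibull_hazard(t: float, alpha: float, beta: float) -> float:
    """Hazard h(t) = (beta / alpha) * (t / alpha) ** (beta - 1) of a Weibull lifetime.

    alpha is the scale parameter and beta the shape parameter.
    beta > 1 models wear-out: the failure rate increases as the circuit ages.
    """
    return (beta / alpha) * (t / alpha) ** (beta - 1)


if __name__ == "__main__":
    # Hypothetical parameters chosen only for illustration.
    alpha, beta = 10.0, 2.0  # scale (years) and shape (> 1 means wear-out)
    for t in (1.0, 5.0, 10.0):
        print(f"t={t:4.1f}  exponential h={exponential_hazard(t, 0.1):.3f}  "
              f"weibull h={weibull_hazard(t, alpha, beta):.3f}")
```

Under these assumed parameters, the exponential hazard stays at 0.1 regardless of age, while the Weibull hazard rises from 0.02 at t = 1 to 0.2 at t = 10, which is the history-dependent behavior the exponential assumption cannot capture.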
In this project, we plan to target failure mechanisms with arbitrary lifetime distributions so that the aging effects of these mechanisms can be taken into account. The basic idea is to introduce a so-called aging rate concept that "hides" the impact of the SoC's reliability-related usage strategies behind a single value, and then to use this value to express the reliability function.
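One way to make the aging rate idea concrete is sketched below, under assumptions that are ours rather than the project's: the failure mechanism follows a Weibull distribution with a fixed shape parameter beta, each operating condition i (e.g., a temperature/voltage state) changes only the scale parameter alpha_i, and the usage strategy is a periodic schedule spending a fraction p_i of time in state i. Under a cumulative-exposure model, and with a schedule period short relative to the lifetime, the aging rate can be taken as Omega = sum_i p_i / alpha_i, giving R(t) = exp(-(Omega * t) ** beta). The names `aging_rate` and `reliability` are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical usage strategy: a periodic schedule of (time_fraction, alpha) pairs,
# where alpha is the assumed Weibull scale parameter under that operating condition.
State = Tuple[float, float]


def aging_rate(schedule: Sequence[State]) -> float:
    """Assumed definition: time-weighted average of 1 / alpha over the schedule."""
    total = sum(frac for frac, _ in schedule)
    if not math.isclose(total, 1.0):
        raise ValueError("time fractions must sum to 1")
    return sum(frac / alpha for frac, alpha in schedule)


def reliability(t: float, omega: float, beta: float) -> float:
    """Weibull reliability R(t) = exp(-(omega * t) ** beta) for aging rate omega."""
    return math.exp(-((omega * t) ** beta))


if __name__ == "__main__":
    # Illustrative schedule: 30% of the time in a hot state, 70% in a cool state.
    schedule = [(0.3, 5.0), (0.7, 20.0)]
    omega = aging_rate(schedule)
    for t in (1.0, 5.0, 10.0):
        print(f"t={t:4.1f}  R(t)={reliability(t, omega, beta=2.0):.4f}")
```

In this sketch the entire usage strategy enters the reliability function only through omega, so any two schedules with the same aging rate yield the same reliability curve; this is the sense in which a single value can "hide" the usage strategy.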
Papers and Presentations
Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms, accepted for publication in Proc. IEEE/ACM Design, Automation, and Test in Europe (DATE), Apr. 2009.
On Modeling the Lifetime Reliability of Homogeneous Manycore Systems, accepted for publication in Proc. IEEE Pacific Rim International Symposium on Dependable Computing (PRDC), Dec. 2008.