| Title: | Correct-by-construction Microarchitectural Pipelining |
| Date: |
July 25, 2008 (Friday)
|
| Time: |
2:30 p.m. - 3:30 p.m.
|
| Venue: |
Room 121, 1/F, Ho Sin-hang Engineering Building,
The Chinese University of Hong Kong, Shatin, N.T. |
| Speaker: |
Dr. Timothy Kam
Principal Engineer Intel's Strategic CAD Laboratories USA |
To introduce more elements of science into system design and explore gradual introduction of latencies into high-level functional models used in ESL domain, we research on provably-correct design transformations applicable in microarchitecture development. In this work, we focus on microarchitectural pipelining which enhances system throughput without massive hardware duplication. Given an architectural or microarchitectural specification, our problem is to find a pipeline computing the same function, through a sequence of correct-by-construction transformation steps, such that its throughput performance is optimized. Our contribution is on pipelining of cyclic systems, specifically of logic in closed loop with register-file and/or memory.
The key impediment to pipelined processor performance is the stalling of the pipeline due to data dependency between different instructions. Data hazards are commonly resolved via two techniques: forwarding and stalling. Register operands can be forwarded from the results of earlier dependent instructions, via a register-file bypass network. It can be formally contructed via the known and useful bypass transform, formally defined as: Write-data from a previous cycle can be bypassed as read-data if the read-address is identical to the previous write-address. Beyond register-file bypassing, stalling via interlocked pipeline is crucial for additional throughput improvement.
Standard bypass and retiming transforms cannot pipeline cyclic systems, e.g. datapth in cyclic connection with register-file. We extend transform methods for handling systems with cycles (such as instruction set architecture ISA specifications) by using synchronous elastic systems and adding new correct-by-construction transformations that are valid for such systems. Synchronous Elastic Systems (aka Latency Tolerant Systems) can tolerate changes in latencies of computations and communications. In such systems it is possible to insert empty FIFOs, increase capacity of FIFOs, introduce early enabling of multiplexer nodes, and insert anti-tokens to kill irreleveant activity in the machine. We demonstrate how using these transformations in addition to standard bypass and retiming allows one to pipeline cyclic systems in a correct-by-construction fashion. Specifically our implementation of stalling relies on elastic interfaces between pipe stages, and can be constructed naturally using trusted transforms. Instead of implementing a monolithic stall controller, the elastic control generated is fully distributed alongside the pipeline datapath. Manual crafting of bypass network and stall controller can be laborous and error prone. Our formal method enables user-guided pipeline construction which is provably-correct and as compact as a manual design for our ISA testcase.
Pipelining with bypassing is a ubiquious technique applicable to SoC and ESL designs. We have illustrated our transformation techniques using simple ISA specifications because they are clear and well understandable examples. In a cycle, it fetches and executes an arithmetic (reg-reg) instruction or a memory instruction. First elastic bypass transform is applied multiple times on the register-file to construct a complete bypass network. Bypass paths are dynamically multiplexed by inter-instruction data dependency through an early evaluation elastic multiplexor. To pipeline the execution datapath via retiming, registers need to be created on some bypass paths together with anti-token insertions. The anti-tokens along each bypass path correspond to the minimum stall bubbles required for the dependency case.
BIOGRAPHY:
Timothy Kam received the B.Sc. degree (with first class honors) from the University College, University of London, London, U.K., in 1986, and the M.S. and Ph.D. degrees from the University of California, Berkeley, in 1990 and 1995, respectively, all in electrical engineering and computer sciences.
Between 1986 and 1989, he was an IC Design Engineer at Motorola. He has spent summers working at AT&T Bell Laboratories and Hewlett Packard. Since 1995, he has contributed research and leadership at Intel's Strategic CAD Laboratories, where he is currently a Principal Engineer. He coauthored two technical books: "Synthesis of FSMs: functional optimization" and "Synthesis of FSMs: logic optimization." His research interests include high-level design of microarchitectures, logic synthesis, and verification of integrated circuits. He holds two U.S. patents.
Dr. Kam was the recipient of a best paper award at the 36th Design Automation Conference. He received the SRC Mahboob Khan Outstanding Mentor Award in 2004. He was the General Chair of the IEEE International Workshop on Logic and Synthesis, in 2001. He serves on numerous technical program committees.
Enquiries: Miss Temmy So at tel 2609 8444
For more information, please refer to http://www.cse.cuhk.edu.hk/seminar