A 65-nm Mobile Multimedia Applications Processor with an Adaptive Power Management Scheme to Compensate for Variations. In this paper, we present the SmartReflex™ power management techniques implemented on the OMAP3430 Mobile Multimedia Applications Processor. By using multiple voltage domains, fine-grain power domains, split-rail memories, and adaptive compensation, an SoC active power reduction of 66% and a leakage power reduction of 2 to 3 orders of magnitude were achieved. The OMAP3430 contains more than 150M transistors.
A 28 nm 0.6 V Low Power DSP for Mobile Applications. Nathan Ickes, Gordon Gammie, Mahmut E. Sinangil et al.|IEEE Journal of Solid-State Circuits|2011. Processors for next generation mobile devices will need to operate across a wide supply voltage range in order to support both high performance and high power efficiency modes of operation. However, the effects of local transistor threshold (V_T) variation, already a significant issue in today's advanced process technologies, and further exacerbated at low voltages, complicate the task of designing reliable, manufacturable systems for ultra-low voltage operation. In this paper, we describe a 4-issue VLIW DSP system-on-chip (SoC), which operates at voltages from 1.0 V down to 0.6 V. The SoC was implemented in 28 nm CMOS, using a cell library and SRAMs optimized for both high-speed and low-voltage operating points. A new statistical static timing analysis (SSTA) methodology was also used on this design, in order to more accurately model the effects of local V_T variation and achieve a reliable design with minimal pessimism.
Energy optimization of multilevel cache architectures for RISC and CISC processors. U. Ko, Poras T. Balsara, A.K. Nanda|IEEE Transactions on Very Large Scale Integration (VLSI) Systems|1998. In this paper, we present the characterization and design of energy-efficient, on-chip cache memories. The characterization of power dissipation in on-chip cache memories reveals that the memory peripheral interface circuits and the bit array dissipate comparable power. To optimize performance and power in a processor's cache, a multidivided module (MDM) cache architecture is proposed to conserve energy in the bit array as well as in the memory peripheral circuits. Compared to a conventional, nondivided, 16-kB cache, the latency and power of the MDM cache are reduced by factors of 1.9 and 4.6, respectively. Based on the MDM cache architecture, the energy efficiency of the complete memory hierarchy is analyzed with respect to cache parameters in a multilevel processor cache design. This analysis was conducted by executing the SPECint92 benchmark programs with the miss ratios for reduced instruction set computer (RISC) and complex instruction set computer (CISC) machines.
A repeater optimization methodology for deep sub-micron, high-performance processors. D. Li, A. Pua, P. Srivastava et al.|Unknown|1997. As process technology scales down to deep sub-micron and the frequency of a high-performance processor increases beyond 300 MHz, coupling-induced signal integrity problems become more severe. Ignoring coupling effects can lead to functional failures or speed degradation. As a result, the traditional approach of repeater insertion driven by propagation delay and slew rate optimization becomes inadequate. The authors propose a design methodology to select optimal repeaters for high-performance processors by considering not only the delay and slew rate, but also crosstalk effects. A concurrent decision diagram (CDD) is further suggested to achieve crosstalk constraints with various trade-offs.
A self-timed method to minimize spurious transitions in low power CMOS circuits. Spurious transitions and their associated power are inherent disadvantages of a static logic design. Though pre-charged dynamic logic has the advantage of one valid transition per clock cycle, it has a considerable power overhead. In this paper, a low power self-timed double pass-gate logic (DPL) circuit combining the merits of dynamic and static logic families is proposed to minimize power in a 32-bit carry look-ahead static adder. This technique can be applied to any static circuit implementation, at any level of design hierarchy where power and performance are important. Results on output spurious transition density, total power dissipation, and energy efficiency for different loads are presented for a 100 MHz, 32-bit adder implementation in a 0.6 μm CMOS technology.