U. Ko

A 65-nm Mobile Multimedia Applications Processor with an Adaptive Power Management Scheme to Compensate for Variations

H. Mair, A. Wang, Gordon Gammie et al.|Unknown|2007

Cited by 65

In this paper we present the SmartReflex ™ power management techniques implemented on the OMAP3430 Mobile Multimedia Applications Processor. By using multiple voltage domains, fine grain power domains, split-rail memories, and adaptive compensation, SoC active power reduction of 66% and leakage power reduction of 2~3 orders of magnitude was achieved. OMAP3430 contains more than 150M transistors.

A 28 nm 0.6 V Low Power DSP for Mobile Applications

Nathan Ickes, Gordon Gammie, Mahmut E. Sinangil et al.|IEEE Journal of Solid-State Circuits|2011

Cited by 59

Processors for next generation mobile devices will need to operate across a wide supply voltage range in order to support both high performance and high power efficiency modes of operation. However, the effects of local transistor threshold ( V T ) variation, already a significant issue in today's advanced process technologies, and further exacerbated at low voltages, complicate the task of designing reliable, manufacturable systems for ultra-low voltage operation. In this paper, we describe a 4-issue VLIW DSP system-on-chip (SoC), which operates at voltages from 1.0 V down to 0.6 V. The SoC was implemented in 28 nm CMOS, using a cell library and SRAMs optimized for both high-speed and low-voltage operating points. A new statistical static timing analysis (SSTA) methodology was also used on this design, in order to more accurately model the effects of local V T variation and achieve a reliable design with minimal pessimism.

Energy optimization of multilevel cache architectures for RISC and CISC processors

U. Ko, Poras T. Balsara, A.K. Nanda|IEEE Transactions on Very Large Scale Integration (VLSI) Systems|1998

Cited by 59

In this paper, we present the characterization and design of energy-efficient, on chip cache memories. The characterization of power dissipation in on-chip cache memories reveals that the memory peripheral interface circuits and bit array dissipate comparable power. To optimize performance and power in a processor's cache, a multidivided module (MDM) cache architecture is proposed to conserve energy in the bit array as well as the memory peripheral circuits. Compared to a conventional, nondivided, 16-kB cache, the latency and power of the MDM cache are reduced by a factor of 1.9 and 4.6, respectively. Based on the MDM cache architecture, the energy efficiency of the complete memory hierarchy is analyzed with respect to cache parameters in a multilevel processor cache design. This analysis was conducted by executing the SPECint92 benchmark programs with the miss ratios for reduced instruction set computer (RISC) and complex instruction set computer (CISC) machines.

A repeater optimization methodology for deep sub-micron, high-performance processors

D. Li, A. Pua, P. Srivastava et al.|Unknown|1997

Cited by 36

As process technology scales down to deep sub-micron and the frequency of a high-performance processor increases beyond 300 MHz, coupling induced signal integrity problems become more severe. Ignoring coupling effects can lead to functional failures or speed degradation. As a result, the traditional approach of repeater insertion driven by propagation delay and slew rate optimization becomes inadequate. The authors propose a design methodology to select optimal repeaters for high-performance processors by considering not only the delay and slew rate, but also crosstalk effects. A concurrent decision diagram (CDD) is further suggested to achieve crosstalk constraints with various trade-offs.

A self-timed method to minimize spurious transitions in low power CMOS circuits

U. Ko, Poras T. Balsara, W. Lee|Unknown|2002

Cited by 21

Spurious transitions and associated power are inherent disadvantages of a static logic design. Though pre-charged dynamic logic has the advantage of one valid transition per clock cycle, it has a considerable power overhead . In this paper, a low power self-timed double pass-gate logic (DPL) circuit combining the merits of dynamic and static logic families is proposed to minimize power in a 32-bit carry look-ahead static adder. This technique can be applied to any static circuit implementation, at any level of design hierarchy where power and performance are important. For a 100 MHz, 32-bit adder implementation in a 0.6 /spl mu/m CMOS technology results on output spurious transition density, total power dissipation and energy efficiency for different loads are presented.

Is this you? Claim your profile.

Top publicationsby citations