A reconfigurable fabric for accelerating large-scale datacenter services

Andrew Putnam (Microsoft Research (United Kingdom)), Adrian M. Caulfield (Microsoft Research (United Kingdom)), Eric S. Chung (Microsoft Research (United Kingdom)), Derek Chiou (Microsoft (United States)), Kypros Constantinides (Microsoft (United States)), John Demme (Columbia University), Hadi Esmaeilzadeh (Microsoft (United States)), Jeremy Fowers (Microsoft Research (United Kingdom)), Gopi Prashanth Gopal (Microsoft Research (United Kingdom)), Jan Gray (Microsoft Research (United Kingdom)), Michael Haselman (Microsoft Research (United Kingdom)), Scott Hauck (Microsoft (United States)), Stephen Heil (Microsoft Research (United Kingdom)), Amir Hormati (Google (United States)), Joo-Young Kim (Microsoft Research (United Kingdom)), Sitaram Lanka (Microsoft Research (United Kingdom)), James R. Larus (Microsoft (United States)), Eric Peterson (Microsoft Research (United Kingdom)), Simon Pope (Microsoft Research (United Kingdom)), A. Gordon Smith (Microsoft Research (United Kingdom)), Jason Thong (Microsoft Research (United Kingdom)), Phillip Yi Xiao (Microsoft Research (United Kingdom)), Doug Burger (Microsoft Research (United Kingdom))
ACM SIGARCH Computer Architecture News
June 14, 2014

Abstract

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% (to 1.95 times its baseline rate) for a fixed latency distribution; alternatively, while maintaining equivalent throughput, it reduces the tail latency by 29%.
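To make the 6x8 2-D torus concrete, the sketch below shows one way to address neighbors and count links in such a fabric. It is a minimal illustration under stated assumptions, not the authors' implementation: the (row, col) labeling, the N/S/E/W link names, and the helpers `torus_neighbors`, `torus_links`, and `hop_distance` are all hypothetical, and the code models only the wrap-around topology, not the SAS cabling, PCIe interface, or any routing protocol. For scale, if every 48-server half-rack in the 1,632-server bed hosts one fabric instance, the bed contains 1,632 / 48 = 34 such tori.

```python
"""Sketch: neighbor addressing in a 6x8 2-D torus (one FPGA per server).

Assumptions (hypothetical, not taken from the paper): nodes are labeled by
(row, col) coordinates and the four torus links are named N, S, E, W.
"""

ROWS, COLS = 6, 8  # 6x8 torus = 48 FPGAs, one per server in a half-rack


def torus_neighbors(row: int, col: int, rows: int = ROWS, cols: int = COLS) -> dict:
    """Return the four directly wired neighbors of (row, col), wrapping at the edges."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"({row}, {col}) is outside a {rows}x{cols} torus")
    return {
        "N": ((row - 1) % rows, col),
        "S": ((row + 1) % rows, col),
        "W": (row, (col - 1) % cols),
        "E": (row, (col + 1) % cols),
    }


def torus_links(rows: int = ROWS, cols: int = COLS) -> set:
    """Enumerate each bidirectional link once, as an unordered pair of nodes."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for nbr in torus_neighbors(r, c, rows, cols).values():
                links.add(frozenset({(r, c), nbr}))
    return links


def hop_distance(a: tuple, b: tuple, rows: int = ROWS, cols: int = COLS) -> int:
    """Minimal hop count between two nodes; each dimension may wrap around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


if __name__ == "__main__":
    print(torus_neighbors(0, 0))        # corner node wraps to row 5 and column 7
    print(len(torus_links()))           # 48 nodes * 4 links / 2 ends
    print(hop_distance((0, 0), (3, 4))) # worst case: 6//2 + 8//2 hops
```

Because every node has exactly four links and each link has two ends, the 48-node torus has 48 × 4 / 2 = 96 links, and the wrap-around keeps the worst-case distance to 6/2 + 8/2 = 7 hops.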

