Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

Eric S. Chung (Microsoft Research (United Kingdom)), Jeremy Fowers (Microsoft Research (United Kingdom)), Kalin Ovtcharov (Microsoft Research (United Kingdom)), Michael Papamichael (Microsoft Research (United Kingdom)), Adrian M. Caulfield (Microsoft Research (United Kingdom)), Todd Massengill (Microsoft (Finland)), Ming Liu (Microsoft (Finland)), Daniel Lo (Microsoft Research (United Kingdom)), Shlomi Alkalay (Microsoft Research (United Kingdom)), Michael Haselman (Microsoft (Finland)), Maleen Abeydeera (Microsoft (Finland)), Logan Adams (Microsoft (Finland)), Hari Angepat (Microsoft Research (United Kingdom)), Christian Boehn (Microsoft (Finland)), Derek Chiou (Microsoft (Finland)), Oren Firestein (Microsoft (Finland)), Alessandro Forin (Microsoft (Finland)), Kang Su Gatlin (Microsoft Research (United Kingdom)), Mahdi Ghandi (Microsoft Research (United Kingdom)), Stephen Heil (Microsoft (Finland)), Kyle Holohan (Microsoft Research (United Kingdom)), Ahmad El Husseini (Microsoft Research (United Kingdom)), Tamás Juhász (Microsoft Research (United Kingdom)), Kara Kagi (Microsoft (Finland)), Ratna K. Kovvuri (Microsoft Research (United Kingdom)), Sitaram Lanka (Microsoft Research (United Kingdom)), Friedel van Megen (Microsoft (Finland)), Dima Mukhortov (Microsoft (Finland)), Prerak Patel (Microsoft (Finland)), Brandon Perez (Microsoft (Finland)), Amanda Rapsang (Microsoft Research (United Kingdom)), Steven K. Reinhardt (Microsoft Research (United Kingdom)), Bita Darvish Rouhani (Universidad Católica Santo Domingo), Adam Sapek (Microsoft Research (United Kingdom)), Raja Seera (Microsoft (Finland)), Sangeetha Shekar (Microsoft Research (United Kingdom)), Balaji Sridharan (Microsoft (Finland)), Gabriel Weisz (Microsoft Research (United Kingdom)), Lisa Woods (Microsoft (Finland)), Phillip Yi Xiao (Microsoft Research (United Kingdom)), Dan Zhang (Microsoft Research (United Kingdom)), Ritchie Zhao (Cornell University), Doug Burger (Microsoft (Finland))
IEEE Micro
March 1, 2018
Cited by 337

Abstract

To meet the computational demands required of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. Project Brainwave, Microsoft's principal infrastructure for AI serving in real time, accelerates deep neural network (DNN) inferencing in major services such as Bing's intelligent search features and Azure. Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiencies at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops (Tflops) of effective performance at Batch 1 on a state-of-the-art Intel Stratix 10 FPGA.

