
Jan Gray

Edith Cowan University

Publishes on Education Systems and Policy, Parallel Computing and Optimization Techniques, Interconnection Networks and Systems. 59 papers and 2.5k citations.

59 Publications
2.5k Total Citations

Top publications by citations

A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung et al.|ACM SIGARCH Computer Architecture News|2014
Cited by 709

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% for a fixed latency distribution; alternatively, while maintaining equivalent throughput, it reduces the tail latency by 29%.
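The torus wiring described above can be made concrete with a short sketch. It assumes each FPGA is addressed by a hypothetical (row, column) coordinate and linked to its four wrap-around neighbors; it does not model the SAS cabling, PCIe attachment, or any routing logic in the actual fabric. The deployment arithmetic at the end additionally assumes that all 1,632 servers are grouped into full 48-machine half-racks.

```python
# Sketch of a 6x8 2-D torus: 48 FPGAs, one per server in a half-rack.
# Coordinates and helper names are hypothetical, for illustration only.
ROWS, COLS = 6, 8


def torus_neighbors(r, c, rows=ROWS, cols=COLS):
    """Four direct neighbors of FPGA (r, c); links wrap around at the edges."""
    return {
        "north": ((r - 1) % rows, c),
        "south": ((r + 1) % rows, c),
        "west": (r, (c - 1) % cols),
        "east": (r, (c + 1) % cols),
    }


def torus_links(rows=ROWS, cols=COLS):
    """Set of undirected FPGA-to-FPGA links, each stored exactly once."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            for nbr in torus_neighbors(r, c, rows, cols).values():
                links.add(frozenset({(r, c), nbr}))
    return links


def torus_hops(a, b, rows=ROWS, cols=COLS):
    """Minimum hop count, taking the shorter way around each ring."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


nodes = [(r, c) for r in range(ROWS) for c in range(COLS)]
diameter = max(torus_hops(a, b) for a in nodes for b in nodes)
# Assumes every server belongs to a full 48-node fabric instance.
half_racks = 1632 // len(nodes)

print(len(nodes), len(torus_links()), diameter, half_racks)  # 48 96 7 34
```

Wrap-around links are what keep the worst-case distance low: any FPGA reaches any other in at most 3 + 4 = 7 hops, whereas a 6×8 mesh without wrap-around would need up to 5 + 7 = 12.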

A reconfigurable fabric for accelerating large-scale datacenter services
Cited by 570Open Access

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
Cited by 209Open Access

To advance datacenter capabilities beyond what commodity server designs can provide, the authors designed and built a composable, reconfigurable fabric to accelerate large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end field-programmable gate arrays (FPGAs) embedded into a half-rack of 48 servers. The authors deployed the reconfigurable fabric on a bed of 1,632 servers and FPGAs in a production datacenter and successfully used it to accelerate the ranking portion of the Bing web search engine by nearly a factor of two.

A ‘Formidable Challenge’: Australia's Quest for Equity in Indigenous Education
Jan Gray, Quentin Beresford|Australian Journal of Education|2008
Cited by 156

Indigenous education in Australia has been the subject of ongoing policy focus and repeated official inquiry as the nation grapples with achieving equity for these students. Perspectives from recent developments in the USA and Canada highlight the similarity of the challenges. The article draws on social theory from multiple disciplines to examine the underlying causes of the plateau in progress in this area. It argues that the lack of progress reflects a complex set of underlying factors, many of which are under-acknowledged in educational debates. The examination points to the need for a new governance model for Indigenous education involving both horizontal and vertical policy-making structures.

Hoplite: Building austere overlay NoCs for FPGAs
Nachiket Kapre, Jan Gray|Unknown|2015
Cited by 120Open Access

Customized unidirectional, bufferless, deflection-routed torus networks can outperform classic, bidirectional, buffered mesh networks for single-flit-oriented FPGA applications by as much as 1.5× (best achievable throughputs for a 10×10 system) or 2.5× (allocating the same FPGA resources to both NoCs) for uniform random traffic. We present Hoplite, an efficient, lightweight, fast FPGA overlay NoC that is designed to be small and compact by (1) eliminating input buffers, and (2) reducing the cost of switch crossbars, which have traditionally limited speeds and imposed heavy resource costs in conventional FPGA overlay NoCs. We implement bufferless deflection routing cheaply, requiring the generation of only output multiplexer controls and no backpressure handshakes. Additionally, we use directional channels that help reduce crossbar cost by restricting the number of inputs to the crossbar to three instead of four. When compared to buffered mesh switches, FPGA-based deflection routers are ≈3.5× smaller (HLS-generated switch) and 2.5× faster (clock period) for 32b payloads. In a separate experiment, we hand-crafted a prototype RTL version of our switch with RLOCs (relative location constraints) that requires only 60 LUTs and 100 FFs per router and runs at a 2.9 ns clock period.
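The routing idea in this abstract can be illustrated with a minimal cycle-level sketch: a unidirectional k×k torus (X ring flowing east, Y ring flowing south), dimension-ordered X-then-Y routing, no input buffers, and deflection instead of backpressure. Each router reads three inputs (West, North, local injection). The priority order (Y-ring packets first, then X-ring packets, then local injection), the separate local-delivery output, and all names (`Packet`, `step`, `simulate`) are assumptions chosen for illustration; this is not the authors' HLS or RTL design.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    pid: int
    src: tuple[int, int]
    dest: tuple[int, int]  # (row, col)


def step(k, east, south, queues, delivered, cycle):
    """Advance all routers one cycle. east/south map a router to the packet
    it placed on its outgoing East/South link during the previous cycle."""
    new_east, new_south = {}, {}
    for r in range(k):
        for c in range(k):
            here = (r, c)
            w_pkt = east.get((r, (c - 1) % k))   # X-ring input (from west)
            n_pkt = south.get(((r - 1) % k, c))  # Y-ring input (from north)
            e_out = s_out = local_out = None
            # 1. Y-ring packets have top priority and never deflect.
            if n_pkt is not None:
                if n_pkt.dest == here:
                    local_out = n_pkt
                else:
                    s_out = n_pkt
            # 2. X-ring packets take what is left; a loser deflects East.
            if w_pkt is not None:
                if w_pkt.dest == here and local_out is None:
                    local_out = w_pkt
                elif w_pkt.dest[1] == c and w_pkt.dest != here and s_out is None:
                    s_out = w_pkt  # turn onto the Y ring (X-then-Y order)
                else:
                    e_out = w_pkt  # keep going East, possibly as a deflection
            # 3. Local injection has lowest priority: it stalls, never drops.
            queue = queues[here]
            if queue:
                wants_south = queue[0].dest[1] == c
                if wants_south and s_out is None:
                    s_out = queue.popleft()
                elif not wants_south and e_out is None:
                    e_out = queue.popleft()
            if local_out is not None:
                delivered.append((local_out, here, cycle))
            if e_out is not None:
                new_east[here] = e_out
            if s_out is not None:
                new_south[here] = s_out
    return new_east, new_south


def simulate(k=4, packets_per_node=4, seed=0, max_cycles=10_000):
    """Inject a fixed batch of uniform-random packets; run until all arrive."""
    rng = random.Random(seed)
    queues = {(r, c): deque() for r in range(k) for c in range(k)}
    total = 0
    for src, queue in queues.items():
        for _ in range(packets_per_node):
            dest = src
            while dest == src:  # no self-addressed packets
                dest = (rng.randrange(k), rng.randrange(k))
            queue.append(Packet(total, src, dest))
            total += 1
    east, south, delivered = {}, {}, []
    for cycle in range(max_cycles):
        if len(delivered) == total:
            return delivered, cycle
        east, south = step(k, east, south, queues, delivered, cycle)
    raise RuntimeError("not all packets delivered within max_cycles")
```

For example, `delivered, cycles = simulate(k=4)` returns every delivered packet with the router and cycle at which it arrived, plus the number of cycles needed to drain the batch. Because every packet already in the network always has some output it may take (a Y-ring packet its South or local output, an X-ring packet at worst its East output), no input buffer or backpressure handshake is needed; only local injection ever waits.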