Joo-Young Kim

SK Group (United States)

ORCID: 0000-0001-5396-8961

Publishes on Parallel Computing and Optimization Techniques, Interconnection Networks and Systems, Educational Systems and Policies. 87 papers and 5.1k citations.

Top publications by citations

Metaverse beyond the hype: Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy
Yogesh K. Dwivedi, Laurie Hughes, Abdullah M. Baabdullah et al.|International Journal of Information Management|2022
Cited by 2.6k|Open Access

The metaverse has the potential to extend the physical world using augmented and virtual reality technologies allowing users to seamlessly interact within real and simulated environments using avatars and holograms. Virtual environments and immersive games (such as, Second Life, Fortnite, Roblox and VRChat) have been described as antecedents of the metaverse and offer some insight to the potential socio-economic impact of a fully functional persistent cross platform metaverse. Separating the hype and “meta…” rebranding from current reality is difficult, as “big tech” paints a picture of the transformative nature of the metaverse and how it will positively impact people in their work, leisure, and social interaction. The potential impact on the way we conduct business, interact with brands and others, and develop shared experiences is likely to be transformational as the distinct lines between physical and digital are likely to be somewhat blurred from current perceptions. However, although the technology and infrastructure does not yet exist to allow the development of new immersive virtual worlds at scale - one that our avatars could transcend across platforms, researchers are increasingly examining the transformative impact of the metaverse. Impacted sectors include marketing, education, healthcare as well as societal effects relating to social interaction factors from widespread adoption, and issues relating to trust, privacy, bias, disinformation, application of law as well as psychological aspects linked to addiction and impact on vulnerable people. This study examines these topics in detail by combining the informed narrative and multi-perspective approach from experts with varied disciplinary backgrounds on many aspects of the metaverse and its transformational impact. The paper concludes by proposing a future research agenda that is valuable for researchers, professionals and policy makers alike.

A reconfigurable fabric for accelerating large-scale datacenter services
Andrew Putnam, Adrian M. Caulfield, Eric S. Chung et al.|ACM SIGARCH Computer Architecture News|2014
Cited by 709

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6x8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.

A reconfigurable fabric for accelerating large-scale datacenter services
Cited by 570|Open Access

Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the fabric consists of a 6×8 2-D torus of high-end Stratix V FPGAs embedded into a half-rack of 48 machines. One FPGA is placed into each server, accessible through PCIe, and wired directly to other FPGAs with pairs of 10 Gb SAS cables. In this paper, we describe a medium-scale deployment of this fabric on a bed of 1,632 servers, and measure its efficacy in accelerating the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system when ranking candidate documents. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% for a fixed latency distribution, or, while maintaining equivalent throughput, reduces the tail latency by 29%.

A cloud-scale acceleration architecture
Cited by 432

Hyperscale datacenter providers have struggled to balance the growing need for specialized hardware (efficiency) with the economic benefits of homogeneity (manageability). In this paper we propose a new cloud architecture that uses reconfigurable logic to accelerate both network plane functions and applications. This Configurable Cloud architecture places a layer of reconfigurable logic (FPGAs) between the network switches and the servers, enabling network flows to be programmably transformed at line rate, enabling acceleration of local applications running on the server, and enabling the FPGAs to communicate directly, at datacenter scale, to harvest remote FPGAs unused by their local servers. We deployed this design over a production server bed, and show how it can be used for both service acceleration (Web search ranking) and network acceleration (encryption of data in transit at high speeds). This architecture is much more scalable than prior work, which used secondary rack-scale networks for inter-FPGA communication. By coupling to the network plane, direct FPGA-to-FPGA messages can be achieved at comparable latency to previous work, without the secondary network. Additionally, the scale of direct inter-FPGA messaging is much larger. The average round-trip latencies observed in our measurements among 24, 1000, and 250,000 machines are under 3, 9, and 20 microseconds, respectively. The Configurable Cloud architecture has been deployed at hyperscale in Microsoft's production datacenters worldwide.

Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
Cited by 315

Recent breakthroughs in the development of multi-layer convolutional neural networks have led to state-of-the-art improvements in the accuracy of non-trivial recognition tasks such as large-category image classification and automatic speech recognition [1]. These many-layered neural networks are large, complex, and require substantial computing resources to train and evaluate [2]. Unfortunately, these demands come at an inopportune moment due to the recent slowing of gains in commodity processor performance. Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency. To harness specialization, an effort is underway at Microsoft to accelerate Deep Convolutional Neural Networks (CNN) using servers augmented with FPGAs, similar to the hardware that is being integrated into some of Microsoft’s datacenters [3]. Initial efforts to implement a single-node CNN accelerator on a mid-range FPGA show significant promise, resulting in respectable performance relative to prior FPGA designs and high-end GPGPUs, at a fraction of the power. In the future, combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.

Background

State-of-the-art deep convolutional neural networks are typically organized into alternating convolutional and max-pooling neural network layers followed by a number of dense, fully-connected layers, as illustrated in the well-known topology by Krizhevsky et al. in Figure 1 [1]. Each 3D volume represents an input to a layer, and is transformed into a new 3D volume feeding the subsequent layer. In the example below, there are five convolutional layers, three max-pooling layers, and three fully-connected layers.

Figure 1. Example of Deep Convolutional Neural Network for Image Classification. Image source: [1].
1 General Purpose Computing on Graphics Processing Units, Field Programmable Gate Arrays, Application-Specific Integrated Circuits.