ADT-FSE: A New Encoder for SZ
Tao Lü, Yu Zhong, Z. Sun et al.|Unknown|2023
SZ is a lossy floating-point data compressor that excels in compression ratio and throughput for high-performance computing (HPC), time series databases, and deep learning applications. However, SZ performs poorly on small chunks and decompresses slowly. We pinpoint the Huffman tree in the quantization factor encoder as the bottleneck of SZ. In this paper, we propose ADT-FSE, a new quantization factor encoder for SZ. Based on the Gaussian distribution of quantization factors, we design an adaptive data transcoding (ADT) scheme that maps quantization factors to codes for better compressibility, and then use finite state entropy (FSE) to compress the codes. Experiments show that ADT-FSE improves the quantization factor compression ratio, compression throughput, and decompression throughput by up to 5×, 2×, and 8×, respectively, over the original SZ Huffman encoder. On average, SZ_ADT (SZ with ADT-FSE) is over 2× faster than ZFP in decompression. Case studies of the TDengine time series database and HDF5 file store confirm that SZ_ADT significantly boosts user-perceived application performance. In addition, ADT-FSE makes the compression ratio of SZ_ADT easy to predict accurately, and has the potential to dramatically reduce the area of SZ hardware implementations.
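The abstract does not spell out the ADT mapping. A minimal sketch of one plausible transcoding, under the assumption that quantization factors cluster around a known center (an SZ-style quantization radius, here assumed to be 32768): each factor is centered, zigzag-mapped to a non-negative integer, and split into a small "class" symbol (its bit length) plus raw extra bits. Only the class stream would be handed to an entropy coder such as FSE, which keeps the alphabet and coding tables small. The functions `zigzag`, `transcode`, and `detranscode` are hypothetical illustrations, not the paper's scheme, and FSE itself is not implemented here.

```python
import math
import random
from collections import Counter


def zigzag(v: int) -> int:
    """Map signed ints to non-negative ints: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def transcode(factors, center):
    """Hypothetical transcoding: split each factor into a class symbol
    (bit length of its zigzag value) and raw extra bits."""
    classes, extras = [], []
    for q in factors:
        z = zigzag(q - center)
        c = z.bit_length()  # 0 when z == 0, otherwise z is in [2^(c-1), 2^c)
        classes.append(c)
        # Offset within the class; fits in c - 1 raw bits.
        extras.append(z - (1 << (c - 1)) if c > 0 else 0)
    return classes, extras


def detranscode(classes, extras, center):
    """Rebuild the original factors from class symbols and extra bits."""
    out = []
    for c, e in zip(classes, extras):
        z = 0 if c == 0 else (1 << (c - 1)) + e
        out.append(unzigzag(z) + center)
    return out


def entropy_bits(symbols):
    """Zero-order empirical entropy in bits per symbol."""
    n = len(symbols)
    return -sum(k / n * math.log2(k / n) for k in Counter(symbols).values())


if __name__ == "__main__":
    random.seed(0)
    CENTER = 32768  # assumed SZ-style quantization radius
    factors = [CENTER + round(random.gauss(0, 3)) for _ in range(10_000)]
    classes, extras = transcode(factors, CENTER)
    assert detranscode(classes, extras, CENTER) == factors  # lossless round trip
    extra_bits = sum(c - 1 for c in classes if c > 0) / len(classes)
    print(f"distinct factors: {len(set(factors))}, distinct classes: {len(set(classes))}")
    print(f"H(factors) = {entropy_bits(factors):.2f} bits/symbol")
    print(f"H(classes) + raw extra bits = {entropy_bits(classes) + extra_bits:.2f} bits/symbol")
```

The printout makes the trade-off visible: the class alphabet is much smaller than the set of raw factor values, at the cost of storing some bits uncoded. Whether ADT makes the same trade-off, or adapts the split to the measured Gaussian spread, is not stated in the abstract.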
HA-CSD: Host and SSD Coordinated Compression for Capacity and Performance
Integrating data compression into SSDs has shown great potential to improve the utilization and lifetime of the storage device as well as the performance of the entire system. Adding a hardware engine to the SSD has been advocated for low-latency compression and decompression. However, this requires a new and lengthy hardware product development cycle, which would keep current storage systems from reaping the benefits of in-SSD compression. In this paper, we explore a software-based in-SSD compression solution that can be delivered to users quickly through a simple SSD firmware update. The most critical challenge is the severe performance bottleneck caused by compression and decompression, as the embedded CPU inside the SSD has quite limited computing power. To tackle this challenge, we propose a host-assisted computational storage device called HA-CSD. It employs an offline compression strategy, aware of data hotness and compressibility, to remove compression from the critical write I/O path. A novel decompression architecture uses the powerful host CPU for fast decompression. We implement HA-CSD in a commercial enterprise SSD, changing more than 25K lines of code in the host NVMe driver and SSD firmware. Experimental results show that HA-CSD achieves 2.1 GB/s read and 5.2 GB/s write bandwidth. Compared with RocksDB's built-in compression, HA-CSD increases YCSB benchmark throughput by up to 5.7× and significantly improves host CPU efficiency.
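The abstract describes an offline, hotness- and compressibility-aware policy but not its exact rules. A minimal sketch of such a policy, assuming that writes land uncompressed and a background pass later compresses pages that are both cold (a per-page read counter below a threshold) and compressible (an estimated ratio above a threshold). The `Page` record, the thresholds, and the use of zlib level 1 as a stand-in codec are all hypothetical choices for illustration, not HA-CSD's firmware logic.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Page:
    lpn: int                 # logical page number
    data: bytes
    reads: int = 0           # hypothetical hotness proxy: reads since last scan
    compressed: bool = False


def compress_ratio(data: bytes) -> float:
    """Estimate compressibility with zlib level 1 (a cheap stand-in codec)."""
    return len(data) / max(1, len(zlib.compress(data, 1)))


def select_for_offline_compression(pages, hot_threshold=4, min_ratio=1.5):
    """Pick cold, compressible, not-yet-compressed pages, coldest first."""
    candidates = [
        p for p in pages
        if not p.compressed
        and p.reads < hot_threshold
        and compress_ratio(p.data) >= min_ratio
    ]
    return sorted(candidates, key=lambda p: p.reads)


def offline_compression_pass(pages, **policy):
    """Background pass, run off the write path (e.g., when the device is idle).
    Returns the logical page numbers it compressed."""
    done = []
    for p in select_for_offline_compression(pages, **policy):
        p.data = zlib.compress(p.data, 1)
        p.compressed = True
        done.append(p.lpn)
    return done
```

Hot pages stay uncompressed so that frequent reads avoid decompression, and incompressible pages are skipped because compressing them wastes the embedded CPU's limited cycles without saving space.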
Holistic and Opportunistic Scheduling of Background I/Os in Flash-Based SSDs
Yu Wang, You Zhou, Fei Wu et al.|IEEE Transactions on Computers|2023
Indispensable background (BG) tasks are maintained in multiple layers of storage systems, from applications down to flash-based SSDs. They issue a large number of I/Os, causing significant interference with foreground (FG) I/O performance. Our key insight is that mitigating such interference requires holistic scheduling of system-wide, multi-source BG I/Os, which can only be realized at the underlying SSD layer: only the SSD has a global view of all FG and BG I/Os as well as direct information about, and control over, flash storage resources. We therefore propose a novel I/O scheduling architecture called HuFu. It provides a framework for host software to register BG tasks and offload their I/O scheduling into the SSD. The SSD-internal I/O scheduler then prioritizes FG I/O processing, while BG I/Os are scheduled opportunistically by exploiting flash parallelism and idleness. To evaluate HuFu, we perform case studies on RocksDB and compare it with several state-of-the-art host-side I/O scheduling schemes. Experimental results show that HuFu significantly alleviates the performance interference caused by BG I/Os and improves SSD bandwidth utilization, thereby improving FG throughput as well as average and tail latencies (e.g., by about 18% in a write-heavy workload).
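The scheduling rule in the abstract (FG first, BG opportunistically on idle flash resources) can be illustrated with a minimal sketch. It assumes a simplified model with one FG queue and one BG queue per flash channel and a tick-based dispatcher that is told which channels are busy. The `Request` and `Scheduler` names, the per-channel queues, and the tick model are hypothetical; HuFu's actual registration framework and in-SSD scheduler are not modeled here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    channel: int       # flash channel the request targets
    foreground: bool   # True for FG I/O, False for registered BG I/O


class Scheduler:
    """Hypothetical per-channel scheduler: strict FG priority,
    BG issued only on channels that are idle and have no FG work queued."""

    def __init__(self, num_channels: int):
        self.fg = [deque() for _ in range(num_channels)]
        self.bg = [deque() for _ in range(num_channels)]

    def submit(self, req: Request) -> None:
        queues = self.fg if req.foreground else self.bg
        queues[req.channel].append(req)

    def dispatch(self, busy):
        """Issue at most one request per non-busy channel for this tick."""
        issued = []
        for ch, is_busy in enumerate(busy):
            if is_busy:
                continue                              # channel occupied: nothing issued
            if self.fg[ch]:
                issued.append(self.fg[ch].popleft())  # FG always wins
            elif self.bg[ch]:
                issued.append(self.bg[ch].popleft())  # exploit an idle channel for BG
        return issued
```

For example, with an FG and a BG request queued on channel 0 and a BG request on channel 1, the first tick issues the FG request on channel 0 and the BG request on channel 1 in parallel; the BG request on channel 0 waits until that channel is idle and free of FG work.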