Workshop: PDSW22: 7th International Parallel Data Systems Workshop
Authors: Safdar Jamil (Sogang University, South Korea); Awais Khan (Oak Ridge National Laboratory (ORNL)); Kihyun Kim (Sogang University, South Korea); Jae-Kook Lee, Dosil An, and Taeyoung Hong (Korea Institute of Science and Technology Information (KISTI)); Sarp Oral (Oak Ridge National Laboratory (ORNL)); and Youngjae Kim (Sogang University, South Korea)
Abstract: HPC facilities have employed a flash-based storage tier near compute nodes to absorb the high I/O demand of HPC applications during periodic system-level checkpoints. To accelerate these checkpoints, proxy-based distributed key-value stores (PD-KVS) have gained particular attention for their flexibility in supporting multiple backends and network configurations. PD-KVS rely internally on a monolithic KVS, such as RocksDB, to exploit its KV interface and query support. However, PD-KVS are unaware of the high redundancy factor in checkpoint data, which can range from GBs to TBs, and therefore tend to generate high write and space amplification on these storage layers. We propose DENKV, a deduplication-extended, node-local, LSM-tree-based KVS. DENKV employs asynchronous partially inline dedup (APID) and aims to maintain the performance characteristics of LSM-tree-based KVS while reducing write and space amplification. We implemented DENKV atop BlobDB and showed that our solution maintains performance while reducing write and space amplification.