Separating Data via Block Invalidation Time Inference for Write Amplification Reduction in Log-Structured Storage

Wang, Qiuping; Li, Jinhong; Lee, Patrick P. C.; Ouyang, Tao; Shi, Chao; Huang, Lilong


arXiv:2104.12425 (cs)

[Submitted on 26 Apr 2021 (v1), last revised 10 Feb 2022 (this version, v3)]

Abstract: Log-structured storage has been widely deployed in various domains of storage systems, yet its garbage collection incurs write amplification (WA) due to the rewrites of live data. We show that there exists an optimal data placement scheme that minimizes WA using the future knowledge of block invalidation time (BIT) of each written block, yet it is infeasible to realize in practice. We propose a novel data placement algorithm for reducing WA, SepBIT, that aims to infer the BITs of written blocks from storage workloads and separately place the blocks into groups with similar estimated BITs. We show via both mathematical and production trace analyses that SepBIT effectively infers the BITs by leveraging the write skewness property in practical storage workloads. Trace analysis and prototype experiments show that SepBIT reduces WA and improves I/O throughput, respectively, compared with state-of-the-art data placement schemes. SepBIT is currently deployed to support the log-structured block storage management at Alibaba Cloud.
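
The two ideas named in the abstract, write amplification and grouping blocks by estimated BIT, can be illustrated with a minimal sketch. This is not SepBIT's inference rule, which the abstract does not spell out. The function names, the BIT estimates, and the group boundaries below are all hypothetical, and WA is taken in its common form as total blocks written (user writes plus garbage-collection rewrites) divided by user writes.

```python
from bisect import bisect_right


def write_amplification(user_writes: int, gc_rewrites: int) -> float:
    """Common WA definition: (user writes + GC rewrites) / user writes."""
    if user_writes <= 0:
        raise ValueError("user_writes must be positive")
    return (user_writes + gc_rewrites) / user_writes


def group_by_estimated_bit(estimated_bits: dict, boundaries: list) -> dict:
    """Place each block in the group whose BIT range holds its estimate.

    estimated_bits: block id -> estimated invalidation time (hypothetical
                    values; a real system would infer them from the workload).
    boundaries:     sorted upper bounds separating the groups (assumed here).
    """
    groups = {}
    for block, bit in estimated_bits.items():
        # Index of the first boundary strictly greater than the estimate.
        group = bisect_right(boundaries, bit)
        groups.setdefault(group, []).append(block)
    return groups


if __name__ == "__main__":
    # 1000 user-written blocks plus 500 blocks rewritten by GC.
    print(write_amplification(1000, 500))
    # Blocks with similar estimated BITs end up in the same group.
    print(group_by_estimated_bit({"a": 5, "b": 50, "c": 8, "d": 1000}, [10, 100]))
```

Blocks placed together this way would tend to be invalidated around the same time, so a segment chosen for garbage collection holds fewer live blocks to rewrite, which is the intuition behind separating data by BIT.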

Comments:	19 pages. Accepted by the 20th USENIX Conference on File and Storage Technologies (FAST '22)
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2104.12425 [cs.DC]
	(or arXiv:2104.12425v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2104.12425

Submission history

From: Qiuping Wang
[v1] Mon, 26 Apr 2021 09:36:24 UTC (565 KB)
[v2] Tue, 27 Apr 2021 02:29:39 UTC (565 KB)
[v3] Thu, 10 Feb 2022 11:07:27 UTC (886 KB)
