PDSW22 – Invited Talk: Splinters – Distributed IO Sampling for Cloud Data Centers – Design and Applications

Workshop: PDSW22: 7th International Parallel Data Systems Workshop

Authors: Arif Merchant (Google LLC)

Abstract: Splinters is a distributed system for sampling IO metadata in Google data centers. It has been deployed in production for several years and is the main engine for analyzing storage systems and workloads at Google. Given the scale of the storage infrastructure, reliably collecting and processing the IO samples is a complex problem, and we explain how we designed the system around these challenges. We show how the collected IO samples are used for ad-hoc queries and longitudinal analysis. We also outline several applications in which we used the IO samples to design and implement new systems.
