Authors: Narangerelt Batsoyol, Benjamin Pullman, Mingxun Wang, Nuno Bandeira, and Steven Swanson (University of California, San Diego (UCSD))
Abstract: Queries of multi-TB Mass Spectrometry (MS) repositories provide deep insights into biological processes and pose challenging data processing problems. The key bottleneck for running these queries is the number of small random reads. Byte-addressable persistent main memory (PMEM) technologies enable real-time MS search systems by delivering low-latency, high-bandwidth storage.
This work presents P-MASSIVE, a real-time, multi-terabyte-scale MS search system. P-MASSIVE takes advantage of PMEM and the underlying nature of its data access patterns to maximize performance. We evaluate P-MASSIVE across various storage hierarchies and project forward over the next decade to understand how MS query systems might evolve.
Our evaluation shows that P-MASSIVE offers a cost-effective solution that achieves near-DRAM performance. A single query takes 1.7 seconds in P-MASSIVE, 69× faster than the state-of-the-art implementation. In an end-to-end, user-facing application, P-MASSIVE delivers a 90% shorter wait time than the latest MS search tool, returning results within seconds rather than minutes.