Author: Pengmiao Zhang (University of Southern California (USC))
Advisor: Viktor K. Prasanna (University of Southern California (USC))
Abstract: With the rise of Big Data, significant effort has gone into increasing compute power through GPUs, TPUs, and heterogeneous architectures. As a result, the bottleneck of applications is shifting toward memory performance. Prefetching techniques are widely used to hide memory latency and improve instructions per cycle (IPC). Data prefetching is a form of speculation that examines memory access patterns to forecast near-future accesses and avoid cache misses. Traditional hardware data prefetchers use pre-defined rules, which are not powerful enough to adapt to the increasingly complex memory access patterns of new workloads.
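To make the limitation of pre-defined rules concrete, the following is a minimal sketch of a toy stride prefetcher. It is an illustration only, not a prefetcher evaluated in the thesis; the class `StridePrefetcher`, the helper `coverage`, and both traces are hypothetical. The rule predicts `addr + stride` once the same stride has been seen twice in a row, which works for a regular stream but never fires when two streams are interleaved.

```python
from typing import List, Optional


class StridePrefetcher:
    """Toy rule-based prefetcher (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None

    def access(self, addr: int) -> Optional[int]:
        """Observe one access; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # The fixed rule: only predict when the stride repeats.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


def coverage(trace: List[int]) -> float:
    """Fraction of accesses (after the first) predicted one step earlier."""
    prefetcher = StridePrefetcher()
    hits = 0
    predicted: Optional[int] = None
    for i, addr in enumerate(trace):
        if i > 0 and predicted == addr:
            hits += 1
        predicted = prefetcher.access(addr)
    return hits / (len(trace) - 1)


# A regular stream with a constant stride of 4.
regular = [100 + 4 * i for i in range(10)]
# Two interleaved streams: consecutive strides alternate and never repeat.
irregular = [100, 104, 260, 108, 264, 112, 268, 116, 272, 120]

print(coverage(regular))
print(coverage(irregular))
```

On the regular trace the rule locks on after two accesses; on the interleaved trace it never issues a prediction, even though each stream on its own is perfectly regular.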
We hypothesize that a machine learning-based prefetcher can be developed to achieve high-quality memory access prediction, leading to improved IPC for a system. We develop several optimizations for ML-based prefetching. First, we propose RAOP, a framework for an RNN-augmented offset prefetcher, in which the RNN provides temporal references for a spatial offset prefetcher, leading to improved IPC. Second, we propose C-MemMAP, which provides clusters for downstream meta-models to balance model size and prediction accuracy. We propose the DM (delegated model) clustering method, which learns latent patterns from long memory traces and significantly raises the prediction accuracy of the meta-models. Third, we propose TransFetch, an attention-based prefetcher that supports variable-degree prefetching by modeling prefetching as a multi-label classification problem. In addition, we propose ReSemble, a Reinforcement Learning (RL) based adaptive ensemble framework that updates online and enables multiple prefetchers to complement each other on hybrid applications.
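As a rough illustration of how a multi-label formulation can yield variable-degree prefetching, the sketch below encodes future accesses as a multi-hot vector over address deltas and issues one prefetch per class whose score clears a threshold. This is not the TransFetch implementation: the delta range, the helper names (`delta_classes`, `make_label`, `select_prefetches`), the threshold, and the hard-coded scores standing in for a model's output are all assumptions made for the example.

```python
from typing import List

DELTA_RANGE = 4  # hypothetical: classes are deltas in [-4, 4], excluding 0


def delta_classes() -> List[int]:
    """One class per candidate block-address delta."""
    return [d for d in range(-DELTA_RANGE, DELTA_RANGE + 1) if d != 0]


def make_label(current: int, future: List[int]) -> List[int]:
    """Multi-hot training label: 1 for each delta seen in the future window."""
    deltas = {addr - current for addr in future}
    return [1 if d in deltas else 0 for d in delta_classes()]


def select_prefetches(current: int, scores: List[float],
                      threshold: float = 0.5) -> List[int]:
    """Prefetch every class above the threshold, so the degree varies."""
    return [current + d
            for d, s in zip(delta_classes(), scores)
            if s >= threshold]


# Future accesses at deltas +1 and +3 are in range; +10 is out of range.
label = make_label(10, [11, 13, 20])

# Stand-in for per-class model outputs (assumed values, not real predictions).
confident = [0.1, 0.0, 0.2, 0.1, 0.9, 0.3, 0.7, 0.2]
uncertain = [0.1] * 8

print(label)
print(select_prefetches(10, confident))
print(select_prefetches(10, uncertain))
```

Because each delta class is scored independently, the number of prefetches issued per access follows the scores: two in the first call here and none in the second, rather than a fixed degree.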
Thesis Canvas: pdf