SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Technical Papers Archive

EL-Rec: Efficient Large-Scale Recommendation Model Training via Tensor-Train Embedding Table

Authors: Zheng Wang, Yuke Wang, and Boyuan Feng (University of California, Santa Barbara); Dheevatsa Mudigere and Bharath Muthiah (Meta); and Yufei Ding (University of California, Santa Barbara)

Abstract: Deep Learning Recommendation Models (DLRMs) play an important role in various application domains. However, existing DLRM training systems require a large number of GPUs because of their memory-intensive embedding tables. To this end, we propose EL-Rec, an efficient computing framework that harnesses the Tensor-Train (TT) technique to democratize the training of large-scale DLRMs with limited GPU resources. Specifically, EL-Rec optimizes TT decomposition around the key computation primitives of embedding tables and implements a high-performance compressed embedding table that serves as a drop-in replacement for the PyTorch embedding API. EL-Rec introduces an index-reordering technique that harvests performance gains from both the local and the global information of the training inputs. EL-Rec also highlights a pipelined training paradigm that eliminates the communication overhead between host memory and the training worker. Comprehensive experiments demonstrate that EL-Rec can handle the largest publicly available DLRM dataset with a single GPU and achieves 3× speedup over state-of-the-art DLRM frameworks.
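To illustrate the core idea behind TT-compressed embedding tables, the sketch below reconstructs a single embedding row from small TT cores instead of storing the full table. This is a minimal NumPy illustration, not EL-Rec's implementation: the factor sizes, ranks, and the `tt_lookup` helper are all hypothetical, and it assumes the table's row count and embedding dimension factor as n1·n2·n3 and d1·d2·d3, as is standard in TT decompositions of embedding tables.

```python
import numpy as np

# Hypothetical TT embedding sketch: a 100-row, 16-dim table is never
# materialized; only three small TT cores are stored.
n = (4, 5, 5)         # row-index factors: 4 * 5 * 5 = 100 rows
d = (2, 2, 4)         # embedding-dim factors: 2 * 2 * 4 = 16 dims
ranks = (1, 3, 3, 1)  # TT ranks (boundary ranks are 1)

rng = np.random.default_rng(0)
# One 4-D core per factor, with shape (n_k, r_{k-1}, d_k, r_k).
cores = [rng.standard_normal((n[k], ranks[k], d[k], ranks[k + 1]))
         for k in range(3)]

def tt_lookup(row):
    """Reconstruct one embedding row by contracting TT core slices."""
    # Mixed-radix factorization of the row index: row -> (i1, i2, i3).
    idx = []
    for nk in reversed(n):
        idx.append(row % nk)
        row //= nk
    idx.reverse()
    # Contract the selected slices along the shared TT ranks.
    out = cores[0][idx[0]]                              # (1, d1, r1)
    for k in (1, 2):
        out = np.tensordot(out, cores[k][idx[k]], axes=([-1], [0]))
    return out.reshape(-1)                              # (d1*d2*d3,)

vec = tt_lookup(42)
assert vec.shape == (16,)
# The cores hold far fewer parameters than the dense 100 x 16 table.
assert sum(c.size for c in cores) < 100 * 16
```

The compression comes from storing only the cores: here 174 parameters replace the 1,600 of the dense table, and the gap widens dramatically at the multi-million-row scale of real DLRM embedding tables.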
