Authors: Kaihua Fu, Jiuchen Shi, and Quan Chen (Shanghai Jiao Tong University); Ningxin Zheng (Microsoft Research Asia); Wei Zhang (Shanghai Jiao Tong University); Deze Zeng (China University of Geosciences); and Minyi Guo (Shanghai Jiao Tong University)
Abstract: With collaborative DNN inference, part of each query runs on its source edge device to reduce latency. Because edge devices differ in performance and network conditions, different layers should run on different devices, so the queries that reach the datacenter have irregular structures. However, emerging schemes are not able to process such irregular queries. We propose ICE, a collaborative inference service scheme that effectively supports irregular queries. ICE comprises a query slicer, a query manager, and a lag enhancer. The query slicer maps the execution of queries based on the edges' performance and network conditions. The query manager batches irregular queries adaptively and schedules them based on their progress. The lag enhancer reduces QoS violations when queries run slower due to interference on the edge. Experiments show that, compared with state-of-the-art techniques, ICE improves the supported peak load of the datacenter by 43.2% on average while guaranteeing the required 99th-percentile latencies.