Do Domain-Specific Processors Need Domain-Specific Networks?
High-performance computing (HPC) systems and machine learning (ML) share many design goals. Despite this commonality, large-scale systems are increasingly heterogeneous, with SmartNICs, DPUs, CPUs, GPUs, and FPGAs intermingled in system organizations designed to exploit that heterogeneity. The interconnection network ties these heterogeneous processing elements together, providing a consistent system-wide programming model for harnessing those resources.
Every large-scale workload requires both computation and communication as two sides of the same coin: computed results must be communicated to and consumed by other cooperating processing elements. This panel discussion explores whether domain-specific accelerators (GPUs, TPUs, TSPs, etc.) require a correspondingly domain-specific network to extract the accelerator's performance at the system level. This raises the question: "Are we converging (toward a unified HPC/ML stack) or diverging for these performance-critical workloads?"