SC22 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Birds of a Feather Archive

Challenges in Mainstreaming Programmable Networks

Authors: Raj Kettimuthu (Argonne National Laboratory (ANL), University of Chicago), Ganesh Chennimalai Sankaran (Information Sciences Institute, University of Southern California (USC)), Joaquin Chung (University of Chicago, Argonne National Laboratory (ANL)), Joe Mambretti (Northwestern University), Paul Ruth (RENCI Fabric), Venkat Pullela (Keysight Technologies), Minlan Yu (Harvard University), Pete Beckman (Argonne National Laboratory (ANL))

Abstract: We envision scientific computing as a key beneficiary of the "deep programmable networks" paradigm, which provides advanced processing capabilities at terabit speeds. Together with high-performance compute nodes, such networks create a large distributed system that pushes performance beyond currently known bounds. Despite its promise, this paradigm is far from mainstream. The key hurdles lie in building and operating programmable networks. This session will benefit the scientific computing, network programming, and operations communities. We intend to have a series of lightning talks followed by a moderated panel discussion. The audience will interact with experts and hear their vision.

Long Description: Recent trends in programmable networks are reimagining what networks can do. Deep programmable networks cut across layers to realize a specific intent, while in-network and in-storage computing apply a small function to billions of packets. Programmable networks already perform payload-based filtering and redirection, and can even act as a responder. More recently, researchers have been expanding the scope of in-network functions, from logarithm computation to machine learning to graph algorithms.

Programmable network capabilities offer promising opportunities for scientific computing. Advanced networks can provide storage capabilities in addition to compute capabilities. For example, the recent NSF-funded FABRIC testbed offers both compute and storage in the network. These capabilities open new ways to overcome the bandwidth and resource co-scheduling challenges that arise in remote data analysis use cases for in-storage scientific computing. We envision new infrastructures capable of “parking” data in the network while waiting for HPC resources to become available. Furthermore, while data is waiting for HPC resources, we could perform computation in the network (or storage) devices by keeping the data in motion across the infrastructure.

Next, we outline the challenges in mainstreaming programmable networks:

1. Programmable networks come with new domain-specific languages (DSLs), such as P4 (supported by Intel) and NPL (from Broadcom), to name a few. These languages are hard to master and require deep hardware expertise. The associated toolchains stitch together many tools, from compilers to simulators to verifiers, which makes the overall learning curve steep.

Software engineering workflows for popular languages such as C, Java, and Python have many user-friendly features. The integrated development environments and debugging features available for those languages are not available for programmable network DSLs. Automated test suites are now being built, but the overall development and test experience still falls far short of what developers have when building comparably complex features in popular languages.

2. Security and network operations stacks are built for well-defined network functions and protocols. Programmable networks, however, support custom protocols and custom network functions. Ops teams resist custom protocols and functions because their current stacks are not agile enough to accommodate them. Without proper support, Ops teams cannot monitor or operate these networks.

On the other hand, Ops teams can now define custom functions that enhance monitoring and recovery capabilities. For instance, on encountering a specific event, SecOps can collect all the information required to examine and correct the behavior in a closed loop within a programmable network. This is critical to recovering from security attacks.

3. Access to equipment and toolchains for learning and conducting research in programmable networks is marred by inconsistent logistics. Each OEM vendor has put several checks and balances in place to protect its intellectual property. Some experiences take us back to the pre-SDN era, when researchers had to partner with an OEM vendor. OEM vendor bandwidth is scarce, and this limits innovation.

It is high time the community took note of these challenges and identified ways to address them amicably. This step is critical to re-emphasizing SDN's vision of enabling research and innovation.

