Thursday, June 15th
2:00-3:20 PM
C-202: Academic Research Review (Academic Track)
Paper Title: FpgaNIC: A Versatile FPGA-based 100Gb SmartNIC for GPUs

Paper Abstract: Network bandwidth has recently grown much faster than host CPU compute capacity, making it impossible for CPUs to process packets at line rate. SmartNICs provide a way to offload packet processing from the host. Meanwhile, GPUs have quickly become a key component for applications such as AI and HPC, which often need computing power well beyond what a single GPU server can offer. During the execution of such applications, the GPUs typically generate most of the network traffic. Commercially available multi-core SmartNICs cannot process 100Gb network traffic at line rate with their embedded CPUs, which can handle only the control plane. Most FPGA-based devices have similar limitations. Our current project, FpgaNIC, is an FPGA-based, GPU-centric, versatile SmartNIC that accelerates distributed applications running on distributed GPUs. It enables direct PCIe P2P communication with local GPUs using virtual addresses, and it provides reliable 100Gb hardware network transport for communicating with remote GPUs. FpgaNIC supports a variety of SmartNIC models (e.g., direct, on-path, and off-path), each of which benefits certain types of applications. FpgaNIC opens up many research opportunities, such as accelerating distributed applications by combining GPUs and FPGAs.

Paper Author: Jie Zhang, Graduate Student, Zhejiang University

Author Bio: Jie Zhang is a doctoral student at Zhejiang University (China). His main research interests are disaggregated memory/storage systems, data center networking, and FPGAs. He has been an intern at Alibaba Cloud, where he worked on accelerating disaggregated storage with SmartNICs and exploring the design space of different kinds of SmartNICs. He has four published papers, including an article in IEEE Transactions on Computers and a presentation at the International Symposium on Computer Architecture (ISCA).