Wednesday, April 27th
4:20 PM-
B-103: High-Performance Computing (Edge/Data Center Applications Track)
Paper Title: Accelerating HPC Applications with SmartNICs

Paper Abstract: SmartNICs can do much more than offload network protocol analysis. They can take over a wide range of overhead tasks in a variety of applications. One example is message passing in clouds, high-performance computers, and data centers. The Message Passing Interface (MPI) standard, commonly used in HPC environments and applications, relies on complex, dense collective communication patterns such as all-to-all and all-reduce. The open challenge is whether some dense non-blocking collective operations can be offloaded to SmartNIC processors so that computation overlaps with communication, thereby accelerating HPC applications. Such offloads require significant changes to the underlying MPI library. A revised MPI library is now available. A demonstration has shown that complex scientific applications using FFT operations can use the revised library to increase their performance by up to 20%.

Paper Author: Donglai Dai, Chief Engineer, X-Scale Solutions

Author Bio: Donglai Dai is the Chief Engineer at X-Scale Solutions and leads the company’s R&D team. He currently focuses on developing scalable, efficient communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He is the principal investigator (PI) for several DoE SBIR grants and a member of the checkpoint restart standard steering committee. He has more than 20 years of industry experience in engineering management and the development of computer systems, VLSI, IoT, and interconnection networks, gained at Intel, Cray, SGI, and startups. He holds over 10 granted US patents and has published over 40 technical papers, presentations, and book chapters. He earned a PhD in computer science from The Ohio State University.