Thursday, June 15th
2:00-3:20 PM
A-202: Application Acceleration 1 - Special Workloads (Application Acceleration Track)
Paper Title: Accelerating HPC and Deep Learning (DL) Applications with SmartNICs

Paper Abstract: SmartNICs can avoid processor overload by offloading tasks from CPUs in many applications. For example, HPC and DL applications often use the Message Passing Interface (MPI) standard, which employs complex communication patterns such as all-to-all and all-reduce. The new DPUs and IPUs can offload dense non-blocking collective operations and parallel I/O operations used for accessing data files and checkpoint saving, and traditional SmartNICs can offload restart to exploit the overlap of computation with communication and I/O operations. This approach improves performance, but it requires redesigning the underlying MPI and DL libraries. An example involving three such offloaded libraries shows that complex scientific applications and AI applications can improve performance by up to 24% and 17%, respectively.
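
The overlap of computation with communication that the abstract describes can be illustrated with a toy sketch. This is not MPI code and not the libraries from the paper: a single Python worker thread stands in for the offload engine (the SmartNIC or DPU), and the names `allreduce_sum`, `local_compute`, `blocking_step`, and `offloaded_step`, along with the sleep-based delays, are hypothetical choices made only for illustration. In a real MPI program the same pattern would typically be written with a non-blocking collective such as `MPI_Iallreduce` followed by `MPI_Wait`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def allreduce_sum(rank_buffers, link_delay=0.05):
    """Toy all-reduce: element-wise sum over every rank's buffer.

    The sleep is an assumed stand-in for network latency.
    """
    time.sleep(link_delay)
    return [sum(values) for values in zip(*rank_buffers)]


def local_compute(n, work_delay=0.05):
    """Toy host computation that does not depend on the reduced data."""
    time.sleep(work_delay)
    return sum(i * i for i in range(n))


def blocking_step(rank_buffers, n):
    """Host performs the collective itself, then computes (no overlap)."""
    reduced = allreduce_sum(rank_buffers)
    local = local_compute(n)
    return reduced, local


def offloaded_step(rank_buffers, n, engine):
    """Hand the collective to the engine, compute meanwhile, then wait."""
    pending = engine.submit(allreduce_sum, rank_buffers)  # start the collective
    local = local_compute(n)                              # overlaps with it
    reduced = pending.result()                            # wait for completion
    return reduced, local


if __name__ == "__main__":
    buffers = [[1, 2], [3, 4], [5, 6]]  # one buffer per simulated rank
    with ThreadPoolExecutor(max_workers=1) as engine:
        for name, step in (
            ("blocking", lambda: blocking_step(buffers, 4)),
            ("offloaded", lambda: offloaded_step(buffers, 4, engine)),
        ):
            start = time.perf_counter()
            result = step()
            elapsed = time.perf_counter() - start
            print(f"{name}: result={result}, elapsed={elapsed:.3f}s")
```

Both variants return the same values. The offloaded variant should finish sooner because the simulated communication delay and the simulated compute delay run concurrently rather than back to back. The size of that gap here comes entirely from the assumed delays and says nothing about the 24% and 17% figures reported in the abstract.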

Paper Author: Donglai Dai, Chief Engineer, X-ScaleSolutions

Author Bio: Donglai Dai is Chief Engineer at X-ScaleSolutions, where he leads the company’s R&D team. His current work focuses on developing and enhancing communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He is the principal investigator for several DOE SBIR grants and a member of the checkpoint restart standard steering committee. Before joining X-ScaleSolutions, he worked at Intel, Cray, and SGI. He holds over 10 US patents and has published over 40 technical papers, presentations, or book chapters. He earned a PhD in computer science from Ohio State University.