Wednesday, June 14th
4:20-5:40 PM
A-103: Network Acceleration 2 - System Applications (Networking Track)
Paper Title: Using a SmartNIC to Recover Dropped Packets

Paper Abstract: Distributed datacenter applications, when run over today's best-practice networks, encounter problems whenever the network drops a packet or session. At the very least, the application goes through timeouts and tail latency rises. At worst, data can be silently corrupted without any warning. The most common outcome is a "gray failure" in which the application slows significantly as it waits out a timeout storm or simply tries to resynchronize state. One approach to recovering from drops involves custom-programming an FPGA-based NIC to connect adjacent servers tiled in a plane, rather than connecting each one to a switch as is customary. The result is "determinism on demand" at very low latency to each software endpoint in the distributed application, regardless of whether the message it just sent reached its recipient. Failures in the mesh are routed around at microsecond speeds without losing "information" in flight. The revised connection method makes networks much more reliable at low cost.

Paper Author: Paul Borrill, CEO, Daedaelus

Author Bio: Paul Borrill is founder and Chief Product Officer of Daedaelus and a leading industry expert on resilient network and storage infrastructures. He has been a major contributor to modern infrastructure development at such technology-leading companies and organizations as NASA, Apple, Sun Microsystems, and Quantum. Paul was cofounder of the Hot Interconnects Symposium and founding chair of the Storage Networking Industry Association (SNIA). Paul served as VP of Technical Activities and VP of Standards for the IEEE Computer Society. Paul earned a PhD in Physics from University College London. He has presented at many conferences on distributed systems and holds nine patents in that area.