Tuesday, November 10th
8:35-10:05
Session A-1: What the Hyperscalers Are Buying (Hyperscale Applications Track)
Organizer: Jonathan Hinkle, Principal Researcher, Lenovo

Paper Title: SSD reliability and debug at scale

Paper Abstract: There are some really exciting innovations in the world of storage. On the physical memory side we have QLC, Optane, Z-NAND, and others; on the specification side, NVMe has added IO Determinism, Sets, and now Zoned Namespaces. Even with all these advancements, we still see a high number of SSD failures in production. These failures bring unique challenges: how do we debug them, and, based on what we learn, how do we design SSDs to be more reliable? This presentation describes some of the challenges we face at scale at Facebook around SSD debug and how we overcome them. It also offers suggestions and calls to action on how the storage industry can come together to help advance this area.

Paper Author: Vineet Parekh, Hardware Systems Engineer, Facebook

Author Bio: Vineet Parekh currently works as a Hardware Engineer at Facebook, responsible for designing and managing flash in the hardware fleet for Facebook's datacenters. Previously he worked at Intel in the storage group, where he held various roles in the design and testing of Intel's SSDs.