Tuesday, November 10th
8:35-10:05
Session A-1: What the Hyperscalers Are Buying (Hyperscale Applications Track)
Organizer: Jonathan Hinkle, Principal Researcher, Lenovo

Paper Title: Using PM to Accelerate Tweet Search with Apache Lucene at Twitter

Paper Abstract: Twitter's in-house distributed tweet search services are built around the Apache Lucene text search engine. To improve per-node throughput, we tested storing tweet data in persistent memory instead of on SSDs. Persistent memory offers lower access latency than SSDs and allows mmap to establish direct mappings that bypass the page cache. Since our service is primarily memory-bound, this was an appealing feature. The trade-off is less storage capacity than with SSDs. For simplicity, persistence, and throughput, we used Intel Optane DC PMM in App Direct mode via fsdax (file system DAX). We striped the modules with a 2 MB chunk size and formatted the volume with XFS using appropriate stripe parameters. The resulting mount point can be used directly by our service. Our tests found a significant increase in per-node serving capacity (~50% increase in RPS) over the same server with NVMe SSDs, without the need for application-level changes.
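
Illustrative sketch (not from the paper): the abstract notes that the service needed no application-level changes because it reads its data through mmap, and the same call works whether the file lives on an SSD or on a DAX-mounted persistent memory volume. The Python below is a minimal sketch of that access pattern, not the service's actual code. The function name and the sample data are hypothetical, and the demo uses a temporary file rather than a real persistent memory mount point.

```python
import mmap
import os
import tempfile


def count_occurrences_mapped(path: str, needle: bytes) -> int:
    """Count non-overlapping occurrences of `needle` in a file read via mmap.

    The code is identical for any backing store. On a file system mounted
    with DAX, the kernel can map persistent memory directly instead of
    going through the page cache; nothing in the application changes.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        if os.fstat(f.fileno()).st_size == 0:
            return 0
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            count = 0
            pos = view.find(needle)
            while pos != -1:
                count += 1
                pos = view.find(needle, pos + len(needle))
            return count


if __name__ == "__main__":
    # Hypothetical sample data written to a temporary file for the demo.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"tweet one\ntweet two\n")
        sample_path = tmp.name
    try:
        print(count_occurrences_mapped(sample_path, b"tweet"))
    finally:
        os.remove(sample_path)
```

Pointing the path at a file under the persistent memory mount point, instead of the temporary file, would exercise the direct-mapping path described in the abstract, assuming the volume is mounted with DAX enabled.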

Paper Author: Andy Wilcox, Senior Staff Site Reliability Engineer, Twitter
Matt Singer, Sr. Staff Hardware Engineer, Twitter

Author Bio: Andy Wilcox is a Senior Staff Site Reliability Engineer at Twitter and a founding member of its performance team. His experience spans embedded systems, networks, filesystems, and distributed systems. Andy has led projects at Twitter on compression and on detecting configuration problems. He has 30 years of experience in the technology industry and holds an MS in Computer Science from the University of Florida.

Author 2 Bio: