PPD: Prefill-Decode Disaggregation for Multi-turn LLM Serving

Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving.

We show that in multi-turn LLM serving, repeated KV cache transfers between prefill and decode nodes saturate the available transfer bandwidth. We propose PPD disaggregation, a dynamic routing system that decides when subsequent turns should be processed locally on decode nodes. Our approach reduces time-to-first-token (TTFT) for Turn 2 and later by 68% while maintaining competitive time-per-output-token (TPOT).
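
To make the routing idea concrete, here is a minimal sketch of one possible decision rule. It is not the policy from the paper or the vllm-ppd code. Every name in it (TurnRequest, route, and all parameters) is hypothetical, and the cost model is an assumption: sending a later turn through a prefill node is charged only the time to transfer the KV cache for the full context, while handling it locally is charged only the time to prefill the newly appended tokens on the decode node.

```python
from dataclasses import dataclass


@dataclass
class TurnRequest:
    """Hypothetical description of one turn in a conversation."""
    turn_index: int     # 1 for the first turn
    cached_tokens: int  # tokens whose KV cache already sits on the decode node
    new_tokens: int     # tokens appended in this turn that still need prefill


def route(req: TurnRequest,
          kv_bytes_per_token: float,
          link_bytes_per_s: float,
          decode_prefill_tokens_per_s: float) -> str:
    """Return "prefill_node" or "decode_node" under an assumed cost model."""
    # First turns have no reusable cache on the decode node, so they take
    # the ordinary disaggregated path.
    if req.turn_index == 1 or req.cached_tokens == 0:
        return "prefill_node"

    # Assumed remote cost: ship the KV cache for the whole context.
    total_tokens = req.cached_tokens + req.new_tokens
    transfer_s = total_tokens * kv_bytes_per_token / link_bytes_per_s

    # Assumed local cost: prefill only the new tokens on the decode node.
    local_s = req.new_tokens / decode_prefill_tokens_per_s

    return "decode_node" if local_s <= transfer_s else "prefill_node"


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    short_follow_up = TurnRequest(turn_index=2, cached_tokens=4000, new_tokens=50)
    print(route(short_follow_up, 160_000, 10e9, 5_000))
```

Under this assumed model, a short follow-up to a long conversation stays on the decode node, while a follow-up that appends a large amount of new text is sent back to a prefill node. A real system would also need to account for decode-side interference, which is why the abstract reports TPOT alongside TTFT.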

Authors: Zongze Li, Jingyu Liu, Zach Xu, Yineng Zhang, Tahseen Rabbani, Ce Zhang

Paper: https://arxiv.org/abs/2603.13358

Code: https://github.com/freelulul/vllm-ppd