PPD: Prefill-Decode Disaggregation for Multi-turn LLM Serving
Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving
We show that in multi-turn LLM serving, repeated KV-cache transfers between prefill and decode nodes saturate the available bandwidth. We propose PPD disaggregation, a dynamic routing system that decides, for each turn after the first, whether to process it locally on the decode node instead of sending it back through a prefill node. Our approach reduces time-to-first-token (TTFT) for the second and later turns (Turn 2+) by 68% while maintaining competitive time-per-output-token (TPOT).
Authors: Zongze Li, Jingyu Liu, Zach Xu, Yineng Zhang, Tahseen Rabbani, Ce Zhang
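
The routing decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' policy: it assumes a simple cost comparison in which the prefill-node path pays for computing the new tokens plus transferring the full KV cache (history and new tokens), while the local path pays only for computing the new tokens at the decode node's lower prefill throughput. All names (Turn, CostModel, route) and all parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    turn_index: int     # 1 for the first turn of a conversation
    new_tokens: int     # prompt tokens added by this turn
    cached_tokens: int  # history tokens whose KV cache already sits on the decode node


@dataclass(frozen=True)
class CostModel:
    # Hypothetical, illustrative parameters (not taken from the paper).
    kv_bytes_per_token: float         # KV-cache size per token
    link_bytes_per_s: float           # prefill -> decode transfer bandwidth
    prefill_node_tokens_per_s: float  # prefill throughput on a prefill node
    decode_node_tokens_per_s: float   # prefill throughput on a busy decode node


def remote_ttft(turn: Turn, cost: CostModel) -> float:
    """Estimated TTFT if a prefill node handles the turn and ships the KV cache.

    Assumption: the prefill node only computes the new tokens (prefix reuse),
    but the whole context's KV cache is transferred to the decode node.
    """
    compute = turn.new_tokens / cost.prefill_node_tokens_per_s
    transferred_tokens = turn.cached_tokens + turn.new_tokens
    transfer = transferred_tokens * cost.kv_bytes_per_token / cost.link_bytes_per_s
    return compute + transfer


def local_ttft(turn: Turn, cost: CostModel) -> float:
    """Estimated TTFT if the decode node prefills the new tokens itself."""
    return turn.new_tokens / cost.decode_node_tokens_per_s


def route(turn: Turn, cost: CostModel) -> str:
    """Return "decode_node" or "prefill_node" for this turn."""
    # With no cached history there is nothing to reuse locally.
    if turn.turn_index == 1 or turn.cached_tokens == 0:
        return "prefill_node"
    if local_ttft(turn, cost) <= remote_ttft(turn, cost):
        return "decode_node"
    return "prefill_node"


cost = CostModel(
    kv_bytes_per_token=100_000.0,
    link_bytes_per_s=1e9,
    prefill_node_tokens_per_s=10_000.0,
    decode_node_tokens_per_s=2_000.0,
)
short_follow_up = Turn(turn_index=2, new_tokens=100, cached_tokens=4000)
long_follow_up = Turn(turn_index=2, new_tokens=8000, cached_tokens=100)
print(route(short_follow_up, cost), route(long_follow_up, cost))
```

Under these made-up numbers, a short follow-up over a long history stays on the decode node, while a follow-up with a very long new prompt still goes to a prefill node. The sketch models TTFT only; a real policy would presumably also need to account for the effect of local prefills on the TPOT of requests already decoding on the same node.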
