PowerInfer with AMD

Multi-platform Support and Optimization of the PowerInfer Inference Engine.

This project builds on PowerInfer. It adds the support for AMD graphics cards that PowerInfer currently lacks and optimizes its performance on AMD RDNA series GPUs.

Through a series of optimizations to PowerInfer under the AMD toolchain, covering operators, IO, and algorithms, the token generation rate on the AMD Radeon RX 7900 XTX GPU rises from 24.52 tokens per second to 101.81 tokens per second on the Q4 quantized model, roughly a 4.15x speedup.
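
To put the two throughput figures in perspective, the short Python sketch below derives the speedup ratio and an approximate per-token latency from them. The function names are illustrative and not part of PowerInfer, and the latency conversion assumes a single decoding stream, so that average time per token is simply the reciprocal of the token rate.

```python
# Throughput figures quoted in the text (tokens per second, Q4 quantized model).
baseline_tps = 24.52
optimized_tps = 101.81


def speedup(before_tps, after_tps):
    """Ratio of the optimized token rate to the baseline token rate."""
    return after_tps / before_tps


def ms_per_token(tps):
    """Average milliseconds per generated token.

    Assumption: one decoding stream, so latency is the reciprocal of throughput.
    """
    return 1000.0 / tps


print(f"speedup: {speedup(baseline_tps, optimized_tps):.2f}x")
print(
    f"latency per token: {ms_per_token(baseline_tps):.1f} ms"
    f" -> {ms_per_token(optimized_tps):.1f} ms"
)
```

Under that assumption, the average time per token drops from about 40.8 ms to about 9.8 ms.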

View this project on GitHub: https://github.com/freelulul/PowerInfer_AMD