MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

Research output: Contribution to journal, Conference article, peer-review
