MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao
Electrical and Computer Engineering
Siebel School of Computing and Data Science
Coordinated Science Lab
Information Trust Institute
Grainger College of Engineering
Research output: Contribution to journal › Conference article › peer-review
Fingerprint
Dive into the research topics of 'MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads'. Together they form a unique fingerprint.
Keyphrases
Model Inference: 100%
Medusa: 100%
Acceleration Framework: 100%
Large Language Models: 100%
Inference Acceleration: 100%
Multistage Decoding: 100%
Two-level: 7%
Tree-based: 7%
Prediction Accuracy: 7%
Full Model: 7%
Training Data: 7%
Parallel Processing: 7%
Model Capabilities: 7%
Multiple Candidates: 7%
Caching: 7%
Token: 7%
Attention Mechanism: 7%
Acceptance Rate: 7%
Training Procedure: 7%
Specialist Training: 7%
High Bandwidth Memory: 7%
Sequential Computation: 7%
Tuning Procedure: 7%
Draft Model: 7%
Self-distillation: 7%
Computer Science
Large Language Model: 100%
Training Data: 20%
Use Case: 20%
Parallel Processing: 20%
Prediction Accuracy: 20%
Bandwidth Memory: 20%
Acceptance Rate: 20%
Attention (Machine Learning): 20%
Engineering
Model Parameter: 100%