KV Cache Calculator
Add Custom Model
HuggingFace Model Path
Display Name
Add
Data Type: BF16, FP8
KV Cache Bytes per Token
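The page does not state how the bytes-per-token figure is derived. A minimal sketch of the usual calculation for standard multi-head or grouped-query attention follows, assuming one key vector and one value vector are cached per KV head per layer. The function name `kv_bytes_per_token`, the `DTYPE_BYTES` table, and the example configuration are hypothetical and not taken from the calculator.

```python
# Assumed element sizes for the two data types the page offers.
DTYPE_BYTES = {"BF16": 2, "FP8": 1}


def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, dtype: str) -> int:
    """Bytes of KV cache one token occupies across all layers.

    Assumption: every layer caches one key and one value vector
    (hence the factor 2) of size head_dim for each KV head.
    """
    return 2 * layers * kv_heads * head_dim * DTYPE_BYTES[dtype]


# Hypothetical model configuration, for illustration only.
bf16 = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype="BF16")
fp8 = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128, dtype="FP8")
print(bf16, fp8)  # 131072 65536
```

Under these assumptions, switching from BF16 to FP8 halves the per-token footprint, which is presumably why the page reports both columns side by side.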
KV Cache Size vs Sequence Length
Max Requests per GPU
GPU Memory: 80 GB
Sequence Length: 32K
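How the calculator turns GPU memory and sequence length into a request count is not shown on the page. One plausible reading is sketched below: divide the memory budget by the cache size of a single full-length request. The sketch assumes the entire GPU memory is available for KV cache (it ignores model weights and activations), treats "GB" as 2^30 bytes and "32K" as 32 × 1024 tokens, and uses a hypothetical per-token cost of 131072 bytes. The name `max_requests_per_gpu` is illustrative.

```python
def max_requests_per_gpu(gpu_memory_gb: float, seq_len: int, bytes_per_token: int) -> int:
    """Number of full-length sequences whose KV cache fits in GPU memory.

    Assumptions: all memory is usable for KV cache, and 1 GB = 2**30 bytes.
    """
    gpu_bytes = int(gpu_memory_gb * 2**30)
    bytes_per_request = bytes_per_token * seq_len
    return gpu_bytes // bytes_per_request


# 80 GB GPU, 32K-token sequences, hypothetical 131072 B/token model.
print(max_requests_per_gpu(80, 32 * 1024, 131072))  # 20
```

In a real deployment the usable budget is smaller, since weights and runtime buffers share the same memory, so a figure computed this way is best read as an upper bound.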
Model Details
Model | Type | Layers | KV Heads | Head Dim | BF16 B/tok | FP8 B/tok | 128K BF16 | 128K FP8
MLA: Multi-head Latent Attention
MHA: Multi-Head Attention
SWA/Hybrid-SWA: Sliding Window Attention
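The attention type changes how cache size scales, although the page does not spell out the formulas it uses. The sketch below shows commonly used approximations and should be read as an assumption rather than the calculator's actual logic: MLA caches one compressed latent plus a small positional component per token per layer, and sliding-window layers cache at most `window` tokens. All function names, parameter names, and numbers are hypothetical.

```python
def mla_bytes_per_token(layers: int, latent_dim: int, rope_dim: int, dtype_bytes: int) -> int:
    """MLA: one compressed KV latent plus a RoPE key part per token per layer.

    Assumption: no separate per-head keys and values are stored.
    """
    return layers * (latent_dim + rope_dim) * dtype_bytes


def hybrid_swa_cache_bytes(seq_len: int, window: int, sliding_layers: int,
                           full_layers: int, kv_heads: int, head_dim: int,
                           dtype_bytes: int) -> int:
    """Total KV cache for one sequence in a hybrid sliding-window model.

    Sliding-window layers keep at most `window` tokens; full-attention
    layers keep all `seq_len` tokens.
    """
    per_token_per_layer = 2 * kv_heads * head_dim * dtype_bytes  # keys and values
    sliding_tokens = min(seq_len, window)
    return per_token_per_layer * (sliding_layers * sliding_tokens
                                  + full_layers * seq_len)


# Illustrative numbers only.
print(mla_bytes_per_token(layers=61, latent_dim=512, rope_dim=64, dtype_bytes=2))  # 70272
print(hybrid_swa_cache_bytes(seq_len=32768, window=4096, sliding_layers=24,
                             full_layers=8, kv_heads=8, head_dim=128,
                             dtype_bytes=2))  # 1476395008
```

Setting `sliding_layers=0` reduces the second function to the plain MHA case, which makes it easy to compare the three types in the legend on equal terms.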