FlashMemory-DeepSeek-V4

(arxiv.org)

1 point | by GaggiX 2 hours ago

1 comment

  • GaggiX 2 hours ago
    Not to be confused with Flash Attention.

    What's novel here is the extremely small KV cache footprint at long context lengths: about 0.77GB at a 512K context, a 90% reduction compared to the already very small KV cache of DeepSeek V4 Flash.
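
    For a rough sense of scale, here is a back-of-the-envelope sketch of what those two figures imply. It is not from the paper. It assumes "512K" means 512 * 1024 tokens, that "GB" means 10^9 bytes, and that the 90% reduction is measured against the baseline at the same context length. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the comment.
# Assumptions (not stated in the source): 512K = 512 * 1024 tokens,
# 1 GB = 10**9 bytes, and the 90% reduction is relative to the baseline
# KV cache at the same context length.

def kv_bytes_per_token(cache_gb, context_tokens, bytes_per_gb=10**9):
    """Average KV cache bytes stored per token of context."""
    return cache_gb * bytes_per_gb / context_tokens


def implied_baseline_gb(reduced_gb, reduction_fraction):
    """Baseline size implied by a reduced size and a fractional reduction."""
    return reduced_gb / (1.0 - reduction_fraction)


CONTEXT_TOKENS = 512 * 1024   # assumed meaning of "512K"
REDUCED_GB = 0.77             # figure quoted in the comment
REDUCTION = 0.90              # figure quoted in the comment

per_token = kv_bytes_per_token(REDUCED_GB, CONTEXT_TOKENS)
baseline_gb = implied_baseline_gb(REDUCED_GB, REDUCTION)

print(f"KV cache per token: about {per_token:.0f} bytes")
print(f"Implied baseline at the same context: about {baseline_gb:.1f} GB")
```

    Under those assumptions this works out to roughly 1.5 KB of KV cache per token, and an implied baseline of about 7.7GB at the same context length.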