MODEL

Kimi K2.5

model, topic-note

Overview

Kimi K2.5 is a 1-trillion-parameter mixture-of-experts model. In May 2026 it became notable as the subject of the first documented LLM inference build using Intel Optane Persistent Memory, demonstrating local inference at frontier scale on unusual prosumer hardware.

Timeline

  • 2026-05-12-AI-Digest: An r/LocalLLaMA post (517 points) documented the first known LLM inference build using Intel Optane Persistent Memory, a DIMM-form-factor non-volatile memory that sits between DRAM and SSD on the latency curve, to run Kimi K2.5 locally at over four tokens per second. Optane has been EOL since 2022, so the parts pool is fixed and shrinking. Even so, the build demonstrates a memory tier that materially expands the addressable working set for very large models on commodity hardware, and it contributed to the MTP-and-memory-tier moment in the local inference community.

Key Developments

  1. Optane PMem Inference Build: First documented use of Intel Optane Persistent Memory for LLM inference. The build ran a 1T-parameter MoE model at 4+ tok/s on prosumer hardware, showing that non-standard memory tiers can make frontier-scale local inference feasible even with limited GPU memory.
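
To make the scale concrete, the arithmetic behind a build like this can be sketched in a few lines of Python. This is a rough back-of-envelope estimate, not a description of the actual build: the 1T total parameter count and the 4 tok/s rate come from this note, while the 4-bit quantization and the 32B active-parameters-per-token figure are assumptions made for illustration. The function names are hypothetical.

```python
# Back-of-envelope sizing for running a large MoE model from a slower memory tier.
# Assumed values (not from the source post) are marked as ASSUMPTION.

def weight_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Total size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def required_bandwidth_gb_s(active_params: float,
                            bits_per_weight: float,
                            tokens_per_second: float) -> float:
    """Memory bandwidth needed if every active weight is read once per token.

    Ignores KV-cache traffic, caching of hot experts in DRAM or VRAM,
    and compute limits, so treat the result as a crude estimate.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    return bytes_per_token * tokens_per_second / 1e9


TOTAL_PARAMS = 1e12      # 1T parameters, as stated in this note
TOKENS_PER_SECOND = 4.0  # "over four tokens per second", as stated in this note
BITS_PER_WEIGHT = 4      # ASSUMPTION: 4-bit quantization
ACTIVE_PARAMS = 32e9     # ASSUMPTION: parameters activated per token

footprint = weight_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
bandwidth = required_bandwidth_gb_s(ACTIVE_PARAMS, BITS_PER_WEIGHT, TOKENS_PER_SECOND)

print(f"Weights: about {footprint:.0f} GB")
print(f"Effective read bandwidth for {TOKENS_PER_SECOND:.0f} tok/s: about {bandwidth:.0f} GB/s")
```

Under these assumptions the weights alone occupy hundreds of gigabytes, which is why a high-capacity tier between DRAM and SSD is attractive. Only the active experts need to be read for each token, which keeps the bandwidth requirement far below what a dense model of the same size would need.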