Kimi K2.5

Overview

Kimi K2.5 is a 1-trillion-parameter mixture-of-experts model. In May 2026 it became notable as the subject of the first documented LLM inference build using Intel Optane Persistent Memory, demonstrating local inference at frontier scale on unusual prosumer hardware.

Timeline

2026-05-12-AI-Digest: An r/LocalLLaMA post (517 points) documented the first known LLM inference build using Intel Optane Persistent Memory, a DIMM-form-factor non-volatile memory that sits between DRAM and SSD on the latency curve, to run Kimi K2.5 locally at over four tokens per second. Optane has been end-of-life since 2022, so the pool of available parts is fixed and shrinking. Even so, the build demonstrates a memory tier that materially expands the addressable working set for very large models on commodity hardware, and it adds to the current focus on MTP and memory tiers in the local inference community.

Key Developments

Optane PMem Inference Build: First documented use of Intel Optane Persistent Memory for LLM inference, running a 1T-parameter MoE model at 4+ tok/s on prosumer hardware. It shows that non-standard memory tiers can enable frontier-scale local inference even with limited GPU memory.
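
To make the working-set argument concrete, the following is a rough back-of-the-envelope sketch, not a description of the actual build. Only the 1T total parameter count comes from the entry above. The active-parameter count, the quantization width, and the sustained read bandwidth are illustrative assumptions, and the model is the usual simplification that decoding is bandwidth-bound and reads each active weight once per token.

```python
def weights_footprint_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


def bandwidth_bound_tok_s(active_params: float, bits_per_weight: float,
                          read_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return read_gb_s * 1e9 / bytes_per_token


TOTAL_PARAMS = 1e12     # from the entry: 1-trillion-parameter MoE
ACTIVE_PARAMS = 32e9    # ASSUMPTION: active parameters per token (illustrative)
BITS_PER_WEIGHT = 4     # ASSUMPTION: 4-bit quantized weights
READ_GB_S = 80.0        # ASSUMPTION: sustained read bandwidth of the memory tier

footprint = weights_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
ceiling = bandwidth_bound_tok_s(ACTIVE_PARAMS, BITS_PER_WEIGHT, READ_GB_S)
print(f"weights: {footprint:.0f} GB, decode ceiling: {ceiling:.1f} tok/s")
```

Under these assumed numbers the weights alone come to roughly 500 GB, which is why a high-capacity tier below DRAM matters at all, and the decode ceiling depends on the active parameters per token rather than the full 1T. Swapping in measured bandwidth and the real quantization would give a more meaningful bound.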

2026-07-16-AI-Digest: The KnowAct-GUIClaw paper (arXiv:2607.12625, ▲29) uses the Kimi-2.6 open-weights build as the base model for a Know-Route-Act-Reflect GUI-agent framework that reaches 64.1% on MobileWorld, beating Seed-2.0-Pro and GPT-5.5. This is a signal about Kimi as a reference substrate rather than fresh first-party Kimi news: the Kimi lineage is now the open-weights anchor for a personal-GUI-assistant benchmark leader that outperforms closed models. Corpus framing: an open-weights GUI-agent stack overtaking closed models on a long-horizon benchmark reinforces the view that memory and skill libraries, not raw base models, are becoming the differentiator.

Kimi-2.6 as Base Substrate for KnowAct-GUIClaw MobileWorld Leader (July 16, 2026): The Know-Route-Act-Reflect GUI-agent framework built on Kimi-2.6 reaches 64.1% on MobileWorld, beating Seed-2.0-Pro and GPT-5.5. The Kimi lineage is the open-weights anchor for a result that beats closed models on a long-horizon GUI benchmark, which reinforces the view that memory and skill libraries are becoming the differentiator over raw base models.