Liquid AI released LFM2.5-230M on June 25, a 230-million-parameter open-weight model built to run AI agent tasks directly on consumer hardware — phones, single-board computers, and robots — without requiring a cloud connection.
What it does
The model decodes at 213 tokens per second on a Samsung Galaxy S25 Ultra and 42 tokens per second on a Raspberry Pi 5, while using under 375 MB of memory. On a server-grade NVIDIA H100 GPU, latency stays below 50 milliseconds at low concurrency.
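To make the throughput figures concrete, the short sketch below converts decode speed into the average time per generated token and the decode time for a full response. The 256-token response length is a hypothetical example, and the calculation covers decoding only; prompt processing (prefill) is ignored, so real end-to-end latency will be higher.

```python
# Convert reported decode throughput into per-token latency and response time.
# Only decoding is modelled; prefill (prompt processing) time is ignored.

REPORTED_DECODE_TPS = {
    "Samsung Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}


def ms_per_token(tokens_per_second: float) -> float:
    """Average time between generated tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_response(tokens_per_second: float, n_tokens: int) -> float:
    """Decode-only time needed to generate n_tokens, in seconds."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 256  # hypothetical response length, not from the source
    for device, tps in REPORTED_DECODE_TPS.items():
        print(
            f"{device}: {ms_per_token(tps):.1f} ms/token, "
            f"{seconds_for_response(tps, response_tokens):.1f} s "
            f"for {response_tokens} tokens"
        )
```

With the reported figures this works out to about 4.7 ms per token on the Galaxy S25 Ultra and about 23.8 ms on the Raspberry Pi 5, or roughly 1.2 s and 6.1 s of decoding for a 256-token reply.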
Despite its compact size, LFM2.5-230M outperforms larger models on key benchmarks. On instruction following (IFEval), it scores 71.71, compared with 63.49 for Google’s Gemma 3 1B, a model more than four times its size. On structured data extraction (CaseReportBench), it scores 22.51 against 0.84 for IBM’s Granite 4.0-350M.
Architecture and training
The model is built on Liquid AI’s LFM2 architecture, a hybrid design that relies mainly on gated short-convolution blocks and uses only a small number of grouped-query attention blocks, rather than stacking attention layers throughout as a conventional transformer does. That design choice makes sustained inference more efficient on constrained hardware. The model was pre-trained on 19 trillion tokens and supports a 32,000-token context window.
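Liquid AI’s public description of LFM2 characterises most of its layers as double-gated short convolutions: a linear projection produces two gates and a value, the value is multiplied by the first gate, passed through a short causal convolution, multiplied by the second gate, and projected back out. The pure-Python sketch below illustrates that pattern during token-by-token decoding. It is not Liquid AI’s implementation; the class name, weight shapes and toy values are hypothetical assumptions chosen for readability.

```python
from collections import deque
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x len(v)) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class GatedShortConvBlock:
    """Hypothetical double-gated short-convolution block, decoded one token at a time.

    w_in:      (3*d) x d projection producing gate B, gate C and value x.
    conv_taps: kernel_size x d depthwise convolution taps, oldest position first.
    w_out:     d x d output projection.
    """

    def __init__(self, w_in: Matrix, conv_taps: Matrix, w_out: Matrix):
        self.d = len(w_out)
        self.w_in, self.taps, self.w_out = w_in, conv_taps, w_out
        k = len(conv_taps)
        # Fixed-size decoding state: only the last k gated inputs are kept.
        self.state = deque([[0.0] * self.d for _ in range(k)], maxlen=k)

    def step(self, token: Vector) -> Vector:
        d = self.d
        proj = matvec(self.w_in, token)
        b, c, x = proj[:d], proj[d:2 * d], proj[2 * d:]
        # First multiplicative gate, then push into the rolling buffer.
        self.state.append([bi * xi for bi, xi in zip(b, x)])
        # Causal depthwise convolution over the buffered positions.
        conv = [
            sum(self.taps[j][i] * self.state[j][i] for j in range(len(self.taps)))
            for i in range(d)
        ]
        # Second multiplicative gate, then the output projection.
        gated = [ci * vi for ci, vi in zip(c, conv)]
        return matvec(self.w_out, gated)


if __name__ == "__main__":
    # Toy 1-channel block with a 2-tap kernel (illustrative values only).
    block = GatedShortConvBlock(
        w_in=[[1.0], [1.0], [1.0]], conv_taps=[[0.5], [1.0]], w_out=[[1.0]]
    )
    for t in ([2.0], [1.0], [3.0]):
        print(block.step(t), len(block.state))
```

However many tokens pass through, the block’s state stays at the kernel length (two entries here), whereas an attention layer’s key-value cache grows with every token. That fixed-size state is the general reason convolution-heavy layers are cheaper to run over long sequences on memory-limited devices.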
Post-training used a three-stage recipe: supervised fine-tuning with distillation, direct preference optimization, and multi-domain reinforcement learning.
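Of the three stages, direct preference optimization has a compact, well-known objective: for each pair of a preferred and a rejected response, it rewards the model for raising the preferred response’s log-probability, relative to a frozen reference model, by more than it raises the rejected one’s. The article does not say which DPO variant or hyperparameters Liquid AI used, so the sketch below shows only the standard per-pair loss, -log sigmoid(beta * (chosen shift - rejected shift)), with beta = 0.1 as an assumed default.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the source gives no hyperparameters
) -> float:
    """Standard DPO loss for one preference pair (sequence log-probabilities)."""
    # How far the policy has moved from the reference on each response.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(z)) is equal to softplus(-z).
    return softplus(-z)
```

When the policy still matches the reference model, the loss is log 2 (about 0.693); it falls below that as the policy shifts probability toward the preferred response and rises above it when the policy favours the rejected one.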
Use cases and limits
Liquid AI positions LFM2.5-230M for data extraction pipelines, lightweight on-device agentic workflows, and tool use on edge hardware — phones, IoT sensors, automotive systems, and robotics onboard computers. The company is explicit that the model is not suitable for advanced mathematics, code generation, or long-form creative writing.
Both the base (LFM2.5-230M-Base) and post-trained versions are freely available on Hugging Face and work with llama.cpp, MLX, vLLM, SGLang, and ONNX across Apple, AMD, Qualcomm, and NVIDIA hardware.