Blog
Practical, honest writing on tiny LLMs, ternary models, and on-device AI — every claim backed by the open repository.
Milestone
No longer QEMU-only: the 944K model runs on a real ESP32-WROOM-32, fully offline, ~1 tok/s — with a prebuilt binary, a serial log and a one-command flash you can verify yourself. Honest scope: a proof of execution, not a benchmark win.
Edge AI
Most “tiny” LLMs need megabytes of RAM. Here is the real memory math for running a language model on an STM32, RP2040 or ESP32 — and what actually fits in 256 KB.
Comparison
A side-by-side comparison of TinyLlama, llama2.c, TinyMaix and Atome against real microcontroller RAM and flash budgets — with an honest verdict on what fits a $2 MCU.
Explainer
Ternary weights let a trained language model fit in flash. Here is what 1.58-bit BitNet quantization means, why it is fast on a microcontroller, and what it costs in accuracy.
Honest results
A two-directional benchmark of a ternary tiny LLM against a vanilla FP32 transformer: a clear 60K-parameter win, a clear 944K-parameter loss, and why the reversal matters.
Engineering
How a ternary language model runs on ESP32 and STM32 through a heap-free C99 engine — and why bit-exact Python-to-C parity matters for shipping edge AI you can trust.
Use cases
A grounded look at what a kilobyte-class on-device language model is genuinely good for — command parsing, anomaly flags, intent routing — and three things it cannot do.
Architecture
Why the Atome C engine allocates nothing at runtime and talks to nothing, and what a zero-heap, air-gapped language model buys you for privacy, security and reliability.
Hardware guide
A per-chip fit guide for running a language model on a microcontroller, using Atome's measured RAM and flash budgets across STM32, RP2040 and ESP32-S3.
Guide
Privacy, latency, cost and reliability trade-offs between running a language model on the device and calling a cloud API — a practical guide for embedded product teams.
Opinion
A five-point test for any “runs on the edge” LLM claim — RAM fit, flash fit, heap-free, reproducible, measured not estimated — applied honestly, including to Atome.