Atome lm Blog — Tiny LLMs, Edge AI & On-Device Inference

Milestone

Atome Now Runs on a Physical ESP32 — Measured on Real Silicon

No longer QEMU-only: the 944K-parameter model runs on a real ESP32-WROOM-32, fully offline, at ~1 tok/s, with a prebuilt binary, a serial log and a one-command flash you can verify yourself. Honest scope: a proof of execution, not a benchmark win.
Edge AI

How to Run an LLM on a Microcontroller (What Actually Fits in 256 KB)

Most “tiny” LLMs need megabytes of RAM. Here is the real memory math for running a language model on an STM32, RP2040 or ESP32 — and what actually fits in 256 KB.
Comparison

Best Tiny LLM for a $2 MCU: TinyLlama vs llama2.c vs TinyMaix vs Atome

A side-by-side comparison of TinyLlama, llama2.c, TinyMaix and Atome against real microcontroller RAM and flash budgets — with an honest verdict on what fits a $2 MCU.
Explainer

What Is a Ternary LLM? BitNet 1.58-bit Weights Explained

Ternary weights let a trained language model fit in flash. Here is what 1.58-bit BitNet quantization means, why it is fast on a microcontroller, and what it costs in accuracy.
Honest results

Tiny LLM Benchmark: Where a Ternary Model Beats a Vanilla Transformer — and Where It Loses

A two-directional benchmark of a ternary tiny LLM against a vanilla FP32 transformer: a clear 60K-parameter win, a clear 944K-parameter loss, and why the reversal matters.
Engineering

Running an LLM on ESP32 and STM32 with a Bit-Exact C Engine

How a ternary language model runs on ESP32 and STM32 through a heap-free C99 engine — and why bit-exact Python-to-C parity matters for shipping edge AI you can trust.
Use cases

On-Device LLM Use Cases: 5 Things a Microcontroller AI Can Do Today

A grounded look at what a kilobyte-class on-device language model is genuinely good for — command parsing, anomaly flags, intent routing — and three things it cannot do.
Architecture

Offline, Air-Gapped LLM: Edge AI With No Cloud and No Network

Why the Atome C engine allocates nothing at runtime and talks to nothing, and what a zero-heap, air-gapped language model buys you in privacy, security and reliability.
Hardware guide

Which Microcontrollers Can Run an LLM? STM32, RP2040 & ESP32-S3 Guide

A per-chip fit guide for running a language model on a microcontroller, using Atome's measured RAM and flash budgets across STM32, RP2040 and ESP32-S3.
Guide

On-Device vs Cloud LLM: An Edge-AI Decision Guide for Embedded Products

Privacy, latency, cost and reliability trade-offs between running a language model on the device and calling a cloud API — a practical guide for embedded product teams.
Opinion

Edge-AI Checklist: Does Your “Tiny LLM” Actually Run on a Microcontroller?

A five-point test for any “runs on the edge” LLM claim (RAM fit, flash fit, heap-free, reproducible, measured rather than estimated), applied honestly, including to Atome.