Atome Now Runs on a Physical ESP32 — Measured on Real Silicon

For months our honest line was the same: the Atome inference engine passed bit-exact parity in Python and under QEMU, but we had not yet flashed a physical board. We said, in writing, that when we did we would publish it with the same candor as everything else. This is that post. The 944K Atome model now runs on a real, $5 ESP32-WROOM-32 — generating coherent text, fully offline, with nothing but the chip and a USB cable.

What actually ran

The board is a classic ESP32-WROOM-32 (the dual-core ESP32-D0WD-V3, 4 MB flash, no PSRAM). It boots, loads the 944K checkpoint straight from flash, and generates on the chip with no network and no cloud:

config : d=256 layers=8 head=64 seq=24  state=159 KB   (cpu 240 MHz, flash 80 MHz)
Once     ->  upon a time, there
The dog  ->  was so excited.
A girl   ->  was so happy to h
average  :  1.0 tok/s   (953 ms/token)

Coherent, on-distribution TinyStories text, produced entirely on a microcontroller. The full boot log — chip ID, clocks, memory, generations — is committed in the repository, and the exact binary that produced it is in the release.
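If you capture the serial output yourself, a few lines of Python are enough to pull the figures out for comparison. The sketch below is written only against the excerpt shown above; the function names are ours, and the full committed log may contain other lines that this parser simply ignores.

```python
import re

# Sample lines copied from the boot log excerpt above.
LOG = """\
config : d=256 layers=8 head=64 seq=24  state=159 KB   (cpu 240 MHz, flash 80 MHz)
Once     ->  upon a time, there
The dog  ->  was so excited.
A girl   ->  was so happy to h
average  :  1.0 tok/s   (953 ms/token)
"""

def parse_config(line):
    """Return the key=value fields of a 'config :' line as a dict of strings."""
    return dict(re.findall(r"(\w+)=(\d+)", line))

def parse_average(line):
    """Return (tokens_per_second, ms_per_token) from an 'average :' line."""
    m = re.search(r"([\d.]+)\s*tok/s\s*\((\d+)\s*ms/token\)", line)
    if m is None:
        raise ValueError("not an average line: %r" % line)
    return float(m.group(1)), int(m.group(2))

def parse_log(text):
    """Collect config, prompt/completion pairs, and the average from a log."""
    result = {"config": {}, "generations": [], "average": None}
    for line in text.splitlines():
        if line.startswith("config"):
            result["config"] = parse_config(line)
        elif line.startswith("average"):
            result["average"] = parse_average(line)
        elif "->" in line:
            prompt, completion = line.split("->", 1)
            result["generations"].append((prompt.strip(), completion.strip()))
    return result

parsed = parse_log(LOG)
print(parsed["config"]["seq"], parsed["average"])
```

Values in the config dictionary are kept as strings so nothing is reinterpreted; convert them where you need numbers.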

Reproduce it yourself in two minutes

A claim about hardware is only worth what someone else can check. Grab the prebuilt merged image from the release, verify it against the published SHA256SUMS, flash it with one command, and watch:

pip install esptool pyserial
esptool.py --chip esp32 --port /dev/ttyUSB0 --baud 460800 write_flash 0x0 atome_esp32_merged.bin
python3 -m serial.tools.miniterm /dev/ttyUSB0 115200   # press the board's EN button

Verifying needs no ESP-IDF and no build toolchain. The firmware embeds the repository's own C99 engine, the same one with bit-exact Python parity, so what you flash is what we measured.
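The checksum step is not spelled out in the commands above, so here is a minimal sketch of it in Python. It assumes SHA256SUMS uses the usual `sha256sum` layout (a hex digest, whitespace, then the file name, one entry per line); that layout is our assumption, and the helper names are ours rather than anything shipped in the release.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so the whole image never has to sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def expected_digest(sums_path, filename):
    """Look up filename in a sha256sum-style file; return its digest or None."""
    with open(sums_path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            digest, name = parts
            # sha256sum marks binary-mode entries with a leading '*'.
            if name.strip().lstrip("*") == filename:
                return digest.lower()
    return None

def verify(image_path, sums_path, filename):
    """Return True when the image's SHA-256 matches the published entry."""
    expected = expected_digest(sums_path, filename)
    if expected is None:
        raise KeyError("no entry for %s in %s" % (filename, sums_path))
    return sha256_of(image_path) == expected
```

With both files downloaded into the current directory, `verify("atome_esp32_merged.bin", "SHA256SUMS", "atome_esp32_merged.bin")` should return `True` before you flash; if it returns `False`, download the image again.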

No PSRAM? A short window instead of a smaller brain

A bare ESP32 reports 369 KB of free RAM, but its largest contiguous block is only ~168 KB — classic ESP32 memory fragmentation. The 944K model's quality is independent of context length; only its RAM scales with it. So instead of swapping in a dumber model, we shrank the context window to 24 tokens, which brings the working state to 159 KB and fits. You get the full-quality model with shorter completions. A PSRAM board (ESP32-WROVER or ESP32-S3-R8) runs the full-length configuration; that is the next data point.
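The fit comes down to the largest contiguous block, not the total free figure. The short sketch below restates that arithmetic with the numbers from this post; it assumes the working state is allocated as a single contiguous buffer, and the 200 KB case is a purely hypothetical value included for contrast.

```python
FREE_TOTAL_KB = 369      # total free RAM reported by the board
LARGEST_BLOCK_KB = 168   # largest contiguous free block (approximate)
STATE_KB = 159           # working state at seq=24

def fits_contiguous(state_kb, largest_block_kb):
    """True when one allocation of state_kb fits in the largest free block."""
    return state_kb <= largest_block_kb

headroom_kb = LARGEST_BLOCK_KB - STATE_KB

print(fits_contiguous(STATE_KB, LARGEST_BLOCK_KB), headroom_kb)  # True 9

# Hypothetical 200 KB state: below the 369 KB total, yet it would not fit,
# because no single free block is that large.
print(200 <= FREE_TOTAL_KB, fits_contiguous(200, LARGEST_BLOCK_KB))  # True False
```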

What this is — and what it is not

This is a proof of execution: the model runs on real silicon, reproducibly, and you can check it. It is not a benchmark win and not a moat. About 1 token per second is in known territory: the throughput is flash-bound, since roughly 270 KB of ternary weights are streamed from SPI flash for every token. We have not run a same-chip, same-budget head-to-head against another microcontroller LM, and the favourable parts of Atome's story remain regime-dependent (they hold at the ~60K engine-default size, not at 944K). Calling this "the model runs on a chip" is true; calling it "fastest" or "best" would not be, and we make no such claim.
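For readers who want the figures in one place, the arithmetic behind them is short. This sketch uses only numbers quoted in this post; the last line assumes all 24 tokens of the window are produced at the average rate, which is a simplification on our part.

```python
MS_PER_TOKEN = 953           # measured average from the boot log
WEIGHTS_KB_PER_TOKEN = 270   # approximate weight data read per token
CONTEXT_TOKENS = 24          # context window on the no-PSRAM board

seconds_per_token = MS_PER_TOKEN / 1000

# Throughput implied by the per-token latency.
tokens_per_second = 1 / seconds_per_token

# Average rate at which weight data is consumed while generating.
weight_read_kb_per_s = WEIGHTS_KB_PER_TOKEN / seconds_per_token

# Time for 24 tokens at the average rate (simplifying assumption).
window_seconds = CONTEXT_TOKENS * seconds_per_token

print(f"{tokens_per_second:.2f} tok/s")      # 1.05 tok/s
print(f"{weight_read_kb_per_s:.0f} KB/s")    # 283 KB/s
print(f"{window_seconds:.1f} s")             # 22.9 s
```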

Why it still matters

It closes the one caveat that had hung over every number on this site: "QEMU only, no silicon." That sentence is now retired. A language model that lived as a simulation and a set of tests is, today, a thing you can hold, power from a phone charger, and watch write words with no internet attached. From here the work is concrete and measurable: the full-context configuration on a PSRAM board, energy per token on real hardware, and an honest same-chip comparison. We will publish each one the way we published this — with the binary, the log, and the caveats in the same breath.

Frequently asked questions

Does Atome run on a real ESP32 now, not just QEMU?

Yes. The 944K checkpoint runs on a physical ESP32-WROOM-32 (no PSRAM), generating coherent text fully offline at about 1 token per second. A prebuilt binary, the serial log, and a one-command flash are in the GitHub release so anyone can reproduce it.

How fast is Atome on a real ESP32?

About 1.0 token per second (~953 ms/token) at 240 MHz with 80 MHz flash. It is flash-bound — roughly 270 KB of weights are read per token. This is a proof-of-execution number, not a benchmark win.
