Which Microcontrollers Can Run an LLM? STM32, RP2040 & ESP32-S3 Guide

Whether a model fits a microcontroller is a memory question with a yes-or-no answer, not a matter of opinion. Here is the per-chip fit guide across five common parts, using Atome's measured configurations from a real Cortex-M3 build under QEMU. Use it to pick the cheapest part that runs the configuration your task needs.

The chips

MCU	SRAM	Flash	~Price	Largest Atome config that fits
STM32F103 (Blue Pill)	20 KB	128 KB	$2–4	nano / classifier (~14 KB RAM)
RP2040 (Pico)	264 KB	2 MB	$4	tinystories 64-dim (~104 KB RAM)
STM32F411 (Nucleo)	128 KB	512 KB	$15	tinystories 64-dim (~104 KB RAM; 128-dim is RAM-bound)
STM32F7	512 KB	2 MB	$15–30	prod_1m 256-dim (~412 KB RAM)
ESP32-S3	512 KB	4 MB	$5–10	prod_1m 256-dim (~412 KB RAM)

How to read it

RAM is almost always the binding constraint, not flash. A 20 KB-SRAM chip such as the Blue Pill runs the small classifier configurations; the 64-dimension story model (about 104 KB of RAM) needs a part in the 128 to 264 KB class, such as the STM32F411 or the RP2040; and the 944K “prod” configuration (prod_1m in the table) needs a 512 KB part like an STM32F7 or an ESP32-S3. Flash is comfortable on all of these, because the packed ternary weights plus the engine are only tens to a few hundred kilobytes, so when a configuration does not fit, it is SRAM that runs out first.
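The reading rule above can be sketched as a small lookup. This is an illustrative script, not part of Atome: the dictionaries simply restate the approximate figures from the table, and the function name `largest_fitting_config` is made up for this example. The 128-dim configuration is left out because its peak RAM is not listed here, and the check ignores application headroom.

```python
# Approximate peak RAM per configuration in KB, restated from the table above.
PEAK_RAM_KB = {
    "nano / classifier": 14,
    "tinystories 64-dim": 104,
    "prod_1m 256-dim": 412,
}

# SRAM per part in KB, restated from the same table.
SRAM_KB = {
    "STM32F103": 20,
    "RP2040": 264,
    "STM32F411": 128,
    "STM32F7": 512,
    "ESP32-S3": 512,
}


def largest_fitting_config(sram_kb):
    """Return the largest listed config whose peak RAM is below sram_kb, or None."""
    fitting = [(ram, name) for name, ram in PEAK_RAM_KB.items() if ram < sram_kb]
    return max(fitting)[1] if fitting else None


# Raw fit only: no allowance for the application's own RAM.
for part, sram in SRAM_KB.items():
    print(f"{part}: {largest_fitting_config(sram)}")
```

Flash does not appear in the sketch at all, which mirrors the point of this section: for these parts the SRAM comparison is the one that decides the outcome.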

Choosing a part for your task

Wake-word or single-command routing, lowest cost: a 20 KB STM32F103 running the classifier config.
Richer narrow-domain text on a hobby board: an RP2040 (Pico) at 264 KB.
The full 944K model with headroom: an ESP32-S3 or STM32F7 at 512 KB.

A caveat on the numbers

These are QEMU measurements of a Cortex-M3 build, not bench results on each specific part. Treat them as accurate memory accounting — flash and peak RAM per configuration — rather than as silicon throughput. We have not yet published on-device tokens-per-second or power figures, and we will not present QEMU numbers as if they were silicon. The exact per-config figures are in the repository's RAM_TABLE.md, regenerable with the bundled measurement script.

A sizing workflow you can follow

Picking a chip does not have to be guesswork. Start from the task and work down to the part in four steps. First, decide the breadth: is this a few-class classifier, a narrow continuation, or something wider? Second, choose the smallest Atome configuration that clears your accuracy bar — bigger configurations cost RAM, so do not over-provision. Third, read the peak RAM and flash for that configuration straight from the measured table. Fourth, pick the cheapest part whose SRAM exceeds the peak RAM with a little headroom for your own application code. Because RAM is the binding constraint, the SRAM column is the one that decides the chip; flash is almost always comfortable.
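Steps three and four of the workflow can be sketched in a few lines. Treat this as a sketch under stated assumptions: the `Part` type and `cheapest_part` function are hypothetical names, prices use the low end of the table's ranges, and `app_ram_kb` is whatever headroom you decide your own firmware needs.

```python
from typing import NamedTuple, Optional


class Part(NamedTuple):
    name: str
    sram_kb: int
    price_usd: float  # assumption: low end of the table's approximate price range


PARTS = [
    Part("STM32F103", 20, 2.0),
    Part("RP2040", 264, 4.0),
    Part("STM32F411", 128, 15.0),
    Part("STM32F7", 512, 15.0),
    Part("ESP32-S3", 512, 5.0),
]


def cheapest_part(peak_ram_kb: int, app_ram_kb: int) -> Optional[Part]:
    """Cheapest listed part whose SRAM covers engine peak RAM plus application RAM."""
    needed = peak_ram_kb + app_ram_kb
    candidates = [p for p in PARTS if p.sram_kb >= needed]
    return min(candidates, key=lambda p: p.price_usd) if candidates else None


# Example: the 64-dim model (~104 KB) plus a hypothetical 16 KB for the application.
print(cheapest_part(104, 16))
```

Returning `None` when nothing fits is deliberate: it signals that you should step down a configuration rather than stretch the part.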

Headroom and the rest of your firmware

One mistake worth avoiding is sizing the chip to the model and forgetting that your application also needs RAM. The peak-RAM figures in the table are for the inference engine; your firmware has its own stack, buffers, drivers and state on top. A safe rule is to leave a comfortable margin between the engine's peak RAM and the part's total SRAM — enough for your application's working set plus a safety buffer. That is why the table flags the 128 KB STM32F411 as a good fit for the 64-dimension model but tight for the 128-dimension one: on paper the larger model might squeeze in, but once you add real application code the headroom disappears. Choosing one size down is usually the cheaper, calmer decision.
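To make the headroom arithmetic concrete, here is a minimal RAM budget for the STM32F411 case. Only the 128 KB SRAM and the roughly 104 KB engine peak come from the table; every application figure below is a hypothetical placeholder to be replaced with numbers from your own build.

```python
def remaining_after_engine(sram_kb, engine_peak_kb):
    """SRAM left for the application once the engine's peak RAM is reserved."""
    return sram_kb - engine_peak_kb


# Hypothetical application working set in KB (not measured values).
app_budget_kb = {
    "stack": 4,
    "io_buffers": 8,
    "drivers_and_state": 6,
}
safety_margin_kb = 4  # arbitrary illustrative buffer

left = remaining_after_engine(128, 104)  # STM32F411 with the 64-dim model
needed = sum(app_budget_kb.values()) + safety_margin_kb

print(f"left for application: {left} KB, application needs: {needed} KB")
print("fits" if needed <= left else "does not fit")
```

With these placeholder numbers the application needs 22 KB of the 24 KB left over, which shows how little slack a 128 KB part has even with the 64-dimension model.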

Beyond the five chips in the table

The five parts in the table are common reference points, but the underlying rule generalizes to any microcontroller you might consider. Because the engine is portable C99 with no operating-system or accelerator dependency, the only questions that decide compatibility are whether the part's SRAM exceeds the configuration's peak RAM with headroom for your application, and whether its flash holds the packed weights plus the engine. A Nordic nRF52840, an STM32L4 in a low-power design, or a larger ESP32 variant can all be evaluated with the same two-column check: SRAM against peak RAM, flash against packed size. That portability is deliberate — the engine makes as few assumptions about the host as possible — so the table is a worked set of examples rather than a closed list, and you can slot a new part into the same reasoning without guessing.
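As a sketch of the two-column check on a part outside the table, the snippet below evaluates an nRF52840 (256 KB RAM, 1 MB flash). The function name `two_column_check` is hypothetical, and the 300 KB packed size is a placeholder chosen from the "tens to a few hundred kilobytes" range mentioned earlier, not a measured figure; take the real value from the repository's measurements.

```python
def two_column_check(sram_kb, flash_kb, peak_ram_kb, packed_kb, app_ram_kb=0):
    """Return (ram_ok, flash_ok) for one part against one configuration."""
    ram_ok = sram_kb >= peak_ram_kb + app_ram_kb
    flash_ok = flash_kb >= packed_kb
    return ram_ok, flash_ok


# nRF52840: 256 KB RAM, 1024 KB flash.
# packed_kb=300 is a placeholder for weights plus engine, not a measurement.
NRF52840 = {"sram_kb": 256, "flash_kb": 1024}

# 64-dim model (~104 KB peak RAM) with a hypothetical 32 KB for the application.
print(two_column_check(**NRF52840, peak_ram_kb=104, packed_kb=300, app_ram_kb=32))

# prod_1m (~412 KB peak RAM): flash is fine, SRAM is not.
print(two_column_check(**NRF52840, peak_ram_kb=412, packed_kb=300))
```

Keeping the two results separate makes it obvious which column failed, and for the parts discussed here that is expected to be SRAM.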

Bottom line

Whether a microcontroller can run a language model is a memory question with a yes-or-no answer: does the SRAM exceed the configuration's peak RAM with headroom, and does the flash hold the packed weights plus the engine? On that basis a $2–4 STM32F103 runs the small classifier configs, an RP2040 runs the 64-dimension model, and a 512 KB ESP32-S3 or STM32F7 runs the full 944K model. The table is a set of worked examples, not a closed list; the same two-column check generalizes to any part. Just remember that the figures are QEMU Cortex-M3 measurements: accurate memory accounting, not silicon throughput.

Frequently asked questions

What is the cheapest microcontroller that can run a language model?

An STM32F103 “Blue Pill” at about $2–4 runs Atome's small classifier configuration in roughly 14 KB of RAM. For richer text you want an RP2040 (about $4) or larger.

Can a Raspberry Pi Pico (RP2040) run an LLM?

Yes — its 264 KB of SRAM fits Atome's 64-dimension model at about 104 KB of RAM. It cannot fit the 944K “prod” config, which needs a 512 KB part.
