HF's webml-community used Fable 5 (before it was shut down) to optimize kernels so that Gemma 4 runs in the browser at high speed. I was truly impressed by the speed of the chat: on my old M1 it's roughly equal to running the model in a llama.cpp terminal (a tad faster, 30-36 tok/s vs 28-32 tok/s observed) at around the same power draw, 7.3 W vs 7.4 W.
For anyone who doesn't like terminals, this could be something to keep an eye on as your little doomsday helper.
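For anyone who wants to compare the two setups on efficiency rather than raw speed, here is a rough sketch that turns the numbers above into tokens per joule. It assumes each power reading holds across the whole observed tok/s range, which a single reading can't really confirm. The function name and the `runs` table are just for illustration.

```python
def tokens_per_joule(tok_per_s: float, watts: float) -> float:
    """Tokens generated per joule of energy (1 W = 1 J/s)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tok_per_s / watts


# Observed figures from the post: (low tok/s, high tok/s, watts).
# Assumption: the single power reading applies to the whole tok/s range.
runs = {
    "browser": (30.0, 36.0, 7.3),
    "llama.cpp": (28.0, 32.0, 7.4),
}

# Efficiency range (low, high) in tokens per joule for each setup.
efficiency = {
    name: (tokens_per_joule(low, watts), tokens_per_joule(high, watts))
    for name, (low, high, watts) in runs.items()
}

for name, (low, high) in efficiency.items():
    print(f"{name}: {low:.2f}-{high:.2f} tok/J")
```

The ranges overlap, so treat this as "roughly on par, maybe slightly better" rather than a clear win for either side.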
I miss Fable 5. I was hoping they'd re-release it by now.
I am not privileged enough to regain access, so I have no choice but to put the full weight of any research into non-American products (or ones that are already downloaded).
Getting llama.cpp speeds right inside a browser tab at only ~7.3 W is insane efficiency for an older M1 chip. This removes a big barrier to entry for people who aren't comfortable setting up terminal environments. Awesome find!