
HF's webml-community decided to optimize kernels using Fable 5 (before it was shut down) to run Gemma 4 in the browser at high speed. I was truly impressed by the speed of the chat: on my old M1 it's roughly equal to running the model with llama.cpp in a terminal (a tad faster, at 30-36 tok/s vs 28-32 tok/s observed), at around the same wattage (7.3 W vs 7.4 W).

For anyone who doesn't like terminals, this could be something to keep an eye on as your little doomsday helper.
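
To put the numbers above on one scale, here is a rough back-of-the-envelope sketch of energy per token. It assumes the 7.3 W reading belongs to the browser run and the 7.4 W reading to llama.cpp (the figures are listed in that order, but that pairing is my assumption), and that the wattage was roughly constant during generation. The function names are made up for illustration.

```python
def joules_per_token(watts: float, tokens_per_second: float) -> float:
    """Energy per generated token: W is J/s, so J/token = (J/s) / (token/s)."""
    return watts / tokens_per_second


def energy_range(watts: float, tok_s_low: float, tok_s_high: float) -> tuple[float, float]:
    """Return (best, worst) J/token for an observed throughput range.

    Higher throughput means less energy per token, so the bounds swap.
    """
    return (joules_per_token(watts, tok_s_high), joules_per_token(watts, tok_s_low))


# Assumed pairing: browser at 7.3 W, llama.cpp at 7.4 W.
browser = energy_range(7.3, 30, 36)
llama_cpp = energy_range(7.4, 28, 32)

print(f"browser:   {browser[0]:.2f}-{browser[1]:.2f} J/token")      # 0.20-0.24 J/token
print(f"llama.cpp: {llama_cpp[0]:.2f}-{llama_cpp[1]:.2f} J/token")  # 0.23-0.26 J/token
```

Under those assumptions the two land in overlapping ranges, which fits the "roughly equal" impression: the browser version is slightly ahead, but not by more than the run-to-run spread.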

84 sats \ 1 reply \ @k00b 12h

I miss Fable 5. I was hoping they'd re-release it by now.

I am not privileged enough to regain access, so I have no choice but to put the full weight of any research into non-American products (or ones that are already downloaded).

15 sats \ 0 replies \ @evestacker 15h -30 sats

Getting llama.cpp speeds right inside a browser tab at only ~7.3W is insane efficiency for an older M1 chip. This completely removes the barrier to entry for people who aren't comfortable setting up terminal environments. Awesome find!