afaik if you're running the embedding model on a GPU, or quantized on a CPU, it shouldn't be super slow. But I also haven't run much of this stuff locally yet.
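
For what it's worth, a minimal sketch of that GPU-vs-CPU choice, assuming a sentence-transformers embedding model (the library, model name, and batch here are placeholders for illustration, not anything from this thread):

```python
# Hypothetical sketch: embed a batch on the GPU if one is available, else CPU.
# sentence-transformers and the model name are assumptions, not from the thread.
import time

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

texts = ["some text to embed"] * 256

start = time.perf_counter()
embeddings = model.encode(texts, batch_size=64, show_progress_bar=False)
elapsed = time.perf_counter() - start
print(f"{device}: embedded {len(texts)} texts in {elapsed:.2f}s, dim={embeddings.shape[1]}")
```

A small model like this should be quick on a GPU; on CPU you'd typically reach for a quantized variant to keep latency down.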

147 sats \ 2 replies \ @optimism 2h

I've been running it on Apple Metal; torch says it's using the NPU, but the Apple part is probably why it's such a mess.
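
In case it helps with debugging, a quick generic check of what torch can actually target on Apple silicon is to poke at the MPS (Metal) backend directly; this sketch isn't tied to any particular model:

```python
# Sketch: check whether this torch build can use Apple's Metal backend (MPS)
# and run a tiny op there. Purely illustrative.
import torch

print("MPS built into this torch build:", torch.backends.mps.is_built())
print("MPS available at runtime:", torch.backends.mps.is_available())

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)
y = x @ x  # the matmul runs on the Metal GPU when device == "mps"
print("ran matmul on:", y.device)
```

As far as I know, torch's MPS backend runs on the GPU rather than the Neural Engine, so an "NPU" report would be coming from some other layer.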

147 sats \ 1 reply \ @k00b OP 2h

We were only scratching the surface when I was in college, but everyone imagined inference would be much cheaper/more efficient than it ended up being.

If bigger=smarter holds forever, edge inference will always be relatively slow/dumb.

147 sats \ 0 replies \ @optimism 1h

Like with all things, that extrapolation of the upslope fails to consider that fun isn't infinite (I hate this fact of life). So there's a time when bigger=smarter, and a time when diminishing returns set in on how much smarter you get for your bigger, and at that equilibrium, suddenly smarter=smarter.

We'll get there.
