
I don't have a use for this right now, but I'm a big fan of SQLite[1] and tiny monoliths that can be pulled apart if need be.

  1. mostly in theory because I've never really used it outside of toy projects

147 sats \ 6 replies \ @optimism 2h

I've been using Chroma for this.

147 sats \ 5 replies \ @k00b OP 1h

Ah, I hadn't heard of it, which shows how unfamiliar I am with these tools.

147 sats \ 4 replies \ @optimism 1h

It's probably not as fast as the one you linked, but the vector db isn't the bottleneck for me; most of the text embedding models are super slow on my machine, and I'm not entirely sure why yet.

This morning I used Chroma for embedding audio, with a cheap, old tokenizer, just to see if that actually works (it does, because apparently maffs don't give a shit whether a float comes from text, pictures, or audio).
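
A minimal sketch of that pattern with Chroma's Python client; the ids, metadata, and vectors here are made-up stand-ins for a real audio model's output:

```python
import chromadb

# In-memory client; use chromadb.PersistentClient(path=...) to keep data on disk.
client = chromadb.Client()
collection = client.create_collection("clips")

# Chroma accepts precomputed vectors, so it doesn't matter whether the
# floats came from a text, image, or audio embedding model.
audio_vectors = [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]]  # stand-in embeddings
collection.add(
    ids=["clip-1", "clip-2"],
    embeddings=audio_vectors,
    metadatas=[{"source": "mic"}, {"source": "file"}],
)

# Query with another vector; nearest-neighbor math is the same regardless of modality.
results = collection.query(query_embeddings=[[0.10, 0.20, 0.25]], n_results=1)
print(results["ids"])
```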

147 sats \ 3 replies \ @k00b OP 57m

afaik if you're running the embedding model on a GPU, or quantized on a CPU, it shouldn't be super slow. But I also haven't run much of this stuff locally yet.
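
To make that concrete: with sentence-transformers (just one common way to run these models locally, not necessarily what's in use here), the device is a one-line choice:

```python
from sentence_transformers import SentenceTransformer

# Model name is an example; pick device="cuda" for an NVIDIA GPU,
# "mps" on Apple silicon, or "cpu" (optionally with a quantized variant).
model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
vectors = model.encode(["a sentence to embed"], batch_size=32)
print(vectors.shape)  # (1, 384) for this model
```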

147 sats \ 2 replies \ @optimism 55m

I've been running it on Apple Metal; torch says it's using the NPU, but the Apple part is probably why it's such a mess.
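
Worth noting: torch's Apple backend is "mps" (Metal Performance Shaders), which runs on the GPU rather than the Neural Engine; a quick sanity check looks like:

```python
import torch

# Confirm what torch is actually targeting on Apple silicon.
print(torch.backends.mps.is_built())      # torch compiled with MPS support?
print(torch.backends.mps.is_available())  # Metal device usable right now?

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(2, 3, device=device)
print(x.device)  # mps:0 when the Metal backend is in use
```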

147 sats \ 1 reply \ @k00b OP 21m

We were only scratching the surface when I was in college, but everyone imagined inference would be much cheaper/more efficient than it ended up being.

If bigger=smarter forever, edge inference will always be relatively slow/dumb.

147 sats \ 0 replies \ @optimism 9m

Like with all things, that extrapolation of the upslope fails to consider that fun isn't infinite (I hate this fact of life). So there's a time when bigger=smarter, and there's a time when diminishing returns set in on how much smarter you get for your bigger, and at that equilibrium, suddenly smarter=smarter.

We'll get there.
