I do see a LOT of errors, yeah. But from my observations it's very good at self-correcting and fixing its mistakes.
I see this too (it's better trained at this than Gemma), and I do think the self-correction works more often than not, though what a waste of compute! Running grep -i wait on the thinking blocks is still "fun", too.
But this project is quite simple: it doesn't need much specialized technical knowledge.
My main use case nowadays is feeding LLMs file diffs and strace logs, mostly of third-party code from npm/cargo/pubdev/mvn/pypi, to help me make security assessments. Thanks to LLMs, this now takes me only one day a week instead of four last year, while handling about 5x the workload and a 100x increase in threats.
Did you try Pi Agent?
Nope. The problem is that I'm swamped, but I'll try to find a moment to test your setup.