The same issue shows up in LLM coding: the `yOu'Re AbSoLuTeLy RiGhT!` syndrome, even when you're BS-ing the bot.

optimism

"We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks."  Hmmm...not ready for prime time use.

AI Agents vs Cybersecurity Professionals in Real-World Penetration Testing