A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
arxiv.org/abs/2508.18106
32 sats \ 0 comments \ @optimism \ 1 Sep \ AI
related
MCP-Bench: Benchmarking Tool-Using LLM Agents
arxiv.org/abs/2508.20453
239 sats \ 0 comments \ @optimism \ 30 Aug \ AI
Are LLMs Racist?
461 sats \ 11 comments \ @Tony \ 23 Oct \ AI
Claude 3.5 Sonnet
www.anthropic.com/news/claude-3-5-sonnet
411 sats \ 0 comments \ @k00b \ 21 Jun 2024 \ tech
LLM Rankings: programming | OpenRouter
openrouter.ai/rankings/programming
96 sats \ 0 comments \ @m0wer \ 28 May \ tech
AI is actually bad at math, ORCA shows
www.theregister.com/2025/11/17/ai_bad_math_orca/
167 sats \ 4 comments \ @0xbitcoiner \ 18 Nov \ AI
Gemini 3 and Antigravity: Why Google's latest AI releases are a big deal
fortune.com/2025/11/19/google-gemini-3-antigravity-ai-explained/?utm_source=flipboard&utm_content=fortune/magazine/Personal+finance
131 sats \ 1 comment \ @DrBrader99 \ 19 Nov \ AI
Qwen3-235B-A22B-2507
xcancel.com/Alibaba_Qwen/status/1947344511988076547
218 sats \ 0 comments \ @m0wer \ 24 Jul \ AI
AI agents find $4.6M in blockchain smart contract exploits
red.anthropic.com/2025/smart-contracts/
259 sats \ 2 comments \ @0xbitcoiner \ 2 Dec \ AI
"Benchwashing" - how do you defend against this?
1648 sats \ 10 comments \ @optimism \ 9 Aug \ AskSN
Alibaba has released its flagship Qwen3-Max model with a trillion parameters
chat.qwen.ai/
167 sats \ 0 comments \ @lunin \ 25 Sep \ AI
Introducing Claude Opus 4.5
www.anthropic.com/news/claude-opus-4-5
418 sats \ 0 comments \ @lunin \ 25 Nov \ AI
GDPval: Measuring the performance of our models on real-world tasks - OpenAI
openai.com/index/gdpval/
358 sats \ 8 comments \ @Scoresby \ 2 Oct \ AI
Boring is good
jenson.org/boring
231 sats \ 0 comments \ @deSign_r \ 9 Oct \ Design
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
arxiv.org/abs/2509.03867
306 sats \ 0 comments \ @optimism \ 7 Sep \ AI
Wairdle
1133 sats \ 4 comments \ @crrdlx \ 9 Aug \ AI
Opti's Claude 4.5 Sonnet "vibe coding" report
1125 sats \ 13 comments \ @optimism \ 5 Oct \ AI
My lived experience writing with ChatGPT
567 sats \ 10 comments \ @realBitcoinDog \ 15 Apr \ BooksAndArticles
The flagship model, Qwen3-Max-Preview, has been released
100 sats \ 0 comments \ @lunin \ 5 Sep \ AI
Vals AI — Finance Agent Benchmark
www.vals.ai/benchmarks/finance_agent-04-22-2025?utm_campaign=wp_the_technology_202&utm_medium=email&utm_source=newsletter
54 sats \ 3 comments \ @BlokchainB \ 24 Apr \ AI
Adversarial Confusion Attacks: Disrupting Multimodal LLMs - Jakub Hoscilowicz
www.researchgate.net/publication/396235412_Adversarial_Confusion_Attacks_Disrupting_Multimodal_LLMs
146 sats \ 0 comments \ @Scoresby \ 6 Oct \ AI
pylint MCP provider
1428 sats \ 6 comments \ @optimism \ 4 Jun \ builders