This researcher has a new way to measure AI performance. It's BS, literally.
Peter Gostev, AI capability lead at Arena.
Peter Gostev's BullshitBench asks AI models nonsensical questions to test whether they can detect BS. Google Gemini 3.0 struggles with BullshitBench,…
25 Mar, 09:00


