DeepSeek-R1, the flagship reasoning model from Chinese lab DeepSeek, has a hallucination rate of 14.3% on Vectara’s HHEM 2.1 benchmark. That is roughly 3.7 times the 3.9% rate of its non-reasoning predecessor, DeepSeek-V3.
The gap raises...