The claim, stated with its context, because a number without one is noise. On LongMemEval-S, graded by the official per-type judge (deepseek-v4-pro), simba scores 0.823 with a deepseek-v4-flash answerer. The strongest comparable system I could re-run on the identical judge, hebb-mind, scores 0.7927 (its own raw outputs …