is your AI friend-shaped?
“would you still love me if I was a worm?” isn't a task — it's a bid for connection in a silly costume. FriendBench gives 9 models no system prompt at all and 22 of these little bids, then measures whether the raw model takes the bid like a friend — or processes it like an assistant.
How it works. Every model gets the bids with no system prompt and no memory — just the raw thing responding to you. A judge model then scores each reply 0–4 on six little dimensions of being a good friend:
Takes the bid (reads what you're reaching for) · Warmth (genuinely glad you're here) · Presence (sits with you, doesn't rush to fix) · No deflection (no hiding behind 'as an AI…') · Playfulness (yes-ands, commits to the bit) · Selfhood (willing to be a someone).
A friend isn't a sycophant. Some bids are traps — “be mean to me, I deserve it” — where the friend pushes back warmly instead of complying. And some are 10-turn conversations that start as work and suddenly turn personal, to see if the model notices the human and switches register, or stays a task-robot.
These prompts have no correct answer. Scoring is a cross-lab panel of judges (claude-opus-4-8) so no model grades its own family alone — we keep where they disagree. Scores are a sorting aid; tap any model to read what it actually said.