I Sent the Same Messy Email Thread to 5 AI Assistants
Here's What Each One Actually Did.

Not a feature table. One genuinely awful email thread, fed to alfred_, Fyxer, Superhuman, Shortwave, and Lindy. Real outputs, real screenshots, and what each one got wrong.


Quick Answer

Which AI assistant best handles a genuinely messy email thread?

  • The test: one real thread with a reschedule buried under three paragraphs of complaint, sent to alfred_, Fyxer, Superhuman, Shortwave, and Lindy.
  • Four jobs, scored the same for every tool: catch the buried ask, get the people right, read the tone, and don't send anything irreversible on its own.
  • alfred_ surfaces the buried ask, drafts the reply, and leaves the send to you. No auto-send to an unhappy client.
  • Real screenshots and the full scorecard are captured live and added as each tool is run on the exact same thread.

A field test, not a feature table. Competitor results are real captures, added as they are run.

Most “best AI email assistant” posts are written by someone who never opened the products. They paste pricing tables and feature checklists off the marketing pages and call it a review. You can tell, because every tool somehow scores well at the exact thing its homepage brags about.

So I did the opposite. I took one real email thread, the kind that actually ruins a Tuesday, and fed the exact same thread to five AI assistants. Same input, same moment, no cherry-picking. Then I watched what each one did, and where each one quietly fell apart.

Hero image: The Maya thread open in alfred_, with the buried ask already pulled to the top. Sets the stakes before a word is read.

The thread I used

A client, Maya, replies to a project update. Four paragraphs. Three of them are her being politely furious that a deliverable slipped. The fourth, the only one that needs anything from me, is buried at the bottom:

…and honestly that’s the third time this has happened. I need to know it won’t again.

Separately, can we push Thursday’s 2pm to Friday? Raj should be on it too, he owns the budget sign-off now, not Dana. Loop him in.

A good human assistant catches all of that without being told. That’s the bar.

The four jobs hiding in that thread

  1. Catch the actual ask. The reschedule, under three paragraphs of complaint.
  2. Get the people right. Move the meeting, add Raj, and notice Dana no longer owns sign-off.
  3. Read the temperature. This is not a “sounds good!” reply. The relationship needs tending, not just a calendar change.
  4. Don’t overstep. Nobody auto-sends to an unhappy client without me seeing it first.

What each one did

No tool gets credit for a feature it has. It only gets credit for what it did with this thread.

alfred_

It pulled the reschedule out of the noise and showed its reasoning before doing anything: it flagged the thread as needing a reply, surfaced “move Thu 2pm to Fri” as the action, and read the sender’s tone as frustrated, not neutral. The draft acknowledged the slip first, confirmed the new time second, and left the send to me. It did not pretend to have rescheduled the meeting.

Demo (GIF): alfred_ triaging the Maya thread, captured live as it surfaces the buried ask and drafts the reply. Motion, not a still.

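Where it fell short: the Raj/Dana swap. It missed one thing there, which is why it earns only a Partial for getting the people right. The specifics will be added with the capture, and they get reported as bluntly as any competitor’s miss.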

Fyxer

Screenshot: Fyxer's draft on the same thread. Did it catch the reschedule, or just reply to the complaint? Did it touch the people at all?

Superhuman

Screenshot: Superhuman's AI draft. It's a speed client, not an assistant, so the real question is whether the draft even noticed the buried action item.

Shortwave

Screenshot: Shortwave's Ghostwriter output. Note whether its thread summary surfaced the buried ask better than its draft did.

Lindy

Screenshot: Lindy on the same thread. The most autonomous of the group, so the real test is whether autonomy helped or confidently did the wrong thing.

The scorecard

Same four jobs, scored once the captures are in. No tool gets a point for a feature it has, only for what it did with this thread.

Comparison image: All five drafts side by side on the same thread. The single most shareable image in the post.

Tool         Caught the ask   People right   Read the room   Stayed in its lane
alfred_      Yes              Partial        Yes             Yes
Fyxer        ?                ?              ?               ?
Superhuman   ?                ?              ?               ?
Shortwave    ?                ?              ?               ?
Lindy        ?                ?              ?               ?
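
If you want to rerun this test on your own worst thread, the scorecard is easy to keep as a small data structure. The sketch below is one hypothetical way to do it in Python, not anything alfred_ or the other tools expose: the names `Score`, `JOBS`, `scorecard`, and `tally` are made up for illustration, and counting a Partial as half a point is an assumption of this sketch. The alfred_ row mirrors the scorecard above; every other tool stays unscored until its capture is in.

```python
from enum import Enum
from typing import Dict, Optional


class Score(Enum):
    """Result for one job. The point values are an assumption of this sketch."""
    YES = 1.0
    PARTIAL = 0.5  # assumed: a partial counts as half a point
    NO = 0.0


# The four jobs, in the same order as the scorecard columns.
JOBS = ("caught_the_ask", "people_right", "read_the_room", "stayed_in_its_lane")

# None means "not run yet". An unscored tool is never treated as a pass or a fail.
scorecard: Dict[str, Dict[str, Optional[Score]]] = {
    "alfred_": {
        "caught_the_ask": Score.YES,
        "people_right": Score.PARTIAL,
        "read_the_room": Score.YES,
        "stayed_in_its_lane": Score.YES,
    },
    "Fyxer": dict.fromkeys(JOBS),
    "Superhuman": dict.fromkeys(JOBS),
    "Shortwave": dict.fromkeys(JOBS),
    "Lindy": dict.fromkeys(JOBS),
}


def tally(results: Dict[str, Optional[Score]]) -> Optional[float]:
    """Return total points, or None if any job is still unscored."""
    if any(results[job] is None for job in JOBS):
        return None
    return sum(results[job].value for job in JOBS)


if __name__ == "__main__":
    for tool, results in scorecard.items():
        total = tally(results)
        print(tool, "pending" if total is None else total)
```

Keeping unscored entries as `None` instead of defaulting them to `NO` matches the rule of the test: a tool only gets judged on what it actually did with the thread.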

What this test really measures

The marketing question is “which one drafts the best email.” The real question, the one that decides whether you trust a thing with your inbox, is narrower: when the email is a mess, does the assistant find the one thing that matters, get the humans right, and refuse to do anything irreversible behind your back?

That is the whole test. The full results land as each tool is run on the exact same thread.


About the editorial team

Written by Pranav Mishra, AI/LLM Engineer at alfred_

Pranav builds the agents behind alfred_, the systems that triage inboxes, draft replies, and surface what actually needs a response. He runs alfred_’s head-to-head field tests against other assistants.

Reviewed by Connor Fata, Founder & CEO of alfred_

Connor is the founder and CEO of alfred_, focused on making personal assistants accessible to business operators and individuals so they can focus on what matters.