AI Explained

What Is an AI Meeting Assistant? (Recording, Transcription, Summarization Explained)

AI meeting assistants combine recording, transcription, summarization, and action extraction: four distinct capabilities often bundled into one tool. Here's what each does and what to look for.

7 min read
Quick Answer

What is an AI meeting assistant?

  • Recording captures audio/video: infrastructure, not AI. Every meeting assistant does this.
  • Transcription converts speech to text using ASR models (85–95% accuracy for standard English)
  • Summarization uses an LLM to compress the transcript into structured notes covering topics and key points
  • Action extraction is the hardest: identifying what was decided, who owns it, and when it is due

The Meeting Problem

Meetings consume a disproportionate share of organizational time by any measure. Knowledge workers attend an average of 11–25 meetings per week depending on seniority and role. The average employee spends 31 hours per month in meetings, with approximately 50% of that time considered wasted across multiple surveys (Atlassian, HBR, and others). Atlassian’s survey of 5,000 workers across five countries found meetings were considered “pointless almost 75% of the time.” HBR found 71% of senior managers call meetings unproductive and inefficient.

The AI meeting assistant market reflects the scale of this problem, though estimates vary widely by report: one values the market at $1.6 billion in 2024 and projects $6.2 billion by 2033 at a 21.3% CAGR, while another projects the AI note-taker sub-market alone at $11.32 billion by 2030 at an 11.53% CAGR. These are large numbers because the problem is large, and because the category is genuinely solving something, even if it is solving it imperfectly.

Where AI meeting assistants have the most traction is not the meeting itself; it's the work around the meeting. The 20 minutes of prep most people skip because they don't have time. The 30-minute follow-up email most people delay for two days because writing it requires reconstructing the meeting from memory. These are the inefficiencies AI addresses most reliably.

31 hours/month in meetings

The average employee spends 31 hours per month in meetings, roughly four full working days, and approximately 50% of that time is considered wasted. AI meeting assistants address the preparation and follow-up overhead around this time, not the meeting itself.

Multiple meeting management research sources, 2024; Atlassian survey of 5,000 workers; Harvard Business Review.

What an AI Meeting Assistant Actually Means

The category name bundles four distinct capabilities that are often treated as a single product but represent fundamentally different levels of AI sophistication. Understanding the distinction between these four functions is the most useful thing you can do before evaluating tools.

Recording is audio and video storage. Every AI meeting assistant does this. It is infrastructure, not intelligence. The recording is the raw material that the other three functions process. Recording itself requires no AI.

Transcription is the conversion of recorded audio to text. This uses automatic speech recognition (ASR) models, the same technology that powers voice assistants and live captions. Transcription accuracy for English in controlled conditions now reaches 85–95% for standard accents and vocabulary. Accuracy drops for non-native accents, technical jargon, overlapping speakers, and poor audio quality. The transcript is a complete record of what was said, not what was decided. That distinction matters enormously for practical use.

Summarization uses a large language model (LLM) to compress the transcript into a structured summary. Given the full text of a 60-minute meeting, the LLM generates a shorter document covering the main topics discussed, key points made, and overall arc of the conversation. This is genuinely useful. A 3-paragraph summary of a 60-minute meeting is far more actionable than a 15,000-word transcript, but the summary reflects what was discussed, which may differ significantly from what was decided.

Action extraction is the hardest function and the most valuable: identifying specific decisions made, action items assigned, owners named, and deadlines stated, and distinguishing these from the surrounding discussion. This requires the AI to understand not just what was said but what it meant: that “I’ll handle the client proposal” is an action item, that “we should probably look at this” is not, and that “by end of month” means something specific. Action extraction is where current AI tools most often fall short and where human review remains most important.
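To make the distinction concrete, here is a minimal sketch of the parsing step that follows the LLM call in an action-extraction workflow. The JSON schema (task/owner/due) and the canned reply are assumptions for illustration, not any particular tool's real output format:

```python
import json

# Illustrative sketch: parsing a structured action-item reply from an LLM.
# A real tool would validate against its own schema; this shows only the
# idea of filtering out ownerless "we should probably..." items.

RAW_REPLY = """
{"action_items": [
  {"task": "Send the client proposal", "owner": "Alex", "due": "end of month"},
  {"task": "Look at this sometime", "owner": null, "due": null}
]}
"""

def parse_action_items(raw: str) -> list[dict]:
    """Keep only items with an explicit owner.

    Unowned statements are exactly the ones extraction most often
    gets wrong, so they are flagged for human review, not auto-kept.
    """
    data = json.loads(raw)
    return [item for item in data.get("action_items", []) if item.get("owner")]

items = parse_action_items(RAW_REPLY)
```

The filtering choice mirrors the point above: "I'll handle the client proposal" survives because it has an owner, while a vague "we should probably look at this" does not.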

How AI Meeting Assistants Work

The typical technical pipeline: audio is recorded via a bot that joins the video call (Zoom, Teams, or Google Meet), or via integration with the meeting platform’s native recording API. The audio is processed by an ASR model that produces a timestamped transcript, with each spoken segment attributed to a speaker (speaker diarization). The transcript is then passed to an LLM with a structured prompt: summarize this meeting, extract decisions, list action items with owners and deadlines.
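The shape of this pipeline can be sketched in a few lines. Everything here is illustrative: `TranscriptSegment` and `build_prompt` are hypothetical names, and a real system would call an ASR model and an LLM rather than hard-coding segments:

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    speaker: str      # from diarization, e.g. "Sarah" or "Speaker 2"
    start_sec: float  # timestamp produced by the ASR model
    text: str         # recognized speech for this segment

def build_prompt(segments: list[TranscriptSegment]) -> str:
    """Flatten a diarized, timestamped transcript into a structured LLM prompt."""
    lines = [f"[{s.start_sec:06.1f}] {s.speaker}: {s.text}" for s in segments]
    return (
        "Summarize this meeting, extract decisions, and list action items "
        "with owners and deadlines.\n\nTranscript:\n" + "\n".join(lines)
    )

# In a real pipeline these segments come from the ASR + diarization stage.
segments = [
    TranscriptSegment("John", 12.4, "I'll send the updated contract to Sarah by Friday."),
    TranscriptSegment("Sarah", 18.9, "Great, and we should think about the renewal terms."),
]
prompt = build_prompt(segments)
```

The key structural point is that the LLM never hears audio: it sees only this flattened text, so any diarization or transcription error upstream is baked into everything downstream.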

Speaker diarization, meaning knowing who said what, is a distinct challenge from transcription. It requires either integrating with the meeting platform’s speaker identification (which works when participants are logged in with their accounts) or using acoustic models to distinguish voices (which is less reliable for participants with similar vocal characteristics or poor microphone quality).

The quality of action extraction depends heavily on how explicitly action items were stated during the meeting. If someone says, “John, can you send the updated contract to Sarah by Friday?” that is unambiguous. If someone says, “We should think about the contract situation,” the AI may extract this as an action item, a discussion point, or miss it entirely. Meetings where commitments are made explicitly and directly produce better AI action extraction than meetings where decisions are implied or discussed indirectly.
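One crude way to see why explicit phrasing is easier to extract, sketched with pattern matching rather than the LLM a real tool would use (the patterns below are illustrative assumptions, not production logic):

```python
import re

# An explicit commitment tends to contain a named owner and a deadline
# phrase, both of which can be pattern-matched. An implied decision
# ("we should think about...") contains neither signal.

DEADLINE = re.compile(r"\bby (Friday|Monday|end of (?:month|week))\b", re.I)
OWNER = re.compile(r"^([A-Z][a-z]+), can you\b|\bI'll\b")

def looks_explicit(utterance: str) -> bool:
    """True when an utterance names both an owner and a deadline."""
    return bool(DEADLINE.search(utterance)) and bool(OWNER.search(utterance))

looks_explicit("John, can you send the updated contract to Sarah by Friday?")  # matches both
looks_explicit("We should think about the contract situation.")                # matches neither
```

An LLM generalizes far beyond these two patterns, but the underlying asymmetry is the same: the more explicitly a commitment is worded, the more signal there is to extract.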

What AI Meeting Assistants Can Do

What AI Meeting Assistants Still Can’t Do

How to Evaluate an AI Meeting Assistant

Where alfred_ Fits

alfred_’s meeting assistance is positioned differently from transcription-first tools. Rather than joining meetings as a bot and recording the session, alfred_ focuses on the edges of the meeting: the preparation before and the follow-up after, where the most recoverable time lives.

Meeting prep in alfred_ is informed by your email: because alfred_ reads the communications surrounding each meeting, it can surface what was discussed in the last meeting with this person, what emails are relevant to the agenda, and what open action items are still outstanding. A transcript-only tool cannot generate this, because it only sees what happens inside the meeting, not the surrounding communication.

Post-meeting follow-up drafting in alfred_ uses the same email-plus-calendar context to produce a follow-up email that reflects both what happened in the meeting and the broader relationship history, not just a summary of the transcript. For executives whose meetings are embedded in ongoing relationships and multi-week email threads, this context-aware follow-up is more useful than a generic transcript summary.

Try alfred_

Try alfred_ free for 30 days

AI-powered leverage for people who bill for their time. Triage email, manage your calendar, and stay on top of everything.

Get started free

Frequently Asked Questions

Do AI meeting assistants work for in-person meetings, or only video calls?

Most AI meeting assistants are optimized for video calls. They join via a bot (Otter.ai, Fireflies) or integrate with the platform's recording API (Zoom, Teams, Google Meet). For in-person meetings, the options are more limited: some tools offer mobile recording apps that use your phone's microphone, which works but produces lower-quality audio than video call setups. Speaker diarization, which attributes speech to specific individuals, is also significantly harder for in-person recordings where participants are not identified by their login credentials. If in-person meeting documentation is your primary use case, evaluate tools specifically on their in-person recording quality before committing.

Will meeting participants know they're being recorded?

In most jurisdictions, recording a conversation without the consent of all participants is legally problematic. Most enterprise AI meeting tools display a notification to all participants when a recording bot joins, and reputable tools require acknowledgment from all attendees. This is both a legal requirement in many U.S. states (all-party consent states) and best practice for organizational trust. The practical implication: if you're using an AI meeting assistant in a context where participants might be uncomfortable with recording, such as a sensitive HR conversation, a negotiation, or a confidential client call, you need an explicit consent workflow, not just a notification banner.

How accurate is AI action item extraction in practice?

Honest answer: it varies considerably based on how the meeting was run. For well-structured meetings where actions are explicitly stated, such as 'Sarah will send the contract by Friday' or 'James owns the Q3 forecast review,' AI extraction accuracy is high enough to be a useful first draft that requires light human review. For free-flowing discussions where decisions are made implicitly or through inference, AI extraction will miss items and misattribute ownership. The practical workflow that works: use AI extraction as a starting point, review it against your own recollection of the meeting, and edit before sending the follow-up. This compresses the task from 30 minutes of writing to 5 minutes of review, which is the realistic value proposition, not fully autonomous action item capture.