When transcribing meetings or interviews, you often need to record "who said what." There are two technologies for distinguishing speakers — speaker diarization and speaker identification — but they're frequently confused.
This article clarifies the difference and explains how to practically record "who said what" without needing a voice enrollment database.
Speaker Diarization vs. Speaker Identification
Speaker Diarization
Speaker diarization detects how many different speakers are in an audio recording and segments the audio: "this section is Speaker A," "this section is Speaker B."
Key point: Diarization does NOT identify who the speakers are. It only recognizes that different people are speaking and assigns anonymous labels (Speaker 1, Speaker 2...).
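To make the "anonymous labels" idea concrete, here is a minimal sketch of how diarization output might be represented. The `Segment` type, the `label_turns` helper, and the integer cluster IDs are illustrative assumptions, not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    speaker: str   # anonymous label such as "Speaker 1"


def label_turns(raw_turns):
    """Turn (start, end, cluster_id) tuples into labeled segments.

    Cluster IDs are arbitrary (hypothetical) values from a diarizer.
    Labels are numbered in order of first appearance, so they carry
    no information about who the person actually is.
    """
    names = {}
    segments = []
    for start, end, cluster_id in raw_turns:
        if cluster_id not in names:
            names[cluster_id] = f"Speaker {len(names) + 1}"
        segments.append(Segment(start, end, names[cluster_id]))
    return segments


turns = [(0.0, 2.5, 7), (2.5, 6.0, 3), (6.0, 9.0, 7)]
for seg in label_turns(turns):
    print(f"{seg.start:>5.1f}-{seg.end:<5.1f} {seg.speaker}")
```

The labels only say "same voice as before" or "different voice"; mapping them to real names is a separate step.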
Speaker Identification
Speaker identification matches voices against a pre-registered voiceprint database to automatically determine "this voice is Tanaka" or "this voice is Sato."
Requirements:
- Pre-record and register each speaker's voiceprint
- Build and maintain a voiceprint database
- Unregistered speakers cannot be identified
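The requirements above can be illustrated with a rough sketch of how matching against enrolled voiceprints might work. It assumes each voice has already been converted to a fixed-length embedding vector by some model. The three-dimensional vectors, the cosine-similarity comparison, and the `0.75` threshold are all illustrative assumptions, not how any specific product works.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or "Unknown".

    `enrolled` maps a name to that person's stored voiceprint vector.
    A voice that matches nobody above the threshold stays "Unknown",
    which is what happens to unregistered speakers.
    """
    best_name, best_score = "Unknown", threshold
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(embedding, voiceprint)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrollment database with toy 3-dimensional voiceprints.
enrolled = {"Tanaka": [1.0, 0.0, 0.0], "Sato": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], enrolled))  # close to Tanaka's voiceprint
print(identify([0.0, 0.0, 1.0], enrolled))  # matches nobody enrolled
```

Everything in `enrolled` has to be collected in advance and kept current, which is the setup cost the rest of this article is about.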
Comparison
| | Speaker Diarization | Speaker Identification |
|---|---|---|
| Purpose | Detect "different speakers exist" | Determine "whose voice this is" |
| Setup required | None | Voice enrollment needed |
| Output | Speaker A / Speaker B (anonymous) | Tanaka / Sato (named) |
| Unknown speakers | No problem (new label assigned) | Cannot identify |
| Practicality | Ready to use immediately | Setup overhead |
Why Speaker Identification Is Often Unnecessary
Speaker identification might seem more convenient, but in practice, diarization alone is sufficient for most use cases. Here's why.
Reason 1: Voice Enrollment Is Tedious
Speaker identification requires pre-recording every participant's voice and registering it in a database. Each new participant has to be enrolled before their first session, which is rarely practical in real-world use.
Reason 2: Meeting Participants Change
Even recurring meetings have guest attendees or roster changes. Maintaining an up-to-date voiceprint database becomes a burden, and unregistered speakers end up as "Unknown."
Reason 3: Privacy Concerns
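Voiceprints are biometric data, requiring careful privacy management. Collecting and storing voice biometrics raises concerns under data protection regulations such as the GDPR, which classifies biometric data used to identify a person as a special category of personal data.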
Reason 4: Tag Editing Is a Perfect Substitute
After transcription, you can manually change speaker labels (Speaker A → Tanaka, Speaker B → Sato) to achieve the same result as identification. Since you know who attended, recognizing voices from context is straightforward.
In Practice: Recording Who Said What with Diarization
Step 1: Record the Audio
Tips for better diarization accuracy:
- Microphone placement: Center of the table, capturing all speakers equally
- Minimize overlapping speech: Simultaneous talking is the biggest factor reducing accuracy
- Recording format: WAV recommended (uncompressed, so it avoids the lossy compression artifacts of MP3)
- Reduce noise: Minimize air conditioning, PC fan noise, etc.
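Before transcribing, it can help to confirm what a recording actually contains. The sketch below uses Python's standard `wave` module to report the basic properties of a WAV file. The `describe_wav` helper and the file name in the usage comment are hypothetical, and the module only reads uncompressed PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate if rate else 0.0,
        }


# Usage (hypothetical file name):
# print(describe_wav("meeting.wav"))
```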
Step 2: Transcribe with Speaker Diarization
In WhisperApp, follow these steps for diarization-enabled transcription:
- Import your audio file
- Select the Whisper model (large-v3-turbo recommended)
- Enable "Speaker Diarization"
- Set the number of speakers (if known)
- Run transcription
About speaker count: Specifying the correct number improves accuracy. However, an incorrect count can reduce accuracy, so use auto-detection when you're unsure.
Step 3: Batch-Edit Speaker Tags
Diarization output looks like this:
[Speaker 1] Let's begin today's meeting.
[Speaker 2] Sure, let's start by reviewing last meeting's minutes.
[Speaker 1] There were three action items from last time. First...
[Speaker 3] Sorry, I have a question about that.
If you attended the meeting, you can easily identify who each speaker is from their voice or what they said. WhisperApp's batch rename feature lets you change "Speaker 1" → "Director Yamada" across all occurrences at once.
One rename updates every utterance by that speaker — no need to edit them one by one.
Result:
[Director Yamada] Let's begin today's meeting.
[Sato] Sure, let's start by reviewing last meeting's minutes.
[Director Yamada] There were three action items from last time. First...
[Suzuki] Sorry, I have a question about that.
The same result as speaker identification — without any voice enrollment.
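If your tool exports a plain-text transcript in the bracketed format shown above, the same batch rename can be sketched in a few lines of Python. This is not WhisperApp's implementation. The `rename_speakers` helper and the assumption that every utterance starts with a `[label]` at the beginning of a line are illustrative.

```python
import re


def rename_speakers(transcript, mapping):
    """Replace bracketed speaker labels at the start of each line.

    Labels that are missing from `mapping` are left unchanged, so a
    partial mapping is safe to apply.
    """
    pattern = re.compile(r"^\[([^\]]+)\]", flags=re.MULTILINE)

    def swap(match):
        label = match.group(1)
        return f"[{mapping.get(label, label)}]"

    return pattern.sub(swap, transcript)


transcript = (
    "[Speaker 1] Let's begin today's meeting.\n"
    "[Speaker 2] Sure, let's start by reviewing last meeting's minutes.\n"
    "[Speaker 1] There were three action items from last time. First...\n"
    "[Speaker 3] Sorry, I have a question about that."
)
names = {
    "Speaker 1": "Director Yamada",
    "Speaker 2": "Sato",
    "Speaker 3": "Suzuki",
}
print(rename_speakers(transcript, names))
```

Matching the whole bracketed label, rather than doing a plain string replace, avoids accidentally turning "Speaker 10" into "Director Yamada0" in meetings with many speakers.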
Factors Affecting Diarization Accuracy
Recording Environment
| Factor | Impact | Solution |
|---|---|---|
| Background noise | Speaker voice features get masked | Record in quiet environments |
| Overlapping speech | Mixed voices are hard to separate | Establish one-at-a-time speaking |
| Volume differences | Quiet speakers may be missed | Adjust microphone placement |
| Echo/reverb | Voice features become blurred | Use rooms with acoustic treatment |
Speaker Characteristics
- Similar voices: Speakers of the same gender and similar age are harder to tell apart
- Number of speakers: Accuracy is best with 2 to 4 speakers; each additional speaker makes separation harder
- Utterance length: Very short utterances (like "uh-huh") are harder to attribute
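Because very short utterances are the ones most likely to be mislabeled, one practical approach is to flag them for a quick manual check. The sketch below assumes segments are available as `(start, end, speaker)` tuples in seconds. The `flag_short_utterances` helper and the one-second cutoff are illustrative assumptions rather than recommended values.

```python
def flag_short_utterances(segments, min_duration=1.0):
    """Return the segments shorter than `min_duration` seconds.

    Each segment is a (start, end, speaker) tuple. The flagged ones
    are candidates for manual review, not confirmed errors.
    """
    return [seg for seg in segments if seg[1] - seg[0] < min_duration]


segments = [
    (0.0, 4.0, "Speaker 1"),
    (4.0, 4.5, "Speaker 2"),   # a brief "uh-huh"
    (4.5, 9.0, "Speaker 1"),
]
for start, end, speaker in flag_short_utterances(segments):
    print(f"Review {speaker} at {start:.1f}s ({end - start:.1f}s long)")
```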
Use Cases
Meetings
Automatically record who said what in team meetings and project discussions. This makes preparing meeting minutes much faster.
Lectures & Panel Discussions
Separate presenters from audience questions. Structure lecture content and organize Q&A sessions.
Interviews & Conversations
Automatically separate host and guest speech. Makes extracting quotes and highlights much easier.
Call Centers
Separate agent and customer speech for quality analysis and training.
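As one example of what such analysis could look like, the sketch below computes each speaker's share of total talk time from diarized segments. The `(start, end, speaker)` tuple format, the `talk_time_share` helper, and the "Agent" and "Customer" labels (applied after renaming) are illustrative assumptions.

```python
from collections import defaultdict


def talk_time_share(segments):
    """Return each speaker's fraction of the total speaking time.

    Each segment is a (start, end, speaker) tuple in seconds.
    Silence between segments is not counted.
    """
    totals = defaultdict(float)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


call = [
    (0.0, 30.0, "Agent"),
    (30.0, 40.0, "Customer"),
    (40.0, 60.0, "Agent"),
]
for speaker, share in talk_time_share(call).items():
    print(f"{speaker}: {share:.0%} of talk time")
```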
Academic Research
Record each participant's statements in group interviews and focus groups so that every person's contributions can be analyzed separately.
Conclusion
For recording "who said what," speaker diarization — which requires zero setup — is more practical than speaker identification with its voice enrollment requirements.
Diarize first to segment speakers, then batch-rename labels in your tool to achieve identification-equivalent results without any voiceprint database. With WhisperApp's batch rename feature, this takes just a few clicks.
Try speaker diarization at your next meeting or interview. You'll never have to wonder "who said that?" again.



