OpenAI Whisper offers multiple model sizes, from tiny up to the large family, including the recent large-v3-turbo. To answer the common question "which model should I use?", here is a comparison of each model's characteristics.
Model Lineup and Specifications
| Model | Parameters | Model Size | VRAM Required | Relative Speed |
|---|---|---|---|---|
| tiny | 39M | 75MB | ~1GB | 32x |
| base | 74M | 142MB | ~1GB | 16x |
| small | 244M | 466MB | ~2GB | 6x |
| medium | 769M | 1.5GB | ~5GB | 2x |
| large-v2 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3 | 1550M | 2.9GB | ~10GB | 1x |
| large-v3-turbo | 809M | 1.6GB | ~6GB | 6x |
Relative speed is approximate, with large-v2 as baseline (1x).
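For scripting around these models, the specification table can be encoded as a small lookup. A minimal sketch in plain Python (the numbers come from the table above; `fits_in_vram` is an illustrative helper, not part of any Whisper API):

```python
# Approximate VRAM requirements in GB, taken from the table above.
MODEL_VRAM_GB = {
    "tiny": 1, "base": 1, "small": 2, "medium": 5,
    "large-v2": 10, "large-v3": 10, "large-v3-turbo": 6,
}

def fits_in_vram(budget_gb: float) -> list[str]:
    """Return the models that fit in the given VRAM budget, in table order."""
    return [m for m, gb in MODEL_VRAM_GB.items() if gb <= budget_gb]

print(fits_in_vram(4))  # ['tiny', 'base', 'small']
```

Note that on an 8GB card, large-v3-turbo fits (~6GB) while large-v3 (~10GB) does not, which is the basis for the recommendations later in this article.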
Model Characteristics
tiny / base
Use case: Testing, prototypes, real-time on low-spec PCs
Lightest and fastest, but the least accurate; errors are noticeable in some languages. Use these for testing, or when speed matters more than accuracy.
small
Use case: Everyday notes, casual transcription
A balanced model that runs on GPUs with ~2GB VRAM. Provides sufficient accuracy for everyday use.
medium
Use case: Business documents, meeting minutes (medium quality)
Delivers practical accuracy for most use cases. Runs on mid-range GPUs with ~5GB VRAM.
large-v2 / large-v3
Use case: High-quality transcription, accuracy-critical scenarios
The largest models offering the highest accuracy. large-v3 improves multilingual performance over v2. Requires 10GB+ VRAM GPUs (RTX 3080 or higher recommended).
large-v3-turbo (Recommended)
Use case: Optimal for nearly all scenarios
Released in October 2024, large-v3-turbo is a pruned and fine-tuned variant of large-v3 that reduces the decoder from 32 layers to 4:
- Accuracy: Nearly identical to large-v3 (less than 1% WER difference)
- Speed: ~6x faster than large-v3
- VRAM: ~6GB (60% of large-v3)
- Size: 1.6GB (55% of large-v3)
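The size and VRAM percentages above follow directly from the spec table's numbers. A quick arithmetic check:

```python
# Verify large-v3-turbo's savings relative to large-v3 (numbers from the spec table).
turbo_size_gb, v3_size_gb = 1.6, 2.9    # model size on disk
turbo_vram_gb, v3_vram_gb = 6, 10       # approximate VRAM required

print(round(turbo_size_gb / v3_size_gb * 100))  # 55 (% of large-v3's size)
print(round(turbo_vram_gb / v3_vram_gb * 100))  # 60 (% of large-v3's VRAM)
```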
Recommended Models by Use Case
| Use Case | Recommended | Reason |
|---|---|---|
| Meeting minutes / business | large-v3-turbo | High accuracy + fast |
| Video subtitles | large-v3-turbo / large-v3 | Accuracy-first |
| Real-time transcription | large-v3-turbo / small | Speed-accuracy balance |
| Batch processing many files | large-v3-turbo | Fast processing saves time |
| GPU with 4GB or less VRAM | small | Largest model that fits |
| CPU only (no GPU) | tiny / base | Minimize processing time |
| Testing / development | tiny | Fastest results |
GPU VRAM Guide
| GPU | VRAM | Recommended Model |
|---|---|---|
| GTX 1650 / RTX 3050 | 4GB | small |
| RTX 3060 / RTX 4060 | 8GB | large-v3-turbo |
| RTX 3070 / RTX 4070 | 8-12GB | large-v3-turbo / large-v3 |
| RTX 3080 / RTX 4080+ | 10-16GB | large-v3 |
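The VRAM guide above can be condensed into a simple selection rule. A hedged sketch (the thresholds mirror the table; `recommend_model` is an illustrative function, not a real API):

```python
def recommend_model(vram_gb: float) -> str:
    """Pick a Whisper model per the VRAM guide above (approximate thresholds)."""
    if vram_gb >= 10:
        return "large-v3"        # accuracy-first on high-end GPUs
    if vram_gb >= 6:
        return "large-v3-turbo"  # best balance on 8GB-class GPUs
    if vram_gb >= 2:
        return "small"           # largest model that fits 4GB cards
    return "tiny"                # CPU-only or very low VRAM

print(recommend_model(8))  # large-v3-turbo
```

In practice you would feed this the free VRAM reported by your GPU driver rather than the card's nominal capacity, since other processes may already be using memory.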
WhisperApp auto-detects your GPU and selects the optimal backend (CUDA, OpenVINO, Vulkan). Model downloads are available with one click from within the app.
Conclusion
When in doubt, choose large-v3-turbo. It offers the best accuracy-speed balance for virtually every use case.
Use small when VRAM is limited, and large-v3 when maximum accuracy is essential. Matching the model to your needs ensures the best results every time.